Zephyr bt_conn_unref() not consistently working

I'm developing a device that uses an nRF52832 as a central to connect to up to 4 peripherals and send them coordinated commands. The device must wake up, establish connections, route commands, then disconnect and wait for input before doing it all over again. I started with the central multi-link example and have been growing it to fit my needs. I've decided to develop in Zephyr rather than NCS to take advantage of the portability, and because the Bluetooth stack is a Nordic contribution to the Zephyr system.

I can find my devices and connect to them, and I can maintain multiple connections, but I have an issue with disconnecting and reconnecting. To connect, I call bt_conn_le_create() in the device-found callback. On disconnect, I call bt_conn_unref(). But after the first connection to a device, if I try to reconnect, bt_conn_le_create() fails with the message "Found valid connection in disconnected state" and code -22. From that point on I cannot reestablish a connection until I power cycle the device. My logging shows that the disconnect path does reach the bt_conn_unref() call. As a workaround, every time a reconnect fails I search for the connection and unref it again. After several iterations the unref takes effect and the connection to the device is remade. It can happen in 1 minute, or take 3 or 4 minutes of cycling through the process of trying to make a new connection, failing, calling bt_conn_unref() and then trying again.

I was surprised that it worked at all, but I cannot explain why it takes so long to release the connection after calling that function many times. I need the connection to be released immediately on disconnect. What can I do to fix this?

I am still a little confused about how to determine what version of Zephyr I am on. I think it's 2.something but I cannot figure out how to query that. I know west is v0.14.0 and I've been using SDK v0.15.2. How do I get the Zephyr version number from my installation?

  • Hi

    One of our developers made a short article on ownership and how to use the bt_conn_ref and unref functions. Please give it a read and see if that helps you understand:

    "Ownership and references are a complex topic. It can be limited a bit and made simpler by reifying ownership as a pointer value in a variable, which I recommend. After all, reference counting is supposed to make resource management easier, not harder. The following is a sketch of the technique.

    Declare variables as only-for-owned-references by use of code comments. Their value shall be NULL when not owning a reference. Function parameters are indeed variables that may be declared as only-for-owned-references. A function return value is also variable-like for this purpose.

    Owning references must be conserved. When moving an owned reference from a variable, set that variable's value to NULL as soon as you are able, to reify the move. Consider using an atomic exchange operation.

    When obtaining ownership through a return value, as soon as possible reify that by assigning the returned pointer value to a variable that is only-for-owned-references. Make sure the variable was NULL before to avoid losing a reference! Consider using an atomic exchange operation.

    Owned references should never be forgotten. Forgetting an owned reference is a resource leak. The "source" of that leak, we vaguely define as the missing logic that would conserve the reference appropriately. This also applies to failure-handling code!

    When an owning variable is about to go out of scope, you may assert that its value is NULL. It should always be that if we follow the rules above.

    Keep in mind the ownership semantics of any API you use. For example, `bt_conn_ref` entrusts you with a reference for safekeeping, and `bt_conn_unref` moves a reference into the host, where it is no longer the application's job to keep it safe. Other APIs may have similar or more complicated semantics. Some APIs move the ownership only sometimes (e.g. on success, which is indicated by return code).

    It's interesting to note that `bt_conn_ref` seems to duplicate a reference. But this is not possible in the abstract sense I have in mind when I say "reference". The reference itself is not a number or pointer or anything, so there is nothing meaningful to copy. Instead, `bt_conn_ref` operates outside of this "reference" safety net, and the most consistent way of thinking about this is that `bt_conn_ref` has a large supply of references that point to the object you want, that it can hand out. And it wants them back! (...through `bt_conn_unref`.)"
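The rules in the quoted article can be sketched in plain C. This is a minimal model, not Zephyr code: `struct obj`, `obj_ref()`, `obj_unref()`, `owned_slot` and the `slot_*` helpers are hypothetical names standing in for `struct bt_conn`, `bt_conn_ref()`, `bt_conn_unref()` and an application variable. It assumes a C11 compiler with `<stdatomic.h>`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted object such as struct bt_conn. */
struct obj {
	atomic_int ref;
};

/* Hands out one more reference (plays the role of bt_conn_ref()). */
static struct obj *obj_ref(struct obj *o)
{
	atomic_fetch_add(&o->ref, 1);
	return o;
}

/* Gives one reference back (plays the role of bt_conn_unref()). */
static void obj_unref(struct obj *o)
{
	int old = atomic_fetch_sub(&o->ref, 1);

	assert(old > 0); /* releasing a reference nobody owned is a bug */
}

/* Only-for-owned-references: NULL whenever no reference is owned. */
static _Atomic(struct obj *) owned_slot;

/*
 * Move an owned reference into the slot.
 * Returns 0 on success. Returns -1 if the slot already owns a reference,
 * in which case the caller still owns `o` and must deal with it.
 */
static int slot_put(struct obj *o)
{
	struct obj *expected = NULL;

	return atomic_compare_exchange_strong(&owned_slot, &expected, o) ? 0 : -1;
}

/* Move the owned reference out of the slot, leaving NULL behind. */
static struct obj *slot_take(void)
{
	return atomic_exchange(&owned_slot, NULL);
}

/* Release whatever the slot owns. Calling it twice is harmless. */
static void slot_release(void)
{
	struct obj *o = slot_take();

	if (o != NULL) {
		obj_unref(o);
	}
}
```

In an application, the same shape might be used by passing the pointer obtained from `bt_conn_le_create()` to something like `slot_put()`, and calling something like `slot_release()` from the disconnected callback. Because the slot is emptied by the exchange, a second release cannot unref the same reference twice.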

    You say you've been using nRF Connect SDK version v0.15.2. Is this a typo? That is not a real NCS version. When building a sample in the nRF Connect SDK, the Zephyr version in use is stated at the beginning of the build log, for example:

    Building peripheral_uart
    C:\WINDOWS\system32\cmd.exe /d /s /c "west build --build-dir c:/ncs/v2.4.0/nrf/samples/bluetooth/peripheral_uart/build c:/ncs/v2.4.0/nrf/samples/bluetooth/peripheral_uart"
    
    [0/1] Re-running CMake...
    Loading Zephyr default modules (Zephyr base (cached)).
    -- Application: C:/ncs/v2.4.0/nrf/samples/bluetooth/peripheral_uart
    -- CMake version: 3.20.5
    -- Cache files will be written to: C:/ncs/v2.4.0/zephyr/.cache
    -- Zephyr version: 3.3.99 (C:/ncs/v2.4.0/zephyr)

    Best regards,

    Simon

  • Thank you for the explanation. Where can I go to read more on how bt_conn_ref() works? I do not have a firm grasp on what is going on. So far I understand that I get a pointer to a connection reference, but the only explanation in the comments is about it incrementing and decrementing a counter. I would like to know where to look to learn about ownership and references, particularly as they apply to connection references in the Bluetooth stack.

    To answer your question about my version of NCS, I am not using NCS. I'm trying to maintain greater portability across devices. But my issues are with understanding the implementation of the Bluetooth stack.

  • You can read more on how bt_conn_ref() works in the connection management API documentation: https://developer.nordicsemi.com/nRF_Connect_SDK/doc/latest/zephyr/connectivity/bluetooth/api/connection_mgmt.html#c.bt_conn_ref
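As a rough mental model of the counter, here is a hedged sketch in plain C. It assumes that the stack refuses to create a new connection to a peer while the old connection object still has outstanding references, which appears to be what the "Found valid connection in disconnected state" warning and the -22 (`-EINVAL`) return code indicate. All names (`model_conn`, `model_create()` and so on) are hypothetical, and the exact reference counts are illustrative rather than taken from conn.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of one connection object; not struct bt_conn. */
struct model_conn {
	int ref;        /* outstanding references (stack + application) */
	bool connected;
};

static struct model_conn slot; /* single object for a single peer */

/* Models bt_conn_le_create(): fails while the old object is still referenced. */
static int model_create(struct model_conn **out)
{
	if (slot.ref > 0) {
		return -EINVAL; /* "Found valid connection in ... state" */
	}
	slot.ref = 2; /* assumption: one for the stack, one handed to the caller */
	slot.connected = true;
	*out = &slot;
	return 0;
}

/* Models bt_conn_ref(): one more reference that must be given back later. */
static struct model_conn *model_ref(struct model_conn *c)
{
	c->ref++;
	return c;
}

/* Models bt_conn_unref(): gives one reference back. */
static void model_unref(struct model_conn *c)
{
	assert(c->ref > 0);
	c->ref--;
}

/* Models the link dropping: the stack gives up its own reference. */
static void model_disconnect(void)
{
	slot.connected = false;
	model_unref(&slot);
}
```

In this model, an application that took one extra reference and released only the one from `model_create()` sees the next `model_create()` fail until the extra reference is also released. Whether that matches a given application depends on where its references are taken, so every call that returns or adds a reference is worth auditing.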

    So are you just using the Zephyr RTOS on your own then? I'm afraid we will not be able to support you if you move outside of the SDKs we are familiar with and provide support for.

    Best regards,

    Simon

  • I do not expect support for systems not implemented by Nordic, but I would like to find more information on the conn.c module. The article quoted above refers to ownership and how reference counting is supposed to simplify it. I would like to find more information on that so I can become more of an expert on how it works.

    Am I mistaken that the Bluetooth implementation was written by Nordic?

  • Hi

    I've asked the team, and the author of the mentioned article for some more specifics on what you're looking for. I'll get back to you when I hear from them.

    Best regards,

    Simon
