Zephyr bt_conn_unref() not consistently working

I'm developing a device that uses an nRF52832 as a central to connect to up to 4 peripherals and send them coordinated commands. The device must wake up, establish connections, route commands, then disconnect and wait for input before doing it all over again. I started with the central multi-link example and have been growing it to fit my needs. I've decided to develop in Zephyr rather than NCS to take advantage of the portability, and because the Bluetooth stack is a Nordic contribution to the Zephyr project.

I can find my devices and connect to them, and I can maintain multiple connections, but I have an issue with disconnecting and reconnecting. To connect, I call bt_conn_le_create() in the device found callback. On a disconnect I call bt_conn_unref(). But after the first connection to a device, if I try to reconnect, bt_conn_le_create() fails with the message "Found valid connection in disconnected state" and code -22 (-EINVAL). At that point I am never able to reestablish a connection until I power cycle the device. My logging shows that the disconnect path does reach the bt_conn_unref() call. As a workaround, every time a reconnect fails I look up the stale connection and call bt_conn_unref() on it again. After several iterations the connection is released and the reconnect succeeds. It can happen in 1 minute, or it can take 3 or 4 minutes of cycling through the process of trying to make a new connection, failing, calling bt_conn_unref() and trying again.

I was surprised that it worked at all, but I cannot work out why the connection is only released after calling that function many times. I need the connection released immediately on disconnect. What can I do to fix this?

I am still a little confused about how to determine which version of Zephyr I am on. I think it's 2.something but I cannot figure out how to query that. I know west is v0.14.0 and I've been using SDK v0.15.2. How do I get the Zephyr version number from my installation?

  • I've come back enlightened. I was bouncing between a couple of projects and now had time to dive deeper into this one.

    The issue is now resolved, so I will explain what I have learned. Reference counting is something that I did not fully understand. I was thinking that it was something specific to this module, but it turns out to be a more general concept. To sum it up, an object (in this case the connection object) lives in a library layer that is abstracted away from me, the application programmer. The connection is handled without my needing to manage it, but I still have to interact with it. When I need to interact with it, such as to send or receive a message, I get a pointer to it. In the documentation under "Connection Management" it says:

    "Connection objects are reference counted, and the application is expected to use the bt_conn_ref() API whenever storing a connection pointer for a longer period of time, since this ensures that the object remains valid (even if the connection would get disconnected). Similarly the bt_conn_unref() API is to be used when releasing a reference to a connection."

    To me it sounded like bt_conn_unref() is only ever needed to undo an explicit bt_conn_ref(). This turned out not to be true. In the code examples for a central I noticed that bt_conn_ref() is not used at all; bt_conn_le_create() is where the first counted reference shows up. So my initial assumption that bt_conn_ref() and bt_conn_unref() calls have a 1:1 relationship was incorrect. Any function that gives you a connection pointer increments the reference counter for that connection, and each of those references needs its own bt_conn_unref(). These functions aren't listed in the overview paragraph, but the behavior is noted in each function's own documentation, so if you aren't looking for it, it's easy to miss. It would be nice if the documentation of bt_conn_unref() listed the functions that require its use.
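The counting rule described above can be sketched with a toy model. This is not Zephyr code: struct toy_conn and the toy_* functions are hypothetical stand-ins that only mimic how bt_conn_le_create(), bt_conn_lookup_addr_le(), bt_conn_ref() and bt_conn_unref() affect a counter, and the model ignores any references the stack holds internally.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct bt_conn: only the reference count is modelled. */
struct toy_conn {
	int ref;
};

/* Models bt_conn_ref(): take one more reference on a connection. */
static struct toy_conn *toy_conn_ref(struct toy_conn *conn)
{
	conn->ref++;
	return conn;
}

/* Models bt_conn_unref(): give back exactly one reference. */
static void toy_conn_unref(struct toy_conn *conn)
{
	assert(conn != NULL && conn->ref > 0);
	conn->ref--;
}

/* Models bt_conn_le_create(): the caller gets a pointer plus one reference. */
static struct toy_conn *toy_conn_create(struct toy_conn *slot)
{
	slot->ref = 0;
	return toy_conn_ref(slot);
}

/* Models bt_conn_lookup_addr_le(): a successful lookup adds one reference. */
static struct toy_conn *toy_conn_lookup(struct toy_conn *slot)
{
	return (slot->ref > 0) ? toy_conn_ref(slot) : NULL;
}

/* Returns the references still outstanding after a balanced sequence. */
static int toy_balanced_sequence(void)
{
	struct toy_conn slot;
	struct toy_conn *conn = toy_conn_create(&slot);  /* ref = 1 */
	struct toy_conn *found = toy_conn_lookup(&slot); /* ref = 2 */

	toy_conn_unref(found); /* done with the lookup result: ref = 1 */
	toy_conn_unref(conn);  /* on disconnect: ref = 0, slot is reusable */
	return slot.ref;
}
```

In this model no explicit toy_conn_ref() call appears in the application sequence, yet two unrefs are required: one for the create and one for the lookup.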

    Ultimately the mistake in my code came after the call to bt_conn_lookup_addr_le(). I don't think that this function was shown in any of the example code, but it is necessary for sending a message from the central device to a particular peripheral. In my case, the central needs to connect to 4 peripherals and route the messages correctly, so I have to look up the connection using the MAC address. That gets me the connection pointer and increments the reference counter. As soon as I have written to the attribute I have to call bt_conn_unref(). Here is what I ended up writing:

    central_send_error_t central_send_tx(uint8_t * p_mac, uint8_t type, uint8_t * p_data, uint16_t len)
    {
    	int err = 0;
    	bt_addr_le_t bt_addr_le;
    	memcpy(&bt_addr_le.a.val, p_mac, BT_ADDR_SIZE);
    	bt_addr_le.type = type;
    	// A successful lookup returns the connection with its reference count incremented.
    	struct bt_conn * p_conn = bt_conn_lookup_addr_le(BT_ID_DEFAULT, &bt_addr_le);
    	if(p_conn == NULL){
    		LOG_ERR("Invalid MAC address: %02x:%02x:%02x:%02x:%02x:%02x", bt_addr_le.a.val[0], bt_addr_le.a.val[1], bt_addr_le.a.val[2], bt_addr_le.a.val[3], bt_addr_le.a.val[4], bt_addr_le.a.val[5]);
    		// No reference was taken, so there is nothing to unref (and NULL must not be passed to bt_conn_unref()).
    		return CEN_SND_BAD_MAC;
    	}
    	LOG_HEXDUMP_DBG(p_data, len, "Packet: ");
    	LOG_DBG("Send on handle 0x%04x", m_tx_value_handle);
    	// LOG_HEXDUMP_INF(p_mac, BT_ADDR_SIZE, "MAC: ");
    	err = bt_gatt_write_without_response_cb(p_conn, m_tx_value_handle, p_data, len, false, _write_finished_cb, NULL);
    	// Release the lookup reference on both the success and the error path.
    	bt_conn_unref(p_conn);
    	if(err){
    		LOG_ERR("Write error: %d", err);
    		// CEN_SND_WRITE_ERR is a placeholder name: the original returned no value here.
    		// Use whichever central_send_error_t value represents a failed write.
    		return CEN_SND_WRITE_ERR;
    	}
    	return CEN_SND_NO_ERROR;
    }

    Before, I was looking up the connection but never calling bt_conn_unref(), so the reference count went up with every message I sent. My workaround called bt_conn_unref() whenever a reconnect found an existing connection, but that only decremented the count by one each time a reconnect failed. So the longer I had been connected, the more messages had been written and the more retries it took before the connection was finally released.
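The arithmetic behind that delay can be sketched as follows. This is a simplified model, not Zephyr code: toy_unrefs_needed() is a hypothetical helper, and it assumes one reference from connection creation plus one leaked reference per message sent, with each disconnect or failed reconnect releasing exactly one.

```c
#include <assert.h>

/*
 * Hypothetical helper: counts how many bt_conn_unref()-style calls are
 * needed before the count reaches zero, given a number of lookups whose
 * references were never released.
 */
static int toy_unrefs_needed(int leaked_lookups)
{
	int ref = 1;    /* reference handed out when the connection was created */
	int unrefs = 0;

	assert(leaked_lookups >= 0);
	ref += leaked_lookups; /* each send looked up the connection and kept the reference */

	while (ref > 0) {      /* disconnect handler, then one unref per failed reconnect */
		ref--;
		unrefs++;
	}
	return unrefs;
}
```

Under these assumptions, a session that sent 250 messages would need 251 unref calls in total, which matches the observation that longer sessions took more retry cycles to recover.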

    The connection management summary says to get a connection reference "whenever storing a connection pointer for a longer period of time." This was vague, and I had assumed it meant something like the duration of the connection. That assumption, combined with the fact that every write was acquiring a connection reference without my knowing it, was causing my issue.
