Intermittent BLE connection

I am working with nRF Connect SDK 2.6.1 on a Central and a Peripheral and I am running the BAP similar to the nRF Audio Example. I am using out of band pairing over UART for the initial secure connection. The OOB pairing works fine.

After connection is established and bonds are saved, the peripheral basically always advertises if bonds exist. The central device is turned on and off and scans when it turns on. I know this is a little different than traditionally, but the Basic Audio Profile forced which devices are the Central and the Peripheral.

If I continuously cycle power to the Central and watch the log output, quite frequently I can see that sometimes it needs multiple attempts to connect. It seems to happen more frequently when the area is a little more ACTIVE in the BLE band. 

I'm using Wireshark with the nRF Sniffer to look at the BLE traffic, and it seems like the Peripheral (Slave on the Wireshark log), stops responding to the Centrals (Master) packets. I'm using security level 4 so Wireshark is unable to decrypt the message, but I should still see a response from the Peripheral in order to maintain the connectivity.

It seems quite frequently the peripheral stops responding after it sends the OP Code LL_START_ENC_REQ.  Here's the log of the connection and when the peripheral stops responding.

Here is a successful connection where you can see constant communication between the central and the peripheral:

On my log output on the Central and the Peripheral when the failure occurs I can see the connection occur but then a "Security failed" notification. It then seems to struggle with the disconnection but eventually does, it then starts advertising again and then successfully connects for the most part on the 2nd attempt. Sometimes it takes a couple of attempts but it seems to always reconnect.


-- [00:04:21.230,010] <dbg> event: thread_status_event_supv: Conn evnt
-- [00:04:21.580,444] <wrn> ble_supv: Security failed: level 1 err 9
-- [00:04:21.580,474] <wrn> ble_supv: Failed to disconnect -128
-- [00:04:21.580,718] <dbg> event: thread_status_event_supv: Disc Evnt
-- [00:04:21.640,258] <inf> ble_supv: Advertising to xx:xx:xx:xx:xx:xx (random) started

-- [00:04:25.829,498] <dbg> event: thread_status_event_supv: Conn evnt

I am wondering about the procedure when the Central device is powered down and disconnects. I'm basically just calling the functions to stop streaming and disconnect and then powering down. The peripheral does seem to take a couple of seconds to recognize the disconnection and receive the "disconnect" callback. But even if I ensure to wait for this callback on the peripheral to occur before turning the central back on, I still get the occasional failure. Is there still something hanging up in the peripheral device from the connection that is occuring due to my procedure for the disconnection? What should the procedure be to properly stop streaming and disconnect from a peripheral? Is it possible its somehow using the "conn" handle from the previous connection?

The current BLE disconnection procedure when the central is shut down. There is a little more error checking, but basically if we're streaming, stop the streaming and then disconnect:

if (!bt_content_ctlr_media_state_playing()){
        return 0; // we are already paused
    }
bt_content_ctrl_stop(default_conn);
bt_conn_disconnect(default_conn, BT_HCI_ERR_REMOTE_USER_TERM_CONN);
Its also possible the disconnection procedure has nothing to do with this failure. I'm using whitelisting on the peripheral so it only connects to the bonded device. But I am just using regular advertising rather than directed advertising. I didn't see the need for directed advertising since we're using out of band pairing. 
Parents
  • Hi adunk,

    After connection is established and bonds are saved, the peripheral basically always advertises if bonds exist. The central device is turned on and off and scans when it turns on. I know this is a little different than traditionally, but the Basic Audio Profile forced which devices are the Central and the Peripheral.

    I am not familiar with what BAP requires here, but this sound pretty much the same as how a pair of Central and Peripheral would work since Bluetooth 4.x, no? Could I perhaps miss something here?

    But if dwelling in the reasons of the setup is not directly related to the issue, we can skip this part.

    I am wondering about the procedure when the Central device is powered down and disconnects. I'm basically just calling the functions to stop streaming and disconnect and then powering down. The peripheral does seem to take a couple of seconds to recognize the disconnection and receive the "disconnect" callback.

    This sound like the Central powering down happen too quick that the call to disconnect didn't actually work on the Central yet. The Peripheral would then just keep waiting for Central packets that stop coming and eventually time out.

    Since you already have a sniffer setup, can you check if that is the case? You could temporarily disable encryption just for this test too.

    Also, how does this power cycle happen? Power cycle usually mean a physical power cut and restore. However, you mentioned the central ending the stream and disconnect, that sounds like software is involved.

    Is there still something hanging up in the peripheral device from the connection that is occuring due to my procedure for the disconnection?

    From what you have described here, that can seem to be the case, but there shouldn't be anything like that...

    Let's also look at these logs:

    -- [00:04:21.230,010] <dbg> event: thread_status_event_supv: Conn evnt
    -- [00:04:21.580,444] <wrn> ble_supv: Security failed: level 1 err 9
    -- [00:04:21.580,474] <wrn> ble_supv: Failed to disconnect -128
    -- [00:04:21.580,718] <dbg> event: thread_status_event_supv: Disc Evnt
    -- [00:04:21.640,258] <inf> ble_supv: Advertising to xx:xx:xx:xx:xx:xx (random) started

    -- [00:04:25.829,498] <dbg> event: thread_status_event_supv: Conn evnt

    I assume ble_supv is one of your code module, and the code is based on one of the sample code. In that case, "Failed to disconnect -128" comes from bt_conn_disconnect(), and error 128 means there wasn't a connection.

    This is quite odd. The two modules of the Bluetooth stack involved here are SMP and CONN. Could you please enable these configs to get more logs?
    CONFIG_BT_SMP_LOG_LEVEL_DBG
    CONFIG_BT_CONN_LOG_LEVEL_DBG  

    There could be a significant increase in amount of log, so you might also need to configure more buffers for the log module. Please also make sure CONFIG_LOG_MODE_DEFERRED is enabled to minimize the impact of logging to the Bluetooth stack's operations.

    What should the procedure be to properly stop streaming and disconnect from a peripheral? Is it possible its somehow using the "conn" handle from the previous connection?

    It would be best to wait for a callback confirming the success of an operation before moving on to the next step.

    The current BLE disconnection procedure when the central is shut down. There is a little more error checking, but basically if we're streaming, stop the streaming and then disconnect:

    if (!bt_content_ctlr_media_state_playing()){
            return 0; // we are already paused
        }
    bt_content_ctrl_stop(default_conn);
    bt_conn_disconnect(default_conn, BT_HCI_ERR_REMOTE_USER_TERM_CONN);
    Its also possible the disconnection procedure has nothing to do with this failure.

    Right, so this looks like there is no wait for any procedures to complete at all. Still, in an ideal world, the peripheral should be robust enough that it isn't impacted by what the central does.

    I'm using whitelisting on the peripheral so it only connects to the bonded device. But I am just using regular advertising rather than directed advertising. I didn't see the need for directed advertising since we're using out of band pairing. 

    Noted.

    Hieu

  • I noticed in the latest SDK release, an issue was fixed in the nRF5340. Is this a symptom of this issue? Is it trying to connect again during the connection process?

  • Could you please share how the v2.7.0 change item could result in the issue? I don't quite follow.

    One thing stands out in the Peripheral log to me:

    <dbg> bt_conn: bt_conn_set_state: connected -> disconnect-complete

    This is odd because this log can only be produced here: https://github.com/nrfconnect/sdk-zephyr/blob/v3.5.99-ncs1/subsys/bluetooth/host/conn.c#L800

    Does this consistently appear before a failure?

    Could you please set a breakpoint there and use debug mode to get the call stack leading up to that?

    While doing that, could you please also enable CONFIG_BT_HCI_CORE_LOG_LEVEL_DBG?

    Also, regarding the initial suspicion that the connection handle is somehow being reused and mixed up, do you see this being the case in the full log? The log so far is only for the attempt where the issue happens, so I don't see what the connection handle is before that.

    Another thing is, if you want to upload large amount of log text, you might want to use the Insert > Code function:

    Unfortunately, I am out of office tomorrow. Another engineer will follow-up with you. We are all facing abnormally high loading, so please excuse us if there is any delay in following up.

  • Regarding the v2.7.0 change, I don't know what exactly is going on in the BLE stack during the connection procedure, but I was speculating if the Central tried to connect a 2nd time to the Peripheral in the middle of the connection process, it could cause a disconnection from the 1st attempt or possibly a failed 2nd connection attempt because its now already connected. I think this would be a question for someone more familiar with whoever addressed this "issue" that was resolved in 2.7.0. Etc-were other people seeing similar issues that I'm seeing?

    I will follow up with the responses to the remainder of your questions.

Reply
  • Regarding the v2.7.0 change, I don't know what exactly is going on in the BLE stack during the connection procedure, but I was speculating if the Central tried to connect a 2nd time to the Peripheral in the middle of the connection process, it could cause a disconnection from the 1st attempt or possibly a failed 2nd connection attempt because its now already connected. I think this would be a question for someone more familiar with whoever addressed this "issue" that was resolved in 2.7.0. Etc-were other people seeing similar issues that I'm seeing?

    I will follow up with the responses to the remainder of your questions.

Children
No Data
Related