
SD_BLE_GAP_DISCONNECT event not received in some cases

If we try to force-terminate an ongoing BLE connection by invoking sd_ble_gap_disconnect, in some cases we don't receive the SD_BLE_GAP_DISCONNECT event, even after the supervision timeout. It is not easily reproducible and we have not been able to capture sniffer logs for it. As a defensive measure, we added application-side timeout logic that fires after 20 seconds to clean up the connection context in such cases. One of the key steps in connection cleanup is to restart BLE advertising in Connectable mode (we switch to NonConnectable mode after the first connection to prevent more than one concurrent connection). But this step (invoking sd_ble_gap_adv_start) fails with NRF_ERROR_CONN_COUNT in such cases. It seems the SoftDevice still hasn't cleaned up the connection context for the first connection, so it rejects connectable advertising to prevent more than one concurrent connection. I have the following specific questions:

1. What can cause the SoftDevice to completely miss raising the SD_BLE_GAP_DISCONNECT event in some cases? Can we configure any additional timeout (other than the supervision timeout) in the SoftDevice to generate an explicit timeout event in such cases?

2. Is there a way to force the SoftDevice to clean up the connection context in case of such failures?
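
The application-side timeout described in the question can be sketched as a small state machine. This is only an illustration under stated assumptions, not the poster's implementation: the names `link_ctx_t`, `link_watchdog_poll`, and the `fake_adv_*` stand-ins are hypothetical, the advertising restart is passed in as a callback that would wrap sd_ble_gap_adv_start on target, and the two error values are redefined locally so the sketch builds on a host. The disconnect notification is delivered to the application as BLE_GAP_EVT_DISCONNECTED in the S140 headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirror nrf_error.h so the sketch builds on a host; on target use the SDK headers. */
#ifndef NRF_SUCCESS
#define NRF_SUCCESS 0u
#endif
#ifndef NRF_ERROR_CONN_COUNT
#define NRF_ERROR_CONN_COUNT 18u
#endif

#define DISCONNECT_WATCHDOG_MS 20000u /* application-side timeout from the post */

typedef enum {
    LINK_IDLE,          /* no link; connectable advertising may run */
    LINK_CONNECTED,
    LINK_DISCONNECTING, /* sd_ble_gap_disconnect returned NRF_SUCCESS, event pending */
    LINK_STUCK          /* watchdog fired and the SoftDevice still counts the link */
} link_state_t;

typedef struct {
    link_state_t state;
    uint32_t     disconnect_requested_ms;
} link_ctx_t;

/* Hypothetical wrapper type around sd_ble_gap_adv_start() in connectable mode. */
typedef uint32_t (*adv_start_fn_t)(void);

/* Call right after sd_ble_gap_disconnect() returns NRF_SUCCESS. */
void link_on_disconnect_requested(link_ctx_t *ctx, uint32_t now_ms)
{
    ctx->state = LINK_DISCONNECTING;
    ctx->disconnect_requested_ms = now_ms;
}

/* Call from the BLE event handler when BLE_GAP_EVT_DISCONNECTED arrives. */
void link_on_disconnected_event(link_ctx_t *ctx)
{
    ctx->state = LINK_IDLE;
}

/* Call periodically; returns true when connectable advertising was restarted here. */
bool link_watchdog_poll(link_ctx_t *ctx, uint32_t now_ms, adv_start_fn_t start_adv)
{
    if (ctx->state != LINK_DISCONNECTING && ctx->state != LINK_STUCK) {
        return false;
    }
    /* Unsigned subtraction stays correct across tick wraparound. */
    if ((uint32_t)(now_ms - ctx->disconnect_requested_ms) < DISCONNECT_WATCHDOG_MS) {
        return false;
    }
    uint32_t err = start_adv();
    if (err == NRF_SUCCESS) {
        ctx->state = LINK_IDLE;
        return true;
    }
    if (err == NRF_ERROR_CONN_COUNT) {
        /* The stack still holds the old link: keep the context and retry later. */
        ctx->state = LINK_STUCK;
        ctx->disconnect_requested_ms = now_ms;
    }
    return false;
}

/* Host-side stand-ins for the advertising wrapper, for experimentation only. */
static uint32_t fake_adv_ok(void)         { return NRF_SUCCESS; }
static uint32_t fake_adv_conn_count(void) { return NRF_ERROR_CONN_COUNT; }
```

The sketch does not resolve the stuck link. It only keeps the application context alive and retries while NRF_ERROR_CONN_COUNT persists, which also makes the condition easy to count and log.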

  • Hi 

    This does not sound like normal behavior of the SoftDevice. If the link actually disconnects, you should get the SD_BLE_GAP_DISCONNECT event eventually. 

    Are you checking the return code of the call to sd_ble_gap_disconnect(..), to make sure it returns NRF_SUCCESS?

    Can you please let me know which nRF device you are using, and which type/version of the SoftDevice and SDK you are using?

    Does the issue occur on a standard Nordic DK, on custom hardware, or on both?

    Best regards
    Torbjørn
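
The return-code check asked about above could be made explicit along these lines. This is a hedged sketch: `disc_action_t` and `classify_disconnect_result` are hypothetical names, and the error values are redefined locally (mirroring the SDK headers) only so that the snippet builds on a host. The cases follow the return codes documented for sd_ble_gap_disconnect.

```c
#include <stdint.h>

/* Local mirrors of the SDK error codes; on target include nrf_error.h and ble_err.h. */
#ifndef NRF_SUCCESS
#define NRF_SUCCESS                   0u
#endif
#ifndef NRF_ERROR_INVALID_PARAM
#define NRF_ERROR_INVALID_PARAM       7u
#endif
#ifndef NRF_ERROR_INVALID_STATE
#define NRF_ERROR_INVALID_STATE       8u
#endif
#ifndef BLE_ERROR_INVALID_CONN_HANDLE
#define BLE_ERROR_INVALID_CONN_HANDLE 0x3002u
#endif

typedef enum {
    DISC_WAIT_FOR_EVENT,     /* request accepted; the disconnect event should follow */
    DISC_ALREADY_IN_PROGRESS,/* disconnection in progress or link not established */
    DISC_BAD_HANDLE,         /* connection handle is not known to the stack */
    DISC_BAD_REASON,         /* HCI status code not accepted for this call */
    DISC_UNEXPECTED          /* anything else: log it and investigate */
} disc_action_t;

/* Maps the value returned by sd_ble_gap_disconnect() to a follow-up action. */
disc_action_t classify_disconnect_result(uint32_t err_code)
{
    switch (err_code) {
    case NRF_SUCCESS:                   return DISC_WAIT_FOR_EVENT;
    case NRF_ERROR_INVALID_STATE:       return DISC_ALREADY_IN_PROGRESS;
    case BLE_ERROR_INVALID_CONN_HANDLE: return DISC_BAD_HANDLE;
    case NRF_ERROR_INVALID_PARAM:       return DISC_BAD_REASON;
    default:                            return DISC_UNEXPECTED;
    }
}
```

Only the first case means the application should expect a disconnect event for this request, so logging the other cases separately helps rule them out.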

  • Are you checking the return code of the call to sd_ble_gap_disconnect(..), to make sure it returns NRF_SUCCESS?

    Yes, we do check the return status and it is returning NRF_SUCCESS even in those cases. Following is the code used for disconnection:

    BTStatus_t prvBTDisconnect( uint8_t ucAdapterIf,
                                const BTBdaddr_t * pxBdAddr,
                                uint16_t usConnId )
    {
        ret_code_t xErrCode;
    
        xErrCode = sd_ble_gap_disconnect( usConnId, BLE_HCI_REMOTE_USER_TERMINATED_CONNECTION );
        BT_NRF_PRINT_ERROR( sd_ble_gap_disconnect, xErrCode );
        return BTNRFError( xErrCode );
    }
    
    /* Logs an error when a SoftDevice call does not return NRF_SUCCESS. */
    #define BT_NRF_PRINT_ERROR( function, errorcode )                                                                                      \
        do                                                                                                                                 \
        {                                                                                                                                  \
            if( ( errorcode ) != NRF_SUCCESS )                                                                                             \
            {                                                                                                                              \
                NRF_LOG_ERROR( "Error, cannot execute " # function ", err_code: %d, %s\n", ( errorcode ), nrf_strerror_get( errorcode ) ); \
            }                                                                                                                              \
        } while( 0 )
    

    Can you please let me know which nRF device you are using, and which type/version of the SoftDevice and SDK you are using?

    We are using nrf52840 on a custom board.

    Softdevice: S140 Version 6.1.1 

    NRF SDK Version - 15.2.0

    Does the issue occur on a standard Nordic DK, on custom hardware, or on both?

    We have seen this on custom hardware, haven't tried on Nordic DK.

    Going through the logs, I also found that the connection remains active even after the disconnect request: connection parameter update events are still being exchanged after it.

    [22:13:00.544] <warning>: Timed out waiting for next network message.
    [22:13:00.544] <info>: Closing underlying network connection...
    # Here we invoke disconnect, we assume it happened successfully 
    # because we didn't see any error logs, but connection remains active
    [22:13:48.226] <info>: Received Connection Params: MIN_CI: 39, MAX_CI: 39, SL: 0, ST: 500
    [22:13:48.227] <info>: Received Connection Params: MIN_CI: 6, MAX_CI: 6, SL: 0, ST: 500
    [22:13:48.227] <info>: Received Connection Params: MIN_CI: 6, MAX_CI: 6, SL: 0, ST: 500
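
To read the connection parameters in the log above, it helps to convert them to time units. The sketch below assumes the log prints the raw fields of the SoftDevice's ble_gap_conn_params_t, where connection intervals are in units of 1.25 ms and the supervision timeout is in units of 10 ms. That assumption is not confirmed in the thread, and the helper names are hypothetical.

```c
#include <stdint.h>

#define CONN_INTERVAL_UNIT_US 1250u /* 1.25 ms per connection interval unit */
#define SUP_TIMEOUT_UNIT_MS   10u   /* 10 ms per supervision timeout unit */

/* Connection interval (MIN_CI / MAX_CI) in microseconds. */
uint32_t conn_interval_to_us(uint16_t units)
{
    return (uint32_t)units * CONN_INTERVAL_UNIT_US;
}

/* Supervision timeout (ST) in milliseconds. */
uint32_t sup_timeout_to_ms(uint16_t units)
{
    return (uint32_t)units * SUP_TIMEOUT_UNIT_MS;
}
```

Under that reading, MIN_CI/MAX_CI of 39 is 48.75 ms, 6 is 7.5 ms, and ST of 500 is a 5 s supervision timeout, so the 20 second application timeout is four times the supervision timeout.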
    

  • Hi 

    Thanks a lot for the additional information. I have shared the details with the stack team in order to get some input from them. 

    Is this a peripheral or a central link where you see the issue?

    Could the disconnect call happen at any time after the connection is established, or will the connection be running for a minimum amount of time before this call can occur?

    The fact that you receive updated connection parameters implies that you try to disconnect shortly after the connection is established, but I am not sure if this is really the case since there is a 48 second gap in the log. 

    There might be a limit on how fast you can disconnect after a connection is established, but I would need to confirm this with the stack developers. 

    And if there is any way to get a stack trace this would obviously be helpful. How often does the issue typically occur in your testing?

    Have you checked multiple boards to see if the issue is consistent across different hardware, or if it could somehow be hardware related?

    Best regards
    Torbjørn

  • Thanks Torbjorn for taking this forward with the stack team.

    Is this a peripheral or a central link where you see the issue?

    This is a peripheral device which receives incoming connections from a mobile device.

    Could the disconnect call happen at any time after the connection is established, or will the connection be running for a minimum amount of time before this call can occur?

    The device goes through a custom security handshake process for every new connection, with a timeout defined for each step in the process. The timeout for the first step is 10 seconds, so the device forces a disconnection if it doesn't receive the first handshake packet within 10 seconds of establishing a new connection. Subsequently, disconnection can happen at different stages with different timeouts.

    And if there is any way to get a stack trace this would obviously be helpful. How often does the issue typically occur in your testing?

    I have not been able to reproduce it so far in my test setup. We saw only 12 such cases in production in the month of June (with a rough reproducibility rate of 0.03%). 

    Have you checked multiple boards to see if the issue is consistent across different hardware, or if it could somehow be hardware related?

    I will check it once I am able to reproduce it.
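
Because each handshake step has its own timeout, one way to narrow this down is to record which step forced the disconnect and how old the connection was at that moment. The following is a minimal sketch under assumptions: all type and function names are hypothetical, only the 10 second first-step timeout comes from the thread, and the other table entries are placeholders.

```c
#include <stdbool.h>
#include <stdint.h>

#define HANDSHAKE_STEP_COUNT 3u /* placeholder: the real number of steps is not given */

static const uint32_t k_step_timeout_ms[HANDSHAKE_STEP_COUNT] = {
    10000u, /* first step: 10 s, as stated in the thread */
    5000u,  /* placeholder value, not from the thread */
    5000u   /* placeholder value, not from the thread */
};

typedef struct {
    uint32_t connected_at_ms; /* tick at connection establishment */
    uint32_t step_started_ms; /* tick when the current step began */
    uint8_t  step;            /* index of the step being waited on */
} handshake_ctx_t;

typedef struct {
    uint8_t  step;            /* step that timed out */
    uint32_t step_elapsed_ms; /* time spent waiting in that step */
    uint32_t conn_age_ms;     /* connection age at the forced disconnect */
} forced_disconnect_record_t;

void handshake_on_connected(handshake_ctx_t *ctx, uint32_t now_ms)
{
    ctx->connected_at_ms = now_ms;
    ctx->step_started_ms = now_ms;
    ctx->step = 0u;
}

void handshake_on_step_done(handshake_ctx_t *ctx, uint32_t now_ms)
{
    ctx->step++;
    ctx->step_started_ms = now_ms;
}

/* Returns true and fills *out when the current step has exceeded its timeout. */
bool handshake_check_timeout(const handshake_ctx_t *ctx, uint32_t now_ms,
                             forced_disconnect_record_t *out)
{
    if (ctx->step >= HANDSHAKE_STEP_COUNT) {
        return false; /* handshake complete, nothing to time out */
    }
    uint32_t elapsed = now_ms - ctx->step_started_ms;
    if (elapsed < k_step_timeout_ms[ctx->step]) {
        return false;
    }
    out->step = ctx->step;
    out->step_elapsed_ms = elapsed;
    out->conn_age_ms = now_ms - ctx->connected_at_ms;
    return true;
}
```

Storing the record before calling sd_ble_gap_disconnect, and reporting it when the application-side timeout later fires, would show whether the missing events cluster around a particular step or connection age.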

  • Hi 

    Thanks for sharing more details. 

    Are you able to identify at which of these handshake steps it disconnects, or how old the connection is at the time it happens?

    When you say a reproducibility rate of 0.03%, do you mean the error has occurred on 0.03% of the devices you have produced in that period?

    Is the issue discovered in production, or is it a field return from customers?

    Best regards
    Torbjørn
