This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question you can start a new discussion

The scan observer stops getting called after a failed attempt to connect to a peripheral using whitelist.

Hello All,

I have the following issue
- a peripheral and a central using nrf52832 with SDK 15.3 and S132
- on the peripheral I advertise with a whitelist
- the central scans with UUID filter activated for a specific UUID

Now when the central sees the peripheral, it tries to connect to it but since it is not in the centrals whitelist the connection is not established.
The problem on the central side is that the observer function nrf_ble_scan_on_ble_evt() stops getting called after the failed attempt to connect.

I reproduced this behavior using the ble_app_multirole_lesc example with some minor changes.
On the central I commented out the call to ble_advertising_start() so that it only scans.
On the peripheral I removed the scan_start() call so it only advertises and I activated the whitelist which has one entry.

I did put logging in nrf_ble_scan_on_adv_report() so that I can see whenever it gets called. When I run the central code, there is continuous log showing advertising packets from different devices. After a failed attempt to connect, I turn off the peripheral and there is no log output from nrf_ble_scan_on_adv_report() any more on the central.

This is how the log looks like on the central:
<info> ble_scan: B460775CF63C
<info> ble_scan: 775F1B8AEA41
<info> ble_scan: EE8BD6D1ABF4
<info> app: CENTRAL: Connected, handle: 0.
<info> app: CENTRAL: Searching for HRS on conn_handle 0x0
<info> peer_manager_handler: Connection security failed: role: Central, conn_handle: 0x0, procedure: Bonding, error: 4352
<warning> peer_manager_handler: Disconnecting conn_handle 0.
<info> app: CENTRAL: Disconnected, handle: 0, reason: 0x3E
<info> app: Scanning
<info> ble_scan: 44237CE1653F
<info> ble_scan: EE8BD6D1ABF4

Obviously this leads to a problem state where the central can not find any other devices to connect to. Stopping and starting the scan does not help. Only if the same peripheral appears again, the central starts calling nrf_ble_scan_on_adv_report again.

Do you know what could be causing this issue and how to resolve it? Does the SoftDevice get stuck somewhere?

Thanks,
Skara


Parents
  • Hi Skara

    Any chance you could provide a sniffer trace when this issue occurs?

    Assuming you have an nRF52DK to spare you can run that as a sniffer, using the nRF Sniffer application.

    Best regards
    Torbjørn 

  • Hi Torbjørn,

    sure, here is the capture of multiple connection attempts.
    connect_non_whitelisted_device.pcapng

  • Hi Skara

    Thanks for the added details. I have been able to reproduce the issue now. It seems the SoftDevice is somehow getting stuck when trying to connect to the device, but I haven't been able to find a way to solve it yet. 

    I will have to look more into this and get back to you later. 

    Best regards
    Torbjørn

  • Hi again

    Thanks a lot for the patience. After spending more time with this I believe I have tracked the issue down to the nrf_ble_scan_start() function, which is set up to ignore the NRF_ERROR_INVALID_STATE error when calling sd_ble_gap_scan_start(..) to restart the scan. 

    I made the following small modification to the nrf_ble_scan_start() function (to log when INVALID_STATE is returned), and notice that before the stack ends up in the 'stuck' state I see the "Are we scanning or not??" message printed. 

    ret_code_t nrf_ble_scan_start(nrf_ble_scan_t const * const p_scan_ctx)
    {
        VERIFY_PARAM_NOT_NULL(p_scan_ctx);
    
        ret_code_t err_code;
        scan_evt_t scan_evt;
    
        memset(&scan_evt, 0, sizeof(scan_evt));
    
        nrf_ble_scan_stop();
    
        // If the whitelist is used and the event handler is not NULL, send the whitelist request to the main application.
        if (is_whitelist_used(p_scan_ctx))
        {
            if (p_scan_ctx->evt_handler != NULL)
            {
                scan_evt.scan_evt_id = NRF_BLE_SCAN_EVT_WHITELIST_REQUEST;
                p_scan_ctx->evt_handler(&scan_evt);
            }
        }
    
        // Start the scanning.
        err_code = sd_ble_gap_scan_start(&p_scan_ctx->scan_params, &p_scan_ctx->scan_buffer);
    
        // It is okay to ignore this error, because the scan stopped earlier.
        if ((err_code != NRF_ERROR_INVALID_STATE) && (err_code != NRF_SUCCESS))
        {
            NRF_LOG_ERROR("sd_ble_gap_scan_start returned 0x%x", err_code);
            return (err_code);
        }
        else if(err_code == NRF_ERROR_INVALID_STATE)
        {
            NRF_LOG_INFO("Are we scanning or not????");
        }
        NRF_LOG_DEBUG("Scanning");
    
        return NRF_SUCCESS;
    }

    Could you try to make the same change on your end, make sure that logging is enabled for the scan module, and see if you also get this message when the problem occurs?

    Best regards
    Torbjørn

  • Hey,

    indeed I do see the "Are we scanning or not?" output when the device ends up in this state. Do you know why is the device entering this invalid state and how can I recover in that case?

    Detecting this state might be a bit of a problem in my application though. I do start the scanner every time I get  a BLE_GAP_EVT_CONNECTED or BLE_GAP_EVT_DISCONNECTED event. Since the stack is stuck and no events are sent out, the scanner will not be restarted.

    In my tests I use a timer that restarts the scanner upon timeout and that is when I see the "Are we scanning or not?" message.

    By the way I have the same behavior when using SDK12.3 on the central.


    Thank you again for your help.
    Skara

  • Hi Skara

    In my case I was running an app_timer that would try to start scanning at 5 second intervals. This would recover the stack after the issue occurred, but obviously this is not an ideal workaround. 

    It could be interesting to try and set a flag when the issue occurs, and check this flag in main. Once the flag is set you try to start scanning again from main context. 

    I will have to spend some more time with this, and will try to get you an update early next week. 

    Best regards
    Torbjørn

  • Hi Torbjørn,

    were you able to find anything new regarding this issue?

    Thanks
    Sakra

Reply Children
  • Hi Skara

    My deepest apologies. It has been some busy months here, and I forgot to follow up on your case. Thanks a lot for the reminder. 

    Debugging this further it seems the problem is related to the missed handling of the BLE_GAP_EVT_TIMEOUT event. 

    I tried to track if any BLE events were sent after the occurrence of the "Are we scanning or not" situation, and found that the only BLE event I could see was the BLE_GAP_EVT_TIMEOUT event. 
    This event signals that the connection attempt failed, but is not being picked up by any of the event handlers in the application, or by the scan module. 

    The scan module only responds to this event when the timeout source is BLE_GAP_TIMEOUT_SRC_SCAN, but in this case the source is BLE_GAP_TIMEOUT_SRC_CONN because it is a connection that times out. 

    In your main file I notice that there is some log output if the event is passed to the on_ble_central_evt(..) function, but since the role is set to BLE_GAP_ROLE_INVALID the event will not be forwarded here. 

    Could you try to start scanning manually if the BLE_GAP_EVT_TIMEOUT event occurs with source set to BLE_GAP_TIMEOUT_SRC_CONN and role set to BLE_GAP_ROLE_INVALID?

    I believe this should solve the issue. 

    Best regards
    Torbjørn

  • Hey Torbjørn,

    no worries :) Thank you for taking your time, to help solving my issue.

    I'm not sure I understand exactly what you mean.

    As I already mentioned, I restart the scanner on CONNECTED/DISCONNECTED events. In my case the application gets stuck somewhere after a connection attempt is being made in the nrf_ble_scan_connect_with_target function, where sd_ble_gap_connect is called and it returns success. There are no more events after that to which I can react.

    I do have NRF_SDH_BLE_LOG_ENABLED and the NRF_LOG_LEVEL set to Debug so I can see all the events coming to that module, but after that sd_ble_gap_connect where my app gets stuck, there are no more events.

    My application restarts the scanner manually after about a minute where I get the NRF_ERROR_INVALID_STATE response but still no events are logged.

    Where and when exactly should I be expecting/handling this BLE_GAP_EVT_TIMEOUT?

    Thank you!
    Skara

  • Hi Skara

    Could you try to add the following code to the on_ble_evt(..) function, inside the switch case?

    case BLE_GAP_EVT_TIMEOUT:
        // The issue occurs when the connection times out, in which case scanning hasn't been started when we think it has
        if (p_ble_evt->evt.gap_evt.params.timeout.src == BLE_GAP_TIMEOUT_SRC_CONN)
        {
            NRF_LOG_DEBUG("Connection Request timed out. Restart scanning...");
            nrf_ble_scan_start(&m_scan);
        }
        break;

    Best regards
    Torbjørn

  • Hi Torbjørn,

    I did place that code in the on_ble_evt function but it never gets called.

    As I've written previously, I am logging all events that come. I do log them in the ble_evt_handler function which is the observer for ble events. I also log them in the nrf_sdh_ble_evts_poll function. After the sd_ble_gap_connect that returns success there are no more events coming.

    In the picture above you can see after I start the scanning device it finds lots of ble devices which do not match the filter. Than I turn on the second board which is advertising only and right after I see the 'Connection status: 0' log, I unplug the advertising device. At that point the scanning device is already hanging. There are no more logs coming until I start the advertising device again.

    Cheers,
    Skara

  • Hi 

    Are you able to share your latest version of your code with me, so I can test it out here?

    I am basing my test on the code you sent earlier, and with the changes I made I am unable to get it to lock up. 

    Best regards
    Torbjørn

Related