Beware that this post is related to an SDK in maintenance mode
More Info: Consider nRF Connect SDK for new designs

Out of range catastrophe

Hello everyone!

This thread is similar to Indoor BLE Range Improvements , as same system is used. There we discuss about S140 Soft Device scheduling conflict and here we want to talk about problem caused by device out of range. 

Background 

We are in development phase for indoor household appliances with many wireless connected entities. There is a requirement to support following topology: 8 battery powered devices and single mains powered device. As the target environment is indoor and low rate of data exchange (low bandwidth) is needed, we agree to test out BLE5 LE CODED PHY (S=8, 125kbit/s) aka. LongRange, using Minew MS88SF3 module that features nRF52840 chipset.

So, we prepare a simple mock-up test application to check out how the system will behave in real-life scenarios. Test application was very simple, central BLE device scans continuously and stops after all 8 connections are established. Peripheral BLE devices advertise when not in connection. With such approach system shall always converge to have all devices connected, as if one device gets disconnected advertising/scanning shall re-started and the drop connections should re-establish. Every second 64 bytes of dummy data were exchange between central and peripheral device based on server-initiate update (notification type). Tx power for advertising and connection was set to 8dBm. Connection interval was set to 1500ms, SlaveLatency to 2 and Supervision Timeout to 15000ms. Well, it works perfectly on a desk! 

BLE settings

Platform description:

  • IC:               nRF52840
  • Module:       Minew MS88SF3
  • SDK:            nRF5_SDK_17.1.0_ddde560
  • Softdevice:  s140_nrf52_7.2.0 
  • IDE:             SEGGER Embedded Studio for ARM Release 7.10a Build 2022121504.52072
  • OS:              Windows 10

"Out of range" problem

We done a couple of test and found out that:

  • If we take Device A (one of the 8 peripheral devices, all in connection) out of range, it disconnects. If then moved back to range, it connects back without any problems - NORMAL EXPECTED OPERATION,
  • If we take Device A (one of the 8 peripheral devices, all in connection) out of range, it disconnects. If we then move Device B (one of the 8 peripheral devices, all beside Device A in connection) out of range, it disconnects. Then Device B does not re-connect when moved back to range. Only after moving Device A back to range, both Devices A & B gets re-connected. It is like Device A is blocking Device B from re-connectiong, regardless Device B is in range! - ABNORMAL OPERATION,
  • We observe that only first disconnected device (due to out of range) blocks other from re-connecting. Moving first disconnected device back to range triggers all other devices to re-connect,
  • We suspect that Device A (goes out of range first) block Device B in context of advertisement, as both devices are peripheral. But that is not the case as we done separate tests to eliminate that possibility, where the first disconnected device did not start advertising at disconnection. Same effect was observed, other devices did not re-connect and thus conclude that advertisement do not play role in that effect, meaning that the problem lives on Central device.
  • If we disable data transmission on Central Device and Device A and repeat point 2., Device B gets re-connected when moved back to range. In that test case Device A did not blocked Device B from re-connection.

We done a couple of tests addressing "blocking problem" and there was a consistent outcome. Following picture shows the above described problem on real test mock-up system with 9 peripheral devices. For that test we disable data transmiting for Central device, Dev#6 and Dev#16. During that test following events takes place:

1. Dev#14 lost connection (not on purpose, might cause moving people around it, closing doors) and was automatically reconnected back - Not expected, but NO PROBLEM!
2. Dev#6 lost connection on purpose, to test that central device is working OK. Reconnected OK!
3. Dev#16 was moved out of range and gets disconnected - OK, expected!
4. Dev#6 remove battery to test if will reconnect on putting battery back - OK, RECONNECTS!
5. Repeate point 4. - RECONNECTS!
6. Repeate point 4. - RECONNECTS! --> Consistent reconnection-OK!
7. Moving Dev#16 back to range and device reconnects! It disconnects and reconnect 2x due to moving the device! - OK, expected!
8. Moving Dev#15 out of range. That device do not have tx disabled. Device disconnects. - OK, expected!
9. Repate point 4. - DOESN'T RECONNECTS! ABNORMAL BEHAVIOUR!
10. Moving Dev#15 back to range. It connects back! - OK, expected for Dev#15 to re-connect!
11. Dev#6 connect back right after Dev#15 reconnects! STRANGE BEHAVIOUR, as it Dev#15 blocked Dev#6 from re-connecting!
12. Dev#15 lost connection (not on purpose, might cause moving people around it, closing doors) and was automatically reconnected back. - Not expected, but NO PROBLEM!

All events are shown on the picture:

Therefore, following questions arise:

  1. Why does the Device A block Device B from re-connection as described at point 2. (As said, we think it is a Central Device issue)? What is the rational explanation for that?
  2. Why is there a different behaviour between point 2. & 3.? As transmission is the only difference it must be the source of problems, or?!
  3. How can we mitigate that "blocking problem", where Device A blocks Device B from re-connection within a valid range?
  4. Do you receive any similar reported problems? If so, how do they solve it?

Thank you for all the help!

BR, Žiga

 

  • Can you try my suggestion on removing the whitelist implementation on your scanning device so we could narrow down whether the issue is in fact due to the whitelist implementation or not? Then you can see whether the scanner is able to find device B in the same scenario or not. 

    It is the only thing we can think of that would cause behavior like this, but I'm not able to spot what's wrong reviewing your project as there are a lot of files and lines of code...

    Best regards,

    Simon

  • Can you try my suggestion on removing the whitelist implementation on your scanning device so we could narrow down whether the issue is in fact due to the whitelist implementation or not?

    As already said:

    There is no filtering/whitelist problem at all. It is simple the central device stops to scan. The advertising report callback from Soft Device is not being invoked anymore. With the fact that peripheral device advertise correct packets with expected magic number for filtering, as inspected by nRF Sniffer + Wireshark.

    Problem is more fundamental and it lies one step before even reaching the whitelist logic. As stated the advertising report callback does not come any more, even though sniffing shows that peripheral is advertising. 

    Any idea why advertising report callback stops to trigger?

    And again, when moving DevA back in range, everything goes to normal. Adv Reports start to come and other devices connects as expected. 

    It is the only thing we can think of that would cause behavior like this, but I'm not able to spot what's wrong reviewing your project as there are a lot of files and lines of code...

    I understand that...

    BR, Žiga

  • Hi Žiga

    We just need you to try without the filtering, as that's what we deem is most likely causing the scanner not to look for device B. Our theory is that when device A goes out of range, the scanner is specifically scanning for the address/ID of device A, and won't care about any other devices until device A has reconnected (or a power reset is triggered for example).

    Best regards,

    Simon

  • Hello Simonr,

    sorry for late response. I will test your suggestions and will come back with results in a couple of days.

    Thank you for all the help!

    BR, Žiga

  • Hello Simonr,

    I've done as you recommended, so removed all the logic regarding whitelist, see the advertisement report event handler:

    ////////////////////////////////////////////////////////////////////////////////
    /**
    *		BLE on advertisement report event handler 
    *   
    * @note     This function is being executed in main BLE stack callback!
    *
    * @param[in] 	p_ble_evt   - Pointer to BLE event informations
    * @return 		void
    */
    ////////////////////////////////////////////////////////////////////////////////
    static inline void ble_c_evt_on_adv_report(ble_evt_t const * p_ble_evt)
    {
        static  uint8_t             man_data[32]    = { 0 };
                uint16_t            man_data_len    = 0;
                 ble_c_obs_data_t   observer_data   = { 0 };
    
        // Get advertisment info
        const ble_gap_evt_adv_report_t * const p_adv_report = &( p_ble_evt->evt.gap_evt.params.adv_report );
    
        // Get advertisement type
        const ble_gap_adv_report_type_t * const p_adv_type = &( p_adv_report->type );
    
        // Get advertisement data and lenght
        const uint8_t * p_adv_data      = p_adv_report->data.p_data;
        const uint16_t  adv_data_len    = p_adv_report->data.len;
    
        // Advertisment status OK
        if ( BLE_GAP_ADV_DATA_STATUS_COMPLETE == p_adv_type->status )
        {
            // Get manufacturer data
            if ( eBLE_C_OK == ble_c_get_manufacturer_data( p_adv_data, adv_data_len, (uint8_t*) &man_data, &man_data_len ))
            {
                // Advertisment request for connection
                if ( 1U == p_adv_type->connectable )
                {
                    BLE_C_DBG_PRINT( "Connectable advertising report from %02X:%02X:%02X:%02X:%02X:%02X", p_adv_report->peer_addr.addr[5], p_adv_report->peer_addr.addr[4], p_adv_report->peer_addr.addr[3],
                                                                                                          p_adv_report->peer_addr.addr[2], p_adv_report->peer_addr.addr[1], p_adv_report->peer_addr.addr[0] );
    
                    // Is this device already connected
                    //if ( false == ble_c_check_dev_conn( p_adv_report->peer_addr.addr ))
                    {                      
                        // Check for magic request
                        //if ( true == ble_c_check_magic_request( man_data, man_data_len ))
                        {
                            BLE_C_DBG_PRINT( "Connection initiated..." );
    
                            // Continue scanning
                            (void) sd_ble_gap_scan_start( NULL, &g_ble_c.scan_data );
    
                            // Stop scanning before connection
                            (void) sd_ble_gap_scan_stop();
    
                            // Connect to peer device
                            if ( NRF_SUCCESS != sd_ble_gap_connect( &p_adv_report->peer_addr, &g_ble_c.scan_params, &g_ble_c.conn_params, BLE_C_CONN_CFG_TAG ))
                            {
                                BLE_C_DBG_PRINT( "Attept to connect failed!" );
                            }
                        }
                    }
                }
    
                // Advertisement only broadcast
                else
                {
                    BLE_C_DBG_PRINT( "Broadcasting advertising report!" );
    
                    // Assemble observer data packet
                    observer_data.rssi = p_adv_report->rssi;
                    observer_data.size = man_data_len;
                    memcpy( &observer_data.data, &man_data, man_data_len );
                    memcpy( &observer_data.mac, p_adv_report->peer_addr.addr, sizeof( observer_data.mac ));
                
                    // Put data to observer data
                    if ( eRING_BUFFER_OK != ring_buffer_add( g_ble_c.observer_buf, (ble_c_obs_data_t*) &observer_data ))
                    {
                        // Buffer overflow!
                        BLE_C_DBG_PRINT( "Observer buffer overflow! Increse buffer size via \"BLE_C_OBSERVER_BUF_SIZE\" macro!");
                    }
                }
            }
        }
    
        // Continue scanning
        (void) sd_ble_gap_scan_start( NULL, &g_ble_c.scan_data );
    }

    After two days of testing the results are the same. Dev B still do not re-connect as DevA is blocking it for some jet unknown reason. As you can se from the code, I added additional debug message before calling "sd_ble_gap_connect" in order to confirm whitelist logic is disable correctly.


    Furthermore I have new informations about the problem that might be useful for your team to further investigate/analyse the problem.  

    I observe that connectable advertising report is received and after that connection is not established right away. That stands only for peripheral devices that are not close to the central device (located at -85/-95 dBm). In some cases, there might be a time difference for couple of minute (worst case I measure was ~4min). During that time, other devices that are close to central do not get connected, are ignored (they are advertising, confirmed with nRF Sniffer). 

    I provided a logged file, where you can see the described scenario. Focus on two devices with the same naming convention and roles as we have had used in the post above:

    • DevA, MAC:E4:8E:30:81:78:D9, device that blocks DevB, located away from central
    • DevB, MAC: CE:AA:75:A4:1C:F1, device close to central, but blocked to connect

    Look at the line between 61 and 88 inside attached debug log file  from central device - "OutOfRange_Central_DebugLog__30_06_2023.txt":

    1. Line 61, Time 07:28:41.313: BLE_C: Connectable advertising report from E4:8E:30:81:78:D9
    2. Line 70, Time 07:32:37.449: New connection established! Link count: 4 (with E4:8E:30:81:78:D9 device)

    (Blocked device with MAC CE:AA:75:A4:1C:F1 was advertising during that time. That device is close to central.)

    3. Line 78, 07:32:37.638: BLE_C: Connectable advertising report from CE:AA:75:A4:1C:F1

    4. Line 81, 07:32:38.737: BLE_C: New connection established! Link count: 5 (with CE:AA:75:A4:1C:F1) 

    Look at the times! Between connectable adv report of DevA and actual connection establishement took 4min. After DevA is connected, DevB gets connected immediately. 

    Questions

    1. Why does it take more time for distand device to established connection?
    2. Why is this connection est. delay only there for distand devices? 
    3. Is such behaviour expected and compliant with SIG BLE standard?
    4. Can this be mitigated by any way by application SW?

    Debug log file:

    (30.06.2023  07:28:08.535)            Central Mockup
    (30.06.2023  07:28:08.546)    ********************************************************
    (30.06.2023  07:28:08.557)     Software Version 0.0.4.0 (Builted on May 24 2023 11:35:03)
    (30.06.2023  07:28:08.562)     Hardware Version 1.9.0.0
    (30.06.2023  07:28:08.563)     
    (30.06.2023  07:28:08.570)     Enter 'help' to display supported commands
    (30.06.2023  07:28:08.577)    ********************************************************
    (30.06.2023  07:28:08.581)    Ready to take orders...
    (30.06.2023  07:28:08.852)    BLE_C: Change default connection parameters!
    (30.06.2023  07:28:08.862)    WARNING: Scanning disabled! Enable via "BLE Scan Control" parameter.
    (30.06.2023  07:28:16.560)    APP: PAIR button pressed!
    (30.06.2023  07:28:16.669)    BLE_C: Scanning started!
    (30.06.2023  07:28:16.676)    BLE_C: Connectable advertising report from F6:7C:55:2B:92:12
    (30.06.2023  07:28:17.973)    APP: PAIR button releassed!
    (30.06.2023  07:28:31.868)    BLE_C: Scanning started!
    (30.06.2023  07:28:31.882)    BLE_C: ------------------------------------------------------------
    (30.06.2023  07:28:31.891)    BLE_C: New connection established! Link count: 1
    (30.06.2023  07:28:31.898)    BLE_C:    Connection handle: 0
    (30.06.2023  07:28:31.905)    BLE_C:                 Role: CENTRAL
    (30.06.2023  07:28:31.915)    BLE_C:             Peer MAC: F6:7C:55:2B:92:12
    (30.06.2023  07:28:31.922)    BLE_C:  Connection interval: 1000 ms
    (30.06.2023  07:28:31.929)    BLE_C:        Slave latency: 2
    (30.06.2023  07:28:31.938)    BLE_C:   Supervisor Timeout: 15000 ms
    (30.06.2023  07:28:31.950)    BLE_C: ------------------------------------------------------------
    (30.06.2023  07:28:31.960)    BLE_C: Connectable advertising report from CE:AA:75:A4:1C:F1
    (30.06.2023  07:28:32.191)    BLE_C: Scanning started!
    (30.06.2023  07:28:32.211)    BLE_C: ------------------------------------------------------------
    (30.06.2023  07:28:32.225)    BLE_C: New connection established! Link count: 2
    (30.06.2023  07:28:32.235)    BLE_C:    Connection handle: 1
    (30.06.2023  07:28:32.244)    BLE_C:                 Role: CENTRAL
    (30.06.2023  07:28:32.259)    BLE_C:             Peer MAC: CE:AA:75:A4:1C:F1
    (30.06.2023  07:28:32.269)    BLE_C:  Connection interval: 1000 ms
    (30.06.2023  07:28:32.278)    BLE_C:        Slave latency: 2
    (30.06.2023  07:28:32.288)    BLE_C:   Supervisor Timeout: 15000 ms
    (30.06.2023  07:28:32.304)    BLE_C: ------------------------------------------------------------
    (30.06.2023  07:28:33.090)    BLE_C: Connectable advertising report from E2:88:C0:EE:68:12
    (30.06.2023  07:28:33.306)    BLE_C: Scanning started!
    (30.06.2023  07:28:33.323)    BLE_C: ------------------------------------------------------------
    (30.06.2023  07:28:33.337)    BLE_C: New connection established! Link count: 3
    (30.06.2023  07:28:33.345)    BLE_C:    Connection handle: 2
    (30.06.2023  07:28:33.355)    BLE_C:                 Role: CENTRAL
    (30.06.2023  07:28:33.366)    BLE_C:             Peer MAC: E2:88:C0:EE:68:12
    (30.06.2023  07:28:33.374)    BLE_C:  Connection interval: 1000 ms
    (30.06.2023  07:28:33.381)    BLE_C:        Slave latency: 2
    (30.06.2023  07:28:33.388)    BLE_C:   Supervisor Timeout: 15000 ms
    (30.06.2023  07:28:33.399)    BLE_C: ------------------------------------------------------------
    (30.06.2023  07:28:33.842)    BLE_C: ATT MTU updated to 250 bytes on connection handle: 0 (response).
    (30.06.2023  07:28:33.944)    BLE_C: ATT MTU updated to 250 bytes on connection handle: 1 (response).
    (30.06.2023  07:28:34.265)    BLE_C: Connectable advertising report from C8:B3:A5:1E:3B:30
    (30.06.2023  07:28:35.122)    BLE_C: ATT MTU updated to 250 bytes on connection handle: 2 (response).
    (30.06.2023  07:28:39.459)    BLE_C: Scanning started!
    (30.06.2023  07:28:39.474)    BLE_C: ------------------------------------------------------------
    (30.06.2023  07:28:39.485)    BLE_C: New connection established! Link count: 4
    (30.06.2023  07:28:39.493)    BLE_C:    Connection handle: 3
    (30.06.2023  07:28:39.502)    BLE_C:                 Role: CENTRAL
    (30.06.2023  07:28:39.512)    BLE_C:             Peer MAC: C8:B3:A5:1E:3B:30
    (30.06.2023  07:28:39.521)    BLE_C:  Connection interval: 1000 ms
    (30.06.2023  07:28:39.529)    BLE_C:        Slave latency: 2
    (30.06.2023  07:28:39.538)    BLE_C:   Supervisor Timeout: 15000 ms
    (30.06.2023  07:28:39.549)    BLE_C: ------------------------------------------------------------
    (30.06.2023  07:28:41.313)    BLE_C: Connectable advertising report from E4:8E:30:81:78:D9
    (30.06.2023  07:28:45.175)    BLE_C: ATT MTU updated to 250 bytes on connection handle: 3 (response).
    (30.06.2023  07:30:22.997)    BLE_C: ------------------------------------------------------------
    (30.06.2023  07:30:23.005)    BLE_C: Connection lost! Link count: 3
    (30.06.2023  07:30:23.012)    BLE_C:   Connection handle: 1
    (30.06.2023  07:30:23.023)    BLE_C: ------------------------------------------------------------
    (30.06.2023  07:30:23.028)    BLE_C: Scanning started!
    (30.06.2023  07:32:37.424)    BLE_C: Scanning started!
    (30.06.2023  07:32:37.439)    BLE_C: ------------------------------------------------------------
    (30.06.2023  07:32:37.449)    BLE_C: New connection established! Link count: 4
    (30.06.2023  07:32:37.456)    BLE_C:    Connection handle: 4
    (30.06.2023  07:32:37.464)    BLE_C:                 Role: CENTRAL
    (30.06.2023  07:32:37.475)    BLE_C:             Peer MAC: E4:8E:30:81:78:D9
    (30.06.2023  07:32:37.483)    BLE_C:  Connection interval: 1000 ms
    (30.06.2023  07:32:37.492)    BLE_C:        Slave latency: 2
    (30.06.2023  07:32:37.497)    BLE_C:   Supervisor Timeout: 15000 ms
    (30.06.2023  07:32:37.509)    BLE_C: ------------------------------------------------------------
    (30.06.2023  07:32:37.638)    BLE_C: Connectable advertising report from CE:AA:75:A4:1C:F1
    (30.06.2023  07:32:38.609)    BLE_C: Scanning started!
    (30.06.2023  07:32:38.726)    BLE_C: ------------------------------------------------------------
    (30.06.2023  07:32:38.737)    BLE_C: New connection established! Link count: 5
    (30.06.2023  07:32:38.746)    BLE_C:    Connection handle: 1
    (30.06.2023  07:32:38.756)    BLE_C:                 Role: CENTRAL
    (30.06.2023  07:32:38.769)    BLE_C:             Peer MAC: CE:AA:75:A4:1C:F1
    (30.06.2023  07:32:38.780)    BLE_C:  Connection interval: 1000 ms
    (30.06.2023  07:32:38.792)    BLE_C:        Slave latency: 2
    (30.06.2023  07:32:38.800)    BLE_C:   Supervisor Timeout: 15000 ms
    (30.06.2023  07:32:38.816)    BLE_C: ------------------------------------------------------------
    (30.06.2023  07:32:39.364)    BLE_C: ATT MTU updated to 250 bytes on connection handle: 4 (response).
    (30.06.2023  07:32:39.583)    BLE_C: Connectable advertising report from D6:E7:74:FA:FE:0F
    (30.06.2023  07:32:39.915)    BLE_C: ATT MTU updated to 250 bytes on connection handle: 1 (response).
    (30.06.2023  07:32:40.676)    BLE_C: Scanning started!
    (30.06.2023  07:32:40.690)    BLE_C: ------------------------------------------------------------
    (30.06.2023  07:32:40.700)    BLE_C: New connection established! Link count: 6
    (30.06.2023  07:32:40.708)    BLE_C:    Connection handle: 5
    (30.06.2023  07:32:40.715)    BLE_C:                 Role: CENTRAL
    (30.06.2023  07:32:40.727)    BLE_C:             Peer MAC: D6:E7:74:FA:FE:0F
    (30.06.2023  07:32:40.734)    BLE_C:  Connection interval: 1000 ms
    (30.06.2023  07:32:40.742)    BLE_C:        Slave latency: 2
    (30.06.2023  07:32:40.748)    BLE_C:   Supervisor Timeout: 15000 ms
    (30.06.2023  07:32:40.759)    BLE_C: ------------------------------------------------------------
    (30.06.2023  07:32:42.417)    BLE_C: ATT MTU updated to 250 bytes on connection handle: 5 (response).
    (30.06.2023  07:32:42.633)    BLE_C: Connectable advertising report from DD:5F:43:78:58:0E
    (30.06.2023  07:32:46.866)    BLE_C: Scanning started!
    (30.06.2023  07:32:46.880)    BLE_C: ------------------------------------------------------------
    (30.06.2023  07:32:46.890)    BLE_C: New connection established! Link count: 7
    (30.06.2023  07:32:46.898)    BLE_C:    Connection handle: 6
    (30.06.2023  07:32:46.906)    BLE_C:                 Role: CENTRAL
    (30.06.2023  07:32:46.917)    BLE_C:             Peer MAC: DD:5F:43:78:58:0E
    (30.06.2023  07:32:46.925)    BLE_C:  Connection interval: 1000 ms
    (30.06.2023  07:32:46.934)    BLE_C:        Slave latency: 2
    (30.06.2023  07:32:46.941)    BLE_C:   Supervisor Timeout: 15000 ms
    (30.06.2023  07:32:46.954)    BLE_C: ------------------------------------------------------------
    (30.06.2023  07:32:55.542)    BLE_C: ATT MTU updated to 250 bytes on connection handle: 6 (response).
    (30.06.2023  07:34:04.929)    BLE_C: Connectable advertising report from FC:B4:D7:B8:28:4B
    (30.06.2023  07:42:14.868)    BLE_C: ------------------------------------------------------------
    (30.06.2023  07:42:14.878)    BLE_C: Connection lost! Link count: 6
    (30.06.2023  07:42:14.883)    BLE_C:   Connection handle: 0
    (30.06.2023  07:42:14.894)    BLE_C: ------------------------------------------------------------
    (30.06.2023  07:42:14.898)    BLE_C: Scanning started!

    I hope these additional observations might spur some idea within your team. Looking forward to your answer!

    Thank you again for all the help!

    BR, Žiga

Related