
BLE_HCI_INSTANT_PASSED disconnection after LL_CHANNEL_MAP_REQ during flash erase

We're seeing an issue where the soft device stops responding and ultimately disconnects with the reason BLE_HCI_INSTANT_PASSED. The disconnection only seems to happen when an LL_CHANNEL_MAP_REQ message is received while the Nordic is performing a flash erase. The Wireshark capture from a sniffer shows the master resending the LL_CHANNEL_MAP_REQ, but the slave doesn't send a response until after the instant has passed, triggering the BLE_HCI_INSTANT_PASSED disconnection.

During the time when the flash erase is happening, we've captured instances with and without the channel map update request.  Without the request, the device also stops responding, but eventually recovers:

If the slave receives an LL_CHANNEL_MAP_REQ message during the erase, it misses the instant and disconnects:

We're using Soft Device S140 Version 6.1.1.

Is there any reason why the slave would stop responding during the flash erase if we're using the soft device flash API?  What determines how far into the future the instant is calculated during a channel map update?  Is there any other explanation for this behavior?

Thanks!

  • I wasn't aware of this, actually. Did you try it out?

     

    BretH said:
    Due to the guaranteed delivery feature of the LL, this seems to be a hole in the BLE spec, or is there a possibility of the softdevice to recover without disconnecting?

    It is not possible to avoid the disconnect, unfortunately. The central will also consider the link lost, because this packet is not replied to. I think the central decides what instant the LL_CHANNEL_MAP_REQ will apply from. If it says 6 connection intervals, then in my view, this is the "bad behavior", since this is far shorter than the connection timeout itself (but not illegal as far as I know).

  • I increased the connection interval to 22.5 ms and ran a stress test we've been using to detect this issue. It performed 50 iterations in a row without a single failure. Previously the failure rate was about 1 in 10. Unfortunately, increasing the connection interval isn't an option for this project for reasons that aren't related to this issue, but the results of this test support our understanding of the failure mode.

    I wasn't aware of this, actually. Did you try it out?

    I think you mean try the partial erase?  That happens in the soft device, so I don't think I can modify it.

    I noticed there is a bug fix in the release notes of the next version of the Soft Device (7.0.1, we're using 6.1.1) that seems somewhat related, but doesn't explicitly apply to the nRF52840:

    The wording makes it sound like the bug is just scheduling more time than is needed and not that it actually causes any connection issues. Can you comment on this? Again, we're too far along in our development process to switch soft device versions for this project, but it would be good to know for the future.

  • I think the central decides what instant the LL_CHANNEL_MAP_REQ will apply from. If it says 6 connection intervals, then in my view, this is the "bad behavior", since this is far shorter than the connection timeout itself (but not illegal as far as I know).

    I reviewed a handful of BLE sniffer trace logs that involved masters of Android and iOS phones as well as Windows desktops, and almost all used instants that were 6-8 connection events later. It seems this is typical behavior.

    The central will also consider the link lost, because this packet is not replied to.

    Does this mean that although BLE is usually robust with retries, the channel map update or connection param update instant must have a round trip packet exchange in the specific connection event? If the slave fails to receive the packet or the master fails to receive the response, a connection will always disconnect? It seems like in general, there would be more frequent disconnections occurring especially if these update procedures happen when devices are at range or in a negative RF environment where packet retries would occur regularly.

  • BretH said:
    Does this mean that although BLE is usually robust with retries, the channel map update or connection param update instant must have a round trip packet exchange in the specific connection event?

     Yes. According to the BLE specification, the central decides on connection parameters. Since this "request" is effective from a specific connection event, if the peripheral fails to reply/ACK this packet before this event, the devices shall disconnect. 

    I tried to find the section in the spec that says so, without luck, but it is also mentioned here:
    https://stackoverflow.com/questions/48447645/android-ble-peripheral-disconnects-with-status-code-ble-hci-instant-passed0x28
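
As a rough illustration of the rule described above: as far as I understand the Link Layer part of the Core Specification, the peripheral compares the instant in the request against its own 16-bit connection event counter using modulo 65536 arithmetic, and treats a difference of 32767 or more as "instant already in the past". The sketch below is only my reading of that rule, not SoftDevice code, and the function name is made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a SoftDevice API): returns true when the
 * instant carried in an LL_CHANNEL_MAP_REQ / connection update is
 * already in the past relative to the local connection event counter.
 *
 * Both values are 16-bit counters that wrap, so the comparison is done
 * modulo 65536. A difference of 32767 or more means the instant is
 * behind us, which is the case reported as BLE_HCI_INSTANT_PASSED (0x28).
 */
static bool instant_has_passed(uint16_t instant, uint16_t conn_event_count)
{
    /* Casting back to uint16_t gives the modulo-65536 difference. */
    uint16_t diff = (uint16_t)(instant - conn_event_count);

    return diff >= 32767u;
}
```

For example, a request with instant 106 first heard at event 100 is still in the future, while the same request first heard at event 107 is already too late. Retransmissions help only as long as one of them gets through before the counter reaches the instant.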

    What surprises me is that the softdevice performs the erase page operation even though it is in a connection with a connection interval that would suggest it doesn't have time for it. Is it the peer manager that performs this erase page, or do you do it manually? Is there some way for me to reproduce this? I thought that the softdevice or peer manager would see that there is no time to do this at this point, and save the operation for later. Can you please describe where the erase page call is coming from?

    BR,

    Edvin

  • Can you please describe where the erase page call is coming from?

    The erase is made through a call to nrf_fstorage_erase() with the nrf_fstorage_sd backend. I assume this is easy to reproduce.

    With a connection interval of 15 ms, the soft device will be unable to schedule the erase until a disconnection, so perhaps it goes ahead and performs the operation immediately.

    Regardless, we now understand the expected behavior. We will resolve our issue by performing a thorough flash erase within bank 1 of flash during MCU init rather than doing the page erases just-in-time during a BLE connection.
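
For anyone landing here later, a back-of-the-envelope sketch of the timing discussed in this thread. It assumes the radio is blocked for the whole page erase, starting just before the request goes out, and that the peripheral must hear at least one connection event before the instant. The 85 ms erase time used in the example is a placeholder I picked for illustration; check the product specification for the real figure. All function names are invented for the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Connection interval is given in 1.25 ms units, as in the BLE spec. */
static uint32_t conn_interval_us(uint16_t interval_units)
{
    return (uint32_t)interval_units * 1250u;
}

/* Time from the request to the instant when it is N events ahead. */
static uint32_t time_to_instant_us(uint16_t interval_units, uint16_t events_ahead)
{
    return conn_interval_us(interval_units) * events_ahead;
}

/* Number of connection events that fall inside a blocking erase
 * (rounded up, i.e. a partially covered event counts as missed). */
static uint32_t events_blocked(uint16_t interval_units, uint32_t erase_us)
{
    uint32_t interval_us = conn_interval_us(interval_units);

    return (erase_us + interval_us - 1u) / interval_us;
}

/* Simplified model: the link survives only if at least one event
 * before the instant is not covered by the erase. */
static bool erase_leaves_margin(uint16_t interval_units,
                                uint16_t events_ahead,
                                uint32_t erase_us)
{
    return events_blocked(interval_units, erase_us) < events_ahead;
}
```

With an instant 6 events ahead and the assumed 85 ms erase, a 15 ms interval (12 units) gives a 90 ms window and 6 blocked events, so no margin is left. A 22.5 ms interval (18 units) gives 135 ms and only 4 blocked events. That is at least consistent with the stress test results above, though the real scheduling inside the soft device is certainly more involved than this model.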
