
BLE connectivity state machine error

While investigating problems I'm experiencing with bootloader_secure, I believe I've identified an error in the HCI TX finite state machine (function hci_tx_fsm_event_process in components/serialization/common/transport/ser_phy/ser_phy_hci.c). The relevant code is unchanged from SDK 11 through SDK 15.

I've observed the following sequence after allocating a transmit buffer for a BLE event:

  • SEND transmits, enters WAIT_FOR_FIRST_TX_END
  • WAIT_FOR_FIRST_TX_END gets SLIP/SENT, sets timeout, enters WAIT_FOR_ACK
  • WAIT_FOR_ACK times out, transmits, enters WAIT_FOR_ACK_OR_TX_END
  • WAIT_FOR_ACK_OR_TX_END times out

However, unlike the processing for WAIT_FOR_ACK, the processing for WAIT_FOR_ACK_OR_TX_END does not detect and handle the timeout event. The effect is that the FSM stays in this state, with no expected event that would move it out. In my experience with DFU, this causes nrfutil to exit with an NRF_ERROR_INTERNAL synthesized by pc-ble-driver, because the failure to resolve the transmit deadlocks the connectivity application.

Adding processing for HCI_TIMER_EVT in WAIT_FOR_ACK_OR_TX_END at least invokes the error callback, allowing the connectivity application to regain control, though this doesn't solve the underlying problem.
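
To make the sequence above concrete, here is a minimal C sketch of a simplified model of the transmit FSM. It is not the SDK source: the names tx_fsm_t, tx_fsm_init, tx_fsm_step, tx_fsm_run_observed_sequence and the TX_STATE_*/TX_EVT_* enums are hypothetical, the ACK handling is reduced to a single transition, and the recovery action (flag an error and return to SEND) is only an assumption about what invoking the error callback might amount to. The flag handle_timer_in_retx_state switches between the behaviour described above and the workaround.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical model of the TX states named in the post. */
typedef enum {
    TX_STATE_SEND,
    TX_STATE_WAIT_FOR_FIRST_TX_END,
    TX_STATE_WAIT_FOR_ACK,
    TX_STATE_WAIT_FOR_ACK_OR_TX_END
} tx_state_t;

/* Hypothetical event set; TX_EVT_TIMER stands in for HCI_TIMER_EVT. */
typedef enum {
    TX_EVT_REQUEST,   /* a packet is ready to transmit */
    TX_EVT_SLIP_SENT, /* SLIP layer finished sending   */
    TX_EVT_ACK,       /* peer acknowledged the packet  */
    TX_EVT_TIMER      /* ACK timeout expired           */
} tx_event_t;

typedef struct {
    tx_state_t state;
    bool       handle_timer_in_retx_state; /* false: behaviour described above */
    bool       error_reported;             /* stands in for the error callback */
} tx_fsm_t;

void tx_fsm_init(tx_fsm_t *fsm, bool handle_timer_in_retx_state)
{
    assert(fsm != NULL);
    fsm->state = TX_STATE_SEND;
    fsm->handle_timer_in_retx_state = handle_timer_in_retx_state;
    fsm->error_reported = false;
}

void tx_fsm_step(tx_fsm_t *fsm, tx_event_t evt)
{
    assert(fsm != NULL);
    switch (fsm->state) {
    case TX_STATE_SEND:
        if (evt == TX_EVT_REQUEST) {
            fsm->state = TX_STATE_WAIT_FOR_FIRST_TX_END; /* transmit */
        }
        break;
    case TX_STATE_WAIT_FOR_FIRST_TX_END:
        if (evt == TX_EVT_SLIP_SENT) {
            fsm->state = TX_STATE_WAIT_FOR_ACK; /* timeout armed here */
        }
        break;
    case TX_STATE_WAIT_FOR_ACK:
        if (evt == TX_EVT_ACK) {
            fsm->state = TX_STATE_SEND;
        } else if (evt == TX_EVT_TIMER) {
            fsm->state = TX_STATE_WAIT_FOR_ACK_OR_TX_END; /* retransmit */
        }
        break;
    case TX_STATE_WAIT_FOR_ACK_OR_TX_END:
        if (evt == TX_EVT_ACK) {
            fsm->state = TX_STATE_SEND;
        } else if (evt == TX_EVT_SLIP_SENT) {
            fsm->state = TX_STATE_WAIT_FOR_ACK;
        } else if (evt == TX_EVT_TIMER && fsm->handle_timer_in_retx_state) {
            /* Workaround (assumed shape): report the error, release the FSM. */
            fsm->error_reported = true;
            fsm->state = TX_STATE_SEND;
        }
        /* Without the workaround the timer event is silently dropped. */
        break;
    }
}

/* Replays the four steps listed in the post. */
void tx_fsm_run_observed_sequence(tx_fsm_t *fsm)
{
    tx_fsm_step(fsm, TX_EVT_REQUEST);
    tx_fsm_step(fsm, TX_EVT_SLIP_SENT);
    tx_fsm_step(fsm, TX_EVT_TIMER);
    tx_fsm_step(fsm, TX_EVT_TIMER);
}
```

Running tx_fsm_run_observed_sequence on a model initialised with the flag set to false leaves it in TX_STATE_WAIT_FOR_ACK_OR_TX_END with no error reported, which matches the stuck state described above. With the flag set to true, the second timeout sets error_reported and returns the model to TX_STATE_SEND.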

  • Hi,

    Thank you for reporting this issue, and for the time and effort spent searching for a root cause. It is highly appreciated! I have registered the issue in our internal tracker, for the SDK developers to investigate further. I will get back to you when we have a solution, or if we need more information. If you find out more about the issue and/or have anything to add, then please share that here as well.

    Regards,
    Terje

  • To be clear: did you open an issue just for the state machine problem, or for the more general problem of nrfutil not handling interleaved SLIP transactions, as documented in the GitHub issue I linked to?

    This is the patch I'm using for the state machine.  It's based on SDK 11.0.0 but with whitespace converted to Linux standard.

  • Hi,

    Thank you very much for sharing your patch.

    I opened an issue for the state machine problem. The more general nrfutil issues should be handled on GitHub, but I will mention them to the developers just to be sure. In general, if a tool has a GitHub page, you can usually report issues there, and the developers will take care of them.

    Regards,
    Terje
