While investigating problems I'm experiencing with bootloader_secure, I believe I've identified an error in the HCI TX finite state machine (components/serialization/common/transport/ser_phy/ser_phy_hci.c, function hci_tx_fsm_event_process). The relevant code is unchanged from SDKs 11 through 15.
I've observed the following sequence after allocating a transmit buffer for a BLE event:
- SEND transmits, enters WAIT_FOR_FIRST_TX_END
- WAIT_FOR_FIRST_TX_END gets SLIP/SENT, sets timeout, enters WAIT_FOR_ACK
- WAIT_FOR_ACK times out, transmits, enters WAIT_FOR_ACK_OR_TX_END
- WAIT_FOR_ACK_OR_TX_END times out
However, unlike WAIT_FOR_ACK, the processing for WAIT_FOR_ACK_OR_TX_END does not detect and process the timeout event. The effect is that the FSM stays in this state, with no expected events that would kick it out. In my DFU experience, this causes nrfutil to exit with an NRF_ERROR_INTERNAL synthesized by pc-ble-driver, because the failure to resolve the transmit deadlocks the connectivity application.
Adding processing for HCI_TIMER_EVT to WAIT_FOR_ACK_OR_TX_END at least invokes the error callback, allowing the connectivity application to regain control, though this doesn't solve the underlying problem.
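To make the sequence concrete, here is a minimal standalone model of the behaviour described above. It is not the SDK code: every name in it (model_fsm_t, model_fsm_step, EVT_TIMEOUT and so on) is hypothetical, retry counting is left out, and returning to the SEND state after reporting the error is my assumption about what "regaining control" looks like. It only illustrates why a state with no timeout branch never exits, and what the workaround changes.

```c
#include <stdbool.h>

/* Simplified model of the TX state machine. All names are illustrative
 * stand-ins, not identifiers from ser_phy_hci.c. */
typedef enum {
    ST_SEND,
    ST_WAIT_FOR_FIRST_TX_END,
    ST_WAIT_FOR_ACK,
    ST_WAIT_FOR_ACK_OR_TX_END
} model_state_t;

typedef enum { EVT_TX_REQUEST, EVT_SLIP_SENT, EVT_TIMEOUT, EVT_ACK } model_evt_t;

typedef struct {
    model_state_t state;
    int           tx_count;        /* number of (re)transmissions started */
    bool          error_reported;  /* stands in for the error callback */
    bool          handle_timeout;  /* true = workaround applied */
} model_fsm_t;

static void model_fsm_step(model_fsm_t *fsm, model_evt_t evt)
{
    switch (fsm->state) {
    case ST_SEND:
        if (evt == EVT_TX_REQUEST) {
            fsm->tx_count++;
            fsm->state = ST_WAIT_FOR_FIRST_TX_END;
        }
        break;
    case ST_WAIT_FOR_FIRST_TX_END:
        if (evt == EVT_SLIP_SENT) {
            fsm->state = ST_WAIT_FOR_ACK;   /* timeout armed here */
        }
        break;
    case ST_WAIT_FOR_ACK:
        if (evt == EVT_TIMEOUT) {
            fsm->tx_count++;                /* retransmit */
            fsm->state = ST_WAIT_FOR_ACK_OR_TX_END;
        } else if (evt == EVT_ACK) {
            fsm->state = ST_SEND;
        }
        break;
    case ST_WAIT_FOR_ACK_OR_TX_END:
        if (evt == EVT_SLIP_SENT) {
            fsm->state = ST_WAIT_FOR_ACK;
        } else if (evt == EVT_ACK) {
            fsm->state = ST_SEND;           /* simplified: treat as done */
        } else if (evt == EVT_TIMEOUT && fsm->handle_timeout) {
            /* Workaround: report the failure instead of ignoring it.
             * Going back to ST_SEND is an assumption of this model. */
            fsm->error_reported = true;
            fsm->state = ST_SEND;
        }
        /* Without the workaround, EVT_TIMEOUT falls through and is lost. */
        break;
    }
}

/* Replays the sequence from the list above and returns the final FSM. */
static model_fsm_t model_run_observed_sequence(bool handle_timeout)
{
    model_fsm_t fsm = { ST_SEND, 0, false, handle_timeout };
    model_fsm_step(&fsm, EVT_TX_REQUEST); /* SEND transmits */
    model_fsm_step(&fsm, EVT_SLIP_SENT);  /* first TX end */
    model_fsm_step(&fsm, EVT_TIMEOUT);    /* WAIT_FOR_ACK times out */
    model_fsm_step(&fsm, EVT_TIMEOUT);    /* WAIT_FOR_ACK_OR_TX_END times out */
    return fsm;
}
```

Calling model_run_observed_sequence(false) leaves the model parked in ST_WAIT_FOR_ACK_OR_TX_END with no error reported, which matches the stuck state I see. Calling it with true reports the error and frees the FSM, but, as with the real workaround, it says nothing about why the retransmission never completed.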