S140 7.3.0 Softdevice Assertion Failed at PC=0xa806 using L2CAP

I'm seeing an assertion failure at PC=0xa806 on an nRF52840 with S140 7.3.0, while using L2CAP.

I know the most common cause of SoftDevice assertion failures is interrupt/timing issues, so I have triple-checked that this is not the case. I have no critical sections that disable SoftDevice interrupts, and at the time of the crash only SoftDevice interrupts are using the reserved priorities 0, 1 or 4. These are the interrupts enabled at the time of the crash:

 ERROR device          > irq 0 prio 0x80
 ERROR device          > irq 1 prio 0x0
 ERROR device          > irq 3 prio 0x40
 ERROR device          > irq 6 prio 0x40
 ERROR device          > irq 11 prio 0x0
 ERROR device          > irq 16 prio 0x40
 ERROR device          > irq 17 prio 0x40
 ERROR device          > irq 20 prio 0x60
 ERROR device          > irq 22 prio 0xc0
 ERROR device          > irq 25 prio 0x80
 ERROR device          > irq 32 prio 0x20
 ERROR device          > irq 47 prio 0x40
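
The dump above prints raw NVIC priority bytes rather than priority levels. As a rough host-side sketch, assuming the nRF52840 implements 3 priority bits (so the level is the top 3 bits of the byte), the values can be decoded and checked against the reserved levels 0, 1 and 4 like this. The function names are made up for illustration.

```rust
/// Number of implemented NVIC priority bits (assumption: 3 on the nRF52840).
const PRIO_BITS: u8 = 3;

/// Priority levels reserved for the SoftDevice, as stated in the post above.
const SD_RESERVED_LEVELS: [u8; 3] = [0, 1, 4];

/// Convert a raw NVIC priority byte (e.g. 0x80) into a priority level (e.g. 4).
fn priority_level(raw: u8) -> u8 {
    raw >> (8 - PRIO_BITS)
}

/// Return the IRQ numbers from `(irq, raw_priority)` pairs whose level is reserved.
fn irqs_at_reserved_levels(irqs: &[(u8, u8)]) -> Vec<u8> {
    irqs.iter()
        .filter(|(_, raw)| SD_RESERVED_LEVELS.contains(&priority_level(*raw)))
        .map(|(irq, _)| *irq)
        .collect()
}
```

Fed the twelve entries from the dump, this reports IRQs 0, 1, 11, 25 and 32 at reserved levels. It does not decide whether those IRQs belong to the SoftDevice; that still has to be checked against the SoftDevice documentation.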

I strongly suspect an issue related to L2CAP buffer management, since the crash always follows the same pattern: an RX is started, 3 TXs are started, the first TX finishes, the RX finishes, and another RX is started with the same buffer as the finished TX (I'm pulling RX/TX buffers from the same pool; not sure if that's relevant). Possibly also relevant: fragmentation is used (SDU size bigger than MPS).

 INFO  device          > BLE_L2CAP_EVTS_BLE_L2CAP_EVT_CH_CREDIT
 INFO  device          > BLE_L2CAP_EVTS_BLE_L2CAP_EVT_CH_RX 0x2002661a
 INFO  device          > sd_ble_l2cap_ch_rx 0x20026808
 INFO  device          > sd_ble_l2cap_ch_tx 0x20026be4 len=0x22
 INFO  device          > BLE_L2CAP_EVTS_BLE_L2CAP_EVT_CH_TX 0x20026be4
 INFO  device          > BLE_L2CAP_EVTS_BLE_L2CAP_EVT_CH_CREDIT
 INFO  device          > sd_ble_l2cap_ch_tx 0x2002661a len=0x24
 INFO  device          > BLE_L2CAP_EVTS_BLE_L2CAP_EVT_CH_TX 0x2002661a
 INFO  device          > BLE_L2CAP_EVTS_BLE_L2CAP_EVT_CH_CREDIT
 INFO  device          > BLE_L2CAP_EVTS_BLE_L2CAP_EVT_CH_RX 0x20026808
 INFO  device          > sd_ble_l2cap_ch_rx 0x2002661a
 INFO  device          > sd_ble_l2cap_ch_tx 0x200269f6 len=0x22
 INFO  device          > sd_ble_l2cap_ch_tx 0x20026808 len=0x32
 INFO  device          > BLE_L2CAP_EVTS_BLE_L2CAP_EVT_CH_TX 0x200269f6
 INFO  device          > BLE_L2CAP_EVTS_BLE_L2CAP_EVT_CH_CREDIT
 INFO  device          > BLE_L2CAP_EVTS_BLE_L2CAP_EVT_CH_TX 0x20026808
 INFO  device          > BLE_L2CAP_EVTS_BLE_L2CAP_EVT_CH_CREDIT
 INFO  device          > BLE_L2CAP_EVTS_BLE_L2CAP_EVT_CH_RX 0x2002661a
 INFO  device          > sd_ble_l2cap_ch_rx 0x20026808   // HERE the evil pattern starts
 INFO  device          > sd_ble_l2cap_ch_tx 0x20026be4 len=0x1e0
 INFO  device          > sd_ble_l2cap_ch_tx 0x2002661a len=0x1e0
 INFO  device          > sd_ble_l2cap_ch_tx 0x200269f6 len=0x1d3
 INFO  device          > BLE_L2CAP_EVTS_BLE_L2CAP_EVT_CH_TX 0x20026be4
 INFO  device          > BLE_L2CAP_EVTS_BLE_L2CAP_EVT_CH_CREDIT
 INFO  device          > BLE_L2CAP_EVTS_BLE_L2CAP_EVT_CH_RX 0x20026808
 INFO  device          > sd_ble_l2cap_ch_rx 0x20026be4
 ERROR device          > panicked at 'Softdevice assertion failed: an assertion inside the softdevice's code has failed. Most common cause is disabling interrupts for too long. Make sure you're using nrf_softdevice::interrupt::free instead of cortex_m::interrupt::free, which disables non-softdevice interrupts only. PC=0xa806'

If I send data more slowly so that the streak of 3 TXs never happens, it keeps running fine indefinitely.

This is the softdevice configuration:

let sd_config = nrf_softdevice::Config {
    clock: Some(sd::nrf_clock_lf_cfg_t {
        source: sd::NRF_CLOCK_LF_SRC_XTAL as u8,
        rc_ctiv: 0,
        rc_temp_ctiv: 0,
        accuracy: 7,
    }),
    conn_gap: Some(sd::ble_gap_conn_cfg_t {
        conn_count: 20,
        event_length: 15,
    }),
    conn_gatt: Some(sd::ble_gatt_conn_cfg_t { att_mtu: 114 }),
    conn_gattc: Some(sd::ble_gattc_conn_cfg_t {
        write_cmd_tx_queue_size: 0,
    }),
    conn_gatts: Some(sd::ble_gatts_conn_cfg_t { hvn_tx_queue_size: 0 }),
    gatts_attr_tab_size: Some(sd::ble_gatts_cfg_attr_tab_size_t { attr_tab_size: 1024 }),
    gap_role_count: Some(sd::ble_gap_cfg_role_count_t {
        adv_set_count: 1,
        periph_role_count: 4,
        central_role_count: 16,
        central_sec_count: 0,
        _bitfield_1: sd::ble_gap_cfg_role_count_t::new_bitfield_1(0),
    }),
    conn_l2cap: Some(sd::ble_l2cap_conn_cfg_t {
        ch_count: 1,
        rx_mps: 247,
        tx_mps: 247,
        rx_queue_size: 3,
        tx_queue_size: 3,
    }),
    ..Default::default()
};
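
To put numbers on the fragmentation mentioned earlier: with tx_mps = 247, the 0x1e0-byte transfers in the log do not fit in a single frame. The sketch below estimates the frame count. It assumes standard LE credit-based flow control framing, where the first frame also carries a 2-byte SDU length field and each frame costs one credit; the helper name is hypothetical and this is not SoftDevice code.

```rust
/// Size in bytes of the SDU length field carried in the first frame (assumption
/// based on LE credit-based flow control framing).
const SDU_LEN_FIELD: usize = 2;

/// Estimate how many frames an SDU of `sdu_len` bytes is split into,
/// assuming every frame except the last is filled up to `mps` bytes.
fn frames_for_sdu(sdu_len: usize, mps: usize) -> usize {
    assert!(mps > 0, "MPS must be non-zero");
    // Ceiling division of (payload + length field) by the MPS.
    (sdu_len + SDU_LEN_FIELD + mps - 1) / mps
}
```

Under these assumptions the 0x1e0 and 0x1d3 byte transmissions each take two frames, while the earlier 0x22 to 0x32 byte ones fit in one.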

The issue is also present in older softdevice versions:
7.2.0 PC=0xa822
7.0.1 PC=0xa7f6

How can I debug this? Thank you!

  • Hi 

    I have forwarded your findings to the SoftDevice team, and I will let you know once I hear back from them. 

    Best regards
    Torbjørn

  • Hi again

    Potentially this could be a bug in the SoftDevice, but we need to look into it a bit more to be sure. 

    There is a possible workaround, but it will only work if you don't need flow control by credits (that is, you can resupply the RX buffer as soon as the previous one was released by the SoftDevice).
    In this case you can call sd_ble_l2cap_ch_flow_control before establishing the CoC channel, with local_cid=BLE_L2CAP_CID_INVALID and a large number of credits, for example credits=0xFFFF. After establishing the channel, you should set credits to 0 for the local_cid of the new channel (!=BLE_L2CAP_CID_INVALID) so the peer will have 0xFFFF credits for the lifetime of the channel. You should carefully read the description of the function (especially the note) before deciding to use this method.

    Best regards
    Torbjørn
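
The two-step workaround described above could be sketched roughly as follows. The `FlowControl` trait is a hypothetical stand-in for `sd_ble_l2cap_ch_flow_control` (with p_credits passed as NULL) so that the call order can be exercised on a host. `CID_INVALID` is assumed to equal BLE_L2CAP_CID_INVALID; in real firmware, use the constant from the SoftDevice bindings.

```rust
/// Assumed value of BLE_L2CAP_CID_INVALID; prefer the constant from the bindings.
const CID_INVALID: u16 = 0x0000;

/// Hypothetical wrapper around `sd_ble_l2cap_ch_flow_control`; not a real crate API.
trait FlowControl {
    /// Returns a SoftDevice-style error code on failure.
    fn flow_control(&mut self, conn_handle: u16, local_cid: u16, credits: u16) -> Result<(), u32>;
}

/// Step 1, before the channel is established: request 0xFFFF credits
/// using the invalid CID, as the reply describes.
fn prepare_credits<F: FlowControl>(sd: &mut F, conn_handle: u16) -> Result<(), u32> {
    sd.flow_control(conn_handle, CID_INVALID, 0xFFFF)
}

/// Step 2, after the channel is established: set credits to 0 on the new channel.
fn freeze_credits<F: FlowControl>(sd: &mut F, conn_handle: u16, local_cid: u16) -> Result<(), u32> {
    assert_ne!(local_cid, CID_INVALID, "use the CID of the established channel");
    sd.flow_control(conn_handle, local_cid, 0)
}

/// Host-side test double that only records the calls it receives.
struct Recorder {
    calls: Vec<(u16, u16, u16)>,
}

impl FlowControl for Recorder {
    fn flow_control(&mut self, conn_handle: u16, local_cid: u16, credits: u16) -> Result<(), u32> {
        self.calls.push((conn_handle, local_cid, credits));
        Ok(())
    }
}
```

Keeping the call behind a trait is only a convenience for testing the ordering; on the device the two functions would call the SoftDevice directly.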
  • Hello Torbjørn

    Thanks for the reply! Indeed the workaround works, I can now queue as many packets as I want for tx without hitting the assertion failure.

    > so the peer will have 0xFFFF credits for the lifetime of the channel

    As I understand it, this means we're giving 0xFFFF credits to the peer at channel setup and never giving more. Thus every packet sent by the peer uses up one credit, so after 0xFFFF packets the channel will stop working. Is this correct? If so, can you suggest a workaround without this limitation? (Our devices are connected 24/7 and transfer quite a bit of data, so 0xFFFF packets is unfortunately not that many.)
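
To make the concern concrete, here is a back-of-the-envelope sketch of how long 0xFFFF credits would last, assuming one credit per packet as described above and a steady packet rate. The rates used below are purely illustrative, not measurements.

```rust
/// Credits granted once at channel setup by the workaround.
const INITIAL_CREDITS: u32 = 0xFFFF;

/// Whole seconds until the credits run out at a steady rate of
/// `packets_per_second`, or `None` if the peer sends nothing.
fn seconds_until_exhausted(packets_per_second: u32) -> Option<u32> {
    if packets_per_second == 0 {
        None
    } else {
        Some(INITIAL_CREDITS / packets_per_second)
    }
}
```

At a hypothetical 10 packets per second this gives 6553 seconds (about 1 h 49 min); at 1 packet per second, 65535 seconds (about 18 h 12 min). Either is far short of 24/7 operation.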

  • Hi

    It's good to hear you got better results with the workaround.

    After setting credits to 0 for the established channel you can at any time query remaining credits by calling sd_ble_l2cap_ch_flow_control(conn_handle, local_cid, 0, p_credits).
    On return *p_credits will contain the number of remaining credits. When this number comes close to 0 you can resupply credits by issuing the following two calls:
    sd_ble_l2cap_ch_flow_control(conn_handle, local_cid, 0xFFFF, NULL)
    sd_ble_l2cap_ch_flow_control(conn_handle, local_cid, 0, NULL)

    Note that the assert you have seen happens while credits are being sent, so it is better to resupply credits at a time when you have not seen the assert happen (is that when not many TXs are queued?).

    Best regards
    Torbjørn
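
The query-and-resupply sequence from this reply could be sketched as below. `FlowControl` is again a hypothetical stand-in for `sd_ble_l2cap_ch_flow_control` with a non-NULL p_credits (the returned value plays the role of *p_credits). `FakeSoftDevice` is a toy host-side model, and the threshold is an arbitrary illustrative parameter.

```rust
/// Hypothetical wrapper around `sd_ble_l2cap_ch_flow_control`; not a real crate API.
trait FlowControl {
    /// Sets `credits` for `local_cid` and returns the peer's remaining credits.
    fn flow_control(&mut self, local_cid: u16, credits: u16) -> Result<u16, u32>;
}

/// Query the remaining credits and, if they are at or below `threshold`,
/// issue the two resupply calls from the reply. Returns whether a resupply happened.
fn top_up_if_low<F: FlowControl>(sd: &mut F, local_cid: u16, threshold: u16) -> Result<bool, u32> {
    let remaining = sd.flow_control(local_cid, 0)?; // query only
    if remaining > threshold {
        return Ok(false);
    }
    sd.flow_control(local_cid, 0xFFFF)?; // resupply
    sd.flow_control(local_cid, 0)?; // set credits back to 0
    Ok(true)
}

/// Toy model for host-side testing: asking for more credits than the peer
/// currently has tops the peer up to that number. Not real SoftDevice behaviour.
struct FakeSoftDevice {
    remaining: u16,
    calls: Vec<(u16, u16)>,
}

impl FlowControl for FakeSoftDevice {
    fn flow_control(&mut self, local_cid: u16, credits: u16) -> Result<u16, u32> {
        self.calls.push((local_cid, credits));
        if credits > self.remaining {
            self.remaining = credits;
        }
        Ok(self.remaining)
    }
}
```

Given the note about the assert occurring while credits are sent, a caller would presumably run `top_up_if_low` only while few or no TXs are queued; that scheduling decision is left out of the sketch.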
