Got NRF_FAULT_ID_SD_ASSERT on S130 while doing sd_ble_gatts_hvx

When I repeatedly (>290 times) send the same notification to the connected host via sd_ble_gatts_hvx, I get a SoftDevice assertion:

Fault identifier:  0x1
Program counter:   0x104FA
Fault information: 0x0
Fault identifier:  0x1
Program counter:   0x14D98
Fault information: 0x0

The payload is 10 bytes long. I am using S130 with SDK 12.1 on the PCA10028. sd_ble_tx_packet_count_get always returns 7, and a BLE_EVT_TX_COMPLETE event arrives after every notification. I am trying to repeatedly notify a connected host of a sensor reading. The assertion does not always occur after the same number of readings: sometimes it takes fewer, sometimes more. Can anybody give me a hint?

Here is the EmBlitz project: notifytest.zip

Here is the Android host project: bluenodes.apk

The oscillogram showing disable_irq/enable_irq activity: [image]

Parents
  • (Adding this as an answer, as I'm way above the comment length limit)

    I'm sorry for the confusion, my enable/disable was intended for ARMCC, yours is the right solution for GCC. Your IRQ lock load is relatively low, so I think we can rule out this as a reason for the crashes for now.

    Running your code, I'm able to get the GATT error within 30 seconds, but I can't seem to get the timeslot error as long as I break on app_error_handler. I noticed that the error handler returns instead of looping forever or restarting. Could this be the reason for the timeslot errors?

    On the GATT error, I also noticed that your err_code is an 8-bit value. The GATTS functions in the SoftDevice may return error codes with a larger prefix, such as NRF_ERROR_STK_BASE_NUM (0x3000). The error 4 reported to the app error handler is therefore likely the result of truncation when the return value was stored in this 8-bit variable. As _hvx() can't return 4, it was most likely 0x3004, BLE_ERROR_NO_TX_PACKETS, as you suspected.
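The truncation can be reproduced off-target. The following is only a minimal sketch: the helper names and the DEMO_ constant are made up for illustration, and the constant is redefined locally so the snippet builds without the SDK headers.

```c
#include <stdint.h>

/* Local stand-in for BLE_ERROR_NO_TX_PACKETS (STK base 0x3000 + 0x004).
 * Redefined here only so the sketch compiles without the SDK. */
#define DEMO_BLE_ERROR_NO_TX_PACKETS 0x3004u

/* Hypothetical helper: what an 8-bit err_code variable ends up holding.
 * The conversion keeps only the low byte of the 32-bit return value. */
uint8_t store_in_8_bits(uint32_t sd_return_value)
{
    return (uint8_t)sd_return_value;
}

/* Hypothetical helper: the same value kept in a full-width variable. */
uint32_t store_in_32_bits(uint32_t sd_return_value)
{
    return sd_return_value;
}
```

With an 8-bit variable, 0x3004 is reduced to 0x04, which matches the "error 4" seen in the error handler. Declaring err_code as uint32_t, the type the SoftDevice calls return, preserves the full code.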

    The main source of confusion, however, stems from the sd_ble_tx_packet_count_get() function. This function only tells you how many packets the connection can send in total before it has to get a TX COMPLETE event. It does not tell you how many you currently have left. So for the mesh_gatt_evt_push() function, sampling this value is moot, as several packets may be in progress at the time of checking. If you want the correct number, you have to sample it once when the connection opens, then keep track yourself, as shown here. You can extend the mesh_gatt module by using this number for flow control: drop incoming mesh values if you don't have enough TX packets left for a GATT transmission, before storing them in your handle storage.
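The bookkeeping described above could look roughly like this. It is a sketch, not code from the mesh_gatt module: the tx_credits_t type and its functions are hypothetical, and it assumes the application feeds in the total from sd_ble_tx_packet_count_get() on connect and the packet count carried by each BLE_EVT_TX_COMPLETE event.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical application-side counter; not part of the SoftDevice API. */
typedef struct {
    uint8_t total;      /* sampled once from sd_ble_tx_packet_count_get() */
    uint8_t available;  /* packets the application believes are still free */
} tx_credits_t;

/* Call once when the connection opens. */
void tx_credits_on_connect(tx_credits_t *c, uint8_t total)
{
    c->total = total;
    c->available = total;
}

/* Call before sd_ble_gatts_hvx(); skip the call and drop the value
 * if this returns false. */
bool tx_credits_take(tx_credits_t *c)
{
    if (c->available == 0) {
        return false;
    }
    c->available--;
    return true;
}

/* Call if the send fails after a credit was taken, so it is not lost. */
void tx_credits_give_back(tx_credits_t *c)
{
    if (c->available < c->total) {
        c->available++;
    }
}

/* Call from the BLE_EVT_TX_COMPLETE handler with the reported count. */
void tx_credits_on_tx_complete(tx_credits_t *c, uint8_t count)
{
    uint16_t sum = (uint16_t)c->available + count;
    c->available = (sum > c->total) ? c->total : (uint8_t)sum;
}
```

In this arrangement mesh_gatt_evt_push() would call tx_credits_take() first and drop the incoming mesh value when it returns false, instead of sampling sd_ble_tx_packet_count_get() on every push. If the counter is touched from both interrupt and main context, the updates would also need to be protected against concurrent access.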
