Intermittent Watchdog Reset on nRF52832 (NCS v3.1.0) in Dual-Role BLE Device – Suspected BLE Subsystem Deadlock (NCSDK-31528 / NCSDK-30959)

Background:
  • Platform: nRF52832 with nRF Connect SDK (NCS) v3.1.0
  • BLE Roles: The device operates simultaneously as:
    • A BLE Central (NUS Client), connecting to an external peripheral
    • A BLE Peripheral (NUS Server), being connected by an external central
      This forms a "dual-role" (central + peripheral) architecture.
  • Primary Service: Both connections use the Nordic UART Service (NUS) for data communication.
Issue Description:
The device experiences intermittent watchdog resets after running continuously for periods ranging from several minutes to several days. The issue has a low reproduction rate and occurs in production environments, where real-time log capture via UART or RTT is not feasible. As a result, identifying the root cause has been extremely difficult.
Suspected Causes:
We suspect this behavior may be related to the following known issues in NCS v3.1.0:
  • NCSDK-31528: Deadlock on the system workqueue in central role caused by tx_notify
  • NCSDK-30959: Potential deadlock in the Bluetooth subsystem when CONFIG_BT_HCI_ACL_FLOW_CONTROL is disabled
Both issues could block critical tasks, preventing the watchdog from being fed and ultimately triggering a reset.
Questions:
  1. In the absence of real-time logging, are there practical methods to determine whether a watchdog reset was caused by one of these BLE-related deadlocks, for example analyzing reset reason registers, preserving RAM across resets, using watchdog timeout callbacks, or enabling crash dumps?
  2. Is the dual-role configuration (Central + Peripheral) with concurrent NUS Client and NUS Server known to have stability issues in NCS v3.1.0?
Any insights or recommendations would be greatly appreciated!
Reply:
  • Hi,

    1. You can log to noinit RAM and read that RAM back after the reset. RAM is never explicitly erased, so its content will most likely still be valid after a watchdog reset, but you should add a checksum to verify that the data is intact. One place to capture the data is the watchdog interrupt handler, which gives you a short window before the reset (on the nRF52 series, two 32.768 kHz clock cycles, roughly 61 µs): enough to copy data to RAM, but not enough to write to flash.

    2. I would not say it is known to have stability issues. However, there are some corner cases that can lead to deadlocks, such as the ones you refer to. One potential solution worth testing is to set CONFIG_BT_HCI_ACL_FLOW_CONTROL=y. This will likely produce build warnings, and you will need to increase some buffer sizes as the warnings specify. It is also a good idea to try increasing CONFIG_BT_BUF_EVT_RX_COUNT.
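
To illustrate point 1, here is a minimal sketch of a checksummed record kept in noinit RAM, combined with a check of the reset reason. It is an illustration under assumptions, not Nordic's reference implementation: the struct layout, the magic value, the checkpoint IDs, and all function names are hypothetical, and the CRC-32 is hand-rolled only to keep the sketch self-contained (Zephyr's own CRC helpers could be used instead). The raw reset reason is passed in as a parameter; on target it would come from POWER->RESETREAS, where the DOG field is bit 1 on the nRF52832.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* In a Zephyr build, __noinit comes from the toolchain headers and places the
 * variable in a section that startup code does not zero. The empty fallback
 * only exists so this sketch also compiles on a host. */
#ifndef __noinit
#define __noinit
#endif

#define CRASH_MAGIC       0xC0DEDEADu /* hypothetical marker value */
#define RESETREAS_DOG_BIT (1u << 1)   /* DOG field of POWER->RESETREAS (nRF52832) */

/* Hypothetical record layout; crc must remain the last member. */
struct crash_record {
	uint32_t magic;
	uint32_t uptime_ms;
	uint32_t last_checkpoint;
	uint32_t crc;
};

/* Survives a watchdog reset when placed in noinit RAM. */
__noinit struct crash_record crash_rec;

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320). */
uint32_t crash_crc32(const uint8_t *data, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= data[i];
		for (int bit = 0; bit < 8; bit++) {
			crc = (crc >> 1) ^ (0xEDB88320u & ((uint32_t)0 - (crc & 1u)));
		}
	}
	return ~crc;
}

/* Record the latest checkpoint and re-seal the record with magic and CRC. */
void crash_checkpoint(struct crash_record *rec, uint32_t id, uint32_t uptime_ms)
{
	rec->magic = CRASH_MAGIC;
	rec->uptime_ms = uptime_ms;
	rec->last_checkpoint = id;
	rec->crc = crash_crc32((const uint8_t *)rec,
			       offsetof(struct crash_record, crc));
}

/* True only if the magic matches and the CRC covers the stored fields. */
bool crash_record_is_valid(const struct crash_record *rec)
{
	return rec->magic == CRASH_MAGIC &&
	       rec->crc == crash_crc32((const uint8_t *)rec,
				       offsetof(struct crash_record, crc));
}

/* Call early at boot: report only after a watchdog reset with an intact record. */
bool crash_should_report(uint32_t resetreas, const struct crash_record *rec)
{
	return (resetreas & RESETREAS_DOG_BIT) != 0u && crash_record_is_valid(rec);
}
```

One possible way to use this: call crash_checkpoint(&crash_rec, ...) with a distinct ID at the start of each work item or before each Bluetooth call that could block, so the record is already sealed when the watchdog fires and the interrupt handler has little or nothing left to do. At the next boot, if crash_should_report() returns true, last_checkpoint points at the last code path that was entered before the hang, and the record can be forwarded over NUS or stored before clearing the magic. Note that RESETREAS is cumulative until cleared, so the reset reason should be cleared after reading it.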
