
Intermittent Failure of nRF52832

We have an application that sends a short BLE packet to a mobile app every second for about 15 minutes, and then shuts down and sleeps. Meanwhile, it also writes to an SPI flash and accesses various I2C devices. The firmware uses FreeRTOS and runs 8 to 10 tasks. After extensive testing with thousands of runs, we find that 15% of the time, units do not finish. Running with RTT Viewer, we can't see anything unusual: the units just stop. Neither the stack overflow handler nor the general fault handler appears to be called. The error rate *seemed* to depend somewhat on which unit we tested (we tested approximately 25 units, hundreds of runs each).

We finally found that the "fix" was to move from SoftDevice S132 version 5.0.0 to version 5.1.0, at which point the problem virtually disappeared.

There is nothing in the 5.1.0 release notes that jumps out at us as a fixed bug that would account for the change in error rate. We are using short, conventional packets with 20 bytes of data payload, and we never change the packet size after BLE initialization. We are using conventional, non-secure initialization/handshake. The units may run for 5, 10, or 12 minutes and then fail.

We *seem* to have fixed our problem by moving to S132 5.1.0, but it would be comforting to have some idea what the problem was so we can fix it properly.

BTW, we are using nRF5_SDK_14.2.0_17b948a.

One possible symptom: when we have the J-Link adapter attached and are running RTT Viewer, the firmware occasionally appears to freeze, and RTT Viewer rapidly repeats the same 30 lines of output (about 1200 characters) over and over endlessly.

  • Hi Jeffery,

    There seems to be a deadlock in your application, and it looks timing dependent (since it does not affect 100% of your devices).

    The SoftDevice 5.1.0 release notes include these fixes:

    Fixed an issue where Radio Notification could be suppressed between connection events when Connection Event Length
    Extension was enabled (DRGN-7687).
    Flash events are notified through radio notification. So if a radio notification event was suppressed, a flash event could be missed, and if your application was waiting on flash activity or on another radio notification to proceed, this could create a deadlock on version 5.0. This was fixed in 5.1.

    Fixed an issue where the SoftDevice could get stuck in a deadlock where it would always NACK what the peer was sending.
    This could happen if LE Data Packet Length Extension was used and ble_cfg.conn_cfg.params.gap_conn_cfg.event_length
    was less than 5 (DRGN-9494).
    Fixed an issue where the SoftDevice could get stuck in a deadlock where it would always NACK what the peer was sending.
    This could happen if the peer reduced the data length during the Data Length Update procedure (DRGN-9367).

    You would most likely be seeing one of these NACK deadlocks if the application uses an event length of less than 5, or if the peer reduced the data length during the connection. You can check this if you manage to capture an air sniffer log when the deadlock happens.

    /Susheel
