NRF_FAULT_ID_SD_ASSERT @PC 0x15074

In my project, app_error_fault_handler(...) is triggered with an NRF_FAULT_ID_SD_ASSERT. I've overridden the function and stored the ID and PC: the ID is 0x01 (NRF_FAULT_ID_SD_ASSERT) and the PC is 0x15074.

My Hard- & Software setup:

nRF52840-DK (PCA10056)
nRF5 SDK v17.0.2
S140 v7.2.0
FreeRTOS v10.0.0

In my setup I have

24 development boards,
connected to 4 different masters -
so each master has to handle 6 devices.
The connection interval is fixed @100ms.

I don't know whether the number of boards is related to the error; if it is, maybe only because of more BLE traffic in the area. Unfortunately the error is not easy to reproduce and only occurs every 1-2 days on some boards (different ones, not always the same).

From this post I've learned that the error can be caused by application code blocking the SoftDevice: devzone.nordicsemi.com/.../s140-7-2-0-assert-pc-0x15074

But I've already checked my critical sections and hardware interrupts; I only have one GPIO ISR, with a priority of GPIOTE_CONFIG_IRQ_PRIORITY 6.

So what kind of code sections can cause such an error?

Is it possible to get more information about the error? I now know the ID and the PC; the "info" field of app_error_fault_handler(...) is unfortunately 0x00. If I assume that the error is caused by my application code, it would be very helpful to know the function, task or interrupt that caused it.
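For reference, capturing the three handler arguments could look roughly like the sketch below. It is only an illustration, not the code actually running on the boards: fault_record_t, FAULT_RECORD_MAGIC, FAULT_ID_SD_ASSERT_SKETCH and the helper functions are hypothetical names, and the record is kept in an ordinary static variable so the sketch builds on a host. On target it would have to live in RAM that is not cleared at startup (or be written to flash) to survive the reset.

```c
#include <stdint.h>

/* Value reported in this thread for NRF_FAULT_ID_SD_ASSERT.
 * On target, use the SDK's own definition instead of this stand-in. */
#define FAULT_ID_SD_ASSERT_SKETCH 0x01u

/* Hypothetical marker that tells a valid record from uninitialized RAM. */
#define FAULT_RECORD_MAGIC 0xFA17C0DEu

typedef struct {
    uint32_t magic; /* FAULT_RECORD_MAGIC once a fault has been stored */
    uint32_t id;    /* fault ID passed to the handler */
    uint32_t pc;    /* program counter passed to the handler */
    uint32_t info;  /* extra info passed to the handler (0x00 here) */
    uint32_t count; /* number of faults stored since the record was valid */
} fault_record_t;

/* Assumption: on target this variable is placed in a retained RAM section. */
static fault_record_t g_fault_record;

static void fault_record_store(uint32_t id, uint32_t pc, uint32_t info)
{
    if (g_fault_record.magic != FAULT_RECORD_MAGIC) {
        g_fault_record.count = 0u; /* first valid record */
    }
    g_fault_record.magic = FAULT_RECORD_MAGIC;
    g_fault_record.id = id;
    g_fault_record.pc = pc;
    g_fault_record.info = info;
    g_fault_record.count++;
}

/* Returns 1 if the stored record describes a SoftDevice assert. */
static int fault_record_is_sd_assert(void)
{
    return g_fault_record.magic == FAULT_RECORD_MAGIC &&
           g_fault_record.id == FAULT_ID_SD_ASSERT_SKETCH;
}

/* Same signature as the handler the SDK lets the application override. */
void app_error_fault_handler(uint32_t id, uint32_t pc, uint32_t info)
{
    fault_record_store(id, pc, info);
    /* On target a system reset would follow here; it is left out so the
     * sketch can run on a host. */
}
```

After the reset, the application can check fault_record_is_sd_assert() early in main() and report the stored PC over BLE or a log.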

I appreciate any kind of help,

Manuel

0 Einar Thorsrud over 4 years ago

Hi,

This assert is from an internal module in the SoftDevice that handles radio events, and is caused by an event occurring later than expected, as explained in the other thread you found. The assert does not provide more information about why that happened, but typical reasons are high priority interrupts or long critical sections, which I see you have considered. Another typical issue is that you use a timeslot that does not end on time.

0 Manuel Rohrauer over 4 years ago in reply to Einar Thorsrud

Hi Einar,

thank you for your quick reply. How exactly can I check whether the issue is in the timeslot area? How do I check whether it hangs there or does not end?

0 Einar Thorsrud over 4 years ago in reply to Manuel Rohrauer

Other than by keeping track of time, you don't really know. So you need to keep track of time and ensure you don't overstay the timeslot (that is, use resources for longer than allowed).
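As a rough illustration of keeping track of time inside a timeslot, the sketch below checks whether a piece of work still fits before the slot ends. It is not taken from the SoftDevice API: timeslot_budget_t and both functions are hypothetical names, and how elapsed_us is obtained (for example from a timer that starts counting at the beginning of the slot) is an assumption left to the application.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one granted timeslot. */
typedef struct {
    uint32_t length_us; /* length of the granted timeslot */
    uint32_t margin_us; /* time reserved for cleaning up before the end */
} timeslot_budget_t;

/* Time still usable for work, given the time already spent in the slot. */
static uint32_t timeslot_remaining_us(const timeslot_budget_t *b,
                                      uint32_t elapsed_us)
{
    uint32_t usable_us =
        (b->length_us > b->margin_us) ? (b->length_us - b->margin_us) : 0u;
    return (elapsed_us >= usable_us) ? 0u : (usable_us - elapsed_us);
}

/* True if work taking work_us can start now without overstaying the slot. */
static bool timeslot_can_start(const timeslot_budget_t *b,
                               uint32_t elapsed_us, uint32_t work_us)
{
    return work_us <= timeslot_remaining_us(b, elapsed_us);
}
```

Calling timeslot_can_start() before each radio or flash operation, and ending the slot as soon as it returns false, keeps the cleanup margin intact.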

Another point, which may be far-fetched: it is conceivable that there could be a mismatch between the high frequency clock and the low frequency clock (which are independent). If they drift too much from each other you could potentially see problems. What is your 32.768 kHz clock source? To rule this out, you could run with the synthesized LF clock, where it is derived from the HF clock. That results in high power consumption, but if you find other reasons unlikely it could be worth a test, just to check whether this could be related or not.

0 Manuel Rohrauer over 4 years ago in reply to Einar Thorsrud
We are using the external crystal on the DK board. I'll try the settings for the synthesized clock to rule that out. To change that, I'll have to change the following settings in sdk_config.h:

NRFX_CLOCK_CONFIG_LF_SRC 2

CLOCK_CONFIG_LF_SRC 2

NRF_SDH_CLOCK_LF_SRC 2

Is that correct? I think I don't need both NRFX_CLOCK_CONFIG_LF_SRC & CLOCK_CONFIG_LF_SRC in my file?

Additionally: let's say the error comes from clock drift of the external crystal. What about the internal RC oscillator? I ask because the synthesized LF clock needs 100 µA and the internal RC 1 µA.

0 Einar Thorsrud over 4 years ago in reply to Manuel Rohrauer

Hi,

For projects that use a SoftDevice you only need to set NRF_SDH_CLOCK_LF_SRC to 2 to use the synthesized 32.768 kHz clock. As you have been using the crystal oscillator so far, this is a bit far-fetched, but it could be worth a shot, if only to rule it out.
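As a sketch, the relevant sdk_config.h entry would then look like this. The value mapping in the comment is stated from memory of the SDK's configuration header and should be checked against the sdk_config.h in the project; the other NRF_SDH_CLOCK_LF_* entries are assumed to stay as they were for the crystal.

```c
/* sdk_config.h fragment: LF clock source used when the SoftDevice is enabled.
 * Assumed value mapping: 0 = RC oscillator, 1 = crystal, 2 = synthesized. */
#ifndef NRF_SDH_CLOCK_LF_SRC
#define NRF_SDH_CLOCK_LF_SRC 2
#endif
```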

Have you seen this assert on multiple boards or only one?

0 Manuel Rohrauer over 4 years ago in reply to Einar Thorsrud

On multiple boards but not always after the same uptime. Sometimes the reset occurs after several days and sometimes after one day. The picture shows a list of the boards and the uptime. Each blue marked line was an assert reset. The DFU update was made 13 days ago.

It really looks like it's a timeslot error, because more BLE devices in the area means fewer timeslots, right?

On the other hand, I've changed a bit of application code (using FreeRTOS functions like xQueueSend & xQueueSendFromISR) and I think the resets were less frequent than before.

So to clarify:
Could it be something from within FreeRTOS, some critical sections which are called there? Let's say it's a critical section; can I then add some debug features or something to get more than the PC of the error there?

For the clock:
I've just updated the boards with the synthesized clock; we will know more tomorrow.

0 Einar Thorsrud over 4 years ago in reply to Manuel Rohrauer

Hi,

Manuel Rohrauer said:
On multiple boards but not always after the same uptime.

I see. Combined with the fact that you use the crystal on the DK, I think we can almost rule out clock drift as a cause. But it will be interesting to see the results tomorrow anyway.

Manuel Rohrauer said:
It really looks like it's a timeslot error, because more BLE devices in the area means fewer timeslots, right?

Yes, that is true. The more connections, the less time for other things. So if you use radio timeslots, you will get fewer and/or shorter time slots with increasing number of connections.

Manuel Rohrauer said:
On the other hand, I've changed a bit of application code (using FreeRTOS functions like xQueueSend & xQueueSendFromISR) and I think the resets were less frequent than before.

That is interesting. What exactly did you do, and what happens if you revert this (just for comparison)?

Manuel Rohrauer said:
Could it be something from within FreeRTOS, some critical sections which are called there?

That could be. If critical sections are too long, this assert can happen. Basically, anything that prevents the SoftDevice from doing what it needs to do within defined timeouts can cause this.

Manuel Rohrauer said:
Let's say it's a critical section; can I then add some debug features or something to get more than the PC of the error there?

You can try to look at the stack trace, but the assert will not happen until the SoftDevice runs, so if it was caused by some application code blocking for too long that will have already happened. It could also be from an interrupt and not a critical section, so what you will see and the usefulness of it depends on what has happened. It is worth checking, though.
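Since the blocking code has already finished by the time the assert fires, one option is to record the longest critical section while the application runs and inspect that record after a fault. The sketch below is only one possible way to do this and is not an SDK or FreeRTOS facility: cs_probe_t, cs_probe_enter() and cs_probe_exit() are hypothetical names, the tick source is an assumption (for example a free-running cycle counter read by the caller), and nested critical sections are not handled.

```c
#include <stdint.h>

/* Hypothetical record of the longest critical section seen so far. */
typedef struct {
    uint32_t enter_ticks; /* timestamp taken when the section was entered */
    uint32_t max_ticks;   /* longest duration observed */
    uint32_t max_tag;     /* caller-chosen tag of the longest section */
} cs_probe_t;

/* Call right after entering a critical section. */
static void cs_probe_enter(cs_probe_t *p, uint32_t now_ticks)
{
    p->enter_ticks = now_ticks;
}

/* Call right before leaving it; tag identifies the call site. */
static void cs_probe_exit(cs_probe_t *p, uint32_t now_ticks, uint32_t tag)
{
    /* Unsigned subtraction stays correct across one counter wrap. */
    uint32_t held_ticks = now_ticks - p->enter_ticks;
    if (held_ticks > p->max_ticks) {
        p->max_ticks = held_ticks;
        p->max_tag = tag;
    }
}
```

If the probe is stored together with the fault ID and PC, the tag of the longest section gives a first hint about which code held off the SoftDevice, although it cannot prove that this section caused the assert.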