Breakpoint and reset in SoftDevice addresses 0xA60 and 0xA80

Hi,

Our application sometimes hits a breakpoint in SoftDevice at either address 0xA60 or (less often) 0xA80.

Based on this post https://devzone.nordicsemi.com/f/nordic-q-a/9622/application-debug-with-softdevice I believe it might be a SoftDevice assert related to the real-time BLE requirements. Is this correct? Could it be caused by an ISR taking too long, so that the SoftDevice cannot perform its BLE tasks?

Any help would be great, thanks.

Vidar Berg over 5 years ago

Hi,

0xA60 is where the hard fault handler is located in the MBR. Please disable the breakpoint on hard fault in the debugger settings (assuming you use SEGGER Embedded Studio, SES) and check whether the program then enters the hard fault handler in your application or ends up in the app error handler.
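If the program does reach the application's own hard fault handler, the Cortex-M4 Configurable Fault Status Register (CFSR, address 0xE000ED28, exposed as SCB->CFSR in CMSIS) indicates which kind of fault occurred. Below is a minimal, host-testable sketch that decodes a subset of the CFSR bits into names. The function cfsr_describe is hypothetical; on target, the input value would be read inside the fault handler and logged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Subset of ARMv7-M CFSR bits: MMFSR = bits 0-7, BFSR = bits 8-15, UFSR = bits 16-31. */
typedef struct {
    uint32_t mask;
    const char *name;
} fault_bit_t;

static const fault_bit_t k_cfsr_bits[] = {
    { 1u << 0,  "IACCVIOL" },    /* MemManage: instruction access violation */
    { 1u << 1,  "DACCVIOL" },    /* MemManage: data access violation */
    { 1u << 8,  "IBUSERR" },     /* BusFault: instruction bus error */
    { 1u << 9,  "PRECISERR" },   /* BusFault: precise data bus error */
    { 1u << 10, "IMPRECISERR" }, /* BusFault: imprecise data bus error */
    { 1u << 12, "STKERR" },      /* BusFault: error while stacking on exception entry */
    { 1u << 16, "UNDEFINSTR" },  /* UsageFault: undefined instruction */
    { 1u << 17, "INVSTATE" },    /* UsageFault: invalid state (e.g. Thumb bit clear) */
    { 1u << 24, "UNALIGNED" },   /* UsageFault: unaligned access (if trapping is enabled) */
    { 1u << 25, "DIVBYZERO" },   /* UsageFault: divide by zero (if trapping is enabled) */
};

/* Writes a space-separated list of the set fault bits into buf and
 * returns how many known bits were set. */
size_t cfsr_describe(uint32_t cfsr, char *buf, size_t buf_len)
{
    assert(buf != NULL && buf_len > 0);
    size_t found = 0;
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof k_cfsr_bits / sizeof k_cfsr_bits[0]; i++) {
        if (cfsr & k_cfsr_bits[i].mask) {
            size_t used = strlen(buf);
            snprintf(buf + used, buf_len - used, "%s%s",
                     found ? " " : "", k_cfsr_bits[i].name);
            found++;
        }
    }
    return found;
}
```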

PRvdO over 5 years ago in reply to Vidar Berg

Thanks for the quick response.

Do you know what is at address 0xA80?

The 0xA60 reset only seems to happen after about 3 hours, so it will take a while to reproduce. In the meantime, the 0xA80 reset happens when multiple BLE devices are connected. Each connected device sends data over BLE, which is forwarded over the UART. Could it be related to either of those?

Thanks again.

Vidar Berg over 5 years ago in reply to PRvdO

The MBR reset handler starts at 0xA80, so it looks like the device has just come out of a system reset. I suggest you check the RESETREAS register to find out what the reset source was.
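A minimal sketch of decoding a RESETREAS value, assuming the nRF52832/nRF52840 bit layout (RESETPIN = bit 0, DOG = bit 1, SREQ = bit 2, LOCKUP = bit 3, OFF = bit 16); check the product specification for your part. The register is cumulative and cleared by writing 1s to the bits, so several set bits mean several resets since it was last cleared, and a value of 0 indicates a power-on or brown-out reset. With the SoftDevice enabled, the POWER peripheral is restricted, so the value would typically be obtained with sd_power_reset_reason_get() and cleared with sd_power_reset_reason_clr(). The helper resetreas_to_string is hypothetical.

```c
#include <stdint.h>

/* RESETREAS bit positions for nRF52832/nRF52840 (assumption: verify for your device). */
#define RESETREAS_RESETPIN (1u << 0)   /* reset from the reset pin */
#define RESETREAS_DOG      (1u << 1)   /* watchdog timeout */
#define RESETREAS_SREQ     (1u << 2)   /* soft reset, e.g. NVIC_SystemReset() */
#define RESETREAS_LOCKUP   (1u << 3)   /* CPU lock-up */
#define RESETREAS_OFF      (1u << 16)  /* wake from System OFF via GPIO DETECT */

/* Returns a name for the first recognised reason. The watchdog is checked
 * first because it is the suspect in this thread. */
const char *resetreas_to_string(uint32_t resetreas)
{
    if (resetreas == 0u)                return "power-on/brown-out";
    if (resetreas & RESETREAS_DOG)      return "watchdog";
    if (resetreas & RESETREAS_LOCKUP)   return "lockup";
    if (resetreas & RESETREAS_SREQ)     return "soft reset";
    if (resetreas & RESETREAS_RESETPIN) return "reset pin";
    if (resetreas & RESETREAS_OFF)      return "wake from System OFF";
    return "other";
}
```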

PRvdO over 5 years ago in reply to Vidar Berg

So it seems the watchdog is the cause of the crash. However, the crash only occurs if the system is left idle for hours and we then connect a phone via BLE. We repeated the test with the watchdog disabled and the system runs fine (we were hoping it would get stuck in a loop that we could find without the watchdog, but it just keeps running).

Are you aware of anything that might change after a long period of time? We can leave the system for days and it works fine but as soon as we connect a phone it reboots.

Vidar Berg over 5 years ago in reply to PRvdO

Is the WD being fed during this idle period? Note that SoftDevice activity such as advertising events runs in the background without waking the application, but the CPU is still awake while it runs, so the WDT counter keeps running even if you have enabled the pause-on-sleep option.
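On the nRF52 WDT, pause-on-sleep is controlled by the SLEEP bit (bit 0) of the CONFIG register, and bit 3 (HALT) controls whether the counter runs while the CPU is halted by a debugger; for both, 1 = run and 0 = pause. The sketch below builds that value and adds a deliberately simplified model of the point above: with pause-on-sleep, only CPU-awake time counts, and SoftDevice events are awake time even though application code never runs. The function names and the model itself are illustrative assumptions, not driver code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WDT_CONFIG_SLEEP_RUN (1u << 0) /* 1: keep counting while the CPU sleeps, 0: pause */
#define WDT_CONFIG_HALT_RUN  (1u << 3) /* 1: keep counting while halted by the debugger, 0: pause */

/* Builds a WDT CONFIG register value from two flags. */
uint32_t wdt_config_value(bool run_in_sleep, bool run_in_halt)
{
    return (run_in_sleep ? WDT_CONFIG_SLEEP_RUN : 0u) |
           (run_in_halt ? WDT_CONFIG_HALT_RUN : 0u);
}

/* Simplified model with pause-on-sleep: awake_ticks[] holds the CPU-awake
 * intervals since the last feed (in 32.768 kHz ticks), including SoftDevice
 * events. Returns true if their sum reaches the timeout. */
bool wdt_would_expire(const uint32_t *awake_ticks, size_t n, uint32_t timeout_ticks)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += awake_ticks[i];
    }
    return total >= timeout_ticks;
}
```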

PRvdO over 5 years ago in reply to Vidar Berg

I believe the WDT is being fed, as we are using FreeRTOS and have some periodic tasks that run during this period. Currently, the watchdog itself is fed by a periodic task whose priority is set to the same level as the idle task.

That's good to know about it being incremented in the background though, thanks.
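Feeding from a task at idle priority means a feed only happens once every other task has finished its work, so any task that runs long delays it. A common alternative (a general pattern, not what the poster described) is a check-in scheme: each monitored task sets its own bit, and the feeder calls the real feed function, for example nrfx_wdt_channel_feed(), only once every bit has been set since the previous feed. The task IDs and function names below are hypothetical; on target, the mask would need atomic updates, for example via a FreeRTOS event group or a critical section.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical task IDs: one bit per task that must prove it is alive. */
enum { TASK_BLE = 0, TASK_UART = 1, TASK_APP = 2, TASK_COUNT = 3 };
#define ALL_TASKS_MASK ((1u << TASK_COUNT) - 1u)

typedef struct {
    uint32_t checked_in; /* bits set since the last feed */
} wdt_checkin_t;

/* Called by each monitored task from its main loop. */
void wdt_checkin(wdt_checkin_t *s, unsigned task_id)
{
    s->checked_in |= 1u << task_id;
}

/* Called periodically by the feeder task. Returns true if the hardware
 * watchdog should be fed now, and starts a new round in that case. */
bool wdt_should_feed(wdt_checkin_t *s)
{
    if ((s->checked_in & ALL_TASKS_MASK) == ALL_TASKS_MASK) {
        s->checked_in = 0;
        return true;
    }
    return false;
}
```

With this scheme the feeder no longer has to run at idle priority, while a stalled task still leads to a watchdog reset. The nRF52 WDT also offers a hardware form of the same idea: it has up to eight reload request (RR) registers and only reloads when every enabled one has been written, which the nrfx driver exposes as channels.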

Vidar Berg over 5 years ago in reply to PRvdO

OK, good to know that it is being reloaded. I understand it's time-consuming to run this test, but have you also tried extending the WD timeout? As for the CPU halting at 0xA80, I recently found out that the debugger will halt the CPU on reset if the "reset" breakpoint is enabled. The problem is then to find out what caused the reset in the first place. Maybe it is always the WD timing out.

Has this never happened with the WDT disabled?

PRvdO over 5 years ago in reply to Vidar Berg

It does seem to be a genuine watchdog reset. Without the watchdog the system runs fine, so it is not stuck; instead, it seems that a task is simply taking too long, so the watchdog is not kicked in time.

The weird thing is still that it only happens when a phone connects via BLE after the system has been on for a few hours, and not every time a phone connects, even though the same tasks run.

I wondered whether it was related to what you mentioned about the watchdog counting in the background, so I reduced the watchdog timeout and let it run, expecting it to keep counting until it overflowed and caused a reset, but it didn't.

So it does seem to be a problem in our code where something takes too long, which we can fix. However, I still don't understand why it only happens after some time. Any ideas?

Thank you!

Vidar Berg over 5 years ago in reply to PRvdO

In the failing cases, maybe the WD is already close to expiring when the connection event arrives, and the code responsible for reloading it is delayed just enough to cause a reset. Have you noticed any change in the failure rate if you increase the timeout and/or the frequency of the periodic tasks that reload the WD?
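The "already close to expiring" scenario can be written as simple arithmetic: if the feeder runs every P ms and work triggered by the connection delays it by D ms, the gap between two feeds can be up to P + D, and the watchdog can fire once that gap reaches the timeout T. A minimal sketch under those simplifying assumptions (strictly periodic feeding, a single blocking delay); the function names are hypothetical.

```c
#include <stdint.h>

/* Worst-case gap between two feeds: one full feed period plus the extra
 * delay while the feeder is blocked (e.g. by work after a BLE connection). */
uint32_t worst_case_feed_gap_ms(uint32_t feed_period_ms, uint32_t max_delay_ms)
{
    return feed_period_ms + max_delay_ms;
}

/* Margin left before the timeout; zero or negative means the watchdog
 * can fire before the next feed. */
int64_t wdt_margin_ms(uint32_t timeout_ms, uint32_t feed_period_ms, uint32_t max_delay_ms)
{
    return (int64_t)timeout_ms - (int64_t)worst_case_feed_gap_ms(feed_period_ms, max_delay_ms);
}
```

This is why, if this is the mechanism, increasing either the timeout or the feed frequency should change the failure rate.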

PRvdO over 5 years ago in reply to Vidar Berg

This was our worry too. Strangely, we can reduce the failure rate, but we are struggling to increase it. We have tested a few times with the watchdog timeout increased to 10 seconds (from 3 seconds), and the system works without crashing and doesn't hang anywhere. We also tried decreasing the watchdog timeout to 1 second, but the failure rate stays the same.

Is it possible to print the current value of the watchdog timer, so that we can see whether it is being fed and whether it is counting in the background as you said it might be?
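For the timeouts mentioned above (1 s, 3 s and 10 s), the nRF52 WDT timeout follows timeout [s] = (CRV + 1) / 32768, where CRV is the counter reload value in 32.768 kHz ticks. A minimal sketch of the conversion in both directions; the helper names are hypothetical, and the tick count is rounded up so the real timeout is never shorter than requested.

```c
#include <assert.h>
#include <stdint.h>

#define WDT_CLOCK_HZ 32768u /* the WDT counts 32.768 kHz low-frequency clock cycles */

/* CRV for a timeout in ms, from timeout = (CRV + 1) / 32768 s. */
uint32_t wdt_crv_from_ms(uint32_t timeout_ms)
{
    assert(timeout_ms > 0);
    uint64_t ticks = ((uint64_t)timeout_ms * WDT_CLOCK_HZ + 999u) / 1000u; /* round up */
    return (uint32_t)(ticks - 1u);
}

/* Inverse: actual timeout in microseconds (truncated) for a given CRV. */
uint64_t wdt_timeout_us(uint32_t crv)
{
    return ((uint64_t)crv + 1u) * 1000000u / WDT_CLOCK_HZ;
}
```

Because the timeout is quantised to 1/32768 s (about 30.5 µs), a request such as 10 ms maps to CRV 327, which is slightly longer than 10 ms.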

Vidar Berg over 5 years ago in reply to PRvdO

I've discussed this with a colleague who is experienced with FreeRTOS, and he thought the problem could be that the BLE task is not yielding to the thread responsible for feeding the WD. Have you done any profiling to see whether there is heavy CPU usage around the time a connection is established?

PRvdO over 5 years ago in reply to Vidar Berg

We have done power profiling, and it seems that the CPU usage does not change at the time we see the issue. If it were CPU usage, it would likely happen every time, not just after a prolonged idle period.

By reducing the watchdog timeout to about 10 ms instead of 3 seconds, we can reproduce the same problem much more quickly. With this shorter timeout, the system can work fine indefinitely while we connect and disconnect phones, send commands, etc. However, if we disconnect the phones, leave the system idle for about a minute, and then connect a single phone, the watchdog fires and reboots the system.

This makes it seem that either the watchdog is not being kicked properly or, as you said earlier, it keeps counting in the background and builds up over time until it is close to the overflow value. Then, when the system wakes and performs its actions, that CPU activity lasts long enough to push the watchdog past the limit before the system can go idle and kick it.

We use nrfx_wdt_channel_feed(m_channel_id) to kick the watchdog. Is this correct?

Is it possible to print the current watchdog timer value, so that we can see whether it actually increases over time and whether it is reloaded correctly?

Thanks.
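Independently of whether the watchdog counter itself can be read back, one way to see whether the watchdog is being fed on time is to instrument the feed path: record a timestamp at every feed, keep the largest gap seen, and log it (for example over the UART) when it gets close to the timeout. A minimal sketch, assuming a free-running millisecond timestamp is available (on FreeRTOS, for example, xTaskGetTickCount() converted to ms); the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t last_feed_ms; /* timestamp of the previous feed */
    uint32_t max_gap_ms;   /* largest interval seen between two feeds */
    bool     has_fed;      /* false until the first feed is recorded */
} feed_stats_t;

/* Call right next to the real feed, e.g. nrfx_wdt_channel_feed(m_channel_id). */
void feed_stats_record(feed_stats_t *s, uint32_t now_ms)
{
    if (s->has_fed) {
        uint32_t gap = now_ms - s->last_feed_ms; /* unsigned subtraction handles wrap-around */
        if (gap > s->max_gap_ms) {
            s->max_gap_ms = gap;
        }
    }
    s->last_feed_ms = now_ms;
    s->has_fed = true;
}

/* True when the worst observed gap exceeds the given percentage of the timeout. */
bool feed_stats_near_timeout(const feed_stats_t *s, uint32_t timeout_ms, uint32_t percent)
{
    return (uint64_t)s->max_gap_ms * 100u > (uint64_t)timeout_ms * percent;
}
```

Logging max_gap_ms periodically would show whether the gap slowly grows during the idle period or only jumps when the phone connects.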