
WDT kicking in; disabling it gives NRF_FAULT_ID_SD_ASSERT at PC 0xA62

The watchdog timer (WDT) is firing in our code base and I am having trouble tracking down the cause. In a bid to debug the issue I disabled the WDT, and now I am getting an NRF_FAULT_ID_SD_ASSERT with the PC at 0xA62.

At this point I'm stuck as to how to keep debugging the issue. Are there any common tricks for getting more info? I see a few posts with the same fault, but no reference I can check this value against.

We are using nRF5 SDK 17.1.0 and SoftDevice S140 7.2.0.

  • So, after some more testing:

    Our watchdog is actually fed by a repeated app timer handler function (that's something that needs looking at after this issue).

    What is happening is that when the buttons are pressed, we move into another state where we start another app timer (the idle timer). At this point the feed function isn't being called often enough: the WDT timeout is 2000 ms and we normally call the feed function every 50 ms, but after the idle timer is started the feed timer handler isn't called again for a little over 5 seconds. For this testing I've disabled the WDT so that I can dig more into the cause.

    We check the return value when starting a timer, both for NRF_ERROR_NO_MEM and for any non-zero result.

    While writing this I realised that the feed function timer handler isn't being called for the length of time that the idle timer is set for.
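The numbers in the post above can be checked with a small host-side model. This is only a sketch: the helper name `first_wdt_expiry_ms` and the sample feed timestamps are made up for illustration, and it assumes the timestamps are in ascending order and that the watchdog expires once the gap between two feeds reaches the timeout.

```c
#include <stddef.h>
#include <stdint.h>

#define WDT_TIMEOUT_MS 2000u       /* WDT timeout quoted in the post */
#define NO_EXPIRY      UINT32_MAX  /* returned when the WDT never expires */

/* Hypothetical helper: given ascending feed timestamps in milliseconds,
 * return the time at which the watchdog would first expire, or NO_EXPIRY
 * if every gap between consecutive feeds is shorter than the timeout. */
static uint32_t first_wdt_expiry_ms(const uint32_t *feed_ms, size_t n,
                                    uint32_t timeout_ms)
{
    for (size_t i = 1; i < n; i++) {
        if (feed_ms[i] - feed_ms[i - 1] >= timeout_ms) {
            /* The reset happens timeout_ms after the last successful feed. */
            return feed_ms[i - 1] + timeout_ms;
        }
    }
    return NO_EXPIRY;
}
```

With feeds at 0, 50 and 100 ms followed by a stall of roughly 5 s, the model puts the reset at 2100 ms, long before the heartbeat resumes, which fits the behaviour described.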

  • That seems about right: the WDT got triggered and the WDT ISR handler in the MBR (Master Boot Record) is stuck looping. Have you implemented the WDT ISR on the app side?

  • Hi,

    I am sorry for the delay. Have you made any more progress on this on your end?

    Is the issue you are facing now that the existing repeated timer does not fire until the new timer expires (so there seems to be a dependency between the timers)? Or did I misunderstand?

  • Hi Einar,

    I had not had a chance to look at this in detail until yesterday. It looks to be a result of the combination of multiple timers.

    When we start the motor (the initial button press), a timer is started that tracks the motor run time. When the specific button sequence occurs, it stops the tracking timer and starts the idle timer. It's at this point that the 50 ms timer is delayed (I also see it stop, but I think that may be related to my debugging).

    If I remove the call to stop the tracking timer, or remove the call to start the idle timer, the heartbeat continues normally.

    My understanding of what is occurring: the timers behave until the specific button sequence triggers the timer stop and start. The WDT interrupt continues to fire, and I see the app button timer continue to fire. Then, when the idle timer expires, the 50 ms timer kicks back in.

    I'm not sure how to debug any further at this point. Can you advise how to best move forward?

  • Hi,

    It is difficult to understand what could have triggered this, so I think we need to go back to debugging a bit. You write that you get NRF_FAULT_ID_SD_ASSERT. Is that in app_error_fault_handler()? And the PC you referred to as being 0xA62, where did you get that? Was it from reading the CPU registers, or from the pc parameter in app_error_fault_handler()? If it was the pc parameter, we need to understand what could have made it 0xA62, which is not expected. If not, can you check what the pc in app_error_fault_handler() is? If there was an assert in the SoftDevice, I can use the PC to identify which assert it was.
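One way to capture the values asked about above is to store the three arguments the fault handler receives so they can be read back in a debugger. The sketch below builds on a host, so it does not include the SDK headers: `fault_record_t`, `fault_record_save`, `fault_is_sd_assert` and `EXAMPLE_FAULT_ID_SD_ASSERT` are hypothetical names, and the constant's value is an assumed stand-in that should be replaced by the real NRF_FAULT_ID_SD_ASSERT from the SoftDevice headers on target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-in for NRF_FAULT_ID_SD_ASSERT so this builds on a host;
 * on target, use the definition from the SoftDevice headers instead. */
#define EXAMPLE_FAULT_ID_SD_ASSERT 0x00000001u

/* Hypothetical record of the last fault, inspectable from a debugger. */
typedef struct {
    uint32_t id;    /* fault identifier passed to the handler */
    uint32_t pc;    /* program counter passed to the handler */
    uint32_t info;  /* extra information passed to the handler */
    bool     valid; /* true once a fault has been recorded */
} fault_record_t;

static volatile fault_record_t g_last_fault;

/* Intended to be called first thing from app_error_fault_handler(id, pc, info). */
static void fault_record_save(uint32_t id, uint32_t pc, uint32_t info)
{
    g_last_fault.id    = id;
    g_last_fault.pc    = pc;
    g_last_fault.info  = info;
    g_last_fault.valid = true;
}

/* True if the recorded fault carries the (assumed) SoftDevice assert id. */
static bool fault_is_sd_assert(void)
{
    return g_last_fault.valid && g_last_fault.id == EXAMPLE_FAULT_ID_SD_ASSERT;
}
```

Reading `g_last_fault.pc` after a halt gives the handler's pc parameter directly, which avoids confusing it with whatever the CPU's PC register shows once execution has stopped inside the handler.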

  • Hi Einar,

    Yeah, sorry, I realise I've been jumping around a lot in my posts. I haven't had time to dedicate to the issue until recently, so it's been a rather scattergun approach.

    I had been seeing the assert after I had disabled the WDT, but I am no longer seeing that issue. I have tried replicating it without success, so I believe it was caused by modifications I made to the code in an effort to help identify the issue.

    The issue I am facing is that, in the intended release candidate, the WDT is kicking in under a specific test case (the button sequence mentioned in earlier posts). In an effort to locate the issue, what I have discovered is that I have an issue with the app timers.

    We feed the WDT from an app timer, which we've named the heartbeat; its callback is called every 50 ms. Under this specific test case, the heartbeat callback is delayed.

    Originally, triggering the issue required toggling the external user buttons in a certain sequence, which caused the WDT to time out. Disabling the WDT and retesting, I saw that the heartbeat timer was delayed by a length of time that matched the timeout of the next app timer.

    To replicate the issue, I now have an external micro toggling the inputs in a way that causes it. With the WDT enabled or disabled, the behaviour is the same as when triggering the issue manually.

    Once in this “waiting” state, if I interact with the system in any way that triggers another timer, the heartbeat timer starts triggering again. From this I assume the issue is related to app timer scheduling. This is where I need a pointer in the right direction on where to start poking under the hood to check where it's going wrong.
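To put a number on the heartbeat delay described above, one option is to record the largest gap between consecutive heartbeat callbacks in RTC ticks. The sketch below is host-buildable and illustrative only: `heartbeat_monitor_t`, `ticks_elapsed` and `heartbeat_monitor_update` are hypothetical names, and it assumes the tick value comes from a 24-bit counter such as the nRF52 RTC (on target it would presumably be read with the SDK's app timer counter getter, which is worth confirming against the SDK documentation).

```c
#include <stdbool.h>
#include <stdint.h>

#define RTC_COUNTER_MASK 0x00FFFFFFu /* 24-bit counter width assumed */

/* Hypothetical monitor state, zero-initialised before first use. */
typedef struct {
    uint32_t last_ticks;    /* counter value at the previous heartbeat */
    uint32_t max_gap_ticks; /* largest gap seen so far */
    bool     primed;        /* false until the first heartbeat is seen */
} heartbeat_monitor_t;

/* Wrap-safe difference between two 24-bit counter readings. */
static uint32_t ticks_elapsed(uint32_t now, uint32_t then)
{
    return (now - then) & RTC_COUNTER_MASK;
}

/* Call at the top of the heartbeat handler with the current counter value. */
static void heartbeat_monitor_update(heartbeat_monitor_t *m, uint32_t now_ticks)
{
    if (m->primed) {
        uint32_t gap = ticks_elapsed(now_ticks, m->last_ticks);
        if (gap > m->max_gap_ticks) {
            m->max_gap_ticks = gap;
        }
    }
    m->last_ticks = now_ticks;
    m->primed = true;
}
```

Comparing `max_gap_ticks` with the idle timer's timeout in ticks would show whether the stall really tracks the idle timer, without relying on breakpoints that themselves disturb the timing.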

  • Hi,

    I see. Could it be that you are seeing the same issue as discussed in this thread? It has not been resolved yet, but there are potential workarounds.
