WDT and Error Handler

Question

This ticket started as a mail thread, so I am adding it here so that no information is lost: 
 ============================================================================================================================ 
 Yes, I based my code on this sample, so I use exactly the same steps and variables (5-second timeout). I do not call wdt_feed() and it still works. I assumed that if we do not feed the dog, it would hang after 5 seconds. It does not do that, but I do see sporadic reboots within one hour. 
 As for the callback issue, I do use a debugger and it never enters the callback. 
 So a WDT driver that does not need feeding, has no working callback, and causes sporadic restarts looks like a broken implementation in Zephyr! 
 How can you fix this, and when? 
 ============================================================================================================================ 
 Regarding the watchdog, there is a nice sample showing how to use the API: 
 https://github.com/zephyrproject-rtos/zephyr/blob/master/samples/drivers/watchdog/src/main.c 
 Have you tried following it? 
 Basically, you need to call wdt_install_timeout(), then wdt_setup(), and then feed the watchdog with wdt_feed() within the time configured when installing the timeout. 
 When checking whether the watchdog callback function is called, remember that it is called two cycles of the 32 kHz clock before the reset, so there is very little time to do anything. If you want to set a breakpoint in this function, you will need to use the WDT_OPT_PAUSE_HALTED_BY_DBG option. 
 ============================================================================================================================ 
 I have two questions regarding the WDT and the error handler. 
 
 I have enabled the WDT in Zephyr and implemented its setup and enabling, and it seems to work. However, I do not use the wdt_feed() API and it still works, although I can get sporadic restarts within an hour or so. I also note that the callback function is not called when the watchdog has expired. 
 As for the error handler: there are many different fault handlers for different failures and exceptions, and I want a reset to be triggered when any of them is hit. Is there any support for resetting when we end up in any error handler?

jlz · Accepted Answer

So I sent a question 2 months ago stating that feeding was not used. According to my investigation, it behaved normally without kicking the dog. I assumed that this was handled in Zephyr, basically because it is best handled in the OS. Adding feeding everywhere in custom code where you think you might have an issue does not make sense. The aim is not to change anything in Zephyr, keeping it clean and easier to maintain. I strongly recommend that you implement this for Zephyr. 
 I have to test this before we know if it works.

angl · Answer

Hi, 
 jlz said: From my investigation, reading the TRM and the sample code and also testing the WDT, it seems that feeding is either not needed or is done by the OS when the WDT functions are enabled in Zephyr. 
 This is not true. You definitely need to feed the watchdog after you enable it. Otherwise, it will reset your system. And you actually observe this: 
 jlz said: When my code runs, it lasts for 2-4 hours, and if during this time I hang a process for 5 seconds, it will reboot. And this is expected behavior (I use a 5-second timeout). 
 When you do not hang a process, your code runs but, as you stated: 
 jlz said: 3. sleeps most of the time and just does sporadic activities. 
 and since you use the WDT_OPT_PAUSE_IN_SLEEP option, the watchdog timer does not count down while the CPU sleeps. Therefore, the watchdog will not time out after 5 seconds of overall time, but after 5 seconds of cumulative CPU activity time (the time during which the watchdog is instructed to run in your configuration). This includes not only the time in which your application tasks execute but also any interrupts handled in your system, scheduling, etc. Hence, what you describe as a "sporadic reset" after 2-4 hours is actually the reset caused by the watchdog because it was not fed. You can confirm this by adding some indication in the watchdog handler, for instance by changing a pin state there and monitoring the pin with a logic analyzer. Please note that the handler fires only two cycles of the 32.768 kHz clock (roughly 61 µs) before the watchdog reset, so there is very little time to do anything; if you, for example, turn on an LED in the handler, you probably won't notice it because the reset will turn it off almost immediately. 
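 To put rough numbers on this, here is a small host-side C model (not Zephyr driver code) of a watchdog that is never fed and whose counter only advances while the CPU is awake. The function name and the duty-cycle parameter awake_us_per_s are made up for illustration, and the duty-cycle figures are assumptions, not measurements from your system.

```c
#include <assert.h>
#include <stdint.h>

/* Model only: wall-clock seconds until a never-fed watchdog expires when its
 * counter advances only while the CPU is awake (the WDT_OPT_PAUSE_IN_SLEEP case).
 * awake_us_per_s is a hypothetical duty cycle: microseconds the CPU spends
 * awake (tasks, interrupts, scheduling) in each wall-clock second. */
uint64_t wall_seconds_until_reset(uint64_t timeout_ms, uint64_t awake_us_per_s)
{
    assert(awake_us_per_s > 0 && awake_us_per_s <= 1000000);

    uint64_t timeout_us = timeout_ms * 1000;

    /* Round up: the reset lands in the second in which the budget runs out. */
    return (timeout_us + awake_us_per_s - 1) / awake_us_per_s;
}
```

 With a 5000 ms timeout, a CPU that is always awake resets after 5 s, while one that is awake for only 500 µs in each second (an assumed figure) would use up the same 5 s budget after 10000 s, which is about 2.8 hours and in the range you reported.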
 And the above is the answer to your question: 
 jlz said: So back to my OP question. Why does the system reboot within 2-4 h? 
 What to do to prevent these reboots? Feed the watchdog. Where to do it? That depends heavily on your application. If, for instance, you have a task that mostly sleeps but is woken up at least once per 5 seconds (or whatever you configure the watchdog to) of CPU activity time, it will be sufficient to place the feeding there. If the situation is more complicated and it is possible that only some event handler runs for a longer time without involving the task (still counted in cumulative CPU activity time), you'll need to add the feeding in this handler as well. The key thing is to feed the watchdog only as a confirmation that the vital parts of your application are executing as expected.
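 One possible way to structure "feed only as a confirmation" is sketched below in plain C. It is a sketch, not the Zephyr API: the names report_alive(), supervisor_poll(), feed_hw_watchdog() and the two ALIVE_* flags are hypothetical. Each vital part checks in, and the watchdog is fed only once all of them have done so.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical vital parts of the application, one bit each. */
#define ALIVE_SENSOR_TASK   (1u << 0)
#define ALIVE_RADIO_HANDLER (1u << 1)
#define ALIVE_ALL           (ALIVE_SENSOR_TASK | ALIVE_RADIO_HANDLER)

static uint32_t alive_mask;  /* check-ins collected since the last feed */
static unsigned feed_count;  /* number of feeds, kept here for inspection */

/* Stand-in for the hardware feed; on Zephyr this is where wdt_feed()
 * would be called with the device and channel from wdt_install_timeout(). */
static void feed_hw_watchdog(void)
{
    feed_count++;
}

/* Called by each vital task or handler after it has done its work. */
void report_alive(uint32_t who)
{
    alive_mask |= who;
}

/* Called periodically; feeds only if every vital part has checked in. */
bool supervisor_poll(void)
{
    if ((alive_mask & ALIVE_ALL) != ALIVE_ALL) {
        return false;        /* someone is missing: let the watchdog run */
    }
    alive_mask = 0;          /* start a new round of check-ins */
    feed_hw_watchdog();
    return true;
}
```

 In a real application the check-ins would come from different threads or interrupt handlers, so the mask updates would need to be atomic, and supervisor_poll() would have to run more often than the configured timeout.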
