WDT and Error Handler

This ticket started as a mail thread, so I am adding it here so that no information is lost:

============================================================================================================================ 

Yes, I based my code on this sample code, so I use the exact same steps and variables (5-second timeout). I do not call wdt_feed() and it still works. I assumed that if we do not feed the dog, it would hang after 5 seconds. It does not do that, but I do see sporadic reboots within one hour.

For the callback issue, I do use a debugger, and it never enters the callback.

So a WDT driver that does not need feeding, has no working callback, and causes sporadic restarts seems to be a broken implementation in Zephyr!

How can you fix this and when?

============================================================================================================================

Regarding the watchdog - there is a nice sample presenting how to use the API:

https://github.com/zephyrproject-rtos/zephyr/blob/master/samples/drivers/watchdog/src/main.c

Have you tried to follow this one?

Basically you need to call wdt_install_timeout(), then wdt_setup(), and then feed the watchdog with wdt_feed() within the time configured when installing the timeout.

And when checking whether the watchdog callback function is called, remember that it is called two cycles of the 32 kHz clock before the reset, so there is very little time to do anything. If you want to set a breakpoint in this function, you'll need to use the WDT_OPT_PAUSE_HALTED_BY_DBG option.
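
To put "two cycles of the 32 kHz clock" into numbers, here is a rough calculation. It is only a sketch: it assumes the low-frequency clock runs at 32.768 kHz, and the 64 MHz CPU clock used in the note below is an illustrative assumption that is not stated in this thread. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Time between the watchdog interrupt and the reset, in microseconds.
 * lf_cycles: low-frequency clock cycles before the reset (2, per the text above).
 * lf_hz:     low-frequency clock rate in Hz (32768 assumed). */
double wdt_callback_budget_us(uint32_t lf_cycles, uint32_t lf_hz)
{
    assert(lf_hz > 0);
    return (double)lf_cycles * 1e6 / (double)lf_hz;
}

/* Approximate number of CPU cycles that fit into that budget.
 * cpu_hz is an assumption chosen by the caller (for example 64 MHz). */
uint32_t wdt_callback_budget_cpu_cycles(uint32_t lf_cycles, uint32_t lf_hz,
                                        uint32_t cpu_hz)
{
    assert(lf_hz > 0);
    return (uint32_t)(((uint64_t)lf_cycles * cpu_hz) / lf_hz);
}
```

Under these assumptions, wdt_callback_budget_us(2, 32768) is about 61 µs, and wdt_callback_budget_cpu_cycles(2, 32768, 64000000) is 3906 cycles: enough to toggle a pin or store a flag, but little else.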

============================================================================================================================

I have two questions regarding the WDT and the error handler.

  1. I have enabled the WDT in Zephyr and implemented its setup and enabling, and it seems to work, but …
    I do not use the wdt_feed() API and it still works, although I get sporadic restarts within an hour or so. I also note that the callback function is not called when the watchdog has expired.
  2. For the error handler: there are many different fault handlers for different failures and exceptions, and I want a reset to happen whenever one of them is hit. Is there any support for resetting when we end up in any error handler?
  • Can you try the following patch? It attempts to reduce the latency of watchdog interrupts:

    diff --git a/drivers/watchdog/wdt_nrfx.c b/drivers/watchdog/wdt_nrfx.c
    index 4297789c6a..2200441ae7 100644
    --- a/drivers/watchdog/wdt_nrfx.c
    +++ b/drivers/watchdog/wdt_nrfx.c
    @@ -197,7 +197,7 @@ static int init_wdt(struct device *dev)
                    return -EBUSY;
            }
    
    -       IRQ_CONNECT(CONFIG_WDT_NRF_IRQ, CONFIG_WDT_NRF_IRQ_PRI,
    +       IRQ_DIRECT_CONNECT(CONFIG_WDT_NRF_IRQ, 0,
                        nrfx_isr, nrfx_wdt_irq_handler, 0);
            irq_enable(CONFIG_WDT_NRF_IRQ);
    

  • I can but please do the verification on your end. I have no time to test your platform.

  • Hi,

    I was asked to take a look at this thread as Håkon is out of office for some time. It's a bit hard to follow, so can you please confirm:

    1. You have enabled the watchdog and configured it with WDT_OPT_PAUSE_IN_SLEEP, so it should not run while the CPU sleeps.

    2. You are not feeding it?

    3. This leads to sporadic resets (i.e. resets after 1-3 hours when the application does nothing)?

    If you enable the watchdog but do not feed it, it will eventually reset the device. When the reset happens depends on your configuration. If you configured it to pause while the CPU is sleeping, it will still run while the Zephyr OS runs in the background at a higher interrupt level. So your application will be asleep (no running tasks), but the watchdog will run while the OS operations run. The varying reset time therefore depends on how much activity there is in the OS, even if your application does nothing. Because of this, you need to set up a recurring task that feeds the watchdog before it times out.
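
The dependence of the reset time on background activity can be estimated with a very simple model. This is only a sketch: it assumes the watchdog counts exclusively while the CPU is awake and that the CPU is awake for a constant fraction of wall-clock time. The function names are hypothetical, and the 5-second timeout is the value mentioned earlier in this thread.

```c
#include <assert.h>

/* Expected wall-clock seconds until an unfed watchdog expires, when its
 * counter advances only during the awake fraction of the time. */
double wdt_wall_seconds_to_reset(double timeout_s, double awake_fraction)
{
    assert(timeout_s > 0.0);
    assert(awake_fraction > 0.0 && awake_fraction <= 1.0);
    return timeout_s / awake_fraction;
}

/* Inverse view: the awake fraction implied by an observed time to reset. */
double wdt_implied_awake_fraction(double timeout_s, double observed_wall_s)
{
    assert(timeout_s > 0.0);
    assert(observed_wall_s >= timeout_s);
    return timeout_s / observed_wall_s;
}
```

With a 5-second timeout, a reset after 1 hour would correspond to the CPU being awake about 0.14 % of the time, and a reset after 3 hours to about 0.046 %. A mostly idle system with only occasional interrupts and scheduling activity could plausibly land in that range.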

  • 1. yes

    2. No, I do not, and this information was provided in the OP.

    3. It sleeps most of the time and only does sporadic activities.

    From my investigation (reading the TRM and the sample code, and testing the WDT), it seems that feeding is not needed, or is handled by the OS, when the WDT functions are enabled in Zephyr. My code runs for 2-4 hours, and if I hang a process for 5 seconds during this time, it reboots. This is the expected behavior (I use a 5-second timeout). If there were any issue with feeding, wouldn't it reset within 5 seconds?

  • You need to feed the watchdog. Since you are only doing sporadic activities, you are sleeping most of the time, which means the watchdog is halted. Because the watchdog only runs while the CPU is running, it can take hours before it resets. How long it takes depends on how much time your application spends in "sleep". If you had a CPU-intensive application, the watchdog would reset your device much sooner. I would recommend setting up a task that feeds the watchdog every 4 seconds or so.
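
The effect of a periodic feed can be illustrated with a small host-side simulation. It is a sketch, not the driver's behavior: it assumes a watchdog that counts only while the CPU is awake, a CPU that is awake for the first awake_ms of every period_ms, and a feed that simply reloads the counter. All names and the example duty cycle are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Simulate a pause-in-sleep watchdog in 1 ms steps.
 * Returns the wall-clock time in ms at which the watchdog expires,
 * or -1 if it survives until horizon_ms.
 * feed_period_ms == 0 means the watchdog is never fed. */
int64_t sim_wdt_reset_time_ms(uint32_t timeout_ms, uint32_t period_ms,
                              uint32_t awake_ms, uint32_t feed_period_ms,
                              int64_t horizon_ms)
{
    assert(timeout_ms > 0 && period_ms > 0 && awake_ms <= period_ms);

    uint32_t counted_ms = 0; /* awake time accumulated since the last feed */

    for (int64_t now = 0; now < horizon_ms; now++) {
        if (feed_period_ms != 0 && now % feed_period_ms == 0) {
            counted_ms = 0; /* feeding reloads the counter */
        }
        if ((now % period_ms) < awake_ms) { /* CPU awake: watchdog counts */
            counted_ms++;
            if (counted_ms >= timeout_ms) {
                return now + 1;
            }
        }
    }
    return -1;
}
```

In this model, with a 5000 ms timeout and a CPU that is awake 1 ms out of every 1000 ms, an unfed watchdog expires after 4,999,001 ms (about 83 minutes) of wall-clock time. A CPU that never sleeps expires after exactly 5000 ms, and feeding every 4000 ms prevents the reset in both cases.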

  • Hi,

    "From my investigation (reading the TRM and the sample code, and testing the WDT), it seems that feeding is not needed, or is handled by the OS, when the WDT functions are enabled in Zephyr."

    This is not true. You definitely need to feed the watchdog after you enable it. Otherwise, it will reset your system. And you actually observe this:

    "My code runs for 2-4 hours, and if I hang a process for 5 seconds during this time, it reboots. This is the expected behavior (I use a 5-second timeout)."

    When you do not hang a process, your code runs, but as you stated:

    "3. It sleeps most of the time and only does sporadic activities."

    and since you use the WDT_OPT_PAUSE_IN_SLEEP option, the watchdog timer does not count down during the CPU sleep periods. Therefore, the watchdog will not time out after 5 seconds of overall time, but after 5 seconds of cumulative CPU activity time (the time during which the watchdog is allowed to run in your configuration). This includes not only the time in which your application tasks execute but also any interrupts handled in your system, scheduling, etc. Hence, what you describe as a "sporadic reset" after 2-4 hours is actually the reset caused by the watchdog because it was not fed. You can confirm this by adding some indication in the watchdog handler, for instance by changing a pin state in the handler and monitoring this pin with a logic analyzer. (Please note that the handler fires only two cycles of the 32.768 kHz clock before the watchdog reset, so there is very little time to do anything; if you, for example, turn on an LED in the handler, you probably won't notice it because the reset will turn it off almost immediately.)

    And the above is the answer to your question:

    "So back to my OP question. Why does the system reboot within 2-4 h?"

    What to do to prevent these reboots? Feed the watchdog. Where to do it? That depends heavily on your application. If, for instance, you have a task that mostly sleeps but is woken up at least once per 5 seconds of CPU activity time (or whatever you configure the watchdog timeout to), it is sufficient to place the feeding there. If the situation is more complicated and it is possible that only some event handler executes for a longer time without involving the task (still measured in cumulative CPU activity time), you'll need to add the feeding in this handler as well. The key thing is to feed the watchdog only as confirmation that the vital parts of your application are executing as expected.
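
One common way to feed "only as confirmation" is to let each vital part check in and to feed only once all of them have done so. The following is a minimal host-side sketch of that idea, not Zephyr code: struct wdt_gate, its functions, and fake_feed() are hypothetical names, and on target the feed hook would wrap the real wdt_feed() call. The sketch is not interrupt-safe; if check-ins can come from interrupt handlers, the bitmask updates would need atomic operations or a lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*feed_fn_t)(void);

struct wdt_gate {
    uint32_t required; /* bitmask of parts that must check in */
    uint32_t seen;     /* parts that have checked in since the last feed */
    feed_fn_t feed;    /* hypothetical hook that performs the actual feed */
};

void wdt_gate_init(struct wdt_gate *g, uint32_t required, feed_fn_t feed)
{
    assert(g != NULL && feed != NULL && required != 0);
    g->required = required;
    g->seen = 0;
    g->feed = feed;
}

/* Called by a task or event handler after it has done its real work.
 * Feeds the watchdog only when every required part has checked in.
 * Returns true if this call triggered a feed. */
bool wdt_gate_checkin(struct wdt_gate *g, uint32_t part_bit)
{
    g->seen |= (part_bit & g->required);
    if (g->seen != g->required) {
        return false;
    }
    g->seen = 0;
    g->feed();
    return true;
}

/* Host-side stand-in for the real feed call, used only for illustration. */
unsigned int fake_feed_count;

void fake_feed(void)
{
    fake_feed_count++;
}
```

With required set to 0x3, repeated check-ins from part 0x1 alone never feed the watchdog; the feed happens only once part 0x2 has also reported. A single part that is stuck therefore still leads to a reset, even if the rest of the system keeps running.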
