
DFU and Watchdog timer (WDT) reset

As many others have found (see the answered questions here and here, and other posts), DFU will exit prematurely if your application has enabled the WDT, because the current Nordic SDK Buttonless DFU does not take the WDT into account. I hope that Nordic recognizes this flaw and provides some flexibility for using the WDT with DFU in the next SDK release.

We fixed this issue in SDK 12.x.x and 13 with a hack similar to this post. The issue came up again when we switched to SDK 14, which seems to add a new feature that puts the processor to sleep in DFU mode when there are no events, using a call to

sd_app_evt_wait() 

in

static void wait_for_event()
{
    while (true)
    {
        /* if the watchdog is enabled/running then kick it  */
        if ((bool)(NRF_WDT->RUNSTATUS) == true)
        {
            NRF_WDT->RR[0] = WDT_RR_RR_Reload;
        }

        app_sched_execute();
        if (!NRF_LOG_PROCESS())
        {
        #ifdef BLE_STACK_SUPPORT_REQD
            (void)sd_app_evt_wait();
        #else
            __WFE();
        #endif
        }
    }
}

Now, if the WDT was not configured to pause while the processor sleeps, this causes a WDT reset whenever no event arrives within the time remaining before the WDT times out.
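
Whether the watchdog keeps counting during sleep is decided by the SLEEP bit of the WDT CONFIG register. As a rough sketch, assuming the nRF52 layout where bit 0 is SLEEP and bit 3 is HALT (check the product specification for your part), the bootloader could test that bit to decide whether a periodic kick is needed at all. The helper name and the local mask macros are made up for illustration; on target you would pass &NRF_WDT->CONFIG and use the mask definitions from the device header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit positions of the nRF52 WDT CONFIG register (hypothetical local names). */
#define SKETCH_WDT_CONFIG_SLEEP_MSK (1UL << 0) /* 1 = keep running while the CPU sleeps */
#define SKETCH_WDT_CONFIG_HALT_MSK  (1UL << 3) /* 1 = keep running while halted by a debugger */

/* Returns true if the WDT keeps counting while the CPU sleeps in
 * sd_app_evt_wait() or __WFE(), i.e. a periodic kick is required. */
static bool wdt_runs_during_sleep(const volatile uint32_t *p_config)
{
    assert(p_config != NULL);
    return (*p_config & SKETCH_WDT_CONFIG_SLEEP_MSK) != 0;
}
```

If this returns false, the sleep in wait_for_event() cannot by itself cause a watchdog reset, although long-running code paths still can.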

My proposed fix is to set up another timer that wakes the processor frequently and kicks the watchdog in case there are no DFU events for a long time (i.e. longer than the remaining WDT ticks).

Add this at the end of timers_init() in nrf_dfu.c:

/* nrf_dfu_wdt_kick_timer_id must be defined at file scope with APP_TIMER_DEF() */
app_timer_create(&nrf_dfu_wdt_kick_timer_id,
                 APP_TIMER_MODE_REPEATED,
                 wdt_kick_timer_handler);

The timeout handler for the timer, which simply kicks the dog:

static void wdt_kick_timer_handler (void * p_context)
{
    /* if the watchdog is enabled/running then kick it  */
    if ((bool)(NRF_WDT->RUNSTATUS) == true)
    {
        NRF_WDT->RR[0] = WDT_RR_RR_Reload;
    }
}

Start the timer in nrf_dfu_init():

ret_val = app_timer_start(nrf_dfu_wdt_kick_timer_id,
                          APP_TIMER_TICKS(WDT_KICK_TIMEOUT_MS),
                          NULL);
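
The value of WDT_KICK_TIMEOUT_MS is not given above. One plausible way to choose it, sketched here, is to derive it from the reload value the application programmed: on nRF52 parts the watchdog counts down from CRV at 32.768 kHz, so the timeout is (CRV + 1) / 32768 seconds. The helper names and the choice of kicking at half the timeout are assumptions for illustration, not something taken from the SDK.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_WDT_CLOCK_HZ 32768ULL /* WDT runs from the 32.768 kHz low-frequency clock */

/* Convert a WDT reload value (as read from NRF_WDT->CRV on target) to milliseconds. */
static uint32_t wdt_timeout_ms_from_crv(uint32_t crv)
{
    return (uint32_t)((((uint64_t)crv + 1ULL) * 1000ULL) / SKETCH_WDT_CLOCK_HZ);
}

/* Pick a kick interval with a 2x safety margin (an arbitrary, illustrative choice). */
static uint32_t wdt_kick_interval_ms(uint32_t crv)
{
    uint32_t timeout_ms = wdt_timeout_ms_from_crv(crv);

    assert(timeout_ms >= 2); /* a shorter timeout leaves no room for a timer-based kick */
    return timeout_ms / 2;
}
```

For example, a reload value of 65535 corresponds to a 2000 ms timeout, which this sketch would turn into a 1000 ms kick interval.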

A completely different approach would be to set the WDT behavior to NRF_WDT_BEHAVIOUR_PAUSE_SLEEP_HALT when initializing it. This is supposed to pause the WDT while the processor is in sleep or halt mode. I haven't tried this setting, because it has the caveat of applying to the application as well: if for some reason no tasks in my application run, the processor stays in sleep mode and the WDT stays paused, so no reset occurs, which defeats the purpose of the WDT.

P.S. Although I managed to fix the WDT reset issue by kicking the dog in DFU, DFU sometimes fails prematurely in the middle of the package transfer. I haven't figured out the cause of that issue yet.

Any comments or suggestions would be appreciated.

  • Hi Farhang,

    The use of the WDT is application specific, and it is difficult to create a generic solution that fits all cases, e.g. the number of RR registers enabled and the WDT configuration (Sleep or Halt). Hence, we have left it to the application developer to feed the WDT in the bootloader.

    That said, I agree that we should at a minimum state in the bootloader documentation that if the WDT is enabled by the application, then it must be fed by the bootloader during the DFU process to prevent the device from resetting prematurely. I also think we should point out where in the bootloader code it must be fed, e.g. by adding the following comment in wait_for_event():

    /* 
       WDT: If the Watchdog Timer has been enabled by the application, 
            then it must be fed here 
    */ 
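
Because the number of enabled RR registers varies between applications, and the watchdog is only reloaded once every enabled register has been written, a feed routine for the bootloader may need to cover more than RR[0]. A possible version is sketched below. It uses a small stand-in struct (sketch_wdt_t, hypothetical) so that it can be compiled off-target; on the device the same loop would operate on NRF_WDT, whose RUNSTATUS, RREN and RR fields it mirrors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_WDT_RR_COUNT  8            /* nRF52 WDT has eight reload request registers */
#define SKETCH_WDT_RR_RELOAD 0x6E524635UL /* value of WDT_RR_RR_Reload in the device header */

/* Stand-in for the relevant part of NRF_WDT_Type (hypothetical, for off-target builds). */
typedef struct
{
    volatile uint32_t RUNSTATUS;               /* 1 while the watchdog is running */
    volatile uint32_t RREN;                    /* bit n set => RR[n] is enabled */
    volatile uint32_t RR[SKETCH_WDT_RR_COUNT]; /* reload request registers */
} sketch_wdt_t;

/* Feed every enabled reload register; returns how many were written. */
static unsigned wdt_feed_all(sketch_wdt_t *p_wdt)
{
    unsigned fed = 0;

    assert(p_wdt != NULL);
    if (p_wdt->RUNSTATUS == 0)
    {
        return 0; /* watchdog not started by the application, nothing to do */
    }
    for (unsigned i = 0; i < SKETCH_WDT_RR_COUNT; i++)
    {
        if (p_wdt->RREN & (1UL << i))
        {
            p_wdt->RR[i] = SKETCH_WDT_RR_RELOAD;
            fed++;
        }
    }
    return fed;
}
```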
    

    Your solution looks good and I think that we can suggest it to other users that experience this issue.

    Best regards

    Bjørn

  • I have also had trouble with the WDT and the DFU process. I’m currently migrating from SDK13 to SDK14, which involves updating the BL+SD and then the APP (all via UART without HWFC at 9600 baud).

    Although I had already applied the WDT reset modification to wait_for_event() in the SDK13 bootloader programmed on the production units, I found that the WDT would still interfere with the bootloader during the BL+SD DFU. After the BL+SD image is transferred, the bootloader performs a soft reset and then runs nrf_dfu_mbr_copy_sd() (which ultimately runs sd_mbr_command()). This function appears to take between 600 ms and 800 ms (and possibly longer sometimes…). During this time, no app timer callbacks are invoked, as the app timer is actually polled rather than interrupt driven. Thus the WDT feed in wait_for_event() is not reached during that time either.

    The solution to my problem involved increasing the WDT period to 4 seconds. I also do a further step of updating to a temporary SDK13 app which has the WDT disabled before doing the SDK14 update (BL+SD and then SDK14 app). Let me repeat:

    1. Upload a temporary app which does not enable the WDT
    2. Increase WDT in production app to 4 seconds.
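
To put numbers on the 4 second choice: assuming the nRF52 relation timeout = (CRV + 1) / 32768 s, the sketch below computes the reload value for a given timeout and checks it against a worst-case blocking time such as the 600 ms to 800 ms observed for the copy step. The function names and the margin factor are illustrative assumptions, not SDK APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reload value (CRV) for a timeout in milliseconds at 32768 ticks per second. */
static uint32_t wdt_crv_from_ms(uint32_t timeout_ms)
{
    uint64_t ticks = ((uint64_t)timeout_ms * 32768ULL) / 1000ULL;

    assert(ticks >= 1ULL && ticks <= 0x100000000ULL); /* CRV is a 32-bit register */
    return (uint32_t)(ticks - 1ULL);
}

/* True if the timeout exceeds the blocking time by at least margin_factor. */
static bool wdt_covers_blocking_call(uint32_t timeout_ms,
                                     uint32_t blocking_ms,
                                     uint32_t margin_factor)
{
    return (uint64_t)timeout_ms >= (uint64_t)blocking_ms * margin_factor;
}
```

With these definitions, a 4000 ms timeout maps to a reload value of 131071 and leaves a 5x margin over an 800 ms blocking call.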

    Note 1: I highly recommend that you do something similar when upgrading. If the WDT resets the bootloader during the nrf_dfu_mbr_copy_sd() function this will brick your device. The bootloader becomes corrupt and the only way to recover from this is to reflash the bootloader via a programmer. I’ve had to throw away some production units as they were sealed and could only be updated via the DFU.

    Note 2: I also initially added an app timer which calls a function to reset the WDT in the bootloader. However, it should stop resetting the WDT after 10 minutes or so (or 2x how long you expect to stay in the bootloader). Otherwise, you’ll get into the situation where your WDT is not able to recover your bootloader from actual program corruption. When I found out that the app timer callbacks were actually being polled, I changed the WDT callback to be invoked via an interrupt generated by TIMER1 with an interrupt priority of APP_IRQ_PRIORITY_HIGH. This still did not fix the problem, and the WDT was still bricking the units during the BL+SD DFU! I suspect that sd_mbr_command() is an “atomic function” in which global interrupts are disabled for the 600 ms to 800 ms it takes to run. Although I’ve left this in my bootloader, I have not had any further issues after implementing my aforementioned solution.
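
The time limit on feeding described in Note 2 could look roughly like the following. The sketch counts kick callbacks and stops allowing feeds once a budget is used up, so that a hung bootloader is eventually reset. The names (kick_budget_t, kick_budget_allow) and the numbers are illustrative assumptions; on target, the call would sit inside the app timer or TIMER1 handler and guard the write to NRF_WDT->RR[0].

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct
{
    uint32_t period_ms;  /* interval between kick callbacks */
    uint32_t budget_ms;  /* total time for which feeding is allowed */
    uint32_t elapsed_ms; /* time accounted for so far */
} kick_budget_t;

/* Call once per kick callback. Returns true while the WDT may still be fed. */
static bool kick_budget_allow(kick_budget_t *p_budget)
{
    assert(p_budget != NULL);
    if (p_budget->elapsed_ms >= p_budget->budget_ms)
    {
        return false; /* budget spent: let the watchdog expire and reset the device */
    }
    p_budget->elapsed_ms += p_budget->period_ms;
    return true;
}
```

A budget of 600000 ms with a 1000 ms period would match the "10 minutes or so" mentioned above. Whether the budget should be reset when a new transfer starts is a design decision this sketch leaves open.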

  • If your APP has such a short timeout, why don't you just stop feeding the WDT in the APP until it resets the board, instead of doing a soft reset? After a WDT timeout reset the WDT is disabled, and you can reconfigure it in the bootloader with a longer timeout if needed. Or just keep it disabled.

  • @Neo, this is another way to do it. We initially thought about this but decided against it as it removes the ability for the WDT to get it out of an error state while in the DFU mode. However, when it comes to the BL+SD update, we don't have a choice - if something goes wrong here it will brick the device. The solution I described minimizes the risks for us and has been working well in production (100 or so high value units).

  • Hi,

    SDK 17.0.1 also has this problem. Do you have a solution?

    Best regards
