Issue when application that uses TASK WDT gets a FOTA with an application that uses simple WDT

Dear All,

Currently I am developing an application that uses the TASK WDT, on nRF Connect SDK v2.1.0. The application itself works fine. When I tried to do a FOTA, even though the task WDT channel related to the FOTA operation was fed in time, I was getting a watchdog reset in the middle of the operation without any fault message. This was with the default Kconfig settings and the hardware fallback enabled. My task WDT callback looks like this:

/* Callback for Task WDT */
void task_wdt_callback(int channel_id, void* user_data)
{
    task_wdt_t* handle = user_data; // Cast to struct

    /* We have WDT HW fallback to account for, no time to waste */
    LOG_ERR("**** Timeout on ID %u with name %s ****", channel_id, handle->channel_name);
    while (LOG_PROCESS())
        ; // Print all remaining logging
    LOG_PANIC();

    /* Can retry a few times. because we have the ID... Keep a log of which modules timeout? */

    /* Can try to save in persistent memory */

    /* Reboots due to HW WDT Fallback, default delay needed to finish printing */

    /* In case of WDT HW fallback failure, still do a reboot */
    // NVIC_SystemReset(); // Reset reason will be SREQ ...
}


So I was expecting to see the LOG message. Then I realised that it might be that the hardware WDT delay was too short.


After some tweaking I got this setup to be working well:

CONFIG_WATCHDOG=y
CONFIG_WDT_DISABLE_AT_BOOT=n
CONFIG_TASK_WDT=y
CONFIG_TASK_WDT_CHANNELS=15
CONFIG_TASK_WDT_MIN_TIMEOUT=10000
CONFIG_TASK_WDT_HW_FALLBACK=y
CONFIG_TASK_WDT_HW_FALLBACK_DELAY=1000


After fixing this problem I got to a new one. The FOTA file that I am trying to apply was made on nRF Connect SDK v1.6.0 and it is running the regular, hardware WDT.

Once my device downloads the file and tries to apply it, I see the boot banner of that application. Sometimes it even starts running for a short amount of time, and then the device resets with DOG as the reset cause, without any obvious sign as to why that happens.

I have tried compiling both projects with

CONFIG_WDT_DISABLE_AT_BOOT=y

but this did not make a difference.


Once the download is complete this is what I see:

[00:02:52.891,357] <inf> dfu_target_mcuboot: MCUBoot image-0 upgrade scheduled. Reset device to apply
[00:02:52.914,001] <inf> sodaq_nrf9160_fota: FOTA Completed, rebooting in 5s
*** Booting Zephyr OS build v3.1.99-ncs1  ***
I: Starting bootloader
I: Primary image: magic=good, swap_type=0x4, copy_done=0x1, image_ok=0x1
I: Secondary image: magic=good, swap_type=0x2, copy_done=0x3, image_ok=0x3
I: Boot source: none
I: Swap type: test
I: Starting swap using move algorithm.
I: Bootloader chainload address offset: 0x10000
I: Jumping to the first image slot*** Booting Zephyr OS build v2.6.0-rc1-ncs1  ***
Flash regions		Domain		Permissions
00 03 0x00000 0x20000 	Secure		rwxl
04 31 0x20000 0x100000 	Non-Secure	rwxl

Non-secure callable region 0 placed in flash region 3 with size 32.

SRAM region		Domain		Permissions
00 07 0x00000 0x10000 	Secure		rwxl
08 31 0x10000 0x40000 	Non-Secure	rwxl

Peripheral		Domain		Status
00 NRF_P0               Non-Secure	OK
01 NRF_CLOCK            Non-Secure	OK
02 NRF_RTC0             Non-Secure	OK
03 NRF_RTC1             Non-Secure	OK
04 NRF_NVMC             Non-Secure	OK
05 NRF_UARTE1           Non-Secure	OK
06 NRF_UARTE2           Secure		SKIP
07 NRF_TWIM2            Non-Secure	OK
08 NRF_SPIM3            Non-Secure	OK
09 NRF_TIMER0           Non-Secure	OK
10 NRF_TIMER1           Non-Secure	OK
11 NRF_TIMER2           Non-Secure	OK
12 NRF_SAADC            Non-Secure	OK
13 NRF_PWM0             Non-Secure	OK
14 NRF_PWM1             Non-Secure	OK
15 NRF_PWM2             Non-Secure	OK
16 NRF_PWM3             Non-Secure	OK
17 NRF_WDT              Non-Secure	OK
18 NRF_IPC              Non-Secure	OK
19 NRF_VMC              Non-Secure	OK
20 NRF_FPU              Non-Secure	OK
21 NRF_EGU1             Non-Secure	OK
22 NRF_EGU2             Non-Secure	OK
23 NRF_DPPIC            Non-Secure	OK
24 NRF_REGULATORS       Non-Secure	OK
25 NRF_PDM              Non-Secure	OK
26 NRF_I2S              Non-Secure	OK
27 NRF_GPIOTE1          Non-Secure	OK

SPM: NS image at 0x20200
SPM: NS MSP at 0x20032368
SPM: NS reset vector at 0x32149
SPM: prepare to jump to Non-Secure image.
*** Booting Zephyr OS build v2.6.0-rc1-ncs1  ***
*** Booting Zephyr OS build v3.1.99-ncs1  ***
I: Starting bootloader
I: Primary image: magic=good, swap_type=0x2, copy_done=0x1, image_ok=0x3
I: Secondary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
I: Boot source: none
I: Swap type: revert
I: Starting swap using move algorithm.
I: Secondary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
I: Bootloader chainload address offset: 0x10000
I: Jumping to the first image slot[00:00:00.449,462] <inf> spi_nor: W25Q16JV: SFDP v 1.5 AP ff with 1 PH
[00:00:00.449,523] <inf> spi_nor: PH0: ff00 rev 1.5: 16 DW @ 80
*** Booting Zephyr OS build v3.1.99-ncs1  ***
[00:00:00.463,623] <inf> spi_nor: W25Q16JV: 2 MiBy flash
[00:00:00.467,895] <inf> mcuboot_util: Swap type: none

Reset cause(s):
	DOG


This is the same in both cases where I am either disabling or enabling the watchdog at boot on the BIN that I am downloading.

So my question is this:

Is there a compatibility issue between the TASK WDT and the regular WDT? Is there a way to prevent the HW WDT from firing at boot if the previous application was using the TASK WDT with the HW WDT as a fallback?

  • Hi,

    The HW WDT is (deliberately) quite limited with regard to how it can be disabled. Once enabled, it will continue to run and the configuration cannot be changed, except after a reset of the WDT peripheral. And that is not reset during a normal soft reset (see Reset behavior).

    So if the currently running firmware (you have based on 1.6) is using the WDT, you will need to continue to feed it. As you are getting resets with DOG as reason in the RESETREAS, that means it has not been fed fast enough, timed out, and reset the device.

    The Task WDT has the option of using a physical WDT as a fallback, so it must also feed it. So it could be that you can get away with just increasing the frequency of the WDT feeding in the Task WDT configuration. You adjust that with a combination of CONFIG_TASK_WDT_MIN_TIMEOUT and CONFIG_TASK_WDT_HW_FALLBACK_DELAY. Alternatively, add some special handling to ensure it is fed often enough. But of course, make sure you don't just feed it all the time regardless of the state of the device; if you do, there will not be much point in having the WDT.
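A minimal sketch of how those two options combine, assuming the behaviour described in the Zephyr Kconfig help for CONFIG_TASK_WDT_MIN_TIMEOUT (the hardware fallback timeout is that value plus CONFIG_TASK_WDT_HW_FALLBACK_DELAY). The helper name is hypothetical and not part of Zephyr; it only does the arithmetic on a host machine, so verify the result against the SDK version actually in use.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a Zephyr API).
 * Assumption: the hardware fallback watchdog is configured with
 * CONFIG_TASK_WDT_MIN_TIMEOUT + CONFIG_TASK_WDT_HW_FALLBACK_DELAY,
 * both given in milliseconds.
 */
static uint32_t hw_fallback_timeout_ms(uint32_t min_timeout_ms,
                                       uint32_t fallback_delay_ms)
{
    return min_timeout_ms + fallback_delay_ms;
}
```

With the values posted in this thread (10000 and 1000), the helper returns 11000, so under that assumption any firmware that runs after a soft reset would have roughly 11 s between feeds, bootloader time included.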

    As a side note, perhaps an alternative could be to keep using the physical WDT peripheral in the nRF and not the task WDT library at all? The WDT peripheral in the nRF has 8 channels, so you can monitor 8 tasks separately with that if you like (though again, you cannot change the configuration registers once it is started, so it may be difficult if you do not already use it like this in the existing firmware). Remember that the task WDT is a Zephyr library, and if things go really haywire it may not be able to save you in the same way as a physical WDT can.

    Lastly, if you need to change the configuration of the HW WDT, it is possible to first upload via DFU some firmware that does not start it but lets it time out and reset (assuming you do not start it in for instance an immutable bootloader - if you do, it can never change). Then, after the WDT reset the WDT is not running, and if your firmware does not start it, well - it is not started, and you can configure it as you like or not use it at all.

  •  

    Thanks a lot for your response. My issue is this:

    My original application is using the TASK WDT with the HW WDT enabled as a fallback.

    What I am trying to do is update the device with an application that is using the standard HW WDT alone.

    What I am doing in the new application is to simply feed the WDT as the first thing in the main.

    But this still does not seem to have an impact as the device resets and the reset reason is still DOG.

    Is there a way to reset or disable the watchdog on the original application with the TASK WDT, right before doing the reset to boot to the new application?

  • Giannis Anastasopoulos said:
    What I am trying to do is update the device with an application that is using the standard HW WDT alone.

    Ah, I see. I misunderstood the situation. The fundamentals remain the same, though.

    Giannis Anastasopoulos said:
    What I am doing in the new application is to simply feed the WDT as the first thing in the main.

    That should normally be good enough. What is the reload value of the WDT (the old configuration is the relevant one here, as that is what is being used)? That will let us know how much time you have.

    Also, is there by any chance more than one channel enabled?

    If we know these two things, then we know what you need to do in the new application in order to continue to feed the watchdog.
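The two values asked about here can be decoded from the raw contents of the WDT peripheral's CRV (reload value) and RREN (reload request enable) registers. The sketch below is host-side C with hypothetical helper names; it assumes the raw register values have already been obtained on the target (for example from a debugger), and it uses the nRF relation timeout [s] = (CRV + 1) / 32768.

```c
#include <stdint.h>

/* The nRF WDT counts on the 32.768 kHz low-frequency clock. */
#define WDT_CLOCK_HZ 32768u

/*
 * Hypothetical helper: convert a raw CRV register value to milliseconds,
 * using timeout [s] = (CRV + 1) / 32768. 64-bit math avoids overflow.
 */
static uint32_t wdt_crv_to_ms(uint32_t crv)
{
    return (uint32_t)((((uint64_t)crv + 1u) * 1000u) / WDT_CLOCK_HZ);
}

/*
 * Hypothetical helper: count the reload request channels enabled in a raw
 * RREN value (bits 0..7). Every enabled channel has to be fed before the
 * watchdog counter is reloaded.
 */
static unsigned int wdt_enabled_channels(uint32_t rren)
{
    unsigned int count = 0;

    for (unsigned int ch = 0; ch < 8; ch++) {
        if (rren & (1u << ch)) {
            count++;
        }
    }
    return count;
}
```

For example, a CRV of 360447 decodes to 11000 ms, and an RREN of 0x01 means only channel 0 needs feeding.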

    It would also be relevant to know if there are any other parts of the system (typically bootloader(s)) that enable the WDT. If not, you would at least be able to re-configure it to suit your new needs after a WDT reset.

    Giannis Anastasopoulos said:
    Is there a way to reset or disable the watchdog on the original application with the TASK WDT, right before doing the reset to boot to the new application?

    No. There is no support for disabling the Task WDT. And more importantly, you cannot disable the HW WDT (which you write is enabled here as the fallback) in any other way than a reset of some sort which is not a soft reset or CPU lockup. The only way to disable it would be to trigger a reset of a form that disables it (like a WDT reset by not feeding it).

  • In an attempt to rule out the possibility of a channel different than the first one being used in the initial application, in the FOTA application I am doing this:

        int wdt_channel_id = 0;
        while (wdt_channel_id >= 0) {
            max_wdt_channel_id = wdt_channel_id;
            wdt_channel_id = wdt_install_timeout(wdt, &wdt_config);
            LOG_INF("WDT Channel %d set", wdt_channel_id);
        }
        LOG_INF("MAX channel ID: %d", max_wdt_channel_id);


    And I feed all the channels like this:

        for(int i = 0; i <= max_wdt_channel_id; i++){
            LOG_INF("Feeding watchdog, channel: %d", i);
            wdt_feed(wdt, i);
        }


    But my device still resets with DOG as the reset reason:
    18:38:03.090 -> *** Booting Zephyr OS build v2.6.0-rc1-ncs1  ***
    18:38:03.136 -> [00:00:00.240,051] <inf> IMS: IMS dispatch thread active
    18:38:03.136 -> [00:00:00.251,159] <inf> fs_nvs: 3 Sectors of 4096 bytes
    18:38:03.136 -> [00:00:00.251,190] <inf> fs_nvs: alloc wra: 2, 5c0
    18:38:03.136 -> [00:00:00.251,190] <inf> fs_nvs: data wra: 2, 514
    18:38:03.184 -> Reset caus[00:00:00.343,994] <inf> DEVICE_SETTINGS: NVM now accepting new settings
    18:38:03.469 -> e(s):
    18:38:03.469 -> 	SREQ
    18:38:03.469 -> [00:00:00.354,461] <inf> regular_reset: Handler active
    18:38:03.469 -> [00:00:00.354,858] <inf> IMS_STATS: Stats listener active
    18:38:03.469 -> [00:00:00.355,041] <inf> IMS_STATS: IMS Stats listener active
    18:38:03.469 -> [00:00:00.355,255] <inf> WDT: WDT Channel 0 set
    18:38:03.469 -> [00:00:00.355,285] <inf> WDT: WDT Channel 1 set
    18:38:03.469 -> [00:00:00.355,316] <inf> WDT: WDT Channel 2 set
    18:38:03.469 -> [00:00:00.355,316] <inf> WDT: WDT Channel 3 set
    18:38:03.469 -> [00:00:00.355,346] <inf> WDT: WDT Channel 4 set
    18:38:03.469 -> [00:00:00.355,346] <inf> WDT: WDT Channel 5 set
    18:38:03.469 -> [00:00:00.355,377] <inf> WDT: WDT Channel 6 set
    18:38:03.469 -> [00:00:00.355,377] <inf> WDT: WDT Channel 7 set
    18:38:03.469 -> [00:00:00.355,407] <inf> WDT: WDT Channel -12 set
    18:38:03.469 -> [00:00:00.355,407] <inf> WDT: MAX channel ID: 7
    18:38:03.469 -> [00:00:00.355,438] <inf> WDT: Feeding watchdog, channel: 0
    18:38:03.469 -> [00:00:00.355,438] <inf> WDT: Feeding watchdog, channel: 1
    18:38:03.469 -> [00:00:00.355,468] <inf> WDT: Feeding watchdog, channel: 2
    18:38:03.469 -> [00:00:00.355,468] <inf> WDT: Feeding watchdog, channel: 3
    18:38:03.469 -> [00:00:00.457,611] <inf> WDT: Feeding watchdog, channel: 4
    18:38:03.469 -> [00:00:00.457,641] <inf> WDT: Feeding watchdog, channel: 5
    18:38:03.469 -> [00:00:00.457,672] <inf> WDT: Feeding watchdog, channel: 6
    18:38:03.469 -> [00:00:00.457,702] <inf> WDT: Feeding watchdog, channel: 7
    18:38:03.469 -> [00:00:00.457,916] <inf> IMS_WS2812: Listener active
    18:38:03.469 -> [00:00:00.458,190] <inf> IMS_BATT: Listener active
    18:38:03.469 -> [00:00:00.458,282] <inf> WDT: Feeding watchdog, channel: 0
    18:38:03.469 -> [00:00:00.458,282] <inf> WDT: Feeding watchdog, channel: 1
    18:38:03.469 -> [00:00:00.458,312] <inf> WDT: Feeding watchdog, channel: 2
    18:38:03.469 -> [00:00:00.458,374] <inf> WDT: Feeding watchdog, channel: 3
    18:38:03.469 -> [00:00:00.458,404] <inf> WDT: Feeding watchdog, channel: 4
    18:38:03.469 -> [00:00:00.458,435] <inf> WDT: Feeding watchdog, channel: 5
    18:38:03.469 -> [00:00:00.458,435] <inf> WDT: Feeding watchdog, channel: 6
    18:38:03.469 -> [00:00:00.458,435] <inf> WDT: Feeding watchdog, channel: 7
    ... (not important device init)
    18:38:14.319 -> ⸮*** Booting Zephyr OS build v3.1.99-ncs1  ***
    18:38:14.549 -> I: Starting bootloader
    18:38:14.596 -> I: Primary image: magic=good, swap_type=0x2, copy_done=0x1, image_ok=0x3
    18:38:14.596 -> I: Secondary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
    18:38:14.596 -> I: Boot source: none
    18:38:14.596 -> I: Swap type: revert
    18:38:15.298 -> I: Starting swap using move algorithm.
    18:38:15.298 -> I: Secondary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
    18:38:55.727 -> I: Bootloader chainload address offset: 0x10000
    18:38:55.727 -> I: Jumping to the first image slot
    [00:00:00.457,183] <inf> spi_nor: W25Q16JV: SFDP v 1.5 AP ff with 1 PH
    18:38:56.522 -> [00:00:00.457,214] <inf> spi_nor: PH0: ff00 rev 1.5: 16 DW @ 80
    18:38:56.522 -> *** Booting Zephyr OS build v3.1.99-ncs1  ***
    18:38:56.522 -> [00:00:00.471,466] <inf> spi_nor: W25Q16JV: 2 MiBy flash
    18:38:56.522 -> [00:00:00.475,799] <inf> mcuboot_util: Swap type: none
    
    18:38:56.522 -> 
    
    18:38:56.522 -> Reset cause(s):
    18:38:56.522 -> 	DOG
    


    As you can see, I am feeding all the HW channels right at boot-up, but the device still resets after a few seconds. The reset reason is DOG and there is no hard fault.

  • Based on a test that I just did, it seems that the HW watchdog is configured from the application that is running the Task WDT to expire at 10s. In the app that I am trying to FOTA, if I feed the WDT in less than 10s then the application does not reset.

    So based on that result, it seems that because the watchdog is configured by the first application to expire in 10s, and since the WDT is not stopped by a system reset, it keeps running with the old configuration even though I am setting it up again using

    wdt_install_timeout(wdt, &wdt_config); 


    This means that the only way my settings will ever have an impact is if the device is reset by the reset pin or power-cycled, in which case the device would revert to the old firmware anyway, since the new firmware is not yet confirmed.
