Firmware Reset failing with MCUboot and WDT

Hi all,

We have an application which uses both MCUboot and a watchdog timer. This combination seems to be causing issues anytime we attempt a soft reset using sys_reboot or NVIC reset directly.

Our system:

  • Custom board using nRF5340
  • ncs v2.2.0

When we attempt a soft reset, the following happens:

  1. Device powers down and enters bg_thread_main() in "ncs\v2.2.0\zephyr\kernel\init.c", but it doesn't enter our application main()
  2. Device enters MCUboot
  3. After some time the watchdog resets the device, which enters MCUboot again
  4. Finally the device enters bg_thread_main() again, and this time our application main() runs successfully

All of this was observed in the UART log.

This issue looks similar to another DevZone post. However, we've tried the suggestion of adding CONFIG_BOOT_WATCHDOG_FEED to our mcuboot.conf and we still get the same sequence of events.

From the RESET table, software resets don't reset the WDT, which explains why we get two resets, but it doesn't explain why we don't enter our application main() in step 1. We've tried forcing a WDT reset instead of a soft reset and this does fix the issue: we only get one reset and it enters our application main() successfully.
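
To confirm which reset source fired on each boot, the RESETREAS value can be decoded and logged early in main(). The following is only a minimal sketch in plain C: the helper name classify_reset(), the EXAMPLE_ macros and the bit positions are assumptions for illustration, and the bit positions in particular should be checked against the RESETREAS description in the nRF5340 product specification. The register accumulates causes until it is cleared, so the sketch checks the watchdog bit first.

```c
#include <stdint.h>

/* Assumed RESETREAS bit positions for the nRF5340 application core.
 * Illustrative only: verify against the product specification. */
#define EXAMPLE_RESETREAS_RESETPIN (1u << 0) /* pin reset */
#define EXAMPLE_RESETREAS_DOG0     (1u << 1) /* watchdog 0 timeout */
#define EXAMPLE_RESETREAS_SREQ     (1u << 3) /* soft reset request */

/* Hypothetical classification used only by this sketch. */
enum example_reset_cause {
    EXAMPLE_RESET_OTHER = 0, /* none of the bits above is set */
    EXAMPLE_RESET_PIN,
    EXAMPLE_RESET_WATCHDOG,
    EXAMPLE_RESET_SOFT,
};

/* Map a raw RESETREAS value to one cause. Several bits may be set at
 * once if the register was not cleared, so the watchdog takes priority,
 * then the soft reset, then the pin reset. */
enum example_reset_cause classify_reset(uint32_t resetreas)
{
    if (resetreas & EXAMPLE_RESETREAS_DOG0) {
        return EXAMPLE_RESET_WATCHDOG;
    }
    if (resetreas & EXAMPLE_RESETREAS_SREQ) {
        return EXAMPLE_RESET_SOFT;
    }
    if (resetreas & EXAMPLE_RESETREAS_RESETPIN) {
        return EXAMPLE_RESET_PIN;
    }
    return EXAMPLE_RESET_OTHER;
}
```

On a real target the raw value would be read from the RESET peripheral and cleared after logging, so that the next boot only shows the most recent cause.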

One solution is to tie a GPIO directly to the RESET pin and use it to trigger resets from within the application. This isn't ideal, as it requires hardware changes and limits our design.

Does anyone have any other suggestions we can try?

  • Hello,

    The CONFIG_BOOT_WATCHDOG_FEED symbol should be selected by default when the build target is an nRF device. You can confirm this by checking whether the symbol is set in the generated configuration file (build/mcuboot/zephyr/.config). Either way, I would suggest that you try to make the WD timeout longer (e.g., 30 seconds), if you have not done so already. Then debug the device to see where it hangs when the program isn't reaching main().

    Thanks,

    Vidar

  • Hi Vidar,

    Thanks for your reply.

    You're right that the "CONFIG_BOOT_WATCHDOG_FEED" option is included by default in the mcuboot configuration.


    From what I can tell, it seems the application is hanging on the main() function inside bg_thread_main(). All the functions called before main() appear to be exiting without any issues.

  • Hi,

    hugzy123 said:
    From what I can tell, it seems the application is hanging on the main() function inside bg_thread_main(). All the functions called before main() appear to be exiting without any issues.

    I would have expected the "boot banner" to be printed before the WD reset if the system initialization in bg_thread_main() completed successfully.

    https://github.com/nrfconnect/sdk-zephyr/blob/c0689b16ff127d1b71a4cc20310d476e1807e26e/kernel/init.c#L297 

    Could you place a breakpoint at the call to z_sys_init_run_level(INIT_LEVEL_POST_KERNEL); and another at the call to z_sys_init_run_level(INIT_LEVEL_APPLICATION); to check if both are reached?

  • I would have expected the "boot banner" to be printed before the WD reset if the system initialization in bg_thread_main() completed successfully.

    There is a boot banner being printed; it's actually printed three times: once just after the application reset, a second time after the first bootloader sequence has finished, and a third time after the WDT reset and the second bootloader sequence.

    Could you place a breakpoint at the call to z_sys_init_run_level(INIT_LEVEL_POST_KERNEL); and another at the call to z_sys_init_run_level(INIT_LEVEL_APPLICATION); to check if both are reached?

    I'm using UART, so I can't use breakpoints. However, I have added multiple log messages and observed that the application reaches the point just before the main() call in bg_thread_main() on the first failed reset attempt (before the WDT reset).

     

  • The boot banner is printed during system initialization in bg_thread_main(), so normally it should appear once during the startup of mcuboot and then a second time when the application boots up.

    Is it possible to attach a debugger to your custom board? If not, do you think it would be possible to reproduce this on a DK so I can try to debug it here?

  • Hi Vidar,

    Apologies for the delayed response; I’ve been tied up with other projects. I’ve now returned to this and managed to reproduce the issue on the nRF53 DK. I’ve attached a .zip file containing the source code, board files, and overlays.

    The firmware is designed to initialise and activate LED1 on the DK, stay active for 2 seconds, then reboot. This works as expected for the first ~4 reboots*, but after that, the device fails to reboot correctly. It hangs for around 30 seconds (until the WDT window elapses), at which point the watchdog timer kicks in and the device reboots properly. Strangely, this doesn’t happen when the RTT viewer is connected. The issue occurs regardless of whether I use sys_reboot or NVIC_SystemReset to trigger the reboot. The firmware is built using nRF SDK 2.2.0.

    *The issue tends to occur between the 3rd and 5th reboot.

    To compile and run the firmware:

    1. Set nRF toolchain and SDK to NCS v2.2.0
    2. Select "demo_boards_cpuapp_ns" as the board type
    3. Include the overlay-rtt.conf Kconfig fragment
    4. Add external_flash.overlay and power_enhance.overlay
    5. Flash the firmware onto the DK
    6. Don't connect to RTT! Instead, connect via serial interface to view logs through UART

    Below is a screenshot of my build configuration from the nRF VS Code extension.

    When the issue arises (typically after 3-5 reboots), the logs will freeze on the following output for about 30 seconds.

    After the WDT has elapsed the device will reset and boot.

    demo_wdt_issue.zip

    Let me know if you need anything more from me.

  • Thank you for this example. I'm able to reproduce the same here now, and it looks like the program hangs in the SPU_IRQHandler when the WD times out.

    I will continue to debug this to try to find out what is triggering the SPU IRQ.

    hugzy123 said:

    I also tried connecting the RESET pin directly to a GPIO to trigger a pin reset instead of a software reset. However, when I attempted this, our custom board froze after setting the GPIO low. I think it's because the RESET pin remains low.

    This should work, but it is important that the GPIO never becomes configured as an output low during boot. Is this GPIO configured in the code you provided?

  • Thank you for this example. I'm able to reproduce the same here now, and it looks like the program hangs in the SPU_IRQHandler when the WD times out.

    Great to hear you can see it on your side too.

    This should work, but it is important that the GPIO never becomes configured as an output low during boot. Is this GPIO configured in the code you provided?

    It isn't in the demo I provided; however, it can be added easily.

    On the DK you just need to add a solder bridge to SB43 so the reset can be controlled via the pin, then add a jumper between a gpio and the RESET pin.

    I've used GPIO (0,26) below.

    Then you can add the following function to the demo example which triggers a pin reset.

    #define CONTROL_PIN NRF_GPIO_PIN_MAP(0, 26)

    void pin_reset(void)
    {
        LOG_DBG("Resetting Device");

        /* Configure the GPIO as an output that starts high, so RESET is not asserted yet. */
        /* Alternative: nrfx_gpiote_out_config_t pin_config = NRFX_GPIOTE_CONFIG_OUT_SIMPLE(true); */
        nrfx_gpiote_out_config_t pin_config = {
            .action = GPIOTE_CONFIG_POLARITY_LoToHi,
            .init_state = GPIOTE_CONFIG_OUTINIT_High,
            .task_pin = false,
        };
        nrfx_err_t err = nrfx_gpiote_out_init(CONTROL_PIN, &pin_config);
        if (err != NRFX_SUCCESS) {
            LOG_DBG("nrfx error 0x%X", err);
        }
        k_msleep(300);

        /* Power up unused RAM for MCUboot */
        if (IS_ENABLED(CONFIG_RAM_POWER_DOWN_LIBRARY)) {
            power_up_unused_ram();
        }

        /* Drive the pin low to pull RESET low */
        nrfx_gpiote_out_clear(CONTROL_PIN);
        LOG_DBG("triggered reset pin"); /* shouldn't get to here */
    }

    In my testing I call pin_reset() in main() after the 2 second delay and LED initialisation.

    It seems that because the reset pin never goes high again, the device doesn't complete a reset. You can see this using the "Boot Reset" button on the DK: if I hold the button down the device won't reset; it only resets when I release the button.

  • I'm sorry for the delayed response. I've been working on debugging the code to identify what is triggering the SPU event. It turned out to be quite challenging because the security violation is not caused by the CPU but by another bus master.

    I have not been able to pinpoint the source of the flash access error, nor have I found any other reports of this event happening under similar conditions. However, upgrading to SDK v2.4.4 may resolve this issue as it includes several errata workarounds that were not present in SDK v2.2.0.

    A workaround for now may be to issue a soft reset from the SPU ISR in TF-M to avoid having to wait for the WDT timeout. For example, by adding the following to ncs/v2.2.0/modules/tee/tf-m/trusted-firmware-m:

    diff --git a/platform/ext/common/faults.c b/platform/ext/common/faults.c
    index eb87c971c..9124030c2 100644
    --- a/platform/ext/common/faults.c
    +++ b/platform/ext/common/faults.c
    @@ -107,3 +107,8 @@ __attribute__((naked)) void UsageFault_Handler(void)
             "b         .                      \n"
         );
     }
    +
    +void SPU_IRQHandler(void)
    +{
    +    NVIC_SystemReset();
    +}
    \ No newline at end of file

    hugzy123 said:
    It seems that because the reset pin never goes high again, the device doesn't complete a reset. You can see this using the "Boot Reset" button on the DK: if I hold the button down the device won't reset; it only resets when I release the button.

    I think you are right. The specification does not guarantee the pin state in reset, only when going out of reset. I know this approach of using a GPIO to trigger a pin reset worked with the nRF52840, but there are some differences in the reset mechanism between the nRF52840 and the nRF5340.

  • Thanks for your reply.

    I have not been able to pinpoint the source of the flash access error, nor have I found any other reports of this event happening under similar conditions. However, upgrading to SDK v2.4.4 may resolve this issue as it includes several errata workarounds that were not present in SDK v2.2.0.

    Upgrading to a newer SDK does sound like a better fix. However, we've been struggling to get the partitions to conform to the alignment rules in the newer SDK versions for the trusted firmware module.

    A workaround for now may be to issue a soft reset from the SPU ISR in TF-M to avoid having to wait for the WDT timeout. For example, by adding the following to ncs/v2.2.0/modules/tee/tf-m/trusted-firmware-m:

    Thanks, we've verified this workaround in our current firmware. Do you know if there are any downsides to using this patch? 

    I think you are right. The specification does not guarantee the pin state in reset, only when going out of reset. I know this approach of using a GPIO to trigger a pin reset worked with the nRF52840, but there are some differences in the reset mechanism between the nRF52840 and the nRF5340.

    We're looking into adding an IC that would generate a pulse on the pin reset line from an output GPIO of the nRF module.

  • hugzy123 said:
    Upgrading to a newer SDK does sound like a better fix. However, we've been struggling to get the partitions to conform to the alignment rules in the newer SDK versions for the trusted firmware module.

    Did you use the same memory layout in both versions? The SPU requires 32K alignment: https://docs.nordicsemi.com/bundle/ncs-latest/page/nrf/security/tfm.html#tf-m_partition_alignment_requirements.
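
    As a rough way to sanity-check a partition layout by hand, the alignment rule can be expressed as a small helper. This is a minimal sketch in plain C, not part of the SDK: the function names are hypothetical, and the region size is passed in as a parameter so that the value from the linked page (32K is the figure quoted above) can be used for the SoC in question. Which partition boundaries the rule applies to is described on that page.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when both the start address and the size of
 * a partition are multiples of region_size (SPU region size in bytes). */
bool partition_is_spu_aligned(uint32_t address, uint32_t size,
                              uint32_t region_size)
{
    if (region_size == 0u) {
        return false; /* no valid granularity to check against */
    }
    return (address % region_size == 0u) && (size % region_size == 0u);
}

/* Hypothetical helper: round an address up to the next region boundary.
 * Returns the address unchanged if it is already aligned or if
 * region_size is 0. */
uint32_t spu_align_up(uint32_t address, uint32_t region_size)
{
    if (region_size == 0u) {
        return address;
    }
    uint32_t remainder = address % region_size;
    return (remainder == 0u) ? address : address + (region_size - remainder);
}
```

    For example, with a 32K (0x8000) region size, a partition starting at 0xC200 is not aligned and the next valid start address would be 0x10000.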

    hugzy123 said:
    Thanks, we've verified this workaround in our current firmware. Do you know if there are any downsides to using this patch? 

    I do not foresee any problems with the workaround itself.

  • Did you use the same memory layout in both versions? The SPU requires 32K alignment: https://docs.nordicsemi.com/bundle/ncs-latest/page/nrf/security/tfm.html#tf-m_partition_alignment_requirements.

    Thank you, I will look into this.

    I do not foresee any problems with the workaround itself.

    Great, we'll use this for now.
