Firmware Reset failing with MCUboot and WDT

Hi all,

We have an application which uses both MCUboot and a watchdog timer. This combination seems to be causing issues anytime we attempt a soft reset using sys_reboot or NVIC reset directly.

Our system:

  • Custom board using nRF5340
  • ncs v2.2.0

When we attempt a soft reset the following happens:

  1. Device powers down and enters bg_thread_main() in "ncs\v2.2.0\zephyr\kernel\init.c", it doesn't enter our application main()
  2. Device enters MCUboot
  3. After some time the watchdog resets, the device enters MCUboot again
  4. Finally the device enters bg_thread_main() again, this time our application main() runs successfully

All this is monitored using UART log (shown below).

This issue looks similar to another devzone post, however, we've tried the suggestion of adding CONFIG_BOOT_WATCHDOG_FEED to our mcuboot.conf but we still get the same sequence of events.

From the RESET table, software resets don't reset the WDT, which explains why we get 2 resets, but it doesn't explain why we don't enter our application main loop in step 1. We've tried forcing a WDT reset instead of a soft reset and this does fix the issue: we only get one reset and it enters our application main loop successfully.

One solution is to tie a GPIO directly to the RESET pin and use this GPIO for resets from within the application. This isn't ideal as it requires hardware changes and limits our design.

Does anyone have any other suggestions we can try?

  • Hello,

    The CONFIG_BOOT_WATCHDOG_FEED symbol should be selected by default when the build target is an nRF device. You can confirm this by checking whether the symbol is set in the generated configuration file (build/mcuboot/zephyr/.config). Either way, I would suggest that you try to make the WD timeout longer (e.g., 30 seconds), if you have not done so already. Then debug the device to see where it hangs when the program isn't reaching main().

    Thanks,

    Vidar
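
As a rough illustration of the 30-second timeout suggested above, here is a minimal sketch of how a timeout in milliseconds maps to the WDT counter reload value (CRV). It assumes the nRF WDT runs from the 32.768 kHz clock and that the timeout equals (CRV + 1) / 32768 seconds, as described in Nordic's product specifications. The helper name wdt_crv_from_ms is hypothetical and not part of any SDK. In a Zephyr application you would normally pass the timeout in milliseconds to the watchdog driver rather than compute this by hand.

```c
#include <stdint.h>

/* The nRF WDT counts cycles of the 32.768 kHz low-frequency clock. */
#define WDT_CLOCK_HZ 32768u

/*
 * Hypothetical helper (not an SDK function): convert a timeout in
 * milliseconds to a counter reload value, using
 *   timeout [s] = (CRV + 1) / 32768
 * The result is clamped to the 32-bit register range.
 */
static uint32_t wdt_crv_from_ms(uint32_t timeout_ms)
{
    uint64_t ticks = ((uint64_t)timeout_ms * WDT_CLOCK_HZ) / 1000u;

    if (ticks == 0u) {
        return 0u;              /* timeout too short to represent */
    }
    if (ticks - 1u > UINT32_MAX) {
        return UINT32_MAX;      /* longest timeout the register can hold */
    }
    return (uint32_t)(ticks - 1u);
}
```

For example, wdt_crv_from_ms(30000) gives 983039, which is 30 * 32768 - 1.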

  • Hi Vidar,

    Thanks for your reply.

    You're right that the "CONFIG_BOOT_WATCHDOG_FEED" option is included by default in the mcuboot configuration.


    From what I can tell, it seems the application is hanging on the main() function inside bg_thread_main(). All the functions called before main() appear to be exiting without any issues.

  • Thank you for this example. I'm able to reproduce the same here now, and it looks like the program hangs in the SPU_IRQHandler when the WD times out.

    Great to hear you can see it on your side too.

    This should work, but it is important that the GPIO never becomes configured as an output low during boot. Is this GPIO configured in the code you provided?

    It isn't in the demo I provided, however, it can be added easily.

    On the DK you just need to add a solder bridge to SB43 so the reset can be controlled via the pin, then add a jumper between a gpio and the RESET pin.

    I've used GPIO (0,26) below.

    Then you can add the following function to the demo example which triggers a pin reset.

    #define CONTROL_PIN NRF_GPIO_PIN_MAP(0, 26)

    void pin_reset(void)
    {
        LOG_DBG("Resetting Device");

        // Configure the GPIO pin as an output that starts high (reset not asserted)
        // nrfx_gpiote_out_config_t pin_config = NRFX_GPIOTE_CONFIG_OUT_SIMPLE(true);
        nrfx_gpiote_out_config_t pin_config =
        {
            .action = GPIOTE_CONFIG_POLARITY_LoToHi,
            .init_state = GPIOTE_CONFIG_OUTINIT_High,
            .task_pin = false,
        };
        nrfx_err_t err = nrfx_gpiote_out_init(CONTROL_PIN, &pin_config);
        if (err != NRFX_SUCCESS)
        {
            LOG_DBG("nrfx error 2 0x%X", err);
        }
        k_msleep(300);

        // Power up unused RAM for MCUboot
        if (IS_ENABLED(CONFIG_RAM_POWER_DOWN_LIBRARY))
        {
            power_up_unused_ram();
        }

        // Drive the pin low to assert RESET
        nrfx_gpiote_out_clear(CONTROL_PIN);
        LOG_DBG("triggered reset pin"); // shouldn't get to here
    }

    In my testing I call pin_reset() in main after the 2 second delay and led initialisation. 

    It seems that because the reset pin never goes low again, it doesn't complete a reset. You can see this using the "Boot Reset" button on the DK: if I hold the button down the device won't reset; it's only when I release the button that it resets.

  • I'm sorry for the delayed response. I've been working on debugging the code to identify what is triggering the SPU event. It turned out to be quite challenging because the security violation is not caused by the CPU but by another bus master.

    I have not been able to pinpoint the source of the flash access error, nor have I found any other reports of this event happening under similar conditions. However, upgrading to SDK v2.4.4 may resolve this issue as it includes several errata workarounds that were not present in SDK v2.2.0.

    A workaround for now may be to issue a soft reset from the SPU ISR in TF-M to avoid having to wait for the WDT timeout. For example, by adding the following to ncs/v2.2.0/modules/tee/tf-m/trusted-firmware-m:

    diff --git a/platform/ext/common/faults.c b/platform/ext/common/faults.c
    index eb87c971c..9124030c2 100644
    --- a/platform/ext/common/faults.c
    +++ b/platform/ext/common/faults.c
    @@ -107,3 +107,8 @@ __attribute__((naked)) void UsageFault_Handler(void)
             "b         .                      \n"
         );
     }
    +
    +void SPU_IRQHandler(void)
    +{
    +    NVIC_SystemReset();
    +}
    \ No newline at end of file

    hugzy123 said:
    It seems that because the reset pin never goes low again, it doesn't complete a reset. You can see this using the "Boot Reset" button on the DK: if I hold the button down the device won't reset; it's only when I release the button that it resets.

    I think you are right. The specification does not guarantee the pin state in reset, only when going out of reset. I know this approach with using a GPIO to trigger pinreset worked with the nRF52840 but there are some differences in the reset mechanism between the nRF52840 and the nRF5340. 

  • Thanks for your reply.

    I have not been able to pinpoint the source of the flash access error, nor have I found any other reports of this event happening under similar conditions. However, upgrading to SDK v2.4.4 may resolve this issue as it includes several errata workarounds that were not present in SDK v2.2.0.

    Upgrading to a newer SDK does sound like a better fix, however, we've been struggling to get the partitions to conform to the alignment rules in the newer SDK versions for the trusted firmware module.

    A workaround for now may be to issue a soft reset from the SPU ISR in TF-M to avoid having to wait for the WDT timeout. For example, by adding the following to ncs/v2.2.0/modules/tee/tf-m/trusted-firmware-m:

    Thanks, we've verified this workaround in our current firmware. Do you know if there are any downsides to using this patch? 

    I think you are right. The specification does not guarantee the pin state in reset, only when going out of reset. I know this approach with using a GPIO to trigger pinreset worked with the nRF52840 but there are some differences in the reset mechanism between the nRF52840 and the nRF5340. 

    We're looking into adding an IC we can use which would generate a pulse on the pinreset line from an output GPIO of the nRF module.

  • hugzy123 said:
    Upgrading to a newer SDK does sound like a better fix, however, we've been struggling to get the partitions to conform to the alignment rules in the newer SDK versions for the trusted firmware module.

    Did you use the same memory layout in both versions? The SPU requires 32K alignment: https://docs.nordicsemi.com/bundle/ncs-latest/page/nrf/security/tfm.html#tf-m_partition_alignment_requirements.

    hugzy123 said:
    Thanks, we've verified this workaround in our current firmware. Do you know if there are any downsides to using this patch? 

    I do not foresee any problems with the workaround itself.
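
To make the alignment requirement mentioned above concrete, here is a minimal sketch that checks whether a partition boundary address falls on an SPU region boundary and rounds it up if not. The 32 KiB region size is taken from the post above; treat it as an assumption, since the SPU region size depends on the SoC and should be checked against the linked documentation. The function names are hypothetical, and the addresses in the usage note are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 32 KiB SPU region size, as stated in the post above.
 * Verify the value for your device before relying on it. */
#define SPU_REGION_SIZE 0x8000u

/* Hypothetical helper: true if the address sits on a region boundary. */
static bool spu_is_aligned(uint32_t address)
{
    return (address % SPU_REGION_SIZE) == 0u;
}

/* Hypothetical helper: round an address up to the next region boundary.
 * Relies on SPU_REGION_SIZE being a power of two. */
static uint32_t spu_align_up(uint32_t address)
{
    return (address + SPU_REGION_SIZE - 1u) & ~(SPU_REGION_SIZE - 1u);
}
```

With these definitions, a boundary at 0xC000 would fail the check and round up to 0x10000, while 0x10000 is already aligned and is returned unchanged.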

  • Did you use the same memory layout in both versions? The SPU requires 32K alignment: https://docs.nordicsemi.com/bundle/ncs-latest/page/nrf/security/tfm.html#tf-m_partition_alignment_requirements.

    Thank you, I will look into this.

    I do not foresee any problems with the workaround itself.

    Great, we'll use this for now.
