Triggering reset loop from device firmware for testing

For our application we are using the nRF9160, and we need low reconnection and reset timeouts.
Sometimes during a bad chain of events our application will trigger the reset loop of the modem.
We have tried to construct our firmware in a way that prevents this loop from occurring but it's difficult to test if the devices recover from the reset loop.

Is there a way to trigger the reset loop restriction from software?

Currently, to test reset loop behavior we need to flash a firmware image that rapidly enables the modem and resets the device to trigger the restriction. This makes it hard to cover in our integration tests before a firmware release.

Any recommendations to test reset loop recovery are welcome.

  • Hello, 

    Yes, we have read that the reset loop restriction is activated by resetting the modem 7 times in a row while the modem is in operational mode. However, triggering this behavior intentionally is currently quite a hassle:

    To activate the reset loop restriction we flash a firmware image that constantly resets the device during a connection cycle. After the restriction has been activated, we flash our application and check whether it properly detects the reset loop and recovers.

    We would like to implement a way to quickly and reliably put the modem into the reset loop restriction, preferably without resetting the application core while triggering it:

    • Would using %XFACTORYRESET with the user data parameter set work? The full reset also seems to clear the loop.
    • Is there a testing command that can be used to enable the restriction, to avoid overloading the network?
    • Is there a different modem reset AT command that just resets the modem core?

    If there is no easy way to turn the restriction on, we will probably just create a function that keeps a persistent crash counter in device flash using the settings subsystem and have the device crash after each successful connection until the countdown reaches 0.
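
A minimal sketch of such a countdown, for illustration only. The names `crash_store`, `reset_loop_test_arm` and `reset_loop_test_should_crash` are hypothetical, and the storage hooks are a RAM stand-in; on the device they would wrap a persistent backend such as the settings subsystem. The default of 7 resets comes from the activation condition described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical storage hooks. On real firmware these would wrap a
 * persistent backend so the value survives a reset. */
struct crash_store {
    int32_t (*load)(void);
    void (*save)(int32_t value);
};

/* Arm the test: request `resets` forced, ungraceful resets. */
void reset_loop_test_arm(const struct crash_store *store, int32_t resets)
{
    store->save(resets);
}

/* Call once after the modem has connected. Returns true when the caller
 * should force a reset without shutting the modem down gracefully. */
bool reset_loop_test_should_crash(const struct crash_store *store)
{
    int32_t remaining = store->load();

    if (remaining <= 0) {
        return false; /* countdown finished, run the normal application */
    }

    /* Persist the decrement before resetting so progress is not lost. */
    store->save(remaining - 1);
    return true;
}

/* RAM-backed stand-in, only useful for trying the logic on a host. */
static int32_t ram_value;
static int32_t ram_load(void) { return ram_value; }
static void ram_save(int32_t value) { ram_value = value; }
static const struct crash_store ram_store = { ram_load, ram_save };
```

In firmware, the application would call `reset_loop_test_should_crash()` right after a successful connection and trigger a cold reset when it returns true. Once the counter reaches 0 the same image continues into the normal recovery path, so no second flashing step is needed.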

  • Hello, 

    Sometimes during a bad chain of events our application will trigger the reset loop of the modem.

    Could you please elaborate more on this situation? The application should actually never hit the reset loop restriction. The application should ensure graceful shutdown using +CFUN=0. What is preventing the device from doing a graceful shutdown?

    We have tried to construct our firmware in a way that prevents this loop from occurring but it's difficult to test if the devices recover from the reset loop.

    Have you subscribed to modem domain event notifications, i.e. AT%MDMEV=1? This should notify you that your device is in a reset loop with a %MDMEV: RESET LOOP AT notification, sent after the modem's activation when the blocking starts or is still ongoing.
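
As a rough illustration of checking for that notification, the sketch below matches a received notification line against the `%MDMEV: RESET LOOP` text quoted above. The function name `is_reset_loop_event` is hypothetical, and how the line reaches the application (for example through a notification callback) is outside the scope of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true when `line` is a modem domain event reporting the reset
 * loop restriction. Leading spaces and CR/LF are skipped. */
bool is_reset_loop_event(const char *line)
{
    static const char prefix[] = "%MDMEV:";
    static const char event[] = "RESET LOOP";

    if (line == NULL) {
        return false;
    }

    while (*line == ' ' || *line == '\r' || *line == '\n') {
        line++;
    }

    if (strncmp(line, prefix, sizeof(prefix) - 1) != 0) {
        return false; /* not a modem domain event */
    }
    line += sizeof(prefix) - 1;

    while (*line == ' ') {
        line++;
    }

    return strncmp(line, event, sizeof(event) - 1) == 0;
}
```

An integration test could feed captured modem output through the same function to assert that the restriction was actually reached before checking recovery.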

    S Biezeman said:
    Would using the Xfactoryreset with the user data parameter set work?

    You can reset the timer with the factory reset command, but it should only be used during debugging and not when the product is out in the field.

    Kind regards,
    Øyvind

  • Hello,

    Could you please elaborate more on this situation? The application should actually never hit the reset loop restriction. The application should ensure graceful shutdown using +CFUN=0. What is preventing the device from doing a graceful shutdown?

    A seamless update on our cloud endpoint went wrong and caused the endpoint to send malformed data to the device, crashing it on each connection cycle. The crash bypassed the exponential backoff that we implemented to prevent the reset loop from happening, which caused some devices with a very short update interval to enter the reset loop.
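
One way to keep a backoff effective across crashes, sketched here under assumptions and not taken from the firmware discussed in this thread, is to count an attempt as failed before it starts and clear the count only after success. The names `attempt_state`, `attempt_begin`, `attempt_succeeded` and `backoff_delay_s` are hypothetical, and the state struct stands in for a value that would be kept in persistent storage.

```c
#include <stdint.h>

/* Stand-in for a counter that would be stored persistently. */
struct attempt_state {
    uint32_t failed_attempts;
};

/* Record the attempt as failed up front, so a crash during the
 * connection cycle still leaves the counter incremented. */
void attempt_begin(struct attempt_state *state)
{
    if (state->failed_attempts < UINT32_MAX) {
        state->failed_attempts++;
    }
}

/* Clear the counter only once the cycle has completed successfully. */
void attempt_succeeded(struct attempt_state *state)
{
    state->failed_attempts = 0;
}

/* Delay before the next attempt: base_s doubled once per consecutive
 * failed attempt and capped at max_s. */
uint32_t backoff_delay_s(uint32_t failed_attempts, uint32_t base_s,
                         uint32_t max_s)
{
    uint32_t delay = base_s;

    while (failed_attempts > 0 && delay < max_s) {
        /* Jump straight to the cap instead of overflowing on doubling. */
        delay = (delay > max_s / 2) ? max_s : delay * 2;
        failed_attempts--;
    }

    return (delay > max_s) ? max_s : delay;
}
```

With this ordering, a device that crashes on every connection cycle waits longer after each reset instead of reconnecting at its normal update interval.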

    Have you subscribed to modem domain event notifications, i.e. AT%MDMEV=1? This should notify you that your device is in a reset loop with a %MDMEV: RESET LOOP AT notification, sent after the modem's activation when the blocking starts or is still ongoing.

    Yes, we properly detected the reset loop. However, the reset loop recovery had an implementation error: it didn't update the watchdog timers set by the bootloader, causing the device to reset before the reset loop timer ticked down.
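
The failure mode described here can be illustrated with a wait loop that is split into slices shorter than the watchdog timeout. This is a sketch under assumptions: `wait_with_watchdog` and the two callback types are hypothetical names, the durations are parameters chosen by the caller, and the stub functions only record calls for host-side experiments.

```c
#include <stdint.h>

typedef void (*wdt_feed_fn)(void);
typedef void (*sleep_ms_fn)(uint32_t ms);

/* Wait for total_ms while feeding the watchdog at least once every
 * slice_ms. slice_ms must be shorter than the watchdog timeout.
 * Returns the number of feeds, or 0 if slice_ms is invalid. */
uint32_t wait_with_watchdog(uint64_t total_ms, uint32_t slice_ms,
                            wdt_feed_fn feed, sleep_ms_fn sleep_ms)
{
    uint32_t feeds = 0;

    if (slice_ms == 0) {
        return 0;
    }

    while (total_ms > 0) {
        uint32_t step = (total_ms < slice_ms) ? (uint32_t)total_ms : slice_ms;

        feed();
        feeds++;
        sleep_ms(step);
        total_ms -= step;
    }

    /* Feed once more so the caller starts with a fresh timeout. */
    feed();
    feeds++;
    return feeds;
}

/* Host-side stand-ins that only record what happened. */
static uint32_t stub_feed_count;
static uint64_t stub_slept_ms;
static void stub_feed(void) { stub_feed_count++; }
static void stub_sleep_ms(uint32_t ms) { stub_slept_ms += ms; }
```

On the device the callbacks would wrap the real watchdog feed and sleep calls, and `total_ms` would cover the remaining duration of the restriction.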

    We have now added input sanitization so the device shouldn't crash anymore, which should keep it from entering the reset loop restriction, and we think we have fixed the reset loop recovery implementation. However, we can't test the recovery method because we have difficulty getting the device into the reset loop restriction on the desk.

    The main point of this thread is that we would like to implement a method to easily and reliably activate the reset loop restriction so we can test if each firmware we release properly recovers from this "worst case scenario that should never happen".

    We want to check that the devices in the field are always able to recover from the restriction, even if the firmware is structured to prevent the restriction from happening.

    Kind regards, 

    S Biezeman
