This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

Bug in how bootloader treats GPREGRET

I suspect this is related to https://devzone.nordicsemi.com/f/nordic-q-a/36601/gpregret-and-nrf_bootloader-c-evolution-proposal (Case ID: 211047)

In nrf_bootloader_info.h you find this:

#define BOOTLOADER_DFU_GPREGRET_MASK            (0xB0)      /**< Magic pattern written to GPREGRET register to signal between main app and DFU. The 3 lower bits are assumed to be used for signalling purposes.*/
#define BOOTLOADER_DFU_START_BIT_MASK (0x01) /**< Bit mask to signal from main application to enter DFU mode using a buttonless service. */

As I understand the comment, B0 is used as a kind of signature to say "yes, the register content is related to bootloader", and reading the 3 lower bits are used to "signal", bit 0 (lowest bit) is used to signal DFU should start.

I wanted to create my own signal, for which I use bit 1. So at some point, I set it, which gives the register a value of B2.

Now let's have a look at code in dfu_enter_flags_clears:

if (NRF_BL_DFU_ENTER_METHOD_GPREGRET &&
(nrf_power_gpregret_get() & BOOTLOADER_DFU_START))
{
// Clear DFU mark in GPREGRET register.
nrf_power_gpregret_set(nrf_power_gpregret_get() & ~BOOTLOADER_DFU_START);
}

This code will be executed as B2 & B1 != 0... Shouldn't it be replaced with:

if (NRF_BL_DFU_ENTER_METHOD_GPREGRET &&
((nrf_power_gpregret_get() & 0xF8 == BOOTLOADER_DFU_GPREGRET_MASK) && (nrf_power_gpregret_get() & BOOTLOADER_DFU_START_BIT_MASK)))
{
// Clear DFU mark in GPREGRET register.
nrf_power_gpregret_set(nrf_power_gpregret_get() & ~BOOTLOADER_DFU_START);
}

Now let's have a look at this code from dfu_enter_check:

if (NRF_BL_DFU_ENTER_METHOD_GPREGRET &&
(nrf_power_gpregret_get() & BOOTLOADER_DFU_START))
{
NRF_LOG_DEBUG("DFU mode requested via GPREGRET.");
return true;
}

When GPRGRET == 0xB2, this code will get executed, which, as far as I understand, contradicts the documentation and the way it was intended to work (as BOOTLOADER_DFU_START_BIT_MASK is here unimportant, replacing BOOTLOADER_DFU_START with BOOTLOADER_DFU_GPREGRET_MASK would do the same job, while clearly ignoring the "signal flags", and then what's the point of having signal flags?)

So I guess I'm expecting 2 things:
- Confirmation that I understood properly the way "signal bits" were intended to be used.
- That case ID 211047 would not be regarded as a possible evolution, but rather as a bugfix, as the bootloader does not behave according to the way "described" by the documentation.

2.5.0.0
Parents
  • Hi,

    I am sorry for the delay. Thank you very much, bug reports of this nature are always appreciated!

    Looking into the matter I agree this is very likely to be a bug. Just as you write, the comments on the defines implies that there is a five bit magic pattern to signal it is a message from a buttonless DFU service to the DFU bootloader, and a three bit signal. But in dfu_enter_check, setting any of bits 7, 5, 4 or 0 will trigger DFU mode. This must be wrong.

    I have updated the feature request that was registered based on the other thread, and it is now a bug report including your observations and my own. Hopefully it will be fixed before too long.

    Regards,
    Terje

  • This is a SERIOUS Problem! We've just stumbled on this by ourselves and you confirmed it here. To be honest, this kind of destroys my trust in the bootloader and the SDK because it seems like a basic lack of understanding binary operators.

    Also, I expect that you do at least some sort of testing of such features and if you do, there is a major part missing:

    In this case I suspect the devs checked that the correct magic byte indeed does trigger the DFU, and maybe even if 0x00 doesn't trigger it. But apparently nobody ever checked if any other byte could trigger this as well! I can only hope that Nordic recognizes this problem and adds something to the standard testing definitions to also check for "false positives" not only true positives and false negatives!

  • Hi,

    Thank you for the feedback! Agreed, and noted. Your points have been forwarded to the SDK team. We do take testing seriously, and any feedback, also on test coverage, is appreciated.

    Regards,
    Terje

  • Hi,

    Yes. This was fixed prior to nRF5 SDK v15.3.0. It is not an issue for v15.3.0 and newer.

    Regards,
    Terje

  • We are using SDKv15.3.0 and I found this error accidentally. 

    We wanted to put a value in GPREGRET that the bootloader would ignore so the application would run and read the GPREGRET register and restore an operating mode that was active before reset.

    We found that almost any combination of bits GPREGRET caused the device to enter boot-mode after reset, but could only exit after a power off/on. I found the fault due to the bitmask operation, and corrected it as described above.

    If we had deployed this code on on devices in the field it would have been impossible to cause them to exit boot-loader mode once the value in GPREGET doesn't have the bits set according to the bitmask.  We deploy hundreds of devices in remote sites.  This would have been a major problem for us.

    This is a typical case where it works correctly when tested in the way it's supposed to work.  Clearly this was never tested with combinations of bits other that what is specified to get into and out of boot-loader mode.

    This is such an amateurish error.  

    If the boot-loader source code wasn't available, we would not have been able to fix it

Reply
  • We are using SDKv15.3.0 and I found this error accidentally. 

    We wanted to put a value in GPREGRET that the bootloader would ignore so the application would run and read the GPREGRET register and restore an operating mode that was active before reset.

    We found that almost any combination of bits GPREGRET caused the device to enter boot-mode after reset, but could only exit after a power off/on. I found the fault due to the bitmask operation, and corrected it as described above.

    If we had deployed this code on on devices in the field it would have been impossible to cause them to exit boot-loader mode once the value in GPREGET doesn't have the bits set according to the bitmask.  We deploy hundreds of devices in remote sites.  This would have been a major problem for us.

    This is a typical case where it works correctly when tested in the way it's supposed to work.  Clearly this was never tested with combinations of bits other that what is specified to get into and out of boot-loader mode.

    This is such an amateurish error.  

    If the boot-loader source code wasn't available, we would not have been able to fix it

Children
Related