nRF52832 SDK15.3: Flash erased at the last addresses; application runs but bootloader cannot be entered

Hello,

We have a problem that affects a few devices (< 0.1 %) in mass production. The devices are flashed, boot up, pass tests and are shipped to customers. Some customers later report that the bootloader does not work. These customers are generally familiar with the devices and can DFU other identical devices they have, so we do not think this is a user error.

When we process warranty returns, we can reproduce the problem. When we read back the flash, we see that the final addresses (0x6XXXX and up) are filled with 0xFF. The 0xFF region does not start at a flash page boundary, but it is possible that the application was written completely and the programming connector was disconnected before the bootloader was written, either partially or completely.

We do not keep readback flash images from devices after production. Our programming script runs a verify step, but an operator might not notice a verification error and pass the device on. It is also possible that the flash gets corrupted somehow, that the MBR/UICR words holding the bootloader start address are never written, or that they are overwritten later.

To summarize:

  • A few of our devices in the field have the last flash addresses erased, leaving the bootloader unusable while the application runs fine.
    • Devices are supposed to enter the bootloader when the button is pressed at boot. Working devices do so; faulty devices start the application instead. The button itself works in the application.
    • We have not tried entering the bootloader through the Buttonless DFU Service.
  • We suspect this might be a problem with our flashing process: the programmer is disconnected before flashing is complete and the verification failure is ignored by the operator.
  • Another possibility is that the firmware start addresses in UICR/MBR are wrong and the SoftDevice boots the application directly.
  • A third possibility is that the flash gets erased after programming, but we do not know what the mechanism could be.

Do you have any insight into what might be the root cause of our problem and how to avoid it? So far we have considered adding a bootloader checksum check to the application, so that the bootloader first checks the application CRC and the application then checks the bootloader CRC.
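
The cross-check idea can be sketched as below. This is a minimal sketch, assuming the expected CRC of the bootloader region is known at build time and stored somewhere the application can read it. The function names and the choice of a table-free CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320) are illustrative assumptions, not SDK APIs; on target, `region` would point at the bootloader's flash address (0x75000 on these devices) and `len` would be its size.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
 * Slow but table-free, which keeps the flash footprint small. */
uint32_t crc32_compute(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; XOR in the polynomial if the dropped bit was 1. */
            crc = (crc >> 1) ^ ((crc & 1u) ? 0xEDB88320u : 0u);
        }
    }
    return ~crc;
}

/* Hypothetical helper: true if the flash region matches the CRC recorded
 * at build time. An erased (all 0xFF) or partially written bootloader
 * will not match. */
bool region_crc_ok(const uint8_t *region, size_t len, uint32_t expected_crc)
{
    return crc32_compute(region, len) == expected_crc;
}
```

On the device, a call such as `region_crc_ok((const uint8_t *)0x75000, bl_size, expected_crc)` (where `bl_size` and `expected_crc` are hypothetical build-time values) could run at startup. Note that a bootloader update over DFU changes this CRC, so the expected value would have to travel with the bootloader image rather than be hard-coded in the application.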

  • Simple fix: Use the mergehex tool to merge the production hex files into a single file, then flash it in a single step. This will even be a little faster in production.

    I strongly suspect that the operators noticed parts passing tests before the last flashing steps were completed, and thus skipped those steps to save time (and thus money).

    Random erasure is possible (e.g. brownout or a software bug), but I would expect that to brick the devices rather than erase the bootloader completely while leaving the main app running fine.

    Another indication would be FDS/Peer Manager data. If FDS does not find where the bootloader is (or at least should be), it places its data at the end of flash. One would then expect to find FDS/Peer Manager data in the very last flash pages.
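
    The placement rule described above can be modelled roughly as follows. This is a simplified sketch under stated assumptions, not the SDK source: it assumes FDS takes the top of its area from the configured bootloader address, falls back to the physical end of flash when that address is erased (0xFFFFFFFF), and uses 4 KB pages. The parameters `reserved_pages` and `fds_pages` are assumed to correspond to the `FDS_VIRTUAL_PAGES_RESERVED` and `FDS_VIRTUAL_PAGES` settings in sdk_config.h.

```c
#include <stdint.h>

#define ADDR_UNSET 0xFFFFFFFFu  /* erased flash/UICR word */

/* Top of the FDS area: just below the bootloader if a bootloader address
 * is configured, otherwise the physical end of flash. Simplified model. */
uint32_t fds_area_end(uint32_t bootloader_addr, uint32_t code_pages,
                      uint32_t page_size, uint32_t reserved_pages)
{
    uint32_t end = (bootloader_addr != ADDR_UNSET) ? bootloader_addr
                                                   : code_pages * page_size;
    /* Optionally keep some pages free between FDS and the end address. */
    return end - reserved_pages * page_size;
}

/* Bottom of the FDS area, given how many pages FDS uses. */
uint32_t fds_area_start(uint32_t area_end, uint32_t fds_pages,
                        uint32_t page_size)
{
    return area_end - fds_pages * page_size;
}
```

    With 128 pages of 4 KB (512 KB) and three FDS pages, a device whose bootloader address was never written would, under this model, keep FDS data at 0x7D000 to 0x7FFFF, on top of the pages the bootloader and its settings normally occupy. A device with the bootloader address at 0x75000 would keep it at 0x72000 to 0x74FFF instead.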

  • Thank you for your reply.

    We are already using mergehex and flash one hex file that contains the SoftDevice, application, bootloader and bootloader settings page.

    We will check whether there is FDS data at the end, although I would be very curious why it would affect only a small portion of devices.

  • Hello,

    Please check whether your bootloader start address is stored at 0x10001014 or 0xFF8 (maybe these are written last?). The bootloader is not included in the boot sequence if neither of these is set (see Master boot record and SoftDevice reset procedure).
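
    The check can be expressed as a small pure function. This sketch assumes that the word at 0xFF8 takes precedence and that UICR.NRFFW[0] at 0x10001014 is used only when 0xFF8 is erased, which is our understanding of how the SDK's `BOOTLOADER_ADDRESS` logic works; if both are 0xFFFFFFFF, no bootloader is started. The function and macro names are hypothetical.

```c
#include <stdint.h>

#define MBR_BOOTLOADER_ADDR_WORD 0x00000FF8u /* word in the MBR page */
#define UICR_NRFFW0_ADDR         0x10001014u /* UICR.NRFFW[0] */
#define ADDR_UNSET               0xFFFFFFFFu /* erased word */

/* Returns the bootloader start address the boot sequence would use, or
 * ADDR_UNSET if neither location is programmed (application boots
 * directly). Assumed precedence: 0xFF8 first, then UICR. */
uint32_t resolve_bootloader_addr(uint32_t mbr_word, uint32_t uicr_word)
{
    return (mbr_word != ADDR_UNSET) ? mbr_word : uicr_word;
}

/* On target, the two words could be read like this (not runnable off-chip):
 *   uint32_t mbr_word  = *(volatile const uint32_t *)MBR_BOOTLOADER_ADDR_WORD;
 *   uint32_t uicr_word = *(volatile const uint32_t *)UICR_NRFFW0_ADDR;
 */
```

    On a readback from a faulty unit, both words being 0xFFFFFFFF would point to the second possibility in the original post; a valid 0x00075000 with an erased bootloader region would point to the first or third.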

  • Hello, 

    The bootloader start address is at 0xFF8. I confirmed on a working device that 0xFF8 contains 0x00075000, which matches the bootloader starting address.

    Possibly related, we have devices that are completely erased in the field: devzone.nordicsemi.com/.../nrf52832-sdk15-3-flash-erased-in-field

  • Hello,

    > Bootloader start address is at 0xFF8. I confirmed with a working device that 0xFF8 contains 0x00075000 which matches the bootloader starting address.

    Just to confirm, you read this from a device that had a non-working bootloader and not a fully functional one?

    > Possibly related, we have devices that are completely erased in the field:

    This is strange: the BPROT flash protection enabled by the bootloader on startup should prevent you from erasing the flash. You also won't get direct access to the NVMC as long as the SoftDevice is enabled. I assume the product is shipped without readback protection enabled, as you are able to read out the memory.

    > Just to confirm, you read this from a device that had a non-working bootloader and not a fully functional one?

    Good to confirm. This is a fully functional one. Regrettably, I don't have earlier units with a damaged bootloader available to check, and all the recent units I have received have been completely erased. I will update here once I get a unit with a broken bootloader on my desk.

    > I assume the product must be shipped without readback enabled as you are able to read out the memory.

    Correct, we are not worried about clones / hackers for our use cases. 

    Here is a screenshot of a hex file comparison of one unit that has been randomly erased and bricked. It boots up far enough to turn an LED on but does nothing else. Maybe it is in a reboot loop and the LED blinks too fast for me to see; I can check with an oscilloscope if necessary.

    The faulty unit is on the left; it suddenly switches to 0xFFFF for some lines. On the right is another unit flashed with the same hex file.

    Relevant Intel hex lines are:

    :020000040001F9
    :10A4500092F86530112B03D182F8691082F88E00D2
    :10A46000FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFC
    :10AFF000FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF61
    :10B000004FF00000F071B071F070B070307170717D

    So if my understanding is correct, addresses 0x0001A460 to 0x0001AFFF are erased, and 0x0001B000 onwards is OK.
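
    The address reading above follows from the Intel HEX format: a type 04 record sets the upper 16 bits of the address, each data record adds its own 16-bit offset, and all bytes of a record (including the trailing checksum) sum to 0 modulo 256. The decoder below is a minimal sketch written for this thread, not part of any Nordic tool; it verifies the checksum so that a genuinely erased record can be told apart from a corrupted line.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t  len;      /* number of data bytes */
    uint16_t offset;   /* 16-bit address field */
    uint8_t  type;     /* 0x00 data, 0x04 extended linear address, ... */
    uint8_t  data[255];
} hex_record_t;

/* Parse two hex digits; returns -1 on invalid input. */
static int hex_byte(const char *s)
{
    int v = 0;
    for (int i = 0; i < 2; i++) {
        char c = s[i];
        v <<= 4;
        if (c >= '0' && c <= '9')      v |= c - '0';
        else if (c >= 'A' && c <= 'F') v |= c - 'A' + 10;
        else if (c >= 'a' && c <= 'f') v |= c - 'a' + 10;
        else return -1;
    }
    return v;
}

/* Parse and checksum-verify one record such as ":10B00000...7D". */
bool hex_parse_record(const char *line, hex_record_t *rec)
{
    size_t n = strlen(line);
    if (n < 11 || line[0] != ':') return false;
    int len = hex_byte(line + 1);
    if (len < 0 || n < (size_t)(11 + 2 * len)) return false;

    uint8_t sum = 0;
    /* Byte order: length, address hi, address lo, type, data..., checksum. */
    for (int i = 0; i < 5 + len; i++) {
        int b = hex_byte(line + 1 + 2 * i);
        if (b < 0) return false;
        sum = (uint8_t)(sum + b);
        if (i >= 4 && i < 4 + len) rec->data[i - 4] = (uint8_t)b;
    }
    if (sum != 0) return false;  /* all bytes must sum to 0 mod 256 */

    rec->len    = (uint8_t)len;
    rec->offset = (uint16_t)((hex_byte(line + 3) << 8) | hex_byte(line + 5));
    rec->type   = (uint8_t)hex_byte(line + 7);
    return true;
}

/* Absolute address = (upper 16 bits from last type 04 record << 16) + offset. */
uint32_t hex_abs_addr(uint16_t upper, const hex_record_t *rec)
{
    return ((uint32_t)upper << 16) + rec->offset;
}
```

    Feeding it `:020000040001F9` gives an upper half of 0x0001, so `:10A46000...FC` starts at 0x0001A460 and `:10B00000...7D` at 0x0001B000, matching the range quoted above.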

    The application uses S132 v6.1.1, so as far as I can tell this is in the middle of the SoftDevice.
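
    To place an erased address quickly, a small classifier over the flash map helps. The layout below is an assumption: nRF52832 with 512 KB of flash, MBR in the first page, S132 v6.1.1 ending at 0x26000, the bootloader at 0x75000 as reported in this thread, and the MBR parameter page and bootloader settings page in the last two pages as in the SDK 15 defaults. The constants should be checked against the actual linker settings.

```c
#include <stdint.h>

typedef enum {
    REGION_MBR,
    REGION_SOFTDEVICE,
    REGION_APP,          /* application, plus FDS data near its top */
    REGION_BOOTLOADER,
    REGION_MBR_PARAMS,
    REGION_BL_SETTINGS,
    REGION_OUT_OF_RANGE
} flash_region_t;

/* Assumed layout; adjust to the real linker scripts. */
#define SD_START          0x00001000u
#define APP_START         0x00026000u
#define BL_START          0x00075000u
#define MBR_PARAMS_START  0x0007E000u
#define BL_SETTINGS_START 0x0007F000u
#define FLASH_END         0x00080000u

flash_region_t classify_addr(uint32_t addr)
{
    if (addr < SD_START)          return REGION_MBR;
    if (addr < APP_START)         return REGION_SOFTDEVICE;
    if (addr < BL_START)          return REGION_APP;
    if (addr < MBR_PARAMS_START)  return REGION_BOOTLOADER;
    if (addr < BL_SETTINGS_START) return REGION_MBR_PARAMS;
    if (addr < FLASH_END)         return REGION_BL_SETTINGS;
    return REGION_OUT_OF_RANGE;
}
```

    Under this layout, 0x0001A460 to 0x0001AFFF falls entirely inside the SoftDevice, while the erased tail in the original report (from 0x6XXXX to the end of flash) covers the top of the application region and the whole bootloader.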

    The symptom is different from the one originally described, but the root cause is the same: a portion of flash has gone missing. I can share the full hex files if necessary.

  • Hello,

    I verified that the ERASEALL operation is blocked when one or more regions are protected with BPROT.

    (Screenshot: BPROT protection enabled for the bootloader and MBR regions)

    (Screenshot: bootloader main() where BPROT is enabled)

    Can you please verify that these BPROT bits are set by your bootloader as well?
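
    To know which bits to look for, the protected range can be converted into CONFIGn masks. On the nRF52832, BPROT has four registers (CONFIG0 to CONFIG3), each bit protecting one 4 KB block, so block `b` maps to bit `b % 32` of CONFIG`b / 32`. The helper below is a hypothetical sketch for computing the expected masks; on target, the result would be compared against `NRF_BPROT->CONFIG0` to `CONFIG3` as named in the nRF52832 MDK headers.

```c
#include <stdint.h>

#define BPROT_BLOCK_SIZE 0x1000u /* 4 KB per protection block */
#define BPROT_NUM_REGS   4u      /* CONFIG0..CONFIG3, 32 blocks each */

/* Fill masks[0..3] with the CONFIGn bits covering [start, start + size).
 * Returns 0 on success, -1 if start/size are not block aligned or the
 * range exceeds 512 KB (in which case all masks are left at 0). */
int bprot_masks(uint32_t start, uint32_t size, uint32_t masks[BPROT_NUM_REGS])
{
    for (uint32_t r = 0; r < BPROT_NUM_REGS; r++) masks[r] = 0;
    if ((start | size) & (BPROT_BLOCK_SIZE - 1u)) return -1;

    uint32_t first = start / BPROT_BLOCK_SIZE;
    uint32_t count = size / BPROT_BLOCK_SIZE;
    if (first + count > BPROT_NUM_REGS * 32u) return -1;

    for (uint32_t b = first; b < first + count; b++) {
        masks[b / 32u] |= 1u << (b % 32u);
    }
    return 0;
}
```

    For example, a hypothetical range from 0x75000 to the end of flash (11 blocks) yields 0xFFE00000 in CONFIG3 and zero elsewhere, while the MBR page alone sets only bit 0 of CONFIG0.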

    Otso Jousimaa said:

    > So if my understanding is correct, addresses 0x0001A460 .... 0x0001AFFF are erased and 0x0001B000 and onwards is OK.

    > The application uses S132 v6.1.1, so this is in middle of the softdevice as far as I can tell.

    I still don't understand how the entire flash could have been erased. But this particular case would likely have been avoided if the bootloader had correctly enabled write protection for the SoftDevice and application in nrf_bootloader_app_start_final().

    This line is currently failing because the size argument is not aligned to a flash page boundary:

        ret_val = nrf_bootloader_flash_protect(0,
                                               nrf_dfu_bank0_start_addr() + s_dfu_settings.bank_0.image_size,
                                               false);

    Fix:

        ret_val = nrf_bootloader_flash_protect(0,
                                               nrf_dfu_bank0_start_addr() + ALIGN_TO_PAGE(s_dfu_settings.bank_0.image_size),
                                               false);
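
    The effect of the fix can be checked with plain arithmetic. The sketch below assumes a 4 KB page and that the protect call rejects any size that is not a whole number of pages; `protect_args_ok()` is a hypothetical stand-in for that check, and `align_to_page()` is assumed to behave like the `ALIGN_TO_PAGE` macro used in the fix (round up to the next page boundary). The image size in the usage note is a made-up example.

```c
#include <stdbool.h>
#include <stdint.h>

#define CODE_PAGE_SIZE 0x1000u /* nRF52832 flash page: 4 KB */

/* Round x up to the next multiple of the page size (x itself if aligned). */
uint32_t align_to_page(uint32_t x)
{
    return (x + CODE_PAGE_SIZE - 1u) & ~(CODE_PAGE_SIZE - 1u);
}

/* Hypothetical model of the argument check that makes the original call
 * fail: both address and size must be whole pages. */
bool protect_args_ok(uint32_t address, uint32_t size)
{
    return ((address | size) & (CODE_PAGE_SIZE - 1u)) == 0u;
}
```

    With bank 0 starting at 0x26000 and a hypothetical image size of 0x3E3C4, the original size argument 0x643C4 is rejected, while the aligned one, 0x26000 + 0x3F000 = 0x65000, is accepted. Rounding up rather than down keeps the last, partially filled page of the image inside the protected range.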
