
Internal flash corruption in MBR_PARAM_ADDR flash sector

Hello devzone members,

We are producing a device based on the nRF52840 that uses the MBR + bootloader and an app with SDK 15.3.0.

We have set up the MBR with MBR_BOOTLOADER_ADDR set to the address of the bootloader in flash and MBR_PARAM_ADDR set to the address of the page just before it.

I'm working on a bug that locks devices in MBR because of a corruption of internal flash. 

The corrupted page is the page pointed by MBR_PARAM_ADDR, and it appears that the MBR reads an unexpected value from it, which locks the device.

We don't use any SD_MBR_COMMAND function that requires this page in flash, so we'll remove it in future deployments, which will resolve the lock bug.

But I need to understand why the memory was corrupted in the first place. I reviewed the code and ran multiple tests, and my setup won't reproduce the corruption. We use only NVMC write functions in the app and only on firmware release.

It could be that the app corrupts memory randomly and we only notice when it hits the MBR parameter page, because that locks the device. I don't think so, though, because it would have broken the app in other ways if that were the case.

My guess is that something MBR-related could corrupt the memory, because we set MBR_PARAM_ADDR, which definitely causes the MBR to access that flash page even if no SD_MBR_COMMAND is used. Could you give me your opinion on this?

Thanks in advance,

Regards,

Aloïs KYROU

  • Hi,

    1) How many devices did the issue occur on? 1 or multiple?

    2) Is this product returned from the field/customers?

    3) Are you able to reproduce the issue?

    4)

    The corrupted page is the page pointed by MBR_PARAM_ADDR,

    How do you determine that? What is the content of that page now, and what was it before?

    PS:

    and an app with SDK 15.3.0.

    There are known bugs in FDS in nRF5 SDK 15.3 that could cause file corruption and that have been fixed in later SDK versions. For example, from the SDK 16 release notes:

    - FDS: fixed two bugs where a power loss at very specific times during garbage
      collection could corrupt the file system, making FDS unable to initialize and return
      FDS_ERR_NO_PAGES on initialization.

    But these FDS bugs do not corrupt the bootloader, only the FDS file system.

    BR,

    Sigurd

  • Hello Sigurd and thanks for your answer,

    I was away for a bit so sorry for the late answer. 

    1) The issue occurred on multiple devices.

    2) Yes

    3) Not so far. I have been running demo tests on multiple devices to try to reproduce it, without success.

    4) I determined that by looking at a dump of the internal flash. There is data written at the address pointed to by MBR_PARAM_ADDR (in my case 0xFB00) where there shouldn't be any; in addition, the data is different on each affected device.

    I can't upload files, so it is hard to show the content of the flash properly, but it is memory in the reset state with data written in a few places:


    ชV~ใชpm  <G                 )  3  ำฏ=Œ @Œ @ำฏ=3  )  


    In working devices the content of this page is only memory in reset state. 

    PS) I'm not using FDS in the app, there are only calls to "nrf_nvmc_write_byte(s)", so changing SDK shouldn't affect this bug, should it?

    BR,

    Aloïs

  • Hi,

    1)

    AKYR said:
    MBR_PARAM_ADDR (in my case 0xFB00)

    You mean 0xFB000 ?

    2) Have you done any DFU of the bootloader itself?

    3) Can you use nrfjprog --memrd to read the data at MBR_PARAM_ADDR?

    4)

    We use only NVMC write functions in the app and only on firmware release.

    Are you writing to MBR_PARAM_ADDR with NVMC functions?

  • Hello,

    1) Yes, sorry.

    2) We don't use DFU or any SD_MBR_COMMAND, but we provided MBR_PARAM_ADDR and a free page in case we decide to use it in the future. We now plan to remove it, but first we need to make sure it is the cause of the corruption; otherwise we would just dodge the problem and find it in another form later.

    3) bin-host/nrfjprog/nrfjprog.exe --memrd 0xFFC
    0x00000FFC: 000FB000 |....|

    bin-host/nrfjprog/nrfjprog.exe --memrd 0xfb000
    0x000FB000: E37E56AA |.V~.|

    4) No, MBR_PARAM_ADDR is written at link time.

    Thanks for your answer !
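
As a host-side sanity check of the two reads in the post above, here is a minimal Python sketch that parses `nrfjprog --memrd` output lines and confirms that the word stored at 0xFFC is a page-aligned address and that the first word of that page is not in the erased state. `parse_memrd_line` is a hypothetical helper written for this illustration, not part of nrfjprog, and the line format is assumed to be the one shown in this thread.

```python
import re

PAGE_SIZE = 0x1000         # nRF52840 flash page size in bytes
ERASED_WORD = 0xFFFFFFFF   # an erased flash word reads back as all ones

def parse_memrd_line(line):
    """Parse one 'nrfjprog --memrd' output line into (address, [words])."""
    match = re.match(r"\s*0x([0-9A-Fa-f]{8}):((?:\s+[0-9A-Fa-f]{8})+)", line)
    if match is None:
        raise ValueError(f"not a memrd line: {line!r}")
    address = int(match.group(1), 16)
    words = [int(word, 16) for word in match.group(2).split()]
    return address, words

# The two output lines quoted in the post above.
_, (param_page,) = parse_memrd_line("0x00000FFC: 000FB000 |....|")
page_addr, (first_word,) = parse_memrd_line("0x000FB000: E37E56AA |.V~.|")

print("pointer stored at 0xFFC:", hex(param_page))
print("pointer is page-aligned:", param_page % PAGE_SIZE == 0)
print("second read targets that page:", page_addr == param_page)
print("first word is programmed:", first_word != ERASED_WORD)
```

This only confirms that the pointer is sane and the page is no longer blank; it says nothing about what wrote to it.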

  • Hi,
    1)

    AKYR said:
    bin-host/nrfjprog/nrfjprog.exe --memrd 0xfb000
    0x000FB000: E37E56AA |.V~.|

    Could you read some more bytes from this address? It would be interesting to see more of the page content.

    Try nrfjprog.exe --memrd 0xfb000 --n 4096

    2)

    I'm working on a bug that locks devices in MBR because of a corruption of internal flash. 

    This seems to be from the hex file you loaded into the nRF Connect Programmer app. Could you post a screenshot of how it looks when you read it from the flash?

    And the address range for the different regions?
    Something like this:

    (make sure to use the latest version of the programmer app)


    BR,

    Sigurd

  • Hi !

    1) ../bin-host/nrfjprog/nrfjprog.exe --memrd 0xfb000 --n 4096

    0x000FB000: E37E56AA FFFFFFAA 20006D70 2000473C |.V~.....pm. <G. |
    0x000FB010: 00000000 00000001 00000000 00000000 |................|
    0x000FB020: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF |................|
    0x000FB030: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF |................|
    0x000FB040: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF |................|
    0x000FB050: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF |................|
    0x000FB060: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF |................|
    0x000FB070: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF |................|
    0x000FB080: 00000729 00000733 3DAF01D3 0840008C |)...3......=..@.|
    0x000FB090: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF |................|
    0x000FB0A0: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF |................|
    0x000FB0B0: 0840008C 3DAF01D3 00000733 00000729 |..@....=3...)...|
    0x000FB0C0: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF |................|
    0x000FB0D0: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF |................|
    0x000FB0E0: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF |................|

    ...

    ...

    0x000FBFC0: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF |................|
    0x000FBFD0: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF |................|
    0x000FBFE0: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF |................|
    0x000FBFF0: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF |................|

    2) I can't use the insert image function on the post so here is an imgur link:

    https://imgur.com/a/thl0N1f

    Not that it should matter, but the top of the app has 3 more small splits of memory:

    6299 bytes from 0xA63B9 to 0xA7C53

    5955 bytes from 0xA7C95 to 0xA93D7

    8 bytes from 0xA9419 to 0xA9420

    I don't know why, but the new programmer app seems to split the memory into many more chunks.

    Best regards,

    Aloïs

  • Hi,

    The data does not seem to be completely random, but it's not a valid settings page either, for example. If you don't do any DFU, don't use FDS, and don't write to this address with NVMC functions, then it's not easy to say why this data has been written.
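
To make the "not completely random" observation concrete, here is a minimal Python sketch that takes rows copied from the dump posted earlier in the thread, lists the rows that contain programmed words, and flags two patterns visible in that data: one row is the word-reversed copy of another, and two words fall inside the nRF52840 RAM address range. `parse_dump` and the variable names are hypothetical, chosen for this illustration only; a value in the RAM range is not proof that it is a pointer, and none of this identifies what wrote the data.

```python
import re

ERASED_WORD = 0xFFFFFFFF                     # erased flash reads back as all ones
RAM_START, RAM_END = 0x20000000, 0x20040000  # nRF52840 data RAM, 256 KB

# Rows copied from the dump posted above (one erased row kept for contrast).
DUMP = """\
0x000FB000: E37E56AA FFFFFFAA 20006D70 2000473C |.V~.....pm. <G. |
0x000FB010: 00000000 00000001 00000000 00000000 |................|
0x000FB020: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF |................|
0x000FB080: 00000729 00000733 3DAF01D3 0840008C |)...3......=..@.|
0x000FB0B0: 0840008C 3DAF01D3 00000733 00000729 |..@....=3...)...|
"""

def parse_dump(text):
    """Map each row address to its list of 32-bit words."""
    rows = {}
    for line in text.splitlines():
        match = re.match(r"\s*0x([0-9A-Fa-f]{8}):((?:\s+[0-9A-Fa-f]{8})+)", line)
        if match:
            rows[int(match.group(1), 16)] = [int(w, 16) for w in match.group(2).split()]
    return rows

rows = parse_dump(DUMP)

# Rows containing at least one programmed (non-erased) word.
written = {addr: words for addr, words in rows.items()
           if any(word != ERASED_WORD for word in words)}
addrs = sorted(written)

# Pairs of written rows holding the same words in reverse order.
mirrored = [(a, b) for i, a in enumerate(addrs) for b in addrs[i + 1:]
            if written[a] == written[b][::-1]]

# Words whose value falls inside the RAM address range.
ram_like = [word for addr in addrs for word in written[addr]
            if RAM_START <= word < RAM_END]

for addr in addrs:
    print(f"0x{addr:08X}:", " ".join(f"{word:08X}" for word in written[addr]))
print("mirrored rows:", [(hex(a), hex(b)) for a, b in mirrored])
print("RAM-range values:", [hex(word) for word in ram_like])
```

Running the same comparison on dumps from several affected devices might show which of these patterns are common to all of them and which differ per device.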
