FDS CRC errors

We have firmware that uses FDS with #define FDS_CRC_CHECK_ON_READ 1 and #define FDS_CRC_CHECK_ON_WRITE 1.

In my testing it works fine, but another engineer at a different location is seeing a lot of CRC errors (0x860C) on multiple devices, and eventually the flash seems to fill up. What can cause CRC errors? I suspected a low supply voltage, but we haven't confirmed or rejected that theory yet.

  • Another thing to note: the same engineer seems to have trouble flashing with nrfjprog on Windows (I am using a Mac). It doesn't seem to work reliably. The command appears to complete, but on first boot the firmware can still read values via FDS, even though the whole flash should have been erased (nrfjprog --chiperase --reset --program <file>).

  • Hi,

    Do you see this flash corruption issue on just a single board, or on multiple boards? Do you know if the boards have been used in a way that could have worn out the flash (say, stuck in a loop over the weekend with some test firmware that writes and erases flash back-to-back)? Another relevant point is the supply voltage. It is not unheard of for a power supply issue to cause flash corruption. As long as the supply voltage is above the minimum (1.7 V), there should be no problem, but if it drops further, that will cause problems. Normally the CRC checks on write and read should handle it, though, so you don't lose or operate on corrupt data.

    There is a potential issue during garbage collection, when the page tags are updated. So I would recommend not doing garbage collection during startup. If the battery is depleted to the point where it cannot supply the minimum required voltage, the device could get stuck in a loop: brown-out reset, boot, start of garbage collection, another brown-out reset, and so on.

    Also, I am not aware of any issue where nrfjprog does not work reliably on Windows. Can you say more about this?
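To make the garbage collection advice above concrete, here is a minimal sketch of a guard that decides whether garbage collection may start. Everything except the 1.7 V minimum is an assumption for illustration: the function name `gc_allowed`, the 300 mV margin, and the brown-out counter (which the caller would have to maintain, for example in retained RAM) are hypothetical and not part of FDS.

```c
#include <stdbool.h>
#include <stdint.h>

#define VDD_MIN_MV        1700u /* minimum supply voltage quoted in the reply above */
#define GC_MARGIN_MV       300u /* assumed headroom for the extra load during erase */
#define GC_MAX_BROWNOUTS     2u /* assumed limit on back-to-back brown-out resets */

/* Decide whether it is reasonable to start garbage collection now.
 *   vdd_mv                measured supply voltage in millivolts
 *   consecutive_brownouts number of brown-out resets in a row (tracked by the caller)
 *   booting               true while the firmware is still in its startup path */
bool gc_allowed(uint32_t vdd_mv, uint32_t consecutive_brownouts, bool booting)
{
    if (booting) {
        return false; /* never collect during startup */
    }
    if (consecutive_brownouts >= GC_MAX_BROWNOUTS) {
        return false; /* supply looks unstable, do not start a long erase */
    }
    return vdd_mv >= (VDD_MIN_MV + GC_MARGIN_MV);
}
```

In firmware, the application would call `fds_gc()` only when a check like this returns true, and otherwise defer collection until the supply has recovered.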

  • We have seen it on multiple boards, including new boards. We are using a DC power supply for testing and we believe the supply voltage to the board should be ok. We are regularly filling the flash and needing to perform garbage collection, but not specifically on startup, just when the flash is full.

    I'm not sure what else to say about nrfjprog. The command completes successfully but doesn't seem to erase the board completely. Using JFlash seems to be more reliable for the other engineer.

  • So you never see this issue, while your colleague sees it repeatedly, right? Are there any differences between your setups, both in the hardware and, importantly, in the firmware? Are you able to reproduce the corruption if you use exactly the same firmware as your colleague and test in exactly the same way (so that you could potentially trigger the same bug)?

    Also, do you see CRC errors on write or read (or both)? If you see them on write, that could indicate an issue with the source buffer, for instance that it is modified before the flash operation has completed. But if it is successfully written with data intact, and is subsequently corrupted, that would be something else.

    Have you inspected the corrupt data to see if there is a pattern there?
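One way to look for a pattern is to compare the bytes you expected with the bytes read back and count which direction the bits flipped. On NOR flash a write can only clear bits (1 to 0) and only an erase sets them back to 1, so the direction of the flips can hint at whether a write or an erase was disturbed. This is a rough host-side sketch; `flip_stats_t` and `classify_flips` are hypothetical names, not part of any SDK.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t cleared;          /* bits that were expected 1 but read back 0 */
    size_t set;              /* bits that were expected 0 but read back 1 */
    size_t first_bad_offset; /* offset of the first differing byte, or len if none */
} flip_stats_t;

/* Count the set bits in one byte. */
static unsigned popcount8(uint8_t v)
{
    unsigned n = 0;
    while (v != 0) {
        n += v & 1u;
        v = (uint8_t)(v >> 1);
    }
    return n;
}

/* Compare expected data with what was read back and classify the bit flips. */
flip_stats_t classify_flips(const uint8_t *expected, const uint8_t *actual, size_t len)
{
    flip_stats_t s = { 0, 0, len };
    for (size_t i = 0; i < len; i++) {
        uint8_t diff = (uint8_t)(expected[i] ^ actual[i]);
        if (diff == 0) {
            continue;
        }
        if (s.first_bad_offset == len) {
            s.first_bad_offset = i;
        }
        s.cleared += popcount8((uint8_t)(diff & expected[i]));
        s.set     += popcount8((uint8_t)(diff & actual[i]));
    }
    return s;
}
```

If the differences cluster at particular offsets or in one direction across several units, that is worth including in the report.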

    Regarding the nrfjprog issues, they could be related, but they could also be independent. Does it always work with JFlash, or is it only a perceived difference in how often the issue occurs? Does it help to, for instance, connect the debugger's USB cable directly to the computer rather than via a USB hub, or to use a different computer?
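To illustrate the write-side scenario mentioned above (a source buffer modified before the flash operation has completed), here is a toy host-side model. It assumes the checksum is taken when the write is queued and that the data is copied to flash later from the caller's buffer. The CRC is a generic CRC-16/CCITT-FALSE used as a stand-in, and `pending_write_t`, `queue_write`, and `complete_write` are hypothetical names, not the FDS API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF). */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int b = 0; b < 8; b++) {
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* A queued write keeps a pointer to the caller's buffer, not a copy. */
typedef struct {
    const uint8_t *src;
    size_t         len;
    uint16_t       crc; /* checksum taken at queue time */
} pending_write_t;

pending_write_t queue_write(const uint8_t *src, size_t len)
{
    pending_write_t w = { src, len, crc16_ccitt(src, len) };
    return w;
}

/* Carry out the deferred write into a simulated flash area.
 * Returns 1 if the written data still matches the queued checksum, else 0. */
int complete_write(const pending_write_t *w, uint8_t *flash)
{
    memcpy(flash, w->src, w->len);
    return crc16_ccitt(flash, w->len) == w->crc;
}
```

If the caller changes the buffer between `queue_write` and `complete_write`, the check fails even though the flash itself is healthy. In real firmware the usual remedy is to keep the record data in static storage and leave it untouched until the write event for that record has been received.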

  • My colleague shipped a problem unit to me, and I was able to reproduce the issue. It can be reproduced even after a complete erase and reprogram, so the problem appears to be tied to the hardware. This presumably narrows the cause down to:

    • A faulty batch of Nordic modules
    • A PCB manufacturing defect
    • Something that happened after manufacture that damaged it.

    Can you suggest anything we can try? Would it be possible to send Nordic a unit for further analysis?
