Beware that this post is related to an SDK in maintenance mode
More Info: Consider nRF Connect SDK for new designs

nRF52840 FATFS file corruption

I am using the nRF52840 (with SDKv17.0.2 + s140_7.2.0) and made a custom FW based on the ble_app_hrs example. My custom HW has a 128MB MX66U1G45G flash memory chip.

The system can stream notification data indefinitely without any issue. The notification packets are 242B long (max MTU is 245B), and the data consists of sequential 3B integer values starting at the 3rd byte. The first two bytes are a packet counter (0-255) and a packet ID (used by the SW). Thus the first packet would be [00 0b 00 00 00 00 00 01 00 00 02 00 00 03 00 00 04 00 00 05 … 00 00 4f] and the second packet would be [01 0b 00 00 50 00 00 51 00 00 52 00 00 53 00 00 54 00 00 55 … 00 00 9f]. The data rate is ~60 kB/s (100 3B integers are queued every 5ms by a timer), and the data is streamed over BLE as soon as 242B (80 3B integers plus the two header bytes) are ready.
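The packet layout above can be captured in a small helper. This is a sketch, assuming the 24-bit values are stored most-significant byte first (as the example bytes `00 00 01` suggest); `build_packet` is a hypothetical name. At 100 values × 3B every 5ms, the producer generates 300B per tick, or 60,000 B/s, which matches the ~60 kB/s figure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PKT_LEN         242u  /* 2 header bytes + 80 x 3-byte values */
#define VALUES_PER_PKT  80u

/* Hypothetical helper: fill one notification packet.
 * Byte 0 = packet counter (wraps at 256), byte 1 = packet ID,
 * then 80 consecutive 24-bit values, MSB first. */
static void build_packet(uint8_t out[PKT_LEN], uint8_t counter, uint8_t id,
                         uint32_t first_value)
{
    assert(2u + 3u * VALUES_PER_PKT == PKT_LEN);
    out[0] = counter;
    out[1] = id;
    for (uint32_t i = 0; i < VALUES_PER_PKT; i++) {
        uint32_t v = (first_value + i) & 0xFFFFFFu; /* values are 24-bit */
        uint8_t *p = &out[2 + 3 * i];
        p[0] = (uint8_t)(v >> 16);
        p[1] = (uint8_t)(v >> 8);
        p[2] = (uint8_t)v;
    }
}
```

Packet n would then be built with `build_packet(buf, (uint8_t)n, 0x0b, n * 80)`.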

I am also trying to store these 242B packets to flash memory as a file via FATFS so that it can be downloaded by Windows via USB MSC. As the 242B packets are generated and queued for BLE transmission, they are also stored in a ping-pong type buffer. When 4096B of data has accumulated, f_write() is called. The process continues until a stop command is issued or the flash memory becomes full. While flash logging is enabled, USB MSC is disabled, and when flash logging stops, the system becomes a USB MSC device.
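A minimal sketch of the ping-pong buffering described above, assuming two 4096-byte buffers and a hypothetical `flush` callback standing in for the code that calls f_write(). Note that 4096 is not a multiple of 242 (16 packets are 3872B, 17 are 4114B), so packets straddle buffer boundaries, and each full buffer must be written out before the other buffer fills and the roles swap back. Here the callback runs synchronously for simplicity; on target it would more likely signal the main loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LOG_BUF_SIZE 4096u

typedef void (*flush_fn)(const uint8_t *data, size_t len, void *ctx);

typedef struct {
    uint8_t buf[2][LOG_BUF_SIZE];
    size_t fill;     /* bytes used in the active buffer */
    int active;      /* index of the buffer currently being filled */
    flush_fn flush;  /* hypothetical hook where f_write() would happen */
    void *ctx;
} pingpong_t;

static void pingpong_init(pingpong_t *pp, flush_fn flush, void *ctx)
{
    assert(flush != NULL);
    memset(pp, 0, sizeof(*pp));
    pp->flush = flush;
    pp->ctx = ctx;
}

/* Append data; whenever the active buffer is full, swap buffers and
 * hand the full one to the flush hook. A packet may be split. */
static void pingpong_push(pingpong_t *pp, const uint8_t *data, size_t len)
{
    while (len > 0) {
        size_t room = LOG_BUF_SIZE - pp->fill;
        size_t n = len < room ? len : room;
        memcpy(&pp->buf[pp->active][pp->fill], data, n);
        pp->fill += n;
        data += n;
        len -= n;
        if (pp->fill == LOG_BUF_SIZE) {
            int full = pp->active;
            pp->active ^= 1;
            pp->fill = 0;
            pp->flush(pp->buf[full], LOG_BUF_SIZE, pp->ctx);
        }
    }
}

/* Example sink for illustration: counts flushed calls and bytes. */
typedef struct { size_t calls; size_t bytes; } flush_stats_t;

static void count_flush(const uint8_t *data, size_t len, void *ctx)
{
    (void)data;
    flush_stats_t *s = ctx;
    s->calls++;
    s->bytes += len;
}
```

After 17 packets, one 4096-byte chunk has been flushed and the last 18 bytes of packet 17 sit at the start of the other buffer.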

This works sometimes, but unfortunately I cannot reproduce good (or bad) files reliably, so I am unable to describe the symptoms clearly. For example, in one scenario, I quick format (either from Windows or with f_mkfs()). Then I create and write to file0. After closing and checking the contents of file0, some of the integers may be repeated, causing errors. Then I create and write to file1 without formatting. Closing and checking the contents of file1 shows that it is okay. Then if I recreate file0, it checks out as fine, but rechecking file1 shows that it has been modified and some of the integers are again repeated, even though I did nothing with this file. Then I go back and recreate file1, and it checks out as good, whereas file0 has now become bad again. After repeating this for a while, Windows may eventually pop up a message saying that the drive needs to be scanned and repaired.

Another scenario is this: after formatting, I create and check a single file, and it shows the errors. Then I recreate and check the file, and it's okay. But this is seemingly random, as multiple runs can give either a good or a bad file. If I let the flash memory fill up (the FW closes the file when the memory is full), Windows says that the disk is bad and unrepairable. I then recreate the file and check it multiple times, and it shows up as good or bad on different runs (after the first re-creation, Windows has to scan/repair the drive).
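Since the symptoms are judged by checking the file contents, a host-side checker that reports where the first discrepancy occurs can help correlate corruption with 4096-byte write chunks or 4 KB flash sectors. This is a sketch, assuming the file holds back-to-back 242-byte packets in the format described earlier, that the ID byte is constant, and that values continue across packets; `check_log` is a hypothetical name. A trailing partial packet is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PKT_LEN        242u
#define VALUES_PER_PKT 80u

/* Returns -1 if every complete packet is consistent, otherwise the byte
 * offset of the first field that breaks the expected sequence. */
static long check_log(const uint8_t *data, size_t len, uint8_t expected_id)
{
    assert(data != NULL || len == 0);
    uint32_t expected_value = 0;
    for (size_t rec = 0; rec * PKT_LEN + PKT_LEN <= len; rec++) {
        size_t base = rec * PKT_LEN;
        const uint8_t *p = data + base;
        if (p[0] != (uint8_t)(rec & 0xFF)) return (long)base;  /* counter */
        if (p[1] != expected_id) return (long)(base + 1);       /* packet ID */
        for (size_t i = 0; i < VALUES_PER_PKT; i++) {
            const uint8_t *v = p + 2 + 3 * i;
            uint32_t value = ((uint32_t)v[0] << 16) | ((uint32_t)v[1] << 8) | v[2];
            if (value != (expected_value & 0xFFFFFFu))
                return (long)(base + 2 + 3 * i);                 /* data */
            expected_value++;
        }
    }
    return -1;
}
```

Reading the file into memory and printing `offset / 4096` and `offset % 4096` for a bad result shows, for example, whether errors always start at a chunk boundary.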

Slowing down the data rate does not help. Any ideas on the issue or advice on debugging it would be greatly appreciated!

  • Hi there,

    I'll be back next week with some more info.

    We're currently short on staff due to the summer vacation.

    Thank you for your patience.

    regards

    Jared 

  • Hi Jared, thanks for the follow up. I'd like to focus on the format issue, where the file is always bad after the format. My application only needs one file up to the maximum amount of flash memory, so f_mkfs() will always be called before the file creation and subsequent file writes. Thanks!

  • Hi Jared, it looks like it's the NRF_BLOCK_DEV_QSPI_FLAG_CACHE_WRITEBACK issue described in this link, as you noted above. I had seen the link before opening this ticket, but it says the issue was fixed in SDK v17.0.2. I just looked at the code in nrf_block_dev_qspi.c, and it does not match the fix in that link. So I replaced NRF_BLOCK_DEV_QSPI_FLAG_CACHE_WRITEBACK with 0 in NRF_BLOCK_DEV_QSPI_DEFINE(...), and all seems okay so far, but I am still testing.
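For reference, the workaround would look roughly like this. It is modeled on the QSPI block-device definition in the SDK's usbd_msc example; the instance name, the 512-byte block size, and the info strings are taken from that example and may differ in a custom project (the `NFR_` prefix is spelled that way in the SDK example).

```c
/* QSPI block device with the write-back cache disabled (flags = 0) as a
 * workaround for the cache bug; the example passes
 * NRF_BLOCK_DEV_QSPI_FLAG_CACHE_WRITEBACK as the second argument. */
NRF_BLOCK_DEV_QSPI_DEFINE(
    m_block_dev_qspi,
    NRF_BLOCK_DEV_QSPI_CONFIG(
        512,
        0,
        NRF_DRV_QSPI_DEFAULT_CONFIG
     ),
     NFR_BLOCK_DEV_INFO_CONFIG("Nordic", "QSPI", "1.00")
);
```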

    It's not clear to me what NRF_BLOCK_DEV_QSPI_FLAG_CACHE_WRITEBACK does and what the drawbacks of not using it are. Kindly clarify that. Also, kindly confirm that the fix has not been implemented in SDK v17.0.2. If it indeed has not been implemented, kindly post a good nrf_block_dev_qspi.c that I can use. I am afraid of hacking the file and breaking something else. Thanks!

  • Hi,

    The fix has been included in the latest SDK v17.1.0. Attached is the nrf_block_dev_qspi.h file from SDK v17.1.0.

    Tosa said:
    It's not clear to me what NRF_BLOCK_DEV_QSPI_FLAG_CACHE_WRITEBACK does and what are the drawbacks of not using it.

    The driver uses a cache to make flash accesses over QSPI faster. NRF_BLOCK_DEV_QSPI_FLAG_CACHE_WRITEBACK enables this cache. The drawback of not using the cache is less efficient flash access. The problem was a bug that didn't handle all possible states when using the cache, resulting in corruption.
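To illustrate what a write-back cache buys on NOR flash, here is a simplified host-side model, not the SDK's driver. FatFs writes 512-byte sectors, but NOR flash can only be erased in larger units (4 KB is assumed here), so a driver without a cache has to erase and reprogram a whole unit for each sector write. The model keeps one erase unit in RAM, marks it dirty on writes, and writes it back only when another unit is accessed or on an explicit flush. All names are hypothetical, and the model does not reproduce the specific state-handling bug fixed in SDK v17.1.0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 512u   /* sector size seen by the file system (assumed) */
#define ERASE_UNIT 4096u  /* smallest erasable flash region (assumed) */
#define NUM_UNITS  4u     /* tiny simulated device */

typedef struct {
    uint8_t flash[NUM_UNITS][ERASE_UNIT]; /* simulated NOR flash */
    uint8_t cache[ERASE_UNIT];            /* RAM copy of one erase unit */
    int cached_unit;                      /* -1 when the cache is empty */
    bool dirty;                           /* cache differs from flash */
    unsigned erase_count;                 /* erase+program cycles performed */
} wb_dev_t;

static void wb_init(wb_dev_t *d)
{
    memset(d->flash, 0xFF, sizeof d->flash); /* erased NOR reads as 0xFF */
    d->cached_unit = -1;
    d->dirty = false;
    d->erase_count = 0;
}

/* Write the cached unit back to flash (one erase + program) if dirty. */
static void wb_flush(wb_dev_t *d)
{
    if (d->cached_unit >= 0 && d->dirty) {
        memcpy(d->flash[d->cached_unit], d->cache, ERASE_UNIT);
        d->erase_count++;
        d->dirty = false;
    }
}

/* Ensure the erase unit holding `block` is cached, flushing the old one. */
static void wb_load(wb_dev_t *d, uint32_t block)
{
    int unit = (int)(block * BLOCK_SIZE / ERASE_UNIT);
    assert(unit < (int)NUM_UNITS);
    if (unit != d->cached_unit) {
        wb_flush(d); /* skipping this on any path silently loses data */
        memcpy(d->cache, d->flash[unit], ERASE_UNIT);
        d->cached_unit = unit;
    }
}

static void wb_write(wb_dev_t *d, uint32_t block, const uint8_t *src)
{
    wb_load(d, block);
    memcpy(d->cache + (block * BLOCK_SIZE) % ERASE_UNIT, src, BLOCK_SIZE);
    d->dirty = true;
}

static void wb_read(wb_dev_t *d, uint32_t block, uint8_t *dst)
{
    wb_load(d, block);
    memcpy(dst, d->cache + (block * BLOCK_SIZE) % ERASE_UNIT, BLOCK_SIZE);
}
```

Writing eight consecutive sectors costs one erase instead of eight, which is the efficiency given up when the flag is set to 0. The price is that data lives only in RAM until the unit is flushed, so every path that evicts or syncs the cache must handle the dirty state correctly.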

    regards

    Jared 

    4188.nrf_block_dev_qspi.7z

  • Hi Jared, the fix seems to work but I am still testing. I will confirm the answer once testing is complete.

    Another thing is that I don't see the fix for this issue in your file. I didn't modify your file, but so far the STA_NOINIT issue hasn't occurred in my testing. Was that resolved in SDK v17.0.2 in some other file? I had posted that issue when I started this project with SDK v17.0.0, and I retained the modification when I updated to SDK v17.0.2 without testing for the issue without it, so perhaps I no longer needed the modification after the SDK update. Would it hurt to modify your file with the fix? Or can you confirm that it has been fixed (and where the fix is), so that I don't need the modification? Thanks!

  • Hi Jared, could you kindly confirm if the STA_NOINIT issue has been resolved or not in SDK v17.0.2? Thanks!

  • Hi,

    The fix was implemented in SDK v17.1.0.

    SDK v17.0.2 does not have the fix and is affected by that bug.

    regards
    Jared 

  • Hi Jared, I am confused. Neither SDK v17.0.2's nor SDK v17.1.0's nrf_block_dev_qspi.c has

        if ((p_work->state != NRF_BLOCK_DEV_QSPI_STATE_IDLE)) wait_for_idle(p_qspi_dev);

    as suggested by this link. However, I have not seen the block_dev_qspi_uninit() issue without that piece of code when using SDK v17.0.2. I created that link's ticket when I was using SDK v17.0.0, and that piece of code resolved the issue. When I first started with SDK v17.0.2, I didn't see that piece of code and therefore added it without testing whether it was needed. But after receiving SDK v17.1.0's nrf_block_dev_qspi.c and seeing that it also lacks that piece of code, I tested without the modification and did not see the issue. This leads me to believe that a different fix for the issue was implemented somewhere else in SDK v17.0.2.

    Separately (the main topic of this thread), SDK v17.1.0's nrf_block_dev_qspi.c does fix the NRF_BLOCK_DEV_QSPI_FLAG_CACHE_WRITEBACK issue.

    So my question now is: can I use SDK v17.0.2 with an unmodified copy of SDK v17.1.0's nrf_block_dev_qspi.c and have both issues resolved, or do I really need to fully update to SDK v17.1.0? Thanks!

  • Hi,

    I should have been more precise in my last reply. I haven't seen this bug reported with SDK v17.0.2, but I assume that it is also affected, since the patch was implemented in SDK v17.1.0.

    The patch is a bit different from what was suggested in that thread, but it should have the same effect, namely fixing the issue where the file system fails to initialize. The fix was implemented in app_usbd_msc.c:

        case APP_USBD_EVT_STOPPED:
        {
            /* Un-initialize all block devices.
             * `ret`, `p_msc` and `p_msc_ctx` come from the enclosing event handler. */
            ASSERT(p_msc->specific.inst.block_devs_count <= 16);
            size_t i;

            for (i = 0; i < p_msc->specific.inst.block_devs_count; ++i)
            {
                nrf_block_dev_t const * p_blk_dev = p_msc->specific.inst.pp_block_devs[i];
                ret = nrf_blk_dev_uninit(p_blk_dev);

                /* While the device reports busy, retry about once per
                 * millisecond for up to ~250 ms. */
                uint32_t timeout_ms = 250;
                while (ret == NRF_ERROR_BUSY && timeout_ms--)
                {
                    nrf_delay_ms(1);
                    ret = nrf_blk_dev_uninit(p_blk_dev);
                }

                /* Clear this device's "initialized" bit only if un-init succeeded. */
                if (ret == NRF_SUCCESS)
                {
                    p_msc_ctx->blk_dev_init_mask &= ~(1u << i);
                }
            }

            break;
        }

    I would therefore recommend updating to SDK v17.1.0.

    regards

    Jared 
