Filesystem Trouble - Converting FAT_FS to NVS partitions.

Hello Folks,

I've got a thorny problem that I've spent an entire day minimizing before even trying to post about it here.

Before diving in, however, here are my environment details.

===================

Board:

MAIN TARGET: xiao_ble_sense

PROTOTYPE TARGET: nrf52840dk_nrf52840

Toolchain Version:

nRF Connect SDK v2.5.0

Host Operating System:

Win 11 (May prove relevant)

===================

Context:

I am working on a system for configuring my device's settings (stored as JSON in the device's persistent memory). A previous iteration used a USB-mounted ExFat FileSystem.

The current iteration uses Bluetooth, and that part of the project is working as intended.

I'm running into trouble configuring NVS partitions in regions that used to be FAT_FS.

Problem Statement:

Partitions which were previously part of an ExFat Filesystem can't be set up as NVS partitions in the specific case when the application is threaded.

Specifics:

Consider the following three applications (yes, really), which serve similar purposes but with slightly different mechanics:

  • mass
    • This application sets up a USB Mass Storage Device on the external flash region of the target device
    • based on zephyr/samples/subsys/usb/mass
  • nvs
    • This application sets up a custom nvs partition on the external flash of the target device, reads the current contents, then writes a fixed string back into that partition
    • based on zephyr/samples/subsys/nvs
  • nvs_aem
    • This application does the exact same thing as nvs (above)
    • based on zephyr/samples/subsys/nvs but with app_event_manager structuring added (module, event).
    • The difference: the methods that set up the flash (setup_flash) and perform the NVS read and write operations are called from a module which reacts when the app's main module sets its state to ready.

A completely new board (either dev kit or xiao) will exhibit the following behavior when programmed with one of these applications:

EITHER OF: {nvs, nvs_aem}

An NVS partition will be established at the specified location, and the app behaves exactly as expected (specified above).

On first boot, read fails (nothing is in the storage)

On resets, the config data is loaded, and persists across Power Cycles

mass

A filesystem is created and mounted in the external flash. It appears in USB device lists.

When you connect the device to the computer, the host reports "you need to format the disk before you can use it."

If you allow the Host OS to format the flash region, you will have a working filesystem which persists across boots and power downs.

If you do not allow the Host OS to format the flash region, you are unable to access the region via USB.

Here's where things get interesting!

If you take a device that is programmed with mass, and you program it with one of the other applications, you can observe the following:

mass into nvs:

On FIRST BOOT:

Flash setup succeeds.

Read fails (errno 2: ENOENT). This makes sense: NVS storage is different from a filesystem, so the region hasn't been given the right metadata yet.

Write of new data into the NVS succeeds (as expected).

On SUBSEQUENT BOOTS:

Flash setup succeeds

Read (from NVS) succeeds

Write (to NVS) succeeds

WORKS AS EXPECTED

mass into nvs_aem:

On FIRST AND SUBSEQUENT BOOTS:

Flash Setup fails.

(errno 13: EACCES) Permission Denied. This is where I'm stuck

My Request for Assistance:

  • I need a way to recover my boards from whatever it is mass did to them. I'd like my DevKit to be functional with nvs and nvs_aem.
    • I need to be able to update the devices I have under test, but it won't be possible since they all have a USB ExFat filesystem occupying their full external flash region.
  • I would also like to know why nvs is able to ignore what mass did, while nvs_aem cannot.

The Applications Themselves: WARNING! The mass app may damage your board. (I don't know how to recover what it did to my boards)

Building The Apps:

All three of the attached apps follow an identical structure

/boards (overlay files)

/scripts (WEST BUILD COMMANDS WITHIN BASH SCRIPTS)

/src

prj.conf (Shared between xiao and dk builds)

My recommendation for building each app is to open the app root directory in an nRF Connect terminal, and run the associated bash script for your target board (xiao for xiao_ble_sense, dk for nrf52840dk_nrf52840).

WARNING! The "mass" app may be harmful to your board's flash storage, and I don't know how to recover from this state.

mass_app.zip 

nvs_aem_app.zip 

nvs_app.zip

Thank you very much for any advice you can provide!

Wishing you all the best,

    - Finn

  • Sorry, I did not see that this was nRF52 related, so there is no TF-M (hardware security).
    Can you run the same test with internal flash and see whether you get the same issue? I want to isolate whether the problem is on the storage side of the solution or in the transport interface to the external memory. If you are able to replicate the issue with the internal flash as well, then we can set aside the transport interface and focus on configuration issues.

  • Hi Susheel, bad news. (Or maybe good, since it's simplifying).

    I made a mistake in my nvs app that I only just caught now!

    It was using "storage_partition" on internal flash instead of "custom_nvs_partition" on external flash.

    I still have a question about how to recover devices/move forward, but that's a pretty clear answer as to why NVS was working but NVS_AEM was not.

    Whoops!

    At least going forward I can test with only two apps (mass and nvs).

    I'll send another update after this one with updated apps and a writeup of my findings.

    Best,

        - Finn

  • Hi Susheel,

    Based on testing, I agree that we can eliminate the transport interface.

    Out of an abundance of caution I tested both nvs and nvs_aem on internal flash after mass on internal flash. Their behavior is now identical (as expected), which makes nvs_aem redundant.

    -----------------------------------------------------------------

    Testing on Xiao Internal Storage:

    Testing mass --> NVS:

    The problem still exists with exactly the same error output as before: fs_nvs throws an error "NVS not initialized"

    Interestingly, if I use a JLink Probe and erase the internal flash of the Xiao (and reprogram with seeed's provided bootloader), I am able to reuse the partition.

    It seems that an erase operation is good enough to repair the flash partition.

    The question, then, becomes "How can we command the p25q16h external flash device to perform an erase?"
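
One possible route, sketched here under several assumptions, is to erase the partition from firmware through Zephyr's flash map API (flash_area_open, flash_area_erase, flash_area_close) before NVS is mounted. The node label custom_nvs_partition, the helper name erase_target_partition, and the TARGET_AREA_ID macro are illustrative assumptions, and the partition has to be visible to the flash map under that label. The #else branch holds host-only stand-ins so the sketch can be compiled off-target; it is not Zephyr code. None of this has been verified against the p25q16h.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#ifdef __ZEPHYR__
#include <zephyr/storage/flash_map.h>
/* Assumption: custom_nvs_partition is a devicetree node label in the overlay. */
#define TARGET_AREA_ID FIXED_PARTITION_ID(custom_nvs_partition)
#else
/* Host-only stand-ins so the sketch builds off-target. They mimic the shape
 * of the flash map calls but are NOT the Zephyr implementation. */
struct flash_area { size_t fa_size; };
static struct flash_area stub_area = { 0x8000 };
static size_t stub_erased; /* number of bytes the stub was asked to erase */
#define TARGET_AREA_ID 0

static int flash_area_open(uint8_t id, const struct flash_area **fa)
{
    (void)id;
    *fa = &stub_area;
    return 0;
}

static int flash_area_erase(const struct flash_area *fa, long off, size_t len)
{
    if (off < 0 || (size_t)off + len > fa->fa_size) {
        return -EINVAL;
    }
    stub_erased = len;
    return 0;
}

static void flash_area_close(const struct flash_area *fa)
{
    (void)fa;
}
#endif

/* Erase the whole partition so that every byte reads back in the erased state.
 * Returns 0 on success or a negative errno value from the flash map calls. */
int erase_target_partition(void)
{
    const struct flash_area *fa;
    int rc = flash_area_open(TARGET_AREA_ID, &fa);

    if (rc != 0) {
        return rc;
    }

    rc = flash_area_erase(fa, 0, fa->fa_size);
    flash_area_close(fa);
    return rc;
}
```

In an application this would be called once, before setup_flash, and only on boards known to carry a stale filesystem, since it destroys whatever is stored in the partition.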

    Testing on Dev Kit Internal Storage

    Testing mass:

    On first boot: fs mount error (-5)

    When trying to access the filesystem from Host Computer's filesystem: "You need to format the disk in drive E: before you can use it"

    On subsequent boots:

    If NOT formatted (identical to prior)

    If formatted:

    *** Booting nRF Connect SDK v2.5.0 ***
    [00:00:00.249,847] <inf> flashdisk: Initialize device NAND
    [00:00:00.249,847] <inf> flashdisk: offset f8000, sector size 512, page size 4096, volume size 32768
    Area 0 at 0xf8000 on flash-controller@4001e000 for 32768 bytes
    [00:00:00.260,162] <inf> flashdisk: Initialize device NAND
    [00:00:00.260,192] <inf> flashdisk: offset f8000, sector size 512, page size 4096, volume size 32768
    [00:00:00.260,284] <err> flashdisk: sector start 778135908 count 1 outside partition boundary
    [00:00:00.260,314] <err> flashdisk: sector start 168689522 count 1 outside partition boundary
    [00:00:00.260,345] <err> flashdisk: sector start 1869881465 count 1 outside partition boundary
    [00:00:00.260,406] <inf> flashdisk: Initialize device NAND
    [00:00:00.260,406] <inf> flashdisk: offset f8000, sector size 512, page size 4096, volume size 32768
    [00:00:00.260,467] <err> fs: fs mount error (-5)
    [00:00:00.260,467] <err> main: Failed to mount filesystem
    [00:00:00.270,721] <inf> main: The device is put in USB mass storage mode.
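
The three "outside partition boundary" errors follow from arithmetic on the logged values: a 32768-byte volume with 512-byte sectors holds only 64 sectors, so a start sector in the hundreds of millions cannot be valid. A minimal sketch of that bounds check follows; the names are hypothetical, and this is not the flashdisk driver's own code.

```c
#include <stdint.h>

#define DISK_VOLUME_BYTES 32768u /* "volume size 32768" from the log */
#define DISK_SECTOR_BYTES 512u   /* "sector size 512" from the log */

/* Number of addressable sectors in the volume. */
uint32_t disk_sector_count(void)
{
    return DISK_VOLUME_BYTES / DISK_SECTOR_BYTES;
}

/* Return 1 when [start, start + count) lies inside the volume, else 0.
 * Written to avoid overflow when start is a huge garbage value. */
int sector_range_in_bounds(uint32_t start, uint32_t count)
{
    uint32_t total = disk_sector_count();

    return count <= total && start <= total - count;
}
```

With these values, sector_range_in_bounds(778135908u, 1) is false while sector_range_in_bounds(63u, 1) is true, which suggests the requested sector numbers were derived from data that is not a valid filesystem structure.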
    

    Testing mass --> nvs:

    Surprisingly, this works!

    Some sub-process within flashing the application must recover the (internal) flash.

    My coworker points out (in reviewing this post before I sent it) that this is likely specific to dev kit flashing behavior, since uploading mass (even with no changes to code or configuration) requires another format of the USB drive.

    ---------------------------------------

    Useful information from additional testing (Debug Trace of DevKit, External Storage, mass --> nvs)

    I was able to pinpoint where the exact problem occurs (on Dev Kit, when using external flash).

    • file:
      • v2.5.0/zephyr/subsys/fs/nvs/nvs.c
    • line:
      • 786 (the conditional that, once evaluated, eventually leads to the EDEADLK return)
    • Call Stack (Recent at top)
      • nvs_startup
      • setup_flash
      • main

    It seems that each sector that is checked appears "closed," which is accompanied by the in-code comment that "all sectors are closed, this is not an nvs fs."
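
A simplified host-side model of that scan may make the failure mode clearer. It assumes a sector counts as "closed" when its last allocation-table slot (taken to be 8 bytes here) is not in the erased state (0xFF), and that startup gives up with -EDEADLK when every sector looks closed. The sector size, sector count, and all function names are illustrative; this is not the code from nvs.c.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_SECTOR_SIZE  4096u /* assumed erase-page size */
#define MODEL_SECTOR_COUNT 4u    /* assumed number of NVS sectors */
#define MODEL_ATE_SIZE     8u    /* assumed size of one allocation table entry */
#define MODEL_ERASED       0xFFu /* value NOR flash reads back after an erase */

/* Return 1 when the last ATE slot of the sector is not fully erased,
 * which this model treats as "closed"; return 0 otherwise. */
int model_sector_looks_closed(const uint8_t *flash, size_t sector)
{
    const uint8_t *ate = flash + (sector + 1u) * MODEL_SECTOR_SIZE - MODEL_ATE_SIZE;

    for (size_t i = 0; i < MODEL_ATE_SIZE; i++) {
        if (ate[i] != MODEL_ERASED) {
            return 1;
        }
    }
    return 0;
}

/* Model of the startup scan: 0 when at least one sector is open,
 * -EDEADLK when every sector looks closed. */
int model_nvs_startup(const uint8_t *flash)
{
    size_t closed = 0;

    for (size_t s = 0; s < MODEL_SECTOR_COUNT; s++) {
        closed += (size_t)model_sector_looks_closed(flash, s);
    }
    return (closed == MODEL_SECTOR_COUNT) ? -EDEADLK : 0;
}

/* Model of a full partition erase: every byte returns to the erased value. */
void model_erase(uint8_t *flash)
{
    memset(flash, MODEL_ERASED, (size_t)MODEL_SECTOR_SIZE * MODEL_SECTOR_COUNT);
}
```

In this model, a partition filled with any non-0xFF data fails the scan, and the same partition passes after a full erase, which is consistent with the JLink erase observation on the Xiao. Whether the Windows format actually leaves non-0xFF bytes in those slots is not something the model can confirm.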

    This raises the questions:

    • How does a sector get marked as closed?
    • How do I reset the sectors so that they are no longer closed?

    Trying to find answers here led me down a research rabbit hole that, I think, has given me a better understanding.

    • To check my understanding: Is it true that NVS FS is a specific case of Zephyr Memory Storage (ZMS)?
    • If that IS the case, then I believe the following quote from the above linked page IDs my problem, but doesn't help me figure out how to resolve it
      • Additionally, for each sector we store at the last positions Header-ATEs which are ATEs that are needed for the sector to describe its status (closed, open) and the current version of ZMS.
    • My impression based on these facts is this:
      • Formatting the FS with the host operating system is breaking the ZMS header ATEs (ATE, I learned only today, stands for "Allocation Table Entry").
      • Something Windows writes during formatting leads the subsystem to interpret the sectors (which should just be garbage) as closed and protected.
      • This raises ANOTHER question.
        • Why doesn't the "sector-closing" property of Windows formatting affect the dev kit's internal flash?

    I'd like to call out that there's a possibility this whole line of reasoning is a red herring. I have no guarantee that the problem experienced by DevKit on external flash (and not on internal storage?) is the same as the problem experienced by Xiao on internal and external storage.

    ----------------------------------------------------

    So based on your advice and the results of today's testing, here is my updated question list

    • How can we command the p25q16h external flash device to perform an erase?
    • Is the NVS_FS that we are using accurately described by "ZMS"?
      • If that is true, then (following questions)
      • Why are formatted sections interpreted as closed?
      • What can I do about that?
      • Why doesn't the dev-kit internal storage experience the same problem?
        • This has a reasonable answer in my coworker's suggestion, that EVERY flash to the devKit erases the contents (or resets them somehow) of the storage partition.
    • What advice do you have to advance the project?

    ---------------------------------------------------

    Here are the updated versions of the specific apps.

    They use the same project structuring and build commands (scripts/xiao.sh, dk.sh) as the previously attached applications. These, however, use the internal partition "storage_partition", which is configured in the predefined dts for each board (xiao_ble_common.dtsi, line 164, and nrf52840dk_nrf52840.dts, line 291).

    And I'll reiterate in case someone just blindly grabs these files without reading the post.

    WARNING: MASS_INTERNAL MIGHT DAMAGE YOUR BOARD'S INTERNAL FLASH PARTITION (Dev Kit seems okay, but recovering Xiao BLE Sense requires JLink to my current understanding)

    mass_internal.zip

    7723.nvs_internal.zip

    Thank you very much for your time!

    Best,

        - Finn
