Zephyr corrupted settings file

Hi,


I'm using NCS 1.8.0 on an nRF52840.


I've got what seems to be a corrupted Settings file on external flash. I can't reproduce the issue yet, but since it happened once, I'm investigating how to handle it. The device can't save any dataset that I pass through `otDatasetSetActive()`: when I call `otDatasetGetActive()` right after it, it returns OT_ERROR_NOT_FOUND. As a result, the device never attaches to the network, because it uses a blank Network Key when OpenThread's `Restore()` loads the data from Settings.


I've read the settings file with the `fs` shell command and its size is more than 5 kB, even though CONFIG_SETTINGS_FS_MAX_LINES=32.
My app doesn't touch the settings directly, so I assume OpenThread is the only one writing to them.

Some more info about this problem:

  • The write to the settings file by OpenThread doesn't return any error.
  • When I check the file with the fs shell, its size increases on every reboot. It is normally around 500 bytes because of the 32-line maximum, but it's now 5 kB.
  • The current settings file has OpenThread entries in the middle of it. I'm not sure, but they seem correct.
  • The first part of the file is corrupted, though, so deserialization always fails (the length of the first "setting", given by the first 2 bytes of the file, is wrong). Any attempt to read a setting therefore comes back empty (key not found), as if the file had 0 bytes (cf_lines is actually 0, even though the file is 5 kB). That explains why the file keeps growing: the appending write works, but the max-lines logic that triggers compaction never fires.
  • From what I could tell, every Settings delete API first tries to locate the setting before deleting it, and it will never be found. So, given the points above, calling these deletion APIs would not solve the problem.
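To make the framing problem concrete, here is a minimal sketch in portable C that walks a buffer laid out the way this post describes it: a 2-byte length followed by that many bytes of `name=value`. It is not Zephyr's own parser. The little-endian length, the rule that a name is non-empty printable ASCII up to the first `=`, and the function names are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed rule: a record is "name=value" where name is non-empty
 * printable ASCII; the value may contain arbitrary bytes. */
static int record_is_wellformed(const uint8_t *rec, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (rec[i] == '=') {
            return i > 0;              /* reject an empty name */
        }
        if (rec[i] <= 0x20 || rec[i] >= 0x7F) {
            return 0;                  /* non-printable byte inside the name */
        }
    }
    return 0;                          /* no '=' separator found */
}

/* Hypothetical helper: count records in buf[0..size).
 * Returns the number of well-formed records, or -1 if a record is
 * malformed; in that case *bad_off (if not NULL) receives its offset. */
int settings_buf_count_records(const uint8_t *buf, size_t size, size_t *bad_off)
{
    size_t off = 0;
    int count = 0;

    while (off < size) {
        size_t len;

        if (size - off < 2) {
            goto bad;                  /* truncated length prefix */
        }
        /* Assumed little-endian 2-byte length prefix. */
        len = (size_t)buf[off] | ((size_t)buf[off + 1] << 8);
        if (len == 0 || len > size - off - 2) {
            goto bad;                  /* length runs past the end of the data */
        }
        if (!record_is_wellformed(buf + off + 2, len)) {
            goto bad;
        }
        off += 2 + len;
        count++;
    }
    return count;

bad:
    if (bad_off != NULL) {
        *bad_off = off;
    }
    return -1;
}
```

Under these assumptions, the first two bytes of the dump below (`40 04`) decode to a length of 1088, and the byte that follows (`02`) is not a printable name character, so the walk fails at offset 0 and no later record is ever reached.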

In case it's needed, here is the beginning of the settings file as read with the shell (the file is not provided in full, as it contains sensitive data from the middle to the end):

```

rtt:~$ fs read mnt/settings/run

File size: 5230
00000000 40 04 02 00 00 00 00 00 00 00 90 2A 22 80 83 20 @..........*"..
00000010 00 06 00 00 24 30 40 00 18 12 04 52 05 24 69 42 ....$0@....R.$iB
00000020 01 21 02 02 18 45 91 0E B1 65 2A 50 E3 FF 97 FF .!...E...e*P....
00000030 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF ................
00000040 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF ................
00000050 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF ................
00000060 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF ................
00000070 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF ................
00000080 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF ................
00000090 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF ................
000000A0 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF ................
000000B0 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF ................
000000C0 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF ................
000000D0 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF ................
000000E0 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF ................
000000F0 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF ................
00000100 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF ................
00000110 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF ................
00000120 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF ................
00000130 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF ................
00000140 88 88 88 88 88 88 88 88 88 88 88 88 88 88 88 88 ................
00000150 88 88 88 88 88 88 88 88 88 88 88 88 88 88 88 88 ................
00000160 88 88 88 88 88 88 88 88 88 88 88 88 88 88 88 88 ................
00000170 88 88 88 88 88 88 88 88 88 88 88 88 88 88 88 88 ................
00000180 38 37 30 61 66 36 3D 3C 00 6F 74 2F 31 2F 37 37 870af6=<.ot/1/77
00000190 34 34 61 36 30 65 3D 00 03 00 00 0F 02 08 50 63 44a60e=.......Pc
(...)
00000430 74 2F 31 2F 30 30 34 37 32 65 64 37 3D 00 03 00 t/1/00472ed7=...
00000440 00 0F 02 08 50 63 8D BF A7 DD 85 1C 05 10 32 6C ....Pc........2l
00000450 (...)

```


I'm looking for a robust way to work around this in case a device ends up in this situation in production, because it seems to be an unrecoverable state. I've made a backup of the external flash and confirmed that wiping the partition solves the problem.

What would be the best practice for handling this at runtime? Is there a verified workaround known to Nordic for this situation?
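For discussion, one possible shape for a boot-time guard is sketched below. It is not a verified workaround: it reads the 2-byte length at the start of the file and deletes the file if that length is implausible, which is a lighter version of the partition wipe I tested. The sketch uses C stdio as a host-side stand-in; on the device the same steps would go through Zephyr's `fs_open()`, `fs_read()` and `fs_unlink()` before the settings are loaded. The function name, the enum, the little-endian length and the `max_record` bound are all assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Outcomes of the check (illustrative names). */
enum settings_check {
    SETTINGS_OK,        /* file absent, empty, or first record length plausible */
    SETTINGS_RESET,     /* first record implausible; the file was deleted */
    SETTINGS_IO_ERROR   /* the file could not be read or deleted */
};

/* max_record is an assumed upper bound on one "name=value" record. */
enum settings_check settings_file_check_and_reset(const char *path,
                                                  size_t max_record)
{
    uint8_t hdr[2];
    size_t n;
    int read_err;
    FILE *f = fopen(path, "rb");

    if (f == NULL) {
        /* Treated as "nothing stored yet"; a real port may want to
         * distinguish other open failures. */
        return SETTINGS_OK;
    }
    n = fread(hdr, 1, sizeof(hdr), f);
    read_err = ferror(f);
    fclose(f);

    if (read_err) {
        return SETTINGS_IO_ERROR;
    }
    if (n == 0) {
        return SETTINGS_OK;            /* empty file, nothing to validate */
    }
    if (n == sizeof(hdr)) {
        /* Assumed little-endian 2-byte length prefix. */
        size_t len = (size_t)hdr[0] | ((size_t)hdr[1] << 8);

        if (len != 0 && len <= max_record) {
            return SETTINGS_OK;
        }
    }
    /* Truncated header or implausible length: drop the whole file. */
    return remove(path) == 0 ? SETTINGS_RESET : SETTINGS_IO_ERROR;
}
```

With a bound such as 256, the `40 04` header from the dump above (1088) would trigger the reset. Deleting the file discards every stored setting, so the device would have to receive its dataset again afterwards.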

Best,

Rodrigo

  • Hi,

    Do I understand your ticket correctly, that you are using external flash to store the settings? How have you configured this external flash, and the storage partition used?

    I asked our Thread developers if they had seen similar issues before, but they have not. They also do not use external flash in their testing.

    Best regards,
    Jørgen

  • Hi, thanks for the response.

    > Do I understand your ticket correctly, that you are using external flash to store the settings?

    Yes, I'm using an external flash with QSPI.

    > How have you configured this external flash, and the storage partition used?

    For the flash, I followed a configuration similar to the nrf52840dk_nrf52840 DTS. For the storage, I used code similar to samples/subsys/fs/littlefs, which is basically:

    FS_LITTLEFS_DECLARE_DEFAULT_CONFIG(storage);
    static struct fs_mount_t lfs_storage_mnt = {
        .type = FS_LITTLEFS,
        .fs_data = &storage,
        .storage_dev = (void *)FLASH_AREA_ID(storage),
        .mnt_point = "/mnt",
    };

    and then I mount it early in main():

    rc = fs_mount(&lfs_storage_mnt);

    For the settings I have:

    CONFIG_SETTINGS_FS_DIR="/mnt/settings"
    CONFIG_SETTINGS_FS_FILE="/mnt/settings/run"

    I'm not using automount (FSTAB), as I'm using partition manager and NCS support for it only came after NCS 1.8.0. However, my OPENTHREAD_MANUAL_START is enabled, as it's the default for NCS now.

    Thank you,

    Best,

    Rodrigo
