ZMS Data Loss — ATE ID 0x1A80 Overwritten

Environment: nRF54 series, nRF Connect SDK (NCS) v3.0.0, Zephyr Memory Storage (ZMS)
Config: 4 sectors × 4 KB = 16 KB partition
Record IDs involved: 0x1A80 (SRAP data, 253 bytes), 0x1A21 (pairing info, 408 bytes)
Observed behavior: Data is correct after power-on. After some runtime, 0x1A80 reads back corrupted data.
Flash Dump — Sector 0 ATE Entries (sector tail)
ATEs at the end of the first 4 KB sector, ordered from newest to oldest (ZMS writes ATEs backward from the sector end):
Address  Content (hex)                                                        Interpretation
-------  -------------------------------------------------------------------  -------------------------------
4080     54 2E FF FF FF FF FF FF 00 00 00 00 01 42 00 00                      Empty ATE (cycle=0x2E)
4064     67 2D 00 00 FF FF FF FF A0 0D 00 00 FF FF FF FF                      ZMS_HEAD_ID (cycle=0x2D, offset=3488)
4048     D0 2E FD 00 80 1A 00 00 00 00 00 00 00 00 00 00                      → ID=0x1A80, len=253, **offset=0**, cycle=0x2E
4032     C7 2E 04 00 05 1A 00 00 01 00 00 00 00 00 00 00                      → ID=0x1A05, len=4 (inline data)
4016     F1 2E 00 00 FF FF FF FF 00 01 00 00 FF FF FF FF                      ZMS_HEAD_ID (cycle=0x2E, **offset=256**)
4000     A1 2E 98 01 21 1A 00 00 00 00 00 00 00 00 00 00                      → ID=0x1A21, len=408, **offset=0**, cycle=0x2E
ATE structure (16 bytes): crc8(1) + cycle_cnt(1) + len(2) + id(4) + data/offset(8) — little-endian.
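
For reference, a sketch of this layout as a C struct (modeled on struct zms_ate in the Zephyr ZMS sources; zms_priv.h in your tree is the authoritative definition):

    #include <stdint.h>
    #include <zephyr/toolchain.h> /* for __packed */

    struct zms_ate {
        uint8_t crc8;      /* CRC8 over the rest of the entry */
        uint8_t cycle_cnt; /* sector cycle counter */
        uint16_t len;      /* data length */
        uint32_t id;       /* record ID */
        union {
            uint8_t data[8];     /* data <= 8 bytes is stored inline here */
            struct {
                uint32_t offset; /* data offset within the sector */
                union {
                    uint32_t data_crc; /* CRC of the data, when enabled */
                    uint8_t reserved[4];
                };
            };
        };
    } __packed; /* 16 bytes, little-endian on this target */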
Observations
  1. Both 0x1A80 and 0x1A21 have offset=0, i.e., both point to the sector's data base as their data location (decoded in the sketch after this list). Since 0x1A21 (408 bytes) was written after 0x1A80 (253 bytes), 0x1A21's data physically overwrites 0x1A80's data at offset 0.
  2. The ZMS_HEAD_ID entry at address 4016 records offset=256 — this is a GC-done marker written when data_wra was at position 256 within the sector. However, the subsequent 0x1A21 ATE at address 4000 records offset=0, meaning data_wra was reset from 256 back to 0 between these two writes.
  3. The ZMS_HEAD_ID at address 4064 has cycle=0x2D (different from the 0x2E cycle of all other entries) — this is a residual entry from a previous sector cycle that was not fully erased.
  4. In the second sector (lines 258-513 of the dump), the sector has been GC'd/erased, yet a 0x1A80 ATE entry from the previous cycle still exists there:
08 2D FD 00 80 1A 00 00 00 00 00 00 00 00 00 00    → ID=0x1A80, len=253, **offset=0**, cycle=0x2D
The data at the offset it points to (offset 0 of that sector) does not contain the real 0x1A80 payload; the bytes at that address belong to a different record, confirming the data was overwritten there as well.
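
To double-check observation 1, here is a minimal standalone sketch that decodes the raw 0x1A80 ATE from the table above using plain little-endian field extraction:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* Raw 16 bytes of the ATE dumped at address 4048 */
        const uint8_t raw[16] = {
            0xD0, 0x2E, 0xFD, 0x00, 0x80, 0x1A, 0x00, 0x00,
            0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        };
        uint16_t len = (uint16_t)(raw[2] | (raw[3] << 8));
        uint32_t id = raw[4] | (raw[5] << 8) | ((uint32_t)raw[6] << 16) |
                      ((uint32_t)raw[7] << 24);
        uint32_t offset = raw[8] | (raw[9] << 8) | ((uint32_t)raw[10] << 16) |
                          ((uint32_t)raw[11] << 24);

        /* Prints: crc8=0xD0 cycle=0x2E id=0x1A80 len=253 offset=0 */
        printf("crc8=0x%02X cycle=0x%02X id=0x%X len=%u offset=%u\n",
               raw[0], raw[1], (unsigned)id, (unsigned)len, (unsigned)offset);
        return 0;
    }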
Question
Could you please help us understand what conditions in ZMS could cause the data_wra pointer to be reset back to 0 after a GC cycle has already advanced it (to 256 in this case), resulting in two different records pointing to the same data offset and overwriting each other? No power loss or system reset occurred during this failure window.
Parents
  • Hello,

    I suspect that this is patched in this commit:

    https://github.com/zephyrproject-rtos/zephyr/commit/15cbe9fd18e2a319811a3cd877f238543050da18

    Which is included in NCS v3.1.0.

    Do you have an application where I can reproduce the issue on an nRF54L15 DK? Alternatively, can you see if running your application in v3.1.0 fixes the issue, or if it is still present?

    Best regards,

    Edvin

  • Some more context: there was a bug where mounting a partition again after its initial mount could lead to the behavior you are seeing. But I am not 100% sure that is actually what you are seeing, which is why I would like to verify whether using NCS v3.1.0 shows the same behavior or not.

    BR,
    Edvin

  • Got it. Here is the test case I used to reproduce the issue:

    I wrote a function that directly calls the ZMS library APIs.

    Setup:

    • Mount ZMS with 4 sectors.
    • Erase all sector contents before starting.

    Steps to reproduce:

    1. After mounting ZMS, write one entry with a length greater than 8 bytes (so the data is not stored inline in the ATE).
    2. Trigger 3 GC cycles, then power off (to force a full remount on the next boot).
    3. After reboot, read the entry — the data is correct at this point.
    4. Write one more different entry.
    5. Result: the previously written entry is corrupted/overwritten.

    I also ran our real application using the same workflow, and the issue reproduces reliably there as well.

    Best regards,

    Pang

  • Can you upload the application that you are using to reproduce the issue please?

  • Hi,

    The file zms_test.c contains a minimal code snippet to reproduce the issue.
    Could you please take a look and let me know if you can replicate the behavior on your side?

    /**
     * @file     zms_test.c
     * @brief    ZMS bug reproducer — reliably triggers the data_wra recovery error
     *
     * This is a standalone test snippet that directly calls the ZMS (Zephyr Memory
     * Storage) library API. It reproduces the data_wra recovery bug caused by
     * "*addr -= 2 * fs->ate_size" in zms_recover_last_ate(), and validates the fix
     * by changing it to "*addr -= fs->ate_size".
     *
     * Entry point: drv_storage_zms_test()
     *
     * Bug symptom (with *addr -= 2 * fs->ate_size):
     *   1. Write "Hello, ZMS!"        to ID 0x1A80
     *   2. Advance 3 sectors           → triggers GC
     *   3. Remount ZMS                 → zms_recover_last_ate() called
     *   4. Read 0x1A80                 → OK, returns "Hello, ZMS!"
     *   5. Write "111111111111111..."  to ID 0x1A81
     *   6. Read 0x1A80 again           → BUG: returns 0x1A81's data instead
     *      (0x1A80 data overwritten by 0x1A81)
     *
     * Root cause:
     *   "*addr -= 2 * fs->ate_size" skips the valid ATE at
     *   sector_size - 3 * ate_size, causing data_wra to be recovered too low.
     *   Subsequent writes then reuse the already-occupied data region.
     *
     * Fix:
     *   Change to "*addr -= fs->ate_size". data_wra is recovered correctly and
     *   0x1A80 data is no longer overwritten.
     */
    
    /* Includes added so the snippet builds standalone (standard Zephyr headers) */
    #include <string.h>

    #include <zephyr/device.h>
    #include <zephyr/drivers/flash.h>
    #include <zephyr/fs/zms.h>
    #include <zephyr/kernel.h>
    #include <zephyr/logging/log.h>
    #include <zephyr/storage/flash_map.h>

    LOG_MODULE_REGISTER(zms_test, LOG_LEVEL_INF);

    static struct zms_fs fs;

    #define ZMS_PARTITION        storage_partition
    #define ZMS_PARTITION_DEVICE FIXED_PARTITION_DEVICE(ZMS_PARTITION)
    #define ZMS_PARTITION_OFFSET FIXED_PARTITION_OFFSET(ZMS_PARTITION)
    #define ZMS_PARTITION_SIZE   FIXED_PARTITION_SIZE(ZMS_PARTITION)

    bool drv_storage_init(void)
    {
        struct flash_pages_info info;

        fs.flash_device = ZMS_PARTITION_DEVICE;
        if (!device_is_ready(fs.flash_device)) {
            LOG_ERR("Storage device %s is not ready", fs.flash_device->name);
            return false;
        }

        fs.offset = ZMS_PARTITION_OFFSET;
        int rc = flash_get_page_info_by_offs(fs.flash_device, fs.offset, &info);
        if (rc) {
            LOG_ERR("Unable to get page info, rc=%d", rc);
            return false;
        }

        fs.sector_size = info.size;
        fs.sector_count = ZMS_PARTITION_SIZE / info.size; /* 4 x 4096 = 16 KB, set in prj.conf */

        rc = zms_mount(&fs);
        if (rc) {
            LOG_ERR("Storage init failed, rc=%d", rc);
            return false;
        }

        LOG_INF("zms info - sector_size: %u, sector_count: %u, offset: 0x%X, partition size: 0x%X",
                fs.sector_size, fs.sector_count, (uint32_t)fs.offset, ZMS_PARTITION_SIZE);

        return true;
    }
    
    
    bool drv_storage_zms_clean_all_and_reinit(void)
    {
        struct flash_pages_info info;

        fs.flash_device = ZMS_PARTITION_DEVICE;
        if (!device_is_ready(fs.flash_device)) {
            LOG_ERR("Storage device %s is not ready", fs.flash_device->name);
            return false;
        }

        fs.offset = ZMS_PARTITION_OFFSET;
        int rc = flash_get_page_info_by_offs(fs.flash_device, fs.offset, &info);
        if (rc) {
            LOG_ERR("Unable to get page info, rc=%d", rc);
            return false;
        }

        fs.sector_size = info.size;
        fs.sector_count = ZMS_PARTITION_SIZE / info.size; /* 4 x 4096 = 16 KB, set in prj.conf */

        /* 1. Clear the whole ZMS file system first (equivalent to re-initializing the data) */
        rc = zms_clear(&fs);
        if (rc) {
            LOG_ERR("zms_clear failed: %d", rc);
            return false;
        }

        /* 2. Then mount the ZMS file system */
        rc = zms_mount(&fs);
        if (rc) {
            LOG_ERR("Storage init failed, rc=%d", rc);
            return false;
        }

        LOG_INF("zms info - sector_size: %u, sector_count: %u, offset: 0x%X, partition size: 0x%X",
                fs.sector_size, fs.sector_count, (uint32_t)fs.offset, ZMS_PARTITION_SIZE);

        return true;
    }
    
    
    void drv_storage_zms_test(void)
    {
        const char *test_data = "Hello, ZMS!";
        const char *test1_data = "1111111111111111111111111";
        char read_buffer[20] = {0};

        while (1) {
            drv_storage_zms_clean_all_and_reinit();

            int rc = zms_write(&fs, 0x1A80, test_data, strlen(test_data));
            if (rc < 0) {
                LOG_ERR("ZMS write failed, err=%d", rc);
                return;
            }

            /* Advance 3 sectors to trigger GC */
            zms_sector_use_next(&fs);
            zms_sector_use_next(&fs);
            zms_sector_use_next(&fs);

            /* Remount so that zms_recover_last_ate() runs */
            drv_storage_init();

            rc = zms_read(&fs, 0x1A80, read_buffer, sizeof(read_buffer));
            if (rc < 0) {
                LOG_ERR("ZMS read failed, err=%d", rc);
                return;
            }
            LOG_ERR("ZMS first read of 0x1A80 successful, data: %s", read_buffer);

            if (memcmp(read_buffer, test_data, strlen(test_data)) != 0) {
                LOG_ERR("ZMS data mismatch");
            }

            rc = zms_write(&fs, 0x1A81, test1_data, strlen(test1_data));
            if (rc < 0) {
                LOG_ERR("ZMS write failed, err=%d", rc);
                return;
            }

            /* Second read: with the bug present this returns 0x1A81's data */
            rc = zms_read(&fs, 0x1A80, read_buffer, sizeof(read_buffer));
            if (rc < 0) {
                LOG_ERR("ZMS read failed, err=%d", rc);
                return;
            }
            LOG_ERR("ZMS second read of 0x1A80 successful, data: %s", read_buffer);

            k_msleep(3000);
        }
    }
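
    For completeness, a minimal standalone sketch of the arithmetic behind the root cause described in the header comment. The scan starting slot (the close-ATE slot at sector_size - 2 * ate_size, address 4064 in our dump) is an assumption inferred from the dump, not taken from the Zephyr sources:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        const uint32_t sector_size = 4096;
        const uint32_t ate_size = 16;
        const uint32_t start = sector_size - 2 * ate_size;     /* 4064: assumed scan start */
        const uint32_t valid_ate = sector_size - 3 * ate_size; /* 4048: the 0x1A80 ATE */

        /* Buggy step ("*addr -= 2 * fs->ate_size"): jumps past the valid ATE,
         * so data_wra is recovered too low and later writes reuse its data region. */
        printf("addr -= 2*ate_size: %u -> %u (valid ATE at %u skipped)\n",
               (unsigned)start, (unsigned)(start - 2 * ate_size), (unsigned)valid_ate);

        /* Fixed step ("*addr -= fs->ate_size"): lands exactly on the valid ATE. */
        printf("addr -= ate_size:   %u -> %u (valid ATE at %u found)\n",
               (unsigned)start, (unsigned)(start - ate_size), (unsigned)valid_ate);
        return 0;
    }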
    Best regards,
    Pang

  • Hi, have you been able to reproduce the issue? Awaiting your feedback.

    Best regards,

    Pang

  • Hello,

    I have not been able to run it. I would need the rest of your application (the minimal sample to reproduce the issue).

    Can you zip the application folder where you tested this (including the CMakeLists.txt, prj.conf, board files, if any, etc...) and upload that here, so that I can build it for nrf54l15dk/nrf54l10/cpuapp and flash it to an nRF54l15 DK and reproduce the issue please?

    Best regards,

    Edvin
