FDS and write/erase flash cycles

Hi!

I've read a bit about the limitation on write/erase cycles with the nRF52840. As I understand it, the ~10,000 write/erase cycle limit means that after this many erases, the flash is no longer guaranteed to 'work properly'.

I have a few questions regarding this:

1. What does 'work properly' mean? Will there be a problem with write operations, read operations, or both? What kind of problems can be expected? (Will I write 0xAB and sometimes the flash will contain 0xAC, or will the flash most likely contain 0xAB, but my read operation will return 0xAC?)

2. If the problem is with the flash read rather than the flash write, does the FDS not 'deal' with this in any way? Meaning that if the CRC of the read data does not match the CRC of the flash data, will the FDS not re-read the flash to get the 'correct value'?

3. Is there any way of knowing that my flash has started to deteriorate? Is there a count of erase cycles kept somewhere which I can read to get an idea of how far I am from the 10,000 limit?

4. Does the FDS guarantee a 'reasonably' equal distribution of write/erase cycles throughout the flash? Meaning that if one flash segment has reached the 10,000 limit, the other flash segments are very close to the limit as well?

Thank you!

  • Thanks for the update. Interesting results!

    A.P said:

    First chip failed after 790,397 cycles.

    Second chip failed after 918,918 cycles.

    Third chip failed after 1,102,080 cycles.

      Does this correspond to the number of erase cycles on one flash page?

    A.P said:
    So, it seems we can expect around 900 thousand cycles on each flash page, if my 3 samples are in any way a valid sample pool.
    Does Nordic have the distribution graph for flash fails per write-erase cycles anywhere?

     I am afraid I don't have much data on this. All I can say is that the endurance may be as low as 10,000 cycles, as stated in the absolute maximum ratings section of the PS (Product Specification). And I think you should account for that number when designing your product, e.g. by allocating as many flash pages as possible to improve wear leveling.
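
    To put rough numbers on why more pages help, here is a back-of-the-envelope sketch. It assumes erases are spread perfectly evenly across the allocated pages, which real wear leveling only approximates, and it uses the 10,000-cycle minimum rather than any measured figure. The function name flash_lifetime_years is made up for this illustration and is not part of the SDK.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the nRF5 SDK.
 * Estimates how many years a flash area lasts, ASSUMING erases are spread
 * evenly over all pages and every page survives `endurance_cycles` erases
 * (10,000 is the minimum quoted from the Product Specification). */
static double flash_lifetime_years(uint32_t pages,
                                   uint32_t endurance_cycles,
                                   double page_erases_per_day)
{
    if (page_erases_per_day <= 0.0) {
        return 0.0; /* no meaningful estimate without an erase rate */
    }
    /* Total erase budget across the whole area. */
    double total_erases = (double)pages * (double)endurance_cycles;
    return total_erases / page_erases_per_day / 365.0;
}
```

    For example, with 100 page erases per day, 1 page gives roughly a quarter of a year, while 8 pages give a bit over 2 years under the same assumptions.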

  • Hi!

    Does this correspond to the number of erase cycles on one flash page?

    Yes it does.

    I think you should account for that number when designing your product. E.g. by allocating as many flash pages as possible to improve wear leveling. 

    Will do. In our app, with as many flash pages as possible, and the amount of data we expect to write to the flash, it is not realistic to expect the flash to reach the point of failure before the expected end of life for the product.
    Should that change, we will simply verify the CRC on every write to the flash (write --> read --> verify CRC is correct) and notify when the flash has started to fail.
    That should be enough for our purposes.
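
    Roughly what I have in mind for the write --> read --> verify CRC step, as a host-side sketch. The "flash" here is just a RAM array, and sim_flash, sim_stuck_mask and write_and_verify are names invented for this example. On the device the read and write would go through fstorage or FDS instead, and you would probably not want a 4 KB buffer on the stack.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u

/* RAM stand-in for one flash page (assumption: replaces the real flash API). */
static uint8_t sim_flash[PAGE_SIZE];
/* Hypothetical fault injection: bits set here always read back as 1. */
static uint8_t sim_stuck_mask[PAGE_SIZE];

/* Bitwise CRC-32 using the reflected polynomial 0xEDB88320. */
static uint32_t crc32_calc(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* XOR in the polynomial only when the low bit is set. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

static void sim_flash_write(size_t offset, const uint8_t *src, size_t len)
{
    memcpy(&sim_flash[offset], src, len);
}

static void sim_flash_read(size_t offset, uint8_t *dst, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        dst[i] = sim_flash[offset + i] | sim_stuck_mask[offset + i];
    }
}

/* Write, read back, and compare CRCs. Returns false if the data did not
 * survive the round trip, which is the cue to raise a "flash failing" flag. */
static bool write_and_verify(size_t offset, const uint8_t *src, size_t len)
{
    uint8_t readback[PAGE_SIZE];

    if (offset > PAGE_SIZE || len > PAGE_SIZE - offset) {
        return false; /* request does not fit in the page */
    }
    sim_flash_write(offset, src, len);
    sim_flash_read(offset, readback, len);
    return crc32_calc(src, len) == crc32_calc(readback, len);
}
```

    A plain byte-by-byte compare would catch the same faults. The CRC version is only worth it if the CRC is also stored alongside the data so that later reads can be checked too.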

    Thanks for the support!

    BTW, if possible, I think we could benefit from a distribution graph for the point at which the flash fails, which I assume will be normal around X erase cycles. This way we would have both the guaranteed minimum (10,000 cycles) and the average (X), so that we have a rough idea what to expect from the flash. That would require a few thousand devices to be sacrificed, which is outside the scope of what I'm willing to pay ;)

  • Was this test done using the high-level FDS API, i.e. fds_record_write, fds_record_find and fds_record_open?


    Does "All subsequent reads returned an incorrect value from the flash." mean that once a failure occurs, no successful WRITES can be performed?

  • Hi dgerman!

    I'm afraid I haven't quite understood how to reply to specific messages, so I'm just pressing anywhere that says 'reply'. I hope we'll be able to piece together the message flow after the fact.
    At any rate, I've pieced together the portions of our test which are relevant to the flash as best I could. I'm not able to share our code as is, since it belongs to the company etc.
    We first tested using FDS but realised it was slowing us down, so we moved to a lower level and used the nrf_fstorage_read/write/erase functions.

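    #include <stdbool.h>
    #include <stdint.h>
    #include "nrf_fstorage.h"
    #include "nrf_fstorage_sd.h"
    #include "nrf_log_ctrl.h" /* NRF_LOG_FLUSH */
    #include "nrfx_log.h"     /* NRFX_LOG_* macros */

    #define IMG_SIZE 4096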
    
    __ALIGN(32) uint8_t img[IMG_SIZE] = {0};
    __ALIGN(32) uint8_t img_copy[IMG_SIZE];
    
    #define FLASH_START_ADDR 0x30000
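    /* Set from the fstorage event handler and polled in main(), hence volatile. */
    volatile bool write_finished = false, erase_finished = false;
    void callback(nrf_fstorage_evt_t *p_evt)
    {
        /* The return code of the write/erase call only says the operation was
           queued; the actual outcome is reported here. */
        if (p_evt->result != NRF_SUCCESS) {
            NRFX_LOG_ERROR("flash operation failed");
        }
        if (p_evt->id == NRF_FSTORAGE_EVT_WRITE_RESULT) {
            NRFX_LOG_INFO("flash write result event");
            write_finished = true;
        }
        if (p_evt->id == NRF_FSTORAGE_EVT_ERASE_RESULT) {
            NRFX_LOG_INFO("flash erase result event");
            erase_finished = true;
        }
    }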
    NRF_FSTORAGE_DEF(nrf_fstorage_t my_instance) = {
        .evt_handler = callback,
        .start_addr  = FLASH_START_ADDR,
        .end_addr    = FLASH_START_ADDR + 4096,
    };
    
    
    bool write_to_flash()
    {
        ret_code_t rc = nrf_fstorage_write(&my_instance,     /* The instance to use. */
                                           FLASH_START_ADDR, /* The address in flash where to store the data. */
                                           img,              /* A pointer to the data. */
                                           IMG_SIZE,         /* Length of the data, in bytes. */
                                           NULL              /* Optional parameter, backend-dependent. */
        );
        if (rc == NRF_SUCCESS) {
            NRFX_LOG_INFO("flash write success");
            return true;
        } else {
            NRFX_LOG_ERROR("flash write failure");
            return false;
        }
    }
    
    bool read_from_flash()
    {
        ret_code_t rc = nrf_fstorage_read(&my_instance,     /* The instance to use. */
                                          FLASH_START_ADDR, /* The address in flash where to read data from. */
                                          img_copy,         /* A buffer to copy the data into. */
                                          IMG_SIZE          /* Length of the data, in bytes. */
        );
        if (rc == NRF_SUCCESS) {
            NRFX_LOG_INFO("flash read success");
            for (int i = 0; i < IMG_SIZE; i++) {
                if (img[i] != img_copy[i]) {
                    NRFX_LOG_ERROR("flash read unequal at index %d", i);
                    return false;
                }
            }
            return true;
        } else {
            NRFX_LOG_ERROR("flash read failure");
            return false;
        }
    }
    
    bool erase_from_flash()
    {
        ret_code_t rc = nrf_fstorage_erase(&my_instance,     /* The instance to use. */
                                           FLASH_START_ADDR, /* The address of the flash pages to erase. */
                                           1,                /* The number of pages to erase. */
                                           NULL              /* Optional parameter, backend-dependent. */
        );
        if (rc == NRF_SUCCESS) {
            NRFX_LOG_INFO("flash erase success");
            return true;
        } else {
            NRFX_LOG_ERROR("flash erase failure");
            return false;
        }
    }
    
    
    /**@brief Function for application main entry.
     */
    int main(void)
    {
        // Initialize.
        nrf_fstorage_init(&my_instance,     /* Your fstorage instance, previously defined. */
                          &nrf_fstorage_sd, /* Name of the backend. */
                          NULL              /* Optional parameter, backend-dependent. */
        );
      
        log_init();
    
        // write_to_flash();
        // read_from_flash();
        // erase_from_flash();
    
        NRFX_LOG_INFO("%s Starting: %s %s (%s)\n", __func__, DEVICE_NAME, FIRMWARE_REV, MODEL_NUM);
        NRF_LOG_FLUSH();
    
        // TODO: write a 0 value to wherever you save your results. I'll just call it LOG from now on.
    
        bool     write_failed = false, gc_failed = false, read_failed = false;
        uint32_t write_cnt = 0, gc_cnt = 0, read_cnt = 0, fail_run_cnt = 0;
    
        bool use_170_and_not_85 = true;
        // 170 is 0b10101010
        //  85 is 0b01010101
    
        while (true) {
            // if something has failed, never change it. just write to LOG every once in a long while to make sure it is on
            if (write_failed || gc_failed || read_failed) {
                ++fail_run_cnt;
                if (fail_run_cnt < 3) {
                    NRFX_LOG_ERROR("update fail LOG");
                    //TODO: write to your LOG the reason you've failed and the count.
                }
            } else {
                // flip all the bits every flash write-gc-read attempt
                if (use_170_and_not_85) {
                    for (uint32_t i = 0; i < IMG_SIZE; i++) {
                        img[i] = 170;
                    }
                    use_170_and_not_85 = false;
                } else {
                    for (uint32_t i = 0; i < IMG_SIZE; i++) {
                        img[i] = 85;
                    }
                    use_170_and_not_85 = true;
                }
                ++write_cnt;
                write_finished = false;
                bool write_ret = write_to_flash();
                while (nrf_fstorage_is_busy(&my_instance) && !write_finished)
                    ;
                if (write_ret) {
                    ++read_cnt;
                    bool read_ret = read_from_flash();
    
                    if (read_ret) {
                        ++gc_cnt;
                        erase_finished = false;
                        bool gc_ret    = erase_from_flash();
                        while (nrf_fstorage_is_busy(&my_instance) && !erase_finished)
                            ;
                        if (!gc_ret) {
                            NRFX_LOG_ERROR("flash gc #%u failed!", gc_cnt);
                            gc_failed = true;
                        }
                    } else {
    
                        NRFX_LOG_ERROR("flash read #%u failed!", read_cnt);
                        read_failed = true;
                    }
    
                } else {
                    NRFX_LOG_ERROR("flash write #%u failed!", write_cnt);
                    write_failed = true;
                }
    
                if (write_cnt % 256 == 0) {
                    // TODO: write the result thus far to your LOG, as it hasn't failed yet.
                }
            }
    
            NRF_LOG_FLUSH();
    
            idle_state_handle();
    
            // nrf_delay_ms(10);
    
        } // end of while loop
    }


    This is by no means bug-proof, nor have I tested that it compiles. It's just the skeletal structure of our testing.
    I've added a 'TODO' comment where I saw relevant.

    Please let me know if you find any problems in our logic, and do report on your findings once you have them.

    good luck!

  • Hi dgerman!
    Not sure why, I wasn't able to answer you directly before, but now I am.
    1. We first tested using FDS but realized it was slowing us down, so we moved to a lower level and used the nrf_fstorage_read/write/erase functions.
    2. All subsequent reads returned an incorrect value, yes. However, I suspect that if you happen to write a '1' to the bit that is stuck as '1', you'll get a correct value read back to you. So you've got a 50% chance to be ok ;)
    3. I've added a skeletal structure of our tests in a separate comment. Hope it helps.
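
    To illustrate point 2, here is a tiny host-side model of a bit that is stuck at '1'. It assumes the fault behaves exactly like that (the bit always reads 1 regardless of what was written), and all function names are made up for this example. It also shows why the test alternates between 170 (0xAA) and 85 (0x55): whichever bit is stuck, one of the two patterns has a 0 there.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models a read from a byte whose bits in `stuck_mask` always read as 1. */
static uint8_t read_with_stuck_bits(uint8_t written, uint8_t stuck_mask)
{
    return (uint8_t)(written | stuck_mask);
}

/* True if a single write/read-back of `pattern` exposes the stuck bits. */
static bool pattern_detects(uint8_t pattern, uint8_t stuck_mask)
{
    return read_with_stuck_bits(pattern, stuck_mask) != pattern;
}

/* True if alternating 0xAA and 0x55 exposes the fault on one of the passes. */
static bool alternating_test_detects(uint8_t stuck_mask)
{
    return pattern_detects(0xAA, stuck_mask) || pattern_detects(0x55, stuck_mask);
}
```

    With bit 7 stuck, writing 0xAA reads back fine because that bit is already 1, while writing 0x55 reads back as 0xD5 and gets caught.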
