About FDS module data power-off protection

Development environment:

Chip: nRF52832

SDK: 15.3

SoftDevice: none

While testing how the FDS module protects data across a power loss, I found that FDS can end up in an inconsistent state.

If a software reset occurs during FDS garbage collection, the data in flash may be left in an abnormal state.
I checked the FDS source code and believe the following sequence causes the problem:
Stage 1: FDS garbage collection runs and copies the records from data_page to swap_page.
Stage 2: data_page is erased.
Stage 3: A software reset is performed.
Stage 4: After the reset, pages_init() encounters the erased page first and, because no swap page is set yet, uses the erased page as the swap page. On this path m_gc.cur_page is not set.

            case FDS_PAGE_UNDEFINED:
            {
                if (page_is_erased(p_page_addr))
                {
                    if (m_swap_page.p_addr != NULL)
                    {
                        // If a swap page is already set, flag the page as erased (in m_pages)
                        // and try to tag it as data (in flash) later on during initialization.
                        m_pages[page].page_type    = FDS_PAGE_ERASED;
                        m_pages[page].p_addr       = p_page_addr;
                        m_pages[page].write_offset = FDS_PAGE_TAG_SIZE;
                        
                        // This is a candidate for a potential new swap page, in case the
                        // current swap is going to be promoted to complete a GC instance.
                        m_gc.cur_page = page;
                        page++;
                    }
                    else
                    {
                        /*********************************************/
                        // NOTE: m_gc.cur_page is not set on this path.
                        /********************************************/
                        
                        // If there is no swap page yet, use this one.
                        m_swap_page.p_addr       = p_page_addr;
                        m_swap_page.write_offset = FDS_PAGE_TAG_SIZE;
                        swap_set_but_not_found   = true;
                    }

                    ret |= PAGE_ERASED;
                }
                else
                {
                    // The page contains non-FDS data.
                    // Do not initialize or use this page.
                    total_pages_available--;
                    m_pages[page].p_addr    = p_page_addr;
                    m_pages[page].page_type = FDS_PAGE_UNDEFINED;
                    page++;
                }
            } break;

Stage 5: After pages_init() returns, init_opts = 0x0B, and the following code is executed:

        case FDS_OP_INIT_PROMOTE_SWAP:
        {
            p_op->init.step       = FDS_OP_INIT_TAG_SWAP;

            // When promoting the swap, keep the write_offset set by pages_init().
            ret = page_tag_write_data(m_swap_page.p_addr);
            
            /******************************************************/
            // NOTE: m_gc.cur_page has not been set at this point.
            /******************************************************/
            
            uint16_t const         gc         = m_gc.cur_page;
            uint32_t const * const p_old_swap = m_swap_page.p_addr;

            // Execute the swap.
            m_swap_page.p_addr = m_pages[gc].p_addr;
            m_pages[gc].p_addr = p_old_swap;

            // Copy the offset from the swap to the new page.
            m_pages[gc].write_offset = m_swap_page.write_offset;
            m_swap_page.write_offset = FDS_PAGE_TAG_SIZE;

            m_pages[gc].page_type = FDS_PAGE_DATA;
        } break;

Stage 6: The FDS module data is now inconsistent, and the swap page can no longer be found.

Could you please confirm whether this is a bug? Thank you!
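
The order dependence in Stages 4 to 6 can be illustrated with a small stand-alone model. This is only a sketch and not the FDS source: toy_tag_t, toy_scan() and CUR_PAGE_UNSET are hypothetical names, and the handling of a swap page found after a tentative one (the tentative page is moved into the data page table without recording its index) is an assumption based on the behavior described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel meaning "never assigned"; not an FDS symbol. */
#define CUR_PAGE_UNSET 0xFFFFu

/* Simplified page tags for this model; not the FDS page types. */
typedef enum { TOY_DATA, TOY_SWAP, TOY_ERASED } toy_tag_t;

/* Scan the pages in physical order and return the data-table index that
 * was recorded as the GC candidate, or CUR_PAGE_UNSET if none was. */
static uint16_t toy_scan(const toy_tag_t *tags, size_t count)
{
    bool     swap_set  = false;  /* a swap page (real or tentative) is known */
    bool     tentative = false;  /* the known swap is only an erased page    */
    uint16_t data_idx  = 0;      /* next free slot in the data page table    */
    uint16_t cur_page  = CUR_PAGE_UNSET;

    assert(tags != NULL);

    for (size_t i = 0; i < count; i++)
    {
        switch (tags[i])
        {
            case TOY_ERASED:
                if (swap_set)
                {
                    /* Swap already known: record this page as the candidate. */
                    cur_page = data_idx;
                    data_idx++;
                }
                else
                {
                    /* No swap yet: use this page as swap, candidate untouched. */
                    swap_set  = true;
                    tentative = true;
                }
                break;

            case TOY_SWAP:
                if (tentative)
                {
                    /* Assumed: the tentative page moves to the data table,
                     * but its index is not recorded as the candidate. */
                    data_idx++;
                    tentative = false;
                }
                swap_set = true;
                break;

            case TOY_DATA:
                data_idx++;
                break;
        }
    }
    return cur_page;
}
```

In this model, scanning { TOY_SWAP, TOY_ERASED } returns index 0, while scanning { TOY_ERASED, TOY_SWAP } returns CUR_PAGE_UNSET, which corresponds to the case where the promote step later reads a value that was never assigned.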

  • Hi,

    Thank you for reporting this issue! It is highly appreciated.

    I have not reproduced and/or confirmed your findings yet, but it looks like this might be a bug, yes.

    * What is your setup for reproducing the behavior?
    * What are the detailed steps in order to reproduce?
    * Do we need a specific layout of the FDS pages before performing the test? (E.g. which page is the swap page, any open records, etc.?)
    * Can you describe the resulting error state in more detail?
    * A flash dump from before and after the error condition would be great. ("nrfjprog --readuicr --readcode dump.hex", for dumping flash contents to a file named "dump.hex".)

    We need a way to reproduce this consistently in order to investigate further and fully understand what the issue is.

    Regards,
    Terje

  • My English is not very good, so I hope this is understandable.

    During garbage collection, the records in the data page are copied to the swap page, and the data page is then erased. At that point flash contains one swap page and one erased page. The software reset is performed at this moment.
    After the restart, the result depends on the order in which pages_init() encounters the two pages. If it finds the swap page first and the erased page second, m_gc.cur_page is set correctly. If it finds the erased page first and the swap page second, m_gc.cur_page is never assigned. In that case pages_init() returns 0x0B, and the subsequent operations use the unassigned m_gc.cur_page.

    I have rewritten the FDS module to use external flash, but I confirmed that this problem exists in FDS itself. Erasing a sector of the external flash takes about 200 ms, so the problem is easy to reproduce there. The internal flash erases faster, which makes the problem harder to reproduce.
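
    A minimal sketch of one possible guard, written against a simplified model rather than the FDS source. It has not been verified against the SDK and is not Nordic's fix. The names toy_tag_t, toy_scan_guarded() and CUR_PAGE_UNSET are hypothetical, and the assumption is that the erased page found before the swap page ends up in the data page table, so its index can be recorded at that moment.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical sentinel meaning "never assigned"; not an FDS symbol. */
    #define CUR_PAGE_UNSET 0xFFFFu

    /* Simplified page tags for this model; not the FDS page types. */
    typedef enum { TOY_DATA, TOY_SWAP, TOY_ERASED } toy_tag_t;

    /* Scan the pages in physical order. Unlike the problematic path, the
     * GC candidate is also recorded when a real swap page is found after
     * an erased page had been taken as a tentative swap. */
    static uint16_t toy_scan_guarded(const toy_tag_t *tags, size_t count)
    {
        bool     swap_set  = false;  /* a swap page (real or tentative) is known */
        bool     tentative = false;  /* the known swap is only an erased page    */
        uint16_t data_idx  = 0;      /* next free slot in the data page table    */
        uint16_t cur_page  = CUR_PAGE_UNSET;

        assert(tags != NULL);

        for (size_t i = 0; i < count; i++)
        {
            switch (tags[i])
            {
                case TOY_ERASED:
                    if (swap_set)
                    {
                        cur_page = data_idx;
                        data_idx++;
                    }
                    else
                    {
                        swap_set  = true;
                        tentative = true;
                    }
                    break;

                case TOY_SWAP:
                    if (tentative)
                    {
                        /* Hypothetical guard: the tentative page becomes an
                         * erased entry in the data table, so record its index. */
                        cur_page = data_idx;
                        data_idx++;
                        tentative = false;
                    }
                    swap_set = true;
                    break;

                case TOY_DATA:
                    data_idx++;
                    break;
            }
        }
        return cur_page;
    }
    ```

    With this change the model returns index 0 for both { TOY_SWAP, TOY_ERASED } and { TOY_ERASED, TOY_SWAP }, so the result no longer depends on the scan order.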

  • Hi,

    Thank you for the clarification, I think I understand now. I will file an internal bug report for this. We should have enough information to identify the issue now. From what I can tell, this looks like something that must be fixed, yes.

    Can you provide a flash dump of the corrupted flash? ("nrfjprog --readuicr --readcode dump.hex", then attach the resulting "dump.hex".) If you do not want to share the flash dump here in a public thread, please open a private thread with the attachment and refer to this thread.

    Thank you again for the report!

    Regards,
    Terje
