FDS - missing swap page

 Hi,

One of our devices is bricked because fds_init returns FDS_ERR_NO_PAGES. After analyzing the flash contents, I found that all pages are tagged FDS_PAGE_DATA and none is tagged FDS_PAGE_SWAP. One of these data pages is empty: it contains only the page header (FDS_PAGE_TAG_MAGIC and the FDS_PAGE_DATA tag).

In our application, FDS is heavily used, and during testing we perform many power cycles. I do not have a clear reproduction path because we have seen this only once (with more than 200 devices online for a few months).

We are using SDK 14.2.0 with nRF52832 (custom board designs).

FDS flash dump can be found here: https://drive.google.com/drive/folders/1y_KOyIhVw9d-ZAAIc8SSa8k9vUOobDVz?usp=sharing

  • Hi,

    Good news :) I have a way to reproduce it.

    Suppose the GC procedure is interrupted (power off) in the following state:

    | page_address | page_type  |
    | 0xEF000      | erased     |
    | 0xF0000      | data       |
    | 0xF1000      | swap_dirty |
    | 0xF2000      | data       |
    | 0xF3000      | data       |

    When FDS is initialized again, it performs the following actions (PROMOTE_SWAP_INST):

    • tag 0xF1000 as data
    • tag 0xF0000 as swap (this fails because flash writes can only clear bits, so the existing data tag cannot be rewritten as a swap tag)
    • tag 0xEF000 as data
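
Why the second tagging step cannot succeed can be shown with a tiny model of a flash word write, where a write can only turn 1 bits into 0 bits. This is only a sketch: the tag values are the ones I recall from fds_internal_defs.h (an assumption, check them against your SDK version), and flash_write_word and flash_can_write are hypothetical helpers, not SDK functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Page tag values as recalled from fds_internal_defs.h (assumption). */
#define TAG_SWAP    0xF11E01FFu
#define TAG_DATA    0xF11E01FEu
#define ERASED_WORD 0xFFFFFFFFu

/* Hypothetical model of a flash word write: bits can only go from 1 to 0,
 * so the stored result is the AND of the old content and the new value. */
static uint32_t flash_write_word(uint32_t current, uint32_t value)
{
    return current & value;
}

/* True if 'value' can be stored over 'current' without erasing the page. */
static bool flash_can_write(uint32_t current, uint32_t value)
{
    return flash_write_word(current, value) == value;
}
```

With these values, a swap tag can be turned into a data tag (which is what promoting the swap page does), but a data tag cannot be turned back into a swap tag without erasing the page first.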

    If the erased page is initialized before the swap page, m_gc.cur_page is not initialized correctly. This is my proposed fix:

                case FDS_PAGE_SWAP:
                {
                    if (swap_set_but_not_found)
                    {
                        m_pages[page].page_type    = FDS_PAGE_ERASED;
                        m_pages[page].p_addr       = m_swap_page.p_addr;
                        m_pages[page].write_offset = FDS_PAGE_TAG_SIZE;
    
                        m_gc.cur_page = page; // FIX
                        page++;
                    }
    
                    m_swap_page.p_addr = p_page_addr;
                    // If the swap is promoted, this offset should be kept, otherwise,
                    // it should be set to FDS_PAGE_TAG_SIZE.
                    page_scan(p_page_addr, &m_swap_page.write_offset, NULL);
    
                    ret |= (m_swap_page.write_offset == FDS_PAGE_TAG_SIZE) ?
                            PAGE_SWAP_CLEAN : PAGE_SWAP_DIRTY;
                } break;

    It fixes this case, but I am not sure whether it breaks anything else. Could you please review it for me?

  • I have also experienced this issue with SDK 17.0.2. On one device, calls to fds_init during boot returned the NO_SWAP error. Looking at the memory contents, all the FDS pages were tagged as DATA.

  • Is there a reliable way of reproducing this?

    BR,

    Edvin

  • I am having the same issue. I have not been able to reproduce it reliably; however, in my case at least one of the DATA pages is empty. This is a recoverable state, since an empty data page can be erased and tagged as a SWAP page.

    I suggest adding a new enum value, PAGE_DATA_EMPTY:

    enum
    {
        PAGE_ERASED     = 0x1,  // One or more erased pages found.
        PAGE_DATA       = 0x2,  // One or more data pages found.
        PAGE_SWAP_CLEAN = 0x4,  // A clean (empty) swap page was found.
        PAGE_SWAP_DIRTY = 0x8,  // A dirty (non-empty) swap page was found.
        PAGE_DATA_EMPTY = 0x10, // One or more empty data pages found.
    };

    and a new value in fds_init_opts_t:

        // The filesystem only contains data pages, but at least one data page is empty.
        // It is likely that the device powered off during GC. It is safe to discard (erase)
        // an empty data page, since there is nothing stored there.
        DISCARD_EMPTY_DATA             = (PAGE_DATA | PAGE_DATA_EMPTY),

    Handling this in the same manner as DISCARD_SWAP (erase the empty data page, then initialize that page as swap) produces the same result.
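
One detail to watch with a new flag is that it takes part in the exact-match comparison against the init options, so it could change combinations that already work (for example when a swap page is present and a data page also happens to be empty). A minimal sketch of one way to avoid that, assuming the flag values proposed above; normalize_page_flags and should_discard_empty_data are hypothetical helper names, not part of fds.c:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values copied from the enum proposed above. */
enum
{
    PAGE_ERASED     = 0x1,
    PAGE_DATA       = 0x2,
    PAGE_SWAP_CLEAN = 0x4,
    PAGE_SWAP_DIRTY = 0x8,
    PAGE_DATA_EMPTY = 0x10,
};

#define DISCARD_EMPTY_DATA (PAGE_DATA | PAGE_DATA_EMPTY)

/* Hypothetical helper: ignore PAGE_DATA_EMPTY whenever an erased or swap
 * page exists, so the existing init options keep matching unchanged. */
static uint32_t normalize_page_flags(uint32_t flags)
{
    const uint32_t usable = PAGE_ERASED | PAGE_SWAP_CLEAN | PAGE_SWAP_DIRTY;

    if (flags & usable)
    {
        flags &= ~(uint32_t)PAGE_DATA_EMPTY;
    }
    return flags;
}

/* True only when every page is a data page and at least one is empty. */
static bool should_discard_empty_data(uint32_t flags)
{
    return normalize_page_flags(flags) == DISCARD_EMPTY_DATA;
}
```

With this, only the "data pages only, at least one of them empty" state selects the new option; every other combination is passed on as it would be without the new flag.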

  • Hi Rick N,

    I'm experiencing the same problem with the FDS and I really like your take on solving this problem.

    Do you perhaps have a more worked-out example of your solution?

    Thanks in advance.

    Kind regards,

    Remco Poelstra

  • My workaround was to check the flash pages prior to calling `fds_init`. If there were no swap pages (`FDS_PAGE_SWAP`) and no blank pages (`FDS_PAGE_ERASED` or `FDS_PAGE_UNDEFINED`), I would perform a recovery. Recovery would either erase a single data page, if a data page was blank (just the header), or erase all flash storage pages (last resort: data is lost, but the MCU will recover). Also, if a page is `FDS_PAGE_UNDEFINED`, I would erase that page, because it is unknown what is on that page and it cannot be trusted.

    `fds_init` takes care of the rest.
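
The decision part of such a pre-check could look roughly like the sketch below. It works on plain word arrays so it can be tried off-target. The tag values are the ones I recall from fds_internal_defs.h (an assumption, verify against your SDK), and classify_page, plan_recovery and the enum names are hypothetical, not SDK APIs. The actual erase and the separate erasing of undefined pages are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Tag values as recalled from fds_internal_defs.h (assumption). */
#define TAG_MAGIC   0xDEADC0DEu
#define TAG_SWAP    0xF11E01FFu
#define TAG_DATA    0xF11E01FEu
#define TAG_WORDS   2u            /* page header: magic word + type word */
#define ERASED_WORD 0xFFFFFFFFu

typedef enum { PG_UNDEFINED, PG_ERASED, PG_DATA, PG_SWAP } page_kind_t;
typedef enum { RECOVER_NONE, RECOVER_ERASE_ONE, RECOVER_ERASE_ALL } recovery_t;

/* True if words [from, count) of the page are all in the erased state. */
static bool words_erased(const uint32_t *page, size_t from, size_t count)
{
    for (size_t i = from; i < count; i++)
    {
        if (page[i] != ERASED_WORD) { return false; }
    }
    return true;
}

/* Classify one page from its two header words. */
static page_kind_t classify_page(const uint32_t *page, size_t words)
{
    assert(words >= TAG_WORDS);
    if (page[0] != TAG_MAGIC)
    {
        return words_erased(page, 0, words) ? PG_ERASED : PG_UNDEFINED;
    }
    if (page[1] == TAG_SWAP) { return PG_SWAP; }
    if (page[1] == TAG_DATA) { return PG_DATA; }
    return PG_UNDEFINED;
}

/* Recovery is planned only when every page is a data page. The first data
 * page that holds nothing but its header is reported through erase_index;
 * if there is none, the last resort is to erase everything. */
static recovery_t plan_recovery(const uint32_t *const *pages, size_t count,
                                size_t words, size_t *erase_index)
{
    size_t blank = count; /* sentinel: no blank data page seen yet */

    assert(count > 0);
    for (size_t i = 0; i < count; i++)
    {
        if (classify_page(pages[i], words) != PG_DATA)
        {
            return RECOVER_NONE; /* a swap, erased or undefined page exists */
        }
        if (blank == count && words_erased(pages[i], TAG_WORDS, words))
        {
            blank = i;
        }
    }
    if (blank < count)
    {
        *erase_index = blank;
        return RECOVER_ERASE_ONE;
    }
    return RECOVER_ERASE_ALL;
}
```

On a device, the page pointers would be the FDS page start addresses and the erase itself would go through whatever flash API the application already uses before `fds_init` is called.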
