Beware that this post is related to an SDK in maintenance mode
More Info: Consider nRF Connect SDK for new designs
This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

Unable to return from fds_gc(), stuck in infinite loop.

There is a bug in fds.c file, Please go through the below code snippet.

ret_code = fds_record_update(&desc, record_id);
if (ret_code == FDS_ERR_NO_SPACE_IN_FLASH)
{
            ret_code = fds_gc();
            ERROR_CHECK(ret_code);
            ret_code = fds_record_update(&desc, record_id);
 }

I do not return from fds_gc(), execution stays there for infinite. 

More information:

1) #define FDS_VIRTUAL_PAGES 30

2) #define FDS_VIRTUAL_PAGE_SIZE (1024)

3) Initialization is successful and able to write, update and read several records.

4) SDK version: nRF_SDK_15.2.0

The problem occurs when record_update is failed with FDS_ERR_NO_SPACE_IN_FLASH and followed by garbage collection function fds_gc().

According to me, a library function has to return an error code if something is wrong, but not locked in an infinite loop. This is a serious bug.

Please help me as I am stuck in the crucial phase of our development.

Regards,

Jagadeep

  • Hi all,

    During the fds_gc() function call, I see the following code execute in the function queue_process().

                    result = FDS_ERR_INTERNAL;

            switch (m_p_cur_op->op_code)
            {
                case FDS_OP_INIT:
                    result = init_execute(result, m_p_cur_op);
                    break;
    
                case FDS_OP_WRITE:
                case FDS_OP_UPDATE:
                    result = write_execute(result, m_p_cur_op);
                    break;
    
                case FDS_OP_DEL_RECORD:
                case FDS_OP_DEL_FILE:
                    result = delete_execute(result, m_p_cur_op);
                    break;
    
                case FDS_OP_GC:
                    result = gc_execute(result);
                    break;
    
                default:
                    result = FDS_ERR_INTERNAL;
                    break;
            }

    Due to this, the while loop executes infinitely. 

    Please someone guide me on how to use the garbage collection function and solve my problem.

  • Hi,

    All FDS functions that writes to or erases flash are asynchronous, and fds_gc() is no exception. This means on the call to fds_gc() the actual garbage collection gets queued. You will get an FDS_EVT_GC event at a later point, when the garbage collection has run. This means that yes, the code snippet that you listed initially does not work. In the case where garbage collection is needed, what happens in that code is that garbage collection gets queued, then before garbage collection is finished there is a new attempt to write for which you should expect another ret_code of FDS_ERR_NO_SPACE_IN_FLASH.

    Have you done a debug session in order to confirm where you get stuck?

    Regards,
    Terje

  • Hi Terje.

    I'll explain what I'm trying to and what I have found during debug session.

    A) What I want to do?

    I want to store information in the FLASH storage during certain events take place. A single event record is 16 bytes and wants to store at least 500 events (8000 bytes in total). I have assigned FDS_VIRTUAL_PAGE_SIZE (1024) and FDS_VIRTUAL_PAGES 12, way more than needed.  The product life is for 5 years and wanted to test the FLASH endurance (maximum events stored/updated), as the datasheet mention 10,0000 cycles.

    B) What I have done?

      1)  Created a file to store all events.

      2)  An event is stored as a record in the file.

      3)  Successfully created and stored 500 records using fds_record_write().

      4) Update the existing records from the beginning for further records using fds_record_update(). The goal is to set up a ring buffer of 500 records.

      5) An fds_record_update() invalidates the current and creates a new record.  The allocated buffer is utilized completely after around 2000 events and the update function returned FDS_ERR_NO_SPACE_IN_FLASH. As per the instruction, fds_gc() should be used to free the invalidated records and continue with the update function.

    6) While calling the fds_gc() function, doesn't return back to my layer and keeps calling queue_process() function infinitely.

    7) It is estimated that at least 5000 events should be stored (in circular fashion) during the product lifetime and hence it is important to fix this bug. I found the issue while unit testing the event logging module.

    static void fds_evt_handler(const fds_evt_t* p_evt)
    {
        switch (p_evt->id)
        {
            case FDS_EVT_INIT:
            {
                if (p_evt->result == FDS_SUCCESS)
                {
                    m_fds_initialized = 1;
                    NRF_LOG_INFO("Event: Init");
                }
                break;
            }
            case FDS_EVT_WRITE:
            {
                if (p_evt->result == FDS_SUCCESS)
                {
                    write_done = 1;
                    NRF_LOG_INFO("Event: Write");
                }
                break;
            }
            case FDS_EVT_DEL_RECORD:
            {
                if (p_evt->result == FDS_SUCCESS)
                {
                    erase_done = 1;
                    NRF_LOG_INFO("Event: Delete");
                }
                break;
            }
            case FDS_EVT_UPDATE:
            {
                if (p_evt->result == FDS_SUCCESS)
                {
                    update_done = 1;
                    NRF_LOG_INFO("Event: Update");
                }
                break;
            }
            case FDS_EVT_GC:
                if (p_evt->result == FDS_SUCCESS)
                {
                    NRF_LOG_INFO("Event: Garbage Collection");
                }
                break;
            default:
                break;
        }
    }
    
    static void wait_for_fds_ready(void)
    {
        while (!m_fds_initialized)
        {
            __WFE();
        }
    }
    
    static void wait_for_write_done(void)
    {
        while (!write_done)
        {
            __WFE();
        }
        write_done = 0;
    }
    
    static void wait_for_update_done(void)
    {
        while (!update_done)
        {
            __WFE();
        }
        update_done = 0;
    }
    
    void write_event()
    {
       ret_code = fds_record_find(record_id->file_id, record_id->key, &desc, &tok);
        /* Write only when file is not found */
        if (ret_code == FDS_ERR_NOT_FOUND)
        {
            ret_code = fds_record_write(&desc, record_id);
            wait_for_write_done();
            if (ret_code == FDS_ERR_NO_SPACE_IN_FLASH)
            {
                ret_code = fds_gc();
                ERROR_CHECK_HIGH(ret_code);
                ret_code = fds_record_write(&desc, record_id);
                wait_for_write_done();
            }
        }
        else if (ret_code == FDS_SUCCESS)
        {
            /* Write the updated record to flash. */
            ret_code = fds_record_update(&desc, record_id);
            if (ret_code == FDS_SUCCESS)
            {
                wait_for_update_done();
            }
            else if (ret_code == FDS_ERR_NO_SPACE_IN_FLASH)
            {
                ret_code = fds_gc();
                ERROR_CHECK_HIGH(ret_code);
                ret_code = fds_record_update(&desc, record_id);
                wait_for_update_done();
            }
            else
            {
                ERROR_CHECK_HIGH(ret_code);
            }
        }
    }

    I placed a breakpoint at fds_gc() function and could see that it revolves within itself. The variable result is having value FDS_ERR_INTERNAL inside the function queue_process().

    static void queue_process(ret_code_t result)
    {
        static fds_op_t              * m_p_cur_op;  // Current fds operation.
        static nrf_atfifo_item_get_t   m_iget_ctx;  // Queue context for the current operation.
    
        while (true)
        {
            if (m_p_cur_op == NULL)
            {
                // Load the next from the queue if no operation is being executed.
                m_p_cur_op = queue_load(&m_iget_ctx);
            }
    
            /* We can reach here in three ways:
             * from queue_start(): something was just queued
             * from the fstorage event handler: an operation is being executed
             * looping: we only loop if there are operations still in the queue
             *
             * In all these three cases, m_p_cur_op != NULL.
             */
            ASSERT(m_p_cur_op != NULL);
    
            switch (m_p_cur_op->op_code)
            {
                case FDS_OP_INIT:
                    result = init_execute(result, m_p_cur_op);
                    break;
    
                case FDS_OP_WRITE:
                case FDS_OP_UPDATE:
                    result = write_execute(result, m_p_cur_op);
                    break;
    
                case FDS_OP_DEL_RECORD:
                case FDS_OP_DEL_FILE:
                    result = delete_execute(result, m_p_cur_op);
                    break;
    
                case FDS_OP_GC:
                    result = gc_execute(result);
                    break;
    
                default:
                    result = FDS_ERR_INTERNAL;
                    break;
            }
    
            if (result == FDS_OP_EXECUTING)
            {
                // The operation has not completed yet. Wait for the next system event.
                break;
            }
    
            // The operation has completed (either successfully or with an error).
            // - send an event to the user
            // - free the operation buffer
            // - execute any other queued operations
    
            fds_evt_t evt =
            {
                // The operation might have failed for one of the following reasons:
                // FDS_ERR_BUSY              - flash subsystem can't accept the operation
                // FDS_ERR_OPERATION_TIMEOUT - flash subsystem timed out
                // FDS_ERR_CRC_CHECK_FAILED  - a CRC check failed
                // FDS_ERR_NOT_FOUND         - no record found (delete/update)
                .result = (result == FDS_OP_COMPLETED) ? FDS_SUCCESS : result,
            };
    
            event_prepare(m_p_cur_op, &evt);
            event_send(&evt);
    
            // Zero the pointer to the current operation so that this function
            // will fetch a new one from the queue next time it is run.
            m_p_cur_op = NULL;
    
            // The result of the operation must be reset upon re-entering the loop to ensure
            // the next operation won't be affected by eventual errors in previous operations.
            result = NRF_SUCCESS;
    
            // Free the queue element used by the current operation.
            queue_free(&m_iget_ctx);
    
            if (!queue_has_next())
            {
                // No more elements left. Nothing to do.
                break;
            }
        }
    }

    The code is stuck in this while loop. The following exit point conditions are not True and hence its not possible to escape the loop.

    1) 

    if (result == FDS_OP_EXECUTING)
          {
                      // The operation has not completed yet. Wait for the next system event.
                     break;
          }

    2) 

    if (!queue_has_next())

    {

                   // No more elements left. Nothing to do.
                   break;
    }

    Let me know if you want more information. I'm open for a call or video session to share more code.

    /Jagadeep

  • I missed an important information.

    The event handler is not called with FDS_EVT_GC event after called fds_gc().  Please find the updated event handler function and code using FDS functions.

    static void fds_evt_handler(const fds_evt_t* p_evt)
    {
        switch (p_evt->id)
        {
            case FDS_EVT_INIT:
            {
                if (p_evt->result == FDS_SUCCESS)
                {
                    m_fds_initialized = 1;
                    NRF_LOG_INFO("Event: Init");
                }
                break;
            }
            case FDS_EVT_WRITE:
            {
                if (p_evt->result == FDS_SUCCESS)
                {
                    write_done = 1;
                    NRF_LOG_INFO("Event: Write");
                }
                break;
            }
            case FDS_EVT_DEL_RECORD:
            {
                if (p_evt->result == FDS_SUCCESS)
                {
                    erase_done = 1;
                    NRF_LOG_INFO("Event: Delete");
                }
                break;
            }
            case FDS_EVT_UPDATE:
            {
                if (p_evt->result == FDS_SUCCESS)
                {
                    update_done = 1;
                    NRF_LOG_INFO("Event: Update");
                }
                break;
            }
            case FDS_EVT_GC:
                if (p_evt->result == FDS_SUCCESS)
                {
                    gc_done = 1;
                    NRF_LOG_INFO("Event: Garbage Collection");
                }
                break;
            default:
                break;
        }
    }
    
    static void wait_for_fds_ready(void)
    {
        while (!m_fds_initialized)
        {
            __WFE();
        }
    }
    
    static void wait_for_write_done(void)
    {
        while (!write_done)
        {
            __WFE();
        }
        write_done = 0;
    }
    
    static void wait_for_update_done(void)
    {
        while (!update_done)
        {
            __WFE();
        }
        update_done = 0;
    }
    
    static void wait_for_gc_done(void)
    {
        while (!gc_done)
        {
            __WFE();
        }
        gc_done = 0;
    }
    
    void write_event()
    {
       ret_code = fds_record_find(record_id->file_id, record_id->key, &desc, &tok);
        /* Write only when file is not found */
        if (ret_code == FDS_ERR_NOT_FOUND)
        {
            ret_code = fds_record_write(&desc, record_id);
            if (ret_code == FDS_SUCCESS)
            {
                wait_for_write_done();
            }
            else if (ret_code == FDS_ERR_NO_SPACE_IN_FLASH)
            {
                ret_code = fds_gc();
                wait_for_gc_done();
                ERROR_CHECK_HIGH(ret_code);
                ret_code = fds_record_write(&desc, record_id);
                wait_for_write_done();
            }
        }
        else if (ret_code == FDS_SUCCESS)
        {
            /* Write the updated record to flash. */
            ret_code = fds_record_update(&desc, record_id);
            if (ret_code == FDS_SUCCESS)
            {
                wait_for_update_done();
            }
            else if (ret_code == FDS_ERR_NO_SPACE_IN_FLASH)
            {
                ret_code = fds_gc();
                wait_for_gc_done();
                ERROR_CHECK_HIGH(ret_code);
                ret_code = fds_record_update(&desc, record_id);
                wait_for_update_done();
            }
            else
            {
                ERROR_CHECK_HIGH(ret_code);
            }
        }
    }

  • Hello,

    Terje is currently out travelling.

    I was wondering if you experience on_operation_failure() occurs here, this will be triggered by NRF_EVT_FLASH_OPERATION_ERROR in fs_sys_event_handler().

    Best regards,
    Kenneth

Related