Beware that this post is related to an SDK in maintenance mode
More Info: Consider nRF Connect SDK for new designs
This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

Unable to return from fds_gc(), stuck in infinite loop.

There is a bug in fds.c file, Please go through the below code snippet.

ret_code = fds_record_update(&desc, record_id);
if (ret_code == FDS_ERR_NO_SPACE_IN_FLASH)
{
            ret_code = fds_gc();
            ERROR_CHECK(ret_code);
            ret_code = fds_record_update(&desc, record_id);
 }

I do not return from fds_gc(), execution stays there for infinite. 

More information:

1) #define FDS_VIRTUAL_PAGES 30

2) #define FDS_VIRTUAL_PAGE_SIZE (1024)

3) Initialization is successful and able to write, update and read several records.

4) SDK version: nRF_SDK_15.2.0

The problem occurs when record_update is failed with FDS_ERR_NO_SPACE_IN_FLASH and followed by garbage collection function fds_gc().

According to me, a library function has to return an error code if something is wrong, but not locked in an infinite loop. This is a serious bug.

Please help me as I am stuck in the crucial phase of our development.

Regards,

Jagadeep

Parents
  • Hi Terje.

    I'll explain what I'm trying to and what I have found during debug session.

    A) What I want to do?

    I want to store information in the FLASH storage during certain events take place. A single event record is 16 bytes and wants to store at least 500 events (8000 bytes in total). I have assigned FDS_VIRTUAL_PAGE_SIZE (1024) and FDS_VIRTUAL_PAGES 12, way more than needed.  The product life is for 5 years and wanted to test the FLASH endurance (maximum events stored/updated), as the datasheet mention 10,0000 cycles.

    B) What I have done?

      1)  Created a file to store all events.

      2)  An event is stored as a record in the file.

      3)  Successfully created and stored 500 records using fds_record_write().

      4) Update the existing records from the beginning for further records using fds_record_update(). The goal is to set up a ring buffer of 500 records.

      5) An fds_record_update() invalidates the current and creates a new record.  The allocated buffer is utilized completely after around 2000 events and the update function returned FDS_ERR_NO_SPACE_IN_FLASH. As per the instruction, fds_gc() should be used to free the invalidated records and continue with the update function.

    6) While calling the fds_gc() function, doesn't return back to my layer and keeps calling queue_process() function infinitely.

    7) It is estimated that at least 5000 events should be stored (in circular fashion) during the product lifetime and hence it is important to fix this bug. I found the issue while unit testing the event logging module.

    static void fds_evt_handler(const fds_evt_t* p_evt)
    {
        switch (p_evt->id)
        {
            case FDS_EVT_INIT:
            {
                if (p_evt->result == FDS_SUCCESS)
                {
                    m_fds_initialized = 1;
                    NRF_LOG_INFO("Event: Init");
                }
                break;
            }
            case FDS_EVT_WRITE:
            {
                if (p_evt->result == FDS_SUCCESS)
                {
                    write_done = 1;
                    NRF_LOG_INFO("Event: Write");
                }
                break;
            }
            case FDS_EVT_DEL_RECORD:
            {
                if (p_evt->result == FDS_SUCCESS)
                {
                    erase_done = 1;
                    NRF_LOG_INFO("Event: Delete");
                }
                break;
            }
            case FDS_EVT_UPDATE:
            {
                if (p_evt->result == FDS_SUCCESS)
                {
                    update_done = 1;
                    NRF_LOG_INFO("Event: Update");
                }
                break;
            }
            case FDS_EVT_GC:
                if (p_evt->result == FDS_SUCCESS)
                {
                    NRF_LOG_INFO("Event: Garbage Collection");
                }
                break;
            default:
                break;
        }
    }
    
    static void wait_for_fds_ready(void)
    {
        while (!m_fds_initialized)
        {
            __WFE();
        }
    }
    
    static void wait_for_write_done(void)
    {
        while (!write_done)
        {
            __WFE();
        }
        write_done = 0;
    }
    
    static void wait_for_update_done(void)
    {
        while (!update_done)
        {
            __WFE();
        }
        update_done = 0;
    }
    
    void write_event()
    {
       ret_code = fds_record_find(record_id->file_id, record_id->key, &desc, &tok);
        /* Write only when file is not found */
        if (ret_code == FDS_ERR_NOT_FOUND)
        {
            ret_code = fds_record_write(&desc, record_id);
            wait_for_write_done();
            if (ret_code == FDS_ERR_NO_SPACE_IN_FLASH)
            {
                ret_code = fds_gc();
                ERROR_CHECK_HIGH(ret_code);
                ret_code = fds_record_write(&desc, record_id);
                wait_for_write_done();
            }
        }
        else if (ret_code == FDS_SUCCESS)
        {
            /* Write the updated record to flash. */
            ret_code = fds_record_update(&desc, record_id);
            if (ret_code == FDS_SUCCESS)
            {
                wait_for_update_done();
            }
            else if (ret_code == FDS_ERR_NO_SPACE_IN_FLASH)
            {
                ret_code = fds_gc();
                ERROR_CHECK_HIGH(ret_code);
                ret_code = fds_record_update(&desc, record_id);
                wait_for_update_done();
            }
            else
            {
                ERROR_CHECK_HIGH(ret_code);
            }
        }
    }

    I placed a breakpoint at fds_gc() function and could see that it revolves within itself. The variable result is having value FDS_ERR_INTERNAL inside the function queue_process().

    static void queue_process(ret_code_t result)
    {
        static fds_op_t              * m_p_cur_op;  // Current fds operation.
        static nrf_atfifo_item_get_t   m_iget_ctx;  // Queue context for the current operation.
    
        while (true)
        {
            if (m_p_cur_op == NULL)
            {
                // Load the next from the queue if no operation is being executed.
                m_p_cur_op = queue_load(&m_iget_ctx);
            }
    
            /* We can reach here in three ways:
             * from queue_start(): something was just queued
             * from the fstorage event handler: an operation is being executed
             * looping: we only loop if there are operations still in the queue
             *
             * In all these three cases, m_p_cur_op != NULL.
             */
            ASSERT(m_p_cur_op != NULL);
    
            switch (m_p_cur_op->op_code)
            {
                case FDS_OP_INIT:
                    result = init_execute(result, m_p_cur_op);
                    break;
    
                case FDS_OP_WRITE:
                case FDS_OP_UPDATE:
                    result = write_execute(result, m_p_cur_op);
                    break;
    
                case FDS_OP_DEL_RECORD:
                case FDS_OP_DEL_FILE:
                    result = delete_execute(result, m_p_cur_op);
                    break;
    
                case FDS_OP_GC:
                    result = gc_execute(result);
                    break;
    
                default:
                    result = FDS_ERR_INTERNAL;
                    break;
            }
    
            if (result == FDS_OP_EXECUTING)
            {
                // The operation has not completed yet. Wait for the next system event.
                break;
            }
    
            // The operation has completed (either successfully or with an error).
            // - send an event to the user
            // - free the operation buffer
            // - execute any other queued operations
    
            fds_evt_t evt =
            {
                // The operation might have failed for one of the following reasons:
                // FDS_ERR_BUSY              - flash subsystem can't accept the operation
                // FDS_ERR_OPERATION_TIMEOUT - flash subsystem timed out
                // FDS_ERR_CRC_CHECK_FAILED  - a CRC check failed
                // FDS_ERR_NOT_FOUND         - no record found (delete/update)
                .result = (result == FDS_OP_COMPLETED) ? FDS_SUCCESS : result,
            };
    
            event_prepare(m_p_cur_op, &evt);
            event_send(&evt);
    
            // Zero the pointer to the current operation so that this function
            // will fetch a new one from the queue next time it is run.
            m_p_cur_op = NULL;
    
            // The result of the operation must be reset upon re-entering the loop to ensure
            // the next operation won't be affected by eventual errors in previous operations.
            result = NRF_SUCCESS;
    
            // Free the queue element used by the current operation.
            queue_free(&m_iget_ctx);
    
            if (!queue_has_next())
            {
                // No more elements left. Nothing to do.
                break;
            }
        }
    }

    The code is stuck in this while loop. The following exit point conditions are not True and hence its not possible to escape the loop.

    1) 

    if (result == FDS_OP_EXECUTING)
          {
                      // The operation has not completed yet. Wait for the next system event.
                     break;
          }

    2) 

    if (!queue_has_next())

    {

                   // No more elements left. Nothing to do.
                   break;
    }

    Let me know if you want more information. I'm open for a call or video session to share more code.

    /Jagadeep

Reply
  • Hi Terje.

    I'll explain what I'm trying to and what I have found during debug session.

    A) What I want to do?

    I want to store information in the FLASH storage during certain events take place. A single event record is 16 bytes and wants to store at least 500 events (8000 bytes in total). I have assigned FDS_VIRTUAL_PAGE_SIZE (1024) and FDS_VIRTUAL_PAGES 12, way more than needed.  The product life is for 5 years and wanted to test the FLASH endurance (maximum events stored/updated), as the datasheet mention 10,0000 cycles.

    B) What I have done?

      1)  Created a file to store all events.

      2)  An event is stored as a record in the file.

      3)  Successfully created and stored 500 records using fds_record_write().

      4) Update the existing records from the beginning for further records using fds_record_update(). The goal is to set up a ring buffer of 500 records.

      5) An fds_record_update() invalidates the current and creates a new record.  The allocated buffer is utilized completely after around 2000 events and the update function returned FDS_ERR_NO_SPACE_IN_FLASH. As per the instruction, fds_gc() should be used to free the invalidated records and continue with the update function.

    6) While calling the fds_gc() function, doesn't return back to my layer and keeps calling queue_process() function infinitely.

    7) It is estimated that at least 5000 events should be stored (in circular fashion) during the product lifetime and hence it is important to fix this bug. I found the issue while unit testing the event logging module.

    static void fds_evt_handler(const fds_evt_t* p_evt)
    {
        switch (p_evt->id)
        {
            case FDS_EVT_INIT:
            {
                if (p_evt->result == FDS_SUCCESS)
                {
                    m_fds_initialized = 1;
                    NRF_LOG_INFO("Event: Init");
                }
                break;
            }
            case FDS_EVT_WRITE:
            {
                if (p_evt->result == FDS_SUCCESS)
                {
                    write_done = 1;
                    NRF_LOG_INFO("Event: Write");
                }
                break;
            }
            case FDS_EVT_DEL_RECORD:
            {
                if (p_evt->result == FDS_SUCCESS)
                {
                    erase_done = 1;
                    NRF_LOG_INFO("Event: Delete");
                }
                break;
            }
            case FDS_EVT_UPDATE:
            {
                if (p_evt->result == FDS_SUCCESS)
                {
                    update_done = 1;
                    NRF_LOG_INFO("Event: Update");
                }
                break;
            }
            case FDS_EVT_GC:
                if (p_evt->result == FDS_SUCCESS)
                {
                    NRF_LOG_INFO("Event: Garbage Collection");
                }
                break;
            default:
                break;
        }
    }
    
    static void wait_for_fds_ready(void)
    {
        while (!m_fds_initialized)
        {
            __WFE();
        }
    }
    
    static void wait_for_write_done(void)
    {
        while (!write_done)
        {
            __WFE();
        }
        write_done = 0;
    }
    
    static void wait_for_update_done(void)
    {
        while (!update_done)
        {
            __WFE();
        }
        update_done = 0;
    }
    
    void write_event()
    {
       ret_code = fds_record_find(record_id->file_id, record_id->key, &desc, &tok);
        /* Write only when file is not found */
        if (ret_code == FDS_ERR_NOT_FOUND)
        {
            ret_code = fds_record_write(&desc, record_id);
            wait_for_write_done();
            if (ret_code == FDS_ERR_NO_SPACE_IN_FLASH)
            {
                ret_code = fds_gc();
                ERROR_CHECK_HIGH(ret_code);
                ret_code = fds_record_write(&desc, record_id);
                wait_for_write_done();
            }
        }
        else if (ret_code == FDS_SUCCESS)
        {
            /* Write the updated record to flash. */
            ret_code = fds_record_update(&desc, record_id);
            if (ret_code == FDS_SUCCESS)
            {
                wait_for_update_done();
            }
            else if (ret_code == FDS_ERR_NO_SPACE_IN_FLASH)
            {
                ret_code = fds_gc();
                ERROR_CHECK_HIGH(ret_code);
                ret_code = fds_record_update(&desc, record_id);
                wait_for_update_done();
            }
            else
            {
                ERROR_CHECK_HIGH(ret_code);
            }
        }
    }

    I placed a breakpoint at fds_gc() function and could see that it revolves within itself. The variable result is having value FDS_ERR_INTERNAL inside the function queue_process().

    static void queue_process(ret_code_t result)
    {
        static fds_op_t              * m_p_cur_op;  // Current fds operation.
        static nrf_atfifo_item_get_t   m_iget_ctx;  // Queue context for the current operation.
    
        while (true)
        {
            if (m_p_cur_op == NULL)
            {
                // Load the next from the queue if no operation is being executed.
                m_p_cur_op = queue_load(&m_iget_ctx);
            }
    
            /* We can reach here in three ways:
             * from queue_start(): something was just queued
             * from the fstorage event handler: an operation is being executed
             * looping: we only loop if there are operations still in the queue
             *
             * In all these three cases, m_p_cur_op != NULL.
             */
            ASSERT(m_p_cur_op != NULL);
    
            switch (m_p_cur_op->op_code)
            {
                case FDS_OP_INIT:
                    result = init_execute(result, m_p_cur_op);
                    break;
    
                case FDS_OP_WRITE:
                case FDS_OP_UPDATE:
                    result = write_execute(result, m_p_cur_op);
                    break;
    
                case FDS_OP_DEL_RECORD:
                case FDS_OP_DEL_FILE:
                    result = delete_execute(result, m_p_cur_op);
                    break;
    
                case FDS_OP_GC:
                    result = gc_execute(result);
                    break;
    
                default:
                    result = FDS_ERR_INTERNAL;
                    break;
            }
    
            if (result == FDS_OP_EXECUTING)
            {
                // The operation has not completed yet. Wait for the next system event.
                break;
            }
    
            // The operation has completed (either successfully or with an error).
            // - send an event to the user
            // - free the operation buffer
            // - execute any other queued operations
    
            fds_evt_t evt =
            {
                // The operation might have failed for one of the following reasons:
                // FDS_ERR_BUSY              - flash subsystem can't accept the operation
                // FDS_ERR_OPERATION_TIMEOUT - flash subsystem timed out
                // FDS_ERR_CRC_CHECK_FAILED  - a CRC check failed
                // FDS_ERR_NOT_FOUND         - no record found (delete/update)
                .result = (result == FDS_OP_COMPLETED) ? FDS_SUCCESS : result,
            };
    
            event_prepare(m_p_cur_op, &evt);
            event_send(&evt);
    
            // Zero the pointer to the current operation so that this function
            // will fetch a new one from the queue next time it is run.
            m_p_cur_op = NULL;
    
            // The result of the operation must be reset upon re-entering the loop to ensure
            // the next operation won't be affected by eventual errors in previous operations.
            result = NRF_SUCCESS;
    
            // Free the queue element used by the current operation.
            queue_free(&m_iget_ctx);
    
            if (!queue_has_next())
            {
                // No more elements left. Nothing to do.
                break;
            }
        }
    }

    The code is stuck in this while loop. The following exit point conditions are not True and hence its not possible to escape the loop.

    1) 

    if (result == FDS_OP_EXECUTING)
          {
                      // The operation has not completed yet. Wait for the next system event.
                     break;
          }

    2) 

    if (!queue_has_next())

    {

                   // No more elements left. Nothing to do.
                   break;
    }

    Let me know if you want more information. I'm open for a call or video session to share more code.

    /Jagadeep

Children
Related