FDS_EVT_GC results in NRF_SUCCESS but with remaining freeable words

Hello, 

We have nRF52832 in production on a custom board. Recently, we created a new functionality with the aim to handle flash more efficiently, and to avoid the case where peer manager will not have enough space and it will erase bonds. So, every time before saving any kind of record, words_used + new record size is checked that is less than the determined limit and then saving in flash is allowed. We also call two times a day the fds_gc, as long as there is no BLE connection at the time. 

The problem is that from many clients devices we receive logs that we have set in order to track gc behaviour. This kind of log is set after FDS_EVT_GC and if this event returns NRF_SUCCESS. In this case we call fds_stat to witness the result of garbage collector, but unfortunately, it returns the same number of freeable words at every gc call. We have this log from many users for days, and we do not know why this happens, and why it doesn't delete all freeable words but still returns nrf_success? 
When I say the same number I mean same number again and again for each user, e.g. one user has 1458, another 2012, etc. 

Why does this happen?
How can we solve this problem? 

Thank you in advance.

Dimitra

Parents
  • Hi,

    Which nRF5SDK version are you using?

    Would it be possible to share a minimal sample that would reproduce this issue? Alternatively share a flash dump before and after garbage collection using:

    nrfjprog --readcode dump.hex

    I can make the case private first if you prefer it. 

    regards

    Jared

  • Hello, 

    We use nRF5 SDK 17.1.0, softdevice s132, and SEGGER Embedded Studio. 

    I am currently running tests to reproduce it myself, as I said we have seen it mostly on release on client's devices in which we have some logs to check the functionality remotely. 
    As soon as I confirm how to reproduce the problem, I will share it with you. 

    In the meanwhile, could you please explain to me what you mean by flash dump and how to do what you describe with this spesific command too? (I haven't used command line, I edit on SEGGER and erase/write hex through nrf Connect app programmer). 

    Regards, 
    Dimitra

  • Ok, I run a test where the error occurred because I turned off power after flash was full. 
    Here are the files: 
    before1.hex

    after.hex

    I hope it helps. 

    Do you think I should do soft-reset when it gc returns SUCCESS but there are freeable remaining records?

    If you cannot find something that would help me, do you think it is ok to do soft reset when it happens?

    Thank you
    Dimitra

  • Hi,

    Similar to your observation, based on your flash dump, I can see that the GC for some reason is not performed 100%. This can happen if a record fail CRC. Do you have CRC enabled and could you try to disable it and see if you're still able to reproduce this issue?

    DimitraN said:
    Do you think I should do soft-reset when it gc returns SUCCESS but there are freeable remaining records?

    If you cannot find something that would help me, do you think it is ok to do soft reset when it happens?

    I don't see how a soft-reset or any reset would have any effect on the garbage collection after the fds event handler has retuned with a FDS_EVT_GC.

    regards

    Jared 

  • In my sdk_config.h file I have these settings:

    FDS_CRC_CHECK_ON_READ 0
    FDS_CRC_CHECK_ON_WRITE 0
    CRC16_ENABLED 1
    CRC32_ENABLED 0

    Is this the one you are talking about or there is another configuration missing?
    I will disable CRC16_ENABLED and try to reproduce it. 

    I don't see how a soft-reset or any reset would have any effect on the garbage collection after the fds event handler has retuned with a FDS_EVT_GC.

    I was thinking about calling fds_gc again after a reset which would hopefully work normally, but maybe it is not a safe approach..


    Kind regards, 
    Dimitra

  • Hi,

    Yes and set FDS_CRC_CHECK_ON_READ to 0.

    DimitraN said:
    I was thinking about calling fds_gc again after a reset which would hopefully work normally, but maybe it is not a safe approach..

    I'm not sure if it has any effect but I can't see how calling gc after a reset would effect the flash negatively except for the added time. What's important is that you wait for the FDS events to be generated before you reset as the module is asynchronous. 

    regards
    Jared

  • Hello, 

    It looks like it didn't work.. 

    I run the test again and after getting full flash I turned off the power in order to try to reproduce the error. I turned on the power again, connected, collected data and then gc was called and returned success with 2002 freeable words. 

    But this time I noticed that after collecting all data, peer manager tried to clean flash. So could it be maybe that I call gc at the same time peer manager does or right after and there is a conflict or something? 

    		// mobile is connected
    00> <info> app: No more records were found to send!  Valid Measurements: 0
    		// sampling
    00> <info> app: Event: FDS_EVT_UPDATE received (NRF_SUCCESS)
    00> <info> peer_manager_handler: Peer data updated in flash: peer_id: 0, data_id: Local database, action: Update
    00> <info> app: Successfully wrote Unix time: 1739524052
    00> <info> app: Gmt: 2
    00> <info> app: Event: FDS_EVT_UPDATE received (NRF_SUCCESS)
    00> <info> app: ## FULL FLASH ##		// log that states full flash state - records were just deleted, in some seconds this goes away
    00> <info> app: Event: FDS_EVT_UPDATE received (NRF_SUCCESS)
    00> <info> peer_manager_handler: Peer data updated in flash: peer_id: 0, data_id: Local database, action: Update
    00> <info> app: Valid Measurements: 0. Errors saved: 0
    00> <info> app: Event: FDS_EVT_UPDATE received (NRF_SUCCESS)
    00> <info> peer_manager_handler: Peer data updated in flash: peer_id: 0, data_id: Local database, action: Update
    00> <info> app: Disconnected, reason 0x13.		
    		// mobile diconnected: this means that GC is called by me at this point if there are freeable words
    00> Logs dropped (9)
    00> <info> peer_manager_handler: Attempting to clean flash.
    00> Logs dropped (9)
    00> <info> app: Event: FDS_EVT_GC received (FDS_ERR_OPERATION_TIMEOUT)
    00> Logs dropped (9)
    00> <warning> peer_manager_handler: Flash garbage collection failed with error Unknown error code.
    00> Logs dropped (9)
    00> <info> peer_manager_handler: Attempting to clean flash.
    00> Logs dropped (9)
    00> <info> app: Event: FDS_EVT_GC received (FDS_ERR_OPERATION_TIMEOUT)
    00> Logs dropped (9)
    00> <warning> peer_manager_handler: Flating to clean flash.
    00> Logs dropped (9)
    00> <info> app: Event: FDS_EVT_GC received (FDS_ERR_OPERATION_TIMEOUT)
    00> Logs dropped (9)
    00> <warning> peer_manager_handler: Flash garbage collection failed with error Unknown error code.
    00> Logs dropped (9)
    00> <info> peer_manager_handler: Attempting to clean flash.
    00> Logs dropped (9)
    00> <info> app: Event: FDS_EVT_GC received (FDS_ERR_OPERATION_TIMEOUT)
    00> ting to clean flash.
    00> Logs dropped (9)
    00> <info> app: Event: FDS_EVT_GC received (FDS_ERR_OPERATION_TIMEOUT)
    00> Logs dropped (9)
    00> <warning> peer_manager_handler: Flash garbage collection failed with error Unknown error code.
    00> Logs dropped (9)
    00> <info> peer_manager_handler: Attempash garbage collection failed with error Unknown error code.
    00> Logs dropped (10)
    00> <info> peer_manager_handler: Attempting to clean flash.
    00> Logs dropped (10)
    00> <info> app: Event: FDS_EVT_GC received (FDS_ERR_OPERATION_TIMEOUT)
    00> Logs dropped (10)
    00> <warning> peer_manager_handler: Flash garbage collection failed with error Unknown error code.
    00> Logs dropped (10)
    00> <info> peer_manager_handler: Attempting to clean flash.
    00> ived (FDS_ERR_OPERATION_TIMEOUT)
    00> ash garbage collection failed with error Unknown error code.
    00> Logs dropped (10)
    00> <info> peer_manager_handler: Attempting to clean flash.
    00> Logs dropped (10)
    00> <info> app: Event: FDS_EVT_GC received (FDS_ERR_OPERATION_TIMEOUT)
    00> Logs dropped (10)
    00> <warning> peer_manager_handler: Flash garbage collection failed with error Unknown error code.
    00> Logs dropped (10)
    00> <info> peer_manager_handler: Attempting to clean flash.
    00> ived (FDS_ERR_OPERATION_TIMEOUT)
    00> Logs dropped (10)
    00> <info> peer_manager_handler: Attempting to clean flash.
    00> Logs dropped (10)
    00> <info> app: Event: FDS_EVT_GC received (FDS_ERR_OPERATION_TIMEOUT)
    00> Logs dropped (10)
    00> <warning> peer_manager_handler: Flash garbage collection failed with error Unknown error code.
    00> Logs dropped (10)
    00> <info> peer_manager_handler: Attempting to clean flash.
    00> Logs dropped (10)
    00> <info> app: Event: FDS_EVT_GC rece<info> app: Fast advertising.
    00> <info> app: Event: FDS_EVT_GC received (NRF_SUCCESS)
    		// below logs: (freeable_words, words_used)
    00> <info> app: AFTER GC STAT: (2002, 2092)  (origin 4)
    00> <info> app: AFTER GC STAT UNDER LIMIT: (2002, 2092)
    		// sampling continues... 
    00> <info> app: No 2. call of GC	// GC is called again because of timestamp (this happens at midnight)
    00> <info> app: Unix limit is: 1739570400
    00> <info> app: Event: FDS_EVT_UPDATE received (NRF_SUCCESS)
    00> <info> app: Event: FDS_EVT_GC received (NRF_SUCCESS)
    00> <info> app: AFTER GC STAT: (2008, 2098)  (origin 4)
    00> <info> app: AFTER GC STAT UNDER LIMIT: (2008, 2098)
    		// sampling
    00> <info> app: Event: FDS_EVT_UPDATE received (NRF_SUCCESS)
    		// sampling
    00> <info> app: Event: FDS_EVT_UPDATE received (NRF_SUCCESS)


    Is there a way that I have knowledge that gc has been already called from peer manager and so I do not call it again while it is running ? 
    Or perhaps this happens if GC is called while there is another active action in flash like writing/updating a record? 

    Finally, could you explain me more about CRC? Why should I set it enabled or disabled? and now that it was reproduced should I enable CRC16_ENABLED again?

    Kind regards, 
    Dimitra
     

Reply
  • Hello, 

    It looks like it didn't work.. 

    I run the test again and after getting full flash I turned off the power in order to try to reproduce the error. I turned on the power again, connected, collected data and then gc was called and returned success with 2002 freeable words. 

    But this time I noticed that after collecting all data, peer manager tried to clean flash. So could it be maybe that I call gc at the same time peer manager does or right after and there is a conflict or something? 

    		// mobile is connected
    00> <info> app: No more records were found to send!  Valid Measurements: 0
    		// sampling
    00> <info> app: Event: FDS_EVT_UPDATE received (NRF_SUCCESS)
    00> <info> peer_manager_handler: Peer data updated in flash: peer_id: 0, data_id: Local database, action: Update
    00> <info> app: Successfully wrote Unix time: 1739524052
    00> <info> app: Gmt: 2
    00> <info> app: Event: FDS_EVT_UPDATE received (NRF_SUCCESS)
    00> <info> app: ## FULL FLASH ##		// log that states full flash state - records were just deleted, in some seconds this goes away
    00> <info> app: Event: FDS_EVT_UPDATE received (NRF_SUCCESS)
    00> <info> peer_manager_handler: Peer data updated in flash: peer_id: 0, data_id: Local database, action: Update
    00> <info> app: Valid Measurements: 0. Errors saved: 0
    00> <info> app: Event: FDS_EVT_UPDATE received (NRF_SUCCESS)
    00> <info> peer_manager_handler: Peer data updated in flash: peer_id: 0, data_id: Local database, action: Update
    00> <info> app: Disconnected, reason 0x13.		
    		// mobile diconnected: this means that GC is called by me at this point if there are freeable words
    00> Logs dropped (9)
    00> <info> peer_manager_handler: Attempting to clean flash.
    00> Logs dropped (9)
    00> <info> app: Event: FDS_EVT_GC received (FDS_ERR_OPERATION_TIMEOUT)
    00> Logs dropped (9)
    00> <warning> peer_manager_handler: Flash garbage collection failed with error Unknown error code.
    00> Logs dropped (9)
    00> <info> peer_manager_handler: Attempting to clean flash.
    00> Logs dropped (9)
    00> <info> app: Event: FDS_EVT_GC received (FDS_ERR_OPERATION_TIMEOUT)
    00> Logs dropped (9)
    00> <warning> peer_manager_handler: Flating to clean flash.
    00> Logs dropped (9)
    00> <info> app: Event: FDS_EVT_GC received (FDS_ERR_OPERATION_TIMEOUT)
    00> Logs dropped (9)
    00> <warning> peer_manager_handler: Flash garbage collection failed with error Unknown error code.
    00> Logs dropped (9)
    00> <info> peer_manager_handler: Attempting to clean flash.
    00> Logs dropped (9)
    00> <info> app: Event: FDS_EVT_GC received (FDS_ERR_OPERATION_TIMEOUT)
    00> ting to clean flash.
    00> Logs dropped (9)
    00> <info> app: Event: FDS_EVT_GC received (FDS_ERR_OPERATION_TIMEOUT)
    00> Logs dropped (9)
    00> <warning> peer_manager_handler: Flash garbage collection failed with error Unknown error code.
    00> Logs dropped (9)
    00> <info> peer_manager_handler: Attempash garbage collection failed with error Unknown error code.
    00> Logs dropped (10)
    00> <info> peer_manager_handler: Attempting to clean flash.
    00> Logs dropped (10)
    00> <info> app: Event: FDS_EVT_GC received (FDS_ERR_OPERATION_TIMEOUT)
    00> Logs dropped (10)
    00> <warning> peer_manager_handler: Flash garbage collection failed with error Unknown error code.
    00> Logs dropped (10)
    00> <info> peer_manager_handler: Attempting to clean flash.
    00> ived (FDS_ERR_OPERATION_TIMEOUT)
    00> ash garbage collection failed with error Unknown error code.
    00> Logs dropped (10)
    00> <info> peer_manager_handler: Attempting to clean flash.
    00> Logs dropped (10)
    00> <info> app: Event: FDS_EVT_GC received (FDS_ERR_OPERATION_TIMEOUT)
    00> Logs dropped (10)
    00> <warning> peer_manager_handler: Flash garbage collection failed with error Unknown error code.
    00> Logs dropped (10)
    00> <info> peer_manager_handler: Attempting to clean flash.
    00> ived (FDS_ERR_OPERATION_TIMEOUT)
    00> Logs dropped (10)
    00> <info> peer_manager_handler: Attempting to clean flash.
    00> Logs dropped (10)
    00> <info> app: Event: FDS_EVT_GC received (FDS_ERR_OPERATION_TIMEOUT)
    00> Logs dropped (10)
    00> <warning> peer_manager_handler: Flash garbage collection failed with error Unknown error code.
    00> Logs dropped (10)
    00> <info> peer_manager_handler: Attempting to clean flash.
    00> Logs dropped (10)
    00> <info> app: Event: FDS_EVT_GC rece<info> app: Fast advertising.
    00> <info> app: Event: FDS_EVT_GC received (NRF_SUCCESS)
    		// below logs: (freeable_words, words_used)
    00> <info> app: AFTER GC STAT: (2002, 2092)  (origin 4)
    00> <info> app: AFTER GC STAT UNDER LIMIT: (2002, 2092)
    		// sampling continues... 
    00> <info> app: No 2. call of GC	// GC is called again because of timestamp (this happens at midnight)
    00> <info> app: Unix limit is: 1739570400
    00> <info> app: Event: FDS_EVT_UPDATE received (NRF_SUCCESS)
    00> <info> app: Event: FDS_EVT_GC received (NRF_SUCCESS)
    00> <info> app: AFTER GC STAT: (2008, 2098)  (origin 4)
    00> <info> app: AFTER GC STAT UNDER LIMIT: (2008, 2098)
    		// sampling
    00> <info> app: Event: FDS_EVT_UPDATE received (NRF_SUCCESS)
    		// sampling
    00> <info> app: Event: FDS_EVT_UPDATE received (NRF_SUCCESS)


    Is there a way that I have knowledge that gc has been already called from peer manager and so I do not call it again while it is running ? 
    Or perhaps this happens if GC is called while there is another active action in flash like writing/updating a record? 

    Finally, could you explain me more about CRC? Why should I set it enabled or disabled? and now that it was reproduced should I enable CRC16_ENABLED again?

    Kind regards, 
    Dimitra
     

Children
  • Hi,

    DimitraN said:
    Is there a way that I have knowledge that gc has been already called from peer manager and so I do not call it again while it is running ? 

    I don't think that is possible with the peer manager. 

    DimitraN said:
    Is there a way that I have knowledge that gc has been already called from peer manager and so I do not call it again while it is running ? 
    Or perhaps this happens if GC is called while there is another active action in flash like writing/updating a record? 

    Essentially you should only call the garbage collection once the filesystem is full to minimize flash wear. 

    Another plausible reason for the GC skipping pages is if the record is already open when the garbage collection is started.

    From the documentation for freeable words:

        * The actual amount of space freed by garbage collection might be less than this value if
        * records are open while garbage collection is run.

    DimitraN said:
    Finally, could you explain me more about CRC? Why should I set it enabled or disabled? and now that it was reproduced should I enable CRC16_ENABLED again?

    It's an extra added layer to enable integrity of data to ensure that the data hasn't been corrupted. While we debug this issue, you can have it disabled. 

    regards

    Jared 

  • Hello, 

    Another plausible reason for the GC skipping pages is if the record is already open when the garbage collection is started.

    Usually the number of freeable left is a lot larger than a record, but it is close to the size of a virtual page which is 2048 words. 
    I found this at fds.h at function fds_record_close

    * Closing a record allows garbage collection to run on the virtual page in which the record is
    * stored (if no other records remain open on that page).

    I think I found what the problem was! 
    After a reset (or a power-up) I call some function to retrieve data from flash to set some variables. There we didn't close one record so the GC would not run on its page. 
    Probably this happens to clients that leave the device for a long time without charge and it eventually turns off.
    Also I had a call of GC right after updating another record, which probably make them run simultaneously or at least GC might run before finishing the update and maybe this causes the problem again. 
    I have confirmed at least twice that this error has not occured after correcting the above . 

    Thank you very much for your help! 

    Regards, 
    Dimitra 

  • Hi Nordic Team,

    I will continue on this thread because I have similar issue, but my problem is that I can not clear FDS with GC. Problem is that some users can not connect anymore to the device because it constantly returning this warning after successful pair:

    <info> app: Connected
    <info> app: Data len is set to 0xF4(244)
    <info> app: Passkey: 188154
    <info> app: BLE_GAP_EVT_LESC_DHKEY_REQUEST
    <info> nrf_ble_lesc: Calling sd_ble_gap_lesc_dhkey_reply on conn_handle: 0
    <warning> peer_manager_handler: Flash storage is full
    <info> peer_manager_handler: Attempting to clean flash.
    <info> peer_manager_handler: Connection secured: role: Peripheral, conn_handle: 0, procedure: Bonding
    <info> app: Link secured. Role: 1. conn_handle: 0, Procedure: 1
    <info> app: BLE_GAP_EVT_AUTH_STATUS: status=0x0 bond=0x1 lv4: 1 kdist_own:0x3 kdist_peer:0x2
    <warning> peer_manager_handler: Flash garbage collection failed with error Unknown error code.
    <info> peer_manager_handler: Attempting to clean flash.
    <warning> peer_manager_handler: Flash garbage collection failed with error Unknown error code.
    <info> peer_manager_handler: Attempting to clean flash.

    If i try to disconnect and connect again then I am in the loop of this warning again:

    <info> app: Disconnected
    <info> app: Connected
    <info> peer_manager_handler: Connection security failed: role: Peripheral, conn_handle: 0x0, procedure: Encryption, error: 4102
    <warning> peer_manager_handler: Disconnecting conn_handle 0.
    <info> app: Disconnected
    <info> app: Connected
    <info> peer_manager_handler: Connection security failed: role: Peripheral, conn_handle: 0x0, procedure: Encryption, error: 4102
    <warning> peer_manager_handler: Disconnecting conn_handle 0.
    <info> app: Disconnected
    <info> app: Connected
    <info> peer_manager_handler: Connection security failed: role: Peripheral, conn_handle: 0x0, procedure: Encryption, error: 4102
    <warning> peer_manager_handler: Disconnecting conn_handle 0.
    <info> app: Disconnected
    <warning> peer_manager_handler: Flash garbage collection failed with error Unknown error code.
    <info> peer_manager_handler: Attempting to clean flash.
    <warning> peer_manager_handler: Flash garbage collection failed with error Unknown error code.
    <info> peer_manager_handler: Attempting to clean flash.
    <warning> peer_manager_handler: Flash garbage collection failed with error Unknown error code.
    <info> peer_manager_handler: Attempting to clean flash.

    I tried to debug ant check the FDS statistics and got this output:

    <info> app: Flash usage statistics:
    <info> app: Pages available: 3
    <info> app: Open records: 0
    <info> app: Valid records: 0
    <info> app: Dirty records: 92
    <info> app: Words reserved: 0
    <info> app: Words used: 2048

    Do you maybe have an Idea what is happening here?

    Regards.


    Seba

Related