EMDS bricking device

Hello,

I am using EMDS to store data to flash under certain conditions. The implementation is pretty basic and follows the specification and example in the SDK (v2.1.2). For the most part it works great, but after the 5th time emds_store is called, the device bricks and must be re-flashed to work again. The device is normally reset after every call to emds_store. emds_store is called in an interrupt context (like the examples). 

I have verified that the write, erase, and related timings are correct for the part. I have tried using 1 or 2 sectors, and the behaviour is always the same.

One clue (maybe) is that I store about 702 bytes of data. The page size is 4096 bytes, which means I can fit about 5 copies of my data in flash before an erase needs to happen. Could this be the issue? How do I get around this problem?
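
A rough check of that arithmetic, as a sketch only: the helper name and the `overhead` parameter are hypothetical, and the per-entry metadata EMDS actually adds is not stated in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Values taken from the post: 4096-byte flash page, 702-byte payload. */
#define FLASH_PAGE_SIZE 4096u
#define PAYLOAD_SIZE    702u

/* Hypothetical helper: number of whole records that fit in one page.
 * `overhead` is a guess at per-record metadata; pass 0 to ignore it. */
static size_t records_per_page(size_t page_size, size_t payload, size_t overhead)
{
    assert(payload + overhead > 0); /* avoid division by zero */
    return page_size / (payload + overhead);
}
```

With the numbers above, `records_per_page(FLASH_PAGE_SIZE, PAYLOAD_SIZE, 0)` gives 5, so the sixth store would be the first one that cannot fit without an erase. That is consistent with when the failure shows up.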

  • Hi, 

    The implementation is pretty basic and follows the specification and example in the SDK (v2.1.2).

    Which example are you referring to? Can you provide the path or example name?

    the device bricks and must be re-flashed to work again.

    What does "bricks" mean? Could you provide any log? Do you prepare EMDS using emds_prepare after reboot?

    Regards,
    Amanda H.

  • I used the bluetooth/mesh/light_ctrl example as a guide to add EMDS to my application.

    To initialize I do the following:

    #define SHARED_DATA_EMDS_ID ((int) 0x100)
    EMDS_STATIC_ENTRY_DEFINE(shared_data_store, SHARED_DATA_EMDS_ID, &m_data_element, sizeof(m_data_element));
    uint32_t shared_data_init(void)
    {
    	int err = emds_init(&app_emds_cb);
    	if (err) {
    		printk("Initializing emds failed (err %d)\n", err);
    		return 1;
    	}
    
    	err = emds_load();
    	if (err) {
    		printk("Restore of emds data failed (err %d)\n", err);
    		return 1;
    	}
    
    	err = emds_prepare();
    	if (err) {
    		printk("Preparation emds failed (err %d)\n", err);
    		return 1;
    	}
    
    	/* sizeof yields size_t, so print it with %zu rather than %d */
    	LOG_INF("Shared data size = %zu %zu", sizeof(m_data_records), sizeof(m_data_element));
    	return 0;
    }

    Here is my callback ....

    static void app_emds_cb(void)
    {
        NVIC_SystemReset();
    }
    
    

    To trigger the EMDS save-to-flash I call this function:

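    uint32_t shared_data_shutdown(void)
    {
        /* On success the store callback resets the system, so only errors return here. */
        int err = emds_store();
        if (err) {
            printk("Storing emds data failed (err %d)\n", err);
            return 1;
        }
        return 0;
    }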

    This triggers the EMDS store operation. When it finishes, emds_store invokes the app_emds_cb callback, which resets the system. Data is successfully recovered the first 5 times this function is called.

    The 6th reset (i.e. the 6th call to shared_data_shutdown) 'bricks' the device, meaning that no code appears to be running. There are no RTT logs, and resetting the device does not recover it; only re-flashing does.

    Thanks.

  • Hi, 

    The team will need to investigate this in steps. Firstly, could you test this with NCS v2.1.0? This should not take much time and is essential information before investigating further. Also, is it possible to provide the project that triggers this faulty behavior? It would make the debugging process easier for our team.

    If it's necessary, I can set this case private. Then, your project will be only shared with us. 

    -Amanda H.

  • Hi Amanda,

    I built against NCS v2.1.0 and see the same behaviour.

    I made a simple application that calls emds_store on a button press. It saves a generic buffer of 702 bytes. After the 5th press the device is bricked. This was built for the nRF9160 DK with NCS v2.1.0.

    emds_simple.zip

    This is what it output before being unrecoverable (until re-flashed)

    Thanks for your help.

  • Hi, 

    Thank you. Because of the holidays, it might take some time for me to get back to you on this. In the meantime, here are some things you could try to understand better what is happening during the execution: create a partition manager report and, if possible, attach a debugger to the device (for thread info). See the links below:

    Please share your PM report and any relevant findings from the debugging session.

    -Amanda H. 

  • Hi Amanda,

    Partition Report:

    partition_report.txt

    Flash BEFORE device is bricked:

    pre-fail-flash.hex

    Flash AFTER device is bricked:

    post-fail-flash.hex

    After attaching the debugger it looks like the device is in tfm_hal_system_halt. Here is a screen capture of the debugger.

    debug.png

    Thanks

  • Hi, 

    It seems that partial erase is currently not working as expected for the 9160 (non-secure build). If you remove CONFIG_SOC_FLASH_NRF_PARTIAL_ERASE from prj.conf, the application works fine. The drawback is that the CPU will be halted during the erase operation. For most use cases this will not be an issue, but we will investigate further so that partial erasing will be possible in the future.

    The issue has been reported to the responsible team. For now, by removing the partial erase option, you should be able to continue with the work.
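
    As a sketch of that workaround, assuming the option is currently enabled with =y in prj.conf: either delete that line, or disable it explicitly as shown here. If another option in your build selects it, this line alone may not be enough.

    ```
    # prj.conf: disable partial erase so each page erase runs as a single operation
    CONFIG_SOC_FLASH_NRF_PARTIAL_ERASE=n
    ```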

    -Amanda H. 
