EMDS bricking device

Hello,

I am using EMDS to store data to flash under certain conditions. The implementation is pretty basic and follows the specification and example in the SDK (v2.1.2). For the most part it works great, but after the 5th time emds_store is called, the device bricks and must be re-flashed to work again. The device is normally reset after every call to emds_store. emds_store is called in an interrupt context (like the examples).

I have verified that the write/erase timings are correct for the part. I have tried using 1 or 2 sectors, always with the same behaviour.

One clue (maybe) is that I store about 702 bytes of data. The page size is 4096 bytes, which means I can fit about 5 copies of my data in the flash before an erase needs to happen. Could this be the issue? How do I get around this problem?
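
The estimate above can be sketched as a small calculation. This is only a rough illustration: copies_before_erase is a hypothetical helper, not an EMDS API, and it assumes no per-entry metadata overhead, so the real number of stores that fit before an erase may be lower.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of EMDS): number of whole copies of an
 * entry that fit in a flash area before the area has to be erased.
 * Assumption: per-entry metadata overhead is ignored. */
static size_t copies_before_erase(size_t area_bytes, size_t entry_bytes)
{
	assert(entry_bytes > 0);
	return area_bytes / entry_bytes; /* integer division rounds down */
}
```

With the numbers from this post, copies_before_erase(4096, 702) evaluates to 5, which would be consistent with the failure showing up around the first erase.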

0 Amanda Hsieh over 2 years ago

Hi,

"The implementation is pretty basic and follows the specification and example in the SDK (v2.1.2)."

Which example are you referring to? Can you provide the path or example name?

"the device bricks and must be re-flashed to work again."

What does "bricks" mean? Could you provide any log? Do you prepare EMDS using emds_prepare after reboot?

Regards,
Amanda H.

0 SeaBass over 2 years ago in reply to Amanda Hsieh

I used the bluetooth/mesh/light_ctrl example as a guide to add EMDS to my application.

To initialize I do the following:

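#define SHARED_DATA_EMDS_ID ((int) 0x100)

/* Register m_data_element as a persistent EMDS entry. */
EMDS_STATIC_ENTRY_DEFINE(shared_data_store, SHARED_DATA_EMDS_ID, &m_data_element, sizeof(m_data_element));

uint32_t shared_data_init(void)
{
	/* app_emds_cb (shown further down) runs once emds_store has finished. */
	int err = emds_init(&app_emds_cb);
	if (err) {
		printk("Initializing emds failed (err %d)\n", err);
		return 1;
	}

	/* Restore any previously stored data. */
	err = emds_load();
	if (err) {
		printk("Restore of emds data failed (err %d)\n", err);
		return 1;
	}

	/* Prepare the flash area for the next emds_store call. */
	err = emds_prepare();
	if (err) {
		printk("Preparation emds failed (err %d)\n", err);
		return 1;
	}

	/* sizeof yields size_t, so print it with %zu rather than %d. */
	LOG_INF("Shared data size = %zu %zu", sizeof(m_data_records), sizeof(m_data_element));
	return 0;
}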

Here is my callback:

static void app_emds_cb(void)
{
    NVIC_SystemReset();
}

To trigger the EMDS save-to-flash I call this function:

uint32_t shared_data_shutdown(void)
{
    emds_store();
    return 0;
}

This triggers the EMDS store operation; when it completes, the emds_store callback invokes app_emds_cb, which resets the system. Data is successfully recovered for the first 5 times this function is called.

The 6th reset (i.e. the 6th call to shared_data_shutdown) 'bricks' the device, meaning that no code appears to be running. There are no RTT logs, and resetting the device does not recover it; only re-flashing does.

Thanks.

0 Amanda Hsieh over 2 years ago in reply to SeaBass

Hi,

The team will need to investigate this in steps. Firstly, could you test this with NCS v2.1.0? This should not take much time and is essential information before investigating further. Also, is it possible to provide the project that triggers this faulty behavior? It would make the debugging process easier for our team.

If it's necessary, I can set this case private. Then, your project will be only shared with us.

-Amanda H.

0 SeaBass over 2 years ago in reply to Amanda Hsieh

Hi Amanda,

I built against NCS v2.1.0 and see the same behaviour.

I made a simple application that calls emds_store on a button press. It saves a generic buffer of 702 bytes. After the 5th press the device is bricked. This was built for the nRF9160 DK, NCS v2.1.0.

emds_simple.zip

This is what it output before becoming unrecoverable (until re-flashed)

Thanks for your help.

0 Amanda Hsieh over 2 years ago in reply to SeaBass

Hi,

Thank you. Because of the holidays, it might take some time for me to get back to you on this. In the meantime, here are some things you could try to understand better what is happening during the execution: create a partition manager report and, if possible, attach a debugger to the device (for thread info). See the links below:

PM report: https://developer.nordicsemi.com/nRF_Connect_SDK/doc/latest/nrf/scripts/partition_manager/partition_manager.html#partition-manager-report

Debugging: Run west attach (or west debug) to attach the debugger to the device after it hangs: https://docs.zephyrproject.org/latest/develop/west/build-flash-debug.html#id2

To dump flash in between reboots, run: nrfjprog -s <snr> --readcode <file>. For nRF9160, you might also need the --coprocessor <coprocessor> option.

Check that CONFIG_RESET_ON_FATAL_ERROR is disabled before debugging (otherwise, the device will reboot if a hardware fault or assertion happens), and remember to enable CONFIG_DEBUG_THREAD_INFO. Then, once connected to the device through the debugger, type info thread to see all threads.

Please share your PM report and any relevant findings from the debugging session.

-Amanda H.

0 SeaBass over 2 years ago in reply to Amanda Hsieh

Hi Amanda,

Partition Report:

partition_report.txt

Flash BEFORE device is bricked:

pre-fail-flash.hex

Flash AFTER device is bricked:

post-fail-flash.hex

After attaching the debugger it looks like the device is in tfm_hal_system_halt. Here is a screen capture of the debugger.

debug.png

Thanks

+1 Amanda Hsieh over 2 years ago in reply to SeaBass (verified answer)

Hi,

It seems that partial erase is currently not working as expected for the 9160 (non-secure build). If you remove CONFIG_SOC_FLASH_NRF_PARTIAL_ERASE from prj.conf, the application works fine. The drawback is that the CPU will be halted during the erase operation. For most use cases this will not be an issue, but we will investigate further so that partial erasing will be possible in the future.

The issue has been reported to the responsible team. For now, removing the partial erase option should let you continue with your work.

-Amanda H.
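
One way to make this workaround harder to undo by accident is to let the application check its own build configuration. The sketch below assumes the usual Zephyr behaviour that an enabled boolean Kconfig option appears in C code as a defined CONFIG_ macro, while a disabled one is left undefined. partial_erase_enabled is a hypothetical helper name, not an SDK function.

```c
#include <stdbool.h>

/* Hypothetical helper (not an SDK API): reports whether this build was
 * configured with CONFIG_SOC_FLASH_NRF_PARTIAL_ERASE enabled.
 * Assumption: a disabled Kconfig bool is simply not defined. */
static bool partial_erase_enabled(void)
{
#if defined(CONFIG_SOC_FLASH_NRF_PARTIAL_ERASE)
	return true;
#else
	return false;
#endif
}
```

An application could call this once at boot, before emds_init, and print a warning if it returns true.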