Saving coredumps to external flash

Hi,

Chip: nRF52840

OS: nRF Connect / Zephyr: v2.3.0

Problem: Saving coredumps to external flash

We're trying to add a bunch of debugging features to our firmware before field trials.

We've been trying to save coredumps to external flash, so that on the next reboot we can upload them to the cloud.

It seems that Zephyr does support this using "CONFIG_DEBUG_COREDUMP_BACKEND_FLASH_PARTITION". However, this doesn't seem to be supported on Nordic chips, for a few reasons.

I've seen another question on this subject, posted about a year ago. The recommendation was to use Memfault, but it seems odd to need a commercial cloud solution simply to save coredumps to flash.

What we've tried

Unsupported fields

nordic,qspi-nor.yaml does not include "soc-nv-flash.yaml", which means the properties "erase-block-size" and "write-block-size" don't exist. These are required by the flash backend implementation.

Using QSPI in fatal handler

After adding the above fields and configuring our pm_static.yml file, everything compiles. However, when triggering a crash, coredumps are no longer created. It seems like the fatal handler itself is crashing (no CLI output).

If I comment out all the flash operations in "coredump_backend_flash_partition.c", such as "flash_area_erase" and "flash_area_write", the fatal handler does run (CLI output appears); obviously the coredump is not saved.

I suppose my question is: can you use QSPI while in the fatal handler? I'm not sure if the QSPI driver needs re-initializing, interrupts need re-enabling, etc.

Summary

Are there any examples where coredumps are being saved to external flash using nrf-sdk?

Cheers.

  • No problem! This is an internal system only, but I will update you on the progress here if there is any. I'm not sure how soon they will be able to prioritize this task. 

  • Thank you for that. I will look into it. For some reason there is very little documentation on coredump, and the coredump flash partition tool from Zephyr seems not to dump anything when I force a hard fault. I hate that Nordic is pushing you towards Memfault instead of trying to find a solution for offline debugging.

  • Memfault is not intended to replace the debug support in the SDK. Instead, it serves as an additional option for those who want remote device management without setting up and managing their own cloud solution. I've updated my internal feature request to highlight the need for a way to save core dumps to flash.

    Documentation for the Core dump module can be found here: https://developer.nordicsemi.com/nRF_Connect_SDK/doc/2.4.2/zephyr/services/debugging/coredump.html. There is currently no flash backend to save the output to a flash partition.

  • Do you guys plan on implementing it, or do we have to implement our own code to do that for now?

  • Sorry, there is a flash backend; it's just that our documentation has not been updated to include it yet. It is currently covered in the upstream Zephyr documentation at https://docs.zephyrproject.org/latest/services/debugging/coredump.html 

    I tried testing the coredump module with the Bluetooth: Peripheral LBS sample in nRF Connect SDK v2.4.2 and was able to get it to work after I created this workaround in the flash driver:

    diff --git a/drivers/mpsl/flash_sync/flash_sync_mpsl.c b/drivers/mpsl/flash_sync/flash_sync_mpsl.c
    index eea9cadaa..c37ebd78c 100644
    --- a/drivers/mpsl/flash_sync/flash_sync_mpsl.c
    +++ b/drivers/mpsl/flash_sync/flash_sync_mpsl.c
    @@ -140,9 +140,15 @@ void nrf_flash_sync_set_context(uint32_t duration)
     	_context.request_length_us = duration;
     }
     
    +bool is_in_fault_isr(void)
    +{
    +	uint32_t isr = __get_IPSR();
    +	return (isr >= 3 && isr <= 6);
    +}
    +
     bool nrf_flash_sync_is_required(void)
     {
    -	return mpsl_is_initialized();
    +	return mpsl_is_initialized() && !is_in_fault_isr();
     }
     
     int nrf_flash_sync_exe(struct flash_op_desc *op_desc)
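
    The range check in the workaround above relies on the Cortex-M exception numbering, where 3 to 6 are HardFault, MemManage, BusFault and UsageFault. The following is only a minimal host-side sketch of that same check, not part of the SDK: the helper names exception_is_fault() and xpsr_to_exception() are hypothetical, and the exception number is passed in as a value instead of being read with __get_IPSR(), so the logic can be tried outside the target.

```c
#include <stdbool.h>
#include <stdint.h>

/* IPSR occupies bits [8:0] of xPSR and holds the active exception number. */
#define XPSR_IPSR_MASK 0x1FFu

/* Cortex-M exception numbers of the fault handlers. */
enum {
	EXC_HARDFAULT  = 3,
	EXC_MEMMANAGE  = 4,
	EXC_BUSFAULT   = 5,
	EXC_USAGEFAULT = 6,
};

/* Hypothetical helper: the same 3..6 range check as is_in_fault_isr(),
 * applied to a value passed in by the caller.
 */
bool exception_is_fault(uint32_t exception_number)
{
	return exception_number >= EXC_HARDFAULT &&
	       exception_number <= EXC_USAGEFAULT;
}

/* Hypothetical helper: extract the exception number from an xPSR value. */
uint32_t xpsr_to_exception(uint32_t xpsr)
{
	return xpsr & XPSR_IPSR_MASK;
}
```

    An exception number of 0 means thread mode, and values of 16 and above are external interrupts, so neither is treated as a fault context by this check.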

    Summary of changes made to the peripheral LBS sample to enable and test the Coredump module

    Added the following lines to the prj.conf file:

    # Enable coredump with flash backend
    CONFIG_SHELL=y
    CONFIG_DEBUG_COREDUMP_SHELL=y
    CONFIG_DEBUG_COREDUMP=y
    CONFIG_DEBUG_COREDUMP_BACKEND_FLASH_PARTITION=y
    CONFIG_DEBUG_COREDUMP_MEMORY_DUMP_MIN=y
    CONFIG_DEBUG_COREDUMP_MEMORY_DUMP_LINKER_RAM=n
    # Must be disabled to allow flash write from exception handler.
    # https://github.com/zephyrproject-rtos/zephyr/issues/59116
    CONFIG_ASSERT=n

    And the following code in main.c to raise a fault when the button is pressed (Button 1 on DK)

    /* Deliberately write to an invalid address to trigger a fault. */
    static void do_fault(void)
    {
    	*(uint32_t *) 0xFFFFFFFF = 1;
    }
    
    static void button_changed(uint32_t button_state, uint32_t has_changed)
    {
    	if (has_changed & USER_BUTTON) {
    		uint32_t user_button_state = button_state & USER_BUTTON;
    
    		bt_lbs_send_button_state(user_button_state);
    		app_button_state = user_button_state ? true : false;
    		do_fault();
    		
    	}
    }

    And lastly, the Devicetree overlay to allocate the coredump partition in flash (this is for the nRF52840):

    &flash0 {
        partitions {
            storage_partition: partition@f8000 {
                reg = <0x000f8000 0x00004000>;
            };
            coredump_partition: partition@fc000 {
                label = "coredump_partition";
                reg = <0x000fc000 0x00004000>;
            };
        };
    };
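
    As a sanity check on the overlay above, the two partitions should be page aligned, should not overlap, and should fit inside the internal flash. The following is a minimal host-side sketch of that arithmetic, assuming the nRF52840's 1 MiB of internal flash and 4 KiB erase pages; struct partition and the helper functions are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLASH_SIZE      0x100000u /* assumed: nRF52840 internal flash, 1 MiB */
#define FLASH_PAGE_SIZE 0x1000u   /* assumed: 4 KiB erase page */

/* Hypothetical mirror of a devicetree "reg = <offset size>" entry. */
struct partition {
	uint32_t offset;
	uint32_t size;
};

/* Offset and size must both be whole erase pages. */
bool partition_is_page_aligned(struct partition p)
{
	return p.size != 0 &&
	       (p.offset % FLASH_PAGE_SIZE) == 0 &&
	       (p.size % FLASH_PAGE_SIZE) == 0;
}

/* The partition must end at or before the end of flash. */
bool partition_fits(struct partition p)
{
	return p.offset <= FLASH_SIZE && p.size <= FLASH_SIZE - p.offset;
}

/* Two half-open ranges overlap if each starts before the other ends. */
bool partitions_overlap(struct partition a, struct partition b)
{
	return a.offset < b.offset + b.size &&
	       b.offset < a.offset + a.size;
}
```

    With the values from the overlay, storage_partition covers 0xf8000 to 0xfc000 and coredump_partition covers 0xfc000 to 0x100000, so the coredump partition occupies the last 16 KiB of flash.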

    Testing

    1. After the fault has been triggered, connect a serial terminal to the board to access the stored coredump via the shell.

    2. Copy the coredump data to a text file, e.g. coredump.log. Then perform steps 1 to 4 here: https://docs.zephyrproject.org/latest/services/debugging/coredump.html#example
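
    The conversion in the linked steps is done by the scripts described there. Purely to illustrate what the captured text contains, here is a minimal sketch of decoding one line of it in C. It assumes the shell prints the dump in the same form as Zephyr's logging backend, that is, lines starting with "#CD:" followed by hex digits, bracketed by "#CD:BEGIN#" and "#CD:END#"; cd_decode_line() is a hypothetical helper, not an SDK function.

```c
#include <stddef.h>
#include <string.h>

#define CD_PREFIX "#CD:" /* assumed line prefix of coredump output */

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_nibble(char c)
{
	if (c >= '0' && c <= '9') {
		return c - '0';
	}
	if (c >= 'a' && c <= 'f') {
		return c - 'a' + 10;
	}
	if (c >= 'A' && c <= 'F') {
		return c - 'A' + 10;
	}
	return -1;
}

/* Decode one captured line into out[].
 * Returns the number of bytes decoded, 0 for lines that carry no data
 * (no prefix, or the BEGIN#/END# markers), and -1 on malformed hex or
 * when out[] is too small.
 */
int cd_decode_line(const char *line, unsigned char *out, size_t out_len)
{
	const size_t prefix_len = strlen(CD_PREFIX);
	const char *p;
	size_t n = 0;

	if (strncmp(line, CD_PREFIX, prefix_len) != 0) {
		return 0; /* not coredump output, e.g. a shell prompt */
	}
	p = line + prefix_len;
	if (strncmp(p, "BEGIN#", 6) == 0 || strncmp(p, "END#", 4) == 0) {
		return 0; /* marker line, no payload */
	}
	while (*p != '\0' && *p != '\r' && *p != '\n') {
		int hi = hex_nibble(p[0]);
		int lo = (hi < 0) ? -1 : hex_nibble(p[1]);

		if (hi < 0 || lo < 0 || n >= out_len) {
			return -1;
		}
		out[n++] = (unsigned char)((hi << 4) | lo);
		p += 2;
	}
	return (int)n;
}
```

    Concatenating the decoded bytes of every data line between the two markers would give the binary dump that the GDB server in the linked steps consumes.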

    Result

    peripheral_lbs_coredump_flash$ arm-zephyr-eabi-gdb build/zephyr/zephyr.elf 
    GNU gdb (Zephyr SDK 0.16.0) 12.1
    Copyright (C) 2022 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.
    Type "show copying" and "show warranty" for details.
    This GDB was configured as "--host=x86_64-build_pc-linux-gnu --target=arm-zephyr-eabi".
    Type "show configuration" for configuration details.
    For bug reporting instructions, please see:
    <https://github.com/zephyrproject-rtos/sdk-ng/issues>.
    Find the GDB manual and other documentation resources online at:
        <http://www.gnu.org/software/gdb/documentation/>.
    
    For help, type "help".
    Type "apropos word" to search for commands related to "word"...
    Reading symbols from build/zephyr/zephyr.elf...
    (gdb) target remote localhost:1234
    Remote debugging using localhost:1234
    0x000113b0 in button_changed (button_state=<optimized out>, has_changed=<optimized out>) at ../src/main.c:177
    177     }
    (gdb) bt
    #0  0x000113b0 in button_changed (button_state=<optimized out>, has_changed=<optimized out>) at ../src/main.c:177
    #1  0x00000000 in ?? ()
    (gdb) info registers 
    r0             0xfffffff3          -13
    r1             0x1                 1
    r2             0x1                 1
    r3             0xfffff000          -4096
    r4             0x0                 0
    r5             0x0                 0
    r6             0x0                 0
    r7             0x0                 0
    r8             0x0                 0
    r9             0x0                 0
    r10            0x0                 0
    r11            0x0                 0
    r12            0xffffffff          -1
    sp             0x2000ab18          0x2000ab18 <sys_work_q_stack+2008>
    lr             0x113b1             70577
    pc             0x113b0             0x113b0 <button_changed+28>
    xpsr           0x1000000           16777216
    (gdb) 

    Project

    peripheral_lbs_coredump_flash.zip
