Saving coredumps to external flash

Hi,

Chip: nRF52840

OS: nRF Connect / Zephyr: v2.3.0

Problem: Saving coredumps to external flash

We're trying to add a bunch of debugging features to our firmware before field trials.

We've been trying to save coredumps to external flash so that, on the next reboot, we can upload them to the cloud.

It seems that Zephyr does support this via "CONFIG_DEBUG_COREDUMP_BACKEND_FLASH_PARTITION". However, this doesn't seem to work with Nordic chips, for a few reasons.

I've seen another question on this subject posted about a year ago. The recommendation was to use Memfault, but it seems a bit odd to have to use a commercial cloud solution simply to save coredumps to flash.

What we've tried

Un-supported fields

nordic,qspi-nor.yaml does not include "soc-nv-flash.yaml", which means the properties "erase-block-size" and "write-block-size" don't exist. These are required by the flash backend implementation.
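To illustrate what those two properties encode: a backend that erases and programs a partition needs the partition to start and end on erase-block boundaries, and it must issue writes in whole write blocks. The helper below is a hypothetical host-side sketch of those two checks; the 4 KiB erase block and 4-byte write block are assumed values, not taken from nordic,qspi-nor.yaml, so check them against your flash part's datasheet.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed geometry (hypothetical): what you might put in the QSPI node's
 * erase-block-size / write-block-size properties. */
#define EXT_FLASH_ERASE_BLOCK_SIZE 4096u
#define EXT_FLASH_WRITE_BLOCK_SIZE 4u

/* A page-erasing backend can only use a partition that starts and ends
 * on erase-block boundaries. */
static bool partition_is_erase_aligned(uint32_t offset, uint32_t size)
{
    return size != 0u &&
           offset % EXT_FLASH_ERASE_BLOCK_SIZE == 0u &&
           size % EXT_FLASH_ERASE_BLOCK_SIZE == 0u;
}

/* Writes are issued in whole write blocks, so lengths are padded up. */
static size_t round_up_to_write_block(size_t len)
{
    return (len + EXT_FLASH_WRITE_BLOCK_SIZE - 1u) / EXT_FLASH_WRITE_BLOCK_SIZE *
           EXT_FLASH_WRITE_BLOCK_SIZE;
}
```

For example, a partition at offset 0x10000 with size 0x20000 passes the check, while one starting at 0x10100 does not.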

Using QSPI in fatal handler

After adding the above fields and configuring our pm_static.yml file, everything compiles. However, when triggering a crash, coredumps are no longer created. It seems like the fatal handler itself is crashing (no CLI output).

If I comment out all the flash operations in "coredump_backend_flash_partition.c", such as "flash_area_erase" and "flash_area_write", the fatal handler does run (CLI output appears); obviously, the coredump is not saved.

I suppose my question is: can you use QSPI from within the fatal handler? I'm not sure whether the QSPI driver needs re-initializing, interrupts need re-enabling, etc.

Summary

Are there any examples where coredumps are being saved to external flash using nrf-sdk?

Cheers.

  • Hello,

    I don't think it's a good idea to rely on Zephyr drivers such as QSPI NOR when you are in the fault handler, because you don't know what state the system will be in. Could an alternative be to store the coredump in a "no init" RAM section and have it written to flash on the next reboot?

    I've used this approach when debugging WDT timeouts in the past (the WDT does not give you enough time to write anything to flash before resetting):

    #include <zephyr/kernel.h>
    #include <zephyr/sys/crc.h>
    #include <zephyr/drivers/watchdog.h>
    #include <string.h>

    /* Kept across a warm reset: __noinit RAM is not zeroed at startup */
    static __noinit z_arch_esf_t esf;
    static __noinit uint32_t esf_crc;

    /* Entered from wdt_callback() with r0 = stack pointer holding the exception frame */
    FUNC_NORETURN void dump_stack(uint32_t *p_sp)
    {
        /* Store stack frame along with a checksum value to the __noinit section in RAM */
        memcpy(&esf, p_sp, sizeof(esf));
        esf_crc = crc32_ieee((uint8_t *)&esf, sizeof(esf));
        /* Wait for the impending watchdog reset */
        while (1) {
        }
    }

    #if WDT_ALLOW_CALLBACK
    static void wdt_callback(const struct device *wdt_dev, int channel_id)
    {
        /* Get current stack frame from the process stack.
         *
         * TODO: implement logic to determine if the application
         * was running in thread or handler mode prior to the WDT interrupt.
         * For handler mode we would have to use the main stack pointer instead.
         */
        __ASM(" mrs r0, psp          \n"
              " ldr r3, =dump_stack  \n"
              " bx r3                \n");
    }
    #endif /* WDT_ALLOW_CALLBACK */

    /* To be called on startup to check if a new exception frame has been stored by wdt_callback() */
    void wdt_startup_check(void)
    {
        uint32_t computed_crc = crc32_ieee((uint8_t *)&esf, sizeof(esf));

        if (computed_crc == esf_crc) {
            printk("Exception stack frame:\n\r");
            printk("r0/a1:  0x%08x  r1/a2:  0x%08x  r2/a3:  0x%08x\n\r", esf.basic.a1,
                   esf.basic.a2, esf.basic.a3);
            printk("r3/a4:  0x%08x r12/ip:  0x%08x r14/lr:  0x%08x\n\r", esf.basic.a4,
                   esf.basic.ip, esf.basic.lr);
            printk("xpsr:  0x%08x  pc: 0x%08x\n\r", esf.basic.xpsr, esf.basic.pc);
            esf_crc = 0; /* Invalidate CRC so the frame is only reported once */
        } else {
            printk("CRC mismatch\n\r");
        }
    }
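One possible way to resolve the TODO above, sketched under assumptions: on Armv7-M, bit 2 of the EXC_RETURN value loaded into LR on exception entry records which stack the hardware pushed the frame onto (1 = PSP, 0 = MSP). By the time a driver calls wdt_callback(), LR no longer holds EXC_RETURN, so it would have to be captured at exception entry; that integration is not shown. The helper name below is hypothetical.

```c
#include <stdint.h>

/* EXC_RETURN bit 2 (SPSEL): 1 = frame on process stack, 0 = frame on main stack */
#define EXC_RETURN_SPSEL_BIT (1u << 2)

/* Hypothetical helper: pick the stack pointer that holds the stacked
 * r0-r3, r12, lr, pc and xpsr for the interrupted context. */
static uint32_t frame_stack_pointer(uint32_t exc_return, uint32_t msp, uint32_t psp)
{
    return (exc_return & EXC_RETURN_SPSEL_BIT) ? psp : msp;
}
```

With 0xFFFFFFFD (thread mode, PSP) this returns the PSP; with 0xFFFFFFF9 or 0xFFFFFFF1 (main stack) it returns the MSP.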

    Cheers,

    Vidar

  • Hi,

    Yes, I suppose that's a good solution. I was looking at Zephyr's new flash simulator (https://github.com/zephyrproject-rtos/zephyr/blob/main/drivers/flash/flash_simulator.c), but it needs Zephyr v3.4, which nrf-sdk doesn't support yet.

    My only concern with this was SRAM usage if we were to take a whole dump rather than just the pointers.

    I'll have a play with this tomorrow.

    Cheers

  • I've got this working. Personally, I think this area is lacking in nRF: with everyone shipping thousands of IoT devices, having no remote debugging capability seems silly. Surely we shouldn't all keep reinventing the wheel. I'll see if I get time to write a module that implements everything.

    It would still be good to be able to write directly to flash/QSPI, as this way we could take a full memory dump.

    Anyway, a few pointers for anyone else interested:

    If you're using an nRF52, there's a really good example of how to use retained memory here: https://github.com/nrfconnect/sdk-zephyr/blob/main/samples/boards/nrf/system_off/src/retained.c

    To get the ESF (exception stack frame), you can override Zephyr's fatal error handler and simply store the values in retained memory (using the approach from the example above):

    void k_sys_fatal_error_handler(unsigned int reason, const z_arch_esf_t *esf_input)
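A minimal sketch of such a retained record, assuming a magic value plus a CRC so the next boot can tell a real record from RAM garbage. The struct, macro, and function names are hypothetical. On target the record would live in retained (__noinit) RAM and fault_record_save() would be called from k_sys_fatal_error_handler() with esf->basic.pc, esf->basic.lr and esf->basic.xpsr; the bit-by-bit CRC-32 here only exists so the sketch runs on a host, where Zephyr's crc32_ieee() would be used on the device.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FAULT_RECORD_MAGIC 0x46415554u /* "FAUT", hypothetical */

/* Hypothetical retained record; all fields are 32-bit, so there is no padding */
struct fault_record {
    uint32_t magic;
    uint32_t reason; /* 'reason' argument of the fatal handler */
    uint32_t pc;
    uint32_t lr;
    uint32_t xpsr;
    uint32_t crc;    /* CRC-32 over every field above */
};

/* Reflected CRC-32 (IEEE 802.3, poly 0xEDB88320), host stand-in for crc32_ieee() */
static uint32_t crc32_ieee_sw(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* Fault path: fill the record and seal it with a CRC */
static void fault_record_save(struct fault_record *rec, uint32_t reason,
                              uint32_t pc, uint32_t lr, uint32_t xpsr)
{
    rec->magic = FAULT_RECORD_MAGIC;
    rec->reason = reason;
    rec->pc = pc;
    rec->lr = lr;
    rec->xpsr = xpsr;
    rec->crc = crc32_ieee_sw((const uint8_t *)rec, offsetof(struct fault_record, crc));
}

/* Boot path: copy out a valid record, then invalidate it so it is reported once */
static bool fault_record_take(struct fault_record *rec, struct fault_record *out)
{
    uint32_t crc = crc32_ieee_sw((const uint8_t *)rec, offsetof(struct fault_record, crc));

    if (rec->magic != FAULT_RECORD_MAGIC || crc != rec->crc) {
        return false;
    }
    *out = *rec;
    rec->magic = 0u;
    return true;
}
```

Invalidating the magic after a successful read mirrors the `esf_crc = 0` step in the watchdog example earlier in the thread.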

    To save the coredump itself, you'll have to create a custom backend, e.g. a file called "coredump_backend_empty.c", using this as a template: https://github.com/zephyrproject-rtos/zephyr/blob/main/tests/subsys/debug/coredump_backends/src/coredump_backend_empty.c

    You can then simply memcpy the coredump data into retained memory in the function:

    static void coredump_empty_backend_buffer_output(uint8_t *c, size_t buflen)
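A host-runnable sketch of what those callbacks might do, assuming a fixed-size retained buffer that keeps as much of the dump as fits and flags truncation. The buffer size, struct, and flags are hypothetical; on target the struct would be placed in retained (__noinit) RAM and the functions wired into the backend API struct the same way the linked template does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define COREDUMP_RETAINED_SIZE 2048u /* hypothetical; limited by spare RAM */

struct coredump_retained {
    uint32_t len;   /* bytes stored so far */
    bool truncated; /* set if the dump did not fit */
    bool complete;  /* set once the coredump subsystem signals the end */
    uint8_t data[COREDUMP_RETAINED_SIZE];
};

static struct coredump_retained retained_dump;

static void coredump_empty_backend_start(void)
{
    retained_dump.len = 0u;
    retained_dump.truncated = false;
    retained_dump.complete = false;
}

/* Append each chunk the coredump subsystem emits; keep what fits */
static void coredump_empty_backend_buffer_output(uint8_t *c, size_t buflen)
{
    size_t room = COREDUMP_RETAINED_SIZE - retained_dump.len;
    size_t n = buflen < room ? buflen : room;

    memcpy(retained_dump.data + retained_dump.len, c, n);
    retained_dump.len += (uint32_t)n;
    if (n < buflen) {
        retained_dump.truncated = true;
    }
}

/* Mark the dump as finished so the next boot knows it is safe to upload */
static void coredump_empty_backend_end(void)
{
    retained_dump.complete = true;
}
```

Keeping the truncated prefix rather than dropping the whole dump is a design choice: the header and register block come first, so even a cut-off dump is usually still informative.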

    I use the following settings to keep the coredump small enough for RAM retention:

    CONFIG_DEBUG_COREDUMP=y
    CONFIG_DEBUG_COREDUMP_BACKEND_OTHER=y
    CONFIG_DEBUG_COREDUMP_MEMORY_DUMP_LINKER_RAM=n
    CONFIG_DEBUG_COREDUMP_SHELL=y
    CONFIG_DEBUG_COREDUMP_MEMORY_DUMP_MIN=y

    Hope that helps someone.

  • Thank you for taking the time to provide this feedback. I will report it as a feature request internally.

    motters said:
    It would still be good to be able to write directly to flash/QSPI, as this way we could take a full memory dump.

    I found that we support writing core dumps to the internal flash in the Memfault module, which can be found here: https://github.com/nrfconnect/sdk-nrf/blob/main/modules/memfault-firmware-sdk/memfault_flash_coredump_storage.c . I suppose the same approach could work with QSPI (i.e. use the nrfx driver directly instead of the Zephyr NOR driver).
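A rough sketch of the erase-then-program sequence such a direct-driver storage layer would need, run against a RAM-simulated NOR flash so it can be tested on a host. Everything here is hypothetical: the sim_* functions stand in for the nrfx QSPI erase/write calls (presumably used in blocking mode, since interrupts may not be serviced in the fault handler, which is worth verifying against the nrfx documentation), and the 4 KiB sector and 4-byte write granularity are assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 4096u             /* assumed erase granularity */
#define WRITE_ALIGN 4u                /* assumed write granularity */
#define FLASH_SIZE  (4u * SECTOR_SIZE)

static uint8_t sim_flash[FLASH_SIZE]; /* RAM stand-in for the external NOR */

/* NOR erase sets a whole sector to 0xFF */
static int sim_erase_sector(uint32_t addr)
{
    if (addr % SECTOR_SIZE != 0u || addr >= FLASH_SIZE) {
        return -1;
    }
    memset(&sim_flash[addr], 0xFF, SECTOR_SIZE);
    return 0;
}

/* NOR programming can only clear bits (1 -> 0), hence the erase first */
static int sim_program(uint32_t addr, const uint8_t *src, size_t len)
{
    if (addr % WRITE_ALIGN != 0u || len % WRITE_ALIGN != 0u || addr + len > FLASH_SIZE) {
        return -1;
    }
    for (size_t i = 0; i < len; i++) {
        sim_flash[addr + i] &= src[i];
    }
    return 0;
}

/* Erase every sector the dump touches, program the aligned part, then pad
 * the final partial write with 0xFF (the erased value). */
static int save_dump(uint32_t base, const uint8_t *data, size_t len)
{
    if (base % SECTOR_SIZE != 0u || base + len > FLASH_SIZE) {
        return -1;
    }
    for (uint32_t a = base; a < base + len; a += SECTOR_SIZE) {
        if (sim_erase_sector(a) != 0) {
            return -1;
        }
    }
    size_t whole = len - len % WRITE_ALIGN;
    if (sim_program(base, data, whole) != 0) {
        return -1;
    }
    if (whole < len) {
        uint8_t tail[WRITE_ALIGN];
        memset(tail, 0xFF, sizeof tail);
        memcpy(tail, data + whole, len - whole);
        if (sim_program(base + (uint32_t)whole, tail, WRITE_ALIGN) != 0) {
            return -1;
        }
    }
    return 0;
}
```

Padding with 0xFF rather than 0x00 leaves the trailing bytes in the erased state, so a reader can treat them like untouched flash.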

  • Thanks, and apologies for the late reply!

    It would be great to know if this feature request goes anywhere. Is there any way to track its progress (or rejection), or is it an internal system only?

    Yes, I might try giving the nrfx drivers a go. It'd be really useful to collect this data.
