Saving coredumps to external flash

Hi,

Chip: nRF52840

OS: nRF Connect / Zephyr: v2.3.0

Problem: Saving coredumps to external flash

We're trying to add a bunch of debugging features to our firmware before field trials.

We've been trying to save coredumps to external flash so that on the next reboot we can upload them to the cloud.

It seems that Zephyr supports this via "CONFIG_DEBUG_COREDUMP_BACKEND_FLASH_PARTITION". However, this doesn't seem to work on Nordic chips, for a few reasons.

I've seen another question on this subject posted about a year ago. The recommendation there was to use Memfault, but it seems odd to need a commercial cloud solution simply to save coredumps to flash.

What we've tried

Unsupported fields

The nordic,qspi-nor.yaml binding does not include "soc-nv-flash.yaml", which means the properties "erase-block-size" and "write-block-size" don't exist. The flash backend implementation requires both.
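Some context on why a partition-based backend typically needs these two properties: flash is programmed in whole write blocks and erased in whole erase blocks, so each write has to be padded and enough sectors erased up front. A minimal host-side sketch of that arithmetic follows; the block sizes are hypothetical placeholders, not values read from the nRF52840 QSPI flash, and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder sizes for illustration only. On a real board these would come
 * from the flash node's write-block-size / erase-block-size properties. */
#define EXAMPLE_WRITE_BLOCK_SIZE 4u
#define EXAMPLE_ERASE_BLOCK_SIZE 4096u

/* Round len up to the next multiple of align. */
static size_t round_up(size_t len, size_t align)
{
    assert(align != 0);
    return ((len + align - 1u) / align) * align;
}

/* Bytes actually programmed when writing len bytes: padded to whole write blocks. */
static size_t padded_write_len(size_t len)
{
    return round_up(len, EXAMPLE_WRITE_BLOCK_SIZE);
}

/* Erase blocks that must be erased before writing len bytes at a block-aligned offset. */
static size_t erase_blocks_needed(size_t len)
{
    return round_up(len, EXAMPLE_ERASE_BLOCK_SIZE) / EXAMPLE_ERASE_BLOCK_SIZE;
}
```

With these placeholder sizes, padded_write_len(10) returns 12 and erase_blocks_needed(10000) returns 3.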

Using QSPI in fatal handler

After adding the above properties and configuring our pm_static.yml file, everything compiles. However, when we trigger a crash, coredumps are no longer created. The fatal handler itself seems to be crashing (there is no CLI output).

If I comment out all the flash operations in "coredump_backend_flash_partition.c", such as "flash_area_erase" and "flash_area_write", the fatal handler does run (CLI output appears), but obviously the coredump is not saved.

I suppose my question is: can QSPI be used from within the fatal handler? I'm not sure whether the QSPI driver needs re-initializing, interrupts need re-enabling, etc.

Summary

Are there any examples of saving coredumps to external flash using the nRF Connect SDK?

Cheers.

  • Hello,

    I don't think it's a good idea to rely on Zephyr drivers such as the QSPI NOR driver from within the fault handler, because you don't know what state the system will be in. Would an alternative be to store the coredump in a "no init" RAM section and write it to flash on the next reboot?

    I've used this approach when debugging WDT timeouts in the past (the WDT does not give you enough time to write anything to flash before resetting):

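    #include <string.h>
    #include <zephyr/kernel.h>
    #include <zephyr/linker/section_tags.h>
    #include <zephyr/sys/crc.h>
    #include <zephyr/drivers/watchdog.h>
    __noinit static z_arch_esf_t esf;
    __noinit static uint32_t esf_crc;
    /* Must not be static: it is referenced by name from the inline assembly in wdt_callback() */
    void dump_stack(uint32_t *p_msp)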
    {
        /* Store stack frame along with a checksum value to the __noinit section in RAM  */
        memcpy(&esf, p_msp, sizeof(esf));
        esf_crc = crc32_ieee((uint8_t *)&esf, sizeof(esf));
        /* Wait for the impending Watchdog reset */
        while (1);
    }
    #if WDT_ALLOW_CALLBACK
    static void wdt_callback(const struct device *wdt_dev, int channel_id)
    {
        /* Get current stack frame from the process stack. 
         * 
         * TODO: implement logic to determine if the application
         * was running in thread or handler mode prior to the WDT interrupt.
         * For handler mode we would have to use the main stack pointer instead. 
         */
        __ASM(" mrs r0, psp          \n"
              " ldr r3, = dump_stack \n"
              " bx r3                \n");
    }
    #endif /* WDT_ALLOW_CALLBACK */
    /* To be called on startup to check if a new exception frame has been stored by our wdt_callback() */
    void wdt_startup_check(void)
    {
        uint32_t computed_crc = crc32_ieee((uint8_t *)&esf, sizeof(esf));
        if (computed_crc == esf_crc) {
            printk("Exception stack frame:\n\r");
            printk("r0/a1:  0x%08x  r1/a2:  0x%08x  r2/a3:  0x%08x\n\r", esf.basic.a1,
                esf.basic.a2, esf.basic.a3);
            printk("r3/a4:  0x%08x r12/ip:  0x%08x r14/lr:  0x%08x\n\r", esf.basic.a4,
                esf.basic.ip, esf.basic.lr);
            printk("xpsr:  0x%08x  pc: 0x%08x\n\r", esf.basic.xpsr, esf.basic.pc);
            esf_crc = 0; // Invalidate CRC
        } else {
            printk("CRC mismatch\n\r");
        }
    }
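    Extending the same idea from a single exception frame to a larger dump: below is a minimal, host-buildable sketch of a variable-length crash record with a magic word, a length and a CRC. It is written with plain memory stores during the fault and validated on the next boot. All names (crash_buf_*, CRASH_BUF_SIZE, CRASH_BUF_MAGIC, crc32_sw) are hypothetical. On target the struct would be marked __noinit and crc32_sw() could be replaced by Zephyr's crc32_ieee(). How the coredump bytes reach crash_buf_append() (for example from a custom coredump backend) is an assumption and is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical size; choose it to fit the SRAM budget. */
#define CRASH_BUF_SIZE 1024u
#define CRASH_BUF_MAGIC 0xC0DEDEADu

/* On Zephyr this would be declared __noinit so it survives a warm reset;
 * it is a plain static here so the sketch builds on a host. */
struct crash_buf {
    uint32_t magic;
    uint32_t len;
    uint32_t crc;
    uint8_t data[CRASH_BUF_SIZE];
};

static struct crash_buf crash_buf;

/* Bitwise CRC-32 (IEEE, reflected polynomial 0xEDB88320), a dependency-free
 * stand-in for crc32_ieee(). */
static uint32_t crc32_sw(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (n--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* Fault time: invalidate and reset the record before writing into it. */
void crash_buf_begin(void)
{
    crash_buf.magic = 0;
    crash_buf.len = 0;
}

/* Fault time: append bytes, truncating when full. No drivers, no locks,
 * only memory copies. */
void crash_buf_append(const void *src, size_t n)
{
    size_t room = CRASH_BUF_SIZE - crash_buf.len;
    if (n > room) {
        n = room;
    }
    memcpy(&crash_buf.data[crash_buf.len], src, n);
    crash_buf.len += (uint32_t)n;
}

/* Fault time: compute the CRC, then write the magic word last. */
void crash_buf_end(void)
{
    crash_buf.crc = crc32_sw(crash_buf.data, crash_buf.len);
    crash_buf.magic = CRASH_BUF_MAGIC;
}

/* Boot time: return true and expose the record if it is valid. The length is
 * range-checked before the CRC so garbage RAM cannot cause an overread. */
bool crash_buf_take(const uint8_t **out, size_t *out_len)
{
    assert(out != NULL && out_len != NULL);
    if (crash_buf.magic != CRASH_BUF_MAGIC || crash_buf.len > CRASH_BUF_SIZE ||
        crash_buf.crc != crc32_sw(crash_buf.data, crash_buf.len)) {
        return false;
    }
    *out = crash_buf.data;
    *out_len = crash_buf.len;
    return true;
}

/* Boot time: invalidate once the record has been saved or uploaded. */
void crash_buf_clear(void)
{
    crash_buf.magic = 0;
}
```

    On boot, call crash_buf_take(); if it returns true, write the bytes to external flash or upload them with the normal drivers running, then call crash_buf_clear(). Writing the magic word last is intended to leave an invalid record if a reset happens partway through the dump.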

    Cheers,

    Vidar

  • Hi,

    Yes, I suppose that's a good solution. I was looking at Zephyr's new flash simulator (https://github.com/zephyrproject-rtos/zephyr/blob/main/drivers/flash/flash_simulator.c), but it's in Zephyr v3.4, which the nRF Connect SDK doesn't support yet.

    My only concern is SRAM usage if we take a whole dump and not just the pointers.
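    On the SRAM concern: one way to bound it is to keep only the innermost part of the faulting thread's stack in a fixed-size buffer rather than all of RAM. Zephyr's coredump Kconfig also has CONFIG_DEBUG_COREDUMP_MEMORY_DUMP_MIN, which restricts the dump to a minimal set such as the faulting thread's stack; it is worth checking what it covers in the SDK version in use. The helper below is a hypothetical host-side sketch of the clamping logic only. It assumes a descending stack, that sp is taken from the exception frame, and that stack_top is the end of the thread's stack (for example start + size from the thread's stack_info with CONFIG_THREAD_STACK_INFO enabled, also an assumption).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy at most max_len bytes of a descending stack,
 * starting at the stack pointer sp (newest data) and stopping at stack_top
 * (one past the highest stack address). Returns the number of bytes copied. */
static size_t snapshot_stack(uint8_t *dst, size_t max_len,
                             uintptr_t sp, uintptr_t stack_top)
{
    assert(dst != NULL);
    if (sp >= stack_top) {
        return 0; /* Nothing in use, or sp is outside the expected range */
    }
    size_t used = (size_t)(stack_top - sp);
    size_t n = used < max_len ? used : max_len;
    /* Lowest addresses first: these hold the innermost (most recent) frames */
    memcpy(dst, (const void *)sp, n);
    return n;
}
```

    With a fixed max_len, the SRAM cost of the snapshot stays constant however deep the stack is; anything beyond max_len (the outer frames) is simply dropped.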

    I'll have a play with this tomorrow.

    Cheers
