Having issues saving a coredump to flash (or at all)

Hi Nordic

I am working with the nRF52840 and nRF52832 using NCS v2.8.0.

I am trying to save a coredump to flash following the instructions at this link: https://docs.nordicsemi.com/bundle/ncs-2.8.0/page/zephyr/services/debugging/coredump.html

I added this to my pm_static_my_board.yml:

coredump_partition:
  address: 0xCF000
  size: 0x8000
  region: flash_primary
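As a quick sanity check of the numbers above, the sketch below verifies that a partition is aligned to the flash page size and ends inside the device's flash. It assumes 4 KB erase pages and the nominal flash sizes of 1 MB for the nRF52840 and 512 KB for the nRF52832 (check the datasheet for your exact variant); `partition_fits()` is a hypothetical helper, not an NCS API.

```c
#include <stdbool.h>
#include <stdint.h>

#define NRF52_PAGE_SIZE 0x1000u    /* assumed 4 KB flash erase page */
#define NRF52840_FLASH  0x100000u  /* 1 MB */
#define NRF52832_FLASH  0x80000u   /* 512 KB */

/* Hypothetical helper: true if [addr, addr + size) is page-aligned
 * and lies entirely inside a flash of flash_size bytes. */
static bool partition_fits(uint32_t addr, uint32_t size, uint32_t flash_size)
{
    if (addr % NRF52_PAGE_SIZE != 0 || size % NRF52_PAGE_SIZE != 0) {
        return false;
    }
    /* 64-bit sum so addr + size cannot wrap around. */
    return (uint64_t)addr + size <= flash_size;
}
```

With the pm_static values (address 0xCF000, size 0x8000) the partition ends at 0xD7000. That fits in 1 MB but lies past 0x80000, so if the same static layout is shared between both boards, the nRF52832 build may be worth double-checking.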

And this to my_board.overlay:

&flash0 {
    /*
     * For more information, see:
     * http://docs.zephyrproject.org/latest/guides/dts/index.html#flash-partitions
     */
    partitions {
        compatible = "fixed-partitions";
        #address-cells = <1>;
        #size-cells = <1>;

      ...
        coredump_partition: partition@80000 { // NOT A REAL ADDRESS (END OF FLASH), BUT IT IS IGNORED BECAUSE PM_STATIC IS USED
            label = "coredump-partition";
            reg = <0x00080000 DT_SIZE_K(4)>;
        };
    };
};

As a side note, it seems strange that I need to set this in the overlay at all, since it is basically ignored: the pm_static partition is the one that actually matters (unless I got something wrong?).

And these configs to my prj.conf:

# Coredump 
CONFIG_DEBUG_COREDUMP=y
CONFIG_DEBUG_COREDUMP_BACKEND_FLASH_PARTITION=y
CONFIG_DEBUG_COREDUMP_MEMORY_DUMP_THREADS=y

In the generated my_board/my_app/zephyr/.config, I see these coredump-related configs:

CONFIG_ARCH_SUPPORTS_COREDUMP=y
CONFIG_ARCH_SUPPORTS_COREDUMP_THREADS=y

# CONFIG_COREDUMP_DEVICE is not set

CONFIG_DEBUG_THREAD_INFO=y
CONFIG_DEBUG_COREDUMP=y
# CONFIG_DEBUG_COREDUMP_BACKEND_LOGGING is not set
CONFIG_DEBUG_COREDUMP_BACKEND_FLASH_PARTITION=y
# CONFIG_DEBUG_COREDUMP_BACKEND_OTHER is not set
# CONFIG_DEBUG_COREDUMP_MEMORY_DUMP_MIN is not set
CONFIG_DEBUG_COREDUMP_MEMORY_DUMP_THREADS=y
# CONFIG_DEBUG_COREDUMP_MEMORY_DUMP_LINKER_RAM is not set
CONFIG_DEBUG_COREDUMP_FLASH_CHUNK_SIZE=64
CONFIG_DEBUG_COREDUMP_THREADS_METADATA=y

I am triggering the coredump with this function:

#include <zephyr/sys/__assert.h>

void trigger_coredump(void)
{
    /* __ASSERT() only takes effect when CONFIG_ASSERT=y */
    __ASSERT(0, "Forcing coredump");
}

When I read the flash area after triggering the coredump with nrfjprog --memrd 0xCF000 --w 32 --n 0x8000,
I get all 0xFF.

What am I missing?
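To tell an erased region apart from one that holds data when reading it back, a minimal host-side sketch is below. It only distinguishes all-0xFF (erased) from non-erased content and looks for a two-byte header ID at the start of the partition. The 'C','D' ID is an assumption based on my reading of Zephyr's flash-partition backend (coredump_backend_flash_partition.c) and should be checked against the source in your tree; `classify_region()` and the enum are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum region_state { REGION_ERASED, REGION_COREDUMP_HDR, REGION_OTHER_DATA };

/* Hypothetical helper: classify a buffer read back from the coredump partition. */
static enum region_state classify_region(const uint8_t *buf, size_t len)
{
    size_t i;

    /* Erased flash on nRF52 reads back as 0xFF. */
    for (i = 0; i < len; i++) {
        if (buf[i] != 0xFF) {
            break;
        }
    }
    if (i == len) {
        return REGION_ERASED;
    }
    /* Assumed header ID written by the flash-partition backend: 'C', 'D'. */
    if (len >= 2 && buf[0] == 'C' && buf[1] == 'D') {
        return REGION_COREDUMP_HDR;
    }
    return REGION_OTHER_DATA;
}
```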

I also tried to check it myself by replacing CONFIG_DEBUG_COREDUMP_BACKEND_FLASH_PARTITION=y

with CONFIG_DEBUG_COREDUMP_BACKEND_LOGGING=y,

hoping to see the coredump in my open RTT session, but nothing appears: when the coredump is triggered, the prints just stop.

  1. What am I missing? Why can't I find a coredump in the flash partition or in the RTT log?
  2. Could the device be crashing before it has time to write the coredump? If so, how can I handle that?
  3. Is the coredump partition erased automatically so new coredumps can be saved, or do I have to manage that myself after reading the coredump from flash?

Hope to hear from you soon.

Best regards

Ziv

  • Hi Runar,

    Yes, we have MCUboot, but I am flashing with the VS Code extension; it builds MCUboot, but I am not sure it flashes it together with the app. Anyway, I have a main branch with the same MCUboot setup and everything else, working with Memfault, and there I see the coredump in memory and also in the logs via RTT. I wonder why it behaves differently when I configure Memfault out and try to save the coredump to the same partition (with the partition renamed, as shown before).

    Hope to hear from you soon.

    Best regards,

    Ziv

  • Hello Ziv.

    I'm not a Nordic employee but am currently working on the same thing as you. I've got it working with the serial CLI and am currently struggling to get it working with internal or external flash.

    Will keep you updated if I have any breakthroughs.

  • Thanks, Tudor.

    P.S. Do you know if there is a way to debug what happens after an assert?

    I have a branch that works with Memfault, and there I see all the relevant prints plus the write to flash. If I can debug both paths, maybe I can find out what is missing in my branch.

    It's a bit hard, since I assume a hard fault or stack overflow stops any further instructions from running. What I found from experience is that you can inject your own message near your point of interest and see which branch execution goes through, what the adjacent branches are, what conditions you need to trigger them, etc.
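    The marker approach described above can be sketched in plain C. MARK(), last_marker, and handle() are hypothetical names used only for illustration; on the target, printk() or a LOG_ macro would take the place of printf(). Keeping the last marker in a variable also means its value can be read with a debugger after the system halts, even if the printed output never appears.

```c
#include <stdio.h>

/* Hypothetical: ID of the most recent marker reached. */
static volatile int last_marker;

/* Hypothetical debug marker: remember and print where execution got to. */
#define MARK(id)                                                \
    do {                                                        \
        last_marker = (id);                                     \
        printf("MARK %d at %s:%d\n", (id), __func__, __LINE__); \
    } while (0)

/* Example function with a marker on each branch. */
static int handle(int x)
{
    MARK(1);
    if (x < 0) {
        MARK(2);
        return -1;
    }
    if (x == 0) {
        MARK(3);
        return 0;
    }
    MARK(4);
    return 1;
}
```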

    For example, given this usage fault:

    **** Using Zephyr OS v4.0.99-a0e545cb437a ***
    [00:00:00.297,698] <inf> flashdisk: Initialize device NAND
    [00:00:00.297,729] <inf> flashdisk: offset 300000, sector size 512, page size 4096, volume size 4194304
    [00:00:14.148,559] <err> os: ***** USAGE FAULT *****
    [00:00:14.148,559] <err> os:   Attempt to execute undefined instruction
    [00:00:14.148,590] <err> os: r0/a1:  0x0bad0000  r1/a2:  0x00000000  r2/a3:  0x00000000
    [00:00:14.148,590] <err> os: r3/a4:  0xffffffff r12/ip:  0x0004e4bb r14/lr:  0x0001b203
    [00:00:14.148,620] <err> os:  xpsr:  0x49100000
    [00:00:14.148,620] <err> os: s[ 0]:  0x200099e4  s[ 1]:  0x00000000  s[ 2]:  0x00000009  s[ 3]:  0x00021cc7
    [00:00:14.148,651] <err> os: s[ 4]:  0x00000001  s[ 5]:  0x00000030  s[ 6]:  0x0005ac30  s[ 7]:  0x0004dd57
    [00:00:14.148,651] <err> os: s[ 8]:  0x00000000  s[ 9]:  0x200099e0  s[10]:  0x20009ae0  s[11]:  0x00001972
    [00:00:14.148,681] <err> os: s[12]:  0x20005a94  s[13]:  0x0001df3b  s[14]:  0x20005a94  s[15]:  0x00001000
    [00:00:14.148,681] <err> os: fpscr:  0xffffffff
    [00:00:14.148,681] <err> os: Faulting instruction address (r15/pc): 0x00017a88
    [00:00:14.148,712] <err> os: >>> ZEPHYR FATAL ERROR 36: Unknown error on CPU 0
    [00:00:14.148,742] <err> os: Current thread: 0x20002758 (mp_main)
    [00:00:14.273,651] <err> os: Halting system

    I wanted to see what triggers it, so I searched for "Attempt to execute undefined instruction" and found it here:

    /opt/nordic/ncs/v3.0.0/zephyr/arch/arm/core/cortex_m/fault.c:550:

    PR_FAULT_INFO(" Attempt to execute undefined instruction");
    inside the function "static uint32_t usage_fault(const struct arch_esf *esf)".

    It might seem very basic/ rudimentary, but it helped me overcome various hurdles when working with different Zephyr/ Nordic features.
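    To connect that message to the hardware: on Cortex-M (the nRF52 parts are Cortex-M4), usage_fault() prints "Attempt to execute undefined instruction" when the UNDEFINSTR bit of the UsageFault Status Register is set; UFSR is the upper half-word of CFSR. A host-side decoder of the ARMv7-M UFSR bits, as a sketch (ufsr_bits and ufsr_decode() are hypothetical names; the CFSR value would come from a debugger or a saved fault record):

```c
#include <stdint.h>

/* UsageFault Status Register (UFSR) = CFSR[31:16] on ARMv7-M (e.g. Cortex-M4). */
struct ufsr_bit {
    uint32_t mask;    /* bit within UFSR */
    const char *name; /* short description */
};

static const struct ufsr_bit ufsr_bits[] = {
    { 1u << 0, "UNDEFINSTR: attempt to execute undefined instruction" },
    { 1u << 1, "INVSTATE: invalid EPSR state (e.g. Thumb bit clear)" },
    { 1u << 2, "INVPC: invalid PC load on exception return" },
    { 1u << 3, "NOCP: coprocessor access while disabled or absent" },
    { 1u << 8, "UNALIGNED: unaligned access trap" },
    { 1u << 9, "DIVBYZERO: divide by zero trap" },
};

#define UFSR_NBITS (sizeof(ufsr_bits) / sizeof(ufsr_bits[0]))

/* Hypothetical helper: store up to max descriptions of the usage-fault
 * bits set in a CFSR value; returns how many were stored. */
static int ufsr_decode(uint32_t cfsr, const char **out, int max)
{
    uint32_t ufsr = cfsr >> 16; /* UFSR is the upper half-word of CFSR */
    int n = 0;

    for (unsigned i = 0; i < UFSR_NBITS && n < max; i++) {
        if (ufsr & ufsr_bits[i].mask) {
            out[n++] = ufsr_bits[i].name;
        }
    }
    return n;
}
```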

    ==============================

    Also worth mentioning is this post:
    RE: Saving coredumps to external flash 

    Where they mention that:
    "To get the ESF you can override Zephyr's fatal function and simply store the values in retained memory (as the above example shows). 

    void k_sys_fatal_error_handler(unsigned int reason, const z_arch_esf_t *esf_input)"

    My interpretation is that k_sys_fatal_error_handler() is the function you want to debug.
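    The retained-memory idea in that quote can be modeled on the host as below. This is a sketch under stated assumptions: fault_record, save_fault(), and check_previous_fault() are hypothetical; on the device the record would need to live in RAM that is not cleared at boot (for example a variable marked __noinit in Zephyr), and save_fault() would be called from an overridden k_sys_fatal_error_handler() with values taken from the exception stack frame.

```c
#include <stdbool.h>
#include <stdint.h>

#define FAULT_MAGIC 0xFA017EC0u /* hypothetical marker meaning "record is valid" */

/* Hypothetical record; on target, place it in no-init (retained) RAM. */
struct fault_record {
    uint32_t magic;
    uint32_t reason; /* fatal error reason code */
    uint32_t pc;     /* faulting instruction address */
    uint32_t lr;     /* link register at the time of the fault */
};

static struct fault_record retained;

/* Called from the fatal error handler before the system halts or reboots. */
static void save_fault(uint32_t reason, uint32_t pc, uint32_t lr)
{
    retained.reason = reason;
    retained.pc = pc;
    retained.lr = lr;
    retained.magic = FAULT_MAGIC; /* mark valid after the fields are filled in */
}

/* Called early after boot: copy out a valid record, then invalidate it. */
static bool check_previous_fault(struct fault_record *out)
{
    if (retained.magic != FAULT_MAGIC) {
        return false;
    }
    *out = retained;
    retained.magic = 0; /* consume it so the same fault is not reported twice */
    return true;
}
```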

    ==============================

    I looked into Memfault conceptually, but it's a much bigger feature (in terms of ROM and RAM consumption) than simply saving the coredump to internal or external flash.

    I can provide a working sample that prints it to the serial CLI, using:

    CONFIG_DEBUG_COREDUMP_BACKEND_LOGGING=y
    Would that be of any use?
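    With the logging backend, my understanding is that Zephyr prints the dump as hex on log lines containing a "#CD:" prefix, bracketed by "#CD:BEGIN#" and "#CD:END#", and that zephyr/scripts/coredump/coredump_serial_log_parser.py turns such a log into a binary file for the GDB server script. As a rough illustration of that line format only (not a replacement for the script), the hypothetical cd_decode_line() below decodes one such line into bytes; the decoded dump is expected to start with 'Z','E'.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_val(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: decode the hex payload of one "#CD:" log line into out.
 * Returns the number of bytes decoded, 0 for BEGIN/END markers,
 * or -1 if there is no "#CD:" prefix or the payload is malformed. */
static int cd_decode_line(const char *line, uint8_t *out, size_t out_len)
{
    const char *p = strstr(line, "#CD:");
    size_t n = 0;

    if (p == NULL) {
        return -1;
    }
    p += 4; /* skip "#CD:" */
    if (strncmp(p, "BEGIN#", 6) == 0 || strncmp(p, "END#", 4) == 0) {
        return 0;
    }
    while (hex_val(p[0]) >= 0) {
        int hi = hex_val(p[0]);
        int lo = hex_val(p[1]);

        if (lo < 0 || n == out_len) {
            return -1; /* odd digit count or output buffer full */
        }
        out[n++] = (uint8_t)((hi << 4) | lo);
        p += 2;
    }
    return (int)n;
}
```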