Having issues saving a coredump to flash, or at all

Hi Nordic

I am working with the nRF52840 and nRF52832 using NCS v2.8.0.

I am trying to save a coredump to flash according to the instructions at this link - https://docs.nordicsemi.com/bundle/ncs-2.8.0/page/zephyr/services/debugging/coredump.html

I added this to my pm_static_my_board.yml

coredump_partition:
  address: 0xCF000
  size: 0x8000
  region: flash_primary

And this to my_board.overlay

&flash0 {
    /*
     * For more information, see:
     * http://docs.zephyrproject.org/latest/guides/dts/index.html#flash-partitions
     */
    partitions {
        compatible = "fixed-partitions";
        #address-cells = <1>;
        #size-cells = <1>;

      ...
        /* This is not a legitimate address (it is the end of flash), but it is
         * not taken into account because pm_static takes precedence. */
        coredump_partition: partition@80000 {
            label = "coredump-partition";
            reg = <0x00080000 DT_SIZE_K(4)>;
        };
    };
};

As a side note, it seems strange that I need to define this in the overlay, which is basically ignored because the pm_static partitions are what actually matter (unless I got something wrong?).
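The two partition definitions above can be sanity-checked with a little arithmetic. The following is a minimal host-side sketch, not part of the SDK: `partition_fits` and the constant names are hypothetical, and the numbers assume the datasheet values of 4 KB flash pages, 1 MB of flash on the nRF52840, and 512 KB on the nRF52832 (QFAA variant).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed datasheet values: nRF52 internal flash is erased in 4 KB pages. */
#define NRF52_FLASH_PAGE_SIZE 0x1000u
#define NRF52840_FLASH_SIZE   0x100000u /* 1 MB */
#define NRF52832_FLASH_SIZE   0x80000u  /* 512 KB (QFAA variant) */

/* Hypothetical helper: true if [address, address + size) is page-aligned,
 * non-empty, and lies entirely inside a flash of flash_size bytes. */
static bool partition_fits(uint32_t address, uint32_t size, uint32_t flash_size)
{
    const uint32_t mask = NRF52_FLASH_PAGE_SIZE - 1u;

    assert(flash_size > 0u);

    if (size == 0u || (address & mask) != 0u || (size & mask) != 0u) {
        return false;
    }
    /* Written this way to avoid overflow in address + size. */
    return address < flash_size && size <= flash_size - address;
}
```

Under those assumptions, the pm_static entry (0xCF000, size 0x8000) fits on the nRF52840 but lies beyond the 0x80000 end of flash on the nRF52832, so a static partition file shared between the two boards may be worth double-checking. The overlay entry at 0x80000 likewise only fits on the nRF52840.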

And these configs to my prj.conf

# Coredump 
CONFIG_DEBUG_COREDUMP=y
CONFIG_DEBUG_COREDUMP_BACKEND_FLASH_PARTITION=y
CONFIG_DEBUG_COREDUMP_MEMORY_DUMP_THREADS=y

In my my_board/my_app/zephyr/.config I see these coredump-related configs

CONFIG_ARCH_SUPPORTS_COREDUMP=y
CONFIG_ARCH_SUPPORTS_COREDUMP_THREADS=y

# CONFIG_COREDUMP_DEVICE is not set

CONFIG_DEBUG_THREAD_INFO=y
CONFIG_DEBUG_COREDUMP=y
# CONFIG_DEBUG_COREDUMP_BACKEND_LOGGING is not set
CONFIG_DEBUG_COREDUMP_BACKEND_FLASH_PARTITION=y
# CONFIG_DEBUG_COREDUMP_BACKEND_OTHER is not set
# CONFIG_DEBUG_COREDUMP_MEMORY_DUMP_MIN is not set
CONFIG_DEBUG_COREDUMP_MEMORY_DUMP_THREADS=y
# CONFIG_DEBUG_COREDUMP_MEMORY_DUMP_LINKER_RAM is not set
CONFIG_DEBUG_COREDUMP_FLASH_CHUNK_SIZE=64
CONFIG_DEBUG_COREDUMP_THREADS_METADATA=y

I am generating a coredump using this implementation

void trigger_coredump(void)
{
    __ASSERT(0, "Forcing coredump");
}

When I try to read the flash area after generating the coredump with nrfjprog --memrd 0xCF000 --w 32 --n 0x8000,
I get all 0xFF.

What am I missing?
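A read-back that is all 0xFF means the region is still in the erased state, i.e. nothing was written there. The following is a minimal host-side sketch for classifying such a read-back once it has been saved as raw bytes in address order. The names `classify_readback` and `readback_state` are hypothetical. The 'C','D' marker is an assumption based on a reading of Zephyr's flash partition backend source (coredump_backend_flash_partition.c), which appears to place a small header with that ID at the start of the partition; it should be verified against the SDK tree in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum readback_state {
    READBACK_ERASED,     /* every byte is 0xFF: nothing was written */
    READBACK_HAS_HEADER, /* starts with the assumed 'C','D' marker */
    READBACK_OTHER       /* some data, but no recognised marker */
};

/* Hypothetical helper: classify a raw dump of the coredump partition. */
static enum readback_state classify_readback(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);

    /* ASSUMPTION: the flash backend header begins with the bytes 'C','D'. */
    if (len >= 2 && buf[0] == 'C' && buf[1] == 'D') {
        return READBACK_HAS_HEADER;
    }
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0xFF) {
            return READBACK_OTHER;
        }
    }
    return READBACK_ERASED;
}
```

If the result is READBACK_ERASED, the backend most likely never ran or wrote to a different address than the one being read, which points at the fault path or the partition mapping rather than at the read-back command.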

I also tried to check myself by replacing CONFIG_DEBUG_COREDUMP_BACKEND_FLASH_PARTITION=y

with CONFIG_DEBUG_COREDUMP_BACKEND_LOGGING=y

hoping to see the coredump in my open RTT session, but nothing: when the coredump is triggered, the prints just stop.

What am I missing? Why can't I find a coredump on the flash partition or in the RTT log?
Can it be that the device does not have time to write the coredump before the actual crash? If so, how can I manage that?
Is there some automatic deletion of the flash partition holding the coredump so that new coredumps can be saved, or is that something I have to manage myself after I read the coredump from flash?
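On the last question, Zephyr exposes `coredump_query()` and `coredump_cmd()` in `<zephyr/debug/coredump.h>` for checking and erasing a stored dump from application code. The following is a minimal sketch of a boot-time check, assuming the flash partition backend is enabled. `handle_stored_coredump` is a hypothetical helper, and the `#else` branch contains host-only stand-ins (not the Zephyr implementation) so the logic can be compiled off-target.

```c
#include <stdbool.h>
#include <stddef.h>

#ifdef __ZEPHYR__
#include <zephyr/debug/coredump.h>
#else
/* Host-only stand-ins so the logic can be exercised off-target.
 * fake_has_dump and fake_erase_calls are hypothetical test knobs. */
enum coredump_query_id { COREDUMP_QUERY_HAS_STORED_DUMP };
enum coredump_cmd_id { COREDUMP_CMD_ERASE_STORED_DUMP };
static int fake_has_dump;
static int fake_erase_calls;

static int coredump_query(enum coredump_query_id id, void *arg)
{
    (void)id;
    (void)arg;
    return fake_has_dump;
}

static int coredump_cmd(enum coredump_cmd_id id, void *arg)
{
    (void)id;
    (void)arg;
    fake_erase_calls++;
    fake_has_dump = 0;
    return 0;
}
#endif

/* Hypothetical boot-time helper: returns true if a stored dump was found.
 * When erase_after is set, the dump is erased once it has been handled. */
static bool handle_stored_coredump(bool erase_after)
{
    if (coredump_query(COREDUMP_QUERY_HAS_STORED_DUMP, NULL) <= 0) {
        return false; /* 0: no dump stored, negative: backend error */
    }

    /* ... read the dump out here, e.g. with COREDUMP_CMD_COPY_STORED_DUMP ... */

    if (erase_after) {
        (void)coredump_cmd(COREDUMP_CMD_ERASE_STORED_DUMP, NULL);
    }
    return true;
}
```

Calling something like this early in `main()` gives explicit control over when an old dump is discarded. Whether the backend also erases the partition by itself before writing a new dump is best confirmed in the backend source for the SDK version in use.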

Hope to read you soon

Best regards

Ziv

0 runsiv 10 months ago

Hi

I will look into your case. Just a quick question to start with: are you also using MCUboot?
Regards

Runar

0 ziv123 10 months ago in reply to runsiv

Any news on that?

0 Tudor B. 10 months ago in reply to ziv123

Hello Ziv.

I'm not a Nordic employee but am currently working on the same thing as you. I've got it working with the serial CLI and am currently struggling to get it working with internal or external flash.

Will keep you updated if I have any breakthroughs.

0 ziv123 10 months ago in reply to Tudor B.

Thanks, Tudor.

P.S. Do you know if there is a way to debug what happens after an assert?

I have a branch that works with Memfault, and there I see all the relevant prints plus the write to flash. If I can debug both paths, maybe I can find out what is missing in my branch.

0 Tudor B. 10 months ago in reply to ziv123
It's a bit hard, since I assume a hard fault or stack overflow blocks all further instructions from running. In my experience, you can inject your own message near your point of interest and see which branch execution goes through, which adjacent branches exist, what conditions are needed to trigger them, etc.

For example, given this usage fault:

**** Using Zephyr OS v4.0.99-a0e545cb437a ***
[00:00:00.297,698] <inf> flashdisk: Initialize device NAND
[00:00:00.297,729] <inf> flashdisk: offset 300000, sector size 512, page size 4096, volume size 4194304
[00:00:14.148,559] <err> os: ***** USAGE FAULT *****
[00:00:14.148,559] <err> os: Attempt to execute undefined instruction
[00:00:14.148,590] <err> os: r0/a1: 0x0bad0000 r1/a2: 0x00000000 r2/a3: 0x00000000
[00:00:14.148,590] <err> os: r3/a4: 0xffffffff r12/ip: 0x0004e4bb r14/lr: 0x0001b203
[00:00:14.148,620] <err> os: xpsr: 0x49100000
[00:00:14.148,620] <err> os: s[ 0]: 0x200099e4 s[ 1]: 0x00000000 s[ 2]: 0x00000009 s[ 3]: 0x00021cc7
[00:00:14.148,651] <err> os: s[ 4]: 0x00000001 s[ 5]: 0x00000030 s[ 6]: 0x0005ac30 s[ 7]: 0x0004dd57
[00:00:14.148,651] <err> os: s[ 8]: 0x00000000 s[ 9]: 0x200099e0 s[10]: 0x20009ae0 s[11]: 0x00001972
[00:00:14.148,681] <err> os: s[12]: 0x20005a94 s[13]: 0x0001df3b s[14]: 0x20005a94 s[15]: 0x00001000
[00:00:14.148,681] <err> os: fpscr: 0xffffffff
[00:00:14.148,681] <err> os: Faulting instruction address (r15/pc): 0x00017a88
[00:00:14.148,712] <err> os: >>> ZEPHYR FATAL ERROR 36: Unknown error on CPU 0
[00:00:14.148,742] <err> os: Current thread: 0x20002758 (mp_main)
[00:00:14.273,651] <err> os: Halting system

I want to see what triggers it, so I searched for " Attempt to execute undefined instruction" and found it here:

/opt/nordic/ncs/v3.0.0/zephyr/arch/arm/core/cortex_m/fault.c:550:

PR_FAULT_INFO(" Attempt to execute undefined instruction");

inside the function "static uint32_t usage_fault(const struct arch_esf *esf)".

It might seem very basic/ rudimentary, but it helped me overcome various hurdles when working with different Zephyr/ Nordic features.

==============================

Also worth mentioning is this post:
RE: Saving coredumps to external flash

Where they mention that:
"To get the ESF you can override Zephyr's fatal function and simply store the values in retained memory (as the above example shows).

void k_sys_fatal_error_handler(unsigned int reason, const z_arch_esf_t *esf_input)"

Which in my interpretation means that "k_sys_fatal_error_handler()" is the function that you're looking to debug.

==============================

I looked into Memfault conceptually, but it's a much bigger feature (from a ROM and RAM consumption perspective) than simply having the Coredump being saved to flash/ external flash.

I can provide you a working sample for printing it to the serial CLI, using:

CONFIG_DEBUG_COREDUMP_BACKEND_LOGGING=y

Would that be of any use?

0 ziv123 10 months ago in reply to Tudor B.

Tudor B. said:
For example, given this usage fault:

This does not help in my case. First, I don't even get this built-in log on my current branch (by branch I mean a git branch off my main development branch, which currently works with Memfault). I am generating the assertion with __ASSERT(0, ..), so I know where it happens. I just want to confirm that the coredump gets saved to flash, and currently it does not. As I mentioned, it does not even print the logs you showed, so unfortunately this is of no use at the moment.

Tudor B. said:
Also worth mentioning is this post:
RE: Saving coredumps to external flash

I also tried what Vidar Berg did in his attempt, and it did not work for me.

I am also not sure why he set CONFIG_ASSERT=n. I want to be able to catch asserts as well.

0 runsiv 10 months ago in reply to ziv123

Hi Ziv, and sorry for the delay. From my testing, it seems that I'm currently not able to catch asserts with coredump. Triggering another fatal error works like a charm, so I suspect there is something with the configuration of the error fault handler.

Regards

Runar

0 Tudor B. 10 months ago in reply to ziv123

For me, the post that I linked was a keystone to getting the _LOGGING version working. It's interesting that you say:

ziv123 said:
so i know where it happens i just want to see that i know to save the coredump into flash

When the Coredump works, you shouldn't do any saving yourself. The Coredump feature itself saves to flash/ external flash.

The way you're generating the error doesn't matter as long as you can get the coredump working. This is why I suggested to first do the _LOGGING option, since then you know for sure that the Coredump feature itself works. Afterwards it's just a matter of switching the Coredump feature from _LOGGING to CONFIG_DEBUG_COREDUMP_BACKEND_FLASH_PARTITION or CONFIG_DEBUG_COREDUMP_BACKEND_OTHER. This is where I'm stuck. :)

0 ziv123 10 months ago in reply to runsiv
hi Runsiv

runsiv said:
catch asserts with coredump

I wonder why there is no problem saving a coredump on an assertion when using Memfault, as opposed to only trying to capture a coredump?

I also tried to generate a crash another way, with

void trigger_coredump(void)
{
    *(volatile uint32_t *)0xFFFFFFFF = 1;
    // __ASSERT(0, "Forcing coredump");
}

instead of using the assert.

Same results:

none of the logs that are usually built into Zephyr to tell you where the crash happened, and of course no coredump saved to memory.

If there is any further data I can share to get some direction on this, please let me know.

I am kind of stuck on this feature, which is supposed to be a built-in, supported feature in both Nordic and Zephyr.

https://docs.nordicsemi.com/bundle/ncs-2.4.2/page/zephyr/services/debugging/coredump.html

docs.zephyrproject.org/.../coredump.html

hope to read you soon

best regards

Ziv