Having issues saving a coredump to flash, or at all

ziv123 9 months ago

Hi Nordic

I am working with the nRF52840 and nRF52832 using NCS v2.8.0.

I am trying to save a coredump to flash following the instructions at this link: https://docs.nordicsemi.com/bundle/ncs-2.8.0/page/zephyr/services/debugging/coredump.html

I added this to my pm_static_my_board.yml:

coredump_partition:
  address: 0xCF000
  size: 0x8000
  region: flash_primary

And this to my_board.overlay:

&flash0 {
    /*
     * For more information, see:
     * http://docs.zephyrproject.org/latest/guides/dts/index.html#flash-partitions
     */
    partitions {
        compatible = "fixed-partitions";
        #address-cells = <1>;
        #size-cells = <1>;

        ...
        /* Not a legitimate address (end of flash), but it is not taken into account because pm_static takes precedence. */
        coredump_partition: partition@80000 {
            label = "coredump-partition";
            reg = <0x00080000 DT_SIZE_K(4)>;
        };
    };
};

A side note: it is strange that I need to set this in the overlay, which is basically ignored because the pm_static partitions are what actually matter (unless I got something wrong?).
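As a quick sanity check on the addresses above, here is a minimal host-side sketch in C. The flash sizes (1 MB for the nRF52840, 512 kB for the nRF52832 QFAA variant) and the 4 kB page size are assumptions taken from the product specifications, not from this thread, and `partition_fits` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values (not stated in the thread): internal flash sizes and page size. */
#define NRF52840_FLASH_SIZE 0x100000u /* 1 MB */
#define NRF52832_FLASH_SIZE 0x80000u  /* 512 kB, QFAA variant */
#define NRF52_FLASH_PAGE    0x1000u   /* 4 kB erase page */

/* Hypothetical helper: true if the partition is page-aligned and inside flash. */
static bool partition_fits(uint32_t addr, uint32_t size, uint32_t flash_size)
{
    /* The flash itself is expected to be a whole number of pages. */
    assert(flash_size % NRF52_FLASH_PAGE == 0);

    bool aligned = (addr % NRF52_FLASH_PAGE == 0) &&
                   (size % NRF52_FLASH_PAGE == 0);
    /* Written this way to avoid overflow in addr + size. */
    bool in_range = size > 0 && addr < flash_size &&
                    size <= flash_size - addr;

    return aligned && in_range;
}
```

Under these assumptions, `partition_fits(0xCF000, 0x8000, NRF52840_FLASH_SIZE)` is true, while the same partition checked against `NRF52832_FLASH_SIZE` is false, so a pm_static entry at 0xCF000 may need a different address on the nRF52832 build.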

And these configs to my prj.conf:

# Coredump 
CONFIG_DEBUG_COREDUMP=y
CONFIG_DEBUG_COREDUMP_BACKEND_FLASH_PARTITION=y
CONFIG_DEBUG_COREDUMP_MEMORY_DUMP_THREADS=y

In my my_board/my_app/zephyr/.config I see these coredump-related configs:

CONFIG_ARCH_SUPPORTS_COREDUMP=y
CONFIG_ARCH_SUPPORTS_COREDUMP_THREADS=y

# CONFIG_COREDUMP_DEVICE is not set

CONFIG_DEBUG_THREAD_INFO=y
CONFIG_DEBUG_COREDUMP=y
# CONFIG_DEBUG_COREDUMP_BACKEND_LOGGING is not set
CONFIG_DEBUG_COREDUMP_BACKEND_FLASH_PARTITION=y
# CONFIG_DEBUG_COREDUMP_BACKEND_OTHER is not set
# CONFIG_DEBUG_COREDUMP_MEMORY_DUMP_MIN is not set
CONFIG_DEBUG_COREDUMP_MEMORY_DUMP_THREADS=y
# CONFIG_DEBUG_COREDUMP_MEMORY_DUMP_LINKER_RAM is not set
CONFIG_DEBUG_COREDUMP_FLASH_CHUNK_SIZE=64
CONFIG_DEBUG_COREDUMP_THREADS_METADATA=y

I am generating a coredump using this implementation:

void trigger_coredump(void)
{
    __ASSERT(0, "Forcing coredump");
}

When I try to read the flash area after generating the coredump with nrfjprog --memrd 0xCF000 --w 32 --n 0x8000, I get all 0xFF.

I also tried to check myself by replacing CONFIG_DEBUG_COREDUMP_BACKEND_FLASH_PARTITION=y

with CONFIG_DEBUG_COREDUMP_BACKEND_LOGGING=y

hoping to see the coredump on my open RTT, but nothing: when the coredump is triggered, the prints just stop.

What am I missing? Why can't I find a coredump on the flash partition or in the RTT log?
Can it be that the device does not have time to write the coredump before the actual crash? If so, how can I manage that?
Is there some automatic deletion of the flash partition holding the coredump so that new coredumps can be saved, or is it something I have to manage myself after I read the coredump from flash?

Hope to read you soon

Best regards

Ziv

runsiv 9 months ago

Hi

I will look into your case. Just a quick question to start with: are you also using MCUboot?

Regards

Runar

ziv123 9 months ago in reply to runsiv

Any news on that?

Vidar Berg 9 months ago in reply to Tudor B.

You posted this earlier, which is a typical crash log:

[00:00:14.148,559] <err> os: ***** USAGE FAULT *****
[00:00:14.148,559] <err> os:   Attempt to execute undefined instruction
[00:00:14.148,590] <err> os: r0/a1:  0x0bad0000  r1/a2:  0x00000000  r2/a3:  0x00000000
[00:00:14.148,590] <err> os: r3/a4:  0xffffffff r12/ip:  0x0004e4bb r14/lr:  0x0001b203
[00:00:14.148,620] <err> os:  xpsr:  0x49100000
[00:00:14.148,620] <err> os: s[ 0]:  0x200099e4  s[ 1]:  0x00000000  s[ 2]:  0x00000009  s[ 3]:  0x00021cc7
[00:00:14.148,651] <err> os: s[ 4]:  0x00000001  s[ 5]:  0x00000030  s[ 6]:  0x0005ac30  s[ 7]:  0x0004dd57
[00:00:14.148,651] <err> os: s[ 8]:  0x00000000  s[ 9]:  0x200099e0  s[10]:  0x20009ae0  s[11]:  0x00001972
[00:00:14.148,681] <err> os: s[12]:  0x20005a94  s[13]:  0x0001df3b  s[14]:  0x20005a94  s[15]:  0x00001000
[00:00:14.148,681] <err> os: fpscr:  0xffffffff
[00:00:14.148,681] <err> os: Faulting instruction address (r15/pc): 0x00017a88
[00:00:14.148,712] <err> os: >>> ZEPHYR FATAL ERROR 36: Unknown error on CPU 0
[00:00:14.148,742] <err> os: Current thread: 0x20002758 (mp_main)
[00:00:14.273,651] <err> os: Halting system

It tells you the type of error (stack overflow, etc.) and it includes the CPU registers from the previous stack frame.

EDIT: I just stumbled across this PR, https://github.com/nrfconnect/sdk-nrf/pull/21418, which I assume should address the limitation of not being able to enable ASSERTs when storing to internal flash.

Tudor B. 9 months ago in reply to Vidar Berg

So you'd suggest using the filesystem to store the crash log?

Also, excuse my lack of knowledge, but how do you get from CPU registers to a stack trace?

I know how to reverse engineer "Faulting instruction address (r15/pc): 0x00017a88" using:
/opt/nordic/ncs/toolchains/b8efef2ad5/opt/zephyr-sdk/arm-zephyr-eabi/bin/arm-zephyr-eabi-addr2line -e zephyr.elf -a0x00017a88 -f -p

What about my point regarding storing the reset reason and adding custom reset reasons?

Vidar Berg 9 months ago in reply to Tudor B.

You don't need much space to store the information in the crash log, so you should be able to fit it in RAM, similar to what I suggested here: RE: Saving coredumps to external flash. There's also nothing preventing you from adding additional data such as the reset reason.
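A minimal sketch of what such a RAM crash record could look like, written as plain C so it can be tried on a host. The struct layout, the magic value, and all function names are hypothetical. On target, the record would presumably live in RAM that is not zeroed at boot (for example a variable marked `__noinit`) and be filled from an override of Zephyr's weak `k_sys_fatal_error_handler()`; that part is not shown here, and whether RAM contents survive depends on the reset type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CRASH_MAGIC 0xC0DEDEADu /* arbitrary marker chosen for this sketch */

/* Hypothetical record: a few registers from the crash log plus extra data. */
struct crash_record {
    uint32_t magic;
    uint32_t reason;     /* fatal error reason code from the kernel */
    uint32_t pc;         /* faulting instruction address */
    uint32_t lr;
    uint32_t reset_reas; /* optional extra data, e.g. the reset reason */
    uint32_t checksum;
};

/* Inverted XOR of all payload words; all-zero or all-0xFF RAM never validates. */
static uint32_t crash_checksum(const struct crash_record *rec)
{
    return ~(rec->magic ^ rec->reason ^ rec->pc ^ rec->lr ^ rec->reset_reas);
}

/* Fill the record at crash time. */
static void crash_record_store(struct crash_record *rec, uint32_t reason,
                               uint32_t pc, uint32_t lr, uint32_t reset_reas)
{
    assert(rec != NULL);
    rec->magic = CRASH_MAGIC;
    rec->reason = reason;
    rec->pc = pc;
    rec->lr = lr;
    rec->reset_reas = reset_reas;
    rec->checksum = crash_checksum(rec);
}

/* Check after reboot whether the record holds a crash from the previous run. */
static bool crash_record_valid(const struct crash_record *rec)
{
    assert(rec != NULL);
    return rec->magic == CRASH_MAGIC && rec->checksum == crash_checksum(rec);
}

/* Invalidate the record once it has been reported. */
static void crash_record_clear(struct crash_record *rec)
{
    assert(rec != NULL);
    rec->magic = 0;
}
```

After reboot, the application would call `crash_record_valid()` early, report or log the contents, and then call `crash_record_clear()` so that a stale record is not reported twice.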

Tudor B. said:
Also, excuse my lack of knowledge, but how do you get from cpu registers to stack trace?

You won't be able to reconstruct a stack trace from the crash log.

ziv123 9 months ago in reply to Vidar Berg

Vidar Berg said:
and if you have CONFIG_ASSERT enabled it will trigger an assert before the CD is stored to flash.

What I see is the same whether I try to assert, panic, or set a pointer to a forbidden address.

I do not get into z_fatal_error in any of those cases, or into assert_print. It seems like the device just crashes before everything, regardless of whether the CD is written or not.

Vidar Berg said:
You can also redefine the weakly declared error handler

I actually thought about implementing my own hard-fault handler and calling z_fatal_error from it, in order to have the CD written to INTERNAL flash (I am trying to use it as it is supposed to be supported by the Zephyr debug coredump), but I am not sure what API to use to override the existing nRF hard-fault handler. It would be great if you could point me to where to look or what the API is.

I am guessing Memfault does that as well, though I am not sure whether it means they need a different implementation for each hardware (nRF, STM, etc.), because I am guessing they don't all have the same API for fault handling.

Vidar Berg said:
Have you considered just saving the CPU registers from the last stack frame like in the crash log message? Or do you feel you get more information from a coredump?

Currently I am not even getting the crash log; my system seems to stop before that. But in any case, a coredump also lets me know about other threads and stacks besides only the place of the crash. If I use DEBUG_COREDUMP_MEMORY_DUMP_THREADS I don't get the whole stack, but enough to have some more debugging options.

hope to read you soon

best regards

Ziv

ziv123 9 months ago in reply to Tudor B.

Tudor B. said:
Additionally, it'd be great to store the reset reason

You can use the nrf_power_resetreas_get API after reset, when the system starts.

Regarding handling a loss of communication, you can use your own comm watchdog and save whatever you want into logs or some other statistics, then send it after reconnection.

At the moment I am trying to write to internal flash as MVP1.

P.S. I think what you see in the logs you shared is the crash log and not a coredump.
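To illustrate the reset-reason suggestion, here is a minimal host-side sketch in C that turns a RESETREAS bitmask, as `nrf_power_resetreas_get()` would return it, into readable names. The bit positions are assumptions taken from the nRF52840 product specification (not every bit exists on the nRF52832), and `resetreas_describe` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct reas_bit {
    uint32_t mask;
    const char *name;
};

/* Assumed RESETREAS bit layout, per the nRF52840 product specification. */
static const struct reas_bit reas_bits[] = {
    { 1u << 0,  "RESETPIN" }, /* pin reset */
    { 1u << 1,  "DOG" },      /* watchdog */
    { 1u << 2,  "SREQ" },     /* soft reset request */
    { 1u << 3,  "LOCKUP" },   /* CPU lockup */
    { 1u << 16, "OFF" },      /* wake from System OFF via GPIO */
    { 1u << 17, "LPCOMP" },
    { 1u << 18, "DIF" },      /* debug interface */
    { 1u << 19, "NFC" },
    { 1u << 20, "VBUS" },
};

/* Hypothetical helper: write "A|B|..." for every known bit set in reas. */
static void resetreas_describe(uint32_t reas, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    if (reas == 0) {
        /* No flag set: power-on reset or brownout. */
        snprintf(buf, len, "POR_OR_BROWNOUT");
        return;
    }

    for (size_t i = 0; i < sizeof(reas_bits) / sizeof(reas_bits[0]); i++) {
        if (reas & reas_bits[i].mask) {
            size_t used = strlen(buf);
            snprintf(buf + used, len - used, "%s%s",
                     used ? "|" : "", reas_bits[i].name);
        }
    }
}
```

Since the register accumulates flags across resets, the application would normally clear it after reading (the nrfx HAL provides `nrf_power_resetreas_clear()` for that), so that the next boot reports only the most recent cause.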