Having issues saving a coredump to flash, or at all

ziv123 9 months ago

Hi Nordic

I am working with the nRF52840 and nRF52832 using NCS v2.8.0.

I am trying to save a coredump to flash according to the instructions at this link - https://docs.nordicsemi.com/bundle/ncs-2.8.0/page/zephyr/services/debugging/coredump.html

I added this to my pm_static_my_board.yml

coredump_partition:
  address: 0xCF000
  size: 0x8000
  region: flash_primary

And this to my_board.overlay

&flash0 {
    /*
     * For more information, see:
     * http://docs.zephyrproject.org/latest/guides/dts/index.html#flash-partitions
     */
    partitions {
        compatible = "fixed-partitions";
        #address-cells = <1>;
        #size-cells = <1>;

      ...
        coredump_partition: partition@000080000 { // This is not a legitimate address (end of flash), but it is not taken into account because pm_static is used
            label = "coredump-partition";
            reg = <0x000080000 DT_SIZE_K(4)>;
        };
    };
};

As a side note, it is strange that I need to set this in the overlay, which is basically ignored because the pm_static partitions are the ones that actually matter (unless I got something wrong?).

And these configs to my prj.conf

# Coredump 
CONFIG_DEBUG_COREDUMP=y
CONFIG_DEBUG_COREDUMP_BACKEND_FLASH_PARTITION=y
CONFIG_DEBUG_COREDUMP_MEMORY_DUMP_THREADS=y

In my my_board/my_app/zephyr/.config I see these coredump-related configs

CONFIG_ARCH_SUPPORTS_COREDUMP=y
CONFIG_ARCH_SUPPORTS_COREDUMP_THREADS=y

# CONFIG_COREDUMP_DEVICE is not set

CONFIG_DEBUG_THREAD_INFO=y
CONFIG_DEBUG_COREDUMP=y
# CONFIG_DEBUG_COREDUMP_BACKEND_LOGGING is not set
CONFIG_DEBUG_COREDUMP_BACKEND_FLASH_PARTITION=y
# CONFIG_DEBUG_COREDUMP_BACKEND_OTHER is not set
# CONFIG_DEBUG_COREDUMP_MEMORY_DUMP_MIN is not set
CONFIG_DEBUG_COREDUMP_MEMORY_DUMP_THREADS=y
# CONFIG_DEBUG_COREDUMP_MEMORY_DUMP_LINKER_RAM is not set
CONFIG_DEBUG_COREDUMP_FLASH_CHUNK_SIZE=64
CONFIG_DEBUG_COREDUMP_THREADS_METADATA=y

I am generating a coredump using this implementation

void trigger_coredump(void)
{
    __ASSERT(0, "Forcing coredump");
}

When I try to read the flash area after generating the coredump with nrfjprog --memrd 0xCF000 --w 32 --n 0x8000
I get all 0xFF.

What am I missing?

I also tried to check myself by replacing CONFIG_DEBUG_COREDUMP_BACKEND_FLASH_PARTITION=y

with CONFIG_DEBUG_COREDUMP_BACKEND_LOGGING=y

hoping to see the coredump on my open RTT, but nothing shows up: when the coredump is triggered, the prints just stop.

What am I missing? Why can't I find a coredump on the flash partition or in the RTT log?
Could it be that the device does not have time to write the coredump before the actual crash? If so, how can I handle that?
Is there some automatic deletion of the coredump in the flash partition so that new coredumps can be saved, or is it something I have to manage myself after I read the coredump from flash?

Hope to hear from you soon

Best regards

Ziv

0 runsiv 9 months ago

Hi

I will look into your case. Just a quick question to start with. Are you using MCUBOOT also?
Regards

Runar

0 ziv123 9 months ago in reply to runsiv

Any news on this?

0 Tudor B. 8 months ago in reply to Tudor B.

Hey Vidar.

I finished implementing your proposal and in the end it worked like a charm! The implementation:

#include <string.h>

#include <zephyr/arch/cpu.h>
#include <zephyr/fatal.h>
#include <zephyr/sys/crc.h>
#include <zephyr/sys/printk.h>
#include <zephyr/sys/reboot.h>

#include "py/runtime.h" /* MicroPython: mp_obj_t, MP_DEFINE_CONST_FUN_OBJ_0 */

#define LOG_LEVEL CONFIG_LOG_DEFAULT_LEVEL
#include <zephyr/logging/log.h>
#include <zephyr/logging/log_ctrl.h>
LOG_MODULE_REGISTER(aaa);

/* Kept in the .noinit section so the values survive a warm reset.
 * No initializers: .noinit data is not initialized at startup.
 */
__noinit static struct arch_esf esf_keep;
__noinit static uint32_t esf_crc;

extern void sys_arch_reboot(int type);

void k_sys_fatal_error_handler(unsigned int reason, const struct arch_esf *esf)
{
    ARG_UNUSED(reason);
    (void)irq_lock();

    /* From Nordic DevZone example:
     * store the stack frame along with a checksum value in the __noinit section in RAM.
     */
    memcpy(&esf_keep, esf, sizeof(*esf));
    LOG_ERR("AAA! Faulting instruction address (r15/pc): 0x%08x", esf_keep.basic.pc);
    esf_crc = crc32_ieee((uint8_t *)&esf_keep, sizeof(esf_keep));

    /* From fatal_error.c: flush the log buffer, then reboot. */
    LOG_ERR("Resetting system");
    LOG_PANIC();
    sys_arch_reboot(0);

    CODE_UNREACHABLE;
}

/* Generate a fault by writing to an invalid, unaligned address. */
static mp_obj_t mp_do_fault(void)
{
    *(volatile uint32_t *)0xFFFFFFFF = 1;

    return mp_const_none; /* Not reached if the fault fires */
}
static MP_DEFINE_CONST_FUN_OBJ_0(mp_do_fault_obj, mp_do_fault);

/* Check for a stored fault and print the stored stack frame. */
static mp_obj_t mp_check_val(void)
{
    printk("Stored CRC val is: ~%u~ \r\n", esf_crc);

    uint32_t computed_crc = crc32_ieee((uint8_t *)&esf_keep, sizeof(esf_keep));
    printk("Computed CRC val is: ~%u~ \r\n", computed_crc);

    if (computed_crc == esf_crc) {
        printk("Exception stack frame:\n\r");
        printk("r0/a1:  0x%08x  r1/a2:  0x%08x  r2/a3:  0x%08x\n\r", esf_keep.basic.a1,
            esf_keep.basic.a2, esf_keep.basic.a3);
        printk("r3/a4:  0x%08x r12/ip:  0x%08x r14/lr:  0x%08x\n\r", esf_keep.basic.a4,
            esf_keep.basic.ip, esf_keep.basic.lr);
        printk("xpsr:  0x%08x  pc: 0x%08x\n\r", esf_keep.basic.xpsr, esf_keep.basic.pc);
        esf_crc = 0; /* Invalidate the CRC so the frame is reported only once */
    } else {
        printk("CRC mismatch\n\r");
    }

    return mp_const_none;
}
static MP_DEFINE_CONST_FUN_OBJ_0(mp_check_val_obj, mp_check_val);

And actually running it:

*** Booting nRF Connect SDK v3.0.0-3bfc46578e42 ***
*** Using Zephyr OS v4.0.99-a0e545cb437a ***
MicroPython v1.24.0-preview.485.ge5af5faf8.dirty on 2025-09-28; zephyr-nrf5340dk with nrf5340
Type "help()" for more information.
>>>
>>>
>>> import aaa as a
>>>
>>>
>>> a.dofault()
[00:00:08.744,354] <err> os: ***** USAGE FAULT *****
[00:00:08.750,000] <err> os:   Unaligned memory access
[00:00:08.760,162] <err> os: r0/a1:  0x00000000  r1/a2:  0x00000000  r2/a3:  0x00000001
[00:00:08.768,890] <err> os: r3/a4:  0xfffff000 r12/ip:  0x00041e43 r14/lr:  0x0002452d
[00:00:08.777,648] <err> os:  xpsr:  0x29100000
[00:00:08.782,928] <err> os: s[ 0]:  0x20004a28  s[ 1]:  0x00042e3b  s[ 2]:  0x00042e1f  s[ 3]:  0x20022088
[00:00:08.793,426] <err> os: s[ 4]:  0x0004f580  s[ 5]:  0x00000157  s[ 6]:  0x0004f4f8  s[ 7]:  0x00021477
[00:00:08.803,924] <err> os: s[ 8]:  0x0004f4f8  s[ 9]:  0x00000157  s[10]:  0x20022088  s[11]:  0x20006987
[00:00:08.814,422] <err> os: s[12]:  0x20022088  s[13]:  0x2000698b  s[14]:  0x0004e038  s[15]:  0x00041e51
[00:00:08.824,890] <err> os: fpscr:  0x20022088
[00:00:08.830,139] <err> os: Faulting instruction address (r15/pc): 0x0002a0d8
[00:00:08.838,104] <err> os: >>> ZEPHYR FATAL ERROR 31: Unknown error on CPU 0
[00:00:08.846,069] <err> os: Current thread: 0x20001c10 (mp_main)
[00:00:08.852,905] <err> aaa: AAA! Faulting instruction address (r15/pc): 0x0002a0d8
*** Booting nRF Connect SDK v3.0.0-3bfc46578e42 ***
*** Using Zephyr OS v4.0.99-a0e545cb437a ***
MicroPython v1.24.0-preview.485.ge5af5faf8.dirty on 2025-09-28; zephyr-nrf5340dk with nrf5340
Type "help()" for more information.
>>>
>>>
>>> import aaa as a
>>>
>>>
>>> a.checkval()
Stored CRC val is: ~396985589~
Computed CRC val is: ~396985589~
Exception stack frame:
r0/a1:  0x00000000  r1/a2:  0x00000000  r2/a3:  0x00000001
r3/a4:  0xfffff000 r12/ip:  0x00041e43 r14/lr:  0x0002452d
xpsr:  0x29100000  pc: 0x0002a0d8

Note that we're using MicroPython in our project. But the relevant C code excerpts can be easily extracted.
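
For anyone extracting the C part, the store-then-validate idea can also be tried on a host PC. This is only a minimal sketch under assumptions: `crash_record`, `record_store` and `record_is_valid` are hypothetical names, and the bitwise `crc32_ieee_sw` is a host stand-in for Zephyr's `crc32_ieee()`, not the Zephyr implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
 * Host stand-in for crc32_ieee(); slow but dependency-free.
 */
static uint32_t crc32_ieee_sw(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
        }
    }
    return ~crc;
}

/* Hypothetical record: eight saved registers plus a checksum over them. */
struct crash_record {
    uint32_t regs[8]; /* r0-r3, r12, lr, pc, xpsr */
    uint32_t crc;
};

/* Copy the registers in and stamp the record with their CRC. */
static void record_store(struct crash_record *rec, const uint32_t regs[8])
{
    memcpy(rec->regs, regs, sizeof(rec->regs));
    rec->crc = crc32_ieee_sw((const uint8_t *)rec->regs, sizeof(rec->regs));
}

/* A record counts as valid only if the stored CRC matches a fresh one. */
static bool record_is_valid(const struct crash_record *rec)
{
    return rec->crc == crc32_ieee_sw((const uint8_t *)rec->regs, sizeof(rec->regs));
}
```

The point of the checksum is that after a cold boot the retained RAM holds arbitrary values, so a matching CRC is what distinguishes a real stored frame from leftover noise.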

0 ziv123 8 months ago in reply to Vidar Berg

Hi Vidar

I took the approach of the merged solution you found, which uses nRF APIs instead of Zephyr's to collect the coredump (CD) - https://github.com/nrfconnect/sdk-nrf/pull/21418/files#diff-fcf14c4b7b34fe7a11916195871ae66a59be87a395f28db73e345ebdc828085b - and integrated it into my NCS.

It seems to be working fine with an assertion, and I get the crash log as expected.

But when I read the flash area where the CD was saved, I do not see what I expect: the initial bytes 0x5a, 0x45 are missing, and the whole thing looks like it is missing some data. In every 4 bytes, only one seems to be written with a value and the rest are zeros (the coredump I see when using Memfault does not look like this).

Any idea what is the reason for that?

Plus, I would really appreciate it if you could give some clarity on what I asked in my previous message regarding the weak API that is not overridden, the order in which things happen at a crash, and what happens to all the asserts in Zephyr code if assert is disabled.

Tudor B., since you were successful, I think that if you have further questions on the solution you adopted, it might be better to open a new thread, since we took different approaches to CD collection.

Hope to hear from you soon

best regards

Ziv

0 Vidar Berg 8 months ago in reply to ziv123
Thanks for the update, Tudor B. I'm glad to see you got it working. Just want to add that the noinit section may be overwritten on startup if you include a bootloader. One solution is to place the data at the end of RAM and reduce the RAM size in the sram0 devicetree node to ensure the area remains unallocated.

&sram0 { reg = <0x20000000 (DT_SIZE_K(<total RAM size in kB>) - <size of area you want to set aside for crash data>)>; };

ziv123 said:
any idea what is the reason for that ?

Hard to say what's causing this. I suggest you try this in a minimal sample in the latest SDK to see if you are able to reproduce the same.
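
While reproducing, a quick sanity check on the bytes read back from flash may help narrow it down. The sketch below is only an illustration with hypothetical helper names: it looks for the 0x5A, 0x45 ('Z', 'E') pair mentioned above and checks whether a region is still fully erased. A storage backend may put its own header in front of the dump, so I would treat a non-zero offset as a hint rather than an error, and a match could in principle be coincidental.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return the offset of the first 0x5A, 0x45 ('Z', 'E') pair, or -1 if absent. */
static long find_coredump_marker(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i + 1 < len; i++) {
        if (buf[i] == 0x5A && buf[i + 1] == 0x45) {
            return (long)i;
        }
    }
    return -1;
}

/* Return true if every byte is 0xFF, i.e. the region was erased and never written. */
static bool region_is_erased(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0xFF) {
            return false;
        }
    }
    return true;
}
```

Running both over a copy of the partition contents separates three cases: nothing was written at all, something was written but without the expected marker, or the marker is present at some offset.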

ziv123 said:
plus, i would really appreciate if you can give some clarity on what i asked in prev message regarding the weak api that is not overwritten and about the order in which things are happening at crash and what about all the asserts in zephyr code if assert i

You need to use LOG_PANIC() to be able to flush the log buffer from the fault handler.

0 Tudor B. 8 months ago in reply to Vidar Berg

Sorry Ziv, it seems Vidar brought up a valid point.

Going with your example Vidar, I would foresee something like:

&sram0 {
   reg = <0x20000000 (DT_SIZE_K(448) - 150)>;
};

This raises an interesting question: why don't I have the full 512 KB of RAM?

I set the RAM config to 448 KB total since, whatever configs I use, I get:

[650/650] Linking C executable zephyr/zephyr.elf
Memory region         Used Size  Region Size  %age Used
           FLASH:      402024 B         3 MB     12.78%
             RAM:      190868 B       448 KB     41.61%
        IDT_LIST:          0 GB        32 KB      0.00%

I don't have TF-M or MCUboot enabled. We're currently using the nrf5340dk and nrf7002dk. Where are the 64 KB disappearing to?

0 Vidar Berg 8 months ago in reply to Tudor B.

The "missing" 64k is allocated to the shared RAM region and is used for communication with the network core.
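
The numbers can be double-checked with a few lines of C. This is just a sketch of the arithmetic: `SIZE_K` mirrors what `DT_SIZE_K()` does (kB to bytes), and the 150-byte reserved area is the hypothetical size used earlier in the thread. Note that `DT_SIZE_K()` takes a value in kB, so 448 kB is `DT_SIZE_K(448)`, which equals 458752 bytes.

```c
#include <stdint.h>

/* Same arithmetic as DT_SIZE_K(): kB to bytes. Redefined here for host use. */
#define SIZE_K(x) ((x) * 1024u)

enum {
    TOTAL_RAM_KB = 512,   /* full RAM size mentioned above */
    SHARED_RAM_KB = 64,   /* shared RAM region used for the network core */
    RESERVED_BYTES = 150, /* hypothetical crash-data area */
};

/* RAM reported by the linker: total minus the shared region. */
static uint32_t app_ram_bytes(void)
{
    return SIZE_K(TOTAL_RAM_KB - SHARED_RAM_KB);
}

/* Size to put in the sram0 reg property once the crash area is set aside. */
static uint32_t sram0_size_after_reserve(void)
{
    return app_ram_bytes() - RESERVED_BYTES;
}
```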

0 Tudor B. 8 months ago in reply to Vidar Berg
Thank you for the explanation.

Beyond the configuration:

&sram0 { reg = <0x20000000 (DT_SIZE_K(448) - 150)>; };

How would I set the variables in those 150 bytes that are reserved?

0 Vidar Berg 8 months ago in reply to Tudor B.

You can find the end address of RAM programmatically using symbols generated by the partition manager in pm_config.h, for example, PM_SRAM_PRIMARY_END_ADDRESS. Then use this as the start address when copying in the crash data.
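
A rough sketch of the copy step, under assumptions: `crash_area_write` and `CRASH_AREA_SIZE` are hypothetical names, and the area pointer is passed in as a parameter so the logic can be tried on a host. On target, the pointer would be `(void *)PM_SRAM_PRIMARY_END_ADDRESS`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical size of the area carved out of sram0 (the 150 bytes above). */
#define CRASH_AREA_SIZE 150u

/* Copy crash data into the reserved area, refusing anything that would not fit. */
static bool crash_area_write(void *area, const void *data, size_t len)
{
    if (area == NULL || data == NULL || len > CRASH_AREA_SIZE) {
        return false;
    }

    memcpy(area, data, len);
    return true;
}
```

The size check matters because nothing else guards this region: the linker no longer knows about it, so writing past its end would run off the end of RAM.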