Having issues saving a coredump to flash (or at all)

Hi Nordic

I am working with the nRF52840 and nRF52832 with NCS v2.8.0.

I am trying to save a coredump to flash, following the instructions at this link: https://docs.nordicsemi.com/bundle/ncs-2.8.0/page/zephyr/services/debugging/coredump.html

I added this to my pm_static_my_board.yml:

coredump_partition:
  address: 0xCF000
  size: 0x8000
  region: flash_primary

And this to my_board.overlay:

&flash0 {
    /*
     * For more information, see:
     * http://docs.zephyrproject.org/latest/guides/dts/index.html#flash-partitions
     */
    partitions {
        compatible = "fixed-partitions";
        #address-cells = <1>;
        #size-cells = <1>;

        ...
        /* NOT a valid address (end of flash), but it is not taken into account because the pm_static layout is used */
        coredump_partition: partition@80000 {
            label = "coredump-partition";
            reg = <0x80000 DT_SIZE_K(4)>;
        };
    };
};

As a side note, it seems strange that I need to define the partition in the overlay at all, since the overlay is basically ignored and the pm_static partitions are what actually matter (unless I got something wrong?).
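A minimal sketch of a sanity check for the static layout. It assumes that Partition Manager exposes the partition from pm_static_my_board.yml in the generated pm_config.h as PM_COREDUMP_PARTITION_ADDRESS and PM_COREDUMP_PARTITION_SIZE (the PM_<NAME>_ADDRESS / PM_<NAME>_SIZE naming); in a host build it falls back to the values above. The 4 KB page size and the flash sizes (1 MB for the nRF52840, 512 KB for the nRF52832) are nRF52 datasheet values; partition_fits() and the macro names introduced here are hypothetical, not NCS APIs.

```c
#include <stdbool.h>
#include <stdint.h>

#ifdef __ZEPHYR__
/* Assumed Partition Manager macro names for "coredump_partition" */
#include <pm_config.h>
#define COREDUMP_ADDR PM_COREDUMP_PARTITION_ADDRESS
#define COREDUMP_SIZE PM_COREDUMP_PARTITION_SIZE
#else
/* Host build: values copied from pm_static_my_board.yml */
#define COREDUMP_ADDR 0xCF000u
#define COREDUMP_SIZE 0x8000u
#endif

#define NRF52_PAGE_SIZE 0x1000u   /* 4 KB flash pages */
#define NRF52840_FLASH  0x100000u /* 1 MB */
#define NRF52832_FLASH  0x80000u  /* 512 KB */

/* Flash is erased page by page, so the partition must start and end on a page boundary */
_Static_assert(COREDUMP_ADDR % NRF52_PAGE_SIZE == 0, "coredump start not page aligned");
_Static_assert(COREDUMP_SIZE % NRF52_PAGE_SIZE == 0, "coredump size not page aligned");

/* Hypothetical helper: true if [addr, addr + size) lies entirely inside the flash */
static bool partition_fits(uint32_t addr, uint32_t size, uint32_t flash_size)
{
    return size != 0 && addr < flash_size && size <= flash_size - addr;
}
```

With these values, partition_fits(COREDUMP_ADDR, COREDUMP_SIZE, NRF52840_FLASH) is true, but the same call with NRF52832_FLASH is false: 0xCF000 lies beyond the end of the nRF52832's 512 KB flash, so this layout would only work on the nRF52840.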

And these options to my prj.conf:

# Coredump 
CONFIG_DEBUG_COREDUMP=y
CONFIG_DEBUG_COREDUMP_BACKEND_FLASH_PARTITION=y
CONFIG_DEBUG_COREDUMP_MEMORY_DUMP_THREADS=y

In my_board/my_app/zephyr/.config I see these coredump-related options:

CONFIG_ARCH_SUPPORTS_COREDUMP=y
CONFIG_ARCH_SUPPORTS_COREDUMP_THREADS=y

# CONFIG_COREDUMP_DEVICE is not set

CONFIG_DEBUG_THREAD_INFO=y
CONFIG_DEBUG_COREDUMP=y
# CONFIG_DEBUG_COREDUMP_BACKEND_LOGGING is not set
CONFIG_DEBUG_COREDUMP_BACKEND_FLASH_PARTITION=y
# CONFIG_DEBUG_COREDUMP_BACKEND_OTHER is not set
# CONFIG_DEBUG_COREDUMP_MEMORY_DUMP_MIN is not set
CONFIG_DEBUG_COREDUMP_MEMORY_DUMP_THREADS=y
# CONFIG_DEBUG_COREDUMP_MEMORY_DUMP_LINKER_RAM is not set
CONFIG_DEBUG_COREDUMP_FLASH_CHUNK_SIZE=64
CONFIG_DEBUG_COREDUMP_THREADS_METADATA=y

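I am triggering a coredump with this function:

#include <zephyr/sys/__assert.h>

void trigger_coredump(void)
{
    /* Only has an effect when CONFIG_ASSERT=y; otherwise __ASSERT() compiles to nothing */
    __ASSERT(0, "Forcing coredump");
}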

When I read the flash area after generating the coredump with nrfjprog --memrd 0xCF000 --w 32 --n 0x8000, I get all 0xFF.

What am I missing?
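All 0xFF is what erased NOR flash reads back as, so the readout suggests that nothing was written to that range. A small host-side helper for classifying a readout follows. It assumes, from my reading of Zephyr's subsys/debug/coredump/coredump_backend_flash_partition.c (worth double-checking against the v2.8.0 sources), that the backend's header starts with the ASCII id 'C','D'; classify_dump() and enum dump_state are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum dump_state {
    DUMP_ERASED,     /* every byte is 0xFF: nothing was written */
    DUMP_HAS_HEADER, /* starts with the assumed 'C','D' header id */
    DUMP_UNKNOWN,    /* something was written, but not in the expected layout */
};

/* Classify the raw bytes read back from the coredump partition */
static enum dump_state classify_dump(const uint8_t *buf, size_t len)
{
    bool erased = true;

    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0xFF) {
            erased = false;
            break;
        }
    }
    if (erased) {
        return DUMP_ERASED;
    }
    /* Assumption: the flash backend header begins with the ASCII id "CD" */
    if (len >= 2 && buf[0] == 'C' && buf[1] == 'D') {
        return DUMP_HAS_HEADER;
    }
    return DUMP_UNKNOWN;
}
```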

As a check, I also tried replacing CONFIG_DEBUG_COREDUMP_BACKEND_FLASH_PARTITION=y

with CONFIG_DEBUG_COREDUMP_BACKEND_LOGGING=y,

hoping to see the coredump in my open RTT session, but nothing appears: when the coredump is triggered, the output just stops.

  1. What am I missing? Why can't I find a coredump in the flash partition or in the RTT log?
  2. Could it be that the device does not have time to write the coredump before the actual crash? If so, how can I handle that?
  3. Is the coredump partition cleared automatically so that new coredumps can be saved, or do I have to erase it myself after reading the coredump from flash?

Hope to hear from you soon

Best regards

Ziv

  • Well, we save the CD, then reset, read the reset reason, check the external flash for a CD, and save a correlated file with the reset reason next to it. Something like:

    crash_nr_231_CD.txt

    crash_nr_231_reset_reason.txt

    This also raises another question: can we also save other things at the moment the hard fault happens, such as sensor states, total runtime, or battery level?

    But again, since you said that CD can't really save to external flash, should we simply override the implementation of k_sys_fatal_error_handler() and disable CD completely?

    I didn't ignore your point, Ziv, but this ties into my main question to Vidar: can CD write to external flash? Can CD be extended to write to external flash and to write custom data? If not, is the best course of action to build the mechanism ourselves inside k_sys_fatal_error_handler()?

  • We use asserts a lot in our code, and Zephyr also uses them internally, which is why this seems weird to me and also why I try to avoid it, plus I don't seem to be getting to

    Please read my previous comments where I try to explain what the issue is. If you are going to have ASSERTs enabled, you must use the flash storage backend introduced by the commit I linked.

    I cannot use RAM for saving logs or CD since the devices in the field are configured with logs disabled and I need

    My suggestion is to not use the CD functionality at all but rather store relevant information to RAM from the k_sys_fatal_error_handler(). What you do with the RAM content on subsequent reboot is up to you. You can store it to flash, transfer it over BLE, etc.

  • store relevant information to RAM from the k_sys_fatal_error_handler(). What you do with the RAM content on subsequent reboot is up to you. You can store it to flash, transfer it over BLE, etc.

    I think there is something fundamental I am missing here. As far as I know, whatever I save in RAM, to a variable or anywhere else, is gone after a reset, right? So if I do not use logs and can only get information about the crash from the device via OTA after it resets back to normal, how does saving things to RAM help me?

  • Hey Ziv.

    There is a special no_init area of RAM that persists through a hot (soft) reset. Hot reset = reset command to the MCU; cold reset = complete power cycle. There is a CONFIG_ option that makes the system do a hot reset on stack overflows/hard faults, so the no_init area of RAM is kept.

  • Please read my previous comments where I try to explain what the issue is

    I read the issue here https://github.com/zephyrproject-rtos/zephyr/issues/59116 and the solution here https://github.com/nrfconnect/sdk-nrf/pull/21418, which I may try to integrate into our NCS v2.8.0 until we update to NCS v3.1.1.

    What I still don't understand is the order of things:

    If I disable asserts, I see that I enter z_arm_fault() and then z_fatal_error(), which calls coredump() and then k_sys_fatal_error_handler(). So how can it be overwritten, if with asserts enabled I never get into z_arm_fault() or z_fatal_error()?

    If I enable asserts and try to use coredump only with logging (CONFIG_DEBUG_COREDUMP_BACKEND_LOGGING or CONFIG_DEBUG_COREDUMP_MEMORY_DUMP_LINKER_RAM) rather than flash, I just don't see anything. Does the assertion trigger some hardware mechanism before the Zephyr APIs can even act? If so, why do I get into z_arm_fault() and z_fatal_error() when an assert fires with asserts enabled but coredump not configured?

    Also, with asserts enabled, why don't I reach the Zephyr APIs even when I crash with k_panic(), for example?

    Sorry for being a nag about this, but I am really trying to understand the order in which things happen when a crash occurs.

    P.S. Zephyr and NCS also use asserts extensively. What happens to those asserts when asserts are disabled: are the checks just skipped? Is it valid practice to enable asserts only for on-the-bench development and to build without asserts for deployed devices?

    I tried to override the crash handling with my own implementation of this:

    #include <zephyr/kernel.h>
    #include <zephyr/fatal.h>
    #include <zephyr/logging/log.h>

    LOG_MODULE_REGISTER(fatal_override);

    void k_sys_fatal_error_handler(unsigned int reason, const struct arch_esf *esf)
    {
        ARG_UNUSED(esf);

        /* Flush deferred log messages now; the log thread may never run again */
        LOG_PANIC();
        LOG_ERR(">>> Fatal error %u trapped in app override!", reason);

        /* Option 1: loop forever (easy for breakpoints) */
        while (1) {
        }

        /* Chaining to z_arm_fault() does not work: it is the fault entry
         * point that ends up calling this handler. */
    }

    But I don't see that it actually overrides anything: I don't see its prints, and I do see the z_fatal_error prints.

    There is a special no_init area of the RAM

    Thanks, but our application already uses most of the available RAM, so giving some of it up just for saving crashes is not a valid option.

    Best regards

    Ziv
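A minimal sketch of the no_init RAM approach suggested in this thread: record a few words from k_sys_fatal_error_handler(), warm-reset, and on the next boot turn the record into the correlated files mentioned earlier (crash_nr_231_CD.txt and crash_nr_231_reset_reason.txt). The record costs 12 bytes of RAM. Assumptions: Zephyr's __noinit section attribute, and that a warm reset through sys_reboot(SYS_REBOOT_WARM) keeps RAM contents on the nRF52 (they do not survive a power cycle). struct crash_record, CRASH_MAGIC and the helper functions are hypothetical names. If NCS's CONFIG_RESET_ON_FATAL_ERROR is enabled, it may already provide k_sys_fatal_error_handler(), so check for a clash. In a host build, __ZEPHYR__ is undefined and the same logic runs with ordinary static storage.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#ifdef __ZEPHYR__
#include <zephyr/kernel.h>
#include <zephyr/fatal.h>
#include <zephyr/linker/section_tags.h>
#include <zephyr/sys/reboot.h>
#define CRASH_NOINIT __noinit /* not zeroed at boot; lost on a power cycle */
#else
#define CRASH_NOINIT /* host build: ordinary static storage */
#endif

#define CRASH_MAGIC 0xC0DEDEADu /* hypothetical "record valid" marker */

/* Hypothetical crash record: 3 x 4 bytes = 12 bytes of RAM */
struct crash_record {
    uint32_t magic;     /* CRASH_MAGIC when the record is valid */
    uint32_t reason;    /* reason passed to k_sys_fatal_error_handler() */
    uint32_t uptime_ms; /* uptime at the moment of the crash */
};

static CRASH_NOINIT struct crash_record crash_rec;

/* Fault context: only plain RAM writes, no flash, BLE or logging */
static void crash_record_save(uint32_t reason, uint32_t uptime_ms)
{
    crash_rec.reason = reason;
    crash_rec.uptime_ms = uptime_ms;
    crash_rec.magic = CRASH_MAGIC; /* mark the record valid */
}

/* Boot: copy out a pending record (if any) and wipe it so it is reported once */
static bool crash_record_take(struct crash_record *out)
{
    if (crash_rec.magic != CRASH_MAGIC) {
        return false; /* nothing recorded since the last take, or RAM lost power */
    }
    *out = crash_rec;
    memset(&crash_rec, 0, sizeof(crash_rec));
    return true;
}

/* Builds the correlated file names, e.g. crash_nr_231_CD.txt */
static void crash_file_names(uint32_t crash_nr, char *cd, size_t cd_len,
                             char *rr, size_t rr_len)
{
    snprintf(cd, cd_len, "crash_nr_%u_CD.txt", (unsigned int)crash_nr);
    snprintf(rr, rr_len, "crash_nr_%u_reset_reason.txt", (unsigned int)crash_nr);
}

#ifdef __ZEPHYR__
void k_sys_fatal_error_handler(unsigned int reason, const struct arch_esf *esf)
{
    ARG_UNUSED(esf);
    crash_record_save(reason, k_uptime_get_32());
    sys_reboot(SYS_REBOOT_WARM); /* reset without a power cycle */
}
#endif
```

On boot, call crash_record_take() once; if it returns true, the record can be written to external flash under the next crash number, next to the reset reason, or sent over BLE, outside the fault context.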
