Zephyr app at custom location causing hard fault

Hi all,

I am using an nRF52832. I have a custom bootloader because the device must be able to receive an update file via BLE even if the main application is broken. There is also no space for two slots for the main app.

So there is just the custom bootloader plus the main app. The bootloader seems to be working fine. The main app also works as expected when I use a minimal example (blinky with a button triggering an interrupt). However, once I enable BT in my config with CONFIG_BT=y and CONFIG_BT_PERIPHERAL=y, the app crashes very early in Zephyr's startup with a hard fault.

My setup is like this:

Overlay flash partition used for main app.

/ {
    chosen {
        zephyr,code-partition = &app_partition;
    };

    aliases {
        flash0 = &flash0;
    };
};

&flash0 {
    partitions {
        compatible = "fixed-partitions";

        boot_partition: partition@0 {
            label = "bootloader";
            reg = <0x00000000 0x00030000>;
        };

        app_partition: partition@30000 {
            label = "app";
            reg = <0x00030000 0x00050000>;
        };
    };
};

Config of main app:

CONFIG_SERIAL=y

CONFIG_CPP=y
CONFIG_STD_CPP17=y

CONFIG_BT=y
CONFIG_BT_PERIPHERAL=y
CONFIG_BT_DEVICE_NAME="name"

CONFIG_USE_DT_CODE_PARTITION=y
CONFIG_INIT_ARCH_HW_AT_BOOT=y
CONFIG_MPU_ALLOW_FLASH_WRITE=y

If the debugger can be trusted, then the fault occurs somewhere in system_nrf52.c near:

    #if NRF52_ERRATA_12_ENABLE_WORKAROUND
        /* Workaround for Errata 12 "COMP: Reference ladder not correctly calibrated" found at the Errata document
           for your device located at https://infocenter.nordicsemi.com/index.jsp */
        if (nrf52_errata_12()){
            *(volatile uint32_t *)0x40013540 = (*(uint32_t *)0x10000324 & 0x00001F00) >> 8;
        }
    #endif

I am not sure what causes this problem or how to debug it further. Maybe I am doing something wrong. Any help will be appreciated. Thanks!

  • Please check the laser markings on the chip on your board to find out which memory variant the SoC has.
    The details of the chip revisions and their memory sizes are given here

  • Looks like n52832QFAAG02208AJ, which should have 512 kB of flash.

  • OK, then we look somewhere else.

    Mark G. said:
    OK. Interestingly, the hard fault seems to occur once the offset (0x00030000) + size is bigger than 256 KB, which is very telling.

    Can you please tell me how you came to this conclusion? What debug info did you see, and what tests did you run?

    I suspect that if you reached this conclusion with a debugger, the debug info might be misleading if there is stack corruption.

    However, once I enable BT in my Config with CONFIG_BT=y and CONFIG_BT_PERIPHERAL=y then the app crashes in a very early startup phase in Zephyr, causing a hard fault.

    I think this smells like stack corruption. If you have enabled some features while keeping minimal stack sizes, I recommend increasing the stack sizes, such as CONFIG_MAIN_STACK_SIZE, CONFIG_BT_RX_STACK_SIZE, and CONFIG_BT_HCI_TX_STACK_SIZE, in your prj.conf (double them for testing) and checking whether the issue persists. If it does, we can rule out insufficient RAM and stack and look again at your partition manager settings.
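
    As a rough illustration of the stack-size test suggested above, a prj.conf fragment could look like the following. The numbers are placeholders, not recommendations: the assumption is that you look up the current values in the generated build/zephyr/.config and double them for the test.

```
# Placeholder values (assumption): replace each with twice the value
# currently found in build/zephyr/.config for your build.
CONFIG_MAIN_STACK_SIZE=4096
CONFIG_BT_RX_STACK_SIZE=4096
CONFIG_BT_HCI_TX_STACK_SIZE=2048
```

    If the hard fault is unchanged with the larger stacks, insufficient stack is unlikely to be the cause.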

  • Thanks for the quick reply!

    I am testing with a minimal "blinky"-like main.cpp that is the only source for this build. I set the partition and its start address, then tested by varying the offset addresses in flash and by enabling different features in the config.

    I look at the output size of the binary:

    [100%] Linking CXX executable trigger.elf
    Memory region         Used Size  Region Size  %age Used
               FLASH:      163508 B       320 KB     49.90%
                 RAM:       23428 B        64 KB     35.75%
            IDT_LIST:          0 GB        32 KB      0.00%

    and then add the flash offset to it. By testing different combinations of start address and size, I found that a total of 256 KB is the boundary between working and not working.

    I did further testing with Bluetooth completely disabled. By controlling the app's flash usage with a constant array, I could reproduce the issue.

    #include <cstddef>
    #include <cstdint>

    // Occupy 141 KB of flash; together with the offset this exceeds 256 KB.
    constexpr uint8_t magicData[1024 * 141] = {1, 2}; // leads to the error

    // Results in slightly less than 256 KB of flash in total.
    // constexpr uint8_t magicData[1024 * 140] = {1, 2}; // no error!
    
    volatile uint8_t sum = 0;
    
    int main(void)
    {
        for (size_t i = 0; i < sizeof(magicData); ++i) {
            sum += magicData[i];
        }
        
        ....

    As soon as more than 256 KB is occupied, it fails.

    So it is pretty clear that this is not related to BT, but to the partition / memory management.
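
    The "offset plus used size" arithmetic described above can be written down as a small host-side check. This is only a sketch: the constants are copied from the overlay and the linker output quoted in this thread, while the helper names (imageEnd, crossesBoundary) are made up for illustration and are not part of Zephyr or the nRF SDK.

```cpp
#include <cstdint>

// Values copied from the overlay and linker output quoted in this thread.
constexpr std::uint32_t kAppOffset = 0x00030000; // start of app_partition
constexpr std::uint32_t kAppSize   = 0x00050000; // size of app_partition
constexpr std::uint32_t kUsedFlash = 163508;     // "FLASH: 163508 B" from the linker
constexpr std::uint32_t kBoundary  = 256 * 1024; // observed working/failing boundary

// Hypothetical helper: first flash address past the end of the image.
constexpr std::uint32_t imageEnd(std::uint32_t offset, std::uint32_t used) {
    return offset + used;
}

// Hypothetical helper: true if the image extends beyond the boundary address.
constexpr bool crossesBoundary(std::uint32_t offset, std::uint32_t used,
                               std::uint32_t boundary) {
    return imageEnd(offset, used) > boundary;
}

// The partition itself ends exactly at the end of the 512 KB flash.
static_assert(kAppOffset + kAppSize == 512 * 1024, "partition end mismatch");

// The BT-enabled build ends at 0x57EB4, which is above 256 KB.
static_assert(imageEnd(kAppOffset, kUsedFlash) == 0x57EB4, "image end mismatch");
static_assert(crossesBoundary(kAppOffset, kUsedFlash, kBoundary),
              "image expected to cross the 256 KB boundary");
```

    With these numbers the image ends at 0x57EB4 (roughly 352 KB), above the 256 KB boundary, which is consistent with the BT-enabled build failing.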
