
DFU-upgraded app won't boot

Hi,

When upgrading via DFU with an app, everything seemingly goes fine: all init and app bytes are received, validated, and so on. The DFU does its resets when it is done updating the settings page and activating the new app firmware. Something is wrong, though, because after jump_to_addr() we end up somewhere that is not the start of my application.

Device: nRF52840

SDK: 16.0.0

Ported secure dfu: pca10056_uart_debug example

DFU Modification: Added a CAN transport instead of the UART transport, without SLIP encoding. I have not touched any other file except sdk_config.h, to enable the SPI interface to the CAN controller.

App example for tests: blinky_PCA10056_mbr

Package generation: nrfutil pkg generate --key-file myKey.pem --application blinky_myBoard_mbr.hex --hw-version 52 --sd-req 0 --application-version 1 blinky_myBoard_mbr.zip

IDE: Segger Studio

nrfutil version: 6.0.1

My real application uses the SoftDevice, but I present this simple MBR example, which shows the same error behaviour. Otherwise, I have successfully generated an SD package and an App package and sent them over CAN to the DFU one after the other, and all is good until the jump to address 0x1000, which fails.

I think this looks like a linking problem, but I have not been able to find what I'm doing wrong.

  • Hi Feran, 

     

    When upgrading via DFU with an app, everything seemingly goes fine: all init and app bytes are received, validated, and so on. The DFU does its resets when it is done updating the settings page and activating the new app firmware. Something is wrong, though, because after jump_to_addr() we end up somewhere that is not the start of my application.

    Are you able to debug the bootloader, set a breakpoint at the jump_to_addr() call, and then step through the code to see where you end up? It would also be useful if you could read out the memory at this stage, so we can see whether the new application has been correctly placed at 0x1000 and upwards. You can read back the flash using the following nrfjprog command:

    nrfjprog --memrd 0x0000 --n 1048576 >> flash_dump.txt

    This will dump the flash in a readable format to flash_dump.txt, which will be located in the folder you execute nrfjprog from.

    My real application uses the SoftDevice, but I present this simple MBR example, which shows the same error behaviour.

     Was there supposed to be an application attached to the ticket? Or should I just use the blinky_PCA10056_mbr example from SDK v16.0.0, i.e. nRF5_SDK_16.0.0_98a08e2\examples\peripheral\blinky\pca10056\mbr ? 

     

     Best regards

    Bjørn
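
To make the check suggested above more concrete: on a Cortex-M device, the first word of a vector table holds the initial main stack pointer and the second word holds the reset handler address. With the application placed at 0x1000, those two words are what one would inspect in the dump. The following is a minimal sketch, assuming the flash contents are already available as a raw little-endian byte buffer (the conversion from the nrfjprog text dump is not shown). The type and function names are hypothetical and are not part of the nRF5 SDK.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type (not an SDK type): the first two words of a
 * Cortex-M vector table. */
typedef struct {
    uint32_t initial_msp;   /* word 0: initial main stack pointer */
    uint32_t reset_handler; /* word 1: reset handler address (bit 0 set for Thumb) */
} vector_table_head_t;

/* Read one little-endian 32-bit word from a raw byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Extract the two words at table_offset. Returns 0 on success, -1 if a
 * pointer is NULL or the image is too short to hold both words. */
static int read_vector_table_head(const uint8_t *image, size_t image_len,
                                  size_t table_offset, vector_table_head_t *out)
{
    if (image == NULL || out == NULL) {
        return -1;
    }
    if (table_offset > image_len || image_len - table_offset < 8u) {
        return -1;
    }
    out->initial_msp   = read_le32(image + table_offset);
    out->reset_handler = read_le32(image + table_offset + 4u);
    return 0;
}

/* Rough sanity check: returns 1 if the values look plausible, 0 otherwise.
 * The RAM and flash bounds are supplied by the caller (an assumption of
 * this sketch); they are not read from the chip. */
static int vector_table_head_is_plausible(const vector_table_head_t *h,
                                          uint32_t ram_start, uint32_t ram_end,
                                          uint32_t flash_end)
{
    if (h->initial_msp < ram_start || h->initial_msp > ram_end) {
        return 0; /* stack pointer outside RAM, e.g. erased flash (0xFFFFFFFF) */
    }
    if ((h->initial_msp & 0x3u) != 0u) {
        return 0; /* stack pointer not word aligned */
    }
    if ((h->reset_handler & 0x1u) == 0u) {
        return 0; /* Thumb bit missing from the handler address */
    }
    if (h->reset_handler >= flash_end) {
        return 0; /* handler address outside flash */
    }
    return 1;
}
```

For an nRF52840 one might call read_vector_table_head(image, image_len, 0x1000, &head) and then vector_table_head_is_plausible(&head, 0x20000000, 0x20040000, 0x00100000), where the bounds reflect the part's 256 kB of RAM and 1 MB of flash. A failed check only hints that the application image is missing or misplaced; it does not prove the image is correct when it passes.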

  • Hi Bjørn!

    Yes, I have been able to debug, and in the meantime I have simplified the procedure to reproduce this.

    I will present this procedure shortly, but regarding my previous test I can add that I did a binary diff between the resulting flash on the PCA10056 and on my board, and could see that the MBR, App, and Settings areas were all at exactly the same addresses, with the same size and content. The only difference was the two bootloaders, which obviously had different sizes, though they started from the same address (0xe4000).

    When debugging the faulty bootloader at jump_to_addr, I observed that it is called as jump_to_addr(0x2004000, 0xFFFFFFFF, 0x1375). I find the last address strange.

    To the simpler case, that I attach hex files for in this post:

    DFU: nRF5_SDK_16.0.0_98a08e2\examples\dfu\secure_bootloader\pca10056_uart_debug\ses

    DFU-modification: Replaced dfu_public_key.c array with my own throw-away key generated with nrfutil

    App: nRF5_SDK_16.0.0_98a08e2\examples\peripheral\blinky\pca10056\mbr\ses

    App-modifications: None

    Building both projects as is, flashing the bootloader, and then uploading the package built from the App hex file with

    nrfutil dfu serial -pkg blinky_pca10056_mbr.zip -p COM10 -b 115200

    works fine.

    But adding a Debug configuration (on the Solution level) to the same DFU project, configured in the emproject file as follows:

    <configuration Name="Debug"
    c_preprocessor_definitions="DEBUG ;DEBUG_NRF;NRF_DFU_DEBUG_VERSION;" />

    reproduces the fault, exactly as described earlier.

    This should be easily reproduced by you guys. I have not compared the two hex contents attached here but I would guess the only difference is in the bootloader area.

    By the way, a related question: what exactly does the NDEBUG symbol defined in the Release configuration do in the pca10056_uart_debug project? I cannot find any references to it in the code, but does the compiler or linker do anything with it?

     

    The file ending with "debug-conf.hex" is the faulty one resulting from building with the Debug configuration.

    pca10056_uart-sec-dfu_blinky-mbr-upgraded.hex
    pca10056_uart-sec-dfu_blinky-mbr-upgraded_debug-conf.hex

  • I forgot a detail. In the faulty case, the addresses visited after the jump function are: a60->a62->a64->a66->135e, and then execution is stuck there.

  • Hi Feran, 

    I am able to reproduce the behavior you are seeing, and I have traced the root cause to the jump_to_addr() function, which is written in assembly:

    __STATIC_INLINE void jump_to_addr(uint32_t new_msp, uint32_t new_lr, uint32_t addr)
    {
        __ASM volatile ("MSR MSP, %[arg]" : : [arg] "r" (new_msp));
        __ASM volatile ("MOV LR,  %[arg]" : : [arg] "r" (new_lr) : "lr");
        __ASM volatile ("BX       %[arg]" : : [arg] "r" (addr));
    }

    It turns out that the assembly implementation of jump_to_addr does not behave well when optimization is turned off.

    The call to jump_to_addr places the parameters in the CPU registers r0-r2. The issue starts with the first assembly line in jump_to_addr, which sets the stack pointer to the SoftDevice's stack pointer. This should not be a problem, since all the parameters are already held in CPU registers. However, when the next two assembly lines are executed, the parameters are fetched relative to the stack pointer with offsets, not from the CPU registers as intended. Since the stack pointer has been updated, the address loaded for the branch instruction is just 0xFFFFFF and not the address of the SoftDevice reset handler.

    If you set optimization to Level 3 or Optimize for Size, you'll see that the CPU registers are used instead of fetching the values from the stack relative to the stack pointer.

    Best regards

    Bjørn
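
The failure mode described above can be illustrated with a toy model. This is only a sketch in plain C, not real ARM code generation: it assumes a tiny array standing in for stack memory and a stack pointer kept as an index into it, and all names are hypothetical. One function mimics unoptimized code that spills its arguments to stack slots and reloads each operand relative to the stack pointer; the other mimics optimized code that keeps the arguments in registers.

```c
#include <stdint.h>

#define TOY_STACK_WORDS 16u

/* Toy model only: a few words of "stack memory" and a stack pointer kept
 * as a word index into that array. */
typedef struct {
    uint32_t stack[TOY_STACK_WORDS];
    uint32_t sp;
} toy_cpu_t;

/* Returns 1 if a three-word frame starting at sp fits in the toy stack. */
static int toy_frame_fits(uint32_t sp)
{
    return sp <= TOY_STACK_WORDS - 3u;
}

/* Mimics unoptimized code: arguments are stored in the current frame and
 * every later use reloads them from [sp + offset]. Returns the value that
 * would be used as the branch target, or 0 if a frame does not fit. */
static uint32_t toy_branch_target_spilled(toy_cpu_t *cpu, uint32_t new_sp,
                                          uint32_t new_lr, uint32_t addr)
{
    if (!toy_frame_fits(cpu->sp) || !toy_frame_fits(new_sp)) {
        return 0u;
    }

    /* Prologue: spill the three arguments to the current frame. */
    cpu->stack[cpu->sp + 0u] = new_sp;
    cpu->stack[cpu->sp + 1u] = new_lr;
    cpu->stack[cpu->sp + 2u] = addr;

    /* Step 1 (stands in for "MSR MSP"): the stack pointer now refers to a
     * different frame. */
    cpu->sp = cpu->stack[cpu->sp + 0u];

    /* Step 2 (stands in for "MOV LR"): operand reloaded relative to the
     * NEW stack pointer, so it no longer reads the spilled new_lr. */
    uint32_t lr = cpu->stack[cpu->sp + 1u];
    (void)lr;

    /* Step 3 (stands in for "BX"): the same problem applies to the branch
     * target, which is read from whatever the new frame contains. */
    return cpu->stack[cpu->sp + 2u];
}

/* Mimics optimized code: all three values stay in locals ("registers"),
 * so changing the stack pointer does not affect the branch target. */
static uint32_t toy_branch_target_in_registers(toy_cpu_t *cpu, uint32_t new_sp,
                                               uint32_t new_lr, uint32_t addr)
{
    uint32_t r0 = new_sp;
    uint32_t r1 = new_lr;
    uint32_t r2 = addr;

    cpu->sp = r0;
    (void)r1;
    return r2;
}
```

If the toy stack is filled with 0xFFFFFFFF and both functions are called with a new stack pointer of 8 and a target of 0x1375, the spilled variant returns 0xFFFFFFFF while the register variant returns 0x1375. One possible mitigation on real hardware, not confirmed in this thread, is to issue all three instructions from a single asm statement so that every operand is already in a register before MSP changes.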

  • Hi again,

    Thank you for looking into this and narrowing down the issue. I have not had the opportunity to verify this, but for the time being I think your explanation is highly probable.

    That is an unfortunate issue. Is it documented somewhere, perhaps with a workaround or a future fix that makes the DFU robust across optimization levels? For now, using the Release configuration with size optimization is the only option for production.

    I have continued using and testing the working DFU built with the Release configuration, and have come across another issue or two:

    1. After upgrading the chip (loaded with the Release bootloader and SD v7) with the first version of my application, which among other things has RTT logs enabled, the RTT output stops after the jump to the application. The rest of the app seems to execute fine; I have tested it for at least 12 h. I have found only one way to get my periodic RTT output to show up: if I power-cycle my PCB, connect with J-Link RTT Viewer, and then program and run the chip with my DFU project in Segger Studio, the logs sometimes come out in the RTT Viewer and sometimes in SES. Is this known behavior?

    2. I have compared the app binary content and placement in flash and found just two differences, which are probably only padding FF bytes added by nrfutil, but I would like you to confirm this.

    Address offset 0x244C0: 04 0B FF F7 E1 FF BD E8 08 40 04 B0 70 47 FF FF. The last two bytes are 00 00 in the original binary/hex fed to nrfutil. The second difference is the last byte at address 0x2EDB0: 6C 29 00 FF, which is also 00 in the original.

    Is this to be expected?

    BR/

    feran
