Hi, I'm working on integrating the reference secure DFU bootloader from the nRF5 SDK (nRF5_SDK_17.1.0_ddde560/examples/dfu/secure_bootloader/pca10100_s140_ble_debug/armgcc) to my application, which is running on a custom nRF52833 board with SoftDevice s140 7.3.0. Integration is going well, other than a few issues that took a while to debug because the error would immediately cause the board to reboot.
Particularly, there was one hard fault issue that took almost a full day to trace down because the root cause of the hard fault was very difficult to trace down with the backtrace alone.
Here's the log from the bootloader leading up to that issue, with hard fault logging enabled:
<info> app: Inside main <debug> app: In nrf_bootloader_init <debug> nrf_dfu_settings: Calling nrf_dfu_settings_init()... <debug> nrf_dfu_flash: Initializing nrf_fstorage_nvmc backend. <debug> nrf_dfu_settings: Using settings page. <debug> nrf_dfu_settings: Copying forbidden parts from backup page. <debug> nrf_dfu_settings: Destination settings are identical to source, write not needed. Skipping. <info> nrf_dfu_settings: Backing up settings page to address 0x7E000. <debug> nrf_dfu_settings: Destination settings are identical to source, write not needed. Skipping. <debug> app: Enter nrf_bootloader_fw_activate <info> app: No firmware to activate. <debug> app: App is valid <debug> app: DFU mode requested via GPREGRET. <info> nrf_bootloader_wdt: WDT is not enabled <debug> app: in weak nrf_dfu_init_user <debug> app: timer_stop (0x200068BC) <debug> app: timer_activate (0x200068BC) <info> app: Entering DFU mode. <debug> app: Initializing transports (found: 1) <debug> nrf_dfu_ble: Initializing BLE DFU transport <debug> nrf_dfu_ble: Enabling SoftDevice. <debug> nrf_dfu_ble: Configuring BLE stack. <debug> nrf_dfu_ble: Enabling the BLE stack. <debug> nrf_dfu_ble: No advertising name found <debug> nrf_dfu_ble: Using default advertising name <debug> nrf_dfu_ble: Advertising... <debug> nrf_dfu_ble: BLE DFU transport initialized. <debug> nrf_dfu_flash: Initializing nrf_fstorage_sd backend. <debug> app: Enter main loop <debug> nrf_dfu_ble: Connected <debug> nrf_dfu_ble: Received BLE_GAP_EVT_CONN_PARAM_UPDATE <debug> nrf_dfu_ble: max_conn_interval: 12 <debug> nrf_dfu_ble: min_conn_interval: 12 <debug> nrf_dfu_ble: slave_latency: 0 <debug> nrf_dfu_ble: conn_sup_timeout: 600 <debug> nrf_dfu_ble: Finished handling conn sec update <debug> nrf_dfu_ble: Set receipt notif <debug> nrf_dfu_req_handler: Handle NRF_DFU_OP_RECEIPT_NOTIF_SET <debug> nrf_dfu_req_handler: Request handling complete. Result: 0x1 <debug> nrf_dfu_ble: Received BLE_GATTS_EVT_EXCHANGE_MTU_REQUEST (request: 247, reply: 247). <debug> nrf_dfu_req_handler: Handle NRF_DFU_OP_OBJECT_SELECT (command) <debug> nrf_dfu_req_handler: Request handling complete. Result: 0x1 <debug> app: Shutting down transports (found: 1) <debug> nrf_dfu_req_handler: Handle NRF_DFU_OP_OBJECT_CREATE (command) <debug> app: timer_stop (0x200068BC) <debug> app: timer_activate (0x200068BC) <debug> nrf_dfu_req_handler: Request handling complete. Result: 0x1 <debug> nrf_dfu_ble: Buffer 0x20007EEC acquired, len 145 (244) <debug> nrf_dfu_req_handler: Handle NRF_DFU_OP_OBJECT_WRITE (command) <debug> nrf_dfu_ble: Freeing buffer 0x20007EEC <debug> nrf_dfu_req_handler: Request handling complete. Result: 0x1 <debug> nrf_dfu_req_handler: Handle NRF_DFU_OP_CRC_GET (command) <debug> nrf_dfu_req_handler: Request handling complete. Result: 0x1 <debug> nrf_dfu_req_handler: Handle NRF_DFU_OP_OBJECT_EXECUTE (command) <debug> nrf_dfu_validation: PB: Init packet data len: 68 <info> nrf_dfu_validation: Signature required. Checking signature. <info> nrf_dfu_validation: Calculating hash (len: 68) <info> nrf_dfu_validation: Verify signature <info> nrf_dfu_validation: Image verified <debug> app: Enter nrf_dfu_cache_prepare() <debug> app: required_size: 0x395C4. <debug> app: single_bank: true. <debug> app: keep_app: false. <debug> app: keep_softdevice: true. <debug> app: SD_PRESENT: true. <debug> app: Bank contents: <debug> app: Bank 0 code: 0x01: Size: 0x395C4 <debug> app: Bank 1 code: 0x00: Size: 0x0 <debug> app: pass: 0. <debug> app: cache_address: 0x61000. <debug> app: cache_too_small: true. <debug> app: keep_firmware: false. <debug> app: delete_more: true. <debug> app: pass: 1. <debug> app: cache_address: 0x27000. <debug> app: cache_too_small: false. <debug> app: keep_firmware: true. <debug> app: delete_more: true. <debug> app: Invalidating app. <debug> nrf_dfu_validation: Write address set to 0x00027000 <debug> nrf_dfu_settings: Writing settings... <debug> nrf_dfu_settings: Erasing old settings at: 0x0007F000 <debug> nrf_dfu_flash: nrf_fstorage_erase(addr=0x0x0007F000, len=1 pages), queue usage: 0 <debug> nrf_dfu_flash: nrf_fstorage_write(addr=0x0007F000, src=0x200076AC, len=896 bytes), queue usage: 1 <info> nrf_dfu_settings: Backing up settings page to address 0x7E000. <debug> nrf_dfu_settings: Writing settings... <debug> nrf_dfu_settings: Erasing old settings at: 0x0007E000 <debug> nrf_dfu_flash: nrf_fstorage_erase(addr=0x0x0007E000, len=1 pages), queue usage: 2 <debug> nrf_dfu_flash: nrf_fstorage_write(addr=0x0007E000, src=0x20007328, len=896 bytes), queue usage: 3 <debug> nrf_dfu_req_handler: Writing valid init command to flash. <debug> nrf_dfu_req_handler: Request handling complete. Result: 0x1 <debug> nrf_dfu_req_handler: Handle NRF_DFU_OP_OBJECT_SELECT (data) <debug> nrf_dfu_req_handler: crc = 0x0, offset = 0x0, max_size = 0x1000 <debug> nrf_dfu_req_handler: Request handling complete. Result: 0x1 <debug> nrf_dfu_req_handler: Handle NRF_DFU_OP_OBJECT_SELECT (data) <debug> nrf_dfu_req_handler: crc = 0x0, offset = 0x0, max_size = 0x1000 <debug> nrf_dfu_req_handler: Request handling complete. Result: 0x1 <debug> nrf_dfu_flash: Flash erase success: addr=0x0007F000, pending 3 <debug> nrf_dfu_req_handler: Handle NRF_DFU_OP_OBJECT_CREATE (data) <debug> nrf_dfu_flash: nrf_fstorage_erase(addr=0x0x00027000, len=1 pages), queue usage: 3 <debug> nrf_dfu_req_handler: Creating object with size: 4096. Offset: 0x00000000, CRC: 0x00000000 <debug> nrf_dfu_req_handler: Request handling complete. Result: 0x1 <debug> nrf_dfu_flash: Flash write success: addr=0x0007F000, pending 3 <debug> nrf_dfu_ble: Buffer 0x20007EEC acquired, len 244 (244) <debug> nrf_dfu_req_handler: Handle NRF_DFU_OP_OBJECT_WRITE (data) <debug> nrf_dfu_flash: nrf_fstorage_write(addr=0x00027000, src=0x20007EEC, len=244 bytes), queue usage: 3 <debug> nrf_dfu_req_handler: Request handling complete. Result: 0x1 <debug> nrf_dfu_ble: Buffer 0x20007FE0 acquired, len 244 (244) <debug> nrf_dfu_req_handler: Handle NRF_DFU_OP_OBJECT_WRITE (data) <debug> nrf_dfu_flash: nrf_fstorage_write(addr=0x000270F4, src=0x20007FE0, len=244 bytes), queue usage: 4 <debug> nrf_dfu_req_handler: Request handling complete. Result: 0x1 <error> hardfault: HARD FAULT at 0x000258F2 <error> hardfault: R0: 0x00000001 R1: 0x4001E400 R2: 0x4001E504 R3: 0x4001E508 <error> hardfault: R12: 0x20006198 LR: 0x000245D3 PSR: 0x010F0018 <error> hardfault: Cause: Data bus error (return address in the stack frame is not related to the instruction that caused the error).
Breaking on HardFault_process, here's the backtrace from GDB:
Breakpoint 2, HardFault_process (p_stack=0x2001ff78) at ../../../../../components/libraries/hardfault/hardfault_implementation.c:56 56 __WEAK void HardFault_process(HardFault_stack_t * p_stack) (gdb) bt #0 HardFault_process (p_stack=0x2001ff78) at ../../../../../components/libraries/hardfault/hardfault_implementation.c:56 #1 HardFault_c_handler (p_stack_address=0x2001ff78) at ../../../../../components/libraries/hardfault/hardfault_implementation.c:154 #2 <signal handler called> #3 0x000258f2 in ?? () #4 0x000245d2 in ?? () Backtrace stopped: previous frame identical to this frame (corrupt stack?)
As you can see, execution is stuck inside the SoftDevice, and it's hard to figure out what is going on. The backtrace is all within the black-box SoftDevice space.
The root cause turned out to be that ACL was still set to write-protect Bank 0 from the initial app load from bootloader: right before the bootloader executes the app, it write-protects Bank 0 by default. When the app transitions back to the bootloader, it needs to perform a full reset (NVIC_SystemReset) to clear the protection, rather than a jump to the bootloader address. This way, Bank 0 is available for a single-bank update, since my firmware is too big for a dual-bank update.
Retrospectively, before I had added the debug hard fault handler, I had missed one clue from a register dump when the hard fault is hit:
(gdb) c Continuing. Program received signal SIGTRAP, Trace/breakpoint trap. NMI_Handler () at ../../../../../modules/nrfx/mdk/gcc_startup_nrf52833.S:302 302 b . (gdb) bt #0 NMI_Handler () at ../../../../../modules/nrfx/mdk/gcc_startup_nrf52833.S:302 #1 <signal handler called> #2 0x000258f2 in ?? () #3 0x000245d2 in ?? () Backtrace stopped: previous frame identical to this frame (corrupt stack?) (gdb) i r r0 0x68 104 r1 0x258f0 153840 r2 0x27000 159744 r3 0x722db 467675 r4 0x200001b0 536871344 r5 0x1 1 r6 0xc5c10 810000 r7 0x1 1 r8 0x1 1 r9 0x0 0 r10 0x73fa5 475045 r11 0x7d2b8 512696 r12 0x200060b3 536895667 sp 0x2001ff88 0x2001ff88 lr 0xfffffff1 4294967281 pc 0x722da 0x722da <HardFault_Handler> xpsr 0x210f0003 554631171 msp 0x2001ff88 537001864 psp 0x20010830 536938544 primask 0x0 0 basepri 0x0 0 faultmask 0x0 0 control 0x0 0 fpscr 0x0 0
r3 is set to 0x27000, which lines up with the start of Bank 0 in flash. The crash was delayed because fstorage writes are queued, and flash operations are handled (asynchronously?) by the SoftDevice. However, it seems like the SoftDevice doesn't have a way of signaling clearly, "Hey, this hard fault happened during X operation." Or, when flash erase/writes are happening, the SoftDevice should handle hard faults momentarily to catch the error so it can be sent back as an event. Otherwise, the context of the error is lost.
Is there an easier way to see what a hard fault was caused by when execution is occurring in the SoftDevice? Is there a documented way to determine what type of operation the SoftDevice was executing?
Thanks in advance.