
sd_ble_enable hardfaults in bootloader

Summary

  • I perform a bootloader/softdevice update using the dual-bank BLE DFU example from nRF SDK11 with SD132 v2.
  • When the new bootloader (the SDK11 bootloader example ported to SDK14 and SD132 v5) starts, it hangs in the call to nrf_sdh_ble_enable, specifically in the sd_ble_enable softdevice call it makes.
  • Although the bootloader we're upgrading to differs from the SDK14 example, most of the differences come after the point where this problem occurs.

Details

I'm working on upgrading our system from nRF SDK11 / SD132 v2 to nRF SDK14 / SD132 v5. When I do a DFU update from our current SDK11/SD132 v2 bootloader (the SDK11 DFU bootloader example, slightly modified to talk to another device over SPI) to the new stack, the system hangs somewhere inside the call to sd_ble_enable as soon as the new bootloader comes up.

Note: The bootloader I'm updating from (the SDK11 example) is patched to support updating to larger bootloaders.

I'm using a port of the legacy DFU bootloader because the bootloader settings structs are incompatible between the SDK11 and SDK14 bootloader examples. We're fine with the lack of security; this is a connectivity application only.

The SDK14 port of the bootloader works well when it is programmed with a J-Link together with the application and the softdevice. However, when updating to the new softdevice via DFU, the system hangs in sd_ble_enable.

I am confident I am not stuck in the app fault handler, as I have it instrumented to toggle some GPIOs when it executes and I do not see that happen.

I have read back the flash, and it appears to contain:

  • The MBR from S132 v2, as expected
  • S132 v5 SD, as expected
  • The bootloader I expect
  • Bootloader settings that indicate a valid softdevice with the correct size in bank 1
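One way to double-check the second item is to read the SoftDevice information structure straight out of the flash dump. The host-side helper below is a sketch under stated assumptions: the 0x1000 SoftDevice base (directly after the MBR), the info struct at base + 0x2000, and the magic number 0x51B1E5DB at +0x4 with size at +0x8 and FWID at +0xC are my recollection of nrf_sdm.h and should be verified against the header shipped with S132 v5. All macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMED layout (recalled from nrf_sdm.h; verify against your SDK):
 * SoftDevice base right after a 0x1000-byte MBR, info struct at base + 0x2000. */
#define HYP_SD_BASE         0x1000u
#define HYP_SD_INFO_OFFSET  0x2000u
#define HYP_SD_MAGIC_REL    0x04u
#define HYP_SD_SIZE_REL     0x08u
#define HYP_SD_FWID_REL     0x0Cu
#define HYP_SD_MAGIC_VALUE  0x51B1E5DBu

/* Read a little-endian 32-bit word from a flash dump that starts at address 0. */
static uint32_t dump_read_le32(const uint8_t *dump, size_t addr)
{
    return (uint32_t)dump[addr]
         | ((uint32_t)dump[addr + 1] << 8)
         | ((uint32_t)dump[addr + 2] << 16)
         | ((uint32_t)dump[addr + 3] << 24);
}

/* Returns true and fills *sd_size / *fwid if the magic number is found at the
 * assumed address; returns false if the dump is too short or the magic is absent. */
static bool dump_read_sd_info(const uint8_t *dump, size_t dump_len,
                              uint32_t *sd_size, uint16_t *fwid)
{
    const size_t info = HYP_SD_BASE + HYP_SD_INFO_OFFSET;

    assert(dump != NULL && sd_size != NULL && fwid != NULL);
    if (dump_len < info + HYP_SD_FWID_REL + 4u) {
        return false; /* dump does not reach the info struct */
    }
    if (dump_read_le32(dump, info + HYP_SD_MAGIC_REL) != HYP_SD_MAGIC_VALUE) {
        return false; /* no SoftDevice info struct at the assumed address */
    }
    *sd_size = dump_read_le32(dump, info + HYP_SD_SIZE_REL);  /* size field as stored */
    *fwid = (uint16_t)(dump_read_le32(dump, info + HYP_SD_FWID_REL) & 0xFFFFu);
    return true;
}
```

Comparing the returned FWID with the firmware ID listed in the S132 v5 release notes would confirm which softdevice build is actually in flash.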

The initialization sequence looks like this (with some conditionals removed where I know their outcomes):

app_timer_init();
nrf_drv_clock_init();
// this is a port of the SDK11 bootloader_init(); the pstorage implementation is the one
// used in the NRFSDK14 ANT bootloader, which was ported by Nordic to this version
pstorage_module_param_t storage_params = {.cb = pstorage_callback_handler};

err_code = pstorage_init();
VERIFY_SUCCESS(err_code);

m_bootsettings_handle.block_id = BOOTLOADER_SETTINGS_ADDRESS;
err_code = pstorage_register(&storage_params, &m_bootsettings_handle);
VERIFY_SUCCESS(err_code);
// Because this is a connectivity application, we use SPIS to talk to the main chip
spis_init();
dfu_start = app_reset;
// This is true - the problem is seen on the first boot to this code during DFU update
if (bootloader_dfu_sd_in_progress())
{
    err_code = bootloader_dfu_sd_update_continue(); // Goes OK
    APP_ERROR_CHECK(err_code);
    ble_stack_init(!app_reset); // Halts in here
    scheduler_init();
    err_code = bootloader_dfu_sd_update_finalize();
    APP_ERROR_CHECK(err_code);
    dfu_start = true;
}

static void ble_stack_init(bool init_softdevice)
{
    uint32_t         err_code;
    sd_mbr_command_t com = {SD_MBR_COMMAND_INIT_SD, };
    uint32_t         ram_start;
    //ble_cfg_t ble_cfg   = {{0}};
    //if (init_softdevice)
    if (true) // init_softdevice check bypassed
    {
        signal(4); // debug marker (project-specific helper)
        err_code = sd_mbr_command(&com);
        APP_ERROR_CHECK(err_code);
    }
    err_code = sd_softdevice_vector_table_base_set(BOOTLOADER_REGION_START);
    APP_ERROR_CHECK(err_code);
    err_code = nrf_sdh_enable_request();
    APP_ERROR_CHECK(err_code);
    // Fetch the start address of the application RAM.
    err_code = nrf_sdh_ble_app_ram_start_get(&ram_start);
    APP_ERROR_CHECK(err_code);
    // Apply the default BLE configuration for connection configuration tag 1.
    err_code = nrf_sdh_ble_default_cfg_set(1, &ram_start);
    APP_ERROR_CHECK(err_code);
    // -----------------------------------------------------
    // The code never returns from this call
    err_code = nrf_sdh_ble_enable(&ram_start);
    // -----------------------------------------------------
    APP_ERROR_CHECK(err_code);
}

Update

I attached a debugger to the system while it was in this hung state. It is in the default HardFault handler.

The following information is based on inspecting the system state with GDB and cross-referencing it with infocenter.arm.com/.../index.jsp, specifically the exception entry (for the stacking order) and fault handling (for the CFSR bit meanings) sections.

  • The fault is a UsageFault for an InvalidInstruction (per the CFSR).
  • The EXC_RETURN value in LR shows that the MSP was in use and the interrupted code was running in Handler mode, so the fault happened inside an exception handler, probably one in the softdevice.
  • The data stacked on exception entry gives the register state (r0-r3, r12, LR, PC) at the moment the exception occurred.
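The UsageFault half of the CFSR (bits 16 to 31) separates several causes that are easy to lump together as an "invalid instruction": UNDEFINSTR (the core fetched an encoding it does not implement), INVSTATE (illegal EPSR state, typically a branch to an address with bit 0 clear), and INVPC (a bad EXC_RETURN or PC load). A small host-side decoder like the sketch below, fed the raw CFSR word read from 0xE000ED28 in GDB (for example with x/wx 0xE000ED28), makes it easy to confirm exactly which bit is set. Bit positions follow the ARMv7-M architecture; the names are my own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* UsageFault Status Register bits, i.e. CFSR bits 16..31 (ARMv7-M). */
static const struct {
    uint32_t mask;
    const char *name;
} ufsr_bits[] = {
    { 1u << 16, "UNDEFINSTR (undefined instruction)" },
    { 1u << 17, "INVSTATE (invalid EPSR state, e.g. branch target with bit 0 clear)" },
    { 1u << 18, "INVPC (invalid PC load, e.g. bad EXC_RETURN)" },
    { 1u << 19, "NOCP (coprocessor/FPU access while disabled or absent)" },
    { 1u << 24, "UNALIGNED (unaligned access)" },
    { 1u << 25, "DIVBYZERO (divide by zero, only when DIV_0_TRP is set)" },
};

/* Prints every UsageFault cause set in a raw CFSR value and returns how many
 * were set; 0 means the MemManage/BusFault halves need to be checked instead. */
static int print_usage_faults(uint32_t cfsr)
{
    int count = 0;

    for (size_t i = 0; i < sizeof ufsr_bits / sizeof ufsr_bits[0]; i++) {
        if (cfsr & ufsr_bits[i].mask) {
            printf("UsageFault: %s\n", ufsr_bits[i].name);
            count++;
        }
    }
    return count;
}
```

Printing every set bit, rather than stopping at the first, matters here because the CFSR bits are sticky and can accumulate across faults until they are cleared.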

The system data is:

(gdb) p/x $lr
$20 = 0xfffffff1  // EXC_RETURN: bit 2 clear = MSP in use, bit 3 clear = Handler mode
(gdb) p/x $msp
$21 = 0x2000fdb8  // Contents of MSP
(gdb) p/x *(uint32_t *)$msp  // Should be r0 at exception time
$13 = 0x2000fdd8
(gdb) p/x *(uint32_t *)($msp+4)  // r1 at exception time
$14 = 0x2000fe04
(gdb) p/x *(uint32_t *)($msp+8)  // r2
$15 = 0x55
(gdb) p/x *(uint32_t *)($msp+12)  // r3
$17 = 0x1
(gdb) p/x *(uint32_t *)($msp+16)  // r12
$16 = 0x0
(gdb) p/x *(uint32_t *)($msp+20)  // lr
$18 = 0x9af3
(gdb) p/x *(uint32_t *)($msp+24)  // pc
$19 = 0xe054
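The frame above can be decoded mechanically. The host-side sketch below (hypothetical names) decodes the EXC_RETURN value found in LR and labels the eight words a Cortex-M4 pushes on exception entry, in the same order as the +0 to +28 offsets used in the GDB session. With LR = 0xFFFFFFF1 it reports a frame on the MSP, taken from Handler mode, with a basic (non-FP) layout.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Basic exception frame, in push order (r0 at the lowest address, i.e. at MSP). */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} stacked_frame_t;

_Static_assert(offsetof(stacked_frame_t, pc) == 24, "PC is the 7th stacked word");

typedef struct {
    bool valid;          /* one of 0xFFFFFFE1/E9/ED/F1/F9/FD */
    bool thread_mode;    /* bit 3: 1 = Thread mode, 0 = Handler mode */
    bool uses_psp;       /* bit 2: 1 = frame on PSP, 0 = frame on MSP */
    bool extended_frame; /* bit 4: 0 = FP registers were stacked as well */
} exc_return_info_t;

static exc_return_info_t decode_exc_return(uint32_t lr)
{
    exc_return_info_t info;
    uint32_t low = lr & 0xFu;

    info.valid = (lr & 0xFFFFFFE0u) == 0xFFFFFFE0u &&
                 (low == 0x1u || low == 0x9u || low == 0xDu);
    info.thread_mode = (lr & (1u << 3)) != 0;
    info.uses_psp = (lr & (1u << 2)) != 0;
    info.extended_frame = (lr & (1u << 4)) == 0;
    return info;
}

static void print_fault_context(uint32_t exc_return, const stacked_frame_t *f)
{
    exc_return_info_t info = decode_exc_return(exc_return);

    if (!info.valid) {
        printf("LR 0x%08" PRIx32 " is not an EXC_RETURN value\n", exc_return);
        return;
    }
    assert(f != NULL);
    printf("frame on %s, interrupted code ran in %s mode, %s frame\n",
           info.uses_psp ? "PSP" : "MSP",
           info.thread_mode ? "Thread" : "Handler",
           info.extended_frame ? "extended (FP)" : "basic");
    printf("pc=0x%08" PRIx32 " lr=0x%08" PRIx32 " r0=0x%08" PRIx32
           " r1=0x%08" PRIx32 " r2=0x%08" PRIx32 " r3=0x%08" PRIx32
           " r12=0x%08" PRIx32 "\n",
           f->pc, f->lr, f->r0, f->r1, f->r2, f->r3, f->r12);
}
```

xPSR was not read in the session above, so any placeholder works there when filling the struct. Reading the word at $msp+28 would add it: its low nine bits (IPSR) hold the number of the exception that was active when the fault hit, which would identify the handler that was running.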

Judging from the softdevice hex, 0xE054 does fall within the softdevice. I'm not sure why executing there would cause an invalid instruction fault.
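A possible next step is to look at the halfword stored at 0xE054 in the softdevice hex. In Thumb-2, a halfword whose top five bits are 0b11101, 0b11110 or 0b11111 is the first half of a 32-bit instruction; anything else is a complete 16-bit instruction. The sketch below (hypothetical names) applies that rule to the two bytes at an address as they appear in the hex, which stores code little-endian. Fully decoding the instruction still needs a disassembler such as arm-none-eabi-objdump; this only tells you how many bytes belong to it.

```c
#include <assert.h>
#include <stdint.h>

/* Combine two consecutive bytes from the little-endian hex image into the
 * halfword the core fetches: lo is the byte at the address, hi the next one. */
static uint16_t halfword_le(uint8_t lo, uint8_t hi)
{
    return (uint16_t)(lo | (hi << 8));
}

/* Thumb-2 rule (ARMv7-M): if bits [15:11] of the first halfword are 0b11101,
 * 0b11110 or 0b11111, it starts a 32-bit instruction; otherwise it is a
 * complete 16-bit instruction. Returns the instruction size in bytes. */
static int thumb_insn_size(uint16_t first_halfword)
{
    unsigned top5 = first_halfword >> 11;

    return (top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu) ? 4 : 2;
}
```

For example, the bytes 0x70 0x47 form 0x4770 (BX LR), a 16-bit instruction, while a halfword starting with 0xF0 opens a 32-bit one such as BL.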

Has anybody else experienced this, or have some ideas for where to look next?
