Failure in ble_stack_init during startup

We are trying to bring up a new run of boards built from an existing PCB design. Nothing has changed around the processor, but a different assembly house built them. The first call inside ble_stack_init(), nrf_sdh_enable_request(), fails and returns error code 8. I can't find a usable explanation of that error anywhere.

Strangely, it succeeds if started by "copying" the hex file onto the J-Link "drive" (the nRF52832 DevKit).

What should we be looking for?

  • Hello,

    It seems like the issue might be with how the firmware is programmed since it appears to work when you use drag&drop programming. How are you programming the device when it fails? Is the same error also returned after a reset? Note that the "SoftDevice enable" function may return NRF_ERROR_INVALID_STATE (8) if the debugger forces execution to start from the application start address instead of address 0x0. This prevents the softdevice's reset handler from running on startup.

    Best regards,

    Vidar
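    For reference, error code 8 is one of the SDK's base error codes. A minimal lookup sketch, assuming the layout of the nRF5 SDK's nrf_error.h (NRF_ERROR_BASE_NUM is 0); in a real project, nrf_strerror_get() from the SDK already does this. The function name nrf_base_error_name() is hypothetical.

```c
#include <stdint.h>
#include <string.h>  /* strcmp, used by the usage check */
#include <assert.h>  /* assert, used by the usage check */

/* Hypothetical helper: names the base error codes from nrf_error.h.
 * Module-specific ranges (e.g. BLE or SDK modules) are not covered. */
const char *nrf_base_error_name(uint32_t err_code)
{
    static const char *const names[] = {
        "NRF_SUCCESS",                      /* 0 */
        "NRF_ERROR_SVC_HANDLER_MISSING",    /* 1 */
        "NRF_ERROR_SOFTDEVICE_NOT_ENABLED", /* 2 */
        "NRF_ERROR_INTERNAL",               /* 3 */
        "NRF_ERROR_NO_MEM",                 /* 4 */
        "NRF_ERROR_NOT_FOUND",              /* 5 */
        "NRF_ERROR_NOT_SUPPORTED",          /* 6 */
        "NRF_ERROR_INVALID_PARAM",          /* 7 */
        "NRF_ERROR_INVALID_STATE",          /* 8 */
        "NRF_ERROR_INVALID_LENGTH",         /* 9 */
        "NRF_ERROR_INVALID_FLAGS",          /* 10 */
        "NRF_ERROR_INVALID_DATA",           /* 11 */
        "NRF_ERROR_DATA_SIZE",              /* 12 */
        "NRF_ERROR_TIMEOUT",                /* 13 */
        "NRF_ERROR_NULL",                   /* 14 */
        "NRF_ERROR_FORBIDDEN",              /* 15 */
        "NRF_ERROR_INVALID_ADDR",           /* 16 */
        "NRF_ERROR_BUSY",                   /* 17 */
    };
    if (err_code < sizeof names / sizeof names[0]) {
        return names[err_code];
    }
    return "UNKNOWN (module-specific or out of range)";
}
```

    Usage: nrf_base_error_name(8) gives "NRF_ERROR_INVALID_STATE", and nrf_base_error_name(4) gives "NRF_ERROR_NO_MEM".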

  • I was starting it from SES (Segger) just as I have on dozens of previous nRF52832 projects, and indeed successfully on previous runs of this board. So there's something different in the hardware, and I'm asking where to look. What sort of "invalid state" is likely?

  • ResetReason = NRF_POWER->RESETREAS;  // read the latched reset cause(s)
    NRF_POWER->RESETREAS = ResetReason;  // write the set bits back to clear them

    With the softdevice, the debugger works poorly or not at all beyond serving as a program loader. I added the code you suggested above, but it never produced any output (it is never reached). I added some probes to see where we're going off the rails, and now we're resetting somewhere inside ble_stack_init(). I'll add some more probes inside there.
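    Reading RESETREAS is most useful once the bits are named. A sketch of a decoder for the nRF52832 layout; the RR_* bit definitions are local stand-ins (on target, the POWER_RESETREAS_*_Msk macros from the MDK headers are the reference), and describe_resetreas() is a hypothetical name. The register is write-1-to-clear, so writing the read value back, as in the snippet above, clears exactly the bits that were set.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <assert.h>

/* nRF52832 RESETREAS bit positions, defined locally so this builds off-target. */
#define RR_RESETPIN (1UL << 0)  /* reset pin */
#define RR_DOG      (1UL << 1)  /* watchdog */
#define RR_SREQ     (1UL << 2)  /* soft reset, e.g. NVIC_SystemReset() */
#define RR_LOCKUP   (1UL << 3)  /* CPU lockup */
#define RR_OFF      (1UL << 16) /* wake from System OFF by GPIO DETECT */
#define RR_LPCOMP   (1UL << 17) /* wake from System OFF by LPCOMP */
#define RR_DIF      (1UL << 18) /* wake from System OFF into debug interface mode */
#define RR_NFC      (1UL << 19) /* wake from System OFF by NFC field */

/* Writes the names of the set bits, space-separated, into buf (len must be > 0).
 * A value of 0 means no source was latched: power-on or brownout reset. */
void describe_resetreas(uint32_t reas, char *buf, size_t len)
{
    static const struct { uint32_t mask; const char *name; } bits[] = {
        {RR_RESETPIN, "RESETPIN"}, {RR_DOG, "DOG"},       {RR_SREQ, "SREQ"},
        {RR_LOCKUP, "LOCKUP"},     {RR_OFF, "OFF"},       {RR_LPCOMP, "LPCOMP"},
        {RR_DIF, "DIF"},           {RR_NFC, "NFC"},
    };
    size_t used = 0;

    buf[0] = '\0';
    if (reas == 0) {
        snprintf(buf, len, "POR/BOR");
        return;
    }
    for (size_t i = 0; i < sizeof bits / sizeof bits[0]; i++) {
        if (reas & bits[i].mask) {
            int n = snprintf(buf + used, len - used, "%s%s",
                             used ? " " : "", bits[i].name);
            if (n < 0 || (size_t)n >= len - used) {
                return; /* output truncated; buf is still NUL-terminated */
            }
            used += (size_t)n;
        }
    }
}
```

    Because RESETREAS accumulates until cleared, logging the decoded string before the clear shows every cause latched since the last clear.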

    How late are you working today? Pretty sure you're in a time zone several hours ahead of me.

  • Correction to the above: it's derailing in nrf_sdh_enable_request(), which is the first function called from ble_stack_init().

  • In nrf_sdh_enable_request(), it's resetting in this critical region:

        CRITICAL_REGION_ENTER();
    #ifdef ANT_LICENSE_KEY
        ret_code = sd_softdevice_enable(&clock_lf_cfg, app_error_fault_handler, ANT_LICENSE_KEY);
    #else
        ret_code = sd_softdevice_enable(&clock_lf_cfg, app_error_fault_handler);
    #endif
        m_nrf_sdh_enabled = (ret_code == NRF_SUCCESS);
        CRITICAL_REGION_EXIT();

    According to SES's highlighting, ANT_LICENSE_KEY is not defined, so it's resetting somewhere in sd_softdevice_enable(), apparently without ever reaching app_error_fault_handler(). I have no visibility into the softdevice, so please wave your magic wand over it and tell me what I'm doing wrong.

  • The limitation when debugging with the softdevice is that you generally cannot continue execution after halting the CPU (for example, when hitting a breakpoint), as the softdevice will detect that it has failed to meet its timing requirements and trigger an error (app_error_fault_handler() is called with id=NRF_FAULT_ID_SD_ASSERT). But I don't see why this should be a problem here.

    In any case, based on what you said earlier, the problem is not reproducible in debug mode. So you would need to use nrfjprog --memrd <register address> from the command line, or a similar approach, to read the register from the running target after the problem has occurred. The challenge here is that you may end up reading the register after your FW has already cleared it. So I suggest first commenting out the NRF_POWER->RESETREAS = ResetReason; line, then checking whether it still reads as 0 while the device is in the boot loop.
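    A sketch of keeping the reset cause intact for nrfjprog while chasing the boot loop. On the nRF52832, RESETREAS sits at POWER base 0x40000000 + offset 0x400, so the command would be nrfjprog --memrd 0x40000400. KEEP_RESETREAS_FOR_DEBUG and capture_reset_reason() are hypothetical names, and sim_resetreas is a host-side stand-in that models the register's write-1-to-clear behaviour; on target, the read and write would go to NRF_POWER->RESETREAS.

```c
#include <stdint.h>
#include <assert.h>

/* nRF52832: POWER base + RESETREAS offset. */
#define RESETREAS_ADDR (0x40000000UL + 0x400UL)

/* Hypothetical build switch: 1 leaves the latched bits for nrfjprog to read. */
#ifndef KEEP_RESETREAS_FOR_DEBUG
#define KEEP_RESETREAS_FOR_DEBUG 1
#endif

/* Host stand-in for NRF_POWER->RESETREAS. Writing a 1 clears that bit. */
static uint32_t sim_resetreas;
static uint32_t reg_read(void)        { return sim_resetreas; }
static void     reg_write(uint32_t v) { sim_resetreas &= ~v; }

/* Returns the reset cause; clears it only when the debug switch is off. */
uint32_t capture_reset_reason(void)
{
    uint32_t reas = reg_read();
    if (!KEEP_RESETREAS_FOR_DEBUG) {
        reg_write(reas); /* on target: NRF_POWER->RESETREAS = reas; */
    }
    return reas;
}
```

    With the clear disabled, bits from successive resets accumulate, so the value read by nrfjprog during the loop reflects every reset cause latched since the last clear.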

    "In nrf_sdh_enable_request(), it's resetting in this critical region:"

    There is nothing in this function that can lead to NVIC_SystemReset() being called. To verify, you can search your project for NVIC_SystemReset() to see whether it is called anywhere other than the error handler.

    "I added the code you suggested above, and never got it to output (never reached)."

    Does this mean that you have redefined app_error_fault_handler() in your project? The picture I posted shows the default handler used in the SDK examples.

    I will be signing off now (I'm based in Norway, so I'm a few hours ahead), but I hope you will be able to gather some more concrete debugging evidence of what is happening.

  • The debugger is so cumbersome as to be completely useless with a softdevice, so I have devised my own debugging methods. I have nonvolatile memory available and a mechanism to record any event of interest, even during a failed startup, so that a subsequent successful startup lets me read out the whole history. It is very flexible, since it records whatever plain text is most illuminating for the problem being diagnosed, and it works quite well.
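    The poster's journal implementation isn't shown, so this is only a minimal sketch of the general idea, assuming a byte-writable NV memory whose erased state is 0xFF. nv[] stands in for that memory, and journal_erase() and journal_log() are hypothetical names. The write position is recovered from the contents, so nothing in RAM has to survive a reset.

```c
#include <stdint.h>
#include <string.h>
#include <assert.h>

#define JOURNAL_SIZE 256u
#define ERASED_BYTE  0xFFu /* typical erased state of flash/EEPROM */

/* Stand-in for the NV region; on target, writes would program NV memory. */
static uint8_t nv[JOURNAL_SIZE];

/* Equivalent of the "erase command" run from a known-good boot. */
void journal_erase(void)
{
    memset(nv, ERASED_BYTE, sizeof nv);
}

/* Next free position = first erased byte. Assumes entries never contain 0xFF. */
static size_t journal_end(void)
{
    size_t i = 0;
    while (i < JOURNAL_SIZE && nv[i] != ERASED_BYTE) {
        i++;
    }
    return i;
}

/* Appends text plus a NUL separator; returns 0 on success, -1 if full. */
int journal_log(const char *text)
{
    size_t pos = journal_end();
    size_t n = strlen(text) + 1;
    if (pos + n > JOURNAL_SIZE) {
        return -1;
    }
    memcpy(&nv[pos], text, n);
    return 0;
}
```

    Reading the history back is a walk over NUL-separated strings up to the first erased byte.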

    I have added code within app_error_fault_handler() to record all the details if that handler is called. I just now ran another complete test:

    1) Boot properly via drag&drop, run erase command on NV memory.
    2) Boot via SES F5, observe LED showing repeated resets.
    3) Boot properly via drag&drop, download saved journal.

    I could send the full journal, but a summary is probably more useful.

    The first boot (step 1 above) is recorded properly.

    2nd attempt, using F5:
       Starts up, gets through services_init() in main(), then crashes somewhere in advertising_init() with the following message:

       Fault 00004001 00000000 2000FE8C ERROR 4 [NRF_ERROR_NO_MEM] at (null):0
       PC at: 0x00000000

    The above message was formatted by these two lines in the fault handler. InfoText is built as expected; the two sprintf calls are physically in different routines:

        sprintf (InfoText, "ERROR %u [%s] at %s:%u\r\nPC at: 0x%08x",
                              p_info->err_code,
                              nrf_strerror_get(p_info->err_code),
                              p_info->p_file_name,
                              p_info->line_num,
                              pc);
        sprintf (Str, "\r\nFault %08X %08X %08X %s\r\n", id, pc, info, InfoTxt);

    Thus the values are:

    id - 00004001
    pc - 00000000 (?!?!?)
    info - 2000FE8C
    err_code - 4
    err_string - NRF_ERROR_NO_MEM
    file_name - null
    line_num - 0
    pc - 00000000

    So clearly something went off the rails badly.
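    Which handler arguments are meaningful depends on id. In the stock nRF5 SDK error path (app_error.c), pc is only meaningful for NRF_FAULT_ID_SD_ASSERT; for NRF_FAULT_ID_SDK_ERROR (0x4001), info points to an error_info_t and pc is passed as 0, and when DEBUG is not defined APP_ERROR_CHECK goes through app_error_handler_bare(), which leaves the file name NULL and the line number 0. That is worth checking against the SDK version in use. A sketch of a formatter that branches on id and avoids passing NULL to %s; the constants and struct mirror the SDK headers but are defined locally, info is widened to uintptr_t so the sketch also runs on a 64-bit host, and format_fault() is a hypothetical name.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <assert.h>

/* Fault IDs as in nrf_sdm.h / app_error.h, repeated so this builds alone. */
#define NRF_FAULT_ID_SD_ASSERT 0x00000001UL /* SoftDevice assert: pc is valid */
#define NRF_FAULT_ID_SDK_ERROR 0x00004001UL /* APP_ERROR_CHECK: info -> error_info_t */

/* Same layout as the SDK's error_info_t. */
typedef struct {
    uint32_t        line_num;
    uint8_t const * p_file_name;
    uint32_t        err_code;
} error_info_t;

void format_fault(uint32_t id, uint32_t pc, uintptr_t info, char *buf, size_t len)
{
    switch (id) {
    case NRF_FAULT_ID_SD_ASSERT:
        /* pc is the SoftDevice assert location. */
        snprintf(buf, len, "SD assert at pc=0x%08lX", (unsigned long)pc);
        break;
    case NRF_FAULT_ID_SDK_ERROR: {
        /* info points at an error_info_t; pc is not filled in. */
        const error_info_t *e = (const error_info_t *)info;
        const char *file = e->p_file_name ? (const char *)e->p_file_name
                                          : "<no file>";
        snprintf(buf, len, "SDK error %lu at %s:%lu",
                 (unsigned long)e->err_code, file, (unsigned long)e->line_num);
        break;
    }
    default:
        snprintf(buf, len, "fault id=0x%08lX pc=0x%08lX info=0x%08lX",
                 (unsigned long)id, (unsigned long)pc, (unsigned long)info);
        break;
    }
}
```

    Fed the values from the log above (id 0x4001, err_code 4, no file, line 0), this prints "SDK error 4 at <no file>:0".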

    Next, the system reboots and gets as far as the nrf_sdh_enable_request() call in ble_stack_init(). That function runs up to a print just before the CRITICAL_REGION, and then the system reboots from within the CRITICAL_REGION. The only thing called in there is sd_softdevice_enable(), so the reboot is happening from inside the softdevice. Recall that this is after the above no-memory error.
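    A lightweight complement to the journal is a boot-stage breadcrumb read first thing after reset: if it still says "before SD enable", the previous reset happened inside that call, and RESETREAS then tells which kind of reset it was. A sketch with hypothetical names; on target, breadcrumb would live in RAM that the startup code does not zero (or in the NV journal), and a plain static stands in for it here.

```c
#include <stdint.h>
#include <assert.h>

typedef enum {
    STAGE_NONE = 0,
    STAGE_BEFORE_SD_ENABLE,
    STAGE_AFTER_SD_ENABLE,
    STAGE_ADVERTISING_INIT,
    STAGE_RUNNING,
} boot_stage_t;

#define STAGE_MAGIC 0x5A6E5A6EUL /* marks the breadcrumb as written */

/* Stand-in for a no-init RAM variable that survives a soft reset. */
static struct { uint32_t magic; uint32_t stage; } breadcrumb;

void stage_set(boot_stage_t s)
{
    breadcrumb.magic = STAGE_MAGIC;
    breadcrumb.stage = (uint32_t)s;
}

/* Last stage reached before the previous reset, or STAGE_NONE if unset. */
boot_stage_t stage_last(void)
{
    return breadcrumb.magic == STAGE_MAGIC ? (boot_stage_t)breadcrumb.stage
                                           : STAGE_NONE;
}
```

    Usage on target would be stage_set(STAGE_BEFORE_SD_ENABLE) just before nrf_sdh_enable_request() and stage_set(STAGE_AFTER_SD_ENABLE) right after it returns.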

    From here, it repeatedly reboots and crashes in the CRITICAL_REGION until I intervene and cause a correct reboot via drag&drop in order to read out the record.

    Are you as baffled as me yet?!?
