Failure in ble_stack_init during startup

We are trying to bring up a new run of an existing PCB design. No changes in the area of the processor, but a different assembly house. The first call inside ble_stack_init, a call to nrf_sdh_enable_request(), fails and returns an error code of 8. No reference anywhere gives a usable explanation.

Strangely, if started by "copying" the hex file onto the Jlink "drive" (the nRF52832 DevKit), it succeeds.

What should we be looking for?

  • Hello,

    It seems like the issue might be with how the firmware is programmed since it appears to work when you use drag&drop programming. How are you programming the device when it fails? Is the same error also returned after a reset? Note that the "SoftDevice enable" function may return NRF_ERROR_INVALID_STATE (8) if the debugger forces execution to start from the application start address instead of address 0x0. This prevents the softdevice's reset handler from running on startup.

    Best regards,

    Vidar
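For reference, the numeric value 8 can be mapped back to a name with a small lookup. This is a minimal sketch, not SDK code: the helper name `nrf_error_name` is hypothetical, and the values are the global error codes as I understand them from the SoftDevice header `nrf_error.h` (base number 0), so check them against the header in your own SDK version.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the nRF5 SDK).
 * Values are assumed to match the global codes in nrf_error.h,
 * where NRF_ERROR_BASE_NUM is 0. Only a subset is listed. */
const char *nrf_error_name(uint32_t err_code)
{
    switch (err_code) {
    case 0:  return "NRF_SUCCESS";
    case 1:  return "NRF_ERROR_SVC_HANDLER_MISSING";
    case 2:  return "NRF_ERROR_SOFTDEVICE_NOT_ENABLED";
    case 3:  return "NRF_ERROR_INTERNAL";
    case 4:  return "NRF_ERROR_NO_MEM";
    case 5:  return "NRF_ERROR_NOT_FOUND";
    case 6:  return "NRF_ERROR_NOT_SUPPORTED";
    case 7:  return "NRF_ERROR_INVALID_PARAM";
    case 8:  return "NRF_ERROR_INVALID_STATE";
    case 9:  return "NRF_ERROR_INVALID_LENGTH";
    default: return "unknown";
    }
}
```

Logging the name next to the raw number in a journal makes entries such as "returned 8" easier to interpret later.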

  • I was starting it from SES (Segger) just as I have on dozens of previous nRF52832 projects, and indeed successfully on previous runs of this board. So there's something different in the hardware, and I'm asking where to look. What sort of "invalid state" is likely?

  • Interesting thought. There's no reason in my code that it should sleep within a few minutes of startup. I tried your suggestion; no change. If run with a power cycle, or with F5 from within SES, it still goes into a continuous reset loop. If run with Build and Debug, it runs fine. I have code at the beginning of main() that flashes an LED via direct NRF_GPIO access, so no calls to anything in the Nordic code. I also tried adding the line you suggested above both before and after that code; no change.

    I have nonvolatile memory external to the nRF52832, wherein I can record a journal of actions. One thing I record is RESETREAS after each startup. It appears to have a value of 0x00000004 on a successful startup, and 0 when stuck in a loop. I will investigate that further.

    Suggestions based on the above?

  • Here's something more interesting. I recently added logging to my journal for app_error_fault_handler(). I have a breakpoint at the first line of that function, which never hit. However, DEBUG is apparently defined when doing Build and Debug, so my logging was tripped even though the code never (?) went thru the breakpoint. It recorded the function parameters: id - 00004001, pc - 00000000, info - 2000FF6C. Interpretation?

  • Digging further back into my journal, APP_ERROR_CHECK was tripped when nrf_sdh_ble_enable returned an error code of 8. Leading up to this, nrf_sdh_enable_request had also returned 8, but nrf_sdh_ble_default_cfg_set had returned 0.

    In attempting to troubleshoot this, I had added a call to nrf_sdh_disable_request at the very beginning of ble_stack_init, ignoring any error code it returned. Removing that line doesn't change anything.

  • An additional clue: When I erase my external nonvolatile memory (via a command on the BLE link), I finish by sending an acknowledgement message via BLE and then call NVIC_SystemReset(), and the system restarts perfectly. The message never shows up on the BLE link, presumably because the system gets reset before it goes out. The message acknowledging the beginning of the erase process always shows up, and the erase is indeed successful.

  • If run with Build and Debug, it runs fine.

    What I wanted to confirm is whether it continues to run fine on subsequent reboots as well.

    One thing I record is RESETREAS after each startup. It appears to have a value of 0x00000004 on a successful startup, and 0

    Does your FW clear the register after reading it? It is important to remember that this is a retained register. If it is zero, it indicates a POR or BOR reset. Another question is whether you can trust the recorded value if the device is stuck in a boot loop. I would instead suggest that you read out the RESETREAS register with your debugger.

    DEBUG is apparently defined when doing Build and Debug, so my logging was tripped even though the code never

    Whether DEBUG is defined or not depends on your build configuration. We include it in our debug build configurations in our SDK examples:

    id - 00004001, pc - 00000000, info - 2000FF6C

    You can get it to print out the file name, line number and error value by retrieving the error from the info pointer as done in the default handler here: 

    finish by sending an acknowledgement message via BLE and then call NVIC_SystemReset(), and the system restarts perfectly.

    Does it continue to work after this? E.g., after a power cycle.
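The recorded values (id 00004001, pc 00000000, an info pointer into RAM) look like what the SDK passes for a failed APP_ERROR_CHECK. A minimal sketch of decoding the info pointer follows, under stated assumptions: the struct mirrors my understanding of the SDK's `error_info_t` (line number, file name pointer, error code), 0x4001 is assumed to be `NRF_FAULT_ID_SDK_ERROR`, and the names `fault_error_info_t` and `format_sdk_fault` are hypothetical. Verify the layout against `app_error.h` in your SDK before relying on it.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed value of NRF_FAULT_ID_SDK_ERROR (SDK range start 0x4000 + 1). */
#define FAULT_ID_SDK_ERROR 0x4001u

/* Local mirror of the SDK's error_info_t; layout is an assumption. */
typedef struct {
    uint32_t       line_num;     /* line where the error was raised   */
    const uint8_t *p_file_name;  /* file name, may be NULL            */
    uint32_t       err_code;     /* the value given to APP_ERROR_CHECK */
} fault_error_info_t;

/* Hypothetical helper: formats the fault into buf.
 * Returns the snprintf result, or -1 if the id is not an SDK error
 * or the info pointer is null. */
int format_sdk_fault(uint32_t id, uintptr_t info, char *buf, size_t len)
{
    if (id != FAULT_ID_SDK_ERROR || info == 0) {
        return -1;
    }
    const fault_error_info_t *p = (const fault_error_info_t *)info;
    const char *file = p->p_file_name ? (const char *)p->p_file_name
                                      : "(unknown)";
    return snprintf(buf, len, "error 0x%08lX at %s:%lu",
                    (unsigned long)p->err_code, file,
                    (unsigned long)p->line_num);
}
```

In a handler that writes to an external journal, the formatted string (or the three raw fields) could be stored instead of the bare info pointer, which is only meaningful while the stack frame it points to still exists.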

  • ResetReason = NRF_POWER -> RESETREAS;
    NRF_POWER -> RESETREAS = ResetReason;

    The debugger works poorly or not at all beyond being a program loader when using the softdevice. I added the code you suggested above, and never got it to output (never reached). I added some probes to see where we're going off the rails, and now we're resetting somewhere inside ble_stack_init(). I'll add some more probes inside there.

    How late are you working today? Pretty sure you're in a time zone several hours ahead of me.

  • Correction to the above: It's derailing in nrf_sdh_enable_request, which is the first function called from ble_stack_init.

  • In nrf_sdh_enable_request(), it's resetting in this critical region:

        CRITICAL_REGION_ENTER();
    #ifdef ANT_LICENSE_KEY
        ret_code = sd_softdevice_enable(&clock_lf_cfg, app_error_fault_handler, ANT_LICENSE_KEY);
    #else
        ret_code = sd_softdevice_enable(&clock_lf_cfg, app_error_fault_handler);
    #endif
        m_nrf_sdh_enabled = (ret_code == NRF_SUCCESS);
        CRITICAL_REGION_EXIT();

    According to the highlighting by SES, ANT_LICENSE_KEY is not defined, so it's resetting somewhere in sd_softdevice_enable, but apparently without getting to app_error_fault_handler. I have no visibility into the softdevice, so please wave your magic wand over it and tell me what I'm doing wrong.

  • The limitation when debugging with the softdevice is that you generally cannot continue execution after halting the CPU (for example, when hitting a breakpoint), as the softdevice will detect that it has failed to meet its timing requirements and trigger an error (app_error_fault_handler() is called with id=NRF_FAULT_ID_SD_ASSERT). But I don't see why this should be a problem here.

    In any case, based on what you said earlier, the problem is not reproducible in debug mode. So you would need to use nrfjprog --memrd <register address> from the command line, or a similar approach, to read the register from the running target after the problem has occurred. The challenge here is that you may end up reading the register after it has already been cleared by your FW. So I suggest commenting out the NRF_POWER -> RESETREAS = ResetReason; line first, then checking whether it still reads as 0 when the device is in the boot loop.

    In nrf_sdh_enable_request(), it's resetting in this critical region:

    There is nothing that can lead to NVIC_SystemReset() being called within this function. To verify, you can search through your project for NVIC_SystemReset() to see if it is called anywhere other than the error handler.

    I added the code you suggested above, and never got it to output (never reached).

    Does this mean that you have redefined app_error_fault_handler() in your project? The picture I posted shows the default handler used in the SDK examples.

    I will be signing off now (I'm based in Norway, so I'm a few hours ahead), but I hope you will be able to gather some more concrete debugging evidence of what is happening.
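Once a raw RESETREAS value has been captured, whether from the journal or from a memory read, it can be decoded bit by bit. The sketch below assumes the nRF52832 bit layout as I understand it from the product specification (RESETPIN, DOG, SREQ, LOCKUP in bits 0 to 3; OFF, LPCOMP, DIF, NFC in bits 16 to 19), with the register at NRF_POWER base 0x40000000 plus offset 0x400. The helper name `describe_resetreas` is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed nRF52832 RESETREAS bit positions; verify against the
 * product specification for your device. */
typedef struct {
    uint32_t    mask;
    const char *name;
} resetreas_bit_t;

static const resetreas_bit_t k_resetreas_bits[] = {
    { 1u << 0,  "RESETPIN" },  /* pin reset                    */
    { 1u << 1,  "DOG"      },  /* watchdog                     */
    { 1u << 2,  "SREQ"     },  /* soft reset request           */
    { 1u << 3,  "LOCKUP"   },  /* CPU lockup                   */
    { 1u << 16, "OFF"      },  /* wake from System OFF (GPIO)  */
    { 1u << 17, "LPCOMP"   },  /* wake from System OFF (LPCOMP)*/
    { 1u << 18, "DIF"      },  /* wake via debug interface     */
    { 1u << 19, "NFC"      },  /* wake via NFC field           */
};

/* Hypothetical helper: writes a '|'-separated list of set flags into buf.
 * A value of 0 is reported as a power-on or brownout reset. */
void describe_resetreas(uint32_t value, char *buf, size_t len)
{
    size_t used = 0;
    if (len == 0) {
        return;
    }
    buf[0] = '\0';
    if (value == 0) {
        snprintf(buf, len, "none (power-on or brownout reset)");
        return;
    }
    for (size_t i = 0; i < sizeof k_resetreas_bits / sizeof k_resetreas_bits[0]; i++) {
        if ((value & k_resetreas_bits[i].mask) && used < len) {
            int n = snprintf(buf + used, len - used, "%s%s",
                             used ? "|" : "", k_resetreas_bits[i].name);
            if (n < 0) {
                break;
            }
            used += (size_t)n;
        }
    }
}
```

Under these assumptions, the 0x00000004 seen on successful startups decodes to SREQ, a soft reset, which would be consistent with a start triggered by NVIC_SystemReset() or by the debugger rather than by a power cycle.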
