Failure in ble_stack_init during startup

We are trying to bring up a new run of an existing PCB design. No changes in the area of the processor, but a different assembly house. The first call inside ble_stack_init, a call to nrf_sdh_enable_request(), fails and returns an error code of 8. No reference anywhere gives a usable explanation.

Strangely, if started by "copying" the hex file onto the Jlink "drive" (the nRF52832 DevKit), it succeeds.

What should we be looking for?

Parents
  • Hello,

    It seems like the issue might be with how the firmware is programmed since it appears to work when you use drag&drop programming. How are you programming the device when it fails? Is the same error also returned after a reset? Note that the "SoftDevice enable" function may return NRF_ERROR_INVALID_STATE (8) if the debugger forces execution to start from the application start address instead of address 0x0. This prevents the softdevice's reset handler from running on startup.

    Best regards,

    Vidar

  • I was starting it from SES (Segger) just as I have on dozens of previous nRF52832 projects, and indeed successfully on previous runs of this board. So there's something different in the hardware, and I'm asking where to look. What sort of "invalid state" is likely?

  • Some more digging:

    Per the above, the first time in from a startup via either power cycle or SES F5, both nrf_sdh_enable_request() and nrf_sdh_ble_enable() return an error code of 8, and sometime after, the error handler gets called with the report above - no pc, nothing to indicate from what it was called.

    I have also added probes that show that the reset is coming from the call to sd_softdevice_enable(), but that is probably just a symptom of the soft device not being properly started above. Once it happens, though, I get continual resets, entering nrf_sdh_enable_request() and crashing with a reset in sd_softdevice_enable() every time.

    I'll be asleep when you come in to work this morning (Monday) as we're 6 hours behind you. But I should be online by about noon your time.

  • The inconsistencies are making it difficult to troubleshoot this issue. For example, earlier you indicated that the problem did not occur while debugging:

    SteveHx said:
    If run with Build and Debug, it runs fine

    But now it fails advertising_init() with NRF_ERROR_NO_MEM, or in sd_softdevice_enable() with NRF_ERROR_INVALID_STATE.

    SteveHx said:
    I have also added probes that show that the reset is coming from the call to sd_softdevice_enable(), but that is probably just a symptom of the soft device not being properly started above

    The softdevice call can return an error code, but it cannot trigger a system reset. Assuming this function is only called once during startup, the only reason it can return NRF_ERROR_INVALID_STATE is that the softdevice's reset handler was not executed during the boot sequence due to the debugger.

    Please confirm that you have the following debugger setting in your project configuration 

    Just to illustrate, here is what it looks like if I enable this setting on my side:

    Crash log if I let it run to error handler:

    If I place a breakpoint at the line number reported by the crashlog:

    And when I step into the nrf_sdh_enable_request() function:

    SteveHx said:
    The system starts up fine from either loading firmware via drag&drop or from a programmed NVIC_SystemReset

    I assume you are calling NVIC_SystemReset() from your error handler, since the device is entering a boot loop. In that case, any runtime error should have causes the device to recover from this state.

  • Nothing inconsistent except the assumption that by F5 I mean Build&Debug. F5 on SES is Build&Run. To summarize:

    If run via drag&drop firmware loading, it runs fine for hours.

    If run via F5 (Build&Run), or if run from a power cycle, it resets continuously.

    When I trigger an erase of my external NV memory via a BLE command, the last step is to do NVIC_SystemReset, and that reset results in a successful run.

    If I could get it to run from power-up, I'd be done.

    Running identical firmware on a previous rev board does not result in these resets, so clearly there's some hardware difference. Is it possible that a reset could be caused by the 32 KHz oscillator not yet being up and stable at the call to start up BLE? I'm well aware that the osc is sensitive to loading, so directly probing with my scope is not conclusive. How best to determine if the 32 KHz osc is running, via some section of the chip I can access in my code?

  • Probing the idea of the LF clock further. On a board that starts up fine from F5 or power up, I sample LFCLKSTAT early in the startup - all zeroes. I attempt to start it with

    NRF_CLOCK -> TASKS_LFCLKSTOP;
    NRF_CLOCK -> LFCLKSRC = 1;
    NRF_CLOCK -> TASKS_LFCLKSTART;

    And then test LFCLKSTAT up to 1 million times, always zero. On the good board, proceeding thru the BLE initialization and then testing LFCLKSTAT yields the expected 00010001. So I seem to have added another mystery: Why can't I start the clock? And the fact that it hasn't started prior to calling ble_stack_init() doesn't seem to be a problem. But if ble_stack_init() can't get it started, that could be the source of the hangups / resets on the new rev board.

    Am I at all close here?

  • F5 has always been assigned to "Debug->Go" on my setups (linux, macOS, and Windows with SES v5.x), hence my assumption that you were starting a debug session. "Debug->Go" is equivalent to "Build -> Build&Debug". Either way, thanks for clearing this up.

    I would say the expected failure mode for an LF clock issue is either that the program hangs forever inside sd_softdevice_enable() because the oscillator fails to start, or BLE connectivity issues due to frequency drift. What it cannot do is cause the invalid state error you are seeing.

    I'm still convinced that the issue is that the softdevice's reset handler was skipped during boot. Again, this is expected and a known limitation when using build&run. I do not know why you have not experienced this before with other boards, but the problem can easily get masked as I mentioned earlier. If you want to test for this hypothesis you can add the code below temporary to your project.

    #define RESET_HANDLER_ADDR (*(volatile uint32_t *) 0x4)
    #define SOFTDEVICE_RESET_HANDLER_EXECUTED 0xAA
    
    typedef void (*reset_handler_t)(void);
    
    static void run_reset_handler_once(void)
    {
        if (NRF_POWER->GPREGRET == SOFTDEVICE_RESET_HANDLER_EXECUTED)
        {
            /* Already ran fon af previous reset — skip */
            NRF_POWER->GPREGRET = 0xFF;
            return;
        }
    
        /* Mark it before jumping, in case reset handler doesn't return */
        NRF_POWER->GPREGRET = SOFTDEVICE_RESET_HANDLER_EXECUTED;
    
        ((reset_handler_t)(RESET_HANDLER_ADDR))();
    }
    
    /**@brief Function for application main entry.
     */
    int main(void)
    {
        /* For debugging purposes - manually invoke the MBR->Softdevice's reset handler */
        run_reset_handler_once();
        

Reply
  • F5 has always been assigned to "Debug->Go" on my setups (linux, macOS, and Windows with SES v5.x), hence my assumption that you were starting a debug session. "Debug->Go" is equivalent to "Build -> Build&Debug". Either way, thanks for clearing this up.

    I would say the expected failure mode for an LF clock issue is either that the program hangs forever inside sd_softdevice_enable() because the oscillator fails to start, or BLE connectivity issues due to frequency drift. What it cannot do is cause the invalid state error you are seeing.

    I'm still convinced that the issue is that the softdevice's reset handler was skipped during boot. Again, this is expected and a known limitation when using build&run. I do not know why you have not experienced this before with other boards, but the problem can easily get masked as I mentioned earlier. If you want to test for this hypothesis you can add the code below temporary to your project.

    #define RESET_HANDLER_ADDR (*(volatile uint32_t *) 0x4)
    #define SOFTDEVICE_RESET_HANDLER_EXECUTED 0xAA
    
    typedef void (*reset_handler_t)(void);
    
    static void run_reset_handler_once(void)
    {
        if (NRF_POWER->GPREGRET == SOFTDEVICE_RESET_HANDLER_EXECUTED)
        {
            /* Already ran fon af previous reset — skip */
            NRF_POWER->GPREGRET = 0xFF;
            return;
        }
    
        /* Mark it before jumping, in case reset handler doesn't return */
        NRF_POWER->GPREGRET = SOFTDEVICE_RESET_HANDLER_EXECUTED;
    
        ((reset_handler_t)(RESET_HANDLER_ADDR))();
    }
    
    /**@brief Function for application main entry.
     */
    int main(void)
    {
        /* For debugging purposes - manually invoke the MBR->Softdevice's reset handler */
        run_reset_handler_once();
        

Children
  • I'm not sure that would cause the continuous resets I'm seeing, but that's just the code I needed to test for a proper reset. Will run and get back to you shortly.

  • Confirming again, even with your added "run once" reset code: If started from a drag&drop, it runs exactly as expected. If started from F5 (default is Build&Run in SES, which is how I left it), or from a power cycle, it gets to the call to sd_softdevice_enable() in the CRITICAL_REGION in nrf_sdh_enable_request() and resets somewhere before it returns from that call. I know this can't be, but somewhere in this house of cards hides a joker....

  • I have tried to start up the LF clock with the following routines, before any calls into the softdevice. It does not start up on either board, but after a correct startup (any method on the old rev, or via drag&drop on the new boards), it always shows running. I speculate that there's some problem in the softdevice starting the clock, but how to find it?

    void CheckLfClock (void)
    {
    char Str [80];
    sprintf (Str, "LFCLKSTAT %08X", NRF_CLOCK -> LFCLKSTAT);
    LogToJournal (Str);
    }

    void StartLfClock (void)
    {
    LogToJournal ("Starting LF Clock");
    NRF_CLOCK -> TASKS_LFCLKSTOP;
    NRF_CLOCK -> LFCLKSRC = 1;
    NRF_CLOCK -> TASKS_LFCLKSTART;
    }

    void AwaitLfClock (void)
    {
    int i;
    char Str [80];
    for (i = 1000000; i; --i) if (NRF_CLOCK -> LFCLKSTAT) break;
    sprintf (Str, "%d tries remaining", i);
    LogToJournal (Str);
    }

    I also tried clearing SOFTDEVICE_RESET_HANDLER_EXECUTED after recording its value and before the call to the softdevice, thinking I could at least trap any reset occurring there. No joy.

Related