Failure in ble_stack_init during startup

We are trying to bring up a new run of an existing PCB design. No changes in the area of the processor, but a different assembly house. The first call inside ble_stack_init, a call to nrf_sdh_enable_request(), fails and returns an error code of 8. No reference anywhere gives a usable explanation.

Strangely, if started by "copying" the hex file onto the Jlink "drive" (the nRF52832 DevKit), it succeeds.

What should we be looking for?

  • Hello,

    It seems like the issue might be with how the firmware is programmed since it appears to work when you use drag&drop programming. How are you programming the device when it fails? Is the same error also returned after a reset? Note that the "SoftDevice enable" function may return NRF_ERROR_INVALID_STATE (8) if the debugger forces execution to start from the application start address instead of address 0x0. This prevents the softdevice's reset handler from running on startup.

    Best regards,

    Vidar

  • I was starting it from SES (Segger) just as I have on dozens of previous nRF52832 projects, and indeed successfully on previous runs of this board. So there's something different in the hardware, and I'm asking where to look. What sort of "invalid state" is likely?

  • I just discovered that the working boards were built with the nRF52832QFAAE0, while the boards that fail have the E1 variant. Extensive searching online did not reveal any relevant differences. Can this be the whole problem?

  • The observations you have described do not indicate an LF clock issue. If I have understood you correctly, these are the three problems you have observed:

    1. Failure in advertising_init() with NRF_ERROR_NO_MEM.
    2. NRF_ERROR_INVALID_STATE returned from sd_softdevice_enable() (again, this is expected on the first run when using Build & Run, unless you still have my run_reset_handler_once() test function at the start of main()).
    3. A sudden reset with an unconfirmed reset source occurring somewhere in ble_stack_init().

    However, it is unclear to me under which conditions these issues occur and when they do not. To find the root cause and narrow down the problem, we first need to understand exactly what the error is.
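
    One way to confirm the reset source for item 3 is to decode the POWER peripheral's RESETREAS register early in main(). The following is a minimal sketch, not code from this project: the helper name reset_reason_name() is hypothetical, and the bit positions are taken from the nRF52832 product specification.

```c
#include <stdint.h>

/* Bit positions of POWER.RESETREAS on the nRF52832. */
#define RESETREAS_RESETPIN (1UL << 0)   /* pin reset */
#define RESETREAS_DOG      (1UL << 1)   /* watchdog */
#define RESETREAS_SREQ     (1UL << 2)   /* soft reset (e.g. NVIC_SystemReset) */
#define RESETREAS_LOCKUP   (1UL << 3)   /* CPU lock-up */
#define RESETREAS_OFF      (1UL << 16)  /* wake from System OFF via GPIO */
#define RESETREAS_LPCOMP   (1UL << 17)  /* wake from System OFF via LPCOMP */
#define RESETREAS_DIF      (1UL << 18)  /* wake from System OFF via debug interface */
#define RESETREAS_NFC      (1UL << 19)  /* wake from System OFF via NFC field */

/* Hypothetical helper: names the first flag found in a RESETREAS value.
 * The register is cumulative, so several flags may be set at once;
 * only the lowest-numbered one is reported here. */
const char *reset_reason_name(uint32_t resetreas)
{
    if (resetreas == 0)                 return "power-on or brownout";
    if (resetreas & RESETREAS_RESETPIN) return "reset pin";
    if (resetreas & RESETREAS_DOG)      return "watchdog";
    if (resetreas & RESETREAS_SREQ)     return "soft reset";
    if (resetreas & RESETREAS_LOCKUP)   return "CPU lockup";
    if (resetreas & RESETREAS_OFF)      return "wake from System OFF (GPIO)";
    if (resetreas & RESETREAS_LPCOMP)   return "wake from System OFF (LPCOMP)";
    if (resetreas & RESETREAS_DIF)      return "wake from System OFF (debug)";
    if (resetreas & RESETREAS_NFC)      return "wake from System OFF (NFC)";
    return "unknown";
}
```

    On the target, the value would typically come from NRF_POWER->RESETREAS before the SoftDevice is enabled, or from sd_power_reset_reason_get() afterwards. The flags accumulate across resets until they are cleared by writing 1 to them, so clear the register after logging it.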

    If you still want to test with the LF RC oscillator, you can remove the test code you posted, and instead apply the following changes to the Softdevice clock configuration in sdk_config.h:

    SteveHx said:
    void StartLfClock (void)
    {
    LogToJournal ("Starting LF Clock");
    NRF_CLOCK -> TASKS_LFCLKSTOP;
    NRF_CLOCK -> LFCLKSRC = 1;
    NRF_CLOCK -> TASKS_LFCLKSTART;
    }

    Note that a task is triggered by writing 1 to the TASK register. The code above is not writing anything to the register.

  • Note that a task is triggered by writing 1 to the TASK register. The code above is not writing anything to the register.

    Doohhhhhh.... I knew that! (He says while peeling egg from his face!) And here we see why it's really helpful to have a second person look at my code!

    Back to the issue at hand. Only item 3 above is still a problem. If I program with drag&drop, everything runs fine. But even after programming that way and later power cycling, the system appears to reset when sd_softdevice_enable() is called from within nrf_sdh_enable_request() which is called from ble_stack_init() which is called early in main(). So what is different about the way it resets from drag&drop programming vs. how reset works from power up?

  • Starting LFCLK works MUCH better that way! (surprise, surprise!) But when bit 16 of LFCLKSTAT goes high, the LSBs are all zero, indicating that it's running the RC clock rather than the xtal clock I requested.

    And with LFCLK running, I'm getting some other funny behavior that I need to investigate further.

    void CheckLfClock (void)
    {
    char Str [80];
    sprintf (Str, "LFCLKSTAT %08X", NRF_CLOCK -> LFCLKSTAT);
    LogToJournal (Str);
    }

    void StartLfClock (void)
    {
    LogToJournal ("Starting LF Clock");
    NRF_CLOCK -> TASKS_LFCLKSTOP = 1;
    NRF_CLOCK -> LFCLKSRC = 1;
    NRF_CLOCK -> TASKS_LFCLKSTART = 1;
    }

    void AwaitLfClock (void)
    {
    int i;
    char Str [80];
    for (i = 1000000; i; --i) if (NRF_CLOCK -> LFCLKSTAT) break;
    sprintf (Str, "%d tries remaining", i);
    LogToJournal (Str);
    }

    I will have to drop offline shortly and won't be back in the office till after your work hours, both today and tomorrow. I'm hoping you can illuminate what's different about the reset upon drag&drop programming vs. power up.

  • SteveHx said:
    Starting LFCLK works MUCH better that way! (surprise, surprise!) But when bit 16 of LFCLKSTAT goes high, the LSBs are all zero, indicating that it's running the RC clock rather than the xtal clock I requested.

    The clock will start running from the RC oscillator after the start task is triggered while waiting on the crystal oscillator to ramp up. So you must wait on the LF clock started event before checking the status register to determine whether the crystal oscillator was able to start or not.

        NRF_CLOCK->EVENTS_LFCLKSTARTED = 0;
        NRF_CLOCK->TASKS_LFCLKSTART = 1;
        while(NRF_CLOCK->EVENTS_LFCLKSTARTED == 0);
        //code will not reach this point if LFXO fails to start

    Also, it can be problematic to start the oscillator with one source and then have the softdevice select a different source in sd_softdevice_enable(). That was the reason I suggested removing your test code and instead testing with different softdevice clock configurations in my previous reply.
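
    To make the status check described above concrete, here is a hedged sketch of decoding LFCLKSTAT and of a wait loop that gives up instead of hanging when the crystal never starts. The helper names and the retry count are assumptions for illustration. The field layout (SRC in bits 1:0, STATE in bit 16) follows the nRF52832 product specification.

```c
#include <stdint.h>
#include <stdbool.h>

#define LFCLKSTAT_SRC_MASK   0x3UL        /* bits 1:0: active clock source */
#define LFCLKSTAT_STATE_MASK (1UL << 16)  /* bit 16: LFCLK running */

typedef enum {
    LFCLK_SRC_RC    = 0,  /* internal RC oscillator */
    LFCLK_SRC_XTAL  = 1,  /* external 32.768 kHz crystal */
    LFCLK_SRC_SYNTH = 2   /* synthesized from HFCLK */
} lfclk_src_t;

/* True if the STATE bit reports that LFCLK is running. */
bool lfclk_is_running(uint32_t lfclkstat)
{
    return (lfclkstat & LFCLKSTAT_STATE_MASK) != 0;
}

/* Extracts the SRC field; only meaningful once the started event has fired. */
lfclk_src_t lfclk_source(uint32_t lfclkstat)
{
    return (lfclk_src_t)(lfclkstat & LFCLKSTAT_SRC_MASK);
}

/* Polls an event register up to max_polls times.
 * Returns true if the event fired, false if the retries ran out. */
bool wait_for_event(volatile const uint32_t *event_reg, uint32_t max_polls)
{
    while (max_polls--) {
        if (*event_reg != 0) {
            return true;
        }
    }
    return false;
}
```

    On the target this could be used as wait_for_event(&NRF_CLOCK->EVENTS_LFCLKSTARTED, 1000000) after triggering TASKS_LFCLKSTART, followed by lfclk_source(NRF_CLOCK->LFCLKSTAT). A false return would then suggest the crystal did not start, rather than leaving the code stuck in the while loop.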

    SteveHx said:
    which is called from ble_stack_init() which is called early in main(). So what is different about the way it resets from drag&drop programming vs. how reset works from power up?

    I can't think of any relevant differences here, and I think we are missing too much concrete information about the failure to make meaningful guesses. It would be better if we could first confirm what is actually failing after a cold boot. Perhaps the external memory fails to get ready in time after power-up.

    You could try adding a few endless loops at different points in main() and then run some test iterations, removing the loops one at a time, until you find the point that is no longer reached or where the device starts going into a boot loop.

        CRITICAL_REGION_ENTER();
    #ifdef ANT_LICENSE_KEY
        ret_code = sd_softdevice_enable(&clock_lf_cfg, app_error_fault_handler, ANT_LICENSE_KEY);
    #else
        ret_code = sd_softdevice_enable(&clock_lf_cfg, app_error_fault_handler);
        // Test code: trap here if enabling the SoftDevice returns an error
        if(ret_code != NRF_SUCCESS)
        {
            for(;;);
        }
    #endif
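
    The endless-loop bisection suggested above can also be written as a small macro, so that the halt point is moved by changing one number instead of editing several loops. This is only a sketch: HALT_AT_CHECKPOINT, CHECKPOINT() and checkpoint_should_halt() are hypothetical names, not part of the SDK.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical build-time setting: the checkpoint at which to stop.
 * 0 disables halting, so all checkpoints fall through. */
#ifndef HALT_AT_CHECKPOINT
#define HALT_AT_CHECKPOINT 0
#endif

/* True if execution should stop at checkpoint `id`. */
static inline bool checkpoint_should_halt(uint32_t id, uint32_t halt_at)
{
    return halt_at != 0 && id == halt_at;
}

/* Spins forever when `id` matches HALT_AT_CHECKPOINT, otherwise does nothing. */
#define CHECKPOINT(id)                                              \
    do {                                                            \
        if (checkpoint_should_halt((id), HALT_AT_CHECKPOINT)) {     \
            for (;;) { }                                            \
        }                                                           \
    } while (0)
```

    Placing CHECKPOINT(1), CHECKPOINT(2), and so on at successive points in main() and rebuilding with a higher HALT_AT_CHECKPOINT each time would show the last point reached after a cold boot before the reset occurs.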
