Failure in ble_stack_init during startup

We are bringing up a new production run of an existing PCB design. Nothing has changed in the processor area, but the boards were built by a different assembly house. The first call inside ble_stack_init(), nrf_sdh_enable_request(), fails and returns error code 8. No reference I can find gives a usable explanation.

Strangely, if it is started by "copying" the hex file onto the J-Link "drive" (the nRF52832 DevKit), it succeeds.

What should we be looking for?

  • Hello,

    It seems like the issue might be with how the firmware is programmed since it appears to work when you use drag&drop programming. How are you programming the device when it fails? Is the same error also returned after a reset? Note that the "SoftDevice enable" function may return NRF_ERROR_INVALID_STATE (8) if the debugger forces execution to start from the application start address instead of address 0x0. This prevents the softdevice's reset handler from running on startup.
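
    Error code 8 can be decoded from the SDK headers: the global nRF5 error codes live in nrf_error.h, counting up from NRF_ERROR_BASE_NUM (0x0). Below is a minimal lookup sketch for codes 0 to 19. nrf_error_name() is a hypothetical helper, and module-specific codes (the SDK and BLE ranges at higher bases) are not covered. If your SDK version ships the strerror library, nrf_strerror_get() serves the same purpose on target.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    /* Names of the global nRF5 error codes from nrf_error.h (base 0x0).
     * The array index is the numeric error code. */
    static const char *const k_nrf_error_names[] = {
        "NRF_SUCCESS",                      /* 0 */
        "NRF_ERROR_SVC_HANDLER_MISSING",    /* 1 */
        "NRF_ERROR_SOFTDEVICE_NOT_ENABLED", /* 2 */
        "NRF_ERROR_INTERNAL",               /* 3 */
        "NRF_ERROR_NO_MEM",                 /* 4 */
        "NRF_ERROR_NOT_FOUND",              /* 5 */
        "NRF_ERROR_NOT_SUPPORTED",          /* 6 */
        "NRF_ERROR_INVALID_PARAM",          /* 7 */
        "NRF_ERROR_INVALID_STATE",          /* 8 */
        "NRF_ERROR_INVALID_LENGTH",         /* 9 */
        "NRF_ERROR_INVALID_FLAGS",          /* 10 */
        "NRF_ERROR_INVALID_DATA",           /* 11 */
        "NRF_ERROR_DATA_SIZE",              /* 12 */
        "NRF_ERROR_TIMEOUT",                /* 13 */
        "NRF_ERROR_NULL",                   /* 14 */
        "NRF_ERROR_FORBIDDEN",              /* 15 */
        "NRF_ERROR_INVALID_ADDR",           /* 16 */
        "NRF_ERROR_BUSY",                   /* 17 */
        "NRF_ERROR_CONN_COUNT",             /* 18 */
        "NRF_ERROR_RESOURCES",              /* 19 */
    };

    /* Hypothetical helper: map a ret_code_t value to its nrf_error.h name. */
    const char *nrf_error_name(uint32_t code)
    {
        size_t count = sizeof k_nrf_error_names / sizeof k_nrf_error_names[0];
        return (code < count) ? k_nrf_error_names[code]
                              : "UNKNOWN (not a global nrf_error.h code)";
    }
    ```

    Logging nrf_error_name(ret_code) next to the raw number turns "error code 8" into NRF_ERROR_INVALID_STATE directly in the log.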

    Best regards,

    Vidar

  • I was starting it from SES (Segger) just as I have on dozens of previous nRF52832 projects, and indeed successfully on previous runs of this board. So there's something different in the hardware, and I'm asking where to look. What sort of "invalid state" is likely?

  • The observations you have described do not indicate an LF clock issue. If I have understood you correctly, these are the three problems you have observed:

    1. Failure in advertising_init() with NRF_ERROR_NO_MEM.
    2. NRF_ERROR_INVALID_STATE returned from sd_softdevice_enable() (again, this is expected on the first run when using Build & Run, unless you still have my run_reset_handler_once() test function at the start of main()).
    3. A sudden reset with an unconfirmed reset source occurring somewhere in ble_stack_init().

    However, it is unclear to me under which conditions these issues occur and when they do not. To find the root cause and narrow down the problem, we first need to understand exactly what the error is.

    If you still want to test with the LF RC oscillator, you can remove the test code you posted, and instead apply the following changes to the Softdevice clock configuration in sdk_config.h:

    SteveHx said:
    void StartLfClock (void)
    {
        LogToJournal ("Starting LF Clock");
        NRF_CLOCK -> TASKS_LFCLKSTOP;
        NRF_CLOCK -> LFCLKSRC = 1;
        NRF_CLOCK -> TASKS_LFCLKSTART;
    }

    Note that a task is triggered by writing 1 to the TASK register. The code above is not writing anything to the register.
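
    For reference, this is roughly what a SoftDevice RC-oscillator configuration looks like in an nRF5 SDK 15.x/17.x sdk_config.h. The macro names are the SDK's own; the values are the commonly recommended nRF52 RC settings, so treat this as a sketch and not necessarily the exact set of changes meant above.

    ```c
    // sdk_config.h (nRF5 SDK 15.x/17.x naming): SoftDevice LF clock from the RC oscillator
    // NRF_SDH_CLOCK_LF_SRC: 0 = RC, 1 = XTAL, 2 = SYNTH
    #define NRF_SDH_CLOCK_LF_SRC 0
    // Calibration timer interval in 1/4 s units; 16 = calibrate every 4 s
    #define NRF_SDH_CLOCK_LF_RC_CTIV 16
    // If the temperature has not changed, calibrate only every 2nd interval (8 s)
    #define NRF_SDH_CLOCK_LF_RC_TEMP_CTIV 2
    // 1 = NRF_CLOCK_LF_ACCURACY_500_PPM, the accuracy used with the RC source
    #define NRF_SDH_CLOCK_LF_ACCURACY 1
    ```

    To go back to the 32.768 kHz crystal, NRF_SDH_CLOCK_LF_SRC becomes 1 and both CTIV values must be 0; the accuracy setting should then match the crystal's tolerance.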

  • Note that a task is triggered by writing 1 to the TASK register. The code above is not writing anything to the register.

    Doohhhhhh.... I knew that! (He says while peeling egg from his face!) And here we see why it's really helpful to have a second person look at my code!

    Back to the issue at hand. Only item 3 above is still a problem. If I program with drag&drop, everything runs fine. But even after programming that way and later power cycling, the system appears to reset when sd_softdevice_enable() is called from within nrf_sdh_enable_request() which is called from ble_stack_init() which is called early in main(). So what is different about the way it resets from drag&drop programming vs. how reset works from power up?

  • Starting LFCLK works MUCH better that way! (Surprise, surprise!) But when bit 16 of LFCLKSTAT goes high, the LSBs are all zero, indicating that it's running the RC clock rather than the xtal clock I requested.

    And with LFCLK running, I'm getting some other funny behavior that I need to investigate further.

    void CheckLfClock (void)
    {
        char Str [80];
        // LFCLKSTAT is a uint32_t; cast so it matches the %X conversion
        snprintf (Str, sizeof Str, "LFCLKSTAT %08X", (unsigned int) NRF_CLOCK -> LFCLKSTAT);
        LogToJournal (Str);
    }

    void StartLfClock (void)
    {
        LogToJournal ("Starting LF Clock");
        NRF_CLOCK -> TASKS_LFCLKSTOP = 1;
        NRF_CLOCK -> LFCLKSRC = 1;          // 1 = Xtal (LFXO)
        NRF_CLOCK -> TASKS_LFCLKSTART = 1;
    }

    void AwaitLfClock (void)
    {
        int i;
        char Str [80];
        for (i = 1000000; i; --i) if (NRF_CLOCK -> LFCLKSTAT) break;
        snprintf (Str, sizeof Str, "%d tries remaining", i);
        LogToJournal (Str);
    }

    I will have to drop offline shortly and won't be back in the office till after your work hours, both today and tomorrow. I'm hoping you can illuminate what's different about the reset upon drag&drop programming vs. power up.
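
    Register dumps like the LFCLKSTAT value logged by CheckLfClock() are easier to read with a small host-side decoder. The field layout below (SRC in bits 1:0 with 0 = RC, 1 = Xtal, 2 = Synth; STATE in bit 16) follows the nRF52832 product specification; the type and function names are hypothetical. A reading of 0x00010000 decodes as "running, source RC".

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* Field layout of the nRF52832 CLOCK->LFCLKSTAT register. */
    #define LFCLKSTAT_SRC_MASK  0x3u         /* bits 1:0: active LFCLK source */
    #define LFCLKSTAT_STATE_BIT (1u << 16)   /* 1 = LFCLK running */

    typedef struct {
        bool running;        /* STATE field */
        const char *source;  /* "RC", "Xtal", "Synth" or "Reserved" */
    } lfclk_status_t;

    /* Hypothetical helper: decode a raw LFCLKSTAT value read from the target. */
    lfclk_status_t lfclkstat_decode(uint32_t value)
    {
        static const char *const names[] = { "RC", "Xtal", "Synth", "Reserved" };
        lfclk_status_t s;
        s.running = (value & LFCLKSTAT_STATE_BIT) != 0;
        s.source  = names[value & LFCLKSTAT_SRC_MASK];
        return s;
    }
    ```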

  • SteveHx said:
    Starting LFCLK works MUCH better that way! (Surprise, surprise!) But when bit 16 of LFCLKSTAT goes high, the LSBs are all zero, indicating that it's running the RC clock rather than the xtal clock I requested.

    The clock will start running from the RC oscillator after the start task is triggered while waiting on the crystal oscillator to ramp up. So you must wait on the LF clock started event before checking the status register to determine whether the crystal oscillator was able to start or not.

        NRF_CLOCK->EVENTS_LFCLKSTARTED = 0;
        NRF_CLOCK->TASKS_LFCLKSTART = 1;
        while(NRF_CLOCK->EVENTS_LFCLKSTARTED == 0);
        //code will not reach this point if LFXO fails to start
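
    The wait above blocks forever if the LFXO never starts. For a diagnostic build, a bounded variant lets the firmware log the failure instead of hanging. This is a minimal sketch, assuming a plain poll count is an acceptable timeout; wait_for_event() is a hypothetical helper, not an SDK function, and it takes the event register by pointer so it can also be exercised on a host.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical helper: poll an nRF5 EVENTS_xxx register until it reads
     * non-zero, giving up after max_polls reads. Returns true if the event fired.
     * The volatile pointer forces a fresh register read on every iteration. */
    bool wait_for_event(volatile const uint32_t *event_reg, uint32_t max_polls)
    {
        for (uint32_t i = 0; i < max_polls; ++i) {
            if (*event_reg != 0) {
                return true;
            }
        }
        return false;
    }
    ```

    On the target it would be called as wait_for_event(&NRF_CLOCK->EVENTS_LFCLKSTARTED, 1000000) right after clearing the event and triggering TASKS_LFCLKSTART. The poll count is arbitrary and would need to cover the crystal's start-up time at the CPU's polling rate.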

    Also, it can be problematic to start the oscillator with one source and then have the SoftDevice select a different source in sd_softdevice_enable(). That was the reason I suggested removing your test code and instead testing with different SoftDevice clock configurations in my previous reply.

    SteveHx said:
    which is called from ble_stack_init() which is called early in main(). So what is different about the way it resets from drag&drop programming vs. how reset works from power up?

    I can't think of any relevant differences here, and I think we are missing too much concrete information about the failure to make meaningful guesses. It would be better to first confirm what is actually failing after a cold boot. Perhaps the external memory is not ready in time after power-up.

    You could try adding endless loops at several points in main() and then running a series of tests, removing the loops one at a time, until you find the point that is no longer reached or where the device starts boot-looping.

        CRITICAL_REGION_ENTER();
    #ifdef ANT_LICENSE_KEY
        ret_code = sd_softdevice_enable(&clock_lf_cfg, app_error_fault_handler, ANT_LICENSE_KEY);
    #else
        ret_code = sd_softdevice_enable(&clock_lf_cfg, app_error_fault_handler);
        if(ret_code != NRF_SUCCESS) 
        {
            for(;;);
        }
    #endif

  • Thank you for the clarification on LFCLK. It appears that everything there works as it should, now that I understand it.

    I had previously tried experimenting along the lines you suggest, changing the timing of the BLE initialization relative to startup. I was very rushed getting out the door this morning, but I saw something (that I need to go back and find) suggesting that's the answer.

    Is there any relevant difference between the E0 and E1 variants of the nRF52832QFAA?

    I'm guessing you're just leaving, if not already out the door for today. So have a good evening and look for some more info from me in the morning. Tomorrow I'm gone pretty much all day, except for a brief window in the office around 0530-0630 EDT (0930-1030Z).

  • I have been experimenting with different delays near the beginning of main(), before calling any of the BLE initialization. With no delay, or with a null for() loop of 10000 cycles, a reset happens sometime after calling sd_softdevice_enable() and before it returns. If I increase the for() loop to 20000 or higher (up to 10000000), it seems to reset before the loop completes.

    There's got to be some fundamental difference between nRF52832QFAAE0 and the E1 variant. Here's what we know about the various programming methods, what works and what doesn't:

    Method          E0      E1
    -------------   -----   -----
    Drag&drop       Works   Works
    Power-up        Works   Fails
    Build&debug     Works   Works
    Build&run       Works   Fails

    The critical thing is being able to run from powerup. Same firmware, same schematic and PCB layout in the area of the processor, same hex image, E1 fails at the call to sd_softdevice_enable().

  • It sounds like this reset may occur even if you only have a busy wait loop in main(), which does point to a HW problem if true. We should try to confirm the reset source. To do that, please remove the part of your code that clears RESETREAS as suggested earlier, and instead read the RESETREAS value directly using your debugger. This avoids relying on the application to capture and store the register value correctly. You can use nrfjprog or nrfutil from the command line to read out the register after the reset has occurred. 

    # Read out POWER->RESETREAS 
    nrfjprog --memrd 0x40000400
    nrfutil device read --address 0x40000400 --direct
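
    Once the register has been read, the bits can be decoded with a small host-side helper like the one below. The names are hypothetical; the bit positions come from the POWER chapter of the nRF52832 product specification. The flags are cumulative until cleared by writing 1 to them, and per the product specification a value with no flags set means the last reset came from the on-chip reset generator, i.e. a power-on or brown-out reset.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    typedef struct {
        uint32_t mask;
        const char *name;
    } resetreas_flag_t;

    /* nRF52832 POWER->RESETREAS flags (register at 0x40000400). */
    static const resetreas_flag_t k_resetreas_flags[] = {
        { 1u << 0,  "RESETPIN (pin reset)" },
        { 1u << 1,  "DOG (watchdog)" },
        { 1u << 2,  "SREQ (soft reset)" },
        { 1u << 3,  "LOCKUP (CPU lock-up)" },
        { 1u << 16, "OFF (wake from System OFF via GPIO DETECT)" },
        { 1u << 17, "LPCOMP (wake from System OFF via LPCOMP)" },
        { 1u << 18, "DIF (wake from System OFF into debug interface mode)" },
        { 1u << 19, "NFC (wake from System OFF via NFC field)" },
    };

    /* Hypothetical helper: store up to max_out reason strings in out[] and
     * return how many were stored. All-zero is reported as POR/brown-out. */
    size_t resetreas_decode(uint32_t value, const char **out, size_t max_out)
    {
        size_t n = 0;
        if (value == 0) {
            if (max_out == 0) {
                return 0;
            }
            out[0] = "none set: power-on reset or brown-out reset";
            return 1;
        }
        for (size_t i = 0; i < sizeof k_resetreas_flags / sizeof k_resetreas_flags[0]; ++i) {
            if ((value & k_resetreas_flags[i].mask) != 0 && n < max_out) {
                out[n++] = k_resetreas_flags[i].name;
            }
        }
        return n;
    }
    ```

    Feeding it the word printed by nrfjprog --memrd 0x40000400 gives a readable list; an all-zero result would point at the power-on/brown-out path rather than a watchdog, lock-up or soft reset.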

    Regarding the build code difference, please have a look at the PCN here: https://nrfconnectdocs.nordicsemi.com/pdf/PCN/pcn_106_v1.0.pdf. It's the same silicon. It's just that it's tested and assembled at different locations.

  • That's no help because the reset doesn't occur if run with the debugger. I already record RESETREAS into NV memory right at the beginning of main(), and it's all zeroes in the failure case. ?!?!?

  • That is exactly why I am suggesting you attach the debugger to the running target after the reset has occurred.

  • How do I go about that? Any time I run the debugger, it resets the code to the start of main(), and stepping or just running from there runs flawlessly. Apparently you know some new and very useful trick I need to learn.
