Hang with nRF5 SDK 17.1.0 Bootloader and nRF Connect SDK 2.1.0 application

I have an existing product that is using the nRF5 SDK 17.1.0 Bootloader.  I have updated the application to use nRF Connect SDK 2.1.0.  The new application also has an implementation of the buttonless DFU service, so that the nRF Connect app on iOS can be used to do the updates.

Most of this is working.  The application runs fine when built to run without a boot loader.  When built to run with the boot loader (CONFIG_FLASH_LOAD_OFFSET=0x27000), there is an issue that I'm having trouble debugging.  This is what is happening:

  1. I erase the device and program in the nRF5 SDK 17.1.0 Bootloader.  After reset the boot loader starts up normally.
  2. I use the nRF Connect app on iOS to transfer over the nRF Connect SDK 2.1.0 application.  I get a success message from the nRF Connect app.
  3. The application does not start BLE advertising and the device appears to be hung.
  4. If I attach the VS Code debugger, it usually halts inside the boot loader startup sequence, where it is checking the CRC.
  5. Looking at the RESETREAS register I see the expected SREQ, but I also see LOCKUP.
  6. If I continue code execution then the application starts up and works fine.

The hang happens every time I run the above sequence.  It happens with or without a debug probe attached.  None of the nrfjprog resets (--reset, --debugreset, --pinreset) have any effect - the device remains hung.  If I attach the debugger before step 2 above, then the hang doesn't happen and the application starts up fine.
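
For step 5, decoding the raw RESETREAS value can be easier than reading the bits by eye. The sketch below is a host-side helper and is not taken from either SDK: the name `decode_resetreas` is hypothetical, and the bit positions are the nRF52-series layout as assumed here, so they should be checked against the product specification for the exact part. RESETREAS is cumulative until it is cleared by writing 1s to it, so a LOCKUP flag may have been left by an earlier reset rather than the most recent one.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Decode a raw POWER->RESETREAS value into space-separated flag names.
 * Bit positions are ASSUMED to follow the nRF52-series layout; verify
 * them against the product specification for the part in use.
 * Returns the number of characters written (excluding the NUL).
 */
size_t decode_resetreas(uint32_t value, char *out, size_t out_len)
{
    static const struct {
        uint32_t mask;
        const char *name;
    } flags[] = {
        { 1u << 0,  "RESETPIN" }, /* reset pin */
        { 1u << 1,  "DOG" },      /* watchdog */
        { 1u << 2,  "SREQ" },     /* soft reset request */
        { 1u << 3,  "LOCKUP" },   /* CPU lockup */
        { 1u << 16, "OFF" },      /* wake from System OFF via GPIO */
        { 1u << 17, "LPCOMP" },   /* wake from System OFF via LPCOMP */
        { 1u << 18, "DIF" },      /* wake from System OFF via debug interface */
        { 1u << 19, "NFC" },      /* wake from System OFF via NFC field */
    };
    size_t used = 0;

    assert(out != NULL && out_len > 0);
    out[0] = '\0';

    for (size_t i = 0; i < sizeof flags / sizeof flags[0]; i++) {
        if ((value & flags[i].mask) == 0) {
            continue;
        }
        int n = snprintf(out + used, out_len - used, "%s%s",
                         used ? " " : "", flags[i].name);
        if (n < 0 || (size_t)n >= out_len - used) {
            out[used] = '\0'; /* drop the partial name and stop */
            break;
        }
        used += (size_t)n;
    }
    return used;
}
```

Under that assumed layout, SREQ together with LOCKUP corresponds to a raw value of 0x0000000C, for which the helper writes `SREQ LOCKUP` into the buffer.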

Anyone have any ideas on what might be happening?  Any tips on how to debug this type of issue where simply having the debugger attached avoids the issue?

  • I changed some other config options and have RTT logging enabled.  I added a print to the top of my main.  When I start the app from the debugger (no boot loader) I see a boot message from Zephyr and the message from my main:

    *** Booting Zephyr OS build v3.1.99-ncs1 ***

    starting vulcan

    However, if I have RTT logging connected then the code works fine (no hang or reset loop).  Same as when debugger is attached.

    So I created a logging backend that just stores the log text in RAM.  When run from the debugger, the log messages show up in RAM.

    Now, when I run the boot loader, DFU my app, reach the point where things look locked up or in a reset loop, and then attach the debugger, I don't see any logging messages.  So it seems the issue happens before Zephyr prints the boot message.
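
The RAM logging idea above can be sketched as a small ring buffer that is inspected from the debugger after the hang. This is only an illustration of the approach, not the poster's backend and not the Zephyr log backend API: `struct ram_log`, `ram_log_write`, `ram_log_read` and `RAM_LOG_SIZE` are all hypothetical names, and hooking the writer into the logging subsystem is left out.

```c
#include <assert.h>
#include <stddef.h>

#define RAM_LOG_SIZE 256u /* hypothetical capacity in bytes */

/* Ring buffer holding the most recent log text. */
struct ram_log {
    char   data[RAM_LOG_SIZE];
    size_t head;  /* index of the next byte to write */
    size_t count; /* total bytes written since start */
};

/* Append a NUL-terminated string, overwriting the oldest bytes when full. */
void ram_log_write(struct ram_log *log, const char *text)
{
    assert(log != NULL && text != NULL);
    for (; *text != '\0'; text++) {
        log->data[log->head] = *text;
        log->head = (log->head + 1u) % RAM_LOG_SIZE;
        log->count++;
    }
}

/*
 * Copy the stored text, oldest byte first, into out as a NUL-terminated
 * string. Returns the number of characters copied (excluding the NUL).
 */
size_t ram_log_read(const struct ram_log *log, char *out, size_t out_len)
{
    assert(log != NULL && out != NULL && out_len > 0);

    size_t stored = log->count < RAM_LOG_SIZE ? log->count : RAM_LOG_SIZE;
    size_t start = (log->head + RAM_LOG_SIZE - stored) % RAM_LOG_SIZE;
    size_t n = 0;

    while (n < stored && n + 1u < out_len) {
        out[n] = log->data[(start + n) % RAM_LOG_SIZE];
        n++;
    }
    out[n] = '\0';
    return n;
}
```

A zero-initialized `struct ram_log` is a valid empty log. On a target, the buffer would typically be a global so that its address can be found in the map file and its contents dumped with the debugger; an empty buffer after the hang is then consistent with the failure occurring before the first log call.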

  • I am trying a simpler app to see if it will start properly, so I am now using the hello_world sample.  It shows the same issue.  However, this time when I attach the debugger I end up in the same place each time - inside the lfclk_spinwait loop:

    while (!(nrfx_clock_is_running(d, (void *)&type)
    ... k_cpu_atomic_idle(key);

    If I hit continue in the debugger then hello_world starts up.  So maybe some issue starting up the low frequency clock or getting the interrupt that it has started?

    This issue does not happen when an SDK 17 app is DFU'ed.  So maybe something different in the Zephyr startup sequence with respect to the low frequency clock startup?
  • It is starting to look like a problem with the 32KHz clock now (the OS scheduler needs this clock to run). I didn't think this could be the issue earlier considering how your application ran fine when programmed without the bootloader.

    Do you have the 32K crystal mounted on your board, and have you tried the following configurations from the peripheral_lbs.zip example I uploaded?

    # Config settings to allow the FW to run on boards without the
    # optional 32KHz crystal or DCDC inductors mounted
    CONFIG_BOARD_ENABLE_DCDC=n
    CONFIG_CLOCK_CONTROL_NRF_K32SRC_RC=y
    CONFIG_CLOCK_CONTROL_NRF_K32SRC_500PPM=y

  • I added this to the hello_world config and now it starts up without the hang above:

    CONFIG_CLOCK_CONTROL_NRF_K32SRC_SYNTH=y
     
    So I put that config into my application and it gets past that point, but now hangs on:

    CC_PalWaitInterruptRND

    It looks like in both cases there is a wait for interrupt and the CPU is never woken up.  I wonder if maybe these things are already on in the boot loader.  Then when the boot loader starts the app, the app startup sequence assumes they are not on and waits for an interrupt that will never happen.  Then attaching the debugger wakes the MCU and the code continues.
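
One way to probe the hypothesis above is to halt at the hang and read the CLOCK peripheral's LFCLKSTAT register, which reports whether the low-frequency clock is running and from which source. The helper below only decodes a raw value that has already been read out with a debugger. It is a sketch under assumptions: `decode_lfclkstat` and `struct lfclk_status` are hypothetical names, and the field layout (SRC in bits 0 to 1, STATE in bit 16) is the nRF52-series layout as assumed here and should be confirmed in the product specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded view of a raw CLOCK->LFCLKSTAT value. */
struct lfclk_status {
    bool        running; /* STATE field: clock is running */
    const char *source;  /* SRC field: name of the clock source */
};

/*
 * Decode LFCLKSTAT. ASSUMED nRF52-series layout:
 *   bits 1..0 SRC   (0 = RC, 1 = Xtal, 2 = Synth)
 *   bit  16   STATE (1 = running)
 */
void decode_lfclkstat(uint32_t value, struct lfclk_status *out)
{
    static const char *const names[] = { "RC", "Xtal", "Synth", "unknown" };

    assert(out != NULL);
    out->running = (value & (1u << 16)) != 0;
    out->source  = names[value & 0x3u];
}
```

If the register already shows the clock running when the application is spinning in its start-up wait, that would be consistent with the clock having been left on before the jump to the application. If it shows the clock stopped, the cause more likely lies elsewhere.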

  • I can't think of any obvious reasons for this. The peripherals used by the bootloader should be reset before the bootloader starts the application. Do you use the cc310 for boot validation or just do a CRC check (Boot validation modes)?

    So I put that config into my application and it gets past that point, but now hangs on:

    CC_PalWaitInterruptRND

    Is this in the application code?
