Hang with nRF5 SDK 17.1.0 Bootloader and nRF Connect SDK 2.1.0 application

I have an existing product that is using the nRF5 SDK 17.1.0 Bootloader.  I have updated the application to use nRF Connect SDK 2.1.0.  The new application also has an implementation of the buttonless DFU service, so that the nRF Connect app on iOS can be used to do the updates.

Most of this is working.  The application runs fine when built to run without a boot loader.  When build to run with the boot loader (CONFIG_FLASH_LOAD_OFFSET=0x27000) there is an issue that I'm having trouble debugging.  This is what is happening:

  1. I erase the device and program in the nRF5 SDK 17.1.0 Bootloader.  After reset the boot loader starts up normally.
  2. I use the nRF Connect app on iOS to transfer over the nRF Connect SDK 2.1.0 application.  I get a success message from the nRF Connect app.
  3. The application does not start BLE advertising and the device appears to be hung.
  4. If I attach the VScode debugger, it usually halts inside the boot loader startup sequence where it is checking the CRC.
  5. Looking at the RESETREAS register I see the expected SREQ, but I also see LOCKUP.
  6. If I continue code execution then the application starts up and works fine.

The hang happens every time I run the above sequence.  It happens with or without a debug probe attached.  None of the nrfjprog resets (--reset, --debugreset, --pinreset) have any effect - the device remains hung.  If I attach the debugger before step 2 above then the halt doesn't happen.  The application starts up fine without any hang.

Anyone have any ideas on what might be happening?  Any tips on how to debug this type of issue where simply having the debugger attached avoids the issue?

Parents
  • I am trying to use a simpler app to see if it will start properly, so I am now using the sample hello_world.  It is showing the same issue.  However, this time when I attach in the debugger I end up in the same place each time - inside lfclk_spinwait loop:

    while (!(nrfx_clock_is_running(d, (void *)&type)
    ... k_cpu_atomic_idle(key);
    If I hit continue in the debugger then hello_world starts up.  So maybe some issue starting up the low frequency clock or getting the interrupt that it has started?
    This issue does not happen when an SDK 17 app is DFU'ed.  So maybe something different in the Zephyr startup sequence with respect to the low frequency clock startup?
  • Program boot loader:

    nrfjprog -f NRF52 --recover

    nrfjprog -f NRF52 --program

    sdk_nordic/components/softdevice/s140/hex/s140_nrf52_7.2.0_softdevice.hex –-chiperase

    nrfjprog -f NRF52 --program bin/bootloader-neo/bootloader-neo.hex

    nrfjprog -f NRF52 --reset

    Then use nrf connect on iOS to DFU the zip.

  • Thanks, I will try and see if I can reproduce it with your bootloader (hoping it will run on a DK).

    nrfutil pkg display lbs-1.0.0.zip gives this:

    DFU Package: <lbs-1.0.0.zip>:
    |
    |- Image count: 1
    |
    |- Image #0:
       |- Type: application
       |- Image file: zephyr.bin
       |- Init packet file: zephyr.dat
          |
          |- op_code: INIT
          |- signature_type: ECDSA_P256_SHA256
          |- signature (little-endian): b'f10f7aca6dc0353607a1fce8bbc46d53ee7d7092c07b86eeb87f0f7aa37c72a9c6514b3b3c6c58c33d968d8e0e1435530c64c83e7496fb51a5efc5c380ebc4a0'
          |
          |- fw_version: 0x00000003 (3)
          |- hw_version 0x00000034 (52)
          |- sd_req: 0x100
          |- type: APPLICATION
          |- sd_size: 0
          |- bl_size: 0
          |- app_size: 358152
          |
          |- hash_type: SHA256
          |- hash (little-endian): b'83d92fe619627e257d91ad857c124c5b671df6f6d450890795176f82a244daaf'
          |
          |- boot_validation_type: ['VALIDATE_GENERATED_CRC']
          |- boot_validation_signature (little-endian): [b'']
          |
          |- is_debug: False

    So, the bootloader is not using the cc310 for boot validation, at least.

  • I ran the above binaries on an nRF52840-DK and am seeing the hang here.

  • I built peripheral_lbs with these additions to the prj.conf and reproduced the hang:

    Can you confirm that it hangs at the same location inside the cryptocell runtime library, and not in the fault handler because the settings partition overlap with the bootloader?

    I can't get this bootloader to exit DFU mode. I guess it may be because of your NRF_BL_DFU_ENTER_METHOD_BUTTON_PIN setting in sdk_config, or were you able to run the same hex on your DK?

    Turns out I programmed the wrong settings page. Programming your bootloader with my ncs_app.hex works on the DK:

    nrfutil settings generate --family NRF52840 --application ncs_app.hex --application-version 1 --bootloader-version 1 --bl-settings-version 2 settings.hex
    
    nrfjprog -e
    nrfjprog --program bootloader-neo.hex --verify
    nrfjprog --program s140_nrf52_7.2.0_softdevice.hex --verify
    nrfjprog --program ncs_app.hex --verify
    nrfjprog --program settings.hex --verify
    nrfjprog -r
    

Reply
  • I built peripheral_lbs with these additions to the prj.conf and reproduced the hang:

    Can you confirm that it hangs at the same location inside the cryptocell runtime library, and not in the fault handler because the settings partition overlap with the bootloader?

    I can't get this bootloader to exit DFU mode. I guess it may be because of your NRF_BL_DFU_ENTER_METHOD_BUTTON_PIN setting in sdk_config, or were you able to run the same hex on your DK?

    Turns out I programmed the wrong settings page. Programming your bootloader with my ncs_app.hex works on the DK:

    nrfutil settings generate --family NRF52840 --application ncs_app.hex --application-version 1 --bootloader-version 1 --bl-settings-version 2 settings.hex
    
    nrfjprog -e
    nrfjprog --program bootloader-neo.hex --verify
    nrfjprog --program s140_nrf52_7.2.0_softdevice.hex --verify
    nrfjprog --program ncs_app.hex --verify
    nrfjprog --program settings.hex --verify
    nrfjprog -r
    

Children
  • By "works on the DK" do you mean that peripheral_lbs starts up without hanging and you can connect to it?

    I'm not programming in any settings.hex (or ncs_app.hex).  Just programming in the boot loader, then the soft device, then the boot loader hex.  Then using nrf connect app to run the DFU process.  At the end of that is says "Your DFU update was completed successfully".  At that point I do not see peripheral_lbs advertisements.

    When I attach the vscode debugger, the call stack is the boot loader crc32_compute.  When I continue from there I still don't see advertisements.  Break in the debugger again and it is in the application in CC_PalWaitInterruptRND.  Then tried break and continue a few more times and it is usually in one of those, but sometimes in other parts of the startup code.  Sounds like a reset loop...

    So seeing three different things:

    1) My app.  Doesn't startup properly.  Connect debugger and continue and it then starts up fine.

    2) The hello_world app.  With the clock crystal in the startup it hangs waiting for the crystal startup.  With the synthetic option, it starts fine.

    3) The peripheral_lbs app.  Connect debugger and continue and it appears to be stuck in a reset loop.

  • Yes, the DK starts advertiding when using my version. But it enters a reset loop if I DFU your app. Writing the settings page is a quicker way to test this. And it’s the only way for me if I’m going to run my app with your bootloader.

    Did you move the settings partition in your lbs project. If not, it will go in a boot loop like I mentioned in my first reply here.

  • Sorry, I missed that.  I see, the change is in the peripheral_lbs/nrf52840dk_nrf52840.overlay.  Using the nRF52840-DK board I can now DFU with that change and the app starts up.  And if I remove the config changes to not use the DC/DC and to use the RC the app also starts up.

    I then tried the same boot loader and peripheral_lbs dfu on my custom board without the RC changes.  And it does hang.  Until I attach debugger and hit continue.

    So the behavior on the nRF52840-DK and our custom board is different.

    I then checked the schematics and sure enough the inductors are not in this design for the DC/DC converter.  Adding CONFIG_BOARD_ENABLE_DCDC=n to my app fixed the issue.  The app starts up properly now.

    Thank you for all the help!  Obviously I should have seen this when setting up the custom dts.  What I still don't understand is how that manifested in all these other odd things happening, working from the debugger, etc...

    BTW: I don't have a storage partition defined in my custom board dts that my app uses.

    CONFIG_BT_SETTINGS=y
    CONFIG_SETTINGS_RUNTIME=y
    CONFIG_SETTINGS=y
    CONFIG_SETTINGS_NONE=y

  • Thank you for the summary. I'm glad to hear that you found the problem!

    Thank you for all the help!  Obviously I should have seen this when setting up the custom dts.  What I still don't understand is how that manifested in all these other odd things happening, working from the debugger, etc...

    It must be because of the noise generated by the DCDC. Although, I'm not able to explain exactly how this would interfere with the clock startup. This does however explain why you didn't experience the problem when the chip was in Debug Interface mode as this mode forces the chip to switch to the LDO regulator.

    I don't have a storage partition defined in my custom board dts that my app uses.

    The build should fail at compile time if there is no storage partition defined. This is also what I observe here if I remove the storage_partition node before building the LBS sample.

    I would suggest that you check the generated zephyr.dts file in <build directory>/zephyr/ before adding the overlay to see if the storage partition is indeed missing.

Related