Hang with nRF5 SDK 17.1.0 Bootloader and nRF Connect SDK 2.1.0 application

I have an existing product that is using the nRF5 SDK 17.1.0 Bootloader.  I have updated the application to use nRF Connect SDK 2.1.0.  The new application also has an implementation of the buttonless DFU service, so that the nRF Connect app on iOS can be used to do the updates.

Most of this is working.  The application runs fine when built to run without a boot loader.  When built to run with the boot loader (CONFIG_FLASH_LOAD_OFFSET=0x27000) there is an issue that I'm having trouble debugging.  This is what is happening:

  1. I erase the device and program in the nRF5 SDK 17.1.0 Bootloader.  After reset the boot loader starts up normally.
  2. I use the nRF Connect app on iOS to transfer over the nRF Connect SDK 2.1.0 application.  I get a success message from the nRF Connect app.
  3. The application does not start BLE advertising and the device appears to be hung.
  4. If I attach the VS Code debugger, it usually halts inside the boot loader startup sequence, where it is checking the CRC.
  5. Looking at the RESETREAS register I see the expected SREQ, but I also see LOCKUP.
  6. If I continue code execution then the application starts up and works fine.

The hang happens every time I run the above sequence.  It happens with or without a debug probe attached.  None of the nrfjprog resets (--reset, --debugreset, --pinreset) have any effect - the device remains hung.  If I attach the debugger before step 2 above then the halt doesn't happen.  The application starts up fine without any hang.
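
For anyone decoding the same register, here is a minimal sketch in C of turning a raw RESETREAS value into flag names. The bit positions are the ones I understand the nRF52840 product specification to list for POWER->RESETREAS, and decode_resetreas is just an illustrative name, not an SDK function. Keep in mind that RESETREAS is cumulative until it is cleared by writing 1s to it, so a LOCKUP bit may be left over from an earlier reset rather than the most recent one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One RESETREAS field: bit mask and field name.
 * Bit positions assumed from the nRF52840 product specification. */
struct reset_flag {
    uint32_t mask;
    const char *name;
};

static const struct reset_flag reset_flags[] = {
    { 1u << 0,  "RESETPIN" },
    { 1u << 1,  "DOG" },
    { 1u << 2,  "SREQ" },
    { 1u << 3,  "LOCKUP" },
    { 1u << 16, "OFF" },
    { 1u << 17, "LPCOMP" },
    { 1u << 18, "DIF" },
    { 1u << 19, "NFC" },
    { 1u << 20, "VBUS" },
};

/* Writes the names of all set flags into buf, separated by spaces,
 * and returns buf. Output is cut short if buf is too small. */
static const char *decode_resetreas(uint32_t value, char *buf, size_t len)
{
    size_t used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof(reset_flags) / sizeof(reset_flags[0]); i++) {
        if (value & reset_flags[i].mask) {
            int n = snprintf(buf + used, len - used, "%s%s",
                             used ? " " : "", reset_flags[i].name);
            if (n < 0 || (size_t)n >= len - used) {
                break; /* buffer full, stop here */
            }
            used += (size_t)n;
        }
    }
    return buf;
}
```

With SREQ and LOCKUP both set the raw value is 0x0000000C, and decode_resetreas(0x0000000C, buf, sizeof buf) yields "SREQ LOCKUP".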

Anyone have any ideas on what might be happening?  Any tips on how to debug this type of issue where simply having the debugger attached avoids the issue?

  • Logging is not enabled.  This is a custom board without any UART access.

    Just did a power cycle and readregs very quickly.  This time I see that the PC is in the application region in a function in libmpsl.

    .text          0x000000000002baf8       0x60 /opt/nordic/ncs/v2.1.0/nrfxlib/mpsl/lib/cortex-m4/hard-float/libmpsl.a(libmpsl_debug_hardfp__obfuscated.elf)

                    0x000000000002baf8                sym_PAD7XREQQORPXRJMXMW2EYVS4S43S42A5D43SBA
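
    As a sanity check on that address, here is a small sketch in C of the comparison. 0x27000 is the CONFIG_FLASH_LOAD_OFFSET from the original post. The bootloader start address 0xF8000 is only an assumed placeholder (the real value depends on how the bootloader is linked), and addr_in_app_region is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Application start: CONFIG_FLASH_LOAD_OFFSET from the original post. */
#define APP_START                 0x27000u
/* ASSUMPTION: placeholder bootloader start; check your bootloader's linker settings. */
#define BOOTLOADER_START_ASSUMED  0xF8000u

/* Returns true if addr lies in [app_start, bootloader_start). */
static bool addr_in_app_region(uint32_t addr, uint32_t app_start,
                               uint32_t bootloader_start)
{
    assert(app_start < bootloader_start);
    return addr >= app_start && addr < bootloader_start;
}
```

    Under those assumptions, addr_in_app_region(0x2baf8, APP_START, BOOTLOADER_START_ASSUMED) is true, which matches the PC being inside the application image rather than the SoftDevice or the bootloader.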

     

  • Thanks for testing. This means that the app is booting.  It would also be good if you could try disabling the ACL (Access control lists) protection in your nRF5 Bootloader. I assume you are using the nRF52840?

  • Yes, this is the nRF52840 chip.

    It does look like some kind of reset loop.  And I'm not seeing the lockup bit set now.  Just the soft reset bit.

    Problem is still that if I attach the debugger then things work normally, so unsure how to catch the problem.

    Possibly something in the ncs sequence is doing a soft reset (except when running from the debugger).

  • Yes, it's probably reaching the error handler, which will reset the device (soft reset). I suggest you add the following Kconfig settings to your prj.conf file:

    CONFIG_LOG=y
    CONFIG_LOG_BACKEND_UART=n
    CONFIG_USE_SEGGER_RTT=y
    CONFIG_RESET_ON_FATAL_ERROR=n
    CONFIG_THREAD_NAME=y

    Then, when the app hangs, open SEGGER RTT Viewer and see if you can get a crash log from the error handler.

    The reason I recommended disabling the ACL protection earlier is that it is one of the few configurations that carry over from the bootloader to the application. This is only a test to help narrow down the problem; I don't recommend leaving ACL off in production.
