nRF52832 BLE DFU reports success but device sometimes fails to boot after update

Hello Nordic team,

I’m currently developing on nRF52832 and encountering an intermittent issue with BLE DFU.

Problem Description

When performing BLE DFU, the DFU process completes successfully on the mobile app side (shows DFU completed), but sometimes the device fails to boot after the update.

In these failure cases:

  • The device does not start the application

  • It does not advertise or respond over BLE

  • The only way to recover is to:

    • Connect via SWD

    • Perform a full ERASE

    • Re-flash the firmware again
      After that, the device works normally.

This issue does not happen every time, but occurs randomly.


Environment

  • SoC: nRF52832

  • DFU type: BLE DFU

  • SDK: nRF5_SDK_17.0.2_d674dde

Parents
  • Hi,

    How do you verify that the device fails to boot? The first thing I would like to establish is whether the issue is related to the bootloader or the application. Do you use persistent storage in your application, for instance via FDS? If so, could it be that the application somehow fails on some devices due to a problematic state there (which is resolved when you perform an erase all)?

    Are you able to test with the debug bootloader with RTT logging? If so, and you are able to reproduce the issue, what does the RTT log say?

Children
  • Hi,

    The application does use FDS, however the stored data itself should not prevent the system from booting.
    The data size is limited, and the application is designed to handle empty or default values during startup.

    I would also like to ask whether the described behavior could be related to flash usage or RAM usage, especially during or after BLE DFU.

    Additionally, I’m wondering if this issue could be environment-dependent.

    In our internal testing environment, we are not able to reproduce the issue reliably.
    However, the problem occurs much more frequently on the customer side.

    One notable difference is that the customer environment contains a very large number of BLE devices.
    When scanning with a mobile phone, it is possible to detect around 100 BLE devices simultaneously.

    Could a high BLE traffic / crowded RF environment potentially:

      • Affect DFU timing or memory usage

      • Increase the likelihood of race conditions

      • Or expose edge cases in flash or RAM usage during DFU or early application startup?

    I will also try to test with the debug bootloader and RTT logging to see if it provides any clues.
    However, as mentioned above, I’m currently unable to reproduce the issue in our own environment, which makes capturing RTT logs challenging.

    Any insights on whether BLE density, flash usage, or RAM constraints could contribute to this behavior would be very helpful.

    Thanks again for your support.

  • Hi,

    PiKa PiKa said:
    The application does use FDS, however the stored data itself should not prevent the system from booting.
    The data size is limited, and the application is designed to handle empty or default values during startup.

    I agree. However, how do you know that the device did not boot? Which state is it in? (A simple test, if you get hold of a bad device from the field, could be to attach a debugger while the device is in the bad state and check where the PC points. Is it in the application or the bootloader?)

    We need more data from testing and debugging to understand what is going on, so I would like to start with logs if possible. I am also interested in knowing exactly what you mean by the device not booting. Which state is it in? Is it, for instance, in DFU mode? That can happen if a single-bank DFU procedure fails, in which case the bootloader enters DFU mode in order to recover. Or is it not in DFU mode?
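To illustrate the PC check suggested above, here is a minimal sketch that maps a program counter value read from the debugger onto a flash region. The addresses are assumptions for an nRF52832 with an S132 v7.x SoftDevice and the SDK 17 example secure bootloader; they are not taken from this thread, so substitute the values from your own linker scripts. The names `classify_pc` and `pc_region_t` are hypothetical helpers, not SDK APIs.

```c
#include <stdint.h>

/* ASSUMED flash layout (nRF52832, 512 kB flash, S132 v7.x, example secure
 * bootloader). Replace with the values from your own linker scripts. */
#define ASSUMED_MBR_END          0x00001000u /* MBR occupies the first page      */
#define ASSUMED_APP_START        0x00026000u /* SoftDevice ends, application starts */
#define ASSUMED_BOOTLOADER_START 0x00078000u /* bootloader code starts            */
#define ASSUMED_FLASH_END        0x00080000u /* end of 512 kB flash               */

typedef enum {
    PC_IN_MBR,
    PC_IN_SOFTDEVICE,
    PC_IN_APPLICATION,   /* includes any app data pages below the bootloader */
    PC_IN_BOOTLOADER,    /* includes the settings pages above the bootloader */
    PC_OUTSIDE_FLASH     /* RAM, peripherals, or an invalid address          */
} pc_region_t;

/* Return the flash region that a halted core's PC value falls into. */
pc_region_t classify_pc(uint32_t pc)
{
    if (pc < ASSUMED_MBR_END)          return PC_IN_MBR;
    if (pc < ASSUMED_APP_START)        return PC_IN_SOFTDEVICE;
    if (pc < ASSUMED_BOOTLOADER_START) return PC_IN_APPLICATION;
    if (pc < ASSUMED_FLASH_END)        return PC_IN_BOOTLOADER;
    return PC_OUTSIDE_FLASH;
}
```

A PC inside the bootloader range would suggest the device is stuck in (or waiting in) the bootloader, while a PC in the application range would point towards an application-side fault such as an error handler loop.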

    PiKa PiKa said:
    I would also like to ask whether the described behavior could be related to flash usage or RAM usage, especially during or after BLE DFU.

    We need more debugging before we can say what could cause it (as described above). But yes, that is a possibility. What does your memory map look like, and what are the sdk_config.h and linker configuration for both the bootloader and the application? Are there any overlaps? And regarding FDS (ref. my suggestion that it could possibly be related), is NRF_DFU_APP_DATA_AREA_SIZE in the bootloader's sdk_config.h set appropriately to accommodate all FDS pages (configured in the application's sdk_config.h by FDS_VIRTUAL_PAGES and FDS_VIRTUAL_PAGE_SIZE)? If you have other data in flash that does not use FDS, that must also be in a region the bootloader knows it needs to stay away from.
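The FDS sizing question can be checked with a little arithmetic. The sketch below assumes that FDS_VIRTUAL_PAGE_SIZE is expressed in 4-byte words and NRF_DFU_APP_DATA_AREA_SIZE in bytes, as the comments in the SDK's sdk_config.h files describe. The function names are hypothetical, and the numbers in the usage note are examples only, not values from this thread.

```c
#include <stdint.h>
#include <stdbool.h>

/* Flash consumed by FDS in bytes.
 * virtual_pages           : FDS_VIRTUAL_PAGES from the application's sdk_config.h
 * virtual_page_size_words : FDS_VIRTUAL_PAGE_SIZE (assumed to be in 4-byte words) */
uint32_t fds_flash_bytes(uint32_t virtual_pages, uint32_t virtual_page_size_words)
{
    return virtual_pages * virtual_page_size_words * 4u;
}

/* True if the area the bootloader preserves is large enough to hold all FDS pages.
 * app_data_area_bytes : NRF_DFU_APP_DATA_AREA_SIZE from the bootloader's sdk_config.h */
bool app_data_area_covers_fds(uint32_t app_data_area_bytes,
                              uint32_t virtual_pages,
                              uint32_t virtual_page_size_words)
{
    return app_data_area_bytes >= fds_flash_bytes(virtual_pages, virtual_page_size_words);
}
```

For example, with 3 virtual pages of 1024 words each, FDS needs 12288 bytes, so an app data area of 12288 bytes is just enough. Raising the application to 4 pages without updating the bootloader would leave one FDS page outside the protected area, where a DFU could overwrite it.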

    PiKa PiKa said:

    Could a high BLE traffic / crowded RF environment potentially:

    Affect DFU timing or memory usage

    It can affect timing in the sense that you may have packet loss, and the size of packets also depends on the capabilities of the DFU master. That should not cause fundamental problems though: during transport the new image is only stored in flash, and once it is completely transferred it is activated, which is a separate process. Memory usage should not be affected.

    PiKa PiKa said:
    Increase the likelihood of race conditions

    I cannot imagine what that race condition would be, particularly since you see from the phone (DFU master) that the DFU operation was successful, so the new image was valid.

    PiKa PiKa said:
    Or expose edge cases in flash or RAM usage during DFU or early application startup?

    You can never rule something out, but that seems highly unlikely. What could affect application startup is corrupt memory though, for instance due to a mismatch in memory map or configuration between the application and the bootloader (ref. earlier comments in this post).

    Best regards,

    Einar
