URGENT: Random High Power Consumption Issue Causing Severe Battery Drain in Production

Hi,

We are facing a critical power consumption issue in our nRF52-based Zephyr application: some devices randomly enter a high-power state (80 µA or more) instead of the expected 35-40 µA in idle mode.

Impact:

  • Users are experiencing severe battery drain.
  • Battery life has dropped to less than half of the expected value on affected devices.
  • The issue is random, making it difficult to track and resolve.

This issue occurs in multiple scenarios:

  • At runtime under certain conditions
  • After a reset
  • After a firmware update
  • Randomly, without a clear trigger

We have a global float variable: float total_current = 0.0f;

  • This variable is modified in two places:

    • A worker thread
    • A BLE command receive event handler
  • When we commented out the function call in the BLE handler and flashed the firmware, the power consumption increased unexpectedly.
  • We initially suspected FPU lazy stacking or context-switching issues but could not confirm.
  • However, aligning the float variable with __aligned(8), or declaring it locally inside the function, completely eliminated the power issue. We assumed this was the fix and added __aligned(8) to all global float variables.
  • This led us to suspect a floating-point-related issue, but the behavior reappeared when we initialized an atomic variable globally, like this: atomic_t is_bat_chargin = ATOMIC_INIT(0);
    and used it inside some timers and functions.
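
To check whether alignment is really what changes between a good and a bad build, one option is to log where the variable ends up in RAM in each build. This is only a minimal sketch, assuming a GCC-style toolchain. ALIGNED8, is_aligned_to() and report_placement() are hypothetical names made up for illustration, not Zephyr APIs; ALIGNED8 spells out the attribute that Zephyr's __aligned(8) expands to so the snippet also builds off-target.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical macro: the GCC attribute behind Zephyr's __aligned(8). */
#define ALIGNED8 __attribute__((__aligned__(8)))

/* The global from this post, with the alignment workaround applied. */
float total_current ALIGNED8 = 0.0f;

/* Returns true if addr is a multiple of boundary (boundary must be non-zero). */
bool is_aligned_to(const void *addr, size_t boundary)
{
    return ((uintptr_t)addr % boundary) == 0U;
}

/* Prints the address of a variable and whether it sits on an 8-byte boundary. */
void report_placement(const char *name, const void *addr)
{
    printf("%s @ %p, 8-byte aligned: %s\n",
           name, addr, is_aligned_to(addr, 8) ? "yes" : "no");
}
```

Calling report_placement("total_current", &total_current) early in main() on both builds would show whether the address itself moved, and not only its alignment. If it did, that would point more toward a memory layout or timing side effect than toward the FPU.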

We need help with the following:

  1. Why does initializing a global float (or even an atomic variable) sometimes cause a high-power state?
  2. Could this be related to FPU context switching, stack alignment, or something else in Zephyr/nRF52?
  3. Why does aligning the float or declaring it locally fix the issue?
  4. Is there a better way to avoid such random power increases when using floating-point operations and atomic variables in Zephyr?
  5. Is there any extra configuration, debug setting, or power profiling tool we should enable to track down the exact cause of this issue?

Additional info

  • SoC: nRF52840
  • Zephyr Version: 2.6.1
  • CONFIG_FPU=y, CONFIG_FPU_SHARING=y
  • Optimization Level in Build: Optimize for size
  • Power Measurement Setup: Using a power profiler tool

This is a critical issue affecting production devices, and we urgently need guidance to identify the root cause and resolution.

Appreciate any insights into this issue! Is there anything extra we should check or enable to debug this further?

Thanks,

Vishnu

  • Hello,

    What makes you think this is related to the floating point number? Does it disappear completely if you remove all floating point numbers? 

    Are you using a DK or a custom board? Are you able to replicate the behavior on a DK?

    Can you share the plot from the power profiling tool? Either some screenshots, or export the data as the power profiling tool format, and upload it here?

    Is there any way for me to reproduce what you are seeing using a DK and no external components/sensors?

    Best regards,

    Edvin

  • Hi. Floating point was only the initial suspect; later the issue was also reproduced with an atomic variable or a uint32_t.

    I am using a custom board. I will check whether it can be reproduced on a DK with no external components or sensors.
    The __aligned(8) attribute fixed it at first, but after adding __aligned(8) to every float variable the issue came back.

  • I tried the NCS v2.6.1\zephyr\samples\bluetooth\beacon with the added config lines that you provided, flashed it to an nRF52840DK, and ran this reset-script:

    :loop
    nrfjprog --reset -f NRF52
    TIMEOUT 2
    goto loop

    So far, the current consumption looks normal.

    But if I understand correctly, once it reaches the high current state, it will remain that way until I power cycle the kit, so I can leave it running overnight.

    Let me know if you are able to reproduce it on a DK.

    But I would like to know a bit more about your HFXTAL issues. Perhaps it is related. Why did you decide to use the internal LFCLK? Is there an error with the XTAL? Or are the capacitors wrong?

    Best regards,

    Edvin

  • But if I understand correctly, once it reaches the high current state, it will remain that way until I power cycle the kit, so I can leave it running overnight.

    That is right. Only a power-on reset, programming with the external LF configuration, or setting

    NRF_POWER->SYSTEMOFF = 1; and then waking up from sleep fixed it. We also found that on one board, setting CONFIG_SYSTEM_CLOCK_WAIT_FOR_STABILITY=n fixed the issue, but this does not work on all boards.

    But I would like to know a bit more about your HFXTAL issues. Perhaps it is related. Why did you decide to use the internal LFCLK? Is there an error with the XTAL? Or are the capacitors wrong?
     Devices in the field were ending up in a non-advertising state, so we decided to switch to the internal RC. The number of issues has since dropped drastically. It looks like a problem with the LF XTAL.
  • vishnu3391_uh said:
    Devices in the field were ending up in a non-advertising state, so we decided to switch to the internal RC. The number of issues has since dropped drastically. It looks like a problem with the LF XTAL.

    Is it the same devices that would stop advertising that are showing symptoms of high current consumption? Is it all devices? Or just some of them? 

    vishnu3391_uh said:
    CONFIG_SYSTEM_CLOCK_WAIT_FOR_STABILITY=n fixed the issue, but this does not work on all boards.

    How many boards does it work on, and how many does it not work on? (Approximate numbers, and a percentage of the total.)

    What comes to mind is this errata:
    https://docs.nordicsemi.com/bundle/errata_nRF52840_Rev3/page/ERR/nRF52840/Rev3/latest/anomaly_840_36.html#anomaly_840_36

    But it is very old, and the workaround for this is already implemented in NCS (even in the nRF5 SDK).

    But for good measure, when you are in the high current state, could you try to read out the status registers at the very beginning of the application? I suspect that they are already used when you hit main(), but you can check if there are any differences between the good runs and the bad ones.

    Still no luck on reproducing on nRF52DK?

    Best regards,

    Edvin

  • Is it the same devices that would stop advertising that are showing symptoms of high current consumption? Is it all devices? Or just some of them? 

    We are able to reproduce this on all our boards here when they are configured for the internal RC.


    How many boards does it work on, and how many does it not work on? (Approximate numbers, and a percentage of the total.)

    We tested on only two boards. It works on one and not on the other.

    But it is very old, and the workaround for this is already implemented in NCS (even in the nRF5 SDK).

    Thanks. Can you point out which registers to read?

    Still no luck on reproducing on nRF52DK?

    Not yet. We have only been checking on our boards. I will update here once this is done with the DK.

    Regards,

    Vishnu

  • vishnu3391_uh said:
    Can you point out which registers to read?

    I was thinking of this one: https://docs.nordicsemi.com/bundle/ps_nrf52840/page/clock.html#ariaid-title29

    I would expect it to be running in both cases (because that would be the typical scenario for most applications), but you can have a look at the LFCLKSTAT as well:

    https://docs.nordicsemi.com/bundle/ps_nrf52840/page/clock.html#ariaid-title27

    So the addresses for these are:

    LFCLKSTAT: 0x40000418

    HFCLKSTAT: 0x4000040C

    Reading a register using e.g. "nrfjprog --memrd 0x4000040C" will halt the device. You may actually halt it while the HFCLK is running, but most of the time it should not be running (it will be turned on only right before transmitting a packet). So if you reset it and read HFCLKSTAT several times (with a reset in between), and it is always on in the high consumption state, then that may be a clue.
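
    To interpret the raw 32-bit values that come back from those two addresses, a small host-side decoder along these lines might help. It is only a sketch: the function names are hypothetical, and the bit layout (SRC in the lowest bits, STATE in bit 16) is my reading of the CLOCK register tables linked above, so please verify it against the product specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout, to be checked against the nRF52840 CLOCK chapter:
 * STATE is bit 16 in both registers; SRC is bit 0 in HFCLKSTAT
 * and bits 1..0 in LFCLKSTAT. */
#define CLKSTAT_STATE_MASK (1UL << 16)
#define HFCLKSTAT_SRC_MASK 0x1UL
#define LFCLKSTAT_SRC_MASK 0x3UL

/* True if the STATE bit reports the clock as running (either register). */
bool clkstat_running(uint32_t raw)
{
    return (raw & CLKSTAT_STATE_MASK) != 0U;
}

/* Names the HFCLK source encoded in a raw HFCLKSTAT value. */
const char *hfclkstat_source(uint32_t raw)
{
    return (raw & HFCLKSTAT_SRC_MASK) ? "HFXO (crystal)" : "HFINT (internal RC)";
}

/* Names the LFCLK source encoded in a raw LFCLKSTAT value. */
const char *lfclkstat_source(uint32_t raw)
{
    switch (raw & LFCLKSTAT_SRC_MASK) {
    case 0U:
        return "LFRC";
    case 1U:
        return "LFXO";
    case 2U:
        return "LFSYNT";
    default:
        return "reserved";
    }
}
```

    For example, an HFCLKSTAT readout of 0x00010001 would decode as "running from HFXO (crystal)". If the bad boards show that on every halt, the crystal oscillator is being kept on.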

    Best regards,

    Edvin
