URGENT: Random High Power Consumption Issue Causing Severe Battery Drain in Production

Hi,

We are facing a critical power consumption issue in our nRF52-based Zephyr application, where some devices randomly enter a high-power consumption state (~80+ µA) instead of the expected 35-40 µA in idle mode.

Impact:

  • Users are experiencing severe battery drain.
  • Expected battery life has reduced to less than half in affected devices.
  • The issue is random, making it difficult to track and resolve.

This issue occurs in multiple scenarios:

  • At runtime under certain conditions
  • After a reset
  • After a firmware update
  • Randomly, without a clear trigger

We have a global float variable: float total_current = 0.0f;

  • This variable is modified in two places:

    • A worker thread
    • A BLE command receive event handler
  • When we commented out the function call in the BLE handler and flashed the firmware, the power consumption increased unexpectedly.
  • We initially suspected FPU lazy stacking or context-switching issues but could not confirm.
  • However, aligning the float variable with __aligned(8), or declaring it locally inside the function, completely eliminated the power issue. We assumed this was the fix and added __aligned(8) to all global float variables.
  • This led us to suspect a floating-point-related issue, but the behavior reappeared when we initialized an atomic variable globally, like this: atomic_t is_bat_chargin = ATOMIC_INIT(0);
    and used it inside some timers and functions.
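One way to check whether the symptom tracks memory layout rather than floating point as such is to log where the variables land in each build. The following is a minimal host-buildable sketch, assuming GCC or Clang. `ALIGNED`, `total_current_a8`, `is_aligned_to` and `dump_layout` are hypothetical names introduced for illustration; only `total_current` comes from the post.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Zephyr's __aligned(x) maps to this GCC attribute; it is spelled out here
 * so that the sketch also builds outside a Zephyr tree. */
#define ALIGNED(x) __attribute__((aligned(x)))

float total_current = 0.0f;               /* global as declared in the post */
float total_current_a8 ALIGNED(8) = 0.0f; /* hypothetical 8-byte-aligned copy */

/* Return true if pointer p is a multiple of n bytes. */
bool is_aligned_to(const void *p, uintptr_t n)
{
    assert(n != 0);
    return ((uintptr_t)p % n) == 0;
}

/* Print the address and 8-byte alignment status of both variables. */
void dump_layout(void)
{
    printf("total_current    @ %p, 8-byte aligned: %d\n",
           (void *)&total_current, is_aligned_to(&total_current, 8));
    printf("total_current_a8 @ %p, 8-byte aligned: %d\n",
           (void *)&total_current_a8, is_aligned_to(&total_current_a8, 8));
}
```

Comparing this output (or the linker map file) between a build that shows the high current and one that does not would show which neighbouring symbols moved. It is a diagnostic aid under the stated assumptions, not a confirmed root cause.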

We need help with the following:

  1. Why does initializing a global float (or even an atomic variable) sometimes cause a high-power state?
  2. Could this be related to FPU context switching, stack alignment, or something else in Zephyr/nRF52?
  3. Why does aligning the float or declaring it locally fix the issue?
  4. Is there a better way to avoid such random power increases when using floating-point operations and atomic variables in Zephyr?
  5. Is there any extra configuration, debug setting, or power profiling tool we should enable to track down the exact cause of this issue?

Additional info

  • SoC: nRF52840
  • Zephyr Version: 2.6.1
  • CONFIG_FPU=y, CONFIG_FPU_SHARING=y
  • Optimization Level in Build: Optimize for size
  • Power Measurement Setup: Using a power profiler tool

This is a critical issue affecting production devices, and we urgently need guidance to identify the root cause and resolution.

Appreciate any insights into this issue! Is there anything extra we should check or enable to debug this further?

Thanks,

Vishnu

  • Hello,

    What makes you think this is related to the floating point number? Does it disappear completely if you remove all floating point numbers? 

    Are you using a DK or a custom board? Are you able to replicate the behavior on a DK?

    Can you share the plot from the power profiling tool? Either some screenshots, or export the data in the power profiling tool's format and upload it here.

    Is there any way for me to reproduce what you are seeing using a DK and no external components/sensors?

    Best regards,

    Edvin

  • Hi. Only our initial suspicion was floating-point related; later the issue was also reproduced with an atomic variable or a uint32_t.

    I am using a custom board. I will check whether it can be reproduced on a DK with no external components or sensors.
    The __aligned(8) attribute fixed this, but adding __aligned(8) to every float variable caused the issue again.

  • Hello,

    Is the board power cycled after programming? And how exactly do you measure the current? Are there any other components on the board that are drawing power?

    Are the units on the Y axis correct? Does the base current never go below 40µA?

    BR,

    Edvin

  • This is for our custom board and yes, 40 µA is the base current with the components.

  • Edvin said:
    Is the board power cycled after programming?

    I am kind of asking blindly here. Let me know if you are able to reproduce it on a DK. 

    But what you are saying is that you sometimes have a base current that is 40 µA higher (comparing the middle run to the right-hand run). Can you confirm that the board is power cycled after being programmed? And that you are not running a debug session at this point? (Or monitoring RTT logs?)

    Are any peripherals activated (UART, SPI, TWI/I2C, etc.)? Perhaps you can experiment by disabling them one at a time, meaning the peripheral is not enabled at all, and compare the current consumption. If you remove one of them and the issue goes away, it would suggest that it is not being properly disabled, or that one of the GPIOs is not being reset properly.
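A complementary check to rebuilding with each peripheral removed is to read back the peripherals' ENABLE registers just before the device idles and list any that are still non-zero. The sketch below assumes the caller supplies the register pointers; on target these might be taken from the Nordic MDK headers (for example `&NRF_UARTE0->ENABLE`), which is an assumption and is not shown. `struct periph_probe` and `report_enabled` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One entry per peripheral to audit. `enable` points at that peripheral's
 * ENABLE register on target, or at a plain variable when testing on a host. */
struct periph_probe {
    const char *name;
    const volatile uint32_t *enable;
};

/* Print every peripheral whose ENABLE register reads non-zero and return
 * how many were found. */
size_t report_enabled(const struct periph_probe *tbl, size_t n)
{
    size_t count = 0;

    assert(tbl != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        uint32_t value = *tbl[i].enable; /* single read of the register */

        if (value != 0u) {
            printf("%s still enabled (ENABLE=0x%08lx)\n",
                   tbl[i].name, (unsigned long)value);
            count++;
        }
    }
    return count;
}
```

If the reported set differs between a normal build and a high-current build, that would narrow down which driver to look at first. An empty report in both builds would point away from a peripheral left enabled.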

    Best regards,

    Edvin

  • I am kind of asking blindly here. Let me know if you are able to reproduce it on a DK. 

    Didn't try to reproduce on DK. Will try that. 

    Can you confirm that the board is power cycled after being programmed?

    Yes. I tried power cycling after programming and the same issue persists.


    And that you are not running a debug session at this point? (Or monitoring RTT logs?)

    No. RTT and logging are disabled.

    As I said, the only change between these two builds is the initialisation of this global variable (which is what I can replicate easily) and a call, from one thread, to the function that uses this variable.

    The power consumption goes back to normal in any of these cases:

    • Adding the function call in one more place, even though execution never reaches it. Merely adding the call fixes it.
    • Making the global variable local.
    • Adding the __aligned(8) attribute to the global variable. But even here, if I add the __aligned(8) attribute to all the float variables, the current consumption increases.

    My worry is that if a similar issue can happen anywhere in the application, it is a problem and can affect the current consumption at any time. What could be the possible reasons for this?

    Thanks,

    Vishnu

  • Hello Vishnu,

    So the issue never happens if you don't have a global variable?
