URGENT: Random High Power Consumption Issue Causing Severe Battery Drain in Production

Hi,

We are facing a critical power consumption issue in our nRF52-based Zephyr application: some devices randomly enter a high power consumption state (80+ µA) instead of the expected 35-40 µA in idle mode.

Impact:

  • Users are experiencing severe battery drain.
  • Expected battery life has dropped to less than half in affected devices.
  • The issue is random, making it difficult to track and resolve.
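
As a rough check on the "less than half" figure: assuming the idle current dominates the average draw, and writing Q for the usable battery capacity (a symbol introduced here, not a value from the post), battery life scales inversely with current:

```latex
T = \frac{Q}{I}
\qquad\Longrightarrow\qquad
\frac{T_{\text{high}}}{T_{\text{expected}}}
  = \frac{I_{\text{expected}}}{I_{\text{high}}}
  \le \frac{40\,\mu\text{A}}{80\,\mu\text{A}} = 0.5
```

At the 35 µA end of the expected range the ratio drops to about 0.44.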

This issue occurs in multiple scenarios:

  • At runtime under certain conditions
  • After a reset
  • After a firmware update
  • Randomly, without a clear trigger

We have a global float variable: float total_current = 0.0f;

  • This variable is modified in two places:

    • A worker thread
    • A BLE command receive event handler
  • When we commented out the function call in the BLE handler and flashed the firmware, the power consumption increased unexpectedly.
  • We initially suspected FPU lazy stacking or context-switching issues but could not confirm.
  • However, aligning the float variable with __aligned(8), or declaring it locally inside the function, completely eliminated the power issue. We thought this had fixed it and added __aligned(8) to all global float variables.
  • This led us to suspect a floating-point-related issue, but the behavior reappeared when we initialized an atomic variable globally, like this: atomic_t is_bat_chargin = ATOMIC_INIT(0);
    and used it inside some timers and functions.
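
One hypothesis worth ruling out is that __aligned(8) only changes where the variable (and its neighbours) land in RAM, rather than fixing anything floating-point specific. The host-side sketch below only illustrates what the attribute guarantees. The variable total_current_aligned and the helper is_aligned() are hypothetical names introduced for this example, and the fallback macro stands in for the one Zephyr provides. On the real firmware, comparing symbol addresses in the linker map file of a "good" and a "bad" build would be the equivalent check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Zephyr provides __aligned(); this fallback lets the sketch build on a host. */
#ifndef __aligned
#define __aligned(x) __attribute__((__aligned__(x)))
#endif

/* As in the post: natural alignment for a float (typically 4 bytes). */
float total_current = 0.0f;

/* Hypothetical second copy, forced onto an 8-byte boundary. The linker may
 * insert padding before it, which shifts whatever is placed after it. */
float total_current_aligned __aligned(8) = 0.0f;

/* Returns true if the address p is a multiple of n. */
bool is_aligned(const void *p, uintptr_t n)
{
    return ((uintptr_t)p % n) == 0;
}
```

If the addresses of unrelated symbols move between the two builds, the alignment change is more likely masking a layout-sensitive problem than curing it.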

We need help with the following:

  1. Why does initializing a global float (or even an atomic variable) sometimes cause a high-power state?
  2. Could this be related to FPU context switching, stack alignment, or something else in Zephyr/nRF52?
  3. Why does aligning the float or declaring it locally fix the issue?
  4. Is there a better way to avoid such random power increases when using floating-point operations and atomic variables in Zephyr?
  5. Is there any extra configuration, debug setting, or power profiling tool we should enable to track down the exact cause of this issue?

Additional info

  • SoC: nRF52840
  • Zephyr Version: 2.6.1
  • CONFIG_FPU=y , CONFIG_FPU_SHARING=y
  • Optimization Level in Build: Optimize for size
  • Power Measurement Setup: Using a power profiler tool

This is a critical issue affecting production devices, and we urgently need guidance to identify the root cause and resolution.

Appreciate any insights into this issue! Is there anything extra we should check or enable to debug this further?

Thanks,

Vishnu

  • Hello,

    What makes you think this is related to the floating point number? Does it disappear completely if you remove all floating point numbers? 

    Are you using a DK or a custom board? Are you able to replicate the behavior on a DK?

    Can you share the plot from the power profiling tool? Either some screenshots, or export the data as the power profiling tool format, and upload it here?

    Is there any way for me to reproduce what you are seeing using a DK and no external components/sensors?

    Best regards,

    Edvin

  • Hi. Only the initial suspicion was floating-point related; later the issue was also reproduced with an atomic variable and with a uint32_t.

    I am using a custom board. I will check whether it can be replicated on a DK with no external components or sensors.
    The __aligned(8) attribute fixed this at first, but adding __aligned(8) to every float variable brought the issue back.

  • That is what I observe at least in this context. 

  • 1: Ok. I would very much like to see this replicated on a DK, if possible. So please see if you are able to do so.

    2: Do you have any logs or anything? Does it say anything particular when the current consumption goes high, compared to when it is low? You can use RTT logging, and connect the RTT Viewer after you see the current consumption go high. This way it will not start the debug session until you attach the RTT viewer.

    BR,
    Edvin

  • @Edvin. We’ve been investigating a high power consumption issue on our custom board and found something interesting during bench testing (we haven't tried on a DK yet).

    When we repeatedly reset the device using nrfjprog --reset, the system enters a high power state (>500 µA). This happens consistently across all devices. Our application uses the internal RC oscillator, and we observed that the issue also occurs with the Beacon example built with the internal RC configuration.

    The high power state persists until we either:

    • Perform a power-on reset, or

    • Trigger a system OFF using NRF_POWER->SYSTEMOFF = 1; followed by a GPIO wake-up.

    However, the issue disappears completely when we flash firmware built with external RC configuration. What’s more intriguing is:

    • Once we program the board with external RC firmware, the current drops as expected.

    • If we then re-flash firmware using internal RC, the high current issue returns.

    This suggests that something persists across resets or programming events which only clears on power cycling or entering system OFF state.

    1. What could be the root cause of this behavior with internal RC?

    2. Is there a known fix or workaround that allows us to continue using the internal RC reliably?
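
One way to test the idea that some clock state survives a soft reset is to log the CLOCK status registers early in main() on a "good" boot and a "bad" boot and compare them. The sketch below decodes raw register values; it assumes the HFCLKSTAT/LFCLKSTAT bit layout from the nRF52840 Product Specification (SRC in the low bits, STATE at bit 16), and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: STATE is bit 16 in both HFCLKSTAT and LFCLKSTAT. */
#define CLKSTAT_STATE_RUNNING (1UL << 16)
/* HFCLKSTAT.SRC is bit 0: 0 = RC, 1 = Xtal. */
#define HFCLKSTAT_SRC_XTAL    (1UL << 0)
/* LFCLKSTAT.SRC is bits 1:0: 0 = RC, 1 = Xtal, 2 = Synth. */
#define LFCLKSTAT_SRC_MASK    0x3UL

/* True if HFCLK is running from the external crystal (HFXO). */
bool hfxo_running(uint32_t hfclkstat)
{
    return (hfclkstat & CLKSTAT_STATE_RUNNING) &&
           (hfclkstat & HFCLKSTAT_SRC_XTAL);
}

/* Human-readable LFCLK source, or "stopped" if LFCLK is not running. */
const char *lfclk_source(uint32_t lfclkstat)
{
    if (!(lfclkstat & CLKSTAT_STATE_RUNNING)) {
        return "stopped";
    }
    switch (lfclkstat & LFCLKSTAT_SRC_MASK) {
    case 0:  return "RC";
    case 1:  return "Xtal";
    case 2:  return "Synth";
    default: return "unknown";
    }
}

/* On target (not built here), the calls would look like:
 *   hfxo_running(NRF_CLOCK->HFCLKSTAT);
 *   lfclk_source(NRF_CLOCK->LFCLKSTAT);
 */
```

A crystal oscillator that is still reported as running while the device should be idle would be one candidate explanation for the extra current, though that is only a guess until the register dumps are compared.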

  • Interesting.

    I assume you are talking about the RC Oscillator vs an external LFXTAL, right?

    Does your application depend on the reset reason register? Are you able to debug to check whether any peripherals start when you trigger a reset that don't start on a power-on reset? Any peripherals? Timer (HFCLK)? RTC (LFCLK)?
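
To compare a soft reset against a power-on reset, it can help to print the decoded reset reason at boot. The sketch below decodes a raw RESETREAS value, assuming the nRF52840 bit positions from the Product Specification; resetreas_to_str() is a hypothetical helper. It also assumes nothing has cleared the register earlier in boot (a reset-reason consumer such as the Memfault integration may do so, which would need checking).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct reset_flag {
    uint32_t mask;
    const char *name;
};

/* Assumed RESETREAS bit positions for the nRF52840. */
static const struct reset_flag reset_flags[] = {
    { 1UL << 0,  "RESETPIN" }, /* pin reset */
    { 1UL << 1,  "DOG" },      /* watchdog */
    { 1UL << 2,  "SREQ" },     /* soft reset request */
    { 1UL << 3,  "LOCKUP" },   /* CPU lockup */
    { 1UL << 16, "OFF" },      /* wake from System OFF via GPIO DETECT */
    { 1UL << 17, "LPCOMP" },   /* wake from System OFF via LPCOMP */
    { 1UL << 18, "DIF" },      /* wake from System OFF via debug interface */
    { 1UL << 19, "NFC" },      /* wake from System OFF via NFC field */
    { 1UL << 20, "VBUS" },     /* wake from System OFF via VBUS */
};

/* Returns e.g. "SREQ|DIF". The result lives in a static buffer, so this
 * sketch is not reentrant. No flags set means the on-chip reset generator
 * fired, i.e. a power-on or brownout reset. */
const char *resetreas_to_str(uint32_t resetreas)
{
    static char buf[64];
    size_t used = 0;

    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(reset_flags) / sizeof(reset_flags[0]); i++) {
        if (resetreas & reset_flags[i].mask) {
            int n = snprintf(buf + used, sizeof(buf) - used, "%s%s",
                             used ? "|" : "", reset_flags[i].name);
            if (n < 0 || (size_t)n >= sizeof(buf) - used) {
                break; /* buffer full; keep what fits */
            }
            used += (size_t)n;
        }
    }
    return used ? buf : "POR/BOR";
}
```

On target the argument would be NRF_POWER->RESETREAS. The register is cumulative until cleared, so several flags can appear at once after repeated resets.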

    I don't know what it would mean, but does the issue also happen if you call: nrfjprog --pinreset?

    Does your device have an external LFXTAL? 

    I see that you have some older tickets here on devzone regarding HW review (not my area). Are any of those for this design? Did any of them point out anything that could be related?

    Best regards,

    Edvin

  • I assume you are talking about the RC Oscillator vs an external LFXTAL, right?

    Yes. Internal RC vs external LFXTAL.


    Does your application depend on the reset reason register?

    We have Memfault integrated, which I think uses the reset reason.

    Are you able to debug to check if there are any peripherals that starts when you trigger a reset, that doesn't trigger on a power on-reset? Any peripherals? Timer (HFCLK)? RTC(LFCLK)?

    We have added a function that disables peripherals, called at the start of main. It is shown below:

    static void disable_peripherals(void)
    {
        // UARTE0
        nrf_uart_disable(NRF_UARTE0);
        // UARTE1
        nrf_uart_disable(NRF_UARTE1);
        // TWIM0, TWIM1
        disable_twis();
        // SPIM0, SPIM1, SPIM2, SPIM3
        nrf_spim_disable(NRF_SPIM0);
        nrf_spim_disable(NRF_SPIM1);
        nrf_spim_disable(NRF_SPIM2);
        nrf_spim_disable(NRF_SPIM3);
    
        // SAADC (ADC)
        nrf_saadc_disable(NRF_SAADC);
        
        // PWM instances
        nrf_pwm_disable(NRF_PWM0);
        nrf_pwm_disable(NRF_PWM1);
        nrf_pwm_disable(NRF_PWM2);
        nrf_pwm_disable(NRF_PWM3);
    
        // QSPI
        nrf_qspi_task_trigger(NRF_QSPI, NRF_QSPI_TASK_DEACTIVATE);
    
        // PDM, if used
        NRF_PDM->ENABLE = 0;
    
        // I2S
        NRF_I2S->ENABLE = 0;
    
        // RNG (disable if not in use)
        NRF_RNG->TASKS_STOP = 1;
    
        // COMP, LPCOMP (comparators)
        NRF_COMP->ENABLE = 0;
        NRF_LPCOMP->ENABLE = 0;
    
        // WDT: Leave it untouched if you're intentionally using it
    
        // USB
        NRF_USBD->ENABLE = 0;
    
        // NFCT (NFC Tag)
        NRF_NFCT->TASKS_DISABLE = 1;
    }

    Does your device have an external LFXTAL? 

    Yes, but this has some issue and is causing the device to stop advertising.

    I see that you have some older tickets here on devzone regarding HW review (not my area). Are any of those for this design? Did any of them point out anything that could be related?

    Yes. The tickets are for this same design. Could they be related? The issue happens only when we switch to the internal RC. Can you try to replicate this on a DK or on your side? Just keep firing the nrfjprog --reset command 20+ times continuously.

    Will try with the pinreset command. 

  • Can the crystal be a reason for this unpredictable high-current behaviour on resets? If yes, what is the reason for the behaviour?

  • vishnu3391_uh said:
    Can the crystal be a reason for this unpredictable high-current behaviour on resets? If yes, what is the reason for the behaviour?

    I wouldn't think so. 

    vishnu3391_uh said:
    But this has some issue and is causing the device to stop advertising

    Ok, so I assume that is why you are using the internal RC which triggered this issue in the first place. 

    vishnu3391_uh said:
    Yes. The tickets are for this same design. Could they be related?

    Not sure. But I would assume that if the LFXTAL is present, but you configure your application to use the internal RC Oscillator, then the LFXTAL will remain unpowered. You don't use the XTAL pins P0.00 and P0.01 for anything else, right?

    Have you tried desoldering the XTAL to make sure that this is not what's spinning up? (I would be very surprised if this is actually the case, though.)

    vishnu3391_uh said:
    Can you try to replicate this on DK or on your side. Just keep firing the nrfjprog --reset command for 20+ times continuously. 

    I couldn't. Did you try? I assume it would require an application similar to the one that you are using. Can you please try to build the application for a DK, and see if you can replicate it there?

    Best regards,

    Edvin

  • Sorry, I think you got it wrong. It works with the external LF crystal; the high-current issue happens only when we select the internal RC. It also reproduces with the BLE beacon sample when the LF source is the internal RC.

    CONFIG_CLOCK_CONTROL_NRF_K32SRC_RC=y
    CONFIG_CLOCK_CONTROL_NRF_K32SRC_RC_CALIBRATION=y
    CONFIG_CLOCK_CONTROL_NRF_CALIBRATION_PERIOD=5000
    CONFIG_CLOCK_CONTROL_NRF_CALIBRATION_LF_ALWAYS_ON=y
    CONFIG_CLOCK_CONTROL_NRF_K32SRC_500PPM=y
    CONFIG_SYS_CLOCK_TICKS_PER_SEC=32768
    CONFIG_BOARD_ENABLE_DCDC=y


    This is the config we have.

  • I tried the NCS v2.6.1\zephyr\samples\bluetooth\beacon with the added config lines that you provided, flashed it to an nRF52840DK, and ran this reset-script:

    :loop
    nrfjprog --reset -f NRF52
    TIMEOUT 2
    goto loop

    But so far, the current consumption looks normal:

    But if I understand correctly, once it reaches the high current state, it will remain that way until I power cycle the kit, so I can leave it running overnight.

    Let me know if you are able to reproduce it on a DK.

    But I would like to know a bit more about your LFXTAL issues. Perhaps it is related. Why did you decide to use the internal LFCLK? Is there an error with the XTAL? Or are the capacitors wrong?

    Best regards,

    Edvin

  • But if I understand correctly, once it reaches the high current state, it will remain that way until I power cycle the kit, so I can leave it running overnight.

    That is right. Only a power-on reset, programming with the external LF configuration, or setting NRF_POWER->SYSTEMOFF = 1; and then waking up from sleep fixed it. We also found that on one board, setting CONFIG_SYSTEM_CLOCK_WAIT_FOR_STABILITY=n fixed the issue, but this does not work on all boards.

    But I would like to know a bit more about your LFXTAL issues. Perhaps it is related. Why did you decide to use the internal LFCLK? Is there an error with the XTAL? Or are the capacitors wrong?

    Devices in the field were going into a non-advertising state, so we decided to go with the internal RC. The number of such issues has since dropped drastically. It looks like a problem with the LF XTAL.