nrf_pwr_mgmt_run sometimes not sleeping?

nRF52840, SDK v17.0.2, SD140 v7.0.1

My device sometimes (only sometimes!) fails to enter low-power sleep around the time a BLE connection is established. When this happens, my main-loop calls to nrf_pwr_mgmt_run() return immediately and never sleep. The device stays in a high-current-consumption state and the battery dies much more quickly than it should.

My main loop looks roughly like:

while (1) {
  poll_subsystem_a();
  poll_subsystem_b();
  poll_subsystem_c();
  
  if (idle) {
    counter_increment(sleep_calls);
    nrf_pwr_mgmt_run();
  }
  
  counter_increment(main_loop_turns);
}

In my previous attempts to root-cause this, I filed the ticket 'nRF52840: How do I read the "Event Register"?'. The end result is that I'm now intercepting every single IRQ, including the SoftDevice ones, to see how frequently they're occurring. I log the number of times my main loop runs, the number of times I call nrf_pwr_mgmt_run(), and I now have a full histogram of IRQ counts. Here's a table of the data coming from my device:

Note that every counter is cumulative; the differences between consecutive rows are what's important. For example, you can see that each minute the main loop runs around 150,000 times, and I'm calling nrf_pwr_mgmt_run() about that many times as well. My napkin math shows that this is roughly the number of main loop turns I'd see if the nRF52840 never actually slept.
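To make the row-to-row arithmetic concrete, here is a minimal sketch of how the deltas can be computed from the cumulative counters. The struct layout and function names are hypothetical (they are not from my firmware or the SDK); only the two counters named in the main loop above are assumed. At 150,000 turns per 60-second interval, the average turn works out to about 400 µs.

```c
#include <stdint.h>

/* One row of the cumulative counter log (hypothetical layout). */
typedef struct {
    uint32_t main_loop_turns;
    uint32_t sleep_calls;
} counter_sample_t;

/* Difference between two cumulative samples. Unsigned subtraction
 * stays correct across a single 32-bit wraparound of a counter. */
static counter_sample_t counter_delta(counter_sample_t prev,
                                      counter_sample_t curr)
{
    counter_sample_t d = {
        .main_loop_turns = curr.main_loop_turns - prev.main_loop_turns,
        .sleep_calls     = curr.sleep_calls - prev.sleep_calls,
    };
    return d;
}

/* Average duration of one main-loop turn, in microseconds, over an
 * interval of interval_s seconds. Returns 0 if no turns were counted. */
static uint32_t avg_turn_us(uint32_t turns, uint32_t interval_s)
{
    if (turns == 0u) {
        return 0u;
    }
    return (uint32_t)(((uint64_t)interval_s * 1000000u) / turns);
}
```

If the device were sleeping between events, I'd expect the per-interval turn count to track the event rate instead, so the average turn duration would be far longer than this.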

There are more IRQs in my histogram that aren't shown, but they're all happening less frequently than the ones you can see. None of the IRQs are happening per-turn, though, generally 1-2 orders of magnitude less.
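For anyone wanting to reproduce the histogram, the bookkeeping amounts to something like the sketch below. This is a simplified illustration, not my actual interception code: the names and the slot count are assumptions, and how the hook gets called for each IRQ (including the SoftDevice ones) is left out.

```c
#include <stdint.h>
#include <stddef.h>

/* Assumed upper bound on IRQ numbers; adjust for the target device. */
#define IRQ_SLOT_COUNT 64u

/* One counter per IRQ number. Each slot is only ever written from the
 * handler of that one IRQ, so a plain increment is sufficient here. */
static volatile uint32_t irq_histogram[IRQ_SLOT_COUNT];

/* Call from the common interception point with the IRQ number that
 * fired. Out-of-range numbers are ignored. */
static void irq_histogram_record(uint32_t irq_number)
{
    if (irq_number < IRQ_SLOT_COUNT) {
        irq_histogram[irq_number]++;
    }
}

/* Return the IRQ number with the highest count so far. The count
 * itself is written to *count_out when count_out is not NULL. */
static uint32_t irq_histogram_busiest(uint32_t *count_out)
{
    uint32_t busiest = 0u;
    uint32_t max_count = 0u;

    for (uint32_t i = 0u; i < IRQ_SLOT_COUNT; i++) {
        uint32_t count = irq_histogram[i];
        if (count > max_count) {
            max_count = count;
            busiest = i;
        }
    }
    if (count_out != NULL) {
        *count_out = max_count;
    }
    return busiest;
}
```

Comparing the busiest IRQ's per-interval count against the per-interval main loop turns is how I arrive at the "orders of magnitude less" observation.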

SoftDevice is running, and there is an active BLE connection. You can see the radio interrupts going, and the connection is stable and my peripheral device is exchanging data with the connected central.

I'm aware of the FPU errata, but I see that nrf_pwr_mgmt_run mitigates it. I've confirmed that my code is reaching the actual call to sd_app_evt_wait() inside nrf_pwr_mgmt_run().

It seems similar to this ticket: "nrf_pwr_mgmt_run() with SoftDevice not go to low power"

But that ticket doesn't seem to have a resolution.

I'm wondering if there's something the SoftDevice wants from me that I'm not giving it (a response to a request, for example), and whether its absence is causing sd_app_evt_wait() to return immediately in the hope that I'll handle it?

So, some questions I was hoping someone could help me with:

1. Are there any known errata in SD140 v7.0.1 that could cause this?

2. Are there any paths in the opaque SoftDevice binary function sd_app_evt_wait that return to the user without actually doing the WFE/SEV sleep?

3. Does anyone have any ideas what I can try next, or where I can look, to root-cause and fix this?

Thanks in advance for any advice you can offer,

Charles
