nrf_pwr_mgmt_run sometimes not sleeping?

nRF52840, SDK v17.0.2, SD140 v7.0.1

My device is sometimes (only sometimes!) failing to enter a low-power sleep mode sometime around establishing a BLE connection; I am observing my main-loop calls to nrf_pwr_mgmt_run() returning immediately and never ever sleeping. My device enters a high-current-consumption state and the battery dies much more quickly than it should.

My main loop looks roughly like:

while (1) {
  poll_subsystem_a();
  poll_subsystem_b();
  poll_subsystem_c();
  
  if (idle) {
    counter_increment(sleep_calls);
    nrf_pwr_mgmt_run();
  }
  
  counter_increment(main_loop_turns);
}

In my previous attempts to root-cause this, I filed the ticket nRF52840: How do I read the "Event Register"? The end result is that I'm now intercepting every single IRQ, including the SoftDevice ones, to see how frequently they're occurring. I log the number of times my main loop runs, the number of times I call nrf_pwr_mgmt_run(), and I now have a full histogram of IRQ counts. Here's a table of the data coming from my device:

Note that every counter is cumulative; the differences between each row are what's important. For example, you can see that each minute, the main loop runs roughly 150,000 times, and I'm calling nrf_pwr_mgmt_run() about that many times as well. My napkin math shows that this is roughly the number of main loop turns I'd see if the nRF52840 never actually slept.

There are more IRQs in my histogram that aren't shown, but they're all happening less frequently than the ones you can see. None of the IRQs are happening per-turn, though, generally 1-2 orders of magnitude less.

SoftDevice is running, and there is an active BLE connection. You can see the radio interrupts going, and the connection is stable and my peripheral device is exchanging data with the connected central.

I'm aware of the FPU errata, but I see that nrf_pwr_mgmt_run mitigates it. I've confirmed that my code is reaching the actual call to sd_app_evt_wait() inside nrf_pwr_mgmt_run().

It seems similar to this ticket:  nrf_pwr_mgmt_run() with SoftDevice not go to low power 

But that ticket doesn't seem to have a resolution.

I'm wondering if there's something SoftDevice wants from me that I'm not giving it, a response to a request or something, and the absence is causing sd_app_evt_wait() to return immediately in the hopes that I'll help?

So, some questions I was hoping someone could help me with:

1. Are there any known errata in SD140 v7.0.1 that could cause this?

2. Are there any paths in the opaque SoftDevice binary function sd_app_evt_wait that return to the user without actually doing the WFE/SEV sleep?

3. Does anyone have any ideas what I can try next, or where I can look, to root-cause and fix this?

Thanks in advance for any advice you can offer,

Charles

  • Update-

    This seems to help, but I don't know why and I'm not sure if it's safe to use when SoftDevice is active. I really don't think this counts as a "fix" yet, but it does seem like evidence that sd_app_evt_wait() has a fallthrough path where it returns without running WFE/SEV? It sure would be nice if the source code for SoftDevice was available for reading...

    In this specific experiment below, I have also set SEVONPEND to 1, though I'm not sure what the implications are or when / why I'd want that in production.

    void cpu_sleep(void) {
      CRITICAL_REGION_ENTER();
      __set_FPSCR(__get_FPSCR() & ~0x9Fu);
      __DSB();
      NVIC_ClearPendingIRQ(FPU_IRQn);
      __WFE();
      CRITICAL_REGION_EXIT();
    
      // Un-pend any disabled pending interrupts
      NVIC->ICPR[0] = ~NVIC->ICER[0];
      NVIC->ICPR[1] = ~NVIC->ICER[1];
    }
    

    Charles

  • Hi Charles,

    1. Are there any known errata in SD140 v7.0.1 that could cause this?

    No, I do not think so.

    2. Are there any paths in the opaque SoftDevice binary function sd_app_evt_wait that return to the user without actually doing the WFE/SEV sleep?

    No, sd_app_evt_wait() will always result in a sequence of __WFE(); __SEV(); __WFE();. After wake-up, the pending interrupts are checked against a mask that filters out the SoftDevice IRQs. If any bits are set in the ISPR that are not masked out, sd_app_evt_wait() returns. If not, it will (conceptually) call __WFE(); __SEV(); __WFE(); again.
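
As a rough mental model of the wake-up check described above, the return decision can be written as a pure function of a pending-register snapshot. This is only a sketch: the SoftDevice source is not public, and the function name, the `SD_IRQ_MASK_EXAMPLE` macro, and the bits chosen for it are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mask of IRQ lines owned by the SoftDevice (bit n = IRQ n).
 * The real set is defined by the SoftDevice; these bits are illustrative. */
#define SD_IRQ_MASK_EXAMPLE ((1u << 1) | (1u << 11))

/* Model of the check made after each wake-up: return to the application
 * only if an interrupt NOT owned by the SoftDevice is pending. */
static bool evt_wait_should_return(uint32_t ispr_snapshot, uint32_t sd_irq_mask)
{
    return (ispr_snapshot & ~sd_irq_mask) != 0u;
}
```

Under this model, a pending flag on a disabled application interrupt is never cleared by a handler, so every call would return immediately, which would be consistent with the symptom in the original post.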

    3. Does anyone have any ideas what I can try next, or where I can look, to root-cause and fix this?

    The most obvious suspects are that either you do not clear some event(s) that cause interrupts, or the idle variable in your pseudocode is not correctly updated (so that it is not true when it should be, or it is not declared volatile).

    charles_fi said:
    In this specific experiment below, I have also set SEVONPEND to 1, though I'm not sure what the implications are or when / why I'd want that in production.

    That should be OK I believe. However, SEVONPEND should already be set to 1, as you use the nrf_pwr_mgmt module (done when you call nrf_pwr_mgmt_init()).

  • Thanks for the quick response, Einar-

    I think my original table displays that no _new_ interrupts are occurring, or I'd expect to see the "main loop turns" counter match one of the "xyz_IRQn" counters, but they're off by a few orders of magnitude. My original table also displays an almost exact 1:1 matching between "main loop turns" and "sleep attempts"; I do think that my "idle" flag works, or I would expect to see many loop turns and few sleep attempts.

    nrf_pwr_mgmt_run() does some tidying up and then calls sd_app_evt_wait(). The documentation for sd_app_evt_wait() says the following in a note:

    "The application must ensure that the pended flag is cleared using sd_nvic_ClearPendingIRQ in order to sleep using this function. This is only necessary for disabled interrupts, as the interrupt handler will clear the pending flag automatically for enabled interrupts."

    One difference between my "this seems to work" update post and my original "this sometimes doesn't work" post is that I'm doing 

    // Un-pend any disabled pending interrupts
    NVIC->ICPR[0] = ~NVIC->ICER[0];
    NVIC->ICPR[1] = ~NVIC->ICER[1];

    I don't see any examples in the nRF SDK of calling sd_nvic_ClearPendingIRQ (or manipulating the ICPR registers in general) to implement this note. Am I doing it correctly?
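
One way to narrow this down, sketched here under assumptions, is to find which interrupts are pending but not enabled before deciding what to clear. The helper names below are hypothetical; on target, the snapshots would be read from NVIC->ISPR[n] and NVIC->ISER[n] (reading ISER returns the current enable bits).

```c
#include <stdint.h>

/* Bits set in the result are IRQs that are pending but not enabled.
 * Both arguments are snapshots of one 32-bit NVIC register word. */
static uint32_t disabled_pending(uint32_t ispr_snapshot, uint32_t iser_snapshot)
{
    return ispr_snapshot & ~iser_snapshot;
}

/* Returns the lowest IRQ bit position set in mask, or -1 if mask is empty. */
static int lowest_irq(uint32_t mask)
{
    for (int n = 0; n < 32; n++) {
        if (mask & (1u << n)) {
            return n;
        }
    }
    return -1;
}
```

Each IRQ number found this way could be logged to identify the source, and then passed to sd_nvic_ClearPendingIRQ(), the call named in the documentation note. Restricting this to interrupts the application owns, rather than blanket-clearing everything, seems safer given that clearing SoftDevice interrupts broke the stack in the earlier experiment.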

    In an earlier attempt, I un-pended all disabled interrupts while in a critical section- that made SoftDevice stop working. I'm assuming some number of SD interrupts fire between when I exit my critsec and when I hit ICPR.

    I guess a deeper question I have is "why does nrf_pwr_mgmt_init set SEVONPEND?" If I'm not using nrf_pwr_mgmt_init anymore, and just doing the FPU errata and WFE myself, do I need SEVONPEND? My application is running at the lowest priority, which seems relevant.

    Can I simply leave SEVONPEND disabled and still just WFE? That seems simpler.

  • Hi,

    charles_fi said:
    I don't see any examples in the nRF SDK of calling sd_nvic_ClearPendingIRQ (or manipulating the ICPR registers in general) to implement this note. Am I doing it correctly?

    That is a good question. This is only relevant for disabled interrupts (as stated in the API documentation for sd_app_evt_wait()). But that leads to something interesting. With an interrupt disabled and SEVONPEND set, you would wake up on the event, but the corresponding ISR would not run (as it is not enabled in the interrupt mask in the NVIC). If you have an interrupt source that is disabled in the interrupt mask, it could explain what you are seeing (the device keeps waking up, but no ISR runs).
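
The "wake-up without an ISR" case can be expressed as a small predicate. This is a simplified model that ignores priority masking; the function and macro names are hypothetical, while bit 4 of SCB->SCR is the SEVONPEND bit on Cortex-M.

```c
#include <stdbool.h>
#include <stdint.h>

/* SEVONPEND is bit 4 of the Cortex-M System Control Register (SCB->SCR). */
#define SCR_SEVONPEND_BIT (1u << 4)

/* True when an interrupt that has just become pending would wake a WFE
 * without its handler running: SEVONPEND is set and the IRQ is disabled. */
static bool wake_without_isr(uint32_t scr_snapshot, bool irq_enabled)
{
    bool sevonpend = (scr_snapshot & SCR_SEVONPEND_BIT) != 0u;
    return sevonpend && !irq_enabled;
}
```

With SEVONPEND cleared, the same disabled interrupt would not generate a wake-up event in this model, which matches the later suggestion that leaving SEVONPEND disabled is acceptable in most cases.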

    charles_fi said:
    I guess a deeper question I have is "why does nrf_pwr_mgmt_init set SEVONPEND?"

    I have not been able to find an explicit reason why this is done in nrf_pwr_mgmt, but it makes sense in some cases. (For instance, in our FreeRTOS port we do the same for the RTC, so that it acts as a periodic wake-up source without needing to spend CPU cycles in an ISR doing nothing.)

    charles_fi said:

    If I'm not using nrf_pwr_mgmt_init anymore, and just doing the FPU errata and WFE myself, do I need SEVONPEND? My application is running at the lowest priority, which seems relevant.

    Can I simply leave SEVONPEND disabled and still just WFE? That seems simpler.

    I don't have a full overview of your design, but in principle, yes. It should be OK to leave SEVONPEND disabled in most cases.

  • Einar- Thanks so much for the deep investigation. I've replaced my sleep call with the following:

    void cpu_init(void) {
      SCB->SCR &= ~SCB_SCR_SEVONPEND_Msk;
    }
    
    void cpu_sleep(void) {
      CRITICAL_REGION_ENTER();
      __set_FPSCR(__get_FPSCR() & ~0x9Fu);
      __DSB();
      NVIC_ClearPendingIRQ(FPU_IRQn);
      CRITICAL_REGION_EXIT();
    
      sd_app_evt_wait();
    }
    

    And it seems to be working well now. I think SEVONPEND was the culprit; I've shipped a few nRF528xx IoT products now and this was the first time I've ever tried nrf_pwr_mgmt_run().

    Going back to handling the FPU IRQ errata myself and just calling sd_app_evt_wait() directly seems to be working well (again).

    Thanks a bunch for your help!

    Best,

    Charles
