nRF52840 app not sleeping, SoftDevice seems involved

Hello-

I'm seeing the nRF52840 stay awake when I don't want it to. I'm calling sd_app_evt_wait() every turn through the main loop with SEVONPEND disabled, but it's waking up and falling through.

I'm capturing a histogram of the entire IRQ set, now including the 16 core exceptions, and I found a "smoking gun": there's an almost 1:1 correspondence between my main loop turns and the core SVCALL_IRQn exception.

Reading the nRFSDK / SoftDevice headers, it looks like there's a syscall-style interface: when the app calls into the SoftDevice, it issues an SVC instruction with the call ID encoded in the instruction itself.
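To audit these calls (question 2 below), one option is to recover the SVC number inside an SVCall shim. This is a minimal sketch. It assumes your shim already sees SVCall entries, which the histogram suggests it does. It also assumes the calls use the 16-bit Thumb `SVC #imm8` encoding (0xDF00 | imm8). The function and array names are hypothetical. Only the decoding is shown; the on-target register handling and the hand-off to the SoftDevice's handler are described in a comment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Thumb "SVC #imm8" is the halfword 0xDF00 | imm8. Stored little-endian,
 * imm8 sits at the lower address and 0xDF at the higher one. */
#define THUMB_SVC_OPCODE_HI 0xDFu

/* return_addr is the stacked PC from the SVC exception frame, which points
 * at the instruction AFTER the SVC. Returns false if the preceding halfword
 * is not an SVC instruction.
 *
 * On target (sketch only): pick MSP or PSP from bit 2 of EXC_RETURN in LR,
 * read the stacked PC at frame[6] (r0-r3, r12, lr, pc, xpsr), pass it here,
 * then branch to the original SVC handler with r0-r3 untouched. */
static bool svc_number_from_return_addr(const uint8_t *return_addr,
                                        uint8_t *svc_number)
{
    const uint8_t *svc_insn = return_addr - 2; /* the 2-byte SVC itself */
    if (svc_insn[1] != THUMB_SVC_OPCODE_HI) {
        return false;
    }
    *svc_number = svc_insn[0];
    return true;
}

/* Hypothetical audit histogram: one counter per possible SVC number. */
static uint32_t svc_histogram[256];

static void svc_audit_record(const uint8_t *return_addr)
{
    uint8_t number;
    if (svc_number_from_return_addr(return_addr, &number)) {
        svc_histogram[number]++;
    }
}
```

The recorded numbers can then be mapped back to API names using the SVC number enums in the SoftDevice headers. If the 1:1 pattern is only the sleep calls, the sd_app_evt_wait() bucket should dominate.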

My hunch about what's happening here is that I've managed to get my app into a state where it's calling into SoftDevice (maybe to test something?) once per main loop turn. That issues an SVC, which sets the Event register. Then, when I call into sd_app_evt_wait() at the end of my loop, it clears the Event register and simply continues. The cycle repeats again, and my app never sleeps.

So, here are some questions I was hoping an expert could give me advice and insight on:

1. Is my hypothesis reasonable? Could this be happening? Does SVC set the Event register?

2. Is there anything in the nRFSDK that lets me audit SoftDevice supervisor calls in an automated way? (I think I can build this into my IRQ capture-and-forward shim)

3. Do you have any other ideas or suggestions?

Sorry to file a relatively open-ended question, but time is critical for us so I'd love any thoughts you all might have!

Best,

Charles

  • The difference between "mainLoopTurnCount" and "mainLoopSleepEnterCalls" comes from the order of operations in our main loop: on some turns we realize we can't sleep yet. Roughly like this:

    #include "nrf_soc.h" // declares sd_app_evt_wait()

    void main_loop(void) {
      increment_counter(mainLoopTurns);

      poll_a();
      poll_b(); // can provide immediate work for poll_a() to do!

      if (idle) { // no immediate work to do
        increment_counter(mainLoopSleepEnterCalls);
        sd_app_evt_wait(); // returns when an application event has occurred
      }
    }

    We have RTT logging going, but of course there's no debugger attached in the field. :)

    I collect this histogram from all of our employee modules, and I always see this pattern when the battery collapse happens. Sometimes it heals itself and stops; other times the battery is drained to exhaustion.

    I agree that waking up at 2 kHz is really wild! I'm also extremely confused that none of the other histogram counts grow 1:1 with the main loop / sleep counts. I also agree that the SVCALL_IRQn count growing almost 1:1 makes perfect sense: every call to sd_app_evt_wait() issues an SVC, and normal SD API calls add a few more.

    I have never been able to reproduce this at my bench; it only happens in the field, and it always starts when the device receives a connection from a BLE central (phone or dedicated base).

    I'm running out of data to gather, and hypotheses to test, which is starting to make me nervous. If it's not an IRQ, what else could be waking the system?

    Perhaps another thread to explore: I'm explicitly disabling SEVONPEND. Does anything in SoftDevice turn it back on? Could it be related to that? If the ISR associated with an IRQ never actually runs, my counting shim won't ever see it.
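    A minimal sketch for testing both ideas follows. It assumes you can add a few words to the data you already forward with the histogram. Snapshot SCB->SCR and the NVIC enable/pending registers right before sleeping. Then flag SEVONPEND and any IRQ that is pending but disabled. Such an IRQ never runs its ISR, so a counting shim never sees it. The struct and function names are hypothetical. The SEVONPEND bit position is the ARMv7-M one (CMSIS `SCB_SCR_SEVONPEND_Msk`).

```c
#include <stdbool.h>
#include <stdint.h>

/* ARMv7-M System Control Register: SEVONPEND is bit 4. */
#define SCR_SEVONPEND (1u << 4)

/* Two NVIC words cover IRQ 0..63, enough for the nRF52840 peripheral IRQs. */
#define NVIC_WORDS 2

/* On target (sketch only), fill just before sleeping, in thread mode:
 *   snap.scr     = SCB->SCR;
 *   snap.iser[i] = NVIC->ISER[i];   1 = enabled
 *   snap.ispr[i] = NVIC->ISPR[i];   1 = pending */
typedef struct {
    uint32_t scr;
    uint32_t iser[NVIC_WORDS];
    uint32_t ispr[NVIC_WORDS];
} sleep_snapshot_t;

static bool snapshot_sevonpend(const sleep_snapshot_t *s)
{
    return (s->scr & SCR_SEVONPEND) != 0;
}

/* IRQs that are pending but not enabled: their ISRs never run, so an
 * ISR-level counting shim cannot see them. With SEVONPEND set, each new
 * pend of such an IRQ is still a WFE wakeup event. Fills out[] with the
 * per-word masks and returns how many IRQs are in that state. */
static unsigned pending_but_disabled(const sleep_snapshot_t *s,
                                     uint32_t out[NVIC_WORDS])
{
    unsigned count = 0;
    for (int i = 0; i < NVIC_WORDS; i++) {
        out[i] = s->ispr[i] & ~s->iser[i];
        for (uint32_t m = out[i]; m != 0; m &= m - 1) {
            count++; /* one iteration per set bit */
        }
    }
    return count;
}
```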

  • Hi Charles, 


    There is a mention of SEVONPEND in section 7.6 of the SoftDevice Specification.

    I was not aware that the SoftDevice explicitly sets SEVONPEND. After checking the source, I saw that the SoftDevice may set or clear it, but it restores the value set by the application when it's done.

    I assume you cleared the bit at a low interrupt priority or in thread mode?

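    If you think sd_app_evt_wait() could be the culprit, you can try using:

    __WFE(); // sleep, or clear an already-set event register and return at once
    __SEV(); // set the event register
    __WFE(); // consume that event, leaving the event register clear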

    ADDED: Please make sure you use this in a spinlock loop as Vidar has suggested. 

    It would have the same effect, except that it returns whenever there is any SoftDevice activity.
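    A minimal sketch of such a spinlock loop follows. It assumes a hypothetical `app_wakeup` flag that the application's own interrupt handlers set whenever they leave work for the main loop. The sleep primitive is passed in as a function pointer so the loop logic can be exercised off-target. On the device, it would wrap the `__WFE(); __SEV(); __WFE();` sequence (or sd_app_evt_wait()). `fake_wait` is only an off-target stand-in for an interrupt.

```c
#include <stdbool.h>

/* Hypothetical flag: set by the application's ISRs/event handlers when
 * they leave work for the main loop. */
static volatile bool app_wakeup = false;

/* On target this would be, for example:
 *   static void wfe_sev_wfe(void) { __WFE(); __SEV(); __WFE(); } */
typedef void (*wait_fn_t)(void);

/* Keep sleeping until an ISR has actually flagged work. A wakeup from an
 * interrupt that left no work (e.g. one triggered by the poll functions
 * just before sleeping) costs one extra pass through this loop instead of
 * a full main-loop turn. Returns how many times wait() was called. */
static unsigned sleep_until_app_wakeup(wait_fn_t wait)
{
    unsigned waits = 0;
    while (!app_wakeup) {
        wait();
        waits++;
    }
    app_wakeup = false; /* consume the wakeup before polling again */
    return waits;
}

/* Off-target stand-in for an interrupt: "fires" on the Nth wait and sets
 * the flag, the way an application ISR would on hardware. */
static unsigned fake_irq_after = 3;

static void fake_wait(void)
{
    if (--fake_irq_after == 0) {
        app_wakeup = true;
    }
}
```

In the main loop, `sleep_until_app_wakeup()` would take the place of the bare sd_app_evt_wait() call inside the `if (idle)` branch.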

    Contrary to what you might think, the sleep loop inside sd_app_evt_wait() is not executed in SVC context but in main (thread) context.

    Have you made sure that poll_a() and poll_b() don't trigger an interrupt in some corner case?

    Just to be extra sure: you don't use any UART logging?

  • We have 2x UARTs in our system.

    One is for logging data from our external WiFi chip, but we disable that UART when WiFi is inactive (and verify this by dumping the GPIO pin states and directions to the same server that collects the IRQ histograms). I'd love to hear your thoughts, though: are you thinking that UART interrupts might be keeping the system awake?

    The other UART is for factory-mode tests between the nRF52840 and the nRF9160 (LTE). In the field we use its TX/RX pins for SPI instead and never initialize this UART peripheral.

    It's absolutely possible that something in my poll functions triggers an interrupt, especially since starting and stopping timers triggers an interrupt in the nRFSDK. But if anything in my main loop triggered an interrupt, I would see it in my IRQn histogram, right? Is there a scenario where I wouldn't?

    I clear SEVONPEND once, in thread mode, from my main function, early after boot.

    What happens if I don't have WFE/SEV/WFE in a loop? Worst case, I run my main loop again, right?

  • Hi Charles, 


    The problem with not having a spinlock loop is that the functions executed before sd_app_evt_wait() or __WFE() may trigger an interrupt.
    That interrupt may wake the chip up immediately after sd_app_evt_wait()/__WFE() is called. With a spinlock, the loop continues, and the next WFE at the start of the next iteration puts the device to sleep, because the event register was cleared by the WFE in the previous iteration.

    Without a spinlock loop (a loop checking an app_wakeup flag), the CPU just jumps back to your poll functions, which may trigger another interrupt.
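    The event-register reasoning above can be checked with a toy model. This sketch encodes only two rules. First, SEV sets the register. Second, WFE with the register set clears it and returns without sleeping. It treats the interrupt raised just before sleeping as something that sets the register, as described above. It is not a simulator of the nRF52840, and the names are hypothetical.

```c
#include <stdbool.h>

/* Toy model of the Cortex-M event register. Only two rules are modeled:
 *   - SEV (or any other event that latches the register) sets it;
 *   - WFE with the register set clears it and returns without sleeping,
 *     WFE with the register clear sleeps.
 * The wake-up from a real sleep is not modeled. */
typedef struct {
    bool event_reg;
} core_model_t;

/* Latch an event: SEV, or in the scenario above, the interrupt raised by
 * the polling code just before sleeping. */
static void model_latch_event(core_model_t *c)
{
    c->event_reg = true;
}

/* Returns true if this WFE would really sleep, false if it fell through. */
static bool model_wfe(core_model_t *c)
{
    if (c->event_reg) {
        c->event_reg = false; /* consume the latched event */
        return false;
    }
    return true;
}

/* The WFE; SEV; WFE sequence suggested earlier in the thread: the register
 * is clear afterwards either way. Returns whether the first WFE slept. */
static bool model_wfe_sev_wfe(core_model_t *c)
{
    bool slept = model_wfe(c);
    model_latch_event(c); /* SEV */
    (void)model_wfe(c);   /* consumes the SEV */
    return slept;
}
```

In the model, one latched event makes the first WFE fall through. Inside a spin loop, the next WFE sleeps. Without one, the CPU runs the poll functions again and may latch a fresh event, which is the self-sustaining cycle described above.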

    You may run into this issue if you process UART logging right before the call: the UART can raise an interrupt when it finishes transmitting, and that can wake the chip up. But if you already monitor all interrupts, you should be able to spot that.

  • I see, thanks for the response. I'll keep collecting data and searching; I'll update this thread with any evidence or questions as they arise.

    Thanks also for continuing to engage with me on this, I appreciate it!
