nRF52840 app not sleeping, SoftDevice seems involved

Hello-

I'm seeing the nRF52840 stay awake when I don't want it to. I'm calling sd_app_evt_wait() every turn through the main loop with SEVONPEND disabled, but it's waking up and falling through.

I'm capturing a histogram of the entire IRQ set, now including the 16 core exceptions, and I found a "smoking gun"- There's almost a 1:1 correspondence between my main loop turns and the core SVCALL_IRQn exception interrupt:

Reading the nRFSDK / SoftDevice headers, it looks like there's a syscall-style interface- when calling from the app into SoftDevice, it issues the SVC with the call id encoded in the instruction.

My hunch about what's happening here is that I've managed to get my app into a state where it's calling into SoftDevice (maybe to test something?) once per main loop turn. That issues an SVC, which sets the Event register. Then, when I call into sd_app_evt_wait() at the end of my loop, it clears the Event register and simply continues. The cycle repeats again, and my app never sleeps.

So, here are some questions I was hoping an expert could give me some advice + insight into please!

1. Is my hypothesis reasonable? Could this be happening? Does SVC set the Event register?

2. Is there anything in the nRFSDK that lets me audit SoftDevice supervisor calls in an automated way? (I think I can build this into my IRQ capture-and-forward shim)

3. Do you have any other ideas or suggestions?

Sorry to file a relatively open-ended question, but time is critical for us so I'd love any thoughts you all might have!

Best,

Charles

Parents
  • Ah, sorry, I'm going to give readers some whiplash here-

    I think my hypothesis was incorrect! If I'm seeing an almost exactly 1:1 relationship between SVCALL_IRQn calls and main loop turns and sleep calls, that means that things are working ok because sd_app_evt_wait() is run via a SVCALL.

    So I want to explore a different track, perhaps it's a GPIO issue. I see in section 6.9 of the nRF52840 Product Specification:

    I see that "Wake-up from high or low level triggers on all pins" is a separate line item from "Trigger interrupt on state changes on any pin".

    Is it possible to configure the GPIO and/or GPIOTE subsystems so that the level of a pin will cause a wake-up, and/or keep the system from sleeping? We are using PPI to drive a wifi/ble coexistence pin and it's tied to PA/LNA, does the PPI system have level-based wakes anywhere?

    Note that my IRQ histogram shows that I am not getting GPIOTE_IRQn 100,000+ times per minute, so it's not a strobing pin:

    I'll keep studying the data sheet, but if anyone has any information or quick answers about pin levels being able to keep the nRF52840 from entering WFE sleep, that would help me greatly Slight smile

    Thanks,

    Charles

  • Hi Charles, 

    The main purpose of PPI is to not get the CPU involved in the automation of event and task. So PPI wouldn't wake the CPU up. But the event/interrupt can.
    I think what important here is that if you have an interrupt that trigger and even tyou clear the event after you handle the interrupt. If not the CPU won't be able to enter sleep mode if the event has not been cleared. 

    Have you tried to disable the GPIO interrupt to check if it has something to do with waking up the CPU ? 


    The softdevice only trigger an event when there is BLE activity that require the application to handle. So It's not normal that the application get waking up on every sd_app_evt_wait(). Note that sd_app_evt_wait() will not exist when the softdevice is active to handle BLE event. It only exist when there is application interrupt or when there is BLE event.

    I would also suggest to use the Power Profiler Kit to monitor the activity of the CPU so that we have better view of when the CPU waking up or if the CPU enters sleep mode at all. 

  • Thanks for the response. Because this is is very hard to reproduce, I'm looking for ways to gather more clues from units in the field.

    Another question- if I have a BLE central connect and hold a very fast connection interval, 7.5ms, could that cause behavior like this? sd_app_evt_wait() seems like it doesn't return to my code from radio/timer0/rtc0 interrupts, but I'm curious if it's possible.

    I'm seeing ~100k+ main loop runs per minute, is the only option left that I actually have an interrupt firing that many times (kilohertz range)? I'd expect to see it on my IRQ histogram (see initial post) but I don't.

  • Hi Charles, 
    Is there any chance that it's locked up somewhere and couldn't enter sd_app_evt_wait() ? 

    Short interval of 7.5ms shouldn't be a problem. The BLE activity only takes from 2ms for an connection event. And if there is no BLE activity that requires the app involvement sd_app_evt_wait() will not return because of the BLE activity. 


     Could you give me some more info about the softdevice ? Which version are you using ? 

  • I'm incrementing a counter every time I turn through my main loop, and I'm incrementing a separate counter every time I call sd_app_evt_wait(), and I'm incrementing a counter per-IRQn (all of them, via an assembly VTOR shim that intercepts all interrupts), so I can see all of the softdevice interrupts as well.

    Note that the number of SVCALL interrupts is almost identical to the number of calls to sd_app_evt_wait(), which is also almost identical to the number of main loop turns.

    This data tells me that I am regularly calling my main loop, and that I am also regularly calling sd_app_evt_wait(), and I'm not blasting SoftDevice with other calls.

    Please look to my first and second posts in this thread to observe the data that I've captured; perhaps something will jump out at you? 

    If I had some other high-frequency interrupt occurring, I'd expect to see it in my table above.

    I'm using nRFSDK 17.0.2, SD140 7.0.1, nRF52840.

    Also; I'm explicitly _clearing_ SEVONPEND, is it possible that disabled-but-pending interrupts are somehow contributing to wake?

  • (btw the IRQn shim thing isn't causing the wakeups; I wrote the IRQn shim thing in an attempt to gather data to diagnose and fix the issue :) )

  • Hi Charles, 

    Thanks for the explanation about the IRQ histogram I wasn't sure what it was earlier.

    So you have a counter on each of the IRQ handler and in the main loop.

    Could you explain why there is a different of the counter between MainloopTurns and mainloopSleepEntercalls ? 

    charles_fi said:
    Note that the number of SVCALL interrupts is almost identical to the number of calls to sd_app_evt_wait(), which is also almost identical to the number of main loop turns.

    I think the above behavior is true for any application, that the number of SVCall is almost identical to sd_app_evt_wait() and to the number of the main loop turns. It's how it should be, isn't it ? 

    The question here is only on why it waking up so often. As I can see from the table it's about 2230 times per second. 

    It doesn't ring any bell for us. It's too fast to be any BLE activity. And without more detail or profiling it's not possible to tell what could be the culprit here. 

    I assume you only managed to record the histogram once ? If you can reproduce the issue again you either using the power profiler or can think of toggling a GPIO in the main loop so that it's easier to visualize the wake up behavior. 

    Do you have any other activity for example UART logging , RTT, ADC ? anything else that can involve ? Can you share the source code ? You can convert the case to a private case to share the code if needed. 

Reply
  • Hi Charles, 

    Thanks for the explanation about the IRQ histogram I wasn't sure what it was earlier.

    So you have a counter on each of the IRQ handler and in the main loop.

    Could you explain why there is a different of the counter between MainloopTurns and mainloopSleepEntercalls ? 

    charles_fi said:
    Note that the number of SVCALL interrupts is almost identical to the number of calls to sd_app_evt_wait(), which is also almost identical to the number of main loop turns.

    I think the above behavior is true for any application, that the number of SVCall is almost identical to sd_app_evt_wait() and to the number of the main loop turns. It's how it should be, isn't it ? 

    The question here is only on why it waking up so often. As I can see from the table it's about 2230 times per second. 

    It doesn't ring any bell for us. It's too fast to be any BLE activity. And without more detail or profiling it's not possible to tell what could be the culprit here. 

    I assume you only managed to record the histogram once ? If you can reproduce the issue again you either using the power profiler or can think of toggling a GPIO in the main loop so that it's easier to visualize the wake up behavior. 

    Do you have any other activity for example UART logging , RTT, ADC ? anything else that can involve ? Can you share the source code ? You can convert the case to a private case to share the code if needed. 

Children
  • The difference between "mainLoopTurnCount" and "mainLoopSleepEnterCalls" is due to us realizing that because of the order of operations in our main loop, we can't sleep yet, roughly like this:

    void main_loop(void) {
      increment_counter(mainLoopTurns);
      
      poll_a();
      poll_b(); // can provide immediate work for poll_a() to do!
      
      if (idle) { // no immediate work to do
        increment_counter(mainLoopSleepEnterCalls);
        sd_app_evt_wait();
      }
    }

    We have RTT logging going, but of course there's no debugger attached in the field Slight smile

    I collect this histogram from all of our employee modules, and I always see this pattern when the battery collapse happens. Sometimes it heals itself and stops, other times the battery is drained to exhaustion.

    I agree that waking up at 2khz is really wild! I'm also extremely confused that I'm not seeing any of the histogram counts growing 1:1 with the main loop / sleep counts. I also agree that the SVCALL IRQn histogram growing almost 1:1 makes perfect sense- every call to sd_app_evt_wait() does an SVC, and then we have a few more for normal SD API calls that do the same.

    I have never been able to reproduce this at my bench; it only happens in the field, and it always starts when the device receives a connection from a BLE central (phone or dedicated base).

    I'm running out of data to gather, and hypotheses to test, which is starting to make me nervous. If it's not an IRQ, what else could be waking the system?

    Perhaps another thread to explore: I'm explicitly disabling SEVONPEND. Does anything in SoftDevice turn it back on? Could it be related to that? If the ISR associated with an IRQ never actually runs, my counting shim won't ever see it.

  • Hi Charles, 


    There is a mentioning about SEVONPEND in the softdevice specification at section 7.6: 

    I am not aware that softdevice explicitly set  SEVONPEND  => After check the source I saw that the softdevice may set/clear it , but it will return to the value has been set by the application after it's done. 

    I assume you cleared the bit on low interrupt level or in thread level ? 

    If you think sd_app_evt_wait() could be the culprit you can try using:

    __WFE();

    __SEV();

    __WFE();

    ADDED: Please make sure you use this in a spinlock loop as Vidar has suggested. 

    It would have the same effect (except that the function returna when there is any softdevice activity)

    Unlike what you think of sd_app_evt_wait(), the spin for sleeping inside this function is not executed in the SVC context but in main context. 

    Have you made sure the poll_a();poll_b(); doesn't trigger any interrupt in a corner case or something ? 

    Just to be extra sure, you don't use any UART logging ? 

  • We have 2x UARTs in our system.

    1 is for logging data from our external WiFi chip, but we disable the UART when WiFi is inactive (and verify this by dumping the GPIO pin states + directions to the same server that creates the IRQ histograms). I'd love to hear your thoughts, though- are you thinking that the UART interrupts might be keeping the system awake or something?

    Another UART is for factory-mode tests between nRF52840 and nRF9160 (LTE). In the field we use the TX/RX for SPI instead, and never initialize this UART peripheral.

    It's absolutely possible that something in my poll functions triggers an interrupt, especially since starting and stopping timers triggers an interrupt in nRFSDK. But- If anything in my main loop triggered an interrupt , I would see it on my IRQn histogram, right? Is there a scenario where I wouldn't?

    I clear SEVONPEND once, in thread level, from my main function, early after boot.

    What happens if I don't have WFE/SEV/WFE in a loop? Worst case, I run my main loop again, right?

Related