nRF52840 app not sleeping, SoftDevice seems involved

Hello-

I'm seeing the nRF52840 stay awake when I don't want it to. I'm calling sd_app_evt_wait() every turn through the main loop with SEVONPEND disabled, but it's waking up and falling through.

I'm capturing a histogram of the entire IRQ set, now including the 16 core exceptions, and I found a "smoking gun"- There's almost a 1:1 correspondence between my main loop turns and the core SVCALL_IRQn exception interrupt:

Reading the nRFSDK / SoftDevice headers, it looks like there's a syscall-style interface- when calling from the app into SoftDevice, it issues the SVC with the call id encoded in the instruction.

My hunch about what's happening here is that I've managed to get my app into a state where it's calling into SoftDevice (maybe to test something?) once per main loop turn. That issues an SVC, which sets the Event register. Then, when I call into sd_app_evt_wait() at the end of my loop, it clears the Event register and simply continues. The cycle repeats again, and my app never sleeps.

So, here are some questions I was hoping an expert could give me some advice + insight into please!

1. Is my hypothesis reasonable? Could this be happening? Does SVC set the Event register?

2. Is there anything in the nRFSDK that lets me audit SoftDevice supervisor calls in an automated way? (I think I can build this into my IRQ capture-and-forward shim)

3. Do you have any other ideas or suggestions?

Sorry to file a relatively open-ended question, but time is critical for us so I'd love any thoughts you all might have!

Best,

Charles

Parents
  • Ah, sorry, I'm going to give readers some whiplash here-

    I think my hypothesis was incorrect! If I'm seeing an almost exactly 1:1 relationship between SVCALL_IRQn calls and main loop turns and sleep calls, that means that things are working ok because sd_app_evt_wait() is run via a SVCALL.

    So I want to explore a different track, perhaps it's a GPIO issue. I see in section 6.9 of the nRF52840 Product Specification:

    I see that "Wake-up from high or low level triggers on all pins" is a separate line item from "Trigger interrupt on state changes on any pin".

    Is it possible to configure the GPIO and/or GPIOTE subsystems so that the level of a pin will cause a wake-up, and/or keep the system from sleeping? We are using PPI to drive a wifi/ble coexistence pin and it's tied to PA/LNA, does the PPI system have level-based wakes anywhere?

    Note that my IRQ histogram shows that I am not getting GPIOTE_IRQn 100,000+ times per minute, so it's not a strobing pin:

    I'll keep studying the data sheet, but if anyone has any information or quick answers about pin levels being able to keep the nRF52840 from entering WFE sleep, that would help me greatly Slight smile

    Thanks,

    Charles

  • Hi Charles, 

    The main purpose of PPI is to not get the CPU involved in the automation of event and task. So PPI wouldn't wake the CPU up. But the event/interrupt can.
    I think what important here is that if you have an interrupt that trigger and even tyou clear the event after you handle the interrupt. If not the CPU won't be able to enter sleep mode if the event has not been cleared. 

    Have you tried to disable the GPIO interrupt to check if it has something to do with waking up the CPU ? 


    The softdevice only trigger an event when there is BLE activity that require the application to handle. So It's not normal that the application get waking up on every sd_app_evt_wait(). Note that sd_app_evt_wait() will not exist when the softdevice is active to handle BLE event. It only exist when there is application interrupt or when there is BLE event.

    I would also suggest to use the Power Profiler Kit to monitor the activity of the CPU so that we have better view of when the CPU waking up or if the CPU enters sleep mode at all. 

  • Thanks for the response, I appreciate it. Unfortunately this only happens sometimes, and I can't get it to happen at my bench, so I can't use local setups like the Power Profiler Kit.

    I do have real-time battery statistics I can watch that we collect for this purpose, though, and this issue is happening very rarely.

    Since writing this post, I reverse-engineered the SoftDevice binary, and disassembled the functions on the path from the SVCall ISR into the implementation of sd_app_evt_wait(), I see that it loops back into the wfe call after every softdevice interrupt and only returns when a non-softdevice interrupt happens. So, I don't think it's SoftDevice's fault here!

    I am curious what you mean by 

    "I think what important here is that if you have an interrupt that trigger an event you clear the event after you handle the interrupt. If not the CPU won't be able to enter sleep mode if the event has not been cleared."

    Do you meant the PPI event here? I am using the PPI task/event system to tie the PA + LNA pins to a WiFi coex pin on a separate chip on our board. Could I ask you for more information about "clear the event" please?

    Thank you again,

    Charles

  • Sorry- to be more complete here, I'm doing this:

    void ble_pa_lna_enable(void) {
      if (s_util.pa_lna_enabled) {
        return;
      }
    
      APP_ERROR_CHECK_BOOL(s_util.radio_gpios.notification_gpio != NULL);
      APP_ERROR_CHECK_BOOL(s_util.radio_gpios.lna_gpio != NULL);
      APP_ERROR_CHECK_BOOL(s_util.radio_gpios.pa_gpio != NULL);
    
      s_util.pa_lna_enabled = true;
      nrf_ppi_channel_t ch_set, ch_clr;
      APP_ERROR_CHECK(nrfx_ppi_channel_alloc(&ch_set));
      APP_ERROR_CHECK(nrfx_ppi_channel_alloc(&ch_clr));
      APP_ERROR_CHECK(sd_ble_opt_set(BLE_COMMON_OPT_PA_LNA,
                                     &(ble_opt_t) {
                                       .common_opt = {
                                         .pa_lna = {
                                           .pa_cfg = {
                                             .enable = 1,
                                             .active_high = 1,
                                             .gpio_pin = s_util.radio_gpios.pa_gpio->pin,
                                           },
                                           .lna_cfg = {
                                             .enable = 1,
                                             .active_high = 1,
                                             .gpio_pin = s_util.radio_gpios.lna_gpio->pin,
                                           },
                                           .ppi_ch_id_set = ch_set,
                                           .ppi_ch_id_clr = ch_clr,
                                           .gpiote_ch_id = PA_LNA_GPIOTE_CHANNEL_ID,
                                         },
                                       },
                                     }));
    
      nrf_gpiote_task_configure(RADIO_NOTIFICATION_GPIOTE_CHANNEL_ID,
                                s_util.radio_gpios.notification_gpio->pin,
                                (nrf_gpiote_polarity_t)GPIOTE_CONFIG_POLARITY_None,
                                NRF_GPIOTE_INITIAL_VALUE_LOW);
    
      APP_ERROR_CHECK(nrfx_ppi_channel_fork_assign(
        ch_set,
        nrf_gpiote_task_addr_get(
          nrf_gpiote_set_task_get(RADIO_NOTIFICATION_GPIOTE_CHANNEL_ID))));
    
      APP_ERROR_CHECK(nrfx_ppi_channel_fork_assign(
        ch_clr,
        nrf_gpiote_task_addr_get(
          nrf_gpiote_clr_task_get(RADIO_NOTIFICATION_GPIOTE_CHANNEL_ID))));
    }
    
    void ble_radio_notification_enable(void) {
      if (s_util.enabled) {
        return;
      }
      s_util.enabled = true;
      APP_ERROR_CHECK_BOOL(s_util.pa_lna_enabled);
      nrf_gpiote_task_enable(RADIO_NOTIFICATION_GPIOTE_CHANNEL_ID);
    }
    
    void ble_radio_notification_disable(void) {
      if (!s_util.enabled) {
        return;
      }
      s_util.enabled = false;
      APP_ERROR_CHECK_BOOL(s_util.pa_lna_enabled);
      nrf_gpiote_task_disable(RADIO_NOTIFICATION_GPIOTE_CHANNEL_ID);
    }
    

    This code is doing two things-

    1. It's loading the desired PA/LNA pin behavior into SoftDevice using channel 0 (PA_LNA_GPIOTE_CHANNEL_ID) 

    2. It's forking the set and clear events into a new task using channel 1 (RADIO_NOTIFICATION_GPIOTE_CHANNEL_ID)

    This way the PA set event sets my WiFi coex pin, and the LNA clear event clears my WiFi coex pin.

    I don't have the WiFi coex pin set up to trigger an interrupt, because my nRF52840 firmware doesn't need to react to it; I just need the PA + LNA tasks to also set and clear this WiFi coex pin.

    Note that my "disable" call just calls nrf_gpiote_task_disable here- I'm not ever explicitly clearing the PA/LNA events. Is it possible that a poorly-timed call to my ble_radio_notification_disable function is leaving the event in a state where it's preventing the nRF52840 from sleeping in sd_app_evt_wait()?

    In general, is this approach reasonable and sound?

    Thanks again for reading!

  • Hi Charles, 

    What I meant by "clear the event" is to set an event register to 0 when you handle the event. For example like this: 


    But I was wrong when thinking that not clearing it may keep the CPU to not entering sleep mode. The interrupt handler should be able to clear the interrupt source. 
    I don't see any problem in the code you provided. I couldn't think of anything else but an application interrupt that keep waking the CPU up. 

    If you can reproduce the issue and can capture a power profile it should be possible to see if the CPU could enter sleep mode at all. 
    And if possible to reproduce, I would suggest to test on one of our example (with additional code to control PA LNA and coex) to see if you can see the same problem. 

  • Thanks for the response. Because this is is very hard to reproduce, I'm looking for ways to gather more clues from units in the field.

    Another question- if I have a BLE central connect and hold a very fast connection interval, 7.5ms, could that cause behavior like this? sd_app_evt_wait() seems like it doesn't return to my code from radio/timer0/rtc0 interrupts, but I'm curious if it's possible.

    I'm seeing ~100k+ main loop runs per minute, is the only option left that I actually have an interrupt firing that many times (kilohertz range)? I'd expect to see it on my IRQ histogram (see initial post) but I don't.

Reply
  • Thanks for the response. Because this is is very hard to reproduce, I'm looking for ways to gather more clues from units in the field.

    Another question- if I have a BLE central connect and hold a very fast connection interval, 7.5ms, could that cause behavior like this? sd_app_evt_wait() seems like it doesn't return to my code from radio/timer0/rtc0 interrupts, but I'm curious if it's possible.

    I'm seeing ~100k+ main loop runs per minute, is the only option left that I actually have an interrupt firing that many times (kilohertz range)? I'd expect to see it on my IRQ histogram (see initial post) but I don't.

Children
  • Hi Charles, 
    Is there any chance that it's locked up somewhere and couldn't enter sd_app_evt_wait() ? 

    Short interval of 7.5ms shouldn't be a problem. The BLE activity only takes from 2ms for an connection event. And if there is no BLE activity that requires the app involvement sd_app_evt_wait() will not return because of the BLE activity. 


     Could you give me some more info about the softdevice ? Which version are you using ? 

  • I'm incrementing a counter every time I turn through my main loop, and I'm incrementing a separate counter every time I call sd_app_evt_wait(), and I'm incrementing a counter per-IRQn (all of them, via an assembly VTOR shim that intercepts all interrupts), so I can see all of the softdevice interrupts as well.

    Note that the number of SVCALL interrupts is almost identical to the number of calls to sd_app_evt_wait(), which is also almost identical to the number of main loop turns.

    This data tells me that I am regularly calling my main loop, and that I am also regularly calling sd_app_evt_wait(), and I'm not blasting SoftDevice with other calls.

    Please look to my first and second posts in this thread to observe the data that I've captured; perhaps something will jump out at you? 

    If I had some other high-frequency interrupt occurring, I'd expect to see it in my table above.

    I'm using nRFSDK 17.0.2, SD140 7.0.1, nRF52840.

    Also; I'm explicitly _clearing_ SEVONPEND, is it possible that disabled-but-pending interrupts are somehow contributing to wake?

  • (btw the IRQn shim thing isn't causing the wakeups; I wrote the IRQn shim thing in an attempt to gather data to diagnose and fix the issue :) )

  • Hi Charles, 

    Thanks for the explanation about the IRQ histogram I wasn't sure what it was earlier.

    So you have a counter on each of the IRQ handler and in the main loop.

    Could you explain why there is a different of the counter between MainloopTurns and mainloopSleepEntercalls ? 

    charles_fi said:
    Note that the number of SVCALL interrupts is almost identical to the number of calls to sd_app_evt_wait(), which is also almost identical to the number of main loop turns.

    I think the above behavior is true for any application, that the number of SVCall is almost identical to sd_app_evt_wait() and to the number of the main loop turns. It's how it should be, isn't it ? 

    The question here is only on why it waking up so often. As I can see from the table it's about 2230 times per second. 

    It doesn't ring any bell for us. It's too fast to be any BLE activity. And without more detail or profiling it's not possible to tell what could be the culprit here. 

    I assume you only managed to record the histogram once ? If you can reproduce the issue again you either using the power profiler or can think of toggling a GPIO in the main loop so that it's easier to visualize the wake up behavior. 

    Do you have any other activity for example UART logging , RTT, ADC ? anything else that can involve ? Can you share the source code ? You can convert the case to a private case to share the code if needed. 

  • The difference between "mainLoopTurnCount" and "mainLoopSleepEnterCalls" is due to us realizing that because of the order of operations in our main loop, we can't sleep yet, roughly like this:

    void main_loop(void) {
      increment_counter(mainLoopTurns);
      
      poll_a();
      poll_b(); // can provide immediate work for poll_a() to do!
      
      if (idle) { // no immediate work to do
        increment_counter(mainLoopSleepEnterCalls);
        sd_app_evt_wait();
      }
    }

    We have RTT logging going, but of course there's no debugger attached in the field Slight smile

    I collect this histogram from all of our employee modules, and I always see this pattern when the battery collapse happens. Sometimes it heals itself and stops, other times the battery is drained to exhaustion.

    I agree that waking up at 2khz is really wild! I'm also extremely confused that I'm not seeing any of the histogram counts growing 1:1 with the main loop / sleep counts. I also agree that the SVCALL IRQn histogram growing almost 1:1 makes perfect sense- every call to sd_app_evt_wait() does an SVC, and then we have a few more for normal SD API calls that do the same.

    I have never been able to reproduce this at my bench; it only happens in the field, and it always starts when the device receives a connection from a BLE central (phone or dedicated base).

    I'm running out of data to gather, and hypotheses to test, which is starting to make me nervous. If it's not an IRQ, what else could be waking the system?

    Perhaps another thread to explore: I'm explicitly disabling SEVONPEND. Does anything in SoftDevice turn it back on? Could it be related to that? If the ISR associated with an IRQ never actually runs, my counting shim won't ever see it.

Related