Why is the SoftDevice automatically NAKing packets?

I'm developing a BLE application on the nRF52832 that uses NRF_SDH_DISPATCH_MODEL_APPSH to receive events from the SoftDevice. My application involves using some software-implemented elliptic-curve cryptographic computations, which can take up to a second and a half to run. I found that if I send enough Characteristic Writes while the nRF52832 is doing the ECC computations, I can overflow the App Scheduler (i.e. app_sched_event_put() in SD_EVT_IRQHandler() returns NRF_ERROR_NO_MEM).

Once I discovered this, I tried finding the upper limits of how many packets my device might have to receive over the air during the ECC computations. What I've found is that if I increase the size of the App Scheduler to about 60, then no matter how long the ECC computations take, I stop seeing NRF_ERROR_NO_MEM and the SoftDevice does not enqueue more than about 50 packets. On a sniffer, I see that the SoftDevice starts NAKing a packet at the Link Layer for an extended period (definitely not due to interference) after it correctly receives a stream of about 50. The ECC computations are running in APP_IRQ_PRIORITY_THREAD, so they should not be able to directly cause the SoftDevice to drop events.

What is the SoftDevice doing here? Is there a recommendation on how to determine the appropriate size of my App Scheduler based on this "automatic NAK" behavior?

I am using s132 v5.1.0.

  • Once I discovered this, I tried finding the upper limits of how many packets my device might have to receive over the air during the ECC computations. What I've found is that if I increase the size of the App Scheduler to about 60, then no matter how long the ECC computations take, the SoftDevice will not enqueue more than about 50 packets. On a sniffer, I see that the SoftDevice starts NAKing a packet at the Link Layer for an extended period (definitely not due to interference) after it correctly receives a stream of about 50.

    What you are seeing might not be due to the app_scheduler alone. Each connection in your application has around 6-8 buffers for incoming events from the softdevice. It is entirely up to the application when it pulls these events from the softdevice. The softdevice notifies the application using SWI2_IRQn at interrupt priority 6. If your application does the cryptographic computations at a higher interrupt priority, it will easily mask this event IRQ for long periods of time. It is also possible that more softdevice events are generated and not pulled because the application is busy in other, higher-priority contexts. Once these event buffers are full, the softdevice will not accept any more packets from the peer and starts to NACK them.

    You have two choices.

    1. Do your ECC computations at a lower priority than the SWI2_IRQn priority (that is, lower than priority 6).
    2. Reduce BLE activity while doing the ECC computations so that fewer events are generated inside the softdevice.
  • I've edited my original question to clarify.

    1. My ECC computations are already running in APP_IRQ_PRIORITY_THREAD.

    2. This is irrelevant. I'm not looking to solve a particular issue, I'm looking to learn more about the solution I've already found (increasing the app scheduler to 60).

  • The softdevice is not doing anything wrong here. It is just waiting for the application to pull the events it has already been notified about. You could create a TIMER interrupt at a higher priority and print the result of sd_ble_evt_get on RTT every 50 ms or 100 ms, to see the periods where your application is not pulling events, which lets the event buffer fill up and forces the softdevice to NACK incoming packets. Why this happens depends on your application contexts. If you give me information on all the contexts in your application and the packet processing times your application takes, then I might be able to answer why you see this behavior when you change the app_scheduler buffer size to 60.
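
    A lighter-weight way to find those periods is to record the longest gap between successive event polls. The following is only a sketch: poll_gap_t and poll_gap_note() are hypothetical names, not SDK APIs, and the millisecond timestamp is assumed to come from whatever tick source the application already has.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tracker for the longest interval between two polls. */
typedef struct {
    uint32_t last_poll_ms; /* timestamp of the previous poll */
    uint32_t max_gap_ms;   /* longest gap observed so far */
    bool     seen_first;   /* false until the first poll is recorded */
} poll_gap_t;

/* Record that the application polled for events at time now_ms. */
static void poll_gap_note(poll_gap_t *g, uint32_t now_ms)
{
    assert(g != NULL);
    if (g->seen_first) {
        /* Unsigned subtraction stays correct if the tick counter wraps. */
        uint32_t gap = now_ms - g->last_poll_ms;
        if (gap > g->max_gap_ms) {
            g->max_gap_ms = gap;
        }
    }
    g->last_poll_ms = now_ms;
    g->seen_first = true;
}
```

    Calling poll_gap_note() from wherever the application drains SoftDevice events, and logging max_gap_ms afterwards, should show whether the ECC computation lines up with the NACK period.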

  • It is just waiting for the application to pull the events that are already notified.

    Oh! I was about to ask how this works, but I just realized that it's because the application hasn't called nrf_sdh_evts_poll()! The difference between NRF_ERROR_NO_MEM and LL NAKs is that it depends on whether the App Scheduler or the SoftDevice's buffers can hold more events!

    So, I want to keep leveraging the LL NAK behavior. This means that I just have to make sure that my App Scheduler is larger than the maximum number of events that the SoftDevice can hold. How do I determine this number for the worst-case (i.e. events from a malicious Central)?
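
    The reasoning above can be sketched as a toy model of two queues in series. Everything here is an assumption made for illustration: the names are hypothetical, the capacities are arbitrary parameters, and the model assumes one scheduler entry per stored event with nothing draining while the main loop is blocked. It is not a description of the SoftDevice's real internals.

```c
#include <assert.h>

/* Outcome of delivering one packet to the toy model. */
typedef enum { PKT_QUEUED, PKT_NAKED, PKT_SCHED_OVERFLOW } pkt_result_t;

/* Hypothetical model state; none of these names come from the SDK. */
typedef struct {
    unsigned sd_used, sd_cap;       /* events held inside the SoftDevice */
    unsigned sched_used, sched_cap; /* entries in the App Scheduler queue */
} model_t;

/* Deliver one packet while the main loop is blocked (nothing drains). */
static pkt_result_t model_rx(model_t *m)
{
    assert(m->sd_used <= m->sd_cap && m->sched_used <= m->sched_cap);
    if (m->sd_used == m->sd_cap) {
        return PKT_NAKED;          /* no event buffer left: packet refused */
    }
    m->sd_used++;                  /* event stored, interrupt raised */
    if (m->sched_used == m->sched_cap) {
        return PKT_SCHED_OVERFLOW; /* stands in for NRF_ERROR_NO_MEM */
    }
    m->sched_used++;               /* one scheduler entry per notification */
    return PKT_QUEUED;
}

/* Feed packets until the first failure; *accepted counts clean packets. */
static pkt_result_t model_first_failure(unsigned sd_cap, unsigned sched_cap,
                                        unsigned *accepted)
{
    model_t m = { 0, sd_cap, 0, sched_cap };
    pkt_result_t r;
    *accepted = 0;
    while ((r = model_rx(&m)) == PKT_QUEUED) {
        (*accepted)++;
    }
    return r;
}
```

    In this model, model_first_failure(6, 60, &n) ends in PKT_NAKED after 6 accepted packets, while model_first_failure(6, 4, &n) ends in PKT_SCHED_OVERFLOW after 4: whichever queue is smaller decides which failure the application sees.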

  • Elias said:
    Oh! I was about to ask how this works, but I just realized that it's because the application hasn't called nrf_sdh_evts_poll()! The difference between NRF_ERROR_NO_MEM and LL NAKs is that it depends on whether the App Scheduler or the SoftDevice's buffers can hold more events!

    Yes, that is correct.

    Elias said:
    So, I want to keep leveraging the LL NAK behavior. This means that I just have to make sure that my App Scheduler is larger than the maximum number of events that the SoftDevice can hold

    Correct.

    Elias said:
    How do I determine this number for the worst-case (i.e. events from a malicious Central)?

    You only remain connected to a trusted peer. After the connection is made, you establish that the peer is in fact the one you want to be connected to, and the rest of the communication then works on a trust basis: the peer is expected to behave in a predictable way. If you are trying to make an application that connects to any central that sends a connect request, then you can probably keep a count of how many events per given time are being received from the peer, and disconnect if that rate exceeds what your application expects.
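
    The events-per-time check suggested above could look roughly like this fixed-window counter. It is a minimal sketch: evt_rate_t, evt_rate_init() and evt_rate_exceeded() are hypothetical helpers rather than SDK functions, and the window length and event budget are values the application would have to choose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-connection rate tracker; not an SDK type. */
typedef struct {
    uint32_t window_start_ms; /* start of the current counting window */
    uint32_t window_len_ms;   /* length of one window */
    uint32_t max_events;      /* events tolerated per window */
    uint32_t count;           /* events seen in the current window */
} evt_rate_t;

static void evt_rate_init(evt_rate_t *r, uint32_t now_ms,
                          uint32_t window_len_ms, uint32_t max_events)
{
    assert(window_len_ms > 0);
    r->window_start_ms = now_ms;
    r->window_len_ms   = window_len_ms;
    r->max_events      = max_events;
    r->count           = 0;
}

/* Record one event at now_ms; returns true once the budget is exceeded. */
static bool evt_rate_exceeded(evt_rate_t *r, uint32_t now_ms)
{
    /* Unsigned subtraction stays correct if the tick counter wraps. */
    if ((uint32_t)(now_ms - r->window_start_ms) >= r->window_len_ms) {
        r->window_start_ms = now_ms; /* start a fresh window */
        r->count = 0;
    }
    r->count++;
    return r->count > r->max_events;
}
```

    The BLE event handler would call evt_rate_exceeded() for each event from a given connection and, on a true result, could drop the link, for example with sd_ble_gap_disconnect().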

  • I meant: "How do I determine the maximum number of events that the SoftDevice can hold?". "Worst-case" meaning with events that can happen arbitrarily fast and have the smallest corresponding data structure (e.g. zero-length writes).

  • The maximum is 6 events per connection, as this is the buffer size set inside the softdevice. The softdevice will therefore start NACKing packets on a connection as soon as 6 events are waiting to be pulled by the application.

  • Then why can I induce NRF_ERROR_NO_MEM during ECC computations when my App Scheduler is size 50?

    During ECC computations, the loop which calls nrf_sdh_evts_poll() is blocked. So, the SoftDevice can continue receiving events, but the application will not process them. If the SoftDevice has 6 events in its own buffer, wouldn't that mean that it called app_sched_event_put() only 6 times before the next time nrf_sdh_evts_poll() is called?

    Now, my application does support 4 concurrent connections, but that only accounts for (6*4=)24 events.  All my testing for this issue has been done with only one connection. There are no events in the application that happen comparably frequently to SoftDevice events, so the App Scheduler isn't filling for some other reason.

  • Elias said:
    During ECC computations, the loop which calls nrf_sdh_evts_poll() is blocked. So, the SoftDevice can continue receiving events, but the application will not process them. If the SoftDevice has 6 events in its own buffer, wouldn't that mean that it called app_sched_event_put() only 6 times before the next time nrf_sdh_evts_poll() is called?

    The events buffer is inside the softdevice memory region, not the application memory region. The size of the app_scheduler event buffer does not correlate directly with the size of the softdevice events buffer. If the call to nrf_sdh_evts_poll is blocked during the ECC computations, then the application is not pulling events out of the softdevice events buffer, which means that buffer fills up quite fast.

    The size of the app_scheduler buffer corresponds to how many times you can call app_sched_event_put without calling app_sched_execute. The scheduled events can carry any data (in our case, softdevice events).

    In a simple design, the longer the thread that calls nrf_sdh_evts_poll is blocked in the midst of BLE activity that generates softdevice events, the more likely it is that the softdevice's internal event buffer fills up (this buffer is not controlled by the app and is not the same as the app_scheduler event buffer).

  • Yes, I understand all that. I think there might be some confusion over what I'm asking. I've figured out the "What is the SoftDevice doing here?" in concept, and now I'm trying to figure out the second part: "Is there a recommendation on how to determine the appropriate size of my App Scheduler based on this "automatic NAK" behavior?" I've opened a new thread.
