Why is the SoftDevice automatically NAKing packets?

I'm developing a BLE application on the nRF52832 that uses NRF_SDH_DISPATCH_MODEL_APPSH to receive events from the SoftDevice. My application involves using some software-implemented elliptic-curve cryptographic computations, which can take up to a second and a half to run. I found that if I send enough Characteristic Writes while the nRF52832 is doing the ECC computations, I can overflow the App Scheduler (i.e. app_sched_event_put() in SD_EVT_IRQHandler() returns NRF_ERROR_NO_MEM).

Once I discovered this, I tried finding the upper limits of how many packets my device might have to receive over the air during the ECC computations. What I've found is that if I increase the size of the App Scheduler to about 60, then no matter how long the ECC computations take, I stop seeing NRF_ERROR_NO_MEM and the SoftDevice does not enqueue more than about 50 packets. On a sniffer, I see that the SoftDevice starts NAKing a packet at the Link Layer for an extended period (definitely not due to interference) after it correctly receives a stream of about 50. The ECC computations are running in APP_IRQ_PRIORITY_THREAD, so they should not be able to directly cause the SoftDevice to drop events.

What is the SoftDevice doing here? Is there a recommendation on how to determine the appropriate size of my App Scheduler based on this "automatic NAK" behavior?

I am using s132 v5.1.0.

  • Once I discovered this, I tried finding the upper limits of how many packets my device might have to receive over the air during the ECC computations. What I've found is that if I increase the size of the App Scheduler to about 60, then no matter how long the ECC computations take, the SoftDevice will not enqueue more than about 50 packets. On a sniffer, I see that the SoftDevice starts NAKing a packet at the Link Layer for an extended period (definitely not due to interference) after it correctly receives a stream of about 50.

    What you are seeing might not be due to the App Scheduler alone. Each connection in your application has around 6-8 buffers for incoming events from the SoftDevice. It is entirely up to the application when it pulls these events from the SoftDevice. The SoftDevice notifies the application using SWI2_IRQn at interrupt priority 6. If your application does the cryptographic computations at a higher interrupt priority, it will easily mask this event IRQ for long periods. It is also possible that more SoftDevice events are generated but not pulled because the application is busy in other, higher-priority contexts. Once these event buffers are full, the SoftDevice stops accepting packets from the peer and starts to NAK them.

    You have two choices.

    1. Do your ECC computations at a lower priority than SWI2_IRQn (that is, lower than priority 6).
    2. Reduce BLE activity while doing the ECC computations so that fewer events are generated inside the SoftDevice.
  • I've edited my original question to clarify.

    1. My ECC computations are already running in APP_IRQ_PRIORITY_THREAD.

    2. This is irrelevant. I'm not looking to solve a particular issue, I'm looking to learn more about the solution I've already found (increasing the app scheduler to 60).

  • The SoftDevice is not doing anything wrong here. It is just waiting for the application to pull the events it has already been notified about. You could create a TIMER interrupt at a higher priority and print sd_ble_evt_get on RTT every 50 ms or 100 ms or so, to see the periods where your application is not pulling events and lets the event buffer fill up, forcing the SoftDevice to NAK incoming packets. Why this happens depends on your application's contexts. If you give me information on all the contexts in your application and the packet processing times your application needs, I might be able to explain why you see this behavior when you change the App Scheduler buffer size to 60.

  • It is just waiting for the application to pull the events that are already notified.

    Oh! I was about to ask how this works, but I just realized that it's because the application hasn't called nrf_sdh_evts_poll()! The difference between NRF_ERROR_NO_MEM and LL NAKs is that it depends on whether the App Scheduler or the SoftDevice's buffers can hold more events!

    So, I want to keep leveraging the LL NAK behavior. This means that I just have to make sure that my App Scheduler is larger than the maximum number of events that the SoftDevice can hold. How do I determine this number for the worst-case (i.e. events from a malicious Central)?

  • Elias said:
    Oh! I was about to ask how this works, but I just realized that it's because the application hasn't called nrf_sdh_evts_poll()! The difference between NRF_ERROR_NO_MEM and LL NAKs is that it depends on whether the App Scheduler or the SoftDevice's buffers can hold more events!

    Yes, that is correct.

    Elias said:
    So, I want to keep leveraging the LL NAK behavior. This means that I just have to make sure that my App Scheduler is larger than the maximum number of events that the SoftDevice can hold

    Correct.
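
The two failure modes can be illustrated with a toy model. This is only a sketch under stated assumptions: it assumes each event buffered by the SoftDevice causes exactly one App Scheduler entry, that nothing is drained while the main context is busy, and the capacities used are made-up numbers, not documented SoftDevice limits. The names `model_t`, `packet_arrives` and `run_burst` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the two queues; capacities are assumptions, not real limits. */
typedef struct {
    unsigned sd_capacity;    /* events the SoftDevice can hold (assumed) */
    unsigned sched_capacity; /* App Scheduler queue size */
    unsigned sd_pending;     /* events waiting inside the SoftDevice */
    unsigned sched_pending;  /* entries waiting in the App Scheduler */
    unsigned naks;           /* packets refused at the Link Layer */
    unsigned no_mem;         /* scheduler puts that failed (NO_MEM case) */
} model_t;

/* One packet arrives while the main context is busy and drains nothing. */
void packet_arrives(model_t *m)
{
    assert(m != NULL);
    if (m->sd_pending == m->sd_capacity) {
        m->naks++;           /* SoftDevice buffers full: peer gets a NAK */
        return;
    }
    m->sd_pending++;
    /* The event interrupt tries to queue one scheduler entry. */
    if (m->sched_pending == m->sched_capacity) {
        m->no_mem++;         /* scheduler overflow */
    } else {
        m->sched_pending++;
    }
}

/* Feed a burst of packets into a fresh model and return the final state. */
model_t run_burst(unsigned sd_cap, unsigned sched_cap, unsigned packets)
{
    model_t m = { sd_cap, sched_cap, 0, 0, 0, 0 };
    for (unsigned i = 0; i < packets; i++) {
        packet_arrives(&m);
    }
    return m;
}
```

In this model, `run_burst(50, 60, 200)` ends with no scheduler overflows because the SoftDevice side fills first and NAKs the rest, whereas `run_burst(50, 20, 200)` overflows the scheduler before the SoftDevice buffers are full.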

    Elias said:
    How do I determine this number for the worst-case (i.e. events from a malicious Central)?

    You should only remain connected to a trusted peer. After connecting, you establish that the peer is in fact the one you want to be connected to, and the rest of the communication then works on the trust basis that the peer behaves predictably. If you are building an application that connects to any central that sends a connection request, you can keep a count of how many events per unit of time are received from the peer and disconnect if that rate exceeds what your application expects.
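
The events-per-time check described above could be sketched as a fixed-window counter. This is a minimal illustration, not SDK code: `evt_rate_limiter_t` and its functions are hypothetical names, the tick source is assumed to be a free-running 32-bit counter, and the window length and budget are values the application would have to choose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-window event counter for one connection. */
typedef struct {
    uint32_t window_ticks; /* length of one counting window */
    uint32_t max_events;   /* events allowed per window */
    uint32_t window_start; /* tick value when the current window began */
    uint32_t count;        /* events seen in the current window */
} evt_rate_limiter_t;

void evt_rate_limiter_init(evt_rate_limiter_t *rl, uint32_t window_ticks,
                           uint32_t max_events, uint32_t now)
{
    assert(rl != NULL);
    rl->window_ticks = window_ticks;
    rl->max_events = max_events;
    rl->window_start = now;
    rl->count = 0;
}

/* Call once per received event. Returns false when the budget is exceeded. */
bool evt_rate_limiter_note(evt_rate_limiter_t *rl, uint32_t now)
{
    assert(rl != NULL);
    /* Start a new window once the old one has elapsed (wrap-safe). */
    if ((uint32_t)(now - rl->window_start) >= rl->window_ticks) {
        rl->window_start = now;
        rl->count = 0;
    }
    rl->count++;
    return rl->count <= rl->max_events;
}
```

When `evt_rate_limiter_note()` returns false, the application could terminate the link, for example with `sd_ble_gap_disconnect()`. Note that events are only counted once the application actually pulls them, so the check reflects the rate seen by the application rather than the rate on air.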
