No UART irq for several ms after regular BLE radio activity

-Using SoftDevice S110 8.0.0 for the nRF51822_xxAA.

-UART0 running on 19k2 without flow control (only rx/tx) and using irq priority APP_IRQ_PRIORITY_LOW (=3)

-Using the Radio Timeslot functionality while also fulfilling the peripheral role

When the system is not connected, everything works fine. The Radio Timeslot is used to broadcast and listen for proprietary messages whenever the radio is not being used for advertising. The UART also transmits and receives as expected.

But when the nRF51822 is connected, I get UART errors (overrun). I implemented radio notifications for debugging and see that for several milliseconds (4-13 ms) after the radio is disabled, the UART ISR is not entered even though bytes are arriving on the RX line. During this time the UART FIFO obviously overruns, since no bytes are read from the RXD register. The radio activity in question is normal BLE communication (reception and transmission of regular BLE packets); the timeslot was not active.
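
To see why a 4-13 ms gap is enough to cause an overrun, it helps to work out how long the RX FIFO can hold incoming bytes when the ISR never runs. A minimal sketch, assuming 8N1 framing (10 bit times per byte) and the 6-byte RX FIFO mentioned later in this thread; the function name is hypothetical.

```c
/* Worked check: how long can the UART RX FIFO absorb incoming bytes if the
 * ISR never runs? Assumes bits_per_frame bit times per byte on the wire
 * (8N1 = 1 start + 8 data + 1 stop = 10) and a FIFO of fifo_depth bytes. */
static double fifo_fill_time_ms(unsigned baud, unsigned bits_per_frame,
                                unsigned fifo_depth)
{
    double byte_time_ms = 1000.0 * bits_per_frame / baud;
    return byte_time_ms * fifo_depth;
}
```

For `fifo_fill_time_ms(19200, 10, 6)` this gives 3.125 ms: each byte takes about 0.52 ms on the wire, so the FIFO is full after roughly 3.1 ms without a read of RXD. Any ISR latency clearly longer than that, such as the 4-13 ms gaps described above, can lose data.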

When I change the UART priority to APP_IRQ_PRIORITY_HIGH (=1), the overrun error does not occur, so it seems the softdevice is blocking the UART interrupt when the overrun happens (?).

Furthermore, even though the timeslot is not active when the overrun error occurs, the problem does not occur at all when the Radio Timeslot functionality is not used (not initialized/started).

I made a screendump of the scenario:

My questions:

-Is it normal for the softdevice to block the UART interrupt (with low prio) for several milliseconds? That seems very long to me.

-How do I figure out what the softdevice is doing in that moment after the radio is deactivated?

-What could the relation to the Radio Timeslot be (maybe some leftover radio setting that is not re-initialized)?

Using UART flow control is not an option in this scenario. Also, changing the priority is not my preferred solution, at least not before understanding why this is happening.

  • Hi

    -Is it normal for the softdevice to block the UART interrupt (with low prio) for several milliseconds? That seems very long to me.

    Yes, it is normal for the Softdevice to block other interrupts, but maybe not for 13 milliseconds. It depends on your application. As you can see here, the Softdevice uses interrupt priority 0 (highest) for timing-critical "under-the-hood" work. It uses priority 2 for less critical work, such as forwarding events to your application when you receive data. So if you have time-consuming code inside such event handlers, you might be blocking everything running at lower interrupt priorities. When you change the UART priority to 1, it gets a higher priority than all the Softdevice events, which is probably why it works better. Maybe you can try toggling some GPIOs inside suspicious Softdevice event handler functions and see if you can find the culprit.
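
    A minimal sketch of the GPIO-toggling technique: drive a spare pin high on entry to a handler and low on exit, so a logic analyzer shows exactly how long the handler occupies its priority level. To keep it runnable off-target, `debug_pin_set()`/`debug_pin_clear()` are hypothetical stubs that only record the pin state; on the nRF51 they would be replaced by real GPIO writes such as the SDK's `nrf_gpio_pin_set()`/`nrf_gpio_pin_clear()`. The pin number and `on_ble_evt()` are assumptions.

```c
#include <stdint.h>

#define DEBUG_PIN_EVT_HANDLER 12u   /* hypothetical spare GPIO */

/* Host-side stand-ins for the GPIO writes: they record the pin state and
 * count rising edges so the instrumentation can be checked off-target. */
static uint32_t debug_pin_state;
static unsigned debug_pin_rising_edges;

static void debug_pin_set(uint32_t pin)
{
    if (!(debug_pin_state & (1u << pin)))
        debug_pin_rising_edges++;
    debug_pin_state |= (1u << pin);
}

static void debug_pin_clear(uint32_t pin)
{
    debug_pin_state &= ~(1u << pin);
}

/* Hypothetical application event handler being instrumented: the pin is
 * high exactly while the handler body runs. */
static unsigned events_handled;

static void on_ble_evt(void)
{
    debug_pin_set(DEBUG_PIN_EVT_HANDLER);
    events_handled++;               /* real event processing goes here */
    debug_pin_clear(DEBUG_PIN_EVT_HANDLER);
}
```

    Using one pin per suspicious handler, and capturing the UART RX line on the same analyzer, makes it easy to see whether a handler is running during the gaps in which the UART ISR is missing.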

    Furthermore, even though the timeslot is not active at the moment the overrun error occurs, the problem is non-existent when the Radio Timeslot functionality is not used (not initialized/started).

    Can you elaborate on this? Have you been using the timeslot API and then de-initialized it? Or do you mean when you remove everything related to the timeslot API and the proprietary radio protocol from your application entirely?

    -What could the relation to the Radio Timeslot be (maybe some leftover radio setting that is not re-initialized)?

    Not sure if it is relevant here, but the radio timeslot API also runs in high-priority interrupts that will block your UART, as you can see here: Radio Timeslot API processor usage patterns

    Using UART flow control is not an option in this scenario.

    That is unfortunate. Using BLE and UART on the nRF51 with 19k2 baud rate and no flow control is risky.

     

    PS: It is holiday season in Norway these days and the response time might be slower than usual. 

    Best regards,
    Martin

  • Hi Martin, thanks for your reply.

    In my implementation, the timeslot functionality can be activated and deactivated while the device is up and running. When it is deactivated (or when the device boots with it deactivated), the UART errors do not occur. But as soon as the timeslot functionality is activated, the UART errors occur once a BLE connection is set up. However, I've never seen the UART error while actually in a timeslot.

    I've been trying to find out whether any of the event handlers consumes excessive time, but unfortunately this hasn't turned up a suspect. So far all the event handlers I've checked show their normal timing and shouldn't be the reason the UART is no longer serviced. The moment the UART interrupts are blocked always seems to coincide with the radio deactivating (and each time it was after regular BLE traffic, not timeslot traffic).

    The common factor, I guess, is the radio: the issue only occurs when the timeslot functionality is active (though not actually in a timeslot), and it occurs right after radio activity. Hence my remark about "what could be the relation to the Radio Timeslot...".

    Is it possible to retrieve the "previous" priority level somehow? If so, I was thinking about running a TIMER at priority 1 and logging the priority level the system was in just before the TIMER IRQ. Then I would at least see which priority level was active when the UART is not serviced (either an event handler running at priority 3 or the soft device running at priority level 2).
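
    One possible way to get at the "previous" level, sketched under assumptions: on Cortex-M, exception entry pushes a basic stack frame (r0-r3, r12, lr, return address, xPSR), and the stacked xPSR carries the exception number of the code that was interrupted (0 for thread mode, 16 + n for IRQ n). The decoder below is host-testable; obtaining the `frame` pointer inside the TIMER ISR is assumed to be done by a small assembly shim that captures the active stack pointer (MSP in a typical nRF51 application without an RTOS) before any C prologue runs, which is not shown. `interrupted_irqn()` is a hypothetical helper name.

```c
#include <stdint.h>

/* Cortex-M basic exception frame, as pushed on interrupt entry:
 * r0, r1, r2, r3, r12, lr, return address, xPSR (index 7). */
#define FRAME_XPSR_INDEX   7u
/* ARMv6-M (Cortex-M0, used in the nRF51): exception number in xPSR[5:0]. */
#define XPSR_EXC_NUM_MASK  0x3Fu

/* Decode what the TIMER ISR interrupted, from its stacked exception frame.
 * Returns the external IRQ number (IRQn) that was preempted, -1 if thread
 * mode (main loop) was interrupted, or -2 for a system exception (e.g. SVCall). */
static int interrupted_irqn(const uint32_t *frame)
{
    uint32_t exc = frame[FRAME_XPSR_INDEX] & XPSR_EXC_NUM_MASK;
    if (exc == 0u)
        return -1;
    if (exc < 16u)
        return -2;
    return (int)exc - 16;
}
```

    Comparing the result with `SWI2_IRQn` or `UART0_IRQn` would show whether the SoftDevice event interrupt or application code was running. Note that a priority-1 TIMER can only sample moments when nothing at priority 0 is running and interrupts are not globally masked, so missing samples are informative too.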

    Or are there possibly some other hooks in the soft device I can use to get information on the priority level the system is in?

    /Martijn

  • Ok, so BLE and timeslot doesn't work, but either one by itself works?

    I read through the radio notification chapter in the Softdevice Specification again, and since I'm under the impression that you turn the timeslot API on and off dynamically at run-time, I took notice of this quote:

    Are you handling this?

    So far all event handlers I've checked

    So there are more handlers to check?

    The moment the UART interrupts are blocked always seems to coincide with the radio deactivating (and each time it was after regular BLE traffic, not timeslot traffic)

    Could this be a pointer towards an event handler taking care of received data?

    Is it possible to retrieve the "previous" priority level somehow?

    Not sure about the previous level, but at least you can get the current level by doing something similar to what is being discussed here: https://community.arm.com/processors/f/discussions/5387/get-current-active-interrupt-priority
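
    A minimal sketch of that approach, assuming a Cortex-M0 with 2 implemented priority bits as on the nRF51: the IPSR gives the number of the exception currently executing, and the NVIC priority register for that IRQ gives its level. On target, `ipsr` would come from CMSIS `__get_IPSR()` and `ipr` would be a copy of `NVIC->IP`; here both are plain parameters so the logic can be checked on a host. The helper names are hypothetical.

```c
#include <stdint.h>

#define NVIC_PRIO_BITS 2u   /* nRF51 (Cortex-M0): 4 levels, 0 = highest */
#define NUM_EXT_IRQS   32u  /* Cortex-M0 supports at most 32 external IRQs */

/* Priority of IRQ `irqn`, given the 32-bit NVIC IPR word that contains it
 * (IPR[irqn / 4]). Each IRQ owns one byte of that word and only the top
 * NVIC_PRIO_BITS of the byte are implemented; this follows the same
 * arithmetic as CMSIS NVIC_GetPriority(). */
static uint32_t prio_from_ipr_word(uint32_t ipr_word, uint32_t irqn)
{
    uint32_t byte = (ipr_word >> ((irqn & 3u) * 8u)) & 0xFFu;
    return byte >> (8u - NVIC_PRIO_BITS);
}

/* Priority of the interrupt currently executing, or -1 when the CPU is in
 * thread mode or in a system exception (those are not in the IPR array). */
static int active_irq_priority(uint32_t ipsr, const uint32_t ipr[8])
{
    uint32_t exc = ipsr & 0x3Fu;          /* ARMv6-M exception number */
    if (exc < 16u || exc - 16u >= NUM_EXT_IRQS)
        return -1;
    uint32_t irqn = exc - 16u;
    return (int)prio_from_ipr_word(ipr[irqn >> 2], irqn);
}
```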

    Or are there possibly some other hooks in the soft device

    Unfortunately the Softdevice is Nordic's proprietary black box, and there aren't many ways to peek inside it. But you can assume that all Softdevice-generated events forwarded to your application are running at priority 2, as illustrated here.

    Have you tried to reduce the baud rate?

    What SDK are you using and are you using our UART drivers? 

    The bottom line is that, because of the asynchronous nature of UART and BLE, you will not get 100 % reliable UART transfers without flow control when using BLE at the same time. Adding the timeslot API and a proprietary radio protocol on top of that certainly won't make it better.

    Any chance it would be possible to upgrade to nRF52? It has a much more advanced UART capable of writing data directly to RAM while the CPU is busy or sleeping. 

  • Ok, so BLE and timeslot doesn't work, but either one by itself works?

    Well, not exactly. BLE and the timeslot work together perfectly, radio-wise. It's just that when both are active, I get UART errors as soon as the BLE connection becomes active...

    Are you handling this?

    Yup. Right after the softdevice is enabled, I configure the Radio Notification. Note that I only implemented this notification to debug the UART error. It's not used for the timeslot functionality or anything else, and I will remove it as soon as the UART problem is fixed.

    So there are more handlers to check?

    Not really. It was just my way of saying I've run out of ideas for what else to check :-)

    Could this be a pointer towards an event handler taking care of received data?

    Well, I checked the events I get from the softdevice via SD_EVT_IRQn/SWI2_IRQn and the sd_evt_get() function. The time it takes to handle those events (and the received data) is not what blocks the UART IRQ. I made a screenshot:

    You can see that when the UART interrupts are missed, the system is not in the SWI2_IRQn ISR handling BLE events.

    Have you tried to reduce the baud rate?

    No, I haven't. I'm not sure that would be a feasible solution, since I would also have to lower the baud rate of the device on the remote side of the UART. Solving the UART problem by lowering the baud rate would mean the 6-byte receive FIFO has to absorb those 13 ms of incoming data. That would require a really low rate of ~460 bytes/s :-(
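
    The ~460 bytes/s figure can be reproduced with a short worked calculation. A minimal sketch, assuming 8N1 framing (10 bit times per byte); the function names are hypothetical.

```c
/* Worked check for the "lower the baud rate" option: the highest data rate
 * at which a FIFO of fifo_depth bytes can ride out a gap_ms interval in
 * which RXD is never read. */
static double max_bytes_per_s(unsigned fifo_depth, double gap_ms)
{
    return fifo_depth / (gap_ms / 1000.0);
}

/* Same limit expressed as a baud rate, assuming bits_per_frame bit times
 * per byte (10 for 8N1). */
static double max_baud(unsigned fifo_depth, double gap_ms,
                       unsigned bits_per_frame)
{
    return max_bytes_per_s(fifo_depth, gap_ms) * bits_per_frame;
}
```

    For a 6-byte FIFO and a 13 ms gap this gives about 461.5 bytes/s, or roughly 4615 baud with 8N1. Even 4800 baud is above that limit, so the nearest standard rate that would fit is 2400 baud, which shows why lowering the baud rate is not an attractive fix here.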

    but at least you can get the current level by doing something similar

    I will check to see if that can help me ...

    you will not get a 100 % reliable UART transfer without flow control when using BLE at the same time

    I understand what you're saying, and I agree there are limits to what to expect from a UART without flow control. That being said, I don't think the baud rate I'm using is pushing the system to the limit, as can also be seen in the screenshot above. The system doesn't seem to be more "stressed" at the moment interrupts are missed.

    Any chance it would be possible to upgrade to nRF52?

    Actually, I have this system running on the nRF52 as well, with the same UART implementation etc., and there's no problem there. Because of the already established user base (nRF51 products), I'm making it run on the nRF51 as well.

  • Hi,

    I was made aware of a recently discovered bug in the timeslot API today. It was found with S140 on the nRF52840, but the symptoms and circumstances look similar. Are you using the timeslot API to make "earliest possible" requests? Can you try increasing the timeout on the timeslot request and see if that has an effect?


    PS: I'll be on holiday until next Tuesday. As it is holiday season in Norway it is not certain that any of my colleagues will have time to pick up the case in the meantime.
