
No UART irq for several ms after regular BLE radio activity

-Using SoftDevice S110 8.0.0 for the nRF51822_xxAA.

-UART0 running at 19200 baud (19k2) without flow control (only RX/TX), using IRQ priority APP_IRQ_PRIORITY_LOW (=3)

-Using the Radio Timeslot functionality while also fulfilling the peripheral role

When the system is not connected all is working fine. The Radio Timeslot is being used to broadcast and listen to proprietary messages when the radio is not used for advertisement. Also the UART is transmitting and receiving as expected.

But when the nRF51822 is connected I get UART overrun errors. I implemented the radio notification for debugging and see that for several milliseconds (4 ms to 13 ms) after the radio is disabled, the UART ISR is not entered even though bytes are received on the RX line. During this time the UART FIFO will obviously overrun, since bytes are not read from the RXD register. The radio activity mentioned above is normal BLE communication (reception and transmission of regular BLE packets), and the timeslot was not active.
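
To put rough numbers on that overrun window, here is a minimal sketch of the latency budget. It assumes 8N1 framing (10 bits on the wire per byte) and the six-byte RX FIFO described in the nRF51 reference manual. Neither figure comes from the measurements above, and the function names are made up for illustration.

```c
#include <stdint.h>

/* Time on the wire for one UART frame, in microseconds (rounded down).
 * bits_per_frame = start bit + data bits + stop bit(s), e.g. 10 for 8N1. */
static uint32_t uart_byte_time_us(uint32_t baud, uint32_t bits_per_frame)
{
    return (bits_per_frame * 1000000u) / baud;
}

/* Approximate time the RX interrupt can stay unserviced before the
 * hardware FIFO is full. fifo_depth is an assumption (6 on the nRF51
 * UART according to its reference manual). */
static uint32_t uart_max_irq_latency_us(uint32_t baud,
                                        uint32_t bits_per_frame,
                                        uint32_t fifo_depth)
{
    return fifo_depth * uart_byte_time_us(baud, bits_per_frame);
}
```

With a nominal 19200 baud, `uart_byte_time_us(19200, 10)` gives about 520 µs per byte and `uart_max_irq_latency_us(19200, 10, 6)` gives about 3.1 ms, so a 4 ms to 13 ms gap with continuous incoming data would be expected to overrun.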

When I change the UART priority to APP_IRQ_PRIORITY_HIGH (=1) the overrun error does not occur, so it seems like the softdevice is blocking the UART interrupt when the overrun error happens(?).

Furthermore, even though the timeslot is not active at the moment the overrun error occurs, the problem is non-existent when the Radio Timeslot functionality is not used (not initialized/started).

I made a screendump of the scenario:

My questions:

-Is it normal for the softdevice to block the UART interrupt (with low prio) for several milliseconds? This seems very long to me?

-How do I figure out what the softdevice is doing in that moment after the radio is deactivated?

-What could be the relation to the Radio Timeslot (maybe some left over, not re-initialized radio setting?)?

Using UART flow control is not an option in this scenario. Also, changing the priority is not my preferred solution, at least not before understanding why this is happening.

  • Hi

    -Is it normal for the softdevice to block the UART interrupt (with low prio) for several milliseconds? This seems very long to me?

    Yes, it is normal for the SoftDevice to block other interrupts, but maybe not for 13 milliseconds. It depends on your application. As you can see here, the SoftDevice uses interrupt priority 0 (highest) for timing-critical "under-the-hood" work. It uses priority 2 for less critical work, such as forwarding events to your application when you receive data. So if you have implemented time-consuming code inside such event handlers, you may be blocking anything that runs at a lower interrupt priority. When you change the UART priority to 1, it gets higher priority than all the SoftDevice events, which is probably why it works better. You could try toggling some GPIOs inside suspicious SoftDevice event handler functions to see if you can find the sinner.
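
The GPIO-toggling idea could look something like the sketch below. The `debug_pin_t` type and both helpers are hypothetical. The register pointers are passed in so the snippet compiles anywhere; on the nRF51 they would presumably be `&NRF_GPIO->OUTSET` and `&NRF_GPIO->OUTCLR`, with the pin first configured as an output.

```c
#include <stdint.h>

/* Hypothetical debug-pin descriptor. On target, outset/outclr would point
 * at the GPIO "write 1 to set" and "write 1 to clear" registers. */
typedef struct {
    volatile uint32_t *outset;
    volatile uint32_t *outclr;
    uint32_t           mask;   /* 1u << pin_number */
} debug_pin_t;

/* Drive the pin high: call this on entry to the handler under test. */
static void debug_pin_high(const debug_pin_t *pin)
{
    *pin->outset = pin->mask;
}

/* Drive the pin low: call this just before the handler returns. */
static void debug_pin_low(const debug_pin_t *pin)
{
    *pin->outclr = pin->mask;
}
```

Wrapping each suspicious handler in a `debug_pin_high()` / `debug_pin_low()` pair, one pin per handler, lets a logic analyzer show which handler (if any) is active while the UART RX line is busy but the UART ISR is not running.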

    Furthermore, even though the timeslot is not active at the moment the overrun error occurs, the problem is non-existent when the Radio Timeslot functionality is not used (not initialized/started).

    Can you elaborate on this? Have you been using the timeslot API, but uninitialized it? Or do you mean if you remove everything related to the timeslot API and proprietary radio from your application entirely?

    -What could be the relation to the Radio Timeslot (maybe some left over, not re-initialized radio setting?)?

    Not sure if it is relevant here, but the radio timeslot API also runs in high-priority interrupts that will block your UART, as you can see here: Radio Timeslot API processor usage patterns

    Using UART flow control is not an option in this scenario.

    That is unfortunate. Using BLE and UART on the nRF51 with 19k2 baud rate and no flow control is risky.

     

    PS: It is holiday season in Norway these days and the response time might be slower than usual. 

    Best regards,
    Martin

  • Hi Martin, thanks for your reply.

    In my implementation the use of the timeslot functionality can be activated and deactivated when the device is up and running. When it is deactivated (or when the device boots with timeslot functionality deactivated), the UART errors do not occur. But as soon as the timeslot functionality is activated the UART errors occur when a BLE connection is set up. However I've not seen the UART error when actually being in a timeslot.

    I've been trying to find out whether any of the event handlers is consuming excessive time, but unfortunately this hasn't produced a suspect. So far all event handlers I've checked show their normal timing and shouldn't be the reason the UART is no longer serviced. The moment the UART interrupts are blocked always seems to coincide with the radio deactivating (and each time it was after regular BLE traffic, not timeslot traffic).

    The common factor, I guess, is the radio: the issue only occurs when timeslot functionality is active (though not while actually in a timeslot), and it occurs after radio activity. Hence my remark about "what could be the relation to the Radio Timeslot...".

    Is it possible to retrieve the "previous" priority level somehow? If so I was thinking about running a TIMER on priority 1 and then logging the priority level the system was in prior to the TIMER IRQ. Then I would at least see what priority level was active when the UART is not serviced (being either an event handler running on priority 3 or the soft device running in priority level 2).

    Or are there possibly some other hooks in the soft device I can use to get information on the priority level the system is in?
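
One way the "previous level" idea might be sketched: on a Cortex-M0 the core pushes an eight-word frame on exception entry, and the stacked xPSR holds the exception number of whatever was preempted. The type and function names below are hypothetical, and obtaining the frame pointer inside the TIMER ISR (a small assembly shim that passes the stack pointer at entry) is not shown.

```c
#include <stdint.h>

/* Register frame pushed by the core on exception entry (ARMv6-M order). */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} exception_frame_t;

/* On ARMv6-M the exception number sits in xPSR bits [5:0]. */
#define XPSR_EXCEPTION_MASK 0x3Fu

/* Exception number of the preempted context:
 * 0 means thread mode (main loop), 16 + n means external interrupt n. */
static uint32_t preempted_exception(const exception_frame_t *frame)
{
    return frame->xpsr & XPSR_EXCEPTION_MASK;
}

/* Same information as an IRQ number. Negative values mean thread mode
 * (-16) or a system exception such as SVCall (-5). */
static int32_t preempted_irqn(const exception_frame_t *frame)
{
    return (int32_t)preempted_exception(frame) - 16;
}
```

Logging `preempted_irqn()` from a priority 1 TIMER ISR into a small ring buffer would show which interrupt handler was running each time the timer fired, and from that the priority level, since each IRQ's priority is known.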

    /Martijn

  • Hi,

    I was made aware of a recently discovered bug in the timeslot API today. The bug was discovered on the S140 and nRF52840, but the symptoms and circumstances look similar. Are you using the timeslot API to perform "earliest possible" requests? Can you try to increase the timeout on the timeslot request and see if that has an effect?


    PS: I'll be on holiday until next Tuesday. As it is holiday season in Norway it is not certain that any of my colleagues will have time to pick up the case in the meantime.

  • Hi Martin,

    This is definitely making a difference! The timeout value was set to 50 ms and I have now increased it by a factor of 6 (to 300 ms), and the UART errors are far less frequent. They're still there, but it is definitely an improvement.

    From these first tests I would say that this is the same issue as for the S140 then. Can you elaborate a bit on this recently discovered bug? Is there a minimum timeout value that prevents this bug from happening? Or are there other workarounds? Does this bug also exist in the S132/nRF52832?

    /Martijn

  • Hi,

    That is interesting. This is what the Softdevice team suggests:

    For the S110 with only a slave link running, I would guess that a safe value could be timeslot_event_length * 2 + 7.5 ms (<- slave max event length). Maybe give it some extra headroom. I think increasing the interrupt priority of the UART0 interrupt would be a better solution. The problem could be avoided entirely by using normal requests instead.
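
As a quick illustration of that rule of thumb, a minimal sketch follows. The function name and the headroom parameter are made up here, and the result is presumably what would go into the `timeout_us` field of an "earliest possible" timeslot request.

```c
#include <stdint.h>

/* Slave max event length quoted in the suggestion above: 7.5 ms. */
#define SLAVE_MAX_EVENT_LENGTH_US 7500u

/* Suggested lower bound for the "earliest possible" request timeout:
 * timeslot_event_length * 2 + 7.5 ms, plus caller-chosen headroom. */
static uint32_t earliest_timeout_us(uint32_t timeslot_length_us,
                                    uint32_t headroom_us)
{
    return 2u * timeslot_length_us + SLAVE_MAX_EVENT_LENGTH_US + headroom_us;
}
```

For example, a 5 ms timeslot with 1 ms of headroom gives `earliest_timeout_us(5000, 1000)`, which is 18500 µs.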

  • Hi Martin,

    Sorry for not getting back to you sooner. Holiday and some personal stuff kept me out of the office for some time.

    I will see if I can easily change the code to use the normal requests. I'll get back to you on that.

    Also, is it clear if this issue is also affecting the S132 softdevice (used with the nRF52832)?

    /Martijn

  • Hi,

    No worries.

    Yes, it affects the S132 Softdevices too. 
