
pc-ble-driver-py missing high-rate notifications

Hello,

whilst implementing my embedded device software, I'm developing a tool running on a Windows laptop to validate my work.

The idea is to use pc-ble-driver-py python bindings to implement simple interactions with the device over BLE using the services exposed.

The current setup is:

  • pc-ble-driver-py 0.15.0 (latest)
  • nRF52840 Dongle (PCA10059) running connectivity FW 4.1.2 SD API v5 (latest supported)

Most of the functionality seems to work fine.

But once notifications arrive above a certain rate, the system does not seem able to keep up.

To give some context: on the embedded device I poll data and queue a notification on a certain characteristic every 10 ms. The connection interval is set to 15 ms.

As soon as I enable notifications on this characteristic from the Client side (after the desired connection interval has been applied), sd_ble_gatts_hvx starts returning the error code NRF_ERROR_RESOURCES on the server side. It does not happen on every call, but very often, at least 10 to 20 times per second.

The only explanation I can give to this is that the Central (i.e. the python script) is not meeting all the connection events at fixed interval (15ms), and this way the notification queue on the Server is increasing till its limit.
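To make this reasoning concrete, here is a minimal sketch of the queue behaviour in Python. It is a toy model, not the SoftDevice: the function name, the queue limit of 4 and the number of packets drained per connection event are all assumptions chosen for illustration. It only shows that producing every 10 ms with a 15 ms connection interval overflows the queue if a single notification goes out per event, and stays bounded if two can go out.

```python
def simulate_queue(duration_ms, produce_every_ms, conn_interval_ms,
                   packets_per_event, queue_limit):
    """Toy model of a notification TX queue (all parameters are assumptions).

    Returns (rejected, max_depth): how many queue attempts were refused
    because the queue was full, and the deepest the queue ever got.
    """
    depth = 0
    rejected = 0
    max_depth = 0
    for t in range(duration_ms):
        # A connection event drains up to packets_per_event notifications.
        if t % conn_interval_ms == 0:
            depth -= min(depth, packets_per_event)
        # The application tries to queue one notification.
        if t % produce_every_ms == 0:
            if depth >= queue_limit:
                rejected += 1  # stands in for NRF_ERROR_RESOURCES
            else:
                depth += 1
                max_depth = max(max_depth, depth)
    return rejected, max_depth


if __name__ == "__main__":
    for per_event in (1, 2):
        rejected, max_depth = simulate_queue(1000, 10, 15, per_event, 4)
        print(f"{per_event} packet(s) per event: "
              f"rejected={rejected}, max_depth={max_depth}")
```

In this model the errors disappear as soon as two notifications fit in one connection event, which matches the expectation that the Client should simply see up to two notifications per event.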

I tested this scenario with different platforms (Windows UWP, Android) and this is the only case I'm seeing this behaviour.

What I find rather odd is that with exactly the same dongle (and the same FW), but using the nRF Connect Desktop App GUI, this does NOT happen - I can subscribe to the notifications and get them at the rate I expect.

Adding a couple of graphs to make things clearer:

Notifications Intervals (ms) from nRF Connect Desktop:

Notifications Intervals (ms) from pc-ble-driver-py:

It's clear that in the first case two notifications are sometimes very close together, because the queuing rate is higher than the connection rate (i.e. you get more than one notification per connection interval).

In the second case, instead, the interval is never below 15 ms, which means that notifications are certainly being lost (I imagine due to missed connection events...).

I tried the same test with the dongle replaced by an nRF52-DK. In this case the system crashes and disconnects a couple of seconds after notifications are enabled, with this log:
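The comparison between the two graphs can be reproduced from raw arrival times. The sketch below assumes you record one timestamp in milliseconds per received notification (for example in the notification callback of your script); the function names and the "half a connection interval" threshold are illustrative choices, not part of any library. Keep in mind that host-side timestamps also include serial/USB and scheduling delays.

```python
def intervals_ms(timestamps_ms):
    """Gaps between consecutive notification timestamps, in milliseconds."""
    return [later - earlier
            for earlier, later in zip(timestamps_ms, timestamps_ms[1:])]


def count_bursts(timestamps_ms, conn_interval_ms):
    """Count gaps shorter than half the connection interval.

    Such a gap suggests that two notifications arrived within the same
    connection event.
    """
    threshold = conn_interval_ms / 2
    return sum(1 for gap in intervals_ms(timestamps_ms) if gap < threshold)


if __name__ == "__main__":
    # Made-up timestamps, only to show the two shapes described above.
    bursty = [0, 1, 15, 30, 31, 45]
    evenly_spaced = [0, 15, 30, 45]
    print(count_bursts(bursty, 15), count_bursts(evenly_spaced, 15))
```

A capture with zero bursts while the Server queues faster than the connection interval points to at most one notification being delivered per connection event.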

2020-10-20 15:17:21,194 [21092/LogThread] h5_decode error, code: 0x802c, H5 error count: 1. raw packet: c0 f0 0e 02 00 02 39 00 00 00 00 00 00 00 1c 00 01 12 00 f7 03 71 03 d0 02 de 02 fe 02 0e c0 

As a quick workaround, if I queue notifications every 20 ms (or anything above the connection interval), I don't see any problem and the system works flawlessly.

Can you help me understand what's going on?

I'm evaluating this Python library for the system testing framework of our devices, and of course we will also need to test high-rate notifications.

I think reproducing the issue would be quite trivial, but maybe there is something I'm missing that can solve this straightaway.

Thanks!

  • Hello,

    I start receiving as a return value from sd_ble_gatts_hvx the error code NRF_ERROR_RESOURCES on the server side - it's not always the case, but very often, at least 10/20 times per second.

    An excerpt from the sd_ble_gatts_hvx API Reference reads:

    NRF_ERROR_RESOURCES Too many notifications queued. Wait for a BLE_GATTS_EVT_HVN_TX_COMPLETE event and retry.

    I suspect in this case that you are indeed queueing notifications faster than you are able to send them, filling the available TX queue. My suspicion is strengthened when you say you are queueing a notification every 10 ms, with a connection interval of 15 ms. If you expect that some notifications might require a retransmit then we may also try to increase the TX queue to resolve the issue ( as long as notifications are being sent faster than they are queued on average ).

    The only explanation I can give to this is that the Central (i.e. the python script) is not meeting all the connection events at fixed interval (15ms), and this way the notification queue on the Server is increasing till its limit.

    This is a good consideration - and there are multiple reasons why this might be the case, for example if you are communicating in an environment with massive 2.4 GHz interference, or over a very long range. I would however first look at the connection parameters, and how often notifications are sent successfully.

    Are you familiar with the nRF Sniffer tool? It is a powerful tool when developing with BLE, which lets you monitor the on-air BLE traffic. You could use this to check whether or not one of your devices is skipping some connection intervals, once we have exhausted the connection-parameter approach.

    that for sure notifications are lost

    BLE connections are loss-less - if a packet is not ACK'd, it is retransmitted.

    I have a feeling I might have misunderstood your situation and issue, in which case a sniffer trace from the nRF Sniffer would be very helpful to see the whole picture.
    Please do not hesitate to let me know if I have misunderstood your description, or if any part of my reply should be unclear.

    Looking forward to resolving this issue together!

    Best regards,
    Karl
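The "wait for a BLE_GATTS_EVT_HVN_TX_COMPLETE event and retry" advice quoted in the reply above can be sketched as a small pending queue. This is a Python model of the pattern only: FakeStack is a hypothetical stand-in for the SoftDevice TX queue (its hvx method returning False models NRF_ERROR_RESOURCES), and NotificationSender is an illustrative name. On the real device this logic would live in the firmware around sd_ble_gatts_hvx.

```python
from collections import deque


class FakeStack:
    """Hypothetical stand-in for the SoftDevice TX queue (not a real API)."""

    def __init__(self, queue_size):
        self.queue_size = queue_size
        self.in_flight = 0

    def hvx(self, payload):
        """Return False when the queue is full (models NRF_ERROR_RESOURCES)."""
        if self.in_flight >= self.queue_size:
            return False
        self.in_flight += 1
        return True

    def complete(self, count):
        """Model `count` notifications having been sent over the air."""
        self.in_flight -= min(count, self.in_flight)


class NotificationSender:
    """Keeps payloads that did not fit and retries them on TX complete."""

    def __init__(self, stack):
        self.stack = stack
        self.pending = deque()

    def send(self, payload):
        self.pending.append(payload)
        self._flush()

    def on_tx_complete(self):
        # Call this where the TX complete event is handled.
        self._flush()

    def _flush(self):
        # Preserve ordering: stop at the first payload that does not fit.
        while self.pending and self.stack.hvx(self.pending[0]):
            self.pending.popleft()
```

Note that this only smooths short bursts. If notifications are queued faster on average than they are sent, the pending queue grows without bound, so it does not remove the underlying imbalance.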

  • Hi,

    thanks for helping out!

    I suspect in this case that you are indeed queueing notifications faster than you are able to send them, filling the available TX queue. My suspicion is strengthened when you say you are queueing a notification every 10 ms, with a connection interval of 15 ms. If you expect that some notifications might require a retransmit then we may also try to increase the TX queue to resolve the issue ( as long as notifications are being sent faster than they are queued on average ).

    I'm not sure I follow here.
    There's an internal buffer for notifications on Server side (the embedded device).
    So if on the Server I'm queuing data every 10ms with an interval of 15ms I'm just expecting on the Client side to be notified about up to 2 notifications for each event, which should be acceptable.
    BTW packets are just around 20 bytes, whilst the MTU is set to 131 bytes (Data Length Extension 135 bytes).

    Of course when I get the error NRF_ERROR_RESOURCES I could just retry to send, but this is not removing the fundamental problem.

    This is a good consideration - and there are multiple reasons why this might be the case, for example if you are communicating in an environment with massive 2.4 GHz interference, or over a very long range. I would however first look at the connection parameters, and how often notifications are sent successfully.

    What I'm puzzled with is that exactly the same HW configuration and environment (but using nRF Connect Desktop as a notifications collector) is instead working without any problem. I'm transmitting from few centimetres, so I don't think the issue is there.
    I believe the intervals graph I attached in the original post are quite descriptive in this sense.


    This makes me think the issue is on the Client side rather than in the way the Server is queuing notifications, as the other tests I did with different entities collecting the notifications do not show the same issue.

    Is it clearer now?

    BLE connections are loss-less - if a packet is not ACK'd, it is retransmitted.

    Is this also true for notifications? I know indications work differently. Please let me know if there's any documentation you suggest to understand these details at protocol level.

    As for the sniffer - that is definitely the next step for this.

    I'll just need a bit of time to set this up as I have never done it, but I can probably provide some more info.

    Are there any other quick tests you would suggest to at least isolate the problem to a particular area?

    Thanks!

  • Hello,

    Sorry for my late reply - I was out of office for some days.

    davege said:
    thanks for helping out!

    It is no problem at all, I am happy to help!

    davege said:
    So if on the Server I'm queuing data every 10ms with an interval of 15ms I'm just expecting on the Client side to be notified about up to 2 notifications for each event, which should be acceptable.

    How long is your connection event length? You could very well not have time to send two notifications per connection event.
    If the connection event length is long enough to accommodate additional notifications, they will be sent in the same connection event.

    davege said:
    Of course when I get the error NRF_ERROR_RESOURCES I could just retry to send, but this is not removing the fundamental problem.

    Agreed, this is a good way to look at it. Resolving the issue by its root is always preferable. 

    davege said:
    Is this true also on notifications? I know indications works differently. Please let me know if there's any documentation you suggest to understand these details at protocol level

    Yes, this is true for every packet sent in a BLE connection. Indications take it a step further by requiring that the application acknowledges the indication. Notifications are just acknowledged by the link layer - which is all right for most use-cases.
    If you would like to better understand the different exchanges I highly recommend taking a look at the Message Sequence Charts. In your case, I especially recommend seeing the Server Notification Sequence Chart to better understand every step of a notification.

    davege said:

    As for the sniffer - that is definitely the next step for this.

    I'd just need a bit of time to set this up as I never did it, but probably can provide some more info.

    Great! While the sniffer could seem daunting to start with, it is easy to get the hang of once you get it up and running.

    davege said:
    Is there any other quick tests you suggest to at least isolate the problem in a particular area?

    If you could tell me what your connection event length is set to, and install the sniffer tool, then we will already be en route.

    Looking forward to resolving this issue together!

    Best regards,
    Karl
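The link-layer acknowledgement described in the reply above can be illustrated with a simplified stop-and-wait model using a one-bit sequence number, in the spirit of the SN/NESN bits of the BLE link layer. This is a teaching sketch under stated assumptions, not the real link layer: the function name and the "data_lost" / "ack_lost" event strings are invented for the example.

```python
def deliver(payloads, link_events):
    """Simplified stop-and-wait delivery with a 1-bit sequence number.

    link_events lists what happens to each transmission attempt:
    "ok", "data_lost" or "ack_lost". Attempts beyond the list are "ok".
    Returns (received, attempts).
    """
    received = []
    attempts = 0
    sn = 0    # sequence bit of the packet currently being sent
    nesn = 0  # sequence bit the receiver expects next
    events = iter(link_events)
    for payload in payloads:
        acked = False
        while not acked:
            attempts += 1
            event = next(events, "ok")
            if event == "data_lost":
                continue  # nothing arrived, so the sender retransmits
            if sn == nesn:
                received.append(payload)  # new packet: accept it
                nesn ^= 1
            # Otherwise it is a retransmitted duplicate and is ignored.
            if event == "ack_lost":
                continue  # sender saw no acknowledgement: retransmit
            acked = True
        sn ^= 1
    return received, attempts


if __name__ == "__main__":
    print(deliver(["a", "b", "c"], ["data_lost", "ok", "ack_lost", "ok"]))
```

Every payload arrives exactly once and in order regardless of the loss pattern; losses only cost extra attempts. In other words, a notification that was successfully queued is delayed by retransmissions rather than dropped on air.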

     

  • Hello again,

    thanks for the follow-up.

    The role of the connection event length is interesting... I had not thought about it.

    At the moment, NRF_SDH_BLE_GAP_EVENT_LENGTH is set to 6 (so 7.5 ms).

    But I want to reiterate this: the same settings on the Server side work fine when I'm notifying another Client device (a typical example is the nRF Connect app running on the phone, or even on Desktop with the same dongle!).

    Tomorrow I'll try to capture a log of the connection between the two devices with the Sniffer.

    Let me know if there's something in particular I should filter out; if not, I'll dump one log for each of the cases (one presenting the NRF_ERROR_RESOURCES error, the other without) to see the difference.

    Thanks,
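As a rough sanity check on the event length figure above, the sketch below converts the setting to milliseconds and estimates how many notification exchanges could fit on air in one connection event. The assumptions are significant: 1.25 ms units (consistent with "6 (so 7.5 ms)"), the 1M PHY at 1 µs per bit, no encryption, a 150 µs inter-frame space, and one empty packet from the central per notification. The function names are illustrative. The result is an upper bound for the air interface only; stack scheduling and host processing are not modelled, and the event length configured on the central side can limit the event as well.

```python
UNIT_MS = 1.25  # assumed unit of the event length setting
T_IFS_US = 150  # inter-frame space between packets
# Preamble (1) + access address (4) + LL header (2) + CRC (3), in bytes.
LL_OVERHEAD_BYTES = 1 + 4 + 2 + 3
# L2CAP header (4) + ATT opcode and handle (3), in bytes.
L2CAP_ATT_OVERHEAD_BYTES = 4 + 3


def units_to_ms(units):
    """Convert a value expressed in 1.25 ms units to milliseconds."""
    return units * UNIT_MS


def exchange_time_us(att_value_len):
    """Air time of one empty central packet plus one notification (1M PHY)."""
    data_us = (LL_OVERHEAD_BYTES + L2CAP_ATT_OVERHEAD_BYTES + att_value_len) * 8
    empty_us = LL_OVERHEAD_BYTES * 8
    return empty_us + T_IFS_US + data_us + T_IFS_US


def max_exchanges(event_length_units, att_value_len):
    """Upper bound on notifications per connection event, air time only."""
    event_us = units_to_ms(event_length_units) * 1000
    return int(event_us // exchange_time_us(att_value_len))


if __name__ == "__main__":
    print(units_to_ms(6), exchange_time_us(20), max_exchanges(6, 20))
```

Under these assumptions a 7.5 ms event has room for well over two 20-byte notifications, so the Server-side event length alone would not explain a limit of one notification per connection event.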

  • Hi,

    davege said:
    thanks for the follow-up.

    No problem at all, I am happy to help!

    davege said:

    Interesting the role of the connection event length...I did not think about it.

    At the moment, NRF_SDH_BLE_GAP_EVENT_LENGTH is set to 6 (so 7.5ms).

    Could you try increasing this by a lot (as a test), to see if it then behaves as you intended?

    davege said:
    But I want to reiterate again this: the same settings on the Server side are working fine when I'm notifying another Client device (typical example is the nRF Connect app running on the phone or even on Desktop with the same dongle!).

    Yes - I see now that I did not address this in your previous reply - this could very well be caused by the 
    Since it is the central that actually determines the connection parameters and drives the parameter negotiation, the behavior might very well be different when connected to different centrals.
    Typically, smartphones have a very rigid set of rules for which connection parameters they allow - you may not set these as you please when using the smartphone's integrated BLE module. This differs from when you are using a central device that you have programmed yourself (or the nRF Connect for Desktop application), since you may then set the parameters yourself, as you please.

    This will be very easy to see in a sniffer trace, since the content of the connection request and parameter negotiation will be viewable.

    davege said:
    Tomorrow I'll try to capture a log of the connection between the two devices with the Sniffer.
    davege said:
    Let me know if there's something in particular I should filter out, if not I'll dump one log for each of the cases (one presenting NRF_ERROR_RESOURCES error, the other without) to see the difference.

    Great, that should make the issue a lot easier to address. Please remember to select your device from the device menu, as shown in the included image, to make the sniffer follow into your connection. If you leave the option at the default ( as shown in the image ) then your sniffer will only listen for advertisements.


    Do not filter anything else out - it is best if I get the unfiltered log. It does not matter if it is thousands of packets, as long as you mention which packets the issue is demonstrated in ( i.e " In the exchange starting at packet number .. and ending at packet number .. you can see the behavior ... ", or similar ).

    Looking forward to resolving this issue together!

    Best regards,
    Karl
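Since the central decides the connection parameters, it can help to sanity check the values a test script requests before connecting. The sketch below is a standalone validator based on the usual BLE ranges (connection interval 7.5 ms to 4 s in 1.25 ms steps, peripheral latency 0 to 499, supervision timeout 100 ms to 32 s, and a timeout longer than twice the effective interval). The function name is illustrative and it does not call pc-ble-driver-py.

```python
def check_conn_params(interval_ms, latency, timeout_ms):
    """Return a list of problems with the requested connection parameters."""
    problems = []
    if not 7.5 <= interval_ms <= 4000:
        problems.append("connection interval outside 7.5 ms .. 4 s")
    if interval_ms % 1.25 != 0:
        problems.append("connection interval is not a multiple of 1.25 ms")
    if not 0 <= latency <= 499:
        problems.append("latency outside 0 .. 499 events")
    if not 100 <= timeout_ms <= 32000:
        problems.append("supervision timeout outside 100 ms .. 32 s")
    # The link must survive (1 + latency) skipped intervals, with margin.
    if timeout_ms <= (1 + latency) * interval_ms * 2:
        problems.append("supervision timeout too short for interval/latency")
    return problems


if __name__ == "__main__":
    # 15 ms interval as used in this thread; the 4 s timeout is an example.
    print(check_conn_params(15, 0, 4000))
```

The parameters that were actually negotiated are best confirmed in the sniffer trace, since the central may apply different values from the ones requested.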
