
pc-ble-driver-py missing high-rate notifications

Hello,

whilst implementing my embedded device software, I'm developing a tool running on a Windows laptop to validate my work.

The idea is to use the pc-ble-driver-py Python bindings to implement simple interactions with the device over BLE, using the services it exposes.

The current setup is:

  • pc-ble-driver-py 0.15.0 (latest)
  • nRF52840 Dongle (PCA10059) running connectivity FW 4.1.2 SD API v5 (latest supported)

Most of the functionality seems to work fine.

But when I try to collect notifications above a certain rate, it looks like the system is not able to "catch up".

To give some context: on the embedded device, I'm polling data and queuing notifications on a certain characteristic every 10 ms. The connection interval is set to 15 ms.

The moment I enable notifications on this characteristic from the client side (after the wanted connection interval has been negotiated), sd_ble_gatts_hvx on the server side starts returning the error code NRF_ERROR_RESOURCES. It doesn't happen on every call, but very often, at least 10-20 times per second.
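
For reference, the client script is essentially the heart-rate collector example from the repository; below is a minimal sketch of the relevant part (the serial port and the connection handle are placeholders, and the standard heart-rate UUID here stands in for my custom characteristic):

# Minimal sketch of the collector script, adapted from the heart_rate_collector
# example in the repository. Placeholders: serial port, connection handle
# (normally taken from on_gap_evt_connected), and the characteristic UUID.
from pc_ble_driver_py import config
config.__conn_ic_id__ = "NRF52"  # must be set before importing ble_driver / ble_adapter

import time
from pc_ble_driver_py.ble_driver import BLEDriver, BLEUUID
from pc_ble_driver_py.ble_adapter import BLEAdapter
from pc_ble_driver_py.observers import BLEDriverObserver, BLEAdapterObserver


class Collector(BLEDriverObserver, BLEAdapterObserver):
    def __init__(self, adapter):
        self.adapter = adapter
        self.last_rx = None
        self.adapter.observer_register(self)
        self.adapter.driver.observer_register(self)

    def on_notification(self, ble_adapter, conn_handle, uuid, data):
        # Called once per notification forwarded by the connectivity firmware;
        # I log the inter-arrival time of the notifications here
        now = time.time()
        if self.last_rx is not None:
            print("{} bytes, interval {:.1f} ms".format(len(data), (now - self.last_rx) * 1000.0))
        self.last_rx = now


driver = BLEDriver(serial_port="COM5", auto_flash=False)  # placeholder port
adapter = BLEAdapter(driver)
collector = Collector(adapter)
driver.open()
driver.ble_enable()

# Scanning and connecting are done as in the example; once connected and the
# 15 ms connection interval is in place, the script subscribes:
conn_handle = 0  # placeholder, comes from on_gap_evt_connected
adapter.service_discovery(conn_handle)
adapter.enable_notification(conn_handle, BLEUUID(BLEUUID.Standard.heart_rate))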

The only explanation I can come up with is that the central (i.e. the Python script) is not servicing every connection event at the fixed 15 ms interval, so the notification queue on the server keeps growing until it hits its limit.

I tested this scenario with different platforms (Windows UWP, Android), and this is the only case where I see this behaviour.

What I find rather odd is that with exactly the same dongle (and the same FW), driven from the nRF Connect for Desktop GUI, this is NOT happening: I can subscribe to the notifications and receive them at the rate I expect.

Adding a couple of graphs to make things clearer:

Notifications Intervals (ms) from nRF Connect Desktop:

Notifications Intervals (ms) from pc-ble-driver-py:

It's clear that in the first case two notifications are sometimes very close together, since the queuing rate is higher than the connection interval (i.e. you get more than one notification per connection interval).

In the second case, instead, the interval is never below 15 ms, meaning that notifications are definitely being lost (I imagine due to missed connection events...).

I tried to do the same but replaced the dongle with an nRF52 DK. In this case the system crashes and disconnects a couple of seconds after enabling the notifications, with this log:

2020-10-20 15:17:21,194 [21092/LogThread] h5_decode error, code: 0x802c, H5 error count: 1. raw packet: c0 f0 0e 02 00 02 39 00 00 00 00 00 00 00 1c 00 01 12 00 f7 03 71 03 d0 02 de 02 fe 02 0e c0 

As a quick workaround, if I queue notifications every 20 ms (or anything above the connection interval), I don't see any problem and the system works flawlessly.

Can you help me understand what's going on?

I'm evaluating these Python libraries for the system testing framework of our devices, and of course we'll also need to cover high-rate notifications.

I think reproducing the issue should be quite trivial, but maybe there is something I'm missing that would solve this straight away.

Thanks!

  • Hello,

    sd_ble_gatts_hvx on the server side starts returning the error code NRF_ERROR_RESOURCES. It doesn't happen on every call, but very often, at least 10-20 times per second.

    An excerpt from the sd_ble_gatts_hvx API Reference reads:

    NRF_ERROR_RESOURCES Too many notifications queued. Wait for a BLE_GATTS_EVT_HVN_TX_COMPLETE event and retry.

    I suspect in this case that you are indeed queueing notifications faster than you are able to send them, filling the available TX queue. My suspicion is strengthened when you say you are queueing a notification every 10 ms, with a connection interval of 15 ms. If you expect that some notifications might require a retransmit then we may also try to increase the TX queue to resolve the issue ( as long as notifications are being sent faster than they are queued on average ).

    The only explanation I can come up with is that the central (i.e. the Python script) is not servicing every connection event at the fixed 15 ms interval, so the notification queue on the server keeps growing until it hits its limit.

    This is a good consideration - and there are multiple reasons why this might be the case, for example if you are communicating in an environment with massive 2.4 GHz interference, or over a very long range. I would however first look at the connection parameters, and how often notifications are sent successfully.

    Are you familiar with the nRF Sniffer tool? It is a powerful tool when developing with BLE, which lets you monitor the on-air BLE traffic. You could use this to check whether or not one of your devices is skipping some connection events, once we have exhausted the connection-parameter approach.

    notifications are definitely being lost

    BLE connections are loss-less - if a packet is not ACK'd, it is retransmitted.

    I have a feeling I might have misunderstood your situation and issue, in which case a sniffer trace from the nRF Sniffer would be very helpful to see the whole picture.
    Please do not hesitate to let me know if I have misunderstood your description, or if any part of my reply should be unclear.

    Looking forward to resolving this issue together!

    Best regards,
    Karl

  • Hi Karl,

    thanks for helping out!

    I suspect in this case that you are indeed queueing notifications faster than you are able to send them, filling the available TX queue. My suspicion is strengthened when you say you are queueing a notification every 10 ms, with a connection interval of 15 ms. If you expect that some notifications might require a retransmit then we may also try to increase the TX queue to resolve the issue ( as long as notifications are being sent faster than they are queued on average ).

    I'm not sure I follow here.
    There's an internal buffer for notifications on the server side (the embedded device).
    So if on the server I'm queuing data every 10 ms with a 15 ms interval, I simply expect the client to be notified with up to two notifications per connection event, which should be acceptable.
    BTW, the packets are just around 20 bytes, whilst the MTU is set to 131 bytes (Data Length Extension 135 bytes).
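
    Just to put numbers on it, here is the back-of-the-envelope arithmetic on the values above (nothing library-specific):

    # Back-of-the-envelope check of the rates above (no library involved).
    queue_period_ms = 10    # the server queues one notification every 10 ms
    conn_interval_ms = 15   # negotiated connection interval
    payload_bytes = 20      # approximate notification payload
    att_header_bytes = 3    # ATT opcode (1 byte) + attribute handle (2 bytes)

    avg_per_event = conn_interval_ms / queue_period_ms         # 1.5 notifications per event on average
    worst_case_bytes = 2 * (payload_bytes + att_header_bytes)  # 46 bytes when two notifications pile up

    print(avg_per_event, worst_case_bytes)
    # -> 1.5 46: at most two small PDUs per 15 ms event, comfortably within the 131-byte MTU,
    #    provided the central actually accepts more than one packet per connection event.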

    Of course, when I get NRF_ERROR_RESOURCES I could just retry the send, but that doesn't remove the fundamental problem.

    This is a good consideration - and there are multiple reasons why this might be the case, for example if you are communicating in an environment with massive 2.4 GHz interference, or over a very long range. I would however first look at the connection parameters, and how often notifications are sent successfully.

    What puzzles me is that exactly the same HW configuration and environment (but using nRF Connect for Desktop as the notification collector) works without any problem. I'm transmitting from a few centimetres away, so I don't think the issue is there.
    I believe the interval graphs I attached in the original post are quite descriptive in this sense.


    This makes me think the issue is on the client side rather than in the way the server queues notifications, since the other tests I did with different clients collecting the notifications don't show the same issue.

    Is it clearer now?

    BLE connections are loss-less - if a packet is not ACK'd, it is retransmitted.

    Is this true for notifications as well? I know indications work differently. Please let me know if there's any documentation you'd suggest for understanding these details at the protocol level :)

    As for the sniffer - that is definitely the next step for this.

    I'd just need a bit of time to set it up, as I've never done it before, but then I can probably provide some more info.

    Are there any other quick tests you'd suggest to at least isolate the problem to a particular area?

    Thanks!

  • Hello,


    Sorry for my late reply.

    davege said:
    apparently we've been writing simultaneously

    Yes, it appears we did! :)

    davege said:
    Exactly, that's the mystery to solve!

    Indeed it is, this is most strange. I would assume it might stem from the radio being busy with other tasks ( and thus not following up on the More Data bit ), but something seems amiss in this case.

    davege said:

    It is all very standard, as I just slightly modified the example code for the heart rate that I've found in the repository.

    At HW/FW level current setup is:

    • pc-ble-driver-py 0.15.0 (latest)
    • nRF52840 Dongle (PCA10059) running connectivity FW 4.1.2 SD API v5 (latest supported)

    The python script is just subscribing for notifications (on the selected custom service/char in my case, but I believe it could be whatever, even Nordic UART) and printing that line you've seen in the log with the data received.

    I believe there's no scanning active after the connection to the peripheral.

    Thank you for specifying! This is very helpful for me to know.

    davege said:
    Now wondering if this will keep scanning active even after I receive on_gap_evt_connected callback...
    davege said:
    I'm calling this API:

    What is the scan timeout parameter that you pass to your scan_start? Is it possible that this is set to indefinite?
    Looking through the unmodified example code, it seems that the scanning is not stopped on the CONNECTED event, but is rather left to time out.

    I have reached out to the developers of the pc-ble-driver-py to ask if they have ever heard of similar behavior. I will get back to you as soon as they reply.
    I will create a small example to try and reproduce this on my end, using the connection parameters and setup you described.

    Best regards,
    Karl

  • Hi,

    great to know you are setting up something quick to reproduce the issue.

    What is the scan timeout parameter that you pass to your scan_start? Is it possible that this is set to indefinite?
         
    This is the full call with the params:
    scan_duration = 5  # seconds
    params = BLEGapScanParams(interval_ms=200, window_ms=150, timeout_s=scan_duration)  # finite scan timeout
    self.adapter.driver.ble_gap_scan_start(scan_params=params)  # scanning should stop after timeout_s
    I'm not aware of a way to scan forever, but usually with the nRF5 SDK on the embedded side this happens when the duration is set to 0.
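
    Just to be safe, I could also try stopping the scan explicitly once the connection is up, along these lines - note that I'm assuming the driver exposes a ble_gap_scan_stop() wrapper around sd_ble_gap_scan_stop, which I haven't verified in 0.15.0:

    # Hypothetical safeguard (untested): stop scanning explicitly once connected.
    # Assumes BLEDriver exposes ble_gap_scan_stop(), a thin wrapper around
    # sd_ble_gap_scan_stop - I have not verified the method name in 0.15.0.
    from pc_ble_driver_py.observers import BLEDriverObserver


    class ScanStopper(BLEDriverObserver):
        # register an instance with driver.observer_register(), next to the existing observers
        def on_gap_evt_connected(self, ble_driver, conn_handle, peer_addr, role, conn_params):
            try:
                ble_driver.ble_gap_scan_stop()
            except Exception:
                # an invalid-state error is expected here if the scan has already timed out
                pass
            print("Connected, conn_handle: {}".format(conn_handle))
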
    One of the problems I have with these Python bindings is that they are not documented, so sometimes I'm just guessing that I'm doing the right thing, based on the familiarity I have with the rest of the Nordic offering.
    Hopefully with your help I'll find out whether I'm doing something wrong :)
    Thanks,
    D
  • Hi Karl,

    is there any progress on this?

    Let me know if I can help, or if for some reason it was not possible to reproduce the problem on your end.

    Thanks,

    D

  • Hello again Davege,

    I am terribly sorry for the very long delay in communications from my side on this.

    I have allocated time today to work through this issue. I will update you by the end of the day at the latest.
    I am again terribly sorry for the long delay on this issue.

    Best regards,
    Karl

  • Hello Davege,

    Thank you for your patience.
    I have today successfully replicated the issue with the setup you described on my end.
    I have allocated time tomorrow to keep working on resolving the issue.

    Best regards,
    Karl

  • Hello again Davege,

    Know that I have not forgotten this issue, and I am still working on it.
    Unfortunately, this issue has taken longer than I had anticipated to resolve.

    I will update you as soon as I have something.

    Best regards,
    Karl

  • Hi,

    thanks a lot for your effort in this.

    Do not worry at all, as long as I know there's some activity to understand what's happening, I'm good.

    Looking forward to hearing any news.

    Best,

    D

  • Hello davege,

    Thank you for your continued patience. I was unfortunately out of office for some days again.

    I have continued testing today, and can now at least rule out that the issue is with the UART transport layer.
    I have also spoken with one of the developers of pc-ble-driver-py today, and he will check if this could be caused by a limited RX buffer in the precompiled connectivity firmware.
    I will update you as soon as he gets back to me about this.

    davege said:
    thanks a lot for your effort in this.

    It is no problem at all davege, I am happy to help!

    Best regards,
    Karl

  • Hi Karl,

    thanks again for your update.

    In my opinion, it is very unlikely that the problem is in the connectivity firmware running on the DK/dongle.

    The reason is that exactly the same dongle running the same FW does not show the issue when subscribing to notifications on the characteristic via nRF Connect for Desktop.

    My feeling is that something goes wrong during the configuration of the board... possibly some settings hardcoded in the serialisation layer that transports data between the DK and the script, applied when the link is opened.

    Thanks again for keeping me posted, I'm really curious to understand what's happening :)

    Best,

    D

  • Hello again Davege,

    davege said:

    In my opinion, it is very unlikely that the problem is in the connectivity firmware running on the DK/dongle.

    The reason is that exactly the same dongle running the same FW does not show the issue when subscribing to notifications on the characteristic via nRF Connect for Desktop.

    I fully agree with this reasoning.

    davege said:
    My feeling is that something goes wrong during the configuration of the board... possibly some settings hardcoded in the serialisation layer that transports data between the DK and the script, applied when the link is opened.

    I too think that this might be the issue here. I have done some more testing, and it seems that the default ( used in the HRM collector example ) RX/TX window of the pc-ble-driver-py is 1310 µs, compared to the pc-ble-driver, which defaults to 2120 µs. Both of these windows are big enough for multiple packets, so I do not immediately see this as the reason for the single-packet events of the pc-ble-driver-py. However, I have noted it as the main difference in the connection configuration between the two drivers, in the notes I have made for the developers.
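
    If you would like to experiment with this from the script side in the meantime, the connection event length should be configurable before ble_enable() is called. The following is only a rough sketch: the conn_gatt / att_mtu part mirrors the heart-rate collector example, while BLEConfig.conn_gap and BLEConfigConnGap(event_length=...) are assumptions on my side that I have not verified against 0.15.0, so please treat those names as unconfirmed:

    # Rough sketch: request a longer connection event before enabling the stack.
    # The conn_gatt / att_mtu configuration mirrors the heart_rate_collector example;
    # BLEConfig.conn_gap and BLEConfigConnGap(event_length=...) (1.25 ms units) are
    # assumptions that I have not verified against pc-ble-driver-py 0.15.0.
    from pc_ble_driver_py import config
    config.__conn_ic_id__ = "NRF52"

    from pc_ble_driver_py.ble_driver import (BLEDriver, BLEConfig,
                                             BLEConfigConnGap, BLEConfigConnGatt)
    from pc_ble_driver_py.ble_adapter import BLEAdapter

    driver = BLEDriver(serial_port="COM5", auto_flash=False)  # placeholder port
    adapter = BLEAdapter(driver)
    driver.open()

    # Let the link layer use (almost) the whole 15 ms interval for one connection event
    driver.ble_cfg_set(BLEConfig.conn_gap, BLEConfigConnGap(event_length=12))  # 12 * 1.25 ms = 15 ms
    driver.ble_cfg_set(BLEConfig.conn_gatt, BLEConfigConnGatt(att_mtu=131))    # matches your 131-byte MTU
    driver.ble_enable()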

    I have created an internal ticket on this with the developers of the pc-ble-driver-py, and I hope to hear from them soon on this issue.
    Unfortunately, I am unable to estimate exactly when they will have time to review the case.
    Is this issue a blocker for your current development? If so, is there an option for you to use the pc-ble-driver instead, for the time being?
    Reviewing the forum for mentions of the pc-ble-driver-py's performance I have come across some posts claiming that it is significantly slower ( lower throughput ) than the pc-ble-driver. I guess this is somewhat to be expected, since the pc-ble-driver-py is an abstraction on top of the existing pc-ble-driver, but nevertheless it is something to consider.

    davege said:
    Thanks again for keeping me posted, I'm really curious to understand what's happening :)

    It is no problem at all - I too am curious to see what the root cause of this behavior turns out to be! :)

    Best regards,
    Karl
