
pc-ble-driver-py missing high-rate notifications

Hello,

While implementing my embedded device's software, I'm developing a tool that runs on a Windows laptop to validate my work.

The idea is to use the pc-ble-driver-py Python bindings to implement simple interactions with the device over BLE, using the exposed services.

The current setup is:

  • pc-ble-driver-py 0.15.0 (latest)
  • nRF52840 Dongle (PCA10059) running connectivity FW 4.1.2 SD API v5 (latest supported)

Most of the functionality seems to work fine.

However, when I try to collect notifications above a certain rate, it looks like the system cannot keep up.

To give some context: on the embedded device, I'm polling data and queuing notifications on a certain characteristic every 10 ms. The connection interval is set to 15 ms.

The moment I enable notifications on this characteristic from the Client side (after the desired connection parameters have been applied), sd_ble_gatts_hvx starts returning the error code NRF_ERROR_RESOURCES on the Server side. It's not every call, but it happens very often, at least 10-20 times per second.

The only explanation I can come up with is that the Central (i.e. the Python script) is not servicing every connection event at the fixed 15 ms interval, so the notification queue on the Server grows until it reaches its limit.

I tested this scenario with different platforms (Windows UWP, Android), and this is the only case where I see this behaviour.

What I find rather odd is that with exactly the same dongle (and the same FW), this is NOT happening from the nRF Connect Desktop app GUI: I can subscribe to the notifications and receive them at the rate I expect.

Adding a couple of graphs to make things clearer:

Notifications Intervals (ms) from nRF Connect Desktop:

Notifications Intervals (ms) from pc-ble-driver-py:

It's clear that in the first case two notifications sometimes arrive very close together, since the queuing rate is higher than the connection interval (i.e. you get more than one notification per connection event).

In the second case, instead, the interval never drops below 15 ms, meaning that notifications are certainly lost (I imagine due to missed connection events...).
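
To put some numbers on this, here is a quick back-of-the-envelope check (plain Python; the 10 ms / 15 ms figures are from my setup above, and the one-notification-per-event assumption is what the second graph suggests):

    # Production vs. drain rate, assuming the Client only collects one
    # notification per connection event (as the second graph suggests).
    queue_period_s = 0.010               # Server queues a notification every 10 ms
    conn_interval_s = 0.015              # connection interval

    produced_per_s = 1 / queue_period_s  # 100 notifications/s
    drained_per_s = 1 / conn_interval_s  # ~66.7 notifications/s

    deficit_per_s = produced_per_s - drained_per_s
    print(deficit_per_s)                 # ~33.3/s accumulate on the Server, so its
                                         # TX queue fills up and sd_ble_gatts_hvx
                                         # starts returning NRF_ERROR_RESOURCES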

I tried the same thing, but replacing the dongle with an nRF52-DK. In this case the system crashes and disconnects a couple of seconds after notifications are enabled, with this log:

2020-10-20 15:17:21,194 [21092/LogThread] h5_decode error, code: 0x802c, H5 error count: 1. raw packet: c0 f0 0e 02 00 02 39 00 00 00 00 00 00 00 1c 00 01 12 00 f7 03 71 03 d0 02 de 02 fe 02 0e c0 

As a quick workaround, if I queue notifications every 20 ms (or anything above the connection interval), I don't see any problem and the system works flawlessly.

Can you help me understand what's going on?

I'm evaluating this Python library for the system-testing framework of our devices - and of course we'll also need to cover high-rate notifications.

I think reproducing the issue would be quite trivial, but maybe there is something I'm missing that can solve this straightaway.

Thanks!

  • Hello,

    sd_ble_gatts_hvx starts returning the error code NRF_ERROR_RESOURCES on the Server side. It's not every call, but it happens very often, at least 10-20 times per second.

    An excerpt from the sd_ble_gatts_hvx API Reference reads:

    NRF_ERROR_RESOURCES Too many notifications queued. Wait for a BLE_GATTS_EVT_HVN_TX_COMPLETE event and retry.

    I suspect that you are indeed queueing notifications faster than you are able to send them, filling the available TX queue. My suspicion is strengthened by the fact that you are queueing a notification every 10 ms with a connection interval of 15 ms. If you expect that some notifications might require a retransmit, we may also try to increase the TX queue size to resolve the issue (as long as notifications are, on average, sent faster than they are queued).
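
    To illustrate the pattern the API Reference describes, here is a minimal sketch in plain Python (pseudocode for the Server-side logic, not the actual SoftDevice or pc-ble-driver-py API; hvx_send and TX_QUEUE_SIZE are stand-ins for sd_ble_gatts_hvx and the configured hvn_tx_queue_size):

        from collections import deque

        TX_QUEUE_SIZE = 8   # stand-in for the configured hvn_tx_queue_size
        tx_in_flight = 0    # notifications handed to the stack, not yet sent
        pending = deque()   # application-side backlog

        def hvx_send(payload):
            # Stand-in for sd_ble_gatts_hvx(): returns False when the TX
            # queue is full, i.e. the NRF_ERROR_RESOURCES case.
            global tx_in_flight
            if tx_in_flight >= TX_QUEUE_SIZE:
                return False
            tx_in_flight += 1
            return True

        def queue_notification(payload):
            pending.append(payload)
            flush()

        def flush():
            # Hand payloads to the stack until it reports a full TX queue.
            while pending and hvx_send(pending[0]):
                pending.popleft()

        def on_hvn_tx_complete(count):
            # BLE_GATTS_EVT_HVN_TX_COMPLETE: `count` notifications went on
            # air and were ACK'd, so there is room again - retry the backlog.
            global tx_in_flight
            tx_in_flight -= count
            flush()

    As long as notifications are, on average, drained faster than they are queued, the backlog stays bounded and nothing is dropped.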

    The only explanation I can come up with is that the Central (i.e. the Python script) is not servicing every connection event at the fixed 15 ms interval, so the notification queue on the Server grows until it reaches its limit.

    This is a good consideration - and there are multiple reasons why this might be the case, for example if you are communicating in an environment with massive 2.4 GHz interference, or over a very long range. I would however first look at the connection parameters, and how often notifications are sent successfully.

    Are you familiar with the nRF Sniffer tool? It is a powerful tool when developing with BLE, which lets you monitor the on-air BLE traffic. You could use this to check whether or not one of your devices is skipping connection events, once we have exhausted the connection-parameter approach.

    meaning that notifications are certainly lost

    BLE connections are lossless - if a packet is not ACK'd, it is retransmitted.

    I have a feeling I might have misunderstood your situation and issue, in which case a sniffer trace from the nRF Sniffer would be very helpful to see the whole picture.
    Please do not hesitate to let me know if I have misunderstood your description, or if any part of my reply should be unclear.

    Looking forward to resolving this issue together!

    Best regards,
    Karl

  • Hi,

    thanks for helping out!

    I suspect that you are indeed queueing notifications faster than you are able to send them, filling the available TX queue. My suspicion is strengthened by the fact that you are queueing a notification every 10 ms with a connection interval of 15 ms. If you expect that some notifications might require a retransmit, we may also try to increase the TX queue size to resolve the issue (as long as notifications are, on average, sent faster than they are queued).

    I'm not sure I follow here.
    There's an internal buffer for notifications on the Server side (the embedded device).
    So if on the Server I'm queuing data every 10 ms with a 15 ms interval, I simply expect the Client to receive up to 2 notifications per connection event, which should be acceptable.
    BTW, the packets are just around 20 bytes, whilst the MTU is set to 131 bytes (Data Length Extension: 135 bytes).
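
    Just to double-check that packet size is not the limit, a quick sketch (the 3-byte ATT notification header and 4-byte L2CAP header are standard protocol overhead; the other numbers are from my setup):

        ATT_MTU = 131           # negotiated ATT MTU
        LL_PAYLOAD_MAX = 135    # with Data Length Extension (ATT_MTU + 4 B L2CAP header)
        value_len = 20          # application data per notification

        att_pdu = 1 + 2 + value_len   # opcode + attribute handle + value = 23 B
        ll_payload = 4 + att_pdu      # L2CAP length + CID header       = 27 B

        assert att_pdu <= ATT_MTU             # fits in a single notification
        assert ll_payload <= LL_PAYLOAD_MAX   # fits in a single link-layer packet

    So each notification fits comfortably in one link-layer packet; raw bandwidth should not be the limit here.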

    Of course, when I get the error NRF_ERROR_RESOURCES I could simply retry sending, but that does not remove the fundamental problem.

    This is a good consideration - and there are multiple reasons why this might be the case, for example if you are communicating in an environment with massive 2.4 GHz interference, or over a very long range. I would however first look at the connection parameters, and how often notifications are sent successfully.

    What puzzles me is that exactly the same HW configuration and environment (but using nRF Connect Desktop as the notifications collector) works without any problem. I'm transmitting from a few centimetres away, so I don't think the issue is there.
    I believe the interval graphs I attached in the original post are quite descriptive in this sense.


    This makes me think the issue is on the Client side rather than in the way the Server queues notifications, as the other tests I did with different clients collecting the notifications do not show the same issue.

    Is it clearer now?

    BLE connections are lossless - if a packet is not ACK'd, it is retransmitted.

    Is this also true for notifications? I know indications work differently. Please let me know if there's any documentation you'd suggest for understanding these details at the protocol level :)

    As for the sniffer - that is definitely the next step for this.

    I'd just need a bit of time to set it up, as I've never done it before, but then I can probably provide some more info.

    Are there any other quick tests you'd suggest to at least isolate the problem to a particular area?

    Thanks!

  • Hi,

    is there any progress on this?

    Let me know if I can help, or if for some reason it was not possible to reproduce the problem on your end.

    Thanks,

    D

  • Hello again Davege,

    I am terribly sorry for the very long delay in communications from my side on this.

    I have allocated time today to work through this issue. I will update you by the end of the day at the latest.
    I am again terribly sorry for the long delay on this issue.

    Best regards,
    Karl

  • Hello Davege,

    Thank you for your patience.
    I have today successfully replicated the issue with the setup you described on my end.
    I have allocated time tomorrow to keep working on resolving the issue.

    Best regards,
    Karl

  • Hello again Davege,

    Know that I have not forgotten this issue, and I am still working on it.
    Unfortunately, this issue has taken longer than I had anticipated to resolve.

    I will update you as soon as I have something.

    Best regards,
    Karl

  • Hi,

    thanks a lot for your effort in this.

    Do not worry at all, as long as I know there's some activity to understand what's happening, I'm good.

    Looking forward to hearing any news.

    Best,

    D

  • Hello davege,

    Thank you for your continued patience. I was unfortunately out of office for some days again.

    I have continued testing today, and can now at least rule out that the issue is with the UART transport layer.
    I have also spoken with one of the developers of pc-ble-driver-py today, and he will check if this could be caused by a limited RX buffer in the precompiled connectivity firmware.
    I will update you as soon as he gets back to me about this.

    davege said:
    thanks a lot for your effort in this.

    It is no problem at all davege, I am happy to help!

    Best regards,
    Karl

  • Hi,

    thanks again for your update.

    In my opinion, it is very unlikely that the problem is in the connectivity firmware running on the DK/dongle.

    The reason is that exactly the same dongle, running the same FW, does not show the issue when subscribing to notifications on the characteristic via nRF Connect Desktop.

    My feeling is that something goes wrong during the configuration of the board... possibly some settings hardcoded in the serialisation layer that transports data between the DK and the script, when the link is opened.

    Thanks again for keeping me posted, I'm really curious to understand what's happening :)

    Best,

    D

  • Hello again Davege,

    davege said:

    In my opinion, it is very unlikely that the problem is in the connectivity firmware running on the DK/dongle.

    The reason is that exactly the same dongle, running the same FW, does not show the issue when subscribing to notifications on the characteristic via nRF Connect Desktop.

    I fully agree with this reasoning.

    davege said:
    My feeling is that something goes wrong during the configuration of the board... possibly some settings hardcoded in the serialisation layer that transports data between the DK and the script, when the link is opened.

    I too think that this might be the issue here. I have done some more testing, and it seems that the default RX/TX window of pc-ble-driver-py (as used in the HRM collector example) is 1310 µs, compared to pc-ble-driver, which defaults to 2120 µs. However, both of these windows are big enough for multiple packets, so I do not immediately see this as the reason for the single-packet events of pc-ble-driver-py. I have nevertheless noted it as the main difference in the connection configuration between the two drivers, in the notes I have made for the developers.

    I have created an internal ticket on this with the developers of the pc-ble-driver-py, and I hope to hear from them soon on this issue.
    Unfortunately, I am unable to estimate when they will have time to review the case exactly.
    Is this issue a blocker for your current development? If so, is there an option for you to use the pc-ble-driver instead for the time being?
    Reviewing the forum for mentions of pc-ble-driver-py's performance, I have come across some posts claiming that it is significantly slower (lower throughput) than pc-ble-driver. I guess this is somewhat to be expected, since pc-ble-driver-py is an abstraction on top of the existing pc-ble-driver, but it is nevertheless something to consider.

    davege said:
    Thanks again for keeping me posted, I'm really curious to understand what's happening :)

    It is no problem at all - I too am curious to see what the root cause of this behaviour turns out to be! :)

    Best regards,
    Karl

  • Hi,

    thanks for this.

    Unfortunately, neither I nor my team currently has the capacity to move the testing-framework implementation to pc-ble-driver, which I used in the past and can confirm works great.

    So I'll just wait here and hope to see some movement.

    Do you believe that marking this ticket as private can be useful to push things a little bit?

    My company is using Nordic chips for different products, so we are already in contact.

    Thanks,

    D

  • Hello again Davege,

    davege said:

    Unfortunately, neither I nor my team currently has the capacity to move the testing-framework implementation to pc-ble-driver, which I used in the past and can confirm works great.

    So I'll just wait here and hope to see some movement.

    I totally understand, no worries at all. We will get to the bottom of this.

    This might seem trivial, but could you add the following modification to your code, to see if it allows you to receive multiple notification packets per connection event?
    If you are using the heart_rate_collector example, this code block should be inserted in the collector class's open function, in the NRF52 section.
    It should look like this:

        def open(self):
            self.adapter.driver.open()
            if config.__conn_ic_id__.upper() == "NRF51":
                self.adapter.driver.ble_enable(
                    BLEEnableParams(
                        vs_uuid_count=1,
                        service_changed=0,
                        periph_conn_count=0,
                        central_conn_count=1,
                        central_sec_count=0,
                    )
                )
            elif config.__conn_ic_id__.upper() == "NRF52":
                gatt_cfg = BLEConfigConnGatt()
                gatt_cfg.att_mtu = self.adapter.default_mtu
                gatt_cfg.tag = CFG_TAG
                self.adapter.driver.ble_cfg_set(BLEConfig.conn_gatt, gatt_cfg)
    
                conn_cfg = BLEConfigConnGap()
                conn_cfg.conn_count = 1
                # event_length is in 1.25 ms units (SoftDevice convention);
                # 320 => 400 ms, i.e. more than the whole 15 ms interval
                conn_cfg.event_length = 320
                self.adapter.driver.ble_cfg_set(BLEConfig.conn_gap, conn_cfg)
    
                self.adapter.driver.ble_enable()

    You will also need to add the BLEConfigConnGap class to the list of globals imported from the driver, which should then look like this:
        global config, BLEDriver, BLEAdvData, BLEEvtID, BLEAdapter, BLEEnableParams, BLEGapTimeoutSrc, BLEUUID, BLEConfigCommon, BLEConfig, BLEConfigConnGatt, BLEConfigConnGap, BLEGapScanParams
        from pc_ble_driver_py import config
    
        config.__conn_ic_id__ = conn_ic_id
        # noinspection PyUnresolvedReferences
        from pc_ble_driver_py.ble_driver import (
            BLEDriver,
            BLEAdvData,
            BLEEvtID,
            BLEEnableParams,
            BLEGapTimeoutSrc,
            BLEUUID,
            BLEGapScanParams,
            BLEConfigCommon,
            BLEConfig,
            BLEConfigConnGatt,
            BLEConfigConnGap,
        )



    The only difference here is that the default event_length parameter is not used; instead, the maximum event_length for the given connection interval is set explicitly.
    If you could test this with your own test-script and let me know if you succeed in getting multiple packets per event, that would be great.

    If it does not enable you to send multiple notifications per connection event, could you provide a trace of the test, with the code above added?

    I will continue to look into this to find the required event_length analytically - instead of just setting it to the maximum value.
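
    To make that analytical approach concrete, here is the arithmetic I have in mind (a sketch; event_length follows the SoftDevice convention of 1.25 ms units, and the per-packet timing is a rough 1M PHY estimate, not a measured value):

        import math

        conn_interval_ms = 15.0   # the connection interval used in the tests

        # Rough 1M PHY cost of one notification exchange: data packet (10 B
        # link-layer overhead + 27 B payload at 8 us/byte), T_IFS, empty ACK,
        # T_IFS - roughly 676 us in total.
        exchange_us = (10 + 27) * 8 + 150 + 10 * 8 + 150

        packets_per_event = 2     # what queuing every 10 ms at a 15 ms interval requires
        needed_units = math.ceil(packets_per_event * exchange_us / 1250)  # -> 2 (2.5 ms)

        # The largest useful value is the whole connection interval:
        max_units = int(conn_interval_ms / 1.25)                          # -> 12 (15 ms)
        print(needed_units, max_units)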

    davege said:
    Do you believe that marking this ticket as private can be useful to push things a little bit?

    No, this will not affect the internal request's priority - the only difference between private tickets and public ones is who may view them.
    Private tickets are of course only viewable by the ticket's creator and the Technical Support staff here at Nordic.

    Looking forward to hearing the results of your test!

    Best regards,
    Karl
