Packet loss during ISO transmission when advertising

Hi,

Bug:

With a connected ISO connection between two devices X and Y:

  • X acts as a peripheral. It is the ISO server and transmits ISO data to Y.
  • Y is multirole. It connects to X and establishes an ISO connection to receive ISO data.

As long as Y only receives data, the communication is stable (no packet loss). As soon as Y advertises (bt_le_adv_start), Y does not receive all packets.

  • Increasing the advertising interval decreases packet loss.
  • Decreasing the advertising interval increases packet loss.

Environment:

  • OS: Linux
  • Toolchain: zephyr-sdk-0.16.8
  • Board: nRF5340 DK
  • SDK: NCS 2.7 (sdk-nrf v2.7.0 and zephyr v3.6.99-ncs2)

How can I solve this issue? It causes too much packet loss.

Thank you

Parents
  • Hi,

    When advertising while in an ISO connection, there will be collisions from time to time, and in that case the stack prioritizes the advertising packets. To reduce ISO packet loss, you could increase the number of ISO retransmissions and ensure you use an advertising interval that is as high as possible.

  • Hello,

    We are migrating from NCS 2.1 with Packet Craft firmware (CPU NET) and we did not have this problem (at least it was not so significant).

    With the configuration described in my first message, I advertised using different parameters (provided to bt_le_adv_start):

    • No advertising: No packet loss => 0% loss
    • BT_GAP_ADV_FAST_INT_MIN_1 (30 ms) and BT_GAP_ADV_FAST_INT_MAX_1 (60 ms): 568 packets lost (out of 3425) => ~17% loss
    • BT_GAP_ADV_FAST_INT_MIN_2 (100 ms) and BT_GAP_ADV_FAST_INT_MAX_2 (150 ms): 226 packets lost (out of 3400) => ~7% loss
    • BT_GAP_ADV_SLOW_INT_MIN (1 s) and BT_GAP_ADV_SLOW_INT_MAX (1.2 s): 21 packets lost (out of 2800) => ~1% loss

    Looking at these numbers it seems that the advertising almost always causes an ISO packet loss.
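
    As a rough cross-check of the observation above, the loss figures can be compared with the number of advertising events in each run. This is a minimal sketch, not a measurement: it assumes one ISO packet per 10 ms ISO interval (the interval is given later in this thread), that the controller advertised at the upper bound of each range, and an average random advertising delay of 5 ms. The function names are illustrative only.

```c
#include <assert.h>

/* Percentage of ISO packets lost in a test run. */
double loss_percent(unsigned lost, unsigned total)
{
    assert(total > 0);
    return 100.0 * (double)lost / (double)total;
}

/*
 * Rough number of advertising events during a run.
 * ASSUMPTIONS: one ISO packet per iso_interval_ms, and an average
 * advertising period of adv_interval_ms plus 5 ms (the mean of a
 * 0-10 ms random delay).
 */
double adv_events_in_run(unsigned total_packets, double iso_interval_ms,
                         double adv_interval_ms)
{
    assert(adv_interval_ms > 0.0);
    double run_ms = total_packets * iso_interval_ms;
    return run_ms / (adv_interval_ms + 5.0);
}
```

    Under those assumptions the three runs contain roughly 527, 219 and 23 advertising events, against 568, 226 and 21 lost packets, which is on the order of one lost packet per advertising event.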


    Even if we used the best-case scenario (advertising interval between 1 and 1.2 s), we would still have at least 1% loss, which is too much for our use case.


    With Packet Craft, we used the advertising parameters between 30 and 60ms, and we were clearly below 1% packet loss. So I don't think it's normal to have this today, and it seems more like a regression.

    What do you think ?

    Thanks

  • Hi,

    thomas_hexploy said:
    increasing retransmission and latency improve the situation however with my configuration

    That is good.

    thomas_hexploy said:
    If advertising takes 4.5ms, with retransmission set to 2 and latency set to 20ms, it should be enough to cover all collision with advertising. Isn't it ?

    In principle, yes. However, legacy advertising packets have a random offset of 0-10 ms per the Bluetooth specification, so you cannot schedule them exactly where they would fit in between the ISO packets.

    thomas_hexploy said:

    In this section it is stated that "For optimal scheduling, the periodic advertising interval and ISO interval should have a common factor, and the sum of the periodic and extended advertising timing-event lengths should be less than the BIG reserved time".

    So, do you think we could reduce the impact of advertising by configuring these ?

    This applies to periodic advertising only, but as that was not mentioned before I assumed you are using only legacy (normal) advertising. If that is the case(?), this is not relevant.

  • Hello,

    Thank you for your response. However, there are two points I'd like to clarify:

    I don't quite understand why these 10ms would cause a collision. From my perspective, this is only a delay and should not interfere with transmission or reception. If we consider the minimum advertising interval allowed by the specification, which is 20ms, this would imply that up to 33% (10ms random delay divided by 30ms, which is the total advertising duration, 20ms + random delay) of the bandwidth might be unavailable. Therefore, in my understanding, this delay should not cause a collision with any reception or transmission. (I used the information provided in the "Legacy advertising" section of https://www.bluetooth.com/blog/periodic-advertising-sync-transfer/)

    How is this implemented in the SoftDevice ? Can you confirm that the ~10ms delay is blocking any transmission/reception ?

    Even if we treat the 10ms delay as part of the advertising, it would suggest that advertising takes 14.5ms, which is still under 20ms (the time between 3 retransmissions). So, I don’t understand how a packet could be lost with 3 retransmissions and a 40ms latency (as I tested). The following diagram illustrates my perspective (I simplified it, focusing on lost packet):


    Could you please point out where I might be mistaken?

    Thank you.

  • thomas_hexploy said:
    How is this implemented in the SoftDevice ? Can you confirm that the ~10ms delay is blocking any transmission/reception ?

    No, it is not. Advertising blocks the radio for about 4.5ms per advertising event. But due to the random offset this is not consistent, so you cannot schedule the advertising events to always fall between other activity. The ISO transmissions, on the other hand, happen at a fixed interval. So with one fixed interval (ISO) and one gliding, varying interval (advertising), these will collide from time to time in a non-deterministic way.
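
    The drift between a fixed ISO grid and a jittered advertising train can be illustrated with a small simulation. It is only a sketch under stated assumptions: ISO events of a given length on a fixed interval, advertising events of a given length, a uniform 0-10 ms delay added to each advertising interval, and a toy pseudo-random generator. It does not model how the SoftDevice Controller actually schedules or resolves conflicts, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Small deterministic LCG so the sketch is reproducible; not a real RNG. */
static uint32_t lcg_next(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state >> 16;
}

/* Assumed random advertising delay: 0..10000 us. */
static uint32_t adv_delay_us(uint32_t *state)
{
    return lcg_next(state) % 10001u;
}

/*
 * Count overlaps between ISO events and advertising events.
 * ISO events start every iso_interval_us and last iso_len_us.
 * Each advertising event lasts adv_len_us and starts
 * adv_interval_us + delay after the previous one.
 */
unsigned count_collisions(uint64_t run_us, uint64_t iso_interval_us,
                          uint64_t iso_len_us, uint64_t adv_interval_us,
                          uint64_t adv_len_us, uint32_t seed)
{
    assert(iso_interval_us > 0 && adv_interval_us > 0);
    unsigned collisions = 0;
    uint64_t adv_start = adv_interval_us + adv_delay_us(&seed);

    while (adv_start < run_us) {
        uint64_t adv_end = adv_start + adv_len_us;
        /* First ISO event that starts at or before this advertising event. */
        uint64_t iso_start = (adv_start / iso_interval_us) * iso_interval_us;

        for (; iso_start < adv_end; iso_start += iso_interval_us) {
            if (adv_start < iso_start + iso_len_us) {
                collisions++;
            }
        }
        adv_start += adv_interval_us + adv_delay_us(&seed);
    }
    return collisions;
}
```

    For example, `count_collisions(30000000, 10000, 1500, 1000000, 4500, 1)` counts overlaps over 30 s with a 10 ms ISO interval and a nominal 1 s advertising interval (the 1.5 ms ISO event length is a figure assumed elsewhere in this thread). Because the phase of each advertising event relative to the ISO grid keeps shifting, some events land on an ISO event and others do not.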

  • I have updated my diagram according to your previous message:



    If I follow correctly, with 2 retransmissions and a latency of 20ms, we shouldn't have any loss. This is what is represented on the diagram.

    So, can you explain why 2 retransmissions and a latency of 20ms do not solve this packet loss issue?

    Thank you

  • Hi,

    Sorry for chiming in. As you already know, with an advertising interval set to 20ms, the actual advertising interval could range from 10ms to 30ms due to the random offset. So in the updated diagram, it is possible that the retransmitted Frame B is also blocked by the advertising activity, when the actual advertising interval is ~10ms. Since the retransmission number (RTN) is set to 2, and Frame B has been transmitted twice without success, Frame B will be flushed, resulting in a packet loss.

    For reference, the Basic Audio Profile (BAP) for Bluetooth Low Energy defines two sets of quality of service configurations: low latency and high reliability (see 5.6.2 QoS Configurations in https://www.bluetooth.com/specifications/specs/basic-audio-profile-1-0-1/). There you can see that an RTN of 2 is still in the realm of "low latency". In your application, to ensure high reliability without a high packet loss rate, I'd recommend a much higher RTN.

    As a side note, the SoftDevice Controller selects the flush timeout (FT) and number of subevents (NSE) based on the given RTN and max transport latency. I won't go into details about FT and NSE here to keep the answer short. However, the SoftDevice Controller always prioritizes max transport latency over RTN, because per the Core Specification the RTN is only a recommendation, not a mandatory requirement. Therefore, I would suggest first setting a max transport latency that your application can accept, and then increasing the RTN to achieve better reliability.

    Hope this helps!


    Cheers,

    Yuxuan

Children
  • Hello,

    My advertising interval is 1 second, not 20ms.

    Sorry for the confusion; the 20ms value was only used in one post to amplify what I didn't understand. You can forget it.

    So since my advertising happens (about) every second, my diagram is valid (I think) and I don't understand why the parameters (RTN: 2, latency: 20ms) do not work. Can you provide me with an example where this doesn't work?

    If possible, can you provide me with the correct parameters to have no packet loss with 1s advertising, a 10ms ISO interval and a 128-byte payload? (In my opinion, the correct parameters are predictable.)

    Thank you

  • Hi,

    Here is another possibility.

    The ISO reception could be blocked by the ACL connection. Per the spec, you need an ACL connection to establish a CIS. These two connections have individual intervals, and it is possible/expected that they would interfere with each other. What is the connection interval you are using for the ACL connection? It is recommended to use an ACL connection interval larger than the ISO interval (say 60ms or 70ms when the ISO interval is 10ms) to reduce the interference. Such interference might already have existed before without being noticed, because of the retransmissions. With the advertising activity of the peripheral, the radio becomes busier and thus the packet loss is noticed.

    It is not guaranteed to work, but could you please try an RTN of 13 with a max transport latency >= 50 ms? This allows transmitting the same ISO packet across 5 ISO intervals, with up to 3 transmissions of the packet in each interval.
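
    The "5 ISO intervals, 3 times each" figure above can be reproduced with a simplified model. This is a sketch under assumptions, not the SoftDevice Controller's actual selection algorithm (which is not described in this thread): it takes the flush timeout as the number of whole ISO intervals that fit in the max transport latency, and spreads RTN + 1 transmission attempts evenly across them. The struct and function names are hypothetical.

```c
#include <assert.h>

struct iso_budget {
    unsigned flush_timeout; /* FT: ISO intervals a packet may span */
    unsigned subevents;     /* NSE: transmission slots per ISO interval */
};

/*
 * Simplified model (assumption):
 *   FT  = floor(latency / ISO interval)
 *   NSE = ceil((RTN + 1) / FT)
 */
struct iso_budget iso_budget_estimate(unsigned rtn, unsigned latency_ms,
                                      unsigned iso_interval_ms)
{
    assert(iso_interval_ms > 0 && latency_ms >= iso_interval_ms);

    struct iso_budget b;
    unsigned attempts = rtn + 1; /* first transmission plus retransmissions */

    b.flush_timeout = latency_ms / iso_interval_ms;
    b.subevents = (attempts + b.flush_timeout - 1) / b.flush_timeout;
    return b;
}
```

    With `iso_budget_estimate(13, 50, 10)` this model gives FT = 5 and NSE = 3, matching the numbers quoted above. The values a real controller picks may differ.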

    Also, it would be nice if you have any sniffer logs that we could look into. 


    Cheers,

    Yuxuan

  • Hello,

    So this is my current configuration:

    • Advertising: 1/1.2s
    • ACL: 450/750 ms
    • RTN: 13
    • Latency: 100 ms

    In theory, we should have no packet loss with this configuration (as far as the SoftDevice's internal mechanism/scheduling is concerned).

    The result of my tests shows:

    • Increasing latency and RTN reduces packet loss: I now see 0.1% packet loss instead of 1%. However, that is still too much loss for a "best case scenario" (2 boards next to each other).
    • Increasing the ACL connection interval does not reduce packet loss.

    I'm sorry to insist, but I need a configuration with 0% packet loss in the "best case scenario". This is mandatory for our product to have the packet loss to a minimum (even though we know we can't have 0% packet loss when deployed).

    To my knowledge, nRF Sniffer does not support isochronous connections. If it is not supported, I can provide you with the code I run so you can reproduce the issue and sniff the traffic on your side.

    As a reminder, if I remove advertising I see absolutely no packet loss. (Even with 30/60ms ACL connection time)

  • Hi,

    The ACL connection interval you are using seems a bit too large. Could you try a smaller value of 60 or 70ms?

    Due to clock drift between devices, the peripheral performs window widening to ensure it can receive packets from the central. Simply put, the larger the connection interval, the larger the window widening. This window widening on the ACL connection may block the CIS packet reception.
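
    The relationship between connection interval and window widening can be sketched with the formula from the Bluetooth Core Specification, windowWidening = ((centralSCA + peripheralSCA) / 1000000) * timeSinceLastAnchor. The sleep clock accuracy values used in the example below are hypothetical (none are given in this thread), and the function name is illustrative.

```c
/*
 * Window widening in microseconds for a given time since the last
 * anchor point, per the Core Specification formula. The receive
 * window is extended by this amount on each side.
 */
double window_widening_us(double central_sca_ppm, double peripheral_sca_ppm,
                          double since_anchor_ms)
{
    double since_anchor_us = since_anchor_ms * 1000.0;
    return (central_sca_ppm + peripheral_sca_ppm) / 1000000.0 * since_anchor_us;
}
```

    Assuming 50 ppm on each side, `window_widening_us(50, 50, 450)` gives about 45 µs for a 450 ms connection interval and `window_widening_us(50, 50, 70)` about 7 µs for 70 ms. The widening grows further if connection events are skipped, since the time since the last anchor point increases.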


    Cheers,

    Yuxuan

  • Hi,

    I tried with 60/70ms ACL connection time, I still have the same packet loss.

    This is my understanding of the current problem:

    Advertising takes 4.5ms, so in our case this is the duration of about 3 ISO frames (3 * 1.5ms). It means that 4 retransmissions should be sufficient to cover the collisions. I don't know how long an ACL connection event takes, but let's say 1.5ms (feel free to correct me); that would mean 5 retransmissions in the worst-case scenario.

    So, with an RTN of 10 and a latency of 50ms, I don't see what can cause packet loss. I think it should cover every collision.

    I don't understand why the worst-case scenario (the longest duration during which ISO frames can't be sent because of collisions) isn't known, and why no configuration (retransmissions and latency) can remove this packet loss entirely.
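
    The counting argument in this post can be written out as follows. It is a sketch of the poster's own model, not of the controller's behavior: it assumes ISO frames of a fixed length sent back to back, and one contiguous blocked period. The function name is hypothetical.

```c
#include <assert.h>

/*
 * Number of consecutive frame slots needed so that at least one
 * falls outside a contiguous blocked period: every slot that the
 * blocked period can cover, plus one.
 */
unsigned slots_to_outlast(unsigned blocked_us, unsigned frame_us)
{
    assert(frame_us > 0);
    unsigned blocked_slots = (blocked_us + frame_us - 1) / frame_us;
    return blocked_slots + 1;
}
```

    `slots_to_outlast(4500, 1500)` returns 4 (advertising only) and `slots_to_outlast(6000, 1500)` returns 5 (advertising plus an assumed 1.5 ms ACL event), which are the counts used in this post. The model does not account for transmission slots being spread across several ISO intervals rather than sent back to back.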

    We don't have specific requirements on latency and retransmissions (as long as they are reasonable), if that allows no packet loss. We are in connected mode (CIS).

    Thank you.
