Multirole (C and P) device misses connection events

Hi all, I am observing some strange behaviour on my two nRF52840 devices. They are running NCS v2.1.0 (Zephyr 3.1.99), one as central and peripheral, one as peripheral only. They both have BLE parameters similar to those in the nRF throughput example.

The central is connected to the peripheral, and also (as a peripheral) to another central (an nRF52 DK). Both connections use a 100 ms connection interval, a 4 s supervision timeout and a peripheral latency of 0. Data is sent from the peripheral (A) to the central (B), and from there to the DK (C). With a logic analyser I am measuring the radio activity on A and B (RX_READY to DISABLED on A, TX_READY to DISABLED on B), as well as every confirmation that a BLE message was sent or received (let's call it a BLE confirmation), depending on the role (sent callback on A, received callback on B).
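
The connection parameters above are expressed in Bluetooth LE link-layer units: the interval in steps of 1.25 ms and the supervision timeout in steps of 10 ms. As a minimal sketch (the `conn_units` struct and `conn_params_to_units()` helper are hypothetical, not part of Zephyr or NCS), the conversion and the Core Specification range checks could look like this. The resulting unit values are what you would pass to Zephyr's `BT_LE_CONN_PARAM()` macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Connection parameters in LE link-layer units (hypothetical helper type). */
struct conn_units {
    uint16_t interval; /* units of 1.25 ms, valid range 6..3200 */
    uint16_t latency;  /* connection events the peripheral may skip, 0..499 */
    uint16_t timeout;  /* units of 10 ms, valid range 10..3200 */
};

/* Hypothetical helper: converts an interval and a supervision timeout given
 * in microseconds to LE units and checks them against the specification
 * limits. Returns 0 on success, -1 if a value is not a whole number of
 * units, is out of range, or breaks the timeout rule. */
int conn_params_to_units(uint32_t interval_us, uint16_t latency,
                         uint32_t timeout_us, struct conn_units *out)
{
    assert(out != NULL);

    if (interval_us % 1250 != 0 || timeout_us % 10000 != 0)
        return -1; /* not representable in 1.25 ms / 10 ms steps */

    uint32_t interval = interval_us / 1250;
    uint32_t timeout = timeout_us / 10000;

    if (interval < 6 || interval > 3200 || timeout < 10 || timeout > 3200 ||
        latency > 499)
        return -1;

    /* The supervision timeout must be larger than
     * (1 + latency) * interval * 2. */
    if ((uint64_t)timeout_us <= 2ULL * (1ULL + latency) * interval_us)
        return -1;

    out->interval = (uint16_t)interval;
    out->latency = latency;
    out->timeout = (uint16_t)timeout;
    return 0;
}
```

For 100 ms and 4 s this yields 80 and 400. The 101.25 ms and 98.75 ms intervals suggested later in the thread correspond to 81 and 79, while a value such as 100.5 ms is rejected because it is not a multiple of 1.25 ms.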

Most of the time the connection seems healthy, and on the logic analyser this translates to the BLE confirmation signal and the radio activity signal occurring very close to each other on each A-B connection event, apart from a missed packet roughly every 40 packets. However, after some time (sometimes ~20 minutes, it varies), during which data is transmitted continuously between the three devices, the BLE confirmations on A and B seem to halve (roughly, synced confirmations on both devices occur every 200 ms rather than on every connection event), and the two devices miss packets and start queueing them up. This continues for a while (maybe 10 minutes), and then the connection becomes healthy again on its own.

While looking closely at what goes on during this unhealthy state, I noticed that it is B that just isn't showing up to the connection events with A. A shows up (I can see its radio activity), but B has no radio activity during that event. This happens roughly on every second A-B connection event, sometimes more often. However, B still shows up to every connection event with C and maintains what seems to be a reliable connection with C throughout this state.

What could be causing this?

In the meantime I will run a similar setup with A and B replaced by two nRF DKs, to rule out the antenna design as the root cause.

  • Just to provide an update on this: the issue still occurs when using two nRF52840 DKs as A and B.

  • Another update: after some analysis, the issue is clearly caused by the connection events of the two connections (A-B, B-C) drifting relative to each other and eventually overlapping. When they overlap, the A-B event seems to be "sacrificed" on half of the connection events, until enough time passes for the two events to drift apart and a healthy exchange resumes.

    Some more useful information on this side: I am using an external 20ppm crystal and the auto-update of conn params is disabled:

    CONFIG_CLOCK_CONTROL_NRF_K32SRC_XTAL=y
    CONFIG_CLOCK_CONTROL_NRF_K32SRC_20PPM=y
    CONFIG_BT_GAP_AUTO_UPDATE_CONN_PARAMS=n

    I thought re-enabling CONFIG_BT_GAP_AUTO_UPDATE_CONN_PARAMS on device B might make the B-C connection update itself automatically when the two connection events clash, so that the phase of the event could be shifted or the connection parameters renegotiated, but this did not happen.
    One of the reasons I chose 100 ms for both connections was to avoid issues like this, but clearly I need to take the relative drift of the two connections into account on device B. Ideally, I would keep the B-C connection event within a certain window, semi-synchronised with the A-B connection events.
    1) Are there any NCS/Zephyr techniques to avoid the clash of connection events on a multirole device?
    2) If not, I might have to look at PPI-based approaches. I was thinking of calling conn_params_update() again whenever the two events get too close, but wouldn't the position of the next event within the window then be left to chance?
    3) Is there a way to force a connection event to occur at a certain point in time rather than leaving it up to chance? 
  • I think the easiest solution would be to use different connection intervals. Do you really need exactly 100 ms on both connections? If one had, for example, 101.25 ms and the other 98.75 ms, that would result in occasional isolated clashes rather than long runs of consecutive clashes.

  • Hi David, 
    I assume you are using the Nordic BLE controller?

    If you have a look at the scheduling documentation here, you will find a partial explanation for the behavior you observed. By default, BLE connection events have the same priority, and when one connection is about to time out because it has been preempted too often, it gets the highest priority.

    However, I'm not so certain about the drift explanation. If there is drift, both connections should drift. And why would B-C have higher priority than A-B?

    I agree with Emil's suggestion that you try making the connection intervals slightly different to see whether the issue persists. I would expect collisions to still occur, but they should not last as long as in your case.

  • Consider the case when there are three devices A, B, C with two connections:

    1. A (peripheral) <-> B (central)

    2. B (peripheral) <-> C (central)

    and both connections have equal connection interval.

    Since the timing of a connection is driven by the central's clock, B's clock drives the first connection and C's clock drives the second one. These two clocks are not perfectly synchronised and drift with respect to each other. The connection events of the two connections at B can therefore clash on every event for some time, until they drift apart. Having unrelated connection intervals solves this issue: if two connection events from the two connections happen to overlap, they will certainly not overlap at the next connection event.

    Compare this with the case where there is only one central with multiple peripherals. Since the central's clock drives all connections, their timings always drift together, so no such overlaps arise.
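
The drift argument can be made concrete with a small model. Assuming, for illustration only, a 5 ppm relative error between B's and C's clocks, equal 100 ms intervals shift the B-C events by 0.5 µs per event relative to the A-B events, so crossing an overlap region of ±2 ms (assuming 2 ms events) takes about 4 ms / 0.5 µs = 8000 events, roughly 13 minutes. The sketch below uses a hypothetical helper, `longest_clash_run()`, written in plain C; it is not a model of the SoftDevice Controller's scheduler. It counts the longest run of consecutive A-B events that start within one event length of a B-C event.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper for a simplified model (times in nanoseconds):
 * - A-B events are scheduled by B's clock every ab_ns.
 * - B-C events are scheduled by C's clock every bc_ns; seen from B, that
 *   period is scaled by rel_ppm (negative: C's clock runs fast vs. B's).
 * - The first B-C event starts phase_ns after the first A-B event.
 * - Every event lasts event_ns, so two events overlap when their start
 *   times differ by less than event_ns.
 * Returns the longest run of consecutive A-B events overlapping a B-C event
 * within the first n_events A-B events. */
long longest_clash_run(int64_t ab_ns, int64_t bc_ns, int64_t rel_ppm,
                       int64_t phase_ns, int64_t event_ns, long n_events)
{
    assert(ab_ns > 0 && bc_ns > 0 && event_ns > 0);

    /* B-C period as measured with B's clock (truncated to whole ns). */
    int64_t bc_seen_ns = bc_ns + bc_ns * rel_ppm / 1000000;
    assert(bc_seen_ns > 0);

    long run = 0, longest = 0;
    for (long k = 0; k < n_events; k++) {
        int64_t t_ab = (int64_t)k * ab_ns; /* start of A-B event k */

        /* Time since the most recent B-C event start, in [0, bc_seen_ns). */
        int64_t r = (t_ab - phase_ns) % bc_seen_ns;
        if (r < 0)
            r += bc_seen_ns;

        /* Distance to the nearest B-C event start (previous or next). */
        int64_t dist = r < bc_seen_ns - r ? r : bc_seen_ns - r;

        if (dist < event_ns) {
            run++;
            if (run > longest)
                longest = run;
        } else {
            run = 0;
        }
    }
    return longest;
}
```

With 100 ms on both connections, a 3 ms initial offset, -5 ppm and 2 ms events, this model gives a single run of 7999 consecutive overlapping A-B events (about 13 minutes) within 20000 events. With 101.25 ms against 98.75 ms it never gives more than two in a row, because the offset between the two event trains moves by about 2.5 ms per event, and two such steps already exceed the 4 ms overlap region. "Overlap" here only means a potential scheduling conflict; which event is actually dropped depends on the controller's priority rules.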
