400ms delay between ATT Write request and response

Hi, 

On a nrf52840 running SDK 17.0.2 and Softdevice 7.2, while being connected to two devices in central and peripheral role, I observe that on the connection 
where the nrf is in peripheral role, a whopping 400ms delay between an ATT write request and its response (detected and measured with wireshark)

In the Message sequence diagram "GATTS write request without authorization"  I get the impression that the response is sent "immediately"  by the softdefice,
where that immediately intuitively is in the 20ms range rather than 400ms..

My questions are:

- What influences the observed delay (I suspect connection parameters a bit..)

- Can I have influence on this in such way that the delay is reduced?

I can see that if I disconnect the peer in peripheral role (where the nrf is central) the delay drops to some 20-50ms.

Thank you, jmr.

  • Hi 

    It would definitely be interesting to know the connection parameters used, both for the central and peripheral connection. 

    Is the SoftDevice also advertising or scanning while the two connections are active?

    When the SoftDevice has multiple connections ongoing there will be occasional conflicts between links, in which case one of the links will skip a connection event in a round robin scheme where the different connections are prioritized in turn. 

    As for the message sequence chart it is interesting to see that the response happens right away. Normally any ATT request will have to be handled by the application, which means you will not be able to get the response on the same connection event that you receive the request. 
    I will check with the SoftDevice team tomorrow and get a clarification on how this works. 

    Best regards
    Torbjørn

  • Hej Torbjørn,

    Thanks for your quick reply.
    Connection Parameters  (min_conn_interval in ms, max_conn_interval in ms, latency, conn_supervision_timeout in ms ) for the two peers are:

    peer1 (nRF in Central Role) : 187.5, 437.5, 3, 6000 
              peer1 requests multiple changes to the connection parameters which I adhere to using sd_ble_gap_conn_param_update. Above set is the
              last one.

    peer2 (nRF in Peripheral Role): 20, 20, 0, 1000
              above set is what peer2 sets (Values are from softdevice within connect event)

    I might need to add that each of the connections run under its own connection-tag. I'm not sure about this but I guess this is mandatory if connection parameter negotiation should work properly.. (please correct me if I am wrong).

    The Softdevice is not advertising nor scanning while the two connections are active.

    You said "Normally any ATT request will have to be handled by the application" .. that puzzles me a bit, since my service for peer2 is very similar to ble_nus service where BLE_GATTS_EVT_WRITE just results in calling back the app with the received data (or enable/disable notification/indication in case the write targets a cccd). I can't see any call to the softdevice triggering it to send a write response. So that must be happening automatically within the SoftDevice..

    my wireshark-capture (between nRF(Slave) and peer2(Master) )looks like this:

    91968    10m 22.404261s    Master_0x8a1d2c95    Slave_0x8a1d2c95    LE LL    26    Empty PDU
    91969    10m 22.405258s    Slave_0x8a1d2c95    Master_0x8a1d2c95    LE LL    26    Empty PDU
    91970    10m 22.406256s    Master_0x8a1d2c95    Slave_0x8a1d2c95    ATT    53    Sent Write Request, Handle: 0x0010 (Unknown: Unknown: Unknown)
    91971    10m 22.407253s    Slave_0x8a1d2c95    Master_0x8a1d2c95    LE LL    26    Empty PDU
    91972    10m 22.424146s    Master_0x8a1d2c95    Slave_0x8a1d2c95    LE LL    26    Empty PDU
    91973    10m 22.443692s    Master_0x8a1d2c95    Slave_0x8a1d2c95    LE LL    26    Empty PDU
    91974    10m 22.463986s    Master_0x8a1d2c95    Slave_0x8a1d2c95    LE LL    26    Empty PDU
    91975    10m 22.483477s    Master_0x8a1d2c95    Slave_0x8a1d2c95    LE LL    26    Empty PDU
    91976    10m 22.503210s    Master_0x8a1d2c95    Slave_0x8a1d2c95    LE LL    26    Empty PDU
    91977    10m 22.524447s    Master_0x8a1d2c95    Slave_0x8a1d2c95    LE LL    26    Empty PDU
    91978    10m 22.544247s    Master_0x8a1d2c95    Slave_0x8a1d2c95    LE LL    26    Empty PDU
    91979    10m 22.565155s    Master_0x8a1d2c95    Slave_0x8a1d2c95    LE LL    26    Empty PDU
    91980    10m 22.584530s    Master_0x8a1d2c95    Slave_0x8a1d2c95    LE LL    26    Empty PDU
    91981    10m 22.604562s    Master_0x8a1d2c95    Slave_0x8a1d2c95    LE LL    26    Empty PDU
    91982    10m 22.625009s    Master_0x8a1d2c95    Slave_0x8a1d2c95    LE LL    26    Empty PDU
    91983    10m 22.644495s    Master_0x8a1d2c95    Slave_0x8a1d2c95    LE LL    26    Empty PDU
    91984    10m 22.665219s    Master_0x8a1d2c95    Slave_0x8a1d2c95    LE LL    26    Empty PDU
    91985    10m 22.684184s    Master_0x8a1d2c95    Slave_0x8a1d2c95    LE LL    26    Empty PDU
    91986    10m 22.704270s    Master_0x8a1d2c95    Slave_0x8a1d2c95    LE LL    26    Empty PDU
    91987    10m 22.724530s    Master_0x8a1d2c95    Slave_0x8a1d2c95    LE LL    26    Empty PDU
    91988    10m 22.744600s    Master_0x8a1d2c95    Slave_0x8a1d2c95    LE LL    26    Empty PDU
    91989    10m 22.764448s    Master_0x8a1d2c95    Slave_0x8a1d2c95    LE LL    26    Empty PDU
    91990    10m 22.784021s    Master_0x8a1d2c95    Slave_0x8a1d2c95    LE LL    26    Empty PDU
    91991    10m 22.804374s    Master_0x8a1d2c95    Slave_0x8a1d2c95    LE LL    26    Empty PDU
    91992    10m 22.824346s    Master_0x8a1d2c95    Slave_0x8a1d2c95    LE LL    26    Empty PDU
    91993    10m 22.844001s    Master_0x8a1d2c95    Slave_0x8a1d2c95    LE LL    26    Empty PDU
    91994    10m 22.845988s    Slave_0x8a1d2c95    Master_0x8a1d2c95    ATT    31    Rcvd Write Response, Handle: 0x0010 (Unknown: Unknown: Unknown)
    91995    10m 22.863943s    Master_0x8a1d2c95    Slave_0x8a1d2c95    LE LL    26    Empty PDU
    91996    10m 22.883886s    Master_0x8a1d2c95    Slave_0x8a1d2c95    LE LL    26    Empty PDU

    Most of the empty PDUs in-between have wrong sequence number, maybe this hints to something...

    I'm looking forward to read more expert insights from you, thank you in advance.
    I'll be on holiday next week, so please don't spoil your weekend with my riddle and bear with me if you observe some delay in my response. :-)

    Best regards, Jørn.

  • Hi Jørn

    Thanks for sharing the details. Based on the sniffer trace it should be possible to pin point exactly the connection interval used. The central will pick a number in between min and max if possible, or ignore the request and pick a number outside this range. When the central device is an nRF device you should always get the connection interval that you ask for (many mobile phones are more picky about which connection parameters they will accept). 

    jmr said:
    I might need to add that each of the connections run under its own connection-tag. I'm not sure about this but I guess this is mandatory if connection parameter negotiation should work properly.. (please correct me if I am wrong).

    I assume you are referring to the connection handle?

    This number will always be unique for each connection, and is used to identify a specific connection for many of the SoftDevice API's. 

    For instance all the GATT/ATT API's are connection specific, and when you send receive data over GATT you always need to specify which connection you want to interact with. 

    jmr said:
    You said "Normally any ATT request will have to be handled by the application" .. that puzzles me a bit, since my service for peer2 is very similar to ble_nus service where BLE_GATTS_EVT_WRITE just results in calling back the app with the received data (or enable/disable notification/indication in case the write targets a cccd). I can't see any call to the softdevice triggering it to send a write response. So that must be happening automatically within the SoftDevice..

    When exchanging data over Bluetooth you always have the option of using either commands or requests, where the main difference is that a request needs to be responded to by the Bluetooth host (via the application in some cases), while a command does not. 

    This makes commands a lot more efficient in that you can send multiple of them in a single connection event, without having to wait for the host and/or application to send a response. 

    For this reason most data exchange happens using commands rather than requests. 

    The ble_app_uart uses notifications for server-to-client communication, and write without response for client-to-server communication. Both of these are command based operations.

    A critical aspect of the Bluetooth link layer is that it ensures data integrity by constantly checking the sequence number, and retransmitting any lost packets, so using requests is not necessary to ensure that a packet is successfully received by the peer device. 

    jmr said:
    Most of the empty PDUs in-between have wrong sequence number, maybe this hints to something...

    Any chance you could save the trace to file and share it with me? 
    It is possible to drag and drop files into your response in order to attach it to the case. Then it is much easier for me to look into it. 

    If you see the wrong sequence numbers it is usually a sign that the sniffer is not picking up all the packets. In order to ensure a good trace I would recommend keeping the sniffer in between the two devices, and keep them all relatively close together. 

    jmr said:
    I'm looking forward to read more expert insights from you, thank you in advance.
    I'll be on holiday next week, so please don't spoil your weekend with my riddle and bear with me if you observe some delay in my response. :-)

    Take your time, and enjoy your holiday Slight smile

    Work-life balance is important here in Norway, so I rarely work in the weekends unless absolutely necessary ;)

    Best regards
    Torbjørn 

  • Hej Torbjørn,

    Thanks for the insights and your extensive reply.
    Your input has led me into the right direction.

    So after quite a bit of digging and reading, I found the reason to why I suddenly lost communications with peer2:

    the mentioned connection intervals with peer1 and peer2 do not work well with the softdevice scheduler, which leads to dropped packages and the observed lost communications.

    With peer2, the connection parameters get set by the remote peer (as it is central, 20,20,0,1000 as mentioned in a previous message) and since we "talk" a lot with peer2, we keep this parameters and do not request an update. (I tried that but however failed for some other reason most probably related to peer2's internals)

    Peer1 is a sensor in peripheral role which requests the connection parameters to be changed a few times, finally asking for (187.5, 437.5, 3, 6000).

    originally, my software (in central role here) would process an update request like this (taken from ble_app_multirole example):

     case BLE_GAP_EVT_CONN_PARAM_UPDATE_REQUEST: {
      // Accept parameters requested by peer.
      err_code = sd_ble_gap_conn_param_update(p_gap_evt->conn_handle,
                 &p_gap_evt->params.conn_param_update_request.conn_params);
      APP_ERROR_CHECK(err_code);
    } break;

     Forwarding (187.5, 437.5, 3, 6000) as requested to the softdevice, as in the code above, the softdevice will confirm the request and set the connection interval to 437.5. Now this is a bad pick, which results in lost or inacceptably delayed packets and eventually a commmunications breakdown with peer2.

    BECAUSE

    Chapter 15 in S140_SDS_v2.1.pdf  (Or in infocenter/sds_s140/SDS/s1xx/multilink_scheduling/suggested_intervals_windows) states quite clearly, that connection intervals must be carefully chosen to avoid collisions and / or dropped packages.
    And a careful choice would be (in my case) that connection intervals of all connections are multiples of a common base. Which is with 437.5 and 20 clearly not the case. So eventually, there are package-timing collisions in softdevice which leads eventually to communications breakdown.

    MY SOLUTION was to
    change the conn param update request procedure from above to something that instead of passing the requested parameters directly to SoftDevice, would find a "good" value within the requested min/max and set that as the new connection interval. So from the (187.5, 437.5, 3, 6000) request, we find a value close to 437.5ms which is a multiple of 20ms. So the SoftDevice will get to change parameter to (400, 400, 3, 6000) and that solves my problems with peer2.

    Of course, this fix only solves my current situation, and by adding more connections (or adding simultaneous scanning while advertising and maintaining a connection) things get worse. One needs to keep a finger on Chapter 15 in softdevice spec. while working with multiple simultaneous connections (and/or attempts).
    Also worth reading is a post by Edvin on topic Connection Parameter Selection for Central Device with 10 Peripheral Simultaneous Connections which covers some of the same troubles.

     What I am missing is feedback from the softdevice scheduler to detect such conflicts. I thought setting NRF_SDH_*_LOG_LEVEL to DEBUG severity would give me such insight, but it doesn't.

    Maybe I am missing something and there IS feedback from the SD-Scheduler?

     Ah, and I forgot to mention.. the observed 400ms delay (this issue's topic) is most probably due to the scheduler having deferred the response due to being occupied with processing connection with peer1. But of course, without knowing scheduler details (and especially its scheduling strategy) this is only me guessing. Maybe some of you Torbjørn or any other nRF cracks can confirm that.

    hilsen, Jørn.

  • Hi Jørn

    It seems a bit odd that using a connection interval on the central link that is a multiple of the connection interval on the peripheral link will make such a big difference. Have you done testing to verify that using 400ms over 437.5ms provides a significant and consistent improvement?

    Peripheral links are inherently asynchronous to the timing of the central links, and will drift in and out of 'conflict' over time. The only difference of choosing a connection interval that is a multiple is that you will have a more predictable behavior: Either the central link timing will be in conflict with the peripheral link timing consistently on every connection event, or the timing of the two links will be so much out of phase that they never get in conflict (at least until they drift back into conflict). 

    When two connections have conflicting timing the SoftDevice will prioritize the links in turn, as mentioned earlier. 

    jmr said:
    Forwarding (187.5, 437.5, 3, 6000) as requested to the softdevice, as in the code above, the softdevice will confirm the request and set the connection interval to 437.5. Now this is a bad pick, which results in lost or inacceptably delayed packets and eventually a commmunications breakdown with peer2.

    Packets should never get lost when using the SoftDevice, unless you get a disconnect (which is what happens if you have a total communications breakdown). 

    The critical aspect is that you have a connection supervising timeout that is long enough to account for the connection interval and slave latency used. 

    One thing that is interesting is the fact that the remote peripheral requests a slave latency of 3. 

    With a connection interval of 437.5ms and a slave latency of 3 the peripheral is allowed to only send a packet once every 4 connection intervals, or every 1.75 seconds. 

    Could this be related to the issues you are seeing?

    Could you try a new test where you accept the min and max connection interval set by the peer, but simply set the slave latency to 0, and see if it works better?

    Best regards
    Torbjørn

Related