This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

BLE_GATTC_EVT_TIMEOUT: Possible causes?

Hello,

I am still having troubles with the BLE_GATTC_EVT_TIMEOUT. The problem is that the error is not happening frequently and is hard to reproduce. It is also hard to attach a sniffer to each connection (and the nRF sniffer doesn't even work reliably) Here is what I have:

  • Scatternet of 8 Beacons running as concurrent central and peripheral
  • I send Write Requests but after a while the communication stalls for 30 seconds
  • I receive a GATTC timeout

I do have one beacon that will produce a similar fault very realiable. All Beacons are nRF51 Dongles of newer age with the S130 v2. It will connect to other beacons, the exchange a few packets, but then the link only sends empty PDUs. (THIS MIGHT NOT BE THE SAME FAULT)

image description

Is this a fault that I could potentially have made in my code? It looks like the Acknowledgement is never sent, but how could I possible interfer in this process with my code? Isn't this the SoftDevice's fault? Is it possible that I can decide to not acknowledge a write or to block it? Why is packet 155 an empty PDU, why is it not acknowledged directly?

Marius

  • Hi Marius,

    Could you summarize the issue to make it a little bit clearer.

    We are dealing with the issue with just one particular board or it happens on other boards also ? How often do you see the issue ?

    We need to narrow down the issue and find a way to reproduce it in house here.

    If you test with only 1 pair of devices (one connection at a time) can you reproduce the issue ?

    Please try to let the disconnection occur normally on the peripheral side and check the disconnected reason.

    If you have, please send other sniffer trace that captured the issue when it happens. Then we could be able to find a pattern. I suspecting it was the issue when a write response comes with a write request in a same connection event.

    Regarding your question, if you are too slow to pull all event from event buffer, the softdevice will NACK the packet. But there will still be packets from slave, not just stalled like what we see here.

  • Alright, I have to add an "answer" to describe the problem because the comments are too restricted.

    • I did a test with 8 devices that form a scatternet and they are all working normally. The error happens on all of these dongles. I used S130 v2.0.1
    • I logged all output from these 8 nodes on UART
    • I used 4 Sniffers to log connections
    • I managed to catch a few connections where the failure appeared.
    • I did not test with only two devices because I think it will take a very long time and maybe the failure will never appear.
    • Please forget about the sniffer trace discussed above because it might be another issue.

    61 <===> 53

    This is Log A from node 61 (Master, BBBBP, DD:D3:16:3A:6C:1B) This node has another connection to Node 62 which is not shown. image description

    This is Log B from node 53 (Slave, BBBBF, EF:45:B0:9A:69:77) This node has no other connections. image description

    This is the Wireshark log from this connection: image description

    Please note that there is an offset of about 18 seconds between my log and wireshark. But the screenshots show the same events.

    • Found in Beaconlog
      • @708sec, node 61 stops sending packets and has a GATTC_EVT_TIMEOUT 30 sec later because of no ACK
      • Last packet from 61 to 53 is 69:3E:00:00:00:03:02:04:C1:01:AB:F8:01:BE:FB:01:AF:FC:01:BD
      • The next packet (69:C9:01:B1:A3:01:BB:3D:00:B7:C0:01:C1:C8:01:AD:F7:01:BE:F9) is not ACKED
    • Conclusion from Wireshark dump:
      • Packet 15363 from Master is a WRITE_REQ but the SoftDevice does not wait for an Ack
      • Packet 15365 from Master is the next WRITE_REQ with the next data, but should not have been sent
      • Slave sends an ACK in 15368
      • Master sends another WRITE_REQ in 15396 which is also the last packet that is reported as sent by the stack, but an ACK is never received

    I attached the Wireshark log here:53 BBBBF EF.45.pcapng

    Could this error result from my code or is this an error from the SoftDevice?

    I could send you my firmware, but you need a few beacons (I used 8) to reproduce the error in a short time. I had 8 disconnections of this type in 30 minutes. I am currently tracking down another of these losses to find out if it follows the exact same pattern.


    EDIT: I was able to track down another incident. The beacon logs look the same, at one point, the communication stalls. In the Wireshark log, it looks similar but there is a packet that was not seen in the other dump:

    54 <==> 507

    image description

    Here the download: 507 BBBTK D0.9C.pcapng

    Here are the notes:

    • packet 21252@1072sec: CONNECT REQ

    • packet 36095@1800sec: Master sends another WRITE_REQ without having received a WRITE_RSP

      • Instead the Slave sent a Connection oriented channel??
    • packet 36619@1830sec: LL_TERMINATE which was triggered by my implementation as a reaction to the GATTC_EVT_TIMEOUT

    • Found in log

      • 54 is connected to 508 additionaly
      • 507 is connected to 61 additionaly
      • 54 receives the GATTC_EVT_TIMEOUT

    For the sake of completenes, here are my node logs:

    Node 54:

    image description

    Node 507:

    image description

  • I added an answer instead of a comment, i hope that's ok.

  • Hi Marius,

    The issue you described here might not be the same as what you observed in the original question. In the trace you provided in the question the softdevice actually stopped sending packet when in your newer trace the softdevice still sent empty packet until the connection terminated.

    They may not be the same problem.

    Regarding the new trace. I do agree that GATTC_EVT_TIMEOUT is unexpected and could be an issue with softdevice. I suspect it could be confused when there are multiple connections and are doing lots of reliable write.

    Do you use authorization for handle 0x000e or not ? If you use authorization, we can think of an issue on the application code that delayed the reply for the write request.

    About those packets that you're suspecting:

    61 <===> 53

    Packet 15363: We don't see a response for the write request earlier could be because it was sent in the previous connection event but the sniffer couldn't capture it. You can see the delta time of 20ms instead of 10ms between 15365 and 15364. Implying that there must be a packet in between. The SN and NESN also confirm this. More likely just the sniffer couldn't capture it.

    But I do see the connection termination came 30 seconds after these packets, there must be something wrong there.

    54 <==> 507

    Another glitch with the sniffer I suspect, the "Connection oriented channel" packet actually just a corrupted Write response. You can see the CRC mismatch and the content of this packet is almost identical with the normal write response. SN and NESN doesn't show that there was retransmission or anything.

    But then again the connection terminated 30 seconds after this, it's hard to explain.

    We may need to have your code to test here, I can setup 8 beacons. I would prefer to have a simplified version of your code that can show the issue, so it's easier to dig up the code.

  • I don' use authorization. I will open a support case for this in a few minutes. It is really hard to simplify the code but I could point you to the classes that make are involved in this incident.

Related