Losing Indication Confirmation from Central Device

Hi,

We have encountered an issue where the radio becomes unresponsive while we transfer data back and forth between the nRF52811 and a remote test device using our custom service. The test involves the remote device sending data to our service as an ATT_WRITE_REQ, and the nRF52811 responding with more data via ATT_HANDLE_VALUE_IND. These transactions happen repeatedly during a single connection. Data is buffered on the nRF52811 in both directions since the transfers are larger than the MTU. The test connects, requests data from the nRF52811 peripheral, then disconnects, repeating for as long as the devices can hold up.

In the attached trace, the last thing we see is that the nRF52811 sends an indication at packet 15052, which the remote device confirms. The remote device then sends an ATT_WRITE_REQ with a portion of the data, but the nRF52811 does not respond with the ATT_WRITE_RESP. It appears that the nRF52811 application is waiting to see the indication confirmation, but it never does and gets stuck there. At this point the radio becomes completely unresponsive. We cannot communicate with it over the air, nor over the UART link to our printer host device.

We are using SDK 16.0.

NoAttWriteRsp.cfa

  • amarchand said:
    We have attempted to use the nRF Sniffer in the past with varied success. We can try again if that would be more helpful to you.

    As I said, it is at least more familiar. Sorry for the inconvenience.

    I noticed the timestamps in the trace (I had to scroll to the right; sorry for not noticing this earlier). I had also set the LE DATA filter, which stripped away the advertisements after packet 15059, which is why I suspected that the trace had been cut.

    Correct me if I am wrong here:

    15052: Indication from nRF -> central.

    15053: Acking indication (?) central -> nRF (acking packet 15052)

    15054: empty packet nRF -> central 

    15055: some data (Do you know what this is?) central -> nRF

    15056: empty packet nRF -> central (not acked)

    15057: retransmission of 15055 central -> nRF (because it didn't pick up the Ack from nRF in the previous packet)

    15058: empty packet nRF -> central

    15059: empty packet central -> nRF

    Pause 4 seconds

    15060 -> ... advertisements from nRF. 

    So two things happen at the end here. There is a retransmission of a packet, but that shouldn't matter. Then the central sends a packet that the nRF doesn't reply to (15059). Either the nRF didn't pick up the packet from the central, or the sniffer didn't pick up the reply from the nRF. Either way, it is radio silence from then on, for exactly 4 seconds. Is this your supervision timeout?

    I suspect the central is at fault. In a BLE connection it is always the central that is responsible for sending the first packet in each connection event. I don't know the HW of this sniffer, but it doesn't pick up any more packets from the central and, apparently, none from the nRF either.

    Since it doesn't have any packets to reply to, the nRF doesn't send anything until the connection times out (4 seconds), and then it starts advertising again.
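
To check whether the 4 seconds of silence matches your supervision timeout, note that BLE expresses this timeout in units of 10 ms, so 4 seconds corresponds to a value of 400. Below is a minimal sketch of that arithmetic. The helper names are made up for illustration and are not SDK functions; many SDK examples get the same result with `MSEC_TO_UNITS(4000, UNIT_10_MS)`, but I don't know what your project actually uses.

```c
#include <stdint.h>

/* BLE expresses the connection supervision timeout in units of 10 ms. */
#define SUP_TIMEOUT_UNIT_MS 10u

/* Hypothetical helper (not an SDK function): milliseconds -> 10 ms units. */
static uint16_t sup_timeout_ms_to_units(uint32_t timeout_ms)
{
    return (uint16_t)(timeout_ms / SUP_TIMEOUT_UNIT_MS);
}

/* Hypothetical helper (not an SDK function): 10 ms units -> milliseconds. */
static uint32_t sup_timeout_units_to_ms(uint16_t units)
{
    return (uint32_t)units * SUP_TIMEOUT_UNIT_MS;
}
```

If the value negotiated for your connection converts back to 4000 ms, the pause in the trace is most likely the supervision timeout expiring.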

    However, this should give the disconnected event in the nRF. If you don't see it, is it possible that you are waiting for some specific event in ble_evt_handler(), and that this wait blocks the disconnected event?
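
To illustrate what I mean by not blocking on one event, here is a minimal sketch of tracking a pending indication with a flag instead of waiting for the confirmation inside the handler. It is a standalone model, not your code and not SDK code: the enum, struct and function names are hypothetical, and the two modelled events only stand in for the real `BLE_GATTS_EVT_HVC` and `BLE_GAP_EVT_DISCONNECTED`.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the SoftDevice events of interest. */
typedef enum {
    EVT_INDICATION_CONFIRMED,  /* models BLE_GATTS_EVT_HVC */
    EVT_DISCONNECTED,          /* models BLE_GAP_EVT_DISCONNECTED */
    EVT_OTHER
} link_evt_t;

typedef struct {
    bool connected;
    bool indication_pending;   /* true between sending an indication and its confirmation */
} link_state_t;

/* Call after an indication has been queued successfully. */
static void link_on_indication_sent(link_state_t *s)
{
    if (s->connected) {
        s->indication_pending = true;
    }
}

/* Event handler: only updates state and returns, it never waits. */
static void link_on_event(link_state_t *s, link_evt_t evt)
{
    switch (evt) {
    case EVT_INDICATION_CONFIRMED:
        s->indication_pending = false;
        break;
    case EVT_DISCONNECTED:
        /* A disconnect must also clear the pending flag, otherwise the
         * application keeps waiting for a confirmation that cannot arrive. */
        s->indication_pending = false;
        s->connected = false;
        break;
    default:
        break;
    }
}

/* The application polls this before sending the next chunk. */
static bool link_can_send_indication(const link_state_t *s)
{
    return s->connected && !s->indication_pending;
}
```

The point of the design is that the handler returns immediately for every event, so a disconnect can always be processed even if the confirmation never arrives.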

    If the nRF had crashed (hardfault or app error), the central should keep sending empty packets for the duration of the supervision timeout. I don't know what changes you may have made to the SDK, but APP_ERROR_CHECK(err_code) should print something in the log when it receives an err_code != 0. Do you see that? I don't see it in the log, but I don't know if you actively removed it. You can check what it looks like by putting:
    APP_ERROR_CHECK(1); somewhere in your project after you initialize your log. 

    I would check the central in the connection. I find it weird that the central goes silent. You can test this by sniffing. If they are in a connection and you cut the power on the peripheral, you will see that the central keeps sending packets. If you cut the power on the central, you will see that the link goes silent, because the peripheral doesn't have anything to reply to. Then, after the supervision timeout, it will restart advertising.
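
The rule of thumb above can be written down as a small check on the packet directions at the end of a sniffer trace. This is only a sketch of the heuristic, with made-up type and function names. It assumes the sniffer caught every packet, which is not guaranteed, so treat the result as a hint rather than proof.

```c
#include <stddef.h>

typedef enum { FROM_CENTRAL, FROM_PERIPHERAL } pkt_dir_t;
typedef enum { SILENT_CENTRAL, SILENT_PERIPHERAL } silent_side_t;

/* Counts the central packets at the very end of the trace that were not
 * followed by any packet from the peripheral. */
static size_t unanswered_central_packets(const pkt_dir_t *trace, size_t len)
{
    size_t n = 0;
    while (n < len && trace[len - 1 - n] == FROM_CENTRAL) {
        n++;
    }
    return n;
}

/* Heuristic: a central that is still alive keeps polling, so several
 * unanswered central packets in a row point at the peripheral. If the
 * trace simply stops, the central is the more likely suspect. */
static silent_side_t guess_silent_side(const pkt_dir_t *trace, size_t len)
{
    return unanswered_central_packets(trace, len) > 1 ? SILENT_PERIPHERAL
                                                      : SILENT_CENTRAL;
}
```

Feeding in the directions of packets 15052 to 15059 from the list above (alternating nRF and central, ending with one central packet) gives `SILENT_CENTRAL`, which is the reading I arrived at by hand.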

    Best regards,

    Edvin

  • The point at which the nRF does not respond to packet 15059 is where it appears to have crashed, or at least become completely unresponsive. As seen in the logs and trace, it will not respond via Bluetooth, nor will it respond to commands over the UART.

    We don't make any changes to the SDK, so the APP_ERROR_CHECK should print if it fails.

    At this point we have made significant changes to both our nRF application and the central device, so I will provide an update once we are able to conduct more testing.

    Regards,

    Drew

  • Attached is a set of logs and traces from our updated devices. At the end of the logs/traces, it can be seen that the central device sends an ATT_WRITE_REQ to the nRF. The nRF app is able to process the incoming data, but it never responds to the ATT_WRITE_REQ with the corresponding ATT_WRITE_RESP, which causes the central device to time out and disconnect.

    It is not clear why the nRF is unable to respond to the ATT_WRITE_REQ.

    missing_write_response.zip

  • This trace exhibits the same behavior as the original one: the peripheral sees the write request and ACKs it at the packet level, and then the central stops talking. It appears as though the central is crashing or resetting, leading the peripheral to restart the advertiser four seconds later.
