Random disconnections when messages are sent from the central to the peripheral at a high rate

Hello there!

I am working on a project in which two devices (one central and one peripheral) use Nordic Bluetooth controllers for connection and data exchange: an nRF51822 for the peripheral and an nRF51422 for the central. I get random disconnections when I send too many messages from the remote (central) to the peripheral.

The short-range disconnection happens when the buttons on the remote are pressed quickly and randomly at a distance of 20 metres from the peripheral. The issue does not happen when the same buttons are pressed quickly at close range (5-10 m), or when the buttons are pressed slowly at long range (> 30 m).

The analysis indicates that when the rate of messages sent from the remote to the wheel is very high, some packets are lost and the SoftDevice triggers a disconnection event.

The stack trace collected at the point of disconnection shows the disconnect reason BLE_HCI_CONNECTION_TIMEOUT, which means the supervision timeout was hit (no packets were received from the peer within the timeout).

The stack traces were collected on both the remote (central) and the peripheral, and both point to BLE_HCI_CONNECTION_TIMEOUT. The stack traces (remote and wheel) are attached.

Below are the BLE connection interval, slave latency, and supervision timeout settings, followed by the LF clock configuration:

#define MIN_CONNECTION_INTERVAL     MSEC_TO_UNITS(7.5, UNIT_1_25_MS)    /**< Minimum connection interval (7.5 ms), in units of 1.25 ms. */
#define MAX_CONNECTION_INTERVAL     MSEC_TO_UNITS(30, UNIT_1_25_MS)     /**< Maximum connection interval (30 ms), in units of 1.25 ms. */
#define SLAVE_LATENCY               0                                   /**< Slave latency, in number of connection events. */
#define SUPERVISION_TIMEOUT         MSEC_TO_UNITS(1000, UNIT_10_MS)     /**< Supervision timeout (1 s), in units of 10 ms. */

#define NRF_CLOCK_LFCLKSRC      {.source        = NRF_CLOCK_LF_SRC_RC,               \
                                 .rc_ctiv       = 16,                                \
                                 .rc_temp_ctiv  = 2,                                 \
                                 .xtal_accuracy = NRF_CLOCK_LF_XTAL_ACCURACY_250_PPM}
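To make these numbers concrete: the supervision timeout is a time budget, so how many consecutive connection events the link can lose before BLE_HCI_CONNECTION_TIMEOUT depends on which interval the central actually picks between the minimum and the maximum. The C sketch below redoes the MSEC_TO_UNITS arithmetic (assuming the SDK's 1250 and 10000 microsecond unit sizes) and estimates that budget with slave latency 0. The helper names are hypothetical, and the result is an approximation, since the timeout is measured as time since the last received packet rather than counted in events.

```c
#include <stdint.h>

/* Unit sizes in microseconds (assumed to match the SDK's UNIT_1_25_MS and UNIT_10_MS). */
#define UNIT_1_25_MS_US 1250u
#define UNIT_10_MS_US   10000u

/* Same arithmetic as MSEC_TO_UNITS, but taking microseconds so 7.5 ms stays an integer. */
static uint32_t usec_to_units(uint32_t usec, uint32_t unit_us)
{
    return usec / unit_us;
}

/* Approximate number of consecutive connection events that fit inside the
 * supervision timeout (slave latency 0), i.e. how many events in a row can be
 * lost before the link times out. */
static uint32_t events_per_timeout(uint32_t interval_units, uint32_t timeout_units)
{
    if (interval_units == 0) {
        return 0;  /* invalid interval; avoid division by zero */
    }
    uint32_t interval_us = interval_units * UNIT_1_25_MS_US;
    uint32_t timeout_us  = timeout_units * UNIT_10_MS_US;
    return timeout_us / interval_us;
}
```

With the values above, a 1 s timeout allows roughly 133 lost events at 7.5 ms but only about 33 at 30 ms; at 30 ms, a 4 s timeout brings the budget back to about 133.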

 

It would be great if someone could point me to the correct parameters or advise how to debug the issue. Does BLE have a queue limit? I ask because the disconnections only happen when messages are sent at a high rate. If there is a queue limit, how do I change it?

Kind Regards,

Thomas

  • Hello Thomas,

    I suspect that when you try to send packets too fast on a bad link (long distance), the packets that are not ACKed because of packet loss remain in the SoftDevice's queue. If the queue is full and you try to queue another packet, the call returns an error such as NRF_ERROR_RESOURCES (the exact name depends on the SDK version you use).

    If this return value is passed to APP_ERROR_CHECK(err_code), the error handler will reset the application. The device that is reset will start to advertise or scan, depending on whether it is the peripheral or the central, but the other device will only see the BLE link go silent, and eventually the link will time out.

    I don't know what your application looks like or which SDK version you are using, so I will use ble_app_uart from SDK 12.3.0 as an example, since you use the nRF51.

    This application sends all data received over UART on the BLE link. This happens in uart_event_handle() in main.c.

    It waits for a '\n' and then calls err_code = ble_nus_string_send(), which returns the result of sd_ble_gatts_hvx().

    If the queue is full, sd_ble_gatts_hvx() returns BLE_ERROR_NO_TX_PACKETS (in this SDK version). The default check of the return value is:

    err_code = ble_nus_string_send(&m_nus, data_array, index);
    if (err_code != NRF_ERROR_INVALID_STATE)
    {
        APP_ERROR_CHECK(err_code);
    }

    If you change the check to if (err_code != NRF_ERROR_INVALID_STATE && err_code != BLE_ERROR_NO_TX_PACKETS), your application will not reset. However, you should use this return value to know that the packet was not queued.

    It is possible to increase the queue size by increasing the MTU size. Note, however, that this only enlarges the buffer: the same issue can still happen if the link is bad and you keep queuing packets. Every packet for which the call returns NRF_SUCCESS is queued and, as long as the link stays up, will eventually be ACKed, because the SoftDevice retransmits it until it is ACKed.
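One way to act on BLE_ERROR_NO_TX_PACKETS instead of ignoring it is to keep messages in a small application-side FIFO and hand them to the stack again when a TX-complete event frees a buffer. The sketch below is hardware-independent and hypothetical: send_fn_t stands in for the real send call (for example ble_nus_string_send()), SEND_BUSY stands in for BLE_ERROR_NO_TX_PACKETS, other error codes are not modelled, and the queue depth, message size, and all names are assumptions. The mock at the end only exists to exercise the logic off-target.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define APP_TX_QUEUE_LEN 8u   /* hypothetical queue depth */
#define APP_TX_MSG_MAX   20u  /* max notification payload with the default 23-byte ATT MTU */

typedef enum { SEND_OK, SEND_BUSY } send_result_t;

/* Stand-in for the real send call; SEND_BUSY plays the role of BLE_ERROR_NO_TX_PACKETS. */
typedef send_result_t (*send_fn_t)(const uint8_t *data, uint16_t len);

typedef struct {
    uint8_t  data[APP_TX_QUEUE_LEN][APP_TX_MSG_MAX];
    uint16_t len[APP_TX_QUEUE_LEN];
    uint32_t head;   /* index of the oldest queued message */
    uint32_t count;  /* number of queued messages */
} app_tx_queue_t;

/* Copy a message into the FIFO; returns false (message dropped) if it is full or too long. */
static bool app_tx_enqueue(app_tx_queue_t *q, const uint8_t *data, uint16_t len)
{
    if (q->count == APP_TX_QUEUE_LEN || len > APP_TX_MSG_MAX) {
        return false;
    }
    uint32_t tail = (q->head + q->count) % APP_TX_QUEUE_LEN;
    memcpy(q->data[tail], data, len);
    q->len[tail] = len;
    q->count++;
    return true;
}

/* Hand queued messages to the stack, oldest first, until it reports busy.
 * Returns how many messages were accepted. */
static uint32_t app_tx_pump(app_tx_queue_t *q, send_fn_t send)
{
    uint32_t sent = 0;
    while (q->count > 0) {
        if (send(q->data[q->head], q->len[q->head]) == SEND_BUSY) {
            break;  /* keep the message; retry on the next TX-complete event */
        }
        q->head = (q->head + 1) % APP_TX_QUEUE_LEN;
        q->count--;
        sent++;
    }
    return sent;
}

/* Off-target mock: pretends the stack has mock_free_buffers TX buffers left. */
static uint32_t mock_free_buffers = 0;

static send_result_t mock_send(const uint8_t *data, uint16_t len)
{
    (void)data;
    (void)len;
    if (mock_free_buffers == 0) {
        return SEND_BUSY;
    }
    mock_free_buffers--;
    return SEND_OK;
}
```

On the target, app_tx_pump() would be called right after app_tx_enqueue() and again from the BLE_EVT_TX_COMPLETE handler. This only moves the backlog into the application: on a link where nothing gets ACKed, the FIFO fills too, and app_tx_enqueue() reports that the message was dropped.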

    Best regards,

    Edvin

    Thanks, Edvin, for the prompt reply.

    Yes, I was able to capture one more stack trace on the central device, and it points to the error BLE_ERROR_NO_TX_PACKETS.

    Our code does not use APP_ERROR_CHECK(err_code), so no resets happen because of that. Below is the code we use to send BLE messages:

    #define MAX_TIMEOUT_COUNT 0xFFFF

    while (ble_nus_c_control_send(&m_ble_nus_c_first_client, data_array, leng) != NRF_SUCCESS)
    {
        // Repeat until sent, or until the timeout counter reaches its maximum count.
        timeout_counter++;
        if (timeout_counter > MAX_TIMEOUT_COUNT)
        {
            break;
        }
    }

    Could you please look at the code above and let us know whether it could cause the TX buffer to fill up quickly?

    We have a couple of questions:

    1. Is it possible to increase the size of the transmit/receive FIFO/buffer, or the number of TX/RX buffers?

    2. Is there a function to clear the TX/RX FIFO/buffer? If so, which function?

    Kind Regards,

    Thomas

  • Hello Edvin,

    This is a continuation of my previous message. I would appreciate a reply as soon as possible, as I am debugging a critical issue in our product.

    Further analysis of the short-range disconnection issue shows that, because of the distance between the central and the peripheral, some packets are lost and we do not get BLE_EVT_TX_COMPLETE/BLE_GATTS_EVT_HVX_TX_COMPLETE for those messages. As a result, the transmit buffers are not freed, and sending more messages fills them up very quickly. This leads to BLE_ERROR_NO_TX_PACKETS.
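The mechanism described here can be modelled as a credit counter: each queued packet consumes one TX buffer, and only a TX-complete event, reporting how many packets completed, returns buffers. The sketch below is a hypothetical application-side mirror of that pool, so the sender can skip the send call instead of spinning when no buffer is free. On S130-era SoftDevices, the initial count is typically read with sd_ble_tx_packet_count_get(), and BLE_EVT_TX_COMPLETE reports a count of completed packets; treat both as assumptions and check the headers of your SDK version.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical application-side mirror of the SoftDevice TX buffer pool. */
typedef struct {
    uint8_t total;  /* buffers available when the link is idle */
    uint8_t free;   /* buffers believed to be free right now */
} tx_credits_t;

static void tx_credits_init(tx_credits_t *c, uint8_t total)
{
    c->total = total;
    c->free  = total;
}

/* Call before sending: returns true (and consumes a credit) if a buffer should be free. */
static bool tx_credits_take(tx_credits_t *c)
{
    if (c->free == 0) {
        return false;  /* pool exhausted: don't call the send function */
    }
    c->free--;
    return true;
}

/* Call from the TX-complete event with the number of packets that completed. */
static void tx_credits_release(tx_credits_t *c, uint8_t count)
{
    uint16_t sum = (uint16_t)c->free + count;
    c->free = (sum > c->total) ? c->total : (uint8_t)sum;
}
```

On a bad link, tx_credits_release() is simply not called, so free stays at zero. That is the situation described above, and no application-side bookkeeping can make the SoftDevice drop packets it has already accepted.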

    Kindly answer the questions below to help us with debugging.

    1. In the above scenario, is it possible to tell the BLE layer to clear the transmit buffers regardless of whether the sent message was successful? Which method should we use, BLE_GATT_HVX_NOTIFICATION or BLE_GATT_HVX_INDICATION?

    2. Can you please suggest aggressive connection parameters (a short supervision timeout relative to the connection interval) so that only a few dropped packets lead to a disconnect? In our case, we will frequently go out of range and expect to reconnect quickly when back in range.

    #define MIN_CONNECTION_INTERVAL     MSEC_TO_UNITS(7.5, UNIT_1_25_MS)    /**< Minimum connection interval (7.5 ms), in units of 1.25 ms. */
    #define MAX_CONNECTION_INTERVAL     MSEC_TO_UNITS(30, UNIT_1_25_MS)     /**< Maximum connection interval (30 ms), in units of 1.25 ms. */
    #define SLAVE_LATENCY               0                                   /**< Slave latency, in number of connection events. */
    #define SUPERVISION_TIMEOUT         MSEC_TO_UNITS(1000, UNIT_10_MS)     /**< Supervision timeout (1 s), in units of 10 ms. */

    3. Please advise whether the NRF_CLOCK_LFCLKSRC values are optimal.

    #define NRF_CLOCK_LFCLKSRC      {.source        = NRF_CLOCK_LF_SRC_RC,               \
                                     .rc_ctiv       = 16,                                \
                                     .rc_temp_ctiv  = 2,                                 \
                                     .xtal_accuracy = NRF_CLOCK_LF_XTAL_ACCURACY_250_PPM}
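For question 2, the Bluetooth Core Specification bounds how short the supervision timeout may be: it must be a multiple of 10 ms between 100 ms and 32 s, and larger than (1 + slave latency) * connection interval * 2, with the connection interval between 7.5 ms and 4 s. The C sketch below checks a parameter set in the same units as the #defines above and computes the smallest legal timeout. The function names are hypothetical, and the rule is evaluated against the maximum interval, which is the conservative choice when the central may pick any value in the range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bluetooth Core Specification limits, in controller units:
 * intervals in 1.25 ms units, supervision timeout in 10 ms units. */
#define CONN_INTERVAL_MIN_UNITS  6u     /* 7.5 ms */
#define CONN_INTERVAL_MAX_UNITS  3200u  /* 4 s */
#define SUP_TIMEOUT_MIN_UNITS    10u    /* 100 ms */
#define SUP_TIMEOUT_MAX_UNITS    3200u  /* 32 s */

/* (1 + latency) * max_interval * 2, in microseconds. */
static uint64_t timeout_floor_us(uint16_t max_int, uint16_t latency)
{
    return (1u + (uint64_t)latency) * max_int * 1250u * 2u;
}

static bool conn_params_valid(uint16_t min_int, uint16_t max_int,
                              uint16_t latency, uint16_t timeout)
{
    if (min_int < CONN_INTERVAL_MIN_UNITS || max_int > CONN_INTERVAL_MAX_UNITS ||
        min_int > max_int) {
        return false;
    }
    if (timeout < SUP_TIMEOUT_MIN_UNITS || timeout > SUP_TIMEOUT_MAX_UNITS) {
        return false;
    }
    /* The timeout must be strictly larger than the floor. */
    return (uint64_t)timeout * 10000u > timeout_floor_us(max_int, latency);
}

/* Smallest legal supervision timeout (10 ms units) for the given max interval
 * and latency. A result above SUP_TIMEOUT_MAX_UNITS means no legal timeout exists. */
static uint16_t min_supervision_timeout(uint16_t max_int, uint16_t latency)
{
    uint64_t units = timeout_floor_us(max_int, latency) / 10000u + 1u;  /* strictly above the floor */
    if (units < SUP_TIMEOUT_MIN_UNITS) {
        units = SUP_TIMEOUT_MIN_UNITS;
    }
    return (uint16_t)(units > UINT16_MAX ? UINT16_MAX : units);
}
```

For a maximum interval of 30 ms with latency 0, the rule alone would allow anything above 60 ms, so the 100 ms lower end of the range is what applies. A timeout that short tolerates only about three lost events at 30 ms, trading robustness at 20 m for faster detection of a real out-of-range condition.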

    Repeating the questions from my previous message:

    1. Is it possible to increase the size of the transmit/receive FIFO/buffer, or the number of TX/RX buffers?

    2. Is there a function to clear the TX/RX FIFO/buffer? If so, which function?

    Kind Regards,

    Thomas

  • Hello Thomas,

    Sorry for the late reply. I was out of the office for a few days.

    Which SDK version do you use?

    If you don't use APP_ERROR_CHECK(), the application shouldn't reset. Can you double-check that the call doesn't return something else? Is it possible to add some logging?

    Queuing up packets shouldn't by itself cause a timeout. It only means that the link sends packets with payload data instead of empty packets. That said, longer packets may increase the chance of packet loss, because there are more bits that can be corrupted.
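The point about longer packets can be made concrete with a simple model: if bit errors are independent with a fixed bit error rate (BER), a packet of n bits arrives intact with probability (1 - BER)^n. The sketch below computes this for unencrypted Bluetooth 4.x link-layer packets on the 1 Mbps PHY (1-byte preamble, 4-byte access address, 2-byte header, payload, 3-byte CRC). The independence assumption is a simplification of real radio conditions, the helper names are hypothetical, and any BER you plug in is illustrative rather than measured.

```c
#include <stdint.h>

/* On-air size of an unencrypted Bluetooth 4.x link-layer packet on the 1 Mbps PHY:
 * preamble (1) + access address (4) + header (2) + payload + CRC (3) bytes. */
static uint32_t ll_packet_bits(uint32_t payload_len)
{
    return (1u + 4u + 2u + payload_len + 3u) * 8u;
}

/* Probability that all `bits` bits arrive without error, assuming independent
 * bit errors at rate `ber`. Uses repeated multiplication to avoid linking libm. */
static double packet_success_prob(double ber, uint32_t bits)
{
    double p = 1.0;
    for (uint32_t i = 0; i < bits; i++) {
        p *= 1.0 - ber;
    }
    return p;
}
```

An empty packet is 80 bits, while a notification carrying 20 bytes of data has a 27-byte link-layer payload (4-byte L2CAP header, 3-byte ATT header, 20 data bytes) and is 296 bits long. At an assumed BER of 1e-4, the model gives roughly 99.2% versus 97.1% per-packet success.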

    Is it possible to take a sniffer trace of the connection when the disconnect happens?

    1. Unfortunately, there is no function call to clear the SoftDevice buffer. The SoftDevice continues to retransmit the packets in the buffer until it gets the ACK.

    2. You can try to increase the supervision timeout (SUPERVISION_TIMEOUT). Most of our examples use 4 seconds.

    3. Clock settings:

    I read in the nRF51 specification that the ANT protocol requires a clock accuracy of 50 ppm, while the RC oscillator only provides 250 ppm. I don't know whether you use ANT (I suspect so, since you use the nRF51422 on one device and the nRF51822 on the other). Either way, you can try reducing rc_ctiv to see whether it makes a difference.
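To see what these clock settings mean in time: in the SoftDevice headers' description of nrf_clock_lf_cfg_t, rc_ctiv is the calibration timer interval in 1/4-second units, and with rc_temp_ctiv in the range 2-33 the RC oscillator is calibrated when the temperature has changed and unconditionally every rc_temp_ctiv intervals; the same header notes that xtal_accuracy is ignored for the RC source. Sleep clock accuracy also feeds the receive-window widening that the Bluetooth Core Specification defines as ((master SCA + slave SCA) / 1000000) * time since the last anchor point, so the window grows the longer packets are missed. The helper names below are hypothetical; verify the field semantics against the nrf_sdm.h shipped with your SoftDevice.

```c
#include <stdint.h>

/* rc_ctiv: calibration timer interval in 1/4-second units. */
static uint32_t rc_calib_interval_ms(uint8_t rc_ctiv)
{
    return (uint32_t)rc_ctiv * 250u;
}

/* With rc_temp_ctiv in 2..33, calibration is forced every rc_temp_ctiv
 * calibration intervals even if the temperature has not changed. */
static uint32_t rc_forced_calib_ms(uint8_t rc_ctiv, uint8_t rc_temp_ctiv)
{
    return rc_calib_interval_ms(rc_ctiv) * rc_temp_ctiv;
}

/* Receive-window widening around the expected anchor point, in microseconds,
 * from both devices' sleep clock accuracies (ppm). */
static uint32_t window_widening_us(uint32_t master_sca_ppm, uint32_t slave_sca_ppm,
                                   uint32_t time_since_anchor_us)
{
    uint64_t scaled = (uint64_t)(master_sca_ppm + slave_sca_ppm) * time_since_anchor_us;
    return (uint32_t)(scaled / 1000000u);
}
```

With rc_ctiv = 16 and rc_temp_ctiv = 2, this gives a 4 s calibration interval and forced calibration every 8 s. With 250 ppm on both sides, the widening is about 15 microseconds after one 30 ms interval but 500 microseconds after 1 s without a received packet; with 50 ppm on both sides, the 1 s figure drops to 100 microseconds.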

    Regarding your previous message:

    1. In general, yes, but to be honest, I don't think it would help with this issue. It may increase the throughput of the link, but as long as not a single packet is received within the supervision timeout, it doesn't really matter. To increase the buffer, you must increase the MTU, which is done in the ble_stack_init() function. However, since you use the S130 SoftDevice (nRF51), this is not supported.

    2. No.
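Since the SoftDevice buffer cannot be cleared, one application-side option, if each message carries the complete current button state rather than an event that must not be lost, is to hold only the newest unsent message and let it overwrite older ones while the stack is busy. The sketch below illustrates this "latest value wins" idea; try_send_fn_t stands in for the real send call, the size limit and all names are hypothetical, and the mock only exists to exercise the logic off-target.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LATEST_MAX_LEN 20u  /* hypothetical maximum message size */

/* Holds only the most recent unsent message. */
typedef struct {
    uint8_t  data[LATEST_MAX_LEN];
    uint16_t len;
    bool     pending;
} latest_slot_t;

/* Returns true if the stack accepted the message. */
typedef bool (*try_send_fn_t)(const uint8_t *data, uint16_t len);

/* Store a new message, overwriting any older one that has not been sent yet. */
static bool latest_slot_put(latest_slot_t *s, const uint8_t *data, uint16_t len)
{
    if (len > LATEST_MAX_LEN) {
        return false;
    }
    memcpy(s->data, data, len);
    s->len = len;
    s->pending = true;
    return true;
}

/* Try to hand the pending message to the stack; keep it if the stack is busy. */
static bool latest_slot_flush(latest_slot_t *s, try_send_fn_t try_send)
{
    if (!s->pending || !try_send(s->data, s->len)) {
        return false;
    }
    s->pending = false;
    return true;
}

/* Off-target mock: accepts a message only while mock_accept is true. */
static bool    mock_accept = false;
static uint8_t mock_last   = 0;

static bool mock_try_send(const uint8_t *data, uint16_t len)
{
    if (!mock_accept || len == 0) {
        return false;
    }
    mock_last = data[len - 1];
    return true;
}
```

Calling latest_slot_flush() on each button event and from the TX-complete handler keeps at most one message waiting in the application, so a burst of presses on a poor link does not turn into a burst of queued packets.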

    As mentioned, is it possible to get a sniffer trace? A trace taken with nRF Sniffer is sufficient.

    Best regards,

    Edvin
