
Random disconnection when messages are sent from central to peripheral at high rate.

hello there!

I am working on a project in which two devices (one central, one peripheral) use Nordic Bluetooth controllers for connection and data exchange: an nRF51822 for the peripheral and an nRF51422 for the central. I get random disconnections when I send too many messages from the remote (central) to the peripheral.

The short-range disconnection happens when the buttons on the remote are pressed rapidly and at random at a distance of 20 metres from the peripheral. The issue does not happen when the same buttons are pressed just as fast at close range (5-10 m), or when they are pressed more slowly at long range (> 30 m).

The analysis suggests that when the rate of packets/messages sent from the remote to the wheel (the peripheral) is very high, some of the packets are lost and the SoftDevice triggers a disconnection event.

From the stack trace collected at the point of disconnection, the reason is BLE_HCI_CONNECTION_TIMEOUT, which happens because the supervision timeout is hit (no packets are received from the peer for x seconds).

The stack traces were collected at both the remote (central) and the peripheral, and both point to BLE_HCI_CONNECTION_TIMEOUT. The stack traces (remote and wheel) are attached.

Below are the BLE connection interval, supervision timeout, and low-frequency clock source settings:

#define MIN_CONNECTION_INTERVAL     MSEC_TO_UNITS(7.5, UNIT_1_25_MS)    /**< Determines minimum connection interval in milliseconds. */
#define MAX_CONNECTION_INTERVAL     MSEC_TO_UNITS(30, UNIT_1_25_MS)     /**< Determines maximum connection interval in milliseconds. */
#define SLAVE_LATENCY               0                                   /**< Determines slave latency in counts of connection events. */
#define SUPERVISION_TIMEOUT         MSEC_TO_UNITS(1000, UNIT_10_MS)     /**< Determines supervision time-out in units of 10 milliseconds. */

#define NRF_CLOCK_LFCLKSRC      {.source        = NRF_CLOCK_LF_SRC_RC,               \
                                 .rc_ctiv       = 16,                                \
                                 .rc_temp_ctiv  = 2,                                 \
                                 .xtal_accuracy = NRF_CLOCK_LF_XTAL_ACCURACY_250_PPM}
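
As a sanity check on the values above, the sketch below converts them to link-layer units and estimates how many connection events fit inside the supervision timeout. The macros are local stand-ins, assumed to have the same form as the SDK's definitions in app_util.h, and events_before_timeout() and print_connection_budget() are hypothetical helper names.

```c
#include <stdint.h>
#include <stdio.h>

/* Local stand-ins for the SDK macros (assumption: same form as in app_util.h). */
#define UNIT_1_25_MS 1250   /* microseconds per 1.25 ms unit */
#define UNIT_10_MS   10000  /* microseconds per 10 ms unit */
#define MSEC_TO_UNITS(TIME, RESOLUTION) (((TIME) * 1000) / (RESOLUTION))

/* Hypothetical helper: whole connection events that fit in the supervision timeout. */
uint32_t events_before_timeout(uint32_t timeout_us, uint32_t interval_us)
{
    return timeout_us / interval_us;
}

/* Prints the unit values and the event budget for the settings quoted above. */
void print_connection_budget(void)
{
    printf("min interval: %u units\n", (unsigned)MSEC_TO_UNITS(7.5, UNIT_1_25_MS));
    printf("max interval: %u units\n", (unsigned)MSEC_TO_UNITS(30, UNIT_1_25_MS));
    printf("supervision timeout: %u units\n", (unsigned)MSEC_TO_UNITS(1000, UNIT_10_MS));
    printf("events before timeout at 7.5 ms: %u\n",
           (unsigned)events_before_timeout(1000000, 7500));
    printf("events before timeout at 30 ms: %u\n",
           (unsigned)events_before_timeout(1000000, 30000));
}
```

With these settings, a link running at the 30 ms interval gets roughly 33 connection events before the 1 s supervision timeout expires, and about 133 at 7.5 ms. Every one of them has to fail for the timeout to fire.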

 

It would be great if someone could point me to the correct parameters or explain how to debug the issue. Does BLE have a queue limit, given that the disconnection happens when messages are sent at a high rate? If there is a queue limit, how do I change it?

Kind Regards,

Thomas

  • 1: no

    2: not possible to clear the buffer.

    3: no.

    What SDK version do you use?

    When a message is queued it is not possible to remove it before it is ACKed. 

    You can wait for the TX_COMPLETE event. The name of this event depends on the SDK version that you use. Therefore I keep asking what SDK version you use. What SDK version do you use?

    You can use these events to know when you have some free space in the buffer.

    Note that if you use indication instead of notification, the message has to be ACKed in the application layer of the receiver (central). If this is not done, the buffer will not be cleared. 

  • Hi Edvin,
    Thanks for the responses.
    We are using SDK version 12.3. The reason for the disconnection is BLE_HCI_CONNECTION_TIMEOUT (0x08).
    As you mentioned in earlier mail, TX buffer getting full does not result in disconnection event.
    1, In what scenarios can BLE_HCI_CONNECTION_TIMEOUT occur? Does it happen only when out of range?
    2, Also, when there is a BLE_HCI_CONNECTION_TIMEOUT, do we get BLE_GAP_EVT_TIMEOUT? If we get BLE_GAP_EVT_TIMEOUT, how should it be handled? Should we call sd_ble_gap_disconnect() or start scanning again?
    3, Is it possible to use neither indication nor notification, so that the buffers are cleared without an ACK being received? Can we tell the link layer not to use indication or notification?
    4, I tried increasing the TX buffer width in my central device to 6 (the default was 3).
    In ble_stack_init(), I tried changing the TX buffer width from 3 to 6 using the methods below.
    a, 1st method

     ble_conn_bw_counts_t conn_bw_counts = {
      .tx_counts = {.high_count = 1, .mid_count = 0, .low_count = 0},
      .rx_counts = {.high_count = 1, .mid_count = 0, .low_count = 0}
     };
     ble_enable_params.common_enable_params.p_conn_bw_counts = &conn_bw_counts;
    b, 2nd method
     /*Configure bandwidth */
     ble_opt_t ble_opt;
     ble_common_opt_conn_bw_t conn_bw;
     memset(&conn_bw, 0x00, sizeof(conn_bw));
     memset(&ble_opt, 0x00, sizeof(ble_opt));
     // if this set to mid this will work but setting it to high will not
     conn_bw.conn_bw.conn_bw_rx = BLE_CONN_BW_HIGH;
     conn_bw.conn_bw.conn_bw_tx = BLE_CONN_BW_HIGH;
     err_code = sd_ble_opt_set(BLE_COMMON_OPT_CONN_BW, &ble_opt);
     APP_ERROR_CHECK(err_code);

    Even after trying to set the bandwidth to 6 with the methods above, the bandwidth does not change. I verified this using sd_ble_tx_packet_count_get() and could see that the value is still 3:
     uint8_t p_count = 0;
     sd_ble_tx_packet_count_get(p_ble_nus_c->conn_handle, &p_count);
    5, How do we check these events to know when there is free space in the buffer?
    Thanks in advance for the responses, which are helping us solve a critical issue.

    Thanks,
    Thomas
  • Hello Thomas,

    Let me be clear, to avoid confusion.

     

    tpoly said:
    As you mentioned in earlier mail, TX buffer getting full does not result in disconnection event.

     This is correct. This in itself doesn't cause a disconnect, but if you call:

    err_code = ble_nus_string_send(...) // or any other function that queues a packet to the TX queue
    APP_ERROR_CHECK(err_code);

    and err_code is NRF_ERROR_RESOURCES (which is returned if the queue is full), then passing this value into APP_ERROR_CHECK() will reset the device. The device that resets will start from scratch and begin advertising. The device that was connected to it will not receive any disconnect message, and will keep listening for packets from the device that was reset.
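
    One way to keep a full queue from reaching the fatal error handler is to classify the return code first. The sketch below is only an illustration: the SEND_* values are hypothetical stand-ins for the SoftDevice return codes (the real ones come from the SDK headers), and classify_send_result() is an invented name.

    ```c
    #include <stdint.h>

    /* Hypothetical stand-ins for the return codes discussed above;
     * the real values are defined in the SDK headers. */
    enum {
        SEND_OK         = 0,  /* stands in for NRF_SUCCESS */
        SEND_QUEUE_FULL = 1,  /* stands in for the "queue is full" code */
        SEND_BAD_STATE  = 2   /* stands in for any other error */
    };

    typedef enum {
        ACTION_DONE,         /* packet was queued */
        ACTION_RETRY_LATER,  /* keep the packet and retry after a TX-complete event */
        ACTION_REPORT_ERROR  /* unexpected: hand over to the error handler */
    } send_action_t;

    /* Decide what to do with the result of a queueing call. */
    send_action_t classify_send_result(uint32_t err_code)
    {
        switch (err_code) {
        case SEND_OK:
            return ACTION_DONE;
        case SEND_QUEUE_FULL:
            return ACTION_RETRY_LATER;
        default:
            return ACTION_REPORT_ERROR;
        }
    }
    ```

    Only ACTION_REPORT_ERROR would then be forwarded to something like APP_ERROR_CHECK(); a full queue is treated as a normal back-pressure signal.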

    Your questions:

    1. BLE_HCI_CONNECTION_TIMEOUT occurs if the device doesn't receive any complete packets for the duration of the supervision timeout (by default 4 seconds in most of the examples). 

    2. When you get this event you are already disconnected, so no need to call the disconnect function. You can start scanning again.

    3. You can avoid using indication or notification, but then you will have to trigger manual readings from the central. The throughput will go down drastically, and I don't think it will solve your issue. Every packet sent over the air is ACKed by the SoftDevice. So a read request message will also be ACKed, and retransmitted if it isn't. This is from the BLE specification.

    4/5:

    When you want to send data, you just queue them up in the buffer, using the hvx function call: sd_ble_gatts_hvx().

    If this returns NRF_SUCCESS, the packet is queued. If it returns something else, don't send that value into APP_ERROR_CHECK. See the descriptions of the return values in ble_gatts.h (lines 507-555) for what each one means.

    If the buffer is full, you must wait for the BLE_EVT_TX_COMPLETE event (you must add this to your ble_evt_handler() ). When this event is received, it means that a packet was ACKed, and you can queue more. Repeat this pattern.
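
    The queue-then-wait pattern described above can be sketched on the host with a mocked link queue. Everything here is an assumption made for illustration: LINK_TX_SLOTS mirrors the count of 3 reported earlier in the thread, link_try_queue() stands in for the real SoftDevice call, and on_tx_complete() is what the BLE event handler would call when a TX-complete event arrives.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    #define LINK_TX_SLOTS 3   /* assumption: mirrors the count of 3 reported above */
    #define APP_QUEUE_LEN 16  /* hypothetical application-side backlog size */

    static uint8_t link_in_flight;            /* packets queued but not yet ACKed */
    static uint8_t app_queue[APP_QUEUE_LEN];  /* backlog, one byte per message here */
    static uint8_t app_head;
    static uint8_t app_count;

    /* Mock of the stack's queueing call: succeeds only while a slot is free. */
    bool link_try_queue(uint8_t msg)
    {
        (void)msg;
        if (link_in_flight >= LINK_TX_SLOTS) {
            return false;
        }
        link_in_flight++;
        return true;
    }

    /* Hand over as many backlog messages as the link will accept. */
    void drain_app_queue(void)
    {
        while (app_count > 0 && link_try_queue(app_queue[app_head])) {
            app_head = (uint8_t)((app_head + 1) % APP_QUEUE_LEN);
            app_count--;
        }
    }

    /* Application entry point: returns false only when the backlog is full. */
    bool app_send(uint8_t msg)
    {
        if (app_count == APP_QUEUE_LEN) {
            return false;
        }
        app_queue[(app_head + app_count) % APP_QUEUE_LEN] = msg;
        app_count++;
        drain_app_queue();
        return true;
    }

    /* Call when a TX-complete event reports that 'acked' packets were ACKed. */
    void on_tx_complete(uint8_t acked)
    {
        link_in_flight = (acked > link_in_flight) ? 0 : (uint8_t)(link_in_flight - acked);
        drain_app_queue();
    }
    ```

    With this shape the application never spins on a full queue: messages wait in the backlog and are pushed out from the event handler, so the CPU can sleep between events.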

    Best regards,

    Edvin

  • Thanks, Edvin, for the responses.

    We do not use APP_ERROR_CHECK(err_code) when sending messages, so we can rule out the device being reset by APP_ERROR_CHECK().
    In our code we use ble_nus_c_control_send() to send messages:

    timeout_counter = 0;
    while (ble_nus_c_control_send(&m_ble_nus_c_first_client, data_array, leng) != NRF_SUCCESS)
    {
        // Repeat until sent or until the timeout counter reaches its maximum count.
        timeout_counter++;
        if (timeout_counter > MAX_TIMEOUT_COUNT_SEND_TO_WHEELS)
        {
            break;
        }
    }
    and ble_nus_c_control_send() internally calls
    sd_ble_gattc_write(p_ble_nus_c->conn_handle, &write_params)
    1, When we send messages slowly, we do not get disconnections at range (20-50 m). But when the messages are sent at a faster rate, we get disconnections even at close range (15 m).

    2, We are not sure why the disconnection happens at close range, and we are trying to understand how a disconnection can occur there. The stack trace we collected
       shows that the disconnection is due to BLE_HCI_CONNECTION_TIMEOUT. The stack trace is attached for your reference.
    We are trying to understand the root cause of the issue before we implement the method you suggested:
    "If the buffer is full, you must wait for the BLE_EVT_TX_COMPLETE event (you must add this to your ble_evt_handler() ). When this event is received, it means that a packet was ACKed, and you can queue more. Repeat this pattern."
    Please share your thoughts on the possible root causes of the disconnection at close range, which does not happen when messages are sent at a slow rate.
    Thanks in advance,
    Thomas
  • Hello Thomas,

    I don't know your implementation of ble_nus_c_control_send(), or when it returns NRF_SUCCESS and when it returns something else, but I assume you use it to keep sending until all your packets are sent. By doing this, your nRF will not go to sleep until the timeout is reached or all the packets are queued (via sd_ble_gattc_write()).

    When sd_ble_gattc_write returns NRF_ERROR_RESOURCES, you should wait for a BLE_EVT_TX_COMPLETE event (you need to add this to your ble_evt_handler() in main.c) which means that a packet is ACKed, and you have freed up space in the queue. Use this event to queue another packet, as you can then go to sleep in between queuing packets. This is just a general tip, though. It shouldn't affect your disconnect reason.

    Can you try to monitor these events, the BLE_EVT_TX_COMPLETE? Do you not receive these in the seconds before the disconnect?

    Regarding the nRF Sniffer:

    I assume you have seen the user guide. Can you check that you have done everything in the setup section, section 2?

    Regarding disconnection due to timeout, the chances of this may be increased when sending large packets. When you don't send any packets, each device will send an empty packet on the connection events (every connection interval). These are ACKed, and it resets the timeout timer. When you send a long packet, it increases the chance of flipping bits, and thus not being ACKed. Especially on long ranges. However, the other device (peripheral in your case, I believe), will still send empty packets, which will be ACKed, and this should reset this timer. 
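
    The timer behaviour described above can be modelled in a few lines. This is a simplified sketch rather than the SoftDevice's actual logic: it assumes time advances in whole connection intervals, and supervision_timeout_event() is a hypothetical name.

    ```c
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Returns the index of the connection event at which the supervision timeout
     * expires, or -1 if the link survives all events. received[i] is true when any
     * valid packet (data or empty) arrived in connection event i. */
    int supervision_timeout_event(const bool *received, size_t n_events,
                                  uint32_t interval_ms, uint32_t timeout_ms)
    {
        uint32_t silent_ms = 0;  /* time since the last valid packet */

        for (size_t i = 0; i < n_events; i++) {
            if (received[i]) {
                silent_ms = 0;   /* any valid packet resets the timer */
            } else {
                silent_ms += interval_ms;
                if (silent_ms >= timeout_ms) {
                    return (int)i;
                }
            }
        }
        return -1;
    }
    ```

    In this model, with a 30 ms interval and a 1000 ms timeout, 34 consecutive events have to be missed before the link drops, while one valid packet every tenth event keeps it alive indefinitely. That is why a sniffer trace showing whether the devices hear each other at all is the more telling measurement.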

    A sniffer trace should say whether they can hear each other or not. And do you get the TX complete events in the seconds before the disconnection?

    Best regards,

    Edvin
