
Random disconnection when messages are sent from central to peripheral at high rate.

hello there!

I am working on a project in which two devices (one central, one peripheral) use Nordic Bluetooth controllers for connection and data exchange: an nRF51822 for the peripheral and an nRF51422 for the central. I get random disconnections when I send too many messages from the remote (central) to the peripheral.

The disconnection happens when the buttons on the remote are pressed quickly and at random at a distance of about 20 m from the peripheral. The issue does not occur when the same buttons are pressed just as fast at close range (5-10 m), or when they are pressed more slowly at long range (> 30 m).

The analysis suggests that when the rate of packets/messages sent from the remote to the wheel (the peripheral) is very high, some packets are lost and the SoftDevice triggers a disconnection event.

From the stack trace collected at the point of disconnection, the reason is BLE_HCI_CONNECTION_TIMEOUT, which happens when the supervision timeout is hit (no packets are received from the peer for x seconds).

The stack traces were collected at both the remote (central) and the peripheral, and both point to BLE_HCI_CONNECTION_TIMEOUT. The stack traces (remote and wheel) are attached.

Below are the BLE connection interval and supervision timeout

#define MIN_CONNECTION_INTERVAL     MSEC_TO_UNITS(7.5, UNIT_1_25_MS)    /**< Determines minimum connection interval in millisecond. */
#define MAX_CONNECTION_INTERVAL     MSEC_TO_UNITS(30, UNIT_1_25_MS)     /**< Determines maximum connection interval in millisecond. */
#define SLAVE_LATENCY               0                                   /**< Determines slave latency in counts of connection events. */
#define SUPERVISION_TIMEOUT         MSEC_TO_UNITS(1000, UNIT_10_MS)     /**< Determines supervision time-out in units of 10 millisecond. */

#define NRF_CLOCK_LFCLKSRC      {.source        = NRF_CLOCK_LF_SRC_RC,            \
                                 .rc_ctiv       = 16,                                \
                                 .rc_temp_ctiv  = 2,                                \
                                 .xtal_accuracy = NRF_CLOCK_LF_XTAL_ACCURACY_250_PPM}
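
For reference, the arithmetic behind these settings can be sketched as below. The macro and unit constants are re-declared locally so the snippet builds without the SDK; they are assumed to mirror the definitions in the SDK's app_util.h. `events_before_timeout()` is a hypothetical helper, not an SDK function.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies so the sketch builds standalone; assumed to match app_util.h. */
#define UNIT_1_25_MS 1250   /* microseconds per connection-interval unit */
#define UNIT_10_MS   10000  /* microseconds per supervision-timeout unit */
#define MSEC_TO_UNITS(TIME, RESOLUTION) (((TIME) * 1000) / (RESOLUTION))

/* Hypothetical helper: number of whole connection events that fit inside
 * the supervision timeout, given both values in their BLE units. */
uint32_t events_before_timeout(uint32_t conn_interval_units,
                               uint32_t sup_timeout_units)
{
    assert(conn_interval_units > 0);  /* guard against division by zero */
    uint32_t interval_us = conn_interval_units * UNIT_1_25_MS;
    uint32_t timeout_us  = sup_timeout_units * UNIT_10_MS;
    return timeout_us / interval_us;
}
```

With the values above, the 7.5 ms and 30 ms intervals become 6 and 24 units, and the 1000 ms timeout becomes 100 units. That corresponds to roughly 133 consecutive connection events at the minimum interval, or 33 at the maximum, passing without a valid packet before the link is dropped.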

 

It would be great if someone could point me to the correct parameters or explain how to debug the issue. Does BLE have a queue limit, given that the disconnection happens when messages are sent at a high rate? If there is a queue limit, how do I change it?

Kind Regards,

Thomas

  • Hi Edwin,
    Thanks for your inputs.

    To summarize the issue: the TX buffers fill up when messages are sent at a high rate, and this results in a disconnection event.
    1, Is it possible to tell the BLE layer to clear the transmit buffers regardless of whether the message was sent successfully? Which method should be used, BLE_GATT_HVX_NOTIFICATION or BLE_GATT_HVX_INDICATION?
    2, Please suggest a way to clear the buffer regardless of whether an ACK has been received for the messages.
    3, Does configuring BLE_GATT_HVX_NOTIFICATION mean that the buffers are cleared and no ACKs are needed to clear the TX buffer?
    Please answer the above queries.
    Thanks,
    Thomas
  • 1: no

    2: not possible to clear the buffer.

    3: no.

    What SDK version do you use?

    When a message is queued it is not possible to remove it before it is ACKed. 

    You can wait for the TX_COMPLETE event. The name of this event depends on the SDK version, which is why I keep asking which SDK version you use.

    You can use these events to know when you have some free space in the buffer.

    Note that if you use indication instead of notification, the message has to be ACKed in the application layer of the receiver (central). If this is not done, the buffer will not be cleared. 

  • Hi Edwin,
    Thanks for the responses.
    We are using SDK version 12.3. The reason for disconnection is BLE_HCI_CONNECTION_TIMEOUT(0x08)
    As you mentioned in earlier mail, TX buffer getting full does not result in disconnection event.
    1, In what scenarios can BLE_HCI_CONNECTION_TIMEOUT occur? Does it happen only when out of range?
    2, Also, when there is a BLE_HCI_CONNECTION_TIMEOUT, do we get BLE_GAP_EVT_TIMEOUT? If so, how should it be handled: should we call sd_ble_gap_disconnect() or start scanning again?
    3, Is it possible to use neither indication nor notification, so that the buffers are cleared without an ACK being received? Can we tell the link layer not to use indication or notification?
    4, I tried increasing the TX buffer width in my central device to 6 (the default was 3).
    In ble_stack_init(), I tried changing the TX buffer width from 3 to 6 using the methods below.
    a,  1st method

     ble_conn_bw_counts_t conn_bw_counts = {
      .tx_counts = {.high_count = 1, .mid_count = 0, .low_count = 0},
      .rx_counts = {.high_count = 1, .mid_count = 0, .low_count = 0}
     };
     ble_enable_params.common_enable_params.p_conn_bw_counts = &conn_bw_counts;
    b,2nd method
     /*Configure bandwidth */
     ble_opt_t ble_opt;
     ble_common_opt_conn_bw_t conn_bw;
     memset(&conn_bw, 0x00, sizeof(conn_bw));
     memset(&ble_opt, 0x00, sizeof(ble_opt));
     // if this set to mid this will work but setting it to high will not
     conn_bw.conn_bw.conn_bw_rx = BLE_CONN_BW_HIGH;
     conn_bw.conn_bw.conn_bw_tx = BLE_CONN_BW_HIGH;
     err_code = sd_ble_opt_set(BLE_COMMON_OPT_CONN_BW, &ble_opt);
     APP_ERROR_CHECK(err_code);

    Even after trying to set the bandwidth to 6 with the methods above, the value does not change. I verified this with sd_ble_tx_packet_count_get() and the value is still 3:
     uint8_t p_count = 0;
     sd_ble_tx_packet_count_get(p_ble_nus_c->conn_handle, &p_count);
    5, How do I check these events to know when there is free space in the buffer?
    Thanks in advance for the responses to help us solve a critical issue.

    Thanks,
    Thomas
  • Hello Thomas,

    Let me be clear, to avoid confusions.

     

    tpoly said:
    As you mentioned in earlier mail, TX buffer getting full does not result in disconnection event.

     This is correct. This in itself doesn't cause a disconnect, but if you call:

    err_code = ble_nus_string_send(...) // or any other function that queues a packet to the TX queue
    APP_ERROR_CHECK(err_code);

    and err_code is NRF_ERROR_RESOURCES (which is returned if the queue is full), then passing this value into APP_ERROR_CHECK() will reset the device. The device that resets will start from scratch and begin advertising again. The device that was connected to it will not receive any disconnect message, and will keep listening for packets from the device that was reset.

    Your questions:

    1. BLE_HCI_CONNECTION_TIMEOUT occurs if the device doesn't receive any complete packets for the duration of the supervision timeout (by default 4 seconds in most of the examples). 

    2. When you get this event you are already disconnected, so no need to call the disconnect function. You can start scanning again.

    3. You can avoid using indication or notification, but then you will have to trigger manual readings from the central. The throughput will go down drastically, and I don't think it will solve your issue. Every packet sent over the air is ACKed by the SoftDevice. So a read request message will also be ACKed, and retransmitted if it isn't. This is from the BLE specification.

    4/5:

    When you want to send data, you just queue them up in the buffer, using the hvx function call: sd_ble_gatts_hvx().

    If this returns NRF_SUCCESS, the packet is queued. If it returns something else, DON'T pass that value into APP_ERROR_CHECK. Look at the description of the return values in ble_gatts.h (lines 507-555) to see what the different return values mean.

    If the buffer is full, you must wait for the BLE_EVT_TX_COMPLETE event (you must add this to your ble_evt_handler() ). When this event is received, it means that a packet was ACKed, and you can queue more. Repeat this pattern.
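
The queue-then-wait pattern described above can be sketched roughly as follows. This is a host-side simulation, not SoftDevice code: `stack_queue_packet()` stands in for sd_ble_gatts_hvx() or sd_ble_gattc_write(), `on_tx_complete()` stands in for the BLE_EVT_TX_COMPLETE branch of the event handler, and the queue depth of 3, the backlog size and the `SIM_*` return values are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SIM_SUCCESS         0u   /* stands in for NRF_SUCCESS */
#define SIM_ERROR_RESOURCES 1u   /* stands in for NRF_ERROR_RESOURCES; value is illustrative */
#define SIM_TX_QUEUE_DEPTH  3u   /* assumed number of stack TX buffers */
#define APP_FIFO_SIZE       16u  /* hypothetical application-side backlog */

static uint8_t m_stack_queued;         /* packets currently held by the simulated stack */
static uint8_t m_fifo[APP_FIFO_SIZE];  /* one byte per message, for brevity */
static uint8_t m_head, m_tail, m_count;

/* Simulated stack call: accepts a packet unless all TX buffers are in use. */
uint32_t stack_queue_packet(uint8_t msg)
{
    (void)msg;
    if (m_stack_queued >= SIM_TX_QUEUE_DEPTH) return SIM_ERROR_RESOURCES;
    m_stack_queued++;
    return SIM_SUCCESS;
}

/* Hand the backlog to the stack until it is empty or the stack is full.
 * A full stack is not an error: return and wait for the next TX-complete. */
void app_drain(void)
{
    while (m_count > 0 && stack_queue_packet(m_fifo[m_head]) == SIM_SUCCESS)
    {
        m_head = (uint8_t)((m_head + 1u) % APP_FIFO_SIZE);
        m_count--;
    }
}

/* Application send: store the message, then try to push it.
 * Returns false only when the application backlog itself is full. */
bool app_send(uint8_t msg)
{
    assert(m_count <= APP_FIFO_SIZE);
    if (m_count == APP_FIFO_SIZE) return false;
    m_fifo[m_tail] = msg;
    m_tail = (uint8_t)((m_tail + 1u) % APP_FIFO_SIZE);
    m_count++;
    app_drain();
    return true;
}

/* Call from the TX-complete event: `freed` packets were ACKed, so queue more. */
void on_tx_complete(uint8_t freed)
{
    if (freed > m_stack_queued) freed = m_stack_queued;
    m_stack_queued = (uint8_t)(m_stack_queued - freed);
    app_drain();
}

uint8_t app_backlog(void)     { return m_count; }
uint8_t stack_in_flight(void) { return m_stack_queued; }
```

In a real application the main loop can sleep between events instead of spinning on the send call, because the next packet is queued from the event handler as soon as a buffer is freed.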

    Best regards,

    Edvin

  • Thanks Edwin for the responses.

    We do not use APP_ERROR_CHECK(err_code) when sending messages, so we can rule out a device reset caused by APP_ERROR_CHECK().
    In our code we use ble_nus_c_control_send() to send messages:

    timeout_counter = 0;
    while (ble_nus_c_control_send(&m_ble_nus_c_first_client, data_array, leng) != NRF_SUCCESS)
    {
        // repeat until sent or until timeout counter reached maximum count
        timeout_counter++;
        if (timeout_counter > MAX_TIMEOUT_COUNT_SEND_TO_WHEELS)
        {
            break;
        }
    }
    and ble_nus_string_send() internally calls
    sd_ble_gattc_write(p_ble_nus_c->conn_handle, &write_params)
    1, When we send messages slowly, we do not get disconnections at range (20-50 m). But when the messages are sent at a faster rate, we get disconnections even at close range (15 m).

    2, We are not sure why the disconnection happens at close range, and we are trying to understand how it can occur. The stack trace collected shows that the disconnection is due to BLE_HCI_CONNECTION_TIMEOUT. The stack trace is attached for your reference.
    We are trying to understand the root cause of the issue before we implement the method you suggested:
    "If the buffer is full, you must wait for the BLE_EVT_TX_COMPLETE event (you must add this to your ble_evt_handler() ). When this event is received, it means that a packet was ACKed, and you can queue more. Repeat this pattern."
    Please share your thoughts on the possible root causes of the disconnection at close range, which does not happen when messages are sent at a slow rate.
    Thanks in advance,
    Thomas
  • Hello Thomas,

    I don't know your implementation of ble_nus_c_control_send(), or when it returns NRF_SUCCESS and when it returns something else, but I assume you use this implementation to keep sending until you have sent all your packets. By doing this, your nRF will not go to sleep until the timeout is reached, or until all the packets are queued (sd_ble_gattc_write()).

    When sd_ble_gattc_write returns NRF_ERROR_RESOURCES, you should wait for a BLE_EVT_TX_COMPLETE event (you need to add this to your ble_evt_handler() in main.c) which means that a packet is ACKed, and you have freed up space in the queue. Use this event to queue another packet, as you can then go to sleep in between queuing packets. This is just a general tip, though. It shouldn't affect your disconnect reason.

    Can you try to monitor these events, the BLE_EVT_TX_COMPLETE? Do you not receive these in the seconds before the disconnect?

    Regarding the nRF Sniffer:

    I assume you have seen the user guide. Can you check that you have done everything in the setup section, section 2?

    Regarding disconnection due to timeout, the chances of this may be increased when sending large packets. When you don't send any packets, each device will send an empty packet on the connection events (every connection interval). These are ACKed, and it resets the timeout timer. When you send a long packet, it increases the chance of flipping bits, and thus not being ACKed. Especially on long ranges. However, the other device (peripheral in your case, I believe), will still send empty packets, which will be ACKed, and this should reset this timer. 
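
The timer behaviour described above can be illustrated with a small model. This is a simplified sketch, not how the SoftDevice is implemented: `first_timeout_event()` is a hypothetical function, and it assumes one receive opportunity per connection event and a timer that restarts on any valid packet, data or empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns the index of the connection event at which the supervision timeout
 * expires, or -1 if the link survives the whole trace.
 * rx_ok[i] is true when a valid packet (data or empty) was received in event i.
 */
int first_timeout_event(const bool *rx_ok, size_t n_events,
                        uint32_t interval_ms, uint32_t timeout_ms)
{
    assert(interval_ms > 0);
    uint32_t silent_ms = 0;  /* time elapsed since the last valid packet */

    for (size_t i = 0; i < n_events; i++)
    {
        if (rx_ok[i])
        {
            silent_ms = 0;             /* any valid packet restarts the timer */
        }
        else
        {
            silent_ms += interval_ms;  /* another interval passed in silence */
            if (silent_ms >= timeout_ms)
            {
                return (int)i;
            }
        }
    }
    return -1;
}
```

In this model, a link that loses every second packet never times out, while a 30 ms interval with a 1000 ms timeout needs 34 consecutive missed events before the disconnect. That is why a sniffer trace showing whether the devices hear each other at all is more informative than the TX queue depth.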

    A sniffer trace should say whether they can hear each other or not. And do you get the TX complete events in the seconds before the disconnection?

    Best regards,

    Edvin

  • Hi Edwin,
    Thanks for the responses.
    I tried to increase the TX buffer in two ways. In the first method, I call sd_ble_opt_set() during ble_stack_init(),
    and in the second method I call sd_ble_opt_set() when the connection is made (BLE_GAP_EVT_ADV_REPORT). Our product has a central and two peripherals, so sd_ble_opt_set() is called for each connection.

    1, 1st method
    static void ble_stack_init(void)
    {
        uint32_t err_code;
        NRF_CLOCK->EVENTS_HFCLKSTARTED = 0;
        NRF_CLOCK->TASKS_HFCLKSTART = 1;
        uint32_t count = 0;
        do
        {
            count++;
            if (count > 0xFFFF)
            {
                break; // timeout count in a while loop
            }
        } while (NRF_CLOCK->EVENTS_HFCLKSTARTED == 0);

        nrf_clock_lf_cfg_t clock_lf_cfg = NRF_CLOCK_LFCLKSRC;
        // Initialize the SoftDevice handler module.
        SOFTDEVICE_HANDLER_INIT(&clock_lf_cfg, NULL);
        ble_enable_params_t ble_enable_params;
        err_code = softdevice_enable_get_default_config(NRF_BLE_CENTRAL_LINK_COUNT,
                                                        NRF_BLE_PERIPHERAL_LINK_COUNT,
                                                        &ble_enable_params);
        APP_ERROR_CHECK(err_code);
        //Check the ram settings against the used number of links
        CHECK_RAM_START_ADDR(NRF_BLE_CENTRAL_LINK_COUNT, NRF_BLE_PERIPHERAL_LINK_COUNT);
        // Enable BLE stack.
    #if (NRF_SD_BLE_API_VERSION == 3)
        ble_enable_params.gatt_enable_params.att_mtu = NRF_BLE_GATT_MAX_MTU_SIZE;
    #endif
        err_code = softdevice_enable(&ble_enable_params);
        APP_ERROR_CHECK(err_code);
        // Register with the SoftDevice handler module for BLE events.
        err_code = softdevice_ble_evt_handler_set(ble_evt_dispatch);
        APP_ERROR_CHECK(err_code);
        // Register with the SoftDevice handler module for System events.
        err_code = softdevice_sys_evt_handler_set(sys_evt_dispatch);
        APP_ERROR_CHECK(err_code);
        /* Configure bandwidth */
        ble_opt_t ble_opt;
        ble_common_opt_conn_bw_t conn_bw;
        memset(&conn_bw, 0x00, sizeof(conn_bw));
        memset(&ble_opt, 0x00, sizeof(ble_opt));
        conn_bw.conn_bw.conn_bw_rx = BLE_CONN_BW_HIGH;
        conn_bw.conn_bw.conn_bw_tx = BLE_CONN_BW_HIGH;
        conn_bw.role = BLE_GAP_ROLE_CENTRAL;

        ble_opt.common_opt.conn_bw = conn_bw;

        // err_code is already declared at the top of the function; reuse it here
        err_code = sd_ble_opt_set(BLE_COMMON_OPT_CONN_BW, &ble_opt);
        APP_ERROR_CHECK(err_code);
    }
    2, 2nd method
    case BLE_GAP_EVT_ADV_REPORT:
     {
    ble_opt_t ble_opt;
    ble_common_opt_conn_bw_t conn_bw = { .role = BLE_GAP_ROLE_CENTRAL, .conn_bw = {
                          .conn_bw_rx = BLE_CONN_BW_HIGH, .conn_bw_tx = BLE_CONN_BW_HIGH } };
    ble_opt.common_opt.conn_bw = conn_bw;
    sd_ble_opt_set(BLE_COMMON_OPT_CONN_BW, &ble_opt);
     }

    With both methods, I am not able to connect to the peripherals successfully.
    1, Is this the correct method to increase the TX buffer bandwidth?
    2, Do we need to change the parameters below to increase the TX/RX bandwidth?
    #define HCI_MEM_POOL_ENABLED 0
    #define HCI_TX_BUF_SIZE 600
    #define HCI_RX_BUF_SIZE 600
    #define HCI_RX_BUF_QUEUE_SIZE 4
    Or does changing these parameters give the same result as setting them through sd_ble_opt_set()?
    3, Since I am increasing the TX buffer bandwidth from 3 to 6, do I need to make any changes to the RAM (memory configuration) file?
    4, How much should I increase the SoftDevice memory by when the TX buffer is increased from 3 to 6?
    Currently I add 24 bytes (since the MTU is 23 bytes) for one buffer, so for 3 buffers that is 3*24 = 72.
    Since our product has two peripherals, and therefore two connections, I make it 72+72 = 144.
    Please share your thoughts on this calculation.
    5, Do I need to check the buffer width for the peripheral? I assume the peripheral's TX buffer width is 6 by default.
    Thanks,
    Thomas
  • Hello Thomas,

    I believe what you really should be changing is the gatt_init() function (typically called in the main() function, as is done in the ble_app_uart example).

    void gatt_init(void)
    {
        ret_code_t err_code;
    
        err_code = nrf_ble_gatt_init(&m_gatt, gatt_evt_handler);
        APP_ERROR_CHECK(err_code);
    
        err_code = nrf_ble_gatt_att_mtu_periph_set(&m_gatt, NRF_SDH_BLE_GATT_MAX_MTU_SIZE); // NRF_SDH_BLE_GATT_MAX_MTU_SIZE is defined as 247 in sdk_config.h
        APP_ERROR_CHECK(err_code);
    }

    On second thought, this is how you do it in SDK 15.2.0. In SDK 12.3.0 this was not yet supported, and it isn't really supported at all in the S130 SoftDevice.

    Regardless, changing the queue size shouldn't affect your project. It doesn't change the data going over the air, and it doesn't affect disconnections.

    Can you please give the nRF Sniffer a try? It is the alpha and omega of BLE development debugging. Let me know when you have a sniffer trace, or if you have any issues with the sniffer.

  • Thanks Edwin for the responses.

    I am using SDK 12.3 and S130, and I can see the following code in my project:

    /* GATT Module init. */
    void gatt_init(void)
    {
        ret_code_t err_code = nrf_ble_gatt_init(&m_gatt, gatt_evt_handler);
        APP_ERROR_CHECK(err_code);
    }

    So I think, I could add 

    err_code = nrf_ble_gatt_att_mtu_periph_set(&m_gatt, NRF_SDH_BLE_GATT_MAX_MTU_SIZE); // NRF_SDH_BLE_GATT_MAX_MTU_SIZE is defined as 247 in sdk_config.h
    APP_ERROR_CHECK(err_code);

    But the above call increases the MTU size. Does it also increase the TX buffer width?

    1, Are the TX buffer width and the MTU size the same thing, or are they related?

    2, When a message is sent from the central to the peripheral and no ACK is received from the peripheral within the supervision timeout, do we get a BLE_GAP_EVT_DISCONNECTED event, or some other event such as BLE_GAP_EVT_TIMEOUT or BLE_GATTC_EVT_TIMEOUT?

    3, Yes, I tried the sniffer, but could not get it to follow the connection. In the sniffer I could see traces of connection events, but no events when data was sent.

    Kind Regards,

    Thomas

  • Hi Edwin,

    Can you help us with these queries as well?

    1, Do we need to change the parameters below to increase the TX/RX bandwidth?
    #define HCI_MEM_POOL_ENABLED 0
    #define HCI_TX_BUF_SIZE 600
    #define HCI_RX_BUF_SIZE 600
    #define HCI_RX_BUF_QUEUE_SIZE 4
    Or does changing these parameters give the same result as setting them through sd_ble_opt_set()?
    2, Since I am increasing the TX buffer bandwidth from 3 to 6, do I need to make any changes to the RAM (memory configuration) file?
    3, How much should I increase the SoftDevice memory by when the TX buffer is increased from 3 to 6?
    Currently I add 24 bytes (since the MTU is 23 bytes) for one buffer, so for 3 buffers that is 3*24 = 72.
    Since our product has two peripherals, and therefore two connections, I make it 72+72 = 144.
    Please share your thoughts on this calculation.
    4, Do I need to check the buffer width for the peripheral? I assume the peripheral's TX buffer width is 6 by default.
    Kind Regards,
    Thomas
  • Hello Thomas, 

    Sorry if I wasn't clear. It is not possible to increase the MTU size on the nRF51 series, because the SoftDevices for nRF51 do not support MTU sizes larger than 23.

    1.1 Yes and no. The MTU size is the size of the actual message that is transmitted on air. The buffer size is how much the SoftDevice can store.

    1.2 If no ACK is ever received, you will get a disconnect. The SoftDevice will keep retransmitting until the message is ACKed.

    1.3 Is your connection encrypted? If it is, you can enter the out-of-band passkey in Wireshark. If you don't use a passkey, but Just Works bonding, then you should be able to sniff the connection if the sniffer is listening while they exchange the keys. So if you delete the bonding information on both devices, you should be able to sniff it.

    2.1 The HCI buffer sizes are not related, since you are not using HCI.

    2.2 Exactly where do you increase this buffer from 3 to 6? The log should always tell you what your RAM configuration should be.

    2.3 Check the log.

    2.4 Not necessarily. Since the MTU isn't increased when you use the nRF51, the peripheral will get events on all received messages.

    I really don't think that the buffer sizes are the cause of your disconnections. You shouldn't focus on the buffer sizes at this point if you want to find the cause. Focus on the sniffer trace. Try to sniff with encryption.

    Is it possible to send the sniffer logs?
