Connection failure when sending and receiving data simultaneously with SoftDevice 6.0 and SDK 15

I’m experiencing random connection failures when transferring data in both directions at the same time between my peripheral (Slave) and central (Master) devices.
The problem appeared just after upgrading to SoftDevice S132 (version 6.0) and SDK (version 15).
There were no such issues with the previous versions of SoftDevice (5.0) and SDK (14).

The problem occurs when the 2 devices (Master and Slave) start to stream data bi-directionally at a speed of about 8 kB/second each.
It takes from a few seconds up to a few minutes before the connection fails and both devices start to report an error (NRF_ERROR_RESOURCES) at the same time.
Furthermore, once this happens, the connection on the Master device seems to be completely dead.
The device is no longer able to send or receive notifications (even using a different characteristic) and is not "aware" of any connection events.
For example, the Slave device can be powered off and the Master device does not receive a “disconnection” event.

There are no issues with connection, pairing or bonding.
Sending notifications, indications or small amounts of data from one device to another seems to be OK too.
The problem starts when the devices go into fast “streaming mode” and larger amounts of data are exchanged.

Both devices are based on the nRF52832 and use the latest SoftDevice 6.0/SDK 15.
The Slave device uses the “interrupt dispatch model” (NRF_SDH_DISPATCH_MODEL_INTERRUPT).
The Master device uses an RTOS and the “polling dispatch model” (NRF_SDH_DISPATCH_MODEL_POLLING).

Both devices use custom services which are very similar to “ble_nus” and “ble_nus_c” from the SDK.
Functions used for sending data are: sd_ble_gatts_hvx and sd_ble_gattc_write.
Connection parameters (including negotiated ones) are as follows:
Data length 251 bytes
ATT MTU 247 bytes
PHY set to 2 Mbps
MIN_CONNECTION_INTERVAL 10 ms
MAX_CONNECTION_INTERVAL 20 ms
SLAVE_LATENCY 0
SUPERVISION_TIMEOUT 4000ms
NRF_SDH_BLE_GATT_MAX_MTU_SIZE 247
NRF_SDH_BLE_GATTS_ATTR_TAB_SIZE 1408
NRF_SDH_BLE_GAP_EVENT_LENGTH 400
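
For reference, the SoftDevice API takes these timings in fixed units rather than milliseconds: 1.25 ms for the connection interval and 10 ms for the supervision timeout. The sketch below shows the conversion. The helper names are hypothetical stand-ins written so the snippet compiles on its own (the SDK ships its own MSEC_TO_UNITS macro in app_util.h), and it assumes NRF_SDH_BLE_GAP_EVENT_LENGTH is also expressed in 1.25 ms units, as the sdk_config.h comments describe.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the SDK unit constants, expressed in microseconds. */
enum { UNIT_1_25_MS_US = 1250, UNIT_10_MS_US = 10000 };

/* Convert a time in milliseconds to a count of units of the given resolution. */
static uint32_t ms_to_units(uint32_t ms, uint32_t unit_us)
{
    return (ms * 1000u) / unit_us;
}

/* Convert a unit count back to microseconds. */
static uint32_t units_to_us(uint32_t units, uint32_t unit_us)
{
    return units * unit_us;
}
```

With the values listed above, 10 ms and 20 ms become 8 and 16 units, the 4000 ms supervision timeout becomes 400 units, and an event length setting of 400 would correspond to 500 ms under the stated assumption.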

One observation has been made (but not 100% confirmed):
When sending data in packets of 244 bytes (247-3), the connection seems to be stable.
Occasionally, “NRF_ERROR_RESOURCES” errors appear, which is normal (I know I need to wait for the BLE_GATTS_EVT_HVN_TX_COMPLETE / BLE_GATTC_EVT_WRITE_CMD_TX_COMPLETE events), but the connection stays alive for a long time.
When data is sent in smaller “packets” (of 160 bytes), the connection usually fails after a few seconds.
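
As a rough illustration of how the payload size changes the load, the sketch below computes how many send calls per second each direction needs. It assumes "8 kB/second" means 8000 bytes per second, and the function name is hypothetical. This is only arithmetic, not a diagnosis of the failure.

```c
#include <stdint.h>

/* Number of packets per second needed to sustain a byte rate, rounded up. */
static uint32_t packets_per_second(uint32_t bytes_per_second, uint32_t payload_bytes)
{
    return (bytes_per_second + payload_bytes - 1u) / payload_bytes;
}
```

Under that assumption, 244-byte payloads need about 33 packets per second and 160-byte payloads need 50, so the smaller payload asks for roughly half again as many transmit queue entries per second in each direction.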

I’ve tried to use nRF Sniffer to catch the moment when the connection fails.
It wasn’t easy, as the tool has not been updated and has many limitations. However, a few screenshots have been made.
The first picture shows the moment when the Master device stops responding (pos. no. 33071).

The other pictures show the very last packets that were sent.

I’ve spent a few days investigating the problem, looking at BLE parameters, memory leaks, RTOS tasks and priorities, stack sizes and many other places.
Please give me a hint towards a solution.

PS: Downgrading to SoftDevice S132 (version 5.0) and SDK (version 14) isn’t a solution, as those versions have other pairing/bonding issues with the latest Android devices.

  • Hi JRRSoftware,

    I was able to run some tests and I found it very clear that your timer daemon task at priority (2) was starving your dummy task and SoftDevice task at the same priority.

    #define configTIMER_TASK_PRIORITY ( 2 )

    Remember that in FreeRTOS it is very important to configure the kernel to suit your needs. Since you have many tasks in the "runnable" state at the same time with the same priority, the FreeRTOS scheduler will always choose one task to run and starve the rest until that first task suspends itself. The reason is that you have set the time slicing of equal-priority tasks to 0. Your configuration for this is as below:

    #define configUSE_TIME_SLICING 0

    Quoting the text from the FreeRTOS documentation:

    configUSE_TIME_SLICING

    By default (if configUSE_TIME_SLICING is not defined, or if configUSE_TIME_SLICING is defined as 1) FreeRTOS uses prioritised preemptive scheduling with time slicing. That means the RTOS scheduler will always run the highest priority task that is in the Ready state, and will switch between tasks of equal priority on every RTOS tick interrupt. If configUSE_TIME_SLICING is set to 0 then the RTOS scheduler will still run the highest priority task that is in the Ready state, but will not switch between tasks of equal priority just because a tick interrupt has occurred.

    So if you set the time slicing to 1 and leave the preemption at 1, then you should not see this problem.

    #define configUSE_PREEMPTION 1

    #define configUSE_TIME_SLICING 1

    I guess some timing in the SoftDevice has changed by a few microseconds in relation to the notifications, which made it possible to trigger this corner case. Nevertheless, please choose your task priorities very wisely; they are a crucial part of your application design.
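
The scheduling behaviour quoted above can be illustrated with a toy model. This is not FreeRTOS code: it is a minimal sketch that assumes three tasks of equal priority that are always ready and never block, and the function and macro names are hypothetical.

```c
#include <stdint.h>

#define NUM_TASKS 3

/* Toy model: all tasks share one priority and are always ready to run.   */
/* ticks_run[i] receives the number of ticks for which task i had the CPU. */
static void simulate(int use_time_slicing, uint32_t ticks, uint32_t ticks_run[NUM_TASKS])
{
    uint32_t current = 0;

    for (uint32_t i = 0; i < NUM_TASKS; i++) {
        ticks_run[i] = 0;
    }

    for (uint32_t t = 0; t < ticks; t++) {
        ticks_run[current]++;
        if (use_time_slicing) {
            /* Tick interrupt: rotate to the next ready task of equal priority. */
            current = (current + 1u) % NUM_TASKS;
        }
        /* Without time slicing the running task keeps the CPU until it blocks, */
        /* which never happens in this model, so the other tasks are starved.   */
    }
}
```

Over 300 ticks, the model gives all 300 ticks to the first task when time slicing is off and 100 ticks to each task when it is on, which matches the starvation described above.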

     

  • Hi Aryan

    I think we are very close to the final solution for this problem.
    Your suggestions helped a lot and I managed to run the application without the issue described.

    However, what is your suggestion about task priorities in this case?
    It wasn't my intention to run all tasks at the same priority level.
    I would rather assign a different priority level to each task than turn on the "time slicing" option (configUSE_TIME_SLICING).

    Do you think this approach is right? 
    Would it be good practice for the SDH task to have the highest priority in the entire application? (In the SDK code, the nrf_sdh_freertos_init() function sets the SDH task priority to 2, which is fairly low.)

    Or would you advise using the same priority for all tasks with the "time slicing" option on?

  • Hi, very good question, but it is not very straightforward to answer.

    If you understand your system very well, then you should be able to tell the timing constraints of each action performed by your application. Tasks with very strict timing requirements get a higher priority than the others.

    In your application, where you seem to be sending audio data, the most important thing is being able to transmit data (by notifications). So the task that sends the hvx data could be high in priority. At the same time, you should make sure that the high-priority tasks suspend once in a while to give low-priority tasks a chance to run (otherwise the low-priority tasks will never run in FreeRTOS). If you do not have a clear understanding of the interdependencies between the tasks in your system, the safest approach is to enable time slicing and use the same priority for all tasks.

    I also did not find it very efficient that you used a timer to send a notification (and only one notification at a time) instead of the TX_COMPLETE event (sending as much as you can in that event). But maybe you did this just as an example to demonstrate the problem, and your real application does it differently.
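
The TX_COMPLETE approach suggested above could look roughly like the sketch below. It is a host-side model, not SoftDevice code: mock_hvx() is a hypothetical stand-in for sd_ble_gatts_hvx(), MOCK_ERROR_RESOURCES stands in for NRF_ERROR_RESOURCES, and the queue depth of 4 is an arbitrary assumption. The idea is to queue packets until the stack refuses one, then resume only when a TX complete event reports freed slots.

```c
#include <stdint.h>

#define MOCK_OK              0u
#define MOCK_ERROR_RESOURCES 1u   /* stand-in for NRF_ERROR_RESOURCES */
#define MOCK_QUEUE_DEPTH     4u   /* hypothetical TX queue size */

static uint32_t m_queued;    /* packets currently held by the mock stack */
static uint32_t m_pending;   /* packets the application still wants to send */
static uint32_t m_sent;      /* packets handed to the mock stack so far */

/* Stand-in for sd_ble_gatts_hvx(): accepts a packet unless the queue is full. */
static uint32_t mock_hvx(void)
{
    if (m_queued >= MOCK_QUEUE_DEPTH) {
        return MOCK_ERROR_RESOURCES;
    }
    m_queued++;
    return MOCK_OK;
}

/* Queue as many packets as the stack accepts, then stop without retrying. */
static void send_as_much_as_possible(void)
{
    while (m_pending > 0u && mock_hvx() == MOCK_OK) {
        m_pending--;
        m_sent++;
    }
}

/* Stand-in for handling BLE_GATTS_EVT_HVN_TX_COMPLETE with `count` packets done. */
static void on_tx_complete(uint32_t count)
{
    m_queued -= (count > m_queued) ? m_queued : count;
    send_as_much_as_possible();
}
```

In a real application the event handler would read the completed packet count from the event and call the sending routine from there, so no timer is needed and the queue is refilled as soon as space appears.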
