
Erroneous BLE disconnection associated with trivial, unrelated code changes on the nRF51822

Hi all,

I am continuing work on a university project, now in its 4th year of development, for a wearable leg-motion measurement application. Two trackers with identical hardware but slightly different firmware are used. They are called the femur and tibia trackers because they are placed on the upper and lower parts of the leg, respectively. The goal of the project is for the two trackers to communicate with each other over BLE to generate a knee angle measurement, and then to use BLE again to communicate with an application on a phone.
The PCBs are custom designed, and each has an nRF51822 plus the recommended antenna and related antenna circuitry.
The existing firmware has been written in C using the Nordic BLE SoftDevice API, and I believe it was originally developed from the BLE heart rate service example application. The Nordic PCA10028 development kit is used to flash the code onto the trackers, together with the Keil uVision IDE (see below for the hardware and software details). The femur tracker is configured as a GAP central and the tibia tracker as a GAP peripheral. After a connection is established between the mobile app (either the LightBlue app or my partner's custom application) and the femur tracker, the app acts as the GATT client and the femur tracker acts as the GATT server. Initializing the trackers involves a two-step calibration process that must be performed before data can be sent from the femur tracker to the phone. At each stage of the calibration, the mobile application writes a '1' to the calibration characteristic of the BLE profile, and the tracker firmware starts the corresponding calibration step once it detects this write event.
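A minimal sketch of how the write-triggered calibration described above might be handled follows. It is not our actual firmware: write_evt_t is a hypothetical stand-in for the SoftDevice's ble_gatts_evt_write_t (whose handle, len and data fields arrive via p_ble_evt->evt.gatts_evt.params.write), CALIB_VALUE_HANDLE is a made-up handle value, and it assumes the app writes the byte 0x01 rather than the ASCII character '1' (0x31). The handler only records the request; the calibration step itself would run from the main loop.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for ble_gatts_evt_write_t; in real firmware these
 * fields come from p_ble_evt->evt.gatts_evt.params.write. */
typedef struct {
    uint16_t handle;
    uint16_t len;
    const uint8_t *data;
} write_evt_t;

#define CALIB_VALUE_HANDLE 0x000Eu /* hypothetical characteristic value handle */
#define CALIB_START_VALUE  0x01u   /* assumed: byte 0x01, not ASCII '1' (0x31) */

/* Set from the BLE event handler, consumed by the main loop. */
static volatile bool calib_requested = false;

/* Called from the BLE_GATTS_EVT_WRITE case: only records the request. */
void on_calib_write(const write_evt_t *evt)
{
    if (evt->handle == CALIB_VALUE_HANDLE &&
        evt->len == 1 &&
        evt->data[0] == CALIB_START_VALUE) {
        calib_requested = true;
    }
}

/* Polled from the main loop; returns true once per recorded request. */
bool take_calib_request(void)
{
    if (calib_requested) {
        calib_requested = false;
        return true;
    }
    return false;
}
```

Keeping the BLE event handler short like this means a long calibration step does not delay the processing of later SoftDevice events.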

I have been using the LightBlue app on my iOS device to stand in for my partner's application, in the sense that it acts as the mobile device that the femur tracker connects to and sends notifications to. However, I have come across a seemingly inexplicable bug. Changing the structure of an if statement inside the update_knee_angle() function, which calculates the knee angle, triggers a disconnection between the femur tracker and the LightBlue app after the first calibration is initiated (that is, after a '1' is written to the calibration characteristic as explained above). I am confident that this code change is what triggers the disconnection because the behaviour is repeatable: the faulty code consistently causes a disconnection within five seconds and the working code does not. My partner's application also disconnects from the femur tracker when the faulty firmware is flashed onto the trackers, which suggests that this is most likely a firmware problem rather than a problem with either the LightBlue app or my partner's app. Below is a comparison of the code that does not cause a disconnection and the code that does:

Working code (in update_knee_angle() ):
if ((tibia_info.timestamp - prev_tibia_timestamp) == (leg_data.timestamp - prev_femur_timestamp)) {...}

Problematic code #1:
int tibia_dt = tibia_info.timestamp - prev_tibia_timestamp;
int femur_dt = leg_data.timestamp - prev_femur_timestamp;
if ((tibia_dt) == (femur_dt)) {...}

Note that I have deliberately omitted the contents of the if statement block, since they are identical in both versions. The logic is essentially identical; the only difference is the introduction of two new int variables. To complicate things further, I also found that even with the if statement written as in the working code, placing a SEGGER_RTT_printf() call inside the if block also causes the trackers to disconnect after the first calibration. This code is shown below (in this case I do show some of the contents of the if statement to illustrate the point):

Problematic code #2:
if ((tibia_info.timestamp - prev_tibia_timestamp) == (leg_data.timestamp - prev_femur_timestamp))
{
    SEGGER_RTT_printf( ... arguments ... );
    // ... other code inside the if statement (omitted) ...
}
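As a sanity check on the claim that the two if statements are logically identical, the sketch below compares both forms on a PC. It assumes the timestamp fields are uint32_t and that int is 32 bits wide, as on the Cortex-M0; the function names are hypothetical. Under those assumptions the unsigned subtraction wraps modulo 2^32, and the conversion to int (a two's-complement reinterpretation on mainstream 32-bit compilers) keeps distinct values distinct, so both forms give the same result, even across a timer wraparound.

```c
#include <stdbool.h>
#include <stdint.h>

/* Form used in the working build: the differences are compared directly. */
bool dt_equal_inline(uint32_t tibia_ts, uint32_t prev_tibia_ts,
                     uint32_t femur_ts, uint32_t prev_femur_ts)
{
    return (tibia_ts - prev_tibia_ts) == (femur_ts - prev_femur_ts);
}

/* Form from problematic code #1: the differences are stored in int first. */
bool dt_equal_via_int(uint32_t tibia_ts, uint32_t prev_tibia_ts,
                      uint32_t femur_ts, uint32_t prev_femur_ts)
{
    int tibia_dt = (int)(tibia_ts - prev_tibia_ts);
    int femur_dt = (int)(femur_ts - prev_femur_ts);
    return tibia_dt == femur_dt;
}
```

If the timestamps are a different type (for example float), the int variables would truncate the differences and the two forms would no longer be equivalent, so the declarations of tibia_info.timestamp and leg_data.timestamp are worth checking.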

Subsequent development in the update_knee_angle() function also causes connectivity problems, such as the femur tracker no longer appearing in the LightBlue app's device list. This later work intends to remove the original if statement, which seems to be partly responsible for the disconnections. It seems that modifying update_knee_angle() in any significant way from the last known working commit is partly or wholly responsible for these disconnections. I have not tried modifying other functions in our source code to see whether this also causes connectivity issues. To rule out some easier possibilities, I increased the RAM size allocation from 0x7000 to 0x8000 (in case there was a stack/heap overflow) and reduced the compiler optimization level from -O3 to -O1. Neither change, alone or combined, resolved the disconnections.
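One way to test the stack overflow theory more directly than resizing RAM is stack painting: fill the unused stack with a marker at startup, then check later how much of it was never overwritten. The sketch below shows the idea on a plain array so it can run on a PC; the marker value and function names are arbitrary. On the target, the region would be the stack reserved by the startup file (in Nordic's Keil projects its size is typically Stack_Size in arm_startup_nrf51.s, which is separate from the IRAM1 setting), and painting must stop short of the stack pointer in use at that moment.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_PAINT 0xDEADBEEFu /* arbitrary marker value */

/* Fill a region with the marker. On the target this must only cover
 * stack memory below the current stack pointer. */
void stack_paint(uint32_t *region, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        region[i] = STACK_PAINT;
    }
}

/* The Cortex-M stack grows downwards, so region[0] is the deepest word.
 * Count how many words at the bottom still hold the marker. */
size_t stack_unused_words(const uint32_t *region, size_t words)
{
    size_t n = 0;
    while (n < words && region[n] == STACK_PAINT) {
        n++;
    }
    return n;
}
```

If stack_unused_words() returns 0 after the first calibration has run, the stack has reached (or gone past) the bottom of its region.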


Unfortunately, my understanding is that it is difficult to debug the code with breakpoints in this SoC ecosystem, or at least that there is no convenient way to use them to find the issue (as discussed in DevZone case ID 100476). I cannot see how a trivial change in the custom part of the source code could trigger calls to sd_ble_gap_disconnect() or other BLE GAP/GATTS API functions that ultimately lead to the disconnection between the femur tracker and the phone. Could it be related to timeouts?
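On the timeout question: the SoftDevice reports the cause of a disconnection in the BLE_GAP_EVT_DISCONNECTED event, as p_ble_evt->evt.gap_evt.params.disconnected.reason, using the HCI error codes from the Bluetooth Core Specification. A small helper like the hypothetical one below could turn that code into text for an RTT log; it covers only a few common codes.

```c
#include <stdint.h>

/* A few HCI error codes (Bluetooth Core Specification, Vol 2, Part D) that
 * commonly appear as a disconnect reason. The SoftDevice's ble_hci.h
 * defines matching BLE_HCI_* constants. */
const char *disconnect_reason_str(uint8_t reason)
{
    switch (reason) {
    case 0x08: return "Connection timeout (supervision timeout)";
    case 0x13: return "Remote user terminated connection";
    case 0x16: return "Connection terminated by local host";
    case 0x22: return "LL response timeout";
    case 0x3B: return "Unacceptable connection parameters";
    case 0x3E: return "Connection failed to be established";
    default:   return "Other/unknown reason";
    }
}
```

The disconnect handler could print the raw code together with this string using SEGGER_RTT_printf(). If the femur tracker resets (for instance from the error handler) instead of disconnecting cleanly, it never receives this event at all, and the phone side would typically report a supervision timeout (0x08), so printing a message at startup would also reveal whether the device is resetting.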

Hardware information:
PCA10028 development kit with motherboard firmware 7000
nRF51822 on custom application PCB

Software information:
SDK v11.0.0 or v12.2.0 (I am not sure exactly which SDK it is, and I do not know how to find out)
Keil uVision V5.26.2.0 with MDK-ARM Professional Version: 5.26.2.0
Target DLL: Segger\JL2CM3.dll
Dialog DLL: TARMCM1.DLL
SoftDevice version S130_nRF51_2.0.1
iOS 12.4

Could anyone advise on what else I can investigate, or what I can do to eliminate this behaviour? I understand that I have only included code snippets; if uploading the source code and/or project files would make debugging easier, please say so on this thread.

This would be much appreciated,
Cheers,

inf_sup_bus
