
Erroneous BLE disconnection associated with trivial, unrelated code changes with nRF51822

Hi all,

I am continuing with a university project, now in its 4th year of development, for a wearable leg motion measurement application. Two trackers with identical hardware but slightly different firmware are used. They are called the femur and tibia trackers because they are placed on the upper and lower parts of the leg, respectively. The goal of the project is to have the two trackers communicate with each other over BLE to produce a knee angle measurement, and then to use BLE again to communicate with an application on a phone.
The PCBs are custom designed, and each has an nRF51822 plus the recommended antenna and related antenna circuitry.
The existing firmware is written in C using the Nordic BLE SoftDevice API, and I believe it was originally developed from a BLE heart rate service demo application. The Nordic PCA10028 development kit is used, together with the Keil uVision IDE, to upload the code to the trackers (see below for the hardware and software details). The femur tracker is configured as a GAP central and the tibia tracker as a GAP peripheral. After a connection is established between the mobile app (either the LightBlue app or my partner's custom application) and the femur tracker, the app acts as the GATT client and the femur tracker acts as the GATT server. Initializing the trackers involves a two-step calibration process, which must be completed before data can be sent from the femur tracker to the phone. At each stage of the calibration, the mobile application writes a '1' to the calibration characteristic of the BLE profile, and the tracker firmware starts the corresponding calibration step once it detects this write event.
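The calibration handshake described above can be sketched in plain C as a small state machine. This is a minimal sketch under stated assumptions, not the project's firmware: the names (cal_state_t, calibration_on_write(), calibration_process()) are hypothetical, and because the thread does not say whether the app writes the byte 0x01 or the ASCII character '1' (0x31), the sketch accepts either. In the firmware, calibration_on_write() would be called from the BLE_GATTS_EVT_WRITE case of the BLE event handler when the written handle matches the calibration characteristic's value handle, passing the write's data and len. It only records the request; the calibration itself runs from the main loop so that the event handler returns quickly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical calibration states; the real firmware's names are unknown. */
typedef enum {
    CAL_IDLE,          /* waiting for the first write */
    CAL_STEP1_PENDING, /* first write seen, step 1 not yet run */
    CAL_STEP1_DONE,    /* waiting for the second write */
    CAL_STEP2_PENDING, /* second write seen, step 2 not yet run */
    CAL_DONE           /* calibrated: notifications to the phone may start */
} cal_state_t;

/* Written from the BLE event handler, read and advanced from the main loop. */
static volatile cal_state_t m_cal_state = CAL_IDLE;

/* Record a calibration request. Assumption: called with the bytes of a
   write to the calibration characteristic's value handle. */
void calibration_on_write(const uint8_t *data, uint16_t len)
{
    /* Accept either the byte 0x01 or the ASCII character '1'. */
    if (len == 0 || (data[0] != 0x01u && data[0] != (uint8_t)'1')) {
        return;
    }
    if (m_cal_state == CAL_IDLE) {
        m_cal_state = CAL_STEP1_PENDING;
    } else if (m_cal_state == CAL_STEP1_DONE) {
        m_cal_state = CAL_STEP2_PENDING;
    }
    /* Writes arriving while a step is pending, or after CAL_DONE, are ignored. */
}

/* Run any pending calibration step; call this from the main loop. */
void calibration_process(void)
{
    if (m_cal_state == CAL_STEP1_PENDING) {
        /* run_calibration_step_1();  hypothetical */
        m_cal_state = CAL_STEP1_DONE;
    } else if (m_cal_state == CAL_STEP2_PENDING) {
        /* run_calibration_step_2();  hypothetical */
        m_cal_state = CAL_DONE;
    }
}

bool calibration_complete(void)
{
    return m_cal_state == CAL_DONE;
}
```

Deferring the work to the main loop is one common pattern in SoftDevice applications; whether the existing firmware runs the calibration inside the event handler is not known from the thread.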

I have been using the LightBlue app on my iOS device in place of my partner's application, in the sense that it simulates a mobile device for the femur tracker to connect to and send notifications to. However, I have come across a seemingly inexplicable bug. Changing the structure of an if statement inside the update_knee_angle() function, which calculates the knee angle, triggers a disconnection between the femur tracker and the LightBlue iOS app after the first calibration is initiated (that is, after writing a '1' to the calibration characteristic as explained above). I am confident that this code change somehow triggers the disconnection because the behaviour is repeatable: the faulty code consistently causes a disconnection within five seconds, and the working code does not. My partner's application also disconnects from the femur tracker when the faulty firmware is flashed onto the trackers, which suggests a firmware problem rather than a problem with either the LightBlue app or my partner's app. Below is a comparison of the code that does not cause a disconnection and the code that does:

Working code (in update_knee_angle()):
if ((tibia_info.timestamp - prev_tibia_timestamp) == (leg_data.timestamp - prev_femur_timestamp)) {...}

Problematic code #1:
int tibia_dt = tibia_info.timestamp - prev_tibia_timestamp;
int femur_dt = leg_data.timestamp - prev_femur_timestamp;
if ((tibia_dt) == (femur_dt)) {...}

Note that I have deliberately omitted the contents of the if statement block since the code contents in the above comparison are identical. In the above comparison, the logic is essentially identical, the only difference being the definition of two new integers. To further complicate the behaviour at hand, it was also found that even with the if statement format from the working code, if a SEGGER_RTT_printf() statement was placed inside the if statement block, this would also cause the trackers to disconnect after the first calibration. This code is shown below (in this case I do show some of the contents of the if statement to illustrate the point):

Problematic code #2:
if ((tibia_info.timestamp - prev_tibia_timestamp) == (leg_data.timestamp - prev_femur_timestamp))
{
    SEGGER_RTT_printf( ... arguments etc etc ...);
    // Not an actual comment, but other code lies within this if statement
    ....
    ....
}
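To check the claim that the working code and problematic code #1 compare the same thing, the two forms can be compiled side by side on a host. This sketch assumes the timestamps are uint32_t, which the thread does not show, and the function names are hypothetical. Converting a uint32_t above INT_MAX to int is implementation-defined in C, but on two's-complement toolchains such as GCC it wraps modulo 2^32, so equal unsigned differences remain equal after the conversion.

```c
#include <stdbool.h>
#include <stdint.h>

/* Working code: compare the unsigned differences directly.
   Assumption: timestamps are uint32_t tick counts. */
bool dt_equal_inline(uint32_t tibia_ts, uint32_t prev_tibia_ts,
                     uint32_t femur_ts, uint32_t prev_femur_ts)
{
    return (tibia_ts - prev_tibia_ts) == (femur_ts - prev_femur_ts);
}

/* Problematic code #1: store the differences in int locals first.
   The uint32_t -> int conversion wraps on two's-complement toolchains,
   so the comparison result matches dt_equal_inline(). */
bool dt_equal_locals(uint32_t tibia_ts, uint32_t prev_tibia_ts,
                     uint32_t femur_ts, uint32_t prev_femur_ts)
{
    int tibia_dt = (int)(tibia_ts - prev_tibia_ts);
    int femur_dt = (int)(femur_ts - prev_femur_ts);
    return tibia_dt == femur_dt;
}
```

If the two forms agree, as they should under this assumption, the disconnection is unlikely to come from the comparison itself, which points towards side effects of the change such as different code size, memory layout or timing.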

Subsequent code development in this update_knee_angle() function also causes connectivity problems, such as the femur tracker no longer appearing in the LightBlue app's device list. This later development aims to remove the original if statement, which seems to be partly responsible for the disconnections. It seems that modifying the update_knee_angle() function in any significant way from the last known working commit is partly or wholly responsible for these disconnections. I have not tried modifying other functions in our source code to see whether that also causes connectivity issues. To rule out some easier possibilities, I increased the RAM size allocation from 0x7000 to 0x8000 (in case there was a stack/heap overflow) and reduced the compiler optimization level from O3 to O1. Neither adjustment, separately or together, resolved the disconnection behaviour.
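Enlarging the RAM region does not by itself show whether the stack was overflowing; in Keil projects for the nRF51 the stack size is typically set in the startup assembly file (Stack_Size) rather than by the IRAM setting, which is worth confirming for this project. A more direct check is stack painting: fill the unused stack with a known pattern early on, then later count how many words still hold it. The sketch below is hypothetical and host-testable; in firmware, the region bounds would have to come from the project's map file or startup code, and only the part well below the current stack pointer may be painted.

```c
#include <stddef.h>
#include <stdint.h>

/* Arbitrary marker; any value unlikely to appear as real stack data works. */
#define STACK_PAINT_PATTERN 0xDEADBEEFu

/* Fill 'words' 32-bit words starting at 'region' (the lowest address)
   with the marker. Hypothetical in firmware: bounds from the map file. */
void stack_paint(uint32_t *region, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        region[i] = STACK_PAINT_PATTERN;
    }
}

/* The Cortex-M stack grows towards lower addresses, so the deepest use
   reaches the lowest painted words. Counting intact markers from the
   bottom gives the words never used (an upper bound, since pushed data
   could coincidentally equal the marker). */
size_t stack_unused_words(const uint32_t *region, size_t words)
{
    size_t n = 0;
    while (n < words && region[n] == STACK_PAINT_PATTERN) {
        n++;
    }
    return n;
}
```

If stack_unused_words() returns 0 after exercising the calibration path, the stack has been exhausted at some point.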


Unfortunately, my understanding is that there is no convenient way to debug this SoftDevice-based application with breakpoints (as discussed in DevZone case ID 100476). I cannot see how a trivial code change in the custom section of the source code could trigger calls to sd_ble_gap_disconnect() or other BLE GAP/GATTS API functions that ultimately lead to the disconnection between the femur tracker and the phone. Could it be related to timeouts?

Hardware information:
PCA10028 development kit with motherboard firmware 7000
nRF51822 on custom application PCB

Software information:
SDK v11.0.0 or v12.2.0 (I'm not sure which SDK version it is, and I don't know how to find out)
Keil uVision V5.26.2.0 with MDK-ARM Professional Version: 5.26.2.0
Target DLL: Segger\JL2CM3.dll
Dialog DLL: TARMCM1.DLL
SoftDevice version S130_nRF51_2.0.1
iOS 12.4

Could anyone advise on what else I can investigate, or what I can do to eliminate this erroneous behaviour? I understand that I have only included code snippets; if uploading the source code and/or project files would make debugging easier, please say so in this thread.

This would be much appreciated,
Cheers,

inf_sup_bus

  • This project is not using FreeRTOS. The snippet of ble_central_adv_update() is attached.

    Would you agree that the application is likely crashing because '(int16_t) knee_angle_deg' is being copied into unallocated memory, which is an illegal memory access and thus fires the HardFault_Handler?

    My reasoning is that an illegal memory access is more likely to be firing the handler than copying a 16-bit type into memory pointed to by a uint8_t pointer (knee_angle in this case), since the word length on the Cortex-M0 is 32 bits. As far as I know, no structure packing is used in the source code.

  • This is your peripheral, right? I believe it is, but I get a bit confused by the name (ble_central_adv_information_update).

    You should check the return values of your SoftDevice function calls, such as sd_ble_gatts_hvx().

    I suggest that you change the function from void ble_central_adv_information_update() to uint32_t ble_central_adv_information_update(), and return the value from the last function call:

    uint32_t ble_central_adv_information_update(ble_mpu_c_t *p_mpu, uint8_t* knee_angle)
    {
        if(p_mpu->conn_handle != BLE_CONN_HANDLE_INVALID)
        {
            ... // All the things that you already had
            hvx_params.p_data = knee_angle;
            // Pass the SoftDevice's result back to the caller.
            return sd_ble_gatts_hvx(p_mpu->conn_handle, &hvx_params);
        }
        else // p_mpu->conn_handle == BLE_CONN_HANDLE_INVALID
        {
            // Not connected, so there is nothing to notify.
            return NRF_ERROR_INVALID_STATE;
        }
    }

    and then check the return value from your function:

    uint32_t err_code;
    
    knee_angle[0] = (int16_t) knee_angle_deg;
    err_code = ble_central_adv_information_update(&m_ble_mpu_c, knee_angle);
    
    // NRF_ERROR_INVALID_STATE only means "not connected", so it is tolerated here.
    if (err_code != NRF_ERROR_INVALID_STATE)
    {
        APP_ERROR_CHECK(err_code);
    }

    This way it is easier to check what the application is doing. What is the return value from ble_central_adv_information_update(&m_ble_mpu_c, knee_angle)?

  • Hello Edvin,

    I have adjusted the code as you suggested. These are the values of the error code, line number and file name in the error_info_t structure after the application stopped at the breakpoint in app_error_handler.

    The error code has the value 0x3401 (13313 decimal).

    OK, closing in. Can you hover over p_file_name and see if you can find the name of the file it points to? Alternatively, copy the Memory 1 values from your screenshot and convert them to ASCII characters to see the path. It is somewhere in components\Leg<something-something>, and should be a .c file.

    If you remove the first breakpoint and add the second, it should update the error_info variable, but you can already see that the line number (line_num) passed to this function is 0x196 (406 decimal). So in one of the files in the path components\Leg... there is an APP_ERROR_CHECK(err_code) on line 406, and it receives an err_code != 0. Which function returned this err_code?

    EDIT:
    I checked your path, but I believe you can see it if you hover the mouse over it. In case not:

    So your file knee_angle.c has an APP_ERROR_CHECK(err_code); on line 406. Which function returned this value? If it is not a direct SoftDevice call, which function inside that function returned it?

    Best regards,

    Edvin

  • Hello,

    Yes, there is definitely an APP_ERROR_CHECK(err_code) call on line 406 of knee_angle.c. Removing the first breakpoint and adding the second (on line 54 of app_error.c) gives the same error information in the error_info_t structure as in my previous post.

    The err_code in this context is not returned directly by a SoftDevice call; it is returned by ble_central_adv_information_update(...), which is defined exactly as you recommended in your response on 12 August.

    The err_code variable is being returned from sd_ble_gatts_hvx(...), which is called within ble_central_adv_information_update(...). The value of the error code is 13313 decimal or 0x3401, which is the same error code as my last post (essentially, it's the same error that's occurring). This sd_ble_gatts_hvx(...) call is performing a notification, where the knee angle is being transmitted to a remote mobile phone.
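On the int16_t question raised earlier in the thread: assigning an int16_t to a uint8_t array element, as in knee_angle[0] = (int16_t) knee_angle_deg;, does not write past that element. The value is converted modulo 256, so only the low byte is stored and the high byte is silently lost. If the characteristic is meant to carry the whole angle, both bytes have to be written, and the notification length (hvx_params.p_len) and the characteristic's declared length must allow for them. The sketch below writes the bytes one at a time, least-significant byte first (the usual BLE byte order; the phone app must decode it the same way), which also avoids an unaligned halfword store through an (int16_t *) cast, something the Cortex-M0 does not support and reports as a HardFault. The helper name is hypothetical; the SDK's app_util.h provides uint16_encode() for the same purpose.

```c
#include <stdint.h>

/* Hypothetical helper: write a signed 16-bit angle into p_buf (at least
   2 bytes), least-significant byte first. Byte-wise stores avoid an
   unaligned halfword access, which faults on a Cortex-M0.
   Returns the number of bytes written, for use as the notification length. */
uint16_t knee_angle_encode(int16_t angle_deg, uint8_t *p_buf)
{
    uint16_t raw = (uint16_t)angle_deg;   /* two's-complement bit pattern */
    p_buf[0] = (uint8_t)(raw & 0xFFu);    /* low byte */
    p_buf[1] = (uint8_t)(raw >> 8);       /* high byte */
    return 2u;
}
```

For example, an angle of -90 degrees is sent as the bytes 0xA6 0xFF.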
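The 0x3401 returned by sd_ble_gatts_hvx() can be decoded from the SoftDevice error bases: 0x3000 starts the BLE stack range, 0x3400 the GATTS range, and offset 0x001 within GATTS is BLE_ERROR_GATTS_SYS_ATTR_MISSING. The constants below are believed to match nrf_error.h, ble_err.h and ble_gatts.h for S130 2.x but should be checked against the headers the project actually uses; the enum and decoding functions are hypothetical host-side helpers, not SDK API. The SoftDevice API documentation describes this error as missing system attributes (such as the CCCD values) for the connection; the SoftDevice reports this situation with a BLE_GATTS_EVT_SYS_ATTR_MISSING event, which applications commonly answer by calling sd_ble_gatts_sys_attr_set(), so that event handling in the femur tracker is worth checking.

```c
#include <stdint.h>

/* Believed to match the S130 2.x headers; verify before relying on them.
   Host-side only: do not compile together with the real SoftDevice headers. */
#define NRF_ERROR_SDM_BASE_NUM           0x1000u
#define NRF_ERROR_SOC_BASE_NUM           0x2000u
#define NRF_ERROR_STK_BASE_NUM           0x3000u
#define NRF_GATTS_ERR_BASE               (NRF_ERROR_STK_BASE_NUM + 0x400u)
#define BLE_ERROR_GATTS_SYS_ATTR_MISSING (NRF_GATTS_ERR_BASE + 0x001u)

/* Hypothetical classification of the SoftDevice module an error belongs to. */
typedef enum {
    SD_MOD_GLOBAL, SD_MOD_SDM, SD_MOD_SOC,
    SD_MOD_BLE, SD_MOD_L2CAP, SD_MOD_GAP, SD_MOD_GATTC, SD_MOD_GATTS,
    SD_MOD_UNKNOWN
} sd_err_module_t;

static const char *const sd_err_module_names[] = {
    "global", "SDM", "SoC", "BLE", "L2CAP", "GAP", "GATTC", "GATTS", "unknown"
};

sd_err_module_t sd_err_module(uint32_t err)
{
    if (err < NRF_ERROR_SDM_BASE_NUM) return SD_MOD_GLOBAL;
    if (err < NRF_ERROR_SOC_BASE_NUM) return SD_MOD_SDM;
    if (err < NRF_ERROR_STK_BASE_NUM) return SD_MOD_SOC;
    if (err >= NRF_ERROR_STK_BASE_NUM + 0x1000u) return SD_MOD_UNKNOWN;

    /* Within the BLE stack range, bits 8..11 select the module. */
    switch ((err - NRF_ERROR_STK_BASE_NUM) >> 8) {
        case 0x0u: return SD_MOD_BLE;
        case 0x1u: return SD_MOD_L2CAP;
        case 0x2u: return SD_MOD_GAP;
        case 0x3u: return SD_MOD_GATTC;
        case 0x4u: return SD_MOD_GATTS;
        default:   return SD_MOD_UNKNOWN;
    }
}

const char *sd_err_module_name(uint32_t err)
{
    return sd_err_module_names[sd_err_module(err)];
}
```

With these values, sd_err_module(0x3401) classifies the error as SD_MOD_GATTS.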
