disconnecting while operations are in progress never gives BLE_GAP_EVT_DISCONNECTED event

Attachments: 2020-01-24-092119EST-ProductStoppedGettingEventsFromNordicDK.txt, ImprivataTestNordicEventsNotReceived.zip, Calls_to_pc_ble_driver.cpp, 0285.2020-02-24-TestProgramUploadedToNordicSupport.zip, Feb25TestProgramUploadedToNordicSupport.zip, Imprivata_bgTestApp.zip, bgSDKTestAppMay4.zip, 2020-05-05-035347-NordicDK_USB840M_200505_ClockInternal_2in1.hex.txt.txt, bgSDKTestAppMay6.zip

I’m developing an application based on pc-ble-driver to talk to an nRF52840-based dongle (from Fanstel).

I’m having trouble disconnecting cleanly when a connection has operations in progress.  For example, I call ‘sd_ble_gattc_write’, which returns NRF_SUCCESS, but I don’t receive event BLE_GATTC_EVT_WRITE_RSP (after waiting 60 seconds), so I decide to disconnect. When this happens, sd_ble_gap_disconnect returns NRF_SUCCESS, but I do not receive BLE_GAP_EVT_DISCONNECTED even after waiting 30 seconds. The connection supervision timeout is 4 seconds.  What could cause the disconnect to not generate any BLE_GAP_EVT_DISCONNECTED event?

What I’m trying to accomplish here: if a connection is not responsive, I want to end that connection, without disturbing other connections I have open.
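A minimal sketch of the per-connection bookkeeping this implies, assuming the application keeps one small watchdog record per connection handle. All names here (`conn_watch_t`, `conn_watch_poll`, and so on) are hypothetical and not part of pc-ble-driver; the 60-second and 30-second waits are the ones mentioned above. The sketch only decides *when* to act; issuing `sd_ble_gap_disconnect` and choosing what "escalate" means are left to the application.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-connection watchdog; not part of pc-ble-driver. */
typedef enum { CONN_IDLE, CONN_OP_PENDING, CONN_DISCONNECTING, CONN_CLOSED } conn_state_t;
typedef enum { ACT_NONE, ACT_DISCONNECT, ACT_ESCALATE } conn_action_t;

typedef struct {
    conn_state_t state;
    uint32_t     deadline_ms;  /* absolute time at which the current wait expires */
} conn_watch_t;

#define OP_TIMEOUT_MS         60000u  /* wait for BLE_GATTC_EVT_WRITE_RSP */
#define DISCONNECT_TIMEOUT_MS 30000u  /* wait for BLE_GAP_EVT_DISCONNECTED */

/* Call right after a GATT call such as sd_ble_gattc_write() returned NRF_SUCCESS. */
void conn_watch_op_started(conn_watch_t *w, uint32_t now_ms)
{
    w->state = CONN_OP_PENDING;
    w->deadline_ms = now_ms + OP_TIMEOUT_MS;
}

/* Call when the matching response event arrives. */
void conn_watch_op_done(conn_watch_t *w)
{
    if (w->state == CONN_OP_PENDING) {
        w->state = CONN_IDLE;
    }
}

/* Call when BLE_GAP_EVT_DISCONNECTED arrives for this connection. */
void conn_watch_disconnected(conn_watch_t *w)
{
    w->state = CONN_CLOSED;
}

/* Poll periodically; returns what the caller should do for this connection only. */
conn_action_t conn_watch_poll(conn_watch_t *w, uint32_t now_ms)
{
    /* Signed difference keeps the comparison valid across timer wrap-around. */
    bool expired = (int32_t)(now_ms - w->deadline_ms) >= 0;

    switch (w->state) {
    case CONN_OP_PENDING:
        if (expired) {
            /* No response: ask for a disconnect and start the second wait. */
            w->state = CONN_DISCONNECTING;
            w->deadline_ms = now_ms + DISCONNECT_TIMEOUT_MS;
            return ACT_DISCONNECT;
        }
        break;
    case CONN_DISCONNECTING:
        if (expired) {
            /* The disconnect event never came: treat the link as dead locally. */
            w->state = CONN_CLOSED;
            return ACT_ESCALATE;
        }
        break;
    default:
        break;
    }
    return ACT_NONE;
}
```

Because each connection has its own record, a stuck connection can time out and be escalated without touching the state of the other open connections.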

Thanks,

Paul Bradford

  • I have attached a new test program, Imprivata_bgTestApp.zip. It contains a file called README with instructions and a description of the problem. I suggest you try this on a Dell 64-bit Windows 10 laptop that has not had Nordic dev tools installed on it. Usually the problem occurs within an hour or two.

    This test program reproduces the problem on two 64-bit Windows 10 laptops and does not reproduce this on one 64-bit Windows 10 laptop. I don't know what difference between these laptops causes two to fail and one to succeed. All have the same driver usbser.sys 10.0.18362.1 winbuild 160101.0800. Problem has occurred when plugged into USB 2.0 and USB 3.0 ports.

    Failing laptops: both Dell, one Windows 10 version 1903 build 18362.657, another Windows 10 version 1909 build 18363.752 (the latest public Windows 10 build)

    Laptop that does not reproduce the problem: Lenovo ThinkPad, Windows 10 version 1903 build 18362.657 (exactly the same as one failing laptop). This one has had Nordic tools like nRF Connect installed. One consequence is that when I plugged in an nRF52840 it would show in Device Manager as 'nRF Connect USB CDC ACM (COMx)'. To make it like my other laptops that do not have Nordic tools, I uninstalled the driver so that the nRF52840 would show up as 'USB Serial Device (COMx)'

  • I have had the new application running on my PC with 5 parallel devices for 8 hours yesterday and 4 hours today. So far, I have not been able to reproduce the issue. I do have another PC that does not have Nordic tools installed, but it is not a Dell PC. I will try to set up the test on that one next. It's a bit harder to find equipment that matches the limited conditions under which you have managed to reproduce the issue, since most of us are working from home due to the Corona situation. Is heavy Bluetooth activity around the device still required to reproduce the issue? That could also be difficult to arrange from a home office.

  • Hi Jorgen,

    The following link is the source project that I am using.

    https://www.dropbox.com/s/t3pcunczygxxr7k/nRF5_SDK_15.3.0_59ac345_PC_Connect_200505.rar?dl=0

    I compared it with the patched source project and can't see anything wrong.

    Would you please also give me the entire SDK that you are using, so that I can compare?

    The Global setting

    Macro:SDK=E:\Keil_v5\ARM\Device\Nordic\nRF5_SDK_15.3.0_59ac345_PC_Connect_200505

    The Project path

    ~\nRF5_SDK_15.3.0_59ac345_PC_Connect_200505\examples\connectivity\ble_connectivity\pca10056\ser_s140_usb_hci\ses

    The HEX files with the replaced error handling (the external clock does not work with the Fanstel USB840M):

    https://www.dropbox.com/s/z6u7f227h2sogl7/USB840M0505.rar?dl=0

    How long does it take before you see the error?

    I programmed 3 dongles and have been running them for over 4 hours.

    I can't see that error on my Win 7 desktop.

    Regards,

    Leo

  • I tested the firmware Leo referenced. With the dev-dongle, I tried both USB840M_200505_ClockInternal.zip and USB840M_200505_ClockExternal.zip and both failed with the "serial port write operation" error after a few minutes.

    With the Nordic DK I tried USB840M_200505_ClockInternal_2in1.hex and got "serial port write operation" error after 30 minutes. I attached the firmware logging file 2020-05-05-035347-NordicDK_USB840M_200505_ClockInternal_2in1.hex.txt

    The firmware logging ends with 

    [00:00:02.638,154] <debug> app: event:BLE_GATTC_EVT_HVX
    [00:00:02.643,318] <debug> sphy_hci: TX request (34 bytes)
    [00:00:02.648,884] <debug> sphy_hci: Started TX packet (payload 34).
    [00:00:02.655,796] <error> app: Fatal error
    [00:00:02.659,891] <error> app: ERROR 4 [NRF_ERROR_NO_MEM] at :0
    PC at: 0x00000000
    [00:00:02.667,605] <error> app: End of error report
    [00:00:02.674,529] <info> app: BLE Clock Internal,UART log,Size 64,CRITICAL_REGION,
    [00:00:02.682,869] <info> app: USB power detected
    [00:00:02.688,683] <info> app: USB ready

    I am now testing the Nordic DK with USB840M_200505_ClockExternal_2in1.hex

  • I reproduced the issue with the hex-files from Leo as well. How exactly did you change the scheduler queue size? As far as I can see, this is still 16 in the latest project you uploaded (nRF5_SDK_15.3.0_59ac345_PC_Connect_200505.rar):

    // In ser_conn_handlers.h:
    /** Maximum number of events in the application scheduler queue. */
    #ifdef S112
    #define SER_CONN_SCHED_QUEUE_SIZE             8u
    #else
    #define SER_CONN_SCHED_QUEUE_SIZE             16u
    #endif
    
    // In main.c:
    APP_SCHED_INIT(SER_CONN_SCHED_MAX_EVENT_DATA_SIZE, SER_CONN_SCHED_QUEUE_SIZE);

    When debugging, I also see it set to 16 when app_scheduler is initialized.

    I created a clean patched SDK with the appropriate fixes and a few changes. Can you please try building with this? These are the changes (they can also be seen in the attached patch file, changes.patch):

    • Added Critical region in main.c
    • Increased scheduler size to 64
    • Removed serialization error handler, use SDK default which will output error info
    • Added DEBUG and DEBUG_NRF preprocessor defines to Release build configuration, to catch errors and asserts.
    • Added separate build configurations for different LF clocks: Release_LFXO (external crystal) and Release_LFRC (internal RC oscillator).
    • Enabled UART logging, and set the log level to only output error logs (debug logs add heavy CPU load).
    • Changed UART log TX pin to P0.31, to allow reading logs from nRF52840 Dongle. When running on DK, a wire must be connected between P0.31 and P0.06 to output logs. P0.31 on the dongle can be connected to P0.06 on an empty DK to read logs.

    nRF5_SDK_15.3.0_59ac345_patched_addedfixes.zip

    Precompiled hex-files:

    I have not yet received the Fanstel dongles, so I'm not sure whether these will work with them. You may need to make some changes to support the bootloader etc.
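To illustrate why the scheduler queue size matters, here is a simplified model of a fixed-capacity event queue. It assumes, which this thread does not confirm, that the `ERROR 4 [NRF_ERROR_NO_MEM]` in the firmware log came from a full scheduler queue. The `sched_model_*` names are hypothetical and this is not the SDK's app_scheduler implementation; it only shows how a burst of events that arrives before the main loop drains the queue overflows a capacity of 16 but fits in 64.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_SUCCESS      0u
#define MODEL_ERROR_NO_MEM 4u  /* same numeric value as NRF_ERROR_NO_MEM */

/* Hypothetical model: only the fill level is tracked, not the event data. */
typedef struct {
    size_t capacity;  /* e.g. SER_CONN_SCHED_QUEUE_SIZE */
    size_t count;     /* events currently waiting */
} sched_model_t;

/* Queue one event; fails with NO_MEM when the queue is already full. */
uint32_t sched_model_put(sched_model_t *q)
{
    if (q->count >= q->capacity) {
        return MODEL_ERROR_NO_MEM;
    }
    q->count++;
    return MODEL_SUCCESS;
}

/* Main-loop drain: process everything that is waiting. */
void sched_model_execute(sched_model_t *q)
{
    q->count = 0;
}

/* Push n_events back to back without draining; returns how many were rejected. */
size_t sched_model_burst(sched_model_t *q, size_t n_events)
{
    size_t rejected = 0;
    for (size_t i = 0; i < n_events; i++) {
        if (sched_model_put(q) != MODEL_SUCCESS) {
            rejected++;
        }
    }
    return rejected;
}
```

In this model a burst of 20 events rejects 4 of them at capacity 16 and none at capacity 64. A larger queue only buys headroom for bursts; if events arrive faster than they are drained on average, any fixed size eventually fills.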

  • I loaded 200505_ble_connectivity_s140_usb_hci_pca10056_lfrc_mergedsoftdevice.hex into the Nordic DK, and I'm running the test program. But when I start PuTTY to log the JLINK serial port, I see no output at all. I'm using the same procedure I've been using the last few days, including BAUD rate 115200.

  • Did you connect a wire between P0.31 and P0.06 on the DK, as I described in the last change-point below?

    Jørgen Holmefjord said:
    When running on DK, a wire must be connected between P0.31 and P0.06 to output logs.
  • Hi Paul,

    I tracked down the reset that I'm seeing to a call to sd_nvic_SystemReset() in ser_conn_generic_command_process() when a SER_GENERIC_CMD_RESET command is received. This indicates that the pc-ble-driver application is issuing a reset command. This also looks to be the case in the log output from your application:

    >bgsdkTestApp.exe -u27527585-e5bb-4697-b0af-0e92785043b6 -f5 -v0 -cCOM4
    use challenge/response frequency of 5 seconds
    logging level 0
    'd' to disconnect any connected devices
    'q' to quit program
    'r' to do a hard reset of the dongle - after this the program exits
    's' to show status of all monitored phones
    '+' to increase logging level
    '-' to decrease logging level
    2020-05-05 15:49:48.999: (WARNING) Nordic event handler received an un-handled event with ID:  30
    2020-05-05 15:50:31.955: (WARNING) restart scanning because we've gone 10 timer ticks with no advertisements
    2020-05-05 15:50:33.799: (ERROR) Error:  Failed to decode event, error code is 14/0xe.
    2020-05-05 15:50:33.924: (WARNING) bgsdk::details::AdvDataParser::Parse MAC =  42:f5:68:c6:cd:d1 failed to parse 1e ff 06 00 01 09 20 02 9b 89 59 dc cd f2 0f a9 46 2f b7 61 7a  invalid advertisement: section length = 30 results in end index 31 > size 21
    2020-05-05 15:50:47.656: (WARNING) restart scanning because we've gone 10 timer ticks with no advertisements
    2020-05-05 15:50:48.968: (ERROR) Error:  Failed to decode event, error code is 14/0xe.
    2020-05-05 15:50:49.281: (ERROR) Error:  Failed to decode event, error code is 14/0xe.
    2020-05-05 15:55:40.263: (WARNING) trigger reset hardware_radio_error because we've gone 300 timer ticks with no advertisements, indicating the Bluegiga hardware failed
    IMPBGSDK reported HardwareFailure (for Nordic this happens when we don't detect advertisements for a long time)
    IMPBGSDK reported HardwareSoftResetInitiated WITH DeviceRemoval (for Nordic this happens when we don't detect advertisements for a long time)
    2020-05-05 15:55:40.910: (ERROR) Error:  serial port write operation on port COM4 failed. Error: The device does not recognize the command.[22]
    2020-05-05 15:55:42.218: (WARNING) resetting Nordic failed:  0x802a NRF_ERROR_SD_RPC_H5_TRANSPORT_NO_RESPONSE
    2020-05-05 15:55:43.773: (ERROR) Error:  Error purging UART 22
    2020-05-05 15:55:43.779: (WARNING) std::exception during keepalive: sd_ble_gattc_read failed (Error: 32773) : 0x8005 NRF_ERROR_SD_RPC_NO_RESPONSE
    2020-05-05 15:55:43.784: (ERROR) bgsdk::BeaconController::Impl::StopAdvertising Failed to stop advertising:  Nordic adapter closed (Error: 36865) : Unknown error
    2020-05-05 15:55:43.786: (ERROR) bgsdk::BeaconController::Impl::StartAdvertising Failed to start advertising:  Nordic adapter closed (Error: 36865) : Unknown error
    2020-05-05 15:55:43.787: (WARNING) start scanning caused exception:  Nordic adapter closed (Error: 36865) : Unknown error
    2020-05-05 15:55:43.787: (WARNING) Event wait terminates because of global reset

    This was seen when running your latest application (bgSDKTestAppMay4.zip). I did not see similar output in the old application. Did you make any changes to the application to make the chip reset? I will try to test with the Imprivata_bgTestApp.zip application to see if I'm still able to reproduce.
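As a side note on the "invalid advertisement" warning in the log above: advertising data is a sequence of AD structures, each starting with a length byte that covers the type byte and the data that follow. The sketch below is a hypothetical bounds check (it is not the application's `AdvDataParser`) that walks a payload and reports the first structure whose claimed length runs past the end of the buffer. Fed the 21 bytes from the log, it reports the same numbers as the warning: section length 30, end index 31, size 21.

```c
#include <stddef.h>
#include <stdint.h>

/* Result of validating one advertising payload (hypothetical type). */
typedef struct {
    int    ok;        /* 1 if every AD structure fits inside the buffer */
    size_t sections;  /* number of complete AD structures seen */
    size_t bad_len;   /* on failure: the offending length byte */
    size_t bad_end;   /* on failure: index one past the claimed section end */
} adv_check_t;

/* Walk the AD structures in data[0..size) and check each one fits. */
adv_check_t adv_check(const uint8_t *data, size_t size)
{
    adv_check_t r = { 1, 0, 0, 0 };
    size_t i = 0;

    while (i < size) {
        size_t len = data[i];        /* covers the type byte plus the data */
        if (len == 0) {
            break;                   /* zero length: treat the rest as padding */
        }
        size_t end = i + 1 + len;    /* index one past this structure */
        if (end > size) {
            r.ok = 0;
            r.bad_len = len;
            r.bad_end = end;
            return r;
        }
        r.sections++;
        i = end;
    }
    return r;
}
```

A check like this lets the caller drop a truncated report instead of reading past the buffer; it says nothing about why the report arrived with only 21 bytes.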

    Best regards,
    Jørgen

  • The reset you are seeing is induced by the test program, and has changed since prior versions. This test program uses lots of code from our product, and one thing it does is reset the nRF52840 after a prolonged period of receiving no advertisements. Since we're scanning constantly, we should always be seeing advertisements, so we take 5+ minutes without advertisements to mean the chip has failed, and we reset it. Note that earlier we had noticed that advertisements stopped and had tried to restart scanning: "restart scanning because we've gone 10 timer ticks with no advertisements". Does the firmware logging show what's going on that causes no advertisements to be received?

    It may not mean anything, but the line of output from the test program "Nordic event handler received an un-handled event with ID:  30" indicates it received event BLE_GAP_EVT_SEC_REQUEST, which the product does not process. I don't recall ever seeing that event. Is your test phone somehow set up to require pairing or bonding? If it were, it would not successfully stay connected to the test program.

    When you run the test program, do you verify that shortly after starting it has connected to the phone ("phone is being tracked") as described in the README? 

    On the PuTTY question from earlier, I had missed the instruction about wiring pins on the DK. I'll work on this with the limited equipment I have at home.

  • Paul Bradford said:
    after a prolonged period of receiving no advertisements

    Does this include all advertisements or only advertisements from the tracked phone? How long is "10 timer ticks"? I see this message a lot when I leave the phone for a while; it kind of seems like the phone enters some state where it pauses or terminates the application. I have tried to disable power optimization for the app on the phone. I have verified with the 's' command that the phone is actually being tracked, but sometimes it reports that it is not tracked until I unlock the phone. I have not purposely set any configs on the phone side; it should be using the default settings of the Samsung Galaxy S7. I do not see this event when testing with a Huawei P30 Pro.

    Let me know if you are not able to wire the pins for logging, I can build you a version with log on P0.06.

    @leochen: I was not able to reproduce the original issue with RTT logging enabled/RTT Viewer open. If you are always running this with the firmware, it could be the reason that you did not manage to reproduce the problem.

  • "Receiving no advertisements" means any advertisement at all, not just from the tracked phone. The timer tick is a one-second period where we aren't doing any work other than scanning; since we're only scanning, we expect to receive advertisements. Between timer ticks we do things like connecting to phones, so N timer ticks can be much longer than N seconds. This issue of not receiving advertisements is a real problem that I'll need to resolve, but I don't want it to interfere with this support case, so I've uploaded a modified test program, bgSDKTestAppMay6.zip, that doesn't reset the nRF52840.
    I was able to wire the pins for logging. I removed the wire to use Leo's firmware that did RTT logging (which so far is not reproducing the issue).
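The tick logic described above can be sketched roughly as follows. The thresholds of 10 and 300 ticks come from the log messages earlier in the thread; everything else (the `adv_watchdog_*` names, restarting the scan only once per quiet stretch, clearing the counter after a reset) is an assumption for illustration and not the product's actual code.

```c
#include <stdint.h>

#define RESTART_SCAN_TICKS 10u   /* "10 timer ticks with no advertisements" */
#define RESET_TICKS        300u  /* "300 timer ticks with no advertisements" */

typedef enum { WD_NONE, WD_RESTART_SCAN, WD_RESET } wd_action_t;

/* Hypothetical watchdog state: scan-only ticks since the last advertisement. */
typedef struct {
    uint32_t idle_ticks;
} adv_watchdog_t;

/* Call for every advertisement report, from any device. */
void adv_watchdog_on_adv(adv_watchdog_t *wd)
{
    wd->idle_ticks = 0;
}

/* Call once per timer tick, i.e. per second spent doing nothing but scanning. */
wd_action_t adv_watchdog_on_tick(adv_watchdog_t *wd)
{
    wd->idle_ticks++;

    if (wd->idle_ticks >= RESET_TICKS) {
        wd->idle_ticks = 0;          /* start over after the reset request */
        return WD_RESET;
    }
    if (wd->idle_ticks == RESTART_SCAN_TICKS) {
        return WD_RESTART_SCAN;      /* first, cheaper recovery step */
    }
    return WD_NONE;
}
```

Because ticks only count time spent purely scanning, 300 ticks corresponds to at least 300 seconds of wall-clock time and usually more. A build like bgSDKTestAppMay6.zip, which no longer resets the nRF52840, would simply ignore `WD_RESET` in this model.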
  • I did a quick test run with logging enabled for all SD events and function calls in the connectivity FW. I tried to sync the clocks as well as possible (to one-second accuracy), as the logger and the application run on different computers. This is the output from your application:

    2020-05-06 20:58:36.352: (WARNING) restart scanning because we've gone 10 timer ticks with no advertisements

    This is the log file from the connectivity FW: 20200506_sdcalls_events.log

    From what I can see, ADV_REPORT events are generated about every 1-2 seconds before the reset in your application, so I'm not sure whether the events are not passed to your application fast enough or are not processed quickly enough in your application. There is also a huge number of RSSI_CHANGED events, which may slow things down. Do you need to be notified of every RSSI change in your application? You can set the threshold and skip count in the SoftDevice API call sd_ble_gap_rssi_start().

    What connection parameters do you have for the connection? From the RSSI events, it could look like the connection interval is set to 30 ms? What about connection event length, slave latency, etc? This will all affect the available radio time for doing scanning. Since you are doing advertising simultaneously, you may reach the radio time limit if your parameters are not carefully selected. Check out the Scheduling chapter in the softdevice specifications.
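As a very rough back-of-the-envelope check of the radio time question, the sketch below estimates how much of each connection interval is left once every link has reserved one connection event. This is a deliberately crude model with hypothetical helper names: it assumes all links share the same interval and event length, and it ignores advertising events, scan window/interval settings, and the SoftDevice's actual scheduling priorities, so treat the result as an optimistic upper bound on the time left for scanning.

```c
#include <stdint.h>

/* Connection interval and event length are both expressed in 1.25 ms units. */
uint32_t units_to_us(uint32_t units_1_25ms)
{
    return units_1_25ms * 1250u;
}

/*
 * Percentage of each connection interval NOT reserved by connection events,
 * assuming one event of event_len_us per link per interval (crude model).
 */
uint32_t radio_free_percent(uint32_t interval_us, uint32_t event_len_us, uint32_t links)
{
    uint64_t busy_us = (uint64_t)event_len_us * links;

    if (interval_us == 0 || busy_us >= interval_us) {
        return 0;  /* nothing left in this model */
    }
    return (uint32_t)(((uint64_t)interval_us - busy_us) * 100u / interval_us);
}
```

For example, with a 30 ms interval (24 units) and an assumed event length of 7.5 ms (6 units), one link leaves 75% of the interval unreserved, two links leave 50%, and four links leave nothing in this model. The Scheduling chapter of the SoftDevice specification is the authoritative source for how the time is actually divided.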
