This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

disconnecting while operations are in progress never gives BLE_GAP_EVT_DISCONNECTED event

2020-01-24-092119EST-ProductStoppedGettingEventsFromNordicDK.txtImprivataTestNordicEventsNotReceived.zipCalls_to_pc_ble_driver.cpp0285.2020-02-24-TestProgramUploadedToNordicSupport.zipFeb25TestProgramUploadedToNordicSupport.zipImprivata_bgTestApp.zipbgSDKTestAppMay4.zip2020-05-05-035347-NordicDK_USB840M_200505_ClockInternal_2in1.hex.txt.txtbgSDKTestAppMay6.zipI’m developing an application based on pc-ble-driver to talk to an nRF52840-based dongle (from Fanstel).

I’m having trouble disconnecting cleanly when a connection has operations in progress.  For example, I call ‘sd_ble_gattc_write’, which returns NRF_SUCCESS, but I don’t receive event BLE_GATTC_EVT_WRITE_RSP (after waiting 60 seconds), so I decide to disconnect. When this happens, sd_ble_gap_disconnect returns NRF_SUCCESS, but I do not receive BLE_GAP_EVT_DISCONNECTED even after waiting 30 seconds. The connection supervision timeout is 4 seconds.  What could cause the disconnect to not generate any BLE_GAP_EVT_DISCONNECTED event?

What I’m trying to accomplish here: if a connection is not responsive, I want to end that connection, without disturbing other connections I have open.

Thanks,

Paul Bradford

Parents
  • I have attached a new test program Imprivata_bgTestApp.zip. There is a file called README with instructions, and a description of the problem.   I suggest you try this on a Dell 64-bit Windows 10 laptop that has not had Nordic dev tools installed on it. Usually the problem occurs with in an hour or two.

    This test program reproduces the problem on two 64-bit Windows 10 laptops and does not reproduce this on one 64-bit Windows 10 laptop. I don't know what difference between these laptops causes two to fail and one to succeed. All have the same driver usbser.sys 10.0.18362.1 winbuild 160101.0800. Problem has occurred when plugged into USB 2.0 and USB 3.0 ports.

    Failing laptops: both Dell, one Windows 10 version 1903 build 18362.657, another Windows 10 version 1909 build 18363.752 (the latest public Windows 10 build)

    Laptop that does not reproduce the problem: Lenovo ThinkPad, Windows 10 version 1903 build 18362.657 (exactly the same as one failing laptop). This one has had Nordic tools like nRF Connect installed. One consequence is that when I plugged in an nRF52840 it would show in Device Manager as 'nRF Connect USB CDC ACM (COMx)'. To make it like my other laptops that do not have Nordic tools, I uninstalled the driver so that the nRF52840 would show up as 'USB Serial Device (COMx)'

  • I have had the new application running on my PC with 5 parallel devices for 8 hours yesterday and 4 hours today. So far, I have not been able to reproduce the issue. I do have another PC that does not have Nordic tools installed, but this is not a Dell PC. I will try to setup the test on that one next. It's a bit harder to find equipment for testing this with the limited factors where you have managed to reproduce the issue, due most of us working from home-office due to the Corona situation. Is it still a requirement to have heavy Bluetooth activity around the device in order to reproduce the issue? This could also be a problem to facilitate during home-office.

  • Hi Paul,

    I tracked down the reset that I'm seeing to a call to sd_nvic_SystemReset() in ser_conn_generic_command_process() when a SER_GENERIC_CMD_RESET command is received. This indicates that the pc-ble-driver application is issuing a reset command. This also looks to be the case in the log output from your application:

    >bgsdkTestApp.exe -u27527585-e5bb-4697-b0af-0e92785043b6 -f5 -v0 -cCOM4
    use challenge/response frequency of 5 seconds
    logging level 0
    'd' to disconnect any connected devices
    'q' to quit program
    'r' to do a hard reset of the dongle - after this the program exits
    's' to show status of all monitored phones
    '+' to increase logging level
    '-' to decrease logging level
    2020-05-05 15:49:48.999: (WARNING) Nordic event handler received an un-handled event with ID:  30
    2020-05-05 15:50:31.955: (WARNING) restart scanning because we've gone 10 timer ticks with no advertisements
    2020-05-05 15:50:33.799: (ERROR) Error:  Failed to decode event, error code is 14/0xe.
    2020-05-05 15:50:33.924: (WARNING) bgsdk::details::AdvDataParser::Parse MAC =  42:f5:68:c6:cd:d1 failed to parse 1e ff 06 00 01 09 20 02 9b 89 59 dc cd f2 0f a9 46 2f b7 61 7a  invalid advertisement: section length = 30 results in end index 31 > size 21
    2020-05-05 15:50:47.656: (WARNING) restart scanning because we've gone 10 timer ticks with no advertisements
    2020-05-05 15:50:48.968: (ERROR) Error:  Failed to decode event, error code is 14/0xe.
    2020-05-05 15:50:49.281: (ERROR) Error:  Failed to decode event, error code is 14/0xe.
    2020-05-05 15:55:40.263: (WARNING) trigger reset hardware_radio_error because we've gone 300 timer ticks with no advertisements, indicating the Bluegiga hardware failed
    IMPBGSDK reported HardwareFailure (for Nordic this happens when we don't detect advertisements for a long time)
    IMPBGSDK reported HardwareSoftResetInitiated WITH DeviceRemoval (for Nordic this happens when we don't detect advertisements for a long time)
    2020-05-05 15:55:40.910: (ERROR) Error:  serial port write operation on port COM4 failed. Error: The device does not recognize the command.[22]
    2020-05-05 15:55:42.218: (WARNING) resetting Nordic failed:  0x802a NRF_ERROR_SD_RPC_H5_TRANSPORT_NO_RESPONSE
    2020-05-05 15:55:43.773: (ERROR) Error:  Error purging UART 22
    2020-05-05 15:55:43.779: (WARNING) std::exception during keepalive: sd_ble_gattc_read failed (Error: 32773) : 0x8005 NRF_ERROR_SD_RPC_NO_RESPONSE
    2020-05-05 15:55:43.784: (ERROR) bgsdk::BeaconController::Impl::StopAdvertising Failed to stop advertising:  Nordic adapter closed (Error: 36865) : Unknown error
    2020-05-05 15:55:43.786: (ERROR) bgsdk::BeaconController::Impl::StartAdvertising Failed to start advertising:  Nordic adapter closed (Error: 36865) : Unknown error
    2020-05-05 15:55:43.787: (WARNING) start scanning caused exception:  Nordic adapter closed (Error: 36865) : Unknown error
    2020-05-05 15:55:43.787: (WARNING) Event wait terminates because of global reset

    This was seen when running your latest application (bgSDKTestAppMay4.zip). I did not see similar output in the old application. Did you make any changes to the application to make the chip reset? I will try to test with the Imprivata_bgTestApp.zip application to see if I'm still able to reproduce.

    Best regards,
    Jørgen

  • The reset you are seeing is induced by the test program, and has changed since prior versions. This test program uses lots of code from our product, and one thing it does is reset the nRF52840 after a prolonged period of receiving no advertisements, since we're scanning constantly and should always be seeing advertisements - we take this 5+ minutes with advertisements to mean the chip has failed, so we reset it. Note that earlier we had noticed that advertisements stopped and had tried to restart them: "restart scanning because we've gone 10 timer ticks with no advertisements".  Does the firmware logging show what's going on that causes no advertisements to be received? 

    It may not mean anything, but the line of output from the test program "Nordic event handler received an un-handled event with ID:  30" indicates it received event BLE_GAP_EVT_SEC_REQUEST which the product does not process. I don't recall ever seeing that event. Is your test phone somehow set up to requiring pairing or bonding? If it were, it would not successfully stay connected to the test program.

    When you run the test program, do you verify that shortly after starting it has connected to the phone ("phone is being tracked") as described in the README? 

    On the PuTTY question from earlier, I had missed the instruction about wiring pins on the DK. I'll work on this with the limited equipment I have at home.

  • Hi Jorgen,

    Thanks for detail.

    I found where cause the new problem on my code.

    I am following the latest Nordic patch file and compiler again.

    I already installed three dongles to try.

    My log is setup to RTT Viewer.

    https://www.dropbox.com/s/z5xll23kkcpaxks/USB840M_200506.rar?dl=0

  • Paul Bradford said:
    after a prolonged period of receiving no advertisements

    Does this include all advertisements or only advertisements from the tracked phone? How long time is "10 timer ticks"? I see this message a lot when I leave the phone for a while, it kinda seems like the phone/application enters some kind of state where it pauses/terminates the application. I have tried to disable power optimization for the app on the phone. I have verified with the 's' command that the phone is actually being tracked, but sometimes it does report that it is not tracked, until I unlock the phone. I have not purposedly set any configs on the phone side, it should use the default settings of the Samsung Galaxy S7. I do not see this event on when testing Huawei P30 Pro.

    Let me know if you are not able to wire the pins for logging, I can build you a version with log on P0.06.

    @leochen: I was not able to reproduce the original issue with RTT logging enabled/RTT Viewer open. If you are always running this with the firmware, it could be the reason that you did not manage to reproduce the problem.

  • "Receiving no advertisements" means any advertisement at all, not just from the tracked phone.  The timer tick is a one-second period where we aren't doing any work other than scanning - since we're only scanning, we expect we'd received advertisements.  Between timer ticks we do stuff like connecting to phones, so N timer ticks can be much longer than N seconds.  This issue of not receiving advertisements is a real problem that I'll need to resolve, but I don't want it to interfere with this support case, so I've uploaded modified test program bgSDKTestAppMay6.zip that doesn't reset the nRF52840.
    I was able to wire the pins for logging. I removed the wire to use Leo's firmware that did RTT logging (which so far is not reproducing the issue).
Reply
  • "Receiving no advertisements" means any advertisement at all, not just from the tracked phone.  The timer tick is a one-second period where we aren't doing any work other than scanning - since we're only scanning, we expect we'd received advertisements.  Between timer ticks we do stuff like connecting to phones, so N timer ticks can be much longer than N seconds.  This issue of not receiving advertisements is a real problem that I'll need to resolve, but I don't want it to interfere with this support case, so I've uploaded modified test program bgSDKTestAppMay6.zip that doesn't reset the nRF52840.
    I was able to wire the pins for logging. I removed the wire to use Leo's firmware that did RTT logging (which so far is not reproducing the issue).
Children
  • I did a quick test run with logging enabled for all SD events and function-calls in the connectivity FW. I tried to synch the clocks as good as possible (second accuracy), as the logger and application run on different computers This is the output from your application:

    2020-05-06 20:58:36.352: (WARNING) restart scanning because we've gone 10 timer ticks with no advertisements

    This is the log file from the connectivity FW: 20200506_sdcalls_events.log

    From what I can see, it looks like there are ADV_REPROT events generated about every 1-2 seconds before the reset in your application, so I'm not sure if the events are not passed fast enough to your application, or if they are not processed quickly enough in your application. There is also a huge amount of RSSI_CHANGED events, which may slow things down. Do you require to be notified on every RSSI change in your application? You can set the threshold/skip count in the softdevice API call: sd_ble_gap_rssi_start().

    What connection parameters do you have for the connection? From the RSSI events, it could look like the connection interval is set to 30 ms? What about connection event length, slave latency, etc? This will all affect the available radio time for doing scanning. Since you are doing advertising simultaneously, you may reach the radio time limit if your parameters are not carefully selected. Check out the Scheduling chapter in the softdevice specifications.

Related