This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

Sporadic Notification Bursts

Hi everybody,

we have a remote control based on the nRF52832 and are experiencing a wierd issue with notifications.

We are using a custom 6 byte payload keymap characteristic in a custom service, that uses notifications to send keypresses to the host. Usually this works fine. However when the remote is left idle for a couple of minutes, sporadically the next few notifications are bursted several times. This not an issue of our code calling the sd_ble_gatts_hvx funtion too often (we tested guarding it with app timer checks). If this issue occurs then the whole timeslot is filled with identical notifications, with the more data flag being set. Sometimes the issue is resolved with the next notification (key release), sometimes it continues for the next few button presses too, sometimes the release code is lost.

You can see one of these bursts as captured by a PCA10056 sniffer in Wireshark. Note that I have never seen one of the bursts before a LL_PING_REQ, but a ping does not reliably trigger this behaviour (in fact there are several pings, followed by normal notifications just before in this log).

Our initial project is based on the SDK 13.0.0 and the Bluetooth Developer Studio example using Softdevice 4.0.5. For debugging the issue the project was ported to SDK 14.2.0 with SD 5.1.0. We are using GNU Tools ARM Embedded/5.4 2016q2 on Windows 10 x64, but have tried the default GNU Tools ARM Embedded/4.9 2015q3 as well. Neither the SDK/SD nor the compiler switch changed the behaviour. It occurs connected to our custom central (nRF52832) as well as connected to an Android 9 Smartphone running nRF Connect. It does not seem to be a signal strength issue, since it occurs even with central reported RSSIs of -58dBm.

We do use a few non-standard softdevice settings for longer MTUs and event lengths due to other data heavy services, but since the problem occurs even without them those should not interfere here.

I have attached a minimal project, that shows this behaviour on the PCA10040 board. The only services used are the battery levels and the remote control service. It needs a plain SDK 14.2.0 and SD 5.1.0. Place the projects dir in the SDK root and replace the SDK bsp.h with the one from the archive (adds custom events). 

Connect to the "Remote PCA10040" device, enable service 0xdb60's 0xdb61 characteristics notifications and press the buttons. On push each button transmits a 6 byte bitmap that resembles our remote control. This message is repeated every 100ms until the button is released, at which point an empty bitmap is transmitted.

demo_minimal.zip

Could you please help us fixing this wierd issue? We are really running out of ideas what might cause this..

Thank you!

Jann

  • Hello Jann,

    Is it possible to send the .pcapng sniffertrace?

    Is the project suppose to send continuous notifications as long as you hold a button? Because it does that, but not at the rate that it does in your screenshot.

    I tried to replicate the issue using SDK14.2.0, S132 v5.1.0 andGNU Tools ARM Embedded 5.4 2016q2, but I have not been able to reproduce what you report. Is there any tricks? Any particular pattern I need to do to reproduce it? What do you connect to? nRF Connect? On Mobile or Desktop?

    This not an issue of our code calling the sd_ble_gatts_hvx funtion too often

    We will see :)

    Best regards,

    Edvin

  • Hi Edvin,

    thank you for your reply!

    Sure, I attached a trace for you. It was made with the PCA10040 demo, connected to an Android Smartphone with nRF Connect (just filtered out empty PDUs). The screenshot is from the actual remote control, the demo is just an approximation to replicate the behavior. The Demo outputs bitmaps at 100ms interval until the button is released, then instantly a single empty bitmap, the remote is a few ms different in regards of timing. The huge amount of notifications directly following each other with "more data" set to true in the screenshot however are the problem I am talking about. More data is only supposed to be able to be true rarely on a release code when button release and the scheduled 100ms retransmission happen to overlap.

    In the pcapng file you can see that after setup the first notifications up to number 137 are ok, there are a few set bitmasks as long as a button is held, then an empty one. Then the board is just left alone for a few minutes until at no 142 a button is pressed again and a notification burst is sent. The empty release code notification is also lost here. Starting a no 200 notifications work properly again.

    Thank you & best regards,

    Jann

    pca10040remote-filt.pcapng

  • Hello Jann,

    I see, and thank you for the pcapng file. 

    Do you have a project that has logging enabled that also replicate this issue?

  • Hello Edvin,

    I've attached the demo project with enabled log for you.

    There is a log_info call on every button press, release and repeat, as well as in the called rcu_update_keys function that takes care of transmission. Interestingly if a burst occurs the log shows a single button down and release event, but a multitude of repeats and transmissions are logged. It seems as if the repeat app timer goes berserk. Judging from the time it takes until the issue occurs it might be coming from the RTC overflow. And since we used app timer routines for guarding off the hvx call that explains why it didn't work out either.

    Are there any additional precautions to take with repeated app timers? Should we rather use single shot timers and restart them inside the handler? Or is it recommended to use the app timer in scheduler mode?

    Thank you & kind regards,

    Jann

    demo-minimal-log.zip

  • Hmm nope, switching to a single shot timer doesn't help. As soon as I start the 100ms timer it just keeps firing away. Even force-stopping it before restarting (in single shot mode) doesn't help.. As if the timer op queue was all filled up with calls to the repeat function..

    €: I've now added an output of app_timer_cnt_get every time the repeat timer callback is executed and I see that once the issue occurs the time between the calls drop from the requested 100ms to ~2ms. Also it does not seem to be wraparound related, as it started at a count of about 0x00fc0000 and went on well past the overflow upto about 0x00050700. Next wraparound didnt show the issue at all.

    I tried decoupling the TX from the timer interrupt by just setting a flag and evaluating that in the main loop, but with 2ms between calls that of course didnt help. 

Related