Softdevice S140 temporarily stops advertising BLE

We are using SoftDevice S140 6.1.1 with an nRF52840 as a BLE peripheral. Intermittently, mobile devices are unable to discover this peripheral for periods ranging from a few minutes to hours. In one exceptional case, a physical power reset was required to recover. We use fast BLE advertising with an interval of 150 ms and a timeout of 0 (infinite). We also handle GAP connection and disconnection events to switch the advertisement to non-connectable when someone connects and back to connectable when the user disconnects. We don't see any failures in the firmware logs that would indicate an issue with the BLE configuration. These issues were reported for production peripherals, and we have not been able to reproduce them locally. I need help with the following queries:

  1. Can a softdevice stop advertising for any reason if the duration is set to infinite? We don't see any logs for BLE_GAP_EVT_ADV_SET_TERMINATED which can possibly indicate such a thing.
  2. Can we land in a situation where the peripheral is not able to prioritize processing of GAP events like connected, disconnected, terminated etc. and we are not able to make the SoftDevice resume advertisement after these events?
  3. Can there be an issue with peripheral radio which can cause it to stop advertisement for some reason? We are thinking of logging radio notifications to concretely capture any such issues. Is it advisable to enable radio notifications on production devices, and are there any best practices for using this feature?
  • Hi,

    Can a softdevice stop advertising for any reason if the duration is set to infinite? We don't see any logs for BLE_GAP_EVT_ADV_SET_TERMINATED which can possibly indicate such a thing.

    No, I do not see how that could happen (unless it is caused by an unknown bug).

    Can we land in a situation where the peripheral is not able to prioritize processing of GAP events like connected, disconnected, terminated etc. and we are not able to make the SoftDevice resume advertisement after these events?

    I want to say no here as well. However, it is possible to imagine a situation where SoftDevice events are not processed in a timely manner and this causes issues. I do not know anything about your code, but if you look at the SoftDevice handler implementation in SDK 17.0.2 (similar in older versions) as an example, you will see that if the dispatch model is scheduler (NRF_SDH_DISPATCH_MODEL_APPSH), every event from the SoftDevice is put in a queue. If the queue is full because events are not processed fast enough, the event would be lost. With the SDK implementation that would be detected by an APP_ERROR_CHECK, so it should not fail silently. It could be different in your application, though. Could this be relevant?

    Can there be an issue with peripheral radio which can cause it to stop advertisement for some reason?

    Same here: I do not see how this could happen unless there is a so far unknown bug.

    We are thinking of logging radio notifications to concretely capture any such issues. Is it advisable to enable radio notifications on production devices, and are there any best practices for using this feature?

    Radio notification is a generic feature that lets you get events when the radio is in use. It is commonly used, and I do not see any problem with using radio notification in production. You can refer to the SoftDevice specification for details.

    Regarding the issue in general, have you been able to confirm that the issue is on the nRF side? As you have only seen this in the field so far and have not been able to reproduce it locally, I assume you have not been able to get a sniffer trace when this occurs. If you did, that would show whether the nRF peripheral advertises and whether the central attempts to connect. Which central devices do you see this issue with? Is it limited to a few specific phone models or similar?

    Einar
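
As a rough illustration of how radio notifications could be put to use here, the sketch below records the time of the last radio-active notification and flags a stall when the radio has been silent for longer than a chosen number of advertising intervals. It is plain C with no SDK dependencies. `adv_monitor_t` and the `adv_monitor_*` functions are hypothetical names, and the assumption is that the application calls `adv_monitor_on_radio_active()` from its radio notification handler and `adv_monitor_check()` from a periodic timer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical liveness monitor; none of these names are SDK or SoftDevice API. */
typedef struct {
    uint32_t last_active_ms;  /* time of the most recent radio-active notification */
    uint32_t max_silence_ms;  /* longest radio silence tolerated before flagging a stall */
    uint32_t stall_count;     /* number of distinct stalls detected so far */
    bool     stalled;         /* true while a stall is ongoing */
} adv_monitor_t;

void adv_monitor_init(adv_monitor_t *m, uint32_t now_ms,
                      uint32_t adv_interval_ms, uint32_t missed_intervals)
{
    m->last_active_ms = now_ms;
    m->max_silence_ms = adv_interval_ms * missed_intervals;
    m->stall_count    = 0;
    m->stalled        = false;
}

/* Assumed to be called whenever the radio becomes active. */
void adv_monitor_on_radio_active(adv_monitor_t *m, uint32_t now_ms)
{
    m->last_active_ms = now_ms;
    m->stalled        = false;
}

/* Assumed to be called periodically; returns true while the radio is silent too long. */
bool adv_monitor_check(adv_monitor_t *m, uint32_t now_ms)
{
    /* Unsigned subtraction stays correct across a single timer wrap-around. */
    uint32_t silence_ms = now_ms - m->last_active_ms;

    if (silence_ms > m->max_silence_ms && !m->stalled) {
        m->stalled = true;
        m->stall_count++;  /* count each stall once, not once per check */
    }
    return m->stalled;
}
```

With the 150 ms interval from the original post and a tolerance of 10 missed intervals (an arbitrary value chosen for illustration), a radio silence of more than 1500 ms would be counted as a stall. The counter could then be logged together with the other firmware diagnostics.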

  • Thanks Einar for your inputs here.

    I checked, and in our case we are using NRF_SDH_DISPATCH_MODEL_POLLING as the dispatch model. If we are not able to pull events fast enough, will the SoftDevice drop them, and can that be detected?

    >> Regarding the issue in general, have you been able to confirm that the issue is on the nRF side?

    We are seeing issues where users' mobile apps (the central device in our case) are not able to discover our peripherals, and the issue is temporary: a peripheral will not be discoverable during one set of interactions (which can last 10-15 minutes) and then becomes discoverable again later that day or the next day. We have ruled out central-specific issues because the central is able to discover other nearby BLE devices during that time.

    In some geographies we do see such issues concentrated on specific makes of phone, such as the Caterpillar S48c and S41 in NA, but we are not sure which phone-specific implementation details could be causing it. Our peripherals are placed in public places, so there is definitely interference from the large number of public BLE devices in the vicinity. Can that contribute to such blackouts?
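
To make the lost-event scenario described earlier for the scheduler dispatch model more concrete, here is a minimal sketch of a bounded event queue that counts the events it has to drop instead of losing them silently. It is a stand-alone illustration, not the SDK's scheduler or the SoftDevice's internal queue: `evt_queue_t`, `EVT_QUEUE_CAPACITY`, and the `evt_queue_*` functions are hypothetical, and the sketch says nothing about how the SoftDevice itself behaves under NRF_SDH_DISPATCH_MODEL_POLLING.

```c
#include <stdbool.h>
#include <stdint.h>

#define EVT_QUEUE_CAPACITY 4u  /* illustrative size, not an SDK setting */

/* Hypothetical fixed-size FIFO of event IDs with a drop counter. */
typedef struct {
    uint16_t items[EVT_QUEUE_CAPACITY];
    uint32_t head;     /* index of the oldest queued event */
    uint32_t count;    /* number of events currently queued */
    uint32_t dropped;  /* events rejected because the queue was full */
} evt_queue_t;

void evt_queue_init(evt_queue_t *q)
{
    q->head    = 0;
    q->count   = 0;
    q->dropped = 0;
}

/* Producer side: returns false and records the loss if the queue is full. */
bool evt_queue_push(evt_queue_t *q, uint16_t evt_id)
{
    if (q->count == EVT_QUEUE_CAPACITY) {
        q->dropped++;
        return false;
    }
    q->items[(q->head + q->count) % EVT_QUEUE_CAPACITY] = evt_id;
    q->count++;
    return true;
}

/* Consumer side: returns false if there is nothing to process. */
bool evt_queue_pop(evt_queue_t *q, uint16_t *evt_id)
{
    if (q->count == 0) {
        return false;
    }
    *evt_id = q->items[q->head];
    q->head = (q->head + 1) % EVT_QUEUE_CAPACITY;
    q->count--;
    return true;
}
```

Logging a non-zero `dropped` count would give a field-visible signal that events are not being drained fast enough, which is the kind of silent failure the question about detection is concerned with.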

  • Hi Einar, 

    We also tried running frequency accuracy tests on the Cat S48c phone using the nRF Connect app. While the nRF Connect app can turn the mobile device into an advertiser, it does not allow generating just the carrier wave. With advertising enabled, the modulated signal is seen in the spectrum capture (see image below; blue: Cat S48c, yellow: Samsung S9, green: iPhone):

    How can we configure phones to just generate the carrier wave?

    Thanks

    Ram

  • Hi Ram,

    I am sorry for the late reply.

    Regarding the test results in LierdaTestReport.xlsx, I assume the offset is given in kHz. If so, it looks like the offset here is no more than 9.3 ppm, which is well within the 40 ppm required for BLE. However, that does not necessarily mean it is OK over the whole temperature range. Is this data also from a device on which you saw a high frequency of the issue?

    Have you got one of the failing devices into your lab now? If so, it would be good if you could test the frequency accuracy over the temperature range. That way we will know more.

    Regarding testing mobile phones, I do not believe there is any way to output a pure carrier on those (I assume there is a hidden test mode in all phones, but it is not publicly documented or accessible). So the approach to use here is to set up an advertiser in, for instance, nRF Connect, and use a spectrum analyzer that can demodulate the signal and provide information about it. Then you will be able to get the carrier frequency deviation from the analyzer. The exact method depends on the analyzer you have, but you can refer to this documentation from R&S to see an example.

    Einar
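
For reference, the conversion from a measured frequency offset to ppm mentioned above can be sketched as follows. The function names and the example numbers are illustrative assumptions and are not taken from the test report; the 40 ppm limit is the figure quoted in the reply, passed in as a parameter.

```c
#include <stdbool.h>

/* Offset relative to the carrier, expressed in parts per million. */
double freq_offset_ppm(double offset_hz, double carrier_hz)
{
    return offset_hz / carrier_hz * 1e6;
}

/* True if the magnitude of the offset is within limit_ppm of the carrier. */
bool within_ppm_limit(double offset_hz, double carrier_hz, double limit_ppm)
{
    double ppm = freq_offset_ppm(offset_hz, carrier_hz);
    if (ppm < 0.0) {
        ppm = -ppm;  /* the limit applies to offsets in either direction */
    }
    return ppm <= limit_ppm;
}
```

As an example with made-up numbers, an offset of 22.3 kHz on the 2402 MHz advertising channel works out to roughly 9.3 ppm, so `within_ppm_limit(22300.0, 2402e6, 40.0)` would return true, while a 100 kHz offset on the same channel (about 41.6 ppm) would not pass.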

  • Hi Einar,

    Thanks for your reply. We have yet to receive the failing devices in our lab for testing. The field visit is already scheduled; we will update you once we have received and tested them.

    We tried your recommendation for testing the Cat phone, and the following is the test report:

    Could you please review it?

    Thanks

    Ram

  • Hi Ram,

    Yes, this is the best way to measure the frequency accuracy of an arbitrary phone. The measurements here look good and do not indicate any issue as far as I can see.

    I look forward to getting numbers from the failing nRF devices when you get them.

  • Hi Einar,

    If there are no issues with the frequency accuracy of this specific phone, what other ways are there to rule out phone-specific issues? We continue to see a high incidence of issues on the Cat S48C, S41, E6910, BV5900, etc.

  • Hi Ram,

    It is not easy to say. Based on the observations so far, frequency accuracy issues seem most likely, though there is no way to be sure before having measurements to back that up. It is not unlikely that some phone models have larger variation than others, and that could explain why you see more issues with some phones than others, even if the nRF board is the main problem. There could be other interoperability issues, though I am not able to find any specific indication of what they might be based on the current information. The key to learning more would be to reproduce the issue in the lab, so that you could make a sniffer trace and see what actually happens on air (at least if the frequency accuracy hypothesis proves to be a dead end).

    Einar

  • Hi Ram,

    Do you have any new information on this? Any new findings?

    We have just discovered that it is in fact possible to lose BLE_GAP_EVT_DISCONNECTED events in some very rare cases. I cannot say if this explains what you have been seeing, but my earlier dismissal of this turns out to be incorrect. We plan to release new SoftDevices (version 7.3.0) which will incorporate a fix for this issue.

  • Hi Einar,

    Yes, we were able to run frequency tests on the production boards where we were seeing a high incidence of discovery issues. Please find attached the document with test results for three boards.

    FieldBoard_Logs.docx

    >> We have just discovered that it is in fact possible to lose BLE_GAP_EVT_DISCONNECTED events in some very rare cases.

    Could you please elaborate a bit more on how to reproduce this? I would like a way to test whether this is happening on our production devices.

    Thanks

    Ram

  • Hi Ram,

    The issue is tricky to reproduce. I have asked the R&D SoftDevice team if and how it is possible to detect that this has happened. I will let you know as soon as I get some information on that.

    Regarding the frequency accuracy for the three boards in FieldBoard_Logs.docx, it is well within the limits. Did you also measure at cold and hot temperatures?

    Einar
