
Softdevice S140 temporarily stops advertising BLE

We are using SoftDevice S140 6.1.1 with the nRF52840 for a BLE peripheral. We see that mobile devices are intermittently unable to discover this peripheral, for periods ranging from a few minutes to hours during the day. In one exceptional case, it required a physical power reset to recover from this issue. We use fast BLE advertising with an interval of 150 ms and a timeout of 0 (infinite). We additionally handle GAP connection and disconnection events to change the advertisement to non-connectable when someone connects and back to connectable when the user disconnects. We don't see any failures in the firmware logs that would indicate an issue with the BLE configuration. These issues were reported for production peripheral devices and we have not been able to reproduce them locally. I need help with the following queries:

  1. Can a softdevice stop advertising for any reason if the duration is set to infinite? We don't see any logs for BLE_GAP_EVT_ADV_SET_TERMINATED which can possibly indicate such a thing.
  2. Can we land in a situation where the peripheral is not able to prioritize processing of GAP events like connected, disconnected, terminated etc. and we are not able to make the SoftDevice resume advertising after these events?
  3. Can there be an issue with the peripheral radio which can cause it to stop advertising for some reason? We are thinking of logging radio notifications to concretely capture any such issues. Is it advisable to enable radio notifications for production devices, and are there any best practices to leverage this feature?
  • Hi,

    Can a softdevice stop advertising for any reason if the duration is set to infinite? We don't see any logs for BLE_GAP_EVT_ADV_SET_TERMINATED which can possibly indicate such a thing.

    No, I do not see how that could happen (unless it is caused by an unknown bug).

    Can we land in a situation where peripheral is not able to prioritize processing of GAP events like connected, disconnected, terminated etc. and we are not able to make softdevice resume advertisement after these events?

    I want to say no here as well. However, it is possible to contemplate a situation where SoftDevice events are not processed in a timely manner and this causes issues. I do not know anything about your code, but if you look at the SoftDevice handler implementation in SDK 17.0.2 (similar in older versions) as an example, you will see that if the dispatch model is scheduler (NRF_SDH_DISPATCH_MODEL_APPSH), every event from the SoftDevice is put in a queue. If the queue is full because events are not processed fast enough, an event would be lost. With the SDK implementation that would be detected by an APP_ERROR_CHECK, so it should not fail silently. It could be different in your application, though. Could this be relevant?
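To make the queue-overflow scenario above concrete, here is a minimal sketch in plain C of a bounded event queue that counts rejected events instead of losing them silently. It is not the SDK's scheduler or SoftDevice handler code; `evt_queue_t`, `app_evt_t`, `EVT_QUEUE_CAPACITY` and the function names are hypothetical, and a real interrupt/main-loop split would also need critical sections around the shared fields.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EVT_QUEUE_CAPACITY 4u   /* hypothetical size, chosen only for illustration */

typedef struct {
    uint16_t evt_id;            /* stand-in for a SoftDevice event identifier */
} app_evt_t;

typedef struct {
    app_evt_t slots[EVT_QUEUE_CAPACITY];
    size_t    head;             /* index of the oldest queued event */
    size_t    count;            /* number of events currently queued */
    uint32_t  dropped;          /* events rejected because the queue was full */
} evt_queue_t;

void evt_queue_init(evt_queue_t *q)
{
    q->head = 0;
    q->count = 0;
    q->dropped = 0;
}

/* Producer side: returns false and records the loss when the queue is full. */
bool evt_queue_put(evt_queue_t *q, app_evt_t evt)
{
    if (q->count == EVT_QUEUE_CAPACITY) {
        q->dropped++;
        return false;
    }
    q->slots[(q->head + q->count) % EVT_QUEUE_CAPACITY] = evt;
    q->count++;
    return true;
}

/* Consumer side: returns false when nothing is pending. */
bool evt_queue_get(evt_queue_t *q, app_evt_t *out)
{
    if (q->count == 0) {
        return false;
    }
    *out = q->slots[q->head];
    q->head = (q->head + 1) % EVT_QUEUE_CAPACITY;
    q->count--;
    return true;
}
```

If the application logs or asserts whenever `dropped` becomes non-zero, a lost connected or disconnected event shows up in the field logs rather than as an unexplained gap in advertising.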

    Can there be an issue with peripheral radio which can cause it to stop advertisement for some reason?

    Same here, I do not see how this could happen unless there is an up to now unknown bug.

    We are thinking of logging radio notifications to concretely capture any such issues, is it advisable to enable radio notifications for production devices, are there any best practices to leverage this feature?

    Radio notification is a generic feature that allows you to get events when the radio is in use. It is a commonly used feature and I do not see any problems with using radio notification in production. You can refer to the SoftDevice specification for details.
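One way radio notifications could be used to capture an advertising stall is a small watchdog that remembers when the radio was last active and flags a silence longer than expected. The sketch below is plain C and only models the bookkeeping. How the notification is wired to the SoftDevice is left out, and `radio_watch_t`, the function names and the 1600 ms threshold mentioned afterwards (roughly ten 150 ms advertising intervals) are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t last_active_ms;  /* time of the most recent radio-active notification */
    uint32_t max_gap_ms;      /* longest silence tolerated before flagging a stall */
    uint32_t stall_count;     /* number of distinct silent periods detected */
    bool     stalled;         /* true while the current silent period is ongoing */
} radio_watch_t;

void radio_watch_init(radio_watch_t *w, uint32_t now_ms, uint32_t max_gap_ms)
{
    w->last_active_ms = now_ms;
    w->max_gap_ms = max_gap_ms;
    w->stall_count = 0;
    w->stalled = false;
}

/* Call from the (hypothetical) radio notification handler. */
void radio_watch_on_active(radio_watch_t *w, uint32_t now_ms)
{
    w->last_active_ms = now_ms;
    w->stalled = false;
}

/* Call periodically from the main loop or a timer.
 * Returns true while the radio has been silent for longer than max_gap_ms.
 * Unsigned subtraction keeps the gap correct across timer wrap-around. */
bool radio_watch_check(radio_watch_t *w, uint32_t now_ms)
{
    uint32_t gap_ms = now_ms - w->last_active_ms;

    if (gap_ms > w->max_gap_ms) {
        if (!w->stalled) {
            w->stalled = true;
            w->stall_count++;   /* count each silent period only once */
        }
        return true;
    }
    return false;
}
```

With a 150 ms advertising interval, initializing the watchdog with something like `radio_watch_init(&w, now, 1600)` and logging whenever `radio_watch_check` first returns true would timestamp each period in which the radio went quiet.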

    Regarding the issue in general, I wonder if you have been able to confirm that the issue is on the nRF side? As you have only seen this in the field so far and have not been able to reproduce it locally, I assume you have not been able to get a sniffer trace when this occurs. If you did, that would show if the nRF peripheral advertises and if the central attempts to connect. Which central devices do you see this issue with? Is it limited to a few specific phone models or similar?

    Einar

  • Thanks Einar for your inputs here.

    I checked that in our case we are using NRF_SDH_DISPATCH_MODEL_POLLING as the dispatch model, so if we are not able to pull events fast enough, will the SoftDevice drop them, and can that be detected?

    >> Regarding the issue in general I wonder if you have been able to confirm that the issue on the nRF side? 

    We are seeing issues where user mobile apps (central device in our case) are not able to discover our peripherals and the issue happens temporarily i.e. peripheral will not be discoverable during one set of interactions (which can last 10-15 minutes) and then it becomes discoverable later during the day or next day. We have ruled out central specific issues because central is able to discover other nearby BLE devices during that time. 

    In some geographies we do see such issues concentrated on specific makes of phones, such as the Caterpillar S48c and S41 in NA, but we are not sure what mobile-specific implementations could be causing it. Our peripherals are placed in public places, so there is definitely interference from a large number of public BLE devices in the vicinity. Can that contribute to such blackouts?

  • >> Too large frequency offset in the BLE central (phone)

    This is interesting, what tests can we perform for the phones to rule this out?

    >> Is it possible for you to obtain one of the failing devices and test it in your lab or alternatively send it to us? 

    Yes, I will plan to do this for one of our failing devices.

    Thanks for the document link to perform RF tests.

  • Hi Ram,

    karnram said:
    This is interesting, what tests can we perform for the phones to rule this out?

    Yes, you can test this on the phones as well in much the same way. You can use for instance nRF Connect for Android to set up an advertiser and measure the frequency deviation / offset of the advertisement packets. It is not as straightforward as measuring a pure carrier but certainly possible. A potential problem here is that there could be some variation in performance between units of the same phone model.

    karnram said:
    Yes, I will plan to do this for one of our failing devices.

    That sounds good. Would it be possible to send one of the failing devices to us for analysis as well?

    Update: Another question. Did you measure frequency deviation on assumed good devices? Or can you do it? You typically want the carrier in the middle of the band when measuring at room temperature, and if that is the case, you should have good margin for temperature variations within the specification for crystal and nRF. 
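As a worked example of the arithmetic behind such a measurement, the sketch below converts a measured carrier frequency into an offset in ppm and compares it against a limit. The function names are hypothetical, and the 40 ppm limit used in the usage note is only an assumed example; the actual tolerance should be taken from the crystal and nRF52840 specifications.

```c
#include <stdbool.h>

/* Offset of the measured carrier from the nominal channel frequency, in ppm.
 * A positive result means the carrier sits above the nominal frequency. */
double freq_offset_ppm(double measured_hz, double nominal_hz)
{
    return (measured_hz - nominal_hz) / nominal_hz * 1e6;
}

/* True when the magnitude of the offset does not exceed limit_ppm.
 * limit_ppm is supplied by the caller; no limit is hard-coded here. */
bool offset_within_limit(double offset_ppm, double limit_ppm)
{
    double magnitude = (offset_ppm < 0.0) ? -offset_ppm : offset_ppm;
    return magnitude <= limit_ppm;
}
```

For instance, a carrier measured 48 kHz above the 2402 MHz advertising channel gives `freq_offset_ppm(2402048000.0, 2402000000.0)`, which is roughly 20 ppm and would pass an assumed 40 ppm limit, while a carrier 120 kHz low (about 50 ppm in magnitude) would not.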

  • >> Yes, you can test this on the phones as well in much the same way. 

    Shouldn't we be testing how well a phone is able to receive advertisements, i.e. its receiver sensitivity?

    >>  Would it be possible to send one of the failing devices to us for analysis as well?

    Sure, I will update you once we have hardware boards for these production devices available with us after the field service visits.

    >> Did you measure frequency deviation on assumed good devices? Or can you do it?

    Sure, we were planning to do that as well.

  • Hi Ram,

    karnram said:
    Shouldn't we be testing how well a phone is able to receive advertisements, i.e. its receiver sensitivity?

    I am not sure. Is there a reason to suspect that receiver sensitivity is a problem here? If so, then moving the phone and nRF device a bit closer together should work - is that the case? If the issue is a frequency deviation/offset between the peers (which we do not know, but it remains a strong hypothesis), then we need to measure that on the Tx signal from the phone. The phone would have the same frequency offset for Tx and Rx, so whatever you find when measuring in Tx is equally valid for when the phone is in Rx.

    I look forward to hearing results of your findings on what we have discussed.

    Einar

  • Hi Einar,

    I found out that our vendor already does the frequency accuracy test for our devices as part of their factory exit process. Please find attached the test report for one of the production devices where we saw a high incidence of discovery issues:

    device-1.pdf

    Does this make sense?

    Thanks

    Ram
