Softdevice S140 temporarily stops advertising BLE

We are using SoftDevice S140 6.1.1 with the nRF52840 for a BLE peripheral. Intermittently, mobile devices are unable to discover this peripheral for periods ranging from a few minutes to hours during the day. In one exceptional case, a physical power reset was required to recover. We use fast BLE advertising with an interval of 150 ms and a timeout of 0 (infinite). We also handle the GAP connection and disconnection events to switch the advertising to non-connectable when a central connects and back to connectable when it disconnects. We don't see any failures in the firmware logs that would indicate a problem with the BLE configuration. These issues were reported for production devices, and we have not been able to reproduce them locally. I need help with the following questions:

  1. Can the SoftDevice stop advertising for any reason if the duration is set to infinite? We don't see any BLE_GAP_EVT_ADV_SET_TERMINATED events in the logs, which would indicate such a thing.
  2. Can we end up in a situation where the peripheral does not process GAP events such as connected, disconnected and terminated in time, so that we fail to make the SoftDevice resume advertising after these events?
  3. Can a problem with the peripheral's radio cause it to stop advertising for some reason? We are thinking of logging radio notifications to capture any such issues concretely. Is it advisable to enable radio notifications on production devices, and are there any best practices for using this feature?
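
A minimal sketch of the advertising logic described above, assuming the application restarts advertising from its GAP event handler. The enum names, `adv_interval_units()` and `next_adv_mode()` are hypothetical stand-ins; in real firmware the events are the `BLE_GAP_EVT_*` IDs from the S140 headers and the restart goes through `sd_ble_gap_adv_set_configure()` and `sd_ble_gap_adv_start()`. It only illustrates two points: every event path should end with advertising running in some mode, and the 150 ms interval is handed to the SoftDevice in units of 0.625 ms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for BLE_GAP_EVT_CONNECTED, BLE_GAP_EVT_DISCONNECTED
 * and BLE_GAP_EVT_ADV_SET_TERMINATED. */
typedef enum { EVT_CONNECTED, EVT_DISCONNECTED, EVT_ADV_SET_TERMINATED } gap_evt_t;

/* Hypothetical advertising modes the application switches between. */
typedef enum { ADV_CONNECTABLE, ADV_NON_CONNECTABLE } adv_mode_t;

/* BLE advertising intervals are expressed in units of 0.625 ms. */
static uint32_t adv_interval_units(uint32_t interval_ms)
{
    return (interval_ms * 1000u) / 625u;
}

/* Returns the mode advertising should be (re)started in after an event.
 * Every branch ends in a mode, so no event leaves the device silent. */
static adv_mode_t next_adv_mode(gap_evt_t evt, uint32_t *connections)
{
    assert(connections != NULL);
    switch (evt) {
    case EVT_CONNECTED:
        (*connections)++;
        break;
    case EVT_DISCONNECTED:
        if (*connections > 0u) {
            (*connections)--;
        }
        break;
    case EVT_ADV_SET_TERMINATED:
        /* Not expected with duration 0, but restart rather than stay silent. */
        break;
    }
    return (*connections == 0u) ? ADV_CONNECTABLE : ADV_NON_CONNECTABLE;
}
```

Counting connections instead of using a boolean is only a choice for peripherals that allow more than one link; with a single link it reduces to a connected flag.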
  • Hi,

    Can a softdevice stop advertising for any reason if the duration is set to infinite? We don't see any logs for BLE_GAP_EVT_ADV_SET_TERMINATED which can possibly indicate such a thing.

    No, I do not see how that could happen (unless it is caused by an unknown bug).

    Can we land in a situation where peripheral is not able to prioritize processing of GAP events like connected, disconnected, terminated etc. and we are not able to make softdevice resume advertisement after these events?

    I want to say no here as well. However, it is possible to imagine a situation where SoftDevice events are not processed in a timely manner and this causes issues. I do not know anything about your code, but if you look at the SoftDevice handler implementation in SDK 17.0.2 (similar in older versions) as an example, you will see that if the dispatch model is scheduler (NRF_SDH_DISPATCH_MODEL_APPSH), every event from the SoftDevice is put in a queue. If the queue is full because events are not processed fast enough, events would be lost. With the SDK implementation that would be detected by an APP_ERROR_CHECK, so it should not fail silently. It could be different in your application, though. Could this be relevant?
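
    To make the failure mode concrete, here is a minimal sketch of a bounded event queue that reports an error instead of silently overwriting when it is full, which is the property an APP_ERROR_CHECK on the put call relies on. It is not the SDK's app_scheduler code; `evt_queue_t`, `evt_queue_put()`, `evt_queue_get()` and the queue depth are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EVT_QUEUE_SIZE 4u  /* hypothetical depth; the SDK sizes its queue via configuration */

typedef struct {
    uint16_t evt_id;       /* placeholder payload */
} evt_t;

typedef struct {
    evt_t  items[EVT_QUEUE_SIZE];
    size_t head;           /* index of the oldest queued event */
    size_t count;          /* number of queued events */
} evt_queue_t;

/* Returns false when the queue is full instead of overwriting, so the
 * caller can report the lost event rather than fail silently. */
static bool evt_queue_put(evt_queue_t *q, evt_t evt)
{
    assert(q != NULL);
    if (q->count == EVT_QUEUE_SIZE) {
        return false;
    }
    q->items[(q->head + q->count) % EVT_QUEUE_SIZE] = evt;
    q->count++;
    return true;
}

/* Pops the oldest event; returns false when the queue is empty. */
static bool evt_queue_get(evt_queue_t *q, evt_t *out)
{
    assert(q != NULL && out != NULL);
    if (q->count == 0u) {
        return false;
    }
    *out = q->items[q->head];
    q->head = (q->head + 1u) % EVT_QUEUE_SIZE;
    q->count--;
    return true;
}
```

    The caller of `evt_queue_put()` should treat `false` as an error to log or assert on; a queue that drops on overflow without telling anyone is exactly the kind of silent failure discussed here.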

    Can there be an issue with peripheral radio which can cause it to stop advertisement for some reason?

    Same here, I do not see how this could happen unless there is an as yet unknown bug.

    We are thinking of logging radio notifications to concretely capture any such issues, is it advisable to enable radio notifications for production devices, are there any best practices to leverage this feature?

    Radio notification is a generic feature that gives you events when the radio is in use. It is a commonly used feature, and I do not see any problems with using radio notification in production. You can refer to the SoftDevice specification for details.
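
    One way radio notifications could be used for this, sketched under assumptions: the handler configured with `sd_radio_notification_cfg_set()` calls a hypothetical `adv_monitor_on_radio_active()` with a millisecond timestamp, and a periodic timer calls the hypothetical `adv_monitor_stalled()`. If the radio has been idle for several advertising intervals (each advertising event can also be delayed by up to 10 ms of random advDelay), the firmware logs it. Radio notifications fire for any radio activity, including connection events, so this checks that the radio is active, not specifically that it is advertising.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical monitor state, fed from the radio notification handler. */
typedef struct {
    uint32_t last_active_ms;   /* time of the most recent "radio active" notification */
    uint32_t adv_interval_ms;  /* configured advertising interval, e.g. 150 */
    uint32_t max_missed;       /* advertising intervals of silence to tolerate */
} adv_monitor_t;

/* Call from the radio notification handler when the radio becomes active. */
static void adv_monitor_on_radio_active(adv_monitor_t *m, uint32_t now_ms)
{
    assert(m != NULL);
    m->last_active_ms = now_ms;
}

/* Call periodically. Returns true when the radio has been idle longer than
 * max_missed advertising events would allow; the + 10 covers advDelay. */
static bool adv_monitor_stalled(const adv_monitor_t *m, uint32_t now_ms)
{
    assert(m != NULL);
    uint32_t silent_ms = now_ms - m->last_active_ms;  /* unsigned math survives timer wrap-around */
    uint32_t limit_ms  = m->max_missed * (m->adv_interval_ms + 10u);
    return silent_ms > limit_ms;
}
```

    In production it is probably safest to only log or count stalls at first, until the logs show whether they correlate with the field reports.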

    Regarding the issue in general, I wonder if you have been able to confirm that the issue is on the nRF side? As you have only seen this in the field so far and have not been able to reproduce it locally, I assume you have not been able to get a sniffer trace when this occurs. If you did, it would show whether the nRF peripheral advertises and whether the central attempts to connect. Which central devices do you see this issue with? Is it limited to a few specific phone models or similar?

    Einar

  • Thanks Einar for your inputs here.

    I checked, and in our case we are using NRF_SDH_DISPATCH_MODEL_POLLING as the dispatch model. If we are not able to pull events fast enough, will the SoftDevice drop them, and can that be detected?

    >> Regarding the issue in general, I wonder if you have been able to confirm that the issue is on the nRF side?

    We are seeing issues where the user's mobile app (the central in our case) cannot discover our peripherals. The issue is temporary: the peripheral is not discoverable during one set of interactions (which can last 10-15 minutes) and then becomes discoverable later in the day or the next day. We have ruled out central-specific issues because the central is able to discover other nearby BLE devices during that time.

    In some geographies we do see such issues concentrated on specific phone makes, such as the Caterpillar S48c and S41 in NA, but we are not sure what phone-specific implementation details could be causing it. Our peripherals are placed in public places, so there is definitely interference from a large number of public BLE devices in the vicinity. Can that contribute to such blackouts?

  • Hi,

    karnram said:
    I checked that in our case we are using NRF_SDH_DISPATCH_MODEL_POLLING as the dispatch model, so if we are not able to pull events fast enough, will softdevice drop them and can that be detected?

    Some events may be dropped if they are not processed in time, but the SoftDevice has an internal priority scheme: if the event buffer is full, more important events overwrite less important ones (such as RSSI reports, QoS reports, advertising reports, etc.). Important events are not dropped. Instead, the SoftDevice prevents the application from starting something else that could overwrite that event.
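
    The following is a toy model of the behaviour described above, not the SoftDevice's internal implementation. When the buffer is full, a new low-priority event is dropped, a high-priority event overwrites a queued low-priority one, and if only high-priority events are queued the put is refused; that refusal stands in for the SoftDevice not letting the application start an operation that could overwrite an important event. All names and the buffer depth are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define BUF_SLOTS 3u  /* hypothetical buffer depth */

/* e.g. PRIO_LOW for RSSI/QoS/advertising reports, PRIO_HIGH for a disconnect */
typedef enum { PRIO_LOW, PRIO_HIGH } evt_prio_t;

typedef enum { PUT_OK, PUT_EVICTED_LOW, PUT_DROPPED, PUT_BUSY } put_result_t;

typedef struct {
    evt_prio_t prio[BUF_SLOTS];
    int        id[BUF_SLOTS];
    size_t     used;
} evt_buf_t;

static put_result_t evt_buf_put(evt_buf_t *b, evt_prio_t prio, int id)
{
    assert(b != NULL);
    if (b->used < BUF_SLOTS) {            /* free slot: just store the event */
        b->prio[b->used] = prio;
        b->id[b->used]   = id;
        b->used++;
        return PUT_OK;
    }
    if (prio == PRIO_LOW) {
        return PUT_DROPPED;               /* a less important event is lost */
    }
    for (size_t i = 0; i < BUF_SLOTS; i++) {
        if (b->prio[i] == PRIO_LOW) {     /* overwrite a less important event */
            b->prio[i] = prio;
            b->id[i]   = id;
            return PUT_EVICTED_LOW;
        }
    }
    return PUT_BUSY;                      /* only important events queued: refuse */
}
```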

    karnram said:

    We are seeing issues where user mobile apps (central device in our case) are not able to discover our peripherals and the issue happens temporarily i.e. peripheral will not be discoverable during one set of interactions (which can last 10-15 minutes) and then it becomes discoverable later during the day or next day. We have ruled out central specific issues because central is able to discover other nearby BLE devices during that time. 

    In some geographies we do see such issues concentrated on specific phone makes like Caterpillar S48c and S41 in NA but we are not sure what mobile specific implementations can be causing it.

    I see. I think it is too early to conclude that the nRF does not advertise, though. That is one possible explanation, but there are others. It would be very interesting if you were able to reproduce this in a situation where you could make a sniffer trace. That would show whether the nRF advertises at all and could also indicate whether mobile devices attempt to connect.

    Before we have confirmed that the nRF is in a bad state where it is not advertising, there are a few other hypotheses worth considering as well. If I understand correctly, the issue seems to behave differently in different geographies. Does that mean that this unit is exposed to outdoor temperatures? If so, we have seen issues before where the HF crystal circuitry is sub-optimal, giving too little margin. In those cases, some (a few) phone chipsets may not be able to detect the BLE signal because the frequency is too far off, while others are able to. It would be interesting to see your schematics and layout around the HF crystal, as well as the crystal's datasheet, to evaluate whether this could be a potential issue.

    Yet another possible explanation could be an interoperability issue between the nRF and some specific phones, typically where one or the other does not behave according to the Bluetooth specification and that causes a problem. A sniffer trace would be very useful to check for this as well.

    Can you say more about the phones you see this issue with? Getting a more complete list would perhaps allow us to see if there is a common property among those phones.

    karnram said:
    Our peripherals are placed in public places so there is definitely interference from a large number of public BLE devices in the vicinity, can that contribute to such blackouts?

    The more interference, the fewer advertising packets are received by the phone. But it seems odd that you would not receive any advertising packets for a long time because of that, unless there are extraordinary conditions, and in that case I would not expect you to receive BLE packets from other devices either. There are a few things to check here as well, though. What Tx power do you advertise with? Have you tuned the antenna properly with the device inside its housing? Do you know roughly what RSSI you get from the nRF in its housing, measured at, for instance, 1 meter?

    Einar

  • >> Important events would not be dropped. Instead, the SoftDevice will prevent the application from starting something else that could overwrite that event.

    I tested this by disabling the BLE task (created through the nrf_sdh_freertos_init call in nrf_sdh_freertos.c) that is responsible for polling the SoftDevice, and then initiated a connection from the central. With this setup, the BLE advertising stops and the whole peripheral firmware restarts after a few seconds (possibly triggered by our external watchdog). I don't see any panic logs. So it does indicate that the SoftDevice prevents the application from doing anything else in such cases.

    The priority of the current BLE FreeRTOS task is set to 2. Would it be better to use the highest possible priority here to prevent its starvation in some cases?

    >> Does that mean that this is a unit that is exposed to outdoor temperatures?

    Yes, it is placed outdoors. 

    >> we have seen issues before where the HF crystal circuitry is sub-optimal, giving too little margin. In that case we have seen that some (a few) phone chipsets may not be able to detect the BLE signal because the frequency is just too much off, while others may be able to do it.

    Can we simulate such conditions in a lab environment to test this hypothesis, and what temperature range should we simulate? Could you share the list of phone chipsets that you have seen affected by such issues in the past?

    >>Can you say more about the phones you see this issue with? 

    US Phones - S48C, S41, Moto G(7) Power

    UK Phones - K7, WP6, S40Lite

    ES Phones - M2003J15SC, SM-A202F, moto e5

    >> What Tx power do you advertise with? And have you tuned the antenna properly while inside the housing? Do you know roughly what RSSI you would get from the nRF when it is in its housing, measured at for instance 1 meter?

    We use -20 dBm as the Tx power. See below the analysis we did of the RSSI values measured at the central for different ranges:

    >> It would be interesting to see your schematics and layout around the HF crystal, as well as the datasheet for the crystal to evaluate if this could be a potential issue.

    I have attached the following details for the HF crystal; let me know if you need more details.

  • Hi,

    I will get back to you again tomorrow, but I noticed that you provided information for the LF crystal (32.768 kHz), whereas what I was looking for was the HF crystal (32 MHz). Can you upload the datasheet, schematics and layout for that as well? Also, please specify the CL value for both crystals (HF and LF), as that is a key parameter and it is not clear from the datasheet (it describes multiple variants).

    Einar

  • There seems to be some confusion about the capacitor values used in our peripheral today; I found another set of values for C26 and C33. I will confirm the correct values shortly.

     

  • Hi,

    I think the HF crystal circuitry is the smoking gun here. From the datasheet I see that the crystal has a CL value of 8 pF, so your load caps (C26 and C33) should both be approximately 2 * 8 pF - 4 pF = 12 pF. 22 pF and 18 pF are too far off, so this crystal is not correctly loaded, and it is so far off that it is a likely explanation for what you are seeing.
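
    The calculation above follows the usual model for crystal load capacitance: each side sees its external cap plus pin and PCB stray capacitance (C1' = C1 + C_pin + C_pcb), and the crystal sees the two sides in series (CL = C1'·C2' / (C1' + C2')). A small sketch, assuming C_pin = 4 pF (the 4 pF term in the calculation above) and negligible PCB stray capacitance; the function names are hypothetical. With 22 pF and 18 pF the crystal sees roughly 11.9 pF instead of the specified 8 pF, and a higher load capacitance pulls the oscillation frequency below nominal.

```c
#include <assert.h>

/* External cap per side (pF) needed for a target load CL with equal caps:
 * from CL = (C + C_pin + C_pcb) / 2  =>  C = 2*CL - C_pin - C_pcb. */
static double load_cap_for_cl(double cl_pf, double c_pin_pf, double c_pcb_pf)
{
    assert(cl_pf > 0.0);
    return 2.0 * cl_pf - c_pin_pf - c_pcb_pf;
}

/* Effective load (pF) seen by the crystal for a given pair of external caps. */
static double effective_cl(double c1_pf, double c2_pf, double c_pin_pf, double c_pcb_pf)
{
    double c1 = c1_pf + c_pin_pf + c_pcb_pf;  /* total capacitance on XC1 side */
    double c2 = c2_pf + c_pin_pf + c_pcb_pf;  /* total capacitance on XC2 side */
    return (c1 * c2) / (c1 + c2);             /* the two sides in series */
}
```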

    Normally you can verify the crystal frequency indirectly by setting up a carrier (using, for instance, the radio test example) and verifying that the carrier is centered in the middle of the configured channel. If it is centered at room temperature, you will have a good margin. However, if it is far off, which I expect in this case given the HF circuitry, the margin will be smaller, and at low temperatures the frequency might be offset by too much. I expect you will be able to measure a difference in frequency by changing the temperature to, say, -20 °C.
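
    Since the radio carrier is synthesized from the 32 MHz reference, a crystal error of N ppm shows up as roughly the same N ppm error on the carrier. A minimal helper for turning a spectrum analyzer reading into ppm, assuming you know the nominal center frequency of the configured channel (the function name is hypothetical):

```c
#include <assert.h>

/* Relative carrier error in ppm: positive means the carrier (and hence,
 * approximately, the HF crystal) runs fast, negative means it runs slow. */
static double carrier_offset_ppm(double nominal_hz, double measured_hz)
{
    assert(nominal_hz > 0.0);
    return (measured_hz - nominal_hz) / nominal_hz * 1e6;
}
```

    For example, a carrier measured 48.8 kHz below a nominal 2440 MHz corresponds to about -20 ppm. Repeating the measurement at room temperature and at a low temperature shows how the margin shrinks.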

    (As a side note I recommend you consider sending us your designs for review and potential tuning in the future, as crystal loading is one of the things we routinely check on customer designs we are asked to review.)

    Regarding which phones we have seen this with in the past, I do not have a long list, but they have typically been lower-end phones. That said, if the nRF frequency is too far off, it could cause problems with any phone, and that may be the case here.

    Regarding your FreeRTOS BLE task I do not know enough about your firmware to suggest anything there. However, based on what we know now, I think an issue with your HW is much more likely than this being a firmware issue, so I suggest looking properly into the hardware first.

    Edit: I see you updated the post while I was writing. It will be very interesting to know what cap values you have on the devices that have failed in case you have several variants out in the field.

    Einar

  • Hi Einar,

    I reconfirmed with our team that we are only using the 12 pF/12 pF configuration in production. The change happened during our product design phase, when we decided to use a different make of crystal and had to change the corresponding capacitor values. With this issue ruled out, could you provide your input on other possible causes of the discovery issues?

    >> As a side note I recommend you consider sending us your designs for review and potential tuning in the future, as crystal loading is one of the things we routinely check on customer designs we are asked to review

    Which additional parts of the design would you like to review to better understand this issue? I can share those details.

    >> Regarding which phones we have seen this with in the past I do not have a long list but it has typically been more low-end phones.

    Are there specific BLE chipsets used in those low-end phones which cause such issues? 

    Thanks

    Ram

  • Hi Ram,

    karnram said:
    With this issue ruled out, could you provide your inputs on other possible things that might be causing discovery issues?

    This is a difficult case. I have discussed it with some colleagues and searched through the case history, and generally there are two reasons we have seen for this symptom before (some phones not being able to detect advertising packets):

    • Too large frequency offset of the HF crystal oscillator
    • Too large frequency offset in the BLE central (phone)
    • A combination of the two above (which helps explain why this is seen more with some device combinations but not others)
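
    The third point can be illustrated with a little arithmetic. What the receiver sees is the difference between the two devices' frequency errors, so errors in the same direction partly cancel while errors in opposite directions add. A sketch with hypothetical ppm values; it computes the offset only and makes no claim about how much offset a given phone tolerates:

```c
#include <assert.h>

/* Frequency difference (kHz) between transmitter and receiver on a channel
 * at channel_mhz, given each side's reference error in ppm (positive = fast).
 * MHz * ppm = Hz, hence the division by 1000 to get kHz. */
static double relative_offset_khz(double channel_mhz, double tx_ppm, double rx_ppm)
{
    assert(channel_mhz > 0.0);
    return channel_mhz * (tx_ppm - rx_ppm) / 1000.0;
}
```

    With an nRF running 20 ppm fast and a phone 30 ppm slow, the phone sees the advertiser about 122 kHz off the channel center, whereas a phone that is also 20 ppm fast sees no offset at all. That is one way the same peripheral can be visible to some phones and not to others.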

    With the updated load capacitor values of 12 pF, your HF clock circuitry looks good. However, this should still be one of the major hypotheses, given the behavior that has been observed. There could still be a HW issue, either in general or with some specific devices. Is it possible for you to obtain one of the failing devices and test it in your lab, or alternatively send it to us? I would like to test the frequency accuracy as described in this post. I think this is important even if it turns out not to be relevant, as we would then have ruled out one of the major hypotheses.

    Your initial hypothesis that the nRF simply stops advertising is absolutely a possibility, but I do not see how the SoftDevice could stop advertising by itself. If this is the case, it is likely that your firmware stops advertising or does not start it when it should.

    karnram said:
    What additional parts of design you want to review to better understand this issue? I can share those details.

    My comment on this point was mostly about the load caps, which seemed to be off at that time and which we would have spotted in a review. That is no longer relevant, and I think we have seen the relevant parts of your HW design for now.

    karnram said:
    Are there specific BLE chipsets used in those low-end phones which cause such issues? 

    To be honest, we do not have enough data to see a strong pattern or to blame a specific phone, but we have seen similar issues with the Huawei P Smart (Kirin 659), Nokia 1 Plus (MediaTek MT6739WW), Alcatel 1x (MediaTek MT6739), Nokia 3 (MediaTek MT6737) and some "no-name" Android devices. An interesting observation is that I have not found any reports of this with a high-end phone. But we do not have enough data to draw any conclusions.

    Einar

  • >> Too large frequency offset in the BLE central (phone)

    This is interesting. What tests can we perform on the phones to rule this out?

    >> Is it possible for you to obtain one of the failing devices and test it in your lab or alternatively send it to us? 

    Yes, I plan to do this for one of our failing devices.

    Thanks for the document link to perform RF tests.
