nRFSDK v17.0.1, S140 v7.2.0
We're using the ble_advertising nRF SDK module to control advertising. We allow one connection at a time, and as soon as the peer disconnects, we want to immediately restart advertising. We try to advertise forever; we never want to time out or stop. We're not using Mesh or anything else, just plain and boring BLE.
We're seeing units in the field that simply "go dead" on BLE, but they are still otherwise running (they connect to the internet via WiFi and/or LTE on different coprocessors). I added a bunch of diagnostics, including the radio_notification module. I also added a safeguard that counts radio notification events: if, over a 10-minute period, it sees fewer than 30% of the radio notification "active" events it should expect (based on our advertising + connection interval settings), it logs how many it saw and then reboots.
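For reference, here is a minimal sketch of the arithmetic behind that safeguard. It is not our actual firmware code: the function names are made up for illustration, and it assumes the expected count comes from the advertising interval alone (ignoring the small random delay the controller adds to each advertising event, and ignoring connection events). Under those assumptions, a 10-minute window at our interval of 874 units (0.625 ms each) works out to the 1098 expected events that appear in the logs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: expected number of radio "active" events in a window,
 * assuming one event per advertising interval. The interval is given in
 * 0.625 ms units, the same units as ble_adv_fast_interval. */
uint32_t expected_events(uint32_t window_ms, uint32_t adv_interval_units)
{
    uint64_t window_us   = (uint64_t)window_ms * 1000u;
    uint64_t interval_us = (uint64_t)adv_interval_units * 625u; /* 0.625 ms = 625 us */
    return (uint32_t)(window_us / interval_us);
}

/* Hypothetical helper: true when fewer than min_percent of the expected
 * events were seen. Integer math only, so no floating point is needed. */
bool radio_looks_dead(uint32_t seen, uint32_t expected, uint32_t min_percent)
{
    return (uint64_t)seen * 100u < (uint64_t)expected * min_percent;
}
```

With these helpers, expected_events(600000, 874) gives 1098, and any count of 329 or lower over the window would trip the 30% threshold.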
The data came in! Each line below shows the number of radio notification callbacks we saw over the 10 minutes leading up to a "rescue" reboot.
radio: 250/1098 events
radio: 274/1098 events
radio: 233/1098 events
radio: 322/1098 events
radio: 236/1098 events
radio: 289/1098 events
radio: 238/1098 events
radio: 0/1098 events
radio: 295/1098 events
radio: 327/1098 events
radio: 0/1098 events
So, we're seeing two different phenomena here. The reboots that fired just under the 30% threshold (e.g. 322/1098) are likely cases where advertising simply did not restart after losing a connection. The reboots with 0 events suggest that the SoftDevice somehow never started advertising at all?
Here's how we're initializing advertising:
ble_advertising_init_t init;
memset(&init, 0, sizeof(init));
ble_uuid_t adv_uuids[] = { { OUR_SERVICE_UUID, our_uuid_type } };
init.advdata.flags = BLE_GAP_ADV_FLAGS_LE_ONLY_GENERAL_DISC_MODE;
init.advdata.uuids_complete.uuid_cnt = ARRAY_COUNTOF(adv_uuids);
init.advdata.uuids_complete.p_uuids = adv_uuids;
init.advdata.include_appearance = true;
init.srdata.name_type = BLE_ADVDATA_FULL_NAME;
init.config.ble_adv_on_disconnect_disabled = false; // restart after disconnect
init.config.ble_adv_fast_enabled = true;
init.config.ble_adv_fast_interval = 874; // 546.25ms
init.config.ble_adv_fast_timeout = BLE_GAP_ADV_TIMEOUT_GENERAL_UNLIMITED;
init.evt_handler = on_adv_evt;
init.error_handler = on_adv_error;
memset(&s_adv.adv_params, 0, sizeof(s_adv.adv_params));
APP_ERROR_CHECK(ble_advertising_init(&s_adv, &init));
And then we start advertising:
APP_ERROR_CHECK(ble_advertising_start(&s_adv, BLE_ADV_MODE_FAST));
We never call anything that stops advertising, and we never turn off BLE via SoftDevice or shut down SoftDevice itself. Our error handler logs the error and reboots the nRF52840. Our event handler logs the event and, if it is BLE_ADV_EVT_IDLE, reboots the nRF52840. We haven't seen either of these reboots in the logs yet, so it looks like when advertising starts, it works, and nobody stops it intentionally.
Does anyone have any idea where I could look to start diagnosing this? Are there any errata in our version of SoftDevice or the nRFSDK that might be in play here?
Thanks in advance,
Charles