S130 Softdevice hard fault info request

egkaleido over 3 years ago

I am getting a softdevice hard fault occasionally, on fully enclosed devices with no debugger port access.

Where can I find info on potential causes of this fault?

nrf51822, Softdevice S130 "s130_nrf51_2.0.1_softdevice.hex", SDK 12.2

ID: 0x0000 0001

PC: 0x0001 04AE

Info: 0x0000 (it seems the prior software developer was only saving 16 bits for this field).
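
Since the truncated Info field loses half of the value, here is a minimal sketch of a fault record that keeps all three values at full width. It assumes the values arrive as 32-bit words, as they do in the SDK's `app_error_fault_handler(uint32_t id, uint32_t pc, uint32_t info)`. The names `fault_record_t`, `fault_record_fill`, `fault_record_is_valid` and the magic value are hypothetical, and where the record is persisted (no-init RAM, flash, etc.) is left out.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical marker showing that the record holds a real fault. */
#define FAULT_RECORD_MAGIC 0xFA17C0DEu

/* All three fault values are stored as full 32-bit words. */
typedef struct {
    uint32_t magic;
    uint32_t id;
    uint32_t pc;
    uint32_t info;
} fault_record_t;

/* Copy the fault values into the record; the magic is written last so a
 * partially written record is not mistaken for a valid one. */
static void fault_record_fill(fault_record_t *rec,
                              uint32_t id, uint32_t pc, uint32_t info)
{
    rec->id    = id;
    rec->pc    = pc;
    rec->info  = info;
    rec->magic = FAULT_RECORD_MAGIC;
}

/* True if the record was filled by fault_record_fill(). */
static bool fault_record_is_valid(const fault_record_t *rec)
{
    return rec->magic == FAULT_RECORD_MAGIC;
}
```

Calling `fault_record_fill()` from the fault handler and reading the record back after reset would report Info without the 16-bit truncation.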

tesc over 3 years ago

Hi,

I can confirm it is a SoftDevice assert. It may be related to event scheduling in the SoftDevice, or it may be a timing issue.
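
As a rough illustration of how the reported values can be read, the sketch below classifies the fault ID and checks whether the PC lies below the application start address. The ID constants are local copies of the values as I recall them from `nrf_sdm.h` (`NRF_FAULT_ID_SD_ASSERT`, `NRF_FAULT_ID_APP_MEMACC`), and 0x1B000 is assumed to be the application start address for S130 v2.0.x; please verify both against your headers and linker script. The function and enum names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Local copies; assumed to match nrf_sdm.h for this SoftDevice. */
#define FAULT_ID_SD_ASSERT   0x00000001u
#define FAULT_ID_APP_MEMACC  0x00001001u

/* Assumed start of application flash for S130 v2.0.x. */
#define ASSUMED_APP_START    0x0001B000u

typedef enum {
    FAULT_KIND_SD_ASSERT,   /* assert inside the SoftDevice */
    FAULT_KIND_APP_MEMACC,  /* application accessed protected memory */
    FAULT_KIND_OTHER        /* anything else, e.g. SDK-defined IDs */
} fault_kind_t;

/* Map a raw fault ID to a coarse category. */
static fault_kind_t fault_classify(uint32_t id)
{
    switch (id) {
    case FAULT_ID_SD_ASSERT:  return FAULT_KIND_SD_ASSERT;
    case FAULT_ID_APP_MEMACC: return FAULT_KIND_APP_MEMACC;
    default:                  return FAULT_KIND_OTHER;
    }
}

/* True if the PC is in flash below the application, i.e. in the
 * MBR/SoftDevice region under the assumed memory layout. */
static bool pc_below_app_start(uint32_t pc)
{
    return pc < ASSUMED_APP_START;
}
```

With the values from the report, ID 0x00000001 classifies as a SoftDevice assert, and PC 0x000104AE falls below the assumed application start, which is consistent with the fault originating inside the SoftDevice.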

Is the application using the SoftDevice timeslot library?
What BLE roles are you using?
What is the application doing immediately prior to the fault / does it appear in one specific situation?
How often does it happen, and is it happening repeatedly for some devices and not for others, or is it an even spread?

Regards,
Terje

egkaleido over 3 years ago in reply to tesc

Is the application using the SoftDevice timeslot library?
No. There are no calls to sd_radio_request() or sd_radio_session_open().

What BLE roles are you using?

This device is configured as a central. During this test, the central device scans and reconnects to a single, previously paired/bonded custom device. The central periodically writes data to the peripheral, and periodically receives notifications.

This device is writing/getting notifications of the same messages over a 5 day period. Writes and notifications should only happen in short bursts every 10 seconds, unless we have a bug in the application.

We have 24 device pairs (24 central, 24 peripheral) in the same small room.

What is the application doing immediately prior to the fault / does it appear in one specific situation?

This is what we are trying to determine.

How often does it happen, and is it happening repeatedly for some devices and not for others, or is it an even spread?

These faults appear random; it happened on 2 devices out of 24 during a 5-day test.

tesc over 3 years ago in reply to egkaleido

Hi,

Thanks for the additional info.

Just to confirm: Were both the failing devices centrals? And have the peripherals of those pairs been working flawlessly?

Have you tried increasing the error rate, e.g. by reducing the time between bursts, increasing the length of the bursts, or increasing network traffic (adding more device pairs)? You may of course trigger other errors then, but asserts at the same PC location would likely be the same issue. Easier reproduction would make the issue easier to investigate.

Have you tried to reproduce with units that do have debug port access, in order to get more info from the device (such as logs, a proper debug session, etc.)?

Since the error rate is relatively low, I assume you do not have any sniffer trace from when the error happens?

Regards,
Terje

egkaleido over 3 years ago in reply to tesc

Were both the failing devices centrals? And have the peripherals of those pairs been working flawlessly?

Both faults were reported on centrals; we do not appear to have faults on any of the peripherals.

We have not tried the other troubleshooting steps yet; we will try them next.

However, we wanted to know first if the program counter would give any hints on what is happening in the softdevice:

Could this error be due to having 24 pairs of devices in one room?

Is the cause likely due to a bug in our application?

tesc over 3 years ago in reply to egkaleido

Hi,

The PC strongly suggests that something is going wrong in the SoftDevice's scheduling. Usually this is caused by timeslot bugs in the application, but since you are not using timeslots, that is less likely here.

There is one known issue affecting this SoftDevice, which may be what you are seeing. It has been very difficult to reproduce, and it was not fixed for the S130 (the latest S130 release, v2.0.1, came out before the fix). What connection parameters do you use, and how much data is transferred in each burst?

Regards,
Terje