hard fault after a while when connected

I'm developing a product based on a module with a built-in nRF51822, and I'm seeing a strange error:

After the device has stayed connected for a long time (hours to days), it suddenly enters the hard fault handler. The system is simple: the BLE module, a sensor (I2C, bit-banged), a battery, a button, and an LCD with an LCD driver (SPI). Every 1 s the sensor data is read and the characteristics are updated (and usually notified).

I traced the instruction causing the hard fault back to the addresses 0x0000B65F and 0x0000B5E1 when using SoftDevice S110 7.0.0 or S110 7.1.0. These addresses seem to lie in the SoftDevice region, for which I have no source code and no way to debug.
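To narrow down a fault address like this, a HardFault handler can read the registers the Cortex-M0 core pushes automatically on exception entry (R0-R3, R12, LR, PC, xPSR) and check whether the stacked PC or LR lies inside the SoftDevice's flash region. The sketch below covers only the portable decoding part; on target, a short assembly stub in the HardFault handler would have to select MSP or PSP (bit 2 of the EXC_RETURN value in LR) and pass that pointer in. The helper names and the `SD_CODE_END` boundary of 0x16000 for S110 7.x are assumptions for illustration; check the boundary against the SoftDevice specification for your version.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumption: S110 7.x occupies flash from 0x00000000 up to 0x00016000.
 * Verify against the SoftDevice specification for your exact version. */
#define SD_CODE_END 0x00016000u

/* Registers the Cortex-M0 pushes on exception entry, in stack order
 * starting at the active stack pointer (MSP or PSP). */
typedef struct {
    uint32_t r0, r1, r2, r3, r12;
    uint32_t lr;   /* return address of the interrupted code */
    uint32_t pc;   /* instruction executing when the exception was taken */
    uint32_t xpsr;
} exception_frame_t;

/* Hypothetical helper: copy the eight stacked words into a struct. */
static exception_frame_t decode_frame(const uint32_t *sp)
{
    exception_frame_t f;
    f.r0 = sp[0];
    f.r1 = sp[1];
    f.r2 = sp[2];
    f.r3 = sp[3];
    f.r12 = sp[4];
    f.lr = sp[5];
    f.pc = sp[6];
    f.xpsr = sp[7];
    return f;
}

/* Thumb code pointers such as LR carry bit 0 set; clear it before
 * looking the address up in a disassembly or map file. */
static uint32_t code_address(uint32_t addr)
{
    return addr & ~1u;
}

/* True if the address falls inside the (assumed) SoftDevice code region. */
static bool in_softdevice(uint32_t addr)
{
    return code_address(addr) < SD_CODE_END;
}
```

Comparing both the stacked PC and LR against the region boundary tells you whether the fault was raised in SoftDevice code itself or in code it called back into.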

I am using the ARM mbed library and BLE_API. The stack and heap pointers do not change during the established connection, and the hard fault occurs outside my application code and periodic callback (both verified with debug messages), so I assume nested interrupts or similar problems are not the cause.

Because the error occurs outside my application, it is very hard to debug and find the root cause. I have tried several things, and one of them reduced the chance of the error (from a >80% chance of failure after two days to <20%):

  • Replacing the hardware SPI with a bit-banged implementation.
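For reference, a bit-banged SPI transfer like the one described in the item above can be quite small. This is a sketch assuming SPI mode 0 (CPOL = 0, CPHA = 0) and MSB-first bit order; the LCD driver's actual mode is not stated here. The pin hooks in `soft_spi_t` and the loopback stub are hypothetical stand-ins for nRF51 GPIO access (or mbed `DigitalOut`/`DigitalIn`). Chip-select handling and any delays required by the LCD driver's timing are left out and would need to be checked against its datasheet.

```c
#include <stdint.h>

/* Hypothetical pin-access hooks; on target these would wrap GPIO writes/reads. */
typedef struct {
    void (*set_sck)(int level);
    void (*set_mosi)(int level);
    int  (*get_miso)(void);
} soft_spi_t;

/* Full-duplex transfer of one byte in SPI mode 0, MSB first.
 * SCK is assumed to idle low before the call and is left low afterwards. */
static uint8_t soft_spi_transfer(const soft_spi_t *spi, uint8_t out)
{
    uint8_t in = 0;
    for (int bit = 7; bit >= 0; --bit) {
        spi->set_mosi((out >> bit) & 1);                    /* set data while SCK is low */
        spi->set_sck(1);                                    /* rising edge: both sides sample */
        in = (uint8_t)((in << 1) | (spi->get_miso() & 1));  /* read MISO while SCK is high */
        spi->set_sck(0);                                    /* falling edge: slave shifts next bit */
    }
    return in;
}

/* Host-side loopback stub: MOSI wired straight to MISO, rising edges counted. */
static int sim_sck, sim_mosi, sim_rising_edges;

static void sim_set_sck(int level)
{
    if (level && !sim_sck) {
        sim_rising_edges++;
    }
    sim_sck = level;
}

static void sim_set_mosi(int level) { sim_mosi = level; }
static int  sim_get_miso(void) { return sim_mosi; }

static const soft_spi_t loopback_spi = { sim_set_sck, sim_set_mosi, sim_get_miso };
```

Because the stub wires MOSI to MISO, `soft_spi_transfer(&loopback_spi, 0xA5)` returns 0xA5 on a host build, which makes the shifting logic easy to check before moving it to the target.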

The following changes had no effect:

  • Not using the sd_flash API calls
  • Not updating the characteristics
  • Compiler optimization level
  • Memory model (one region or two regions)
  • SoftDevice version (7.0.0 or 7.1.0)

Does anyone have an idea what might be causing this, or what else I could try? In particular, do the Nordic engineers have any information about what happens at 0x0000B65F / 0x0000B5E1?

Thanks

  • Hi, could you please also list the IC revision or markings of the chip you are using? Our first revision had a PAN that could be the reason for this.

  • I opened two modules. One IC is marked N51822 CEAAD0 1422FM; the other has 1423FU.

    Is D0 the revision? What revisions were affected?

  • ATTN_51 lists these. CE AA D0 is an IC revision 2 chip and supports SoftDevices up to 7.0.0, so you should be safe. 7.1.0 might also work for you if you read the migration documents carefully. Some PANs (PAN-44, PAN-45) required a workaround in our earlier SoftDevices, until the workaround was removed in 8.0.0. I do not think you are affected by this.

    I will try to find out exactly what happens at those memory addresses, but just to make sure: is 0xB65F for 7.0.0 and 0xB5E1 for 7.1.0, or do hard faults occur at both addresses with both SoftDevices?

    Edit: I do not have easy access to the mbed library, and nothing interesting happens in the SoftDevice at those memory locations. Are you sure you are not getting a stack assert? Those deliberately trigger a hard fault after firing.

  • Thanks! I didn't record which SoftDevice version produced which faulting address, but over the next few days I will separate them using a known SoftDevice version and an attached debugger. Edit: I tried to catch the stack assert from mbed with a breakpoint, but it was not hit. I will take a closer look there as well. Right now I have two devices running on a debugger, one with 7.0.0 and the other with 7.1.0. As soon as I have results, I will share them with you. Thanks!

  • I can confirm that with SoftDevice 7.1.0 the address 0x0000B5E1 causes the hard fault exception. The second device has not entered the hard fault handler yet. Can you find out what happens there?
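To check for the stack assert raised in the Nordic reply, one option is to have the SoftDevice assertion handler store what it receives in a global variable that is still intact when the HardFault handler is reached, where a debugger can read it. This is a sketch under assumptions: the callback shape `void (uint32_t pc, uint16_t line_number, const uint8_t *p_file_name)` is assumed to match `softdevice_assertion_handler_t` in the S110 7.x `nrf_sdm.h`, and the names `sd_assert_record` and `g_sd_assert` are hypothetical. Verify the signature in your headers. The mbed port may already register its own handler with `sd_softdevice_enable()`, in which case that handler is the one to modify.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Snapshot of the last SoftDevice assertion, kept in a global so it can be
 * inspected from a debugger after the resulting hard fault. */
typedef struct {
    volatile uint32_t hit;   /* non-zero once the handler has run */
    volatile uint32_t pc;    /* program counter reported by the stack */
    volatile uint16_t line;  /* line number reported by the stack */
    char file[32];           /* truncated copy of the reported file name */
} sd_assert_info_t;

sd_assert_info_t g_sd_assert;

/* Assumed S110 7.x assertion-callback shape; verify against nrf_sdm.h. */
void sd_assert_record(uint32_t pc, uint16_t line_number, const uint8_t *p_file_name)
{
    g_sd_assert.pc = pc;
    g_sd_assert.line = line_number;
    g_sd_assert.file[0] = '\0';
    if (p_file_name != NULL) {
        /* Copy at most 31 characters and always NUL-terminate. */
        strncpy(g_sd_assert.file, (const char *)p_file_name, sizeof g_sd_assert.file - 1);
        g_sd_assert.file[sizeof g_sd_assert.file - 1] = '\0';
    }
    g_sd_assert.hit = 1;
    /* On target: set a breakpoint here, or let it return and halt in the HardFault handler. */
}
```

Since the stack deliberately triggers a hard fault after an assert fires, a non-zero `g_sd_assert.hit` seen from a breakpoint in the HardFault handler would indicate that the fault started as an assert, and `g_sd_assert.pc` gives the address the stack reported.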
