SoftDevice Timing Fault with LFXO

Hello,

During development of a custom board, based on the BMD-300 module from Rigado, which uses an nRF52832, we ran into a SoftDevice fault under specific circumstances.

The fault occurs upon completion of Bluetooth Mesh provisioning of the device via PB-GATT, but before the SoftDevice is reset.

The firmware environment is:

  • nRF5 SDK 15.2.0
  • nRF5 SDK for Mesh 3.1.0
  • S132 SoftDevice 6.1.0 (bundled with the SDK) or 6.1.1
  • SEGGER Embedded Studio for ARM 4.12

We were able to reproduce the issue with the Light Switch Server example project, and it also occurs with our custom firmware.

Not all boards hit the fault, but those that do, hit it consistently. The Program Counter at fault entry is 0x000152d4 for S132 6.1.0 and 0x00015326 for 6.1.1; these seem to correspond to the same code in the two versions.

Changing the LFCLK accuracy value supplied during SoftDevice initialization from 20 ppm (the value in the crystal's datasheet) to either 250 ppm or 1 ppm, or switching the LFCLK source away from the LFXO, seems to fix the fault. This has left us puzzled, since we expected that increasing the specified accuracy would, if anything, make things worse.

Furthermore, the boards that now fault worked in the same scenario when we tested them a few months ago, and their crystal and capacitor setup is identical to Rigado's implementation of the PCA10040 (the crystal is a different model, but with identical specifications).

As far as measurement is concerned, we could not find any conclusive difference in the clock frequency between working and non-working boards, but, without a reliable way to get the clock signal from the microcontroller, our measurements have been somewhat approximate.

I am not sure whether to consider this a hardware fault, since it definitely is hardware-dependent, or a SoftDevice or Mesh SDK issue.

Thanks in advance.

Alberto

  • Hi,

    1) Do you see this issue when the project is compiled in both "Release" and "Debug" configuration?

    to either 250ppm or 1ppm seems to fix the fault.

    2) How about NRF_CLOCK_LF_ACCURACY_50_PPM?

    expected that increasing the specified accuracy would, if anything, lead to worse problems.

    No. If you set a worse (higher) ppm value, the SoftDevice will take this into account in its timing calculations (e.g. when to open and close the BLE RX/TX windows).

    0x000152d4 for S132 6.1.0 and 0x00015326 for 6.1.1

    Is it always this PC value you see?

  • 1) Do you see this issue when the project is compiled in both "Release" and "Debug" configuration?

    Yes, the configuration makes no difference. It also does not matter whether a debugger is attached or not.

    2) How about NRF_CLOCK_LF_ACCURACY_50_PPM?

    Any value except 20 ppm seems to work.

    No. If you set a worse (higher) ppm value, the SoftDevice will take this into account in its timing calculations (e.g. when to open and close the BLE RX/TX windows).

    That was my assumption, but by "increasing accuracy" I meant specifying a lower, rather than higher, ppm value.

    Is it always this PC value you see?

    Yes. To be clear, this is the PC of the frame just below the hard fault handler in the call stack, i.e. where execution was when the fault occurred.

  • It sounds like it could be an issue with the crystal. If the modules are old and used to work before, crystal ageing could have worsened the accuracy. Are you able to desolder the crystal from the module, solder on a new crystal (one you know is within 20 ppm), and see if that solves the issue?

    As far as measurement is concerned, we could not find any conclusive difference in the clock frequency between working and non-working boards, but, without a reliable way to get the clock signal from the microcontroller, our measurements have been somewhat approximate.

     How did you do these measurements?

  • It sounds like it could be an issue with the crystal. If the modules are old and used to work before, crystal ageing could have worsened the accuracy. Are you able to desolder the crystal from the module, solder on a new crystal (one you know is within 20 ppm), and see if that solves the issue?

    The devices are less than 8 months old, so ageing is rather unlikely.

    Swapping the crystals between a working and a non-working (but otherwise identical) board produces inconsistent results: on some boards the issue does "move" with the crystal, but at other times both boards start or stop working.

    We also found that the issue arises even on a board that uses a different crystal (we cannot retrieve its specifications right now, but it should have been chosen as an equivalent replacement).

    How did you do these measurements?

    We tried two methods:

    • Measuring the crystal directly with an oscilloscope (with the obvious problems that brings).
    • Connecting the tick event of an RTC peripheral to a GPIOTE toggle task through PPI, with LFXO as the LFCLK source.

    That said, we also tested the Light Switch Proxy Server example in the nRF5 SDK for Mesh 2.2.0, using nRF5 SDK 15.0.0 and S132 6.0.0, and the issue did not arise on a known non-working board.

    Alberto

  • Based on your description, it seems to be software-related rather than a problem with the crystal.

    Are you hard faulting, or are you entering app_error_fault_handler()?

    If you are entering app_error_fault_handler(), please confirm which pc value is printed by:

    __LOG(LOG_SRC_APP, LOG_LEVEL_ERROR, "Softdevice assert: %u:%u\n", pc, info);

    If you are hard faulting, make sure the hardfault library is included in your build.

    Add these files to your project:

    $(SDK_ROOT)/components/libraries/hardfault/hardfault_implementation.c
    $(SDK_ROOT)/components/libraries/hardfault/nrf52/handler/hardfault_handler_gcc.c

    Then set HARDFAULT_HANDLER_ENABLED to 1 in sdk_config.h and add DEBUG_NRF to your preprocessor definitions.
