SPI transfers cause BLE connection timeout, but only on some units and especially when cold?

I'm running into a wall trying to find the root cause of a problem that we only started noticing after a small production run of a custom device built around the nRF52840 (specifically the Rigado/u-blox BMD-340 module).

In short, we have an IMU on the board (Hillcrest/CEVA BNO085) communicating over SPI at 2 MHz, and when we actually use the IMU (read ~16-byte bursts about 400 times/sec), the BLE connection drops with a 0x08 timeout error. If we don't communicate with the IMU, we can maintain a stable connection as long as we want. Both the IMU and the Bluetooth module are being driven by the same external SiTime 32768 Hz TCXO (SIT1552AI-JE-DCC-32.768E). This signal looks perfectly clean on all units, as far as I can measure, though I may not have the right equipment to measure adequately.
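For scale, here is a rough back-of-the-envelope sketch of how busy the SPI bus actually is at these rates. It uses only the figures quoted above (16-byte bursts, 400 bursts/sec, 2 MHz SCK) and ignores CS setup time and inter-byte gaps, so treat the result as a lower bound. The function names are made up for illustration.

```c
#include <stdint.h>

/* Time the bus spends clocking one burst, in nanoseconds.
 * Assumes 8 bits per byte and no framing overhead (CS setup, inter-byte gaps). */
static uint64_t spi_burst_time_ns(uint32_t bytes, uint32_t sck_hz)
{
    return ((uint64_t)bytes * 8u * 1000000000ull) / sck_hz;
}

/* Bus occupancy in parts per million of wall-clock time.
 * (ns busy per second) / 1e9 * 1e6  ==  (ns busy per second) / 1000 */
static uint64_t spi_bus_load_ppm(uint32_t bytes, uint32_t bursts_per_s, uint32_t sck_hz)
{
    return spi_burst_time_ns(bytes, sck_hz) * bursts_per_s / 1000u;
}

/* Raw bytes moved over SPI per second, before any processing. */
static uint32_t spi_raw_bytes_per_s(uint32_t bytes, uint32_t bursts_per_s)
{
    return bytes * bursts_per_s;
}
```

With the numbers from the post, `spi_burst_time_ns(16, 2000000)` gives 64000 ns per burst and `spi_bus_load_ppm(16, 400, 2000000)` gives 25600 ppm, i.e. the bus is clocking for roughly 2.6% of the time. That suggests the SPI traffic is not starving the CPU or the SoftDevice of time by sheer volume.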

The problem only happens on about 5% of the units we built, and not always in equal severity. Most of our devices have no issues. The ones that exhibit this problem often do so rarely, with only two that we've found so far exhibiting it every time, right away, as soon as we connect and start using the IMU. I have one of these on my desk in a test jig.

I'm using SoftDevice S140, v7.2.0, SDK 17.1.0. In addition to the radio, I've also configured TWI(0), SPI(1), and SPI(2). Currently, only SPI(1) is actively used. SPI(1) goes to a different peripheral IC that is currently not implemented in firmware. In case it's relevant, the SPI pins in question are P1.13 (MISO), P1.14 (MOSI), and P1.15 (SCK). The CS pin for the slave device in question is P1.12.

Let me walk you through my troubleshooting efforts.

1. I am certain the disconnection reason given is a supervision timeout. I have plenty of debug output in place to confirm this. Other than that error code in the disconnection event, there aren't any other helpful SoftDevice debug logs generated, even when the SD log level is set to debug (4).

2. I've captured multiple sniffer traces using an nRF52840 USB dongle and Wireshark, but nothing helpful came of that. All it shows is normal communication up until the peripheral simply stops transmitting, then resumes advertising as intended after a disconnection. There's no catastrophic crash, hard fault, or watchdog reset (WDT expires after 2 seconds); the firmware seems to keep humming along fine other than the loss of BLE communication, but it will happily reconnect again afterwards.

3. I've tried sending dummy data over the BLE connection at the same rate as what we capture and process from the IMU (~3200 bytes/sec), and the data transfer itself doesn't appear to cause any issues. I can do that all day long, and it stays connected. Further, if I simply gather data from the IMU over SPI but don't send it over the air, the connection still drops.

4. My non-blocking SPI transfer uses a volatile "xfer_done" boolean in the event handler, and uses "sd_app_evt_wait()" in a while loop until the transfer finishes to ensure the SD doesn't get ignored. I was previously using "__WFE()" instead, and I was really hoping that change would fix it, but it had no effect.

5. I've tried using both blocking and non-blocking SPI master implementations. My sdk_config.h has EASY_DMA enabled, but I'm not using the new NRFX_SPIM implementation. Should I change this?

6. After stumbling across this post from earlier in 2023, I tried using a heat gun on my test device to warm it up significantly (probably 40 deg. C, room temperature is more like 25 deg. C), AND IT STARTED WORKING. Once it cooled down again, the instant-timeout issue came back. This is the most interesting result so far, because it actually explains some behavior we saw but couldn't figure out--namely, the first time we really noticed the problem was during some tests involving outdoor use, where it was cooler than indoors. BUT WHY? Although I can see why temperature could affect a clock signal and therefore link stability, why would it ONLY matter if the IMU is in use via the SPI peripheral?

7. Changing the LFCLK source to SYNTH in sdk_config.h eliminates the problem entirely, at the expense of current consumption. This is why I still suspect that the clock has something to do with it. But I can't figure out what could be wrong.
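To put a number on how little sleep-clock error a BLE link tolerates, here is a minimal sketch of the window-widening arithmetic from the Bluetooth Core Specification: the peripheral widens its receive window by ((centralSCA + peripheralSCA) / 1000000) * timeSinceLastAnchor. The sketch assumes a perfect central clock and ignores the fixed jitter allowance the spec adds on top. The 30 ms interval and the ppm values in the usage note are hypothetical, not measured on my hardware. It does not explain why SPI activity matters; it only shows the size of the margin.

```c
#include <stdint.h>
#include <stdbool.h>

/* Window widening in microseconds, rounded up:
 * ((centralSCA + peripheralSCA) / 1e6) * timeSinceLastAnchor */
static uint32_t window_widening_us(uint32_t central_sca_ppm,
                                   uint32_t peripheral_sca_ppm,
                                   uint32_t since_anchor_us)
{
    uint64_t num = (uint64_t)(central_sca_ppm + peripheral_sca_ppm) * since_anchor_us;
    return (uint32_t)((num + 999999u) / 1000000u);
}

/* Drift accumulated by a sleep clock whose real error is actual_error_ppm
 * (truncated to whole microseconds). */
static uint32_t clock_drift_us(uint32_t actual_error_ppm, uint32_t since_anchor_us)
{
    return (uint32_t)(((uint64_t)actual_error_ppm * since_anchor_us) / 1000000u);
}

/* Simplified check: true when the peripheral's real drift exceeds the widened
 * window, so the next anchor point would be missed. Assumes an ideal central. */
static bool anchor_missed(uint32_t central_sca_ppm,
                          uint32_t peripheral_sca_ppm,
                          uint32_t actual_error_ppm,
                          uint32_t since_anchor_us)
{
    return clock_drift_us(actual_error_ppm, since_anchor_us) >
           window_widening_us(central_sca_ppm, peripheral_sca_ppm, since_anchor_us);
}
```

For example, with both sides declaring 20 ppm and 30 ms since the last anchor, the widening is only about 2 µs, so `anchor_missed(20, 20, 500, 30000)` is true: an effective error of 500 ppm (for instance from occasionally miscounted 32.768 kHz edges) would already push the peripheral outside its receive window, and repeated misses end in a supervision timeout.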

I'm at a loss what to measure, test, or try at this point. Do any of you wonderful people have any ideas? I am happy to provide more detail on any point if needed.

Parents
  • Can you provide details of the TCXO and an accurate scope (not logic analyser) trace of the TCXO waveform at the nRF52? The LFCLK external input is very picky, and the timing may well be affected by the input voltage levels, which in turn may be affected by the SPI transfer disturbance coupling to Vdd on either the TCXO or the nRF52840. What has the customer-specific TCXO output voltage been set to, and does it match the nRF52840? If not, then coupled noise becomes more critical.

    nRF52840: "VAMP,IN,XO,LOW Peak to peak amplitude for external low swing clock. Input signal must not swing outside supply rails. min 200 max 1000 mV"

    SiT1552: "The SiT1552 has two output voltage options. One option is a standard LVCMOS output swing. The second option is the NanoDrive reduced swing output. Output swing is customer specific and programmed between 200 mV and 800 mV. For DC-coupled applications, output VOH and VOL are individually factory programmed to the customers’ requirement"

    Also, which nRF52840 external clock input option is being used?

    "The following external clock sources are supported:
    - Low swing clock signal applied to the XL1 pin. The XL2 pin shall then be grounded." (set BYPASS=0)
    "- Rail-to-rail clock signal applied to the XL1 pin. The XL2 pin shall then be grounded or left unconnected" (set BYPASS=1)
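The two input options above map onto the CLOCK LFCLKSRC register roughly as sketched below. The bit positions (SRC in bits 1:0, BYPASS at bit 16, EXTERNAL at bit 17) are mirrored locally from the nRF52840 product specification so the sketch compiles on a host; the macro and function names are made up here, and the values should be checked against the MDK headers. How the setting is applied when the SoftDevice owns the clock is outside this sketch.

```c
#include <stdint.h>
#include <stdbool.h>

/* Local mirror of the LFCLKSRC field layout (assumption: verify against the datasheet). */
#define LFCLKSRC_SRC_XTAL      1u          /* SRC field: 32.768 kHz crystal / external path */
#define LFCLKSRC_BYPASS_BIT    (1u << 16)  /* BYPASS: rail-to-rail external source */
#define LFCLKSRC_EXTERNAL_BIT  (1u << 17)  /* EXTERNAL: clock driven into XL1 from outside */

/* VAMP,IN,XO,LOW limits quoted above, in millivolts peak to peak. */
#define LOW_SWING_MIN_MV   200u
#define LOW_SWING_MAX_MV  1000u

/* True when a measured peak-to-peak amplitude fits the low swing input spec. */
static bool low_swing_amplitude_ok(uint32_t vpp_mv)
{
    return vpp_mv >= LOW_SWING_MIN_MV && vpp_mv <= LOW_SWING_MAX_MV;
}

/* Register value for an external clock on XL1:
 * low swing     -> EXTERNAL=1, BYPASS=0
 * rail-to-rail  -> EXTERNAL=1, BYPASS=1 */
static uint32_t lfclksrc_external(bool rail_to_rail)
{
    uint32_t value = LFCLKSRC_SRC_XTAL | LFCLKSRC_EXTERNAL_BIT;
    if (rail_to_rail) {
        value |= LFCLKSRC_BYPASS_BIT;
    }
    return value;
}
```

A NanoDrive output programmed anywhere in the 200 mV to 800 mV range passes `low_swing_amplitude_ok()` and would pair with `lfclksrc_external(false)`, while an LVCMOS output would pair with `lfclksrc_external(true)`. A mismatch between the TCXO output option and the BYPASS setting is one thing worth ruling out.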
