TWI/I2C bus / driver gets stuck occasionally

I'm working with an nRF52840 DK connected via SCL, SDA, GND and VDD to my custom PCB. I'm using nRF5_SDK_17.1.0_ddde560.

My apologies in advance if I missed anything obvious, as I am new to development with nRF.

I'm trying to communicate with the ADS1114 through a TCA9517A bus level shifter, as the ADS1114 has a supply voltage of 5V. The I2C communication often works, and seems to keep working, but sometimes the bus seems to get stuck. The project is based on the pca10056 twi_sensor example project; I mainly changed the registers that are written.

I've looked at the other posts related to TWI problems, but I did not find a solution to this issue there, unfortunately.

I am using external pullups (~3k) and have disabled the internal pull-ups.

I analysed the I2C bus with a logic analyser and oscilloscope, but I feel like I lack the experience to figure out what exactly is going on, so I would really appreciate any insights into the problem or what steps I could try to fix it. 

Failed communication example

With the scope images of the final CLKs before communication fails (blue is SDA, purple is SCL):

Successful transaction example

The error doesn't always occur at this spot, but the cases look like this.

When this error occurs, the code gets stuck at the 'while (m_xfer_done == false);' that follows the transaction.

To me the biggest difference seems to be the pull down of the SCL signal to a voltage level lower than the other pull-down levels. There's also what appears to be a small 'glitch' in the SDA before this happens, but I am not sure if this is related. I looked at the ADS1114 datasheet, and it says the device does not implement clock stretching nor does it drive the SCL pin. It is almost as if the master just stops driving the CLK.

I get the impression that if the initial transfer (configuration write, address pointer write, first few reads) is successful, it continues to work indefinitely. Therefore I might be able to just keep retrying until it succeeds by implementing a recovery system. Do you have any pointers on how I would implement this?

Source code

For some more context, here is the code of the main transactions (note, LM75B_ADDR is the ADS1114 address):

/* Mode for ADS1114. */
#define OPERATING_MODE_BYTE1 0b10000100 //0x84
#define OPERATING_MODE_BYTE2 0b10000011 //0x83

void ADS1114_CONFIG(void)
{
    ret_code_t err_code;

    /* Write the two configuration bytes; LM75B_REG_CONF is reused as the register pointer. */
    uint8_t reg[3] = {LM75B_REG_CONF, OPERATING_MODE_BYTE1, OPERATING_MODE_BYTE2};
    err_code = nrf_drv_twi_tx(&m_twi, LM75B_ADDR, reg, sizeof(reg), false);
    APP_ERROR_CHECK(err_code);
    while (m_xfer_done == false);

    /* Write the one-byte address pointer used by the reads that follow. */
    uint8_t reg2[1] = {0b00000011};
    err_code = nrf_drv_twi_tx(&m_twi, LM75B_ADDR, reg2, sizeof(reg2), false);
    APP_ERROR_CHECK(err_code);
    while (m_xfer_done == false);
}

void twi_handler(nrf_drv_twi_evt_t const * p_event, void * p_context)
{
    switch (p_event->type)
    {
        case NRF_DRV_TWI_EVT_DONE:
            //if (p_event->xfer_desc.type == NRF_DRV_TWI_XFER_RX)
            //{
            //data_handler(m_sample);
            //}
            m_xfer_done = true;
            break;
        default:
            break;
    }
}

void twi_init (void)
{
    ret_code_t err_code;

    const nrf_drv_twi_config_t twi_lm75b_config = {
        .scl = ARDUINO_SCL_PIN,
        .sda = ARDUINO_SDA_PIN,
        .frequency = NRF_DRV_TWI_FREQ_100K,
        .interrupt_priority = APP_IRQ_PRIORITY_HIGH,
        .clear_bus_init = true
    };

    err_code = nrf_drv_twi_init(&m_twi, &twi_lm75b_config, twi_handler, NULL);
    APP_ERROR_CHECK(err_code);

    nrf_drv_twi_enable(&m_twi);
}

static void read_sensor_data()
{
    m_xfer_done = false;

    /* Start a non-blocking read into samples; completion is signalled through twi_handler. */
    //ret_code_t err_code = nrf_drv_twi_rx(&m_twi, LM75B_ADDR, &m_sample, sizeof(m_sample));
    ret_code_t err_code = nrf_drv_twi_rx(&m_twi, LM75B_ADDR, samples, sizeof(samples)/sizeof(samples[0]));

    APP_ERROR_CHECK(err_code);
}
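
For reference, the two mode bytes above form the 16-bit Config register word 0x8483. The sketch below splits that word into fields, assuming the ADS111x Config register layout from the TI datasheet (OS, MUX, PGA, MODE, DR and the comparator bits); the type and function names are hypothetical helpers, not part of the SDK. Under the same assumption about the datasheet, the pointer byte 0b00000011 written afterwards appears to select the Hi_thresh register, while the Conversion register is 0b00000000, which may be worth double-checking.

```c
#include <stdint.h>

/* Hypothetical helper type: one member per Config register field,
 * assuming the ADS111x datasheet layout. */
typedef struct {
    uint8_t os;        /* bit 15     */
    uint8_t mux;       /* bits 14:12 (only used by the ADS1115) */
    uint8_t pga;       /* bits 11:9  */
    uint8_t mode;      /* bit 8: 0 = continuous, 1 = single-shot */
    uint8_t dr;        /* bits 7:5   */
    uint8_t comp_mode; /* bit 4      */
    uint8_t comp_pol;  /* bit 3      */
    uint8_t comp_lat;  /* bit 2      */
    uint8_t comp_que;  /* bits 1:0   */
} ads111x_config_t;

/* Split the two bytes sent on the bus (MSB first) into their fields. */
ads111x_config_t ads111x_decode_config(uint8_t msb, uint8_t lsb)
{
    uint16_t w = (uint16_t)(((uint16_t)msb << 8) | lsb);
    ads111x_config_t c;

    c.os        = (uint8_t)((w >> 15) & 0x1u);
    c.mux       = (uint8_t)((w >> 12) & 0x7u);
    c.pga       = (uint8_t)((w >> 9)  & 0x7u);
    c.mode      = (uint8_t)((w >> 8)  & 0x1u);
    c.dr        = (uint8_t)((w >> 5)  & 0x7u);
    c.comp_mode = (uint8_t)((w >> 4)  & 0x1u);
    c.comp_pol  = (uint8_t)((w >> 3)  & 0x1u);
    c.comp_lat  = (uint8_t)((w >> 2)  & 0x1u);
    c.comp_que  = (uint8_t)(w & 0x3u);
    return c;
}
```

Decoding 0x84, 0x83 this way gives MODE = 0 (continuous conversion) with the comparator queue bits set to 0b11.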

Sidenotes

- Occasionally, the first write fails entirely and this could be seen on the scope, although I am not 100% sure if this is related to the issue (here, purple was a custom output pin set high at start):

- Sometimes, if I keep the development kit running for a bit, I get a random J-Link error. This might be because it is stuck waiting for the transaction to finish.

Some more context:

- I am using SEGGER Embedded Studio for ARM V7.30 on Windows 10

Any advice and help is appreciated. If more information is required, please let me know.

Kind regards,

Frederik

  • Hi,

    At the moment, you only set the flag when the TWI transaction completes with an ACK:

    void twi_handler(nrf_drv_twi_evt_t const * p_event, void * p_context)
    {
        switch (p_event->type)
        {
            case NRF_DRV_TWI_EVT_DONE:
                //if (p_event->xfer_desc.type == NRF_DRV_TWI_XFER_RX)
                //{
                //data_handler(m_sample);
                //}
                m_xfer_done = true;
                break;
            default:
                break;
        }
    }

    If you do not also set the flag on the error events, the wait loop will hang whenever one of them occurs.

    Q1: How often do you get this NACK from the sensor?

    Q2: Have you tried reducing the TWI speed to see if this has an effect on the problem?

    Kind regards,

    Håkon
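
A minimal sketch of the handler change suggested above: the flag is set for every event, and the outcome is recorded separately so the caller can tell a NACK from a completed transfer. The event names are the ones declared in the SDK's nrf_drv_twi.h; the typedefs here are stand-ins so the sketch builds off-target and should be dropped in a real project. m_xfer_error is a hypothetical extra flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the definitions in nrf_drv_twi.h (do not redefine these
 * when building against the SDK). */
typedef enum {
    NRF_DRV_TWI_EVT_DONE,
    NRF_DRV_TWI_EVT_ADDRESS_NACK,
    NRF_DRV_TWI_EVT_DATA_NACK
} nrf_drv_twi_evt_type_t;

typedef struct {
    nrf_drv_twi_evt_type_t type;
} nrf_drv_twi_evt_t;

static volatile bool m_xfer_done  = false;
static volatile bool m_xfer_error = false; /* hypothetical: true if the last transfer failed */

void twi_handler(nrf_drv_twi_evt_t const * p_event, void * p_context)
{
    (void)p_context;

    switch (p_event->type)
    {
        case NRF_DRV_TWI_EVT_DONE:
            m_xfer_error = false;
            break;

        case NRF_DRV_TWI_EVT_ADDRESS_NACK: /* no device answered the address */
        case NRF_DRV_TWI_EVT_DATA_NACK:    /* device rejected a data byte    */
        default:
            m_xfer_error = true;
            break;
    }

    /* Release the wait loop in every case, not only on success. */
    m_xfer_done = true;
}
```

The caller would then check m_xfer_error after the wait loop instead of assuming the transfer succeeded.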

  • Dear Håkon, thank you for your swift reply.

    Ah, that makes sense. When I was debugging initially and stepping into the handler, it seemed that it would step into the m_xfer_done = true line, and still remain stuck. That gave me the impression that it handled it successfully. I will add a reset to the flag in the other events and see if it stops hanging.

    Q1: The example did not handle the other events, so is it correct that the example basically assumed the transaction would never fail?

    As a sidenote, this is what every periodic read from the sensor, when everything works, looks like:

    I must admit that I am not sure whether the NAK is positive or negative, but since it 'works' I assumed it was positive.

    Every successful write ends in an ACK.

    A1: I would say that at every new debugging attempt, it's probably close to a 50% failure rate (NACK). I now realise this is more often than the 'occasionally' mentioned in the title. I must add that this debugging reset is without disconnecting the PCB with the ADC, so I might need to look into whether the ADC needs a proper reset before trying to reconfigure it.

    A2: Currently the speed is set to 100 kHz, the only other defined options were 250 kHz and 400 kHz. Can I lower the frequency by replacing the definition with a lower frequency?

    Kind regards,

    Frederik

  • As a quick test, I added the following to the twi_handler:

        default:
            m_xfer_done = true;
            break;
    }

    Unfortunately, the problem still occurs:

    And it remains stuck on the while (m_xfer_done == false); 

    edit: I don't think it's the lack of an ACK that is causing the issue. The last byte contains 8 CLK pulses while the others contain 9. I think something goes wrong before the handler is called.
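
If the handler is never called, an unbounded 'while (m_xfer_done == false);' cannot exit. A minimal sketch of a bounded wait with retries follows, under these assumptions: start_xfer_fn is a hypothetical callback that would wrap nrf_drv_twi_tx() or nrf_drv_twi_rx(), the poll count is an arbitrary stand-in for a real timer, and sim_start() only simulates a transfer that sometimes never completes so the logic can be exercised off-target. Recovering the peripheral after a timeout (for example by disabling and re-initialising the driver) is only marked by a comment.

```c
#include <stdbool.h>
#include <stdint.h>

static volatile bool m_xfer_done = false; /* set by the TWI event handler */

/* Hypothetical: starts one transfer, returns true if the driver accepted it. */
typedef bool (*start_xfer_fn)(void);

/* Poll the flag at most max_polls times; false means the wait timed out. */
static bool wait_xfer_done(uint32_t max_polls)
{
    while (!m_xfer_done)
    {
        if (max_polls-- == 0)
        {
            return false;
        }
    }
    return true;
}

/* Try the transfer up to `attempts` times; true on the first completion. */
bool xfer_with_retry(start_xfer_fn start, uint32_t max_polls, unsigned attempts)
{
    while (attempts-- > 0)
    {
        m_xfer_done = false; /* clear before starting, never after */
        if (start() && wait_xfer_done(max_polls))
        {
            return true;
        }
        /* Recovery point: on hardware the driver is likely still mid-transfer
         * here and would need to be reset before the next attempt. */
    }
    return false;
}

/* Simulation only: the next sim_hangs_left transfers are accepted but never
 * complete; after that a transfer completes immediately. */
unsigned sim_hangs_left = 0;

bool sim_start(void)
{
    if (sim_hangs_left > 0)
    {
        sim_hangs_left--;
        return true;
    }
    m_xfer_done = true;
    return true;
}
```

With this shape a stuck transfer costs one timeout rather than a hang, and the caller decides what to do when all attempts fail.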

  • Update 2.

    Some progress:

    I implemented an I2C general call with the reset command, as described in the ADS1114 datasheet, and now I can repeatedly finish the first write transaction successfully. I must admit that I am not 100% sure why.

    The new issue is that, right after the first byte write, the program hits an NRF_BREAKPOINT_COND; breakpoint, with the following log error:

    <error> app: ERROR 17 [NRF_ERROR_BUSY] at C:\nRF5_SDK_17.1.0_ddde560\examples\peripheral\twi_sensor_edited\main.c:144

    For context, this is the block around line 144:

    I am assuming the error indicates that the TWI line is busy, but for now I haven't been able to find the exact definition of error code 17. 

    This strikes me as odd, as the while loop in line 137 should make sure the previous transfer is finished.
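
One possible cause, assuming the flag is not cleared before every transfer (in the ADS1114_CONFIG listing earlier in the thread only the read path sets m_xfer_done = false): once the first transfer has set the flag, a later wait loop exits immediately while its transfer is still running, and the next non-blocking call is rejected. The sketch below is a pure simulation of that sequence, not driver code; sim_tx, sim_irq and sim_wait are hypothetical models, and SIM_ERROR_BUSY mirrors the value 17 shown in the log.

```c
#include <stdbool.h>
#include <stdint.h>

#define SIM_SUCCESS    0u
#define SIM_ERROR_BUSY 17u /* same number as "ERROR 17 [NRF_ERROR_BUSY]" in the log */

static volatile bool m_xfer_done = false;
static bool sim_busy = false; /* models "a transfer is in progress" */

/* Model of a non-blocking tx: rejected while the previous one is running. */
uint32_t sim_tx(void)
{
    if (sim_busy)
    {
        return SIM_ERROR_BUSY;
    }
    sim_busy = true;
    return SIM_SUCCESS;
}

/* Model of the interrupt that ends a transfer and runs the handler. */
void sim_irq(void)
{
    sim_busy = false;
    m_xfer_done = true;
}

/* Model of `while (m_xfer_done == false);`: if the flag is clear, waiting
 * lasts until the interrupt arrives; if it is already set, the loop exits
 * at once and the transfer keeps running. */
static void sim_wait(void)
{
    if (!m_xfer_done)
    {
        sim_irq();
    }
}

/* Run three back-to-back transfers and return the first error, if any. */
uint32_t three_transfers(bool clear_flag_each_time)
{
    uint32_t err = SIM_SUCCESS;

    sim_busy = false;
    m_xfer_done = false;

    for (int i = 0; i < 3 && err == SIM_SUCCESS; i++)
    {
        if (clear_flag_each_time)
        {
            m_xfer_done = false;
        }
        err = sim_tx();
        if (err == SIM_SUCCESS)
        {
            sim_wait();
        }
    }
    return err;
}
```

In this model three_transfers(false) fails on the third call with the busy code, while three_transfers(true) completes all three, which is why clearing the flag immediately before each transfer may be worth checking.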

    This is the current code in the TWI handler; I made sure to remove the m_xfer_done = true in the default case:

  • Hi,

    As you have external pull-ups, I would assume that this is not an assert/software-reset that happens?

    Your clock line is held low, and if the I2C sensor is the source of this, the sensor is performing clock stretching.

    Does it ever recover from this low scenario on the SCL pin?

    *edit* Just saw your "update 2" post.

    Are there any timing requirements on this external sensor that might be violated? Does the issue occur randomly, or consistently after x amount of seconds?

     

    Kind regards,

    Håkon
