
TWI hanging in nrf_twim_event_check

I am using TWI 0 to communicate with an external EEPROM. 99% of the time this works successfully. For reasons best known to itself, my EEPROM will sometimes fail to acknowledge read requests properly, leaving the SDA line in an abnormal state.

I can identify that this has happened via a timeout and would be able to recover, but the TWI interface hangs in while (!nrf_twim_event_check(p_twim, evt_to_wait)) within twim_xfer() from SDK 15.3.0.

Rather than ensure my slave device never does anything unexpected, I'd prefer to make my code able to withstand these faults, simply providing back a read error.

Are there any recommended methods for adding a timeout within this loop or for clearing flags externally so that this loop breaks out with a fault instead of continuing to hang?

  • In answer to your question about the hardware Karl, I'm using an NRF82532 on a 3rd party module on a custom board, sorry it would probably have been easier were I still on the dev kit.

    I have been looking to implement the twi_clear_bus option to see if that helps resolve the while loop, though I have been struggling to place this where it has access to the necessary structures.

    twi_clear_bus is wanting a pointer to a structure of type nrf_drv_twi_config_t that tells it which pins need toggling.

    However, the place where I am detecting this lockup of while (!nrf_twim_event_check(p_twim, evt_to_wait)) is within twim_xfer(), and I do not believe this has access to the same structures. I'm not sure whether I need to use the instance number to look up the config information using something like &m_cb[p_instance->drv_inst_idx].

    In the meantime I've started seeing an indication of what the slave is doing that is not acceptable to the Nordic TWI parser.

    Typically, after the last bit has been clocked out by the master writing a value, the slave will then switch over and pull the line low for one more clock cycle to acknowledge receipt of this byte.  What I am seeing here is that intermittently (and often on certain addresses from which I am reading) I see the line get pulled low as though the slave were starting to provide an ACK bit, but then part way through the ACK that line goes high as though it were a stop bit, but without the extra clock cycle to go with it.

  • Andrew said:
    In answer to your question about the hardware Karl, I'm using an NRF82532 on a 3rd party module on a custom board, sorry it would probably have been easier were I still on the dev kit.

    I assume you meant nRF52832. It is easier to develop on the DK, since it has better access to all pins along with the built-in debugger, but we should be able to fix this issue directly on the nRF52832 if you do not have a DK ready at hand.

    Andrew said:
    twi_clear_bus is wanting a pointer to a structure of type nrf_drv_twi_config_t that tells it which pins need toggling.

    This is correct, it needs to know which two pins are configured as SCL and SDA. If you are implementing your own version of this function, SDA and SCL are really all it needs to know, so you could either provide it with the nrf_drv_twi_config_t structure that you used to initialize your TWI to begin with, or you could modify the code to use your TWI_SCL_M and TWI_SDA_M #defines directly. Directly using the defined values is not as flexible as passing the config_t structure, but assuming you do not need this function to reset multiple TWI buses, this is no problem.
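
    To illustrate why the two pin numbers are all such a routine needs, here is a minimal sketch of a nine-clock bus clear. It is not the SDK's twi_clear_bus: the bus_pins_t hooks, bus_clear() and the sim_* helpers are hypothetical names invented for this example, and the simulated slave exists only so the logic can be exercised without hardware.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical pin-access hooks. On real hardware these would wrap GPIO
     * calls for whichever pins are configured as SCL and SDA. */
    typedef struct {
        void (*scl_write)(bool high);
        void (*sda_write)(bool high);
        bool (*sda_read)(void);
        void (*delay)(void);          /* roughly half an SCL period */
    } bus_pins_t;

    /* Pulse SCL up to nine times until SDA is released, then toggle SDA while
     * SCL is high (START followed by STOP). Returns true if SDA ends up high. */
    static bool bus_clear(const bus_pins_t *pins)
    {
        assert(pins != NULL);
        pins->sda_write(true);
        pins->scl_write(true);
        pins->delay();

        for (int i = 0; i < 9 && !pins->sda_read(); i++) {
            pins->scl_write(false);
            pins->delay();
            pins->scl_write(true);
            pins->delay();
        }

        pins->sda_write(false);
        pins->delay();
        pins->sda_write(true);
        pins->delay();

        return pins->sda_read();
    }

    /* --- Simulated bus, for illustration only --- */
    static int  sim_hold_clocks;   /* SCL rising edges until the slave lets go */
    static int  sim_clock_count;   /* rising edges generated so far */
    static bool sim_scl = true;
    static bool sim_master_sda = true;

    static void sim_scl_write(bool high)
    {
        if (high && !sim_scl) {            /* rising edge */
            sim_clock_count++;
            if (sim_hold_clocks > 0) {
                sim_hold_clocks--;
            }
        }
        sim_scl = high;
    }

    static void sim_sda_write(bool high) { sim_master_sda = high; }

    /* Open-drain line: high only if neither side pulls it low. */
    static bool sim_sda_read(void) { return sim_master_sda && sim_hold_clocks == 0; }

    static void sim_delay(void) { }

    static void sim_reset(int hold_clocks)
    {
        sim_hold_clocks = hold_clocks;
        sim_clock_count = 0;
        sim_scl = true;
        sim_master_sda = true;
    }

    static const bus_pins_t sim_pins = {
        sim_scl_write, sim_sda_write, sim_sda_read, sim_delay
    };
    ```

    With a simulated slave that releases SDA after five clocks, bus_clear(&sim_pins) stops clocking as soon as the line goes high. If the slave never lets go, it gives up after nine clocks and reports failure.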

    However, I would like to stress that a successful TWI_clear_bus is not going to solve your original issue with the nRF52832 getting stuck in an acknowledgement waiting loop.
    I suggested the clear_bus solution when I was under the impression that your problem was a slave forcing the TWI bus low, which I since have understood was not your original issue.

    To solve your original issue of the nRF52832 getting stuck waiting for an acknowledgement, I suggest implementing the timer solution as discussed earlier.

    Andrew said:
    Typically, after the last bit has been clocked out by the master writing a value, the slave will then switch over and pull the line low for one more clock cycle to acknowledge receipt of this byte.  What I am seeing here is that intermittently (and often on certain addresses from which I am reading) I see the line get pulled low as though the slave were starting to provide an ACK bit, but then part way through the ACK that line goes high as though it were a stop bit, but without the extra clock cycle to go with it.

    I am glad you have been able to isolate the particular occurrences of the issue. It seems to me that the slave is starting to send a NACK (by setting the SDA), but then changes to ACK halfway through the SCL tick, and back again to NACK for the last half? This does indeed seem strange. Which EEPROM are you using?
    Could you also tell me who is pulling SCL low at the end of the transmission? Is the EEPROM using clock stretching?

    Looking forward to hearing if the timer implementation resolves your issue,

    Best regards,
    Karl

  • You are correct Karl, I'd got the part number wrong, it should have read nRF52832.

    The EEPROM is an ST M24M01, which doesn't appear to specify that it uses clock stretching, though I suspect it may be doing something deliberate to slow down the I2C access.

    Today I've seen some promising results.

    Within twim_xfer() is a loop, while (!nrf_twim_event_check(p_twim, evt_to_wait)), which is executed within the main process.

    I have a timeout counter that is set before this while loop and which gets decremented on a timer.  If this reaches zero within the while loop then I return an error code NRF_ERROR_TIMEOUT.  (currently this is several seconds long but should really be far shorter)
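
    As a rough sketch of the countdown described above (not the actual change made to twim_xfer()), the loop can poll both the event and a counter that a timer decrements. The names wait_for_event(), timeout_tick(), wait_result_t and sim_event_after_n_polls() are hypothetical. In the real driver the predicate would be nrf_twim_event_check() and the result would be NRF_ERROR_TIMEOUT.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical result codes, stand-ins for the SDK's success/timeout codes. */
    typedef enum { WAIT_OK = 0, WAIT_TIMEOUT = 1 } wait_result_t;

    /* Predicate reporting whether the awaited event has fired. */
    typedef bool (*event_check_fn)(void *ctx);

    /* Counter decremented elsewhere, e.g. by a periodic timer interrupt.
     * volatile so the loop re-reads it on every pass. */
    static volatile uint32_t m_timeout_ticks;

    /* To be called from the (hypothetical) periodic timer handler. */
    static void timeout_tick(void)
    {
        if (m_timeout_ticks > 0) {
            m_timeout_ticks--;
        }
    }

    /* Poll check() until it returns true or the counter reaches zero. */
    static wait_result_t wait_for_event(event_check_fn check, void *ctx, uint32_t ticks)
    {
        assert(check != NULL);
        m_timeout_ticks = ticks;
        while (!check(ctx)) {
            if (m_timeout_ticks == 0) {
                return WAIT_TIMEOUT;   /* give up instead of hanging forever */
            }
        }
        return WAIT_OK;
    }

    /* Test double: reports the event after *ctx polls, and advances the timer
     * once per poll to stand in for the timer interrupt. */
    static bool sim_event_after_n_polls(void *ctx)
    {
        uint32_t *polls_left = ctx;
        timeout_tick();
        if (*polls_left == 0) {
            return true;
        }
        (*polls_left)--;
        return false;
    }
    ```

    The caller still has to treat WAIT_TIMEOUT as a failed transfer, because the peripheral may be left part way through a transaction.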

    This at least gets us out of the while loop where we were locked, but due to the flags set up within the TWI instance you cannot simply go back to trying to write to I2C again, as it will just get stuck in the same way.

    I therefore need to observe that this error code has been returned, and then recover by closing the I2C port and reinitialising the whole TWI instance using the commands below.

    I can then go ahead and try performing the read again, which proves successful this time. Perhaps there are simpler ways that don't require fully disabling and setting up the TWI instance again, but this certainly provides a nice fresh start that proves to be reliable.

    Now I will switch my efforts to looking at what I can do with the EEPROM to limit this error condition being caused in the first place, but with the knowledge that if this does occur then we at least have a reliable means of recovery.

    Thank you for your support with getting to this point.

    // Method for recovering an I2C connection after we have had to break out of the while loop due to a slave not acknowledging as expected.

    // Close the existing connection
    nrf_drv_twi_disable(eep_cfg.p_twi_instance);
    nrf_drv_twi_uninit(eep_cfg.p_twi_instance);

    // Set up a fresh connection
    err_code = twi_manager_request(eep_cfg.p_twi_instance,
                                   eep_cfg.p_twi_cfg,
                                   NULL,
                                   NULL);
    RETURN_IF_ERROR(err_code);

    nrf_drv_twi_enable(eep_cfg.p_twi_instance);
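
    The read, detect timeout, reinitialise, retry sequence described in this post could be wrapped up roughly as follows. This is only a sketch under assumptions: eeprom_ops_t, eeprom_read_with_recovery(), xfer_status_t and the sim_* helpers are hypothetical. The reinit hook stands in for the disable/uninit/request/enable calls above, and the simulated bus only models a lockup that a reinitialisation clears.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical status codes, stand-ins for the SDK error codes. */
    typedef enum { XFER_OK, XFER_TIMEOUT, XFER_FAILED } xfer_status_t;

    /* Hypothetical hooks: read performs one transfer, reinit tears the TWI
     * instance down and sets it up again. */
    typedef struct {
        xfer_status_t (*read)(uint32_t addr, uint8_t *buf, size_t len);
        xfer_status_t (*reinit)(void);
    } eeprom_ops_t;

    /* Try a read; on timeout, reinitialise the bus and retry up to max_retries. */
    static xfer_status_t eeprom_read_with_recovery(const eeprom_ops_t *ops,
                                                   uint32_t addr, uint8_t *buf,
                                                   size_t len, unsigned max_retries)
    {
        assert(ops != NULL && buf != NULL);
        xfer_status_t status = ops->read(addr, buf, len);
        for (unsigned attempt = 0;
             status == XFER_TIMEOUT && attempt < max_retries;
             attempt++) {
            xfer_status_t reinit_status = ops->reinit();
            if (reinit_status != XFER_OK) {
                return reinit_status;      /* recovery itself failed */
            }
            status = ops->read(addr, buf, len);
        }
        return status;
    }

    /* --- Simulated device, for illustration only --- */
    static bool     sim_bus_stuck;     /* true: every read times out */
    static unsigned sim_reinit_count;

    static xfer_status_t sim_read(uint32_t addr, uint8_t *buf, size_t len)
    {
        (void)addr;
        if (sim_bus_stuck) {
            return XFER_TIMEOUT;
        }
        memset(buf, 0xA5, len);            /* dummy payload */
        return XFER_OK;
    }

    static xfer_status_t sim_reinit(void)
    {
        sim_reinit_count++;
        sim_bus_stuck = false;             /* a fresh start clears the lockup */
        return XFER_OK;
    }

    static const eeprom_ops_t sim_ops = { sim_read, sim_reinit };
    ```

    Capping the number of retries means a permanently faulty device still ends in a read error rather than an endless recovery loop.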

  • Andrew said:
    Today I've seen some promising results.

    I am happy to hear that!

    Andrew said:
    (currently this is several seconds long but should really be far shorter)

    Far shorter indeed. A multiple of the average transmission time will probably suffice. You could fine-tune this by using an oscilloscope to monitor the transmissions and timing when the error occurs.
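
    As a starting point before measuring with an oscilloscope, the nominal transfer time can be estimated from the byte count and the SCL frequency, since each byte on the wire takes nine clock periods (eight data bits plus the ACK/NACK bit). The sketch below is only an estimate: the function names and the safety margin are assumptions, and it ignores START/STOP overhead and any clock stretching.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Nominal time on the wire in microseconds, rounded up.
     * bytes_on_wire includes address bytes as well as data bytes. */
    static uint32_t twi_transfer_time_us(uint32_t bytes_on_wire, uint32_t scl_hz)
    {
        assert(scl_hz > 0);
        uint64_t bits = (uint64_t)bytes_on_wire * 9u;   /* 8 data bits + ACK */
        return (uint32_t)((bits * 1000000u + scl_hz - 1u) / scl_hz);
    }

    /* Timeout as a multiple of the nominal transfer time.
     * The margin is a tuning value, to be refined by measurement. */
    static uint32_t twi_timeout_us(uint32_t bytes_on_wire, uint32_t scl_hz,
                                   uint32_t margin)
    {
        return twi_transfer_time_us(bytes_on_wire, scl_hz) * margin;
    }
    ```

    For example, 20 bytes at 100 kHz come to 180 clock periods of 10 µs each, so twi_transfer_time_us(20, 100000) gives 1800 µs, and a margin of 4 gives a 7200 µs timeout.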

    Andrew said:

    This at least gets us out of the while loop where we were locked, but due to the flags set up within the TWI instance you cannot simply go back to trying to write to I2C again, as it will just get stuck in the same way.

    I therefore need to observe that this error code has been returned, and then recover by closing the I2C port and reinitialising the whole TWI instance using the commands below.

    Which flags are you referring to here? Do you immediately get stuck after the following transmission, despite receiving an acknowledgement? Or does the following transmission never occur, since it is still waiting for the previous acknowledgement?
    It is good that you have found a solution that works for your application. However, I also suspect that it's possible to do this without having to reinitialise the TWI driver. If you intend to look into simplifying the workaround, I suggest starting by identifying the flags/configuration that are not reset properly following the timeout, and using the TWI API Reference to determine how to trigger the necessary resets.

    Andrew said:
    Now I will switch my efforts to looking at what I can do with the EEPROM to limit this error condition being caused in the first place, but with the knowledge that if this does occur then we at least have a reliable means of recovery.

    If this is something that occurs often, I would think the manufacturer of the EEPROM would have documented it, or at least someone else should have mentioned it in their forums.

    Andrew said:
    Thank you for your support with getting to this point.

    No problem at all, Andy. I am happy that we were able to solve your issue, and wish you the best of luck in the continued development of your application! :)
    Come back any time if you should encounter any more issues in the future.

    Best regards,
    Karl
