FreeRTOS RTC Lockup with TWI

Hello,

I am having an issue with FreeRTOS (v10.0.0) on the nRF52833 (SDK 17.1.0). The core of the issue is that the xPortSysTickHandler function stops being called when I am also making TWI writes as a TWI master. It is possible that the issue was occurring before that change too, just more sporadically. Some more information:

  • If I set the tick source to FREERTOS_USE_SYSTICK instead of the RTC, I don’t have the issue, but for power-consumption reasons I would like to resolve this so that I can use the RTC.
  • I have tickless idle turned off (configUSE_TICKLESS_IDLE set to 0).
  • In normal operation (fresh power-up), the device runs successfully for ~100-120 seconds before SysTickHandler stops being called.
  • Once this issue has occurred, if I restart the processor with the debugger, SysTickHandler is only ever called once and then never again. A power cycle is necessary to get it working again for a while. This seems telling: some configuration or state bit somewhere appears to cause the failure, and it survives a device restart but not a power cycle.
  • I have a sensor thread that uses TWI0 to talk to onboard sensors. This thread seems to run without issues.
  • I have external sensors on a thread using TWI1. If I disconnect the external slave device, I never seem to get a crash (over a reasonable span of ~20 minutes). If I attach the external slave device (which receives data successfully), then after ~105 transmissions (sent every second) the system stops calling xPortSysTickHandler. If I change the frequency of comms in the thread, the number of transmissions before the failure varies.
  • I have tried a few different slave devices. I'm using an external Arduino as a serial logger, and have tried a few Arduino boards and I2C slave libraries.
  • If I swap TWI0 and TWI1 between the threads, the issue persists. I suspect a marginal issue with the I2C slave on the external board, but extensive packet scanning has shown no obvious failure mode in the packets themselves. I also think this issue may have happened previously without this external comms thread running, just less quickly and reliably.
  • In the crash state, vApplicationIdleHook keeps being called, and vApplicationMallocFailedHook and vApplicationStackOverflowHook are never called. xPortSysTickHandler simply stops being called.
  • Checking NVIC_GetEnableIRQ(portNRF_RTC_IRQn) in vApplicationIdleHook shows that the IRQ is still enabled. The same call returns false when using SysTick, so the check is meaningful and indicates that the RTC IRQ hasn’t been disabled.
  • I do NOT have a WDT enabled, so I do not think the solution in this thread applies (https://devzone.nordicsemi.com/f/nordic-q-a/63998/freertos-wdt-sdk17-problem/369131), and I have stepped through that bit of code in the debugger and seen no issues with how it executes.
  • I do not see any obvious write errors in the TWI module right before things go down.
  • I have tried moving the RTOS tick to RTC2, but that doesn't change the behavior at all; same results.
  • Turning off the thread that uses the other TWI module does not change anything.
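Because the bad state survives a debugger restart but not a power cycle, it may help to record why the chip reset on every boot. The sketch below decodes the nRF52 POWER RESETREAS register into readable flags. The bit positions follow the nRF52833 Product Specification (verify them against the POWER_RESETREAS_*_Msk macros in the MDK headers), and decode_resetreas and the RR_* names are hypothetical helpers, not SDK API. Per the specification, a value of 0 means none of the flagged sources fired, i.e. a power-on or brownout reset, provided the register was cleared on the previous boot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* RESETREAS bit positions for the nRF52 POWER peripheral (hypothetical names;
 * on target, prefer the POWER_RESETREAS_*_Msk macros from the MDK headers). */
#define RR_RESETPIN (1u << 0)
#define RR_DOG      (1u << 1)
#define RR_SREQ     (1u << 2)
#define RR_LOCKUP   (1u << 3)
#define RR_OFF      (1u << 16)
#define RR_LPCOMP   (1u << 17)
#define RR_DIF      (1u << 18)
#define RR_NFC      (1u << 19)

/* Write the reset causes in reas into buf as "NAME|NAME".
 * 0 decodes as "POR/BOR"; bits not listed here decode as "UNKNOWN". */
static void decode_resetreas(uint32_t reas, char *buf, size_t len)
{
    static const struct { uint32_t mask; const char *name; } causes[] = {
        { RR_RESETPIN, "RESETPIN" }, { RR_DOG, "DOG" },
        { RR_SREQ, "SREQ" },         { RR_LOCKUP, "LOCKUP" },
        { RR_OFF, "OFF" },           { RR_LPCOMP, "LPCOMP" },
        { RR_DIF, "DIF" },           { RR_NFC, "NFC" },
    };

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    if (reas == 0) {
        snprintf(buf, len, "%s", "POR/BOR");
        return;
    }
    for (size_t i = 0; i < sizeof causes / sizeof causes[0]; i++) {
        if (reas & causes[i].mask) {
            size_t used = strlen(buf);
            snprintf(buf + used, len - used, "%s%s", used ? "|" : "", causes[i].name);
        }
    }
    if (buf[0] == '\0') {
        snprintf(buf, len, "%s", "UNKNOWN");
    }
}
```

On target, this could be called early in `main()` with `NRF_POWER->RESETREAS`, before the scheduler starts. Writing the value back (`NRF_POWER->RESETREAS = NRF_POWER->RESETREAS;`) clears the latched flags, so the next boot reports only its own cause. Comparing the output after a "good" power-up with the output after a debugger restart in the failed state shows which reset path is involved.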

One interesting experiment that I've tried is: 

  • When the device has failed, SysTickHandler runs only once after a debugger reset. BUT if I disconnect the external I2C device, SysTickHandler is never called after a reset.
  • Plugging the I2C device back in results in SysTickHandler being called exactly once after reset. Digging into the timing: if there is no I2C device attached, SysTickHandler is called just slightly before returning from our doomed external I2C write operation. If there is a device attached, SysTickHandler is never called.
    • However, after a power cycle, SysTickHandler is called many times before we ever attempt the external I2C write operation. This makes sense since the problematic thread is the last to be initialized.
  • Digging into the twi_xfer operation, the returned error code is NACK when no device is attached and SUCCESS with a device attached, which is expected. This holds both after a clean restart and after a restart in the error state.
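Since the failure point tracks the number of external transfers, one low-cost way to see what the tick and the RTC were doing around the last transfer is a small ring buffer that the external-sensor thread fills after every write and that can be read in the debugger after the hang. This is a hypothetical sketch: xfer_log_t, xfer_log_add, xfer_log_last and rtc_elapsed are illustrative names, and the target-side usage in the comment assumes nRF5 SDK 17 with nrf_drv_twi_tx on RTC1; substitute the transfer call and RTC the project actually uses. The elapsed-time helper masks to 24 bits because the nRF52 RTC COUNTER register is 24 bits wide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RTC_COUNTER_MASK 0x00FFFFFFu /* nRF52 RTC COUNTER is 24 bits */
#define XFER_LOG_LEN     16u         /* number of most recent transfers kept */

typedef struct {
    uint32_t seq;        /* transfer number since boot */
    uint32_t os_tick;    /* xTaskGetTickCount() after the transfer returned */
    uint32_t rtc_before; /* raw RTC COUNTER just before the transfer */
    uint32_t rtc_after;  /* raw RTC COUNTER just after the transfer */
    int32_t  result;     /* driver return code */
} xfer_record_t;

typedef struct {
    xfer_record_t rec[XFER_LOG_LEN];
    uint32_t      count; /* total transfers logged; also the next seq */
} xfer_log_t;

/* RTC ticks between two COUNTER reads, correct across the 24-bit wrap. */
static uint32_t rtc_elapsed(uint32_t before, uint32_t after)
{
    return (after - before) & RTC_COUNTER_MASK;
}

/* Append one record, overwriting the oldest once the buffer is full. */
static void xfer_log_add(xfer_log_t *log, uint32_t os_tick,
                         uint32_t rtc_before, uint32_t rtc_after, int32_t result)
{
    assert(log != NULL);
    xfer_record_t *r = &log->rec[log->count % XFER_LOG_LEN];
    r->seq        = log->count++;
    r->os_tick    = os_tick;
    r->rtc_before = rtc_before;
    r->rtc_after  = rtc_after;
    r->result     = result;
}

/* Most recent record, or NULL if nothing has been logged yet. */
static const xfer_record_t *xfer_log_last(const xfer_log_t *log)
{
    assert(log != NULL);
    if (log->count == 0) {
        return NULL;
    }
    return &log->rec[(log->count - 1u) % XFER_LOG_LEN];
}

/* Target-side use (assumes nRF5 SDK 17 + FreeRTOS; m_twi, addr, data and len
 * are placeholders for the thread's own instance and buffer):
 *
 *   static xfer_log_t g_xfer_log;   // inspect in the debugger after the hang
 *   uint32_t before = NRF_RTC1->COUNTER;
 *   ret_code_t err  = nrf_drv_twi_tx(&m_twi, addr, data, len, false);
 *   xfer_log_add(&g_xfer_log, xTaskGetTickCount(), before,
 *                NRF_RTC1->COUNTER, (int32_t)err);
 */
```

After the hang, comparing the last record's `rtc_after` with the live COUNTER value in the debugger would show whether the RTC itself stopped (COUNTER unchanged) or kept counting while `xTaskGetTickCount()` stayed at the logged `os_tick`. `rtc_elapsed(rtc_before, rtc_after)` also shows how long the final transfer took in RTC ticks.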

I have done a lot of poking around, but I am not an expert in FreeRTOS, so I’m at a bit of a loss as to where things are going wrong. Overall, it seems like RTC1, which the RTOS relies on, stops running at some point, or its IRQ is somehow no longer being serviced. The system is not locking up or crashing, since the idle hook keeps being called and the thread that is on that hook is still working. But any thread that relies on the suspend_rtos_task() mechanism is not functional.
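To tell those two possibilities apart (RTC1 no longer counting vs. the RTC interrupt no longer being serviced), one option is to extend the idle-hook check beyond NVIC_GetEnableIRQ. The sketch below is hypothetical: tick_snapshot_t, diagnose_tick, capture_tick_snapshot and TICK_RTC are illustrative names. It assumes the port drives the tick from the RTC TICK event on RTC1 (check vPortSetupTimerInterrupt in the SDK's FreeRTOS port to confirm), and the target-only part assumes the nRF5 SDK MDK/CMSIS headers. The comparison logic itself has no hardware dependencies, so it builds and can be tested on a host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* State that has to be healthy for the RTC-driven tick to fire
 * (hypothetical helper type, not part of the SDK or FreeRTOS). */
typedef struct {
    uint32_t counter;       /* RTC COUNTER (24-bit) */
    bool     lfclk_running; /* CLOCK LFCLKSTAT.STATE: RTC only counts on LFCLK */
    bool     tick_int_en;   /* RTC INTENSET.TICK */
    bool     irq_enabled;   /* NVIC enable bit for the RTC IRQ */
    bool     irq_pending;   /* NVIC pending bit for the RTC IRQ */
    bool     irq_masked;    /* PRIMASK or BASEPRI non-zero where sampled */
} tick_snapshot_t;

typedef enum {
    TICK_OK,
    TICK_LFCLK_STOPPED,     /* RTC has no clock, so no TICK events */
    TICK_COUNTER_FROZEN,    /* LFCLK on, but the RTC is not counting */
    TICK_INT_DISABLED,      /* RTC counts, but the TICK interrupt is off */
    TICK_IRQ_DISABLED,      /* NVIC line disabled */
    TICK_IRQ_MASKED,        /* IRQ pending but masked by PRIMASK/BASEPRI */
    TICK_IRQ_STUCK_PENDING, /* pending in both samples while not masked */
} tick_diag_t;

/* Compare two snapshots taken far enough apart that the RTC COUNTER must
 * have advanced at least once in between. */
static tick_diag_t diagnose_tick(const tick_snapshot_t *before,
                                 const tick_snapshot_t *after)
{
    assert(before != NULL && after != NULL);
    if (!after->lfclk_running)                     return TICK_LFCLK_STOPPED;
    if (after->counter == before->counter)         return TICK_COUNTER_FROZEN;
    if (!after->tick_int_en)                       return TICK_INT_DISABLED;
    if (!after->irq_enabled)                       return TICK_IRQ_DISABLED;
    if (after->irq_pending && after->irq_masked)   return TICK_IRQ_MASKED;
    if (before->irq_pending && after->irq_pending) return TICK_IRQ_STUCK_PENDING;
    return TICK_OK;
}

#ifdef NRF52833_XXAA
#include "nrf.h"

/* Assumption: RTC used by the port. Change to NRF_RTC2/RTC2_IRQn if moved. */
#define TICK_RTC      NRF_RTC1
#define TICK_RTC_IRQn RTC1_IRQn

static void capture_tick_snapshot(tick_snapshot_t *s)
{
    s->counter       = TICK_RTC->COUNTER;
    s->lfclk_running = (NRF_CLOCK->LFCLKSTAT & CLOCK_LFCLKSTAT_STATE_Msk) != 0;
    s->tick_int_en   = (TICK_RTC->INTENSET & RTC_INTENSET_TICK_Msk) != 0;
    s->irq_enabled   = NVIC_GetEnableIRQ(TICK_RTC_IRQn) != 0;
    s->irq_pending   = NVIC_GetPendingIRQ(TICK_RTC_IRQn) != 0;
    s->irq_masked    = (__get_PRIMASK() != 0) || (__get_BASEPRI() != 0);
}
/* Possible use from vApplicationIdleHook(): capture a snapshot every N-th
 * call (N large enough that the RTC must have advanced), pass the previous
 * and current snapshots to diagnose_tick(), and log or set a breakpoint on
 * anything other than TICK_OK. */
#endif
```

The checks are ordered from the clock source inward (LFCLK, then RTC counting, then the RTC interrupt enable, then the NVIC), so the first failing stage is the one reported. A result of `TICK_COUNTER_FROZEN` or `TICK_LFCLK_STOPPED` would point at the RTC or its clock; `TICK_IRQ_MASKED` or `TICK_IRQ_STUCK_PENDING` would point at interrupt delivery instead.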
