FreeRTOS RTC Lockup with TWI

Hello,

I am having an issue with FreeRTOS (v10.0.0) on the NRF52833 (SDK 17.1.0). The core of the issue is that xPortSysTickHandler stops being called when I am also making TWI writes as a TWI master. It is possible that the issue was already occurring without the TWI writes, just more sporadically. Some more information:

  • If I configure FreeRTOS to use FREERTOS_USE_SYSTICK instead of the RTC then I don’t have an issue, but for power consumption reasons I would like to resolve this so that I can use the RTC
  • I have Tickless Idle turned off (0)
  • In normal operation (fresh power-up) I will successfully operate for ~100-120 seconds before the SysTickHandler stops being called. 
  • Once this issue has occurred, if I restart the processor with the debugger, it will only ever call SysTickHandler once, and then never again. A power reset is necessary to get it working again for a bit longer. This seems very interesting and telling since there seems to be some configuration bit somewhere that is resulting in our failure, and that survives a device restart but not a power cycle. 
  • I have a sensor thread that uses TWI0 to talk to onboard sensors. This thread seems to have no issues when I run it
  • I have external sensors on a thread using TWI1. If I disconnect the external slave device then I never seem to have a crash (over a reasonable span of ~20 minutes). If I attach the external slave device (which receives data successfully) then after ~105 transmissions (sent every second) the system will cease to call xPortSysTickHandler. If I change the frequency of comms in the thread then the number of transmissions before failure varies.
  • I have tried a few different slave devices. I'm using an external Arduino as a logger to serial, and have tried a few arduino boards and I2C Slave libraries
  • If I swap TWI0 & 1 between the threads the issue persists. I suspect that there is some marginal issue with the I2C slave on the external board, but lots of packet scanning has shown no obvious failure mode in the packets themselves. Also, I think that this issue may have happened previously but less quickly/reliably without this external comms thread running
  • In the crash state, vApplicationIdleHook keeps being called, and the vApplicationMallocFailedHook and vApplicationStackOverflowHook never get called. The SystemTickHandler simply stops being called
  • Doing a check for NVIC_GetEnableIRQ(portNRF_RTC_IRQn) in the vApplicationIdleHook shows that the IRQ is still enabled. The same call returns false when using SysTick so that indicates the IRQ hasn’t been deactivated
  • I do NOT have a WDT enabled, so I do not think the solution in this thread applies. I have also inspected that bit of code in the debugger and seen no issues with how it is executing: https://devzone.nordicsemi.com/f/nordic-q-a/63998/freertos-wdt-sdk17-problem/369131
  • I do not see any obvious write errors in the TWI module right before things crash/go down
  • I have tried moving the RTOS to using RTC2 but that doesn't change the behavior at all- same results
  • Turning the thread that uses the other TWI module off does not change anything

One interesting experiment that I've tried is: 

  • When the device has failed it then only runs SysTickHandler once on a debugger reset. BUT if I disconnect the external I2C device, then SysTickHandler is never called on a reset
  • Plugging the I2C device back in results in SysTickHandler being called exactly once on reset. Digging into the timing, if there is no I2C device attached then SysTickHandler is called just slightly before returning from our doomed I2C external write operation. If there is a device attached then SysTickHandler is never called
    • However if I do a power cycle then SysTickHandler is called many times before we ever attempt our external I2C write operation. This makes sense since the problematic thread is last to be initialized
  • Digging into the twi_xfer operation, the error code return is NACK when no device attached, and SUCCESS with a device attached, which is expected. This result is true for a clean restart, and an error restart

I have done a lot of poking around but I am not an expert in using FreeRTOS, and so I’m at a bit of a loss as to where we are going down. Overall it seems like RTC1 that the RTOS is relying on is ceasing to run at some point, or the IRQ is not being called somehow. The system is not locking up or crashing since the idle hook keeps being called, and the thread that is on that hook is still working. But any thread that relies on the suspend_rtos_task() mechanism is not functional
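
A minimal sketch of how the two cases (RTC counter stopped versus IRQ not firing) could be told apart from the idle hook. The type and function names here are hypothetical helpers, not part of the SDK or FreeRTOS. The idea is to feed successive readings of the RTC counter (for example from nrf_rtc_counter_get(portNRF_RTC_REG)) into a small detector and flag a stall once the value has stayed the same for a chosen number of polls. The threshold is an assumption and would need to cover well over one RTC tick's worth of idle-hook iterations.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: tracks whether a free-running counter has stopped. */
typedef struct {
    uint32_t last_count;      /* most recent distinct counter value seen */
    uint32_t unchanged_polls; /* consecutive polls with no change        */
} rtc_stall_detector_t;

/* Returns true once the counter has been unchanged for `threshold`
 * consecutive polls. Any change in the counter resets the detector. */
static bool rtc_stall_detector_poll(rtc_stall_detector_t *d,
                                    uint32_t count,
                                    uint32_t threshold)
{
    if (count != d->last_count) {
        d->last_count = count;
        d->unchanged_polls = 0;
        return false;
    }
    if (d->unchanged_polls < threshold) {
        d->unchanged_polls++; /* saturate so the count cannot wrap */
    }
    return d->unchanged_polls >= threshold;
}
```

If the detector fires while NVIC_GetEnableIRQ(portNRF_RTC_IRQn) still reports the IRQ as enabled, that points at the counter (and so its clock source) rather than at interrupt masking.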

  • To add some additional information, if I call  nrf_rtc_counter_get(portNRF_RTC_REG); from within the applicationIdleHook I find that it stops incrementing (stops at ~120,000) when the SysTick stops running. If I then reset with the debugger it gets to 1 and then returns the same thing over and over, never increasing 

    Perhaps it is related to this issue?

  • Hi 

    First of all, thanks for writing down details on the behavior of the system when issue happens. You have tried a lot of things that rule out many of my first thoughts. I do not think this is related to Tickless or even TWI driver.

    In the crash state, vApplicationIdleHook keeps being called, and the vApplicationMallocFailedHook and vApplicationStackOverflowHook never get called. The SystemTickHandler simply stops being called

    functional_eng said:
    To add some additional information, if I call  nrf_rtc_counter_get(portNRF_RTC_REG); from within the applicationIdleHook I find that it stops incrementing (stops at ~120,000) when the SysTick stops running. If I then reset with the debugger it gets to 1 and then returns the same thing over and over, never increasing 

    These two pieces of your experiment seem interesting. The RTC counter should never stop, irrespective of whether the system is asleep or awake. Only the RTC interrupt is disabled in sleep mode. There are three things I would like you to try.

    1. Insert logs in every thread in your application so that we know when each thread resumes. Use RTT instead of UART as the log backend.
    2. Dump all of the RTC1 registers so that we can see whether any configuration is being changed from another context.
    3. If all the RTC1 registers are intact, try manually writing NRF_RTC1->TASKS_START to see if that is enough to kickstart the RTC. If that does not work, then it is more than just an accidental NRF_RTC1->TASKS_STOP, which is what I thought was somehow being triggered.
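
Steps 2 and 3 could be sketched roughly as below. This is only an illustration: the snapshot struct, the flag names and rtc_cfg_diff() are hypothetical, and the choice of registers to compare is an assumption. On target, each field would be filled from the matching NRF_RTC1 register once while the system is healthy and again after the tick has stopped.

```c
#include <stdint.h>

/* Hypothetical snapshot of the RTC configuration registers of interest.
 * On target, fill each field from the register named in the comment. */
typedef struct {
    uint32_t intenset;  /* NRF_RTC1->INTENSET  */
    uint32_t evten;     /* NRF_RTC1->EVTEN     */
    uint32_t prescaler; /* NRF_RTC1->PRESCALER */
} rtc_cfg_snapshot_t;

/* Hypothetical flags reporting which register differs. */
enum {
    RTC_DIFF_INTENSET  = 1u << 0,
    RTC_DIFF_EVTEN     = 1u << 1,
    RTC_DIFF_PRESCALER = 1u << 2
};

/* Compare a known-good snapshot with the current one and return a
 * bitmask of the registers that changed (0 means all intact). */
static uint32_t rtc_cfg_diff(const rtc_cfg_snapshot_t *good,
                             const rtc_cfg_snapshot_t *now)
{
    uint32_t diff = 0;
    if (good->intenset != now->intenset)   diff |= RTC_DIFF_INTENSET;
    if (good->evten != now->evten)         diff |= RTC_DIFF_EVTEN;
    if (good->prescaler != now->prescaler) diff |= RTC_DIFF_PRESCALER;
    return diff;
}

/* On target, if rtc_cfg_diff() returns 0, step 3 would be:
 *     NRF_RTC1->TASKS_START = 1;
 * followed by checking whether NRF_RTC1->COUNTER starts moving again. */
```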

    If possible, create a minimalistic project for me to reproduce the issue at my desk, so that I can attempt to debug this.

  • Thank you for the quick response Susheel,

    Good news: I resolved the issue. The bad news is that it was a silly one. Apparently we had updated the LF_SRC in only 2 of 3 places in the sdk_config, so CLOCK_CONFIG_LF_SRC was set to 1, not 0. This seems to have resulted in us occasionally/eventually switching to the external clock, which is not there; its pins are instead shared with the TWI lines. Thus all of the bizarre madness in the behavior. Making that config change seems to have resolved the problem.

    I have no idea how we updated only 2 of the 3 config values... 
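
A compile-time guard along these lines might catch this kind of mismatch early. It is only a sketch: CLOCK_CONFIG_LF_SRC is the setting named above, while NRFX_CLOCK_CONFIG_LF_SRC and NRF_SDH_CLOCK_LF_SRC are assumed here to be the other two places the LF source is set in sdk_config.h. The fallback defines exist only so the fragment compiles on its own.

```c
/* Stand-in values so the sketch compiles by itself. In a real project
 * these come from sdk_config.h and the fallbacks would be removed. */
#ifndef CLOCK_CONFIG_LF_SRC
#define CLOCK_CONFIG_LF_SRC 0
#endif
#ifndef NRFX_CLOCK_CONFIG_LF_SRC      /* assumed name of the second setting */
#define NRFX_CLOCK_CONFIG_LF_SRC 0
#endif
#ifndef NRF_SDH_CLOCK_LF_SRC          /* assumed name of the third setting */
#define NRF_SDH_CLOCK_LF_SRC 0
#endif

/* True only when all three LF clock source settings hold the same value. */
#define LF_SRC_SETTINGS_AGREE \
    ((CLOCK_CONFIG_LF_SRC == NRFX_CLOCK_CONFIG_LF_SRC) && \
     (CLOCK_CONFIG_LF_SRC == NRF_SDH_CLOCK_LF_SRC))

/* Fail the build instead of failing on the bench after ~100 seconds. */
#if !LF_SRC_SETTINGS_AGREE
#error "LF clock source settings in sdk_config.h disagree"
#endif
```

Placing the check in a file that includes sdk_config.h would turn a partial update into a build error.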
