Hello,
I am experiencing trouble with nRF9160 chips that typically get stuck after 1 to 2 days of operation. This happens on all 6 boards that I have available. I'll briefly describe the software architecture:
- Thread 1 is communicating with an external microcontroller (mainly retrieving data every 15 minutes)
- Thread 2 is uploading this data via an LTE connection (to be more specific, MQTT-SN to the Thingstream portal).
- Thread 3 is driving an e-ink display, implemented as a message queue that receives events from threads 1 and 2.
The observation is that the nRF9160 just stops doing anything. I have put all 3 threads under watchdog control, and verified that the watchdog works correctly when a thread is indeed stuck. I am also outputting logs via the console from all 3 threads. As far as I can tell, all threads stop at the same time. For testing, I have also added a toggling GPIO pin in thread 3, and even that toggling comes to a halt. There is barely any synchronization between the 3 threads (i.e. mutexes, semaphores), and I can't imagine that threads would get stuck on that. Even if one did get stuck, that would never apply to all 3 threads at once, and I would also expect the watchdog to kick in eventually. I have also increased the stack sizes and verified the free heap memory; all seems OK from a firmware point of view.
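For reference, the kind of watchdog supervision described above can be sketched in plain C as follows. This is only a minimal illustration under assumptions: it uses a check-in bitmask, and the names `supervisor_checkin`, `supervisor_should_feed` and `NUM_SUPERVISED_THREADS` are hypothetical. The actual firmware may instead use one hardware watchdog channel per thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: three supervised threads, matching the description above. */
#define NUM_SUPERVISED_THREADS 3u
#define ALL_ALIVE_MASK ((1u << NUM_SUPERVISED_THREADS) - 1u)

/* Bitmask of threads that have checked in since the last watchdog feed. */
static volatile uint32_t alive_mask;

/*
 * Called by each supervised thread (id 0..2) once per loop iteration.
 * Note: this read-modify-write is not atomic; on a real target it would
 * need an atomic OR or a short critical section.
 */
void supervisor_checkin(unsigned int id)
{
    assert(id < NUM_SUPERVISED_THREADS);
    alive_mask |= (1u << id);
}

/*
 * Called periodically by the supervisor. Returns true only when every
 * thread has checked in since the last feed, then clears the mask.
 * The caller feeds the hardware watchdog only when this returns true,
 * so a single stuck thread is enough to let the watchdog expire.
 */
bool supervisor_should_feed(void)
{
    if (alive_mask == ALL_ALIVE_MASK) {
        alive_mask = 0u;
        return true;
    }
    return false;
}
```

With a scheme like this, one stuck thread stops the feeding and the watchdog should reset the chip, which is why a complete halt without any watchdog reset is so unexpected.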
I measured the reset pin and the supply voltage: both are as expected. So my suspicion is that the nRF9160 is brought into reset and does not come out of it. What could be the root cause for this?
It is worth mentioning that the nRF9160 reset pin was pulled high to 3.3 V through a 10 kΩ resistor, which was a schematic fault, and I have since removed the pull-ups. I have read other threads on DevZone that clearly indicate that this is problematic, but I am not sure whether this mistake could really have caused damage leading to the observed behavior.
I have not yet tried disabling threads or increasing the workload of the system to see whether either influences the occurrence of the issue. That's probably my next step for the weekend.
Any suggestions on what we could measure to verify whether the nRF9160 is in reset would be very welcome.