We have several hundred devices deployed in the field and see the typical DNS/cellular connectivity issues during the day. We generally solve these with delayed connection retries and, as a last resort, an automatic soft reboot using sys_reboot(SYS_REBOOT_COLD), which typically resolves the 'outage'.
However, in some instances no amount of programmed rebooting works, and only a battery disconnect+reconnect instantly solves the connectivity issue. Is there some internal state in the modem firmware that is maintained across a soft reboot? Does sys_reboot actually reboot the modem firmware? Or is some cell tower association/mapping keeping old state that only a full power cycle clears? Is there a way to fully reset the nRF9160 and reboot the modem firmware as though a full power cycle occurred, or do we need to design a full de-energizing circuit for our motherboard?
I have forwarded your question to our modem team.
What modem firmware version are you using?
How do you turn off the modem before the reset?
If you turn it off with AT+CFUN=0, it will store the current network parameters to flash, to speed up future connection attempts.
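To make the order of operations concrete, here is a minimal sketch: send AT+CFUN=0 first, then reboot. The reboot_hooks struct, graceful_reboot() and the fake_* helpers are hypothetical names for illustration, not SDK APIs. On the nRF9160 the send_at hook would wrap whatever AT-command call your SDK version provides, and reboot would wrap sys_reboot(SYS_REBOOT_COLD). Whether this changes the behaviour reported in this thread is not established.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hooks so the sequence can be exercised on a host as well as
 * on target. Neither member is an SDK API. */
struct reboot_hooks {
    int  (*send_at)(const char *cmd); /* returns 0 when the modem answers OK */
    void (*reboot)(void);             /* on target: sys_reboot(SYS_REBOOT_COLD) */
};

/* Put the modem into minimum-functionality mode, then reboot.
 * The reboot happens even if the AT command fails. The return value is only
 * reachable where reboot() returns, i.e. in host-side tests. */
int graceful_reboot(const struct reboot_hooks *hooks)
{
    int err = hooks->send_at("AT+CFUN=0");

    hooks->reboot();
    return err;
}

/* Host-side fakes that record what was called. */
static char fake_last_cmd[32];
static bool fake_rebooted;

static int fake_send_at(const char *cmd)
{
    strncpy(fake_last_cmd, cmd, sizeof(fake_last_cmd) - 1);
    fake_last_cmd[sizeof(fake_last_cmd) - 1] = '\0';
    return 0;
}

static void fake_reboot(void)
{
    fake_rebooted = true;
}

static const struct reboot_hooks fake_hooks = { fake_send_at, fake_reboot };
```

Passing the hooks in keeps the sequence testable off-target; on the device you would supply a second reboot_hooks instance that calls into the modem and the kernel.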
I'm using modem firmware 1.2.2. I'm not doing anything specific to turn off the modem. Is there an AT+ command to reboot it? Does sys_reboot reboot the modem, or does it only reboot the firmware running on the application side of the CPU?
sys_reboot should do a soft reset of the modem, as well as of the application.
The modem team would like a modem trace to investigate what happens both during your 'DNS/cellular connectivity issues', and during the reset.
Could you capture a modem trace for them to analyze?
These devices are deployed in public spaces in the field, so modem trace capture isn't possible. However, we have advanced flash storage logging, so when the device does finally connect to the cloud we get an upload of the application-level firmware activity.
For a given device with signal strength around -80 dBm, this is what we see:
- The socket connect underlying mqtt_connect() blocks for an unacceptably long time (we can't have the CPU in a non-PSM power state for too long, otherwise the battery level drops off), so after an application timeout period such as 120 seconds we sys_reboot the device.
- We then sleep for 60m in case it is a cellular network or cloud outage.
- We try connecting again: LTE network acquisition succeeds, but mqtt_connect() to the cloud backend blocks on the socket connect again, so we sys_reboot.
- We're then in a loop of socket connect blocks and reboots.
- Finally someone goes to the device and powers it off and on from the enclosure exterior with a magnet and hall sensor switch, and lo and behold the device connects instantly to the network and the socket connect succeeds to the cloud backend. The device works normally for some period again (say for 10 hours at 10 connections per hour), then ends up socket connect blocking again.
So my question is: what does the power cycling reset that clears the indefinitely blocking socket connect attempts? Dozens of other devices are in the same neighborhood, so area cellular connectivity and the cloud backend aren't a factor, though connectivity to a specific tower might be.
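The loop described above (120-second connect timeout, sys_reboot, 60-minute sleep, retry) can at least be bounded by counting consecutive failures and escalating once soft reboots stop helping. This is a sketch only: MAX_SOFT_REBOOTS, the enum and the function names are made up for illustration, the counter has to live somewhere that survives sys_reboot, and RECOVERY_HARD_POWER_CYCLE assumes the board has some way to cut its own power, which is exactly the open hardware question in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define CONNECT_TIMEOUT_S  120        /* application timeout from the thread */
#define OUTAGE_SLEEP_S     (60 * 60)  /* 60-minute sleep from the thread */
#define MAX_SOFT_REBOOTS   3          /* assumed threshold, tune per deployment */

enum recovery_action {
    RECOVERY_SOFT_REBOOT,       /* sys_reboot, then sleep OUTAGE_SLEEP_S */
    RECOVERY_HARD_POWER_CYCLE,  /* hypothetical: needs hardware support */
};

/* Update the persisted counter after a connect attempt.
 * A successful connect clears it; a timeout increments it. */
uint32_t update_failure_count(uint32_t consecutive_failures, bool connected)
{
    return connected ? 0 : consecutive_failures + 1;
}

/* Decide what to do after a connect timeout.
 * consecutive_failures includes the timeout that was just observed. */
enum recovery_action next_recovery_action(uint32_t consecutive_failures)
{
    if (consecutive_failures <= MAX_SOFT_REBOOTS) {
        return RECOVERY_SOFT_REBOOT;
    }
    return RECOVERY_HARD_POWER_CYCLE;
}
```

Even without the hardware to act on it, logging the moment the policy would have escalated gives a timestamp for when soft reboots stopped being effective.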
What would really be handy is a debug stream of, say, the last 20 events from modem AT+ calls. Otherwise, diagnosing the modem on deployed devices simply isn't possible.
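Absent such a feature in the modem, something similar can be approximated at the application level: keep a small ring buffer of the last 20 AT commands the application itself issues, together with their responses, and upload it with the existing flash log. A minimal sketch follows; at_log, at_log_record() and the other names are hypothetical, and it only sees commands the application sends, not the modem's internal events.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define AT_LOG_DEPTH 20  /* "last 20 events" from the thread */
#define AT_LOG_TEXT  64  /* assumed max length of one "cmd -> response" line */

struct at_log_entry {
    uint32_t uptime_s;          /* when the command was issued */
    char     text[AT_LOG_TEXT]; /* truncated if longer than the buffer */
};

struct at_log {
    struct at_log_entry entries[AT_LOG_DEPTH];
    uint32_t head; /* slot the next record is written to */
    uint32_t size; /* number of valid entries, at most AT_LOG_DEPTH */
};

/* Record one command/response pair, overwriting the oldest entry when full. */
void at_log_record(struct at_log *log, uint32_t uptime_s,
                   const char *cmd, const char *resp)
{
    struct at_log_entry *e = &log->entries[log->head];

    e->uptime_s = uptime_s;
    snprintf(e->text, sizeof(e->text), "%s -> %s", cmd, resp);

    log->head = (log->head + 1) % AT_LOG_DEPTH;
    if (log->size < AT_LOG_DEPTH) {
        log->size++;
    }
}

/* Number of entries currently stored. */
uint32_t at_log_size(const struct at_log *log)
{
    return log->size;
}

/* Get entry i, where 0 is the oldest retained entry.
 * Returns NULL when i is out of range. */
const struct at_log_entry *at_log_get(const struct at_log *log, uint32_t i)
{
    if (i >= log->size) {
        return NULL;
    }
    uint32_t start = (log->head + AT_LOG_DEPTH - log->size) % AT_LOG_DEPTH;
    return &log->entries[(start + i) % AT_LOG_DEPTH];
}
```

A zero-initialized struct at_log is an empty log. Recording status queries such as AT+CEREG? just before each connect attempt would then show the registration state the application saw leading up to a blocked socket connect.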