Environment
- nRF9151 (Thingy:91X), modem firmware mfw_nrf91x1_2.0.4
- nRF Connect SDK v3.3.0 (Zephyr 4.3.99)
- LTE-M + GNSS: CONFIG_LTE_NETWORK_MODE_LTE_M_GPS=y
- PSM requested (RPTAU 1800s, RAT 20s), no eDRX, no RAI, default GNSS use case, prio_mode not used
What the application does
Periodically I take the modem LTE-dark while keeping GNSS running (geofence-triggered "radio silent" state): I close the MQTT/TLS socket, then call lte_lc_func_mode_set(LTE_LC_FUNC_MODE_DEACTIVATE_LTE). Later the application reactivates LTE.
Problem
Intermittently, lte_lc_func_mode_set(DEACTIVATE_LTE) never returns — execution blocks inside the call, and ~58s later the watchdog resets the device. It succeeds most of the time. The hang correlates with live network conditions (areas with registration churn) and does not reliably reproduce stationary on a stable cell.
Software-level deadlock ruled out
I want to lead with what I've eliminated, since it's the obvious first suspect:
- lte_lc_func_mode_set(DEACTIVATE_LTE) is called from my application main thread (a supervisor poll loop), not from within the lte_lc event handler, and not from any nrf_modem_gnss callback or other modem-library thread context.
- My lte_lc event handler (registered via lte_lc_connect_async) does no blocking work: it only sets atomics and posts a semaphore on NW_REG_STATUS/RRC_UPDATE. It never calls back into the modem mode-set API.
- Geofence crossing is detected in the GNSS callback but only sets atomic flags; the actual deactivate is polled and executed on the main thread, fully decoupled from any callback context.
So the modem RX thread is free to deliver the AT+CFUN=20 response — a classic "blocking call from the RX/callback context deadlocks the modem library" scenario does not apply here structurally.
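For reference, the calling structure described above can be sketched roughly as follows. This is a portable C11 stand-in, not the application code: modem_deactivate_lte_stub, on_geofence_crossed, and supervisor_poll_once are hypothetical names, and the stub takes the place of the real lte_lc_func_mode_set(LTE_LC_FUNC_MODE_DEACTIVATE_LTE) call.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for lte_lc_func_mode_set(LTE_LC_FUNC_MODE_DEACTIVATE_LTE).
 * The real call blocks until the modem answers the AT command. */
static int deactivate_calls;

static int modem_deactivate_lte_stub(void)
{
    deactivate_calls++;
    return 0;
}

/* Set from callback context, consumed only on the main thread. */
static atomic_bool radio_silent_requested;

/* Callback context (the GNSS handler in the real application):
 * records the request and returns, with no modem calls. */
void on_geofence_crossed(void)
{
    atomic_store(&radio_silent_requested, true);
}

/* One pass of the supervisor poll loop on the main thread.
 * Returns true if a deactivate was issued on this pass. */
bool supervisor_poll_once(void)
{
    /* atomic_exchange clears the flag and reports whether it was set. */
    if (!atomic_exchange(&radio_silent_requested, false)) {
        return false;
    }
    (void)modem_deactivate_lte_stub(); /* blocking call, main thread only */
    return true;
}
```

The point of the sketch is only that the blocking call has a single call site, reached from the poll loop, so no callback or modem-library thread is ever waiting on it.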
Log around the failure (device-uptime timestamps in brackets):
[00:14:02.654] telemetry: MQTT/TLS closed before radio-off
[00:14:02.654] app: rrc_idle=1 quiet_ms=110847, deactivating LTE (GNSS stays on)
[00:14:02.6xx] app: about to call lte_lc_func_mode_set(DEACTIVATE_LTE) (main thread)
+CEREG: 1,"1405","09EE3D0F",7,,,"00001010","11100000"
*** Booting nRF Connect SDK v3.3.0 *** <-- ~58s later, watchdog reset
The landmark that should print immediately after lte_lc_func_mode_set returns never appears — the next log line is the reboot banner. RRC was idle and the radio had been quiet for 110+ seconds before the call, so this is not residual data activity.
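To make the last two fields of the +CEREG line easier to read, here is a small decoding sketch. It assumes the fields are Active-Time and Periodic-TAU-ext encoded as 3GPP TS 24.008 GPRS Timer 2 and GPRS Timer 3 bit strings; the function names and the TIMER_* return codes are my own and not part of any Nordic API.

```c
#include <string.h>

#define TIMER_DEACTIVATED (-1L)
#define TIMER_INVALID     (-2L)

/* Parse an 8-character binary string such as "00001010" into 0..255, or -1. */
static int parse_bits8(const char *s)
{
    if (s == NULL || strlen(s) != 8) {
        return -1;
    }
    int v = 0;
    for (int i = 0; i < 8; i++) {
        if (s[i] != '0' && s[i] != '1') {
            return -1;
        }
        v = (v << 1) | (s[i] - '0');
    }
    return v;
}

/* Active-Time (T3324), GPRS Timer 2: top 3 bits = unit, low 5 bits = count. */
long decode_active_time_s(const char *bits)
{
    int v = parse_bits8(bits);
    if (v < 0) {
        return TIMER_INVALID;
    }
    long count = v & 0x1F;
    switch (v >> 5) {
    case 0:  return count * 2;      /* multiples of 2 seconds */
    case 1:  return count * 60;     /* multiples of 1 minute */
    case 2:  return count * 360;    /* multiples of 6 minutes (decihours) */
    case 7:  return TIMER_DEACTIVATED;
    default: return count * 60;     /* other unit codes: treated as minutes */
    }
}

/* Periodic-TAU-ext (T3412 extended), GPRS Timer 3: same bit layout, different units. */
long decode_periodic_tau_ext_s(const char *bits)
{
    /* Seconds per count for unit codes 0..6: 10 min, 1 h, 10 h, 2 s, 30 s, 1 min, 320 h. */
    static const long unit_s[7] = {600, 3600, 36000, 2, 30, 60, 1152000};
    int v = parse_bits8(bits);
    if (v < 0) {
        return TIMER_INVALID;
    }
    int unit = v >> 5;
    if (unit == 7) {
        return TIMER_DEACTIVATED;
    }
    return (long)(v & 0x1F) * unit_s[unit];
}
```

Under these assumptions, decode_active_time_s("00001010") gives 20 seconds, which matches the requested RAT, while "11100000" in the Periodic-TAU-ext position carries the "timer deactivated" unit code rather than a duration.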
Current hypothesis (modem-firmware level)
With the software deadlock ruled out, my leading theory is that the block is inside the modem firmware: when AT+CFUN=20 arrives while the network is mid-(re)registration — note the +CEREG: 1 landing right at the call — the modem cannot immediately reach a quiesceable radio state and holds the AT command pending. If the network keeps re-activating the radio, the command may never complete, so the synchronous lte_lc_func_mode_set never returns. I have not confirmed this at the modem level yet (I can capture a modem trace).
Pre-empting likely questions
- Sockets: All application MQTT/TLS sockets are explicitly closed and logged before the call. ⟦VERIFY: confirm no nRF Cloud A-GNSS REST socket is still open at deactivate time; CONFIG_NRF_CLOUD_AGNSS=y is set.⟧
- Reset type: Clean MCUboot reboot ~58s after the hang (signature verify, boot slot 0), no hardfault/exception dump. This is consistent with a watchdog reset, not a modem fault or crash. ⟦VERIFY: insert your WDT timeout value and quote any "Reset cause:" line.⟧
- Calling context: Confirmed — main application thread (supervisor poll loop), not a callback/RX-thread context (detailed above).
- PSM/coexistence: LTE_M_GPS mode, PSM requested (RPTAU 1800s / RAT 20s), no eDRX, no RAI, default GNSS use case, prio_mode not used.
- GNSS: Running concurrently throughout (LTE-dark with GNSS-on is the intended state). Not yet tested with GNSS stopped.
- Reproducibility: Intermittent, tied to live network/registration churn; does not reliably reproduce stationary on a stable cell.
- Modem FW: mfw_nrf91x1_2.0.4; not yet tested on a newer modem FW.
Questions
- Can AT+CFUN=20 / lte_lc_func_mode_set(DEACTIVATE_LTE) block indefinitely at the modem-firmware level if the network is mid-registration (e.g. a +CEREG URC arriving as the command is issued)? Is this a known interaction?
- If so, what's the recommended way to issue a bounded or abortable deactivate: is there a safe timeout-and-recover pattern, or a way to force the radio to a quiescent state first (e.g. a different CFUN value, or detach sequence) before requesting DEACTIVATE_LTE?
- Is there a recommended pattern for reliably taking the modem LTE-dark (GNSS-on) specifically in poor coverage where the network keeps attempting re-registration?
- Would a newer modem firmware than 2.0.4 contain relevant fixes for CFUN-transition behavior under registration churn?
I can provide a full modem trace captured during the hang if helpful.