AT+CFUN=20 (lte_lc_func_mode_set DEACTIVATE_LTE) blocks indefinitely under registration churn — software deadlock ruled out (nRF9151, mfw 2.0.4, NCS 3.3.0)

Environment

  • nRF9151 (Thingy:91X), modem firmware mfw_nrf91x1_2.0.4
  • nRF Connect SDK v3.3.0 (Zephyr 4.3.99)
  • LTE-M + GNSS: CONFIG_LTE_NETWORK_MODE_LTE_M_GPS=y
  • PSM requested (RPTAU 1800s, RAT 20s, i.e. requested periodic TAU and requested active time), no eDRX, no RAI, default GNSS use case, prio_mode not used

What the application does
Periodically the application takes the modem LTE-dark while keeping GNSS running (a geofence-triggered "radio silent" state): it closes the MQTT/TLS socket, then calls lte_lc_func_mode_set(LTE_LC_FUNC_MODE_DEACTIVATE_LTE). Later it reactivates LTE.

Problem
Intermittently, lte_lc_func_mode_set(DEACTIVATE_LTE) never returns: execution blocks inside the call, and ~58s later the watchdog resets the device. The call succeeds most of the time. The hang correlates with live network conditions (areas with registration churn) and does not reliably reproduce when the device is stationary on a stable cell.

Software-level deadlock ruled out
I want to lead with what I've eliminated, since a software deadlock is the obvious first suspect:

  • lte_lc_func_mode_set(DEACTIVATE_LTE) is called from my application main thread (a supervisor poll loop), not from within the lte_lc event handler, and not from any nrf_modem_gnss callback or other modem-library thread context.
  • My lte_lc event handler (registered via lte_lc_connect_async) does no blocking work — it only sets atomics and posts a semaphore on NW_REG_STATUS / RRC_UPDATE. It never calls back into the modem mode-set API.
  • Geofence crossing is detected in the GNSS callback but only sets atomic flags; the actual deactivate is polled and executed on the main thread, fully decoupled from any callback context.

So the modem RX thread is free to deliver the AT+CFUN=20 response — a classic "blocking call from the RX/callback context deadlocks the modem library" scenario does not apply here structurally.
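
To make the call structure concrete, here is a host-side model of it, not the actual firmware. The names deactivate_lte_stub, on_geofence_crossing, on_rrc_update and supervisor_poll are hypothetical; the stub stands in for lte_lc_func_mode_set(LTE_LC_FUNC_MODE_DEACTIVATE_LTE), and gating on an RRC-idle flag is an assumption based on the rrc_idle=1 field in my log line.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for
 * lte_lc_func_mode_set(LTE_LC_FUNC_MODE_DEACTIVATE_LTE).
 * It only counts calls so the control flow can be checked on a host. */
static int deactivate_calls;

static int deactivate_lte_stub(void)
{
    deactivate_calls++;
    return 0;
}

/* Flags shared between callback context and the main thread. */
static atomic_bool geofence_crossed;
static atomic_bool rrc_idle;

/* Callback context (GNSS callback / lte_lc handler in the real app):
 * record the event and return. No modem API is called from here. */
static void on_geofence_crossing(void)
{
    atomic_store(&geofence_crossed, true);
}

static void on_rrc_update(bool idle)
{
    atomic_store(&rrc_idle, idle);
}

/* One step of the main-thread supervisor loop. This is the only place
 * the deactivate request is issued. Returns true if it was issued. */
static bool supervisor_poll(void)
{
    if (!atomic_load(&rrc_idle)) {
        return false; /* keep the request pending until RRC is idle */
    }
    /* Consume the request atomically so it is acted on exactly once. */
    if (!atomic_exchange(&geofence_crossed, false)) {
        return false;
    }
    return deactivate_lte_stub() == 0;
}
```

In this model a geofence crossing that arrives while RRC is connected stays pending, and the deactivate is issued once from supervisor_poll() after on_rrc_update(true). Nothing on the callback side ever waits on the modem.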

Log around the failure (device-uptime timestamps in brackets):

[00:14:02.654] telemetry: MQTT/TLS closed before radio-off
[00:14:02.654] app: rrc_idle=1 quiet_ms=110847, deactivating LTE (GNSS stays on)
[00:14:02.6xx] app: about to call lte_lc_func_mode_set(DEACTIVATE_LTE)   (main thread)
+CEREG: 1,"1405","09EE3D0F",7,,,"00001010","11100000"
*** Booting nRF Connect SDK v3.3.0 ***          <-- ~58s later, watchdog reset

The landmark that should print immediately after lte_lc_func_mode_set returns never appears — the next log line is the reboot banner. RRC was idle and the radio had been quiet for 110+ seconds before the call, so this is not residual data activity.
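
As a side check on the +CEREG line in the log, the sketch below decodes its last two quoted fields. It assumes they are the Active-Time and Periodic-TAU fields, encoded as the GPRS Timer 2 and GPRS Timer 3 bit strings of 3GPP TS 24.008; the function names and the TIMER_* sentinels are hypothetical and not part of any Nordic API.

```c
#include <stddef.h>
#include <string.h>

#define TIMER_DEACTIVATED (-1L) /* unit bits 111 */
#define TIMER_INVALID     (-2L) /* not an 8-character bit string */

/* Parse an 8-character string of '0'/'1' into an octet value. */
static int parse_octet(const char *bits, unsigned *out)
{
    unsigned v = 0;

    if (bits == NULL || strlen(bits) != 8) {
        return -1;
    }
    for (size_t i = 0; i < 8; i++) {
        if (bits[i] != '0' && bits[i] != '1') {
            return -1;
        }
        v = (v << 1) | (unsigned)(bits[i] - '0');
    }
    *out = v;
    return 0;
}

/* Upper 3 bits select the unit, lower 5 bits are the multiplier. */
static long decode_timer(const char *bits, const long unit_s[8])
{
    unsigned v;

    if (parse_octet(bits, &v) != 0) {
        return TIMER_INVALID;
    }
    if ((v >> 5) == 7) {
        return TIMER_DEACTIVATED;
    }
    return unit_s[v >> 5] * (long)(v & 0x1F);
}

/* Active-Time (GPRS Timer 2): 2 s, 1 min, 6 min; other units read as 1 min. */
long active_time_seconds(const char *bits)
{
    static const long unit_s[8] = {2, 60, 360, 60, 60, 60, 60, 0};

    return decode_timer(bits, unit_s);
}

/* Periodic-TAU (GPRS Timer 3): 10 min, 1 h, 10 h, 2 s, 30 s, 1 min, 320 h. */
long periodic_tau_seconds(const char *bits)
{
    static const long unit_s[8] = {600, 3600, 36000, 2, 30, 60, 1152000, 0};

    return decode_timer(bits, unit_s);
}
```

Under those assumptions, active_time_seconds("00001010") gives 20, which matches the requested 20s active time, and periodic_tau_seconds("11100000") gives TIMER_DEACTIVATED because unit bits 111 mean "deactivated" in that encoding. I have not checked what the modem intends by reporting that value here.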

Current hypothesis (modem-firmware level)
With the software deadlock ruled out, my leading theory is that the block is inside the modem firmware: when AT+CFUN=20 arrives while the network is mid-(re)registration — note the +CEREG: 1 landing right at the call — the modem cannot immediately reach a quiesceable radio state and holds the AT command pending. If the network keeps re-activating the radio, the command may never complete, so the synchronous lte_lc_func_mode_set never returns. I have not confirmed this at the modem level yet (I can capture a modem trace).

Pre-empting likely questions

  • Sockets: All application MQTT/TLS sockets are explicitly closed and logged before the call. Not yet verified: whether an nRF Cloud A-GNSS REST socket could still be open at deactivate time (CONFIG_NRF_CLOUD_AGNSS=y is set).
  • Reset type: Clean MCUboot reboot ~58s after the hang (signature verify, boot slot 0), no hardfault/exception dump; consistent with a watchdog reset, not a modem fault or crash. The configured WDT timeout and any "Reset cause:" line are not included here yet.
  • Calling context: Confirmed — main application thread (supervisor poll loop), not a callback/RX-thread context (detailed above).
  • PSM/coexistence: LTE_M_GPS mode, PSM requested (RPTAU 1800s / RAT 20s), no eDRX, no RAI, default GNSS use case, prio_mode not used.
  • GNSS: Running concurrently throughout (LTE-dark with GNSS-on is the intended state). Not yet tested with GNSS stopped.
  • Reproducibility: Intermittent, tied to live network/registration churn; does not reliably reproduce when stationary on a stable cell.
  • Modem FW: mfw_nrf91x1_2.0.4; not yet tested on a newer modem FW.

Questions

  1. Can AT+CFUN=20 / lte_lc_func_mode_set(DEACTIVATE_LTE) block indefinitely at the modem-firmware level if the network is mid-registration (e.g. a +CEREG URC arriving as the command is issued)? Is this a known interaction?
  2. If so, what's the recommended way to issue a bounded or abortable deactivate — is there a safe timeout-and-recover pattern, or a way to force the radio to a quiescent state first (e.g. a different CFUN value, or detach sequence) before requesting DEACTIVATE_LTE?
  3. Is there a recommended pattern for reliably taking the modem LTE-dark (GNSS-on) specifically in poor coverage where the network keeps attempting re-registration?
  4. Would a newer modem firmware than 2.0.4 contain relevant fixes for CFUN-transition behavior under registration churn?
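
To illustrate what "bounded" means in question 2, here is a host-side sketch using POSIX threads rather than Zephyr primitives. call_with_timeout, returns_quickly and never_returns are hypothetical names; the two stubs stand in for a deactivate call that completes and one that hangs. On the target I would expect the same shape with a dedicated thread or work item plus k_sem_take() with a finite timeout, but that is an assumption, not something I have running.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Context shared between the caller and the worker thread. */
struct bounded_call {
    pthread_mutex_t lock;
    pthread_cond_t done_cv;
    bool done;
    int result;
    int (*fn)(void);
};

static void *bounded_call_worker(void *arg)
{
    struct bounded_call *c = arg;
    int r = c->fn(); /* may block for an unbounded time */

    pthread_mutex_lock(&c->lock);
    c->result = r;
    c->done = true;
    pthread_cond_signal(&c->done_cv);
    pthread_mutex_unlock(&c->lock);
    return NULL;
}

/* Runs fn() on a worker thread and waits at most timeout_ms for it.
 * Returns 0 and stores fn's return value in *result on completion,
 * or -ETIMEDOUT if fn() is still blocked when the deadline passes. */
int call_with_timeout(int (*fn)(void), unsigned timeout_ms, int *result)
{
    struct bounded_call *c = malloc(sizeof(*c));
    pthread_t tid;
    struct timespec deadline;
    int rc = 0;

    if (c == NULL) {
        return -ENOMEM;
    }
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->done_cv, NULL);
    c->done = false;
    c->result = 0;
    c->fn = fn;

    if (pthread_create(&tid, NULL, bounded_call_worker, c) != 0) {
        free(c);
        return -EAGAIN;
    }

    /* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&c->lock);
    while (!c->done && rc == 0) {
        rc = pthread_cond_timedwait(&c->done_cv, &c->lock, &deadline);
    }
    bool done = c->done;
    pthread_mutex_unlock(&c->lock);

    if (!done) {
        /* The worker is still stuck inside fn() and keeps using c, so c
         * is deliberately not freed. The blocked call is NOT cancelled. */
        pthread_detach(tid);
        return -ETIMEDOUT;
    }
    pthread_join(tid, NULL);
    *result = c->result;
    pthread_mutex_destroy(&c->lock);
    pthread_cond_destroy(&c->done_cv);
    free(c);
    return 0;
}

/* Hypothetical stand-ins for the modem call, for host testing only. */
static int returns_quickly(void) { return 0; }
static int never_returns(void) { for (;;) { sleep(1); } }
```

This only bounds how long the caller waits. It does not abort the underlying request: the worker stays blocked, and on the real modem the pending AT command would presumably still be outstanding. That gap is what question 2 is about: what is safe to do after the timeout fires.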

I can provide a full modem trace captured during the hang if helpful.
