Long PSM delay problem in NB-IoT mode

While testing CoAP with NB-IoT I normally get PSM (sleep) delay = 20s before PSM mode is set (0.5mA with UART enabled), but after awhile PSM delay increases to ~4.5min (at 34mA) after each transmission has finished. I enabled a modem trace and the Wireshark capture shows me that when PSM delay = 20s (from RRC Inactivity timer=20s + Active time=0) then "RRCConnectionRelease-NB" message is logged just before +CSCON=0, but when the delay is ~4.5min then "RRCConnectionRelease-NB" message is missing and +CSCON=0 takes 4.5min instead of 20s.

I'm using the sample nRF Cloud multi-service app with SDK 2.6.1, Modem v1.3.6, connected to nRF Cloud (using a Spark NZ NB-IoT SIM). After this problem has happened then it sometimes recovers after next data transmission, provided I'm using a slow enough data rate (e.g. 10min). If I test this with a 2min data rate then doesn't seem to recover. In this failed mode the first Tx request always fails (probably the modem thinks it's still connected), and the re-transmit only works after it's logged a +CSCON=0, +CSCON=1 cycle (a forced reconnection recovery by the modem?).  I haven't seen this problem at all with a LTE-M SIM (where PSM delay = 10, from the network RRC inactivity timer).

As a work around I tried enable RAI (e.g. AT%XRAI=3, with CONFIG_COAP_SEND_CONFIRMABLE=y), and found it nearly always helped set PSM immediately, but not always (e.g. still got sometimes got RRC idle delay = 4.5min delay).  I've disabled location reporting in my build, but the same problem happened with the default config (overlay_coap.conf), and also when I tested using the low power build (overlay_min_coap.conf).

Some questions:

  1. Cellular coverage is normally reliable for me - so could the missing "RRCConnectionRelease-NB" be caused by: a network fault (failed to send it), or a comms error (not received), or an nRF9160 (modem or app) problem?  Are there any suggestions for how to avoid or work-around this problem?

  2. I'm curious where the 4.5min delay comes from - when +CSCON: 0 is set after this delay (with missing "RRCConnectionRelease-NB") - does the modem implement this timeout? (e.g. to ensure PSM gets set, and avoid wasting ~34mA forever)

  3. When the problem has happened - does the modem or the SDK do the recovery sequence (+CSCON:0, +CSCON:1 cycle), seen in Terminal log (2) below - for the 2min data rate, to re-establish comms?

Logs with the "RRCConnectionRelease-NB" missing problem:

1) Terminal log - 10min data rate: 4.5min RCC idle delays after 00:50:25 and 01:10:25

*** Booting nRF Connect SDK v3.5.99-ncs1-1 ***
[00:00:00.257,904] <inf> main: nRF Cloud multi-service sample has started, version: 0.8.3, protocol: CoAP
[00:00:00.258,026] <inf> cloud_connection: Enabling connectivity...
[00:00:00.489,746] <inf> nrf_modem_lib_trace: Trace thread ready
[00:00:00.497,375] <inf> nrf_modem_lib_trace: Trace level override: 2
[00:00:00.536,926] <inf> cloud_connection: Setting up nRF Cloud library...
[00:00:00.538,116] <inf> cloud_connection: Waiting for network ready...
+CEREG: 2,"951B","00070E79",7
+CSCON: 1
+CNEC_ESM: 111,0
+CEREG: 2,"951B","00070E79",7,0,15
+CSCON: 0
+CEREG: 2
+CEREG: 2,"9903","00070EDD",9
+CSCON: 1
%XTIME: ,"42902032929484",
+CGEV: ME PDN ACT 0,0
+CNEC_ESM: 50,0
+CEREG: 1,"9903","00070EDD",9,,,"00000000","00000110"
[00:00:12.989,868] <inf> cloud_connection: Network connectivity gained!
[00:00:13.990,051] <inf> cloud_connection: Network is ready
[00:00:13.990,081] <inf> cloud_connection: Connecting to nRF Cloud
[00:00:13.990,722] <inf> cloud_connection: Device ID: nrf-351901936740707
[00:00:14.818,115] <inf> dtls: IMEI:                  351901936740707
[00:00:14.818,145] <inf> dtls: Modem FW version:      mfw_nrf9160_1.3.6
[00:00:18.429,595] <inf> nrf_cloud_coap_transport: Request authorization with JWT
[00:00:19.526,885] <inf> nrf_cloud_coap_transport: Authorization result_code: 2.01
[00:00:19.526,977] <inf> nrf_cloud_coap_transport: Authorized
[00:00:19.527,221] <inf> nrf_cloud_coap_transport: DTLS CID is active
[00:00:20.186,706] <inf> nrf_cloud_codec_internal: Encode modem_info_network
[00:00:20.187,194] <inf> nrf_cloud_codec_internal: Network mode = LTE-M, LTE-M=1, NB-IoT=1
[00:00:20.187,316] <inf> nrf_cloud_codec_internal: Encode modem_info_device
[00:00:21.343,078] <inf> cloud_connection: Connected to nRF Cloud
[00:00:21.343,170] <inf> shadow_support_coap: Checking for shadow delta...
[00:00:21.345,001] <inf> nrf_cloud_fota_poll: Checking for FOTA job...
[00:00:22.267,974] <inf> nrf_cloud_coap: No pending FOTA job
[00:00:22.268,005] <inf> nrf_cloud_fota_poll: No pending FOTA job
[00:00:22.942,047] <inf> application: Waiting for modem to determine current date and time
[00:00:22.942,077] <inf> application: Current date and time determined
[00:00:24.173,858] <inf> application: Temperature is 23 degrees C
[00:00:24.741,149] <inf> message_queue: Coap send complete
[00:00:33.533,172] <inf> shadow_support_coap: Checking for shadow delta...
[00:00:34.237,487] <inf> shadow_support_coap: Checking again in 1800 seconds
+CSCON: 0		//PSM delay = 20s

[00:10:24.173,889] <inf> application: Temperature is 22 degrees C
+CSCON: 1
[00:10:25.831,542] <inf> message_queue: Coap send complete
+CSCON: 0       //PSM delay = 20s

[00:20:24.173,919] <inf> application: Temperature is 22 degrees C
+CEREG: 1,"9903","09BCBCE7",9,,,"00000000","00000110"
[00:20:27.176,849] <err> net_coap: Timeout in poll, retrying send
+CSCON: 1
[00:20:27.842,254] <inf> message_queue: Coap send complete
+CSCON: 0		//PSM delay = 20s

[00:30:22.268,127] <inf> nrf_cloud_fota_poll: Checking for FOTA job...
[00:30:22.270,233] <err> net_coap: Unexpected ACK
[00:30:22.270,233] <err> net_coap: Error handling response
+CEREG: 1,"9903","00070EDD",9,,,"00000000","00000110"
+CSCON: 1
[00:30:23.895,111] <inf> nrf_cloud_coap: No pending FOTA job
[00:30:23.895,111] <inf> nrf_cloud_fota_poll: No pending FOTA job
[00:30:24.174,011] <inf> application: Temperature is 23 degrees C
[00:30:25.741,210] <inf> message_queue: Coap send complete
[00:30:34.237,579] <inf> shadow_support_coap: Checking for shadow delta...
[00:30:35.240,539] <inf> shadow_support_coap: Checking again in 1800 seconds
+CSCON: 0		//PSM delay = 20s

[00:40:24.174,072] <inf> application: Temperature is 22 degrees C
+CSCON: 1
[00:40:26.224,670] <inf> message_queue: Coap send complete
+CSCON: 0		//PSM delay = 20s

[00:50:24.174,133] <inf> application: Temperature is 22 degrees C
+CEREG: 1,"9903","09BCBCE7",9,,,"00000000","00000110"
+CSCON: 1
[00:50:25.865,173] <inf> message_queue: Coap send complete
> AT+CPINR="SIM PIN"
+CPINR: "SIM PIN",3
OK
> AT+CPINR="SIM PIN2"
+CPINR: "SIM PIN2",3
OK
> AT+CPINR="SIM PUK"
+CPINR: "SIM PUK",10
OK
> AT+CPINR="SIM PUK2"
+CPINR: "SIM PUK2",10
OK
> AT+CSCON?
+CSCON: 1,1
OK
+CSCON: 0		//PSM delay = 4.5min - Problem

[01:00:23.895,233] <inf> nrf_cloud_fota_poll: Checking for FOTA job...
[01:00:24.174,194] <inf> application: Temperature is 23 degrees C
+CEREG: 1,"9903","00070EDD",9,,,"00000000","00000110"
+CSCON: 1
[01:00:25.744,140] <inf> nrf_cloud_coap: No pending FOTA job
[01:00:25.744,140] <inf> nrf_cloud_fota_poll: No pending FOTA job
[01:00:26.273,193] <inf> message_queue: Coap send complete
[01:00:35.240,692] <inf> shadow_support_coap: Checking for shadow delta...
[01:00:36.287,536] <inf> shadow_support_coap: Checking again in 1800 seconds
+CSCON: 0	//PSM delay = 20s
> 
[01:10:24.174,255] <inf> application: Temperature is 22 degrees C
+CEREG: 1,"9903","09BCBCE7",9,,,"00000000","00000110"
+CSCON: 1
[01:10:25.826,599] <inf> message_queue: Coap send complete
+CSCON: 0	//PSM delay = 4.5min - Problem

Wireshark capture of same events - 10min data rate: missing "RRCConnectionRelease-NB" before 4.5min RRC idle delays at line 308 (3291s), and line 361 (4491s):

8623.cms_Spark_20240903_2_nbiot_10min.pcapng

2) Excerpt of terminal log - 2min data rate. First RRC idle failed after 00:08:29, no PSM idle before the next transmission + forced recovery before each resend (00:10:36):

...
+CSCON: 0		//PSM delay = 20s

[00:08:27.609,222] <inf> application: Temperature is 22 degrees C
+CEREG: 1,"9903","09BCBCE7",9,,,"00000000","00000110"
+CSCON: 1
[00:08:29.732,788] <inf> message_queue: Coap send complete
//Failed - no PSM

	//Wireshark 627..635s
[00:10:27.609,283] <inf> application: Temperature is 24 degrees C
[00:10:30.612,182] <err> net_coap: Timeout in poll, retrying send
+CSCON: 0		//RRC reconnect forced by modem for Tx retry?
+CSCON: 1
[00:10:36.170,806] <inf> message_queue: Coap send complete
//Failed - no PSM

Wireshark capture of same events - 2min data rate: first missing "RRCConnectionRelease-NB" after line 303 (509s)

6052.cms_Spark_20240902_1_nbiot_2min.pcapng

  • Such delayed PSM may be caused by a HPPLMN network search, you may find a couple of articles here in the forum. Before 1.3.5, Nordic was even searching for global "9xxxx" networks in a HPPLMN search. If a HPPLMN network search is the case, depends mainly on your SIM card. That may come with different prio lists and is usually applied, if the device gets a MCC (country code) and has some prio PLMN in it's lists. If you have a nRF9160-DK, nRF9161-DK, or a Thingy:91 you may apply my demo client and execute the sim cmd to see, what you SIM card contains. Otherwise you may apply some "AR+CRSM" commands and decode the result on your own. See modem SIM card cmds to get some impression about that. Or you ask the SIM card provider.

  • Reading your ticket again, I'm not sure, which period you complain about.

    RRC active => RRC idle : that's RAI. 

    (That will be indicated by +CSCON=1 or +CSCON=0)

    RRC idle => psm sleeping: that's psm active timer and the sometimes the HPPLMN network search.

    (That will be XMODEMSLEEPING after +CSCON=0).

    For the RAI:

    Without RAI the modem usually switchs to +CSCON=0 after a short couple of seconds without traffic.

    With RAI, if you use "AT%XRAI=3, with CONFIG_COAP_SEND_CONFIRMABLE=y" that works only if you send one request (I see a couple of requests in the capture, not only one) and no retransmission occurs (that's the case starting with record 199 in the first capture with 10m). Otherwise, it may fail. RAI means your application assists the modem to release the connection. With "AT%XRAI=3" you inform the modem, that you expect 1 response. But if you send a couple of requests, that assistance misleads the modem. The same applies, when a retransmission occurs, because you receive also more than just on response (even if coap does the deduplication for you). May be that causes then pretty large delays. That's out of my experience, though I use RAI coupled to the CoAP transmission state in order to prevent such "miss assistance". E.g. transfer of a couple of messages => disable RAI, enable it with the last one. Rentransmission, disable RAI for that exchange. 

  • Thank you Achim,

    I looked at several posts where HPPLMN has been mentioned before (e.g.  RE: nRF9160-DK - sometimes busy for 1 minute before entering PSM and RE: PSM active time not Correct). My SIM card is not roaming so from the other posts it seems that the problem is not HPPLMN in my case. e.g. The %XMONITOR response shows: Registered home network (not roaming), and Active time  = 0:
    %XMONITOR: 1,"Spark NZ","Spark","53005","9903",9,28,"09BCBCE7",176,9218,52,28,"","00000000","00000110","01011110"

    Thanks also for suggesting your demo client - I'm testing on nRF9160-DK so I might take a look at that (when I get a chance). At the risk of going off topic - I'm still learning into what's needed to support a robust CoAP client (with reliable reconnect + Tx retry behaviour), and I'm not sure which sample client app will provide the best starting point to leverage?

    Our network provider has confirmed RRC inactivity timer = 20s for NB-IoT, and this is the 'normal' time I've seen it take to enable PSM (sleep) after transmission finished - e.g. this is when +CSCON=0 is logged and current drops to idle level (0.5mA with coap.conf build config)

    The problem I described above is that sometimes the current stays = 34mA for 4.5min before +CSON=0 is logged and it enters PSM (sleep) mode. And whenever this happens the wireshark capture shows that "RRCConnectionRelease-NB" is missing from the capture. So for some reason is seems my modem sometimes does not disconnect after the 20s RRC inactivity timer should have expired. I'm hoping that Nordic support will be able to take a look at the log files and comment on the questions I have raised above.

  • Yes sorry - I should have said I am concerned by the time is it is taking for PSM (sleep) mode, not the time to RCC idle. And for my use-case it should normally take 20s - from RRC inactivity timer (20s) + Active timer (0s).

    I understand on your comments regarding RAI. The above problem happens when I leave AT%XRAI=0 (I only mentioned that I once tried AT%XRAI=3 to see if it made a difference, and it did, but I would need to use this feature properly, and I really want to understand why the PSM (sleep) delay can be 4.5min (while AT%XRAI=0).

  • I'm hoping that Nordic support will be able to take a look at the log files and comment on the questions I have raised above.

    I guess, that you will get asked to provide to full "modem-trace", not only the extracted wireshark-capture. The Cellular Monitor provides that trace under "RAW file name" with the ending ".mtrace"

Related