Long PSM delay problem in NB-IoT mode

While testing CoAP with NB-IoT I normally get PSM (sleep) delay = 20s before PSM mode is set (0.5mA with UART enabled), but after awhile PSM delay increases to ~4.5min (at 34mA) after each transmission has finished. I enabled a modem trace and the Wireshark capture shows me that when PSM delay = 20s (from RRC Inactivity timer=20s + Active time=0) then "RRCConnectionRelease-NB" message is logged just before +CSCON=0, but when the delay is ~4.5min then "RRCConnectionRelease-NB" message is missing and +CSCON=0 takes 4.5min instead of 20s.

I'm using the sample nRF Cloud multi-service app with SDK 2.6.1, Modem v1.3.6, connected to nRF Cloud (using a Spark NZ NB-IoT SIM). After this problem has happened then it sometimes recovers after next data transmission, provided I'm using a slow enough data rate (e.g. 10min). If I test this with a 2min data rate then doesn't seem to recover. In this failed mode the first Tx request always fails (probably the modem thinks it's still connected), and the re-transmit only works after it's logged a +CSCON=0, +CSCON=1 cycle (a forced reconnection recovery by the modem?).  I haven't seen this problem at all with a LTE-M SIM (where PSM delay = 10, from the network RRC inactivity timer).

As a work around I tried enable RAI (e.g. AT%XRAI=3, with CONFIG_COAP_SEND_CONFIRMABLE=y), and found it nearly always helped set PSM immediately, but not always (e.g. still got sometimes got RRC idle delay = 4.5min delay).  I've disabled location reporting in my build, but the same problem happened with the default config (overlay_coap.conf), and also when I tested using the low power build (overlay_min_coap.conf).

Some questions:

  1. Cellular coverage is normally reliable for me - so could the missing "RRCConnectionRelease-NB" be caused by: a network fault (failed to send it), or a comms error (not received), or an nRF9160 (modem or app) problem?  Are there any suggestions for how to avoid or work-around this problem?

  2. I'm curious where the 4.5min delay comes from - when +CSCON: 0 is set after this delay (with missing "RRCConnectionRelease-NB") - does the modem implement this timeout? (e.g. to ensure PSM gets set, and avoid wasting ~34mA forever)

  3. When the problem has happened - does the modem or the SDK do the recovery sequence (+CSCON:0, +CSCON:1 cycle), seen in Terminal log (2) below - for the 2min data rate, to re-establish comms?

Logs with the "RRCConnectionRelease-NB" missing problem:

1) Terminal log - 10min data rate: 4.5min RCC idle delays after 00:50:25 and 01:10:25

Fullscreen
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
*** Booting nRF Connect SDK v3.5.99-ncs1-1 ***
[00:00:00.257,904] <inf> main: nRF Cloud multi-service sample has started, version: 0.8.3, protocol: CoAP
[00:00:00.258,026] <inf> cloud_connection: Enabling connectivity...
[00:00:00.489,746] <inf> nrf_modem_lib_trace: Trace thread ready
[00:00:00.497,375] <inf> nrf_modem_lib_trace: Trace level override: 2
[00:00:00.536,926] <inf> cloud_connection: Setting up nRF Cloud library...
[00:00:00.538,116] <inf> cloud_connection: Waiting for network ready...
+CEREG: 2,"951B","00070E79",7
+CSCON: 1
+CNEC_ESM: 111,0
+CEREG: 2,"951B","00070E79",7,0,15
+CSCON: 0
+CEREG: 2
+CEREG: 2,"9903","00070EDD",9
+CSCON: 1
%XTIME: ,"42902032929484",
+CGEV: ME PDN ACT 0,0
+CNEC_ESM: 50,0
+CEREG: 1,"9903","00070EDD",9,,,"00000000","00000110"
[00:00:12.989,868] <inf> cloud_connection: Network connectivity gained!
[00:00:13.990,051] <inf> cloud_connection: Network is ready
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Wireshark capture of same events - 10min data rate: missing "RRCConnectionRelease-NB" before 4.5min RRC idle delays at line 308 (3291s), and line 361 (4491s):

8623.cms_Spark_20240903_2_nbiot_10min.pcapng

2) Excerpt of terminal log - 2min data rate. First RRC idle failed after 00:08:29, no PSM idle before the next transmission + forced recovery before each resend (00:10:36):

Fullscreen
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
...
+CSCON: 0 //PSM delay = 20s
[00:08:27.609,222] <inf> application: Temperature is 22 degrees C
+CEREG: 1,"9903","09BCBCE7",9,,,"00000000","00000110"
+CSCON: 1
[00:08:29.732,788] <inf> message_queue: Coap send complete
//Failed - no PSM
//Wireshark 627..635s
[00:10:27.609,283] <inf> application: Temperature is 24 degrees C
[00:10:30.612,182] <err> net_coap: Timeout in poll, retrying send
+CSCON: 0 //RRC reconnect forced by modem for Tx retry?
+CSCON: 1
[00:10:36.170,806] <inf> message_queue: Coap send complete
//Failed - no PSM
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Wireshark capture of same events - 2min data rate: first missing "RRCConnectionRelease-NB" after line 303 (509s)

6052.cms_Spark_20240902_1_nbiot_2min.pcapng