CoAP Observer loses connection after RRC idle

I have an nRF9151 set up to observe a CoAP resource on my server. The initial GET request works, and if the server sends new notifications within a few seconds of each other, everything works fine. If I wait more than ~10 seconds, however, the device goes into RRC idle and then becomes permanently unreachable: subsequent messages from my server fail with a "context canceled" error, which I assume means the server can no longer reach the device.

I have eDRX and PSM both off. No DTLS. My CoAP thread is just a while(1) loop with nrf_recv() waiting for data in blocking mode.
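A blocking receive with no timeout cannot tell "no notification yet" apart from "the path to the server is gone". As a rough sketch only, the loop could use a receive timeout so that a long silence can trigger a fresh observe registration (RFC 7641 allows a client to re-register with a new GET). The code below uses plain POSIX socket calls as a stand-in; `set_recv_timeout`, `recv_or_timeout` and `RECV_TIMED_OUT` are hypothetical names, and the modem library equivalents on the nRF91 are assumed rather than shown.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Hypothetical sentinel: distinguishes a timeout from a zero-length datagram. */
#define RECV_TIMED_OUT (-2)

/* Make recv() on fd give up after timeout_ms instead of blocking forever. */
int set_recv_timeout(int fd, long timeout_ms)
{
    struct timeval tv = {
        .tv_sec = timeout_ms / 1000,
        .tv_usec = (timeout_ms % 1000) * 1000,
    };
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

/* Returns the datagram length, RECV_TIMED_OUT on silence, or -1 on error. */
ssize_t recv_or_timeout(int fd, void *buf, size_t len)
{
    assert(buf != NULL);
    ssize_t n = recv(fd, buf, len, 0);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        return RECV_TIMED_OUT; /* caller may re-send the observe GET here */
    }
    return n;
}
```

In the receive thread, a `RECV_TIMED_OUT` result would be the point to send a new GET with the Observe option, which also produces uplink traffic from the device.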

I've tried messing with various socket options such as NRF_SO_KEEPOPEN, but I get the same behaviour.

I logged the modem activity, but there's nothing unusual there. Interestingly, if I'm fast enough I can still receive new messages even after RRC idle, but if I wait much longer, they never arrive. The last RRC idle in the log is where this happened: I waited a few more seconds and received nothing after that.

My question is, what could be causing the connection to get dropped? Is there some way I could log whatever causes this?

[00:00:46.052,978] <inf> nrf_modem: recv() fd 0x0, buf 0x2001f2c0, len 1072, flags 0x0 (blocking)
[00:00:47.831,542] <inf> nrf_modem: RPC_IP_RECVFROM_NTF fd 0x0 (28 bytes)
[00:00:47.831,909] <dbg> nrf_modem: Incoming pkt, data 0x20018ba0, len 28
[00:00:47.832,336] <dbg> nrf_modem: Freeing pkt 0x2001cf6c, data 0x20018ba0, sockaddr 0x2001cf84
Request: Received 11 bytes of payload data

[00:00:47.833,007] <inf> nrf_modem: recv() fd 0x0, buf 0x2001f2c0, len 1072, flags 0x0 (blocking)
[00:00:50.129,119] <dbg> nrf_modem: +CSCON: 0
Modem: RRC idle

[00:00:52.730,926] <dbg> nrf_modem: +CSCON: 1
Modem: RRC connected

[00:00:52.876,800] <inf> nrf_modem: RPC_IP_RECVFROM_NTF fd 0x0 (28 bytes)
[00:00:52.877,197] <dbg> nrf_modem: Incoming pkt, data 0x20018ba0, len 28
[00:00:52.877,593] <dbg> nrf_modem: Freeing pkt 0x2001cf6c, data 0x20018ba0, sockaddr 0x2001cf84
Request: Received 11 bytes of payload data

[00:00:52.878,295] <inf> nrf_modem: recv() fd 0x0, buf 0x2001f2c0, len 1072, flags 0x0 (blocking)
[00:00:54.005,798] <dbg> nrf_modem: rpc_softsim_event_handler   cmd: 3, req_id: 50, data_len: 7
[00:00:54.019,287] <dbg> nrf_modem: nrf_modem_softsim_res       cmd: 3, req_id: 50, data_len: 52
[00:00:57.009,338] <dbg> nrf_modem: +CSCON: 0
Modem: RRC idle

[00:00:57.851,104] <dbg> nrf_modem: +CSCON: 1
Modem: RRC connected

[00:00:58.009,918] <inf> nrf_modem: RPC_IP_RECVFROM_NTF fd 0x0 (28 bytes)
[00:00:58.010,314] <dbg> nrf_modem: Incoming pkt, data 0x20018ba0, len 28
[00:00:58.010,711] <dbg> nrf_modem: Freeing pkt 0x2001cf6c, data 0x20018ba0, sockaddr 0x2001cf84
Request: Received 11 bytes of payload data

[00:00:58.011,413] <inf> nrf_modem: recv() fd 0x0, buf 0x2001f2c0, len 1072, flags 0x0 (blocking)
[00:01:00.311,920] <inf> nrf_modem: RPC_IP_RECVFROM_NTF fd 0x0 (28 bytes)
[00:01:00.312,316] <dbg> nrf_modem: Incoming pkt, data 0x20018ba0, len 28
[00:01:00.312,744] <dbg> nrf_modem: Freeing pkt 0x2001cf6c, data 0x20018ba0, sockaddr 0x2001cf84
Request: Received 11 bytes of payload data

[00:01:00.313,415] <inf> nrf_modem: recv() fd 0x0, buf 0x2001f2c0, len 1072, flags 0x0 (blocking)
[00:01:01.911,987] <inf> nrf_modem: RPC_IP_RECVFROM_NTF fd 0x0 (28 bytes)
[00:01:01.912,353] <dbg> nrf_modem: Incoming pkt, data 0x20018ba0, len 28
[00:01:01.912,780] <dbg> nrf_modem: Freeing pkt 0x2001cf6c, data 0x20018ba0, sockaddr 0x2001cf84
Request: Received 11 bytes of payload data

[00:01:01.913,452] <inf> nrf_modem: recv() fd 0x0, buf 0x2001f2c0, len 1072, flags 0x0 (blocking)
[00:01:04.209,564] <dbg> nrf_modem: +CSCON: 0
Modem: RRC idle
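
The `+CSCON` notifications in the log above can also be used to record how long the modem had been idle when a notification went missing, which helps separate a fixed network-side timeout from a device-side problem. This is a minimal sketch under assumptions: `parse_cscon`, `struct rrc_tracker` and the helper functions are hypothetical names, and the caller is assumed to supply an uptime in milliseconds.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 for RRC connected, 0 for RRC idle, -1 if the line holds no +CSCON. */
int parse_cscon(const char *line)
{
    assert(line != NULL);
    const char *p = strstr(line, "+CSCON:");
    int mode;
    if (p == NULL || sscanf(p, "+CSCON: %d", &mode) != 1) {
        return -1;
    }
    return (mode == 0 || mode == 1) ? mode : -1;
}

/* Hypothetical tracker: remembers when the modem last entered RRC idle. */
struct rrc_tracker {
    int connected;           /* 1 = RRC connected, 0 = RRC idle */
    long long idle_since_ms; /* uptime at the last connected-to-idle change */
};

/* Feed every log or notification line through this with the current uptime. */
void rrc_tracker_update(struct rrc_tracker *t, const char *line, long long now_ms)
{
    int mode = parse_cscon(line);
    if (mode < 0) {
        return; /* not a +CSCON line */
    }
    if (mode == 0 && t->connected) {
        t->idle_since_ms = now_ms;
    }
    t->connected = mode;
}

/* Milliseconds spent in RRC idle so far, or 0 while connected. */
long long rrc_idle_ms(const struct rrc_tracker *t, long long now_ms)
{
    return t->connected ? 0 : now_ms - t->idle_since_ms;
}
```

Logging `rrc_idle_ms()` next to each received notification would show whether losses always begin after roughly the same idle duration.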

Parents
  • Indeed, I discovered two issues. The first was in my server-side code: I had to enable a keep-alive option to stop the CoAP library from terminating the socket.

    After that, the timeout moved out to ~120 seconds, which I suppose is the MNO's NAT timeout.

    I had some success sending empty UDP packets periodically to keep the socket open, but across many devices this would be quite expensive.

    So, rather than using CoAP observer, we've now just opened a UDP listener on the device side so that the server can send messages directly. This actually turned out to be a cleaner solution on both sides.
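
To put a rough number on the keep-alive cost mentioned above, the following sketch estimates the uplink IP bytes per device per day for a given keep-alive interval. It assumes IPv4 (20-byte header) plus the 8-byte UDP header and ignores radio signalling, any return traffic and operator billing granularity, so real usage will be higher. The function name and the 100-second interval are illustrative assumptions, chosen only to sit below the ~120-second timeout observed.

```c
#include <assert.h>
#include <stdint.h>

#define IPV4_HEADER_BYTES 20u /* assumes IPv4 with no IP options */
#define UDP_HEADER_BYTES  8u
#define SECONDS_PER_DAY   86400u

/* Uplink IP bytes per day for one device sending a keep-alive every interval_s. */
uint64_t keepalive_bytes_per_day(uint32_t interval_s, uint32_t payload_bytes)
{
    assert(interval_s > 0);
    uint64_t packets = SECONDS_PER_DAY / interval_s;
    uint64_t bytes_per_packet =
        (uint64_t)IPV4_HEADER_BYTES + UDP_HEADER_BYTES + payload_bytes;
    return packets * bytes_per_packet;
}
```

For example, `keepalive_bytes_per_day(100, 0)` gives 864 empty datagrams and 24192 bytes per device per day, before any of the overheads listed above.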

Children
  • The NAT timeout isn't a CoAP issue. Switching to plain UDP does not remove the need to send the empty messages.

    I don't know why you need the server to initiate sending data to the device; in other cases it's fine to send it as a response when the device initiates the communication.

    Anyway, if you really need the server to initiate the communication, then eDRX and a VPN to your M(V)NO will be your friends (if both are supported).
