Hi
I have been looking into this for a while and might just be missing something silly but I would appreciate any help. Our application uses a modified nrf cloud multi service sample and sends regular readings from a few sensors but we noticed after a few hours of operation no more readings are sent. We are using an nRF9160 DK.
I have managed to replicate this a few times so I rebuilt the latest sample from v2.6.0 using VS Code and the following settings modifications:
CONFIG_LOCATION_TRACKING=n # Increased logging CONFIG_COAP_LOG_LEVEL_DBG=y CONFIG_LOCATION_LOG_LEVEL_DBG=y CONFIG_NRF_CLOUD_COAP_LOG_LEVEL_DBG=y CONFIG_MULTI_SERVICE_LOG_LEVEL_DBG=y CONFIG_NRF_MODEM_LIB_TRACE=y # Higher sample time reduces time till fail CONFIG_SENSOR_SAMPLE_INTERVAL_SECONDS=30
It was able to send the fake temperature readings for about 30mins before it failed. On most messages I would see a line similar to below indicating that the message
[00:24:11.460,845] <dbg> message_queue: enqueue_device_message: Messages queue status: 0/50 [00:24:11.460,906] <dbg> message_queue: enqueue_device_message: Adding device message to queue [00:24:11.460,937] <dbg> message_queue: consume_device_message: Waiting for valid connection before transmitting device message [00:24:11.460,968] <dbg> message_queue: consume_device_message: Attempting to transmit enqueued device message [00:24:11.460,998] <dbg> message_queue: consume_device_message: Messages remaining in queue: 0 [00:24:11.461,212] <dbg> nrf_cloud_coap_transport: client_transfer: NON POST msg/d2c Content-Format:CBOR, 28 bytes out, Accept:none [00:24:11.470,886] <dbg> nrf_cloud_coap_transport: client_transfer: Sent bf 01 64 54 45 4d 50 02 fb 40 37 33 33 33 33 33 |..dTEMP. .@733333 33 03 1b 00 00 01 8e e7 a9 ea 06 ff |3....... .... +CSCON: 1 [00:24:12.794,952] <dbg> net_coap: recv_response: Received 12 bytes [00:24:12.795,013] <dbg> nrf_cloud_coap_transport: client_callback: result_code:2.01, offset:0x0, len:0x0, last_block:1 [00:24:12.795,013] <dbg> nrf_cloud_coap_transport: client_callback: End of client transfer [00:24:12.795,135] <dbg> message_queue: consume_device_message: Enqueued device message consumed successfully
But when it hangs we normally don't get any messages after the send for example:
[01:53:07.660,156] <dbg> message_queue: enqueue_device_message: Adding device message to queue [01:53:07.668,884] <dbg> nrf_cloud_coap_transport: client_transfer: Sent bf 01 64 54 45 4d 50 02 fb 40 35 c2 8f 5c 28 f5 |..dTEMP. .@5..\(. c3 03 1b 00 00 01 8e e7 73 44 29 ff |........ sD). +CSCON: 1
Below you can see the connection starts but no application data is ever sent and the application never calls the log line in recv_response about receiving bytes.
I have run AT commands on a different unit and it suggests the modem is still working but I would expect the coap client to timeout as opposed to just hang. It have tried waiting but processing the message queue can hang for hours. It is resolved with a board reset but what is most annoying is I cannot see any error logged. Reading through the code in nrf_cloud_coap_transport.c the coap_client_req function seems to have a max timeout of 5 seconds which should cause the message to retry and log an error.
I have attached a modem trace which should contain some details but I am not really sure how to debug this further. The only similar topic I can find on dev zone is quite old and I do not believe to be related: https://devzone.nordicsemi.com/f/nordic-q-a/64392/coap-client-sample---send-command-freezes-after-a-few-iterations
2703.trace-2024-04-16T15-43-06.535Z.mtrace
As this was occurring on a fresh clone of the sample I am not sure it's an issue with our application code but would be happy to try anything you would suggest.
Thanks,
Sam