nRF9160 LwM2M client sample: occasional disconnections. State machine fails to catch errors and leaves devices disconnected

I am working on a custom device connected to the ThingsBoard LwM2M server (currently utilising the outdated Leshan v2.0.0-M5 release).

I have another ticket pending that is related to, but not the same as, this issue.

Please let me know if you have had any experience with these or similar issues; I'm happy to provide more insight where possible. Since analysing the two cases below, I have added more logging, which should also help to show what happened the next time a device fails.

Every few days, ~20 percent of our devices lose their connection. I have documented two cases in which the connection was lost and hope that someone has had a similar issue in the past and can point me in the right direction. Due to the sporadic nature of the issue, I have not been able to capture a modem trace in which it occurred.

However, the device regularly prints timestamps of the latest states and RD client events, so we have some idea of where it fails.
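The bookkeeping behind these timestamps can be sketched roughly as follows. This is a minimal host-side sketch, not the actual application code: struct conn_debug, conn_debug_record(), conn_debug_print() and the APP_EVT_* codes are hypothetical names; only the ts_* field names are taken from the logs. On the device, now_ms would presumably come from k_uptime_get(), conn_debug_record() would be called from the RD client event callback, and the print would run from a periodic work item using LOG_DBG.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical event codes for this sketch; the real callback receives
 * enum lwm2m_rd_client_event values from the LwM2M client. */
enum app_rd_event {
	APP_EVT_REG_COMPLETE,
	APP_EVT_REG_UPDATE,
	APP_EVT_REG_UPDATE_COMPLETE,
	APP_EVT_REG_TIMEOUT,
	APP_EVT_DISCONNECT,
	APP_EVT_NETWORK_ERROR,
	APP_EVT_RX_OFF,
};

/* Uptime (ms) at which each event was last seen, plus the last event. */
struct conn_debug {
	int last_event;
	int64_t ts_last_reg_complete;
	int64_t ts_last_reg_update;
	int64_t ts_last_reg_update_complete;
	int64_t ts_last_reg_timeout;
	int64_t ts_last_disconnect;
	int64_t ts_last_network_error;
	int64_t ts_last_rx_off;
};

/* Record an event; call this from the event callback with the current uptime. */
static void conn_debug_record(struct conn_debug *d, enum app_rd_event evt, int64_t now_ms)
{
	assert(d != NULL);
	d->last_event = (int)evt;
	switch (evt) {
	case APP_EVT_REG_COMPLETE:        d->ts_last_reg_complete = now_ms; break;
	case APP_EVT_REG_UPDATE:          d->ts_last_reg_update = now_ms; break;
	case APP_EVT_REG_UPDATE_COMPLETE: d->ts_last_reg_update_complete = now_ms; break;
	case APP_EVT_REG_TIMEOUT:         d->ts_last_reg_timeout = now_ms; break;
	case APP_EVT_DISCONNECT:          d->ts_last_disconnect = now_ms; break;
	case APP_EVT_NETWORK_ERROR:       d->ts_last_network_error = now_ms; break;
	case APP_EVT_RX_OFF:              d->ts_last_rx_off = now_ms; break;
	}
}

/* Periodic dump, analogous to debug_connection_work_cb() in the logs. */
static void conn_debug_print(const struct conn_debug *d, int64_t now_ms)
{
	printf("last event: %d\n", d->last_event);
	printf("current_uptime: %lld\n", (long long)now_ms);
	printf("ts_last_reg_update: %lld\n", (long long)d->ts_last_reg_update);
	printf("ts_last_reg_update_complete: %lld\n", (long long)d->ts_last_reg_update_complete);
	printf("ts_last_disconnect: %lld\n", (long long)d->ts_last_disconnect);
	printf("ts_last_network_error: %lld\n", (long long)d->ts_last_network_error);
}
```

Storing the raw uptime of every event, rather than only the latest state, is what makes it possible to reconstruct the order of events after the fact, as in the two cases below.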

FIRST CASE:

1. The device ran as expected, performing a registration update every 3 minutes.

2. At 1d18h43m43s of uptime, it performed its last successful registration update.

3. 3 minutes later, it tried to perform a registration update, which timed out after ~2 minutes and 42 seconds.

4. This timeout triggered the LWM2M_RD_CLIENT_EVENT_DISCONNECT event.

5. That event sets client_state to START; the START case in turn calls the lwm2m_rd_client_start() function and sets client_state to CONNECTING.

6. For the next 21 hours, the device was stuck in the same CONNECTING client_state until the time of documenting the issue.

7. The device is also stuck at a permanent power draw of ~4 mA, which is of course far too much (normal power draw is ~800 µA with UART enabled for logging, and ~15 µA without UART during PSM).

--> Could it be that the lwm2m_rd_client_start() function fails?

8. Here you can see a log of the state of the device:

[26:41:51.260,467] <dbg> app_lwm2m_client: debug_connection_work_cb: current client_state: 1
[26:41:51.260,498] <dbg> app_lwm2m_client: debug_connection_work_cb: current retry_state: 0
[26:41:51.260,528] <dbg> app_lwm2m_client: debug_connection_work_cb: last lwm2m_rd_client_event: 9
[26:41:51.260,559] <dbg> app_lwm2m_client: debug_connection_work_cb: last lte_lc_nw_reg_status: 1
[26:41:51.260,559] <dbg> app_lwm2m_client: debug_connection_work_cb: current observer_watchdog_counter: 0
[26:41:51.260,589] <dbg> app_lwm2m_client: debug_connection_work_cb: current_uptime: 227183260
[26:41:51.260,620] <dbg> app_lwm2m_client: debug_connection_work_cb: ts_last_reg_complete: 42676
[26:41:51.260,620] <dbg> app_lwm2m_client: debug_connection_work_cb: ts_last_reg_timeout: 154166930
[26:41:51.260,650] <dbg> app_lwm2m_client: debug_connection_work_cb: ts_last_reg_update: 154004006
[26:41:51.260,681] <dbg> app_lwm2m_client: debug_connection_work_cb: ts_last_reg_update_complete: 153823619
[26:41:51.260,711] <dbg> app_lwm2m_client: debug_connection_work_cb: ts_last_disconnect: 154166930
[26:41:51.260,742] <dbg> app_lwm2m_client: debug_connection_work_cb: ts_last_rx_off: 153827940
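The uptime values in this log are in milliseconds, so converting them is a quick way to check the timeline above. A minimal sketch; format_ms() and print_interval() are hypothetical helpers, not part of the application or the SDK.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format a non-negative duration in milliseconds as "<d>d<hh>h<mm>m<ss>s".
 * The sub-second part is truncated. */
static void format_ms(int64_t ms, char *buf, size_t len)
{
	assert(ms >= 0);
	long long s = (long long)(ms / 1000);
	snprintf(buf, len, "%lldd%02lldh%02lldm%02llds",
		 s / 86400, s % 86400 / 3600, s % 3600 / 60, s % 60);
}

/* Print the interval between two logged uptime stamps. */
static void print_interval(const char *label, int64_t from_ms, int64_t to_ms)
{
	char buf[32];
	format_ms(to_ms - from_ms, buf, sizeof(buf));
	printf("%s: %s\n", label, buf);
}
```

With the first-case values, format_ms(153823619, ...) gives "1d18h43m43s" (the last successful update), 154004006 − 153823619 gives "0d00h03m00s" (the regular update interval), and 154166930 − 154004006 gives "0d00h02m42s" (from the last update attempt to the timeout), which matches steps 1 to 3.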

SECOND CASE:

1. The device performs regular registration updates at the same 3-minute interval with no issues.

2. After ~21 hours and 5 minutes, the last successful registration update is performed

3. 8 minutes later, an LWM2M_RD_CLIENT_EVENT_NETWORK_ERROR event is triggered.

4. The NETWORK_ERROR event triggers the NETWORK_ERROR client_state with the reconnect flag set to true.

5. The NETWORK_ERROR client_state triggers the START client_state, which is supposed to begin once all the functions within the NETWORK_ERROR case have been executed.

6. NETWORK_ERROR calls the lwm2m_rd_client_stop() function and, if reconnect is true (which it was, having been set by the NETWORK_ERROR event!), executes lte_lc_offline() followed by modem_connect() in order to restart the modem.

7. client_state remains START and never seems to make it into the START switch case itself, where it would be changed to CONNECTING. It remained in this state for one day and 19 hours, up to the point of documentation.
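The reconnect path in steps 4 to 7 can be modelled as a small state machine to reason about where it might stall. This is a hypothetical host-side sketch, not the application's code: the state names, struct client_sm, and the function pointers standing in for lwm2m_rd_client_stop(), lte_lc_offline(), modem_connect() and lwm2m_rd_client_start() are all assumptions. It illustrates two points. First, after NETWORK_ERROR switches to START, something has to run the machine again (on the device, for example, by resubmitting the work item); otherwise the START case never executes. A modem_connect() that never returns would produce the same symptom. Second, a simple time-in-state check would flag both documented cases (stuck in START, and stuck in CONNECTING). Whether either mechanism is the actual cause is not established by the logs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical states mirroring the client_state values described above. */
enum client_state {
	STATE_START,
	STATE_CONNECTING,
	STATE_CONNECTED,
	STATE_NETWORK_ERROR,
};

struct client_sm {
	enum client_state state;
	bool reconnect;
	int64_t ts_state_entered; /* uptime (ms) of the last state change */
	/* Stand-ins for lwm2m_rd_client_stop(), lte_lc_offline(),
	 * modem_connect() and lwm2m_rd_client_start(); 0 means success. */
	int (*rd_stop)(void);
	int (*lte_offline)(void);
	int (*modem_connect)(void);
	int (*rd_start)(void);
};

/* Host-side stubs so the model can run without a modem. */
static int stub_ok(void) { return 0; }
static int stub_fail(void) { return -1; }

static void set_state(struct client_sm *sm, enum client_state s, int64_t now_ms)
{
	sm->state = s;
	sm->ts_state_entered = now_ms;
}

/* Run one pass. Returns true if the caller must run the machine again right
 * away (on the device: resubmit the work item). If that never happens after
 * NETWORK_ERROR -> START, the START case is never executed. */
static bool sm_step(struct client_sm *sm, int64_t now_ms)
{
	switch (sm->state) {
	case STATE_NETWORK_ERROR:
		(void)sm->rd_stop();
		if (sm->reconnect) {
			(void)sm->lte_offline();
			/* A modem_connect() that blocks forever would also stall here. */
			if (sm->modem_connect() != 0) {
				return false; /* stay in NETWORK_ERROR and retry later */
			}
		}
		set_state(sm, STATE_START, now_ms);
		return true;
	case STATE_START:
		if (sm->rd_start() != 0) {
			return false; /* stay in START and retry later */
		}
		set_state(sm, STATE_CONNECTING, now_ms);
		return false; /* wait for the registration event */
	default:
		return false;
	}
}

/* Time-in-state check: any state other than CONNECTED is expected to be
 * short-lived, so staying in it longer than limit_ms is treated as stuck. */
static bool sm_is_stuck(const struct client_sm *sm, int64_t now_ms, int64_t limit_ms)
{
	return sm->state != STATE_CONNECTED && now_ms - sm->ts_state_entered > limit_ms;
}
```

What to do when sm_is_stuck() returns true (re-running the NETWORK_ERROR path, or a reboot as a last resort) is a design choice this sketch leaves open.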

log:

[27:49:51.262,756] <dbg> app_lwm2m_client: debug_connection_work_cb: current client_state: 0
[27:49:51.262,756] <dbg> app_lwm2m_client: debug_connection_work_cb: current retry_state: 0
[27:49:51.262,786] <dbg> app_lwm2m_client: debug_connection_work_cb: last lwm2m_rd_client_event: 12
[27:49:51.262,817] <dbg> app_lwm2m_client: debug_connection_work_cb: last lte_lc_nw_reg_status: 0
[27:49:51.262,817] <dbg> app_lwm2m_client: debug_connection_work_cb: current observer_watchdog_counter: 0
[27:49:51.262,847] <dbg> app_lwm2m_client: debug_connection_work_cb: current_uptime: 231263262
[27:49:51.262,878] <dbg> app_lwm2m_client: debug_connection_work_cb: ts_last_reg_complete: 34343
[27:49:51.262,908] <dbg> app_lwm2m_client: debug_connection_work_cb: ts_last_reg_update: 76087533
[27:49:51.262,908] <dbg> app_lwm2m_client: debug_connection_work_cb: ts_last_reg_update_complete: 75907081
[27:49:51.262,939] <dbg> app_lwm2m_client: debug_connection_work_cb: ts_last_rx_off: 75909962
[27:49:51.262,969] <dbg> app_lwm2m_client: debug_connection_work_cb: ts_last_network_error: 76410480

The power draw is a constant 7 mA.
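The second log can be checked with the same arithmetic. The time between ts_last_network_error and current_uptime is 154,852,782 ms, i.e. 43 whole hours (one day and 19 hours), matching step 7, and the network error came 8 minutes after the last completed update, matching step 3. Separately, in both logs the bracketed log timestamp is exactly 131,072,000 ms (36h24m32s) behind current_uptime. That offset equals 2^32 ticks of a 32768 Hz clock, so one plausible explanation, assumed here rather than confirmed, is that the log timestamp is a 32-bit counter at 32768 Hz that has wrapped once. The constants and helpers below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: log timestamps come from a 32-bit counter clocked at 32768 Hz. */
#define LOG_TS_FREQ_HZ 32768LL
#define LOG_TS_WRAP_MS (((int64_t)1 << 32) * 1000 / LOG_TS_FREQ_HZ) /* 131072000 ms */

/* Convert a bracketed log timestamp [hh:mm:ss.mmm] to milliseconds. */
static int64_t log_ts_ms(int h, int m, int s, int ms)
{
	return (((int64_t)h * 60 + m) * 60 + s) * 1000 + ms;
}

/* How many times the log timestamp has wrapped, given the real uptime. */
static int64_t log_ts_wraps(int64_t uptime_ms, int64_t log_ms)
{
	return (uptime_ms - log_ms) / LOG_TS_WRAP_MS;
}

/* Whole hours elapsed between two uptime stamps. */
static int64_t hours_between(int64_t from_ms, int64_t to_ms)
{
	return (to_ms - from_ms) / 3600000;
}
```

If the wrap assumption holds, the bracketed timestamps cannot be compared directly with the ts_* values once a device has been up for more than about 36 hours; current_uptime is the reliable reference.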

Thanks for taking the time!
