Help with LTE connection after loss of connection

We have noticed that several of our nRF9160 in production have problem with reconnecting to the LTE-M network after temporarily losing connection. We have tried reproducing this issue by deactivating the sim for about 10 seconds and then activating it again. After the default six retries it gives up and stop trying to reconnect to the network. the only way we can achieve a connection is if we do a sys_reboot() or a hard reboot with watchdog. Why is the reconnect attempts not enough? Why do we have to force a reboot? Attached are relevant logs.

Parents
  • More info: 

    • We are using NSC version 2.6.0 and mfw 1.3.6.
    • The sims are global (EU, Nordics, Baltics), see image of subscription details.We have mainly worked with Telia sims but some of our clients have tried using other with no improvement. 

  • Hi,

    Could you  and  please confirm that all your questions and logs are related to the same application (based on lwm2m_client in NCS v2.6.1)?

    Could you please provide more information about your application? What exactly do you try to achieve? What does your application do?

    We have noticed that several of our nRF9160 in production have problem with reconnecting to the LTE-M network after temporarily losing connection.

    Can you provide more information on how devices lose connection? Do they work normally and suddenly lose connection? How often does this happen and on how many devices? Where are your failing devices located?

    We have tried reproducing this issue by deactivating the sim for about 10 seconds and then activating it again. After the default six retries it gives up and stop trying to reconnect to the network.

    Why do you think that the issue might be related to SIM? How do you do deactivation/activation of the SIM?

    Matias Marti said:
    It looks like the EXCHANGE_LIFETIME is set to 247s (4 minutes 7s) in the code here. Is there any way we can modify this value? Or is there another way to "give up" the exchange earlier?

    Have you tried changing the value directly in the code?

    Best regards,
    Dejan

  • Hello  ,

    We are using ncs version 2.6.0. We will try to set up modem trace using Cellular Monitor and Wireshark and provide those logs.

    The goal of our application is to get as close to 100% uptime as possible. We have doors, lockers, and similar products across Europe. In order to be able to open these, they need to be able to receive and handle requests any time of the day and night. We want to be able to send requests to our devices 24/7, and sometimes the boards become unavailable for some minutes.

    Most of our devices are in the Nordics, and we have some devices (usually with poor connectivity) that fall asleep 15-20 times/day or even up to 50. It is almost 100% of the time that the devices sleep for 5 minutes (our watchdog triggers a powercycle if there is no successful register update for 5 min). Our IoT server considers a device to be asleep whenever there has been no interaction either way between the board and the server for 93s.

    The reason we where experimenting with the SIM (deactivating and re-activating the subscription) was to be able to try to reproduce the issue we have seen among devices that are part of our fleet. However, this deactivation triggers a different sort of error....

    Here is a screenshot of the change we now have made directly in the code. Just to force the same kind of error, We set the timeout to 1s as in the picture:

    And the resulting logs:

    [1;32muart:~$ [m[8D[J[00:52:14.181,915] [0m<dbg> app_lwm2m_client: rd_client_event: Registration update complete[0m
    [1;32muart:~$ [m[8D[J[00:52:14.181,945] [0m<dbg> app_lwm2m_client: watchdog_Kick: Watchdog feed ok![0m
    [1;32muart:~$ [m[8D[J[00:52:14.182,037] [0m<inf> net_lwm2m_rd_client: Update Done[0m
    [1;32muart:~$ [m[8D[J[1;32muart:~$ [m[8D[J[00:52:33.182,037] [0m<dbg> app_lwm2m_client: rd_client_event: Registration update started[0m
    [1;32muart:~$ [m[8D[J[1;32muart:~$ [m[8D[J[00:52:34.284,301] [0m<dbg> app_lwm2m_client: rd_client_event: Registration update failure![0m
    [1;32muart:~$ [m[8D[J[00:52:34.292,266] [0m<inf> net_lwm2m_rd_client: RD Client started with endpoint '<HIDDEN_DEVICE_ID>' with client lifetime 86400 using server object 0[0m
    [1;32muart:~$ [m[8D[J[00:52:34.619,720] [0m<inf> net_lwm2m_engine: Connected, sock id 2[0m
    [1;32muart:~$ [m[8D[J[00:52:34.622,833] [0m<dbg> app_lwm2m_client: rd_client_event: Disconnected[0m
    [1;32muart:~$ [m[8D[J[00:52:34.622,955] [0m<inf> net_lwm2m_rd_client: Stop LWM2M Client: <HIDDEN_DEVICE_ID>[0m
    [1;32muart:~$ [m[8D[J[00:52:34.708,404] [0m<inf> app_lwm2m_client: Client connect to server[0m
    [1;32muart:~$ [m[8D[J[00:52:34.708,465] [0m<inf> net_lwm2m_rd_client: Start LWM2M Client: <HIDDEN_DEVICE_ID>[0m
    [1;32muart:~$ [m[8D[J[00:52:34.709,106] [0m<inf> net_lwm2m_rd_client: RD Client started with endpoint '<HIDDEN_DEVICE_ID>' with client lifetime 86400 using server object 0[0m
    [1;32muart:~$ [m[8D[J[00:52:34.835,296] [0m<inf> app_lwm2m_client: LwM2M is connecting to server[0m
    [1;32muart:~$ [m[8D[J[00:52:35.023,590] [0m<inf> net_lwm2m_engine: Connected, sock id 2[0m
    [1;32muart:~$ [m[8D[J[1;32muart:~$ [m[8D[J[00:52:35.173,797] [0m<dbg> app_lwm2m_client: rd_client_event: Registration complete[0m
    [1;32muart:~$ [m[8D[J[00:52:35.173,950] [0m<inf> net_lwm2m_rd_client: Registration Done (EP='<HIDDEN>')[0m
    [1;32muart:~$ [m[8D[J[00:52:35.357,604] [0m<inf> app_lwm2m_client: LwM2M is connected to server[0m
    [1;32muart:~$ [m[8D[J[00:52:35.357,666] [0m<inf> app_lwm2m_client: Obtained date-time from modem[0m
    [1;32muart:~$ [m[8D[J[1;32muart:~$ [m[8D[J[00:52:54.173,492] [0m<dbg> app_lwm2m_client: rd_client_event: Registration update started[0m
    [1;32muart:~$ [m[8D[J[00:52:54.586,181] [0m<inf> net_lwm2m_rd_client: Update callback (code:2.4)[0m
    [1;32muart:~$ [m[8D[J[00:52:54.586,212] [0m<dbg> app_lwm2m_client: rd_client_event: Registration update complete[0m

    As we can see, we get the "Registration update failure!" whenever it takes longer than 1s for the registration update to complete. After the failure, the board is back to normal after a few seconds and succeeds in making a new registration update after ca 20s again as if nothing happened. This is the sort of behavior we would actually like to see whenever one of these occasional "pings" are taking too long. And by "too long" in this case, it would be 2 or 3 seconds for us and not 247s.

    A question: What is the expected behavior of the code in zephyr/subsys/net/lib/lwm2m/lwm2m_rd_client.c if we were to set a sleep on our IoT server for e.g. 3 minutes? Would the client wait for a response for the whole time and therefore block other registration updates that normally occur every 20s?

  • A question: What is the expected behavior of the code in zephyr/subsys/net/lib/lwm2m/lwm2m_rd_client.c if we were to set a sleep on our IoT server for e.g. 3 minutes? Would the client wait for a response for the whole time and therefore block other registration updates that normally occur every 20s?

    Can you clarify this question a bit. I really don't understand.

    What goes to sleep?

    The Update period and lifetime can be configured in the client. It is documented in https://docs.zephyrproject.org/latest/connectivity/networking/api/lwm2m.html#configuring-lifetime-and-activity-period

    Then when client is registered, what it does it documented in https://docs.zephyrproject.org/latest/connectivity/networking/api/lwm2m.html#lwm2m-engine-and-application-events

    Basically it constantly loops between those four or three states, green ones. Depending on if the QUEUE mode is used or not. If 100% connectivity is assumed, QUEUE mode is not relevant.

    When LwM2M Update message is send, the CoAP layer retries the message until response is received. If not, then it should go into NETWORK_ERROR state and start recovery but trying to re-handshake the DTLS connection.

    However, NAT timeouts can happen. If NAT timeout is 20s and your update period is 20s, it might cause IP address to change randomly. So it might break the connectivity. DTLS Connection ID helps this, but it also needs server support.

    It helps if you can capture a Wireshark log of the event. It helps to debug if you know whether the connectivity issue is in IP layer, DTLS layer or CoAP layer.

  • Thank you for your answer,  .

    Can you clarify this question a bit. I really don't understand.

    What goes to sleep?

    By sleep, I meant that we make sure our Leshan server waits for 3 minutes before it provides a response for registration updates. This is a way for us to try to reproduce the behavior we are seeing in our boards.

    The Update period and lifetime can be configured in the client. It is documented

    We have taken a look at these values, but the issue is that one single update is taking more than the usual 0.8s, so how would changing these configuration values change the behavior we are seeing?

    If 100% connectivity is assumed, QUEUE mode is not relevant.

    We are not assuming 100% connectivity. We are using QUEUE on our Leshan server to detect when a device becomes unavailable.


    When LwM2M Update message is send, the CoAP layer retries the message until response is received.

    Is there any way that we can set a timeout for registration updates? I mean, in the sense that our board should "give up" on a registration update after 3-4s instead of 247s.

    It helps if you can capture a Wireshark log of the event.

    We will try to provide you with Wireshark logs shortly.

  • OK. Now I understand. So the response for this question:

    What is the expected behavior of the code in zephyr/subsys/net/lib/lwm2m/lwm2m_rd_client.c if we were to set a sleep on our IoT server for e.g. 3 minutes?


    So in this case, the LwM2M engine goes into UPDATE_REGISTRATION state. Sends a CoAP message. Goes into UPDATE_SENT state and waits for EXCHANGE_LIFETIME seconds on that state.

    During that time lwm2m_engine.c:socket_loop() ocassionally checks if the CoAP message needs retransmission. If CONFIG_COAP_INIT_ACK_TIMEOUT_MS has passed, message is retransmitted. It will be retransmitted total of CONFIG_COAP_MAX_RETRANSMIT times. Every time the Ack timeout is increased by CONFIG_COAP_BACKOFF_PERCENT.

    If response is not received until max retries are send and ack times out, CoAP layer call message timeout callback, which in this case should be lwm2m_rd_client.c:do_update_timeout_cb().
    If you don't see this callback triggering, you might have too long CoAP timeouts configured. State transitions should never timeout. That is just a final fallback mechanism if all else fails.

    update_timeout_cb() would trigger full registration (DO_REGISTRATION state) to recover.

    We have taken a look at these values, but the issue is that one single update is taking more than the usual 0.8s, so how would changing these configuration values change the behavior we are seeing?

    If you want to react a CoAP failures faster, reconfigure those mentioned CONFIG_COAP timeout values. Just be aware that LTE network might not be fast enough to respond.

    Also, if you need to be fast to respond, make sure that you don't use PSM power saving. And if eDRX is used, configure the periods to shortest possible, or disable it. Not all networks allow you to do short eDRX periods.

    We are not assuming 100% connectivity. We are using QUEUE on our Leshan server to detect when a device becomes unavailable.

    This is unreliable as Leshan is just waiting for MAX_TRANSMIT_WAIT(93s) as specified in CoAP+LwM2M specs. This does not match cellular networks where NAT timeout will happen much sooner, sometimes in 20 seconds.

    You need device to actively "ping" the network to keep the NAT mapping active. Also you need to configure or disable LTE powersaving modes to maintain reachability in short timeout.

    Is there any way that we can set a timeout for registration updates? I mean, in the sense that our board should "give up" on a registration update after 3-4s instead of 247s.

    LwM2M engine uses those mentioned CONFIG_COAP values. So all CoAP confirmable messages use same values. I don't see why it would make sense to modify these per message type. Any CoAP confirmable message refreshes the NAT mappings and allows server to communicate.

    Also, I think it is not reasonable to expect latency less than 4s in LTE network. Sometimes if the connection is idle for more that 5 seconds, the LTE network tears down the RRC connection. When next message is to be send, modem needs to establish this RRC connection with the network first. It might take time.
    Therefore in our sample, for LTE-M we use
    CONFIG_COAP_INIT_ACK_TIMEOUT_MS=4000

    And for Nb-IOT we use
    CONFIG_COAP_INIT_ACK_TIMEOUT_MS=15000

    If you need to react faster, maybe the BACKOFF_PERCENT and MAX_RETRANSMIT can be modified.
Reply
  • OK. Now I understand. So the response for this question:

    What is the expected behavior of the code in zephyr/subsys/net/lib/lwm2m/lwm2m_rd_client.c if we were to set a sleep on our IoT server for e.g. 3 minutes?


    So in this case, the LwM2M engine goes into UPDATE_REGISTRATION state. Sends a CoAP message. Goes into UPDATE_SENT state and waits for EXCHANGE_LIFETIME seconds on that state.

    During that time lwm2m_engine.c:socket_loop() ocassionally checks if the CoAP message needs retransmission. If CONFIG_COAP_INIT_ACK_TIMEOUT_MS has passed, message is retransmitted. It will be retransmitted total of CONFIG_COAP_MAX_RETRANSMIT times. Every time the Ack timeout is increased by CONFIG_COAP_BACKOFF_PERCENT.

    If response is not received until max retries are send and ack times out, CoAP layer call message timeout callback, which in this case should be lwm2m_rd_client.c:do_update_timeout_cb().
    If you don't see this callback triggering, you might have too long CoAP timeouts configured. State transitions should never timeout. That is just a final fallback mechanism if all else fails.

    update_timeout_cb() would trigger full registration (DO_REGISTRATION state) to recover.

    We have taken a look at these values, but the issue is that one single update is taking more than the usual 0.8s, so how would changing these configuration values change the behavior we are seeing?

    If you want to react a CoAP failures faster, reconfigure those mentioned CONFIG_COAP timeout values. Just be aware that LTE network might not be fast enough to respond.

    Also, if you need to be fast to respond, make sure that you don't use PSM power saving. And if eDRX is used, configure the periods to shortest possible, or disable it. Not all networks allow you to do short eDRX periods.

    We are not assuming 100% connectivity. We are using QUEUE on our Leshan server to detect when a device becomes unavailable.

    This is unreliable as Leshan is just waiting for MAX_TRANSMIT_WAIT(93s) as specified in CoAP+LwM2M specs. This does not match cellular networks where NAT timeout will happen much sooner, sometimes in 20 seconds.

    You need device to actively "ping" the network to keep the NAT mapping active. Also you need to configure or disable LTE powersaving modes to maintain reachability in short timeout.

    Is there any way that we can set a timeout for registration updates? I mean, in the sense that our board should "give up" on a registration update after 3-4s instead of 247s.

    LwM2M engine uses those mentioned CONFIG_COAP values. So all CoAP confirmable messages use same values. I don't see why it would make sense to modify these per message type. Any CoAP confirmable message refreshes the NAT mappings and allows server to communicate.

    Also, I think it is not reasonable to expect latency less than 4s in LTE network. Sometimes if the connection is idle for more that 5 seconds, the LTE network tears down the RRC connection. When next message is to be send, modem needs to establish this RRC connection with the network first. It might take time.
    Therefore in our sample, for LTE-M we use
    CONFIG_COAP_INIT_ACK_TIMEOUT_MS=4000

    And for Nb-IOT we use
    CONFIG_COAP_INIT_ACK_TIMEOUT_MS=15000

    If you need to react faster, maybe the BACKOFF_PERCENT and MAX_RETRANSMIT can be modified.
Children
  • LwM2M engine uses those mentioned CONFIG_COAP values. So all CoAP confirmable messages use same values. I don't see why it would make sense to modify these per message type. Any CoAP confirmable message refreshes the NAT mappings and allows server to communicate.

    Thank you for your answer.

    The reason it would make sense, for us at least, to modify these CONFIG_COAP values per message type is the following:

    We are satisfied with how lwm2m + COAP is working in all aspects apart from these 4min7s registation update timeouts. Here is some more concrete context:

    Let us suppose 1 out of every 1000 (0,1%) of all registration updates of a device end up reaching the default timeout. With a ping every 20s (180 pings/h), the board will become unreachable (i.e. unusable) for a whole 4min7s every 5h34min if these timeouts are evenly distributed.

    The issue we are dealing with is specific to the registration updates, and therefore we would like the solution to be specific to these requests. By changing the CONFIG_COAP values globally, we will be affecting all other lwm2m/COAP requests, which is something we rather not do.

    Maybe I am missing something, but why would we want to change the timeout for every single type of request when it is only one specific type of request that is problematic for us?

  • The issue we are dealing with is specific to the registration updates, and therefore we would like the solution to be specific to these requests.

    I would like to disagree with this claim. If you think about the communication layers


    When LwM2M messages are not going through, then there is problem on the CoAP layer. If CoAP messages are not going through, there is problem in DTLS layer. And same goes to DTLS; UDP, IP, etc..

    So if you cannot see response to messages on certain level, you cannot expect the layer below it works. On all confirmable CoAP messages, the server must respond. If it denies the request, it send either CoAP RESET or respond with a valid error code.

    All LwM2M messages are confirmable CoAP messages, so there must be a response.

    If you cannot see response for LwM2M Update message, then there must be a problem with the communication channel, either DTLS or UDP.

    • CoAP can only fail, if any of the layer below fails or there is too much packet loss.
    • DTLS can fail, if server has dropped the connection and is ignoring our messages.
    • UDP can fail if NAT mapping times out and server is therefore ignoring our messages.
    • IP and all layers below it can fail on various reasons.

    So my opinion is that registration failures are symptoms, not the root cause. If you tweak the solution around it, you might just hide the problem for now and cause issues somewhere else.

    Wireshark capture might reveal what the issue was.

    In the mean time try

    CONFIG_LTE_PSM_REQ=n
    CONFIG_LTE_EDRX_REQ=n
    CONFIG_LWM2M_UPDATE_PERIOD=20
    CONFIG_LWM2M_DTLS_CID=y
    CONFIG_LWM2M_TLS_SESSION_CACHING=y
    CONFIG_LWM2M_RD_CLIENT_LISTEN_AT_IDLE=y

    And make sure you have latest modem firmware so DTLS CID is able to work, also verify from Wireshark that Leshan also supports it.

  • So today, Tue 3 Sep around 08:33 GMT, we captured these logs through Wireshark:


    "457","136.143577","","","AT","16","Rcvd AT Command: OK "
    "458","136.144005","","","AT","22","Sent AT Command: AT+CSCON=1"
    "459","136.144005","","","AT","16","Rcvd AT Command: OK "
    "460","136.144401","","","AT","21","Sent AT Command: AT+CFUN=1"
    "461","136.175651","","","AT","16","Rcvd AT Command: OK "
    "462","142.327901","","","AT","30","Sent AT Command: AT%XMODEMTRACE=1,2"
    "463","142.328725","","","AT","16","Rcvd AT Command: OK "
    "464","142.335225","","","AT","34","Sent AT Command: AT%XSYSTEMMODE=1,0,0,0"
    "465","142.336934","","","AT","16","Rcvd AT Command: OK "
    "466","142.337330","","","AT","46","Sent AT Command: AT+CPSMS=1,,,""00000110"",""00001111"""
    "467","142.337453","","","AT","16","Rcvd AT Command: OK "
    "468","142.338002","","","AT","19","Rcvd AT Command: ERROR "
    "469","142.338399","","","AT","32","Sent AT Command: AT+CEDRXS=2,4,""0001"""
    "470","142.338490","","","AT","16","Rcvd AT Command: OK "
    "471","142.338887","","","AT","28","Sent AT Command: AT%XPTW=4,""0000"""
    "472","142.338978","","","AT","16","Rcvd AT Command: OK "
    "473","142.339345","","","AT","32","Sent AT Command: AT+CEDRXS=2,5,""0000"""
    "474","142.339375","","","AT","16","Rcvd AT Command: OK "
    "475","142.339772","","","AT","28","Sent AT Command: AT%XPTW=5,""0000"""
    "476","142.339802","","","AT","16","Rcvd AT Command: OK "
    "477","142.340321","","","AT","20","Sent AT Command: AT%RAI=0"
    "478","142.340352","","","AT","16","Rcvd AT Command: OK "
    "479","142.340748","","","AT","20","Sent AT Command: AT+CMEE?"
    "480","142.340779","","","AT","26","Rcvd AT Command: +CMEE: 0 OK "
    "481","142.347340","","","AT","21","Sent AT Command: AT+CMEE=1"
    "482","142.347371","","","AT","16","Rcvd AT Command: OK "
    "483","142.347798","","","AT","19","Sent AT Command: AT%CMNG"
    "484","142.350331","","","AT","29","Rcvd AT Command: +CME ERROR: 513 "
    "485","142.350728","","","AT","21","Sent AT Command: AT+CMEE=0"
    "486","142.350758","","","AT","16","Rcvd AT Command: OK "
    "487","142.351369","","","AT","19","Sent AT Command: AT+CGSN"
    "488","142.351430","","","AT","33","Rcvd AT Command: 350457792623843 OK "
    "489","142.352132","","","AT","24","Sent AT Command: AT%HWVERSION"
    "490","142.352254","","","AT","46","Rcvd AT Command: %HWVERSION: nRF9160 SIAA B1A OK "
    "491","142.392537","","","AT","20","Sent AT Command: AT+CMEE?"
    "492","142.392537","","","AT","26","Rcvd AT Command: +CMEE: 0 OK "
    "493","142.392995","","","AT","21","Sent AT Command: AT+CMEE=1"
    "494","142.392995","","","AT","16","Rcvd AT Command: OK "
    "495","142.393452","","","AT","19","Sent AT Command: AT%CMNG"
    "496","142.422169","","","AT","16","Rcvd AT Command: OK "
    "497","142.422627","","","AT","21","Sent AT Command: AT+CMEE=0"
    "498","142.422627","","","AT","16","Rcvd AT Command: OK "
    "499","142.423054","","","AT","20","Sent AT Command: AT+CMEE?"
    "500","142.423085","","","AT","26","Rcvd AT Command: +CMEE: 0 OK "
    "501","142.423482","","","AT","21","Sent AT Command: AT+CMEE=1"
    "502","142.423512","","","AT","16","Rcvd AT Command: OK "
    "503","142.423939","","","AT","19","Sent AT Command: AT%CMNG"
    "504","142.452687","","","AT","16","Rcvd AT Command: OK "
    "505","142.453145","","","AT","21","Sent AT Command: AT+CMEE=0"
    "506","142.453145","","","AT","16","Rcvd AT Command: OK "
    "507","142.453572","","","AT","20","Sent AT Command: AT+CMEE?"
    "508","142.453602","","","AT","26","Rcvd AT Command: +CMEE: 0 OK "
    "509","142.460164","","","AT","21","Sent AT Command: AT+CMEE=1"
    "510","142.460194","","","AT","16","Rcvd AT Command: OK "
    "511","142.460591","","","AT","19","Sent AT Command: AT%CMNG"
    "512","142.489369","","","AT","16","Rcvd AT Command: OK "
    "513","142.489827","","","AT","21","Sent AT Command: AT+CMEE=0"
    "514","142.489827","","","AT","16","Rcvd AT Command: OK "
    "515","142.526112","","","AT","20","Sent AT Command: AT+CMEE?"
    "516","142.526112","","","AT","26","Rcvd AT Command: +CMEE: 0 OK "
    "517","142.532704","","","AT","21","Sent AT Command: AT+CMEE=1"
    "518","142.532735","","","AT","16","Rcvd AT Command: OK "
    "519","142.533192","","","AT","19","Sent AT Command: AT%CMNG"
    "520","142.561970","","","AT","16","Rcvd AT Command: OK "
    "521","142.562428","","","AT","21","Sent AT Command: AT+CMEE=0"
    "522","142.562428","","","AT","16","Rcvd AT Command: OK "
    "523","142.668873","","","AT","21","Sent AT Command: AT+CEREG?"
    "524","142.668965","","","AT","29","Rcvd AT Command: +CEREG: 0,4 OK "
    "525","142.669392","","","AT","20","Sent AT Command: AT+CFUN?"
    "526","142.669453","","","AT","26","Rcvd AT Command: +CFUN: 0 OK "
    "527","142.669881","","","AT","22","Sent AT Command: AT+CEREG=5"
    "528","142.669911","","","AT","16","Rcvd AT Command: OK "
    "529","142.670430","","","AT","36","Sent AT Command: AT%XT3412=1,5000,1200000"
    "530","142.670460","","","AT","16","Rcvd AT Command: OK "
    "531","142.670918","","","AT","22","Sent AT Command: AT+CSCON=1"
    "532","142.670918","","","AT","16","Rcvd AT Command: OK "
    "533","142.671315","","","AT","21","Sent AT Command: AT+CFUN=1"
    "534","142.702687","","","AT","16","Rcvd AT Command: OK "
    "535","143.473561","","","LTE RRC DL_SCH","117","SystemInformationBlockType1"
    "536","143.519642","","","LTE RRC DL_SCH","117","SystemInformation [ SIB2 SIB3 ]"
    "537","443.552145","","","AT","30","Sent AT Command: AT%XMODEMTRACE=1,2"
    "538","443.552969","","","AT","16","Rcvd AT Command: OK "
    "539","443.553457","","","AT","34","Sent AT Command: AT%XSYSTEMMODE=1,0,0,0"
    "540","443.555166","","","AT","16","Rcvd AT Command: OK "
    "541","443.555563","","","AT","46","Sent AT Command: AT+CPSMS=1,,,""00000110"",""00001111"""
    "542","443.555685","","","AT","16","Rcvd AT Command: OK "
    "543","443.556234","","","AT","19","Rcvd AT Command: ERROR "
    "544","443.556631","","","AT","32","Sent AT Command: AT+CEDRXS=2,4,""0001"""
    "545","443.556723","","","AT","16","Rcvd AT Command: OK "
    "546","443.563131","","","AT","28","Sent AT Command: AT%XPTW=4,""0000"""
    "547","443.563223","","","AT","16","Rcvd AT Command: OK "
    "548","443.563620","","","AT","32","Sent AT Command: AT+CEDRXS=2,5,""0000"""
    "549","443.563650","","","AT","16","Rcvd AT Command: OK "
    "550","443.564047","","","AT","28","Sent AT Command: AT%XPTW=5,""0000"""
    "551","443.564077","","","AT","16","Rcvd AT Command: OK "
    "552","443.564596","","","AT","20","Sent AT Command: AT%RAI=0"
    "553","443.564627","","","AT","16","Rcvd AT Command: OK "
    "554","443.565023","","","AT","20","Sent AT Command: AT+CMEE?"
    "555","443.565054","","","AT","26","Rcvd AT Command: +CMEE: 0 OK "
    We will not be able to provide your with the full logs here because of security reasons. However, it is interesting to see that at entry 536 (143 s after 08:33 ) there is no more logs until the watchdog kicks in at 573 (5 minutes after entry 536 ).
    Here is a part of the logs from our Leshan server:
    2024-09-03T08:32:21.870346474Z - Updated registration org.eclipse.leshan.server.registration.UpdatedRegistration@xxxx by RegistrationUpdate [registrationId=<REG_ID>, identity=Identity /<IP_ADDRESS>:<PORT>[psk=<OTHER_DEVICE>], lifeTimeInSec=null, smsNumber=null, bindingMode=null, objectLinks=null]
    2024-09-03T08:32:21.871047635Z [CoapServer(main)#1] DEBUG c.q.iot.servlet.EventServlet
    2024-09-03T08:32:21.871058483Z - Dispatching UPDATED event from endpoint <OTHER_DEVICE>
    2024-09-03T08:32:35.711550532Z [DTLS-Worker-0.0.0.0/0.0.0.0:5684#4] DEBUG o.e.c.s.dtls.DTLSSession
    2024-09-03T08:32:35.711585469Z - Updated receive window with sequence number [56]: new upper boundary [56], new bit vector [111111111111111111111111111111111111111111111111111111111]
    2024-09-03T08:32:35.713250473Z [CoapServer(main)#1] DEBUG o.e.l.s.r.RegistrationHandler
    2024-09-03T08:32:35.713261494Z - Updated registration org.eclipse.leshan.server.registration.UpdatedRegistration@xxxx by RegistrationUpdate [registrationId=<REG_ID>, identity=Identity /<IP_ADDRESS>:<PORT>[psk=<MY_DEVICE>], lifeTimeInSec=null, smsNumber=null, bindingMode=null, objectLinks=null]
    2024-09-03T08:32:35.713266901Z [CoapServer(main)#1] DEBUG c.q.iot.servlet.EventServlet
    2024-09-03T08:32:35.713271063Z - Dispatching UPDATED event from endpoint <MY_DEVICE>
    2024-09-03T08:32:41.308167383Z [DTLS-Worker-0.0.0.0/0.0.0.0:5684#5] DEBUG o.e.c.s.dtls.DTLSSession
    2024-09-03T08:32:41.308198335Z - Updated receive window with sequence number [2878]: new upper boundary [2878], new bit vector [1111111111111111111111111111111111111111111111111111111111111111]
    2024-09-03T08:32:41.310311673Z [CoapServer(main)#1] DEBUG o.e.l.s.r.RegistrationHandler
    2024-09-03T08:32:41.310326868Z - Updated registration org.eclipse.leshan.server.registration.UpdatedRegistration@xxxx by RegistrationUpdate [registrationId=<REG_ID>, identity=Identity /<IP_ADDRESS>:<PORT>[psk=<OTHER_DEVICE>], lifeTimeInSec=null, smsNumber=null, bindingMode=null, objectLinks=null]
    2024-09-03T08:32:41.310919136Z [CoapServer(main)#1] DEBUG c.q.iot.servlet.EventServlet
    2024-09-03T08:32:41.310929923Z - Dispatching UPDATED event from endpoint <OTHER_DEVICE>
    2024-09-03T08:32:55.149629159Z [DTLS-Worker-0.0.0.0/0.0.0.0:5684#1] DEBUG o.e.c.s.dtls.DTLSSession
    2024-09-03T08:32:55.149658003Z - Updated receive window with sequence number [57]: new upper boundary [57], new bit vector [1111111111111111111111111111111111111111111111111111111111]
    2024-09-03T08:32:55.151266445Z [CoapServer(main)#1] DEBUG o.e.l.s.r.RegistrationHandler
    2024-09-03T08:32:55.151277885Z - Updated registration org.eclipse.leshan.server.registration.UpdatedRegistration@xxxx by RegistrationUpdate [registrationId=<REG_ID>, identity=Identity /<IP_ADDRESS>:<PORT>[psk=<MY_DEVICE>], lifeTimeInSec=null, smsNumber=null, bindingMode=null, objectLinks=null]
    2024-09-03T08:32:55.151290799Z [CoapServer(main)#1] DEBUG c.q.iot.servlet.EventServlet
    2024-09-03T08:32:55.151295040Z - Dispatching UPDATED event from endpoint <MY_DEVICE>
    2024-09-03T08:33:00.668352512Z [DTLS-Worker-0.0.0.0/0.0.0.0:5684#3] DEBUG o.e.c.s.dtls.DTLSSession
    2024-09-03T08:33:00.668385496Z - Updated receive window with sequence number [2879]: new upper boundary [2879], new bit vector [1111111111111111111111111111111111111111111111111111111111111111]
    2024-09-03T08:33:00.670205199Z [CoapServer(main)#1] DEBUG o.e.l.s.r.RegistrationHandler
    2024-09-03T08:33:00.670221346Z - Updated registration org.eclipse.leshan.server.registration.UpdatedRegistration@xxxxby RegistrationUpdate [registrationId=<REG_ID>, identity=Identity /<IP_ADDRESS>:<PORT>[psk=<OTHER_DEVICE>], lifeTimeInSec=null, smsNumber=null, bindingMode=null, objectLinks=null]
    2024-09-03T08:33:00.670227034Z [CoapServer(main)#1] DEBUG c.q.iot.servlet.EventServlet
    2024-09-03T08:33:00.670231204Z - Dispatching UPDATED event from endpoint <OTHER_DEVICE>
    2024-09-03T08:33:14.591684922Z [DTLS-Worker-0.0.0.0/0.0.0.0:5684#2] DEBUG o.e.c.s.dtls.DTLSSession
    2024-09-03T08:33:14.591719922Z - Updated receive window with sequence number [58]: new upper boundary [58], new bit vector [11111111111111111111111111111111111111111111111111111111111]
    2024-09-03T08:33:14.593289049Z [CoapServer(main)#1] DEBUG o.e.l.s.r.RegistrationHandler
    2024-09-03T08:33:14.593300720Z - Updated registration org.eclipse.leshan.server.registration.UpdatedRegistration@xxxx by RegistrationUpdate [registrationId=<REG_ID>, identity=Identity /<IP_ADDRESS>:<PORT>[psk=<MY_DEVICE>], lifeTimeInSec=null, smsNumber=null, bindingMode=null, objectLinks=null]
    2024-09-03T08:33:14.593306013Z [CoapServer(main)#1] DEBUG c.q.iot.servlet.EventServlet
    2024-09-03T08:33:14.593310286Z - Dispatching UPDATED event from endpoint <MY_DEVICE>
    2024-09-03T08:33:20.035205405Z [DTLS-Worker-0.0.0.0/0.0.0.0:5684#4] DEBUG o.e.c.s.dtls.DTLSSession
    2024-09-03T08:33:20.035237700Z - Updated receive window with sequence number [2880]: new upper boundary [2880], new bit vector [1111111111111111111111111111111111111111111111111111111111111111]
    2024-09-03T08:33:20.039082463Z [CoapServer(main)#1] DEBUG o.e.l.s.r.RegistrationHandler
    2024-09-03T08:33:20.039098124Z - Updated registration org.eclipse.leshan.server.registration.UpdatedRegistration@xxxx by RegistrationUpdate [registrationId=<REG_ID>, identity=Identity /<IP_ADDRESS>:<PORT>[psk=<OTHER_DEVICE>], lifeTimeInSec=null, smsNumber=null, bindingMode=null, objectLinks=null]
    2024-09-03T08:33:20.039103674Z [CoapServer(main)#1] DEBUG c.q.iot.servlet.EventServlet
    2024-09-03T08:33:20.039107919Z - Dispatching UPDATED event from endpoint <OTHER_DEVICE>
    2024-09-03T08:33:39.395464146Z [DTLS-Worker-0.0.0.0/0.0.0.0:5684#5] DEBUG o.e.c.s.dtls.DTLSSession
    2024-09-03T08:33:39.395505894Z - Updated receive window with sequence number [2881]: new upper boundary [2881], new bit vector [1111111111111111111111111111111111111111111111111111111111111111]
    2024-09-03T08:33:39.398320629Z [CoapServer(main)#1] DEBUG o.e.l.s.r.RegistrationHandler
    2024-09-03T08:33:39.398336298Z - Updated registration org.eclipse.leshan.server.registration.UpdatedRegistration@xxxx by RegistrationUpdate [registrationId=<REG_ID>, identity=Identity /<IP_ADDRESS>:<PORT>[psk=<OTHER_DEVICE>], lifeTimeInSec=null, smsNumber=null, bindingMode=null, objectLinks=null]
    2024-09-03T08:33:39.398341549Z [CoapServer(main)#1] DEBUG c.q.iot.servlet.EventServlet
    2024-09-03T08:33:39.398345621Z - Dispatching UPDATED event from endpoint <OTHER_DEVICE>
    2024-09-03T08:33:58.828434694Z [DTLS-Worker-0.0.0.0/0.0.0.0:5684#1] DEBUG o.e.c.s.dtls.DTLSSession
    2024-09-03T08:33:58.828468411Z - Updated receive window with sequence number [2882]: new upper boundary [2882], new bit vector [1111111111111111111111111111111111111111111111111111111111111111]
    2024-09-03T08:33:58.830074886Z [CoapServer(main)#1] DEBUG o.e.l.s.r.RegistrationHandler
    2024-09-03T08:33:58.830089843Z - Updated registration org.eclipse.leshan.server.registration.UpdatedRegistration@xxxx by RegistrationUpdate [registrationId=<REG_ID>, identity=Identity /<IP_ADDRESS>:<PORT>[psk=<OTHER_DEVICE>], lifeTimeInSec=null, smsNumber=null, bindingMode=null, objectLinks=null]
    2024-09-03T08:33:58.830095358Z [CoapServer(main)#1] DEBUG c.q.iot.servlet.EventServlet
    2024-09-03T08:33:58.830099536Z - Dispatching UPDATED event from endpoint <OTHER_DEVICE>
    2024-09-03T08:34:18.188468784Z [DTLS-Worker-0.0.0.0/0.0.0.0:5684#3] DEBUG o.e.c.s.dtls.DTLSSession
    2024-09-03T08:34:18.188500726Z - Updated receive window with sequence number [2883]: new upper boundary [2883], new bit vector [1111111111111111111111111111111111111111111111111111111111111111]
    2024-09-03T08:34:18.190197737Z [CoapServer(main)#1] DEBUG o.e.l.s.r.RegistrationHandler
    2024-09-03T08:34:18.190213112Z - Updated registration org.eclipse.leshan.server.registration.UpdatedRegistration@xxxx by RegistrationUpdate [registrationId=<REG_ID>, identity=Identity /<IP_ADDRESS>:<PORT>[psk=<OTHER_DEVICE>], lifeTimeInSec=null, smsNumber=null, bindingMode=null, objectLinks=null]
    2024-09-03T08:34:18.190218550Z [CoapServer(main)#1] DEBUG c.q.iot.servlet.EventServlet
    2024-09-03T08:34:18.190222728Z - Dispatching UPDATED event from endpoint <OTHER_DEVICE>
    2024-09-03T08:34:37.628483161Z [DTLS-Worker-0.0.0.0/0.0.0.0:5684#2] DEBUG o.e.c.s.dtls.DTLSSession
    2024-09-03T08:34:37.628520857Z - Updated receive window with sequence number [2884]: new upper boundary [2884], new bit vector [1111111111111111111111111111111111111111111111111111111111111111]
    2024-09-03T08:34:37.630136697Z [CoapServer(main)#1] DEBUG o.e.l.s.r.RegistrationHandler
    2024-09-03T08:34:37.630152299Z - Updated registration org.eclipse.leshan.server.registration.UpdatedRegistration@xxxx by RegistrationUpdate [registrationId=<REG_ID>, identity=Identity /<IP_ADDRESS>:<PORT>[psk=<OTHER_DEVICE>], lifeTimeInSec=null, smsNumber=null, bindingMode=null, objectLinks=null]
    2024-09-03T08:34:37.630166175Z [CoapServer(main)#1] DEBUG c.q.iot.servlet.EventServlet
    2024-09-03T08:34:37.630170366Z - Dispatching UPDATED event from endpoint <OTHER_DEVICE>
    2024-09-03T08:34:47.599038933Z [Presence Service] DEBUG c.q.iot.servlet.EventServlet
    2024-09-03T08:34:47.599071673Z - Dispatching SLEEPING event from endpoint <MY_DEVICE>
    2024-09-03T08:34:57.068415605Z [DTLS-Worker-0.0.0.0/0.0.0.0:5684#4] DEBUG o.e.c.s.dtls.DTLSSession
    2024-09-03T08:34:57.068446187Z - Updated receive window with sequence number [2885]: new upper boundary [2885], new bit vector [1111111111111111111111111111111111111111111111111111111111111111]
    2024-09-03T08:34:57.070463768Z [CoapServer(main)#1] DEBUG o.e.l.s.r.RegistrationHandler
    We can see that Leshan considers the device to have fallen asleep 93s after its last (successful) registration update.
    We also managed to capture logs through Cellular Monitor. How would you like us to send the .mtrace file? We would rather not do it here in the forum, since these logs might contain sensitive data.
    In the meantime, we will try the other config values that you suggested  
  • I see that you still have these quite agressive power saving settings enabled:


    Request active time of 30s, then sleep for one hour.
    AT+CPSMS=1,,,""00000110"",""00001111"""

    Request eDRX cycle of 10.24s.
    AT+CEDRXS=2,4,""0001"""

    Request paging time window of 1.28s.
    AT%XPTW=5,""0000"""
    AT%XPTW=4,""0000"""
     
    I don't think this is a root case, but it may cause timeout issues.
    If you are willing to share the .pcap or .mtrace file, you can send me an email. firstname.surname at nordicsemi.no.
    first= seppo, surname=takalo.
  • Thank you  

    # Configure PSM mode
    CONFIG_LTE_PSM_REQ=y
    # Request periodic TAU of 3600 seconds (60 minutes)
    CONFIG_LTE_PSM_REQ_RPTAU="00000110"

    # Set Requested Active Time (RAT) to 30 seconds. Preferably same as the
    # configured LWM2M_QUEUE_MODE_UPTIME. Due to NAT/firewall UDP connections are usually
    # closed within 30-60 seconds so there is in general no point in setting a longer
    # Queue mode uptime / LTE PSM active time.
    CONFIG_LTE_PSM_REQ_RAT="00001111"

    # Request eDRX mode
    CONFIG_LTE_EDRX_REQ=y

    # Requested eDRX cycle length for LTE-M and Nb-IoT
    # This should be fine-tuned for the network and the chosen server.
    # Lowest value is the most responsive, but uses more energy during the active eDRX period.
    # Longer period may cause more CoAP packet drops on server requests.
    # "0000" is 5.12 s
    # "0001" is 10.24 s
    # "0010" is 20.48 s.
    CONFIG_LTE_EDRX_REQ_VALUE_LTE_M="0001"
    Do you think we could try to do anything more than disable PSM mode as well as eDRX (and the other flags that you suggested earlier)? What other values concerning power saving could we disable?
Related