Help with LTE connection after loss of connection

We have noticed that several of our nRF9160 devices in production have problems reconnecting to the LTE-M network after temporarily losing the connection. We have tried to reproduce this issue by deactivating the SIM for about 10 seconds and then activating it again. After the default six retries, the device gives up and stops trying to reconnect to the network. The only way we can regain a connection is to call sys_reboot() or force a hard reboot with the watchdog. Why are the reconnect attempts not enough? Why do we have to force a reboot? Relevant logs are attached.

  • More info: 

    • We are using NCS version 2.6.0 and mfw 1.3.6.
    • The SIMs are global (EU, Nordics, Baltics); see the image of the subscription details. We have mainly worked with Telia SIMs, but some of our clients have tried others with no improvement.

  • Hi,

    Could you  and  please confirm that all your questions and logs are related to the same application (based on lwm2m_client in NCS v2.6.1)?

    Could you please provide more information about your application? What exactly do you try to achieve? What does your application do?

    > We have noticed that several of our nRF9160 in production have problem with reconnecting to the LTE-M network after temporarily losing connection.

    Can you provide more information on how devices lose connection? Do they work normally and suddenly lose connection? How often does this happen and on how many devices? Where are your failing devices located?

    > We have tried reproducing this issue by deactivating the sim for about 10 seconds and then activating it again. After the default six retries it gives up and stop trying to reconnect to the network.

    Why do you think that the issue might be related to SIM? How do you do deactivation/activation of the SIM?

    Matias Marti said:
    It looks like the EXCHANGE_LIFETIME is set to 247s (4 minutes 7s) in the code here. Is there any way we can modify this value? Or is there another way to "give up" the exchange earlier?

    Have you tried changing the value directly in the code?

    Best regards,
    Dejan
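
For context on the 247 s figure quoted above: it matches the default EXCHANGE_LIFETIME that RFC 7252 (section 4.8.2) derives from the CoAP transmission parameters. The sketch below reproduces that arithmetic with the RFC defaults. Whether the Zephyr LwM2M engine computes its constant the same way or simply hard-codes it is an assumption here, and the function name `exchange_lifetime` is purely illustrative.

```python
# CoAP transmission parameters, default values from RFC 7252 section 4.8.
ACK_TIMEOUT = 2.0         # seconds
ACK_RANDOM_FACTOR = 1.5
MAX_RETRANSMIT = 4
MAX_LATENCY = 100.0       # seconds, worst-case one-way delay assumed by the RFC


def exchange_lifetime(ack_timeout=ACK_TIMEOUT,
                      ack_random_factor=ACK_RANDOM_FACTOR,
                      max_retransmit=MAX_RETRANSMIT,
                      max_latency=MAX_LATENCY):
    """Return EXCHANGE_LIFETIME in seconds (RFC 7252 section 4.8.2)."""
    # Time from the first transmission to the last retransmission.
    max_transmit_span = ack_timeout * (2 ** max_retransmit - 1) * ack_random_factor
    # The RFC sets PROCESSING_DELAY equal to ACK_TIMEOUT.
    processing_delay = ack_timeout
    return max_transmit_span + 2 * max_latency + processing_delay


print(exchange_lifetime())  # 45 + 200 + 2 = 247.0
```

Most of the 247 s comes from the two MAX_LATENCY terms (200 s), so the retransmission parameters alone have limited influence on the result.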

  • Thank you  

    Yes, we are using our own Leshan server.

    https://github.com/eclipse-leshan/leshan/issues/1166

    I read through this issue, and I did not really understand how we would have to modify our Californium.properties file to support CID.

  • That ticket is quite old.

    The MAC calculation changed quite late in the development of RFC 9146, which also led to the use of a new Hello extension ID.

    Unfortunately, the mbedtls team wasn't able to adapt and update the implementation in time, hence the complicated workaround in that old issue.

    Today, for Californium, you only need to enable DTLS 1.2 CID with:

    # DTLS connection ID length. <blank> disabled, 0 enables support without
    # active use of CID.
    DTLS.CONNECTION_ID_LENGTH=6

    But I'm not sure what Leshan requires in order to handle the address changes in the other layers as well, so you may want to open a ticket there.

  • Try:

    # DTLS update address using CID on newer records.
    # Default: true
    DTLS.UPDATE_ADDRESS_USING_CID_ON_NEWER_RECORDS=true

    I don't see that `DTLS.CONNECTION_ID_LENGTH` setting in our Leshan server, so maybe it supports it by default, if the client requests Connection-ID.

    With the current Leshan, I have not seen any problems with Connection-ID. Even that "UPDATE_ADDRESS" setting seems to be on by default.

    You can verify your configuration by running one client against https://leshan.eclipseprojects.io/. It supports DTLS CID and updates the client IP when I do an LwM2M Update.

  • Thank you. We will try this. 

    Now that the server responds to registration updates immediately, the device keeps running longer.

    However, we consistently see that, after almost 50 minutes, the board restarts itself. See the logs:

    [00:48:13.524,230] <inf> net_lwm2m_rd_client: Update Done
    [00:48:32.525,146] <dbg> app_lwm2m_client: rd_client_event: Registration update started
    [00:48:32.892,547] <inf> net_lwm2m_rd_client: Update callback (code:2.4)
    [00:48:32.892,608] <dbg> app_lwm2m_client: rd_client_event: Registration update complete
    [00:48:32.892,608] <dbg> app_lwm2m_client: watchdog_Kick: Watchdog feed ok!
    [00:48:32.892,700] <inf> net_lwm2m_rd_client: Update Done
    [00:48:51.892,669] <dbg> app_lwm2m_client: rd_client_event: Registration update started
    [00:48:52.323,059] <inf> net_lwm2m_rd_client: Update callback (code:2.4)
    [00:48:52.323,089] <dbg> app_lwm2m_client: rd_client_event: Registration update complete
    [00:48:52.323,120] <dbg> app_lwm2m_client: watchdog_Kick: Watchdog feed ok!
    [00:48:52.323,211] <inf> net_lwm2m_rd_client: Update Done
    [00:49:11.324,127] <dbg> app_lwm2m_client: rd_client_event: Registration update started
    [00:49:11.685,546] <inf> net_lwm2m_rd_client: Update callback (code:2.4)
    [00:49:11.685,577] <dbg> app_lwm2m_client: rd_client_event: Registration update complete
    [00:49:11.685,607] <dbg> app_lwm2m_client: watchdog_Kick: Watchdog feed ok!
    [00:49:11.685,699] <inf> net_lwm2m_rd_client: Update Done
    [00:49:30.685,607] <dbg> app_lwm2m_client: rd_client_event: Registration update started
    [00:49:31.045,043] <inf> net_lwm2m_rd_client: Update callback (code:2.4)
    [00:49:31.045,074] <dbg> app_lwm2m_client: rd_client_event: Registration update complete
    [00:49:31.045,104] <dbg> app_lwm2m_client: watchdog_Kick: Watchdog feed ok!
    [00:49:31.045,196] <inf> net_lwm2m_rd_client: Update Done

    Could there be something outside of the lwm2m client or anything else that is causing a reboot after 50 min?
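
One way to quantify the cadence in the log excerpt above is to parse the Zephyr uptime stamps and compute the spacing between consecutive "Registration update started" lines. The following is only a diagnostic sketch: the helper names `parse_uptime` and `update_intervals` are hypothetical, and the timestamp format `[HH:MM:SS.mmm,uuu]` is assumed from the lines shown.

```python
import re
from datetime import timedelta

# Four lines copied from the log excerpt in this post.
LOG = """\
[00:48:32.525,146] <dbg> app_lwm2m_client: rd_client_event: Registration update started
[00:48:51.892,669] <dbg> app_lwm2m_client: rd_client_event: Registration update started
[00:49:11.324,127] <dbg> app_lwm2m_client: rd_client_event: Registration update started
[00:49:30.685,607] <dbg> app_lwm2m_client: rd_client_event: Registration update started
"""

# Assumed stamp layout: [hours:minutes:seconds.milliseconds,microseconds]
STAMP = re.compile(r"\[(\d+):(\d+):(\d+)\.(\d{3}),(\d{3})\]")


def parse_uptime(line):
    """Return the uptime of a log line as a timedelta, or None if no stamp."""
    match = STAMP.match(line.strip())
    if match is None:
        return None
    hours, minutes, seconds, millis, micros = map(int, match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds,
                     milliseconds=millis, microseconds=micros)


def update_intervals(log_text, marker="Registration update started"):
    """Return the seconds between consecutive lines that contain `marker`."""
    stamps = [parse_uptime(line) for line in log_text.splitlines() if marker in line]
    stamps = [stamp for stamp in stamps if stamp is not None]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


intervals = update_intervals(LOG)
print([round(value, 3) for value in intervals])
```

For the excerpt above this gives intervals of roughly 19.4 s, and the watchdog is fed after each completed update, so the excerpt itself shows no gap in watchdog feeding before the reset.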

    Here is our prj.conf file:

    # General config
    CONFIG_ASSERT=y
    CONFIG_REBOOT=y
    # Network
    CONFIG_NETWORKING=y
    CONFIG_NET_NATIVE=n
    CONFIG_NET_IPV6=n
    CONFIG_NET_IPV4=y
    CONFIG_NET_SOCKETS=y
    CONFIG_NET_SOCKETS_OFFLOAD=y
    # Sensors
    CONFIG_ADC=y
    CONFIG_SPI=y
    CONFIG_SPI_NRFX=y
    CONFIG_SENSOR=y
    CONFIG_I2C=y
    # LwM2M and IPSO
    CONFIG_LWM2M=y

  • > Try:

    >  # DTLS update address using CID on newer records.
    >  # Default: true
    >  # DTLS.UPDATE_ADDRESS_USING_CID_ON_NEWER_RECORDS=true

    The more complete documentation is in the javadoc of DtlsConfig:

    /**
     * Update the ip-address from DTLS 1.2 CID records only for newer records
     * based on epoch/sequence_number.
     *
     * @see <a href="www.rfc-editor.org/.../rfc9146.html"
     *      target="_blank">RFC 9146, Connection Identifiers for DTLS 1.2, 6.
     *      Peer Address Update</a>
     */

    In general, every CID record updates the address of the DTLS context (whether Leshan uses that as well is outside my scope). However, since records may arrive out of order, this could update the context to an outdated address. This setting therefore updates the address only for the newest record according to the DTLS record sequence number. In the vast majority of cases it makes no difference (because the record order rarely changes and the address does not change that quickly), and it is already on by default.

    > I don't see that `DTLS.CONNECTION_ID_LENGTH` setting in our Leshan server

    Therefore I recommend asking the Leshan project how they set that up. For Californium on its own, the setting is required, because in v3 the default is "off".
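
The "only newer records" rule described in this reply can be illustrated with a small model. This is a minimal sketch of the idea from RFC 9146 section 6, not Californium's implementation: the `DtlsContext` class, its fields, and the `only_newer` flag are hypothetical names, and the record is assumed to have already passed MAC verification before this logic runs.

```python
from dataclasses import dataclass


@dataclass
class DtlsContext:
    """Toy model of a DTLS context that tracks the peer's address."""
    peer_address: tuple
    # (epoch, sequence_number) of the newest authenticated record seen so far.
    newest_record: tuple = (0, -1)

    def on_cid_record(self, epoch, sequence_number, source_address, only_newer=True):
        """Process an authenticated CID record and return the resulting peer address."""
        record = (epoch, sequence_number)
        # Tuples compare lexicographically: epoch first, then sequence number.
        is_newer = record > self.newest_record
        if is_newer:
            self.newest_record = record
        # With only_newer=False every record moves the address, so a delayed
        # record could move it back to an outdated address.
        if is_newer or not only_newer:
            self.peer_address = source_address
        return self.peer_address


ctx = DtlsContext(peer_address=("10.0.0.1", 5684))
ctx.on_cid_record(1, 10, ("10.0.0.2", 40000))  # newer record: address follows the peer
ctx.on_cid_record(1, 9, ("10.0.0.1", 5684))    # delayed older record: address is kept
print(ctx.peer_address)  # ('10.0.0.2', 40000)
```

With `only_newer=False`, the second call would have moved the context back to the old address, which is the reordering case the setting is meant to avoid.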
