nRF9160 modem fault during file download

I am debugging firmware and file downloads on a custom nRF9160 board running the LWM2M client, and we have recently noticed that the modem firmware occasionally appears to crash. This triggers a LWM2M_CARRIER_EVENT_LTE_POWER_OFF event to shut the modem down. I have opened a separate ticket about the proper way to handle that event and recover, but here I would like some help debugging why the modem is faulting in the first place. I have not been able to pinpoint the cause because the failure is not consistent (and sometimes does not happen at all).

Below is a snippet from the nRF9160 logs with the fault that occurs, and I have included the modem lib trace in this post as well. 

As far as HW/SW goes, the unit is:

NCS 2.3.0

Modem FW: mfw_nrf9160_1.3.5 

HW: nRF9160 SICA B1 2126E4

SIM: AT&T currently

[00:11:17.760,253] <inf> nrf_modem: connect() fd 0x3
[00:11:17.760,345] <inf> nrf_modem: sa_family 0x1, destaddr_len 0x4, destport 443
[00:11:17.872,589] <inf> nrf_modem: RPC_IP_CONNECT_RES fd 0x3, result RPC_IP_ERR_OK
[00:11:17.872,741] <dbg> nrf_modem: Waiting for handshake semaphore
[00:11:17.872,772] <inf> nrf_modem: Attaching sock fd 0x3
[00:11:17.872,863] <dbg> nrf_modem: Hostname: none
[00:11:17.872,894] <dbg> nrf_modem: role 0x2, verify 0x2, cache 0x0, tags count 1
[00:11:17.872,955] <dbg> nrf_modem: tag[0]: 8675309
[00:11:17.902,069] <inf> nrf_modem: RPC_IP_TLS_ATTACH_RES fd 0x3, result RPC_IP_ERR_OK
[00:11:20.112,945] <err> nrf_modem: Modem fault occurred, gpmem1: 0x10004, gmpem2: 0xe6910
[00:11:20.112,976] <err> nrf_modem: Modem error: 0x4, PC: 0xe6910
[00:11:20.113,067] <err> download_client: Unable to connect, errno 110
[00:11:20.113,098] <inf> nrf_modem: close() fd 0x3
[00:11:20.113,159] <dbg> nrf_modem: Socket was handshaking, releasing semaphore
[00:11:20.113,220] <inf> app_update: Failed to connect, err -110

RTTLogger_Channel_modem_trace_modemFault_231229.log

  • Hello, and happy new year! 

    I have been assigned this ticket as well. Thanks for sharing the logs.  

    I will have our carrier lib developers have a look at this. 

    Kind regards,
    Øyvind

  • Hi Øyvind,

    Just some follow-up on this: I think I may have at least some place to look. I swapped back to mfw_1.3.4 because, although SDK 2.3.0 is compatible with 1.3.5 per the modem firmware compatibility chart, the carrier certifications for the LWM2M carrier library do not list mfw_1.3.5, the LWM2M carrier library, and SDK 2.3.0 together.

    I did not see the crash when I ran it on a Verizon SIM with 1.3.4 (though this was just one test, so I would not call it statistically significant). With AT&T I ran my file download on 1.3.4, and while I am not seeing the modem faults, the connection stability for the download looks much worse. In particular, when trying to connect to the server, I see more errors, specifically in the TLS handshake. I have included the modem traces for reference (the successful Verizon one and the unsuccessful AT&T ones). What I am seeing is that it always seems to start off fine at power-up and the downloads start, but as soon as the connection fails once, the failures seem to continue indefinitely. If I restart the board, it comes back up and works the way I would expect (at least until it hits the error again), so it almost seems like the LWM2M client gets into a locked-up state, which may cause a crash in some cases?

    It sounds like it may be 2 different issues, but maybe there is some information there that is helpful.


    RTTLogger_Channel_modem_trace_vzw_1.3.4_240102.log
    RTTLogger_Channel_modem_trace_ATT_240102.log
    RTTLogger_Channel_modem_trace_att_failure2_240102.log

  • Hello, I do not have an answer for you yet. Our developers are looking into this. I was also forwarded your email correspondence with our US sales team, and will provide an answer by tomorrow, Friday.

    SpencerMougey said:
    I swapped back to mfw_1.3.4 because, although SDK 2.3.0 is compatible with 1.3.5 per the modem firmware compatibility chart, the carrier certifications for the LWM2M carrier library do not list mfw_1.3.5, the LWM2M carrier library, and SDK 2.3.0 together.

    Yes, for Verizon the latest certified combination is modem fw 1.3.5 and nRF Connect SDK v2.4.2, but for AT&T it was mfw 1.3.4 and NCS 2.3.0. Note that AT&T has removed its device management, and with it the need for the LwM2M Carrier library. I will need to discuss with our carrier lib developers how to handle multiple carriers / software paths.

    Kind regards,
    Øyvind

  • Hello again, here is the reply I got from our carrier library developers based on the question in the email.

    CONFIG_LWM2M_CARRIER_ATT=n will disable AT&T support in the LwM2M Carrier library. There is no need for 2 software paths for the initialization: LWM2M_CARRIER_EVENT_INIT is sent regardless of the operator, and the event mostly concerns the modem and some internal initialization. Disabling AT&T support means that the LwM2M Carrier library will not attempt to connect to AT&T servers if it detects that the device is running in an AT&T network; it will simply remain idle until a change occurs (e.g. registering in a different network).

    From what I see, this is fixed in the later release (NCS 2.4.2). There, the modem library and carrier library have been decoupled, so that you don't need 2 software paths. If you have to stick to the current version (NCS 2.3.0), I will ask the developers for advice.

    Judging by the compatibility table, NCS 2.4.2 should work fine since AT&T has discontinued their device management. (Verizon and T-Mobile are both certified with NCS 2.4.2)

    To read out the operator, you may use the AT%XOPERID AT command as described in the AT command manual (see https://docs.nordicsemi.com/bundle/ref_at_commands/page/REF/at_commands/nw_service/xoperid_set.html). Keep in mind that for this purpose you will need to either set the modem to normal functional mode (CFUN=1) or activate the SIM (CFUN=41).
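
    As a rough illustration of the operator readout described above, here is a minimal sketch of pulling the numeric operator ID out of an AT%XOPERID response string. It assumes the response has the form "%XOPERID: <oper_id>" followed by "OK"; the helper name parse_xoperid is hypothetical and not part of any Nordic API. How the response string is obtained from the modem is left out, and the mapping from ID to operator should be checked against the AT command manual.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a Nordic API): extract <oper_id> from an
 * AT%XOPERID response, assumed to look like "%XOPERID: 1\r\nOK\r\n".
 * Returns 0 and stores the ID in *oper_id on success, -1 otherwise.
 */
int parse_xoperid(const char *resp, int *oper_id)
{
    static const char prefix[] = "%XOPERID:";
    const char *p;
    char *end;
    long val;

    if (resp == NULL || oper_id == NULL) {
        return -1;
    }

    /* Locate the notification prefix; an "ERROR" response has none. */
    p = strstr(resp, prefix);
    if (p == NULL) {
        return -1;
    }
    p += sizeof(prefix) - 1;

    /* strtol skips leading whitespace and stops at the first non-digit. */
    errno = 0;
    val = strtol(p, &end, 10);
    if (end == p || errno != 0 || val < 0 || val > INT_MAX) {
        return -1;
    }

    *oper_id = (int)val;
    return 0;
}
```

    The application could then branch on the returned ID, for example to decide whether the carrier library is expected to stay idle on the current network. Which operator each ID denotes should be taken from the AT command manual rather than from this sketch.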

  • Hi Øyvind,

    I will work on upgrading from 2.3.0 to 2.4.2 and report back since that sounds like the correct path forward. I would rather not try to hack around the limitations in 2.3.0.
