Network stability debug help

We are running into a stability issue with our LTE connection. Our nRF9160 makes a call to our backend every 5 minutes. On start-up, the device connects and starts making these calls. After some time (90 minutes to 3 days), the calls stop reaching our backend. Once we power cycle the device, the calls start coming in again. The 9160 application has the watchdog enabled and is not hanging, as I'm still seeing heartbeat printouts in RTT Viewer.

I'm getting the system set up to collect modem trace files with the nRF Connect Trace Collector. Are there any additional debug outputs I should be collecting to help diagnose this network connection issue?

We're using:

NCS v2.1.2

MFW v1.3.3

Zephyr 3.1.99

On the AT&T network

  • Hello, I apologize again for the late reply. 

    ERob said:
    Are you able to have this reviewed for any issues towards the end of the file?

    Looking through the logs, at the end of the file as you mention, there are attach rejects from the network. 

    Currently, three EPS Mobility Management (EMM) reject causes appear, as defined in 3GPP TS 24.301 Annex A:

    • Cause #9 – UE identity cannot be derived by the network.
      • This EMM cause is sent to the UE when the network cannot derive the UE's identity from the GUTI/S-TMSI/P-TMSI and RAI, e.g. no matching identity/context in the network, or failure to validate the UE's identity due to integrity check failure of the received message.
    • Cause #11 – PLMN not allowed
      • This EMM cause is sent to the UE if it requests service, or if the network initiates a detach request, in a PLMN where the UE, by subscription or due to operator determined barring, is not allowed to operate.
    • Cause #15 – No suitable cells in tracking area
      • This EMM cause is sent to the UE if it requests service, or if the network initiates a detach request, in a tracking area where the UE, by subscription, is not allowed to operate, but when it should find another allowed tracking area or location area in the same PLMN or an equivalent PLMN.

    Sounds like a question to bring back to AT&T. Is your device stationary?
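
    When going through several traces, it can help to keep the three reject causes listed above in a small lookup table. The following is only a minimal sketch: the cause numbers and names come from this reply, while the names `EMM_CAUSES` and `describe_emm_cause` are hypothetical helpers, not part of any Nordic tool.

```python
# EMM reject causes seen in this trace (numbers and names as listed in the thread).
EMM_CAUSES = {
    9: "UE identity cannot be derived by the network",
    11: "PLMN not allowed",
    15: "No suitable cells in tracking area",
}


def describe_emm_cause(code: int) -> str:
    """Return a readable label for an EMM cause code (hypothetical helper)."""
    name = EMM_CAUSES.get(code)
    if name is None:
        # Anything not in the table is reported as-is, for lookup in TS 24.301.
        return f"Cause #{code} (not in local table, see 3GPP TS 24.301 Annex A)"
    return f"Cause #{code}: {name}"


if __name__ == "__main__":
    for code in (9, 11, 15):
        print(describe_emm_cause(code))
```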

  • Thanks Oyvind,

    We have one more trace we captured from a different device. Can you review this to see if the same 3 EPS mobility management reject causes are provided? I'd like to confirm the 2 devices failed in the same way.

    Yes, our device is stationary.

    I'll bring these questions to AT&T.

    All the best,
    Eric

    RTT_0000575a9b9e4abe_9160_Start 20231130-1638.mtrace

  • Eric, my sincere apologies for the late reply. Thanks for reaching out to your RSM! I forgot to answer your last reply, but I forwarded it to our modem team the same day.

    I need to verify what the issue is in the last modem trace. It does look like the reject cause is 7 - EPS services not allowed. Will update within the day (Thursday Norwegian time). 

    Kind regards,
    Øyvind

  • Hi Eric, 

    Our modem team have been looking into the modem logs and provide the following feedback:

    The UE loses the AT&T cell as the coverage decreases or goes out of range. This can be due to e.g. interference. It takes the UE some time to get in touch with the AT&T cell. During this time, the UE attempts to connect to neighboring cells (both T-Mo and Vzw), and those reject the UE with different EMM causes depending on when and how the UE attempts the attach. We are still working on the issue and waiting for more feedback from our network experts.

    Kind regards,
    Øyvind

  • Here is an update from our modem team. First, from our carrier expert:

    The UE first tries T-Mo (311-490) and performs a TAU. Since the MME in the T-Mo network has not seen this UE before (the UE's identity from the GUTI/S-TMSI/P-TMSI is unknown), the MME responds with a TAU REJECT with “Cause: UE identity cannot be derived by the network (9)”.

    Next, the UE attempts an attach to the same cell on T-Mo (311-490) and receives an attach REJECT with “Cause: PLMN not allowed (11)”, likely because roaming in the T-Mo network is not allowed for this subscription.

    After this, the UE tries a new AT&T cell, but on a FirstNet PLMN, and gets a REJECT with “Cause: No suitable cells in tracking area (15)”. I believe the subscription has no FirstNet provisioned.

    The UE tries T-Mo again, in another cell, and gets another REJECT with “Cause: PLMN not allowed (11)”. Likely no roaming in the T-Mo network is allowed for this subscription.

    Then the UE tries Vzw (311-480) and, unsurprisingly, gets a REJECT with “Cause: PLMN not allowed (11)”. Likely no roaming in the Vzw network is allowed for this subscription.

    Then the UE tries the same AT&T cell that recently rejected it, but on the non-FirstNet PLMN, and this succeeds. Service is resumed.

    Based on the above the UE works as expected.
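
    To compare this sequence against the trace from the second device, the attempts described above can be written down as data and tallied. This is only an illustrative sketch: the network labels and cause codes are taken from the description above, and `ATTEMPTS` and `summarize` are hypothetical names.

```python
from collections import Counter

# (network as named in the thread, EMM reject cause, or None if the attempt succeeded)
ATTEMPTS = [
    ("T-Mo", 9),            # TAU REJECT
    ("T-Mo", 11),           # attach REJECT, same cell
    ("AT&T FirstNet", 15),  # new AT&T cell on a FirstNet PLMN
    ("T-Mo", 11),           # another T-Mo cell
    ("Vzw", 11),
    ("AT&T", None),         # same AT&T cell, non-FirstNet PLMN: success
]


def summarize(attempts):
    """Count rejects per EMM cause and find the index of the first success."""
    rejects = Counter(cause for _, cause in attempts if cause is not None)
    first_ok = next(
        (i for i, (_, cause) in enumerate(attempts) if cause is None), None
    )
    return rejects, first_ok


if __name__ == "__main__":
    rejects, first_ok = summarize(ATTEMPTS)
    print("rejects per cause:", dict(rejects))
    print("attempts before service resumed:", first_ok)
```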

    Then our network specialist answered:

    The root cause here seems to be RRC connection establishments suddenly failing. At least the mapped RSRP is very good at all times, and if the use case is a lock, then we're assuming the device doesn't move at all.

    One example of continuous lower layer failures in RRC connection establishment:

    08:47:26.976961  NAS_PDU_SERVICE_REQUEST [c75cb551]
    08:47:31.023805  ERRC_EST_REJ_s { header : { msg_id : ERRC_EST_REJ, sender : TASK_ERRC, receiver : TASK_EMMSM } }
    08:47:31.024019  NAS_PDU_SERVICE_REQUEST [c75cb551]
    08:47:33.034853  ERRC_EST_REJ_s { header : { msg_id : ERRC_EST_REJ, sender : TASK_ERRC, receiver : TASK_EMMSM } }
    08:47:33.035097  NAS_PDU_SERVICE_REQUEST [c75cb551]
    08:47:35.045930  ERRC_EST_REJ_s { header : { msg_id : ERRC_EST_REJ, sender : TASK_ERRC, receiver : TASK_EMMSM } } 
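
    To see how quickly each service request is rejected, the excerpt above can be parsed with a few lines of Python. This is a rough sketch that assumes the log is available as text in exactly the format shown here (a timestamp followed by the message name); the real export format of the trace tool may differ, and `reject_delays` is a hypothetical helper.

```python
import re
from datetime import datetime

# Excerpt copied from the trace lines quoted in this reply.
LOG = """\
08:47:26.976961  NAS_PDU_SERVICE_REQUEST [c75cb551]
08:47:31.023805  ERRC_EST_REJ_s { header : { msg_id : ERRC_EST_REJ, sender : TASK_ERRC, receiver : TASK_EMMSM } }
08:47:31.024019  NAS_PDU_SERVICE_REQUEST [c75cb551]
08:47:33.034853  ERRC_EST_REJ_s { header : { msg_id : ERRC_EST_REJ, sender : TASK_ERRC, receiver : TASK_EMMSM } }
08:47:33.035097  NAS_PDU_SERVICE_REQUEST [c75cb551]
08:47:35.045930  ERRC_EST_REJ_s { header : { msg_id : ERRC_EST_REJ, sender : TASK_ERRC, receiver : TASK_EMMSM } }
"""

# Timestamp (HH:MM:SS.microseconds) followed by the message name.
LINE_RE = re.compile(r"^\s*(\d{2}:\d{2}:\d{2}\.\d{6})\s+(\w+)")


def reject_delays(log_text):
    """Seconds between each service request and the establishment reject that follows it."""
    delays = []
    pending = None  # timestamp of the last unanswered service request
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%H:%M:%S.%f")
        name = match.group(2)
        if name == "NAS_PDU_SERVICE_REQUEST":
            pending = stamp
        elif name == "ERRC_EST_REJ_s" and pending is not None:
            delays.append((stamp - pending).total_seconds())
            pending = None
    return delays


if __name__ == "__main__":
    for delay in reject_delays(LOG):
        print(f"rejected after {delay:.3f} s")
```

    On this excerpt, the first request is rejected after roughly 4 seconds and the following two after roughly 2 seconds each, with each new request sent almost immediately after the previous reject.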

    Due to these failures, as per 3GPP, the modem attempts to connect to other networks and gets rejected as expected. In the end, the modem returns to AT&T. The lower layer failures have disappeared and everything seems to work smoothly.

    Routing to L1 for investigations.

    Kind regards,
    Øyvind
