This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

nrf9160 modem not responding after switching to offline mode

My application uses the nrf9160 in eDRX mode. I regularly have the problem, that when switching the modem to OFFLINE mode (same for POWER-OFF mode), the modem is not responding anymore to any AT commands. I need to switch to OFFLINE mode for battery saving and for switching between Cat-M1 and Cat-NB1.

I wrote a minimalistic application for the demo board (PCA10090 0.8.5, modem Firmware 1.0.0), where I can reproduce the problem.

First the modem is configured to LTE Cat-M1 mode with an eDRX intervall of 82 seconds. Then it is set so ONLINE mode. After a connection is established or a timeout of 100 seconds expires, the modem is set to OFFLINE mode. Then this sequence is repeated.

For the application to run, the AT Command Driver and Logging with synchronous processing must be configured.

Here is the application code for the demo board:

#include <zephyr.h>
#include <stdio.h>
#include <stdlib.h>
#include <uart.h>
#include <string.h>
#include <logging/log.h>
#include <at_cmd.h>

LOG_MODULE_REGISTER(main, LOG_LEVEL_DBG);

volatile int regStatus = 0;

static void modem_NotificationHandler(char* msg)
{
  LOG_DBG("%s", msg);
  if (memcmp(msg, "+CEREG:", strlen("+CEREG:")) == 0)
  {
    char* p = msg + strlen("+CEREG:");
    regStatus = strtol(p, NULL, 10);
  }
}

void main(void)
{
  const u32_t timeout = 100;
  bool connected;
  u32_t t = 0;

	LOG_DBG("application started");

  at_cmd_set_notification_handler(modem_NotificationHandler);
  at_cmd_write("AT+CEREG=3", NULL, 0, NULL);

  while (1)
  {
    regStatus = 0;
    LOG_DBG("configuring modem");
    at_cmd_write("AT%XSYSTEMMODE=1,0,0,0", NULL, 0, NULL); // LTE Cat-M1
    at_cmd_write("AT+CEDRXS=2,4,\"0101\"", NULL, 0, NULL); // 81.92 seconds
    LOG_DBG("modem configured");
    at_cmd_write("AT+CFUN=1", NULL, 0, NULL);
    LOG_DBG("connecting . . .");
    t = 0;
    connected = false;
    while (!connected && t++ < timeout)
    {
      k_sleep(1000);
      connected = regStatus == 1 || regStatus == 5;
    }
    if (connected)
    {
      LOG_DBG("connected");
      k_sleep(10000);
    }
    else
    {
      LOG_DBG("connection timed out");
    }
    LOG_DBG("setting modem to OFFLINE");
    at_cmd_write("AT+CFUN=4", NULL, 0, NULL);
    LOG_DBG("modem set to OFFLINE");
  }
}

After the second connection cycle, when setting the modem to OFFLINE mode, the modem is not responding anymore.

Here is the output:

***** Booting Zephyr OS build v1.14.99-ncs3-snapshot2-1276-g4493a423a645 *****
[00:00:00.365,661] <dbg> main.main: application started
[00:00:00.371,734] <dbg> main.main: configuring modem
[00:00:00.379,455] <dbg> main.main: modem configured
[00:00:00.415,435] <dbg> main.main: connecting . . .
[00:00:30.800,292] <dbg> main.modem_NotificationHandler: +CEREG: 2,"0328","010D2B08",7,0,0

[00:01:05.904,235] <dbg> main.modem_NotificationHandler: +CEREG: 5,"0328","010D2B08",7

[00:01:05.912,994] <dbg> main.modem_NotificationHandler: +CEDRXP: 4,"0101","0101","0000"

[00:01:06.426,910] <dbg> main.main: connected
[00:01:16.431,854] <dbg> main.main: setting modem to OFFLINE
[00:01:16.592,407] <dbg> main.modem_NotificationHandler: +CEREG: 0,"0328","010D2B08",7,0,0

[00:01:18.002,746] <dbg> main.main: modem set to OFFLINE
[00:01:18.008,575] <dbg> main.main: configuring modem
[00:01:18.022,521] <dbg> main.main: modem configured
[00:01:18.064,727] <dbg> main.main: connecting . . .
[00:01:19.749,694] <dbg> main.modem_NotificationHandler: +CEREG: 2,"0328","01158B05",7,0,0

[00:02:58.079,345] <dbg> main.main: connection timed out
[00:02:58.085,144] <dbg> main.main: setting modem to OFFLINE

Any suggestions?

Parents
  • modem Firmware 1.0.0

    There is a MFW 1.0.1 out now for a while.  I don't know of any specific mentions in the changelog related to this, but you should probably try it out just in case.

    And I agree that this should not happen when you go OFFLINE and back, but it's actually the behavior I would expect going to POWER-OFF and back.  Nordic has confirmed that once you put a modem in POWER OFF mode you shouldn't expect it to do *anything* until after the next reboot.  Putting the modem into POWER OFF is a one-way trip.

    EDIT: We are using EDRX in our application and can go online/offline repeatedly without issue.  I'm wondering if it's somehow related to the CEDRXS or XSYSTEMMODE swtiching while offline.  We rarely do either.  The CEDRXS you can actually do while online.  One possibly related issue I've seen is that if you have other sockets open to the modem for things like TCP/UDP traffic and are not servicing them, they can block the whole subsystem and cause the AT socket to block.  But it doesn't look like you have other sockets open in this case...

  • I just checked, I am using MFW 1.0.1 on our custom board and 1.0.0 on the demo board.

    I now updated also the demo board to 1.0.1, and I still see the same behavior. Thanks for pointing that out!

    According to the AT Command Manual, the modem must be switched to OFFLINE when the XSYSTEMMODE is changed. I guess the issue is somehow connected to the eDRX mode. I also tried the CEDRX command after going ONLINE, but I does not make a difference.

    In my simplified program I do not open any other sockets as far as I see. And I see the same behavior also with the AT_CLIENT sample.

    Another Observation maybe worth noting is that for some time I still get CEREG notifications occasionally in multiples of the eDRX interval (82 seconds). So the modem is still running somehow and occasionally switches to a different cell, but is not responding…

Reply
  • I just checked, I am using MFW 1.0.1 on our custom board and 1.0.0 on the demo board.

    I now updated also the demo board to 1.0.1, and I still see the same behavior. Thanks for pointing that out!

    According to the AT Command Manual, the modem must be switched to OFFLINE when the XSYSTEMMODE is changed. I guess the issue is somehow connected to the eDRX mode. I also tried the CEDRX command after going ONLINE, but I does not make a difference.

    In my simplified program I do not open any other sockets as far as I see. And I see the same behavior also with the AT_CLIENT sample.

    Another Observation maybe worth noting is that for some time I still get CEREG notifications occasionally in multiples of the eDRX interval (82 seconds). So the modem is still running somehow and occasionally switches to a different cell, but is not responding…

Children
  • In my simplified program I do not open any other sockets as far as I see. And I see the same behavior also with the AT_CLIENT sample.

    Can you run AT%CESQ=1 to enable RSRP reports before trying it again in at_client?  I'm wondering if your modem is right on the fringe of unusable signal strength.  That still shouldn't be causing this issue, but it would be interesting to know.

  • Yes, signal strength is rather low in here. I get RSRP values of 24 and 28, that's around -120 dBm. But anyway, that shouldn't not hang up the modem FW... Maybe I should record a modem trace, so a Nordic engineer can look into what's going on

  • Hi,

     

    Cycling between CFUN=1 and CFUN=4 shouldn't be a problem. But, as mentions, please update to the latest modem fw on all boards.

    I have seen behavior like this in LTE Link Monitor, if I get many debug prints incoming and also sending commands back-and-forth, where the GUI can drag on, but haven't seen this sort of behavior when using a terminal (putty, teraterm etc).

    I would recommend checking the return value for the at_cmd calls, in case any of them fail.

    Also try setting CONFIG_LOG=y, and CONFIG_AT_CMD_LOG_LEVEL_DBG=y to get debug logs from the at_cmd module.

     

    Kind regards,

    Håkon

  • Also in Tera Term the following sequence of AT commands results in modem hang-up:

    AT+CFUN?
    AT+CEREG=3
    AT%CESQ=1
    AT%XSYSTEMMODE=1,0,0,0
    AT+CEDRXS=2,4,"0101"
    AT+CFUN=1
    <<< wait until +CEREG: 1 (or 5) shows >>>
    AT+CFUN=4
    AT+CFUN=1
    <<< wait until +CEREG: 2 shows >>>
    AT+CFUN=4
    

    AT_CMD debug messages in the application do not show anything out of the ordinary...

    Could you please try to reproduce it? Maybe you can see what's happening here.

    Our use case does not allow power cycling the device, so a hung up modem is a No-Go for our application. We really need to know what's going on here and how to prevent modem hang-up.

  • Yep.  This is surprisingly easy to reproduce.  I rebuilt at_client with debug logging enabled, loaded it on my dev kit, and did this on my second try (leftmost timestamps from TeraTerm):

    [2019-10-04 12:54:35.031] ***** Booting Zephyr OS build zephyr-v2.0.0-875-gd6e67554cfeb *****
    [2019-10-04 12:54:35.031] The AT host sample started
    [2019-10-04 12:54:36.016] [00:00:00.270,935] <dbg> at_cmd.at_cmd_driver_init: Common AT socket created
    [2019-10-04 12:54:36.016] [00:00:00.270,965] <dbg> at_cmd.at_cmd_driver_init: Common AT socket processing thread created
    [2019-10-04 12:54:36.032] [00:00:00.273,437] <dbg> at_cmd.socket_thread_fn: AT socket thread started
    [2019-10-04 12:54:37.329] AT
    [2019-10-04 12:54:37.846] OK
    [2019-10-04 12:54:38.033] [00:00:03.070,739] <dbg> at_cmd.at_write: Sending command AT
    [2019-10-04 12:54:38.049] 
    [2019-10-04 12:54:38.049] [00:00:03.079,467] <dbg> at_cmd.at_write: Bytes sent: 3
    [2019-10-04 12:54:53.861] AT+CFUN?
    [2019-10-04 12:54:53.877] +CFUN: 4
    [2019-10-04 12:54:53.877] OK
    [2019-10-04 12:54:54.049] [00:00:19.108,978] <dbg> at_cmd.at_write: Sending command AT+CFUN?
    [2019-10-04 12:54:54.064] 
    [2019-10-04 12:54:54.064] [00:00:19.117,828] <dbg> at_cmd.at_write: Bytes sent: 9
    [2019-10-04 12:55:00.743] AT+CEREG=3
    [2019-10-04 12:55:00.758] OK
    [2019-10-04 12:55:01.071] [00:00:25.990,020] <dbg> at_cmd.at_write: Sending command AT+CEREG=3
    [2019-10-04 12:55:01.071] 
    [2019-10-04 12:55:01.071] [00:00:25.998,840] <dbg> at_cmd.at_write: Bytes sent: 11
    [2019-10-04 12:55:05.292] AT%CESQ=1
    [2019-10-04 12:55:05.308] OK
    [2019-10-04 12:55:06.074] [00:00:30.533,386] <dbg> at_cmd.at_write: Sending command AT%CESQ=1
    [2019-10-04 12:55:06.090] 
    [2019-10-04 12:55:06.090] [00:00:30.542,175] <dbg> at_cmd.at_write: Bytes sent: 10
    [2019-10-04 12:55:10.404] AT%XSYSTEMMODE=1,0,0,0
    [2019-10-04 12:55:10.420] OK
    [2019-10-04 12:55:11.092] [00:00:35.654,785] <dbg> at_cmd.at_write: Sending command AT%XSYSTEMMODE=1,0,0,0
    [2019-10-04 12:55:11.107] 
    [2019-10-04 12:55:11.107] [00:00:35.665,130] <dbg> at_cmd.at_write: Bytes sent: 23
    [2019-10-04 12:55:15.952] AT+CEDRXS=2,4,"0101"
    [2019-10-04 12:55:15.952] OK
    [2019-10-04 12:55:16.108] [00:00:41.190,765] <dbg> at_cmd.at_write: Sending command AT+CEDRXS=2,4,"0101"
    [2019-10-04 12:55:16.108] 
    [2019-10-04 12:55:16.108] [00:00:41.199,645] <dbg> at_cmd.at_write: Bytes sent: 21
    [2019-10-04 12:55:21.844] AT+CFUN=1
    [2019-10-04 12:55:21.891] OK
    [2019-10-04 12:55:22.125] [00:00:47.093,719] <dbg> at_cmd.at_write: Sending command AT+CFUN=1
    [2019-10-04 12:55:22.125] 
    [2019-10-04 12:55:22.125] [00:00:47.133,117] <dbg> at_cmd.at_write: Bytes sent: 10
    [2019-10-04 12:55:31.753] AT
    [2019-10-04 12:55:32.065] OK
    [2019-10-04 12:55:32.127] [00:00:57.311,767] <dbg> at_cmd.at_write: Sending command AT
    [2019-10-04 12:55:32.143] 
    [2019-10-04 12:55:32.143] [00:00:57.312,103] <dbg> at_cmd.at_write: Bytes sent: 3
    [2019-10-04 12:56:54.363] %CESQ: 34,1
    [2019-10-04 12:56:54.363] +CEREG: 2,"0411","00F63F11",7,0,0
    [2019-10-04 12:56:56.019] +CEREG: 5,"0411","00F63F11",7
    [2019-10-04 12:56:56.035] +CEDRXP: 4,"0101","0101","0000"
    [2019-10-04 12:57:07.841] AT+CFUN=4
    [2019-10-04 12:57:08.153] [00:02:33.082,122] <dbg> at_cmd.at_write: Sending command AT+CFUN=4
    [2019-10-04 12:57:08.153] 
    [2019-10-04 12:58:05.442] 
    [2019-10-04 12:58:05.886] at
    [2019-10-04 12:58:09.858] AT
    [2019-10-04 12:58:10.427] AT

    In my case, it actually froze up when I was trying to send it the CFUN=4 to disconnect.  Based on the log messages, it looks like the call to send() the command over the socket never returned.  (I let it sit for tens of minutes...)

    I've never been able to prove it, but I've always been suspicious that the AT send command will lockup if there is data coming back the other way from a modem status message at exactly the wrong time.  Early on in our development, while we were still deciding which messages were useful or interesting, we enabled *every* unsolicited message for the modem and just logged everything.  Things used to lock up all the damn time.  Over time, we turned off most of the messages and things got more stable.  Admittedly, this was also as we were moving up through modem firmware versions, so it's likely the fixes in there also resolved most of the lockups, but I'm still suspicious...

Related