NRF9160 SLM self reboot cause

Hello,

I'm using the SLM application with an NRF9160.  I'm sending AT commands for MQTT publish.  After ~10hours the device reboots.

(22:56:29.123) AT%XTEMP?
(22:56:29.123) AT#XMQTTPUB="sensor/255/battery/V","11.59",0,0
(22:56:29.162) CPU Temperature: 28 deg
(22:56:29.162) AT#XMQTTPUB="sensor/255/cpu_temp/C","28",0,0
(22:56:30.130) SLM: [11:00:38.999,267] <ESC>[0m<dbg> slm_at_host: cmd_send: RX
(22:56:30.130) SLM:                                       41 54 25 58 54 45 4d 50  3f                      |AT%XTEMP ?       <ESC>[0m
(22:56:30.156) SLM: [11:00:39.008,331] <ESC>[0m<dbg> slm_at_host: slm_at_send_indicate: TX
(22:56:30.176) SLM:                                       0d 0a 25 58 54 45 4d 50  3a 20 32 38 0d 0a 0d 0a |..%XTEMP : 28....
(22:56:30.176) SLM:                                       4f 4b 0d 0a                                      |OK..             <ESC>[0m
(22:56:30.194) SLM: [11:00:39.008,392] <ESC>[0m<dbg> slm_at_host: cmd_send: RX
(22:56:30.194) SLM:                                      41 54 23 58 4d 51 54 54  0 42 3d 22 73 65 6e |AT#XMQTT PUB="sen
(22:56:30.213) SLM:                                       73 6f 72 2f 32 35 35 2f   61 74 74 65 72 79 2f |sor/255/ battery/
(22:56:30.213) SLM:                                       56 22 2c 22 31 31 2e 35   22 2c 30 2c 30       |V","11.5 9",0,0  <ESC>[0m
(22:56:30.213) SLM: [11:00:39.015,869] <ESC>[0m<dbg> slm_at_host: slm_at_send_indicate:X
(22:56:30.213) SLM:                                       0d 0a 4f 4b 0d 0a                             |..OK..           <ESC>[0m
(22:56:30.237) SLM: [11:00:39.016,815] <ESC>[0m<dbg> slm_at_host: cmd_send: RX
(22:56:30.237) SLM:                                    41 54 23 58 4d 51 54 54  50 42 3d 22 73 65 6e |AT#XMQTT PUB="sen
(22:56:30.268) SLM:                                       73 6f 72 2f 32 35 35 2f  70 75 5f 74 65 6d 70 |sor/255/ cpu_temp
(22:56:30.268) SLM:                                       2f 43 22 2c 22 32 38 22   30 2c 30             |/C","28" ,0,0    <ESC>[0m
(22:56:30.268) SLM: [11:00:39.017,639] <ESC>[0m<dbg> slm_at_host: slm_at_send_indicate:X
(22:56:30.268) SLM:                                       0d 0a 4f 4b 0d 0a                             |..OK..           <ESC>[0m
(22:56:35.795) SLM: *** Booting nRF Connect SDK v2.8.0-a2386bfc8401 ***
(22:56:35.831) SLM: *** Using Zephyr OS v3.7.99-0bc3393fb112 ***
(22:56:35.831) SLM: [00:00:00.252,044] <ESC>[0m<inf> at_cmd_custom: Custom AT commands enabled with 70 entries.<ESC>[0m
(22:56:35.831) SLM: [00:00:00.252,075] <ESC>[0m<dbg> slm: main: RR: 0x00000000<ESC>[0m
(22:56:35.831) SLM: [00:00:00.257,873] <ESC>[0m<inf> fs_nvs: 2 Sectors of 4096 bytes<ESC>[0m
(22:56:35.831) SLM: [00:00:00.257,873] <ESC>[0m<inf> fs_nvs: alloc wra: 0, fe8<ESC>[0m
(22:56:35.854) SLM: [00:00:00.257,873] <ESC>[0m<inf> fs_nvs: data wra: 0, 0<ESC>[0m
(22:56:36.227) SLM: Ready
(22:56:36.227) SLM: [00:00:00.491,821] <ESC>[0m<inf> slm: lib_modem init: 0<ESC>[0m
(22:56:36.249) SLM: [00:00:00.491,912] <ESC>[0m<inf> mcuboot_util: Image index: 0, Swap type: none<ESC>[0m
(22:56:36.249) SLM: [00:00:00.491,912] <ESC>[0m<inf> slm: Serial LTE Modem<ESC>[0m
(22:56:36.275) SLM: [00:00:00.585,327] <ESC>[0m<inf> nrf_cloud_info: Device ID: 504d5632-3736-4649-808b-2321133ae3a6<ESC>[0m
(22:56:36.275) SLM: [00:00:00.590,576] <ESC>[0m<inf> nrf_cloud_info: IMEI:      350457793853522<ESC>[0m
(22:56:36.275) SLM: [00:00:00.682,861] <ESC>[0m<inf> nrf_cloud_info: UUID:      504d5632-3736-4649-808b-2321133ae3a6<ESC>[0m
(22:56:36.275) SLM: [00:00:00.688,598] <ESC>[0m<inf> nrf_cloud_info: Modem FW:  mfw_nrf9160_1.3.7<ESC>[0m
(22:56:36.307) SLM: [00:00:00.688,629] <ESC>[0m<inf> nrf_cloud_info: Protocol:          MQTT<ESC>[0m
(22:56:36.307) SLM: [00:00:00.688,659] <ESC>[0m<inf> nrf_cloud_info: Download protocol: HTTPS<ESC>[0m
(22:56:36.307) SLM: [00:00:00.688,690] <ESC>[0m<inf> nrf_cloud_info: Sec tag:           16842753<ESC>[0m
(22:56:36.307) SLM: [00:00:00.688,720] <ESC>[0m<inf> nrf_cloud_info: Host name:         mqtt.nrfcloud.com<ESC>[0m
(22:56:36.335) SLM: [00:00:00.688,812] <ESC>[0m<inf> slm_uart_handler: UART baud: 11520 d/p/s-bits: 3/0/1 HWFC: 0<ESC>[0m
(22:56:36.335) SLM: [00:00:00.688,964] <ESC>[0m<dbg> slm_at_host: slm_at_send_indicate:
(22:56:36.373) SLM:                                       52 65 61 64 79 0d 0a                          |Ready..          <ESC>[0m
(22:56:36.373) SLM: [00:00:00.688,964] <ESC>[0m<inf> slm_at_host: at_host init done<ESC>[0m[00:00:00.689,025] <ESC>[0m<dbg> slm: configure_power_pin_interrupt: Configured interrupt (0x3400000) on power pin (6).<ESC>[0m
(22:56:59.010) AT%XTEMP?

The setup AT commands from powerup are as follows:

(11:55:51.312) SLM: *** Booting nRF Connect SDK v2.8.0-a2386bfc8401 ***
(11:55:51.343) SLM: *** Using Zephyr OS v3.7.99-0bc3393fb112 ***
(11:55:51.343) SLM: [00:00:00.252,014] <ESC>[0m<inf> at_cmd_custom: Custom AT commands enabled with 70 entries.<ESC>[0m
(11:55:51.365) SLM: [00:00:00.252,044] <ESC>[0m<dbg> slm: main: RR: 0x00000000<ESC>[0m
(11:55:51.365) SLM: [00:00:00.257,843] <ESC>[0m<inf> fs_nvs: 2 Sectors of 4096 bytes<ESC>[0m
(11:55:51.365) SLM: [00:00:00.257,843] <ESC>[0m<inf> fs_nvs: alloc wra: 0, fe8<ESC>[0m
(11:55:51.365) SLM: [00:00:00.257,843] <ESC>[0m<inf> fs_nvs: data wra: 0, 0<ESC>[0m
(11:55:51.743) SLM: Ready
(11:55:51.743) SLM: [00:00:00.491,577] <ESC>[0m<inf> slm: lib_modem init: 0<ESC>[0m
(11:55:51.772) SLM: [00:00:00.491,668] <ESC>[0m<inf> mcuboot_util: Image index: 0, Swap type: none<ESC>[0m
(11:55:51.772) SLM: [00:00:00.491,668] <ESC>[0m<inf> slm: Serial LTE Modem<ESC>[0m
(11:55:51.800) SLM: [00:00:00.585,083] <ESC>[0m<inf> nrf_cloud_info: Device ID: 504d5632-3736-4649-808b-2321133ae3a6<ESC>[0m
(11:55:51.800) SLM: [00:00:00.590,332] <ESC>[0m<inf> nrf_cloud_info: IMEI:      350457793853522<ESC>[0m
(11:55:51.800) SLM: [00:00:00.682,617] <ESC>[0m<inf> nrf_cloud_info: UUID:      504d5632-3736-4649-808b-2321133ae3a6<ESC>[0m
(11:55:51.800) SLM: [00:00:00.688,354] <ESC>[0m<inf> nrf_cloud_info: Modem FW:  mfw_nrf9160_1.3.7<ESC>[0m
(11:55:51.832) SLM: [00:00:00.688,385] <ESC>[0m<inf> nrf_cloud_info: Protocol:          MQTT<ESC>[0m
(11:55:51.832) SLM: [00:00:00.688,415] <ESC>[0m<inf> nrf_cloud_info: Download protocol: HTTPS<ESC>[0m
(11:55:51.832) SLM: [00:00:00.688,446] <ESC>[0m<inf> nrf_cloud_info: Sec tag:           16842753<ESC>[0m
(11:55:51.832) SLM: [00:00:00.688,476] <ESC>[0m<inf> nrf_cloud_info: Host name:         mqtt.nrfcloud.com<ESC>[0m
(11:55:51.857) SLM: [00:00:00.688,568] <ESC>[0m<inf> slm_uart_handler: UART baud: 115200 d/p/s-bits: 3/0/1 HWFC: 0<ESC>[0m
(11:55:51.857) SLM: [00:00:00.688,720] <ESC>[0m<dbg> slm_at_host: slm_at_send_indicate:TX
(11:55:51.857) SLM:                                       52 65 61 64 79 0d 0a                          |Ready..          <ESC>[0m
(11:55:51.857) SLM: [00:00:00.688,720] <ESC>[0m<inf> slm_at_host: at_host init done<ESC>[0m[00:00:00.688,781] <ESC>[0m<dbg> slm: configure_power_pin_interrupt: Configured interrupt (0x3400000) on power pin (6).<ESC>[0m
(11:56:00.334) AT%MDMEV=1
(11:56:00.542) AT+CGDCONT=1,"IP","gigsky-02"
(11:56:00.734) AT%XSYSTEMMODE=1,1,0,0
(11:56:00.942) AT%XSIM=1
(11:56:01.134) AT+CFUN=1
.(11:56:01.342) AT+CGSN=1
(11:56:01.560) AT+CEMODE?
(11:56:01.743) +CGSN: "350457793853
(11:56:01.743) +CEMODE: 2
(11:56:01.743) AT+CMEE=1
(11:56:01.950) AT+CNEC=24
(11:56:02.142) AT+CIND=1,1,1
(11:56:02.351) %XSIM: 1
(11:56:02.351) AT+CEREG=5                         
(11:56:02.581) AT%CESQ=1
(11:56:02.750) SLM:        0d 0a 4f 4b
(11:56:02.750) AT+CSCON=1
(11:56:02.958) AT+COPS=0
(11:56:03.167) %MDMEV: SEARCH STATUS 2
(11:56:03.167) AT+CGEREP=1
(11:56:08.110) %CESQ: 39,1,10,1
(11:56:08.110) +CEREG: 2,"1812","0033751E",7
(11:56:08.110) %CESQ: 35,1,4,0
(11:56:08.398) +CSCON: 1
(11:56:10.382) +CGEV: ME PDN ACT 0,0
(11:56:10.477) +CNEC_ESM: 50,0
(11:56:10.510) +CEREG: 5,"1812","0033751E",7,,,"11100000","11100000"

The application is programmed using the basic SLM code, with a couple of config changes to output the debug to UART and disable the mx25r64 external flash in the overlay (no idea if I needed to, but I just have the NRF chip and an ATMEGA on my PCB).  The code is built in VSCode from the example application nrf9160dk_nrf9160_ns - no idea whether the NS version is correct, again I got it to compile so called it good.  The "SLM:" prefix in the logs above is to help me see what comes from the NRF and what is printed by the ATMEGA chip.

The application is running from a PSU rather than batteries - I don't think it will be a brown-out, but interested if there's a way to check?

(09:05:49.531) AT%HWVERSION
(09:05:49.738) %HWVERSION: nRF9160 SICA B1A
(09:05:49.738) AT%SHORTSWVER
(09:05:49.947) %SHORTSWVER: nrf9160_1.3.7

I don't have a Segger programmer/RTT, but could maybe justify one if it would help diagnose the issue.

Can you please suggest any likely causes and whether there are any config flags I could set to help diagnose the issue?

Thank you for any assistance.

  • Unsure if this is progress, but I have made a change to the prj options and have a different error.  I saw in another answer to try 

    CONFIG_RESET_ON_FATAL_ERROR=n
    CONFIG_THREAD_NAME=y

    I have started to see the below error (last 2 runs, after a couple of hours...)

    slm_uart_handler: UART_RX_RDY failure: -35

    (18:59:47.348) AT%XTEMP?
    (18:59:47.348) AT#XMQTTPUB="sensor/255/battery/V","11.59",0,0
    (18:59:47.348) CPU Temperature: 30 deg
    (18:59:47.348) AT#XMQTTPUB="sensor/255/cpu_temp/C","30",0,0
    (18:59:48.355) SLM: [01:50:07.503,997] <ESC>[0m<dbg> slm_at_host: cmd_send: RX
    (18:59:48.391) SLM:                                       41 54 25 58 54 45 4d 50  3f                      |AT%XTEMP ?       <ESC>[0m
    (18:59:48.391) SLM: [01:50:07.504,791] <ESC>[0m<dbg> slm_at_host: slm_at_send_indicate: TX
    (18:59:48.391) SLM:                                       0d 0a 25 58 54 45 4d 50  3a 20 33 30 0d 0a 0d 0a |..%XTEMP : 30....
    (18:59:48.391) SLM:                                       4f 4b 0d 0a                                      |OK..             <ESC>[0m
    (18:59:48.424) SLM: [01:50:07.504,852] <ESC>[0m<dbg> slm_at_host: cmd_send: RX
    (18:59:48.424) SLM:                                      41 54 23 58 4d 51 54 54  0 42 3d 22 73 65 6e |AT#XMQTT PUB="sen
    (18:59:48.424) SLM:                                       73 6f 72 2f 32 35 35 2f   61 74 74 65 72 79 2f |sor/255/ battery/
    (18:59:48.424) SLM:                                       56 22 2c 22 31 31 2e 35  22 2c 30 2c 30       |V","11.5 9",0,0  <ESC>[0m
    (19:00:17.219) AT%XTEMP?
    (19:00:17.219) AT#XMQTTPUB="sensor/255/battery/V","11.63",0,0
    (19:00:47.106) AT%XTEMP?
    (19:00:47.106) AT#XMQTTPUB="sensor/255/battery/V","11.63",0,0
    (19:01:16.993) AT%XTEMP?
    (19:01:16.993) AT#XMQTTPUB="sensor/255/battery/V","11.63",0,0
    (19:01:46.863) AT%XTEMP?
    (19:01:46.863) AT#XMQTTPUB="sensor/255/battery/V","11.63",0,0
    (19:02:16.750) AT%XTEMP?
    (19:02:16.750) AT#XMQTTPUB="sensor/255/battery/V","11.63",0,0
    (19:02:17.774) SLM: [01:52:36.916,259] <ESC>[1;31m<err> slm_uart_handler: UART_RX_RDY failure: -35, dropped: 59<ESC>[0m
    (19:02:46.638) AT%XTEMP?
    (19:02:46.638) AT#XMQTTPUB="sensor/255/battery/V","11.63",0,0
    (19:02:47.645) SLM: [01:53:06.796,569] <ESC>[1;31m<err> slm_uart_handler: UART_RX_RDY failure: -35, dropped: 59<ESC>[0m
    (19:03:16.508) AT%XTEMP?
    (19:03:16.508) AT#XMQTTPUB="sensor/255/battery/V","11.63",0,0
    (19:03:17.532) SLM: [01:53:36.674,346] <ESC>[1;31m<err> slm_uart_handler: UART_RX_RDY failure: -35, dropped: 21<ESC>[0m
    (19:03:17.548) SLM: [01:53:36.674,407] <ESC>[1;33m<wrn> slm_uart_handler: Disabling UART RX: No event space.<ESC>[0m
    (19:03:17.564) SLM: [01:53:36.679,992] <ESC>[1;31m<err> slm_uart_handler: UART_RX_RDY failure: -35, dropped: 38<ESC>[0m
    (19:03:46.395) AT%XTEMP?
    (19:03:46.395) AT#XMQTTPUB="sensor/255/battery/V","11.63",0,0
    (19:03:47.419) SLM: [01:54:06.564,178] <ESC>[1;31m<err> slm_uart_handler: UART_RX_RDY failure: -35, dropped: 59<ESC>[0m
    (19:04:16.282) AT%XTEMP?
    (19:04:16.282) AT#XMQTTPUB="sensor/255/battery/V","11.63",0,0
    (19:04:17.306) SLM: [01:54:36.447,021] <ESC>[1;31m<err> slm_uart_handler: UART_RX_RDY failure: -35, dropped: 59<ESC>[0m
    (19:04:46.169) AT%XTEMP?
    (19:04:46.169) AT#XMQTTPUB="sensor/255/battery/V","11.56",0,0
    (19:04:47.193) SLM: [01:55:06.335,906] <ESC>[1;31m<err> slm_uart_handler: UART_RX_RDY failure: -35, dropped: 59<ESC>[0m
    (19:05:16.056) AT%XTEMP?
    (19:05:16.056) AT#XMQTTPUB="sensor/255/battery/V","11.59",0,0
    (19:05:17.064) SLM: [01:55:36.217,376] <ESC>[1;31m<err> slm_uart_handler: UART_RX_RDY failure: -35, dropped: 41<ESC>[0m
    (19:05:45.943) AT%XTEMP?
    (19:05:45.943) AT#XMQTTPUB="sensor/255/battery/V","11.66",0,0
    (19:06:15.814) AT%XTEMP?
    (19:06:15.830) AT#XMQTTPUB="sensor/255/battery/V","11.63",0,0
    (19:06:45.701) AT%XTEMP?
    (19:06:45.701) AT#XMQTTPUB="sensor/255/battery/V","11.63",0,0
    (19:07:15.604) AT%XTEMP?
    (19:07:15.604) AT#XMQTTPUB="sensor/255/battery/V","11.56",0,0

    Is that progress... ?!  Any ideas? Thanks!

    [EDIT] Maybe not - I have seen the reboot again too - perhaps there are two issues!?  Disappointed

  • Hello,

    9160-DEV said:
    [EDIT] Maybe not - I have seen the reboot again too - perhaps there are two issues!? 

    It resets even after setting

    CONFIG_RESET_ON_FATAL_ERROR=n
    Have you checked the RESETREAS register?
  • Hello Hakon,

    I haven’t. Is it possible to check that via the SLM AT commands, or should I make changes to the SLM code to output the register on start?

    Many thanks 

  • I don't think there is an AT command for that.

  • Hi Hakon,

    I've not seen the reset again following the change to print out the RESETREAS register unfortunately.

    I have however seen the other issue lots - I think there's a problem in the SLM, or down to the way I'm using it.  It appears to get stuck, perhaps there's a timeout somewhere I could change?

    The general pattern is to request the temperature, then send an MQTT publish, for example;

    (02:06:45.772) AT%XTEMP?
    (02:06:45.806) CPU Temperature: 23 deg
    (02:06:45.806) OK
    (02:06:47.772) AT#XMQTTPUB="sensor/255/cpu_temp/C","23",0,0
    (02:06:47.794) OK

    Then after some time, there is no response to the AT#MQTTPUB command.  The reason for UART_RX_RDY failure appears to be that the buffer is full because I had continued to send AT commands - so I have changed the code to wait for OK/ERROR response instead.

    Now after some time waiting for OK, there is a heap overflow where the CESQ isn't output by the SLM but is stored internally;

    (02:44:17.964) SLM: [04:06:22.879,882] <ESC>[1;33m<wrn> at_monitor: No heap space for incoming notification: %CESQ: 28,1,11,1
    (02:44:28.236) SLM: [04:06:33.144,195] <ESC>[1;33m<wrn> at_monitor: No heap space for incoming notification: %CESQ: 29,1,3,0
    (02:44:38.860) SLM: [04:06:43.771,484] <ESC>[1;33m<wrn> at_monitor: No heap space for incoming notification: %CESQ: 29,1,13,1
    ...

    Then, after some hours (of nothing being output rather than the heap error), the OK/ERROR finally arrives and notifications start coming again;

    (05:56:02.744) #XMQTTEVT: 1,-113
    (05:56:02.788) ERROR
    (05:56:02.788) ERROR
    (05:56:02.788) %XTEMP: 21
    (05:56:02.788) CPU Temperature: 21 deg
    (05:56:02.788) OK
    (05:56:02.788) %CESQ: 28,1,7,1
    

    I've looked at the SLM code and can't see an obvious reason for the pause... but then I don't really understand it.  Also how can I get to see the MQTT error output via UART (eg "NET_ERR"), which might help?

    Thanks

Related