nRF9160 goes to sleep and becomes stuck in idle task

Hello,

I am using the Nordic SDK v1.5.0 to create an application that runs on the Nordic nRF9160.

In the typical use case, my application is mains powered and is contacted once per day by our remote TCP client for a readout of sensor data.

However, I noticed that when left unattended, after about 1.5 to 2 hours it appears to go into some sort of power-saving (or sleep?) mode, enters the Zephyr idle task, and remains there.

It is also not reachable by our remote client.

I have disabled all power saving options for the modem part, so I do not understand why or how this could be occurring. As far as I can tell, there are no power related registers on the user CPU of the nRF9160 that can be configured to prevent this sleep mode.

So my question is: how can I prevent the application from going into power-saving (or sleep) mode?

Thanks in advance,

 Nelson Gonçalves

  • Hello,

    do you have code that I can use to reproduce the issue? It doesn't need to be the original code, you can just include the necessary code to reproduce the same behavior/bug.

  • Hello Hakon,

    Sorry for the late reply, I was busy with other projects. To reproduce this issue I use this code, 6472.tcp_listen.zip, which starts a TCP server and listens for remote client connections. You might need to adapt the APN to which the modem connects, because I am using a different carrier than the one used by the Nordic SDK board. In the zip file there is also a Tcl script that implements my remote client.

    Using SEGGER Embedded Studio, I built, flashed, and started the application. Then I started my remote client, which sent the same sentence 10 times to the server on the nRF9160. The server replied each time with the same sentence in upper case.
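    For readers without the zip file, the server behavior described above (read a sentence, reply with it in upper case) can be sketched roughly as follows. This is not the code from 6472.tcp_listen.zip; the function name `echo_upper_once` and the buffer size are assumptions made for illustration, and it uses plain POSIX socket calls.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: read one chunk from a connected client socket,
 * convert it to upper case, and send it back.
 * Returns the number of bytes echoed, 0 if the peer closed the
 * connection, or -1 on error. */
ssize_t echo_upper_once(int client_fd)
{
    char buf[128]; /* assumed maximum chunk size for this sketch */

    assert(client_fd >= 0);

    ssize_t n = recv(client_fd, buf, sizeof(buf), 0);
    if (n <= 0) {
        return n; /* 0: peer closed, -1: receive error */
    }

    for (ssize_t i = 0; i < n; i++) {
        buf[i] = (char)toupper((unsigned char)buf[i]);
    }

    /* send() may write fewer bytes than requested, so loop until done. */
    ssize_t sent = 0;
    while (sent < n) {
        ssize_t r = send(client_fd, buf + sent, (size_t)(n - sent), 0);
        if (r < 0) {
            return -1;
        }
        sent += r;
    }
    return n;
}
```

    A server would call this in a loop for each accepted client until it returns 0 or -1.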

    Then I waited a little over an hour, and tried again. According to the Wireshark capture that I also uploaded, the TCP server accepted the connection, and it read the data sent by the client. The server then sends an ACK to the client, but does not push any data.

    In the debugger, the application appears to be stuck in some sort of low power mode, inside the idle task. My guess is that the TCP traffic at 9:53:38 is being handled by the modem, but for some reason the TCP events (new client, read data) are not passed on to the application.

    Here is the Wireshark capture in_low_power_mode_capture.pcapng

    and also a screenshot of the debugger, showing the location in the code where it is stuck   

    I am not sure why this is occurring. Maybe I need to set up a handler for modem events, or there is a specific IRQ that is raised by the modem, but I cannot find any information on it.

    Also, there does not seem to be any information about disabling the low power mode for the user CPU of the nRF9160.

    [EDIT] I forgot to mention the software versions I am using, they are:

    • the NCS v1.5.0 for the application
    • the nRF9160 modem has firmware version mfw_nrf9160_1.2.3

    Also, this issue might be related to this one, which I am also having: devzone.nordicsemi.com/.../how-to-detect-that-the-peer-closed-a-tcp-socket

  • As recommended in my other issue, I updated the modem firmware to version 1.3.0. This issue (nRF9160 goes to sleep and becomes stuck in idle task) still occurs in version 1.3.0, but the behaviour is now different. The client cannot open the TCP connection:

    - the client IP is 192.168.178.28

    - the modem IP is 178.50.180.135

    - at 15:22:25 the client attempts to connect again, last connection was at 14:32:12

    There is no reply from the modem to the client TCP SYN packet. The client tries again a few minutes later, again with no reply.

    in_low_power_mode_capture_modem_fw_1.3.0_issue.pcapng

    I am not sure if this is relevant to the issue, but the modem hardware is "nRF9160 SICA B0A" (so revision 1). I am aware that the modem firmware 1.3.0 is only for testing and developing on this revision, but at the moment I have no other hardware on which to test.

  • Hi,

    Håkon is currently on vacation, so I have taken over this ticket.

    If you are able to take a modem trace, you can now decode them to a .pcap yourself with the Trace Collector v2 preview. That way, we can see both sides of the communication.

    Does your device have a static IP address?

    Often, cellular devices are behind a NAT layer, which makes it hard for other devices to initiate communication with them if the NAT rules have changed. This is more of a problem on UDP than TCP, but 1.5 hours could be long enough for the NAT rules to have changed. If that is the problem, the network will no longer be able to route the messages to the device.

    Best regards,

    Didrik

  • Hi Didrik,

    Our device does not have a static IP, but I don't think it is a network issue.

    For the modem firmware version 1.2.3, the modem replied to the client TCP request (the TCP SYN gets an ACK from the modem). 

    For version 1.3.0, the modem did not reply. In both cases, it is the same mobile network and the same dynamic IP range (modem has 178.50.*.*); only the modem firmware version has changed.

    I will try to see if it is possible to collect traces from the modem. Since I am using our prototype hardware, it might not be trivial to collect the traces.

    BR,

     Nelson

  • Hello Didrik,

    I did all the steps in order to prepare for collecting modem traces described in the link you gave.

    For info, this is my setup:

    - the hardware is the nRF9160 dev board, PCA10090 v1.0.0

    - the modem hardware revision is: nRF9160 SICA B1A

    - the modem firmware version is: mfw_nrf9160_1.3.0

    - I am using the Nordic SDK v1.5.0

    - the board controller nRF52 chip was flashed to the latest version as indicated in the setup guide for collecting traces

    - I am using the iBasis SIM card

    I flashed the application and started the trace collector. Because the IP address is in a private network, I cannot connect to the TCP listening server from my PC. So I just left the device idle, listening for incoming connections.

    At about 50 minutes after startup, the application is constantly notified that there is a new client connection, and when it tries to accept it, the call fails with the error (9) "Bad Descriptor".

    I am attaching the output of my application, the traces collected from the modem and the application code running on the board. It is almost identical to the one I posted originally, but with some debug information.

    In these traces there is no client connecting to the application, which is not what you originally had asked for. But for the moment I cannot do it with the dev board, and there is the weird behaviour at 50 mins after startup. I noticed this happening twice, so I believe it can be relevant for the issue.

    Best regards,

    Nelson Gonçalves

    modem_and_app_traces.zip

    3652.tcp_listen.zip

  • I've taken a look at your trace, and I believe I have an idea about what happens. That said, there are some details I am not quite sure of, so I'll ask our modem team to have a look at it as well.

    While we wait for their analysis, here is what I believe is happening:

    After the device has run through the initial startup, there isn't much happening for the first ~50 minutes. Then, the device tries to perform a tracking area update. However, the network rejects it, thereby disconnecting the device from the network. The reason given by the network for the rejection is "UE identity cannot be derived by the network (9)".

    As the device is now not connected to the network, the modem closes the socket. This is why you get a POLLHUP.

    The POLLIN looks like a bug to me. I can't see any signs of an incoming connection in the trace. There have been several improvements to the modem library since NCS 1.5.0. Could you try to upgrade to 1.7.0 and see if you still get the POLLINs?

    Edit: Also, looking at your application a bit more, you never check for POLLHUP or POLLERR on the modem socket, so when the socket is closed, the application keeps polling it. This would explain why you get a "bad file descriptor" when you try to read the non-existent client socket.
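    A minimal sketch of the kind of check described above, written against the standard POSIX poll() interface rather than the original application code. The enum and function names are hypothetical, and it assumes the socket layer reports a closed or invalid socket through POLLHUP, POLLERR, or POLLNVAL in revents.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch; not part of any Nordic API. */
enum sock_action {
    SOCK_ACTION_NONE,   /* nothing to do, poll again later */
    SOCK_ACTION_READ,   /* data or a new connection is pending */
    SOCK_ACTION_REOPEN  /* socket closed or failed: close and recreate it */
};

/* Decide what to do with a socket from the revents value set by poll().
 * Error conditions are checked first, so a socket that was closed
 * underneath the application is never read from or accepted on. */
enum sock_action classify_revents(short revents)
{
    if (revents & (POLLHUP | POLLERR | POLLNVAL)) {
        return SOCK_ACTION_REOPEN;
    }
    if (revents & POLLIN) {
        return SOCK_ACTION_READ;
    }
    return SOCK_ACTION_NONE;
}

/* Poll a single descriptor once and classify the outcome. */
enum sock_action poll_once(struct pollfd *pfd, int timeout_ms)
{
    assert(pfd != NULL);

    pfd->revents = 0;
    int ret = poll(pfd, 1, timeout_ms);
    if (ret < 0) {
        /* poll() itself failed; this sketch treats that as fatal
         * for the socket instead of retrying. */
        return SOCK_ACTION_REOPEN;
    }
    if (ret == 0) {
        return SOCK_ACTION_NONE; /* timeout, no events */
    }
    return classify_revents(pfd->revents);
}
```

    With a check like this in the main loop, a SOCK_ACTION_REOPEN result would be the point to close the listening socket, wait for the network connection to come back, and create the socket again, instead of polling a descriptor that no longer exists.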
