NRF9160 infinite nrf_connect blocking behavior post LTE PSM wakeup

Hi,

I've noticed several times now that when the NRF9160 comes out of a PSM sleep cycle, a call to the underlying socket connect() or nrf_getaddrinfo() (whether from a socket connect at the app level or from the connect() within sntp_init) blocks forever inside nrf_connect/nrf_getaddrinfo.

So my questions are:

- Why would nrf_connect/nrf_getaddrinfo block indefinitely? Even if there is a strange state issue with DNS or with the provider (iBasis or otherwise), the call should fail with a timeout.

- Can either of those functions within nrf91_sockets be made non-blocking with O_NONBLOCK? This assumes there isn't an internal K_FOREVER lock within the modem firmware that would lock everything up regardless.

I'm using the latest modem firmware and the v1.0.0 tag.

Thanks!

  • Hi,

    - Can either of those functions within nrf91_sockets be made non-blocking with O_NONBLOCK? This assumes there isn't an internal K_FOREVER lock within the modem firmware that would lock everything up regardless.

    You can set the timeout, or use fcntl() to enable the O_NONBLOCK flag on a socket.

    - Why would nrf_connect/nrf_getaddrinfo block indefinitely? Even if there is a strange state issue with DNS or with the provider (iBasis or otherwise), the call should fail with a timeout.

    Does it block, or does the program hang? Do you ever get out of this state, or do you need to reset in order to recover?

    Do you have a log output showing a failure, or a debug trace?

    Kind regards,

    Håkon

  • Hi Håkon!

    I suspect that if nrf_getaddrinfo also occasionally blocks then the O_NONBLOCK approach might not work, but I'll give it a try.

    The program hasn't fully hung, as a worker that blinks LEDs continues running.

    In terms of log output, I've added additional printk calls within the BSD sockets code, right before nrf_getaddrinfo/connect. They display fine over RTT, so everything stops within the nrf_xyz call.

    Thanks!

  • Hi,

    I tried the O_NONBLOCK approach:

    fcntl (sockno, F_SETFL, opt | O_NONBLOCK);

    if ((res = connect (sockno, addr, addrlen)) < 0)
    {
        if (errno == EINPROGRESS)
        {
            fd_set wait_set;

            // make file descriptor set with socket
            FD_ZERO (&wait_set);
            FD_SET (sockno, &wait_set);

            // wait for socket to be writable; return after given timeout
            res = select (sockno + 1, NULL, &wait_set, NULL, timeout);
        }
    }

    When select() is called, the kernel faults with a panic:

    ASSERTION FAIL [num_events > 0] @ C:/Nordic/NRF9160-100/ncs/zephyr/kernel/poll.c:195
    zero events

    ***** Kernel Panic! *****
    Current thread ID = 0x20025c90
    Faulting instruction address = 0x5e0bc

    I may just watchdog the whole application for a clean reboot, which will probably be better, as I'd end up restarting LTE anyway.

    Thx

  • Hi,

    GJSea said:
    res = select (sockno + 1, NULL, &wait_set, NULL, timeout);

    This will not resolve to nrf_select, because select() isn't offloaded through the socket offloading API:

    https://github.com/NordicPlayground/fw-nrfconnect-zephyr/blob/master/include/net/socket_offload_ops.h#L36

    You can call nrf_select directly if you include nrf_socket.h in your source file. Alternatively, you could use poll() (which is offloaded and translates to nrf_poll) in this situation and see if that works better.

    Kind regards,

    Håkon
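
The poll()-based wait suggested above could look roughly like the following. This is a minimal sketch written against plain POSIX sockets, not code from the SDK: connect_with_timeout is a hypothetical helper name, and it assumes that fcntl(), poll(), and getsockopt() with SO_ERROR behave on the offloaded nRF91 socket the way they do on a POSIX host, which the thread does not confirm.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>

/* Hypothetical helper: non-blocking connect bounded by timeout_ms.
 * Returns 0 once connected, -1 on error or timeout (errno is set). */
int connect_with_timeout(int fd, const struct sockaddr *addr,
                         socklen_t addrlen, int timeout_ms)
{
    /* Switch the socket to non-blocking mode, keeping the other flags. */
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        return -1;
    }

    if (connect(fd, addr, addrlen) == 0) {
        return 0; /* connected immediately */
    }
    if (errno != EINPROGRESS) {
        return -1; /* hard failure, errno already set by connect() */
    }

    /* Wait until the socket becomes writable or the timeout expires. */
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int ready = poll(&pfd, 1, timeout_ms);
    if (ready == 0) {
        errno = ETIMEDOUT;
        return -1;
    }
    if (ready < 0) {
        return -1;
    }

    /* Writable does not imply success: read the deferred connect result.
     * Assumption: SO_ERROR is supported by the underlying socket. */
    int err = 0;
    socklen_t len = sizeof(err);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0) {
        return -1;
    }
    if (err != 0) {
        errno = err;
        return -1;
    }
    return 0;
}
```

A caller would pass the same sockno, addr, and addrlen as in the snippet above, plus a timeout in milliseconds. If getaddrinfo itself blocks, this helper does not help, since name resolution happens before connect().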

  • Thanks! I'll give that a try.

    Somewhat related: is there a way to determine when the system/modem has re-established a connection to the network/provider after it comes out of sleep (or a reasonable number of seconds to wait after waking)? Or is the connection assumed to be always valid (DNS, etc.) once the initial registration with the network provider after bootup has occurred?

  • GJSea said:
    Somewhat related: is there a way to determine when the system/modem has re-established a connection to the network/provider after it comes out of sleep (or a reasonable number of seconds to wait after waking)? Or is the connection assumed to be always valid (DNS, etc.) once the initial registration with the network provider after bootup has occurred?

    This behavior depends heavily on which mode you're currently in. Since you mention sleep, the most common one is PSM (power saving mode).

    If you are in PSM, it's the application that triggers a "wakeup" by issuing a socket operation (send/recv/getaddrinfo etc.). The modem then starts communicating with the cell tower again and performs the requested operations before eventually going back to PSM.

    Kind regards,

    Håkon

  • Hi Håkon,

    Is there a way to turn on more modem firmware tracing to understand why something like getaddrinfo would either block or return -10 on a well-known, always-up endpoint such as microsoft.com? I'm trying to understand whether the problem is between the app/modem firmware and the network provider, or between the network provider and the internet. The device becomes connectionless after anywhere from 2 to 40+ sleep cycles, and only a reboot fixes the situation; the first connection after reboot works 100% of the time.

    Thanks

  • Hi,

    -10 resolves to ECHILD (defined in errno.h), and it sounds like the device is having issues reconnecting to the cell tower.

    Have you checked what the reported RSRP (reference signal received power) is in your environment? If you run the at_client sample and connect, you can read it with "AT+CESQ".

    The last number in the response is the RSRP, given as an offset in dB from -140 dBm, so if you get "60" back, it indicates a signal strength of about -80 dBm.

    If the signal strength is good (better than -100 dBm), I'd recommend that you enable modem traces, as described in this blog post:

    https://devzone.nordicsemi.com/nordic/nordic-blog/b/blog/posts/how-to-get-modem-trace-using-trace-collector-in-nrf-connect

    Note that the AT command "AT%XMODEMTRACE=1,2" needs to be issued prior to tracing. The .bin file should grow to several hundred kB (in a matter of ~30 secs) when it's working.

    Kind regards,

    Håkon
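
The AT+CESQ conversion described above can be sketched as a small parser. This is illustrative only: cesq_rsrp_dbm is a hypothetical helper name, the field order (rxlev, ber, rscp, ecno, rsrq, rsrp) and the value 255 for "not known" follow the 3GPP TS 27.007 definition of +CESQ, and the sample response strings in the usage note are made up rather than taken from a real device.

```c
#include <stdio.h>

#define CESQ_RSRP_UNKNOWN 255 /* "not known or not detectable" */
#define CESQ_RSRP_MAX     97  /* highest defined RSRP index */

/* Hypothetical helper: extract the RSRP from a "+CESQ: ..." response.
 * Returns 0 and stores the value in dBm, or -1 if the response cannot
 * be parsed or the modem reports the RSRP as unknown. */
int cesq_rsrp_dbm(const char *response, int *rsrp_dbm)
{
    int rxlev, ber, rscp, ecno, rsrq, rsrp;

    /* RSRP is the sixth and last field of the response. */
    if (sscanf(response, "+CESQ: %d,%d,%d,%d,%d,%d",
               &rxlev, &ber, &rscp, &ecno, &rsrq, &rsrp) != 6) {
        return -1;
    }
    if (rsrp == CESQ_RSRP_UNKNOWN || rsrp < 0 || rsrp > CESQ_RSRP_MAX) {
        return -1;
    }

    /* Index is an offset in dB from -140 dBm, as described in the reply. */
    *rsrp_dbm = rsrp - 140;
    return 0;
}
```

With a response such as "+CESQ: 99,99,255,255,20,60" the helper stores -80, matching the example in the reply; a response ending in 255 is reported as unknown.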
