
poll() not blocking

Hello,

I am using the Nordic SDK v1.5.0 with modem firmware mfw_nrf9160_1.2.3 on the nRF9160 DK.

In our application, we use AT commands to set up and control the modem. On startup, the modem's PSM and eDRX features are switched off to prevent disconnections caused by power saving. We then open a TCP listening socket and wait, on a separate thread, for remote clients to connect. This thread also reads data from connected clients and uses poll() to block and wait for events, as follows:

struct pollfd pollSet[SOCKET_ID_MAX]; 

pollSet[SOCKET_ID_CLIENT].fd = Sockets[SOCKET_ID_CLIENT];
pollSet[SOCKET_ID_CLIENT].events = POLLIN;

pollSet[SOCKET_ID_SERVER].fd = Sockets[SOCKET_ID_SERVER];
pollSet[SOCKET_ID_SERVER].events = POLLIN;

// should block forever, until a client connects or a 
// connected client has sent data
if (0 < poll(pollSet,SOCKET_ID_MAX,-1))
{
    // sockets handling code
}

Initially, only the listening socket is open. However, the poll() call never blocks. Instead, it returns immediately with the server socket's revents field set to zero (no events). I cannot understand why; it should block and wait for an incoming client connection.

For reference, this is the configuration I am using:

# Zephyr OS Network
CONFIG_NETWORKING=y
CONFIG_NET_NATIVE=n
CONFIG_NET_SOCKETS=y
CONFIG_NET_SOCKETS_OFFLOAD=y

# Nordic Libraries
CONFIG_NRF_MODEM_LIB=y
CONFIG_LTE_LINK_CONTROL=n
CONFIG_LTE_AUTO_INIT_AND_CONNECT=n

  • Hi,

     

    I would recommend that you store the return from poll() and print errno if ret < 0.

    Could you check this and come back with the errno?

    What is SOCKET_ID_MAX defined as? If it is larger than 2, you will pass uninitialized memory to poll().
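
A minimal sketch of that check, assuming a POSIX-style poll(): the helper below (poll_checked is a hypothetical name, not part of the SDK) stores the return value, prints errno on failure, and dumps revents for every entry, so a mismatch between the return value and the number of flagged entries becomes visible. The headers are the host POSIX ones; the Zephyr socket header differs, so treat the includes as an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: wraps poll(), reports failures, and counts how many
 * entries actually carry a non-zero revents after the call. */
static int poll_checked(struct pollfd *fds, nfds_t nfds, int timeout_ms,
                        int *flagged)
{
    assert(fds != NULL && flagged != NULL);

    int ret = poll(fds, nfds, timeout_ms);
    *flagged = 0;

    if (ret < 0) {
        printf("poll() failed: errno=%d (%s)\n", errno, strerror(errno));
        return ret;
    }

    for (nfds_t i = 0; i < nfds; i++) {
        printf("fds[%u]: fd=%d revents=0x%x\n", (unsigned)i, fds[i].fd,
               (unsigned)fds[i].revents);
        if (fds[i].revents != 0) {
            (*flagged)++;
        }
    }

    /* A conforming poll() returns the number of entries with revents != 0. */
    if (ret != *flagged) {
        printf("mismatch: poll() returned %d but %d entries have events\n",
               ret, *flagged);
    }
    return ret;
}
```

Calling poll_checked(pollSet, SOCKET_ID_MAX, -1, &flagged) in place of the bare poll() would print the state of both entries each time the call returns.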

     

    Kind regards,

    Håkon

  • Hello Hakon,

    The poll() call returns 1, meaning one of the sockets has events. But after the call, the revents field is zero for both sockets.

    SOCKET_ID_MAX is 2. I did not post the full code, but I believe the arguments to the call are correct.

    With the debugger, I followed the poll() call all the way down to nrf91_socket_offload_poll() in file nrf\lib\nrf_modem_lib\nrf91_sockets.c

    This function looks correct, but the actual work is done in nrf_poll(), for which I do not have the source code. That call returns 1, although all revents fields are zero.

    My guess is that the modem firmware is not properly initialized, although I don't know why or where.  The modem is registered on the network, there is an active PDP context and the modem has an IP address.

    Kind regards,

      Nelson

  • Hi,

     

    NelsonGoncalves said:
    My guess is that the modem firmware is not properly initialized, although I don't know why or where.  The modem is registered on the network, there is an active PDP context and the modem has an IP address.

    If you are connected to the network, I do not see how anything could be left uninitialized.

    NelsonGoncalves said:
    This function looks correct, but the actual work is done in nrf_poll(), for which I do not have the source code. That call returns 1, although all revents fields are zero.

    Have you verified that the return path is nrf91_sockets.c::nrf91_socket_offload_poll() -> nrf_poll()? The offloaded function also holds a translated tmp array; could you also peek into it when it misbehaves?

    https://github.com/nrfconnect/sdk-nrf/blob/master/lib/nrf_modem_lib/nrf91_sockets.c#L1008

     

    Kind regards,

    Håkon

  • Hi Hakon,

    The issue was indeed at the nrf_poll() call.

    In the original poll() call, the array of file descriptors has two entries: client and server. Initially, however, only the server socket is open, and the client socket number is -1.

    So eventually, nrf_poll() gets called with two file descriptors (one valid, one invalid). I was expecting the negative file descriptor to be ignored (which I understand is the standard POSIX behavior: https://man7.org/linux/man-pages/man2/poll.2.html).

    However, it is not ignored; instead, nrf_poll() sets POLLNVAL in revents for the client socket. So technically, there is an event (invalid file descriptor), although the documentation for poll() says that negative file descriptors are ignored.
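
For comparison, the POSIX behavior can be checked on a host system with a small sketch like the one below. It assumes a desktop POSIX environment (pipe(), poll()) rather than the nRF9160 target, and poll_with_unused_slot is a hypothetical name. It polls a pipe's read end next to an entry whose fd is -1 and reports what poll() does with the unused entry.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Polls a pipe's read end alongside an unused (-1) entry, without blocking.
 * Returns poll()'s result (or -1 if the setup failed); *unused_revents
 * receives the revents value of the -1 entry. */
static int poll_with_unused_slot(int make_readable, short *unused_revents)
{
    int pipefd[2];

    assert(unused_revents != NULL);
    if (pipe(pipefd) != 0) {
        return -1;
    }

    int ret = -1;
    if (!make_readable || write(pipefd[1], "x", 1) == 1) {
        struct pollfd fds[2] = {
            { .fd = -1,        .events = POLLIN }, /* unused slot */
            { .fd = pipefd[0], .events = POLLIN }, /* real descriptor */
        };
        ret = poll(fds, 2, 0); /* timeout 0: report state, never block */
        *unused_revents = fds[0].revents;
    }

    close(pipefd[0]);
    close(pipefd[1]);
    return ret;
}
```

On a conforming host, the -1 entry comes back with revents set to 0 and does not contribute to the return value, whether or not the pipe has data.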

    Then the function nrf91_socket_offload_poll() goes through the temp array returned by nrf_poll() to copy the events into the array passed by the caller. While doing so, it skips negative file descriptors but does not decrement the number of sockets with events. So the result is a return value of 1 (one socket has events), but all revents fields are zero.
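
The mismatch can be illustrated with a simplified model. This is not the code from nrf91_sockets.c or the modem library; it is a sketch built only from the description above, and every name in it (struct entry, EV_NVAL, lower_poll_model, wrapper_poll_model) is hypothetical. The model has no real sockets, so the only "event" it ever produces is the invalid-descriptor flag.

```c
#include <assert.h>
#include <stddef.h>

#define EV_NVAL 0x20 /* stand-in flag for this model, not an SDK constant */
#define MAX_ENTRIES 8

struct entry {
    int fd;
    short events;
    short revents;
};

/* Model of the lower layer as described: a negative fd is flagged as
 * invalid and counted as an entry with events. */
static int lower_poll_model(struct entry *tmp, size_t n)
{
    int count = 0;

    for (size_t i = 0; i < n; i++) {
        tmp[i].revents = 0;
        if (tmp[i].fd < 0) {
            tmp[i].revents = EV_NVAL;
            count++;
        }
    }
    return count;
}

/* Model of the wrapper as described: results are copied back, negative fds
 * are skipped, but the count from the lower layer is returned unchanged. */
static int wrapper_poll_model(struct entry *fds, size_t n)
{
    struct entry tmp[MAX_ENTRIES];

    assert(fds != NULL && n <= MAX_ENTRIES);
    for (size_t i = 0; i < n; i++) {
        tmp[i] = fds[i];
    }

    int count = lower_poll_model(tmp, n);

    for (size_t i = 0; i < n; i++) {
        fds[i].revents = 0;
        if (fds[i].fd < 0) {
            continue; /* skipped, yet count is not decremented */
        }
        fds[i].revents = tmp[i].revents;
    }
    return count;
}
```

With one entry at -1 and one valid entry, the model returns 1 while every revents field the caller sees is zero, which matches the symptom.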

    Long story short, if I call poll() with only the currently open sockets, everything works as expected. I believe this is a deviation from the expected POSIX behavior.
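
A minimal sketch of that workaround, assuming closed sockets are stored as -1 as described above: the helper below (build_poll_set is a hypothetical name) copies only the open sockets into a compact pollfd array and records which original slot each entry came from.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/* Copies only open sockets (fd >= 0) into out[], requesting POLLIN on each.
 * index_map[k] records which slot of socks[] produced out[k].
 * out[] and index_map[] must each have room for n_socks entries.
 * Returns the number of entries written. */
static size_t build_poll_set(const int *socks, size_t n_socks,
                             struct pollfd *out, size_t *index_map)
{
    assert(socks != NULL && out != NULL && index_map != NULL);

    size_t n = 0;
    for (size_t i = 0; i < n_socks; i++) {
        if (socks[i] < 0) {
            continue; /* closed slot: leave it out of the poll set */
        }
        out[n].fd = socks[i];
        out[n].events = POLLIN;
        out[n].revents = 0;
        index_map[n] = i;
        n++;
    }
    return n;
}
```

The returned count is then passed as the second argument to poll() instead of SOCKET_ID_MAX, and index_map translates each result back to SOCKET_ID_CLIENT or SOCKET_ID_SERVER.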

    Kind regards,

     Nelson Gonçalves

  • Hi,

     

    I tried reproducing this by modifying mqtt_simple to use an fd array instead, and setting fds[1].fd = -1, but I wasn't able to, unfortunately. Could you provide the contents of the "pollSet" variable when you were able to reproduce this?

     

    My deepest apologies. This is indeed a bug. I was not on the same libmodem version as you were.

    This seems to be fixed in libmodem v1.1.0:

    https://github.com/nrfconnect/sdk-nrfxlib/blob/master/nrf_modem/doc/CHANGELOG.rst#nrf_modem-110

    Could you also verify that this fixes the issue on your side?

     

    Kind regards,

    Håkon

  • Hey,

    I was able to sidestep this bug by only passing the currently open sockets to poll(), so this is no longer an issue for us.

    Right now it is not opportune for me to upgrade the modem firmware; I will postpone that to the end of our sprint.

    Thanks for the help,

     Nelson
