Bug report: Non-blocking send fails with EWOULDBLOCK even though poll returned POLLOUT

Sample code for nRF9160:

#include <zephyr/kernel.h>
#include <stdio.h>
#include <modem/lte_lc.h>
#include <zephyr/net/socket.h>
#include <fcntl.h>

void my_assert(bool b) {
    if (!b) {
        printk("failed %d\r\n", errno);
        exit(1);
    }
}

void main(void)
{
	int err;

	printk("Waiting for network.. ");
	err = lte_lc_init_and_connect();
	if (err) {
		printk("Failed to connect to the LTE network, err %d\n", err);
		return;
	}

	int sk = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	if (sk < 0) {
		printk("Failed to create socket\n");
		return;
	}
	struct sockaddr_in sa;
	sa.sin_family = AF_INET;
	sa.sin_port = htons(9632);
	sa.sin_addr.s_addr = ...; // FILL IN IP ADDRESS

	if (connect(sk, (const struct sockaddr *)&sa, sizeof(sa)) < 0) {
		printk("Failed to connect: %d\n", errno);
		return;
	}
	printk("Connected\n");

    err = fcntl(sk, F_SETFL, O_NONBLOCK);
    my_assert(!err);

    printk("Connected to server\r\n");
    for (int i = 0; i < 20; i++) {
        struct pollfd pfd = {sk, POLLIN | POLLOUT, 0};
        int res = poll(&pfd, 1, -1);
        my_assert(res == 1);
        printk("Poll result: 0x%02x\r\n", pfd.revents);
        if (pfd.revents & POLLOUT) {
            char buf[1] = {'A' + i};
            err = send(sk, buf, 1, 0);
            if (err == -1) {
                printk("Send failed with errno %d\n", errno);
            } else {
                printk("send done\r\n");
            }
        }
    }
    printk("Done\r\n");
	close(sk);
}

Sample code for the server (Node.js):

const net = require('net');

const server = net.createServer((c) => {
    console.log("connected");
    c.on('data', (d) => {
        console.log(d.toString('utf8'));
    });
    c.on('error', (e) => {
        console.log(e);
    });
    c.on('end', () => {
        console.log('end');
    });
    c.on('close', () => {
        console.log('close');
    });
});

server.listen(9632, () => {
    console.log('bound');
});

Output nRF9160:

*** Booting Zephyr OS build v3.2.99-ncs2 ***
Waiting for network.. Connected
Connected to server
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
Send failed with errno 11
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Done

Output server:

bound
connected
A
BCDEFGHIJKLMNOP
RST
end
close

This error is being reproduced with 100% probability for me in the exact same way.

In the output from nRF9160, we can see that send returns -1 with errno set to EWOULDBLOCK in the 17th call. This is in violation with the POSIX contract, since the poll function returned POLLOUT for the socket, indicating that it is possible to put more data in the socket's send buffer. The expected outcome is that the poll function should block and exit with POLLOUT first when there is at least 1 byte free in the socket's send buffer, so that a send call will succeed.

We see that Q is missing in the output from the server, indicating that the send call indeed failed.

nRF Connect SDK version: 2.3.0.

Modem firmware: 1.3.4.

Mobile operator: Telenor SE.

Parents
  • But also note that even when the socket reports POLLOUT and you get EAGAIN from send(), you are not effectively busy-looping so you can poll() in a loop and just retry on EAGAIN.

  • You mean that poll() the next time will actually correctly block, so that we avoid poll-send busy loop?

  • That is at least my understanding of that idea. Using poll to check, what is possible next, reading received data or send more data.

    As I wrote, I didn't try it on my own. I used CoAP/DTLS 1.2 CID in 99.9% with request/response pattern so I send one message and wait to receive the response. For me poll with receive is pretty well.

    Other had the idea to use UDP with a bulk of messages. If that is a good idea at all, is a different question, but if such messages then get "swallowed silently" by the modem, then this has no chance to work.

  • In this specific case the modem is enabling flow control, which is a mechanism by which it informs the application processor that it has ran out of buffers for send().

    POLLOUT does reflect the state of the modem's buffers using flow control, but that does not work "pre-emptively". Only _after_ the buffers are actually full (and send() has returned EAGAIN here), will flow control be enabled.

    So, the next time you call poll() that socket will have POLLOUT only if the modem has memory for the next send() operation. This is visible in your output, where after receiving EAGAIN from send() the socket has POLLOUT the next time you poll(), and send() succeeds.

    Note that the availability of buffers on the application core is not reflected by POLLOUT. So it could happen that POLLOUT is reported, modem has flow control disabled, but send() can't acquire any buffer and it returns EAGAIN. Even in that case, send() will not busy-loop. There are mechanism in place to specifically prevent that.

Reply
  • In this specific case the modem is enabling flow control, which is a mechanism by which it informs the application processor that it has ran out of buffers for send().

    POLLOUT does reflect the state of the modem's buffers using flow control, but that does not work "pre-emptively". Only _after_ the buffers are actually full (and send() has returned EAGAIN here), will flow control be enabled.

    So, the next time you call poll() that socket will have POLLOUT only if the modem has memory for the next send() operation. This is visible in your output, where after receiving EAGAIN from send() the socket has POLLOUT the next time you poll(), and send() succeeds.

    Note that the availability of buffers on the application core is not reflected by POLLOUT. So it could happen that POLLOUT is reported, modem has flow control disabled, but send() can't acquire any buffer and it returns EAGAIN. Even in that case, send() will not busy-loop. There are mechanism in place to specifically prevent that.

Children
Related