Bug report: Non-blocking send fails with EWOULDBLOCK even though poll returned POLLOUT

Sample code for nRF9160:

#include <zephyr/kernel.h>
#include <stdio.h>
#include <modem/lte_lc.h>
#include <zephyr/net/socket.h>
#include <fcntl.h>

void my_assert(bool b) {
    if (!b) {
        printk("failed %d\r\n", errno);
        exit(1);
    }
}

void main(void)
{
	int err;

	printk("Waiting for network.. ");
	err = lte_lc_init_and_connect();
	if (err) {
		printk("Failed to connect to the LTE network, err %d\n", err);
		return;
	}

	int sk = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	if (sk < 0) {
		printk("Failed to create socket\n");
		return;
	}
	struct sockaddr_in sa;
	sa.sin_family = AF_INET;
	sa.sin_port = htons(9632);
	sa.sin_addr.s_addr = ...; // FILL IN IP ADDRESS

	if (connect(sk, (const struct sockaddr *)&sa, sizeof(sa)) < 0) {
		printk("Failed to connect: %d\n", errno);
		return;
	}
	printk("Connected\n");

    err = fcntl(sk, F_SETFL, O_NONBLOCK);
    my_assert(!err);

    printk("Connected to server\r\n");
    for (int i = 0; i < 20; i++) {
        struct pollfd pfd = {sk, POLLIN | POLLOUT, 0};
        int res = poll(&pfd, 1, -1);
        my_assert(res == 1);
        printk("Poll result: 0x%02x\r\n", pfd.revents);
        if (pfd.revents & POLLOUT) {
            char buf[1] = {'A' + i};
            err = send(sk, buf, 1, 0);
            if (err == -1) {
                printk("Send failed with errno %d\n", errno);
            } else {
                printk("send done\r\n");
            }
        }
    }
    printk("Done\r\n");
	close(sk);
}

Sample code for the server (Node.js):

const net = require('net');

const server = net.createServer((c) => {
    console.log("connected");
    c.on('data', (d) => {
        console.log(d.toString('utf8'));
    });
    c.on('error', (e) => {
        console.log(e);
    });
    c.on('end', () => {
        console.log('end');
    });
    c.on('close', () => {
        console.log('close');
    });
});

server.listen(9632, () => {
    console.log('bound');
});

Output nRF9160:

*** Booting Zephyr OS build v3.2.99-ncs2 ***
Waiting for network.. Connected
Connected to server
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
Send failed with errno 11
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Done

Output server:

bound
connected
A
BCDEFGHIJKLMNOP
RST
end
close

This error is being reproduced with 100% probability for me in the exact same way.

In the output from nRF9160, we can see that send returns -1 with errno set to EWOULDBLOCK in the 17th call. This is in violation with the POSIX contract, since the poll function returned POLLOUT for the socket, indicating that it is possible to put more data in the socket's send buffer. The expected outcome is that the poll function should block and exit with POLLOUT first when there is at least 1 byte free in the socket's send buffer, so that a send call will succeed.

We see that Q is missing in the output from the server, indicating that the send call indeed failed.

nRF Connect SDK version: 2.3.0.

Modem firmware: 1.3.4.

Mobile operator: Telenor SE.

Parents
  • I agree that is how it should work, send() should be able to send at least 1 byte if POLLOUT is set. Unfortunately I don't think it will be possible to have it work like that.

    There are too many pools which we allocate from, and poll() can't keep track of all the bytes available in all these pools across two different cores. Even if we did there is no guarantee that once poll() has returned those resources are still available for your thread.

    Note this is currently a limitation in Zephyr's TCP/IP stack as well.

Reply
  • I agree that is how it should work, send() should be able to send at least 1 byte if POLLOUT is set. Unfortunately I don't think it will be possible to have it work like that.

    There are too many pools which we allocate from, and poll() can't keep track of all the bytes available in all these pools across two different cores. Even if we did there is no guarantee that once poll() has returned those resources are still available for your thread.

    Note this is currently a limitation in Zephyr's TCP/IP stack as well.

Children
Related