Bug report: Non-blocking send fails with EWOULDBLOCK even though poll returned POLLOUT

Sample code for nRF9160:

#include <zephyr/kernel.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <modem/lte_lc.h>
#include <zephyr/net/socket.h>
#include <fcntl.h>

/* Print errno and stop if the condition does not hold. */
void my_assert(bool b) {
	if (!b) {
		printk("failed %d\r\n", errno);
		exit(1);
	}
}

void main(void)
{
	int err;

	printk("Waiting for network.. ");
	err = lte_lc_init_and_connect();
	if (err) {
		printk("Failed to connect to the LTE network, err %d\n", err);
		return;
	}

	int sk = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	if (sk < 0) {
		printk("Failed to create socket\n");
		return;
	}
	struct sockaddr_in sa = {0}; /* zero the fields that are not set below */
	sa.sin_family = AF_INET;
	sa.sin_port = htons(9632);
	sa.sin_addr.s_addr = ...; // FILL IN IP ADDRESS

	if (connect(sk, (const struct sockaddr *)&sa, sizeof(sa)) < 0) {
		printk("Failed to connect: %d\n", errno);
		return;
	}
	printk("Connected\n");

	/* Switch the connected socket to non-blocking mode. */
	err = fcntl(sk, F_SETFL, O_NONBLOCK);
	my_assert(!err);

	printk("Connected to server\r\n");
	for (int i = 0; i < 20; i++) {
		/* Block until the socket reports readable or writable. */
		struct pollfd pfd = {sk, POLLIN | POLLOUT, 0};
		int res = poll(&pfd, 1, -1);
		my_assert(res == 1);
		printk("Poll result: 0x%02x\r\n", pfd.revents);
		if (pfd.revents & POLLOUT) {
			/* Send one byte per iteration: 'A', 'B', 'C', ... */
			char buf[1] = {'A' + i};
			err = send(sk, buf, 1, 0);
			if (err == -1) {
				printk("Send failed with errno %d\n", errno);
			} else {
				printk("send done\r\n");
			}
		}
	}
	printk("Done\r\n");
	close(sk);
}

Sample code for the server (Node.js):

const net = require('net');

const server = net.createServer((c) => {
    console.log("connected");
    c.on('data', (d) => {
        console.log(d.toString('utf8'));
    });
    c.on('error', (e) => {
        console.log(e);
    });
    c.on('end', () => {
        console.log('end');
    });
    c.on('close', () => {
        console.log('close');
    });
});

server.listen(9632, () => {
    console.log('bound');
});

Output nRF9160:

*** Booting Zephyr OS build v3.2.99-ncs2 ***
Waiting for network.. Connected
Connected to server
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
Send failed with errno 11
Poll result: 0x04
send done
Poll result: 0x04
send done
Poll result: 0x04
send done
Done

Output server:

bound
connected
A
BCDEFGHIJKLMNOP
RST
end
close

I can reproduce this error every time, in exactly the same way.

In the output from the nRF9160, we can see that the 17th call to send returns -1 with errno set to 11 (EWOULDBLOCK). This violates the POSIX contract, since poll returned POLLOUT for the socket, indicating that more data can be put in the socket's send buffer. The expected behaviour is that poll blocks and returns POLLOUT only once there is at least 1 byte free in the socket's send buffer, so that the following send call succeeds.

We see that Q is missing in the output from the server, indicating that the send call indeed failed.
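
As a stopgap until this is fixed, an application can presumably treat EWOULDBLOCK right after POLLOUT as transient and retry. Below is a minimal sketch of that idea, written against plain POSIX sockets and not tested on the nRF9160. The helper name send_with_retry, the retry limit and the 10 ms back-off are illustrative assumptions, not part of any SDK. The back-off is there because poll keeps returning POLLOUT immediately in the failing state, so a bare retry loop would spin.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper (not an SDK function): wait for POLLOUT, then send.
 * If send() still reports EWOULDBLOCK/EAGAIN, back off briefly and retry,
 * up to max_retries extra attempts.
 * Returns the number of bytes accepted, or -1 with errno set. */
static ssize_t send_with_retry(int fd, const void *buf, size_t len,
                               int max_retries)
{
    assert(buf != NULL && max_retries >= 0);

    for (int attempt = 0; attempt <= max_retries; attempt++) {
        struct pollfd pfd = { .fd = fd, .events = POLLOUT, .revents = 0 };

        if (poll(&pfd, 1, -1) < 0) {
            return -1;
        }

        ssize_t n = send(fd, buf, len, 0);
        if (n >= 0) {
            return n;
        }
        if (errno != EAGAIN && errno != EWOULDBLOCK) {
            return -1; /* a real error, do not retry */
        }

        /* Assumed back-off of 10 ms; poll() with no descriptors just sleeps. */
        (void)poll(NULL, 0, 10);
    }

    errno = EAGAIN; /* still not writable after all retries */
    return -1;
}
```

In the sample above this would replace the direct call, i.e. send_with_retry(sk, buf, 1, 10) instead of send(sk, buf, 1, 0). The log shows that later sends succeed again (R, S and T arrive), but whether a retry of the same byte goes through has not been verified here.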

nRF Connect SDK version: 2.3.0.

Modem firmware: 1.3.4.

Mobile operator: Telenor SE.

  • Hi Emil,

    This is what man poll states (Linux tends to be a bit more verbose than POSIX):

           POLLOUT
                  Writing is now possible, though a write larger than the available space in a socket or pipe will
                  still block (unless O_NONBLOCK is set).

    poll() cannot say anything about how many bytes are still available, so send() can still fail with an errno if one starts sending larger packets.

    However, it should be able to tell whether at least 1 byte is available in the underlying buffer. Thank you for helping us improve! I'll report this back to the libmodem team.

    Kind regards,

    Håkon

  • Exactly, if POLLOUT is returned, I should be able to put at least 1 byte in the send queue by calling `send`.

    The POSIX idea is that, at least for non-blocking TCP sockets, if POLLOUT is set and I try to send more data than currently fits in the send buffer, the implementation shall accept as many bytes as fit (at least 1) and return that count. So if I try to write 1000 bytes but only 12 bytes fit, the first 12 bytes should be sent, the rest should be ignored, and 12 should be returned.
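
To make the partial-send behaviour described above concrete, here is a minimal sketch of the usual send loop for a non-blocking stream socket on a POSIX host. The helper names set_nonblocking and send_all_nonblocking are hypothetical, and the code assumes the stack honours the contract that POLLOUT means at least 1 byte will be accepted. Any EWOULDBLOCK from send is therefore simply reported as an error.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: add O_NONBLOCK while keeping the other file flags. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0) {
        return -1;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: send all len bytes, tolerating partial sends.
 * Returns 0 on success, -1 with errno set on failure. */
static int send_all_nonblocking(int fd, const char *buf, size_t len)
{
    size_t off = 0;

    assert(buf != NULL);

    while (off < len) {
        struct pollfd pfd = { .fd = fd, .events = POLLOUT, .revents = 0 };

        if (poll(&pfd, 1, -1) < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -1;
        }
        if (!(pfd.revents & POLLOUT)) {
            errno = EIO; /* POLLERR, POLLHUP or POLLNVAL was reported */
            return -1;
        }

        /* May accept fewer bytes than requested, e.g. 12 out of 1000. */
        ssize_t n = send(fd, buf + off, len - off, 0);
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -1; /* includes EWOULDBLOCK, which POLLOUT should rule out */
        }
        off += (size_t)n; /* skip what was accepted and offer the rest again */
    }
    return 0;
}
```

With the 1000-byte example, the first send would return 12, off would advance to 12, and the loop would poll again before offering the remaining 988 bytes.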
