Network stack not recovering from filled buffers

The Zephyr network stack is not working correctly on my nRF5340. I want to receive data from a custom driver through the network stack in my application. To do this, I request network packets to be allocated and fill them with data.

Using the packet management API described at docs.nordicsemi.com/.../net_pkt.html, I was able to do this successfully. This is the driver code that passes messages to the network stack:

	pkt = net_pkt_rx_alloc_with_buffer(ctx->iface, rx_len, AF_UNSPEC, 0, K_MSEC(100));
	if (!pkt)
	{
		LOG_ERR("Message NMR: %lld, ERROR: Failed to allocate RX buffer. Wanted to allocate: %d Bytes", MessageCouter, rx_len);
		eth_stats_update_errors_rx(ctx->iface);
		w6100_command(dev, W6100_RV_SR_CR_RECV);
		return;
	}

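	/* pkt_buf must point at the packet's first fragment before the copy loop. */
	pkt_buf = pkt->buffer;
	read_len = rx_len;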
	reader = off + 2;

	do
	{
		size_t frag_len;
		uint8_t *data_ptr;
		size_t frame_len;

		data_ptr = pkt_buf->data;

		frag_len = net_buf_tailroom(pkt_buf);

		if (read_len > frag_len)
		{
			frame_len = frag_len;
		}
		else
		{
			frame_len = read_len;
		}

		w6100_readbuf(dev, reader, data_ptr, frame_len);
		net_buf_add(pkt_buf, frame_len);
		reader += frame_len;

		read_len -= frame_len;

		if (pkt_buf->frags == NULL && read_len > 0)
		{
			/* No fragment left for the remaining bytes: stop here instead of
			 * dereferencing a NULL fragment on the next iteration. */
			LOG_ERR("Message NMR: %lld, ERROR: Buffer full, dropping rest of message", MessageCouter);
			read_len = 0;
		}
		else
		{
			pkt_buf = pkt_buf->frags;
		}

	} while (read_len > 0);

	if (net_pkt_is_empty(pkt))
	{
		LOG_ERR("Message NMR: %lld, ERROR: Packet is empty", MessageCouter);
		// net_pkt_unref(pkt);
		// w6100_command(dev, W6100_RV_SR_CR_RECV);
		// return;
	}
	

	int ret = net_recv_data(ctx->iface, pkt);

	if (ret < 0)
	{
		LOG_ERR("Message NMR: %lld, Failed to notify upper stack of new network message, Err: %d", MessageCouter, ret);
		net_pkt_unref(pkt);
	}
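Stripped of the W6100 specifics, the fragment loop above copies rx_len bytes across the packet's chain of buffers. Below is a host-runnable sketch of that logic, assuming a hypothetical struct frag in place of struct net_buf and memcpy() in place of w6100_readbuf(). It illustrates the loop only and is not Zephyr code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one fragment of a net_pkt's buffer chain.
 * Not Zephyr's struct net_buf; "next" plays the role of net_buf->frags. */
struct frag
{
	unsigned char data[64];
	size_t len;        /* bytes already written */
	size_t size;       /* usable capacity, at most sizeof(data) */
	struct frag *next; /* next fragment, or NULL at the end of the chain */
};

/* Free space left in a fragment, similar to net_buf_tailroom(). */
static size_t frag_tailroom(const struct frag *f)
{
	return f->size - f->len;
}

/* Copy src_len bytes into the chain, one fragment at a time.
 * Returns the number of bytes copied. If the chain is too short, the rest
 * is dropped and the loop stops at the last fragment instead of touching NULL. */
static size_t fill_frags(struct frag *head, const unsigned char *src, size_t src_len)
{
	size_t copied = 0;

	for (struct frag *f = head; f != NULL && copied < src_len; f = f->next)
	{
		size_t room = frag_tailroom(f);
		size_t remaining = src_len - copied;
		size_t chunk = remaining < room ? remaining : room;

		memcpy(f->data + f->len, src + copied, chunk);
		f->len += chunk;
		copied += chunk;
	}
	return copied;
}
```

Putting the NULL check in the loop condition removes the need for a separate "buffer filled" branch, and a return value smaller than the requested length tells the caller that data was dropped.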


The problem occurs when the buffers fill up and there is no more room to allocate: net_pkt_rx_alloc_with_buffer() fails, and the driver takes the error path shown above.

However, the network stack never seems to regain enough space to allocate new buffers, and zsock_recvfrom() no longer returns any messages. I have tried both lowering and raising the priority of the driver thread, but it did not change anything. The problem is triggered by high traffic, but lowering or temporarily stopping the traffic does not make it recover.
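One way to reason about "never recovers, even after traffic stops" is a toy model of the RX pool: it only recovers if every allocated packet is eventually released, either by the stack dropping it or by the application reading it. This is a sketch under simplifying assumptions, not a diagnosis. POOL_SIZE, struct sock_queue, deliver() and drain_one() are hypothetical names; real Zephyr pools are memory slabs and net_buf pools sized by Kconfig options such as CONFIG_NET_PKT_RX_COUNT and CONFIG_NET_BUF_RX_COUNT, and the model ignores fragments entirely:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pool size; in Zephyr the real limits come from Kconfig
 * options such as CONFIG_NET_PKT_RX_COUNT and CONFIG_NET_BUF_RX_COUNT. */
#define POOL_SIZE 8

/* Toy RX pool: just a count of free packet slots. */
static int pool_free = POOL_SIZE;

/* Toy allocation: fails when the pool is empty, similar in spirit to
 * net_pkt_rx_alloc_with_buffer() returning NULL. */
static bool pkt_alloc(void)
{
	if (pool_free == 0)
	{
		return false;
	}
	pool_free--;
	return true;
}

/* Toy release: returns one slot to the pool, like a final net_pkt_unref(). */
static void pkt_unref(void)
{
	pool_free++;
}

/* Toy socket: packets waiting in its receive queue keep holding pool slots. */
struct sock_queue
{
	bool open;  /* is a socket bound to this port? */
	int queued; /* packets accepted by the stack but not yet read */
};

/* Stack side: allocate a packet, then queue it for an open port or drop
 * (free) it immediately. Returns false if the allocation failed. */
static bool deliver(struct sock_queue *q)
{
	if (!pkt_alloc())
	{
		return false;
	}
	if (q->open)
	{
		q->queued++;
	}
	else
	{
		pkt_unref();
	}
	return true;
}

/* Application side: one successful read releases one slot. */
static bool drain_one(struct sock_queue *q)
{
	if (q->queued == 0)
	{
		return false;
	}
	q->queued--;
	pkt_unref();
	return true;
}
```

In this model, traffic to a closed port never pins slots, but an open port whose queue is never read eventually holds the entire pool. From then on, stopping traffic changes nothing: only reading (or closing) that queue frees the slots.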

Why could this be occurring?
In particular, why does the network stack stop freeing my packets?

Screenshots after the buffer filled: (images not available)

Screenshots before the buffer filled: (images not available)

  • Hi, sorry for the late response.

    I think the issue is something else. I only open one socket, which I keep open. After receiving data from the network stack with zsock_recvfrom(), I put it into a FIFO buffer. This buffer does not seem to fill up, and the receiving thread keeps running. As far as I can tell, it blocks in zsock_recvfrom().

    Even if the buffer were to fill, the other threads should be able to empty it over time, and some messages should be processed, right? I have tested this with multiple priority settings, and the results seem the same.

    I need a way to recover from a filled buffer, and to prevent it from filling up to the point where it can no longer be emptied.

    The Receive Thread:

    	while (true)
    	{
    		LOG_INF("\x1B[0;35m" "receiving" "\x1b[0m");
    		ret = zsock_recvfrom(socketVal, recv_data.buffer, sizeof(recv_data.buffer), 0,
    				     (struct sockaddr *)&recv_data.sender_addr,
    				     (socklen_t *)&recv_data.sender_addr_len);
    		if (ret > 0)
    		{
    			Count++;
    			recv_data.buffer_size = ret;
    			if (recv_data.buffer_size >= sizeof(recv_data.buffer))
    			{
    				/* The datagram filled the whole buffer, so it may have been truncated. */
    				LOG_ERR("Wrong buffer size!");
    				/* Note: this stalls the receive loop for 1 s while packets keep arriving. */
    				k_sleep(K_SECONDS(1));
    			}
    
    			Message message;
    
    			ret = handleEthernetInput(recv_data.buffer, recv_data.buffer_size, &message);
    			if (ret != 0)
    			{
    				LOG_ERR("Protocol not correct");
    				/* Note: returning ends this thread, so nothing reads the socket afterwards. */
    				return;
    			}
    		}
    	}

  • Hi, 

    There's no error handling for zsock_recvfrom(). One possibility is that the socket starts reporting errors for some reason and the thread gets stuck in an error loop.
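    To illustrate the missing error handling, here is a minimal sketch of one receive-loop iteration, written against POSIX sockets so it runs on a host. Zephyr's zsock_recvfrom() follows the same convention of returning -1 and setting errno. The recv_once() helper and the recv_result enum are hypothetical names. The sketch also resets the address length before every call, since that parameter is value-result.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Outcome of one receive attempt (hypothetical classification). */
enum recv_result { RECV_DATA, RECV_TIMEOUT, RECV_ERROR };

/* One iteration of a receive loop. The address length is a value-result
 * argument, so it is reset to the full structure size before every call. */
static enum recv_result recv_once(int sock, char *buf, size_t buf_size,
				   struct sockaddr_in *from, ssize_t *out_len)
{
	socklen_t addr_len = sizeof(*from);
	ssize_t ret = recvfrom(sock, buf, buf_size, 0, (struct sockaddr *)from, &addr_len);

	if (ret >= 0)
	{
		*out_len = ret;
		return RECV_DATA;
	}
	if (errno == EAGAIN || errno == EWOULDBLOCK)
	{
		/* Only happens with SO_RCVTIMEO set or a non-blocking socket. */
		return RECV_TIMEOUT;
	}
	/* Anything else is a real error: the caller should log errno and back
	 * off instead of silently calling recvfrom() again in a tight loop. */
	return RECV_ERROR;
}
```

    A loop built on this helper can log RECV_ERROR together with errno, which would reveal whether the socket has fallen into an error loop.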

    The logic in the stack is pretty simple: if a received packet matches one of the open ports, it is queued for the target net_context/socket; otherwise, it is dropped. A few suggestions:

    • use the net conn shell command to check that there are no stray ports open and accumulating packets,

    • or, in your receive thread, set a receive timeout on the socket (enable CONFIG_NET_CONTEXT_RCVTIMEO and call setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, ...)) to see whether zsock_recv() reports that no data is available (it returns -1 with errno set to EAGAIN), or whether something else, for instance a higher-priority thread busy-looping, is preventing the receive thread from running.
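    The timeout check from the second suggestion can be prototyped on a host first. This sketch uses POSIX calls; on Zephyr, the same SO_RCVTIMEO option applies once CONFIG_NET_CONTEXT_RCVTIMEO is enabled. The helpers open_udp_with_timeout() and recv_times_out(), and the 100 ms value, are hypothetical choices for illustration.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Open a UDP socket on 127.0.0.1 (ephemeral port) with a receive timeout.
 * Returns the socket descriptor, or -1 on error. */
static int open_udp_with_timeout(long timeout_ms)
{
	int sock = socket(AF_INET, SOCK_DGRAM, 0);
	if (sock < 0)
	{
		return -1;
	}

	struct timeval tv = {
		.tv_sec = timeout_ms / 1000,
		.tv_usec = (timeout_ms % 1000) * 1000,
	};
	struct sockaddr_in addr = { .sin_family = AF_INET, .sin_port = 0 };
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

	if (setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0 ||
	    bind(sock, (struct sockaddr *)&addr, sizeof(addr)) < 0)
	{
		close(sock);
		return -1;
	}
	return sock;
}

/* True if a receive with nothing queued gave up with EAGAIN, meaning the
 * thread did run and simply found the socket's queue empty. */
static bool recv_times_out(int sock)
{
	char buf[8];
	ssize_t ret = recv(sock, buf, sizeof(buf), 0);

	return ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK);
}
```

    On the target, a receive thread that keeps logging EAGAIN is running but finding its queue empty; one that never returns from the call even with a timeout set suggests it is not being scheduled at all.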

    But in general, I'm afraid it would be really hard to help much here. I have heard of no cases of network packets not being freed that were not caused by application errors. If there were a bug somewhere in the stack, I'm fairly sure it would already have been reported upstream in Zephyr. Debugging issues like this requires a holistic approach, ideally with a debugger involved; I'm not able to give clear and certain answers based on only a few code snippets.

    -Amanda H.
