Socket API "send" hangs when sending multiple TCP messages

Hi, 

   I am developing an application that requires communication between two nRF52840-based devices. In one scenario the devices need to exchange multiple TCP packets; however, after transmitting nearly 50 messages, the send API hangs. The inter-message transmission interval is 70-200 ms. I am using TCP over IEEE 802.15.4 with POSIX names enabled. My application's message size is 14 bytes. I would be really thankful for your help. 

Best regards,

Omer

  • Hello, Omer!

    Are you able to provide any logs from the devices? I would also like information about which code your samples are based on, and whether you are using the nRF Connect SDK or the nRF5 SDK.

    Best regards,
    Carl Richard

  • Hi Carl, 

        Regarding logs - I am obtaining information by using printk and displaying it on a terminal. I am relatively new to Zephyr, so I do not know whether Zephyr stores any log files. If it does, I would really appreciate it if you could tell me where those log files are stored. 

       Code sample: I built my code using the socket client/server example available at zephyr/samples/net/sockets. 

       I am using the nRF Connect SDK. 

    Best regards,

    Omer

  • Hi Carl, 

       I have tested the echo-client/server example by transmitting data packets in a loop. I can now confirm that the blocking behavior is reproduced in the echo-client/server example as well. I have attached the source files and the prj.conf file (6472.prj.conf). Please note that I am using nRF Connect SDK version 1.3.0; however, I have also tested on nRF Connect SDK version 1.5.0.

    /* Client side (first attached source file) */
    #include <zephyr.h>
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    
    #include <net/socket.h>
    #include <linker/sections.h>
    #include <shell/shell.h>
    #include <net/net_core.h>
    
    #define MY_PORT 4242
    #define PEER_PORT 4242
    
    void main()
    {
        printk("Starting the Process ... \n") ; 
    	struct sockaddr_in6 addr6, addr_peer;
    	int sock_id = 0 ; 
    	int ret = 0 ;  
    	char msg[] = "HELLO\n" ; 
        char msg1[100] ; 
    	
    	memset(&addr6, 0, sizeof(addr6));
    	addr6.sin6_family = AF_INET6;
    	addr6.sin6_port = htons(MY_PORT);
    	inet_pton(AF_INET6, CONFIG_NET_CONFIG_MY_IPV6_ADDR,
    			  &addr6.sin6_addr);
    	sock_id = socket(AF_INET6, SOCK_DGRAM, IPPROTO_UDP);
    	if(sock_id == -1)
    	{
    		printk("Error in creating socket \n") ; 
    		return ; 
    	}
    	
    	ret = bind(sock_id, (struct sockaddr *) &addr6, sizeof(addr6)) ; 
    	
    	if(ret < 0)
    	{
    			printk("Error in Binding the Socket \n") ; 
    			return ; 
    	}
    	while(true)
    	{
    		(void) memset(&addr_peer, 0, sizeof(addr_peer)); 
    		addr_peer.sin6_family = AF_INET6;
    		addr_peer.sin6_port = htons(PEER_PORT);
    		inet_pton(AF_INET6, CONFIG_NET_CONFIG_PEER_IPV6_ADDR,
    			  &addr_peer.sin6_addr);
            ret = sendto(sock_id, msg, sizeof(msg), 0, (struct sockaddr *)&addr_peer, sizeof(addr_peer)) ;
    		if(ret < 0)
    		{
    			printk("Error in Transmitting the Message \n"); 
    			return ; 
    		}
    		else
    		{
    			printk("The message has been transmitted \n") ;
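    			/* Blocking receive with no timeout configured */
    			ret = recv(sock_id, (void*) msg1, sizeof(msg1) - 1, 0) ;
    			if(ret >= 0)
    			{
    				msg1[ret] = '\0' ; /* terminate before printing the reply */
    				printk("%s\n" ,msg1) ; 
    			}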
    			/*stop_time = k_cycle_get_32();
    			cycles_spent = stop_time - start_time;
    			nanoseconds_spent = k_cyc_to_ns_ceil32(cycles_spent) ; //SYS_CLOCK_HW_CYCLES_TO_NS(cycles_spent);*/
    		}
    		memset(msg1, 0, sizeof(msg1)) ; /* arguments are (dest, value, length) */
    		k_msleep(1000) ;
    	}
    	return ; 
    }

    /* Server side (second attached source file) */
    #include <zephyr.h>
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    
    #include <net/socket.h>
    #include <linker/sections.h>
    #include <shell/shell.h>
    #include <net/net_core.h>
    
    #define MY_PORT 4242
    #define PEER_PORT 4242
    
    void main()
    {
    	struct sockaddr_in6 addr6, addr_peer;
    	int sock_id = 0 ; 
    	int ret = 0 ;  
    	char msg[100] ; 
    
    	(void)memset(&addr6, 0, sizeof(addr6));
    	addr6.sin6_family = AF_INET6;
    	addr6.sin6_port = htons(MY_PORT);
    	inet_pton(AF_INET6, CONFIG_NET_CONFIG_MY_IPV6_ADDR,
    			  &addr6.sin6_addr);
    	sock_id = socket(AF_INET6, SOCK_DGRAM, IPPROTO_UDP);
    	if(sock_id == -1)
    	{
    		printk("Error in creating socket \n") ; 
    		return ; 
    	}
    	
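    	ret = bind(sock_id, (struct sockaddr *) &addr6, sizeof(addr6)) ;
    	if(ret < 0)
    	{
    		printk("Error in Binding the Socket \n") ; 
    		return ; 
    	}

    	/* Zero the peer address before filling it in, as the client does */
    	(void)memset(&addr_peer, 0, sizeof(addr_peer));
    	addr_peer.sin6_family = AF_INET6;
    	addr_peer.sin6_port = htons(PEER_PORT);
    	inet_pton(AF_INET6, CONFIG_NET_CONFIG_PEER_IPV6_ADDR,
    			  &addr_peer.sin6_addr);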
    	
    	while(true)
    	{ 
    		/* Wait for a message from the client and echo it back */
    		ret = recv(sock_id, (void*) msg, sizeof(msg) - 1, 0) ; 
    		if(ret < 0)
    		{
    			printk("Error in Receiving a Message \n"); 
    			return ; 
    		}
    		else
    		{
    			msg[ret] = '\0' ; /* terminate so printk("%s") stays in bounds */
    			printk("%s\n", msg) ; 
    			/* Echo only the bytes that were actually received */
    			ret = sendto(sock_id, msg, ret, 0, (struct sockaddr *)&addr_peer, sizeof(addr_peer)) ;
    			if(ret < 0 )
    			{
    					printk("Error in transmitting message \n") ; 
    			}
    		}
    	}
    	close(sock_id) ; 
    	return ; 
    }
     

  • Hello again, Omer!

    Apologies for the delayed answer and thanks for the additional information! It seems like this might be a bug on our side. I'll report this internally and will get back to you soon. 

    Best regards,
    Carl Richard

  • Hi again!

    The developers are looking into this now. In the meantime, could you try increasing the following buffers and see if that gives you better results?

    CONFIG_NET_PKT_RX_COUNT
    CONFIG_NET_PKT_TX_COUNT
    CONFIG_NET_BUF_RX_COUNT
    CONFIG_NET_BUF_TX_COUNT
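
As a rough illustration, the four options could be raised in prj.conf along the following lines. The numbers are placeholders picked for this example, not values recommended anywhere in the thread; suitable settings depend on the available RAM and the traffic pattern.

```
# Illustrative values only (assumed for this example, not from the thread)
CONFIG_NET_PKT_RX_COUNT=16
CONFIG_NET_PKT_TX_COUNT=16
CONFIG_NET_BUF_RX_COUNT=64
CONFIG_NET_BUF_TX_COUNT=64
```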

    And for some clarification: I see that TCP isn't enabled in the echo samples you tested last. Hence, we can assume that this isn't related to TCP/UDP operation but rather to some lower-level issue. Do you agree?

    Best regards,
    Carl Richard

  • Hi Carl, 

        Thanks for the message. I did try to increase the buffers you have mentioned, however I still observe the same behavior. 

     

        TCP/UDP: In my own application, I am using TCP and my application hangs. In the echo client server application, I am using UDP, and these applications also hang. Therefore, we cannot say that there is no issue with TCP/UDP implementation. 

    Best regards,

    Omer

  • Hi again, Omer!

    I've gotten some feedback from the developers. Listing it here:

    • For many devices the chosen entropy node is CryptoCell by default. This can slow down applications immensely. You can change this by adding an nrf52840dk_nrf52840.overlay file containing the code below to each of your applications.
       
      / {
      	/*
      	* In some default configurations within the nRF Connect SDK,
      	* e.g. on nRF52840, the chosen zephyr,entropy node is &cryptocell.
      	* This devicetree overlay ensures that default is overridden wherever it
      	* is set, as this application uses the RNG node for entropy exclusively.
      	*/
      	chosen {
      		zephyr,entropy = &rng;
      	};
      };
    • They suggest applying the changes below to the overlay-802154.conf of the applications to ensure correct operation on the nRF devices.

      diff --git a/samples/net/sockets/echo_client/overlay-802154.conf b/samples/net/sockets/echo_client/overlay-802154.conf
      index 2fc07cf685..5707ebbffe 100644
      --- a/samples/net/sockets/echo_client/overlay-802154.conf
      +++ b/samples/net/sockets/echo_client/overlay-802154.conf
      @@ -1,7 +1,8 @@
       CONFIG_BT=n
       
       # Disable TCP and IPv4 (TCP disabled to avoid heavy traffic)
      -CONFIG_NET_TCP=n
      +CONFIG_NET_TCP=y
      +CONFIG_NET_UDP=n
       CONFIG_NET_IPV4=n
       
       CONFIG_NET_CONFIG_NEED_IPV6=y
      @@ -14,5 +15,6 @@ CONFIG_NET_CONFIG_PEER_IPV6_ADDR="2001:db8::1"
       CONFIG_NET_L2_IEEE802154=y
       CONFIG_NET_L2_IEEE802154_SHELL=y
       CONFIG_NET_L2_IEEE802154_LOG_LEVEL_INF=y
      +CONFIG_NET_L2_IEEE802154_FRAGMENT_REASS_CACHE_SIZE=8
       
       CONFIG_NET_CONFIG_IEEE802154_CHANNEL=26
    • Lastly, they noticed something (at least in your echo client/server test) that may lead to hangs/instability. Quoting: 
      "Their client app, after sending the UDP packet, enters the recv() function in a blocking manner, w/o any timeout configured. This means that the client assumes that it'll always receive a response from the server. This is not a correct approach in general, but it's even worse in lossy networks like 802.15.4, where it's not that uncommon that either the request or the response may be lost. Especially since Zephyr's 802.15.4 MAC does not use the ACK mechanism by default (it has to be enabled explicitly in the config file)."
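
One way to avoid the unbounded wait described in the quote is to poll the socket with a timeout before calling recv(). The following is a minimal sketch only: recv_with_timeout() is a hypothetical helper name, not something from the thread or from Zephyr, and it is written against the standard POSIX headers so it can be tried on a host. On Zephyr with POSIX names enabled, the assumption is that the host includes would be replaced by net/socket.h as in the posted sources.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not from the thread): wait up to timeout_ms for
 * incoming data, then read it. Returns the number of bytes received, or
 * -1 with errno set (ETIMEDOUT when nothing arrived in time).
 */
ssize_t recv_with_timeout(int sock, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = sock, .events = POLLIN };
    int ready = poll(&pfd, 1, timeout_ms);

    if (ready < 0) {
        return -1;            /* poll() itself failed; errno is already set */
    }
    if (ready == 0) {
        errno = ETIMEDOUT;    /* no reply in time; the caller can resend */
        return -1;
    }
    return recv(sock, buf, len, 0);
}
```

In the client loop, a timeout would then be handled by resending the request or moving on to the next iteration instead of blocking forever on a reply that was lost.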

    Please make sure to take the above into account!

    If anything is unclear, please reach out!

    Best regards,
    Carl Richard

  • Hi Carl, 

        Thanks a lot for the detailed response. I was using UDP for the echo client/server application, and yes, reliability can be an issue. However, I did enable layer 2 ACKs by adding "CONFIG_NET_L2_IEEE802154_ACK_REPLY=y" to the prj.conf file. Based on the Zephyr documentation, I think this is the way to enable layer 2 ACKs, right? With all the modifications to the prj.conf and overlay files that you suggested in your recent answer, the problem still exists. 

      Second, the original application that I am developing is based on TCP. TCP is a reliable protocol, i.e., if a message gets lost then TCP will automatically retransmit it. Hence, the hanging issue is not due to lost packets.   

      Third, I modified the echo client/server applications so that they use TCP instead of UDP for communication. Again, I also incorporated the changes you suggested in the prj.conf and overlay files; however, the hanging issue still exists (I am attaching all the sources to this message). The messages are exchanged for a few minutes, but afterwards the system hangs. 

      I suspect there might be an issue in the net_pkt buffer allocation. Maybe buffers are not correctly deallocated, which then causes a problem during later buffer allocations. My suggestion would be to run the attached code so that your team can identify the problem. 

    Best regards,

    Omer

    Attachments: my_echo_server.zip, my_echo_client.zip

  • Hello again, Omer!

    We have not managed to reproduce your issue with the applications you attached. Our tests were done using NCS v1.5.0. What version are you running?

    The developers also noted that setting the ACK should be done differently when using the nRF driver for 802.15.4. To do this at runtime you need to configure the L2 so that it sets the ACK request bit in the outgoing frames:

    net_mgmt(NET_REQUEST_IEEE802154_SET_ACK, iface, NULL, 0);

    For testing purposes this can also be done using the ieee802154 shell command: ieee802154 ack set

    Could you also share the HW version of your nRF52840 DKs and the file "<project_root>/zephyr/include/generated/autoconf.h"?

    Best regards,
    Carl Richard

  • Hi Carl, 

       Thanks very much for your reply and help. I was using NCS v1.3.0, and I have now switched to NCS v1.5.0. In my initial tests, the application no longer hangs when built with NCS v1.5.0. I will do more rigorous testing, and if I find anything I will get in touch. 

    I tried to use net_mgmt(NET_REQUEST_IEEE802154_SET_ACK, iface, NULL, 0) to enable L2 ACKs; however, I got compile-time errors. One of the errors is that "NET_REQUEST_IEEE802154_SET_ACK" is not defined. Could you please tell me which header files I need to include so that "NET_REQUEST_IEEE802154_SET_ACK" can be found? Second, could you please also tell me how I can obtain "iface" on the nRF52840 DK so that I can pass it to the net_mgmt function? 

    Once again thank you very much for your help. 

    Best regards,

    Omer

  • Hello again, Omer!

    Apologies for the slow response from me. I hope your testing has gone well! To use the net management API you must include net_mgmt.h and enable it in prj.conf by adding CONFIG_NET_MGMT=y. To obtain "iface" the developers suggest that the easiest way will be to use net_if_get_ieee802154() function (available in net/net_if.h). 
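
Putting the pieces from this thread together, the call could look roughly like the sketch below. Two things are assumptions rather than statements from the thread: the helper name enable_l2_ack_requests() is hypothetical, and the request macro NET_REQUEST_IEEE802154_SET_ACK is assumed to come from net/ieee802154_mgmt.h in the Zephyr tree of that SDK generation, so it is worth checking against your SDK version. The #else branch exists only so the snippet also compiles off-target.

```c
#include <errno.h>

#ifdef __ZEPHYR__
#include <net/net_if.h>            /* net_if_get_ieee802154() */
#include <net/net_mgmt.h>          /* net_mgmt(), needs CONFIG_NET_MGMT=y */
#include <net/ieee802154_mgmt.h>   /* assumed home of NET_REQUEST_IEEE802154_SET_ACK */

/* Hypothetical helper: ask the 802.15.4 L2 to set the ACK request bit
 * in outgoing frames. Returns 0 on success or a negative errno value. */
int enable_l2_ack_requests(void)
{
	struct net_if *iface = net_if_get_ieee802154();

	if (iface == NULL) {
		return -ENODEV;    /* no 802.15.4 interface found */
	}
	return net_mgmt(NET_REQUEST_IEEE802154_SET_ACK, iface, NULL, 0);
}
#else
/* Host-side stub so the sketch builds outside a Zephyr project */
int enable_l2_ack_requests(void)
{
	return -ENOTSUP;
}
#endif
```

A call early in main(), before the first send, with the return value printed via printk would show whether the request was accepted.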

    Hope this is clear and please reach out if you face any other issues.

    Best regards,
    Carl Richard
