aws_iot sample: the purpose of delayed connect_work

I have been trying the aws_iot sample in nRF Connect SDK v2.7.0 for build target nrf7002dk/nrf5340/cpuapp, in which connect_work is scheduled with a 5 second delay in the function on_net_event_l4_connected. What is the purpose of this delay?

As a test, I replaced the 5-second delay with K_NO_WAIT and noticed that aws_iot_connect returns error -116 more often (although this error also occasionally occurs with the K_SECONDS(5) delay). Is this the reason for the 5-second delay? Is it recommended to always have some delay between a NET_EVENT_L4_CONNECTED event and calling aws_iot_connect?

  • Hi,

    By default, the sample waits for layer 4 (L4) to be connected, but this does not mean that everything you need (DNS, DHCP, etc.) is in place yet.

    Instead of using a delay, you can wait for a DNS server to be added, for instance, or for DHCP to be bound, as is done in the TWT sample:

    https://github.com/nrfconnect/sdk-nrf/blob/v2.6.1/samples/wifi/twt/src/main.c#L408-L410

    The events you can listen to are listed here:

    https://github.com/nrfconnect/sdk-zephyr/blob/v3.5.99-ncs1/include/zephyr/net/net_event.h

    Kind regards,

    Håkon

  • I have observed NET_EVENT_DNS_SERVER_ADD events, but not NET_EVENT_IPV4_DHCP_BOUND events. By inspecting zephyr/subsys/net/lib/dns/resolve.c, I found that the NET_EVENT_DNS_SERVER_ADD event comes with a pointer to a struct sockaddr; let's call this dns_server. It seems that if aws_iot_connect is attempted after only a NET_EVENT_DNS_SERVER_ADD event with dns_server->sa_family == AF_INET6, then aws_iot_connect fails with

    [00:00:22.392,333] <err> mqtt_helper: mqtt_connect, error: -116
    [00:00:22.401,123] <err> aws_iot: mqtt_helper_connect, error: -116
    [00:00:22.410,186] <err> aws_iot_sample: aws_iot_connect, error: -116

    and if aws_iot_connect is attempted after only a NET_EVENT_DNS_SERVER_ADD event with dns_server->sa_family == AF_INET, then aws_iot_connect fails with

    [00:00:23.339,416] <err> mqtt_helper: getaddrinfo() failed, error -11
    [00:00:23.348,724] <err> aws_iot: mqtt_helper_connect, error: 11
    [00:00:23.357,604] <err> aws_iot_sample: aws_iot_connect, error: 11

    Calling aws_iot_connect is successful if it is attempted only after both kinds of NET_EVENT_DNS_SERVER_ADD events have been generated.

  • Hi,

    Sorry, I see the same issue as you regarding the DNS events. If your router reports several IPs, you cannot use the DNS event.

    Could you try the DHCP bound event and see if this works better on your end? On my end, it arrives at approximately the same time as the L4-connected event, and the connection procedure always goes through here. Your timing might be different.

     

    Kind regards,

    Håkon

  • At first I tried concatenating NET_EVENT_IPV4_DHCP_BOUND to the L4_EVENT_MASK and adding a corresponding log message to l4_event_handler, but did not see evidence of NET_EVENT_IPV4_DHCP_BOUND events. Eventually I was able to see NET_EVENT_IPV4_DHCP_BOUND events by using a separate callback for them. With a little testing I have observed up to a 10 second delay between the l4-connected event and NET_EVENT_IPV4_DHCP_BOUND event. AWS connection seems to work when attempted after NET_EVENT_IPV4_DHCP_BOUND without adding a delay.
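A minimal sketch of the gating idea described above, written as plain C so it can be compiled on its own: aws_iot_connect is triggered only once both the L4-connected event and the DHCP-bound event have been seen. The names connect_gate, connect_gate_set, connect_gate_reset and the READY_* flags are hypothetical and are not part of the sample or of Zephyr. In the sample, the two flags would presumably be set from the two separately registered event callbacks, and the reset would be called on disconnect.

```c
#include <stdbool.h>

/* Hypothetical readiness flags, one per prerequisite event. */
enum net_ready_flag {
	READY_L4_CONNECTED = 1u << 0, /* set from the L4 event handler   */
	READY_DHCP_BOUND   = 1u << 1, /* set from the DHCP event handler */
};

#define READY_ALL (READY_L4_CONNECTED | READY_DHCP_BOUND)

/* Hypothetical gate state: which events were seen, and whether the
 * connect attempt has already been triggered for this connection.
 */
struct connect_gate {
	unsigned int flags;
	bool fired;
};

/* Record one event. Returns true exactly once per connection, at the
 * moment the last missing prerequisite arrives, regardless of the order
 * in which the events come in.
 */
static bool connect_gate_set(struct connect_gate *gate, unsigned int flag)
{
	gate->flags |= flag;

	if (!gate->fired && (gate->flags & READY_ALL) == READY_ALL) {
		gate->fired = true;
		return true;
	}

	return false;
}

/* Forget everything, e.g. when the network connection is lost. */
static void connect_gate_reset(struct connect_gate *gate)
{
	gate->flags = 0;
	gate->fired = false;
}
```

Each event handler would call connect_gate_set() with its own flag and schedule connect_work without a delay only when the call returns true. Because the gate does not depend on event order, it covers both the case where the two events arrive milliseconds apart and the case where DHCP is bound several seconds after L4 is connected.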

  • Hi,

    Johan Kopra said:
    At first I tried concatenating NET_EVENT_IPV4_DHCP_BOUND to the L4_EVENT_MASK and adding a corresponding log message to l4_event_handler, but did not see evidence of NET_EVENT_IPV4_DHCP_BOUND events. Eventually I was able to see NET_EVENT_IPV4_DHCP_BOUND events by using a separate callback for them.

    My apologies, I should have given an example here, but I'm glad to hear that you found a solution to this.

    Johan Kopra said:
    With a little testing I have observed up to a 10 second delay between the l4-connected event and NET_EVENT_IPV4_DHCP_BOUND event. AWS connection seems to work when attempted after NET_EVENT_IPV4_DHCP_BOUND without adding a delay.

    Glad to hear that the initial connection is more stable now. This is an interesting find. I tested a couple of networks, and got both of these events almost simultaneously (milliseconds apart). I'll report your findings back to the team responsible for these samples.

    Kind regards,

    Håkon
