
Cannot see any CoAP packets on sniffer when in Leader role

[development software, with versions]

  1. nRF5_SDK_for_Thread_and_Zigbee_v1.0.0
  2. RaspPi_OT_Border_Router_Demo_v1.0.0-1.alpha
  3. NCP example located in <InstallFolder>/examples/thread/ncp/uart/hex/nrf52840_xxaa.hex
  4. nRF52840-PDK
  5. Raspberry Pi connected through an Ethernet cable to my switch, which provides IPv4 connectivity with DHCP service.

I've asked this question before. I'd like to create a similar application using nRF5_SDK_for_Thread_and_Zigbee_v1.0.0. My application is based on ble_app_blinky_c. I added Thread protocol support to it by following "Adding dynamic multiprotocol Thread support to BLE examples", and I call thread_coap_utils_cloud_data_update(...) from on_adv_report(...). In thread_instance_init(...), I set

.role                  = RX_OFF_WHEN_IDLE,

then built and ran it. After that, OT_DEVICE_ROLE becomes Child and it works (I can see CoAP packets on the sniffer). So I then set

.role                  = RX_ON_WHEN_IDLE,

then built and ran it. After that, OT_DEVICE_ROLE becomes Child, Router, or Leader. When OT_DEVICE_ROLE is Child or Router, my application works (I can see CoAP packets on the sniffer). But when OT_DEVICE_ROLE becomes Leader, I cannot see any CoAP packets on the sniffer (even though thread_coap_utils_cloud_data_update(...) is still called many times). In this case, my application starts as OT_DEVICE_ROLE_CHILD, after a short time changes to Router, and after another short time changes to Leader (at which point the CoAP packets disappear from the sniffer).

Is this behavior expected? Can anyone help me?

By the way, is there any way to limit OT_DEVICE_ROLE to Child and Router? SetLeaderRoleEnabled = false, or something like that...
Thank you.

EDIT: I've uploaded ble_app_scan_c_th.zip .

EDIT2: I've uploaded main.c_20180831.zip .

  • Hi,

    I do not think it is possible to disable the Leader role, as this is needed if you are the only Router in the network.

    Note: There is always a single Leader in each Thread network partition.

    Have you checked if you get any error codes when the device is in the Leader role? It sounds very strange that the role should affect the ability to send CoAP packets.

    Can you please upload your project so we can try to reproduce the issue?

    Best regards,
    Jørgen

  • Hi,

    I'm sorry for the slow reply. I have tried to reproduce the issue, but I'm having trouble getting your example to connect correctly with the border router. When pinging the BR or the Google IPv4 DNS, it takes a very long time before the answer is received (~100 seconds). Have you seen this issue with your setup?

    > ping fdaa:bb:1::1
    > ping fdde:ad00:beef:0:c684:3295:c65f:97e0
    > ping fdde:ad00:beef:0:c684:3295:c65f:97e0
    > ping fdde:ad00:beef:0:c684:3295:c65f:97e0
    > ping fdde:ad00:beef:0:c684:3295:c65f:97e0
    > ping fdde:ad00:beef:0:c684:3295:c65f:97e0
    > 8 bytes from fdaa:bb:1:0:0:0:0:1: icmp_seq=1 hlim=64 time=133121ms
    8 bytes from fdde:ad00:beef:0:c684:3295:c65f:97e0: icmp_seq=2 hlim=64 time=100213ms
    8 bytes from fdde:ad00:beef:0:c684:3295:c65f:97e0: icmp_seq=3 hlim=64 time=96772ms
    8 bytes from fdde:ad00:beef:0:c684:3295:c65f:97e0: icmp_seq=4 hlim=64 time=95663ms
    8 bytes from fdde:ad00:beef:0:c684:3295:c65f:97e0: icmp_seq=5 hlim=64 time=94768ms
    8 bytes from fdde:ad00:beef:0:c684:3295:c65f:97e0: icmp_seq=6 hlim=64 time=93881ms
    
    > ping fdde:ad00:beef:0:c684:3295:c65f:97e0
    > ping 64:ff9b::0808:0808
    > 8 bytes from fdde:ad00:beef:0:c684:3295:c65f:97e0: icmp_seq=7 hlim=64 time=215865ms
    8 bytes from 64:ff9b:0:0:0:0:808:808: icmp_seq=8 hlim=116 time=116348ms

    When testing the CLI example with the same border router, the responses come "immediately":

    > ping 64:ff9b::0808:0808
    > 8 bytes from 64:ff9b:0:0:0:0:808:808: icmp_seq=1 hlim=116 time=58ms
    > ping fdde:ad00:beef:0:c684:3295:c65f:97e0
    > 8 bytes from fdde:ad00:beef:0:c684:3295:c65f:97e0: icmp_seq=2 hlim=64 time=54ms

    I will try to reproduce the issue further today. If I'm not able to, I will ask our Thread developers if they have any suggestions.

    Best regards,
    Jørgen

  • Is the node running this FW still part of a larger Thread network when it is in the leader state (i.e., are you seeing other routers/children when you run the 'router list'/'child list' CLI commands)? I can't get two boards running your FW to form one network without involving other nodes. They either both stay in leader state, or go to router/child state when I involve one node running the CLI FW from the SDK. If the node is no longer part of the Thread network when you try to send the CoAP packet, that might be the reason you can't sniff the CoAP packets anymore (the receiver is in another network, and the packets are not transmitted because there is no receiver).

    If you can confirm/reject that you are seeing the same behavior, I can try to debug this further.

  • I've tried the 'router list'/'child list' CLI commands on the node. These are the results:

    > state
    child
    Done
    > router list
    28
    Done
    > child list
    
    Done
    > state
    router
    Done
    > router list
    28 58
    Done
    > child list
    
    Done
    > state
    leader
    Done
    > router list
    58
    Done
    > child list
    
    Done
    >

    According to these results (and your comment), I guess the node running this FW is not part of a larger Thread network when it is in the leader state. I think the Border Router is #28 and my test node is #58; when my test node is in the leader state, Border Router #28 is no longer in the same Thread network, so I can't sniff the CoAP packets in this state. Is there any way to fix this strange behavior?
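
    The check described above (leader state, no children, and a router list containing only the node's own ID) can be sketched in C. This is only an illustration of the reasoning: node_role_t and is_alone_in_partition() are hypothetical names, not part of the SDK or OpenThread, and in the application the inputs would have to come from the equivalents of the 'state', 'router list' and 'child list' CLI commands.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical role enum for this sketch only; the application itself
 * would use the OT_DEVICE_ROLE_* values. */
typedef enum { ROLE_CHILD, ROLE_ROUTER, ROLE_LEADER } node_role_t;

/* Returns true when the node looks like the only member of its partition:
 * it is leader, has no children, and every entry in the router list is
 * its own router ID. */
bool is_alone_in_partition(node_role_t     role,
                           const uint8_t * p_router_ids,
                           size_t          router_count,
                           uint8_t         own_router_id,
                           size_t          child_count)
{
    if (role != ROLE_LEADER || child_count > 0)
    {
        return false;
    }

    for (size_t i = 0; i < router_count; i++)
    {
        if (p_router_ids[i] != own_router_id)
        {
            return false; /* another router shares this partition */
        }
    }

    return true;
}
```

    With the CLI output above, the router state (router list "28 58", own ID 58) gives false, while the leader state (router list "58", no children) gives true.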

  • Can you try replacing the call to nrf_pwr_mgmt_run() with thread_sleep() in idle_state_handle()? This seems to resolve the issue in my tests.
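
    A minimal sketch of that change, assuming idle_state_handle() only needs to put the CPU to sleep. To keep the snippet compilable off-target, thread_sleep() is replaced here by a host-side stand-in that just counts calls; on the nRF52840 the real function from the SDK's Thread utilities would be used and the stand-in and counter dropped.

```c
/* Host-side stand-in (assumption for this sketch): counts calls instead of
 * sleeping. On target, use the SDK's thread_sleep() instead. */
int g_thread_sleep_calls = 0;

void thread_sleep(void)
{
    g_thread_sleep_calls++;
}

/* Idle handler with the suggested replacement applied. */
void idle_state_handle(void)
{
    /* was: nrf_pwr_mgmt_run(); */
    thread_sleep();
}
```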

  • Thank you for your continued support. Have you reproduced the strange behavior? I've tried replacing the call to nrf_pwr_mgmt_run() with thread_sleep() in idle_state_handle(), but I got the same result (I can't get it working). Is there another way to fix this strange behavior?

  • In my setup, I tested with two nodes running your firmware. The boards do not seem to form a single network, as both nodes stay in leader state. I was able to solve this by replacing the call to nrf_pwr_mgmt_run() with thread_sleep() in idle_state_handle(); I was then able to send CoAP packets to the other node (I changed the destination IP in your application). If you are not connected to the Border Router and your CoAP server is located outside the Thread network, it is expected that the packets will not be sent on the air, as there is no route to the destination.

    How do you promote the state of the test board to leader state? This will not happen automatically, unless the connection to the current leader node in the network is lost.

    I have asked our Thread developers to have a look at your issue. They will try to reproduce it and get back to you as soon as possible.


  • Thank you for your reply. In my setup, I've tested with one node running my FW (one test board and one Border Router). Immediately after flashing the FW, the test board and the Border Router did not seem to form a single Thread network, as both nodes stayed in leader state (my guess). So I ran the 'routerrole disable' CLI command on the test board. Then, after confirming the test board was in child state (and CoAP packets could be seen on the sniffer; I assumed the Border Router was in leader state), I ran the 'routerrole enable' CLI command on the test board. After a short time, the test board was promoted to router state (I assumed the Border Router was still in leader state). Then, after another short time, the test board was promoted to leader state automatically. As you mentioned, this 'automatic promotion of the test board to leader state' is strange.

    I'll try your two test boards configuration (with thread_sleep()) tomorrow. I'm out of the office now.

    EDIT: I've tried your two-test-board configuration (with thread_sleep()). Unfortunately, 'automatic promotion of the test board to leader state' still happened sometimes. Compared with the one-test-board configuration, the issue seems to occur a little less often, as far as I could observe.

  • It looks like you are sending too many messages: on_adv_report() may be called very frequently (every few ms). The Thread network may not be able to handle that, either because no buffers are left free for network management packets or simply because the node transmits so often that it cannot hear other boards, so the network breaks apart into multiple partitions. That is why your board is promoted to leader: every partition has its own leader, even if it is the only node in the partition. When there are multiple Thread network partitions, there is no routing between them, and this is what you are observing. As a test, try sending this data to the cloud once every 1 s; everything should be fine then.
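
    One possible way to space the sends out is a small minimum-interval limiter, sketched below. The names send_limiter_t, send_limiter_init() and send_limiter_allow() are hypothetical helpers, not SDK functions, and the millisecond timestamp is assumed to come from whatever tick source the application already has.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: state for a minimum-interval send limiter. */
typedef struct
{
    uint32_t interval_ms;  /* minimum spacing between permitted sends */
    uint32_t last_send_ms; /* timestamp of the last permitted send */
    bool     has_sent;     /* false until the first send is permitted */
} send_limiter_t;

void send_limiter_init(send_limiter_t * p_limiter, uint32_t interval_ms)
{
    p_limiter->interval_ms  = interval_ms;
    p_limiter->last_send_ms = 0;
    p_limiter->has_sent     = false;
}

/* Returns true if a send is allowed at now_ms, and records that send.
 * The unsigned subtraction stays valid across uint32_t wrap-around. */
bool send_limiter_allow(send_limiter_t * p_limiter, uint32_t now_ms)
{
    if (p_limiter->has_sent &&
        (uint32_t)(now_ms - p_limiter->last_send_ms) < p_limiter->interval_ms)
    {
        return false; /* too soon after the previous send */
    }

    p_limiter->last_send_ms = now_ms;
    p_limiter->has_sent     = true;
    return true;
}
```

    In on_adv_report(), thread_coap_utils_cloud_data_update(...) would then only be called when send_limiter_allow() returns true for a limiter initialised with 1000 ms; all other advertising reports are simply dropped.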

  • Thank you for your reply. I'm now using the main.c from main.c_20180831.zip (already uploaded). This code is supposed to stop the BLE scan after the first advertising report is obtained, and then call thread_coap_utils_cloud_data_update(...) every 5 s. Even with this main.c, "automatic promotion of the test board to leader state" happens frequently. Is there another way to move forward?
