
Cannot see any CoAP packets on the sniffer when in the Leader role

[development software, with versions]

  1. nRF5_SDK_for_Thread_and_Zigbee_v1.0.0
  2. RaspPi_OT_Border_Router_Demo_v1.0.0-1.alpha
  3. NCP example located in <InstallFolder>/examples/thread/ncp/uart/hex/nrf52840_xxaa.hex
  4. nRF52840-PDK
  5. Raspberry Pi connected through an Ethernet cable to my switch, which provides IPv4 connectivity with DHCP.

I've asked this question before. I'd like to create a similar application using nRF5_SDK_for_Thread_and_Zigbee_v1.0.0. My application is based on ble_app_blinky_c. I've added Thread protocol support to it, following "Adding dynamic multiprotocol Thread support to BLE examples", and I call thread_coap_utils_cloud_data_update(...) from on_adv_report(...). In thread_instance_init(...), I first set

.role                  = RX_OFF_WHEN_IDLE,

then built and ran it. After that, OT_DEVICE_ROLE becomes Child and it works (I can see CoAP packets on the sniffer). Next, I set

.role                  = RX_ON_WHEN_IDLE,

then built and ran it again. After that, OT_DEVICE_ROLE becomes Child, Router, or Leader. When it is Child or Router, my application works (I can see CoAP packets on the sniffer). But when it becomes Leader, I cannot see any CoAP packets on the sniffer, even though thread_coap_utils_cloud_data_update(...) is still called many times. In this case, my application starts as OT_DEVICE_ROLE_CHILD, changes to Router after a short time, and then changes to Leader after another short time, at which point the CoAP packets disappear from the sniffer.

Is this behavior expected? Can anyone help me?

By the way, is there any way to limit OT_DEVICE_ROLE to Child and Router? Something like SetLeaderRoleEnabled = false...
Thank you.

EDIT: I've uploaded ble_app_scan_c_th.zip .

EDIT2: I've uploaded main.c_20180831.zip .

  • Hi,

    I do not think it is possible to disable the Leader role, as this is needed if you are the only Router in the network.

    Note: There is always a single Leader in each Thread network partition.

    Have you checked if you get any error codes when the device is in the Leader role? It sounds very strange that the role should affect the ability to send CoAP packets.

    Can you please upload your project so we can try to reproduce the issue?

    Best regards,
    Jørgen

  • Hi,

    I'm sorry for the slow reply. I have tried to reproduce the issue, but I'm having trouble getting your example to connect correctly with the border router. When pinging the BR or the Google IPv4 DNS, it takes very long before the answer is received (~100 seconds). Have you seen this issue with your setup?

    > ping fdaa:bb:1::1
    > ping fdde:ad00:beef:0:c684:3295:c65f:97e0
    > ping fdde:ad00:beef:0:c684:3295:c65f:97e0
    > ping fdde:ad00:beef:0:c684:3295:c65f:97e0
    > ping fdde:ad00:beef:0:c684:3295:c65f:97e0
    > ping fdde:ad00:beef:0:c684:3295:c65f:97e0
    > 8 bytes from fdaa:bb:1:0:0:0:0:1: icmp_seq=1 hlim=64 time=133121ms
    8 bytes from fdde:ad00:beef:0:c684:3295:c65f:97e0: icmp_seq=2 hlim=64 time=100213ms
    8 bytes from fdde:ad00:beef:0:c684:3295:c65f:97e0: icmp_seq=3 hlim=64 time=96772ms
    8 bytes from fdde:ad00:beef:0:c684:3295:c65f:97e0: icmp_seq=4 hlim=64 time=95663ms
    8 bytes from fdde:ad00:beef:0:c684:3295:c65f:97e0: icmp_seq=5 hlim=64 time=94768ms
    8 bytes from fdde:ad00:beef:0:c684:3295:c65f:97e0: icmp_seq=6 hlim=64 time=93881ms
    
    > ping fdde:ad00:beef:0:c684:3295:c65f:97e0
    > ping 64:ff9b::0808:0808
    > 8 bytes from fdde:ad00:beef:0:c684:3295:c65f:97e0: icmp_seq=7 hlim=64 time=215865ms
    8 bytes from 64:ff9b:0:0:0:0:808:808: icmp_seq=8 hlim=116 time=116348ms

    When testing the CLI example with the same border router, the response comes "immediately":

    > ping 64:ff9b::0808:0808
    > 8 bytes from 64:ff9b:0:0:0:0:808:808: icmp_seq=1 hlim=116 time=58ms
    > ping fdde:ad00:beef:0:c684:3295:c65f:97e0
    > 8 bytes from fdde:ad00:beef:0:c684:3295:c65f:97e0: icmp_seq=2 hlim=64 time=54ms

    I will try to reproduce the issue further today. If I'm not able to, I will ask our Thread developers if they have any suggestions.

    Best regards,
    Jørgen

  • I'm not exactly sure what could be causing this. Did you test with RX_ON or RX_OFF setting?

    I have tried to reproduce the CoAP issue, but I'm not able to see the same behavior. When testing with the address you set in your application, I can't see CoAP packets at all. When I test with the address of one of the other nodes, I'm able to see CoAP packets in all three states, with both the RX_ON and RX_OFF settings. Have you tested sending CoAP packets to another node?

  • Hi,

    Thank you for inspecting the behavior. I've confirmed that I can't see CoAP packets at all when testing with an unreachable IP address in my application. I'm sorry I didn't notice this requirement.

    Per your advice, I tested sending CoAP packets to another PC, but got the same result (the CoAP issue persists). Once the device is in the Leader role, this state usually lasts very long, so I cannot post my data for a long time. It's a big issue. What am I missing?

    Thank you.

  • I'm able to see CoAP packets in any role with RX_ON_WHEN_IDLE set. I'm not sure why you are not able to see the packets when in the Leader role, but it could be that you are giving too much radio time to the softdevice/BLE part of your application. You have a scan interval of 100 ms and a scan window of 50 ms. This will affect the Thread stack's ability to send and receive packets correctly. As described in the documentation, performance is reduced drastically even with 90% of the timeslot available for Thread. I had trouble getting two boards to join the same network when BLE scanning was enabled.

    Have you tried disabling BLE scanning, to see if you are still seeing the issue?
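
To put a rough number on the radio-time point above, here is a minimal sketch in plain C. It assumes, as a simplification, that BLE scanning claims the radio for the whole scan window in every scan interval; the real timeslot arbitration in the softdevice may differ. The function names are hypothetical and not part of the SDK.

```c
#include <stdint.h>

/* Hypothetical helper: percentage of radio time claimed by BLE scanning (0..100).
 * Assumes the radio is held for the full scan window in every scan interval.
 * Returns 0 for an invalid (zero) interval. */
static uint32_t ble_scan_duty_percent(uint32_t scan_interval_ms, uint32_t scan_window_ms)
{
    if (scan_interval_ms == 0) {
        return 0;
    }
    if (scan_window_ms > scan_interval_ms) {
        scan_window_ms = scan_interval_ms; /* the window cannot exceed the interval */
    }
    return (scan_window_ms * 100u) / scan_interval_ms;
}

/* Hypothetical helper: percentage of radio time left over for Thread. */
static uint32_t thread_radio_share_percent(uint32_t scan_interval_ms, uint32_t scan_window_ms)
{
    return 100u - ble_scan_duty_percent(scan_interval_ms, scan_window_ms);
}
```

With the values from this thread (interval 100 ms, window 50 ms), thread_radio_share_percent(100, 50) gives 50, which is well below the 90% figure mentioned above.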

  • Per your suggestion, I changed main.c to disable BLE scanning and observed the behavior. Unfortunately, I got the same CoAP issue. What could be the cause?

    This main.c is supposed to stop the BLE scan after the first advertising report is obtained, then call thread_coap_utils_cloud_data_update(...) periodically (every 5 s). Do you have time to review this main.c? I've uploaded main.c_20180831.zip.

  • Is the node running this FW still part of a larger Thread network when it is in the leader state (i.e., are you seeing other routers/children when you run the 'router list'/'child list' CLI commands)? I can't get two boards running your FW to form one network without involving other nodes. They either both stay in leader state, or end up in router/child state when I involve one node running the CLI FW from the SDK. If the node is no longer part of the Thread network when you try to send the CoAP packet, that might be the reason you can't sniff the CoAP packets anymore (the receiver is in another network, and the packets are not transmitted as there is no receiver).

    If you can confirm/reject that you are seeing the same behavior, I can try to debug this further.

  • I've tried the 'router list'/'child list' CLI commands on the node. Here are the results:

    > state
    child
    Done
    > router list
    28
    Done
    > child list
    
    Done
    > state
    router
    Done
    > router list
    28 58
    Done
    > child list
    
    Done
    > state
    leader
    Done
    > router list
    58
    Done
    > child list
    
    Done
    >

    According to these results (and your comment), I guess the node running this FW is not part of a larger Thread network when it is in the leader state. I think the Border Router is #28 and my test node is #58; when my test node is in the leader state, Border Router #28 doesn't exist in the same Thread network, so I can't sniff the CoAP packets in this state. Is there any way to fix this strange behavior?
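
The check done by eye above can be sketched as a small helper in plain C. This is only an illustration: the function name is hypothetical, and it assumes the 'router list' output is a single line of space-separated decimal router IDs, as in the log above.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: returns true if router_id appears in one line of
 * 'router list' output, e.g. "28 58". Parsing stops at the first token
 * that is not a number. */
static bool router_list_contains(const char *line, long router_id)
{
    const char *p = line;
    char *end;

    for (;;) {
        long id = strtol(p, &end, 10); /* skips leading whitespace */
        if (end == p) {
            return false;              /* no further number on this line */
        }
        if (id == router_id) {
            return true;
        }
        p = end;
    }
}
```

Applied to the log above, router_list_contains("28 58", 28) is true in the router state, while router_list_contains("58", 28) is false in the leader state, which matches the guess that #28 is no longer in the same network.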

  • Can you try replacing the call to nrf_pwr_mgmt_run() with thread_sleep() in idle_state_handle()? This seems to resolve the issue in my tests.

  • Thank you for your continued support. Have you reproduced the strange behavior? I've tried replacing the call to nrf_pwr_mgmt_run() with thread_sleep() in idle_state_handle(), but got the same result (I can't get it working). Is there another way to fix this?

  • In my setup, I tested with two nodes running your firmware. The boards do not seem to form a single network, as both nodes stay in leader state. I was able to solve this by replacing the call to nrf_pwr_mgmt_run() with thread_sleep() in idle_state_handle(); I was then able to send CoAP packets to the other node (after changing the destination IP in your application). If you are not connected to the Border Router and your CoAP server is located outside the Thread network, it is expected that the packets will not be sent on the air, as there is no route to the destination.

    How do you promote the test board to leader state? This will not happen automatically unless the connection to the current leader node in the network is lost.

    I have requested help from our Thread developers to have a look at your issue. They will try to reproduce it and get back to you as soon as possible.

  • Thank you for your reply. In my setup, I've tested with one node running my FW (one test board and one Border Router). Immediately after flashing the FW, the test board and the Border Router did not seem to form a single Thread network, as both nodes stayed in leader state (my guess). So I ran the 'routerrole disable' CLI command on the test board. Then, after confirming the test board was in child state (and CoAP packets could be seen on the sniffer; I assumed the Border Router was in leader state), I ran the 'routerrole enable' CLI command on the test board. After a short time, the test board was promoted to router state (I assumed the Border Router was still in leader state). Then, after another short time, the test board was promoted to leader state automatically. As you mentioned, this automatic promotion of the test board to leader state is strange.

    I'll try your two-test-board configuration (with thread_sleep()) tomorrow. I'm out of the office now.

    EDIT: I've tried your two-test-board configuration (with thread_sleep()). Unfortunately, the automatic promotion of the test board to leader state still happened sometimes. Compared with the one-test-board configuration, the issue seems to occur a little less often, in my observation.
