Joiner callback never being executed

Hi all,

I'm trying to integrate joiner into my project, here's my goal:

We will start one device as our commissioner, and while it is active, we will start the other devices as joiners so that they handshake with the active commissioner and obtain the network parameters. We can then enable them in the joiner callback function on success (not yet added in the code below, as for now I only want to see the output of the joiner callback function).

Currently, the workflow in my main.c is: otInstanceInitSingle -> otIp6SetEnabled -> start_joiner.

The problem is that the callback function never seems to be executed: no output is shown, and the joiner stays stuck in the Discover state.

When I simply flash the CLI sample onto two devices, commissioning works via command line input:

  • Start device A with new dataset, wait until it is leader, then start commissioner and add joiner (wildcard) with PSKd J01NME
  • Start device B, factory reset, ot ifconfig up, ot joiner start J01NME
  • Then the join is successful and I get the command line outputs. 
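
As a side check, the PSKd J01NME used above fits the Thread PSKd format (6 to 32 characters, digits and uppercase letters only, excluding I, O, Q and Z), so a malformed PSKd is unlikely to be the issue here. A minimal host-side sketch of that check follows; pskd_is_valid and pskd_self_check are hypothetical helper names for illustration, not OpenThread APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not an OpenThread API): returns true if `pskd`
 * fits the Thread PSKd format, i.e. 6 to 32 characters, each a digit or
 * an uppercase letter other than I, O, Q or Z.
 */
bool pskd_is_valid(const char *pskd)
{
    if (pskd == NULL) {
        return false;
    }

    size_t len = strlen(pskd);
    if (len < 6 || len > 32) {
        return false;
    }

    for (size_t i = 0; i < len; i++) {
        char c = pskd[i];
        bool is_digit = (c >= '0' && c <= '9');
        bool is_upper = (c >= 'A' && c <= 'Z');

        if (!is_digit && !is_upper) {
            return false;
        }
        if (c == 'I' || c == 'O' || c == 'Q' || c == 'Z') {
            return false;
        }
    }
    return true;
}

/* Usage example: the PSKd from this thread passes, a look-alike does not. */
void pskd_self_check(void)
{
    assert(pskd_is_valid("J01NME"));   /* digits 0 and 1 instead of O and I */
    assert(!pskd_is_valid("JOINME"));  /* contains the excluded letters O and I */
}
```

The same check could be run on any hard-coded PSKd before passing it to the joiner, which rules out one class of silent mismatch between the CLI procedure and the programmatic one.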

However, using my current program approach:

  • Flash CLI sample into device A, start it as usual, start as commissioner (same as the CLI approach) 
  • Flash a modified CLI sample code (shown below, implemented callback and start_joiner) onto device B
  • Check commissioner state of device A, verify that it is active
  • ot factoryreset on device B to reboot it, and it should start joiner with the defined PSKd.
  • No further logs are displayed. Once the commissioner's time limit for the joiner is reached, device A outputs Commissioner: Joiner remove.

The logs and the implementation of start_joiner and joiner_callback are shown below. The left terminal is the commissioner, and the right terminal is the joiner.

How could it be that the joiner_callback was never called? The joiner is also stuck in the Discover state forever, even after the commissioner has expired. No errors were given.

I am experimenting on two nRF54L15 DKs with SDK v2.9.1. I would appreciate any help on this, thanks.

Best regards,

Allan 

  • I would also like to follow up:

    I have also tried specifying the EUI-64 in the commissioner, but that does not work either; the behavior is the same as described above.

  • Hi Allan,

    I'll try to reproduce this on Monday. If you have edited more code than what is shown in your screenshot, please share it.

    Best regards,

    Maria

  • Hi Allan,

    Thank you for sharing the prj.conf and main.c files. I have reproduced the issue, but I was not able to find either an explanation or a fix today. I'll keep working on it over the next few days.

    Just a quick question: Do you have something vital in main.h? I was able to build fine after just commenting it out, so I'm mostly wondering if there is something in it at all.

    Best regards,

    Maria

  • Hi Maria, 

    Sorry I forgot to clarify this, but this was a piece of CLI code modified by a previous firmware engineer, and I do not know why there is a main.h either. Essentially, main.h just contains include statements, and I have already moved all the relevant ones into main.c. I will now comment out the #include of main.h as well.

    Here's main.h for reference:

    // My stuff
    #include <openthread/thread.h>
    #include <openthread/joiner.h>
    #include <openthread/instance.h>
    #include <openthread/link.h>
    #include <openthread/tasklet.h>
    #include <openthread/ip6.h>
    #include <openthread/dataset.h>
    #include <openthread/commissioner.h>
    
    void start_joiner(void);
     

    I will let you know if I make any progress or findings as well. 

    Best regards,

    Allan

  • Hi Maria, 

    I still haven't found a solution to the problem, but I would like you to know that the following sequence also works:
    1. start the commissioner, add wildcard joiner
    2. factoryreset the joiner device
    3. `ot joiner stop` on joiner device
    4. `ot ifconfig down` on joiner device
    5. `ot ifconfig up` on joiner device
    6. `ot joiner start J01NME` on joiner device

    Then the join can still be successful, and everything works out, and the success logs are displayed on the commissioner, as shown below: 




  • I ran the debugger on the joiner program, and it appears to halt with reason 3. However, the command line always remains active, so I don't think I have actually observed a fatal halt.

  • Hi Maria,

    I still haven't figured out a solution to this issue. By looking at my code and procedures, do you think it SHOULD "just work out" normally? I believe this is quite a simple program that should be straightforward to create. 

    Another strange thing that I have noticed recently is that when I run the debugger on my code, it lands at a fatal halt with reason 3. Is this expected behaviour? What does reason 3 mean in this context? I don't think this is something that affects the execution of my program though, as all of my programs run into this issue, but most of them execute without any problems. 

    I also tried starting the commissioner programmatically, but it gets stuck in the Petition state, much like the joiner gets stuck in the Discover state. I made sure that the commissioner device had become the leader before starting the commissioner.

    Here are some logs that the 2 programs produced:

    As you can see, this is quite unusual: the commissioner should petition for less than a second before it becomes active (given that it is already the leader), and the joiner should report some kind of error and return to the Idle state if it doesn't find a corresponding commissioner within the set time.

    Here are the logs when I manually interact with them, stopping their commissioner/joiner and restarting them by command line: 

    The join is successful. 

    Below is the code for starting the commissioner. Note that, for simplicity, I did not implement a commissioner callback for joiner events.

    #include <openthread/commissioner.h>
    #include <openthread/instance.h>
    #include <openthread/ip6.h>
    #include <openthread/thread_ftd.h>
    #include <openthread/platform/logging.h>
    #include <zephyr/logging/log.h>
    #include <zephyr/kernel.h>
    LOG_MODULE_REGISTER(main, LOG_LEVEL_INF);
    
    otInstance *ot_instance;
    
    
    /* Optional callback: logs each commissioner state change. */
    void commissioner_callback(otCommissionerState aState, void *aContext)
    {
        ARG_UNUSED(aContext);

        switch (aState) {
        case OT_COMMISSIONER_STATE_ACTIVE:
            LOG_INF("Commissioner started successfully");
            break;
        case OT_COMMISSIONER_STATE_DISABLED:
            LOG_INF("Commissioner disabled");
            break;
        case OT_COMMISSIONER_STATE_PETITION:
            LOG_INF("Commissioner petitioning");
            break;
        default:
            LOG_WRN("Unexpected commissioner state: %d", (int)aState);
            break;
        }
    }

    int main(void)
    {
        ot_instance = otInstanceInitSingle();
        if (ot_instance == NULL) {
            LOG_ERR("Failed to initialize OpenThread instance");
            return -1;
        }

        otError err = otIp6SetEnabled(ot_instance, true);
        if (err != OT_ERROR_NONE) {
            LOG_ERR("Failed to enable IPv6: %d", err);
            return -1;
        }

        err = otThreadSetEnabled(ot_instance, true);
        if (err != OT_ERROR_NONE) {
            LOG_ERR("Failed to enable Thread: %d", err);
            return -1;
        }

        /* The commissioner is only started once this device is the leader. */
        while (otThreadGetDeviceRole(ot_instance) != OT_DEVICE_ROLE_LEADER) {
            LOG_INF("Waiting for the device to become a leader...");
            k_msleep(4000);
        }
        LOG_INF("Device is now leader");

        /* Extra 5 s delay before starting the commissioner. */
        for (int i = 0; i < 5; i++) {
            LOG_INF("Delay time left: %d", 5 - i);
            k_msleep(1000);
        }

        /* Only the state callback is registered; no joiner-event callback. */
        err = otCommissionerStart(ot_instance, commissioner_callback, NULL, NULL);
        if (err != OT_ERROR_NONE) {
            LOG_ERR("Failed to start commissioner: %d", err);
            return -1;
        }

        while (otCommissionerGetState(ot_instance) != OT_COMMISSIONER_STATE_ACTIVE) {
            LOG_INF("Waiting for commissioner to become active...");
            k_msleep(4000);
        }

        /* Wildcard joiner (NULL EUI-64), PSKd "J01NME", 300 s timeout. */
        err = otCommissionerAddJoiner(ot_instance, NULL, "J01NME", 300);
        if (err != OT_ERROR_NONE) {
            LOG_ERR("Failed to add joiner: %d", err);
            return -1;
        }

        /* Idle forever. */
        while (1) {
            k_msleep(1000);
        }
        return 0;
    }
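
    One thing that might make a stuck state easier to spot: the two wait loops in main() above poll with no upper bound, so a commissioner that never leaves the Petition state just logs forever. Below is a minimal sketch of a bounded poll, written as plain C so it can be tried off-target. The names wait_until, poll_pred_t, poll_sleep_t, reached_after_n_polls and fake_sleep are hypothetical and are not OpenThread or Zephyr APIs. On the device, the predicate would presumably wrap a check such as otCommissionerGetState(ot_instance) == OT_COMMISSIONER_STATE_ACTIVE, and the sleep function would wrap k_msleep().

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical helper types: not part of OpenThread or Zephyr. */
    typedef bool (*poll_pred_t)(void *ctx);     /* true once the awaited state is reached */
    typedef void (*poll_sleep_t)(uint32_t ms);  /* on target: a thin wrapper around k_msleep() */

    /*
     * Poll pred() every step_ms until it returns true or at least
     * timeout_ms of sleeping has elapsed.
     * Returns true if the condition was met, false on timeout.
     */
    bool wait_until(poll_pred_t pred, void *ctx,
                    uint32_t timeout_ms, uint32_t step_ms,
                    poll_sleep_t sleep_fn)
    {
        assert(pred != NULL && sleep_fn != NULL && step_ms > 0);

        uint32_t waited = 0;

        for (;;) {
            if (pred(ctx)) {
                return true;
            }
            if (waited >= timeout_ms) {
                return false;
            }
            sleep_fn(step_ms);
            waited += step_ms;
        }
    }

    /* Host-side stand-ins, used only to exercise the helper off-target. */
    bool reached_after_n_polls(void *ctx)
    {
        int *remaining = ctx;

        /* Reports "reached" once the counter has been decremented to zero. */
        return (*remaining)-- <= 0;
    }

    void fake_sleep(uint32_t ms)
    {
        (void)ms; /* no real delay on the host */
    }
    ```

    With something like this, main() could log an error and bail out after, say, 10 seconds instead of waiting indefinitely, which would at least turn the silent hang into an explicit failure.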

    Please also note that I am conducting my experiments with the nRF54L15DK this time, instead of our custom modules, which have caused some trouble in the past.

    Thanks.

    Best regards,

    Allan

  • Hi Allan,

    Allan-led said:
    Sorry I forgot to clarify this, but this was a piece of CLI code modified by a previous firmware engineer, and I do not know why there is a main.h either.

    No worries, I just asked to be thorough. I can see that there are some include statements which are not in my main.c, so I have included them now.

    Allan-led said:

    I still haven't found a solution to the problem, but I would like you to know that the following sequence also works:
    1. start the commissioner, add wildcard joiner
    2. factoryreset the joiner device
    3. `ot joiner stop` on joiner device
    4. `ot ifconfig down` on joiner device
    5. `ot ifconfig up` on joiner device
    6. `ot joiner start J01NME` on joiner device

    Then the join can still be successful, and everything works out,

    Thank you for sharing this. It indicates that the problem lies in starting the joiner from the source code, and not in the configuration of the modified CLI sample.

    We could learn more by capturing the traffic with a sniffer. I will do this tomorrow morning and report back with my findings. If you can also capture a trace of the traffic, that would be a good bonus.

    Allan-led said:
    I ran the debugger on the joiner program, and it appears to halt with reason 3. However, the command line always remains active, so I don't think I have actually observed a fatal halt.

    I'll look out for this as well. Are you building with debug optimizations enabled before starting the debugging session?

    Allan-led said:
    Another strange thing that I have noticed recently is that when I run the debugger on my code, it lands at a fatal halt with reason 3. Is this expected behaviour? What does reason 3 mean in this context? I don't think this is something that affects the execution of my program though, as all of my programs run into this issue, but most of them execute without any problems. 

    I searched briefly for an explanation of what reason 3 means, but I did not find one. I will search some more and possibly ask one of my colleagues if they know.

    Allan-led said:
    By looking at my code and procedures, do you think it SHOULD "just work out" normally? I believe this is quite a simple program that should be straightforward to create. 

    I think there are some pieces missing, but I don't know which yet. I was considering the CONFIG_OPENTHREAD_JOINER_AUTOSTART feature, but then you won't get the custom callback.
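
    For reference, a minimal prj.conf sketch of that autostart route, assuming the standard Zephyr OpenThread Kconfig options and the PSKd used in this thread. With this, the join is started by the OpenThread integration rather than by your start_joiner, which is why your own joiner_callback would not be invoked:

    ```conf
    # Sketch only: enable the joiner role and let it start automatically at boot.
    CONFIG_OPENTHREAD_JOINER=y
    CONFIG_OPENTHREAD_JOINER_AUTOSTART=y
    # PSKd taken from this thread; it must match the one added on the commissioner.
    CONFIG_OPENTHREAD_JOINER_PSKD="J01NME"
    ```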

    Maybe the sniffer trace will shed some light on what is going on.

    Allan-led said:
    Please also note that I am conducting my experiments with the nRF54L15DK this time, instead of our custom modules, which have caused some trouble in the past.

    Noted. I am also using nRF54L15 DKs.

    Best regards,

    Maria
