Commissioner not responding to join request

nrf52840

NCS 1.9.1

We have a Thread project based on the Coap Client and Server example.  The Coap server acts as the Commissioner, and other devices join using a known pskd.  Sometimes this works fine, but after the server has been running a while (overnght) it does not respond to the join request.

A button press on the server allows joining:

static void on_button_changed(uint32_t button_state, uint32_t has_changed)
{
	uint32_t pressed = button_state & has_changed;
	uint32_t released = ~button_state & has_changed;

	// Button 1:  
	if (pressed & DK_BTN1_MSK) 
	{
		k_work_submit(&joining_work);		// calls activate_joining()
	}
}

// Called in workqueue joining_work
static void activate_joining(struct k_work *item)
{
	ARG_UNUSED(item);

	struct openthread_context *ot_context = openthread_get_default_context();
	otError err;

	otInstance *instance = ot_context->instance;

	otCommissionerState state;
	if ((state = otCommissionerGetState(instance)) == OT_COMMISSIONER_STATE_ACTIVE)
	{
		err = otCommissionerAddJoiner(instance, NULL, pskd, JOINER_TIMEOUT); 
		if (err != OT_ERROR_NONE)
		{
			printk("\r\notCommissionerAddJoiner() failed [%d]\r\n", err);
		}
		else
		{
			//k_timer_start(&led_timer, K_MSEC(100), K_MSEC(100));
			g_joining_allowed = true;

			printk("\r\nCommissioner joiner started for %d seconds...\r\n", JOINER_TIMEOUT);
		}
	}
	else
	{
		printk("\r\nCommissioner state = %d. Restarting...", state);

		err = otCommissionerStart(ot_context->instance, commissioner_state_cb, commissioner_joiner_cb, ot_context);
		if (err != OT_ERROR_NONE)
			printk("otCommissionerStart() failed [%d]\r\n", err);
		else
			start_joiner = true;		// trigger AddJoiner when "commissioner active" happens
	}
}

This happens successfully all the time.  I see the message "Commissioner joiner started for 120 seconds..." every time.

The Coap client uses auto joining by setting these configuration parameters.

# Thread Joiner
CONFIG_OPENTHREAD_JOINER=y
CONFIG_OPENTHREAD_JOINER_AUTOSTART=y
CONFIG_OPENTHREAD_JOINER_PSKD="J01NUS"
CONFIG_OPENTHREAD_MANUAL_START=n
On WireShark I can see the Discovery Request with "Joiner Flag: Intending" but there is no response from the Coap Server.  At the same time I have a device that is broadcasting Discovery Requests (without the joining flag set).  The Coap Server responds to these.  And also responds to requests for router information (otThreadGetRouterInfo() ) with this
--- Router fd70:a900:acc1:3d9f:0:ff:fe00:9000 ----

Ext Address: 0xfb58ff0e
Rloc16: 0x9000
IP6 Address List:
- fd70:a900:acc1:3d9f:0:ff:fe00:fc37    Commissioner
- fd70:a900:acc1:3d9f:0:ff:fe00:fc00    Leader
- fd70:a900:acc1:3d9f:0:ff:fe00:9000
- fd70:a900:acc1:3d9f:355e:d60e:5def:e87b    Coap Server
- fe80:0:0:0:cff:58fb:cdf3:6c77

So, the Coap Server is the Commissioner, and joining is allowed.  So, why isn't it responding to the joiner's request?

Why does this work sometimes and but not always?

Where can I look to diagnose this?

Mary

Parents
  • Hi, 

    I am looking into your case and will update it next week. 

    Regards,
    Amanda H.

  • I recompiled with this change to the prf.conf:

    CONFIG_OPENTHREAD_DEBUG=y
    #CONFIG_OPENTHREAD_DEBUG=n
    CONFIG_OPENTHREAD_LOG_LEVEL_NOTE=y
    When a client is trying to join, I saw this message on the Server:
    W: [WARN]-MLE: Failed to process Discovery Request: Security
    I see this message before I press the "Allow Joining" button, and again when it tries to join after I press the button.
    otCommissionerAddJoiner(instance, NULL, pskd, JOINER_TIMEOUT);  is not returning an error, but it doesn't seem like it worked.
    Mry
  • I'm trying to look at the Commissioning data (which includes steering data) with this call:

    otNetDataGetCommissioningDataset()
    In prj.conf, I have
    CONFIG_OPENTHREAD_FTD=y
    CONFIG_OPENTHREAD_NETDATA_PUBLISHER=y
    And, in the compiler output I see:
    - OT_NETDATA_PUBLISHER=ON --> OPENTHREAD_CONFIG_NETDATA_PUBLISHER_ENABLE=1
    But, the end result of compilation is this error:
    ../src/coap_server.c:387: undefined reference to `otNetDataGetCommissioningDataset'
    How do I include this function?
    Mary
  • Hi, 

    This function was introduced in Openthread API somewhere around November this year.  Thus this function is not available at NCS 1.9.1. You will have to move to the latest version of NCS. 

    -Amanda H.

  • As noted in my reply on Nov 9, the project was migrated to NCS 2.4.1

    Mary

  • When joining fails, I do not see this line in the log:

    [00:09:06.992,126] <inf> [I] JoinerRouter--: Joiner Router: start

    The code that creates this log message is in

    <sdk>/modules/lib/openthread/src/core/meshcop/joiner_router.cpp, line 67

    void JoinerRouter::Start(void)
    {
        VerifyOrExit(Get<Mle::MleRouter>().IsFullThreadDevice());
    
        if (Get<NetworkData::Leader>().IsJoiningEnabled())
        {
            uint16_t port = GetJoinerUdpPort();
    
            VerifyOrExit(!mSocket.IsBound());
    
            IgnoreError(mSocket.Open(&JoinerRouter::HandleUdpReceive, this));
            IgnoreError(mSocket.Bind(port));
            IgnoreError(Get<Ip6::Filter>().AddUnsecurePort(port));
            LogInfo("Joiner Router: start");
        }
        else
        {
            VerifyOrExit(mSocket.IsBound());
    
            IgnoreError(Get<Ip6::Filter>().RemoveUnsecurePort(mSocket.GetSockName().mPort));
    
            IgnoreError(mSocket.Close());
        }
    
    exit:
        return;
    }

    My main question is why wasn't joining enabled?

    How can I detect and correct this situation?

    Mary

  • It seems that there is a conversation between the Commissioner and the Leader before Joining is fully enabled.  The signal that this has completed is a Thread state change with the OT_CHANGED_THREAD_NETDATA flag set.

    So, I implemented a work around.

    When the commissioner allows joining, start a timer (joinerstart_timer) with a 2 second timeout.

    // Called in workqueue joining_work
    static void activate_joining(struct k_work *item)
    {
    	ARG_UNUSED(item);
    
    	struct openthread_context *ot_context = openthread_get_default_context();
    	otError err;
    
    	otInstance *instance = ot_context->instance;
    
    	otCommissionerState state;
    	if ((state = otCommissionerGetState(instance)) == OT_COMMISSIONER_STATE_ACTIVE)
    	{
    		openthread_api_mutex_lock(ot_context);
    		err = otCommissionerAddJoiner(instance, NULL, pskd, JOINER_TIMEOUT); 
    		openthread_api_mutex_unlock(ot_context);
    		if (err != OT_ERROR_NONE)
    		{
    			printk("\r\notCommissionerAddJoiner() failed [%d]\r\n", err);
    		}
    		else
    		{
    			g_joining_allowed = true;
    
    			printk("\r\nCommissioner joiner started for %d seconds...\r\n", JOINER_TIMEOUT);
    			printk("Commissioner SessionID: [%d]\r\n\n", otCommissionerGetSessionId(instance));
    			k_timer_start(&joinerstart_timer, K_MSEC(2000), K_MSEC(2000));
    		}
    	}
    	else
    	{
    		printk("\r\nCommissioner state = %d. Restarting...", state);
    
    		openthread_api_mutex_lock(ot_context);
    		err = otCommissionerStart(ot_context->instance, commissioner_state_cb, commissioner_joiner_cb, ot_context);
    		openthread_api_mutex_unlock(ot_context);
    		if (err != OT_ERROR_NONE)
    			printk("otCommissionerStart() failed [%d]\r\n", err);
    		else
    			g_start_joiner = true;		// trigger AddJoiner when "commissioner active" happens
    	}
    }

    If the Thread state changes with OT_CHANGED_THREAD_NETDATA, stop the timer.  Everything is ok.

    static void on_thread_state_changed(otChangedFlags flags, struct openthread_context *ot_context, void *user_data)
    {
    	otError err;
    
    	printk("\r\nThread state changed: role: %d  flags: 0x%x\r\n", otThreadGetDeviceRole(ot_context->instance), flags);
    
    	if (flags & OT_CHANGED_THREAD_ROLE) 
    	{
    		switch (otThreadGetDeviceRole(ot_context->instance)) {
    		case OT_DEVICE_ROLE_CHILD:
    		case OT_DEVICE_ROLE_ROUTER:
    		case OT_DEVICE_ROLE_LEADER:
    			ot_is_connected = true;
    
    			if (otCommissionerGetState(ot_context->instance) == OT_COMMISSIONER_STATE_DISABLED)
    			{
    				openthread_api_mutex_lock(ot_context);
    				err = otCommissionerStart(ot_context->instance, commissioner_state_cb, commissioner_joiner_cb, ot_context);
    				openthread_api_mutex_unlock(ot_context);
    				if (err != OT_ERROR_NONE)
    					printk("otCommissionerStart() failed [%d]\r\n", err);
    			}
    			break;
    
    		case OT_DEVICE_ROLE_DISABLED:
    		case OT_DEVICE_ROLE_DETACHED:
    		default:
    			ot_is_connected = false;
    			break;
    
    		} // endswitch: role
    	} // endif: role changed
    
    	if (flags & OT_CHANGED_THREAD_NETDATA)
    	{
    		k_timer_stop(&joinerstart_timer);
    	}
    }

    But, if the timer expires first.  Stop the commissioner.  

    void on_joinerstart_timeout(struct k_work *work)
    {
    	otError err;
    	struct openthread_context *ot_context = openthread_get_default_context();
    
    	printk("\r\nJoiner Start timed out. Restarting commissioner.");
    
    	// Net Data Changed was not received, restart commissioner
    	openthread_api_mutex_lock(ot_context);
    	err = otCommissionerStop(ot_context->instance);
    	openthread_api_mutex_unlock(ot_context);
    
    	if (err != OT_ERROR_NONE)
    		printk("otCommissionerStop() failed [%d]\r\n", err);
    }
    
    K_WORK_DEFINE(joinerstarttimout_work, on_joinerstart_timeout);
    
    void joinerstart_timer_handler(struct k_timer *p_timer)
    {
    	k_timer_stop(&joinerstart_timer);
    	k_work_submit(&joinerstarttimout_work);		// stops commissioner
    }

    The user will have to re-initiate joining (button press in our case), which will restart the Commissioner, then call activate_joining().

    I still don't know why the process fails occasionally, but the workaround has succeeded several times.

    Mary

Reply
  • It seems that there is a conversation between the Commissioner and the Leader before Joining is fully enabled.  The signal that this has completed is a Thread state change with the OT_CHANGED_THREAD_NETDATA flag set.

    So, I implemented a work around.

    When the commissioner allows joining, start a timer (joinerstart_timer) with a 2 second timeout.

    // Called in workqueue joining_work
    static void activate_joining(struct k_work *item)
    {
    	ARG_UNUSED(item);
    
    	struct openthread_context *ot_context = openthread_get_default_context();
    	otError err;
    
    	otInstance *instance = ot_context->instance;
    
    	otCommissionerState state;
    	if ((state = otCommissionerGetState(instance)) == OT_COMMISSIONER_STATE_ACTIVE)
    	{
    		openthread_api_mutex_lock(ot_context);
    		err = otCommissionerAddJoiner(instance, NULL, pskd, JOINER_TIMEOUT); 
    		openthread_api_mutex_unlock(ot_context);
    		if (err != OT_ERROR_NONE)
    		{
    			printk("\r\notCommissionerAddJoiner() failed [%d]\r\n", err);
    		}
    		else
    		{
    			g_joining_allowed = true;
    
    			printk("\r\nCommissioner joiner started for %d seconds...\r\n", JOINER_TIMEOUT);
    			printk("Commissioner SessionID: [%d]\r\n\n", otCommissionerGetSessionId(instance));
    			k_timer_start(&joinerstart_timer, K_MSEC(2000), K_MSEC(2000));
    		}
    	}
    	else
    	{
    		printk("\r\nCommissioner state = %d. Restarting...", state);
    
    		openthread_api_mutex_lock(ot_context);
    		err = otCommissionerStart(ot_context->instance, commissioner_state_cb, commissioner_joiner_cb, ot_context);
    		openthread_api_mutex_unlock(ot_context);
    		if (err != OT_ERROR_NONE)
    			printk("otCommissionerStart() failed [%d]\r\n", err);
    		else
    			g_start_joiner = true;		// trigger AddJoiner when "commissioner active" happens
    	}
    }

    If the Thread state changes with OT_CHANGED_THREAD_NETDATA, stop the timer.  Everything is ok.

    static void on_thread_state_changed(otChangedFlags flags, struct openthread_context *ot_context, void *user_data)
    {
    	otError err;
    
    	printk("\r\nThread state changed: role: %d  flags: 0x%x\r\n", otThreadGetDeviceRole(ot_context->instance), flags);
    
    	if (flags & OT_CHANGED_THREAD_ROLE) 
    	{
    		switch (otThreadGetDeviceRole(ot_context->instance)) {
    		case OT_DEVICE_ROLE_CHILD:
    		case OT_DEVICE_ROLE_ROUTER:
    		case OT_DEVICE_ROLE_LEADER:
    			ot_is_connected = true;
    
    			if (otCommissionerGetState(ot_context->instance) == OT_COMMISSIONER_STATE_DISABLED)
    			{
    				openthread_api_mutex_lock(ot_context);
    				err = otCommissionerStart(ot_context->instance, commissioner_state_cb, commissioner_joiner_cb, ot_context);
    				openthread_api_mutex_unlock(ot_context);
    				if (err != OT_ERROR_NONE)
    					printk("otCommissionerStart() failed [%d]\r\n", err);
    			}
    			break;
    
    		case OT_DEVICE_ROLE_DISABLED:
    		case OT_DEVICE_ROLE_DETACHED:
    		default:
    			ot_is_connected = false;
    			break;
    
    		} // endswitch: role
    	} // endif: role changed
    
    	if (flags & OT_CHANGED_THREAD_NETDATA)
    	{
    		k_timer_stop(&joinerstart_timer);
    	}
    }

    But, if the timer expires first.  Stop the commissioner.  

    void on_joinerstart_timeout(struct k_work *work)
    {
    	otError err;
    	struct openthread_context *ot_context = openthread_get_default_context();
    
    	printk("\r\nJoiner Start timed out. Restarting commissioner.");
    
    	// Net Data Changed was not received, restart commissioner
    	openthread_api_mutex_lock(ot_context);
    	err = otCommissionerStop(ot_context->instance);
    	openthread_api_mutex_unlock(ot_context);
    
    	if (err != OT_ERROR_NONE)
    		printk("otCommissionerStop() failed [%d]\r\n", err);
    }
    
    K_WORK_DEFINE(joinerstarttimout_work, on_joinerstart_timeout);
    
    void joinerstart_timer_handler(struct k_timer *p_timer)
    {
    	k_timer_stop(&joinerstart_timer);
    	k_work_submit(&joinerstarttimout_work);		// stops commissioner
    }

    The user will have to re-initiate joining (button press in our case), which will restart the Commissioner, then call activate_joining().

    I still don't know why the process fails occasionally, but the workaround has succeeded several times.

    Mary

Children
No Data
Related