nRF5 SDK for Mesh: How to reset a Bluetooth Mesh node?

Dear Nordic experts,
 
This short code snippet is supposed to reset (unprovision) a Bluetooth Mesh node and reboot it (nRF5 SDK for Mesh 5.0.0):
 

void node_reset(void)
{
    __LOG(LOG_SRC_APP, LOG_LEVEL_INFO, " >>>>> Resetting node <<<<< \n");
    if (mesh_stack_is_device_provisioned()) {
        mesh_stack_config_clear();
        mesh_stack_device_reset();
    }
    schedule_reboot(); // just waits for 2 seconds before calling sd_nvic_SystemReset(); to reboot the node.
}

 
node_reset() is called from the main() loop once an input pin is set. At the time it is called, the (custom/vendor) model is still operating. The problem is that mesh_stack_config_clear() ends up in app_error_fault_handler().
 
The following is printed to the debug terminal in SES:
 
<t: 2006463>, main.c, 252, >>>>> Resetting node <<<<<
<t: 2006553>, main.c, 213, Mesh event: NRF_MESH_EVT_CONFIG_STABLE
<t: 2006556>, main.c, 217, Mesh event: NRF_MESH_EVT_FLASH_STABLE
<t: 2006559>, main.c, 217, Mesh event: NRF_MESH_EVT_FLASH_STABLE
<t: 2006650>, main.c, 213, Mesh event: NRF_MESH_EVT_CONFIG_STABLE
<t: 2006653>, main.c, 217, Mesh event: NRF_MESH_EVT_FLASH_STABLE
<t: 2006656>, main.c, 217, Mesh event: NRF_MESH_EVT_FLASH_STABLE
<t: 2006747>, main.c, 213, Mesh event: NRF_MESH_EVT_CONFIG_STABLE
<t: 2006750>, main.c, 217, Mesh event: NRF_MESH_EVT_FLASH_STABLE
<t: 2006753>, main.c, 217, Mesh event: NRF_MESH_EVT_FLASH_STABLE
<t: 2006844>, main.c, 213, Mesh event: NRF_MESH_EVT_CONFIG_STABLE
<t: 2006847>, main.c, 217, Mesh event: NRF_MESH_EVT_FLASH_STABLE
<t: 2006850>, main.c, 217, Mesh event: NRF_MESH_EVT_FLASH_STABLE
<t: 2006869>, app_error_weak.c, 105, Mesh assert at 0x0002E65E (:0)

 
And the stacktrace looks like this:
 
app_error_fault_handler()
mesh_assertion_handler()
backend_evt_handler()
write_complete_cb()
process_action_queue()
send_end_events()
bearer_event_handler()
bearer_event_flag_set()
mesh_config_backend_record_write()
dirty_entries_process()
mesh_config_entry_set()
seqnum_block_allocate()
mesh_stack_config_clear()
node_reset()
main()

 
The questions now are:  
What's causing this issue? Or: How do you reset/unprovision and reboot a mesh node in nRF5 SDK for Mesh 5.0.0?  
 
Your help is very much appreciated,
Thank you,
Michael.

  • Hi Mike, 

    I would suggest calling mesh_stack_config_clear() instead of mesh_config_clear().

    Can you reproduce the issue using one of our examples, for instance the light switch server? It has the functionality that pressing button 4 clears the settings and resets the node. Please check the node_reset() function in main.c.

    Also, looking at how the config server handles the node reset command from the client (see handle_node_reset()), it goes through the following stages:
    typedef enum
    {
        NODE_RESET_IDLE,
        NODE_RESET_PENDING,
    #if MESH_FEATURE_GATT_PROXY_ENABLED
        NODE_RESET_PENDING_PROXY,
    #endif /* MESH_FEATURE_GATT_PROXY_ENABLED */
        NODE_RESET_FLASHING,
    } node_reset_state_t;

    So if the flash module is busy, it stays in the pending state (NODE_RESET_FLASHING) and waits for the NRF_MESH_EVT_FLASH_STABLE event before it continues with the node reset process.
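
The wait-for-flash behaviour described above can be illustrated with a small host-runnable sketch. This is only one reading of that description, not the config server's actual code: the state names, reset_request(), on_flash_stable(), and both stubs are hypothetical, and the stubs merely stand in for mesh_stack_config_clear() and mesh_stack_device_reset().

```c
#include <assert.h>

/* Hypothetical states, loosely modelled on node_reset_state_t. */
typedef enum { RESET_IDLE, RESET_FLASHING, RESET_REBOOTED } reset_state_t;

static reset_state_t m_state = RESET_IDLE;
static int m_clear_calls;   /* how often the config clear was started */
static int m_reboot_calls;  /* how often the reboot was triggered */

/* Stub standing in for mesh_stack_config_clear(). */
static void config_clear_stub(void)
{
    m_clear_calls++;
}

/* Stub standing in for mesh_stack_device_reset(). */
static void device_reset_stub(void)
{
    /* The reboot may only happen while a clear is being flashed. */
    assert(m_state == RESET_FLASHING);
    m_reboot_calls++;
}

/* Start the reset: clear the config, then wait for the flash to settle. */
void reset_request(void)
{
    if (m_state != RESET_IDLE)
    {
        return; /* a reset is already in progress */
    }
    config_clear_stub();
    m_state = RESET_FLASHING;
}

/* Intended to be called when NRF_MESH_EVT_FLASH_STABLE is received. */
void on_flash_stable(void)
{
    if (m_state != RESET_FLASHING)
    {
        return; /* unrelated flash activity, nothing to do */
    }
    device_reset_stub();
    m_state = RESET_REBOOTED;
}

reset_state_t reset_state_get(void) { return m_state; }
int clear_calls_get(void)           { return m_clear_calls; }
int reboot_calls_get(void)          { return m_reboot_calls; }
```

In this sketch the reboot is never triggered directly by the request; it only happens once the flash-stable notification arrives while a reset is pending.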

  • Hi Hung,
     
    Thanks for your help. Just a quick question:  
    Imagine a simple custom node, containing a single element with a single instance of a proprietary/vendor model plus a configuration server and health server instance.
     
    What's the recommended API to store a few hundred bytes of model-related data in flash?
    model_config_file_xxx(), mesh_config_entry_xxx(), flash_manager_xxx()? Or something else?
     
    Are the model_config_file_xxx() API calls needed for vendor (i.e. non-standard/generic) models? Would it be safe to omit those calls? (Please note: config and health server are running as well.)
    Right now, I'm using the mesh_config_entry_xxx() API, which works quite well, except for the previously mentioned fault triggered by mesh_stack_config_clear(), which makes me think that I'm most likely using the wrong API for the job...

    Any clarification is welcome,
    Thank you,
    Michael.


     

  • Hi BlueMike, 
    Please use the addr2line.exe tool to check the assert address; it's easier than looking at the code and guessing what could be wrong. Or you can send us your .elf file and I can run it here.

  • Hi Hung,

    in debug configuration, addr2line outputs:

    nrf5_SDK_for_Mesh_v5.0.0_src/mesh/core/src/mesh_config.c:322 (discriminator 1)

    Oddly enough the problem doesn't happen that often in debug configuration. In release builds, however, it does happen reliably.

    Also: Although the firmware gets trapped in app_error_fault_handler() when calling mesh_stack_config_clear(), the node is unprovisioned after rebooting it.

    Any ideas what could be wrong here, or how to fix it, are welcome. Thank you,
    Michael.

  • Hi Michael, 

    How do you initialize the mesh stack?
    If you initialize it with a given IRQ priority, then you should call mesh_stack_config_clear() at the same interrupt level. Here is what the documentation says:

    You must ensure that no mesh API functions are called from an IRQ priority other than the one specified in the configuration.

    Line 322 in mesh_config.c points to:
        NRF_MESH_ASSERT_DEBUG(*p_flags & MESH_CONFIG_ENTRY_FLAG_BUSY);

    It seems that something else could be accessing the flag while you are trying to clear it.
    Does the issue occur every time when you run in debug mode?
    Could you provide a modified version of the light switch example that exposes the same issue?

  • Hi Hung,  
     
    Thanks for providing that link!  

    So the problem is that main() runs at a different IRQ priority than the mesh stack, and hence calling mesh_stack_config_clear() triggers the error handler.

    Question now is what's the best way to fix it?  

    1. The scheduler (as explained in the link) would be an option, but I'm a little afraid of side effects, because the main loop already handles the business logic.
    2. How about the Event Generator Unit (EGU)? Would it be safe to use it to trigger a software interrupt with the same priority as the mesh stack? The software interrupt would only be used to clear the flash, unprovision the node and reboot the device, but what if the mesh stack is writing to the flash while the software interrupt wants to delete the same data? Is that scenario already handled?
    3. Or maybe something else...?

    What approach do you recommend?
     
    Again, thank you,
    Michael.

  • Hi Michael, 

    How do you initialize the mesh stack? If you initialize it with NRF_MESH_IRQ_PRIORITY_THREAD, then it's fine to call mesh_stack_config_clear() from inside main().

    I don't think the scheduler would help here, as it does the same as what you are already doing: instead of running the code in an interrupt context, it executes the code in the main() context (THREAD).

    If you initialize the mesh stack with NRF_MESH_IRQ_PRIORITY_LOWEST and your button GPIOTE event IRQ runs at the same priority level, then you just need to call mesh_stack_config_clear() directly from the interrupt handler (the same as we do in our example).

    If you have a different interrupt level and want to change the context, you can use the EGU/SWI. Please look at what we do with SWI_IRQn in the app_timer_mesh.c file to see how we trigger a software interrupt.
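
The context change described above can be sketched as a host-runnable simulation. Everything here is an assumption made for illustration: the priority values, swi_pend_sim(), reset_swi_handler(), and config_clear_stub() are hypothetical stand-ins, and no real SDK or NVIC call is made. On hardware, the simulated pend step would correspond to pending a software interrupt that is configured at the mesh stack's IRQ priority.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, used only inside this simulation. */
#define SIM_MESH_IRQ_PRIORITY  7u   /* priority the mesh stack was initialized with */
#define SIM_THREAD_PRIORITY    15u  /* stands for the main() context */

static uint32_t m_current_priority = SIM_THREAD_PRIORITY; /* simulated execution priority */
static volatile bool m_reset_requested;
static int m_clear_calls;

/* Stub standing in for mesh_stack_config_clear(). It asserts the rule from
 * the documentation: mesh API calls must come from the configured priority. */
static void config_clear_stub(void)
{
    assert(m_current_priority == SIM_MESH_IRQ_PRIORITY);
    m_clear_calls++;
}

/* Body of the software interrupt handler. */
static void reset_swi_handler(void)
{
    if (m_reset_requested)
    {
        m_reset_requested = false;
        config_clear_stub();
    }
}

/* Simulates pending the software interrupt: the handler runs at the mesh
 * priority, then execution returns to the previous context. */
static void swi_pend_sim(void)
{
    uint32_t saved = m_current_priority;
    m_current_priority = SIM_MESH_IRQ_PRIORITY;
    reset_swi_handler();
    m_current_priority = saved;
}

/* Called from main(): only records the request and pends the interrupt. */
void node_reset_from_main(void)
{
    m_reset_requested = true;
    swi_pend_sim();
}

int clear_calls_get(void)           { return m_clear_calls; }
uint32_t current_priority_get(void) { return m_current_priority; }
```

The point of the design is that main() never calls the mesh API itself; it only sets a flag and hands the work to a handler that runs at the stack's priority.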

  • Hello Hung,
     
    sorry, I was already a step ahead:  

    How do you initialize the mesh stack?

    With NRF_MESH_IRQ_PRIORITY_LOWEST, as used in the examples:

    model_config_file_init();
    mesh_stack_init_params_t init_params = {
        .core.irq_priority = NRF_MESH_IRQ_PRIORITY_LOWEST,
        .core.lfclksrc = DEV_BOARD_LF_CLK_CFG,
        .core.p_uuid = NULL,
        .models.models_init_cb = models_init_cb,
        .models.config_server_cb = config_server_evt_cb
    };
    uint32_t status = mesh_stack_init(&init_params, &device_provisioned);

    I'll try to fix the issue by changing the context to NRF_MESH_IRQ_PRIORITY_LOWEST, when resetting the node from main().
     
    ---
     
    Thanks for pointing to app_timer_mesh.c. What's interesting is that app_timer_mesh.c calls some NVIC_xxx() functions (NVIC_EnableIRQ(), NVIC_SetPendingIRQ(), etc.). Google, however, is having difficulties finding any api docs for those functions, making me wonder if that's really the right api to use - considering a firmware application that's also using the softdevice.  
     
    According to infocenter, there seem to be three alternatives available to temporarily elevate from main() to NRF_MESH_IRQ_PRIORITY_LOWEST:

    1. SWI Driver - legacy layer
      Legacy sounds a little deprecated, but at least it comes with a guide.
    2. SWI driver:
      The description says: Driver for managing software interrupts (SWI). Which sounds promising.
    3. EGU HAL:
      The docs state: Hardware access layer for managing the Event Generator Unit (EGU) peripheral.
      Looks like it could also fit the bill.

    Now I'm a little confused about which of these four APIs (the NVIC_xxx() functions plus the three above) should be used...
    Which one would you recommend?

    Thank you,
    Michael.

  • Hi Hung,

    Thanks for your help. Got it running using the SWI driver. I'm still uncertain whether the other options would be a better choice. Anyway, it would be great if the docs explained which of the APIs to choose in which situations.

    Btw. docs:
    It would be a real time saver (and prevent a lot of 'grep-ing' through the SDK directories) if the docs mentioned which header files they relate to.

    One more thing that could be a bug:
    My firmware uses app_timer_mesh.c to calculate the uptime, which by default uses SWI0. Calling nrfx_swi_alloc() also tries to allocate SWI0, causing the following linker error:

    (.text.SWI0_EGU0_IRQHandler+0x0): multiple definition of `SWI0_EGU0_IRQHandler'; build/AWESOME_DEVICE_V3_nrf52832_s132_7.3.0_Release/obj/nrfx_swi.o:nrfx_swi.c:(.text.SWI0_EGU0_IRQHandler+0x0): first defined here

    According to the docs, nrfx_swi_alloc() should allocate the first unused SWI instance, which apparently fails if app_timer_mesh.c is in use. It can easily be fixed by assigning another number to APP_TIMER_CONFIG_SWI_NUMBER in sdk_config.h, but it still smells like a bug, or like something that should be mentioned in the docs.
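
As a rough illustration of that workaround, the sdk_config.h change might look like the fragment below. The value 1 is only an assumed example; the right number depends on which SWI instances are still free in a given project.

```c
/* sdk_config.h (fragment): move app_timer_mesh.c off SWI0.
 * The value 1 is an assumption for illustration; pick an instance
 * that nothing else in the project uses. */
#define APP_TIMER_CONFIG_SWI_NUMBER 1
```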
    Anyway,
    Again, thanks for your help,
    Michael.