
nRF5 SDK for Mesh: How to reset a Bluetooth Mesh node?

Dear Nordic experts,
 
This short code snippet is supposed to reset (unprovision) a Bluetooth Mesh node and reboot it (nRF5 SDK for Mesh 5.0.0):
 

void node_reset(void)
{
    __LOG(LOG_SRC_APP, LOG_LEVEL_INFO, " >>>>> Resetting node <<<<< \n");
    if (mesh_stack_is_device_provisioned()) {
        mesh_stack_config_clear();   
        mesh_stack_device_reset();
    }
 
    schedule_reboot();  // just waits for 2 seconds before calling sd_nvic_SystemReset(); to reboot the node.
} 

 
node_reset() is called from the main() loop once an input pin is set. At the time it is called, the (custom/vendor) model is still operating. The problem is that mesh_stack_config_clear() ends up in app_error_fault_handler().
 
The following is printed at the debug terminal in SES:
 
<t:    2006463>, main.c,  252,  >>>>> Resetting node <<<<<  
<t:    2006553>, main.c,  213, Mesh event: NRF_MESH_EVT_CONFIG_STABLE
<t:    2006556>, main.c,  217, Mesh event: NRF_MESH_EVT_FLASH_STABLE
<t:    2006559>, main.c,  217, Mesh event: NRF_MESH_EVT_FLASH_STABLE
<t:    2006650>, main.c,  213, Mesh event: NRF_MESH_EVT_CONFIG_STABLE
<t:    2006653>, main.c,  217, Mesh event: NRF_MESH_EVT_FLASH_STABLE
<t:    2006656>, main.c,  217, Mesh event: NRF_MESH_EVT_FLASH_STABLE
<t:    2006747>, main.c,  213, Mesh event: NRF_MESH_EVT_CONFIG_STABLE
<t:    2006750>, main.c,  217, Mesh event: NRF_MESH_EVT_FLASH_STABLE
<t:    2006753>, main.c,  217, Mesh event: NRF_MESH_EVT_FLASH_STABLE
<t:    2006844>, main.c,  213, Mesh event: NRF_MESH_EVT_CONFIG_STABLE
<t:    2006847>, main.c,  217, Mesh event: NRF_MESH_EVT_FLASH_STABLE
<t:    2006850>, main.c,  217, Mesh event: NRF_MESH_EVT_FLASH_STABLE
<t:    2006869>, app_error_weak.c,  105, Mesh assert at 0x0002E65E (:0) 

 
And the stacktrace looks like this:
 
app_error_fault_handler()
mesh_assertion_handler()
backend_evt_handler()
write_complete_cb()
process_action_queue()
send_end_events()
bearer_event_handler()
bearer_event_flag_set()
mesh_config_backend_record_write()
dirty_entries_process()
mesh_config_entry_set()
seqnum_block_allocate()
mesh_stack_config_clear()
node_reset()
main() 

 
The questions now are:  
What's causing this issue? Or: How do you reset/unprovision and reboot a mesh node in nRF5 SDK for Mesh 5.0.0?  
 
Your help is very much appreciated,
Thank you,
Michael.

  • Hi Mike, 

    I would suggest mesh_stack_config_clear() instead of calling mesh_config_clear() 

    Can you reproduce the issue using one of our examples, for example the light switch server? There we have the functionality that pressing button 4 clears the settings and resets the node. Please check the node_reset() function in main.c.

    Also, looking at how the config server handles the node reset command from the client (see handle_node_reset()), it goes through the following stages:
    typedef enum
    {
    NODE_RESET_IDLE,
    NODE_RESET_PENDING,
    #if MESH_FEATURE_GATT_PROXY_ENABLED
    NODE_RESET_PENDING_PROXY,
    #endif /* MESH_FEATURE_GATT_PROXY_ENABLED */
    NODE_RESET_FLASHING,
    } node_reset_state_t;

    So if the flash module is busy, it stays in the NODE_RESET_FLASHING state and waits for the NRF_MESH_EVT_FLASH_STABLE event before it continues with the node reset process.
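
The wait-for-flash behaviour described above can be modelled off-target as a small state machine. This is only a sketch of the idea, not the config server's actual code: the state names come from the enum quoted above, while `reset_request()`, `on_flash_stable()`, `do_reboot()`, `flash_busy` and `reboot_count` are hypothetical stand-ins introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Same states as the enum quoted above (proxy state left out;
 * NODE_RESET_PENDING is not exercised in this model). */
typedef enum
{
    NODE_RESET_IDLE,
    NODE_RESET_PENDING,
    NODE_RESET_FLASHING,
} node_reset_state_t;

static node_reset_state_t m_state = NODE_RESET_IDLE;
static bool flash_busy;   /* hypothetical: true while flash writes are queued */
static int  reboot_count; /* hypothetical: counts simulated reboots */

/* Hypothetical stand-in for the final system reset call. */
static void do_reboot(void)
{
    assert(!flash_busy); /* never reboot with writes still outstanding */
    reboot_count++;
    m_state = NODE_RESET_IDLE;
}

/* Called when a reset is requested (button, config client, ...). */
void reset_request(void)
{
    if (m_state != NODE_RESET_IDLE)
    {
        return; /* a reset is already in progress */
    }
    /* On target, the stored configuration would be cleared here,
     * which queues flash operations. */
    m_state = NODE_RESET_FLASHING;
    if (!flash_busy)
    {
        do_reboot();
    }
}

/* Assumed hook: called from the mesh event handler when
 * NRF_MESH_EVT_FLASH_STABLE is received. */
void on_flash_stable(void)
{
    flash_busy = false;
    if (m_state == NODE_RESET_FLASHING)
    {
        do_reboot();
    }
}
```

The point of the pattern is that the reboot is tied to the flash-stable event rather than to a fixed delay, so a slow flash queue cannot be cut short by the reset.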

  • Hi Hung,
     
    Thanks for your help. Just a quick question:  
    Imagine a simple custom node, containing a single element with a single instance of a proprietary/vendor model plus a configuration server and health server instance.
     
    What's the recommended API to store a few hundred bytes of model-related data in flash?
    model_config_file_xxx(), mesh_config_entry_xxx(), flash_manager_xxx()? Or something else?
     
    Are the model_config_file_xxx() api calls needed for vendor (i.e. non standard/generic) models? Would it be safe to omit those calls?  (please note: config and health server are running as well.)
    Right now, I'm using the mesh_config_entry_xxx() API, which works quite well except for the previously mentioned fault triggered by mesh_stack_config_clear(). That makes me think I'm most likely using the wrong API for the job...

    Any clarification is welcome,
    Thank you,
    Michael.


     

  • Hi Michael, 
    The recommended way of storing data in mesh is to use your own file and entries in that file (mesh_config_entry_xxx). I would suggest having a look at the enocean example, where we store some custom data in a separate file.

    Could you try to reproduce the issue with one of our examples so we can test here?

  • Hi Hung,

    Thanks for the fast reply! I'm using the mesh_config_entry_xxx() APIs right now, but they are causing some issues, especially when deleting the data during reset/unprovisioning.

    "Could you try to reproduce the issue with one of our examples so we can test here?"

    Sure, but the problem is a little more complicated: the way it looks right now, the issue happens if the node is provisioned but not fully set up, for example when the (Android) app starts provisioning but the connection is lost before an app key is assigned. Trying to reset the node by calling mesh_stack_config_clear() causes the firmware to get trapped in app_error_fault_handler(), basically bricking the device...

    I need some way to recover from it. Do you have any ideas how to prevent app_error_fault_handler() from making the node inaccessible/unresponsive?
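
One common way to keep a fault from leaving the node unresponsive is to override the weak app_error_fault_handler() so that it reboots instead of halting. The following is a minimal sketch under stated assumptions, not the SDK's own handler: `NVIC_SystemReset()` is replaced by a host-side stand-in so the code runs off-target, and `system_reset_count`, `last_fault_id` and `last_fault_pc` are hypothetical names introduced for illustration.

```c
#include <stdint.h>

/* Host-side stand-in so the sketch compiles off-target. On the device,
 * NVIC_SystemReset() comes from the CMSIS core header and never returns. */
static int system_reset_count;
static void NVIC_SystemReset(void)
{
    system_reset_count++;
}

/* Last fault, kept for inspection. On target this could be logged or kept
 * in RAM that survives the reset (assumption, not shown here). */
static uint32_t last_fault_id;
static uint32_t last_fault_pc;

/* Overrides the weak default handler: record the fault, then reboot
 * instead of staying in the handler. */
void app_error_fault_handler(uint32_t id, uint32_t pc, uint32_t info)
{
    (void) info;
    last_fault_id = id;
    last_fault_pc = pc;
    NVIC_SystemReset();
}
```

With a SoftDevice enabled, sd_nvic_SystemReset() (already used in schedule_reboot() above) would be the natural replacement for the direct NVIC call. Rebooting only hides the assert, so if the fault is hit again after every boot, the node would end up in a reset loop.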

  • Hi BlueMike, 

    You can use the addr2line.exe tool (I got it by installing MinGW) to find what causes app_error_fault_handler() to be called. You just need to give it the .elf file and the address that throws the error (in your case 0x0002E65E), and it will show you the file and the line of code that is causing the issue.

  • Hi Hung,

    Again, thanks for your help. Considering how quickly the softdevice ends up in app_error_fault_handler(), I assume that there's something fundamentally wrong. But let's start again at the beginning:

    1. The firmware initializes an input pin (E7: P0.05 - AIN3) to receive an event when a button is pushed.

    nrfx_gpiote_in_config_t configButtonA = NRFX_GPIOTE_CONFIG_IN_SENSE_TOGGLE(false);
    configButtonA.pull = NRF_GPIO_PIN_PULLUP;
    ERROR_CHECK(nrfx_gpiote_in_init(BUTTON_A_INPUT_PIN, &configButtonA, button_a_pushed_event));
    nrfx_gpiote_in_event_enable(BUTTON_A_INPUT_PIN, true); 

    2. The button's event handler just remembers the time the button is pushed:

    static void button_a_pushed_event(nrfx_gpiote_pin_t pin, nrf_gpiote_polarity_t action)
    {
        if (pin == BUTTON_A_INPUT_PIN) {
            int value = nrf_gpio_pin_read(pin);
            if (value) {
                reset_start_time = 0; // cancel reset
            } else {
                reset_start_time = get_uptime_in_milliseconds();
            }
        }
    }

    3. The main() function periodically polls the current time and passes it to handle_node_reset(), which performs the actual reset:

    int main(void)
    {
        initialize();
        start();
    
        while (true)
        {
            uint64_t uptime_in_milliseconds = calculate_uptime_in_milliseconds();
            handle_node_reset(uptime_in_milliseconds);
    
            handle_business_logic(uptime_in_milliseconds);
    
            nrf_delay_ms(50);
        }
    }
    
    

    @Hung: The firmware's business logic is working as intended, just resetting causes issues. Still, is this "busy-loop" in main() a safe way to handle the business logic and the reset? Do you see any potential issues here?

    4. handle_node_reset() performs the actual reset if the button remains pushed for a certain amount of time (RESET_DELAY below):

    void handle_node_reset(int64_t time_in_milliseconds)
    {
        if (reset_start_time == 0)
        {
            return;
        }
    
        if (time_in_milliseconds - reset_start_time > RESET_DELAY)
        {
            reset_start_time = 0; // we don't want to fall in here a second time before the reboot happens.
    
            if (mesh_stack_is_device_provisioned()) 
            {
                if (proxy_is_enabled()) 
                {
                    // Calling proxy_stop() also triggers app_error_fault_handler(), so disabled for now:
                    // proxy_stop(); 
                }
    
                mesh_stack_config_clear();
            }
    
            node_reset();
        }
    }
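
The hold-time check in step 4 can be isolated into a small function, which makes it testable off-target. This is a minimal sketch, not the poster's code: `reset_due()` is a hypothetical helper name, and `RESET_DELAY_MS` is an assumed value of 5000 ms because the post does not state what RESET_DELAY is.

```c
#include <stdbool.h>
#include <stdint.h>

#define RESET_DELAY_MS 5000 /* assumed value; the post does not state RESET_DELAY */

/* Written by the button handler, read by the main loop, hence volatile.
 * 0 means "no press in progress", as in the code above. */
static volatile int64_t reset_start_time;

/* Returns true once per long press: when the button has been held for more
 * than RESET_DELAY_MS. Clears the start time so the caller does not trigger
 * a second reset before the reboot happens. */
bool reset_due(int64_t now_ms)
{
    int64_t start = reset_start_time; /* read the shared value once */
    if (start == 0)
    {
        return false;
    }
    if (now_ms - start > RESET_DELAY_MS)
    {
        reset_start_time = 0;
        return true;
    }
    return false;
}
```

Note that on a 32-bit core a 64-bit read is not a single access, so if the button interrupt can fire during the read, a short critical section or a 32-bit timestamp may be needed. The sketch does not address the interrupt-priority question discussed later in the thread.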



    When mesh_stack_config_clear() is executed, it ends up in app_error_fault_handler(). The call stack looks like this (Softdevice 132, 7.3.0 in debug build):

        File:                       Line:   Function:
        app_error_weak.c            79      void app_error_fault_handler(unsigned int id=0x00000000, unsigned int pc=0x00000000, unsigned int info=0x00000000)
        assertion_handler_weak.c    54      void mesh_assertion_handler(unsigned int pc=0x2000ecab)
        mesh_config.c               322     void backend_entry_evt_handler(const mesh_config_backend_evt_t* p_evt=0x2000edf0)
        mesh_config.c               372     void backend_evt_handler(const mesh_config_backend_evt_t* p_evt=0x0000003e)
        mesh_config_flashman_glue.c 132     void write_complete_cb(const flash_manager_t* p_manager=0x00000000, const fm_entry_t p_entry=0x2000ecab, enum result=FM_RESULT_SUCCESS (0))
        flash_manager.c             471     void end_action(action_t* p_action=0x20003cc0, const fm_entry_t* p_entry=0x00075008)
        flash_manager.c             773     _Bool process_action_queue()
        flash_manager.c             792     void flash_op_ended_callback(enum user=0x3e, const flash_operation_t* p_op=0x2000ecab, short unsigned int token=0x6a81)
        mesh_flash.c                159     _Bool send_end_events()
        bearer_event.c              380     _Bool bearer_event_handler()
        bearer_event.c              145     void QDEC_IRQHandler()
        bearer_event.c              303     void bearer_event_flag_set(unsigned int flag=0x0e5939a0)
        flash_manager.c             1090    void flash_manager_entry_commit(const fm_entry_t* p_entry=0x20003ccc)
        mesh_config_flashman_glue.c 371     uint32_t mesh_config_backend_record_write(mesh_config_backend_file_t* p_file=0x0ddfd680, const uint8_t* p_data=0x2000ef28, unsigned int length=0x00000005)
        mesh_config_backend.c       139     uint32_t mesh_config_backend_store(struct id={short unsigned int file=0x0000, short unsigned int record=0x0003}, const uint8_t* p_entry=0x2000ef28, unsigned int entry_len=0x00000005)
        mesh_config.c               157     uint32_t default_file_store(const mesh_config_entry_t* p_params=0x000475c4, struct id={short unsigned int file=0x0000, short unsigned int record=0x0003})
        mesh_config.c               195     void dirty_entries_process()
        mesh_config.c               234     uint32_t entry_store(const mesh_config_entry_params_t* p_params=0x000475c4, struct id={short unsigned int file=0x0000, short unsigned int record=0x0003}, const void* p_entry=0x2000efbc)
        mesh_config.c               510     uint32_t mesh_config_entry_set(struct id={short unsigned int file=0x0000, short unsigned int record=0x0003}, const void* p_entry=0x2000efbc)
        net_state.c                 462     void seqnum_block_allocate()
        net_state.c                 554     void net_state_reset()
        mesh_stack.c                272     void mesh_stack_config_clear()
                                            void handle_node_reset(long long int time_in_milliseconds=0x0000000000000003)
                                            void main()
    

    Note: The call stack was transcribed, not copied, so there could be some typos in there, sorry if there are.

    The question is, what can possibly cause the softdevice to end up in app_error_fault_handler() when calling mesh_stack_config_clear()?
    Any ideas are welcome.

  • Hi BlueMike, 
    Please use the addr2line.exe tool to check; it's easier than looking at the code and guessing what could be wrong. Or you can send us your .elf file and I can run it here.

  • Hi Hung,

    in debug configuration, addr2line outputs:

    nrf5_SDK_for_Mesh_v5.0.0_src/mesh/core/src/mesh_config.c:322 (discriminator 1)

    Oddly enough the problem doesn't happen that often in debug configuration. In release builds, however, it does happen reliably.

    Also: Although the firmware gets trapped in app_error_fault_handler() when calling mesh_stack_config_clear(), the node is unprovisioned after rebooting it.

    Any ideas what could be wrong here, or how to fix it, are welcome. Thank you,
    Michael.

  • Hi Michael, 

    How do you initialize the mesh stack?
    If you initialize it with , then you should call mesh_stack_config_clear() at the same interrupt level. Here is what the documentation says:

    You must ensure that no mesh API functions are called from an IRQ priority other than the one specified in the configuration.

    Line 322 in mesh_config.c points to:
      NRF_MESH_ASSERT_DEBUG(*p_flags & MESH_CONFIG_ENTRY_FLAG_BUSY);

    It seems that something else could be accessing the flag while you are trying to clear it.
    Does the issue occur every time when you run in debug mode?
    Could you provide a modified version of the light switch example that exposes the same issue?

  • Hi Hung,  
     
    Thanks for providing that link!  

    So the problem is that main() runs at a different IRQ priority than the mesh stack, and hence calling mesh_stack_config_clear() triggers the error handler.

    The question now is: what's the best way to fix it?

    1. The scheduler (as explained in the link) would be an option, but I'm a little afraid of side effects, because the main loop already handles the business logic.
    2. How about the Event Generator Unit (EGU)? Would it be safe to use it to trigger a software interrupt with the same priority as the mesh stack? The software interrupt would only be used to clear the flash, unprovision the node and reboot the device, but what if the mesh stack is writing to the flash while the software interrupt wants to delete the same data? Is that scenario already handled?
    3. Or maybe something else...?

    What approach do you recommend?
     
    Again, thank you,
    Michael.

  • Hi Michael, 

    How do you initialize the mesh stack? If you initialize it with NRF_MESH_IRQ_PRIORITY_THREAD, then it's fine to call mesh_stack_config_clear() from inside main().

    I don't think the scheduler would help here, as it does the same as what you are already doing: instead of running the code in an interrupt context, it executes the code in the main() context (THREAD).

    If you initialize mesh with NRF_MESH_IRQ_PRIORITY_LOWEST and you have the button GPIOTE event IRQ at the same priority level, then you just need to call mesh_stack_config_clear() directly from the interrupt handler (the same as we do in our example).

    If you have a different interrupt level and want to change the context, you can use the EGU/SWI. Please look at what we do with SWI_IRQn in the app_timer_mesh.c file to see how we trigger a software interrupt.
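
The context switch suggested above (pend a software interrupt that runs at the mesh priority, and call the mesh API from its handler) can be sketched as follows. This is a host-side model under stated assumptions, not code from app_timer_mesh.c: the `NVIC_*` functions are stand-ins with the CMSIS signatures so the sketch runs off-target, `RESET_SWI_IRQn`, `reset_swi_handler()`, `reset_swi_init()`, `reset_request_from_main()` and `simulate_nvic_dispatch()` are hypothetical names, and the priority value 7 for NRF_MESH_IRQ_PRIORITY_LOWEST is an assumption that should be checked against the SDK headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ---- Host-side stand-ins (assumptions) so the sketch runs off-target. ----
 * On the device these come from the CMSIS/device headers instead. */
typedef int IRQn_Type;
#define RESET_SWI_IRQn    3 /* hypothetical: any SWI/EGU IRQ the project leaves free */
#define MESH_IRQ_PRIORITY 7 /* assumed numeric value of NRF_MESH_IRQ_PRIORITY_LOWEST */

static bool     irq_enabled;
static bool     irq_pending;
static uint32_t irq_priority;
static uint32_t active_priority = 0xFF; /* 0xFF models thread (main) context */

static void NVIC_SetPriority(IRQn_Type irq, uint32_t prio) { (void) irq; irq_priority = prio; }
static void NVIC_EnableIRQ(IRQn_Type irq)                  { (void) irq; irq_enabled = true; }
static void NVIC_SetPendingIRQ(IRQn_Type irq)              { (void) irq; irq_pending = true; }

/* Stand-in for mesh_stack_config_clear(): insists on the mesh priority. */
static int config_clear_calls;
static void mesh_stack_config_clear_stub(void)
{
    assert(active_priority == MESH_IRQ_PRIORITY);
    config_clear_calls++;
}

/* ---- The pattern itself ---- */

/* Runs at the mesh IRQ priority, so mesh API calls are allowed here. */
void reset_swi_handler(void)
{
    mesh_stack_config_clear_stub();
}

/* One-time setup: give the software interrupt the same priority as the mesh stack. */
void reset_swi_init(void)
{
    NVIC_SetPriority(RESET_SWI_IRQn, MESH_IRQ_PRIORITY);
    NVIC_EnableIRQ(RESET_SWI_IRQn);
}

/* Called from main(): does not touch the mesh API, only pends the interrupt. */
void reset_request_from_main(void)
{
    NVIC_SetPendingIRQ(RESET_SWI_IRQn);
}

/* Host-only model of the NVIC dispatching a pending, enabled interrupt. */
void simulate_nvic_dispatch(void)
{
    if (irq_enabled && irq_pending)
    {
        irq_pending = false;
        active_priority = irq_priority;
        reset_swi_handler();
        active_priority = 0xFF;
    }
}
```

In this arrangement main() only sets a pending bit, and the mesh call happens later in the handler, at the same priority as the rest of the stack. When a SoftDevice is enabled, its sd_nvic_* wrappers may be the more appropriate calls than the raw NVIC functions.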

  • Hello Hung,
     
    Sorry, I was already a step ahead:

    "How do you initialize the mesh stack?"

    With NRF_MESH_IRQ_PRIORITY_LOWEST, as used in the examples:

        model_config_file_init();

        mesh_stack_init_params_t init_params = {
            .core.irq_priority = NRF_MESH_IRQ_PRIORITY_LOWEST,
            .core.lfclksrc = DEV_BOARD_LF_CLK_CFG,
            .core.p_uuid = NULL,
            .models.models_init_cb = models_init_cb,
            .models.config_server_cb = config_server_evt_cb
        };

        uint32_t status = mesh_stack_init(&init_params, &device_provisioned);

    I'll try to fix the issue by changing the context to NRF_MESH_IRQ_PRIORITY_LOWEST, when resetting the node from main().
     
    ---
     
    Thanks for pointing to app_timer_mesh.c. What's interesting is that app_timer_mesh.c calls some NVIC_xxx() functions (NVIC_EnableIRQ(), NVIC_SetPendingIRQ(), etc.). Google, however, is having difficulties finding any api docs for those functions, making me wonder if that's really the right api to use - considering a firmware application that's also using the softdevice.  
     
    According to infocenter, there seem to be three alternatives available to temporarily elevate from main() to NRF_MESH_IRQ_PRIORITY_LOWEST:

    1. SWI Driver - legacy layer
      Legacy sounds a little deprecated, but at least it comes with a guide.
    2. SWI driver:
      The description says: Driver for managing software interrupts (SWI). Which sounds promising.
    3. EGU HAL:
      The docs state: Hardware access layer for managing the Event Generator Unit (EGU) peripheral.
      Looks like it could also fit the bill.

    Now I'm a little confused about which of these four APIs should be used...
    Which one would you recommend?

    Thank you,
    Michael.
