This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

SDK for Meshv v4.1.0 :: access_model_publish() triggers an assert on bearer_event_in_correct_irq_priority() .... but same code is in prod on SDK for Meshv v3.2.0

Sequence of events:

1. A GPIOTE button interrupt in LPN schedules a handler routine.

2. Handler routine (called by app_scheduler()) makes a call to access_model_publish() (to send an unacknowledged message).

3. Somewhere in the call stack for access_model_publish(), the assert in mesh/core/src/timer_scheduler.c:214 is triggered.

It seems we are failing the check in bearer_event_in_correct_irq_priority(), but this call is made from a function called by app_scheduler() (aka: outside of any interrupt context).

To try to debug the issue further, I printed the interrupt values:

volatile IRQn_Type active_irq = hal_irq_active_get();
volatile uint32_t prio = NVIC_GetPriority(active_irq);

/* log, then wait 5ms for buffer to flush before triggering assert */
LOG_FLUSH("active irq %d with NVIC priority %d\n", active_irq, prio);
/* log output:
 * "active irq -16 with NVIC priority 0"
 */

/* assert triggered here */
access_model_publish(/* ... */);

This code builds as-is with no warnings (using -Wall and -Wextra) on both SDK for Meshv v3.2.0 and v4.1.0.

This code runs with no issues (and is in production) on v3.2.0, but on v4.1.0 throws the above error.

Full build spec for v3.2.0:

-- Configuring CMake for nRF5 SDK for Bluetooth Mesh 3.2.0
-- SDK: nRF5_SDK_15.3.0_59ac345
-- Platform: nrf52832_xxAA
-- Arch: cortex-m4f
-- SoftDevice: s132_6.1.1
-- The C compiler identification is GNU 8.3.1

Full build spec for v4.1.0:

-- Configuring CMake for nRF5 SDK for Bluetooth Mesh 4.1.0
-- SDK: nRF5_SDK_16.0.0_98a08e2
-- Platform: nrf52832_xxAA
-- Arch: cortex-m4f
-- SoftDevice: s132_7.0.1
-- Board: pca10040
-- The C compiler identification is GNU 8.3.1

  • Hi Sirio, 

    Could you add a breakpoint inside bearer_event_in_correct_irq_priority() or put the log there to print out what caused the function return 0. 

    Do you have a minimal example with the LPN example that reproduce the issue so we can test here ?

  • Thank you for taking a look at this.

    I added some log prints, so bearer_event_in_correct_irq_priority() looks like this:

    bool bearer_event_in_correct_irq_priority(void)
    {
        volatile IRQn_Type active_irq = hal_irq_active_get();
        BBN_LOG_FLUSH("active_irq %d\n", hal_irq_active_get());
        BBN_LOG_FLUSH("hal_irq_is_enabled(EVENT_IRQn) %d\n", hal_irq_is_enabled(EVENT_IRQn));
        BBN_LOG_FLUSH("m_irq_priority %d\n", m_irq_priority);
        if (active_irq == HAL_IRQn_NONE)
        {
            return (m_irq_priority == NRF_MESH_IRQ_PRIORITY_THREAD || !hal_irq_is_enabled(EVENT_IRQn));
        }
        else
        {
            volatile uint32_t prio = NVIC_GetPriority(active_irq);
            BBN_LOG_FLUSH("prio %d\n", prio);
            return (prio == m_irq_priority || (prio > m_irq_priority && !hal_irq_is_enabled(EVENT_IRQn)));
        }
    }

    During provisioning (with proxy enabled), I am seeing:

    <t:    1398790>, bearer_event.c,  419, active_irq 22
    <t:    1398958>, bearer_event.c,  420, hal_irq_is_enabled(EVENT_IRQn) 1
    <t:    1399126>, bearer_event.c,  421, m_irq_priority 6
    <t:    1399294>, bearer_event.c,  429, prio 6

    When establishing friendship (proxy now off), I am seeing:

    <t:   16130599>, bearer_event.c,  419, active_irq 18
    <t:   16130773>, bearer_event.c,  420, hal_irq_is_enabled(EVENT_IRQn) 1
    <t:   16130941>, bearer_event.c,  421, m_irq_priority 6
    <t:   16131109>, bearer_event.c,  429, prio 6

    When calling access_model_publish() (to send an unacknowledged message to the mesh), when the assert is triggered, I am seeing:

    <t:   16131294>, bearer_event.c,  419, active_irq -16
    <t:   16131462>, bearer_event.c,  420, hal_irq_is_enabled(EVENT_IRQn) 1
    <t:   16131630>, bearer_event.c,  421, m_irq_priority 6

    Unfortunately the amount of time needed to try and reproduce this product's functionality in the LPN example makes that a no-go.

  • Have you made sure when you initialize the mesh stack (mesh_init() ) you used .core.irq_priority =NRF_MESH_IRQ_PRIORITY_THREAD  instead of NRF_MESH_IRQ_PRIORITY_LOWEST ?

    This is discussed here: https://infocenter.nordicsemi.com/topic/com.nordic.infocenter.meshsdk.v4.1.0/md_doc_user_guide_mesh_interrupt_priorities.html?cp=7_2_2_5

    The mesh network must know in which IRQ priority the user application is running. To know this, nrf_mesh_init_params_t (used by nrf_mesh_init) now has an irq_priority field. Normally, it must be set to NRF_MESH_IRQ_PRIORITY_LOWEST, or NRF_MESH_IRQ_PRIORITY_THREAD if running in the main loop. You can also use the APP_IRQ_PRIORITY_* defines in the nRF5 SDK.

    In your case you are calling the function from app_schedule so it's the THREAD level. 

  • Thank you very much.

    This info really helped and I have been able to reproduce the issue using the standard `lpn` example from SDK410.

    I have made this reproduction into a series of patches:

    1. Modify the example so it will build with both SDK 3.2.0 and SDK 4.1.0:

    From 1638d925ff94b27be66cfde4809d8caa50d5ac64 Mon Sep 17 00:00:00 2001
    From: Sirio Balmelli <[email protected]>
    Date: Tue, 28 Apr 2020 20:25:42 +0200
    Subject: [PATCH 1/3] lpn_irq_issue: build with sdk320
    
    ---
     include/nrf_mesh_config_app.h   | 5 +++++
     src/main.c                      | 7 +++++++
     2 files changed, 12 insertions(+)
    
    diff --git a/include/nrf_mesh_config_app.h b/include/nrf_mesh_config_app.h
    index 28be1c07..489f87db 100644
    --- a/include/nrf_mesh_config_app.h
    +++ b/include/nrf_mesh_config_app.h
    @@ -38,6 +38,11 @@
     #ifndef NRF_MESH_CONFIG_APP_H__
     #define NRF_MESH_CONFIG_APP_H__
     
    +#if NRF_MESH_SDK_VERSION < 400
    +#define ACCESS_FLASH_PAGE_COUNT (1)
    +#define DSM_FLASH_PAGE_COUNT (1)
    +#endif
    +
     /**
      * @defgroup NRF_MESH_CONFIG_APP nRF Mesh application configuration
      *
    diff --git a/src/main.c b/src/main.c
    index 2c8bfed0..afc0d73c 100644
    --- a/src/main.c
    +++ b/src/main.c
    @@ -102,6 +102,11 @@
     /* Transition time value used by the OnOff client for sending OnOff Set messages. */
     #define APP_ONOFF_TRANSITION_TIME_MS    (100)
     
    +/* compatibility with earlier SDKs where this was not centrally defined */
    +#ifndef STATIC_AUTH_DATA
    +#define STATIC_AUTH_DATA \
    +    {0x6E, 0x6F, 0x72, 0x64, 0x69, 0x63, 0x5F, 0x65, 0x78, 0x61, 0x6D, 0x70, 0x6C, 0x65, 0x5F, 0x31}
    +#endif
     
     /*****************************************************************************
      * Forward declaration of static functions
    @@ -623,7 +628,9 @@ static void start(void)
             mesh_provisionee_start_params_t prov_start_params =
             {
                 .p_static_data    = static_auth_data,
    +#if NRF_MESH_SDK_VERSION >= 400
                 .prov_sd_ble_opt_set_cb = NULL,
    +#endif
                 .prov_complete_cb = provisioning_complete_cb,
                 .prov_device_identification_start_cb = device_identification_start_cb,
                 .prov_device_identification_stop_cb = NULL,
    -- 
    2.26.1
    
    

    2. Modify example to call send_app_state(APP_STATE_OFF) from outside of an interrupt context. This works on SDK 3.2.0 but crashes on 4.1.0 (precisely in mesh/core/src/timer_scheduler.c:214):

    From 074b9c79d62e7620f195e8661d66799bc2423aad Mon Sep 17 00:00:00 2001
    From: Sirio Balmelli <[email protected]>
    Date: Tue, 28 Apr 2020 20:26:29 +0200
    Subject: [PATCH 2/3] lpn_irq_issue: reproduce IRQ issue by sending from app
     context instead of interrupt
    
    ---
     src/main.c | 10 +++++++++-
     1 file changed, 9 insertions(+), 1 deletion(-)
    
    diff --git a/src/main.c b/src/main.c
    index afc0d73c..2ae951f1 100644
    --- a/src/main.c
    +++ b/src/main.c
    @@ -364,6 +364,8 @@ static const char m_usage_string[] =
         "\t\t------------------------------------------------------\n";
     #endif
     
    +static bool send_off;
    +
     static void button_event_handler(uint32_t button_number)
     {
         /* Increase button number because the buttons on the board is marked with 1 to 4 */
    @@ -389,7 +391,8 @@ static void button_event_handler(uint32_t button_number)
                 break;
     
             case 2:
    -            send_app_state(APP_STATE_OFF);
    +            send_off = true;
    +            //send_app_state(APP_STATE_OFF);
                 break;
     
             case 3:
    @@ -663,6 +666,11 @@ int main(void)
     
         for (;;)
         {
    +        if (send_off)
    +        {
    +            send_app_state(APP_STATE_OFF);
    +            send_off = false;
    +        }
             (void)sd_app_evt_wait();
         }
     }
    -- 
    2.26.1
    
    

    3. Failed attempt to move to the proposed NRF_MESH_IRQ_PRIORITY_THREAD model. I followed the guide in https://infocenter.nordicsemi.com/index.jsp?topic=%2Fcom.nordic.infocenter.meshsdk.v4.1.0%2Fmd_doc_user_guide_mesh_interrupt_priorities.html&cp=7_2_2_5 exactly and am getting compilation issues inside the mesh stack. Yes, I am deleting and recreating the build directory so this is not a cmake config bug.

    From 7ba6e971bb57bdf6c8f792d4826267e8ba008d93 Mon Sep 17 00:00:00 2001
    From: Sirio Balmelli <[email protected]>
    Date: Tue, 28 Apr 2020 20:54:02 +0200
    Subject: [PATCH 3/3] lpn_irq_issue: failed attempt to move to
     NRF_MESH_IRQ_PRIORITY_THREAD by following
     https://infocenter.nordicsemi.com/index.jsp?topic=%2Fcom.nordic.infocenter.meshsdk.v4.1.0%2Fmd_doc_user_guide_mesh_interrupt_priorities.html&cp=7_2_2_5
    
    ---
     CMakeLists.txt       |  4 ++++
     include/app_config.h |  5 +++++
     src/main.c           | 11 +++++++++--
     3 files changed, 18 insertions(+), 2 deletions(-)
    
    diff --git a/CMakeLists.txt b/CMakeLists.txt
    index c658766f..c9c2c02c 100644
    --- a/CMakeLists.txt
    +++ b/CMakeLists.txt
    @@ -10,6 +10,8 @@ add_executable(${target}
     
         "${CMAKE_SOURCE_DIR}/examples/common/src/simple_hal.c"
     
    +    "${SDK_ROOT}/components/libraries/scheduler/app_scheduler.c"
    +
         ${BLE_DFU_SUPPORT_SOURCE_FILES}
         ${BLE_SOFTDEVICE_SUPPORT_SOURCE_FILES}
         ${WEAK_SOURCE_FILES}
    @@ -32,6 +34,8 @@ target_include_directories(${target} PUBLIC
         "${CMAKE_CURRENT_SOURCE_DIR}/include"
         "${CMAKE_SOURCE_DIR}/examples/common/include"
         "${CMAKE_SOURCE_DIR}/external/rtt/include"
    +    "${SDK_ROOT}/components/libraries/scheduler"
    +
         ${BLE_DFU_SUPPORT_INCLUDE_FILES}
         ${BLE_SOFTDEVICE_SUPPORT_INCLUDE_DIRS}
         ${GENERIC_ONOFF_CLIENT_INCLUDE_DIRS}
    diff --git a/include/app_config.h b/include/app_config.h
    index 8fbd557b..9df18b44 100644
    --- a/include/app_config.h
    +++ b/include/app_config.h
    @@ -78,6 +78,11 @@
     #define NRFX_POWER_ENABLED 0
     #define POWER_ENABLED 0
     
    +#define APP_SCHEDULER_ENABLED 1
    +#define NRF_SDH_DISPATCH_MODEL 1
    +#define APP_TIMER_CONFIG_USE_SCHEDULER 1
    +#define APP_SCHED_EVENT_SIZE APP_TIMER_SCHED_EVENT_DATA_SIZE
    +#define APP_SCHED_QUEUE_SIZE 32
     /** @} end of APP_SDK_CONFIG */
     
     #endif /* APP_CONFIG_H__ */
    diff --git a/src/main.c b/src/main.c
    index 2ae951f1..bdfe510c 100644
    --- a/src/main.c
    +++ b/src/main.c
    @@ -76,6 +76,7 @@
     /* nRF5 SDK */
     #include "nrf_soc.h"
     #include "nrf_pwr_mgmt.h"
    +#include "app_scheduler.h"
     
     /*****************************************************************************
      * Definitions
    @@ -549,7 +550,8 @@ static void mesh_init(void)
     {
         mesh_stack_init_params_t init_params =
         {
    -        .core.irq_priority       = NRF_MESH_IRQ_PRIORITY_LOWEST,
    +        //.core.irq_priority       = NRF_MESH_IRQ_PRIORITY_LOWEST,
    +        .core.irq_priority       = NRF_MESH_IRQ_PRIORITY_THREAD,
             .core.lfclksrc           = DEV_BOARD_LF_CLK_CFG,
             .core.p_uuid             = NULL,
             .models.models_init_cb   = models_init_cb,
    @@ -661,6 +663,7 @@ static void start(void)
     
     int main(void)
     {
    +    APP_SCHED_INIT(APP_SCHED_EVENT_SIZE, APP_SCHED_QUEUE_SIZE);
         initialize();
         start();
     
    @@ -671,6 +674,10 @@ int main(void)
                 send_app_state(APP_STATE_OFF);
                 send_off = false;
             }
    -        (void)sd_app_evt_wait();
    +        app_sched_execute();
    +        if (nrf_mesh_process())
    +        {
    +            (void)sd_app_evt_wait();
    +        }
         }
     }
    -- 
    2.26.1
    
    

    Any help you could give getting this test to build and run with NRF_MESH_IRQ_PRIORITY_THREAD would be much appreciated.

    Once this example works I can try to retrofit into our actual project.

  • Hi Sirio,

    Could you post the compilation error you have when using NRF_MESH_IRQ_PRIORITY_THREAD ? 

    Have you tried to test the same in a SDK v4.1 example ? 

Related