SDK for Mesh v4.1.0: access_model_publish() triggers an assert on the bearer_event_in_correct_irq_priority() check, but the same code is in production on SDK for Mesh v3.2.0

Sequence of events:

1. A GPIOTE button interrupt in the LPN schedules a handler routine.

2. The handler routine (run by the app scheduler, app_sched_execute()) calls access_model_publish() to send an unacknowledged message.

3. Somewhere in the call stack for access_model_publish(), the assert in mesh/core/src/timer_scheduler.c:214 is triggered.

It seems we are failing the check in bearer_event_in_correct_irq_priority(), but this call is made from a function run by the app scheduler (i.e., outside of any interrupt context).
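
The failing check can be reasoned about with a simplified, host-runnable model. This is not the SDK's bearer_event_in_correct_irq_priority() source: DEMO_IRQN_NONE, DEMO_PRIO_THREAD and demo_in_correct_irq_priority() are hypothetical, and the rule they encode (a thread-mode stack accepts only thread-mode callers; an IRQ-priority stack accepts only an active interrupt at exactly that priority) is an assumption based on the behaviour described in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical marker for "no exception active" (thread mode). */
#define DEMO_IRQN_NONE   (-16)

/* Hypothetical marker meaning "stack configured to run in thread mode".
 * nRF52 NVIC priorities are 0..7, so this value cannot collide with them. */
#define DEMO_PRIO_THREAD (0xFFu)

static bool demo_in_correct_irq_priority(int active_irq,
                                         uint32_t active_prio,
                                         uint32_t configured_prio)
{
    assert(configured_prio <= 7u || configured_prio == DEMO_PRIO_THREAD);

    if (configured_prio == DEMO_PRIO_THREAD)
    {
        /* Thread-mode stack: callers must not be inside any interrupt. */
        return active_irq == DEMO_IRQN_NONE;
    }
    /* IRQ-priority stack: callers must be an interrupt at exactly that priority. */
    return active_irq != DEMO_IRQN_NONE && active_prio == configured_prio;
}
```

Under this model, assuming the stack runs at the nRF52's lowest NVIC priority (7), a call from an app_sched_execute() handler looks like demo_in_correct_irq_priority(-16, 0, 7), which returns false, matching the assert.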

To debug the issue further, I logged the active IRQ and its NVIC priority just before the call:

volatile IRQn_Type active_irq = hal_irq_active_get();
volatile uint32_t prio = NVIC_GetPriority(active_irq);

/* log, then wait 5ms for buffer to flush before triggering assert */
LOG_FLUSH("active irq %d with NVIC priority %u\n", (int)active_irq, (unsigned)prio);
/* log output:
 * "active irq -16 with NVIC priority 0"
 */

/* assert triggered here */
access_model_publish(/* ... */);
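
The logged value -16 already indicates thread mode. A minimal sketch of the arithmetic, assuming hal_irq_active_get() reports the active exception number (IPSR) minus 16, following the CMSIS IRQn convention in which device interrupts start at 0 and core exceptions are negative: IPSR = 0 means no exception is active, which maps to -16. With no active exception, the "NVIC priority 0" printed next to it is not a meaningful priority. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Cortex-M IPSR reports the active exception number in bits [8:0]. */
#define DEMO_IPSR_MAX 0x1FFu

/* CMSIS numbering: device interrupts start at exception 16, so
 * IRQn = exception number - 16 and core exceptions are negative
 * (SysTick, exception 15, is -1). IPSR 0 (thread mode) maps to -16. */
static int demo_irqn_from_ipsr(uint32_t ipsr)
{
    assert(ipsr <= DEMO_IPSR_MAX);
    return (int)ipsr - 16;
}

/* True when no exception is active, e.g. in the main loop and in
 * handlers run from app_sched_execute(). */
static bool demo_is_thread_mode(int irqn)
{
    return irqn == -16;
}
```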

This code builds as-is with no warnings (using -Wall and -Wextra) on both SDK for Mesh v3.2.0 and v4.1.0.

This code runs with no issues (and is in production) on v3.2.0, but on v4.1.0 it triggers the assert above.

Full build spec for v3.2.0:

-- Configuring CMake for nRF5 SDK for Bluetooth Mesh 3.2.0
-- SDK: nRF5_SDK_15.3.0_59ac345
-- Platform: nrf52832_xxAA
-- Arch: cortex-m4f
-- SoftDevice: s132_6.1.1
-- The C compiler identification is GNU 8.3.1

Full build spec for v4.1.0:

-- Configuring CMake for nRF5 SDK for Bluetooth Mesh 4.1.0
-- SDK: nRF5_SDK_16.0.0_98a08e2
-- Platform: nrf52832_xxAA
-- Arch: cortex-m4f
-- SoftDevice: s132_7.0.1
-- Board: pca10040
-- The C compiler identification is GNU 8.3.1

  • While migrating my example to the NRF_MESH_IRQ_PRIORITY_THREAD model as recommended, I am having problems handling flash correctly.

    Specifically, I seem to loop forever waiting for the flash manager to become stable.

    This code sample illustrates the problem:

        /* NOTE: error checking removed for readability */
        if (!m_device_provisioned)
        {
            static const uint8_t static_auth_data[NRF_MESH_KEY_SIZE] = STATIC_AUTH_DATA;
            mesh_provisionee_start_params_t prov_start_params = {
                .p_static_data    = static_auth_data,
                .prov_complete_cb = provisioning_complete_cb,
                .prov_device_identification_start_cb = device_identification_start_cb,
                .prov_device_identification_stop_cb = NULL,
                .prov_abort_cb = provisioning_aborted_cb,
                .p_device_uri = EX_URI_LPN
            };
            mesh_provisionee_prov_start(&prov_start_params);
        }
    
        mesh_app_uuid_print(nrf_mesh_configure_device_uuid_get());
    
        mesh_stack_start();
    
        app_flash_data_load(HANDLE, &variable, sizeof(variable));
        /* NOTE: several other app_flash_data_load() calls here */
    
        /* sometimes _not_ stable even though nothing pending */
        while (app_flash_pending || !flash_manager_is_stable())
        {
    #if NRF_SDH_DISPATCH_MODEL == NRF_SDH_DISPATCH_MODEL_INTERRUPT
            /* this works OK if NRF_SDH_DISPATCH_MODEL_INTERRUPT */
            ;
    #elif NRF_SDH_DISPATCH_MODEL == NRF_SDH_DISPATCH_MODEL_APPSH
            /* this hangs forever if NRF_SDH_DISPATCH_MODEL_APPSH */
            nrf_mesh_process();
            app_sched_execute();
    #endif
        }
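
    The root cause of the hang is not identified in this thread, but a bounded version of the wait loop at least turns "hangs forever" into a reportable failure, which makes it easier to see which condition never clears. In this host-runnable sketch, demo_wait_until(), its callback types and the fake flash state are hypothetical; on the target the pump callback would call nrf_mesh_process() and app_sched_execute(), and the done callback would test !app_flash_pending && flash_manager_is_stable(), as in the loop above.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    typedef bool (*demo_done_fn)(void *ctx);  /* true once the wait may end */
    typedef void (*demo_pump_fn)(void *ctx);  /* processes one round of pending events */

    /* Pump events until done() is true or max_rounds is exhausted.
     * Returns the number of pump rounds used, or -1 on timeout. */
    static int demo_wait_until(demo_done_fn done, demo_pump_fn pump, void *ctx,
                               size_t max_rounds)
    {
        assert(done != NULL && pump != NULL);
        for (size_t round = 0; round < max_rounds; round++)
        {
            if (done(ctx))
            {
                return (int)round;
            }
            pump(ctx);
        }
        return done(ctx) ? (int)max_rounds : -1;
    }

    /* Host-side fake: each pump completes one pending flash operation,
     * unless "stuck" is set (models an event that is never delivered). */
    typedef struct
    {
        int pending_ops;
        bool stuck;
    } demo_fake_flash_t;

    static bool demo_fake_done(void *ctx)
    {
        return ((demo_fake_flash_t *)ctx)->pending_ops == 0;
    }

    static void demo_fake_pump(void *ctx)
    {
        demo_fake_flash_t *f = ctx;
        if (!f->stuck && f->pending_ops > 0)
        {
            f->pending_ops--;
        }
    }
    ```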

  • Also note that the affected files are the SAME in both SDK 320 and 410:

    - mesh/core/src/bearer_event.c

    - mesh/core/src/timer_scheduler.c

    This means the *same* IRQ checking code passes in SDK320 and fails in SDK410.

    Does SDK320 put the caller in an interrupt context if calling from the app (non-interrupt)?

    As an additional note, it is still unclear to me why it would be an error to call the mesh stack from a LOWER (non-interrupt) priority (which is interrupted by the mesh stack if it needs to execute anything).

    This behavior was allowed in SDK320 and never caused any problems.

    Forcing the caller to be in an interrupt context in order to call the mesh stack is a problem, for example:

    - when a GPIO or timer interrupt requires long processing and would miss the next GPIO/timer event if it took too long.

    - when handling a mesh message requires background processing like saving to flash, we want to acknowledge the message in the interrupt quickly (to avoid timing out) and schedule writing to flash in a non-interrupt context.

    - when logic requires saving to flash before calling a mesh function: we do not want to save to flash from an interrupt context, but now we cannot call the mesh stack outside of an interrupt context.

    It would make sense for the app (running at a lower priority) to be able to call mesh stack functions, and for these to then trigger interrupts (which suspend the app) as needed.

    So could you explain a little more about why the architecture is this way?

  • I found the cause of the discrepancy between SDK320 and SDK410: 320 was building with -DNDEBUG and 410 was not.

    If I build without -DNDEBUG then *both* SDKs assert on this IRQ issue.
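
    The effect of -DNDEBUG can be reproduced with a debug-only assert macro. This sketch assumes NRF_MESH_ASSERT_DEBUG() follows the common pattern of compiling out when NDEBUG is defined, as the experiment above suggests; DEMO_ASSERT_DEBUG, the failure counter and demo_publish() are hypothetical stand-ins, not SDK definitions. It shows why the priority violation went unnoticed in the NDEBUG build: the check is removed at compile time rather than passed.

    ```c
    #include <assert.h>   /* the standard assert() obeys the same NDEBUG switch */
    #include <stdbool.h>

    /* Counts violations instead of halting, so the effect is observable on a host. */
    static int demo_debug_assert_failures = 0;

    #ifdef NDEBUG
    #define DEMO_DEBUG_ASSERTS_ENABLED 0
    /* sizeof does not evaluate its operand: the check disappears entirely,
     * while the condition still type-checks and avoids unused warnings. */
    #define DEMO_ASSERT_DEBUG(cond) ((void)sizeof(cond))
    #else
    #define DEMO_DEBUG_ASSERTS_ENABLED 1
    /* Debug build: record the violation (a real assert would halt here). */
    #define DEMO_ASSERT_DEBUG(cond) \
        do { if (!(cond)) { demo_debug_assert_failures++; } } while (0)
    #endif

    /* Stand-in for a mesh API entry point guarded by a debug-only priority check. */
    static void demo_publish(bool in_correct_irq_priority)
    {
        DEMO_ASSERT_DEBUG(in_correct_irq_priority);
        /* ...the publish itself proceeds either way... */
    }
    ```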

    So, the next step would be:

    1. Understanding if it's possible to migrate to NRF_MESH_IRQ_PRIORITY_THREAD even though our calls to flash are a problem (as above).

    or

    2. Understanding a little more about this restriction: why is this an NRF_MESH_ASSERT_DEBUG and not an NRF_MESH_ASSERT, and is it safe to ignore in practice? (See also the questions above.)

    Thank you for your help sorting this out.

  • Hi Sirio, 

    I'm checking with our mesh team about NRF_MESH_ASSERT_DEBUG() and will let you know when I have any information.


    I can see that both SDK v3.2 and SDK v4.1 have this in gccarmemb.cmake:

    set(CMAKE_C_FLAGS_DEBUG "-Og -g3" CACHE STRING "")
    set(CMAKE_C_FLAGS_MINSIZEREL "-Os -g " CACHE STRING "")
    set(CMAKE_C_FLAGS_RELWITHDEBINFO "-O3 -g " CACHE STRING "")
    set(CMAKE_C_FLAGS_RELEASE "-O3 -DNDEBUG" CACHE STRING "")

    So when building with the RELEASE flag, -DNDEBUG will be included (in both SDK v3.2 and SDK v4.1). Could you point me to where you found the difference?

    It's a little strange that I don't see NDEBUG defined anywhere in SES; I will need to check with the team.

    Regarding your question on why a function running at a lower priority shouldn't call a mesh function that is supposed to run at a higher priority: we don't want a mesh activity (started by this function) to be preempted by a mesh event (which runs at a higher interrupt level), as this will cause unexpected behavior.
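
    A host-runnable illustration of that hazard, assuming some mesh-internal structure is updated with a non-atomic read-modify-write. Nothing here is SDK code: demo_queue_push(), the demo_preempt hook and demo_mesh_event_handler() are hypothetical, and the preemption is simulated deterministically by running the handler between the read and the write, which is where a higher-priority mesh interrupt could land if the caller runs at a lower priority.

    ```c
    #include <assert.h>
    #include <stddef.h>

    #define DEMO_QUEUE_SIZE 8u

    /* Shared state touched both by API calls and by the stack's event handler. */
    typedef struct
    {
        int items[DEMO_QUEUE_SIZE];
        size_t count;
    } demo_queue_t;

    static demo_queue_t demo_queue;

    /* Optional hook that simulates an interrupt firing mid-update. */
    static void (*demo_preempt)(void) = NULL;

    static void demo_queue_push(int item)
    {
        size_t n = demo_queue.count;        /* read */
        assert(n < DEMO_QUEUE_SIZE);
        if (demo_preempt != NULL)
        {
            void (*hook)(void) = demo_preempt;
            demo_preempt = NULL;            /* fire at most once */
            hook();                         /* the "interrupt" runs here */
        }
        demo_queue.items[n] = item;         /* write based on a stale read */
        demo_queue.count = n + 1u;
    }

    /* Stand-in for a higher-priority mesh event that also pushes to the queue. */
    static void demo_mesh_event_handler(void)
    {
        demo_queue_push(100);
    }
    ```

    With the preemption, the event's item is overwritten and the count ends at 1 instead of 2. If every caller ran at the same priority as the handler, the handler could not interrupt the push and both items would survive, which is the property the priority assert protects.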

  • Thank you very much.

    You are correct, the SDKs do not differ, but we had changed our SDK320 build to remove '-g' and add '-DNDEBUG', because the "Rel" in "MinSizeRel" implies "Release", which should not have debug prints or debug symbols.

    It was not realized at the time that this impacted NRF_MESH_ASSERT_DEBUG().

    I confirmed that generated SES projects are missing NDEBUG in c_preprocessor_definitions, even if the CMake build specifies it.

    OK, I follow that logic. So is there a function that can be called to temporarily elevate the interrupt level when calling from main()?

    The alternative, NRF_MESH_IRQ_PRIORITY_THREAD, sounds good in theory but seems to have some issues with handling flash (as above). So, if possible, might it be better to sort out this interrupt issue and stay with NRF_MESH_IRQ_PRIORITY_LOWEST?
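
    Rather than raising the priority of main() itself, one common pattern is to hand the request to an interrupt configured at the mesh stack's priority and make the mesh call from that handler. On target this would typically use the CMSIS calls NVIC_SetPriority(), NVIC_EnableIRQ() and NVIC_SetPendingIRQ() on an otherwise unused software interrupt; which interrupt is free, and whether this is the recommended approach for a given SDK version, are assumptions to confirm with the mesh team. The host-runnable sketch below models only the hand-off: demo_request_publish(), demo_pend_irq(), demo_swi_handler() and the request struct are hypothetical, and the "interrupt" is run by calling the handler directly.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define DEMO_REQ_QUEUE_SIZE 4u  /* power of two, so index wrap-around stays consistent */

    typedef struct { uint16_t opcode; uint8_t payload; } demo_publish_req_t;

    /* Single-producer (thread mode) / single-consumer (handler) ring buffer.
     * Each side writes only its own index, so no lock is needed for one
     * producer and one consumer; on target, memory barriers may also be needed. */
    static demo_publish_req_t demo_reqs[DEMO_REQ_QUEUE_SIZE];
    static volatile uint32_t demo_head;  /* written by the producer only */
    static volatile uint32_t demo_tail;  /* written by the consumer only */

    static bool demo_irq_pending;        /* models the NVIC pending bit */
    static int demo_published;           /* calls made "at mesh priority" */
    static uint16_t demo_last_opcode;

    /* On target: NVIC_SetPendingIRQ() on the interrupt reserved for mesh calls. */
    static void demo_pend_irq(void) { demo_irq_pending = true; }

    /* Called from main() / app_scheduler context. */
    static bool demo_request_publish(uint16_t opcode, uint8_t payload)
    {
        if (demo_head - demo_tail == DEMO_REQ_QUEUE_SIZE)
        {
            return false;  /* queue full; the caller may retry later */
        }
        demo_reqs[demo_head % DEMO_REQ_QUEUE_SIZE] = (demo_publish_req_t){ opcode, payload };
        demo_head++;
        demo_pend_irq();
        return true;
    }

    /* Would run at the mesh IRQ priority on target, so the mesh call is made
     * from the context the stack expects. */
    static void demo_swi_handler(void)
    {
        assert(demo_head - demo_tail <= DEMO_REQ_QUEUE_SIZE);
        demo_irq_pending = false;
        while (demo_tail != demo_head)
        {
            demo_publish_req_t req = demo_reqs[demo_tail % DEMO_REQ_QUEUE_SIZE];
            /* On target: build the message from req and call access_model_publish(). */
            demo_last_opcode = req.opcode;
            demo_published++;
            demo_tail++;
        }
    }
    ```

    The same hand-off can run in the other direction (from the handler back to app_scheduler) for flash work that should stay out of interrupt context.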
