What can cause RTC1_IRQHandler() to throw Mesh error 8 at 0x00000000?

Dear Nordic Experts,

This one's a tough nut to crack:

The firmware is based on nRF5 SDK for Mesh 5.0.0 with nRF5 SDK 17.0.2 and SoftDevice S132 7.3.0, and it runs on an nRF52832. During provisioning, the firmware very rarely ends up trapped in app_error_fault_handler(). The short question is: what can cause this, and how can I fix it?


When the error happens, the stack trace looks like this:

app_error_fault_handler()   0x0002708a
app_error_handler_bare()    0x000392bc
RTC1_IRQHandler()           0x0003f0e8
                            0x00041f10
main()                      0x00026948
                            0x000262AE


And SES logged:

<t:     377511>, main.c,  291, Successfully provisioned
<t:     377516>, main.c,  175, Restoring default configuration!
<t:     377543>, ble_dfu_support.c,  171, ble_dfu_support_service_init()
<t:     377549>, main.c,  286, Node Address: 0x0004 Addresses used: 1
<t:     377556>, main.c,  250, Mesh event: 0x1D
<t:     378068>, main.c,  250, Mesh event: 0x12
<t:     378071>, main.c,  218, Mesh event: NRF_MESH_EVT_RX_FAILED
<t:     424714>, proxy.c,  632, Connected
<t:     439992>, ble_softdevice_support.c,  104, Successfully updated connection parameters
<t:     489533>, main.c,  250, Mesh event: 0x4
<t:     489537>, mesh_gatt.c,  168, HVN data: 4101007D520E366F2048C10000000025B661239B
<t:     489545>, mesh_gatt.c,  219, status: 8 len: 20 usable-mtu:20 sar_type: 1
<t:     493487>, app_error_weak.c,  115, Mesh error 8 at 0x00000000 (:0)


Unfortunately, this happened on a release build, so the ELF file isn't of much help.

Questions now are:

  1. What could cause RTC1_IRQHandler() to trigger Mesh error 8 (NRF_ERROR_INVALID_STATE)?
  2. And how can I avoid it?


Since this is related to the timer, I suspect that my firmware may have a fundamental flaw. Let me explain:

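  1. The firmware uses the app_timer_xxx() API to compute the device's uptime:

    /* File-scope state (declarations added for completeness; types assumed). */
    static uint64_t uptime_in_ticks;
    static uint64_t uptime_in_milliseconds;
    static uint32_t last_ticks;

    uint64_t calculate_uptime_in_milliseconds(void) {
        /* Accumulate the ticks elapsed since the previous call. */
        uint32_t this_ticks = app_timer_cnt_get();
        uptime_in_ticks += app_timer_cnt_diff_compute(this_ticks, last_ticks);
        last_ticks = this_ticks;

        /* Convert the accumulated ticks to milliseconds. */
        uptime_in_milliseconds = (uptime_in_ticks * 1000) / APP_TIMER_CLOCK_FREQ;

        return uptime_in_milliseconds;
    }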


  2. The main loop uses the uptime to drive the business logic:

    int main(void)
    {
        initialize();
        start();
    
        while (true)
        {
            uint64_t uptime_in_milliseconds = calculate_uptime_in_milliseconds();
            handle_node_reset(uptime_in_milliseconds);
    
            handle_leds(uptime_in_milliseconds);
            handle_business_logic(&mesh_server, uptime_in_milliseconds);
    
            nrf_delay_ms(50);
        }
    }
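
To rule out the tick arithmetic from item 1 itself, it can be reproduced off-target. The following is only a minimal host-side sketch, assuming a 24-bit RTC counter and a 32768 Hz tick rate; the `sketch_` names and `uptime_state_t` are hypothetical and not part of the SDK, and the masking is my assumption about how a wrap-safe difference behaves, not a copy of the app_timer implementation.

```c
#include <stdint.h>

/* Assumed values: a 24-bit RTC counter ticking at 32768 Hz. */
#define SKETCH_RTC_MASK     0x00FFFFFFu
#define SKETCH_RTC_FREQ_HZ  32768u

/* Hypothetical container for the accumulated uptime state. */
typedef struct {
    uint64_t total_ticks; /* ticks accumulated since start */
    uint32_t last_ticks;  /* counter value seen on the previous call */
} uptime_state_t;

/* Wrap-safe difference between two 24-bit counter readings. */
static uint32_t sketch_ticks_diff(uint32_t to, uint32_t from)
{
    return (to - from) & SKETCH_RTC_MASK;
}

/* Feed a fresh counter reading and return the uptime in milliseconds. */
static uint64_t sketch_uptime_ms(uptime_state_t *s, uint32_t now_ticks)
{
    s->total_ticks += sketch_ticks_diff(now_ticks, s->last_ticks);
    s->last_ticks = now_ticks;
    return (s->total_ticks * 1000u) / SKETCH_RTC_FREQ_HZ;
}
```

Under these assumptions the difference only survives a single counter wrap, so the function has to be called at least once every 512 s (2^24 ticks at 32768 Hz). The 50 ms main loop above stays well inside that limit.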


What's your view on this approach? Could the way app_timer is used be related to RTC1_IRQHandler() causing Mesh error 8 (NRF_ERROR_INVALID_STATE)?

The issue happens only once every few hundred provisioning attempts, so any advice is very welcome!

Thank you,
Michael.

EDIT:

There were some technical issues yesterday (the site didn't finish sending) that caused this question to be posted three times. Unfortunately, I noticed it too late.
