Information on MBTLE-3811

I'm seeing an issue where a Mesh Proxy node temporarily stops relaying mesh packets to other nodes. The problem seems to persist for roughly an hour at a time. In other words, a packet received via the Proxy GATT service, destined for a unicast address not belonging to the current node, does NOT get sent out as a relay packet. Instrumenting the application showed that the device was not relaying packets because the relay packet buffer was already full.

We are using version 4.1 of the Mesh SDK. The release notes for 4.2 reference a bug fix for a timer issue (MBTLE-3811). Can Nordic please provide additional information on the following topics:

a) What were the observed symptoms of this bug? Was it ever observed to cause problems with packet relaying?

b) What were the specific changes (i.e. the code diff) required to fix it?

Our theory is that this timer bug causes the relay advertiser to not properly clear its queue, which prevents it from relaying any packets until the timer overflows. The timer appears to overflow roughly every hour, which would line up with the behavior we're seeing.
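
As a rough sanity check on the "roughly every hour" figure, the wrap period of a free-running counter is 2^bits divided by its tick rate. The sketch below is only illustrative: the helper name `wrap_period_s` is hypothetical, and the two cases it can be used for (a 24-bit RTC counter clocked at 32768 Hz with no prescaling, and a 32-bit microsecond timestamp) are assumptions about which counter might be involved, not something stated in the release notes.

```c
#include <stdint.h>

/* Hypothetical helper: seconds until a free-running counter that is
 * `bits` wide (bits < 64) wraps around when it ticks at `hz`. */
static double wrap_period_s(unsigned bits, double hz)
{
    return (double)(1ull << bits) / hz;
}

/* Assumed cases, not taken from the SDK sources:
 *   wrap_period_s(24, 32768.0)  -> 24-bit RTC counter, no prescaler
 *   wrap_period_s(32, 1e6)      -> 32-bit timestamp counting microseconds
 */
```

Under these assumptions the 24-bit counter wraps every 512 s, while a 32-bit microsecond timestamp wraps after about 71.6 minutes, which is the closer match to an outage of roughly an hour.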

  • Hi,

    I will see if I can find more information regarding the bugfix.

    However, I highly recommend moving to our latest Mesh SDK, v5.0.0. Otherwise you will miss out on the many bug fixes and improvements that have been made since. Is there a reason for not using the latest version?

  • Yes, understood that it would be best to upgrade. However, the upgrade path is not so simple because, as I understand it, there is a mandatory bootloader update.

    Unfortunately we are not able to reproduce this issue in a controlled environment, so we have limited options when it comes to testing speculative fixes.

  • Hi,

    1. You can see more of the symptoms that were seen in this case.

    2. You can find the fix that was made below:

    commit e6665e87b38c58c2d82d42fe9d94e9c338362a5b
    Date:   Wed Jun 3 14:20:24 2020 +0200
    
    diff --git a/mesh/core/src/timer.c b/mesh/core/src/timer.c
    index 13f835b70..949fef547 100644
    --- a/mesh/core/src/timer.c
    +++ b/mesh/core/src/timer.c
    @@ -49,7 +49,7 @@
     /* Margin is required to prevent situation when written CC value is equal to COUNTER.
      * Situation with equality will cause losing interrupt for the tail counting until next overflow. */
     #define PROTECTION_MARGIN_FOR_TIMER_START   3ul
    -#define PROTECTION_MARGIN_FOR_OVFW_HANDLER  1ul
    +#define PROTECTION_MARGIN_FOR_OVFW_HANDLER  2ul
     
     #define TIMER_US_TO_TICKS(US)                              \
                 ((uint32_t)ROUNDED_DIV(                        \
    @@ -95,8 +95,9 @@ void nrf_mesh_timer_ovfw_handle(void)
                 if (m_tail_timer_counter > NRF_RTC1->COUNTER)
                 {
                     _DISABLE_IRQS(was_masked);
    -                NRF_RTC1->CC[1] = m_tail_timer_counter > NRF_RTC1->COUNTER + PROTECTION_MARGIN_FOR_OVFW_HANDLER ?
    -                        m_tail_timer_counter : NRF_RTC1->COUNTER + PROTECTION_MARGIN_FOR_TIMER_START;
    +                uint32_t cnt = NRF_RTC1->COUNTER;
    +                NRF_RTC1->CC[1] = m_tail_timer_counter > cnt + PROTECTION_MARGIN_FOR_OVFW_HANDLER ?
    +                        m_tail_timer_counter : cnt + PROTECTION_MARGIN_FOR_TIMER_START;
                     _ENABLE_IRQS(was_masked);
                     NRF_RTC1->EVTENSET = RTC_EVTEN_COMPARE1_Msk;
                     NRF_RTC1->INTENSET = RTC_INTENSET_COMPARE1_Msk;
    

    It is possible that your issue is related to this bug, as it looks similar to the case above.
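
To make the effect of the diff above easier to see, here is a minimal sketch that reproduces only the arithmetic of the patched expression on plain integers. It is not SDK code: `choose_cc` and `min_cc_distance` are hypothetical helpers, the counter is passed in as a value (which corresponds to the single `cnt` snapshot introduced by the fix), and no hardware register is touched. How close CC may safely be to COUNTER on real hardware is defined by the RTC documentation, not by this sketch.

```c
#include <stdint.h>

#define PROTECTION_MARGIN_FOR_TIMER_START 3ul

/* Hypothetical helper mirroring the ternary from the diff. `cnt` plays the
 * role of the COUNTER snapshot; `ovfw_margin` is passed in so that the old
 * value (1) and the new value (2) can be compared side by side. */
static uint32_t choose_cc(uint32_t tail, uint32_t cnt, uint32_t ovfw_margin)
{
    return tail > cnt + ovfw_margin
               ? tail
               : (uint32_t)(cnt + PROTECTION_MARGIN_FOR_TIMER_START);
}

/* Smallest distance in ticks between the snapshot and the chosen CC value,
 * scanning tails from cnt + 1 up to cnt + max_ahead. The scan starts at
 * cnt + 1 because the surrounding `if` only enters when tail > COUNTER. */
static uint32_t min_cc_distance(uint32_t cnt, uint32_t ovfw_margin,
                                uint32_t max_ahead)
{
    uint32_t best = UINT32_MAX;
    for (uint32_t ahead = 1; ahead <= max_ahead; ahead++)
    {
        uint32_t d = choose_cc(cnt + ahead, cnt, ovfw_margin) - cnt;
        if (d < best)
        {
            best = d;
        }
    }
    return best;
}
```

With a margin of 1 the chosen CC value can land as close as 2 ticks ahead of the snapshot (for example `min_cc_distance(1000, 1, 10)`), whereas with the patched margin of 2 it is never closer than 3 ticks. That leaves one more tick of slack for the counter to advance between the read and the write to CC.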
