BLE Mesh node resets with 'mpsl_init: MPSL ASSERT: 112, 1622'

Dear all,

one of our products is a mesh light node based on the nRF52833. Its firmware is the 'Light Fixture' sample ( https://docs.nordicsemi.com/bundle/ncs-latest/page/nrf/samples/bluetooth/mesh/light_ctrl/README.html ), built with nRF Connect SDK v2.7.0-rc1-04938c8d40e3.

In many installations, the node is connected to ambient light and presence sensors so that it can modulate the lightness level (via PWM) to save energy.

Unfortunately, we are seeing random, unexplained resets with the following log output:

00> [03:01:35.854,553] <err> mpsl_init: MPSL ASSERT: 112, 1622
00> [03:01:35.854,553] <err> os: ***** HARD FAULT *****
00> [03:01:35.854,583] <err> os: Fault escalation (see below)
00> [03:01:35.854,583] <err> os: ARCH_EXCEPT with reason 3
00>
00> [03:01:35.854,614] <err> os: r0/a1: 0x00000003 r1/a2: 0x00000000 r2/a3: 0x20006a28
00> [03:01:35.854,644] <err> os: r3/a4: 0x00000003 r12/ip: 0x20000f78 r14/lr: 0x00050257
00> [03:01:35.854,644] <err> os: xpsr: 0x01000018
00> [03:01:35.854,675] <err> os: s[ 0]: 0x2000cb50 s[ 1]: 0x00000000 s[ 2]: 0x00000000 s[ 3]: 0x00023709
00> [03:01:35.854,675] <err> os: s[ 4]: 0x00000070 s[ 5]: 0x00323131 s[ 6]: 0x00000001 s[ 7]: 0x200033a3
00> [03:01:35.854,705] <err> os: s[ 8]: 0x200033c8 s[ 9]: 0x00022e95 s[10]: 0x200033a8 s[11]: 0x0004d855
00> [03:01:35.854,736] <err> os: s[12]: 0x200081a4 s[13]: 0x00020fed s[14]: 0x00000000 s[15]: 0x00000040
00> [03:01:35.854,736] <err> os: fpscr: 0x0004d855
00> [03:01:35.854,766] <err> os: Faulting instruction address (r15/pc): 0x0004ad74
00> [03:01:35.854,797] <err> os: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
00> [03:01:35.854,797] <err> os: Fault during interrupt handling
00>
00> [03:01:35.854,858] <err> os: Current thread: 0x20006a28 (unknown)
00> [03:01:36.456,115] <err> fatal_error: Resetting system

This is a problem for our customers: after the reset, the lightness level restarts from 0, which produces a noticeable ramp, especially at night.
We tried to find the origin of the problem by increasing and decreasing the number of nodes and messages, on the assumption that the reset is caused by resource starvation.
So far we haven't found any correlation.
Does anyone know what the assertion 'mpsl_init: MPSL ASSERT: 112, 1622' refers to?
Any other ideas on the possible causes of the reset, or on how to investigate it further?
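
One possible next step, sketched here under assumptions: the lr and pc values in the dump above can be resolved to function names with addr2line against the ELF file of the exact build that is running on the node. The small Python helper below only extracts the two addresses from the log and prints the command to run. The tool name (arm-zephyr-eabi-addr2line) and the ELF path (build/zephyr/zephyr.elf) are assumptions that depend on the toolchain and build directory in use.

```python
import re

# Fault dump lines copied from the log above.
FAULT_LOG = """\
00> [03:01:35.854,614] <err> os: r0/a1: 0x00000003 r1/a2: 0x00000000 r2/a3: 0x20006a28
00> [03:01:35.854,644] <err> os: r3/a4: 0x00000003 r12/ip: 0x20000f78 r14/lr: 0x00050257
00> [03:01:35.854,766] <err> os: Faulting instruction address (r15/pc): 0x0004ad74
"""

# Assumptions: adjust both to the toolchain and build directory actually used.
ADDR2LINE = "arm-zephyr-eabi-addr2line"
ELF_PATH = "build/zephyr/zephyr.elf"


def extract_code_addresses(log: str) -> dict[str, int]:
    """Return the lr and pc values found in a Zephyr fault dump."""
    patterns = (
        ("lr", r"r14/lr:\s*0x([0-9a-fA-F]+)"),
        ("pc", r"\(r15/pc\):\s*0x([0-9a-fA-F]+)"),
    )
    found = {}
    for name, pattern in patterns:
        match = re.search(pattern, log)
        if match:
            # Clear bit 0, which marks Thumb state on Cortex-M and is not
            # part of the instruction address.
            found[name] = int(match.group(1), 16) & ~1
    return found


def addr2line_command(addresses: dict[str, int]) -> str:
    """Build the addr2line command line for the given addresses."""
    args = " ".join(f"0x{addr:08x}" for addr in addresses.values())
    return f"{ADDR2LINE} -e {ELF_PATH} -f -p {args}"


addresses = extract_code_addresses(FAULT_LOG)
print(addr2line_command(addresses))
```

For the dump above, this prints a command that looks up 0x00050256 (lr) and 0x0004ad74 (pc). Addresses that fall inside precompiled libraries may resolve only to a symbol name, or not at all, so the result is a hint about where the assert was raised rather than a diagnosis.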
Best,
M
  • Hi,

    First off, I would like to mention that rc1 might not be the best choice for a product. We prefer to use the stable tags for those, like 2.7.0 now that it is out. I assume, though, that you built the product at a time when you were forced to use 2.7-rc1. It is probably fine either way, I just had to mention it.

    I'll look into what the error code refers to exactly. It looks like it is related to timing or to the hardware, though, maybe the HFXO. There have been some similar cases in the past in which the HFXO wasn't soldered on well enough or didn't start up correctly. Are you doing a lot of work in interrupts that could be problematic with regard to timing priorities?

    Regards,

    Elfving

  • Dear Elfving,

    Thank you for your answer. We are double-checking the HFXO as per your suggestion.

    However, we have one more data point: apparently, this problem only (or mostly) happens when the light node is configured with mode-on, meaning the Light LC Server modulates the lightness based on the received lux messages. If we disable this functionality, the resets seem to stop occurring (or at least become less likely; we need more time to confirm).

    Regarding the question about the interrupts, we didn't change anything from the provided example.

    Best,

    TSW

  • Could you expand on why you are using rc1 btw?

    TechSW said:

    Regarding the question about the interrupts, we didn't change anything from the provided example.

    Is that the case regarding the sample as a whole, or just the use of interrupts?

    I assume then that you are not disabling interrupts with __disable_irq(), using additional interrupts with extremely high priority, or anything like that.

    You are seeing this on multiple nodes, right? Not just on one physical device?

    Regards,

    Elfving

  • Hi again,

    The relevant R&D team on our side has tried to reproduce what you see with the default sample on DKs and does not observe what you are describing. (They used generic OnOff server functionality with e.g. the transition time set to 3 s and the delay set to 100 ms; an LED on the DK dims back and forth.)

    I assume then that it is the HFXO that is at fault.

    Regards,

    Elfving
