BLE Mesh node resets with 'mpsl_init: MPSL ASSERT: 112, 1622'

TechSW 4 months ago

Dear all,

One of our products is a mesh light node based on the nRF52833. Its firmware is the 'Light Fixture' sample ( https://docs.nordicsemi.com/bundle/ncs-latest/page/nrf/samples/bluetooth/mesh/light_ctrl/README.html ), built with nRF Connect SDK v2.7.0-rc1-04938c8d40e3.

In many installations, this node is connected to ambient-light and presence sensors so that it can modulate the lightness level (hence PWM) to save energy.

Unfortunately, we are seeing random, unexplained resets with the following log:

00> [03:01:35.854,553] <err> mpsl_init: MPSL ASSERT: 112, 1622
00> [03:01:35.854,553] <err> os: ***** HARD FAULT *****
00> [03:01:35.854,583] <err> os: Fault escalation (see below)
00> [03:01:35.854,583] <err> os: ARCH_EXCEPT with reason 3
00>
00> [03:01:35.854,614] <err> os: r0/a1: 0x00000003 r1/a2: 0x00000000 r2/a3: 0x20006a28
00> [03:01:35.854,644] <err> os: r3/a4: 0x00000003 r12/ip: 0x20000f78 r14/lr: 0x00050257
00> [03:01:35.854,644] <err> os: xpsr: 0x01000018
00> [03:01:35.854,675] <err> os: s[ 0]: 0x2000cb50 s[ 1]: 0x00000000 s[ 2]: 0x00000000 s[ 3]: 0x00023709
00> [03:01:35.854,675] <err> os: s[ 4]: 0x00000070 s[ 5]: 0x00323131 s[ 6]: 0x00000001 s[ 7]: 0x200033a3
00> [03:01:35.854,705] <err> os: s[ 8]: 0x200033c8 s[ 9]: 0x00022e95 s[10]: 0x200033a8 s[11]: 0x0004d855
00> [03:01:35.854,736] <err> os: s[12]: 0x200081a4 s[13]: 0x00020fed s[14]: 0x00000000 s[15]: 0x00000040
00> [03:01:35.854,736] <err> os: fpscr: 0x0004d855
00> [03:01:35.854,766] <err> os: Faulting instruction address (r15/pc): 0x0004ad74
00> [03:01:35.854,797] <err> os: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
00> [03:01:35.854,797] <err> os: Fault during interrupt handling
00>
00> [03:01:35.854,858] <err> os: Current thread: 0x20006a28 (unknown)
00> [03:01:36.456,115] <err> fatal_error: Resetting system
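
Two values in this dump can be decoded by hand as a first investigation step. The following is a minimal sketch, assuming the standard Cortex-M register layout: the exception number sits in bits [8:0] of xpsr, external interrupts start at exception 16, and bit 0 of a stacked code address is only the Thumb state bit. The helper names are hypothetical and are not part of Zephyr or MPSL.

```c
#include <stdint.h>

/* Extract the active exception number (IPSR field, bits [8:0]) from xPSR. */
static unsigned xpsr_exception_number(uint32_t xpsr)
{
    return xpsr & 0x1FFu;
}

/*
 * Map an exception number to an external IRQ number.
 * Exceptions 0..15 are thread mode and the Cortex-M system exceptions,
 * so they have no IRQ number and -1 is returned.
 */
static int exception_to_irq(unsigned exception)
{
    return exception >= 16u ? (int)exception - 16 : -1;
}

/* Clear the Thumb bit so the value can be looked up in the ELF/map file. */
static uint32_t thumb_to_code_address(uint32_t value)
{
    return value & ~UINT32_C(1);
}
```

With the values from the log, `xpsr_exception_number(0x01000018)` gives 24, which `exception_to_irq` maps to IRQ 8, and `thumb_to_code_address(0x00050257)` gives 0x00050256. On the nRF52 series, peripheral ID 8 should be TIMER0, one of the peripherals MPSL reserves, which would be consistent with the "Fault during interrupt handling" line; please check this against the product specification. The pc and lr addresses can then be resolved against the build's zephyr.elf with a tool such as addr2line.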

These resets are a problem for our customers: after each reset, the lightness level restarts from 0, which produces a noticeable ramp, especially at night.

We tried to find the origin of the problem by increasing and decreasing the number of nodes and messages, on the assumption that the reset is caused by resource starvation.

But we still haven't found any correlation.

Does anyone know what the assertion 'mpsl_init: MPSL ASSERT: 112, 1622' refers to?

Any other ideas on the possible causes of the reset, or on how to investigate it further?

Best,

Elfving 4 months ago

Hi,

First off, I would like to mention that rc1 might not be the best choice for a product. We prefer stable tags for those, such as 2.7.0 now that it is out. I assume, though, that you built the product at a time when you were forced to use 2.7-rc1. It is probably fine either way; I just had to mention it.

I'll look into what exactly the error code refers to. It looks like it is related to timing or to the hardware, maybe the HFXO. There have been similar cases in the past in which the HFXO wasn't soldered on well enough or did not start up correctly. Are you doing a lot of work in interrupts that could be problematic with regard to timing priorities?

Regards,

Elfving

TechSW 4 months ago in reply to Elfving

Dear Elfving,

Thank you for your answer. We are double-checking the HFXO, as you suggested.

However, we have one more data point: apparently, this problem only (or mostly) happens when the light node is configured with mode-on, meaning that the Light LC server modulates the lightness based on the received LUX messages. It seems that if we disable this functionality, the resets stop occurring (or become less likely; we need more time to confirm).

Regarding the question about the interrupts, we didn't change anything from the provided example.

Best,

TSW

Elfving 4 months ago in reply to TechSW

Could you expand on why you are using rc1, by the way?

TechSW said:
Regarding the question about the interrupts, we didn't change anything from the provided example.

Is that the case regarding the sample as a whole, or just the use of interrupts?

I assume, then, that you are not disabling interrupts with __disable_irq(), using additional interrupts with extremely high priority, or anything like that.

You are seeing this on multiple nodes, right? Not just on one physical device?

Regards,

Elfving

Elfving 4 months ago in reply to TechSW

Hi again,

The relevant R&D team on our side has tried to reproduce what you see with the default sample on DKs, and does not observe what you are describing. (They used Generic OnOff server functionality with, e.g., the transition time set to 3 s and the delay set to 100 ms; an LED on the DK dims back and forth.)

I assume then that it is the HFXO that is at fault.

Regards,

Elfving