Crash after 15 hours of publishing messages with the MQTT helper

Hi,

I am testing an actinius_icarus board V2.

I read data from a BM68X sensor every 500 ms and publish to an MQTT broker every second. Each message is 256 bytes.

After 15 hours or so the system crashes. This has happened twice already, after almost the same running time.

In the log window I have the following messages:

ASSERTION FAIL [ret == 0] @ WEST_TOPDIR/zephyr/subsys/net/lib/mqtt/mqtt_os.h:61
sys_mutex_unlock failed with -22
[13:55:55.165,130] <err> os: r0/a1: 0x00000004 r1/a2: 0x0000003d r2/a3: 0x00000004
[13:55:55.165,130] <err> os: r3/a4: 0x00000004 r12/ip: 0x0000000c r14/lr: 0x00021397
[13:55:55.165,161] <err> os: xpsr: 0x01000000
[13:55:55.165,161] <err> os: s[ 0]: 0x20011c78 s[ 1]: 0x0003bb85 s[ 2]: 0x00000000 s[ 3]: 0x20011c78
[13:55:55.165,191] <err> os: s[ 4]: 0x20012838 s[ 5]: 0x00027cbf s[ 6]: 0x00000000 s[ 7]: 0x00000000
[13:55:55.165,222] <err> os: s[ 8]: 0x00000000 s[ 9]: 0x00000001 s[10]: 0x20017818 s[11]: 0x0001365f
[13:55:55.165,252] <err> os: s[12]: 0x00000000 s[13]: 0x0001365f s[14]: 0x20017828 s[15]: 0x0001366f
[13:55:55.165,283] <err> os: fpscr: 0x2000f1c8
[13:55:55.165,283] <err> os: Faulting instruction address (r15/pc): 0x000359ec
[13:55:55.165,313] <err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
[13:55:55.165,344] <err> os: Current thread: 0x2000f1c8 (unknown)
[13:55:55.297,790] <err> os: Halting system

ASSERTION FAIL [ret == 0] @ WEST_TOPDIR/zephyr/subsys/net/lib/mqtt/mqtt_os.h:61
sys_mutex_unlock failed with -22
[12:01:59.972,900] <err> os: r0/a1: 0x00000004 r1/a2: 0x0000003d r2/a3: 0x00000003
[12:01:59.972,930] <err> os: r3/a4: 0x00000004 r12/ip: 0x0000000c r14/lr: 0x0002136b
[12:01:59.972,930] <err> os: xpsr: 0x01000000
[12:01:59.972,961] <err> os: s[ 0]: 0x20011d1c s[ 1]: 0x0003ba05 s[ 2]: 0x00000000 s[ 3]: 0x20011d1c
[12:01:59.972,991] <err> os: s[ 4]: 0x200128cc s[ 5]: 0x00027cbf s[ 6]: 0x00000000 s[ 7]: 0x00000000
[12:01:59.972,991] <err> os: s[ 8]: 0x00000000 s[ 9]: 0x00000001 s[10]: 0x20017c8c s[11]: 0x00013633
[12:01:59.973,022] <err> os: s[12]: 0x00000000 s[13]: 0x00013633 s[14]: 0x001de893 s[15]: 0x00000000
[12:01:59.973,052] <err> os: fpscr: 0x20017cb0
[12:01:59.973,052] <err> os: Faulting instruction address (r15/pc): 0x0003586c
[12:02:45.467,498] <err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
[12:02:52.951,721] <err> os: Current thread: 0x2000f278 (unknown)
[12:02:53.069,427] <err> os: Halting system

In my project I am not using any mutex.

Does anyone have any suggestions on where to start debugging?

Best Regards

  • Hi,

     

    ASSERTION FAIL [ret == 0] @ WEST_TOPDIR/zephyr/subsys/net/lib/mqtt/mqtt_os.h:61
    sys_mutex_unlock failed with -22

    errno 22 is EINVAL, and the docs for the mutex unlock function state:

    https://github.com/nrfconnect/sdk-zephyr/blob/v3.7.99-ncs1/include/zephyr/sys/mutex.h#L108-L124

    * @retval -EINVAL Provided mutex not recognized by the kernel or mutex wasn't
    * locked
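To make the -EINVAL case concrete, here is a toy model in plain C. It is not Zephyr's implementation: `struct toy_mutex` and the `toy_*` functions are hypothetical names invented for this illustration, and the -EPERM branch for a non-owner is included only for completeness. The sketch mirrors the documented return codes and shows that an unlock fails the same way whether the mutex was never locked or its state was overwritten after it was locked.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a kernel mutex; NOT Zephyr's real structure. */
struct toy_mutex {
    const void *owner;       /* thread holding the lock, NULL when free */
    unsigned int lock_count; /* recursive lock depth, 0 when free */
};

/* Take the lock for `thread`; re-locking by the current owner nests. */
static int toy_mutex_lock(struct toy_mutex *m, const void *thread)
{
    if (m->lock_count != 0 && m->owner != thread) {
        return -EBUSY; /* held by another thread */
    }
    m->owner = thread;
    m->lock_count++;
    return 0;
}

/* Release the lock; mirrors the "not recognized or not locked" rule. */
static int toy_mutex_unlock(struct toy_mutex *m, const void *thread)
{
    if (m == NULL || m->lock_count == 0) {
        return -EINVAL; /* bad pointer, or mutex is not locked */
    }
    if (m->owner != thread) {
        return -EPERM; /* locked, but by a different thread */
    }
    if (--m->lock_count == 0) {
        m->owner = NULL;
    }
    return 0;
}

/* Simulate a stray write (for example from an overflowing stack next to
 * the mutex) that zeroes the mutex state while it is still locked. */
static void toy_simulate_corruption(struct toy_mutex *m)
{
    memset(m, 0, sizeof(*m));
}
```

With a mutex locked by one thread, calling `toy_simulate_corruption()` and then `toy_mutex_unlock()` returns -EINVAL, the same code a plain double unlock produces. The return value alone therefore cannot tell the two causes apart.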

     

    This indicates that the pointer going into the function is somehow corrupted, usually due to a stack overflow or similar. The thread it is being called from seems to differ between the two logs:

    [13:55:55.165,130] <err> os: r0/a1: 0x00000004 r1/a2: 0x0000003d r2/a3: 0x00000004
    [13:55:55.165,130] <err> os: r3/a4: 0x00000004 r12/ip: 0x0000000c r14/lr: 0x00021397
    [13:55:55.165,161] <err> os: xpsr: 0x01000000
    [13:55:55.165,161] <err> os: s[ 0]: 0x20011c78 s[ 1]: 0x0003bb85 s[ 2]: 0x00000000 s[ 3]: 0x20011c78
    [13:55:55.165,191] <err> os: s[ 4]: 0x20012838 s[ 5]: 0x00027cbf s[ 6]: 0x00000000 s[ 7]: 0x00000000
    [13:55:55.165,222] <err> os: s[ 8]: 0x00000000 s[ 9]: 0x00000001 s[10]: 0x20017818 s[11]: 0x0001365f
    [13:55:55.165,252] <err> os: s[12]: 0x00000000 s[13]: 0x0001365f s[14]: 0x20017828 s[15]: 0x0001366f
    [13:55:55.165,283] <err> os: fpscr: 0x2000f1c8
    [13:55:55.165,283] <err> os: Faulting instruction address (r15/pc): 0x000359ec
    [13:55:55.165,313] <err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
    [13:55:55.165,344] <err> os: Current thread: 0x2000f1c8 (unknown)

    [12:01:59.972,900] <err> os: r0/a1: 0x00000004 r1/a2: 0x0000003d r2/a3: 0x00000003
    [12:01:59.972,930] <err> os: r3/a4: 0x00000004 r12/ip: 0x0000000c r14/lr: 0x0002136b
    [12:01:59.972,930] <err> os: xpsr: 0x01000000
    [12:01:59.972,961] <err> os: s[ 0]: 0x20011d1c s[ 1]: 0x0003ba05 s[ 2]: 0x00000000 s[ 3]: 0x20011d1c
    [12:01:59.972,991] <err> os: s[ 4]: 0x200128cc s[ 5]: 0x00027cbf s[ 6]: 0x00000000 s[ 7]: 0x00000000
    [12:01:59.972,991] <err> os: s[ 8]: 0x00000000 s[ 9]: 0x00000001 s[10]: 0x20017c8c s[11]: 0x00013633
    [12:01:59.973,022] <err> os: s[12]: 0x00000000 s[13]: 0x00013633 s[14]: 0x001de893 s[15]: 0x00000000
    [12:01:59.973,052] <err> os: fpscr: 0x20017cb0
    [12:01:59.973,052] <err> os: Faulting instruction address (r15/pc): 0x0003586c
    [12:02:45.467,498] <err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
    [12:02:52.951,721] <err> os: Current thread: 0x2000f278 (unknown)

     

    Have you tried using addr2line to trace where the fault was called from? And which thread(s) are in use (search for the current thread address in the zephyr.map file)?

    Here's a howto on addr2line usage: https://academy.nordicsemi.com/courses/nrf-connect-sdk-intermediate/lessons/lesson-2-debugging/topic/exercise-2-11/
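As a rough sketch of that step, the helper below builds an addr2line command line from the register values in the first log (pc 0x000359ec and lr 0x00021397). The tool name `arm-zephyr-eabi-addr2line` and the ELF path `build/zephyr/zephyr.elf` are assumptions about the toolchain and build directory, and the function names are hypothetical; adjust them to your setup. On Cortex-M, bit 0 of a return address held in lr is the Thumb state flag rather than part of the address, so the sketch clears it before the lookup.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Clear the Thumb state bit so the value is a plain code address. */
static uint32_t thumb_to_code_addr(uint32_t reg_value)
{
    return reg_value & ~UINT32_C(1);
}

/* Write "<tool> -e <elf> -a -f -p <addr>..." into buf.
 * Returns the number of characters written, or -1 if buf is too small. */
static int format_addr2line_cmd(char *buf, size_t len,
                                const char *tool, const char *elf,
                                const uint32_t *regs, size_t count)
{
    int used = snprintf(buf, len, "%s -e %s -a -f -p", tool, elf);

    if (used < 0 || (size_t)used >= len) {
        return -1;
    }
    for (size_t i = 0; i < count; i++) {
        size_t room = len - (size_t)used;
        int w = snprintf(buf + used, room, " 0x%08" PRIx32,
                         thumb_to_code_addr(regs[i]));

        if (w < 0 || (size_t)w >= room) {
            return -1;
        }
        used += w;
    }
    return used;
}
```

Passing the pc and lr values from a crash log produces a command that can be pasted into a shell. The `-f` flag asks addr2line for function names, `-a` echoes each address, and `-p` prints each result on one line. The application has to be built with debug information for the output to include source lines.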

     

    Kind regards,

    Håkon

  • Unfortunately I was not able to debug, because I did not enable it in the prj file. I will do so. Anyway, your suggestion about a stack overflow helps a lot.

    Best.
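For anyone following up on the stack overflow theory, one common way to check stack headroom is "stack painting": fill the stack with a known byte before use, then later count how many bytes still hold the pattern. The sketch below shows the idea on a plain byte array. It is a conceptual illustration only, with hypothetical function names and an arbitrary fill value, and it assumes a stack that grows downward as on Cortex-M. Zephyr has its own facilities for this (for example the CONFIG_INIT_STACKS and CONFIG_THREAD_ANALYZER options), and CONFIG_THREAD_NAME should let the fault dump print a thread name instead of "(unknown)".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FILL_BYTE 0xAAu /* pattern chosen for this sketch */

/* Paint the whole stack area before the thread starts using it. */
static void stack_paint(uint8_t *stack, size_t size)
{
    memset(stack, FILL_BYTE, size);
}

/* Count still-painted bytes from the lowest address upward.
 * Assumes the stack grows downward, so untouched space sits at the
 * low end of the array. */
static size_t stack_unused_bytes(const uint8_t *stack, size_t size)
{
    size_t unused = 0;

    while (unused < size && stack[unused] == FILL_BYTE) {
        unused++;
    }
    return unused;
}

/* Simulate a thread that has used the top `used` bytes of its stack. */
static void stack_simulate_use(uint8_t *stack, size_t size, size_t used)
{
    if (used > size) {
        used = size;
    }
    memset(stack + (size - used), 0x11, used);
}
```

A result of zero unused bytes means the thread reached the very bottom of its stack at some point, which makes an overflow into neighbouring memory a strong suspect. Logging this value periodically over a long run can show whether usage creeps up before the crash.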
