
Simple mutex issue causing board reset

I have a strange issue on the nRF9160-DK board using the nRF Connect SDK with the GNU Arm toolchain (GCC v10.2.1), built using makefiles.

I have two threads that both wait on a semaphore to begin their work. One launches at 59s and the other at 60s so they're staggered when semaphores are given.
I do this simply to test mutex behaviour.

When the semaphore is taken, each task attempts to then lock a mutex (this will eventually protect a shared UART).
Naturally, the task launching at 59s gets there first, and the task at 60s blocks and times out after waiting 1s.
The quicker task is a dummy task and simply waits 2s before unlocking the mutex. So all is OK the first time.

On the second launch, the quicker task is now 2s ahead of the slower one, so it grabs the mutex again. This time, however, it unlocks the mutex at almost exactly the same moment (<1ms apart) as the slower task times out waiting for it. The board then resets.

Does anybody know why this might be? Or how I can get the actual exception handler printed out over RTT before the board resets?

I've tried setting CONFIG_FAULT_DUMP=2 and CONFIG_RESET_ON_FATAL_ERROR=n, but I still can't get that information printed.

  • I think, for now, the problem is resolved. There was a debug message on the line immediately after the timeout that tried to print the name of the thread currently owning the mutex, i.e. the one that had blocked it. So, the sequence of events was:

    Thread A took the mutex
    Thread B asked for the mutex and timed out
    Microseconds later and before the log message, Thread A released the mutex
    Microseconds later, Thread B attempted to log the error and dereferenced the "owner" pointer that was now set to NULL.

    I thought I had removed this line and still saw the issue, but I cannot reproduce that now, so perhaps it was a stale build or an unsaved file.

    I'll re-open this thread if it occurs again without the null pointer dereference.
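
    For anyone hitting the same race, here is a minimal sketch of a safer way to log the owner. It is plain C rather than my actual code: `demo_thread`, `demo_mutex`, `demo_owner_name` and `demo_format_timeout` are hypothetical stand-ins, not Zephyr types or APIs. The idea is to read the owner pointer once into a local variable and check it for NULL before using it, since the owner can release the mutex between the timeout and the log call.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the RTOS objects; NOT the real Zephyr structs. */
struct demo_thread {
    const char *name;
};

struct demo_mutex {
    /* NULL when the mutex is unlocked; another thread may change it at any time. */
    struct demo_thread *volatile owner;
};

/* Return a printable owner name, tolerating an owner that has already released. */
const char *demo_owner_name(const struct demo_mutex *m)
{
    /* Read the pointer exactly once: a second read could already observe NULL. */
    struct demo_thread *owner = m->owner;

    if (owner == NULL || owner->name == NULL) {
        return "(none)";
    }
    return owner->name;
}

/* Build the diagnostic message that would be logged after a lock timeout. */
int demo_format_timeout(char *buf, size_t len, const struct demo_mutex *m)
{
    return snprintf(buf, len, "mutex timeout, last seen owner: %s",
                    demo_owner_name(m));
}
```

    The name printed this way is only advisory: by the time the message is formatted the mutex may have a different owner or none at all. The snapshot also assumes thread objects outlive the log call, which holds for statically defined threads but not necessarily for dynamically created ones.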
