I have code that looks something like this:
```c
static void some_worker(struct k_work *work)
{
	struct state *state =
		CONTAINER_OF(work, struct state, some_work);

	// Asserts with a recursive spinlock error when this runs while
	// schedule_my_work() still holds state->lock
	k_spinlock_key_t key = k_spin_lock(&state->lock);
	// ... access state ...
	k_spin_unlock(&state->lock, key);
}

static void schedule_my_work(struct state *state)
{
	k_spinlock_key_t key = k_spin_lock(&state->lock);

	// do some stuff with the lock held
	k_work_submit(&state->some_work);
	k_spin_unlock(&state->lock, key);
}
```
The version of Zephyr shipped with nRF Connect SDK 1.9.1 has a bug that causes k_work_submit() to call k_yield(), even when the caller holds a spinlock. As a result, the scheduler may immediately context-switch to the work queue thread and start executing my worker function, which then tries to acquire the same spinlock. If spinlock validation (CONFIG_SPIN_VALIDATE) is enabled, this triggers a fatal assert. (If it is not enabled, the behavior may be even more surprising.)
This problem is similar, but not identical, to this older issue and its fix:
https://github.com/zephyrproject-rtos/zephyr/issues/16273
https://github.com/zephyrproject-rtos/zephyr/pull/16386
The good news is that this has been fixed upstream. I applied the fix locally and it resolved the crash. Could you please make sure the following commit is included in the next nRF Connect SDK release:
https://github.com/zephyrproject-rtos/zephyr/commit/8d94967ec4773d9af67cb70167fe765085f3f737