
k_work_submit() schedules in atomic context

I have code that looks something like this:

static void some_worker(struct k_work *work)
{
    struct state *state =
        CONTAINER_OF(work, struct state, some_work);

    // This k_spin_lock() call asserts with a recursive spinlock error
    k_spinlock_key_t key = k_spin_lock(&state->lock);
    // do the actual work with the lock held
    k_spin_unlock(&state->lock, key);
}

static void schedule_my_work(struct state *state)
{
    k_spinlock_key_t key = k_spin_lock(&state->lock);
    // do some stuff with the lock held
    k_work_submit(&state->some_work);
    k_spin_unlock(&state->lock, key);
}

The version of Zephyr in nRF Connect 1.9.1 has a bug that causes k_work_submit() to call k_yield(), even if k_work_submit() is called with a spinlock held.  Consequently, the scheduler may immediately context-switch to the worker thread and start executing my worker function, which again acquires the spinlock.  If spinlock validation is enabled, this will cause a fatal assert.  (If it's not enabled, it may cause even more surprising behavior.)
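
One way to sidestep this without patching the kernel is to release the spinlock before calling k_work_submit(), so that any yield happens with no lock held. The following is only a minimal sketch under assumptions: the layout of struct state is guessed from the snippet above, and everything in the #else branch is a hypothetical host-side stand-in (not Zephyr code) that exists only so the call order can be checked off-target.

```c
#ifdef __ZEPHYR__
#include <kernel.h> /* <zephyr/kernel.h> on newer Zephyr trees */
#else
/* Host-only stand-ins, NOT Zephyr code: just enough to check the call order. */
#include <assert.h>
#include <stdbool.h>

struct k_spinlock { int unused; };
typedef int k_spinlock_key_t;
struct k_work { int submit_count; };

static int locks_held;                /* number of stand-in locks currently held */
static bool submitted_with_lock_held; /* set if submit ever runs under a lock */

static k_spinlock_key_t k_spin_lock(struct k_spinlock *l)
{
    (void)l;
    locks_held++;
    return 0;
}

static void k_spin_unlock(struct k_spinlock *l, k_spinlock_key_t key)
{
    (void)l;
    (void)key;
    locks_held--;
}

static int k_work_submit(struct k_work *work)
{
    work->submit_count++;
    if (locks_held > 0) {
        submitted_with_lock_held = true;
    }
    return 1;
}
#endif

/* Assumed layout; the original post does not show struct state. */
struct state {
    struct k_spinlock lock;
    struct k_work some_work;
};

static void schedule_my_work(struct state *state)
{
    k_spinlock_key_t key = k_spin_lock(&state->lock);
    /* do some stuff with the lock held */
    k_spin_unlock(&state->lock, key);

    /* Submit only after the lock is released, so that a yield inside
     * k_work_submit() cannot run the worker while we still hold the lock. */
    (void)k_work_submit(&state->some_work);
}
```

The worker still takes the lock itself, so it observes whatever schedule_my_work() changed before unlocking. The trade-off is that the state update and the submission are no longer one atomic step, which may or may not matter for a given design.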

This problem is similar, but not identical, to these old bugs:

https://github.com/zephyrproject-rtos/zephyr/issues/16273
https://github.com/zephyrproject-rtos/zephyr/pull/16386

The good news is that it has been fixed upstream by the commit linked below.  I applied that commit locally and it resolved the crash.  Could you please make sure it gets incorporated into the next Nordic SDK release?

https://github.com/zephyrproject-rtos/zephyr/commit/8d94967ec4773d9af67cb70167fe765085f3f737
