NCS 3.2.1 - crashs with assert in "nrf_modem_os_timedwait() - k_sem_take"

Question

nRF9160-DK, mfw 1.3.7, NCS 3.2.1 
 A UDP client (CoAP) exchanges data with "sendto( ... MSG_DONTWAIT)", "poll(...)" and "recvfrom(...). It was working with several NCS versions, including 2.6.4, 2.9.2 and 3.1.2. 
 But migrating to NCS 3.2.1 it seems to crash with an assert on the second call of sendto. I already enlarged the main stack, but without success. 
 Using the debugger shows the related callstack: 
 assert_post_action(const char * file, unsigned int line) (/home/achim/ncs/v3.2.1/zephyr/lib/os/assert.c:43) z_swap_irqlock(unsigned int key) (/home/achim/ncs/v3.2.1/zephyr/kernel/include/kswap.h:210) z_swap_irqlock(unsigned int key) (/home/achim/ncs/v3.2.1/zephyr/kernel/include/kswap.h:202) z_impl_k_sem_take(struct k_sem * sem, k_timeout_t timeout) (/home/achim/ncs/v3.2.1/zephyr/kernel/sem.c:158) k_sem_take(struct k_sem * sem) (/home/achim/ncs/v3.2.1/coaps-client/build_nrf9160dk_nrf9160_ns/coaps-client/zephyr/include/generated/zephyr/syscalls/kernel.h:1158) nrf_modem_os_timedwait(uint32_t context, int32_t * timeout) (/home/achim/ncs/v3.2.1/nrf/lib/nrf_modem_lib/nrf_modem_os.c:201) _req_forward (Unknown Source:0) nrf_sendto (Unknown Source:0) nrf9x_socket_offload_sendto(void * obj, const void * buf, size_t len, int flags, const struct sockaddr * to, socklen_t tolen) (/home/achim/ncs/v3.2.1/nrf/lib/nrf_modem_lib/nrf9x_sockets.c:702) z_impl_zsock_sendto(int sock, const void * buf, size_t len, int flags, const struct sockaddr * dest_addr, socklen_t addrlen) (/home/achim/ncs/v3.2.1/zephyr/subsys/net/lib/sockets/sockets.c:342) zsock_sendto(socklen_t addrlen, const struct sockaddr * dest_addr, int flags, size_t len, const void * buf, int sock) (/home/achim/ncs/v3.2.1/coaps-client/build_nrf9160dk_nrf9160_ns/coaps-client/zephyr/include/generated/zephyr/syscalls/socket.h:260) send_to_peer(dtls_app_data_t * app, const uint8_t * data, size_t len) (/home/achim/ncs/v3.2.1/coaps-client/src/dtls_client.c:1194) recvfrom_peer(dtls_app_data_t * app) (/home/achim/ncs/v3.2.1/coaps-client/src/dtls_client.c:1381) dtls_loop(dtls_app_data_t * app, int reboot) (/home/achim/ncs/v3.2.1/coaps-client/src/dtls_client.c:2376) main() (/home/achim/ncs/v3.2.1/coaps-client/src/dtls_client.c:3110) 
 The assert, which fires, is: 
 __ASSERT(arch_irq_unlocked(key) || _current->base.thread_state & (_THREAD_DUMMY | _THREAD_DEAD), "Context switching while holding lock!"); I don't see, how the application code is able to cause that assert. 
 Any ideas?

Bjarki Andreasen · Accepted Answer

Hi there, TL;DR the zephyr-coaps-client library is locking interrupts from a thread, then calling a blocking API which ends up trying to unready the thread with interrupts disabled, which is not allowed. Offending code (in current main of library) is here github.com/.../dtls_client.c 
 With the spinlock held, a chain of calls results in the nrf_modem_lib trying to sleep its thread here github.com/.../nrf_modem_os.c 
 Fix has to be applied to the zephyr-coaps-client library, and the fix is to use a sem or mutex to lock access to the dtls_buffer, or copy it to a temporary one or something.

NCS 3.2.1 - crashs with assert in "nrf_modem_os_timedwait() - k_sem_take"

Top Replies