NCS 3.2.1 - crashs with assert in "nrf_modem_os_timedwait() - k_sem_take"

nRF9160-DK, mfw 1.3.7, NCS 3.2.1

A UDP client (CoAP) exchanges data with "sendto( ... MSG_DONTWAIT)", "poll(...)" and "recvfrom(...). It was working with several NCS versions, including 2.6.4, 2.9.2 and 3.1.2.

But migrating to NCS 3.2.1 it seems to crash with an assert on the second call of sendto. I already enlarged the main stack, but without success.

Using the debugger shows the related callstack:

assert_post_action(const char * file, unsigned int line) (/home/achim/ncs/v3.2.1/zephyr/lib/os/assert.c:43)
z_swap_irqlock(unsigned int key) (/home/achim/ncs/v3.2.1/zephyr/kernel/include/kswap.h:210)
z_swap_irqlock(unsigned int key) (/home/achim/ncs/v3.2.1/zephyr/kernel/include/kswap.h:202)
z_impl_k_sem_take(struct k_sem * sem, k_timeout_t timeout) (/home/achim/ncs/v3.2.1/zephyr/kernel/sem.c:158)
k_sem_take(struct k_sem * sem) (/home/achim/ncs/v3.2.1/coaps-client/build_nrf9160dk_nrf9160_ns/coaps-client/zephyr/include/generated/zephyr/syscalls/kernel.h:1158)
nrf_modem_os_timedwait(uint32_t context, int32_t * timeout) (/home/achim/ncs/v3.2.1/nrf/lib/nrf_modem_lib/nrf_modem_os.c:201)
_req_forward (Unknown Source:0)
nrf_sendto (Unknown Source:0)
nrf9x_socket_offload_sendto(void * obj, const void * buf, size_t len, int flags, const struct sockaddr * to, socklen_t tolen) (/home/achim/ncs/v3.2.1/nrf/lib/nrf_modem_lib/nrf9x_sockets.c:702)
z_impl_zsock_sendto(int sock, const void * buf, size_t len, int flags, const struct sockaddr * dest_addr, socklen_t addrlen) (/home/achim/ncs/v3.2.1/zephyr/subsys/net/lib/sockets/sockets.c:342)
zsock_sendto(socklen_t addrlen, const struct sockaddr * dest_addr, int flags, size_t len, const void * buf, int sock) (/home/achim/ncs/v3.2.1/coaps-client/build_nrf9160dk_nrf9160_ns/coaps-client/zephyr/include/generated/zephyr/syscalls/socket.h:260)
send_to_peer(dtls_app_data_t * app, const uint8_t * data, size_t len) (/home/achim/ncs/v3.2.1/coaps-client/src/dtls_client.c:1194)
recvfrom_peer(dtls_app_data_t * app) (/home/achim/ncs/v3.2.1/coaps-client/src/dtls_client.c:1381)
dtls_loop(dtls_app_data_t * app, int reboot) (/home/achim/ncs/v3.2.1/coaps-client/src/dtls_client.c:2376)
main() (/home/achim/ncs/v3.2.1/coaps-client/src/dtls_client.c:3110) 

The assert, which fires, is: 

 __ASSERT(arch_irq_unlocked(key) ||
 _current->base.thread_state & (_THREAD_DUMMY | _THREAD_DEAD),
 "Context switching while holding lock!");

I don't see, how the application code is able to cause that assert.

Any ideas?

Parents
  • Hello,

    The spin lock validation was added after SDK v3.1.x by this commit: https://github.com/nrfconnect/sdk-zephyr/commit/5183fc5693832b6aa1fe1cab06e628681300d0fa. Does your application implement any custom driver / declare any interrupt handlers directly?

    Best regards,

    Vidar

  • Does your application implement any custom driver

    No.

    declare any interrupt handlers directly?

    I use a GPIO interrupt with "gpio_init_callback", not sure, if that is "directly". 

  • Thanks for confirming. The GPIO callback is not a problem as long as you don't use the modem API directly inside it. Zephyr has something called Zero Latency Interrupts (ZLI), which if used incorrectly, can lead to the assertion you are seeing. That is why I asked if you had declared any interrupt handlers in your application or in a custom driver. But since you are not using any ZLIs, I'm not sure what could be triggering the assertion in your case. I will need to ask internally. Are you able to upload your .config file for the application as well so we can review it?

  • prj.conf

    #
    # Copyright (c) 2022 Achim Kraus CloudCoap.net
    #
    # See the NOTICE file(s) distributed with this work for additional
    # information regarding copyright ownership.
    #
    # This program and the accompanying materials are made available under the
    # terms of the Eclipse Public License 2.0 which is available at
    # www.eclipse.org/.../epl-2.0
    #
    # SPDX-License-Identifier: EPL-2.0
    #

    CONFIG_KERNEL_BIN_NAME="cloudcoap"
    CONFIG_NCS_APPLICATION_BOOT_BANNER_STRING="cloudcoap"

    # General config
    CONFIG_NEWLIB_LIBC=y
    CONFIG_NEWLIB_LIBC_FLOAT_PRINTF=y
    CONFIG_NCS_SAMPLES_DEFAULTS=y
    CONFIG_FPU=y

    # FLASH
    CONFIG_FLASH=y
    CONFIG_FLASH_PAGE_LAYOUT=y
    CONFIG_FLASH_MAP=y
    CONFIG_STREAM_FLASH=y
    CONFIG_MPU_ALLOW_FLASH_WRITE=y
    CONFIG_IMG_ERASE_PROGRESSIVELY=y

    # Settings
    CONFIG_SETTINGS=y
    CONFIG_SETTINGS_NVS=y
    CONFIG_NVS=y
    CONFIG_SETTINGS_RUNTIME=y
    CONFIG_BASE64=y

    # CONFIG_STACK_CANARIES=y
    # CONFIG_USERSPACE=y
    CONFIG_DEBUG_THREAD_INFO=y

    # Heap and stacks
    CONFIG_MAIN_STACK_SIZE=8192
    CONFIG_HEAP_MEM_POOL_SIZE=16384
    #CONFIG_HEAP_MEM_POOL_SIZE=8192
    CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=2048

    CONFIG_REBOOT=y
    CONFIG_HWINFO=y
    CONFIG_PM_DEVICE=y

    CONFIG_WATCHDOG=y
    CONFIG_WDT_DISABLE_AT_BOOT=n

    # NCS 3.2.1/2 fails in sendto with assert, if y
    CONFIG_SPIN_VALIDATE=n

    # Logging
    CONFIG_LOG=y
    CONFIG_LOG_MODE_DEFERRED=y
    #CONFIG_LOG_MODE_IMMEDIATE=y

    CONFIG_TFM_LOG_LEVEL_SILENCE=n
    CONFIG_TFM_SPM_LOG_LEVEL_DEBUG=y
    CONFIG_TFM_EXCEPTION_INFO_DUMP=y
    CONFIG_TFM_SECURE_UART_SHARE_INSTANCE=y
    CONFIG_TFM_SECURE_UART0=y

    # Use UART in async mode (DMA)
    CONFIG_UART_ASYNC_API=y
    CONFIG_UART_0_ASYNC=y
    CONFIG_UART_0_INTERRUPT_DRIVEN=n
    CONFIG_UART_0_NRF_HW_ASYNC=y
    CONFIG_UART_0_NRF_HW_ASYNC_TIMER=2
    CONFIG_NRFX_TIMER=y

    # Network
    CONFIG_NETWORKING=y
    CONFIG_NET_NATIVE=n
    CONFIG_NET_IPV6=n
    CONFIG_NET_IPV4=y
    CONFIG_NET_SOCKETS=y
    CONFIG_NET_SOCKETS_OFFLOAD=y
    CONFIG_POSIX_API=y
    CONFIG_XSI_SINGLE_PROCESS=y

    # LTE link control
    CONFIG_LTE_LINK_CONTROL=y
    CONFIG_MODEM_INFO=n

    # Uncomment for tracing or 
    # enable it via cli "-DCONFIG_NRF_MODEM_LIB_TRACE=y"
    # CONFIG_NRF_MODEM_LIB_TRACE=y

    # Modem library
    CONFIG_NRF_MODEM_LIB=y
    CONFIG_NRF_MODEM_LIB_ON_FAULT_APPLICATION_SPECIFIC=y

    # since NCS 2.9.0 (? after NCS 2.6.2)
    CONFIG_LTE_LC_PSM_MODULE=y
    CONFIG_LTE_LC_EDRX_MODULE=y
    CONFIG_LTE_LC_TAU_PRE_WARNING_MODULE=y
    CONFIG_LTE_LC_NEIGHBOR_CELL_MEAS_MODULE=y
    CONFIG_LTE_LC_MODEM_SLEEP_MODULE=y
    CONFIG_LTE_LC_RAI_MODULE=y
    CONFIG_LTE_LC_PDN_MODULE=y

    CONFIG_LTE_LC_PDN_ESM_STRERROR=y

    # LTE parameters
    ## Network Mode / LTE category
    #CONFIG_LTE_NETWORK_MODE_NBIOT=y
    #CONFIG_LTE_NETWORK_MODE_LTE_M=y
    #CONFIG_LTE_NETWORK_MODE_LTE_M_NBIOT=y
    CONFIG_LTE_NETWORK_MODE_LTE_M_NBIOT_GPS=y
    CONFIG_LTE_MODE_PREFERENCE_AUTO=y
    #CONFIG_LTE_MODE_PREFERENCE_LTE_M=y
    #CONFIG_LTE_MODE_PREFERENCE_NBIOT=y
    #CONFIG_LTE_MODE_PREFERENCE_LTE_M_PLMN_PRIO=y
    #CONFIG_LTE_MODE_PREFERENCE_NBIOT_PLMN_PRIO=y
    CONFIG_LTE_LC_MODEM_SLEEP_NOTIFICATIONS=y
    CONFIG_LTE_LINK_CONTROL_LOG_LEVEL_INF=y
    # indicates all sleeps longer that 30s
    CONFIG_LTE_LC_MODEM_SLEEP_NOTIFICATIONS_THRESHOLD_MS=30000
    CONFIG_AT_MONITOR=y
    #CONFIG_LTE_POWER_ON_OFF_CONFIG_SWITCH=y

    ## PSM
    CONFIG_UDP_PSM_ENABLE=y
    # 001xxxxx : hours
    # CONFIG_LTE_PSM_REQ_RPTAU="00100100"
    # CONFIG_LTE_PSM_REQ_RPTAU="00100001"
    # CONFIG_LTE_PSM_REQ_RPTAU="00101000"
    # 24h
    CONFIG_LTE_PSM_REQ_RPTAU="00111000"
    # 101xxxxx : minutes
    # CONFIG_LTE_PSM_REQ_RPTAU="10100100"
    # 000xxxxx : 10 minutes
    # CONFIG_LTE_PSM_REQ_RPTAU="00000001"

    # CONFIG_LTE_PSM_REQ_RAT is overwritten in modem.c 
    CONFIG_LTE_PSM_REQ_RAT="00000000"

    ## RAI
    CONFIG_AS_RAI_ON=y
    # CONFIG_CP_RAI_ON=y

    ## eDRX
    CONFIG_UDP_EDRX_ENABLE=n
    #CONFIG_LTE_EDRX_REQ=n
    # 655,36 seconds
    #CONFIG_LTE_EDRX_REQ_VALUE_LTE_M="1011"
    #CONFIG_LTE_EDRX_REQ_VALUE_NBIOT="1011"
    #CONFIG_LTE_PTW_VALUE_LTE_M="1111"
    #CONFIG_LTE_PTW_VALUE_NBIOT="1001"
    # 163.84 seconds
    #CONFIG_LTE_EDRX_REQ_VALUE_LTE_M="1001"
    #CONFIG_LTE_EDRX_REQ_VALUE_NBIOT="1001"
    # 81.92 seconds
    #CONFIG_LTE_EDRX_REQ_VALUE_LTE_M="0101"
    #CONFIG_LTE_EDRX_REQ_VALUE_NBIOT="0101"
    # 40.96 seconds
    #CONFIG_LTE_EDRX_REQ_VALUE_LTE_M="0011"
    #CONFIG_LTE_EDRX_REQ_VALUE_NBIOT="0011"
    # 20.48 seconds
    CONFIG_LTE_EDRX_REQ_VALUE_LTE_M="0010"
    CONFIG_LTE_EDRX_REQ_VALUE_NBIOT="0010"

    CONFIG_LTE_LOCK_BANDS=n

    # CONFIG_LTE_LOCK_PLMN=y
    # Germany:
    # - Telekom: 26201
    # - Vodafone: 26202
    # - Telefonica: 26203
    # CONFIG_LTE_LOCK_PLMN_STRING="26202"

    #CONFIG_LTE_FEATURE_PLMN_SELECT_OPTIMIZATION=n
    #CONFIG_LTE_FEATURE_HPPLMN_SKIP=n

    ## PDN
    CONFIG_AT_MONITOR_HEAP_SIZE=4096
    CONFIG_PDN_DEFAULT_FAM_IPV4=y
    CONFIG_PDN_DEFAULTS_OVERRIDE=y

    ## SMS
    CONFIG_SMS=y
    CONFIG_SMS_SUBSCRIBERS_MAX_CNT=2
    CONFIG_SMS_LOG_LEVEL_DBG=n

    ## CoAP
    CONFIG_COAP=y

    # Logging
    CONFIG_COAP_CLIENT_LOG_LEVEL_INF=y
    CONFIG_UI_LOG_LEVEL_INF=y
    CONFIG_STORAGE_LOG_LEVEL_INF=y

    # TinyDtls
    CONFIG_LIBTINYDTLS=y
    CONFIG_LIBTINYDTLS_PSK=y
    CONFIG_LIBTINYDTLS_ECDHE_ECDSA=y
    CONFIG_TINYDTLS_LOG_LEVEL_INF=y

    # External SHT21 temperature sensor
    # CONFIG_SHT21=y
    # CONFIG_I2C_LOG_LEVEL_OFF=y

    and

    -Dcoaps-client_SNIPPET="nrf91-modem-trace-uart"

    appended calling west build

  • boards/nrf9160dk_nrf9160_ns.conf

    #
    # Copyright (c) 2022 Achim Kraus CloudCoap.net
    #
    # See the NOTICE file(s) distributed with this work for additional
    # information regarding copyright ownership.
    #
    # This program and the accompanying materials are made available under the
    # terms of the Eclipse Public License 2.0 which is available at
    # www.eclipse.org/.../epl-2.0
    #
    # SPDX-License-Identifier: EPL-2.0
    #

    # CAF - Common Application Framework
    CONFIG_GPIO=y
    CONFIG_LED=y
    CONFIG_LED_GPIO=y

    # Enable SPI flash
    # CONFIG_SPI_NOR_FLASH_LAYOUT_PAGE_SIZE=4096
    # CONFIG_SPI_NOR_SFDP_RUNTIME=y

  • I was looking for the configuration file generated for the build (located in build/<application name>/zephyr) to see all the symbols that end up getting selected for your app. Could you please upload that instead?

Reply Children
  • build/..../.config (removed some defines of credentials ..., NCS 3.2.2)

    z.config.zip

  • In your build folder under the following path: build/<application name>/zephyr/.config

  • I already found it. I just zipped it and added a z infront, because uploading ".config" was refused.

  • Thanks. Did not find any potential problematic settings in the config. Are you by any chance calling irq_lock()/arch_irq_lock()/__set_BASEPRI() in your code? Please also try profiling the stack usage in your threads by adding the following to your project configuration:

    CONFIG_THREAD_ANALYZER=y                                                                           
    CONFIG_THREAD_ANALYZER_AUTO=y                                                                      
    CONFIG_THREAD_ANALYZER_AUTO_INTERVAL=5                                                             
    CONFIG_THREAD_NAME=y

    There should be at least 128 bytes of headroom on each stack. You may also enable CONFIG_MPU_STACK_GUARD=y in addition to builtin stack guard.

  • I've added the configs above. Without 

    CONFIG_SPIN_VALIDATE=n

    I only see on analyzer output and it crashes 2s after that.

    With CONFIG_SPIN_VALIDATE=n it runs and the below output is the one after a 30s and a couple of send messages:

    Thread analyze:
     trace_thread        : STACK: unused 528 usage 240 / 768 (31 %); CPU: 1 %
                         : Total CPU cycles used: 12068
     thread_analyzer     : STACK: unused 480 usage 544 / 1024 (53 %); CPU: 0 %
                         : Total CPU cycles used: 921
     shutdown            : STACK: unused 1856 usage 192 / 2048 (9 %); CPU: 0 %
                         : Total CPU cycles used: 0
     cmd_workq           : STACK: unused 640 usage 1408 / 2048 (68 %); CPU: 0 %
                         : Total CPU cycles used: 190
     io_workq            : STACK: unused 924 usage 1124 / 2048 (54 %); CPU: 0 %
                         : Total CPU cycles used: 74
     uart_workq          : STACK: unused 968 usage 184 / 1152 (15 %); CPU: 0 %
                         : Total CPU cycles used: 5
     sh_cmd_workq        : STACK: unused 1864 usage 184 / 2048 (8 %); CPU: 0 %
                         : Total CPU cycles used: 1
     lte_lc_work_q       : STACK: unused 840 usage 184 / 1024 (17 %); CPU: 0 %
                         : Total CPU cycles used: 0
     sysworkq            : STACK: unused 1112 usage 936 / 2048 (45 %); CPU: 0 %
                         : Total CPU cycles used: 134
     logging             : STACK: unused 1632 usage 416 / 2048 (20 %); CPU: 0 %
                         : Total CPU cycles used: 3886
     idle                : STACK: unused 336 usage 48 / 384 (12 %); CPU: 96 %
                         : Total CPU cycles used: 1102218
     main                : STACK: unused 5080 usage 3112 / 8192 (37 %); CPU: 1 %
                         : Total CPU cycles used: 21338
     ISR0                : STACK: unused 1592 usage 456 / 2048 (22 %)

Related