Precise data bus error in psa_generate_random (mutex.c
rf_cc3xx_platform_mutex_zephyr.c)

Question

Hello,

ncs 2.6.0

I'm encountering a weird bus data error when I'm calling psa_generate_random inside a work queue thread (it has name "api"):

00> [00:01:54.764,038] <err> os: ***** BUS FAULT *****
00> [00:01:54.764,068] <err> os:   Precise data bus error
00> [00:01:54.764,068] <err> os:   BFAR Address: 0x40f04fc3
00> [00:01:54.764,099] <err> os: r0/a1:  0x40f04fb5  r1/a2:  0xffffffff  r2/a3:  0x20003678
00> [00:01:54.764,099] <err> os: r3/a4:  0x00000000 r12/ip:  0xffffffff r14/lr:  0x00014281
00> [00:01:54.764,099] <err> os:  xpsr:  0xa1000000
00> [00:01:54.764,129] <err> os: Faulting instruction address (r15/pc): 0x0002b4aa
00> [00:01:54.764,160] <err> os: >>> ZEPHYR FATAL ERROR 25: Unknown error on CPU 0
00> [00:01:54.764,190] <err> os: Current thread: 0x20003678 (api)

The code I'm running inside work queue is the following:

    uint8_t test[16];
    psa_status_t status;

    status = psa_generate_random(test, ARRAY_SIZE(test));
    if (status != PSA_SUCCESS) {
        LOG_ERR("gen rand err: %d", status);
        return;
    }

    LOG_HEXDUMP_INF(test, ARRAY_SIZE(test), "test data");

Faulting instruction address points to: workspace/zephyr/kernel/mutex.c:147 , which contains the following code (please see my comment in the code):

	SYS_PORT_TRACING_OBJ_FUNC_BLOCKING(k_mutex, lock, mutex, timeout);

	new_prio = new_prio_for_inheritance(_current->base.prio,
					    mutex->owner->base.prio); // THIS IS LINE #147

	LOG_DBG("adjusting prio up on mutex %p", mutex);

r14/lr points to: workspace/nrfxlib/crypto/nrf_cc310_platform/src/nrf_cc3xx_platform_mutex_zephyr.c:306, which contains the following code (please see my comment again):

        ret = k_mutex_lock(p_mutex, K_FOREVER);
        if (ret == 0) { // !!!! This is line #306 !!!!
            return NRF_CC3XX_PLATFORM_SUCCESS;
        } else {
            return NRF_CC3XX_PLATFORM_ERROR_MUTEX_FAILED;
        }

If I comment psa_generate_random - everything works just fine.

The fun part of this bug - it appears only when the data published to the work queue is from USB CDC ACM. When the same data comes from BLE, the psa_generate_random (in the same place) works just fine. So I might assume it could be somehow related to USB.

Please help to resolve this issue.

Kind regards,

Oleh Hordiichuk

Oleh Hordiichuk · Answer

After additional investigations we found out that this was stack overflow issue in USB thread. So it's not related to psa_generate_random. However, I have few additional questions in that regard: 
 
 1. What's so special about psa_generate_random that the kernel has detected Bus fault, but it was unable to detect it many other places of our\zephyr
ordic code, which was called by the same thread multiple times. Why only this function was capable to raise the error? 
 2. It would be great if Nordic could create a guide on how to debug those kind of bugs, since they're very tricky and time consuming. Are there any best practices? 
 
 For those who will encounter with similar bugs, the following config options may help you with debugging: 
 
 CONFIG_THREAD_ANALYZER=y
CONFIG_THREAD_STACK_INFO=y
CONFIG_THREAD_MONITOR=y
CONFIG_THREAD_ANALYZER_ISR_STACK_USAGE=y
CONFIG_THREAD_ANALYZER_AUTO_INTERVAL=5
CONFIG_THREAD_ANALYZER_AUTO=y
CONFIG_THREAD_ANALYZER_USE_LOG=y
CONFIG_STACK_CANARIES=y

Precise data bus error in psa_generate_random (mutex.c\nrf_cc3xx_platform_mutex_zephyr.c)