Crash on connect with NCS 2.6.0

You can find all information about the issue and how to produce it on GitHub: github.com/.../ncs-2.6.0-connect-crash

  • Sry, that I missed your question about addr2line. There is no line information for that closed-source function, but it's from nrfxlib/mpsl/lib/cortex-m4/soft-float/libmpsl.a and this is the relevant objdump:

    0000bb5c <sym_DQONLUECJTIEYFOFJXXAPJO4POIAJKJNKBGVN5A>:
        bb5c:       ea81 0300       eor.w   r3, r1, r0
        bb60:       079b            lsls    r3, r3, #30
        bb62:       e92d 41f0       stmdb   sp!, {r4, r5, r6, r7, r8, lr}
        bb66:       d15d            bne.n   bc24 <sym_DQONLUECJTIEYFOFJXXAPJO4POIAJKJNKBGVN5A+0xc8>
        bb68:       0787            lsls    r7, r0, #30
        bb6a:       d05f            beq.n   bc2c <sym_DQONLUECJTIEYFOFJXXAPJO4POIAJKJNKBGVN5A+0xd0>
        bb6c:       2a00            cmp     r2, #0
        bb6e:       d057            beq.n   bc20 <sym_DQONLUECJTIEYFOFJXXAPJO4POIAJKJNKBGVN5A+0xc4>
        bb70:       4603            mov     r3, r0
        bb72:       e001            b.n     bb78 <sym_DQONLUECJTIEYFOFJXXAPJO4POIAJKJNKBGVN5A+0x1c>
        bb74:       2a00            cmp     r2, #0
        bb76:       d053            beq.n   bc20 <sym_DQONLUECJTIEYFOFJXXAPJO4POIAJKJNKBGVN5A+0xc4>

  • And some more questions I forgot to answer Upside down

    > Hmm, is there a reason you're using this Zephyr advertising API on your end? I don't see it being a great fit with the SoftDevice controller.

    I see it the other way around: We don't need and don't want to use extended advertising due to incompatibility with certain centrals. There is no reason for us to enable or use that API and it might save flash size(which not an important reason though).

    > It could be MTU related it seems, so could it be that the MTU sizes are invalid when disabling the extended advertising config and trying to enable the UART service?

    I don't know, but isn't that handled by zephyr internally? The only thing we can do is to set some limits using kconfigs and accept or reject a centrals change request using one of the connection callbacks.

  • Hi

    Okay, the BUS error seems to come from the SoftDevice handler, and after some discussing with colleagues, we are back at the UART likely being the blocker here and it stops the SoftDevice controller from handling the MTU exchange and callbacks correctly. I reproduced it here as well, but have not had time to test this out yet on my end. Try moving uart_rx_enable() into a workqueue as is done in the peripheral_uart sample, with the following:

    k_work_init_delayable(&uart_work, uart_work_handler);
    
    static void uart_work_handler(struct k_work *item)
    {
        struct uart_data_t *buf;
     
        buf = k_malloc(sizeof(*buf));
        if (buf) {
            buf->len = 0;
        } else {
            LOG_WRN("Not able to allocate UART receive buffer");
            k_work_reschedule(&uart_work, UART_WAIT_FOR_BUF_DELAY);
            return;
        }
     
        uart_rx_enable(uart, buf->data, sizeof(buf->data), UART_WAIT_FOR_RX);
    }

    Best regards,

    Simon

  • first: We don't need a workaround, because we are willing to wait a little bit for a proper fix. We simply don't want to be stuck with 2.5.1 forever(where the bug doesn't exist).

    I implemented your suggested change and pushed it to GitHub. It did not make a difference:

    [00:00:00.012,939] <inf> main: BT ready
    [00:00:00.037,963] <dbg> bt_hci_core: bt_recv: buf 0x200187ec len 68
    [00:00:00.037,994] <dbg> bt_hci_core: rx_work_handler: Getting net_buf from queue
    [00:00:00.038,024] <dbg> bt_hci_core: rx_work_handler: buf 0x200187ec type 1 len 68
    [00:00:00.038,024] <dbg> bt_hci_core: hci_event: event 0x3e
    [00:00:00.038,055] <dbg> bt_hci_core: hci_le_meta_event: subevent 0x08
    [00:00:00.038,085] <dbg> bt_ecc: bt_hci_evt_le_pkey_complete: status: 0x00
    [00:00:01.000,488] <inf> main: UART initialized
    [00:00:05.054,046] <dbg> bt_hci_core: bt_recv: buf 0x200187ec len 33
    [00:00:05.055,511] <err> os: ***** BUS FAULT *****
    [00:00:05.055,511] <err> os:   Imprecise data bus error
    [00:00:05.055,541] <err> os: r0/a1:  0x0014043e  r1/a2:  0x200125bf  r2/a3:  0x00000006
    [00:00:05.055,572] <err> os: r3/a4:  0x0014043f r12/ip:  0x00000003 r14/lr:  0x00004e27
    [00:00:05.055,572] <err> os:  xpsr:  0xa1000000
    [00:00:05.055,572] <err> os: Faulting instruction address (r15/pc): 0x0000bb74
    [00:00:05.055,633] <err> os: >>> ZEPHYR FATAL ERROR 26: Unknown error on CPU 0
    [00:00:05.055,664] <err> os: Current thread: 0x20012958 (MPSL Work)

  • Hello,

    Simon is out of office, and asked if I could look into your ticket. 

    So the MPSL issue is because of an error in the UART handler. Your UART handler was empty, so you probably didn't notice it being called, but it was being called with event->type = UART_RX_BUF_REQUEST. As you can see from uart.h:

    	/**
    	 * @brief Driver requests next buffer for continuous reception.
    	 *
    	 * This event is triggered when receiving has started for a new buffer,
    	 * i.e. it's time to provide a next buffer for a seamless switchover to
    	 * it. For continuous reliable receiving, user should provide another RX
    	 * buffer in response to this event, using uart_rx_buf_rsp function
    	 *
    	 * If uart_rx_buf_rsp is not called before current buffer
    	 * is filled up, receiving will stop.
    	 */
    	UART_RX_BUF_REQUEST,

    So you need to provide a buffer via uart_rx_buf_rsp(). 

    The reason you need to provide another buffer is that the uart is double buffered. 

    Try something like this:

    uint8_t my_uart_buffer[100];
    
    static void uart_callback(const struct device *const uart_dev, struct uart_event *const event,
    			  void *const user_data)
    {
        
        switch (event->type) {
            case UART_RX_BUF_REQUEST:
                LOG_INF("uart cb type UART_RX_BUF_REQUEST");
                uart_rx_buf_rsp(uart_dev, my_uart_buffer, sizeof(my_uart_buffer));
                break;
            default:
                LOG_INF("uart cb type %d", event->type);
                break;
        }
    
    }
    
    static void work_handler(struct k_work *work)
    {
    	ARG_UNUSED(work);
    
    	int ret;
    
    	ret = uart_callback_set(uart_dev, uart_callback, NULL);
    	if (ret) {
    		LOG_ERR("Failed to set UART driver callback: %d", ret);
    		return;
    	}
    
    	//static uint8_t memory[100];
        ret = uart_rx_enable(uart_dev, my_uart_buffer, sizeof(my_uart_buffer), SYS_FOREVER_US);
    	if (ret) {
    		LOG_ERR("Failed to enable UART RX: %d", ret);
    		return;
    	}
    
    	LOG_INF("UART initialized");
    }

    That worked in my case. 

    You may notice that I am just using the same buffer again, so this means that it will use the same buffer when the buffer is full. So if you receive data too fast, it may start overwriting the start of the buffer before you get to read it out. To work around this, either see how it is done in the peripheral_uart sample, or use a double buffer, where you change what buffer is being provided in the UART_RX_BUF_REQUEST event. 

    Best regards,

    Edvin

Related