Peripheral crashes when Central powers off

Hi,

My fw running on nRF52840 (ncs2.4.0, zephyr) develops from HRS sample application. It acts as a peripheral.

In this project, I have a dedicated thread to receiving data from a queue, then notify to the connected Central

void func_thread()
{
    while (k_msgq_get(&uart_msgq, &msg, k_is_in_isr() ? K_NO_WAIT : K_FOREVER) == 0)
    {
        if (is_streaming)
    	{
    		notify_user_data(msg.buf, msg.len);
    	}
        ...
    }
}

`notify_user_data()` goes through some steps and arrives in this function with inputs of an `attr`, `buf` and `len`

static void notify_data(struct bt_gatt_attr *attr, uint8_t *buf, uint8_t len)
{
    bool is_subscribed = bt_gatt_is_subscribed(current_conn, attr, BT_GATT_CCC_NOTIFY);

    if (current_conn == NULL || !is_subscribed)
    {
        LOG_ERR("failed: %d %d", current_conn != NULL, is_subscribed);
        return;
    }

    int err = bt_gatt_notify(NULL, attr, buf, len);
    if (err == 0)
    {
        LOG_INF("succeeded: %d", len);

        k_timer_stop(&streaming_started_timer);
    }
    else
    {
        print_attr_uuid(attr);
        LOG_ERR("err: %d", err);
    }
}

static void on_disconnected(struct bt_conn *conn, uint8_t reason)
{
    // C:\ncs\v2.4.0\zephyr\include\zephyr\bluetooth\hci_err.h
    LOG_INF("Disconnected (reason 0x%02x)\n", reason);

    bt_conn_unref(current_conn);
    current_conn = NULL;
    current_conn_handle = 0xffff;

    ...
}

Here is the problem: sometimes my fw crashes, sometimes it does not when I power off the connected Central: ESP32 or an iOS device

When it crashes with ESP32:

notify_ecg: ecg: 254 125
notify_data: succeeded 125 bytes
notify_data: succeeded 125 bytes
notify_data: succeeded 10 bytes
notify_ecg: ecg: 254 125
chr_ccc_dat_ecg_changed: DAT1 Notification disabled
chr_ccc_dat_disease_changed: DAT2 Notification disabled
on_disconnected: Disconnected (reason 0x08)

ecg_event: event: 2
<wrn> bt_att: att_get: Not connected
print_attr_uuid: attr uuid: dd02
notify_data: err: -128
<err> os: bus_fault: ***** BUS FAULT *****
<err> os: bus_fault:   Imprecise data bus error
<err> os: esf_dump: r0/a1:  0x2001e1e2  r1/a2:  0x20030a2b  r2/a3:  0x2000ec0a
<err> os: esf_dump: r3/a4:  0x20040000 r12/ip:  0x00000000 r14/lr:  0x00026bd9
<err> os: esf_dump:  xpsr:  0x21000000
<err> os: esf_dump: s[ 0]:  0x00000000  s[ 1]:  0x00000000  s[ 2]:  0x00000000  s[ 3]:  0x00000000
<err> os: esf_dump: s[ 4]:  0x00000000  s[ 5]:  0x00000000  s[ 6]:  0x00000000  s[ 7]:  0x00000000
<err> os: esf_dump: s[ 8]:  0x00000000  s[ 9]:  0x00000000  s[10]:  0x00000000  s[11]:  0x00000000
<err> os: esf_dump: s[12]:  0x00000000  s[13]:  0x00000000  s[14]:  0x00000000  s[15]:  0x00000000
<err> os: esf_dump: fpscr:  0xaaaaaaaa
<err> os: esf_dump: Faulting instruction address (r15/pc): 0x00057480
<err> os: z_fatal_error: >>> ZEPHYR FATAL ERROR 26: Unknown error on CPU 0
<err> os: z_fatal_error: Current thread: 0x200028e8 (uart_pro_task)
<err> fatal_error: k_sys_fatal_error_handler: Resetting system

When it crashes with an iOS device:

notify_data: 244
notify_data: 14
chr_ccc_dat_ecg_changed: DAT1 Notification disabled
chr_ccc_dat_disease_changed: DAT2 Notification disabled
on_disconnected: Disconnected (reason 0x08)

ecg_event: event: 2
<wrn> bt_att: att_get: Not connected
print_attr_uuid: attr uuid: dd02
notify_data: err: -128
ASSERTION FAIL [conn]
        invalid parameter

[00:00:25.506,256] <err> os: esf_dump: r0/a1:  0x00000004  r1/a2:  0x00000000  r2/a3:  0x00000009
[00:00:25.506,286] <err> os: esf_dump: r3/a4:  0x20002cc8 r12/ip:  0x0000000c r14/lr:  0x00037823
[00:00:25.506,317] <err> os: esf_dump:  xpsr:  0x41000000
[00:00:25.506,347] <err> os: esf_dump: s[ 0]:  0x00000000  s[ 1]:  0x00000000  s[ 2]:  0x00000000  s[ 3]:  0x00000000
[00:00:25.506,347] <err> os: esf_dump: s[ 4]:  0x00000000  s[ 5]:  0x00000000  s[ 6]:  0x00000000  s[ 7]:  0x00000000
[00:00:25.506,378] <err> os: esf_dump: s[ 8]:  0x00000000  s[ 9]:  0x00000000  s[10]:  0x00000000  s[11]:  0x00000000
[00:00:25.506,408] <err> os: esf_dump: s[12]:  0x00000000  s[13]:  0x00000000  s[14]:  0x00000000  s[15]:  0x00000000
[00:00:25.506,408] <err> os: esf_dump: fpscr:  0x2001e1d0
[00:00:25.506,439] <err> os: esf_dump: Faulting instruction address (r15/pc): 0x00050670
[00:00:25.506,469] <err> os: z_fatal_error: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
[00:00:25.506,530] <err> os: z_fatal_error: Current thread: 0x200028e8 (uart_pro_task)
[00:00:25.750,122] <err> fatal_error: k_sys_fatal_error_handler: Resetting system

When it does not crash:

[00:00:45.407,226] <inf> chr_ccc_dat_ecg_changed: DAT ECG Notification disabled
[00:00:45.407,287] <inf> chr_ccc_dat_disease_changed: DAT Disease Notification disabled
[00:00:45.407,318] <inf> on_disconnected: Disconnected (reason 0x08)

[00:00:45.407,348] <inf> ecg_event: event: 2
[00:00:45.407,501] <wrn> bt_att: att_get: Not connected
[00:00:45.408,233] <inf> print_attr_uuid: attr uuid: dd02
[00:00:49.252,777] <wrn> uart: uart_process: no nand written block 0 page 5

The pattern of this crash: before it crashes, disconnected reason 0x08 is reported via callback `on_disconnected()`, followed by lots of `esf_dump` message. When it crashes with iOS device, I also see `ASSERTION FAIL [conn] invalid parameter`

How should I read the crash message more clearly, instead of `esf_dump` messages? Can you guess any causes of this issue?

Best regards

Quan

Parents
  • Hello,

    It looks like the assert in the "ios" case is raised here because the connection object pointer is NULL: https://github.com/nrfconnect/sdk-zephyr/blob/52292768176a4372eaeb682d191dccd5c6064b66/subsys/bluetooth/host/gatt.c#L2090, which maybe makes sense given the connection has been terminated. However, I'm not sure what could be causing the busfault. 

    I recommend you build the project with CONFIG_RESET_ON_FATAL_ERROR=n, then try to debug your project in VS code. After the fault exception has been raised, you can click on the last stack frame for the 'uart_pro_task' thread to see where the program was running prior to the crash.

    Best regards,

    Vidar  

  • Hi Vidar, 

    Thanks for the link in your last reply, which hints me about the bug in `func_thread()`

    When the connected central disconnects, BLE stack generates an IRQ, which calls to my on_disconnected() callback that was set up in the other module of my project?

    In this link: https://docs.zephyrproject.org/latest/kernel/services/data_passing/message_queues.html

    However, this can increase interrupt latency as interrupts are locked while a data item is written or read.

    While the queue is read by `k_msgq_get()`, the Interrupt comes in, thus, the thread crashes. 

    What do you think about this crash flow?

    By the way, I summarize my code below:

    typedef struct esp_msg
    {
    	short len;
    	uint8_t buf[512];
    } esp_msg_t;
    
    void func_thread1()
    {
        // stack size: 4096
        // prio: 1
        
        while(1)
        {
            esp_msg_t msg;
    		msg.len = 125 * 2 + 4;
    		memcpy(msg.buf, ecg_send_pkt, msg.len);
    		k_msgq_put(&uart_msgq, &msg, K_NO_WAIT);
    		
    		...
    		// sleep
        }
    }
    
    void func_thread2()
    {
        // stack size: 16024, increased from 4096
        // prio:  2
        
        esp_msg_t msg;
    
        while (1)
        {
            int status = k_msgq_get(&uart_msgq, &msg, K_MSEC(500));
    		if (status != 0)
    		{
    			// k_yield(); no need due to K_MSEC(500)
    			continue;
    		}
    		
    		notify_user_data(msg.buf, msg.len);
        }
    }

    In `func_thread2()` I changed from `k_msgq_get(&uart_msgq, &msg, k_is_in_isr() ? K_NO_WAIT : K_FOREVER) == 0` to `

    k_msgq_get(&uart_msgq, &msg, K_MSEC(500))` because I don't think it's needed. However, it still crashes.
    Somehow, k_msgq_get() blocks the system when ISR comes in. That's why it crashes?
    How should I solve this?
    Regards
Reply
  • Hi Vidar, 

    Thanks for the link in your last reply, which hints me about the bug in `func_thread()`

    When the connected central disconnects, BLE stack generates an IRQ, which calls to my on_disconnected() callback that was set up in the other module of my project?

    In this link: https://docs.zephyrproject.org/latest/kernel/services/data_passing/message_queues.html

    However, this can increase interrupt latency as interrupts are locked while a data item is written or read.

    While the queue is read by `k_msgq_get()`, the Interrupt comes in, thus, the thread crashes. 

    What do you think about this crash flow?

    By the way, I summarize my code below:

    typedef struct esp_msg
    {
    	short len;
    	uint8_t buf[512];
    } esp_msg_t;
    
    void func_thread1()
    {
        // stack size: 4096
        // prio: 1
        
        while(1)
        {
            esp_msg_t msg;
    		msg.len = 125 * 2 + 4;
    		memcpy(msg.buf, ecg_send_pkt, msg.len);
    		k_msgq_put(&uart_msgq, &msg, K_NO_WAIT);
    		
    		...
    		// sleep
        }
    }
    
    void func_thread2()
    {
        // stack size: 16024, increased from 4096
        // prio:  2
        
        esp_msg_t msg;
    
        while (1)
        {
            int status = k_msgq_get(&uart_msgq, &msg, K_MSEC(500));
    		if (status != 0)
    		{
    			// k_yield(); no need due to K_MSEC(500)
    			continue;
    		}
    		
    		notify_user_data(msg.buf, msg.len);
        }
    }

    In `func_thread2()` I changed from `k_msgq_get(&uart_msgq, &msg, k_is_in_isr() ? K_NO_WAIT : K_FOREVER) == 0` to `

    k_msgq_get(&uart_msgq, &msg, K_MSEC(500))` because I don't think it's needed. However, it still crashes.
    Somehow, k_msgq_get() blocks the system when ISR comes in. That's why it crashes?
    How should I solve this?
    Regards
Children
Related