BLE scanning base on lte_ble_gateway and app crash

Hello, I made modifications based on the nrf9160 lte_ble_gateway demo. I modified the BLE.c file and changed the BLE connection to BLE scanning.
The scanning algorithm is as follows: idle for 1000ms, then scan for 90ms, and repeat the cycle. To ensure that the BLE scan time is not delayed for a long time, I use the K_TIMER timer. I submit work in the K_TIMER callback, and then start and stop Bluetooth scanning in the work.

I found that after scanning for a period of time, about 10-20 minutes, the application may crash.
In addition, there are many Bluetooth devices in my surrounding environment. Approximately 20 broadcast messages can be scanned within 90ms.

My own product has encountered this issue. I also reproduced this issue on the nrf9160dk_nrf9160(PCA10090 1.0.0) development board, with NCS version 2.2. The LTE modem version is 1.3.6.

Modify on lte_ble_gateway:

1. Modify the ble.c file, please see the attachment.

2. Add define in prj.conf. (CONFIG_BT_BUF_ACL_RX_SIZE=251)

3. Change the uart speed from 1000000 to 115200. (current-speed = <115200> in nrf9160dk_nrf9160_ns.overlay)

 52840 is based on hci_lpuart and only modifies the UART rate to 115200.

I tried to increase CONFIG_BT_BUF_ACL_RX_SIZE, but still encountered a crash. I am trying to reduce the number of BLE devices in the surrounding area, which will lower the probability of this problem occurring.  I speculate that it may be due to receiving a large number of Bluetooth broadcasts that the internal cache of the HCI driver is insufficient?

I hope you can help. thank you!

------------------------------

The crush log on nrf9160dk was:
[00:13:32.103,240] [1;31m<err> bt_driver: TX interrupt but no pending buffer![0m
[22:32:38.840]收←◆ASSERTION FAIL [err == 0] @ WEST_TOPDIR/zephyr/subsys/bluetooth/host/hci_core.c:335
k_sem_take failed with err -11
[00:13:42.115,631] [1;31m<err> os: r0/a1: 0x00000003 r1/a2: 0x00000000 r2/a3: 0x00000000[0m
[00:13:42.115,661] [1;31m<err> os: r3/a4: 0x00018b61 r12/ip: 0x00000000 r14/lr: 0x000119d7[0m
[00:13:42.115,661] [1;31m<err> os: xpsr: 0x41000000[0m
[00:13:42.115,692] [1;31m<err> os: Faulting instruction address (r15/pc): 0x000119e2[0m
[00:13:42.115,722] [1;31m<err> os: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0[0m
[00:13:42.115,753] [1;31m<err> os: Current thread: 0x2000ef20 (unknown)[0m
[00:13:42.160,675] [1;31m<err> fatal_error: Resetting system[0m\0
[22:32:39.667]收←◆*** Booting Zephyr OS build v3.2.99-ncs1 ***

/*
 * Copyright (c) 2018 Nordic Semiconductor ASA
 *
 * SPDX-License-Identifier: LicenseRef-Nordic-5-Clause
 */

#include <zephyr/kernel.h>

#include <zephyr/bluetooth/bluetooth.h>
#include <zephyr/bluetooth/gatt.h>
#include <zephyr/bluetooth/uuid.h>
#include <bluetooth/gatt_dm.h>
#include <bluetooth/scan.h>

#include <dk_buttons_and_leds.h>
#include <zephyr/sys/byteorder.h>

#include <net/nrf_cloud.h>
#include <zephyr/logging/log.h>
#include "aggregator.h"

LOG_MODULE_DECLARE(lte_ble_gw);


static void connected(struct bt_conn *conn, uint8_t conn_err)
{
}

static struct bt_conn_cb conn_callbacks = {
	.connected = connected,
};


static int gap_scan_num = 0;
static void ble_scan_no_filter_match(struct bt_scan_device_info *device_info, bool connectable)
{
	gap_scan_num++;
}

static void scan_timer_handler(struct k_timer *dummy);

BT_SCAN_CB_INIT(scan_cb, NULL, ble_scan_no_filter_match, NULL, NULL);
K_TIMER_DEFINE(scan_timer_id, scan_timer_handler, NULL);
static struct k_work gap_scan_worker;
static void scan_work_callback(struct k_work *work);

bool gap_is_scanning = false;
#define SCAN_INTERVAL_MS 1000

static void scan_timer_handler(struct k_timer *dummy)
{
	if (gap_is_scanning)
    {
    	k_timer_start(&scan_timer_id, K_MSEC(SCAN_INTERVAL_MS), K_NO_WAIT);
    }
    else
    {
        k_timer_start(&scan_timer_id, K_MSEC(90), K_NO_WAIT);
    }
    
	k_work_submit(&gap_scan_worker);
}


static struct bt_le_scan_param gap_scan_param =
{
	.type = BT_LE_SCAN_TYPE_PASSIVE,
	.options = (BT_LE_SCAN_OPT_FILTER_DUPLICATE),
	.interval = 0x10,
	.window = 0x10,
};

static struct bt_scan_init_param gap_scan_init =
{
	.connect_if_match = 0,
	.scan_param = &gap_scan_param,
	.conn_param = BT_LE_CONN_PARAM_DEFAULT,
};


static void scan_start(void)
{
	bt_scan_init(&gap_scan_init);
	bt_scan_cb_register(&scan_cb);

	k_timer_start(&scan_timer_id, K_MSEC(5000), K_NO_WAIT);
	LOG_INF("Scanning...");
}

static void ble_ready(int err)
{
	LOG_INF("Bluetooth ready");
	bt_conn_cb_register(&conn_callbacks);
	scan_start();
}

static void scan_work_callback(struct k_work *work)
{    	
	static uint32_t gap_total_start = 0;

	gap_total_start++;
    if (gap_is_scanning)
    {   
        int err = bt_scan_stop();
        if (err)
	    {
	        LOG_ERR("Fail to stop scan, err %d", err);
	        return;
	    }
	    else
	    {
	    	LOG_INF("Stop scan success:%d, num:%d", gap_scan_num, gap_total_start);
	    }
	    gap_is_scanning = false;
    }
    else
    {	
		gap_scan_num = 0;
        //check if need refresh scannning cfg before start scanning
        int err = bt_scan_start(BT_SCAN_TYPE_SCAN_PASSIVE);
	    if (err)
	    {
	    	LOG_ERR("Fail to start scan, err %d", err);
	    	return;
	    }
	    else
	    {
	    	LOG_INF("start scan success");
	    }
	    gap_is_scanning = true;
    }
}

/*
void nrf_ble_hardware_reset(void)
{
}
*/

void ble_init(void)
{
	int err;

	LOG_INF("Initializing Bluetooth..");
	err = bt_enable(ble_ready);
	if (err) {
		LOG_ERR("Bluetooth init failed (err %d)", err);
		return;
	}
	
    k_work_init(&gap_scan_worker, scan_work_callback);
}
9160dk_crash_log.txt

  • Hi,

    If there was an error on the controller side (nRF52), I would have expected that you would see an error and typically a reset in the log. The assert that you see on the nRF91 (host) is related to a mtimeout on senidng a HCI command though, so somethig is causing a breakdown of communication between the hos tand controller. Perhasp it would make sense to increase the CONFIG_BT_BUF_CMD_TX_COUNT?

    If not, then I suggest enabling more logging on both sides to see if we can learn more about what is happening. Perhaps start with CONFIG_BT_HCI_CORE_LOG_LEVEL_DBG=y on the nRF91?

  • Hi, 

    I have added the following configuration on nrf91, increasing TX COUNT to 12. After a period of Bluetooth scanning, the app still crashed. Is there any other possible reason? thanks!

    //project configuration:
    CONFIG_LOG_BUFFER_SIZE=8192
    CONFIG_BT_BUF_CMD_TX_COUNT=12
    CONFIG_BT_DEBUG_LOG=y
    CONFIG_BT_DEBUG_HCI_CORE=y
    Disabled LTE calling in the test:
    //work_init();
    //cloud_init();
    //modem_configure();
    //cloud_connect(NULL);

    crash log

    ----------------------------------------------------------------

    [00:04:10.581,329] [0m<dbg> bt_hci_core: process_events: ev->state 0[0m
    [00:04:10.581,481] [0m<dbg> bt_hci_core: hci_tx_thread: Calling k_poll with 2 events[0m
    [00:04:11.263,366] [1;33m<wrn> lpuart: uart_callback: Uart events flipped.[0m
    ASSERTION FAIL [data->rx_state == RX_ACTIVE] @ WEST_TOPDIR/nrf/drivers/serial/uart_nrf_sw_lpuart.c:435
    [00:04:17.907,989] [1;31m<err> os: esf_dump: r0/a1: 0x00000004 r1/a2: 0x000001b3 r2/a3: 0x00000000[0m
    [00:04:17.907,989] [1;31m<err> os: esf_dump: r3/a4: 0x00018211 r12/ip: 0x00000000 r14/lr: 0x0001929b[0m
    [00:04:17.908,020] [1;31m<err> os: esf_dump: xpsr: 0x41000041[0m
    [00:04:17.908,050] [1;31m<err> os: esf_dump: Faulting instruction address (r15/pc): 0x0002c320[0m
    [00:04:17.908,081] [1;31m<err> os: z_fatal_error: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0[0m
    [00:04:17.908,081] [1;31m<err> os: z_fatal_error: Fault during interrupt handling
    [0m
    [00:04:17.908,142] [1;31m<err> os: z_fatal_error: Current thread: 0x200107e8 (unknown)[0m
    [00:04:17.966,949] [1;31m<err> fatal_error: k_sys_fatal_error_handler: Resetting system[0m
    *** Booting Zephyr OS build v3.2.99-ncs1 ***

    lte_ble_gateway_debug_log_0919.txt

  • Hi,

    I am sorry for the delay.

    The log here shows an assert that also indicate a problem on the controller side (nrf52). Have you been able to enable logging on that side? It may be simplest to enable debugging of the nRF and select RTT as logg backend in this case. The question is what happens on the nRF52 side and next, what we can do about it.

    PS: the nRF52840 on the nRF9160 DK mainly functions as a board controller, and the gateway sample was made as a demo for the Thingy:91 that can also be built on the DK. So it should be regarded more as a proof of concept than anything else.

  • Thank you for your suggestion. I have change the prj file of the hci_lpuart project like follow:
    # Segger RTT enable
    CONFIG_LOG=y
    CONFIG_USE_SEGGER_RTT=y
    CONFIG_RTT_CONSOLE=y
    CONFIG_LOG_BACKEND_RTT=y

    I have enabled UART log tracking for hci_lpuart, but I found that when there is an exception in the lte_ble_gateway, hci_lpuart only has one restart log. I think this is because the lte-ble_gateway reset it. Do I need to enable the debug log for hci_lpuart?

    the log of hci_lpuart 

    -----------------------------------

    [09:10:36.310] RX←◆[00:00:00.000,152] [0m<inf> board_control: vcom0_pins_routing is ENABLED[0m
    [00:00:00.000,213] [0m<inf> board_control: vcom2_pins_routing is ENABLED[0m
    [00:00:00.000,244] [0m<inf> board_control: led1_pin_routing is ENABLED[0m
    [00:00:00.000,305] [0m<inf> board_control: led2_pin_routing is ENABLED[0m
    [00:00:00.000,335] [0m<inf> board_control: led3_pin_routing is ENABLED[0m
    [00:00:00.000,366] [0m<inf> board_control: led4_pin_routing is ENABLED[0m
    [00:00:00.000,427] [0m<inf> board_control: switch1_pin_routing is ENABLED[0m
    [00:00:00.000,457] [0m<inf> board_control: switch2_pin_routing is ENABLED[0m
    [00:00:00.000,518] [0m<inf> board_control: button1_pin_routing is ENABLED[0m
    [00:00:00.000,549] [0m<inf> board_control: button2_pin_routing is ENABLED[0m
    [00:00:00.000,610] [0m<inf> board_control: nrf_interface_pins_0_2_routing is ENABLED[0m
    [00:00:00.000,640] [0m<inf> board_control: nrf_interface_pins_3_5_routing is ENABLED[0m
    [00:00:00.000,701] [0m<inf> board_control: nrf_interface_pins_6_8_routing is disabled[0m
    [00:00:00.000,732] [0m<inf> board_control: GPIO reset line enabled on pin P1.02, holding...[0m
    [00:00:00.000,762] [0m<inf> board_control: Board configured.[0m
    *** Booting Zephyr OS build v3.2.99-ncs1 ***
    [00:00:00.001,251] [0m<inf> sdc_hci_driver: SoftDevice Controller build revision:
    6d 90 41 2a 38 e8 ad 17 29 a5 03 38 39 27 d7 85 |m.A*8... )..89'..
    1f 85 d8 e1 |.... [0m
    [00:00:00.001,861] [0m<inf> bt_hci_raw: Bluetooth enabled in RAW mode[0m

    [09:26:50.049]RX←◆[00:00:00.000,152] [0m<inf> board_control: vcom0_pins_routing is ENABLED[0m
    [00:00:00.000,213] [0m<inf> board_control: vcom2_pins_routing is ENABLED[0m
    [00:00:00.000,274] [0m<inf> board_control: led1_pin_routing is ENABLED[0m
    [00:00:00.000,305] [0m<inf> board_control: led2_pin_routing is ENABLED[0m
    [00:00:00.000,335] [0m<inf> board_control: led3_pin_routing is ENABLED[0m
    [00:00:00.000,396] [0m<inf> board_control: led4_pin_routing is ENABLED[0m
    [00:00:00.000,427] [0m<inf> board_control: switch1_pin_routing is ENABLED[0m
    [00:00:00.000,488] [0m<inf> board_control: switch2_pin_routing is ENABLED[0m
    [00:00:00.000,518] [0m<inf> board_control: button1_pin_routing is ENABLED[0m
    [00:00:00.000,579] [0m<inf> board_control: button2_pin_routing is ENABLED[0m
    [00:00:00.000,610] [0m<inf> board_control: nrf_interface_pins_0_2_routing is ENABLED[0m
    [00:00:00.000,671] [0m<inf> board_control: nrf_interface_pins_3_5_routing is ENABLED[0m
    [00:00:00.000,701] [0m<inf> board_control: nrf_interface_pins_6_8_routing is disabled[0m
    [00:00:00.000,732] [0m<inf> board_control: GPIO reset line enabled on pin P1.02, holding...[0m
    [00:00:00.000,762] [0m<inf> board_control: Board configured.[0m
    *** Booting Zephyr OS build v3.2.99-ncs1 ***
    [00:00:00.001,251] [0m<inf> sdc_hci_driver: SoftDevice Controller build revision:
    6d 90 41 2a 38 e8 ad 17 29 a5 03 38 39 27 d7 85 |m.A*8... )..89'..
    1f 85 d8 e1 |.... [0m
    [00:00:00.001,861] [0m<inf> bt_hci_raw: Bluetooth enabled in RAW mode[0m

    the log of lte_ble_gateway

    -------------------------------------------------------------------


    [09:10:32.618]RX←◆*** Booting Zephyr OS build v3.2.99-ncs1 ***

    [09:10:33.623]RX←◆[00:00:00.543,884] [0m<inf> lte_ble_gw: LTE Sensor Gateway sample started[0m
    [00:00:00.544,097] [0m<inf> lte_ble_gw: Initializing Bluetooth..[0m

    [09:10:37.327]RX←◆[00:00:04.250,335] [1;33m<wrn> bt_hci_core: opcode 0x0000 pool id 3 pool 0x2000de4c != &hci_cmd_pool 0x2000deb4[0m
    [00:00:04.346,374] [0m<inf> lte_ble_gw: Bluetooth ready[0m
    [00:00:04.346,405] [0m<inf> lte_ble_gw: Scanning...[0m

    [09:10:42.437]RX←◆[00:00:09.359,863] [0m<inf> lte_ble_gw: start scan success[0m
    [00:00:09.449,188] [0m<inf> lte_ble_gw: Stop scan success:28, num:2[0m

    [09:10:43.526]RX←◆[00:00:10.449,920] [0m<inf> lte_ble_gw: start scan success[0m
    [00:00:10.540,069] [0m<inf> lte_ble_gw: Stop scan success:31, num:4[0m

    [09:10:44.616]RX←◆[00:00:11.539,886] [0m<inf> lte_ble_gw: start scan success[0m
    [00:00:11.637,542] [0m<inf> lte_ble_gw: Stop scan success:28, num:6[0m

    [09:10:45.706]RX←◆[00:00:12.629,943] [0m<inf> lte_ble_gw: start scan success[0m
    [00:00:12.713,134] [0m<inf> lte_ble_gw: Stop scan success:27, num:8[0m

    ... I have omitted some same logs

    ..............................................................

    [09:26:29.630]RX←◆[00:15:56.593,475] [0m<inf> lte_ble_gw: start scan success[0m
    [00:15:56.676,361] [0m<inf> lte_ble_gw: Stop scan success:21, num:1740[0m

    [09:26:30.720]RX←◆[00:15:57.683,502] [0m<inf> lte_ble_gw: start scan success[0m
    [00:15:57.787,109] [0m<inf> lte_ble_gw: Stop scan success:34, num:1742[0m

    [09:26:31.812]RX←◆[00:15:58.773,529] [0m<inf> lte_ble_gw: start scan success[0m
    [00:15:58.857,940] [0m<inf> lte_ble_gw: Stop scan success:21, num:1744[0m

    [09:26:32.901]RX←◆[00:15:59.86
    [09:26:32.940]RX←◆3,525] [0m<inf> lte_ble_gw: start scan success[0m
    [00:15:59.971,649] [0m<inf> lte_ble_gw: Stop scan success:37, num:1746[0m

    [09:26:33.991]RX←◆[00:16:00.953,613] [0m<inf> lte_ble_gw: start scan success[0m
    [00:16:01.041,534] [0m<inf> lte_ble_gw: Stop scan success:28, num:1748[0m

    [09:26:35.081]RX←◆[00:16:02.043,640] [0m<inf> lte_ble_gw: start scan success[0m
    [00:16:02.140,380] [0m<inf> lte_ble_gw: Stop scan success:31, num:1750[0m

    [09:26:36.171]RX←◆[00:16:03.133,636] [0m<inf> lte_ble_gw: start scan success[0m
    [00:16:03.221,740] [0m<inf> lte_ble_gw: Stop scan success:29, num:1752[0m

    [09:26:37.260]RX←◆[00:16:04.223,663] [0m<inf> lte_ble_gw: start scan success[0m
    [00:16:04.311,218] [0m<inf> lte_ble_gw: Stop scan success:26, num:1754[0m

    [09:26:38.352]RX←◆[00:16:05.313,781] [0m<inf> lte_ble_gw: start scan success[0m
    [00:16:05.400,054] [0m<inf> lte_ble_gw: Stop scan success:26, num:1756[0m

    [09:26:39.440]RX←◆[00:16:06.403,686] [0m<inf> lte_ble_gw: start scan success[0m
    [00:16:06.480,438] [1;31m<err> bt_driver: TX interrupt but no pending buffer![0m
    [00:16:06.480,438] [1;31m<err> bt_driver: TX interrupt but no pending buffer![0m

    [09:26:48.518]RX←◆ASSERTION FAIL [err == 0] @ WEST_TOPDIR/zephyr/subsys/bluetooth/host/hci_core.c:329
    k_sem_take failed with err -11
    [00:16:16.489,959] [1;33m<wrn> bt_driver: Discarding event 0x3e[0m
    [00:16:16.491,088] [1;31m<err> os: r0/a1: 0x00000003 r1/a2: 0x00000000 r2/a3: 0x00000000[0m
    [00:16:16.491,088] [1;31m<err> os: r3/a4: 0x00016e71 r12/ip: 0x00000000 r14/lr: 0x00010677[0m
    [00:16:16.491,119] [1;31m<err> os: xpsr: 0x41000000[0m
    [00:16:16.491,119] [1;31m<err> os: Faulting instruction address (r15/pc): 0x00010682[0m
    [00:16:16.491,149] [1;31m<err> os: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0[0m
    [00:16:16.491,180] [1;31m<err> os: Current thread: 0x200108f8 (unknown)[0m
    [00:16:16.542,175] [1;31m<err> fatal_error: Resetting system[0m\0
    [09:26:49.351]RX←◆*** Booting Zep
    [09:26:49.384]RX←◆hyr OS build v3.2.99-ncs1 ***

    [09:26:50.354]RX←◆[00:00:00
    [09:26:50.376]RX←◆.501,708] [0m<inf> lte_ble_gw: LTE Sensor Gateway sample started[0m

Related