Zephyr crashing when I connect to a device

  has been helping me connect to a device and I am finally able to do that, but as soon as I do, Zephyr crashes.

This is with a lightly modified central_uart project

[00:00:00.024,810] <0x1b>[0m<inf> central_uart: Scan module initialized<0x1b>[0m␍␊

[00:00:00.024,810] <0x1b>[0m<inf> central_uart: NUS Client module initialized<0x1b>[0m␍␊

[00:00:00.029,846] <0x1b>[0m<inf> central_uart: Scanning successfully started<0x1b>[0m␍␊

[00:00:00.093,719] <0x1b>[0m<inf> central_uart: Filters matched. Address: E8:A0:DB:24:21:62 (random) connectable: 1<0x1b>[0m␍␊

[00:00:00.289,886] <0x1b>[0m<inf> central_uart: Connected: E8:A0:DB:24:21:62 (random)<0x1b>[0m␍␊

[00:00:00.290,039] <0x1b>[1;33m<wrn> central_uart: Failed to set security: -12<0x1b>[0m␍␊

[00:00:00.290,100] <0x1b>[1;31m<err> central_uart: Stop LE scan failed (err 0)<0x1b>[0m␍␊

[00:00:00.441,497] <0x1b>[0m<inf> central_uart: MTU exchange done<0x1b>[0m␍␊

ASSERTION FAIL [conn] @ WEST_TOPDIR/zephyr/subsys/bluetooth/host/gatt.c:5433␍␊

⇥ invalid parameters␍␊

␍␊

[00:00:01.791,351] <0x1b>[0m<inf> central_uart: Service discovery completed<0x1b>[0m␍␊

[00:00:01.800,994] <0x1b>[1;31m<err> os: r0/a1: 0x00000004 r1/a2: 0x00001539 r2/a3: 0x20000b60<0x1b>[0m␍␊

[00:00:01.800,994] <0x1b>[1;31m<err> os: r3/a4: 0x00000004 r12/ip: 0x00000000 r14/lr: 0x000217df<0x1b>[0m␍␊

[00:00:01.801,025] <0x1b>[1;31m<err> os: xpsr: 0x21000000<0x1b>[0m␍␊

[00:00:01.801,025] <0x1b>[1;31m<err> os: Faulting instruction address (r15/pc): 0x00031072<0x1b>[0m␍␊

[00:00:01.801,086] <0x1b>[1;31m<err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0<0x1b>[0m␍␊

[00:00:01.801,116] <0x1b>[1;31m<err> os: Current thread: 0x20002568 (BT RX WQ)<0x1b>[0m␍␊

[00:00:01.869,415] <0x1b>[1;31m<err> os: Halting system<0x1b>[0m␍␊

The program crashes here:

 

return snprintk(str, len, "%02X:%02X:%02X:%02X:%02X:%02X (%s)",
addr->a.val[5], addr->a.val[4], addr->a.val[3],
addr->a.val[2], addr->a.val[1], addr->a.val[0], type);

Which is in
addr.h (sic) Quite what code is doing in a header file is beyond me.

The prototype for the function is this:

int bt_addr_le_to_str(const bt_addr_le_t *addr, char *str, size_t len);

The values passed look reasonable to me.
addr->a.val[0]-[5] have the correct MAC address in it
str has (in decimal) 23 0 ... in it so that looks like it's just got a single character and anyway,
it shouldn't make a difference because, surely, that's being written into.

len = 30

type ends up with "random" in it.

I am assuming
snprintk is based on
snprint and so shouldn't copy more than len bytes anyway.

The string being used is that is passed into bt_addr_le_to_str is:
char addr[BT_ADDR_LE_STR_LEN];
and BT_ADDR_LE_STR_LEN is set to 30 so that all stacks up nicely.
Can anyone give me some pointers as to what might be wrong here?
BTW   that little project you sent me does not crash like this, but I can't
see where you set up the characteristics for that project, there is no reference to
BT_GATT_SERVICE_DEFINE anywhere in it. I would happily just mod that if I could work out how you've set up
its service and characteristics.
Thanks


Parents
  • Hello,

    ASSERTION FAIL [conn] @ WEST_TOPDIR/zephyr/subsys/bluetooth/host/gatt.c:5433

    This error message shows that an assertion is raised in gatt.c at this line: https://github.com/nrfconnect/sdk-zephyr/blob/0bc3393fb112ec80ebeab48cd023d69b1e9db757/subsys/bluetooth/host/gatt.c#L5433 which suggests that the 'conn' pointer passed to your bt_gatt_subscribe() call is set to NULL. 

    hat little project you sent me does not crash like this, but I can't
    see where you set up the characteristics for that project, there is no reference to

    I copied the NUS service and NUS client source files and placed them in the project's source directory (I also renamed every occurrence of 'nus' to 'custom').

    Best regards,

    Vidar

  • Morning,

    I did look with the debugger at that pointer and that's not the issue.

    The crash doesn't come until snprintk in bt_addr_le_to_str

    and as I say, in the first post, The values in passed to:

    return snprintk(str, len, "%02X:%02X:%02X:%02X:%02X:%02X (%s)",
    addr->a.val[5], addr->a.val[4], addr->a.val[3],
    addr->a.val[2], addr->a.val[1], addr->a.val[0], type);

    Look fine:



    This smacks more of a stack overflow to me,
    but I've massively increased the main stack size in prj.conf
    to no or little avail.

    The message on crash changes slightly:

    [00:00:07.404,510] <0x1b>[1;31m<err> os: Fault during interrupt handling␊

    <0x1b>[0m␍␊

    [00:00:07.404,571] <0x1b>[1;31m<err> os: Current thread:

    0x20002568 (BT RX WQ)<0x1b>[0m␍␊

    This looks like it;s some sort bluetooth receive issue? However, all I'm doing is

    connecting, no actual data is being sent as yet.

    On the subject of your code, it's not the peripheral_uart I'm trying, it's the

    central_uart which doesn't have those definitions in it. .

     

Reply
  • Morning,

    I did look with the debugger at that pointer and that's not the issue.

    The crash doesn't come until snprintk in bt_addr_le_to_str

    and as I say, in the first post, The values in passed to:

    return snprintk(str, len, "%02X:%02X:%02X:%02X:%02X:%02X (%s)",
    addr->a.val[5], addr->a.val[4], addr->a.val[3],
    addr->a.val[2], addr->a.val[1], addr->a.val[0], type);

    Look fine:



    This smacks more of a stack overflow to me,
    but I've massively increased the main stack size in prj.conf
    to no or little avail.

    The message on crash changes slightly:

    [00:00:07.404,510] <0x1b>[1;31m<err> os: Fault during interrupt handling␊

    <0x1b>[0m␍␊

    [00:00:07.404,571] <0x1b>[1;31m<err> os: Current thread:

    0x20002568 (BT RX WQ)<0x1b>[0m␍␊

    This looks like it;s some sort bluetooth receive issue? However, all I'm doing is

    connecting, no actual data is being sent as yet.

    On the subject of your code, it's not the peripheral_uart I'm trying, it's the

    central_uart which doesn't have those definitions in it. .

     

Children
  • Morning,

    I did look with the debugger at that pointer and that's not the issue.

    Are you still seeing the same assertion in the crash log?

    The crash doesn't come until snprintk in bt_addr_le_to_str

    and as I say, in the first post, The values in passed to:

    return snprintk(str, len, "%02X:%02X:%02X:%02X:%02X:%02X (%s)",
    addr->a.val[5], addr->a.val[4], addr->a.val[3],
    addr->a.val[2], addr->a.val[1], addr->a.val[0], type);

    Does the crash log report a different error now?

    This smacks more of a stack overflow to me,

    A stack overflow should be caught by the stack overflow detection mechanism, and the crash log will indicate that a stack overflow occurred and specify the thread it occurred in.

    On the subject of your code, it's not the peripheral_uart I'm trying, it's the

    central_uart which doesn't have those definitions in it. .

    My central device is only implementing the GATT client for your service. The service itself is implemented on the GATT server with the BT_GATT_SERVICE_DEFINE macro.

  • Are you still seeing the same assertion in the crash log?


    This is what I get after increasing the stack size (if that's what you're asking)


    00:00:01.270,050] <0x1b>[0m<inf> central_uart: Filters matched. Address: E8:A0:DB:24:21:62 (random) connectable: 1<0x1b>[0m␍␊

    ASSERTION FAIL [0] @ WEST_TOPDIR/nrf/subsys/bluetooth/controller/hci_driver.c:311␍␊

    ⇥ SoftDevice Controller ASSERT: 53, 238␍␊

    ␍␊

    [00:00:07.404,357] <0x1b>[1;31m<err> os: ***** HARD FAULT *****<0x1b>[0m␍␊

    [00:00:07.404,388] <0x1b>[1;31m<err> os: Fault escalation (see below)<0x1b>[0m␍␊

    [00:00:07.404,388] <0x1b>[1;31m<err> os: ARCH_EXCEPT with reason 4␊

    <0x1b>[0m␍␊

    [00:00:07.404,418] <0x1b>[1;31m<err> os: r0/a1: 0x00000004 r1/a2: 0x00000137 r2/a3: 0x20000b60<0x1b>[0m␍␊

    [00:00:07.404,449] <0x1b>[1;31m<err> os: r3/a4: 0x00000004 r12/ip: 0x00501bcf r14/lr: 0x000266bf<0x1b>[0m␍␊

    [00:00:07.404,449] <0x1b>[1;31m<err> os: xpsr: 0x21000018<0x1b>[0m␍␊

    [00:00:07.404,479] <0x1b>[1;31m<err> os: Faulting instruction address (r15/pc): 0x00031072<0x1b>[0m␍␊

    [00:00:07.404,510] <0x1b>[1;31m<err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0<0x1b>[0m␍␊

    [00:00:07.404,510] <0x1b>[1;31m<err> os: Fault during interrupt handling␊

    <0x1b>[0m␍␊

    [00:00:07.404,571] <0x1b>[1;31m<err> os: Current thread: 0x20002568 (BT RX WQ)<0x1b>[0m␍␊

    [00:00:07.475,585] <0x1b>[1;31m<err> os: Halting system<0x1b>[0m␍␊

    My central device is only implementing the GATT client for your service. The service itself is implemented on the GATT server with the BT_GATT_SERVICE_DEFINE macro.

    I thin I may be misunderstanding or miscommunicating something here.

    If I'm in the central_uart project you sent me (central is what I need to get going) there is no reference to BT_GATT_SERVICE_DEFINE anywhere. Yes, it's in the peripheral_uart project, but not in central, so I don't understand how your central_uart you are setting up the characteristics.

  • This is what I get after increasing the stack size (if that's what you're asking)


    00:00:01.270,050] <0x1b>[0m<inf> central_uart: Filters matched. Address: E8:A0:DB:24:21:62 (random) connectable: 1<0x1b>[0m␍␊

    ASSERTION FAIL [0] @ WEST_TOPDIR/nrf/subsys/bluetooth/controller/hci_driver.c:311␍␊

    ⇥ SoftDevice Controller ASSERT: 53, 238␍␊

    Yes, this assertion is raised by the softdevice controller because it exceeded the allocated time for a scheduled timeslot event. Did you get this error while debugging the code? This issue typically happens if you single step through the code or resume execution after hitting a breakpoint. When the debugger halts the CPU, the bluetooth controller cannot meet its real time requirements.

    If I'm in the central_uart project you sent me (central is what I need to get going) there is no reference to BT_GATT_SERVICE_DEFINE anywhere. Yes, it's in the peripheral_uart project, but not in central, so I don't understand how your central_uart you are setting up the characteristics.

    In this case, the custom service/GATT server is implemented on the peripheral side, while the central device operates in the GATT client role. Therefore, the central device does not declare the service in its attribute table.

    https://academy.nordicsemi.com/courses/bluetooth-low-energy-fundamentals/lessons/lesson-1-bluetooth-low-energy-introduction/topic/att-gatt-data-representation-and-exchange/ 

  • OK, it is apparent I have a big gap (no pun intended) in my client side knowledge. As soon  as you say it though it is obvious that the client can't be expected to know what's going on in all the peripherals. I suspect the best bet here might be to work with your project and see if I can work out how it functions.

  • OK, and FWIW, yes I had a breakpoint and without it, I get this:

    Starting Bluetooth Central UART example␍␊

    [00:00:03.093,292] <0x1b>[0m<inf> fs_nvs: 2 Sectors of 4096 bytes<0x1b>[0m␍␊

    [00:00:03.093,292] <0x1b>[0m<inf> fs_nvs: alloc wra: 0, fb8<0x1b>[0m␍␊

    [00:00:03.093,322] <0x1b>[0m<inf> fs_nvs: data wra: 0, 88<0x1b>[0m␍␊

    [00:00:03.093,475] <0x1b>[0m<inf> bt_sdc_hci_driver: SoftDevice Controller build revision: ␍␊

    fe 2c f9 6a 7f 36 22 2e a0 79 c0 40 be 2c 03 20 |.,.j.6". .y.@.,. ␍␊

    40 c2 f3 32 |@..2 <0x1b>[0m␍␊

    [00:00:03.097,717] <0x1b>[0m<inf> bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)<0x1b>[0m␍␊

    [00:00:03.097,747] <0x1b>[0m<inf> bt_hci_core: HW Variant: nRF52x (0x0002)<0x1b>[0m␍␊

    [00:00:03.097,778] <0x1b>[0m<inf> bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 254.63788 Build 573996906<0x1b>[0m␍␊

    [00:00:03.098,480] <0x1b>[0m<inf> bt_hci_core: No ID address. App must call settings_load()<0x1b>[0m␍␊

    [00:00:03.098,510] <0x1b>[0m<inf> central_uart: Bluetooth initialized<0x1b>[0m␍␊

    [00:00:03.099,487] <0x1b>[0m<inf> bt_hci_core: Identity: D9:AD:8D:A9:1A:A2 (random)<0x1b>[0m␍␊

    [00:00:03.099,517] <0x1b>[0m<inf> bt_hci_core: HCI: version 6.0 (0x0e) revision 0x104e, manufacturer 0x0059<0x1b>[0m␍␊

    [00:00:03.099,578] <0x1b>[0m<inf> bt_hci_core: LMP: version 6.0 (0x0e) subver 0x104e<0x1b>[0m␍␊

    [00:00:03.100,952] <0x1b>[0m<inf> central_uart: Scan module initialized<0x1b>[0m␍␊

    [00:00:03.100,952] <0x1b>[0m<inf> central_uart: NUS Client module initialized<0x1b>[0m␍␊

    [00:00:03.105,987] <0x1b>[0m<inf> central_uart: Scanning successfully started<0x1b>[0m␍␊

    [00:00:03.414,184] <0x1b>[0m<inf> central_uart: Filters matched. Address: E8:A0:DB:24:21:62 (random) connectable: 1<0x1b>[0m␍␊

    [00:00:03.604,248] <0x1b>[0m<inf> central_uart: Connected: E8:A0:DB:24:21:62 (random)<0x1b>[0m␍␊

    [00:00:03.604,370] <0x1b>[1;33m<wrn> central_uart: Failed to set security: -12<0x1b>[0m␍␊

    [00:00:03.604,431] <0x1b>[1;31m<err> central_uart: Stop LE scan failed (err 0)<0x1b>[0m␍␊

    [00:00:03.855,590] <0x1b>[1;33m<wrn> bt_conn: conn 0x20002640 failed to establish. RF noise?<0x1b>[0m␍␊

    [00:00:03.855,926] <0x1b>[1;33m<wrn> central_uart: MTU exchange failed (err 14)<0x1b>[0m␍␊

    [00:00:03.855,957] <0x1b>[0m<inf> central_uart: Service not found<0x1b>[0m␍␊

    [00:00:03.856,140] <0x1b>[0m<inf> central_uart: Disconnected: E8:A0:DB:24:21:62 (random), reason 0x3e <0x1b>[0m␍␊

    [00:00:03.987,030] <0x1b>[0m<inf> central_uart: Filters matched. Address: E8:A0:DB:24:21:62 (random) connectable: 1<0x1b>[0m␍␊

    [00:00:04.183,074] <0x1b>[0m<inf> central_uart: Connected: E8:A0:DB:24:21:62 (random)<0x1b>[0m␍␊

    [00:00:04.183,166] <0x1b>[1;33m<wrn> central_uart: Failed to set security: -12<0x1b>[0m␍␊

    [00:00:04.183,258] <0x1b>[1;31m<err> central_uart: Stop LE scan failed (err 0)<0x1b>[0m␍␊

    [00:00:04.334,655] <0x1b>[0m<inf> central_uart: MTU exchange done<0x1b>[0m␍␊

    ASSERTION FAIL [conn] @ WEST_TOPDIR/zephyr/subsys/bluetooth/host/gatt.c:5433␍␊

    ⇥ invalid parameters␍␊

    ␍␊

    [00:00:05.684,509] <0x1b>[0m<inf> central_uart: Service discovery completed<0x1b>[0m␍␊

    [00:00:05.694,183] <0x1b>[1;31m<err> os: r0/a1: 0x00000004 r1/a2: 0x00001539 r2/a3: 0x20000b60<0x1b>[0m␍␊

    [00:00:05.694,213] <0x1b>[1;31m<err> os: r3/a4: 0x00000004 r12/ip: 0x00000000 r14/lr: 0x000217df<0x1b>[0m␍␊

    [00:00:05.694,213] <0x1b>[1;31m<err> os: xpsr: 0x21000000<0x1b>[0m␍␊

    [00:00:05.694,213] <0x1b>[1;31m<err> os: Faulting instruction address (r15/pc): 0x00031072<0x1b>[0m␍␊

    [00:00:05.694,274] <0x1b>[1;31m<err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0<0x1b>[0m␍␊

    [00:00:05.694,305] <0x1b>[1;31m<err> os: Current thread: 0x20002568 (BT RX WQ)<0x1b>[0m␍␊

    [00:00:05.747,711] <0x1b>[1;31m<err> os: Halting system<0x1b>[0m␍␊

Related