Zephyr crashing when I connect to a device

  has been helping me connect to a device and I am finally able to do that, but as soon as I do, Zephyr crashes.

This is with a lightly modified central_uart project

[00:00:00.024,810] <0x1b>[0m<inf> central_uart: Scan module initialized<0x1b>[0m␍␊

[00:00:00.024,810] <0x1b>[0m<inf> central_uart: NUS Client module initialized<0x1b>[0m␍␊

[00:00:00.029,846] <0x1b>[0m<inf> central_uart: Scanning successfully started<0x1b>[0m␍␊

[00:00:00.093,719] <0x1b>[0m<inf> central_uart: Filters matched. Address: E8:A0:DB:24:21:62 (random) connectable: 1<0x1b>[0m␍␊

[00:00:00.289,886] <0x1b>[0m<inf> central_uart: Connected: E8:A0:DB:24:21:62 (random)<0x1b>[0m␍␊

[00:00:00.290,039] <0x1b>[1;33m<wrn> central_uart: Failed to set security: -12<0x1b>[0m␍␊

[00:00:00.290,100] <0x1b>[1;31m<err> central_uart: Stop LE scan failed (err 0)<0x1b>[0m␍␊

[00:00:00.441,497] <0x1b>[0m<inf> central_uart: MTU exchange done<0x1b>[0m␍␊

ASSERTION FAIL [conn] @ WEST_TOPDIR/zephyr/subsys/bluetooth/host/gatt.c:5433␍␊

⇥ invalid parameters␍␊

␍␊

[00:00:01.791,351] <0x1b>[0m<inf> central_uart: Service discovery completed<0x1b>[0m␍␊

[00:00:01.800,994] <0x1b>[1;31m<err> os: r0/a1: 0x00000004 r1/a2: 0x00001539 r2/a3: 0x20000b60<0x1b>[0m␍␊

[00:00:01.800,994] <0x1b>[1;31m<err> os: r3/a4: 0x00000004 r12/ip: 0x00000000 r14/lr: 0x000217df<0x1b>[0m␍␊

[00:00:01.801,025] <0x1b>[1;31m<err> os: xpsr: 0x21000000<0x1b>[0m␍␊

[00:00:01.801,025] <0x1b>[1;31m<err> os: Faulting instruction address (r15/pc): 0x00031072<0x1b>[0m␍␊

[00:00:01.801,086] <0x1b>[1;31m<err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0<0x1b>[0m␍␊

[00:00:01.801,116] <0x1b>[1;31m<err> os: Current thread: 0x20002568 (BT RX WQ)<0x1b>[0m␍␊

[00:00:01.869,415] <0x1b>[1;31m<err> os: Halting system<0x1b>[0m␍␊

The program crashes here:

 

return snprintk(str, len, "%02X:%02X:%02X:%02X:%02X:%02X (%s)",
addr->a.val[5], addr->a.val[4], addr->a.val[3],
addr->a.val[2], addr->a.val[1], addr->a.val[0], type);

Which is in
addr.h (sic) Quite what code is doing in a header file is beyond me.

The prototype for the function is this:

int bt_addr_le_to_str(const bt_addr_le_t *addr, char *str, size_t len);

The values passed look reasonable to me.
addr->a.val[0]-[5] have the correct MAC address in it
str has (in decimal) 23 0 ... in it so that looks like it's just got a single character and anyway,
it shouldn't make a difference because, surely, that's being written into.

len = 30

type ends up with "random" in it.

I am assuming
snprintk is based on
snprint and so shouldn't copy more than len bytes anyway.

The string being used is that is passed into bt_addr_le_to_str is:
char addr[BT_ADDR_LE_STR_LEN];
and BT_ADDR_LE_STR_LEN is set to 30 so that all stacks up nicely.
Can anyone give me some pointers as to what might be wrong here?
BTW   that little project you sent me does not crash like this, but I can't
see where you set up the characteristics for that project, there is no reference to
BT_GATT_SERVICE_DEFINE anywhere in it. I would happily just mod that if I could work out how you've set up
its service and characteristics.
Thanks


Parents
  • Hello,

    ASSERTION FAIL [conn] @ WEST_TOPDIR/zephyr/subsys/bluetooth/host/gatt.c:5433

    This error message shows that an assertion is raised in gatt.c at this line: https://github.com/nrfconnect/sdk-zephyr/blob/0bc3393fb112ec80ebeab48cd023d69b1e9db757/subsys/bluetooth/host/gatt.c#L5433 which suggests that the 'conn' pointer passed to your bt_gatt_subscribe() call is set to NULL. 

    hat little project you sent me does not crash like this, but I can't
    see where you set up the characteristics for that project, there is no reference to

    I copied the NUS service and NUS client source files and placed them in the project's source directory (I also renamed every occurrence of 'nus' to 'custom').

    Best regards,

    Vidar

  • Morning,

    I did look with the debugger at that pointer and that's not the issue.

    The crash doesn't come until snprintk in bt_addr_le_to_str

    and as I say, in the first post, The values in passed to:

    return snprintk(str, len, "%02X:%02X:%02X:%02X:%02X:%02X (%s)",
    addr->a.val[5], addr->a.val[4], addr->a.val[3],
    addr->a.val[2], addr->a.val[1], addr->a.val[0], type);

    Look fine:



    This smacks more of a stack overflow to me,
    but I've massively increased the main stack size in prj.conf
    to no or little avail.

    The message on crash changes slightly:

    [00:00:07.404,510] <0x1b>[1;31m<err> os: Fault during interrupt handling␊

    <0x1b>[0m␍␊

    [00:00:07.404,571] <0x1b>[1;31m<err> os: Current thread:

    0x20002568 (BT RX WQ)<0x1b>[0m␍␊

    This looks like it;s some sort bluetooth receive issue? However, all I'm doing is

    connecting, no actual data is being sent as yet.

    On the subject of your code, it's not the peripheral_uart I'm trying, it's the

    central_uart which doesn't have those definitions in it. .

     

  • Morning,

    I did look with the debugger at that pointer and that's not the issue.

    Are you still seeing the same assertion in the crash log?

    The crash doesn't come until snprintk in bt_addr_le_to_str

    and as I say, in the first post, The values in passed to:

    return snprintk(str, len, "%02X:%02X:%02X:%02X:%02X:%02X (%s)",
    addr->a.val[5], addr->a.val[4], addr->a.val[3],
    addr->a.val[2], addr->a.val[1], addr->a.val[0], type);

    Does the crash log report a different error now?

    This smacks more of a stack overflow to me,

    A stack overflow should be caught by the stack overflow detection mechanism, and the crash log will indicate that a stack overflow occurred and specify the thread it occurred in.

    On the subject of your code, it's not the peripheral_uart I'm trying, it's the

    central_uart which doesn't have those definitions in it. .

    My central device is only implementing the GATT client for your service. The service itself is implemented on the GATT server with the BT_GATT_SERVICE_DEFINE macro.

  • Are you still seeing the same assertion in the crash log?


    This is what I get after increasing the stack size (if that's what you're asking)


    00:00:01.270,050] <0x1b>[0m<inf> central_uart: Filters matched. Address: E8:A0:DB:24:21:62 (random) connectable: 1<0x1b>[0m␍␊

    ASSERTION FAIL [0] @ WEST_TOPDIR/nrf/subsys/bluetooth/controller/hci_driver.c:311␍␊

    ⇥ SoftDevice Controller ASSERT: 53, 238␍␊

    ␍␊

    [00:00:07.404,357] <0x1b>[1;31m<err> os: ***** HARD FAULT *****<0x1b>[0m␍␊

    [00:00:07.404,388] <0x1b>[1;31m<err> os: Fault escalation (see below)<0x1b>[0m␍␊

    [00:00:07.404,388] <0x1b>[1;31m<err> os: ARCH_EXCEPT with reason 4␊

    <0x1b>[0m␍␊

    [00:00:07.404,418] <0x1b>[1;31m<err> os: r0/a1: 0x00000004 r1/a2: 0x00000137 r2/a3: 0x20000b60<0x1b>[0m␍␊

    [00:00:07.404,449] <0x1b>[1;31m<err> os: r3/a4: 0x00000004 r12/ip: 0x00501bcf r14/lr: 0x000266bf<0x1b>[0m␍␊

    [00:00:07.404,449] <0x1b>[1;31m<err> os: xpsr: 0x21000018<0x1b>[0m␍␊

    [00:00:07.404,479] <0x1b>[1;31m<err> os: Faulting instruction address (r15/pc): 0x00031072<0x1b>[0m␍␊

    [00:00:07.404,510] <0x1b>[1;31m<err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0<0x1b>[0m␍␊

    [00:00:07.404,510] <0x1b>[1;31m<err> os: Fault during interrupt handling␊

    <0x1b>[0m␍␊

    [00:00:07.404,571] <0x1b>[1;31m<err> os: Current thread: 0x20002568 (BT RX WQ)<0x1b>[0m␍␊

    [00:00:07.475,585] <0x1b>[1;31m<err> os: Halting system<0x1b>[0m␍␊

    My central device is only implementing the GATT client for your service. The service itself is implemented on the GATT server with the BT_GATT_SERVICE_DEFINE macro.

    I thin I may be misunderstanding or miscommunicating something here.

    If I'm in the central_uart project you sent me (central is what I need to get going) there is no reference to BT_GATT_SERVICE_DEFINE anywhere. Yes, it's in the peripheral_uart project, but not in central, so I don't understand how your central_uart you are setting up the characteristics.

Reply
  • Are you still seeing the same assertion in the crash log?


    This is what I get after increasing the stack size (if that's what you're asking)


    00:00:01.270,050] <0x1b>[0m<inf> central_uart: Filters matched. Address: E8:A0:DB:24:21:62 (random) connectable: 1<0x1b>[0m␍␊

    ASSERTION FAIL [0] @ WEST_TOPDIR/nrf/subsys/bluetooth/controller/hci_driver.c:311␍␊

    ⇥ SoftDevice Controller ASSERT: 53, 238␍␊

    ␍␊

    [00:00:07.404,357] <0x1b>[1;31m<err> os: ***** HARD FAULT *****<0x1b>[0m␍␊

    [00:00:07.404,388] <0x1b>[1;31m<err> os: Fault escalation (see below)<0x1b>[0m␍␊

    [00:00:07.404,388] <0x1b>[1;31m<err> os: ARCH_EXCEPT with reason 4␊

    <0x1b>[0m␍␊

    [00:00:07.404,418] <0x1b>[1;31m<err> os: r0/a1: 0x00000004 r1/a2: 0x00000137 r2/a3: 0x20000b60<0x1b>[0m␍␊

    [00:00:07.404,449] <0x1b>[1;31m<err> os: r3/a4: 0x00000004 r12/ip: 0x00501bcf r14/lr: 0x000266bf<0x1b>[0m␍␊

    [00:00:07.404,449] <0x1b>[1;31m<err> os: xpsr: 0x21000018<0x1b>[0m␍␊

    [00:00:07.404,479] <0x1b>[1;31m<err> os: Faulting instruction address (r15/pc): 0x00031072<0x1b>[0m␍␊

    [00:00:07.404,510] <0x1b>[1;31m<err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0<0x1b>[0m␍␊

    [00:00:07.404,510] <0x1b>[1;31m<err> os: Fault during interrupt handling␊

    <0x1b>[0m␍␊

    [00:00:07.404,571] <0x1b>[1;31m<err> os: Current thread: 0x20002568 (BT RX WQ)<0x1b>[0m␍␊

    [00:00:07.475,585] <0x1b>[1;31m<err> os: Halting system<0x1b>[0m␍␊

    My central device is only implementing the GATT client for your service. The service itself is implemented on the GATT server with the BT_GATT_SERVICE_DEFINE macro.

    I thin I may be misunderstanding or miscommunicating something here.

    If I'm in the central_uart project you sent me (central is what I need to get going) there is no reference to BT_GATT_SERVICE_DEFINE anywhere. Yes, it's in the peripheral_uart project, but not in central, so I don't understand how your central_uart you are setting up the characteristics.

Children
  • This is what I get after increasing the stack size (if that's what you're asking)


    00:00:01.270,050] <0x1b>[0m<inf> central_uart: Filters matched. Address: E8:A0:DB:24:21:62 (random) connectable: 1<0x1b>[0m␍␊

    ASSERTION FAIL [0] @ WEST_TOPDIR/nrf/subsys/bluetooth/controller/hci_driver.c:311␍␊

    ⇥ SoftDevice Controller ASSERT: 53, 238␍␊

    Yes, this assertion is raised by the softdevice controller because it exceeded the allocated time for a scheduled timeslot event. Did you get this error while debugging the code? This issue typically happens if you single step through the code or resume execution after hitting a breakpoint. When the debugger halts the CPU, the bluetooth controller cannot meet its real time requirements.

    If I'm in the central_uart project you sent me (central is what I need to get going) there is no reference to BT_GATT_SERVICE_DEFINE anywhere. Yes, it's in the peripheral_uart project, but not in central, so I don't understand how your central_uart you are setting up the characteristics.

    In this case, the custom service/GATT server is implemented on the peripheral side, while the central device operates in the GATT client role. Therefore, the central device does not declare the service in its attribute table.

    https://academy.nordicsemi.com/courses/bluetooth-low-energy-fundamentals/lessons/lesson-1-bluetooth-low-energy-introduction/topic/att-gatt-data-representation-and-exchange/ 

  • OK, it is apparent I have a big gap (no pun intended) in my client side knowledge. As soon  as you say it though it is obvious that the client can't be expected to know what's going on in all the peripherals. I suspect the best bet here might be to work with your project and see if I can work out how it functions.

  • OK, and FWIW, yes I had a breakpoint and without it, I get this:

    Starting Bluetooth Central UART example␍␊

    [00:00:03.093,292] <0x1b>[0m<inf> fs_nvs: 2 Sectors of 4096 bytes<0x1b>[0m␍␊

    [00:00:03.093,292] <0x1b>[0m<inf> fs_nvs: alloc wra: 0, fb8<0x1b>[0m␍␊

    [00:00:03.093,322] <0x1b>[0m<inf> fs_nvs: data wra: 0, 88<0x1b>[0m␍␊

    [00:00:03.093,475] <0x1b>[0m<inf> bt_sdc_hci_driver: SoftDevice Controller build revision: ␍␊

    fe 2c f9 6a 7f 36 22 2e a0 79 c0 40 be 2c 03 20 |.,.j.6". .y.@.,. ␍␊

    40 c2 f3 32 |@..2 <0x1b>[0m␍␊

    [00:00:03.097,717] <0x1b>[0m<inf> bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)<0x1b>[0m␍␊

    [00:00:03.097,747] <0x1b>[0m<inf> bt_hci_core: HW Variant: nRF52x (0x0002)<0x1b>[0m␍␊

    [00:00:03.097,778] <0x1b>[0m<inf> bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 254.63788 Build 573996906<0x1b>[0m␍␊

    [00:00:03.098,480] <0x1b>[0m<inf> bt_hci_core: No ID address. App must call settings_load()<0x1b>[0m␍␊

    [00:00:03.098,510] <0x1b>[0m<inf> central_uart: Bluetooth initialized<0x1b>[0m␍␊

    [00:00:03.099,487] <0x1b>[0m<inf> bt_hci_core: Identity: D9:AD:8D:A9:1A:A2 (random)<0x1b>[0m␍␊

    [00:00:03.099,517] <0x1b>[0m<inf> bt_hci_core: HCI: version 6.0 (0x0e) revision 0x104e, manufacturer 0x0059<0x1b>[0m␍␊

    [00:00:03.099,578] <0x1b>[0m<inf> bt_hci_core: LMP: version 6.0 (0x0e) subver 0x104e<0x1b>[0m␍␊

    [00:00:03.100,952] <0x1b>[0m<inf> central_uart: Scan module initialized<0x1b>[0m␍␊

    [00:00:03.100,952] <0x1b>[0m<inf> central_uart: NUS Client module initialized<0x1b>[0m␍␊

    [00:00:03.105,987] <0x1b>[0m<inf> central_uart: Scanning successfully started<0x1b>[0m␍␊

    [00:00:03.414,184] <0x1b>[0m<inf> central_uart: Filters matched. Address: E8:A0:DB:24:21:62 (random) connectable: 1<0x1b>[0m␍␊

    [00:00:03.604,248] <0x1b>[0m<inf> central_uart: Connected: E8:A0:DB:24:21:62 (random)<0x1b>[0m␍␊

    [00:00:03.604,370] <0x1b>[1;33m<wrn> central_uart: Failed to set security: -12<0x1b>[0m␍␊

    [00:00:03.604,431] <0x1b>[1;31m<err> central_uart: Stop LE scan failed (err 0)<0x1b>[0m␍␊

    [00:00:03.855,590] <0x1b>[1;33m<wrn> bt_conn: conn 0x20002640 failed to establish. RF noise?<0x1b>[0m␍␊

    [00:00:03.855,926] <0x1b>[1;33m<wrn> central_uart: MTU exchange failed (err 14)<0x1b>[0m␍␊

    [00:00:03.855,957] <0x1b>[0m<inf> central_uart: Service not found<0x1b>[0m␍␊

    [00:00:03.856,140] <0x1b>[0m<inf> central_uart: Disconnected: E8:A0:DB:24:21:62 (random), reason 0x3e <0x1b>[0m␍␊

    [00:00:03.987,030] <0x1b>[0m<inf> central_uart: Filters matched. Address: E8:A0:DB:24:21:62 (random) connectable: 1<0x1b>[0m␍␊

    [00:00:04.183,074] <0x1b>[0m<inf> central_uart: Connected: E8:A0:DB:24:21:62 (random)<0x1b>[0m␍␊

    [00:00:04.183,166] <0x1b>[1;33m<wrn> central_uart: Failed to set security: -12<0x1b>[0m␍␊

    [00:00:04.183,258] <0x1b>[1;31m<err> central_uart: Stop LE scan failed (err 0)<0x1b>[0m␍␊

    [00:00:04.334,655] <0x1b>[0m<inf> central_uart: MTU exchange done<0x1b>[0m␍␊

    ASSERTION FAIL [conn] @ WEST_TOPDIR/zephyr/subsys/bluetooth/host/gatt.c:5433␍␊

    ⇥ invalid parameters␍␊

    ␍␊

    [00:00:05.684,509] <0x1b>[0m<inf> central_uart: Service discovery completed<0x1b>[0m␍␊

    [00:00:05.694,183] <0x1b>[1;31m<err> os: r0/a1: 0x00000004 r1/a2: 0x00001539 r2/a3: 0x20000b60<0x1b>[0m␍␊

    [00:00:05.694,213] <0x1b>[1;31m<err> os: r3/a4: 0x00000004 r12/ip: 0x00000000 r14/lr: 0x000217df<0x1b>[0m␍␊

    [00:00:05.694,213] <0x1b>[1;31m<err> os: xpsr: 0x21000000<0x1b>[0m␍␊

    [00:00:05.694,213] <0x1b>[1;31m<err> os: Faulting instruction address (r15/pc): 0x00031072<0x1b>[0m␍␊

    [00:00:05.694,274] <0x1b>[1;31m<err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0<0x1b>[0m␍␊

    [00:00:05.694,305] <0x1b>[1;31m<err> os: Current thread: 0x20002568 (BT RX WQ)<0x1b>[0m␍␊

    [00:00:05.747,711] <0x1b>[1;31m<err> os: Halting system<0x1b>[0m␍␊

  • It seems you are encountering the same assertion as before:

    DiBosco said:

    ASSERTION FAIL [conn] @ WEST_TOPDIR/zephyr/subsys/bluetooth/host/gatt.c:5433␍␊

    ⇥ invalid parameters␍␊

    Can you place a breakpoint at this line to confirm that the conn pointer is not set? Also, were you able to reproduce this with the central project I uploaded?

  • OK, in gatt.c the pointer looks like it is NULL.

    That function really should have:

    if ((NULL != conn) && (NULL != params))

    at the start of that function to stop that issue.

    I think we should forget about this particular project though, because...

    ----------------------------------------------------------------------------------

    Your program does not do this, but although I can get comms in over the UART, write it to

    the characteristic

    in the remote node and I get notification it's been read by  the remote node

    becauswe the ble_data_sent function is called.


    However,  I never get the
    ble_data_received callback function being called, so

    either the data isn't being quite sent to the write characteristic properly in your app

    or something's wrong

    with the setup fior the read characteristic in your app. We know the remote node works correctly

    because it's been working with a Raspberry Pi just fine.

    I'm assuming that, somehow, when you set up the syatem to be able to write to

    BT_UUID_CUSTOM_TX

    That you are setting up some area of memoryt that is
    automatically the same size as the characteristic in our custom
    remote node?

    I did put this:

    CONFIG_BT_L2CAP_TX_MTU=247
    CONFIG_BT_BUF_ACL_RX_SIZE=251
    CONFIG_BT_BUF_ACL_TX_SIZE=251

    in prj.conf
    and changed:

    #define UART_BUF_SIZE 240

    from it being just 20 bytes long to allow
    for our larger characteristics.

    ----------------------------------------------------------------------------------

    In your program I am guessing via
    custom_client_init you somehow tell the system

    that if it finds the the service you are telling it to call ble_data_received. I'm again

    guessing that

    bt_custom_handles_assign does the magic of knitting them together
    in one of those
    memset lines somewhere? Or maybe:

    custom_c->handles.rx = gatt_desc->handle;

    But I could be wrong, I can't follow that
    code I'm afraid. What should be looking for here in
    an attempt to ascertain whether that is all set up correctly?
     

Related