Zephyr crashing when I connect to a device

  has been helping me connect to a device and I am finally able to do that, but as soon as I do, Zephyr crashes.

This is with a lightly modified central_uart project

[00:00:00.024,810] <0x1b>[0m<inf> central_uart: Scan module initialized<0x1b>[0m␍␊

[00:00:00.024,810] <0x1b>[0m<inf> central_uart: NUS Client module initialized<0x1b>[0m␍␊

[00:00:00.029,846] <0x1b>[0m<inf> central_uart: Scanning successfully started<0x1b>[0m␍␊

[00:00:00.093,719] <0x1b>[0m<inf> central_uart: Filters matched. Address: E8:A0:DB:24:21:62 (random) connectable: 1<0x1b>[0m␍␊

[00:00:00.289,886] <0x1b>[0m<inf> central_uart: Connected: E8:A0:DB:24:21:62 (random)<0x1b>[0m␍␊

[00:00:00.290,039] <0x1b>[1;33m<wrn> central_uart: Failed to set security: -12<0x1b>[0m␍␊

[00:00:00.290,100] <0x1b>[1;31m<err> central_uart: Stop LE scan failed (err 0)<0x1b>[0m␍␊

[00:00:00.441,497] <0x1b>[0m<inf> central_uart: MTU exchange done<0x1b>[0m␍␊

ASSERTION FAIL [conn] @ WEST_TOPDIR/zephyr/subsys/bluetooth/host/gatt.c:5433␍␊

⇥ invalid parameters␍␊

␍␊

[00:00:01.791,351] <0x1b>[0m<inf> central_uart: Service discovery completed<0x1b>[0m␍␊

[00:00:01.800,994] <0x1b>[1;31m<err> os: r0/a1: 0x00000004 r1/a2: 0x00001539 r2/a3: 0x20000b60<0x1b>[0m␍␊

[00:00:01.800,994] <0x1b>[1;31m<err> os: r3/a4: 0x00000004 r12/ip: 0x00000000 r14/lr: 0x000217df<0x1b>[0m␍␊

[00:00:01.801,025] <0x1b>[1;31m<err> os: xpsr: 0x21000000<0x1b>[0m␍␊

[00:00:01.801,025] <0x1b>[1;31m<err> os: Faulting instruction address (r15/pc): 0x00031072<0x1b>[0m␍␊

[00:00:01.801,086] <0x1b>[1;31m<err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0<0x1b>[0m␍␊

[00:00:01.801,116] <0x1b>[1;31m<err> os: Current thread: 0x20002568 (BT RX WQ)<0x1b>[0m␍␊

[00:00:01.869,415] <0x1b>[1;31m<err> os: Halting system<0x1b>[0m␍␊

The program crashes here:

 

return snprintk(str, len, "%02X:%02X:%02X:%02X:%02X:%02X (%s)",
addr->a.val[5], addr->a.val[4], addr->a.val[3],
addr->a.val[2], addr->a.val[1], addr->a.val[0], type);

Which is in
addr.h (sic) Quite what code is doing in a header file is beyond me.

The prototype for the function is this:

int bt_addr_le_to_str(const bt_addr_le_t *addr, char *str, size_t len);

The values passed look reasonable to me.
addr->a.val[0]-[5] have the correct MAC address in it
str has (in decimal) 23 0 ... in it so that looks like it's just got a single character and anyway,
it shouldn't make a difference because, surely, that's being written into.

len = 30

type ends up with "random" in it.

I am assuming
snprintk is based on
snprint and so shouldn't copy more than len bytes anyway.

The string being used is that is passed into bt_addr_le_to_str is:
char addr[BT_ADDR_LE_STR_LEN];
and BT_ADDR_LE_STR_LEN is set to 30 so that all stacks up nicely.
Can anyone give me some pointers as to what might be wrong here?
BTW   that little project you sent me does not crash like this, but I can't
see where you set up the characteristics for that project, there is no reference to
BT_GATT_SERVICE_DEFINE anywhere in it. I would happily just mod that if I could work out how you've set up
its service and characteristics.
Thanks


Parents
  • Hello,

    ASSERTION FAIL [conn] @ WEST_TOPDIR/zephyr/subsys/bluetooth/host/gatt.c:5433

    This error message shows that an assertion is raised in gatt.c at this line: https://github.com/nrfconnect/sdk-zephyr/blob/0bc3393fb112ec80ebeab48cd023d69b1e9db757/subsys/bluetooth/host/gatt.c#L5433 which suggests that the 'conn' pointer passed to your bt_gatt_subscribe() call is set to NULL. 

    hat little project you sent me does not crash like this, but I can't
    see where you set up the characteristics for that project, there is no reference to

    I copied the NUS service and NUS client source files and placed them in the project's source directory (I also renamed every occurrence of 'nus' to 'custom').

    Best regards,

    Vidar

  • Morning,

    I did look with the debugger at that pointer and that's not the issue.

    The crash doesn't come until snprintk in bt_addr_le_to_str

    and as I say, in the first post, The values in passed to:

    return snprintk(str, len, "%02X:%02X:%02X:%02X:%02X:%02X (%s)",
    addr->a.val[5], addr->a.val[4], addr->a.val[3],
    addr->a.val[2], addr->a.val[1], addr->a.val[0], type);

    Look fine:



    This smacks more of a stack overflow to me,
    but I've massively increased the main stack size in prj.conf
    to no or little avail.

    The message on crash changes slightly:

    [00:00:07.404,510] <0x1b>[1;31m<err> os: Fault during interrupt handling␊

    <0x1b>[0m␍␊

    [00:00:07.404,571] <0x1b>[1;31m<err> os: Current thread:

    0x20002568 (BT RX WQ)<0x1b>[0m␍␊

    This looks like it;s some sort bluetooth receive issue? However, all I'm doing is

    connecting, no actual data is being sent as yet.

    On the subject of your code, it's not the peripheral_uart I'm trying, it's the

    central_uart which doesn't have those definitions in it. .

     

  • DiBosco said:
    I can only assume that when modifying the code to allow for this new subscription I've messed something up.

    I’m not sure what could have triggered the unaligned memory access. The first thing I would do is look up the faulting instruction address to see which line of code it maps to. You can do this either by using arm-zephyr-eabi-addr2line or by placing a breakpoint at the address.

    With addr2line:

    1. Open a terminal in Vs code with the toolchain env set:

    2. Enter the following command from the "nRF connect" terminal

    In my case:

    DiBosco said:
    Would be great to have one of those take the traffic and send it to Cutecom to see what's going on at the same time I'm trying to use the BLE "UART"

    Unfortunately, the second COM port on the nRF52840 DK is not connected to the nRF chip on this board. Otherwise, you could have created a board overlay to direct logs to the second UARTE instance.

    DiBosco said:
    I do, of course, realise this and I also know that coding style varies massively from person to person. I do appreciate that some people even like K&R style indentation.

    I don’t have strong opinions on this to be honest, but I think it makes sense for us to follow the same style as Zephyr for consistency, even though there are always things that could be improved.

  • I've solved the crashing issue, that trick worked. 

    Am currently getting

    #define ENOTCONN 128 /* Socket is not connected */
    Error when I try to write to this:

    err = bt_custom_client_send(&custom_client, data_ptr, len);
    Even though, the system's telling me it's connected to the custom unit.

    Unfortunately, the second COM port on the nRF52840 DK is not connected to the nRF chip on this board. Otherwise, you could have created a board overlay to direct logs to the second UARTE instance.


    I don't understand what you're saying here.

    First question is:

    Assuming you have more than one UART, can you route the

    LOG_WRN & LOG_ERR
    traffic to that other UART? If so, how. How and where do you
    specify *which* UART
    is being used for the debug information?

    Also, I assume you can change between spitting out
    LOG_ERR and LOG_WRN and both? If so, how?


    Second question

    Assuming the above is possible:

    a. This second UART: what do you mean it's not connected to the nrf chip? Surely the second UART is a peripheral IN the nrf? Also, I thought this was a funky chip where you could connect any pin to any peripheral? So as long as the pins are accessible on the SIL headers I can connected up an FTDI  3V3 cable to the second UART just as I am to the first to get UART traffic in and out.


    b. There are clearly some virtual com ports on this DK, when the board is powered up via the USB  connection to the board, I get:

    crw-rw-rw- 1 root dialout 166, 0 Dec  4 19:50 /dev/ttyACM0
    crw-rw-rw- 1 root dialout 166, 1 Dec  4 19:50 /dev/ttyACM1


    Now, I know that the debug information comes out of this because at one point I somehow managed to open some sort of terminal in VSC and when

    CONFIG_LOG_BACKEND_UART=y

    Was in prj.conf

    I got the same serial info into VSC as I currently get into Cutecom.

    So is it not possible to use that VCP in Cutecom
    as the BACKEND UART? This would be the easiest and best way to
    do it.
    Just in case it matters, in the interest of readability I have split
    all the serial stuff into a new file, so when I not get a packet
    in over the UART in serial.c, I call this function in main.c

    void send_ble_data_to_xlrt(uint8_t *data_ptr, uint16_t len)
    {
    int err;
    err = bt_custom_client_send(&custom_client, data_ptr, len);
    if (err != NO_ERROR)
    {
    LOG_WRN("CUSTOM send timeout");
    }

    }
    This is where I get the error -128 not connected

    Eventually I'll put all the BLE stuff in another file
    so main is as it should be, a very basic file. Baby steps though.
  • Glad to hear you found the reason for the usage fault. Regarding the -ENOTCONN, could you check if it is returned here: 

    ? The 'conn' object should be assigned at the end of the bt_custom_handles_assign() function.

    Also, not directly related to this issue, but while reviewing this function again I noticed that I assigned the RX handle instead of the TX handle here:

    DiBosco said:
    I don't understand what you're saying here.

    Sorry for not being more clear. What I meant to say is that the second COM port (/dev/ttyACM1) exposed by the interface MCU isn't available on this board as there are no nRF pins routed to this UART - USB bridge on the board. You can of course hook up an external FTDI device to the pin headers, or simply use the nRF USB.

    From the nRF52840 DK user guide:

    (Ref. https://docs.nordicsemi.com/bundle/ug_nrf52840_dk/page/UG/dk/vir_com_port.html )

    To enable logging over the nRF USB port, you can try adding the following 2 overlays to the boards folder in your project source directory and do a pristine build.

    <PROJ_DIR>/boards/
    ├── nrf52840dk_nrf52840.conf
    └── nrf52840dk_nrf52840.overlay

    4555.nrf52840dk_nrf52840.overlay

    0636.nrf52840dk_nrf52840.conf

    DiBosco said:

    Assuming you have more than one UART, can you route the

    LOG_WRN & LOG_ERR
    traffic to that other UART? If so, how. How and where do you
    specify *which* UART

    The UART backend does not support individual filtering per device. I.e., you can't define one log level for one UART/USB instance and a different level for another.

  • Glad to hear you found the reason for the usage fault. Regarding the -ENOTCONN, could you check if it is returned here: 

    This morning  it's not giving me the same error and it's allegedly sending messages out to the remote unit. I haven't changed anything, so goodness knows what was going on last night. I do often find with this ecosystem is very flaky. I often have to clean and recompile because it sometimes just doesn't work after a normal compile and I often have to program twice because it fails the first time, so maybe it was the end of a long fruitless day and had I cleaned and rebuilt it would have been OK.

    Today, I have found out, from a colleague, that sending "status<CR><LF>" into the main RX characteristic puts the unit  I'm trying to communicate with into ATE mode.

    Previously, I've been trying to get this  this really complicated binary comms system it normally uses going. With this binary method the master is constantly sending out frames and expecting responses back and I had to use a specific program. So I was going:

    Program using PC_UART <-> nrf52840 bLE <-> Remote Unit BLE

    Now I can go:

    Cutecom on PC <->nrf52840 bLE <-> Remote Unit BLE

    And I don't have to worry about frames, it can all be ASCII based and simple. I can switch on the verbose errors and warnings in central_uart and read them on Cutecom along with any other data coming back over BLE from the custom unit.


    I have proven I can get into ATE on the custom unit mode over BLE by using BLE Scanner app on my phone and once I send  status<CR><LF> the unit goes into ATE mode, with the LCD on the front panel confirming it.

    However, if I send that via the central_uart app on the 52840DK the remote does not go into ATE mode. So, I hook up the DK running your central_uart app to Cutecom and I send

    status<CR><LF>

    The central_uart app is clearly getting that exact text in, I can see that by either mirroring the message back to Cutecom or stopping the app with the debugging and looking at what it received.

    Once the status request is received by UART into centrral_uart app, this is  called:

    err = bt_custom_client_send(&custom_client, data_ptr, len);


    I can see with the debugger that

    data_ptr points to:

    "status<CR><LF>"


    len is 8

    err = 0 is the response back from the bt_custom_client_send function which suggests is being sent to the remote unit.

    ble_data_sent
    is getting called , which I think is called when the

    central_uart app gets confirmation that the remote unit has read
    the charctersitic that has been written to locally?

    Is your central_uart app on the DK definitely writing to:

    #define BT_UUID_CUSTOM_SRV_TX_VAL \
    BT_UUID_128_ENCODE(0x651135b0, 0xdfe9, 0x45a2, 0x95c1, 0x46552a8a77e2)

    What do all these functions:

    uint8_t (*received_tx)(struct bt_custom_client *custom, const uint8_t *data, uint16_t len);
    uint8_t (*received_rx)(struct bt_custom_client *custom, const uint8_t *data, uint16_t len);
    void (*sent)(struct bt_custom_client *custom, uint8_t err, const uint8_t *data, uint16_t len);
    void (*unsubscribed)(struct bt_custom_client *custom);

    So and when should they be called? It looks to me,
    (although I might be wrong because all the callbacks
    within structures [sometimes] within further structures really
    are hard to read) that they are set up with

    struct bt_custom_client_init_param init = {
    .cb = {

    .received_tx = ble_data_received_tx,
    .received_rx = ble_data_received_rx,
    .sent = ble_data_sent,
    }
    };


    and

    err = bt_custom_client_init(&custom_client, &init);
    But there are four callbacks in the struct
    struct bt_custom_client_cb
    in custom_client.h and we on;y pass three here.
    So maybe they aren't really connected.

    So, I guess the big question is:

    If the central_uart app is 100% definitely getting the right
    string to send to the remote unit and the central_uart app claims
    that is has sent the string to the remote unit, and I even,
    seemingly get confirmation back that the remote unit has received
    something, how can we ascertain it's actually put the right
    data into the Tx characteristic? I would hook up the other DK and
    see what is received but of course that no longer connects because
    of all the new stuff we have in central now.
  • Sorry for not being more clear. What I meant to say is that the second COM port (/dev/ttyACM1) exposed by the interface MCU isn't available on this board as there are no nRF pins routed to this UART - USB bridge on the board. You can of course hook up an external FTDI device to the pin headers, or simply use the nRF USB.

    I still don't understand this.

    Can we route the

    LOG_ERR etc messages to a specific UART and have one UART for
    control comms to the PC that is being sent on via BLE and another
    purely for debug data, ie the data currently sent to
    LOG_ERR, LOG_WRN and LOG_INF?

    If so, surely we could route it to ttyACM0? I can see ttyAM0 is
    currently spitting out data because it shows exactly the same
    info that I am getting out of the UART I have connected to an
    FTDI lead with:

    CONFIG_LOG_BACKEND_UART=y
Reply
  • Sorry for not being more clear. What I meant to say is that the second COM port (/dev/ttyACM1) exposed by the interface MCU isn't available on this board as there are no nRF pins routed to this UART - USB bridge on the board. You can of course hook up an external FTDI device to the pin headers, or simply use the nRF USB.

    I still don't understand this.

    Can we route the

    LOG_ERR etc messages to a specific UART and have one UART for
    control comms to the PC that is being sent on via BLE and another
    purely for debug data, ie the data currently sent to
    LOG_ERR, LOG_WRN and LOG_INF?

    If so, surely we could route it to ttyACM0? I can see ttyAM0 is
    currently spitting out data because it shows exactly the same
    info that I am getting out of the UART I have connected to an
    FTDI lead with:

    CONFIG_LOG_BACKEND_UART=y
Children
  • DiBosco said:

    ble_data_sent is getting called , which I think is called when the

    central_uart app gets confirmation that the remote unit has read
    the charctersitic that has been written to locally?

    Unless an ATT error is reported in the callback, the write command will have been sent. Did you check this error code?

    DiBosco said:
    Is your central_uart app on the DK definitely writing to:

    #define BT_UUID_CUSTOM_SRV_TX_VAL \

    Did you change the handle value from RX to TX as mentioned in my previous reply?

    DiBosco said:
    LOG_ERR etc messages to a specific UART and have one UART for
    control comms to the PC that is being sent on via BLE and another
    purely for debug data, ie the data currently sent to
    LOG_ERR, LOG_WRN and LOG_INF?

    If so, surely we could route it to ttyACM0? I can see ttyAM0 is
    currently spitting out data because it shows exactly the same

    You can select which device(s) the UART logger backend should use by modifying the zephyr,console property in the chosen node. The DK board sets this property to zephyr,console = &uart0 by default. Alternatively, if you want to have console (printf, printk) and log (LOG_*) output on separate devices, you can use the zephyr,log-uart property as shown in the Devicetree overlay I shared.  What is not possible is to have the Logger module to print debug messages to one device and only error messages to another. It seems like I misunderstood the question.

  • Unless an ATT error is reported in the callback, the write command will have been sent. Did you check this error code?

    If there was an error it would be spat out by this in ble_data_sent:

    if (err != 0)
    {
    LOG_WRN("ATT error code: 0x%02X", err);
    }

    And I see nothing there. If I stop the program there err is
    optimised out (and yes I have it building for debugging).
    Did you change the handle value from RX to TX as mentioned in my previous reply?
    Hmmm, no I did not, I missed that. <Scrolls back>

    Damnit, yes, that was not changed.

    OK, having changed that it now works in that I can put the
    remote unit into ATE mode. Thank whatever deity you might
    worship for that.

    AND, I get data back.

    That might have really broken the back of it finally.

    I have a couple of things I'm not sure about:

    ble_data_received_rx

    This is where I get stuff back in from the remote unit over BLE

    What is this for?

    ble_data_received_tx

    ------------------------------------------

    Also, one thing here in
    on_received_tx:

    return custom->cb.received_tx(custom, data, length);

    Why does this have to be done with the obfuscation
    of a callback? Why not call

    ble_data_received_rx

    directly? Or even better, simply do all the work in

    ble_data_received_tx

    Without having to call yet another function? Just
    seems like an extra layer of unnecessary complexity and code.

    The callbacks and their associated structs are a nightmare
    for reducing readability.

    Thanks!



  • I'm glad to hear that you are receiving responses now from your peripheral.

    DiBosco said:
    And I see nothing there. If I stop the program there err is
    optimised out (and yes I have it building for debugging).

    '-Og' still enables some optimizations. You can add CONFIG_NO_OPTIMIZATIONS=y to your prj.conf if you want to use '-O0'. Just make sure that you don't override this in your project build configuration in VS code.

    DiBosco said:
    ble_data_received_rx

    This is where I get stuff back in from the remote unit over BLE

    What is this for?

    ble_data_received_tx

    First one is invoked when receiving notifications from the RX characteristic and the other one is when you receive it from the TX characteristic. I don't know if your peripheral will actually send anything on the TX characteristic, but the screenshot from phone showed that both characteristics had the "notify" property. If not, I would have just updated the existing function to use the RX handle.

    DiBosco said:
    Why does this have to be done with the obfuscation
    of a callback? Why not call

    ble_data_received_rx

    directly? Or even better, simply do all the work in

    ble_data_received_tx

    Without having to call yet another function? Just
    seems like an extra layer of unnecessary complexity and code.

    The callbacks and their associated structs are a nightmare
    for reducing readability.

    The goal is to decouple the code and allow callbacks to be registered in the application. It was not designed to contain any application specific logic. I don't see any problems with changing this though. The custom_client.c file I made is more or less a copy of the nrf/subsys/bluetooth/services/nus_client.c file with some renaming and small changes I made along the way to try to match your GATT server.

  • The goal is to decouple the code and allow callbacks to be registered in the application. It was not designed to contain any application specific logic. I don't see any problems with changing this though. The custom_client.c file I made is more or less a copy of the nrf/subsys/bluetooth/services/nus_client.c file with some renaming and small changes I made along the way to try to match your GATT server.

    The thing that  doesn't make sense to me with that is that custom_client is full of custom code anyway, the UUIDs are unique to that header as is

    struct bt_custom_client_handles

    which depends on what characteristics are in a service.
    In other words,it's not a generic file that you can use with any
    service. If it was, I could more see the advantage in it.

    The likelihood is that I will go through and simplify as
    much of it as I can.
    I still don't understand a lot of this, and maybe tidying
    up the code as much as possible will help me with that.

    Thanks again for all the help, I'd never have got this
    working otherwise.
  • The Nordic UART Service (NUS) Client which the custom_client is based on is specifically made to interface with the Nordic UART Service. Any project that needs to work with this service can include this this implementation by enabling the CONFIG_BT_NUS_CLIENT setting, not just the central_uart sample. If you want, you can remove the need for the nested callbacks in custom_client by integrating the code into your application.

Related