Zephyr crashing when I connect to a device

Vidar Berg has been helping me connect to a device and I am finally able to do that, but as soon as I do, Zephyr crashes.

This is with a lightly modified central_uart project

[00:00:00.024,810] <0x1b>[0m<inf> central_uart: Scan module initialized<0x1b>[0m␍␊

[00:00:00.024,810] <0x1b>[0m<inf> central_uart: NUS Client module initialized<0x1b>[0m␍␊

[00:00:00.029,846] <0x1b>[0m<inf> central_uart: Scanning successfully started<0x1b>[0m␍␊

[00:00:00.093,719] <0x1b>[0m<inf> central_uart: Filters matched. Address: E8:A0:DB:24:21:62 (random) connectable: 1<0x1b>[0m␍␊

[00:00:00.289,886] <0x1b>[0m<inf> central_uart: Connected: E8:A0:DB:24:21:62 (random)<0x1b>[0m␍␊

[00:00:00.290,039] <0x1b>[1;33m<wrn> central_uart: Failed to set security: -12<0x1b>[0m␍␊

[00:00:00.290,100] <0x1b>[1;31m<err> central_uart: Stop LE scan failed (err 0)<0x1b>[0m␍␊

[00:00:00.441,497] <0x1b>[0m<inf> central_uart: MTU exchange done<0x1b>[0m␍␊

ASSERTION FAIL [conn] @ WEST_TOPDIR/zephyr/subsys/bluetooth/host/gatt.c:5433␍␊

⇥ invalid parameters␍␊

␍␊

[00:00:01.791,351] <0x1b>[0m<inf> central_uart: Service discovery completed<0x1b>[0m␍␊

[00:00:01.800,994] <0x1b>[1;31m<err> os: r0/a1: 0x00000004 r1/a2: 0x00001539 r2/a3: 0x20000b60<0x1b>[0m␍␊

[00:00:01.800,994] <0x1b>[1;31m<err> os: r3/a4: 0x00000004 r12/ip: 0x00000000 r14/lr: 0x000217df<0x1b>[0m␍␊

[00:00:01.801,025] <0x1b>[1;31m<err> os: xpsr: 0x21000000<0x1b>[0m␍␊

[00:00:01.801,025] <0x1b>[1;31m<err> os: Faulting instruction address (r15/pc): 0x00031072<0x1b>[0m␍␊

[00:00:01.801,086] <0x1b>[1;31m<err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0<0x1b>[0m␍␊

[00:00:01.801,116] <0x1b>[1;31m<err> os: Current thread: 0x20002568 (BT RX WQ)<0x1b>[0m␍␊

[00:00:01.869,415] <0x1b>[1;31m<err> os: Halting system<0x1b>[0m␍␊

The program crashes here:

 return snprintk(str, len, "%02X:%02X:%02X:%02X:%02X:%02X (%s)",
 addr->a.val[5], addr->a.val[4], addr->a.val[3],
 addr->a.val[2], addr->a.val[1], addr->a.val[0], type);

Which is in 
addr.h (sic) Quite what code is doing in a header file is beyond me. 

The prototype for the function is this:

int bt_addr_le_to_str(const bt_addr_le_t *addr, char *str, size_t len);

The values passed look reasonable to me.
addr->a.val[0]-[5] have the correct MAC address in it
str has (in decimal)  23 0 ... in it so that looks like it's just got a single character and anyway,
it shouldn't make a difference because, surely, that's being written into. 

len = 30

type ends up with "random" in it. 

I am assuming
snprintk is based on
snprint and so shouldn't copy more than len bytes anyway. 

The string being used is that is passed into bt_addr_le_to_str is:
 char addr[BT_ADDR_LE_STR_LEN];
and BT_ADDR_LE_STR_LEN is set to 30 so that all stacks up nicely.
Can anyone give me some pointers as to what might be wrong here?
BTW  Vidar Berg that little project you sent me does not crash like this, but I can't
see where you set up the characteristics for that project, there is no reference to
BT_GATT_SERVICE_DEFINE anywhere in it. I would happily just mod that if I could work out how you've set up
its service and characteristics.
Thanks

0 Vidar Berg 7 months ago in reply to DiBosco

DiBosco said:
I can only assume that when modifying the code to allow for this new subscription I've messed something up.

I’m not sure what could have triggered the unaligned memory access. The first thing I would do is look up the faulting instruction address to see which line of code it maps to. You can do this either by using arm-zephyr-eabi-addr2line or by placing a breakpoint at the address.

With addr2line:

1. Open a terminal in Vs code with the toolchain env set:

2. Enter the following command from the "nRF connect" terminal

In my case:

DiBosco said:
Would be great to have one of those take the traffic and send it to Cutecom to see what's going on at the same time I'm trying to use the BLE "UART"

Unfortunately, the second COM port on the nRF52840 DK is not connected to the nRF chip on this board. Otherwise, you could have created a board overlay to direct logs to the second UARTE instance.

DiBosco said:
I do, of course, realise this and I also know that coding style varies massively from person to person. I do appreciate that some people even like K&R style indentation.

I don’t have strong opinions on this to be honest, but I think it makes sense for us to follow the same style as Zephyr for consistency, even though there are always things that could be improved.
0 DiBosco 7 months ago in reply to Vidar Berg

I've solved the crashing issue, that trick worked.

Am currently getting

#define ENOTCONN 128 /* Socket is not connected */
Error when I try to write to this:

err = bt_custom_client_send(&custom_client, data_ptr, len);
Even though, the system's telling me it's connected to the custom unit.

Vidar Berg said:
Unfortunately, the second COM port on the nRF52840 DK is not connected to the nRF chip on this board. Otherwise, you could have created a board overlay to direct logs to the second UARTE instance.

I don't understand what you're saying here.

First question is:

Assuming you have more than one UART, can you route the

LOG_WRN & LOG_ERR

traffic to that other UART? If so, how. How and where do you

specify *which* UART

is being used for the debug information?

Also, I assume you can change between spitting out

LOG_ERR and LOG_WRN and both? If so, how?

Second question

Assuming the above is possible:

a. This second UART: what do you mean it's not connected to the nrf chip? Surely the second UART is a peripheral IN the nrf? Also, I thought this was a funky chip where you could connect any pin to any peripheral? So as long as the pins are accessible on the SIL headers I can connected up an FTDI 3V3 cable to the second UART just as I am to the first to get UART traffic in and out.

b. There are clearly some virtual com ports on this DK, when the board is powered up via the USB connection to the board, I get:

crw-rw-rw- 1 root dialout 166, 0 Dec 4 19:50 /dev/ttyACM0
crw-rw-rw- 1 root dialout 166, 1 Dec 4 19:50 /dev/ttyACM1

Now, I know that the debug information comes out of this because at one point I somehow managed to open some sort of terminal in VSC and when

CONFIG_LOG_BACKEND_UART=y

Was in prj.conf

I got the same serial info into VSC as I currently get into Cutecom.

So is it not possible to use that VCP in Cutecom

as the BACKEND UART? This would be the easiest and best way to

do it.

Just in case it matters, in the interest of readability I have split

all the serial stuff into a new file, so when I not get a packet

in over the UART in serial.c, I call this function in main.c

void send_ble_data_to_xlrt(uint8_t *data_ptr, uint16_t len)

{

int err;

err = bt_custom_client_send(&custom_client, data_ptr, len);

if (err != NO_ERROR)

{

LOG_WRN("CUSTOM send timeout");

}

}
This is where I get the error -128 not connected

Eventually I'll put all the BLE stuff in another file

so main is as it should be, a very basic file. Baby steps though.
0 Vidar Berg 7 months ago in reply to DiBosco

Glad to hear you found the reason for the usage fault. Regarding the -ENOTCONN, could you check if it is returned here:

? The 'conn' object should be assigned at the end of the bt_custom_handles_assign() function.

Also, not directly related to this issue, but while reviewing this function again I noticed that I assigned the RX handle instead of the TX handle here:

DiBosco said:
I don't understand what you're saying here.

Sorry for not being more clear. What I meant to say is that the second COM port (/dev/ttyACM1) exposed by the interface MCU isn't available on this board as there are no nRF pins routed to this UART - USB bridge on the board. You can of course hook up an external FTDI device to the pin headers, or simply use the nRF USB.

From the nRF52840 DK user guide:

(Ref. https://docs.nordicsemi.com/bundle/ug_nrf52840_dk/page/UG/dk/vir_com_port.html )

To enable logging over the nRF USB port, you can try adding the following 2 overlays to the boards folder in your project source directory and do a pristine build.

<PROJ_DIR>/boards/
├── nrf52840dk_nrf52840.conf
└── nrf52840dk_nrf52840.overlay

4555.nrf52840dk_nrf52840.overlay

0636.nrf52840dk_nrf52840.conf

DiBosco said:
Assuming you have more than one UART, can you route the

LOG_WRN & LOG_ERR

traffic to that other UART? If so, how. How and where do you

specify *which* UART

The UART backend does not support individual filtering per device. I.e., you can't define one log level for one UART/USB instance and a different level for another.
0 DiBosco 7 months ago in reply to Vidar Berg

Vidar Berg said:
Glad to hear you found the reason for the usage fault. Regarding the -ENOTCONN, could you check if it is returned here:

This morning it's not giving me the same error and it's allegedly sending messages out to the remote unit. I haven't changed anything, so goodness knows what was going on last night. I do often find with this ecosystem is very flaky. I often have to clean and recompile because it sometimes just doesn't work after a normal compile and I often have to program twice because it fails the first time, so maybe it was the end of a long fruitless day and had I cleaned and rebuilt it would have been OK.

Today, I have found out, from a colleague, that sending "status<CR><LF>" into the main RX characteristic puts the unit I'm trying to communicate with into ATE mode.

Previously, I've been trying to get this this really complicated binary comms system it normally uses going. With this binary method the master is constantly sending out frames and expecting responses back and I had to use a specific program. So I was going:

Program using PC_UART <-> nrf52840 bLE <-> Remote Unit BLE

Now I can go:

Cutecom on PC <->nrf52840 bLE <-> Remote Unit BLE

And I don't have to worry about frames, it can all be ASCII based and simple. I can switch on the verbose errors and warnings in central_uart and read them on Cutecom along with any other data coming back over BLE from the custom unit.

I have proven I can get into ATE on the custom unit mode over BLE by using BLE Scanner app on my phone and once I send status<CR><LF> the unit goes into ATE mode, with the LCD on the front panel confirming it.

However, if I send that via the central_uart app on the 52840DK the remote does not go into ATE mode. So, I hook up the DK running your central_uart app to Cutecom and I send

status<CR><LF>

The central_uart app is clearly getting that exact text in, I can see that by either mirroring the message back to Cutecom or stopping the app with the debugging and looking at what it received.

Once the status request is received by UART into centrral_uart app, this is called:

err = bt_custom_client_send(&custom_client, data_ptr, len);

I can see with the debugger that

data_ptr points to:

"status<CR><LF>"

len is 8

err = 0 is the response back from the bt_custom_client_send function which suggests is being sent to the remote unit.

ble_data_sent is getting called , which I think is called when the

central_uart app gets confirmation that the remote unit has read

the charctersitic that has been written to locally?

Is your central_uart app on the DK definitely writing to:

#define BT_UUID_CUSTOM_SRV_TX_VAL \

BT_UUID_128_ENCODE(0x651135b0, 0xdfe9, 0x45a2, 0x95c1, 0x46552a8a77e2)

What do all these functions:

uint8_t (*received_tx)(struct bt_custom_client *custom, const uint8_t *data, uint16_t len);

uint8_t (*received_rx)(struct bt_custom_client *custom, const uint8_t *data, uint16_t len);

void (*sent)(struct bt_custom_client *custom, uint8_t err, const uint8_t *data, uint16_t len);

void (*unsubscribed)(struct bt_custom_client *custom);

So and when should they be called? It looks to me,

(although I might be wrong because all the callbacks

within structures [sometimes] within further structures really
are hard to read) that they are set up with

struct bt_custom_client_init_param init = {

.cb = {

.received_tx = ble_data_received_tx,

.received_rx = ble_data_received_rx,

.sent = ble_data_sent,

}

};

and

err = bt_custom_client_init(&custom_client, &init);

But there are four callbacks in the struct

struct bt_custom_client_cb

in custom_client.h and we on;y pass three here.

So maybe they aren't really connected.

So, I guess the big question is:

If the central_uart app is 100% definitely getting the right

string to send to the remote unit and the central_uart app claims
that is has sent the string to the remote unit, and I even,

seemingly get confirmation back that the remote unit has received
something, how can we ascertain it's actually put the right
data into the Tx characteristic? I would hook up the other DK and
see what is received but of course that no longer connects because
of all the new stuff we have in central now.
0 DiBosco 7 months ago in reply to DiBosco

Vidar Berg said:
Sorry for not being more clear. What I meant to say is that the second COM port (/dev/ttyACM1) exposed by the interface MCU isn't available on this board as there are no nRF pins routed to this UART - USB bridge on the board. You can of course hook up an external FTDI device to the pin headers, or simply use the nRF USB.

I still don't understand this.

Can we route the

LOG_ERR etc messages to a specific UART and have one UART for
control comms to the PC that is being sent on via BLE and another

purely for debug data, ie the data currently sent to
LOG_ERR, LOG_WRN and LOG_INF?

If so, surely we could route it to ttyACM0? I can see ttyAM0 is
currently spitting out data because it shows exactly the same

info that I am getting out of the UART I have connected to an

FTDI lead with:

CONFIG_LOG_BACKEND_UART=y