NRF 52840 Crash after bt_nus_client_send when attempting to call sensor_sample_fetch

I modified the central UART example for a demo I'm writing. I removed the hardware UART part and only send messages to a peripheral. I measure several values from sensors over I2C, then send the values over BT UART. The issue is that I send 4 strings in a row and then go measure again, but the application always crashes and restarts after it sends, with the following output:

 [00:00:11.341,430] <err> mpsl_init: MPSL ASSERT: 112, 2152
00> [00:00:11.341,888] <err> os: ***** HARD FAULT *****
00> [00:00:11.342,285] <err> os:   Fault escalation (see below)
00> [00:00:11.342,742] <err> os: ARCH_EXCEPT with reason 3
00> [00:00:11.343,200] <err> os: r0/a1:  0x00000003  r1/a2:  0x20005388  r2/a3:  0x0003c878
00> [00:00:11.343,841] <err> os: r3/a4:  0x0003527d r12/ip:  0x20014b14 r14/lr:  0x000168a1
00> [00:00:11.344,482] <err> os:  xpsr:  0x41000018
00> [00:00:11.344,909] <err> os: Faulting instruction address (r15/pc): 0x0002942c
00> [00:00:11.345,458] <err> os: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
00> [00:00:11.346,008] <err> os: Fault during interrupt handling

I read in another thread that this issue is related to the HFCLK, so I tried the fix posted by NRF support, adding this:

NRF_TIMER1->BITMODE = TIMER_BITMODE_BITMODE_32Bit << TIMER_BITMODE_BITMODE_Pos;
NRF_TIMER1->TASKS_CLEAR = 1;
NRF_CLOCK->EVENTS_HFCLKSTARTED = 0;
NRF_TIMER1->TASKS_START = 1;
NRF_CLOCK->TASKS_HFCLKSTART = 1;
while(NRF_CLOCK->EVENTS_HFCLKSTARTED == 0);
NRF_TIMER1->TASKS_CAPTURE[0] = 1;

before calling

bt_enable

and my output was:

00>
00> HF Clock has started. Startup time: 344 uS

I read that a bad startup time is 1.5 ms or more, so this seems fine?

I also get this error sometimes:

00> [00:05:50.469,024] <err> os: ***** BUS FAULT *****
00> [00:05:50.469,421] <err> os:   Precise data bus error
00> [00:05:50.469,848] <err> os:   BFAR Address: 0x302d3b3e
00> [00:05:50.470,336] <err> os: r0/a1:  0x302d3b36  r1/a2:  0x20005388  r2/a3:  0x00000000
00> [00:05:50.471,008] <err> os: r3/a4:  0x2000530c r12/ip:  0x0003d579 r14/lr:  0x0001382d
00> [00:05:50.471,649] <err> os:  xpsr:  0x81000000
00> [00:05:50.472,106] <err> os: Faulting instruction address (r15/pc): 0x00013a2a
00> [00:05:50.472,656] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
00> [00:05:50.473,205] <err> os: Current thread: 0x200020a0 (main)
00> [00:05:50.478,698] <err> fatal_error: Resetting system
00> *** Booting Zephyr OS build v3.0.99-ncs1  ***

I tried NCS 2.0.0 and 2.1.0, and the issue occurs in both. I also read other reports of the same issue.

I tried stepping through with the debugger, using breakpoints after sending, and every step over or step into crashes as well.

Here are my changes from the UART central example.

Instead of doing the setup in main, I moved it to this function:

void ble_central_begin()
{
    int err;

    err = bt_conn_auth_cb_register(&conn_auth_callbacks);
    if (err) {
        LOG_ERR("Failed to register authorization callbacks.");
        return;
    }

    err = bt_conn_auth_info_cb_register(&conn_auth_info_callbacks);
    if (err) {
        printk("Failed to register authorization info callbacks.\n");
        return;
    }

    NRF_TIMER1->BITMODE = TIMER_BITMODE_BITMODE_32Bit << TIMER_BITMODE_BITMODE_Pos;
    NRF_TIMER1->TASKS_CLEAR = 1;
    NRF_CLOCK->EVENTS_HFCLKSTARTED = 0;
    NRF_TIMER1->TASKS_START = 1;
    NRF_CLOCK->TASKS_HFCLKSTART = 1;
    while(NRF_CLOCK->EVENTS_HFCLKSTARTED == 0);
    NRF_TIMER1->TASKS_CAPTURE[0] = 1;

    printk("HF Clock has started. Startup time: %d uS\n", NRF_TIMER1->CC[0]);

    err = bt_enable(NULL);
    if (err) {
        LOG_ERR("Bluetooth init failed (err %d)", err);
        return;
    }
    LOG_INF("Bluetooth initialized");

    if (IS_ENABLED(CONFIG_SETTINGS)) {
        settings_load();
    }

    int (*module_init[])(void) = {scan_init, nus_client_init};
    for (size_t i = 0; i < ARRAY_SIZE(module_init); i++) {
        err = (*module_init[i])();
        if (err) {
            return;
        }
    }

    printk("Starting Bluetooth Central UART example\n");

    err = bt_scan_start(BT_SCAN_TYPE_SCAN_ACTIVE);
    if (err) {
        LOG_ERR("Scanning failed to start (err %d)", err);
        return;
    }

    LOG_INF("Scanning successfully started");
}

BLE connects fine, so I doubt there is an issue here. All the callback functions are unchanged from the example except for the following:

static int scan_init(void)
{
    int err;
    struct bt_scan_init_param scan_init = {
        .connect_if_match = 1,
    };

    bt_scan_init(&scan_init);
    bt_scan_cb_register(&scan_cb);

    err = bt_scan_filter_add(BT_SCAN_FILTER_TYPE_NAME, "UART Service");
    if (err) {
        LOG_ERR("Scanning filters cannot be set (err %d)", err);
        return err;
    }

    err = bt_scan_filter_enable(BT_SCAN_NAME_FILTER, false);
    if (err) {
        LOG_ERR("Filters cannot be turned on (err %d)", err);
        return err;
    }

    LOG_INF("Scan module initialized");
    return err;
}

I am searching by name instead of by service.

All the HW UART code is removed, and when I receive data over BLE I just ignore it, since I only want to send.

My send function is the following:

void bt_central_send(char *payload) {
    /* strlen() gives the string length; sizeof(payload) would only be the size of the pointer */
    int err = bt_nus_client_send(&nus_client, payload, strlen(payload));
    if (err) {
        LOG_WRN("Failed to send data over BLE connection"
            "(err %d)", err);
    }

    err = k_sem_take(&nus_write_sem, NUS_WRITE_TIMEOUT);
    if (err) {
        LOG_WRN("NUS send timeout");
    }
}

It is the same as in the example, but instead of sending data from the UART, I call this with a char * payload.
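
A note on the length argument in a wrapper like this: for a `char *` parameter, `sizeof(payload)` evaluates to the size of the pointer (4 bytes on the 32-bit nRF52840), not the length of the string, so the length generally has to come from `strlen()` or be passed in by the caller. Below is a minimal plain-C sketch of the difference. `pointer_size_of_param` and `payload_len` are hypothetical helper names used only for illustration; they are not part of the NUS client API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: shows what sizeof sees for a pointer parameter.
 * The result is the pointer size, regardless of the string it points to. */
size_t pointer_size_of_param(const char *payload)
{
    assert(payload != NULL);
    return sizeof(payload);
}

/* Hypothetical helper: the number of characters before the terminating NUL,
 * which is the value a send function would normally need. */
size_t payload_len(const char *payload)
{
    assert(payload != NULL);
    return strlen(payload);
}
```

With a fixed-size array declared in the same scope, `sizeof` does give the array size, which is why the pitfall usually only appears once the buffer is passed through a function parameter.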

I manage to send data correctly and receive it correctly on the other side, and sending multiple values in a row is fine; it only breaks after sending. The first thing called after sending all the strings is a delay statement (k_busy_wait() or k_msleep()), and the device crashes right afterwards.

I would really appreciate your help because I am completely lost.

Thank you in advance.

Edit: I managed to get more details after moving to Windows in the hope of solving it. I was using macOS.

I ran the debugger and set the breakpoint at fatal errors. The result is the following:

Thread 7 hit Breakpoint 2, k_sys_fatal_error_handler (reason=0, esf=0x20021544 <z_interrupt_stacks+32708>) at C:/ncs/v2.0.0/nrf/lib/fatal_error/fatal_error.c:23

Call stack:

The [00:00:11.341,430] <err> mpsl_init: MPSL ASSERT: 112, 2152 errors stopped appearing after I moved to Windows, but now I only get this error:

00> [00:01:20.850,524] <err> os: ***** BUS FAULT *****
00> [00:01:20.850,555] <err> os:   Precise data bus error
00> [00:01:20.850,555] <err> os:   BFAR Address: 0x322d3b3d
00> [00:01:20.850,585] <err> os: r0/a1:  0x322d3b35  r1/a2:  0x00000000  r2/a3:  0x00000000
00> [00:01:20.850,616] <err> os: r3/a4:  0x00000000 r12/ip:  0x00000000 r14/lr:  0x00014639
00> [00:01:20.850,616] <err> os:  xpsr:  0x41000000
00> [00:01:20.850,646] <err> os: Faulting instruction address (r15/pc): 0x00014836
00> [00:01:20.850,646] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
00> [00:01:20.850,677] <err> os: Current thread: 0x200025a8 (main)
00> [00:01:21.098,541] <err> fatal_error: Resetting system

I also want to clarify how the error happens. I have a loop that runs 5 times and calls bt_central_send each time. What happens in the loop is:

It uses k_malloc to allocate memory for a char buffer and uses strcpy and strcat from string.h to form the string. Then it calls the send function with the string, and after the function is done it calls k_free on the buffer, repeating until the 5 strings are sent. The receiver gets all the strings, they are all correct, and the send function returns to the main thread. The next thing after the function in the main thread is a k_msleep statement. I hope this info is enough to help trace the error.
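
Building strings with strcpy and strcat gives no protection if the result is longer than the allocated buffer. As a rough sketch only, one alternative is to format with `snprintf`, which never writes more than the given capacity and makes truncation detectable. The function name `build_message` and the `label;value` layout are assumptions for illustration, not the format used in the original project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: formats "label;value" into buf.
 * Returns 0 on success, or -1 if the text did not fit. On truncation
 * buf still holds a NUL-terminated (shortened) string.
 */
int build_message(char *buf, size_t cap, const char *label, int value)
{
    assert(buf != NULL && label != NULL && cap > 0);

    /* snprintf returns the length the full string would have had. */
    int n = snprintf(buf, cap, "%s;%d", label, value);

    if (n < 0 || (size_t)n >= cap) {
        return -1; /* encoding error or truncated output */
    }
    return 0;
}
```

The caller can then skip sending or log a warning when -1 comes back, instead of silently writing past the end of the buffer.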

Adding also: Sorry, bear with me, I am new to this.

I back-traced the error by doing this in the debug terminal:

-exec set $exc_frame = ($lr & 0x4) ? $psp : $msp
-exec set $stacked_xpsr = ((uint32_t *)$exc_frame)[7]
-exec set $exc_frame_len = 32 + (($stacked_xpsr & (1 << 9)) ? 0x4 : 0x0) + (($lr & 0x10) ? 0 : 72)
-exec set $sp =($exc_frame + $exc_frame_len)
-exec set $lr =((uint32_t *)$exc_frame)[5]
-exec set $pc =((uint32_t *)$exc_frame)[6]

-exec backtrace

The output was the following:

#0 z_impl_sensor_sample_fetch (dev=0xfffffeb0) at C:/ncs/v2.0.0/zephyr/include/zephyr/drivers/sensor.h:510

#1 sensor_sample_fetch (dev=0xfffffeb0) at zephyr/include/generated/syscalls/sensor.h:85

#2 pressure_sensor_read () at ../src/sensor_app.c:130

#3 0x0001475c in main () at ../src/main.c:64

which is right after the delay I mentioned before, so it goes like this:

Read pressure & temp sensor ->

Read Humidity and temp sensor ->

Read accel sensor ->

Call function to check values and send ->

BT sends the 5 strings correctly ->

k_msleep(for some time) ->

Read pressure & temp sensor -> (Only breaks if BT Send was used before, this is main():64)

    64:    pressure_sensor_read(); // main():64

It goes to this:

int pressure_sensor_read(){
    int rc = sensor_sample_fetch(pressure_sensor); // this is /src/sensor_app.c:130
    if(rc != 0){
        printk("pressure read error %d", rc);
        return -1;
    }
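    // rc is 0 here, so OR the results together; with *= an error code would be multiplied by 0 and lost
    rc |= sensor_channel_get(pressure_sensor, SENSOR_CHAN_AMBIENT_TEMP,&temperatureP);
    rc |= sensor_channel_get(pressure_sensor, SENSOR_CHAN_PRESS,&pressure);
    if(rc != 0){
        return -1;
    }else{
        return 1;
    }
}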

which also goes to:

static inline int sensor_sample_fetch(const struct device * dev)
{
#ifdef CONFIG_USERSPACE
    if (z_syscall_trap()) {
        union { uintptr_t x; const struct device * val; } parm0 = { .val = dev };
        return (int) arch_syscall_invoke1(parm0.x, K_SYSCALL_SENSOR_SAMPLE_FETCH);
    }
#endif
    compiler_barrier();
    return z_impl_sensor_sample_fetch(dev); // #1 sensor_sample_fetch (dev=0xfffffeb0) at zephyr/include/generated/syscalls/sensor.h:85
}

This is in the Zephyr sensor.h driver header; I did not change it.

Finally I reached:

static inline int z_impl_sensor_sample_fetch(const struct device *dev)
{
    const struct sensor_driver_api *api = // #0 z_impl_sensor_sample_fetch (dev=0xfffffeb0) at C:/ncs/v2.0.0/zephyr/include/zephyr/drivers/sensor.h:510
        (const struct sensor_driver_api *)dev->api;

    return api->sample_fetch(dev, SENSOR_CHAN_ALL);
}

which is confirmed by looking at the error output:

00> [00:02:31.014,923] <err> os: Faulting instruction address (r15/pc): 0x0001495a   // This instruction address is persistent now

If I look at the zephyr.lst file I see:

00014954 <pressure_sensor_read>:
int pressure_sensor_read(){
   14954:    b538          push    {r3, r4, r5, lr}
    int rc = sensor_sample_fetch(pressure_sensor);
   14956:    4b13          ldr    r3, [pc, #76]    ; (149a4 <pressure_sensor_read+0x50>)
   14958:    6818          ldr    r0, [r3, #0]
    const struct sensor_driver_api *api =
   1495a:    6883          ldr    r3, [r0, #8]
    return api->sample_fetch(dev, SENSOR_CHAN_ALL);

The sensor driver works fine as long as I don't send any BT messages. The same section of code is called over and over without issues; I can do thousands of reads, print on the terminal, and do all sorts of things, but if I use the BLE central UART and send, this happens.

The sensor driver I'm using is

    dps310@77 {
        compatible = "infineon,dps310";
        status = "okay";
        reg = <0x77>;
        label = "DPS310";

    };

which is shipped with the framework without any edits from me.

I would really appreciate any help.

Edit:

I managed to fix it!!

What I was doing was calling device_get_binding("device name") once at the start of the application,

so I had a sensor init function that included this:

pressure_sensor = device_get_binding("DPS310");

Without BLE, calling this once was fine, but once I started using BLE, for some reason I had to call it every time before I called

sensor_sample_fetch(pressure_sensor);

which is weird, at least to me.

My code works fine now, but I am still wondering why the original code didn't.

I hope I get some explanation of what is going on.

I found it and managed to return my code to normal, and also managed to feel stupid afterward.

I had a buffer that stored the strings and wrote them to BLE, but the issue was that I was exceeding the buffer size before sending. It had nothing to do with BLE: I had a single bad index value that was writing to the wrong location, and it did not trigger a fault right away. I figured this out using the debug console. Maybe what I am missing is that I need to take a break.
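
Looking back at the fault dumps with that cause in mind, the bad pointer values themselves look like text. In both bus faults BFAR is r0 + 8, matching the `ldr r3, [r0, #8]` load of dev->api in the listing, and r0 was 0x302d3b36 in one dump and 0x322d3b35 in the other. Read as little-endian bytes, these are the ASCII characters "6;-0" and "5;-2", which would be consistent with string data having overwritten the pressure_sensor pointer. A small sketch for decoding a register value this way follows. `word_to_ascii` and `looks_like_text` are hypothetical helper names, and the check is a heuristic, not proof of corruption.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the four bytes of `word` to `out` in
 * little-endian memory order, replacing non-printable bytes with '.'. */
void word_to_ascii(uint32_t word, char out[5])
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++) {
        unsigned char byte = (unsigned char)((word >> (8 * i)) & 0xFFu);
        out[i] = isprint(byte) ? (char)byte : '.';
    }
    out[4] = '\0';
}

/* Hypothetical helper: returns 1 if all four bytes are printable ASCII,
 * which is unusual for a valid RAM or flash address on this device. */
int looks_like_text(uint32_t word)
{
    for (int i = 0; i < 4; i++) {
        unsigned char byte = (unsigned char)((word >> (8 * i)) & 0xFFu);
        if (!isprint(byte)) {
            return 0;
        }
    }
    return 1;
}
```

A pointer that decodes to readable characters is a common sign that a neighbouring buffer overflowed into it, so it can be worth checking the memory map for what sits just before the corrupted variable.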

Thanks, NRF, for existing, I guess? Sorry to bother you.
