Hi Nordic Devzone,
I am bringing up a custom board based on the nRF52840. So far I have implemented basic Bluetooth communication and data acquisition from the on-board sensors, and everything has worked fine. I would now like to send the gathered data to a phone app using a process developed on another project, so I adapted that code to run on Zephyr. However, I keep getting the following error right after I send data over Bluetooth:
[00:00:58.587,066] <err> os: ***** MPU FAULT *****
[00:00:58.587,463] <err> os: Instruction Access Violation
[00:00:58.587,951] <err> os: r0/a1: 0x20003260 r1/a2: 0x20009988 r2/a3: 0x00000000
[00:00:58.588,592] <err> os: r3/a4: 0x200099c0 r12/ip: 0x0000001e r14/lr: 0x0003df3d
[00:00:58.589,233] <err> os: xpsr: 0x60000000
[00:00:58.589,721] <err> os: s[ 0]: 0x00000000 s[ 1]: 0x00000000 s[ 2]: 0x00000000 s[ 3]: 0x00000000
[00:00:58.590,484] <err> os: s[ 4]: 0x00000000 s[ 5]: 0x00000000 s[ 6]: 0x00000000 s[ 7]: 0x00000000
[00:00:58.591,247] <err> os: s[ 8]: 0x00000000 s[ 9]: 0x00000000 s[10]: 0x00000000 s[11]: 0x00000000
[00:00:58.592,010] <err> os: s[12]: 0x00000000 s[13]: 0x00000000 s[14]: 0x00000000 s[15]: 0x00000000
[00:00:58.592,773] <err> os: fpscr: 0x2000d1d9
[00:00:58.593,200] <err> os: Faulting instruction address (r15/pc): 0x200099c0
[00:00:58.593,780] <err> os: >>> ZEPHYR FATAL ERROR 20: Unknown error on CPU 0
[00:00:58.594,360] <err> os: Current thread: 0x20003138 (BT RX)
[00:00:58.599,853] <err> fatal_error: Resetting system
I can't figure out why I get this error, because I use the same function to send data over Bluetooth in other parts of the code and it works fine there. I based my code on the NUS sample. Here are extracts of the parts that cause problems.
sync_manager.c:
uint8_t data[6];
data[0] = (uint8_t) evt->area_id; // Data char. ID (manual or automatic)
data[1] = (uint8_t)(pkt_data[evt->area_id].timestamp & 0xFF);
data[2] = (uint8_t)((pkt_data[evt->area_id].timestamp >> 8) & 0xFF);
data[3] = (uint8_t)((pkt_data[evt->area_id].timestamp >> 16) & 0xFF);
data[4] = (uint8_t)((pkt_data[evt->area_id].timestamp >> 24) & 0xFF);
if (evt->area_id == AREA_STORAGE) {
    data[5] = pkt_data[evt->area_id].end_reason;
} else {
    data[5] = (uint8_t)ds_get_setting(DS_END_PROCESSED_REAS_INDEX);
}
ble_eqss_control_queue(evt->area_id == AREA_STORAGE ? PERIPHERAL_SYNC_SESSION_REQ : PERIPHERAL_SYNC_REQ, data, sizeof(data));
The error occurs when ble_eqss_control_queue is called. It is defined as follows:
static uint8_t _queue_command(uint8_t opcode, uint8_t *data, uint8_t length)
{
    if (length > EQS_CONTROL_DATA_MAX_LEN) {
        LOG_ERR("Data length too long");
        return 0;
    } else {
        uint8_t dest[EQS_CONTROL_DATA_MAX_LEN];
        memcpy(&dest[0], &opcode, sizeof(uint8_t));
        memcpy(&dest[1], &length, sizeof(uint8_t));
        memcpy(&dest[2], data, length);

        char str[60];
        sprintf(str, "Building Opcode %.2x - Length %.2x - Payload", opcode, length);
        LOG_HEXDUMP_INF(data, length, str);

        int8_t ret = control_send(NULL, dest, length + 2);
        if (ret == 0) {
            return 1;
        } else {
            LOG_ERR("Failed to send opcode %.2x (%i)", opcode, ret);
            return 0;
        }
    }
}

uint32_t ble_eqss_control_queue(uint8_t opcode, uint8_t* data, uint8_t len)
{
    return _queue_command(opcode, data, len);
}
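To make the frame layout explicit, here is a minimal host-side sketch of the same [opcode][length][payload] framing, which can be compiled and checked off-target. The helper name build_frame and the value of FRAME_MAX_LEN are assumptions for illustration only (the real EQS_CONTROL_DATA_MAX_LEN is not shown in this post). Unlike the function above, which only compares length against the buffer size, the sketch checks length + 2, since two header bytes are written ahead of the payload.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed value for illustration only; the real EQS_CONTROL_DATA_MAX_LEN
 * is not shown in this post. */
#define FRAME_MAX_LEN 20u

/* Hypothetical helper: writes [opcode][length][payload...] into dest.
 * Returns the total number of bytes written, or 0 if the frame
 * (payload plus 2 header bytes) would not fit in dest. */
static size_t build_frame(uint8_t *dest, size_t dest_size,
                          uint8_t opcode, const uint8_t *data, uint8_t length)
{
    size_t total = (size_t)length + 2u; /* 2 header bytes + payload */

    if (dest == NULL || (data == NULL && length > 0u) || total > dest_size) {
        return 0;
    }

    dest[0] = opcode;
    dest[1] = length;
    if (length > 0u) {
        memcpy(&dest[2], data, length);
    }
    return total;
}
```

With a 6-byte payload like the one built in sync_manager.c, build_frame would produce an 8-byte frame, which matches the length + 2 passed to control_send.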
Finally, control_send, which does the actual sending, is almost identical to bt_nus_send from the NUS sample:
int control_send(struct bt_conn *conn, const uint8_t *data, uint16_t len)
{
    struct bt_gatt_indicate_params params = {0};
    const struct bt_gatt_attr *attr = &eqs_service.attrs[2];

    params.attr = attr;
    params.data = data;
    params.len = len;
    params.func = control_on_sent;

    if (!conn) {
        LOG_DBG("Indication sent to all connected peers (control)");
        return bt_gatt_indicate(NULL, &params);
    } else if (bt_gatt_is_subscribed(conn, attr, BT_GATT_CCC_INDICATE)) {
        return bt_gatt_indicate(conn, &params);
    } else {
        return -EINVAL;
    }
}
All these functions work properly in other parts of the code, but for some reason I get the error above when calling them from sync_manager. This is confusing, because I don't think I am using them any differently than before. Some examples of calls that work:
uint8_t data[AREA_COUNT-1];
for (int ii = 0; ii < AREA_COUNT-1; ii++) {
    data[ii] = storage_buffer_get_nb_area(ii);
}
EQS_INFO("Number of sessions in RAW_IMU: %u - RAW_ECG: %u - PROCESSED: %u", data[AREA_RAW_IMU], data[AREA_RAW_ECG], data[AREA_PROCESSED]);
ble_eqss_control_queue(RESPONSE_BIT | PERIPHERAL_SYNC_MEM_STATUS, data, sizeof(data));

static void _resp_sys_serial()
{
    uint8_t temp[8] = {0};
    hwinfo_get_device_id(temp, 8);
    _queue_command((uint32_t)(PERIPHERAL_SYS_SERIAL | RESPONSE_BIT), temp, 8);
    return;
}
I used arm-none-eabi-addr2line to look up the faulting instruction address reported in the error (0x200099c0), but it only returned "??:0".
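As a quick sanity check on the addresses in the fault dump, the sketch below classifies an address as flash or RAM. It assumes the nRF52840 memory map of 1 MB of flash starting at 0x00000000 and 256 KB of RAM starting at 0x20000000; the function and enum names are made up for illustration. Under those assumptions the faulting pc (0x200099c0) lands in RAM while lr (0x0003df3d) lands in flash, which would be consistent with addr2line finding no source line for the pc.

```c
#include <stdint.h>

/* Hypothetical region tags, for illustration only. */
typedef enum {
    REGION_FLASH,
    REGION_RAM,
    REGION_OTHER
} mem_region_t;

/* Classify an address against the assumed nRF52840 memory map:
 * flash 0x00000000..0x000FFFFF (1 MB), RAM 0x20000000..0x2003FFFF (256 KB). */
static mem_region_t classify_address(uint32_t addr)
{
    if (addr < 0x00100000u) {
        return REGION_FLASH;
    }
    if (addr >= 0x20000000u && addr < 0x20040000u) {
        return REGION_RAM;
    }
    return REGION_OTHER;
}
```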
When I looked around, I found that this type of error is often related to thread stack size, so I used the thread analyzer to check stack usage. At one point the main stack reached 95% usage, so I increased CONFIG_MAIN_STACK_SIZE to 4096. Because the error reports BT RX as the faulting thread, I also increased CONFIG_BT_RX_STACK_SIZE and CONFIG_BT_HCI_TX_STACK_SIZE to 4096 to be safe, but that did not solve the problem.
Do you have any idea what I could try next to solve this issue?
Thanks in advance for your help.
Nicolas G.