Instruction Access Violation when sending data over Bluetooth

Hi Nordic Devzone,

I am bringing up a custom board based on the nRF52840. So far I have implemented basic Bluetooth communication and data gathering from the sensors on the board, and everything has worked fine. Now I would like to send the gathered data to a phone app using a process developed for another project, so I adapted that code to run on Zephyr. However, I keep getting the following error right after I send data over Bluetooth:

[00:00:58.587,066] <err> os: ***** MPU FAULT *****
[00:00:58.587,463] <err> os:   Instruction Access Violation
[00:00:58.587,951] <err> os: r0/a1:  0x20003260  r1/a2:  0x20009988  r2/a3:  0x00000000
[00:00:58.588,592] <err> os: r3/a4:  0x200099c0 r12/ip:  0x0000001e r14/lr:  0x0003df3d
[00:00:58.589,233] <err> os:  xpsr:  0x60000000
[00:00:58.589,721] <err> os: s[ 0]:  0x00000000  s[ 1]:  0x00000000  s[ 2]:  0x00000000  s[ 3]:  0x00000000
[00:00:58.590,484] <err> os: s[ 4]:  0x00000000  s[ 5]:  0x00000000  s[ 6]:  0x00000000  s[ 7]:  0x00000000
[00:00:58.591,247] <err> os: s[ 8]:  0x00000000  s[ 9]:  0x00000000  s[10]:  0x00000000  s[11]:  0x00000000
[00:00:58.592,010] <err> os: s[12]:  0x00000000  s[13]:  0x00000000  s[14]:  0x00000000  s[15]:  0x00000000
[00:00:58.592,773] <err> os: fpscr:  0x2000d1d9
[00:00:58.593,200] <err> os: Faulting instruction address (r15/pc): 0x200099c0
[00:00:58.593,780] <err> os: >>> ZEPHYR FATAL ERROR 20: Unknown error on CPU 0
[00:00:58.594,360] <err> os: Current thread: 0x20003138 (BT RX)
[00:00:58.599,853] <err> fatal_error: Resetting system

I can't figure out why I get this error, because I use the same function to send data over Bluetooth in other parts of the code and it works fine there. I based my code on the NUS sample. Here are extracts of the parts that cause problems:

sync_manager.c:

uint8_t data[6];
data[0] = (uint8_t)evt->area_id;  // Data char. ID (manual or automatic)
data[1] = (uint8_t)(pkt_data[evt->area_id].timestamp & 0xFF);
data[2] = (uint8_t)((pkt_data[evt->area_id].timestamp >> 8) & 0xFF);
data[3] = (uint8_t)((pkt_data[evt->area_id].timestamp >> 16) & 0xFF);
data[4] = (uint8_t)((pkt_data[evt->area_id].timestamp >> 24) & 0xFF);

if (evt->area_id == AREA_STORAGE) {
    data[5] = pkt_data[evt->area_id].end_reason;
} else {
    data[5] = (uint8_t)ds_get_setting(DS_END_PROCESSED_REAS_INDEX);
}

ble_eqss_control_queue(evt->area_id == AREA_STORAGE ? PERIPHERAL_SYNC_SESSION_REQ : PERIPHERAL_SYNC_REQ, data, sizeof(data));

The error occurs when ble_eqss_control_queue is called. It is defined as follows:

static uint8_t _queue_command(uint8_t opcode, uint8_t *data, uint8_t length)
{
    if (length > EQS_CONTROL_DATA_MAX_LEN)
    {
        LOG_ERR("Data length too long");
        return 0;
    }
    else
    {
        uint8_t dest[EQS_CONTROL_DATA_MAX_LEN];
        memcpy(&dest[0], &opcode, sizeof(uint8_t));
        memcpy(&dest[1], &length, sizeof(uint8_t));
        memcpy(&dest[2], data, length);
        char str[60];
        sprintf(str, "Building Opcode %.2x - Length %.2x - Payload", opcode, length);
        LOG_HEXDUMP_INF(data, length, str);

        int8_t ret = control_send(NULL, dest, length + 2);
        if (ret == 0)
        {
            return 1;
        }
        else
        {
            LOG_ERR("Failed to send opcode %.2x (%i)", opcode, ret);
            return 0;
        }
    }
}

uint32_t ble_eqss_control_queue(uint8_t opcode, uint8_t* data, uint8_t len){
    return _queue_command(opcode, data, len);
}

And finally, control_send, which does the actual sending, is almost identical to bt_nus_send from the NUS sample:

int control_send(struct bt_conn *conn, const uint8_t *data, uint16_t len)
{
	struct bt_gatt_indicate_params params = {0};
	const struct bt_gatt_attr *attr = &eqs_service.attrs[2];

	params.attr = attr;
	params.data = data;
	params.len = len;
	params.func = control_on_sent;

	if (!conn)
	{
		LOG_DBG("Indication sent to all connected peers (control)");
		return bt_gatt_indicate(NULL, &params);
	}
	else if (bt_gatt_is_subscribed(conn, attr, BT_GATT_CCC_INDICATE))
	{
		return bt_gatt_indicate(conn, &params);
	}
	else
	{
		return -EINVAL;
	}
}

All these functions work properly in other parts of the code, but for some reason I get the error above when calling them from sync_manager. This is confusing, because I don't think I am using them any differently than before. Some examples of calls that work:

uint8_t data[AREA_COUNT-1];
for (int ii = 0; ii < AREA_COUNT-1; ii++) {
    data[ii] = storage_buffer_get_nb_area(ii);
}
EQS_INFO("Number of sessions in RAW_IMU: %u - RAW_ECG: %u - PROCESSED: %u", data[AREA_RAW_IMU], data[AREA_RAW_ECG], data[AREA_PROCESSED]);
ble_eqss_control_queue(RESPONSE_BIT | PERIPHERAL_SYNC_MEM_STATUS, data, sizeof(data));

static void _resp_sys_serial()
{
    uint8_t temp[8] = {0};
    hwinfo_get_device_id(temp, 8);
    _queue_command((uint32_t)(PERIPHERAL_SYS_SERIAL | RESPONSE_BIT), temp, 8);
    return;
}

I ran arm-none-eabi-addr2line on the faulting instruction address reported in the error, but it only returned "??:0".

When I looked around, I found that this type of error is often related to thread stack size, so I used the thread analyzer to check stack usage per thread. At some point the main stack reached 95% usage, so I increased CONFIG_MAIN_STACK_SIZE to 4096. Because the error reports BT RX as the thread in which the fault occurs, I also increased CONFIG_BT_RX_STACK_SIZE and CONFIG_BT_HCI_TX_STACK_SIZE to 4096 to be safe, but that did not solve the problem.
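
For reference, the settings described above would look roughly like this in prj.conf. The three stack sizes are the values mentioned above; the thread analyzer options are an assumed setup, since the exact options used are not listed here.

```
# Stack sizes raised as described above
CONFIG_MAIN_STACK_SIZE=4096
CONFIG_BT_RX_STACK_SIZE=4096
CONFIG_BT_HCI_TX_STACK_SIZE=4096

# Thread analyzer (assumed options, shown for illustration)
CONFIG_THREAD_ANALYZER=y
CONFIG_THREAD_ANALYZER_AUTO=y
CONFIG_THREAD_ANALYZER_AUTO_INTERVAL=5
CONFIG_THREAD_ANALYZER_USE_LOG=y
CONFIG_THREAD_NAME=y
```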

Do you have any idea what I could try next to solve this issue?

Thanks in advance for your help.

Nicolas G.
