Data Bus Error

Question

Hello,

IDE - Visual Studio Code

SDK version - 2.9.0

Computer platform - Win11

I'm at the early stages of a project that first needs to send ~10 KB of data to the nRF52 DK (configured as nRF52832) in a series of slices of between 300 and ~2,000 bytes each. This is done with write without response, in three stages, each with a separate characteristic:

1. Start transfer, sending the total number of bytes which will be sent

2. Repeated transfers until the number of bytes is reached

3. End transfer, detailing how many bytes the data will expand to using LZ4 (though this is not implemented yet)

I've got a GATT characteristic for each item:

Start Transfer

/**
 * @brief This is called by the GATT service definition, it contains the number of bytes to receive
 * as a 32 bit little endian unsigned number
 */
static ssize_t start_vid_trans(struct bt_conn *conn,
			  const struct bt_gatt_attr *attr,
			  const void *buf,
			  uint16_t len,
			  uint16_t offset, uint8_t flags)
{
	k_mutex_lock(&trans_mutex, K_FOREVER);
	LOG_INF("Attribute write, handle: %u, conn: %p", attr->handle,
		(void *)conn);

	const uint8_t *comp_size_data = (const uint8_t *) buf;
	/* Widen each byte to uint32_t before shifting so the shift by 24 is well defined */
	uint32_t num_bytes = (uint32_t) comp_size_data[0] | ((uint32_t) comp_size_data[1] << 8) | ((uint32_t) comp_size_data[2] << 16) | ((uint32_t) comp_size_data[3] << 24);
	LOG_DBG("num bytes is %" PRIu32 , num_bytes);
	bw_callbacs.bw_start_trans(num_bytes);
	LOG_INF("returning success");
	return BT_GATT_ERR(BT_ATT_ERR_SUCCESS);
}
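
The callback above reads four bytes from buf without looking at len, so a short write would read past the end of the payload. As a minimal sketch (decode_le32 is a hypothetical helper, not part of the original code), the decoding could be length-checked like this. Zephyr also provides sys_get_le32() in <zephyr/sys/byteorder.h> for the decoding step itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the original post): decode a 32-bit
 * little-endian value from a write payload, rejecting short payloads.
 */
static bool decode_le32(const void *buf, size_t len, uint32_t *out)
{
	const uint8_t *p = buf;

	if (buf == NULL || out == NULL || len < sizeof(uint32_t)) {
		return false;
	}

	/* Widen each byte to uint32_t before shifting so that the shift
	 * by 24 never happens on a signed int. */
	*out = (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[3] << 24);
	return true;
}
```

In a Zephyr write callback, a failed decode would typically be reported with BT_GATT_ERR(BT_ATT_ERR_INVALID_ATTRIBUTE_LEN), and a successful write would return len, since the callback is documented to return the number of bytes written.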

This calls a separate method which mallocs the memory for the data:

/**
 * @brief This is called to notify that a series of bytes will be sent
 * which will represent a compressed image
 */
static int app_bw_start_trans(uint32_t compressed_size){

	if(image_data != NULL){
		cleanup_image_data();
	}
	num_compressed_bytes = compressed_size;
	LOG_INF("Allocating memory to image_data with %" PRIu32 " bytes", num_compressed_bytes);
	image_data = k_malloc(num_compressed_bytes * sizeof(uint8_t));
	/* Check the allocation before touching the buffer; memset on NULL would fault */
	if (image_data == NULL){
		LOG_WRN("Failed to allocate memory to new image_data buffer");
		return -ENOMEM;
	}
	memset(image_data, 0, compressed_size);
	LOG_INF("Allocated memory to image_data");

	return 0;
}

Then the data is repeatedly transferred and the image_data variable has the data appended to it until we reach the end:

/**
 * @brief This is called by the GATT service definition, it contains the number of
 * bytes of the uncompressed data as a 32 bit little endian unsigned number
 */
static ssize_t end_vid_trans(struct bt_conn *conn,
			  const struct bt_gatt_attr *attr,
			  const void *buf,
			  uint16_t len,
			  uint16_t offset, uint8_t flags)
{
	LOG_INF("Attribute write, handle: %u, conn: %p", attr->handle,
		(void *)conn);

	const uint8_t *uncomp_size_data = (const uint8_t *) buf;
	/* Widen each byte to uint32_t before shifting so the shift by 24 is well defined */
	uint32_t uncompressed_size = (uint32_t) uncomp_size_data[0] | ((uint32_t) uncomp_size_data[1] << 8) | ((uint32_t) uncomp_size_data[2] << 16) | ((uint32_t) uncomp_size_data[3] << 24);
	LOG_DBG("we have %" PRIu32 " uncompressed bytes", uncompressed_size);

	bw_callbacs.bw_end_trans(uncompressed_size);
	k_mutex_unlock(&trans_mutex);
	LOG_INF("returning success");
	return BT_GATT_ERR(BT_ATT_ERR_SUCCESS);
}

The end basically just frees up the image_data memory (we'll add the implementation to actually process the data later):

/**
 * @brief This is called at the end of the data transmission and will prompt the image to be stored
 * in permanent memory
 */
static int app_bw_end_trans(uint32_t uncompressed_size){
	LOG_INF("End trans with %" PRIu32 " bytes", uncompressed_size);

	LOG_INF("cleaning image data");
	k_free(image_data);
	LOG_INF("freed image data memory");
	image_data = NULL;
	image_offset = 0;
	LOG_INF("set image offset to zero");

	return 0;
}

When I execute this I have success for the first ~5K but then I get the following error:

[00:00:16.530,242] <err> os: ***** BUS FAULT *****
[00:00:16.530,303] <err> os:   Precise data bus error
[00:00:16.530,334] <err> os:   BFAR Address: 0x3e00bd1e
[00:00:16.530,364] <err> os: r0/a1:  0x3e00bd1e  r1/a2:  0x20007f00  r2/a3:  0x784283a5
[00:00:16.530,395] <err> os: r3/a4:  0x00000040 r12/ip:  0x00000004 r14/lr:  0x00021207
[00:00:16.530,426] <err> os:  xpsr:  0x21000000
[00:00:16.530,456] <err> os: Faulting instruction address (r15/pc): 0x00023cc0
[00:00:16.530,548] <err> os: >>> ZEPHYR FATAL ERROR 25: Unknown error on CPU 0
[00:00:16.530,609] <err> os: Current thread: 0x200027b8 (unknown)
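
One way to read this log is to compare the reported addresses with the chip's memory map. The sketch below assumes the nRF52832 (QFAA) layout of 512 KB of flash at 0x00000000 and 64 KB of RAM at 0x20000000; classify_addr and the constants are hypothetical names used only for illustration. Under that assumption the BFAR address 0x3e00bd1e lies outside both regions, which points to a corrupted pointer, while r1 (0x20007f00) is in RAM and the faulting PC (0x00023cc0) is in flash.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed nRF52832 (QFAA) memory map; adjust for other variants. */
#define FLASH_BASE_ADDR 0x00000000u
#define FLASH_SIZE      (512u * 1024u)
#define RAM_BASE_ADDR   0x20000000u
#define RAM_SIZE        (64u * 1024u)

enum addr_region {
	ADDR_FLASH, /* inside on-chip flash */
	ADDR_RAM,   /* inside on-chip RAM */
	ADDR_OTHER  /* outside both flash and RAM */
};

/* True when addr lies in [base, base + size). */
static bool in_range(uint32_t addr, uint32_t base, uint32_t size)
{
	return addr >= base && (addr - base) < size;
}

/* Hypothetical helper: say which region a fault-log address falls in. */
static enum addr_region classify_addr(uint32_t addr)
{
	if (in_range(addr, FLASH_BASE_ADDR, FLASH_SIZE)) {
		return ADDR_FLASH;
	}
	if (in_range(addr, RAM_BASE_ADDR, RAM_SIZE)) {
		return ADDR_RAM;
	}
	return ADDR_OTHER;
}
```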

I've used coredump to get a stack trace from this which gives:

(gdb) target remote localhost:1234
Remote debugging using localhost:1234
z_spin_lock_valid (l=0x585f0016, l@entry=0x0 <store_video>) at C:/ncs/v2.9.0/zephyr/kernel/spinlock_validate.c:11
11              uintptr_t thread_cpu = l->thread_cpu;
(gdb) bt
#0  z_spin_lock_valid (l=0x585f0016, l@entry=0x0 <store_video>) at C:/ncs/v2.9.0/zephyr/kernel/spinlock_validate.c:11
#1  0x000211f6 in z_spinlock_validate_pre (l=0x0 <store_video>) at C:/ncs/v2.9.0/zephyr/include/zephyr/spinlock.h:136
#2  k_spin_lock (l=0x0 <store_video>) at C:/ncs/v2.9.0/zephyr/include/zephyr/spinlock.h:193
#3  k_heap_free (heap=0x0 <store_video>, mem=0x0 <store_video>) at C:/ncs/v2.9.0/zephyr/kernel/kheap.c:151
#4  0x000211f6 in z_spinlock_validate_pre (l=0x2000ad28 <net_buf_data_hci_cmd_pool+140>)
    at C:/ncs/v2.9.0/zephyr/include/zephyr/spinlock.h:136
#5  k_spin_lock (l=0x2000ad28 <net_buf_data_hci_cmd_pool+140>) at C:/ncs/v2.9.0/zephyr/include/zephyr/spinlock.h:193
#6  k_heap_free (heap=0xa4ce2045, mem=0x40 <_irq_vector_table>) at C:/ncs/v2.9.0/zephyr/kernel/kheap.c:151
#7  0x200008b0 in fragments ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

I'm not sure where to go next. Is it simply that the other end (currently a test Python script sending data using bleak) is running much faster than the Nordic device and overrunning it because it cannot keep up with the data? I'm not sure about that, though, because I'm not sending _huge_ amounts of data. If that is the case, do I set up a notify system between each slice as a handshake so that the central 'slows down'? I've tried increasing the stack size up to 16K but it still fails at the same place.

Thanks for your help.

Cheers,

Neil

Neil Benn · Accepted Answer

Hello,

Thanks, but I had the same error. However, I've fixed it, and it was a silly typo in the code. The function that copies the data into the image_data array is as follows:

/**
 * @brief This will represent a series of bytes for an image
 * @param data the data to be stored
 * @param len the number of bytes in the data
 */
static int app_bw_vid_trans(uint8_t *data, uint32_t len){

	k_mutex_lock(&trans_mutex, K_FOREVER);
	memcpy(data + image_offset, data, (size_t) len);
	k_mutex_unlock(&trans_mutex);
	image_offset += len;
	LOG_DBG("got data of %" PRIu32 " bytes total received is %" PRIu32 " bytes", len, image_offset);

	return 0;
}

If you look closely at the memcpy line, it isn't copying the incoming data into the image_data array but over the data array itself, which points back into the Bluetooth stack's buffer. That is likely what caused the memory overflow.
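
For reference, a corrected copy could look like the sketch below. It is a minimal, standalone illustration rather than the exact fix that was applied: struct transfer_buf and transfer_append are hypothetical names, with data, capacity and offset standing in for image_data, num_compressed_bytes and image_offset. The destination is the image buffer plus the running offset, and a slice that would run past the announced size is rejected instead of being copied.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the transfer state. */
struct transfer_buf {
	uint8_t *data;     /* destination buffer (image_data in the post) */
	uint32_t capacity; /* total size announced at the start of the transfer */
	uint32_t offset;   /* bytes received so far (image_offset in the post) */
};

/* Append one slice to the buffer; returns 0 on success, -1 if rejected. */
static int transfer_append(struct transfer_buf *t, const uint8_t *src,
			   uint32_t len)
{
	if (t == NULL || t->data == NULL || src == NULL) {
		return -1;
	}

	/* Reject slices that would run past the announced size. The check
	 * is written as a subtraction so it cannot overflow. */
	if (t->offset > t->capacity || len > t->capacity - t->offset) {
		return -1;
	}

	/* Destination is the image buffer plus the offset; source is the
	 * incoming slice. */
	memcpy(t->data + t->offset, src, len);
	t->offset += len;
	return 0;
}
```

In the Zephyr code, both the copy and the offset update would presumably sit between k_mutex_lock() and k_mutex_unlock(), so that the offset is never read while it is half updated.
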
I basically led myself down a blind alley rather than reading the code properly! Thank you for your help, and apologies for wasting your time. At least I've learned some debugging tricks.

Cheers,

Neil
