Corrupted Heap Bounds Error

Hello,

To provide some context, I am using an ANNA-B112 with nRF52832 on a custom PCB board we manufactured for evaluation. The nRF52832 is setup with the NUS service with MTU changed to 251 chars. 

I have been noticing strange behavior on 1/10 of the boards. On this board, if a message is sent longer than around 70 chars, this is causing the system to have a bus fault. Here are the RTT logs from the bus fault:

ASSERTION FAIL [left_chunk(h, right_chunk(h, c)) == c] @ WEST_TOPDIR/zephyr/lib/os/heap.c:183
        corrupted heap bounds (buffer overflow?) for memory at 0x2000a364
[00:01:45.722,381] <err> os: r0/a1:  0x00000004  r1/a2:  0x000000b7  r2/a3:  0x00000002
[00:01:45.722,412] <err> os: r3/a4:  0x20001d58 r12/ip:  0x0000000c r14/lr:  0x000125d1
[00:01:45.722,412] <err> os:  xpsr:  0x41000000
[00:01:45.722,442] <err> os: Faulting instruction address (r15/pc): 0x0002a3b2
[00:01:45.722,473] <err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
[00:01:45.722,503] <err> os: Current thread: 0x200033d8 (unknown)
[00:01:45.957,092] <err> fatal_error: Resetting system
*** Booting nRF Connect SDK v2.5.0 ***

I turned on assert and this told me there is a corrupted heap bounds error, and that it failed at the heap.c file on line 183. Before I turned on assert, the error stated a faulting instruction address which, using addr2line, lead me to heap.c file on line 58. The functions that faulted on heap.c are below:

heap.c line 183:

void sys_heap_free(struct sys_heap *heap, void *mem)
{
	if (mem == NULL) {
		return; /* ISO C free() semantics */
	}
	struct z_heap *h = heap->heap;
	chunkid_t c = mem_to_chunkid(h, mem);

	/*
	 * This should catch many double-free cases.
	 * This is cheap enough so let's do it all the time.
	 */
	__ASSERT(chunk_used(h, c),
		 "unexpected heap state (double-free?) for memory at %p", mem);

	/*
	 * It is easy to catch many common memory overflow cases with
	 * a quick check on this and next chunk header fields that are
	 * immediately before and after the freed memory.
	 */
	__ASSERT(left_chunk(h, right_chunk(h, c)) == c,
		 "corrupted heap bounds (buffer overflow?) for memory at %p",
		 mem);

	set_chunk_used(h, c, false);
#ifdef CONFIG_SYS_HEAP_RUNTIME_STATS
	h->allocated_bytes -= chunksz_to_bytes(h, chunk_size(h, c));
#endif

#ifdef CONFIG_SYS_HEAP_LISTENER
	heap_listener_notify_free(HEAP_ID_FROM_POINTER(heap), mem,
				  chunksz_to_bytes(h, chunk_size(h, c)));
#endif

	free_chunk(h, c);
}

heap.c line 58:

static void free_list_remove_bidx(struct z_heap *h, chunkid_t c, int bidx)
{
	struct z_heap_bucket *b = &h->buckets[bidx];

	CHECK(!chunk_used(h, c));
	CHECK(b->next != 0);
	CHECK(h->avail_buckets & BIT(bidx));

	if (next_free_chunk(h, c) == c) {
		/* this is the last chunk */
		h->avail_buckets &= ~BIT(bidx);
		b->next = 0;
	} else {
		chunkid_t first = prev_free_chunk(h, c),
			  second = next_free_chunk(h, c);

		b->next = second;
		set_next_free_chunk(h, first, second);
		set_prev_free_chunk(h, second, first);
	}

#ifdef CONFIG_SYS_HEAP_RUNTIME_STATS
	h->free_bytes -= chunksz_to_bytes(h, chunk_size(h, c));
#endif
}

I am starting to notice these errors more often after I had increased the heap size from 2048 to 4096. Here is my prj.conf file for context.

CONFIG_ASSERT=y

# Logging Module
CONFIG_LOG=y
CONFIG_UART_CONSOLE=n
CONFIG_USE_SEGGER_RTT=y
CONFIG_LOG_BACKEND_RTT=y

# Enable Thread debugging
# CONFIG_DEBUG_THREAD_INFO=y
# CONFIG_DEBUG_OPTIMIZATIONS=y

# RAM
CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=2048
CONFIG_HEAP_MEM_POOL_SIZE=4096
CONFIG_MAIN_STACK_SIZE=4096

# BLE Settings
CONFIG_BT=y
CONFIG_BT_PERIPHERAL=y
CONFIG_BT_DEVICE_APPEARANCE=833
CONFIG_BT_DEVICE_NAME="A040000010"
CONFIG_BT_MAX_PAIRED=1
CONFIG_BT_RX_STACK_SIZE=2048

# For Requesting ATT_MTU Update
CONFIG_BT_GATT_CLIENT=y
CONFIG_BT_USER_DATA_LEN_UPDATE=y
CONFIG_BT_CTLR_DATA_LENGTH_MAX=251
CONFIG_BT_BUF_ACL_TX_SIZE=251
CONFIG_BT_BUF_ACL_RX_SIZE=251
CONFIG_BT_L2CAP_TX_MTU=247

# BLE NUS Service
CONFIG_BT_NUS=y
CONFIG_BT_NUS_AUTHEN=n

# Buttons and LEDs Library
CONFIG_DK_LIBRARY=y

# Use External LFCLK for UART RX Byte Counting
CONFIG_UART_0_NRF_HW_ASYNC=y
CONFIG_UART_0_NRF_HW_ASYNC_TIMER=2

# Async UART
CONFIG_UART_ASYNC_API=y
CONFIG_NRFX_UARTE0=y
CONFIG_SERIAL=y
CONFIG_GPIO=y
CONFIG_CONSOLE=y
CONFIG_UART_0_NRF_PARITY_BIT=n
CONFIG_BT_NUS_UART_BUFFER_SIZE=200

# Analog Pins
CONFIG_ADC=y

# Power Management 
CONFIG_PM=y
CONFIG_PM_DEVICE=y
CONFIG_PM_DEVICE_RUNTIME=y

# NVS Memory
CONFIG_FLASH=y
CONFIG_FLASH_MAP=y
CONFIG_FILE_SYSTEM=y
CONFIG_FILE_SYSTEM_LITTLEFS=y

The error messages lead me to believe that I made some sort of memory allocation error, but since this is only affecting one device I am a little confused. I have also noticed that the one affected device is draining the LiPo battery connected to the custom board quicker than the other devices. 

How can I debug this issue based on these error messages?

Thank you!

Crisvin Kadambathil 

Parents
  • Hi,

    " Current thread: 0x200033d8 (unknown)"

    If you set CONFIG_THREAD_NAME, the "unknown" will let you know which thread this is.

    Let's start with this, it usually gives some good indicators

    Regards,
    Sigurd Hellesvik

  • Hello Sigurd!

    I apologize for getting back to you late.

    I updated the prj.conf to set CONFIG_THREAD_NAME. And this is what I got as a result:

    It appears that the fault occurred on the main thread, so I will share that code with you as well:

    int my_init()
    {
    	// Initiate peripherals, BLE, file system, workqueue, and work items
    	int err;
    
    	// Enable Buttons and Lights
    	err = dk_leds_init();
    	if (err)
    	{
    		LOG_ERR("LEDs init failed (err %d)\n", err);
    		return err;
    	}
    
    	// Enable UART
    	err = uart_init();
    	if (err)
    	{
    		LOG_ERR("UART init failed (err %d)\n", err);
    		return err;
    	}
    
    	// Initiate ADC
    	if (!adc_is_ready_dt(&adc_channel))
    	{
    		LOG_ERR("ADC controller device %s not ready", adc_channel.dev->name);
    		return err;
    	}
    
    	err = adc_channel_setup_dt(&adc_channel);
    	if (err < 0)
    	{
    		LOG_ERR("Could not setup channel #%d (%d)", 0, err);
    		return err;
    	}
    
    	err = adc_sequence_init_dt(&adc_channel, &sequence);
    	if (err < 0) 
    	{
    		LOG_ERR("Could not initalize sequnce");
    		return err;
    	}
    
    	// Enable BLE Stack
    	err = bt_enable(NULL);
    	if (err)
    	{
    		LOG_ERR("BLE init failed (err %d)\n", err);
    		return err;
    	}
    
    	// Initiallize NUS Service
    	err = bt_nus_init(&nus_cb);
    	if (err)
    	{
    		LOG_ERR("NUS Service init failed (err %d)", err);
    		return err;
    	}
    
    	// Register connection callbacks
    	bt_conn_cb_register(&connection_callbacks);
    
    	// Start advertising BLE
    	err = bt_le_adv_start(adv_param, ad, ARRAY_SIZE(ad), sd, ARRAY_SIZE(sd));
    	if (err)
    	{
    		LOG_ERR("Advertising failed to start (err %d)\n", err);
    		return err;
    	}
    
    	// Start uart_receive_thread()
    	k_sem_give(&ble_init_ok);
    
    	// Initialize LittleFS
    	fs_file_t_init(&myFile);
    	fs_dir_t_init(&myDirectory);
    
    	// Mount file system
    	err = fs_mount(&littlefs_mnt);
    	if (err)
    	{
    		LOG_ERR("Error mounting LittleFS [%d]\n", err);
    		return err;
    	}
    
    	// Initialize workqueue for file system
    	k_work_queue_init(&file_work_q);
    	k_work_queue_start(&file_work_q, work_q_stack_area, K_THREAD_STACK_SIZEOF(work_q_stack_area), WORKQ_PRIORITY, NULL);
    
    	// Initialize all work items with their handler
    	k_work_init_delayable(&write_work_item.d_work, littleFS_write_handler);
    	k_work_init_delayable(&write_header_work_item.d_work, littleFS_write_header_handler);
    	k_work_init_delayable(&directory_work_item.d_work, littleFS_dir_handler);
    	k_work_init_delayable(&erase_work_item.d_work, littleFS_erase_handler);
    	k_work_init_delayable(&read_work_item.d_work, littleFS_read_handler);
    	k_work_init_delayable(&read_header_work_item.d_work, littleFS_read_header_handler);
    
    	return 0;
    }
    
    void parse_cmd_a(char *input_string, char *meas_time, char *freq, char *filename)
    {
    	// Parse the 'a' command used for starting continuous measurements
    
    	char *argument;
    
    	argument = strtok(input_string + 2, "_");
    
    	for (uint8_t i = 0; i < 3; i++)
    	{
    		if (argument == NULL)
    		{
    			return;
    		}
    
    		switch (i)
    		{
    			// 1st Argument: Measurement rate (3-digit number)
    			case 0:
    				if (strspn(argument, "0123456789") != 3)
    				{
    					return;
    				}
    
    				strncpy(meas_time, argument, 3);
    				break;
    
    			// 2nd Argument: EIS frequency (3-digit number)
    			case 1:
    				if (strspn(argument, "0123456789") != 3)
    				{
    					return;
    				}
    
    				strncpy(freq, argument, 3);
    				break;
    
    			// 3rd Argument: File header (<100 char title)
    			case 2:
    				if (strlen(argument) > 100)
    				{
    					return;
    				}
    
    				strncpy(filename, argument, 100);
    				break;
    
    			default:
    				break;
    		}
    		
    		argument = strtok(NULL, "_");
    	}
    }
    
    /*------------------------------Main Thread------------------------------*/
    
    int main(void)
    {
    	LOG_INF("Starting AWAREv4.0_AlphaV0.4");
    	
    	int err;
    	char cmd_buf[251] = {0};
    	char measTimeSec_buf[4] = {0};
    	char frequencyEIS_buf[4] = {0};
    	char filename_buf[100] = {0};
    	bool measOngoing = false;
    	bool fileDataCleared = false;
    	int batVoltage;
    	uint8_t adcMeas[30];
    
    	// Initiate peripherals, BLE, file system, workqueue, and work items
    	err = my_init();
    	if (err)
    	{
    		return err;
    	}
    
    	// Check file storage status: data in file/data not in file flag
    	struct fs_dirent entry;
    	err = fs_stat("/lfs/test.txt", &entry);
    	if (err)
    	{
    		fileDataCleared = true;
    	}
    
    	// Startup Sequence: 
    	// Turn on blue LED, reset pico, hibernate pico, turn off blue LED
    
    	dk_set_led_off(RED_LED);
    	dk_set_led_off(GREEN_LED);
    	dk_set_led_on(BLUE_LED);
    
    	uart_resume();
    
    	k_msleep(100);
    	uart_tx(uart, RESET_PICO, strlen(RESET_PICO), SYS_FOREVER_MS);
    	k_msleep(500);
    	uart_tx(uart, ALL_IN_HIBERNATE, strlen(ALL_IN_HIBERNATE), SYS_FOREVER_MS);
    	k_msleep(500);
    
    	uart_suspend();
    
    	dk_set_led_off(BLUE_LED);
    
    	for (;;)
    	{
    		// Wait for command from NUS
    		struct uart_data_t *nus_command;
    		nus_command = k_fifo_get(&nus_receive_data, K_FOREVER);
    
    		// Save command in command buffer
    		memcpy(cmd_buf, nus_command->data, nus_command->len);
    
    		k_free(nus_command);
    
    		switch (*cmd_buf)
    		{
    			// CMD 'o': Turn on blue LEDs
    			case 'o':
    				dk_set_led_off(RED_LED);
    				dk_set_led_off(GREEN_LED);
    				dk_set_led_on(BLUE_LED);
    
    				break;
    
    			// CMD 'i': Get information on device (hardwareID, fimrwareID, battery voltage, storage used)
    			case 'i':
    				dk_set_led_off(BLUE_LED);
    
    				// Send hardware and firmware ID
    				bt_nus_send(NULL, deviceHardwareID, strlen(deviceHardwareID));
    				bt_nus_send(NULL, annaFirmwareID, strlen(annaFirmwareID));
    
    				// Get and send battery voltage
    				get_bat_voltage();
    				k_msleep(100); 
    				batVoltage = get_bat_voltage(); 
    				sprintf(adcMeas, "Battery Voltage: %d mV\n", batVoltage);
    				bt_nus_send(NULL, adcMeas, strlen(adcMeas));
    
    				// Schedule littleFS_dir_handler() -> littleFS_dir_handler() will send storage used
    				k_work_schedule_for_queue(&file_work_q, &directory_work_item.d_work, K_NO_WAIT);
    
    				break;
    
    			// CMD 'e': Erase all data in file storage
    			case 'e':
    				dk_set_led_off(BLUE_LED);
    
    				// ERROR a!1001: This CMD is locked during ongoing measurements
    				if (measOngoing)
    				{
    					bt_nus_send(NULL, "a!1001\n", 7);
    					break;
    				}
    
    				// Schedule littleFS_erase_handler()
    				k_work_schedule_for_queue(&file_work_q, &erase_work_item.d_work, K_NO_WAIT);
    
    				fileDataCleared = true;
    
    				break;
    
    			// CMD 'x': Erase all data in file storage; override any checks for ongoing measurements
    			case 'x':
    				dk_set_led_off(BLUE_LED);
    
    				// Schedule littleFS_erase_handler()
    				k_work_schedule_for_queue(&file_work_q, &erase_work_item.d_work, K_NO_WAIT);
    
    				fileDataCleared = true;
    
    				break;
    
    			// CMD 'r': Read all data in file
    			case 'r':
    				dk_set_led_off(BLUE_LED);
    
    				// Schedule littleFS_read_handler()
    				k_work_schedule_for_queue(&file_work_q, &read_work_item.d_work, K_NO_WAIT);
    
    				break;
    
    			// CMD 'h': Read header data from file
    			case 'h':
    				dk_set_led_off(BLUE_LED);
    
    				// Schedule littleFS_read_header_handler()
    				k_work_schedule_for_queue(&file_work_q, &read_header_work_item.d_work, K_NO_WAIT);
    
    				break;
    
    			// CMD 'a': Start a continous measurement; measurement timer, EIS frequency, and file header are argument
    			case 'a':
    				dk_set_led_off(BLUE_LED);
    
    				// ERROR a!1001: This CMD is locked during ongoing measurements
    				if (measOngoing)
    				{
    					bt_nus_send(NULL, "a!1001\n", 7);
    					break;
    				}
    
    				// ERROR a!1002: This CMD is locked if there is already data in the file
    				else if (!fileDataCleared)
    				{
    					bt_nus_send(NULL, "a!1002\n", 7);
    					break;
    				}
    
    				parse_cmd_a(cmd_buf, measTimeSec_buf, frequencyEIS_buf, filename_buf);
    
    				// ERROR a!1003: This CMD has an inadaquate number of or incorrectly formatted arguments
    				if (measTimeSec_buf[0] == '\0' || frequencyEIS_buf[0] == '\0' || filename_buf[0] == '\0')
    				{
    					bt_nus_send(NULL, "a!1003\n", 7);
    					break;
    				}
    
    				measTimeSec = atoi(measTimeSec_buf);
    				frequencyEIS = atoi(frequencyEIS_buf);
    
    				// ERROR a!1004: The measurement rate cannot be less than 15 seconds
    				if (measTimeSec < 15)
    				{
    					bt_nus_send(NULL, "a!1004\n", 7);
    					break;
    				}
    				// ERROR a!1005: The EIS frequency cannot be less than 100 Hz
    				else if (frequencyEIS < 100)
    				{
    					bt_nus_send(NULL, "a!1005\n", 7);
    					break;
    				}
    
    				// Set the EIS frequency
    				snprintf(EIS_pt2_freq, 15, " %i %i 4 0m\n", frequencyEIS, frequencyEIS);
    
    				// Write file header to file: put header in fifo -> schedule littleFS_write_header_handler()
    				struct uart_data_t *store_buf;
    				store_buf = k_malloc(sizeof(*store_buf));
    
    				// make sure that k_malloc() does not return NULL ptr
    				if (store_buf)
    				{
    					snprintf(store_buf->data, CONFIG_BT_CTLR_DATA_LENGTH_MAX, ":%s;\n", filename_buf);
    					store_buf->len = strlen(store_buf->data);
    
    					k_fifo_put(&fifo_uart_tx_data, store_buf);
    					k_work_schedule_for_queue(&file_work_q, &write_header_work_item.d_work, K_NO_WAIT);
    
    					k_msleep(50);
    
    					measOngoing = true;
    					fileDataCleared = false;
    
    					// Initiate measurements: start measurement timer -> activate run_meas_thread()
    					write_work_item.meas_count = 0;
    					k_timer_start(&meas_timer, K_SECONDS(measTimeSec), K_SECONDS(measTimeSec));
    					k_thread_resume(run_meas_thread_id);
    					k_sem_give(&run_meas_sem);
    				}
    				// ERROR a!1008: Could not allocate memory
    				else
    				{
    					bt_nus_send(NULL, "a!1008\n", 7);
    					break;
    				}
    
    				break;
    
    			// CMD 'b': Stop an ongoing continous measurement
    			case 'b':
    				dk_set_led_off(BLUE_LED);
    				
    				bt_nus_send(NULL, "Measurements Stopped\n", 21);
    
    				measOngoing = false;
    
    				// Stop measurements: stop timer -> suspend run_meas_thread()
    				k_timer_stop(&meas_timer);
    				k_thread_suspend(run_meas_thread_id);
    
    				break;
    
    			default:
    				break;
    		}
    
    		// Clear buffers
    		measTimeSec_buf[0] = '\0';
    		frequencyEIS_buf[0] = '\0';
    		filename_buf[0] = '\0';
    		memset(cmd_buf, 0, sizeof(cmd_buf));
    	}
    
    	return 0;
    }

    The main function works as follows: 

    1. Initiate all peripherals.

    2. Start up and hibernate our sensor chip.

    3. Wait for messages from bt_receive_cb() of NUS.

    4. Process each command, send work to workqueue threads, etc, and loop back to step 3. 

    The command that usually causes the bus fault is the 'a' command in the switch statement, especially when the command has a really long filename arguments. 

    Additionally, since my original post I have been noticing this issue on other devices as well, but only when I flash them with new firmware. Most of the time I am able to fix the issue by doing "nrfjprog --recover" then reflashing.

    I want to say there are likely problems in my main() thread, but it is confusing to me why this happens only sometimes on some devices and only when I flash the devices with firmware that updates the heap size from 2048 to 4096.

    I also tried increasing the CONFIG_MAIN_STACK_SIZE=8192 from 4096, but this only solves the issue sometimes. 

    Is this kind of inconsistent behavior caused by anything in my code?

    Thank you!

    Crisvin K.

Reply
  • Hello Sigurd!

    I apologize for getting back to you late.

    I updated the prj.conf to set CONFIG_THREAD_NAME. And this is what I got as a result:

    It appears that the fault occurred on the main thread, so I will share that code with you as well:

    int my_init()
    {
    	// Initiate peripherals, BLE, file system, workqueue, and work items
    	int err;
    
    	// Enable Buttons and Lights
    	err = dk_leds_init();
    	if (err)
    	{
    		LOG_ERR("LEDs init failed (err %d)\n", err);
    		return err;
    	}
    
    	// Enable UART
    	err = uart_init();
    	if (err)
    	{
    		LOG_ERR("UART init failed (err %d)\n", err);
    		return err;
    	}
    
    	// Initiate ADC
    	if (!adc_is_ready_dt(&adc_channel))
    	{
    		LOG_ERR("ADC controller device %s not ready", adc_channel.dev->name);
    		return err;
    	}
    
    	err = adc_channel_setup_dt(&adc_channel);
    	if (err < 0)
    	{
    		LOG_ERR("Could not setup channel #%d (%d)", 0, err);
    		return err;
    	}
    
    	err = adc_sequence_init_dt(&adc_channel, &sequence);
    	if (err < 0) 
    	{
    		LOG_ERR("Could not initalize sequnce");
    		return err;
    	}
    
    	// Enable BLE Stack
    	err = bt_enable(NULL);
    	if (err)
    	{
    		LOG_ERR("BLE init failed (err %d)\n", err);
    		return err;
    	}
    
    	// Initiallize NUS Service
    	err = bt_nus_init(&nus_cb);
    	if (err)
    	{
    		LOG_ERR("NUS Service init failed (err %d)", err);
    		return err;
    	}
    
    	// Register connection callbacks
    	bt_conn_cb_register(&connection_callbacks);
    
    	// Start advertising BLE
    	err = bt_le_adv_start(adv_param, ad, ARRAY_SIZE(ad), sd, ARRAY_SIZE(sd));
    	if (err)
    	{
    		LOG_ERR("Advertising failed to start (err %d)\n", err);
    		return err;
    	}
    
    	// Start uart_receive_thread()
    	k_sem_give(&ble_init_ok);
    
    	// Initialize LittleFS
    	fs_file_t_init(&myFile);
    	fs_dir_t_init(&myDirectory);
    
    	// Mount file system
    	err = fs_mount(&littlefs_mnt);
    	if (err)
    	{
    		LOG_ERR("Error mounting LittleFS [%d]\n", err);
    		return err;
    	}
    
    	// Initialize workqueue for file system
    	k_work_queue_init(&file_work_q);
    	k_work_queue_start(&file_work_q, work_q_stack_area, K_THREAD_STACK_SIZEOF(work_q_stack_area), WORKQ_PRIORITY, NULL);
    
    	// Initialize all work items with their handler
    	k_work_init_delayable(&write_work_item.d_work, littleFS_write_handler);
    	k_work_init_delayable(&write_header_work_item.d_work, littleFS_write_header_handler);
    	k_work_init_delayable(&directory_work_item.d_work, littleFS_dir_handler);
    	k_work_init_delayable(&erase_work_item.d_work, littleFS_erase_handler);
    	k_work_init_delayable(&read_work_item.d_work, littleFS_read_handler);
    	k_work_init_delayable(&read_header_work_item.d_work, littleFS_read_header_handler);
    
    	return 0;
    }
    
    void parse_cmd_a(char *input_string, char *meas_time, char *freq, char *filename)
    {
    	// Parse the 'a' command used for starting continuous measurements
    
    	char *argument;
    
    	argument = strtok(input_string + 2, "_");
    
    	for (uint8_t i = 0; i < 3; i++)
    	{
    		if (argument == NULL)
    		{
    			return;
    		}
    
    		switch (i)
    		{
    			// 1st Argument: Measurement rate (3-digit number)
    			case 0:
    				if (strspn(argument, "0123456789") != 3)
    				{
    					return;
    				}
    
    				strncpy(meas_time, argument, 3);
    				break;
    
    			// 2nd Argument: EIS frequency (3-digit number)
    			case 1:
    				if (strspn(argument, "0123456789") != 3)
    				{
    					return;
    				}
    
    				strncpy(freq, argument, 3);
    				break;
    
    			// 3rd Argument: File header (<100 char title)
    			case 2:
    				if (strlen(argument) > 100)
    				{
    					return;
    				}
    
    				strncpy(filename, argument, 100);
    				break;
    
    			default:
    				break;
    		}
    		
    		argument = strtok(NULL, "_");
    	}
    }
    
    /*------------------------------Main Thread------------------------------*/
    
    int main(void)
    {
    	LOG_INF("Starting AWAREv4.0_AlphaV0.4");
    	
    	int err;
    	char cmd_buf[251] = {0};
    	char measTimeSec_buf[4] = {0};
    	char frequencyEIS_buf[4] = {0};
    	char filename_buf[100] = {0};
    	bool measOngoing = false;
    	bool fileDataCleared = false;
    	int batVoltage;
    	uint8_t adcMeas[30];
    
    	// Initiate peripherals, BLE, file system, workqueue, and work items
    	err = my_init();
    	if (err)
    	{
    		return err;
    	}
    
    	// Check file storage status: data in file/data not in file flag
    	struct fs_dirent entry;
    	err = fs_stat("/lfs/test.txt", &entry);
    	if (err)
    	{
    		fileDataCleared = true;
    	}
    
    	// Startup Sequence: 
    	// Turn on blue LED, reset pico, hibernate pico, turn off blue LED
    
    	dk_set_led_off(RED_LED);
    	dk_set_led_off(GREEN_LED);
    	dk_set_led_on(BLUE_LED);
    
    	uart_resume();
    
    	k_msleep(100);
    	uart_tx(uart, RESET_PICO, strlen(RESET_PICO), SYS_FOREVER_MS);
    	k_msleep(500);
    	uart_tx(uart, ALL_IN_HIBERNATE, strlen(ALL_IN_HIBERNATE), SYS_FOREVER_MS);
    	k_msleep(500);
    
    	uart_suspend();
    
    	dk_set_led_off(BLUE_LED);
    
    	for (;;)
    	{
    		// Wait for command from NUS
    		struct uart_data_t *nus_command;
    		nus_command = k_fifo_get(&nus_receive_data, K_FOREVER);
    
    		// Save command in command buffer
    		memcpy(cmd_buf, nus_command->data, nus_command->len);
    
    		k_free(nus_command);
    
    		switch (*cmd_buf)
    		{
    			// CMD 'o': Turn on blue LEDs
    			case 'o':
    				dk_set_led_off(RED_LED);
    				dk_set_led_off(GREEN_LED);
    				dk_set_led_on(BLUE_LED);
    
    				break;
    
    			// CMD 'i': Get information on device (hardwareID, fimrwareID, battery voltage, storage used)
    			case 'i':
    				dk_set_led_off(BLUE_LED);
    
    				// Send hardware and firmware ID
    				bt_nus_send(NULL, deviceHardwareID, strlen(deviceHardwareID));
    				bt_nus_send(NULL, annaFirmwareID, strlen(annaFirmwareID));
    
    				// Get and send battery voltage
    				get_bat_voltage();
    				k_msleep(100); 
    				batVoltage = get_bat_voltage(); 
    				sprintf(adcMeas, "Battery Voltage: %d mV\n", batVoltage);
    				bt_nus_send(NULL, adcMeas, strlen(adcMeas));
    
    				// Schedule littleFS_dir_handler() -> littleFS_dir_handler() will send storage used
    				k_work_schedule_for_queue(&file_work_q, &directory_work_item.d_work, K_NO_WAIT);
    
    				break;
    
    			// CMD 'e': Erase all data in file storage
    			case 'e':
    				dk_set_led_off(BLUE_LED);
    
    				// ERROR a!1001: This CMD is locked during ongoing measurements
    				if (measOngoing)
    				{
    					bt_nus_send(NULL, "a!1001\n", 7);
    					break;
    				}
    
    				// Schedule littleFS_erase_handler()
    				k_work_schedule_for_queue(&file_work_q, &erase_work_item.d_work, K_NO_WAIT);
    
    				fileDataCleared = true;
    
    				break;
    
    			// CMD 'x': Erase all data in file storage; override any checks for ongoing measurements
    			case 'x':
    				dk_set_led_off(BLUE_LED);
    
    				// Schedule littleFS_erase_handler()
    				k_work_schedule_for_queue(&file_work_q, &erase_work_item.d_work, K_NO_WAIT);
    
    				fileDataCleared = true;
    
    				break;
    
    			// CMD 'r': Read all data in file
    			case 'r':
    				dk_set_led_off(BLUE_LED);
    
    				// Schedule littleFS_read_handler()
    				k_work_schedule_for_queue(&file_work_q, &read_work_item.d_work, K_NO_WAIT);
    
    				break;
    
    			// CMD 'h': Read header data from file
    			case 'h':
    				dk_set_led_off(BLUE_LED);
    
    				// Schedule littleFS_read_header_handler()
    				k_work_schedule_for_queue(&file_work_q, &read_header_work_item.d_work, K_NO_WAIT);
    
    				break;
    
    			// CMD 'a': Start a continous measurement; measurement timer, EIS frequency, and file header are argument
    			case 'a':
    				dk_set_led_off(BLUE_LED);
    
    				// ERROR a!1001: This CMD is locked during ongoing measurements
    				if (measOngoing)
    				{
    					bt_nus_send(NULL, "a!1001\n", 7);
    					break;
    				}
    
    				// ERROR a!1002: This CMD is locked if there is already data in the file
    				else if (!fileDataCleared)
    				{
    					bt_nus_send(NULL, "a!1002\n", 7);
    					break;
    				}
    
    				parse_cmd_a(cmd_buf, measTimeSec_buf, frequencyEIS_buf, filename_buf);
    
    				// ERROR a!1003: This CMD has an inadaquate number of or incorrectly formatted arguments
    				if (measTimeSec_buf[0] == '\0' || frequencyEIS_buf[0] == '\0' || filename_buf[0] == '\0')
    				{
    					bt_nus_send(NULL, "a!1003\n", 7);
    					break;
    				}
    
    				measTimeSec = atoi(measTimeSec_buf);
    				frequencyEIS = atoi(frequencyEIS_buf);
    
    				// ERROR a!1004: The measurement rate cannot be less than 15 seconds
    				if (measTimeSec < 15)
    				{
    					bt_nus_send(NULL, "a!1004\n", 7);
    					break;
    				}
    				// ERROR a!1005: The EIS frequency cannot be less than 100 Hz
    				else if (frequencyEIS < 100)
    				{
    					bt_nus_send(NULL, "a!1005\n", 7);
    					break;
    				}
    
    				// Set the EIS frequency
    				snprintf(EIS_pt2_freq, 15, " %i %i 4 0m\n", frequencyEIS, frequencyEIS);
    
    				// Write file header to file: put header in fifo -> schedule littleFS_write_header_handler()
    				struct uart_data_t *store_buf;
    				store_buf = k_malloc(sizeof(*store_buf));
    
    				// make sure that k_malloc() does not return NULL ptr
    				if (store_buf)
    				{
    					snprintf(store_buf->data, CONFIG_BT_CTLR_DATA_LENGTH_MAX, ":%s;\n", filename_buf);
    					store_buf->len = strlen(store_buf->data);
    
    					k_fifo_put(&fifo_uart_tx_data, store_buf);
    					k_work_schedule_for_queue(&file_work_q, &write_header_work_item.d_work, K_NO_WAIT);
    
    					k_msleep(50);
    
    					measOngoing = true;
    					fileDataCleared = false;
    
    					// Initiate measurements: start measurement timer -> activate run_meas_thread()
    					write_work_item.meas_count = 0;
    					k_timer_start(&meas_timer, K_SECONDS(measTimeSec), K_SECONDS(measTimeSec));
    					k_thread_resume(run_meas_thread_id);
    					k_sem_give(&run_meas_sem);
    				}
    				// ERROR a!1008: Could not allocate memory
    				else
    				{
    					bt_nus_send(NULL, "a!1008\n", 7);
    					break;
    				}
    
    				break;
    
    			// CMD 'b': Stop an ongoing continous measurement
    			case 'b':
    				dk_set_led_off(BLUE_LED);
    				
    				bt_nus_send(NULL, "Measurements Stopped\n", 21);
    
    				measOngoing = false;
    
    				// Stop measurements: stop timer -> suspend run_meas_thread()
    				k_timer_stop(&meas_timer);
    				k_thread_suspend(run_meas_thread_id);
    
    				break;
    
    			default:
    				break;
    		}
    
    		// Clear buffers
    		measTimeSec_buf[0] = '\0';
    		frequencyEIS_buf[0] = '\0';
    		filename_buf[0] = '\0';
    		memset(cmd_buf, 0, sizeof(cmd_buf));
    	}
    
    	return 0;
    }

    The main function works as follows: 

    1. Initiate all peripherals.

    2. Start up and hibernate our sensor chip.

    3. Wait for messages from bt_receive_cb() of NUS.

    4. Process each command, send work to workqueue threads, etc, and loop back to step 3. 

    The command that usually causes the bus fault is the 'a' command in the switch statement, especially when the command has a really long filename arguments. 

    Additionally, since my original post I have been noticing this issue on other devices as well, but only when I flash them with new firmware. Most of the time I am able to fix the issue by doing "nrfjprog --recover" then reflashing.

    I want to say there are likely problems in my main() thread, but it is confusing to me why this happens only sometimes on some devices and only when I flash the devices with firmware that updates the heap size from 2048 to 4096.

    I also tried increasing the CONFIG_MAIN_STACK_SIZE=8192 from 4096, but this only solves the issue sometimes. 

    Is this kind of inconsistent behavior caused by anything in my code?

    Thank you!

    Crisvin K.

Children
  • Crisvin_K said:
    only when I flash the devices with firmware that updates the heap size from 2048 to 4096.

    this one is a bit odd. Perhaps other settings are affected by this change as well?
    Try to compare build/project_name/zephyr/.config before and after the heap size increase.

    Crisvin_K said:
    Is this kind of inconsistent behavior caused by anything in my code?

    In situations like this, I usually try to comment out some parts of my code and see if the error still happens. This is a good way to find out which parts do not cause the issue.

    I would start by looking at the code using malloc. I see this is also inside the 'a' switch, which makes it a good suspect.

Related