Hello,
To provide some context, I am using an ANNA-B112 (nRF52832) on a custom PCB we manufactured for evaluation. The nRF52832 is set up with the NUS service, with the MTU changed to 251 chars.
I have been noticing strange behavior on one of our ten boards. On this board, sending a message longer than about 70 chars causes a bus fault. Here are the RTT logs from the fault:
ASSERTION FAIL [left_chunk(h, right_chunk(h, c)) == c] @ WEST_TOPDIR/zephyr/lib/os/heap.c:183
	corrupted heap bounds (buffer overflow?) for memory at 0x2000a364
[00:01:45.722,381] <err> os: r0/a1: 0x00000004 r1/a2: 0x000000b7 r2/a3: 0x00000002
[00:01:45.722,412] <err> os: r3/a4: 0x20001d58 r12/ip: 0x0000000c r14/lr: 0x000125d1
[00:01:45.722,412] <err> os: xpsr: 0x41000000
[00:01:45.722,442] <err> os: Faulting instruction address (r15/pc): 0x0002a3b2
[00:01:45.722,473] <err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
[00:01:45.722,503] <err> os: Current thread: 0x200033d8 (unknown)
[00:01:45.957,092] <err> fatal_error: Resetting system
*** Booting nRF Connect SDK v2.5.0 ***
I turned on CONFIG_ASSERT, which reported a "corrupted heap bounds" error at heap.c line 183. Before I enabled asserts, the error gave a faulting instruction address which, using addr2line, led me to heap.c line 58. The functions containing those lines are below:
heap.c line 183:
void sys_heap_free(struct sys_heap *heap, void *mem)
{
	if (mem == NULL) {
		return; /* ISO C free() semantics */
	}
	struct z_heap *h = heap->heap;
	chunkid_t c = mem_to_chunkid(h, mem);

	/*
	 * This should catch many double-free cases.
	 * This is cheap enough so let's do it all the time.
	 */
	__ASSERT(chunk_used(h, c),
		 "unexpected heap state (double-free?) for memory at %p", mem);

	/*
	 * It is easy to catch many common memory overflow cases with
	 * a quick check on this and next chunk header fields that are
	 * immediately before and after the freed memory.
	 */
	__ASSERT(left_chunk(h, right_chunk(h, c)) == c,
		 "corrupted heap bounds (buffer overflow?) for memory at %p",
		 mem);

	set_chunk_used(h, c, false);
#ifdef CONFIG_SYS_HEAP_RUNTIME_STATS
	h->allocated_bytes -= chunksz_to_bytes(h, chunk_size(h, c));
#endif

#ifdef CONFIG_SYS_HEAP_LISTENER
	heap_listener_notify_free(HEAP_ID_FROM_POINTER(heap), mem,
				  chunksz_to_bytes(h, chunk_size(h, c)));
#endif

	free_chunk(h, c);
}
heap.c line 58:
static void free_list_remove_bidx(struct z_heap *h, chunkid_t c, int bidx)
{
	struct z_heap_bucket *b = &h->buckets[bidx];

	CHECK(!chunk_used(h, c));
	CHECK(b->next != 0);
	CHECK(h->avail_buckets & BIT(bidx));

	if (next_free_chunk(h, c) == c) {
		/* this is the last chunk */
		h->avail_buckets &= ~BIT(bidx);
		b->next = 0;
	} else {
		chunkid_t first = prev_free_chunk(h, c),
			  second = next_free_chunk(h, c);

		b->next = second;
		set_next_free_chunk(h, first, second);
		set_prev_free_chunk(h, second, first);
	}

#ifdef CONFIG_SYS_HEAP_RUNTIME_STATS
	h->free_bytes -= chunksz_to_bytes(h, chunk_size(h, c));
#endif
}
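For reference, this is how I understand the line-183 check, written as a toy model in plain C. It is only a sketch: the toy_* names, the 8-byte unit size and the three-chunk layout are all made up for illustration and are not Zephyr's real data structures. The idea is that each chunk header stores its own size and the size of its left neighbour, so stepping right and then left should land back on the same chunk. A write that runs past the end of a buffer lands on the next chunk's header and breaks that round trip.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: hypothetical names and layout, NOT Zephyr's real heap. */
#define TOY_UNITS 16u /* heap size, counted in 8-byte units */

struct toy_hdr {
	uint32_t left_size; /* size of the chunk to the left, in units */
	uint32_t size;      /* size of this chunk in units, header included */
};

_Static_assert(sizeof(struct toy_hdr) == 8, "toy model assumes 8-byte units");

static struct toy_hdr toy_heap[TOY_UNITS];

/* Lay out chunk A at unit 0, chunk B at unit 4 and an end marker at unit 8.
 * Each of A and B has 1 header unit plus 3 payload units (24 bytes).
 */
void toy_layout(void)
{
	memset(toy_heap, 0, sizeof(toy_heap));
	toy_heap[0] = (struct toy_hdr){ .left_size = 0, .size = 4 };
	toy_heap[4] = (struct toy_hdr){ .left_size = 4, .size = 4 };
	toy_heap[8] = (struct toy_hdr){ .left_size = 4, .size = 0 };
}

/* The payload starts right after the one-unit header. */
uint8_t *toy_payload(uint32_t c)
{
	assert(c + 1u < TOY_UNITS);
	return (uint8_t *)&toy_heap[c + 1u];
}

uint32_t toy_right_chunk(uint32_t c)
{
	return c + toy_heap[c].size;
}

uint32_t toy_left_chunk(uint32_t c)
{
	return c - toy_heap[c].left_size;
}

/* Same shape as the assertion: left_chunk(right_chunk(c)) == c. */
bool toy_bounds_ok(uint32_t c)
{
	assert(c < TOY_UNITS);
	uint32_t right = toy_right_chunk(c);

	if (right >= TOY_UNITS) {
		return false; /* the size field itself points outside the heap */
	}
	return toy_left_chunk(right) == c;
}
```

In this model, filling all 24 payload bytes of chunk A keeps toy_bounds_ok(0) true, while writing a 25th byte overwrites chunk B's left_size field and makes the check fail. If the real heap behaves the same way, the assertion would point at whatever wrote past the end of the buffer at 0x2000a364 rather than at the free call itself.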
I started noticing these errors more often after I increased the heap size from 2048 to 4096. Here is my prj.conf file for context.
CONFIG_ASSERT=y

# Logging Module
CONFIG_LOG=y
CONFIG_UART_CONSOLE=n
CONFIG_USE_SEGGER_RTT=y
CONFIG_LOG_BACKEND_RTT=y

# Enable Thread debugging
# CONFIG_DEBUG_THREAD_INFO=y
# CONFIG_DEBUG_OPTIMIZATIONS=y

# RAM
CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=2048
CONFIG_HEAP_MEM_POOL_SIZE=4096
CONFIG_MAIN_STACK_SIZE=4096

# BLE Settings
CONFIG_BT=y
CONFIG_BT_PERIPHERAL=y
CONFIG_BT_DEVICE_APPEARANCE=833
CONFIG_BT_DEVICE_NAME="A040000010"
CONFIG_BT_MAX_PAIRED=1
CONFIG_BT_RX_STACK_SIZE=2048

# For Requesting ATT_MTU Update
CONFIG_BT_GATT_CLIENT=y
CONFIG_BT_USER_DATA_LEN_UPDATE=y
CONFIG_BT_CTLR_DATA_LENGTH_MAX=251
CONFIG_BT_BUF_ACL_TX_SIZE=251
CONFIG_BT_BUF_ACL_RX_SIZE=251
CONFIG_BT_L2CAP_TX_MTU=247

# BLE NUS Service
CONFIG_BT_NUS=y
CONFIG_BT_NUS_AUTHEN=n

# Buttons and LEDs Library
CONFIG_DK_LIBRARY=y

# Use External LFCLK for UART RX Byte Counting
CONFIG_UART_0_NRF_HW_ASYNC=y
CONFIG_UART_0_NRF_HW_ASYNC_TIMER=2

# Async UART
CONFIG_UART_ASYNC_API=y
CONFIG_NRFX_UARTE0=y
CONFIG_SERIAL=y
CONFIG_GPIO=y
CONFIG_CONSOLE=y
CONFIG_UART_0_NRF_PARITY_BIT=n
CONFIG_BT_NUS_UART_BUFFER_SIZE=200

# Analog Pins
CONFIG_ADC=y

# Power Management
CONFIG_PM=y
CONFIG_PM_DEVICE=y
CONFIG_PM_DEVICE_RUNTIME=y

# NVS Memory
CONFIG_FLASH=y
CONFIG_FLASH_MAP=y
CONFIG_FILE_SYSTEM=y
CONFIG_FILE_SYSTEM_LITTLEFS=y
The error messages lead me to believe that I made some sort of memory allocation error, but since this is only affecting one device I am a little confused. I have also noticed that the one affected device is draining the LiPo battery connected to the custom board quicker than the other devices.
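One thing I am considering, to find out which buffer is being overrun, is wrapping my allocations with guard bytes that I can check at any point, not only when the buffer is freed. Below is a minimal host-side sketch of that idea in plain C. The guarded_* names are hypothetical helpers I made up, and malloc/free stand in for k_malloc/k_free, so this is an assumption about how I might do it, not something taken from the SDK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical debug helpers; malloc/free stand in for k_malloc/k_free. */
#define GUARD_BYTE 0xA5u
#define GUARD_LEN  8u

/* Stored in front of the user buffer so the check knows where the guard is.
 * Note: the returned pointer is only aligned to sizeof(size_t), which is
 * fine for byte buffers but may not suit every type.
 */
struct guard_hdr {
	size_t len;
};

/* Allocate len usable bytes followed by GUARD_LEN guard bytes. */
void *guarded_alloc(size_t len)
{
	if (len > SIZE_MAX - sizeof(struct guard_hdr) - GUARD_LEN) {
		return NULL; /* total size would overflow */
	}

	uint8_t *raw = malloc(sizeof(struct guard_hdr) + len + GUARD_LEN);

	if (raw == NULL) {
		return NULL;
	}

	struct guard_hdr h = { .len = len };

	memcpy(raw, &h, sizeof(h));
	memset(raw + sizeof(h) + len, GUARD_BYTE, GUARD_LEN);
	return raw + sizeof(h);
}

/* Return true if the guard bytes after the buffer are still intact. */
bool guarded_check(const void *mem)
{
	assert(mem != NULL);

	const uint8_t *user = mem;
	struct guard_hdr h;

	memcpy(&h, user - sizeof(h), sizeof(h));
	for (size_t i = 0; i < GUARD_LEN; i++) {
		if (user[h.len + i] != GUARD_BYTE) {
			return false;
		}
	}
	return true;
}

/* Free the buffer and report whether it had been overrun. */
bool guarded_free(void *mem)
{
	if (mem == NULL) {
		return true;
	}

	bool ok = guarded_check(mem);

	free((uint8_t *)mem - sizeof(struct guard_hdr));
	return ok;
}
```

The plan would be to call guarded_check() right after each copy into a buffer (for example after handling a received NUS message) and log the first place it returns false. It only catches writes that land within the guard bytes directly after the buffer, so a clean result would not rule out corruption elsewhere.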
How can I debug this issue based on these error messages?
Thank you!
Crisvin Kadambathil