Hello,
To provide some context, I am using an ANNA-B112 (nRF52832) on a custom PCB we manufactured for evaluation. The nRF52832 is set up with the NUS service, with the maximum data length raised to 251 bytes (L2CAP TX MTU of 247, per the prj.conf below).
I have been noticing strange behavior on 1 of the 10 boards. On this board, sending a message longer than roughly 70 characters causes a bus fault. Here are the RTT logs from the fault:
ASSERTION FAIL [left_chunk(h, right_chunk(h, c)) == c] @ WEST_TOPDIR/zephyr/lib/os/heap.c:183
corrupted heap bounds (buffer overflow?) for memory at 0x2000a364
[00:01:45.722,381] <err> os: r0/a1: 0x00000004 r1/a2: 0x000000b7 r2/a3: 0x00000002
[00:01:45.722,412] <err> os: r3/a4: 0x20001d58 r12/ip: 0x0000000c r14/lr: 0x000125d1
[00:01:45.722,412] <err> os: xpsr: 0x41000000
[00:01:45.722,442] <err> os: Faulting instruction address (r15/pc): 0x0002a3b2
[00:01:45.722,473] <err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
[00:01:45.722,503] <err> os: Current thread: 0x200033d8 (unknown)
[00:01:45.957,092] <err> fatal_error: Resetting system
*** Booting nRF Connect SDK v2.5.0 ***
I turned on asserts, which reported a "corrupted heap bounds" error raised at heap.c line 183. Before I turned on asserts, the error only gave a faulting instruction address which, using addr2line, led me to heap.c line 58. The two functions in heap.c are below:
heap.c line 183:
void sys_heap_free(struct sys_heap *heap, void *mem)
{
	if (mem == NULL) {
		return; /* ISO C free() semantics */
	}
	struct z_heap *h = heap->heap;
	chunkid_t c = mem_to_chunkid(h, mem);

	/*
	 * This should catch many double-free cases.
	 * This is cheap enough so let's do it all the time.
	 */
	__ASSERT(chunk_used(h, c),
		 "unexpected heap state (double-free?) for memory at %p", mem);

	/*
	 * It is easy to catch many common memory overflow cases with
	 * a quick check on this and next chunk header fields that are
	 * immediately before and after the freed memory.
	 */
	__ASSERT(left_chunk(h, right_chunk(h, c)) == c,
		 "corrupted heap bounds (buffer overflow?) for memory at %p",
		 mem);

	set_chunk_used(h, c, false);
#ifdef CONFIG_SYS_HEAP_RUNTIME_STATS
	h->allocated_bytes -= chunksz_to_bytes(h, chunk_size(h, c));
#endif

#ifdef CONFIG_SYS_HEAP_LISTENER
	heap_listener_notify_free(HEAP_ID_FROM_POINTER(heap), mem,
				  chunksz_to_bytes(h, chunk_size(h, c)));
#endif

	free_chunk(h, c);
}
heap.c line 58:
static void free_list_remove_bidx(struct z_heap *h, chunkid_t c, int bidx)
{
	struct z_heap_bucket *b = &h->buckets[bidx];

	CHECK(!chunk_used(h, c));
	CHECK(b->next != 0);
	CHECK(h->avail_buckets & BIT(bidx));

	if (next_free_chunk(h, c) == c) {
		/* this is the last chunk */
		h->avail_buckets &= ~BIT(bidx);
		b->next = 0;
	} else {
		chunkid_t first = prev_free_chunk(h, c),
			  second = next_free_chunk(h, c);

		b->next = second;
		set_next_free_chunk(h, first, second);
		set_prev_free_chunk(h, second, first);
	}

#ifdef CONFIG_SYS_HEAP_RUNTIME_STATS
	h->free_bytes -= chunksz_to_bytes(h, chunk_size(h, c));
#endif
}
I started noticing these errors more often after I increased the heap size from 2048 to 4096 bytes. Here is my prj.conf file for context:
CONFIG_ASSERT=y

# Logging Module
CONFIG_LOG=y
CONFIG_UART_CONSOLE=n
CONFIG_USE_SEGGER_RTT=y
CONFIG_LOG_BACKEND_RTT=y

# Enable Thread debugging
# CONFIG_DEBUG_THREAD_INFO=y
# CONFIG_DEBUG_OPTIMIZATIONS=y

# RAM
CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=2048
CONFIG_HEAP_MEM_POOL_SIZE=4096
CONFIG_MAIN_STACK_SIZE=4096

# BLE Settings
CONFIG_BT=y
CONFIG_BT_PERIPHERAL=y
CONFIG_BT_DEVICE_APPEARANCE=833
CONFIG_BT_DEVICE_NAME="A040000010"
CONFIG_BT_MAX_PAIRED=1
CONFIG_BT_RX_STACK_SIZE=2048

# For Requesting ATT_MTU Update
CONFIG_BT_GATT_CLIENT=y
CONFIG_BT_USER_DATA_LEN_UPDATE=y
CONFIG_BT_CTLR_DATA_LENGTH_MAX=251
CONFIG_BT_BUF_ACL_TX_SIZE=251
CONFIG_BT_BUF_ACL_RX_SIZE=251
CONFIG_BT_L2CAP_TX_MTU=247

# BLE NUS Service
CONFIG_BT_NUS=y
CONFIG_BT_NUS_AUTHEN=n

# Buttons and LEDs Library
CONFIG_DK_LIBRARY=y

# Use External LFCLK for UART RX Byte Counting
CONFIG_UART_0_NRF_HW_ASYNC=y
CONFIG_UART_0_NRF_HW_ASYNC_TIMER=2

# Async UART
CONFIG_UART_ASYNC_API=y
CONFIG_NRFX_UARTE0=y
CONFIG_SERIAL=y
CONFIG_GPIO=y
CONFIG_CONSOLE=y
CONFIG_UART_0_NRF_PARITY_BIT=n
CONFIG_BT_NUS_UART_BUFFER_SIZE=200

# Analog Pins
CONFIG_ADC=y

# Power Management
CONFIG_PM=y
CONFIG_PM_DEVICE=y
CONFIG_PM_DEVICE_RUNTIME=y

# NVS Memory
CONFIG_FLASH=y
CONFIG_FLASH_MAP=y
CONFIG_FILE_SYSTEM=y
CONFIG_FILE_SYSTEM_LITTLEFS=y
The error messages lead me to believe that I made some sort of memory allocation error (most likely a write past the end of a heap buffer), but since only one of the ten devices is affected I am a little confused. I have also noticed that the affected device drains the LiPo battery connected to the custom board more quickly than the other devices.
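To make the suspected failure concrete, here is a minimal host-side sketch of the kind of bounded copy I think may be missing somewhere in my receive path. It is plain C, not my actual Zephyr code: `struct rx_buf` and `rx_buf_append()` are hypothetical names, `UART_BUF_SIZE` simply mirrors CONFIG_BT_NUS_UART_BUFFER_SIZE, and `ATT_MTU` assumes the negotiated ATT MTU equals CONFIG_BT_L2CAP_TX_MTU. With those values a single NUS write can carry up to 244 bytes, which is more than the 200-byte buffer, so an unclamped memcpy() would overwrite the next heap chunk's header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed values, copied from the prj.conf above for illustration only. */
#define UART_BUF_SIZE        200 /* mirrors CONFIG_BT_NUS_UART_BUFFER_SIZE */
#define ATT_MTU              247 /* assumes negotiated MTU == CONFIG_BT_L2CAP_TX_MTU */
#define ATT_NOTIFY_OVERHEAD  3   /* 1-byte opcode + 2-byte attribute handle */
#define MAX_NUS_PAYLOAD      (ATT_MTU - ATT_NOTIFY_OVERHEAD)

/* Hypothetical stand-in for a heap-allocated receive buffer. */
struct rx_buf {
	unsigned char data[UART_BUF_SIZE];
	size_t len; /* number of valid bytes currently in data[] */
};

/*
 * Append src to dst without ever writing past dst->data.
 * Returns the number of bytes actually copied, which may be
 * smaller than src_len when the buffer is (nearly) full.
 */
static size_t rx_buf_append(struct rx_buf *dst,
			    const unsigned char *src, size_t src_len)
{
	assert(dst->len <= sizeof(dst->data));

	size_t room = sizeof(dst->data) - dst->len;
	size_t n = (src_len < room) ? src_len : room;

	memcpy(&dst->data[dst->len], src, n);
	dst->len += n;
	return n;
}
```

If the return value is smaller than the payload length, the caller has to decide what to do with the remainder (allocate another buffer or drop it); the point of the sketch is only that the copy length is clamped to the space actually available.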
How can I debug this issue based on these error messages?
Thank you!
Crisvin Kadambathil