Choice of controller configurations in BLE Throughput sample

I am referring to ncs\v3.1.0\nrf\samples\bluetooth\throughput\sysbuild\ipc_radio\prj.conf for the nRF5340 and extrapolating from that to a central application with multiple peripheral connections. In that case, RAM use on the network core becomes a concern.

1. Why does the sample set such a large heap size (CONFIG_HEAP_MEM_POOL_SIZE=8192)? I am not aware of any k_malloc() use in the Throughput sample or in ipc_radio.

* Yes, we can use a "minimal.conf", but why put such a large default of 8192 in the Throughput prj.conf?

2. Why doesn't the sample suggest adjusting other configurations, notably the CONFIG_BT_CTLR_SDC_TX_PACKET_COUNT, CONFIG_BT_CTLR_SDC_RX_PACKET_COUNT, and CONFIG_BT_BUF_ACL_TX_COUNT? Couldn't these influence throughput?

* See my case 296354, where increasing CONFIG_BT_CTLR_SDC_TX_PACKET_COUNT influenced keeping the More Data (MD) bit set.

3. In that case 296354, there was a statement:
"There is no need to maintain any ratio or relationship between CONFIG_BT_CTLR_SDC_TX_PACKET_COUNT and CONFIG_BT_BUF_ACL_TX_COUNT. Also, the latter is only used by the Zephyr LL. (Generally, the difference between BT_BUF_ACL_TX_COUNT and BT_CTLR_SDC_TX_PACKET_COUNT is that the latter is per connection while the former is shared. This does not matter for a single connection, though.)"

That was for an old NCS release, 2.0. I believe CONFIG_BT_BUF_ACL_TX_COUNT is indeed used by the SDC in newer NCS versions, such as NCS 2.6 and beyond. Is that correct?

Parents
  • Hello, I think I can answer your questions.

    1. Because the IPC/RPMsg/virtqueue system on the nRF5340 network core uses dynamic allocation, even if the throughput sample itself never calls k_malloc().
    The 8 KB heap is simply a safe default to ensure that RPMsg, libmetal, and virtqueue buffers always have enough memory.

    2. Because the sample is meant to run out-of-the-box and demonstrate throughput, not serve as a tuning guide.
    However, those parameters absolutely do affect throughput and memory usage, especially for multiple connections, and they should be tuned in real applications.

    3. Yes, in newer NCS releases, BT_BUF_ACL_TX_COUNT is also used when the SoftDevice Controller is enabled.
    The old statement (“only used by Zephyr LL”) is no longer correct for modern NCS.
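    In practice, the knobs in question live in the network core image's configuration. As a hedged sketch only (the values below are illustrative placeholders, not recommendations), a tuned sysbuild/ipc_radio/prj.conf fragment might look like:

    ```conf
    # Controller-side SoftDevice Controller buffers; the TX/RX packet
    # counts are per connection -- tune per design, these are placeholders
    CONFIG_BT_CTLR_SDC_TX_PACKET_COUNT=4
    CONFIG_BT_CTLR_SDC_RX_PACKET_COUNT=4
    # Host-facing ACL TX buffer count (shared across connections)
    CONFIG_BT_BUF_ACL_TX_COUNT=10
    ```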

  • Thanks for your reply.

    I spent quite a bit of time looking into #1, thanks to what you pointed out. I found that the network core only needs CONFIG_HEAP_MEM_POOL_SIZE=704 in the worst case for either the central or peripheral role in the Throughput sample. In my opinion, the default 8 KB heap is overkill and wasteful, at least for the Throughput sample, when you could instead spend that RAM on larger BLE-related buffer sizes and counts for multiple connections.

    I used NCS 2.6.4 to investigate, enabling CONFIG_SYS_HEAP_RUNTIME_STATS on both cores and adding several debug printk's.

    One thing I noticed is that two heaps are in use on both cores.

    1. The _SYSTEM_HEAP of size CONFIG_HEAP_MEM_POOL_SIZE in ncs\v2.6.4\zephyr\kernel\mempool.c.

    K_HEAP_DEFINE(_system_heap, CONFIG_HEAP_MEM_POOL_SIZE);
    #define _SYSTEM_HEAP (&_system_heap)
    

    2. The z_malloc_heap of size HEAP_SIZE in ncs\v2.6.4\zephyr\lib\libc\common\source\stdlib\malloc.c.

    For our case of z_malloc_heap:

    #   define USED_RAM_END_ADDR   POINTER_TO_UINT(&_end)
    /*
     * No partition, heap can just start wherever _end is, with
     * suitable alignment
     */
    #   define HEAP_BASE	ROUND_UP(USED_RAM_END_ADDR, HEAP_ALIGN)
    
    #   define HEAP_SIZE	ROUND_DOWN((RAM_SIZE -	\
    		((size_t) HEAP_BASE - (size_t) RAM_ADDR)), HEAP_ALIGN)
    


    That is, the malloc heap begins wherever the used RAM ends and extends to the end of RAM.

    k_malloc() uses the _SYSTEM_HEAP, while malloc() uses the z_malloc_heap.

    For any heap allocation, both heaps eventually call sys_heap_alloc() in ncs\v2.6.4\zephyr\lib\os\heap.c, which calls increase_allocated_bytes() if CONFIG_SYS_HEAP_RUNTIME_STATS=y.

    In increase_allocated_bytes(), I added these two lines at the end:

    	printk("increase_allocated_bytes(): allocated %zu, free %zu, max allocated %zu, pHeapStruct=%p\n",
    		h->allocated_bytes, h->free_bytes,
    		h->max_allocated_bytes, (void*)h);

    The pHeapStruct above tells me which heap is being used for the allocation. I recorded the address of both heap structures at startup in mempool.c/k_thread_system_pool_assign() and malloc.c/malloc_prepare(). One of those two heap structures will show up in the print of pHeapStruct when increase_allocated_bytes() is called.
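    To make that concrete, here is a minimal standalone model (plain C, not the Zephyr source, though the field names follow heap.c) of how increase_allocated_bytes() updates the stats and how printing the heap struct pointer tells you which heap serviced an allocation:

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Model of the sys_heap runtime stats maintained when
     * CONFIG_SYS_HEAP_RUNTIME_STATS=y; field names follow zephyr heap.c. */
    struct heap_stats {
    	size_t allocated_bytes;
    	size_t free_bytes;
    	size_t max_allocated_bytes;
    };

    /* Mirrors what increase_allocated_bytes() does: bump the counters and
     * track the high-water mark. Printing the struct pointer, as in the
     * printk() I added, identifies which heap the allocation came from. */
    void increase_allocated_bytes(struct heap_stats *h, size_t alloc_size)
    {
    	h->allocated_bytes += alloc_size;
    	h->free_bytes -= alloc_size;
    	if (h->allocated_bytes > h->max_allocated_bytes) {
    		h->max_allocated_bytes = h->allocated_bytes;
    	}
    	printf("allocated %zu, free %zu, max allocated %zu, pHeapStruct=%p\n",
    	       h->allocated_bytes, h->free_bytes,
    	       h->max_allocated_bytes, (void *)h);
    }

    int main(void)
    {
    	/* Two distinct heaps, like _system_heap and z_malloc_heap. */
    	struct heap_stats system_heap = { 0, 8192, 0 };
    	struct heap_stats malloc_heap = { 0, 4096, 0 };

    	/* The two vring allocations observed at boot hit the system heap. */
    	increase_allocated_bytes(&system_heap, 316);
    	increase_allocated_bytes(&system_heap, 316);

    	assert(system_heap.max_allocated_bytes == 632);
    	assert(malloc_heap.max_allocated_bytes == 0); /* no malloc() use seen */
    	return 0;
    }
    ```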

    At boot, the network core in the Throughput sample makes just two allocations from the _SYSTEM_HEAP, each of size 312 bytes (316 bytes aligned): one for the rx vring and one for the tx vring:

    static int vq_setup(struct ipc_static_vrings *vr, unsigned int role)
    {
    	vr->vq[RPMSG_VQ_0] = virtqueue_allocate(vr->vring_size);
    	if (vr->vq[RPMSG_VQ_0] == NULL) {
    		return -ENOMEM;
    	}
    
    	vr->vq[RPMSG_VQ_1] = virtqueue_allocate(vr->vring_size);
    	if (vr->vq[RPMSG_VQ_1] == NULL) {
    		return -ENOMEM;
    	}

    	/* ... remainder of vq_setup() elided ... */
    }



    No further allocations occur on the network core when you run the throughput test, and malloc() allocations never occur.

    Those two allocations use 316*2 = 632 bytes. I verified that with CONFIG_HEAP_MEM_POOL_SIZE=512, IPC init fails (<err> hci_ipc: IPC service instance initialization failed: -12), and that the sample runs the test successfully on both sides with CONFIG_HEAP_MEM_POOL_SIZE=704. It could probably go even lower. Regardless, 8192 seems to be way overkill.
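    A quick sanity check of those numbers (the 4-byte per-chunk overhead is my assumption, inferred from the observed 312 -> 316):

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* Round x up to a multiple of align (a power of two in sys_heap;
     * plain arithmetic is fine for this sketch). */
    unsigned round_up(unsigned x, unsigned align)
    {
    	return (x + align - 1) / align * align;
    }

    /* Bytes one allocation occupies in the heap, assuming a 4-byte
     * per-chunk overhead and 4-byte rounding (my inference). */
    unsigned chunk_bytes(unsigned request)
    {
    	return round_up(request + 4, 4);
    }

    int main(void)
    {
    	/* Each virtqueue_allocate() request is 312 bytes -> 316 in the heap. */
    	unsigned vrings = 2 * chunk_bytes(312);

    	printf("two vrings occupy %u bytes\n", vrings);

    	/* 632 bytes already exceeds a 512-byte pool, matching the observed
    	 * -12 (-ENOMEM) failure, while 704 leaves some headroom. */
    	assert(chunk_bytes(312) == 316);
    	assert(vrings == 632);
    	assert(vrings > 512);
    	assert(vrings <= 704);
    	return 0;
    }
    ```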

    As an aside, the app core also made those same two vring allocations from the system heap. The central app core was the only image with an additional allocation beyond the two vrings: user_data_alloc() in v2.6.4\nrf\subsys\bluetooth\gatt_dm.c called k_calloc(), taking another 124 bytes from the _SYSTEM_HEAP, and that's it. As on the network core, no allocations occurred from the z_malloc_heap.

    2. Follow-up questions:
    a. What benefit do you get from setting the network core CONFIG_BT_BUF_ACL_RX_SIZE greater than 251 if the app core has CONFIG_BT_BUF_ACL_RX_SIZE=500 (the MTU size)? That is, won't the network core receive at most 251 bytes per packet over the air, which it then sends to the app core, with its larger 500-byte buffers, for L2CAP reassembly? How would the network core make use of CONFIG_BT_BUF_ACL_RX_SIZE > 251?

    b. How can you determine whether the CONFIG_BT_CTLR_SDC_RX_PACKET_COUNT default of 2 is too small? Does the SDC NAK a packet if it doesn't have enough buffers for a particular connection? As for CONFIG_BT_CTLR_SDC_TX_PACKET_COUNT, my understanding is that if there are not enough TX buffers, the More Data bit will not be set when it otherwise could be. Is that right?

    c. What is the relation of the CONFIG_BT_BUF_ACL_TX_COUNT for the app core and network core?
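    To illustrate question (a), here is a toy model (my own sketch, not the SDC or host implementation) of why the number of controller-to-host fragments for a large SDU is capped by the 251-byte on-air payload rather than by the network core's CONFIG_BT_BUF_ACL_RX_SIZE:

    ```c
    #include <assert.h>

    /* Max LL Data PDU payload with Data Length Extension. */
    enum { LL_MAX_PAYLOAD = 251 };

    /* Fragments needed for an SDU: the usable fragment size can never
     * exceed what fits on air, regardless of the controller's RX buffer. */
    int fragments_for_sdu(int sdu_len, int ctlr_rx_buf_size)
    {
    	int frag = ctlr_rx_buf_size < LL_MAX_PAYLOAD ? ctlr_rx_buf_size
    						     : LL_MAX_PAYLOAD;
    	return (sdu_len + frag - 1) / frag;
    }

    int main(void)
    {
    	/* A 500-byte SDU takes ceil(500/251) = 2 fragments whether the
    	 * network core buffer is 255 bytes or 502 bytes. */
    	assert(fragments_for_sdu(500, 255) == 2);
    	assert(fragments_for_sdu(500, 502) == 2);
    	return 0;
    }
    ```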

  • For your case, did you set CONFIG_BT_BUF_ACL_TX_COUNT=10 on both the app core and the net core? It seems that your net core has 3 buffers while your app core has 10.

    For the HCI_LE_Read_Buffer_Size command, the spec has the controller report how many ACL data packets it can buffer.

    It seems the warning means you are configuring the host with 10 buffers to send to a controller that only has 3, which is inefficient.

  • Thank you. I'm sorry, I didn't notice it's about the nRF5340. I'm using the nRF54L15, which has just one core.

  • Hi Mike, 

    Happy New Year! 

    Sorry for the late reply. I got the following answers from the team:

    variant said:
    a. Can you please just reply yes or no to the question: Is there any benefit if you set the network core CONFIG_BT_BUF_ACL_RX_SIZE to greater than 251 if the app core has CONFIG_BT_BUF_ACL_RX_SIZE=500 (the MTU size)?

    No. Making it bigger than 255 on the network core shouldn't bring any benefit.

    variant said:
    b. Again, this can have a yes or no reply:  Does the SDC nak a received packet if it doesn't have enough CONFIG_BT_CTLR_SDC_RX_PACKET_COUNT buffers for a particular connection to store the packet? If so, we can use a sniffer to know if that is happening.

    In such a case, SDC will NAK a received packet only if a device has some data to send. If the device doesn't have any data to send, it will simply close the current connection event (so it will not send an empty packet just to NAK reception) and will receive retransmission of the packet in the next connection event. However, it is an implementation detail, so we don't recommend relying on this behavior.

    variant said:
    c.  I was hoping to get a suggestion for best throughput such as set the CONFIG_BT_BUF_ACL_TX_COUNT in the app core to be larger than the net core's value.  Or, should the app core and net core values be equal for best throughput? Can you provide such a suggestion?

    I can't provide any universal recommendation. My suggestion would be to fine-tune the amount and sizes of buffers depending on application needs and CPU utilization.

    -Amanda H.

  • Hi Amanda, 
    Happy New Year to you too.

    Your above reply is exactly what I was looking for.

    This part of the reply was very telling:

    No. Making it bigger than 255 on the network core shouldn't bring any benefit.

    I assume 255 is 251 plus either the 4 bytes for the MIC or the 4 bytes for the L2CAP header.
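    That arithmetic, with my assumption spelled out:

    ```c
    #include <assert.h>

    /* Max LL Data PDU payload with Data Length Extension. */
    enum { LL_MAX_PAYLOAD = 251 };

    /* One plausible reading of the 255 figure (my assumption, not
     * confirmed): the 251-byte payload plus a 4-byte overhead -- either
     * the MIC on an encrypted link, or the basic L2CAP header
     * (2-byte length + 2-byte CID). */
    int max_rx_size(void)
    {
    	return LL_MAX_PAYLOAD + 4;
    }

    int main(void)
    {
    	assert(max_rx_size() == 255);
    	return 0;
    }
    ```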

    Anyway, I think it would be useful if this fact were documented somewhere, especially as a comment in the Throughput sample's network core .conf file. In fact, the Throughput sample sets the network core default to 502 instead of 255: CONFIG_BT_BUF_ACL_RX_SIZE=502.

    Similarly, as I pointed out in an earlier reply, it would be nice if the Throughput sample used a more reasonable CONFIG_HEAP_MEM_POOL_SIZE instead of 8192 (see the same file I linked above). With multiple-connection support, network core memory use becomes significant, so reasonable default values would help people who look at the Throughput sample for guidance.

  • Hello Amanda, I wanted to jump in here and ask for a little clarification on your statement about the Controller returning CONFIG_BT_BUF_ACL_TX_COUNT as the response to the HCI LE Read Buffer Size command.

    I am in the process of updating our nRF5340 application to NCS 3.2 and have been seeing the "<wrn> bt_hci_core: Num of Controller's ACL packets != ACL bt_conn_tx contexts (3 != 10)" log mentioned elsewhere in this thread. I have verified that both the application and ipc_radio configurations set CONFIG_BT_BUF_ACL_TX_COUNT to 10, yet I still see this exact warning. I suspect the controller is actually using CONFIG_BT_CTLR_SDC_TX_PACKET_COUNT, because the warning goes away when I set that to 10 in the ipc_radio config. This does not make sense, though, as ACL_TX_COUNT refers to the number of buffers shared among connections, whereas SDC_TX_PACKET_COUNT is specified per connection.

    This can easily be recreated by building the throughput sample for nrf5340dk/nrf5340/cpuapp. If you add the following to prj.conf in a clone of the throughput sample:

    CONFIG_LOG=y
    CONFIG_BT_LOG_LEVEL_WRN=y

    You will see the exact warning log mentioned above.

    You can then add 

    CONFIG_BT_CTLR_SDC_TX_PACKET_COUNT=10

    to sysbuild/ipc_radio/prj.conf and rerun the sample and observe that the warning has gone away. Would you be able to comment on this behavior?


Children
  • Hi,
    You are correct that the controller returns CONFIG_BT_CTLR_SDC_TX_PACKET_COUNT in response to the host's HCI LE Read Buffer Size command. I can see that myself via logging.

    But why do you say:

    This obviously does not make sense as ACL_TX_COUNT refers to number of buffers shared among connections, whereas SDC_TX_PACKET_COUNT is specified as per connection.

    By using CONFIG_BT_CTLR_SDC_TX_PACKET_COUNT instead of CONFIG_BT_BUF_ACL_TX_COUNT in the reply, the controller, to me, is telling the host it can accommodate CONFIG_BT_CTLR_SDC_TX_PACKET_COUNT packets whether one connection or more than one is active.
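    As a sketch of how I read that host-side check (a toy model of my own, not the bt_hci_core source): the host warns when the count the controller returned via HCI_LE_Read_Buffer_Size differs from its own CONFIG_BT_BUF_ACL_TX_COUNT.

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* Model of the consistency check behind the warning; the numbers and
     * function name are illustrative only. */
    int acl_counts_mismatch(int ctlr_num_pkts, int host_tx_contexts)
    {
    	if (ctlr_num_pkts != host_tx_contexts) {
    		printf("wrn: Num of Controller's ACL packets != "
    		       "ACL bt_conn_tx contexts (%d != %d)\n",
    		       ctlr_num_pkts, host_tx_contexts);
    		return 1;
    	}
    	return 0;
    }

    int main(void)
    {
    	/* Controller reports CONFIG_BT_CTLR_SDC_TX_PACKET_COUNT (default 3),
    	 * host is built with CONFIG_BT_BUF_ACL_TX_COUNT=10: warning fires. */
    	assert(acl_counts_mismatch(3, 10) == 1);

    	/* Raising CONFIG_BT_CTLR_SDC_TX_PACKET_COUNT to 10 silences it. */
    	assert(acl_counts_mismatch(10, 10) == 0);
    	return 0;
    }
    ```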
