Bluetooth throughput reduced with more calculations executed

Hi,

I am developing an acoustic platform on a custom board with an nRF5340 CLAA chip. The board records audio data from two microphones at a sampling rate of 100 kHz (controlled by I2S), and the recorded data is streamed out via Bluetooth. I tested the Bluetooth transmission by receiving the audio data from my custom board on an Android phone, and it reached a throughput of around 800 kbps. However, when I tried to process the data on the board, the throughput dropped significantly. More specifically, I continuously calculated the cross-correlation between the recorded audio data and a given sequence with the arm_dot_prod_q15 function from the CMSIS-DSP library before streaming the data out via Bluetooth. In this case, the throughput dropped to only 40 kbps. Since the nRF5340 is a dual-core system, with the calculations done on the application core and the Bluetooth transmission done on the network core, I am confused about why executing more operations on the application core has such a significant impact on the Bluetooth transmission. Any advice would be appreciated. Thank you very much!

Best,

Ke

  • Depending on the block size, the dot product may only allow 40 kbps due to the MCU execution speed.

  • So the calculations executed on the application core do impact the Bluetooth transmission on the network core significantly? I tested with a blockSize of 60. 
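
The execution-speed question raised above can be estimated with a little arithmetic. The following is a minimal sketch, not a measurement: it assumes one dot product of blockSize taps is evaluated for every incoming sample on each of the two channels (a sliding correlation), and it takes a hypothetical average cost in CPU cycles per multiply-accumulate as a parameter. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply-accumulate operations per second needed to evaluate one
 * dot product of `block_size` taps for every incoming sample.
 * Assumption: a sliding correlation, one dot product per sample. */
uint64_t macs_per_second(uint32_t sample_rate_hz,
                         uint32_t channels,
                         uint32_t block_size)
{
    return (uint64_t)sample_rate_hz * channels * block_size;
}

/* Share of the core consumed, in whole percent (rounded down), for a
 * hypothetical average cost of `cycles_per_mac` cycles per operation. */
uint32_t cpu_load_percent(uint64_t macs, uint32_t cycles_per_mac,
                          uint32_t cpu_hz)
{
    assert(cpu_hz > 0);
    return (uint32_t)((macs * cycles_per_mac * 100u) / cpu_hz);
}
```

With the numbers from this thread, macs_per_second(100000, 2, 60) gives 12,000,000 operations per second. At an assumed one cycle per operation and a 128 MHz application core clock, that is roughly 9% of the core, so under these assumptions raw arithmetic alone would not obviously explain the drop. The real cost per operation depends on compiler flags, memory placement, and how the function is called.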

  • Hi

    I wonder whether the throughput drops so much partly because data has to be transferred between the NET core and the APP core while the APP core is busy, for example with processing data. But I don't know for certain, as I don't understand exactly what you're doing here.

    Best regards,

    Simon

  • Hi Simon,

    Thanks for your reply. In the first test, the main thread handles the recording of the audio data over I2S and puts the recorded samples into a global array:

    static uint32_t rx_buffers[I2S_N_BUFFER_BLOCKS][VALID_SAMPLES + 1];

    A separate thread, ble_thread, checks whether rx_buffers holds any data to send and, if so, sends it out via Bluetooth using bt_nus_send. In this case, the throughput is around 800 kbps.

    In the second test, I calculate the dot product of the recorded audio with a given sequence using arm_dot_prod_q15 in the main thread, right after reading each frame of audio data with i2s_read, and put the calculated results into rx_buffers instead for the ble_thread to transmit. I double-checked that the dot product calculation is fast enough that it does not block i2s_read for each frame. I did not change the ble_thread. However, the throughput dropped to around 40 kbps in this case. Do you have any suggestions on why this happens? Thanks.

    Best,

    Ke
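
To make the second test easier to reason about, here is a portable sketch of what a Q15 dot product computes. It is intended to mirror the documented behaviour of arm_dot_prod_q15 (16-bit inputs, products accumulated in a 64-bit result), but it is a plain C reference written for illustration, not the CMSIS-DSP implementation. The names dot_prod_q15_ref and reduction_factor are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable reference for a Q15 dot product: each 16-bit by 16-bit
 * product is formed in 32 bits and accumulated in 64 bits, so one
 * call turns `block_size` input samples into a single 64-bit value. */
int64_t dot_prod_q15_ref(const int16_t *a, const int16_t *b,
                         uint32_t block_size)
{
    assert(a != NULL && b != NULL);
    int64_t acc = 0;
    for (uint32_t i = 0; i < block_size; i++) {
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}

/* Ratio of input bytes to output bytes when exactly one 64-bit
 * result is kept per block of 16-bit samples (an assumption). */
uint32_t reduction_factor(uint32_t block_size)
{
    return (uint32_t)((block_size * sizeof(int16_t)) / sizeof(int64_t));
}
```

Each call reduces blockSize samples to one 64-bit value, so the number of results written into rx_buffers per frame determines how much data ble_thread actually has to send. It may be worth comparing that byte count between the two tests before attributing the whole difference to CPU or bus contention.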
  • Hi

    I needed to discuss this with a colleague before getting back to you. Thank you for your patience.

    When you keep the APP core busy with a lot of DSP math, as it seems you're doing here, you're not just burning APP core CPU cycles but also loading the shared memory bus and slowing down how fast Bluetooth events get handled. Even though the radio runs entirely separately on the NET core, all the higher-level Bluetooth actions (ACKs, GATT, HCI, etc.) still need the APP core to handle interrupts, move data over the IPC channel, and run the Zephyr BLE threads.

    So we think you need to take a closer look at the firmware design. Instead of doing the dot product directly in the RX callback or a high-priority thread, stash the raw samples in CCM/fast RAM. Put both the I2S buffers and the DSP I/O buffers in CCM instead of regular RAM. That should take far less bus time and make the DMA "happier". You can also start a lower-priority Zephyr work queue to crunch the numbers between BLE events and get better BLE throughput, if that's an option for you.

    Note that these are general suggestions from us on how to better handle this from a firmware perspective and that we don't have any code snippets to share.

    Best regards,

    Simon
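
The suggestion to stash raw samples first and process them later at lower priority can be sketched in portable C as a small frame queue. This is only an illustration of the pattern under stated assumptions: FRAME_SAMPLES, QUEUE_FRAMES, and the frame_queue_* names are hypothetical, and the code as written is not thread-safe. On the target, a kernel primitive such as a Zephyr message queue (k_msgq) or work queue would provide the synchronization between the recording context and the worker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FRAME_SAMPLES 60u   /* hypothetical frame length in samples */
#define QUEUE_FRAMES  8u    /* hypothetical depth, must be a power of two */

typedef struct {
    int16_t frames[QUEUE_FRAMES][FRAME_SAMPLES];
    uint32_t head;  /* advanced only by the producer */
    uint32_t tail;  /* advanced only by the consumer */
} frame_queue_t;

/* Producer side (time-critical path): copy the raw frame and return.
 * Returns false when the queue is full so the caller can count drops.
 * Not thread-safe as written; illustration of the data flow only. */
bool frame_queue_push(frame_queue_t *q, const int16_t *frame)
{
    assert(q != NULL && frame != NULL);
    if (q->head - q->tail == QUEUE_FRAMES) {
        return false;
    }
    memcpy(q->frames[q->head & (QUEUE_FRAMES - 1u)], frame,
           sizeof(q->frames[0]));
    q->head++;
    return true;
}

/* Consumer side (lower-priority worker): take the oldest frame, if
 * any, so the DSP work runs outside the time-critical path. */
bool frame_queue_pop(frame_queue_t *q, int16_t *frame_out)
{
    assert(q != NULL && frame_out != NULL);
    if (q->head == q->tail) {
        return false;
    }
    memcpy(frame_out, q->frames[q->tail & (QUEUE_FRAMES - 1u)],
           sizeof(q->frames[0]));
    q->tail++;
    return true;
}
```

In this arrangement the recording path only copies data, and the worker pops frames and runs the dot product whenever it gets CPU time. Counting failed pushes is a simple way to see whether the worker keeps up with the incoming audio.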
