Bluetooth throughput reduced with more calculations executed

Hi,

I am developing an acoustic platform on a custom board with an nRF5340 CLAA chip. The board records audio data from two microphones at a sampling rate of 100 kHz (controlled by I2S), and the recorded data is streamed out via Bluetooth. I tested the Bluetooth transmission by receiving the audio data from my custom board on an Android phone, and it reached around 800 kbps throughput. However, when I tried to process the data on the board, the throughput dropped significantly. More specifically, I tried to continuously calculate the cross-correlation between the recorded audio data and a given sequence with the arm_dot_prod_q15 function from the CMSIS-DSP library before streaming the data out via Bluetooth. In this case, the throughput dropped to only 40 kbps. Since the nRF5340 is a dual-core system, with the calculations done on the application core and the Bluetooth transmission done on the network core, I am confused about why executing more operations on the application core has such a significant impact on the Bluetooth transmission. Any advice would be appreciated. Thank you very much!

Best,

Ke

  • Hi

    I needed to discuss this with a colleague before getting back to you. Thank you for your patience.

    When you keep the APP core busy with a lot of DSP math, as it seems you're doing here, you're not just burning APP core CPU cycles; you're also loading the shared memory bus and slowing down how quickly Bluetooth events get handled. Even though the radio is completely separate on the NET core, the higher-level Bluetooth actions (ACKs, GATT, HCI, etc.) still need the APP core to handle interrupts, move data over the IPC channel and run the Zephyr BLE threads.

    So we think you need to take a closer look at the firmware design. Instead of doing the dot product directly in the RX callback or in a high-priority thread, stash the raw samples in CCM/fast RAM. Put both the I2S buffers and the DSP I/O buffers in CCM instead of regular RAM; that should take far less bus time and make the DMA "happier". You can also start a lower-priority Zephyr work queue to crunch the numbers between BLE events, which should give better BLE throughput, if that's an option for you.

    Note that these are general suggestions from us on how to better handle this from a firmware perspective and that we don't have any code snippets to share.
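
    That said, the hand-off idea can be illustrated with a rough, hardware-independent sketch in plain C. It is a single-threaded model, not firmware: the capture side only copies a block into a small queue, and a separate worker step does the dot products in bounded batches. All names here (capture_push, worker_drain, block_queue, BLOCK_LEN, QUEUE_DEPTH) are hypothetical, dot_prod_q15_ref is a portable stand-in for arm_dot_prod_q15, and the locking or atomics needed between an interrupt and a thread are deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_LEN   8   /* samples per block (hypothetical, tiny for the demo) */
#define QUEUE_DEPTH 4   /* blocks buffered between capture and DSP (hypothetical) */

/* Portable stand-in for CMSIS-DSP arm_dot_prod_q15(): sums the 16x16-bit
 * products into a 64-bit accumulator without any scaling. */
static int64_t dot_prod_q15_ref(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}

/* Single-producer / single-consumer queue of raw sample blocks.
 * Assumption: this model is single-threaded, so no synchronization is shown. */
struct block_queue {
    int16_t  blocks[QUEUE_DEPTH][BLOCK_LEN];
    size_t   head;     /* next slot the capture side writes */
    size_t   tail;     /* next slot the worker reads */
    size_t   count;    /* blocks currently queued */
    uint32_t dropped;  /* blocks discarded because the queue was full */
};

/* Models the RX callback: copy the block and return immediately.
 * No DSP work happens here, so the caller is never held up by it. */
static bool capture_push(struct block_queue *q, const int16_t *samples)
{
    if (q->count == QUEUE_DEPTH) {
        q->dropped++;              /* worker fell behind; count the loss */
        return false;
    }
    memcpy(q->blocks[q->head], samples, sizeof(q->blocks[0]));
    q->head = (q->head + 1) % QUEUE_DEPTH;
    q->count++;
    return true;
}

/* Models the low-priority worker: process at most max_blocks queued blocks,
 * then return so that other (higher-priority) work can run in between. */
static size_t worker_drain(struct block_queue *q, const int16_t *ref,
                           int64_t *results, size_t max_blocks)
{
    size_t done = 0;
    while (q->count > 0 && done < max_blocks) {
        results[done++] = dot_prod_q15_ref(q->blocks[q->tail], ref, BLOCK_LEN);
        q->tail = (q->tail + 1) % QUEUE_DEPTH;
        q->count--;
    }
    return done;
}
```

    In an actual Zephyr application, something like capture_push would be called from the I2S receive path and something like worker_drain from a work item handler running on a low-priority work queue. Capping max_blocks keeps each run short, and the dropped counter shows whether the DSP is keeping up with the capture rate at all.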

    Best regards,

    Simon
