Any ADPCM / Opus example available?

Hi, guys! I noticed this answer from the devzone.

I have nRF52 DK and an nRF Thingy. I want to play a short speech using I2S with MAX98357A.

The speech is usually 5 to 20 seconds long.

This doesn't need to be played in strict real time. I'm worried that playing the speech could drop the BLE/Mesh connection.

I'm trying to insert this speech file, which is an MP3 file, into the nRF52's flash memory using SEGGER ES 3.

My questions are,

0) Is C code for ADPCM/Opus included in the nRF5 SDKs?

1) Are there any examples that use I2S and ADPCM/Opus?

2) I don't know much about ADPCM/Opus. Can they play MP3 files? Or do I have to use different types of file format like WAV?

I have 200 speech files which are stored in an SD card.

But if this supports other files like MP3 which can decrease the file size, I'm planning to use nRF52840 and QSPI to read it from external flash memory.

-Thanks!

0 wpaul over 6 years ago

0) I didn't see any.

1) I didn't see any.

2) ADPCM, Opus and MP3 are all different audio encoding/compression algorithms. So no, if you have software to decode ADPCM and/or Opus, it will not also decode MP3.

Note that MP3 was considered proprietary but I think the patents all expired in 2017 so there may no longer be any strings attached.

Regardless of which codec or format you use, the idea is that you'll need software to decode the data back into raw audio samples and then use the I2S block to play them. (You also will need to connect an I2S chip like the Cirrus Logic CS4344 in order to hear the sound.) The trick will be a) getting the codec library code to compile into your nRF52 project, and b) getting it to decode the audio data and feed it to the I2S controller fast enough that there are no gaps in the playback. You'll basically be doing:

- read a chunk (from SD card)

- decode a chunk

- play a chunk

If the "decode a chunk" step is too slow, your playback will be choppy.
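The loop above can be sketched roughly as follows. This is a minimal sketch that runs on a PC, not nRF52 driver code: `stream_clip`, the three hook types, and the stand-ins `mem_read`, `raw_decode` and `count_play` are all hypothetical names made up for illustration. On real hardware the hooks would wrap the SD card driver, the codec library and the I2S driver, and the play hook would hand the buffer to the I2S DMA and return right away, which is why two PCM buffers are alternated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK_SAMPLES 256 /* arbitrary chunk size for this sketch */

/* Hypothetical hooks; on hardware they would wrap the SD card driver,
 * the codec library and the I2S driver. */
typedef size_t (*read_fn)(void *ctx, uint8_t *dst, size_t max_bytes);
typedef size_t (*decode_fn)(const uint8_t *src, size_t n_bytes, int16_t *dst);
typedef void (*play_fn)(void *ctx, const int16_t *pcm, size_t n_samples);

/* Read, decode and play until the source runs dry.
 * The decoder must emit at most CHUNK_SAMPLES samples per call.
 * Returns the number of samples played. */
size_t stream_clip(read_fn rd, void *rd_ctx, decode_fn dec,
                   play_fn play, void *play_ctx)
{
    uint8_t raw[CHUNK_SAMPLES * sizeof(int16_t)];
    int16_t pcm[2][CHUNK_SAMPLES]; /* ping-pong: fill one while the other plays */
    size_t total = 0;
    int cur = 0;

    assert(rd != NULL && dec != NULL && play != NULL);

    for (;;) {
        size_t n_bytes = rd(rd_ctx, raw, sizeof raw); /* read a chunk */
        if (n_bytes == 0)
            break;
        size_t n = dec(raw, n_bytes, pcm[cur]);       /* decode a chunk */
        play(play_ctx, pcm[cur], n);                  /* play a chunk */
        total += n;
        cur ^= 1;                                     /* switch buffers */
    }
    return total;
}

/* PC stand-ins so the loop can be exercised without hardware. */
struct mem_src { const uint8_t *data; size_t len; size_t pos; };

size_t mem_read(void *ctx, uint8_t *dst, size_t max_bytes)
{
    struct mem_src *s = ctx;
    size_t n = s->len - s->pos;
    if (n > max_bytes)
        n = max_bytes;
    for (size_t i = 0; i < n; i++)
        dst[i] = s->data[s->pos + i];
    s->pos += n;
    return n;
}

/* "Decoder" for raw little-endian 16-bit PCM: just reassembles samples. */
size_t raw_decode(const uint8_t *src, size_t n_bytes, int16_t *dst)
{
    size_t n = n_bytes / 2;
    for (size_t i = 0; i < n; i++)
        dst[i] = (int16_t)(src[2 * i] | (src[2 * i + 1] << 8));
    return n;
}

struct count_sink { size_t samples; };

void count_play(void *ctx, const int16_t *pcm, size_t n_samples)
{
    struct count_sink *k = ctx;
    (void)pcm; /* a real sink would start an I2S transfer here */
    k->samples += n_samples;
}
```

With raw PCM the decode step is nearly free. With a real codec it is the step whose worst-case time has to stay below the playback time of one chunk.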

I've experimented with the nRF52 I2S block and gotten it to work, but I decided to just convert all my audio files to raw I2S samples rather than try to put any decompression or decoding software on the nRF52 itself. I used the SOX audio tool for this, using a command like the following:

% sox --encoding signed-integer originalfile.mp3 outputfile.raw channels 2 rate 15625

This is obviously for stereo mode. I chose a sample rate of 15.625KHz because it yielded acceptable audio quality and was a rate that was easily selectable using the MCLK frequencies and LRCLK divisors that Nordic saw fit to provide in their I2S implementation. Basically, this produces 16-bit sample data, with 15625 samples per second. Since this is stereo that means there's 2 values per sample interval. You can estimate the file size from this. For example, say you have 20 seconds of audio:

15625 x 2 bytes per sample x 2 channels = 62500 bytes per second

62500 bytes per second x 20 seconds = 1250000 bytes

So your 20-second speech clip would be 1.25MB in uncompressed form. If you only want mono audio, it would be half that (625KB).
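The same estimate can be written as a small helper, which makes it easy to try other sample rates or mono. This is only a sketch of the arithmetic above; `pcm_clip_bytes` is a name made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of an uncompressed PCM clip. */
uint64_t pcm_clip_bytes(uint32_t sample_rate_hz, uint32_t bytes_per_sample,
                        uint32_t channels, uint32_t seconds)
{
    assert(channels == 1 || channels == 2);
    return (uint64_t)sample_rate_hz * bytes_per_sample * channels * seconds;
}
```

For example, `pcm_clip_bytes(15625, 2, 2, 20)` gives 1250000 and `pcm_clip_bytes(15625, 2, 1, 20)` gives 625000, matching the figures above.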

You say you plan to have about 200 files. Worst case, at 1.25MB per file, that's about 250MB. SD cards these days are often 8 or 16GB, so space should not be a problem. That's probably a lot for QSPI though (last time I looked, the largest chip I could find was 256MB).

In any case, using an uncompressed file might be handy at least for testing, so that you can figure out how to get audio from the SD card to the I2S controller. After that you can figure out how to incorporate a compression/decompression library.

0 Matthew K over 6 years ago in reply to wpaul

Thanks for your kind reply!

0) I apologize for my ignorance in audio;

2^16 = 65,536, 2^15 = 32,768, but where did the 15625 come from?

1) You mentioned using the CS4344 to hear the sound.

Does this mean the MAX98357A cannot produce sound the way the CS4344 does?

What is the key feature that CS4344 has compared to other products?

Thanks a million.

I heard in another post that the Nordic team is focusing on ADPCM, so I'm curious what they will answer.

0 wpaul over 6 years ago in reply to Matthew K

Look at the I2S section of the nRF52840 product specification document. You need to configure two things: the MCLK frequency, and the left/right clock (LRCLK) divisor, which yields the LRCLK frequency. Typically both the MCLK and LRCLK must be supplied to the I2S codec chip. (It looks like the MAX98357A doesn't need MCLK, but the CS4344 does.) Nordic does not let you choose arbitrary values for these things: you get to choose from a specific set of MCLK and LRCLK settings. There are some tables which show you the possible values. I chose values that yielded 15.625KHz for the LRCLK:

MCLK == 32MHz divided by 8 == 4MHz

LRCLK divisor == 256

4000000 / 256 == 15625

This effectively also sets the audio sample rate, so when you encode your audio samples, you have to do it at this same rate in order for them to play correctly.
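The clock arithmetic above can be checked with a small helper before committing to an encoding rate. This is a sketch of the division only and touches no registers; `i2s_lrclk_hz` is a made-up name. In the nRF52 I2S peripheral the two settings should correspond to the CONFIG.MCKFREQ and CONFIG.RATIO registers, but treat that as something to confirm in the product specification.

```c
#include <assert.h>
#include <stdint.h>

/* LRCLK (which is also the sample rate) from the source clock,
 * the MCLK divider and the MCLK/LRCLK ratio. Uses integer division,
 * so pick combinations that divide evenly. */
uint32_t i2s_lrclk_hz(uint32_t src_clk_hz, uint32_t mclk_div, uint32_t ratio)
{
    assert(mclk_div != 0 && ratio != 0);
    uint32_t mclk_hz = src_clk_hz / mclk_div;
    return mclk_hz / ratio;
}
```

`i2s_lrclk_hz(32000000, 8, 256)` reproduces the 15625 figure, which is the value to pass to sox as the `rate`.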

You can choose different values for a higher sample rate -- the higher the sample rate, the higher the maximum audio frequency you can sample. The max audio frequency is half the sample rate, so with a 15.625KHz sample rate, the audio frequency range tops out at 7812.5Hz. This is more than good enough for voice and 'ok' for music, though for music content you usually want 20KHz as the top end frequency. This is why 44.1KHz is used for sampling high quality audio (44.1KHz is a little more than two times 20KHz).

A higher sample rate means you end up with more samples per second though so you end up with bigger files. I chose 15.625KHz as a good compromise between audio quality and file size for my application.

The reason I used the CS4344 is because it's a stereo codec (it has two audio outputs, for left and right channel). However while the MAX98357A is monaural, it has a built-in amplifier. (My friend and I are planning to use an external stereo amplifier with the CS4344.) The I2S controller in the nRF52 supports both mono and stereo modes. With stereo, you have twice the audio samples, interleaved. The first sample is for the left channel and the second is for the right. The left and right channels are automatically routed to the correct outputs of the codec chip by the hardware.
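To make the interleaved layout concrete, here is a minimal sketch that builds a stereo buffer from separate left and right arrays. `interleave_stereo` is a hypothetical helper, not part of the driver linked below or of the nRF5 SDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build an interleaved stereo buffer: L0, R0, L1, R1, ...
 * `out` must have room for 2 * frames samples. */
void interleave_stereo(const int16_t *left, const int16_t *right,
                       size_t frames, int16_t *out)
{
    assert(left != NULL && right != NULL && out != NULL);
    for (size_t i = 0; i < frames; i++) {
        out[2 * i]     = left[i];  /* even slots: left channel */
        out[2 * i + 1] = right[i]; /* odd slots: right channel */
    }
}
```

Passing the same array as both `left` and `right` would duplicate a mono clip onto both channels.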

You can see the sample driver code that I wrote here:

https://github.com/netik/dc27_badge/blob/master/software/firmware/badge_840/i2s_lld.c

I used a couple of dirty tricks to make it work the way I wanted for my application, but it basically follows the same setup steps as documented in the nRF52840 manual.

The nRF52840 doesn't have any hardware tricks in it for handling ADPCM (as far as I know) so any work Nordic would do here would likely involve porting an ADPCM decoder library so that it builds with the SDK. ADPCM is probably simpler to implement than Opus meaning it would require less processing overhead, but I think Opus gives you better compression.
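To give an idea of how small an ADPCM decoder can be, here is a minimal sketch of one common variant, IMA ADPCM, which stores each 16-bit sample as a 4-bit code (about 4:1 compression). It is not taken from the nRF5 SDK. The names `ima_state`, `ima_decode_nibble` and `ima_decode` are made up for illustration, and the low-nibble-first byte order is an assumption that depends on the file format used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard IMA ADPCM step size table (89 entries). */
static const int16_t ima_step[89] = {
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17,
    19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118,
    130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658, 724, 796,
    876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358,
    5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767
};

/* Step index adjustment, selected by the low 3 bits of the code. */
static const int8_t ima_index_adj[8] = { -1, -1, -1, -1, 2, 4, 6, 8 };

struct ima_state {
    int32_t predictor; /* last decoded sample */
    int32_t index;     /* current position in ima_step */
};

/* Decode one 4-bit code into a 16-bit sample and update the state. */
int16_t ima_decode_nibble(struct ima_state *st, uint8_t nibble)
{
    assert(st != NULL && st->index >= 0 && st->index <= 88);

    int32_t step = ima_step[st->index];
    int32_t diff = step >> 3;
    if (nibble & 4) diff += step;
    if (nibble & 2) diff += step >> 1;
    if (nibble & 1) diff += step >> 2;

    st->predictor += (nibble & 8) ? -diff : diff; /* bit 3 is the sign */
    if (st->predictor > 32767)  st->predictor = 32767;
    if (st->predictor < -32768) st->predictor = -32768;

    st->index += ima_index_adj[nibble & 7];
    if (st->index < 0)  st->index = 0;
    if (st->index > 88) st->index = 88;

    return (int16_t)st->predictor;
}

/* Decode n_bytes of packed codes (two per byte, low nibble first,
 * which is an assumption). Writes 2 * n_bytes samples to out. */
size_t ima_decode(struct ima_state *st, const uint8_t *in,
                  size_t n_bytes, int16_t *out)
{
    for (size_t i = 0; i < n_bytes; i++) {
        out[2 * i]     = ima_decode_nibble(st, in[i] & 0x0F);
        out[2 * i + 1] = ima_decode_nibble(st, in[i] >> 4);
    }
    return 2 * n_bytes;
}
```

Each sample costs a table lookup, a few shifts and adds, and two clamps, which is why ADPCM is usually considered cheap to decode on a small MCU. At 4 bits per sample, the 625KB mono clip from the earlier estimate would shrink to roughly 156KB.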

0 Matthew K over 6 years ago in reply to wpaul

...

Now I remember. I saw CS4344 on STM32 Disco boards.

If it is fine to reveal it, can I ask what external stereo you will use?

wpaul said:
while the MAX98357A is monaural, it has a built-in amplifier.

For my case, I will only use one 3W speaker for the prototype. In this case, I won't need a stereo codec like you, right?

Or do you have any recommendation for alternative I2S amps to play songs which can be mounted on breadboards? The MAX98357A was the only one I could find online.

I'm waiting for the Nordic team's reply about ADPCM / Opus. Thanks for helping me while waiting for their answer.

-Thanks!

0 wpaul over 6 years ago in reply to Matthew K

When you say "external stereo" you mean what amplifier, right? I think we settled on the Texas Instruments LM4880. We're making a small board, so we're trying to use small speakers (Dayton Audio CM20-14M-8).

Unfortunately we're probably going to have to get a prototype board fabricated for testing rather than breadboard the amplifier since most parts are SMT only.

Right now for my test setup, I have the CS4344 breakout board stuffed into one of the headers on the nRF52840 DK board, and I have my computer speakers plugged into the jack so I can hear properly. It's messy, but at least I can tell that I've got my I2S code working right.