Edge Impulse Wrapper - without additional buffer

Hi,

we are currently trying to run an Edge Impulse model (keyword spotting, 600 ms window @ 16 kHz) on an nRF52832.

The problem is that the SoC's RAM is nearly full: it barely fits our audio slab buffer, the library itself, and the application code that is already on the device.

It seems that the wrapper requires an additional internal buffer to hold the samples. This buffer unfortunately doesn't fit, so we would like to point the wrapper directly at the audio buffer that already contains our raw samples.

I would be grateful if you could give us a hint on how to achieve that.

Thank you very much.

Edit 1: What I probably have to do is drastically reduce the "audio slab" size and use the wrapper's buffer to accumulate the data via the ei_wrapper_add_data() function.
What then becomes a problem, however, is that the wrapper uses a float array. I have space for my 600 ms window of int16 samples, but not for a float32 array of the same length :/
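For what it's worth, the arithmetic behind that RAM problem can be checked back-of-the-envelope (figures taken from the post, not from the wrapper source):

```c
#include <stdint.h>

/* Back-of-the-envelope RAM cost of one audio window.
 * Example figures from the post: 600 ms at 16 kHz. */
unsigned window_samples(unsigned rate_hz, unsigned ms)
{
    return rate_hz * ms / 1000u;
}

unsigned window_bytes(unsigned rate_hz, unsigned ms, unsigned bytes_per_sample)
{
    return window_samples(rate_hz, ms) * bytes_per_sample;
}
```

A 600 ms window at 16 kHz is 9600 samples: 19 200 B as int16 but 38 400 B as float32, and the float copy alone is already more than half of the 64 KB RAM of the common nRF52832 variant.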

Edit 2: It seems that the ei_wrapper_add_data() function only accepts inputs of the window length or larger. If I am not mistaken, that means I need space for at least 2x the window length (which I don't have). Only the following scenarios would work for me:

1. A small memory slab for the input audio that gradually fills the wrapper's buffer; when the buffer is full, run the classifier on the buffer data.
2. A single audio slab of total size equal to the window size; wait until the slab is full, then run the classifier on the slab data.

I don't need any continuous streaming functionality right now, so "fill buffer -> run inference -> fill buffer -> ..." would work for me.
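Scenario 2 could be sketched roughly like this. This is a schematic simulation only; on_audio(), classify_slab(), and the buffer names are placeholders I made up, not the real wrapper API:

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of scenario 2: a single slab of exactly one window.
 * WINDOW_SAMPLES assumes 600 ms at 16 kHz. */
#define WINDOW_SAMPLES 9600u

int16_t audio_slab[WINDOW_SAMPLES]; /* the only audio buffer */
static size_t fill;                 /* samples currently in the slab */
static unsigned inferences;         /* how many windows were classified */

/* Placeholder for "convert the slab and call the classifier". */
static void classify_slab(void) { inferences++; }

/* Called from the audio driver with each new chunk of samples. */
void on_audio(const int16_t *samples, size_t count)
{
    while (count--) {
        audio_slab[fill++] = *samples++;
        if (fill == WINDOW_SAMPLES) { /* slab full -> run inference */
            classify_slab();
            fill = 0;                 /* start filling the next window */
        }
    }
}

unsigned inference_count(void) { return inferences; }
```

Audio capture is simply paused (or its samples dropped) while classify_slab() runs, which matches the non-streaming "fill -> infer -> fill" requirement.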

  • Hi,

    The nRF52832 is, as you say, rather limited with respect to flash and RAM, so some action is required to optimize either the load or the general footprint. The suggestions you make seem sound, albeit constrained yet again by the available flash/RAM on the device.

    I'm curious: is the SoC choice something that comes from the scope of your project, i.e. you specifically want to create this Edge Impulse application on a RAM-limited device, or is it that no other boards are available?

    Kind regards,
    Andreas

  • Hi,
    we have a custom development board available right now that was not designed with AI applications in mind, so the SoC is under-dimensioned for many tasks that involve DSP + NN processing. However, since we need to work with this particular board, it would be great if we could still get the Edge Impulse model running on it somehow.

    Right now I am trying to change the wrapper to

    1. work with int16_t instead of float, and
    2. construct the necessary input signal to the run_classifier() function manually and call it directly.

    But I don't know how promising this approach is.
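    For point 2, one possible route: the underlying Edge Impulse SDK's run_classifier() pulls samples through a signal_t whose get_data callback it calls on demand, so the int16-to-float conversion can happen chunk by chunk instead of requiring a full window-sized float buffer. A sketch under that assumption (the SDK is C++ and its signal_t differs in detail; the plain C struct below just mimics the idea and is defined locally to keep the sketch self-contained, so verify it against the SDK headers):

    ```c
    #include <stddef.h>
    #include <stdint.h>

    #define WINDOW_SAMPLES 9600u          /* 600 ms at 16 kHz (assumed) */
    int16_t audio_slab[WINDOW_SAMPLES];   /* existing raw-sample buffer */

    /* Local stand-in for the SDK's signal_t: total length plus a
     * callback that fills a caller-provided float chunk. */
    typedef struct {
        size_t total_length;
        int (*get_data)(size_t offset, size_t length, float *out);
    } signal_t;

    /* Convert int16 samples to float on demand, chunk by chunk, so no
     * window-sized float array is ever allocated. */
    int slab_get_data(size_t offset, size_t length, float *out)
    {
        for (size_t i = 0; i < length; i++)
            out[i] = (float)audio_slab[offset + i];
        return 0;
    }

    void classify_slab(void)
    {
        signal_t signal = {
            .total_length = WINDOW_SAMPLES,
            .get_data = slab_get_data,
        };
        /* With the real SDK this would be something like:
         *   run_classifier(&signal, &result, false); */
        (void)signal;
    }
    ```

    The DSP pipeline then only ever needs a float scratch buffer of one chunk, not of the whole window.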

    Best regards, Nordix

  • I see, that makes sense.

    I will check in with my colleagues and see if they have any other suggestions you could investigate, but my impression is that your angle of attack is sane, as long as the changes you make (such as modifying ei_wrapper_add_data() to work with int16 instead of float) do not propagate into the rest of the API and/or library in unexpected ways. I can't guarantee that it will work, but right now I see no reason why it shouldn't.

    I'll get back to you later this week with a status on the input from that discussion.

    Kind regards,
    Andreas

    It's also worth looking for suggestions/comments on the Zephyr Discord and the Edge Impulse forums while waiting for input from us.

    Kind regards,
    Andreas

  • Thank you. I opened a topic in the Edge Impulse forum too:
    https://forum.edgeimpulse.com/t/c-library-input-datatype-int16-float/11792/3
    Looking forward to hearing back from you/your team!

  • Hi again,

    Here's some input from one of our engineers who's had a deeper dive into the EI SDK:

    ---

    What should be happening is that they have a ping-pong buffer to store one window (20 ms, I think, is EI's default window size) of raw incoming audio samples at 16 bit, another buffer, also window-sized but padded out to the next power of 2 (for calculating MFCCs), and a third buffer to accumulate MFCCs in.

    It appears that all of EI's signal processing is done in f32, so the second buffer will need to be f32, as will possibly the third if they don't quantize on the fly.

    If by slab buffer the OP means a buffer to hold 600 ms of audio, that is completely unnecessary (unless the CPU can't keep up). But I don't know EI well enough to say how to restructure this. I don't think trying to redo all the signal processing in int16 is going to work; the model is trained with the float implementation, so inference should be done with float as well.

    So the flow should be to:

    1. Capture one window (20 ms) of audio in the ping buffer.
    2. Switch DMA to the pong buffer (if it's a circular buffer, set the threshold interrupt to 50%).
    3. Extract MFCCs from the ping buffer and save them to the MFCC buffer.
    4. Repeat the above, alternating ping/pong, until 600 ms has been captured.
    5. Perform inference.

    This all depends on being able to calculate the MFCCs for a single window within one window's worth of time.
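    The five steps above could be simulated schematically like this (frame/window sizes from the description; dma_capture(), extract_mfcc(), and run_inference() are stand-ins for the real DMA driver and EI DSP/classifier calls, and real code would overlap DMA with processing via interrupts rather than run sequentially):

    ```c
    #include <stdint.h>

    #define FRAME_MS       20u                        /* one capture window */
    #define TOTAL_MS       600u                       /* full model window */
    #define FRAMES_PER_RUN (TOTAL_MS / FRAME_MS)      /* 30 frames per inference */
    #define FRAME_SAMPLES  (16000u * FRAME_MS / 1000u) /* 320 samples @ 16 kHz */

    static int16_t ping[FRAME_SAMPLES], pong[FRAME_SAMPLES];

    static unsigned frames_processed;
    static unsigned inferences_run;

    /* Placeholders for the real driver and EI calls. */
    static void dma_capture(int16_t *buf)       { (void)buf; }
    static void extract_mfcc(const int16_t *buf){ (void)buf; frames_processed++; }
    static void run_inference(void)             { inferences_run++; }

    void capture_and_classify(void)
    {
        int16_t *capture = ping, *process = pong;

        dma_capture(capture);                   /* 1. prime the first frame */
        for (unsigned f = 0; f < FRAMES_PER_RUN; f++) {
            int16_t *t = capture;               /* 2. swap: DMA fills one */
            capture = process;                  /*    buffer while we      */
            process = t;                        /*    process the other    */
            dma_capture(capture);
            extract_mfcc(process);              /* 3. MFCCs -> MFCC buffer */
        }                                       /* 4. repeat until 600 ms  */
        run_inference();                        /* 5. classify the MFCCs   */
    }

    unsigned frames(void) { return frames_processed; }
    unsigned runs(void)   { return inferences_run; }
    ```

    With this structure the only int16 audio RAM is two 320-sample frames (1280 B) instead of a 600 ms slab, plus the f32 FFT scratch and MFCC accumulation buffers.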

    A short-cut could be to use 8 kHz audio, which cuts the audio buffers in half. Most voice energy is below 4 kHz, and we've actually seen improvements using 8 kHz audio, I think because the MFCC comb filters are narrower. That's just anecdotal though.

    ---

    This will unfortunately be close to the limit of the support we can offer here, since our knowledge of the EI SDK and how it works is somewhat limited.

    Hope this is helpful (neither too obvious nor too abstract).

    Kind regards,
    Andreas
