Powering ultra-low-energy edge AI with custom Neuton models

Edge AI was once out of reach for tiny, battery-powered IoT devices, but not anymore. With the release of custom Neuton models, user-generated ultra-tiny ML models for CPU-run edge AI, Nordic is advancing the capabilities of all nRF54L Series SoCs. This makes them a viable option for a variety of AI tasks that were previously reserved for only the most powerful hardware. Custom Neuton models are built from your own data, collected using your own device and tailored to your specific use case, enhancing your competitive edge. Neuton models are ultra-tiny and highly efficient, making them the first choice for ultra-low-power AI on wireless IoT devices. In this blog post, we will introduce the technology and outline the process for creating custom Neuton models and integrating them into an nRF Connect SDK application.

Our take on edge AI

The term “edge AI” encompasses every application of AI algorithms and neural networks that run locally on the device, rather than in a centralized cloud server. This includes everything from edge servers and network infrastructure, through powerful PC hardware and smartphones, down to low-power IoT devices. For Nordic, edge AI means AI algorithms and neural networks running on our ultra-low-power SoCs. For the remainder of this blog, we will refer to this simply as “edge AI.”

Why Neuton models?

As a developer, the two biggest hindrances to using edge AI in your product are:

  1. ML models are too large for the memory of your chosen microcontroller.

  2. Creating custom ML models is an inherently manual process that requires a high level of data science expertise to do well.

Well, not anymore.

Generating custom Neuton models is made easy by the Nordic Edge AI Lab, our online tool for generating edge AI models at a fraction of the size of those produced by traditional frameworks, such as TensorFlow Lite. This is enabled by our patented neural network framework, which grows the network neuron by neuron, without user input. For developers, this means that all you need to train a highly optimized, fast, and accurate ML model is a good dataset. These custom Neuton models can run on any Nordic SoC, like our flagship nRF54L15, and are so efficient that they also fit well within the limitations of the most space-constrained ones, like the nRF52805, taking up only a few kilobytes of non-volatile memory (NVM). This makes it possible to add edge AI capabilities to applications where running TensorFlow Lite models either consumes too much memory, occupies all of your NVM, or is too slow and inefficient, hogging your CPU and draining your battery.

Which use cases are Neuton models suited for?

For wearables like smart rings and watches, developers can create custom Neuton models to detect activities such as walking, running, step counting, or sleeping, recognize hand or finger gestures for smart control or automation, and monitor biometrics, including heart rate, heart rate variability, and blood oxygen. In industrial environments, Neuton models can be built to detect anomalies in sensor data, enabling predictive maintenance.

In short, Neuton models support time-series data from sensors like accelerometers, IMUs, PPG sensors (photoplethysmogram, the blinking-LED heart rate sensor on the back of your smartwatch), temperature sensors, various electrical measurement sensors (for example, voltage measured using an ADC), and many more. As long as the output from the sensor can be sampled periodically as a value over time, the Nordic Edge AI Lab can accept it and create a model from it.

What makes Neuton models different from other AI models?

Frameworks for edge AI are numerous and have been around for a long time. One key pain point of LiteRT (aka TensorFlow Lite for Microcontrollers) and similar frameworks is that they still rely on the developer to know how to organize a neural network, selecting the number of neurons and the network depth manually, and then compressing and optimizing the model after the fact to make it fit on the desired target device. This approach yields models that are less efficient in terms of code size, execution speed, and power consumption, and that depend heavily on the skill and knowledge of whoever does the optimization work.

Automated model creation and training process

Neuton, on the other hand, handles all this automatically. Instead of statically defining the parameters of your network from the start, Neuton grows the network automatically, and for every new neuron, it evaluates if this improves the model's performance. Neurons that do not add value are immediately removed to conserve resources. This brings multiple benefits to the developer:

  1. No manual selection of neural network structure, parameters, or architecture

  2. No resource-intensive automatic neural architecture search (NAS)

  3. The smallest code size possible, with no need for compression or optimization

  4. Faster execution, which means lower power consumption

Easy integration

Neuton models are downloaded from the Edge AI Lab as a plain C library, with no external dependencies or special runtime requirements. They are ready to be integrated into any application running on the main application core (CPU) of the nRF52, nRF53, nRF54L, and nRF54H Series SoCs, or the nRF91 Series SiPs.

The integration of models is facilitated by the introduction of the Edge AI Add-On for nRF Connect SDK, which incorporates the nRF Edge AI API, enabling on-device inference from your application. In addition, the Add-On comes with three application examples, demonstrating how to interact with a Neuton model for three types of AI operations: regression, classification, and anomaly detection.

Can I use Neuton models with an NPU?

A Neural Processing Unit (NPU) is a dedicated processing core with an architecture designed specifically to run neural networks. Running your edge AI model on an NPU instead of the CPU increases speed and efficiency, which translates into reduced latency, more responsive applications, and lower power consumption. For these reasons, NPUs are often referred to as “AI accelerator” hardware.

NPUs tend to be more effective at accelerating larger and more complex neural networks. When the networks get smaller and more efficient, the benefits of accelerating them diminish. Neuton models are by default much smaller and more efficient than even the best-optimized TensorFlow Lite models, meaning that running a Neuton model on an NPU will not lead to any substantial improvement in its performance. That is why, when Nordic launches its first product with an integrated NPU in 2026, it will not support Neuton models but will instead rely on open frameworks like TensorFlow Lite, making it suitable for a range of more advanced AI use cases.

How to create a custom Neuton model

Data collection

Collecting a good dataset for training your model is easily the most involved part of building an edge AI application.

How much data is enough?

How many samples, with how many data points, from how many test subjects depends heavily on your use case and what stage of development you are in. For a simple internal proof of concept (POC) of a gesture recognition (i.e., classification) use case, you might only need one 2-to-5-minute sample from a few different test subjects for each class. Additionally, idle and random data classes are necessary to filter out any movement not associated with a specific class, which would otherwise lead to “overfitting” the model.

For a more representative prototype, 5 to 10 minutes of samples for each gesture would most likely be needed. In addition, these samples should be taken from at least 5 to 10 different test subjects, to cover some level of natural variation.

When creating a final production model for use in a commercial product, the requirements for a high-quality dataset increase significantly. A good dataset for a production-quality model usually requires a representative selection of the population, meaning from 20 to hundreds of test subjects. However, 5 to 10 minutes of samples from each of the test subjects should still be enough.

Key considerations when collecting data

Collect all samples using the same sampling frequency. The frequency to select depends on the sensor you have and the pattern you are trying to recognize.

In a single session, only one sample should be created. For a gesture recognition use case, this means one person performing one gesture for the entire timespan, with no interruptions. The first and last few seconds of the sample should also be excluded to maintain consistency throughout the whole sample.

The samples are collected as CSV files, containing the sensor's outputs in separate columns, along with an identifier column that indicates which class the sample is associated with. Each row in the CSV will represent one data point, so when sampling at, for example, 50 Hz, you will have 50 rows per second of data collection. All samples are then concatenated into a single CSV file after being collected separately, collectively making up the dataset. The first row of the dataset CSV should contain a header that describes what data is stored in each column.
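As a sketch, a dataset for a three-axis accelerometer sampled into labeled classes might look like the fragment below. The column names and class labels here are illustrative assumptions, not a required naming scheme:

```csv
accel_x,accel_y,accel_z,label
0.012,-0.981,0.034,wave
0.018,-0.975,0.041,wave
0.021,-0.968,0.039,wave
-0.002,0.132,0.967,idle
-0.001,0.129,0.970,idle
```

Each row is one data point at the chosen sampling frequency, and the `label` column identifies the class the sample belongs to.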

For discrete data, like non-continuous gesture recognition, it is also important that the data is centered within the sampling windows.

Edge AI Lab workflow

Uploading data

When you have collected a good dataset, navigate to ai.lab.nordicsemi.com and log in. Create a new solution and select the task type. Three task types are currently supported: regression, classification, and anomaly detection. A classification model is a supervised learning model, meaning it is trained on samples of labeled data, where each label represents a distinct class. The trained model will then identify which class any new data belongs to; gesture recognition is one example. Anomaly detection models are a type of unsupervised learning model. They are trained on a set of data that does not contain anomalies, and the trained model will identify any data that falls outside the normal range. An example of this is predictive maintenance, where the model is trained only on data from working machines but is tasked with identifying machines that are about to fail. Regression models are also supervised, trained on labeled data to predict the value of a continuous variable, such as temperature.

Drag and drop your CSV-formatted dataset to upload it. After uploading, the interface will display an overview of the identified data in your dataset. For regression and classification models, you select the “target column”, the column that acts as the label for the labeled data; for an anomaly detection model, the data is unlabeled, so no target column is used.

Preprocessing and model parameters

After uploading the data, you will proceed to the “training pipeline”. For most projects, it is necessary to enable signal processing and feature extraction here. Signal processing involves selecting your window, which determines the amount of input data used for each inference result. This also sets the minimum interval between outputs from your inference operation. If you, for example, want to print a new result every second, you either input your sampling frequency and set the window to one second, or you input the number of rows in your dataset that equates to one second. Most important, however, is that you select a window size that can capture the event you are trying to classify in its entirety. You can also select overlapping windows, or a time gap between windows, by adjusting the sliding shift.

The feature extraction settings are used to select which features from your dataset will be used for training the model. Enabling “feature selection” allows the framework to automatically disregard features that do not contribute to the model's accuracy, thereby reducing the model's size and improving its efficiency. Note that if any frequency-domain features are selected, your window size must be a power of two between 128 and 2048 samples.

“Data type” will be automatically detected based on the content of your dataset. You can manually override this, but remember that you need to select the option that encompasses the “largest” data type you use. That is, if you have both int8 and float32 data in your dataset, you need to select float32; if you have int8 and int16, you need to select int16.

The final step before training is to select the model settings and target hardware. You once again select which task you are performing: regression, binary classification or multi-class classification.

The evaluation metric box lets you select the main view in the next step. However, all the metrics are always calculated, so you can easily switch which metric you want to evaluate on after the training has been completed.

By default, the “weight and coefficients” setting will match your input dataset. It determines the format in which the weights and coefficients will be stored in your model. This can be manually overridden, and selecting a lower bit depth can help minimize your final model size. For quantization settings, select the option matching your weights and coefficients. Output settings determine the format in which the probability score of each inference result is given. Floating point will provide you with the probability directly (0 to 1), while 8-bit and 16-bit will give an integer value within the range of the data type (0 to 255 and 0 to 65,535, respectively) that can be converted to a percentage in your application if needed. For most projects, it is not necessary to set any training stop options. However, there are options to set limits for the maximum time used for training, as well as the maximum accuracy of the model.

For projects using current hardware, you will select Arm Cortex-M33 as your target. This is the application core found inside all the nRF91 Series SiPs, as well as the nRF53 Series and nRF54 Series SoCs. The Arm Cortex-M4 option is available for the nRF52 Series, and if you’re playing with a dusty old nRF51 Series development kit, you might want to select Arm Cortex-M0 as your target. This is not at all recommended for any new designs, though.

Then, click the “Start Training” button; you can optionally receive an SMS notification when the training is complete.

Evaluating the resulting models

After training has completed, the “solution options” view allows you to select which of the generated model variants you wish to download for your project. All generated model variants are displayed on a chart, with accuracy on the X-axis and model size on the Y-axis. Often, you can get a big reduction in model size by selecting a variant that is only slightly less accurate than the most accurate, and usually largest, variant. The estimated accuracy, as well as the RAM and NVM usage of the selected variant, is displayed to the right of the chart. There is also a section with the model quality chart, feature importance, and confusion matrix, providing more in-depth analytics of the selected model variant for those with the skills and experience to interpret such information.

Integrating a Neuton model in your application

Edge AI Add-On

To ease the integration of Neuton models in your application, the Edge AI Add-On is available for the nRF Connect SDK. The add-on contains four main components: the nRF Edge AI Library and three samples covering regression, classification, and anomaly detection.

The nRF Edge AI Library contains the nRF Edge AI Runtime module, a C library with interfaces for initializing and running custom Neuton models generated in Edge AI Lab. It also contains the DSP module for data processing and a separate neural network module, which can run both floating-point and quantized AI models based on Neuton or other frameworks.

Installing the add-on

Go to https://nrfconnect.github.io/ncs-app-index/ to find the Edge AI Add-On from Nordic Semiconductor. Click the blue “Open for nRF Connect for VS Code” button, then click the pop-up in the browser to open VS Code. When prompted in VS Code, allow the nRF Connect for VS Code extension to open the URL. There are also manual ways of installing the Add-On, using either west on the command line or the nRF Connect for VS Code GUI; see the documentation for a detailed description.

Building an edge AI application

When building a new application, the easiest method is to use one of the firmware samples from the SDK as a reference and a starting point. This will show you how to call the relevant APIs to feed input data to, and decode the outputs from, a Neuton model. Three simple function calls handle these central tasks, all of which are part of the nRF Edge AI library’s nRF Edge AI runtime module:

  • nrf_edgeai_init() - Initializes the edge AI model’s runtime environment

  • nrf_edgeai_feed_inputs() - Feeds the raw data from your sensor to the Neuton model

  • nrf_edgeai_run_inference() - Performs inference on the input data, returning the predictions corresponding to your model type, and metrics for assessing the confidence of the predictions.

If you are completely new to the nRF Connect SDK, it might be worth checking out the nRF Connect SDK Fundamentals course in the Nordic Developer Academy, a self-paced, hands-on online course covering the essentials of firmware development with the nRF Connect SDK.

Closing

By introducing custom Neuton models, Nordic makes edge AI available to a wider audience. Gone is the daunting task of manually building neural networks in TensorFlow with complicated Python toolchains, and then manually optimizing, compressing, and converting the TensorFlow network into a LiteRT-compatible model.

Embedded developers can now go directly from data collection to application integration, with little to no data science knowledge required. Additionally, the ultra-efficient models can run on the CPU of any Nordic SoC, regardless of memory size. This means that no matter your skill set or which component you are using, custom Neuton models could be an option for you.

This blog post merely scratches the surface; we encourage you to dive deeper into the Neuton models documentation and related SDK samples. Happy modeling!