Embedded software for IoT applications often requires a high degree of optimization, especially around fitting the image into constrained Flash/RAM space and reaching tight power consumption targets.
In 2018 Nordic released the nRF Connect SDK, initially with support for cellular IoT and later adding support for all other wireless technologies within Nordic's product portfolio such as Bluetooth Low Energy, Thread, Matter and most recently Wi-Fi. The nRF Connect SDK is the recommended and future-proof solution for all new projects based on Nordic products.
Introduction
Customers often ask why Nordic created the nRF Connect SDK, and why it is built around Zephyr RTOS. Although that is outside of the scope of this blog, the key driver was the need to have a scalable and unified SDK platform that can support a growing number of hardware and wireless technologies, with a high degree of configurability and code re-use. More details and insights on this topic can be found here:
- DevZone blog post nRF Connect SDK and nRF5 SDK statement
- Embedded World 2021 session Understand the nRF Connect SDK
- Zephyr Developer Summit 2022 Keynote: Zephyr as the Foundation for a Microcontroller SDK
- Webinar Future-proofing IoT development with the nRF Connect SDK
The most common technical questions and concerns that customers have about nRF Connect SDK, especially those coming from nRF5 SDK, relate to resource needs (Flash and RAM) and power consumption. Because nRF Connect SDK is based on an RTOS, it is often prematurely concluded that the application firmware will be larger and more power hungry. While that can be the case for some applications, in the vast majority of applications both firmware size and power consumption can be on par with, or even better than, what is achieved with the nRF5 SDK. The main reasons are the modularity and scalability of nRF Connect SDK, which enable an extremely high level of customization and optimization of software components, and the low-power nature of Zephyr RTOS, which was created primarily for ultra-low power and resource-constrained devices.
In this blog post we will go through a non-exhaustive technical analysis of nRF5 SDK and nRF Connect SDK with regard to memory footprint and power consumption, to help demystify these misconceptions. While doing so, we aim to cover some fundamental differences between the SDKs and give important guidance on how you can optimize nRF Connect SDK based applications.
Hardware and software
The information and data used to create this blog post were obtained using nRF5 SDK v17.1.0, nRF Connect SDK v2.2.0 and nRF52 DK (target nrf52dk_nrf52832).
Memory footprint
Going from a bare-metal implementation to an RTOS-based implementation brings additional Flash and RAM needs, but how much, and can it be reduced?
Starting with the basics - Blinky
Every developer starts learning a new programming language by writing a "Hello World" application. In the embedded space the equivalent is "Blinky", an application that simply toggles a pin at a set interval, which in turn drives an LED ON and OFF.
Both nRF5 SDK and nRF Connect SDK have their own Blinky, documented here and here respectively. Each sample includes only the bare minimum functionality needed to blink an LED, for which a wireless stack is not necessary.
If we build these two samples out of the box these are the figures we get for Flash and RAM from the build log, in KB.
SDK | Flash (KB) | RAM (KB) |
---|---|---|
nRF5 SDK | 2.0 | 4.0 |
nRF Connect SDK | 18.9 | 5.3 |
Taking these figures at face value, one would conclude that the humble Blinky needs roughly 9x more Flash on nRF Connect SDK than on nRF5 SDK. It is no surprise that a bare-metal Blinky requires less Flash and RAM than an RTOS-based one, and such a simple application is where the RTOS overhead is most noticeable. For larger applications the RTOS impact relative to the size of the entire firmware will be much smaller, and in many cases almost negligible, especially considering all the functional benefits that it brings for developers.
However, this simple example also shows one of the key differences between nRF5 SDK and nRF Connect SDK, the fact that the samples on nRF Connect SDK are optimized for debugging, not for size or power consumption. In practice, the samples have features enabled by default that are not necessary for a production build, such as CONFIG_DEBUG that builds a kernel suitable for debugging, and CONFIG_BOOT_BANNER that outputs a banner to the console device during boot up.
The nRF Connect SDK documentation acknowledges that optimization is a fundamental part of application development, and dedicates a section to this topic, both for memory footprint as well as power consumption. So let's go ahead and apply some memory footprint optimizations on the Blinky sample and see where it gets us.
Optimizing Blinky memory footprint
A handy Minimal footprint sample offers a reference point for reducing ROM footprint. It contains multiple project configuration files that enable and disable various pieces of functionality. The project file common.conf is the baseline for a minimal ROM implementation. As a first step we will copy those configurations into the Blinky prj.conf file, with one exception: common.conf sets CONFIG_GPIO=n, but Blinky requires the GPIO driver, so we set CONFIG_GPIO=y instead. The resulting prj.conf looks as follows.
CONFIG_GPIO=y
# Drivers and peripherals
CONFIG_I2C=n
CONFIG_WATCHDOG=n
CONFIG_PINCTRL=n
CONFIG_SPI=n
CONFIG_SERIAL=n
CONFIG_FLASH=n
# Power management
CONFIG_PM=n
# Interrupts
CONFIG_DYNAMIC_INTERRUPTS=n
CONFIG_IRQ_OFFLOAD=n
# Memory protection
CONFIG_THREAD_STACK_INFO=n
CONFIG_THREAD_CUSTOM_DATA=n
CONFIG_FPU=n
# Boot
CONFIG_BOOT_BANNER=n
CONFIG_BOOT_DELAY=0
# Console
CONFIG_CONSOLE=n
CONFIG_UART_CONSOLE=n
CONFIG_STDOUT_CONSOLE=n
CONFIG_PRINTK=n
CONFIG_EARLY_CONSOLE=n
# Build
CONFIG_SIZE_OPTIMIZATIONS=y
Now we can trigger a build and observe that the Flash use has decreased to 14.4 KB, a reduction of over 4 KB compared to the initial figures.
By default, our nRF Connect SDK Blinky application has two threads, main and idle. For such a simple application we can live without multithreading, which allows us to optimize further by disabling it. This is not a generally applicable optimization, as most samples require multithreading to run wireless stacks and other functionality. Multithreading can be disabled by adding these configuration options at the end of the prj.conf file.
CONFIG_MULTITHREADING=n
CONFIG_KERNEL_MEM_POOL=n
Because the idle thread is no longer available we need to make a small modification in main.c to replace k_msleep with k_busy_wait.
/* 1000 msec = 1 sec */
#define SLEEP_TIME_MS 1000
/* 1000000 usec = 1 sec */
#define SLEEP_TIME_US 1000000
(...)
    while (1) {
        ret = gpio_pin_toggle_dt(&led);
        if (ret < 0) {
            return;
        }
        /* k_msleep() needs the scheduler, so it is no longer available: */
        //k_msleep(SLEEP_TIME_MS);
        /* Use k_busy_wait for 1 second */
        k_busy_wait(SLEEP_TIME_US);
    }
Now if we build the project we get 10.35 KB of Flash, a further reduction of about 4 KB. As a bonus we can observe that RAM has also gone down to 4.75 KB. To reduce the RAM further we can decrease the size of some of the buffers. The default size for the ISR stack is 2048 bytes; we can reduce it to 1024 by adding this configuration option at the end of the prj.conf file.
CONFIG_ISR_STACK_SIZE=1024
A new build shows that the RAM required is now 3.75 KB (1 KB less, matching the 1024 bytes removed from the ISR stack), which is less than what is required by the equivalent nRF5 SDK sample.
The table below summarizes the figures from nRF5 SDK and nRF Connect SDK, before and after optimizing the Blinky sample.
SDK | Configuration | Flash (KB) | RAM (KB) |
---|---|---|---|
nRF5 SDK | | 2.0 | 4.0 |
nRF Connect SDK | Without optimizations | 18.9 | 5.3 |
nRF Connect SDK | With optimizations | 10.35 | 3.75 |
Through this optimization example we have demonstrated that by leveraging the modularity and configurability of nRF Connect SDK, the perceived overhead of having an RTOS is smaller than expected. Although simple, these and other optimizations can be applied to any application based on nRF Connect SDK so that you can reach the desired resource consumption targets.
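To put the Blinky figures in relative terms, the minimal C sketch below computes the percentage saved between two builds. The helper name reduction_percent is hypothetical and not part of either SDK; the inputs are simply the KB figures reported in this post.

```c
#include <assert.h>

/* Percentage saved when a footprint shrinks from before_kb to after_kb.
 * Hypothetical helper for illustration only; not part of either SDK. */
double reduction_percent(double before_kb, double after_kb)
{
    assert(before_kb > 0.0);
    return (before_kb - after_kb) / before_kb * 100.0;
}
```

With the nRF Connect SDK Blinky figures, reduction_percent(18.9, 10.35) gives roughly 45% for Flash and reduction_percent(5.3, 3.75) roughly 29% for RAM.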
Adding wireless - Beaconing
While Blinky is a good example to demonstrate some of the basics, it is not a real-world application use case. Nordic's customers are developing cutting-edge IoT applications, so some form of wireless connectivity is being used. One of the most common use cases for Bluetooth Low Energy applications is beaconing, which is simply sending out information through advertisements.
Both nRF5 SDK and nRF Connect SDK have beaconing samples, documented here and here respectively. If we build those samples without modifications we get the following results.
SDK | Flash (KB) | RAM (KB) |
---|---|---|
nRF5 SDK | 109.8 | 8.8 |
nRF Connect SDK | 89.5 | 20.4 |
Now the result is more interesting and perhaps unexpected. The nRF Connect SDK sample requires less Flash than the nRF5 SDK equivalent. The nRF5 SDK sample was built with the S112 SoftDevice, which is a memory-optimized, peripheral-only Bluetooth Low Energy protocol stack. So how can an RTOS-based application with the exact same functionality as a bare-metal implementation require significantly less Flash? There are two key reasons.
The first reason is that the nRF5 SDK uses the SoftDevice stack, which is a binary blob, meaning that the entire binary gets linked into the final image regardless of which features are actually used by the application. If the application is only sending out beacons then it uses neither the peripheral role functionality nor GATT. But those features are included in the SoftDevice binary regardless, and they take up Flash space even if unused by the application.
The second reason goes back to the configurable nature of nRF Connect SDK that was described and demonstrated earlier. If the peripheral role (or any other arbitrary piece of functionality) is not required, it can simply be left out of the build. A quick look into the .config file in the build\zephyr folder sheds some light on this.
# CONFIG_BT_RPC is not set
# CONFIG_BT_RPC_STACK is not set
# CONFIG_BT_CENTRAL is not set
# CONFIG_BT_PERIPHERAL is not set
# CONFIG_BT_OBSERVER is not set
CONFIG_BT_BROADCASTER=y
# CONFIG_BT_EXT_ADV is not set
As seen here, only the broadcaster role is configured, while the central, peripheral and observer roles are not. This allows the build to pull in only the pieces of functionality that are really required for the application while leaving out the rest, resulting in a more optimized image size than was possible with nRF5 SDK.
Several samples in nRF Connect SDK have a minimal project configuration file that can be used instead of the default one. When supported this will also be mentioned in the sample documentation, as for example here for the Bluetooth: Peripheral LBS sample.
Optimizing Beacon memory footprint
Although we get a good Flash figure for the beacon sample in nRF Connect SDK out of the box, the RAM is still 2x higher than on nRF5 SDK and even within the Flash itself there is room for optimizations. Let's repeat the same exercise as when optimizing Blinky and bring over the configurations from the minimal sample, which results in the following prj.conf file.
CONFIG_BT=y
CONFIG_BT_DEBUG_LOG=y
# Drivers and peripherals
CONFIG_I2C=n
CONFIG_WATCHDOG=n
CONFIG_PINCTRL=n
CONFIG_SPI=n
CONFIG_SERIAL=n
CONFIG_FLASH=n
# Power management
CONFIG_PM=n
# Interrupts
CONFIG_DYNAMIC_INTERRUPTS=n
CONFIG_IRQ_OFFLOAD=n
# Memory protection
CONFIG_THREAD_STACK_INFO=n
CONFIG_THREAD_CUSTOM_DATA=n
CONFIG_FPU=n
# Boot
CONFIG_BOOT_BANNER=n
CONFIG_BOOT_DELAY=0
# Console
CONFIG_CONSOLE=n
CONFIG_UART_CONSOLE=n
CONFIG_STDOUT_CONSOLE=n
CONFIG_PRINTK=n
CONFIG_EARLY_CONSOLE=n
# Build
CONFIG_SIZE_OPTIMIZATIONS=y
A new build gives us 86.4 KB of Flash (-3.1 KB) and 20.3 KB of RAM (-0.1 KB). The reduction in Flash is similar to what was obtained with Blinky, but because the overall firmware image is larger, the optimizations have a relatively smaller impact. The Bluetooth stack requires multithreading to be enabled, so disabling it is not an option for this project. The firmware size already allows the application to fit into the smallest nRF52 devices (nRF52810 or nRF52805), which have 192 KB of Flash.
The next thing we can do is to disable the Bluetooth debug log by setting CONFIG_BT_DEBUG_LOG=n at the top of the project file. The debug log is no longer useful because serial communication has been disabled (CONFIG_SERIAL=n), and removing it should reduce both Flash and RAM. After triggering a new project build we get 69.2 KB of Flash (-17.2 KB) and 17.99 KB of RAM (-2.31 KB).
We can still go lower on the RAM by adjusting the sizes of various stacks while ensuring that our application stays functional. To do so we need to evaluate stack sizes for various threads which can be easily accomplished with nRF Debug Thread Viewer. If we allow the application to run for a few seconds this is what the Thread Viewer shows regarding stack usage for each thread.
It is clear that the default stack sizes are over-dimensioned for the beacon sample and there is room to reduce RAM. We can override the default stack sizes by adding the following options to the project configuration file.
CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=512
CONFIG_IDLE_STACK_SIZE=64
CONFIG_MAIN_STACK_SIZE=256
CONFIG_PRIVILEGED_STACK_SIZE=0
CONFIG_MPSL_WORK_STACK_SIZE=256
CONFIG_ISR_STACK_SIZE=128
CONFIG_BT_HCI_TX_STACK_SIZE_WITH_PROMPT=y
CONFIG_BT_HCI_TX_STACK_SIZE=1024
After building with these additional configurations we get no change in Flash but RAM goes down to 12.3 KB (-5.69 KB).
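One way to turn the peak stack usage reported by the Thread Viewer into a configured stack size is to add some headroom and round up to the stack alignment. The C sketch below illustrates the idea under stated assumptions: the helper name stack_size_with_headroom and the headroom percentage are hypothetical choices for illustration, not values taken from the nRF Connect SDK, and the 8-byte alignment reflects the Arm Cortex-M procedure call standard.

```c
#include <assert.h>
#include <stddef.h>

/* Stack alignment assumed for Arm Cortex-M (AAPCS), in bytes. */
#define STACK_ALIGN 8u

/* Hypothetical helper: take the peak stack usage observed at runtime,
 * add a safety margin in percent, and round up to the next multiple
 * of STACK_ALIGN. The margin is a judgment call for each application. */
size_t stack_size_with_headroom(size_t peak_used_bytes, unsigned headroom_percent)
{
    assert(headroom_percent <= 100u);
    size_t padded = peak_used_bytes + (peak_used_bytes * headroom_percent) / 100u;
    return (padded + STACK_ALIGN - 1u) / STACK_ALIGN * STACK_ALIGN;
}
```

For example, a thread with an observed peak of 200 bytes (a made-up figure) and 25% headroom would get a 256-byte stack, which would then go into the matching stack size option in prj.conf. Peak usage observed over a few seconds may not cover every code path, so the margin should be chosen conservatively and the application retested.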
The table below summarizes the figures from nRF5 SDK and nRF Connect SDK, before and after optimizing the Beacon sample.
SDK | Configuration | Flash (KB) | RAM (KB) |
---|---|---|---|
nRF5 SDK | | 109.8 | 8.5 |
nRF Connect SDK | Without optimizations | 89.5 | 20.4 |
nRF Connect SDK | With optimizations | 69.2 | 12.3 |
As with the Blinky optimization, we have demonstrated how even a simple wireless sample can be optimized for memory footprint with the help of nRF Connect for VS Code, which offers advanced debugging tools that give additional insights into various firmware metrics.
Power consumption
Reducing power consumption is of extreme importance for IoT applications. Whether to meet regulatory targets or reach battery life goals, using less energy generally has a positive impact.
When moving from bare-metal to an RTOS-based implementation, there is a concern that power consumption will tick upwards. The traditional thinking is that there is more code to run (for example, the RTOS scheduler), which means more CPU cycles and therefore power consumption overhead. While this is technically true, the question is whether it matters. For one thing, computing efficiency continues to increase as devices get smaller, faster, and more integrated. Most importantly, the RTOS also offloads code from the application, for example scheduling and power management, which not only allows developers to write cleaner and more optimized applications but also accelerates the development cycle and time-to-market.
It is also true that for IoT applications, which implies wireless connectivity, the biggest contributor to the overall power footprint is the radio active power consumption (RX/TX), which is independent of whether the firmware is bare-metal or RTOS based. This means that a small increase in CPU cycles will have a potentially negligible effect on the overall power consumption, all else being equal.
On low power wireless devices (e.g. Bluetooth LE heart rate sensors, Thread sleepy end devices for smart home), the overall power consumption is typically a balancing act between sleep mode and active radio power consumption. In other words, it's about reducing the radio duty cycle as much as possible, not necessarily reducing CPU cycles. In fact, a power-optimized application may choose to run more code while the radio is active so that it doesn't need to wake up separately just to run some code, as waking up from sleep mode has a penalty in itself.
It is not straightforward to do an apples-to-apples comparison of power consumption between SDKs, as the samples have been written in different ways to be best adapted to each SDK and, as mentioned earlier, the nRF Connect SDK samples are not optimized for power. Taking once again the beacon sample from nRF Connect SDK as an example, one would be unpleasantly surprised to measure the sleep current out of the box and find that it's over 1 mA.
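The balance between sleep current and radio activity can be captured with a simple first-order model: the average current is the sleep current plus the charge spent per radio event divided by the event interval. The C sketch below is only an illustration of that relationship; the function name average_current_ua is hypothetical and the model ignores effects such as wake-up transients and regulator efficiency.

```c
#include <assert.h>

/* First-order average current model for a duty-cycled radio device.
 *   sleep_ua        : current drawn between radio events, in uA
 *   event_charge_uc : charge consumed by one radio event, in uC
 *   interval_s      : time between radio events, in seconds
 * Since 1 uC per second equals 1 uA, the result is in uA.
 * Hypothetical helper for illustration; not an SDK API. */
double average_current_ua(double sleep_ua, double event_charge_uc, double interval_s)
{
    assert(interval_s > 0.0);
    return sleep_ua + event_charge_uc / interval_s;
}
```

With made-up inputs of 2 uA sleep current and 10 uC per event, an interval of 0.2 s gives about 52 uA while an interval of 0.4 s gives about 27 uA, which shows why the radio duty cycle dominates the power budget far more than a handful of extra CPU cycles.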
This elevated current between beacons is caused by serial logging being enabled by default, as the samples are optimized for debugging. This is noted at the very top of the power optimization section in the nRF Connect SDK documentation. If we follow the guidance in the documentation and simply add CONFIG_SERIAL=n to the project configuration file, the device can go into sleep mode between beacons, which takes the power consumption down to single-digit microamp level.
To get comparable data between nRF5 SDK and nRF Connect SDK we can look at parts of samples that are dominated by radio activity to observe the potential impact of an RTOS compared to bare-metal. A typical Bluetooth LE peripheral device is either advertising or connected to a central device, and it can be in either one of those two states for long periods of time. For low power devices it is most common to look at average power consumption to get a good sense of the power budget and to be able to estimate, for example, battery size and/or lifetime. Averaging also helps eliminate fast power transients that may influence readings when looking at short traces.
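As a rough illustration of how an average current reading translates into battery lifetime, the C sketch below divides the battery capacity by the average current. The function name battery_life_hours is hypothetical, and the estimate is idealized: it assumes the full nominal capacity is usable and ignores self-discharge, temperature and peak-current effects.

```c
#include <assert.h>

/* Idealized battery lifetime estimate.
 *   capacity_mah   : nominal battery capacity, in mAh
 *   avg_current_ua : average current drawn by the device, in uA
 * Returns the lifetime in hours (mAh * 1000 = uAh; uAh / uA = h).
 * Hypothetical helper for illustration; not an SDK API. */
double battery_life_hours(double capacity_mah, double avg_current_ua)
{
    assert(avg_current_ua > 0.0);
    return capacity_mah * 1000.0 / avg_current_ua;
}
```

Using illustrative inputs of a 220 mAh battery and an average current of 50 uA, the estimate is 4400 hours, or a little over 183 days. Real-world lifetime will be lower, so this kind of calculation is best used for comparing configurations rather than as a guarantee.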
Let us then measure the power consumption during advertisements and connections using samples from both SDKs. On all the below power consumption comparisons the nRF5 SDK measurements are on the left, and nRF Connect SDK measurements are on the right.
Advertising
For measuring advertisement power consumption we took the Beacon samples that were used earlier in this blog post for memory footprint comparison. The samples simply transmit non-connectable advertisements at regular intervals. Before measuring average power consumption, we need to ensure that all relevant behavior between the samples is equal. In this particular case, the TX power has been configured on both samples to be the same, as well as the data payload. We need to ensure that the advertisement interval is also the same, as that has the highest impact on average power consumption. Using the PPK2 we can measure that both samples are advertising at 200ms intervals.
To get a reading for average power consumption we select a trace of ~6 s duration. As seen below, the nRF Connect SDK sample draws less than 1 uA more than the nRF5 SDK sample, 57.58 uA vs. 56.78 uA.
Connection
For the connection measurement, we took the Bluetooth Blinky application, which allows an LED to be controlled with the nRF Blinky mobile application. The sample is documented here for nRF5 SDK and here for nRF Connect SDK. Once a central (mobile phone) connects to the device running the sample, they exchange one empty packet at each connection interval. If there is no other user action (e.g. controlling the LED from the mobile phone or pressing the button on the board), that is all that happens, so the average power consumption over a period of time is a good basis for comparing the samples.
Similarly to the Beacon application, we need to ensure that all relevant behavior between the samples is equal. Since they are both connecting to the same central, which dictates the connection parameters, it's safe to assume that they should be the same. For full certainty, we can use the PPK2, and as observed below, the connection interval is 45ms for both samples, as expected.
With these samples, we get the following readings if we select a part of the trace of about 5 seconds duration, where the devices are just keeping the connection established by exchanging empty packets. The results show that the nRF Connect SDK sample draws only about 2 uA more than the nRF5 SDK sample, 98.23 uA vs. 96.29 uA.
The conclusion is that the additional processing required for the RTOS has a negligible effect even on applications with low radio and CPU duty cycles. That impact only gets smaller on more processing-heavy applications as the overall power consumption will also be higher.
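To express the measured differences as a percentage of the bare-metal baseline, a small C helper such as the one below can be used. The name overhead_percent is hypothetical; the inputs are the average currents reported above.

```c
#include <assert.h>

/* Relative increase of measured_ua over baseline_ua, in percent.
 * Hypothetical helper for illustration; not part of either SDK. */
double overhead_percent(double baseline_ua, double measured_ua)
{
    assert(baseline_ua > 0.0);
    return (measured_ua - baseline_ua) / baseline_ua * 100.0;
}
```

With the readings from this post, overhead_percent(56.78, 57.58) gives about 1.4% for advertising and overhead_percent(96.29, 98.23) gives about 2.0% for the connection case.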
Closing
With this blog post, we aimed to provide a better understanding of some of the fundamental differences between the nRF5 SDK and nRF Connect SDK related to memory footprint and power consumption. Furthermore, examples were used to show that any perceived performance penalties associated with nRF Connect SDK can easily be overcome by leveraging the advanced features offered by the SDK, and optimizing to meet specific application requirements. There is no one-size-fits-all when it comes to optimization since each application has its own individual needs, but the generic showcases coupled with additional documentation in the nRF Connect SDK can guide your optimization path to reach the desired goals.
Most samples in the nRF Connect SDK are configured for easy development and debugging, which means that many features are enabled and generous amounts of memory get allocated. This is by choice, and designed for a workflow where the application is first developed to include the needed functionality and optimized later.
If you are just getting started with nRF Connect SDK, whether coming from nRF5 SDK or being completely new to Nordic, we highly recommend taking the nRF Connect SDK Fundamentals course in the Nordic Developer Academy as well as the Cellular IoT and Bluetooth Low Energy Fundamentals courses if you are working or planning to work with those wireless technologies.
Feel free to leave a comment and let us know about your experience with nRF Connect SDK, especially if you have been using nRF5 SDK in the past.