The RISC-V coprocessor in the nRF54L Series, the Fast Lightweight Peripheral Processor (FLPR), opens up a new range of options for developers. It can work as a separate system inside the SoC. It can either handle time-sensitive or CPU-intensive tasks, or run predefined SoftPeripherals that add new hardware-like features to the system.
Using the coprocessor incurs no additional license or silicon costs. However, developers need to plan the memory layout carefully at the architectural level. This blog post explains what to consider, so you can get the most out of your Nordic SoC.
Sharing code memory between the CPU and the FLPR
First, your project must have two separate build images: one for the application processor and one for the FLPR. The two cores use different instruction sets, and you don't want the application processor trying to execute the FLPR's code, or vice versa. You also need enough non-volatile memory to store both images together.
It is therefore important to reserve memory for the FLPR code in the Devicetree of your main application, with an overlay such as this:
```dts
/ {
	soc {
		reserved-memory {
			#address-cells = <1>;
			#size-cells = <1>;

			cpuflpr_code_partition: image@165000 {
				/* FLPR core code partition */
				reg = <0x165000 DT_SIZE_K(96)>;
			};
		};
	};
};
```

In this case, we reserve 96 KB at the end of the non-volatile RRAM for the FLPR binary image, and leave the rest of the roughly 1.5 MB for the image that runs on the application core. This ensures the two images do not overlap when the final binaries are merged.
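To make the address arithmetic explicit, here is how the partition boundaries work out. This assumes the nRF54L15, whose RRAM is 1524 KB (the "roughly 1.5 MB" above) starting at address 0x0. Other nRF54L devices have different memory sizes, so treat these figures as an illustration for this one part rather than a universal layout:

```latex
\begin{aligned}
\texttt{0x165000} &= 1\,462\,272\ \text{B} = 1428\ \text{KB} && \text{(start of the FLPR partition)}\\
\texttt{0x18000} &= 98\,304\ \text{B} = 96\ \text{KB} && \text{(size of the FLPR partition)}\\
1428\ \text{KB} + 96\ \text{KB} &= 1524\ \text{KB} = \texttt{0x17D000} && \text{(end of RRAM, assumed)}
\end{aligned}
```

Under that assumption, the FLPR partition spans 0x165000 to 0x17CFFF and ends at the top of RRAM. Everything below 0x165000, at most 1428 KB, remains available to the application image.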
The FLPR image uses a board target ending in cpuflpr. The application CPU image uses the cpuapp target you are probably already familiar with. For example, on the nRF54L15 DK, these are nrf54l15dk/nrf54l15/cpuflpr and nrf54l15dk/nrf54l15/cpuapp, respectively.
As a general rule, you should not reserve more than 96 KB for the FLPR. To see why, let's look at the Memory section of the documentation, which contains the following figure:

The figure shows the AMBA interconnect (AMBIX), whose main job is to coordinate access to the memory banks. It contains a bus matrix that arbitrates between the bus managers using a round-robin algorithm.
The application processor and the FLPR appear at the top of the diagram (in black). Three memory banks appear on the right: a single non-volatile RRAM block and two static RAM blocks, RAM00 and RAM01. The AMBIX makes all three accessible to both the application CPU and the FLPR.
Note that there is only one RRAM bank. If the application CPU and the FLPR try to fetch instructions from it at the same time, bus contention occurs, and the AMBIX must decide which core gets access and which one waits. This can introduce unpredictable jitter in the execution time of FLPR instructions. To avoid this, the FLPR should fetch and execute its instructions from RAM, preferably RAM01. Because RAM01 is a separate bank, the application CPU can access RRAM and RAM00 while the FLPR accesses RAM01 at the same time. There is no contention, so the FLPR's instruction timing stays predictable.
So we must make a few more changes to the Devicetree overlay to address this:
```dts
/ {
	soc {
		reserved-memory {
			#address-cells = <1>;
			#size-cells = <1>;

			cpuflpr_code_partition: image@165000 {
				/* FLPR core code partition */
				reg = <0x165000 DT_SIZE_K(96)>;
			};
		};

		cpuflpr_sram_code_data: memory@20028000 {
			compatible = "mmio-sram";
			reg = <0x20028000 DT_SIZE_K(96)>;
			#address-cells = <1>;
			#size-cells = <1>;
			ranges = <0x0 0x20028000 0x18000>;
		};
	};
};

&cpuapp_sram {
	reg = <0x20000000 DT_SIZE_K(160)>;
	ranges = <0x0 0x20000000 0x28000>;
};

&cpuflpr_vpr {
	status = "okay";
	execution-memory = <&cpuflpr_sram_code_data>;
	source-memory = <&cpuflpr_code_partition>;
};
```

In addition to the reserved-memory section, we now add a cpuflpr_sram_code_data node of the same size (96 KB, in RAM01). We also shrink the cpuapp_sram node to 160 KB so the application CPU does not use that region.
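The hex values in this overlay can be checked the same way. This assumes the nRF54L15's 256 KB of SRAM starting at 0x20000000. Under that assumption, the two RAM regions split the SRAM exactly, with no gap and no overlap:

```latex
\begin{aligned}
\texttt{0x28000} &= 163\,840\ \text{B} = 160\ \text{KB} && \text{cpuapp\_sram: } \texttt{0x20000000} \text{ to } \texttt{0x20027FFF}\\
\texttt{0x18000} &= 98\,304\ \text{B} = 96\ \text{KB} && \text{cpuflpr\_sram\_code\_data: } \texttt{0x20028000} \text{ to } \texttt{0x2003FFFF}\\
160\ \text{KB} + 96\ \text{KB} &= 256\ \text{KB} = \texttt{0x40000}
\end{aligned}
```

If you change the FLPR size, the cpuapp_sram reg and ranges values and the cpuflpr_sram_code_data base address all have to move together to keep this sum intact.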
The final piece of the puzzle is the cpuflpr_vpr node, which ties the two regions together. The system initialization code uses the source-memory property to locate the FLPR binary image in RRAM and copies the image to RAM. It then starts the FLPR from the RAM region given by the execution-memory property.
Don't worry if this sounds complicated. As long as the memory is mapped and sized correctly, the multi-image build takes care of the rest. You end up with code running on the application CPU and the FLPR at the same time, without the two interfering with each other.
There is also a simpler way to apply all these configuration changes: Zephyr snippets. If you select the nordic-flpr snippet in the build configuration, RAM01 is automatically reserved and configured for the FLPR. This keeps your DTS overlay smaller and focused on your own board changes.
Unless your application is very basic, you will need some way to exchange data between the application CPU and the FLPR. Let's see what options are available.
Sharing data memory between the CPU and the coprocessor
Developers with Zephyr RTOS experience are probably familiar with the challenges of accessing variables or memory buffers concurrently from multiple threads. When a thread uses a variable, it typically loads the variable into a register, operates on it, and stores the result back to memory. The challenge is to ensure that no thread works with an outdated value or overwrites another thread's result and corrupts the data.
Zephyr provides a handful of kernel objects for these scenarios, such as mutexes and semaphores. There is also plenty of documentation and examples showing how to use them. Internally, these mechanisms also rely on a shared memory region whose contents are accessed atomically. All threads run on the same CPU core, one at a time, and use the same buses to access memory, so synchronizing them is straightforward.
However, the scenario is different when you use both the application processor and the FLPR. Two threads can run truly in parallel, on different cores and in separate images. Kernel objects such as mutexes and semaphores belong to a single Zephyr image, so they cannot synchronize threads across the two cores. To share data between a thread on the application CPU and a thread on the FLPR, use the IPC service instead.
The IPC service must be configured in the Devicetree of both the application processor and the FLPR. On the application processor, the DTS overlay looks like this:
```dts
/ {
	soc {
		reserved-memory {
			#address-cells = <1>;
			#size-cells = <1>;

			sram_rx: memory@20018000 {
				reg = <0x20018000 0x0800>;
			};

			sram_tx: memory@20020000 {
				reg = <0x20020000 0x0800>;
			};
		};
	};

	ipc {
		ipc0: ipc0 {
			compatible = "zephyr,ipc-icmsg";
			dcache-alignment = <32>;
			tx-region = <&sram_tx>;
			rx-region = <&sram_rx>;
			mboxes = <&cpuapp_vevif_rx 20>, <&cpuapp_vevif_tx 21>;
			mbox-names = "rx", "tx";
			status = "okay";
		};
	};
};

&cpuapp_vevif_rx {
	status = "okay";
};

&cpuapp_vevif_tx {
	status = "okay";
};
```
The IPC instance must be accessible to both the application CPU and the FLPR, so it must be defined in both Devicetree overlays. The only difference is that the TX and RX regions are swapped on the FLPR side. The region one core transmits into is then the region the other core receives from. This simple approach gives developers a flexible way to design their own data-exchange protocols and to use the shared memory in whatever way best suits their application.
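The following is a minimal sketch of what the matching FLPR-side overlay could look like. It reuses the same two 2 KB buffers but swaps their roles. The FLPR receives from 0x20020000, which is the application's TX region, and transmits into 0x20018000, which is the application's RX region. The mailbox node labels cpuflpr_vevif_rx and cpuflpr_vevif_tx and the mirrored channel numbers 21 and 20 are assumptions modeled on the application-side overlay, not taken from this post. Check them against the SoC .dtsi files in your SDK version before relying on them.

```dts
/* FLPR-side IPC overlay (sketch). Buffer addresses and sizes match the
 * application overlay; only the TX/RX roles of the two buffers are swapped. */
/ {
	soc {
		reserved-memory {
			#address-cells = <1>;
			#size-cells = <1>;

			/* The application's TX buffer is the FLPR's RX buffer */
			sram_rx: memory@20020000 {
				reg = <0x20020000 0x0800>;
			};

			/* The application's RX buffer is the FLPR's TX buffer */
			sram_tx: memory@20018000 {
				reg = <0x20018000 0x0800>;
			};
		};
	};

	ipc {
		ipc0: ipc0 {
			compatible = "zephyr,ipc-icmsg";
			dcache-alignment = <32>;
			tx-region = <&sram_tx>;
			rx-region = <&sram_rx>;
			/* Assumed FLPR-side mailbox labels. Channels mirror the
			 * application side: app TX 21 -> FLPR RX 21,
			 * FLPR TX 20 -> app RX 20. */
			mboxes = <&cpuflpr_vevif_rx 21>, <&cpuflpr_vevif_tx 20>;
			mbox-names = "rx", "tx";
			status = "okay";
		};
	};
};

/* Assumed to be disabled by default, like their cpuapp counterparts */
&cpuflpr_vevif_rx {
	status = "okay";
};

&cpuflpr_vevif_tx {
	status = "okay";
};
```

Both buffers (0x20018000 and 0x20020000) fall inside the 160 KB cpuapp_sram region used earlier (0x20000000 to 0x20027FFF). Unless your SDK automatically excludes reserved-memory nodes from the application's RAM, consider shrinking cpuapp_sram further. For example, 96 KB would end at 0x20018000, so the application's linker could not place its own data in the shared buffers.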
Closing
The RISC-V coprocessor of the nRF54L Series can be used to increase the processing power and capabilities of existing designs, without adding significant costs or requiring major architectural redesigns.
However, to fully realize these benefits, you need to pay careful attention to memory usage and layout. Follow the recommended Devicetree configurations or use the nordic-flpr snippet. Either approach simplifies development and keeps the application CPU and the FLPR working together efficiently.
