Compiling for nRF5340 FPU

Hello,

I'm trying to do some calculations using the FPU, but it seems like software float routines are still being used.  I haven't been able to find any conclusive documents on setting this up, so if I've missed them please give me a pointer.

I'm using nRF Connect SDK 2.2.0 with a Zephyr build in VS Code.

prj.conf:

CONFIG_SERIAL=y
CONFIG_UART_ASYNC_API=y
CONFIG_I2C=y
CONFIG_DEBUG_OPTIMIZATIONS=y
CONFIG_DEBUG_THREAD_INFO=y
CONFIG_SPI=y
CONFIG_NRFX_SPIS2=y
CONFIG_FPU=y
CONFIG_FPU_SHARING=y

Test routine:

float accumulator = 1000.0;
printf("Performing 1 million divides.\r\n");
for (int i = 0; i < 1000000; i++)
{
    accumulator = accumulator / 0.9999999;
}

The test routine is not in a task. It takes about 10 seconds to complete, which is slower than I expected, so I checked the list file.

zephyr.lst:

   10dd2:	4962      	ldr	r1, [pc, #392]	; (10f5c <cliRunCommand+0x2f4>)
   10dd4:	2001      	movs	r0, #1
   10dd6:	f000 fc6f 	bl	116b8 <palTracef>
                float accumulator = 1000.0;
   10dda:	4c61      	ldr	r4, [pc, #388]	; (10f60 <cliRunCommand+0x2f8>)
                for (int i = 0; i < 1000000; i++)
   10ddc:	e00b      	b.n	10df6 <cliRunCommand+0x18e>
                    accumulator = accumulator / 0.9999999;
   10dde:	4620      	mov	r0, r4
   10de0:	f7ff fb1e 	bl	10420 <__aeabi_f2d>
   10de4:	a34e      	add	r3, pc, #312	; (adr r3, 10f20 <cliRunCommand+0x2b8>)
   10de6:	e9d3 2300 	ldrd	r2, r3, [r3]
   10dea:	f7ff fc9b 	bl	10724 <__aeabi_ddiv>
   10dee:	f7ff fd81 	bl	108f4 <__aeabi_d2f>
   10df2:	4604      	mov	r4, r0
                for (int i = 0; i < 1000000; i++)
   10df4:	3501      	adds	r5, #1
   10df6:	4b5b      	ldr	r3, [pc, #364]	; (10f64 <cliRunCommand+0x2fc>)
   10df8:	429d      	cmp	r5, r3
   10dfa:	ddf0      	ble.n	10dde <cliRunCommand+0x176>

It calls routines such as __aeabi_f2d, which contains more instructions than I would expect:

00010420 <__aeabi_f2d>:
   10420:	0042      	lsls	r2, r0, #1
   10422:	ea4f 01e2 	mov.w	r1, r2, asr #3
   10426:	ea4f 0131 	mov.w	r1, r1, rrx
   1042a:	ea4f 7002 	mov.w	r0, r2, lsl #28
   1042e:	bf1f      	itttt	ne
   10430:	f012 437f 	andsne.w	r3, r2, #4278190080	; 0xff000000
   10434:	f093 4f7f 	teqne	r3, #4278190080	; 0xff000000
   10438:	f081 5160 	eorne.w	r1, r1, #939524096	; 0x38000000
   1043c:	4770      	bxne	lr
   1043e:	f032 427f 	bics.w	r2, r2, #4278190080	; 0xff000000
   10442:	bf08      	it	eq
   10444:	4770      	bxeq	lr
   10446:	f093 4f7f 	teq	r3, #4278190080	; 0xff000000
   1044a:	bf04      	itt	eq
   1044c:	f441 2100 	orreq.w	r1, r1, #524288	; 0x80000
   10450:	4770      	bxeq	lr
   10452:	b530      	push	{r4, r5, lr}
   10454:	f44f 7460 	mov.w	r4, #896	; 0x380
   10458:	f001 4500 	and.w	r5, r1, #2147483648	; 0x80000000
   1045c:	f021 4100 	bic.w	r1, r1, #2147483648	; 0x80000000
   10460:	e71c      	b.n	1029c <__adddf3+0x138>
   10462:	bf00      	nop

BTW, this is similar to "GNU compiler flags for generating FPU assembly instructions for nrf5340 app core".

Finally, the make options generated are:

arm-zephyr-eabi-gcc.exe -DKERNEL -DMBEDTLS_CONFIG_FILE=\"nrf-config.h\" -DMBEDTLS_USER_CONFIG_FILE=\"nrf-config-user.h\" -DNRF5340_XXAA_APPLICATION -DNRF_SKIP_FICR_NS_COPY_TO_RAM -DNRF_TRUSTZONE_NONSECURE -DTFM_PSA_API -DUSE_PARTITION_MANAGER=1 -D__PROGRAM_START -D__ZEPHYR__=1 -I../../../bsp_nrf5340 -I../../../common -IC:/ncs/v2.2.0/zephyr/include -Izephyr/include/generated -IC:/ncs/v2.2.0/zephyr/soc/arm/nordic_nrf/nrf53 -IC:/ncs/v2.2.0/zephyr/soc/arm/nordic_nrf/common/. -IC:/ncs/v2.2.0/nrf/include -IC:/ncs/v2.2.0/nrf/include/tfm -IC:/ncs/v2.2.0/nrf/tests/include -Itfm/generated/interface/include -IC:/ncs/v2.2.0/modules/hal/cmsis/CMSIS/Core/Include -IC:/ncs/v2.2.0/modules/hal/nordic/nrfx -IC:/ncs/v2.2.0/modules/hal/nordic/nrfx/drivers/include -IC:/ncs/v2.2.0/modules/hal/nordic/nrfx/mdk -IC:/ncs/v2.2.0/zephyr/modules/hal_nordic/nrfx/. -Itfm/install/interface/include -Imodules/nrfxlib/nrfxlib/nrf_security/src/include/generated -IC:/ncs/v2.2.0/nrfxlib/nrf_security/include -IC:/ncs/v2.2.0/nrfxlib/nrf_security/include/mbedtls -IC:/ncs/v2.2.0/mbedtls/include -IC:/ncs/v2.2.0/mbedtls/include/mbedtls -IC:/ncs/v2.2.0/mbedtls/include/psa -IC:/ncs/v2.2.0/mbedtls/library -IC:/ncs/v2.2.0/nrfxlib/crypto/nrf_oberon/include/mbedtls -IC:/ncs/v2.2.0/nrfxlib/crypto/nrf_oberon/include -isystem C:/ncs/v2.2.0/zephyr/lib/libc/minimal/include -isystem c:/ncs/toolchains/v2.2.0/opt/zephyr-sdk/arm-zephyr-eabi/bin/../lib/gcc/arm-zephyr-eabi/12.1.0/include -isystem c:/ncs/toolchains/v2.2.0/opt/zephyr-sdk/arm-zephyr-eabi/bin/../lib/gcc/arm-zephyr-eabi/12.1.0/include-fixed -Wall -Werror -Wextra -fno-strict-aliasing -Og -imacros C:/github/squawk-demo/vocoder/apps/007_mem/build/zephyr/include/generated/autoconf.h -ffreestanding -fno-common -g -gdwarf-4 -fdiagnostics-color=always -mcpu=cortex-m33 -mthumb -mabi=aapcs -mfpu=fpv5-sp-d16 -mfloat-abi=hard -mfp16-format=ieee --sysroot=C:/ncs/toolchains/v2.2.0/opt/zephyr-sdk/arm-zephyr-eabi/arm-zephyr-eabi -imacros 
C:/ncs/v2.2.0/zephyr/include/zephyr/toolchain/zephyr_stdint.h -Wformat -Wformat-security -Wno-format-zero-length -Wno-main -Wno-pointer-sign -Wpointer-arith -Wexpansion-to-defined -Wno-unused-but-set-variable -Werror=implicit-int -fno-pic -fno-pie -fno-asynchronous-unwind-tables -fno-reorder-functions --param=min-pagesize=0 -fno-defer-pop -fmacro-prefix-map=C:/github/squawk-demo/vocoder/apps/007_mem=CMAKE_SOURCE_DIR -fmacro-prefix-map=C:/ncs/v2.2.0/zephyr=ZEPHYR_BASE -fmacro-prefix-map=C:/ncs/v2.2.0=WEST_TOPDIR -ffunction-sections -fdata-sections -std=c99 -nostdinc -MD -MT CMakeFiles/app.dir/src/console_cmds.c.obj -MF CMakeFiles\app.dir\src\console_cmds.c.obj.d -o CMakeFiles/app.dir/src/console_cmds.c.obj -c ../src/console_cmds.c

Do you think my conclusion that the FPU is not being used is correct?

How can I configure the build such that math operations are compiled to FPU assembly instructions?

Thanks

  • Hello,

    CONFIG_FPU=y should be sufficient to enable the floating point unit. Add CONFIG_FPU_SHARING=y if you are doing float operations across multiple threads.

    Please add the suffix 'f' to hardcoded values (for example, 0.9999999f) so the compiler treats them as float rather than double. This did the trick when I tested your code here.

    Best regards,

    Vidar

    Edit: the __aeabi_f2d function converts a float input to a double. As only single-precision floats are supported in HW, it makes sense that this conversion is done in SW.

  • Thank you for the quick reply!  I was so focused on the unknowns I forgot to check the basics. It says right there in the data brief, "Single-precision floating-point unit (FPU)"....

    Fixing this, the runtime went down to under a second.  Very nice.
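
For reference, a minimal sketch of the fix discussed above. In C, an unsuffixed constant such as 0.9999999 has type double, so dividing a float by it is evaluated in double precision, which matches the __aeabi_f2d / __aeabi_ddiv / __aeabi_d2f sequence in the listing. The helper names below (check_constant_types, divide_loop_single, divide_loop_promoted) are made up for illustration and are not from the original code.

```c
#include <assert.h>

/* Hypothetical helper: shows which type each division expression has. */
void check_constant_types(void)
{
    /* float / float constant: the result type is float */
    assert(sizeof(1.0f / 0.9999999f) == sizeof(float));
    /* float / double constant: the float operand is promoted to double */
    assert(sizeof(1.0f / 0.9999999) == sizeof(double));
}

/* Fixed loop: the 'f' suffix keeps the division in single precision,
 * so a single-precision FPU can execute it directly. */
float divide_loop_single(int iterations)
{
    float accumulator = 1000.0f;
    for (int i = 0; i < iterations; i++) {
        accumulator = accumulator / 0.9999999f;
    }
    return accumulator;
}

/* Original form, kept for comparison: the double constant forces a
 * float-to-double conversion, a double division, and a conversion back. */
float divide_loop_promoted(int iterations)
{
    float accumulator = 1000.0f;
    for (int i = 0; i < iterations; i++) {
        accumulator = accumulator / 0.9999999;
    }
    return accumulator;
}
```

Calling divide_loop_single(1000000) corresponds to the test routine with the suffix applied. GCC's -Wdouble-promotion warning reports implicit float-to-double promotions like the one in divide_loop_promoted; whether to enable it for a given build is a judgment call, since the command line above already uses -Werror.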
