Hello,
I'm trying to do some calculations using the FPU, but it looks like software floating-point routines are still being used. I haven't been able to find any conclusive documentation on setting this up, so if I've missed it, please point me to it.
Setup: nRF Connect SDK 2.2.0, Zephyr build in VS Code.
prj.conf:
CONFIG_SERIAL=y
CONFIG_UART_ASYNC_API=y
CONFIG_I2C=y
CONFIG_DEBUG_OPTIMIZATIONS=y
CONFIG_DEBUG_THREAD_INFO=y
CONFIG_SPI=y
CONFIG_NRFX_SPIS2=y
CONFIG_FPU=y
CONFIG_FPU_SHARING=y
Test routine:
float accumulator = 1000.0;
printf("Performing 1 million divides.\r\n");
for (int i = 0; i < 1000000; i++) {
    accumulator = accumulator / 0.9999999;
}
The test routine is not in a task. It takes about 10 seconds to complete; I expected it to be faster, so I checked the listing file.
zephyr.lst:
   10dd2: 4962      ldr r1, [pc, #392] ; (10f5c <cliRunCommand+0x2f4>)
   10dd4: 2001      movs r0, #1
   10dd6: f000 fc6f bl 116b8 <palTracef>
float accumulator = 1000.0;
   10dda: 4c61      ldr r4, [pc, #388] ; (10f60 <cliRunCommand+0x2f8>)
for (int i = 0; i < 1000000; i++)
   10ddc: e00b      b.n 10df6 <cliRunCommand+0x18e>
accumulator = accumulator / 0.9999999;
   10dde: 4620      mov r0, r4
   10de0: f7ff fb1e bl 10420 <__aeabi_f2d>
   10de4: a34e      add r3, pc, #312 ; (adr r3, 10f20 <cliRunCommand+0x2b8>)
   10de6: e9d3 2300 ldrd r2, r3, [r3]
   10dea: f7ff fc9b bl 10724 <__aeabi_ddiv>
   10dee: f7ff fd81 bl 108f4 <__aeabi_d2f>
   10df2: 4604      mov r4, r0
for (int i = 0; i < 1000000; i++)
   10df4: 3501      adds r5, #1
   10df6: 4b5b      ldr r3, [pc, #364] ; (10f64 <cliRunCommand+0x2fc>)
   10df8: 429d      cmp r5, r3
   10dfa: ddf0      ble.n 10dde <cliRunCommand+0x176>
The loop calls helper routines such as __aeabi_f2d, which contains more instructions than I would expect:
00010420 <__aeabi_f2d>:
   10420: 0042      lsls r2, r0, #1
   10422: ea4f 01e2 mov.w r1, r2, asr #3
   10426: ea4f 0131 mov.w r1, r1, rrx
   1042a: ea4f 7002 mov.w r0, r2, lsl #28
   1042e: bf1f      itttt ne
   10430: f012 437f andsne.w r3, r2, #4278190080 ; 0xff000000
   10434: f093 4f7f teqne r3, #4278190080 ; 0xff000000
   10438: f081 5160 eorne.w r1, r1, #939524096 ; 0x38000000
   1043c: 4770      bxne lr
   1043e: f032 427f bics.w r2, r2, #4278190080 ; 0xff000000
   10442: bf08      it eq
   10444: 4770      bxeq lr
   10446: f093 4f7f teq r3, #4278190080 ; 0xff000000
   1044a: bf04      itt eq
   1044c: f441 2100 orreq.w r1, r1, #524288 ; 0x80000
   10450: 4770      bxeq lr
   10452: b530      push {r4, r5, lr}
   10454: f44f 7460 mov.w r4, #896 ; 0x380
   10458: f001 4500 and.w r5, r1, #2147483648 ; 0x80000000
   1045c: f021 4100 bic.w r1, r1, #2147483648 ; 0x80000000
   10460: e71c      b.n 1029c <__adddf3+0x138>
   10462: bf00      nop
BTW, this is similar to the thread "GNU compiler flags for generating FPU assembly instructions for nrf5340 app core".
Finally, the make options generated are:
arm-zephyr-eabi-gcc.exe -DKERNEL -DMBEDTLS_CONFIG_FILE=\"nrf-config.h\" -DMBEDTLS_USER_CONFIG_FILE=\"nrf-config-user.h\" -DNRF5340_XXAA_APPLICATION -DNRF_SKIP_FICR_NS_COPY_TO_RAM -DNRF_TRUSTZONE_NONSECURE -DTFM_PSA_API -DUSE_PARTITION_MANAGER=1 -D__PROGRAM_START -D__ZEPHYR__=1 -I../../../bsp_nrf5340 -I../../../common -IC:/ncs/v2.2.0/zephyr/include -Izephyr/include/generated -IC:/ncs/v2.2.0/zephyr/soc/arm/nordic_nrf/nrf53 -IC:/ncs/v2.2.0/zephyr/soc/arm/nordic_nrf/common/. -IC:/ncs/v2.2.0/nrf/include -IC:/ncs/v2.2.0/nrf/include/tfm -IC:/ncs/v2.2.0/nrf/tests/include -Itfm/generated/interface/include -IC:/ncs/v2.2.0/modules/hal/cmsis/CMSIS/Core/Include -IC:/ncs/v2.2.0/modules/hal/nordic/nrfx -IC:/ncs/v2.2.0/modules/hal/nordic/nrfx/drivers/include -IC:/ncs/v2.2.0/modules/hal/nordic/nrfx/mdk -IC:/ncs/v2.2.0/zephyr/modules/hal_nordic/nrfx/. -Itfm/install/interface/include -Imodules/nrfxlib/nrfxlib/nrf_security/src/include/generated -IC:/ncs/v2.2.0/nrfxlib/nrf_security/include -IC:/ncs/v2.2.0/nrfxlib/nrf_security/include/mbedtls -IC:/ncs/v2.2.0/mbedtls/include -IC:/ncs/v2.2.0/mbedtls/include/mbedtls -IC:/ncs/v2.2.0/mbedtls/include/psa -IC:/ncs/v2.2.0/mbedtls/library -IC:/ncs/v2.2.0/nrfxlib/crypto/nrf_oberon/include/mbedtls -IC:/ncs/v2.2.0/nrfxlib/crypto/nrf_oberon/include -isystem C:/ncs/v2.2.0/zephyr/lib/libc/minimal/include -isystem c:/ncs/toolchains/v2.2.0/opt/zephyr-sdk/arm-zephyr-eabi/bin/../lib/gcc/arm-zephyr-eabi/12.1.0/include -isystem c:/ncs/toolchains/v2.2.0/opt/zephyr-sdk/arm-zephyr-eabi/bin/../lib/gcc/arm-zephyr-eabi/12.1.0/include-fixed -Wall -Werror -Wextra -fno-strict-aliasing -Og -imacros C:/github/squawk-demo/vocoder/apps/007_mem/build/zephyr/include/generated/autoconf.h -ffreestanding -fno-common -g -gdwarf-4 -fdiagnostics-color=always -mcpu=cortex-m33 -mthumb -mabi=aapcs -mfpu=fpv5-sp-d16 -mfloat-abi=hard -mfp16-format=ieee --sysroot=C:/ncs/toolchains/v2.2.0/opt/zephyr-sdk/arm-zephyr-eabi/arm-zephyr-eabi -imacros 
C:/ncs/v2.2.0/zephyr/include/zephyr/toolchain/zephyr_stdint.h -Wformat -Wformat-security -Wno-format-zero-length -Wno-main -Wno-pointer-sign -Wpointer-arith -Wexpansion-to-defined -Wno-unused-but-set-variable -Werror=implicit-int -fno-pic -fno-pie -fno-asynchronous-unwind-tables -fno-reorder-functions --param=min-pagesize=0 -fno-defer-pop -fmacro-prefix-map=C:/github/squawk-demo/vocoder/apps/007_mem=CMAKE_SOURCE_DIR -fmacro-prefix-map=C:/ncs/v2.2.0/zephyr=ZEPHYR_BASE -fmacro-prefix-map=C:/ncs/v2.2.0=WEST_TOPDIR -ffunction-sections -fdata-sections -std=c99 -nostdinc -MD -MT CMakeFiles/app.dir/src/console_cmds.c.obj -MF CMakeFiles\app.dir\src\console_cmds.c.obj.d -o CMakeFiles/app.dir/src/console_cmds.c.obj -c ../src/console_cmds.c
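One possible cause that I have not confirmed on the target: in C, a constant such as 0.9999999 without a suffix has type double, which would explain the __aeabi_f2d / __aeabi_ddiv / __aeabi_d2f sequence in the listing. The -mfpu=fpv5-sp-d16 flag above selects a single-precision FPU, so a double-precision divide would have to go through library routines even with the FPU enabled. Below is a minimal sketch of the comparison I could run. The function names and DIVIDE_COUNT are my own, for illustration only.

```c
#define DIVIDE_COUNT 1000000

/* Original form: 0.9999999 is a double constant, so each iteration
 * converts the accumulator to double, divides in double precision,
 * and converts the result back to float. */
float divide_with_double_literal(void)
{
    float accumulator = 1000.0f;

    for (int i = 0; i < DIVIDE_COUNT; i++) {
        accumulator = accumulator / 0.9999999;
    }
    return accumulator;
}

/* Variant: the f suffix makes the constant a float, so the whole
 * expression stays in single precision. */
float divide_with_float_literal(void)
{
    float accumulator = 1000.0f;

    for (int i = 0; i < DIVIDE_COUNT; i++) {
        accumulator = accumulator / 0.9999999f;
    }
    return accumulator;
}
```

If the second function shows a vdiv.f32 instruction in zephyr.lst instead of the __aeabi_* calls, that would suggest the FPU is in use and that the slowdown comes from the promotion to double. The two functions may return slightly different values, because 0.9999999f is rounded to the nearest float.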
Do you think my conclusion that the FPU is not being used is correct?
How can I configure the build such that math operations are compiled to FPU assembly instructions?
Thanks