nRF5340 DK: ECDH (nrf/samples/crypto/ecdh) is significantly slower with the hardware driver (cc3xx) than with the software driver (oberon)

ECDH is significantly slower with the hardware driver (cc3xx) than with the software driver (oberon). Does anyone have a good explanation for this? I have also tested ECDSA, and there the hardware is faster. I also tried a different elliptic curve, but the result is the same.
I added the performance measurement code; everything else can be found in nrf/samples/crypto/ecdh.
Development setup:
MacBook Air (Apple M1)
Toolchain version: 2.6.0
nRF5340 Development Kit board

This is my main.c file:




This is my prj.conf:




This is my nrf5340dk_nrf5340_cpuapp.conf:




When I run this with CONFIG_PSA_CRYPTO_DRIVER_OBERON=y and CONFIG_PSA_CRYPTO_DRIVER_CC3XX=n, I get this output:

*** Booting nRF Connect SDK v3.5.99-ncs1 ***
Starting ECDH Keypair Generation benchmark (100 runs)...
Frequency: 64 MHz
ECDH Keypair Generation benchmark results:
Runs: 100
Total: 1149048 cycles
Average: 11490.000 cycles
Minimum: 11490 cycles
Maximum: 11520 cycles
Std: 3.000 cycles
*** Booting nRF Connect SDK v3.5.99-ncs1 ***


But when I run it with CONFIG_PSA_CRYPTO_DRIVER_OBERON=n and CONFIG_PSA_CRYPTO_DRIVER_CC3XX=y, I get this output:

*** Booting nRF Connect SDK v3.5.99-ncs1 ***
Starting ECDH Keypair Generation benchmark (100 runs)...
Frequency: 64 MHz
ECDH Keypair Generation benchmark results:
Runs: 100
Total: 97657851 cycles
Average: 976578.000 cycles
Minimum: 929049 cycles
Maximum: 1014195 cycles
Std: 13743.361 cycles
*** Booting nRF Connect SDK v3.5.99-ncs1 ***
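The main.c used for these numbers is not shown, so here is a minimal sketch, assuming the harness stores one cycle count per run, of how the Total/Average/Minimum/Maximum/Std lines can be computed. The names `bench_stats_t`, `bench_stats` and `sqrt_newton` are hypothetical, Std is taken here as the population standard deviation (the original harness may use the sample formula), and the square root uses Newton's method only to avoid a libm dependency. Computing the mean in floating point also keeps the fractional part that the logs drop: 1149048 / 100 = 11490.48, yet the log prints 11490.000.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Summary statistics for a series of per-run cycle counts. */
typedef struct {
    uint64_t total; /* sum of all samples, in cycles */
    double mean;    /* total / runs, fractional part kept */
    uint32_t min;   /* fastest run */
    uint32_t max;   /* slowest run */
    double std;     /* population standard deviation */
} bench_stats_t;

/* Newton's method for sqrt(x), x >= 0; avoids linking libm. */
static double sqrt_newton(double x)
{
    if (x <= 0.0) {
        return 0.0;
    }
    /* Start at or above sqrt(x) so the iterates decrease towards the root. */
    double r = (x > 1.0) ? x : 1.0;
    for (int i = 0; i < 64; i++) {
        r = 0.5 * (r + x / r);
    }
    return r;
}

static bench_stats_t bench_stats(const uint32_t *samples, size_t n)
{
    assert(samples != NULL && n > 0);

    bench_stats_t s = { .total = 0, .min = UINT32_MAX, .max = 0 };
    for (size_t i = 0; i < n; i++) {
        s.total += samples[i];
        if (samples[i] < s.min) {
            s.min = samples[i];
        }
        if (samples[i] > s.max) {
            s.max = samples[i];
        }
    }

    /* Floating-point division, unlike integer division, keeps the .48 above. */
    s.mean = (double)s.total / (double)n;

    double sum_sq = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = (double)samples[i] - s.mean;
        sum_sq += d * d;
    }
    s.std = sqrt_newton(sum_sq / (double)n);
    return s;
}
```

On target, each sample would be the difference between two cycle-counter reads taken around the PSA call being measured; how main.c reads the counter is not shown in the post.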
  • Hello achi77,

    I will look into this and follow-up with you this week.

    Hieu

  • Hello achi77,

    I got feedback from our engineers that this is the current state of the CC3XX driver for ECDH key generation. It is designed to provide better security than the Oberon library, at the cost of performance.

    I was also told that you can improve performance slightly by changing the CryptoCell access lock strategy, which is configured in crypto/Kconfig of nrfconnect/sdk-nrfxlib (tag v3.0.1).
    Please note that this change will make CryptoCell access not thread-safe.

    Hieu

  • Hi Hieu,

    Thank you very much for the fast reply!

    achi

    My bet is that Hieu's answer is wrong.

    I think the two implementations define "key pair generation" differently. The Oberon library only generates the random bytes that constitute the private key, while I think the CC310 also derives the public key internally (which is the operation that actually costs time). Please redo your benchmark as key pair generation followed by public key export, so that you get an apples-to-apples comparison.
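    To illustrate why that split matters for benchmarking, here is a toy sketch (not the Oberon or CC3XX code) of the two policies described above: an "eager" generator that derives the public value immediately, and a "lazy" one that stores only the private scalar and derives the public value on export. As an assumption for brevity, modular exponentiation modulo the Mersenne prime 2^31 - 1 with base 7 stands in for elliptic-curve scalar multiplication; these parameters are far too small to be secure, and `toy_key_t` and the `toy_*` functions are hypothetical. A counter of expensive operations shows that timing generation alone favors the lazy policy, while generation plus export costs the same for both.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_P 2147483647u /* 2^31 - 1, a Mersenne prime (toy modulus) */
#define TOY_G 7u          /* toy base, standing in for the curve generator */

typedef struct {
    uint32_t priv;      /* private scalar */
    uint32_t pub;       /* public value, meaningful only if pub_valid != 0 */
    int pub_valid;
    unsigned expensive; /* number of expensive "scalar multiplications" done */
} toy_key_t;

/* Square-and-multiply: base^exp mod TOY_P. Operands < 2^31, so products fit in 64 bits. */
static uint32_t toy_modpow(uint32_t base, uint32_t exp)
{
    uint64_t result = 1;
    uint64_t b = base % TOY_P;
    while (exp != 0) {
        if (exp & 1u) {
            result = (result * b) % TOY_P;
        }
        b = (b * b) % TOY_P;
        exp >>= 1;
    }
    return (uint32_t)result;
}

/* The expensive step: public = G^priv, counted for illustration. */
static void toy_derive_public(toy_key_t *key)
{
    key->pub = toy_modpow(TOY_G, key->priv);
    key->pub_valid = 1;
    key->expensive++;
}

/* Lazy policy: generation only stores the private scalar. */
static void toy_generate_lazy(toy_key_t *key, uint32_t priv)
{
    /* A real implementation draws priv from a CSPRNG. */
    assert(priv >= 1u && priv < TOY_P);
    key->priv = priv;
    key->pub_valid = 0;
    key->expensive = 0;
}

/* Eager policy: same as lazy, plus deriving the public value right away. */
static void toy_generate_eager(toy_key_t *key, uint32_t priv)
{
    toy_generate_lazy(key, priv);
    toy_derive_public(key);
}

/* Export derives the public value on first use if it is not cached yet. */
static uint32_t toy_export_public(toy_key_t *key)
{
    if (!key->pub_valid) {
        toy_derive_public(key);
    }
    return key->pub;
}
```

    With this split, a benchmark of generation alone reports one expensive operation for the eager key and none for the lazy key, which is why timing generation followed by export is the fair comparison.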

    In any case, for a hardware accelerator the CC310 is quite slow at big-number arithmetic (as used in public-key cryptography). Its internal multiplier is not particularly fast, and it uses the generic Barrett reduction algorithm, whereas software implementations can be optimized for "special prime moduli" such as those typically used for elliptic curves. The Cracen found in the nRF54L is several times faster.
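    The difference between generic Barrett reduction and reduction modulo a special prime can be shown at toy scale. The sketch below uses hypothetical helper names and 32-bit inputs, and is neither the CryptoCell nor the Oberon code. Barrett's method works for any modulus m but needs a precomputed constant, a wide multiplication and a correction step; for the Mersenne prime 2^13 - 1, reduction is just shifts, masks and additions, because 2^13 is congruent to 1 modulo 2^13 - 1. The primes of curves such as P-256 are generalized Mersenne primes whose reduction follows the same idea with a few extra additions and subtractions, though real multi-word code is considerably longer.

```c
#include <assert.h>
#include <stdint.h>

/* Barrett constant for modulus m (1 <= m < 2^32): mu = floor(2^32 / m). */
static uint64_t barrett_mu(uint32_t m)
{
    assert(m != 0u);
    return (UINT64_C(1) << 32) / m;
}

/* Generic Barrett reduction of a 32-bit x modulo m. */
static uint32_t mod_barrett(uint32_t x, uint32_t m, uint64_t mu)
{
    /* q never exceeds floor(x / m) and is at most 1 below it */
    uint32_t q = (uint32_t)(((uint64_t)x * mu) >> 32);
    uint32_t r = x - q * m; /* 0 <= r < 2m */
    if (r >= m) {
        r -= m;
    }
    return r;
}

#define P13 8191u /* 2^13 - 1, a Mersenne prime */

/* Special-form reduction: with x = hi * 2^13 + lo, x mod P13 == (hi + lo) mod P13. */
static uint32_t mod_mersenne13(uint32_t x)
{
    x = (x & P13) + (x >> 13); /* now x <= 8191 + 524287 */
    x = (x & P13) + (x >> 13); /* now x <= 8191 + 64, below 2 * P13 */
    if (x >= P13) {
        x -= P13;
    }
    return x;
}
```

    Both functions return x % 8191 for every 32-bit x; the special-form version needs no multiplication at all, which is the kind of shortcut a fixed-curve software implementation can exploit and a generic modular multiplier cannot.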

    Hi Emil,

    Here is the comparison between HW and SW for public key export after key generation:

    HW:

    Starting ECDH KEY PAIR GENERATION benchmark (100 runs)...
    Frequency: 64 MHz
    ECDH KEY PAIR GENERATION benchmark results:
       Runs:    100
       Total:   97356680 cycles
       Average: 973566.000 cycles
       Minimum: 943227 cycles
       Maximum: 1013147 cycles
       Std:     13392.071 cycles
    
    Starting ECDH PUBLIC KEY EXPORT benchmark (100 runs)...
    Frequency: 64 MHz
    ECDH PUBLIC KEY EXPORT benchmark results:
       Runs:    100
       Total:   90703453 cycles
       Average: 907034.000 cycles
       Minimum: 906987 cycles
       Maximum: 907036 cycles
       Std:     4.796 cycles
    

    SW:

    Starting ECDH KEY PAIR GENERATION benchmark (100 runs)...
    Frequency: 64 MHz
    ECDH KEY PAIR GENERATION benchmark results:
       Runs:    100
       Total:   1107485 cycles
       Average: 11074.000 cycles
       Minimum: 11074 cycles
       Maximum: 11153 cycles
       Std:     7.874 cycles
    
    Starting ECDH PUBLIC KEY EXPORT benchmark (100 runs)...
    Frequency: 64 MHz
    ECDH PUBLIC KEY EXPORT benchmark results:
       Runs:    100
       Total:   147704871 cycles
       Average: 1477048.000 cycles
       Minimum: 1477031 cycles
       Maximum: 1477049 cycles
       Std:     2.000 cycles
    
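    Adding the two averages per backend gives the end-to-end comparison that was asked for. The small sketch below, with hypothetical helper names, does that arithmetic on the averages printed above and converts cycles to milliseconds at the 64 MHz reported in the logs: 1,880,600 cycles (about 29.4 ms) for cc3xx versus 1,488,122 cycles (about 23.3 ms) for Oberon, so the hardware path is roughly 1.26 times slower end to end, instead of roughly 88 times slower for key pair generation alone.

```c
#include <assert.h>

#define CPU_HZ 64000000.0 /* "Frequency: 64 MHz" in the logs */

/* Average cycles per run, copied from the benchmark output above. */
static const double HW_KEYGEN = 973566.0, HW_EXPORT = 907034.0;
static const double SW_KEYGEN = 11074.0, SW_EXPORT = 1477048.0;

static double cycles_to_ms(double cycles)
{
    assert(cycles >= 0.0);
    return cycles * 1000.0 / CPU_HZ;
}

/* Cost of a usable key pair: generation followed by public key export. */
static double keypair_with_export(double keygen, double export_pub)
{
    return keygen + export_pub;
}
```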


    I am using the nRF5340 DK, and according to https://www.nordicsemi.com/Products/nRF5340 it uses the Arm CryptoCell-312. I have seen that Cracen is also an option, but not on nRF5340 devices.
