HW crypto slower than SW crypto for certain CCM* vectors

test_crypto.zip

Hi,

I ran some tests with SW and HW crypto on the nRF5340 DK. I expected the CryptoCell to improve my crypto performance tremendously, but unfortunately it did not.

I ran some unit tests. On average they are faster in HW, but some are slower in HW than in SW, and those are the ones I need for my development.

I'm attaching the sample code I'm using for my tests but here is the essence:

Using SW encryption (west build -p -b nrf5340dk_nrf5340_cpuapp --build-dir build/crypto.sw test_crypto -DOVERLAY_CONFIG=crypto_sw.conf), I get fairly consistent timings across the encryption and decryption vectors.

With HW encryption (west build -p -b nrf5340dk_nrf5340_cpuapp --build-dir build/crypto.hw test_crypto -DOVERLAY_CONFIG=crypto_hw.conf), the timings vary widely between vectors. Most take between 143 and 280 cycles, but test 0 for encryption and test 1 for decryption take over 17,000 cycles, which is even slower than the SW implementation.

Could you please advise whether I did something wrong, or help me improve this?

SW crypto output:

*** Booting Zephyr OS build v2.6.99-ncs1-1 ***
Running test suite crypto_tests
===================================================================
START - test_ccm_star_encrypt_vectors
I: test_ccm_star_encrypt_vectors: test 0 duration 16126 cycles
I: test_ccm_star_encrypt_vectors: test 1 duration 15450 cycles
I: test_ccm_star_encrypt_vectors: test 2 duration 15523 cycles
I: test_ccm_star_encrypt_vectors: test 3 duration 15746 cycles
PASS - test_ccm_star_encrypt_vectors in 0.25 seconds
===================================================================
START - test_ccm_star_decrypt_vectors
I: test_ccm_star_decrypt_vectors: test 0 duration 12127 cycles
I: test_ccm_star_decrypt_vectors: test 1 duration 11633 cycles
I: test_ccm_star_decrypt_vectors: test 2 duration 11779 cycles
I: test_ccm_star_decrypt_vectors: test 3 duration 12826 cycles
PASS - test_ccm_star_decrypt_vectors in 0.25 seconds
===================================================================
Test suite crypto_tests succeeded
===================================================================
PROJECT EXECUTION SUCCESSFUL

HW crypto output:

*** Booting Zephyr OS build v2.6.99-ncs1-1 ***
Running test suite crypto_tests
===================================================================
START - test_ccm_star_encrypt_vectors
I: test_ccm_star_encrypt_vectors: test 0 duration 17535 cycles
I: test_ccm_star_encrypt_vectors: test 1 duration 175 cycles
I: test_ccm_star_encrypt_vectors: test 2 duration 143 cycles
I: test_ccm_star_encrypt_vectors: test 3 duration 161 cycles
PASS - test_ccm_star_encrypt_vectors in 0.23 seconds
===================================================================
START - test_ccm_star_decrypt_vectors
I: test_ccm_star_decrypt_vectors: test 0 duration 239 cycles
I: test_ccm_star_decrypt_vectors: test 1 duration 17308 cycles
I: test_ccm_star_decrypt_vectors: test 2 duration 197 cycles
I: test_ccm_star_decrypt_vectors: test 3 duration 280 cycles
PASS - test_ccm_star_decrypt_vectors in 0.23 seconds
===================================================================
Test suite crypto_tests succeeded
===================================================================
PROJECT EXECUTION SUCCESSFUL

Reply:

    Hi,

    There is a complicating factor with HW acceleration that you don't have with SW: the HW has to be locked with a mutex, and this adds some overhead.

    There are configurations that switch this to a simpler scheme using atomic locks, but that scheme is not thread-safe, so you have to be very careful about how you call the crypto APIs. The biggest problem is that PRNG output may be requested at arbitrary places in the code. This requires CTR_DRBG, which runs on the same HW module as CCM, and that module can only have one user at a time. With atomic locking, you can deadlock yourself if you request PRNG output while a CCM operation is in progress.

    The setup of the HW crypto may also affect execution speed when the input is very small.

    For larger inputs, HW execution time should consistently be much lower than SW, and this is normally what matters most.
