Issues with PKA Engine Modular Operations on CC310

I'm working on implementing arithmetic operations using the PKA engine on the nRF52840 in Rust, following the CRYPTOCELL — Arm TrustZone CryptoCell 310 datasheet. While I've successfully implemented basic arithmetic operations, I'm running into several issues with modular operations.

The main problem I'm facing is that modular reductions aren't being completed as expected. Modular addition only gives the correct result when the operands are already reduced, specifically when the result is less than 2N-1 (where N is the modulus). Modular multiplication behaves slightly differently: it only reduces the result when it is less than 4N-1. The most problematic operation is modular division, which doesn't work at all, although regular division works fine.

I've also noticed something interesting about loading memory. After mapping the virtual registers, I had to load the memory in reverse order to get multiplication to work. The datasheet doesn't provide clear guidance on handling larger values (for example, a [u32; 2] = [x, x] array of 32-bit elements), so I'm not entirely sure whether this reverse-order approach is correct.

Has anyone encountered similar issues, or can anyone provide guidance on the correct way to handle these modular operations? I'm particularly interested in whether these reduction limits are expected behavior and whether my approach to loading memory is correct. Any insights would be greatly appreciated.

#[entry]
fn main() -> ! {
    info!("Running.");

    // Enable the PKA and CryptoCell clock
    let p = pac::Peripherals::take().unwrap();
    let cc_misc = p.cc_misc;
    let cc_pka = p.cc_pka;

    p.cryptocell.enable().write(|w| w.enable().set_bit());
    cc_misc.pka_clk().write(|w| w.enable().set_bit());

    while cc_misc.clk_status().read().pka_clk().bit_is_clear() {
    // Wait for PKA clock to be ready
    }

    cc_pka.pka_l(1).write(|w| unsafe { w.bits(OPERAND_SIZE_BITS as u32) });

    // Configure memory map
    cc_pka.memory_map(0).write(|w| unsafe { w.bits(0x0) }); // R0
    cc_pka.memory_map(1).write(|w| unsafe { w.bits(VIRTUAL_MEMORY_OFFSET) }); // R1
    cc_pka.memory_map(4).write(|w| unsafe { w.bits(2 * VIRTUAL_MEMORY_OFFSET) }); // R4
    cc_pka.memory_map(5).write(|w| unsafe { w.bits(3 * VIRTUAL_MEMORY_OFFSET) }); // R5
    cc_pka.memory_map(6).write(|w| unsafe { w.bits(4 * VIRTUAL_MEMORY_OFFSET) }); // R6
    cc_pka.memory_map(30).write(|w| unsafe { w.bits(5 * VIRTUAL_MEMORY_OFFSET) }); // T0
    cc_pka.memory_map(31).write(|w| unsafe { w.bits(6 * VIRTUAL_MEMORY_OFFSET) }); // T1

    // Load N (R0) and Np (R1) into PKA SRAM
    // Memory is loaded in reverse order
    cc_pka.pka_sram_waddr().write(|w| unsafe { w.bits(cc_pka.memory_map(0).read().bits()) });
    for i in 0..OPERAND_SIZE_BITS/8/4 {
        let reverse_index = OPERAND_SIZE_BITS/8/4 - 1 - i;
        cc_pka.pka_sram_wdata().write(|w| unsafe { w.bits(N[reverse_index]) });
    }
    // Extra 64 bits (2 words) must be initialized to zero
    for _ in 0..2 {
        cc_pka.pka_sram_wdata().write(|w| unsafe { w.bits(0x00) });
    }
    cc_pka.pka_sram_waddr().write(|w| unsafe { w.bits(cc_pka.memory_map(1).read().bits()) });
    for i in 0..OPERAND_SIZE_BITS/8/4 {
        let reverse_index = OPERAND_SIZE_BITS/8/4 - 1 - i;
        cc_pka.pka_sram_wdata().write(|w| unsafe { w.bits(NP[reverse_index]) });
    }
    // Extra 64 bits (2 words) must be initialized to zero
    for _ in 0..2 {
        cc_pka.pka_sram_wdata().write(|w| unsafe { w.bits(0x00) });
    }
    cc_pka.pka_sram_waddr().write(|w| unsafe { w.bits(cc_pka.memory_map(4).read().bits()) });
    // FIXME add bound check on A
    for i in 0..OPERAND_SIZE_BITS/8/4 {
        let reverse_index = OPERAND_SIZE_BITS/8/4 - 1 - i;
        cc_pka.pka_sram_wdata().write(|w| unsafe { w.bits(A[reverse_index])});
    }
    // Extra 64 bits (2 words) must be initialized to zero
    for _ in 0..2 {
        cc_pka.pka_sram_wdata().write(|w| unsafe { w.bits(0x00) });
    }
    cc_pka.pka_sram_waddr().write(|w| unsafe { w.bits(cc_pka.memory_map(5).read().bits()) });
    for i in 0..OPERAND_SIZE_BITS/8/4 {
        let reverse_index = OPERAND_SIZE_BITS/8/4 - 1 - i;
        cc_pka.pka_sram_wdata().write(|w| unsafe { w.bits(B[reverse_index])});
    }
    // Extra 64 bits (2 words) must be initialized to zero
    for _ in 0..2 {
        cc_pka.pka_sram_wdata().write(|w| unsafe { w.bits(0x00) });
    }
    cc_pka.pka_sram_waddr().write(|w| unsafe { w.bits(cc_pka.memory_map(6).read().bits()) });
    for _ in 0..OPERAND_SIZE_BITS/8/4 + 2 {
        cc_pka.pka_sram_wdata().write(|w| unsafe { w.bits(0x00)});
    }

    // Execute the operation
    cc_pka.opcode().write(|w| unsafe {
        w.bits(
            (6 << REG_R_POS as u32)       // Result register (R6)
                | (5 << REG_B_POS as u32) // Operand B register (R5)
                | (4 << REG_A_POS as u32) // Operand A register (R4)
                | (1 << LEN_POS as u32)
                | ((Opcode::ModDiv as u32) << OPCODE_POS as u32)
        )
    });

    // Wait for the operation to complete
    while cc_pka.pka_done().read().bits() == 0 {}
 
    // exit via semihosting call
    debug::exit(EXIT_SUCCESS);
    loop {}
}
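The OPCODE word written above packs five fields into one 32-bit value. Below is a minimal host-side sketch of that packing, assuming the bit positions the author lists later in this thread (result register at bits 6:10, operand B at 12:16, operand A at 18:22, length index at 24:26, opcode at 27:31). `encode_opcode` is a hypothetical helper, and the numeric opcode value is a placeholder because the real values come from the PAC's `Opcode` enum. Range-checking each field catches a register or length index that would otherwise spill silently into a neighbouring field.

```rust
// Bit positions as listed in this thread (assumed to match the CC310 OPCODE register).
const REG_R_POS: u32 = 6; // bits 6:10
const REG_B_POS: u32 = 12; // bits 12:16
const REG_A_POS: u32 = 18; // bits 18:22
const LEN_POS: u32 = 24; // bits 24:26
const OPCODE_POS: u32 = 27; // bits 27:31

/// Hypothetical helper: packs an OPCODE register value, returning `None`
/// if any field does not fit its bit range.
fn encode_opcode(opcode: u32, len_idx: u32, reg_a: u32, reg_b: u32, reg_r: u32) -> Option<u32> {
    let fits_5_bits = |v: u32| v < 32;
    if !fits_5_bits(opcode)
        || len_idx >= 8 // LEN field is only 3 bits wide
        || !fits_5_bits(reg_a)
        || !fits_5_bits(reg_b)
        || !fits_5_bits(reg_r)
    {
        return None;
    }
    Some(
        (opcode << OPCODE_POS)
            | (len_idx << LEN_POS)
            | (reg_a << REG_A_POS)
            | (reg_b << REG_B_POS)
            | (reg_r << REG_R_POS),
    )
}

fn main() {
    // Placeholder opcode 0x1; R6 = R4 op R5 using PKA_L[1].
    let word = encode_opcode(0x1, 1, 4, 5, 6).expect("fields in range");
    println!("OPCODE word: {word:#010X}");
    // There is no register 32, so this is rejected instead of corrupting LEN.
    assert!(encode_opcode(0x1, 1, 32, 5, 6).is_none());
}
```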

  • The documentation can be found at https://docs.nordicsemi.com/bundle/ps_nrf52840/page/cryptocell.html#ariaid-title101.

    What are the values of your OPERAND_SIZE_BITS and VIRTUAL_MEMORY_OFFSET?

    At first glance, your code looks correct, but the code that reads the result seems to be missing. Can you show it?

    Have you calculated Np correctly? What are your N, Np, A, B and the result?

    Note that the CryptoCell uses little-endian word order with a word size of 32 bits (not 8-bit bytes). So to write the number 0x0123456789abcdef, you should first write the word 0x89abcdef to the PKA_SRAM_WDATA register and then write 0x01234567 to the same register.

    Note to Nordic Semiconductor: the product specification says:

    1. Set the Address Offset: Specify the starting byte address for writing by setting register PKA_SRAM_WADDR. An offset value of 0x0 points to the first 32-bits word in the PKA SRAM memory. An offset value of 0x10 points to the fourth 32-bits word in the PKA SRAM memory.

    But this is wrong. The register uses a word offset, so an offset value of 0x10 points to 32-bit word number 16 (counting from 0) in the PKA SRAM memory.
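The word-order note in this reply can be checked on the host. A minimal sketch, assuming values are stored most-significant word first (as in the N, A and B arrays in this thread) and that the PKA expects the least significant 32-bit word to be written first. `to_pka_words` and `u64_to_pka_words` are hypothetical helpers that produce the sequence of PKA_SRAM_WDATA writes, including the two zero spare words.

```rust
/// Hypothetical helper: converts an array stored most-significant word first
/// into the order it should be written to PKA_SRAM_WDATA (least significant
/// word first), followed by `spare_words` zero words.
fn to_pka_words(msw_first: &[u32], spare_words: usize) -> Vec<u32> {
    let mut words: Vec<u32> = msw_first.iter().rev().copied().collect();
    words.extend(std::iter::repeat(0).take(spare_words));
    words
}

/// Splits a u64 into 32-bit words, least significant word first.
fn u64_to_pka_words(value: u64) -> [u32; 2] {
    [value as u32, (value >> 32) as u32]
}

fn main() {
    // 0x0123456789abcdef: 0x89abcdef is written first, then 0x01234567.
    let words = u64_to_pka_words(0x0123_4567_89ab_cdef);
    assert_eq!(words, [0x89ab_cdefu32, 0x0123_4567u32]);

    // An MSW-first array ends up in the same order once reversed.
    let msw_first: [u32; 2] = [0x0123_4567, 0x89ab_cdef];
    assert_eq!(
        to_pka_words(&msw_first, 2),
        vec![0x89ab_cdefu32, 0x0123_4567, 0, 0]
    );
    println!("{:X?}", words);
}
```

This is why the reverse-order loops in the question produce correct operands: reversing an MSW-first array yields little-endian word order.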

  • // Example constants for positions
    const TAG_POS: u8 = 0;         // tag of the operand
    const REG_R_POS: u8 = 6;       // Result register position (Bits 6:10)
    const REG_R_CTRL_POS: u8 = 11; // Result register control position (Bit 11)
    const REG_B_POS: u8 = 12;      // Operand B register position (Bits 12:16)
    const REG_B_CTRL_POS: u8 = 17; // Operand B register control position (Bit 17)
    const REG_A_POS: u8 = 18;      // Operand A register position (Bits 18:22)
    const REG_A_CTRL_POS: u8 = 23; // Operand A register control position (Bit 23)
    const LEN_POS: u8 = 24;        // Operand length register index (Bits 24:26)
    const OPCODE_POS: u8 = 27;     // Operation code position (Bits 27:31)
    
    // All virtual registers must be aligned to 64-bit words, and each virtual register
    // must be at least the size of the largest operand plus an extra 64 bits
    // for internal PKA calculations. 
    // These extra 64 bits must be initialized to zero. 
    // In the 1D examples, we would have const OPERAND_SIZE_BITS = 1 * 4 * 8;
    const OPERAND_SIZE_BITS: usize = 8 * 4 * 8;
    const OPERAND_SIZE_WORDS: usize = OPERAND_SIZE_BITS/8/4;
    const OPERAND_MEMORY_OFFSET: u32 = (OPERAND_SIZE_BITS as u32)/8/4 + 2;
    const VIRTUAL_MEMORY_SIZE_BITS: usize = 64 * 4 * 8; // 2048 bits = 64 32-bit words
    const VIRTUAL_MEMORY_OFFSET: u32 = (VIRTUAL_MEMORY_SIZE_BITS as u32)/8/4 + 2;
    
    // Define example values for N and Np 
    // 1D examples
    // const N: [u32; 1] = [0x15];
    // const NP: [u32; 1] = [0xC30C30C3];
    
    // Example values for a and b
    // 1D examples
    // const A: [u32; 1] = [0x02];
    // const B: [u32; 1] = [0x07];
    
    
    //  tests [u32; 8]
    const N: [u32; 8] = [
        0x00000000, 0x00000000, 0x00000000, 0x00000000,
        0x00000000, 0x00000000, 0x00000000, 0x00000015
    ];
    
    const NP: [u32; 8] = [
        0xC30C30C3, 0xC30C30C3, 0xC30C30C3, 0xC30C30C3, 
        0xC30C30C3, 0xC30C30C3, 0xC30C30C3, 0xC30C30C3
    ];
    
    const B: [u32; 8] = [
        0x00000000, 0x00000000, 0x00000000, 0x00000000,
        0x00000000, 0x00000000, 0x00000000, 0x00000010
    ];
    
    const A: [u32; 8] = [
        0x00000000, 0x00000000, 0x00000000, 0x00000000,
        0x00000000, 0x00000000, 0x00000000, 0x00000010
    ];

    As for the order, I guess that since the word order is little-endian, it makes sense that I need to load the words into memory in reverse order.

    To read the result, I just read register 6:

        // Wait for the operation to complete
        while cc_pka.pka_done().read().bits() == 0 {}
        
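        // When changing from read to write they advise to clear the SRAM buffer.
        // With an svd2rust-style PAC, the bare accessor call performs no write, so an
        // explicit write is needed (assumption: any written value clears the buffer).
        cc_pka.pka_sram_wclear().write(|w| unsafe { w.bits(1) });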
        
        // Read and log the result
        let mut result = [0u32; OPERAND_SIZE_BITS/8/4 + 2];
        cc_pka.pka_sram_raddr().write(|w| unsafe { w.bits(cc_pka.memory_map(6).read().bits()) });
        for i in 0..result.len() {
            result[i] = cc_pka.pka_sram_rdata().read().bits(); 
        } 
    
        info!("Result: {:#X}", result);
    
        // exit via semihosting call
        debug::exit(EXIT_SUCCESS);
        loop {}

    This is the result:
    └─ crypto_cc310::__cortex_m_rt_main @ src/main.rs:172
    INFO Verification of R0: [0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x15]
    └─ crypto_cc310::read_word_array @ src/main.rs:305
    INFO Verification of R4: [0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x10]
    └─ crypto_cc310::read_word_array @ src/main.rs:305
    INFO Verification of R5: [0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x10]
    └─ crypto_cc310::read_word_array @ src/main.rs:305
    INFO Verification of R6: [0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xC1]

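    which is clearly not reduced. This is for ModMul: 0x10 * 0x10 = 256, and 0xC1 = 193 is what remains after subtracting N = 0x15 = 21 only three times, whereas the fully reduced result would be 256 mod 21 = 4.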
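A quick host-side check of the numbers in this post (plain integer arithmetic, nothing CryptoCell-specific): the value read back, 0xC1, is congruent to the product modulo N but still larger than N, i.e. only partially reduced. `mod_mul` is an illustrative helper, not part of any PKA API.

```rust
/// Fully reduced modular product (plain host arithmetic, not the PKA).
fn mod_mul(a: u32, b: u32, n: u32) -> u32 {
    ((a as u64 * b as u64) % n as u64) as u32
}

fn main() {
    let n = 0x15; // N = 21
    let (a, b) = (0x10, 0x10); // A = B = 16
    let observed = 0xC1; // value read back from R6 (193)

    let product = a * b; // 256
    let reduced = mod_mul(a, b, n);
    println!("product = {product}, reduced = {reduced}, observed = {observed}");

    // 0xC1 is congruent to the product modulo N ...
    assert_eq!(observed % n, reduced);
    // ... but only three multiples of N have been subtracted.
    assert_eq!((product - observed) / n, 3);
}
```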

  • How did you come to the conclusion that Np should be an array full of 0x33333333?

  • For the calculation of Np I used a Python script that performs the extended Euclidean algorithm.

    (There was a mistake; Np is actually:

    const NP: [u32; 8] = [
        0xC30C30C3, 0xC30C30C3, 0xC30C30C3, 0xC30C30C3,
        0xC30C30C3, 0xC30C30C3, 0xC30C30C3, 0xC30C30C3
    ];
    I forgot to change the value of N in the Python script that computes Np. It still does not seem to make any difference.)

    N⋅Np≡−1 (mod R)

    Where:

    • N is the modulus.

    • Np is the negated modular multiplicative inverse of N modulo R, where R is usually chosen as R = 2^k, with k the bit length of the modulus N.

    The equation can be solved using the extended Euclidean algorithm:

    def compute_modular_inverse(N, bit_width):
        """
        Compute Np = -N^-1 mod 2^bit_width using the extended Euclidean algorithm,
        so that N * Np ≡ -1 (mod 2^bit_width).
    
        Args:
        - N (int): The modulus to compute Np for (must be odd).
        - bit_width (int): The bit width of R (e.g., 256 for R = 2^256).
    
        Returns:
        - int: Np, or None if N has no inverse modulo 2^bit_width.
        """
        R = 2 ** bit_width
    
        # Check if gcd(N, R) is 1 (coprime condition)
        def gcd(a, b):
            while b:
                a, b = b, a % b
            return a
    
        if gcd(N, R) != 1:
            return None  # No modular inverse exists if not coprime
    
        # Extended Euclidean algorithm: on exit, t * N ≡ 1 (mod R)
        t, new_t = 0, 1
        r, new_r = R, N
    
        while new_r != 0:
            quotient = r // new_r
            t, new_t = new_t, t - quotient * new_t
            r, new_r = new_r, r - quotient * new_r
    
        # Ensure the result is positive
        if t < 0:
            t += R
    
        # Negate the inverse so that N * NP ≡ -1 (mod R)
        NP = (R - t) % R
        return NP
    
    
    def print_in_array_format(value, word_bits, bit_width):
        """
        Format value as an array of word_bits-wide words, most significant word
        first (hypothetical reconstruction: this helper was not included in the post).
        """
        mask = (1 << word_bits) - 1
        words = [(value >> shift) & mask
                 for shift in range(bit_width - word_bits, -1, -word_bits)]
        return "[" + ", ".join(f"0x{w:08X}" for w in words) + "]"
    
    
    # Example usage
    if __name__ == "__main__":
        N_parts = [0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x15]
        N = 0
        for part in N_parts:
            N = (N << 32) | part
    
        bit_width = len(N_parts) * 4 * 8
    
        Np = compute_modular_inverse(N, bit_width)
        if Np is not None:
            print(f"Np for N = {N} modulo 2^{bit_width} is:")
            result_array = print_in_array_format(Np, 32, bit_width)
            print(result_array)  # This will print Np in array format
        else:
            print(f"{N} has no modular inverse modulo 2^{bit_width}.")
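As a cross-check on the script for the single-word case (N = 0x15, R = 2^32), Np can be computed in a few lines of Rust. This sketch swaps in Newton (Hensel) iteration for the inverse modulo 2^32 instead of the extended Euclidean algorithm; both give the same unique inverse for odd N. `neg_inv_mod_2_32` is a hypothetical helper name, and note that this is the Montgomery-style Np the script targets, while a different definition for the CryptoCell is suggested later in this thread.

```rust
/// Returns Np = -N^-1 mod 2^32 for an odd N, so that N * Np ≡ -1 (mod 2^32).
/// Newton step: if x is an inverse of n modulo 2^k, then x * (2 - n * x)
/// is an inverse modulo 2^(2k).
fn neg_inv_mod_2_32(n: u32) -> Option<u32> {
    if n % 2 == 0 {
        return None; // even numbers have no inverse modulo a power of two
    }
    // n * n ≡ 1 (mod 8) for odd n, so x = n is already correct to 3 bits.
    let mut x = n;
    for _ in 0..4 {
        // correct bits: 3 -> 6 -> 12 -> 24 -> 48 (>= 32)
        x = x.wrapping_mul(2u32.wrapping_sub(n.wrapping_mul(x)));
    }
    Some(x.wrapping_neg())
}

fn main() {
    let np = neg_inv_mod_2_32(0x15).unwrap();
    println!("Np = {np:#010X}");
    assert_eq!(0x15u32.wrapping_mul(np), u32::MAX); // N * Np ≡ -1 (mod 2^32)
}
```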
    

  • According to https://github.com/ARM-software/cryptocell-312-runtime/blob/update-cc110-bu-00000-r1p4/codesafe/src/crypto_api/pki/common/pka.c#L561, the Np parameter should be floor(2^(N+A+X-1) / n), where N is the bit length of the modulus n (I guess this corresponds to the PKA_L register used for the modulus), A = 64 and X = 8.
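For small moduli this formula can be evaluated exactly with u128 arithmetic. A minimal sketch, assuming N is the bit length of the modulus n and A = 64, X = 8 as stated in this reply; `cc_np` is a hypothetical name, and it only works while N + A + X - 1 < 128, so realistic operand sizes need big-integer division (or the PKA's own Division opcode).

```rust
const A: u32 = 64; // CC_PKA_WORD_SIZE_IN_BITS, as stated in the reply
const X: u32 = 8; // PKA_EXTRA_BITS, as stated in the reply

/// Hypothetical helper: Np = floor(2^(N + A + X - 1) / n), N = bit length of n.
/// Returns None if n is zero or the exponent does not fit in a u128.
fn cc_np(n: u128) -> Option<u128> {
    if n == 0 {
        return None;
    }
    let bits = 128 - n.leading_zeros(); // bit length N of the modulus
    let exp = bits + A + X - 1;
    if exp >= 128 {
        return None;
    }
    Some((1u128 << exp) / n)
}

fn main() {
    // n = 21 is a 5-bit modulus, so the exponent is 5 + 64 + 8 - 1 = 76.
    let n = 0x15u128;
    let np = cc_np(n).unwrap();
    println!("Np = {np:#X}");
    // Floor check: n * Np <= 2^76 < n * (Np + 1)
    assert!(n * np <= 1u128 << 76 && (1u128 << 76) < n * (np + 1));
}
```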

  • Thanks for the reference. However, why is A = 64? And X = 8? According to the link you provided:

        uint32_t A = CC_PKA_WORD_SIZE_IN_BITS;
        uint32_t X = PKA_EXTRA_BITS;

    But I cannot find where those values are defined.

    Also, I have implemented the calculation of Np following the C code, but the reductions are still not being performed. Here is what I have:

    ... 
    // Example constants for positions
    const TAG_POS: u8 = 0;         // tag of the operand
    const REG_R_POS: u8 = 6;       // Result register position (Bits 6:10)
    const REG_R_CTRL_POS: u8 = 11; // Result register control position (Bit 11)
    const REG_B_POS: u8 = 12;      // Operand B register position (Bits 12:16)
    const REG_B_CTRL_POS: u8 = 17; // Operand B register control position (Bit 17)
    const REG_A_POS: u8 = 18;      // Operand A register position (Bits 18:22)
    const REG_A_CTRL_POS: u8 = 23; // Operand A register control position (Bit 23)
    const LEN_POS: u8 = 24;        // Operand length register index (Bits 24:26)
    const OPCODE_POS: u8 = 27;     // Operation code position (Bits 27:31)
    
    // All virtual registers must be aligned to 64-bit words, and each virtual register
    // must be at least the size of the largest operand plus an extra 64 bits
    // for internal PKA calculations. 
    // These extra 64 bits must be initialized to zero. 
    const MAX_OPERAND_SIZE_BITS: usize = 64 * 4 * 8;
    const OPERAND_SIZE_BITS: usize = 8 * 4 * 8;
    const OPERAND_SIZE_WORDS: usize = OPERAND_SIZE_BITS/8/4;
    const MAX_OPERAND_SIZE_WORDS: usize = MAX_OPERAND_SIZE_BITS/8/4;
    const OPERAND_MEMORY_OFFSET: u32 = (OPERAND_SIZE_BITS as u32)/8/4 + 2;
    const VIRTUAL_MEMORY_SIZE_BITS: usize = 64 * 4 * 8; // 2048 bits = 64 32-bit words
    const VIRTUAL_MEMORY_OFFSET: u32 = (VIRTUAL_MEMORY_SIZE_BITS as u32)/8/4 + 2;
    
    // tests
    const N: [u32; 8] = [
        0x00000000, 0x00000000, 0x00000000, 0x00000000,
        0x00000000, 0x00000000, 0x00000000, 0x00000015
    ];
    
    const B: [u32; 8] = [
        0x00000000, 0x00000000, 0x00000000, 0x00000000,
        0x00000000, 0x00000000, 0x00000000, 0x00000010
    ];
    
    const A: [u32; 8] = [
        0x00000000, 0x00000000, 0x00000000, 0x00000000,
        0x00000000, 0x00000000, 0x00000000, 0x00000100
    ];
    
    
    #[entry]
    fn main() -> ! {
        info!("Running.");
    
        // Enable the PKA and CryptoCell clock
        let p = pac::Peripherals::take().unwrap();
        let cc_misc = p.cc_misc;
        let cc_pka = p.cc_pka;
    
        p.cryptocell.enable().write(|w| w.enable().set_bit());
        cc_misc.pka_clk().write(|w| w.enable().set_bit());
    
        while cc_misc.clk_status().read().pka_clk().bit_is_clear() {
        // Wait for PKA clock to be ready
        }
        info!("PKA clock ready. PKA engine enabled");
    
        // Operand size
        cc_pka.pka_l(1).write(|w| unsafe { w.bits(OPERAND_SIZE_BITS as u32) }); 
        // Max operand size
        cc_pka.pka_l(0).write(|w| unsafe { w.bits(MAX_OPERAND_SIZE_BITS as u32) }); 
    
        // Configure memory map
        configure_memory_map(&cc_pka);
    
        // Clear registers
        clear_pka_registers(&cc_pka);
    
        // Load N
        load_word_array(&cc_pka, 0, &N);
        
        // Calculate Np
        calculate_np(&cc_pka);
    
        // Verify data is well written
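        // A bare accessor call performs no write with an svd2rust-style PAC (value assumed)
        cc_pka.pka_sram_wclear().write(|w| unsafe { w.bits(1) });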
        read_word_array(&cc_pka, 0);
        read_word_array(&cc_pka, 1);
    
        // Load data to compute operations
        load_word_array(&cc_pka, 4, &A);
        load_word_array(&cc_pka, 5, &B);
        read_word_array(&cc_pka, 4);
        read_word_array(&cc_pka, 5);
    
        // Example operation: R6 = R4 * R5 mod N (N in R0)
        execute_operation(&cc_pka, cc_pka::opcode::Opcode::ModMul, 6, 4, 5, 0);
    
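        // A bare accessor call performs no write with an svd2rust-style PAC (value assumed)
        cc_pka.pka_sram_wclear().write(|w| unsafe { w.bits(1) });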
        read_word_array(&cc_pka, 6);
    
        // exit via semihosting call
        debug::exit(EXIT_SUCCESS);
        loop {}
    }
    
    
    fn configure_memory_map(cc_pka: &pac::CcPka) {
        // Map virtual registers
        // R0: modulus (N)
        // R1: Np
        // R2: a parameter
        // R3: b parameter
        // R4: operand A
        // R5: operand B
        // R6: result
        // R7: temporal
        // R8: temporal
        // T0: register 30
        // T1: register 31
        for i in 0..9 {
            cc_pka.memory_map(i).write(|w| unsafe { 
                w.bits(i as u32 * VIRTUAL_MEMORY_OFFSET) 
            });
        }
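        // T0 and T1 (registers 30 and 31) get their own regions after R8 instead of
        // aliasing R7 and R8, which are used as temporaries (R7 holds the numerator in
        // calculate_np). Check that 11 * VIRTUAL_MEMORY_OFFSET words fit in the PKA SRAM.
        cc_pka.memory_map(30).write(|w| unsafe {
            w.bits(9 as u32 * VIRTUAL_MEMORY_OFFSET)
        });
        cc_pka.memory_map(31).write(|w| unsafe {
            w.bits(10 as u32 * VIRTUAL_MEMORY_OFFSET)
        });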
    }
    
    fn clear_pka_registers(cc_pka: &pac::CcPka) {
        for i in 0..9 {
            cc_pka.pka_sram_waddr().write(|w| unsafe { 
                w.bits(cc_pka.memory_map(i).read().bits()) 
            });
            
            // Clear the whole region, including the two spare words
            for _ in 0..VIRTUAL_MEMORY_OFFSET {
                cc_pka.pka_sram_wdata().write(|w| unsafe { 
                    w.bits(0x00) 
                });
            }
        }
        for i in 30..32 {
            cc_pka.pka_sram_waddr().write(|w| unsafe { 
                w.bits(cc_pka.memory_map(i).read().bits()) 
            });
            
            // Clear only this register's region (64 * 4 * 8 = 2048 words runs far past it)
            for _ in 0..VIRTUAL_MEMORY_OFFSET {
                cc_pka.pka_sram_wdata().write(|w| unsafe { 
                    w.bits(0x00) 
                });
            }
        }
    }
    
    fn load_word_array(cc_pka: &pac::CcPka, reg: usize, data: &[u32]) {
        cc_pka.pka_sram_waddr().write(|w| unsafe { 
            w.bits(cc_pka.memory_map(reg).read().bits()) 
        });
        
        // Load data in reverse order
        for i in 0..data.len() {
            let reverse_index = data.len() - 1 - i;
            cc_pka.pka_sram_wdata().write(|w| unsafe { 
                w.bits(data[reverse_index]) 
            });
        }
        // Add padding zeros
        for _ in 0..2 {
            cc_pka.pka_sram_wdata().write(|w| unsafe { w.bits(0x00) });
        }
    }
    
    fn read_word_array(cc_pka: &pac::CcPka, reg: usize) {
        cc_pka.pka_sram_raddr().write(|w| unsafe { 
            w.bits(cc_pka.memory_map(reg).read().bits()) 
        });
        let mut verif = [0u32; MAX_OPERAND_SIZE_WORDS];
        for i in 0..MAX_OPERAND_SIZE_WORDS {
            verif[MAX_OPERAND_SIZE_WORDS - 1 -i] = cc_pka.pka_sram_rdata().read().bits();
            // verif[i] = cc_pka.pka_sram_rdata().read().bits();
        }
        info!("Verification of R{:?}: {:#X}", reg, verif);
    }
    
    fn execute_operation(cc_pka: &pac::CcPka, opcode: cc_pka::opcode::Opcode, 
        result_reg: u8, operand_a_reg: u8, operand_b_reg: u8, operand_size_idx: u32) {
        cc_pka.opcode().write(|w| unsafe {
        w.bits(
        ((result_reg as u32) << REG_R_POS)
        | ((operand_b_reg as u32) << REG_B_POS)
        | ((operand_a_reg as u32) << REG_A_POS)
        | (operand_size_idx << LEN_POS)
        | ((opcode as u32) << OPCODE_POS)
        )
        });
    
        while cc_pka.pka_done().read().bits() == 0 {}
    }
    
    
    fn calculate_np(cc_pka: &pac::CcPka) -> () {
    
        let total_bits = OPERAND_SIZE_BITS + 64 + 8 - 1;
    
        // Create big number representing 2^(N+A+X-1)    
        let word_index = total_bits / 32;
        let bit_index = total_bits % 32;
        let mut numerator = [0u32; MAX_OPERAND_SIZE_WORDS];
        numerator[MAX_OPERAND_SIZE_WORDS - 1 - word_index] = 1 << bit_index;
     
        // Load data in reverse order into a temp register
        load_word_array(&cc_pka, 7, &numerator);
     
        // n is already in R0, execute division
        cc_pka.opcode().write(|w| unsafe {
            w.bits(
            ( 1 << REG_R_POS)
            | ( 0 << REG_B_POS)
            | ( 7 << REG_A_POS)
            | ( 0 << LEN_POS)
            | ((cc_pka::opcode::Opcode::Division as u32) << OPCODE_POS)
            )
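            });
    
        // Wait for the division to finish before Np (R1) is read or used
        while cc_pka.pka_done().read().bits() == 0 {}
     }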

    The results are still not reduced...

    INFO Verification of R0: [0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x15]
    └─ crypto_cc310::read_word_array @ src/main.rs:278
    INFO Verification of R1: [0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x7, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF]
    └─ crypto_cc310::read_word_array @ src/main.rs:278
    INFO Verification of R4: [0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x100]
    └─ crypto_cc310::read_word_array @ src/main.rs:278
    INFO Verification of R5: [0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x10]
    └─ crypto_cc310::read_word_array @ src/main.rs:278
    INFO Verification of R6: [0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xFC1]

    Thanks!
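Before touching hardware, a virtual-register layout can be sanity-checked on the host. A minimal sketch, assuming (as the code in this thread does) that MEMORY_MAP values are 32-bit word offsets into the PKA SRAM, and that each register needs room for the largest operand plus 64 spare bits, kept 64-bit aligned. `stride_words`, `layout` and `non_overlapping` are hypothetical helpers; the point is to confirm that no two virtual registers, including T0 (30) and T1 (31), share a region. The printed total can then be compared with the PKA SRAM size in the product specification.

```rust
/// Word stride per virtual register: operand words + 2 spare words (64 bits),
/// rounded up to an even count so every base offset stays 64-bit aligned.
const fn stride_words(max_operand_bits: u32) -> u32 {
    let words = (max_operand_bits + 31) / 32 + 2;
    (words + 1) & !1
}

/// Hypothetical layout: consecutive regions for the given virtual register
/// indices, returned as (register, base word offset) pairs.
fn layout(regs: &[u32], stride: u32) -> Vec<(u32, u32)> {
    regs.iter()
        .enumerate()
        .map(|(slot, &reg)| (reg, slot as u32 * stride))
        .collect()
}

/// True if no two regions of `stride` words overlap.
fn non_overlapping(map: &[(u32, u32)], stride: u32) -> bool {
    let mut bases: Vec<u32> = map.iter().map(|&(_, base)| base).collect();
    bases.sort_unstable();
    bases.windows(2).all(|w| w[1] - w[0] >= stride)
}

fn main() {
    let stride = stride_words(64 * 32); // 2048-bit operands
    // R0..R8 plus T0 (30) and T1 (31), each in its own slot.
    let regs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 30, 31];
    let map = layout(&regs, stride);
    for (reg, base) in &map {
        println!("MEMORY_MAP[{reg}] = {base} words");
    }
    assert!(non_overlapping(&map, stride));
    assert!(map.iter().all(|&(_, base)| base % 2 == 0)); // 64-bit aligned
    let total_bytes = map.len() as u32 * stride * 4;
    println!("total: {total_bytes} bytes of PKA SRAM");
}
```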

  • If the operand size in bits according to the PKA_L register is N, then the Np parameter should contain floor(2^(N+64-1)/n), where n is the modulus.

    After doing some testing on my own, it appears the CryptoCell has some constraints on n and the operand size. To get correct results, I suggest trying a larger modulus (bit size >= 64) and setting the operand size in bits (PKA_L) to exactly fit the modulus, or at most 8 bits larger, but no more than that. I'm not entirely sure, though; the reference implementation uses 8 extra bits.
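With a modulus of 64 bits or more, Np no longer fits in a u128. A minimal sketch, assuming PKA_L = bit length of n + 8 and Np = floor(2^(PKA_L + 64 - 1) / n) as described in this reply; it restricts n to a u64 so the long division can use u128 intermediates, and returns the quotient in little-endian 32-bit word order, ready to be written to PKA_SRAM_WDATA. `np_words` is a hypothetical helper; for larger moduli, running the PKA's own Division opcode (as `calculate_np` does) is the practical route.

```rust
/// Hypothetical helper: returns (PKA_L in bits, Np as little-endian 32-bit words)
/// for a 64-bit modulus n, with PKA_L = bitlen(n) + 8 and
/// Np = floor(2^(PKA_L + 64 - 1) / n).
fn np_words(n: u64) -> Option<(u32, Vec<u32>)> {
    if n == 0 {
        return None;
    }
    let pka_l = (64 - n.leading_zeros()) + 8;
    let exp = pka_l + 64 - 1;

    // 2^exp as 32-bit words, most significant first.
    let num_words = (exp / 32 + 1) as usize;
    let mut numerator = vec![0u32; num_words];
    numerator[0] = 1 << (exp % 32);

    // Schoolbook long division by a single u64 divisor; the remainder stays
    // below n < 2^64, so (rem << 32) | word fits in a u128 and each quotient
    // digit fits in a u32.
    let mut quotient = Vec::with_capacity(num_words);
    let mut rem: u128 = 0;
    for &word in &numerator {
        let cur = (rem << 32) | word as u128;
        quotient.push((cur / n as u128) as u32);
        rem = cur % n as u128;
    }

    // Convert to little-endian word order and drop high zero words.
    quotient.reverse();
    while quotient.len() > 1 && *quotient.last().unwrap() == 0 {
        quotient.pop();
    }
    Some((pka_l, quotient))
}

fn main() {
    // 2^64 - 59: a 64-bit modulus, in line with the bit size >= 64 suggestion.
    let n: u64 = 0xFFFF_FFFF_FFFF_FFC5;
    let (pka_l, np) = np_words(n).unwrap();
    println!("PKA_L = {pka_l} bits, Np (least significant word first) = {np:08X?}");
}
```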
