Issues with PKA Engine Modular Operations on CC310

I'm working on implementing arithmetic operations using the PKA engine on the nrf52840 in Rust, following the CRYPTOCELL — Arm TrustZone CryptoCell 310 datasheet. While I've successfully implemented basic arithmetic operations, I'm running into several issues with modular operations.

The main problem is that modular reductions are not completed as expected. Modular addition only returns a fully reduced result when the operands are already reduced, specifically when the result is less than 2N-1 (where N is the modulus). Modular multiplication behaves slightly differently: it only reduces the result when it is less than 4N-1. The most problematic operation is modular division, which does not produce correct results at all, although regular division works fine.

I've also noticed something about memory loading. After mapping the virtual registers, I had to load the memory in reverse order to get multiplication to work. The datasheet doesn't give clear guidance on handling values larger than one word (for example, [u32; 2] = [x, x], an array of 32-bit elements), so I'm not sure this reverse-order approach is correct.

Has anyone encountered similar issues, or can anyone provide guidance on the correct way to handle these modular operations? I'm particularly interested in whether these reduction limits are expected behavior and whether my approach to memory loading is correct. Any insights would be greatly appreciated.
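To make the reverse-order loading concrete, here is a minimal host-side sketch of the word layout my code ends up writing: least-significant 32-bit word first, followed by two zero words. The helper name `to_pka_words` is hypothetical (it is not part of the PAC or any crate), and whether this word order is what the PKA actually expects is an assumption based only on the observation that multiplication worked this way.

```rust
/// Hypothetical helper (not from the PAC or the datasheet): converts a value
/// stored most-significant-word first into the order in which the words are
/// written to PKA SRAM in this post: least-significant word first, followed
/// by two zero words (the extra 64 bits).
fn to_pka_words(big_endian_words: &[u32]) -> Vec<u32> {
    let mut out: Vec<u32> = big_endian_words.iter().rev().copied().collect();
    out.extend_from_slice(&[0, 0]);
    out
}

fn main() {
    // The 64-bit value 0x0000_0001_0000_0002 written as [high word, low word].
    let value = [0x0000_0001, 0x0000_0002];
    let words = to_pka_words(&value);
    println!("{:#X?}", words);
}
```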

fn main() -> ! {
    info!("Running.");

    // Enable the PKA and CryptoCell clock
    let p = pac::Peripherals::take().unwrap();
    let cc_misc = p.cc_misc;
    let cc_pka = p.cc_pka;

    p.cryptocell.enable().write(|w| w.enable().set_bit());
    cc_misc.pka_clk().write(|w| w.enable().set_bit());

    while cc_misc.clk_status().read().pka_clk().bit_is_clear() {
        // Wait for PKA clock to be ready
    }

    cc_pka.pka_l(1).write(|w| unsafe { w.bits(OPERAND_SIZE_BITS as u32) });

    // Configure memory map
    cc_pka.memory_map(0).write(|w| unsafe { w.bits(0x0) }); // R0
    cc_pka.memory_map(1).write(|w| unsafe { w.bits(VIRTUAL_MEMORY_OFFSET) }); // R1
    cc_pka.memory_map(4).write(|w| unsafe { w.bits(2 * VIRTUAL_MEMORY_OFFSET) }); // R4
    cc_pka.memory_map(5).write(|w| unsafe { w.bits(3 * VIRTUAL_MEMORY_OFFSET) }); // R5
    cc_pka.memory_map(6).write(|w| unsafe { w.bits(4 * VIRTUAL_MEMORY_OFFSET) }); // R6
    cc_pka.memory_map(30).write(|w| unsafe { w.bits(5 * VIRTUAL_MEMORY_OFFSET) }); // T0
    cc_pka.memory_map(31).write(|w| unsafe { w.bits(6 * VIRTUAL_MEMORY_OFFSET) }); // T1

    // Load N (R0) and Np (R1) into PKA SRAM
    // Memory is loaded in reverse order
    cc_pka.pka_sram_waddr().write(|w| unsafe { w.bits(cc_pka.memory_map(0).read().bits()) });
    for i in 0..OPERAND_SIZE_BITS/8/4 {
        let reverse_index = OPERAND_SIZE_BITS/8/4 - 1 - i;
        cc_pka.pka_sram_wdata().write(|w| unsafe { w.bits(N[reverse_index]) });
    }
    // Extra 64 bits (2 words) must be initialized to zero
    for i in 0..2 {
        cc_pka.pka_sram_wdata().write(|w| unsafe { w.bits(0x00) });
    }
    cc_pka.pka_sram_waddr().write(|w| unsafe { w.bits(cc_pka.memory_map(1).read().bits()) });
    for i in 0..OPERAND_SIZE_BITS/8/4 {
        let reverse_index = OPERAND_SIZE_BITS/8/4 - 1 - i;
        cc_pka.pka_sram_wdata().write(|w| unsafe { w.bits(NP[reverse_index]) });
    }
    // Extra 64 bits (2 words) must be initialized to zero
    for i in 0..2 {
        cc_pka.pka_sram_wdata().write(|w| unsafe { w.bits(0x00) });
    }
    cc_pka.pka_sram_waddr().write(|w| unsafe { w.bits(cc_pka.memory_map(4).read().bits()) });
    // FIXME add bound check on A
    for i in 0..OPERAND_SIZE_BITS/8/4 {
        let reverse_index = OPERAND_SIZE_BITS/8/4 - 1 - i;
        cc_pka.pka_sram_wdata().write(|w| unsafe { w.bits(A[reverse_index])});
    }
    // Extra 64 bits (2 words) must be initialized to zero
    for i in 0..2 {
        cc_pka.pka_sram_wdata().write(|w| unsafe { w.bits(0x00) });
    }
    cc_pka.pka_sram_waddr().write(|w| unsafe { w.bits(cc_pka.memory_map(5).read().bits()) });
    for i in 0..OPERAND_SIZE_BITS/8/4 {
        let reverse_index = OPERAND_SIZE_BITS/8/4 - 1 - i;
        cc_pka.pka_sram_wdata().write(|w| unsafe { w.bits(B[reverse_index])});
    }
    // Extra 64 bits (2 words) must be initialized to zero
    for i in 0..2 {
        cc_pka.pka_sram_wdata().write(|w| unsafe { w.bits(0x00) });
    }
    cc_pka.pka_sram_waddr().write(|w| unsafe { w.bits(cc_pka.memory_map(6).read().bits()) });
    for i in 0..OPERAND_SIZE_BITS/8/4 + 2 {
        cc_pka.pka_sram_wdata().write(|w| unsafe { w.bits(0x00)});
    }

    // Execute the operation
    cc_pka.opcode().write(|w| unsafe {
        w.bits(
            (6 << REG_R_POS as u32)       // Result register (R6)
                | (5 << REG_B_POS as u32) // Operand B register (R5)
                | (4 << REG_A_POS as u32) // Operand A register (R4)
                | (1 << LEN_POS as u32)
                | ((Opcode::ModDiv as u32) << OPCODE_POS as u32)
        )
    });

    // Wait for the operation to complete
    while cc_pka.pka_done().read().bits() == 0 {}
 
    // exit via semihosting call
    debug::exit(EXIT_SUCCESS);
    loop {}
}
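The opcode word written above can also be assembled by a small pure function, which makes the bit packing easy to check on a host machine. This is only a sketch: the field positions are the ones I am assuming from my reading of the datasheet, and both `pack_opcode` and the raw opcode value used in the example are hypothetical placeholders, not PAC APIs or real operation codes.

```rust
// Field positions as assumed in this post (from my reading of the datasheet).
const REG_R_POS: u32 = 6; // result register, bits 6:10
const REG_B_POS: u32 = 12; // operand B register, bits 12:16
const REG_A_POS: u32 = 18; // operand A register, bits 18:22
const LEN_POS: u32 = 24; // operand length register index, bits 24:26
const OPCODE_POS: u32 = 27; // operation code, bits 27:31

/// Hypothetical helper: packs the fields into one 32-bit opcode word.
fn pack_opcode(opcode: u32, len_idx: u32, reg_a: u32, reg_b: u32, reg_r: u32) -> u32 {
    // Each field must fit in its bit range, otherwise it would spill
    // into the neighbouring field.
    debug_assert!(opcode < 32 && len_idx < 8);
    debug_assert!(reg_a < 32 && reg_b < 32 && reg_r < 32);
    (opcode << OPCODE_POS)
        | (len_idx << LEN_POS)
        | (reg_a << REG_A_POS)
        | (reg_b << REG_B_POS)
        | (reg_r << REG_R_POS)
}

fn main() {
    // 0x1F is a placeholder, not a real operation code.
    let placeholder_opcode = 0x1F;
    let word = pack_opcode(placeholder_opcode, 1, 4, 5, 6);
    println!("{:#010X}", word);
}
```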

  • Thanks for the reference. However, why is A = 64 and X = 8? The code at the link you provided has

        uint32_t A = CC_PKA_WORD_SIZE_IN_BITS;
        uint32_t X = PKA_EXTRA_BITS;

    but I cannot find the definitions of those constants anywhere.

    Also, I have implemented the calculation of Np following the C code, but reductions are still not being performed. Here is what I have:

    ... 
    // Example constants for positions
    const TAG_POS: u8 = 0;         // tag of the operand
    const REG_R_POS: u8 = 6;       // Result register position (Bits 6:10)
    const REG_R_CTRL_POS: u8 = 11; // Result register control position (Bit 11)
    const REG_B_POS: u8 = 12;      // Operand B register position (Bits 12:16)
    const REG_B_CTRL_POS: u8 = 17; // Operand B register control position (Bit 17)
    const REG_A_POS: u8 = 18;      // Operand A register position (Bits 18:22)
    const REG_A_CTRL_POS: u8 = 23; // Operand A register control position (Bit 23)
    const LEN_POS: u8 = 24;        // Operand length register index (Bits 24:26)
    const OPCODE_POS: u8 = 27;     // Operation code position (Bits 27:31)
    
    // All virtual registers must be 64 bits word size aligned, and the size of the virtual 
    // registers must be at least the size of the largest operand plus an extra 64 bits 
    // for internal PKA calculations. 
    // These extra 64 bits must be initialized to zero. 
    const MAX_OPERAND_SIZE_BITS: usize = 64 * 4 * 8;
    const OPERAND_SIZE_BITS: usize = 8 * 4 * 8;
    const OPERAND_SIZE_WORDS: usize = OPERAND_SIZE_BITS/8/4;
    const MAX_OPERAND_SIZE_WORDS: usize = MAX_OPERAND_SIZE_BITS/8/4;
    const OPERAND_MEMORY_OFFSET: u32 = (OPERAND_SIZE_BITS as u32)/8/4 + 2;
    const VIRTUAL_MEMORY_SIZE_BITS: usize = 64 * 4 * 8; // 64-bit word size
    const VIRTUAL_MEMORY_OFFSET: u32 = (VIRTUAL_MEMORY_SIZE_BITS as u32)/8/4 + 2;
    
    // tests
    const N: [u32; 8] = [
        0x00000000, 0x00000000, 0x00000000, 0x00000000,
        0x00000000, 0x00000000, 0x00000000, 0x00000015
    ];
    
    const B: [u32; 8] = [
        0x00000000, 0x00000000, 0x00000000, 0x00000000,
        0x00000000, 0x00000000, 0x00000000, 0x00000010
    ];
    
    const A: [u32; 8] = [
        0x00000000, 0x00000000, 0x00000000, 0x00000000,
        0x00000000, 0x00000000, 0x00000000, 0x00000100
    ];
    
    
    #[entry]
    fn main() -> ! {
        info!("Running.");
    
        // Enable the PKA and CryptoCell clock
        let p = pac::Peripherals::take().unwrap();
        let cc_misc = p.cc_misc;
        let cc_pka = p.cc_pka;
    
        p.cryptocell.enable().write(|w| w.enable().set_bit());
        cc_misc.pka_clk().write(|w| w.enable().set_bit());
    
        while cc_misc.clk_status().read().pka_clk().bit_is_clear() {
            // Wait for PKA clock to be ready
        }
        info!("PKA clock ready. PKA engine enabled");
    
        // Operand size
        cc_pka.pka_l(1).write(|w| unsafe { w.bits(OPERAND_SIZE_BITS as u32) }); 
        // Maximum operand size
        cc_pka.pka_l(0).write(|w| unsafe { w.bits(MAX_OPERAND_SIZE_BITS as u32) }); 
    
        // Configure memory map
        configure_memory_map(&cc_pka);
    
        // Clear registers
        clear_pka_registers(&cc_pka);
    
        // Load N
        load_word_array(&cc_pka, 0, &N);
        
        // Calculate Np
        calculate_np(&cc_pka);
    
        // Verify data is well written
        cc_pka.pka_sram_wclear();
        read_word_array(&cc_pka, 0);
        read_word_array(&cc_pka, 1);
    
        // Load data to compute operations
        load_word_array(&cc_pka, 4, &A);
        load_word_array(&cc_pka, 5, &B);
        read_word_array(&cc_pka, 4);
        read_word_array(&cc_pka, 5);
    
        // example operation 4 * 5 = 6
        execute_operation(&cc_pka, cc_pka::opcode::Opcode::ModMul, 6, 4, 5, 0);
    
        cc_pka.pka_sram_wclear();
        read_word_array(&cc_pka, 6);
    
        // exit via semihosting call
        debug::exit(EXIT_SUCCESS);
        loop {}
    }
    
    
    fn configure_memory_map(cc_pka: &pac::CcPka) {
        // Map virtual registers
        // R0: modulus (N)
        // R1: Np
        // R2: a parameter
        // R3: b parameter
        // R4: operand A
        // R5: operand B
        // R6: result
        // R7: temporal
        // R8: temporal
        // T0: register 30
        // T1: register 31
        for i in 0..9 {
            cc_pka.memory_map(i).write(|w| unsafe { 
                w.bits(i as u32 * VIRTUAL_MEMORY_OFFSET) 
            });
        }
        cc_pka.memory_map(30).write(|w| unsafe { 
                w.bits(7 as u32 * VIRTUAL_MEMORY_OFFSET) 
            });
        cc_pka.memory_map(31).write(|w| unsafe { 
            w.bits(8 as u32 * VIRTUAL_MEMORY_OFFSET) 
        });
    }
    
    fn clear_pka_registers(cc_pka: &pac::CcPka) {
        for i in 0..9 {
            cc_pka.pka_sram_waddr().write(|w| unsafe { 
                w.bits(cc_pka.memory_map(i).read().bits()) 
            });
            
            for i in 0..MAX_OPERAND_SIZE_WORDS {
                cc_pka.pka_sram_wdata().write(|w| unsafe { 
                    w.bits(0x00) 
                });
            }
        }
        for i in 30..32 {
            cc_pka.pka_sram_waddr().write(|w| unsafe { 
                w.bits(cc_pka.memory_map(i).read().bits()) 
            });
            
            for _ in 0..MAX_OPERAND_SIZE_WORDS {
                cc_pka.pka_sram_wdata().write(|w| unsafe { 
                    w.bits(0x00) 
                });
            }
        }
    }
    
    fn load_word_array(cc_pka: &pac::CcPka, reg: usize, data: &[u32]) {
        cc_pka.pka_sram_waddr().write(|w| unsafe { 
            w.bits(cc_pka.memory_map(reg).read().bits()) 
        });
        
        // Load data in reverse order
        for i in 0..data.len() {
            let reverse_index = data.len() - 1 - i;
            cc_pka.pka_sram_wdata().write(|w| unsafe { 
                w.bits(data[reverse_index]) 
            });
        }
        // Add padding zeros
        for _ in 0..2 {
            cc_pka.pka_sram_wdata().write(|w| unsafe { w.bits(0x00) });
        }
    }
    
    fn read_word_array(cc_pka: &pac::CcPka, reg: usize) {
        cc_pka.pka_sram_raddr().write(|w| unsafe { 
            w.bits(cc_pka.memory_map(reg).read().bits()) 
        });
        let mut verif = [0u32; MAX_OPERAND_SIZE_WORDS];
        for i in 0..MAX_OPERAND_SIZE_WORDS {
            verif[MAX_OPERAND_SIZE_WORDS - 1 -i] = cc_pka.pka_sram_rdata().read().bits();
            // verif[i] = cc_pka.pka_sram_rdata().read().bits();
        }
        info!("Verification of R{:?}: {:#X}", reg, verif);
    }
    
    fn execute_operation(cc_pka: &pac::CcPka, opcode: cc_pka::opcode::Opcode, 
        result_reg: u8, operand_a_reg: u8, operand_b_reg: u8, operand_size_idx: u32) {
        cc_pka.opcode().write(|w| unsafe {
        w.bits(
        ((result_reg as u32) << REG_R_POS)
        | ((operand_b_reg as u32) << REG_B_POS)
        | ((operand_a_reg as u32) << REG_A_POS)
        | (operand_size_idx << LEN_POS)
        | ((opcode as u32) << OPCODE_POS)
        )
        });
    
        while cc_pka.pka_done().read().bits() == 0 {}
    }
    
    
    fn calculate_np(cc_pka: &pac::CcPka) -> () {
    
        let total_bits = OPERAND_SIZE_BITS + 64 + 8 - 1;
    
        // Create big number representing 2^(N+A+X-1)    
        let word_index = total_bits / 32;
        let bit_index = total_bits % 32;
        let mut numerator = [0u32; MAX_OPERAND_SIZE_WORDS];
        numerator[MAX_OPERAND_SIZE_WORDS - 1 - word_index] = 1 << bit_index;
     
        // Load data in reverse order into a temp register
        load_word_array(&cc_pka, 7, &numerator);
     
        // n is already in R0, execute division
        cc_pka.opcode().write(|w| unsafe {
            w.bits(
            ( 1 << REG_R_POS)
            | ( 0 << REG_B_POS)
            | ( 7 << REG_A_POS)
            | ( 0 << LEN_POS)
            | ((cc_pka::opcode::Opcode::Division as u32) << OPCODE_POS)
            )
            });
     }

    The results are still not reduced...

    INFO Verification of R0: [0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x15]
    └─ crypto_cc310::read_word_array @ src/main.rs:278
    INFO Verification of R1: [0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x7, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF]
    └─ crypto_cc310::read_word_array @ src/main.rs:278
    INFO Verification of R4: [0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x100]
    └─ crypto_cc310::read_word_array @ src/main.rs:278
    INFO Verification of R5: [0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x10]
    └─ crypto_cc310::read_word_array @ src/main.rs:278
    INFO Verification of R6: [0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xFC1]

    Thanks!
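A quick host-side check of the values in the log above shows that the value read back from R6 is congruent to the expected result but not fully reduced. The sketch below uses plain integer arithmetic only; nothing in it talks to the hardware, and `multiples_removed` is a hypothetical helper introduced just for this check.

```rust
/// Hypothetical helper: if `readback` equals `exact - k * n` for some
/// non-negative integer k, returns that k; otherwise returns None.
fn multiples_removed(exact: u64, readback: u64, n: u64) -> Option<u64> {
    let diff = exact.checked_sub(readback)?;
    if diff % n == 0 {
        Some(diff / n)
    } else {
        None
    }
}

fn main() {
    let n: u64 = 0x15; // modulus loaded into R0
    let a: u64 = 0x100; // operand A (R4)
    let b: u64 = 0x10; // operand B (R5)
    let r6: u64 = 0xFC1; // value read back from R6

    let product = a * b;
    println!("a * b         = {:#X}", product);
    println!("a * b mod n   = {:#X}", product % n);
    println!("R6 mod n      = {:#X}", r6 % n);
    println!("multiples of n removed: {:?}", multiples_removed(product, r6, n));
}
```

With these values, 0x100 * 0x10 = 0x1000, which is 1 modulo 0x15, and 0xFC1 is also 1 modulo 0x15. The difference between the two is 3 * 0x15, so the PKA subtracted some multiples of the modulus but stopped short of the fully reduced result.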

  • If the operand size in bits set in the PKA_L register is N, then the Np parameter should contain floor(2^(N+64-1)/n), where n is the modulus.

    After doing some testing on my own, it appears that the CryptoCell has some constraints on n and the operand size. To get correct results, I suggest trying a larger modulus (bit size >= 64) and setting the operand size in bits (PKA_L) to fit the modulus exactly, or with at most 8 extra bits, but no more. I'm not entirely sure about this, though; for what it's worth, the reference implementation uses 8 extra bits.
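As a cross-check for the value the PKA leaves in R1, the formula above can be evaluated on a host machine. The following is a minimal sketch, assuming a modulus that fits in a single 32-bit word (as the 0x15 test value does). `np_for_small_modulus` is a hypothetical helper, not part of the reference implementation, and the result is returned least-significant word first.

```rust
/// Hypothetical helper: computes floor(2^k / n) by binary long division for
/// a modulus n that fits in one 32-bit word. The quotient is returned as
/// 32-bit words, least-significant word first.
fn np_for_small_modulus(k: usize, n: u32) -> Vec<u32> {
    assert!(n != 0, "modulus must be non-zero");
    let n = u64::from(n);
    let mut quotient = vec![0u32; k / 32 + 1];
    let mut rem: u64 = 0;
    // The numerator 2^k has a single 1 bit, at position k.
    for bit in (0..=k).rev() {
        rem = (rem << 1) | u64::from(bit == k);
        if rem >= n {
            rem -= n;
            quotient[bit / 32] |= 1 << (bit % 32);
        }
    }
    quotient
}

fn main() {
    let operand_size_bits = 256; // N, the value written to PKA_L
    let modulus = 0x15; // n
    let np = np_for_small_modulus(operand_size_bits + 64 - 1, modulus);
    // Print most-significant word first to match the log format used above.
    for word in np.iter().rev() {
        print!("{:#X} ", word);
    }
    println!();
}
```

The same function can be called with a different exponent (for example N+64+8-1, as used in the code above) to compare against whichever variant of the formula the hardware turns out to expect.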
