Issue with PKA engine implementation on CryptoCell 310: modular multiplication not being performed properly

Hello all,

I have been working on an implementation for the PKA engine on the CC310, and I have been encountering unexpected behaviour when using the modular multiplication and modular exponentiation operations.

For context, I am doing the implementation in Rust, and the code that executes an operation looks as follows:

fn execute_operation(
    cc_pka: &pac::CcPka,
    opcode: cc_pka::opcode::Opcode,
    result_reg: u8,
    operand_a_reg: u8,
    operand_a_ctrl: u8,
    operand_b_reg: u8,
    operand_b_ctrl: u8,
    operand_size_idx: u32,
) {
    // Issue the requested operation
    cc_pka.opcode().write(|w| unsafe {
        w.bits(
            ((result_reg as u32) << REG_R_POS)
                | ((operand_b_reg as u32) << REG_B_POS)
                | ((operand_b_ctrl as u32) << REG_B_CTRL_POS)
                | ((operand_a_reg as u32) << REG_A_POS)
                | ((operand_a_ctrl as u32) << REG_A_CTRL_POS)
                | (operand_size_idx << LEN_POS)
                | ((opcode as u32) << OPCODE_POS),
        )
    });

    // Busy-wait until the engine reports completion
    while cc_pka.pka_done().read().bits() == 0 {}

    // We enforce an additional reduction of the result register
    cc_pka.opcode().write(|w| unsafe {
        w.bits(
            ((result_reg as u32) << REG_R_POS)
                | (0u32 << REG_B_POS)
                | (0u32 << REG_B_CTRL_POS)
                | ((result_reg as u32) << REG_A_POS)
                | (0u32 << REG_A_CTRL_POS)
                | (1 << LEN_POS)
                | ((cc_pka::opcode::Opcode::Reduction as u32) << OPCODE_POS),
        )
    });

    while cc_pka.pka_done().read().bits() == 0 {}
}

I have to enforce an additional reduction, since otherwise the result is not correct. I am guessing this has to do with the computation of NP, which I am not sure how to calculate, and I have no test vectors to compare against. I have also found several different methods to compute it; I am currently following line 561 of https://github.com/ARM-software/cryptocell-312-runtime/blob/update-cc110-bu-00000-r1p4/codesafe/src/crypto_api/pki/common/pka.c#L561.

One of the main problems I cannot explain is the following: depending on which register I choose to store the result of an operation in, I read back either the correct value or a seemingly random value. Take the following example:

const N: [u32; 8] = [
    0xFFFFFFFF, 0x00000001, 0x00000000, 0x00000000,
    0x00000000, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
];

const TEST_A: [u32; 8] = [
    0xFFFFFFFF, 0x00000001, 0x00000000, 0x00000000, 
    0x00000000, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFE
];

const TEST_B: [u32; 8] = [
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x02
];

I perform the modular multiplication of TEST_A * TEST_B mod N.
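Since I have no hardware test vectors, a host-side reference may be useful for comparison. The following is a minimal sketch in plain Rust, not the algorithm the PKA uses: it computes a * b mod n by double-and-add on eight-word values stored most significant word first, as in the arrays above. The names U256, add, sub, add_mod and mul_mod are illustrative, and the sketch assumes a < n. For these inputs TEST_A = N - 1 and TEST_B = 2, so the product reduces to N - 2.

```rust
/// 256-bit value as eight 32-bit words, most significant word first
/// (the same order as the N, TEST_A and TEST_B arrays in this post).
type U256 = [u32; 8];

const N: U256 = [
    0xFFFFFFFF, 0x00000001, 0x00000000, 0x00000000,
    0x00000000, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
];
const TEST_A: U256 = [
    0xFFFFFFFF, 0x00000001, 0x00000000, 0x00000000,
    0x00000000, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFE,
];
const TEST_B: U256 = [0, 0, 0, 0, 0, 0, 0, 2];

/// Returns a + b modulo 2^256 together with the carry out.
fn add(a: &U256, b: &U256) -> (U256, bool) {
    let mut out = [0u32; 8];
    let mut carry = 0u64;
    for i in (0..8).rev() {
        let sum = a[i] as u64 + b[i] as u64 + carry;
        out[i] = sum as u32; // keep the low 32 bits
        carry = sum >> 32;
    }
    (out, carry != 0)
}

/// Returns a - b modulo 2^256 (wrapping subtraction).
fn sub(a: &U256, b: &U256) -> U256 {
    let mut out = [0u32; 8];
    let mut borrow = 0i64;
    for i in (0..8).rev() {
        let diff = a[i] as i64 - b[i] as i64 - borrow;
        out[i] = diff as u32; // two's-complement truncation
        borrow = (diff < 0) as i64;
    }
    out
}

/// Returns (a + b) mod n, assuming a < n and b < n.
fn add_mod(a: &U256, b: &U256, n: &U256) -> U256 {
    let (sum, carry) = add(a, b);
    // Arrays compare lexicographically, which matches numeric order
    // because the most significant word comes first.
    if carry || sum >= *n { sub(&sum, n) } else { sum }
}

/// Returns (a * b) mod n by double-and-add, assuming a < n.
fn mul_mod(a: &U256, b: &U256, n: &U256) -> U256 {
    let mut acc = [0u32; 8];
    for &word in b.iter() {
        for bit in (0..32u32).rev() {
            acc = add_mod(&acc, &acc, n); // acc = 2 * acc mod n
            if (word >> bit) & 1 == 1 {
                acc = add_mod(&acc, a, n); // acc = acc + a mod n
            }
        }
    }
    acc
}
```

Calling mul_mod(&TEST_A, &TEST_B, &N) gives a software value that the words read back from the result register can be compared against, once the word order of the readout is taken into account.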

When I store this result in register 6, the value reads back correctly:

Verification of R6: [0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xFFFFFFFF, 0x1, 0x0, 0x0, 0x0, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFF5]

But if I select R7 as the result register, I obtain:

Verification of R7: [0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0]

This is how I define the registers of the virtual memory. VIRTUAL_MEMORY_OFFSET is defined as per the specification and leaves 64 words between registers:

const VIRTUAL_MEMORY_SIZE_BITS: usize = 64 * 4 * 8; // 64 words * 4 bytes * 8 bits per register
const VIRTUAL_MEMORY_OFFSET: u32 = (VIRTUAL_MEMORY_SIZE_BITS as u32)/8/4;

fn configure_memory_map(cc_pka: &pac::CcPka) {
    // Map virtual registers
    for i in 0..13 {
        cc_pka.memory_map(i).write(|w| unsafe { 
            w.bits(i as u32 * VIRTUAL_MEMORY_OFFSET) 
        });
    }
    cc_pka.memory_map(30).write(|w| unsafe {
        w.bits(7 as u32 * VIRTUAL_MEMORY_OFFSET)
    });
    cc_pka.memory_map(31).write(|w| unsafe {
        w.bits(8 as u32 * VIRTUAL_MEMORY_OFFSET)
    });
}
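One thing that can be checked on the host is whether two virtual registers end up at the same SRAM offset. The sketch below does not touch the hardware; it only mirrors the offsets written by configure_memory_map and lists any registers that share one. The names memory_map and aliased_registers are illustrative, not part of the PAC.

```rust
/// Words between consecutive virtual registers, as in VIRTUAL_MEMORY_OFFSET.
const VIRTUAL_MEMORY_OFFSET: u32 = 64;

/// Host-side mirror of configure_memory_map:
/// (virtual register index, SRAM word offset) pairs.
fn memory_map() -> Vec<(u8, u32)> {
    let mut map: Vec<(u8, u32)> = (0..13u8)
        .map(|i| (i, i as u32 * VIRTUAL_MEMORY_OFFSET))
        .collect();
    map.push((30, 7 * VIRTUAL_MEMORY_OFFSET));
    map.push((31, 8 * VIRTUAL_MEMORY_OFFSET));
    map
}

/// Returns every pair of virtual registers mapped to the same SRAM offset.
fn aliased_registers(map: &[(u8, u32)]) -> Vec<(u8, u8)> {
    let mut pairs = Vec::new();
    for (i, &(reg_a, off_a)) in map.iter().enumerate() {
        for &(reg_b, off_b) in &map[i + 1..] {
            if off_a == off_b {
                pairs.push((reg_a, reg_b));
            }
        }
    }
    pairs
}
```

With the mapping above, aliased_registers(&memory_map()) reports that registers 30 and 31 share their offsets with registers 7 and 8. Whether that matters depends on how the engine uses registers 30 and 31 during an operation, which I have not been able to confirm.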

The main code looks as follows:

#[entry]
fn main() -> ! {
    info!("Running.");

    // Enable the PKA and CryptoCell clock
    let p = pac::Peripherals::take().unwrap();
    let cc_misc = p.cc_misc;
    let cc_pka = p.cc_pka;

    p.cryptocell.enable().write(|w| w.enable().set_bit());
    cc_misc.pka_clk().write(|w| w.enable().set_bit());

    while cc_misc.clk_status().read().pka_clk().bit_is_clear() {
        // Wait for PKA clock to be ready
    }
    // Reset PKA
    cc_pka.pka_sw_reset();

    info!("PKA clock ready. PKA engine enabled");
    // Maximum operand size
    cc_pka.pka_l(0).write(|w| unsafe { w.bits(MAX_OPERAND_SIZE_BITS as u32) });
    // Operand size
    cc_pka.pka_l(1).write(|w| unsafe { w.bits(OPERAND_SIZE_BITS as u32) }); 
    // NP operand size 
    cc_pka.pka_l(2).write(|w| unsafe { w.bits(DOUBLE_OPERAND_SIZE_BITS as u32) }); 

    // Configure memory map
    configure_memory_map(&cc_pka);

    // Clear registers
    clear_pka_registers(&cc_pka);

    // Load N
    load_word_array(&cc_pka, 0, &N);   
    load_word_array(&cc_pka, 1, &NP);


    // Load data to compute operations
    load_word_array(&cc_pka, 4, &TEST_A);
    load_word_array(&cc_pka, 5, &TEST_B);


    let mut buffer = [0u32; 2*OPERAND_SIZE_WORDS];

    // example operation
    execute_operation(&cc_pka, cc_pka::opcode::Opcode::ModMul, 7, 4, 0, 5, 0, 1);
    
    
    cc_pka.pka_sram_wclear();
    read_word_array(&cc_pka, 6, &mut buffer);
    read_word_array(&cc_pka, 7, &mut buffer);


    // exit via semihosting call
    debug::exit(EXIT_SUCCESS);
    loop {}
}

This is how I load values into the memory. I follow a little-endian convention, so for the examples I provided the last element of each array (the least significant word) is loaded first:

fn load_word_array(cc_pka: &pac::CcPka, reg: usize, data: &[u32]) {
    cc_pka.pka_sram_waddr().write(|w| unsafe { 
        w.bits(cc_pka.memory_map(reg).read().bits()) 
    });
    
    // Load data in reverse order (little endian: least significant word goes first)
    for i in 0..data.len() {
        let reverse_index = data.len() - 1 - i;
        cc_pka.pka_sram_wdata().write(|w| unsafe { 
            w.bits(data[reverse_index]) 
        });
    }
    // Add padding zeros
    for _ in 0..8 {
        cc_pka.pka_sram_wdata().write(|w| unsafe { w.bits(0x00) });
    }
}
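To make the resulting word order explicit, the sequence that load_word_array writes to the data register can be reproduced on the host. This is a minimal sketch that only models the order of writes (reversed data followed by eight zero words); sram_write_sequence and PADDING_WORDS are illustrative names.

```rust
/// Number of zero words appended after the data, as in load_word_array.
const PADDING_WORDS: usize = 8;

/// Returns the words in the order load_word_array writes them:
/// the input reversed (least significant word first), then zero padding.
fn sram_write_sequence(data: &[u32]) -> Vec<u32> {
    data.iter()
        .rev()
        .copied()
        .chain(std::iter::repeat(0).take(PADDING_WORDS))
        .collect()
}
```

For TEST_B, for instance, the first word written is 0x02, followed by fifteen zero words, so each register load occupies 16 of the 64 words reserved per register.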

I am not sure if the problem has to do with the virtual memory configuration, or some other issue, but I cannot find any documentation regarding this. Any help would be very much appreciated.

Cheers,

Elsa
