Issue with PKA Engine implementation in Cryptocell 310. Modular multiplication not being performed properly

Hello all,

I have been working on the implementation of the PKA engine on CC310, and have been encountering some unexpected behaviours when using the modular multiplication and modular exponentiation.

For context, I am doing the implementation in Rust, and the code that executes the operation looks as follows:

fn execute_operation(
    cc_pka: &pac::CcPka,
    opcode: cc_pka::opcode::Opcode,
    result_reg: u8,
    operand_a_reg: u8,
    operand_a_ctrl: u8,
    operand_b_reg: u8,
    operand_b_ctrl: u8,
    operand_size_idx: u32,
) {
    // Issue the requested operation
    cc_pka.opcode().write(|w| unsafe {
        w.bits(
            ((result_reg as u32) << REG_R_POS)
                | ((operand_b_reg as u32) << REG_B_POS)
                | ((operand_b_ctrl as u32) << REG_B_CTRL_POS)
                | ((operand_a_reg as u32) << REG_A_POS)
                | ((operand_a_ctrl as u32) << REG_A_CTRL_POS)
                | (operand_size_idx << LEN_POS)
                | ((opcode as u32) << OPCODE_POS),
        )
    });

    // Busy-wait until the engine reports completion
    while cc_pka.pka_done().read().bits() == 0 {}

    // We enforce an additional reduction
    cc_pka.opcode().write(|w| unsafe {
        w.bits(
            ((result_reg as u32) << REG_R_POS)
                | ((0 as u32) << REG_B_POS)
                | ((0 as u32) << REG_B_CTRL_POS)
                | ((result_reg as u32) << REG_A_POS)
                | ((0 as u32) << REG_A_CTRL_POS)
                | (1 << LEN_POS)
                | ((cc_pka::opcode::Opcode::Reduction as u32) << OPCODE_POS),
        )
    });

    while cc_pka.pka_done().read().bits() == 0 {}
}
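
For readers who want to check the packed opcode word off-target, here is a minimal sketch in plain Rust with no PAC dependency. The bit positions and field widths are assumptions chosen for illustration (this post does not give them), and `pack_opcode` is a hypothetical helper, not part of any crate. Each field is masked so that an out-of-range value cannot spill into its neighbour.

```rust
// ASSUMED field layout, for illustration only. Check the CC310 register
// description or the PAC for the real positions and widths.
const REG_R_POS: u32 = 6;
const REG_B_POS: u32 = 12;
const REG_B_CTRL_POS: u32 = 17;
const REG_A_POS: u32 = 18;
const REG_A_CTRL_POS: u32 = 23;
const LEN_POS: u32 = 24;
const OPCODE_POS: u32 = 27;

/// Hypothetical helper: packs the individual fields into one 32-bit word.
fn pack_opcode(
    opcode: u32,
    len_idx: u32,
    a_reg: u8,
    a_ctrl: u8,
    b_reg: u8,
    b_ctrl: u8,
    r_reg: u8,
) -> u32 {
    (((r_reg as u32) & 0x1F) << REG_R_POS)
        | (((b_reg as u32) & 0x1F) << REG_B_POS)
        | (((b_ctrl as u32) & 0x1) << REG_B_CTRL_POS)
        | (((a_reg as u32) & 0x1F) << REG_A_POS)
        | (((a_ctrl as u32) & 0x1) << REG_A_CTRL_POS)
        | ((len_idx & 0x7) << LEN_POS)
        | ((opcode & 0x1F) << OPCODE_POS)
}

fn main() {
    // 0x11 is an arbitrary placeholder opcode value, not a documented one.
    let word = pack_opcode(0x11, 1, 4, 0, 5, 0, 7);
    println!("{:#010x}", word);
}
```

Printing the word before writing it to the register makes it easy to confirm that the result register index lands in the field you expect.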

I have to enforce an additional reduction because the result is otherwise incorrect. I suspect this is related to the computation of NP: I am not sure how to calculate it, and I have no test vectors to compare against. I have also found several different methods for computing it; I am currently following the one at https://github.com/ARM-software/cryptocell-312-runtime/blob/update-cc110-bu-00000-r1p4/codesafe/src/crypto_api/pki/common/pka.c#L561 (line 561).

One of the main problems I cannot explain is the following: depending on which register I choose to store the result of an operation in, I read back either a correct value or a seemingly random one. Consider the following example:

const N: [u32; 8] = [
    0xFFFFFFFF, 0x00000001, 0x00000000, 0x00000000,
    0x00000000, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
];

const TEST_A: [u32; 8] = [
    0xFFFFFFFF, 0x00000001, 0x00000000, 0x00000000, 
    0x00000000, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFE
];

const TEST_B: [u32; 8] = [
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x02
];

I perform the modular multiplication of TEST_A * TEST_B mod N.
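
In the absence of test vectors, the expected product can be cross-checked in software. The following is a minimal sketch, not the engine's algorithm: it uses plain double-and-add with a conditional subtraction and assumes the arrays are stored most significant word first, as in the constants above. The names `U256`, `add`, `sub`, `add_mod` and `mul_mod` are hypothetical helpers.

```rust
type U256 = [u32; 8]; // most significant word first, as in the post

const N: U256 = [
    0xFFFFFFFF, 0x00000001, 0x00000000, 0x00000000,
    0x00000000, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
];
const TEST_A: U256 = [
    0xFFFFFFFF, 0x00000001, 0x00000000, 0x00000000,
    0x00000000, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFE,
];
const TEST_B: U256 = [0, 0, 0, 0, 0, 0, 0, 2];

/// a + b modulo 2^256, plus the carry out of the top word.
fn add(a: &U256, b: &U256) -> (U256, bool) {
    let mut out = [0u32; 8];
    let mut carry = 0u64;
    for i in (0..8).rev() {
        let s = a[i] as u64 + b[i] as u64 + carry;
        out[i] = s as u32;
        carry = s >> 32;
    }
    (out, carry != 0)
}

/// a - b modulo 2^256, plus the borrow out of the top word.
fn sub(a: &U256, b: &U256) -> (U256, bool) {
    let mut out = [0u32; 8];
    let mut borrow = 0i64;
    for i in (0..8).rev() {
        let d = a[i] as i64 - b[i] as i64 - borrow;
        out[i] = d as u32; // keeps the low 32 bits
        borrow = (d < 0) as i64;
    }
    (out, borrow != 0)
}

/// (a + b) mod n, assuming a < n and b < n.
fn add_mod(a: &U256, b: &U256, n: &U256) -> U256 {
    let (sum, carry) = add(a, b);
    let (diff, borrow) = sub(&sum, n);
    // Subtract n when the sum overflowed 256 bits or is at least n.
    if carry || !borrow { diff } else { sum }
}

/// (a * b) mod n by double-and-add over the bits of b, assuming a < n.
fn mul_mod(a: &U256, b: &U256, n: &U256) -> U256 {
    let mut acc = [0u32; 8];
    for &word in b.iter() {
        for bit in (0..32u32).rev() {
            acc = add_mod(&acc, &acc, n);
            if (word >> bit) & 1 == 1 {
                acc = add_mod(&acc, a, n);
            }
        }
    }
    acc
}

fn main() {
    let r = mul_mod(&TEST_A, &TEST_B, &N);
    println!("{:08X?}", r);
}
```

With these vectors TEST_A equals N - 1, so the reference product is N - 2, which gives a simple value to compare the register contents against.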

When I store this result in register 6, the value reads correctly:

Verification of R6: [0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xFFFFFFFF, 0x1, 0x0, 0x0, 0x0, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFF5]

But if I select R7 as the result register, I obtain:

Verification of R7: [0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0]

I define the registers of the virtual memory as follows, where VIRTUAL_MEMORY_OFFSET is defined as per the specification, allowing for 64 words between registers:

const VIRTUAL_MEMORY_SIZE_BITS: usize = 64 * 4 * 8; // 64 words x 4 bytes x 8 bits per register
const VIRTUAL_MEMORY_OFFSET: u32 = (VIRTUAL_MEMORY_SIZE_BITS as u32)/8/4;

fn configure_memory_map(cc_pka: &pac::CcPka) {
    // Map virtual registers
    for i in 0..13 {
        cc_pka.memory_map(i).write(|w| unsafe { 
            w.bits(i as u32 * VIRTUAL_MEMORY_OFFSET) 
        });
    }
    cc_pka.memory_map(30).write(|w| unsafe { 
            w.bits(7 as u32 * VIRTUAL_MEMORY_OFFSET) 
        });
    cc_pka.memory_map(31).write(|w| unsafe { 
        w.bits(8 as u32 * VIRTUAL_MEMORY_OFFSET) 
    });
}
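
One detail of the map above that may be worth checking: virtual registers 30 and 31 are pointed at the same offsets as registers 7 and 8. The sketch below mirrors the mapping in plain Rust and lists every pair of registers that share an offset. Whether this aliasing explains the R7 behaviour is an assumption, not something confirmed in this thread; it would matter if the engine uses registers 30 and 31 as temporaries, as the defaults in the Arm reference driver suggest. `memory_map` and `aliased_pairs` are hypothetical helpers.

```rust
const WORDS_PER_REGISTER: u32 = 64; // same value as VIRTUAL_MEMORY_OFFSET

/// Rebuilds the (virtual register, SRAM word offset) pairs from the post.
fn memory_map() -> Vec<(usize, u32)> {
    let mut map: Vec<(usize, u32)> = (0..13usize)
        .map(|i| (i, i as u32 * WORDS_PER_REGISTER))
        .collect();
    map.push((30, 7 * WORDS_PER_REGISTER));
    map.push((31, 8 * WORDS_PER_REGISTER));
    map
}

/// Returns every pair of registers mapped to the same offset.
fn aliased_pairs(map: &[(usize, u32)]) -> Vec<(usize, usize)> {
    let mut pairs = Vec::new();
    for (i, &(reg_a, off_a)) in map.iter().enumerate() {
        for &(reg_b, off_b) in &map[i + 1..] {
            if off_a == off_b {
                pairs.push((reg_a, reg_b));
            }
        }
    }
    pairs
}

fn main() {
    for (a, b) in aliased_pairs(&memory_map()) {
        println!("R{} and R{} share the same SRAM offset", a, b);
    }
}
```

If the aliasing is unintended, mapping registers 30 and 31 to offsets not used by any other register (for example 13 and 14 times the register stride) would be one way to rule it out.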

The main code looks as follows:

#[entry]
fn main() -> ! {
    info!("Running.");

    // Enable the PKA and CryptoCell clock
    let p = pac::Peripherals::take().unwrap();
    let cc_misc = p.cc_misc;
    let cc_pka = p.cc_pka;

    p.cryptocell.enable().write(|w| w.enable().set_bit());
    cc_misc.pka_clk().write(|w| w.enable().set_bit());

    while cc_misc.clk_status().read().pka_clk().bit_is_clear() {
        // Wait for PKA clock to be ready
    }
    // Reset PKA
    cc_pka.pka_sw_reset();

    info!("PKA clock ready. PKA engine enabled");
    // Max operand size
    cc_pka.pka_l(0).write(|w| unsafe { w.bits(MAX_OPERAND_SIZE_BITS as u32) });
    // Operand size
    cc_pka.pka_l(1).write(|w| unsafe { w.bits(OPERAND_SIZE_BITS as u32) }); 
    // NP operand size 
    cc_pka.pka_l(2).write(|w| unsafe { w.bits(DOUBLE_OPERAND_SIZE_BITS as u32) }); 

    // Configure memory map
    configure_memory_map(&cc_pka);

    // Clear registers
    clear_pka_registers(&cc_pka);

    // Load N
    load_word_array(&cc_pka, 0, &N);   
    load_word_array(&cc_pka, 1, &NP);


    // Load data to compute operations
    load_word_array(&cc_pka, 4, &TEST_A);
    load_word_array(&cc_pka, 5, &TEST_B);


    let mut buffer = [0u32; 2*OPERAND_SIZE_WORDS];

    // example operation
    execute_operation(&cc_pka, cc_pka::opcode::Opcode::ModMul, 7, 4, 0, 5, 0, 1);
    
    
    cc_pka.pka_sram_wclear();
    read_word_array(&cc_pka, 6, &mut buffer);
    read_word_array(&cc_pka, 7, &mut buffer);


    // exit via semihosting call
    debug::exit(EXIT_SUCCESS);
    loop {}
}

This is how I load values into the memory, following a little-endian convention (so, for the examples I provided, the last element of the array is loaded first):

fn load_word_array(cc_pka: &pac::CcPka, reg: usize, data: &[u32]) {
    cc_pka.pka_sram_waddr().write(|w| unsafe { 
        w.bits(cc_pka.memory_map(reg).read().bits()) 
    });
    
    // Load data in reverse order (little endian: least significant word goes first)
    for i in 0..data.len() {
        let reverse_index = data.len() - 1 - i;
        cc_pka.pka_sram_wdata().write(|w| unsafe { 
            w.bits(data[reverse_index]) 
        });
    }
    // Add padding zeros
    for _ in 0..8 {
        cc_pka.pka_sram_wdata().write(|w| unsafe { w.bits(0x00) });
    }
}
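
To make the resulting SRAM layout explicit, here is a small host-side sketch of the word sequence that the loader above writes for a given input. It only models the ordering and padding; `sram_write_sequence` is a hypothetical helper, and the padding count of 8 is taken from the loop above.

```rust
/// Returns the words in the order they are written to the SRAM data port:
/// the input reversed (least significant word first), then zero padding.
fn sram_write_sequence(data: &[u32], padding_words: usize) -> Vec<u32> {
    let mut seq: Vec<u32> = data.iter().rev().copied().collect();
    seq.extend(std::iter::repeat(0u32).take(padding_words));
    seq
}

fn main() {
    const TEST_B: [u32; 8] = [0, 0, 0, 0, 0, 0, 0, 2];
    let seq = sram_write_sequence(&TEST_B, 8);
    println!("{:#X?}", seq);
}
```

Comparing this sequence with what is read back from a register is a quick way to confirm that the read routine applies the same word order.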

I am not sure if the problem has to do with the virtual memory configuration, or some other issue, but I cannot find any documentation regarding this. Any help would be very much appreciated.

Cheers,

Elsa

  • Hi,

    One of the main problems I can not explain is the following: when storing the result of an operation, depending on which register I chose to store it in, I have a correct value or a random value for the output.

    (...)

    But if I select R7 as result register, I obtain: Verification of R7: [0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0].

    It looks like LLVM (and thereby Rust) uses r7 for the frame pointer on ARM targets. Maybe that explains the oddities when using r7 as the result register? Here is an ARM miscompilation issue related to r7, from a few years back, in the main Rust project, which may or may not be related: "Issues/miscompilation around ARM T32 frame pointer with new asm syntax" #73450

    Regards,
    Terje
