Hello all,
I have been working on the implementation of the PKA engine on CC310, and have been encountering some unexpected behaviours when using the modular multiplication and modular exponentiation.
For context, I am doing the implementation in Rust, and the code that executes the operation looks as follows:
fn execute_operation(
    cc_pka: &pac::CcPka,
    opcode: cc_pka::opcode::Opcode,
    result_reg: u8,
    operand_a_reg: u8,
    operand_a_ctrl: u8,
    operand_b_reg: u8,
    operand_b_ctrl: u8,
    operand_size_idx: u32,
) {
    cc_pka.opcode().write(|w| unsafe {
        w.bits(
            ((result_reg as u32) << REG_R_POS)
                | ((operand_b_reg as u32) << REG_B_POS)
                | ((operand_b_ctrl as u32) << REG_B_CTRL_POS)
                | ((operand_a_reg as u32) << REG_A_POS)
                | ((operand_a_ctrl as u32) << REG_A_CTRL_POS)
                | (operand_size_idx << LEN_POS)
                | ((opcode as u32) << OPCODE_POS),
        )
    });
    // Busy-wait until the engine reports completion
    while cc_pka.pka_done().read().bits() == 0 {}

    // We enforce an additional reduction
    cc_pka.opcode().write(|w| unsafe {
        w.bits(
            ((result_reg as u32) << REG_R_POS)
                | ((0 as u32) << REG_B_POS)
                | ((0 as u32) << REG_B_CTRL_POS)
                | ((result_reg as u32) << REG_A_POS)
                | ((0 as u32) << REG_A_CTRL_POS)
                | (1 << LEN_POS)
                | ((cc_pka::opcode::Opcode::Reduction as u32) << OPCODE_POS),
        )
    });
    while cc_pka.pka_done().read().bits() == 0 {}
}
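Both register writes above build the same bit layout, so the packing could be factored into a pure function that can be checked off-target. The following is only a minimal sketch: the shift values are assumptions for illustration (the `REG_R_POS`, `LEN_POS`, etc. constants are not listed in this post), the opcode value used in `main` is a placeholder, and `pack_opcode` is a hypothetical name.

```rust
// Assumed field positions, for illustration only.
// Replace them with the constants from the actual register definition.
const REG_R_POS: u32 = 6;
const REG_B_POS: u32 = 12;
const REG_B_CTRL_POS: u32 = 17;
const REG_A_POS: u32 = 18;
const REG_A_CTRL_POS: u32 = 23;
const LEN_POS: u32 = 24;
const OPCODE_POS: u32 = 27;

/// Hypothetical helper: packs the opcode fields into one 32-bit word,
/// using the same OR-of-shifts expression as `execute_operation`.
fn pack_opcode(
    opcode: u32,
    result_reg: u8,
    operand_a_reg: u8,
    operand_a_ctrl: u8,
    operand_b_reg: u8,
    operand_b_ctrl: u8,
    operand_size_idx: u32,
) -> u32 {
    ((result_reg as u32) << REG_R_POS)
        | ((operand_b_reg as u32) << REG_B_POS)
        | ((operand_b_ctrl as u32) << REG_B_CTRL_POS)
        | ((operand_a_reg as u32) << REG_A_POS)
        | ((operand_a_ctrl as u32) << REG_A_CTRL_POS)
        | (operand_size_idx << LEN_POS)
        | (opcode << OPCODE_POS)
}

fn main() {
    // 0x11 is a placeholder opcode value, not taken from any documentation.
    let word = pack_opcode(0x11, 7, 4, 0, 5, 0, 1);
    println!("{word:#010x}");
}
```

Printing the packed word before writing it makes it easier to compare against the register description field by field.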
I have to enforce an additional reduction, since otherwise the result is not correct. I am guessing this has to do with the computation of NP: I am not sure how to calculate it, and I have no test vectors to compare against. I have also found several different methods to compute it (I am currently following line 561 of https://github.com/ARM-software/cryptocell-312-runtime/blob/update-cc110-bu-00000-r1p4/codesafe/src/crypto_api/pki/common/pka.c#L561).
One of the main problems I cannot explain is the following: depending on which register I choose to store the result of an operation in, I read back either a correct value or an unexpected one. Take the following example:
const N: [u32; 8] = [
    0xFFFFFFFF, 0x00000001, 0x00000000, 0x00000000,
    0x00000000, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
];
const TEST_A: [u32; 8] = [
    0xFFFFFFFF, 0x00000001, 0x00000000, 0x00000000,
    0x00000000, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFE,
];
const TEST_B: [u32; 8] = [
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x02,
];
I perform the modular multiplication of TEST_A * TEST_B mod N.
When I store this result in register 6, the value reads correctly:
Verification of R6: [0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xFFFFFFFF, 0x1, 0x0, 0x0, 0x0, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFF5]
But if I select R7 as the result register, I obtain:
Verification of R7: [0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0]
I define the registers of the virtual memory as follows, where VIRTUAL_MEMORY_OFFSET is defined as per the specification, allowing for 64 words between registers:
// Size of one virtual register in bits: 64 words of 4 bytes each
const VIRTUAL_MEMORY_SIZE_BITS: usize = 64 * 4 * 8;
// Distance between consecutive registers, in 32-bit words (64)
const VIRTUAL_MEMORY_OFFSET: u32 = (VIRTUAL_MEMORY_SIZE_BITS as u32) / 8 / 4;

fn configure_memory_map(cc_pka: &pac::CcPka) {
    // Map virtual registers
    for i in 0..13 {
        cc_pka
            .memory_map(i)
            .write(|w| unsafe { w.bits(i as u32 * VIRTUAL_MEMORY_OFFSET) });
    }
    cc_pka
        .memory_map(30)
        .write(|w| unsafe { w.bits(7 as u32 * VIRTUAL_MEMORY_OFFSET) });
    cc_pka
        .memory_map(31)
        .write(|w| unsafe { w.bits(8 as u32 * VIRTUAL_MEMORY_OFFSET) });
}
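As a host-side sanity check of the map above, here is a minimal sketch in plain Rust (no hardware access) that lists which virtual registers end up at the same SRAM offset. `memory_map_entries` mirrors the writes in `configure_memory_map`; both function names are hypothetical, and the offset of 64 words is the value stated in the post.

```rust
use std::collections::BTreeMap;

// 64 words between consecutive registers, as in the post.
const VIRTUAL_MEMORY_OFFSET: u32 = 64;

/// Mirrors the writes in `configure_memory_map`:
/// each entry is (virtual register index, SRAM offset).
fn memory_map_entries() -> Vec<(u32, u32)> {
    let mut entries: Vec<(u32, u32)> = (0..13)
        .map(|i| (i, i * VIRTUAL_MEMORY_OFFSET))
        .collect();
    entries.push((30, 7 * VIRTUAL_MEMORY_OFFSET));
    entries.push((31, 8 * VIRTUAL_MEMORY_OFFSET));
    entries
}

/// Hypothetical helper: groups registers by offset and keeps only
/// the offsets that are used by more than one register.
fn aliased_registers(entries: &[(u32, u32)]) -> Vec<(u32, Vec<u32>)> {
    let mut by_offset: BTreeMap<u32, Vec<u32>> = BTreeMap::new();
    for &(reg, offset) in entries {
        by_offset.entry(offset).or_default().push(reg);
    }
    by_offset
        .into_iter()
        .filter(|(_, regs)| regs.len() > 1)
        .collect()
}

fn main() {
    for (offset, regs) in aliased_registers(&memory_map_entries()) {
        println!("offset {offset}: registers {regs:?}");
    }
}
```

With this map, registers 7 and 30 share one offset, and registers 8 and 31 share another. Whether that overlap is related to the R7 readout is not established here.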
The main code looks as follows:
#[entry]
fn main() -> ! {
    info!("Running.");

    // Enable the PKA and CryptoCell clock
    let p = pac::Peripherals::take().unwrap();
    let cc_misc = p.cc_misc;
    let cc_pka = p.cc_pka;
    p.cryptocell.enable().write(|w| w.enable().set_bit());
    cc_misc.pka_clk().write(|w| w.enable().set_bit());
    while cc_misc.clk_status().read().pka_clk().bit_is_clear() {
        // Wait for PKA clock to be ready
    }

    // Reset PKA
    cc_pka.pka_sw_reset();
    info!("PKA clock ready. PKA engine enabled");

    // Max operand size
    cc_pka.pka_l(0).write(|w| unsafe { w.bits(MAX_OPERAND_SIZE_BITS as u32) });
    // Operand size
    cc_pka.pka_l(1).write(|w| unsafe { w.bits(OPERAND_SIZE_BITS as u32) });
    // NP operand size
    cc_pka.pka_l(2).write(|w| unsafe { w.bits(DOUBLE_OPERAND_SIZE_BITS as u32) });

    // Configure memory map
    configure_memory_map(&cc_pka);
    // Clear registers
    clear_pka_registers(&cc_pka);

    // Load N and NP
    load_word_array(&cc_pka, 0, &N);
    load_word_array(&cc_pka, 1, &NP);
    // Load data to compute operations
    load_word_array(&cc_pka, 4, &TEST_A);
    load_word_array(&cc_pka, 5, &TEST_B);

    let mut buffer = [0u32; 2 * OPERAND_SIZE_WORDS];

    // Example operation
    execute_operation(&cc_pka, cc_pka::opcode::Opcode::ModMul, 7, 4, 0, 5, 0, 1);
    cc_pka.pka_sram_wclear();
    read_word_array(&cc_pka, 6, &mut buffer);
    read_word_array(&cc_pka, 7, &mut buffer);

    // Exit via semihosting call
    debug::exit(EXIT_SUCCESS);
    loop {}
}
This is how I load values into memory, using a little-endian word order (the least significant word is written first, so for the examples I provided the last element of each array is loaded first):
fn load_word_array(cc_pka: &pac::CcPka, reg: usize, data: &[u32]) {
    // Point the SRAM write address at the start of the selected virtual register
    cc_pka
        .pka_sram_waddr()
        .write(|w| unsafe { w.bits(cc_pka.memory_map(reg).read().bits()) });

    // Load data in reverse order (little endian: least significant word goes first)
    for i in 0..data.len() {
        let reverse_index = data.len() - 1 - i;
        cc_pka
            .pka_sram_wdata()
            .write(|w| unsafe { w.bits(data[reverse_index]) });
    }

    // Add padding zeros
    for _ in 0..8 {
        cc_pka.pka_sram_wdata().write(|w| unsafe { w.bits(0x00) });
    }
}
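To make the write order concrete, here is a minimal host-side sketch that returns the words in the order `load_word_array` writes them to `pka_sram_wdata`: the input reversed, followed by 8 zero words. `write_sequence` is a hypothetical name and touches no hardware.

```rust
/// Hypothetical host-side mirror of `load_word_array`:
/// returns the words in the order they would be written to SRAM.
fn write_sequence(data: &[u32]) -> Vec<u32> {
    // Reverse the array so the least significant word comes first.
    let mut words: Vec<u32> = data.iter().rev().copied().collect();
    // Append the 8 padding zeros that the loader always adds.
    words.extend(std::iter::repeat(0u32).take(8));
    words
}

fn main() {
    const TEST_B: [u32; 8] = [0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x02];
    let sequence = write_sequence(&TEST_B);
    println!("{sequence:#x?}");
}
```

For `TEST_B`, the first word written is `0x2` and the remaining 15 words are zero, so one load covers 16 of the 64 words reserved per register.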
I am not sure if the problem has to do with the virtual memory configuration, or some other issue, but I cannot find any documentation regarding this. Any help would be very much appreciated.
Cheers,
Elsa