Hello all,
I have been working on an implementation for the PKA engine of the CC310, and I have been encountering some unexpected behaviour when using modular multiplication and modular exponentiation.
For context, I am writing the implementation in Rust, and the function that executes an operation looks as follows:
fn execute_operation(cc_pka: &pac::CcPka, opcode: cc_pka::opcode::Opcode,
                     result_reg: u8, operand_a_reg: u8, operand_a_ctrl: u8,
                     operand_b_reg: u8, operand_b_ctrl: u8, operand_size_idx: u32) {
    // Issue R = A <op> B; the LEN field selects which PKA_L register holds the operand size
    cc_pka.opcode().write(|w| unsafe {
        w.bits(
            ((result_reg as u32) << REG_R_POS)
                | ((operand_b_reg as u32) << REG_B_POS)
                | ((operand_b_ctrl as u32) << REG_B_CTRL_POS)
                | ((operand_a_reg as u32) << REG_A_POS)
                | ((operand_a_ctrl as u32) << REG_A_CTRL_POS)
                | (operand_size_idx << LEN_POS)
                | ((opcode as u32) << OPCODE_POS)
        )
    });
    // Wait until the engine reports completion
    while cc_pka.pka_done().read().bits() == 0 {}

    // We enforce an additional reduction of the result register in place
    // (A = R, B field = R0, both control bits clear, size index 1)
    cc_pka.opcode().write(|w| unsafe {
        w.bits(
            ((result_reg as u32) << REG_R_POS)
                | (0u32 << REG_B_POS)
                | (0u32 << REG_B_CTRL_POS)
                | ((result_reg as u32) << REG_A_POS)
                | (0u32 << REG_A_CTRL_POS)
                | (1 << LEN_POS)
                | ((cc_pka::opcode::Opcode::Reduction as u32) << OPCODE_POS)
        )
    });
    while cc_pka.pka_done().read().bits() == 0 {}
}
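Both OPCODE writes above use the same field layout, so one option is to build the word in a pure helper that can be unit-tested on the host before it ever reaches the hardware. This is a minimal sketch: OpcodeLayout, PkaOp, encode and reduction are hypothetical names, and no CC310 bit positions are assumed. The layout has to be filled in with the REG_*_POS, LEN_POS and OPCODE_POS values already used in execute_operation.

```rust
/// Bit position of each OPCODE field. Fill these in from the REG_*_POS,
/// LEN_POS and OPCODE_POS constants used in execute_operation; no CC310
/// values are assumed here.
#[derive(Clone, Copy, Debug)]
struct OpcodeLayout {
    reg_r: u32,
    reg_b: u32,
    reg_b_ctrl: u32,
    reg_a: u32,
    reg_a_ctrl: u32,
    len: u32,
    opcode: u32,
}

/// Hypothetical description of one PKA operation; the field names follow
/// the arguments of execute_operation.
#[derive(Clone, Copy, Debug)]
struct PkaOp {
    opcode: u32,
    result: u8,
    a: u8,
    a_ctrl: u8,
    b: u8,
    b_ctrl: u8,
    len_idx: u32,
}

impl PkaOp {
    /// ORs each field into place, mirroring the expression in execute_operation.
    fn encode(&self, l: &OpcodeLayout) -> u32 {
        ((self.result as u32) << l.reg_r)
            | ((self.b as u32) << l.reg_b)
            | ((self.b_ctrl as u32) << l.reg_b_ctrl)
            | ((self.a as u32) << l.reg_a)
            | ((self.a_ctrl as u32) << l.reg_a_ctrl)
            | (self.len_idx << l.len)
            | (self.opcode << l.opcode)
    }

    /// The follow-up reduction from execute_operation: A = R, B field = 0,
    /// both control bits clear, size index 1.
    fn reduction(result: u8, reduction_opcode: u32) -> Self {
        PkaOp { opcode: reduction_opcode, result, a: result, a_ctrl: 0, b: 0, b_ctrl: 0, len_idx: 1 }
    }
}
```

On target, each write would then become `cc_pka.opcode().write(|w| unsafe { w.bits(op.encode(&layout)) })`, and the encoded words can be compared against hand-computed values in a host test.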
I have to enforce this additional reduction because otherwise the result is not correct. I suspect this is related to the computation of NP: I am not sure how to calculate it, and I have no test vectors to compare against. I have also found different methods to compute it (I am currently following line 561 of https://github.com/ARM-software/cryptocell-312-runtime/blob/update-cc110-bu-00000-r1p4/codesafe/src/crypto_api/pki/common/pka.c#L561).
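To produce my own reference values for NP, a host-side big-integer helper is enough. The sketch below assumes NP is a Barrett-style reciprocal of the form floor(2^k / N); the exact exponent k, and whether CryptoCell uses a truncated form of N, are defined by the routine linked above and are left as parameters here rather than guessed. reciprocal, geq and sub_in_place are hypothetical helpers using plain binary long division.

```rust
/// Computes floor(2^k / n) by binary long division.
/// `n` is little-endian (least significant u32 first) and must be non-zero.
/// Returns the quotient as little-endian u32 words.
fn reciprocal(n: &[u32], k: usize) -> Vec<u32> {
    assert!(n.iter().any(|&w| w != 0), "modulus must be non-zero");
    // The remainder stays below n, so 2 * rem + 1 fits in one extra limb.
    let mut rem = vec![0u32; n.len() + 1];
    let mut q = vec![0u32; k / 32 + 1];
    // Walk the bits of 2^k from bit k (the only set bit) down to bit 0.
    for i in (0..=k).rev() {
        // rem = 2 * rem + (bit i of 2^k)
        let mut carry: u32 = if i == k { 1 } else { 0 };
        for w in rem.iter_mut() {
            let next = *w >> 31;
            *w = (*w << 1) | carry;
            carry = next;
        }
        if geq(&rem, n) {
            sub_in_place(&mut rem, n);
            q[i / 32] |= 1 << (i % 32);
        }
    }
    q
}

/// rem >= n, where rem may have more limbs than n (both little-endian).
fn geq(rem: &[u32], n: &[u32]) -> bool {
    for i in (0..rem.len()).rev() {
        let b = if i < n.len() { n[i] } else { 0 };
        if rem[i] != b {
            return rem[i] > b;
        }
    }
    true
}

/// rem -= n, assuming rem >= n (both little-endian).
fn sub_in_place(rem: &mut [u32], n: &[u32]) {
    let mut borrow = 0u64;
    for i in 0..rem.len() {
        let b = if i < n.len() { n[i] as u64 } else { 0 };
        let d = (rem[i] as u64).wrapping_sub(b).wrapping_sub(borrow);
        rem[i] = d as u32;
        borrow = d >> 63; // 1 if the subtraction wrapped
    }
}
```

For the N used below, the array would first be reversed so the least significant word comes first, for example `let n_le: Vec<u32> = N.iter().rev().copied().collect();`, and then passed as `reciprocal(&n_le, k)` with k taken from pka.c.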
One problem I cannot explain is the following: depending on which register I choose to store the result of an operation in, I get either the correct value or an incorrect one. Consider the following example:
const N: [u32; 8] = [
0xFFFFFFFF, 0x00000001, 0x00000000, 0x00000000,
0x00000000, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
];
const TEST_A: [u32; 8] = [
0xFFFFFFFF, 0x00000001, 0x00000000, 0x00000000,
0x00000000, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFE
];
const TEST_B: [u32; 8] = [
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x02
];
I perform the modular multiplication of TEST_A * TEST_B mod N.
When I store this result in register 6, the value reads correct:
Verification of R6: [0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xFFFFFFFF, 0x1, 0x0, 0x0, 0x0, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFF5]
But if I select R7 as the result register, I obtain:
Verification of R7: [0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0]
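Since there are no test vectors for this case, the expected result can be computed on the host and compared against the dumps. This is a minimal sketch using plain shift-and-add modular multiplication (not the PKA's own reduction method), operating on arrays in the same most-significant-word-first order as N, TEST_A and TEST_B. mul_mod, add_mod and dump_matches are hypothetical helpers, and dump_matches assumes the 16-word dump is printed most significant word first, as the leading zeros in the R6 line suggest. For these inputs TEST_A = N - 1 and TEST_B = 2, so the expected product is 2(N - 1) mod N = N - 2.

```rust
/// 256-bit value as eight u32 words, most significant word first
/// (the same order as N, TEST_A and TEST_B).
type U256 = [u32; 8];

const N: U256 = [
    0xFFFFFFFF, 0x00000001, 0x00000000, 0x00000000,
    0x00000000, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
];
const TEST_A: U256 = [
    0xFFFFFFFF, 0x00000001, 0x00000000, 0x00000000,
    0x00000000, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFE,
];
const TEST_B: U256 = [0, 0, 0, 0, 0, 0, 0, 2];

/// a >= b
fn geq(a: &U256, b: &U256) -> bool {
    for i in 0..8 {
        if a[i] != b[i] {
            return a[i] > b[i];
        }
    }
    true
}

/// (a - b) mod 2^256; the final borrow is discarded.
fn sub(a: &U256, b: &U256) -> U256 {
    let mut r = [0u32; 8];
    let mut borrow = 0u64;
    for i in (0..8).rev() {
        let d = (a[i] as u64).wrapping_sub(b[i] as u64).wrapping_sub(borrow);
        r[i] = d as u32;
        borrow = d >> 63; // 1 if the subtraction wrapped
    }
    r
}

/// (a + b) mod n, assuming a < n and b < n.
fn add_mod(a: &U256, b: &U256, n: &U256) -> U256 {
    let mut s = [0u32; 8];
    let mut carry = 0u64;
    for i in (0..8).rev() {
        let t = a[i] as u64 + b[i] as u64 + carry;
        s[i] = t as u32;
        carry = t >> 32;
    }
    // The true sum is below 2n, so one subtraction is enough.
    if carry == 1 || geq(&s, n) { sub(&s, n) } else { s }
}

/// (a * b) mod n by double-and-add over the bits of b, assuming a < n.
fn mul_mod(a: &U256, b: &U256, n: &U256) -> U256 {
    let mut acc = [0u32; 8];
    for i in 0..8 {
        for bit in (0..32).rev() {
            acc = add_mod(&acc, &acc, n);
            if (b[i] >> bit) & 1 == 1 {
                acc = add_mod(&acc, a, n);
            }
        }
    }
    acc
}

/// True if a 16-word dump (most significant word first) has a zero upper
/// half and its lower half equals `expected`.
fn dump_matches(dump: &[u32; 16], expected: &U256) -> bool {
    let (high, low) = dump.split_at(8);
    high.iter().all(|&w| w == 0) && low == expected.as_slice()
}
```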
I define the registers of the virtual memory as follows, where VIRTUAL_MEMORY_OFFSET is defined as per the specification, allowing 64 words between registers:
const VIRTUAL_MEMORY_SIZE_BITS: usize = 64 * 4 * 8; // 64 words x 4 bytes x 8 bits = 2048 bits per register
const VIRTUAL_MEMORY_OFFSET: u32 = (VIRTUAL_MEMORY_SIZE_BITS as u32) / 8 / 4; // = 64 words
fn configure_memory_map(cc_pka: &pac::CcPka) {
    // Map virtual registers R0..R12 to consecutive 64-word windows
    for i in 0..13 {
        cc_pka.memory_map(i).write(|w| unsafe {
            w.bits(i as u32 * VIRTUAL_MEMORY_OFFSET)
        });
    }
    // Map virtual registers 30 and 31 to the same windows as R7 and R8
    cc_pka.memory_map(30).write(|w| unsafe {
        w.bits(7 * VIRTUAL_MEMORY_OFFSET)
    });
    cc_pka.memory_map(31).write(|w| unsafe {
        w.bits(8 * VIRTUAL_MEMORY_OFFSET)
    });
}
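In this map, virtual registers 30 and 31 are given the same offsets as R7 and R8. My understanding from the CryptoCell runtime sources is that 30 and 31 are reserved as the temporary registers T0 and T1, which some operations use internally, but I have not confirmed which operations use them. A small host-side check that lists every pair of virtual registers whose SRAM windows overlap makes this kind of aliasing easy to spot. memory_map and overlapping_pairs are hypothetical helpers that mirror configure_memory_map, with offsets and register size in the same units as the values written to MEMORY_MAP.

```rust
/// Spacing between virtual registers, in the units written to MEMORY_MAP
/// (64, as computed by VIRTUAL_MEMORY_OFFSET above).
const VIRTUAL_MEMORY_OFFSET: u32 = 64;

/// Mirror of configure_memory_map as (virtual register index, start offset).
fn memory_map() -> Vec<(usize, u32)> {
    let mut map: Vec<(usize, u32)> = (0..13usize)
        .map(|i| (i, i as u32 * VIRTUAL_MEMORY_OFFSET))
        .collect();
    map.push((30, 7 * VIRTUAL_MEMORY_OFFSET));
    map.push((31, 8 * VIRTUAL_MEMORY_OFFSET));
    map
}

/// Returns every pair of virtual registers whose windows
/// [offset, offset + reg_size) intersect.
fn overlapping_pairs(map: &[(usize, u32)], reg_size: u32) -> Vec<(usize, usize)> {
    let mut pairs = Vec::new();
    for (i, &(reg_a, start_a)) in map.iter().enumerate() {
        for &(reg_b, start_b) in &map[i + 1..] {
            if start_a < start_b + reg_size && start_b < start_a + reg_size {
                pairs.push((reg_a, reg_b));
            }
        }
    }
    pairs
}
```

Run against the map above, the check reports the pairs (7, 30) and (8, 31); a map in which every register gets its own window reports nothing.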
The main code looks as follows:
#[entry]
fn main() -> ! {
    info!("Running.");
    // Enable the CryptoCell and the PKA clock
    let p = pac::Peripherals::take().unwrap();
    let cc_misc = p.cc_misc;
    let cc_pka = p.cc_pka;
    p.cryptocell.enable().write(|w| w.enable().set_bit());
    cc_misc.pka_clk().write(|w| w.enable().set_bit());
    while cc_misc.clk_status().read().pka_clk().bit_is_clear() {
        // Wait for the PKA clock to be ready
    }
    // Reset the PKA (calling pka_sw_reset() alone only returns the register handle)
    cc_pka.pka_sw_reset().write(|w| unsafe { w.bits(1) });
    info!("PKA clock ready. PKA engine enabled");
    // Maximum operand size
    cc_pka.pka_l(0).write(|w| unsafe { w.bits(MAX_OPERAND_SIZE_BITS as u32) });
    // Operand size
    cc_pka.pka_l(1).write(|w| unsafe { w.bits(OPERAND_SIZE_BITS as u32) });
    // NP operand size
    cc_pka.pka_l(2).write(|w| unsafe { w.bits(DOUBLE_OPERAND_SIZE_BITS as u32) });
    // Configure memory map
    configure_memory_map(&cc_pka);
    // Clear registers
    clear_pka_registers(&cc_pka);
    // Load N into R0 and NP into R1
    load_word_array(&cc_pka, 0, &N);
    load_word_array(&cc_pka, 1, &NP);
    // Load the operands into R4 and R5
    load_word_array(&cc_pka, 4, &TEST_A);
    load_word_array(&cc_pka, 5, &TEST_B);
    let mut buffer = [0u32; 2 * OPERAND_SIZE_WORDS];
    // Example operation: R7 = R4 * R5 mod N, operand size taken from PKA_L[1]
    execute_operation(&cc_pka, cc_pka::opcode::Opcode::ModMul, 7, 4, 0, 5, 0, 1);
    // Note: as written, this only obtains the register handle; nothing is written
    cc_pka.pka_sram_wclear();
    read_word_array(&cc_pka, 6, &mut buffer);
    read_word_array(&cc_pka, 7, &mut buffer);
    // Exit via semihosting call
    debug::exit(EXIT_SUCCESS);
    loop {}
}
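The busy-wait on clk_status here, and the ones on pka_done in execute_operation, have no upper bound, so if a flag never sets the firmware hangs instead of reporting an error. A bounded variant is a small generic helper; wait_until and Timeout are hypothetical names, and the poll budget in the usage line is an arbitrary choice.

```rust
/// Error returned when a status flag does not become ready within the poll budget.
#[derive(Debug, PartialEq)]
struct Timeout;

/// Polls `ready` until it returns true or `max_polls` attempts have been made.
fn wait_until<F: FnMut() -> bool>(mut ready: F, max_polls: u32) -> Result<(), Timeout> {
    for _ in 0..max_polls {
        if ready() {
            return Ok(());
        }
    }
    Err(Timeout)
}
```

On target this would replace a loop such as `while cc_pka.pka_done().read().bits() == 0 {}` with `wait_until(|| cc_pka.pka_done().read().bits() != 0, 1_000_000).expect("PKA_DONE not set");`.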
This is how I load values into memory. The example arrays store the most significant word first, so I write them in reverse order to follow a little-endian convention (the last element of the array is loaded first):
fn load_word_array(cc_pka: &pac::CcPka, reg: usize, data: &[u32]) {
    // Point the SRAM write address at the start of the virtual register
    cc_pka.pka_sram_waddr().write(|w| unsafe {
        w.bits(cc_pka.memory_map(reg).read().bits())
    });
    // Load data in reverse order (little endian: least significant word first)
    for &word in data.iter().rev() {
        cc_pka.pka_sram_wdata().write(|w| unsafe { w.bits(word) });
    }
    // Append 8 words of zero padding
    for _ in 0..8 {
        cc_pka.pka_sram_wdata().write(|w| unsafe { w.bits(0x00) });
    }
}
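The word ordering can be checked off-target by modelling the sequence of values that load_word_array pushes through PKA_SRAM_WDATA. sram_write_sequence is a hypothetical helper; the same model could be used to confirm that read_word_array (not shown here) undoes this ordering.

```rust
/// Host-side model of the words load_word_array writes: the input array
/// (most significant word first, as in N and TEST_A) is emitted least
/// significant word first, followed by `padding` zero words.
fn sram_write_sequence(data: &[u32], padding: usize) -> Vec<u32> {
    data.iter()
        .rev()
        .copied()
        .chain(std::iter::repeat(0).take(padding))
        .collect()
}
```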
I am not sure whether the problem lies in the virtual memory configuration or somewhere else, and I cannot find any documentation on this. Any help would be very much appreciated.
Cheers,
Elsa