I'm implementing arithmetic operations with the PKA engine on the nRF52840 in Rust, following the "CRYPTOCELL — Arm TrustZone CryptoCell 310" chapter of the datasheet. The basic arithmetic operations work, but I'm running into several issues with the modular operations.
The main problem is that modular reductions are not completed as expected. Modular addition only gives the correct result when the operands are already reduced, that is, when the unreduced result is less than 2N-1 (where N is the modulus). Modular multiplication behaves slightly differently: it only reduces the result when it is less than 4N-1. The most problematic operation is modular division, which does not produce correct results at all, although regular division works fine.
I've also noticed something odd about memory loading. After mapping the virtual registers, I had to write each operand to the PKA SRAM in reverse word order to get multiplication to work. The datasheet doesn't give clear guidance on multi-word values (for example, a [u32; 2] = [x, x], an array of 32-bit elements), so I'm not sure whether this reverse-order approach is correct.
Has anyone encountered similar issues, or can anyone explain the correct way to handle these modular operations? I'm particularly interested in whether these reduction limits are expected behavior and whether my approach to memory loading is correct. Any insights would be greatly appreciated.
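One way to read the addition limit is as a single conditional subtraction of the modulus. The sketch below is a plain software model of that idea, not the PKA's documented behavior: the function name `mod_add_single_sub` is hypothetical, and the assumption that the engine subtracts N at most once is only inferred from the 2N-1 limit described above.

```rust
/// Software model (an assumption, for illustration only) of a modular
/// addition that subtracts the modulus at most once.
/// Inputs are u32 and the sum is computed in u64 so it cannot overflow.
fn mod_add_single_sub(a: u32, b: u32, n: u32) -> u64 {
    let sum = a as u64 + b as u64;
    let n = n as u64;
    // One conditional subtraction: enough only when sum < 2n.
    if sum >= n {
        sum - n
    } else {
        sum
    }
}
```

With N = 7, reduced operands such as 3 and 5 give 1, which matches (3 + 5) mod 7. Unreduced operands such as 10 and 9 give 12, while (10 + 9) mod 7 is 5. Under this model the operands would have to be reduced before they are handed to the engine.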
fn main() -> ! {
    info!("Running.");

    // Enable the PKA and CryptoCell clock
    let p = pac::Peripherals::take().unwrap();
    let cc_misc = p.cc_misc;
    let cc_pka = p.cc_pka;
    p.cryptocell.enable().write(|w| w.enable().set_bit());
    cc_misc.pka_clk().write(|w| w.enable().set_bit());
    while cc_misc.clk_status().read().pka_clk().bit_is_clear() {
        // Wait for PKA clock to be ready
    }

    // Store the operand size (in bits) in length register 1,
    // which the opcode below selects through its LEN field
    cc_pka.pka_l(1).write(|w| unsafe { w.bits(OPERAND_SIZE_BITS as u32) });

    // Configure memory map
    cc_pka.memory_map(0).write(|w| unsafe { w.bits(0x0) }); // R0
    cc_pka.memory_map(1).write(|w| unsafe { w.bits(VIRTUAL_MEMORY_OFFSET) }); // R1
    cc_pka.memory_map(4).write(|w| unsafe { w.bits(2 * VIRTUAL_MEMORY_OFFSET) }); // R4
    cc_pka.memory_map(5).write(|w| unsafe { w.bits(3 * VIRTUAL_MEMORY_OFFSET) }); // R5
    cc_pka.memory_map(6).write(|w| unsafe { w.bits(4 * VIRTUAL_MEMORY_OFFSET) }); // R6
    cc_pka.memory_map(30).write(|w| unsafe { w.bits(5 * VIRTUAL_MEMORY_OFFSET) }); // T0
    cc_pka.memory_map(31).write(|w| unsafe { w.bits(6 * VIRTUAL_MEMORY_OFFSET) }); // T1

    // Load N (R0) and Np (R1) into PKA SRAM
    // Memory is loaded in reverse order
    cc_pka.pka_sram_waddr().write(|w| unsafe { w.bits(cc_pka.memory_map(0).read().bits()) });
    for i in 0..OPERAND_SIZE_BITS / 8 / 4 {
        let reverse_index = OPERAND_SIZE_BITS / 8 / 4 - 1 - i;
        cc_pka.pka_sram_wdata().write(|w| unsafe { w.bits(N[reverse_index]) });
    }
    // Extra 64 bits (2 words) must be initialized to zero
    for _ in 0..2 {
        cc_pka.pka_sram_wdata().write(|w| unsafe { w.bits(0x00) });
    }

    cc_pka.pka_sram_waddr().write(|w| unsafe { w.bits(cc_pka.memory_map(1).read().bits()) });
    for i in 0..OPERAND_SIZE_BITS / 8 / 4 {
        let reverse_index = OPERAND_SIZE_BITS / 8 / 4 - 1 - i;
        cc_pka.pka_sram_wdata().write(|w| unsafe { w.bits(NP[reverse_index]) });
    }
    // Extra 64 bits (2 words) must be initialized to zero
    for _ in 0..2 {
        cc_pka.pka_sram_wdata().write(|w| unsafe { w.bits(0x00) });
    }

    // Load operand A (R4)
    cc_pka.pka_sram_waddr().write(|w| unsafe { w.bits(cc_pka.memory_map(4).read().bits()) });
    // FIXME add bound check on A
    for i in 0..OPERAND_SIZE_BITS / 8 / 4 {
        let reverse_index = OPERAND_SIZE_BITS / 8 / 4 - 1 - i;
        cc_pka.pka_sram_wdata().write(|w| unsafe { w.bits(A[reverse_index]) });
    }
    // Extra 64 bits (2 words) must be initialized to zero
    for _ in 0..2 {
        cc_pka.pka_sram_wdata().write(|w| unsafe { w.bits(0x00) });
    }

    // Load operand B (R5)
    cc_pka.pka_sram_waddr().write(|w| unsafe { w.bits(cc_pka.memory_map(5).read().bits()) });
    for i in 0..OPERAND_SIZE_BITS / 8 / 4 {
        let reverse_index = OPERAND_SIZE_BITS / 8 / 4 - 1 - i;
        cc_pka.pka_sram_wdata().write(|w| unsafe { w.bits(B[reverse_index]) });
    }
    // Extra 64 bits (2 words) must be initialized to zero
    for _ in 0..2 {
        cc_pka.pka_sram_wdata().write(|w| unsafe { w.bits(0x00) });
    }

    // Clear the result register (R6), including the 2 extra words
    cc_pka.pka_sram_waddr().write(|w| unsafe { w.bits(cc_pka.memory_map(6).read().bits()) });
    for _ in 0..OPERAND_SIZE_BITS / 8 / 4 + 2 {
        cc_pka.pka_sram_wdata().write(|w| unsafe { w.bits(0x00) });
    }

    // Execute the operation
    cc_pka.opcode().write(|w| unsafe {
        w.bits(
            (6 << REG_R_POS as u32) // Result register (R6)
                | (5 << REG_B_POS as u32) // Operand B register (R5)
                | (4 << REG_A_POS as u32) // Operand A register (R4)
                | (1 << LEN_POS as u32) // Length register index (pka_l(1))
                | ((Opcode::ModDiv as u32) << OPCODE_POS as u32),
        )
    });

    // Wait for the operation to complete
    while cc_pka.pka_done().read().bits() == 0 {}

    // Exit via semihosting call
    debug::exit(EXIT_SUCCESS);
    loop {}
}
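The repeated loading loops above all produce the same word sequence: the operand's words in reverse order, followed by two zero words. A minimal sketch of that sequence as a standalone iterator is shown here. The names `pka_word_stream` and `EXTRA_WORDS` are hypothetical. It also assumes that the arrays N, NP, A and B are stored most-significant word first, which the code above does not state. If that holds, reversing them amounts to writing the least-significant word first.

```rust
/// Number of zero words written after each operand (the extra 64 bits).
const EXTRA_WORDS: usize = 2;

/// Hypothetical helper: yields the words in the order the loops above write
/// them to the SRAM data register, i.e. the input slice reversed, followed
/// by EXTRA_WORDS zero words. Uses only `core`, so it also fits no_std code.
fn pka_word_stream(words: &[u32]) -> impl Iterator<Item = u32> + '_ {
    words
        .iter()
        .rev()
        .copied()
        .chain(core::iter::repeat(0u32).take(EXTRA_WORDS))
}
```

Each yielded word would then be written to `pka_sram_wdata()` in turn, after the write address has been set once for the target register. Keeping the ordering in one place makes it easy to flip if the reversal turns out to be wrong.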