FPU performance calculation - optimizing real-time math

Hi

I am using the nRF52832 with S132 / SDK17, implementing an algorithm that requires some math.

For example, I am doing a matrix multiplication with about 800 float multiplications. As I understand it, a multiplication takes 3 cycles on the ARM Cortex-M4, running at 32 MHz. With optimization for time enabled, I am seeing those 2400 cycles take more than 200 µs. Does that make sense?

Is there some way (other than algorithmic changes) to improve this performance? Some other optimization flag to set, enabling the FPU, or a way to allocate the memory so that it is accessed more efficiently?

Is there some example/reference you can refer me to?

Thanks!

  • Hi

    Okay, so I have not been able to track down any expected numbers for the computation time. For the calculations alone, I think the timings you refer to are correct. However, 200 µs at the nRF52832's 64 MHz CPU clock is about 12,800 cycles, or ~16 clock cycles for each of the 800 computations. That starts to make sense, I think, if for each one you also write the result to a buffer, read the next input value from another buffer, and then update two buffer pointers, since some cycles are lost to data handling as well. How exactly are you doing these operations on your end? You can also check the disassembly to see exactly what is happening.

    Best regards,

    Simon
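
To make the cycle arithmetic in the reply above concrete, here is a minimal sketch in C. It assumes the nRF52832's 64 MHz CPU clock, and the helper names `cycles_per_op` and `cycles_to_us` are hypothetical, made up for this illustration rather than taken from the SDK. It only converts between measured time and cycle counts; it does not measure anything on the target.

```c
#include <stdint.h>

/* Average CPU cycles spent per operation for a measured duration.
 * cpu_mhz is supplied by the caller; 64 is assumed for the nRF52832. */
static uint32_t cycles_per_op(uint32_t elapsed_us, uint32_t cpu_mhz,
                              uint32_t op_count)
{
    /* 1 us at N MHz is N cycles; use 64 bits so the product cannot wrap. */
    uint64_t total_cycles = (uint64_t)elapsed_us * cpu_mhz;
    return (uint32_t)(total_cycles / op_count);
}

/* Time in microseconds that a given number of cycles takes at cpu_mhz. */
static float cycles_to_us(uint32_t cycles, uint32_t cpu_mhz)
{
    return (float)cycles / (float)cpu_mhz;
}
```

With these, `cycles_per_op(200, 64, 800)` gives the ~16 cycles per multiplication mentioned in the reply, while `cycles_to_us(2400, 64)` shows that 2400 cycles of pure multiplication would account for only 37.5 µs. The gap between the two is the time spent on loads, stores, and loop bookkeeping, which is what the disassembly would reveal.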
