executing small applications from SRAM -- performance implications

i have small applications which can comfortably fit in as little as 32K of SRAM....  i have no issues with the mechanics of linking my program, as well as copying code from FLASH to SRAM at startup...  once in SRAM, i generally don't need the cache -- in fact, the application *IS* the cache, and could potentially overlay other code segments in SRAM as the program switches between major modes of operation....
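for reference, the overlay idea above can be sketched roughly as follows.... this is a host-side illustration only, not actual target code: `overlay_segment_t`, `overlay_region`, and `overlay_load` are hypothetical names, the 32K region size is taken from the figure above, and on a real device the copy would also need the appropriate barriers before branching into the freshly loaded code....

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* assumed size of the shared SRAM region that code segments are overlaid into */
#define OVERLAY_REGION_SIZE (32u * 1024u)

/* hypothetical descriptor for one code segment whose image is stored in FLASH */
typedef struct {
    const uint8_t *load_addr;  /* where the segment image lives (FLASH) */
    size_t         size;       /* image size in bytes */
} overlay_segment_t;

/* stand-in for the SRAM execution region; on target this would be a linker-placed section */
static uint8_t overlay_region[OVERLAY_REGION_SIZE];

/* segment currently resident in the region, or NULL if none has been loaded */
static const overlay_segment_t *overlay_current = NULL;

/* copy a segment into the shared region when switching major modes.
 * returns 0 on success, -1 if the segment does not fit.
 * on a real Cortex-M target, data/instruction barriers would be needed
 * after the copy and before jumping to the new code (not shown here). */
int overlay_load(const overlay_segment_t *seg)
{
    assert(seg != NULL && seg->load_addr != NULL);

    if (seg->size > OVERLAY_REGION_SIZE) {
        return -1;              /* segment too large for the region */
    }
    if (seg == overlay_current) {
        return 0;               /* already resident, skip the copy */
    }
    memcpy(overlay_region, seg->load_addr, seg->size);
    overlay_current = seg;
    return 0;
}
```

a mode switch would then amount to calling `overlay_load(&some_segment)` before branching into the region; skipping the copy when the segment is already resident keeps repeated switches into the same mode cheap....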

my questions largely concern the performance gains (execution time *and* power consumption) i could potentially reap from this approach....  when using your M0 devices, the upside is easy to obtain; and on your M4 devices (which have IMEM aliases for SRAM blocks) the harvard-architecture CPU can simultaneously fetch code/data without stalling....

the M33 on the nRF54L is a different story.... when the CPU tries to fetch code/data simultaneously, what happens when *both* are located in different banks of SRAM???  can separate banks be accessed in parallel without stalling???

related to this is my experience with your small (2K?) CACHE....  when running a program that consists of (say) a 16K sequence of instructions, the SRAM-based version outperforms a version in FLASH (with cache enabled) by about 25%....  with CACHE disabled, the FLASH program slows down by 3X -- so the CACHE really is doing its thing!!!
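to put those two ratios on a common scale, here is a small worked calculation.... it assumes "outperforms by about 25%" means the FLASH+CACHE run takes about 1.25x the SRAM run time (my reading; a throughput-based reading would give slightly different numbers), and that the 3X slowdown is relative to the FLASH+CACHE run.... `rel_times_t` and `relative_times` are just illustrative names....

```c
/* execution times normalized so that the SRAM-resident version is 1.0 */
typedef struct {
    double sram;            /* code running from SRAM (baseline) */
    double flash_cached;    /* code in FLASH, CACHE enabled */
    double flash_uncached;  /* code in FLASH, CACHE disabled */
} rel_times_t;

/* sram_gain:          fractional extra time of FLASH+CACHE over SRAM (0.25 for "25%")
 * cache_off_slowdown: slowdown factor when the CACHE is disabled (3.0 for "3X") */
rel_times_t relative_times(double sram_gain, double cache_off_slowdown)
{
    rel_times_t t;
    t.sram           = 1.0;
    t.flash_cached   = 1.0 + sram_gain;
    t.flash_uncached = t.flash_cached * cache_off_slowdown;
    return t;
}
```

under those assumptions, `relative_times(0.25, 3.0)` puts FLASH+CACHE at 1.25 and uncached FLASH at 3.75 relative to SRAM, i.e. the cache recovers most, but not all, of the gap for this linear 16K sequence....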

what i don't really understand is what's happening when i run a more "typical" program which is larger than your cache and jumps about....  i see virtually no difference in this case between the SRAM and FLASH/CACHE versions in terms of execution time....  is your cache really that good???

but even if i don't see any gains in execution time, i'm also curious about potential gains in *active power*....  perhaps i missed something, but once i've loaded SRAM from FLASH at startup, i'd like to *disable* the FLASH/CACHE from that point forward....  presumably i could see some efficiency gains from *not* having the FLASH memory and CACHE circuitry powered when executing....

i'm also aware of the *sleep power* tradeoffs with this approach -- is it better to retain code in SRAM or to reload it upon awakening???

but right now, i'm looking to optimize *active power* with this approach....  any insights into the nRF54L MCU architecture to help this cause would be greatly appreciated....  i'm also curious if this approach would work (better??) on the RISC-V core -- which i assume also has a harvard architecture....
