executing small applications from SRAM -- performance implications

Question

i have small applications which can comfortably fit in as little as 32K of SRAM.... i have no issues with the mechanics of linking my program, as well as copying code from FLASH to SRAM at startup... once in SRAM, i generally don't need the cache -- in fact, the application *IS* the cache, and could potentially overlay other code segments in SRAM as the program switches between major modes of operation.... my questions largely concern the performance gains (execution time *and* power consumption) i could potentially reap from this approach.... when using your M0 devices, the upside is easy to obtain; and on your M4 devices (which have IMEM aliases for SRAM blocks) the harvard-architecture CPU can simultaneously fetch code/data without stalling....
 the M33 on the nRF54L is a different story.... when the CPU tries to fetch code/data simultaneously, what happens when *both* are located in different banks of SRAM??? can separate banks be accessed in parallel without stalling??? 
 related to this is my experience with your small (2K?) CACHE.... when running a program that consists of (say) a 16K sequence of instructions, the SRAM based version outperforms a version in FLASH (with cache enabled) by about 25%.... with CACHE disabled, the FLASH program slows down by 3X -- so the CACHE really is doing its thing!!!
 what i don't really understand is what's happening when i run a more "typical" program which is larger than your cache and jumps about.... i see virtually no difference in this case between the SRAM versus FLASH/CACHE versions in terms of execution time.... is your cache really that good???
 but even if i don't see any gains in execution time, i'm also curious about potential gains in *active power*.... perhaps i missed something, but once i've loaded SRAM from FLASH at startup, i'd like to *disable* the FLASH/CACHE from that point forward.... presumably i could see some efficiency gains in *not* having the FLASH memory and CACHE circuitry powered when executing....
 i'm also aware of *sleep power* tradeoffs with this approach -- is it better to retain code in SRAM or reload it upon awakening.... 
 but right now, i'm looking to optimize *active power* with this approach.... any insights into the nRF54L MCU architecture to help this cause would be greatly appreciated.... i'm also curious if this approach would work (better??) on the RISC-V core -- which i assume also has a harvard-architecture....

Einar Thorsrud · Answer

Hi, 
 The RRAM (NVM) is automatically powered down when not in use. In the scenario you describe, I assume you are in practice executing the whole CoreMark-like test from cache, so the RRAM has no practical effect and you are effectively comparing the current consumption of execution from cache with execution from RAM. In that comparison, the cache is slightly more power efficient.
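
 When comparing the two builds, it can help to look at energy per workload rather than current alone, since a build that draws slightly more current can still come out ahead if it finishes sooner. The following is only a minimal sketch, assuming constant current for the duration of the run. The function names are made up for illustration, and every numeric value is a hypothetical placeholder, not an nRF54L measurement.

```python
def energy_uj(current_ma: float, voltage_v: float, time_ms: float) -> float:
    """Energy of a constant-current run, E = I * V * t.

    mA * V * ms works out to microjoules.
    """
    return current_ma * voltage_v * time_ms


def break_even_current_ratio(time_ratio: float) -> float:
    """Largest I_sram / I_cache at which the SRAM build uses no more energy.

    time_ratio is t_sram / t_cache. With equal supply voltage,
    I_sram * t_sram <= I_cache * t_cache  <=>  I_sram / I_cache <= 1 / time_ratio.
    """
    if time_ratio <= 0:
        raise ValueError("time_ratio must be positive")
    return 1.0 / time_ratio


if __name__ == "__main__":
    # HYPOTHETICAL placeholder values, chosen only to exercise the formulas.
    voltage_v = 1.8
    e_cache = energy_uj(current_ma=2.0, voltage_v=voltage_v, time_ms=10.0)
    e_sram = energy_uj(current_ma=2.2, voltage_v=voltage_v, time_ms=8.0)
    print(f"cache run: {e_cache:.2f} uJ, SRAM run: {e_sram:.2f} uJ")

    # Equal execution time: SRAM only wins if it draws no more current.
    print(break_even_current_ratio(1.0))
    # SRAM run taking 0.8 of the cached time: it may draw more current and still tie.
    print(break_even_current_ratio(0.8))
```

 Under this model, when both builds take the same time (the "typical" program in the question), whichever draws less current wins outright. If the SRAM build finishes in 0.8 of the cached time (one reading of the roughly 25% speedup reported for the 16K sequence), it could draw somewhat more current and still break even on energy. Real current and time figures would have to be measured on the target.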
