i have small applications which can comfortably fit in as little as 32K of SRAM.... i have no issues with the mechanics of linking my program, as well as copying code from FLASH to SRAM at startup... once in SRAM, i generally don't need the cache -- in fact, the application *IS* the cache, and could potentially overlay other code segments in SRAM as the program switches between major modes of operation....
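to make the overlay idea concrete, here's a minimal host-side sketch of what i have in mind.... every name in it (`overlay_t`, `overlay_load`, `overlay_base`, `OVERLAY_REGION_SIZE`) is hypothetical, and plain arrays stand in for FLASH and SRAM so that it runs on a PC.... on the real part the destination would be a linker-placed SRAM region rather than a static array:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* hypothetical size of the shared SRAM region that overlays are copied into */
#define OVERLAY_REGION_SIZE 1024u

/* one overlay = a span of bytes stored in FLASH (the load image) */
typedef struct {
    const uint8_t *load_addr;  /* where the segment lives in FLASH */
    size_t         size;       /* segment length in bytes */
} overlay_t;

/* stand-in for the SRAM execution region (linker-placed on real hardware) */
static uint8_t overlay_region[OVERLAY_REGION_SIZE];

/* index of the overlay currently resident, or -1 if none */
static int overlay_current = -1;

/* copy overlay `idx` into the shared region; returns 0 on success, -1 on error */
int overlay_load(const overlay_t *table, size_t count, size_t idx)
{
    assert(table != NULL);
    if (idx >= count || table[idx].size > OVERLAY_REGION_SIZE) {
        return -1;               /* bad index, or segment too large for the region */
    }
    if (overlay_current == (int)idx) {
        return 0;                /* already resident, so skip the copy */
    }
    memcpy(overlay_region, table[idx].load_addr, table[idx].size);
    /* assumption: on a real Cortex-M target, barriers (DSB/ISB) would
       typically follow the copy before branching into the new code */
    overlay_current = (int)idx;
    return 0;
}

/* base address of the shared region, i.e. where the resident overlay starts */
const uint8_t *overlay_base(void)
{
    return overlay_region;
}
```

switching major modes is then a single `overlay_load(table, count, mode)` call; the copy is skipped when the requested segment is already resident....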
my questions largely concern the performance gains (execution time *and* power consumption) i could potentially reap from this approach.... when using your M0 devices, the upside is easy to obtain; and on your M4 devices (which have IMEM aliases for SRAM blocks) the harvard-architecture CPU can simultaneously fetch code/data without stalling....
the M33 on the nRF54L is a different story.... when the CPU tries to fetch code/data simultaneously, what happens when *both* are located in different banks of SRAM??? can separate banks be accessed in parallel without stalling???
related to this is my experience with your small (2K?) CACHE.... when running a program that consists of (say) a 16K sequence of instructions, the SRAM based version outperforms a version in FLASH (with cache enabled) by about 25%.... with CACHE disabled, the FLASH program slows down by 3X -- so the CACHE really is doing its thing!!!
what i don't really understand is what's happening when i run a more "typical" program which is larger than your cache and jumps about.... i see virtually no difference in this case between the SRAM versus FLASH/CACHE versions in terms of execution time.... is your cache really that good???
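to pin down my own mental model, here's a toy simulation of a direct-mapped instruction cache.... the geometry (2K total, 16-byte lines, direct-mapped, one 32-bit fetch per access) is purely my *assumption* and not taken from any datasheet, and the names (`cache_t`, `cache_fetch`, `run_linear`) are hypothetical.... it only counts hits and misses; it says nothing about wait states, prefetching, or actual timing:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMED geometry, not taken from any datasheet */
#define CACHE_BYTES 2048u
#define LINE_BYTES  16u
#define NUM_LINES   (CACHE_BYTES / LINE_BYTES)   /* 128 lines */

typedef struct {
    uint32_t      tag[NUM_LINES];    /* tag stored for each cache line */
    uint8_t       valid[NUM_LINES];  /* 1 if the line holds data */
    unsigned long hits;
    unsigned long misses;
} cache_t;

/* invalidate every line and clear the counters */
void cache_reset(cache_t *c)
{
    assert(c != NULL);
    for (size_t i = 0; i < NUM_LINES; i++) {
        c->tag[i] = 0;
        c->valid[i] = 0;
    }
    c->hits = 0;
    c->misses = 0;
}

/* model one instruction fetch at `addr`: hit if the line is present, else fill it */
void cache_fetch(cache_t *c, uint32_t addr)
{
    uint32_t line = addr / LINE_BYTES;   /* which memory line is touched */
    uint32_t idx  = line % NUM_LINES;    /* direct-mapped slot for that line */
    uint32_t tag  = line / NUM_LINES;    /* distinguishes lines sharing a slot */

    if (c->valid[idx] && c->tag[idx] == tag) {
        c->hits++;
    } else {
        c->valid[idx] = 1;
        c->tag[idx] = tag;
        c->misses++;
    }
}

/* fetch `bytes` of straight-line code from `base`, one 32-bit word at a time,
   repeating the whole sequence `passes` times */
void run_linear(cache_t *c, uint32_t base, uint32_t bytes, unsigned passes)
{
    for (unsigned p = 0; p < passes; p++) {
        for (uint32_t off = 0; off < bytes; off += 4) {
            cache_fetch(c, base + off);
        }
    }
}

/* fraction of fetches that hit, or 0.0 if nothing was fetched */
double hit_rate(const cache_t *c)
{
    unsigned long total = c->hits + c->misses;
    return total ? (double)c->hits / (double)total : 0.0;
}
```

under these assumptions, a straight-line 16K sequence still hits on three of every four fetches (each line fill serves the next three words), while a loop that fits inside the cache misses only on its first pass.... that would at least be consistent with the CACHE helping even when the code is far larger than 2K, but the real cache may well behave differently....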
but even if i don't see any gains in execution time, i'm also curious about potential gains in *active power*.... perhaps i missed something, but once i've loaded SRAM from FLASH at startup, i'd like to *disable* the FLASH/CACHE from that point forward.... presumably i could see some efficiency gains in *not* having the FLASH memory and CACHE circuitry powered when executing....
i'm also aware of *sleep power* tradeoffs with this approach -- is it better to retain code in SRAM or reload it upon awakening....
but right now, i'm looking to optimize *active power* with this approach.... any insights into the nRF54L MCU architecture to help this cause would be greatly appreciated.... i'm also curious if this approach would work (better??) on the RISC-V core -- which i assume also has a harvard-architecture....