Achieving Nanosecond-Precise Timing Using ASM

I'm attempting to achieve nanosecond-precision delays by inserting inline assembly like so:

__ASM (
" NOP\n\t NOP\n\t NOP\n\t NOP\n\t NOP\n\t NOP\n\t NOP\n\t NOP\n\t"
" NOP\n\t NOP\n\t NOP\n\t NOP\n\t NOP\n\t NOP\n\t NOP\n\t NOP\n\t"
" NOP\n\t NOP\n\t NOP\n\t"
);

This works fairly consistently, but the delay doesn't match what I expect.

This particular code is supposed to delay 296.875 ns. I am running on an nRF5340 at 64 MHz, so the clock period is 15.625 ns, and there are 19 NOP instructions, which should take 19 cycles, i.e. 19 × 15.625 ns = 296.875 ns. However, measuring the delay on a logic analyzer, I'm seeing closer to 220-250 ns. For larger delays, the discrepancy is even larger. Does anyone know why this is happening?
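
For reference, here is the arithmetic behind the expected figure as a minimal C sketch. The helper name nop_delay_ps is hypothetical, and it assumes each NOP takes exactly one cycle at the stated clock frequency, which is the very assumption my measurement seems to contradict.

```c
#include <stdint.h>

/* Hypothetical helper: expected delay, in picoseconds, of `cycles` CPU
 * cycles at `clock_hz`. Assumes one NOP == one cycle, which the hardware
 * may not guarantee. Integer math keeps 15.625 ns exact as 15625 ps.
 * The intermediate product fits in 64 bits for up to about 18 million
 * cycles, which is plenty for short busy-wait delays. */
uint64_t nop_delay_ps(uint32_t cycles, uint32_t clock_hz)
{
    return ((uint64_t)cycles * 1000000000000ULL) / clock_hz;
}
```

Under these assumptions, nop_delay_ps(19, 64000000) gives 296875 ps, i.e. the 296.875 ns I expected. Working backwards, the measured 220-250 ns corresponds to roughly 14 to 16 cycles at 64 MHz rather than 19.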