The title says it all. Code to reproduce:
```c
#include <stdio.h>
#include <zephyr/random/random.h>

/* Paste at the very top of main(): */
printf("Hello!\n");
for (int i = 1; i <= 1000000; i++) {
	if (i > 9990) {
		printf("%d\n", i);
	}
	uint32_t buf[1] = {0};
	sys_csrand_get(buf, sizeof(buf));
}
```
Put this at the very start of main() in e.g. the cellular/udp sample (configured for nrf9160dk/nrf9160/ns with the latest nRF Connect SDK, v3.2.4) and run it. At the 10000th iteration, the code crashes. The buffer size per iteration does not appear to have any impact on the number of iterations before the crash. My suspicion is that the implementation fetches more entropy (or does some other periodic work) every 10000 calls, causing a deeper call chain. The CC310 crypto library appears to be closed source, so it is not possible to debug into it.
The bug is triggered with the optimization settings "Use project default" and "Optimize for debugging (-Og)", but not with "Optimize for size (-Os)" or "Optimize for speed (-O2)".
It seems the ns_agent_tz_stack is made too small, and the issue is actually a stack overflow that happens inside the secure TF-M image (while running in thread mode, i.e. using the psp_s stack pointer). With optimization enabled, I guess stack usage is reduced enough that the overflow never occurs, which would explain why only the unoptimized builds crash.
After changing the PLATFORM_SP_STACK_SIZE define from 0x500 to e.g. 0x700, the issue no longer occurs. Note that the size of this stack appears to be hardcoded in the SDK code and, as far as I understand, is not meant to be modified by users. It should therefore be treated as a bug in the SDK that needs to be fixed.
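For clarity, this is the change I made. The define lives somewhere in the TF-M sources shipped with the SDK (the exact file may vary between versions), so treat this as a description of the workaround, not a supported configuration knob:

```c
/* TF-M secure-side platform partition stack size.
 * Original value in the SDK sources: */
#define PLATFORM_SP_STACK_SIZE 0x500

/* Increased value that makes the crash go away (0x200 extra headroom): */
#define PLATFORM_SP_STACK_SIZE 0x700
```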
It is particularly problematic that the crash only happens after the 10000th call to sys_csrand_get: the bug can stay hidden during development and testing, and units already shipped to production may suddenly start crashing in the field.