I have been trying to use the secure services to read random numbers from the CC310 and have been getting unpredictable crashes. I have been able to modify the secure_services sample code to reproduce a crash, but it's the oddest thing. The combination of CONFIG_LOG=y and CONFIG_DK_LIBRARY=y along with a call to dk_buttons_init() is enough to cause the exception. No buttons need to be pressed, and I don't generate any log messages myself.
I'm using NCS 1.2.0 on an nRF9160 DK. Can anybody explain what's happening? The modified secure_services sample is attached.
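For anyone who wants to reproduce this without the attachment, the configuration side of the change amounts to roughly the following. This is a sketch based on the description above, not the contents of the attached file; the call to dk_buttons_init() still has to be added to the sample's main().

```cfg
# Additions to the secure_services sample's prj.conf
# (sketch based on the description, not the attached file)
CONFIG_LOG=y
CONFIG_DK_LIBRARY=y
```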
The behavior changes to "working" when I increase CONFIG_DK_LIBRARY_BUTTON_SCAN_INTERVAL in prj.conf from the default of 10 to 100. I can't even come up with an explanation for this.
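In prj.conf, that change is a single line. As far as I can tell the value is in milliseconds, but treat that as an assumption and check the option's Kconfig help text:

```cfg
# Scan the buttons less often (default is 10)
CONFIG_DK_LIBRARY_BUTTON_SCAN_INTERVAL=100
```

A longer interval presumably just makes the problem less likely to trigger, so this hides the crash rather than fixing it.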
What an interesting problem you have found.
I have done some testing myself, and the fault only happens if the buttons are initialized. Logging does not matter, other than for printing the fault message.
It also only happens when we try to get random numbers and not any of the other secure services.
Based on your comment regarding the scan interval, I expect it is a race condition related to the transition between the secure and non-secure domains.
However, I will have to look deeper into it next week to be able to pinpoint the cause of the problem.
I'm glad you see it too. I looked into it a little bit and it seemed to be related to the button_scan_fn or maybe zephyr's handling of the workq, but that's as far as I got.
I have found another way of avoiding the crash: changing the system workqueue's priority to a positive priority (e.g. 1).
This changes the system workqueue thread from a cooperative one to a preemptible one. That allows other threads (with higher priorities) to interrupt the system workqueue.
You can read more here: https://developer.nordicsemi.com/nRF_Connect_SDK/doc/latest/zephyr/reference/kernel/threads/index.html#thread-priorities
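The priority change can be made in prj.conf. A minimal sketch, assuming the Zephyr default of -1 (cooperative) is otherwise in effect; the value 1 is just the example used above, and any non-negative priority makes the thread preemptible:

```cfg
# Make the system workqueue thread preemptible
# (negative priorities are cooperative, non-negative are preemptible)
CONFIG_SYSTEM_WORKQUEUE_PRIORITY=1
```

Keep in mind that this affects every work item submitted to the system workqueue, not just the button scan, so it is a workaround rather than a fix.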
However, I have not been able to find the root cause of the crash. I will continue to investigate together with the SDK team.
A quick update:
The error does not seem to be linked to the buttons library in particular, but to the workqueue.
It seems the workqueue thread is not allowed to run when the timeout happens while in secure mode, and consequently an error happens once the main thread is unloaded.
We are investigating ways to solve this.