Observer under FreeRTOS: Events stop coming after a few minutes.
I am using the nRF52832 as a simple observer running under FreeRTOS on a custom board. The build is based on the SDK HRM peripheral example. I have stripped down my code so that the only task left is the one in which the SD runs. I have a handler for BLE_GAP_EVT_ADV_REPORT events installed. Nothing else is running (no UART or any other peripheral).
The SD stops calling my handler after a few minutes (the timing is completely random) if the runtime of the handler (an artificial busy-loop) is longer than about 300us, which, I guess, is roughly the minimum interval between two advertising reports minus caller overhead. The SD throws no hardfault or any other error; the BLE events just stop coming in. FreeRTOS continues running normally. With the handler runtime below 200us, everything is fine. Strangely, runtimes >>300us (e.g. 10ms) do not make the error occur any sooner.
Any idea where this might come from?
I found a race condition in the nrf_sdh_freertos.c file.
softdevice_task() tries to pull events from the SD and then suspends the task unconditionally until SD_EVT_IRQHandler resumes it.
But, if the below h…
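To make the window described in this post concrete, here is a minimal host-side model in plain C. It is only a sketch: it makes no FreeRTOS or SoftDevice calls, the names model_t, isr_event, task_poll, task_wait, lost_wakeup and self_check are all made up for illustration, and since the rest of the post is cut off, the "latched" variant is an assumption about how such a window is commonly closed, not necessarily the change that was proposed here.

```c
#include <assert.h>
#include <stdbool.h>

/* Host-side model only; no FreeRTOS or SoftDevice calls are made. */
typedef struct {
    int  pending;   /* events queued by the "SoftDevice", not yet pulled */
    int  notified;  /* latched wake-ups (models a notification count)    */
    bool blocked;   /* true while the task is suspended or waiting       */
} model_t;

/* Models the event interrupt: a new event arrives and the task is woken. */
static void isr_event(model_t *m, bool latch)
{
    m->pending++;
    if (latch) {
        m->notified++;      /* remembered even if the task is still running */
    }
    m->blocked = false;     /* resume: no effect if the task is not blocked */
}

/* Models pulling every queued event. */
static void task_poll(model_t *m)
{
    m->pending = 0;
}

/* Models the blocking step at the end of the task loop. */
static void task_wait(model_t *m, bool latch)
{
    if (latch && m->notified > 0) {
        m->notified = 0;    /* a wake-up was latched: consume it, keep running */
        return;
    }
    m->blocked = true;      /* unconditional suspend, or nothing was latched */
}

/* Returns true if the task ends up blocked with an event still queued. */
bool lost_wakeup(bool latch)
{
    model_t m = {0, 0, false};
    task_poll(&m);          /* 1. task drains the (empty) queue          */
    isr_event(&m, latch);   /* 2. interrupt fires before the task blocks */
    task_wait(&m, latch);   /* 3. task blocks                            */
    if (!m.blocked) {
        task_poll(&m);      /* latched variant loops and pulls the event */
    }
    return m.blocked && m.pending > 0;
}

/* Usage example for a host build. */
void self_check(void)
{
    assert(lost_wakeup(false));   /* suspend/resume: event is stranded */
    assert(!lost_wakeup(true));   /* latched wake-up: event is pulled  */
}
```

In FreeRTOS terms, the latched variant corresponds to a task notification (vTaskNotifyGiveFromISR() in the interrupt, ulTaskNotifyTake() in the task), which remembers a wake-up that arrives while the task is still running. The model only shows that one event can be stranded; whether further interrupts would wake the task later is outside what it covers.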
I had a related problem, reported in a private case, and this code change does fix it. Thanks Susheel!
Forgot to mention: I have a lot (>20) of advertisers running in the vicinity to stress my system. So the probability of getting two reports coming in very close together is pretty high.
Norbert said: So the probability of getting two reports coming in very close together is pretty high.
This should not be a problem for the SD, and I do not think it is the reason for what you see.
I think that you are seeing a deadlock somewhere in one of your threads. Hard to say where and why, but if there are any conditions where you are waiting for some flags set in another thread, then make sure that you have scoped out a possibility of having a deadlock there since these checks are not normally atomic.
Not sure. As I said, I have stripped down the project to only one task (the one running the SD), so there cannot be a deadlock between tasks.
My current workaround is to signal the event to a lower-priority task using a binary semaphore and handle the event there. This takes only ~20us in the event handler, and the problem goes away.
However, this does not satisfy me, as I think there is still something wrong at the base.
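For readers who want to try the workaround described above, here is a minimal host-side sketch in plain C of the hand-off pattern. It is an illustration under assumptions, not the code from my project: report_t, deferral_t, on_adv_report, worker_pass and QUEUE_LEN are made-up names, the small ring buffer is an addition (a binary semaphore by itself cannot count several reports that arrive before the worker runs), and the boolean sem merely stands in for a FreeRTOS binary semaphore.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_LEN 8u                       /* hypothetical depth; holds QUEUE_LEN - 1 reports */

typedef struct { int8_t rssi; } report_t;  /* hypothetical stand-in for an advertising report */

typedef struct {
    report_t slots[QUEUE_LEN];
    size_t   head;      /* next slot to write (handler side) */
    size_t   tail;      /* next slot to read (worker side)   */
    bool     sem;       /* models a binary semaphore         */
    unsigned dropped;   /* reports discarded because the queue was full */
} deferral_t;

/* Event-handler side: copy the report, signal the worker, return quickly. */
void on_adv_report(deferral_t *d, report_t r)
{
    size_t next = (d->head + 1u) % QUEUE_LEN;
    if (next == d->tail) {
        d->dropped++;               /* queue full: drop instead of blocking */
        return;
    }
    d->slots[d->head] = r;
    d->head = next;
    d->sem = true;                  /* models xSemaphoreGive(); repeated gives stay "given" */
}

/* Worker side: one pass of the task loop. Returns the number of reports handled. */
unsigned worker_pass(deferral_t *d)
{
    unsigned n = 0;
    if (!d->sem) {
        return 0;                   /* a real task would block in xSemaphoreTake() here */
    }
    d->sem = false;
    while (d->tail != d->head) {    /* drain everything, not just one report */
        /* the slow processing of d->slots[d->tail] would go here */
        d->tail = (d->tail + 1u) % QUEUE_LEN;
        n++;
    }
    assert(n < QUEUE_LEN);          /* the ring never holds more than QUEUE_LEN - 1 */
    return n;
}
```

The point of the design is that the handler only copies and signals, so its runtime stays short and constant no matter how slow the processing is. On the target, the handler and the worker run in different task contexts, so the head and tail updates would need the usual single-producer/single-consumer care; this sequential model does not exercise that.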
The fact that your solution worked makes me believe more strongly that this was a deadlock issue. You serialized the access using a binary semaphore and the problem went away. I cannot say with 100% certainty without understanding your different priority tasks and when they suspend and resume.
I would be happy to provide you the stripped-down test case, if you could spare the time to look into it. Just let me know how I can push a project to you.