Hello, I have the following observation regarding the potential problem with DFU bootloader implementation :
I have noticed that the Bootloader DFU implementation (nRF5_SDK_13.1.0) allows for the recursive calls to the "app_sched_execute" which cause multiple "execution" of the same entry from the "application" queue.
The following is the detail description of the "recursive calls" scenario:
1: The normal flow of the boot-loader DFU assumes that once the bootloading has been initialized by calling uint32_t nrf_dfu_init() function (located in "nrf_dfu.c") then the bootloader "spins" in the "wait_for_event()" function while processing the events fetched from the application queue as follows:
static void wait_for_event()
{
// Transport is waiting for event?
while(true)
{
// Can't be emptied like this because of lack of static variables
#ifdef BLE_STACK_SUPPORT_REQD
(void)sd_app_evt_wait();
#else
__WFI();
#endif
app_sched_execute();
}
}
2: The " app_sched_execute()" call above fetches and processes the events from the application queue as follows:
p_event_data = &m_queue_event_data[event_index * m_queue_event_size];
event_data_size = m_queue_event_headers[event_index].event_data_size;
event_handler = m_queue_event_headers[event_index].handler;
event_handler(p_event_data, event_data_size);
// Event processed, now it is safe to move the queue start index,
// so the queue entry occupied by this event can be used to store
// a next one.
m_queue_start_index = next_index(m_queue_start_index);
Note above that the "event" is actually "poped" from the queue (by incrementing index) AFTER the "event_handler" function has been called. The reason for this is that the "event_handler" uses the pointer at the event data "p_event_data" which point at the actual queue entry so the "queue slot" (which stores that data) must be protected from being overwritten until the current event is fully processed.
3: That implementation makes sense only under the assumption that the "event_handler" does not call " app_sched_execute()" recursively (because if it does, then the the second - recursive- call to " app_sched_execute()" will process the same queue entry since the queue index has not been incremented yet as it is observed above
4: It turns out however, that the DFU bootloader flow does implement the "recursive" calls to " app_sched_execute()" that will cause processing of the same queue entry multiple times.
Here is the detail description of the "call tree" which causes the "recursivity" of the calls to the "app_sched_execute()"
-
Assume that the particular event handler for the application queue entry being processed currently calls to "nrf_dfu_req_handler_on_req" as a part of the normal DFU bootloading flow
-
assume further, the above function calls in turn "nrf_dfu_data_req(p_context, p_req, p_res);" in the case of processing NRF_DFU_OBJECT_OP_EXECUTE opcode
-
assume further that in the above case, the nrf_dfu_wait(); is called as a part of the below loop
while(m_flash_operations_pending)
{
nrf_dfu_wait();
}
- The call to nrf_dfu_wait() above calls the app_sched_execute() as follows :
void nrf_dfu_wait()
{
#ifdef BLE_STACK_SUPPORT_REQD
(void)sd_app_evt_wait();
#else
__WFI();
#endif
app_sched_execute();
}
which creates a "recursion" with respect to the calls to "app_sched_execute()" and as as a result introduces the bug described above by processing the same queue entry more than once.
My question with respect to the above scenario :
-
is there anything in the DFU booltloader code handling which prevents the above scenario from happening ?
-
is there a known fix for the above problem ?
Thanks