Thread (Background) DFU - How to get progress in app?

Hello, and thanks for all the great examples!

I've gotten the CoAP transport for background DFU working on my custom hardware platform and app. However, I have noticed that the DFU process occasionally hangs, with the app never retrying a block request. If I periodically call `coap_dfu_reset_state` and `coap_dfu_trigger`, the DFU process continues. How can I access the current progress of the DFU procedure so that I can set a timeout in the application? I see that there is a way to get a diagnostic struct with `coap_dfu_diagnostic_get`, but the fields of interest about total/current block progress seem to be updated only during a multicast update.
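
The application-level timeout asked about here could be sketched as a small stall watchdog. This is only a sketch under assumptions: `dfu_watchdog_t`, `dfu_watchdog_init`, and `dfu_watchdog_poll` are hypothetical names, not SDK functions, and the `progress` value is assumed to come from some counter the application can read (the diagnostic fields mentioned above are reported not to update in unicast mode).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stall watchdog; none of these names exist in the SDK. */
typedef struct {
    uint32_t last_progress; /* last progress value that was observed */
    uint32_t idle_s;        /* consecutive polls without any change */
    uint32_t limit_s;       /* polls without change that count as a stall */
} dfu_watchdog_t;

static void dfu_watchdog_init(dfu_watchdog_t *wd, uint32_t limit_s)
{
    assert(wd != NULL);
    wd->last_progress = 0;
    wd->idle_s        = 0;
    wd->limit_s       = limit_s;
}

/* Call once per second with the current progress value (assumed to be a
 * counter that grows while the transfer advances). Returns true when the
 * value has not changed for limit_s consecutive polls, then re-arms. */
static bool dfu_watchdog_poll(dfu_watchdog_t *wd, uint32_t progress)
{
    assert(wd != NULL);
    if (progress != wd->last_progress) {
        wd->last_progress = progress; /* progress was made: restart the count */
        wd->idle_s        = 0;
        return false;
    }
    wd->idle_s++;
    if (wd->idle_s >= wd->limit_s) {
        wd->idle_s = 0;               /* re-arm so the next stall is caught too */
        return true;
    }
    return false;
}
```

When `dfu_watchdog_poll` returns true, the application would do whatever recovery it prefers, for example the `coap_dfu_reset_state` and `coap_dfu_trigger` calls described above.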

Additionally, I have found some bugs that I needed to fix to get the example working without compiler warnings with gcc. In particular, the assert on line 486 in `nrf_dfu_req_handler.c` had to be commented out, as the background dfu does not actually set the `callback.write` function pointer.

In `background_dfu_block.c` there were warnings about comparisons between unsigned and signed integers on lines 156, 394, and 440; the operands should be explicitly cast.
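
As a generic illustration of that kind of fix (not the actual SDK lines, which are not reproduced here), a signed/unsigned comparison can be made explicit as below. The helper `block_in_range` and its parameters are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: is a signed block number inside an unsigned count? */
static bool block_in_range(int32_t block_num, uint32_t block_count)
{
    /* Reject negatives first; casting a negative value to uint32_t would
     * wrap around to a very large number and silently pass or fail. */
    if (block_num < 0) {
        return false;
    }
    /* Both operands are now unsigned, so gcc has nothing to warn about. */
    return (uint32_t)block_num < block_count;
}
```

The explicit range check before the cast is the important part; the cast alone only silences the warning.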

In `nrf_log_backend_interface.h` the result of STRINGIFY on line 148 should be explicitly cast to a `char *` to avoid a warning about discarding a const qualifier.

  • Hello, it's been nearly a month without a resolution to this. Could I get some resolution on how to enable timeout on a stuck DFU, or some way to get introspection to create my own timeout?

  • I've forwarded your question and bug reports to the developers; we'll get you an answer ASAP. I am so sorry for making you wait this long.

  • From our developer:

    I have spent the last two days playing with DFU and I was not able to reproduce the hang issue you mentioned.

    The good thing is that there have been quite a lot of changes in our master branch compared to the v2.0.0 release.

    The invalid assert has already been fixed by a rework of that part of the code. It is still there, but it is OK now ;).

    As there were so many changes and the problem turned out to be unreproducible with my setup (multicast mode with 8 clients), I propose that you check the next release, which should be out soon. If the issue persists in your environment, we will need to collect more data about your environment and procedure and proceed with more precise troubleshooting.

    Both the firmware side and nrfutil have been improved. I expect that you should be fine with the new release, which should be ready by mid-March.


  • Hi haakonsh,

    I have integrated the newly released SDK 15.3 with my application, and while it does fix the warnings and errors I encountered, it appears I am having the same problem as before: occasionally the update process hangs, presumably due to a dropped CoAP packet. In order to continue the update, the device must be reset or the DFU process must be retriggered. To be clear, I am performing a unicast update, not a multicast one. I dynamically change my OpenThread polling period to 15 ms while performing the update; normally it is one minute.

    It also appears that the `background_dfu_diagnostic_t` structure that is retrieved from `coap_dfu_diagnostic_get` still does not have a properly populated `total_images_blocks_received`, meaning I cannot monitor the DFU progress from my application and retrigger the operation when hung. I will try to dig into the coap dfu internals to see if I can figure out where the process is stalling.

  • I tracked down the problem. In `coap_dfu.c`, the implementation of `background_dfu_transport_block_request_send` does not set a response callback in the message configuration. This means that when the COAP message times out or runs out of retries, there is no indication to the dfu process that this message was lost. The background dfu library should specify a callback to handle messages that time out and resend the block request, or abort the operation. Right now the library just waits forever for a message that won't arrive because all retries failed.

  • Alright, it appears that `background_dfu_transport_block_request_send` is only called during a multicast update. I think unicast uses `background_dfu_transport_send_request`. This function does set a message handler. I'll have to investigate more.

  • I found my problem - I was not calling `coap_time_tick` in my application. I set up a timer to call this function every second, and now retransmissions are working as expected.

    I had to kind of dig to find out that I needed to call this function for the COAP stack to function properly. At least for me, it wasn't obvious from the background dfu documentation what I needed to implement for the coap stack. I assumed all I needed to do was call `coap_dfu_process` in my main loop because that was what I had seen in example applications.
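
To illustrate why a missing tick stalls everything, here is a toy model of a tick-driven retransmission timer. It is a sketch, not SDK code: `toy_msg_t`, `toy_msg_send`, and `toy_time_tick` are hypothetical names, with `toy_time_tick` standing in for the role `coap_time_tick` plays. The doubling of the interval mirrors the exponential back-off that CoAP specifies for confirmable messages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one pending confirmable message; not SDK code. */
typedef struct {
    bool     pending;      /* still waiting for a response */
    bool     failed;       /* set once every retry has been used up */
    uint32_t interval_s;   /* current retransmission interval in seconds */
    uint32_t timeout_s;    /* seconds left until the next retransmission */
    uint32_t retries_left; /* retransmissions still allowed */
    uint32_t sent_count;   /* how many times the message has gone out */
} toy_msg_t;

/* Record the first transmission and arm the retransmission timer. */
static void toy_msg_send(toy_msg_t *m, uint32_t first_timeout_s,
                         uint32_t max_retries)
{
    assert(m != NULL);
    m->pending      = true;
    m->failed       = false;
    m->interval_s   = first_timeout_s;
    m->timeout_s    = first_timeout_s;
    m->retries_left = max_retries;
    m->sent_count   = 1;
}

/* Stand-in for the once-per-second tick. Time only advances in here, so if
 * nothing calls this function the message stays pending forever: it is
 * never retransmitted and never reported as failed. */
static void toy_time_tick(toy_msg_t *m)
{
    assert(m != NULL);
    if (!m->pending) {
        return;
    }
    if (m->timeout_s > 0) {
        m->timeout_s--;
    }
    if (m->timeout_s > 0) {
        return;                  /* timer has not expired yet */
    }
    if (m->retries_left == 0) {
        m->pending = false;      /* out of retries: give up and report it */
        m->failed  = true;
        return;
    }
    m->retries_left--;
    m->sent_count++;             /* "retransmit" the message */
    m->interval_s *= 2;          /* exponential back-off */
    m->timeout_s = m->interval_s;
}
```

On a real device the equivalent of `toy_time_tick` is the fix described above: a periodic timer that calls `coap_time_tick` once per second, alongside `coap_dfu_process` in the main loop.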
