
OTA DFU times out and is unreliable

I don't have a question; I'm posting the solution in case it helps someone else who runs into the same problem.

With SDK v9/SoftDevice S110 v8, our application is too big to use the dual bank bootloader, so we're using the single bank bootloader. I kept running into an issue where DFU using Master Control Panel v3.10.0 would sometimes fail partway through the download with the message "Error during firmware upload. Disconnected from device while waiting for notification from device." Eventually I traced the problem back to this segment in dfu_transport_ble.c:

err_code = hci_mem_pool_rx_produce(length, (void **) &mp_rx_buffer);
if (err_code != NRF_SUCCESS)
{
    //bl_printf("Failed to rx_produce\r\n", length);
    dfu_error_notify(p_dfu, err_code);
    return;
}

My settings in hci_mem_pool_internal.h were initially the same as what was in the dfu_dual_bank_ble_s110_pca10028 example:

#define RX_BUF_SIZE       32u   /**< RX buffer size in bytes. */

#define RX_BUF_QUEUE_SIZE 8u    /**< RX buffer element size. */

It seems to work much better with the following settings instead. I have also updated the comments to reflect what those #defines actually represent. I could probably reduce RX_BUF_SIZE, as the largest request I ever saw was for 20 bytes.

#define RX_BUF_SIZE       32u    /**< Size of a single RX buffer element in bytes. */

#define RX_BUF_QUEUE_SIZE 16u    /**< Number of RX buffer elements in the queue.  NOTE: MUST BE A POWER OF TWO */

The key change here is RX_BUF_QUEUE_SIZE, and the reason it works has to do with the architecture of the DFU spec. According to the message sequence diagram here, the DFU host first tells the DFU target "I want you to send me a notification every N data packets", and then sends data packets in bunches of N. The problem is that if N is greater than RX_BUF_QUEUE_SIZE, then hci_mem_pool_rx_produce() can run out of memory (since only RX_BUF_QUEUE_SIZE buffers are available) before the notification is sent.

The same situation applies once pstorage writes the packet to flash. pstorage has a queue of outstanding commands (its length is set by PSTORAGE_CMD_QUEUE_SIZE), and if the queue fills up before all outstanding commands have completed, cmd_queue_enqueue() will also return NRF_ERROR_NO_MEM. Therefore it is also necessary to ensure that PSTORAGE_CMD_QUEUE_SIZE is greater than N, so I made the following change in pstorage_platform.h:

#define PSTORAGE_CMD_QUEUE_SIZE     16
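To illustrate the reasoning above, here is a minimal toy model of a fixed pool of RX buffers, assuming the worst case where a whole burst of N packets arrives before any buffer is released. It is not the SDK's hci_mem_pool; the names toy_pool_t, toy_pool_produce, toy_pool_consume and toy_burst_accepted are made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a fixed pool of RX buffers (hypothetical, not SDK code). */
typedef struct
{
    uint32_t capacity; /* number of buffers in the pool */
    uint32_t in_use;   /* buffers currently handed out  */
} toy_pool_t;

/* Hand out one buffer. Returning false models NRF_ERROR_NO_MEM. */
static inline bool toy_pool_produce(toy_pool_t *pool)
{
    if (pool->in_use >= pool->capacity)
    {
        return false;
    }
    pool->in_use++;
    return true;
}

/* Give one buffer back, e.g. once its data has been written to flash. */
static inline void toy_pool_consume(toy_pool_t *pool)
{
    assert(pool->in_use > 0u);
    pool->in_use--;
}

/* Worst case: burst_len packets arrive before any buffer is given back.
 * Returns how many packets got a buffer before the pool ran dry. */
static inline uint32_t toy_burst_accepted(uint32_t capacity, uint32_t burst_len)
{
    toy_pool_t pool     = { capacity, 0u };
    uint32_t   accepted = 0u;

    for (uint32_t i = 0u; i < burst_len; i++)
    {
        if (!toy_pool_produce(&pool))
        {
            break; /* this is where the real code would report an error */
        }
        accepted++;
    }
    return accepted;
}
```

With a burst of 10 packets, a pool of 8 accepts only 8 of them in this model, while a pool of 16 accepts all 10. In practice buffers are released while the burst is still arriving, which would explain why the failure is intermittent rather than constant.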

These settings happen to work with Master Control Panel v3.10.0 because MCP only requests a notification every 10 packets. If a future version of MCP changes how many packets are sent at a time, this could break compatibility with existing bootloaders. As a future enhancement, perhaps Nordic can add a mechanism for the DFU target to tell the host, "I'm sorry, but I can only accept a total of N packets at a time".
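Until such a mechanism exists, one option is to make the assumption about the host explicit at build time. The following is only a sketch: DFU_PKTS_PER_NOTIFICATION and queues_fit_burst are hypothetical names that do not exist in the SDK, and the value 10 is simply what MCP v3.10.0 was observed to use. The two queue sizes are repeated here so the example compiles on its own; in a real bootloader they would come from hci_mem_pool_internal.h and pstorage_platform.h.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdbool.h>
#include <stdint.h>

/* Values from this post, repeated so the sketch is self-contained. */
#define RX_BUF_QUEUE_SIZE          16u
#define PSTORAGE_CMD_QUEUE_SIZE    16u

/* Hypothetical: packets the host sends between notifications (N).
 * 10 is what Master Control Panel v3.10.0 requested. */
#define DFU_PKTS_PER_NOTIFICATION  10u

/* Fail the build if the queues cannot hold one full burst of N packets. */
static_assert((RX_BUF_QUEUE_SIZE & (RX_BUF_QUEUE_SIZE - 1u)) == 0u,
              "RX_BUF_QUEUE_SIZE must be a power of two");
static_assert(RX_BUF_QUEUE_SIZE >= DFU_PKTS_PER_NOTIFICATION,
              "RX buffer queue is smaller than one burst of packets");
static_assert(PSTORAGE_CMD_QUEUE_SIZE > DFU_PKTS_PER_NOTIFICATION,
              "pstorage command queue must be greater than one burst");

static inline bool is_power_of_two(uint32_t n)
{
    return n != 0u && (n & (n - 1u)) == 0u;
}

/* Run-time version of the same checks, e.g. for a value of N that the
 * host sends in its notification request. */
static inline bool queues_fit_burst(uint32_t rx_queue_size,
                                    uint32_t pstorage_queue_size,
                                    uint32_t pkts_per_notification)
{
    return is_power_of_two(rx_queue_size)
        && rx_queue_size >= pkts_per_notification
        && pstorage_queue_size > pkts_per_notification;
}
```

The compile-time checks only protect against a mismatch with the assumed N. A host that requests a larger N would still need to be handled at run time, for example by rejecting the request when queues_fit_burst() returns false.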

  • Thanks for reporting this; I will raise the issue internally. However, I was not able to reproduce it on my end, even after several attempts. The application I uploaded was 124K. Did you make any other modifications to the bootloader that could add overhead? Have you tried uploading the same application using nRF Toolbox on an Android/iOS device? If so, do you see the same issue?

    I think the cause of this problem is that the flash operations are unable to keep up with the incoming data, so the buffers and the queue fill up. Increasing the number of packets between notifications increases the data throughput, because one connection interval is "wasted" on sending each notification packet.

  • I have a similar issue with the nRF52 DK board (PCA10040, nRF52832-QFAABA). With the original dual bank BLE bootloader in SDK 11.0, I tried to upload using nRF Toolbox (ver 3.0.1). I tested many .zip files, but only a .zip file containing just the application, without the SoftDevice and bootloader, succeeded. Every .zip file that included the SoftDevice and bootloader failed (even the default .zip file in nRF Toolbox). I used an iPhone 5S (iOS 9.3.2) and tried both the original bootloader and one with the buffer sizes customized as above. What should I check to track down this problem?
