mesh dfu timeout

Hello

I'm exploring the mesh dfu transfer, get some strange results

Here is the setup

1 distributor, 1 target

On the distributor I start  the transfer with mesh models dfd start 0 0 command

In about 40 minutes it completes, and the image is applied but I get a timeout message on the distributor side

However on the target I see a message from the mcuboot that the new message was applied successfully - and actually it is working just fine

Log from the distributor and the target below

Another question

The progress in the JSON below is "50" which I think is because the function that calculates progress divides by 2:

Function cmd_dfd_receivers_get()  in dfd.c  has this lihe
uint8_t progress = bt_mesh_dfu_cli_progress(&dfd_srv->dfu) / 2;
However if I use 
mesh models dfu cli progress command it returns
DFU progress: 100 %
Bug?
Thank you
Andy
--------------------------------
Logs

Distributor

rtt:~$ [00:40:51.061,126] <dbg> bt_mesh_blob_cli: broadcast_complete: continuing
rtt:~$ [00:40:51.061,157] <dbg> bt_mesh_dfu_cli: dfu_complete:
rtt:~$ [00:40:51.061,187] <dbg> bt_mesh_dfd_srv: dfu_ended: reason: 0, phase: 1, apply: 1
rtt:~$ Distribution phase changed to Applying Update
rtt:~$ [00:40:51.061,218] <dbg> bt_mesh_dfu_cli: apply:
rtt:~$ rtt:~$ [00:40:51.061,248] <dbg> bt_mesh_blob_cli: blob_cli_broadcast: 1 targets
rtt:~$ rtt:~$ [00:40:51.271,606] <dbg> bt_mesh_dfu_cli: handle_status: 0 phase: 6, cur state: 4
rtt:~$ [00:40:51.271,636] <dbg> bt_mesh_dfu_cli: handle_status: Target 0x01a2 receiving transfer
rtt:~$ rtt:~$ [00:41:00.282,928] <dbg> bt_mesh_blob_cli: retry_timeout: 5
rtt:~$ rtt:~$ [00:41:09.441,864] <dbg> bt_mesh_blob_cli: retry_timeout: 4
rtt:~$ rtt:~$ [00:41:18.604,400] <dbg> bt_mesh_blob_cli: retry_timeout: 3
rtt:~$ rtt:~$ [00:41:27.766,143] <dbg> bt_mesh_blob_cli: retry_timeout: 2
rtt:~$ rtt:~$ [00:41:36.924,011] <dbg> bt_mesh_blob_cli: retry_timeout: 1
rtt:~$ [00:41:36.924,041] <dbg> bt_mesh_blob_cli: retry_timeout: Transfer timed out.
rtt:~$ [00:41:36.924,041] <dbg> bt_mesh_blob_cli: drop_remaining_targets:
rtt:~$ [00:41:36.924,072] <wrn> bt_mesh_blob_cli: Dropping 0x01a2: 9
rtt:~$ [00:41:36.924,072] <err> bt_mesh_dfu_cli: Target 0x01a2 failed: 3
rtt:~$ [00:41:36.924,102] <dbg> bt_mesh_blob_cli: broadcast_complete: continuing
rtt:~$ [00:41:36.924,102] <dbg> bt_mesh_dfu_cli: dfu_failed: 3
rtt:~$ [00:41:36.924,133] <dbg> bt_mesh_dfd_srv: dfu_ended: reason: 3, phase: 3, apply: 1
rtt:~$ Distribution phase changed to Failed
rtt:~$ rtt:~$ mesh models dfd receivers-get 0 1
{
"target_cnt": 1,
"targets": {
"0": { "blob_addr": 418, "phase": 6, "status": 3, "blob_status": 9, "progress": 50, "img_idx": 0 }
}
}

-----------------------------------------

Target

*** Booting nRF Connect SDK v3.5.99-ncs1-1 ***
Current image version: 2.1.0+8

~$ Firmware upload started

~$ Firmware upload succeeded
Applying the new firmware
*** Booting nRF Connect SDK v3.5.99-ncs1-1 ***
I: Starting bootloader
I: Primary image: magic=unset, swap_type=0x1, copy_done=0x1, image_ok=0x3
I: Secondary image: magic=good, swap_type=0x2, copy_done=0x3, image_ok=0x3
I: Boot source: none
I: Image index: 0, Swap type: test
I: Starting swap using move algorithm.
I: Bootloader chainload address offset: 0x8000
I: Jumping to the first image slot
*** Booting nRF Connect SDK v3.5.99-ncs1-1 ***
Current image version: 2.1.0+9

Parents
  • Hi, 

    Let me check with the team and then back to you. 

    Regards,
    Amanda H.

  • Thanks

    My guess is that the failure was due to the wrong firmware ID I entered. 

    I'm testing with the correct one, will let you know
    Tried with multiple targets. What surprised me was that the process stopped after the first failure to apply.
    My expectation was that the process would continue until the image is transferred to all targets
    I canceled the update and now it's downloading from the beginning again

    Is there an option to continue downloading to other targets and ignore the errors?

  • Update

    Provided the correct fw id  - did not help. Still got a timeout error verifying and that caused the whole distribution to fail..Calling dfd start without canceling still starts downloading the image from the very beginning and does not pick up where it left off.
    The fiwmare id   of the image that is being distributed is 2.0.1.9 which was encoded as 
    0201000000000009 and provided to the mesh models dfu slot add command

    To clarify, the image is actually applied correctly on the target side but on the distributor side there is a timeout while verifying it.

    Perhaps the timeout on the distributor should be  increased? The image is pretty big,402299 bytes so it takes a while for the mcuboot to swap it

    I'm going to try that next, will update you

    Still would be very helpful to figure out how to ignore apply errors and continue with the distribution

    Thank you

Reply
  • Update

    Provided the correct fw id  - did not help. Still got a timeout error verifying and that caused the whole distribution to fail..Calling dfd start without canceling still starts downloading the image from the very beginning and does not pick up where it left off.
    The fiwmare id   of the image that is being distributed is 2.0.1.9 which was encoded as 
    0201000000000009 and provided to the mesh models dfu slot add command

    To clarify, the image is actually applied correctly on the target side but on the distributor side there is a timeout while verifying it.

    Perhaps the timeout on the distributor should be  increased? The image is pretty big,402299 bytes so it takes a while for the mcuboot to swap it

    I'm going to try that next, will update you

    Still would be very helpful to figure out how to ignore apply errors and continue with the distribution

    Thank you

Children
  • Here is the log of the error

    tt:~$ [10:05:56.667,999] <dbg> bt_mesh_dfu_cli: dfu_complete:
    rtt:~$ [10:05:56.668,029] <dbg> bt_mesh_dfd_srv: dfu_ended: reason: 0, phase: 1, apply: 1
    rtt:~$ Distribution phase changed to Applying Update
    rtt:~$ rtt:~$ [10:05:56.668,060] <dbg> bt_mesh_dfu_cli: apply:
    rtt:~$ [10:05:56.668,090] <dbg> bt_mesh_blob_cli: blob_cli_broadcast: 1 targets
    rtt:~$ rtt:~$ [10:05:56.857,849] <dbg> bt_mesh_dfu_cli: handle_status: 0 phase: 6, cur state: 4
    rtt:~$ [10:05:56.857,879] <dbg> bt_mesh_dfu_cli: handle_status: Target 0x03e5 receiving transfer
    rtt:~$ rtt:~$ [10:06:05.903,381] <dbg> bt_mesh_blob_cli: retry_timeout: 5
    rtt:~$ rtt:~$ [10:06:15.062,103] <dbg> bt_mesh_blob_cli: retry_timeout: 4
    rtt:~$ rtt:~$ [10:06:24.225,219] <dbg> bt_mesh_blob_cli: retry_timeout: 3
    rtt:~$ rtt:~$ [10:06:33.383,605] <dbg> bt_mesh_blob_cli: retry_timeout: 2
    rtt:~$ rtt:~$ [10:06:42.542,877] <dbg> bt_mesh_blob_cli: retry_timeout: 1
    rtt:~$ [10:06:42.542,907] <dbg> bt_mesh_blob_cli: retry_timeout: Transfer timed out.
    rtt:~$ [10:06:42.542,907] <dbg> bt_mesh_blob_cli: drop_remaining_targets:
    rtt:~$ [10:06:42.542,938] <wrn> bt_mesh_blob_cli: Dropping 0x03e5: 9
    rtt:~$ [10:06:42.542,938] <err> bt_mesh_dfu_cli: Target 0x03e5 failed: 3
    rtt:~$ [10:06:42.542,968] <dbg> bt_mesh_blob_cli: broadcast_complete: continuing
    rtt:~$ [10:06:42.542,968] <dbg> bt_mesh_dfu_cli: dfu_failed: 3
    rtt:~$ [10:06:42.542,999] <dbg> bt_mesh_dfd_srv: dfu_ended: reason: 3, phase: 3, apply: 1
    rtt:~$ Distribution phase changed to Failed

    The firmware was updated successfully on node 0x3e5 (997)

    {
    "target_cnt": 8,
    "targets": {
    "0": { "blob_addr": 407, "phase": 2, "status": 2, "blob_status": 9, "progress": 50, "img_idx": 0 },
    "1": { "blob_addr": 416, "phase": 2, "status": 3, "blob_status": 10, "progress": 50, "img_idx": 0 },
    "2": { "blob_addr": 419, "phase": 2, "status": 3, "blob_status": 10, "progress": 50, "img_idx": 0 },
    "3": { "blob_addr": 409, "phase": 2, "status": 3, "blob_status": 10, "progress": 50, "img_idx": 0 },
    "4": { "blob_addr": 997, "phase": 6, "status": 3, "blob_status": 9, "progress": 50, "img_idx": 0 },
    "5": { "blob_addr": 995, "phase": 2, "status": 3, "blob_status": 10, "progress": 50, "img_idx": 0 },
    "6": { "blob_addr": 214, "phase": 2, "status": 3, "blob_status": 10, "progress": 50, "img_idx": 0 },
    "7": { "blob_addr": 418, "phase": 2, "status": 3, "blob_status": 10, "progress": 50, "img_idx": 0 }
    }
    }

    Would be really great if those cryptic status and phase codes were strings.

Related