DFU issue - bt_mesh_cli: dropping

Matej 3 months ago

Dear,

I’m experiencing an issue with performing DFU over BLE Mesh. I have a setup with one node acting as both provisioner and distributor. This node has a new image loaded (with a modified advertising name and an updated imgtool_sign_version). Additionally, one target node (node 3) has been added to the BLE Mesh network.

I’ve uploaded the image to the distributor, added a slot, and registered the receiver. However, when I start the distribution and check the status, I receive a response indicating phase 10 and status 0.

This setup used to work occasionally when using SDK v2.6.2. However, after migrating to SDK v2.7.0, the issue consistently appears.

Do you have any idea what might be causing this?

Here is the SEGGER RTT output:

And the commands from the app:

I have also tried with image index 1, but with no success.

Any ideas or input?

Thank you

Best regards,
Matej

Amanda Hsieh 3 months ago

Hi,

Please try NCS v2.9.1.

Regards,
Amanda H.

Matej 3 months ago in reply to Amanda Hsieh

Hi Amanda,

Thanks for the suggestion. Upgrading to NCS v2.9.1 isn't a trivial task for us at this point. We've already deployed and validated our current setup (NCS v2.7.0) across over 100 nodes in various locations. Repeating the entire validation process would be quite resource-intensive.

Is there any workaround or patch that could address the DFU issue within our existing NCS v2.7.0 setup? If not, any guidance on minimizing the impact of migrating to a newer SDK version would be appreciated.

Best regards,
Matej

Amanda Hsieh 2 months ago in reply to Matej

Phase 10 with status 0 means the target node didn’t respond to the Firmware Update Start message, which prevents the DFU from proceeding. The error codes provided suggest there is an internal error on the target node. Could you share logs from the target side so we can see more detail?

Other things to try: extend timeout_base, and make sure that the imgtool_sign_version is strictly higher than the one already present on the target. Lastly, make sure the target is correctly provisioned and configured for DFU.
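As a rough guide to what extending timeout_base changes, the sketch below computes the transfer timeouts from the formulas in the Mesh BLOB Transfer model as I understand them: client timeout = 10000 * (timeout_base + 2) + 100 * TTL milliseconds, and server timeout = 10 * (timeout_base + 1) seconds. The helper names are hypothetical and not part of the SDK, and the formulas are an assumption that should be checked against the BLOB model documentation for your SDK version.

```c
#include <stdint.h>

/*
 * Hypothetical helpers (not SDK functions). The formulas are assumed from
 * the BLOB Transfer model timeout rules; verify them for your SDK version.
 */

/* Client-side timeout in milliseconds. */
uint32_t blob_client_timeout_ms(uint16_t timeout_base, uint8_t ttl)
{
	return 10000u * ((uint32_t)timeout_base + 2u) + 100u * (uint32_t)ttl;
}

/* Server-side timeout in seconds. */
uint32_t blob_server_timeout_s(uint16_t timeout_base)
{
	return 10u * ((uint32_t)timeout_base + 1u);
}
```

With an example TTL of 7 (an assumption), timeout_base = 0 gives 20700 ms on the client and 10 s on the server, while timeout_base = 2 gives 40700 ms and 30 s.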

Matej 2 months ago in reply to Amanda Hsieh

Hi,

I have added additional logs and noticed that I'm receiving <wrn> bt_mesh_dfu_srv: Wrong state4

void bt_mesh_dfu_srv_applied(struct bt_mesh_dfu_srv *srv)
{
	if (srv->update.phase != BT_MESH_DFU_PHASE_APPLYING) {
		LOG_WRN("Wrong state4");
		return;
	}

	LOG_DBG("");
	srv->update.phase = BT_MESH_DFU_PHASE_IDLE;
	store_state(srv);
}

I have prepared a separate command to initialize DFU. To ensure that the mesh is already initialized, I wait until a few messages have been sent through, and only then send the command for DFU initialization.

The function above is called within dfu_target_image_confirm, as the last step of the DFU initialization routine.

Any idea?

Thanks,

BR,
Matej

Amanda Hsieh 2 months ago in reply to Matej

It would be helpful if you could provide the actual debug logs. The bt_mesh_dfu_srv_applied function is called on every boot, so this warning does not necessarily indicate anything is wrong.

In your original screenshot, the reason is indicated as "9" (BT_MESH_BLOB_ERR_INTERNAL) by the BLOB client. This reason is reported directly by the BLOB server. So, we should check the BLOB server logs as well.

The transfer is orchestrated by the FU server. If something goes wrong in that process, the BLOB server can return this error. It looks like the transfer is failing in the very first phase.
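To make the numeric codes in this thread easier to read, here is a minimal sketch of a lookup for BLOB transfer status values and Firmware Update phases. The tables are a local mirror, assuming the values follow the order of enum bt_mesh_blob_status and enum bt_mesh_dfu_phase in the Zephyr headers; only "9" (BT_MESH_BLOB_ERR_INTERNAL) is confirmed in this thread, so check the rest against the headers in your SDK. The function names are hypothetical.

```c
#include <stddef.h>

/* Local mirror of the status/phase values (assumed order, see above). */
static const char *const blob_status_names[] = {
	"success",                   /* 0 */
	"invalid block number",      /* 1 */
	"invalid block size",        /* 2 */
	"invalid chunk size",        /* 3 */
	"wrong phase",               /* 4 */
	"invalid parameter",         /* 5 */
	"wrong BLOB ID",             /* 6 */
	"BLOB too large",            /* 7 */
	"unsupported transfer mode", /* 8 */
	"internal error",            /* 9 */
	"information unavailable",   /* 10 */
};

static const char *const dfu_phase_names[] = {
	"idle",                /* 0 */
	"transfer error",      /* 1 */
	"transfer active",     /* 2 */
	"verifying",           /* 3 */
	"verification OK",     /* 4 */
	"verification failed", /* 5 */
	"applying",            /* 6 */
	"transfer canceled",   /* 7 */
	"apply success",       /* 8 */
	"apply failed",        /* 9 */
	"unknown",             /* 10 */
};

/* Hypothetical helper: name for a BLOB status code. */
const char *blob_status_str(unsigned int status)
{
	size_t count = sizeof(blob_status_names) / sizeof(blob_status_names[0]);

	return status < count ? blob_status_names[status] : "out of range";
}

/* Hypothetical helper: name for a Firmware Update phase. */
const char *dfu_phase_str(unsigned int phase)
{
	size_t count = sizeof(dfu_phase_names) / sizeof(dfu_phase_names[0]);

	return phase < count ? dfu_phase_names[phase] : "out of range";
}
```

Under these assumptions, blob_status_str(9) yields "internal error", and dfu_phase_str(10) yields "unknown", which fits a target that never reported its phase.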

I would suggest:

Add "CONFIG_BT_MESH_MODEL_LOG_LEVEL_DBG=y" and "CONFIG_LOG_BUFFER_SIZE=2048" to both the distributor and target firmware. Start the process, collect the logs on both sides, and share them with us along with the ".config" file for each.

I strongly recommend trying this with the latest SDK revision first, following the instructions in the target and distributor samples, and only then attempting the same DFU procedure on your existing nodes.

If anything goes wrong during the transfer, you must issue "cancel" to all participating nodes to get them back to the same state, or reboot the devices.

Matej 2 months ago in reply to Amanda Hsieh

Hi Amanda,

Thank you for your inputs. I've successfully migrated to NCS 2.9.1, and most functionalities appear to be working as expected. However, I'm still encountering issues with DFU.

I tested the example from NCS without any code modifications, and I was able to perform a firmware update through the Device Manager without any problems.

In my custom project, I have enabled the suggested logging and I’m seeing the following output. Unfortunately, it shows nothing particularly useful or concrete. Apologies for not sharing the full logs and .config file at this stage, as I’d need to filter out a significant amount of customer-related content.

00> Starting the firmware distribution.
00> Slot:
00> Size: 430487 bytes
00> FWID: 0304050001000000
00> Metadata: 030405000100000097910601844419620200
00> D: Distribution Start: slot: 0, appidx: 0, tb: 0, addr: 0000, ttl: 255, apply: 1
00> Distribution phase changed to Transfer Active
00> D:
00> D: 1 targets
00> D: 5
00> D: 4
00> D: 3
00> D: 2
00> D: 1
00> D: Transfer timed out.
00> D:
00> W: Dropping 0x0003: 9
00> E: Target 0x0003 failed: 3
00> D: continuing
00> D:
00> D: 3
00> D: reason: 3, phase: 1, apply: 1
00> Distribution phase changed to Failed
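The Metadata value in the log above can be decoded to double-check which version and size the distributor is announcing. The sketch below assumes the little-endian layout used by the DFU samples as I understand it: major (1 byte), minor (1), revision (2), build number (4), firmware size (3), core type (1), composition hash (4), element count (2). The struct and function names are hypothetical. The layout is an assumption, although the decoded size does match the "Size: 430487 bytes" line in this log.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local struct; the field layout is an assumption. */
struct fw_metadata {
	uint8_t major;
	uint8_t minor;
	uint16_t revision;
	uint32_t build_num;
	uint32_t fw_size;   /* 24 bits in the encoded form */
	uint8_t core_type;
	uint32_t comp_hash;
	uint16_t elems;
};

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_nibble(char c)
{
	if (c >= '0' && c <= '9') {
		return c - '0';
	}
	if (c >= 'a' && c <= 'f') {
		return c - 'a' + 10;
	}
	if (c >= 'A' && c <= 'F') {
		return c - 'A' + 10;
	}
	return -1;
}

/* Parse the first 18 bytes of a hex string. Returns 0 on success,
 * -1 if the string is too short or contains a non-hex character.
 */
int fw_metadata_parse(const char *hex, struct fw_metadata *out)
{
	uint8_t b[18];

	for (size_t i = 0; i < sizeof(b); i++) {
		int hi = hex_nibble(hex[2 * i]);

		if (hi < 0) {
			return -1; /* also stops at the terminating NUL */
		}

		int lo = hex_nibble(hex[2 * i + 1]);

		if (lo < 0) {
			return -1;
		}
		b[i] = (uint8_t)((hi << 4) | lo);
	}

	out->major = b[0];
	out->minor = b[1];
	out->revision = (uint16_t)(b[2] | (b[3] << 8));
	out->build_num = (uint32_t)b[4] | ((uint32_t)b[5] << 8) |
			 ((uint32_t)b[6] << 16) | ((uint32_t)b[7] << 24);
	out->fw_size = (uint32_t)b[8] | ((uint32_t)b[9] << 8) |
		       ((uint32_t)b[10] << 16);
	out->core_type = b[11];
	out->comp_hash = (uint32_t)b[12] | ((uint32_t)b[13] << 8) |
			 ((uint32_t)b[14] << 16) | ((uint32_t)b[15] << 24);
	out->elems = (uint16_t)(b[16] | (b[17] << 8));

	return 0;
}
```

For the string 030405000100000097910601844419620200, this yields version 3.4.5 with build number 1, a firmware size of 430487 bytes, and 2 elements. The first 8 bytes are identical to the FWID line in the log.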

On the target node, nothing DFU-related appears in the RTT terminal.

We're exploring the possibility of using the same firmware image for both the distributor and target nodes to simplify field deployment. I suspect this approach might be contributing to the issue.

Is it actually supported to have both DFU client and server roles enabled in a single firmware image? If so, do you have an example project or configuration that demonstrates this setup?

Also, could you clarify what the minimal configuration and required function calls are for both the distributor and target nodes?

Thank you again for your support.

Best regards,
Matej

Amanda Hsieh 1 month ago in reply to Matej

Matej said:
Is it actually supported to have both DFU client and server roles enabled in a single firmware image? If so, do you have an example project or configuration that demonstrates this setup?

Yes, please refer to Bluetooth Mesh: Device Firmware Update (DFU) distributor and see the Self-update section.

Matej said:
could you clarify what the minimal configuration and required function calls are for both the distributor and target nodes?

You need to enable the required models in your project configuration:

For the Distributor (DFU client):

CONFIG_BT_MESH_DFD_SRV (Firmware Distribution Server)
CONFIG_BT_MESH_DFU_CLI (Firmware Update Client)
CONFIG_BT_MESH_BLOB_SRV (BLOB Transfer Server)
CONFIG_BT_MESH_BLOB_CLI (BLOB Transfer Client)

For the Target (DFU server):

CONFIG_BT_MESH_DFU_SRV (Firmware Update Server)
CONFIG_BT_MESH_BLOB_SRV (BLOB Transfer Server)

The required function calls are demonstrated in the Bluetooth Mesh: Device Firmware Update (DFU) distributor sample.

Matej 1 month ago in reply to Amanda Hsieh

Dear Amanda,

Thank you for your input. I’ve successfully managed to perform the firmware update. However, I’m now encountering an issue where the device loses its provisioning information after DFU, so re-provisioning is required.

Initially, when I add a new node, I call the function save_settings within the provisioned callback. During normal operation and power cycles everything works fine. After performing DFU, however, I receive the provision_reset callback. Any idea how or why this is triggered?

The new image includes only minimal changes (a few additional printouts and a version number update); everything else remains the same.

Do you have any insights into what might be causing this?

Best regards,
Matej

Matej 1 month ago in reply to Matej

Dear Amanda,

I have checked, and the reset callback dfu_srv_reset is indeed being triggered within zephyr/subsys/bluetooth/mesh/dfu_srv.c.

Could you please clarify why erasing is needed in this context, and whether there's a way to avoid it? I would like to prevent the need for re-provisioning after each DFU.

Looking forward to your insights.

Best regards,
Matej

Matej 1 month ago in reply to Matej

Friendly reminder — are there any updates regarding clearing provisioning information after performing DFU? It’s been 16 days without a response, and we’re still blocked on this.

Thank you for your inputs.

BR,
Matej