DFU issue - bt_mesh_cli: dropping

Dear,

I’m experiencing an issue with performing DFU over Bluetooth Mesh. I have a setup with one node acting as both provisioner and distributor. This node has a new image loaded (with a modified advertising name and an updated imgtool_sign_version). Additionally, there is one target node (node 3) added to the mesh network.

I’ve uploaded the image to the distributor, added a slot, and registered the receiver. However, when I start the distribution and check the status, I receive a response indicating phase 10 and status 0.

This setup used to work occasionally when using SDK v2.6.2. However, after migrating to SDK v2.7.0, the issue consistently appears.

Do you have any idea what might be causing this?

Here is the SEGGER RTT output:

And the commands from the app:

I have also tried with image index 1, but the result is the same.

Any ideas or input?

Thank you 

Best regards,
Matej

  • Hi Amanda,

    Thanks for the suggestion. Upgrading to NCS v2.9.1 isn't a trivial task for us at this point. We've already deployed and validated our current setup (NCS v2.7.0) across over 100 nodes in various locations. Repeating the entire validation process would be quite resource-intensive.

    Is there any workaround or patch that could address the DFU issue within our existing NCS v2.7.0 setup? If not, any guidance on minimizing the impact of migrating to a newer SDK version would be appreciated.

    Best regards,
    Matej

  • Phase 10 with status 0 means the target node didn’t respond to the Firmware Update Start message, which prevents the DFU from proceeding. The error codes provided suggest there is an internal error on the target node. Could you share logs from the target side so we can see more detail?

    Other things to try: extend timeout_base, and make sure that the imgtool_sign_version is strictly higher than the one already present on the target. Lastly, make sure the target is correctly provisioned and configured for DFU.
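
    As a rough illustration of what "strictly higher" means, here is a minimal sketch in plain C. It assumes a version string of the form major.minor.revision with an optional +build suffix. The names img_version, parse_version and version_is_newer are hypothetical and not part of the SDK, and which fields the target actually compares depends on its configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical container for a "major.minor.revision+build" version. */
struct img_version {
	unsigned int major;
	unsigned int minor;
	unsigned int revision;
	unsigned int build;
};

/* Parse "major.minor.revision" with an optional "+build" suffix.
 * Returns false if fewer than three fields could be read.
 */
static bool parse_version(const char *str, struct img_version *out)
{
	assert(str != NULL && out != NULL);

	out->build = 0; /* build defaults to 0 when the suffix is absent */
	int n = sscanf(str, "%u.%u.%u+%u", &out->major, &out->minor,
		       &out->revision, &out->build);
	return n >= 3;
}

/* True only if candidate is strictly higher than installed.
 * Assumption: fields are compared in order major, minor, revision, build.
 */
static bool version_is_newer(const struct img_version *candidate,
			     const struct img_version *installed)
{
	assert(candidate != NULL && installed != NULL);

	if (candidate->major != installed->major) {
		return candidate->major > installed->major;
	}
	if (candidate->minor != installed->minor) {
		return candidate->minor > installed->minor;
	}
	if (candidate->revision != installed->revision) {
		return candidate->revision > installed->revision;
	}
	return candidate->build > installed->build;
}
```

    Under these assumptions, 1.0.1+0 counts as newer than 1.0.0+0, while sending 1.0.0+0 again does not.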

  • Hi,

    I have added additional logs and noticed that I'm receiving <wrn> bt_mesh_dfu_srv: Wrong state4

    void bt_mesh_dfu_srv_applied(struct bt_mesh_dfu_srv *srv)
    {
    	if (srv->update.phase != BT_MESH_DFU_PHASE_APPLYING) {
    		LOG_WRN("Wrong state4");
    		return;
    	}
    
    	LOG_DBG("");
    
    	srv->update.phase = BT_MESH_DFU_PHASE_IDLE;
    	store_state(srv);
    }

    I have prepared a separate command to initialize DFU. To ensure that the mesh is already initialized, I wait until a few messages have been sent through, and only then send the command for DFU initialization.

    The function above is called within dfu_target_image_confirm, as the last step of the DFU initialization routine.

    Any idea?

    Thanks,

    BR,
    Matej

  • It would be helpful if you could provide the actual debug logs. The bt_mesh_dfu_srv_applied function is called on every boot, so this warning does not necessarily indicate anything is wrong.
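
    To illustrate why the warning is harmless on a normal boot, here is a minimal standalone sketch of the same guard. It is not the SDK code: the sketch_ types and function are hypothetical stand-ins, only two phases are modelled, and printf replaces the logging macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, reduced stand-in for the DFU server phases. */
enum sketch_phase {
	SKETCH_PHASE_IDLE,
	SKETCH_PHASE_APPLYING,
};

/* Hypothetical stand-in for the DFU server state. */
struct sketch_dfu_srv {
	enum sketch_phase phase;
};

/* Same shape as the guard quoted above: only a server that is in the
 * APPLYING phase is moved back to IDLE. Returns true if the transition
 * happened, false if the call was ignored.
 */
static bool sketch_dfu_srv_applied(struct sketch_dfu_srv *srv)
{
	assert(srv != NULL);

	if (srv->phase != SKETCH_PHASE_APPLYING) {
		/* Normal boot: no update was being applied, nothing to do. */
		printf("wrn: wrong state %d, ignoring\n", (int)srv->phase);
		return false;
	}

	srv->phase = SKETCH_PHASE_IDLE;
	return true;
}
```

    Calling sketch_dfu_srv_applied() on a server that is already idle only prints the warning and leaves the state untouched. The transition to idle happens only when the server was in the applying phase, that is, on the first boot after a new image was applied.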

    In your original screenshot, the reason is indicated as "9" (BT_MESH_BLOB_ERR_INTERNAL) by the BLOB client. This reason is reported directly by the BLOB server. So, we should check the BLOB server logs as well.  

    The transfer is orchestrated by the FU server. If something is wrong with that process, the BLOB server can return this error. It looks like the transfer is failing in the very first phase.
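
    When reading these numeric codes from the shell or RTT output, a small lookup table can help. The sketch below assumes the enum ordering of the DFU phases and BLOB status codes in Zephyr's mesh headers, so please verify it against the headers of the SDK version in use. The sketch_ function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed order of enum bt_mesh_dfu_phase (BT_MESH_DFU_PHASE_ prefix omitted). */
static const char *const dfu_phase_names[] = {
	"IDLE", "TRANSFER_ERR", "TRANSFER_ACTIVE", "VERIFY", "VERIFY_OK",
	"VERIFY_FAIL", "APPLYING", "TRANSFER_CANCELED", "APPLY_SUCCESS",
	"APPLY_FAIL", "UNKNOWN",
};

/* Assumed order of enum bt_mesh_blob_status (BT_MESH_BLOB_ prefix omitted). */
static const char *const blob_status_names[] = {
	"SUCCESS", "ERR_INVALID_BLOCK_NUM", "ERR_INVALID_BLOCK_SIZE",
	"ERR_INVALID_CHUNK_SIZE", "ERR_WRONG_PHASE", "ERR_INVALID_PARAM",
	"ERR_WRONG_BLOB_ID", "ERR_BLOB_TOO_LARGE", "ERR_UNSUPPORTED_MODE",
	"ERR_INTERNAL", "ERR_INFO_UNAVAILABLE",
};

/* Bounds-checked table lookup shared by both decoders. */
static const char *lookup(const char *const *table, size_t len,
			  unsigned int code)
{
	assert(table != NULL);
	return code < len ? table[code] : "(out of range)";
}

/* Translate a numeric DFU phase, as printed in the logs, into a name. */
static const char *sketch_dfu_phase_str(unsigned int phase)
{
	return lookup(dfu_phase_names,
		      sizeof(dfu_phase_names) / sizeof(dfu_phase_names[0]),
		      phase);
}

/* Translate a numeric BLOB status or reason code into a name. */
static const char *sketch_blob_status_str(unsigned int status)
{
	return lookup(blob_status_names,
		      sizeof(blob_status_names) / sizeof(blob_status_names[0]),
		      status);
}
```

    With this table, reason 9 decodes to ERR_INTERNAL, matching what the BLOB client reported, and phase 10 decodes to UNKNOWN.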

    I would suggest: 

    1. Add "CONFIG_BT_MESH_MODEL_LOG_LEVEL_DBG=y" and "CONFIG_LOG_BUFFER_SIZE=2048" to both the distributor and target firmware. Start the process, collect logs on both sides, and share them with us along with the ".config" file for each.
    2. I strongly recommend trying this with the latest SDK revision first, following the instructions from the target and distributor samples, and then attempting the same DFU procedure on your existing nodes.
    3. If anything goes wrong during the transfer, you must issue "cancel" to all participating nodes to get them back to the same state, or reboot the devices.