
Mesh DFU problem

We have a Mesh app that should be updated via DFU.
The problem is that if a device restarts (for whatever reason) in the middle of a DFU, or is powered up later (i.e. was not running during the DFU), it never receives the update.
Here is an app log (device started after the DFU):

<t:         23>, ble_softdevice_support.c,  162, sd_ble_enable: app_ram_base should be adjusted to 0x20002DA0
<t:        489>, main.c,   68, Initializing and adding models
<t:        496>, main.c,  111, rom_base   26201
<t:        498>, main.c,  112, rom_end    424D4
<t:        500>, main.c,  113, rom_length 1C2D3
<t:        502>, main.c,  114, bank_addr   43000
<t:        509>, bizlogic.c,  213, Bizlogic init
<t:        511>, gap_listener.c,   85, GAP scanner started.
<t:        514>, gap_advertiser.c,   74, GAP advertiser init
<t:       5389>, nrf_mesh_dfu.c,  529, 	RADIO TX! SLOT 0, count 255, interval: periodic, handle: FFFE
<t:       5398>, main.c,  141, Started.
<t:       5510>, nrf_mesh_dfu.c,  391, 	New firmware!
<t:       5512>, dfu.c,   48, NRF_MESH_EVT_DFU_FIRMWARE_OUTDATED_NO_AUTH
<t:       5515>, nrf_mesh_dfu.c,  529, 	RADIO TX! SLOT 0, count 255, interval: periodic, handle: FFFD
<t:       5519>, nrf_mesh_dfu.c,  535, Killing a TX slot prematurely (repeats done: 0).
<t:       8167>, nrf_mesh_dfu.c,  529, 	RADIO TX! SLOT 0, count 255, interval: periodic, handle: FFFD
<t:       8171>, nrf_mesh_dfu.c,  535, Killing a TX slot prematurely (repeats done: 0).

The device obviously knows that it should be updated (NRF_MESH_EVT_DFU_FIRMWARE_OUTDATED_NO_AUTH), but it fails to transfer the firmware (Killing a TX slot prematurely).
How can we fix this?

Also, when does transmission of the update over the Mesh stop?
If we have a Client with ID = 1 and a Server with ID = 2, and the Server devices are now updated, they still broadcast this new firmware to all devices. If you then want to update the Client device, it is already receiving the update from the Server devices before you can start, and this update is transmitted all the time.
How can we stop firmware retransmission over the Mesh, so that the Client can be updated without the Server firmware being relayed? If you send an init packet via serial to the Client, you get *84 78 87*.
And how can we then update the Server devices to a newer version after a day or so if some device is still broadcasting the older firmware?

[Mesh SDK 3.1, nRF SDK 15.2, SD 6.1, nRF52840]

  • Hello,

    The device obviously knows that it should be updated (NRF_MESH_EVT_DFU_FIRMWARE_OUTDATED_NO_AUTH), but it fails to transfer the firmware (Killing a TX slot prematurely).

    Did you also restart the transmission via serial to the first device, or does the device join in the middle of the same transmission during which it was reset?

    Also, when does transmission of the update over the Mesh stop?

    The DFU will eventually time out. Check out TIMER_START_TIMEOUT_US and TIMER_DATA_TIMEOUT_US in nrf_mesh_dfu.c, on lines 74 and 75.

     

    If we have a Client with ID = 1 and a Server with ID = 2, and the Server devices are now updated, they still broadcast this new firmware to all devices

     Yes.

    and if you want to update the Client device, before you can start it is already receiving the update from the Server devices, and this update is transmitted all the time.

     I don't understand what you are asking here.

    How can we stop firmware retransmission over the Mesh, so that the Client can be updated without the Server firmware being relayed? If you send an init packet via serial to the Client, you get *84 78 87*.
    And how can we then update the Server devices to a newer version after a day or so if some device is still broadcasting the older firmware?

     They will not retransmit for that long. You can't disable retransmits. But again, check the timeout variables: they decide how long a node stays on the same update before it times out.

    BR,

    Edvin

  • It fails in all cases.

    Examples
    1. -Run DFU for Server on 4 devices: 1 Client (serial) and 3 Servers
        -In the middle of the DFU, disconnect 1 Server
        -Reconnect it; only "NRF_MESH_EVT_DFU_FIRMWARE_OUTDATED_NO_AUTH" occurs and that is it. It gets as far as adding an event to the timer, but nothing else happens.
        -After some time, the other 2 Servers get updated and that is it. The one that was disconnected stays outdated and never gets the update

    2. -Run DFU for Server on 4 devices: 1 Client (serial) and 3 Servers
        -End the DFU successfully
        -Connect a new Server, which has an older version, to the network
        -Same thing as in example 1: it shows "New firmware!" but nothing happens and it never updates

    So if a device loses connection, restarts, or is newly connected, it will never update to the current mesh firmware.

  • Tomi said:
    So if a device loses connection, restarts, or is newly connected, it will never update to the current mesh firmware.

     If a device joins in the middle of a DFU, it will have missed some packets, and therefore it will not continue to write the rest of the packets. What you need to do is start the update again with the same DFU image once the device that was turned off is back on. It will then accept the new transfer, and the rest of the nodes will reject it (because they are already up to date, running an application with the same application version number). However, the other nodes will still retransmit/relay the DFU packets, to help the remaining nodes get up to date.
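
The accept/reject rule described above can be sketched roughly as follows. This is not the bootloader's code: `app_id_t` and `dfu_should_accept` are hypothetical names, and the sketch assumes a node takes an image only when it targets the same application and carries a strictly newer version.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical application identity; field names are assumptions. */
typedef struct {
    uint32_t company_id;
    uint16_t app_id;
    uint32_t app_version;
} app_id_t;

/* Returns true when the offered image is for this application and is
 * strictly newer than what the node already runs. A node that already
 * has the same version rejects the image (it may still relay packets). */
bool dfu_should_accept(const app_id_t *current, const app_id_t *offered)
{
    return current->company_id == offered->company_id
        && current->app_id == offered->app_id
        && offered->app_version > current->app_version;
}
```

With this rule, restarting the same image is harmless for nodes that are already up to date: only the node that is still on the old version treats itself as a target.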

    BR,

    Edvin

  • Thank you for the response!

    So if we have
    Server 1 - 50% updated
    Server 2 - 50% updated
    Server 3 - disconnected, just came back online - 0%

    1. Can we detect the Server 3 power-on and, without sending the init packet to the Client again, just start the DFU from 0%? Will Server 1 and Server 2 then wait until Server 3 gets to 50%, so that all three are at 50% and the DFU resumes on all devices until 100%? So basically, will they wait until this device catches up with them?

    2. If this is true: the DFU timeout is currently 1 h, and the DFU is 95% done (~50 min) on Servers 1 and 2. If we start the DFU again because Server 3 is at 0%, will Servers 1 and 2 time out after 10 minutes, or is the timeout reset in this case, so that it will not occur on Servers 1 and 2 while Server 3 is updating?

    Thanks in advance!

  • 1: Have you tested? I haven't studied the state machine flow for the Mesh bootloader, but it should be quite easy to test.

    2: Yes. If the overall timeout is 1 h, the DFU will time out after 1 hour regardless, as long as the DFU is not done. However, this applies to TIMER_START_TIMEOUT_US. TIMER_DATA_TIMEOUT_US will not expire as long as the node keeps receiving DFU packets, regardless of whether they are for itself or for another node.
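
A minimal sketch of the two timeouts as described in this answer, not taken from nrf_mesh_dfu.c. All names (`dfu_timers_t`, `dfu_timers_start`, `dfu_timers_on_packet`, `dfu_timers_expired`) are hypothetical, and the model assumes one fixed overall deadline plus one data deadline that every received DFU packet pushes forward.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two timeouts described in the answer above. */
typedef struct {
    uint64_t overall_deadline_us; /* fixed when the transfer starts */
    uint64_t data_deadline_us;    /* moved forward by every DFU packet */
    uint64_t data_timeout_us;     /* length of the data window */
} dfu_timers_t;

void dfu_timers_start(dfu_timers_t *t, uint64_t now_us,
                      uint64_t overall_timeout_us, uint64_t data_timeout_us)
{
    t->overall_deadline_us = now_us + overall_timeout_us;
    t->data_timeout_us = data_timeout_us;
    t->data_deadline_us = now_us + data_timeout_us;
}

/* Called for any DFU packet, whether or not it is meant for this node. */
void dfu_timers_on_packet(dfu_timers_t *t, uint64_t now_us)
{
    t->data_deadline_us = now_us + t->data_timeout_us;
}

/* The transfer is abandoned when either deadline has passed. */
bool dfu_timers_expired(const dfu_timers_t *t, uint64_t now_us)
{
    return now_us >= t->overall_deadline_us || now_us >= t->data_deadline_us;
}
```

In this model, packets addressed to Server 3 keep the data window open on Servers 1 and 2, but nothing extends the overall deadline.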

    Best regards,

    Edvin

  • 1. I have tested this and it works, but when sending the DFU I have to set the TID value to the same one as before, because this value is randomly generated when you start a DFU. Otherwise, on the next DFU only new devices / already-updated devices will take part, not devices waiting at, say, 50%.

    2. OK, thank you!

    3. In our tests, this DFU is not really usable in a working environment. It looks like it works only if everything is perfect, and we don't know what the best course of action would be.

    -If devices are far apart, with 3+ hops between mesh devices, the DFU usually just aborts on the device farthest from the device running the DFU, even when sending really slowly (one packet every 2+ seconds; tested with all devices provisioned and sending something every 10 s).
    -You never know when some device stops at 1%, and the nrfutil progress is of little use, because you only know that the device on serial got the update; the others remain a mystery until the update reaches 100% and you request their version.

    So the only thing I can see that can be done is:
    -Run the DFU from another device (not, for example, the Client that has the provisioned devices)
    -Send all DFU_START / DFU_END events from all nodes to the Client
    -If you get a DFU_END from some Server, redo the whole DFU procedure
    -Pray it works, or you can be running the DFU all day

    Is there something else we can do here to make this DFU more reliable and less time-consuming?
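
The bookkeeping for the workaround listed above could look roughly like this. It is only a sketch: `dfu_tracker_t` and the `tracker_*` functions are hypothetical, and it assumes each node's end report carries a flag saying whether the transfer completed, which is not something this thread confirms.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES 8 /* arbitrary limit for the sketch */

typedef enum { NODE_UNKNOWN, NODE_STARTED, NODE_DONE, NODE_ABORTED } node_state_t;

typedef struct {
    uint16_t addr;
    node_state_t state;
} node_entry_t;

/* Zero-initialise before use: dfu_tracker_t t = {0}; */
typedef struct {
    node_entry_t nodes[MAX_NODES];
    size_t count;
} dfu_tracker_t;

/* Looks up a node by address, adding it if there is room. */
static node_entry_t *tracker_find_or_add(dfu_tracker_t *t, uint16_t addr)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->nodes[i].addr == addr) {
            return &t->nodes[i];
        }
    }
    if (t->count == MAX_NODES) {
        return NULL; /* table full, report is dropped */
    }
    node_entry_t *e = &t->nodes[t->count++];
    e->addr = addr;
    e->state = NODE_UNKNOWN;
    return e;
}

/* Record a start report from a node. */
void tracker_on_start(dfu_tracker_t *t, uint16_t addr)
{
    node_entry_t *e = tracker_find_or_add(t, addr);
    if (e != NULL) {
        e->state = NODE_STARTED;
    }
}

/* Record an end report; `completed` is an assumed field of that report. */
void tracker_on_end(dfu_tracker_t *t, uint16_t addr, bool completed)
{
    node_entry_t *e = tracker_find_or_add(t, addr);
    if (e != NULL) {
        e->state = completed ? NODE_DONE : NODE_ABORTED;
    }
}

/* True if at least one node ended its transfer without completing it. */
bool tracker_needs_retry(const dfu_tracker_t *t)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->nodes[i].state == NODE_ABORTED) {
            return true;
        }
    }
    return false;
}
```

The device driving the DFU would poll `tracker_needs_retry` and restart the same image when it returns true.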

    Thank you in advance!

  • DFU on Mesh is time-consuming. This is because Mesh is a low-power network with low throughput. When you say it isn't working with 3+ hops: are the nodes far apart? Do you regularly experience a lot of packet loss between these nodes? Have you tried turning up the relay count on the nodes?

    Look in nrf_mesh_config_core.h. What are your MESH_FEATURE_RELAY_ENABLED and CORE_TX_REPEAT_RELAY_DEFAULT set to? If they are both 1, can you try increasing CORE_TX_REPEAT_RELAY_DEFAULT to 2 or 3?
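
As a sketch, the suggested change might look like the fragment below. It assumes both macros can be overridden from an application-level config header rather than edited in place, and the value 3 is just the upper end of the range suggested above. Whether these settings also affect how DFU packets are relayed is not confirmed in this thread.

```c
/* Assumed application-level override of the defaults in nrf_mesh_config_core.h. */
#define MESH_FEATURE_RELAY_ENABLED   1 /* keep relaying enabled */
#define CORE_TX_REPEAT_RELAY_DEFAULT 3 /* was 1; value suggested above is 2 or 3 */
```

A higher repeat count means more airtime per relayed packet, so it is worth raising it step by step and checking packet loss after each change.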

    How many nodes are in the network that you are trying to perform the DFU on?

    BR,

    Edvin
