Mesh DFU - Free flash space & bank erasure

I have a few questions about how exactly the free flash space is intended to be managed with a banked DFU transfer. We observed some devices encounter problems after a DFU to a new application version based on Mesh SDK v5.0, which introduces a new file to the flash storage in order to persist the replay cache. On startup the application attempts to allocate flash storage space out of what was previously free flash space. The flash manager code contains sanity checks to ensure that it's not overwriting some other piece of application data. In our case it appears that the flash space the application was intending to claim had been previously used for DFU transfer and never erased, causing the flash manager's assertions to fail.

Here are my questions:

1. What code exists to prevent a DFU transfer from overwriting the "Application data" section, as shown in the diagram here? I understand the code to compute the start of the bank, but shouldn't there also be a sanity check somewhere to ensure that a large (potentially malicious) DFU will not exceed the free space and start overwriting the application data? I wasn't able to find anything that would prevent this from happening.

2. Is there code somewhere to erase the bank after the transfer is complete? I found this comment which I wasn't sure how to interpret: by "erase the bank entry" does that mean simply updating the device page? If there's nothing that would erase the bank itself, then I don't understand how the code in the flash manager which tries to ensure that newly-claimed flash areas are "safe" to use is backwards-compatible. Any flash memory previously used for a DFU transfer could in principle contain arbitrary data. If the device is then DFU'd to a version that defines a new mesh file, the flash manager will then try to claim that space but fail its sanity checks.

  • Hi,

    I have forwarded this to our mesh team, I will give you an update as soon as possible.

  • Hi,

    It looks like both concerns are correct. These are bugs but this will most likely be not be fixed since the nRF5 SDK for Mesh is in maintenance mode. 

    We can only recommend cleaning up the flash area that is required for Mesh
    Configuration before running Mesh after DFU.  

  • Hi,

    An update from the team:

    1. The first issue might be solved in two ways:

    1.1 Modify DFU command DFU_START_TARGET and add as a command field the address of the app mesh config end. Then modify bootloader to consider this address instead of BOOTLOADERADDR. Pros: automated communication between app and bootloader that considers ongoing sizes of areas. Cons: it requires fixing both bootloader and app dfu parts. Upload both bootloader and app but bootloader only once.

    It will require uploading application first. Then uploading of bootloader. Since command format incompatibility it will corrupt some memory in the command buffer but it should be safe enough.

    1.2 Add compile-time definition in bootloader then bootloader can consider BOOTLOADERADDR + NEW_DEFINED_ADDRESS. Then you should follow up on this.
    Pros: it requires fixing only bootloader. Cons: it requires following up on app mesh config size and definition and aligning them manually. Upload both bootloader and app every definition change.

    2. The second issue requires implementation erasing allocated bank if bootloader succeeds uploaded image applying.

    The sequence of applying patches

    1. add_cleaningup_after_succeed   ---- erase flash bank area after dfu succeeded
    2. add_app_data_as_upper_boundary  ------ reject image if it overlaps with application data area
    3. fix_assertion_during_bootbank_erasing ----- fix assertion if bootloader bank is erased
    4. rebuild application and bootloader

    `DFU start target` command between application and bootloader has been changed and is not compatible with old versions. 

    To upgrade devices with 5.0.0 on the current one you need to upload the application image first and then upload the bootloader image (not another way around).

     doc/user_guide/modules/dfu_integrating_into_app.md |  2 ++
     mesh/bootloader/include/bl_if.h                    |  1 +
     mesh/bootloader/include/dfu_mesh.h                 |  2 +-
     mesh/bootloader/src/bootloader.c                   |  3 +-
     mesh/bootloader/src/bootloader_app_bridge.c        |  8 +++--
     mesh/bootloader/src/dfu_mesh.c                     | 26 ++++++++-------
     mesh/core/include/bl_if.h                          |  7 ++--
     mesh/dfu/api/nrf_mesh_dfu_types.h                  |  2 +-
     mesh/dfu/src/nrf_mesh_dfu.c                        | 39 ++++++++++++++++------
     9 files changed, 59 insertions(+), 31 deletions(-)
    
    diff --git a/doc/user_guide/modules/dfu_integrating_into_app.md b/doc/user_guide/modules/dfu_integrating_into_app.md
    index 1d7f6a0de..2f1bc268b 100644
    --- a/doc/user_guide/modules/dfu_integrating_into_app.md
    +++ b/doc/user_guide/modules/dfu_integrating_into_app.md
    @@ -92,6 +92,8 @@ transfer is in progress.
     
     Make the application choose a bank address that has enough space for the entire firmware image,
     without overlapping with the final location of the transferred firmware.
    +The upper boundary of the bank area is the start address of the application data area
    +in case of Mesh configuration module usage. Otherwise, it is the bootloader start address. 
     
     Since the size info for the incoming firmware image is not available to the device
     in the firmware ID packet, leave as much space as possible for the bank
    diff --git a/mesh/bootloader/include/bl_if.h b/mesh/bootloader/include/bl_if.h
    index 31b19f747..b91fd19df 100644
    --- a/mesh/bootloader/include/bl_if.h
    +++ b/mesh/bootloader/include/bl_if.h
    @@ -194,6 +194,7 @@ struct bl_cmd
                         dfu_type_t      type;
                         fwid_union_t    fwid;
                         uint32_t*       p_bank_start;
    +                    uint32_t*       p_upper_bound;
                     } target;
                     struct
                     {
    diff --git a/mesh/bootloader/include/dfu_mesh.h b/mesh/bootloader/include/dfu_mesh.h
    index 5afa39ac8..a681ecc4c 100644
    --- a/mesh/bootloader/include/dfu_mesh.h
    +++ b/mesh/bootloader/include/dfu_mesh.h
    @@ -52,7 +52,7 @@ void dfu_mesh_start(void);
     uint32_t dfu_mesh_rx(dfu_packet_t* p_packet, uint16_t length, bool from_serial);
     void dfu_mesh_timeout(void);
     void dfu_mesh_packet_set_local_fields(mesh_packet_t* p_packet, uint8_t dfu_packet_len);
    -uint32_t dfu_mesh_req(dfu_type_t type, fwid_union_t* p_fwid, uint32_t* p_bank_addr);
    +uint32_t dfu_mesh_req(dfu_type_t type, fwid_union_t* p_fwid, uint32_t* p_bank_addr, uint32_t* p_upper_bound);
     uint32_t dfu_mesh_relay(dfu_type_t type, fwid_union_t* p_fwid, uint32_t transaction_id);
     dfu_type_t dfu_mesh_missing_type_get(void);
     bool dfu_mesh_app_is_valid(void);
    diff --git a/mesh/bootloader/src/bootloader.c b/mesh/bootloader/src/bootloader.c
    index 1167c167e..1949dcc6a 100644
    --- a/mesh/bootloader/src/bootloader.c
    +++ b/mesh/bootloader/src/bootloader.c
    @@ -306,6 +306,7 @@ static uint32_t bl_evt_handler(bl_evt_t* p_evt)
                            run unless there's an actual reason for it. */
                         rsp_cmd.type = BL_CMD_TYPE_DFU_START_TARGET;
                         rsp_cmd.params.dfu.start.target.p_bank_start = (uint32_t*) 0xFFFFFFFF; /* no banking */
    +                    rsp_cmd.params.dfu.start.target.p_upper_bound = (uint32_t*) BOOTLOADERADDR();
                         rsp_cmd.params.dfu.start.target.type = p_evt->params.dfu.new_fw.fw_type;
                         rsp_cmd.params.dfu.start.target.fwid = p_evt->params.dfu.new_fw.fwid;
                         respond = true;
    @@ -547,7 +548,7 @@ void bootloader_enable(void)
                         APP_ERROR_CHECK(NRF_ERROR_INVALID_DATA);
                 }
     
    -            (void) dfu_mesh_req(missing, &req_fwid, (uint32_t*) 0xFFFFFFFF);
    +            (void) dfu_mesh_req(missing, &req_fwid, (uint32_t*) 0xFFFFFFFF, (uint32_t*) BOOTLOADERADDR());
             }
         }
     }
    diff --git a/mesh/bootloader/src/bootloader_app_bridge.c b/mesh/bootloader/src/bootloader_app_bridge.c
    index 7009d13c7..3d5a6bbc8 100644
    --- a/mesh/bootloader/src/bootloader_app_bridge.c
    +++ b/mesh/bootloader/src/bootloader_app_bridge.c
    @@ -129,13 +129,15 @@ uint32_t bl_cmd_handler(bl_cmd_t* p_bl_cmd)
                 return m_evt_handler(&rsp_evt);
     
             case BL_CMD_TYPE_DFU_START_TARGET:
    -            __LOG("\tStart target (type 0x%x, bank 0x%x)\n",
    +            __LOG("\tStart target (type 0x%x, bank 0x%x, upper bound 0x%x)\n",
                         p_bl_cmd->params.dfu.start.target.type,
    -                    p_bl_cmd->params.dfu.start.target.p_bank_start);
    +                    p_bl_cmd->params.dfu.start.target.p_bank_start,
    +                    p_bl_cmd->params.dfu.start.target.p_upper_bound);
     
                 return dfu_mesh_req(p_bl_cmd->params.dfu.start.target.type,
                         &p_bl_cmd->params.dfu.start.target.fwid,
    -                    p_bl_cmd->params.dfu.start.target.p_bank_start);
    +                    p_bl_cmd->params.dfu.start.target.p_bank_start,
    +                    p_bl_cmd->params.dfu.start.target.p_upper_bound);
     
             case BL_CMD_TYPE_DFU_START_RELAY:
                 return dfu_mesh_relay(
    diff --git a/mesh/bootloader/src/dfu_mesh.c b/mesh/bootloader/src/dfu_mesh.c
    index 8f963d416..1aec265cc 100644
    --- a/mesh/bootloader/src/dfu_mesh.c
    +++ b/mesh/bootloader/src/dfu_mesh.c
    @@ -101,6 +101,7 @@ typedef struct
         dfu_type_t      type;
         uint32_t*       p_start_addr;
         uint32_t*       p_bank_addr;
    +    uint32_t*       p_upper_bound;
         uint32_t*       p_indicated_start_addr;
         uint32_t*       p_last_requested_entry;
         uint32_t        length;
    @@ -770,9 +771,9 @@ static uint32_t target_rx_start(dfu_packet_t* p_packet, bool* p_do_relay)
                 uint32_t upper_limit =
                     (m_bl_info_pointers.p_segment_app->start) +
                     (m_bl_info_pointers.p_segment_app->length);
    -            if (upper_limit > BOOTLOADERADDR())
    +            if (upper_limit > (uint32_t)m_transaction.p_upper_bound)
                 {
    -                upper_limit = BOOTLOADERADDR();
    +                upper_limit = (uint32_t)m_transaction.p_upper_bound;
                 }
                 /* Place bootloader bank at end of app section */
                 m_transaction.p_bank_addr = (uint32_t*) (upper_limit -
    @@ -786,15 +787,14 @@ static uint32_t target_rx_start(dfu_packet_t* p_packet, bool* p_do_relay)
                 m_transaction.p_bank_addr = m_transaction.p_start_addr;
             }
         }
    -    else
    +
    +    if ((uint32_t) m_transaction.p_bank_addr + m_transaction.length >
    +                (uint32_t) m_transaction.p_upper_bound)
         {
    -        if ((uint32_t) m_transaction.p_bank_addr + m_transaction.length > BOOTLOADERADDR())
    -        {
    -            send_end_evt(DFU_END_ERROR_BANK_IN_BOOTLOADER_AREA);
    -            start_find_fwid();
    -            /* Return error to not add packet to cache. */
    -            return NRF_ERROR_INVALID_LENGTH;
    -        }
    +        send_end_evt(DFU_END_ERROR_BANK_IN_BOOTLOADER_AREA);
    +        start_find_fwid();
    +        /* Return error to not add packet to cache. */
    +        return NRF_ERROR_INVALID_LENGTH;
         }
     
         if (m_transaction.p_start_addr != m_transaction.p_bank_addr &&
    @@ -1288,14 +1288,15 @@ void dfu_mesh_start(void)
         }
     }
     
    -uint32_t dfu_mesh_req(dfu_type_t type, fwid_union_t* p_fwid, uint32_t* p_bank_addr)
    +uint32_t dfu_mesh_req(dfu_type_t type, fwid_union_t* p_fwid,
    +                      uint32_t* p_bank_addr, uint32_t* p_upper_bound)
     {
         if (m_state != DFU_STATE_FIND_FWID)
         {
             return NRF_ERROR_INVALID_STATE;
         }
     
    -    if ((uint32_t) p_bank_addr >= BOOTLOADERADDR() &&
    +    if ((uint32_t) p_bank_addr >= (uint32_t) p_upper_bound &&
             (uint32_t) p_bank_addr != 0xFFFFFFFF)
         {
             __LOG("Attempting to overwrite bootloader\n");
    @@ -1304,6 +1305,7 @@ uint32_t dfu_mesh_req(dfu_type_t type, fwid_union_t* p_fwid, uint32_t* p_bank_ad
     
         __LOG("REQUESTING TRANSFER OF %s\n", m_dfu_type_strs[(uint32_t) type]);
         m_transaction.p_bank_addr = p_bank_addr;
    +    m_transaction.p_upper_bound = p_upper_bound;
         start_req(type, p_fwid);
     
         return NRF_SUCCESS;
    diff --git a/mesh/core/include/bl_if.h b/mesh/core/include/bl_if.h
    index 60855d96c..cce137be0 100644
    --- a/mesh/core/include/bl_if.h
    +++ b/mesh/core/include/bl_if.h
    @@ -199,9 +199,10 @@ typedef struct
     /** DFU start target command parameters, used with @ref BL_CMD_TYPE_DFU_START_TARGET. */
     typedef struct
     {
    -    uint8_t type;                     /**< DFU type of the desired transfer. */
    -    nrf_mesh_fwid_t     fwid;         /**< FWID of the DFU transfer to request. */
    -    const uint32_t *    p_bank_start; /**< Pointer to start of transfer bank. */
    +    uint8_t type;                      /**< DFU type of the desired transfer. */
    +    nrf_mesh_fwid_t     fwid;          /**< FWID of the DFU transfer to request. */
    +    const uint32_t *    p_bank_start;  /**< Pointer to start of transfer bank. */
    +    const uint32_t *    p_upper_bound; /**< Pointer to start address of application data. */
     } bl_cmd_dfu_start_target_t;
     
     /** DFU start relay command parameters, used with @ref BL_CMD_TYPE_DFU_START_RELAY. */
    diff --git a/mesh/dfu/api/nrf_mesh_dfu_types.h b/mesh/dfu/api/nrf_mesh_dfu_types.h
    index da7f8aacc..431553339 100644
    --- a/mesh/dfu/api/nrf_mesh_dfu_types.h
    +++ b/mesh/dfu/api/nrf_mesh_dfu_types.h
    @@ -153,7 +153,7 @@ typedef enum
         NRF_MESH_DFU_END_ERROR_BANK_AND_DESTINATION_OVERLAP,    /**< When copying the finished bank to its intended destination, it will have to overwrite itself. */
     
         /** @internal Largest number in the enum. */
    -    NRF_MESH_DFU_END_ERROR__LAST = NRF_MESH_DFU_END_ERROR_BANK_IN_BOOTLOADER_AREA,
    +    NRF_MESH_DFU_END_ERROR__LAST = NRF_MESH_DFU_END_ERROR_BANK_AND_DESTINATION_OVERLAP,
     } nrf_mesh_dfu_end_t;
     
     /** The various roles a device can play in a dfu transfer. */
    diff --git a/mesh/dfu/src/nrf_mesh_dfu.c b/mesh/dfu/src/nrf_mesh_dfu.c
    index 49fde514b..5c2bde911 100644
    --- a/mesh/dfu/src/nrf_mesh_dfu.c
    +++ b/mesh/dfu/src/nrf_mesh_dfu.c
    @@ -53,6 +53,7 @@
     #include "hal.h"
     #include "advertiser.h"
     #include "ad_listener.h"
    +#include "mesh_stack.h"
     #ifdef NRF_MESH_SERIAL_ENABLE
     #include "serial.h"
     #endif
    @@ -838,18 +839,36 @@ uint32_t nrf_mesh_dfu_request(nrf_mesh_dfu_type_t type,
         cmd.params.dfu.start.target.type = type;
         cmd.params.dfu.start.target.fwid = *p_fwid;
         cmd.params.dfu.start.target.p_bank_start = p_bank_addr;
    -    uint32_t error_code = nrf_mesh_dfu_cmd_send(&cmd);
    -    if (error_code == NRF_SUCCESS)
    +
    +    uint32_t dummy;
    +    uint32_t error_code =
    +            mesh_stack_persistence_flash_usage(&cmd.params.dfu.start.target.p_upper_bound,
    +                                               &dummy);
    +    if (error_code != NRF_SUCCESS)
         {
    -        m_transfer_state.role = NRF_MESH_DFU_ROLE_TARGET;
    -        m_transfer_state.type = type;
    -        m_transfer_state.fwid = *p_fwid;
    -        m_transfer_state.state = NRF_MESH_DFU_STATE_DFU_REQ;
    -        m_transfer_state.segment_count = 0;
    -        m_transfer_state.total_segments = 0;
    -        m_timer_evt.cb = abort_timeout;
    -        timer_sch_reschedule(&m_timer_evt, timer_now() + NRF_MESH_DFU_REQ_TIMEOUT_US);
    +        return error_code;
    +    }
    +
    +    if (cmd.params.dfu.start.target.p_upper_bound == NULL)
    +    {
    +        cmd.params.dfu.start.target.p_upper_bound = (uint32_t *) BOOTLOADERADDR();
         }
    +
    +    error_code = nrf_mesh_dfu_cmd_send(&cmd);
    +    if (error_code != NRF_SUCCESS)
    +    {
    +        return error_code;
    +    }
    +
    +    m_transfer_state.role = NRF_MESH_DFU_ROLE_TARGET;
    +    m_transfer_state.type = type;
    +    m_transfer_state.fwid = *p_fwid;
    +    m_transfer_state.state = NRF_MESH_DFU_STATE_DFU_REQ;
    +    m_transfer_state.segment_count = 0;
    +    m_transfer_state.total_segments = 0;
    +    m_timer_evt.cb = abort_timeout;
    +    timer_sch_reschedule(&m_timer_evt, timer_now() + NRF_MESH_DFU_REQ_TIMEOUT_US);
    +
         return error_code;
     }
     
    
     mesh/bootloader/include/dfu_types_mesh.h |  2 +-
     mesh/bootloader/src/bootloader_info.c    |  4 ++--
     mesh/bootloader/src/dfu_bank.c           | 25 ++++++++++++++-----------
     mesh/core/include/bl_info_types.h        |  2 +-
     4 files changed, 18 insertions(+), 15 deletions(-)
    
    diff --git a/mesh/bootloader/include/dfu_types_mesh.h b/mesh/bootloader/include/dfu_types_mesh.h
    index fff0f1531..88f90025e 100644
    --- a/mesh/bootloader/include/dfu_types_mesh.h
    +++ b/mesh/bootloader/include/dfu_types_mesh.h
    @@ -265,7 +265,7 @@ typedef struct
     
     /**
      * State of info bank. Written to allow state machine progression being stored
    - * in flash without needing erase.
    + * in flash.
      */
     typedef enum
     {
    diff --git a/mesh/bootloader/src/bootloader_info.c b/mesh/bootloader/src/bootloader_info.c
    index 316250596..00d66a205 100644
    --- a/mesh/bootloader/src/bootloader_info.c
    +++ b/mesh/bootloader/src/bootloader_info.c
    @@ -48,7 +48,7 @@
     
     #define HEADER_LEN       (sizeof(bootloader_info_header_t))
     
    -#define INFO_WRITE_BUFLEN   (128)
    +#define INFO_WRITE_BUFLEN   (256)
     
     typedef enum
     {
    @@ -60,7 +60,7 @@ typedef enum
         BL_INFO_STATE_RECOVER
     } bl_info_state_t;
     
    -typedef struct
    +typedef struct __attribute__((aligned(4)))
     {
         uint16_t len;
         uint16_t type;
    diff --git a/mesh/bootloader/src/dfu_bank.c b/mesh/bootloader/src/dfu_bank.c
    index e570c14f6..f276be907 100644
    --- a/mesh/bootloader/src/dfu_bank.c
    +++ b/mesh/bootloader/src/dfu_bank.c
    @@ -217,8 +217,8 @@ static void flash_bank_entry(void)
                     /* Wait for this to take effect before moving on, as the
                        potential mbr commands in the flash_fw state may trigger
                        sudden reboots. */
    -                break;
                 }
    +            break;
     
             case BL_INFO_BANK_STATE_FLASH_FW:
                 m_waiting_for_idle = true;
    @@ -297,23 +297,26 @@ static void flash_bank_entry(void)
     
                     /* Update state */
                     __LOG("Bank: Set state to FLASHED\n");
    +                m_waiting_for_idle = true;
                     bank_entry_replacement.bank.state = BL_INFO_BANK_STATE_FLASHED;
                     (void) bootloader_info_entry_overwrite((bl_info_type_t) (BL_INFO_TYPE_BANK_BASE + m_dfu_type), &bank_entry_replacement);
    -
                 }
    -            /* deliberate fallthrough */
    +            break;
             case BL_INFO_BANK_STATE_FLASHED:
    -            /* We may invalidate the bank entry in the device page now,
    -               it's all redundant. */
    -            __LOG("Bank: Invalidate.\n");
    -            if (bootloader_info_entry_invalidate((bl_info_type_t) (BL_INFO_TYPE_BANK_BASE + m_dfu_type)) == NRF_SUCCESS)
                 {
    +                /* We need to invalidate the bank entry in the device page now. */
    +                m_waiting_for_idle = true;
    +                (void) bootloader_info_entry_invalidate((bl_info_type_t) (BL_INFO_TYPE_BANK_BASE + m_dfu_type));
                     __LOG("Bank invalidated.\n");
    +
    +                /* clean up bank area since it may be used for application data. */
    +                bl_evt_t flash_evt;
    +                flash_evt.type = BL_EVT_TYPE_FLASH_ERASE;
    +                flash_evt.params.flash.erase.start_addr = (uint32_t)p_bank_entry->p_bank_addr;
    +                flash_evt.params.flash.erase.length = p_bank_entry->length;
    +                (void) bootloader_evt_send(&flash_evt);
                     mp_bank_entry = NULL; /* reset the static bank pointer, as we no longer need it. */
    -            }
    -            else
    -            {
    -                m_waiting_for_idle = true;
    +                __LOG("Bank area cleared.\n");
                 }
                 break;
         }
    diff --git a/mesh/core/include/bl_info_types.h b/mesh/core/include/bl_info_types.h
    index 729c0596a..c66696ecf 100644
    --- a/mesh/core/include/bl_info_types.h
    +++ b/mesh/core/include/bl_info_types.h
    @@ -107,7 +107,7 @@ typedef struct
     
     /**
      * State of info bank. Written to allow state machine progression being stored
    - * in flash without needing erase.
    + * in flash.
      */
     typedef enum
     {
    
     mesh/bootloader/src/dfu_bank.c | 2 +-
     1 file changed, 1 insertion(+), 1 deletion(-)
    
    diff --git a/mesh/bootloader/src/dfu_bank.c b/mesh/bootloader/src/dfu_bank.c
    index f276be907..abf8c06ba 100644
    --- a/mesh/bootloader/src/dfu_bank.c
    +++ b/mesh/bootloader/src/dfu_bank.c
    @@ -313,7 +313,7 @@ static void flash_bank_entry(void)
                     bl_evt_t flash_evt;
                     flash_evt.type = BL_EVT_TYPE_FLASH_ERASE;
                     flash_evt.params.flash.erase.start_addr = (uint32_t)p_bank_entry->p_bank_addr;
    -                flash_evt.params.flash.erase.length = p_bank_entry->length;
    +                flash_evt.params.flash.erase.length = ((p_bank_entry->length + (PAGE_SIZE - 1)) & ~(PAGE_SIZE - 1));
                     (void) bootloader_evt_send(&flash_evt);
                     mp_bank_entry = NULL; /* reset the static bank pointer, as we no longer need it. */
                     __LOG("Bank area cleared.\n");
    

    Please consider that these are only assumptions after the source code investigation. Every solution might face additional issues that will require separate solving. Additionally, both solutions require good testing since the system `bootloader + app ---> DFU` is quite fragile from a stability point of view.  

  • Please consider that these are only assumptions after the source code investigation. Every solution might face additional issues that will require separate solving. Additionally, both solutions require good testing since the system `bootloader + app ---> DFU` is quite fragile from a stability point of view. 

    Has Nordic done a complete test of these patches all the way through, i.e. upgraded an actual device using the sequence of suggested steps?

  • Hi,

    Yes, one of our developer have tested the patches and it works.

Related