This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

Issues with DSM and Access persistent storage

I have a node running with the SDK for Mesh  v3.1.0. It crashes during boot up in restore_addresses_for_model() due to dsm_address_subscription_add_handle() returning NRF_ERROR_NOT_FOUND.

This is because nothing has been restored from flash for DSM.

In mesh_stack_init() the return value from dsm_flash_config_load() is ignored and access_flash_config_load() is called regardless.

    (void) dsm_flash_config_load();
    (void) access_flash_config_load();

Wouldn't it make more sense to erase the access flash area in case the DSM failed to load as access depends on DSM?

dsm_flash_config_load() fails because it can't read the dsm metadata entry. I have no clue into how it could have been lost.

A hexdump of the DSM flash area. It doesn't contain the expected DSM_FLASH_HANDLE_METAINFO. It is however sealed.

:10400000080410100100FFFF02000200010004007C
:1040100006000030000000D68706571C8FA8CCFE93
:104020005B1938489A8CAE00060000500100000071
:1040300090B51101530FB8BA733987CE72F58F411D
:104040000600004000000000E477A7878EC8A8BFE4
:1040500050CD2D0A119EB2FF0200001001C00210C7
:104060000200011002C00A10020002100400011038
:10407000FFFFFF7FFFFFFFFFFFFFFFFFFFFFFFFFD0

The recovery page contains the following, which indicate that a defrag was performed and completed successfully.

:1060000000000000080410100100FFFF0200020061
:10601000FFFF01100200010000800110FFFFFFFFE1
:10602000FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF80

bool dsm_flash_config_load(void)
{
    if (!m_flash_is_available)
    {
        return false;
    }
    dsm_flash_entry_metainfo_t metainfo;
    uint32_t metainfo_size = sizeof(metainfo);
    if (flash_manager_entry_read(&m_flash_manager,
                                 DSM_FLASH_HANDLE_METAINFO,
                                 &metainfo,
                                 &metainfo_size) != NRF_SUCCESS)
    {
        return false;
    }

Wouldn't it make sense to call reset_flash_area() if it can't find the metainfo? As the dsm_init() must have created it in case the area was empty?

Thanks.

Parents
  • Hi. 

    I've forwarded this to our Mesh developers. I will get back to you when I receive a response from them. 

    Best regards, 
    Joakim Jakobsen

  • Any update on this? I still see that the return value of dsm_flash_config_load() is ignored in 3.2.0.

  • Hi ,

    Has there been any progress on this internally?

    Thanks.

  • I've debugged this issue for some time now. My theory of why this happens is because the device is reset after the flash manager metadata is written but before the DSM metadata is written.

    The code simply assumes that the presence of the first indicates the presence of the latter which isn't always true.

    The dsm_flash_config_load() tries to readout the DSM metadata, if that fails it simply returns false, which wrongly is ignored by the caller.

    In the beginning there is no access configuration to read back from flash and thus it doesn't fail. Regular operations continues and both DSM and Access data are saved to flash.

    On the next bootup the dsm_flash_config_load() call again fails to find its metadata and never load the real data from flash and thus the access layer load asserts because it tries to reference DSM entries that wasn't loaded.

    I'm currently using the following patch as a workaround until there is an official solution to this problem.

    diff --git a/mesh/access/api/access_config.h b/mesh/access/api/access_config.h
    index 8e3fd31..4b52c67 100644
    --- a/mesh/access/api/access_config.h
    +++ b/mesh/access/api/access_config.h
    @@ -54,6 +54,11 @@
      * @{
      */
     
    +/**
    + * Clear access layer configuration from flash.
    + */
    +void access_flash_config_clear(void);
    +
     /**
      * Recover access layer configuration from flash.
      *
    diff --git a/mesh/access/src/access.c b/mesh/access/src/access.c
    index a8bdb29..b1149d0 100644
    --- a/mesh/access/src/access.c
    +++ b/mesh/access/src/access.c
    @@ -946,7 +946,7 @@ static inline uint32_t element_store(uint16_t element_index)
     
     /* ********** Public API ********** */
     
    -static void access_flash_config_clear(void)
    +void access_flash_config_clear(void)
     {
         static fm_mem_listener_t flash_clear_mem_available_struct =
             {
    @@ -1066,7 +1066,7 @@ void access_flash_config_store(void)
         bearer_event_critical_section_end();
     }
     #else
    -static void access_flash_config_clear(void)
    +void access_flash_config_clear(void)
     {
         m_access_model_config_frozen = false;
     }
    diff --git a/mesh/access/src/device_state_manager.c b/mesh/access/src/device_state_manager.c
    index b073ae6..97b708f 100644
    --- a/mesh/access/src/device_state_manager.c
    +++ b/mesh/access/src/device_state_manager.c
    @@ -1785,7 +1785,14 @@ bool dsm_flash_config_load(void)
                                      &metainfo,
                                      &metainfo_size) != NRF_SUCCESS)
         {
    -        return false;
    +        if (m_flash_manager.internal.state != FM_STATE_BUILDING) {
    +            // The flash isn't building thus the METAINFO should always be
    +            // available. Something is clearly wrong.
    +            reset_flash_area();
    +            return false;
    +        }
    +
    +        return true;
         }
     
         if (metainfo.max_addrs_nonvirtual != DSM_NONVIRTUAL_ADDR_MAX ||
    diff --git a/mesh/stack/src/mesh_stack.c b/mesh/stack/src/mesh_stack.c
    index f98adb8..e35164f 100644
    --- a/mesh/stack/src/mesh_stack.c
    +++ b/mesh/stack/src/mesh_stack.c
    @@ -109,11 +109,14 @@ uint32_t mesh_stack_init(const mesh_stack_init_params_t * p_init_params,
             p_init_params->models.models_init_cb();
         }
     
    -    (void) dsm_flash_config_load();
    -
    -    if (access_flash_config_load())
    +    if (dsm_flash_config_load())
         {
    -        access_flash_config_store();
    +        if (access_flash_config_load())
    +        {
    +            access_flash_config_store();
    +        }
    +    } else {
    +        access_flash_config_clear();
         }
     
         bool is_provisioned = mesh_stack_is_device_provisioned();
    

    Thanks.

  • Hi. 

    Thanks for sharing your workaround. 

    Our developers is working on this, but I haven't gotten an official solution to this yet.
    I know this will be fixed in future releases, but I'll ask for a fix that can be used with previous versions of the nRF5 SDK for Mesh as well. 

    Best regards. 

  • I realized that the patch posted earlier wasn't complete after the upgrade to v3.2.0.

    The following is in addition to the previous patch. It ensure that failure to flash loading of DSM causes the access layer to clear it's flash as well.

    It changes the behavior of the return value from dsm_flash_config_load() to return false if it wasn't possible to read anything useful from it. Previously it would return false if the flash was properly initialized but no address was stored.

    One might consider moving the access_flash_config_clear() into the dsm_flash_config_load() after each call to reset_flash_area() which would be more similar to the access_flash_config_load() code.

    diff --git a/mesh/access/src/access.c b/mesh/access/src/access.c
    index 4fc61c7..e584a6e 100644
    --- a/mesh/access/src/access.c
    +++ b/mesh/access/src/access.c
    @@ -977,6 +977,18 @@ bool access_flash_config_load(void)
             NRF_MESH_ASSERT(ACCESS_INTERNAL_STATE_IS_ALLOCATED(m_subscription_list_pool[i].internal_state));
         }
     
    +    if (m_flash_not_ready)
    +    {
    +        // Flash is currently being erased, consider this as a success and
    +        // freeze further changes.
    +        // NOTE: This part of the code assumes that the flash_manager_remove()
    +        // call in access_flash_config_clear() was successful. Otherwise it
    +        // would try again async causing the m_access_model_config_froze variable
    +        // to be wrongfully set to false.
    +        m_access_model_config_frozen = true;
    +        return true;
    +    }
    +
         if (flash_manager_entry_read(&m_flash_manager, FLASH_HANDLE_METADATA, &metadata, &metadata_size) != NRF_SUCCESS)
         {
             /* Entry could not be read, means the application is trying to load access data for the
    @@ -1007,6 +1019,11 @@ bool access_flash_config_load(void)
         }
     
         /* Freeze composition data. */
    +
    +    // NOTE: This part of the code assumes that the flash_manager_remove()
    +    // call in access_flash_config_clear() was successful. Otherwise it
    +    // would try again async causing the m_access_model_config_froze variable
    +    // to be wrongfully set to false.
         m_access_model_config_frozen = true;
     
         return config_restored;
    diff --git a/mesh/access/src/device_state_manager.c b/mesh/access/src/device_state_manager.c
    index 97b708f..fae2bd0 100644
    --- a/mesh/access/src/device_state_manager.c
    +++ b/mesh/access/src/device_state_manager.c
    @@ -1807,8 +1807,8 @@ bool dsm_flash_config_load(void)
         }
         /* Run through the rest of the entries and load them based on type */
         (void) flash_manager_entries_read(&m_flash_manager, NULL, flash_read_callback, NULL);
    -    /* The storage was valid if there was a local unicast address present */
    -    return bitfield_get(m_addr_unicast_allocated, 0);
    +
    +    return true;
     }
     
     bool dsm_has_unflashed_data(void)
    diff --git a/mesh/stack/src/mesh_stack.c b/mesh/stack/src/mesh_stack.c
    index e35164f..6cd38fc 100644
    --- a/mesh/stack/src/mesh_stack.c
    +++ b/mesh/stack/src/mesh_stack.c
    @@ -109,16 +109,21 @@ uint32_t mesh_stack_init(const mesh_stack_init_params_t * p_init_params,
             p_init_params->models.models_init_cb();
         }
     
    -    if (dsm_flash_config_load())
    +    if (!dsm_flash_config_load())
         {
    -        if (access_flash_config_load())
    -        {
    -            access_flash_config_store();
    -        }
    -    } else {
    +        // As any stored access data depends on the DSM it doesn't make sense
    +        // to keep that old data around.
             access_flash_config_clear();
         }
     
    +    // Regardless of the clear above, try to load it to mark the composition
    +    // as frozen.
    +    if (access_flash_config_load())
    +    {
    +        // This should only store the metadata.
    +        access_flash_config_store();
    +    }
    +
         bool is_provisioned = mesh_stack_is_device_provisioned();
     
     #if MESH_FEATURE_GATT_PROXY_ENABLED
    

    Thanks.

Reply
  • I realized that the patch posted earlier wasn't complete after the upgrade to v3.2.0.

    The following is in addition to the previous patch. It ensure that failure to flash loading of DSM causes the access layer to clear it's flash as well.

    It changes the behavior of the return value from dsm_flash_config_load() to return false if it wasn't possible to read anything useful from it. Previously it would return false if the flash was properly initialized but no address was stored.

    One might consider moving the access_flash_config_clear() into the dsm_flash_config_load() after each call to reset_flash_area() which would be more similar to the access_flash_config_load() code.

    diff --git a/mesh/access/src/access.c b/mesh/access/src/access.c
    index 4fc61c7..e584a6e 100644
    --- a/mesh/access/src/access.c
    +++ b/mesh/access/src/access.c
    @@ -977,6 +977,18 @@ bool access_flash_config_load(void)
             NRF_MESH_ASSERT(ACCESS_INTERNAL_STATE_IS_ALLOCATED(m_subscription_list_pool[i].internal_state));
         }
     
    +    if (m_flash_not_ready)
    +    {
    +        // Flash is currently being erased, consider this as a success and
    +        // freeze further changes.
    +        // NOTE: This part of the code assumes that the flash_manager_remove()
    +        // call in access_flash_config_clear() was successful. Otherwise it
    +        // would try again async causing the m_access_model_config_froze variable
    +        // to be wrongfully set to false.
    +        m_access_model_config_frozen = true;
    +        return true;
    +    }
    +
         if (flash_manager_entry_read(&m_flash_manager, FLASH_HANDLE_METADATA, &metadata, &metadata_size) != NRF_SUCCESS)
         {
             /* Entry could not be read, means the application is trying to load access data for the
    @@ -1007,6 +1019,11 @@ bool access_flash_config_load(void)
         }
     
         /* Freeze composition data. */
    +
    +    // NOTE: This part of the code assumes that the flash_manager_remove()
    +    // call in access_flash_config_clear() was successful. Otherwise it
    +    // would try again async causing the m_access_model_config_froze variable
    +    // to be wrongfully set to false.
         m_access_model_config_frozen = true;
     
         return config_restored;
    diff --git a/mesh/access/src/device_state_manager.c b/mesh/access/src/device_state_manager.c
    index 97b708f..fae2bd0 100644
    --- a/mesh/access/src/device_state_manager.c
    +++ b/mesh/access/src/device_state_manager.c
    @@ -1807,8 +1807,8 @@ bool dsm_flash_config_load(void)
         }
         /* Run through the rest of the entries and load them based on type */
         (void) flash_manager_entries_read(&m_flash_manager, NULL, flash_read_callback, NULL);
    -    /* The storage was valid if there was a local unicast address present */
    -    return bitfield_get(m_addr_unicast_allocated, 0);
    +
    +    return true;
     }
     
     bool dsm_has_unflashed_data(void)
    diff --git a/mesh/stack/src/mesh_stack.c b/mesh/stack/src/mesh_stack.c
    index e35164f..6cd38fc 100644
    --- a/mesh/stack/src/mesh_stack.c
    +++ b/mesh/stack/src/mesh_stack.c
    @@ -109,16 +109,21 @@ uint32_t mesh_stack_init(const mesh_stack_init_params_t * p_init_params,
             p_init_params->models.models_init_cb();
         }
     
    -    if (dsm_flash_config_load())
    +    if (!dsm_flash_config_load())
         {
    -        if (access_flash_config_load())
    -        {
    -            access_flash_config_store();
    -        }
    -    } else {
    +        // As any stored access data depends on the DSM it doesn't make sense
    +        // to keep that old data around.
             access_flash_config_clear();
         }
     
    +    // Regardless of the clear above, try to load it to mark the composition
    +    // as frozen.
    +    if (access_flash_config_load())
    +    {
    +        // This should only store the metadata.
    +        access_flash_config_store();
    +    }
    +
         bool is_provisioned = mesh_stack_is_device_provisioned();
     
     #if MESH_FEATURE_GATT_PROXY_ENABLED
    

    Thanks.

Children
No Data
Related