Assert: flash memory metadata not aligned after FOTA

Hi There

I run an nRF52840 with the nRF5 SDK for Mesh and use the Flash Manager it provides.

After some time in the field, I would like to update the device over FOTA. This ends in an assert when booting the new firmware, at flash_manager.c line 256:

NRF_MESH_ASSERT(p_manager->config.p_area[i].metadata.pages_in_area == p_manager->config.page_count);

When I erase the full MCU first, the assert does not show up because there is no existing data.

In the best case I can keep the data on the devices in the field. If not, how can I erase the flash before or during the FOTA?

Regards Simon

  • Hi Simon,

    Our apologies, this is something we should have done first, but I would like to check, are you working on a new project?

    If so, I strongly recommend using the nRF Connect SDK (NCS) instead. The nRF5 SDK is in maintenance mode and is not recommended for new projects. We also provide an online course to help you ramp up quickly with NCS: Nordic Semiconductor Online Learning Platform - Nordic Developer.

    Notably, the DFU feature in the nRF5 SDK for Mesh is experimental and contains some flaws. It is also proprietary. Meanwhile, NCS supports a new Mesh DFU solution, which is now specified by the Bluetooth SIG in the Mesh Device Firmware Update Model spec.

    Hieu

  • Hi Hieu

    It is an old project that was first rolled out 3 years ago. To update the devices in the field, I would like to provide a new version over FOTA.

    As far as I know, the issue also arises with NCS when flashing over the existing Mesh SDK project via FOTA, or when flashing over J-Link without "erase all".

    The issue is not from the DFU itself. It comes after booting the application with a different flash configuration than the one present in the flash.

    The following illustration shows what is happening. While checking flash address FC there is a mismatch in the configuration, and the assert arises. This is OK so far. Now I would like to avoid the assert with the ideas in the posts above.

    I just wondered how you would properly handle this.

    Flash Address   Old          New
    0x00000FE       Mesh Data    Mesh Data
    0x00000FD       App Data 1   App Data 1
    0x00000FC       App Data 2   App Data 1.1   <- here it asserts
    0x00000FB       App Data 3   App Data 2
    0x00000FA       empty        App Data 3
    0x00000F9       empty        App Data 4

    Regards

    Simon

  • Hi Simon,

    The short version is that, if I understand correctly, you are trying to expand the total flash area the application can store data in, and you do so by modifying the existing flash_manager instance.

    That will not work. You can however create a new flash_manager instance to store any new data in.

    Is that an option for you? Or do you need to increase the size of the data already stored?

    If I misunderstood anything, or if that doesn't work for you, below is the long answer, with some questions from me.

    1. Regarding memory layout

    Simsim said:
    Flash Address   Old          New
    0x00000FE       Mesh Data    Mesh Data
    0x00000FD       App Data 1   App Data 1
    0x00000FC       App Data 2   App Data 1.1   <- here it asserts
    0x00000FB       App Data 3   App Data 2
    0x00000FA       empty        App Data 3
    0x00000F9       empty        App Data 4

    1. A. I assume you mean the address as 0xF*000 here, rather than 0x00000F*.

    1. B. How are you certain that the Mesh data is only from 0xFE000? Did you use configurations like ACCESS, DSM, and NET_FLASH_AREA_LOCATION?
    Or did you control the flash placement somehow?
    Or did you check with mesh_config_backend_flash_usage_get()?

    1. C. How are you using the Flash Manager module?

    2. Regarding your description on 26 Aug:

    Simsim said:
    I found many beginners struggling: when loading different examples without --eraseall of the MCU, they get stuck in the asserts.

    Yes, previously stored values might not make sense to the new application, causing an assert.

    However, for the case of DFU, if you don't change the Mesh stack features, then the old data should be compatible with the new firmware.

    Even if you do change the Mesh stack features, in a lot of cases, the data is still compatible.

    Simsim said:
    I would like to build a software version which can be uploaded over FOTA with a partially blank flash sector, or check the NVMC for a flash config ID which I can compare with my current one. If it matches I will proceed; otherwise I have to erase all the flash to ensure data integrity.

    Right, and,

    2.A. What have you done to attempt this? Or is this something you are having trouble with?

    Simsim said:

    mesh_stack_config_clear() is already implemented and working as expected.

    Taking mesh_stack_config_clear() as a sample, I discovered that the entry is first overwritten and then cleared. Otherwise, would I find the metadata of the deprecated flash entry and assume the flash is valid?

    2.B. Could you please elaborate what you are doing here?

    Simsim said:

    The assert I could reproduce occurs when the page count is increased, so that the memory layout is shifted for the following entries.

    Code line 10 assert at flash_manager.c line 256

    2.C. Could you please get me the full call stack when the assert happens, using the debugger?

    Simsim said:
    Can you briefly summarise how to properly erase the user flash entries/pages (without knowing the old layout)?

    I am having trouble figuring out how the Mesh stack can keep the application from messing with its flash_manager area. It seems like there is no defense in place at all, and the application must use mesh_config_backend_flash_usage_get() to see that and avoid it on its own.

    So, to answer this, I need the information I am asking about above. But to give you some initial information, you will use either flash_manager_remove() or flash_manager_entry_invalidate().

  • Hi Hieu

    I try to answer all your questions.

    1. A. I assume you mean the address as 0xF*000 here, rather than 0x00000F*.

    - That is correct, I dropped the trailing zeroes of the memory address.

    1. B. How are you certain that the Mesh data is only from 0xFE000?

    - I'm not really certain, but that is what I see when I use the nRF Mesh SDK example. I didn't modify this part of the example.

    - mesh_config_backend_flash_usage_get() is called after the assert, so I suspect this information can't help to work around the assert.

    1. C. How are you using the Flash Manager module?

    - Indirectly, through the mesh flash manager module. See the call stack.

    - I followed the App_Data example for persistent user data storage.

    2.A. What have you done to attempt this? Or is this something you are having trouble with?

    - This was an idea for solving the problem of the mismatched flash placement (when programming with J-Link there is an option to erase partial sectors). I could not find an example of someone who did this before, so I'm not sure where to start implementing this function.

    2.B. Could you please elaborate what you are doing here?

    - I tried to understand what happens while erasing the flash. mesh_stack_config_clear() calls the cleanup function of the mesh entry, which mostly resets the data to defaults and reboots the MCU. This depends on the existing Flash Manager, which is not yet present in flash_init, where it asserts in my case.

    2.C. Could you please get me the full call stack when the assert happens, using the debugger?

    Call stack:

    Attached as a picture; oddly, Segger Studio does not allow exporting it.

  • Hi Simon,

    Something is not checking out. If the assert happens that early, then it happens before any of the flash layout changes you make are put into effect.

    What are your exact steps to reach this point? Starting from a working device running the old firmware, do you just flash-without-erase the application region? Or do you DFU in the new firmware?

    Did you change any Mesh related configurations between the old and the new firmware at all?
    If yes, then the assert would be understandable. However, it also means that mesh_stack_config_clear() should help avoid it. How are you using mesh_stack_config_clear()?
    The ideal point to use it is before applying the new firmware. That most often means it should be done on the old firmware. Perhaps there can be some clever way to do it in the new firmware... We can explore that as we clear up our status and requirements.

    Simsim said:

    1. C. How are you using the Flash Manager module?

    - Indirectly, through the mesh flash manager module. See the call stack.

    What I mean is, how do you get the flash_manager_t structure to use with your application data?

    If you create your own structure, what parameters/configurations did you use? How did you register it with the Flash Manager module? Did you call flash_manager_add() on your own?

    These flash_manager_t instances are how you can separate different flash regions for different purposes.

  • Hi Hieu

    Thanks for digging in this case.

    Q:What are your exact steps to reach this point? Starting from a working device running the old firmware, do you just flash-without-erase the application region? Or do you DFU in the new firmware?

    A: It happens in both cases. I first discovered this with FOTA over the bootloader, and could reproduce it as follows:

    Flash the old firmware + SoftDevice, then load the new firmware and SoftDevice, both from Segger Studio. The assert happens.

    Q:Did you change any Mesh related configurations between the old and the new firmware at all?
    If yes, then the assert would be understandable. However, it also means that mesh_stack_config_clear() should help avoiding it. How are you using mesh_stack_config_clear()?

    A: mesh_stack_config_clear() is called when a button is pressed, as in the example. When it is called too early, it asserts due to non-existing managers.

    Q:What I mean is, how do you get the flash_manager_t structure to use with your application data?

    A: I use different managers, added with the following scheme:

    Define Configuration:

    #define APP_ENTRY_ID MESH_CONFIG_ENTRY_ID(0x0011, 0x0001)
    
    MESH_CONFIG_FILE(m_app_file, 0x0011, MESH_CONFIG_STRATEGY_CONTINUOUS);
    
    
    MESH_CONFIG_ENTRY(m_app_entry,
                      APP_ENTRY_ID,
                      1, // changing this from 1 to 2 entries triggers the assert after DFU
                      sizeof(t_Par),
                      app_entry_setter,
                      app_entry_getter,
                      NULL,
                      true); 

    // init mesh only once

    mesh_config_init();

    this will call:

    mesh_config_backend_init(entry_params_get(0), CONFIG_ENTRY_COUNT, file_params_get(0), CONFIG_FILE_COUNT, backend_evt_handler);

    and then go to:

    NRF_MESH_ERROR_CHECK(mesh_config_backend_file_create(p_file->p_backend_data));

    and this calls:

    flash_manager_add(p_manager, &config);

    and while checking the metadata against the config, it fails at the page count.

    While tinkering around, I can confirm the following behaviour:

    Changing the page count upwards triggers the assert. That is, when there was 1 page and the new firmware now has 2, it won't work out of the box.

    An additional manager with a clean entry can be added without any issue.

    Learning:

    Do: if more space is needed, create a new manager with a new entry.

    Don't: enlarge the page count in existing managers.

  • Edit: Withdrew answer. From discussing with a colleague, I learned that I had misunderstood something.

    Please wait while I clear something up. I will follow up soon.

  • Hi Simon,

    Then it makes sense now. The application is hitchhiking on the Mesh stack's configuration system to store its data.

    Therefore, even though you do not change any Mesh configurations, when you change your application data size, it changes the Mesh configuration data size, causing the assert.
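
    To make the mechanism concrete, here is a minimal, host-runnable sketch of the comparison behind the assert. The types and the function name are simplified stand-ins made up for illustration, not the SDK's own definitions. Only the field names pages_in_area and page_count come from the assert line quoted at the top of this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for illustration only.
 * These are NOT the SDK's flash_manager types. */
typedef struct {
    uint8_t pages_in_area;   /* page count the old firmware wrote to flash */
} stored_metadata_t;

typedef struct {
    uint32_t page_count;     /* page count the new firmware asks for */
} requested_config_t;

/* Returns true when the area already in flash can be reused as is.
 * A false result models the condition under which the assert fires. */
static bool area_layout_matches(const stored_metadata_t *p_stored,
                                const requested_config_t *p_requested)
{
    assert(p_stored != NULL && p_requested != NULL);
    return p_stored->pages_in_area == p_requested->page_count;
}
```

    With one page stored by the old firmware, a request for one page matches, while a request for two does not. The second case corresponds to growing the entry count so that the file needs one more page.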

    I was not aware of this approach to storing application data. Apparently, it is documented.

    I will now propose a few directions:

    1. Switch to using an independent Flash Manager instance to store application data.

    You now create a new flash_manager_t with enough space to store all of the data in the new firmware.

    Remember to do so after Mesh stack initialization and use mesh_config_backend_flash_usage_get() to make sure the application's Flash Manager is independent of the Mesh stack.

    You also keep the previous hitchhiking implementation as is, no change to size.

    On first boot of the new firmware, after Mesh stack initialization (which shouldn't assert anymore) and the flash_manager_add() call of the new manager, the firmware checks for data in the new application Flash Manager. If the data isn't there, it then performs (or schedules) a migration of the data from the old system to the new one.

    Then after that, it goes on to work normally.

    This also means that you don't have to sacrifice the Mesh data with mesh_stack_config_clear(). The device won't need to be reprovisioned.
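
    The first-boot migration step can be sketched as follows. This is only a model under assumptions: app_store_t, migrate_app_data() and the result codes are hypothetical names, and both stores are plain in-memory structures standing in for the old Mesh Configuration entry and the new Flash Manager area. On the device, the reads and writes would go through the respective SDK modules instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define APP_DATA_MAX_LEN 64u

/* Hypothetical in-memory stand-in for one persistent store. */
typedef struct {
    bool    has_data;
    size_t  len;
    uint8_t bytes[APP_DATA_MAX_LEN];
} app_store_t;

typedef enum {
    MIGRATION_NOT_NEEDED,   /* new store already holds data */
    MIGRATION_DONE,         /* data was copied from the old store */
    MIGRATION_NO_SOURCE     /* neither store holds data: fall back to defaults */
} migration_result_t;

/* Copies the old data into the new store, but only if the new store is empty.
 * Calling it again after a successful migration is a no-op. */
static migration_result_t migrate_app_data(const app_store_t *p_old,
                                           app_store_t *p_new)
{
    assert(p_old != NULL && p_new != NULL);

    if (p_new->has_data) {
        return MIGRATION_NOT_NEEDED;
    }
    if (!p_old->has_data) {
        return MIGRATION_NO_SOURCE;
    }

    assert(p_old->len <= APP_DATA_MAX_LEN);
    memcpy(p_new->bytes, p_old->bytes, p_old->len);
    p_new->len = p_old->len;
    p_new->has_data = true;
    return MIGRATION_DONE;
}
```

    Because the check is on the new store, the migration runs at most once, and the old entry can stay untouched at its original size.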

    Another factor to consider here is future-proofing against further changes to the data layout.

    If you think that Mesh configurations might be changed in the future, then you will want to initialize the application's Flash Manager instance one or two pages away from the Mesh stack's data.

    If you also expect that the application data will change in another update, it is also a good idea to reserve some extra space for it.

    It will also be important to document the known flash memory layout carefully for future maintainers.
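
    As a rough illustration of that placement arithmetic, assuming 4 kB flash pages (as on the nRF52840) and areas that are laid out toward lower addresses, as in the table earlier in this thread. app_area_start(), the guard page count, and the example addresses are hypothetical. The lowest address used by the Mesh stack has to come from your own check, for example with mesh_config_backend_flash_usage_get().

```c
#include <assert.h>
#include <stdint.h>

#define FLASH_PAGE_SIZE 0x1000u  /* 4 kB flash page on the nRF52840 */

/* Returns the start address of an application area placed below the Mesh
 * stack's data, leaving guard_pages unused pages in between.
 * mesh_lowest_addr is the lowest page address the Mesh stack occupies. */
static uint32_t app_area_start(uint32_t mesh_lowest_addr,
                               uint32_t guard_pages,
                               uint32_t app_pages)
{
    uint32_t offset = (guard_pages + app_pages) * FLASH_PAGE_SIZE;

    assert(mesh_lowest_addr % FLASH_PAGE_SIZE == 0u); /* must be page aligned */
    assert(offset <= mesh_lowest_addr);               /* must not underflow */

    return mesh_lowest_addr - offset;
}
```

    For example, with the Mesh stack's lowest page at 0xFB000, two guard pages, and two application pages, the application area would start at 0xF7000. This is the kind of number worth writing down in the layout documentation.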

    2. Keep old application data with the Mesh Configuration solution. Create a new Flash Manager instance to store only the new data.

    The caveats regarding placement of the new Flash Manager instance in option 1 also apply here.

    3. Stay with the Mesh Configuration Module storage solution.

    If you prefer to stay with the Mesh Configuration hitchhiking approach, then you need to figure out how to trigger mesh_stack_config_clear() on the new application exactly once on the very first boot of the new firmware version, before the assert.

    However, I am unable to figure out what signs/basis the application can use to call mesh_stack_config_clear()... Is there something particular to your project that can help here?

    I have a feeling that will be hacky/workaround-ish. If you also find it so, I recommend the other two options more.
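
    One possible basis is the flash config ID you suggested earlier. Below is a minimal sketch of just the decision, assuming a hypothetical layout ID word that the firmware keeps at a known location outside the Mesh Configuration area. layout_action_get() and the ID values are made up for illustration. Note that devices already in the field never wrote such an ID, so they would read as erased and take the clear path. How to run the clear before the assert is still the open part.

```c
#include <assert.h>
#include <stdint.h>

#define FLASH_ERASED_WORD 0xFFFFFFFFu  /* an erased flash word reads as all ones */

typedef enum {
    LAYOUT_KEEP,    /* stored data matches this firmware's layout */
    LAYOUT_CLEAR    /* unknown or different layout: clear before use */
} layout_action_t;

/* Decides what to do with the stored data, given the layout ID read from
 * flash and the layout ID this firmware was built with. */
static layout_action_t layout_action_get(uint32_t stored_id, uint32_t expected_id)
{
    /* The erased value cannot double as a valid ID. */
    assert(expected_id != FLASH_ERASED_WORD);

    return (stored_id == expected_id) ? LAYOUT_KEEP : LAYOUT_CLEAR;
}
```

    A matching ID keeps the data. A different ID, or an erased word from a blank or legacy device, leads to a clear.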
