Several problems arising with OTA DFU.

Hi, I am currently using SDK 15.3.0 with SD 6.1.1 and am playing around with the buttonless DFU.

I noticed that after uploading the new firmware via nRF Connect, the MCU often just hardfaults. It happened, for example, after I set NRF_LOG_ENABLED to 0 in the sdk_config.h file for the new firmware (it was still enabled in the old version). This led to an internal error in a BLE advertising function, which doesn't make sense at all. I also get a hardfault or run into an error when I only comment out NRF_LOG_INIT() in a newer version. Why doesn't this just work? Which memory addresses does the logging module use, such that a DFU will certainly end in broken firmware?

So now I turn off logging in an initial version and want to update the device with new software, and now apparently a timer is not working properly. Something generally seems wrong with the DFU process, but I can't debug it and don't know which parameters might be wrong.

  • Hi,

    Have you confirmed that you get an actual hardfault, or something else? What have you found out by debugging? How exactly did this happen? (There are very many things that can go wrong, so we need to narrow things down as much as possible as early as possible).

    Did the issue start happening after you added buttonless DFU to your application, or does it happen after a DFU update?

    The application would be identical after a DFU update to what it would be if it was flashed directly (assuming you made no changes), but other parts of the flash could have changed in the case of a dual-bank update. Normally that would not cause problems, but it could potentially trigger a bug in your application.
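    To make the dual-bank point concrete, here is a minimal sketch, assuming the nRF5 SDK bootloader's dual-bank behaviour where the new image (bank 1) is written starting at the first page boundary after the current application (bank 0). The helper dual_bank_end() and the FLASH_PAGE_SIZE constant are illustrative, not SDK APIs. It returns the first address past bank 1, so flash from the application start up to that address may be rewritten during the update, even if the running application never touches it.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* nRF52 flash page size: 4 kB. */
    #define FLASH_PAGE_SIZE 0x1000u

    /* Round an address up to the next flash page boundary. */
    static uint32_t align_to_page(uint32_t addr)
    {
        return (addr + FLASH_PAGE_SIZE - 1u) & ~(FLASH_PAGE_SIZE - 1u);
    }

    /* Hypothetical helper. Assumption: bank 1 starts at the first page
     * boundary after the current application (bank 0). Returns the first
     * address past bank 1, so [app_start, result) may be rewritten. */
    static uint32_t dual_bank_end(uint32_t app_start, uint32_t app_size,
                                  uint32_t new_image_size)
    {
        assert(app_start % FLASH_PAGE_SIZE == 0u); /* app starts on a page */
        uint32_t bank1_start = align_to_page(app_start + app_size);
        return bank1_start + new_image_size;
    }
    ```

    For example, with the application at 0x26000 and both the old and the new image 0x2B800 bytes long (hypothetical sizes), bank 1 starts at 0x52000 and ends at 0x7D800, which would overwrite any data stored in a page at 0x7C000.
    
    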

    In short, please explain in more detail what you did, what exactly fails, and how. If you go back to when this issue did not happen, can you isolate at which point things start to fail and what exactly you do in that case? Then, what can you find from debugging (for instance, logging and inspecting the state after an error handler has run, or something else, depending on what actually happened)?

  • Ok so here's what I did:

    flash initial software to MCU -> test it, works

    make new software (which has basically no changes in sdk_config, mainly business logic) -> test it, works

    DFU (new software replaces old software) -> sensor crashes

    This is the call stack after attaching a debugger following the DFU:

    so I reset the sensor in debug mode, then this crash report appears:

    then I reattached the debugger and this crash report appears

    NRF_LOG is enabled and RTT is enabled on both. 

    does this help?

    I can't see what's going on in that advertising function; it crashes at this handler:

  • Juliusc said:
    I can't see what's going on in that advertising function; it crashes at this handler:

    When you write "crashes", what exactly does that mean? Can you elaborate? What do you find about the state of the device and what happened from debugging? And what do you see from the RTT log?

    Juliusc said:
    07c000 to 7ffff,

    As long as the flash is not used by anything else, it should be OK. However, even if it is not used by anything else while the application is running, it could be overwritten during a DFU. Generally, you should use flash right below the bootloader (but take into account that the FDS pages are there as well if you use FDS; in that case, use the flash right below the FDS pages), because the bootloader can be configured to stay away from this region by adjusting NRF_DFU_APP_DATA_AREA_SIZE in the bootloader's sdk_config.h. See the memory layout in the bootloader documentation.
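    A minimal sketch of that placement rule: FDS pages sit directly below the bootloader, and your own pages directly below them. The helper custom_storage_start() is hypothetical, and 0x1000 is assumed as the nRF52 flash page size.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* nRF52 flash page size: 4 kB. */
    #define FLASH_PAGE_SIZE 0x1000u

    /* Hypothetical helper: start address of your own storage pages, placed
     * directly below the FDS pages, which sit directly below the bootloader.
     * Pass fds_pages = 0 if the application does not use FDS. */
    static uint32_t custom_storage_start(uint32_t bootloader_start,
                                         uint32_t fds_pages,
                                         uint32_t custom_pages)
    {
        assert(bootloader_start % FLASH_PAGE_SIZE == 0u);
        return bootloader_start - (fds_pages + custom_pages) * FLASH_PAGE_SIZE;
    }
    ```

    With the bootloader at 0xF8000 (an assumed, typical value) and three FDS pages (FDS_VIRTUAL_PAGES = 3, assuming the default 4 kB virtual page size), one custom page would start at 0xF4000.
    
    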

  • I mean it just doesn't work anymore, and if I attach a debugger I see that it hangs in that particular state (as shown in the screenshots).

    NRF_DFU_APP_DATA_AREA_SIZE was set to 12288 in the bootloader's sdk_config.h. That should be enough as far as I can see?

  • Juliusc said:
    I mean it just doesn't work anymore, and if I attach a debugger I see that it hangs in that particular state (as shown in the screenshots).

    I see. The CPU will be doing something, though (either sleeping while waiting for an event, looping forever in an error handler, or something else). It is not just "hanging". So you should debug further to find out what the actual state is, and from there backtrack to find out why.

    Juliusc said:
    that should be enough as far as I can see?

    It should be enough, yes, but the size is not the issue here. The area counts downwards from right below the bootloader. So you need to get an overview of your memory layout and place the flash region you use for persistent storage at a sensible location, as explained in my previous post.
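    A minimal sketch of that check, assuming the protected area is exactly the NRF_DFU_APP_DATA_AREA_SIZE bytes directly below the bootloader start address. region_is_protected() is a hypothetical helper, not an SDK function, and the bootloader start of 0xF8000 used in the example is an assumption.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* nRF52 flash page size: 4 kB. */
    #define FLASH_PAGE_SIZE 0x1000u

    /* Hypothetical helper: true if [region_start, region_end] (inclusive)
     * lies entirely inside the app_data_area_size bytes directly below
     * bootloader_start, which the bootloader leaves alone during DFU. */
    static bool region_is_protected(uint32_t region_start, uint32_t region_end,
                                    uint32_t bootloader_start,
                                    uint32_t app_data_area_size)
    {
        /* The reserved size must be a whole number of pages. */
        assert(app_data_area_size % FLASH_PAGE_SIZE == 0u);
        uint32_t protected_start = bootloader_start - app_data_area_size;
        return region_start >= protected_start && region_end < bootloader_start;
    }
    ```

    With 12288 bytes (0x3000, three pages) and the bootloader at 0xF8000, the protected area is 0xF5000 to 0xF7FFF, so a region at 0x7C000 to 0x7FFFF is not protected: the size is fine, but the location is not. To protect a region starting at 0xEC000, the setting would have to be at least 0xF8000 - 0xEC000 = 0xC000 (49152) bytes.
    
    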

  • I understand what you're saying, but I can't tell you more than this:

    That's one of the halts I get (p_stack_address is 0 at this point).

    There are some other points where the application fails, but this is the latest failure I got after doing a DFU.

    Do you have another pointer on what I should look for?

    So you need to get an overview of your memory layout and place the flash region you use for persistent storage at a sensible location, as explained in my previous post.

    Understood, I will do that!

  • I see, here it is in a hardfault handler. There are several approaches you could use, but I would start by using the hardfault handler library. With that and logging you will get a printout that can be very helpful. Most example projects have the files as part of the project already, so you typically only need to set HARDFAULT_HANDLER_ENABLED to 1 in sdk_config.h.
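    For reference, the relevant entries in the application's sdk_config.h would look roughly like this (change the values of the existing entries rather than adding duplicate definitions). HARDFAULT_HANDLER_ENABLED is the setting mentioned above; the logging entries assume a typical RTT setup, and setting NRF_LOG_DEFERRED to 0 is optional: it makes log messages be processed immediately instead of being buffered until the log is flushed.

    ```c
    /* Application sdk_config.h (fragment): hardfault handler library with RTT logging */
    #define HARDFAULT_HANDLER_ENABLED 1
    #define NRF_LOG_ENABLED 1
    #define NRF_LOG_BACKEND_RTT_ENABLED 1
    /* Optional: output log messages immediately instead of buffering them */
    #define NRF_LOG_DEFERRED 0
    ```
    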

  • Yeah, but that's where I am right now: in the hardfault handler, and it faults in the nrf_log_final_flush function.

    My hardfault handler is enabled, though.

  • Hey Einar, so I changed the start and end addresses of my flash storage region to 0xec000 to 0xeffff after finding out that the DFU always worked if I didn't use the storage module.

    This now seems to work; I did some tests and so far it hasn't failed. My question now is: do you think this is a real fix or just a fluke, and should I take a closer look at something in my storage usage?

  • Hi,

    This makes sense and is good to hear. I don't know your exact memory layout, but if you have gotten an overview of it and see that the page starting at 0xec000 does not overlap with anything else, is not used by FDS, and is within the area reserved by the bootloader (which it will never touch), then this is OK.

  • Can you give me a hint on how to view my memory layout? That would probably help us both in the assessment.

  • It is not so much a matter of viewing it as of drawing it up on a piece of paper or similar, with the addresses you have for the various parts of your project. Some things are given, though.

    First, look at the memory layout in the bootloader documentation. It has numbers for various ICs, so you can use it as a starting point, and you can reuse the figure too.

    Then add the following:

    • first page (0x0): MBR
    • Starting at the second page (0x1000): the SoftDevice. Where it ends depends on the size of the SoftDevice, which you can find in the SoftDevice specification (it ends at 0x26000 for S140 6.1.1, if that is what you are using).
    • The application starts right after the SoftDevice, so at 0x26000 if using S140 6.1.1. Where it ends depends on how large it is. Look at the build output, or check the application hex in nRF Connect Programmer or some other tool.
    • The bootloader starts wherever the bootloader project you are using is defined to start in the linker configuration. For now, let's say the bootloader starts at 0xf8000 as an example (this is a typical value for SDK 15.3-based BLE bootloaders in release mode).
    • There must be two available flash pages at the end, though: one for the MBR parameters (second to last) and one for the bootloader settings (last).
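    The regions listed above can be written down as a small table and checked for overlaps. This is a minimal sketch, assuming an nRF52840 (1 MB flash, so the last two pages are 0xFE000 and 0xFF000), S140 6.1.1, and the example bootloader start of 0xF8000 with a 0x6000-byte bootloader. The application end address is hypothetical and should be taken from your own build output; flash_region_t and layout_is_consistent() are illustrative names.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* nRF52 flash page size: 4 kB. */
    #define FLASH_PAGE_SIZE 0x1000u

    /* One region of the flash memory layout; end is exclusive. */
    typedef struct {
        const char *name;
        uint32_t start;
        uint32_t end;
    } flash_region_t;

    /* Example layout, sorted by address. Application end is hypothetical. */
    static const flash_region_t layout[] = {
        { "MBR",                   0x00000u, 0x01000u },
        { "SoftDevice S140 6.1.1", 0x01000u, 0x26000u },
        { "Application",           0x26000u, 0x50000u },
        { "Bootloader",            0xF8000u, 0xFE000u },
        { "MBR parameters",        0xFE000u, 0xFF000u },
        { "Bootloader settings",   0xFF000u, 0x100000u },
    };

    /* Returns 1 if the regions are non-empty, sorted and non-overlapping. */
    static int layout_is_consistent(const flash_region_t *r, unsigned n)
    {
        for (unsigned i = 0; i < n; i++) {
            assert(r[i].start % FLASH_PAGE_SIZE == 0u); /* page-aligned start */
            if (r[i].end <= r[i].start) return 0;
            if (i > 0 && r[i].start < r[i - 1].end) return 0;
        }
        return 1;
    }
    ```

    FDS pages and any pages you write directly would be added as further entries between the application and the bootloader.
    
    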

    These are all the things that are easy to locate, and based on them you can work out the rest. If you use FDS, its pages are right below (at lower addresses than) the bootloader. The number of pages is configured in the application's sdk_config.h.

    If you don't use FDS at all in your application, you can use a page right below the bootloader for data you store directly to flash. If you do use FDS, pick a page right below the FDS pages instead. Put all this in a figure so that you see your full flash memory layout.

    And lastly, remember to update NRF_DFU_APP_DATA_AREA_SIZE in the bootloader's sdk_config.h so that the bootloader stays away from any flash you use right below it (this needs to cover both FDS and anything else). It needs to be a whole multiple of 0x1000 (one flash page), and it counts downwards from the bootloader start address. See the figure linked to earlier in this post.
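    The arithmetic for NRF_DFU_APP_DATA_AREA_SIZE can be sketched like this, assuming FDS uses fds_pages full 4 kB flash pages (FDS_VIRTUAL_PAGES with the default page size) and your own storage needs custom_bytes bytes, rounded up to whole pages. The helper app_data_area_size() is hypothetical.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* nRF52 flash page size: 4 kB. */
    #define FLASH_PAGE_SIZE 0x1000u

    /* Hypothetical helper: smallest NRF_DFU_APP_DATA_AREA_SIZE covering the
     * FDS pages plus custom_bytes of directly written data, all of it placed
     * directly below the bootloader. */
    static uint32_t app_data_area_size(uint32_t fds_pages, uint32_t custom_bytes)
    {
        /* Round the custom data up to whole pages. */
        uint32_t custom_pages = (custom_bytes + FLASH_PAGE_SIZE - 1u) / FLASH_PAGE_SIZE;
        uint32_t size = (fds_pages + custom_pages) * FLASH_PAGE_SIZE;
        assert(size % FLASH_PAGE_SIZE == 0u); /* the bootloader needs whole pages */
        return size;
    }
    ```

    For example, three FDS pages plus 100 bytes of custom data give 0x4000 (16384) bytes, and three FDS pages plus 0x4000 bytes of custom data give 0x7000 (28672) bytes.
    
    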
