I am running custom firmware on a custom board using an nRF52832 chip. After migrating my project from SDK 14.2 to SDK 15.3, my application no longer starts in two situations: after a power cycle and after a DFU.
With that being said, if I merge the .hex files of the BL, BL settings, SD, and APP and flash them onto my device using nrfjprog, my application is loaded and runs properly. If I then remove my device from power and plug it back in, the application no longer starts. Loading the same bootloader and softdevice and then performing a DFU with my application does not successfully start my application either.
I tried both above scenarios with an example app from the SDK (blinky) which I modified to run on my custom board, and everything worked correctly. I was able to perform a DFU and have the application start, and if I power cycled the device, the app restarted.
This functionality used to work fine with my app when it was using SDK 14.2; it has only broken since migrating to SDK 15.3. Is there something that has changed that I need to be aware of?
Any help with the issue would be greatly appreciated.
You may want to recompile the bootloader with optimization level 0 (so you can debug) and step through the bootloader code to see why it doesn't jump to your application after a reset. I suspect your application image may be changing (due to a flash operation?), causing the CRC32 check to fail.
Do you have any special functions in your application related to UICR or flash?
I assume your application works fine if you flash only the softdevice and the application?
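To illustrate the CRC32 concern, here is a minimal host-side sketch of why any flash write inside the application region would make a stored CRC stop matching. It uses the standard CRC-32 (reflected polynomial 0xEDB88320); the function names are hypothetical and this is not the bootloader's own code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32: reflected polynomial 0xEDB88320, initial value and final
 * XOR of 0xFFFFFFFF. Slow but small, which is fine for a host-side check. */
uint32_t crc32_image(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift out one bit; apply the polynomial if that bit was 1. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* Hypothetical helper: nonzero if the image still matches the stored CRC. */
int image_crc_matches(const uint8_t *image, size_t len, uint32_t stored_crc)
{
    assert(image != NULL);
    return crc32_image(image, len) == stored_crc;
}
```

If a single byte of the image changes after the CRC was recorded, image_crc_matches() returns 0, which is the kind of mismatch that would stop a bootloader from jumping to the application.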
Thank you for getting back to me! Unfortunately it's not just after a reset that my application does not start, it is also after a DFU.
We have no special function related to UICR. We use FDS in our application, but that is it.
Do you mean from SES? If so then yes, our application works fine if flashed from Segger with just the application and softdevice. It does not work if just the softdevice and application are flashed using nrfjprog (and then the device reset).
However, what is odd, is that if we take all of the .hex files of the bootloader, bootloader settings, softdevice, and application and merge them using mergehex and then flash them onto the device using nrfjprog and reset, the application runs and works fine. Even if I flash the .hex files each individually everything runs fine.
But if I flash just the bootloader and softdevice and then successfully perform a DFU, the bootloader does not jump to my application code. Is there a reason why it would work when I flash all the components together and not when I perform a DFU?
Could you try commenting out the functions related to ble_dfu_buttonless and check whether you still have trouble? (You can still test DFU without buttonless.)
Regarding changes in SDK v15.3, the only thing I can think of is the change in where the start address of the bootloader is stored. It is no longer stored in the UICR; by default it is now stored in the MBR (though it is still backward compatible). You can have a look at my answer here.
Unfortunately commenting out the functions related to ble_dfu_buttonless did not resolve the issue.
I think I may have narrowed down the issue even further. As previously mentioned, our custom firmware is having the issue with the app being loaded correctly after an OTA DFU and on power on. However, I took one of the example apps from the SDK and changed it to work with our hardware and that app runs correctly after an OTA DFU and after a power cycle. So I tried to look into differences in how they are loaded/handled by the bootloader.
As far as I can tell, they are handled in exactly the same way. By using breakpoints and stepping through the bootloader code, after the DFU is successfully complete, the bootloader then starts the application by calling app_start(vector_table_addr) from within nrf_bootloader_app_start_final(). In both cases, the address being passed to app_start is 0x1000, which corresponds to the start address of the softdevice.
In the case of the modified example app, after app_start() is called the app starts running successfully. In the case of our custom firmware, once app_start() is called, the app does not start running; Segger reports "Stopped by vector catch" and has the following call stack:
I then wanted to make sure that the application was being loaded into the right address (should be 0x26000). So I double checked 0x26000 in memory before and after the DFU was performed, and sure enough it was being loaded correctly.
Something else I tried was adding our application's .hex file under the loader options of the bootloader project in Segger. So if I build and run the bootloader in Segger, it loads the softdevice and our app and then runs the bootloader. Doing this successfully started our application. Again, stepping through the bootloader code, it called app_start() with the argument address being 0x1000, except this time it successfully started the application (I double checked that it was loaded at 0x26000)
So what is the difference between Segger loading the app into memory and the DFU process loading the app into memory before jumping into the softdevice code? Why does one work and the other not?
And even when the app is loaded in Segger and the bootloader starts it correctly, why is it that if it is power cycled it does not start up successfully again once plugged in?
It's normal for app_start() to jump to 0x1000. That is where the vector table of the softdevice is, and the softdevice then forwards the vector table to the application address, which is 0x26000.
One thing you need to make sure of is that your application is configured to start at 0x26000 (IROM1 start address at 0x26000). Note that the application is always located at 0x26000 because of the bootloader. If it is not built to start at 0x26000, it may not work properly (the vector table offset could be wrong).
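As a rough way to check this from a hex dump, the first two words at the application base can be sanity-checked. The sketch below assumes an nRF52832 with 512 KB flash and 64 KB RAM and relies on the Cortex-M convention that word 0 of a vector table is the initial stack pointer and word 1 is the reset handler (with the Thumb bit set). The function name and thresholds are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed memory map for illustration (nRF52832, 512 KB flash, 64 KB RAM). */
#define RAM_START 0x20000000u
#define RAM_END   0x20010000u
#define FLASH_END 0x00080000u

/* Returns true if the two words read at app_base look like the start of a
 * vector table belonging to an image linked at or above app_base. */
bool vector_table_plausible(uint32_t app_base, uint32_t initial_sp,
                            uint32_t reset_handler)
{
    assert(app_base < FLASH_END);
    bool sp_in_ram      = initial_sp > RAM_START && initial_sp <= RAM_END;
    bool thumb_bit_set  = (reset_handler & 1u) != 0;
    bool handler_in_app = reset_handler > app_base && reset_handler < FLASH_END;
    return sp_in_ram && thumb_bit_set && handler_in_app;
}
```

For example, an image linked for a lower start address would have a reset handler below 0x26000 and fail this check, while an erased region (all 0xFFFFFFFF) fails on the stack pointer.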
I would suggest doing the following:
- Confirm that the application (both release and debug versions) works without the bootloader (you may need to turn off the buttonless DFU feature).
- Strip your application down to minimal functionality, for example blinking an LED. If it works after a DFU, you can add features back until it stops working. If the application starts and the LED blinks, you can begin debugging from there, because it means the program counter is reaching the application's reset handler. You can also remove the CRC check inside the bootloader so that you can test a new image without doing a DFU; just flash it normally.
Do you have any flash or UICR operations in your application?
Hung Bui said: Strip down your application to a minimal function, for example blinking an LED. If it works after a DFU, you can start testing with more features until it stops.
The error occurs when calling fds_init(). Specifically, pages_init() returns with a value of NO_PAGES.
This, however, only occurs after bootloader code was run. If I erase all and then run only the application, pages_init() returns FRESH_INSTALL and things work normally.
From this, I have come to understand that because the bootloader writes to flash storage, my application is no longer able to initialize fds properly. How can I get around this issue?
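To make the NO_PAGES versus FRESH_INSTALL distinction concrete, here is a simplified host-side model of how a page scan could arrive at each result. It is only a sketch: the enum names, the rule that a page is usable when it is erased or carries a tag in its first word, and the tag value 0xDEADC0DE are assumptions for illustration, not the SDK's actual pages_init() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_WORDS     1024u        /* 4 KB page, as on the nRF52832 */
#define FDS_MAGIC_WORD 0xDEADC0DEu  /* assumed tag value, for illustration */

typedef enum { INIT_NO_PAGES, INIT_FRESH_INSTALL, INIT_EXISTING_DATA } init_result_t;

static bool page_is_erased(const uint32_t *page)
{
    for (size_t i = 0; i < PAGE_WORDS; i++) {
        if (page[i] != 0xFFFFFFFFu) {
            return false;
        }
    }
    return true;
}

/* Simplified model: a page is usable if it is erased or already tagged;
 * pages holding any other content (for example bootloader code) are skipped. */
init_result_t classify_pages(const uint32_t *flash, size_t page_count)
{
    assert(flash != NULL);
    size_t erased = 0, tagged = 0;
    for (size_t p = 0; p < page_count; p++) {
        const uint32_t *page = flash + p * PAGE_WORDS;
        if (page[0] == FDS_MAGIC_WORD) {
            tagged++;
        } else if (page_is_erased(page)) {
            erased++;
        }
    }
    if (tagged > 0) return INIT_EXISTING_DATA;
    if (erased > 0) return INIT_FRESH_INSTALL;
    return INIT_NO_PAGES;
}
```

Under this model, a fully erased chip gives INIT_FRESH_INSTALL, while a storage range that lands on pages already occupied by other content gives INIT_NO_PAGES, which matches the two behaviours described above.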
EDIT: To further isolate the issue, and to hopefully allow you to reproduce it on your end, I programmed and ran the secure BLE bootloader example from the SDK (with softdevice loaded) on a PCA10040 DK and then ran the flash_fds example from the SDK and encountered the same issue. If I then erased all on the device and ran the flash_fds example on its own, it functioned properly. If I loaded and ran the bootloader and then ran the flash_fds example, I would get the issue with CLI output of:
I think we are getting closer to the problem.
I suspect the change in where the bootloader start address is stored may be causing the issue on the new SDK. The reason is that fds detects where the bootloader is (or whether a bootloader exists) and then allocates its data space accordingly.
We need to check whether it allocated the area correctly. Could you add a breakpoint inside flash_subsystem_init() and check the values of end_addr and start_addr inside the flash_bounds_set() function?
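As a sketch of the allocation rule described above (storage ends at the bootloader if one is found, otherwise at the end of flash), the following host-side helper computes the expected bounds. The names are hypothetical, and the example figures of 512 KB flash and 3 pages of 4 KB are assumptions; substitute your own FDS configuration.

```c
#include <assert.h>
#include <stdint.h>

#define ADDR_UNSET 0xFFFFFFFFu  /* erased word: no bootloader address stored */

typedef struct {
    uint32_t start_addr;
    uint32_t end_addr;
} flash_bounds_t;

/* Hypothetical helper: place the storage pages directly below the bootloader,
 * or directly below the end of flash when no bootloader address is known. */
flash_bounds_t storage_bounds(uint32_t bootloader_addr, uint32_t flash_size,
                              uint32_t pages, uint32_t page_size)
{
    flash_bounds_t b;
    b.end_addr = (bootloader_addr != ADDR_UNSET) ? bootloader_addr : flash_size;
    assert(b.end_addr >= pages * page_size);
    b.start_addr = b.end_addr - pages * page_size;
    return b;
}
```

With these assumptions, a bootloader at 0x78000 gives 0x75000 to 0x78000, whereas an unset bootloader address gives 0x7D000 to 0x80000, so the values seen at the breakpoint show directly whether the bootloader address was picked up.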
Also, could you do a hex dump (nrfjprog --readcode) and check whether there is any data stored in the pages right before the bootloader (0x78000, or 0x72000 if you use the debug version of the bootloader)?
Regarding reproducing the issue with the flash_fds example, could you let me know how you tested it?
Did you flash the softdevice and bootloader, and then do a DFU update using a .zip file of the flash_fds example?
Or did you flash flash_fds using the programmer and change the bootloader settings to accept that image?
So I ran my application without loading the bootloader and the start and end addresses were 0x7D000 and 0x80000. I then ran the application with the bootloader loaded and the start and end addresses were the same. I'm assuming this is the issue since it overlaps with the bootloader's location in memory?
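The overlap question can be checked with a simple half-open range test. This is a minimal sketch with a hypothetical function name; it treats the bootloader region as running from 0x78000 to the end of a 512 KB flash, which is an assumption based on the addresses mentioned in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the half-open ranges [a_start, a_end) and [b_start, b_end) share
 * at least one address. */
bool ranges_overlap(uint32_t a_start, uint32_t a_end,
                    uint32_t b_start, uint32_t b_end)
{
    assert(a_start <= a_end && b_start <= b_end);
    return a_start < b_end && b_start < a_end;
}
```

ranges_overlap(0x7D000, 0x80000, 0x78000, 0x80000) is true, so a storage area at those addresses does collide with the bootloader region, whereas 0x75000 to 0x78000 would not.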
Also, I checked for data stored using Segger and there doesn't seem to be anything stored right before the bootloader (see image below).
As for reproducing the issue with the flash_fds example, I wish I could have tested it by performing a DFU, however a DFU package for the flash_fds example was not provided in the test images in the SDK, and since I do not have the private key of the SDK, I could not generate a DFU package on my own.
I actually tested it two ways with the same results. The first way I did it was load the bootloader project file into Segger and build and run. And then load the flash_fds project file into Segger and build and run. Also, I tried using nrfjprog to program the softdevice (with --chiperase) and then program the bootloader right after (with --reset). And then once that ran I used Segger to build and run the flash_fds example.
EDIT: I tried hardcoding the end_addr value in flash_bounds_set() to 0x78000 to see if that would resolve the issue and I got some interesting results. If I load and run the bootloader (with softdevice loaded) using nrfjprog and then use Segger to build and run my application, it now loads and runs properly! Also, if I use nrfjprog to load the softdevice, bootloader, and bootloader settings and then reset the chip, allow the bootloader to run, and then load my application and reset the chip, the application runs properly. HOWEVER, I tried only loading and running the bootloader+softdevice to then load the application using OTA DFU from the bootloader and the application did NOT start. It's also worth mentioning that in the instances where I got the application to start running properly (with the hardcoded value), the application did not restart after a power cycle.
I assume you have your own private key that you use with your bootloader? You mentioned that you can do OTA DFU with your application?
If you want to test flash_fds, you should either generate a .zip file from the hex or generate a bootloader settings page. I have attached the .zip file and the bootloader settings I used. I saw no issue with the example.
Could you please double-check exactly what flash_end_addr() returns when you have both the bootloader and the application installed?
In particular, check whether you have this inside app_util.h:
#define BOOTLOADER_ADDRESS ((*(uint32_t *)MBR_BOOTLOADER_ADDR) == 0xFFFFFFFF ? *MBR_UICR_BOOTLOADER_ADDR : *(uint32_t *)MBR_BOOTLOADER_ADDR)
The bootloader address should point to the start address of your bootloader.
I suspect that, since we no longer write to MBR_UICR_BOOTLOADER_ADDR, it could cause an issue if you don't have the correct BOOTLOADER_ADDRESS macro.
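To make the macro's fallback order explicit, here is a small host-side sketch that mirrors its ternary but takes the two stored words as plain arguments. The function and its self-check are hypothetical and only illustrate the logic.

```c
#include <assert.h>
#include <stdint.h>

#define ADDR_UNSET 0xFFFFFFFFu  /* value of an erased (never written) word */

/* Mirrors the macro: prefer the word stored in the MBR region and fall back
 * to the UICR word only when the MBR word is still erased. */
uint32_t resolve_bootloader_addr(uint32_t mbr_word, uint32_t uicr_word)
{
    return (mbr_word == ADDR_UNSET) ? uicr_word : mbr_word;
}

/* Small self-check covering the three cases of interest. */
void resolve_self_check(void)
{
    /* New layout: address stored in the MBR region, UICR left erased. */
    assert(resolve_bootloader_addr(0x78000u, ADDR_UNSET) == 0x78000u);
    /* Older layout: address only in UICR. */
    assert(resolve_bootloader_addr(ADDR_UNSET, 0x78000u) == 0x78000u);
    /* Neither written: the result stays unset, so no bootloader is seen. */
    assert(resolve_bootloader_addr(ADDR_UNSET, ADDR_UNSET) == ADDR_UNSET);
}
```

The last case is the problematic one: if both words read back as 0xFFFFFFFF, code relying on the macro behaves as if no bootloader were installed.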
Good point, I have my own set of public/private keys that I will use with the examples to test the flash_fds app.
As for the BOOTLOADER_ADDRESS macro, unfortunately I’m currently not in the office, but I looked into it a lot yesterday so will try to respond from memory to see if we can resolve this issue today.
Yes we do have that macro inside app_util.h. I believe MBR_BOOTLOADER_ADDR was 0xFF8, pointing to a value of 0xFFFFFFFF. flash_end_addr() ended up returning 0x80000 because one of the lines of code has something that resembled two _sz being multiplied together.
I’m sorry if that’s not 100% detailed or clear, I’m going off memory, but I wanted to be able to respond before you left for the day. I’m not in tomorrow and would like to get this resolved before the weekend, if possible.
I was able to be in touch with one of your colleagues in Anaheim today, and they helped me work through some of the details of this issue. Essentially the MBR_BOOTLOADER_ADDR loaded into memory location 0xFF8 was correctly set at 0x78000 when the softdevice and bootloader were loaded, but for some reason, when using Segger to load the application, the value in 0xFF8 was being overwritten to 0xFFFFFFFF.
However, that did not happen when the app was loaded via nrfjprog or via DFU (the value remained 0x78000). So that was significantly throwing off our troubleshooting. I am now working through the issue with that knowledge in mind to better isolate the main problem.
It's really strange that the application would touch the MBR area. Do you have any code or project setting that may place something in the MBR address range (0x0 to 0x1000)?
If you can provide the hex file that reproduces the problem, I can have a look.