I am running custom firmware on a custom board based on an nRF52832 chip. After migrating my project from SDK 14.2 to SDK 15.3, my application no longer starts. This happens on two occasions: after a power cycle and after a DFU.
If I merge the .hex files of the BL, BL settings, SD, and APP and flash them onto my device using nrfjprog, my application loads and runs properly. However, if I then remove power from the device and plug it back in, the application no longer starts. Loading the same bootloader and softdevice and then performing a DFU with my application does not start my application either.
I tried both scenarios above with an example app from the SDK (blinky), modified to run on my custom board, and everything worked correctly. I was able to perform a DFU and have the application start, and after a power cycle the app restarted.
This functionality worked fine with my app when it used SDK 14.2; it has only broken since migrating to SDK 15.3. Is there something that has changed that I need to be aware of?
Any help with the issue would be greatly appreciated.
You may want to recompile the bootloader with the optimization level set to 0 (so you can debug) and step into the bootloader code to see why it doesn't jump to your application after a reset. I suspect your application image may be modified (perhaps by a flash operation?), causing the CRC32 check to fail.
Does your application do anything special with the UICR or flash?
I assume your application works fine if you flash only the softdevice and the application?
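To illustrate the CRC suspicion: the bootloader records a CRC of the application when it accepts the image and checks it again at boot, so any later write into the application's own flash region makes the check fail and the bootloader refuses to start the app. Below is a minimal host-side sketch, assuming the standard CRC-32 (reflected polynomial 0xEDB88320, the zlib/Ethernet variant); `app_image_valid()` and `demo_crc_mismatch_after_write()` are hypothetical names that model the check, not SDK functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (zlib/Ethernet variant, reflected polynomial 0xEDB88320). */
uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* If the low bit is set, shift and XOR in the polynomial; else just shift. */
            crc = (crc >> 1) ^ (0xEDB88320u & (uint32_t)(0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* Hypothetical model of the boot-time check: the CRC stored when the image
 * was accepted must match the image currently in flash. */
int app_image_valid(const uint8_t *image, size_t len, uint32_t stored_crc)
{
    assert(image != NULL);
    return crc32_ieee(image, len) == stored_crc;
}

/* Simulates the suspected failure: valid right after DFU, invalid after the
 * application writes into its own region. Returns 1 if that is observed. */
int demo_crc_mismatch_after_write(void)
{
    uint8_t image[64];
    for (size_t i = 0; i < sizeof image; i++) {
        image[i] = (uint8_t)i;
    }
    uint32_t stored = crc32_ieee(image, sizeof image); /* recorded at DFU time */
    int valid_before = app_image_valid(image, sizeof image, stored);

    image[40] = 0x00; /* flash writes can only clear bits: 0x28 -> 0x00 */
    int valid_after = app_image_valid(image, sizeof image, stored);

    return valid_before && !valid_after;
}
```

A single changed byte is always detected by CRC-32, which is why even one stray flash write into the application region would be enough to stop the bootloader from starting it.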
Thank you for getting back to me! Unfortunately, my application fails to start not only after a reset but also after a DFU.
We don't do anything special with the UICR. We use FDS in our application, but nothing else.
Do you mean from SES? If so, then yes: our application works fine when just the application and softdevice are flashed from Segger. It does not work when just the softdevice and application are flashed using nrfjprog and the device is then reset.
What is odd, however, is that if we merge the .hex files of the bootloader, bootloader settings, softdevice, and application using mergehex, flash the result onto the device using nrfjprog, and reset, the application runs and works fine. Even if I flash each .hex file individually, everything runs fine.
But if I flash just the bootloader and softdevice and then successfully perform a DFU, the bootloader does not jump to my application code. Is there a reason why it would work when I flash all the components together and not when I perform a DFU?
To isolate the issue, I would suggest we focus on the observation that if you flash just the softdevice and the application with nrfjprog, the application won't start after a reset. (This does not involve the bootloader.)
We need to find out why it works with Segger but not with nrfjprog. You can read back the flash with nrfjprog --readcode and compare the resulting hex files after flashing with nrfjprog and after flashing with Segger.
You can also trim down the functionality of the application (for example, until it only blinks an LED) so we can narrow down where the problem is. You can simply put an infinite loop into your code to limit the functionality.
Then you can send us the minimal hex file so we can test it here on an nRF52 DK.
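Once the two read-back dumps exist, the comparison itself is simple. A minimal sketch, assuming both dumps have already been converted from Intel HEX to raw binary with a hex-to-binary tool of your choice (that step is not shown) and cover the same address range; `compare_flash_dumps()` is a hypothetical helper that reports how many bytes differ and where the first difference is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compares two flash dumps of equal length that both start at base_addr.
 * Returns the number of differing bytes; if any differ, *first_addr receives
 * the absolute flash address of the first mismatch. */
size_t compare_flash_dumps(const uint8_t *a, const uint8_t *b, size_t len,
                           uint32_t base_addr, uint32_t *first_addr)
{
    assert(a != NULL && b != NULL && first_addr != NULL);
    size_t diffs = 0;
    for (size_t i = 0; i < len; i++) {
        if (a[i] != b[i]) {
            if (diffs == 0) {
                *first_addr = base_addr + (uint32_t)i;
            }
            diffs++;
        }
    }
    return diffs;
}
```

The address of the first mismatch usually tells you which component differs (softdevice, application, bootloader settings, or a data page), which is more useful than a plain "files differ".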
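One way to do this trimming systematically is to run the initialization steps in order up to a limit and then stop. A minimal, hypothetical sketch: `init_stage_t`, `run_init_stages()`, the stub stages and `INIT_STAGE_LIMIT` are illustrative names, not SDK APIs. On target, each stage would wrap one of your real init calls (for example softdevice, BLE, or FDS initialization).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical init stage: returns 0 on success, non-zero on failure. */
typedef int (*init_stage_t)(void);

/* Runs stages [0, limit) in order. Returns the index of the first stage that
 * fails, or -1 if every stage up to the limit succeeds. */
int run_init_stages(const init_stage_t *stages, size_t count, size_t limit)
{
    assert(stages != NULL || count == 0);
    if (limit > count) {
        limit = count;
    }
    for (size_t i = 0; i < limit; i++) {
        if (stages[i]() != 0) {
            return (int)i;
        }
    }
    return -1;
}

/* Stub stages for host testing only. */
int stage_ok_stub(void) { return 0; }
int stage_fail_stub(void) { return -1; }

/* On target (not compiled here), the idea would be:
 *   int failed = run_init_stages(stages, n, INIT_STAGE_LIMIT);
 *   indicate `failed` with an LED, then
 *   for (;;) { }   so nothing past the limit runs.
 * Raising INIT_STAGE_LIMIT one stage at a time shows where the app stops. */
```

Raising the limit one stage at a time, and building and flashing the same way each time (DFU or direct flash), isolates the first stage that behaves differently.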
Thanks again for getting back to me. Unfortunately, I realized that my last reply contained some misleading information. I mentioned that when I use Segger my application runs fine but when I use nrfjprog it does not. The actual difference is that in Segger I was flashing the debug build and with nrfjprog I was flashing the release build. If I flash the debug build with nrfjprog, my application runs properly, and if I flash the release build using Segger, it does not work. In other words, I see the same behaviour when flashing with nrfjprog as when flashing with Segger.
Also, to clarify: when I flash the debug build, the application runs properly but does not start again after a power cycle. In fact, the application never starts after a power cycle.
Also, unfortunately I am not able to provide you with a .hex file to test on your nRF52 DK, since we are using a custom board with different pin definitions.
Lastly, I never had any of these issues in the year or more that our firmware ran on the same board with SDK 14.2. Can you think of anything that changed in SDK 15.3 that could have caused this functionality to break after the migration?
Edit: It's worth mentioning that I also tried performing a DFU with a package generated from the debug build of our application, and it had the same result as the release-mode package. In any case, after further investigation I realized that the release build threw an error because it was calling ble_dfu_buttonless_async_svci_init(), which assumes a bootloader is present, and there was none.
Are there any new bootloader-related configurations or flags we need to set in our application that are new to SDK 15.3?
Could you try commenting out the functions related to ble_dfu_buttonless and check whether you still have trouble? (You can still test DFU without buttonless.)
Regarding SDK v15.3, the only change I can think of is where the start address of the bootloader is stored. It is no longer stored in the UICR but, by default, in the MBR (it is still backward compatible). You can have a look at my answer here.
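The lookup described here can be modelled as "use the MBR word if it has been written, otherwise fall back to the UICR word". A simplified host-side sketch under that assumption: the function names are hypothetical, the two inputs stand for the raw 32-bit words read from the MBR and UICR locations (whose exact addresses are defined in the SoftDevice/MBR headers and deliberately left out here), and 0x78000 in the usage note is only an example bootloader address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FLASH_ERASED_WORD 0xFFFFFFFFu

/* Simplified model: prefer the bootloader address stored in the MBR and fall
 * back to the legacy UICR location if the MBR word is still erased. */
uint32_t resolve_bootloader_addr(uint32_t mbr_word, uint32_t uicr_word)
{
    return (mbr_word != FLASH_ERASED_WORD) ? mbr_word : uicr_word;
}

/* A bootloader is considered present if either location holds an address. */
bool bootloader_present(uint32_t mbr_word, uint32_t uicr_word)
{
    return resolve_bootloader_addr(mbr_word, uicr_word) != FLASH_ERASED_WORD;
}
```

For example, `resolve_bootloader_addr(0xFFFFFFFFu, 0x78000u)` yields 0x78000 through the UICR fallback. A presence check of this kind is also the sort of guard that would keep code such as ble_dfu_buttonless_async_svci_init() from running on a device with no bootloader flashed.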
Unfortunately commenting out the functions related to ble_dfu_buttonless did not resolve the issue.
I think I may have narrowed down the issue even further. As previously mentioned, our custom firmware fails to start after an OTA DFU and after power-on. However, I took one of the example apps from the SDK, adapted it to our hardware, and that app runs correctly after an OTA DFU and after a power cycle. So I looked into differences in how the two apps are loaded and handled by the bootloader.
As far as I can tell, they are handled in exactly the same way. Using breakpoints to step through the bootloader code, I can see that after the DFU completes successfully, the bootloader starts the application by calling app_start(vector_table_addr) from within nrf_bootloader_app_start_final(). In both cases, the address passed to app_start is 0x1000, which is the start address of the softdevice.
In the case of the modified example app, the app starts running successfully after app_start() is called. In the case of our custom firmware, once app_start() is called, the app does not start running; Segger reports "Stopped by vector catch" and shows the following call stack:
I then checked that the application was being loaded at the right address (0x26000) by inspecting memory at 0x26000 before and after the DFU, and it was indeed being loaded correctly.
Something else I tried was adding our application's .hex file under the loader options of the bootloader project in Segger, so that building and running the bootloader in Segger loads the softdevice and our app and then runs the bootloader. Doing this started our application successfully. Again, stepping through the bootloader code, it called app_start() with the address 0x1000, except this time the application started successfully (I double-checked that it was loaded at 0x26000).
So what is the difference between Segger loading the app into memory and the DFU process loading the app into memory before jumping into the softdevice code? Why does one work and the other not?
And even when the app is loaded from Segger and the bootloader starts it correctly, why does it fail to start again after a power cycle?
It's normal for app_start() to jump to 0x1000. That is where the vector table of the softdevice is, and the softdevice then forwards interrupts to the application's vector table at 0x26000.
One thing you need to make sure of is that your application is configured to start at 0x26000 (IROM1 start address at 0x26000). The bootloader always places the application at 0x26000, so if the application is not configured to start there, it may not work properly (the vector table offset could be wrong).
I would suggest the following:
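A quick way to check this on a given image is to look at the first two words at 0x26000: on a Cortex-M, word 0 is the initial stack pointer and word 1 is the reset handler address, which must have bit 0 (the Thumb bit) set. A minimal sketch, assuming the nRF52832 variant with 64 KB RAM at 0x20000000 and an application region from 0x26000 up to the bootloader start (0x78000 in the usage note is an assumed example); `app_vector_table_plausible()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* nRF52832 RAM (64 KB variant, an assumption for this sketch). */
#define RAM_START 0x20000000u
#define RAM_END   0x20010000u /* one past the last RAM byte */

/* Checks the first two entries of a Cortex-M vector table read from the
 * application start address: word 0 is the initial stack pointer, word 1 the
 * reset handler. [app_start, app_end) bounds the application's flash region. */
bool app_vector_table_plausible(const uint32_t vectors[2],
                                uint32_t app_start, uint32_t app_end)
{
    uint32_t sp = vectors[0];
    uint32_t reset = vectors[1];

    /* The initial SP must point into RAM (it may equal the top of RAM). */
    bool sp_ok = sp > RAM_START && sp <= RAM_END && (sp & 0x3u) == 0;
    /* Cortex-M executes Thumb code only, so bit 0 of the handler must be set. */
    bool thumb_ok = (reset & 0x1u) != 0;
    /* A wrongly linked image typically points its reset handler outside the app region. */
    uint32_t reset_addr = reset & ~0x1u;
    bool reset_in_app = reset_addr >= app_start && reset_addr < app_end;

    return sp_ok && thumb_ok && reset_in_app;
}
```

For instance, a table of {0x20010000, 0x00026A1D} passes with bounds 0x26000 to 0x78000, while an image linked at address 0 (reset handler near 0x00000A1D) or an erased region (all 0xFFFFFFFF) fails.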
- Confirm that the application (both release and debug versions) can work without the bootloader (you may need to turn off the buttonless DFU feature).
- Strip down your application to a minimal function, for example blinking an LED. If it works after a DFU, add features back one at a time until it stops working. If the LED blinks, the program counter is reaching the application's reset handler, so you can start debugging from there. You can also remove the CRC check inside the bootloader, so you can test a new image by flashing it normally instead of doing a DFU.
Do you have any flash or UICR operations in your application?
Hung Bui said: Strip down your application to a minimal function, for example blinking an LED. If it works after an DFU, you can start testing with more feature until it stop.
The error occurs when calling fds_init(). Specifically, pages_init() returns NO_PAGES.
However, this only occurs after the bootloader has run. If I erase all and then run only the application, pages_init() returns FRESH_INSTALL and everything works normally.
From this, I have come to understand that because the bootloader writes to flash storage, my application is no longer able to initialize fds properly. How can I get around this issue?
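Two things may help when reading back flash to investigate this. First, as I understand the SDK 15 fds.c (an assumption to verify against your SDK copy), FDS places its pages directly below the bootloader when a bootloader address is reported, and at the end of flash otherwise, so "erase all, then run the app" and "run the bootloader first, then the app" use different flash pages. Second, FDS only accepts pages that are blank or carry its own page header; a page holding anything else is unusable. The sketch below models both in simplified form. The sizes (512 KB flash, 4 KB pages, 3 FDS pages, bootloader at 0x78000) and the page-tag constants are assumptions, the function names are hypothetical, and the result names only echo FRESH_INSTALL/NO_PAGES from the thread; this is not the real pages_init() logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ERASED_WORD   0xFFFFFFFFu
#define NO_BOOTLOADER 0xFFFFFFFFu
/* Assumed page-tag values; check fds_internal_defs.h in your SDK copy. */
#define TAG_MAGIC 0xDEADC0DEu
#define TAG_DATA  0xF11E01FFu
#define TAG_SWAP  0xF11E01FEu

typedef enum { PAGE_ERASED, PAGE_FDS, PAGE_FOREIGN } page_kind_t;
typedef enum { INIT_FRESH_INSTALL, INIT_PAGES_OK, INIT_NO_PAGES } init_result_t;

/* Simplified placement model: FDS pages end at the bootloader if present,
 * otherwise at the end of flash. */
void fds_region(uint32_t bootloader_addr, uint32_t flash_size,
                uint32_t page_size, uint32_t n_pages,
                uint32_t *start, uint32_t *end)
{
    *end = (bootloader_addr != NO_BOOTLOADER) ? bootloader_addr : flash_size;
    assert(n_pages * page_size <= *end);
    *start = *end - n_pages * page_size;
}

/* Classifies a page from its first two words (simplified: real code also
 * checks that an "erased" page is blank throughout). */
page_kind_t classify_page(uint32_t word0, uint32_t word1)
{
    if (word0 == ERASED_WORD && word1 == ERASED_WORD) return PAGE_ERASED;
    if (word0 == TAG_MAGIC && (word1 == TAG_DATA || word1 == TAG_SWAP)) return PAGE_FDS;
    return PAGE_FOREIGN;
}

/* headers[i] holds the first two words of FDS page i. */
init_result_t model_pages_init(const uint32_t headers[][2], size_t n_pages)
{
    assert(n_pages > 0);
    size_t erased = 0, fds = 0;
    for (size_t i = 0; i < n_pages; i++) {
        switch (classify_page(headers[i][0], headers[i][1])) {
        case PAGE_ERASED: erased++; break;
        case PAGE_FDS:    fds++;    break;
        default:          break; /* holds someone else's data: unusable */
        }
    }
    if (erased == n_pages) return INIT_FRESH_INSTALL;
    if (erased + fds == 0) return INIT_NO_PAGES;
    return INIT_PAGES_OK;
}
```

With the assumed numbers, the pages would sit at 0x7D000 to 0x80000 without a bootloader and at 0x75000 to 0x78000 with one. Reading back the first two words of each page in the range that applies (for example with nrfjprog --memrd) and passing them to classify_page() would show whether those pages hold data FDS does not recognize.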
EDIT: To isolate the issue further, and hopefully allow you to reproduce it on your end, I programmed and ran the secure BLE bootloader example from the SDK (with the softdevice loaded) on a PCA10040 DK, then ran the flash_fds example from the SDK, and encountered the same issue. If I then erased all on the device and ran the flash_fds example on its own, it functioned properly. If I loaded and ran the bootloader and then ran the flash_fds example, I got the issue again, with the following CLI output: