Inconsistent crash handling NRF_EVT_FLASH_OPERATION_SUCCESS on first run after flash memory reset

During startup, I am encountering a new issue where the chip crashes during boot. See screenshot for crash details:

"mesh_main_start()" is called from my main application code, and invoked after the Soft Device and BLE stack have been fully initialized. From there it calls "mesh_provisionee_start()" and it's all SDK calls after that. It looks like while generating uECC keys I'm running into a Soft Device event, and things are exploding as that event is handled.

What is unusual is that p_op->p_fs->evt_handler appears to be holding a value of 0x81febaf5, which falls outside both the flash and RAM ranges. I think that may be the problem, but I don't know where the value is coming from or how to change it.
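
A quick way to sanity-check a handler value like this is to test it against the chip's memory map. The sketch below assumes an nRF52832 layout (512 KB flash at 0x00000000, 64 KB RAM at 0x20000000); those sizes and the helper names are illustrative assumptions, not taken from the thread, and should be adjusted for the actual part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed memory map (nRF52832: 512 KB flash, 64 KB RAM). Adjust per part. */
#define FLASH_START 0x00000000u
#define FLASH_SIZE  0x00080000u
#define RAM_START   0x20000000u
#define RAM_SIZE    0x00010000u

/* True if addr lies inside [start, start + size). */
static bool addr_in_range(uint32_t addr, uint32_t start, uint32_t size)
{
    assert(size != 0u);
    return addr >= start && (addr - start) < size;
}

/* A callable Cortex-M function pointer should point into flash or RAM
 * and have bit 0 set (Thumb state). Hypothetical helper, not an SDK call. */
bool handler_looks_valid(uint32_t handler)
{
    bool in_memory = addr_in_range(handler, FLASH_START, FLASH_SIZE) ||
                     addr_in_range(handler, RAM_START, RAM_SIZE);
    bool thumb_bit = (handler & 1u) != 0u;
    return in_memory && thumb_bit;
}
```

With these assumptions, `handler_looks_valid(0x81febaf5u)` is false: the address is beyond both regions, so branching to it faults regardless of the Thumb bit.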

Some additional details:
- Only happens after a full memory reset ("nrfjprog -e") then reflashing the application
- If I reset/run multiple times, eventually it gets past this crash
- If it makes it past the crash, it provisions and functions correctly
- Once it makes it past the crash once, it never happens again

Thanks!
Jeremy

  • Hi Jeremy,

    When the peer manager is initialized, it will also initialize FDS. If this is the first boot after a full erase and programming, the FDS pages will be fully erased, and FDS will write a page header to each of the FDS pages. So in this case, flash write operations during initialization are expected.

    The question is why this causes a problem in your application, and where, but that is difficult to say from the information here. Did you test with optimizations turned off when you saw that the evt_handler pointer is invalid? If not, it could be an issue with debugging and not an actual issue. Or, if the pointer really is invalid, then the question is why. Perhaps another part of the application writes to an invalid pointer, or you have a buffer overflow or similar that corrupts some memory? These are just speculations, though.

  • Thank you for the quick reply, that context helps a lot.

    Disabling optimization doesn't appear to impact the issue, but this being tied to page initialization makes sense. The number of resets I need to get it to boot isn't random: it always works on exactly the 4th start. Since I'm choking on the SUCCESS event, I think that means FDS is working, and it's just taking multiple runs to write headers to all the FDS pages.
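
The "always the 4th start" pattern is consistent with a simple model, sketched below under two assumptions that the thread does not confirm: FDS is configured with 3 virtual pages (believed to be the SDK default for FDS_VIRTUAL_PAGES), and each boot completes exactly one page-header write before the SUCCESS event reaches the bad handler and crashes. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PAGES 16

/* Returns the 1-based boot number on which no page header still needs
 * writing, i.e. the first boot that raises no flash SUCCESS event. */
int first_clean_boot(int page_count)
{
    assert(page_count > 0 && page_count <= MAX_PAGES);

    bool tagged[MAX_PAGES] = { false };  /* all pages erased after "nrfjprog -e" */

    for (int boot = 1; ; boot++) {
        /* Find the first page that still lacks a header. */
        int page = 0;
        while (page < page_count && tagged[page]) {
            page++;
        }
        if (page == page_count) {
            return boot;  /* nothing to write: no event, no crash */
        }
        /* The write itself completes, then the SUCCESS event is dispatched
         * through the invalid handler and the chip resets. */
        tagged[page] = true;
    }
}
```

Under this model, `first_clean_boot(3)` returns 4, matching the observed behavior: three boots each tag one page and crash, and the fourth finds nothing left to write.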

    Even on the 4th start, which boots successfully, breaking execution and inspecting m_fs (the nrf_fstorage_t instance from fds.c that is referenced by "p_op->p_fs") shows that 'evt_handler' is pointing to an invalid address. No events fire to expose the issue only because all FDS page headers have already been written.

    The address is non-zero again this morning (and exploding), and different from yesterday, but the "bad" value is oddly consistent between erase/run cycles. Lots of meetings today, but I'll step through my application this afternoon/tomorrow cutting things out to see if I can't find what is causing the conflict that is triggering this problem and update the thread.

  • I worked backwards until my application was stripped down to almost an exact match of an example project that was working, and I was still having the error ... until I matched the preprocessor definitions.

    Adding "INITIALIZE_USER_SECTIONS" to my definitions appears to have fixed my issue. With that change I was able to revert all the other changes and still maintain a valid FDS evt_handler pointer.

    Does that sound like the correct fix? My issue appears to be a bit sticky, I've thrown hundreds of darts at the wall today, I don't fully understand what that define accomplishes, and want to make sure I didn't just get lucky.

    Thanks!

  • Hi Jeremy,

    Yes, good catch! When you use Segger Embedded Studio and section variables (which are used by quite a few SDK modules, including fstorage, which in turn is used by FDS), you need to have INITIALIZE_USER_SECTIONS defined. That is why you see it defined in all Segger Embedded Studio example projects in the SDK.
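
As a rough model of why the define matters: section variables have their initial values stored in flash and must be copied into their RAM location by startup code before main() runs. The sketch below simulates that copy on a host machine. It assumes the define enables such a copy step; `fs_instance_t` is a hypothetical stand-in for nrf_fstorage_t, and none of the names are the actual SDK or Segger startup code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef void (*evt_handler_t)(void);

/* Hypothetical stand-in for nrf_fstorage_t, reduced to the one field at issue. */
typedef struct {
    evt_handler_t evt_handler;
} fs_instance_t;

static void fs_evt_handler(void) { /* would process flash events */ }

/* Initial values as the linker would store them in flash (the load image). */
static const fs_instance_t load_image = { .evt_handler = fs_evt_handler };

/* Models the startup step that copies a user section from flash into RAM.
 * If this never runs, the RAM copy keeps whatever bytes were already there. */
void init_user_section(fs_instance_t *run_copy)
{
    memcpy(run_copy, &load_image, sizeof *run_copy);
}

/* True if the RAM copy holds the initial values the linker intended. */
bool section_is_initialized(const fs_instance_t *run_copy)
{
    return memcmp(run_copy, &load_image, sizeof *run_copy) == 0;
}
```

In this model, a RAM copy filled with leftover bytes fails `section_is_initialized()` until `init_user_section()` has run, which would also explain why the bad evt_handler value stayed the same between erase/run cycles.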

  • That makes perfect sense. I started with a Mesh example and have combined it with the ability to establish BLE GATT connections (necessitating the peer manager) from the nRF5 SDK and missed that define when porting things over.

    Thanks!
