Inconsistent crash handling NRF_EVT_FLASH_OPERATION_SUCCESS on first run after flash memory reset

I am encountering a new issue where the chip crashes during boot. See screenshot for crash details:

"mesh_main_start()" is called from my main application code after the SoftDevice and BLE stack have been fully initialized. From there it calls "mesh_provisionee_start()", and it's all SDK calls after that. It looks like I'm running into a SoftDevice event while generating uECC keys, and things are exploding as that event is handled.

What is unusual is that p_op->p_fs->evt_handler appears to hold the value 0x81febaf5, which falls outside both the flash and RAM ranges, so I think that may be the problem, but I don't know where it's coming from or how to change it.
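
For reference, the range check described above can be sketched in a few lines of C. This is only an illustration: the flash and RAM sizes are assumptions for an nRF52832-class part (the thread does not name the chip), and the helper names are hypothetical, not SDK functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed memory map. These sizes are hypothetical (nRF52832-class part);
 * take the real values from the device datasheet or linker script. */
#define FLASH_START 0x00000000u
#define FLASH_SIZE  0x00080000u /* 512 KB, assumption */
#define RAM_START   0x20000000u
#define RAM_SIZE    0x00010000u /* 64 KB, assumption */

/* True if addr lies in [start, start + size). */
static bool addr_in_range(uint32_t addr, uint32_t start, uint32_t size)
{
    return addr >= start && (addr - start) < size;
}

/* A Cortex-M function pointer should point into flash or RAM and have
 * bit 0 set (Thumb state). Anything else cannot be a valid handler. */
static bool handler_addr_is_plausible(uint32_t addr)
{
    bool thumb_bit_set = (addr & 1u) != 0u;
    bool in_code_memory = addr_in_range(addr, FLASH_START, FLASH_SIZE) ||
                          addr_in_range(addr, RAM_START, RAM_SIZE);
    return thumb_bit_set && in_code_memory;
}
```

With these assumed ranges, handler_addr_is_plausible(0x81febaf5u) returns false: the Thumb bit is set, but the address lies in neither region.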

Some additional details:
- Only happens after a full memory reset ("nrfjprog -e") then reflashing the application
- If I reset/run multiple times, eventually it gets past this crash
- If it makes it past the crash, it provisions and functions correctly
- Once it makes it past the crash once, it never happens again

Thanks!
Jeremy

  • Hi Jeremy,

    When the peer manager is initialized, it will also initialize FDS. If this is the first boot after a full erase and programming, the FDS pages will be fully erased, and FDS will write a page header to each of the FDS pages. So in this case, flash write operations during initialization are expected.

    The question is why this causes a problem in your application, and where, but that is difficult to say from the information here. Did you test with optimizations turned off when you saw that the evt_handler pointer is invalid? If not, it could be an issue with debugging and not an actual issue. Or, if this is indeed the case, then the question is why. Perhaps another part of the application writes to an invalid pointer, or you have a buffer overflow or similar that corrupts some memory? These are just speculations, though.
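
One way to narrow down when such a corruption happens is to snapshot the handler pointer right after initialization and compare it at later checkpoints. The following is a minimal sketch of that idea, not SDK code: storage_instance_t is a hypothetical stand-in that models only the evt_handler field, and all function names are made up for illustration. A debugger data watchpoint on the field would serve the same purpose.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the storage instance; only the field of
 * interest is modelled here. */
typedef void (*evt_handler_t)(void *p_evt);

typedef struct {
    evt_handler_t evt_handler;
} storage_instance_t;

/* Remembers which instance is watched and what its handler was. */
typedef struct {
    const storage_instance_t *p_instance;
    evt_handler_t expected;
} handler_guard_t;

/* Placeholder handler so the sketch is self-contained. */
static void example_handler(void *p_evt)
{
    (void)p_evt;
}

/* Call once, right after the instance has been initialized. */
static void handler_guard_arm(handler_guard_t *p_guard,
                              const storage_instance_t *p_instance)
{
    p_guard->p_instance = p_instance;
    p_guard->expected = p_instance->evt_handler;
}

/* Call at checkpoints; false means the field changed since arming. */
static bool handler_guard_intact(const handler_guard_t *p_guard)
{
    return p_guard->p_instance->evt_handler == p_guard->expected;
}
```

Placing calls to handler_guard_intact() between initialization steps and halving the interval each time it fails would bracket the code that overwrites the field.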

  • Thank you for the quick reply, that context helps a lot.

    Disabling optimization doesn't appear to impact the issue, but this being tied to page initialization makes sense. The "multiple" times I need to reset to get it to boot isn't random, it always works on exactly the 4th start. Since I'm choking on the SUCCESS event, I think that means FDS is working, it's just taking multiple runs to write headers to all the FDS pages.

    Even on the 4th start, which starts up successfully, breaking execution and inspecting m_fs (the nrf_fstorage_t instance from fds.c that is referenced by "p_op->p_fs") shows 'evt_handler' pointing to an invalid address. No events fire to expose the issue, since all FDS page headers have already been written.

    The address is non-zero again this morning (and exploding), and different from yesterday, but the "bad" value is oddly consistent between erase/run cycles. Lots of meetings today, but I'll step through my application this afternoon/tomorrow cutting things out to see if I can't find what is causing the conflict that is triggering this problem and update the thread.
