SoftDevice hard fault while porting to mesh

Using S132 v7.2, I'm getting a hard fault at PC 0x00015810 (I see the flag set in SEGGER, and the console outputs a SOFTDEVICE ASSERT FAILED error) that I'm working on manually tracing. I'm borrowing some code from another project of ours to get this one set up, and so the problem is likely something to do with initial mesh setup and initialization that is then causing the later hard fault. I'm hoping I can get the actual assert error information to better pinpoint the problem, because it's been a bit of a wild goose chase up to now.

  • Hi,

    This assert is most likely the mesh stack failing to give the radio back to the SoftDevice in time (i.e. overstaying the timeslot.)

    First thing to check would be interrupt priorities: Have you configured the mesh stack correctly? Are you calling any of the mesh stack API from the wrong context? Also, are you by any chance using timeslots yourself, or disabling interrupts anywhere?

    Regards,
    Terje

  • Many thanks, Terje. I'll look into those and make sure they're all set up correctly.

  • Terje, here's a screenshot of the details I pulled from Ozone. I enabled every fault handler that I could that still reproduced the original error (the UNALIGN_TRP is caught several times if I enable it, but the primary HardFault isn't generated), and I find myself a bit confused, because as best as I can tell, I'm getting an unspecified HardFault that's not a Usage, Memory, or BusFault. I'm digging through the refs you just sent - to clarify, this is a project currently using the nRF5 SDK that's being ported to exclusively use mesh. It also includes FreeRTOS, which has definitely complicated things.

    Screenshot of hardFault exception details from Ozone
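
A HardFault that shows no UsageFault, MemManage or BusFault cause can often be narrowed down from the HFSR and CFSR values shown in the debugger. The sketch below is only an illustration under assumptions: the bit positions are the ARMv7-M (Cortex-M4) ones, while the helper name `classify_hardfault` and the returned strings are made up for this example and are not part of any SDK.

```c
#include <stdint.h>

/* Bit positions from the ARMv7-M architecture (Cortex-M4 SCB registers). */
#define HFSR_VECTTBL    (1u << 1)    /* fault while reading the vector table */
#define HFSR_FORCED     (1u << 30)   /* a configurable fault was escalated   */
#define HFSR_DEBUGEVT   (1u << 31)   /* debug event while debug is disabled  */
#define CFSR_MMFSR_MASK 0x000000FFu  /* MemManage fault status byte          */
#define CFSR_BFSR_MASK  0x0000FF00u  /* BusFault status byte                 */
#define CFSR_UFSR_MASK  0xFFFF0000u  /* UsageFault status halfword           */

/* Hypothetical helper: classify HFSR/CFSR values copied from the debugger. */
const char *classify_hardfault(uint32_t hfsr, uint32_t cfsr)
{
    if (hfsr & HFSR_VECTTBL)  return "vector table read fault";
    if (hfsr & HFSR_DEBUGEVT) return "debug event";
    if (hfsr & HFSR_FORCED) {
        /* An escalated fault leaves its cause in the CFSR sub-registers. */
        if (cfsr & CFSR_MMFSR_MASK) return "escalated MemManage fault";
        if (cfsr & CFSR_BFSR_MASK)  return "escalated BusFault";
        if (cfsr & CFSR_UFSR_MASK)  return "escalated UsageFault";
        return "forced, but no CFSR bits set";
    }
    return "no HardFault cause recorded";
}
```

For example, `classify_hardfault(0x40000000, 0x01000000)` reports an escalated UsageFault, since bit 24 of CFSR is the UNALIGNED flag.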

  • Hi Terje,

    I work with and I am starting to look into this issue with him.

    A few notes so far on your previous questions/suggestions:

    1. We have upgraded multiple other projects successfully to mesh, and none of them have had this problem.

    2. Yes, we are using FreeRTOS in our projects, and yes, that does complicate things quite a bit. We have some of the mesh SDK and nRF5 SDK files patched to support it, and we have extensive library code that simplifies the setup and running of the core code that we share between our projects. We have mesh event polling and SD event processing running sequentially on the same thread, and modifications to app_timer and other files to make things play nice with FreeRTOS. We also have mutexes and other guards preventing certain asynchronous accesses to the mesh stack, to avoid specific crashes we have seen arise from the multi-threading. At this point we have mesh operating very well with FreeRTOS on our other projects, so it is unlikely that it is the direct cause of the current issue on this project.

    3. As far as interrupt priorities go, we should not be using anything different from what is working in the other projects. As far as disabling interrupts goes, that is a good question. I don't think so, but we can look into that more.

    I have been doing my own testing on this issue, and so far I haven't been able to track down the exact cause of the fault either. I do know that if I don't start mesh (mesh_stack_start()), it doesn't crash. Also, the fault does not happen immediately: it happens around 12 seconds after startup, so it isn't happening during the mesh start or other initialization. This timing is very consistent, and doesn't seem to coincide with anything specific in our code (that I know of, anyway).

    Is there a way to get more information about the nature of the ASSERT in the SoftDevice?

    Is there any particular event that happens at about 12 seconds (a timeout or something) that could give us a clue? Either in the SoftDevice, or the mesh stack?

    Thanks,
    Kevin

  • A SoftDevice assertion isn't necessarily a HardFault (in fact it almost certainly isn't, because HardFaults come through a separate handler, I believe). It just means that the SD error checking has caught a state it doesn't think should happen, so it propagates up to the application fault handler that was registered on initialization.
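
To illustrate the point above: in the nRF5 SDK the application fault handler has the shape `app_error_fault_handler(uint32_t id, uint32_t pc, uint32_t info)`, and the `id` argument is what separates a SoftDevice assert from an application or SDK error. The sketch below is a stand-alone approximation, not SDK code: the `FAULT_ID_*` constants are stand-ins that I believe mirror the `NRF_FAULT_ID_*` values in the SDK headers (please verify against your SDK version), and `fault_id_name` is a hypothetical helper.

```c
#include <stdint.h>

/* Stand-in values assumed to mirror NRF_FAULT_ID_* in nrf_sdm.h and
 * app_error.h; check them against the headers of your SDK version. */
#define FAULT_ID_SD_ASSERT   0x00000001u  /* SoftDevice internal assert   */
#define FAULT_ID_APP_MEMACC  0x00001001u  /* invalid memory access by app */
#define FAULT_ID_SDK_ERROR   0x00004001u  /* APP_ERROR_CHECK failure      */
#define FAULT_ID_SDK_ASSERT  0x00004002u  /* ASSERT() failure in SDK code */

/* Hypothetical helper: name the fault source reported in the `id`
 * argument of the application fault handler. */
const char *fault_id_name(uint32_t id)
{
    switch (id) {
    case FAULT_ID_SD_ASSERT:  return "SoftDevice assert";
    case FAULT_ID_APP_MEMACC: return "application memory access violation";
    case FAULT_ID_SDK_ERROR:  return "SDK error check";
    case FAULT_ID_SDK_ASSERT: return "SDK assert";
    default:                  return "unknown fault id";
    }
}
```

As far as I know, for a SoftDevice assert only `pc` carries useful information (here 0x00015810), which is why the address is what Nordic asks for when looking up the failing check.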

  • We were able to resolve this issue by altering the setting of NRF_SDH_CLOCK_LF_ACCURACY from 7 down to 4 to loosen up timing requirements.

  • Hi,

    Thank you for the update. Yes, that could actually be the issue. If the configured clock accuracy is better than what your hardware actually delivers, that could very well lead to issues. As noted by @kevin-rees, this is not a HardFault, but rather the SoftDevice asserting. (In this instance the SoftDevice detected that timing requirements were not met, jeopardizing Bluetooth specification compliance.)

    You might want to look into that accuracy. Depending on LF clock source, check that you are using the correct:

    Regards,
    Terje
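
For readers wondering what the values 7 and 4 mean: NRF_SDH_CLOCK_LF_ACCURACY takes an index into the SoftDevice's `NRF_CLOCK_LF_ACCURACY_*` list, not a ppm figure. The sketch below reproduces that mapping as I understand it from `nrf_sdm.h` (treat the table as an assumption and check it against your SoftDevice headers); the function names and the crystal tolerance in the usage note are hypothetical.

```c
#include <stdint.h>

/* ppm for NRF_CLOCK_LF_ACCURACY_* indices 0..11, as assumed from nrf_sdm.h. */
static const uint16_t lf_accuracy_ppm[] = {
    250, 500, 150, 100, 75, 50, 30, 20, 10, 5, 2, 1
};

/* Hypothetical helper: translate the config index to ppm, or -1 if invalid. */
int lf_accuracy_to_ppm(uint32_t setting)
{
    if (setting >= sizeof(lf_accuracy_ppm) / sizeof(lf_accuracy_ppm[0])) {
        return -1;
    }
    return (int)lf_accuracy_ppm[setting];
}

/* Hypothetical check: the declared accuracy must not be tighter (a smaller
 * ppm figure) than what the LF clock source actually guarantees. */
int lf_accuracy_is_safe(uint32_t setting, int actual_ppm)
{
    int declared = lf_accuracy_to_ppm(setting);
    return declared >= 0 && declared >= actual_ppm;
}
```

Under this mapping, 7 declares a 20 ppm clock and 4 declares 75 ppm, so with (say) a 50 ppm crystal, `lf_accuracy_is_safe(7, 50)` is false while `lf_accuracy_is_safe(4, 50)` is true.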
