
app_error_fault_handler() called with PC at 0x128A0

Given this error handler function...

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

and this logging from it with the Segger attached...

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

how do I debug this?

I presume the info passed in is bogus and just points to some random part of memory.

75936 is 0x128A0 in hex, and that address is inside the SoftDevice.
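As a rough illustration of that reasoning, the check below classifies a reported PC as "inside the SoftDevice" when it falls between the end of the MBR and the start of the application. This is only a sketch: `pc_is_in_softdevice` is a hypothetical helper, the 0x1000 MBR boundary is an assumption about the usual nRF52 flash layout, and the application start address depends on the SoftDevice version, so it has to come from your own linker script.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the MBR occupies flash below 0x1000 and the SoftDevice
 * sits between the MBR and the application start address. */
#define MBR_END_ADDR 0x00001000u

/* Hypothetical helper, not an SDK function.
 * app_start must be taken from the linker script for the SoftDevice
 * version actually in use; no particular value is implied here. */
static bool pc_is_in_softdevice(uint32_t pc, uint32_t app_start)
{
    return (pc >= MBR_END_ADDR) && (pc < app_start);
}
```

Calling `pc_is_in_softdevice(0x128A0u, app_start)` with the application start address from your build tells you whether the fault PC belongs to your own code or to the SoftDevice.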

My code is not doing anything in particular when it hits this. It tends to happen out of the blue. The nRF52832 is not in a BLE connection at the time but it is advertising. It's listening to a GPS module over SPI, running a bunch of timers, and that's about it.

Also, NRF_LOG_FINAL_FLUSH() doesn't appear to work, because I frequently see this while the Segger is attached, testing on the desk, but I never see it logged to my in-memory log in field testing.

[Edit]

Here are three log "files" to demonstrate my logging problem.

This one is a Segger RTT log taken from desk testing with the debugger attached. I get the error logging for the assert that stops execution just fine.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

This one is taken from my in-RAM log generated during field testing, but read back once I'm back at the desk with a tiny pynrfjprog Python script that just converts a RAM region to ASCII. I'm not worried about the PANIC at the end - that's just the panic function in my log backend being called as the Nordic goes into sys off. This is an example of a "good" in-RAM log for me - I can see everything up to the last sys off or assert.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

And this one is taken from my in-RAM log again, same as the second one, but this one is "bad" because there's no sys off at the end and no assert. I have some normal debug logging from my GPS driver and then a lot of memory that looks like it never got zeroed out when the in-RAM log was initialised. I can't tell from this what the software died of. I suspect an error, because the device isn't doing anything, but I can't tell what happened.

Note that my in-RAM log is a circular buffer so it'll wrap around to the beginning, which is why this one doesn't start with the same init stuff.
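For readers unfamiliar with this kind of log, here is a minimal sketch of a wrapping in-RAM log, not the actual backend used in this post. All names (`ram_log_t`, `ram_log_write`, `ram_log_read`) and the tiny buffer size are made up for illustration. Keeping the write index and a "wrapped" flag next to the data lets a readback tool reorder the bytes oldest-first and tell stale bytes apart from fresh ones.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RAM_LOG_SIZE 16u /* deliberately tiny, for illustration only */

/* Hypothetical layout: indices stored alongside the data so that a
 * readback script can find the newest byte after a wrap. */
typedef struct {
    uint32_t head;               /* next write position */
    uint32_t wrapped;            /* nonzero once the buffer has wrapped */
    char     data[RAM_LOG_SIZE];
} ram_log_t;

static void ram_log_init(ram_log_t *log)
{
    memset(log, 0, sizeof(*log));
}

static void ram_log_write(ram_log_t *log, const char *text, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        log->data[log->head] = text[i];
        log->head++;
        if (log->head == RAM_LOG_SIZE) {
            log->head = 0;       /* wrap to the beginning */
            log->wrapped = 1;
        }
    }
}

/* Copies the log oldest-first into out, which must hold RAM_LOG_SIZE
 * bytes. Returns the number of valid bytes copied. */
static size_t ram_log_read(const ram_log_t *log, char *out)
{
    size_t n = 0;
    if (log->wrapped) {
        /* The oldest surviving data starts at head and runs to the end. */
        for (uint32_t i = log->head; i < RAM_LOG_SIZE; i++) {
            out[n++] = log->data[i];
        }
    }
    for (uint32_t i = 0; i < log->head; i++) {
        out[n++] = log->data[i];
    }
    return n;
}
```

With this layout, anything past `head` in a buffer that has not yet wrapped is known to be unwritten, which is one way to distinguish leftover memory from real log output when reading the region back.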

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

[Edit]

Here's my error handling code.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

The build I'm testing with is a BUILD_FIELD_TEST.

  • Hi,

    That is pretty strange. If you give me the version number of your SoftDevice, I can take a look under the hood and see what is going on at that specific address.

    How frequently does it occur?

    And what SDK are you using?

  • Thanks.

    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

    I've seen it about four times in the last month, but I can't reliably repro it. Happy to send the hex file if it helps.

  • Hi,

    Thanks. I found the source of the assert. The SoftDevice includes what we call a Radio Event Manager (REM), and an assert at 0x128A0 is the REM warning you that a radio event has overstayed its planned duration. This is more likely to happen in complex applications where you have multiple concurrent BLE links, have multiple radio protocols running concurrently, or are using the Timeslot API to allocate time slots to time-sensitive peripheral operations. You say that you are not in a connection and just advertising, but do you use the Timeslot API? Are there any other time-sensitive things going on at the time?

    Another possibility is that your clock is not accurate enough. If you are not doing anything time-critical as discussed above, this becomes the more likely explanation. It could be, for example, that your crystal is incorrectly loaded. Are you using a custom device or a development kit?

     

  • In that case I may be wrong about not being in a connection. I'm only ever in one connection at a time but I do a lot of logging stuff over a characteristic a bit like the Nordic UART service's RX characteristic (notify only). I don't know if that would have the radio busy enough to hit this, but I guess it's possible. I don't use the timeslot API.

    I'm using a module from Wisol so I haven't chosen the crystal myself. I could ask them what they used.

    It would help if I could get reliable logging in my app_error_fault_handler() function. At the moment I really only know this is happening on the desk because the debugger breaks in the error handler. In the field, I'd expect to see the same error logging in my in-memory log as I do in the gdb client, but instead the log just stops. My in-memory log is in retained RAM, so it survives a reset.

  • Do you have more than one module that you can try? Maybe this one particular module is an outlier with a broken crystal, a bad solder joint, or something else causing the clock to run inaccurately. If, on the other hand, the issue occurs on all modules, I would lean towards a firmware issue.

    What is the serial number of the module?
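On the request above for reliable logging from app_error_fault_handler(): one common approach, sketched here under assumptions, is to avoid the log backend entirely inside the handler and instead store a small fixed-size fault record in retained RAM, then read it back on the next boot. The names `fault_record_t`, `fault_record_save` and `fault_record_take` are hypothetical, as is the magic value. On target, the record would also need to be placed in a RAM section that startup code does not zero (for example a no-init section, depending on your linker script); that placement is not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

#define FAULT_RECORD_MAGIC 0xFA17C0DEu /* arbitrary marker value */

/* Hypothetical record; the fields mirror the three arguments of
 * app_error_fault_handler(uint32_t id, uint32_t pc, uint32_t info). */
typedef struct {
    uint32_t magic;
    uint32_t id;
    uint32_t pc;
    uint32_t info;
    uint32_t check; /* simple XOR check over the fields above */
} fault_record_t;

/* Assumption: on target this variable lives in retained, non-zeroed RAM. */
static volatile fault_record_t g_fault_record;

static uint32_t fault_record_checksum(uint32_t id, uint32_t pc, uint32_t info)
{
    return FAULT_RECORD_MAGIC ^ id ^ pc ^ info;
}

/* Intended to be the first thing the fault handler calls. */
static void fault_record_save(uint32_t id, uint32_t pc, uint32_t info)
{
    g_fault_record.id    = id;
    g_fault_record.pc    = pc;
    g_fault_record.info  = info;
    g_fault_record.check = fault_record_checksum(id, pc, info);
    g_fault_record.magic = FAULT_RECORD_MAGIC; /* written last */
}

/* Intended for early boot: returns true and fills *out if a valid
 * record was left by a previous fault. The record is consumed. */
static bool fault_record_take(fault_record_t *out)
{
    if (g_fault_record.magic != FAULT_RECORD_MAGIC) {
        return false;
    }
    uint32_t id    = g_fault_record.id;
    uint32_t pc    = g_fault_record.pc;
    uint32_t info  = g_fault_record.info;
    uint32_t check = g_fault_record.check;
    g_fault_record.magic = 0; /* report each fault only once */
    if (check != fault_record_checksum(id, pc, info)) {
        return false; /* leftover or corrupted memory */
    }
    out->magic = FAULT_RECORD_MAGIC;
    out->id    = id;
    out->pc    = pc;
    out->info  = info;
    out->check = check;
    return true;
}
```

The idea is that a few word-sized stores are much less likely to fail in a fault context than a full log flush, and the record can then be written into the normal in-RAM log once the system is running again after the reset.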
