
malloc causing fault in SoftDevice

I'm using s140 on an nRF52840 and running into a hard fault when trying to malloc 20 bytes. (Yes, I know about memory fragmentation; it's not a concern here.)

app_error_fault_handler gives me a program counter (pc) of 0x1505C and a fault ID of 1 (and info of 0).

The SoftDevice handler (sdh) has been enabled, but Bluetooth has not.

Given that I can find no info on what the SoftDevice is asserting on, I'm at a bit of a loss.
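As a debugging aid, the fault ID passed to the handler can be mapped to a readable name. The sketch below is a minimal, hedged example: the numeric values are mirrored from my reading of the SoftDevice and SDK headers (`nrf_sdm.h`, `app_error.h`) and should be checked against your own SDK copy, and `fault_id_to_str` is a hypothetical helper, not an SDK function. Under those assumptions, an ID of 1 would correspond to a SoftDevice assertion.

```c
#include <stdint.h>

/* Assumed values, copied by hand from the SDK headers; verify locally. */
#define FAULT_ID_SD_ASSERT   0x00000001u  /* NRF_FAULT_ID_SD_ASSERT  */
#define FAULT_ID_APP_MEMACC  0x00001001u  /* NRF_FAULT_ID_APP_MEMACC */
#define FAULT_ID_SDK_ERROR   0x00004001u  /* NRF_FAULT_ID_SDK_ERROR  */
#define FAULT_ID_SDK_ASSERT  0x00004002u  /* NRF_FAULT_ID_SDK_ASSERT */

/* Hypothetical helper: turn a fault ID into a short description. */
const char *fault_id_to_str(uint32_t id)
{
    switch (id) {
    case FAULT_ID_SD_ASSERT:  return "SoftDevice assertion";
    case FAULT_ID_APP_MEMACC: return "application invalid memory access";
    case FAULT_ID_SDK_ERROR:  return "SDK error (APP_ERROR_CHECK)";
    case FAULT_ID_SDK_ASSERT: return "SDK assertion";
    default:                  return "unknown fault";
    }
}
```

Calling `fault_id_to_str(id)` from inside the fault handler and logging the result alongside `pc` and `info` makes it easier to tell SoftDevice faults from application ones.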

  • Hi,

    Which SDK and softdevice version are you using? Are you basing this on an SDK example?

    Can you post a code snippet of how you use malloc? 

    Best regards,
    Jørgen

  • I'm using the SDK v16 with the s140. I'm very loosely basing this on the flash_fds example.

    My code is building an in-memory index of what's being stored via FDS. The actual call is:

    pattern_index_t volatile *node = malloc(sizeof(pattern_index_t));

    where

    typedef volatile struct pattern_index_t {
        struct pattern_index_t volatile *p, *n;
        uint16_t record_key;
        uint32_t record_id;
        pattern_t *p_pattern;
    } pattern_index_t;

    It's part of a function that builds an index node after `fds_record_write_reserved` returns successfully.

    If I break out `sizeof(pattern_index_t)` into its own variable, I still get the error.

    I should also add I am calling malloc (& calloc) a couple other places successfully.
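For reference, a minimal sketch of allocating and linking an index node of this shape, with the allocation result checked before use. This is not the poster's actual function: `index_push_front` is a hypothetical name, `pattern_t` is left opaque, and the `volatile` qualifiers are dropped to keep the example short.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct pattern pattern_t;  /* opaque stand-in for the real type */

typedef struct pattern_index_t {
    struct pattern_index_t *p, *n;  /* previous / next node */
    uint16_t record_key;
    uint32_t record_id;
    pattern_t *p_pattern;
} pattern_index_t;

/* Hypothetical helper: allocate a node and link it at the head of the list.
 * Returns the new node, or NULL if the allocation failed. */
pattern_index_t *index_push_front(pattern_index_t **head, uint16_t key,
                                  uint32_t id, pattern_t *pattern)
{
    pattern_index_t *node = calloc(1, sizeof *node);
    if (node == NULL) {
        return NULL;  /* out of heap: leave the list untouched */
    }
    node->record_key = key;
    node->record_id  = id;
    node->p_pattern  = pattern;
    node->n = *head;          /* old head becomes our successor */
    if (*head != NULL) {
        (*head)->p = node;    /* and we become its predecessor  */
    }
    *head = node;
    return node;
}
```

Using `sizeof *node` instead of naming the type keeps the allocation size tied to the pointer's type if the struct is renamed later.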

  • Some unrelated bug-fixing has caused this problem to miraculously fix itself.

    Our best guess is the softdevice somehow got corrupted and various subsequent wipes & re-flashes solved the issue.

  • And I spoke too soon, because the issue is back, but only the 2nd time the function is called via that particular code path. The same function is called via a different code path (during init) repeatedly without issue.

    Also, the error consistently occurs on this codepath if I'm stepping through w/ a debugger.

  • I figured out the root cause of these hard faults, but I'm at a loss to explain why they were causing an assertion error in the softdevice.

    My application takes variable-length commands via USB serial and parses (application-level) headers before reading more data based on the headers. There's a state variable that keeps track of what's been parsed, and I had missed a state transition. This led to the payload data being shifted further into the input buffer by the length of a header. This wouldn't cause any overflow, because the input buffer is much larger than these particular messages.

    The part that perplexes me is that nothing was interacting with that corrupt data, aside from copying it to a temporary buffer, and then writing it to flash. In the struct for that data, the values are just numbers. If they were being used as a pointer, I could see how that could lead to errors by trying to access unavailable memory, but that wasn't the case here.

    The only other thing I can think of is data languishing in the CDC ACM internal 'in' buffer, but all of this code was ultimately being called from the same `cdc_acm_user_evt_handler`.
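The kind of header-then-payload parser described above can be sketched as a small state machine. This is an illustrative sketch only, not the poster's code: the two-byte header layout (type, payload length), `MAX_PAYLOAD`, and all names are assumptions. The point it shows is the return to the header state once a payload completes, which is the sort of transition that is easy to miss.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HEADER_LEN   2u   /* assumed layout: [type, payload length] */
#define MAX_PAYLOAD 64u   /* assumed upper bound on payload size    */

typedef enum { PARSE_HEADER, PARSE_PAYLOAD } parse_state_t;

typedef struct {
    parse_state_t state;
    uint8_t header[HEADER_LEN];
    uint8_t payload[MAX_PAYLOAD];
    size_t  have;  /* bytes collected for the current stage   */
    size_t  want;  /* payload bytes announced by the header   */
} parser_t;

void parser_init(parser_t *p)
{
    memset(p, 0, sizeof *p);
    p->state = PARSE_HEADER;
}

/* Feed one byte. Returns 1 when a complete message is available in
 * p->payload (p->want bytes), 0 if more data is needed, -1 on a bad header. */
int parser_feed(parser_t *p, uint8_t byte)
{
    switch (p->state) {
    case PARSE_HEADER:
        p->header[p->have++] = byte;
        if (p->have < HEADER_LEN) {
            return 0;
        }
        p->have = 0;
        p->want = p->header[1];
        if (p->want > MAX_PAYLOAD) {
            return -1;               /* reject; stay in PARSE_HEADER */
        }
        if (p->want == 0) {
            return 1;                /* empty message, next byte is a header */
        }
        p->state = PARSE_PAYLOAD;
        return 0;

    case PARSE_PAYLOAD:
        p->payload[p->have++] = byte;
        if (p->have < p->want) {
            return 0;
        }
        p->have  = 0;
        p->state = PARSE_HEADER;     /* the transition that must not be missed */
        return 1;
    }
    return -1;
}
```

Without the reset to `PARSE_HEADER`, the next message's header bytes would be treated as payload, which matches the shifted-data symptom described in the post.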
