
Softdevice hardfault when calling setenv()

My application needs to keep time and support local time as well. I have implemented a service that allows setting the epoch seconds as well as a local timezone string. With these two items, I can update the epoch seconds and determine local time using the Clib time.h functions (e.g., localtime(), ctime(), etc.). In order for these time functions to work correctly, I call

setenv("TZ", timezone_string, 1);

to set the local timezone parameters. This works fine when I call setenv() from a CLI command, however when I call it in response to my change timezone characteristic (in the BLE event handler, BLE_GATTS_EVT_WRITE event), I get a hardfault in the Softdevice, at address 0x13624. Here are some details of my environment:

  • nRF52832 with SDK 14.2
  • Softdevice S132 5.0.0
  • Eclipse IDE with gcc 7.2.1 (gcc-arm-none-eabi-7-2017-q4-major)
  • My app uses FreeRTOS
  • FreeRTOS uses heap_4.c with ucHeap[] statically allocated in BSS and NOT as part of normal HEAP section
  • HEAP section set at 1kB - for setenv() and other Clib functions
  • No corruption observed in HEAP or in C environment area used by setenv() (i.e., after a call to setenv(), heap contains correct string "TZ=MST7MDT" whether called from CLI or BLE interface)
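
For reference, the TZ mechanism described above can be sketched in portable C as follows. This is a minimal illustration under stated assumptions, not the code from the application: the helper name epoch_to_local() is hypothetical, and the explicit tzset() call is included because POSIX documents it as the way to re-read TZ after the variable changes.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes setenv(), tzset(), localtime_r() */
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper (not from the original post): converts epoch seconds
 * to broken-down local time for the given POSIX TZ string.
 * Returns 0 on success, -1 on failure. */
int epoch_to_local(const char *tz, time_t epoch, struct tm *out)
{
    assert(tz != NULL && out != NULL);

    /* setenv() copies the string, which typically means a heap allocation. */
    if (setenv("TZ", tz, 1) != 0) {
        return -1;
    }

    /* Re-read TZ into the C library's timezone state. */
    tzset();

    return (localtime_r(&epoch, out) != NULL) ? 0 : -1;
}
```

With "MST7MDT", an epoch value of 0 should come out as 17:00 on 31 December 1969, since MST is seven hours behind UTC.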

I've attached a screen shot of the CPU registers at the time of the hardfault and disassembly of the location of the hardfault in the SD.

If someone could shed some light on this issue, that would be greatly appreciated!

 

  • I think the problem is the context from which you are calling setenv(). If it in turn makes an SVC call whose priority is lower than that of the context you are calling from, this triggers a hardfault, as per the ARM Cortex-M rules. It is hard to say without more code to look into.

  • Well, I don't think there is any other SVC call involved, but I don't know what setenv() does other than malloc'ing, which I guess might disable interrupts or use some other locking mechanism like that.

    The context I am executing setenv() in is as I said above, the BLE event handler registered with NRF_SDH_BLE_OBSERVER() with an observer priority of 2. The code that executes on the characteristic write event is basically as follows:

    // A snippet of the BLE event handler in my timekeeping service
    case BLE_GATTS_EVT_WRITE: {
        // Braces give the declaration below its own scope (a case label cannot directly precede a declaration in C)
        ble_gatts_evt_write_t const * p_evt_write = &p_ble_evt->evt.gatts_evt.params.write;
        if (p_evt_write->handle == m_dcs.timezone.value_handle) {
            if (p_evt_write->len <= 12) {
                // data is a byte array, so cast it to match the char const * parameter
                set_local_timezone((char const *)p_evt_write->data, p_evt_write->len);
            } else {
                NRF_LOG_INFO("write to TIMEZONE too long: %d", p_evt_write->len);
            }
        }
        break;
    }

    // The set_local_timezone() function contained in a separate file is basically:
    static char m_timezone_str[13];

    void set_local_timezone(char const * tz, size_t len) {
        memset(m_timezone_str, 0, sizeof(m_timezone_str));
        // Leave room for the terminating NUL that memset() put in place
        len = MIN(len, sizeof(m_timezone_str) - 1);
        strncpy(m_timezone_str, tz, len);
        setenv("TZ", m_timezone_str, 1);
        NRF_LOG_DEBUG("TZ set: %s", getenv("TZ"));
    }
    

    So I can't see where another SVC call is at play here, unless I am missing something.

    I have tried a few other ways around this problem to no avail. In the first attempt, instead of calling set_local_timezone() directly in the BLE event handler, I sent a message containing the written timezone char array to a task that is largely sitting idle. This task is blocked on its message queue and when it receives the timezone update message, it executes set_local_timezone() in its task context. I was pretty confident that this solution would work, but it hardfaults as well. (I didn't capture the CPU registers and fault address in this case, but I could capture and post them if that would help solve the problem.)

    I then tried setting a flag and saving the written timezone array of chars, and upon disconnect (again handled by the BLE event handler on BLE_GAP_EVT_DISCONNECTED), if the flag was set, I would call set_local_timezone() directly in the event handler, similar to the code above. This likewise causes a hardfault. I also tried the same scheme of setting the timezone upon disconnection, but this time by sending a message to the other task, much as in my first attempt. This too ended in a hardfault.

    In all of these cases, the call to setenv() seems to complete and some time later the hardfault occurs (RTT logging shows the setenv() completing). If I simply comment out the actual call to setenv() then I don't ever see a hardfault. I could post more code, but I think it will rapidly get unwieldy. The fact that I can execute set_local_timezone() from my console command line interface and never have it fail has me baffled.

  • Sorry for the late answer.

    It is strange that the hardfault occurs even though there are no SVC calls (which are the cause most of the time here). Maybe there is some memory access violation; it is hard to say.

    Can you help me reproduce this issue? 

  • Well as it turns out, the product definition has just changed so I no longer need to keep track of local timezone.

    However, just as some follow-up: while trying to figure this problem out (still not solved), it appears that I am getting memory corruption and that this is the source of the hardfault. I haven't been able to isolate what exactly is going on, but it seems to be caused by setenv(), as I can comment out the call to setenv() and my app runs fine for a long time. If I keep the call to setenv(), I consistently get a hardfault, but only after much more code beyond setenv() has run (once the corrupted memory somehow comes into play).

    I was only able to determine that memory corruption is occurring due to strange output shown in the OpenRTOS Viewer Task Table window. This corruption is still a curiosity, however since timezones are no longer needed, I guess it will remain a curiosity. Thanks for your interest/help.

  • Is it possible that you just ran out of the configured max stack space? With some RTOSes (those that ask you to preconfigure the max stack space), a single call is sometimes enough to cross the boundary. I would say there is a good chance that this memory corruption problem will be fixed if you increase the task stack space. At least worth a try if you are still curious.

  • Aryan, thanks for the help thus far. I have been curious if it is a stack overflow memory corruption, as that seems to be a typical memory corruption problem with RTOS use (since it is not obvious what the appropriate stack size should be for each task). I will probably try bumping the stack size at some point to determine if that was the problem.

  • Sounds good Mark, please let me know about the results.
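
The stack-overflow theory discussed above can be checked by measuring how much of a task's stack was never touched. FreeRTOS offers its own facilities for this, such as uxTaskGetStackHighWaterMark() and the configCHECK_FOR_STACK_OVERFLOW option. The sketch below only illustrates the underlying idea in portable C and is not FreeRTOS's implementation: the names stack_paint() and stack_unused_bytes() and the fill value are assumptions made for this example, and it assumes a stack that grows downward, as on Cortex-M.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL_BYTE 0xA5u   /* fill pattern assumed for this sketch */

/* Hypothetical helper: paint the whole stack area before the task starts. */
void stack_paint(uint8_t *stack, size_t size)
{
    assert(stack != NULL);
    memset(stack, STACK_FILL_BYTE, size);
}

/* Hypothetical helper: count the bytes at the low end of the stack area
 * that still hold the fill pattern. With a downward-growing stack the low
 * addresses are used last, so this is the remaining headroom. A result at
 * or near 0 suggests the task has overflowed, or is about to. */
size_t stack_unused_bytes(const uint8_t *stack, size_t size)
{
    assert(stack != NULL);
    size_t n = 0;
    while (n < size && stack[n] == STACK_FILL_BYTE) {
        n++;
    }
    return n;
}
```

Comparing the headroom of the task that calls setenv() just before and just after the call would show whether that one call is enough to exhaust the stack.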
