
HARDFAULT handler call

Hello,

We have a rather unpredictable issue that ends in a hardfault handler call. More precisely, execution ends up in app_error_fault_handler (in the app_error_weak.c file).

We use SDK 17.0.2 with a bootloader.
This is a custom board.

The issue appears when BLE is advertising and LoRa data is sent, but not every time: this condition is necessary but not sufficient.
We also use a timer to manage the UART transfer from another device (which sends the data to forward over LoRa), a timer for BLE, and a timer tick.
All of these timers are created with the app_timer_create() function.

Since the issue is not easy to reproduce, I suspect it is linked to a timing problem involving one or more of these timers. (I did not develop this code myself, so it is not easy for me to give more details.)

When the issue occurs, we end up in __WEAK void app_error_fault_handler(uint32_t id, uint32_t pc, uint32_t info), with the following values:

- id = 1, which means NRF_FAULT_ID_SD_ASSERT: NRF_LOG_ERROR("SOFTDEVICE: ASSERTION FAILED");

- pc = 0x00012976, which seems to be inside the SoftDevice

- info = 0
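
As an illustration of how the id argument can be interpreted, here is a minimal sketch of a lookup helper. The EX_ constants and fault_id_name() are hypothetical names for this example; the numeric values mirror what I recall from nrf_sdm.h and app_error.h in the nRF5 SDK and should be checked against the headers of the SDK actually in use.

```c
#include <stdint.h>

/* Assumed values, copied from memory of nrf_sdm.h / app_error.h.
 * Verify them against your SDK before relying on this table. */
enum {
    EX_FAULT_ID_SD_ASSERT  = 0x0001, /* SoftDevice assertion         */
    EX_FAULT_ID_APP_MEMACC = 0x1001, /* invalid memory access by app */
    EX_FAULT_ID_SDK_ERROR  = 0x4001, /* APP_ERROR_CHECK failure      */
    EX_FAULT_ID_SDK_ASSERT = 0x4002  /* SDK ASSERT failure           */
};

/* Map the 'id' passed to app_error_fault_handler() to a readable label. */
const char *fault_id_name(uint32_t id)
{
    switch (id) {
    case EX_FAULT_ID_SD_ASSERT:  return "SoftDevice assertion";
    case EX_FAULT_ID_APP_MEMACC: return "application memory access";
    case EX_FAULT_ID_SDK_ERROR:  return "SDK error";
    case EX_FAULT_ID_SDK_ASSERT: return "SDK assertion";
    default:                     return "unknown";
    }
}
```

With id = 1 as reported above, the helper returns "SoftDevice assertion", which matches the NRF_FAULT_ID_SD_ASSERT branch of the handler.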

Below are the CPU registers; sp is 0x2000fd70.

Below is the stack:

The word 0xCAFEBABE seems to be a magic number, perhaps intended to help debug critical issues?

The call stack does not really help me:

Lastly, the ARM SCB registers contain the following: address 0xE000ED2C holds the value 0x40000000. According to the "ARM®v7-M Architecture Reference Manual", page 612, this is the HardFault Status Register (HFSR), whose purpose is to show the cause of any HardFault.

FORCED, bit[30] Indicates that a fault with configurable priority has been escalated to a HardFault exception,
because it could not be made active, because of priority or because it was disabled:
0 No priority escalation has occurred.
1 Processor has escalated a configurable-priority exception to HardFault.
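
To make the register reading above concrete, here is a small sketch that decodes an HFSR value into its three defined flags (VECTTBL at bit 1, FORCED at bit 30, DEBUGEVT at bit 31, per the ARMv7-M manual). The type and function names are hypothetical, and the code runs on a host as well as on target. When FORCED is set, the original fault is normally described by the Configurable Fault Status Register (CFSR) at 0xE000ED28, so that register is worth reading next.

```c
#include <stdbool.h>
#include <stdint.h>

/* HFSR bit positions as defined by the ARMv7-M architecture. */
#define HFSR_VECTTBL  (1UL << 1)   /* fault on vector table read          */
#define HFSR_FORCED   (1UL << 30)  /* configurable fault escalated        */
#define HFSR_DEBUGEVT (1UL << 31)  /* debug event with debug disabled     */

/* Hypothetical helper type for this example. */
typedef struct {
    bool vecttbl;
    bool forced;
    bool debugevt;
} hfsr_flags_t;

/* Split a raw HFSR value (read from 0xE000ED2C) into named flags. */
hfsr_flags_t hfsr_decode(uint32_t hfsr)
{
    hfsr_flags_t f;
    f.vecttbl  = (hfsr & HFSR_VECTTBL)  != 0;
    f.forced   = (hfsr & HFSR_FORCED)   != 0;
    f.debugevt = (hfsr & HFSR_DEBUGEVT) != 0;
    return f;
}
```

For the value 0x40000000 quoted above, only the forced flag is set.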

I hope this gives you enough details to help me find the root cause.

Many thanks

Regards

Mikael

  • Hi,

    What SoftDevice version and variant are you using? Is it S112 v7.2.0?

    If yes, then it seems to be the same type of assert that is discussed in e.g. this post, devzone.nordicsemi.com/.../assert-fault-using-softdevice

    Are you using the timeslot feature in the SoftDevice ?

    Do you have critical regions in your code which temporarily disable interrupts globally?

  • Hi,

    SoftDevice S112 v17.0.2

    No timeslot, but a lot of CRITICAL_SECTION_BEGIN/END calls in the LoRa protocol stack (MAC, timer, radio, ...).

  • Maybe the most dangerous ones are those located in the timer.c file:

    /Application/Protocol/lora/system/timer.c:CRITICAL_SECTION_BEGIN();
    /Application/Protocol/lora/system/timer.c:CRITICAL_SECTION_END( );
    /Protocol/lora/system/timer.c:CRITICAL_SECTION_END( );
    /Protocol/lora/system/timer.c:CRITICAL_SECTION_BEGIN( );
    /Application/Protocol/lora/system/timer.c:CRITICAL_SECTION_END( );
    /Application/Protocol/lora/system/timer.c:CRITICAL_SECTION_END( );


    void TimerStart(TimerEvent_t *obj)
    {
        uint32_t elapsedTime = 0;

        CRITICAL_SECTION_BEGIN();

        if ((obj == NULL) || (TimerExists(obj) == true))
        {
            CRITICAL_SECTION_END( );
            return;
        }

        obj->Timestamp = obj->ReloadValue;
        obj->IsStarted = true;
        obj->IsNext2Expire = false;

        if( TimerListHead == NULL )
        {
            RtcSetTimerContext();
            
            // Inserts a timer at time now + obj->Timestamp
            TimerInsertNewHeadTimer( obj );
        }
        else
        {
            elapsedTime = RtcGetTimerElapsedTime();
            obj->Timestamp += elapsedTime;

            if( obj->Timestamp < TimerListHead->Timestamp )
            {
                TimerInsertNewHeadTimer(obj);
            }
            else
            {
                TimerInsertTimer(obj);
            }
        }
        CRITICAL_SECTION_END( );
    }


    void TimerStop( TimerEvent_t *obj )
    {
        CRITICAL_SECTION_BEGIN( );

        TimerEvent_t* prev = TimerListHead;
        TimerEvent_t* cur = TimerListHead;

        // List is empty or the obj to stop does not exist
        if( ( TimerListHead == NULL ) || ( obj == NULL ) )
        {
            CRITICAL_SECTION_END( );
            return;
        }

        obj->IsStarted = false;

        if( TimerListHead == obj ) // Stop the Head
        {
            if( TimerListHead->IsNext2Expire == true ) // The head is already running
            {
                TimerListHead->IsNext2Expire = false;
                if( TimerListHead->Next != NULL )
                {
                    TimerListHead = TimerListHead->Next;
                    TimerSetTimeout( TimerListHead );
                }
                else
                {
                    RtcStopAlarm( );
                    TimerListHead = NULL;
                }
            }
            else // Stop the head before it is started
            {
                if( TimerListHead->Next != NULL )
                {
                    TimerListHead = TimerListHead->Next;
                }
                else
                {
                    TimerListHead = NULL;
                }
            }
        }
        else // Stop an object within the list
        {
            while( cur != NULL )
            {
                if( cur == obj )
                {
                    if( cur->Next != NULL )
                    {
                        cur = cur->Next;
                        prev->Next = cur;
                    }
                    else
                    {
                        cur = NULL;
                        prev->Next = cur;
                    }
                    break;
                }
                else
                {
                    prev = cur;
                    cur = cur->Next;
                }
            }
        }
        CRITICAL_SECTION_END( );
    }

  • One more thing:

    I have done another test.

    It seems we were in a critical section, and the hardfault happened right after it:

    C630 is the new pointer issue

    0x27F6D is where the critical section ends

    What do you think about this? Could it be the root cause? We wait for the SX126x wakeup inside a critical section, which takes quite long, so a SoftDevice interrupt is delayed for too long and this then escalates to a hardfault?

  • Hi,

    What do you think about this? Could it be the root cause? We wait for the SX126x wakeup inside a critical section, which takes quite long, so a SoftDevice interrupt is delayed for too long and this then escalates to a hardfault?

    Yes, that seems likely. You mentioned that you are advertising when this happens. You could e.g. stop the BLE advertising before you call SX126xWakeup(), then start the advertising again when SX126xWakeup() is done. Another option could be to make this SX126xWakeup()/SX126xWaitOnBusy() interrupt driven instead, i.e. trigger a GPIO interrupt when the chip is ready.
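
The interrupt-driven option could look roughly like the sketch below: instead of busy-waiting inside a critical section, a GPIO edge handler records that the chip is ready, and the main loop consumes that event without blocking interrupts. Both function names are hypothetical, and the handler is assumed to be registered for the falling edge of the SX126x BUSY pin (BUSY low meaning the chip is ready); the actual GPIO setup depends on the board and driver and is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Set from interrupt context, consumed from the main loop. */
static volatile bool s_radio_ready = false;

/* Hypothetical handler: on target it would be attached to the falling
 * edge of the BUSY pin, so it runs when the radio becomes ready. */
void radio_busy_falling_edge_handler(void)
{
    s_radio_ready = true;
}

/* Non-blocking check for the main loop.
 * Returns true exactly once per ready event, false otherwise. */
bool radio_take_ready_event(void)
{
    if (!s_radio_ready) {
        return false;
    }
    s_radio_ready = false;
    return true;
}
```

The main loop would start the wakeup, return to other work (or sleep), and only continue the radio transaction once radio_take_ready_event() returns true, so SoftDevice interrupts are never held off while the radio wakes up.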
