
Debugging a Hardfault

My app is generating hardfaults. Based on this post, I looked at memory location SP + 0x14, which points to the address that generated the fault.

Memory at SP+0x14 holds 8F 31 02 00, which is address 0x0002318F. That points to:

void on_tx_complete(const ble_evt_t * event)
{
	uint32_t err = 0;
	if (event == NULL)
	{
		err = 0xFF;
	}
	/* Runs on every call; only reports an error when err is non-zero. */
	APP_ERROR_CHECK(err);

	tx_buffer_process();
}

The if(event == NULL) was my futile attempt to catch the error. I had a breakpoint on err = 0xFF; that never triggered.

Using Keil's disassembly window I see the following:

   407:         tx_buffer_process(); 
0x0002318A F001FFBD  BL.W     tx_buffer_process (0x00025108)
   408: } 
0x0002318E BD70      POP      {r4-r6,pc}
0x00023190 2E2E      DCW      0x2E2E
0x00023192 2E5C      DCW      0x2E5C
0x00023194 5C2E      DCW      0x5C2E
0x00023196 2E2E      DCW      0x2E2E
0x00023198 725C      DCW      0x725C
0x0002319A 6D65      DCW      0x6D65
0x0002319C 746F      DCW      0x746F
0x0002319E 2E65      DCW      0x2E65
0x000231A0 0063      DCW      0x0063
0x000231A2 0000      DCW      0x0000
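
For decoding the stacked value against this listing: a Cortex-M core pushes eight words on exception entry, in the order R0, R1, R2, R3, R12, LR, PC, xPSR. A minimal sketch of reading that frame out of a raw byte dump is below. It assumes the 32 bytes were copied from the memory window starting at SP as it was on handler entry (before the handler pushed anything of its own), and the names `stacked_frame_t`, `frame_read_le32` and `decode_frame` are made up for illustration.

```c
#include <stdint.h>

/* Basic exception frame pushed by a Cortex-M core, lowest address first.
 * Type and field names are illustrative, not from any SDK. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12;
    uint32_t lr;    /* offset 0x14: link register at the time of the fault */
    uint32_t pc;    /* offset 0x18: address the core was executing */
    uint32_t xpsr;  /* offset 0x1C */
} stacked_frame_t;

/* Read one little-endian 32-bit word from a byte dump. */
static uint32_t frame_read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode 32 bytes copied from the memory window, starting at SP. */
static stacked_frame_t decode_frame(const uint8_t dump[32])
{
    stacked_frame_t f;
    f.r0   = frame_read_le32(dump + 0x00);
    f.r1   = frame_read_le32(dump + 0x04);
    f.r2   = frame_read_le32(dump + 0x08);
    f.r3   = frame_read_le32(dump + 0x0C);
    f.r12  = frame_read_le32(dump + 0x10);
    f.lr   = frame_read_le32(dump + 0x14);
    f.pc   = frame_read_le32(dump + 0x18);
    f.xpsr = frame_read_le32(dump + 0x1C);
    return f;
}
```

With this layout, offset 0x14 is the stacked LR and offset 0x18 is the stacked PC, so it may be worth reading both. The bytes 8F 31 02 00 decode to 0x0002318F; bit 0 is the Thumb bit, which leaves 0x0002318E, the POP right after the BL in the listing.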

I believe POP is loading the PC with a bad value. Is there some way I can check R4 - R7 for valid values before calling on_tx_complete()?

I can trace the call to on_tx_complete() back to ble_remote_on_ble_evt() and a BLE_EVT_TX_COMPLETE event. I don't know where to go from here. on_tx_complete() executes multiple times before the hardfault, and the hardfault seems to happen at random. What would have caused the hardfault? What else should I be looking at?

The call stack when the fault occurred looked like: [image]

My registers have: [image]

Thanks

  • First guess would be that your stack's too small: on occasion you corrupt the end of it during the tx_buffer_process() code, so the PC value loaded is wrong and you hardfault. That stack pointer looks low enough to make you wonder.

    Things I'd try next. You probably get to this routine via the same path every time, so put a breakpoint in it and see what the link register is on the way in, or the PC on the way out. Is it different from 0x1D1F0 (which should be 0x1D1F1 on the stack), which I think is where you've returned to?

    At the point of the hardfault you've just popped 4 things off the stack; they should still be in memory just under where the SP is now. What are they? You should recognise the values popped into r4-r6 and the PC. Is the PC value even?

    Look at your memory map. Is any data, or the heap, just around the 0x20003FE0 area? That would point to a stack overflow.
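
One common way to test the stack-overflow guess is to fill the stack region with a known pattern at startup and later count how much of it is still untouched. The sketch below shows the idea on a plain word buffer so it can be tried on a PC. The pattern value and the names `stack_paint` and `stack_unused_words` are arbitrary choices for illustration, not anything from the SDK; on target you would point it at the real stack region from your map file.

```c
#include <stdint.h>
#include <stddef.h>

#define STACK_FILL_PATTERN 0xDEADBEEFu  /* arbitrary marker value */

/* Fill a region with the marker. On target this would run early at
 * startup over the part of the stack that is not yet in use. */
static void stack_paint(uint32_t *bottom, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        bottom[i] = STACK_FILL_PATTERN;
    }
}

/* Count untouched words from the bottom (lowest address) upward.
 * A result of 0 means the lowest word was overwritten, so the stack
 * has reached, or run past, its limit. */
static size_t stack_unused_words(const uint32_t *bottom, size_t words)
{
    size_t n = 0;
    while (n < words && bottom[n] == STACK_FILL_PATTERN) {
        n++;
    }
    return n;
}
```

Checking the count after the app has run for a while (or from the hardfault handler) gives a rough high-water mark. It can miss an overflow that happens to write the marker value or skips over the bottom words, so treat it as a hint.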

  • RK - thanks for the response. I'm using an nRF51422QFAC with 32K RAM and S120 V2.1.
    The last 3 addresses in my map file are: m_dfus 0x20003140 Data 48 add_gatts.o(.bss) m_connection_table 0x20003410 Data 96 device_manager_central_cc.o(.bss) __libspace_start 0x200035d4 Data 96 libspace.o(.bss) __temporary_stack_top$libspace 0x20003634 Data 0 libspace.o(.bss) I'm not sure if that counts as "just around" 0x20003FE0

  • I was looking at this post and want to use it in my application. When he calculates stack size it's stackStart - 0x800. Where did the 0x800 come from? Where is my stack size set? FYI: I'm running an nRF51422 with 32K RAM, SD120 V2.1, and Target settings IRAM1: 0x20002800/0x5800.

  • Doesn't look that close to me, but map files aren't easy to read, especially when all on one line, and there could be other entries in other places (where's the heap, for a start? Perhaps you don't use a heap). Try the other things on the list: see where you usually return to from that code, and see what's left just off the stack when you hardfault, to check whether the address is different or invalid. If you're sure it's the pop, and it looks quite plausible, something corrupted the stack.

    As for that post, he gets 0x800 from the fact his stack is 2048 bytes (0x800 in hex); it's right there in a comment on the line.
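
The arithmetic behind stackStart - 0x800 can be sketched as below: the stack grows down from its top, so the lowest valid address is the top minus the size. The function names are hypothetical. In Keil projects the size is typically set by `Stack_Size` in the startup assembly file and the top is the `__initial_sp` symbol, but check your own startup file and map file rather than taking that as given.

```c
#include <stdint.h>
#include <stdbool.h>

/* Lowest valid stack address, given the initial SP (top) and the size. */
static uint32_t stack_limit(uint32_t stack_top, uint32_t stack_size)
{
    return stack_top - stack_size;
}

/* True if sp lies inside [top - size, top]. */
static bool sp_in_stack(uint32_t sp, uint32_t stack_top, uint32_t stack_size)
{
    return sp >= stack_limit(stack_top, stack_size) && sp <= stack_top;
}
```

For instance, with a made-up top of 0x20004800 and a size of 0x800, `stack_limit` gives 0x20004000; any SP below that would be outside the stack.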

  • DK- Thanks for your help. When the hardfault occurs, the PC is pointing to my hardfault handler. The 16 bytes before the SP are: 00 00 00 00 F8 60 02 00 A4 60 02 00 9C 60 02 00. In the Keil Register window, SP = 0x200045F0. But the memory window, with SP for the address, shows 0x00003004 (04 30 00 00), which is the error code I get back from a call to sd_ble_gattc_write() just before the fault occurs. It seems too much of a coincidence that the memory window shows my error code. Is this a clue as to what's happening? You stated that the 4 values before the SP are R4-R7. So the first four bytes are the old R4? Are these the addresses that were in the SP, LR and PC?
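
As a rough aid for reading those 16 bytes: POP {r4-r6,pc} loads four little-endian words, lowest address first, into r4, r5, r6 and PC. The sketch below decodes a 16-byte dump that way and checks the Thumb bit of the would-be PC. It assumes the bytes are listed in ascending address order and really are the words the POP consumed, which depends on where SP was when they were read. The names `popped_regs_t`, `decode_popped` and `has_thumb_bit` are made up for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

/* Words loaded by POP {r4-r6,pc}, lowest address first. */
typedef struct {
    uint32_t r4, r5, r6, pc;
} popped_regs_t;

/* Read one little-endian 32-bit word from a byte dump. */
static uint32_t popped_read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Interpret 16 bytes (ascending addresses) as the four popped words. */
static popped_regs_t decode_popped(const uint8_t bytes[16])
{
    popped_regs_t r;
    r.r4 = popped_read_le32(bytes + 0);
    r.r5 = popped_read_le32(bytes + 4);
    r.r6 = popped_read_le32(bytes + 8);
    r.pc = popped_read_le32(bytes + 12);
    return r;
}

/* On a Thumb-only core such as the Cortex-M0, a value loaded into PC
 * by POP must have bit 0 set; a clear bit 0 causes a HardFault. */
static bool has_thumb_bit(uint32_t value)
{
    return (value & 1u) != 0u;
}
```

Fed the 16 bytes quoted above, this gives 0x00000000, 0x000260F8, 0x000260A4 and 0x0002609C. In the hardfault handler SP has already moved down by the exception frame, so whether those are the popped r4-r6 and PC is an assumption to verify against the actual SP at the POP.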
