NRF52840 app - random hard fault

Roei over 5 years ago

Hi,

My application occasionally hard faults, and I'm trying to find the cause.

When it happens I get a hard fault log, but p_stack->pc is 0, so I can't tell which line of code is at fault.

Is there another way to catch the issue?

Thanks!

Susheel Nuguru over 5 years ago

If the PC value in the hardfault stack frame reads 0, it is very likely that memory corruption contributed to the hardfault. You need to debug such scenarios by stepping through the code until you understand the context of the hardfault (the hard debugging way).

Roei over 5 years ago in reply to Susheel Nuguru

The problem is that it occurs randomly and cannot be tied to specific lines of code (I already tried the hard way). Could it be related to memory space? Would increasing memory solve the issue?

Thanks!

Roei over 5 years ago in reply to Roei

In addition, the hard fault log contains the following lines:

HARD FAULT at 0x00000000
R0: 0x00000010 R1: 0x00000000 R2: 0x20007908 R3: 0x00000000
R12: 0x20011F34 LR: 0x0003262F PSR: 0x00000000

Cause: The processor has attempted to execute an instruction that makes illegal use of the EPSR.

Does this help?

Thanks!

Susheel Nuguru over 5 years ago in reply to Roei

The hardfault stack frame looks incorrect, which suggests that a wrong stack frame was popped into the registers. That in turn suggests that one of the buffers on the stack overflowed and corrupted the next stack frame.
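
To make the reading of the log above concrete: a stacked PSR with the Thumb bit (bit 24) clear, together with the "illegal use of the EPSR" cause, is the pattern you would expect when execution was sent to an address with bit 0 clear, such as 0x00000000. That fits a return address overwritten on a corrupted stack, and it also fits a call through a NULL function pointer. Below is a minimal sketch that decodes the two useful fields. The type and function names are hypothetical and are not taken from the SDK.

```c
#include <stdbool.h>
#include <stdint.h>

/* Registers pushed by a Cortex-M core on exception entry, in stacking order.
 * The type name is hypothetical; the SDK's own frame type may differ. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, psr;
} stacked_frame_t;

#define XPSR_T_BIT (UINT32_C(1) << 24) /* Thumb state bit of the xPSR */

/* True if the core was in Thumb state when the frame was stacked.
 * Cortex-M only executes Thumb code, so a clear T bit means execution
 * was sent to an address with bit 0 clear (0x00000000 qualifies). */
bool frame_in_thumb_state(const stacked_frame_t *f)
{
    return (f->psr & XPSR_T_BIT) != 0;
}

/* Address to look up in the map file: the stacked LR with the Thumb bit
 * masked off. This is the return address of the most recent BL/BLX,
 * which is only a hint and not necessarily the faulting call. */
uint32_t frame_last_call_return(const stacked_frame_t *f)
{
    return f->lr & ~UINT32_C(1);
}
```

For the logged values, the T bit is clear and the LR decodes to 0x0003262E, so looking that address up in the map file (or with addr2line) may show which function was executing shortly before the jump to 0.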

Try disabling logs to see whether the log buffers are overflowing (disabling logs should fix the issue if the log buffers are too small for your application). If that does not help, try increasing the buffer sizes of the modules (in sdk_config.h) one by one to find out which module's buffer is overflowing. Unfortunately, this can be a tedious debugging session.
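
One way to check the stack itself, alongside the module buffers, is stack painting: fill the stack with a known pattern at startup and later count how much of the pattern survives. This is a swapped-in technique, not something the SDK is confirmed to provide in this form, and the names below are hypothetical. The sketch works on any word array, so it can be tried on a PC before pointing it at the real stack region from the linker script.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_PAINT UINT32_C(0xDEADBEEF) /* arbitrary fill pattern */

/* Fill a stack region with the pattern. On target this would run early in
 * startup, over the part of the stack below the current stack pointer. */
void stack_paint(uint32_t *lowest, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        lowest[i] = STACK_PAINT;
    }
}

/* Count the words, starting from the lowest address, that still hold the
 * pattern. Cortex-M stacks grow downward, so these are the words that were
 * never used. A result of 0 means the stack reached (or passed) its limit. */
size_t stack_unused_words(const uint32_t *lowest, size_t words)
{
    size_t unused = 0;
    while (unused < words && lowest[unused] == STACK_PAINT) {
        unused++;
    }
    return unused;
}
```

Logging the result of stack_unused_words() periodically, or just before the suspect function returns, shows how close the application gets to the limit without needing the fault to reproduce under the debugger.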

Roei over 5 years ago in reply to Susheel Nuguru

Hi,

I don't understand which buffers can be changed in sdk_config.h.

Can you please explain?

Thanks!

Roei over 5 years ago in reply to Roei

In addition, the hard fault occurs in a specific function.

When testing the performance of this function, I noticed that the CPU usage of one of the services is very high (about 800%).

Could that be related to the hard fault?

Thanks!

Susheel Nuguru over 5 years ago in reply to Roei

Roei said:
In addition, the hard fault occurs in a specific function.

Is this a function from a Nordic solution? Which function is it?

Roei said:
When testing the performance of this function, I noticed that the CPU usage of one of the services is very high (about 800%).

The percentage of CPU usage should not be the reason for the hardfault. It might be the memory usage of that function. If you know which function causes the hardfault, you can step in and debug to see exactly which line causes it. If the function is from the Nordic solution (SDK), I might be able to help you debug it if you help me reproduce it at my desk.

Roei over 5 years ago in reply to Susheel Nuguru

It's not a function from the Nordic SDK. It's a new function that does a lot of work: writing and erasing many pages of an external flash (over SPI).

I already tried to debug it and didn't find any specific line. It also doesn't occur very often.

Susheel Nuguru over 5 years ago in reply to Roei

Roei said:
It's not a function from the Nordic SDK. It's a new function that does a lot of work: writing and erasing many pages of an external flash (over SPI).

In that case, this memory corruption is hard to track from our side, as it seems to have been introduced with your changes. With the minimal information given and no way to reproduce it on our end, it is very hard for me to narrow down the problem for you.

Roei over 5 years ago in reply to Susheel Nuguru

Hi,

Can you please tell me where I can increase the buffer sizes in sdk_config.h?

Thanks!

hmolesworth over 5 years ago in reply to Roei

Maybe the issue is an interrupt that uses too much stack space, for example by declaring a print buffer inside the interrupt handler. The buffer is allocated on the stack on entry to the handler, before any of the handler code executes, so the overflow shows up as a hard fault at "random" locations in the foreground code. This probably only happens under worst-case stack usage, which is why it is rare. This is what Susheel is driving at, I think.

"Pointer to the stack bottom. This pointer might be NULL if the HardFault was called when the main stack was the active stack and a stack overrun is detected. In such a situation, the stack pointer is reinitialized to the default position, and the stack content is lost."

In such a case, examine all interrupt (event) handlers and remove any buffers from them. Instead, set a flag in the handler and do the processing in (say) the main loop.
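
A minimal sketch of that flag-and-defer pattern, written as plain C so it can be tried off-target. The handler, variable, and function names are hypothetical and do not correspond to any SDK handler.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static volatile bool     m_event_pending; /* set in the handler, cleared in main */
static volatile uint32_t m_event_value;   /* data captured by the handler */
static char              m_line[64];      /* static storage, so it costs no stack */

/* Interrupt (event) handler: no local buffers, no formatting, no logging. */
void my_irq_handler(uint32_t value)
{
    m_event_value   = value;
    m_event_pending = true;
}

/* Called from the main loop. Returns the formatted text, or NULL when
 * nothing is pending. The flag is cleared before the value is read, so an
 * interrupt arriving in between is processed again on the next call
 * instead of being lost. */
const char *process_pending_event(void)
{
    if (!m_event_pending) {
        return NULL;
    }
    m_event_pending = false;
    snprintf(m_line, sizeof m_line, "event value: %lu",
             (unsigned long)m_event_value);
    return m_line;
}
```

The handler's stack cost is now a few words regardless of how long the message is, so the worst-case stack depth no longer depends on which foreground function the interrupt happens to preempt.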

A second (classic) scenario is a function that allocates a local buffer on the stack using a variable parameter as the size of the buffer; IAR would warn against that but SES does not. If there is an error in that parameter, the stack blows up. It is no good checking the parameter inside the function, since the stack blows up before the parameter can be checked. Some of the TWI example code has this "feature" (or used to).
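
A sketch of one way to avoid that pattern, assuming a register-write helper of the kind found in TWI drivers. All names here are hypothetical, and bus_send() is a stand-in stub, not an SDK call. The buffer size is fixed at compile time, so a bad length is rejected instead of moving the stack pointer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REG_WRITE_MAX_LEN 32u /* hypothetical upper bound for one transfer */

size_t g_last_sent_count; /* recorded by the stub below, for inspection */

/* Stand-in for the real bus transfer (hypothetical). */
bool bus_send(const uint8_t *bytes, size_t count)
{
    (void)bytes;
    g_last_sent_count = count;
    return true;
}

/* Risky pattern, for reference (not compiled):
 *     uint8_t buffer[length + 1];   // size taken straight from a parameter
 * A bad 'length' moves the stack pointer outside the stack as soon as the
 * declaration is reached. */

/* Safer variant: fixed-size buffer, length checked before any copy. */
bool reg_write(uint8_t reg, const uint8_t *data, size_t length)
{
    uint8_t buffer[REG_WRITE_MAX_LEN + 1]; /* size known at compile time */

    if (data == NULL || length > REG_WRITE_MAX_LEN) {
        return false; /* rejected before anything is copied */
    }
    buffer[0] = reg;
    memcpy(&buffer[1], data, length);
    return bus_send(buffer, length + 1);
}
```

The trade-off is that the function always reserves the maximum buffer size, but that cost is fixed and visible in a static stack analysis, which a parameter-sized buffer is not.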

Roei over 5 years ago in reply to hmolesworth

Hi,

Thanks for your help!

The hard fault occurs in a function that performs many write and erase operations on the external flash (over the SPI interface), so I think that may be the cause.

Do you have any ideas?

Thanks!