
Weird hardfault

I have a bit of an odd one, and would appreciate any input.  We have a fairly complex project, which does various things, but some of the time, I get a hardfault.  The issue appears to be an instruction access violation (MMFSR->IACCVIOL = 1).  Working backwards, the code that caused this seems to be the nrfx_coredep_delay_us function (in nrfx_coredep.h).
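For anyone checking the same flag: the MMFSR is the low byte of the Cortex-M4 CFSR register, and IACCVIOL is bit 0 of it. Below is a minimal sketch of decoding a captured CFSR value. The helper names (cfsr_has_iaccviol, cfsr_mmfsr) are hypothetical and not part of the SDK; only the bit positions come from the Arm architecture.

```c
#include <stdbool.h>
#include <stdint.h>

/* MemManage Fault Status Register (MMFSR) bits. On a Cortex-M4 the MMFSR
 * is the low byte of the Configurable Fault Status Register (SCB->CFSR). */
#define MMFSR_IACCVIOL  (1u << 0) /* instruction access violation */
#define MMFSR_DACCVIOL  (1u << 1) /* data access violation */
#define MMFSR_MUNSTKERR (1u << 3) /* fault while unstacking on exception return */
#define MMFSR_MSTKERR   (1u << 4) /* fault while stacking on exception entry */
#define MMFSR_MLSPERR   (1u << 5) /* fault during lazy FP state preservation */
#define MMFSR_MMARVALID (1u << 7) /* MMFAR holds a valid fault address */

/* Hypothetical helper: true if the captured CFSR value reports IACCVIOL. */
static bool cfsr_has_iaccviol(uint32_t cfsr)
{
    return (cfsr & MMFSR_IACCVIOL) != 0u;
}

/* Hypothetical helper: extract only the MMFSR byte from a CFSR value. */
static uint8_t cfsr_mmfsr(uint32_t cfsr)
{
    return (uint8_t)(cfsr & 0xFFu);
}
```

On the target, the value would typically be read from SCB->CFSR (CMSIS) inside the HardFault handler before anything clears it, for example cfsr_has_iaccviol(SCB->CFSR).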

We're using the 17.0.2 SDK, and the code in question is this, and the PC points to delay_machine_code:

    __ALIGN(16)
    static const uint16_t delay_machine_code[] = {
        0x3800 + NRFX_COREDEP_DELAY_US_LOOP_CYCLES, // SUBS r0, #loop_cycles
        0xd8fd, // BHI .-2
        0x4770  // BX LR
    };

    typedef void (* delay_func_t)(uint32_t);
    const delay_func_t delay_cycles =
        // Set LSB to 1 to execute the code in the Thumb mode.
        (delay_func_t)((((uint32_t)delay_machine_code) | 1));
    uint32_t cycles = time_us * NRFX_DELAY_CPU_FREQ_MHZ;
    delay_cycles(cycles);

I wasn't entirely sure about the way the code is hand-loaded in, so I tried switching to using the DWT instead (since the nRF52833 supports that).  This produced the same hardfault behaviour, only now it's on

while ((DWT->CYCCNT - cyccnt_initial) < time_cycles)
    {}
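As a side note on why that loop condition is written as a subtraction: the cycle counter is 32 bits wide and wraps, and unsigned subtraction keeps the elapsed count correct across one wrap. The sketch below models that check on plain integers so it can be tried off-target. The function names are hypothetical, and the 64 MHz figure in the comment is an assumption based on the nRF52833 core clock.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: convert microseconds to CPU cycles.
 * Assumption: cpu_mhz would be 64 on an nRF52833. */
static uint32_t us_to_cycles(uint32_t time_us, uint32_t cpu_mhz)
{
    return time_us * cpu_mhz;
}

/* Hypothetical helper mirroring the loop condition above.
 * Returns true while fewer than time_cycles have elapsed.
 * The unsigned subtraction is modulo 2^32, so the result stays
 * correct even if the counter wrapped once after cyccnt_initial. */
static bool delay_still_waiting(uint32_t cyccnt_now,
                                uint32_t cyccnt_initial,
                                uint32_t time_cycles)
{
    return (uint32_t)(cyccnt_now - cyccnt_initial) < time_cycles;
}
```

With a start value of 0xFFFFFFF0 and a 64-cycle delay, the check keeps waiting at a counter value of 0x00000005 (21 cycles elapsed) and stops at 0x00000030 (64 cycles elapsed), so the loop logic itself does not misbehave at a wrap.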

I suspect there's something else going on here.  Our code does use nrf_delay_ms for certain things that need busy waits, which calls through to the above code, but it only hardfaults some of the time.  As above, we're using SDK 17.0.2, nrf52833 on a custom board.

I'm at a bit of a loss on this one, suggestions welcome?

  • Hi Vidar,

    Although our original design was incorrect with regard to the decoupling layout, this board has been updated to match configuration 6 WLCSP from the datasheet (and your link).  However, on careful examination, it looks like the cap fitted to DEC1 (C4 in the reference) is considerably too large - 2.2uF instead of 100nF.  I'll get our electronics chap to break out the fine soldering iron and change it, and let you know if that makes any difference.

    Also of note, we're running without any crystals - as we're not doing any BLE etc, the timing accuracy isn't critical, and fewer external components is hugely helpful in our application.  We're using the internal RCs for both HF and LF clocks.

    Putting the chip into constant latency mode makes the hardfault problem go away (or, at least, I haven't been able to reproduce it now).

    I can provide the code and schematic privately, although most of the code won't run without the custom hardware.  I will also need to excise some third-party code, as we're not authorised to share that, which will of course change the behaviour and timing somewhat.  Running on a DK has similar issues, although we can at least bolt on a lot of the necessary bits externally.  I'll have a look at porting the codebase over if changing the DEC1 cap doesn't improve things.

  • Hi Alison,

    I have actually seen reports of similar behaviour before for boards that have been fitted with too large a capacitor on DEC1, so I think we may finally have found a root cause. I hope this is it.

    "Putting the chip into constant latency mode makes the hardfault problem go away (or, at least, I haven't been able to reproduce it now)."

    I think a possible explanation for this is that constant latency prevents the system from entering its lowest power state (min. idle current is ~500 uA with constant latency enabled).
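For readers who want to repeat the constant latency experiment as a diagnostic: on the nRF52 series the sub-power mode is selected by triggering a task in the POWER peripheral. The sketch below uses a hypothetical stand-in struct so it compiles off-target. Only the two register names, TASKS_CONSTLAT and TASKS_LOWPWR, are taken from the device; the struct layout and function names are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two POWER task registers. This is NOT the
 * real register layout; on the device, use NRF_POWER from the MDK header. */
typedef struct {
    volatile uint32_t TASKS_CONSTLAT; /* write 1: constant latency mode */
    volatile uint32_t TASKS_LOWPWR;   /* write 1: low power mode (default) */
} power_tasks_t;

/* Hypothetical helper: request constant latency sub-power mode. */
static void select_constant_latency(power_tasks_t *power)
{
    power->TASKS_CONSTLAT = 1u;
}

/* Hypothetical helper: return to the default low power sub-power mode. */
static void select_low_power(power_tasks_t *power)
{
    power->TASKS_LOWPWR = 1u;
}
```

On real hardware the equivalent would be a direct write such as NRF_POWER->TASKS_CONSTLAT = 1. As discussed in this thread, this raises idle current and only hides the symptom, so it is better suited to narrowing down the fault than to fixing it.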

  • Hi Vidar,

    I've done a number of different tests across several boards, and the DEC1 decoupling cap looks like our answer.  Dropping it to the 100nF it was supposed to be seems to have made all the hardfaults go away completely.

    I guess that the issue was more obvious at lower powers, possibly due to the cap not charging / discharging fast enough, and causing clock / rail skew, to the point where the core got really unhappy.  Hence always happening after a WFI, and being so sensitive to which peripherals were enabled or not. 

    Either way, excellent spot, and thank you for your help!  Beers on us next time you're in London...

  • Hi Alison,

    Thanks for the update! I'm glad to hear that it fixed the problem. I will definitely keep this in mind next time I encounter a hardfault like this. And I agree, too long a charging time for the cap seems to be the most likely explanation.
