nrf9160 __bus_fault() disabling watchdog

Hi,

We have a production device using the nRF9160. After several days of running, it ends up triggering a __bus_fault(), most likely caused by heap fragmentation in block_break() within mempool.c (after who knows how many memory allocations and corresponding frees).

We are not too concerned about the heap fragmentation issue itself, but our watchdog isn't triggering while the CPU sits in __bus_fault's CPU idle loop, so our firmware has no way to reset itself. We have a z_SysFatalErrorHandler handler, but it doesn't seem to be called for __bus_fault() cases.

Is there any way at the application level to trap __bus_fault() so we can do a sys_reboot(), or to prevent our watchdog from being disabled?

Thanks!

Parents
  • Hi,

      

    We are not too concerned about the heap fragmentation issue itself, but our watchdog isn't triggering while the CPU sits in __bus_fault's CPU idle loop, so our firmware has no way to reset itself. We have a z_SysFatalErrorHandler handler, but it doesn't seem to be called for __bus_fault() cases.

    The watchdog not triggering is quite concerning. Does the firmware crash in such a way that the watchdog keeps being fed?

    You mention idle. How is the WDT configured, specifically these bits in the config register?

    https://infocenter.nordicsemi.com/topic/ps_nrf9160/wdt.html?cp=2_0_0_5_19_3#register.CONFIG

    If it's configured to pause in sleep, that might be why it doesn't reset your system.
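To make the check above concrete, here is a minimal sketch in plain C that decodes a raw CONFIG value into "keeps running in sleep" and "keeps running when halted by the debugger". The bit positions (SLEEP at bit 0, HALT at bit 3, where 1 means "run") are my reading of the register description linked above, so please verify them against the product specification. The names wdt_behaviour and wdt_decode_config are hypothetical helpers, not part of any Nordic or Zephyr API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions in the WDT CONFIG register (verify against the
 * nRF9160 product specification): 1 = keep running, 0 = pause. */
#define WDT_CONFIG_SLEEP_BIT 0u
#define WDT_CONFIG_HALT_BIT  3u

/* Hypothetical helper type: how the watchdog behaves for a given CONFIG. */
struct wdt_behaviour {
	bool runs_in_sleep;    /* counter keeps running while the CPU sleeps */
	bool runs_when_halted; /* counter keeps running while a debugger halts the CPU */
};

/* Decode a raw CONFIG register value into the two behaviours above. */
static struct wdt_behaviour wdt_decode_config(uint32_t config)
{
	struct wdt_behaviour b = {
		.runs_in_sleep = ((config >> WDT_CONFIG_SLEEP_BIT) & 1u) != 0u,
		.runs_when_halted = ((config >> WDT_CONFIG_HALT_BIT) & 1u) != 0u,
	};
	return b;
}
```

On the device, the input would be the value read back from the WDT peripheral's CONFIG register. If runs_in_sleep comes out false, a fault handler that parks the CPU in an idle/sleep loop would also pause the watchdog, which would match the behaviour described in the question. If the watchdog is set up through the Zephyr driver, the WDT_OPT_PAUSE_IN_SLEEP option passed to wdt_setup() is the likely place where this gets selected.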

    A usage fault (the same should apply to a bus fault) should be caught by k_sys_fatal_error_handler():

    #include <fatal.h>
    
    void k_sys_fatal_error_handler(unsigned int reason,
    			       const z_arch_esf_t *esf)
    {
    	ARG_UNUSED(esf);
    	
    	/* Your implementation here */
    	
    	CODE_UNREACHABLE;
    }

    Here's the gdb backtrace showing how the usage fault reaches the handler:

     

    (gdb) bt
    #0  0x0000c566 in k_sys_fatal_error_handler (reason=<optimized out>, esf=<optimized out>) at ../src/main.c:26
    #1  0x00017dbc in z_fatal_error (reason=reason@entry=0, esf=esf@entry=0x20027878 <_interrupt_stack+1920>)
        at /opt/ncs/zephyr/kernel/fatal.c:118
    #2  0x0001b6ee in z_arm_fatal_error (reason=reason@entry=0, esf=esf@entry=0x20027878 <_interrupt_stack+1920>)
        at /opt/ncs/zephyr/arch/arm/core/aarch32/fatal.c:47
    #3  0x0000d3e8 in z_arm_fault (msp=<optimized out>, psp=<optimized out>, exc_return=<optimized out>)
        at /opt/ncs/zephyr/arch/arm/core/aarch32/cortex_m/fault.c:968
    #4  0x0000d188 in z_arm_usage_fault () at /opt/ncs/zephyr/arch/arm/core/aarch32/fault_s.S:108
    #5  0xffffffbc in ?? ()
    
    

    In your case, I assume that "sys_arch_reboot(0);" is what you want to call from the handler.
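As a rough sketch of what that could look like, the handler below requests a cold reboot instead of falling into the default halt loop. Note that I have swapped in the public sys_reboot(SYS_REBOOT_COLD) wrapper rather than calling sys_arch_reboot(0) directly; that is my substitution, not something confirmed above. The header paths are assumptions that depend on the Zephyr/NCS version, and everything in the #else branch is a hypothetical host-side stand-in that only exists so the logic can be compiled and checked off-target.

```c
#if defined(__ZEPHYR__)
#include <fatal.h>          /* assumption: <zephyr/fatal.h> on newer trees */
#include <power/reboot.h>   /* assumption: <zephyr/sys/reboot.h> on newer trees */
#else
/* Hypothetical host-side stand-ins, NOT the real Zephyr definitions. */
typedef struct { unsigned int unused; } z_arch_esf_t;
#define ARG_UNUSED(x) ((void)(x))
#define CODE_UNREACHABLE
#define SYS_REBOOT_COLD 1
static int reboot_requests; /* counts calls to the stub below */
static void sys_reboot(int type)
{
	(void)type;
	reboot_requests++;
}
#endif

/* Overrides the kernel's default (weak) fatal error handler. */
void k_sys_fatal_error_handler(unsigned int reason, const z_arch_esf_t *esf)
{
	ARG_UNUSED(reason);
	ARG_UNUSED(esf);

	/* Reboot right away rather than idling with the watchdog paused. */
	sys_reboot(SYS_REBOOT_COLD);

	CODE_UNREACHABLE;
}
```

On target, sys_reboot() needs CONFIG_REBOOT=y and does not return, which is why CODE_UNREACHABLE follows it. If you want the fault reason in your logs, flush or print it before the reboot call, since nothing after it will run.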

     

    Kind regards,

    Håkon
