
Spurious fault reboot

Custom board, ncs v1.4.0, modem FW v1.2.3, SES v5.34a. The code is based on the Asset Tracker (v1): LTE with PSM, motion-activated GPS, and MQTT to send messages to a private backend.

I'm having some random reboot issues. I set a breakpoint on sys_reboot() in reboot.c to see how we got there. I hit Go, and when it stops, the Debugger has stopped on a spurious fault, which would lead to a reboot. There's nothing helpful in the call stack. It's quite intermittent. So far, it has happened right after an MQTT Subscribe, and I have also seen it during a GPS restart. Any ideas on how to debug this?

  • Hi,

    There's nothing helpful in the call stack.

     If you step a bit further through the fault handler, it should eventually give you more information.

    In addition, setting CONFIG_RESET_ON_FATAL_ERROR=n and CONFIG_LOG=y should make the fault handler print out an error message.

    CONFIG_LOG_IMMEDIATE=y can be used to disable deferred logging, so that every log message is printed at the point in the code where it is emitted. That way, you will not miss any yet-to-be-printed log messages.

    Remember that you have to use 'Project -> Run CMake...' or re-open the project for changes to prj.conf or other configuration files to take effect when using SES.

    Best regards,

    Didrik

  • With these set:

    CONFIG_LOG=n
    CONFIG_LOG_IMMEDIATE=y
    CONFIG_RESET_ON_FATAL_ERROR=n

    I still see reboots, but there are no fault handler messages.  Do you have an example of what I should expect to see?

  • spline_pete said:
    CONFIG_LOG=n

     Sorry, that should be 'y', not 'n'.

     I've edited my original post so that it is now correct.

    spline_pete said:
    Do you have an example of what I should expect to see?

     Here is an example where I am writing to a NULL pointer:

    SPM: NS image at 0x8000
    SPM: NS MSP at 0x2001bca0
    SPM: NS reset vector at 0xbb95
    SPM: prepare to jump to Non-Secure image.
    *** Booting Zephyr OS build v2.4.99-ncs1  ***
    Starting GPS application
    [00:00:00.207,763] <err> os: Exception occurred in Secure State
    [00:00:00.214,324] <err> os: ***** HARD FAULT *****
    [00:00:00.219,848] <err> os:   Fault escalation (see below)
    [00:00:00.226,074] <err> os: ***** BUS FAULT *****
    [00:00:00.231,536] <err> os:   Precise data bus error
    [00:00:00.237,243] <err> os:   BFAR Address: 0x50008158
    [00:00:00.243,133] <err> os: r0/a1:  0x00000019  r1/a2:  0x00000000  r2/a3:  0x00000000
    [00:00:00.251,770] <err> os: r3/a4:  0x00000000 r12/ip:  0x00000000 r14/lr:  0x00009aed
    [00:00:00.260,406] <err> os:  xpsr:  0x41000000
    [00:00:00.265,594] <err> os: s[ 0]:  0x00000000  s[ 1]:  0x00000000  s[ 2]:  0x00000000  s[ 3]:  0x00000000
    [00:00:00.275,970] <err> os: s[ 4]:  0x00000000  s[ 5]:  0x00000000  s[ 6]:  0xffffffff  s[ 7]:  0xffffffff
    [00:00:00.286,315] <err> os: s[ 8]:  0x00000000  s[ 9]:  0x00000001  s[10]:  0x00000000  s[11]:  0xffffffff
    [00:00:00.296,691] <err> os: s[12]:  0x00000000  s[13]:  0x00000000  s[14]:  0x00000000  s[15]:  0x00000000
    [00:00:00.307,067] <err> os: fpscr:  0xb873db2f
    [00:00:00.312,255] <err> os: Faulting instruction address (r15/pc): 0x00009aee
    [00:00:00.320,129] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
    [00:00:00.327,911] <err> os: Current thread: 0x20018a98 (unknown)
    [00:00:00.334,655] <err> os: Halting system

  • Updating ncs from v1.4.0 to v1.5.0 seems to have fixed the spurious faults.  I ran a couple of units overnight and there weren't any MQTT disconnects or reboots, both of which had been occurring until just recently.
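
For anyone reading a fault dump like the example earlier in this thread: the r15/pc and r14/lr values are usually the quickest way to find out where the fault happened. The following is a minimal sketch, not part of the original answers, of how those two values could be turned into addresses suitable for a symbol lookup (for instance with a tool such as arm-none-eabi-addr2line against the matching ELF file). The helper names code_address and print_lookup_addresses are hypothetical and chosen only for illustration.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * On Cortex-M, a code address held in a register such as lr carries the
 * Thumb state flag in bit 0. Clearing that bit gives the actual address
 * of the instruction, which is what a symbol lookup expects.
 * (Hypothetical helper, not a Zephyr or nRF Connect SDK function.)
 */
static uint32_t code_address(uint32_t reg_value)
{
    return reg_value & ~UINT32_C(1);
}

/*
 * Print the two addresses worth looking up first: the faulting
 * instruction (from r15/pc) and the return address (from r14/lr).
 */
static void print_lookup_addresses(uint32_t pc, uint32_t lr)
{
    printf("faulting instruction: 0x%08" PRIx32 "\n", code_address(pc));
    printf("return address:       0x%08" PRIx32 "\n", code_address(lr));
}

/*
 * Example call with the values from the dump in this thread:
 *     print_lookup_addresses(0x00009aee, 0x00009aed);
 */
```

With the values from the dump, the lr value 0x00009aed maps to 0x00009aec, while the pc value 0x00009aee is unchanged. Which ELF file to look the addresses up in is an assumption the reader has to check: the dump reports "Exception occurred in Secure State", so the addresses may belong to the secure image rather than the application.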
