
General Error Handling & avoiding Fatal Error

Hello,

(I am using SDK 15.1 on a linux system)

I will break the question down into two parts:

  1. When do we get a fatal error? I know one or two cases, for example when the SoftDevice doesn't have enough memory to execute the given operation, but are there other known scenarios in which it occurs?
  2. The second question is about generic error handling. I would like to know what people generally do to avoid fatal conditions. I know a few approaches (probably not very good ones):
    • for example, when a call requires a system packet to be sent out but the queue is full, you loop the call until it returns NRF_SUCCESS.
    • or, when the DEBUG flag is not set, APP_ERROR_CHECK calls its handler, which eventually resets the system.

These questions are mainly from the point of view of putting devices into production.

Thanks

EDIT :

3. Referring to this error module link

and the file attached below:

/**
 * Copyright (c) 2016 - 2018, Nordic Semiconductor ASA
 *
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without modification,
 * are permitted provided that the following conditions are met:
 *
 * 1. Redistributions of source code must retain the above copyright notice, this
 *    list of conditions and the following disclaimer.
 *
 * 2. Redistributions in binary form, except as embedded into a Nordic
 *    Semiconductor ASA integrated circuit in a product or a software update for
 *    such product, must reproduce the above copyright notice, this list of
 *    conditions and the following disclaimer in the documentation and/or other
 *    materials provided with the distribution.
 *
 * 3. Neither the name of Nordic Semiconductor ASA nor the names of its
 *    contributors may be used to endorse or promote products derived from this
 *    software without specific prior written permission.
 *
 * 4. This software, with or without modification, must only be used with a
 *    Nordic Semiconductor ASA integrated circuit.
 *
 * 5. Any software provided in binary form under this license must not be reverse
 *    engineered, decompiled, modified and/or disassembled.
 *
 * THIS SOFTWARE IS PROVIDED BY NORDIC SEMICONDUCTOR ASA "AS IS" AND ANY EXPRESS
 * OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY, NONINFRINGEMENT, AND FITNESS FOR A PARTICULAR PURPOSE ARE
 * DISCLAIMED. IN NO EVENT SHALL NORDIC SEMICONDUCTOR ASA OR CONTRIBUTORS BE
 * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
 * GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
 * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 *
 */
#include "app_error.h"

#include "nrf_log.h"
#include "nrf_log_ctrl.h"
#include "app_util_platform.h"
#include "nrf_strerror.h"

#if defined(SOFTDEVICE_PRESENT) && SOFTDEVICE_PRESENT
#include "nrf_sdm.h"
#endif

/*lint -save -e14 */
/**
 * Function is implemented as weak so that it can be overwritten by custom application error handler
 * when needed.
 */
__WEAK void app_error_fault_handler(uint32_t id, uint32_t pc, uint32_t info)
{
    __disable_irq();
    NRF_LOG_FINAL_FLUSH();

#ifndef DEBUG
    NRF_LOG_ERROR("Fatal error");
#else
    switch (id)
    {
#if defined(SOFTDEVICE_PRESENT) && SOFTDEVICE_PRESENT
        case NRF_FAULT_ID_SD_ASSERT:
            NRF_LOG_ERROR("SOFTDEVICE: ASSERTION FAILED");
            break;
        case NRF_FAULT_ID_APP_MEMACC:
            NRF_LOG_ERROR("SOFTDEVICE: INVALID MEMORY ACCESS");
            break;
#endif
        case NRF_FAULT_ID_SDK_ASSERT:
        {
            assert_info_t * p_info = (assert_info_t *)info;
            NRF_LOG_ERROR("ASSERTION FAILED at %s:%u",
                          p_info->p_file_name,
                          p_info->line_num);
            break;
        }
        case NRF_FAULT_ID_SDK_ERROR:
        {
            error_info_t * p_info = (error_info_t *)info;
            NRF_LOG_ERROR("ERROR %u [%s] at %s:%u\r\nPC at: 0x%08x",
                          p_info->err_code,
                          nrf_strerror_get(p_info->err_code),
                          p_info->p_file_name,
                          p_info->line_num,
                          pc);
            NRF_LOG_ERROR("End of error report");
            break;
        }
        default:
            NRF_LOG_ERROR("UNKNOWN FAULT at 0x%08X", pc);
            break;
    }
#endif

    NRF_BREAKPOINT_COND;
    // On assert, the system can only recover with a reset.

#ifndef DEBUG
    NRF_LOG_WARNING("System reset");
    NVIC_SystemReset();
#else
    app_error_save_and_stop(id, pc, info);
#endif // DEBUG
}
/*lint -restore */

  • I was not able to find where DEBUG is defined.
  • Also, according to the file, whenever it prints "Fatal error" it should also print "System reset" and perform a system reset, but neither of these happens.

What am I missing, or where am I going wrong?

  • Hello,

    Check out this blog post:

    https://devzone.nordicsemi.com/b/blog/posts/an-introduction-to-error-handling-in-nrf5-projects

    So "fatal error" comes from an APP_ERROR_CHECK(err_code); with err_code != NRF_SUCCESS

    DEBUG should be defined in your preprocessor defines. If you are not sure where that is, google "preprocessor defines <your IDE>".

    If you have enabled NRF_LOG, the log should also show which APP_ERROR_CHECK(err_code) received an err_code != NRF_SUCCESS, and what the return value was.
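To illustrate why the log is more informative once DEBUG is defined, here is a minimal host-side sketch of how a check macro of this kind can capture the call site. As far as I recall, the SDK's APP_ERROR_CHECK() only forwards __LINE__ and __FILE__ to the handler in DEBUG builds. Every SKETCH_/sketch_ name below is hypothetical and only models that idea; it is not the SDK implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_SUCCESS 0u /* stand-in for NRF_SUCCESS */

/* Last reported failure, kept so the behaviour can be inspected on a host. */
typedef struct
{
    uint32_t     err_code;
    uint32_t     line_num;
    const char * p_file_name; /* NULL when built without SKETCH_DEBUG */
} sketch_error_info_t;

static sketch_error_info_t m_last_error;
static unsigned            m_error_count;

/* Hypothetical handler: a real one would log and then halt or reset. */
static void sketch_error_handler(uint32_t err_code, uint32_t line_num, const char * p_file_name)
{
    assert(err_code != SKETCH_SUCCESS); /* only ever called for failures */

    m_last_error.err_code    = err_code;
    m_last_error.line_num    = line_num;
    m_last_error.p_file_name = p_file_name;
    m_error_count++;

    if (p_file_name != NULL)
    {
        printf("ERROR %u at %s:%u\n", (unsigned)err_code, p_file_name, (unsigned)line_num);
    }
    else
    {
        printf("Fatal error\n"); /* no call-site information available */
    }
}

/* The call site is only captured when SKETCH_DEBUG is defined. */
#ifdef SKETCH_DEBUG
#define SKETCH_ERROR_HANDLER(err) sketch_error_handler((err), __LINE__, __FILE__)
#else
#define SKETCH_ERROR_HANDLER(err) sketch_error_handler((err), 0u, NULL)
#endif

/* Evaluate the expression once and call the handler if it is not success. */
#define SKETCH_ERROR_CHECK(err_code)                 \
    do                                               \
    {                                                \
        const uint32_t local_err_code = (err_code);  \
        if (local_err_code != SKETCH_SUCCESS)        \
        {                                            \
            SKETCH_ERROR_HANDLER(local_err_code);    \
        }                                            \
    } while (0)
```

Building this with -DSKETCH_DEBUG makes the handler print the file and line of the failing check; without it, only the generic message is printed.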

    Regarding your second question, that depends on what the error was. To put it "simple/stupid": don't do anything wrong. That turns out to be difficult, so APP_ERROR_CHECK() is a tool that helps you check that you do everything in the right order, and that the variables you pass into your function calls are valid.

    Typical errors could be:

    - Using a module that is not initialized

    - Trying to send too much data into a module (too long advertising data, too many packets into a queue, etc.).

    - Using a module that you can't use while the softdevice is enabled.

    and many more. 

    So try to define DEBUG and see whether you can find out what the error is. When you find out which function returned the err_code != NRF_SUCCESS (NRF_SUCCESS = 0), look at the header file that declares this function, or look it up on infocenter. That should give you some hints as to why it doesn't work.

    Best regards,

    Edvin

  • Thanks Edvin,

    One more thing: whenever I get "Fatal error" (i.e. when DEBUG isn't defined), I never get "System reset" and the system doesn't actually reset. Why is that? According to the snippet below it should, right?

    #ifndef DEBUG
    NRF_LOG_ERROR("Fatal error");
    #else
    switch (id)
    {
    .
    .
    .
    .
    }
    
    #ifndef DEBUG
        NRF_LOG_WARNING("System reset");
        NVIC_SystemReset();
    #else
        app_error_save_and_stop(id, pc, info);
    #endif // DEBUG

    Regarding the other points, I also really wanted to know how to handle errors when the devices (custom boards in this case) are sealed shut and in production. The only reliable way seems to be a system reset rather than changing the state of the application. For example, if we get stuck in the connected state, the alternatives would be to disconnect and do the whole process again, or to loop the failing operation until it succeeds or until a fixed number of attempts is reached.

  • Hello,

    This is because of the line before #ifndef DEBUG:

    NRF_BREAKPOINT_COND;

    This will generate a breakpoint if a debugger is connected. If you use the DK, it typically will be, because you power the DK via the debugger.

    If you comment out NRF_BREAKPOINT_COND, you will see that it resets.

    Note that it shouldn't do this if you are not debugging, but I believe these registers (the ones that enable the breakpoint when you are debugging) are only reset on a power-on startup. Program an application that does not define DEBUG in the preprocessor defines, then turn the board off and on. After that, it should reset at this point.
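The behaviour described above can be modelled on a host with a small sketch. It assumes NRF_BREAKPOINT_COND tests the C_DEBUGEN bit (bit 0) of the Cortex-M DHCSR register, which is how I remember the SDK macro being written. The sketch_ names, the struct, and the enum are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* C_DEBUGEN is bit 0 of the Cortex-M Debug Halting Control and Status Register. */
#define SKETCH_DHCSR_C_DEBUGEN_MSK (1u << 0)

typedef enum
{
    SKETCH_HALT_AT_BREAKPOINT, /* core stops here until the debugger resumes it */
    SKETCH_SYSTEM_RESET,       /* non-DEBUG build reaches NVIC_SystemReset()    */
    SKETCH_SAVE_AND_STOP       /* DEBUG build reaches app_error_save_and_stop() */
} sketch_fault_outcome_t;

/* Hypothetical snapshot of the inputs that decide what the handler does. */
typedef struct
{
    uint32_t dhcsr;       /* simulated value of CoreDebug->DHCSR          */
    bool     debug_build; /* true if the firmware was built with DEBUG   */
} sketch_core_state_t;

/* Mirrors the tail of app_error_fault_handler(): breakpoint first, then reset or stop. */
static sketch_fault_outcome_t sketch_fault_outcome(const sketch_core_state_t * p_state)
{
    assert(p_state != NULL);

    if (p_state->dhcsr & SKETCH_DHCSR_C_DEBUGEN_MSK)
    {
        /* Debug still enabled: the conditional breakpoint fires before the reset. */
        return SKETCH_HALT_AT_BREAKPOINT;
    }

    return p_state->debug_build ? SKETCH_SAVE_AND_STOP : SKETCH_SYSTEM_RESET;
}
```

With the bit set, the outcome is a halt regardless of the build type, which matches a board that prints "Fatal error" and then appears to hang instead of resetting.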

    You have to decide whether or not to use APP_ERROR_CHECK() in your final product. You may, but note that many of the examples reset on conditions that don't need a reset. E.g. the ble_app_uart example uses APP_ERROR_CHECK() if ble_nus_data_send() returns NRF_ERROR_RESOURCES, which only means that the buffer is full. In many cases, you will choose to keep calling ble_nus_data_send() until the buffer is full.

    So you don't have to pass all return values into APP_ERROR_CHECK(). You have to choose based on your application. Of course, sometimes a reset is probably the way to go, e.g. if you are in a deadlock (although a watchdog timer may also be used for this).
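As a rough sketch of handling a "buffer full" return value without going through APP_ERROR_CHECK(), the following host-side example retries a send a bounded number of times and returns the last error to the caller. The SKETCH_ constants are stand-ins (on target the codes come from nrf_error.h; 19 is the value I recall for NRF_ERROR_RESOURCES), and the function pointer type and the fake send function are hypothetical, not the ble_nus_data_send() signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for NRF_SUCCESS and NRF_ERROR_RESOURCES. */
#define SKETCH_SUCCESS          0u
#define SKETCH_ERROR_RESOURCES 19u

/* Hypothetical send function type; a real wrapper would call the SDK here. */
typedef uint32_t (*sketch_send_fn_t)(const uint8_t * p_data, uint16_t length);

/* Retry only while the buffer is full, at most max_attempts times.
 * Any other return value (success or a different error) ends the loop. */
static uint32_t sketch_send_with_retry(sketch_send_fn_t send_fn,
                                       const uint8_t *  p_data,
                                       uint16_t         length,
                                       unsigned         max_attempts)
{
    assert(send_fn != NULL);

    uint32_t err_code = SKETCH_ERROR_RESOURCES;

    for (unsigned attempt = 0; attempt < max_attempts; attempt++)
    {
        err_code = send_fn(p_data, length);
        if (err_code != SKETCH_ERROR_RESOURCES)
        {
            break; /* success, or an error that retrying will not fix */
        }
        /* On target, wait for a TX-complete event here instead of spinning. */
    }

    return err_code;
}

/* Fake transport for host testing: reports "full" on the first two calls. */
static unsigned m_fake_calls;

static uint32_t sketch_fake_send(const uint8_t * p_data, uint16_t length)
{
    (void)p_data;
    (void)length;
    m_fake_calls++;
    return (m_fake_calls <= 2u) ? SKETCH_ERROR_RESOURCES : SKETCH_SUCCESS;
}
```

The caller can then decide what a persistent failure means for the application (drop the packet, disconnect, or reset) instead of resetting on the first full buffer.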

    Best regards,

    Edvin
