General Error Handling & avoiding Fatal Error

Question

Hello, 
 (I am using SDK 15.1 on a Linux system.) 
 I will break the question down into two parts: 
 
 1. When do we get a fatal error? I know one or two cases, for example when the SoftDevice doesn't have enough memory to execute the requested operation, but are there other known scenarios in which it occurs? 
 2. The second question is about generic error handling: what do people generally do to avoid fatal conditions? I know a few approaches, though probably not very good ones. 
 
 For example, when a call requires a system packet to be sent but the queue is full, you loop on the call until it returns NRF_SUCCESS. 
 Or, when the debug flag is not set, APP_ERROR_CHECK calls its handler, which eventually resets the system.

These questions are mainly from the point of view of putting devices into production. 
 Thanks 
 EDIT: 
 3. Referring to the error module link 
 and the file attached below:

 app_error_weak.c 
 
 /**
 * Copyright (c) 2016 - 2018, Nordic Semiconductor ASA
 *
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without modification,
 * are permitted provided that the following conditions are met:
 *
 * 1. Redistributions of source code must retain the above copyright notice, this
 * list of conditions and the following disclaimer.
 *
 * 2. Redistributions in binary form, except as embedded into a Nordic
 * Semiconductor ASA integrated circuit in a product or a software update for
 * such product, must reproduce the above copyright notice, this list of
 * conditions and the following disclaimer in the documentation and/or other
 * materials provided with the distribution.
 *
 * 3. Neither the name of Nordic Semiconductor ASA nor the names of its
 * contributors may be used to endorse or promote products derived from this
 * software without specific prior written permission.
 *
 * 4. This software, with or without modification, must only be used with a
 * Nordic Semiconductor ASA integrated circuit.
 *
 * 5. Any software provided in binary form under this license must not be reverse
 * engineered, decompiled, modified and/or disassembled.
 *
 * THIS SOFTWARE IS PROVIDED BY NORDIC SEMICONDUCTOR ASA "AS IS" AND ANY EXPRESS
 * OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY, NONINFRINGEMENT, AND FITNESS FOR A PARTICULAR PURPOSE ARE
 * DISCLAIMED. IN NO EVENT SHALL NORDIC SEMICONDUCTOR ASA OR CONTRIBUTORS BE
 * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
 * GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
 * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 *
 */
#include "app_error.h"

#include "nrf_log.h"
#include "nrf_log_ctrl.h"
#include "app_util_platform.h"
#include "nrf_strerror.h"

#if defined(SOFTDEVICE_PRESENT) && SOFTDEVICE_PRESENT
#include "nrf_sdm.h"
#endif

/*lint -save -e14 */
/**
 * Function is implemented as weak so that it can be overwritten by custom application error handler
 * when needed.
 */
__WEAK void app_error_fault_handler(uint32_t id, uint32_t pc, uint32_t info)
{
 __disable_irq();
 NRF_LOG_FINAL_FLUSH();

#ifndef DEBUG
 NRF_LOG_ERROR("Fatal error");
#else
 switch (id)
 {
#if defined(SOFTDEVICE_PRESENT) && SOFTDEVICE_PRESENT
 case NRF_FAULT_ID_SD_ASSERT:
 NRF_LOG_ERROR("SOFTDEVICE: ASSERTION FAILED");
 break;
 case NRF_FAULT_ID_APP_MEMACC:
 NRF_LOG_ERROR("SOFTDEVICE: INVALID MEMORY ACCESS");
 break;
#endif
 case NRF_FAULT_ID_SDK_ASSERT:
 {
 assert_info_t * p_info = (assert_info_t *)info;
 NRF_LOG_ERROR("ASSERTION FAILED at %s:%u",
 p_info->p_file_name,
 p_info->line_num);
 break;
 }
 case NRF_FAULT_ID_SDK_ERROR:
 {
 error_info_t * p_info = (error_info_t *)info;
 NRF_LOG_ERROR("ERROR %u [%s] at %s:%u\r\nPC at: 0x%08x",
 p_info->err_code,
 nrf_strerror_get(p_info->err_code),
 p_info->p_file_name,
 p_info->line_num,
 pc);
 NRF_LOG_ERROR("End of error report");
 break;
 }
 default:
 NRF_LOG_ERROR("UNKNOWN FAULT at 0x%08X", pc);
 break;
 }
#endif

 NRF_BREAKPOINT_COND;
 // On assert, the system can only recover with a reset.

#ifndef DEBUG
 NRF_LOG_WARNING("System reset");
 NVIC_SystemReset();
#else
 app_error_save_and_stop(id, pc, info);
#endif // DEBUG
}
/*lint -restore */

I was not able to find where DEBUG is defined. 
 Also, according to the file, whenever it prints "Fatal error" it should also print "System reset" and perform a system reset, but neither happens. 
 
 What am I missing, or where am I going wrong?

Edvin · Accepted Answer

Hello, 
 This is because of the line just before the second #ifndef DEBUG: 
 NRF_BREAKPOINT_COND; 
 which generates a breakpoint if a debugger is connected. If you use the DK, one typically is, because you power the DK via the debugger. 
 If you comment out NRF_BREAKPOINT_COND, you will see that the device resets. 
 
 Note that it shouldn't hit the breakpoint if you are not debugging, but I believe the registers that enable the breakpoint while debugging are only set at power-on startup. Try programming an application that does not have DEBUG in its preprocessor defines, then turn the board off and on. It should then reset at this point. 
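 
 To make the control flow at the end of app_error_fault_handler() concrete, here is a small host-side model of it. It is only a sketch: it assumes that NRF_BREAKPOINT_COND tests the C_DEBUGEN bit (bit 0) of CoreDebug->DHCSR, so check the macro in your SDK version. The names fault_outcome() and fault_outcome_t are made up for illustration and are not part of the SDK.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of CoreDebug->DHCSR (C_DEBUGEN) in CMSIS; redefined here so the
 * sketch builds on a host without the device headers. */
#define DHCSR_C_DEBUGEN_MSK (1UL << 0)

/* Hypothetical type: what the tail of the fault handler ends up doing. */
typedef enum {
    FAULT_HALT_AT_BREAKPOINT, /* debugger attached: the core stops at the breakpoint */
    FAULT_SYSTEM_RESET,       /* DEBUG not defined: NVIC_SystemReset() is reached    */
    FAULT_SAVE_AND_STOP       /* DEBUG defined: app_error_save_and_stop() is reached */
} fault_outcome_t;

/* Hypothetical model of the handler tail (not SDK code).
 * dhcsr         - value read from CoreDebug->DHCSR
 * debug_defined - true if the build has DEBUG in its preprocessor defines */
fault_outcome_t fault_outcome(uint32_t dhcsr, bool debug_defined)
{
    /* Assumed behavior of NRF_BREAKPOINT_COND: break only if debug is enabled. */
    if (dhcsr & DHCSR_C_DEBUGEN_MSK) {
        return FAULT_HALT_AT_BREAKPOINT;
    }
    /* Otherwise fall through to the #ifndef DEBUG / #else branches. */
    return debug_defined ? FAULT_SAVE_AND_STOP : FAULT_SYSTEM_RESET;
}
```

 With the debug bit set, the model never reaches the reset branch, which would match seeing "Fatal error" but no reset while the DK's debugger is attached.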
 
 You have to decide whether or not to use APP_ERROR_CHECK() in your final product. You may, but note that many of the examples reset on conditions that don't need a reset. For example, the ble_app_uart example uses APP_ERROR_CHECK() if ble_nus_data_send() returns NRF_ERROR_RESOURCES, which only means that the buffer is full. In many cases, you will want to keep calling ble_nus_data_send() until the buffer is full. 
 
 So you don't have to pass every return value into APP_ERROR_CHECK(); choose based on your application. Of course, sometimes a reset is probably the way to go, e.g. if you are in a deadlock (although a watchdog timer can also be used for this). 
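 
 As a rough illustration of choosing per return value instead of passing everything to APP_ERROR_CHECK(), the sketch below sorts return codes into "ok", "retry later", "drop" and "fatal", and retries a bounded number of times. The helper names (classify_send_error, send_with_retry, fake_send) are hypothetical, and which codes count as recoverable is an assumption you should adapt to your application. The error values are stand-ins matching nrf_error.h so that the code builds on a host; in firmware you would include the SDK header instead.

```c
#include <stdint.h>

/* Stand-ins so the sketch compiles without the SDK; in a real project
 * these come from nrf_error.h. */
#ifndef NRF_SUCCESS
#define NRF_SUCCESS             0u
#define NRF_ERROR_INVALID_STATE 8u
#define NRF_ERROR_RESOURCES     19u
#endif

typedef enum { SEND_OK, SEND_RETRY_LATER, SEND_DROPPED, SEND_FATAL } send_result_t;

/* Hypothetical policy: only unexpected codes are treated as fatal. */
send_result_t classify_send_error(uint32_t err_code)
{
    switch (err_code) {
    case NRF_SUCCESS:             return SEND_OK;
    case NRF_ERROR_RESOURCES:     return SEND_RETRY_LATER; /* buffer full */
    case NRF_ERROR_INVALID_STATE: return SEND_DROPPED;     /* assumed harmless here */
    default:                      return SEND_FATAL;       /* candidate for APP_ERROR_CHECK() */
    }
}

/* Any function that attempts one send and returns an NRF error code. */
typedef uint32_t (*send_fn_t)(void *ctx);

/* Retry while the buffer is full, but give up after max_attempts so the
 * loop cannot spin forever. */
send_result_t send_with_retry(send_fn_t send, void *ctx, unsigned max_attempts)
{
    for (unsigned i = 0; i < max_attempts; i++) {
        send_result_t result = classify_send_error(send(ctx));
        if (result != SEND_RETRY_LATER) {
            return result;
        }
    }
    return SEND_RETRY_LATER; /* still full: the caller decides what to do next */
}

/* Test double: reports a full buffer for the first *ctx calls, then succeeds. */
uint32_t fake_send(void *ctx)
{
    unsigned *remaining_full = ctx;
    if (*remaining_full > 0) {
        (*remaining_full)--;
        return NRF_ERROR_RESOURCES;
    }
    return NRF_SUCCESS;
}
```

 In firmware, send would wrap a call such as ble_nus_data_send(), and only SEND_FATAL would be handed to APP_ERROR_CHECK(). Rather than retrying in a tight loop, it is usually nicer to wait for the stack to report that a buffer has been freed (for example BLE_GATTS_EVT_HVN_TX_COMPLETE) before trying again.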
 
 Best regards, 
 Edvin
