
Random Fatal Error log message when powering up board

We are in the final throes of testing our boards, which are based on the BLE UART Peripheral application. The current test environment involves monitoring board and battery-life conditions.

Occasionally, yet far too often for product reliability, after a battery crash (i.e., discharge below the CPU's operating limits), a "Fatal Error" message is delivered via the logging system when the board is powered back up (by applying a battery charge source). Sometimes there are no visible consequences to the board operation when this event is noted, but far too often one or two critical conditions result:

1) The CPU (nRF51822) will not properly boot and load/run the app unless the board power system is fully discharged, and/or,

2) Once booted, the contents of NVRAM (Board Cal, Sensor Cal, and board/battery life information) have been erased.

Neither of these conditions are, of course, acceptable for our product once in the field.

The log messages received give no other details as to what the Fatal Error is or what may have caused the event.

Can anyone please point me towards a way of troubleshooting what is causing these errors?

Thank you all kindly,

Robin@TL

  • Hi,

    Please try to compile your project with "DEBUG" defined if you haven't done so already. It should result in a more verbose crash log if used together with the default app error handler (Error module). Error information may include the file name and line number of where the error occurred.
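
    The sketch below illustrates why a DEBUG build can report more: an error-check macro that passes the call site (__FILE__, __LINE__) to the handler only when DEBUG is defined. It is a host-side sketch; MY_ERROR_CHECK, my_error_handler and error_info_t are hypothetical names, and the SDK's own APP_ERROR_CHECK differs in detail between SDK versions.

    ```c
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical record of the most recent error (not an SDK type). */
    typedef struct {
        uint32_t    err_code;
        uint32_t    line;   /* 0 in builds without DEBUG */
        const char *file;   /* NULL in builds without DEBUG */
    } error_info_t;

    static error_info_t g_last_error;

    /* Hypothetical handler: stores the information and prints it. */
    static void my_error_handler(uint32_t err_code, uint32_t line, const char *file)
    {
        g_last_error.err_code = err_code;
        g_last_error.line = line;
        g_last_error.file = file;
        if (file != NULL)
            printf("Fatal error 0x%X at %s:%u\n", (unsigned)err_code, file, (unsigned)line);
        else
            printf("Fatal error 0x%X\n", (unsigned)err_code);
    }

    #ifdef DEBUG
    /* DEBUG build: pass the call site so the log points at the failing check. */
    #define MY_ERROR_CHECK(ERR)                                            \
        do {                                                               \
            uint32_t err_ = (uint32_t)(ERR);                               \
            if (err_ != 0) my_error_handler(err_, __LINE__, __FILE__);     \
        } while (0)
    #else
    /* Release build: only the error code is passed (no file-name strings kept). */
    #define MY_ERROR_CHECK(ERR)                                            \
        do {                                                               \
            uint32_t err_ = (uint32_t)(ERR);                               \
            if (err_ != 0) my_error_handler(err_, 0, NULL);                \
        } while (0)
    #endif

    int main(void)
    {
        MY_ERROR_CHECK(0); /* 0 means success: the handler is not called */
        MY_ERROR_CHECK(7); /* non-zero: the handler is called */
        return 0;
    }
    ```

    Building the same file with and without -DDEBUG shows the difference in the printed message.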

    Regards,

    Vidar 

  • Hello yet again Vidar,

    I am not sure this is related, but it very well may be. I powered 5 new boards yesterday and started burning them in. When I checked them this morning, all but one had multiple reboots. At that juncture I elected to recompile with the DEBUG flag set, as you suggested, downloaded the build to all five boards, and restarted the burn-in test on all of them while monitoring the worst one for errors. So far, no errors. The only change I made to the code was the DEBUG flag. What do you think could be going on?

    The failures in the first attempt, before setting the flag, happened within a few hours of the original burn-in start. I am now more than halfway through the (24 hr) burn-in restart, with no sign of a problem.

    Thanks for your input

    Robin @ TL

  • Hello Robin,

    It might be related. The DEBUG flag changes the behavior of the default app error handler: without the flag, the device resets itself by calling NVIC_SystemReset() in an attempt to recover; with the flag set, it calls app_error_save_and_stop(), which saves the error information and stays in an infinite loop so that it can be inspected with the debugger.
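
    A practical consequence for a burn-in test is that a DEBUG build turns every fault into a hang instead of a reboot. The host-side simulation below models only that policy; fault_policy() and simulate_burn_in() are hypothetical names, the fault rate is made up, and no SDK function is called.

    ```c
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* What the fault handler does next (hypothetical model, not SDK code). */
    typedef enum { FAULT_ACTION_RESET, FAULT_ACTION_HALT } fault_action_t;

    /* The policy described above: reset without DEBUG, stop in a loop with DEBUG. */
    static fault_action_t fault_policy(bool debug_build)
    {
        return debug_build ? FAULT_ACTION_HALT : FAULT_ACTION_RESET;
    }

    typedef struct {
        uint32_t resets;  /* reboots an observer would count */
        bool     halted;  /* device stopped running: no more reboots, no more output */
    } burn_in_result_t;

    /* Simulate `hours` of burn-in with one fault every `fault_interval` hours (0 = no faults). */
    static burn_in_result_t simulate_burn_in(bool debug_build, uint32_t hours, uint32_t fault_interval)
    {
        burn_in_result_t r = { 0, false };
        for (uint32_t h = 1; h <= hours; h++) {
            if (fault_interval != 0 && h % fault_interval == 0) {
                if (fault_policy(debug_build) == FAULT_ACTION_HALT) {
                    r.halted = true;  /* first fault stops the device */
                    break;
                }
                r.resets++;           /* device reboots and keeps running */
            }
        }
        return r;
    }

    int main(void)
    {
        burn_in_result_t rel = simulate_burn_in(false, 24, 4);
        burn_in_result_t dbg = simulate_burn_in(true, 24, 4);
        printf("without DEBUG: %u resets, halted=%d\n", (unsigned)rel.resets, (int)rel.halted);
        printf("with DEBUG:    %u resets, halted=%d\n", (unsigned)dbg.resets, (int)dbg.halted);
        return 0;
    }
    ```

    In other words, zero reboots in a DEBUG build does not by itself mean zero errors; it is also worth checking that the board is still running (for example, still advertising or still logging).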

    I know I indicated earlier that the error information (from app_error_save_and_stop()) would be logged, but I had forgotten that crash logging was first introduced in SDK 14. Sorry for the confusion.

    The default error handler in SDK 12.3.0 behaves as described above. If you want to verify this behavior with your app, you can insert an APP_ERROR_CHECK() macro in main() and pass it an error code, as in the example below. Then try debugging with and without the DEBUG flag.

    /**@brief Application main function.
     */
    int main(void)
    {
        uint32_t err_code;
        bool erase_bonds;
    
        // Initialize.
        APP_TIMER_INIT(APP_TIMER_PRESCALER, APP_TIMER_OP_QUEUE_SIZE, false);
        uart_init();
    
        APP_ERROR_CHECK(NRF_ERROR_INVALID_PARAM); //<-- will invoke error handler when input is not NRF_SUCCESS
        ...
    }

    Anyway, the "Fatal Error" message confirms that the app is entering the fault handler, so the next step should be to find out where the error occurred. Since your app is based on ble_app_uart, I'm wondering whether you have kept the default handling of the APP_UART_COMMUNICATION_ERROR event in the UART callback. This is a common source of errors: a floating RX input, a level change on startup, etc. may lead to a UART error condition (the reference is for the nRF52, but the same applies to the nRF51). To check whether this is the case, I suggest that you insert the following in the UART event handler:

    case APP_UART_COMMUNICATION_ERROR:
        NRF_LOG_INFO(" UART comm error. Error source %d", p_event->data.error_communication);
        APP_ERROR_HANDLER(p_event->data.error_communication); // <-- Will invoke fault handler. Handling of this error is application-specific. May be removed if it's OK to ignore com. errors.
        break;
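
    If the log shows that communication errors are the cause, the value passed to the handler can be decoded to see which condition occurred, and the application can count such errors instead of passing them to APP_ERROR_HANDLER(). The host-side sketch below assumes the value is the content of the UART ERRORSRC register with the nRF51 bit layout OVERRUN = bit 0, PARITY = bit 1, FRAMING = bit 2, BREAK = bit 3; please verify this against the reference manual for your device. uart_err_record(), uart_err_name() and uart_err_stats_t are hypothetical names.

    ```c
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /* nRF51 UART ERRORSRC bit positions (assumption: verify in the reference manual). */
    #define UART_ERR_OVERRUN (1u << 0)
    #define UART_ERR_PARITY  (1u << 1)
    #define UART_ERR_FRAMING (1u << 2)
    #define UART_ERR_BREAK   (1u << 3)
    #define UART_ERR_KNOWN   (UART_ERR_OVERRUN | UART_ERR_PARITY | UART_ERR_FRAMING | UART_ERR_BREAK)

    /* Hypothetical per-source counters, e.g. for later reporting. */
    typedef struct {
        uint32_t overrun, parity, framing, brk;
    } uart_err_stats_t;

    /* Count every error source flagged in `errorsrc`; returns 1 if any known bit was set. */
    static int uart_err_record(uart_err_stats_t *s, uint32_t errorsrc)
    {
        if (errorsrc & UART_ERR_OVERRUN) s->overrun++;
        if (errorsrc & UART_ERR_PARITY)  s->parity++;
        if (errorsrc & UART_ERR_FRAMING) s->framing++;
        if (errorsrc & UART_ERR_BREAK)   s->brk++;
        return (errorsrc & UART_ERR_KNOWN) != 0;
    }

    /* Name of the lowest set error bit, for a log line. */
    static const char *uart_err_name(uint32_t errorsrc)
    {
        if (errorsrc & UART_ERR_OVERRUN) return "overrun";
        if (errorsrc & UART_ERR_PARITY)  return "parity";
        if (errorsrc & UART_ERR_FRAMING) return "framing";
        if (errorsrc & UART_ERR_BREAK)   return "break";
        return "unknown";
    }

    int main(void)
    {
        uart_err_stats_t stats = {0};
        /* Example values only; a floating RX line can produce framing and break errors. */
        const uint32_t samples[] = { UART_ERR_FRAMING, UART_ERR_FRAMING | UART_ERR_BREAK, UART_ERR_OVERRUN };
        for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
            uart_err_record(&stats, samples[i]);
            printf("UART comm error: %s (0x%X)\n", uart_err_name(samples[i]), (unsigned)samples[i]);
        }
        printf("framing=%u break=%u overrun=%u\n",
               (unsigned)stats.framing, (unsigned)stats.brk, (unsigned)stats.overrun);
        return 0;
    }
    ```

    Whether to ignore, count, or reset on these errors remains an application decision.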

    2) Once booted, the contents of NVRAM (Board Cal, Sensor Cal, and board/battery life information) have been erased.

    This is probably not directly related to the code assertion. Is there anything in your code that may delete the data in certain conditions? 

    Regards,

    Vidar

  • Hi Vidar,

    I am not sure you caught the gist of this latest finding. With the DEBUG flag set, there have been no reboots or errors. Or might I not be understanding your reply?

    Indeed, during the Cal and test processes there is code that writes (and therefore necessarily erases) NVRAM. I may have a handle on this: I am doing battery-life monitoring and saving the data to NVRAM at full charge and at discharge. Under some battery conditions the battery may fully crash before the write completes. This could explain the rare memory erasure.
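
    One common way to make such records survive a power loss in the middle of an erase/write cycle is to keep two copies in separate flash pages, each tagged with a sequence number and a checksum, and to always erase and rewrite the older copy, so that once the first save has completed at least one valid copy exists at every moment. The sketch below simulates the two pages in RAM on a host PC; the record layout, the CRC-32 choice and all function names are assumptions for illustration, not the implementation used on these boards.

    ```c
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical calibration record; one copy per flash page. */
    typedef struct {
        uint32_t seq;         /* incremented on every save; the newest valid copy wins */
        float    board_cal;
        float    sensor_cal;
        uint32_t crc;         /* CRC-32 over the fields above */
    } cal_record_t;

    /* Two simulated flash pages; erased flash reads as 0xFF. */
    static cal_record_t g_slot[2];

    /* Bitwise CRC-32 (reflected polynomial 0xEDB88320). */
    static uint32_t crc32_calc(const void *data, size_t len)
    {
        const uint8_t *p = data;
        uint32_t crc = 0xFFFFFFFFu;
        while (len--) {
            crc ^= *p++;
            for (int k = 0; k < 8; k++)
                crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
        return ~crc;
    }

    static bool slot_valid(const cal_record_t *r)
    {
        return r->seq != 0xFFFFFFFFu && r->crc == crc32_calc(r, offsetof(cal_record_t, crc));
    }

    /* Index of the newest valid copy, or -1 if there is none. */
    static int newest_slot(void)
    {
        bool v0 = slot_valid(&g_slot[0]), v1 = slot_valid(&g_slot[1]);
        if (v0 && v1) return (g_slot[1].seq > g_slot[0].seq) ? 1 : 0;
        return v0 ? 0 : (v1 ? 1 : -1);
    }

    /* Erase and rewrite only the OLDER copy. `power_fails` simulates the battery
       dying after the erase but before the write. */
    static void cal_save(float board_cal, float sensor_cal, bool power_fails)
    {
        int cur = newest_slot();
        int target = (cur == 0) ? 1 : 0;
        uint32_t next_seq = (cur < 0) ? 1u : g_slot[cur].seq + 1u;

        memset(&g_slot[target], 0xFF, sizeof g_slot[target]);  /* page erase */
        if (power_fails)
            return;                                            /* write never happens */

        cal_record_t r = { next_seq, board_cal, sensor_cal, 0u };
        r.crc = crc32_calc(&r, offsetof(cal_record_t, crc));
        g_slot[target] = r;                                    /* page write */
    }

    /* Copy the newest valid record to *out; false if no valid copy exists. */
    static bool cal_load(cal_record_t *out)
    {
        int i = newest_slot();
        if (i < 0)
            return false;
        *out = g_slot[i];
        return true;
    }

    int main(void)
    {
        cal_record_t r;
        memset(g_slot, 0xFF, sizeof g_slot);  /* fresh, erased flash */
        cal_save(1.5f, 2.5f, false);          /* completes: seq 1 */
        cal_save(3.0f, 4.0f, true);           /* battery dies mid-save */
        if (cal_load(&r))
            printf("loaded seq %u, board_cal %.1f\n", (unsigned)r.seq, r.board_cal);
        return 0;
    }
    ```

    The sequence number decides which copy is current, and the CRC rejects a copy whose write was cut short, so an interrupted save falls back to the previous values instead of leaving the data erased.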

    Right now I am more concerned about why, before DEBUG was set, there were multiple reboots within a few hours, and since setting it there have been none (I will be rechecking that this AM).

    Thanks again for your time,

    Robin @ TL

  • Hi Robin,

    I am not sure you caught the gist of this latest finding. With the DEBUG flag set, there have been no reboots or errors. Or might I not be understanding your reply?

    You're right. The DEBUG flag could explain why you no longer get reboots, but it doesn't explain why the errors have stopped occurring. It's hard to say if that's a coincidence or not.

    Right now I am more concerned about why, before DEBUG was set, there were multiple reboots within a few hours, and since setting it there have been none (I will be rechecking that this AM).

    Could you try to add logging in APP_UART_COMMUNICATION_ERROR without setting the DEBUG flag to see if this may be the error source?

    Regards,

    Vidar

  • Hello Vidar,

    These errors have not recurred since I started with the DEBUG flag set. They may have been related to a bug I found wherein I was attempting to read/write NVRAM near battery death. At this stage it is unlikely I will have time to chase it unless the errors become problematic again or something changes in our schedule. For now, the plan is to move forward with DEBUG set for production code. Do you see any serious concerns with doing this?
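
    One way to guard against that kind of bug is a check, run before any NVRAM erase/write, that refuses to start when the battery voltage is low, with some hysteresis so it does not toggle around the threshold. The thresholds and names below are hypothetical, and how the battery voltage is measured (e.g., via the ADC) is not shown.

    ```c
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical thresholds: pick them from the battery curve and the CPU's minimum supply voltage. */
    #define NVM_WRITE_MIN_MV    2400u  /* do not start an erase/write below this */
    #define NVM_WRITE_RESUME_MV 2500u  /* allow writes again only at or above this (hysteresis) */

    typedef struct {
        bool writes_allowed;
    } nvm_guard_t;

    /* Feed a new battery reading in millivolts; returns whether an NVRAM write may start now. */
    static bool nvm_guard_update(nvm_guard_t *g, uint32_t battery_mv)
    {
        if (g->writes_allowed && battery_mv < NVM_WRITE_MIN_MV)
            g->writes_allowed = false;
        else if (!g->writes_allowed && battery_mv >= NVM_WRITE_RESUME_MV)
            g->writes_allowed = true;
        return g->writes_allowed;
    }

    int main(void)
    {
        nvm_guard_t guard = { false };
        const uint32_t readings_mv[] = { 3000, 2450, 2350, 2450, 2550 };
        for (size_t i = 0; i < sizeof readings_mv / sizeof readings_mv[0]; i++)
            printf("%u mV -> write %s\n", (unsigned)readings_mv[i],
                   nvm_guard_update(&guard, readings_mv[i]) ? "allowed" : "deferred");
        return 0;
    }
    ```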

    Thanks

    Robin @ TL

    PS.  You can close this case for now.  I will be opening one today regarding pairing/bonding implications for ble_app_uart and ble_app_uart_central.

  • Hello Robin,

    The DEBUG flag is only used for preprocessor checks in the code, so if you haven't already done so, I would suggest reviewing its usage in your code to make sure it doesn't have any unintended side effects. For example, check that the program doesn't go into an infinite loop if your error handler gets invoked.
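
    One way to avoid an infinite loop in the field while keeping diagnostics is to let the error handler store the error in RAM that survives a soft reset, then reset the device, and report the stored record on the next boot. The host-side sketch below shows only that bookkeeping; the names and the magic value are hypothetical, and on the target the struct would have to be placed in a RAM section that the startup code does not zero (how to do that depends on the toolchain and linker script).

    ```c
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define CRASH_MAGIC 0xC0FFEE42u   /* hypothetical marker meaning "record is valid" */

    typedef struct {
        uint32_t magic;
        uint32_t err_code;
        uint32_t line;
    } crash_record_t;

    /* On the target this would live in no-init RAM; here it is an ordinary global. */
    static crash_record_t g_crash;

    /* Called from the fault handler just before resetting (instead of looping forever). */
    static void crash_record_save(uint32_t err_code, uint32_t line)
    {
        g_crash.err_code = err_code;
        g_crash.line = line;
        g_crash.magic = CRASH_MAGIC;
    }

    /* Called early at boot: returns true and copies the record if the last reset followed a fault. */
    static bool crash_record_take(crash_record_t *out)
    {
        if (g_crash.magic != CRASH_MAGIC)
            return false;
        *out = g_crash;
        g_crash.magic = 0;            /* consume it so it is reported only once */
        return true;
    }

    int main(void)
    {
        crash_record_t rec;
        if (crash_record_take(&rec))  /* nothing stored yet: prints nothing */
            printf("previous reset: error 0x%X at line %u\n", (unsigned)rec.err_code, (unsigned)rec.line);

        crash_record_save(7u, 123u);  /* simulated fault; firmware would now reset */

        if (crash_record_take(&rec))  /* simulated next boot */
            printf("previous reset: error 0x%X at line %u\n", (unsigned)rec.err_code, (unsigned)rec.line);
        return 0;
    }
    ```

    Because RAM can hold arbitrary data after a power-on reset, the magic value is what marks the record as valid, and clearing it after reporting keeps a stale record from being reported twice.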
