Beware that this post is related to an SDK in maintenance mode
More Info: Consider nRF Connect SDK for new designs
This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

app_timer hangs

Hello.

I have encountered a case where app_timer gets stuck indefinitely while processing its timer queue. I have attached a simplified version of the code that runs on the nRF52840 DK. (I actually have the nRF52840 PDK, but I doubt that it makes a difference here).

The attached main.c should replace the main.c of the PCA10056 BSP example in SDK 15.2. The code toggles leds 1-3 every time buttons 1-3 are pressed. In addition there is another app_timer that toggles led 4 every second.

Essentially my app_timer code adds long press detection to the basic button handling done by app_button. I looked at the BSP library for an example of long press detection, and implemented the code similarly as the BSP does. So a button press starts an app_timer and a release stops it. If the timer expires before a release, a long press is signaled. This works otherwise, but rapidly clicking two buttons makes the code hang. It usually doesn't take more than a dozen clicks to make it hang on the DK.

Here is the button handling part of the code: (The full, compilable code is the attached main.c)

// Timer callback when long press timeout occurs.
// Forward long press events to the button handler.
static void button_long_press_handler(void * p_context)
{
    uint8_t * p_button = (uint8_t *)p_context;
    
    // Check if we have a button combo
    uint8_t is_pushed = 0;
    bool button_1 = false;
    bool button_2 = false;
    bool button_3 = false;
    if(app_button_is_pushed(0)){is_pushed++; button_1 = true;}    // app_button_is_pushed needs the index into the buttons[] array, hence (0,1,2).
    if(app_button_is_pushed(1)){is_pushed++; button_2 = true;}
    if(app_button_is_pushed(2)){is_pushed++; button_3 = true;}

    uint8_t event_id = BUTTON_LONG_PUSH;

    if(is_pushed > 1)
    {
        if(button_1 && button_2)
            event_id = BUTTON_COMBO_1_2;
        else if(button_1 && button_3)
            event_id = BUTTON_COMBO_1_3;
        else if(button_2 && button_3)
            event_id = BUTTON_COMBO_2_3;
    }

    button_common_handler(*p_button, event_id);
}

// Button definitions
// The config values are taken from custom_board.h
static const app_button_cfg_t buttons[BUTTONS_NUMBER] =
{
    {
        .pin_no = BUTTON_1,
        .active_state = BUTTONS_ACTIVE_STATE,
        .pull_cfg = BUTTON_PULL,
        .button_handler = button_common_handler
    },

    {
        .pin_no = BUTTON_2,
        .active_state = BUTTONS_ACTIVE_STATE,
        .pull_cfg = BUTTON_PULL,
        .button_handler = button_common_handler
    },

    {
        .pin_no = BUTTON_3,
        .active_state = BUTTONS_ACTIVE_STATE,
        .pull_cfg = BUTTON_PULL,
        .button_handler = button_common_handler
    }
#if (BUTTONS_NUMBER > 3)
    ,
    {
        .pin_no = BUTTON_4,
        .active_state = BUTTONS_ACTIVE_STATE,
        .pull_cfg = BUTTON_PULL,
        .button_handler = button_common_handler
    }
#endif
};


void buttons_init(void)
{
    ret_code_t err_code;
    err_code = app_button_init(buttons, BUTTONS_NUMBER, APP_TIMER_TICKS(20));
    APP_ERROR_CHECK(err_code);
    
    err_code = app_button_enable();    
    APP_ERROR_CHECK(err_code);

    err_code = app_timer_create(&button_long_press_timer, APP_TIMER_MODE_SINGLE_SHOT, button_long_press_handler);
    APP_ERROR_CHECK(err_code);
 
}

// Common handler for button events. Detects short and long presses.
static void button_common_handler(uint8_t pin_no, uint8_t button_action)
{
    static bool suppress_next_release_event = false;
    
    if(button_action == APP_BUTTON_PUSH)
    {
        // Start long press timer on PUSH event
        ret_code_t err_code = app_timer_start(button_long_press_timer, APP_TIMER_TICKS(LONG_PRESS_TIMEOUT_MS), (void*)&current_long_push_pin_no);
        if (err_code == NRF_SUCCESS)
        {
            current_long_push_pin_no = pin_no;
        }

        suppress_next_release_event = false;
    }
    if(button_action == APP_BUTTON_RELEASE)
    {
        // Stop long press timer
		(void)app_timer_stop(button_long_press_timer);
        
        // If this was the release event after a long press, ignore it.
        if(suppress_next_release_event){return;}
        
        if(pin_no == BUTTON_1){nrf_gpio_pin_toggle(LED_1);}
        if(pin_no == BUTTON_2){nrf_gpio_pin_toggle(LED_2);}
        if(pin_no == BUTTON_3){nrf_gpio_pin_toggle(LED_3);}
        
    }
    if(button_action == BUTTON_LONG_PUSH)
    {
        // Suppress next release event
        suppress_next_release_event = true;
    }
    if(button_action == BUTTON_COMBO_1_2)
    {
        // Suppress next release event
        suppress_next_release_event = true;
    }
    if(button_action == BUTTON_COMBO_1_3)
    {
        // Suppress next release event
        suppress_next_release_event = true;
    }
    if(button_action == BUTTON_COMBO_2_3)
    {
        // Suppress next release event
        suppress_next_release_event = true;
    }
}

One interesting aspect is that the code hangs in two different while() loops inside app_timer, depending on whether another app_timer (that implements a one second tick) is running. If the tick timer is running, execution hangs in timer_list_remove(), similarly as in this old thread. If I commend out the tick timer (one_sec_timer_start() in the attached code), the code hangs in timer_timeouts_check() in the following loop:

        // Expire all timers within ticks_elapsed and collect ticks_expired.
        while (p_timer != NULL)
        {
            // Do nothing if timer did not expire.
            if (ticks_elapsed < p_timer->ticks_to_expire)
            {
                break;
            }

            // Decrement ticks_elapsed and collect expired ticks.
            ticks_elapsed -= p_timer->ticks_to_expire;
            ticks_expired += p_timer->ticks_to_expire;

            // Move to next timer.
            p_previous_timer = p_timer;
            p_timer = p_timer->next;

            // Execute Task.
            if (p_previous_timer->is_running)
            {
                p_previous_timer->is_running = false;
                timeout_handler_exec(p_previous_timer);
            }
        }

When it hangs, both p_timer and p_previous_timer point to the same timer. And their next pointers also point to the same, so the loop never exits.

I suspect that the root cause is a race condition, since I am triggering starts and stops of the same timer rapidly. Yet this code uses the same principle as the BSP, and I can't get the unmodified BSP example to hang. So I must be doing something wrong. I would appreciate any advice on how to make this work.

/**
 * Copyright (c) 2014 - 2018, Nordic Semiconductor ASA
 *
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without modification,
 * are permitted provided that the following conditions are met:
 *
 * 1. Redistributions of source code must retain the above copyright notice, this
 *    list of conditions and the following disclaimer.
 *
 * 2. Redistributions in binary form, except as embedded into a Nordic
 *    Semiconductor ASA integrated circuit in a product or a software update for
 *    such product, must reproduce the above copyright notice, this list of
 *    conditions and the following disclaimer in the documentation and/or other
 *    materials provided with the distribution.
 *
 * 3. Neither the name of Nordic Semiconductor ASA nor the names of its
 *    contributors may be used to endorse or promote products derived from this
 *    software without specific prior written permission.
 *
 * 4. This software, with or without modification, must only be used with a
 *    Nordic Semiconductor ASA integrated circuit.
 *
 * 5. Any software provided in binary form under this license must not be reverse
 *    engineered, decompiled, modified and/or disassembled.
 *
 * THIS SOFTWARE IS PROVIDED BY NORDIC SEMICONDUCTOR ASA "AS IS" AND ANY EXPRESS
 * OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY, NONINFRINGEMENT, AND FITNESS FOR A PARTICULAR PURPOSE ARE
 * DISCLAIMED. IN NO EVENT SHALL NORDIC SEMICONDUCTOR ASA OR CONTRIBUTORS BE
 * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
 * GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
 * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 *
 */
/** @file
 *
 * @defgroup bsp_example_main main.c
 * @{
 * @ingroup bsp_example
 * @brief BSP Example Application main file.
 *
 */

#include <stdbool.h>
#include <stdint.h>
#include "boards.h"
#include "app_timer.h"
#include "app_button.h"
#include "nrf_gpio.h"
#include "nordic_common.h"
#include "nrf_error.h"


#include "nrf_log.h"
#include "nrf_log_ctrl.h"
#include "nrf_log_default_backends.h"


#define LONG_PRESS_TIMEOUT_MS   3000

#define BUTTON_LONG_PUSH        2
#define BUTTON_COMBO_1_2        3
#define BUTTON_COMBO_1_3        4
#define BUTTON_COMBO_2_3        5

#define ONE_SEC_TIMER_TIMEOUT     (APP_TIMER_TICKS(1000))

static volatile uint32_t one_sec_ticks;

static void button_common_handler(uint8_t pin_no, uint8_t button_action);

uint8_t current_long_push_pin_no;

// The button timer is used to detect long presses
APP_TIMER_DEF(button_long_press_timer);
APP_TIMER_DEF(one_sec_timer_id);


/**@brief Function for initializing low frequency clock.
 */
void clock_initialization()
{
    NRF_CLOCK->LFCLKSRC            = (CLOCK_LFCLKSRC_SRC_Xtal << CLOCK_LFCLKSRC_SRC_Pos);
    NRF_CLOCK->EVENTS_LFCLKSTARTED = 0;
    NRF_CLOCK->TASKS_LFCLKSTART    = 1;

    while (NRF_CLOCK->EVENTS_LFCLKSTARTED == 0)
    {
        // Do nothing.
    }
}


// This handler is called every second
void handle_one_sec_timer_tick(void *p_context)
{
    one_sec_ticks++;
    nrf_gpio_pin_toggle(LED_4);
}

/**@brief Function for the Timer initialization.
 *
 * @details Initializes the timer module. This creates and starts application timers.
 */
void one_sec_timer_init(void)
{
    // Initialize app_timer module.
    ret_code_t err_code = app_timer_init();

    // Create timers.
    err_code = app_timer_create(&one_sec_timer_id, APP_TIMER_MODE_REPEATED, &handle_one_sec_timer_tick);
    APP_ERROR_CHECK(err_code);

}

/**@brief Function for starting timers.
 */
void one_sec_timer_start(void)
{
    ret_code_t err_code;
    err_code = app_timer_start(one_sec_timer_id, ONE_SEC_TIMER_TIMEOUT, NULL);
    APP_ERROR_CHECK(err_code);
}



// Timer callback when long press timeout occurs.
// Forward long press events to the button handler.
static void button_long_press_handler(void * p_context)
{
    uint8_t * p_button = (uint8_t *)p_context;
    
    // Check if we have a button combo
    uint8_t is_pushed = 0;
    bool button_1 = false;
    bool button_2 = false;
    bool button_3 = false;
    if(app_button_is_pushed(0)){is_pushed++; button_1 = true;}    // app_button_is_pushed needs the index into the buttons[] array, hence (0,1,2).
    if(app_button_is_pushed(1)){is_pushed++; button_2 = true;}
    if(app_button_is_pushed(2)){is_pushed++; button_3 = true;}

    uint8_t event_id = BUTTON_LONG_PUSH;

    if(is_pushed > 1)
    {
        if(button_1 && button_2)
            event_id = BUTTON_COMBO_1_2;
        else if(button_1 && button_3)
            event_id = BUTTON_COMBO_1_3;
        else if(button_2 && button_3)
            event_id = BUTTON_COMBO_2_3;
    }

    button_common_handler(*p_button, event_id);
}

// Button definitions
// The config values are taken from custom_board.h
static const app_button_cfg_t buttons[BUTTONS_NUMBER] =
{
    {
        .pin_no = BUTTON_1,
        .active_state = BUTTONS_ACTIVE_STATE,
        .pull_cfg = BUTTON_PULL,
        .button_handler = button_common_handler
    },

    {
        .pin_no = BUTTON_2,
        .active_state = BUTTONS_ACTIVE_STATE,
        .pull_cfg = BUTTON_PULL,
        .button_handler = button_common_handler
    },

    {
        .pin_no = BUTTON_3,
        .active_state = BUTTONS_ACTIVE_STATE,
        .pull_cfg = BUTTON_PULL,
        .button_handler = button_common_handler
    }
#if (BUTTONS_NUMBER > 3)
    ,
    {
        .pin_no = BUTTON_4,
        .active_state = BUTTONS_ACTIVE_STATE,
        .pull_cfg = BUTTON_PULL,
        .button_handler = button_common_handler
    }
#endif
};


void buttons_init(void)
{
    ret_code_t err_code;
    err_code = app_button_init(buttons, BUTTONS_NUMBER, APP_TIMER_TICKS(20));
    APP_ERROR_CHECK(err_code);
    
    err_code = app_button_enable();    
    APP_ERROR_CHECK(err_code);

    err_code = app_timer_create(&button_long_press_timer, APP_TIMER_MODE_SINGLE_SHOT, button_long_press_handler);
    APP_ERROR_CHECK(err_code);
 
}

// Common handler for button events. Detects short and long presses.
static void button_common_handler(uint8_t pin_no, uint8_t button_action)
{
    static bool suppress_next_release_event = false;
    
    if(button_action == APP_BUTTON_PUSH)
    {
        // Start long press timer on PUSH event
        ret_code_t err_code = app_timer_start(button_long_press_timer, APP_TIMER_TICKS(LONG_PRESS_TIMEOUT_MS), (void*)&current_long_push_pin_no);
        if (err_code == NRF_SUCCESS)
        {
            current_long_push_pin_no = pin_no;
        }

        suppress_next_release_event = false;
    }
    if(button_action == APP_BUTTON_RELEASE)
    {
        // Stop long press timer
		(void)app_timer_stop(button_long_press_timer);
        
        // If this was the release event after a long press, ignore it.
        if(suppress_next_release_event){return;}
        
        if(pin_no == BUTTON_1){nrf_gpio_pin_toggle(LED_1);}
        if(pin_no == BUTTON_2){nrf_gpio_pin_toggle(LED_2);}
        if(pin_no == BUTTON_3){nrf_gpio_pin_toggle(LED_3);}
        
    }
    if(button_action == BUTTON_LONG_PUSH)
    {
        // Suppress next release event
        suppress_next_release_event = true;
    }
    if(button_action == BUTTON_COMBO_1_2)
    {
        // Suppress next release event
        suppress_next_release_event = true;
    }
    if(button_action == BUTTON_COMBO_1_3)
    {
        // Suppress next release event
        suppress_next_release_event = true;
    }
    if(button_action == BUTTON_COMBO_2_3)
    {
        // Suppress next release event
        suppress_next_release_event = true;
    }
}

/**
 * @brief Function for application main entry.
 */
int main(void)
{
    clock_initialization();

    nrf_gpio_cfg_output(LED_1);
    nrf_gpio_cfg_output(LED_2);
    nrf_gpio_cfg_output(LED_3);
    nrf_gpio_cfg_output(LED_4);

    one_sec_timer_init();
    //one_sec_timer_start();
    
    APP_ERROR_CHECK(NRF_LOG_INIT(NULL));
    NRF_LOG_DEFAULT_BACKENDS_INIT();

    NRF_LOG_INFO("BSP example started.");

    buttons_init();
    
    while (true)
    {
        NRF_LOG_FLUSH();
        __SEV();
        __WFE();
        __WFE();
        // no implementation needed
    }
}


/** @} */

Parents
  • I just came accross the same issue:

    First, you should definetly use APP_ERROR_CHECK for app_timer_start and app_timer_stop, which would had probably returned with an out of memory error.

    Second, probably the timer queue is filled to quickly (or more quickly as items are removed from it). 
    I cannot give you a direct solution, but maybe it is possible for you to reduce the number of stop/start actions. Or, less gracefully, you can increment APP_TIMER_OP_QUEUE_SIZE to a more fitting value (but i would not consider this a good solution)

  • Thanks csaller, I'll need to check if it really is the app_timer queue filling up. An out of memory condition shouldn't cause the code to hang though, it should just cause missed events.

    Looking closer at the app_button debounce code I see that it also uses the same principle for button debouncing. I'm now beginning to see how the queue could fill up after only a few button presses.

    I need to do some more testing on this. And in any case I need to find a way to process fast button presses without the code hanging. My current code hangs and the BSP example doesn't, so I'll need to check why is that. The queue sizes should be the same.

  • I found out I can make the application hang by pressing one button at the same time as releasing another one. But if I try doing it the other way around it doesn't hang, so the order is important. Maybe it has something to do with starting and stopping a timer at the same time.

  • Yes, that has been my suspicion as well. But my code should essentially be a simplification of the BSP code, which also does the same kind of starting and stopping. And it doesn't seem to hang in the same way.

  • Hello again, and sorry for late reply. I found a fix for your problem. You need to make sure that a timer is not inserted in timer_list_insert() when it is already in the queue. To do that, you can check it like this

    if (p_current == p_timer)
    {
    	return;
    }

    The timer_list_insert() will then look like this

    static void timer_list_insert(timer_node_t * p_timer)
    {
    	
        if (mp_timer_id_head == NULL)
        {
            mp_timer_id_head = p_timer;
        }
        else
        {
            if (p_timer->ticks_to_expire <= mp_timer_id_head->ticks_to_expire)
            {
                mp_timer_id_head->ticks_to_expire -= p_timer->ticks_to_expire;
    
                p_timer->next   = mp_timer_id_head;
                mp_timer_id_head = p_timer;
            }
            else
            {
                timer_node_t * p_previous;
                timer_node_t * p_current;
                uint32_t       ticks_to_expire;
    
                ticks_to_expire   = p_timer->ticks_to_expire;
                p_previous        = mp_timer_id_head;
                p_current         = mp_timer_id_head;
    
                while ((p_current != NULL) && (ticks_to_expire > p_current->ticks_to_expire))
                {
                    ticks_to_expire   -= p_current->ticks_to_expire;
                    p_previous         = p_current;
                    p_current          = p_current->next;
                }
    
                if (p_current != NULL)
                {
                    p_current->ticks_to_expire -= ticks_to_expire;
                }
    
    			if (p_current == p_timer)
    			{
    				return;
    			}
    
                p_timer->ticks_to_expire = ticks_to_expire;
                p_timer->next            = p_current;
                p_previous->next         = p_timer;
            }
        }
    }

    This is probably not an ideal solution, but it seems to work.

  • Thank you. That change seems to fix the immediate case, but I'm not sure if it fixes the general case.

    I have tried to dig deeper into the timer queue handling, and would like to confirm with you if my understanding is correct:

    The app_timer queue is a linked list of timer_node_t -structs. When new timers are added to the queue new nodes are NOT allocated (dynamically or from a fixed size pool). Instead, every timer created by APP_TIMER_DEF is itself a timer_node_t node. Is my understanding correct so far?

    If this is true, then one must never, ever add the same timer into the queue twice. Doing so would modify the existing queue by modifying the next pointer of the first timer instance that is already in the queue. This would result in either skipping the timers in the queue between the first and second instances, or lead to a circular queue that causes the hang we're seeing in my case. Again is this correct or am I way off here?

    The fix you propose would prevent adding the same timer twice consecutively into the queue. This might be enough to prevent the hang, but if I have understood the operation correctly one could still add the same timer twice if there are other timers in between. And that case would lead to the timers in between to be lost from the queue.

  • mrono said:
    The app_timer queue is a linked list of timer_node_t -structs. When new timers are added to the queue new nodes are NOT allocated (dynamically or from a fixed size pool). Instead, every timer created by APP_TIMER_DEF is itself a timer_node_t node. Is my understanding correct so far?

     Yes, this is my impression as well. The timer_id seems to be the pointer to the timer_node_t struct, and never changes after it is allocated by APP_TIMER_DEF.

     

    mrono said:
    If this is true, then one must never, ever add the same timer into the queue twice. Doing so would modify the existing queue by modifying the next pointer of the first timer instance that is already in the queue. This would result in either skipping the timers in the queue between the first and second instances, or lead to a circular queue that causes the hang we're seeing in my case. Again is this correct or am I way off here?

     This is correct too, as far as I know.

     

    mrono said:
    The fix you propose would prevent adding the same timer twice consecutively into the queue. This might be enough to prevent the hang, but if I have understood the operation correctly one could still add the same timer twice if there are other timers in between. And that case would lead to the timers in between to be lost from the queue.

     Yes, I'm aware of this. As I said the solution isn't ideal. Yesterday when I suggested the fix I was considering a more general approach that would check all the nodes in the queue to determine if the timer is already in the queue. This shouldn't be too hard to implement I believe. Something like

    p_current         = mp_timer_id_head;
    
    while ((p_current != NULL))
    {
        if (p_current == p_timer)
            return;
        p_current          = p_current->next;
    }

    should do the trick. I haven't tested it though so I'm not sure.

Reply
  • mrono said:
    The app_timer queue is a linked list of timer_node_t -structs. When new timers are added to the queue new nodes are NOT allocated (dynamically or from a fixed size pool). Instead, every timer created by APP_TIMER_DEF is itself a timer_node_t node. Is my understanding correct so far?

     Yes, this is my impression as well. The timer_id seems to be the pointer to the timer_node_t struct, and never changes after it is allocated by APP_TIMER_DEF.

     

    mrono said:
    If this is true, then one must never, ever add the same timer into the queue twice. Doing so would modify the existing queue by modifying the next pointer of the first timer instance that is already in the queue. This would result in either skipping the timers in the queue between the first and second instances, or lead to a circular queue that causes the hang we're seeing in my case. Again is this correct or am I way off here?

     This is correct too, as far as I know.

     

    mrono said:
    The fix you propose would prevent adding the same timer twice consecutively into the queue. This might be enough to prevent the hang, but if I have understood the operation correctly one could still add the same timer twice if there are other timers in between. And that case would lead to the timers in between to be lost from the queue.

     Yes, I'm aware of this. As I said the solution isn't ideal. Yesterday when I suggested the fix I was considering a more general approach that would check all the nodes in the queue to determine if the timer is already in the queue. This shouldn't be too hard to implement I believe. Something like

    p_current         = mp_timer_id_head;
    
    while ((p_current != NULL))
    {
        if (p_current == p_timer)
            return;
        p_current          = p_current->next;
    }

    should do the trick. I haven't tested it though so I'm not sure.

Children
  • Thank you for the confirmation. I was thinking that something along those lines would be needed as well.

    What I find strange is that I still can't make the unmodified BSP example hang. It uses the exact same principle, my code is just a simplification of that. I tried the BSP example again now that I think I understand what is going on, and somehow that code avoids this issue.

  •     if (p_node->is_running)
        {
            return NRF_ERROR_INVALID_STATE;
        }

    Instead of adding the check proposed yesterday to timer_list_insert(), I added the above check to app_timer_start(). Now I can also get an error response if the timer isn't started because it is already running.

    I think this should work for repeated app_timers as well, but I haven't tested that yet.

  • So to conclude this thread I'll summarize here what I've learned:

    Due to the way app_timer implements its internal queue, one should never add the same timer instance into the queue twice. In practice this means one should never start an app_timer instance that is already running, as a running timer will already be in the queue.

    The logical and simple way to prevent this is to add a check to app_timer_start(), so that attempting to (re-)start a running timer will caught. The check I'm using is below. I believe a check like this is all that is needed to avoid this issue.

        if (p_node->is_running)
        {
            return NRF_ERROR_INVALID_STATE;
        }

Related