Task watchdog callback - Printk, Logging and saving to the internal flash not working

Greetings,

I have recently incorporated the Task Watchdog library in our custom application for our custom board based on nRF52840 MCU with success and everything works as expected.

The only issue I have not been able to solve is getting the information on which thread was responsible for a watchdog timeout.

Inside the task watchdog callback I tried printing the information using printk and the logging system (LOG_WRN/INF/DBG), and also saving it to the internal flash so that I can at least retrieve it and print it with the logger on initialization after the reset.

Below I have attached the callback used for all the task watchdog channels.

void task_wdt_callback(int channel_id, void *user_data)
{
	// LOG_INF("Task watchdog channel %d callback, thread: %s",
	// 	channel_id, k_thread_name_get((k_tid_t)user_data));
	printk("Task watchdog channel %d callback, thread: %s\n",
		channel_id, k_thread_name_get((k_tid_t)user_data));

	// ch_id = channel_id;
	// LOG_INF("ch_id: %d", ch_id);

	wdt_state_t state = {
		.wdt_reboot = true,
		.channel_id = channel_id,
		.user_data = user_data,
	};
	fs_store_wdt_info(state);

	/*
	 * If the issue could be resolved, call task_wdt_feed(channel_id) here
	 * to continue operation.
	 *
	 * Otherwise we can perform some cleanup and reset the device.
	 */

	printk("Resetting device...\n");
	// LOG_INF("Resetting device...");
	// k_msleep(2000);
	sys_reboot(SYS_REBOOT_COLD);
}
 

Whatever I put inside this callback seems never to be executed, not even storing the values with LittleFS through fs_store_wdt_info(). That function is already implemented and tested, and it works well in many other modules of our custom application, yet when I retrieve the values from flash on initialization I get irrelevant, invalid values.

What could be the problem here?

I have confirmed using the debugger that the callback function is indeed called when simulating a hang in a thread (by inserting an infinite loop in the thread), but it seems that the code inside the callback is not running at all.

I look forward to hearing from you with any feedback on what could be causing this.

Thank you!

Best regards,

Stavros

  • Hello,

    I assume you have looked at, for instance, the sample for the task watchdog timer in Zephyr:
    \zephyr\samples\subsys\task_wdt\src\main.c

    Looking at your code I get the impression that task_wdt_callback() runs, but maybe fs_store_wdt_info() is not blocking, so do you ensure that you give the RTOS sufficient time to actually store the data to flash? I can see that you have commented out k_msleep(2000), though I think you should have it there to ensure the system has time to store the data to flash before you do a soft reset.

    Kenneth

  • Hello Kenneth,

    Thank you very much for your immediate response!

    I uncommented the "k_msleep(2000)" and tested the callback again (I also tested with a 5000 ms delay) so that the RTOS has enough time to complete storing the data in flash.

    Unfortunately, when the system resets I do not get back the values that I tried to store in the callback.

    Also FYI I have tested the flash storage and retrieval functions (i.e. fs_store_wdt_info()) separately (not inside the watchdog callback) and they work perfectly!

    It just seems that whatever I put inside the watchdog callback is not running as expected.

    What could be the reason for this?

    Also, the LOG_XXX() and printk() functions do not work at all inside the callback. Is this expected? I have used them with complete success at every other point of the application except this one.

    Thank you very much and I look forward to hearing from you!

    Best regards,

    Stavros

  • Hello,

    I have not used the task watchdog, but typically watchdogs run at very high priority and will not be interrupted by anything. If you are using the hardware watchdog of the nRF52-series, then it will always reset within a few RTC periods, so there is no time to store anything to flash. However, you do not mention your sleep routine failing, so I assume you are not using the hardware watchdog here?

    From what I can find, to avoid using the hardware watchdog you must pass NULL to task_wdt_init(). With NULL, Zephyr uses a timer to emulate the hardware watchdog, which may be more flexible.

    Kenneth

  • Hello Kenneth,

    Thank you very much for your immediate response!

    I am currently using the Task Watchdog Module with multiple channels, one for each thread.

    The task watchdog itself operates correctly and resets the system when an infinite loop is inserted in one of the threads.

    I have tried disabling the hardware watchdog as you suggested (passing NULL to task_wdt_init() and setting CONFIG_TASK_WDT_HW_FALLBACK=n) and added a k_msleep(5000) call after storing the thread info in the flash memory (fs_store_wdt_info()).

     

    void task_wdt_callback(int channel_id, void *user_data)
    {
    	// LOG_INF("Task watchdog channel %d callback, thread: %s",
    	// 	channel_id, k_thread_name_get((k_tid_t)user_data));
    	printk("Task watchdog channel %d callback, thread: %s\n",
    		channel_id, k_thread_name_get((k_tid_t)user_data));

    	wdt_state_t state = {
    		.wdt_reboot = true,
    		.channel_id = channel_id,
    		.user_data = user_data,
    	};
    	fs_store_wdt_info(state);

    	k_msleep(5000);

    	/*
    	 * If the issue could be resolved, call task_wdt_feed(channel_id) here
    	 * to continue operation.
    	 *
    	 * Otherwise we can perform some cleanup and reset the device.
    	 */

    	printk("Resetting device...\n");
    	sys_reboot(SYS_REBOOT_COLD);
    }

    Unfortunately, I had no success, and the values stored (I am reading the stored values on program start after the watchdog forced a reset) were not updated as expected.

    I look forward to any feedback on how I could find out which thread (task watchdog channel) caused the watchdog timer to expire, as this is vital information for debugging problems, should one arise. Even though the watchdog is working as it should, it would be rendered partially useless if I cannot tell which of the running threads triggered it.

    Thank you very much!

    Best regards,

    Stavros

  • Hello,

    Can you direct this question to Zephyr support (they have a Discord channel)? I have a feeling that a watchdog timeout is something that should not occur, and if it occurs it is a kind of point of no return, where you can't really allow anything to run (since you don't know what actually caused the watchdog to trigger to start with). So I have a feeling this is a conscious decision and implementation, and that the only thing allowed is really to execute a reset.

    I can find that Memfault has its own workaround for this:
    \modules\lib\memfault-firmware-sdk\examples\nrf-connect-sdk\nrf9160\memfault_demo_app\src\watchdog.c

    If you get some feedback on Discord, it would be nice if you could share it.

    Best regards,
    Kenneth
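
A minimal sketch of one generic way to carry the channel number across the reset without touching the file system inside the callback. It rests on an assumption worth verifying against the task_wdt documentation: as far as I know, Zephyr invokes task watchdog callbacks from a kernel timer expiry, that is, in interrupt context, where k_msleep() and mutex-protected file system calls are not expected to work. The names wdt_record, wdt_record_save() and wdt_record_take() are hypothetical, and this is not taken from the Memfault file mentioned above. The record is a plain static here so the code compiles anywhere; on the target it would have to be placed in RAM that is not cleared at startup and that survives the reset type in use.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; none of this is a Zephyr or Memfault API. */
#define WDT_RECORD_MAGIC 0x57445431u /* arbitrary marker, ASCII "WDT1" */

struct wdt_record {
	uint32_t magic;
	int32_t channel_id;
	uint32_t check; /* magic XOR channel_id, guards against random RAM contents */
};

/*
 * Assumption: on a real target this variable lives in RAM that is not
 * zeroed at startup (for example a no-init linker section). Here it is an
 * ordinary zero-initialized static.
 */
static volatile struct wdt_record retained;

/* Intended for the watchdog callback: plain stores, no locks, no sleeping. */
void wdt_record_save(int channel_id)
{
	retained.channel_id = (int32_t)channel_id;
	retained.check = WDT_RECORD_MAGIC ^ (uint32_t)channel_id;
	retained.magic = WDT_RECORD_MAGIC; /* written last: marks the record complete */
}

/*
 * Intended to be called once from thread context after boot. Returns true
 * and fills *channel_id if a valid record is present. The record is cleared
 * either way, so a given timeout is reported only once.
 */
bool wdt_record_take(int *channel_id)
{
	int32_t stored = retained.channel_id;
	bool valid = retained.magic == WDT_RECORD_MAGIC &&
		     retained.check == (WDT_RECORD_MAGIC ^ (uint32_t)stored);

	if (valid) {
		*channel_id = (int)stored;
	}
	retained.magic = 0;
	return valid;
}
```

With this pattern the callback would only call wdt_record_save(channel_id) followed by the reboot, and the start-up code would call wdt_record_take() from a normal thread, where logging and fs_store_wdt_info() are known to work.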
