What (and why) is the safest way to use an internal hardware watchdog in a scheduled bare-metal embedded app?

Question

I've already used internal hardware watchdogs in several OS-less embedded applications (with static schedulers).

What I do is:

I look for the slowest periodic and lowest priority task
I set the watchdog timeout to something more than the period of the slowest task
then I kick the dog at the beginning of the slowest task

I think this is a minimalist but safe approach.
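
A minimal sketch of the scheme above, assuming a tick-driven cooperative scheduler. The task names, the periods, `WDG_TIMEOUT_MS` and `wdg_kick()` are hypothetical; on real hardware `wdg_kick()` would write the watchdog reload register instead of incrementing a counter.

```c
#include <assert.h>
#include <stdint.h>

#define SLOWEST_PERIOD_MS 100u  /* assumed period of the slowest task */
#define WDG_TIMEOUT_MS    150u  /* assumed: somewhat more than that period */

static_assert(WDG_TIMEOUT_MS > SLOWEST_PERIOD_MS,
              "watchdog timeout must exceed the slowest task period");

/* Hypothetical stand-in for the MCU's watchdog reload write. */
static uint32_t wdg_kick_count = 0;
static void wdg_kick(void) { wdg_kick_count++; }

typedef struct {
    void   (*run)(void);   /* task body */
    uint32_t period_ms;    /* how often the task is due */
    uint32_t next_ms;      /* next time the task is due */
} task_t;

static void fast_task(void)   { /* e.g. sample inputs */ }
static void medium_task(void) { /* e.g. control loop */ }
static void slow_task(void)
{
    wdg_kick();            /* kick at the beginning of the slowest task */
    /* e.g. housekeeping */
}

/* Ordered by priority; the slowest, lowest-priority task comes last. */
static task_t tasks[] = {
    { fast_task,   1u,                0u },
    { medium_task, 10u,               0u },
    { slow_task,   SLOWEST_PERIOD_MS, 0u },
};

/* Run every task that is due at now_ms; call once per millisecond tick. */
void scheduler_tick(uint32_t now_ms)
{
    for (unsigned i = 0; i < sizeof tasks / sizeof tasks[0]; i++) {
        /* Signed difference keeps the comparison valid across wrap-around. */
        if ((int32_t)(now_ms - tasks[i].next_ms) >= 0) {
            tasks[i].next_ms += tasks[i].period_ms;
            tasks[i].run();
        }
    }
}
```

With these numbers the dog is kicked once every 100 ms, so a 150 ms timeout only fires if the slowest task stops being reached.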

Is there any best practice? (personal experience or verified sources)

I've heard/seen people doing different things, like kicking the dog more than once in different tasks, or kicking only if all tasks have been called within a timeout, and so on.

user6556709 · Answer 1 · 2019-04-15T13:21:16.963

4

Your approach has the problem that you can't guarantee, by running the slowest task, that all other tasks have run.

And as an extension, in a multitasking environment you usually end up with some high-priority tasks that are needed to ensure the functionality, and other tasks (I/O, hardware monitoring, etc.) that you don't really care about.

So your watchdog is only needed for the important tasks, but you have to observe them all. To ensure that, a very simple solution is a running-state structure like this:

struct{
  bool task1HasRun;
  bool task2HasRun;
  bool task3HasRun;
};

with a mutex around it. Each task sets its own hasRun flag and checks whether all the others are also set. If they are, it resets all the flags and triggers the watchdog. If you don't let every task check for itself, you may miss blocked tasks.

There are more elegant solutions to that problem, but this one is portable and gives you an idea of what to do.
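
A rough sketch of that check-in logic, using a flag array instead of the named struct members. `task_checkin()`, `flags_lock()`, `flags_unlock()` and `wdg_kick()` are hypothetical names: the lock hooks stand in for whatever mutex or interrupt masking the platform offers, and `wdg_kick()` only counts kicks here instead of touching hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { TASK_COUNT = 3 };

static bool     task_has_run[TASK_COUNT];
static uint32_t wdg_kick_count = 0;

/* Hypothetical stand-in for the real watchdog reload. */
static void wdg_kick(void) { wdg_kick_count++; }

/* Hypothetical lock hooks: a mutex under an RTOS, or disabling and
   restoring interrupts on bare metal. Left empty in this sketch. */
static void flags_lock(void)   { }
static void flags_unlock(void) { }

/* Called by each monitored task once per cycle. */
void task_checkin(unsigned task_id)
{
    assert(task_id < TASK_COUNT);

    flags_lock();
    task_has_run[task_id] = true;

    /* Every task performs the check itself, so a blocked task
       can never be the only one responsible for the kick. */
    bool all_ran = true;
    for (unsigned i = 0; i < TASK_COUNT; i++) {
        if (!task_has_run[i]) {
            all_ran = false;
            break;
        }
    }

    if (all_ran) {
        for (unsigned i = 0; i < TASK_COUNT; i++) {
            task_has_run[i] = false;   /* start a new round */
        }
        wdg_kick();
    }
    flags_unlock();
}
```

The watchdog is only kicked by whichever task completes the set, so a single task that stops checking in is enough to let the timeout expire.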

edited Apr 15 '19 at 13:21

answered Apr 15 '19 at 10:05

user6556709


"can't guarantee"? – Guillaume D Apr 15 '19 at 10:24
@Guillaume D sorry, fixed it. – user6556709 Apr 15 '19 at 11:10
"by running the slowest task that all other task have run." In fact, in a cooperative scheduler all tasks are called sequentially, so if the slowest task runs, every task runs. It could be done, as @Lundin said, in the main loop. – Guillaume D Apr 23 '19 at 07:35
@Guillaume D I assumed you use some kind of RTOS and most of them are preemptive. Or to be more correct to ensure real-time capability they have to. – user6556709 Apr 23 '19 at 08:04
I said "OS-less" and "bare-metal" – Guillaume D Apr 23 '19 at 08:43
@Guillaume D You said scheduler and tasks. On embedded systems an RTOS doesn't do much more than that, and btw, if you have a purely cooperative system you don't need tasks. They buy you nothing but more complexity. – user6556709 Apr 23 '19 at 16:32

score 1 · Accepted Answer · answered Apr 15 '19 at 11:00

1

Your question is a bit subjective, but there is something of an industry de facto standard for real-time applications, which goes like this:

Specify a maximum allowed response time of the system. As in, the longest time period that some task is allowed to take. Take ISRs etc. into account. For example, 1 ms.
Set the dog to a time slightly longer than the specified response time.
Kick the dog from one single place in the whole program, preferably from the main() loop or similar suitable location (RTOS has no standard for where to do this, AFAIK).

This is the toughest requirement - ideally the dog doesn't know anything about your various tasks but is kept away from application logic.

In practice this might be hard to do for some systems - suppose you have flash bootloaders and similar, which by their nature simply must take a long time. Then you might have to do dirty stuff like placing watchdog kicks inside a specific driver. But it is a best practice to strive for.

So ideally you have this at the very top level of your application:

void main (void)
{
  /* init stuff */

  for(;;)
  {
    kick_dog();
    result = execute();
    error_handler(result);
  }
}

As a side-effect, this policy eliminates the risk of having "talented" people end up kicking the dog from inside an ISR.
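
To see how the single kick site interacts with the timeout, here is a host-side sketch with a simulated watchdog. Everything in it is an assumption made for illustration: the 2 ms timeout, the `sim_elapse_ms()` helper, the `cost_ms` parameter on `execute()` and the `main_loop_pass()` wrapper are not part of any real driver, and real hardware counts down on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_RESPONSE_MS 1u  /* the example response time from the list above */
#define WDG_TIMEOUT_MS  2u  /* assumed: slightly longer than the response time */

static_assert(WDG_TIMEOUT_MS > MAX_RESPONSE_MS,
              "watchdog timeout must exceed the specified response time");

/* Simulated watchdog: counts down and latches a reset on expiry. */
static uint32_t wdg_remaining_ms = WDG_TIMEOUT_MS;
static bool     wdg_reset_fired  = false;

static void kick_dog(void) { wdg_remaining_ms = WDG_TIMEOUT_MS; }

/* Advance simulated time by ms milliseconds. */
static void sim_elapse_ms(uint32_t ms)
{
    if (ms >= wdg_remaining_ms) {
        wdg_remaining_ms = 0;
        wdg_reset_fired  = true;   /* the MCU would reset here */
    } else {
        wdg_remaining_ms -= ms;
    }
}

/* Hypothetical application pass that takes cost_ms to complete. */
static int execute(uint32_t cost_ms)
{
    sim_elapse_ms(cost_ms);
    return 0;
}

static void error_handler(int result) { (void)result; }

/* One iteration of the main loop with its single kick site.
   Returns false once the simulated watchdog has expired. */
bool main_loop_pass(uint32_t cost_ms)
{
    kick_dog();
    int result = execute(cost_ms);
    error_handler(result);
    return !wdg_reset_fired;
}
```

A pass that stays within the 1 ms budget leaves the watchdog quiet, while a pass that takes 5 ms expires it, no matter which task or driver caused the overrun.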

answered Apr 15 '19 at 11:00

Lundin


By doing this you can't tell if one task missed its deadline. Calling the kick in the main loop, but at the end, would be a better solution, wouldn't it? – Guillaume D Apr 23 '19 at 07:50
@GuillaumeD Depends on if you allow the system to miss deadlines or not. For a real-time system, a missed deadline is a bug. But during debug it might be wise to disable the wdog and use a timer instead. – Lundin Apr 23 '19 at 08:04
But disabling the wdg during debug is one thing. Kicking the wdg before or after the main loop execution is another. What do you think about calling the kick after the main loop execution instead of before it? – Guillaume D Apr 23 '19 at 09:19
@GuillaumeD "Kick the wdg before or after the main loop execution" It is a loop... it goes around. But if you have real-time requirements that are significantly harder than the wdog timeout, it might make sense to use a timer anyway, to produce diagnostics and to handle errors gracefully - depends on the nature of the application. Some applications must reset upon errors, others cannot be allowed to do that. For safety-critical systems, wdog timeout is the last resort - they should ideally revert to a safe state instead, before that happens. – Lundin Apr 23 '19 at 10:34
