Feb 3 / Wadix Technologies

Watchdog Timers in Embedded Systems: Preventing Silent Software Failures


1. Introduction

In many embedded systems, the most dangerous failures are not complete crashes, but situations where the software continues running while no longer behaving correctly. A stalled task, an unexpected interrupt sequence, or a peripheral that never responds can leave the system in an undefined state.

Unlike power failures, these issues may not trigger a reset automatically. The system may appear alive while silently failing to perform its function. Without a recovery mechanism, such failures can persist indefinitely.

To protect against these scenarios, embedded systems rely on watchdog timers. Watchdogs monitor software execution and force a system reset if the firmware stops responding as expected, converting unpredictable failures into controlled recovery events.


2. What Is a Watchdog Timer?

A watchdog timer is a hardware peripheral that counts down independently of the main program flow. The firmware must periodically refresh, or “kick,” the watchdog within a defined time window.

If the watchdog is not refreshed in time, it assumes the system is no longer operating correctly and triggers a reset. This mechanism ensures that the system cannot remain stuck indefinitely due to software faults.

The key idea behind a watchdog is simple: if the software cannot prove it is alive, the system resets.

Figure 1 – Basic Watchdog Operation
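The countdown-and-kick behavior described above can be illustrated with a small host-side model. This is only a sketch for reasoning about the mechanism, not a driver: the type `wdt_model_t` and the `wdt_model_*` functions are hypothetical names introduced here, and a real watchdog does this counting in hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of a watchdog down-counter (not a real driver). */
typedef struct {
    uint32_t reload;   /* value loaded on every kick */
    uint32_t counter;  /* current count; the watchdog expires at 0 */
    bool     expired;  /* latched once the counter has run out */
} wdt_model_t;

/* Start the model with the given reload value. */
void wdt_model_start(wdt_model_t *w, uint32_t reload)
{
    w->reload  = reload;
    w->counter = reload;
    w->expired = false;
}

/* Firmware "kick": reload the counter. It has no effect after expiry,
 * because real hardware would already have reset the system. */
void wdt_model_kick(wdt_model_t *w)
{
    if (!w->expired) {
        w->counter = w->reload;
    }
}

/* Advance the model by one watchdog clock tick.
 * Returns true once the watchdog has expired (a reset would be triggered). */
bool wdt_model_tick(wdt_model_t *w)
{
    if (!w->expired) {
        if (w->counter > 0u) {
            w->counter--;
        }
        if (w->counter == 0u) {
            w->expired = true;
        }
    }
    return w->expired;
}
```

In this model, a firmware loop that calls `wdt_model_kick()` at least once every `reload - 1` ticks never sees an expiry, while a loop that stops kicking expires after `reload` ticks.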


3. Independent Watchdog

An independent watchdog is a hardware watchdog that runs from its own clock source, separate from the main system clock. This makes it effective even when the CPU clock stops, becomes unstable, or is misconfigured.

Because it does not rely on the system clock, the independent watchdog continues counting down even if the firmware is completely stuck or the clock tree fails. If the software does not refresh the watchdog in time, a reset is guaranteed.

Figure 2 – Independent Watchdog (IWDG)

On STM32 microcontrollers, this mechanism is implemented as the Independent Watchdog (IWDG), which runs from the internal low-speed oscillator (LSI). Once enabled, the IWDG cannot be stopped by software, making it a reliable last line of defense in production systems.

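IWDG_HandleTypeDef hiwdg;

/* Configure and start the IWDG. Once started, it runs until the next reset. */
void IWDG_Init(void)
{
  hiwdg.Instance = IWDG;
  hiwdg.Init.Prescaler = IWDG_PRESCALER_64;
  hiwdg.Init.Reload = 1000; // about 2 s at a nominal 32 kHz LSI; the LSI is imprecise, so leave margin
  // On families whose IWDG has a window register, also set hiwdg.Init.Window (e.g. IWDG_WINDOW_DISABLE).
  if (HAL_IWDG_Init(&hiwdg) != HAL_OK)
  {
    Error_Handler(); // application-defined error hook (assumed here; STM32CubeMX projects generate one)
  }
}

/* Reload the down-counter. Call only after the system's health checks have passed. */
void IWDG_Refresh(void)
{
  HAL_IWDG_Refresh(&hiwdg);
}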

The watchdog should be refreshed only after the system has completed its critical tasks successfully. Refreshing it blindly in a timer interrupt defeats its purpose.
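The timeout implied by the prescaler and reload values in the IWDG configuration above can be estimated with a short helper. This is a rough sketch: `iwdg_timeout_ms` is a hypothetical function name, the prescaler is passed as its numeric divider (64 for `IWDG_PRESCALER_64`), and the LSI frequency is an assumed nominal value that varies between devices and with temperature, so the result should be treated as approximate.

```c
#include <stdint.h>

/* Estimate the IWDG timeout in milliseconds (hypothetical helper).
 * lsi_hz:    LSI frequency in Hz (assumed nominal value; check the datasheet)
 * prescaler: numeric divider, e.g. 64 for IWDG_PRESCALER_64
 * reload:    12-bit reload value (0..4095)
 * Returns 0 if lsi_hz is 0. */
uint32_t iwdg_timeout_ms(uint32_t lsi_hz, uint32_t prescaler, uint32_t reload)
{
    if (lsi_hz == 0u) {
        return 0u;
    }
    /* The counter runs from reload down to 0, i.e. (reload + 1) ticks,
     * and each tick lasts prescaler / lsi_hz seconds. */
    uint64_t lsi_cycles = (uint64_t)(reload + 1u) * prescaler;
    return (uint32_t)((lsi_cycles * 1000u) / lsi_hz);
}
```

With an assumed 32 kHz LSI, `iwdg_timeout_ms(32000, 64, 1000)` gives 2002 ms. With a 40 kHz LSI the same settings give about 1.6 s, which is why the refresh interval should sit well inside the worst-case timeout.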


4. Window Watchdog

A window watchdog adds an additional constraint to the watchdog mechanism by defining a valid refresh window. The watchdog must be refreshed neither too early nor too late.

Refreshing the watchdog too early may indicate that the software is stuck in a fast loop, while refreshing it too late suggests that the system is no longer responding correctly. In both cases, the window watchdog triggers a reset.

Figure 3 – Window Watchdog (WWDG)

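On STM32 devices, this functionality is provided by the Window Watchdog (WWDG). Unlike the independent watchdog, it is clocked from the APB peripheral clock, which is derived from the system clock. Its refresh window lets it detect timing-related software failures rather than only total lockups, but it offers no protection if that clock itself stops.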

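WWDG_HandleTypeDef hwwdg;

/* Configure and start the WWDG. Its peripheral clock must be enabled first, typically in HAL_WWDG_MspInit(). */
void WWDG_Init(void)
{
  hwwdg.Instance = WWDG;
  hwwdg.Init.Prescaler = WWDG_PRESCALER_8;
  hwwdg.Init.Window = 0x50; // refresh allowed only once the counter has counted down to this value or below
  hwwdg.Init.Counter = 0x7F; // starting counter; a reset occurs when it rolls over from 0x40 to 0x3F
  hwwdg.Init.EWIMode = WWDG_EWI_DISABLE;
  if (HAL_WWDG_Init(&hwwdg) != HAL_OK)
  {
    Error_Handler(); // application-defined error hook (assumed here; STM32CubeMX projects generate one)
  }
}

void WWDG_Refresh(void)
{
  // Reloads the counter with hwwdg.Init.Counter. Older HAL releases took the reload value as a second argument.
  HAL_WWDG_Refresh(&hwwdg);
}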

Because the refresh must occur inside a specific timing window, the window watchdog requires more careful configuration and tighter control over execution timing, but it provides additional protection against subtle software failures.
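To see what the window means in time units, the WWDG settings above can be converted into an approximate schedule. The sketch below assumes the usual STM32 relationship of one counter tick per 4096 × prescaler APB clock cycles, and a reset when the counter leaves 0x40. The names `wwdg_tick_us` and `wwdg_classify` are hypothetical, and the 32 MHz APB clock used in the example is an assumption, not a value from the configuration above. The exact moment of the first decrement depends on the phase of the internal prescaler, so results are accurate to about one tick.

```c
#include <stdint.h>

typedef enum { WWDG_TOO_EARLY, WWDG_IN_WINDOW, WWDG_TOO_LATE } wwdg_verdict_t;

/* Duration of one WWDG counter tick in microseconds: 4096 * prescaler / PCLK.
 * Returns 0 if pclk_hz is 0. */
uint32_t wwdg_tick_us(uint32_t pclk_hz, uint32_t prescaler)
{
    if (pclk_hz == 0u) {
        return 0u;
    }
    return (uint32_t)((4096ull * prescaler * 1000000ull) / pclk_hz);
}

/* Classify a refresh attempted elapsed_us after the previous reload.
 * counter: value loaded at reload (0x40..0x7F); window: window register value. */
wwdg_verdict_t wwdg_classify(uint32_t elapsed_us, uint32_t tick_us,
                             uint8_t counter, uint8_t window)
{
    if (tick_us == 0u || counter <= 0x3Fu) {
        return WWDG_TOO_LATE;              /* invalid setup: treat as a reset */
    }
    uint32_t ticks = elapsed_us / tick_us; /* decrements since the last reload */
    if (ticks >= (uint32_t)(counter - 0x3Fu)) {
        return WWDG_TOO_LATE;              /* counter has already reached 0x3F */
    }
    uint32_t now = counter - ticks;        /* modeled current counter value */
    if (now > window) {
        return WWDG_TOO_EARLY;             /* still above the window value */
    }
    return WWDG_IN_WINDOW;
}
```

With an assumed 32 MHz APB clock and a prescaler of 8, one tick is 1024 µs. For a counter of 0x7F and a window of 0x50, the refresh is then accepted from roughly 48 ms to just under 65.5 ms after the previous reload, and anything outside that range causes a reset.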


5. Where to Refresh the Watchdog in Software

In RTOS-based systems, a robust and widely used approach is to manage the watchdog through a supervisor task that represents the overall health of the system. Rather than refreshing the watchdog from individual tasks or low-level mechanisms, each critical task periodically reports its status to the supervisor, for example through heartbeat flags, counters, or task notifications. The supervisor task executes at a known interval and checks whether all monitored tasks have reported within their expected time windows. Only when the entire system is verified to be operating correctly does the supervisor refresh the watchdog timer.

Refreshing the watchdog unconditionally from an interrupt handler, such as a periodic timer interrupt or the SysTick handler, should be avoided. Interrupts may continue to execute even when the main application logic is stalled, blocked, or malfunctioning. In such cases, the watchdog would still be refreshed, giving a false impression that the system is healthy while critical tasks are no longer running correctly. By centralizing watchdog control in a supervisor task that depends on task-level execution, the watchdog becomes a true indicator of system health and can reliably trigger recovery when silent software failures occur.
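One way to sketch the heartbeat-flag variant of the supervisor pattern is shown below. It is a minimal illustration, not tied to any particular RTOS: the task identifiers and the functions `heartbeat_report` and `supervisor_all_tasks_alive` are hypothetical names, and C11 atomics stand in for whatever synchronization primitive the target RTOS provides.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical task identifiers; one bit per monitored task. */
enum { TASK_SENSOR = 0, TASK_COMMS = 1, TASK_CONTROL = 2, TASK_COUNT = 3 };

#define ALL_TASKS_MASK ((1u << TASK_COUNT) - 1u)

/* One heartbeat bit per task, set by the tasks and cleared by the supervisor. */
static atomic_uint heartbeat_flags;

/* Called by each monitored task after it completes one healthy cycle. */
void heartbeat_report(unsigned task_id)
{
    atomic_fetch_or(&heartbeat_flags, 1u << task_id);
}

/* Called by the supervisor task at its fixed interval.
 * Reads and clears all flags in one atomic step, so a heartbeat that
 * arrives during the check counts toward the next interval.
 * Returns true only if every monitored task reported since the last call. */
bool supervisor_all_tasks_alive(void)
{
    unsigned seen = atomic_exchange(&heartbeat_flags, 0u);
    return (seen & ALL_TASKS_MASK) == ALL_TASKS_MASK;
}
```

The supervisor loop would then refresh the watchdog only on success, for example `if (supervisor_all_tasks_alive()) { IWDG_Refresh(); }`, and otherwise do nothing so that the watchdog expires. This simple version requires every task to report at least once per supervisor period; tasks with longer cycles would need per-task counters or deadlines instead of a single flag.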


6. Conclusion

Watchdog timers are a critical safety mechanism in embedded systems, ensuring that software failures cannot leave the system stuck in an undefined state indefinitely. By forcing recovery through controlled resets, watchdogs improve system robustness and reliability in real-world deployments.
