// Copyright (C) 2024 The Qt Company Ltd. // SPDX-License-Identifier: LicenseRef-Qt-Commercial OR GFDL-1.3-no-invariants-only /*! \page watchdog.html \ingroup qtappman \title Watchdog \brief Describes configuration and mode of operation of the built-in watchdog mechanism. \section1 Introduction The application manager features a built-in watchdog mechanism that monitors the main thread's event loop, every Quick window's render thread and all clients of the application manager's Wayland compositor for unresponsive behavior. The event loop and render thread monitoring is implemented for both the System UI as well as for any QML application running in multi-process mode. The watchdog is implemented as a separate thread that periodically (see \c checkInterval) checks the state of the monitored subsystems. If any of these fail to respond within a given time frame, the watchdog will first issue a warning (see \c warnTimeout) and eventually kill (see \c killTimeout) the affected thread or client. Please keep in mind, that due to the periodic nature of this check, the actual warning and killing timeout messages might be delayed by up to the \c checkInterval. Killing the affected thread directly (instead of just aborting the whole process) will cause the application manager's crash handler to print a backtrace for the stuck thread, which can be very useful to diagnose freezes. \note The watchdog is disabled by default. You need to enable it by setting at least one of the \c checkInterval configuration values in the \l{Configuration}{main configuration} file to a timeout that suits your specific device setup. \section1 Systemd Support Support for systemd's watchdog is built into the application manager as well: see \{Installation}. If enabled, the application manager will automatically detect at startup if it was launched by systemd and if the systemd unit file has the \c{WatchdogSec} option set. If this is the case, the application manager will periodically send the requested notifications to systemd from its watchdog thread. \section1 Logging The watchdog logs all its messages to the \c{am.wd} logging category. All logging is done from the separate watchdog thread and the main thread to minimize interference with the monitored threads or render loops. The following logging levels are used: \table \header \li Log Level \li Description \row \li \c info \li The watchdog started (or stopped) watching an object (thread, window, Wayland client). \row \li \c warning \li A \c warnTimeout has been exceeded. \row \li \c critical \li A \c killTimeout has been exceeded. \endtable \section1 Performance Considerations Nothing in life comes for free and the watchdog is no exception. While the overhead of the watchdog is generally very low, it does impact three areas: \list \li For every frame rendered, the watchdog adds three invocations of a \e direct signal/slot connection: each call retrieves the current system time and stores it via an atomic fetch-and-store operation. \li For every Qt event delivered in a watched thread, the watchdog adds two callbacks: each call checks the state via an atomic load, then retrieves the current system time, but only one stores it via an atomic fetch-and-store operation. \li The separate watchdog thread runs a periodic check (see \c checkInterval). It retrieves the current system time and then collects time data via atomic load operations once for each of the watched objects. \endlist \section1 Configuration The watchdog is configured via the \c{watchdog} key in the \l{Configuration}{main configuration} file. Applications inherit these settings, but can also override any value by setting the corresponding key in their \l{Manifest Definition}{info.yaml manifest} file. The following interval and timeout values listed below let you specify the exact \l{Time Duration Values}{times} with milli-seconds precision. Setting any of the values to \c 0ms (or \c off) disables the respective functionality. There's also the \c{--disable-watchdog} command line option that makes your life easier when debugging or testing in a production environment, as it completely disables all watchdog functionality in the System UI as well as in QML applications. \table \header \li Config Key \li Type \li Description \row \li \c eventloop/checkInterval \li duration \li If set to a positive time duration, the main event loop will be monitored by triggering a timer every \c checkInterval. (default: off) \row \li \c eventloop/warnTimeout \li duration \li In case the check timer is not firing within \c warnTimeout, the watchdog will print a warning. In addition another warning will be printed if the timer does eventually fire, stating the exact duration the event loop was blocked. (default: off) \row \li \c eventloop/killTimeout \li duration \li In case the check timer is not firing within \c killTimeout, the watchdog will print a critical warning and then abort the thread running the main event loop. (default: off) \row \li \c quickwindow/checkInterval \li duration \li The render thread monitor works a bit differently to the event loop and Wayland one: Instead of just a single "blocked" state, three different states are monitored: \list \li \c Sync: The time it takes for the render thread to synchronize with the main thread. \li \c Render: The time it takes for the render thread to actually render a frame. \li \c Swap: The time the render thread spends in the graphics driver, swapping buffers. \endlist As a render thread is not always actively rendering, the watchdog will only print a warning every \c checkInterval, if the thread is active and stuck in one of the aforementioned states. This periodic report also contains some statistics on how often the render thread got stuck in each state. (default: off) \row \li \c quickwindow/warnTimeout \li duration \li The watchdog will print a warning if a render thread is stuck in any of the syncing, rendering or swapping states for longer than \c warnTimeout. In addition another warning will be printed if the thread eventually leaves the state it was stuck in, stating the exact duration it was blocked. (default: off) \row \li \c quickwindow/killTimeout \li duration \li In case a render thread is stuck in any of the syncing, rendering or swapping states for longer than \c killTimeout, the watchdog will print a critical warning and then abort the thread. (default: off) \row \li \c wayland/checkInterval \li duration \li If set to a positive time duration, all currently active Wayland clients that use the XDG shell protocol will be pinged every \c checkInterval. (default: off) \row \li \c wayland/warnTimeout \li duration \li In case the pong reply from the Wayland client is not received within \c warnTimeout, the watchdog will print a warning. In addition another warning will be printed if the pong reply is eventually received, stating the exact duration the ping/pong round-trip took. (default: off) \row \li \c wayland/killTimeout \li duration \li In case the pong reply from the Wayland client is not received within \c killTimeout, the watchdog will print a critical warning and then kill the unresponsive Wayland client. For application manager apps, ApplicationObject::stop() with \c forceKill set to \c true will be invoked. Other apps will be killed by raising \c SIGKILL on the process id associated with the Wayland client. (default: off) \endtable */