Check system page size using sysconf()
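For readers unfamiliar with the call, a minimal sketch of querying the page size at run time follows. The function name `get_page_size` and the 4096-byte fallback are assumptions made for illustration, not taken from the watchdog source.

```c
#include <unistd.h>

/* Return the system page size in bytes.
 * The 4096-byte fallback is an illustrative assumption, used only
 * if sysconf() reports an error. */
long get_page_size(void)
{
    long sz = sysconf(_SC_PAGESIZE);

    return (sz > 0) ? sz : 4096L;
}
```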
Minor updates now applied that deal with the code review above.
Use meaningful name for dry fork test
Check process name has minimum for any absolute path
Remove unnecessary NULL test
Thanks for the feedback. That is just the zero-length process name used as a dummy; I can change it to reserve some other name that makes more sense. True, it was overlooked as part of playing with your patch. Strictly speaking there is still a zombie process after the forked child exits, but now there is a reaping call 50ms after the main loop has run (line 539 of watchdog.c), so most short-lived tests are only present as zombies for around 1/20 of the time. Usually I suggest folks have a simple bash script...
Hi! Right, I have tested your improved patch. It's better since it doesn't leave a zombie behind until the next watchdog loop, so that's of course nice. Thanks! I have some minor nitpicks though, if you care: The verbose output could be improved from the current "watchdog: test binary returned 0 = 'no error'". The new if-statement in check_bin() pointlessly rechecks whether tbinary == NULL, which is impossible at that point, and the code already relies on that fact in check_processes(). It could be time...
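The reaping call mentioned in this comment can be sketched roughly as a non-blocking `waitpid()` loop. The function name `reap_children` is hypothetical and this is not the code at the cited line of watchdog.c, only an illustration of the general technique.

```c
#include <sys/types.h>
#include <sys/wait.h>

/* Collect every child that has already exited, without blocking.
 * Returns the number of children reaped (0 if there were none). */
int reap_children(void)
{
    int reaped = 0;
    int status;

    /* WNOHANG makes waitpid() return 0 at once if no child has exited,
     * and -1 if there are no children at all. */
    while (waitpid(-1, &status, WNOHANG) > 0)
        reaped++;

    return reaped;
}
```

Calling something like this shortly after each pass of the main loop keeps the window in which an exited child lingers as a zombie short.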
Start-up hold-off time
Seems the work-around is acceptable so closing this.
Feature request: handle PID file creation race condition
New behaviour should have fixed this, and there has been no feedback to say otherwise, so closing.
Thanks for that link. Yes, that is an unusual bug: most bugs stop things executing, rather than having them continue when all should die! Hopefully Peter Rosin will respond with related testing. Regards, Paul
In case you're curious why the daemon survived a kernel panic, that was due to a bug in ARM kernels that's discussed here: https://lore.kernel.org/all/BX1W47JXPMR8.58IYW53H6M5N@dragonstone/
Restore periodic fork test mentioned in the documentation
I have implemented the requested functionality as commit [69b46c] in a slightly different manner. Basically it now has a dummy test added if none are configured, and that simply forks and returns. Can you test this to verify that it solves the specific bug you reported? I am very surprised that panicking the kernel via sysrq-trigger did not stop daemon execution and so lead to the watchdog hardware timing-out and rebooting the system! Regards, Paul
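A dummy test that "simply forks and returns" might look roughly like the sketch below. The name `dummy_fork_test` and its return convention are assumptions for illustration and are not taken from the referenced commit.

```c
#include <sys/types.h>
#include <unistd.h>

/* Check that the system can still create a process.
 * Returns 0 if fork() succeeded, -1 if it failed. */
int dummy_fork_test(void)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;      /* fork failed, e.g. process table or memory exhausted */
    if (pid == 0)
        _exit(0);       /* child: nothing to do, exit immediately */

    return 0;           /* parent: the child is left to be reaped later */
}
```

The child still has to be collected with `wait()` or `waitpid()` at some point, otherwise it remains as a zombie.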
Restore periodic fork test
See also: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1033246 Cheers, Peter
Restore periodic fork test mentioned in the documentation
Thinking about this. While it is true that some systems might be recoverable from out of memory, using the watchdog repair to do so suggests known processes with bad behaviour that might better be fixed or constrained by ulimit or systemd's cgroups capability.
Network interface counter overflows on busy systems
Patch applied.
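One common way to tolerate a wrapping counter is to take the difference in unsigned arithmetic. The sketch below assumes a 32-bit counter and at most one wrap between samples; it is not the applied patch, and `counter_delta32` is a hypothetical name.

```c
#include <stdint.h>

/* Difference between two samples of a 32-bit interface counter.
 * Unsigned subtraction is performed modulo 2^32, so a single wrap
 * between the two samples still yields the correct delta. */
uint32_t counter_delta32(uint32_t prev, uint32_t now)
{
    return now - prev;
}
```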
Use MemAvailable for memory calculations, if available
The supplied patch did not seem to actually read 'FREEAVAILMEM' anywhere! That has been fixed and the rest of the logic applied.
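As an illustration of reading such a field, here is a minimal sketch that looks up one key in /proc/meminfo-style text. The function name `meminfo_value_kb` and the convention of returning -1 for a missing key are assumptions, not the daemon's actual parser.

```c
#include <stdio.h>
#include <string.h>

/* Find "key:" at the start of a line in /proc/meminfo-style text and
 * return its value in kB, or -1 if the key is not present (older
 * kernels do not report MemAvailable, for example). */
long meminfo_value_kb(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            long val;

            if (sscanf(line + klen + 1, "%ld", &val) == 1)
                return val;
            return -1;          /* key found but value unreadable */
        }
        line = strchr(line, '\n');  /* advance to the next line */
        if (line != NULL)
            line++;
    }
    return -1;
}
```

A caller can fall back to another estimate of free memory when -1 is returned for "MemAvailable".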
Use bootclk
Applied with added check that CLOCK_BOOTTIME is defined.
Do not guard sys/quota.h sys/swap.h and sys/reboot.h with __GLIBC__
Now applied
Do not guard shutdown with __GLIBC__
Fix network interface counter overflow
Use MemAvailable where supported for free memory check
Minor - spelling corrections
Use BOOTCLK which keeps ticking during suspend
Network interface counter overflows on busy systems
(Forgot to add attachment)
Make ENOMEM handleable by wd-repair
Use MemAvailable for memory calculations, if available
Use bootclk
Do not guard sys/quota.h sys/swap.h and sys/reboot.h with __GLIBC__
Fix memory leak in realloc()
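The commit title does not say what the leak was. A frequent cause with `realloc()` is assigning the result straight back to the only pointer, which loses the original block on failure. The sketch below shows the usual temporary-pointer idiom under that assumption; `grow_buffer` is a hypothetical helper.

```c
#include <stdlib.h>

/* Resize *buf to new_size bytes (new_size must be greater than 0).
 * On failure the original block is untouched and still owned by the
 * caller, so nothing is leaked. Returns 0 on success, -1 on failure. */
int grow_buffer(char **buf, size_t new_size)
{
    char *tmp = realloc(*buf, new_size);

    if (tmp == NULL)
        return -1;      /* *buf is still valid; the caller may free it */

    *buf = tmp;
    return 0;
}
```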
Minor tidy
Fix a potential buffer overflow
Remove the need for root in build process
Remove double clearing of variables.
Updated copyright notice
Mark #958602 as closed, the dependency was removed already in a prior commit.
Prepare for upload of new release.
Updated Portuguese translation
Updated to latest Debian versions.
It's time to release a new version.
Removed ancient cruft.
Treat no-memory error as irreparable
Separate error-timer for allocatable memory
Compute usable memory from free+buffers+cache
Use long integers for kB values
Move string parse to function call
Update documentation on memory testing
Allow enough space to read all of /proc/meminfo
Add separate test of swap use
Add SIGTERM delay to config file
Add config option to ignore watchdog errors
Set minimum refresh time to 0.2s
Update example watchdog.conf file
Add '-lrt' to other programs
Merge branch 'master' of ssh://git.code.sf.net/p/watchdog/code
Add '-lrt' to other programs
Brackets for less ambiguous code
Remove delay before main loops
Enforce minimum watchdog refresh time
Remove unused 'safe' functions
Deprecate use of usleep()
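`usleep()` was removed from POSIX.1-2008, and `nanosleep()` is the usual replacement. A minimal sketch of a microsecond sleep built on it follows; the helper name `sleep_usec` is an assumption, not a function from the watchdog source.

```c
#include <errno.h>
#include <time.h>

/* Sleep for usec microseconds using nanosleep(), resuming the sleep
 * if a signal interrupts it. Returns 0 on success, -1 on error. */
int sleep_usec(long usec)
{
    struct timespec req, rem;

    req.tv_sec  = usec / 1000000L;
    req.tv_nsec = (usec % 1000000L) * 1000L;

    while (nanosleep(&req, &rem) != 0) {
        if (errno != EINTR)
            return -1;  /* e.g. EINVAL for a negative interval */
        req = rem;      /* carry on with the time still remaining */
    }
    return 0;
}
```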
Apply a short wait before main loop
Use same timing loop for wd_keepalive
Use local variable 'err' for errno copy
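The reason for copying `errno` into a local variable is that later library calls, such as logging, may overwrite it. A small sketch of the pattern follows; `check_readable` is a hypothetical function used only to illustrate it.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Try to open a file for reading. Returns 0 on success, otherwise the
 * errno value recorded at the moment fopen() failed. */
int check_readable(const char *path)
{
    FILE *fp = fopen(path, "r");

    if (fp == NULL) {
        int err = errno;    /* copy first: fprintf() below may change errno */

        fprintf(stderr, "cannot open %s: %s\n", path, strerror(err));
        return err;
    }
    fclose(fp);
    return 0;
}
```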
Fix 'Ignoring invalid option ... realtime=yes' message
Simplify parsing code
Simplify network ping code
Add comment on function purpose
Cosmetic - fix code indent
Move file change code to function
Move config file parsing to function
Merge branch 'master' of ssh://git.code.sf.net/p/watchdog/code
Add bracket for code future-safety
Indent start-up information messages
Revert patch [257654] as RPC no longer needed
Update TODO list
Fix stale PID for watchdog service
Run sync() for generated PID kill lists
Cosmetic change (consistent brackets)
Make verbosity an integer
Replace debug compile-option with run-time option
Report dump file closure
Does not compile on RHEL 6 or 7
This seems to have been fixed some time ago - closing this bug.
File stat functionality is not robust against system clock changes
Closing now that Andrey Mazo's patch is working OK.
Fix build issues found when building with musl C library on linux
Closing as fixed some time ago.
watchdog.c fails to compile with musl
Closing as fixed some time ago.