fishxz
Posts: 71
Joined: Thu Feb 14, 2013 7:38 pm

giving watchdog highest i/o priority

Mon Oct 15, 2018 2:15 pm

hey,
I have a problem with my watchdog. When I cause high I/O load on my Pi, the watchdog doesn't seem to be able to write anymore, which causes a reboot.

So I wonder if there is an easy way to give the watchdog the highest priority to prevent this behavior.
I'm using the systemd integrated watchdog, but it's the same problem with the watchdog package.

epoch1970
Posts: 2079
Joined: Thu May 05, 2016 9:33 am
Location: Paris, France

Re: giving watchdog highest i/o priority

Mon Oct 15, 2018 2:34 pm

The direct answer to the question is to give the process real-time scheduling (SCHED_RR) with the "chrt" command.
However, the traditional watchdog package already has a configuration option "realtime=yes", and I have no doubt the systemd watchdog is non-preemptible as well.
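
For anyone who wants to try the SCHED_RR route from a script rather than with "chrt", here is a minimal Python sketch. The function names are made up for illustration, the default priority of 50 is an arbitrary assumption, and the call needs root (or CAP_SYS_NICE). Note that this only changes CPU scheduling; it does not by itself change how the kernel orders disk I/O.

```python
import os


def clamp_rr_priority(priority: int) -> int:
    """Clamp a requested priority to the range the kernel allows for SCHED_RR."""
    lowest = os.sched_get_priority_min(os.SCHED_RR)
    highest = os.sched_get_priority_max(os.SCHED_RR)
    return max(lowest, min(highest, priority))


def make_realtime(pid: int, priority: int = 50) -> None:
    """Switch the process `pid` to SCHED_RR (hypothetical helper, needs root)."""
    param = os.sched_param(clamp_rr_priority(priority))
    os.sched_setscheduler(pid, os.SCHED_RR, param)
```

Calling `make_realtime(pid)` as root with the PID of the watchdog daemon would be roughly what "chrt -r -p 50 PID" does.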

Maybe the thresholds you set in your watchdog config are crossed in case of heavy activity. So after a while watchdog does reboot your system.

In the traditional watchdog daemon there are, by default, system load thresholds. If you start an I/O intensive operation like a filesystem copy, system load goes up and it is possible to get the machine to reboot itself... Run "uptime" during some operations to see the system load.
"If there is no solution, it is because there is no problem." Les Shadoks, J. Rouxel

fishxz

Re: giving watchdog highest i/o priority

Mon Oct 15, 2018 2:50 pm

epoch1970 wrote: Maybe the thresholds you set in your watchdog config are crossed in case of heavy activity. So after a while watchdog does reboot your system.
Probably, but as far as I know 16 sec is the max? That's what I have set it to.
epoch1970 wrote: In the traditional watchdog daemon there are, by default, system load thresholds. If you start an I/O intensive operation like a filesystem copy, system load goes up and it is possible to get the machine to reboot itself... Run "uptime" during some operations to see the system load.

16:48:20 up 26 min,  2 users,  load average: 2.15, 0.76, 0.41
That's the peak, so not super high.

I'm almost sure it's the disk, because when I set the I/O scheduler to "cfq", it doesn't seem to happen (I'm currently testing, so I'm not entirely sure)... but I heard that performance-wise this is not the optimal scheduler to use on an SD card.
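
To confirm which I/O scheduler is actually active while testing, the kernel lists the available schedulers in sysfs and marks the current one with square brackets. A small sketch in Python; the device name "mmcblk0" is an assumption (the usual name for the SD card on a Pi) and the function names are only illustrative.

```python
import re


def active_scheduler(text: str) -> str:
    """Return the scheduler name shown in square brackets, or '' if none is marked."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else ""


def read_scheduler(device: str = "mmcblk0") -> str:
    """Read the active scheduler for a block device (device name is an assumption)."""
    with open(f"/sys/block/{device}/queue/scheduler") as handle:
        return active_scheduler(handle.read())
```

For example, a sysfs line such as "noop deadline [cfq]" would mean cfq is the scheduler in use.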

epoch1970

Re: giving watchdog highest i/o priority

Mon Oct 15, 2018 3:28 pm

I was alluding to "max-load-1", "max-load-5", "max-load-15" in the linux watchdog daemon. These are system load limits.
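
To make the comparison concrete, here is a rough Python sketch that checks the three load averages against such limits. The limit values below are purely illustrative and not taken from anyone's configuration, and treating a limit of 0 as "check disabled" is an assumption of this sketch.

```python
import os

# Option names as used in the watchdog daemon configuration.
LIMIT_NAMES = ("max-load-1", "max-load-5", "max-load-15")

# Illustrative limits only; replace with the values from your own config.
EXAMPLE_LIMITS = (24, 18, 12)


def exceeded_limits(load, limits):
    """Return the names of the limits that the given load averages exceed.

    A limit of 0 is treated as disabled (an assumption of this sketch).
    """
    return [
        name
        for name, value, limit in zip(LIMIT_NAMES, load, limits)
        if limit and value > limit
    ]
```

On a live system, `exceeded_limits(os.getloadavg(), EXAMPLE_LIMITS)` compares the current 1, 5 and 15 minute load averages against the limits and returns an empty list when none is crossed.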

The device timeout and polling frequency are something else.
AFAIK Raspbian has the device timeout set to the max value of 16 secs. This is a device module option, but since the driver is compiled into the Raspbian kernel you'll need to "modprobe configs" and "zcat /proc/config.gz | grep DOG" to verify that, or read the source code.
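
The same check can be scripted. A minimal Python sketch, assuming the "configs" module is loaded so that /proc/config.gz exists; the function names are hypothetical, and matching "WDT" in addition to "WATCHDOG" is my own addition because some driver options use that abbreviation.

```python
import gzip


def grep_config(lines, needles=("WATCHDOG", "WDT")):
    """Return the config lines that mention any of the given substrings."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(needle in line for needle in needles)
    ]


def read_kernel_config(path="/proc/config.gz"):
    """Filter the running kernel's config (requires 'modprobe configs' first)."""
    with gzip.open(path, "rt") as handle:
        return grep_config(handle)
```

Lines ending in "=y" are built in, "=m" are modules, and "is not set" means the option is disabled.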

So your watchdog program in linux should poll the device every 7 to 10 secs to be safe. If you set a write frequency of 16 secs in watchdog (watchdog-timeout=16 in the linux watchdog daemon) then you have a very good chance of rebooting suddenly. Use watchdog-timeout=10 instead, it should be safe.
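
The arithmetic behind that advice, as a small Python sketch. The 5 second minimum margin is an arbitrary assumption for illustration, not a documented value. Also, as far as I recall from the watchdog.conf man page, the write period is set by "interval" while "watchdog-timeout" sets the device timeout, so it is worth checking which of the two you are actually changing.

```python
def margin_seconds(device_timeout: float, ping_interval: float) -> float:
    """Time left after a scheduled ping before the device would reset the board."""
    return device_timeout - ping_interval


def is_safe(device_timeout: float, ping_interval: float, min_margin: float = 5.0) -> bool:
    """True if a ping may be delayed by `min_margin` seconds without a reset.

    `min_margin` is a hypothetical safety allowance, not a documented value.
    """
    return margin_seconds(device_timeout, ping_interval) >= min_margin
```

With a 16 second device timeout, pinging every 10 seconds leaves 6 seconds of slack for a delayed write, while pinging every 16 seconds leaves none.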
I don't know the equivalent settings with the systemd watchdog, sorry.

I've almost never seen a linux watchdog daemon fail to manage a machine, even running on shoddy hardware. Check your configs.
Last edited by epoch1970 on Mon Oct 15, 2018 5:06 pm, edited 1 time in total.

fishxz

Re: giving watchdog highest i/o priority

Mon Oct 15, 2018 4:06 pm

epoch1970 wrote: I was alluding to "max-load-1", "max-load-5", "max-load-15" in the linux watchdog daemon. These are system load limits.
They are commented out (the default), and there is no such thing in the systemd watchdog.
epoch1970 wrote: AFAIK Raspbian has the device timeout set to the max value of 16 secs. This is a device module option, but since the driver is compiled into the Raspbian kernel you'll need to "modprobe configs" and "zcat /proc/config.gz | grep DOG" to verify that.

CONFIG_WATCHDOG=y
CONFIG_WATCHDOG_CORE=y
# CONFIG_WATCHDOG_NOWAYOUT is not set
CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED=y
# CONFIG_WATCHDOG_SYSFS is not set
# CONFIG_SOFT_WATCHDOG is not set
CONFIG_GPIO_WATCHDOG=m
# CONFIG_XILINX_WATCHDOG is not set
# CONFIG_ZIIRAVE_WATCHDOG is not set
# CONFIG_ARM_SP805_WATCHDOG is not set
# CONFIG_CADENCE_WATCHDOG is not set
# CONFIG_DW_WATCHDOG is not set
# CONFIG_MAX63XX_WATCHDOG is not set
# CONFIG_USBPCWATCHDOG is not set
# CONFIG_WATCHDOG_PRETIMEOUT_GOV is not set
# CONFIG_WQ_WATCHDOG is not set
epoch1970 wrote: So your watchdog program in linux should poll the device every 7 to 10 secs to be safe. If you set a write frequency of 16 secs in watchdog (watchdog-timeout=16 in the linux watchdog daemon) then you have a very good chance of rebooting suddenly. Use watchdog-timeout=10 instead, it should be safe.
I tried almost every value there.
epoch1970 wrote: I don't know the equivalent settings with the systemd watchdog, sorry.
To make use of the hardware watchdog it is sufficient to set the RuntimeWatchdogSec= option in /etc/systemd/system.conf. It defaults to 0 (i.e. no hardware watchdog use). Set it to a value like 20s and the watchdog is enabled. After 20s of no keep-alive pings the hardware will reset itself. Note that systemd will send a ping to the hardware at half the specified interval, i.e. every 10s. And that's already all there is to it. By enabling this single, simple option you have turned on supervision by the hardware of systemd and the kernel beneath it.
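
The rule quoted above (a ping at half the RuntimeWatchdogSec= value) can be sketched in Python like this. The helper names are made up, and the time parser only understands a plain number or the "ms", "s" and "min" suffixes, which is a small subset of what systemd accepts.

```python
import configparser


def parse_seconds(value: str) -> float:
    """Parse a simple time span into seconds (subset of systemd's syntax)."""
    value = value.strip()
    if value.endswith("ms"):
        return float(value[:-2]) / 1000
    if value.endswith("min"):
        return float(value[:-3]) * 60
    if value.endswith("s"):
        return float(value[:-1])
    return float(value)


def ping_interval(conf_text: str) -> float:
    """Return half of RuntimeWatchdogSec= from system.conf-style text, in seconds."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(conf_text)
    timeout = parse_seconds(parser.get("Manager", "RuntimeWatchdogSec", fallback="0"))
    return timeout / 2
```

For the 20s example from the quote this gives 10 seconds, and 0 when the option is left commented out.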
epoch1970 wrote: I've almost never seen a linux watchdog daemon fail to manage a machine, even running on shoddy hardware. Check your configs.
I can reproduce this on 2 Pis, both with a decent SD card.
