Update 1-nov-2017:
After all this time, I finally got a reply from the kernel folks. They have closed my bug reports.
Here is a link to an updated description for Raspbian: viewtopic.php?t=147501&p=972709
Note that the text says to activate the watchdog with 20 seconds. This does not work on the RPi because of a hardware timer limitation. The maximum is 15 seconds.
If you're interested in the application level support, read on, I have an example using Python.
To get the watchdog started on the RPi, you need to do this:
In /boot/config.txt add/change:
Code: Select all
dtparam=watchdog=on
In /etc/systemd/system.conf add/change:
Code: Select all
RuntimeWatchdogSec=10s
[Edited]
After realizing that I was mixing two watchdog approaches (one using the watchdog package, while inadvertently also activating the systemd watchdog feature to get things to work), I decided to completely redo this post. I know, I know, it's confusing, but to make things clearer I have now tried to completely split the two approaches.
In this post I'm showing the steps to make the "traditional" software watchdog package work with the hardware watchdog the RPi provides.
The lowest level of using the BCM hardware watchdog is to watch out for kernel related issues, i.e. a hung operating system. It can do more, but let's keep it simple.
I have used the hardware watchdog in the past, but as with so many other things that worked pre-Jessie (and/or pre-systemd), they now don't, or require a different approach. Sadly, documentation is sorely lacking or outdated, and the software keeps on changing.
I have been able to make it work, but I still question the validity of my approach, which is a collection of steps I found all over the place. If this is the right way, it seems overly complex and half-baked to me. If my method is not correct, help me to get it right. I'll do my best to publish it for others to use.
Here is what I did on a clean install (on a blank SD card) of Jessie Lite (2016-03-18), and after the usual update/upgrade/dist-upgrade.
Activating the BCM hardware watchdog:
------------------------------------------------
[UPDATE]
The goal post has suddenly been moved again:
After an update/upgrade, we are now at :
Linux raspi-svr 4.4.9+ #884 Fri May 6 17:25:37 BST 2016 armv6l GNU/Linux
The name of the watchdog module has now changed from bcm2708_wdog to bcm2835-wdt
While I was searching for the solution to the error message that bcm2708 was missing, I also found out that the activation method has changed from using modprobe to adding "dtparam=watchdog=on" in /boot/config.txt. This probably happened a while ago, although I did not find it in the mostly outdated information available to us.
------------------------------------------------
To activate the watchdog, edit the system config file :
Code: Select all
sudo nano /boot/config.txt
Code: Select all
# activating the hardware watchdog
dtparam=watchdog=on
After a reboot, the watchdog devices should be present :
Code: Select all
pi@raspi-tst:~ $ ls -al /dev/watchdog*
crw------- 1 root root 10, 130 May 19 07:09 /dev/watchdog
crw------- 1 root root 253, 0 May 19 07:09 /dev/watchdog0
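If you want to check the config.txt setting from a script rather than by eye, something like the following could do it. This is only a sketch: the function name watchdog_enabled is made up for this example, it only looks for an uncommented dtparam line that turns the watchdog on, and it does not handle a later line switching it off again.

```python
def watchdog_enabled(config_text):
    """Return True if an uncommented dtparam line contains watchdog=on."""
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments and whitespace
        if not line.startswith("dtparam="):
            continue
        # dtparam accepts several comma separated settings on one line
        params = line[len("dtparam="):].split(",")
        if any(p.strip() == "watchdog=on" for p in params):
            return True
    return False


# Sample text standing in for the contents of /boot/config.txt
sample = """# activating the hardware watchdog
dtparam=watchdog=on
"""
print(watchdog_enabled(sample))
```

On the Pi itself you would read /boot/config.txt into the function, and os.path.exists("/dev/watchdog") tells you whether the device node from the listing above is really there.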
Install the software watchdog module:
Code: Select all
sudo apt-get install watchdog
The installation ends with these messages :
/run/udev or .udevdb or .udev presence implies active udev. Aborting MAKEDEV invocation.
/run/udev or .udevdb or .udev presence implies active udev. Aborting MAKEDEV invocation.
update-rc.d: warning: start and stop actions are no longer supported; falling back to defaults
Hmm, very cryptic and certainly not very helpful. It could say "use systemd" to run it instead.
The installation installs scripts in the init.d (systemV) system, as before, but also adds this for systemd support:
ls -l /lib/systemd/system/ :
watchdog.service
wd_keepalive.service
Looks prepared and converted for use with systemd.
The previous configuration file used with the modprobe method (/etc/modprobe.d/watchdog.conf) is no longer usable. There is an additional configuration file:
Code: Select all
ls -al /etc/default/watchdog
Code: Select all
# Start watchdog at boot time? 0 or 1
run_watchdog=1
# Start wd_keepalive after stopping watchdog? 0 or 1
run_wd_keepalive=1
# Load module before starting watchdog
watchdog_module="none"
# Specify additional watchdog options here (see manpage).
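These are plain shell variable assignments, and the service file tests run_watchdog before it launches the daemon (see the ExecStart line in the status output further down). As a rough illustration, here is how a script could read the same values. The function name parse_defaults is invented for this sketch, and the sample text only repeats the lines shown above.

```python
import shlex


def parse_defaults(text):
    """Parse shell-style KEY=value lines, ignoring '#' comments."""
    settings = {}
    # shlex handles the quoting ("none" becomes none) and skips comments
    for token in shlex.split(text, comments=True):
        if "=" in token:
            key, value = token.split("=", 1)
            settings[key] = value
    return settings


# Sample text standing in for the contents of /etc/default/watchdog
sample = '''# Start watchdog at boot time? 0 or 1
run_watchdog=1
# Start wd_keepalive after stopping watchdog? 0 or 1
run_wd_keepalive=1
# Load module before starting watchdog
watchdog_module="none"
'''

settings = parse_defaults(sample)
print(settings["run_watchdog"] == "1")   # same test the service file applies
print(settings["watchdog_module"])
```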
To set some of the parameters the watchdog daemon should watch :
Code: Select all
nano /etc/watchdog.conf
# this is an optional test by pinging my router
ping=192.168.1.1
max-load-1 = 24
min-memory = 1
watchdog-device = /dev/watchdog
# I added the following command to get rid of a pesky error message
watchdog-timeout=15
This is the pesky error message that will turn up if you ask for a status :
cannot set timeout 60 (errno = 22 = 'Invalid argument')
This seems to be a default that is set somewhere, which is plain wrong! The maximum timeout for the RPi BCM is 15 seconds. This is a bug, and it led many people astray (google for it), myself included.
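To catch this before the daemon complains, a script could compare the configured timeout with the 15 second limit. The sketch below assumes the daemon falls back to 60 seconds when watchdog-timeout is missing (that is what the error message above suggests). The names read_timeout and timeout_ok are made up for the example.

```python
MAX_TIMEOUT = 15  # seconds; the limit reported in this post for the RPi watchdog


def read_timeout(conf_text, default=60):
    """Return the watchdog-timeout value from watchdog.conf text, or default."""
    timeout = default
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments and whitespace
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "watchdog-timeout":
            timeout = int(value)
    return timeout


def timeout_ok(conf_text):
    """True when the effective timeout fits the hardware limit."""
    return 1 <= read_timeout(conf_text) <= MAX_TIMEOUT


# Sample text standing in for the contents of /etc/watchdog.conf
sample = """ping=192.168.1.1
max-load-1 = 24
watchdog-timeout=15
"""
print(read_timeout(sample), timeout_ok(sample))
```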
Starting watchdog:
Here is where things are different from the systemV way. This no longer works as before.
We'll be using the systemd method to start, stop and ask for a status report.
Code: Select all
sudo systemctl start watchdog
Code: Select all
sudo systemctl status watchdog
pi@raspi-server:~ $ sudo systemctl status watchdog
● watchdog.service - watchdog daemon
   Loaded: loaded (/lib/systemd/system/watchdog.service; static)
   Active: active (running) since Thu 2016-05-19 21:04:47 CEST; 33min ago
  Process: 627 ExecStart=/bin/sh -c [ $run_watchdog != 1 ] || exec /usr/sbin/watchdog $watchdog_options (code=exited, status=0/SUCCESS)
  Process: 623 ExecStartPre=/bin/sh -c [ -z "${watchdog_module}" ] || [ "${watchdog_module}" = "none" ] || /sbin/modprobe $watchdog_module (code=exited, status=0/SUCCESS)
 Main PID: 632 (watchdog)
   CGroup: /system.slice/watchdog.service
           └─632 /usr/sbin/watchdog
May 19 21:04:47 raspi-server watchdog[632]: int=1s realtime=yes sync=no soft=no mla=24 mem=1
May 19 21:04:47 raspi-server watchdog[632]: ping: 192.168.1.1
May 19 21:04:47 raspi-server watchdog[632]: file: no file to check
May 19 21:04:47 raspi-server watchdog[632]: pidfile: no server process to check
May 19 21:04:47 raspi-server watchdog[632]: interface: no interface to check
May 19 21:04:47 raspi-server watchdog[632]: temperature: no sensors to check
May 19 21:04:47 raspi-server watchdog[632]: test=none(0) repair=none(0) alive=/dev/watchdog heartbeat=none to=roo...ce=no
May 19 21:04:47 raspi-server watchdog[632]: watchdog now set to 15 seconds
May 19 21:04:47 raspi-server watchdog[632]: hardware watchdog identity: Broadcom BCM2835 Watchdog timer
May 19 21:04:47 raspi-server systemd[1]: Started watchdog daemon.
Hint: Some lines were ellipsized, use -l to show in full.
You can stop the watchdog with :
Code: Select all
sudo systemctl stop watchdog
I tried it with my two fork bombs (below) and it worked correctly and as advertised. The system reboots in about 15 seconds.
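For anyone wondering what holds off that reboot while the system is healthy: as far as I understand the Linux watchdog interface, a process keeps writing to /dev/watchdog, and every write resets the hardware countdown. Writing the character V just before closing asks the driver to disarm the timer, if the driver supports that. The sketch below shows the idea and is not how the watchdog daemon itself is written. The function name feed_watchdog is made up, and the dry run uses an in-memory buffer so it can be tried away from the Pi.

```python
import io
import time


def feed_watchdog(dev, beats, interval):
    """Write one keepalive byte per interval, then the magic close character."""
    for _ in range(beats):
        dev.write(b"\0")      # any write resets the hardware countdown
        dev.flush()
        time.sleep(interval)
    dev.write(b"V")           # magic close: ask the driver to disarm on close
    dev.flush()


# Dry run against an in-memory buffer instead of the real device
fake = io.BytesIO()
feed_watchdog(fake, beats=3, interval=0)
print(fake.getvalue())
```

On the Pi you would open the real device as root, for example with open("/dev/watchdog", "wb", buffering=0), and use an interval well below the 15 second timeout. Stop the watchdog daemon first, because the device can normally be held by only one process at a time.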
Great, so now that it works, we can install it in the boot sequence.
The correct systemd way of doing that is to use :
Code: Select all
sudo systemctl enable watchdog
Synchronizing state for watchdog.service with sysvinit using update-rc.d...
Executing /usr/sbin/update-rc.d watchdog defaults
Executing /usr/sbin/update-rc.d watchdog enable
The unit files have no [Install] section. They are not meant to be enabled using systemctl.
Possible reasons for having this kind of units are:
1) A unit may be statically enabled by being symlinked from another unit's
.wants/ or .requires/ directory.
2) A unit's purpose may be to act as a helper for some other unit which has
a requirement dependency on it.
3) A unit may be started when needed via activation (socket, path, timer,
D-Bus, udev, scripted systemctl call, ...).
Hmmm, what the heck is the developer trying to tell us and what does that all mean???? I don't have much of a clue! I understand 1, 2 and 3, but that is way too general. However "informative" this is supposed to be, it is no help whatsoever (for me) in pointing to a solution to get it installed at boot time.
Notice this though:
Executing /usr/sbin/update-rc.d watchdog defaults
Executing /usr/sbin/update-rc.d watchdog enable
It looks to me that systemd invoked the systemV method. OK, seems plausible, but what about this :
The unit files have no [Install] section. They are not meant to be enabled using systemctl.
???? What exactly does "not meant to be" mean? Not yet or not ever? No clue is given how to do it otherwise, or how to make it work. Is it just a warning and did it work anyway? If you google long enough, as I did, you see a myriad of people having issues, and trying workarounds by changing the watchdog.service file.
But did it work? No!
I tried every which way to get the watchdog daemon started at boot, the old (systemV) way and by using systemd, to no avail. It starts by hand, but does not start after a reboot. You need to make changes to make it work, and there is no information that tells you how to do it. That is not very RPi user friendly, folks!
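The message from systemctl enable comes down to one thing: the unit file has no [Install] section, so systemd has nothing to hook into the boot targets. A quick way to see whether a unit file is in that state is sketched below. The function name has_install_section and both sample unit texts are invented for illustration and are not the real contents of watchdog.service.

```python
def has_install_section(unit_text):
    """Return True if the unit file text contains an [Install] section header."""
    for raw in unit_text.splitlines():
        if raw.strip() == "[Install]":
            return True
    return False


# Hypothetical unit text without an [Install] section (a "static" unit)
static_unit = """[Unit]
Description=watchdog daemon

[Service]
Type=forking
"""

# The same unit with the section that the posted workarounds tend to add
patched_unit = static_unit + """
[Install]
WantedBy=multi-user.target
"""

print(has_install_section(static_unit))
print(has_install_section(patched_unit))
```

The workaround that turns up most often when you search for this is to add an [Install] section with WantedBy=multi-user.target to the unit file. That is not the route taken in this post.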
To stay with the "traditional" approach as much as possible, and not mess with systemd service files, I tried two well known kludges to make it work. One is to use /etc/rc.local and the other is to use cron.
Code: Select all
sudo nano /etc/rc.local
Code: Select all
printf "Starting software Watchdog\n"
/usr/sbin/service watchdog start &
exit 0
You can also use cron if you like that better :
Code: Select all
crontab -e
Code: Select all
@reboot sudo /usr/sbin/service watchdog start
I tried this setup with two fork bombs. One using the shell :
Code: Select all
#!/bin/bash
echo "Starting shell fork bomb"
# prevent swapping to the SD card!
sudo systemctl stop dphys-swapfile.service
# start the bomb
: (){ :|:& };:
And one using Python :
Code: Select all
#!/usr/bin/python
#-------------------------------------------------------------------------------
# Name: fork.bomb
# Purpose:
#
# Author: paulv
#
# Created: 09-05-2016
# Copyright: (c) paulv 2016
# Licence: <your licence>
#-------------------------------------------------------------------------------
import os
import subprocess

def main():
    print("fork bomb starting")
    # prevent swapping to the SD card!
    subprocess.call('sudo systemctl stop dphys-swapfile.service', shell=True,
                    stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # every pass doubles the number of processes until the system chokes
    while True:
        os.fork()

if __name__ == '__main__':
    main()
The Pi rebooted after about 15 seconds.
So as a minimum, we have a working system again.
If, however, you have a better method, please chime in!
In the follow-up posts I will show how I got the systemd software watchdog to work, and also how to add extra support for your own application by using the systemd framework.