Is this the correct way to install the watchdog package?


by paulv » Mon May 09, 2016 9:52 am
I'm working on a post to show various ways to make sure applications or the kernel recover from issues. This is required when you run a server application, a security camera or network-related devices.

[Edited]
After realizing that I was mixing two watchdog approaches (one using the watchdog package, while inadvertently also activating the systemd watchdog feature as I was getting things to work), I decided to completely redo this post. I know, I know, it's confusing, but to make things clearer I have now tried to split the two approaches completely.

In this post I'm showing the steps to make the "traditional" software watchdog package work with the hardware watchdog the RPi provides.

The lowest level of using the BCM hardware watchdog is to watch out for kernel-related issues, i.e. a hung operating system. It can do more, but let's keep it simple.

I have used the hardware watchdog in the past, but as with so many other things that worked pre-Jessie (and/or pre-systemd), they now don't, or require a different approach. Sadly, documentation is sorely lacking or outdated, and the software keeps on changing.

I have been able to make it work, but I still question the validity of my approach, which is a collection of steps I found all over the place. If this is the right way, it seems overly complex and half-baked to me. If my method is not correct, help me to get it right. I'll do my best to publish it for others to use.

Here is what I did on a clean install (on a blank SD card) of Jessie Lite (2016-03-18), and after the usual update/upgrade/dist-upgrade.

Activating the BCM hardware watchdog:
------------------------------------------------
[UPDATE]
The goal post has suddenly been moved again:
After an update/upgrade, we are now at :
Linux raspi-svr 4.4.9+ #884 Fri May 6 17:25:37 BST 2016 armv6l GNU/Linux

The name of the watchdog module has now changed from bcm2708_wdog to bcm2835-wdt
While I was searching for the solution to the error message that bcm2708 was missing, I also found out that the activation method has changed from using modprobe to adding "dtparam=watchdog=on" in /boot/config.txt. This probably happened a while ago, although I did not find it in the mostly outdated information available to us.
------------------------------------------------

To activate the watchdog, edit the system config file :
Code: Select all
sudo nano /boot/config.txt
And add this at the end of the file :
Code: Select all
# activating the hardware watchdog
dtparam=watchdog=on
At this point you need to reboot the RPi to activate the hardware watchdog. After the reboot you can check that there are two devices called watchdog in /dev :
Code: Select all
pi@raspi-tst:~ $ ls -al /dev/watchdog*
crw------- 1 root root  10, 130 May 19 07:09 /dev/watchdog
crw------- 1 root root 253,   0 May 19 07:09 /dev/watchdog0
I was not able to use watchdog0, but the two devices seem to be linked. If you use one of the watchdogs, trying to use the other one gives you a permission error. Bottom line, there was and is only one hardware watchdog provided.
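
If you would rather check for the device nodes from a script than by eye, here is a minimal sketch in Python 3 (my assumption; the scripts elsewhere in this thread are Python 2). The function names are made up for illustration, and it only looks at file names and file types. It does not open the device, because opening /dev/watchdog arms the timer.

```python
import glob
import os
import stat


def find_watchdog_devices(dev_dir="/dev"):
    """Return the sorted paths under dev_dir whose names start with 'watchdog'."""
    return sorted(glob.glob(os.path.join(dev_dir, "watchdog*")))


def is_char_device(path):
    """True if path exists and is a character device, as real watchdog nodes are."""
    try:
        return stat.S_ISCHR(os.stat(path).st_mode)
    except OSError:
        return False


if __name__ == "__main__":
    devices = find_watchdog_devices()
    if not devices:
        print("no watchdog device found; is dtparam=watchdog=on set and the Pi rebooted?")
    for dev in devices:
        kind = "character device" if is_char_device(dev) else "not a character device"
        print(dev, "-", kind)
```

On a Pi with the watchdog activated, this should list the same two entries as the ls command above.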

Install the software watchdog module:
Code: Select all
sudo apt-get install watchdog
This installation produces a few cryptic lines:
/run/udev or .udevdb or .udev presence implies active udev. Aborting MAKEDEV invocation.
/run/udev or .udevdb or .udev presence implies active udev. Aborting MAKEDEV invocation.
update-rc.d: warning: start and stop actions are no longer supported; falling back to defaults
Hmm, very cryptic and certainly not very helpful. It could say "use systemd" to run it instead.

The installation installs scripts in the init.d (systemV) system, as before, but also adds this for systemd support:
ls -l /lib/systemd/system/ :
watchdog.service
wd_keepalive.service
Looks prepared and converted for use with systemd.

The previous configuration file used with the modprobe method (/etc/modprobe.d/watchdog.conf) is no longer usable. There is an additional configuration file:
Code: Select all
ls -al /etc/default/watchdog
It contains the following :
Code: Select all
# Start watchdog at boot time? 0 or 1
run_watchdog=1
# Start wd_keepalive after stopping watchdog? 0 or 1
run_wd_keepalive=1
# Load module before starting watchdog
watchdog_module="none"
# Specify additional watchdog options here (see manpage).
Where it says to see the "manpage", I assume man watchdog.conf is meant. Unfortunately, I could not get any of the several parameters I tried to work, so I left this file alone.

To set some of the parameters the watchdog daemon should watch :
Code: Select all
nano /etc/watchdog.conf
For the fork bomb test (below) I took away the "#" marks from the following lines:
# this is an optional test by pinging my router
ping=192.168.1.1
max-load-1 = 24
min-memory = 1
watchdog-device = /dev/watchdog

# I added the following command to get rid of a pesky error message
watchdog-timeout=15
This is the pesky error message that will turn up if you ask for a status :
cannot set timeout 60 (errno = 22 = 'Invalid argument')
This seems to be a default that is set somewhere, which is plain wrong! The maximum timeout for the RPi BCM is 15 seconds. This is a bug, and it has led many people astray (google for it), myself included.
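
To catch this before starting the daemon, a configuration check could look like the sketch below (Python 3). It is only an illustration: the function names are hypothetical, the parser is deliberately simple (it keeps the last value when a key such as ping appears more than once), and the fallback of 60 is just the value seen in the error message above, not something I looked up in the watchdog source.

```python
MAX_BCM_TIMEOUT = 15  # seconds; the hardware limit reported in this post


def parse_watchdog_conf(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def timeout_problem(settings, default=60):
    """Return a warning string if the effective timeout is too large, else None."""
    timeout = int(settings.get("watchdog-timeout", default))
    if timeout > MAX_BCM_TIMEOUT:
        return "watchdog-timeout %d exceeds the %d s hardware limit" % (
            timeout, MAX_BCM_TIMEOUT)
    return None


if __name__ == "__main__":
    sample = "max-load-1 = 24\nmin-memory = 1\nwatchdog-timeout=15\n"
    print(timeout_problem(parse_watchdog_conf(sample)) or "timeout looks fine")
```

To check the real file, read /etc/watchdog.conf and pass its text to parse_watchdog_conf().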

Starting watchdog:
Here is where things are different from the systemV way. This no longer works as before.

We'll be using the systemd method to start, stop and ask for a status report.
Code: Select all
sudo systemctl start watchdog
Now ask for a status report
Code: Select all
sudo systemctl status watchdog
You should get something like this:
pi@raspi-server:~ $ sudo systemctl status watchdog
● watchdog.service - watchdog daemon
Loaded: loaded (/lib/systemd/system/watchdog.service; static)
Active: active (running) since Thu 2016-05-19 21:04:47 CEST; 33min ago
Process: 627 ExecStart=/bin/sh -c [ $run_watchdog != 1 ] || exec /usr/sbin/watchdog $watchdog_options (code=exited, status=0/SUCCESS)
Process: 623 ExecStartPre=/bin/sh -c [ -z "${watchdog_module}" ] || [ "${watchdog_module}" = "none" ] || /sbin/modprobe $watchdog_module (code=exited, status=0/SUCCESS)
Main PID: 632 (watchdog)
CGroup: /system.slice/watchdog.service
└─632 /usr/sbin/watchdog

May 19 21:04:47 raspi-server watchdog[632]: int=1s realtime=yes sync=no soft=no mla=24 mem=1
May 19 21:04:47 raspi-server watchdog[632]: ping: 192.168.1.1
May 19 21:04:47 raspi-server watchdog[632]: file: no file to check
May 19 21:04:47 raspi-server watchdog[632]: pidfile: no server process to check
May 19 21:04:47 raspi-server watchdog[632]: interface: no interface to check
May 19 21:04:47 raspi-server watchdog[632]: temperature: no sensors to check
May 19 21:04:47 raspi-server watchdog[632]: test=none(0) repair=none(0) alive=/dev/watchdog heartbeat=none to=roo...ce=no
May 19 21:04:47 raspi-server watchdog[632]: watchdog now set to 15 seconds
May 19 21:04:47 raspi-server watchdog[632]: hardware watchdog identity: Broadcom BCM2835 Watchdog timer
May 19 21:04:47 raspi-server systemd[1]: Started watchdog daemon.
Hint: Some lines were ellipsized, use -l to show in full.
You can stop the watchdog with
Code: Select all
sudo systemctl stop watchdog
And change the configuration parameters in the configuration file to test it again.

I tried it with my two fork bombs (below) and it worked correctly and as advertised. The system reboots in about 15 seconds.

Great, so now that it works, we can install it in the boot sequence.
The correct systemd way of doing that is to use :
Code: Select all
sudo systemctl enable watchdog
Watch this:
Synchronizing state for watchdog.service with sysvinit using update-rc.d...
Executing /usr/sbin/update-rc.d watchdog defaults
Executing /usr/sbin/update-rc.d watchdog enable
The unit files have no [Install] section. They are not meant to be enabled using systemctl.
Possible reasons for having this kind of units are:
1) A unit may be statically enabled by being symlinked from another unit's
.wants/ or .requires/ directory.
2) A unit's purpose may be to act as a helper for some other unit which has
a requirement dependency on it.
3) A unit may be started when needed via activation (socket, path, timer,
D-Bus, udev, scripted systemctl call, ...).
Hmmm, what the heck is the developer trying to tell us and what does that all mean???? I don't have much of a clue! I understand 1,2,3, but that is way too general. However "informative" this is supposed to be, it is no help whatsoever (for me) to point to a solution to get it installed at boot time.

Notice this though:
Executing /usr/sbin/update-rc.d watchdog defaults
Executing /usr/sbin/update-rc.d watchdog enable

It looks to me that systemd invoked the systemV method. OK, seems plausible, but what about this :
The unit files have no [Install] section. They are not meant to be enabled using systemctl. ???? What exactly does "not meant to be" mean? Not yet or not ever? No clue is given how to do it otherwise, or how to make it work. Is it just a warning and did it work anyway? If you google long enough, as I did, you see a myriad of people having issues, and trying workarounds by changing the watchdog.service file.

But did it work? No!
I tried every which way to get the watchdog daemon started the old (systemV) way, and by using systemd, to no avail. It starts by hand, but does not install after a reboot. You need to make changes to make it work and there is no information that tells you how to do it. That is not very RPi user friendly folks!

To stay with the "traditional" approach as much as possible, and not mess with systemd service files, I tried two well known kludges to make it work. One is to use /etc/rc.local and the other is to use cron.
Code: Select all
sudo nano /etc/rc.local
Add these two lines just before the exit 0 like this :
Code: Select all
printf "Starting software Watchdog"
/usr/sbin/service watchdog start &

exit 0
Notice the "&" to put it in the background. If you don't, the console will hang.
You can also use cron if you like that better :
Code: Select all
crontab -e
And add this :
Code: Select all
@reboot sudo /usr/sbin/service watchdog start

With either of those methods, the watchdog daemon is started at boot time. You can verify that with "dmesg | grep watchdog" or use "cat /var/log/syslog | grep watchdog"

I tried this setup with two fork bombs. One using the shell :
Code: Select all
#!/bin/bash
echo "Starting shell fork bomb"
# prevent swapping to the SD card!
sudo systemctl stop dphys-swapfile.service
# start the bomb
: (){ :|:& };:
And one using Python :
Code: Select all
#!/usr/bin/python
#-------------------------------------------------------------------------------
# Name:        fork.bomb
# Purpose:
#
# Author:      paulv
#
# Created:     09-05-2016
# Copyright:   (c) paulv 2016
# Licence:     <your licence>
#-------------------------------------------------------------------------------

import os
import subprocess

def main():
    print "fork bomb starting"
    # prevent swapping to the SD card!
    subprocess.call(['sudo systemctl stop dphys-swapfile.service'], shell=True, \
        stdout=subprocess.PIPE, stderr=subprocess.PIPE)

    while True:
        os.fork()

if __name__ == '__main__':
    main()
Both worked fine.
The Pi rebooted after about 15 seconds.

So as a minimum, we have a working system again.

If however, you have a better method, please chime in!

In the follow-up posts I will show how I got the systemd software watchdog to work, and also how to add extra support for your own application by using the systemd framework.
Last edited by paulv on Thu May 19, 2016 8:19 pm, edited 9 times in total.
Posts: 519
Joined: Tue Jan 15, 2013 12:10 pm
Location: Netherlands
by paulv » Thu May 12, 2016 8:50 am
[updated]
Now that Jessie embraces the systemd system, let's see what else that can bring to the ball game.

Systemd has a philosophy behind protection against hang-ups embedded in the system. It supports a, rather simple, software watchdog system. It does not have all the bells and whistles of the watchdog package I used above, but it is sufficient in many cases.

Let's set that up on a clean Jessie installation.
(If you used the watchdog package on this installation before, you need to disable its activation or remove the package altogether: sudo apt-get remove watchdog. You can only run one or the other, not both. The reason is that the hardware watchdog device is single-user only, and you'll get access issues.)

Activate the systemd software watchdog:

NOTE
When you activate the systemd (software) watchdog, it automatically activates the BCM hardware watchdog. There is no need to activate it in the /boot/config.txt file. It does not seem to do any harm if you have already activated it that way, though.

Follow these steps to activate and setup the software watchdog.
Code: Select all
sudo nano /etc/systemd/system.conf
And change these two lines from:
Code: Select all
#RuntimeWatchdogSec=20
#ShutdownWatchdogSec=10min
To :
Code: Select all
RuntimeWatchdogSec=10
ShutdownWatchdogSec=10min
The first line is all that is needed to activate the watchdog; I also decided to activate the second line as an extra safety measure. Systemd pings the hardware watchdog device at half the RuntimeWatchdogSec interval, so with the default value of 20 (seconds) the device would be "pinged" every 10 seconds. For this test I used 10 seconds, so the ping is every 5 seconds, and if there is no activity within a 10 second period, the RPi is rebooted. Note that the default of 20 seconds is wrong for the RPi (I think this is a bug). [NOTE: this has been fixed, the default now is 0]
The maximum is 15 seconds, and is set by the SoC counter. If you leave the original setting, you'll get a rather cryptic error message. The Shutdown setting is used to force a reboot if the shutdown process takes longer than 10 minutes (default), to prevent a hang in that process.

So all you need to do to start the systemd watchdog is to activate the "RuntimeWatchdogSec" setting, and optionally activate the "ShutdownWatchdogSec" option. That's all there is to it.
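
The arithmetic behind the RuntimeWatchdogSec value can be written out as a tiny sketch (Python 3). It assumes, as described in this post, that the ping happens at half the configured timeout and that the hardware limit is 15 seconds; the function name is made up for illustration.

```python
MAX_HW_TIMEOUT = 15.0  # seconds; the BCM limit quoted in this thread


def keepalive_interval(runtime_watchdog_sec):
    """Return how often the hardware watchdog is pinged for a given timeout.

    Raises ValueError when the timeout is not positive or exceeds the
    hardware limit, since the Pi cannot honour such a value.
    """
    if runtime_watchdog_sec <= 0:
        raise ValueError("watchdog disabled or invalid timeout")
    if runtime_watchdog_sec > MAX_HW_TIMEOUT:
        raise ValueError("timeout exceeds the %g s hardware limit" % MAX_HW_TIMEOUT)
    return runtime_watchdog_sec / 2.0


if __name__ == "__main__":
    for seconds in (10, 15):
        print("RuntimeWatchdogSec=%d -> ping every %.1f s"
              % (seconds, keepalive_interval(seconds)))
```

The old default of 20 is rejected by this check, which matches the error you get on the Pi when you leave the original setting.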

After setting this up, reboot the RPi to make the watchdog settings active.

First check the activation of the hardware watchdog by running:
Code: Select all
cat /var/log/syslog | grep watchdog
It should return something like this :
May 20 08:33:43 raspi-server kernel: [ 10.362509] bcm2835-wdt 20100000.watchdog: Broadcom BCM2835 watchdog timer
May 20 08:33:43 raspi-server systemd[1]: Hardware watchdog 'Broadcom BCM2835 Watchdog timer', version 0
May 20 08:33:43 raspi-server systemd[1]: Set hardware watchdog to 10s.
The first two lines show that the hardware watchdog device is activated. The third line shows that the software watchdog loaded our 10 sec. parameter.
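
If you want to automate this check, the timeout can be pulled out of the log with a small sketch like this one (Python 3). The pattern only matches the message wording shown in the log above; other systemd versions may phrase it differently, so treat the regular expression as an assumption.

```python
import re

# Matches the systemd message shown above, e.g. "Set hardware watchdog to 10s."
SET_RE = re.compile(r"Set hardware watchdog to (\d+)s")


def hardware_watchdog_seconds(log_lines):
    """Return the last timeout (in seconds) that systemd reported, or None."""
    result = None
    for line in log_lines:
        match = SET_RE.search(line)
        if match:
            result = int(match.group(1))
    return result


if __name__ == "__main__":
    sample = [
        "May 20 08:33:43 raspi-server systemd[1]: Set hardware watchdog to 10s.",
    ]
    print(hardware_watchdog_seconds(sample))
```

To run it against the real log, pass it the lines of /var/log/syslog (you may need sudo to read that file).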

A simple test to see if this protection works is by issuing a
Code: Select all
sudo poweroff
When you have a console attached to the RPi, you can see what is going on. The Pi will go through the shutdown process, and seems to crash with a Kernel panic, but that is normal behavior. The last message on the console will tell you that it will reboot in 10 seconds, and so it does.

Now use one of the fork-bombs from the previous post to verify that the system works when there is a lack of response (no more pinging of the watchdog).
My RPi [Model (1) B] rebooted in about 15 seconds. Unfortunately, there is not much showing on the console this time, other than some causes of the fork bomb.

So, with these simple steps we can add a minimum level of protection for systemd level hang-ups and kernel panics, to address what I call the "lowest level" protection.

When you have installed this method, how do you now cut the power to the Pi? It won't halt anymore either (sudo halt). Well, you have 10 seconds to pull the power plug at the end of the shutdown procedure.

As you will have realized by now, this method is perfect for embedded applications, servers, security cameras etc., to keep them running at all times, or reboot if they don't. Not so much for desktop applications, where you have much more control through a keyboard and console.

This does not offer the same variety of protection options that the watchdog package offers, but wait, there's more that systemd has to offer.

The next step is to configure protection for the next level "up", the user application level, (or services in systemd speak) by using the systemd software watchdog interface system. I'll get to that in a next post.

Please chime in if you feel you can contribute.
Last edited by paulv on Mon Jan 09, 2017 10:19 am, edited 11 times in total.
by paulv » Fri May 13, 2016 7:59 am
This is a follow-up to the first post.
There is a third solution to get the software watchdog package installed across a reboot.

I wanted to address that separately from the /etc/rc.local and crontab solutions in the first post, because this solution makes a change to the package installation that may have consequences I cannot foresee.

In the first post I showed that the software watchdog refuses to install in the boot process by following the old and proven installation routine.

If you try to install the software watchdog package into the boot sequence the systemd way, with
Code: Select all
sudo systemctl enable watchdog
you get this :
Synchronizing state for watchdog.service with sysvinit using update-rc.d...
Executing /usr/sbin/update-rc.d watchdog defaults
Executing /usr/sbin/update-rc.d watchdog enable
The unit files have no [Install] section. They are not meant to be enabled
using systemctl.
Possible reasons for having this kind of units are:
1) A unit may be statically enabled by being symlinked from another unit's
.wants/ or .requires/ directory.
2) A unit's purpose may be to act as a helper for some other unit which has
a requirement dependency on it.
3) A unit may be started when needed via activation (socket, path, timer,
D-Bus, udev, scripted systemctl call, ...).
I think this is confusing and not very helpful information for the typical RPi users. The last three items on this list deal with the absence of the [Install] section of the service file, but that is very general information and not much help for what we want to do.

Here is what is in the watchdog.service file that is located in /lib/systemd/system/
[Unit]
Description=watchdog daemon
Conflicts=wd_keepalive.service
After=multi-user.target
OnFailure=wd_keepalive.service

[Service]
Type=forking
EnvironmentFile=/etc/default/watchdog
ExecStartPre=/bin/sh -c '[ -z "${watchdog_module}" ] || [ "${watchdog_module}" = "none" ] || /sbin/modprobe $watchdog_module'
ExecStart=/bin/sh -c '[ $run_watchdog != 1 ] || exec /usr/sbin/watchdog $watchdog_options'
ExecStopPost=/bin/sh -c '[ $run_wd_keepalive != 1 ] || false'

[Install]
So the enable complaint "The unit files have no [Install] section. They are not meant to be enabled using systemctl." is factually incorrect, because there is an Install section, but it's empty. The statement "They are not meant to be enabled using systemctl" is misleading, and the rest is very general, but not helpful. In any case, the net-net is that systemd does not know when to install it in the boot sequence. It would be very helpful for the typical RPi audience if the text could be changed to give a pointer on what to do to make it active. Or by a comment in the .service file.

To help systemd to install the software watchdog in the boot process (enable it), all we need to do is to tell it when we want to have it installed. The most simple method is to wait until the system switches to the multi-user level, which is pretty much at the end of the boot process. If you look in the [Unit] section, you see that there is another hint that this is indeed the correct point in time:
After=multi-user.target

This is how you do that:

1. Quick and Dirty Method:
Modify the original file.
Code: Select all
sudo nano /lib/systemd/system/watchdog.service
And add this line after the [Install] section :
Code: Select all
WantedBy=multi-user.target
That's all. However, you may lose this setting when you do an update/upgrade.

2. Preserve the Settings:
First copy the file from the original location to a place where user defined service files are to be located.
Code: Select all
sudo cp /lib/systemd/system/watchdog.service /etc/systemd/system/
And make the change in this file. If systemd finds a file in this directory it is loaded instead of the original service file.

3. The Recommended Systemd Method
Create a so called "drop in" file that only adds or changes the settings you want. To create this drop-in file location, do this:
Code: Select all
sudo mkdir /etc/systemd/system/watchdog.service.d
Then create a config file with only the changed or new parameters, in our case only the install section.
Code: Select all
sudo nano /etc/systemd/system/watchdog.service.d/local.conf
Add this to the open editor:
Code: Select all
[Install]
WantedBy=multi-user.target
Save and close the editor.
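
If you need to repeat this on several Pis, the same two steps (mkdir plus the small config file) can be scripted. Below is a minimal sketch in Python 3; the function name and the base_dir parameter are my own invention so that it can be tried in a scratch directory first. Writing to the real /etc/systemd/system requires root.

```python
import os

# The same two lines that go into local.conf in the editor step above
DROP_IN_TEXT = "[Install]\nWantedBy=multi-user.target\n"


def write_drop_in(unit_name, base_dir="/etc/systemd/system", file_name="local.conf"):
    """Create <base_dir>/<unit_name>.d/<file_name> and return its path."""
    drop_dir = os.path.join(base_dir, unit_name + ".d")
    os.makedirs(drop_dir, exist_ok=True)  # no error if the directory already exists
    path = os.path.join(drop_dir, file_name)
    with open(path, "w") as handle:
        handle.write(DROP_IN_TEXT)
    return path
```

Called as write_drop_in("watchdog.service") with root rights, it should produce the same file as the manual steps.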

Try that by updating the daemon setup files:
Code: Select all
sudo systemctl daemon-reload
And then
Code: Select all
sudo systemctl enable watchdog
Now you only get this :
Synchronizing state for watchdog.service with sysvinit using update-rc.d...
Executing /usr/sbin/update-rc.d watchdog defaults
insserv: warning: current start runlevel(s) (empty) of script `watchdog' overrides LSB defaults (2 3 4 5).
insserv: warning: current stop runlevel(s) (0 1 2 3 4 5 6) of script `watchdog' overrides LSB defaults (0 1 6).
Executing /usr/sbin/update-rc.d watchdog enable
Let's ask for a status report:
Code: Select all
sudo systemctl status watchdog
And you get this :
● watchdog.service - watchdog daemon
Loaded: loaded (/lib/systemd/system/watchdog.service; enabled)
Active: inactive (dead)
OK, ready for action. Let's start it now to test it.
Code: Select all
sudo systemctl start watchdog
And ask for a status report again
Code: Select all
sudo systemctl -l status watchdog
(note the -l flag to avoid truncation) Now we get this :
● watchdog.service - watchdog daemon
Loaded: loaded (/lib/systemd/system/watchdog.service; enabled)
Active: active (running) since Fri 2016-05-20 09:33:02 CEST; 1min 14s ago
Process: 705 ExecStart=/bin/sh -c [ $run_watchdog != 1 ] || exec /usr/sbin/watchdog $watchdog_options (code=exited, status=0/SUCCESS)
Process: 702 ExecStartPre=/bin/sh -c [ -z "${watchdog_module}" ] || [ "${watchdog_module}" = "none" ] || /sbin/modprobe $watchdog_module (code=exited, status=0/SUCCESS)
Main PID: 707 (watchdog)
CGroup: /system.slice/watchdog.service
└─707 /usr/sbin/watchdog

May 20 09:33:02 raspi-server watchdog[707]: int=1s realtime=yes sync=no soft=no mla=24 mem=1
May 20 09:33:02 raspi-server watchdog[707]: ping: 192.168.1.1
May 20 09:33:02 raspi-server watchdog[707]: file: no file to check
May 20 09:33:02 raspi-server watchdog[707]: pidfile: no server process to check
May 20 09:33:02 raspi-server watchdog[707]: interface: no interface to check
May 20 09:33:02 raspi-server watchdog[707]: temperature: no sensors to check
May 20 09:33:02 raspi-server watchdog[707]: test=none(0) repair=none(0) alive=/dev/watchdog heartbeat=none to=root no_act=no force=no
May 20 09:33:02 raspi-server watchdog[707]: watchdog now set to 15 seconds
May 20 09:33:02 raspi-server watchdog[707]: hardware watchdog identity: Broadcom BCM2835 Watchdog timer
May 20 09:33:02 raspi-server systemd[1]: Started watchdog daemon.
Ok, while we're at it, try the fork bomb to force a reboot.

After the reboot, check the status again to see if it indeed loaded and is active:
Code: Select all
sudo systemctl status watchdog

So although it seems to be working, I'm not sure if this is the right way or the intended way to activate the software watchdog package. It seems overly vague and even misleading. Why does the watchdog.service file not have a complete [Install] section? That would really solve the problems, so is this a bug (oversight) or a feature? Or do we have it completely wrong?

If you know more about the "correct" way to install the software watchdog, please chime in.
Last edited by paulv on Sun May 29, 2016 4:43 am, edited 3 times in total.
by paulv » Fri May 13, 2016 2:25 pm
If you want to use the systemd method of using a software watchdog to add control to your own application program, you can use the following method to implement that.

As I showed in an earlier post, you use the hardware BCM watchdog system to reboot the RPi when the kernel gets unresponsive, or when systemd is no longer operational.

A higher level of control can be added by a software watchdog. Systemd provides that, plus an interface to implement it.
The combination of the two provides the Supervisor chain (in systemd speak).

OK, so what do you need to do?
There are two steps.

1. You need to provide a service configuration file for systemd to instruct it what to do.
2. You need to add a few things to your own application to make it all work in this environment.

In essence, you ask systemd to initiate a software watchdog, and your application needs to "ping" it at regular intervals. If the application fails to do that, systemd will take action.

I wrote a service file that will let you test a number of elements.
Code: Select all
# This service installs a python test program that allows us to test the
# systemd software watchdog. This watchdog can be used to protect from hangups.
# On top of that, when the service crashes, it is automatically restarted.
# If it crashes too many times, it will be forced to fail, or you can let systemd reboot
#

[Unit]
Description=Installing Python test script for a systemd s/w watchdog
Requires=basic.target
After=multi-user.target

[Service]
Type=notify
WatchdogSec=10s
ExecStart=/usr/bin/python /home/pi/systemd-test.py
Restart=always

# The number of times the service is restarted within a time period can be set
# If that condition is met, the RPi can be rebooted
#
StartLimitBurst=4
StartLimitInterval=180s
# actions can be none|reboot|reboot-force|reboot-immediate
StartLimitAction=none

# The following are defined in the /etc/systemd/system.conf file and are
# global for all services
#
#DefaultTimeoutStartSec=90s
#DefaultTimeoutStopSec=90s
#
# They can also be set on a per-service basis here:
# if they are not defined here, they fall back to the system.conf values
TimeoutStartSec=2s
TimeoutStopSec=2s

[Install]
WantedBy=multi-user.target
Details can be found if you look for systemd.service(5)
I also wrote a Python script that lets you play with this system and experiment to your heart's delight.
Code: Select all
#!/usr/bin/python2.7
#-------------------------------------------------------------------------------
# Name:        systemd daemon & watchdog test file
# Purpose:
#
# Author:      paulv
#
# Created:     07-05-2016
# Copyright:   (c) paulv 2016
# Licence:     <your licence>
#-------------------------------------------------------------------------------

import sys
import os
from time import sleep
import signal
import subprocess
import socket

init = True

def sd_notify(unset_environment, s_cmd):

    """
    Notify service manager about start-up completion and to kick the watchdog.

    https://github.com/kirelagin/pysystemd-daemon/blob/master/sddaemon/__init__.py

    This is a reimplementation of systemd's reference sd_notify().
    sd_notify() should be used to notify the systemd manager about the
    completion of the initialization of the application program.
    It is also used to send watchdog ping information.

    """
    global init

    sock = None

    try:
        if not s_cmd:
            sys.stderr.write("error : missing s_cmd\n")
            return(1)

        s_adr = os.environ.get('NOTIFY_SOCKET', None)
        if init : # report this only one time
            sys.stderr.write("Notify socket = " + str(s_adr) + "\n")
            # this will normally return : /run/systemd/notify
            init = False

        if not s_adr:
            sys.stderr.write("error : missing socket\n")
            return(1)

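        sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        # sendto() returns the number of bytes sent; send only once, otherwise
        # systemd receives every notification twice
        # in the original code, the return was tested against > 0 ???
        if sock.sendto(s_cmd, s_adr) == 0:
            sys.stderr.write("error : incorrect sock.sendto  return value\n")
            return(1)
    except Exception as e:
        # report the problem instead of silently claiming success
        sys.stderr.write("error : sd_notify failed : " + str(e) + "\n")
        return(1)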
    finally:
        # terminate the socket connection
        if sock:
            sock.close()
        if unset_environment:
            if 'NOTIFY_SOCKET' in os.environ:
                del os.environ['NOTIFY_SOCKET']
    return(0) # OK


def sig_handler (signum=None, frame = None):
    """
    This function will catch the most important system signals, but NOT a shutdown!
    During testing, you can use this code to see what termination methods are used or filter
    some out.

    This handler catches the following signals from the OS:
        SIGHUP = (1) SSH Terminal logout
        SIGINT = (2) Ctrl-C
        SIGQUIT = (3) ctrl-\
        IOerror = (5) when terminating the SSH connection (input/output error)
        SIGTERM = (15) Daemon terminate (daemon --stop): is coming from daemon manager
    However, it cannot catch SIGKILL = (9), the kill -9 or the shutdown procedure
    """

    try:
        print "\nSignal handler called with signal : {0}".format(signum)
        if signum == 1 :
            sys.stderr.write("Sighandler: ignoring SIGHUP signal : " + str(signum) + "\n")
            return # ignore SSH logout termination
        sys.stderr.write("terminating : python test script\n")
        sys.exit(1)

    except Exception as e: # IOerror 005 when terminating the SSH connection
        sys.stderr.write("Unexpected Exception in sig_handler() : "+ str(e) + "\n")
        subprocess.call(['logger "Unexpected Exception in sig_handler()"'], shell=True)
        return



def main():

    # setup a catch for the following termination signals: (signal.SIGINT = ctrl-c)
    for sig in (signal.SIGTERM, signal.SIGINT, signal.SIGHUP, signal.SIGQUIT):
        signal.signal(sig, sig_handler)

    # get the timeout period from the systemd-test.service file
    wd_usec = os.environ.get('WATCHDOG_USEC', None)
    # the environment value is a string, so convert it before comparing with 0
    if wd_usec is None or int(wd_usec) == 0:
        sys.stderr.write("terminating : incorrect watchdog interval sequence\n")
        exit(1)

    wd_usec = int(wd_usec)
    # use half the time-out value in seconds for the kick-the-dog routine to
    # account for Linux housekeeping chores
    wd_kick = wd_usec / 1000000 / 2
    sys.stderr.write("watchdog kick interval = " + str(wd_kick) + "\n")

    try:
        sys.stderr.write("starting : python daemon watchdog and fail test script started\n")
        # notify systemd that we've started
        retval = sd_notify(0, "READY=1")
        if retval <> 0:
            sys.stderr.write("terminating : fatal sd_notify() error for script start\n")
            exit(1)

        # after the init, ping the watchdog and check for errors
        retval = sd_notify(0, "WATCHDOG=1")
        if retval <> 0:
            sys.stderr.write("terminating : fatal sd_notify() error for watchdog ping\n")
            exit(1)

        ctr = 0 # setup a counter to initiate a watchdog fail
        while True :
            if ctr > 5 :
                sys.stderr.write("forcing watchdog fail, restarting service\n")
                sleep(20)

            sleep(wd_kick)
            sys.stderr.write("kicking the watchdog : ctr = " + str(ctr) + "\n")
            sd_notify(0, "WATCHDOG=1")
            ctr += 1


    except KeyboardInterrupt:
        print "\nTerminating by Ctrl-C"
        exit(0)


if __name__ == '__main__':
    main()
The comments should give you an idea of what is needed. In a nutshell, the application needs to signal systemd that it has finished the initialization. At regular intervals, the software watchdog is updated. There is a fail condition in the code that will mimic a hung application.
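
For readers on Python 3, the notification step can be sketched more compactly as shown below. This is not a drop-in replacement for the script above, just an illustration of the same idea: read NOTIFY_SOCKET, send one datagram, and derive the kick interval from WATCHDOG_USEC. The function names and the optional address parameter are mine, and the "@" handling assumes the notify socket may be given as an abstract-namespace address.

```python
import os
import socket


def sd_notify(message, address=None):
    """Send one datagram to the systemd notify socket; return True if sent."""
    address = address or os.environ.get("NOTIFY_SOCKET")
    if not address:
        return False  # not started by systemd as a Type=notify service
    if address.startswith("@"):
        address = "\0" + address[1:]  # abstract-namespace socket
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
            sock.sendto(message.encode("utf-8"), address)
    except OSError:
        return False  # socket missing or not reachable
    return True


def kick_interval_seconds(watchdog_usec):
    """Half of WATCHDOG_USEC, converted from microseconds to seconds."""
    return int(watchdog_usec) / 1000000.0 / 2.0
```

A service loop would call sd_notify("READY=1") once after start-up and then sd_notify("WATCHDOG=1") every kick_interval_seconds(os.environ["WATCHDOG_USEC"]) seconds.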

Here is how you install and test this all.
Open an editor:
Code: Select all
nano systemd-test.service
Copy and paste the service code above into the editor. Save the file and close the editor. Copy this file into the systemd structure with :
Code: Select all
sudo cp systemd-test.service /etc/systemd/system

Open an editor again:
Code: Select all
nano systemd-test.py
Copy and paste the Python code above into the editor. Save the file and close the editor. Make the python script executable :
Code: Select all
chmod +x systemd-test.py
Run the service script in the systemd environment :
Code: Select all
sudo systemctl start systemd-test
Watch what is going on with
Code: Select all
tail -f /var/log/syslog

After 4 failures and automatic restarts of the Python script, systemd puts the service in a failed state. You can also let the RPi reboot when this happens: all you need to do is change StartLimitAction=none to StartLimitAction=reboot in the systemd-test.service file.

If you would like to test the application within the boot process, run this :
Code: Select all
sudo systemctl enable systemd-test
After a reboot, you can again watch it all by using the above tail command again.
If you decide to change the Python script, you can do that while the system is running. At the next restart, the new code is automatically loaded and executed. If you want to change parameters in the .service file, you can do that too, but you need to activate and reload those changes. You do that with
Code: Select all
sudo systemctl daemon-reload
and then
Code: Select all
sudo systemctl restart systemd-test
I had great fun discovering all the possibilities systemd now offers me to add better control to my own scripts.

Please chime in if you have improvements or suggestions!

Enjoy!
Posts: 519
Joined: Tue Jan 15, 2013 12:10 pm
Location: Netherlands
by paulv » Wed May 18, 2016 7:55 pm
I have not received any comments or replies to this post, nor on a request on stackoverflow.
http://stackoverflow.com/questions/37144547/setup-issues-using-the-hw-watchdog-with-systemd

I filed a bug report hoping that will lead to a solution or clarification.
https://bugs.launchpad.net/raspbian/+bug/1582707
Posts: 519
Joined: Tue Jan 15, 2013 12:10 pm
Location: Netherlands
by sskn » Sun Jun 05, 2016 5:20 am
Thanks a lot!
I successfully activated the watchdog timer using the simple method (2nd post).
I was confused because I couldn't install and activate the watchdog package using the method for older versions of Raspbian.
Posts: 1
Joined: Sun Jun 05, 2016 5:14 am
by ejolson » Mon Jun 06, 2016 8:08 am
paulv wrote:I have not received any comments or replies to this post, nor on a request on stackoverflow.
http://stackoverflow.com/questions/37144547/setup-issues-using-the-hw-watchdog-with-systemd

I filed a bug report hoping that will lead to a solution or clarification.
https://bugs.launchpad.net/raspbian/+bug/1582707
This is a very interesting post. While reading the first part one starts to sympathize with the graybeards who claim systemd is trying to take over the world and ruin it. The second post then points out how nicely systemd directly interfaces with the watchdog hardware.

Eric Raymond said, "...it looks like Plan 9 failed simply because it fell short of being a compelling enough improvement on Unix to displace its ancestor." At this point in time systemd has replaced the System V init daemon in almost all major Linux distributions. Hopefully the resulting improvement will be worth the hardships caused.
Posts: 1000
Joined: Tue Mar 18, 2014 11:47 am
by torekk » Tue Jun 07, 2016 6:09 pm
Thank you for this.

However, I noticed that when I shut down the Pi manually, it doesn't reboot itself after 10 seconds. Or is that what the Shutdown setting is for, seeing as that's set to 10 minutes?

Anyways that's actually what I wanted, watchdog to reboot the Pi once it freezes, but not if I shut it down manually.
Posts: 12
Joined: Mon May 16, 2016 1:07 am
by onefastt997 » Sat Sep 03, 2016 1:53 am
Thanks for this. It doesn't power back up when I do sudo poweroff but it definitely recovers from the fork bombs.
Posts: 1
Joined: Sat Sep 03, 2016 1:51 am
by Samweis » Mon Sep 26, 2016 8:21 am
Thanks for this!

For whatever reason the file /etc/systemd/system/watchdog.service.d/local.conf did not work for me,
but
Code: Select all
ln -s /lib/systemd/system/watchdog.service /etc/systemd/system/multi-user.target.wants/

did the trick.
Posts: 1
Joined: Mon Sep 26, 2016 8:07 am
by paulv » Mon Sep 26, 2016 5:21 pm
Hi Samweis,

This is most likely the appropriate way of getting the watchdog to behave with systemd.
I found that out using another package, so this is probably the way to go.

Thanks for contributing!

Paul
Posts: 519
Joined: Tue Jan 15, 2013 12:10 pm
Location: Netherlands
by shalvan » Sun Oct 02, 2016 5:25 am
Hello there,

I have a problem with the watchdog on a Raspberry Pi B+ 512MB with a fresh Jessie install.
I did everything like you, but when I run the command:

Code: Select all
 sudo systemctl start watchdog


nothing happens

When I was using Wheezy the watchdog worked like a charm, and now it doesn't :(
Posts: 4
Joined: Sat Oct 01, 2016 3:11 pm
by paulv » Sun Oct 02, 2016 12:50 pm
shalvan,

You're not very specific.
What does the status report tell you? Can you post that?
Posts: 519
Joined: Tue Jan 15, 2013 12:10 pm
Location: Netherlands
by shalvan » Sun Oct 02, 2016 1:02 pm
this is my status:

Code: Select all
pi@RasPi:~ $ sudo systemctl status watchdog
● watchdog.service - watchdog daemon
   Loaded: loaded (/lib/systemd/system/watchdog.service; enabled)
  Drop-In: /etc/systemd/system/watchdog.service.d
           └─local.conf
   Active: inactive (dead)


when i start with :
Code: Select all
sudo systemctl start watchdog

all I see is a blinking cursor one line below my command and nothing else happens. It looks as if the command is trying to do something, but nothing comes of it.
Posts: 4
Joined: Sat Oct 01, 2016 3:11 pm
by paulv » Sun Oct 02, 2016 1:14 pm
Are you on the same software version that I used to report my post?
Linux raspi-svr 4.4.9+ #884 Fri May 6 17:25:37 BST 2016 armv6l GNU/Linux


If you are on a newer version, you can thank the powers that be for yet another change of the goalposts, and I can't help you at this moment. Maybe somebody else already knows what changed this time...

Note that nobody from the Foundation chimed in to help with my post or clarify what is going on. :(

Good luck!
Posts: 519
Joined: Tue Jan 15, 2013 12:10 pm
Location: Netherlands
by shalvan » Sun Oct 02, 2016 1:23 pm
My version is:

Code: Select all
Linux RasPi 4.4.17+ #902 Mon Aug 15 12:17:32 BST 2016 armv6l GNU/Linux


but I have been experiencing this problem since February, which is when I switched to Jessie.
Posts: 4
Joined: Sat Oct 01, 2016 3:11 pm
by gkoper » Tue Oct 11, 2016 5:19 pm
Dear paulv

Thanks for your extensive description of the results of your research. Impressive!

I did some tests on a more recent version of Raspbian, the version is Jessie Lite
Code: Select all
Linux TestPi2 4.4.21+ #911 Thu Sep 15 14:17:52 BST 2016 armv6l GNU/Linux

As you already surmised, things would be different for later versions and indeed they are!

First of all, the watchdog device is already loaded from the start, although there is no entry in the /boot/config.txt file.

I had a preference for the second method using the systemd. The entries could indeed be set in system.conf file and after booting one could find the watchdog records in the syslog. So far so good. However, neither the simple test involving poweroff nor a fork bomb would make the watchdog bark and cause a reboot! I did find out, that the /dev/watchdog was allocated after setting a nonzero RuntimeWatchdogSec in system.conf though: apparently something had happened! Since I did not have a clue how to proceed, I left this issue alone.

I briefly turned to the first option, using the watchdog package, but I stopped as it would not serve my purpose. I would like to monitor a program I wrote to continuously send weather reports to wunderground.com. It already runs for weeks without problems. However, sometimes Raspbian causes a problem that hangs the program, or there is a hardware issue. Both are typically resolved by rebooting, but so far that had to be done manually.

I found out that the method (http://binerry.de/post/28263824530/raspberry-pi-watchdog-timer) described by Binerry actually can be used still despite being 4 years old. The only requirement is to have free access to /dev/watchdog so that means that in system.conf the parameter RuntimeWatchdogSec should be set to zero.
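To illustrate the idea of feeding /dev/watchdog directly, here is a minimal Python 3 sketch. It is not Binerry's code (that article is not reproduced in this thread); the function name pet_watchdog and its parameters are invented for this example. It relies on the standard Linux watchdog device behaviour: opening the device arms the timer, any write resets it, and writing the character 'V' just before closing asks the driver to disarm, provided the driver supports this "magic close". It needs root and exclusive access to the device.

```python
import time


def pet_watchdog(device="/dev/watchdog", interval=5.0, kicks=3, sleep=time.sleep):
    """Open the watchdog device, feed it `kicks` times, then close cleanly.

    `interval` must be shorter than the hardware timeout, otherwise the
    board resets between two writes.
    """
    # buffering=0 so that every write reaches the driver immediately
    with open(device, "wb", buffering=0) as wd:
        for _ in range(kicks):
            wd.write(b"\0")  # any write resets the hardware timer
            sleep(interval)
        # Magic close: 'V' asks the driver to stop the timer on close, so a
        # deliberate exit does not trigger a reboot (if the driver allows it).
        wd.write(b"V")
```

In a real monitor the loop would run for as long as the watched program is healthy, and would simply stop writing when it is not, which lets the hardware reset the board.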

Given your experience in trying to get more information on Raspbian issues, added to my own ("this may be resolved in a later version ..."), I think that in the short run the Binerry method is the preferred one. Nevertheless, I would like to know how to deal properly with the "second" method involving systemd. This is the standard Linux method of dealing with the watchdog, but maybe it is still under construction? I hope this invites some useful comments.

gkoper
Posts: 12
Joined: Sun Mar 10, 2013 2:53 pm
by paulv » Tue Oct 11, 2016 7:02 pm
Hi gkoper,

Thanks for contributing to this post.
My own experience, after several years of working with the RPi's, is that the RPi environment is still very much a work in progress (no, that's a bit too generous, let's call it process). As other users regularly comment, the goal posts seem to get moved just about every update. What works with one release may not work with the next, and the forums are full of users that are left dangling.

Alas, back to the topic. I have found, through another project of mine, that the watchdog is initiated automatically by systemd as soon as you invoke the following in a service setup, as an example:
Code: Select all
[Service]
# Add restart options
Restart=always
RestartSec=5
StartLimitBurst=4
StartLimitInterval=180s

# Add optional rebooting options:
StartLimitAction=reboot

This makes sense, but unfortunately, this is not mentioned anywhere, at least not as far as I could find, but that's not much of a guarantee either.
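To make the restart options in the example above a bit more concrete, here is a simplified Python model of how StartLimitBurst and StartLimitInterval work together as a rate limiter, as far as I understand it. The class name StartLimit is made up for this sketch and the real systemd logic may differ in details; it only shows why, with a burst of 4 and an interval of 180 seconds, the fifth start inside the window is refused (which is when StartLimitAction kicks in).

```python
class StartLimit(object):
    """Simplified model of StartLimitBurst / StartLimitInterval."""

    def __init__(self, burst=4, interval=180.0):
        self.burst = burst          # starts allowed per window
        self.interval = interval    # window length in seconds
        self.window_start = None
        self.count = 0

    def allow_start(self, now):
        """Return True if a (re)start at time `now` (seconds) is allowed."""
        if self.window_start is None or now - self.window_start > self.interval:
            # First start ever, or the previous window has expired.
            self.window_start = now
            self.count = 1
            return True
        if self.count < self.burst:
            self.count += 1
            return True
        return False  # limit hit: this is where StartLimitAction applies


limiter = StartLimit(burst=4, interval=180.0)
results = [limiter.allow_start(t) for t in (0, 25, 50, 75, 100, 300)]
```

With these numbers the starts at 0, 25, 50 and 75 seconds are allowed, the one at 100 seconds is refused, and a start at 300 seconds is allowed again because a new window has begun.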

I guess we have to wait until the fog clears, or when enough users start to complain to raise the issue.

If that sounds skeptical, it is caused by the growing frustration of working with the RPi's for several years now and contributing many posts and designs that continuously need tweaking, fixing and updating. As a result, I switched platforms and interest levels and so the Pi has taken a back-seat.

Good luck!
Last edited by paulv on Mon Jan 09, 2017 10:48 am, edited 1 time in total.
Posts: 519
Joined: Tue Jan 15, 2013 12:10 pm
Location: Netherlands
by gkoper » Wed Oct 12, 2016 7:43 am
Thanks paulv

I will have a look later.

Regarding any support, please note that upon startup Raspbian says:
Code: Select all
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.

In other words, no promises made.

gkoper
Posts: 12
Joined: Sun Mar 10, 2013 2:53 pm
by no3rpi » Fri Mar 31, 2017 12:12 pm
paulv I want to thank you for investigating and solving this really important issue with watchdog ( at least for me ).

I am using an RPi 3 as a private email server and CCTV monitor, headless in a remote location. Unfortunately, in the 2 years until now I have had several incidents where it stopped responding to any network traffic (ssh, tcp, udp ...) and had to be hard reset by hand, despite using monit and other redundant solutions to allow me to access it remotely.

Today after last incident I implemented your solution from second post and I want to confirm that until now it seems to work ok for this raspberry pi 3:
Code: Select all
uname -a:
Linux rpi3 4.4.50-v7+ #970 SMP Mon Feb 20 19:18:29 GMT 2017 armv7l GNU/Linux
/opt/vc/bin/vcgencmd version:
Mar  3 2017 13:43:37
Copyright (c) 2012 Broadcom
version 9ae30f71c7ef4239e9d5b56346c0842f3ef56736 (clean) (release)
I edited:
Code: Select all
/etc/systemd/system.conf with:
RuntimeWatchdogSec=10
ShutdownWatchdogSec=10min

reboot and tested: cat /var/log/syslog | grep watchdog
Code: Select all
Mar 30 11:17:05 rpi3 kernel: [    8.096998] bcm2835-wdt 3f100000.watchdog: Broadcom BCM2835 watchdog timer
Mar 31 10:06:52 rpi3 kernel: [    6.056467] bcm2835-wdt 3f100000.watchdog: Broadcom BCM2835 watchdog timer
Mar 31 10:06:52 rpi3 systemd[1]: Hardware watchdog 'Broadcom BCM2835 Watchdog timer', version 0
Mar 31 10:06:52 rpi3 systemd[1]: Set hardware watchdog to 10s.
Mar 31 10:06:53 rpi3 systemd[1]: Hardware watchdog 'Broadcom BCM2835 Watchdog timer', version 0
Mar 31 10:06:53 rpi3 systemd[1]: Set hardware watchdog to 10s.
Mar 31 10:06:53 rpi3 kernel: [    6.797667] bcm2835-wdt 3f100000.watchdog: Broadcom BCM2835 watchdog timer
I tested with your fork bomb script and the RPi 3 restarts OK;
for the moment I have not tested whether it is restarted automatically after "shutdown -h now".

I hope the watchdog solution will solve the hanging problem from now on...

thank you.
Posts: 8
Joined: Fri Mar 31, 2017 11:44 am
by paulv » Fri May 19, 2017 3:10 pm
As was almost to be expected, the description I wrote above no longer works.
The proverbial goal posts have been moved once again by the powers in the tower.

Is there anybody that can shed some light on this obscure procedure?

A few pointers.
The hardware watchdog devices are now always listed, regardless of the dt setting in config.txt
The dt setting is required to "really" activate these devices. - There is no entry in the syslog that the watchdog is activated, and the entry in syslog does not change if you add the activation - this seems an oversight or a bug.
The watchdog service file (/lib/systemd/system/watchdog.service) is no longer there.
Syslog no longer reports the activation of the systemd watchdog settings - this seems to be a bug to me.
There is no longer any termination information going to the console if systemd or the watchdog terminates. There is a console reset and a new boot. This also seems to be a bug to me.
It looks like it is no longer possible to cause an automatic reboot after a shutdown/halt/poweroff, like before. - This is either an intentional "fix", without any documentation, or a bug, or a new method is required, and that needs documentation - please.

What still works:
If the watchdog is activated in config.txt and the systemd watchdog is activated in the /etc/systemd/system.conf file, the forkbomb still reboots the Rpi.
Installing a systemd service that watches over a user script, using the notify option, as I showed in an earlier post still works and all that activity is (still) reported in the syslog.
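As a quick way to check the first of these two conditions, here is a small Python 3 sketch that looks at the text of both files. It assumes the config.txt line is dtparam=watchdog=on (the exact line is not shown in this part of the thread) and that a RuntimeWatchdogSec value of 0 or an empty value means "disabled". The function names are invented for this example.

```python
def read_setting(text, key):
    """Return the last uncommented `key=value` value in text, or None."""
    value = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#") or "=" not in line:
            continue
        name, _, val = line.partition("=")
        if name.strip() == key:
            value = val.strip()
    return value


def watchdog_configured(config_txt, system_conf):
    """True if both the dt setting and the systemd watchdog look active."""
    # Assumption: the hardware watchdog is enabled with this exact line.
    dt_on = any(line.strip() == "dtparam=watchdog=on"
                for line in config_txt.splitlines())
    runtime = read_setting(system_conf, "RuntimeWatchdogSec")
    return dt_on and runtime not in (None, "", "0")
```

On a Pi you would pass in the contents of /boot/config.txt and /etc/systemd/system.conf, for example watchdog_configured(open("/boot/config.txt").read(), open("/etc/systemd/system.conf").read()).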

Help!
Last edited by paulv on Sat May 20, 2017 11:12 am, edited 6 times in total.
Posts: 519
Joined: Tue Jan 15, 2013 12:10 pm
Location: Netherlands
by no3rpi » Fri May 19, 2017 3:49 pm
Yes it is true and really bad news.
Code: Select all
uname -a
Linux rpi3 4.9.24-v7+ #993 SMP Wed Apr 26 18:01:23 BST 2017 armv7l GNU/Linux

Code: Select all
cat /var/log/syslog | grep watchdog

nothing to display from log... $#&!
Posts: 8
Joined: Fri Mar 31, 2017 11:44 am
by no3rpi » Mon May 22, 2017 1:41 pm
@paulv
Editing a previous post is not a good idea if you add new or updated info, because there is no notification for users following this thread.
At the least, please add a final message saying which post was updated if you discovered or changed something important.

By chance I re-read this whole thread today and found that I also need to have the watchdog activated in config.txt, besides the /etc/systemd/system.conf file, which is all I had used until now...

now I can see in log:
Code: Select all
cat syslog | grep watchdog
May 22 16:20:58 rpi3 kernel: [    0.813864] bcm2835-wdt 3f100000.watchdog: Broadcom BCM2835 watchdog timer

thank you.
Posts: 8
Joined: Fri Mar 31, 2017 11:44 am