7fh3498f
Posts: 28
Joined: Sat Apr 14, 2018 10:09 am

RPi3B+ and watchdog...

Sat Apr 14, 2018 10:53 am

Hi guys

I have been looking for instructions on how to properly set up the built-in watchdog on the RPi 3B+, but I cannot find this kind of information on the internet. Everything I find seems to be for older models. Could anybody kindly point me to instructions for the Raspberry Pi 3 Model B+ on how to set up the watchdog properly? Thank you.

7fh3498f
Posts: 28
Joined: Sat Apr 14, 2018 10:09 am

Re: RPi3B+ and watchdog...

Tue Apr 17, 2018 5:29 pm

Anybody, guys?

epoch1970
Posts: 6924
Joined: Thu May 05, 2016 9:33 am
Location: France

Re: RPi3B+ and watchdog...

Tue Apr 17, 2018 7:13 pm

What makes you think anything has changed?
There is no change related to 3B+ in the source tree for watchdog in the Raspbian repo, and Google seems very quiet, too.
"If there is no solution, it is because there is no problem." Les Shadoks, J. Rouxel

7fh3498f
Posts: 28
Joined: Sat Apr 14, 2018 10:09 am

Re: RPi3B+ and watchdog...

Tue Apr 17, 2018 7:29 pm

Solution:

Code:

sudo modprobe bcm2835_wdt
echo "bcm2835_wdt" | sudo tee -a /etc/modules

sudo apt-get install watchdog
sudo update-rc.d watchdog defaults

sudo nano /etc/watchdog.conf

Uncomment:

Code:

#watchdog-device
#max-load-1

Add:

Code:

watchdog-timeout = 15

Save, reboot, then test with this command (a fork bomb that deliberately overloads the system):

Code:

:(){ :|:& };:

7fh3498f
Posts: 28
Joined: Sat Apr 14, 2018 10:09 am

Re: RPi3B+ and watchdog...

Tue Apr 17, 2018 7:30 pm

epoch1970 wrote:
Tue Apr 17, 2018 7:13 pm
What makes you think anything has changed?
There is no change related to 3B+ in the source tree for watchdog in the Raspbian repo, and google seems very quiet, too.
Just this one little thing:

Code:

bcm2835_wdt

7fh3498f
Posts: 28
Joined: Sat Apr 14, 2018 10:09 am

Re: RPi3B+ and watchdog...

Tue Apr 17, 2018 7:36 pm

One thing still bothers me...

Does anybody know whether the value in

Code:

watchdog-timeout = 15

can be greater than 15? I found some articles where people said it cannot be anything other than 15. Any clues, guys?

epoch1970
Posts: 6924
Joined: Thu May 05, 2016 9:33 am
Location: France

Re: RPi3B+ and watchdog...

Tue Apr 17, 2018 7:41 pm

What do you mean?
The name of this Linux driver hasn't changed in ages because the driver has worked across many releases of hardware.
bcm2835_wdt is also the name of the Linux driver on the "old" Pi 3.

(No, 15 sec is the max timeout; your code has to write at 7-10 sec intervals to be safe. There are a few threads about setting up the watchdog in this forum.)

7fh3498f
Posts: 28
Joined: Sat Apr 14, 2018 10:09 am

Re: RPi3B+ and watchdog...

Tue Apr 17, 2018 8:04 pm

epoch1970 wrote:
Tue Apr 17, 2018 7:41 pm
What do you mean?
The name of this Linux driver hasn't changed in ages because the driver has worked across many releases of hardware.
bcm2835_wdt is also the name of the Linux driver on the "old" Pi 3.

(No, 15 sec is the max timeout; your code has to write at 7-10 sec intervals to be safe. There are a few threads about setting up the watchdog in this forum.)
I found many articles where bcm2708_wdog was used to configure the watchdog. Anyway... thanks for your reply.

gdt
Posts: 85
Joined: Thu Jul 19, 2012 10:19 am

Re: RPi3B+ and watchdog...

Tue Feb 12, 2019 6:00 pm

The watchdog and random number generator seem to be built into the kernel in recent Raspbian (e.g., Linux kernel build 4.14.79-v7+ #1159), so no module needs to be loaded.

Simply edit /etc/systemd/system.conf to include "RuntimeWatchdogSec=14".
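
A minimal sketch of that edit as a script, for anyone who prefers not to do it by hand. The function name set_runtime_watchdog is made up for illustration, GNU sed is assumed, and it is assumed that the stock file carries a commented-out "#RuntimeWatchdogSec=" line under [Manager]:

```shell
#!/bin/sh
# Hypothetical helper (not part of systemd): set RuntimeWatchdogSec in a
# systemd manager config file. $1 = file, $2 = value such as 14 or 10s.
set_runtime_watchdog() {
    conf=$1
    value=$2
    if grep -q '^#\{0,1\}RuntimeWatchdogSec=' "$conf"; then
        # Replace the existing (possibly commented-out) line in place.
        sed -i "s/^#\{0,1\}RuntimeWatchdogSec=.*/RuntimeWatchdogSec=$value/" "$conf"
    else
        # No such line yet: append it (system.conf only has a [Manager] section).
        printf 'RuntimeWatchdogSec=%s\n' "$value" >> "$conf"
    fi
}
```

Run as root it would be called as "set_runtime_watchdog /etc/systemd/system.conf 14", followed by a reboot. Trying it on a copy of the file first is probably wise.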

Barabba
Posts: 58
Joined: Wed Aug 03, 2016 3:49 pm

Re: RPi3B+ and watchdog...

Sat Apr 20, 2019 3:17 pm

Hi mates, I've some questions too. Sorry, I'm a newbie here. Thanks for the support.

So I'll type this in the console:
modprobe bcm2835_wdt
echo "bcm2835_wdt" | sudo tee -a /etc/modules
apt-get install watchdog
update-rc.d watchdog defaults

1) I read somewhere that the watchdog service is already installed and running by default. Is that wrong? Or is this installation an additional one, so that the user can make use of the watchdog?

Then:
nano /etc/watchdog.conf
uncomment #watchdog-device
uncomment #max-load-1
and add: watchdog-timeout = 15

Save, reboot, test by command:
:(){ :|:& };:
2) What does this command do exactly?

3) Now how can I manage the watchdog? I mean, I want to test whether some TCP ports on loopback 127.0.0.1 are still responsive (open), and maybe test whether a file's contents are still changing. How do I do that?
4) Installed this way, how is this watchdog "reloaded" within the 15 seconds?

Thanks a lot!

Barabba
Posts: 58
Joined: Wed Aug 03, 2016 3:49 pm

Re: RPi3B+ and watchdog...

Sat Apr 20, 2019 3:21 pm

Somebody else said to do this:
In /boot/config.txt add/change:

watchdog=on

In /etc/systemd/system.conf, change #RuntimeWatchdogSec= to:

RuntimeWatchdogSec=10s

So which config file should be changed?

epoch1970
Posts: 6924
Joined: Thu May 05, 2016 9:33 am
Location: France

Re: RPi3B+ and watchdog...

Sat Apr 20, 2019 4:45 pm

Barabba wrote:
Sat Apr 20, 2019 3:21 pm
Somebody else said to do this:
In /boot/config.txt add/change:

watchdog=on

In /etc/systemd/system.conf, change #RuntimeWatchdogSec= to:

RuntimeWatchdogSec=10s

So which config file should be changed?
This option is a good one. The first instructions you posted certainly date back a few years; they are indeed obsolete.

epoch1970
Posts: 6924
Joined: Thu May 05, 2016 9:33 am
Location: France

Re: RPi3B+ and watchdog...

Sun Apr 21, 2019 3:10 pm

Barabba wrote:
Sat Apr 20, 2019 3:17 pm
Hi mates, I've some questions too, sorry I'm a newbie here.. thanks for support
Some additional comments hopefully answering Q. 2 to 4.

The general principle of a watchdog.
  • The logic is that if some machine ceases working OK, restarting it will solve the problem until next time.
  • To achieve this, the device that restarts the machine must itself be very unlikely to fail. This is the hardware watchdog device included in the Pi (and in most modern computers). The hardware watchdog is designed to reset the machine, and it fails closed: if it has been activated and something is not quite right, you get an instant reset (restart).
  • So there is a need for another part tasked with preventing the restart; that part runs within the environment you wish to validate. This watchdog manager is a reliable program running in the OS. If the OS fails, that program ceases to indicate "all is right", so the hardware watchdog closes and the machine reboots.
Some implementation concerns.
  • On the Pi, the hardware watchdog has a (fixed?) timeout of 15 seconds. The watchdog manager tells it everything is still OK by writing to /dev/watchdog at least every 15 secs. The two parts of the system run completely independently: they don't start at the same time, and the hardware part counts elapsed time very accurately, while the OS is a bit wobbly because it schedules many tasks and cannot guarantee that repetitive tasks are timed perfectly. So, to be sure the watchdog manager hits the hardware watchdog within its 15-sec window under nominal conditions, it should write at (at least) twice that rate, i.e. every 7 secs.
  • Before systemd existed, the "Linux watchdog", a small and well-written C program, was used as the watchdog manager. That program is still around, but systemd now implements communication with the hardware watchdog itself, so it is simpler to use systemd on Linux OSes that run it, as Raspbian does.
  • Since the watchdog is basically a time bomb, not everybody wants one ticking in their machines... The hardware device doesn't start counting down until the first write to /dev/watchdog has happened. This is what setting "RuntimeWatchdogSec" in /etc/systemd/system.conf does: if it is not zero, systemd writes at twice the indicated rate (every half of RuntimeWatchdogSec), and the time bomb starts ticking.
  • The boot process itself, before systemd has started, is normally not under surveillance by the watchdog. Changing that behaviour is complicated and not a good idea, unless your Pi is a stand-alone appliance mandated to boot in much less than 15 seconds.
    Declaring "watchdog=on" in config.txt (spelled there as the device-tree parameter "dtparam=watchdog=on") merely exposes the watchdog device to the OS; it does not activate it. For many years now, the watchdog driver has been built into the Raspbian kernel, so there is no module to load.
See Lennart Poettering's explanation of how systemd's watchdog manager works here: http://0pointer.de/blog/projects/watchdog
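
To make the "write every 7 secs" idea concrete, here is a rough sketch of the core loop of a watchdog manager. It is an illustration only: the function name feed_watchdog and its three arguments are invented here, and real managers (systemd, the watchdog daemon) do a lot more.

```shell
#!/bin/sh
# Hypothetical sketch of a watchdog manager's core loop.
# $1 = device path, $2 = seconds between writes, $3 = number of writes.
feed_watchdog() {
    dev=$1
    interval=$2
    count=$3
    i=0
    # The redirection on the loop opens the device once and keeps it open.
    # Each write tells the hardware "still alive" and pushes the deadline back.
    while [ "$i" -lt "$count" ]; do
        printf '.'
        i=$((i + 1))
        sleep "$interval"
    done > "$dev"
    # Assumption: once the loop ends nothing feeds the device any more, so an
    # armed hardware watchdog is expected to reset the machine shortly after.
}
```

On a Pi this would be run as root, e.g. "feed_watchdog /dev/watchdog 7 100". Be aware that this arms the watchdog for real, so expect a reboot once the loop finishes. Pointing it at an ordinary file is a harmless way to see what it does.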

Using it in your applications.
  • Defining "RuntimeWatchdogSec" basically guarantees that once systemd has started, if systemd fails, you get a reboot. This is what the "fork bomb" shell script above tests: it is a runaway loop that quickly makes the OS crumble under its own load.
  • Since total OS failure is a condition too broad/unlikely to be useful, you can configure the watchdog manager to check for application-specific conditions, and stop writing to /dev/watchdog when conditions are no longer met. With this, the OS can still be running fine, but you command reboot because some user application deemed critical has failed.
    As presented in the second half of Poettering's article, that configuration is done by adding keywords like "WatchdogSec=", "Restart=", "StartLimitAction=" etc. to the systemd unit of said critical applications.
  • Within the application, provision has to be made for regular calls to the sd_notify library function provided by systemd: if the application calls sd_notify within the "WatchdogSec" delay, then systemd continues to write to /dev/watchdog every half of RuntimeWatchdogSec. If the application fails to call sd_notify, systemd might restart the application, and after enough failed attempts it will execute the configured StartLimitAction, most probably set to reboot the machine.
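
As a rough illustration of those keywords, the sketch below writes an example unit file. Everything specific in it is an assumption: the helper name write_example_unit, the program /opt/example/app, and the chosen values. It also assumes a systemd recent enough (version 230 or later) to accept the StartLimit* settings in the [Unit] section.

```shell
#!/bin/sh
# Hypothetical helper: write an example unit for a critical application to
# the path given as $1. /opt/example/app is a made-up program assumed to call
# sd_notify(0, "READY=1") once and sd_notify(0, "WATCHDOG=1") regularly.
write_example_unit() {
    cat > "$1" <<'EOF'
[Unit]
Description=Example critical application (hypothetical)
# If more than 4 starts happen within 5 minutes, reboot the machine.
StartLimitIntervalSec=5min
StartLimitBurst=4
StartLimitAction=reboot-force

[Service]
Type=notify
ExecStart=/opt/example/app
# The application must ping systemd at least every 30 seconds.
WatchdogSec=30
Restart=on-failure
EOF
}
```

Called as root with "write_example_unit /etc/systemd/system/example.service", followed by "systemctl daemon-reload", this would give a unit that is restarted when it stops pinging, and a reboot when restarting does not help.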
Using it with any application.
  • If you want to harness a simple shell script, or a program not designed to call sd_notify, you can use systemd's systemd-notify CLI utility. Call "systemd-notify WATCHDOG=1" from a shell script and systemd will be able to monitor that script (assuming it is started via a properly configured systemd unit file, e.g. Type=notify).
  • For any external program, you can create a simple shell wrapper designed to start said program and test some condition regularly. If the tests fail, the wrapper stops notifying and whatever is defined in the wrapper's unit file is executed.
A complete example here: https://www.medo64.com/2019/01/systemd- ... y-service/
(NB: there is a slight glitch in it, I think. Program /opt/test/test.sh, line 5 should read "/opt/test/application &" instead of "/opt/test/service &")
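
A small sketch of such a wrapper, using the "is this file still changing" test asked about earlier in the thread. The function names file_is_fresh and notify_while_fresh are invented for this example, GNU stat is assumed, and the wrapper is assumed to be started from a unit with Type=notify and WatchdogSec= set (the systemd-notify man page also says NotifyAccess= has to allow the notifications).

```shell
#!/bin/sh
# Hypothetical health check: succeed if file $1 was modified within the last
# $2 seconds. Assumes GNU stat (stat -c %Y prints the mtime as epoch seconds).
file_is_fresh() {
    [ -e "$1" ] || return 1
    now=$(date +%s)
    mtime=$(stat -c %Y "$1") || return 1
    [ $((now - mtime)) -le "$2" ]
}

# Keep telling systemd "still alive" while the check passes. As soon as it
# fails, the loop ends, notifications stop, and systemd applies whatever the
# unit file configures for a missed WatchdogSec deadline.
# $1 = file to watch, $2 = max age in seconds, $3 = seconds between checks.
notify_while_fresh() {
    systemd-notify --ready
    while file_is_fresh "$1" "$2"; do
        systemd-notify WATCHDOG=1
        sleep "$3"
    done
}
```

The wrapper would start the real program in the background and then call something like "notify_while_fresh /var/run/myapp.heartbeat 60 10" (the path is made up). A TCP port test on 127.0.0.1 could take the place of file_is_fresh in the same loop.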

HTH

FlashT
Posts: 65
Joined: Fri Jul 24, 2015 3:51 pm

Re: RPi3B+ and watchdog...

Mon Jun 17, 2019 10:48 am

Can't get the watchdog daemon to start automatically... any ideas?

Code:

:~ $ sudo update-rc.d watchdog defaults
insserv: script mysql: service mysql already provided!

After reboot, the watchdog status is:

Code:

● watchdog.service - watchdog daemon
   Loaded: loaded (/lib/systemd/system/watchdog.service; static)
   Active: inactive (dead)

bilaliz
Posts: 3
Joined: Tue Jul 09, 2019 6:53 pm

Re: RPi3B+ and watchdog...

Tue Jul 09, 2019 7:30 pm

Say hello to another noob, lol. So is this all we need to do to enable the watchdog? And then

Code:

:(){ :|:& };:

to test?
epoch1970 wrote:
Sat Apr 20, 2019 4:45 pm
Barabba wrote:
Sat Apr 20, 2019 3:21 pm
Somebody else said to do this:
In /boot/config.txt add/change:

watchdog=on

In /etc/systemd/system.conf, change #RuntimeWatchdogSec= to:

RuntimeWatchdogSec=10s

So which config file should be changed?
This option is a good one. The first instructions you posted certainly date back a few years; they are indeed obsolete.

bxdobs
Posts: 16
Joined: Sat Feb 23, 2019 1:07 am

Re: RPi3B+ and watchdog...

Wed Jan 15, 2020 1:50 am

So here is a related question with a twist. A remote mountain-top Pi installation has been running for 6+ months. Today I went to get the ambient air temperature (MCP3004 ADC reading a thermistor), which gave me a reading of -15.9 C. I then attempted to get the CPU temperature (sudo /opt/vc/bin/vcgencmd measure_temp), and this command hung. I killed the process, but vcgencmd was still running even after a kill -9, so I attempted a reboot (sudo shutdown -r now). That was over an hour ago, and this beasty is still silent.

A co-located Pi was able to ping the silent Pi for several minutes after the reboot command, before it went permanently silent. arp -a returns the IP but <incomplete> for the address. This Pi has been configured with both a 15-second hardware watchdog and a crontab watchdog (application monitor).

Question: presuming some or all of the hardware has failed due to extreme cold, is there any hope that the hardware watchdog might keep kicking the reset until it recovers? (presumably, once the weather warms up this weekend > 0 C) or is it more likely that there is a semiconductor junction fracture (permanent failure) due to extreme cold which would have no chance of recovery?

epoch1970
Posts: 6924
Joined: Thu May 05, 2016 9:33 am
Location: France

Re: RPi3B+ and watchdog...

Wed Jan 15, 2020 2:42 pm

The watchdog device is in hardware following the assumption that the hardware platform is more reliable than the software layer.
If your hardware platform is broken, it's game over.

bxdobs
Posts: 16
Joined: Sat Feb 23, 2019 1:07 am

Re: RPi3B+ and watchdog...

Tue Jan 28, 2020 5:32 am

New info

A group of hardy souls snowshoed two hours up the 300 m to the site and power-cycled the RPi. It did not restart. Yet once the RPi was rescued from the mountain top, it powered up with no noticeable issues.

So the big question is what actually failed?

RPI is a stock RPi3B+ (Temp Specs?)
with one MCP3004 ADC chip (Temp Specs -40 to +85C)
powered from a HT05-05102500USL 5.1V 2500mA P/S (Temp Specs?)
Micro SDHC Class 6 8G Memory Card loaded with Raspbian (Temp Specs?)

The device is connected to a router that provides remote control to 3 AC outlets (optoisolated Zero Cross Triacs)

Using crontab as a soft watchdog to monitor a Python continuous-loop process.

Configured a hardware watchdog but didn't find a way to test it. Perhaps it needs a one-time cron job to halt the processor?

Had expected/hoped the hardware watchdog would have reset the Pi, but the fact that it didn't restart even when power-cycled on site suggests there is some other issue. For the HW watchdog to work, does Raspbian need to boot up properly? In other words, is the watchdog set each time the RPi starts, or is this a one-time setting that the RPi remembers even if it doesn't boot successfully?

Thinking the weak link may be the SD card, so I have purchased a Kingston Class 10 card rated for -25 to +85 C.

Any thoughts on how to test properly to ensure this doesn't fail again?

epoch1970
Posts: 6924
Joined: Thu May 05, 2016 9:33 am
Location: France

Re: RPi3B+ and watchdog...

Tue Jan 28, 2020 10:44 am

To test it, run a "fork bomb". The machine will reboot if the watchdog is properly activated.
The watchdog only works if the OS boots fully. Then it keeps running.
AFAIK the hardware device is integrated into the SoC; I don't see how it could fail without the entire Pi failing.

davidcoton
Posts: 6515
Joined: Mon Sep 01, 2014 2:37 pm
Location: Cambridge, UK

Re: RPi3B+ and watchdog...

Tue Jan 28, 2020 10:46 am

Commercial equipment specced for "extreme" temperatures spends time in environmental test chambers, both in the development lab and later at the approvals test house. A week of test time is not uncommon, and has to be repeated if it's not right first time. You may be able to test the cold performance in a repurposed domestic freezer.

When failures occur (field or test) there is still the minor problem of finding what failed. Examining the logs may help, or there may be something obvious like SDCard corruption. Of course a transient card failure will prevent any logging anyway, and in that case it would still be difficult to know whether the cold affected the card or the Pi's ability to access it. Good luck swapping SDCards at sub-zero temperatures!

How is the Pi packaged? Is it possible to insulate the Pi so that its own waste heat keeps it warmer? If you do that, what happens in warmer(?) summer ambient temperatures?
Location: 345th cell on the right of the 210th row of L2 cache

bxdobs
Posts: 16
Joined: Sat Feb 23, 2019 1:07 am

Re: RPi3B+ and watchdog...

Tue Jan 28, 2020 7:20 pm

There is another RPI (Rpi4) at this site that has been running fine in the same subzero conditions ... the crew did report that this RPI was cozy warm so perhaps it's generating more heat than the Rpi3B+

If the RPi HW reset requires the OS to boot, and a manual restart didn't work, then it is probably a waste of time to run GPIO remote resets between the two RPis (to provide the ability to remotely reset one RPi from the other).

Was thinking of throwing the entire package into a deep freeze to simulate the site conditions but first need to rework the downstream controls ... the controls will need to be changed from a NO to NC operation so downstream devices have a better chance of not being affected by another RPi failure ... simple enough just waiting for parts ... (control leds will be rewired so they are always on ... the RPi controls will be rewired so they short out the control leds to allow the downstream devices to be turned off)

joetole
Posts: 1
Joined: Fri Apr 03, 2020 8:46 pm

Re: RPi3B+ and watchdog...

Fri Apr 03, 2020 8:52 pm

You no longer need to modprobe the bcm2835_wdt module or add it to /etc/modules. This is on a rpi3 running Raspbian Buster. I don't know if you needed to on the previous Raspbian release and I haven't tested this on a rpi4.

I first noticed it when I ran modprobe and then lsmod and noticed the module was not appearing as loaded. I removed the modprobe line and disabled the watchdog service. After boot, /dev/watchdog was there automatically already. I know it worked as I tested it by running

Code:

echo 1 | sudo tee /dev/watchdog
and ~15 seconds later the system restarted. This means that once I gave it the initial kick, it started the watchdog timer and rebooted when it didn't receive the next one.
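
Before running a test like that, a harmless pre-check can confirm the device node is actually there. This is only a sketch; the function name watchdog_present is made up, and it reports without ever writing to the device:

```shell
#!/bin/sh
# Hypothetical pre-check: is $1 a character device (as /dev/watchdog should be)?
watchdog_present() {
    [ -c "$1" ]
}

# Safe to run: only reports, never writes to the device.
if watchdog_present /dev/watchdog; then
    echo "watchdog device found"
else
    echo "no watchdog device"
fi
```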

The watchdog service should still be run. You just don't need to worry about any command relating to bcm2835_wdt.

Hope this helps.

epoch1970
Posts: 6924
Joined: Thu May 05, 2016 9:33 am
Location: France

Re: RPi3B+ and watchdog...

Sun Apr 05, 2020 9:02 am

This module has been inlined in the Raspbian kernel for years. No need to load it.
Just configure config.txt so that the platform tells Linux it includes a wd device. Then pick a watchdog manager to be executed in the system: the watchdog package, the one built in systemd, or a custom one.
