Watchdog config help needed

Posted: Sun Nov 04, 2018 9:41 am
by AussieDaveF
Hi,
I have a .sh script running at boot via crontab (because it does some steps that need 20 seconds after boot before they work). The script runs a loop doing checks and sending notifications.

Can the watchdog be set to monitor the .sh script and restart it or reboot if the script fails/stops?
ps -aux returns a new pid after each reboot.

Code: Select all

pi         705  0.4  0.2   4664  2644 ?        S    09:03   0:09 bash /home/username/superTest.sh

What do I need to set in the watchdog.conf file to restart the script or reboot the pi when this script stops running?

Re: Watchdog config help needed

Posted: Sun Nov 04, 2018 10:00 am
by DougieLawson
Use a systemd service file. You can include restarts in that; systemd has a built-in watchdog.

Re: Watchdog config help needed

Posted: Sun Nov 11, 2018 3:51 am
by AussieDaveF
Thanks for the tip Dougie. Systemd does sound like the best option.

I've set it up to run as a systemd unit, and as best I can tell it's running, but the only thing it now does is write to the log file to say it has started; it appears to do nothing else.

The script loops forever; each pass includes sending notification mail with ssmtp, checking network devices with nc, CPU temperature checks, and writing to log files when issues occur. I'm forcing the issues it should detect for testing, but as a service the script no longer outputs anything (emails, log files), so something is not quite right. When it was started from a command line it performed as expected.

The systemd config for it, in /lib/systemd/system, built up from what I could glean and my best guesses after reading https://www.digitalocean.com/community/ ... unit-files, is as follows:

Code: Select all

[Unit]
Description=Network check script by Dave F
After=multi-user.target
 
[Service]
#Type=idle
Type=forking
PIDfile=/home/me/pingTest.sh
ExecStart=/home/me/pingTest.sh
Restart=on-always
RestartSec=5

[Timer]
#delay added to allow time for ssmtp and networking to be 100% ready
OnStartupSec=30
 
[Install]
WantedBy=multi-user.target

Have I made the right choices? Any suggestions for how to ensure my .sh script works as well as a systemd service as it does when executed from the command line?

Thanks in advance.

Re: Watchdog config help needed

Posted: Sun Nov 11, 2018 8:29 am
by DougieLawson
Try it.

Find the pid of the task with ps -ef | grep ping, kill the task with sudo kill -9 <pid goes here> or sudo killall pingTest.sh and see what happens.

I wouldn't use Type=forking unless your script actually forks new tasks.

Re: Watchdog config help needed

Posted: Sun Nov 11, 2018 9:12 pm
by AussieDaveF
I changed the Type from forking to simple and rebooted. I ran ps -ef, found my script running, then did the kill as described. I checked a few times, a few minutes apart, but it did not restart.

Re: Watchdog config help needed

Posted: Sun Nov 11, 2018 9:28 pm
by DougieLawson
This one of mine works OK.

Code: Select all

[Unit]
Description=LEDspi server
After=bmp180.service sunny.service

[Service]
ExecStart=/usr/local/bin/LEDspi
Restart=always
User=pi
Group=pi
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=LEDspi
Environment=TZ=:/usr/share/zoneinfo/GB-Eire

[Install]
WantedBy=multi-user.target

Code: Select all

root@falcon:/etc/systemd/system # ps -ef | grep LED
pi        6819     1  0 Nov10 ?        00:01:48 /usr/local/bin/LEDspi
root     10874 10686  0 21:28 pts/0    00:00:00 grep --color=auto LED
root@falcon:/etc/systemd/system # kill -9 6819
root@falcon:/etc/systemd/system # ps -ef | grep LED
pi       10896     1  1 21:28 ?        00:00:00 /usr/local/bin/LEDspi
root     10898 10686  0 21:28 pts/0    00:00:00 grep --color=auto LED
root@falcon:/etc/systemd/system #
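
Applying the same pattern to the script from earlier in the thread, a unit might look like the sketch below. It rests on a few assumptions: the unit name pingtest.service is hypothetical, and network-online.target plus an ExecStartPre sleep stand in for the [Timer] section, since timer settings such as OnStartupSec belong in a separate .timer unit rather than in a .service file. Restart=always replaces Restart=on-always, which is not one of the values systemd accepts, and the PIDfile= line is dropped because it pointed at the script itself and PID files are mainly used with Type=forking.

```shell
#!/bin/sh
# Sketch: write a service unit for the monitoring script.
# UNIT_NAME is a hypothetical name; the script path is the one quoted in the thread.
UNIT_NAME="pingtest.service"

write_unit() {
    # $1 = directory to write the unit into (e.g. /etc/systemd/system)
    unit_dir="$1"
    cat > "$unit_dir/$UNIT_NAME" <<'EOF'
[Unit]
Description=Network check script by Dave F
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
# Assumed stand-in for the 30 second start-up delay
ExecStartPre=/bin/sleep 30
ExecStart=/home/me/pingTest.sh
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF
}

# Typical use (needs root):
#   write_unit /etc/systemd/system
#   systemctl daemon-reload
#   systemctl enable --now pingtest.service
```

Because ExecStartPre runs on every start, each automatic restart would also wait 30 seconds on top of RestartSec=5.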

Re: Watchdog config help needed

Posted: Mon Nov 12, 2018 7:07 am
by AussieDaveF
Thanks again Dougie, I'll give that a go shortly.

Re: Watchdog config help needed (and systemd)

Posted: Mon Nov 12, 2018 10:20 am
by AussieDaveF
OK, so those changes have allowed the script to restart if it dies (or gets killed off), so that's a step forward. Thanks Dougie.

The following piece of code is what I use to determine if comms are up before progressing to the remainder of the script. netTestA is set to an IP on the LAN.

After putting in some lines to echo to the log file, I find that while running as a service the loop below never exits. The nc never returns 0. It works when run from bash.

Is sending the output to /dev/null the wrong thing to use in a service? Do I need to set something in the service config to allow the script to use nc? Where might this be going wrong?

Code: Select all

# Check that the LAN is available before continuing
webWasUp="TBC"
while [ "$webWasUp" != "yes" ];
do
  nc -zw5 $netTestA 443 2>/dev/null
  if [[ $? -eq 0 ]]; then
    webWasUp="yes"
  else
    sleep 5
  fi
done

Re: Watchdog config help needed

Posted: Mon Nov 12, 2018 12:36 pm
by DougieLawson
Send the output to /tmp/log until you get it debugged.

Re: Watchdog config help needed

Posted: Mon Nov 12, 2018 8:50 pm
by AussieDaveF
I replaced all /dev/null in my script with /tmp/log. After 10 minutes the log file is completely empty, even when I run the script from the command line (when the rest of the script works successfully). Hmmmm...

Code: Select all

#  nc -zw5 $netTestUSA 443 2>/dev/null
  nc -zw5 $netTestUSA 443 2>/tmp/log

...Actually, I'll try that again tonight with >> instead of >.
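
The difference between the two redirections may well matter here. As a sketch, assuming the nc call sits inside the loop: 2>/tmp/log truncates the file every time the line runs, so any pass that produces no error output leaves the log empty, whereas 2>>/tmp/log keeps earlier messages. The demo below uses a hypothetical noisy function and a temporary file in place of nc and /tmp/log.

```shell
#!/bin/sh
# Sketch: compare truncating (2>) and appending (2>>) redirection in a loop.
# noisy is a hypothetical stand-in for nc: it writes to stderr only when it fails.
noisy() {
    if [ "$1" = "fail" ]; then
        echo "nc: simulated failure" >&2
        return 1
    fi
    return 0
}

# Run a fail, fail, ok sequence using the given mode ("truncate" or "append")
# and print how many lines end up in the log.
count_log_lines() {
    mode="$1"
    log=$(mktemp)
    for outcome in fail fail ok; do
        if [ "$mode" = "append" ]; then
            noisy "$outcome" 2>>"$log" || true
        else
            noisy "$outcome" 2>"$log" || true
        fi
    done
    wc -l < "$log" | tr -d ' '
    rm -f "$log"
}

count_log_lines truncate
count_log_lines append
```

This prints 0 and then 2: with 2> the final successful pass wipes the two earlier error lines, while 2>> keeps both.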

Re: Watchdog config help needed

Posted: Wed Nov 14, 2018 10:29 am
by AussieDaveF
After running for a full day, the only text in the log file is the 2 lines below. I've edited the script to write checkpoints to the log file throughout, to track how far the script gets, but as a service nothing is written except the first write that timestamps script initiation.

Code: Select all

nc: getaddrinfo: Temporary failure in name resolution
nc: getaddrinfo: Temporary failure in name resolution

If I boot to the UI and run the script from File Manager (right click, open, execute) the same issue occurs: only the initial write to my log file appears.

Running the same script from the bash prompt writes the initiation line, all the checkpoints, and all my expected log file entries to the log each loop through the code. After an hour with all the checkpoints in I would have hundreds of lines in the log file if run from bash.

The nc command doesn't write anything out to logs if the test was successful no matter how I run the script, but those other log file write points through the script do, except when the script is run as a service. This is doing my head in.

Re: Watchdog config help needed

Posted: Thu Nov 15, 2018 11:34 pm
by AussieDaveF
Firstly, Dougie, thanks for your time and assistance. It indeed helped me complete my project. Putting me onto systemd was just what I needed. Fixing the not-working-under-systemd issue took a chance conversation over lunch at work.

The reason my script worked when run from the command prompt with bash, but not as a service, was my shebang. I totally missed that I'd started the script with #!/bin/sh. This was also stopping the writes to the logs, because those commands in the script were bash commands, not plain sh commands. I changed that first line to #!/bin/bash and the script's commands and logging all worked as expected. 2 little letters completely undid me.
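
A sketch of why this bites, assuming /bin/sh is dash (the default on Raspbian and other Debian-based systems) and that the unit starts the script directly, so the shebang picks the interpreter: [[ ... ]] is a bash extension, so under sh the test fails on every pass and the wait loop from earlier in the thread never sees success, while bash script.sh from a prompt ignores the shebang and works. The portable form below tests the command's exit status directly. wait_for, require_bash and the attempt limit are hypothetical additions, not part of the original script.

```shell
#!/bin/sh
# Sketch: a wait loop that runs the same under sh (dash) and bash.
# wait_for is a hypothetical helper: it retries a command until it succeeds
# or the attempt limit is reached.
wait_for() {
    max_tries="$1"   # give up after this many attempts
    delay="$2"       # seconds to sleep between attempts
    shift 2
    tries=0
    # "until cmd" tests the exit status directly, so no [[ ]] is needed
    until "$@" 2>/dev/null; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$max_tries" ]; then
            return 1
        fi
        sleep "$delay"
    done
    return 0
}

# Optional guard for scripts that do rely on bash features:
# BASH_VERSION is set by bash and normally unset in other shells.
require_bash() {
    if [ -z "${BASH_VERSION:-}" ]; then
        echo "this script needs bash; check the shebang" >&2
        return 1
    fi
}

# In the monitoring script this might be called as (netTestA as in the thread):
#   wait_for 60 5 nc -zw5 "$netTestA" 443
```

Calling require_bash near the top of a bash-only script would have reported the mismatch straight away instead of failing silently inside the loop.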

I now have the watchdog keeping the pi alive, systemd keeping my script alive, and my script keeping my network devices monitored 24/7. So pleased.