AussieDaveF
Posts: 21
Joined: Tue Jun 26, 2018 11:36 am

Watchdog config help needed

Sun Nov 04, 2018 9:41 am

Hi,
I have a .sh script that runs at boot via crontab (some of its steps only work once the Pi has been up for about 20 seconds). The script runs a loop that does checks and sends notifications.

Can the watchdog be set to monitor the .sh script and restart it, or reboot, if the script fails or stops?
ps -aux shows it with a new PID after each reboot:

pi         705  0.4  0.2   4664  2644 ?        S    09:03   0:09 bash /home/username/superTest.sh
What do I need to set in the watchdog.conf file to restart the script or reboot the pi when this script stops running?

DougieLawson
Posts: 39121
Joined: Sun Jun 16, 2013 11:19 pm
Location: A small cave in deepest darkest Basingstoke, UK
Contact: Website Twitter

Re: Watchdog config help needed

Sun Nov 04, 2018 10:00 am

Use a systemd service file. You can configure automatic restarts in it, and systemd has a built-in watchdog.
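A minimal sketch of such a unit for a looping script like the one above, assuming the script path /home/username/superTest.sh from the first post and a hypothetical unit name superTest.service. Restart=always with RestartSec= is the part that brings the script back when the process dies; it does not detect a script that hangs while still running. The User= line and the 30-second delay are assumptions carried over from the cron setup.

```ini
# Sketch: /etc/systemd/system/superTest.service (hypothetical name)
[Unit]
Description=Network check script
# Order after the network is reported "online"; how long this really waits
# depends on the network configuration tool in use.
Wants=network-online.target
After=network-online.target

[Service]
# The script loops in the foreground and does not fork, so Type=simple.
Type=simple
# Assumption: run as the same user the cron job used.
User=pi
# Optional fixed delay before starting; OnStartupSec= only works in a .timer unit.
ExecStartPre=/bin/sleep 30
ExecStart=/home/username/superTest.sh
# Start the script again whenever it exits or is killed, 5 s after it stops.
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```

After saving it, sudo systemctl daemon-reload followed by sudo systemctl enable --now superTest.service enables it at boot and starts it immediately. ExecStartPre= also runs before every automatic restart, so each restart is delayed by the same 30 seconds.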
Note: Any requirement to use a crystal ball or mind reading will result in me ignoring your question.

Criticising any questions is banned on this forum.

Any DMs sent on Twitter will be answered next month.
All non-medical doctors are on my foes list.

AussieDaveF

Re: Watchdog config help needed

Sun Nov 11, 2018 3:51 am

Thanks for the tip Dougie. Systemd does sound like the best option.

I've set it up to run as a systemd unit, and as far as I can tell it's running, but the only thing it now does is write to the log file to say it has started; it appears to do nothing else.

The script runs an endless loop that sends notification mail with ssmtp, checks network devices with nc, checks the CPU temperature, and writes to log files when issues occur. I'm forcing the issues it should detect for testing, but as a service the script no longer outputs anything (no emails, no log entries), so something is not quite right. When it was started from the command line it performed as expected.
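As an illustration of one of those checks (a sketch, not the poster's code): on Raspbian the kernel usually exposes the CPU temperature in /sys/class/thermal/thermal_zone0/temp, in millidegrees Celsius. The function names and the 70 degree default threshold below are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers for one check in a monitoring loop.
# Reads a thermal-zone file (default: the usual path on a Pi), which holds the
# temperature in millidegrees Celsius, and prints whole degrees.
cpu_temp_c() {
  local zone="${1:-/sys/class/thermal/thermal_zone0/temp}" milli
  milli=$(< "$zone") || return 1
  [[ $milli =~ ^-?[0-9]+$ ]] || return 1   # reject empty or garbled input
  echo $(( milli / 1000 ))
}

# Exit status 0 when the temperature is at or above the limit (default 70),
# 1 when it is below, 2 when the temperature could not be read.
cpu_too_hot() {
  local limit="${1:-70}" zone="$2" temp
  temp=$(cpu_temp_c ${zone:+"$zone"}) || return 2
  (( temp >= limit ))
}
```

Inside the main loop this might be used as if cpu_too_hot 70; then echo "CPU at $(cpu_temp_c) C" >> /tmp/log; fi, with the log path again an assumption.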

The systemd unit for it, in /lib/systemd/system, pieced together from my best guesses after reading https://www.digitalocean.com/community/ ... unit-files, is as follows:

[Unit]
Description=Network check script by Dave F
After=multi-user.target
 
[Service]
#Type=idle
Type=forking
PIDfile=/home/me/pingTest.sh
ExecStart=/home/me/pingTest.sh
Restart=on-always
RestartSec=5

[Timer]
#delay added to allow time for ssmtp and networking to be 100% ready
OnStartupSec=30
 
[Install]
WantedBy=multi-user.target

Have I made the right choices? Any suggestions on how to make sure my .sh script works as well as a systemd service as it does when run from the command line?

Thanks in advance.
Last edited by AussieDaveF on Sun Nov 11, 2018 8:52 pm, edited 1 time in total.

DougieLawson

Re: Watchdog config help needed

Sun Nov 11, 2018 8:29 am

Try it.

Find the PID of the task with ps -ef | grep ping, kill it with sudo kill -9 <pid goes here> or sudo killall pingTest.sh, and see what happens.

I wouldn't use Type=forking unless your script actually forks new tasks.

AussieDaveF

Re: Watchdog config help needed

Sun Nov 11, 2018 9:12 pm

I changed the Type from forking to simple and rebooted. I ran ps -ef, found my script running, then did the kill as described. I checked a few times, a few minutes apart, but it did not restart.
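A likely reason it did not restart: Restart=on-always in the unit above is not a value systemd accepts (the documented values are no, on-success, on-failure, on-abnormal, on-watchdog, on-abort and always), so systemd logs a warning, ignores that line and keeps the default Restart=no. systemctl show -p Restart <unit> prints the value actually in effect, and systemd-analyze verify <unit file> is the proper tool for catching this kind of mistake; it should also warn about the [Timer] section, which only belongs in a separate .timer unit. As a rough, self-contained illustration of the Restart= check alone, here is a hypothetical helper:

```shell
#!/bin/bash
# Hypothetical helper: flag Restart= values that systemd will not accept.
# Accepted values as documented in systemd.service(5); an empty value resets
# the setting to its default.
check_restart_value() {
  local unit_file="$1" line value bad=0
  local re='^[[:space:]]*Restart[[:space:]]*=[[:space:]]*(.*)$'
  while IFS= read -r line || [[ -n $line ]]; do
    [[ $line =~ $re ]] || continue              # skips RestartSec=, comments, etc.
    value="${BASH_REMATCH[1]}"
    value="${value%"${value##*[![:space:]]}"}"  # trim trailing whitespace
    case "$value" in
      ''|no|always|on-success|on-failure|on-abnormal|on-watchdog|on-abort) ;;
      *) echo "invalid Restart= value: '$value'" >&2; bad=1 ;;
    esac
  done < "$unit_file"
  return "$bad"
}
```

Run against the unit posted above, check_restart_value /lib/systemd/system/<unit>.service would report 'on-always' and return 1.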

DougieLawson

Re: Watchdog config help needed

Sun Nov 11, 2018 9:28 pm

This one of mine works OK.

[Unit]
Description=LEDspi server
After=bmp180.service sunny.service

[Service]
ExecStart=/usr/local/bin/LEDspi
Restart=always
User=pi
Group=pi
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=LEDspi
Environment=TZ=:/usr/share/zoneinfo/GB-Eire

[Install]
WantedBy=multi-user.target

root@falcon:/etc/systemd/system # ps -ef | grep LED
pi        6819     1  0 Nov10 ?        00:01:48 /usr/local/bin/LEDspi
root     10874 10686  0 21:28 pts/0    00:00:00 grep --color=auto LED
root@falcon:/etc/systemd/system # kill -9 6819
root@falcon:/etc/systemd/system # ps -ef | grep LED
pi       10896     1  1 21:28 ?        00:00:00 /usr/local/bin/LEDspi
root     10898 10686  0 21:28 pts/0    00:00:00 grep --color=auto LED
root@falcon:/etc/systemd/system #

AussieDaveF

Re: Watchdog config help needed

Mon Nov 12, 2018 7:07 am

Thanks again Dougie, I'll give that a go shortly.

AussieDaveF

Re: Watchdog config help needed (and systemd)

Mon Nov 12, 2018 10:20 am

OK, so those changes have allowed the script to restart if it dies (or gets killed off), so that's a step forward. Thanks Dougie.

The following piece of code is what I use to determine if comms are up before progressing to the remainder of the script. netTestA is set to an IP on the LAN.

After adding some lines that echo to the log file, I found that while the script runs as a service the loop below never exits: nc never returns 0. It works when run from bash.

Is sending the output to /dev/null the wrong thing to use in a service? Do I need to set something in the service config to allow the script to use nc? Where might this be going wrong?

# Check that the LAN is available before continuing
webWasUp="TBC"	
while [ "$webWasUp" != "yes" ];
do
  nc -zw5 $netTestA 443 2>/dev/null
  if [[ $? -eq 0 ]]; then
  	webWasUp="yes"	
  else
	sleep 5
  fi
done
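A hedged variant of the loop above (not the poster's code): it gives up after a fixed number of tries instead of looping forever, and appends each failure to a log, so a silent hang becomes visible. It swaps nc for bash's built-in /dev/tcp redirection, which removes the dependency on nc but only works when the script really runs under bash; coreutils timeout limits each attempt to 5 seconds, similar to nc -w5. The function name and its defaults are hypothetical.

```shell
#!/bin/bash
# Hypothetical bounded replacement for the "wait for the LAN" loop.
# Args: host, port, tries (default 60), delay in s (default 5), log file.
# Uses bash's /dev/tcp instead of nc, so it only works under bash.
wait_for_port() {
  local host="$1" port="$2" tries="${3:-60}" delay="${4:-5}"
  local logfile="${5:-/dev/null}" i
  for (( i = 1; i <= tries; i++ )); do
    # A child bash opens a TCP connection on fd 3; 'timeout' caps each
    # attempt at 5 s. Connection errors are appended to the log.
    if timeout 5 bash -c "exec 3<>/dev/tcp/$host/$port" 2>>"$logfile"; then
      return 0
    fi
    echo "$(date '+%Y-%m-%d %H:%M:%S') $host:$port not reachable (try $i/$tries)" >>"$logfile"
    sleep "$delay"
  done
  return 1
}
```

In the script this could replace the while loop with something like wait_for_port "$netTestA" 443 60 5 /tmp/log || echo "LAN still down after 60 tries" >> /tmp/log.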

DougieLawson

Re: Watchdog config help needed

Mon Nov 12, 2018 12:36 pm

Send the output to /tmp/log until you get it debugged.
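One way to act on this (a sketch, not from the thread): redirect the script's own stdout and stderr to the log with exec, and turn on set -x so the shell writes each command to the log just before running it. When the script runs as a service, the last traced line shows where it stopped. The function name is hypothetical; /tmp/log is the path suggested above.

```shell
#!/bin/bash
# Hypothetical helper: call this near the top of the script while debugging.
enable_debug_log() {
  local logfile="${1:-/tmp/log}"
  # From here on, everything the script prints (stdout and stderr)
  # is appended to the log file; '>>' keeps output from earlier runs.
  exec >>"$logfile" 2>&1
  # Trace mode: each command is written to stderr (now the log),
  # prefixed with '+', just before it runs.
  set -x
}
```

Remove the call once the problem is found, since tracing a loop that runs forever keeps growing the file.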

AussieDaveF

Re: Watchdog config help needed

Mon Nov 12, 2018 8:50 pm

I replaced every /dev/null in my script with /tmp/log. After 10 minutes the log file is completely empty, even when I run the script from the command line (where the rest of the script works). Hmm...

#  nc -zw5 $netTestUSA 443 2>/dev/null
  nc -zw5 $netTestUSA 443 2>/tmp/log
...Actually, I'll try that again tonight with >> instead of >.
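That is very likely the cause of the empty file: with 2>/tmp/log the file is truncated every time that nc line runs, and nc -z prints nothing when the port is open, so after any successful check the file is empty again. Appending with >> keeps every message. A small logging helper along those lines, as a sketch; the function name and the LOGFILE variable are hypothetical:

```shell
#!/bin/bash
# Hypothetical logging helper; LOGFILE defaults to /tmp/log.
LOGFILE="${LOGFILE:-/tmp/log}"

log() {
  # '>>' appends one timestamped line per call;
  # '>' would truncate the file on every call.
  printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$LOGFILE"
}
```

For the nc line itself the same applies to its error output: nc -zw5 $netTestUSA 443 2>>/tmp/log.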

AussieDaveF

Re: Watchdog config help needed

Wed Nov 14, 2018 10:29 am

After running for a full day, the only text in the log file is the two lines below. I've edited the script to write checkpoints to the log file throughout, to track how far through the script it gets, but as a service nothing is written except the first entry, which timestamps the script starting.

nc: getaddrinfo: Temporary failure in name resolution
nc: getaddrinfo: Temporary failure in name resolution
If I boot to the desktop and run the script from File Manager (right-click, Open, Execute), the same issue occurs: only the initial entry is written to my log file.

Running the same script from the bash prompt writes the start-up line, all the checkpoints, and all my expected entries to the log on every pass through the loop. After an hour with all the checkpoints in, I would have hundreds of lines in the log file when run from bash.

The nc command doesn't write anything to the log when the test succeeds, however I run the script, but the other log-writing points in the script do, except when the script runs as a service. This is doing my head in.
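The two getaddrinfo lines mean nc could not resolve the host name at the moment it ran ($netTestUSA is presumably a name rather than an IP address), which is common early in boot before DNS is available. If the script needs to cope with that, one option is to wait for name resolution first. A sketch using getent hosts, which resolves names through the system resolver; the function name and defaults are hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: wait until a host name resolves, up to a number of tries.
# Args: name, tries (default 30), delay in s (default 5).
wait_for_dns() {
  local name="$1" tries="${2:-30}" delay="${3:-5}" i
  for (( i = 1; i <= tries; i++ )); do
    # getent exits 0 (and prints the address) when the name resolves
    if getent hosts "$name" > /dev/null; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}
```

For example: wait_for_dns "$netTestUSA" 60 5 || echo "DNS still not working" >> /tmp/log.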

AussieDaveF

Re: Watchdog config help needed

Thu Nov 15, 2018 11:34 pm

Firstly, Dougie, thanks for your time and assistance. It really did help me complete my project. Putting me onto systemd was just what I needed. Fixing the "not working under systemd" issue took a chance conversation over lunch at work.

The reason my script worked when run from the command prompt with bash, but not as a service, was my shebang. I had totally missed that I'd started the script with #!/bin/sh. This was also stopping the writes to the logs, because those commands used bash features that plain sh doesn't support. I changed that first line to #!/bin/bash and the script's commands and logging all worked as expected. Two little letters completely undid me.
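This also explains the loop earlier in the thread never exiting. systemd starts ExecStart= programs directly, so the shebang decides the interpreter, whereas the cron job in the first post ran the script explicitly with bash (the ps line shows bash /home/username/superTest.sh). On Raspbian /bin/sh is dash, which has no [[ ... ]], so if [[ $? -eq 0 ]] fails with "[[: not found" and the else branch runs every time. A defensive sketch (not from the thread): a guard at the top of the script, written in plain POSIX syntax so sh can parse it, that re-runs the file with bash when it is not already running under bash.

```shell
#!/bin/bash
# Put this right after the shebang. It uses only POSIX sh syntax, so dash can
# run it. BASH_VERSION is set only when bash is the interpreter; if it is
# missing, replace this process with bash running the same file.
if [ -z "${BASH_VERSION:-}" ]; then
  exec /bin/bash "$0" "$@"
fi

# Bash-only syntax is safe from here on.
[[ -n $BASH_VERSION ]] && echo "running under bash $BASH_VERSION"
```

With the guard in place, even sh pingTest.sh ends up running under bash.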

I now have the watchdog keeping the pi alive, systemd keeping my script alive, and my script keeping my network devices monitored 24/7. So pleased.
