nebbles
Posts: 4
Joined: Tue Jan 01, 2019 11:50 pm

Cron jobs randomly failing

Wed Jan 02, 2019 12:05 am

I have a number of users on the Pi who each have their own crontab set up.
When analysing the behaviour of one user, I've noticed that occasionally, the cron script won't run.

Crontab configured as:

Code: Select all

*/3 * * * * /home/siot/SIOT/data_collection/weather.py >> /home/siot/weather.log
*/3 * * * * /home/siot/SIOT/data_collection/stocks.py >> /home/siot/stocks.log
It works most of the time, as the scripts are collecting data and timestamping them. For example, I can see there is no data point for 23:24 this evening. If I also check the .log files that should be receiving the prints of the scripts above I can also see that there is no trace that they ever made it to file:

Code: Select all

================================================================================
2019-01-01 23:21:05   Collecting data...
Done.

================================================================================
2019-01-01 23:27:05   Collecting data...
Done.
Furthermore, checking the log files in /var/log/cron.log just indicates that the job was actually run:

Code: Select all

Jan  1 23:21:01 rps01 CRON[754]: (siot) CMD (/home/siot/SIOT/data_collection/weather.py >> /home/siot/weather.log)
Jan  1 23:21:01 rps01 CRON[759]: (siot) CMD (/home/siot/SIOT/data_collection/stocks.py >> /home/siot/stocks.log)
Jan  1 23:21:07 rps01 CRON[737]: (CRON) info (No MTA installed, discarding output)
Jan  1 23:21:08 rps01 CRON[740]: (CRON) info (No MTA installed, discarding output)
Jan  1 23:24:01 rps01 CRON[868]: (siot) CMD (/home/siot/SIOT/data_collection/weather.py >> /home/siot/weather.log)
Jan  1 23:24:01 rps01 CRON[871]: (siot) CMD (/home/siot/SIOT/data_collection/stocks.py >> /home/siot/stocks.log)
Jan  1 23:24:03 rps01 CRON[851]: (CRON) info (No MTA installed, discarding output)
Jan  1 23:24:03 rps01 CRON[849]: (CRON) info (No MTA installed, discarding output)
Jan  1 23:27:01 rps01 CRON[984]: (siot) CMD (/home/siot/SIOT/data_collection/stocks.py >> /home/siot/stocks.log)
Jan  1 23:27:01 rps01 CRON[987]: (siot) CMD (/home/siot/SIOT/data_collection/weather.py >> /home/siot/weather.log)
So somewhere in the mix, cron thinks the file is being run, but the file in no way has been run (no print statements in the .log file and no data in my data file).

On a small side note, I have been having weird freezes... an extract from syslog shows this

Code: Select all

Dec 31 12:45:04 rps01 kernel: [84556.234172] Unable to handle kernel NULL pointer dereference at virtual address 00000072
Dec 31 12:45:04 rps01 kernel: [84556.235057] Unable to handle kernel NULL pointer dereference at virtual address 0000004d
Dec 31 12:45:04 rps01 kernel: [84556.235063] pgd = a7528000
Dec 31 12:45:04 rps01 kernel: [84556.235065] [0000004d] *pgd=36f5a835, *pte=00000000, *ppte=00000000
Dec 31 12:45:04 rps01 kernel: [84556.235077] Internal error: Oops: 17 [#1] SMP ARM
Dec 31 12:45:04 rps01 kernel: [84556.235081] Modules linked in: rfcomm bnep hci_uart btbcm serdev bluetooth ecdh_generic brcmfmac brcmutil snd_bcm2835(C) cfg80211 rfkill snd_pcm snd_timer snd uio_pdrv_genirq fixed uio i2c_dev ip_tables x_tables ipv6
Dec 31 12:45:04 rps01 kernel: [84556.235128] CPU: 0 PID: 30704 Comm: Main.py Tainted: G         C      4.14.79-v7+ #1159
Dec 31 12:45:04 rps01 kernel: [84556.235130] Hardware name: BCM2835
Dec 31 12:45:04 rps01 kernel: [84556.235132] task: b6738f00 task.stack: b6fd0000
Dec 31 12:45:04 rps01 kernel: [84556.235143] PC is at __page_set_anon_rmap+0x14/0x68
Dec 31 12:45:04 rps01 kernel: [84556.235147] LR is at page_add_new_anon_rmap+0x70/0x8c
Dec 31 12:45:04 rps01 kernel: [84556.235150] pc : [<80262b64>]    lr : [<80262e44>]    psr: 00000113
Dec 31 12:45:04 rps01 kernel: [84556.235153] sp : b6fd1df8  ip : b6fd1e08  fp : b6fd1e04
Dec 31 12:45:04 rps01 kernel: [84556.235155] r10: b65d3594  r9 : b65d3000  r8 : 00000000
Dec 31 12:45:04 rps01 kernel: [84556.235158] r7 : ba8d075c  r6 : 00000009  r5 : 0000000a  r4 : ba8d075c
Dec 31 12:45:04 rps01 kernel: [84556.235161] r3 : 00000001  r2 : 0000000a  r1 : 00000009  r0 : ba8d075c
Dec 31 12:45:04 rps01 kernel: [84556.235165] Flags: nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Dec 31 12:45:04 rps01 kernel: [84556.235168] Control: 10c5383d  Table: 2752806a  DAC: 00000055
Dec 31 12:45:04 rps01 kernel: [84556.235170] Process Main.py (pid: 30704, stack limit = 0xb6fd0210)
Dec 31 12:45:04 rps01 kernel: [84556.235174] Stack: (0xb6fd1df8 to 0xb6fd2000)
Dec 31 12:45:04 rps01 kernel: [84556.235178] 1de0:                                                       b6fd1e24 b6fd1e08
Dec 31 12:45:04 rps01 kernel: [84556.235184] 1e00: 80262e44 80262b5c 00000000 24edf75f b90e6e40 00000040 b6fd1ea4 b6fd1e28
Dec 31 12:45:04 rps01 kernel: [84556.235189] 1e20: 80259e08 80262de0 00000000 00000000 00000000 804a2b40 b6fd1e74 00000000
Dec 31 12:45:04 rps01 kernel: [84556.235195] 1e40: b90e6e40 00000055 014000c0 00075d65 75d65000 a7529d70 a7529d70 00000000
Dec 31 12:45:04 rps01 kernel: [84556.235200] 1e60: 00000000 00000000 00000000 b65d3594 bab441c8 00000000 b6fd1ea4 b6fd1fb0
Dec 31 12:45:04 rps01 kernel: [84556.235205] 1e80: b6738f00 00000817 75d65010 b90aa700 b90aa73c 00000055 b6fd1efc b6fd1ea8
Dec 31 12:45:04 rps01 kernel: [84556.235211] 1ea0: 807a564c 80259750 00000000 00000000 00000000 00000000 00014559 00000000
Dec 31 12:45:04 rps01 kernel: [84556.235216] 1ec0: 00001000 00000000 00000800 00000000 5c1e5f67 80c093bc 00000817 807a531c
Dec 31 12:45:04 rps01 kernel: [84556.235222] 1ee0: 75d65010 b6fd1fb0 00000008 00ec2b48 b6fd1fac b6fd1f00 801011e0 807a5328
Dec 31 12:45:04 rps01 kernel: [84556.235227] 1f00: 000000c3 801081c4 80c0469c 8017a8cc b6fd1f34 b6fd1f20 8017a8cc 8017a508
Dec 31 12:45:04 rps01 kernel: [84556.235233] 1f20: 8017a7a8 80c0469c b6fd1f44 b6fd1f38 80b8e92c 804c518c b6fd1f5c b6fd1f48
Dec 31 12:45:04 rps01 kernel: [84556.235238] 1f40: 804c518c 804c50b0 804c5150 80123a80 b6fd1f6c b6fd1f60 80123a80 80184cf8
Dec 31 12:45:04 rps01 kernel: [84556.235244] 1f60: b6fd1f94 b6fd1f70 80175c68 801239f0 b6fd1fb0 80c046cc 60000010 ffffffff
Dec 31 12:45:04 rps01 kernel: [84556.235249] 1f80: 10c5383d 001ab074 20000010 001ab074 20000010 ffffffff 10c5383d 10c5387d
Dec 31 12:45:04 rps01 kernel: [84556.235255] 1fa0: 00000000 b6fd1fb0 807a4e64 801011a4 0120c0ec 0000000f 000000e7 75d65000
Dec 31 12:45:04 rps01 kernel: [84556.235260] 1fc0: 00000004 00000021 00000008 003eab84 00ede51c 00000008 00ec2b48 00378000
Dec 31 12:45:04 rps01 kernel: [84556.235266] 1fe0: 75d66000 7eacc338 0000ffff 001ab074 20000010 ffffffff 00000000 00000000
Dec 31 12:45:04 rps01 kernel: [84556.235280] [<80262b64>] (__page_set_anon_rmap) from [<80262e44>] (page_add_new_anon_rmap+0x70/0x8c)
Dec 31 12:45:04 rps01 kernel: [84556.235290] [<80262e44>] (page_add_new_anon_rmap) from [<80259e08>] (handle_mm_fault+0x6c4/0xc20)
Dec 31 12:45:04 rps01 kernel: [84556.235300] [<80259e08>] (handle_mm_fault) from [<807a564c>] (do_page_fault+0x330/0x3a4)
Dec 31 12:45:04 rps01 kernel: [84556.235310] [<807a564c>] (do_page_fault) from [<801011e0>] (do_DataAbort+0x48/0xc4)
Dec 31 12:45:04 rps01 kernel: [84556.235318] [<801011e0>] (do_DataAbort) from [<807a4e64>] (__dabt_usr+0x44/0x60)
Dec 31 12:45:04 rps01 kernel: [84556.235320] Exception stack(0xb6fd1fb0 to 0xb6fd1ff8)
Dec 31 12:45:04 rps01 kernel: [84556.235324] 1fa0:                                     0120c0ec 0000000f 000000e7 75d65000
Dec 31 12:45:04 rps01 kernel: [84556.235330] 1fc0: 00000004 00000021 00000008 003eab84 00ede51c 00000008 00ec2b48 00378000
Dec 31 12:45:04 rps01 kernel: [84556.235335] 1fe0: 75d66000 7eacc338 0000ffff 001ab074 20000010 ffffffff
Dec 31 12:45:04 rps01 kernel: [84556.235340] Code: e92dd800 e24cb004 e52de004 e8bd4000 (e591c044) 
Dec 31 12:45:04 rps01 kernel: [84556.235346] ---[ end trace a52b25530133fe35 ]---
It's something I've never come across with the Pi before and have no idea if they could be related. This freeze up has happened 2/3 times in the last few days, this cron thing seems to be happening every few hours randomly, bearing in mind i gets the majority of these "3rd minute" jobs executed without problems.

robjak
Posts: 26
Joined: Sun Mar 12, 2017 7:22 pm

Re: Cron jobs randomly failing

Wed Jan 02, 2019 2:38 am

install a mail service, postfix for example. Cron will write issues to mail. this could help you troubleshoot

You could use mutt to read the emails

sudo apt-get install postfix, mutt

User avatar
scruss
Posts: 3518
Joined: Sat Jun 09, 2012 12:25 pm
Location: Toronto, ON
Contact: Website

Re: Cron jobs randomly failing

Wed Jan 02, 2019 3:33 am

Is there anything to the fact that the scripts that logged successfully took 6-7 seconds to run, but the scripts that failed only took 2 seconds?

If the log file were locked for writing, that might cause an issue.
‘Remember the Golden Rule of Selling: “Do not resort to violence.”’ — McGlashan.
Pronouns: he/him

User avatar
rpdom
Posts: 17737
Joined: Sun May 06, 2012 5:17 am
Location: Chelmsford, Essex, UK

Re: Cron jobs randomly failing

Wed Jan 02, 2019 5:09 am

It looks like you are only logging the normal output of your scripts and not the error output. Try changing your crontabs to look like this and see if you get anything more helpful in the logs:

Code: Select all

*/3 * * * * /home/siot/SIOT/data_collection/weather.py >> /home/siot/weather.log 2>&1
*/3 * * * * /home/siot/SIOT/data_collection/stocks.py >> /home/siot/stocks.log 2>&1

nebbles
Posts: 4
Joined: Tue Jan 01, 2019 11:50 pm

Re: Cron jobs randomly failing

Wed Jan 02, 2019 11:09 am

rpdom wrote: It looks like you are only logging the normal output of your scripts and not the error output. Try changing your crontabs to look like this and see if you get anything more helpful in the logs:

Code: Select all

*/3 * * * * /home/siot/SIOT/data_collection/weather.py >> /home/siot/weather.log 2>&1
*/3 * * * * /home/siot/SIOT/data_collection/stocks.py >> /home/siot/stocks.log 2>&1
Thanks @rpdom. This seems to have helped as I now have the reason for why the scripts die so quickly. "Segmentation fault". Now to work out why sometimes randomly, there is a segmentation fault running these scripts...

nebbles
Posts: 4
Joined: Tue Jan 01, 2019 11:50 pm

Re: Cron jobs randomly failing

Wed Jan 02, 2019 11:20 am

Interestingly the exit status has also changed in the cron.log when something goes wrong

Code: Select all

Jan  2 10:48:01 rps01 CRON[28992]: (userL) CMD ([29008] /home/userL/Sensing-IOT/Main.py)
Jan  2 10:48:01 rps01 CRON[28994]: (siot) CMD ([29010] /home/siot/SIOT/data_collection/weather.py >> /home/siot/weather.log 2>&1)
Jan  2 10:48:01 rps01 CRON[28995]: (userJ) CMD ([29009] /home/userJ/SIOT2018/main.py)
Jan  2 10:48:01 rps01 CRON[28993]: (siot) CMD ([29015] /home/siot/SIOT/data_collection/stocks.py >> /home/siot/stocks.log 2>&1)
Jan  2 10:48:04 rps01 CRON[28993]: (CRON) error (grandchild #29015 failed with exit status 139)
Jan  2 10:48:04 rps01 CRON[28993]: (siot) END ([29015] /home/siot/SIOT/data_collection/stocks.py >> /home/siot/stocks.log 2>&1)
Jan  2 10:48:06 rps01 CRON[28994]: (siot) END ([29010] /home/siot/SIOT/data_collection/weather.py >> /home/siot/weather.log 2>&1)
Jan  2 10:48:07 rps01 CRON[28992]: (CRON) info (No MTA installed, discarding output)
Jan  2 10:48:07 rps01 CRON[28992]: (userL) END ([29008] /home/userL/Sensing-IOT/Main.py)
Jan  2 10:48:07 rps01 CRON[28995]: (CRON) info (No MTA installed, discarding output)
Jan  2 10:48:07 rps01 CRON[28995]: (userJ) END ([29009] /home/userJ/SIOT2018/main.py)
It reflects what I am seeing in the script logs... a Segmentation fault.

nebbles
Posts: 4
Joined: Tue Jan 01, 2019 11:50 pm

Re: Cron jobs randomly failing

Wed Jan 02, 2019 1:54 pm

In addition to the Segmentation fault, I've had two other reasons for failed script runs in the past couple hours. Most of the time as usual it works without problems...

I had one error with

Code: Select all

File "/home/siot/SIOT/data_collection/gsheet.py", line 2, in <module>
    from oauth2client.service_account import ServiceAccountCredentials
  File "/home/siot/.local/lib/python3.5/site-packages/oauth2client/service_account.py", line 25, in <module>
    from oauth2client import client
  File "/home/siot/.local/lib/python3.5/site-packages/oauth2client/client.py", line 45, in <module>
    from oauth2client import crypt
  File "/home/siot/.local/lib/python3.5/site-packages/oauth2client/crypt.py", line 45, in <module>
    from oauth2client import _openssl_crypt
  File "/home/siot/.local/lib/python3.5/site-packages/oauth2client/_openssl_crypt.py", line 16, in <module>
    from OpenSSL import crypto
  File "/usr/lib/python3/dist-packages/OpenSSL/__init__.py", line 8, in <module>
    from OpenSSL import rand, crypto, SSL
  File "/usr/lib/python3/dist-packages/OpenSSL/crypto.py", line 13, in <module>
    from cryptography.hazmat.primitives.asymmetric import dsa, rsa
  File "/usr/lib/python3/dist-packages/cryptography/hazmat/primitives/asymmetric/rsa.py", line 14, in <module>
    from cryptography.hazmat.backends.interfaces import RSABackend
  File "/usr/lib/python3/dist-packages/cryptography/hazmat/backends/__init__.py", line 7, in <module>
    import pkg_resources
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 3019, in <module>
    @_call_aside
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 3003, in _call_aside
    f(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 3032, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 646, in _build_master
    ws = cls()
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 639, in __init__
    self.add_entry(entry)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 695, in add_entry
    for dist in find_distributions(entry, True):
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2019, in find_on_path
    path_item, entry, metadata, precedence=DEVELOP_DIST
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2432, in from_location
    py_version=py_version, platform=platform, **kw
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2409, in __init__
    self.project_name = safe_name(project_name or 'Unknown')
TypeError: 'EggInfoDistribution' object is not iterable
And another with this separate one

Code: Select all

File "/home/siot/SIOT/data_collection/gsheet.py", line 3, in <module>
    import gspread
  File "/home/siot/.local/lib/python3.5/site-packages/gspread/__init__.py", line 16, in <module>
    from .client import Client
  File "/home/siot/.local/lib/python3.5/site-packages/gspread/client.py", line 12, in <module>
    import requests
  File "/home/siot/.local/lib/python3.5/site-packages/requests/__init__.py", line 95, in <module>
    from urllib3.contrib import pyopenssl
  File "/home/siot/.local/lib/python3.5/site-packages/urllib3/contrib/pyopenssl.py", line 48, in <module>
    from cryptography.hazmat.backends.openssl import backend as openssl_backend
  File "/usr/lib/python3/dist-packages/cryptography/hazmat/backends/openssl/__init__.py", line 7, in <module>
    from cryptography.hazmat.backends.openssl.backend import backend
  File "/usr/lib/python3/dist-packages/cryptography/hazmat/backends/openssl/backend.py", line 47, in <module>
    from cryptography.hazmat.backends.openssl.x509 import (
  File "/usr/lib/python3/dist-packages/cryptography/hazmat/backends/openssl/x509.py", line 22, in <module>
    class _Certificate(object):
  File "/usr/lib/python3/dist-packages/cryptography/utils.py", line 28, in register_decorator
    verify_interface(iface, klass)
  File "/usr/lib/python3/dist-packages/cryptography/utils.py", line 82, in verify_interface
    sig = signature(getattr(iface, method))
  File "/usr/lib/python3.5/inspect.py", line 2988, in signature
    return Signature.from_callable(obj, follow_wrapped=follow_wrapped)
  File "/usr/lib/python3.5/inspect.py", line 2738, in from_callable
    follow_wrapper_chains=follow_wrapped)
  File "/usr/lib/python3.5/inspect.py", line 2175, in _signature_from_callable
    obj = unwrap(obj, stop=(lambda f: hasattr(f, "__signature__")))
  File "/usr/lib/python3.5/inspect.py", line 466, in unwrap
    def _is_wrapper(f):
SystemError: expected tuple for closure, got 'tuple'
This seems to be to more and more like some sort of SD card corruption which is causing anolmalies with Segmentation faults and random python module to fail on import, but not all the time...

Return to “Troubleshooting”