n7wlc
Posts: 4
Joined: Mon Sep 29, 2014 8:24 pm

cpu temperature bug

Mon Sep 29, 2014 8:32 pm

Has anyone run into a problem reading the cpu temperature from /sys/class/thermal/thermal_zone0/temp ?

My program repeatedly reads the temperature from this file, once every 20 seconds. After about an hour or so, the process hangs trying to read from this file. If I come in with a separate login session and simply cat this file, that process also hangs. Control-C will not get me out of this.

Any ideas of what could be causing this and more importantly how to get around it?

Thanks,

Mike

DougieLawson
Posts: 40214
Joined: Sun Jun 16, 2013 11:19 pm
Location: A small cave in deepest darkest Basingstoke, UK

Re: cpu temperature bug

Tue Sep 30, 2014 12:05 am

Change your program to use
open
read
close
every time round the loop.

Code: Select all

#!/usr/bin/python
import RPi.GPIO as GPIO
GPIO.setmode(GPIO.BCM)
GPIO.setup(17, GPIO.OUT)

try:
  while True:
    # open, read and close the file every time round the loop
    with open('/sys/class/thermal/thermal_zone0/temp') as tFile:
      temp = float(tFile.read())
    tempC = temp/1000
    if tempC > 43.5:
      GPIO.output(17, 1)
      print("HOT")
    else:
      GPIO.output(17, 0)
      print("COLD")
except KeyboardInterrupt:
  pass
finally:
  # release the GPIO pins however the loop ends
  GPIO.cleanup()
Criticising any questions is banned on this forum.

Any DMs sent on Twitter will be answered next month.
All fake doctors are on my foes list.

Note: Any requirement to use a crystal ball or mind reading will result in me ignoring your question.

n7wlc
Posts: 4
Joined: Mon Sep 29, 2014 8:24 pm

Re: cpu temperature bug

Fri Oct 10, 2014 7:54 pm

Thanks for the tip. My program is in C and does what you suggest. I can see how leaving open file handles will eventually fill up the table.

I modified the program by removing a status request on a separate, remote pi. That worked for 7 days. I added back the status check, and it continues to work properly. (Uptime is now 10d 22h.) I am left with an unsatisfactory explanation that a kernel update or reboot cleared something.

Thanks for the help!

Mike

ejohnfel
Posts: 7
Joined: Mon Feb 23, 2015 2:40 am
Location: Long Island NY

Re: cpu temperature bug

Mon Feb 23, 2015 6:20 am

I have been working on a Python script of some complexity. Basically, it outputs some text to an Adafruit LCD Pi Plate. Among the data it outputs are two temperatures: one read from /sys/class/thermal/thermal_zone0/temp and another read via serial from an Alamode's microcontroller that has a TMP36 rigged up to an analog line.

For about three weeks I have been trying to debug why my script randomly freezes up, sometimes after a few hours, sometimes after a day or so. I can't kill the script, a reboot always fails, and the only thing I can do is power off the Pi (Model B) to reset everything. It strongly appears to be a race condition, possibly a mutex or lock issue on the file.

After reading on a few forums that I am not the only one having trouble reading the temperature from the Pi, and based on some of those posts, I decided to switch from reading the thermal zone file to simply running vcgencmd and reading the output through a pipe, only to read further that the command and the thermal zone file seem to be interrelated.

It seems this problem is known and goes back to 2012; apparently there might also have been a patch for it. Since I update my Pis religiously, I am assuming that if there was one, my Pi has it.

Anyhow, I am running an experiment now where I have eliminated both the thermal zone file and the vcgencmd output, and rely only on the ambient temperature sensor connected to the Alamode, to see if the freezes stop. My theory is that either I simply can't use either method, or my Python script has some serious trouble, or there is something seriously wrong with the engineering of the Alamode or the LCD Pi Plate that is somehow able to freeze a process so that even rebooting or using "kill -9" can't kill it.

My tests so far tend to indicate that neither the Pi Plate nor the Alamode is the problem, since the debugging trace from the script always ends at reading the temp from the file or from vcgencmd's output.

Although my experiment isn't complete yet (it's going to take a few days to ensure complete stability of the script), I just wanted to know if anyone has any hints or best practices for accessing the thermal zone file. Basically, am I missing something here?
It ain't over until the system crashes
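
On the question of best practices for accessing the thermal zone file: one defensive pattern, sketched here under stated assumptions, is to do the read in a daemon thread and give up waiting after a timeout. The names `read_with_timeout` and `read_cpu_temp_c` are hypothetical helpers, not part of any library. This does not fix or unblock a read that is stuck inside the kernel; it only lets the main loop notice the stall, log it, and carry on with other work. The process as a whole may still be unkillable, as described above.

```python
import threading


def read_with_timeout(reader, timeout_s):
    """Run reader() in a daemon thread; return its result, or None on timeout."""
    result = {}

    def worker():
        try:
            result["value"] = reader()
        except Exception as exc:  # hand any error back to the caller
            result["error"] = exc

    thread = threading.Thread(target=worker)
    thread.daemon = True          # do not keep the interpreter alive for it
    thread.start()
    thread.join(timeout_s)        # wait at most timeout_s seconds
    if thread.is_alive():
        return None               # the read is still blocked: report a stall
    if "error" in result:
        raise result["error"]
    return result["value"]


def read_cpu_temp_c(path="/sys/class/thermal/thermal_zone0/temp"):
    """Read the millidegree value from sysfs and convert it to degrees C."""
    with open(path) as f:
        return float(f.read()) / 1000.0
```

In a polling loop one might call `read_with_timeout(read_cpu_temp_c, 5.0)` and treat a `None` result as "sensor read stalled" rather than letting the whole script freeze at that point.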

ejohnfel
Posts: 7
Joined: Mon Feb 23, 2015 2:40 am
Location: Long Island NY

Re: cpu temperature bug

Sun Mar 01, 2015 8:13 am

Ok, six days and change, no crash or freeze... must be something wrong with Pi's temp sensor.

While I can live without the internal CPU temp sensor since I have an external one for the ambient temperature, it is a little disappointing that I cannot monitor the CPU temp. The ambient sensor monitors how hard some external hardware is working, which will no doubt cause the Pi to heat up. I was hoping to use the CPU temperature to help moderate and protect the Pi from damage should the external hardware get too hot... now I will just have to guesstimate it... :(

- Eric

DirkS
Posts: 10449
Joined: Tue Jun 19, 2012 9:46 pm
Location: Essex, UK

Re: cpu temperature bug

Sun Mar 01, 2015 9:01 am

ejohnfel wrote:Ok, six days and change, no crash or freeze... must be something wrong with Pi's temp sensor.
Well, it's unlikely that you'll get any response if you don't post your code...

ejohnfel
Posts: 7
Joined: Mon Feb 23, 2015 2:40 am
Location: Long Island NY

Re: cpu temperature bug

Thu Mar 05, 2015 6:15 am

As a debugging measure, I recoded the function to be a bit more explicit, and later, to ensure I cleared the file's input buffer out I added an extra read... anyhow... code for reading from the class file...

# Get CPU Temperature
def CPUTemperature(self):
value = 0.0

DebugMsg("Entering CPUTemperature")

try:
file = open("/sys/class/thermal/thermal_zone0/temp","r")
value = float(file.readline().strip("\n"))
buffer = file.read(256) # Clear Buffer
file.close()
except Exception as e:
DebugMsg("CPUTemperature - An Error Occurred: {0}".format(e))

temperature = value * (10.0 ** -3)

return (temperature)

I also created another function in the same python class to use vcgencmd via a pipe... this function experiences the freezing issue as well, I created this as an alternative to the first (ie. I don't use them both at the same time)...

# Get GPU Temperature
def GPUTemperature(self):
value = 0.0
data = ""

try:
process = subprocess.Popen([ "/opt/vc/bin/vcgencmd", "measure_temp" ], stdout=subprocess.PIPE)
line = process.communicate()[0]
data = line[5:9]
value = float(data)
process.wait()
except Exception as e:
DebugMsg("GPUTemperature - An Error Occurred: {0}, {1}/{2}".format(e,line,data))

return value

Lastly, something interesting... this whole code base in Python is meant to replace a set of shell scripts that are already running on another Pi inside a project. I realized two days ago that... the shell script has been running for months with no problems... I checked the syslogs to be sure, at least it has been running for the last 7 days with no issues.

temperature=`vcgencmd measure_temp | cut -d"'" -f1,2 --output-delimiter=" " | cut -d"=" -f2`

This line in the script seems to work...
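
For comparison with the shell pipeline, here is a minimal Python sketch of the same extraction. It assumes the command prints text of the form `temp=41.2'C`, which is what the `cut` pipeline above implies; `parse_measure_temp` is a hypothetical helper name. Matching on the `temp=` prefix avoids depending on a fixed character range, which could matter if the number of digits ever changes.

```python
import re


def parse_measure_temp(output):
    """Extract degrees C from text like "temp=41.2'C"; return None if absent."""
    match = re.search(r"temp=([0-9]+(?:\.[0-9]+)?)", output)
    if match is None:
        return None
    return float(match.group(1))


print(parse_measure_temp("temp=41.2'C\n"))
```

Under Python 3 the bytes returned by `communicate()` would need decoding to text before being passed in.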

The main difference between the scripts, aside from one being a Bash shell script and the other Python, is that the Bash script runs every 10 minutes from a cron job under 'root', whereas the Python script is intended to be launched at boot, remain running as a service, and poll the temp every 15 seconds. However, since the Python script is not finished, it is currently being run as 'root' from the command line *and*, due to my troubles, I changed the poll time to every 15 minutes instead of every 15 seconds... makes me wonder if the Python script is leaking handles... The current temperature reads are coming from a serial connection to a microcontroller with a TMP36 sensor, so they differ from the PIPE and file methods... it has now been running 10 days with no freezes... so I am pretty sure the issue is around the PIPE or file reads.

One last detail: the currently running project reboots every night at midnight (to be sure any brain-dead ASICs are reset), whereas the test bed is powered for days on end with no reboots (and should, incidentally, detect the ASICs dropping out and correct the problem immediately with a reboot or a power cycle; so far the one ASIC on the test bed has had no issues, so no reboots).

I'm going to run a few more tests... but if there is anything glaringly wrong in the code above, please let me know.
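
One way to test the handle-leak suspicion directly, sketched here as an assumption rather than a diagnosis: on Linux, a process can count its own open file descriptors by listing `/proc/self/fd`. The helper name `open_fd_count` is hypothetical. Logging this number alongside each temperature read would show whether it creeps upward over hours of polling.

```python
import os


def open_fd_count():
    """Count this process's open file descriptors (Linux: /proc/self/fd)."""
    # Fallback to /dev/fd is an assumption for systems without /proc.
    fd_dir = "/proc/self/fd" if os.path.isdir("/proc/self/fd") else "/dev/fd"
    return len(os.listdir(fd_dir))


before = open_fd_count()
leaked = open(os.devnull)      # deliberately left open to show the effect
during = open_fd_count()
leaked.close()
after = open_fd_count()
print(before, during, after)
```

The absolute numbers include the descriptor used for the listing itself, so only the change between calls is meaningful.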

rpdom
Posts: 17727
Joined: Sun May 06, 2012 5:17 am
Location: Chelmsford, Essex, UK

Re: cpu temperature bug

Thu Mar 05, 2015 6:51 am

Your code is hard to read without proper formatting, as the indentation has been lost.
Please put "[code]" before the code and "[/code]" after it to make it look like this

Code: Select all

try:
    file = open("/sys/class/thermal/thermal_zone0/temp","r")
    value = float(file.readline().strip("\n"))
    buffer = file.read(256) # Clear Buffer
    file.close()
except Exception as e:
    DebugMsg("CPUTemperature - An Error Occurred: {0}".format(e))

hippy
Posts: 8558
Joined: Fri Sep 09, 2011 10:34 pm
Location: UK

Re: cpu temperature bug

Thu Mar 05, 2015 12:59 pm

I went for something even simpler. I have left this running and will report back later.

Code: Select all

#!/usr/bin/python
import time
while True:
  with open("/sys/class/thermal/thermal_zone0/temp","r") as f:
    print time.ctime(), f.readline().strip()
  time.sleep(0.1)

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 27436
Joined: Sat Jul 30, 2011 7:41 pm

Re: cpu temperature bug

Thu Mar 05, 2015 1:09 pm

Not familiar with Python - does that close the file when it goes out of scope?
Principal Software Engineer at Raspberry Pi (Trading) Ltd.
Contrary to popular belief, humorous signatures are allowed.
I've been saying "Mucho" to my Spanish friend a lot more lately. It means a lot to him.

DirkS
Posts: 10449
Joined: Tue Jun 19, 2012 9:46 pm
Location: Essex, UK

Re: cpu temperature bug

Thu Mar 05, 2015 1:26 pm

jamesh wrote:Not familiar with Python - does that close the file when it goes out of scope?
From http://www.tutorialspoint.com/python/py ... les_io.htm
Python automatically closes a file when the reference object of a file is reassigned to another file.
So I think that's a 'yes'.

hippy
Posts: 8558
Joined: Fri Sep 09, 2011 10:34 pm
Location: UK

Re: cpu temperature bug

Thu Mar 05, 2015 5:35 pm

jamesh wrote:Not familiar with Python - does that close the file when it goes out of scope?
Yes, that's correct. That code has been running for some six hours now with no lock-ups or oddities or issues.
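
For readers with the same question, a small sketch of what the `with` form does. It uses a temporary file as a stand-in for the thermal zone file (an assumption, so the example runs on any machine); the point is only that the file object reports itself closed once the block ends, including when the body raises an exception.

```python
import os
import tempfile

# Stand-in for /sys/class/thermal/thermal_zone0/temp (assumption: any
# readable text file behaves the same for this demonstration).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as tmp:
    tmp.write("41234\n")

with open(path, "r") as f:
    millidegrees = int(f.readline().strip())
    inside = f.closed          # False: still open inside the block

outside = f.closed             # True: closed when the block exits

# The file is also closed when the body raises an exception.
try:
    with open(path, "r") as g:
        raise ValueError("simulated failure while reading")
except ValueError:
    pass
closed_after_error = g.closed  # True

os.remove(path)
print(millidegrees / 1000.0, inside, outside, closed_after_error)
```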

ejohnfel
Posts: 7
Joined: Mon Feb 23, 2015 2:40 am
Location: Long Island NY

Re: cpu temperature bug

Fri Mar 06, 2015 3:51 am

Hi All,

Thanks for the help, I will remember the code tag for the future.

I also used "with" in an earlier iteration of the code, I went more explicit to test a theory.

Anyhow, while I was mulling this problem over it occurred to me that the one other item different between the test bed and the live project is that the test bed has a Wyolum Alamode on the Pi (and both have an Adafruit 16x2 LCD).

One other "weird" thing I noticed is that after long runs, I can't shut down or reboot the Pi. Literally, issuing "reboot" does nothing. I have a whole bunch of various model Pis and I haven't experienced that on any of the others.

Anyhow, I decided to remove the Alamode and comment out all the code that talks to it, just on a hunch. So far, it's been running almost 10 hours.

I am going to let it run for a few days just to be sure, but at this point I'm thinking the Alamode (or Alamode and LCD Combo) might be causing the problem... but it escapes me how either scenario would cause reading the temp class file to fail randomly.

Anyhow, thank you for looking into this, I will post back with any news.

ktb
Posts: 1447
Joined: Fri Dec 26, 2014 7:53 pm

Re: cpu temperature bug

Fri Mar 06, 2015 3:59 am

I run this script from .conkyrc

Code: Select all

${execi 999 /opt/scripts/get_cpu_temp.sh } °C
/opt/scripts/get_cpu_temp.sh

Code: Select all

#!/bin/bash
cpuTemp0=$(cat /sys/class/thermal/thermal_zone0/temp)
cpuTemp1=$(($cpuTemp0/1000))
cpuTemp2=$(($cpuTemp0/100))
cpuTempM=$(($cpuTemp2 % 10))
echo ${cpuTemp1}'.'${cpuTempM}
I haven't ever noticed a problem like OP mentioned.
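
The same millidegree-to-one-decimal conversion can be sketched in Python. `format_millidegrees` is a hypothetical helper name, and the sketch assumes a non-negative reading such as `41234` (meaning 41.234 °C). It truncates to one decimal place rather than rounding, taking the tenths digit with `% 10`.

```python
def format_millidegrees(raw):
    """Format a millidegree reading such as '41234' as 'DD.D' degrees C."""
    millidegrees = int(str(raw).strip())
    whole = millidegrees // 1000          # whole degrees, e.g. 41
    tenths = (millidegrees // 100) % 10   # first decimal digit, e.g. 2
    return "{0}.{1}".format(whole, tenths)


print(format_millidegrees("41234\n"))
```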

hippy
Posts: 8558
Joined: Fri Sep 09, 2011 10:34 pm
Location: UK

Re: cpu temperature bug

Fri Mar 06, 2015 10:20 am

My test ran for 12 hours without a problem but it could be that its sleep() is leading to it being out of phase with whatever might be causing an issue. I am now running it with more variable timing and no sleep() to see what happens. No problems so far.

hippy
Posts: 8558
Joined: Fri Sep 09, 2011 10:34 pm
Location: UK

Re: cpu temperature bug

Fri Mar 06, 2015 5:38 pm

Eight hours on and no issues that I can see. If there were some timing or mutex issues with accessing the file I would have expected to have encountered them during that time.

ejohnfel
Posts: 7
Joined: Mon Feb 23, 2015 2:40 am
Location: Long Island NY

Re: cpu temperature bug

Fri Mar 06, 2015 7:42 pm

Hi All,

Woke up this morning, the Python script was frozen again.

I think I am going to get a fresh OS image for the Pi, and also run a parallel test on another Pi to see if the problem will reproduce or not.

ejohnfel
Posts: 7
Joined: Mon Feb 23, 2015 2:40 am
Location: Long Island NY

Re: cpu temperature bug

Mon Apr 06, 2015 1:51 am

Hi All,

First, thanks to everyone who looked into this and offered help.

After many different experiments and some help from a few folks over at the Element14.com community, I can say I know what is causing the problem, but not really how to fix it. I have some ideas, but those are perhaps for another iteration of the project. For the time being I will only use the temp sensor connected to the Alamode.

The long and the short of it is, there is something about the Adafruit LCD Plate (or its code) that the Pi does not like. In all iterations of my experiments, any time I attempt to read from the CPU sensor and write to the LCD, I get random freezes. Remove the LCD plate from the equation (either remove the plate from the Pi or simply don't use it) and there are no problems.

A community member over at Element14 may have struck on the possible issue...
I2C spec requires a "clock high" of 5 microseconds when running with a 100kHz clock. But when the slave stretches the clock for 4.9, 9.9 or 14.9 microseconds, the broadcom will look at the clock (at 5, 10, or 15 microseconds) and see: Hey: it's high, so I can continue with the next LOW cycle. This results in a clock of only 0.1 microseconds, with "funny" results. Broadcom has known about this bug in the 2835, and apparently decided not to fix it for the BCM2836.
While I have not confirmed this myself, the explanation makes some sense, as there is a clear difference in the performance of the new libs and the old ones. The new ones are visibly slower, which makes me think it's a timing issue related to the new libs causing some bad I2C mojo.
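
The arithmetic in the quoted explanation can be sketched as follows. This is a simplified model of the description only, not of the actual Broadcom hardware: it assumes the master looks at the clock line once per 5 microsecond half period and proceeds if the line is high at that instant. The function name `clock_high_ns` and the use of nanoseconds are choices made for this illustration.

```python
HALF_PERIOD_NS = 5000  # 100 kHz clock: 5 us high, 5 us low


def clock_high_ns(stretch_ns, half_period_ns=HALF_PERIOD_NS):
    """High time left if the master only samples once per half period."""
    # First sampling point strictly after the slave releases the clock.
    next_sample_ns = (stretch_ns // half_period_ns + 1) * half_period_ns
    return next_sample_ns - stretch_ns


for stretch in (0, 4900, 9900, 14900):
    print(stretch, clock_high_ns(stretch))
```

With no stretching the model gives the full 5 microsecond high time; with a stretch of 4.9, 9.9 or 14.9 microseconds it gives the 0.1 microsecond pulse mentioned in the quote.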

Anyhow, I am going to ping Adafruit to see what they think. Perhaps I can either confirm this or find a work-around.

mrkorb
Posts: 4
Joined: Fri Feb 13, 2015 2:02 am
Location: Tigard, OR, USA

Re: cpu temperature bug

Mon Aug 24, 2015 1:21 pm

Just want to throw my 2 cents in with a similar experience I've been having. I'm presuming you've heard of the RetroPie project that bundles a whole bunch of video game emulators together into one kickass little pi shaped package? Ok, well a feature of that project is a nifty little ASCII art display that shows up when you log into your pi at the console and tells you things like your uptime, disk space used, IP address...oh hell, I'll just paste it in here.

Code: Select all

   .~~.   .~~.    Monday, 24 August 2015, 05:38:22 AM
  '. \ ' ' / .'   Linux 3.18.11-v7+ armv7l GNU/Linux
   .~ .~~~..~.
  : .~.'~'.~. :   Filesystem      Size  Used Avail Use% Mounted on
 ~ (   ) (   ) ~  /dev/root        30G  4.6G   24G  17% /
( : '~'.~.'~' : ) Uptime.............: 0 days, 00h02m10s
 ~ .~       ~. ~  Memory.............: 643272kB (Free) / 754336kB (Total)
  (   |   |   )   Running Processes..: 100
  '~         ~'   IP Address.........: 192.168.1.141
    *--~-~--*     Temperature........: CPU: 41°C/105°F GPU: 41°C/105°F
                  The RetroPie Project, http://www.petrockblock.com
Neat, right? Ok. So it uses 'cat /sys/class/thermal/thermal_zone0/temp' to get the system temperature, and I noticed after I installed RetroPie on my existing Raspbian install that it would lock up when I would try logging in after a long period of uptime. Pull the plug, and I could log in right away with no problem, but let it go for a day or two and the system would lock up precisely at the point where the above display should have popped up on my screen. I disabled the 'bashwelcometweak', as it is known in the RetroPie package, and the login problem vanished as well.

I took my concern over this to the RetroPie GitHub and was told that it was probably unique to me alone. I finally got around to debugging it just now, and found it got hung up on 'cat /sys/class/thermal/thermal_zone0/temp'. So some Googling led me to here, and I think I found the cause of my problem. I have a Piglow LED board attached to my Pi, which is running a binary clock script. It's obviously something more simple than an Adafruit LCD readout is, but they're both things that attach to the Pi, and it's not too much of a stretch to see that the 'cat /sys/class/thermal/thermal_zone0/temp' command appears to be affected by them similarly.

tl;dr don't use 'cat /sys/class/thermal/thermal_zone0/temp' when you have something attached to your pi, because it locks up

ejohnfel
Posts: 7
Joined: Mon Feb 23, 2015 2:40 am
Location: Long Island NY

Re: cpu temperature bug

Fri Aug 28, 2015 2:25 am

Hi mrkorb,

At least this proves I'm not crazy. :lol: I gave up trying to read the CPU temp with the LCD attached, I moved on to an external TMP36 sensor to sample the ambient temperature... not quite as good as measuring the CPU temp to monitor its performance, but I don't need it to be that accurate. For me, gross measurements are enough, I am only looking to protect the rig from overheating (say, should a fan fail or something of that nature).

rpdom
Posts: 17727
Joined: Sun May 06, 2012 5:17 am
Location: Chelmsford, Essex, UK

Re: cpu temperature bug

Fri Aug 28, 2015 4:54 am

mrkorb wrote: It's obviously something more simple than an Adafruit LCD readout is, but they're both things that attach to the Pi, and it's not too much of a stretch to see that the 'cat /sys/class/thermal/thermal_zone0/temp' command appears to be affected by them similarly.
More specifically they are both things that attach to the Pi using I2C. So it seems to be an issue with the I2C driver/hardware.

ShiftPlusOne
Raspberry Pi Engineer & Forum Moderator
Posts: 6259
Joined: Fri Jul 29, 2011 5:36 pm
Location: The unfashionable end of the western spiral arm of the Galaxy

Re: cpu temperature bug

Fri Aug 28, 2015 5:35 am

This problem has been fixed.
https://github.com/raspberrypi/firmware/issues/192

Edit: I should elaborate that you need to run 'apt-get update && apt-get install raspberrypi-bootloader && reboot' to get the fix...
