jahboater
Posts: 4463
Joined: Wed Feb 04, 2015 6:38 pm

Re: Pi3 incorrect results under load (possibly heat related)

Mon Mar 14, 2016 11:35 pm

I looked at the code but unfortunately don't know ARM assembler very well. It appears to be an infinite loop that doesn't compute any residual to detect errors. If this understanding is correct, then this test provides no way to know whether computational errors were made in the cases when the system successfully throttles back.
You are right, the test does nothing to verify correctness - I mentioned it as an example of instability possible with a standard Pi3 at standard frequencies, in the open air. The Linpack suite is much better.
clivem wrote:
ejolson wrote: The fact that the system crashes before the CPU has a chance to overheat may implicate a faulty power supply or a too high resistance in the USB cable delivering the power.
Looks like I need to return 4x of the new "official" 2.5A power supplies to RS as well then......
Well it crashed after about 9 seconds, otherwise it would throttle back after about 13 seconds. It's the official Pi3 2.5A PSU with 18AWG cable. There is nothing plugged in except the PSU and I am using the on-board wifi for the ssh sessions. I'm not convinced it's anything to do with the supply voltage.

ejolson
Posts: 3078
Joined: Tue Mar 18, 2014 11:47 am

Re: Pi3 incorrect results under load (possibly heat related)

Tue Mar 15, 2016 12:11 am

jahboater wrote:
I looked at the code but unfortunately don't know ARM assembler very well. It appears to be an infinite loop that doesn't compute any residual to detect errors. If this understanding is correct, then this test provides no way to know whether computational errors were made in the cases when the system successfully throttles back.
You are right, the test does nothing to verify correctness - I mentioned it as an example of instability possible with a standard Pi3 at standard frequencies, in the open air. The Linpack suite is much better.
clivem wrote:
ejolson wrote: The fact that the system crashes before the CPU has a chance to overheat may implicate a faulty power supply or a too high resistance in the USB cable delivering the power.
Looks like I need to return 4x of the new "official" 2.5A power supplies to RS as well then......
Well it crashed after about 9 seconds, otherwise it would throttle back after about 13 seconds. It's the official Pi3 2.5A PSU with 18AWG cable. There is nothing plugged in except the PSU and I am using the on-board wifi for the ssh sessions. I'm not convinced it's anything to do with the supply voltage.
While I suspect marginal power supplies and cables might account for some of the reported problems, I would be surprised if the official power supply with the thicker wires was marginal. Maybe someone with a scope should monitor the voltage on board to see if there is any noticeable drop or increased ripple when running the Linpack benchmark using the official supply. All things considered, I'd be pretty happy if reducing the clock by 10% is all that's needed to make the 64-bit quad-core CPU stable without a heat sink.

Tom's Hardware removed the heat sink from a number of running x86 machines and demonstrated that a CPU can burn up in less than a second or two. It is possible the Pi 3B heats fast enough to experience malfunctions without a heat sink but not fast enough to actually catch on fire before throttling slows it down again.

clivem
Posts: 79
Joined: Sun Aug 03, 2014 11:18 am

Re: Pi3 incorrect results under load (possibly heat related)

Tue Mar 15, 2016 12:34 am

jahboater wrote:I'm not convinced it's anything to do with the supply voltage.
I was being sarcastic. ;) There's nowt wrong with the 4x 2.5A supplies I purchased. It's down to silicon variations between individual Pi3B's, IMHO. I've already seen 10-15degC differences between units..... Same load, same voltages, same clock, same case....

lumsdot
Posts: 119
Joined: Wed Mar 11, 2015 5:29 pm

Re: Pi3 incorrect results under load (possibly heat related)

Tue Mar 15, 2016 3:33 am

Using my brand-new, chunky Samsung Tab S2 tablet 2A charger and cable, I get a big rainbow square when I enable the OpenGL driver on boot on my Pi3.
Yet if I use another 2A phone charger and cable it works fine.

No way can it be pulling 2 amps. It must be very sensitive to something, most likely voltage.
Every charger is made differently and has different levels of ripple on the output.
Plus, using a micro USB cable to reliably pass 2 amps is asking for trouble.
Fine for charging, where dirty contacts just mean the charge may take a bit longer, but for running a live system maybe not so good.

I.e. before doing any stress tests it would be ideal to make sure the power supply is up to the job, so that it can be ruled out as part of the problem.

ejolson
Posts: 3078
Joined: Tue Mar 18, 2014 11:47 am

Re: Pi3 incorrect results under load (possibly heat related)

Tue Mar 15, 2016 5:50 am

lumsdot wrote:I.e. before doing any stress tests it would be ideal to make sure the power supply is up to the job, so that it can be ruled out as part of the problem.
As damage to the Pi 3B hardware from running too many NEON instructions has been reported, it would also be wise to install a heat sink or under clock the system before trying any computationally intensive task such as solving a linear algebra problem. If it were my Pi, I'd set the parameter arm_freq=700 in config.txt, slowly increase it until Linpack fails and then set the final value 10 percent lower.
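The tuning procedure above (start at arm_freq=700, step up until Linpack fails, then back off 10 percent from the failing value) can be sketched as a small search loop. This is only an illustration: `passes_linpack` is a hypothetical callback standing in for "edit config.txt, reboot, run Linpack, check the result", and the 50 MHz step and 1200 MHz ceiling are assumptions, not values from this post.

```python
from typing import Callable


def find_stable_arm_freq(
    passes_linpack: Callable[[int], bool],  # hypothetical: True if Linpack passes at this MHz
    start_mhz: int = 700,
    max_mhz: int = 1200,       # assumed ceiling (the advertised Pi 3 clock)
    step_mhz: int = 50,        # assumed step size
    margin_percent: int = 10,  # back-off from the first failing frequency
) -> int:
    """Step the clock up until the test fails, then return a value
    margin_percent below the failing frequency. If nothing fails,
    return max_mhz."""
    freq = start_mhz
    while freq <= max_mhz:
        if not passes_linpack(freq):
            # Integer arithmetic avoids floating-point rounding surprises.
            return freq * (100 - margin_percent) // 100
        freq += step_mhz
    return max_mhz


def config_line(freq_mhz: int) -> str:
    """Format the result as a config.txt line."""
    return f"arm_freq={freq_mhz}"
```

For example, with a board that first fails at 1150 MHz, `config_line(find_stable_arm_freq(lambda f: f < 1150))` gives `arm_freq=1035`.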

While it's educational to debug the Pi 3B hardware to figure out the heat sink, underclock and overvoltage needed for it to run reliably, the fact that different parameters might be needed for each device could also be an impediment in many educational settings.

ziddey
Posts: 19
Joined: Thu Mar 10, 2016 7:42 am

Re: Pi3 incorrect results under load (possibly heat related)

Tue Mar 15, 2016 6:43 am

Looks like mine is stable at stock voltage. -4 doesn't boot. -2 locks up after a second, and -1 after about a minute. With a small heatsink and fan, temps max out around 52°C. Previously thought mine was stable at 1.35 GHz and +6, but cpuburn locks it up after a few seconds. +8 doesn't help.

ejolson
Posts: 3078
Joined: Tue Mar 18, 2014 11:47 am

Re: Pi3 incorrect results under load (possibly heat related)

Tue Mar 15, 2016 7:14 am

ziddey wrote:Looks like mine is stable at stock voltage. -4 doesn't boot. -2 locks up after a second, and -1 after about a minute. With a small heatsink and fan, temps max out around 52°C. Previously thought mine was stable at 1.35 GHz and +6, but cpuburn locks it up after a few seconds. +8 doesn't help.
As cpuburn doesn't check whether the calculation is performed correctly, only very severe system malfunctions can be detected. As your hardware seems quite robust with stock settings, it might also get the right answer when running the Linpack linear algebra solver. It would be interesting to know for sure.
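To make the difference concrete, here is a minimal sketch of the kind of self-check a linear algebra solver can do and a pure burn loop cannot: solve a random system, then compute a scaled residual. It is a toy written in plain Python, not the Linpack code itself; the residual scaling is modelled loosely on what such benchmarks report, and the threshold of 16.0 is an assumption chosen for this example.

```python
import random
import sys


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def scaled_residual(a, x, b):
    """||Ax - b||_inf scaled by eps * (||A||_inf * ||x||_inf + ||b||_inf) * n."""
    n = len(b)
    eps = sys.float_info.epsilon
    r_inf = max(abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n))
    a_inf = max(sum(abs(v) for v in row) for row in a)
    x_inf = max(abs(v) for v in x)
    b_inf = max(abs(v) for v in b)
    return r_inf / (eps * (a_inf * x_inf + b_inf) * n)


def make_system(n=50, seed=1):
    """Build a reproducible random test system."""
    rng = random.Random(seed)
    a = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-0.5, 0.5) for _ in range(n)]
    return a, b


def passes(a, x, b, threshold=16.0):  # threshold is an assumed value
    return scaled_residual(a, x, b) < threshold
```

A correct solve gives a residual far below the threshold, while a single corrupted entry in `x` pushes it many orders of magnitude above, which is why a silent arithmetic error shows up as a failed run rather than going unnoticed.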

ziddey
Posts: 19
Joined: Thu Mar 10, 2016 7:42 am

Re: Pi3 incorrect results under load (possibly heat related)

Tue Mar 15, 2016 7:34 am

ejolson wrote:
ziddey wrote:Looks like mine is stable at stock voltage. -4 doesn't boot. -2 locks up after a second, and -1 after about a minute. With a small heatsink and fan, temps max out around 52°C. Previously thought mine was stable at 1.35 GHz and +6, but cpuburn locks it up after a few seconds. +8 doesn't help.
As cpuburn doesn't check whether the calculation is performed correctly, only very severe system malfunctions can be detected. As your hardware seems quite robust with stock settings, it might also get the right answer when running the Linpack linear algebra solver. It would be interesting to know for sure.
Looks like default is N=8000. Locked up in half a minute :(

Did a test with +2 and it passed
Last edited by ziddey on Tue Mar 15, 2016 8:21 am, edited 1 time in total.

gkreidl
Posts: 5953
Joined: Thu Jan 26, 2012 1:07 pm
Location: Germany

Re: Pi3 incorrect results under load (possibly heat related)

Tue Mar 15, 2016 7:36 am

I think it's far too early to state that. It could also be a compiler / firmware / kernel problem.
But I'm wondering a bit why nobody from the "officials" is joining this thread.

My own observations about running a high load (converting a video with HandBrake) for about an hour showed that throttling worked well. It started at 80 C, the clock was between 922 and 960 MHz most of the time and temperature never went above 83.2 C. (no heat sink yet). No crash and the converted video has no errors either. The optimized HandBrake version is compiled for RPi2 and I don't know about the compiler options being used (especially about using NEON), but as it does conversion in almost real time on a RPi3 I'm quite sure that it must use NEON.
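For anyone wanting to log temperature and clock during a run like this, a rough sketch follows. It assumes the `vcgencmd` tool is available and prints lines shaped like `temp=83.2'C` and `frequency(45)=922000000`; the exact output format may differ between firmware versions, so the parsers are deliberately loose. The helper names are made up for this example.

```python
import re
import subprocess
import time


def parse_temp(text):
    """Parse output such as "temp=83.2'C" into degrees Celsius."""
    match = re.search(r"temp=([0-9.]+)", text)
    if match is None:
        raise ValueError(f"unexpected output: {text!r}")
    return float(match.group(1))


def parse_clock_mhz(text):
    """Parse output such as "frequency(45)=922000000" into MHz."""
    match = re.search(r"frequency\(\d+\)=(\d+)", text)
    if match is None:
        raise ValueError(f"unexpected output: {text!r}")
    return int(match.group(1)) / 1e6


def vcgencmd(*args):
    """Run vcgencmd with the given arguments and return its stdout."""
    result = subprocess.run(
        ["vcgencmd", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def poll(samples=10, interval_s=1.0):
    """Print temperature and ARM clock once per interval."""
    for _ in range(samples):
        temp = parse_temp(vcgencmd("measure_temp"))
        clock = parse_clock_mhz(vcgencmd("measure_clock", "arm"))
        print(f"{temp:5.1f} C  {clock:7.1f} MHz")
        time.sleep(interval_s)
```

Calling `poll()` in a second ssh session while the load is running gives a simple trace of how quickly throttling kicks in.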

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 22733
Joined: Sat Jul 30, 2011 7:41 pm

Re: Pi3 incorrect results under load (possibly heat related)

Tue Mar 15, 2016 9:36 am

Will flag this up to Phil and Dom at the Foundation.
Principal Software Engineer at Raspberry Pi (Trading) Ltd.

lb
Posts: 256
Joined: Sat Jan 28, 2012 8:07 pm

Re: Pi3 incorrect results under load (possibly heat related)

Tue Mar 15, 2016 3:48 pm

I think I can rule out power issues in my case. With my 2000 mA rated RS Components PSU, sure enough the voltage drops from 5.13V at idle down to 4.83V under heavy load with xhpl. However, there are no dropouts below that and there's little line noise. I hooked up a scope to check this. So everything is still easily in spec.
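As a quick sanity check of the "in spec" claim, the arithmetic can be written out as below. The tolerance band is an assumption: the sketch uses the commonly quoted 5 V plus or minus 5 percent for a USB supply, and the post does not say which spec was meant.

```python
NOMINAL_V = 5.0
TOLERANCE = 0.05  # assumed +/-5% band, i.e. 4.75 V to 5.25 V


def within_tolerance(volts, nominal=NOMINAL_V, tolerance=TOLERANCE):
    """True if the measured voltage lies inside the assumed band."""
    low = nominal * (1 - tolerance)
    high = nominal * (1 + tolerance)
    return low <= volts <= high


idle_v = 5.13  # measured at idle (from the post)
load_v = 4.83  # measured under xhpl load (from the post)

droop_v = idle_v - load_v                        # about 0.30 V
margin_v = load_v - NOMINAL_V * (1 - TOLERANCE)  # about 0.08 V above the lower limit
```

Under that assumption both readings are inside the band, with roughly 80 mV of headroom left at full load.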

Edit: fixed typo.
Last edited by lb on Tue Mar 15, 2016 9:11 pm, edited 1 time in total.

joyrider3774
Posts: 19
Joined: Sun Mar 13, 2016 12:21 pm

Re: Pi3 incorrect results under load (possibly heat related)

Tue Mar 15, 2016 4:43 pm

Just a quick consideration, but could compile flags have an influence on the calculations and/or checks?

Consider the following quote from "https://gcc.gnu.org/onlinedocs/gcc-4.9. ... tions.html" in the -mfpu section:

-mfpu=name
If the selected floating-point hardware includes the NEON extension (e.g. -mfpu=‘neon’), note that floating-point operations are not generated by GCC's auto-vectorization pass unless -funsafe-math-optimizations is also specified. This is because NEON hardware does not fully implement the IEEE 754 standard for floating-point arithmetic (in particular denormal values are treated as zero), so the use of NEON instructions may lead to a loss of precision.

It's just a side note though, because I don't think the original poster used -mfpu to select a NEON extension. It also says MAY lead, so it's possible it doesn't happen at all, and lastly it's also possible the loss of precision has no impact on the calculation done by this program.

Just thought I should mention it anyway.
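To show what "denormal values are treated as zero" means in practice, here is a small single-precision simulation. It does not execute any NEON code; `flush_to_zero` is a hypothetical helper that mimics the behaviour the GCC text describes, and `to_f32` rounds a Python float to 32-bit precision.

```python
import struct

F32_MIN_NORMAL = 2.0 ** -126  # smallest normal single-precision value


def to_f32(x):
    """Round a Python float to IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def flush_to_zero(x):
    """Mimic flush-to-zero: anything smaller than the smallest normal becomes 0."""
    return 0.0 if abs(x) < F32_MIN_NORMAL else x


def sub_ieee(a, b):
    """Subtraction with gradual underflow (full IEEE 754 behaviour)."""
    return to_f32(a - b)


def sub_ftz(a, b):
    """Subtraction where denormal inputs and results are flushed to zero."""
    return flush_to_zero(to_f32(flush_to_zero(a) - flush_to_zero(b)))


a = to_f32(1.5 * F32_MIN_NORMAL)
b = to_f32(1.0 * F32_MIN_NORMAL)

ieee_diff = sub_ieee(a, b)  # 2**-127, a denormal, kept by IEEE arithmetic
ftz_diff = sub_ftz(a, b)    # 0.0, even though a != b
```

The loss only appears for values near 2**-126, so whether it matters depends entirely on the magnitudes a given program works with.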

jahboater
Posts: 4463
Joined: Wed Feb 04, 2015 6:38 pm

Re: Pi3 incorrect results under load (possibly heat related)

Tue Mar 15, 2016 5:03 pm

Just a quick consideration but could compile flags have an influence on the calculations and / or checks ?
Possibly, but only after a re-compilation.
The problem here is related to overheating and the effectiveness of the throttling mechanism.
ejolson wrote:
jojopi wrote:UPDATE: With very aggressive cooling (80mm fan blowing towards the existing heatsink), N=8000 now passes reliably for me at ~6.4Gflops, 53s.
This is a very interesting data point. Not only does the heat sink and fan make the Pi 3B run twice as fast, but it shows that without extra cooling the CPU doesn't throttle down fast enough to prevent errors when doing linear algebra. Maybe there are no 3Bs that can do this calculation reliably at 1.2 GHz without a heat sink. I wonder if there is an under clock setting that would work without the fan.
You are right about possible issues with denormals, but it would happen all the time, and the problem wouldn't go away by pointing a more powerful fan at the SoC.

Perhaps the reason why NEON keeps being mentioned is that it is so powerful. NEON on the Pi3 is quad issue and there are four cores, so it could potentially do 16 SIMD operations at once.
The cpuburn program uses the vaba instruction which subtracts two numbers, gets the absolute value and adds that to a result, on 4 separate 32 bit numbers - each instruction ...
Like the GPU it probably takes up quite a lot of the chip.
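As an illustration of what a single vaba does across its lanes, here is a plain Python emulation for four unsigned 32-bit values (absolute difference, then accumulate, wrapping at 32 bits). It is a sketch of the arithmetic only and says nothing about timing or power; the function name is made up for this example.

```python
MASK32 = 0xFFFFFFFF
LANES = 4  # four 32-bit lanes in a 128-bit register
CORES = 4  # the Pi3 has four cores


def vaba_u32(acc, n, m):
    """Per lane: acc + |n - m|, wrapped to 32 bits."""
    if not (len(acc) == len(n) == len(m) == LANES):
        raise ValueError("each operand must have exactly 4 lanes")
    return [(a + abs(x - y)) & MASK32 for a, x, y in zip(acc, n, m)]


# With every core issuing one such instruction, 16 lanes are being updated.
lanes_in_flight = LANES * CORES
```

For example, `vaba_u32([0, 10, 0xFFFFFFFF, 5], [1, 7, 3, 9], [4, 2, 1, 9])` returns `[3, 15, 1, 5]`; the third lane shows the wrap-around.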

ejolson
Posts: 3078
Joined: Tue Mar 18, 2014 11:47 am

Re: Pi3 incorrect results under load (possibly heat related)

Tue Mar 15, 2016 5:39 pm

gkreidl wrote:
I think it's far too early to state that. It could also be a compiler / firmware / kernel problem.
But I'm wondering a bit why nobody from the "officials" is joining this thread.

My own observations about running a high load (converting a video with HandBrake) for about an hour showed that throttling worked well. It started at 80 C, the clock was between 922 and 960 MHz most of the time and temperature never went above 83.2 C. (no heat sink yet). No crash and the converted video has no errors either. The optimized HandBrake version is compiled for RPi2 and I don't know about the compiler options being used (especially about using NEON), but as it does conversion in almost real time on a RPi3 I'm quite sure that it must use NEON.
I agree at the moment that there is no incontrovertible proof that running NEON optimized code has, in fact, damaged any Pi. However, I don't want to be the one who provides that proof either! There has been one report claiming damage, the cpuburn source code contains a warning
cpuburn-a53.S wrote:WARNING: improperly cooled or otherwise flawed hardware may potentially overheat and fail. Use at your own risk.
and the Tom's Hardware video clearly demonstrates damage when removing the heatsink on a running x86 system. Therefore I would be careful with any Pi 3B that doesn't have a heat sink installed.

As far as I know most video encoding software uses integer arithmetic that may not generate as much heat as the floating point arithmetic used when solving linear algebra problems. While the video may play fine, it is difficult to claim there were no errors unless you have encoded the same stream twice and done a bit-level compare on the output to ensure both encodes are identical.
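A bit-level compare of two encodes can be done by hashing both output files, roughly as sketched below. This assumes the encoder is deterministic for identical input and settings, which is not guaranteed (multithreading or embedded timestamps can change the output without any hardware fault). The function names are made up for this example.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def encodes_match(path_a, path_b):
    """True if the two files are byte-for-byte identical (by digest)."""
    return file_digest(path_a) == file_digest(path_b)
```

If `encodes_match("run1.mkv", "run2.mkv")` is false for two runs with the same source and options, that is worth investigating, though on its own it does not prove a CPU error.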

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 5268
Joined: Wed Aug 17, 2011 7:41 pm
Location: Cambridge

Re: Pi3 incorrect results under load (possibly heat related)

Tue Mar 15, 2016 5:51 pm

Can you try adding to config.txt:

over_voltage=2
and report if the test passes/fails?

ejolson
Posts: 3078
Joined: Tue Mar 18, 2014 11:47 am

Re: Pi3 incorrect results under load (possibly heat related)

Tue Mar 15, 2016 6:02 pm

dom wrote:Can you try adding to config.txt:

over_voltage=2
and report if the test passes/fails?
One person earlier in this thread reported that +2 over volt helped one system but not the other.
lb wrote:I think I can rule out power issues in my case. With my 2000 mA rated RS Components PSU, surely enough the voltage drops from 5.13V at idle down to 5.83V under heavy load with xhpl. However, there are no dropouts below that and there's little line noise. I hooked up a scope to check this. So everything is still easily in spec.
I suspect that is a typo and meant to be 4.83 volts.

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 5268
Joined: Wed Aug 17, 2011 7:41 pm
Location: Cambridge

Re: Pi3 incorrect results under load (possibly heat related)

Tue Mar 15, 2016 6:21 pm

ejolson wrote:One person earlier in this thread reported that +2 over volt helped one system but not the other.
Okay, if 2 doesn't work then try 3 or 4. If 2 works then try 1.
Just trying to get a feel if all failures can be solved by extra voltage and how much is needed.

(2 works on my board).
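The search described here (find the smallest over_voltage at which the test passes) can be sketched as follows. `passes` is a hypothetical callback standing in for "set the value, reboot, run Linpack", and `set_config_value` is an illustrative helper that edits config.txt text in memory rather than touching the real file. The 25 mV per step figure is the documented step size for over_voltage as far as I know, so treat it as an assumption.

```python
def set_config_value(config_text, key, value):
    """Return config_text with key=value set, replacing an existing
    uncommented line for that key or appending a new one."""
    lines = config_text.splitlines()
    new_line = f"{key}={value}"
    for i, line in enumerate(lines):
        if line.strip().startswith(f"{key}="):
            lines[i] = new_line
            break
    else:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


def min_passing_over_voltage(passes, candidates=(0, 1, 2, 3, 4)):
    """Return the smallest candidate for which passes(value) is True,
    or None if none of them pass."""
    for value in candidates:
        if passes(value):
            return value
    return None


def extra_millivolts(over_voltage, step_mv=25):
    """Extra core voltage implied by an over_voltage value (assumed 25 mV/step)."""
    return over_voltage * step_mv
```

For a board that needs +2, `min_passing_over_voltage(lambda v: v >= 2)` returns 2, which under the assumed step size is an extra 50 mV.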

clivem
Posts: 79
Joined: Sun Aug 03, 2014 11:18 am

Re: Pi3 incorrect results under load (possibly heat related)

Tue Mar 15, 2016 7:09 pm

dom wrote: (2 works on my board).
Dom, I was going to leave this overnight....... +4, now appears to have stabilised the board that hard-locked as soon as it got the faint whiff of NEON code, and +2 on the ">100degC" board, seems to have it stable again.

ziddey
Posts: 19
Joined: Thu Mar 10, 2016 7:42 am

Re: Pi3 incorrect results under load (possibly heat related)

Tue Mar 15, 2016 7:11 pm

Just tested 10 runs with +1. All passed. However, at stock voltage, the system locks up hard after about half a minute, so I'm tempted to run +2 just in case. Seeing a 4°C difference in load temps between +1 and +2.

deater
Posts: 27
Joined: Fri Mar 11, 2016 3:58 pm
Location: 45N

Re: Pi3 incorrect results under load (possibly heat related)

Tue Mar 15, 2016 7:25 pm

over_voltage=2 works on my board, I have done multiple linpack runs with N=10000 and they have finished properly.

I do have a small heatsink on the processor though.

clivem
Posts: 79
Joined: Sun Aug 03, 2014 11:18 am

Re: Pi3 incorrect results under load (possibly heat related)

Tue Mar 15, 2016 7:35 pm

Sometimes one has to wonder if it wouldn't have been wiser to have made an effort to get boards into testers hands before the public launch and listen to any feedback that may have resulted........ Instead of the secret squirrel, top secret bullshit, prior to launching the new hardware.......

Instead of which, it looks like the "stock" voltage to run reliably at the advertised 1.2GHz is going to need to be increased on a board, where the manufacturer already decided to skimp on the 10 cents per unit that a passive extruded ali heatsink, attached with thermal tape, would have cost in bulk.... More voltage, more heat, no "stock" thermal solution, funny..... LOL.

Heater
Posts: 12708
Joined: Tue Jul 17, 2012 3:02 pm

Re: Pi3 incorrect results under load (possibly heat related)

Tue Mar 15, 2016 7:46 pm

clivem,

I know what you mean but it's impossible to test for everything.

For example: Years back we found that on an Intel 286 if you did a multiply by an immediate value that happened to be negative whilst in "protected" mode you basically got a random number as a result. Subsequently Intel provided us, under NDA, a thick document describing all the "features" found in the 286.

Such issues have been going on for ages.

The Pi 3 could have been given to a thousand testers prior to launch and the issue under discussion here may not have been found.

If indeed it is an issue.

hippy
Posts: 5368
Joined: Fri Sep 09, 2011 10:34 pm
Location: UK

Re: Pi3 incorrect results under load (possibly heat related)

Tue Mar 15, 2016 7:48 pm

clivem wrote:Sometimes one has to wonder if it wouldn't have been wiser to have made an effort to get boards into testers hands before the public launch and listen to any feedback that may have resulted........ Instead of the secret squirrel, top secret bullshit, prior to launching the new hardware.......
I had similar views regarding the circuit errors and design issues which affected earlier Pi boards which may have been spotted and rectified earlier if exposed to wider scrutiny.

But it's the Foundation's product and their right to choose how to develop and release products.

ejolson
Posts: 3078
Joined: Tue Mar 18, 2014 11:47 am

Re: Pi3 incorrect results under load (possibly heat related)

Tue Mar 15, 2016 7:59 pm

clivem wrote:Sometimes one has to wonder if it wouldn't have been wiser to have made an effort to get boards into testers hands before the public launch and listen to any feedback that may have resulted........ Instead of the secret squirrel, top secret bullshit, prior to launching the new hardware.......

Instead of which, it looks like the "stock" voltage to run reliably at the advertised 1.2GHz is going to need to be increased on a board, where the manufacturer already decided to skimp on the 10 cents per unit that a passive extruded ali heatsink, attached with thermal tape, would have cost in bulk.... More voltage, more heat, no "stock" thermal solution, funny..... LOL.
It is reasonable to assume the board was tested by third parties under a non-disclosure agreement. Apparently the third parties chosen did not have the technical expertise or imagination to run the Linpack benchmark. It should be pointed out that the original post is apparently from a researcher in computer engineering at the University of Maine whose faculty website states
Dr Weaver wrote:My areas of interest include:
. Hardware Performance Counters
. Computer Architecture
. High Performance Computing
. Architectural Simulation
. Dynamic Binary Instrumentation
. Linux Kernel
. Embedded Systems
. Operating Systems
. Assembly Language Programming
Still, it does seem unfortunate that more rigorous testing was not performed before the unit went into production as it appears a heatsink and voltage change may be needed for every unit currently in the field.

deater
Posts: 27
Joined: Fri Mar 11, 2016 3:58 pm
Location: 45N

Re: Pi3 incorrect results under load (possibly heat related)

Tue Mar 15, 2016 8:11 pm

haha yes, but being an engineering professor doesn't necessarily make me a better tester.

I'm one of those crazy people who makes large computing clusters out of the PIs, which is why I care about Linpack performance (Specifically GFLOPS/W). I can see how that's sort of outside the normal testing area for the boards.
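For readers unfamiliar with the metric, here is a rough sketch of how the numbers relate. It uses only the leading term of the LU operation count, (2/3)N^3, and ignores the lower-order terms, so it slightly underestimates what the benchmark prints. The run figures (N=8000 in 53 s) are the ones quoted earlier in the thread; the 4.0 W power draw is purely a made-up placeholder, since no power measurement appears here.

```python
def linpack_gflops(n, seconds):
    """Approximate GFLOPS from problem size and run time,
    using only the leading (2/3) * n**3 term of the flop count."""
    flops = (2.0 / 3.0) * n ** 3
    return flops / seconds / 1e9


def gflops_per_watt(gflops, watts):
    """Efficiency metric: sustained GFLOPS divided by average power draw."""
    return gflops / watts


rate = linpack_gflops(8000, 53)             # about 6.44, consistent with the ~6.4 quoted
efficiency = gflops_per_watt(rate, 4.0)     # 4.0 W is a hypothetical placeholder
```

The point of dividing by watts is that a throttled board can still score well if its power draw drops along with its clock.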

My cluster originally started out as 24 Pi-Bs, then I had to upgrade to Pi-B+ because the power was lower, then Pi-2s because of the leap in performance. So when the Pi-3 came out I had to get one and test, because it does seem that even when throttled to 600MHz a Pi-3 still runs about 3 times as fast as a Pi-2, and has a really impressive GFLOPS/W ratio.

Though to be honest, my cluster would probably benefit a lot more from a Pi board with a sane ethernet setup.

The rapid upgrade treadmill is hurting my operating system class where we write a custom operating system for the Pi. I've been making the students stick to the B/B+ just because the changes with the 2 and now the 3 are complicated enough that trying to write bare-metal code that works on all of the various models is not really practical for an intro class.
