ejolson
Posts: 3424
Joined: Tue Mar 18, 2014 11:47 am

Re: Pi3 incorrect results under load (possibly heat related)

Fri Mar 18, 2016 5:59 pm

deater wrote: And when I up the voltage in config.txt it runs fine.

I'm not sure what compiling for single thread armv6 would do at all, except potentially stress the CPU a bit less.
It is interesting that +2 resolves the errors for your Pi 3; I think others couldn't fix things with any overvolt and had to resort to underclocking.

For many, the interest in running ARMv6 code is binary compatibility with older models of Pi. This is useful in code clubs where SD cards may be swapped between devices and in heterogeneous PiNet setups where different models of Pi are running binaries from the same network file system. Also interesting is the possibility that no overvolt settings would be necessary on the Pi 3 when running ARMv6 code, and how much performance is lost.

When I compiled the Intel/MIT Cilkplus runtime for ARM, I chose suboptimal memory fence instructions and ARMv6 compatibility so any program linked against the library could run on any Pi model. From what I could tell, this affected performance of the library (which implements work stealing algorithms for threads) by less than 5 percent.

clivem
Posts: 79
Joined: Sun Aug 03, 2014 11:18 am

Re: Pi3 incorrect results under load (possibly heat related)

Fri Mar 18, 2016 9:34 pm

deater wrote: What exactly are you getting at here?
I hope you don't think I am ignoring you. I can't really answer that question without posting what I have found, which I don't want to do before I have all my ducks in a row. Suffice to say, that I am now more intimately familiar with Linpack, openblas, vfpv3, neon, pthreads, openmp, than I ever wanted (or needed) to be......

binaryhermit
Posts: 54
Joined: Sun Apr 13, 2014 1:26 am
Location: Lockport, Illinois
Contact: Website

Re: Pi3 incorrect results under load (possibly heat related)

Sat Mar 19, 2016 2:14 am

ejolson wrote: Tom's Hardware removed the heat sink from a number of running x86 machines and demonstrated that a CPU can burn up in less than a second or two. It is possible the Pi 3B heats fast enough to experience malfunctions without a heat sink but not fast enough to actually catch on fire before throttling slows it down again.
That reminded me of (WARNING: Foul language and German accents) https://www.youtube.com/watch?v=ssL1DA_K0sI

Apparently overclocking a MHz war era AMD processor to ~3.8 GHz and removing the heatsink can cause the CPU to explode.

EDIT: Apparently it's a "Spitfire" AMD Duron (initially clocked somewhere between 600 and 950 MHz according to Wikipedia). And upon further review, the screen images may be fake, since in my opinion it seems unlikely you could boot a 950 MHz processor overclocked to over 3.8 GHz no matter what cooling you used.

Still, exploding CPU.

jahboater
Posts: 4607
Joined: Wed Feb 04, 2015 6:38 pm

Re: Pi3 incorrect results under load (possibly heat related)

Sat Mar 19, 2016 9:24 am

Another data point:
At stock frequencies, with a heatsink (25C/W), it required over_voltage=2 to pass linpack at N=10000.
+1 failed, and 0 froze the machine.

Other stuff (such as a five hour compilation with make -j5) runs fine with the stock voltage.
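For a sense of scale on the over_voltage values reported in this thread, here is a minimal sketch that converts a setting into a voltage offset. It assumes each over_voltage step is 0.025 V, as described in the Raspberry Pi config.txt documentation; the function name is made up for illustration and the absolute core voltage is deliberately not computed.

```python
# Assumption: one over_voltage step equals 0.025 V (per the config.txt
# documentation). Only the offset is computed, not the absolute voltage.
STEP_VOLTS = 0.025


def over_voltage_offset(setting: int) -> float:
    """Return the voltage offset in volts for an over_voltage setting."""
    return setting * STEP_VOLTS


# Print the offsets for the settings discussed in this thread.
for setting in (0, 1, 2, 3, 4):
    millivolts = over_voltage_offset(setting) * 1000
    print(f"over_voltage={setting}: +{millivolts:.0f} mV")
```

Under that assumption, the difference between a setting that freezes (0) and one that passes (2) is only about 50 mV.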

Nickcn
Posts: 200
Joined: Sat Mar 05, 2016 8:18 pm
Location: USA

Re: Pi3 incorrect results under load (possibly heat related)

Mon Mar 21, 2016 8:49 pm

delete

ziddey
Posts: 19
Joined: Thu Mar 10, 2016 7:42 am

Re: Pi3 incorrect results under load (possibly heat related)

Mon Mar 21, 2016 9:00 pm

Mine passed N=8000 with +1 (8-hour loop). However, just tested N=10000 and it locked up on the first pass. So far so good with +2. :roll:

Nickcn
Posts: 200
Joined: Sat Mar 05, 2016 8:18 pm
Location: USA

Re: Pi3 incorrect results under load (possibly heat related)

Thu Mar 24, 2016 7:25 pm

deleted
Last edited by Nickcn on Mon May 16, 2016 2:52 am, edited 1 time in total.

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 5318
Joined: Wed Aug 17, 2011 7:41 pm
Location: Cambridge

Re: Pi3 incorrect results under load (possibly heat related)

Thu Mar 24, 2016 8:41 pm

Nickcn wrote: So I am just wondering, when I try to set arm freq to 1300, and it locks up, does that indicate that temps are just getting too high too fast? Or is something wrong with this config (lowering or raising over_voltage also causes crash/reboot during cpuburn).
It's not just temperature that matters. If you clock your Pi3 at 2GHz it's not going to run even if you cool it to 0°C.
It takes time for signals to propagate across the silicon and that limits the maximum frequency.
It may just be that 1275MHz is the limit for your Pi.
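The propagation argument can be put into numbers with a short sketch: the clock period is the time budget a signal has to settle in each cycle, and it shrinks as the frequency goes up. This only computes 1/f for the frequencies mentioned in this thread; it does not model the actual critical path of the Pi 3's SoC, and the function name is purely illustrative.

```python
def clock_period_ns(freq_mhz: float) -> float:
    """Return the duration of one clock cycle in nanoseconds."""
    return 1000.0 / freq_mhz


# Stock, the reported limit, the failing overclock, and the 2 GHz example.
for freq_mhz in (1200, 1275, 1300, 2000):
    print(f"{freq_mhz} MHz -> {clock_period_ns(freq_mhz):.3f} ns per cycle")
```

Going from 1200 MHz to 2000 MHz cuts the per-cycle budget from roughly 0.83 ns to 0.5 ns, and cooling does not give that time back.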

ejolson
Posts: 3424
Joined: Tue Mar 18, 2014 11:47 am

Re: Pi3 incorrect results under load (possibly heat related)

Thu Apr 07, 2016 4:41 am

clivem wrote:
deater wrote: What exactly are you getting at here?
I hope you don't think I am ignoring you. I can't really answer that question without posting what I have found, which I don't want to do before I have all my ducks in a row. Suffice to say, that I am now more intimately familiar with Linpack, openblas, vfpv3, neon, pthreads, openmp, than I ever wanted (or needed) to be......
There haven't been any updates on this thread for a while, so I'm wondering whether the problem of incorrect answers was fixed and affected only a few Pi 3 units, or whether the problem is widespread but most people don't have the interest to diagnose it using the linpack linear algebra solver. Any replies or independent tests of additional non-overclocked Pi 3's using the linpack program linked earlier in this thread would be welcome.

Gerd
Posts: 66
Joined: Wed Mar 16, 2016 10:48 am
Location: Europe

Re: Pi3 incorrect results under load (possibly heat related)

Sat Apr 23, 2016 6:32 am

ejolson wrote: There haven't been any updates on this thread for a while, so I'm wondering whether the problem of incorrect answers was fixed and affected only a few Pi 3 units, or whether the problem is widespread but most people don't have the interest to diagnose it using the linpack linear algebra solver. Any replies or independent tests of additional non-overclocked Pi 3's using the linpack program linked earlier in this thread would be welcome.
Please find attached the documentation of my tests. I needed over_voltage=3 at stock (arm=1200) to get no crashes. Interestingly, all crashes occurred before throttling could kick in. I never had a result that did not pass the test; it was plain digital: crash or good.
[Attachment: linpack Raspi.png]
Edit:
Did some more tests, this time with all overclocking removed, including the RAM. I also ran N=10000, which passed after overvolting to 2. I also see that the RAM overclock yields 0.5 Gflops, but needs overvolting to 3.
[Attachment: linpack raspi1.png]
The big point to me is:
with over_voltage=0 I can run sysbench and memtester for hours (without a fan), and also cpuburn (with a fan),
BUT linpack needs 2.
So being able to run cpuburn for hours only means that you have a good fan, nothing more.

Rive
Posts: 586
Joined: Sat Mar 26, 2016 5:21 pm
Location: USA

Re: Pi3 incorrect results under load (possibly heat related)

Sat Apr 23, 2016 9:13 pm

My pi3:
viewtopic.php?f=63&t=144391&p=952371#p952371

My Overclock (Ambient: 20C; Idle: 28C; Full Load with Stress 43C; Full Load with cpuburn-a53 61C; With Linpack 54C)

Code: Select all

dtparam=sd_overclock=100
arm_freq=1260
core_freq=500
over_voltage=4
sdram_freq=575
sdram_schmoo=0x02000020
over_voltage_sdram_p=6
over_voltage_sdram_i=4
over_voltage_sdram_c=4
v3d_freq=500
h264_freq=333
linpack on overclocked pi3 @ 1.26 GHz N=8000 Gflop 6.74

Code: Select all

[email protected]:~ $ ./xhpl
================================================================================
HPLinpack 2.1  --  High-Performance Linpack benchmark  --   October 26, 2012
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :    8000 
NB     :     256 
PMAP   : Row-major process mapping
P      :       1 
Q      :       1 
PFACT  :    Left 
NBMIN  :       2 
NDIV   :       2 
RFACT  :   Right 
BCAST  :   2ring 
DEPTH  :       0 
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR02R2L2        8000   256     1     1              50.63              6.743e+00
HPL_pdgesv() start time Sat Apr 23 14:38:56 2016

HPL_pdgesv() end time   Sat Apr 23 14:39:47 2016

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0025941 ...... PASSED
================================================================================

Finished      1 tests with the following results:
              1 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
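The scaled residual check printed in the output above can be illustrated on a small system. The following is a sketch in plain Python, not HPL's own code: it solves a random 50x50 system with a textbook Gaussian elimination (an assumed stand-in for the real solver), evaluates the same formula with infinity norms, and applies the same threshold of 16.0. All function names are made up for the example.

```python
import random
import sys


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[pivot] = m[pivot], m[k]
        for r in range(k + 1, n):
            factor = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= factor * m[k][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def scaled_residual(a, x, b):
    """||Ax-b||_oo / (eps * (||x||_oo * ||A||_oo + ||b||_oo) * N)."""
    n = len(b)
    eps = sys.float_info.epsilon / 2  # 2**-53, about 1.110223e-16
    resid = max(abs(sum(a[i][j] * x[j] for j in range(n)) - b[i])
                for i in range(n))
    norm_a = max(sum(abs(v) for v in row) for row in a)  # max row sum
    norm_x = max(abs(v) for v in x)
    norm_b = max(abs(v) for v in b)
    return resid / (eps * (norm_x * norm_a + norm_b) * n)


random.seed(0)
n = 50
a = [[random.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
b = [random.uniform(-0.5, 0.5) for _ in range(n)]
x = solve(a, b)
r = scaled_residual(a, x, b)
print(f"scaled residual = {r:.7f}", "PASSED" if r < 16.0 else "FAILED")

# Simulate a single wrong value in the solution, as a faulty CPU might produce.
x_bad = x[:]
x_bad[0] += 1e-3
r_bad = scaled_residual(a, x_bad, b)
print(f"corrupted residual = {r_bad:.3e}", "PASSED" if r_bad < 16.0 else "FAILED")
```

Because the residual is divided by the machine epsilon, even a tiny error in one element of the solution pushes the value far above 16.0, which is why this check is sensitive to the kind of incorrect results discussed in this thread.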
N=8000 Gflop 6.6 with overclocks appears to be stable (no fails)

Code: Select all

[email protected]:~ $ ./xhpl
================================================================================
HPLinpack 2.1  --  High-Performance Linpack benchmark  --   October 26, 2012
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :    8000 
NB     :     256 
PMAP   : Row-major process mapping
P      :       1 
Q      :       1 
PFACT  :    Left 
NBMIN  :       2 
NDIV   :       2 
RFACT  :   Right 
BCAST  :   2ring 
DEPTH  :       0 
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR02R2L2        8000   256     1     1              51.27              6.660e+00
HPL_pdgesv() start time Sun Apr 24 09:13:21 2016

HPL_pdgesv() end time   Sun Apr 24 09:14:12 2016

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0025941 ...... PASSED
================================================================================

Finished      1 tests with the following results:
              1 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================
[email protected]:~ $ ./xhpl
================================================================================
HPLinpack 2.1  --  High-Performance Linpack benchmark  --   October 26, 2012
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :    8000 
NB     :     256 
PMAP   : Row-major process mapping
P      :       1 
Q      :       1 
PFACT  :    Left 
NBMIN  :       2 
NDIV   :       2 
RFACT  :   Right 
BCAST  :   2ring 
DEPTH  :       0 
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR02R2L2        8000   256     1     1              51.47              6.634e+00
HPL_pdgesv() start time Sun Apr 24 09:14:53 2016

HPL_pdgesv() end time   Sun Apr 24 09:15:44 2016

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0025941 ...... PASSED
================================================================================

Finished      1 tests with the following results:
              1 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================
[email protected]:~ $ ./xhpl
================================================================================
HPLinpack 2.1  --  High-Performance Linpack benchmark  --   October 26, 2012
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :    8000 
NB     :     256 
PMAP   : Row-major process mapping
P      :       1 
Q      :       1 
PFACT  :    Left 
NBMIN  :       2 
NDIV   :       2 
RFACT  :   Right 
BCAST  :   2ring 
DEPTH  :       0 
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR02R2L2        8000   256     1     1              51.69              6.605e+00
HPL_pdgesv() start time Sun Apr 24 09:16:14 2016

HPL_pdgesv() end time   Sun Apr 24 09:17:06 2016

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0025941 ...... PASSED
================================================================================

Finished      1 tests with the following results:
              1 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
Stock at 1.2 GHz (over_voltage=4) N=8000 Gflop 6.25

Code: Select all

[email protected]:~ $ ./xhpl
================================================================================
HPLinpack 2.1  --  High-Performance Linpack benchmark  --   October 26, 2012
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :    8000 
NB     :     256 
PMAP   : Row-major process mapping
P      :       1 
Q      :       1 
PFACT  :    Left 
NBMIN  :       2 
NDIV   :       2 
RFACT  :   Right 
BCAST  :   2ring 
DEPTH  :       0 
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR02R2L2        8000   256     1     1              54.59              6.255e+00
HPL_pdgesv() start time Sat Apr 23 14:55:29 2016

HPL_pdgesv() end time   Sat Apr 23 14:56:23 2016

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0025941 ...... PASSED
================================================================================

Finished      1 tests with the following results:
              1 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
Stock at 1.2 GHz (over_voltage=1) N=8000 Gflop 6.16

Code: Select all

[email protected]:~ $ ./xhpl
================================================================================
HPLinpack 2.1  --  High-Performance Linpack benchmark  --   October 26, 2012
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :    8000 
NB     :     256 
PMAP   : Row-major process mapping
P      :       1 
Q      :       1 
PFACT  :    Left 
NBMIN  :       2 
NDIV   :       2 
RFACT  :   Right 
BCAST  :   2ring 
DEPTH  :       0 
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR02R2L2        8000   256     1     1              55.37              6.166e+00
HPL_pdgesv() start time Sat Apr 23 15:14:17 2016

HPL_pdgesv() end time   Sat Apr 23 15:15:12 2016

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0025941 ...... PASSED
================================================================================

Finished      1 tests with the following results:
              1 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
Last edited by Rive on Mon Apr 25, 2016 3:33 am, edited 18 times in total.
DNPNWO

Gerd
Posts: 66
Joined: Wed Mar 16, 2016 10:48 am
Location: Europe

Re: Pi3 incorrect results under load (possibly heat related)

Sat Apr 23, 2016 11:04 pm

N=10000?

Rive
Posts: 586
Joined: Sat Mar 26, 2016 5:21 pm
Location: USA

Re: Pi3 incorrect results under load (possibly heat related)

Sat Apr 23, 2016 11:06 pm

Gerd wrote:N=10000?
Show me yours at N=10000

Gerd
Posts: 66
Joined: Wed Mar 16, 2016 10:48 am
Location: Europe

Re: Pi3 incorrect results under load (possibly heat related)

Sat Apr 23, 2016 11:15 pm

Rive wrote:
Gerd wrote:N=10000?
Show me yours at N=10000
The post above? download/file.php?id=14480

Rive
Posts: 586
Joined: Sat Mar 26, 2016 5:21 pm
Location: USA

Re: Pi3 incorrect results under load (possibly heat related)

Sat Apr 23, 2016 11:48 pm

Gerd wrote:
Rive wrote:
Gerd wrote:N=10000?
Show me yours at N=10000
The post above? download/file.php?id=14480
I am doing a pi3 restore/new backup, I will try 10000 later and post.

Rive
Posts: 586
Joined: Sat Mar 26, 2016 5:21 pm
Location: USA

Re: Pi3 incorrect results under load (possibly heat related)

Sun Apr 24, 2016 12:50 am

Passed. Full Overclock 1.26 GHz N=10000 59C Gflops 6.85

Code: Select all

dtparam=sd_overclock=100
arm_freq=1260
core_freq=500
over_voltage=4
sdram_freq=575
sdram_schmoo=0x02000020
over_voltage_sdram_p=6
over_voltage_sdram_i=4
over_voltage_sdram_c=4
v3d_freq=500
h264_freq=333

Code: Select all

[email protected]:~ $ ./xhpl
================================================================================
HPLinpack 2.1  --  High-Performance Linpack benchmark  --   October 26, 2012
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :   10000 
NB     :     256 
PMAP   : Row-major process mapping
P      :       1 
Q      :       1 
PFACT  :    Left 
NBMIN  :       2 
NDIV   :       2 
RFACT  :   Right 
BCAST  :   2ring 
DEPTH  :       0 
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR02R2L2       10000   256     1     1              97.22              6.859e+00
HPL_pdgesv() start time Sat Apr 23 20:46:32 2016

HPL_pdgesv() end time   Sat Apr 23 20:48:10 2016

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0023045 ...... PASSED
================================================================================

Finished      1 tests with the following results:
              1 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
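The Gflops column can be cross-checked from N and the solve time. This sketch assumes the operation count 2/3*N^3 + 3/2*N^2 conventionally used for the Linpack solve; the function name is illustrative. With the times printed in this thread it lands on the reported figures to within the rounding of the displayed time.

```python
def hpl_gflops(n: int, seconds: float) -> float:
    """Gflops for an n x n solve, assuming 2/3*n^3 + 3/2*n^2 operations."""
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1e9


# Times taken from the xhpl outputs posted in this thread.
print(f"N=8000,  50.63 s -> {hpl_gflops(8000, 50.63):.2f} Gflops")
print(f"N=10000, 97.22 s -> {hpl_gflops(10000, 97.22):.2f} Gflops")
```

This also shows why N=10000 takes roughly twice as long as N=8000 at a similar rate: the work grows with the cube of N.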

datajerk
Posts: 7
Joined: Tue Sep 16, 2014 3:52 pm

Re: Pi3 incorrect results under load (possibly heat related)

Wed Jun 01, 2016 10:57 am

I know I'm late to the party, but I just discovered this problem myself today with xhpl and openblas. Googling led me here.

Code: Select all

over_voltage=4
Works for me, but 2 did not (either locked up, or FAILED to validate solution). My results below. My rpi3 is not in a case, has a cheap heatsink about the size of a sugar cube just resting on top (no paste), and a 6" fan 4" from the rpi3 blowing at high speed. Without the fan IIRC I get 5 Gflops. As long as the temp stays below 85C I think I'm ok. I wrote my own integer benchmark (sha256 hashes/sec). Same issue with performance and temp. As long as I keep the temp below 85C I get max performance.

I instrumented my output to report the performance and the temp. Amps peak at 1.7. office_temp: 74.5F, office_humi: 20
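The instrumented progress lines have the form `Column= ... Mflops= ... Temp= ...`. Here is a minimal sketch for pulling the numbers out of such a line. It assumes the Temp field is in millidegrees Celsius, as the Linux thermal sysfs interface usually reports it; the post does not state the unit, and the regular expression and function name are made up for the example.

```python
import re

# Matches the first three fields of a progress line; the rest is ignored.
LINE_RE = re.compile(r"Column=\s*(\d+)\s+Mflops=\s*(\d+)\s+Temp=\s*(\d+)")


def parse_progress(line: str):
    """Return (column, gflops, temp_c), or None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    column, mflops, temp_milli_c = (int(g) for g in m.groups())
    # Assumption: Temp is millidegrees Celsius, so divide by 1000.
    return column, mflops / 1000.0, temp_milli_c / 1000.0


sample = ("Column=    256 Mflops=     6527 Temp= 53692 "
          "WC=00:00:05 ETA=00:00:47 ETT=00:00:52   9.3%")
print(parse_progress(sample))
```

Read that way, the run starts at about 54C and levels off around 70C, below the 85C mark mentioned above.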

N=8000:

Code: Select all

================================================================================
HPLinpack 2.0  --  High-Performance Linpack benchmark  --   September 10, 2008
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :    8000 
NB     :     256 
PMAP   : Row-major process mapping
P      :       1 
Q      :       1 
PFACT  :    Left 
NBMIN  :       2 
NDIV   :       2 
RFACT  :   Right 
BCAST  :   2ring 
DEPTH  :       0 
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

Column=    256 Mflops=     6527 Temp= 53692 WC=00:00:05 ETA=00:00:47 ETT=00:00:52   9.3%
Column=    512 Mflops=     6552 Temp= 59072 WC=00:00:09 ETA=00:00:43 ETT=00:00:52  18.0%
Column=    768 Mflops=     6524 Temp= 62300 WC=00:00:14 ETA=00:00:38 ETT=00:00:52  26.1%
Column=   1024 Mflops=     6520 Temp= 64451 WC=00:00:18 ETA=00:00:34 ETT=00:00:52  33.7%
Column=   1280 Mflops=     6509 Temp= 65528 WC=00:00:21 ETA=00:00:31 ETT=00:00:52  40.7%
Column=   1536 Mflops=     6499 Temp= 66604 WC=00:00:25 ETA=00:00:28 ETT=00:00:53  47.2%
Column=   1792 Mflops=     6481 Temp= 67679 WC=00:00:28 ETA=00:00:25 ETT=00:00:53  53.3%
Column=   2048 Mflops=     6461 Temp= 68218 WC=00:00:31 ETA=00:00:22 ETT=00:00:53  58.8%
Column=   2304 Mflops=     6445 Temp= 68756 WC=00:00:34 ETA=00:00:19 ETT=00:00:53  63.9%
Column=   2560 Mflops=     6433 Temp= 69294 WC=00:00:36 ETA=00:00:17 ETT=00:00:53  68.6%
Column=   2816 Mflops=     6419 Temp= 69832 WC=00:00:39 ETA=00:00:14 ETT=00:00:53  72.8%
Column=   3072 Mflops=     6408 Temp= 70369 WC=00:00:41 ETA=00:00:12 ETT=00:00:53  76.6%
Column=   3328 Mflops=     6394 Temp= 69832 WC=00:00:43 ETA=00:00:10 ETT=00:00:53  80.1%
Column=   3584 Mflops=     6380 Temp= 70908 WC=00:00:44 ETA=00:00:09 ETT=00:00:53  83.2%
Column=   3840 Mflops=     6366 Temp= 69832 WC=00:00:46 ETA=00:00:08 ETT=00:00:54  85.9%
Column=   4096 Mflops=     6353 Temp= 70369 WC=00:00:47 ETA=00:00:07 ETT=00:00:54  88.4%
Column=   4352 Mflops=     6339 Temp= 70369 WC=00:00:49 ETA=00:00:05 ETT=00:00:54  90.5%
Column=   4608 Mflops=     6326 Temp= 70908 WC=00:00:50 ETA=00:00:04 ETT=00:00:54  92.4%
Column=   4864 Mflops=     6312 Temp= 69832 WC=00:00:51 ETA=00:00:03 ETT=00:00:54  94.0%
Column=   5120 Mflops=     6299 Temp= 70369 WC=00:00:52 ETA=00:00:02 ETT=00:00:54  95.3%
Column=   5376 Mflops=     6285 Temp= 69832 WC=00:00:52 ETA=00:00:02 ETT=00:00:54  96.5%
Column=   5632 Mflops=     6273 Temp= 69832 WC=00:00:53 ETA=00:00:01 ETT=00:00:54  97.4%
Column=   5888 Mflops=     6262 Temp= 69832 WC=00:00:54 ETA=00:00:01 ETT=00:00:55  98.2%
Column=   6144 Mflops=     6251 Temp= 69832 WC=00:00:54 ETA=00:00:01 ETT=00:00:55  98.8%
Column=   6400 Mflops=     6241 Temp= 69832 WC=00:00:54 ETA=00:00:01 ETT=00:00:55  99.2%
Column=   6656 Mflops=     6232 Temp= 69294 WC=00:00:55 ETA=00:00:00 ETT=00:00:55  99.5%
Column=   6912 Mflops=     6224 Temp= 69294 WC=00:00:55 ETA=00:00:00 ETT=00:00:55  99.7%
Column=   7168 Mflops=     6217 Temp= 69294 WC=00:00:55 ETA=00:00:00 ETT=00:00:55  99.9%
Column=   7424 Mflops=     6211 Temp= 68756 WC=00:00:55 ETA=00:00:00 ETT=00:00:55 100.0%
Column=   7680 Mflops=     6207 Temp= 69832 WC=00:00:55 ETA=00:00:00 ETT=00:00:55 100.0%
Column=   7936 Mflops=     6205 Temp= 69294 WC=00:00:55 ETA=00:00:00 ETT=00:00:55 100.0%
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR02R2L2        8000   256     1     1              55.23              6.182e+00
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0025941 ...... PASSED
================================================================================

Finished      1 tests with the following results:
              1 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================
N=10000:

Code: Select all

================================================================================
HPLinpack 2.0  --  High-Performance Linpack benchmark  --   September 10, 2008
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :   10000 
NB     :     256 
PMAP   : Row-major process mapping
P      :       1 
Q      :       1 
PFACT  :    Left 
NBMIN  :       2 
NDIV   :       2 
RFACT  :   Right 
BCAST  :   2ring 
DEPTH  :       0 
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

Column=    256 Mflops=     6634 Temp= 58533 WC=00:00:08 ETA=00:01:32 ETT=00:01:40   7.5%
Column=    512 Mflops=     6621 Temp= 63914 WC=00:00:15 ETA=00:01:26 ETT=00:01:41  14.6%
Column=    768 Mflops=     6637 Temp= 67142 WC=00:00:21 ETA=00:01:19 ETT=00:01:40  21.3%
Column=   1024 Mflops=     6646 Temp= 69832 WC=00:00:28 ETA=00:01:12 ETT=00:01:40  27.7%
Column=   1280 Mflops=     6644 Temp= 70908 WC=00:00:34 ETA=00:01:06 ETT=00:01:40  33.7%
Column=   1536 Mflops=     6636 Temp= 71984 WC=00:00:40 ETA=00:01:00 ETT=00:01:40  39.4%
Column=   1792 Mflops=     6635 Temp= 71984 WC=00:00:45 ETA=00:00:55 ETT=00:01:40  44.7%
Column=   2048 Mflops=     6633 Temp= 73598 WC=00:00:50 ETA=00:00:50 ETT=00:01:40  49.7%
Column=   2304 Mflops=     6624 Temp= 73060 WC=00:00:55 ETA=00:00:46 ETT=00:01:41  54.4%
Column=   2560 Mflops=     6610 Temp= 73598 WC=00:00:59 ETA=00:00:42 ETT=00:01:41  58.8%
Column=   2816 Mflops=     6601 Temp= 75212 WC=00:01:04 ETA=00:00:37 ETT=00:01:41  62.9%
Column=   3072 Mflops=     6592 Temp= 74136 WC=00:01:07 ETA=00:00:34 ETT=00:01:41  66.7%
Column=   3328 Mflops=     6579 Temp= 74674 WC=00:01:11 ETA=00:00:30 ETT=00:01:41  70.3%
Column=   3584 Mflops=     6573 Temp= 74674 WC=00:01:15 ETA=00:00:26 ETT=00:01:41  73.6%
Column=   3840 Mflops=     6565 Temp= 74674 WC=00:01:18 ETA=00:00:24 ETT=00:01:42  76.6%
Column=   4096 Mflops=     6557 Temp= 74674 WC=00:01:21 ETA=00:00:21 ETT=00:01:42  79.4%
Column=   4352 Mflops=     6549 Temp= 75212 WC=00:01:23 ETA=00:00:19 ETT=00:01:42  82.0%
Column=   4608 Mflops=     6540 Temp= 74674 WC=00:01:26 ETA=00:00:16 ETT=00:01:42  84.3%
Column=   4864 Mflops=     6531 Temp= 74674 WC=00:01:28 ETA=00:00:14 ETT=00:01:42  86.5%
Column=   5120 Mflops=     6521 Temp= 74136 WC=00:01:30 ETA=00:00:12 ETT=00:01:42  88.4%
Column=   5376 Mflops=     6511 Temp= 74136 WC=00:01:32 ETA=00:00:10 ETT=00:01:42  90.1%
Column=   5632 Mflops=     6500 Temp= 74136 WC=00:01:34 ETA=00:00:09 ETT=00:01:43  91.7%
Column=   5888 Mflops=     6491 Temp= 74136 WC=00:01:36 ETA=00:00:07 ETT=00:01:43  93.0%
Column=   6144 Mflops=     6483 Temp= 73598 WC=00:01:37 ETA=00:00:06 ETT=00:01:43  94.3%
Column=   6400 Mflops=     6474 Temp= 73598 WC=00:01:38 ETA=00:00:05 ETT=00:01:43  95.3%
Column=   6656 Mflops=     6466 Temp= 73598 WC=00:01:39 ETA=00:00:04 ETT=00:01:43  96.3%
Column=   6912 Mflops=     6458 Temp= 74136 WC=00:01:40 ETA=00:00:03 ETT=00:01:43  97.1%
Column=   7168 Mflops=     6450 Temp= 73598 WC=00:01:41 ETA=00:00:02 ETT=00:01:43  97.7%
Column=   7424 Mflops=     6442 Temp= 73060 WC=00:01:42 ETA=00:00:01 ETT=00:01:43  98.3%
Column=   7680 Mflops=     6435 Temp= 73060 WC=00:01:42 ETA=00:00:02 ETT=00:01:44  98.8%
Column=   7936 Mflops=     6428 Temp= 73060 WC=00:01:43 ETA=00:00:01 ETT=00:01:44  99.1%
Column=   8192 Mflops=     6422 Temp= 72522 WC=00:01:43 ETA=00:00:01 ETT=00:01:44  99.4%
Column=   8448 Mflops=     6416 Temp= 73060 WC=00:01:44 ETA=00:00:00 ETT=00:01:44  99.6%
Column=   8704 Mflops=     6411 Temp= 73060 WC=00:01:44 ETA=00:00:00 ETT=00:01:44  99.8%
Column=   8960 Mflops=     6407 Temp= 72522 WC=00:01:44 ETA=00:00:00 ETT=00:01:44  99.9%
Column=   9216 Mflops=     6403 Temp= 72522 WC=00:01:44 ETA=00:00:00 ETT=00:01:44 100.0%
Column=   9472 Mflops=     6400 Temp= 71984 WC=00:01:44 ETA=00:00:00 ETT=00:01:44 100.0%
Column=   9728 Mflops=     6398 Temp= 71984 WC=00:01:44 ETA=00:00:00 ETT=00:01:44 100.0%
Column=   9984 Mflops=     6397 Temp= 72522 WC=00:01:44 ETA=00:00:00 ETT=00:01:44 100.0%
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR02R2L2       10000   256     1     1             104.56              6.378e+00

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0023045 ...... PASSED
================================================================================

Finished      1 tests with the following results:
              1 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================
N=10000, fan off. Watch the performance start to drop off quickly once the temperature gets to around 85C. (Note: sustained performance is expected to drop as the solution nears completion, because communication increases relative to computation. Compare with the 10K results above to see the effects of temperature and no fan.)

Code: Select all

================================================================================
HPLinpack 2.0  --  High-Performance Linpack benchmark  --   September 10, 2008
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :   10000 
NB     :     256 
PMAP   : Row-major process mapping
P      :       1 
Q      :       1 
PFACT  :    Left 
NBMIN  :       2 
NDIV   :       2 
RFACT  :   Right 
BCAST  :   2ring 
DEPTH  :       0 
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

Column=    256 Mflops=     6772 Temp= 61224 WC=00:00:07 ETA=00:01:31 ETT=00:01:38   7.5%
Column=    512 Mflops=     6772 Temp= 67679 WC=00:00:14 ETA=00:01:24 ETT=00:01:38  14.6%
Column=    768 Mflops=     6766 Temp= 73060 WC=00:00:21 ETA=00:01:18 ETT=00:01:39  21.3%
Column=   1024 Mflops=     6752 Temp= 76287 WC=00:00:27 ETA=00:01:12 ETT=00:01:39  27.7%
Column=   1280 Mflops=     6730 Temp= 79516 WC=00:00:33 ETA=00:01:06 ETT=00:01:39  33.7%
Column=   1536 Mflops=     6690 Temp= 81668 WC=00:00:39 ETA=00:01:01 ETT=00:01:40  39.4%
Column=   1792 Mflops=     6596 Temp= 82205 WC=00:00:45 ETA=00:00:56 ETT=00:01:41  44.7%
Column=   2048 Mflops=     6486 Temp= 82744 WC=00:00:51 ETA=00:00:52 ETT=00:01:43  49.7%
Column=   2304 Mflops=     6369 Temp= 83820 WC=00:00:57 ETA=00:00:48 ETT=00:01:45  54.4%
Column=   2560 Mflops=     6254 Temp= 83820 WC=00:01:03 ETA=00:00:44 ETT=00:01:47  58.8%
Column=   2816 Mflops=     6147 Temp= 84358 WC=00:01:08 ETA=00:00:40 ETT=00:01:48  62.9%
Column=   3072 Mflops=     6046 Temp= 84358 WC=00:01:14 ETA=00:00:36 ETT=00:01:50  66.7%
Column=   3328 Mflops=     5951 Temp= 83820 WC=00:01:19 ETA=00:00:33 ETT=00:01:52  70.3%
Column=   3584 Mflops=     5864 Temp= 83820 WC=00:01:24 ETA=00:00:30 ETT=00:01:54  73.6%
Column=   3840 Mflops=     5787 Temp= 84896 WC=00:01:28 ETA=00:00:27 ETT=00:01:55  76.6%
Column=   4096 Mflops=     5711 Temp= 82744 WC=00:01:33 ETA=00:00:24 ETT=00:01:57  79.4%
Column=   4352 Mflops=     5669 Temp= 84896 WC=00:01:36 ETA=00:00:22 ETT=00:01:58  82.0%
Column=   4608 Mflops=     5601 Temp= 82205 WC=00:01:40 ETA=00:00:19 ETT=00:01:59  84.3%
Column=   4864 Mflops=     5571 Temp= 84896 WC=00:01:43 ETA=00:00:17 ETT=00:02:00  86.5%
Column=   5120 Mflops=     5525 Temp= 85434 WC=00:01:47 ETA=00:00:14 ETT=00:02:01  88.4%
Column=   5376 Mflops=     5491 Temp= 84896 WC=00:01:49 ETA=00:00:12 ETT=00:02:01  90.1%
Column=   5632 Mflops=     5463 Temp= 85434 WC=00:01:52 ETA=00:00:10 ETT=00:02:02  91.7%
Column=   5888 Mflops=     5431 Temp= 83820 WC=00:01:54 ETA=00:00:09 ETT=00:02:03  93.0%
Column=   6144 Mflops=     5414 Temp= 84896 WC=00:01:56 ETA=00:00:07 ETT=00:02:03  94.3%
Column=   6400 Mflops=     5390 Temp= 84358 WC=00:01:58 ETA=00:00:06 ETT=00:02:04  95.3%
Column=   6656 Mflops=     5364 Temp= 82744 WC=00:02:00 ETA=00:00:04 ETT=00:02:04  96.3%
Column=   6912 Mflops=     5356 Temp= 84896 WC=00:02:01 ETA=00:00:03 ETT=00:02:04  97.1%
Column=   7168 Mflops=     5344 Temp= 84896 WC=00:02:02 ETA=00:00:03 ETT=00:02:05  97.7%
Column=   7424 Mflops=     5330 Temp= 84896 WC=00:02:03 ETA=00:00:02 ETT=00:02:05  98.3%
Column=   7680 Mflops=     5318 Temp= 84896 WC=00:02:04 ETA=00:00:01 ETT=00:02:05  98.8%
Column=   7936 Mflops=     5307 Temp= 84896 WC=00:02:05 ETA=00:00:01 ETT=00:02:06  99.1%
Column=   8192 Mflops=     5298 Temp= 84896 WC=00:02:05 ETA=00:00:01 ETT=00:02:06  99.4%
Column=   8448 Mflops=     5291 Temp= 84896 WC=00:02:06 ETA=00:00:00 ETT=00:02:06  99.6%
Column=   8704 Mflops=     5284 Temp= 84896 WC=00:02:06 ETA=00:00:00 ETT=00:02:06  99.8%
Column=   8960 Mflops=     5279 Temp= 84358 WC=00:02:06 ETA=00:00:00 ETT=00:02:06  99.9%
Column=   9216 Mflops=     5275 Temp= 84358 WC=00:02:06 ETA=00:00:00 ETT=00:02:06 100.0%
Column=   9472 Mflops=     5272 Temp= 83820 WC=00:02:06 ETA=00:00:00 ETT=00:02:06 100.0%
Column=   9728 Mflops=     5271 Temp= 84358 WC=00:02:06 ETA=00:00:00 ETT=00:02:06 100.0%
Column=   9984 Mflops=     5269 Temp= 84358 WC=00:02:07 ETA=00:00:00 ETT=00:02:07 100.0%
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR02R2L2       10000   256     1     1             126.87              5.256e+00
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0023045 ...... PASSED
================================================================================

Finished      1 tests with the following results:
              1 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================
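For anyone who wants to compare these logs numerically rather than by eye, here is a minimal sketch that parses the `Column= ... Mflops= ... Temp= ...` progress lines. It rests on two assumptions that the log itself does not state: that `Temp` is in millidegrees Celsius (so 84358 means about 84.4C), and that `Mflops` is a running average rather than an instantaneous rate. The 80C threshold and the function names are illustrative choices only.

```python
import re

# A few progress lines copied from the fan-off N=10000 log above.
LOG = """\
Column=    256 Mflops=     6772 Temp= 61224 WC=00:00:07 ETA=00:01:31 ETT=00:01:38   7.5%
Column=   1280 Mflops=     6730 Temp= 79516 WC=00:00:33 ETA=00:01:06 ETT=00:01:39  33.7%
Column=   1536 Mflops=     6690 Temp= 81668 WC=00:00:39 ETA=00:01:01 ETT=00:01:40  39.4%
Column=   2048 Mflops=     6486 Temp= 82744 WC=00:00:51 ETA=00:00:52 ETT=00:01:43  49.7%
Column=   9984 Mflops=     5269 Temp= 84358 WC=00:02:07 ETA=00:00:00 ETT=00:02:07 100.0%
"""

LINE = re.compile(r"Column=\s*(\d+)\s+Mflops=\s*(\d+)\s+Temp=\s*(\d+)")


def parse_progress(text):
    """Return a list of (column, mflops, temp_celsius) tuples."""
    rows = []
    for match in LINE.finditer(text):
        column, mflops, temp = (int(g) for g in match.groups())
        # Assumption: Temp is reported in millidegrees Celsius.
        rows.append((column, mflops, temp / 1000.0))
    return rows


def first_hot_column(rows, threshold_c=80.0):
    """Column of the first sample at or above threshold_c, or None."""
    for column, _mflops, temp_c in rows:
        if temp_c >= threshold_c:
            return column
    return None


rows = parse_progress(LOG)
drop = 100.0 * (1.0 - rows[-1][1] / rows[0][1])
print("first sample at or above 80C: column", first_hot_column(rows))
print(f"running Mflops fell by {drop:.1f}% from first to last sample")
```

Feeding the full fan-on and fan-off logs through `parse_progress` gives two series that can be compared column by column.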

ejolson
Posts: 3424
Joined: Tue Mar 18, 2014 11:47 am

Re: Pi3 incorrect results under load (possibly heat related)

Sat Oct 01, 2016 8:16 pm

datajerk wrote:I know I'm late to the party, but I just discovered this problem myself today with xhpl and openblas.
I held off purchasing a Raspberry Pi 3B until now because of the errors reported in this thread. Today, I tried to use hpl-2.2 with OpenBLAS-0.2.19 to solve an 8000x8000 system of linear equations. With all clock settings at default, my new Raspberry Pi 3B running the new Raspbian Pixel distribution with the 4.4.21-v7+ #911 kernel locked up within a couple of seconds of starting the program. I rebooted the Pi, tried again, and exactly the same thing happened. The power supply is a 2.5 amp unit with thick-looking wires, supplied by CanaKit specifically for the 3B. No low-power warnings flashed on the screen during either run. A small aluminum heatsink, also supplied as part of the kit, was attached to the SoC prior to testing.

Underclocking as

Code: Select all

arm_freq=900
core_freq=250
sdram_freq=450
resulted in stable operation. Even at these settings, CPU temperatures reached 84.9'C as reported by vcgencmd measure_temp, and throttling to 756 MHz was observed. The linpack test averaged 4.4 Gflops and passed the residual check. This is about 73% of the performance of a Pi 3B running at advertised speeds. I will experiment with different cooling solutions and report further results in a subsequent post.

The purpose of this post is to point out that the stability issues discussed in this thread still exist with recently shipping hardware running the most recent version of Raspbian.
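For readers who want to log temperature and clock speed during a run like the one described above, a minimal sketch follows. It assumes a Raspberry Pi with the `vcgencmd` tool installed and parses three of its subcommands: `measure_temp`, `measure_clock arm` and `get_throttled`. The helper names are hypothetical, the sample strings in the comments are illustrative, and the bit meanings for `get_throttled` follow the Raspberry Pi documentation as I understand it, so check them against your firmware version.

```python
import re
import subprocess


def parse_temp(line):
    """Parse a line such as "temp=84.9'C" into degrees Celsius."""
    return float(re.search(r"temp=([\d.]+)", line).group(1))


def parse_clock_mhz(line):
    """Parse a line such as "frequency(45)=756000000" into MHz."""
    return int(line.strip().split("=")[1]) / 1e6


def parse_throttled(line):
    """Decode a line such as "throttled=0x50005" into named flags."""
    value = int(line.strip().split("=")[1], 16)
    return {
        "under_voltage_now": bool(value & 0x1),
        "freq_capped_now": bool(value & 0x2),
        "throttled_now": bool(value & 0x4),
        "under_voltage_seen": bool(value & 0x10000),
        "freq_capped_seen": bool(value & 0x20000),
        "throttled_seen": bool(value & 0x40000),
    }


def vcgencmd(*args):
    """Run vcgencmd (Raspberry Pi only) and return its standard output."""
    result = subprocess.run(["vcgencmd", *args], capture_output=True,
                            text=True, check=True)
    return result.stdout


def snapshot():
    """Return (temp_c, arm_mhz, flags); call this only on a Pi."""
    return (parse_temp(vcgencmd("measure_temp")),
            parse_clock_mhz(vcgencmd("measure_clock", "arm")),
            parse_throttled(vcgencmd("get_throttled")))
```

Calling `snapshot()` once a second in a second terminal while xhpl runs would show whether a drop in Gflops coincides with a lower ARM clock or with an under-voltage flag.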

ssvb
Posts: 112
Joined: Sat May 19, 2012 6:15 pm

Re: Pi3 incorrect results under load (possibly heat related)

Sat Oct 01, 2016 8:44 pm

ejolson wrote: Underclocking as

Code: Select all

arm_freq=900
core_freq=250
sdram_freq=450
resulted in stable operation. Even at these settings, CPU temperatures reached 84.9'C as reported by vcgencmd measure_temp and throttling to 756Mhz was observed. The linpack test averaged 4.4 Gflops and passed the residual check. This is about 73% the performance of a Pi 3B running at advertised speeds.
Even Intel has trouble sustaining a high clock frequency when running heavy SIMD-optimized code. Now they have a separate "AVX base clock" thing, which is lower than the nominal advertised clock speed of their processors ;)

But the fact that the RPi 3B was unable to gracefully throttle when running this particular workload is very bad :(

ejolson
Posts: 3424
Joined: Tue Mar 18, 2014 11:47 am

Re: Pi3 incorrect results under load (possibly heat related)

Sat Oct 01, 2016 8:53 pm

ejolson wrote:I will experiment with different cooling solutions and report further results in a subsequent post.
Repeated runs with underclock settings

Code: Select all

arm_freq=900
core_freq=250
sdram_freq=450
and the cover off saw throttling to as low as 600 MHz and a further reduction in performance. All tests passed residual checks. With a toy-elephant sized fan

Image

no throttling was observed. This resulted in a peak performance of 5.0 Gflops with underclock settings. Thus, the Pi 3B performs about 3.3 times faster than the Pi 2B when clocked at similar speeds.

Returning the system to default settings while using active cooling continued to result in complete system lockup within seconds of running the test.

ejolson
Posts: 3424
Joined: Tue Mar 18, 2014 11:47 am

Re: Pi3 incorrect results under load (possibly heat related)

Mon Oct 03, 2016 2:03 pm

ejolson wrote:Returning the system to default settings while using active cooling continued to result in complete system lockup within seconds of running the test.
I'm starting to suspect the power supply, even though it is rated 2.5 amps and was designed for the Pi 3B. Does anyone else have the CanaKit 2.5 amp Raspberry Pi 3 power supply and a willingness to run the linpack benchmark? There is a binary posted earlier in this thread, which is essentially the same as the one I compiled.

ejolson
Posts: 3424
Joined: Tue Mar 18, 2014 11:47 am

Re: Pi3 incorrect results under load (possibly heat related)

Mon Oct 03, 2016 11:03 pm

deater wrote:I just wanted to answer that yes, the hpl binary provided does run just fine on a pi2 with no problems (you can run it with N=10000 and it will still run). Currently only getting 1GFLOP out of it though, which is odd, because I know I've gotten 1.4GFLOP in the past.
For the record, 1 Gflop is the performance of a Pi 2B that is throttling to 600 MHz. Since a 2B stays relatively cool compared to the 3B, such throttling is likely due to low voltage. With a suitable power supply I obtain around 1.56 Gflop for the 2B.

For my Pi 3B, adding over_voltage=2 in the /boot/config.txt file fixed the stability issues. A transcript of a typical run is

Code: Select all

$ ./xhpl 
================================================================================
HPLinpack 2.2  --  High-Performance Linpack benchmark  --   February 24, 2016
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :    8000 
NB     :     256 
PMAP   : Row-major process mapping
P      :       1 
Q      :       1 
PFACT  :    Left 
NBMIN  :       2 
NDIV   :       2 
RFACT  :   Right 
BCAST  :   2ring 
DEPTH  :       0 
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR02R2L2        8000   256     1     1              53.24              6.413e+00
HPL_pdgesv() start time Mon Oct  3 20:55:40 2016

HPL_pdgesv() end time   Mon Oct  3 20:56:33 2016

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0025941 ...... PASSED
================================================================================

Finished      1 tests with the following results:
              1 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================
$ 
which shows a speed of 6.4 Gflops.
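As a cross-check on the figures in this thread, the Gflops column can be recomputed from N and Time. The sketch below assumes the usual HPL operation count of 2/3·N³ + 3/2·N² floating-point operations for an order-N solve; the function name is my own. Because the log prints Time to only two decimals, agreement in the last digit is not guaranteed.

```python
def hpl_gflops(n, seconds):
    """Gflops for an order-n solve, assuming 2/3*n^3 + 3/2*n^2 operations."""
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1e9


# (N, Time, reported Gflops) taken from the transcripts in this thread.
runs = [
    (8000, 53.24, 6.413),    # over_voltage=2
    (8000, 55.23, 6.182),
    (10000, 104.56, 6.378),  # with fan
    (10000, 126.87, 5.256),  # fan off
]

for n, seconds, reported in runs:
    print(f"N={n} time={seconds}s computed={hpl_gflops(n, seconds):.3f} "
          f"reported={reported}")
```

The same function makes it easy to see what a given run time would mean at another problem size, since the operation count depends only on N.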

DNPNWO
Posts: 126
Joined: Fri Jul 08, 2016 1:51 am

Re: Pi3 incorrect results under load (possibly heat related)

Fri Oct 07, 2016 8:14 pm

jahboater wrote:This stress test code is all NEON and uses all four cores.
On a Pi3 with no heatsink it either crashes the system within a few seconds (and about 75C) or carries on and throttles back. The Pi3 is at stock frequency settings. With a heatsink it always successfully throttles.

Code: Select all

wget https://raw.githubusercontent.com/ssvb/cpuburn-arm/master/cpuburn-a53.S
gcc -o cpuburn-a53 cpuburn-a53.S
./cpuburn-a53

I have been reading this, and maybe I am missing something, but I have a fan and a heatsink, and I have no issues with throttling, even on linpack or cpuburn-a53.

There is a YouTube user named 'CJNK' (some of you may remember him as 'Rive', who posted above), and he can run an overclock, run cpuburn-a53 at 57C, and get 6.8 Gflops on linpack, all without throttling.

He also states that if you are not properly cooled, not properly powered, or unstable, you will crash. He is using the 5.25 V, 2.4 A power supply from MCMElectronics/MicroCenter, and a common 12 V brushless PC fan on the 5 V rail.

I followed his suggestions, and I can run cpuburn-a53 with ease at around 64C with no throttling, and even pass linpack at around 6.6 Gflops.

Not quite sure what you guys that are throttling/having issues are doing wrong.

CJNK's Youtube Video: https://youtu.be/KuEZV0WsRLg?t=5m23s

jahboater
Posts: 4607
Joined: Wed Feb 04, 2015 6:38 pm

Re: Pi3 incorrect results under load (possibly heat related)

Sat Oct 08, 2016 12:50 pm

DNPNWO wrote:Been reading this, and maybe I am missing something, But I have a fan and a heatsink, and I have no issues with throttling even on linpack or cpuburn-a53.

There is a Youtube user named 'CJNK' (some of you may remember him as 'Rive', who posted above), and he can run an OC, and do cpuburn-a53 at 57C, and do linpack 6.8 Gflops...without throttling.

He states also that if you are not properly cooled, not properly powered, or unstable, you will crash. He is using the 5.25v 2.4a power supply from MCMElectronics/MicroCenter, and a common 12v brushless pc fan on the 5v rail.

I followed his suggestions, and i can run cpuburn-a53 with ease at around 64C with no throttling, and even pass linpack at around 6.6 Gflops.

Not quite sure what you guys that are throttling/having issues are doing wrong.
Most people do not want to use a fan. Fans are noisy and potentially unreliable.

ejolson
Posts: 3424
Joined: Tue Mar 18, 2014 11:47 am

Re: Pi3 incorrect results under load (possibly heat related)

Sat Oct 08, 2016 3:30 pm

jahboater wrote:Most people do not want to use a fan.
My difficulty, and that of some others, seemed to be related to transients caused by suddenly switching from one busy core to all four. This is why over_voltage was needed even when running at default frequencies. Of course throttling is no fun either, but it is much less of an issue than incorrect results and system crashes.
Last edited by ejolson on Sat Oct 08, 2016 3:57 pm, edited 3 times in total.
