Raspberry Pi4 8GB xHPL benchmark results
Posted: Wed Jun 03, 2020 7:28 pm
by aa3025
Hi All,
I got my Pi 4 8GB recently and decided to run the HPL benchmark on it. Result: almost
9 GFlops! The tutorial and results are here:
https://www.hydromag.eu/~aa3025/rpi/.
Alex P.
Re: Raspberry Pi4 8GB xHPL benchmark results
Posted: Wed Jun 03, 2020 8:39 pm
by arif-ali
Thanks for sharing Alex,
This is a benchmark I used to run regularly for HPC systems in my previous role. So great to see someone attempt to run this.
I might also have a go at some point, when I buy my second one.
Have you looked at the scaling frequencies as well as the governor? I know these tend to make some difference. I'm not sure how it works on ARM, but we used to set the min and max frequencies to the maximum available, and the governor to performance. The other thing we used to do was have 2 runs within the HPL.dat: the first one would help bump the memory and CPU to 100%, so the second would potentially see maximum performance.
I would deffo be interested in seeing what I could achieve on the rpi4 with these things in mind
Arif
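On Linux systems with cpufreq, the governor and the min/max frequencies mentioned above are exposed as files under sysfs. A minimal sketch in Python for checking them before a run; the function name cpufreq_report and its sysfs_root parameter are illustrative and not part of any tool discussed in this thread:

```python
from pathlib import Path


def cpufreq_report(sysfs_root="/sys/devices/system/cpu"):
    """Return {cpu_name: {setting: value}} for every core exposing cpufreq."""
    report = {}
    for cpufreq in sorted(Path(sysfs_root).glob("cpu[0-9]*/cpufreq")):
        settings = {}
        for name in ("scaling_governor", "scaling_min_freq", "scaling_max_freq"):
            entry = cpufreq / name
            if entry.is_file():
                settings[name] = entry.read_text().strip()
        report[cpufreq.parent.name] = settings
    return report


if __name__ == "__main__":
    # Frequencies are reported in kHz by the kernel.
    for cpu, settings in cpufreq_report().items():
        print(cpu, settings)
```

Reading these files needs no special privileges; writing to them (to change the governor) normally requires root.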
Re: Raspberry Pi4 8GB xHPL benchmark results
Posted: Wed Jun 03, 2020 11:30 pm
by aa3025
Hi Arif,
yes, HPL is my default test tool too for the compute nodes of HPC I'm managing at work.
I kept monitoring the throttling state during the test to make sure it does not scale down the frequency, and kept the CPU at about 50 C during the tests, way below the "official" 85 C throttling limit. The Pi CPU normally sits at 600MHz at idle, then immediately jumps to 1.5GHz and stays there when I start the test, so I think it's all fine there with the CPU governor.
Alex
Re: Raspberry Pi4 8GB xHPL benchmark results
Posted: Wed Jun 03, 2020 11:49 pm
by jahboater
If you run this after your benchmark run completes:
vcgencmd get_throttled
it should return zero (0x0).
Anything else and it has throttled because of temperature or low voltage.
There are bits to say it's currently throttled and sticky bits to say it's been throttled since the last boot.
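A minimal sketch of how the returned value could be decoded, assuming the bit layout given in the Raspberry Pi documentation for vcgencmd get_throttled (low bits for the current state, bits 16 and up for the sticky "since boot" state). The function name decode_throttled is illustrative:

```python
# Bit positions as documented for `vcgencmd get_throttled`.
THROTTLE_FLAGS = {
    0: "under-voltage detected (now)",
    1: "ARM frequency capped (now)",
    2: "currently throttled",
    3: "soft temperature limit active (now)",
    16: "under-voltage has occurred since boot",
    17: "ARM frequency capping has occurred since boot",
    18: "throttling has occurred since boot",
    19: "soft temperature limit has occurred since boot",
}


def decode_throttled(output):
    """Decode a line such as 'throttled=0x50005' into a list of readable flags."""
    value = int(output.strip().split("=")[-1], 16)
    return [text for bit, text in sorted(THROTTLE_FLAGS.items()) if value & (1 << bit)]


if __name__ == "__main__":
    # An empty list means the run was clean.
    print(decode_throttled("throttled=0x0"))
    for flag in decode_throttled("throttled=0x50005"):
        print(flag)
```

The example value 0x50005 is made up for illustration; it is not a reading from any board in this thread.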
Re: Raspberry Pi4 8GB xHPL benchmark results
Posted: Wed Jun 03, 2020 11:51 pm
by aa3025
Yes that's how I monitored throttling state.
Re: Raspberry Pi4 8GB xHPL benchmark results
Posted: Wed Jun 03, 2020 11:53 pm
by jahboater
aa3025 wrote: ↑Wed Jun 03, 2020 11:51 pm
Yes that's how I monitored throttling state.
Yes, but you said "I kept monitoring throttling state during the test"
I suggest that you do not need to. Just check that it's 0x0 once, after the test has finished.
Re: Raspberry Pi4 8GB xHPL benchmark results
Posted: Thu Jun 04, 2020 12:01 am
by aa3025
yes, but if the HPL test takes 30-45 min, you can abort it earlier if you know that throttling has already occurred. Anyway, with my "improved" case (fan) the temperature did not go above 52C during the tests.
@Arif: I actually checked whether changing scaling_governor to "performance" (from "ondemand") makes any difference:
Code: Select all
for i in $(seq 0 3); do echo performance > /sys/devices/system/cpu/cpu$i/cpufreq/scaling_governor; done
This way CPU frequency is kept at 1.5 GHz all the time.
And it doesn't make any difference: I still get the same 8.93 GFlops from HPL.
Alex
Re: Raspberry Pi4 8GB xHPL benchmark results
Posted: Thu Jun 04, 2020 8:04 am
by arif-ali
aa3025 wrote: ↑Thu Jun 04, 2020 12:01 am
@Arif: I actually checked whether changing scaling_governor to "performance" (from "ondemand") makes any difference:
Code: Select all
for i in $(seq 0 3); do echo performance > /sys/devices/system/cpu/cpu$i/cpufreq/scaling_governor; done
This way CPU frequency is kept at 1.5 GHz all the time.
And it doesn't make any difference: I still get the same 8.93 GFlops from HPL.
nice one, thanks for checking. Wasn't expecting a quick turnaround
Looking around on other forums, it seems people are consistently getting around 6 GFlops. It would be interesting to see results from some of the places that have set up multi-node clusters like the ones I saw at Supercomputing conferences in the past few years.
Re: Raspberry Pi4 8GB xHPL benchmark results
Posted: Fri Jun 05, 2020 3:45 am
by ejolson
I ran a shorter test with N=8000 and obtained 9.1697 GFLOPS.
Code: Select all
$ ./xhpl
================================================================================
HPLinpack 2.3 -- High-Performance Linpack benchmark -- December 2, 2018
Written by A. Petitet and R. Clint Whaley, Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================
An explanation of the input/output parameters follows:
T/V : Wall time / encoded variant.
N : The order of the coefficient matrix A.
NB : The partitioning blocking factor.
P : The number of process rows.
Q : The number of process columns.
Time : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.
The following parameter values will be used:
N : 8000
NB : 256
PMAP : Row-major process mapping
P : 1
Q : 1
PFACT : Left
NBMIN : 2
NDIV : 2
RFACT : Right
BCAST : 2ring
DEPTH : 0
SWAP : Mix (threshold = 64)
L1 : transposed form
U : transposed form
EQUIL : yes
ALIGN : 8 double precision words
--------------------------------------------------------------------------------
- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR02R2L2 8000 256 1 1 37.23 9.1697e+00
HPL_pdgesv() start time Fri Jun 5 04:10:05 2020
HPL_pdgesv() end time Fri Jun 5 04:10:42 2020
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 2.59405429e-03 ...... PASSED
================================================================================
Finished 1 tests with the following results:
1 tests completed and passed residual checks,
0 tests completed and failed residual checks,
0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------
End of Tests.
================================================================================
This was using the Pi 4B 2GB model. Note that MPICH was installed to use the slower ch3 network device rather than shared memory, but I think the run was using OpenBLAS threads rather than MPI anyway. Likely the smaller matrix size gives slightly faster results than using the full memory size on the 8GB model. I'll try to optimize the build and then post more details in case anyone wants to continue improving things.
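The Gflops column can be cross-checked from N and the wall time. A minimal sketch, assuming the operation count HPL uses for an LU solve, 2/3 N^3 + 3/2 N^2; the function name hpl_gflops is illustrative:

```python
def hpl_gflops(n, seconds):
    """Rate in GFLOPS for solving an n x n system in `seconds` of wall time."""
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1e9


if __name__ == "__main__":
    # N and Time taken from the result line above (N=8000, 37.23 s).
    print(round(hpl_gflops(8000, 37.23), 2))
```

For N=8000 and 37.23 s this gives about 9.17, which agrees with the reported 9.1697e+00 to within the rounding of the printed time.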
Re: Raspberry Pi4 8GB xHPL benchmark results
Posted: Fri Jun 05, 2020 3:56 am
by ejolson
Recompiling hpl with gcc-10.1 but still in 32-bit mode resulted in performance over 10 GFLOPS.
Code: Select all
$ ./xhpl
================================================================================
HPLinpack 2.3 -- High-Performance Linpack benchmark -- December 2, 2018
Written by A. Petitet and R. Clint Whaley, Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================
An explanation of the input/output parameters follows:
T/V : Wall time / encoded variant.
N : The order of the coefficient matrix A.
NB : The partitioning blocking factor.
P : The number of process rows.
Q : The number of process columns.
Time : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.
The following parameter values will be used:
N : 8000
NB : 256
PMAP : Row-major process mapping
P : 1
Q : 1
PFACT : Left
NBMIN : 2
NDIV : 2
RFACT : Right
BCAST : 2ring
DEPTH : 0
SWAP : Mix (threshold = 64)
L1 : transposed form
U : transposed form
EQUIL : yes
ALIGN : 8 double precision words
--------------------------------------------------------------------------------
- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR02R2L2 8000 256 1 1 32.78 1.0415e+01
HPL_pdgesv() start time Fri Jun 5 04:53:42 2020
HPL_pdgesv() end time Fri Jun 5 04:54:15 2020
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 2.59405429e-03 ...... PASSED
================================================================================
Finished 1 tests with the following results:
1 tests completed and passed residual checks,
0 tests completed and failed residual checks,
0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------
End of Tests.
================================================================================
Alex, since I can't run the large matrix size with my 2GB Pi, would it be possible for you to try N=8000 and compare?
Re: Raspberry Pi4 8GB xHPL benchmark results
Posted: Fri Jun 05, 2020 4:48 am
by ejolson
Here is a run with OpenBLAS threads turned off using MPICH and the shared memory communication device.
Code: Select all
$ export OPENBLAS_NUM_THREADS=1
$ mpirun -np 4 ./xhpl
================================================================================
HPLinpack 2.3 -- High-Performance Linpack benchmark -- December 2, 2018
Written by A. Petitet and R. Clint Whaley, Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================
An explanation of the input/output parameters follows:
T/V : Wall time / encoded variant.
N : The order of the coefficient matrix A.
NB : The partitioning blocking factor.
P : The number of process rows.
Q : The number of process columns.
Time : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.
The following parameter values will be used:
N : 8000
NB : 192
PMAP : Row-major process mapping
P : 2
Q : 2
PFACT : Right
NBMIN : 4
NDIV : 2
RFACT : Crout
BCAST : 1ringM
DEPTH : 1
SWAP : Mix (threshold = 64)
L1 : transposed form
U : transposed form
EQUIL : yes
ALIGN : 8 double precision words
--------------------------------------------------------------------------------
- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR11C2R4 8000 192 2 2 32.50 1.0505e+01
HPL_pdgesv() start time Fri Jun 5 05:39:32 2020
HPL_pdgesv() end time Fri Jun 5 05:40:04 2020
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 2.04717139e-03 ...... PASSED
================================================================================
Finished 1 tests with the following results:
1 tests completed and passed residual checks,
0 tests completed and failed residual checks,
0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------
End of Tests.
================================================================================
For some reason the version of MPICH I installed doesn't allow me to specify a rank file, so I can't check if that makes a difference. The result is still greater than 10 GFLOPS. It would be nice to know whether the differences in our timing results are related to the 8GB versus 2GB hardware, the matrix size, or some other software difference such as using the version 10.1 gcc compiler. This could also be one of those weird situations where native 64-bit runs slower than 32-bit compatibility mode.
Re: Raspberry Pi4 8GB xHPL benchmark results
Posted: Fri Jun 05, 2020 5:19 am
by ejolson
arif-ali wrote: ↑Thu Jun 04, 2020 8:04 am
Looking around on other forums, it seems people are consistently getting around 6 GFlops. It would be interesting to see results from some of the places that have set up multi-node clusters like the ones I saw at Supercomputing conferences in the past few years.
I think 6 GFLOPS is for the Pi 3B. Using default clock speeds, 6.4 is the original 3B and 6.7 the 3B+. More information is in the thread
viewtopic.php?f=63&t=208167
Therefore, in terms of Linpack, the 4B is 57 percent faster than the 3B+ that came before. At the same time, having more memory makes the 4B more powerful.
Re: Raspberry Pi4 8GB xHPL benchmark results
Posted: Fri Jun 05, 2020 7:47 am
by aa3025
@ejolson: Looks like your 1st run was SMP code then, and yes, your problem size was way smaller (it took just 30 sec). How much memory did it use? The whole 2GB?
On 32-bit, one process obviously cannot address the whole 8GB, which is why I did not try SMP. If I launch a single process, it crashes with a segfault after filling 2+GB (25-27%) of memory, but that is what you can expect from a 32-bit OS.
How did you compile xhpl for SMP? (Just changing the compiler from mpicc to gcc? Or is it the same mpich-compiled xhpl, just launched without mpirun?)
My last test with OpenMPI showed 9.05 GFlops (when I did not continuously poll the temperature and throttling state). I can try MPICH instead of OpenMPI; I've heard it has better process-affinity behavior, so perhaps it does not require that rankfile witchcraft.
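The 32-bit constraint can be put into numbers. A rough sketch, assuming the coefficient matrix (8 bytes per double, N x N entries) dominates memory use and is split about evenly over the P x Q process grid. The 2**32 limit is only the hard ceiling of a 32-bit pointer; the usable share per process is smaller, as the crash at 2+GB shows. The function names are illustrative:

```python
ADDRESS_SPACE_32BIT = 2**32  # hard ceiling; the usable user-space share is smaller


def matrix_bytes(n):
    """Bytes needed for an n x n matrix of 8-byte doubles."""
    return 8 * n * n


def fits_in_32bit_process(n, p, q, limit=ADDRESS_SPACE_32BIT):
    """True if the per-process share of the matrix is below `limit`.

    Assumes an even split over the p x q grid, which is only approximate.
    """
    return matrix_bytes(n) / (p * q) < limit


if __name__ == "__main__":
    n = 28800  # problem size used for the 8GB run in this thread
    print(matrix_bytes(n) / 1e9, "GB in total")
    print("single process:", fits_in_32bit_process(n, 1, 1))
    print("2 x 2 grid:    ", fits_in_32bit_process(n, 2, 2))
```

With N=28800 the matrix is about 6.6 GB, which cannot fit in one 32-bit process but comes to roughly 1.7 GB per process on a 2 x 2 grid.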
Re: Raspberry Pi4 8GB xHPL benchmark results
Posted: Fri Jun 05, 2020 9:50 am
by arif-ali
Thanks to both of you for sharing your experiences
I have gone and compiled OpenBLAS and mpich, and linked them statically with the latest version of hpl. Linking statically tends to help with performance. I didn't do a minimal build, and had other things running on my pi4, so this is probably not a true reflection of performance.
OS: Ubuntu 20.04
Kernel: Linux pi02.arif.local 5.4.0-1011-raspi #11-Ubuntu SMP Fri May 8 07:43:33 UTC 2020 aarch64 aarch64 aarch64 GNU/Linux
Run 1:
P=1 and Q=4
OMP_NUM_THREADS=1
OMP_NUM_THREADS=1 /opt/mpich/3.3.2/bin/mpirun -np 4 --machinefile=machinefile hpl-2.3/bin/rpi4/xhpl
where the contents of the machine file are below
The results for this are below (11.57 GFlops)
Code: Select all
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR11C2R4 8000 192 1 4 29.49 1.1578e+01
HPL_pdgesv() start time Fri Jun 5 09:19:58 2020
HPL_pdgesv() end time Fri Jun 5 09:20:27 2020
--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV-
Max aggregated wall time rfact . . . : 1.08
+ Max aggregated wall time pfact . . : 0.27
+ Max aggregated wall time mxswp . . : 0.12
Max aggregated wall time update . . : 28.28
+ Max aggregated wall time laswp . . : 1.08
Max aggregated wall time up tr sv . : 0.08
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 4.23928476e-03 ...... PASSED
================================================================================
Run 2:
OMP_NUM_THREADS=4 hpl-2.3/bin/rpi4/xhpl
The results for this are below (11.73 GFlops)
Code: Select all
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR11C2R4 8000 192 1 1 29.09 1.1737e+01
HPL_pdgesv() start time Fri Jun 5 09:21:46 2020
HPL_pdgesv() end time Fri Jun 5 09:22:15 2020
--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV-
Max aggregated wall time rfact . . . : 1.59
+ Max aggregated wall time pfact . . : 0.38
+ Max aggregated wall time mxswp . . : 0.15
Max aggregated wall time update . . : 27.40
+ Max aggregated wall time laswp . . : 1.89
Max aggregated wall time up tr sv . : 0.08
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 5.66468717e-03 ...... PASSED
================================================================================
I am sure we can get a bit more by increasing N, as I was only using 400M of the RAM. Also, I was using Ubuntu 20.04 64-bit, so with a minimal installation of Raspberry Pi OS I may be able to get better performance.
Make.rpi4:
https://paste.ubuntu.com/p/VZKrDynpp5/
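When comparing several runs like Run 1 and Run 2 above, it can help to pull the result lines out of the xhpl output automatically. A minimal sketch; the regular expression is an assumption based on the result lines shown in this thread (an encoded variant starting with WR or WC, followed by N, NB, P, Q, Time and Gflops), and the function name parse_hpl_results is illustrative:

```python
import re

# Matches lines such as: WR11C2R4 8000 192 1 4 29.49 1.1578e+01
RESULT_LINE = re.compile(
    r"^(?P<tv>W[RC]\S+)\s+(?P<n>\d+)\s+(?P<nb>\d+)\s+(?P<p>\d+)\s+(?P<q>\d+)"
    r"\s+(?P<time>\d+(?:\.\d+)?)\s+(?P<gflops>\d+(?:\.\d+)?e[+-]\d+)\s*$"
)


def parse_hpl_results(text):
    """Return one dict per result line found in xhpl output."""
    results = []
    for line in text.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:
            results.append({
                "variant": match["tv"],
                "n": int(match["n"]),
                "nb": int(match["nb"]),
                "p": int(match["p"]),
                "q": int(match["q"]),
                "time": float(match["time"]),
                "gflops": float(match["gflops"]),
            })
    return results


if __name__ == "__main__":
    sample = """
    T/V N NB P Q Time Gflops
    WR11C2R4 8000 192 1 4 29.49 1.1578e+01
    WR11C2R4 8000 192 1 1 29.09 1.1737e+01
    """
    for run in parse_hpl_results(sample):
        print(run["p"], "x", run["q"], "grid:", run["gflops"], "GFLOPS")
```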
Re: Raspberry Pi4 8GB xHPL benchmark results
Posted: Fri Jun 05, 2020 10:15 am
by arif-ali
A final result from me for the day, this was using 60% of my 4GB rpi4. From my experience, we wouldn't get much more out of it in terms of performance even if we were utilising more memory. We may get 1 to 2 percent more, but not worth waiting 30 mins for results to come through.
Code: Select all
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR11C2R4 17280 192 1 1 254.71 1.3507e+01
HPL_pdgesv() start time Fri Jun 5 10:02:20 2020
HPL_pdgesv() end time Fri Jun 5 10:06:34 2020
--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV-
Max aggregated wall time rfact . . . : 6.46
+ Max aggregated wall time pfact . . : 1.66
+ Max aggregated wall time mxswp . . : 0.50
Max aggregated wall time update . . : 247.84
+ Max aggregated wall time laswp . . : 7.79
Max aggregated wall time up tr sv . : 0.35
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 4.61419323e-03 ...... PASSED
================================================================================
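The N=17280 used here is consistent with a common rule of thumb for sizing HPL: pick the largest multiple of NB whose matrix (8 bytes per entry) fits in the chosen fraction of memory. The post does not say how N was actually chosen, so this is only a sketch under that assumption, with 4GB taken as 4e9 bytes; the function name hpl_problem_size is illustrative:

```python
import math


def hpl_problem_size(mem_bytes, fraction, nb):
    """Largest multiple of nb such that 8 * N^2 <= fraction * mem_bytes."""
    n_max = math.isqrt(int(fraction * mem_bytes / 8))
    return (n_max // nb) * nb


if __name__ == "__main__":
    # 60% of a 4GB board with NB=192, as in the run above.
    print(hpl_problem_size(4e9, 0.6, 192))
```

With these inputs the sketch returns 17280, matching the run above. Counting 4GB as 4 * 2**30 bytes instead would give a somewhat larger N.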
Re: Raspberry Pi4 8GB xHPL benchmark results
Posted: Fri Jun 05, 2020 10:19 am
by aa3025
Oh, that's nice!
I wasn't able to produce a working xhpl binary when compiling with stock mpich; the resulting exe segfaults straight away. When compiled with OpenMPI and gcc 10.1 it still produces 9.07 GFlops with my big (8GB) problem size (N=28800, P=2, Q=2).
With N=8000 I got 7.8 GFlops with openmpi. I guess a 64-bit OS (Ubuntu 20) results in better performance.
Re: Raspberry Pi4 8GB xHPL benchmark results
Posted: Fri Jun 05, 2020 10:21 am
by jamesh
So, 6GF is about 40 times faster than the Cray-1.
How times have changed!
Re: Raspberry Pi4 8GB xHPL benchmark results
Posted: Fri Jun 05, 2020 3:09 pm
by arif-ali
So, throwing the theory of a 32-bit Raspberry Pi OS issue out of the window:
Kernel: Linux pi04.arif.local 5.4.44-v7l+ #1320 SMP Wed Jun 3 16:13:10 BST 2020 armv7l GNU/Linux
OS: Raspbian GNU/Linux 10 (buster)
Same HPL.dat for run1 but now on the 8GB version, and the same command.
* recompiled mpich
* recompiled OpenBLAS
* recompiled hpl
I didn't tune anything on the OS, it wasn't a minimal install, and some other tasks were probably running in the background
re-ran the benchmark
Code: Select all
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR11C2R4 17280 192 1 1 258.66 1.3301e+01
HPL_pdgesv() start time Fri Jun 5 15:49:47 2020
HPL_pdgesv() end time Fri Jun 5 15:54:05 2020
--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV-
Max aggregated wall time rfact . . . : 7.46
+ Max aggregated wall time pfact . . : 2.49
+ Max aggregated wall time mxswp . . : 0.53
Max aggregated wall time update . . : 250.75
+ Max aggregated wall time laswp . . : 8.34
Max aggregated wall time up tr sv . : 0.42
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 1.33839235e-03 ...... PASSED
================================================================================
Re: Raspberry Pi4 8GB xHPL benchmark results
Posted: Fri Jun 05, 2020 3:33 pm
by aa3025
Decent! Is it the gcc8 (stock) compiler you compiled everything with?
I don't know, maybe your compiled OpenBLAS is magic! (Did you try with the stock libopenblas?)
Re: Raspberry Pi4 8GB xHPL benchmark results
Posted: Fri Jun 05, 2020 3:37 pm
by ejolson
arif-ali wrote: ↑Fri Jun 05, 2020 3:09 pm
So, throwing the theory of a 32-bit Raspberry Pi OS issue out of the window:
Kernel: Linux pi04.arif.local 5.4.44-v7l+ #1320 SMP Wed Jun 3 16:13:10 BST 2020 armv7l GNU/Linux
OS: Raspbian GNU/Linux 10 (buster)
Same HPL.dat for run1 but now on the 8GB version, and the same command.
* recompiled mpich
* recompiled OpenBLAS
* recompiled hpl
I didn't tune anything on the OS, it wasn't a minimal install, and some other tasks were probably running in the background
re-ran the benchmark
A speed of 13.3 GFLOPS seems pretty good and is now about 80 percent faster than previous 3B+ timings. Would you mind posting the full output or, better, your HPL.dat file so we can see the runtime tuning parameters? Also, could you please confirm that you are not overclocking the Pi 4B, as that's a different game.
Re: Raspberry Pi4 8GB xHPL benchmark results
Posted: Fri Jun 05, 2020 3:46 pm
by aa3025
arif-ali wrote: ↑Fri Jun 05, 2020 3:09 pm
So, throwing the theory of a 32-bit Raspberry Pi OS issue out of the window:
Kernel: Linux pi04.arif.local 5.4.44-v7l+ #1320 SMP Wed Jun 3 16:13:10 BST 2020 armv7l GNU/Linux
OS: Raspbian GNU/Linux 10 (buster)
You are on the rpi-5.4.y kernel. On my Pi the latest I have is
Code: Select all
4.19.118-v7l+ #1311 SMP Mon Apr 27 14:26:42 BST 2020 armv7l GNU/Linux
... I see I did not do firmware update (rpi-update) ... The race is on

Re: Raspberry Pi4 8GB xHPL benchmark results
Posted: Fri Jun 05, 2020 4:02 pm
by arif-ali
Below is my setup
* OS: Raspberry Pi OS
* Kernel: Linux pi04.arif.local 5.4.44-v7l+ #1320 SMP Wed Jun 3 16:13:10 BST 2020 armv7l GNU/Linux
* Bootloader:
Code: Select all
root@pi04:~# vcgencmd bootloader_version
May 27 2020 18:47:29
version d648db3968cd31d4948341e09cb8a925c49d2ea1 (release)
timestamp 1590601649
Everything else is stock
* Download latest mpich, and compile using
Code: Select all
tar xfz mpich-3.3.2.tar.gz
cd mpich-3.3.2
./configure --prefix=/opt/mpich/3.3.2
make -j 3
sudo make install
* Download latest OpenBLAS
Code: Select all
unzip OpenBLAS.zip
cd OpenBLAS-develop
make -j 3
* Download latest hpl
HPL.dat:
https://paste.ubuntu.com/p/xmq22Th5P2/
Make.rpi4-mpich:
https://paste.ubuntu.com/p/yhpStnhzRr/
Then compile using the following command
My HPL.dat was in ~/hpl so my PWD was ~/hpl
Code: Select all
OMP_NUM_THREADS=4 ./hpl-2.3/bin/rpi4-mpich/xhpl
Finally, my result from the above environment for the 8GB board is here; it was using 2.82GB of RAM in this instance
https://paste.ubuntu.com/p/YtfY5M8M6v/
If anything is missing or unclear, let me know.
Re: Raspberry Pi4 8GB xHPL benchmark results
Posted: Fri Jun 05, 2020 4:06 pm
by aa3025
OK, but you ran in SMP (threaded) mode (I ran in MPP mode):
OMP_NUM_THREADS=4 ./hpl-2.3/bin/rpi4-mpich/xhpl
Will it crash if you increase the problem size and the 32-bit OS can't allocate the required memory?
My HPL.dat for 4 MPI workers and 8GB RAM:
Code: Select all
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out output file name (if any)
6 device out (6=stdout,7=stderr,file)
1 # of problems sizes (N)
28800 Ns
1 # of NBs
192 NBs
0 PMAP process mapping (0=Row-,1=Column-major)
1 # of process grids (P x Q)
2 Ps
2 Qs
16.0 threshold
1 # of panel fact
2 PFACTs (0=left, 1=Crout, 2=Right)
1 # of recursive stopping criterium
4 NBMINs (>= 1)
1 # of panels in recursion
2 NDIVs
1 # of recursive panel fact.
1 RFACTs (0=left, 1=Crout, 2=Right)
1 # of broadcast
1 BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1 # of lookahead depth
1 DEPTHs (>=0)
2 SWAP (0=bin-exch,1=long,2=mix)
64 swapping threshold
0 L1 in (0=transposed,1=no-transposed) form
0 U in (0=transposed,1=no-transposed) form
1 Equilibration (0=no,1=yes)
8 memory alignment in double (> 0)
##### This line (no. 32) is ignored (it serves as a separator). ######
0 Number of additional problem sizes for PTRANS
1200 10000 30000 values of N
0 number of additional blocking sizes for PTRANS
40 9 8 13 13 20 16 32 64 values of NB
Re: Raspberry Pi4 8GB xHPL benchmark results
Posted: Fri Jun 05, 2020 4:27 pm
by arif-ali
Now in MPP, just for you @aa3025
OMP_NUM_THREADS=1 /opt/mpich/3.3.2/bin/mpirun -np 4 --machinefile=machinefile.mpich ./hpl-2.3/bin/rpi4-mpich/xhpl
machinefile.mpich has the following contents
HPL.dat:
https://paste.ubuntu.com/p/Dps9BJkqVz/
Results:
https://paste.ubuntu.com/p/CbrHss68Nh/
EDIT:
Adding numbers for the same N as @aa3025
HPL.dat:
https://paste.ubuntu.com/p/wnD6FdGD3m/
Results:
https://paste.ubuntu.com/p/6P3zvMfCCr/
Re: Raspberry Pi4 8GB xHPL benchmark results
Posted: Fri Jun 05, 2020 4:43 pm
by aa3025
Great, thanks! I still got 9 GFlops after the kernel/firmware upgrade to v5.4 and with stock openmpi and openblas. I will try to reproduce your compilation of mpich and openblas to see if I can get your 13 GFlops...
