baldyza
Posts: 37
Joined: Fri Sep 21, 2012 11:23 am

Bramble HPL (High-Performance Linpack Benchmark) results

Sun Feb 10, 2013 5:57 pm

I just got my bramble set up (4 x Gentoo + MPICH2) with the HPL benchmark running, albeit without great performance yet.
I would be very interested in the results other people are getting on the HPL benchmark and what they have in their HPL.dat. Even guesstimates of how many GFLOPS a Raspberry Pi cluster can achieve would be welcome.

HPL (High-Performance Linpack Benchmark) is the standard benchmark used on the top500 list -> http://www.top500.org/

Seems like a good site to tweak the HPL.dat -> http://hpl-calculator.sourceforge.net/
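For anyone who would rather skip the calculator: the usual rule of thumb (my own summary, so treat it as an approximation) is to pick N so that the 8*N^2-byte matrix fills roughly 80% of the RAM taking part in the run, rounded down to a multiple of NB. A quick sketch:

Code: Select all

# rough HPL problem-size estimate; mem/nodes/nb are examples, adjust to your setup
awk 'BEGIN {
  mem   = 496 * 1024 * 1024             # usable RAM per node in bytes
  nodes = 1                             # nodes taking part in the run
  nb    = 168                           # block size (NBs) from HPL.dat
  n     = sqrt(0.8 * mem * nodes / 8)   # 8 bytes per double, ~80% of RAM
  printf "N ~ %d\n", int(n / nb) * nb
}'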

diereinegier
Posts: 164
Joined: Sun Dec 30, 2012 5:45 pm
Location: Bonn, Germany
Contact: Website

Re: Bramble HPL (High-Performance Linpack Benchmark) results

Mon Feb 11, 2013 8:04 am

baldyza wrote: Seems like a good site to tweak the HPL.dat -> http://hpl-calculator.sourceforge.net/
How did you even enter half a gigabyte of memory per node? ;)
Download my repositories at https://github.com/GeorgBisseling

baldyza
Posts: 37
Joined: Fri Sep 21, 2012 11:23 am

Re: Bramble HPL (High-Performance Linpack Benchmark) results

Mon Feb 11, 2013 10:17 am

diereinegier wrote:
baldyza wrote: Seems like a good site to tweak the HPL.dat -> http://hpl-calculator.sourceforge.net/
How did you even enter half a gigabyte of memory per node? ;)
Yeah, I dunno; I suppose not many supercomputers have 496 MB of memory :)

ber0tech
Posts: 9
Joined: Sun Feb 10, 2013 9:41 pm

Re: Bramble HPL (High-Performance Linpack Benchmark) results

Tue Feb 12, 2013 6:02 pm

On a single RPI I'm getting only 276 MFLOPS using this config:

Code: Select all

$ cat HPL.dat
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any) 
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
5040         Ns
1            # of NBs
168          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
1            Ps
1            Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)


baldyza
Posts: 37
Joined: Fri Sep 21, 2012 11:23 am

Re: Bramble HPL (High-Performance Linpack Benchmark) results

Tue Feb 12, 2013 10:31 pm

ber0tech wrote:On a single RPI I'm getting only 276 MFLOPS using this config
Thanks for the sample config. Are you sure about your MFLOPS conversion? I stand to be corrected, but 0.02803 Gflops × 1000 = 28.03 Mflops, which is what I would expect for the Pi.

Initial runs on mine with the same config but N = 2000, Gentoo with MPICH2 and overclocked to 900 MHz:
1 node 2.803e-02 Gflops
WR11C2R4 2000 168 1 1 190.51 2.803e-02
2 nodes 4.627e-02 Gflops
WR11C2R4 2000 168 2 1 115.40 4.627e-02
3 nodes 5.834e-02 Gflops
WR11C2R4 2000 168 3 1 91.52 5.834e-02
4 nodes 8.117e-02 Gflops
WR11C2R4 2000 168 2 2 65.78 8.117e-02
I need to set up a proper batch system, then run with a bigger N (and hopefully more nodes).
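For what it's worth, the parallel speedup hidden in those numbers (just redoing the arithmetic above, nothing new):

Code: Select all

# speedup relative to the 1-node 2.803e-02 Gflops result
awk 'BEGIN {
  base = 2.803e-02
  printf "2 nodes: %.2fx\n", 4.627e-02 / base
  printf "3 nodes: %.2fx\n", 5.834e-02 / base
  printf "4 nodes: %.2fx\n", 8.117e-02 / base
}'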

ber0tech
Posts: 9
Joined: Sun Feb 10, 2013 9:41 pm

Re: Bramble HPL (High-Performance Linpack Benchmark) results

Wed Feb 13, 2013 6:41 am

I got 2.761e-01 Gflops. But the question is: what is the expected peak performance of the RPi? From the Linpack FAQ: http://www.netlib.org/utk/people/JackDo ... npack.html
The theoretical peak performance is determined by counting the number of floating-point additions and multiplications (in full precision) that can be completed during a period of time, usually the cycle time of the machine.
From the ARM1176JZF-S hardware spec: http://infocenter.arm.com/help/index.js ... index.html
DP MUL and MAC: 2 cycles
SP DIV, SQRT: 14 cycles
DP DIV, SQRT: 28 cycles
All other instructions: 1 cycle
The RPi runs at 700 MHz. Isn't the expected peak performance then somewhere between 350 and 700 MFlops?
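That range is just my reading of the cycle counts above: a double-precision MUL or MAC retires every 2 cycles, and counting a MAC as two flops gives the upper bound:

Code: Select all

# peak estimate for one core at 700 MHz, 2 cycles per DP MUL/MAC
awk 'BEGIN {
  printf "MUL (1 flop / 2 cycles) : %.0f MFLOPS\n", 700e6 / 2 / 1e6
  printf "MAC (2 flops / 2 cycles): %.0f MFLOPS\n", 700e6 * 2 / 2 / 1e6
}'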

This is the complete output of Linpack HPL on my RPi at 700 MHz:

Code: Select all

================================================================================
HPLinpack 2.0  --  High-Performance Linpack benchmark  --   September 10, 2008
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :    5040 
NB     :     168 
PMAP   : Row-major process mapping
P      :       1 
Q      :       1 
PFACT  :   Right 
NBMIN  :       4 
NDIV   :       2 
RFACT  :   Crout 
BCAST  :  1ringM 
DEPTH  :       1 
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4        5040   168     1     1             309.27              2.761e-01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0049582 ...... PASSED
================================================================================


diereinegier
Posts: 164
Joined: Sun Dec 30, 2012 5:45 pm
Location: Bonn, Germany
Contact: Website

Re: Bramble HPL (High-Performance Linpack Benchmark) results

Fri Feb 15, 2013 7:16 pm

Maybe memory bandwidth is the limit and not CPU speed?
Download my repositories at https://github.com/GeorgBisseling

baldyza
Posts: 37
Joined: Fri Sep 21, 2012 11:23 am

Re: Bramble HPL (High-Performance Linpack Benchmark) results

Sun Feb 17, 2013 10:48 am

ber0tech wrote: The RPI runs at 700MHz. Isn't then the expected peak performance between 350-700 MFlops?
The MFLOPS figure is also discussed in this thread, where they confirm 350 MFLOPS as the theoretical peak performance:
http://www.raspberrypi.org/phpBB3/viewt ... 2&p=211634
So in the best case it still takes 2 cycles for one operation, and then 700 MHz / 2 = 350 Mflops. In the worst case, where the result of the current operation is required for the next one, i.e. pipelining can't be used, it takes 8 cycles per operation and we end up with 700 MHz / 8 = 87.5 Mflops.
Using that with my bramble (4 nodes at 900 MHz):
Theoretical peak performance = 4 nodes * 900MHz / 2 = 1800Mflops = 1.8 Gflops
Actual Performance = 0.08117 Gflops
Efficiency = Actual Performance GFLOPS / Theoretical Peak Performance GFLOPS = 0.08117 / 1.8 = 4.5%
Which is really dismal :?

Doing the same calculations for the "standard" RPi benchmark (1 node at 700 MHz):
http://elinux.org/RPi_Performance#Results
Theoretical peak performance = 700 MHz / 2 = 350 Mflops = 0.35 Gflops
Actual performance = 41047 Kflops = 41.047 Mflops = 0.041047 Gflops
Efficiency = 0.041047 / 0.35 ≈ 11.7%

And with ber0tech's HPL benchmark results (1 node at 700 MHz):
Theoretical peak performance = 0.35 Gflops
Actual performance = 0.2761 Gflops (2.761e-01)
Efficiency = 0.2761 / 0.35 ≈ 79%
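Just to have the arithmetic above in one place (same numbers as quoted, nothing new):

Code: Select all

awk 'BEGIN {
  printf "bramble, 4 nodes @ 900 MHz : %4.1f %%\n", 100 * 0.08117  / 1.8
  printf "elinux,  1 node  @ 700 MHz : %4.1f %%\n", 100 * 0.041047 / 0.35
  printf "ber0tech, 1 node @ 700 MHz : %4.1f %%\n", 100 * 0.2761   / 0.35
}'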

I've kind of confused myself now; I will go back, check my HPL benchmark on 1 node and see if I can get performance similar to ber0tech's. What distribution and setup do you have? Maybe I am missing something obvious.

diereinegier
Posts: 164
Joined: Sun Dec 30, 2012 5:45 pm
Location: Bonn, Germany
Contact: Website

Re: Bramble HPL (High-Performance Linpack Benchmark) results

Sun Feb 17, 2013 1:02 pm

When I used my Bramble to compute a gravitational N-body system (http://www.raspberrypi.org/phpBB3/viewt ... 34#p257934), the result was that parallelization only started to help in the regime where the computational complexity, O(N*N), dominated the network traffic, O(N).

And even then only for N so large that you would not really want to compute the problem on a single Raspberry, or even on four. See the link above for a graph of the scaling.

Part of the problem is that network traffic consumes an awful lot of CPU power on the Raspberry. Maybe this is because the Ethernet chip is connected via USB.
Download my repositories at https://github.com/GeorgBisseling

ber0tech
Posts: 9
Joined: Sun Feb 10, 2013 9:41 pm

Re: Bramble HPL (High-Performance Linpack Benchmark) results

Sun Feb 17, 2013 3:35 pm

I prefer Slackware on my RPi. I think Slackware is perfect for such a device because of its simplicity and purity. Although it is not optimised for the RPi hardware (e.g. the “soft floating point” ABI vs. the “hard floating point” ABI), I haven't seen any real application that runs faster on Raspbian than on Slackware. Someone should add Slackware to the RPI_Distributions wiki page.

However, the most important factor for good Linpack HPL results is the BLAS library.
I use ATLAS 3.8.4 from this source: http://www.vesperix.com/arm/atlas-arm/index.html
Compiling ATLAS takes more than a day, and cross-compiling doesn't work.
I'm trying to get a second RPi for some MPI games. Maybe you can also run some tests on a two-node "cluster".
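For reference, the build steps I would expect for that ATLAS tarball look roughly like the sketch below. This is from memory, so treat the directory name, the --prefix path and the throttling-check option as assumptions rather than gospel:

Code: Select all

# rough ATLAS build sketch -- expect it to run for many hours on the Pi
cd ATLAS                                  # top of the extracted source tree (name may differ)
mkdir build && cd build
../configure --prefix=/usr/local/atlas    # add '-Si cputhrchk 0' if it aborts over CPU throttling
make build                                # the long, self-tuning part
make check                                # sanity tests
sudo make install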

baldyza
Posts: 37
Joined: Fri Sep 21, 2012 11:23 am

Re: Bramble HPL (High-Performance Linpack Benchmark) results

Mon Feb 18, 2013 8:55 am

ber0tech wrote: However most important for good Linpack HPL results is the BLAS library.
I use ATLAS 3.8.4 from this source: http://www.vesperix.com/arm/atlas-arm/index.html
Compiling ATLAS takes more than a day and cross compiling doesn't work.
I'm trying to get a second RPI for some MPI games. Maybe you can also run some tests on a two node "cluster".
This is really interesting; comparing my stock Gentoo results to yours shows a dramatic difference. I'm busy building atlas-3.10.1 for Gentoo; it comes packaged in the Gentoo science overlay. Hopefully with ATLAS installed I will be able to match your benchmark.

baldyza
Posts: 37
Joined: Fri Sep 21, 2012 11:23 am

Re: Bramble HPL (High-Performance Linpack Benchmark) results

Tue Feb 19, 2013 4:41 pm

I am totally blown away with the performance improvement from ATLAS!
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4        5040   168     1     1             274.87              3.106e-01
It did take ages to compile, over 24 hours, but 0.3 Gflops compared to 0.03 Gflops with the standard Gentoo reference BLAS libraries is a really huge difference.

Out of interest: while compiling, ATLAS prints lots of info on the benchmarks it's running. Some of the results were over 500 Mflops, so the assumption of peak performance being between 350 and 700 Mflops looks to be correct, depending on which operation is being used.

baldyza
Posts: 37
Joined: Fri Sep 21, 2012 11:23 am

Re: Bramble HPL (High-Performance Linpack Benchmark) results

Thu Feb 21, 2013 11:03 pm

Cracked 1 Gflops using 4 RPis overclocked to 900 MHz. If anyone else is interested, maybe we could make a performance table on the wiki, a top 10 of brambles, kind of a nod to the top500 list?

Code: Select all

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4        7000   168     1     4             226.35              1.011e+00
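Measured against the 1.8 Gflops theoretical peak used earlier in the thread (4 nodes x 900 MHz / 2), that run works out to roughly:

Code: Select all

awk 'BEGIN { printf "efficiency: %.0f %%\n", 100 * 1.011 / 1.8 }'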

diereinegier
Posts: 164
Joined: Sun Dec 30, 2012 5:45 pm
Location: Bonn, Germany
Contact: Website

Re: Bramble HPL (High-Performance Linpack Benchmark) results

Fri Feb 22, 2013 9:33 am

Congratulations!

The Top 500 Bramble List sounds like a very appealing idea.

Or should I say necessity? ;-)
Download my repositories at https://github.com/GeorgBisseling

ber0tech
Posts: 9
Joined: Sun Feb 10, 2013 9:41 pm

Re: Bramble HPL (High-Performance Linpack Benchmark) results

Fri Feb 22, 2013 5:55 pm

Great! Below are some results for 1 and 2 nodes.
For comparison only: my Intel laptop with gcc gets 80 GFlop/s.

Code: Select all

Nodes   CPUfreq.[MHz] Rmax[MFlop/s]
-----------------------------------
1       700           280
1       900           379
2       700           523
2       900           699

ber0tech
Posts: 9
Joined: Sun Feb 10, 2013 9:41 pm

Re: Bramble HPL (High-Performance Linpack Benchmark) results

Sun Mar 03, 2013 1:40 pm

Mini HowTo: Linpack HPL on Raspberry Pi

Install additional packages and their dependencies:

Code: Select all

sudo apt-get install libatlas-base-dev libmpich2-dev gfortran
The Linpack HPL source code can be found here: http://www.netlib.org/benchmark/hpl/hpl-2.1.tar.gz
Extract the tar file and create a makefile based on the given template

Code: Select all

tar xf hpl-2.1.tar.gz
cd hpl-2.1/setup
sh make_generic
cd ..
cp setup/Make.UNKNOWN Make.rpi
Adjust Make.rpi

Code: Select all

ARCH         = rpi
TOPdir       = $(HOME)/hpl-2.1
MPlib        = -lmpi
LAdir        = /usr/lib/atlas-base/
LAlib        = $(LAdir)/libf77blas.a $(LAdir)/libatlas.a
Compile linpack. The xhpl binary will be placed in bin/rpi/

Code: Select all

make arch=rpi
Create the HPL input file bin/rpi/HPL.dat

Code: Select all

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
4000         Ns
1            # of NBs
128          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
1            Ps
1            Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
A test run on a single node should print about 270 Mflops:

Code: Select all

cd bin/rpi
./xhpl
Prepare SSH keys for a distributed run. If you are not using a shared home directory (e.g. NFS), ensure the .ssh directory and the xhpl binary are copied to all nodes of your cluster (a sketch of one way to do that follows the key setup below).

Code: Select all

ssh-keygen
   # hit enter 3 times
cp $HOME/.ssh/id_rsa.pub $HOME/.ssh/authorized_keys
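If the home directories are not shared, something along these lines pushes the key and the benchmark files to the other node. The user name pi and the host name arm2 are only examples, matching the mpiexec line further down; repeat for every node:

Code: Select all

# allow password-less SSH from the head node to arm2
cat $HOME/.ssh/id_rsa.pub | ssh pi@arm2 'mkdir -p .ssh && cat >> .ssh/authorized_keys'
# copy the benchmark to the same path on arm2
ssh pi@arm2 'mkdir -p hpl-2.1/bin/rpi'
scp $HOME/hpl-2.1/bin/rpi/xhpl $HOME/hpl-2.1/bin/rpi/HPL.dat pi@arm2:hpl-2.1/bin/rpi/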
Adjust bin/rpi/HPL.dat for two processes, one xhpl process on each node of a two-node cluster (row 12):

Code: Select all

2  Qs
To run Linpack HPL on two nodes via Ethernet:

Code: Select all

cd $HOME/hpl-2.1/bin/rpi
mpiexec -n 2 -host arm1,arm2 ./xhpl
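As a small aside (not part of the write-up above): if you are using MPICH's Hydra mpiexec, the node list can also live in a file, which gets handier as the cluster grows:

Code: Select all

# same run, but with the hosts kept in a machine file
cat > nodes <<EOF
arm1
arm2
EOF
mpiexec -f nodes -n 2 ./xhpl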
Bookmarks:
http://www.netlib.org/utk/people/JackDo ... npack.html
http://www.top500.org
http://en.wikipedia.org/wiki/LINPACK_be ... #HPLinpack
http://www.southampton.ac.uk/~sjc/raspb ... ampton.htm

TheHandsomeCoder
Posts: 5
Joined: Mon Jul 07, 2014 4:54 pm

Re: Bramble HPL (High-Performance Linpack Benchmark) results

Mon Jul 07, 2014 5:02 pm

A little late to the party, but I was hoping someone here might be able to help me with this. I'm trying to run HPL over my 1-4 Pi cluster, but no matter how I start HPL I get the error below.

Code: Select all

mpiexec -n 2 -host 192.168.100.50,192.168.100.51 ./xhpl
HPL ERROR from process # 0, on line 419 of function HPL_pdinfo:
>>> Need at least 2 processes for these tests <<<

HPL ERROR from process # 0, on line 621 of function HPL_pdinfo:
>>> Illegal input in file HPL.dat. Exiting ... <<<

-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[52766,1],0]
  Exit code:    1

Has anyone seen this before, and could you point me in the right direction? I'm doing this as part of a university project and would really appreciate the help. I'll attach my HPL.dat file too, in case that helps.

Code: Select all

 cat HPL.dat
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
5040         Ns
1            # of NBs
168          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
1            Ps
2            Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
Cheers

ber0tech
Posts: 9
Joined: Sun Feb 10, 2013 9:41 pm

Re: Bramble HPL (High-Performance Linpack Benchmark) results

Mon Jul 07, 2014 9:11 pm

Did you use the same MPI version for compiling and running xhpl?

TheHandsomeCoder
Posts: 5
Joined: Mon Jul 07, 2014 4:54 pm

Re: Bramble HPL (High-Performance Linpack Benchmark) results

Mon Jul 07, 2014 9:46 pm

Hmm, I'm not sure. I used the mpicc that was in the MPICH folder after I compiled it. I've attached my Make.<arch> below to give a better idea.

Code: Select all

#
SHELL        = /bin/sh
#
CD           = cd
CP           = cp
LN_S         = ln -s
MKDIR        = mkdir
RM           = /bin/rm -f
TOUCH        = touch
#
# ----------------------------------------------------------------------
# - Platform identifier ------------------------------------------------
# ----------------------------------------------------------------------
#
ARCH         = armv71-a
#
# ----------------------------------------------------------------------
# - HPL Directory Structure / HPL library ------------------------------
# ----------------------------------------------------------------------
#

HOME = /home/pi
TOPdir       = $(HOME)/hpl
INCdir       = $(TOPdir)/include
BINdir       = $(TOPdir)/bin/$(ARCH)
LIBdir       = $(TOPdir)/lib/$(ARCH)
#
HPLlib       = $(LIBdir)/libhpl.a
#
# ----------------------------------------------------------------------
# - MPI directories - library ------------------------------------------
# ----------------------------------------------------------------------
# MPinc tells the  C  compiler where to find the Message Passing library
# header files,  MPlib  is defined  to be the name of  the library to be
# used. The variable MPdir is only used for defining MPinc and MPlib.
#

#for openmpi
#MPdir         = /usr/lib/openmpi
#MPinc         = -I$(MPdir)/include
##MPlib         = $(MPdir)/lib/libmpi.a
#MPlib         = -L$(MPdir)/lib
##MPlib         = $(MPdir)/lib/libmpi.so


#FOR CUSTOM OPENMPI
#MPdir         = /mnt/nfs/install/openmpi-install
#MPinc         = -I$(MPdir)/include
#MPlib         = -L$(MPdir)/lib


#For mpich
#MPdir         = /mnt/nfs/install/mpich-3.0.4
MPdir         = /home/pi/mpich-install
MPinc         = -I$(MPdir)/include
#MPlib         = -L$(MPdir)/lib
MPlib         = $(MPdir)/lib/libmpich.a
#
# ----------------------------------------------------------------------
# - Linear Algebra library (BLAS or VSIPL) -----------------------------
# ----------------------------------------------------------------------
# LAinc tells the  C  compiler where to find the Linear Algebra  library
# header files,  LAlib  is defined  to be the name of  the library to be
# used. The variable LAdir is only used for defining LAinc and LAlib.
#

# Default BLAS comes with ubuntu 12
#LAdir        = /usr/local/atlas/lib
#LAinc        =
#LAlib        = $(LAdir)/libcblas.a $(LAdir)/libatlas.a

# ATLAS Generated BLAS
#LAdir        = /mnt/nfs/jahanzeb/bench/atlas/original/ATLAS2/buildDir/lib
LAdir           = /home/pi/builds/ATLAS/newBuild/lib
LAinc        =
LAlib        = $(LAdir)/libcblas.a $(LAdir)/libatlas.a


#
# ----------------------------------------------------------------------
# - F77 / C interface --------------------------------------------------
# ----------------------------------------------------------------------
# You can skip this section  if and only if  you are not planning to use
# a  BLAS  library featuring a Fortran 77 interface.  Otherwise,  it  is
# necessary  to  fill out the  F2CDEFS  variable  with  the  appropriate
# options.  **One and only one**  option should be chosen in **each** of
# the 3 following categories:
#
# 1) name space (How C calls a Fortran 77 routine)
#
# -DAdd_              : all lower case and a suffixed underscore  (Suns,
#                       Intel, ...),                           [default]
# -DNoChange          : all lower case (IBM RS6000),
# -DUpCase            : all upper case (Cray),
# -DAdd__             : the FORTRAN compiler in use is f2c.
#
# 2) C and Fortran 77 integer mapping
#
# -DF77_INTEGER=int   : Fortran 77 INTEGER is a C int,         [default]
# -DF77_INTEGER=long  : Fortran 77 INTEGER is a C long,
# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short.
#
# 3) Fortran 77 string handling
#
# -DStringSunStyle    : The string address is passed at the string loca-
#                       tion on the stack, and the string length is then
#                       passed as  an  F77_INTEGER  after  all  explicit
#                       stack arguments,                       [default]
# -DStringStructPtr   : The address  of  a  structure  is  passed  by  a
#                       Fortran 77  string,  and the structure is of the
#                       form: struct {char *cp; F77_INTEGER len;},
# -DStringStructVal   : A structure is passed by value for each  Fortran
#                       77 string,  and  the  structure is  of the form:
#                       struct {char *cp; F77_INTEGER len;},
# -DStringCrayStyle   : Special option for  Cray  machines,  which  uses
#                       Cray  fcd  (fortran  character  descriptor)  for
#                       interoperation.
#
F2CDEFS      = -DAdd__ -DF77_INTEGER=int -DStringSunStyle
#
# ----------------------------------------------------------------------
# - HPL includes / libraries / specifics -------------------------------
# ----------------------------------------------------------------------
#
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)

#for mpich
HPL_LIBS     = $(HPLlib) $(LAlib) $(MPlib)
#HPL_LIBS     = $(HPLlib) $(LAlib) $(MPlib) -lmpl

#for openmpi
#HPL_LIBS     = $(HPLlib) $(LAlib) $(MPlib)

#
# - Compile time options -----------------------------------------------
#
# -DHPL_COPY_L           force the copy of the panel L before bcast;
# -DHPL_CALL_CBLAS       call the cblas interface;
# -DHPL_CALL_VSIPL       call the vsip  library;
# -DHPL_DETAILED_TIMING  enable detailed timers;
#
# By default HPL will:
#    *) not copy L before broadcast,
#    *) call the Fortran 77 BLAS interface
#    *) not display detailed timing information.
#
HPL_OPTS     = -DHPL_CALL_CBLAS
#
# ----------------------------------------------------------------------
#
HPL_DEFS     = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
#
# ----------------------------------------------------------------------
# - Compilers / linkers - Optimization flags ---------------------------
# ----------------------------------------------------------------------
#
#CC           = /usr/bin/gcc

#CC           = /usr/bin/mpicc
#CC           = /mnt/nfs/install/openmpi-install/bin/mpicc
CC           = /home/pi/mpich-install/bin/mpicc
CCNOOPT      = $(HPL_DEFS)
#CCFLAGS      = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall
CCFLAGS      = $(HPL_DEFS) $(CFLAGS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall

#
#LINKER       = /usr/bin/gcc
#LINKER       = /usr/bin/mpicc

LINKER       = /home/pi/mpich-install/bin/mpicc
#LINKER        = /mnt/nfs/install/openmpi-install/bin/mpicc
LINKFLAGS    = $(CCFLAGS)
#
ARCHIVER     = ar
ARFLAGS      = r
RANLIB       = echo
#
# ----------------------------------------------------------------------
Is there a different compiler I should be using? There isn't one mentioned in the guide above.

ber0tech
Posts: 9
Joined: Sun Feb 10, 2013 9:41 pm

Re: Bramble HPL (High-Performance Linpack Benchmark) results

Mon Jul 07, 2014 10:01 pm

I get the same error message when I try to run an xhpl binary that is linked against MPICH2 while the mpiexec binary is from Open MPI. Try the following to see if the libraries and mpiexec fit together:

Code: Select all

ldd xhpl
which mpiexec

TheHandsomeCoder
Posts: 5
Joined: Mon Jul 07, 2014 4:54 pm

Re: Bramble HPL (High-Performance Linpack Benchmark) results

Mon Jul 07, 2014 10:08 pm

I think I get your meaning: you're suggesting I compiled with MPICH but am trying to run with Open MPI.

I ran the commands and got the following output, which shows that the mpiexec being picked up is not the one from my MPICH install.

Code: Select all

[pi@master-pi armv71-a]$ ldd xhpl
        libmpich.so.12 => /home/pi/mpich-install/lib/libmpich.so.12 (0xb6d11000)
        libopa.so.1 => /home/pi/mpich-install/lib/libopa.so.1 (0xb6d08000)
        libmpl.so.1 => /home/pi/mpich-install/lib/libmpl.so.1 (0xb6cfc000)
        librt.so.1 => /usr/lib/librt.so.1 (0xb6ce2000)
        libpthread.so.0 => /usr/lib/libpthread.so.0 (0xb6cc2000)
        libc.so.6 => /usr/lib/libc.so.6 (0xb6b8b000)
        /lib/ld-linux-armhf.so.3 (0xb6f33000)
[pi@master-pi armv71-a]$ which mpiexec
        /usr/bin/mpiexec

Given that the mpiexec being used isn't the one from my MPICH install, should I assume I compiled xhpl incorrectly?

ber0tech
Posts: 9
Joined: Sun Feb 10, 2013 9:41 pm

Re: Bramble HPL (High-Performance Linpack Benchmark) results

Mon Jul 07, 2014 10:10 pm

Try running via /home/pi/mpich-install/bin/mpiexec, if that exists.
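If that fixes it, putting the MPICH bin directory at the front of PATH saves typing the full path every time (just a suggestion based on the paths shown above):

Code: Select all

# prefer the MPICH tools in this shell; add the line to ~/.bashrc to make it permanent
export PATH=/home/pi/mpich-install/bin:$PATH
which mpiexec    # should now report /home/pi/mpich-install/bin/mpiexec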

TheHandsomeCoder
Posts: 5
Joined: Mon Jul 07, 2014 4:54 pm

Re: Bramble HPL (High-Performance Linpack Benchmark) results

Mon Jul 07, 2014 10:18 pm

That seems to have solved the problem, as HPL is now running across two of the Pis in the cluster!

Thanks so much for your help, ber0tech, I seriously appreciate it! I figure I should copy across my own mpich-install directory in future to ensure this doesn't happen again, assuming I can replace the Open MPI install, etc.

berke_aslan
Posts: 7
Joined: Thu Jul 03, 2014 9:36 am

Re: Bramble HPL (High-Performance Linpack Benchmark) results

Sat Sep 13, 2014 8:05 am

Hello everyone,

For my personal project I need to create a Raspberry Pi supercomputer. I managed to install everything for the HPL benchmark and can run it on one Raspberry Pi. However, I want to be able to run it on four Raspberry Pis. I tried to understand the part about the shared directory and the distributed run, but I don't understand it. Can someone explain it more simply or give a little tutorial?

Kind regards,

berke_aslan

TheHandsomeCoder
Posts: 5
Joined: Mon Jul 07, 2014 4:54 pm

Re: Bramble HPL (High-Performance Linpack Benchmark) results

Sun Sep 14, 2014 12:29 pm

Hi berke_aslan

I just had to do something similar for my own MSc project, so hopefully I can help you out :) I set mine up differently from how it is documented here, but it shouldn't be different enough to be confusing. I set mine up not to use the shared storage that you mention, as there were only 4 Pis in my cluster.

Firstly, how have you set up your Pis? Do you only have one that is set up to run the HPL benchmark, or are they all set up so that they can run the HPL benchmark individually? If you have only one set up, then I recommend that you clone the SD card of that Pi and image the rest of your Pis so that they all have the same setup. We can take it from there once we know.
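In the meantime, the distributed-run part of ber0tech's how-to earlier in the thread boils down to something like this for four Pis; the host names node1..node4 and the 2 x 2 grid are only placeholders, so adjust them to your cluster:

Code: Select all

# on the head node, from the directory holding xhpl and HPL.dat
# 1. in HPL.dat set Ps = 2 and Qs = 2 so that P*Q matches the 4 processes
# 2. make sure every node has xhpl at the same path and password-less SSH works
mpiexec -n 4 -host node1,node2,node3,node4 ./xhpl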

THC
