bensimmo
Posts: 3951
Joined: Sun Dec 28, 2014 3:02 pm
Location: East Yorkshire

Re: A Pi Pie Chart

Mon Nov 26, 2018 9:41 am

The charts came out nicely; it's what I used to compare the speeds.
It did throttle as I could see the CPU usage (same as in Raspbian desktop) drop.
Also the warnings in dmesg help make sure. 90C iirc before it warns.

That single sieve is interesting, looks to me they all have the same IPC
1.4/1.2/0.9/1.9/0.7 GHz as we go down and the scale seems similar, I guess they could be normalised.
The Pi2 (original) and the B+ can both go at 1GHz (as mentioned before). So that would level them out.


I'll see what else I can run it on, got an i5-4460, T4500 in desktop/AIO, and others in laptops.

The snappiness of the G70 is almost certainly down to its 3GB of RAM, and it is generally better in Chrome (which should be multicore capable).
I ran it off the same SD cards for a while, SanDisk A1 (over USB, as it wouldn't boot from the SD slot), and it was still generally much faster in 'user perceived' use.
Another place you see it is Thonny (python IDE), it's quite clunky to use on a Pi3, but not on the laptop, same version too.

I would say the Pi3 does well for $35, but you could probably pick the laptops up for that too.

ejolson
Posts: 2864
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Mon Nov 26, 2018 6:36 pm

bensimmo wrote:
Mon Nov 26, 2018 9:41 am
The charts came out nicely; it's what I used to compare the speeds.
It did throttle as I could see the CPU usage (same as in Raspbian desktop) drop.
The parallel benchmark tries different numbers of software threads in sequence, starting with twice the number of hardware threads and ending with the serial version of the code. Tests for each threading configuration are run a minimum of three times or for 5 seconds, whichever takes longer, and the best timing is kept as the final result. Therefore, one expects intervals where fewer cores are busy when the parallel code is running. This allows automatic tuning for hyperthreading and cases where there are more or fewer floating point units than integer units per core. It also allows the system to cool off a bit before performing the next benchmark.
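
As a rough illustration of the timing rule described above (at least three runs or 5 seconds, keep the fastest), here is a minimal Python sketch. It is not the benchmark's actual C code. The names best_time and thread_counts are made up for this example, and halving the thread count at each step is an assumption: the post only says the sequence starts at twice the hardware threads and ends with the serial code.

```python
import time


def best_time(work, min_runs=3, min_seconds=5.0):
    """Run work() at least min_runs times and for at least min_seconds
    in total; return the fastest single run in seconds."""
    best = float("inf")
    runs = 0
    start = time.perf_counter()
    # Keep going until BOTH the run count and the time budget are met.
    while runs < min_runs or time.perf_counter() - start < min_seconds:
        t0 = time.perf_counter()
        work()
        best = min(best, time.perf_counter() - t0)
        runs += 1
    return best


def thread_counts(hardware_threads):
    """Candidate thread counts from twice the hardware threads down to 1.
    Halving at each step is an assumption made for this sketch."""
    n = 2 * hardware_threads
    counts = []
    while n >= 1:
        counts.append(n)
        n //= 2
    return counts


# Example: a quad-core machine, with a short time budget so the demo is quick.
print(thread_counts(4))
print(best_time(lambda: sum(range(100000)), min_seconds=0.1))
```

Keeping the minimum rather than the mean means a run slowed by throttling or background activity does not drag the reported figure down.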

DavidS
Posts: 3800
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

Re: A Pi Pie Chart

Tue Nov 27, 2018 2:53 am

What kind of run time should I expect for this suite? I am running it now, and would like a rough idea of whether something has gone wrong, based on it taking way too long.

I am running it on a Raspberry Pi B+ at 900MHz on RISC OS. I know that you do not have any RISC OS results, but how long would be expected for an RPi B+ at 900MHz?
RPi = Way for me to have fun and save power.
100% Off Grid.
Household TTL Electricity Usage = 1.4KW/h per day.
500W Solar System, produces 2.8KW/h per day average.

bensimmo
Posts: 3951
Joined: Sun Dec 28, 2014 3:02 pm
Location: East Yorkshire

Re: A Pi Pie Chart

Tue Nov 27, 2018 7:20 am

Has it finished yet? Run time is not very long, a minute-ish.
I was reading a webpage while I ran them, by the time I finished it had done.

ejolson
Posts: 2864
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Tue Nov 27, 2018 7:24 pm

bensimmo wrote:
Tue Nov 27, 2018 7:20 am
Has it finished yet? Run time is not very long, a minute-ish.
I was reading a webpage while I ran them, by the time I finished it had done.
The pichart-serial benchmark runs in about 22 minutes on the original Pi B+ because it takes the minimum time (maximum speed) of eight measurements in each of the four categories and spends from 20 seconds to a minute on each measurement.
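
The arithmetic behind that figure can be checked quickly. The sketch below uses only the numbers quoted in this post (eight measurements, four categories, 20 to 60 seconds each, about 22 minutes in total); the variable names are just for illustration.

```python
# Eight measurements in each of four categories.
measurements = 8 * 4

# Bounds on the total run time if every measurement took 20 s or 60 s.
low_minutes = measurements * 20 / 60
high_minutes = measurements * 60 / 60

# Average time per measurement implied by a 22 minute total.
avg_seconds = 22 * 60 / measurements

print(f"{measurements} measurements")
print(f"between {low_minutes:.1f} and {high_minutes:.1f} minutes in total")
print(f"about {avg_seconds:.2f} s per measurement for a 22 minute run")
```

So 22 minutes sits in the middle of the possible range, at a little over 40 seconds per measurement on average.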

ejolson
Posts: 2864
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Wed Nov 28, 2018 4:55 pm

This post describes how to compile and run the pichart benchmark to compare your desktop computer with the Raspberry Pi. If your computer is running Microsoft Windows, Plan 9, OpenVMS or any other generally non-POSIX standards compliant operating system, it would be easiest to first boot into the Raspberry Pi Desktop for PC before proceeding. If you are running Windows 10 it would also be possible to install Windows Subsystem for Linux.

After logging in and obtaining a command-line shell, download the pichart source archive and unpack it with the commands

$ wget http://fractal.math.unr.edu/~ejolson/pi ... urrent.tgz
$ tar zxf pichart-current.tgz

Alternatively, use a web browser to download this file. If your tar command doesn't natively support gzip compressed files, unpack the archive using

$ gunzip <pichart-current.tgz | tar xf -

After unpacking the archive the source code will be contained in a subdirectory called pichart-current. Change to that directory and open the Makefile in a text editor. There are three lines that you may wish to change: CFLAGS, CC and TARGETS. By default these lines are set to build pichart-openmp and pichart-serial using the system C compiler with generic optimizations. Best performance, however, may be achieved by selecting architecture specific tuning options and a custom C compiler. If the chosen C compiler supports the Intel/MIT Cilk parallel programming extensions you may further want to include pichart-cilk in the list of build targets.

For this example we will be using a custom C compiler installed in /usr/local/gcc-6.5 and specify -march=native -mtune=native to build architecture optimized executables. As this compiler supports the Cilk parallel processing extensions, we also build pichart-cilk to compare with the OpenMP version. To do this, change the Makefile so the first lines read as

CFLAGS=-O3 -mtune=native -march=native -ffast-math -Wall -lm -lrt
CC=/usr/local/gcc-6.5/bin/gcc
TARGETS=pichart-serial pichart-openmp pichart-cilk

Note, if you are using a version of the GNU C compiler earlier than 5.x or later than 7.x, then it does not support Cilk and you should not add pichart-cilk to the TARGETS line. Note also that I've included the option -lrt because the clock_gettime function used by the timing routines in the benchmark is contained in the librt library of the particular version of Linux being used. Linking with -lrt is not usually needed.

Compile the source using the command

$ make

If all goes well, there should now be three executables in the directory: pichart-cilk, pichart-openmp and pichart-serial. If your computer only has a single processor core, then run pichart-serial to create the pie chart; otherwise, use either pichart-openmp or pichart-cilk to make the comparison.

The computer being tested in this example is a dual-processor Intel Pentium III server running at 650 MHz. This machine has two cores so I'll run both pichart-cilk and pichart-openmp to see which one gives the best result. The programs accept an option -t which can be used to give a descriptive label to the benchmark run as it appears in the pie chart. Otherwise, the default label "My Computer" is used. Between each run I'll copy the output file pichart.svg to a safe place so it doesn't get overwritten by the next run.

$ ./pichart-cilk -t "dual PIII 650MHz"
pichart -- Raspberry Pi Performance CILKPLUS version 23

Prime Sieve          P=14630843 Threads=2 Sec=4.76001 Mops=196.287
Merge Sort           N=16777216 Threads=2 Sec=10.6594 Mops=37.7746
Fourier Transform    N=4194304 Threads=2 Sec=9.29035 Mflops=49.6616
Lorenz 96            N=32768 K=16384 Threads=2 Sec=28.9737 Mflops=111.178

Making pie charts...done.
$ cp pichart.svg p3cilk.svg
$ ./pichart-openmp -t "dual PIII 650MHz"
pichart -- Raspberry Pi Performance OPENMP version 23

Prime Sieve          P=14630843 Threads=2 Sec=4.70759 Mops=198.473
Merge Sort           N=16777216 Threads=4 Sec=10.9022 Mops=36.9333
Fourier Transform    N=4194304 Threads=2 Sec=9.49918 Mflops=48.5698
Lorenz 96            N=32768 K=16384 Threads=2 Sec=27.9837 Mflops=115.111

Making pie charts...done.
$ cp pichart.svg p3openmp.svg
The merge sort and Fourier transforms are faster using Cilk while prime sieve and Lorenz 96 are faster using OpenMP. The differences are small enough, however, that the pie charts generated in either case are visually identical.

The resulting scalable vector graphics pie charts can be viewed using geeqie or by loading them into gimp. Gimp can also be used to convert the SVG format to, for example, PNG. For reference, here is a PNG image file corresponding to the Cilk pie chart that the above run produced:

Image

Update: The computer tested above was actually a dual-processor Pentium III running at 650MHz. The labels on the pie chart have been fixed using Gimp.
Last edited by ejolson on Mon Dec 10, 2018 5:57 am, edited 5 times in total.

ejolson
Posts: 2864
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Sat Dec 01, 2018 2:01 am

Here is another Pi pie chart, this time comparing the speed of a single-core Pentium 4 processor running at 1500MHz. This system is interesting due to the use of the somewhat controversial and expensive RAMBUS memory of the time. During testing it was discovered that the system compiler gcc 4.7.2 with flags -march=pentium4 -O3 -ffast-math led to incorrect results with the Lorenz 96 benchmark. Using the more recent gcc version 6.5.0 solved the problem.

The output from the run was

$ ./pichart-serial -t "P4 1500MHz"
pichart -- Raspberry Pi Performance Serial version 23

Prime Sieve          P=14630843 Threads=1 Sec=4.74821 Mops=196.775
Merge Sort           N=16777216 Threads=1 Sec=6.5265 Mops=61.6952
Fourier Transform    N=4194304 Threads=2 Sec=5.90099 Mflops=78.1857
Lorenz 96            N=32768 K=16384 Threads=2 Sec=6.05222 Mflops=532.238

Making pie charts...done.
with the resulting pie chart

Image

ejolson
Posts: 2864
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Fri Dec 07, 2018 8:46 pm

Here is a Pi pie chart comparing the speed of a Pentium 4 D 2.8GHz processor. These dual-core processors were used in many desktop computers at the peak of Pentium 4 popularity. The timings

$ ./pichart-openmp -t "P4D 2.8GHz"
pichart -- Raspberry Pi Performance OPENMP version 23

Prime Sieve          P=14630843 Threads=2 Sec=1.4254 Mops=655.484
Merge Sort           N=16777216 Threads=4 Sec=2.09171 Mops=192.499
Fourier Transform    N=4194304 Threads=2 Sec=2.05578 Mflops=224.427
Lorenz 96            N=32768 K=16384 Threads=2 Sec=0.993851 Mflops=3241.16

Making pie charts...done.
led to the pie chart

Image

which is notable because of the low performance on Merge Sort and the exceptional performance on Lorenz 96. To confirm proper scaling to both cores the serial code was also run to obtain

$ ./pichart-serial -t "P4D 2.8Ghz"
pichart -- Raspberry Pi Performance Serial version 23

Prime Sieve          P=14630843 Threads=1 Sec=2.83992 Mops=328.998
Merge Sort           N=16777216 Threads=1 Sec=4.14171 Mops=97.2192
Fourier Transform    N=4194304 Threads=2 Sec=2.88016 Mflops=160.19
Lorenz 96            N=32768 K=16384 Threads=1 Sec=1.80798 Mflops=1781.67

Making pie charts...done.
This shows near linear scaling for every metric except Fourier Transform where dual-core performance was likely constrained by memory bandwidth.
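
To put numbers on the scaling, the ratios can be computed directly from the two runs above. This is only a small Python sketch using the Mops/Mflops figures printed for the P4D 2.8GHz; the dictionary names are made up for the example.

```python
# Throughput figures (Mops or Mflops) copied from the two runs above.
serial = {
    "Prime Sieve": 328.998,
    "Merge Sort": 97.2192,
    "Fourier Transform": 160.19,
    "Lorenz 96": 1781.67,
}
openmp = {
    "Prime Sieve": 655.484,
    "Merge Sort": 192.499,
    "Fourier Transform": 224.427,
    "Lorenz 96": 3241.16,
}

# Speedup of the dual-core OpenMP run over the serial run.
speedup = {name: openmp[name] / serial[name] for name in serial}

for name, ratio in speedup.items():
    print(f"{name:18s} {ratio:.2f}x")
```

The sieve and merge sort come out at nearly 2x and Lorenz 96 at roughly 1.8x, while the Fourier transform only reaches about 1.4x.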

scruss
Posts: 2224
Joined: Sat Jun 09, 2012 12:25 pm
Location: Toronto, ON
Contact: Website

Re: A Pi Pie Chart

Sat Dec 08, 2018 12:13 am

This is an odd one. I'm sure I could make it run faster, but this is all I could do with stock development tools:
[Attachment: pichart-bigmac.png, results from a dual G5 Macintosh]
This is from an Apple Power Mac G5 tower with a dual-core 2.0 GHz PowerPC G5 (970MP). In 2005 it was quite a nifty machine. Now, not so much. It has four large fans running push-pull into a toaster-sized CPU heatsink. It has a 450 W power supply fed from a high current (C19) connector. It is heavier than some Code Club graduates.

./pichart-openmp 
pichart -- Raspberry Pi Performance OPENMP version 23

Prime Sieve          P=14630843 Threads=2 Sec=3.83142 Mops=243.86
Merge Sort           N=16777216 Threads=4 Sec=4.34331 Mops=92.7064
Fourier Transform    N=4194304 Threads=1 Sec=6.78003 Mflops=68.0488
Lorenz 96            N=32768 K=16384 Threads=1 Sec=2.22188 Mflops=1449.78

Making pie charts...done.
bigmac:pichart-current scruss$ uname -a 
Darwin bigmac.local 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul 15 16:57:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_PPC Power Macintosh
bigmac:pichart-current scruss$ gcc-4.2 --version
powerpc-apple-darwin9-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5577)
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
I'm sure there were some cleverer options I could have used for compilation, but gcc-4.2 doesn't know about -march options. The compiler complained about some OpenMP pragmas. And yes, I didn't know about the '-t' option to edit the machine name …

I have a fractionally faster 2.0 GHz dual-processor (970FX) G5, but it's running an older OS that doesn't have gcc-4.2. I'm not sure if its MP support is as good. It weighs as much as several Code Club graduates …
‘Remember the Golden Rule of Selling: “Do not resort to violence.”’ — McGlashan.

ejolson
Posts: 2864
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Sat Dec 08, 2018 2:09 am

scruss wrote:
Sat Dec 08, 2018 12:13 am
I'm sure there were some cleverer options I could have used for compilation, but gcc-4.2 doesn't know about -march options. The compiler complained about some OpenMP pragmas. And yes, I didn't know about the '-t' option to edit the machine name
The gcc implementation of OpenMP has improved dramatically in recent years to include efficient support for dynamic parallelism. Since all four performance tests were originally written as Cilk parallel programs, they rely quite significantly on that feature. Maybe it's time to install gcc 6.3 or better.

Have you compared pichart-openmp with pichart-serial to check whether you are getting roughly double the performance when running on two cores?
Last edited by ejolson on Sat Dec 08, 2018 4:25 am, edited 1 time in total.

scruss
Posts: 2224
Joined: Sat Jun 09, 2012 12:25 pm
Location: Toronto, ON
Contact: Website

Re: A Pi Pie Chart

Sat Dec 08, 2018 3:02 am

Maybe; as I said, I just installed the latest packaged Xcode compiler bundles that work with 10.5 PPC. Most Linux distros have dropped support for PPC, so I'd have to build it from source, which barely seems worth it.
Last edited by scruss on Sat Dec 08, 2018 11:36 am, edited 1 time in total.

ejolson
Posts: 2864
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Sat Dec 08, 2018 5:01 am

scruss wrote:
Sat Dec 08, 2018 3:02 am
Maybe; as I said, I just installed the latest packaged Xcode compiler bundles that work with 10.5 PPC. Most Linux distros have dropped support for PPC, so I'd have to build it from source, which barely seems worth it.
Those PowerPC Macintosh computers--especially the big silver coloured towers--were quite impressive. I think what you have measured so far is the single core performance. Unfortunately, the only Macintosh computers I have for testing are modern iMacs without toaster-sized heatsinks that seem engineered to toast their hard disks instead.

It looks like Debian no longer boots on the G5, but maybe NetBSD would work. That processor architecture definitely presents a different balance of performance characteristics than the Intel and ARM chips. It would also be interesting to see how the DEC Alpha and Sun SPARC compare.

ejolson
Posts: 2864
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Sun Dec 09, 2018 11:23 pm

Here is a Pi pie chart comparing the speed of a Pentium 4 HT 3.4GHz processor with hyper-threading technology turned on in EM64T mode.

Image

Although there is only one CPU core, hyper-threading yielded a 5 to 50 percent performance increase in the benchmarks. Even though Merge Sort enjoyed the greatest performance boost from hyper-threading, it is also the metric in which the Pentium 4 architecture lagged farthest behind the Raspberry Pi. The exact performance numbers follow.

$ ./pichart-serial -t "P4 3.4GHz"
pichart -- Raspberry Pi Performance Serial version 23

Prime Sieve          P=14630843 Threads=2 Sec=2.55177 Mops=366.15
Merge Sort           N=16777216 Threads=2 Sec=3.43348 Mops=117.273
Fourier Transform    N=4194304 Threads=1 Sec=2.60827 Mflops=176.888
Lorenz 96            N=32768 K=16384 Threads=1 Sec=1.24317 Mflops=2591.14

Making pie charts...done.
$ ./pichart-openmp -t "P4 3.4GHz"
pichart -- Raspberry Pi Performance OPENMP version 23

Prime Sieve          P=14630843 Threads=4 Sec=2.18189 Mops=428.219
Merge Sort           N=16777216 Threads=4 Sec=2.18205 Mops=184.53
Fourier Transform    N=4194304 Threads=2 Sec=2.12401 Mflops=217.218
Lorenz 96            N=32768 K=16384 Threads=2 Sec=1.18418 Mflops=2720.23

Making pie charts...done.
Tests were performed using gcc version 8.1 running under version 5.4 of DragonFly BSD.

ejolson
Posts: 2864
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Wed Dec 12, 2018 6:01 pm

A Pi pie chart for the ASUS Tinker Board, which uses a quad-core ARM Cortex A17 CPU running at 1.8 GHz, was independently posted in a different thread. I have regenerated the chart and posted the results here so they can be more easily compared.

Image

While the Tinker Board runs faster than the Raspberry Pi, it should be noted that the Cortex A17 is a 32-bit only processor while the A53 used in the 3B+ is a 64-bit processor currently running in 32-bit compatibility mode. In particular, it is possible to enjoy a variety of different 64-bit operating systems on the Raspberry Pi 3B+ that do not work on the Tinker Board. While I don't expect the mode of operation to make significant performance differences for these metrics, there are some applications that run significantly faster in 64-bit mode. Still, it would be interesting to check how much of a difference the mode of operation makes to the performance of a Pi 3B+ running the pie chart benchmarks.

$ tar zxf pichart-current.tgz
$ cd pichart-current
$ make
gcc -std=gnu99 -O3 -ffast-math -Wall -lm -o pichart-serial pichart.c util.c sieve.c merge.c fourier.c lorenz.c
gcc -std=gnu99 -O3 -ffast-math -Wall -lm -fopenmp -o pichart-openmp pichart.c util.c sieve.c merge.c fourier.c lorenz.c
$ ./pichart-openmp
pichart -- Raspberry Pi Performance OPENMP version 23

Prime Sieve          P=14630843 Threads=4 Sec=0.805925 Mops=1159.32
Merge Sort           N=16777216 Threads=8 Sec=0.766017 Mops=525.645
Fourier Transform    N=4194304 Threads=4 Sec=1.24434 Mflops=370.779
Lorenz 96            N=32768 K=16384 Threads=4 Sec=0.764573 Mflops=4213.1

Making pie charts...done.

DavidS
Posts: 3800
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

Re: A Pi Pie Chart

Wed Dec 12, 2018 8:09 pm

Just out of curiosity, I wonder if anyone could test this on a FireBee (Atari Coldfire Project) computer and on an AmigaOne (800MHz PowerPC 75xxx) to see how they compare with the RPi systems. It would be interesting to see how the high end of the other good and usable desktop computers compares with the RPi.

scruss
Posts: 2224
Joined: Sat Jun 09, 2012 12:25 pm
Location: Toronto, ON
Contact: Website

Re: A Pi Pie Chart

Sat Dec 15, 2018 3:55 pm

I have a friend with an AmigaOne; I'll see what I can do. I'd be surprised if it was even half as fast as the Mac G5, though. I did consider running the program on my NAS, which has a QorIQ dual-core Power CPU, but I suspect it doesn't have much memory or an up-to-date GCC.

In other tests, I rescued a ThinkPad R51 (1.6 GHz Pentium-M) from under a pile of woodshavings at the makerspace. It hadn't been turned on for a couple of years, so its BIOS had forgotten everything. Even with an update, its newest GCC is 5.5. Can only run the serial test:
[Attachment: pichart-r51-serial.png, ThinkPad R51, 1.6 GHz Pentium-M]
So a fairly good case for replacing old computers with Raspberry Pis for energy efficiency. Unlike a Raspberry Pi, it does have a real parallel port which the CNC folks seem to love. It had real problems with the OpenMP code, segfaulting:

(gdb) run -t 'R51 Debug'
Starting program: /home/user/Downloads/pichart-current/pichart-openmp -t 'R51 Debug'
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
pichart -- Raspberry Pi Performance OPENMP version 23

Prime Sieve          [New Thread 0xb5dbeb40 (LWP 4220)]

Thread 1 "pichart-openmp" received signal SIGSEGV, Segmentation fault.
0x0804930d in setrange (jmax=268435456, jmin=268304400, imax=<optimized out>)
    at sieve.c:57
57	        if(!getbit(notPrimebits,i)){
Having time to kill while a large laser etch job was running, I tried my somewhat old MacBook. macOS doesn't ship with GCC, but uses a compatible front-end to LLVM. It had problems:

$ time ./pichart-serial -t 'MacBook5,1 Core 2 Duo 2.4 GHz'
pichart -- Raspberry Pi Performance Serial version 23

Prime Sieve          P=14630843 Threads=2 Sec=2.039 Mops=458.23
Merge Sort           N=16777216 Threads=2 Sec=2.99945 Mops=134.243
Fourier Transform    N=4194304 Threads=2 Sec=1.62804 Mflops=283.393
Lorenz 96            Error 2.96128e-11 in parallel solver at index 0!

real	1m29.162s
user	1m27.686s
sys	0m0.502s

ejolson
Posts: 2864
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Sat Dec 15, 2018 5:29 pm

With gcc version 5.5 it is possible the cilk-parallel code would work.
scruss wrote:
Sat Dec 15, 2018 3:55 pm
Having time to kill while a large laser etch job was running, I tried my somewhat old MacBook. macOS doesn't ship with GCC, but uses a compatible front-end to LLVM. It had problems:

$ time ./pichart-serial -t 'MacBook5,1 Core 2 Duo 2.4 GHz'
pichart -- Raspberry Pi Performance Serial version 23

Prime Sieve          P=14630843 Threads=2 Sec=2.039 Mops=458.23
Merge Sort           N=16777216 Threads=2 Sec=2.99945 Mops=134.243
Fourier Transform    N=4194304 Threads=2 Sec=1.62804 Mflops=283.393
Lorenz 96            Error 2.96128e-11 in parallel solver at index 0!

real	1m29.162s
user	1m27.686s
sys	0m0.502s
I think you need to remove the fast-math option from LLVM to run the Lorenz 96 timing. It might be closer to unsafe-math on gcc.

With fast-math enabled the LLVM compiler appears to sequence the floating-point operations differently for the serial, cache-blocked and parallel versions of the code. In turn, this creates differences in rounding errors which grow to large sizes by the end of the run due to the sensitive dependence on initial conditions of the deterministic chaos represented by the Lorenz 96 dynamics.
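
The effect described above can be reproduced on a small scale. The Python sketch below integrates the Lorenz 96 equations twice from identical initial conditions, using two algebraically equivalent but differently ordered forms of the right-hand side, and reports how far apart the results end up. Several things here are assumptions made for illustration, not taken from the benchmark: the fourth-order Runge-Kutta stepper, the forcing F=8, the system size of 40, the time step, and all the function names.

```python
def rhs_a(x, F=8.0):
    """Lorenz 96 right-hand side, written in the usual factored form."""
    n = len(x)
    return [(x[(i + 1) % n] - x[i - 2]) * x[i - 1] - x[i] + F
            for i in range(n)]


def rhs_b(x, F=8.0):
    """Same formula with the product expanded and the sum regrouped,
    standing in for a compiler that reorders floating-point operations."""
    n = len(x)
    return [x[(i + 1) % n] * x[i - 1] - x[i - 2] * x[i - 1] + (F - x[i])
            for i in range(n)]


def rk4_step(rhs, x, dt):
    """One classical Runge-Kutta step (the integrator is an assumption)."""
    k1 = rhs(x)
    k2 = rhs([xi + 0.5 * dt * k for xi, k in zip(x, k1)])
    k3 = rhs([xi + 0.5 * dt * k for xi, k in zip(x, k2)])
    k4 = rhs([xi + dt * k for xi, k in zip(x, k3)])
    return [xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def max_divergence(steps=3000, n=40, dt=0.01):
    """Largest component-wise gap between the two orderings at the end."""
    x0 = [8.0] * n
    x0[0] += 0.01  # small kick away from the unstable steady state
    xa, xb = list(x0), list(x0)
    for _ in range(steps):
        xa = rk4_step(rhs_a, xa, dt)
        xb = rk4_step(rhs_b, xb, dt)
    return max(abs(a - b) for a, b in zip(xa, xb))


divergence = max_divergence()
print(divergence)
```

Both versions compute the same mathematics, so any gap comes purely from rounding, and in a chaotic regime it grows far beyond the size of a single rounding error. A consistency check with a tight tolerance would therefore be expected to trip whenever the compiler orders the operations differently in each variant.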

Hopefully removing the fast-math option allows an LLVM-compiled Lorenz 96 to pass the consistency check. If not, try different optimization levels or simply remove the check and hope for the best.

echmain
Posts: 181
Joined: Fri Mar 04, 2016 8:26 pm

Re: A Pi Pie Chart

Sat Dec 15, 2018 6:16 pm

I'm curious to see a test of the new Pi 3A+ model.

It *should* be the same as the 3B+, but....

scruss
Posts: 2224
Joined: Sat Jun 09, 2012 12:25 pm
Location: Toronto, ON
Contact: Website

Re: A Pi Pie Chart

Sat Dec 15, 2018 9:54 pm

The biggest thing I've noticed is that the code won't build on many systems unless you move CFLAGS to the end of the line:

pichart-serial: $(SOURCE) pichart.h
	$(CC) -o pichart-serial $(SOURCE) $(CFLAGS)

ejolson
Posts: 2864
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Sun Dec 16, 2018 7:54 pm

scruss wrote:
Sat Dec 15, 2018 9:54 pm
The biggest thing I've noticed is that the code won't build on many systems unless you move CFLAGS to the end of the line:

pichart-serial: $(SOURCE) pichart.h
	$(CC) -o pichart-serial $(SOURCE) $(CFLAGS)
It may be important for the optimiser flags to occur early on the command line and the linker flags at the end. I guess lumping everything into CFLAGS, while it works for gcc, may not have been such a good idea in general. Until I update the Makefile to correct this, it should be possible to sort things out by hand if necessary.

scruss
Posts: 2224
Joined: Sat Jun 09, 2012 12:25 pm
Location: Toronto, ON
Contact: Website

Re: A Pi Pie Chart

Mon Dec 17, 2018 4:18 am

The previous $(CFLAGS) position absolutely failed to compile for me on gcc-7.3: it crapped out, failing to find -lm

ejolson
Posts: 2864
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Mon Dec 17, 2018 5:35 am

scruss wrote:
Mon Dec 17, 2018 4:18 am
The previous $(CFLAGS) position absolutely failed to compile for me on gcc-7.3: it crapped out, failing to find -lm
Agreed. The -lm should go at the end. On the other hand, the -O3 along with the -march and -mtune settings should be at the beginning. I'll put up a new version soon.
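
The ordering being agreed on here can be sketched as a small Python helper that assembles the compiler command line. The function name compile_command and the split into opt_flags and link_libs are hypothetical, chosen only to illustrate the idea; the flags and source files are the ones shown in the make output earlier in the thread. As far as I know, the reason order matters is that some linker configurations resolve libraries from left to right, so a library named before the code that needs it may be skipped.

```python
def compile_command(cc, target, sources, opt_flags, link_libs):
    """Build a compiler command with optimisation flags first,
    then the output name and sources, and libraries last."""
    return [cc, *opt_flags, "-o", target, *sources, *link_libs]


cmd = compile_command(
    cc="gcc",
    target="pichart-serial",
    sources=["pichart.c", "util.c", "sieve.c",
             "merge.c", "fourier.c", "lorenz.c"],
    opt_flags=["-std=gnu99", "-O3", "-ffast-math", "-Wall"],
    link_libs=["-lm"],
)
print(" ".join(cmd))
```

In Makefile terms this corresponds to keeping the compile options and the libraries in two separate variables rather than lumping both into CFLAGS.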

scruss
Posts: 2224
Joined: Sat Jun 09, 2012 12:25 pm
Location: Toronto, ON
Contact: Website

Re: A Pi Pie Chart

Fri Jan 04, 2019 2:52 pm

Over the break I found time to get gcc-6 built on the G5 tower under OS X 10.5. make bootstrap took about three hours. The results were better, but still quite a good case for not using old computers:
[Attachment: pichart-bigmac-openmp-gcc6.png, Apple Mac G5, dual-core PowerPC G5 at 2 GHz]
Though I'm only posting the OpenMP image, here are the results for both it and serial:

pichart -- Raspberry Pi Performance Serial version 23

Prime Sieve          P=14630843 Threads=1 Sec=3.32331 Mops=281.144
Merge Sort           N=16777216 Threads=2 Sec=3.91997 Mops=102.718
Fourier Transform    N=4194304 Threads=1 Sec=5.9601 Mflops=77.4103
Lorenz 96            N=32768 K=16384 Threads=2 Sec=2.41353 Mflops=1334.65

pichart -- Raspberry Pi Performance OPENMP version 23

Prime Sieve          P=14630843 Threads=2 Sec=1.8036 Mops=518.035
Merge Sort           N=16777216 Threads=2 Sec=2.10441 Mops=191.338
Fourier Transform    N=4194304 Threads=2 Sec=3.43897 Mflops=134.161
Lorenz 96            N=32768 K=16384 Threads=2 Sec=1.459 Mflops=2207.84
So, the G5 is pretty close to twice as fast if two cores are used. Curiously, compiling for G5 (-O3 -mcpu=G5 -mtune=G5 -ffast-math) ran fractionally but not interestingly slower. Since OS X never got to be fully 64-bit, these are 32-bit results. 64-bit Linux is alive and well on little-endian PPC, but G5s aren't that.

Ran the benchmark on a BeagleBone Black, but results are too embarrassing to post here. Failed to run it on an Onion Omega2+ (580 MHz MIPS - MT7688 SoC) as the cross-compiler environment is broken. For extra hilarity, I could run it on a Via APC (the first "Raspberry Pi Killer" which, uh, didn't) but it would likely have an ancient gcc. In a way, I kind of regret getting rid of my Intel Galileo (400 MHz pentium-ish Quark SoC) because I'm sure that these benchmarks would run on it — eventually.

Incidentally, a quicker/smaller way of converting the output SVG to PNG would be using cairosvg: sudo apt install python3-cairosvg, and then

cairosvg pichart.svg -o pichart.png
You might have to run pngcrush on the output to get in under this board's sensible upload limit.

ejolson
Posts: 2864
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Sat Jan 05, 2019 8:17 am

scruss wrote:
Fri Jan 04, 2019 2:52 pm
So, the G5 is pretty close to twice as fast if two cores are used.
It's nice to see both cores working together.

The balance of performance is interesting: The G5 looks much faster than a 3B+ on the Lorenz dynamical simulation and much slower than a 3B+ on the merge sort.

I wonder if gcc is properly vectorising the ARM executable for Lorenz.

scruss
Posts: 2224
Joined: Sat Jun 09, 2012 12:25 pm
Location: Toronto, ON
Contact: Website

Re: A Pi Pie Chart

Sun Jan 06, 2019 6:11 pm

Or there's the possibility that merge sort is hitting the G5's slower RAM (it's only PC2-4200 DDR2), while Lorenz is able to fit everything in cache and take advantage of the PowerPC 970MP's multiple (and fairly powerful, for its age) floating point units.

I genuinely have very little idea how these things affect overall performance, though.