Re: A Pi Pie Chart
The charts came out nicely; it's what I used to compare the speeds.
It did throttle as I could see the CPU usage (same as in Raspbian desktop) drop.
Also the warnings in dmesg help make sure. 90C iirc before it warns.
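For anyone who wants to watch the temperature while the benchmark runs, here is a minimal sketch. It assumes the sysfs file used on Raspbian, which reports millidegrees Celsius; the path and the helper names are illustrative and may not apply to other systems.

```python
from pathlib import Path

# Assumed sysfs location of the SoC temperature on Raspbian; other systems may differ.
THERMAL_PATH = Path("/sys/class/thermal/thermal_zone0/temp")


def millidegrees_to_celsius(raw: str) -> float:
    """Convert the sysfs reading (millidegrees Celsius as text) to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_cpu_temp(path: Path = THERMAL_PATH):
    """Return the current temperature in Celsius, or None if it cannot be read."""
    try:
        return millidegrees_to_celsius(path.read_text())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    temp = read_cpu_temp()
    print("temperature unavailable" if temp is None else f"{temp:.1f} C")
```

Polling this every few seconds in a second terminal should show whether a drop in CPU usage lines up with a hot SoC.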
That single sieve is interesting; it looks to me like they all have the same IPC.
The clocks are 1.4/1.2/0.9/1.9/0.7 GHz as we go down and the scale seems similar, so I guess they could be normalised.
The Pi2 (original) and the B+ can both go at 1GHz (as mentioned before). So that would level them out.
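As a rough sketch of what normalising by clock speed could look like, the snippet below divides a score by the clock in GHz. The Pi figures being discussed are not reproduced in this post, so it uses the serial Prime Sieve results and clock speeds reported for three Pentium 4 systems elsewhere in this thread; the function name is just for illustration.

```python
def per_ghz(score: float, clock_ghz: float) -> float:
    """Score per GHz of clock, a crude stand-in for work done per cycle."""
    return score / clock_ghz


# Serial Prime Sieve results (Mops) and clock speeds (GHz) quoted in this thread.
results = {
    "P4 1500MHz": (196.775, 1.5),
    "P4D 2.8GHz": (328.998, 2.8),
    "P4 3.4GHz": (366.15, 3.4),
}

normalised = {
    name: round(per_ghz(mops, ghz), 1) for name, (mops, ghz) in results.items()
}

if __name__ == "__main__":
    for name, value in normalised.items():
        print(f"{name}: {value} Mops per GHz")
```

If the per-GHz numbers come out close to each other, the chips are doing similar work per cycle on that test and the clock alone explains most of the gap.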
I'll see what else I can run it on, got an i5-4460, T4500 in desktop/AIO, and others in laptops.
The snappiness of the G70 could well be down to its 3GB of RAM, almost certainly, and it is just generally better in Chrome (which should be multicore capable).
I ran it off the same SD card for a while, a SanDisk A1 (over USB, as it wouldn't boot from the SD slot), and it was still generally much faster in 'user perceived' use.
Another place you see it is Thonny (python IDE), it's quite clunky to use on a Pi3, but not on the laptop, same version too.
I would say the Pi3 does well for $35, but you could probably pick the laptops up for that too.
Re: A Pi Pie Chart
The parallel benchmark tries different numbers of software threads in sequence, starting with twice the number of hardware threads and ending with the serial version of the code. Tests for each threading configuration are run a minimum of three times or for 5 seconds, whichever takes longer, and the best timing is kept as the final result. Therefore, one expects intervals where fewer cores are busy when the parallel code is running. This allows automatic tuning for hyperthreading and cases where there are more or fewer floating point units than integer units per core. It also allows the system to cool off a bit before performing the next benchmark.
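The timing policy described above can be sketched roughly as follows. This is not the benchmark's own code: the function names are made up, and halving the thread count at each step is an assumption, since the post only says the sequence starts at twice the hardware threads and ends with the serial version.

```python
import time


def best_time(func, min_runs=3, min_seconds=5.0):
    """Repeat func for at least min_runs runs and min_seconds; keep the fastest run."""
    best = float("inf")
    runs = 0
    start = time.perf_counter()
    while runs < min_runs or time.perf_counter() - start < min_seconds:
        t0 = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - t0)
        runs += 1
    return best, runs


def tune_threads(make_task, hardware_threads, **kwargs):
    """Time make_task(n) for n = 2*hardware_threads, halving down to 1 (assumed schedule)."""
    timings = {}
    threads = 2 * hardware_threads
    while threads >= 1:
        timings[threads], _ = best_time(make_task(threads), **kwargs)
        threads //= 2
    best_threads = min(timings, key=timings.get)
    return best_threads, timings
```

Here `make_task(n)` is expected to return a callable that runs the workload on `n` threads. Keeping the minimum rather than the mean filters out runs disturbed by other processes or by throttling.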
Re: A Pi Pie Chart
What kind of run time should I expect for this suite? I am running it now, and would like a rough idea of whether something has gone wrong, based on it taking way too long.
I am running it on a Raspberry Pi B+ at 900MHz on RISC OS. I know that you do not have any RISC OS results, but how long would be expected for a RPi B+ at 900MHz?
RPi = The best ARM based RISC OS computer around
More than 95% of posts made from RISC OS on RPi 1B/1B+ computers. Most of the rest from RISC OS on RPi 2B/3B/3B+ computers
Re: A Pi Pie Chart
Has it finished yet? Run time is not very long, a minute-ish.
I was reading a webpage while I ran them, and by the time I finished it was done.
Re: A Pi Pie Chart
The pichart-serial benchmark runs in about 22 minutes on the original Pi B+ because it takes the minimum time (maximum speed) of eight measurements in each of the four categories and spends from 20 seconds to a minute on each measurement.
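As a sanity check on that figure, the bounds implied by the description can be worked out directly. This is only back-of-envelope arithmetic using the numbers in the post (eight measurements, four categories, 20 to 60 seconds each) and ignores any start-up or chart-drawing time.

```python
MEASUREMENTS_PER_CATEGORY = 8
CATEGORIES = 4


def runtime_bounds_minutes(shortest_s=20.0, longest_s=60.0):
    """Lower and upper bounds on total run time in minutes."""
    measurements = MEASUREMENTS_PER_CATEGORY * CATEGORIES
    return measurements * shortest_s / 60.0, measurements * longest_s / 60.0


if __name__ == "__main__":
    low, high = runtime_bounds_minutes()
    print(f"expected run time between {low:.1f} and {high:.1f} minutes")
```

The observed 22 minutes on the B+ sits inside that range, so a run that is still going after well over half an hour on similar hardware is probably stuck.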
Re: A Pi Pie Chart
This post describes how to compile and run the pichart benchmark to compare your desktop computer with the Raspberry Pi. If your computer is running Microsoft Windows, Plan 9, OpenVMS or any other generally non-POSIX standards compliant operating system, it would be easiest to first boot into the Raspberry Pi Desktop for PC before proceeding. If you are running Windows 10 it would also be possible to install Windows Subsystem for Linux.
After logging in and obtaining a command-line shell, download the pichart source archive and unpack it with the commands
$ wget http://fractal.math.unr.edu/~ejolson/pi ... urrent.tgz
$ tar zxf pichart-current.tgz
Alternatively, use a web browser to download this file. If your tar command doesn't natively support gzip compressed files, unpack the archive using
$ gunzip <pichart-current.tgz | tar xf -
After unpacking the archive the source code will be contained in a subdirectory called pichart-current. Change to that directory and open the Makefile in a text editor. There are three lines that you may wish to change: CFLAGS, CC and TARGETS. By default these lines are set to build pichart-openmp and pichart-serial using the system C compiler with generic optimizations. Best performance, however, may be achieved by selecting architecture specific tuning options and a custom C compiler. If the chosen C compiler supports the Intel/MIT Cilk parallel programming extensions you may further want to include pichart-cilk in the list of build targets.
For this example we will be using a custom C compiler installed in /usr/local/gcc-6.5 and specify -march=native -mtune=native to build architecture optimized executables. As this compiler supports the Cilk parallel processing extensions, we also build pichart-cilk to compare with the OpenMP version. To do this, change the Makefile so the first lines read as
CFLAGS=-O3 -mtune=native -march=native -ffast-math -Wall -lm -lrt
CC=/usr/local/gcc-6.5/bin/gcc
TARGETS=pichart-serial pichart-openmp pichart-cilk
Note, if you are using a version of the GNU C compiler earlier than 5.x or later than 7.x, then it does not support Cilk and you should not add pichart-cilk to the TARGETS line. Note also that I've included the option -lrt because the clock_gettime function used by the timing routines in the benchmark is contained in the librt library of the particular version of Linux being used. Linking with -lrt is not usually needed.
Compile the source using the command
$ make
If all goes well, there should now be three executables in the directory: pichart-cilk, pichart-openmp and pichart-serial. If your computer only has a single processor core, then run pichart-serial to create the pie chart; otherwise, use either pichart-openmp or pichart-cilk to make the comparison.
The computer being tested in this example is a dual-processor Intel Pentium III server running at 650 MHz. This machine has two cores so I'll run both pichart-cilk and pichart-openmp to see which one gives the best result. The programs accept an option -t which can be used to give a descriptive label to the benchmark run as it appears in the pie chart. Otherwise, the default label "My Computer" is used. Between each run I'll copy the output file pichart.svg to a safe place so it doesn't get overwritten by the next run.
Code: Select all
$ ./pichart-cilk -t "dual PIII 650MHz"
pichart -- Raspberry Pi Performance CILKPLUS version 23
Prime Sieve P=14630843 Threads=2 Sec=4.76001 Mops=196.287
Merge Sort N=16777216 Threads=2 Sec=10.6594 Mops=37.7746
Fourier Transform N=4194304 Threads=2 Sec=9.29035 Mflops=49.6616
Lorenz 96 N=32768 K=16384 Threads=2 Sec=28.9737 Mflops=111.178
Making pie charts...done.
$ cp pichart.svg p3cilk.svg
$ ./pichart-openmp -t "dual PIII 650MHz"
pichart -- Raspberry Pi Performance OPENMP version 23
Prime Sieve P=14630843 Threads=2 Sec=4.70759 Mops=198.473
Merge Sort N=16777216 Threads=4 Sec=10.9022 Mops=36.9333
Fourier Transform N=4194304 Threads=2 Sec=9.49918 Mflops=48.5698
Lorenz 96 N=32768 K=16384 Threads=2 Sec=27.9837 Mflops=115.111
Making pie charts...done.
$ cp pichart.svg p3openmp.svg
The merge sort and Fourier transforms are faster using Cilk while prime sieve and Lorenz 96 are faster using OpenMP. The differences are small enough, however, that the pie charts generated in either case are visually identical.
The resulting scalable vector graphics pie charts can be viewed using geeqie or by loading them into gimp. Gimp can also be used to convert the SVG format to, for example, PNG. For reference, here is a PNG image file corresponding to the Cilk pie chart that the above run produced:

Update: The computer tested above was actually a dual-processor Pentium III running at 650MHz. The labels on the pie chart have been fixed using Gimp.
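If you want to compare two runs without reading the numbers by eye, something along these lines might help. It is a sketch that assumes the console output keeps the "Name key=value ..." layout shown in this post; the parser and its names are not part of pichart.

```python
CILK = """\
Prime Sieve P=14630843 Threads=2 Sec=4.76001 Mops=196.287
Merge Sort N=16777216 Threads=2 Sec=10.6594 Mops=37.7746
Fourier Transform N=4194304 Threads=2 Sec=9.29035 Mflops=49.6616
Lorenz 96 N=32768 K=16384 Threads=2 Sec=28.9737 Mflops=111.178
"""

OPENMP = """\
Prime Sieve P=14630843 Threads=2 Sec=4.70759 Mops=198.473
Merge Sort N=16777216 Threads=4 Sec=10.9022 Mops=36.9333
Fourier Transform N=4194304 Threads=2 Sec=9.49918 Mflops=48.5698
Lorenz 96 N=32768 K=16384 Threads=2 Sec=27.9837 Mflops=115.111
"""


def parse_results(text):
    """Map each test name to (seconds, rate) from pichart console output."""
    results = {}
    for line in text.splitlines():
        tokens = line.split()
        fields = dict(t.split("=", 1) for t in tokens if "=" in t)
        if "Sec" not in fields:
            continue  # header, shell prompt or status line
        name = " ".join(t for t in tokens if "=" not in t)
        rate_key = "Mops" if "Mops" in fields else "Mflops"
        results[name] = (float(fields["Sec"]), float(fields[rate_key]))
    return results


def faster_version(a, b, label_a="cilk", label_b="openmp"):
    """For each test, report which run achieved the higher rate."""
    return {name: label_a if a[name][1] > b[name][1] else label_b for name in a}


winners = faster_version(parse_results(CILK), parse_results(OPENMP))

if __name__ == "__main__":
    for name, label in winners.items():
        print(f"{name}: {label}")
```

Lines without a `Sec=` field, such as the version banner or an error message, are simply skipped.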
Last edited by ejolson on Mon Dec 10, 2018 5:57 am, edited 5 times in total.
Re: A Pi Pie Chart
Here is another Pi pie chart, this time comparing the speed of a single-core Pentium 4 processor running at 1500MHz. This system is interesting due to the use of the somewhat controversial and expensive RAMBUS memory of the time. During testing it was discovered that the system compiler gcc 4.7.2 with flags -march=pentium4 -O3 -ffast-math led to incorrect results with the Lorenz 96 benchmark. Using the more recent gcc version 6.5.0 solved the problem.
The output from the run, which produced the resulting pie chart, was
Code: Select all
$ ./pichart-serial -t "P4 1500MHz"
pichart -- Raspberry Pi Performance Serial version 23
Prime Sieve P=14630843 Threads=1 Sec=4.74821 Mops=196.775
Merge Sort N=16777216 Threads=1 Sec=6.5265 Mops=61.6952
Fourier Transform N=4194304 Threads=2 Sec=5.90099 Mflops=78.1857
Lorenz 96 N=32768 K=16384 Threads=2 Sec=6.05222 Mflops=532.238
Making pie charts...done.

Re: A Pi Pie Chart
Here is a Pi pie chart comparing the speed of a Pentium 4 D 2.8GHz processor. These dual-core processors were used in many desktop computers at the peak of Pentium 4 popularity. The timings
Code: Select all
$ ./pichart-openmp -t "P4D 2.8GHz"
pichart -- Raspberry Pi Performance OPENMP version 23
Prime Sieve P=14630843 Threads=2 Sec=1.4254 Mops=655.484
Merge Sort N=16777216 Threads=4 Sec=2.09171 Mops=192.499
Fourier Transform N=4194304 Threads=2 Sec=2.05578 Mflops=224.427
Lorenz 96 N=32768 K=16384 Threads=2 Sec=0.993851 Mflops=3241.16
Making pie charts...done.
led to the pie chart

which is notable because of the low performance on Merge Sort and the exceptional performance on Lorenz 96. To confirm proper scaling to both cores the serial code was also run to obtain
Code: Select all
$ ./pichart-serial -t "P4D 2.8Ghz"
pichart -- Raspberry Pi Performance Serial version 23
Prime Sieve P=14630843 Threads=1 Sec=2.83992 Mops=328.998
Merge Sort N=16777216 Threads=1 Sec=4.14171 Mops=97.2192
Fourier Transform N=4194304 Threads=2 Sec=2.88016 Mflops=160.19
Lorenz 96 N=32768 K=16384 Threads=1 Sec=1.80798 Mflops=1781.67
Making pie charts...done.
This shows near linear scaling for every metric except Fourier Transform where dual-core performance was likely constrained by memory bandwidth.
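To put numbers on the scaling, the dual-core speedup is just the OpenMP rate divided by the serial rate for each test. The rates below are copied from the two Pentium 4 D runs in this post; the helper name is only for illustration.

```python
# Rates (Mops or Mflops) from the serial and OpenMP runs on the Pentium 4 D.
serial = {
    "Prime Sieve": 328.998,
    "Merge Sort": 97.2192,
    "Fourier Transform": 160.19,
    "Lorenz 96": 1781.67,
}
openmp = {
    "Prime Sieve": 655.484,
    "Merge Sort": 192.499,
    "Fourier Transform": 224.427,
    "Lorenz 96": 3241.16,
}


def speedups(parallel, baseline):
    """Ratio of parallel to serial rate for each test; 2.0 is ideal on two cores."""
    return {name: parallel[name] / baseline[name] for name in baseline}


ratios = {name: round(value, 2) for name, value in speedups(openmp, serial).items()}

if __name__ == "__main__":
    for name, value in ratios.items():
        print(f"{name}: {value}x")
```

Prime Sieve and Merge Sort come out at nearly 2x, Lorenz 96 a little lower, and Fourier Transform at about 1.4x.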
Re: A Pi Pie Chart
This is an odd one. I'm sure I could make it run faster, but this is all I could do with stock development tools.
This is from an Apple Power Mac G5 tower with a dual-core 2.0 GHz PowerPC G5 (970MP). In 2005 it was quite a nifty machine. Now, not so much. It has four large fans running push-pull into a toaster-sized CPU heatsink. It has a 450 W power supply fed from a high current (C19) connector. It is heavier than some Code Club graduates.
I'm sure there were some cleverer options I could have used for compilation, but gcc-4.2 doesn't know about -march options. The compiler complained about some OpenMP pragmas. And yes, I didn't know about the '-t' option to edit the machine name …
I have a fractionally faster 2.0 GHz dual-processor (970FX) G5, but it's running an older OS that doesn't have gcc-4.2. I'm not sure if its MP support is as good. It weighs as much as several Code Club graduates …
Code: Select all
./pichart-openmp
pichart -- Raspberry Pi Performance OPENMP version 23
Prime Sieve P=14630843 Threads=2 Sec=3.83142 Mops=243.86
Merge Sort N=16777216 Threads=4 Sec=4.34331 Mops=92.7064
Fourier Transform N=4194304 Threads=1 Sec=6.78003 Mflops=68.0488
Lorenz 96 N=32768 K=16384 Threads=1 Sec=2.22188 Mflops=1449.78
Making pie charts...done.
bigmac:pichart-current scruss$ uname -a
Darwin bigmac.local 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul 15 16:57:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_PPC Power Macintosh
bigmac:pichart-current scruss$ gcc-4.2 --version
powerpc-apple-darwin9-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5577)
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
‘Remember the Golden Rule of Selling: “Do not resort to violence.”’ — McGlashan.
Pronouns: he/him
Re: A Pi Pie Chart
The gcc implementation of OpenMP has improved dramatically in recent years to include efficient support for dynamic parallelism. Since all four performance tests were originally written as Cilk parallel programs, they rely quite significantly on that feature. Maybe it's time to install gcc 6.3 or better.
Have you compared pichart-openmp with pichart-serial to check whether you are getting roughly double the performance when running on two cores?
Last edited by ejolson on Sat Dec 08, 2018 4:25 am, edited 1 time in total.
Re: A Pi Pie Chart
Maybe; as I said, I just installed the latest packaged Xcode compiler bundles that work with 10.5 PPC. Most Linux distros have dropped support for PPC, so I'd have to build it from source, which barely seems worth it.
Last edited by scruss on Sat Dec 08, 2018 11:36 am, edited 1 time in total.
Re: A Pi Pie Chart
Those PowerPC Macintosh computers--especially the big silver coloured towers--were quite impressive. I think what you have measured so far is the single core performance. Unfortunately, the only Macintosh computers I have for testing are modern iMacs without toaster-sized heatsinks that seem engineered to toast their hard disks instead.
It looks like Debian no longer boots on the G5, but maybe NetBSD would work. That processor architecture definitely presents a different balance of performance characteristics than the Intel and ARM chips. It would also be interesting to see how the DEC Alpha and Sun SPARC compare.
Re: A Pi Pie Chart
Here is a Pi pie chart comparing the speed of a Pentium 4 HT 3.4GHz processor with hyper-threading technology turned on in EM64T mode.

Although there is only one CPU core, hyper threading yielded a 5 to 50 percent performance increase in the benchmarks. Even though Merge Sort enjoyed the greatest performance boost from hyper threading, it is also the metric in which the Pentium 4 architecture lagged farthest behind the Raspberry Pi. The exact performance numbers follow.
Code: Select all
$ ./pichart-serial -t "P4 3.4GHz"
pichart -- Raspberry Pi Performance Serial version 23
Prime Sieve P=14630843 Threads=2 Sec=2.55177 Mops=366.15
Merge Sort N=16777216 Threads=2 Sec=3.43348 Mops=117.273
Fourier Transform N=4194304 Threads=1 Sec=2.60827 Mflops=176.888
Lorenz 96 N=32768 K=16384 Threads=1 Sec=1.24317 Mflops=2591.14
Making pie charts...done.
$ ./pichart-openmp -t "P4 3.4GHz"
pichart -- Raspberry Pi Performance OPENMP version 23
Prime Sieve P=14630843 Threads=4 Sec=2.18189 Mops=428.219
Merge Sort N=16777216 Threads=4 Sec=2.18205 Mops=184.53
Fourier Transform N=4194304 Threads=2 Sec=2.12401 Mflops=217.218
Lorenz 96 N=32768 K=16384 Threads=2 Sec=1.18418 Mflops=2720.23
Making pie charts...done.
Tests were performed using gcc version 8.1 running under version 5.4 of DragonflyBSD.
Re: A Pi Pie Chart
A Pi pie chart for the ASUS Tinker Board which uses a quad-core ARM Cortex A17 CPU running at 1.8 GHz was independently posted in a different thread. I have regenerated the chart and posted the results here so they can be more easily compared.

While the Tinker Board runs faster than the Raspberry Pi, it should be noted that the Cortex A17 is a 32-bit only processor while the A53 used in the 3B+ is a 64-bit processor currently running in 32-bit compatibility mode. In particular, it is possible to enjoy a variety of different 64-bit operating systems on the Raspberry Pi 3B+ that do not work on the Tinker Board. While I don't expect the mode of operation to make significant performance differences for these metrics, there are some applications that run significantly faster in 64-bit mode. Still, it would be interesting to check how much of a difference the mode of operation makes to the performance of a Pi 3B+ running the pie chart benchmarks.

Code: Select all
$ tar zxf pichart-current.tgz
$ cd pichart-current
$ make
gcc -std=gnu99 -O3 -ffast-math -Wall -lm -o pichart-serial pichart.c util.c sieve.c merge.c fourier.c lorenz.c
gcc -std=gnu99 -O3 -ffast-math -Wall -lm -fopenmp -o pichart-openmp pichart.c util.c sieve.c merge.c fourier.c lorenz.c
$ ./pichart-openmp
pichart -- Raspberry Pi Performance OPENMP version 23
Prime Sieve P=14630843 Threads=4 Sec=0.805925 Mops=1159.32
Merge Sort N=16777216 Threads=8 Sec=0.766017 Mops=525.645
Fourier Transform N=4194304 Threads=4 Sec=1.24434 Mflops=370.779
Lorenz 96 N=32768 K=16384 Threads=4 Sec=0.764573 Mflops=4213.1
Making pie charts...done.
Re: A Pi Pie Chart
Just out of curiosity, I wonder if anyone could test this on a FireBee (Atari Coldfire Project) computer and on an AmigaOne (800MHz PowerPC 75xxx) to see how they compare with the RPi systems. It would be interesting to see how the high end of the other good and usable desktop computers compares with the RPi.
Re: A Pi Pie Chart
I have a friend with an AmigaOne; I'll see what I can do. I'd be surprised if it was even half as fast as the Mac G5, though. I did consider running the program on my NAS which has a Qoriq dual-core Power CPU, but I suspect it doesn't have much memory or an up-to-date GCC.
In other tests, I rescued a ThinkPad R51 (1.6 GHz Pentium-M) from under a pile of woodshavings at the makerspace. It hadn't been turned on for a couple of years, so its BIOS had forgotten everything. Even with an update, its newest GCC is 5.5. Can only run the serial test:

So a fairly good case for replacing old computers with Raspberry Pis for energy efficiency. Unlike a Raspberry Pi, it does have a real parallel port which the CNC folks seem to love. It had real problems with the OpenMP code, segfaulting:
Code: Select all
(gdb) run -t 'R51 Debug'
Starting program: /home/user/Downloads/pichart-current/pichart-openmp -t 'R51 Debug'
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
pichart -- Raspberry Pi Performance OPENMP version 23
Prime Sieve [New Thread 0xb5dbeb40 (LWP 4220)]
Thread 1 "pichart-openmp" received signal SIGSEGV, Segmentation fault.
0x0804930d in setrange (jmax=268435456, jmin=268304400, imax=<optimized out>)
at sieve.c:57
57 if(!getbit(notPrimebits,i)){
Having time to kill while a large laser etch job was running, I tried my somewhat old MacBook. macOS doesn't ship with GCC, but uses a compatible front-end to LLVM. It had problems:
Code: Select all
$ time ./pichart-serial -t 'MacBook5,1 Core 2 Duo 2.4 GHz'
pichart -- Raspberry Pi Performance Serial version 23
Prime Sieve P=14630843 Threads=2 Sec=2.039 Mops=458.23
Merge Sort N=16777216 Threads=2 Sec=2.99945 Mops=134.243
Fourier Transform N=4194304 Threads=2 Sec=1.62804 Mflops=283.393
Lorenz 96 Error 2.96128e-11 in parallel solver at index 0!
real 1m29.162s
user 1m27.686s
sys 0m0.502s
Re: A Pi Pie Chart
With gcc version 5.5 it is possible the cilk-parallel code would work.
I think you need to remove the fast-math option from LLVM to run the Lorenz 96 timing. It might be closer to unsafe-math on gcc.
With fast-math enabled the LLVM compiler appears to sequence the floating-point operations differently for the serial, cache-blocked and parallel versions of the code. In turn, this creates differences in rounding errors which grow to large sizes by the end of the run due to the sensitive dependence on initial conditions of the deterministic chaos represented by the Lorenz 96 dynamics.
Hopefully removing the fast-math option allows a LLVM-compiled Lorenz 96 to pass the consistency check. If not, try different optimization levels or simply remove the check and hope for the best.
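The effect can be reproduced on a small scale. The sketch below integrates the standard Lorenz 96 equations twice from the same starting point, using two right-hand sides that are algebraically identical but group the floating-point operations differently. The system size, forcing, time step and the use of a fourth-order Runge-Kutta step are all assumptions chosen for illustration; they are not taken from the benchmark source.

```python
N, F, DT = 40, 8.0, 0.02  # illustrative values, far smaller than the benchmark's N=32768


def rhs_a(x):
    """Lorenz 96 tendency written as (x[i+1] - x[i-2]) * x[i-1] - x[i] + F."""
    n = len(x)
    return [(x[(i + 1) % n] - x[i - 2]) * x[i - 1] - x[i] + F for i in range(n)]


def rhs_b(x):
    """Same formula with the product expanded and the terms regrouped."""
    n = len(x)
    return [x[(i + 1) % n] * x[i - 1] - x[i - 2] * x[i - 1] + (F - x[i])
            for i in range(n)]


def rk4_step(rhs, x, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = rhs(x)
    k2 = rhs([a + 0.5 * dt * b for a, b in zip(x, k1)])
    k3 = rhs([a + 0.5 * dt * b for a, b in zip(x, k2)])
    k4 = rhs([a + dt * b for a, b in zip(x, k3)])
    return [a + dt / 6.0 * (p + 2 * q + 2 * r + s)
            for a, p, q, r, s in zip(x, k1, k2, k3, k4)]


def divergence(steps=2500):
    """Largest componentwise gap between the two integrations after each step."""
    xa = [F] * N
    xa[0] += 0.01  # small kick away from the steady state
    xb = list(xa)
    gaps = []
    for _ in range(steps):
        xa = rk4_step(rhs_a, xa, DT)
        xb = rk4_step(rhs_b, xb, DT)
        gaps.append(max(abs(a - b) for a, b in zip(xa, xb)))
    return gaps


if __name__ == "__main__":
    gaps = divergence()
    print("gap after first step:", gaps[0])
    print("largest gap near the end:", max(gaps[-200:]))
```

The two runs start out agreeing to rounding error and end up on unrelated trajectories, which is why a consistency check with a tight tolerance can fail when the compiler is free to reorder the arithmetic.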
Re: A Pi Pie Chart
I'm curious to see a test of the new Pi 3A+ model.
It *should* be the same as the 3B+, but....
Re: A Pi Pie Chart
The biggest thing I've noticed is that the code won't build on many systems unless you move CFLAGS to the end of the line:
Code: Select all
pichart-serial: $(SOURCE) pichart.h
$(CC) -o pichart-serial $(SOURCE) $(CFLAGS)
Re: A Pi Pie Chart
It may be important for the optimiser flags to occur early on the command line and the linker flags at the end. I guess lumping everything into CFLAGS, while it works for gcc, may not have been such a good idea in general. Until I update the Makefile to correct this, it should be possible to sort things out by hand if necessary.
Re: A Pi Pie Chart
The previous $(CFLAGS) position absolutely failed to compile for me on gcc-7.3: it crapped out, failing to find -lm
Re: A Pi Pie Chart
Agreed. The -lm should go at the end. On the other hand, the -O3 along with the -march and -mtune settings should be at the beginning. I'll put up a new version soon.
Re: A Pi Pie Chart
Over the break I found time to get gcc-6 built on the G5 tower under OS X 10.5. make bootstrap took about three hours. The results were better, but still quite a good case for not using old computers:

Though I'm only posting the OpenMP image, here are the results for both it and serial:
Code: Select all
pichart -- Raspberry Pi Performance Serial version 23
Prime Sieve P=14630843 Threads=1 Sec=3.32331 Mops=281.144
Merge Sort N=16777216 Threads=2 Sec=3.91997 Mops=102.718
Fourier Transform N=4194304 Threads=1 Sec=5.9601 Mflops=77.4103
Lorenz 96 N=32768 K=16384 Threads=2 Sec=2.41353 Mflops=1334.65
pichart -- Raspberry Pi Performance OPENMP version 23
Prime Sieve P=14630843 Threads=2 Sec=1.8036 Mops=518.035
Merge Sort N=16777216 Threads=2 Sec=2.10441 Mops=191.338
Fourier Transform N=4194304 Threads=2 Sec=3.43897 Mflops=134.161
Lorenz 96 N=32768 K=16384 Threads=2 Sec=1.459 Mflops=2207.84
So, the G5 is pretty close to twice as fast if two cores are used. Curiously, compiling for G5 (-O3 -mcpu=G5 -mtune=G5 -ffast-math) ran fractionally but not interestingly slower. Since OS X never got to be fully 64-bit, these are 32-bit results. 64-bit Linux is alive and well on little-endian PPC, but G5s aren't that.
Ran the benchmark on a BeagleBone Black, but results are too embarrassing to post here. Failed to run it on an Onion Omega2+ (580 MHz MIPS - MT7688 SoC) as the cross-compiler environment is broken. For extra hilarity, I could run it on a Via APC (the first "Raspberry Pi Killer" which, uh, didn't) but it would likely have an ancient gcc. In a way, I kind of regret getting rid of my Intel Galileo (400 MHz pentium-ish Quark SoC) because I'm sure that these benchmarks would run on it, eventually.
Incidentally, a quicker/smaller way of converting the output SVG to PNG would be using cairosvg: sudo apt install python3-cairosvg, and then
Code: Select all
cairosvg pichart.svg -o pichart.png
You might have to pngcrunch the output to get in under this board's sensible upload limit.
Re: A Pi Pie Chart
It's nice to see both cores working together.
The balance of performance is interesting: The G5 looks much faster than a 3B+ on the Lorenz dynamical simulation and much slower than a 3B+ on the merge sort.
I wonder if gcc is properly vectorising the ARM executable for Lorenz.
Re: A Pi Pie Chart
Or there's the possibility that merge sort is hitting the G5's slower RAM (it's only PC2-4200 DDR2), while Lorenz is able to fit everything in cache and take advantage of the PowerPC 970MP's multiple (and fairly powerful, for its age) floating point units.
I genuinely have very little idea how these things affect overall performance, though.