jahboater
Posts: 4595
Joined: Wed Feb 04, 2015 6:38 pm

Re: A Pi Pie Chart

Fri May 17, 2019 8:27 am

ejolson wrote:
Wed May 15, 2019 12:00 am
jahboater wrote:
Tue May 14, 2019 8:04 pm
Here are the six runs with increasing CPU counts (count: Pi Ratio)

Code: Select all

1:  3.06463
2:  4.50112
3:  17.1603
4:  25.8384
5:  31.6286
6:  41.9504
That output seems to confirm that 0 and 1 are the little cores. It also seems to indicate some sort of throttling. For example, the performance when switching from one to two cores should about double, whereas you have only a 1.47 factor increase. I'd also expect a single little core to have a pi ratio closer to 5 not 3. That's a sign of something not being right. Maybe a shim or some thermal paste would improve things.

Alternatively, maybe the scheduler is acting weird because processor affinity is set to the little cores which then become compute bound. Have you checked if changing the performance setting of the Linux scheduler makes a difference?
I believe the heat sink is operating correctly. With all six cores maxed out for several hours, I saw no reduction in CPU frequency. Placing the board on edge so the fins were vertically aligned reduced the temp by several degrees.
HK do claim that the 12nm N2 "does not throttle". The C2 heatsink did have decent thermal paste, so I am presuming the N2 does too, otherwise it's pointless them fitting this expensive heatsink!

I'll investigate the Linux scheduling ....

ejolson
Posts: 3260
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Fri May 17, 2019 2:56 pm

bensimmo wrote:
Fri May 17, 2019 8:10 am
The T5800 (and I can do others) is mine, *may have throttled*.
I'll get some others up at some point.
I'll see if the 1GHz PentiumIII-m works
When I look at the chart above, the T5800 does yield relatively low performance on the Lorenz 96 simulation, which is uncharacteristic of Intel architecture chips of that vintage. It is possible to run only the Lorenz 96 test by specifying the -r8 option. That would allow you to get a Lorenz timing before the other tests overheat your notebook computer.

Alternatively, you could point a hairdryer set to cold (heating elements turned off) into a suitable vent on the notebook and probably force enough air through it that the CPU doesn't throttle during the benchmark run.
Last edited by ejolson on Tue May 21, 2019 3:32 am, edited 3 times in total.

ejolson
Posts: 3260
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Fri May 17, 2019 6:10 pm

jahboater wrote:
Fri May 17, 2019 8:27 am
I believe the heat sink is operating correctly. With all six cores maxed out for several hours, I saw no reduction in CPU frequency. Placing the board on edge so the fins were vertically aligned reduced the temp by several degrees.
HK do claim that the 12nm N2 "does not throttle". The C2 heatsink did have decent thermal paste, so I am presuming the N2 does too, otherwise it's pointless them fitting this expensive heatsink!

I'll investigate the Linux scheduling ....
Given that four A57 cores yield a Pi ratio of 38.5 as seen here, a final Pi ratio of 42 seems reasonable for using all four A73 cores along with the two A53 cores.

I find it strange that two A53 cores in the N2 are not approximately double the performance of one, because on the Raspberry Pi they are. In particular, for the 3B+ one gets

Code: Select all

1: 3.64594
2: 7.02809
3: 9.72154
4: 12.2765
which shows a near exact factor of two scaling between one and two cores. Moreover, how could two Cortex-A53 cores running at 1.4GHz in the Pi 3B+ outperform two of the same kind of core in the N2 that are supposedly clocked in excess of 1.8 GHz?
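The scaling factors quoted in this discussion are easy to check by hand. Here is a minimal sketch, assuming awk is installed; the helper name speedup is made up for illustration and is not part of pichart.

```shell
#!/bin/sh
# speedup BASE NEW: print NEW/BASE to two decimal places.
# Hypothetical helper for illustration; not part of pichart.
speedup() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", new / base }'
}

# Pi 3B+: Pi ratio on one core vs two cores
speedup 3.64594 7.02809    # prints 1.93

# N2 little cores: Pi ratio on one core vs two cores (first set of N2 runs)
speedup 3.06463 4.50112    # prints 1.47
```

A value close to 2 means the second core is contributing nearly as much as the first; the N2 figure of 1.47 is the shortfall being discussed.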

For reference, the output from the Raspberry Pi 3B+ is

Code: Select all

$ taskset -c 0 ./pichart-openmp 
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=1 Sec=6.49964 Mops=143.751
Merge Sort           N=16777216 Workers=2 Sec=3.77389 Mops=106.695
Fourier Transform    N=4194304 Workers=1 Sec=6.63308 Mflops=69.5564
Lorenz 96            N=32768 K=16384 Workers=1 Sec=12.0904 Mflops=266.428

My Computer has Raspberry Pi ratio=3.64594
Making pie charts...done.
$ taskset -c 0,1 ./pichart-openmp 
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=2 Sec=3.23417 Mops=288.892
Merge Sort           N=16777216 Workers=4 Sec=1.91323 Mops=210.457
Fourier Transform    N=4194304 Workers=2 Sec=3.72711 Mflops=123.788
Lorenz 96            N=32768 K=16384 Workers=2 Sec=6.17764 Mflops=521.433

My Computer has Raspberry Pi ratio=7.02809
Making pie charts...done.
$ taskset -c 0,1,2 ./pichart-openmp 
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=3 Sec=2.1494 Mops=434.693
Merge Sort           N=16777216 Workers=6 Sec=1.35625 Mops=296.888
Fourier Transform    N=4194304 Workers=6 Sec=3.10515 Mflops=148.583
Lorenz 96            N=32768 K=16384 Workers=3 Sec=4.29928 Mflops=749.248

My Computer has Raspberry Pi ratio=9.72154
Making pie charts...done.
$ taskset -c 0,1,2,3 ./pichart-openmp 
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=4 Sec=1.61066 Mops=580.089
Merge Sort           N=16777216 Workers=8 Sec=1.11224 Mops=362.019
Fourier Transform    N=4194304 Workers=4 Sec=2.61202 Mflops=176.635
Lorenz 96            N=32768 K=16384 Workers=4 Sec=3.27035 Mflops=984.979

My Computer has Raspberry Pi ratio=12.2765
Making pie charts...done.
$ sudo vcgencmd get_throttled
throttled=0x0
Graphed as a Pi pie chart the 3B+ per-core scaling looks like

Image

bensimmo
Posts: 4129
Joined: Sun Dec 28, 2014 3:02 pm
Location: East Yorkshire

Re: A Pi Pie Chart

Fri May 17, 2019 7:40 pm

Just a quick run, first go.

Code: Select all

~/pichart-30$ make
gcc -std=gnu99 -O3 -mtune=native -march=native -Wall -o pichart-serial pichart.c util.c sieve.c merge.c fourier.c lorenz.c -lm
gcc -std=gnu99 -O3 -mtune=native -march=native -Wall -fopenmp -o pichart-openmp pichart.c util.c sieve.c merge.c fourier.c lorenz.c -lm
~/pichart-30$ ls
fourier.c  lorenz.c  Makefile  merge.c  pichart.c  pichart.h  pichart-openmp  pichart-serial  pichart.svg  sieve.c  util.c
~/pichart-30$ ./pichart-openmp -t WLS Ubuntu i5-4460
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=8 Sec=0.260955 Mops=3580.42
Merge Sort           N=16777216 Workers=8 Sec=0.520772 Mops=773.185
Fourier Transform    N=4194304 Workers=4 Sec=0.236176 Mflops=1953.51
Lorenz 96            N=32768 K=16384 Workers=8 Sec=0.102943 Mflops=31291.4

The WLS has Raspberry Pi ratio=101.276
Making pie charts...done.
~/pichart-30$

ejolson
Posts: 3260
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Fri May 17, 2019 7:48 pm

bensimmo wrote:
Fri May 17, 2019 7:40 pm
Just a quick run, first go.

Looks great. If you want spaces in the name of the machine use quotes on the command line like this

Code: Select all

$ ./pichart-openmp -t "WLS Ubuntu i5-4460"

bensimmo
Posts: 4129
Joined: Sun Dec 28, 2014 3:02 pm
Location: East Yorkshire

Re: A Pi Pie Chart

Fri May 17, 2019 7:58 pm

It's command line only, so I need to move it to the Windows folders to view.

Serial lets the processor run in turbo as it's not using all cores.
Interestingly (maybe, as I don't know the ins and outs), in the OpenMP run the CPU trace for the prime sieve had all cores maxed out, so they stuck at 4x 3.2GHz. The merge sort seemed to drop rapidly to a single core (maybe) and 33% CPU utilisation, but could trigger turbo mode. FT and L96 both ramped down pretty quickly to a lower CPU utilisation of 33%ish, but at 3.2GHz. None of these three hit max CPU in any of the core traces, not even at the start.

(using Task Manager in Win10, but it's pretty good)


EDIT
pichart-serial Win10 i5-4460 WLS Ubuntu

Code: Select all

~/pichart-30$ ./pichart-serial
pichart -- Raspberry Pi Performance Serial version 30

Prime Sieve          P=14630843 Workers=2 Sec=1.00909 Mops=925.912
Merge Sort           N=16777216 Workers=1 Sec=1.8882 Mops=213.248
Fourier Transform    N=4194304 Workers=1 Sec=0.732049 Mflops=630.25
Lorenz 96            N=32768 K=16384 Workers=2 Sec=0.301821 Mflops=10672.6

My Computer has Raspberry Pi ratio=30.1441
Making pie charts...done.
~/pichart-30$ ./pichart-serial
pichart -- Raspberry Pi Performance Serial version 30

Prime Sieve          P=14630843 Workers=2 Sec=1.00763 Mops=927.257
Merge Sort           N=16777216 Workers=1 Sec=1.89453 Mops=212.535
Fourier Transform    N=4194304 Workers=2 Sec=0.744321 Mflops=619.858
Lorenz 96            N=32768 K=16384 Workers=2 Sec=0.311009 Mflops=10357.4

My Computer has Raspberry Pi ratio=29.7807
Really liking the Pi Ratio as it's easy to understand.

ejolson
Posts: 3260
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Sat May 18, 2019 3:27 pm

bensimmo wrote:
Fri May 17, 2019 7:58 pm
FT and L96 both ramped down pretty quickly to a lower CPU utilisation of 33%ish, but at 3.2GHz. None of these three hit max CPU in any of the core traces, not even at the start.
I'm also happy having a single number that I can use to compare average performance relative to the original 700 MHz Pi B computer.

My recollection is that a nontrivial amount of time is spent initializing memory and checking answers for the Fourier transform and Lorenz 96 computations. When the timed part of the calculation is running it uses all cores, at least initially, until it ramps down as part of the automatic tuning.

I thought about changing the code to avoid the delays from initialising and checking: on one hand, running all cores flat out for a long time would make sure the clock speeds are set to maximum; on the other hand, doing this is also more likely to cause throttling. The presence of turbo boost adds one more variable that further complicates things. As the trade-off was not clear, I simply left the code as it is.

It may be possible to obtain more deterministic results by setting the CPU governor to performance, the minimum clock speed equal to the maximum and turning off turbo boost. It would be useful to know such a setting for Windows. However even with the default settings, Raspbian and many other systems (with the apparent exception of the N2) give pretty consistent results.

Do you have anything new to report for the T5800?

jahboater
Posts: 4595
Joined: Wed Feb 04, 2015 6:38 pm

Re: A Pi Pie Chart

Sat May 18, 2019 5:03 pm

As for the N2, the problem persists:
I changed the CPU scaling governor from "interactive" to "performance", increased the priority of the runs (nice --20), and made sure the desktop was logged out. Same odd result for the first two "little" CPUs:

Code: Select all

$ sudo nice --20 ./piratio.sh 
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=2 Sec=3.80874 Mops=245.312
Merge Sort           N=16777216 Workers=1 Sec=5.23774 Mops=76.8754
Fourier Transform    N=4194304 Workers=1 Sec=13.7535 Mflops=33.5459
Lorenz 96            N=32768 K=16384 Workers=1 Sec=12.785 Mflops=251.954

My Computer has Raspberry Pi ratio=3.15507
Making pie charts...done.
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=2 Sec=1.90152 Mops=491.359
Merge Sort           N=16777216 Workers=4 Sec=4.10234 Mops=98.152
Fourier Transform    N=4194304 Workers=2 Sec=8.69264 Mflops=53.0764
Lorenz 96            N=32768 K=16384 Workers=4 Sec=13.8154 Mflops=233.162

My Computer has Raspberry Pi ratio=4.38891
Making pie charts...done.
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=6 Sec=1.18136 Mops=790.895
Merge Sort           N=16777216 Workers=6 Sec=1.73341 Mops=232.289
Fourier Transform    N=4194304 Workers=6 Sec=2.00193 Mflops=230.464
Lorenz 96            N=32768 K=16384 Workers=1 Sec=1.35678 Mflops=2374.17

My Computer has Raspberry Pi ratio=15.811
Making pie charts...done.
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=8 Sec=0.770036 Mops=1213.36
Merge Sort           N=16777216 Workers=8 Sec=1.01626 Mops=396.212
Fourier Transform    N=4194304 Workers=8 Sec=1.36911 Mflops=336.987
Lorenz 96            N=32768 K=16384 Workers=2 Sec=0.74693 Mflops=4312.62

My Computer has Raspberry Pi ratio=25.672
Making pie charts...done.
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=10 Sec=0.595456 Mops=1569.1
Merge Sort           N=16777216 Workers=10 Sec=0.724951 Mops=555.421
Fourier Transform    N=4194304 Workers=10 Sec=1.08681 Mflops=424.521
Lorenz 96            N=32768 K=16384 Workers=10 Sec=0.602084 Mflops=5350.13

My Computer has Raspberry Pi ratio=33.3063
Making pie charts...done.
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=6 Sec=0.479973 Mops=1946.63
Merge Sort           N=16777216 Workers=12 Sec=0.575958 Mops=699.101
Fourier Transform    N=4194304 Workers=12 Sec=0.996301 Mflops=463.086
Lorenz 96            N=32768 K=16384 Workers=12 Sec=0.453332 Mflops=7105.66

My Computer has Raspberry Pi ratio=40.8474
Making pie charts...done.

ejolson
Posts: 3260
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Sat May 18, 2019 5:33 pm

jahboater wrote:
Sat May 18, 2019 5:03 pm
As for the N2, the problem persists:
I changed the CPU scaling governor from "interactive" to "performance", increased the priority of the runs (nice --20), and made sure the desktop was logged out.
It is very strange that the Lorenz 96 calculation runs slower when two cores are available. Although non-Pi problems are best solved in other forums, as the present difficulty involves my Raspberry pie charts, there may be a bug in the code which I'd like to track down.

Could you try running with the -r flag so only the Lorenz 96 test is performed using the command

Code: Select all

$ taskset -c 0,1 ./pichart-openmp -r8
Thanks for the help in checking if there is anything wrong with my program.
Last edited by ejolson on Sat May 18, 2019 5:45 pm, edited 1 time in total.

jahboater
Posts: 4595
Joined: Wed Feb 04, 2015 6:38 pm

Re: A Pi Pie Chart

Sat May 18, 2019 5:43 pm

Here is the result:

Code: Select all

$ taskset -c 0,1 ./pichart-openmp -r8
pichart -- Raspberry Pi Performance OPENMP version 30

Lorenz 96            N=32768 K=16384 Workers=2 Sec=2.534 Mflops=1271.2

My Computer has Raspberry Pi ratio=2.18465
Making pie charts...done.
and some more runs

Code: Select all

$ taskset -c 0 ./pichart-openmp -r8
pichart -- Raspberry Pi Performance OPENMP version 30

Lorenz 96            N=32768 K=16384 Workers=1 Sec=4.5217 Mflops=712.393

My Computer has Raspberry Pi ratio=1.8902
Making pie charts...done.
$ taskset -c 1 ./pichart-openmp -r8
pichart -- Raspberry Pi Performance OPENMP version 30

Lorenz 96            N=32768 K=16384 Workers=1 Sec=4.43363 Mflops=726.543

My Computer has Raspberry Pi ratio=1.89952
Making pie charts...done.
$ taskset -c 0,1 ./pichart-openmp -r8
pichart -- Raspberry Pi Performance OPENMP version 30

Lorenz 96            N=32768 K=16384 Workers=2 Sec=2.52734 Mflops=1274.55

My Computer has Raspberry Pi ratio=2.18609
Making pie charts...done.

ejolson
Posts: 3260
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Sat May 18, 2019 5:48 pm

jahboater wrote:
Sat May 18, 2019 5:43 pm
Here is the result:

Code: Select all

$ taskset -c 0,1 ./pichart-openmp -r8
pichart -- Raspberry Pi Performance OPENMP version 30

Lorenz 96            N=32768 K=16384 Workers=2 Sec=2.534 Mflops=1271.2

My Computer has Raspberry Pi ratio=2.18465
Making pie charts...done.
The result of 1271 Mflops is a lot faster than 233 Mflops from the previous run. What happens if you loop that test with something like

Code: Select all

$ while true; do taskset -c 0,1 ./pichart-openmp -r8; done
Update: Fixed typo in previous line.
Last edited by ejolson on Sat May 18, 2019 5:55 pm, edited 1 time in total.

jahboater
Posts: 4595
Joined: Wed Feb 04, 2015 6:38 pm

Re: A Pi Pie Chart

Sat May 18, 2019 5:58 pm

Code: Select all

$ while true; do taskset -c 0,1 ./pichart-openmp -r8; done
pichart -- Raspberry Pi Performance OPENMP version 30

Lorenz 96            N=32768 K=16384 Workers=2 Sec=2.50814 Mflops=1284.31

My Computer has Raspberry Pi ratio=2.19026
Making pie charts...done.
pichart -- Raspberry Pi Performance OPENMP version 30

Lorenz 96            N=32768 K=16384 Workers=2 Sec=2.51884 Mflops=1278.85

My Computer has Raspberry Pi ratio=2.18793
Making pie charts...done.
pichart -- Raspberry Pi Performance OPENMP version 30

Lorenz 96            N=32768 K=16384 Workers=2 Sec=2.5091 Mflops=1283.82

My Computer has Raspberry Pi ratio=2.19005
Making pie charts...done.
pichart -- Raspberry Pi Performance OPENMP version 30

Lorenz 96            N=32768 K=16384 Workers=2 Sec=2.49342 Mflops=1291.89

My Computer has Raspberry Pi ratio=2.19348
Making pie charts...done.
pichart -- Raspberry Pi Performance OPENMP version 30

Lorenz 96            N=32768 K=16384 Workers=2 Sec=2.50477 Mflops=1286.04

My Computer has Raspberry Pi ratio=2.19099
Making pie charts...done.
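
To put a number on how steady repeated runs like these are, the Mflops figures can be pulled out of the output and summarised. This is only a sketch, assuming the "Lorenz 96 ... Mflops=" line format shown above; the function name lorenz_stats is made up for illustration and is not part of pichart.

```shell
#!/bin/sh
# lorenz_stats: read pichart output on stdin and summarise the Mflops
# values found on lines beginning with "Lorenz 96".
# Hypothetical helper for illustration; not part of pichart.
lorenz_stats() {
    awk '
        /^Lorenz 96/ {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^Mflops=/) {
                    v = substr($i, 8) + 0          # text after "Mflops="
                    if (n == 0 || v < min) min = v
                    if (n == 0 || v > max) max = v
                    sum += v
                    n++
                }
            }
        }
        END {
            # Print nothing if no Lorenz 96 line was seen
            if (n > 0)
                printf "n=%d min=%.2f max=%.2f mean=%.2f\n", n, min, max, sum / n
        }
    '
}

# Possible use with a bounded loop instead of "while true":
#   for i in 1 2 3 4 5; do taskset -c 0,1 ./pichart-openmp -r8; done | lorenz_stats
```

A small spread between min and max suggests the timings are stable rather than wandering with clock speed.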

ejolson
Posts: 3260
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Sat May 18, 2019 6:02 pm

jahboater wrote:
Sat May 18, 2019 5:58 pm

Code: Select all

$ while true; do taskset -c 0,1 ./pichart-openmp -r8; done
pichart -- Raspberry Pi Performance OPENMP version 30

Lorenz 96            N=32768 K=16384 Workers=2 Sec=2.50814 Mflops=1284.31

My Computer has Raspberry Pi ratio=2.19026
Making pie charts...done.
The plot takes a sudden and unexpected turn. I assume you are running in 64-bit mode. What version of gcc or clang are you compiling with?
Last edited by ejolson on Sat May 18, 2019 6:11 pm, edited 1 time in total.

jahboater
Posts: 4595
Joined: Wed Feb 04, 2015 6:38 pm

Re: A Pi Pie Chart

Sat May 18, 2019 6:10 pm

ejolson wrote:
Sat May 18, 2019 6:02 pm
That is very strange. I assume you are running in 64-bit mode. What version of gcc or clang are you compiling with?
GCC 9.1 and yes, these things are always 64-bit.
The RAM is 4GB.
I have just done another run

Code: Select all

pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=2 Sec=3.79253 Mops=246.36
Merge Sort           N=16777216 Workers=1 Sec=3.54395 Mops=113.617
Fourier Transform    N=4194304 Workers=2 Sec=4.79467 Mflops=96.2262
Lorenz 96            N=32768 K=16384 Workers=1 Sec=4.5828 Mflops=702.894

My Computer has Raspberry Pi ratio=5.85722
Making pie charts...done.
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=2 Sec=1.88358 Mops=496.039
Merge Sort           N=16777216 Workers=4 Sec=1.78598 Mops=225.453
Fourier Transform    N=4194304 Workers=2 Sec=2.6631 Mflops=173.247
Lorenz 96            N=32768 K=16384 Workers=2 Sec=2.50512 Mflops=1285.86

My Computer has Raspberry Pi ratio=11.1558
Making pie charts...done.
I'll change the shell script to echo the commands, but this was taskset -c 0 and then taskset -c 0,1
and the figures seem more plausible?

I may have set the governors wrong - looks like I have to set the policy.

ejolson
Posts: 3260
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Sat May 18, 2019 6:15 pm

jahboater wrote:
Sat May 18, 2019 6:10 pm
ejolson wrote:
Sat May 18, 2019 6:02 pm
That is very strange. I assume you are running in 64-bit mode. What version of gcc or clang are you compiling with?
GCC 9.1 and yes, these things are always 64-bit.
The RAM is 4GB.
I have just done another run

Code: Select all

pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=2 Sec=3.79253 Mops=246.36
Merge Sort           N=16777216 Workers=1 Sec=3.54395 Mops=113.617
Fourier Transform    N=4194304 Workers=2 Sec=4.79467 Mflops=96.2262
Lorenz 96            N=32768 K=16384 Workers=1 Sec=4.5828 Mflops=702.894

My Computer has Raspberry Pi ratio=5.85722
Making pie charts...done.
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=2 Sec=1.88358 Mops=496.039
Merge Sort           N=16777216 Workers=4 Sec=1.78598 Mops=225.453
Fourier Transform    N=4194304 Workers=2 Sec=2.6631 Mflops=173.247
Lorenz 96            N=32768 K=16384 Workers=2 Sec=2.50512 Mflops=1285.86

My Computer has Raspberry Pi ratio=11.1558
Making pie charts...done.
I'll change the shell script to echo the commands, but this was taskset -c 0 and then taskset -c 0,1
and the figures seem more plausible?

I may have set the governors wrong - looks like I have to set the policy.
That output looks much more reasonable. For the record, could you describe in detail what changes you made to the policy and CPU settings to get the good timings? I suspect that information could be useful for people using 64-bit Linux on a Pi and many other machines.

jahboater
Posts: 4595
Joined: Wed Feb 04, 2015 6:38 pm

Re: A Pi Pie Chart

Sat May 18, 2019 6:23 pm

As root do:

cd /sys/devices/system/cpu/cpufreq/policy0
echo performance >scaling_governor

then (just to make sure)

cd ../policy2
echo performance >scaling_governor

You can check each CPU (replace X with the CPU number) with

cd /sys/devices/system/cpu/cpuX/cpufreq
cat scaling_governor
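
The same steps can be applied to every cpufreq policy in one go. This is a rough sketch, assuming the usual sysfs layout with policy*/scaling_governor files (the N2 above has policy0 and policy2); the function names and the optional directory argument are made up for illustration. Writing to the real sysfs files needs root.

```shell
#!/bin/sh
# set_governor GOVERNOR [CPUFREQ_DIR]
# Write GOVERNOR into every policy*/scaling_governor under CPUFREQ_DIR.
# CPUFREQ_DIR defaults to the standard sysfs location (root needed there).
# Hypothetical helper for illustration.
set_governor() {
    gov=$1
    dir=${2:-/sys/devices/system/cpu/cpufreq}
    for f in "$dir"/policy*/scaling_governor; do
        [ -e "$f" ] || continue        # glob matched nothing
        echo "$gov" > "$f"
    done
}

# show_governors [CPUFREQ_DIR]
# Print each policy directory name and its current governor.
show_governors() {
    dir=${1:-/sys/devices/system/cpu/cpufreq}
    for f in "$dir"/policy*/scaling_governor; do
        [ -e "$f" ] || continue
        printf '%s: %s\n' "$(basename "$(dirname "$f")")" "$(cat "$f")"
    done
}
```

As root, something like "set_governor performance" followed by "show_governors" should then report performance for every policy before the benchmark is started.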

jahboater
Posts: 4595
Joined: Wed Feb 04, 2015 6:38 pm

Re: A Pi Pie Chart

Sat May 18, 2019 6:52 pm

Yet another full set of runs from 1 to 6 cores on the N2

Code: Select all

$ ./piratio.sh
taskset -c 0 ./pichart-openmp "N2"
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=2 Sec=3.79414 Mops=246.256
Merge Sort           N=16777216 Workers=1 Sec=3.54291 Mops=113.651
Fourier Transform    N=4194304 Workers=1 Sec=4.80112 Mflops=96.0969
Lorenz 96            N=32768 K=16384 Workers=1 Sec=4.57762 Mflops=703.69

My Computer has Raspberry Pi ratio=5.85672
Making pie charts...done.
taskset -c 0,1 ./pichart-openmp "N2"
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=2 Sec=1.88222 Mops=496.397
Merge Sort           N=16777216 Workers=4 Sec=1.77579 Mops=226.745
Fourier Transform    N=4194304 Workers=2 Sec=2.63889 Mflops=174.836
Lorenz 96            N=32768 K=16384 Workers=2 Sec=2.49634 Mflops=1290.38

My Computer has Raspberry Pi ratio=11.2091
Making pie charts...done.
taskset -c 0,1,2 ./pichart-openmp "N2"
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=6 Sec=1.17752 Mops=793.471
Merge Sort           N=16777216 Workers=6 Sec=1.67522 Mops=240.359
Fourier Transform    N=4194304 Workers=6 Sec=1.90784 Mflops=241.83
Lorenz 96            N=32768 K=16384 Workers=1 Sec=1.35454 Mflops=2378.1

My Computer has Raspberry Pi ratio=16.1594
Making pie charts...done.
taskset -c 0,1,2,3 ./pichart-openmp "N2"
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=8 Sec=0.76766 Mops=1217.11
Merge Sort           N=16777216 Workers=8 Sec=0.958957 Mops=419.887
Fourier Transform    N=4194304 Workers=8 Sec=1.39197 Mflops=331.454
Lorenz 96            N=32768 K=16384 Workers=2 Sec=0.750927 Mflops=4289.67

My Computer has Raspberry Pi ratio=25.9251
Making pie charts...done.
taskset -c 0,1,2,3,4 ./pichart-openmp "N2"
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=10 Sec=0.593635 Mops=1573.91
Merge Sort           N=16777216 Workers=10 Sec=0.805171 Mops=500.084
Fourier Transform    N=4194304 Workers=10 Sec=1.1461 Mflops=402.561
Lorenz 96            N=32768 K=16384 Workers=10 Sec=0.611228 Mflops=5270.09

My Computer has Raspberry Pi ratio=31.9198
Making pie charts...done.
taskset -c 0,1,2,3,4,5 ./pichart-openmp "N2"
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=12 Sec=0.480491 Mops=1944.53
Merge Sort           N=16777216 Workers=12 Sec=0.589966 Mops=682.502
Fourier Transform    N=4194304 Workers=12 Sec=1.03284 Mflops=446.702
Lorenz 96            N=32768 K=16384 Workers=12 Sec=0.453926 Mflops=7096.37

My Computer has Raspberry Pi ratio=40.2148
Making pie charts...done.

ejolson
Posts: 3260
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Sat May 18, 2019 9:11 pm

jahboater wrote:
Sat May 18, 2019 6:52 pm
Yet another full set of runs from 1 to 6 cores on the N2

It looks like things are working well. What are the results like when running on only the four A73 cores using

Code: Select all

$ taskset -c 2,3,4,5 ./pichart-openmp
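
One rough way to quantify "working well" is to look at how much the Pi ratio grows each time a core is added. The sketch below uses only the ratios printed in the taskset runs above for 3 to 6 cores. It assumes, as discussed earlier in the thread, that cores 0 and 1 are the little A53 cores, so the steps are not expected to match the ideal n/(n-1) of identical cores. The function name `step_gains` is made up for this illustration.

```python
# Pi ratios copied from the taskset runs above, keyed by number of cores used.
# Assumption from the thread: cores 0-1 are A53 (little), cores 2-5 are A73 (big).
ratios = {3: 16.1594, 4: 25.9251, 5: 31.9198, 6: 40.2148}

def step_gains(by_cores):
    """Return {n: ratio(n) / ratio(previous count)} for consecutive core counts."""
    counts = sorted(by_cores)
    return {n: by_cores[n] / by_cores[p] for p, n in zip(counts, counts[1:])}

gains = step_gains(ratios)
for n, g in gains.items():
    # Print the multiplicative gain from adding the n-th core.
    print(f"{n - 1} -> {n} cores: x{g:.2f}")
```

Every step is a gain well above 1, which is consistent with no obvious throttling in these runs, though single runs like these carry some noise.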

jahboater
Posts: 4595
Joined: Wed Feb 04, 2015 6:38 pm

Re: A Pi Pie Chart

Sat May 18, 2019 9:51 pm

Here is the run for the 4 x A73 cores ...

Code: Select all

$ taskset -c 2,3,4,5 ./pichart-openmp "A73"
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=4 Sec=0.638702 Mops=1462.85
Merge Sort           N=16777216 Workers=8 Sec=0.817786 Mops=492.37
Fourier Transform    N=4194304 Workers=8 Sec=1.08413 Mflops=425.57
Lorenz 96            N=32768 K=16384 Workers=4 Sec=0.39798 Mflops=8093.94

My Computer has Raspberry Pi ratio=35.241
Making pie charts...done.
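
To see what the two A53 cores contribute, the throughput of the six-core run can be divided by the throughput of this A73-only run, test by test. This is only a sketch built from the single runs quoted in this thread, so small differences may be noise. The helper name `relative` is made up for this illustration.

```python
# Throughput (Mops or Mflops) copied from the runs quoted in this thread.
all_six = {"Prime Sieve": 1944.53, "Merge Sort": 682.502,
           "Fourier Transform": 446.702, "Lorenz 96": 7096.37}
a73_only = {"Prime Sieve": 1462.85, "Merge Sort": 492.37,
            "Fourier Transform": 425.57, "Lorenz 96": 8093.94}

def relative(numer, denom):
    """Per-test throughput ratio numer/denom."""
    return {test: numer[test] / denom[test] for test in numer}

rel = relative(all_six, a73_only)
for test, r in rel.items():
    # r > 1 means all six cores were faster than the four A73 cores alone.
    print(f"{test:18s} six cores / A73 only = {r:.2f}")
```

In these particular runs the sieve and merge sort gain from the extra cores, the Fourier transform gains very little, and Lorenz 96 is actually faster on the four A73 cores alone.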

ejolson
Posts: 3260
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Sat Jun 15, 2019 3:44 am

Here are some pie charts comparing the Raspberry Pi lineup to a six-core Intel Xeon E5-1650 processor running with hyper-threads turned on.

Image

from the output

Code: Select all

$ ./pichart-openmp -t E5-1650
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=24 Sec=0.144355 Mops=6472.43
Merge Sort           N=16777216 Workers=24 Sec=0.207972 Mops=1936.09
Fourier Transform    N=4194304 Workers=12 Sec=0.122169 Mflops=3776.51
Lorenz 96            N=32768 K=16384 Workers=24 Sec=0.0499105 Mflops=64540

The E5-1650 has Raspberry Pi ratio=208.747
Making pie charts...done.
Image

from the output

Code: Select all

$ ./pichart-serial -t E5-1650
pichart -- Raspberry Pi Performance Serial version 30

Prime Sieve          P=14630843 Workers=1 Sec=0.838971 Mops=1113.66
Merge Sort           N=16777216 Workers=2 Sec=1.59174 Mops=252.964
Fourier Transform    N=4194304 Workers=1 Sec=0.639001 Mflops=722.023
Lorenz 96            N=32768 K=16384 Workers=2 Sec=0.281831 Mflops=11429.6

The E5-1650 has Raspberry Pi ratio=34.673
Making pie charts...done.
Compared to the Pi, the Xeon E5-1650's floating-point advantage on Lorenz 96 is much larger than its advantage on any of the other tests. While this result is fully consistent with the timings presented in this post, it would be interesting to know how much of the speed difference results from better sequencing of short-vector instructions by the compiler and how much comes from differences in the intrinsic capabilities of the hardware.
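
The two E5-1650 runs above also give the parallel speedup of each test, which is just the serial time divided by the OpenMP time. The sketch below only rearranges the numbers already printed. It does not address the compiler-versus-hardware question, which would need a comparison against a Pi baseline.

```python
# Seconds copied from the E5-1650 serial and OpenMP outputs above.
serial = {"Prime Sieve": 0.838971, "Merge Sort": 1.59174,
          "Fourier Transform": 0.639001, "Lorenz 96": 0.281831}
openmp = {"Prime Sieve": 0.144355, "Merge Sort": 0.207972,
          "Fourier Transform": 0.122169, "Lorenz 96": 0.0499105}

# Parallel speedup per test: serial time divided by OpenMP time.
speedup = {test: serial[test] / openmp[test] for test in serial}

# Overall figure from the two reported Pi ratios.
overall = 208.747 / 34.673

for test, s in speedup.items():
    print(f"{test:18s} speedup = {s:.2f}")
print(f"Pi ratio, OpenMP / serial = {overall:.2f}")
```

On this six-core machine with hyper-threads on, every test lands between roughly 5x and 8x, with the merge sort scaling best.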

As always, a link to the latest version of the pichart program is available from the first post in this thread, in case you want to run it yourself.

mikerr
Posts: 2770
Joined: Thu Jan 12, 2012 12:46 pm
Location: UK

Re: A Pi Pie Chart

Tue Jun 25, 2019 2:55 pm

Pi4B results

(Just with Raspbian's preinstalled gcc)

Code: Select all

gcc --version
gcc (Raspbian 8.3.0-6+rpi1) 8.3.0
make
gcc -std=gnu99 -O3 -mtune=native -march=native -Wall -o pichart-serial pichart.c util.c sieve.c merge.c fourier.c lorenz.c -lm
gcc -std=gnu99 -O3 -mtune=native -march=native -Wall -fopenmp -o pichart-openmp pichart.c util.c sieve.c merge.c fourier.c lorenz.c -lm

Code: Select all

./pichart-openmp -t "Pi 4B"
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=4 Sec=0.548627 Mops=1703.03
Merge Sort           N=16777216 Workers=8 Sec=1.19706 Mops=336.368
Fourier Transform    N=4194304 Workers=8 Sec=1.74989 Mflops=263.659
Lorenz 96            N=32768 K=16384 Workers=4 Sec=0.582189 Mflops=5532.95

The Pi 4B has Raspberry Pi ratio=26.8474
Making pie charts...done.

Code: Select all

./pichart-serial -t "Pi 4B"
pichart -- Raspberry Pi Performance Serial version 30

Prime Sieve          P=14630843 Workers=2 Sec=2.34665 Mops=398.154
Merge Sort           N=16777216 Workers=1 Sec=4.58707 Mops=87.78
Fourier Transform    N=4194304 Workers=2 Sec=2.90977 Mflops=158.56
Lorenz 96            N=32768 K=16384 Workers=1 Sec=2.12154 Mflops=1518.34

The Pi 4B has Raspberry Pi ratio=8.50442
Making pie charts...done.
Image
Image
Android app - Raspi Card Imager - download and image SD cards - No PC required !

ejolson
Posts: 3260
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Tue Jun 25, 2019 3:40 pm

mikerr wrote:
Tue Jun 25, 2019 2:55 pm
Pi4B results
Thanks for posting. Your results are similar, though perhaps slightly faster, compared to the graphs that James uploaded here which I converted to portable network graphics:
[Attachments: pichart-mp-pi4.png, pichart-serial-pi4.png]
Here "My Computer" refers to the new Raspberry Pi 4B.

Compared to the original Pi B, the Pi 4B is 26.8474 times faster. That's about double the performance of the 3B+ overall. However, I find it surprising that the merge sort timings are actually slower than on the 3B+. I wonder if this result is related to the compiler version or an optimization setting. It would be nice to find a set of compiler flags for which the merge-sort timings were faster.
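
A search over compiler flags could be scripted along the following lines. This is a minimal sketch, not a tested recipe: the source file list is taken from the make output posted above, while the candidate flag sets, the executable name `pichart-flagtest`, and the helper names are assumptions chosen for illustration. It must be run from the pichart source directory, and it runs the whole serial benchmark each time.

```python
import re
import subprocess

# Source files as listed in the make output posted in this thread.
SOURCES = ["pichart.c", "util.c", "sieve.c", "merge.c", "fourier.c", "lorenz.c"]

# Candidate flag sets: illustrative assumptions, not recommendations.
FLAG_SETS = [
    ["-O3", "-mtune=native", "-march=native"],  # flags from the make output above
    ["-O2", "-mtune=native", "-march=native"],
    ["-O3"],
]

def parse_result(line):
    """Split one pichart result line into (test name, {field: number})."""
    name, fields = re.match(r"(.+?)\s{2,}(.*)", line).groups()
    values = {k: float(v) for k, v in re.findall(r"(\w+)=([\d.]+)", fields)}
    return name, values

def merge_sort_seconds(flags, exe="./pichart-flagtest"):
    """Build the serial program with `flags`, run it, return the Merge Sort time."""
    build = ["gcc", "-std=gnu99", *flags, "-Wall", "-o", exe, *SOURCES, "-lm"]
    subprocess.run(build, check=True)
    out = subprocess.run([exe], check=True, capture_output=True, text=True).stdout
    for line in out.splitlines():
        if line.startswith("Merge Sort"):
            return parse_result(line)[1]["Sec"]
    raise RuntimeError("no Merge Sort line in output")
```

Looping over `FLAG_SETS` and printing `merge_sort_seconds(flags)` for each entry would then give one merge sort timing per flag set. Repeating each measurement a few times would help separate real differences from noise.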
Last edited by ejolson on Thu Jun 27, 2019 10:05 pm, edited 1 time in total.

mikerr
Posts: 2770
Joined: Thu Jan 12, 2012 12:46 pm
Location: UK

Re: A Pi Pie Chart

Tue Jun 25, 2019 4:06 pm

Asus Tinkerboard

Code: Select all

gcc --version
gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516

Code: Select all

./pichart-openmp -t "Asus Tinkerboard"
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=4 Sec=0.907341 Mops=1029.74
Merge Sort           N=16777216 Workers=8 Sec=0.80095 Mops=502.72
Fourier Transform    N=4194304 Workers=4 Sec=1.01172 Mflops=456.03
Lorenz 96            N=32768 K=16384 Workers=4 Sec=0.812622 Mflops=3963.99

The Asus Tinkerboard has Raspberry Pi ratio=27.6177
Making pie charts...done.

Code: Select all

~/pichart-30$ ./pichart-serial -t "Asus Tinkerboard"
pichart -- Raspberry Pi Performance Serial version 30

Prime Sieve          P=14630843 Workers=1 Sec=3.70431 Mops=252.227
Merge Sort           N=16777216 Workers=1 Sec=2.94353 Mops=136.792
Fourier Transform    N=4194304 Workers=1 Sec=2.9843 Mflops=154.6
Lorenz 96            N=32768 K=16384 Workers=2 Sec=2.91163 Mflops=1106.33

The Asus Tinkerboard has Raspberry Pi ratio=7.78269
Making pie charts...done.
Image
Image
Last edited by mikerr on Tue Jun 25, 2019 4:21 pm, edited 1 time in total.

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 23083
Joined: Sat Jul 30, 2011 7:41 pm

Re: A Pi Pie Chart

Tue Jun 25, 2019 4:18 pm

ejolson wrote:
Tue Jun 25, 2019 3:40 pm
mikerr wrote:
Tue Jun 25, 2019 2:55 pm
Pi4B results
Thanks for posting. Your results are similar, though perhaps slightly faster, compared to the graphs that James uploaded which I converted to portable network graphics:
Here "My Computer" refers to the new Raspberry Pi 4B.

Compared to the original Pi B, the Pi 4B is 26.8474 times faster. That's about double the performance of the 3B+ overall. However, I find it surprising that the merge sort timings are actually slower than on the 3B+. I wonder if this result is related to the compiler version or an optimization setting. It would be nice to find a set of compiler flags for which the merge-sort timings were faster.
From Eben when I showed him the results for the merge, "Could be expensive line moves between L1s, but I suspect it's actually measuring the cost of forking processes in LPAE."

Which is why some of the other Pie charts were comparing LPAE kernels on the Pi3B+.
Principal Software Engineer at Raspberry Pi (Trading) Ltd.
Contrary to popular belief, humorous signatures are allowed. Here's an example...
"My grief counseller just died, luckily, he was so good, I didn't care."

ejolson
Posts: 3260
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Tue Jun 25, 2019 4:36 pm

jamesh wrote:
Tue Jun 25, 2019 4:18 pm
ejolson wrote:
Tue Jun 25, 2019 3:40 pm
mikerr wrote:
Tue Jun 25, 2019 2:55 pm
Pi4B results
Thanks for posting. Your results are similar, though perhaps slightly faster, compared to the graphs that James uploaded which I converted to portable network graphics:
Here "My Computer" refers to the new Raspberry Pi 4B.

Compared to the original Pi B, the Pi 4B is 26.8474 times faster. That's about double the performance of the 3B+ overall. However, I find it surprising that the merge sort timings are actually slower than on the 3B+. I wonder if this result is related to the compiler version or an optimization setting. It would be nice to find a set of compiler flags for which the merge-sort timings were faster.
From Eben when I showed him the results for the merge, "Could be expensive line moves between L1s, but I suspect it's actually measuring the cost of forking processes in LPAE."

Which is why some of the other Pie charts were comparing LPAE kernels on the Pi3B+.
My understanding is that the task parallel constructs in modern OpenMP implementations fork a pool of threads at the beginning of the run (which isn't measured by the timing routines) and then use either work stealing or some sort of grand central dispatch to assign parcels of work to the threads in the pool. Maybe the cost of the Linux thread synchronization primitives goes up when LPAE is enabled; however, it is strange that the serial version also runs slower.
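
The pattern described above, where a pool is created once outside the timed region and parcels of work are then handed to whichever worker is free, can be illustrated with a small Python analogue. This is a sketch of the idea only and not how OpenMP or pichart implements it: Python's executor uses a shared queue rather than work stealing, starts its threads lazily, and the global interpreter lock means no real speedup should be expected. The names `pooled_sort` and `timed_sort` are made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from heapq import merge
import random
import time

def pooled_sort(data, pool, parcels):
    """Sort `data` by handing slices to an existing pool, then merging the results."""
    step = max(1, -(-len(data) // parcels))          # ceiling division, at least 1
    chunks = [data[i:i + step] for i in range(0, len(data), step)]
    sorted_chunks = list(pool.map(sorted, chunks))   # free workers pick up parcels
    return list(merge(*sorted_chunks))               # k-way merge of sorted slices

def timed_sort(data, workers=4, parcels=16):
    """Return (sorted list, seconds), excluding pool creation from the timing."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Trivial warm-up so that thread startup happens before the clock starts
        # (the executor creates threads lazily, so this starts at least one).
        list(pool.map(int, range(workers)))
        start = time.perf_counter()
        result = pooled_sort(data, pool, parcels)
        elapsed = time.perf_counter() - start
    return result, elapsed

numbers = [random.random() for _ in range(100_000)]
result, elapsed = timed_sort(numbers)
print(f"sorted {len(result)} items in {elapsed:.3f} s (pool startup not timed)")
```

Using more parcels than workers is what lets a fast worker take on extra slices, which is the load-balancing role that work stealing plays in an OpenMP task runtime.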

I wonder if this is a gcc version 8.x compiler regression. Have you tried any compiler flags to remedy the situation?
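
One data point already in this thread touches on the compiler question: the Tinkerboard runs were built with gcc 6.3.0 and the Pi 4B runs with gcc 8.3.0. The sketch below compares the serial timings of the two boards test by test. Because both the hardware and the compiler differ, this cannot separate the two causes. It only shows which test behaves differently from the rest.

```python
# Serial timings in seconds, copied from the two posts above.
pi4b = {"Prime Sieve": 2.34665, "Merge Sort": 4.58707,
        "Fourier Transform": 2.90977, "Lorenz 96": 2.12154}      # gcc 8.3.0
tinker = {"Prime Sieve": 3.70431, "Merge Sort": 2.94353,
          "Fourier Transform": 2.9843, "Lorenz 96": 2.91163}     # gcc 6.3.0

# Ratio > 1 means the Pi 4B finished the test faster than the Tinkerboard.
pi4b_advantage = {test: tinker[test] / pi4b[test] for test in pi4b}

for test, r in pi4b_advantage.items():
    print(f"{test:18s} Tinkerboard time / Pi 4B time = {r:.2f}")

# Tests where the Pi 4B was the slower of the two boards.
slower_on_pi4b = [test for test, r in pi4b_advantage.items() if r < 1]
```

In these runs the merge sort is the only test where the Pi 4B is slower, which is at least consistent with a compiler or memory-system effect specific to that test.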
Last edited by ejolson on Tue Jun 25, 2019 4:55 pm, edited 1 time in total.
