scruss
Posts: 2261
Joined: Sat Jun 09, 2012 12:25 pm
Location: Toronto, ON

Re: A Pi Pie Chart

Mon May 13, 2019 4:59 am

I picked up a used Firefly-RK3288 from a friend. It has a 4-core Rockchip 3288 ARM Cortex-A17 clocked at (IIRC) 1.3 GHz. It's got 2 GB RAM, and with gcc 7 under Ubuntu 18.04, it's no slouch:
[Image: pichart-rk3288-openmp.png, Firefly-RK3288]
Results:

Code: Select all

pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=4 Sec=0.804635 Mops=1161.18
Merge Sort           N=16777216 Workers=8 Sec=0.723482 Mops=556.549
Fourier Transform    N=4194304 Workers=8 Sec=1.4824 Mflops=311.234
Lorenz 96            N=32768 K=16384 Workers=4 Sec=0.894368 Mflops=3601.68

The Firefly-RK3288 OpenMP has Raspberry Pi ratio=25.9055
Making pie charts...done.


real	2m52.516s
user	5m32.200s
sys	0m1.040s
‘Remember the Golden Rule of Selling: “Do not resort to violence.”’ — McGlashan.

jahboater
Posts: 4437
Joined: Wed Feb 04, 2015 6:38 pm

Re: A Pi Pie Chart

Mon May 13, 2019 7:36 am

Here are the numbers for the new Odroid-N2

Code: Select all

pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=6 Sec=0.478389 Mops=1953.07
Merge Sort           N=16777216 Workers=12 Sec=0.57364 Mops=701.927
Fourier Transform    N=4194304 Workers=12 Sec=0.847415 Mflops=544.448
Lorenz 96            N=32768 K=16384 Workers=12 Sec=0.465711 Mflops=6916.79

The Odroid-N2 has Raspberry Pi ratio=42.3263
Making pie charts...done.
This is a hexa-core Amlogic S922X SoC with four Cortex-A73 cores and two Cortex-A53 cores, clocked at 1.8 GHz and 1.896 GHz respectively, and 4 GB of RAM.
GCC 9.1, Ubuntu 18.04

Not sure what the best compiler options are for big.LITTLE. The instruction sets are obviously compatible, but the tuning is very different (mostly the scheduling, as the A73 is out-of-order and the A53 is in-order).
I used "-mcpu=cortex-a73 -mtune=cortex-a73" because there are more of those cores.

bensimmo
Posts: 4065
Joined: Sun Dec 28, 2014 3:02 pm
Location: East Yorkshire

Re: A Pi Pie Chart

Mon May 13, 2019 8:16 am

Go for the fast core; it should switch everything over to run on that if speed is needed.
The same goes for the 8-core beasts: IIRC they are really just 4 slow, efficient cores, or else they use the 4 fast, slightly more power-hungry ones.

jahboater
Posts: 4437
Joined: Wed Feb 04, 2015 6:38 pm

Re: A Pi Pie Chart

Mon May 13, 2019 8:55 am

Thanks, makes sense.

bensimmo
Posts: 4065
Joined: Sun Dec 28, 2014 3:02 pm
Location: East Yorkshire

Re: A Pi Pie Chart

Mon May 13, 2019 9:58 am

jahboater wrote:
Mon May 13, 2019 8:55 am
Thanks, makes sense.
But don't quote me, how it does it and if it can use all 6 (or 8) I do not know.
That's purely what I have read and understood how it behaves.

jahboater
Posts: 4437
Joined: Wed Feb 04, 2015 6:38 pm

Re: A Pi Pie Chart

Mon May 13, 2019 10:11 am

bensimmo wrote:
Mon May 13, 2019 9:58 am
jahboater wrote:
Mon May 13, 2019 8:55 am
Thanks, makes sense.
But don't quote me, how it does it and if it can use all 6 (or 8) I do not know.
That's purely what I have read and understood how it behaves.
For big jobs it uses all the cores, even the two little ones.
I built GCC 9.1 using "make -j 6" and all six cores were active most of the time.
Also, in the pichart run above, it's using 6 or 12 worker threads.

Oddly, the big cores are clocked at 1800 MHz and the little cores at 1896 MHz.

scruss
Posts: 2261
Joined: Sat Jun 09, 2012 12:25 pm
Location: Toronto, ON

Re: A Pi Pie Chart

Mon May 13, 2019 3:18 pm

jahboater wrote:
Mon May 13, 2019 7:36 am
Here are the numbers for the new Odroid-N2 …
Neat results from a not-too-expensive board.

While people might complain about the Raspberry Pi's µSD card, updating this Firefly board was a pain:
  1. take it out of its (very nice) metal case so I could access the reset and recovery keys more easily
  2. attach an OTG USB cable to flash the eMMC
  3. attach a TTL serial cable so I could see the boot messages
  4. do a Vulcan death grip involving the reset and recovery keys and the power connector to get it into recovery mode
  5. download seemingly random x86 Linux-only binary tools to access the flash
  6. download a large image file from the vendor's Google Drive
  7. spend about 10 minutes watching the OS image trickle over the OTG cable via the flash tool
  8. watch the system reboot in the serial window, but find it doesn't take keyboard entry to shut it down
  9. pull power plug, put it back in its case, get it to where there's a keyboard and monitor and plug it in
I really don't miss using U-Boot and serial connections like on the old SheevaPlug. And Firefly are supposed to have one of the better support communities, too …

ejolson
Posts: 3052
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Mon May 13, 2019 3:37 pm

bensimmo wrote:
Mon May 13, 2019 9:58 am
jahboater wrote:
Mon May 13, 2019 8:55 am
Thanks, makes sense.
But don't quote me, how it does it and if it can use all 6 (or 8) I do not know.
That's purely what I have read and understood how it behaves.
I didn't know the N2 was shipping already. You can use numactl or taskset to exclude certain cores from participating in the Pi Pie Chart benchmark.

While the code itself does some automated tuning of how many OpenMP workers are used for the calculation, on machines with four simultaneous hardware threads per core or with multiple sockets I've been able to achieve higher performance by restricting which part of the hardware the program runs on. In one case the parallel FFT part of the test ran faster after telling the scheduler to allocate all memory and threads on only one socket.

As you can see I've updated the code to output an additional number indicating relative speed compared to the original 700MHz ARMv6-based Raspberry Pi. For easier comparison the single-threaded version now graphs the single-threaded results for the reference machines. Less visibly the code has been updated to make it compatible with a wider variety of compilers.

It would be interesting to see what the performance is when using numactl or taskset to schedule only the big cores on systems with such an architecture.

jahboater
Posts: 4437
Joined: Wed Feb 04, 2015 6:38 pm

Re: A Pi Pie Chart

Mon May 13, 2019 6:00 pm

Here is a more complete set of results.
According to the comments in boot.ini, CPUs 0 and 1 are the little A53s, and CPUs 2,3,4,5 are the big A73s.
So using taskset .....

Code: Select all

~/pichart-30$ taskset -c 2,3,4,5 ./pichart-openmp "N2"
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=4 Sec=0.640666 Mops=1458.37
Merge Sort           N=16777216 Workers=8 Sec=1.242 Mops=324.198
Fourier Transform    N=4194304 Workers=4 Sec=1.86739 Mflops=247.069
Lorenz 96            N=32768 K=16384 Workers=4 Sec=1.8518 Mflops=1739.51

My Computer has Raspberry Pi ratio=18.8527
Making pie charts...done.

~/pichart-30$ taskset -c 0,1 ./pichart-openmp "N2"
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=4 Sec=1.91583 Mops=487.689
Merge Sort           N=16777216 Workers=2 Sec=3.89942 Mops=103.26
Fourier Transform    N=4194304 Workers=2 Sec=8.61634 Mflops=53.5463
Lorenz 96            N=32768 K=16384 Workers=4 Sec=12.9881 Mflops=248.013

My Computer has Raspberry Pi ratio=4.51557
Making pie charts...done.

~/pichart-30$ taskset -c 0-5 ./pichart-openmp "N2"
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=6 Sec=0.478178 Mops=1953.93
Merge Sort           N=16777216 Workers=12 Sec=0.574004 Mops=701.482
Fourier Transform    N=4194304 Workers=12 Sec=0.863874 Mflops=534.075
Lorenz 96            N=32768 K=16384 Workers=12 Sec=0.785385 Mflops=4101.46

My Computer has Raspberry Pi ratio=36.9623
Making pie charts...done.

~/pichart-30$ ./pichart-serial "N2"
pichart -- Raspberry Pi Performance Serial version 30

Prime Sieve          P=14630843 Workers=2 Sec=2.55921 Mops=365.084
Merge Sort           N=16777216 Workers=2 Sec=3.26208 Mops=123.434
Fourier Transform    N=4194304 Workers=2 Sec=6.02412 Mflops=76.5876
Lorenz 96            N=32768 K=16384 Workers=2 Sec=3.08463 Mflops=1044.28

My Computer has Raspberry Pi ratio=6.88009
Making pie charts...done.
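
For anyone who wants to tabulate listings like the ones above rather than compare them by eye, here is a minimal Python sketch that parses one result line into its name and numeric fields. The function name and the regular expression are illustrative assumptions based only on the output format shown in this thread; they are not part of pichart.

```python
import re

# One result line looks like:
#   "Prime Sieve          P=14630843 Workers=4 Sec=0.640666 Mops=1458.37"
# i.e. a test name, a run of spaces, then space-separated KEY=VALUE fields.
LINE_RE = re.compile(r"^(?P<name>.+?)\s{2,}(?P<fields>\S+=\S+(?:\s+\S+=\S+)*)\s*$")

def parse_result_line(line):
    """Return (test_name, {key: float}) or None if the line is not a result line."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    fields = {}
    for pair in match.group("fields").split():
        key, value = pair.split("=", 1)
        fields[key] = float(value)
    return match.group("name"), fields

if __name__ == "__main__":
    sample = "Prime Sieve          P=14630843 Workers=4 Sec=0.640666 Mops=1458.37"
    print(parse_result_line(sample))
```

Header lines such as "pichart -- Raspberry Pi Performance OPENMP version 30" do not match the pattern and come back as None, so a whole transcript can be fed through line by line.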

ejolson
Posts: 3052
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Tue May 14, 2019 6:33 am

jahboater wrote:
Mon May 13, 2019 6:00 pm
Here is a more complete set of results.
According to the comments in boot.ini, CPUs 0 and 1 are the little A53s, and CPUs 2,3,4,5 are the big A73s.
So using taskset .....

Those results are quite strange. It seems unreasonable that adding two little cores would double the performance.

Do you imagine the big cores are throttling when they're all running? Alternatively, it could happen that the identification of which cores are big and which little is mixed up. I'll also recheck my code to make sure the parallel part scales reasonably between 4 and 6 cores. Would it be possible to test each core separately with taskset and the serial code to verify which are which?
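
The per-core check suggested above could be scripted roughly as follows. This is only a sketch: it builds one taskset command per CPU using the `./pichart-serial` binary and positional label seen in the earlier runs, while the helper name and the per-CPU label suffix are hypothetical. It prints the commands rather than running them, so nothing here depends on the benchmark being installed.

```python
import shlex

def per_core_commands(cpus, binary="./pichart-serial", label="N2"):
    """Build one taskset command line per CPU so each core is timed on its own.

    The label suffix (e.g. "N2-cpu3") is a made-up convention for telling
    the runs apart; the program only needs some name for the machine.
    """
    return [["taskset", "-c", str(cpu), binary, f"{label}-cpu{cpu}"] for cpu in cpus]

if __name__ == "__main__":
    # Six cores numbered 0..5, as on the N2 discussed in this thread.
    for cmd in per_core_commands(range(6)):
        print(shlex.join(cmd))
```

Comparing the single-core Pi ratios from the six runs should show two distinct groups if the board really has two kinds of core, and the CPU numbers in each group settle which are which.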

ejolson
Posts: 3052
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Tue May 14, 2019 8:24 am

ejolson wrote:
Tue May 14, 2019 6:33 am
I'll also recheck my code to make sure the parallel part scales reasonably between 4 and 6 cores.
Here is the output for a set of runs on an 8-core ARM Cortex A53 system running in 64-bit mode:

Code: Select all

$ taskset -c 0 ./pichart-openmp
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=1 Sec=5.22214 Mops=178.917
Merge Sort           N=16777216 Workers=1 Sec=4.54134 Mops=88.664
Fourier Transform    N=4194304 Workers=1 Sec=5.44227 Mflops=84.776
Lorenz 96            N=32768 K=16384 Workers=1 Sec=6.06767 Mflops=530.883

My Computer has Raspberry Pi ratio=4.58997
Making pie charts...done.
$ taskset -c 0,1 ./pichart-openmp
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=2 Sec=2.61991 Mops=356.626
Merge Sort           N=16777216 Workers=4 Sec=2.29244 Mops=175.644
Fourier Transform    N=4194304 Workers=2 Sec=3.1076 Mflops=148.466
Lorenz 96            N=32768 K=16384 Workers=2 Sec=3.13086 Mflops=1028.86

My Computer has Raspberry Pi ratio=8.78213
Making pie charts...done.
$ taskset -c 0,1,2 ./pichart-openmp
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=3 Sec=1.75374 Mops=532.762
Merge Sort           N=16777216 Workers=6 Sec=1.55268 Mops=259.328
Fourier Transform    N=4194304 Workers=6 Sec=2.3963 Mflops=192.536
Lorenz 96            N=32768 K=16384 Workers=3 Sec=2.13824 Mflops=1506.49

My Computer has Raspberry Pi ratio=12.5634
Making pie charts...done.
$ taskset -c 0,1,2,3 ./pichart-openmp
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=4 Sec=1.31822 Mops=708.779
Merge Sort           N=16777216 Workers=8 Sec=1.18948 Mops=338.511
Fourier Transform    N=4194304 Workers=4 Sec=1.93407 Mflops=238.55
Lorenz 96            N=32768 K=16384 Workers=4 Sec=1.60808 Mflops=2003.16

My Computer has Raspberry Pi ratio=16.3394
Making pie charts...done.
$ taskset -c 0,1,2,3,4 ./pichart-openmp
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=5 Sec=1.05731 Mops=883.685
Merge Sort           N=16777216 Workers=10 Sec=0.967107 Mops=416.348
Fourier Transform    N=4194304 Workers=10 Sec=1.70527 Mflops=270.557
Lorenz 96            N=32768 K=16384 Workers=5 Sec=1.31937 Mflops=2441.48

My Computer has Raspberry Pi ratio=19.7156
Making pie charts...done.
$ taskset -c 0,1,2,3,4,5 ./pichart-openmp
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=6 Sec=0.882527 Mops=1058.7
Merge Sort           N=16777216 Workers=12 Sec=0.797551 Mops=504.862
Fourier Transform    N=4194304 Workers=12 Sec=1.68901 Mflops=273.162
Lorenz 96            N=32768 K=16384 Workers=6 Sec=1.08304 Mflops=2974.23

My Computer has Raspberry Pi ratio=22.7944
Making pie charts...done.
$ taskset -c 0,1,2,3,4,5,6 ./pichart-openmp
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=7 Sec=0.757242 Mops=1233.86
Merge Sort           N=16777216 Workers=14 Sec=0.719012 Mops=560.009
Fourier Transform    N=4194304 Workers=7 Sec=1.6323 Mflops=282.652
Lorenz 96            N=32768 K=16384 Workers=7 Sec=0.921426 Mflops=3495.91

My Computer has Raspberry Pi ratio=25.5247
Making pie charts...done.
$ taskset -c 0,1,2,3,4,5,6,7 ./pichart-openmp
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=8 Sec=0.663588 Mops=1407.99
Merge Sort           N=16777216 Workers=16 Sec=0.645496 Mops=623.789
Fourier Transform    N=4194304 Workers=8 Sec=1.5337 Mflops=300.824
Lorenz 96            N=32768 K=16384 Workers=8 Sec=0.807844 Mflops=3987.44

My Computer has Raspberry Pi ratio=28.4481
Making pie charts...done.
Presented graphically this looks like

Image

The scaling appears fairly uniform without anything surprising, which is expected because all cores are identical. This suggests the code itself is working fine and that there is something strange going on with the N2 hardware. I suspect you have misidentified which cores are the little ones; however, some sort of throttling could also be involved.
Last edited by ejolson on Wed May 15, 2019 5:58 am, edited 2 times in total.

jahboater
Posts: 4437
Joined: Wed Feb 04, 2015 6:38 pm

Re: A Pi Pie Chart

Tue May 14, 2019 8:04 pm

ejolson wrote:
Tue May 14, 2019 8:24 am
I suspect you have misidentified which cores are the little ones
This is quite likely. All I went on was this comment in boot.ini (the equivalent of config.txt).

Code: Select all

# max cpu-cores
# Note:
# CPU's 0 and 1 are the A53 (small cores)
# CPU's 2 to 5 are the A73 (big cores)
# Lowering this value disables only the bigger cores (the last cores).
# setenv maxcpus "4"
# setenv maxcpus "5"
setenv maxcpus "6"
They run at different speeds: the A73s are clocked at 1800 MHz and the A53s at 1896 MHz.
ejolson wrote:
Tue May 14, 2019 8:24 am
however, some sort of throttling could also be involved.
The N2 is claimed not to throttle.
It is a 12 nm part and has a really massive factory-fitted heat sink.

Here are the six runs with increasing CPU counts (count: Pi Ratio)

Code: Select all

1:  3.06463
2:  4.50112
3:  17.1603
4:  25.8384
5:  31.6286
6:  41.9504

Code: Select all

~/pichart-30$ taskset -c 0 ./pichart-openmp "N2"
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=2 Sec=3.83186 Mops=243.832
Merge Sort           N=16777216 Workers=2 Sec=5.2353 Mops=76.9112
Fourier Transform    N=4194304 Workers=2 Sec=13.3536 Mflops=34.5506
Lorenz 96            N=32768 K=16384 Workers=2 Sec=14.7099 Mflops=218.983

My Computer has Raspberry Pi ratio=3.06463
Making pie charts...done.

~/pichart-30$ taskset -c 0,1 ./pichart-openmp "N2"
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=2 Sec=1.92459 Mops=485.469
Merge Sort           N=16777216 Workers=2 Sec=4.19914 Mops=95.8895
Fourier Transform    N=4194304 Workers=2 Sec=9.09953 Mflops=50.703
Lorenz 96            N=32768 K=16384 Workers=1 Sec=11.5153 Mflops=279.735

My Computer has Raspberry Pi ratio=4.50112
Making pie charts...done.

~/pichart-30$ taskset -c 0,1,2 ./pichart-openmp "N2"
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=6 Sec=1.2032 Mops=776.534
Merge Sort           N=16777216 Workers=6 Sec=1.29098 Mops=311.897
Fourier Transform    N=4194304 Workers=6 Sec=1.51315 Mflops=304.91
Lorenz 96            N=32768 K=16384 Workers=6 Sec=1.70545 Mflops=1888.78

My Computer has Raspberry Pi ratio=17.1603
Making pie charts...done.
~/pichart-30$ taskset -c 0,1,2,3 ./pichart-openmp "N2"
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=8 Sec=0.796118 Mops=1173.61
Merge Sort           N=16777216 Workers=4 Sec=0.966202 Mops=416.738
Fourier Transform    N=4194304 Workers=8 Sec=1.11916 Mflops=412.248
Lorenz 96            N=32768 K=16384 Workers=8 Sec=0.905878 Mflops=3555.91

My Computer has Raspberry Pi ratio=25.8384
Making pie charts...done.
~/pichart-30$ taskset -c 0,1,2,3,4 ./pichart-openmp "N2"
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=10 Sec=0.594597 Mops=1571.36
Merge Sort           N=16777216 Workers=10 Sec=0.776335 Mops=518.659
Fourier Transform    N=4194304 Workers=10 Sec=0.997077 Mflops=462.726
Lorenz 96            N=32768 K=16384 Workers=10 Sec=0.754667 Mflops=4268.41

My Computer has Raspberry Pi ratio=31.6286
Making pie charts...done.
~/pichart-30$ taskset -c 0,1,2,3,4,5 ./pichart-openmp "N2"
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=6 Sec=0.478357 Mops=1953.2
Merge Sort           N=16777216 Workers=12 Sec=0.579982 Mops=694.251
Fourier Transform    N=4194304 Workers=12 Sec=0.852847 Mflops=540.98
Lorenz 96            N=32768 K=16384 Workers=12 Sec=0.474342 Mflops=6790.93

My Computer has Raspberry Pi ratio=41.9504
Making pie charts...done.
The number of workers doesn't seem to correlate with the number of CPU cores?

ejolson
Posts: 3052
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Wed May 15, 2019 12:00 am

jahboater wrote:
Tue May 14, 2019 8:04 pm
Here are the six runs with increasing CPU counts (count: Pi Ratio)

Code: Select all

1:  3.06463
2:  4.50112
3:  17.1603
4:  25.8384
5:  31.6286
6:  41.9504
That output seems to confirm that 0 and 1 are the little cores. It also seems to indicate some sort of throttling. For example, the performance when going from one to two cores should about double, whereas you have only a factor of 1.47. I'd also expect a single little core to have a Pi ratio closer to 5, not 3. That's a sign of something not being right.
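
The step-by-step scaling can be worked out from the quoted Pi ratios with a few lines of Python. This is just arithmetic on the numbers above; the "ideal" column assumes identical cores and perfect scaling, which does not hold once the big cores join in at three CPUs.

```python
# Pi ratios from the six taskset runs quoted above (CPU count -> Pi ratio).
pi_ratio = {1: 3.06463, 2: 4.50112, 3: 17.1603, 4: 25.8384, 5: 31.6286, 6: 41.9504}

def step_speedups(ratios):
    """Speedup of each run relative to the run with one fewer CPU."""
    counts = sorted(ratios)
    return {n: ratios[n] / ratios[prev] for prev, n in zip(counts, counts[1:])}

speedups = step_speedups(pi_ratio)

if __name__ == "__main__":
    for n, observed in speedups.items():
        ideal = n / (n - 1)  # what n identical cores would give over n-1
        print(f"{n - 1} -> {n} CPUs: observed x{observed:.2f}, ideal x{ideal:.2f}")
```

The first step comes out at about 1.47 where 2.0 would be ideal, and the jump at three CPUs is far above its "ideal" because the third CPU is a much faster core.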

It is amazing how much the heatsink on the N2 resembles a sandwich toaster. I wonder if the one mentioned here is similar. Could there be a gap between the SoC mounted on the bottom of the circuit board and the heatsink? I think the system includes instrumentation that allows you to monitor the temperature and current clock speed of the CPU cores, because graphs of such things appear on the N2 vendor's webpage. Maybe a shim or some thermal paste would improve things.

Alternatively, maybe the scheduler is acting weird because processor affinity is set to the little cores which then become compute bound. Have you checked if changing the performance setting of the Linux scheduler makes a difference?

ejolson
Posts: 3052
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Wed May 15, 2019 12:34 am

Here are results for the NVIDIA Tegra TX2 big.LITTLE architecture. This architecture consists of the following cores:
  • Four 2 GHz Cortex-A57 cores numbered 0,3,4,5
  • Two 2 GHz Denver 2 cores numbered 1,2
Single-threaded results for the two different cores are

Code: Select all

$ taskset -c 0 ./pichart-serial 
pichart -- Raspberry Pi Performance Serial version 30

Prime Sieve          P=14630843 Workers=1 Sec=1.78459 Mops=523.553
Merge Sort           N=16777216 Workers=2 Sec=3.51306 Mops=114.616
Fourier Transform    N=4194304 Workers=2 Sec=2.58702 Mflops=178.341
Lorenz 96            N=32768 K=16384 Workers=1 Sec=1.7412 Mflops=1850.01

My Computer has Raspberry Pi ratio=10.533
Making pie charts...done.
$ taskset -c 1 ./pichart-serial 
pichart -- Raspberry Pi Performance Serial version 30

Prime Sieve          P=14630843 Workers=1 Sec=1.46102 Mops=639.504
Merge Sort           N=16777216 Workers=2 Sec=2.6922 Mops=149.563
Fourier Transform    N=4194304 Workers=1 Sec=1.49668 Mflops=308.265
Lorenz 96            N=32768 K=16384 Workers=1 Sec=0.892562 Mflops=3608.97

My Computer has Raspberry Pi ratio=16.0375
Making pie charts...done.
which indicates the Denver cores have roughly 1.2 to 1.3 times the integer performance (prime sieve, merge sort) and 1.7 to 2 times the floating-point performance (Fourier transform, Lorenz 96) of the A57 cores. Note that I didn't tune the compiler optimization flags very carefully, so it is likely that faster timings are possible for the Tegra-TX2 hardware. The full set of tests may be summarized as

Code: Select all

A57  Den  Pi-Ratio    Prime    Merge  Fourier   Lorenz
 1    0    10.5330  523.553  114.616  178.341  1850.01
 0    1    16.0375  639.504  149.563  308.265  3608.97
 2    0    20.2125  1049.53  228.756  339.554  3293.35
 1    1    23.6960  1136.82  256.381  481.260  3615.54
 3    0    28.7771  1574.30  338.459  422.827  4896.22
 0    2    31.2410  1272.69  288.663  584.764  7132.36
 2    1    32.1336  1650.66  347.954  622.039  4800.34
 1    2    35.9372  1780.66  392.443  696.115  5515.29
 4    0    38.5189  2098.16  450.227  579.577  6467.59
 3    1    40.9209  2164.41  447.678  738.529  6302.87
 2    2    47.1051  2293.69  504.744  951.607  7188.47
 4    1    49.4959  2705.64  565.161  813.556  7760.25
 3    2    53.0869  2799.29  604.649  974.044  7749.05
 4    2    62.1914  3322.65  709.686  1114.95  9152.58
It is likely that some sort of statistical analysis of the above table of numbers could determine whether the system is performing as expected or not. Instead of doing that, I reran a few of the tests to check that the results are at least repeatable. For example, if there were throttling, one might expect the performance to decrease over repeated runs. The repeated Pi-Ratio results for selected combinations, given by

Code: Select all

A57  Den     Run-1    Run-2    Run-3    Run-4    Run-5
 0    2    31.2410  31.4802  31.7105  31.6916  31.6474
 4    0    38.5189  38.2766  38.3444  38.6743  38.5121
 4    2    62.1914  61.7359  62.2623  62.1156  61.9489
show the tests are fairly repeatable.
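
One way to put a number on "fairly repeatable" is the relative spread (sample standard deviation divided by the mean) of each row. The Python sketch below does that for the three rows above; the choice of statistic is an assumption made for illustration, not something the benchmark reports.

```python
from statistics import mean, stdev

# Repeated Pi-Ratio results from the table above, keyed by (A57 cores, Denver cores).
runs = {
    (0, 2): [31.2410, 31.4802, 31.7105, 31.6916, 31.6474],
    (4, 0): [38.5189, 38.2766, 38.3444, 38.6743, 38.5121],
    (4, 2): [62.1914, 61.7359, 62.2623, 62.1156, 61.9489],
}

def relative_spread(values):
    """Sample standard deviation as a fraction of the mean."""
    return stdev(values) / mean(values)

if __name__ == "__main__":
    for (a57, denver), values in runs.items():
        print(f"A57={a57} Den={denver}: mean={mean(values):.4f} "
              f"spread={100 * relative_spread(values):.2f}%")
```

All three rows vary by less than one percent, and none of them shows a steady decline from Run-1 to Run-5 of the kind throttling would produce.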
Last edited by ejolson on Wed May 15, 2019 1:46 am, edited 2 times in total.

Gavinmc42
Posts: 3158
Joined: Wed Aug 28, 2013 3:31 am

Re: A Pi Pie Chart

Wed May 15, 2019 12:55 am

What are the figures for an Intel Celeron Core Duo?
Was that the first multicore Intel?
I'm dancing on Rainbows.
Raspberries are not Apples or Oranges

Andyroo
Posts: 3316
Joined: Sat Jun 16, 2018 12:49 am
Location: Lincs U.K.

Re: A Pi Pie Chart

Wed May 15, 2019 1:12 am

Gavinmc42 wrote:
Wed May 15, 2019 12:55 am
What are the figures for an Intel Celeron Core Duo?
Was that the first multicore Intel?
I thought the Celeron came out four or five years after the Pentium D, which had two cores. I'm not sure about the RISC or microcontroller ranges Intel had created before this, though: some of them had multi-chip solutions (do they count?), and some were built for supercomputers and only worked in massively parallel sets :lol:
Need Pi spray - these things are breeding in my house...

ejolson
Posts: 3052
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Wed May 15, 2019 1:51 am

Gavinmc42 wrote:
Wed May 15, 2019 12:55 am
What are the figures for an Intel Celeron Core Duo?
Was that the first multicore Intel?
It seems I sent my core-duo system to surplus last month. Instead of replacing a third HD in that iMac, I wanted to use the desk space for something useful like pencil and paper.

The closest timing currently available here is for the previous-generation 2.8 GHz Pentium D. Those were multi-chip packages consisting of two dies next to each other, similar to EPYC and Threadripper today.

https://www.raspberrypi.org/forums/view ... 5#p1401734

bensimmo
Posts: 4065
Joined: Sun Dec 28, 2014 3:02 pm
Location: East Yorkshire

Re: A Pi Pie Chart

Wed May 15, 2019 8:25 am

Andyroo wrote:
Wed May 15, 2019 1:12 am
Gavinmc42 wrote:
Wed May 15, 2019 12:55 am
What are the figures for an Intel Celeron Core Duo?
Was that the first multicore Intel?
I thought the Celeron came out four or five years after the Pentium D, which had two cores. I'm not sure about the RISC or microcontroller ranges Intel had created before this, though: some of them had multi-chip solutions (do they count?), and some were built for supercomputers and only worked in massively parallel sets :lol:
The Pentium D 8## was the first bolted-together dual-core Pentium, in 2005 (AMD, with its Athlon 64 X2, had a 'proper' one IIRC).
The first 'proper' one came two years later, named the Pentium Dual-Core (the start of the 'Core' range before it was renamed).


(EDIT looked it up, 2008 Celeron gained Dual-Core, in the E/T1#00 range )

Gavinmc42
Posts: 3158
Joined: Wed Aug 28, 2013 3:31 am

Re: A Pi Pie Chart

Thu May 16, 2019 3:01 am

Wow, the Core Duos are only just over 10 years old; I thought they were ancient ;)
My Core Duo only seems about twice as fast as my Pi 3B+, but I have not benchmarked it enough.

Wonder what Pis will be like in 10 years?

Imperf3kt
Posts: 2401
Joined: Tue Jun 20, 2017 12:16 am
Location: Australia

Re: A Pi Pie Chart

Thu May 16, 2019 4:53 am

Gavinmc42 wrote:
Thu May 16, 2019 3:01 am
Wow, the Core Duos are only just over 10 years old; I thought they were ancient ;)
My Core Duo only seems about twice as fast as my Pi 3B+, but I have not benchmarked it enough.

Wonder what Pis will be like in 10 years?
I doubt the legitimacy of those dates.
Both Wikipedia and Intel seem to claim the Core 2 range began in July 2006
https://ark.intel.com/content/www/us/en ... z-fsb.html
https://en.wikipedia.org/wiki/Intel_Core_2


I myself had a Core 2 Duo in 2006, which I later replaced with a quad-core AM3 in early 2009
Google is ubiquitous - Try it today, it's free!
https://opensource.com/life/16/10/how-ask-technical-questions

bensimmo
Posts: 4065
Joined: Sun Dec 28, 2014 3:02 pm
Location: East Yorkshire

Re: A Pi Pie Chart

Thu May 16, 2019 6:27 am

"Celeron"

What I forgot was that the Pentium Dual-Core wasn't an infill for the Celeron for a bit, but a level above the Celeron. So you can ignore that.

The C2D did start when you say, but those are not Celerons.


https://ark.intel.com/content/www/us/en ... z-fsb.html

ejolson
Posts: 3052
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Thu May 16, 2019 5:53 pm

Gavinmc42 wrote:
Wed May 15, 2019 12:55 am
What are the figures for an Intel Celeron Core Duo?
Was that the first multicore Intel?
Looking back through the thread I found results for a Core Duo T5800 running in an HP G70 laptop. To make the comparison more interesting, I added two more laptops: an Acer C720 Chromebook with Celeron 2955U processor and an HP 15-db0011dx with an AMD A6-9225. This resulted in the following Pi pie chart:

Image

It is possible to compute the Pi ratio by hand using the formula

(Sieve*Sort*Fourier*Lorenz/1608533.6)^(0.25)
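
As a quick check of the formula, here is a minimal Python sketch applied to the Cel 2955U figures reported later in this post (702.847, 181.48, 723.899 and 7019.82). The function and argument names are illustrative; only the formula and the constant 1608533.6 come from the text.

```python
def pi_ratio(sieve_mops, sort_mops, fourier_mflops, lorenz_mflops):
    """Fourth root of the product of the four scores over the reference constant."""
    product = sieve_mops * sort_mops * fourier_mflops * lorenz_mflops
    return (product / 1608533.6) ** 0.25

# Scores taken from the "Cel 2955U" run listed further down in this post.
cel_2955u = pi_ratio(702.847, 181.48, 723.899, 7019.82)

if __name__ == "__main__":
    print(f"The Cel 2955U has Raspberry Pi ratio={cel_2955u:.4f}")
```

The result agrees with the 25.1951 printed by the program for that machine.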

Taking the previous timings for the Core Duo T5800, the Raspberry Pi 3B+ and 3B computers we obtain

Code: Select all

The Duo T5800 has Raspberry Pi ratio=20.6175
The Pi 3B+ has Raspberry Pi ratio=12.4016
The Pi 3B has Raspberry Pi ratio=10.8197
For reference, the runs for the other notebook computers are

Code: Select all

$ ./pichart-openmp -t "Cel 2955U"
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=2 Sec=1.32935 Mops=702.847
Merge Sort           N=16777216 Workers=4 Sec=2.21872 Mops=181.48
Fourier Transform    N=4194304 Workers=2 Sec=0.637345 Mflops=723.899
Lorenz 96            N=32768 K=16384 Workers=2 Sec=0.458876 Mflops=7019.82

The Cel 2955U has Raspberry Pi ratio=25.1951

$ ./pichart-openmp -t "A6-9225"
pichart -- Raspberry Pi Performance OPENMP version 30

Prime Sieve          P=14630843 Workers=2 Sec=0.653126 Mops=1430.55
Merge Sort           N=16777216 Workers=4 Sec=1.47106 Mops=273.717
Fourier Transform    N=4194304 Workers=4 Sec=1.05959 Mflops=435.427
Lorenz 96            N=32768 K=16384 Workers=2 Sec=0.274606 Mflops=11730.4

The A6-9225 has Raspberry Pi ratio=33.3926
Making pie charts...done.
There is a link to the current source code from the first post of this thread if you would like to make your own Pi pie charts.

ejolson
Posts: 3052
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Thu May 16, 2019 9:28 pm

If your Pi computers are not keeping the house warm enough this summer, here are some Pi-ratio results for currently available server hardware.

These results were obtained with a single core using either 2 or 4 simultaneous multithreads and the OpenMP version of the benchmark. Both gcc and clang compilers were run and the best results in each category reported. The Pi ratios are computed by hand as described in the previous post.

Code: Select all

SIMULTANEOUS MULTITHREADED SINGLE-CORE PERFORMANCE
             Pi-Ratio    Prime    Merge  Fourier   Lorenz
IBM Power9    43.3983  1099.17  479.123  1489.01   7276.3
Gold 6126     59.1558  1122.87  560.387  1389.17  22534.4
EPYC 7371     53.8473  1413.59  570.489  1081.44  15506.5

FURTHER HARDWARE INFORMATION (only one core was used)
             Base-Clock  Domains  Cores/Domain  Threads/Core
IBM Power9     2.7GHz       2          20            4
Gold 6126      2.6GHz       2          12            2
EPYC 7371      3.1GHz       8           4            2
I find it amusing that Power was best at the Fourier transform, Gold was best at the Lorenz 96 simulation and EPYC was best at the prime sieve and merge sort. Note that the above results reflect only single-core performance. Throughput using multiple cores and efficiency when heating the home are entirely different matters.
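
The "best at" observations can be read straight off the table with a short Python sketch like the one below. The dictionary simply restates the single-core table above; the helper name is made up for illustration.

```python
# Single-core scores from the table above (Mops for Prime/Merge, Mflops for Fourier/Lorenz).
results = {
    "IBM Power9": {"Prime": 1099.17, "Merge": 479.123, "Fourier": 1489.01, "Lorenz": 7276.3},
    "Gold 6126":  {"Prime": 1122.87, "Merge": 560.387, "Fourier": 1389.17, "Lorenz": 22534.4},
    "EPYC 7371":  {"Prime": 1413.59, "Merge": 570.489, "Fourier": 1081.44, "Lorenz": 15506.5},
}

def best_per_test(table):
    """Map each test name to the machine with the highest score on it."""
    tests = next(iter(table.values())).keys()
    return {test: max(table, key=lambda machine: table[machine][test]) for test in tests}

best = best_per_test(results)

if __name__ == "__main__":
    for test, machine in best.items():
        print(f"{test}: {machine}")
```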

ejolson
Posts: 3052
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Fri May 17, 2019 5:47 am

jahboater wrote:
Tue May 14, 2019 8:04 pm
The number of workers doesn't seem to correlate with the number of CPU cores?
The pichart benchmark starts by over-provisioning the available cores with twice the number of worker threads. This is done because over-provisioning allows the operating system to assist with load balancing, and that sometimes leads to greater performance.

After obtaining timing metrics using twice the workers, the number of workers is halved and the benchmark is run again. The halving process is repeated until only one worker thread is left. In the end, the best time among all tests is reported along with the number of workers that achieved that best time.

In the case of the serial code, it is still possible to break the work into bundles and then execute them consecutively rather than in parallel. Even though no parallel speedup is possible, breaking the computation into smaller parts can result in better use of cache memory.

If the default seems insufficient, the number of workers to start with may be specified using the -n command-line option. Type -h for more information.
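
The search described above can be sketched in Python as follows. This is not the pichart source: the function names are hypothetical, the timing function is supplied by the caller, and rounding down when halving an odd worker count is an assumption (the thread only says the count is halved until one worker is left).

```python
def worker_schedule(cores):
    """Worker counts to try: start at twice the cores, halve until one is left.

    Integer (floor) halving is an assumption for odd counts, e.g. 3 -> 1.
    """
    workers = 2 * cores
    schedule = [workers]
    while workers > 1:
        workers //= 2
        schedule.append(workers)
    return schedule

def best_run(cores, time_with):
    """Time every worker count in the schedule and return (workers, seconds) of the fastest."""
    timings = {workers: time_with(workers) for workers in worker_schedule(cores)}
    best = min(timings, key=timings.get)
    return best, timings[best]

if __name__ == "__main__":
    # Toy timing model standing in for a real benchmark run on 4 cores.
    fake_seconds = {8: 1.0, 4: 0.8, 2: 1.5, 1: 3.0}
    print(worker_schedule(6))
    print(best_run(4, fake_seconds.__getitem__))
```

Under this scheme a six-core run tries 12, 6, 3 and 1 workers, which is consistent with the Workers=12 and Workers=6 values in the N2 listings: the reported count is whichever candidate happened to be fastest for that test.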

bensimmo
Posts: 4065
Joined: Sun Dec 28, 2014 3:02 pm
Location: East Yorkshire

Re: A Pi Pie Chart

Fri May 17, 2019 8:10 am

The T5800 (and I can do others) is mine; it *may have throttled*.
I'll get some others up at some point.
I'll see if the 1GHz Pentium III-M works.
