ejolson wrote: ↑Tue May 14, 2019 6:33 am

I'll also recheck my code to make sure the parallel part scales reasonably between 4 and 6 cores.

Here is the output for a set of runs on an 8-core ARM Cortex A53 system running in 64-bit mode:

```
$ taskset -c 0 ./pichart-openmp
pichart -- Raspberry Pi Performance OPENMP version 30
Prime Sieve P=14630843 Workers=1 Sec=5.22214 Mops=178.917
Merge Sort N=16777216 Workers=1 Sec=4.54134 Mops=88.664
Fourier Transform N=4194304 Workers=1 Sec=5.44227 Mflops=84.776
Lorenz 96 N=32768 K=16384 Workers=1 Sec=6.06767 Mflops=530.883
My Computer has Raspberry Pi ratio=4.58997
Making pie charts...done.
$ taskset -c 0,1 ./pichart-openmp
pichart -- Raspberry Pi Performance OPENMP version 30
Prime Sieve P=14630843 Workers=2 Sec=2.61991 Mops=356.626
Merge Sort N=16777216 Workers=4 Sec=2.29244 Mops=175.644
Fourier Transform N=4194304 Workers=2 Sec=3.1076 Mflops=148.466
Lorenz 96 N=32768 K=16384 Workers=2 Sec=3.13086 Mflops=1028.86
My Computer has Raspberry Pi ratio=8.78213
Making pie charts...done.
$ taskset -c 0,1,2 ./pichart-openmp
pichart -- Raspberry Pi Performance OPENMP version 30
Prime Sieve P=14630843 Workers=3 Sec=1.75374 Mops=532.762
Merge Sort N=16777216 Workers=6 Sec=1.55268 Mops=259.328
Fourier Transform N=4194304 Workers=6 Sec=2.3963 Mflops=192.536
Lorenz 96 N=32768 K=16384 Workers=3 Sec=2.13824 Mflops=1506.49
My Computer has Raspberry Pi ratio=12.5634
Making pie charts...done.
$ taskset -c 0,1,2,3 ./pichart-openmp
pichart -- Raspberry Pi Performance OPENMP version 30
Prime Sieve P=14630843 Workers=4 Sec=1.31822 Mops=708.779
Merge Sort N=16777216 Workers=8 Sec=1.18948 Mops=338.511
Fourier Transform N=4194304 Workers=4 Sec=1.93407 Mflops=238.55
Lorenz 96 N=32768 K=16384 Workers=4 Sec=1.60808 Mflops=2003.16
My Computer has Raspberry Pi ratio=16.3394
Making pie charts...done.
$ taskset -c 0,1,2,3,4 ./pichart-openmp
pichart -- Raspberry Pi Performance OPENMP version 30
Prime Sieve P=14630843 Workers=5 Sec=1.05731 Mops=883.685
Merge Sort N=16777216 Workers=10 Sec=0.967107 Mops=416.348
Fourier Transform N=4194304 Workers=10 Sec=1.70527 Mflops=270.557
Lorenz 96 N=32768 K=16384 Workers=5 Sec=1.31937 Mflops=2441.48
My Computer has Raspberry Pi ratio=19.7156
Making pie charts...done.
$ taskset -c 0,1,2,3,4,5 ./pichart-openmp
pichart -- Raspberry Pi Performance OPENMP version 30
Prime Sieve P=14630843 Workers=6 Sec=0.882527 Mops=1058.7
Merge Sort N=16777216 Workers=12 Sec=0.797551 Mops=504.862
Fourier Transform N=4194304 Workers=12 Sec=1.68901 Mflops=273.162
Lorenz 96 N=32768 K=16384 Workers=6 Sec=1.08304 Mflops=2974.23
My Computer has Raspberry Pi ratio=22.7944
Making pie charts...done.
$ taskset -c 0,1,2,3,4,5,6 ./pichart-openmp
pichart -- Raspberry Pi Performance OPENMP version 30
Prime Sieve P=14630843 Workers=7 Sec=0.757242 Mops=1233.86
Merge Sort N=16777216 Workers=14 Sec=0.719012 Mops=560.009
Fourier Transform N=4194304 Workers=7 Sec=1.6323 Mflops=282.652
Lorenz 96 N=32768 K=16384 Workers=7 Sec=0.921426 Mflops=3495.91
My Computer has Raspberry Pi ratio=25.5247
Making pie charts...done.
$ taskset -c 0,1,2,3,4,5,6,7 ./pichart-openmp
pichart -- Raspberry Pi Performance OPENMP version 30
Prime Sieve P=14630843 Workers=8 Sec=0.663588 Mops=1407.99
Merge Sort N=16777216 Workers=16 Sec=0.645496 Mops=623.789
Fourier Transform N=4194304 Workers=8 Sec=1.5337 Mflops=300.824
Lorenz 96 N=32768 K=16384 Workers=8 Sec=0.807844 Mflops=3987.44
My Computer has Raspberry Pi ratio=28.4481
Making pie charts...done.
```
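
To put numbers on the scaling, here is a minimal Python sketch that turns the Raspberry Pi ratios from the runs above into speedup and parallel efficiency relative to the single-core run. The ratios are copied from the output. The names `RATIOS` and `scaling` are my own and are not part of pichart.

```python
# Raspberry Pi ratios copied from the eight runs above, keyed by core count.
RATIOS = {1: 4.58997, 2: 8.78213, 3: 12.5634, 4: 16.3394,
          5: 19.7156, 6: 22.7944, 7: 25.5247, 8: 28.4481}


def scaling(ratios):
    """Return {cores: (speedup, efficiency)} relative to the 1-core run."""
    base = ratios[1]
    return {n: (r / base, r / base / n) for n, r in sorted(ratios.items())}


if __name__ == "__main__":
    for n, (speedup, eff) in scaling(RATIOS).items():
        print(f"{n} cores: speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

By this measure the eight-core run comes out at roughly 6.2 times the single-core ratio, or about 77% efficiency, with no sudden drop at any particular core count.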

Presented graphically, this looks like:

The scaling appears fairly uniform, with nothing surprising, which is what I would expect when all cores are identical. This suggests the code itself is working fine and that something strange is going on with the N2 hardware. I suspect you have misidentified which cores are the little ones, though some sort of throttling could also be involved.
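
One way to check which cores are which is to read the `CPU part` field that 64-bit ARM kernels report in `/proc/cpuinfo`. The following is a rough sketch and I have not run it on an N2. It assumes the usual `processor` and `CPU part` lines, and the lookup table only covers the two part numbers I am sure of (0xd03 for Cortex-A53 and 0xd09 for Cortex-A73). The names `PART_NAMES` and `cores_by_part` are made up for this example.

```python
from collections import defaultdict
from pathlib import Path

# ARM part numbers for two common cores; anything else prints as "unknown".
PART_NAMES = {0xd03: "Cortex-A53", 0xd09: "Cortex-A73"}


def cores_by_part(cpuinfo_text):
    """Group logical CPU numbers by the 'CPU part' field of /proc/cpuinfo."""
    groups = defaultdict(list)
    cpu = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor" and value.isdigit():
            cpu = int(value)  # start of a new per-CPU stanza
        elif key == "CPU part" and cpu is not None:
            groups[int(value, 16)].append(cpu)  # value looks like "0xd03"
    return dict(groups)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    text = path.read_text() if path.exists() else ""
    for part, cpus in cores_by_part(text).items():
        name = PART_NAMES.get(part, "unknown")
        cpu_list = ",".join(str(c) for c in cpus)
        print(f"{name} (part {part:#x}): taskset -c {cpu_list}")
```

Each line of output gives a core type together with the CPU list to hand to `taskset -c`, so the big and little clusters can be benchmarked separately. If the numbers still look odd after that, throttling would be the next thing to rule out.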