HermannSW
Posts: 1505
Joined: Fri Jul 22, 2016 9:09 pm
Location: Eberbach, Germany
Contact: Website Twitter YouTube

[SOLVED] Pi 3B+ single core performance better than Pi 3B?

Thu Mar 22, 2018 6:02 pm

Long ago I started a (single core) performance comparison of high-frequency Arduinos.
Then ESP8266 and ESP32 numbers were added, and a 2.8GHz Intel number.
Later I added Pi numbers as well, updated 2 days ago with the Pi 3B number:
https://forum.arduino.cc/index.php?topi ... msg3413818
Image

So basically the Due is the most performant Arduino, "only" 83 times slower than Intel.
ESPs are better than all Arduinos, and Pis are even better.
And until today it all made sense: the Pi 3B is 1200/900=4/3 times better than the Pi 2B (34μs versus 45μs).

Today I compiled q32.c with -O3 on my new Pi 3B+ and only get the same number as for the Pi 3B.
I would have expected it to be a factor of 1400/1200=7/6 better than the Pi 3B.
The Pi 3B+ is real, see the 191Mbit/s over LAN below (A), although my laptop gets 386Mbit/s with the same speedtest-cli.

The code is just an exhaustive search for the minimal magic 3x3 square consisting of distinct primes.
Here you can download it, or see (B) below.
https://stamm-wilbrandt.de/en/forum/q32.c

I would have expected at least search time <30μs after forcing CPU frequency to 1.4GHz and running as root ...

What am I missing here, why does 3B+ show same single core integer performance as 3B?

Code: Select all

[email protected]:~ $ sudo su
[email protected]:/home/pi# echo 1400000 >  /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq 
[email protected]:/home/pi# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq 
1400000
[email protected]:/home/pi# ./q32
 47| 29|101|
113| 59|  5|
 17| 89| 71|

34us
[email protected]:/home/pi# 

(A)

Code: Select all

[email protected]:~ $ speedtest-cli 
Retrieving speedtest.net configuration...
Testing from Kabel BW (46.223.20.147)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by PfalzKom (Ludwigshafen) [11.73 km]: 24.753 ms
Testing download speed................................................................................
Download: 191.04 Mbit/s
Testing upload speed....................................................................................................
Upload: 19.59 Mbit/s
[email protected]:~ $ 

(B)

Code: Select all

/* determine minimal prime 3x3 magic square; for more details see bottom */

#include <stdio.h>
#include <sys/time.h>
#include <stdint.h>

uint32_t B[]={0x35145105,0x4510414,0x11411040,0x45144001};

#define Prime(i) ((B[(i)>>5] & (0x80000000UL >> ((i)%32))) != 0)

#define forall_odd_primes_less_than(p, m, block) \
  for((p)=3; (p)<(m); (p)+=2)                    \
    if (Prime((p)))                              \
      block

uint8_t p,a,b,c,d;
struct timeval tv0,tv1;

int main(void)
{
  gettimeofday(&tv1, NULL);      // wait for usec change
  do  gettimeofday(&tv0, NULL);  while (tv0.tv_usec == tv1.tv_usec);

  forall_odd_primes_less_than(p, 64,
    forall_odd_primes_less_than(a, p,
      if Prime(2*p-a)
      {
        forall_odd_primes_less_than(b, p,
          if ( (b!=a) && Prime(2*p-b) )
          {
            c= 3*p - (a+b);

            if ( (c<2*p) && (2*p-c!=a) && (2*p-c!=b) && Prime(c) && Prime(2*p-c) )
            {
              if (2*a+b>2*p)
              {
                d = 2*a + b - 2*p;   // 3*p - (3*p-(a+b)) - (2*p-a)

                if ( (d!=a) && (d!=b) && (d!=2*p-c) && Prime(d) && Prime(2*p-d) )
                {
                  gettimeofday(&tv1, NULL);

                  printf("%3u|%3u|%3u|\n%3u|%3u|%3u|\n%3u|%3u|%3u|\n",
                    a,b,c,2*p-d,p,d,2*p-c,2*p-b,2*p-a);

                  printf("\n%ldus\n",
                    1000000*(tv1.tv_sec-tv0.tv_sec)+tv1.tv_usec-tv0.tv_usec);
                  return 0;
                }
              }
            }
          }
        )
      }
    )
  )
}

/*

it always exists this by rotation and flippings (= is p, -/+ is less/greater p)
--?
?=?
???

proof by enumeration of all possibilities

++
 =
 --

+-  +-+ +-+
 =   =  +=-
 +- -+- -+-

        +-+
        -=
        -+-

    +--
     =
     +-

-+  -+  -++
 =  +=- +=-
 -+  -+ --+

        -+-
        +=-
         -+

    -+
    -=
     -+

--
 =
 ++



row/column/diagonal sum is 3*p

a b 3*p-(a+b)=c   - - +

  p 2*a+b-2*p=d   + = -

    2*p-a         - + +
*/
Last edited by HermannSW on Fri Mar 23, 2018 1:39 am, edited 1 time in total.
⇨https://stamm-wilbrandt.de/en/Raspberry_camera.html

https://github.com/Hermann-SW/Raspberry_v1_camera_global_external_shutter
https://gitlab.freedesktop.org/HermannSW/gst-template
https://github.com/Hermann-SW/fork-raspiraw
https://twitter.com/HermannSW

el_grappaduro
Posts: 14
Joined: Thu Mar 22, 2018 7:06 pm

Re: Pi 3B+ single core performance better than Pi 3B?

Thu Mar 22, 2018 7:10 pm

This is Pi 2 running at 600 MHz:

Code: Select all

[email protected]:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq 
600000
[email protected]:~# vcgencmd measure_clock arm
frequency(45)=600000000
[email protected]:~# ./g32 
 47| 29|101|
113| 59|  5|
 17| 89| 71|

45us
This is the same Pi 2 running at 900 MHz:

Code: Select all

[email protected]:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq 
900000
[email protected]:~# vcgencmd measure_clock arm
frequency(45)=900000000
[email protected]:~# ./g32 
 47| 29|101|
113| 59|  5|
 17| 89| 71|

29us
It seems you're running all the time at 600 MHz. Have you ever checked real clockspeeds using 'vcgencmd measure_clock arm'?

ejolson
Posts: 3584
Joined: Tue Mar 18, 2014 11:47 am

Re: Pi 3B+ single core performance better than Pi 3B?

Thu Mar 22, 2018 8:43 pm

HermannSW wrote:
Thu Mar 22, 2018 6:02 pm
Today I compiled q32.c with -O3 on my new Pi 3B+ and only get the same number as for Pi 3B.
Maybe both Pi computers are running in low power 600MHz mode because of a substandard power supply.

HermannSW
Posts: 1505
Joined: Fri Jul 22, 2016 9:09 pm
Location: Eberbach, Germany
Contact: Website Twitter YouTube

Re: Pi 3B+ single core performance better than Pi 3B?

Fri Mar 23, 2018 12:17 am

el_grappaduro wrote:
Thu Mar 22, 2018 7:10 pm
This is Pi 2 running at 600 MHz:

Code: Select all

[email protected]:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq 
600000
[email protected]:~# vcgencmd measure_clock arm
frequency(45)=600000000
[email protected]:~# ./g32 
 47| 29|101|
113| 59|  5|
 17| 89| 71|

45us
This is the same Pi 2 running at 900 MHz:

Code: Select all

[email protected]:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq 
900000
[email protected]:~# vcgencmd measure_clock arm
frequency(45)=900000000
[email protected]:~# ./g32 
 47| 29|101|
113| 59|  5|
 17| 89| 71|

29us
It seems you're running all the time at 600 MHz. Have you ever checked real clockspeeds using 'vcgencmd measure_clock arm'?
I don't know why you get these numbers, but that seems to be really the problem.

I just measured frequency as you requested, and it indeed is 1.4GHz.
But the number is worse than your Pi2 number?!?!?

Code: Select all

[email protected]:~ $ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
1400000
[email protected]:~ $ vcgencmd measure_clock arm
frequency(45)=1400146000
[email protected]:~ $ ./q32
 47| 29|101|
113| 59|  5|
 17| 89| 71|

34us
[email protected]:~ $ whoami
pi
[email protected]:~ $ 
ejolson wrote:
Thu Mar 22, 2018 8:43 pm
Maybe both Pi computers are running in low power 600MHz mode because of a substandard power supply.
In that case the measurement would not show 1.4GHz, right?

HermannSW
Posts: 1505
Joined: Fri Jul 22, 2016 9:09 pm
Location: Eberbach, Germany
Contact: Website Twitter YouTube

Re: Pi 3B+ single core performance better than Pi 3B?

Fri Mar 23, 2018 12:22 am

Aaah -- it seems to be just a Linux timing issue when measuring such a small amount of time -- once, the Pi 3B+ showed 15μs!
But running it many more times only shows 34μs or 35μs.

I will try 1,000,000 loops as in this blog posting to get away from these low time values:
https://www.ibm.com/developerworks/comm ... erformance

Code: Select all

[email protected]:~ $ ./q32
 47| 29|101|
113| 59|  5|
 17| 89| 71|

15us
[email protected]:~ $ ./q32
 47| 29|101|
113| 59|  5|
 17| 89| 71|

35us
[email protected]:~ $ 

HermannSW
Posts: 1505
Joined: Fri Jul 22, 2016 9:09 pm
Location: Eberbach, Germany
Contact: Website Twitter YouTube

Re: Pi 3B+ single core performance better than Pi 3B?

Fri Mar 23, 2018 12:45 am

1M loops take 40s, which is even worse than before.

This is the changed code:
https://stamm-wilbrandt.de/en/forum/q32.1M.c

Code: Select all

[email protected]:~ $ diff q32.c q32.1M.c
17a18
> unsigned i, N=1000000;
23a25
> for(i=1; i<=N; ++i)
40a43,44
> if (i==N)
> {
48a53
> }
[email protected]:~ $ 

But running the frequency measurement in a 2nd ssh session in parallel shows that it really runs only at 600MHz although the min scaling frequency is 1400000?!?!?!

Code: Select all

[email protected]:~ $ vcgencmd measure_clock arm
frequency(45)=600000000
[email protected]:~ $ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
1400000
[email protected]:~ $ 

So what is the correct method to force CPU to 1.4GHz?


P.S:
The numbers make sense: 45μs for 600MHz on the Pi 2 and 29μs for 900MHz, as measured by @el_grappaduro, correlate with the 15μs I measured once for the Pi 3B+:

Code: Select all

bc -ql
45/(900/600)
30.00000000000000000000
45/(1400/600)
19.28571428571428571431


P.P.S:
I found another USB hub that explicitly states it can do DC 5V and 2.5A, and another power adapter that can do 2.5A.
With that I am able to boot the Pi 3B+ successfully.
But again the current scaling frequency drops to 600MHz although the minimal scaling frequency is set to 1.4GHz.
It seems I have a power supply problem, less severe than before as in this thread, but still limiting CPU power:
viewtopic.php?f=28&t=208773

HermannSW
Posts: 1505
Joined: Fri Jul 22, 2016 9:09 pm
Location: Eberbach, Germany
Contact: Website Twitter YouTube

[SOLVED] Re: Pi 3B+ single core performance better than Pi 3B?

Fri Mar 23, 2018 1:28 am

OK, I disconnected the 5V step-up converter from one of my robots and used a 600mAh 25C lipo to power the Pi 3B+ (see image below).
Now all is fine; the 25C rating guarantees that the Pi 3B+ gets whatever it needs (25C can deliver 5A at 5V).
And now EVERY single run shows 14μs!

Code: Select all

[email protected]:/home/pi# vcgencmd measure_clock arm
frequency(45)=1400146000
[email protected]:/home/pi# ./q32
 47| 29|101|
113| 59|  5|
 17| 89| 71|

14us
[email protected]:/home/pi# ./q32
 47| 29|101|
113| 59|  5|
 17| 89| 71|

14us
[email protected]:/home/pi# ./q32
 47| 29|101|
113| 59|  5|
 17| 89| 71|

14us
[email protected]:/home/pi# ./q32
 47| 29|101|
113| 59|  5|
 17| 89| 71|

14us
[email protected]:/home/pi# ./q32
 47| 29|101|
113| 59|  5|
 17| 89| 71|

14us
[email protected]:/home/pi# ./q32
 47| 29|101|
113| 59|  5|
 17| 89| 71|

14us
[email protected]:/home/pi# ./q32
 47| 29|101|
113| 59|  5|
 17| 89| 71|

14us
[email protected]:/home/pi# 

It is interesting that 1M loops take slightly longer per loop (17μs):

Code: Select all

[email protected]:/home/pi# ./q32.1M
 47| 29|101|
113| 59|  5|
 17| 89| 71|

17674786us
[email protected]:/home/pi#

The 2.8GHz Intel CPU shows a better time per loop with 1M loops (2.5μs) than with a single run (5μs):

Code: Select all

$ ./q32
 47| 29|101|
113| 59|  5|
 17| 89| 71|

5us
$ ./q32.1M
 47| 29|101|
113| 59|  5|
 17| 89| 71|

2514477us
$ 

Image


P.S:
I just removed the 5V step-up converter and directly connected the 25C lipo, charged to 4.06V, to the Pi 3B+.
The Pi works, but again the current frequency drops to 600MHz.
So powering the Pi 3B+ with 5V and 2.5A is essential to get high CPU frequencies working.

el_grappaduro
Posts: 14
Joined: Thu Mar 22, 2018 7:06 pm

Re: Pi 3B+ single core performance better than Pi 3B?

Fri Mar 23, 2018 8:18 am

HermannSW wrote:
Fri Mar 23, 2018 12:17 am
ejolson wrote:
Thu Mar 22, 2018 8:43 pm
Maybe both Pi computers are running in low power 600MHz mode because of a substandard power supply.
In that case measurement would not show 1.4GHZ, right?
Linux always reports wrong clockspeeds; only 'vcgencmd measure_clock arm' shows the actual value. When you checked, your Pi was idle (no performance needed), so it showed 1400 MHz. When you run something more demanding that needs performance, it drops to 600 MHz. You get the high clockspeed only when it is not needed :lol:

I have not been aware of this until recently: viewtopic.php?f=63&t=208057&p=1287591#p1287370

This other vcgencmd command is interesting since it displays whether the problem has occurred since the last boot. Then you know you have to invest in a better power supply (3 are on the way, since all 3 of my Pis are affected)

jahboater
Posts: 4690
Joined: Wed Feb 04, 2015 6:38 pm

Re: [SOLVED] Pi 3B+ single core performance better than Pi 3B?

Fri Mar 23, 2018 8:31 am

Instead of the legacy gettimeofday() you might like to try the posix clock_gettime( CLOCK_MONOTONIC, ...
gettimeofday reports clock on the wall time (as CLOCK_REALTIME), while CLOCK_MONOTONIC reports a hi-res monotonically increasing counter. CLOCK_MONOTONIC_RAW is the same but without any interference from NTP.
clock_gettime reports the time in nanoseconds.
struct timespec {
time_t tv_sec; /* seconds */
long tv_nsec; /* nanoseconds */
};
so to return the time in nanoseconds as a 64-bit unsigned number:-
return (uint64_t)now.tv_sec * 1000000000U + (uint64_t)now.tv_nsec;
or your trick of waiting for the clock value to change will still work.
There are several other useful clocks available, including CPU time CLOCK_PROCESS_CPUTIME_ID

el_grappaduro
Posts: 14
Joined: Thu Mar 22, 2018 7:06 pm

Re: Pi 3B+ single core performance better than Pi 3B?

Fri Mar 23, 2018 8:56 am

HermannSW wrote:
Fri Mar 23, 2018 12:45 am
But running frequency measurement in 2nd ssh session in parallel shows that it really runs only in 600MHz although min scaling frequency is 1400000?!?!?!
Yes. It is explained here: viewtopic.php?f=29&t=82373

I use Rpi monitor https://rpi-experiences.blogspot.com/p/rpi-monitor.html on all my Pis but was not aware of the problem, since the software uses the Linux way to get the clockspeed, which seems to be wrong. I don't understand why.

HermannSW
Posts: 1505
Joined: Fri Jul 22, 2016 9:09 pm
Location: Eberbach, Germany
Contact: Website Twitter YouTube

Re: [SOLVED] Pi 3B+ single core performance better than Pi 3B?

Fri Mar 23, 2018 5:53 pm

Thanks for the information and the links.
Especially the thread with undervoltage was important to me.
Image

I saw that symbol on HDMI monitor often, not being aware of its meaning.

Determining the CPU frequency is not that important for the runtime of q32.c since all Pis have exactly two available frequencies. Given the short (microseconds) runtime it is unlikely that a CPU changes its frequency during a program run. So it either runs at 600MHz, or at the high frequency. For the Pi 3B+ that is either 34μs at 600MHz or 14μs at 1400MHz.

I just measured all PIs again while enforcing runs under high CPU frequency and updated the table in the other thread:
https://forum.arduino.cc/index.php?topi ... msg3413818
Image

Then I created this diagram for only the PIs and the Intel CPU:
Image

For the Pi 3B and Pi 3B+ the measured values totally make sense:

Code: Select all

$ bc -ql
17/(1400/1200)
14.57142857142857142865
34/(1400/600)
14.57142857142857142859
Last edited by HermannSW on Wed Jun 06, 2018 12:39 pm, edited 1 time in total.

HermannSW
Posts: 1505
Joined: Fri Jul 22, 2016 9:09 pm
Location: Eberbach, Germany
Contact: Website Twitter YouTube

Re: [SOLVED] Pi 3B+ single core performance better than Pi 3B?

Sat Mar 24, 2018 9:41 pm

HermannSW wrote:
Fri Mar 23, 2018 5:53 pm
Especially the thread with undervoltage was important to me.
Image

I saw that symbol on HDMI monitor often, not being aware of its meaning.
Today I received the official Pi power supply (delivery after 1 day 👍).
I am happy that I have not seen under voltage symbol on HDMI monitor with it, even taking videos.
HermannSW wrote:
Fri Mar 23, 2018 5:53 pm
Determining CPU frequency is not that important for runtime of q32.c since all PIs have exactly two available frequencies. Given the short (microseconds) runtime it is unlikely that a CPU changes its frequency during program runs.
I should have verified that :D

These are the timings for 1000 runs of q32; the average over the 1000 measurements is 16.111μs:

Code: Select all

[email protected]Bplus:~ $ for((i=1; i<=1000; i++)); do ./q32 | grep us ; done | sort -n | uniq -c
    241 14us
    495 15us
     78 16us
     40 17us
     20 18us
     18 19us
     27 20us
     43 21us
     26 22us
      2 23us
      2 26us
      1 27us
      1 28us
      1 37us
      1 38us
      1 39us
      1 64us
      1 113us
      1 223us
[email protected]:~ $ 
Since the normal values for 600MHz/1400MHz are 34μs/14μs, either my previous statement that a CPU frequency change during the runtime of q32 is unlikely is wrong, or taking μs times under Raspbian/Linux is not as accurate as I thought.
Last edited by HermannSW on Mon Apr 16, 2018 2:37 pm, edited 1 time in total.

HermannSW
Posts: 1505
Joined: Fri Jul 22, 2016 9:09 pm
Location: Eberbach, Germany
Contact: Website Twitter YouTube

Re: [SOLVED] Pi 3B+ single core performance better than Pi 3B?

Sat Mar 24, 2018 11:28 pm

I rebooted with the other new power supply I ordered from Amazon.
It can even do 5V/3A instead of the 5.1V/2.5A of the official Raspberry Pi power supply.

The previous run was done while forcing the CPU frequency to 1400MHz.
That seems not to be needed: the runtime distribution is similar without forcing the CPU frequency to 1400MHz:

Code: Select all

[email protected]:~ $ cat /sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq
600000
[email protected]:~ $ for((i=1; i<=1000; i++)); do ./q32 | grep us ; done | sort -n | uniq -c
    128 14us
    488 15us
    171 16us
     64 17us
     22 18us
     30 19us
     20 20us
     24 21us
     33 22us
      2 23us
      3 24us
      1 25us
      1 26us
      2 34us
      4 35us
      2 36us
      1 42us
      1 43us
      1 45us
      2 46us
[email protected]:~ $ 
