RoyLongbottom
Posts: 222
Joined: Fri Apr 12, 2013 9:27 am
Location: Essex, UK
Contact: Website

Re: Raspberry Pi Benchmarks

Sat Oct 01, 2016 4:06 pm

Drive and Network Speed

The next benchmarks measure the speed of main, USB and network drives, with details and results provided in:

http://www.roylongbottom.org.uk/Raspber ... hmarks.htm

The benchmark’s first test measures writing and reading speeds, in MB/second, of three large files. The default sizes are 8 and 16 MB, but these can be changed in the run time command. The second set of measurements covers random reading and writing times, in milliseconds, of 1 KB blocks out of 4, 8 and 16 MB. The final test writes and reads 200 small files of 4, 8 and 16 KB, measured in MB/second and milliseconds per file.
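
For anyone curious about what the large file test involves, here is a minimal Python sketch of the idea. It is not the DriveSpeed source (that is in the zip file); the function name, file name and block size are made up for illustration. It also does not bypass the operating system cache, so the read figure will normally be far higher than the real drive speed.

```python
import os
import tempfile
import time


def time_large_file(directory, size_mb=8, block_kb=1024):
    """Write then read one file of size_mb megabytes; return (write, read) in MB/s."""
    block = os.urandom(block_kb * 1024)
    blocks = (size_mb * 1024) // block_kb
    path = os.path.join(directory, "speedtest.tmp")  # hypothetical file name

    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # push data to the device before stopping the clock
    write_secs = time.perf_counter() - start

    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_kb * 1024)
            if not chunk:
                break
            total += len(chunk)
    read_secs = time.perf_counter() - start

    os.remove(path)
    if total != size_mb * 1024 * 1024:
        raise RuntimeError("read back a different amount of data than was written")
    return size_mb / write_secs, size_mb / read_secs


with tempfile.TemporaryDirectory() as tmp:
    write_speed, read_speed = time_large_file(tmp, size_mb=8)
    print(f"write {write_speed:.1f} MB/s, read {read_speed:.1f} MB/s")
```

Pointing the directory argument at a mounted USB or network path would exercise that device instead of the main drive.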

In case anyone wants to run them, the benchmarks are DriveSpeed and LanSpeed found in:

http://www.roylongbottom.org.uk/Raspber ... hmarks.zip

With the benchmarks in a test folder, terminal commands are:

Code: Select all

Main Drive
test $ ./DriveSpeed

USB Drive
Path via Menu - Go, Devices, Click USB Icon, Address Bar
test $ ./DriveSpeed FilePath /media/pi/BLUE

Network (in my case, using d:/Test)
Windows Command Prompt ipconfig command = 192.168.1.68
test $ sudo mount -t cifs -o dir_mode=0777,file_mode=0777 //192.168.1.68/d /media/pi
Enter password
test $ ./LanSpeed FilePath /media/pi/Test

Below is a summary of results.
Large Files - The SD card is Class 10, which is supposed to write at a minimum of 10 MB/s, but speed varies and can be less. The USB drive is a modern classless 8 GB drive - see the htm report for drives with much faster writing speeds. LAN speed is as expected for a 100 Mbps connection. Note how much slower Wi-Fi is.

Random Access - SDs are faster than hard drives. This SD is faster than the USB drive. LAN reading is mainly from the remote disk drive’s buffer. Wi-Fi is, again, much slower.

Small Files - Average running time might be expected to increase with file size, but there is a lot of variability. Again, Wi-Fi can be particularly slow.
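
To put "as expected for a 100 Mbps connection" into numbers, the following small Python calculation converts the link rate to MB/second and compares it with the LAN figures in the large file results of this post (11.4 MB/s writing, 11.7 MB/s reading). The function name is just for illustration, and protocol overheads are not modelled.

```python
def link_mb_per_sec(mbps):
    """Convert a link rate in megabits/second to megabytes/second (8 bits per byte)."""
    return mbps / 8.0


line_rate = link_mb_per_sec(100)  # theoretical ceiling for a 100 Mbps link

# LAN figures taken from the large file results in this post
measured = {"LAN write": 11.4, "LAN read": 11.7}

for name, mb_s in measured.items():
    share = 100.0 * mb_s / line_rate
    print(f"{name}: {mb_s} MB/s is {share:.1f}% of {line_rate} MB/s")
```

So the wired results are within about 10% of the theoretical maximum, which is about as good as can be expected once network overheads are allowed for.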

Code: Select all

                  MB/second 16 MB files
                                                  Boot
 Large    Write1  Write2  Write3  Read1   Read2   Read3

 SD Main    8.7     7.3    11.0    16.9    22.9    23.1 
 USB2      10.6     4.8     3.7    35.0    35.6    34.8

 LAN       11.4    11.4    11.4    11.7    11.7    11.7
 WiFi       2.7     3.2     2.6     1.6     1.5     0.8



                     Random milliseconds

              Read                    Write
 From MB      4       8      16       4       8      16

 SD Main  0.460   0.450   0.400    1.68    2.60    1.77
 USB2     0.717   0.771   0.797    1.94    2.38    2.41 

 LAN      0.459   0.864   0.743    3.47    2.77    3.16
 WiFi     7.178  10.447   7.784   11.18    9.79    8.99

                    Milliseconds per file

              Write                   Read               Delete
 File KB      4       8      16       4       8      16  Seconds

 SD Main   4.39    1.75    3.83    0.54    0.70    1.09   0.019
 USB2      7.24    9.12   12.72    0.64    0.74    0.63   0.012

 LAN       4.39    4.66    5.39    1.79    2.31    3.29    0.33
 WiFi     30.22   34.09   53.57   36.84   22.94   40.33    3.13

Broadband

Measuring data transfer speeds of my new broadband, from the Raspberry Pi 3, indicates download and upload speeds of 58 and 10.7 Mbps via LAN, with 11.3 and 7.7 using Wi-Fi.

I have an on-line test facility, for measuring loading time of images, via buttons in the following:

http://www.roylongbottom.org.uk/online%20benchmarks.htm

For broadband, the larger images should be selected, comprising 1 MB BMP, GIF and JPG files, plus 400 tiny GIF files. Pixel dimensions of the large files are 667 x 500, 1766 x 1325 and 2048 x 1536. The images can be re-read via the browser’s refresh option, to avoid initial loading overheads. Results are provided for the range of loading times after the first (should be revised for high speed broadband).

The following loading times are for the RPi 3 LAN and Wi-Fi, a desktop PC and a new mobile phone, then some using earlier broadband. Performance of the first batch is quite similar, except for the large files via RPi Wi-Fi. For these large files, the new broadband can be more than ten times faster, but is much slower downloading numerous tiny files (Is this a package size issue?).

Code: Select all

              Broadband Seconds Loading Time
                                              400
             BMP        GIF         JPG       Small

 LAN      0.1 - 0.2  0.1 - 0.1  0.2 - 0.2  13.8 - 18.0  
 Wi-Fi    0.9 - 0.9  0.5 - 0.7  0.5 - 0.6  13.4 - 16.8
 PC       0.1 - 0.1  0.1 - 0.1  0.1 - 0.1   9.5 - 19.9   
 Mobile   0.1 - 0.2  0.1 - 0.2  0.1 - 0.1  11.7 - 16.9
 Slow Broadband
 PC          1.5        1.8        1.7      5.1 -  7.5
 Lap Wi-Fi   1.4        1.5        1.5      9.2 - 13.5

RoyLongbottom

Re: Raspberry Pi Benchmarks

Fri Oct 07, 2016 4:21 pm

Raspberry Pi 3 Unstressed

Based on a recommendation from Dom, regarding reduced heating, I obtained a FLIRC case and installed my RPi 3 board. This is screwed to the aluminium cover, where a protrusion is clamped to the CPU and the whole case acts as a heatsink. I repeated the stress tests described above:

viewtopic.php?p=1043098#p1043098

Throttling started at around the reported 80°C, with a maximum of about a 34% reduction in CPU MHz and recorded MFLOPS, for both heatsinks, and still 21% with the cover removed.
Results are below showing considerably reduced temperature and constant performance, over the test period.

Code: Select all

             Revised Benchmark Max MFLOPS > 2900 Per Core

           Copper Heatsink       Copper No Cover       FLIRC Case

                       4 Core                4 Core                4 Core
  Minute     °C    MHz MFLOPS      °C    MHz MFLOPS      °C    MHz MFLOPS

       0   41.9   1200           46.2   1200           41.9   1200
       1   65.0   1200  11706    67.1   1200  11720    56.9   1200  11728
       2   73.6   1200  11709    74.1   1200  11709    59.6   1200  11671
       3   79.0   1200  11726    79.0   1200  11682    61.2   1200  11715
       4   81.7   1038  10322    80.6   1118  11059    62.3   1200  11711
       5   82.2    963   9629    81.7   1048  10296    63.4   1200  11692
       6   82.7    932   9165    81.7   1015  10073    65.0   1200  11696
       7   83.8    876   8832    81.7    991   9812    65.5   1200  11691
       8   83.3    867   8558    81.7    991   9684    66.3   1200  11702
       9   83.8    842   8318    82.2    963   9556    67.1   1200  11704
      10   83.8    824   8146    82.7    965   9369    67.1   1200  11699
      11   83.8    821   8051    82.7    968   9342    68.2   1200  11710
      12   83.8    813   7966    82.7    953   9241    69.3   1200  11712
      13   83.8    812   7879    82.2    956   9203    69.3   1200  11699
      14   84.4    796   7780    82.7    948   9194    69.8   1200  11709
      15   84.4    794   7710    82.7    949   9109    Fin

    min    65.0    794   7710    67.1    948   9109    56.9   1200  11671
    max    84.4   1200  11726    82.7   1200  11720    69.8   1200  11728
   Loss
      %           33.8   34.2           21.0   22.3              0      0
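
For reference, the Loss % row can be reproduced from the min and max rows of the table. The Python below is only a check of that arithmetic, with the function name chosen for illustration.

```python
def loss_percent(max_value, min_value):
    """Percentage reduction from the maximum to the minimum reading."""
    return 100.0 * (max_value - min_value) / max_value


# (max, min) pairs copied from the min and max rows of the table above
results = {
    "Copper Heatsink": {"MHz": (1200, 794), "MFLOPS": (11726, 7710)},
    "Copper No Cover": {"MHz": (1200, 948), "MFLOPS": (11720, 9109)},
    # FLIRC MFLOPS varies by about 0.5%, which the table shows as 0
    "FLIRC Case": {"MHz": (1200, 1200), "MFLOPS": (11728, 11671)},
}

for case, readings in results.items():
    mhz = loss_percent(*readings["MHz"])
    mflops = loss_percent(*readings["MFLOPS"])
    print(f"{case:16s} MHz loss {mhz:5.1f}%  MFLOPS loss {mflops:5.1f}%")
```

The MFLOPS loss tracks the MHz loss closely, which is what would be expected when the slowdown is purely due to clock throttling.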


RoyLongbottom

Re: Raspberry Pi Benchmarks

Thu Oct 20, 2016 10:55 am

Raspberry Pi 3 Java and OpenGL Benchmarks

To complete the benchmark collection, there are programs that use Java and OpenGL. Details are in the following, with benchmarks and source codes in the zip file.

http://www.roylongbottom.org.uk/Raspber ... hmarks.htm
http://www.roylongbottom.org.uk/Raspber ... hmarks.zip

Java Whetstone Benchmarks

There are on-line and off-line versions available. The on-line version can be run from the following but it needs a run time plugin, where it might take a long time to install and enable a suitable one. See the above documentation, where the icedtea plugin was used and the RPi 3 results were slow.

http://www.roylongbottom.org.uk/online%20benchmarks.htm

The latest Raspbian comes with Java 8 installed and this provides hardware acceleration but Java 7 was also installed to demonstrate much slower speeds. Following is an example of the faster speeds. These are around 50% faster than results on a Raspberry Pi 2, not much better than the CPU clock speed difference.

Code: Select all

     Whetstone Benchmark Java Version, Oct 11 2016, 21:11:28

                                                       1 Pass
  Test                  Result       MFLOPS     MOPS  millisecs

  N1 floating point  -1.124750137    184.44             0.1041
  N2 floating point  -1.131330490    178.87             0.7514
  N3 if then else     1.000000000              88.77    1.1660
  N4 fixed point     12.000000000             461.07    0.6832
  N5 sin,cos etc.     0.499110103               5.96   13.9700
  N6 floating point   0.999999821     91.05             5.9240
  N7 assignments      3.000000000             276.56    0.6682
  N8 exp,sqrt etc.    0.751108646               1.19   31.2500

  MWIPS                              183.43            54.5169

JavaDraw Benchmark

This runs the tests shown below that provide increasing demands, with speeds some 70% faster than on RPi 2. This was without activation of the OpenGL GLUT driver, needed for the new benchmark described later. When activated, drawing was at 8.1 FPS or less.

Code: Select all

   Java Drawing Benchmark, Oct 11 2016, 22:20:34
            Produced by javac 1.7.0_02

  Test                              Frames     FPS    RPi2

  Display PNG Bitmap Twice Pass 1      763    76.2    44.4
  Display PNG Bitmap Twice Pass 2      969    96.8    56.2
  Plus 2 SweepGradient Circles         958    95.8    57.3
  Plus 200 Random Small Circles        897    89.6    55.0
  Plus 320 Long Lines                  623    62.3    38.6
  Plus 4000 Random Small Circles       429    42.9    25.2

         Total Elapsed Time  60.0 seconds

  Operating System    Linux, Arch. arm, Version 4.4.11-v7+
  Java Vendor         Oracle Corporation, Version  1.8.0_65

OpenGL ES Benchmark

The benchmark, again, runs tests with increasing demands and has run time parameters for the window size to use. Maximum speed is limited to 60 FPS by Wait For Vertical Blank (VSYNC). The results (in the above report) were not much faster than on the RPi 2. This benchmark would not run with the GLUT driver enabled.

Code: Select all

 Raspberry Pi OpenGL ES Benchmark 1.2, Thu Jul 28 12:25:05 2016

           --------- Frames Per Second --------
 Triangles WireFrame   Shaded  Shaded+ Textured

    900+      59.99    60.00    41.45    37.88
   9000+      19.38    19.17    14.42    11.59
  18000+       9.84     9.75     8.34     6.49
  36000+       4.91     4.90     4.52     3.30

      Screen Pixels 1920 Wide 1080 High

      End Time Thu Jul 28 12:27:47 2016

OpenGL GLUT Benchmark

This is essentially my Linux VideoGL1 benchmark. Its pedigree was established in 2012, when I approved a request from a Quality Engineer at Canonical, to use this OpenGL benchmark in the testing framework of the Unity desktop software. Details and results are in the general RPi benchmark report (link above), also in the following topic, where mismatch of the new driver and Raspbian version exaggerated overheating issues, using the program in a stress testing mode.

viewtopic.php?p=958209#p958209

Before running, a command can disable VSYNC (export vblank_mode=0), followed by other commands to run at different window sizes, for example, to produce the following table of results. The comparison with RPi 2 shows that the RPi 3 performs much better on the heavier tasks. The RPi 3 can also be faster than an Atom based netbook and modern desktop PCs that use a default driver.

http://www.roylongbottom.org.uk/linux%2 ... orNetbook1

Code: Select all

GLUT OpenGL Benchmark 32 Bit Version 1, Wed Jul 27 20:31:52 2016

          Running Time Approximately 5 Seconds Each Test

 Window Size  Coloured Objects  Textured Objects  WireFrm  Texture
    Pixels        Few      All      Few      All  Kitchen  Kitchen
  Wide  High      FPS      FPS      FPS      FPS      FPS      FPS

   320   240    308.4    182.1     82.6     52.3     21.6     13.7
   640   480    129.5    119.6     74.6     49.2     21.6     13.8
  1024   768     54.8     52.2     43.7     39.2     21.4     13.6
  1920  1080     21.5     17.9     20.3     19.6     20.6     13.4

                   End at Wed Jul 27 20:34:06 2016

        Comparison With Raspberry Pi 2 At Default 900 MHz 

   320   240     1.47     1.59     1.57     1.61     1.79     1.76
  1920  1080     1.04     0.96     1.21     1.23     1.81     1.81

mcgyver83
Posts: 358
Joined: Fri Oct 05, 2012 11:49 am

Re: Raspberry Pi Benchmarks

Tue Oct 25, 2016 7:41 am

The last benchmark test zip is usable (with relevant results) also on Rpi2?

RoyLongbottom

Re: Raspberry Pi Benchmarks

Tue Oct 25, 2016 8:11 am

mcgyver83 wrote:The last benchmark test zip is usable (with relevant results) also on Rpi2?
Yes, but remember to change properties to allow execution.

RoyLongbottom

Re: Raspberry Pi Benchmarks

Mon Jan 09, 2017 7:59 pm

64 Bit Benchmarks via OpenSUSE

I am converting my Raspberry Pi benchmarks to run via 64 bit OpenSUSE. The following provides details of early experience, where inconsistent performance was identified.

viewtopic.php?p=1095254#p1095254

I will include relative 64 bit / 32 bit comparisons here in due course.

RoyLongbottom

Re: Raspberry Pi Benchmarks

Mon Feb 06, 2017 11:48 am

64 Bit Benchmarks

I am currently recompiling benchmarks for 64 bit operation and testing them via Raspberry Pi 3 compatible versions of SUSE, OpenSUSE and Gentoo. For more information on these see:

viewtopic.php?p=1095254#p1095254
viewtopic.php?p=1108073#p1108073

Full details are in the following, with benchmarks and source codes in the tar.gz file:

http://www.roylongbottom.org.uk/Raspber ... hmarks.htm
http://www.roylongbottom.org.uk/Rpi3-64 ... rks.tar.gz


The Classic Benchmarks are the first programs that set standards of performance for computers in the 1970s and 1980s. They are Whetstone, Dhrystone, Linpack and Livermore Loops.

Whetstone - This includes simple test loops that do not benefit from advanced instructions. There was a 40% improvement in overall performance. This was due to a limited number of dominant tests, using functions such as COS and EXP.

Dhrystone - rated in VAX MIPS AKA DMIPS produced a 43% improvement, but this benchmark is susceptible to over optimisation.

Linpack - with double and single precision versions, with results reported in MFLOPS. Speed improvements, over the 32 bit version, were around 1.9 times DP and 2.5 times SP. There is also a version that uses NEON intrinsic functions which, at 32 bits and 64 bits, are compiled as different varieties of vector instructions, with only a 10% improvement.

Livermore Loops - has 24 test kernels, where 64 bit performance increased between 1.02 and 2.88 times. The official average was 34% faster, at 279 MFLOPS. This is 21 times faster than the Cray 1 supercomputer, where this benchmark confirmed the original selection.

RoyLongbottom

Re: Raspberry Pi Benchmarks

Tue Feb 14, 2017 10:55 am

64 Bit Memory Tests

These measure cache and RAM speeds with results in MB/second. As could be expected, 32 bit and 64 bit RAM speeds were generally quite similar for particular test functions.

MemSpeed - Nine tests measure speeds using floating point (FP) and integer calculations. Cache based improvements, over 32 bit speeds, were 1.64 to 2.60 DPFP, 1.17 to 1.55 SPFP and 1.03 to 1.23 integer.

BusSpeed - this reads data via loops with 64 AND instructions, attempting to measure maximum data transfer speeds. It includes variable address increments to identify burst reading and to provide a means of estimating bus speeds. Main differences were on using L1 cache data, where average burst speeds were 38% faster but reading all data was slower. This is surprising as the 64 bit disassembly indicates that far more registers were used, with fewer load instructions, and the same type of AND instructions.

NeonSpeed - All floating point data is single precision. The source code carries out the same calculations using normal arithmetic and more complicated NEON intrinsic functions, the latter being compiled as different types of vector instructions, with no real average 64 bit improvement. The normal SP calculations were slightly faster.

RoyLongbottom

Re: Raspberry Pi Benchmarks

Wed Mar 08, 2017 11:13 am

64 Bit MultiThreading Benchmarks

Most of my multithreading benchmarks run using 1, 2, 4 and 8 threads. Many have tests that use approximately 12 KB, 120 KB and 12 MB, to use both caches and RAM. The first set attempt to measure maximum MFLOPS, with two test procedures, one with two floating point operations per data word and the other with 32. The latter includes a mixture of multiplications and additions, coded to enable SIMD operation. In this case, using single precision numbers, four at a time, plus linked multiply and add, a top end CPU can execute eight operations per clock cycle per core. It is not clear what the potential maximum MFLOPS is on an ARM Cortex-A53, but eight per core is mentioned. The same benchmark code obtained a maximum of 24 MFLOPS/MHz on a top end quad core Intel CPU, via Linux - see the following:

http://www.roylongbottom.org.uk/linux%2 ... tm#anchor6

Then this ARM CPU might need a different combination of arithmetic operations for higher values, where best case obtained with this benchmark was 2.2 MFLOPS/MHz using a single core.
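
As a rough check on the MFLOPS/MHz figure, the Python below divides measured MFLOPS by clock speed. It assumes the RPi 3 was running at its nominal 1200 MHz (the clock shown in the earlier stress test table) and uses the 2640 MFLOPS (one thread) and 10113 MFLOPS (four threads) results from the 32 Ops/Word, 12.8 KB column of the table in this post. The function name is for illustration only.

```python
CLOCK_MHZ = 1200  # assumption: nominal RPi 3 clock, not measured during this run


def mflops_per_mhz(mflops, clock_mhz=CLOCK_MHZ, cores=1):
    """Floating point results per clock cycle, per core."""
    return mflops / clock_mhz / cores


single_core = mflops_per_mhz(2640)            # best 1 thread result
quad_core = mflops_per_mhz(10113, cores=4)    # best 4 thread result, per core

print(f"1 thread : {single_core:.2f} MFLOPS/MHz")
print(f"4 threads: {quad_core:.2f} MFLOPS/MHz per core")
print(f"4 thread speed-up over 1 thread: {10113 / 2640:.2f} times")
```

On these assumptions, each core delivers just over two operations per clock cycle, well short of the eight per core that has been mentioned for this CPU.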

Following shows the format of the MP-MFLOPS benchmarks with the best 64 bit Raspberry Pi 3 results. Note performance increases using more threads, except when limited by RAM speed. These benchmarks carry out a fixed number of test passes, with each thread carrying out the same calculations on different sections of data. Numeric results produced (x 100000) are output to show that all data has been used.

Code: Select all

 MP-MFLOPS NEON Intrinsics 64 Bit Tue Feb 28 15:37:39 2017

    FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      697     725     420    2640    2544    2441
 2T     1452    1420     348    5135    5258    4430
 4T     1438    2679     343   10113    9905    5370
 8T     1914    2533     358    9332   10124    6041
 Results x 100000, 12345 indicates ERRORS
 1T    76406   97075   99969   66015   95363   99951
 2T    76406   97075   99969   66015   95363   99951
 4T    76406   97075   99969   66015   95363   99951
 8T    76406   97075   99969   66015   95363   99951

         End of test Tue Feb 28 15:37:43 2017
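
The arrangement described above, with each thread doing the same calculations on its own section of the data and a numeric result printed as a check, can be sketched in Python as follows. This is not the benchmark code: the function names and constants are invented for illustration, and Python threads will not show the speed gains of the compiled version. It only demonstrates why the "Results x 100000" lines are identical for every thread count.

```python
from concurrent.futures import ThreadPoolExecutor


def run_section(data, start, end, passes):
    """Apply the same add and multiply (2 operations per word) to one section."""
    for _ in range(passes):
        for i in range(start, end):
            data[i] = (data[i] + 0.000020) * 0.999980  # hypothetical constants


def run_test(words, threads, passes):
    """Split the data between threads; return a result x 100000, or 12345 on error."""
    data = [0.999999] * words
    step = words // threads
    bounds = [
        (t * step, words if t == threads - 1 else (t + 1) * step)
        for t in range(threads)
    ]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(run_section, data, s, e, passes) for s, e in bounds]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    # Every word has had identical treatment, so all values should match
    if any(value != data[0] for value in data):
        return 12345
    return int(data[0] * 100000)


for thread_count in (1, 2, 4, 8):
    print(f"{thread_count}T result {run_test(3200, thread_count, passes=10)}")
```

A different value in one row, or 12345, would show that some section of the data had been missed or processed the wrong number of times.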

The benchmarks appropriate for comparing 32 bit and 64 bit versions are the single and double precision varieties, compiled for normal floating point, and one using NEON intrinsic functions, which are clearly suitable for SIMD operation and are converted to different types of vector instructions.

64 bit/32 bit speed comparisons are below. Single precision MP-MFLOPS has the highest gain by using vector instructions, instead of scalar. With compiled intrinsics the systems use different forms of vector instructions.

Code: Select all

 Average 64 bit performance gains

         2 Ops/Word              32 Ops/Word
         12.8     128   12800    12.8     128   12800

 MF SP   4.31    3.87    1.24    2.19    2.35    2.04
 MF DP   2.45    1.71    0.83    1.92    1.92    1.42
 Intrin  1.81    1.84    0.82    1.67    1.75    1.08

There is also an OpenMP benchmark that carries out the same calculations, but also with 8 calculations per data word. OpenSUSE uses all available CPU cores. So, for comparison purposes, a version without the MP directive is also provided. Results identify MP gains of up to 3.89 times at 64 bits. The 64 bit version produces some similar speeds to the 32 bit compilation, but was faster by 2.47 to 2.80 times using 32 floating point operations per word, in the MP tests.

As usual benchmark, source codes, details and results are in:

http://www.roylongbottom.org.uk/Rpi3-64 ... rks.tar.gz
http://www.roylongbottom.org.uk/Raspber ... hmarks.htm

RoyLongbottom

Re: Raspberry Pi Benchmarks

Thu Mar 16, 2017 11:19 am

More 64 Bit MultiThreading Benchmarks

The other MP benchmarks, included in the tar.gz file, demonstrate some MP and 64 bit performance gains, with others identifying that multithreading provided little or no benefit and, sometimes, much worse performance.

MP-Whetstone - Multiple threads each run the eight test functions at the same time, but with some dedicated variables. MP performance is good but the simple test functions are not appropriate for more advanced instructions at 64 bits, so performance relative to the 32 bit version is between 0.48 and 2.08 times.

MP-Dhrystone - This runs multiple copies of the whole program at the same time. Dedicated data arrays are used for each thread but there are numerous other variables that are shared. The latter reduces performance gains via multiple threads and, in some cases, these can be slower than using a single thread. In this case, some quad core improvements are shown as up to 2.5 times faster than a single core. Single core 64 bit/32 bit speed ratio was 1.50 reducing to 1.10 using four threads.

MP-Linpack - The original Linpack Benchmark operates on double precision floating point 100x100 matrices. This one runs on 100x100, 500x500 and 1000x1000 single precision matrices using 0, 1, 2 and 4 separate threads, mainly via NEON intrinsic functions that are compiled into different forms of vector instructions. The benchmark was produced to demonstrate that the original Linpack code could not be converted (by me) to show increased performance using multiple threads. The official line is that users are allowed to implement their own linear equation solver for this purpose. At 100 x 100, data is in L2 cache, others depend more on RAM speed. The critical daxpy function is affected by numerous thread create and join directives, even on using one thread. This leads to slow and constant performance using all thread tests - see example below. The 32 bit version produced slightly slower speeds.

Code: Select all

 Linpack Single Precision MultiThreaded Benchmark
  64 Bit NEON Intrinsics, Wed Mar  8 11:36:25 2017

   MFLOPS 0 to 4 Threads, N 100, 500, 1000

 Threads      None        1        2        4

 N  100     552.47   112.73   105.19   105.31 
 N  500     442.32   303.75   303.64   305.03 
 N 1000     353.88   315.96   309.15   308.31 

MP-BusSpeed - This runs integer read only tests using caches and RAM, each thread accessing the same data, but with staggered starting points. It includes tests with variable address increments, to identify burst reading and bus speeds. The main “Read All” test is intended to identify maximum RAM speed. The benchmark demonstrated some appropriate MP performance gains, but slow 64 bit speeds, with the 32 bit version being 2.5 times faster via cache based data. The reason is that the latter compiled arithmetic as 16 four way NEON operations compared with 64 scalar instructions.

MP-RandMem - The benchmark has cache and RAM read only and read/write tests using sequential and random access, each thread accessing the same data but starting at different points. The read only L1 cache based tests demonstrated MP gains of 3.6 times, with the 64 bit version 43% faster than the 32 bit variety. Read/write tests produced no multithreading performance improvement and the latest benchmark appeared to be somewhat slower than the 32 bit version.
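
Going back to the MP-Linpack results, the effect of creating and joining threads inside the critical daxpy function can be illustrated with the Python sketch below. It is not the benchmark source, the function names are hypothetical and Python timings are not comparable with the compiled program. The point is only that each small call pays the thread start and join cost, even with one thread.

```python
import threading
import time


def axpy(a, x, y, start, end):
    """y = y + a * x over one section of the vectors."""
    for i in range(start, end):
        y[i] += a * x[i]


def axpy_threaded(a, x, y, threads):
    """Same calculation, but with threads created and joined on every call."""
    n = len(x)
    step = n // threads
    workers = []
    for t in range(threads):
        end = n if t == threads - 1 else (t + 1) * step
        worker = threading.Thread(target=axpy, args=(a, x, y, t * step, end))
        worker.start()
        workers.append(worker)
    for worker in workers:
        worker.join()


def seconds_for(calls, function, *args):
    """Elapsed time for repeated calls of the given function."""
    start = time.perf_counter()
    for _ in range(calls):
        function(*args)
    return time.perf_counter() - start


n, calls = 100, 2000  # short vectors, as in the N = 100 test
x = [1.0] * n
direct = seconds_for(calls, axpy, 0.5, x, [0.0] * n, 0, n)
one_thread = seconds_for(calls, axpy_threaded, 0.5, x, [0.0] * n, 1)
print(f"no threads {direct:.3f} s, 1 thread per call {one_thread:.3f} s")
```

With short vectors the thread overhead dominates, which is the pattern seen in the N 100 row of the table above, where all the threaded columns are similar and much slower than the unthreaded one.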

RoyLongbottom

Re: Raspberry Pi Benchmarks

Sun Mar 26, 2017 4:47 pm

OpenGL GLUT 64 Bit Benchmark

This was produced for use on Linux based PCs. It has four tests using coloured or textured simple objects then a wireframe and textured complex kitchen structure. It can be run from a script file specifying different window sizes and a command to disable VSYNC, enabling speeds greater than 60 FPS to be demonstrated. The benchmark, source code and details are in the following:

http://www.roylongbottom.org.uk/Rpi3-64 ... rks.tar.gz
http://www.roylongbottom.org.uk/Raspber ... #anchor19a

In 2012, I approved a request from a Quality Engineer at Canonical, to use this OpenGL benchmark in the testing framework of the Unity desktop software. One reason probably was that a test can be run for extended periods as a stress test.

Below are results from a Raspberry Pi 3, using the experimental desktop GL driver and the new 64 bit version using Gentoo and OpenSUSE. Comparing the first two, it can be seen that, using smaller windows, the 32 bit version was much faster running simple coloured objects, with the 64 bit benchmark being ahead with complex structures. Then, performance was quite similar with full screen displays.

The OpenSUSE exercise included tests at a smaller window size, to show that maximum speed was not limited by VSYNC. Performance was generally slower than the Gentoo tests and was particularly slow with a full screen display - (config setting?). One other problem with OpenSUSE Leap 42.2 is that it failed to run on some of the many available distributions.

Code: Select all

######################### RPi 3 Original #########################

 GLUT OpenGL Benchmark 32 Bit Version 1, Wed Jul 27 20:31:52 2016

 Window Size  Coloured Objects  Textured Objects  WireFrm  Texture
    Pixels        Few      All      Few      All  Kitchen  Kitchen
  Wide  High      FPS      FPS      FPS      FPS      FPS      FPS

   320   240    308.4    182.1     82.6     52.3     21.6     13.7
   640   480    129.5    119.6     74.6     49.2     21.6     13.8
  1024   768     54.8     52.2     43.7     39.2     21.4     13.6
  1920  1080     21.5     17.9     20.3     19.6     20.6     13.4


 ########################## RPi 3 SUSE ###########################

 GLUT OpenGL Benchmark 64 Bit Version 1, Sat Mar 18 19:03:25 2017

 Window Size  Coloured Objects  Textured Objects  WireFrm  Texture
    Pixels        Few      All      Few      All  Kitchen  Kitchen
  Wide  High      FPS      FPS      FPS      FPS      FPS      FPS

   160   120     87.1     76.3     64.3     46.9     24.3     15.6

   320   240     59.2     54.7     53.7     43.9     25.6     15.6
   640   480     33.4     31.7     31.0     27.6     24.4     15.3
  1024   768     17.5     17.5     17.7     17.0     16.2     14.1
  1920  1080      8.2      8.3      9.0      9.3      8.4      7.6


########################## RPi 3 Gentoo ##########################

 GLUT OpenGL Benchmark 64 Bit Version 1, Sat Mar 18 18:21:44 2017

 Window Size  Coloured Objects  Textured Objects  WireFrm  Texture
    Pixels        Few      All      Few      All  Kitchen  Kitchen
  Wide  High      FPS      FPS      FPS      FPS      FPS      FPS

   320   240    161.8    116.0     67.1     46.3     26.7     16.7
   640   480     76.8     74.8     49.8     41.4     25.9     16.3
  1024   768     35.7     34.8     29.7     26.7     25.0     15.7
  1920  1080     18.0     18.7     16.4     15.8     17.1     13.1

Java and Java Whetstone Benchmark via 64 Bit Systems

After a struggle, I gave up trying to install Java with Gentoo but managed to download Oracle JDK 1.8 for temporary use (not installed in the right place?). This could compile Java code and run the Whetstone program but not my JavaDraw benchmark. The benchmarks and results can be obtained via the above links. On running the Whetstone benchmark, excluding two tests, where each was much faster, the average 64 bit speed was twice as fast using OpenSUSE and somewhat higher via Gentoo.

JavaDraw Benchmarks - 64 Bit OpenSUSE

The benchmark uses small to rather excessive simple objects to measure drawing performance in Frames Per Second (FPS). Five tests draw on a background of continuously changing colour shades. Benchmarks, further details and results can be obtained via the above links.

Results below include all sorts of issues, where the original system did not run well after the new OpenGL GLUT driver was installed and OpenSUSE performance depended on a particular distribution.

Code: Select all

 ##################### RPi 3 JavaDraw FPS ######################

                    PNG     PNG    +Sweep   +200    +320   +4000
                  Bitmaps Bitmaps Gradient Small    Long   Small
                      1       2   Circles Circles  Lines  Circles

  Pi 2  900 MHz     44.4    56.8    57.3    55.0    38.6    25.2

  Pi 3  Original    55.0    69.5    70.0    67.7    46.4    29.5
  Pi 3 +GLUT Driver  2.9     3.2     7.3     8.1     7.5     7.0

  Pi 3  OpenSUSE     8.6    10.9    10.7    10.1     7.9     3.6
  Pi 3  OpenSUSE    22.8    32.1    32.3    27.7    15.3     6.2
 

RoyLongbottom

Re: Raspberry Pi Benchmarks

Sun Apr 23, 2017 10:01 am

64 Bit I/O Benchmarks

My DriveSpeed and LanSpeed programs have now been recompiled as DriveSpeed64 and LanSpeed64, with benchmarks, source codes, details and results in the tar.gz and htm files quoted earlier. The code for these is identical, except DriveSpeed opens files to use direct I/O, avoiding caching. LanSpeed normally runs without using local caching. The benchmarks measure speeds of relatively large files, random access and numerous small files.
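
For those unfamiliar with direct I/O, the Python sketch below shows the general idea of opening a file with the Linux O_DIRECT flag and falling back to normal buffered I/O when the system refuses it. This is not how DriveSpeed is written (it does not fall back, and its source is in the tar.gz file); the function name, block size and fallback behaviour are assumptions for illustration.

```python
import errno
import mmap
import os
import tempfile

BLOCK = 4096  # assumed to be a multiple of the device's logical block size


def write_block(path):
    """Write one aligned block, preferring direct I/O; return (bytes, used_direct)."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    direct = getattr(os, "O_DIRECT", 0)  # flag is not defined on every platform
    buf = mmap.mmap(-1, BLOCK)           # anonymous mmap gives a page aligned buffer
    buf.write(b"x" * BLOCK)
    try:
        if direct:
            try:
                fd = os.open(path, flags | direct, 0o644)
                try:
                    return os.write(fd, buf), True
                finally:
                    os.close(fd)
            except OSError as exc:
                # Some filesystems reject O_DIRECT at open or at write time
                if exc.errno not in (errno.EINVAL, errno.EOPNOTSUPP):
                    raise
        fd = os.open(path, flags, 0o644)  # ordinary cached I/O
        try:
            return os.write(fd, buf), False
        finally:
            os.close(fd)
    finally:
        buf.close()


with tempfile.TemporaryDirectory() as tmp:
    written, used_direct = write_block(os.path.join(tmp, "direct.tmp"))
    print(f"wrote {written} bytes, direct I/O used: {used_direct}")
```

Direct I/O requires the buffer address and transfer size to be suitably aligned, which is why the sketch uses an anonymous memory map rather than an ordinary bytes object.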

There might be a tuning parameter, but DriveSpeed64 produced errors using the installed OpenSUSE and Gentoo operating systems, where direct I/O did not appear to be available. It did run using SUSE SLES, producing the results shown below. In this case, random access and small file test results were not as expected.

Code: Select all

################ DriveSpeed64 SUSE SLES ################

   DriveSpeed RasPi 64 Bit 1.1 Mon Apr  3 23:40:21 2017
 
 Current Directory Path: /home/roy/driveLANSUSE
 Total MB   29465, Free MB   27495, Used MB    1970

                        MBytes/Second
  MB   Write1   Write2   Write3    Read1    Read2    Read3

   8    10.26    15.50     7.78    47.27    51.62    48.91
  16    10.58    13.86    10.14    54.05    55.50    45.78
 Cached
   8   520.96   586.68   601.25   709.43   709.23   706.46

 Random         Read                       Write
 From MB        4        8       16        4        8       16
 msecs      0.005    0.004    0.004    16.91    20.31    22.13

 200 Files      Write                      Read                  Delete
 File KB        4        8       16        4        8       16     secs
 MB/sec      0.25     0.36     1.06   252.55   403.28   621.47
 ms/file    16.10    23.00    15.43     0.02     0.02     0.03    0.029


                End of test Mon Apr  3 23:40:59 2017
 
 >>>>>>>>>>>>>>>> Comparison with 32 Bit Version <<<<<<<<<<<<<<<

  Large Files > Faster SD card reflected, reading > twice as fast
  Random      > Writing exceptionally slow, reading far too fast, data cached? 
  Small Files > Writing exceptionally slow, reading far too fast, data cached?

DriveSpeed can also normally be used to measure performance of USB connected drives but without much luck at 64 bits. On installation, USB drives could be used for copying files using Gentoo as installed and via OpenSUSE after installing other software. The program produced errors using USB flash drives and only ran via Gentoo using a USB connected micro SD card on a btrfs formatted partition. Then, the performance pattern was as in the above example.

LanSpped, was also run successfully specifying local and USB drives, confirming that errors were caused by trying to use direct I/O.
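For anyone wanting to check a drive before running the tests, a minimal sketch follows. It assumes that "direct I/O" here means opening files with the Linux O_DIRECT flag, which is an assumption about how DriveSpeed works rather than something confirmed above. The function name supports_direct_io is made up for illustration. It tries to open a scratch file with that flag and reports whether the open succeeded.

```python
import os
import tempfile


def supports_direct_io(directory):
    """Return True if a scratch file in `directory` opens with O_DIRECT.

    Assumption: "direct I/O" means the Linux O_DIRECT open flag.
    Any failure to open is treated as "not available".
    """
    flag = getattr(os, "O_DIRECT", None)  # not defined on every platform
    if flag is None:
        return False
    fd, path = tempfile.mkstemp(dir=directory)  # scratch file to probe with
    os.close(fd)
    try:
        probe = os.open(path, os.O_RDWR | flag)
    except OSError:
        # Typically EINVAL when the file system does not support the flag
        return False
    else:
        os.close(probe)
        return True
    finally:
        os.unlink(path)  # always remove the scratch file


if __name__ == "__main__":
    target = tempfile.gettempdir()
    print(target, "direct I/O available:", supports_direct_io(target))
```

Pointing it at the mount point of an SD card, USB drive or network share would show which of them accept the flag on a given OS installation.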

LAN access could only be used via OpenSUSE, following installation of additional facilities. Samba for Gentoo was said to be not tested at 64 bits and that for SUSE SLES could not be downloaded following a necessary reinstallation of the system. OpenSUSE results are below from accessing a Windows based PC.

Code: Select all

#################### LanSpeed64 Example #################
 
   LanSpeed RasPi 64 Bit 1.0 Tue Apr  4 13:04:06 2017
 
 Selected File Path: 
 /root/Desktop/sharepc/
 Total MB  266240, Free MB  134653, Used MB  131587

                        MBytes/Second
  MB   Write1   Write2   Write3    Read1    Read2    Read3

   8    11.23    11.40    11.40     8.10    11.62    11.64
  16    11.27    11.42    11.44    11.66    11.66    11.64

 Random         Read                       Write
 From MB        4        8       16        4        8       16
 msecs      0.724    0.886    1.333     1.58     1.50     1.37

 200 Files      Write                      Read                  Delete
 File KB        4        8       16        4        8       16     secs
 MB/sec      0.99     1.81     2.73     1.77     3.02     4.50
 ms/file     4.13     4.54     6.01     2.32     2.71     3.64    0.201


                End of test Tue Apr  4 13:04:43 2017
 
 >>>>>>>>>>> Comparison with 32 Bit Version Rpi 3 Ph Win <<<<<<<<<

  Large Files > Similar speeds reflecting 100 Mbps
  Random      > Similar but writing faster, no apparent caching 
  Small Files > Similar speeds
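The MB/sec and ms/file rows of the small file test are two views of the same measurement. A small sketch of the conversion follows, assuming KB means 1024 bytes and MB means 1,000,000 bytes. That choice of units is an inference, made because it reproduces the printed figures to within rounding, and the function names are hypothetical.

```python
def mb_per_second(file_kb, ms_per_file):
    """Convert a per-file time into throughput.

    Assumption: KB = 1024 bytes and MB = 1,000,000 bytes, chosen because
    it matches the figures printed in the results to within rounding.
    """
    bytes_per_file = file_kb * 1024
    seconds = ms_per_file / 1000.0
    return bytes_per_file / seconds / 1_000_000


def link_utilisation(mbytes_per_second, link_mbps=100):
    """Fraction of the nominal link rate used by a measured MB/s figure."""
    return mbytes_per_second * 8 / link_mbps


if __name__ == "__main__":
    # LanSpeed64 small file writes from the results: (file KB, ms per file)
    for kb, ms in [(4, 4.13), (8, 4.54), (16, 6.01)]:
        print(f"{kb:2d} KB  {mb_per_second(kb, ms):.2f} MB/sec")
    # Best 16 MB large file write, 11.44 MB/s, against a 100 Mbps link
    print(f"link use {link_utilisation(11.44):.1%}")
```

On the same basis, large file speeds of around 11.4 MB/second correspond to a little over 90% of the nominal 100 Mbps.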


RoyLongbottom
Posts: 222
Joined: Fri Apr 12, 2013 9:27 am
Location: Essex, UK
Contact: Website

Re: Raspberry Pi Benchmarks

Mon May 08, 2017 2:16 pm

64 Bit Stress Tests

I have finished running my 64 bit benchmarks and test programs via 64 bit Operating Systems. The last ones were the series of stress tests demonstrating CPU temperature increases and associated performance degradation due to CPU MHz throttling. These tests had already been run via SUSE SLES and OpenSUSE. See:

viewtopic.php?p=1104685#p1104685

The latest test procedures were via Gentoo, with similar results, but Gentoo provides the same vcgencmd, as available via Raspbian, to measure CPU temperatures and MHz. All 64 bit programs and source codes are in the following tar.gz file with details and results in the htm report.

http://www.roylongbottom.org.uk/Rpi3-64 ... rks.tar.gz
http://www.roylongbottom.org.uk/Raspber ... hmarks.htm
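As a rough illustration of how temperature and MHz can be logged while a stress test runs, the sketch below parses the text printed by vcgencmd measure_temp and vcgencmd measure_clock arm. It is not taken from the stress test sources. The function names are hypothetical, and the sample strings in the comments only show the output format.

```python
import re
import subprocess


def parse_temp(text):
    """Extract degrees C from output such as "temp=48.3'C"."""
    match = re.search(r"temp=([0-9.]+)", text)
    return float(match.group(1)) if match else None


def parse_clock_mhz(text):
    """Extract MHz from output such as "frequency(45)=1200000000"."""
    match = re.search(r"frequency\(\d+\)=(\d+)", text)
    return int(match.group(1)) / 1_000_000 if match else None


def sample():
    """Read temperature and ARM clock once; needs vcgencmd on the PATH."""
    temp = subprocess.run(["vcgencmd", "measure_temp"],
                          capture_output=True, text=True, check=True).stdout
    clock = subprocess.run(["vcgencmd", "measure_clock", "arm"],
                           capture_output=True, text=True, check=True).stdout
    return parse_temp(temp), parse_clock_mhz(clock)
```

Calling sample() every few seconds from a second terminal while a test runs gives a simple record of when throttling starts.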

RoyLongbottom
Posts: 222
Joined: Fri Apr 12, 2013 9:27 am
Location: Essex, UK
Contact: Website

Re: Raspberry Pi Benchmarks

Tue Apr 10, 2018 11:53 am

Raspberry Pi 3B+ 32 Bit and 64 Bit Benchmark Results

On receiving a new Raspberry Pi 3B+, I tried booting with Raspbian and 64 bit Gentoo. Both failed to boot, producing that rainbow coloured square. I found the solution for Raspbian, a sudo rpi-update being required to provide a missing file. For Gentoo, a new release was required, and that appeared quickly and, as reported in this forum, can be downloaded from:

https://github.com/sakaki-/gentoo-on-rpi3-64bit

I am now running all my benchmarks on the new system.

Below is a summary of single core benchmark results on a Raspberry Pi 3B+ using 32 bit Raspbian and 64 bit Gentoo. Older model 3B results are also included for comparison purposes. These benchmarks measure processor speeds, including using some cached data, where, as expected, performance was proportional to CPU MHz of the two systems (subject to normal variations in running time of individual test functions). They are the first programs that set standards of performance for computers, from the 1970s and 1980s. Details and further results can be found in “Raspberry Pi 32 Bit and 64 Bit Benchmarks and Stress Tests.pdf” at ResearchGate:

https://www.researchgate.net/publicatio ... ationTitle

This also provides links to download the benchmarks and source codes, plus files relating to other types of systems and historic data (from ResearchGate or Archives).
Reports include some for Android devices that might provide a clue to performance of future Raspberry Pi computers.

Whetstone Benchmark - overall rating in MWIPS and eight test functions with speed in MFLOPS or MOPS. RPi 3B+ speeds at 32 bits and 64 bits were essentially the same on some tests, with overall MWIPS 36% faster at 64 bits.

There is also a Java version, where initial comparisons indicated that the 3B+ was slower than the original, via Gentoo but not Raspbian. The problem is that, as with Android, there can be wide variations produced by each new release of Java. Booting the old RPi 3 from the new version of Gentoo produced the expected differences.

Code: Select all

 Whetstone Benchmark 

 32 Bit

 System          MHz  MWIPS  ------MFLOPS-------   ------------MOPS---------------
                               1      2      3     COS   EXP  FIXPT      IF  EQUAL

 RPi 3   v8-A53 1200  711.6  336.5  329.7  256.9  12.2   8.8 1498.5  1796.7 1198.7
 RPi 3B+ v8-A53 1400  829.9  392.7  384.6  299.8  14.2  10.2 1748.1  2095.8 1398.5

 Ratio          1.17   1.17   1.17   1.17   1.17  1.16  1.16   1.17    1.17   1.17

 Java

 Both Java 1.8.0_65

 RPi 3   v8-A53 1200  183.4  184.1  179.6   91.1  5.94  1.19  460.5    88.6  276.6
 RPi 3B+ v8-A53 1400  211.8  214.2  207.6  105.8  6.92  1.37  535.5   103.1  321.3  

 Ratio          1.17   1.15   1.16   1.16   1.16  1.16  1.15   1.16    1.16   1.16

 ==================================================================================

 64 Bit

 System          MHz  MWIPS  ------MFLOPS-------   ------------MOPS---------------
                               1      2      3     COS   EXP  FIXPT      IF  EQUAL

 RPi 3   v8-A53 1200 1022.9  327.6  346.3  282.1  20.3  12.6 1467.3  ###### 1166.4
 RPi 3B+ v8-A53 1400 1125.8  383.3  403.3  328.0  22.6  13.0 1705.8  ###### 1359.2

 Ratio          1.17   1.10   1.17   1.16   1.16  1.11  1.03   1.16           1.17
 3B+ 64/32 bit         1.36   0.98   1.05   1.09  1.59  1.27   0.98           0.97     

 Java

 Java 1.8.0_121, Linux 4.10.0

 RPi 3   v8-A53 1200  783.0  335.4  296.3  207.0  19.0  18.1  667.1   160.8   88.3

 Both Java 1.8.0_161, Linux 4.14.31

 RPi 3   v8-A53 1200  668.4  267.7  250.0  112.3  20.0  18.9  609.3   207.8   76.9
 RPi 3B+ v8-A53 1400  774.2  311.7  282.6  130.2  23.3  21.8  708.2   241.2   89.1

 Ratio          1.17   1.16   1.16   1.13   1.16  1.17  1.15   1.16    1.16   1.16

 ###### compiler optimiser produces 1 pass, this test does not affect MWIPS much 
Dhrystone Benchmark - measures integer performance, rated in VAX MIPS (AKA DMIPS), with MIPS/MHz often quoted. Performance is highly dependent on the compiler used, and some compilers have been known to be designed to produce the highest rating. In this case, the 40% improvement at 64 bits might not be truly representative.

Code: Select all

 Dhrystone 2 Benchmark 

 32 Bit

 System            MHz   VAX MIPS  MIPS/MHz

 RPi 3   v8-A53   1200     2469     2.06
 RPi 3B+ v8-A53   1400     2881     2.06

 Ratio            1.17     1.17

 ===========================================

 64 Bit

 System            MHz   VAX MIPS  MIPS/MHz

 RPi 3   v8-A53   1200     3475     2.90
 RPi 3B+ v8-A53   1400     4025     2.88

 Ratio            1.17     1.16
 3B+ 64/32 bit             1.40
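The derived columns in the Dhrystone table can be reproduced from the MHz and VAX MIPS figures alone. The sketch below is only a worked check of that arithmetic. The dictionary layout and function names are made up for illustration.

```python
# (MHz, VAX MIPS) copied from the Dhrystone 2 table
RESULTS = {
    ("RPi 3", 32): (1200, 2469),
    ("RPi 3B+", 32): (1400, 2881),
    ("RPi 3", 64): (1200, 3475),
    ("RPi 3B+", 64): (1400, 4025),
}


def mips_per_mhz(system, bits):
    """VAX MIPS divided by CPU MHz for one table row."""
    mhz, mips = RESULTS[(system, bits)]
    return mips / mhz


def speedup(new_key, old_key):
    """Ratio of VAX MIPS between two table rows."""
    return RESULTS[new_key][1] / RESULTS[old_key][1]


if __name__ == "__main__":
    for (system, bits) in RESULTS:
        print(f"{system:8s} {bits} bit  {mips_per_mhz(system, bits):.2f} MIPS/MHz")
    print(f"3B+ over 3B at 32 bits: {speedup(('RPi 3B+', 32), ('RPi 3', 32)):.2f}")
    print(f"3B+ 64 bit over 32 bit: {speedup(('RPi 3B+', 64), ('RPi 3B+', 32)):.2f}")
```

MIPS/MHz stays the same between the 3B and 3B+ at a given word size, which is what would be expected if the only change is clock speed.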
Linpack Benchmark - This is the original implementation with a matrix of order 100 in double precision (DP), with performance measured in MFLOPS. In this case, a single precision (SP) version has been produced, also one using NEON intrinsic functions, with the same precision. With the 32 bit compiler, SP and DP speeds were similar, with NEON providing significant gains. At 64 bits, as expected with SIMD, SP was much faster than DP and similar to the NEON version. The ratio MFLOPS/MHz improved considerably.

Code: Select all

 Linpack Benchmark 

 32 Bit
                           ------ MFLOPS ----    --- MFLOPS/MHz -- 
 System            MHz     DP     SP  NEON SP    DP    SP  NEON SP

 RPi 3   v8-A53   1200    180    194    486    0.15   0.16   0.41
 RPi 3B+ v8-A53   1400    210    226    562    0.15   0.16   0.40

 Ratio            1.17   1.17   1.16   1.16

 ==================================================================

 64 Bit
                           ------ MFLOPS ----    --- MFLOPS/MHz -- 
 System            MHz     DP     SP  NEON SP    DP    SP  NEON SP

 RPi 3   v8-A53   1200    343    482    521   0.29   0.40   0.43
 RPi 3B+ v8-A53   1400    397    563    605   0.28   0.40   0.43

 Ratio            1.17   1.16   1.17   1.16
 3B+ 64/32 bit           1.89   2.49   1.08
Livermore Loops Benchmark - comprises 24 kernels from numeric applications, using double precision arithmetic. Various summary performance calculations are produced, the official average performance being the geometric mean. Below are the summary speeds and the range of 3B+/3B and 64 bit/32 bit comparisons for the 24 test loops. (Note that speeds of single loops can vary.) The best overall Geomean of 285 MFLOPS, for one core, is nearly 24 times faster than the Cray 1 supercomputer, which cost $7 million (in 1978). See the following for other comparisons with Raspberry Pi 1.

https://www.webarchive.org.uk/wayback/a ... m#anchor7a

Code: Select all

 Livermore Loops 

 32 Bit Summary
                          -------------- DP MFLOPS -------------- Per MHz
 System            MHz    Maximum Average Geomean Harmean Minimum Geomean

 RPi 3   v8-A53   1200     398.4   210.6   185.9   160.2    56.5    0.15
 RPi 3B+ v8-A53   1400     462.5   243.8   215.2   185.7    65.6    0.15

 Ratio            1.17      1.16    1.16    1.16    1.16    1.16

 =======================================================================

 64 Bit Summary

 RPi 3   v8-A53   1200     627.3   275.7   246.8   219.2    90.6    0.21
 RPi 3B+ v8-A53   1400     737.3   320.2   285.0   250.8    94.4    0.20

 Ratio            1.17      1.18    1.16    1.15    1.14    1.04 
 3B+ 64/32 bit              1.59    1.31    1.32    1.35    1.44

 =======================================================================

 32 Bit DP MFLOPS 24 Loops 3B+/3B Ratios 0.95 to 1.19, average 1.15 

 64 Bit DP MFLOPS 24 Loops 3B+/3B Ratios 1.03 to 1.31, average 1.16

 3B+ 64 bit/32 bit                       1.00 to 2.83, average 1.40  
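The summary columns are different averages of the same 24 per-loop speeds. A minimal sketch of how such a summary could be computed is shown below, using Python's statistics module (3.8 or later). The per-loop values are hypothetical placeholders, not measured results, and the benchmark's own summary code may differ in detail.

```python
from statistics import fmean, geometric_mean, harmonic_mean


def summarise(mflops):
    """Return the five summary figures for a list of per-loop MFLOPS."""
    return {
        "Maximum": max(mflops),
        "Average": fmean(mflops),            # arithmetic mean
        "Geomean": geometric_mean(mflops),   # the official Livermore average
        "Harmean": harmonic_mean(mflops),
        "Minimum": min(mflops),
    }


if __name__ == "__main__":
    # Hypothetical per-loop speeds for illustration only, NOT measured data
    loops = [460.0, 310.0, 250.0, 180.0, 120.0, 66.0]
    for name, value in summarise(loops).items():
        print(f"{name:8s} {value:7.1f}")
```

For any set of positive speeds the arithmetic mean is at least the geometric mean, which is at least the harmonic mean, matching the ordering seen in the tables.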

DavidS
Posts: 3096
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

Re: Raspberry Pi Benchmarks

Tue Apr 10, 2018 6:45 pm

One thing that has bothered me about this thread since it began:
If we are attempting to benchmark the abilities of the HW, why are we running in an Operating System?
It would make better sense to run any benchmarks baremetal, displaying results to the framebuffer only between tests.

I could understand that there was not enough documentation on some things to run without an OS and have decent results before, but the situation has changed (long ago now). Anyone can write a simple bitmap font rendering engine for displaying the results, and there is no need to have USB, as there is no need for user input at all.

Now there is enough known about the system that there is no longer a practical reason to have the huge variable of the OS in the way, so why not test at bare metal (including multi core benchmarks on the RPi 2B, 3B, & 3B+)?
26-Bit R15 to 32-bit. 16-bit addressing to 24-bit. ARM and 65xx two CPU's that continue on, and are better than ever. Assembly Language forever :) .

Paeryn
Posts: 2146
Joined: Wed Nov 23, 2011 1:10 am
Location: Sheffield, England

Re: Raspberry Pi Benchmarks

Tue Apr 10, 2018 11:04 pm

DavidS wrote:
Tue Apr 10, 2018 6:45 pm
One thing that has bothered me about this thread since it began:
If we are attempting to benchmark the abilities of the HW, why are we running in an Operating System?
It would make better sense to run any benchmarks baremetal, displaying results to the framebuffer only between tests.

I could understand that there was not enough documentation on some things to run without an OS and have decent results before, but the situation has changed (long ago now). Anyone can write a simple bitmap font rendering engine for displaying the results, and there is no need to have USB, as there is no need for user input at all.

Now there is enough known about the system that there is no longer a practical reason to have the huge variable of the OS in the way, so why not test at bare metal (including multi core benchmarks on the RPi 2B, 3B, & 3B+)?
Probably because it's more useful to know what the performance is for programs running in a normal working environment, which for the majority of people means there will be an OS running in the background.

Going by your idea of going baremetal to avoid the convenience of an OS eating up valuable cycles, would that include disabling the display entirely (and any other parts of the VC4) for the duration just to make sure no time is lost due to the VC4 needing to access the memory?
She who travels light — forgot something.

RoyLongbottom
Posts: 222
Joined: Fri Apr 12, 2013 9:27 am
Location: Essex, UK
Contact: Website

Re: Raspberry Pi Benchmarks

Tue Apr 10, 2018 11:12 pm

One thing that has bothered me about this thread since it began:
If we are attempting to benchmark the abilities of the HW, why are we running in an Operating System?
It would make better sense to run any benchmarks baremetal, displaying results to the framebuffer only between tests.
You have picked the wrong set of benchmarks to come up with such a suggestion. These are straight, relatively long running CPU tests with no OS influence whilst running, where the bare metal instructions can be analysed, if required. The OS influence comes via the included compiler that produces the instructions, with those for 64 bit operation being different to the 32 bit varieties.

Moving on to my other benchmarks covering memory, graphics and input/output (all important HW), they need controlling management and drivers, included in an OS. You mentioned benchmarking using multiple cores. How can you do this without an Operating System?

DavidS
Posts: 3096
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

Re: Raspberry Pi Benchmarks

Wed Apr 11, 2018 2:25 am

RoyLongbottom wrote:
Tue Apr 10, 2018 11:12 pm
One thing that has bothered me about this thread since it began:
If we are attempting to benchmark the abilities of the HW, why are we running in an Operating System?
It would make better sense to run any benchmarks baremetal, displaying results to the framebuffer only between tests.
You have picked the wrong set of benchmarks to come up with such a suggestion. These are straight, relatively long running CPU tests with no OS influence whilst running, where the bare metal instructions can be analysed, if required. The OS influence comes via the included compiler that produces the instructions, with those for 64 bit operation being different to the 32 bit varieties.
So you run them with all interrupts disabled and single tasking? Good trick.
Moving on to my other benchmarks covering memory, graphics and input/output (all important HW), they need controlling management and drivers, included in an OS. You mentioned benchmarking using multiple cores. How can you do this without an Operating System?
The only ones that require an OS would be the OpenGL ones (and there may be someone out there that knows the VideoCore IV well enough by now to even get them up on baremetal).

As to multiple core without an OS, is that really a question on the RPi 2B/3B? You simply branch execution based on which core you are running on (as all four cores start up running at the same exact location), of course making sure you enable the PMMU and the L1 and L2 caches on each core (kind of a given, though must be specific for some). Multiple core on the RPi is too simple, no questions to be asked.

Now any tests that require file I/O (such as benchmarking File access) are OS specific, though are an exception to the rule. Unfortunately the implementation of File Systems, as well as the algorithms to handle file I/O, buffering, etc can have a huge impact on performance so a benchmark of file or disk I/O is of little meaning (other than relatively speaking on the same filesystem and the same version of the same OS across differing HW).

If you are benchmarking the performance under an OS then running in an OS makes sense; if you are benchmarking a computer system then NO OS is the way to go (why do you think there are so many benchmarks for the x86 PC that run on bare HW without an OS?).

RoyLongbottom
Posts: 222
Joined: Fri Apr 12, 2013 9:27 am
Location: Essex, UK
Contact: Website

Re: Raspberry Pi Benchmarks

Wed Apr 11, 2018 10:50 am

If you are benchmarking the performance under an OS then running in an OS makes sense; if you are benchmarking a computer system then NO OS is the way to go (why do you think there are so many benchmarks for the x86 PC that run on bare HW without an OS?).
My near 50 years of experience in benchmarking tells me that single CPU only tests written in assembly code normally run at full speed under an OS. I have never been exposed to a bare CPU HW benchmark for the x86, mine initially running via DOS (Disk Operating System), where assembly code produces the same performance (on a given CPU), even via Windows 10 or the latest Linux system. Machine code from a compiler or interpreter is treated no differently.

Anything other than running single CPU only bare metal benchmarks, with minimum RAM access, will require OS type functions built in.

I cannot imagine trying to evaluate the extremely wide performance attributes of a computer without an OS. I would spend all day writing down all the results, then typing them into a computer.

ejolson
Posts: 1900
Joined: Tue Mar 18, 2014 11:47 am

Re: Raspberry Pi Benchmarks

Wed Apr 11, 2018 3:42 pm

RoyLongbottom wrote:
Wed Apr 11, 2018 10:50 am
If you are benchmarking the performance under an OS then running in an OS makes sense; if you are benchmarking a computer system then NO OS is the way to go (why do you think there are so many benchmarks for the x86 PC that run on bare HW without an OS?).
My near 50 years of experience in benchmarking tells me that single CPU only tests written in assembly code normally run at full speed under an OS.
One trick to get closer to hardware speed in the presence of a full-blown multitasking time-sharing operating system is to run the benchmark multiple times and then take the minimum timing from among the trials. This mitigates the effects of Linux temporarily suspending your job while it goes out to lunch. Obviously system activity unrelated to the benchmark should also be reduced as much as possible. In particular, don't watch videos, do any web browsing or back up the system while a benchmark is running.
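The minimum-of-several-trials idea can be sketched as follows. This is only an illustration: best_of and workload are hypothetical names, and the loop in workload is a stand-in for a real benchmark kernel.

```python
import time


def best_of(func, trials=5):
    """Run func `trials` times; return (minimum seconds, all timings)."""
    timings = []
    for _ in range(trials):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    # The fastest run is the one least disturbed by other system activity
    return min(timings), timings


def workload():
    """Stand-in for a benchmark kernel (hypothetical)."""
    total = 0.0
    for i in range(200_000):
        total += i * 0.5
    return total


if __name__ == "__main__":
    best, runs = best_of(workload, trials=5)
    print("all runs:", " ".join(f"{t * 1000:.2f}" for t in runs), "ms")
    print(f"minimum : {best * 1000:.2f} ms")
```

The spread between the minimum and the other runs also gives a rough idea of how much the rest of the system was interfering.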

Even though Linux has a noticeable cache footprint, it makes sense to benchmark the system as a whole. Linux is the environment of interest for applications in high-performance computing. Moreover, there is so much parallelism in a modern computer--pipelines, cache, DMA devices, multiple cores and hardware threads--that a multitasking OS almost always increases efficiency and throughput.

Having said this, it is worth observing that users are typically interested in the wall clock or how long it actually takes the program to produce results, while programmers might be better off using more detailed metrics to tune the code. To this end were created the various performance counters on modern CPUs such as instructions per clock cycle and cache misses. Of course, wall time is always the ultimate measure of computing performance.

DavidS
Posts: 3096
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

Re: Raspberry Pi Benchmarks

Wed Apr 11, 2018 6:41 pm

ejolson wrote:
Wed Apr 11, 2018 3:42 pm
RoyLongbottom wrote:
Wed Apr 11, 2018 10:50 am
If you are benchmarking the performance under an OS then running in an OS makes sense; if you are benchmarking a computer system then NO OS is the way to go (why do you think there are so many benchmarks for the x86 PC that run on bare HW without an OS?).
My near 50 years of experience in benchmarking tells me that single CPU only tests written in assembly code normally run at full speed under an OS.
One trick to get closer to hardware speed in the presence of a full-blown multitasking time-sharing operating system is to run the benchmark multiple times and then take the minimum timing from among the trials. This mitigates the effects of Linux temporarily suspending your job while it goes out to lunch. Obviously system activity unrelated to the benchmark should also be reduced as much as possible. In particular, don't watch videos, do any web browsing or back up the system while a benchmark is running.

Even though Linux has a noticeable cache footprint, it makes sense to benchmark the system as a whole. Linux is the environment of interest for applications in high-performance computing. Moreover, there is so much parallelism in a modern computer--pipelines, cache, DMA devices, multiple cores and hardware threads--that a multitasking OS almost always increases efficiency and throughput.

Having said this, it is worth observing that users are typically interested in the wall clock or how long it actually takes the program to produce results, while programmers might be better off using more detailed metrics to tune the code. To this end were created the various performance counters on modern CPUs such as instructions per clock cycle and cache misses. Of course, wall time is always the ultimate measure of computing performance.
I would agree with that completely. If you are benchmarking a machine to look at application performance, do so under the OS of concern. I was under the impression that the HW was the concern for these benchmarks, and specifically the HW in the terms of raw processing performance, with all caches enabled, running in a sensible processor mode, as well as memory access performance, and multiprocessing performance of the hardware.

If the goal is to show performance under Linux then by all means run it in Linux and CALL IT WHAT IT IS, a benchmark of performance on the target HW UNDER LINUX kernel ver x.xxx.xxx with nnnn software environment considerations of concern (daemons that use a notable amount of CPU time, anything that affects RAM access times [like the dynamic memory management responding to page access violations to allocate more mem on the stack]); basically, include note of everything that will make a notable difference in performance from the environment in which it is running.

Then you get a good idea of application potential performance from the benchmark in a given OS. If you do not think it makes such a difference, do some testing of the same benchmarks across different Operating Systems on the same exact HW, and compare (BSD, different Linux configs, RISC OS, Plan 9, Xinu, etc), you will see more difference than you may expect (been there done that). Make sure though that you use the same exact compiler, with the same exact optimization options for each target, and minimize the effect of the standard library (only use it to log results).

jahboater
Posts: 2936
Joined: Wed Feb 04, 2015 6:38 pm

Re: Raspberry Pi Benchmarks

Thu Apr 12, 2018 6:27 am

ejolson wrote:
Wed Apr 11, 2018 3:42 pm
One trick to get closer to hardware speed in the presence of a full-blown multitasking time-sharing operating system is to run the benchmark multiple times and then take the minimum timing from among the trials. This mitigates the effects of Linux temporarily suspending your job while it goes out to lunch. Obviously system activity unrelated to the benchmark should also be reduced as much as possible.
I would hope that on a quad core Pi, the results might be more consistent - there will always be a spare core to run housekeeping tasks or interrupts. On a single core Pi the benchmark must be constantly interrupted.
Perhaps increasing the priority with "sudo nice --20" might help too.

For the timing, using "clock_gettime(CLOCK_MONOTONIC_RAW, ...)" avoids interference from NTP.
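The same clock can be reached from Python, which makes for a short illustration. The sketch below assumes a Unix-like system. CLOCK_MONOTONIC_RAW is only defined on some platforms (Linux among them), so it falls back to CLOCK_MONOTONIC where the raw clock is missing. The name elapsed_seconds is made up for the example.

```python
import time

# CLOCK_MONOTONIC_RAW is not slewed by NTP; it is only defined on some
# platforms, so fall back to CLOCK_MONOTONIC where it is missing.
CLOCK = getattr(time, "CLOCK_MONOTONIC_RAW", time.CLOCK_MONOTONIC)


def elapsed_seconds(func):
    """Time one call of func using the selected monotonic clock."""
    start = time.clock_gettime(CLOCK)
    func()
    return time.clock_gettime(CLOCK) - start


if __name__ == "__main__":
    seconds = elapsed_seconds(lambda: sum(range(1_000_000)))
    print(f"{seconds * 1000:.3f} ms")
```

A C benchmark would make the equivalent call directly, passing a struct timespec as the second argument.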

bensimmo
Posts: 3225
Joined: Sun Dec 28, 2014 3:02 pm
Location: East Yorkshire

Re: Raspberry Pi Benchmarks

Thu Apr 12, 2018 6:57 am

DavidS wrote:
Wed Apr 11, 2018 6:41 pm
... Make sure though that you use the same exact compiler, with the same exact optimization options for each target, and minimize the effect of the standard library (only use it to log results).
Why would you use the same exact optimisation options?
You would want to use the best optimisation option for the target platform/processor and not restrict it to something that may not be 'best' for a quad core 64bit Quad A53/VC4 with 1G Ram, but is best for 32bit Single A11/VC4 with 0.5G or 0.25G Ram.

Why restrict it to older platform optimisation?

ejolson
Posts: 1900
Joined: Tue Mar 18, 2014 11:47 am

Re: Raspberry Pi Benchmarks

Thu Apr 12, 2018 8:58 am

bensimmo wrote:
Thu Apr 12, 2018 6:57 am
DavidS wrote:
Wed Apr 11, 2018 6:41 pm
... Make sure though that you use the same exact compiler, with the same exact optimization options for each target, and minimize the effect of the standard library (only use it to log results).
Why would you use the same exact optimisation options?
You would want to use the best optimisation option for the target platform/processor and not restrict it to something that may not be 'best' for a quad core 64bit Quad A53/VC4 with 1G Ram, but is best for 32bit Single A11/VC4 with 0.5G or 0.25G Ram.
Many benchmarks, for example all of the programs in any of the SPEC CPU collections, are meant to reflect best efforts at solving a particular problem using given hardware. As a result, the skill of the person performing the test can be a significant factor in the final outcome. Benchmarks based on how fast a well-defined problem can be solved outlast machine architectures, programming languages and specific computer codes. Any comparison based on running the same binary or compiling the same code is, at best, useful for comparing a very narrow range of similar hardware over a short period of time.

Today's supercomputers are ranked by how fast they can solve systems of linear equations using Gaussian elimination. The same benchmark problem was used for the same purpose 40 years ago and is commonly called Linpack. The operating systems, programming languages, computing architecture, parallel, vector and GPU coprocessors have all changed. However, since Linpack is based on solving a particular problem rather than running a particular code, it has remained relevant. As an aside, best effort Linpack results for the Raspberry Pi are in the 6 to 8 GFLOPS range, which is 10 to 20 times faster than the results reported above.
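To make the problem concrete, here is a minimal sketch of Gaussian elimination with partial pivoting and the conventional Linpack operation count, 2n³/3 + 2n², used to turn a run time into MFLOPS. It is a plain, unoptimised illustration in interpreted Python, not the Linpack code itself, so the speed it prints is far below any of the compiled results in this thread. A singular matrix is not handled.

```python
import random
import time


def solve(a, b):
    """Solve a x = b in place by Gaussian elimination with partial pivoting."""
    n = len(b)
    for k in range(n):
        # Pick the row with the largest pivot to limit rounding error
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
            b[i] -= factor * b[k]
    # Back substitution on the resulting upper triangular system
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def linpack_mflops(n, seconds):
    """Conventional Linpack operation count divided by run time."""
    flops = 2.0 * n ** 3 / 3.0 + 2.0 * n ** 2
    return flops / seconds / 1e6


if __name__ == "__main__":
    n = 100  # same order as the original benchmark
    rng = random.Random(1)
    a = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    b = [sum(row) for row in a]  # exact solution is all ones
    start = time.perf_counter()
    x = solve(a, b)
    secs = time.perf_counter() - start
    print(f"max error {max(abs(v - 1.0) for v in x):.2e}")
    print(f"{linpack_mflops(n, secs):.2f} MFLOPS (interpreted Python)")
```

Because the rating depends only on the problem size and the time taken, any implementation that solves the same system can be rated the same way, which is why best effort and original-code results can differ so widely on identical hardware.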

bensimmo
Posts: 3225
Joined: Sun Dec 28, 2014 3:02 pm
Location: East Yorkshire

Re: Raspberry Pi Benchmarks

Thu Apr 12, 2018 10:14 am

I understand benchmarks and I'm not particularly a fan of single benchmarks like calculate Pi, other than just to see between similar processors. You may as well just build a dedicated Pi number cruncher processor and be done with it.
I like benchmarks like, [email protected] actual data crunching or current trend with Coin Mining, or raytracing or time to render a webpage. Stuff you actually use. I know it limits historical appeal to some people.
Aside from that ;-)


What I don't understand is why DavidS would want to see code compiled for v6 architecture optimisations being run on v7 or v8 architecture. That doesn't show you what the platform can do, other than showing how well it can handle legacy compiled code.
Why would you not compile for the enhanced FPUs, instruction enhancements, even the addition of SIMD for that matter, and actually see what it can do?
Just seems pointless.
