ejolson
Posts: 2861
Joined: Tue Mar 18, 2014 11:47 am

Re: Raspberry Pi Benchmarks

Mon Apr 01, 2019 5:30 pm

RoyLongbottom wrote:
Mon Apr 01, 2019 11:15 am
The following reference that you quoted indicates that four Raspberry Pi 3s obtain an HPL score of 3.463 GFLOPS, but neither the performance of a single 4 core node nor the versions of the benchmark packages used were mentioned. That speed is what I saw with my ATLAS version on one Pi 3 and, for the other version, I see over 5 GFLOPS.

https://www.raspberrypi.org/magpi/bench ... i-cluster/
That article appears to indicate inexperience on the part of the author as well as a lack of editorial oversight. This is somewhat unfortunate, as the MagPi serves as a definitive educational resource and the parallel Linpack as the de facto test that a cluster is functioning properly.

Maybe the article can motivate an easy challenge: use a cluster of four Raspberry Pi 3Bs to obtain around 20 GFLOPS on Linpack instead of 3.5 GFLOPS. Increasing N, tuning the configuration and making sure HPL is linked with OpenBLAS should be sufficient. I suspect most code clubs and schools have the resources to try, just not me.
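For anyone trying the challenge, a common rule of thumb (my assumption here, not something from the article) is to pick N so that the N x N matrix of doubles fills roughly 80% of the cluster's total RAM, rounded down to a multiple of the block size NB. The sketch below assumes 1 GB per Pi 3B and an illustrative NB of 128; the function name and defaults are hypothetical, and the RAM actually usable on a Pi is lower once the GPU split and the OS are accounted for.

```python
import math

def hpl_problem_size(nodes, ram_bytes_per_node, fraction=0.8, nb=128):
    """Estimate an HPL problem size N.

    Returns the largest multiple of nb such that an N x N matrix of
    8-byte doubles fits in `fraction` of the total RAM. Both `fraction`
    and `nb` are illustrative assumptions, not tuned values.
    """
    total_bytes = nodes * ram_bytes_per_node
    n = int(math.sqrt(fraction * total_bytes / 8))  # 8 bytes per double
    return (n // nb) * nb  # round down to a whole number of blocks

# Four Pi 3Bs, assuming the full 1 GB of each were available.
n = hpl_problem_size(4, 1024**3)
print("Suggested starting N:", n)
```

Treat the result as a starting point only; N, NB and the process grid still need tuning by experiment.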

RoyLongbottom
Posts: 263
Joined: Fri Apr 12, 2013 9:27 am
Location: Essex, UK
Contact: Website

Re: Raspberry Pi Benchmarks

Wed Apr 24, 2019 6:44 pm

New Multithreaded Stress Tests

I have produced some new 32 bit and 64 bit stress tests that can use up to 64 threads, each using a dedicated segment of the data. There are three varieties: Integer, Single Precision Floating Point and Double Precision Floating Point. Run time parameters include duration, data size and number of threads. Full details, with RPi 3B and 3B+ results, are included in the tar.gz and zip files at ResearchGate, which contain the benchmarks and source code.

tar.gz
https://www.researchgate.net/profile/Ro ... UpdatesLog

zip
https://www.researchgate.net/profile/Ro ... UpdatesLog
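To illustrate the idea of each thread working on its own dedicated segment of the data, here is a minimal sketch in Python. It is not the code from the download. The XOR pattern, the OR-based sumcheck and all names are assumptions chosen for illustration, and Python threads show the structure only, not the speed of the real programs.

```python
import threading

def run_segment(data, start, end, pattern, passes, results, idx):
    """Repeatedly XOR a pattern into one thread's private slice of the
    data, then record a simple checksum (OR of all words in the slice)."""
    for _ in range(passes):
        for i in range(start, end):
            data[i] ^= pattern
    check = 0
    for i in range(start, end):
        check |= data[i]
    results[idx] = check

def stress(words=4096, threads=4, pattern=0x5A5A5A5A, passes=3):
    """Split `words` into equal segments, run one thread per segment and
    report the first checksum plus whether all threads agree."""
    data = [0] * words
    seg = words // threads
    results = [None] * threads
    workers = [
        threading.Thread(
            target=run_segment,
            args=(data, t * seg, (t + 1) * seg, pattern, passes, results, t),
        )
        for t in range(threads)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    same_all = all(r == results[0] for r in results)
    return results[0], same_all

sumcheck, same = stress()
print(f"{sumcheck:08X}", "Yes" if same else "No")
```

Because no two threads touch the same indices, no locking is needed, and a checksum that differs between threads points at a data error rather than a race.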

Default use, with no run time parameters, is a benchmarking mode. Following are examples of integer and single precision floating point performance via data in caches and RAM.

Code: Select all

  MP-Integer-Test 32 Bit v1.0 Tue Apr  9 12:33:19 2019

      Benchmark 1, 2, 4, 8, 16 and 32 Threads

                   MB/second
                KB    KB    MB            Same All
   Secs Thrds   16   160    16  Sumcheck   Tests

   8.8    1   3571  3293  2045  00000000    Yes
   6.3    2   6946  6511  2152  FFFFFFFF    Yes
   5.6    4  13892 12651  1895  5A5A5A5A    Yes
   5.6    8  13247 13764  1880  AAAAAAAA    Yes
   5.5   16  13853 14034  1879  CCCCCCCC    Yes
   5.5   32  13633 13829  1908  0F0F0F0F    Yes

            End of test Tue Apr  9 12:33:56 2019


 MP-Threaded-MFLOPS 32 Bit v1.0 Tue Apr  9 12:34:25 2019

             Benchmark 1, 2, 4 and 8 Threads

                        MFLOPS          Numeric Results
             Ops/   KB    KB    MB      KB     KB     MB
  Secs  Thrd Word 12.8   128  12.8    12.8    128   12.8

   3.2    T1   2   852   847   399   40392  76406  99700
   5.5    T2   2  1641  1666   415   40392  76406  99700
   7.5    T4   2  3098  3172   415   40392  76406  99700
   9.5    T8   2  3002  3072   413   40392  76406  99700
  14.0    T1   8  1905  1932  1460   54756  85091  99820
  16.9    T2   8  3798  3837  1635   54756  85091  99820
  19.2    T4   8  7307  7460  1633   54756  85091  99820
  21.5    T8   8  7258  7613  1645   54756  85091  99820
  36.9    T1  32  2025  2049  1923   35296  66020  99519
  44.8    T2  32  4031  4072  3638   35296  66020  99519
  49.1    T4  32  7986  8025  6085   35296  66020  99519
  53.3    T8  32  7929  8081  6212   35296  66020  99519

            End of test Tue Apr  9 12:35:18 2019
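As a quick reading aid for the integer results above, the sketch below computes the speedup relative to one thread for the 16 KB (cache) and 16 MB (RAM) columns. The numbers are copied from the listing; the variable and function names are mine.

```python
# MB/second from the MP-Integer-Test listing, keyed by thread count.
cache_16kb = {1: 3571, 2: 6946, 4: 13892, 8: 13247, 16: 13853, 32: 13633}
ram_16mb = {1: 2045, 2: 2152, 4: 1895, 8: 1880, 16: 1879, 32: 1908}

def speedups(mb_per_sec):
    """Return throughput at each thread count divided by the 1-thread value."""
    base = mb_per_sec[1]
    return {threads: rate / base for threads, rate in mb_per_sec.items()}

cache_speedup = speedups(cache_16kb)
ram_speedup = speedups(ram_16mb)

for threads in cache_16kb:
    print(f"{threads:2d} threads: cache x{cache_speedup[threads]:.2f}, "
          f"RAM x{ram_speedup[threads]:.2f}")
```

The cache column scales close to the four cores available and then flattens, while the RAM column stays near the single-thread figure.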
The stress tests were run at the same time as my program that measures CPU MHz, core voltage and temperature. Following is an indication of logged stress test results on my older Pi 3B:

Code: Select all

  After  99 seconds to first data comparison error

  MP-Integer-Test 32 Bit v1.0 Sat Apr  6 15:45:41 2019

               15 Minute Stress Test

              Data                          Same All
 Seconds      Size Threads   MB/sec Sumcheck Threads

  107.4     640 KB      4      7157 5A5A5A5A  Yes
  115.8     640 KB      4      7176 5A5A5A5A  Yes
  124.2     640 KB      4      7149 5A5A5A5A  Yes
  132.3     640 KB      4      7504 5A5A5A5A  Yes
  140.8     640 KB      4      7074 5A5A5A5A  Yes
  149.2     640 KB      4      7159 5A5A5A5A  Yes
  157.4     640 KB      4      7364 AAAAAAAA  Yes
  165.3     640 KB      4      7569 AAAAAAAA  Yes
  173.6     640 KB      4      7293 AAAAAAAA  Yes
  182.0     640 KB      4      7149 AAAAAAAA  Yes
  190.2     640 KB      4      7322 AAAAAAAA  No 1
  198.3     640 KB      4      7460 AAAAAAAA  Yes


 After 315 seconds to first error

  MP-Threaded-MFLOPS 32 Bit v1.0 Thu Apr 18 10:56:22 2019

                   15 Minute Stress Test

            Data             Ops/         Numeric
 Seconds    Size  Threads    Word  MFLOPS Results     Passes

  325.7   640 KB        8      32    6812   59617      13125
  335.9   640 KB        8      32    6772   59617      13125
  346.0   640 KB        8      32    6815       0      13125
  356.1   640 KB        8      32    6790   59617      13125
The only data comparison failures that were detected were on the older Pi 3B, without the recommended “over_voltage=2” boot setting.
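For long runs it can help to pick the failing rows out of the log automatically. The sketch below is my own helper, not part of the benchmark package. It assumes the integer stress test row layout shown above and flags any row whose last column is not "Yes".

```python
def failed_integer_rows(log_text):
    """Return (seconds, sumcheck) for integer stress test rows where the
    'Same All Threads' column is not 'Yes'."""
    failures = []
    for line in log_text.splitlines():
        parts = line.split()
        # Data rows look like: 190.2 640 KB 4 7322 AAAAAAAA No 1
        if len(parts) >= 7 and parts[2] == "KB":
            try:
                seconds = float(parts[0])
            except ValueError:
                continue  # not a data row
            if parts[6] != "Yes":
                failures.append((seconds, parts[5]))
    return failures

# Three rows copied from the Pi 3B log above.
sample = """\
  182.0     640 KB      4      7149 AAAAAAAA  Yes
  190.2     640 KB      4      7322 AAAAAAAA  No 1
  198.3     640 KB      4      7460 AAAAAAAA  Yes
"""
print(failed_integer_rows(sample))
```

The floating point log has a different layout (a changed Numeric Results value rather than a Yes/No column), so it would need its own check.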

The most stressful test appeared to be the integer program run with 4 threads and 640 KB of data, which overfills the L2 cache. Running with 8 threads instead proved to be worse. In that case, performance was much faster while the data for 4 threads was in cache, but this was interrupted by swapping in the data for the other 4 threads.
