ejolson
Posts: 1641
Joined: Tue Mar 18, 2014 11:47 am

Re: Raspberry Pi Benchmarks

Thu Apr 12, 2018 3:06 pm

bensimmo wrote:
Thu Apr 12, 2018 10:14 am
That doesn't show you what the platform can do, other than only showing how well it can handle legacy compiled code.
I see there are two different ideas here, both with the goal of comparing and increasing performance. In one case you consider a particular hardware configuration and tune the software until it solves a given problem as fast as possible. This is the best-effort benchmarking scenario previously discussed. In the other case you consider a particular item of (compiled legacy) software and tune the hardware until it runs that code as fast as possible. This is sometimes called overclocking instead of benchmarking. Thus, in benchmarking you tune the software while keeping the hardware the same, while in overclocking you tune the hardware while keeping the software the same.

Tuning the hardware to run a particular piece of software as fast as possible is perhaps more common (and useful) than traditional benchmarking. The reason for this is the vast amount of precompiled binary code available, both commercially produced for purchase and free as in beer. For example, since the Raspbian user land has been compiled for ARMv6 compatibility, it is possible to overclock the ARMv8 cores on the Pi 3B and Pi 3B+ and still correctly execute all of the user land. By some accounts the stock clock settings for the 3B and 3B+ have already been tuned in this way, but never mind that.

If the hardware has been tuned to execute a particular set of binaries fast, it may crash or produce incorrect results for differently optimized code that was not taken into account during the overclocking. However, the usage scenario in which the binary executables are fixed is so common that overclocking might well be the more important type of performance tuning and grounds for comparison, at least in the short term.

bensimmo
Posts: 2888
Joined: Sun Dec 28, 2014 3:02 pm
Location: East Yorkshire

Re: Raspberry Pi Benchmarks

Thu Apr 12, 2018 3:32 pm

I've never heard the term overclocking used in that context.
To me that is taking the hardware, say a Pi 1, a Pi 3, an Intel P5 i4460K or a 6502, and running it beyond its default designated safe parameters.

But if overclocking means taking an ARMv6-optimised FFT and pi calculation, running it on an ARMv8 and seeing how fast it performs, then so be it.

jahboater
Posts: 2709
Joined: Wed Feb 04, 2015 6:38 pm

Re: Raspberry Pi Benchmarks

Thu Apr 12, 2018 4:52 pm

For me "benchmarking" and "overclocking" are two entirely different things. Benchmarking is pure "measurement", "overclocking" is performance tuning (likely beyond manufacturers spec). The two are only related because benchmarking is required to assess the effectiveness of an overclock.

ejolson
Posts: 1641
Joined: Tue Mar 18, 2014 11:47 am

Re: Raspberry Pi Benchmarks

Fri Apr 13, 2018 12:57 pm

jahboater wrote:
Thu Apr 12, 2018 4:52 pm
For me "benchmarking" and "overclocking" are two entirely different things. Benchmarking is pure "measurement", "overclocking" is performance tuning (likely beyond manufacturers spec). The two are only related because benchmarking is required to assess the effectiveness of an overclock.
Given the accuracy with which parts are speed graded, overclocking almost always breaks the hardware in some way. The only reason it works is that the sequences of instructions that fail due to the overclock don't happen to be used in whatever specific software (usually a game) is under consideration.

Benchmarking in general requires tuning the software (or writing new code) for the computer being measured. Many websites that run things like the Sandra CPU test and Doom on a bunch of PCs are not answering the question of how fast a particular computer can solve a particular problem, but rather trying to monetize some sort of online advertising.

ejolson
Posts: 1641
Joined: Tue Mar 18, 2014 11:47 am

Re: Raspberry Pi Benchmarks

Tue Apr 24, 2018 12:12 am

I'm posting a link to a Linpack benchmarking run showing the Pi 3B+ can achieve 6.718 Gflops, which is hopefully closer to a best-effort result than the 605 Mflops reported earlier in this thread. From what I can tell, Roy's results indicate how well modern computers run historical benchmark codes without specialized tuning. Such results are important, in my opinion, because a significant percentage of numerical codes used in production actually fall into this category.

I'm posting here because current usage of the term Linpack generally refers to best-effort attempts to solve problems scaled to the size of available memory using carefully optimized code. As these forum posts show up early on web searches, I believe it is important to include results for comparison which reflect the performance of the Pi 3B+ following the practices currently used for other computers.

I'm also posting here in the hope that someone interested in benchmarks might verify whether 6.718 Gflops really reflects a best-effort Linpack result for the Pi 3B+. In particular, based on relative clock speeds I had expected a speed of over 7 Gflops and would appreciate it if someone else could verify my results.
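For what it is worth, the arithmetic behind that expectation can be written out explicitly; the Pi 3B baseline used below is only an illustrative assumption, not a figure measured in this thread.

Code: Select all

/* Rough arithmetic behind the "over 7 Gflops" expectation.  The Pi 3B
   baseline below is only an illustrative assumption, not a measured
   figure from this thread. */
#include <stdio.h>

int main(void)
{
    double baseline_3b = 6.2;               /* assumed 3B best-effort Linpack Gflops */
    double clock_ratio = 1400.0 / 1200.0;   /* 3B+ / 3B CPU clock */
    double expected = baseline_3b * clock_ratio;
    double measured = 6.718;                /* 3B+ result reported above */

    printf("clock-scaled expectation: %.2f Gflops\n", expected);
    printf("measured 3B+ result: %.3f Gflops (%.0f%% of expectation)\n",
           measured, 100.0 * measured / expected);
    return 0;
}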

RoyLongbottom
Posts: 218
Joined: Fri Apr 12, 2013 9:27 am
Location: Essex, UK
Contact: Website

Re: Raspberry Pi Benchmarks

Thu Apr 26, 2018 5:06 pm

Raspberry Pi 3B+ Memory Benchmarks

Full details of the 32 bit and 64 bit memory benchmarks (and single core tests) are available at ResearchGate in Raspberry Pi 3B+ 32 Bit and 64 Bit Benchmarks and Stress Tests.pdf, from the following link (then click on the down arrow to select download):

https://www.researchgate.net/publicatio ... ress_Tests

This includes 3B+ comparisons with the older Model 3B and 64 bit versus 32 bit performance. The latter is repeated below for the newer processor (the 3B is similar). The 3B+/3B performance is essentially proportional to the respective CPU MHz speeds where data from the caches is processed, but the 3B+ is often shown to be slightly slower with RAM data transfers. The benchmarks are as follows; most repeatedly double the data size used, to cover the caches and RAM, with performance measured in megabytes per second. Example full results and comparisons are provided below.

MemSpeed - carries out the calculations shown in the following, the first being of the same format as the Linpack benchmark time dependent function (a minimal sketch of these loops follows the results below). Maximum MFLOPS are also shown for these, plus MFLOPS/MHz ratios, these being higher than those for Linpack, mainly due to the smoother data flow and partly to using L1 cache based results. The best 64 bit performance gains were with double precision floating point, but one result indicates that the older RPi was faster using RAM based data.

Code: Select all

            Memory Reading Speed Test vfpv4 32 Bit Version 1

  Memory   x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]         x[m]=y[m]
  KBytes    Dble   Sngl  Int32   Dble   Sngl  Int32   Dble   Sngl  Int32
    Used    MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S   MB/S
                                                                          3B+/3B
           Raspberry Pi 3B+ CPU 1400 MHz, SDRAM ?                        Avg Gain

       8    1899   2125   4041   2783   2624   4448   3164   3693   3693  1.17 L1
      16    1901   2128   4058   2791   2628   4462   3177   3703   3707
      32    1852   2049   3817   2686   2508   4161   3186   3715   3711
      64    1796   1959   3574   2542   2367   3855   2945   3347   3347  1.16 L2
     128    1826   1989   3741   2600   2408   4031   3042   3506   3508
     256    1833   1995   3771   2617   2414   4068   2860   3616   3617
     512    1517   1618   2587   2039   1911   2687   2459   2825   2832
    1024     968   1098   1221   1172   1140   1211   1455   1144   1137  0.98 RAM 
    2048     911    980   1060   1038   1026   1062   1013    941    935
    4096     913    993   1064   1047   1038    948    992    902    903
    8192     926   1013   1077   1074   1065   1085    782    784    783

 Max MFLOPS  238    532
    Per MHz 0.17   0.38
    64 bit  0.43   0.52

 #################### Compare 64 bit / 32 bit Pi 3B+ ######################

       8    2.54   1.36   1.08   2.22   1.51   1.09   1.70   1.17   1.17
     256    2.12   1.39   1.05   1.86   1.53   1.06   1.71   1.13   1.13
    8192    0.71   1.19   1.17   1.14   1.03   1.17   1.29   1.38   1.38

#######################################################################
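For readers who have not seen the benchmark, here is a minimal sketch of the three reading loops named in the table header and of how MB/s and MFLOPS figures can be derived from them. It is an illustration only, not Roy's source code (his actual loop appears in a later post), and the MB/s accounting is an assumption that may differ from his.

Code: Select all

/* Minimal sketch of the three MemSpeed-style reading loops from the table
   header: x[m] = x[m] + s*y[m],  x[m] = x[m] + y[m]  and  x[m] = y[m].
   Illustrative only - not Roy's source.  MB/s accounting here (3 doubles
   moved per element for the first two loops) is one possible convention.
   MFLOPS/MHz is simply MFLOPS divided by the CPU clock, e.g. /1400.
   Build: gcc -O3 memspeed_sketch.c */
#include <stdio.h>
#include <time.h>

#define N (1024 * 1024)
double x[N], y[N];

static double seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + 1e-9 * ts.tv_nsec;
}

int main(void)
{
    double s = 0.999999;
    for (int m = 0; m < N; m++) { x[m] = 1.0; y[m] = 1.0; }

    double t0 = seconds();
    for (int m = 0; m < N; m++) x[m] = x[m] + s * y[m];   /* Linpack-style */
    double t1 = seconds();
    for (int m = 0; m < N; m++) x[m] = x[m] + y[m];       /* add */
    double t2 = seconds();
    for (int m = 0; m < N; m++) x[m] = y[m];              /* copy */
    double t3 = seconds();

    double mb = 3.0 * N * sizeof(double) / 1e6;
    printf("x[m]=x[m]+s*y[m] %7.0f MB/s %7.1f MFLOPS\n",
           mb / (t1 - t0), 2.0 * N / (t1 - t0) / 1e6);
    printf("x[m]=x[m]+y[m]   %7.0f MB/s\n", mb / (t2 - t1));
    printf("x[m]=y[m]        %7.0f MB/s\n",
           2.0 * N * sizeof(double) / 1e6 / (t3 - t2));
    printf("check x[0] = %f\n", x[0]);   /* stops the loops being optimised away */
    return 0;
}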
NeonSpeed - executes the same functions as MemSpeed, but with all floating point calculations using single precision floating point (for compatibility with NEON). Some normal calculations are also included for comparison purposes. The NEON calculations are carried out using NEON intrinsic functions, but the latest compilers convert these into more appropriate vector instructions. This leads to little difference between 32 bit and 64 bit speed, the former being faster in one case. For some reason, 32 bit normal calculations were faster than in MemSpeed, and maximum NEON MFLOPS per MHz were significantly higher.

Code: Select all

   NEON SP Float & Integer Benchmark RPi 3B+ 64 Bit

  Memory  Float v=v+s*v  Int v=v+v+s   Neon v=v+v    3B+/3B
  KBytes   Norm   Neon   Norm   Neon  Float    Int  Avg Gain

      16   2724   5109   3961   4841   5446   5607  1.16 L1
      32   2612   4645   3726   4450   4968   5036 
      64   2523   4247   3540   4150   4521   4519  1.16 L2
     128   2583   4363   3666   4253   4616   4635
     256   2576   4314   3674   4254   4591   4631
     512   1852   2871   2608   2466   2916   2698
    1024   1222   1207   1305   1179   1280   1216  1.08 RAM
    4096   1157   1144   1214   1109   1181   1160
   16384   1175   1245   1244   1134   1191   1180
   65536   1143   1258   1185    909   1144   1260

Max MFLOPS  681   1277
  Per MHz  0.49   0.91
  32 Bit   0.57   0.84

 #################### Compare 64 bit / 32 bit Pi 3B+ ######################

      16   0.86   1.10   0.99   0.99   1.05   1.02
     256   0.88   1.07   0.98   1.01   1.06   1.00
   65536   0.85   0.94   0.88   0.90   0.91   0.93
 
 #######################################################################
BusSpeed - is designed to identify data being read in bursts over the buses, and the possible maximum data transfer speed from RAM (using 1 core - see the MP version). The program starts by reading a word (4 bytes) with an address increment of 32 words (128 bytes) before reading another word. The increment is halved on successive tests, until all data is read (a minimal sketch of this access pattern follows the results below). Data is read using inner loops containing 64 AND statements, which appear to generate essentially the same code for 32 bit and 64 bit compilations, with only 32 bit data words being used. Surprisingly, the 64 bit version produced slow speeds when reading all data from what should be L1 cache.

Code: Select all

                    BusSpeed 64 Bit  
                                                    
  Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read  3B+ Gain
  KBytes  Words  Words  Words  Words  Words    All  Read All

      16   3823   4251   4638   4945   5045   3854  1.15 L1
      32   1543   1677   2423   3331   4152   3680
      64    672    694   1306   2169   3300   3577  1.17 L2
     128    635    648   1211   2055   3202   3604
     256    600    615   1163   1971   3152   3612
     512    328    278    695   1272   2256   2978
    1024     94    140    281    543    960   2075  1.12 RAM
    4096     99    128    259    448   1016   1931
   16384    125    129    258    500    898   1863
   65536    125    114    257    500   1015   1898

 #################### Compare 64 bit / 32 bit Pi 3B+ ######################

      16   1.02   1.03   0.98   1.00   0.99   0.76
     256   0.96   0.97   1.00   0.96   0.99   0.90
   65536   0.99   0.88   1.02   1.02   1.01   1.10

 #######################################################################
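A minimal sketch of the strided reading pattern described above, reading one 32 bit word every inc words and halving inc on each pass; illustrative only, with a short AND accumulation standing in for the real benchmark's 64-statement inner loops.

Code: Select all

/* Sketch of the BusSpeed-style strided read: read one 32-bit word every
   'inc' words, halving 'inc' each pass (32, 16, 8, 4, 2, 1 = read all).
   Illustrative only.  MB/s counts bytes actually read. */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>

int main(void)
{
    size_t words = 16 * 1024 * 1024;              /* 64 MB of 32-bit words */
    uint32_t *data = malloc(words * sizeof *data);
    if (!data) return 1;
    for (size_t i = 0; i < words; i++) data[i] = (uint32_t)i | 1;

    for (size_t inc = 32; inc >= 1; inc /= 2) {
        uint32_t and_all = 0xffffffffu;
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t i = 0; i < words; i += inc)
            and_all &= data[i];                   /* one strided read */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double secs = (t1.tv_sec - t0.tv_sec) + 1e-9 * (t1.tv_nsec - t0.tv_nsec);
        double mb = (double)(words / inc) * sizeof(uint32_t) / 1e6;
        printf("inc %2zu words: %7.0f MB/s (and=%08x)\n", inc, mb / secs, and_all);
    }
    free(data);
    return 0;
}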
Fast Fourier Transforms - There are two FFT benchmarks, the second one benefiting from being optimised to make better use of burst data transfers, as the procedures otherwise depend on skipped sequential access. FFT sizes vary between 1K and 1024K, covering caches and RAM. Three copies are run using both single and double precision data, the middle ones being used here as the best choice, given the varying millisecond running times. Because of the latter, 3B/3B+ comparisons are not as constant as for the other benchmarks, this being reflected in the different 64/32 bit comparisons provided below.

With running times of the smaller FFTs being less than a millisecond, those for the first few measurements can be inflated when the CPU MHz scaling governor is set to ondemand. The performance setting is required to produce more acceptable results (a sketch of selecting it from a program follows the FFT results below). An example is shown below.

Code: Select all

                 FFT Benchmarks 
   
    Size  -------- milliseconds --------
       K  Single  Double  Single  Double

                 scaling_governor
             performance      ondemand
  
       1    0.17    0.14    0.40    0.14
       2    0.38    0.32    0.93    0.32
       4    1.07    0.77    1.97    0.75
       8    2.13    1.89    4.64    1.76
      16    4.57    5.83    4.47    5.83 



 #################### Compare 64 bit / 32 bit ######################

                RPi3           RPi3B+ 
        K  Single  Double  Single  Double 
  FFT1
   1 to 8    1.05    0.86    1.06    0.90
  16 to 128  1.17    0.83    1.14    1.06
 256 to 1M   1.26    0.88    1.58    1.13

 FFT3C
   1 to 8    1.24    0.89    1.17    0.88
  16 to 128  1.05    1.04    1.15    1.17
 256 to 1M   1.14    1.01    1.26    1.16

 #######################################################################
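Relating to the scaling governor point above, a minimal sketch of switching the governor to performance from a program by writing to the standard cpufreq sysfs files (needs root; on the Pi the four cores normally share one policy, but the loop covers each cpu anyway).

Code: Select all

/* Minimal sketch: set the cpufreq scaling governor to "performance"
   before timing very short tests.  Needs root.  Uses the standard
   cpufreq sysfs files. */
#include <stdio.h>

int main(void)
{
    for (int cpu = 0; cpu < 4; cpu++) {
        char path[96];
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_governor", cpu);
        FILE *f = fopen(path, "w");
        if (!f) { perror(path); continue; }
        fputs("performance\n", f);
        fclose(f);
    }
    return 0;
}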

jahboater
Posts: 2709
Joined: Wed Feb 04, 2015 6:38 pm

Re: Raspberry Pi Benchmarks

Thu Apr 26, 2018 5:28 pm

RoyLongbottom wrote:
Thu Apr 26, 2018 5:06 pm
NeonSpeed - executes the same functions as MemSpeed, but with all floating point calculations using single precision floating point (for compatibility with NEON).
NEON happily does double precision by the way (even VFP did), or have I missed something?

RoyLongbottom
Posts: 218
Joined: Fri Apr 12, 2013 9:27 am
Location: Essex, UK
Contact: Website

Re: Raspberry Pi Benchmarks

Thu Apr 26, 2018 5:55 pm

jahboater wrote:
Thu Apr 26, 2018 5:28 pm
RoyLongbottom wrote:
Thu Apr 26, 2018 5:06 pm
NeonSpeed - executes the same functions as MemSpeed, but with all floating point calculations using single precision floating point (for compatibility with NEON).
NEON happily does double precision by the way (even VFP did), or have I missed something?
I could not find any intrinsics when I wrote the program. See the following, which has no sign of double or f64:

https://gcc.gnu.org/onlinedocs/gcc-4.8. ... nsics.html

jahboater
Posts: 2709
Joined: Wed Feb 04, 2015 6:38 pm

Re: Raspberry Pi Benchmarks

Thu Apr 26, 2018 7:00 pm

I don't know why there are no intrinsics for it. Perhaps because there are only two lanes - who knows.
Your link points to a GCC 4.8 document, which is very old - the current version of GCC is 7.3, with 8.1 due to be released next week.

The ARM intrinsics document mentions double precision:
http://infocenter.arm.com/help/topic/co ... cs_ref.pdf

A few seconds' look at the "ARMv8 ARM" shows that all the floating point instructions, both vector and scalar, accept double precision operands.

Code: Select all

C7.2.42 FADD (vector)
Floating-point Add (vector). This instruction adds corresponding vector elements in the two source SIMD&FP
registers, writes the result into a vector, and writes the vector to the destination SIMD&FP register. All the values
in this instruction are floating-point values.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results
in either a flag being set in FPSR or a synchronous exception being generated. For more information, see
Floating-point exceptions and exception traps on page D1-1899.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state
and Exception level, an attempt to execute the instruction might be trapped.

Half-precision variant (ARMv8.2)
  [encoding bit-pattern diagram omitted]
  FADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
  Decode for this encoding
    if !HaveFP16Ext() then UnallocatedEncoding();
    integer d = UInt(Rd);
    integer n = UInt(Rn);
    integer m = UInt(Rm);
    integer esize = 16;
    integer datasize = if Q == '1' then 128 else 64;
    integer elements = datasize DIV esize;
    boolean pair = (U == '1');

Single-precision and double-precision variant
  [encoding bit-pattern diagram omitted]
  FADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
  Decode for this encoding
    integer d = UInt(Rd);
    integer n = UInt(Rn);
    integer m = UInt(Rm);
    if sz:Q == '10' then ReservedValue();
    integer esize = 32 << UInt(sz);
    integer datasize = if Q == '1' then 128 else 64;
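On the missing intrinsics: the 32 bit ARM NEON intrinsic set has no double precision vector operations, but the AArch64 arm_neon.h does provide them (float64x2_t, vld1q_f64, vfmaq_f64 and so on). A minimal sketch, assuming a 64 bit compiler:

Code: Select all

/* Sketch only: double-precision NEON intrinsics exist in the AArch64
   arm_neon.h, though not in the 32-bit ARMv7 intrinsic set that the
   gcc 4.8 page documents.  Compile 64-bit, e.g. gcc -O3 on arm64. */
#include <arm_neon.h>
#include <stdio.h>

void triad_f64(double *x, const double *y, double s, int n)
{
    float64x2_t vs = vdupq_n_f64(s);
    for (int i = 0; i < n; i += 2) {            /* two doubles per vector */
        float64x2_t vx = vld1q_f64(x + i);
        float64x2_t vy = vld1q_f64(y + i);
        vx = vfmaq_f64(vx, vy, vs);             /* x + y*s, fused multiply-add */
        vst1q_f64(x + i, vx);
    }
}

int main(void)
{
    double x[4] = {1, 2, 3, 4}, y[4] = {1, 1, 1, 1};
    triad_f64(x, y, 0.5, 4);
    printf("%g %g %g %g\n", x[0], x[1], x[2], x[3]);   /* 1.5 2.5 3.5 4.5 */
    return 0;
}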

ejolson
Posts: 1641
Joined: Tue Mar 18, 2014 11:47 am

Re: Raspberry Pi Benchmarks

Fri Apr 27, 2018 6:00 am

RoyLongbottom wrote:
Thu Apr 26, 2018 5:06 pm
The latter is repeated below for the newer processor (the 3B is similar). The 3B+/3B performance is essentially proportional to the respective CPU MHz speeds where data from the caches is processed, but the 3B+ is often shown to be slightly slower with RAM data transfers.
Recently there has been a change in the default Pi 3B+ RAM settings from 500 MHz down to 450 MHz. I think the schmoo memory timings change as well between the two frequencies, so it's not clear which actually performs better in the end. Did the slower RAM data transfers for the 3B+ occur with the memory clock set to 500 MHz or 450 MHz?

jahboater
Posts: 2709
Joined: Wed Feb 04, 2015 6:38 pm

Re: Raspberry Pi Benchmarks

Fri Apr 27, 2018 7:32 am

ejolson wrote:
Fri Apr 27, 2018 6:00 am
I think the schmoo memory timings change as well between the two frequencies, so it's not clear which actually performs better in the end.
Yes. At 450 MHz schmoo is off. If you set 500 MHz by hand, schmoo gets set to
sdram_schmoo=0x2000020
and you cannot override it ...
Is the schmoo setting increasing the drive level or relaxing the timings?

RoyLongbottom
Posts: 218
Joined: Fri Apr 12, 2013 9:27 am
Location: Essex, UK
Contact: Website

Re: Raspberry Pi Benchmarks

Tue May 22, 2018 10:06 am

jahboater wrote:
Thu Apr 26, 2018 7:00 pm
I don't know why there are no intrinsics for it. Perhaps because there are only two lanes - who knows.
Your link points to a GCC 4.8 document, which is very old - the current version of GCC is 7.3, with 8.1 due to be released next week.

The ARM intrinsics document mentions double precision:
http://infocenter.arm.com/help/topic/co ... cs_ref.pdf

A few seconds' look at the "ARMv8 ARM" shows that all the floating point instructions, both vector and scalar, accept double precision operands.
I have disassembled my MemSpeed type benchmarks to see why I thought that NEON was inapplicable for double precision calculations. As shown in the code below, there are calculations using intrinsic functions and normal four way unrolled C code. Each of the four Vector Multiply Accumulate intrinsic statements should lead to the execution of four multiplies and four adds (a total of 16 fused multiply-accumulate operations, or 32 floating point operations, per loop pass). The C code loop has four multiplies and four adds, but the compilers might be expected to unroll this further where appropriate (they didn't - is there a parameter to force this? See the note after the code below). This led to the fastest speeds being produced by the intrinsics, using the assembly code instructions shown below.

Code: Select all

                         Program Code

   NEON Intrinsics                    MemSpeed and NEONSpeed C Code for Compilation
{                                     Single and Double Precision
   x41 = vld1q_f32(ptrx1);
   x42 = vld1q_f32(ptrx2);          for (m=0; m<kd; m=m+inc)
   x43 = vld1q_f32(ptrx3);          {
   x44 = vld1q_f32(ptrx4);            xn[m]   = xn[m]   + sumn * yn[m];
                                      xn[m+1] = xn[m+1] + sumn * yn[m+1];
   y41 = vld1q_f32(ptry1);            xn[m+2] = xn[m+2] + sumn * yn[m+2];
   y42 = vld1q_f32(ptry2);            xn[m+3] = xn[m+3] + sumn * yn[m+3];
   y43 = vld1q_f32(ptry3);          }
   y44 = vld1q_f32(ptry4);

   z41 = vmlaq_f32(x41, y41, c4);
   z42 = vmlaq_f32(x42, y42, c4);
   z43 = vmlaq_f32(x43, y43, c4);
   z44 = vmlaq_f32(x44, y44, c4);

   vst1q_f32(ptrx1, z41);
   vst1q_f32(ptrx2, z42);
   vst1q_f32(ptrx3, z43);
   vst1q_f32(ptrx4, z44);

   ptrx1 = ptrx1 + 16;
   ptry1 = ptry1 + 16;
   ptrx2 = ptrx2 + 16;
   ptry2 = ptry2 + 16;
   ptrx3 = ptrx3 + 16;
   ptry3 = ptry3 + 16;
   ptrx4 = ptrx4 + 16;
   ptry4 = ptry4 + 16;
}
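Regarding the question of forcing further unrolling of the plain C loop: the usual things to try are the gcc unrolling options, or the per-loop pragma in newer compilers; whether they actually help here is not guaranteed, so treat the following purely as a suggestion to experiment with.

Code: Select all

/* Possible answers to the unrolling question - things to try rather than
   guaranteed improvements:
     gcc -O3 -funroll-loops ...       unroll loops with known trip counts
     gcc -O3 -funroll-all-loops ...   unroll even when the count is unknown
   GCC 8 and later also accept a per-loop pragma (not in gcc 4.9 or 6),
   shown commented out below.  Whether extra unrolling actually helps here
   depends on the compiler version and its cost model. */
void triad(float *xn, const float *yn, float sumn, int kd)
{
    /* #pragma GCC unroll 8 */
    for (int m = 0; m < kd; m++)
        xn[m] = xn[m] + sumn * yn[m];
}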
NeonSpeed - At 32 bit working, the Vector Multiply Accumulate intrinsics were directly converted to NEON vmla.f32 instructions using quad word registers. The 64 bit compiler converted the intrinsics to the A64 "Floating-point fused multiply-add to accumulator" instructions, using 128 bit vector registers.

Next are the instructions generated for normal C code, using the NEON and unsafe-math compiler options at 32 bits and standard parameters at 64 bits, acting on single precision calculations. At 32 bits, a single SIMD NEON instruction is used - vfma.f32 (Vector Fused Multiply Accumulate) covering four calculations. At 64 bits, the equivalent fmla is generated. The difference in speed is apparent from using a single SIMD instruction in the loop, compared with four with the intrinsics.

Code: Select all

               NEON Speed Intrinsics 

   32 Bit                             64 Bit
   1173 MFLOPS                        1277 MFLOPS
.L75:                               .L13:
   add     r0, r3, #48                ldr     q4, [x3, -16]
   add     ip, r3, #32                add     x3, x3, 64
   add     lr, r3, #16                ldr     q3, [x3, -64]
   add     r10, r2, #48               add     x1, x1, 64
   add     r7, r2, #32                ldr     q2, [x3, -48]
   add     r4, r2, #16                ldr     q1, [x3, -32]
   vld1.32 {d24-d25}, [r3]            cmp     x3, x2
   vld1.32 {d18-d19}, [r0]            ldr     q16, [x1, -64]
   vld1.32 {d20-d21}, [ip]            ldr     q7, [x1, -48]
   vld1.32 {d22-d23}, [lr]            ldr     q6, [x1, -32]
   vld1.32 {d26-d27}, [r2]            ldr     q5, [x1, -16]
   vld1.32 {d6-d7}, [r10]             fmla    v4.4s, v0.4s, v16.4s
   vld1.32 {d30-d31}, [r7]            fmla    v3.4s, v0.4s, v7.4s
   vld1.32 {d28-d29}, [r4]            fmla    v2.4s, v0.4s, v6.4s
   vmla.f32        q9, q3, q8         fmla    v1.4s, v0.4s, v5.4s
   vmla.f32        q10, q15, q8       str     q4, [x3, -80]
   vmla.f32        q11, q14, q8       str     q3, [x3, -64]
   vmla.f32        q12, q13, q8       str     q2, [x3, -48]
   add     r1, r1, #1                 str     q1, [x3, -32]
   add     r2, r2, #64                bne     .L13
   cmp     r1, r5
   vst1.32 {d24-d25}, [r3]
   vst1.32 {d22-d23}, [lr]
   add     r3, r3, #64
   vst1.32 {d20-d21}, [ip]
   vst1.32 {d18-d19}, [r0]
   bne     .L75

 ######################################################################

                    NEON Speed Normal

   -mfpu=neon-vfpv4 
   -funsafe-math-optimizations        -march=armv8-a

   797 MFLOPS                         681 MFLOPS

   32 Bit                             64 bit
.L54:                                 .L37:
        vld1.32 {q9}, [r2]                    ldr     q0, [x0, x26]
        vld1.32 {q8}, [r3]                    add     w1, w1, 1
        add     r1, r1, #1                    ldr     q1, [x0, x28]
        add     r2, r2, #16                   cmp     w24, w1
        cmp     r1, r4                        fmla    v0.4s, v1.4s, v2.4s
        vfma.f32        q8, q9, q7            str     q0, [x0, x26]
        vst1.32 {q8}, [r3]                    add     x0, x0, 16
        add     r3, r3, #16                   bhi     .L37
        bcc     .L54
MemSpeed - Single Precision vs Double Precision - For the 32 bit version, using the NEON compiling parameter shown, NEON instructions were not generated; four scalar floating-point multiply-accumulate instructions (fmacs or fmacd) were produced instead, giving the slowest speeds.

Adding the unsafe-math parameter produced the same vfma.f32 NEON instruction as NeonSpeed for the four single precision calculations, but four vfma.f64 instructions were generated for double precision. These have NEON style mnemonics but are SISD (Single Instruction Single Data), each with data in 64 bit scalar registers.

64 Bit MemSpeed - For the four sets of calculations, the fmla vector instructions were again produced, requiring two instructions for double precision, with speed closer to that of the single precision calculations.

Code: Select all

        MemSpeed 32 Bit Single and Double Precision

   Parameters -mfpu=neon-vfpv4
   Single Precision                   Double Precision
   MFLOPS 532                         MFLOPS 238

.L45:                               .L31:
   mov     ip, r2                     fldd    d5, [r2, #-24]
   flds    s15, [r3]                  fldd    d6, [r3, #-24]
   flds    s11, [ip]                  fldd    d7, [r3, #-16]
   flds    s12, [r3, #-12]            fldd    d4, [r3, #-8]
   flds    s13, [r3, #-8]             mov     r6, r2
   flds    s14, [r3, #-4]             fmacd   d6, d5, d8
   flds    s8, [r2, #-12]             fldd    d3, [r3]
   flds    s9, [r2, #-8]              add     r2, r2, #32
   flds    s10, [r2, #-4]             fstd    d6, [r3, #-24]
   fmacs   s15, s11, s30              fldd    d6, [r2, #-48]
   fmacs   s12, s8, s30               fmacd   d7, d6, d8
   fmacs   s13, s9, s30               fstd    d7, [r3, #-16]
   fmacs   s14, s10, s30              fldd    d7, [r2, #-40]
   add     r2, r2, #16                fmacd   d4, d7, d8
   fmrs    ip, s15                    fstd    d4, [r3, #-8]
   fsts    s12, [r3, #-12]            fldd    d7, [r6]
   fsts    s13, [r3, #-8]             fmacd   d3, d7, d8
   fsts    s14, [r3, #-4]             fmrrd   r8, r9, d3
   str     ip, [r3], #16   @ float    strd    r8, [r3], #32
   cmp     r3, r6                     cmp     r3, r1
   bne     .L45                       bne     .L31

 ######################################################################

      More MemSpeed 32 Bit Single and Double Precision

   Parameters -mfpu=neon-vfpv4 -funsafe-math-optimizations
   Single Precision                   Double Precision
   MFLOPS 695                         MFLOPS 236

.L44:                               .L28:
   vld1.64 {d16-d17}, [r3:64]         fldd    d17, [r2, #-24]
   vld1.64 {d18-d19}, [r1:64]         fldd    d16, [r3, #-24]
   add     r2, r2, #1                 fldd    d18, [r3, #-16]
   add     r1, r1, #16                vfma.f64        d16, d17, d8
   cmp     r4, r2                     mov     r4, r2
   add     r3, r3, #16                fldd    d17, [r3, #-8]
   vfma.f32        q8, q9, q7         add     r2, r2, #32
   vstr    d16, [r3, #-16]            fcpyd   d19, d16
   vstr    d17, [r3, #-8]             fldd    d16, [r3]
   bhi     .L44                       fstd    d19, [r3, #-24]
                                      fldd    d19, [r2, #-48]
                                      vfma.f64        d18, d19, d8
                                      fstd    d18, [r3, #-16]
                                      fldd    d18, [r2, #-40]
                                      vfma.f64        d17, d18, d8
                                      fstd    d17, [r3, #-8]
                                      fldd    d17, [r4]
                                      vfma.f64        d16, d17, d8
                                      fmrrd   r4, r5, d16
                                      strd    r4, [r3], #32
                                      cmp     r3, r1
                                      bne     .L28

 ######################################################################

           MemSpeed 64 Bit Single and Double Precision

   Parameters - -march=armv8-a
   Single Precision                   Double Precision
   MFLOPS 726                         MFLOPS 602

.L56:                               .L34:
   ldr     q0, [x27, x0]              ldr     q5, [x2, 16]
   add     w1, w1, 1                  add     w1, w1, 1
   ldr     q1, [x23, x0]              ldr     q1, [x0, 16]
   cmp     w21, w1                    cmp     w28, w1
   fmla    v0.4s, v1.4s, v2.4s        ldr     q3, [x2], 32
   str     q0, [x27, x0]              add     x0, x0, 32
   add     x0, x0, 16                 ldr     q0, [x0, -32]
   bhi     .L56                       fmla    v1.2d, v5.2d, v2.2d
                                      fmla    v0.2d, v3.2d, v2.2d
                                      str     q1, [x0, -16]
                                      str     q0, [x0, -32]
                                      bhi     .L34
Later Compiler - I have been running my LAN/WiFi benchmark on the 3B+ and will be posting the results next. WiFi did not work with my existing Raspbian image, so I eventually downloaded Raspbian Stretch to see if WiFi worked, and it did. This includes gcc 6. Recompiling MemSpeed with this produced the same floating point instructions as the earlier compiler.

RoyLongbottom
Posts: 218
Joined: Fri Apr 12, 2013 9:27 am
Location: Essex, UK
Contact: Website

Re: Raspberry Pi Benchmarks

Thu Jun 14, 2018 12:01 pm

More Disassembly For MP-MFLOPS Benchmarks - up to 11.6 GFLOPS

The arithmetic operations executed are of the form x = (x + a) * b - (x + c) * d + (x + e) * f, with 2 and 32 operations per input data word, using 1, 2, 4 and 8 threads. Data sizes are limited to three, to use L1 cache, L2 cache and RAM, at 12.8, 128 and 12800 KB (3200, 32000 and 3200000 single precision floating point words, or half as many for double precision). Each thread uses the same calculations but accesses a different segment of the data (a minimal pthread sketch of this arrangement follows the results below). The program checks for consistent numeric results, primarily to show that all calculations are carried out.
This benchmark was intended to demonstrate near maximum throughput using single precision floating point calculations. It nearly did on an Intel Core i7 CPU, compiled with gcc under Linux, obtaining 23 out of a peak 32 MFLOPS/MHz with SSE instructions (4 cores, quad word registers, linked multiply and add). The same arrangement (I believe) also applies to the ARM Cortex-A53, where, with the same efficiency, a Raspberry Pi 3B at 1200 MHz would be expected to achieve 27600 MFLOPS, and a 3B+ 29900 MFLOPS at 1300 MHz. For ARM, and probably Intel, as shown below, 20 instructions could be executed at full speed, with 12 at half speed, nearly corresponding with the 72% (23*100/32) efficiency obtained with Intel.
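To make that arithmetic explicit, the calculation the 27600 and 29900 MFLOPS figures appear to follow is written out below, using the clock speeds quoted in the paragraph; treating the Cortex-A53 as capable of a four lane single precision fused multiply-add per core per cycle is the assumption carried over from the text.

Code: Select all

/* Where the expected figures above come from (my reading of the text):
   a 4-core CPU doing a 4-lane single precision fused multiply-add per
   core per cycle peaks at 32 flops/cycle; the i7 achieved 23 of those,
   i.e. about 72% efficiency.  Applying the same 23 MFLOPS/MHz to the
   quoted clock speeds gives the MFLOPS estimates in the paragraph. */
#include <stdio.h>

int main(void)
{
    double peak_per_mhz = 32.0;         /* 4 cores x 4 lanes x (mul+add) */
    double achieved_per_mhz = 23.0;     /* measured on the Core i7 */
    printf("efficiency      : %.0f%%\n", 100.0 * achieved_per_mhz / peak_per_mhz);
    printf("Pi 3B  estimate : %.0f MFLOPS\n", achieved_per_mhz * 1200.0);  /* 27600 */
    printf("Pi 3B+ estimate : %.0f MFLOPS\n", achieved_per_mhz * 1300.0);  /* 29900 */
    return 0;
}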

Single precision and double precision Raspberry Pi 3B+ MFLOPS results are shown below for the existing compiled 32 bit and 64 bit benchmarks and one that uses single precision NEON intrinsic functions, then those from a new compilation using gcc 7. None achieve the levels of performance suggested above.

Performance using one and four threads is shown, along with the gain from the latter. Note that four thread performance in particular can vary significantly, sometimes influenced by higher temperatures. [On a hot day, using 4 threads, the CPU MHz reduced to 600 MHz when the temperature was indicated as exceeding 55°C.]

Code: Select all

     Raspberry Pi 3B+ MFLOPS at 32 Operations Per Data Word   

                                             NEON              
                  32 bit        64 bit      64 bit  64 bit gcc7
                  SP     DP     SP     DP     SP     SP     DP 
							 	
     1 Thread    797    798   1793   1405   2999   2800   1403 
     4 Threads  3134   3119   6981   4398  11563  10608   4492 
     4T/1T      3.93   3.91   3.89   3.13   3.86   3.79   3.20 
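For anyone wanting to reproduce the threading arrangement described earlier (each thread running the same arithmetic over its own segment of the data), a minimal pthread sketch follows; this is not the benchmark's actual harness, just the general shape, with a shorter arithmetic expression.

Code: Select all

/* Minimal sketch of the MP-MFLOPS-style threading: each thread applies
   the same arithmetic to its own contiguous segment of the data array.
   Not the benchmark's actual code - illustration only.
   Build: gcc -O3 -pthread sketch.c */
#include <pthread.h>
#include <stdio.h>

#define WORDS 3200000
#define NTHREADS 4

static float x[WORDS];

struct seg { int first, last; };

static void *work(void *arg)
{
    struct seg *s = arg;
    for (int i = s->first; i < s->last; i++)
        x[i] = (x[i] + 1.0f) * 0.5f - (x[i] + 2.0f) * 0.25f + (x[i] + 3.0f) * 0.125f;
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    struct seg seg[NTHREADS];
    for (int i = 0; i < WORDS; i++) x[i] = 0.1f;

    for (int t = 0; t < NTHREADS; t++) {
        seg[t].first = t * (WORDS / NTHREADS);
        seg[t].last  = (t + 1) * (WORDS / NTHREADS);
        pthread_create(&tid[t], NULL, work, &seg[t]);
    }
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);

    printf("x[0] = %f\n", x[0]);    /* consistency check, as in the benchmark */
    return 0;
}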
Source and assembly code for these benchmark runs is shown below. The 32 bit compilation uses 12 scalar add and multiply instructions and 10 of the fused multiply accumulate NEON type, but limited to scalar operation (SISD - Single Instruction Single Data). All the others use NEON or 64 bit vector SIMD instructions (Multiple Data), carrying out four calculations simultaneously at single precision, with 128 operations in the execution loops, or half that at double precision. Each has its own variation of fused multiply and add or subtract instructions.

In the original single precision benchmarks, the NEON version produced significantly faster performance, where the compiler converted the 32 intrinsic calculating functions into 22 instructions with fused operations and a total in-loop count of 27. Performance of the first 64 bit version was degraded by making use of only 12 vector registers for a programming function involving 23 variables, necessitating frequent load instructions. The gcc 7 compiler made use of 25 vector registers, with out of loop loads, to achieve similar performance to the hand coded NEON benchmark. Both 64 bit double precision benchmarks included the more efficient code, with out of loop data loading, but the best speed was, as expected, half that of the single precision SIMD calculations.

Code: Select all

 ######################################################################

   MP-MFLOPS on Raspberry Pi 3B+ 
   Function triadplus2

   for(i=0; i<n; i++)
   x[i] = (x[i]+a)*b-(x[i]+c)*d+(x[i]+e)*f-(x[i]+g)*h+(x[i]+j)*k
   -(x[i]+l)*m+(x[i]+o)*p-(x[i]+q)*r+(x[i]+s)*t-(x[i]+u)*v+(x[i]+w)*y;

 ######################################################################

    gcc 4.9 32 bit
   SP MFLOPS                          DP MFLOPS
   797 1t 3134 4T                     798 1T 3119 4T

.L21:                              .L21:
   flds      s23, [r3]                fldd      d17, [r1]
   fadds     s15, s8, s23             faddd     d16, d17, d2
   fadds     s24, s10, s23            faddd     d18, d17, d0
   fadds     s31, s6, s23             faddd     d25, d17, d4
   fadds     s30, s4, s23             faddd     d24, d17, d6
   fnmuls    s15, s15, s7             fnmuld    d16, d3, d16
   fadds     s29, s3, s23             faddd     d23, d17, d15
   fadds     s28, s1, s23             faddd     d22, d17, d13
   fadds     s27, s0, s23             faddd     d21, d17, d11
   vfma.f32  s15, s9, s24             faddd     d20, d17, d9
   fadds     s26, s17, s23            faddd     d19, d17, d31
   fadds     s25, s18, s23            vfma.f64  d16, d18, d1
   fadds     s24, s20, s23            faddd     d18, d17, d29
   fadds     s23, s21, s23            faddd     d17, d17, d27
   vfma.f32  s15, s5, s31             vfma.f64  d16, d25, d5
   vfma.f32  s15, s14, s30            vfms.f64  d16, d24, d7
   vfma.f32  s15, s2, s29             vfma.f64  d16, d23, d14
   vfma.f32  s15, s13, s28            vfms.f64  d16, d22, d12
   vfma.f32  s15, s16, s27            vfma.f64  d16, d21, d10
   vfma.f32  s15, s12, s26            vfms.f64  d16, d20, d8
   vfma.f32  s15, s19, s25            vfma.f64  d16, d19, d30
   vfma.f32  s15, s11, s24            vfms.f64  d16, d18, d26
   vfma.f32  s15, s22, s23            vfma.f64  d16, d17, d28
   fstmias   r3!, {s15}               fstmiad   r1!, {d16}
   cmp       r3, r2                   cmp       r1, r0
   bne       .L9                      bne      .L21
 
 ######################################################################

   gcc 6 64 bit
   SP MFLOPS                          DP MFLOPS
   1793 1T to 6981 4T                 1405 1T to 4398 4T

.L65:                               .L84:
   ldr     q16, [x2, x5]              ldr     q16, [x2, x0]
   add     w6, w6, 1                  add     w3, w3, 1
   ldr     q15, [sp, 64]              cmp     w3, w6
   cmp     w3, w6                     fadd    v15.2d, v16.2d, v14.2d
   ldr     q17, [sp, 80]              fadd    v17.2d, v16.2d, v12.2d
   ldr     q0, [sp, 112]              fmul    v15.2d, v15.2d, v13.2d
   fadd    v15.4s, v16.4s, v15.4s     fmls    v15.2d, v17.2d, v11.2d
   fmul    v15.4s, v15.4s, v17.4s     fadd    v17.2d, v16.2d, v10.2d
   ldr     q17, [sp, 96]              fmla    v15.2d, v17.2d, v9.2d
   fadd    v17.4s, v16.4s, v17.4s     fadd    v17.2d, v16.2d, v8.2d
   fmls    v15.4s, v17.4s, v0.4s      fmls    v15.2d, v17.2d, v31.2d
   ldr     q0, [sp, 128]              fadd    v17.2d, v16.2d, v30.2d
   fadd    v17.4s, v16.4s, v0.4s      fmla    v15.2d, v17.2d, v29.2d
   ldr     q0, [sp, 144]              fadd    v17.2d, v16.2d, v28.2d
   fmla    v15.4s, v17.4s, v0.4s      fmls    v15.2d, v17.2d, v0.2d
   ldr     q0, [sp, 160]              fadd    v17.2d, v16.2d, v27.2d
   fadd    v17.4s, v16.4s, v0.4s      fmla    v15.2d, v17.2d, v26.2d
   ldr     q0, [sp, 176]              fadd    v17.2d, v16.2d, v25.2d
   fmls    v15.4s, v17.4s, v0.4s      fmls    v15.2d, v17.2d, v24.2d
   ldr     q0, [sp, 192]              fadd    v17.2d, v16.2d, v23.2d
   fadd    v17.4s, v16.4s, v0.4s      fmla    v15.2d, v17.2d, v22.2d
   ldr     q0, [sp, 208]              fadd    v17.2d, v16.2d, v21.2d
   fmla    v15.4s, v17.4s, v0.4s      fadd    v16.2d, v16.2d, v19.2d
   ldr     q0, [sp, 224]              fmls    v15.2d, v17.2d, v20.2d
   fadd    v17.4s, v16.4s, v0.4s      fmla    v15.2d, v16.2d, v18.2d
   ldr     q0, [sp, 240]              str     q15, [x2, x0]
   fmls    v15.4s, v17.4s, v0.4s      add     x0, x0, 16
   ldr     q0, [sp, 256]              bcc     .L84
   fadd    v17.4s, v16.4s, v0.4s
   ldr     q0, [sp, 272]
   fmla    v15.4s, v17.4s, v0.4s
   ldr     q0, [sp, 288]
   fadd    v17.4s, v16.4s, v0.4s
   fmls    v15.4s, v17.4s, v14.4s
   fadd    v17.4s, v16.4s, v13.4s
   fmla    v15.4s, v17.4s, v12.4s
   fadd    v17.4s, v16.4s, v11.4s
   fadd    v16.4s, v16.4s, v9.4s
   fmls    v15.4s, v17.4s, v10.4s
   fmla    v15.4s, v16.4s, v8.4s
   str     q15, [x2, x5]
   add     x5, x5, 16
   bhi     .L65

 ######################################################################

   gcc6 neon
   SP MFLOPS                          C code
   2999 1T to 11563 4T              for(i=0; i<n; i=i+4)

.L41:                               {
   ldr     q1, [x1]                   x41 = vld1q_f32(ptrx1);
   ldr     q0, [sp, 64]               z41 = vaddq_f32(x41, a41);
   fadd    v18.4s, v20.4s, v1.4s      z41 = vmulq_f32(z41, b41);
   fadd    v17.4s, v22.4s, v1.4s      z42 = vaddq_f32(x41, c41);
   fadd    v0.4s, v0.4s, v1.4s        z42 = vmulq_f32(z42, d41);
   fadd    v16.4s, v24.4s, v1.4s      z41 = vsubq_f32(z41, z42);
   fadd    v7.4s, v26.4s, v1.4s       z42 = vaddq_f32(x41, e41);
   fadd    v6.4s, v28.4s, v1.4s       z42 = vmulq_f32(z42, f41);
   fadd    v5.4s, v30.4s, v1.4s       z41 = vaddq_f32(z41, z42);
   fmul    v0.4s, v0.4s, v19.4s       z42 = vaddq_f32(x41, g41);
   fadd    v4.4s, v10.4s, v1.4s       z42 = vmulq_f32(z42, h41);
   fadd    v3.4s, v12.4s, v1.4s       z41 = vsubq_f32(z41, z42);
   fadd    v2.4s, v14.4s, v1.4s       z42 = vaddq_f32(x41, j41);
   fadd    v1.4s, v8.4s, v1.4s        z42 = vmulq_f32(z42, k41);
   fmls    v0.4s, v21.4s, v18.4s      z41 = vaddq_f32(z41, z42);
   fmla    v0.4s, v23.4s, v17.4s      z42 = vaddq_f32(x41, l41);
   fmls    v0.4s, v25.4s, v16.4s      z42 = vmulq_f32(z42, m41);
   fmla    v0.4s, v27.4s, v7.4s       z41 = vsubq_f32(z41, z42);
   fmls    v0.4s, v29.4s, v6.4s       z42 = vaddq_f32(x41, o41);
   fmla    v0.4s, v31.4s, v5.4s       z42 = vmulq_f32(z42, p41);
   fmls    v0.4s, v9.4s, v1.4s        z41 = vaddq_f32(z41, z42);
   fmla    v0.4s, v4.4s, v11.4s       z42 = vaddq_f32(x41, q41);
   fmls    v0.4s, v3.4s, v13.4s       z42 = vmulq_f32(z42, r41);
   fmla    v0.4s, v2.4s, v15.4s       z41 = vsubq_f32(z41, z42);
   str     q0, [x1], 16               z42 = vaddq_f32(x41, s41);
   cmp     x1, x0                     z42 = vmulq_f32(z42, t41);
   bne     .L41                       z41 = vaddq_f32(z41, z42);
                                      z42 = vaddq_f32(x41, u41);
                                      z42 = vmulq_f32(z42, v41);
                                      z41 = vsubq_f32(z41, z42);
                                      z42 = vaddq_f32(x41, w41);
                                      z42 = vmulq_f32(z42, y41);
                                      z41 = vaddq_f32(z41, z42);
                                      vst1q_f32(ptrx1, z41);
                                      ptrx1 = ptrx1 + 4;
                                    }
 ######################################################################

   gcc 7
   SP MFLOPS                          DP MFLOPS
   2800 1T to 10608 4T                1403 1T 4492 4T

.L51:                               .L44:
   ldr     q15, [x2, x3]              ldr     q15, [x3, x2]
   add     w4, w4, 1                  add     w4, w4, 1
   cmp     w4, w6                     cmp     w4, w5
   fadd    v0.4s, v15.4s, v14.4s      fadd    v7.2d, v15.2d, v14.2d
   fadd    v17.4s, v15.4s, v12.4s     fadd    v16.2d, v15.2d, v12.2d
   fmul    v0.4s, v0.4s, v13.4s       fmul    v7.2d, v7.2d, v13.2d
   fmls    v0.4s, v17.4s, v11.4s      fmls    v7.2d, v16.2d, v11.2d
   fadd    v17.4s, v15.4s, v10.4s     fadd    v16.2d, v15.2d, v10.2d
   fmla    v0.4s, v17.4s, v9.4s       fmla    v7.2d, v16.2d, v9.2d
   fadd    v17.4s, v15.4s, v8.4s      fadd    v16.2d, v15.2d, v8.2d
   fmls    v0.4s, v17.4s, v31.4s      fmls    v7.2d, v16.2d, v31.2d
   fadd    v17.4s, v15.4s, v30.4s     fadd    v16.2d, v15.2d, v30.2d
   fmla    v0.4s, v17.4s, v29.4s      fmla    v7.2d, v16.2d, v29.2d
   fadd    v17.4s, v15.4s, v16.4s     fadd    v16.2d, v15.2d, v28.2d
   fmls    v0.4s, v17.4s, v28.4s      fmls    v7.2d, v16.2d, v27.2d
   fadd    v17.4s, v15.4s, v27.4s     fadd    v16.2d, v15.2d, v26.2d
   fmla    v0.4s, v17.4s, v26.4s      fmla    v7.2d, v16.2d, v25.2d
   fadd    v17.4s, v15.4s, v25.4s     fadd    v16.2d, v15.2d, v24.2d
   fmls    v0.4s, v17.4s, v24.4s      fmls    v7.2d, v16.2d, v23.2d
   fadd    v17.4s, v15.4s, v23.4s     fadd    v16.2d, v15.2d, v22.2d
   fmla    v0.4s, v17.4s, v22.4s      fmla    v7.2d, v16.2d, v21.2d
   fadd    v17.4s, v15.4s, v21.4s     fadd    v16.2d, v15.2d, v20.2d
   fadd    v15.4s, v15.4s, v19.4s     fadd    v15.2d, v15.2d, v18.2d
   fmls    v0.4s, v17.4s, v20.4s      fmls    v7.2d, v16.2d, v19.2d
   fmla    v0.4s, v15.4s, v18.4s      fmla    v7.2d, v15.2d, v17.2d
   str     q0, [x2, x3]               str     q7, [x3, x2]
   add     x3, x3, 16                 add     x2, x2, 16
   bcc     .L51                       bcc     .L44

RoyLongbottom
Posts: 218
Joined: Fri Apr 12, 2013 9:27 am
Location: Essex, UK
Contact: Website

Re: Raspberry Pi Benchmarks

Fri Jun 29, 2018 9:44 am

Slow 3B+ Multithreading Performance - Temperature or Power?

On running my multithreading benchmarks, I noted unusually slow performance from certain tests. The first was the Whetstone benchmark, which runs independent copies of the program using 1, 2, 4 and 8 threads. The running time should not increase much using up to 4 threads, but should be just over twice as long using 8. As shown in the example below, the 4 thread test was too slow, and this was particularly due to the long running COS test.

Code: Select all

  MP-Whetstone Benchmark armv8 64 Bit Mon Jun 18 23:09:29 2018

                    Using 1, 2, 4 and 8 Threads

      MWIPS MFLOPS MFLOPS MFLOPS   Cos   Exp     Fixpt      If  Equal
                 1      2      3  MOPS  MOPS   ## MOPS    MOPS   MOPS

 1T  1112.9  352.1  379.0  319.2  22.0  12.7 1641076.6  2722.5 1328.7
 2T  2250.7  717.5  767.4  656.4  44.5  25.5 2684285.1  5456.3 2652.7
 4T  2899.0 1342.3 1525.3 1048.1  42.6  46.1 1959513.0  4497.3 4319.1
 8T  3433.1 1654.1 1804.6 1106.4  55.2  47.8 2453184.1 10960.3 4994.0

   Overall Seconds   5.14 1T,   5.11 2T,   8.08 4T,  13.66 8T

  ## over optimised but always had little effect on overall MWIPS
A (not official) 2.5 amp power supply was used, connected via a digital meter that measures current and voltage. During the tests, this reported a constant voltage over 5 volts and a current of less than 1 amp. I suspected overheating and ran my RPiHeatMHz program at the same time, producing the results below and showing that the CPU MHz dropped to 600 MHz at the time of the slow recorded performance, although the temperature was not excessive. I carried out further tests with the system wrapped in bags of frozen food; the failures still occurred with recorded temperatures of less than 30°C.

Code: Select all

 Temperature and CPU MHz Measurement

 Start at Mon Jun 18 23:09:26 2018

 Using 40 samples at 1 second intervals

 Seconds
    0.0     1400 scaling MHz,   1400 ARM MHz, temp=55.8'C
    1.0     1400 scaling MHz,   1400 ARM MHz, temp=55.8'C
    2.2     1400 scaling MHz,   1400 ARM MHz, temp=55.8'C
    3.3     1400 scaling MHz,   1400 ARM MHz, temp=56.4'C 1T
    4.5     1400 scaling MHz,   1400 ARM MHz, temp=56.9'C
    5.7     1400 scaling MHz,   1400 ARM MHz, temp=56.9'C
    6.9     1400 scaling MHz,   1400 ARM MHz, temp=56.9'C
    8.0     1400 scaling MHz,   1400 ARM MHz, temp=57.5'C 2T
    9.2     1400 scaling MHz,   1400 ARM MHz, temp=58.0'C
   10.4     1400 scaling MHz,   1400 ARM MHz, temp=58.0'C
   11.7     1400 scaling MHz,   1399 ARM MHz, temp=59.1'C
   12.9     1400 scaling MHz,   1400 ARM MHz, temp=59.1'C 4T
   14.1     1400 scaling MHz,    600 ARM MHz, temp=59.1'C
   15.4     1400 scaling MHz,    600 ARM MHz, temp=59.1'C
   16.8     1400 scaling MHz,    600 ARM MHz, temp=58.0'C
   18.3     1400 scaling MHz,   1400 ARM MHz, temp=60.1'C
   19.6     1400 scaling MHz,   1400 ARM MHz, temp=60.7'C
   20.8     1400 scaling MHz,   1400 ARM MHz, temp=61.2'C
   22.0     1400 scaling MHz,   1400 ARM MHz, temp=61.8'C 8T
   23.3     1400 scaling MHz,    600 ARM MHz, temp=60.1'C
   24.9     1400 scaling MHz,    600 ARM MHz, temp=60.1'C
   26.4     1400 scaling MHz,   1400 ARM MHz, temp=60.7'C
   27.6     1400 scaling MHz,   1400 ARM MHz, temp=61.2'C
   To
   38.8     1400 scaling MHz,   1400 ARM MHz, temp=60.1'C
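For anyone wanting to log the same two values, a minimal sketch that reads the requested CPU clock and the SoC temperature from sysfs once a second is shown below; this is not RPiHeatMHz itself, and the "ARM MHz" column above presumably comes from the firmware rather than these files.

Code: Select all

/* Minimal sketch: log the requested CPU clock and SoC temperature once a
   second, using the standard sysfs files.  Not RPiHeatMHz itself. */
#include <stdio.h>
#include <unistd.h>

static long read_long(const char *path)
{
    FILE *f = fopen(path, "r");
    long v = -1;
    if (f) { if (fscanf(f, "%ld", &v) != 1) v = -1; fclose(f); }
    return v;
}

int main(void)
{
    for (int s = 0; s < 40; s++) {
        long khz   = read_long("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq");
        long milli = read_long("/sys/class/thermal/thermal_zone0/temp");
        printf("%3d s  %4ld scaling MHz  temp=%.1f'C\n",
               s, khz / 1000, milli / 1000.0);
        fflush(stdout);
        sleep(1);
    }
    return 0;
}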
Next, I tried using my official Pi 2 amp power supply and that seemed to be fine, but it caused the failures when the meter was included, which needs connecting using a longer wire. It also failed when just the wire extension was included.

The above benchmarks were the 64 bit variety, run via Gentoo. So next, I ran the 32 bit program via Raspbian. That ran successfully using the 2.5 amp power supply, with the meter monitoring demands. There was also no problem using the 2 amp unit, but note that the 32 bit benchmark is slightly slower than the 64 bit version. However, it did fail after including the power extension cable.

I have ordered an official 2.5 amp power supply and will see how it goes.
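To help separate power problems from temperature problems, the firmware also keeps throttling flags that can be read back with vcgencmd get_throttled; a minimal sketch follows, using the commonly documented bit meanings (treat the bit assignments as assumptions and check them against current firmware documentation).

Code: Select all

/* Sketch: query the firmware throttling flags to distinguish under-voltage
   from thermal throttling.  Bit meanings below are the commonly documented
   ones - verify against current firmware docs before relying on them. */
#include <stdio.h>

int main(void)
{
    FILE *p = popen("vcgencmd get_throttled", "r");
    unsigned flags = 0;
    if (!p || fscanf(p, "throttled=0x%x", &flags) != 1) {
        fprintf(stderr, "could not read vcgencmd get_throttled\n");
        if (p) pclose(p);
        return 1;
    }
    pclose(p);

    printf("raw flags: 0x%x\n", flags);
    printf("under-voltage now        : %s\n", (flags & 0x1)     ? "yes" : "no");
    printf("frequency capped now     : %s\n", (flags & 0x2)     ? "yes" : "no");
    printf("throttled now            : %s\n", (flags & 0x4)     ? "yes" : "no");
    printf("under-voltage occurred   : %s\n", (flags & 0x10000) ? "yes" : "no");
    printf("frequency cap occurred   : %s\n", (flags & 0x20000) ? "yes" : "no");
    printf("throttling occurred      : %s\n", (flags & 0x40000) ? "yes" : "no");
    return 0;
}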

ejolson
Posts: 1641
Joined: Tue Mar 18, 2014 11:47 am

Re: Raspberry Pi Benchmarks

Sat Jun 30, 2018 2:26 am

RoyLongbottom wrote:
Fri Jun 29, 2018 9:44 am
I carried out further tests with the system wrapped in bags of frozen food.
I hear fresh-frozen garden peas are the best kind of food to use for SOC cooling. Did you provide any protection to prevent short circuits from condensation?

davidcoton
Posts: 2999
Joined: Mon Sep 01, 2014 2:37 pm
Location: Cambridge, UK

Re: Raspberry Pi Benchmarks

Sat Jun 30, 2018 9:13 am

ejolson wrote:
Sat Jun 30, 2018 2:26 am
RoyLongbottom wrote:
Fri Jun 29, 2018 9:44 am
I carried out further tests with the system wrapped in bags of frozen food.
I hear fresh-frozen garden peas are the best kind of food to use for SOC cooling. Did you provide any protection to prevent short circuits from condensation?
Doesn't the W(h)etstone benchmark need some water? :lol:
Don't try it at home :roll:
"Thanks for saving my life." See https://www.raspberrypi.org/forums/viewtopic.php?p=1327656#p1327656
“Raspberry Pi is a trademark of the Raspberry Pi Foundation”

RoyLongbottom
Posts: 218
Joined: Fri Apr 12, 2013 9:27 am
Location: Essex, UK
Contact: Website

Re: Raspberry Pi Benchmarks

Sat Jun 30, 2018 10:28 am

[Attached image: raspi.jpg]
Did you provide any protection to prevent short circuits from condensation?
I didn't need to, as I used the above and then ran the Dhrystone Benchmark.
