stony
Posts: 4
Joined: Sat Oct 08, 2016 5:45 pm

apt-build optimisation

Fri Oct 14, 2016 9:40 am

Hi all.

While using my new Pi3, I thought about compiling software specifically for it, because it is newer, faster and has more features than the old Pi1, yet all Raspberry Pis use the same software.

For testing I ran sysbench (from the Raspbian repo) with 1 thread and --cpu-max-prime=2000 (I did not want to spend too much time waiting).

Code:

~/ sysbench --num-threads=1 --test=cpu --cpu-max-prime=2000 --validate run                     
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1
Additional request validation enabled.


Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 2000


Test execution summary:
    total time:                          19.8116s
    total number of events:              10000
    total time taken by event execution: 19.8036
    per-request statistics:
         min:                                  1.96ms
         avg:                                  1.98ms
         max:                                  9.87ms
         approx.  95 percentile:               1.97ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   19.8036/0.00

sysbench --num-threads=1 --test=cpu --cpu-max-prime=2000 --validate run  19.80s user 0.02s system 99% cpu 19.848 total
I installed apt-build and ran it with -O2 and mtune = -mcpu=cortex-a53 -mfpu=neon-vfpv4
Result:

Code:

~/ sysbench --num-threads=1 --test=cpu --cpu-max-prime=2000 --validate run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1
Additional request validation enabled.


Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 2000


Test execution summary:
    total time:                          16.3542s
    total number of events:              10000
    total time taken by event execution: 16.3464
    per-request statistics:
         min:                                  1.60ms
         avg:                                  1.63ms
         max:                                  8.46ms
         approx.  95 percentile:               1.62ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   16.3464/0.00

sysbench --num-threads=1 --test=cpu --cpu-max-prime=2000 --validate run  16.35s user 0.02s system 99% cpu 16.396 total
Summary:

standard = 19.8 seconds
optimised = 16.3 seconds
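
To put a number on the difference, here is a minimal Python sketch of the arithmetic. It uses only the two "total time" values reported by sysbench above; the variable names are my own.

```python
# Total times reported by the two sysbench runs above, in seconds.
standard_s = 19.8116
optimised_s = 16.3542

# Speedup: how many times faster the optimised build finished.
speedup = standard_s / optimised_s

# Relative reduction in run time, as a fraction of the standard run.
time_saved = (standard_s - optimised_s) / standard_s

print(f"speedup: {speedup:.2f}x")
print(f"run time reduced by {time_saved:.1%}")
```

That works out to roughly 1.21x, or about 17.5% less run time.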


I rebuilt with -O3, but there was no difference.
Each build was run 5-10 times to make sure the results are consistent.

So, what should I do now to confirm this result? Is there a benchmark or something else that I should run, rebuild and re-run?

Regards
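
Comparing repeated runs is easier with a mean and a spread than by eye. The following is only a sketch, assuming the sysbench 0.4.12 report format shown above; the helper names `total_time` and `summarise` are hypothetical and not part of sysbench.

```python
import re
import statistics

# Matches the "total time:" line of a sysbench 0.4.12 report, e.g.
# "    total time:                          19.8116s"
# It does not match "total time taken by event execution: ...".
TOTAL_TIME_RE = re.compile(r"^\s*total time:\s+([0-9.]+)s\s*$", re.MULTILINE)


def total_time(output: str) -> float:
    """Return the total run time in seconds from one sysbench report."""
    match = TOTAL_TIME_RE.search(output)
    if match is None:
        raise ValueError("no 'total time:' line found")
    return float(match.group(1))


def summarise(outputs: list[str]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) of the total times."""
    times = [total_time(text) for text in outputs]
    spread = statistics.stdev(times) if len(times) > 1 else 0.0
    return statistics.mean(times), spread
```

Save the output of each run as a string, pass the list to `summarise` once per build, and compare the two means against their spreads.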

jahboater
Posts: 5759
Joined: Wed Feb 04, 2015 6:38 pm
Location: West Dorset

Re: apt-build optimisation

Fri Oct 14, 2016 12:04 pm

Code:

-mfpu=neon-fp-armv8

is (slightly) better for the Pi3, though it probably won't help with sysbench.

This benchmark is a bit limited because the performance is dominated by the division/remainder operation. All arithmetic in the benchmark is 64-bit. Both CPUs are 32-bit or operating in 32-bit mode. Were you to run the Pi3 in 64-bit mode (which has a 64-bit integer division instruction), you would see a large improvement: 15 times faster or more.

What you are seeing are just the small improvements in the 32-bit cortex-a53 instruction set.
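
To illustrate why the remainder operation dominates, here is a rough Python sketch of a trial-division prime search of the kind the sysbench CPU test performs. This is my reading of the loop, not a copy of the sysbench source: the function name and the operation counter are my own, and Python integers do not model the 64-bit arithmetic of the C code.

```python
import math


def cpu_test_pass(max_prime: int) -> tuple[int, int]:
    """Trial-division prime search that counts remainder operations.

    Returns (primes_found, remainder_ops) for candidates 3..max_prime-1.
    """
    primes_found = 0
    remainder_ops = 0
    for candidate in range(3, max_prime):
        limit = math.isqrt(candidate)
        divisor = 2
        while divisor <= limit:
            remainder_ops += 1  # one remainder operation per trial divisor
            if candidate % divisor == 0:
                break  # found a factor, so the candidate is not prime
            divisor += 1
        if divisor > limit:
            primes_found += 1  # no divisor up to sqrt(candidate) matched
    return primes_found, remainder_ops
```

Almost all of the work in the inner loop is the `%` operation, so how fast the CPU can do a 64-bit remainder decides the result.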

stony
Posts: 4
Joined: Sat Oct 08, 2016 5:45 pm

Re: apt-build optimisation

Fri Oct 14, 2016 8:18 pm

jahboater wrote:
Code:

-mfpu=neon-fp-armv8

is (slightly) better for the Pi3, though it probably won't help with sysbench.

I will change it :) Thanks!

jahboater wrote:
This benchmark is a bit limited because the performance is dominated by the division/remainder operation. All arithmetic in the benchmark is 64-bit. Both CPUs are 32-bit or operating in 32-bit mode. Were you to run the Pi3 in 64-bit mode (which has a 64-bit integer division instruction), you would see a large improvement: 15 times faster or more.

If someone converts Raspbian to 64-bit, I will follow, but I am no technician, so I will just crash my system with smaller optimisations :lol:

jahboater wrote:
What you are seeing are just the small improvements in the 32-bit cortex-a53 instruction set.

Hm... if you call 20% "just the small improvements"... other guys set the voltage to 1.4 V to get this result.

Is there a way to check the "improvement" in real-world situations, like starting fluxbox or using a squid proxy server?
