
Re: neon discussion, FAO shiftplusone

Posted: Thu Mar 31, 2016 4:56 pm
by plugwash
So I cooked up a small test program: it runs a 1048576-point FFT 128 times so I can time the result.

Since fftw3 uses in-program detection rather than ld.so hwcaps, I needed to test on both the Pi 1 and the Pi 2. What shocked me was that even without NEON enabled, and even though this was a single-threaded test, the Pi 2 was about 3 times faster than the Pi 1.
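
For anyone who wants to reproduce this kind of timing, here is a minimal sketch of the idea in Python. It is not the test program from this post (that one used fftw3 and is not shown in the thread); it swaps in a plain pure-Python radix-2 FFT so it runs without any libraries. The names `fft` and `time_fft` are made up for illustration, and the demo sizes are much smaller than 1048576 x 128 because pure Python is far slower than fftw3.

```python
import cmath
import time


def fft(values):
    """Iterative radix-2 Cooley-Tukey FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = [complex(v) for v in values]

    # Bit-reversal permutation so the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            out[i], out[j] = out[j], out[i]

    # Butterfly passes over blocks of size 2, 4, 8, ... n.
    size = 2
    while size <= n:
        step = cmath.exp(-2j * cmath.pi / size)
        half = size // 2
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(start, start + half):
                odd = out[k + half] * w
                out[k + half] = out[k] - odd
                out[k] = out[k] + odd
                w *= step
        size *= 2
    return out


def time_fft(n_points, repeats):
    """Return the wall-clock seconds taken to run `repeats` FFTs of `n_points`."""
    data = [complex(i % 7, 0) for i in range(n_points)]  # arbitrary test signal
    start = time.perf_counter()
    for _ in range(repeats):
        fft(data)
    return time.perf_counter() - start


if __name__ == "__main__":
    # The post used 1048576 points and 128 repeats; scaled down here (assumption)
    # so the pure-Python version finishes quickly.
    points, repeats = 1 << 12, 8
    print(f"{repeats} x {points}-point FFT: {time_fft(points, repeats):.3f} s")
```

Running the same script on two boards gives a rough single-threaded comparison, though absolute numbers will say more about the Python interpreter than about NEON.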

Re: neon discussion, FAO shiftplusone

Posted: Sun Apr 03, 2016 3:06 pm
by plugwash
volk also seems to have had NEON disabled; I'm not sure why I missed it in the list at the start of this thread.

Re: neon discussion, FAO shiftplusone

Posted: Sun Apr 03, 2016 3:10 pm
by plugwash
OK, it looks like the only real user of volk is gnuradio, so I'm not sure whether it's worth further effort.

Re: neon discussion, FAO shiftplusone

Posted: Sun Apr 03, 2016 3:13 pm
by plugwash
Further investigation shows that volk is stretch only.

Re: neon discussion, FAO shiftplusone

Posted: Tue Apr 05, 2016 10:30 pm
by Claggy
plugwash wrote:So I cooked up a small test program: it runs a 1048576-point FFT 128 times so I can time the result.

Since fftw3 uses in-program detection rather than ld.so hwcaps, I needed to test on both the Pi 1 and the Pi 2. What shocked me was that even without NEON enabled, and even though this was a single-threaded test, the Pi 2 was about 3 times faster than the Pi 1.
I saw that fftw 3.3.4 had been updated, and have rebuilt the setiathome v8 app at rev 3433. I am currently running a bench on my Pi 2 with the Lunatics Seti v8 test workunits.
When I get a spare moment, I'll do the same on my Pi 3 and Pi 1B, although the Pi 1B still suffers from the 'Internal error: Oops - undefined instruction: 0 [#2] PREEMPT ARM' problem.

Claggy

Re: neon discussion, FAO shiftplusone

Posted: Wed Apr 06, 2016 11:02 pm
by Claggy
The bench has finally completed. There is an increase in performance of up to 7% from using the NEON-detecting fftw 3.3.4 (and fast maths), depending on the telescope angle range.
The r3236 app didn't use fast maths (I compiled it myself), the stock 8.02 did as far as I understand (it was compiled by someone else), and the r3433 one did:

KWSN-Linux-MBbench v2.1.08
Running on raspberrypi at Tue 05 Apr 2016 21:18:58 UTC
----------------------------------------------------------------
Starting benchmark run...
----------------------------------------------------------------
Listing wu-file(s) in /testWUs :
PG0009_v8.wu
PG0444_v8.wu
PG1327_v7.wu
PG1327_v8.wu

Listing executable(s) in /APPS :
setiathome_8.02_arm-unknown-linux-gnueabihf
setiathome-8.0r3433.armv7l-unknown-linux-gnueabihf

Listing executable in /REF_APPS :
setiathome-8.0r3236.armv7l-unknown-linux-gnueabihf
----------------------------------------------------------------
Current WU: PG0009_v8.wu

----------------------------------------------------------------
Skipping default app setiathome-8.0r3236.armv7l-unknown-linux-gnueabihf, displaying saved result(s)
Elapsed Time: ....................... 8076 seconds
----------------------------------------------------------------
Running app with command : .......... setiathome_8.02_arm-unknown-linux-gnueabihf -st -verb -nog
./setiathome_8.02_arm-unknown-linux-gnueabihf -st -verb -nog 8045.87 sec 7977.05 sec 54.32 sec
Elapsed Time : ...................... 8046 seconds
Speed compared to default : ......... 100 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.98%

----------------------------------------------------------------
Running app with command : .......... setiathome-8.0r3433.armv7l-unknown-linux-gnueabihf -st -verb -nog
./setiathome-8.0r3433.armv7l-unknown-linux-gnueabihf -st -verb -nog 7883.00 sec 7809.93 sec 55.79 sec
Elapsed Time : ...................... 7883 seconds
Speed compared to default : ......... 102 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.98%

----------------------------------------------------------------
Done with PG0009_v8.wu

====================================================================
Current WU: PG0444_v8.wu

----------------------------------------------------------------
Skipping default app setiathome-8.0r3236.armv7l-unknown-linux-gnueabihf, displaying saved result(s)
Elapsed Time: ....................... 8874 seconds
----------------------------------------------------------------
Running app with command : .......... setiathome_8.02_arm-unknown-linux-gnueabihf -st -verb -nog
./setiathome_8.02_arm-unknown-linux-gnueabihf -st -verb -nog 8469.55 sec 8393.75 sec 54.89 sec
Elapsed Time : ...................... 8469 seconds
Speed compared to default : ......... 104 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.97%

----------------------------------------------------------------
Running app with command : .......... setiathome-8.0r3433.armv7l-unknown-linux-gnueabihf -st -verb -nog
./setiathome-8.0r3433.armv7l-unknown-linux-gnueabihf -st -verb -nog 8238.09 sec 8159.51 sec 53.89 sec
Elapsed Time : ...................... 8238 seconds
Speed compared to default : ......... 107 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.98%

----------------------------------------------------------------
Done with PG0444_v8.wu

====================================================================
Current WU: PG1327_v7.wu

----------------------------------------------------------------
Running default app with command :... setiathome-8.0r3236.armv7l-unknown-linux-gnueabihf -st -verb -nog
./setiathome-8.0r3236.armv7l-unknown-linux-gnueabihf -st -verb -nog 10602.89 sec 10453.22 sec 129.63 sec
Elapsed Time: ....................... 10603 seconds

----------------------------------------------------------------
Running app with command : .......... setiathome_8.02_arm-unknown-linux-gnueabihf -st -verb -nog
./setiathome_8.02_arm-unknown-linux-gnueabihf -st -verb -nog 10473.90 sec 10320.13 sec 129.88 sec
Elapsed Time : ...................... 10474 seconds
Speed compared to default : ......... 101 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.95%

----------------------------------------------------------------
Running app with command : .......... setiathome-8.0r3433.armv7l-unknown-linux-gnueabihf -st -verb -nog
./setiathome-8.0r3433.armv7l-unknown-linux-gnueabihf -st -verb -nog 9905.77 sec 9753.34 sec 127.35 sec
Elapsed Time : ...................... 9906 seconds
Speed compared to default : ......... 107 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.95%

----------------------------------------------------------------
Done with PG1327_v7.wu

====================================================================
Current WU: PG1327_v8.wu

----------------------------------------------------------------
Skipping default app setiathome-8.0r3236.armv7l-unknown-linux-gnueabihf, displaying saved result(s)
Elapsed Time: ....................... 10414 seconds
----------------------------------------------------------------
Running app with command : .......... setiathome_8.02_arm-unknown-linux-gnueabihf -st -verb -nog
./setiathome_8.02_arm-unknown-linux-gnueabihf -st -verb -nog 10410.35 sec 10251.57 sec 129.66 sec
Elapsed Time : ...................... 10411 seconds
Speed compared to default : ......... 100 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.95%

----------------------------------------------------------------
Running app with command : .......... setiathome-8.0r3433.armv7l-unknown-linux-gnueabihf -st -verb -nog
./setiathome-8.0r3433.armv7l-unknown-linux-gnueabihf -st -verb -nog 10076.89 sec 9925.13 sec 129.40 sec
Elapsed Time : ...................... 10077 seconds
Speed compared to default : ......... 103 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.95%

----------------------------------------------------------------
Done with PG1327_v8.wu

====================================================================
Hosts CPU data ...
model name : ARMv7 Processor rev 5 (v7l)

Done with Benchmark run! Removing temporary files!
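
The "Speed compared to default" lines can be re-derived from the elapsed times in the log. A minimal sketch, assuming the figure is 100 x reference time / app time with the fraction truncated (the bench script's real formula isn't shown here; truncation is inferred because it reproduces every percentage in the log, whereas rounding would not). The function name `speed_percent` is made up for illustration.

```python
def speed_percent(reference_seconds, elapsed_seconds):
    """Speed relative to the reference app, as a whole percentage (truncated)."""
    return (100 * reference_seconds) // elapsed_seconds


# (work unit, r3236 reference seconds, stock 8.02 seconds, r3433 seconds),
# copied from the "Elapsed Time" lines in the log above.
RUNS = [
    ("PG0009_v8.wu", 8076, 8046, 7883),
    ("PG0444_v8.wu", 8874, 8469, 8238),
    ("PG1327_v7.wu", 10603, 10474, 9906),
    ("PG1327_v8.wu", 10414, 10411, 10077),
]

if __name__ == "__main__":
    for name, reference, stock, r3433 in RUNS:
        print(
            f"{name}: stock 8.02 = {speed_percent(reference, stock)} %, "
            f"r3433 = {speed_percent(reference, r3433)} %"
        )
```

The best case is r3433 at 107 % of the r3236 reference, which is where the "up to 7%" figure comes from.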

Thanks. :)

Claggy

Re: neon discussion, FAO shiftplusone

Posted: Thu Apr 07, 2016 5:57 pm
by plugwash
libvpx went smoothly; it should hit the repo in the next update run.

Re: neon discussion, FAO shiftplusone

Posted: Wed Apr 13, 2016 8:48 am
by Rascas
Can you do the same for libjpeg-turbo ?

Re: neon discussion, FAO shiftplusone

Posted: Thu Apr 14, 2016 10:11 pm
by cjan
plugwash wrote:libvpx went smoothly; it should hit the repo in the next update run.
After two update runs I still don't see a libvpx update?

Re: neon discussion, FAO shiftplusone

Posted: Fri Apr 15, 2016 12:47 am
by plugwash
hmm, what version do you have?

Re: neon discussion, FAO shiftplusone

Posted: Fri Apr 15, 2016 12:54 am
by cjan
plugwash wrote:hmm, what version do you have?
libvpx-dev_1.3.0-3+rvt

Re: neon discussion, FAO shiftplusone

Posted: Fri Apr 15, 2016 1:06 am
by plugwash
That's the updated version.

Re: neon discussion, FAO shiftplusone

Posted: Thu Apr 21, 2016 2:16 pm
by plugwash
Rascas wrote:Can you do the same for libjpeg-turbo ?
We never disabled neon in libjpeg-turbo (of course that doesn't mean it's working, but if it's not working it's not because we disabled it).

Re: neon discussion, FAO shiftplusone

Posted: Thu May 12, 2016 8:15 pm
by plugwash
Investigating stretch's ffmpeg, it seems to already be using internal NEON detection (no separate NEON flavour) and to be performing better than the NEON-enabled version of wheezy's libav. I conclude that no action is needed there.
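
To illustrate what in-program detection means in general: the program checks at run time whether the CPU advertises NEON and picks a code path accordingly, so one binary works on both NEON and non-NEON boards. The sketch below is not how ffmpeg or fftw3 do it (their mechanisms aren't described in this thread); it just parses the `Features` line that 32-bit ARM Linux kernels expose in `/proc/cpuinfo`. The function names and the two sample strings are illustrative assumptions.

```python
def cpu_features(cpuinfo_text):
    """Collect the flags from every 'Features' line of /proc/cpuinfo text."""
    features = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "features":
            features.update(value.split())
    return features


def has_neon(cpuinfo_text):
    """True if the kernel reports the 'neon' flag (32-bit ARM naming)."""
    return "neon" in cpu_features(cpuinfo_text)


def pick_code_path(cpuinfo_text):
    """Hypothetical dispatcher: choose an implementation name at run time."""
    return "neon" if has_neon(cpuinfo_text) else "generic"


# Illustrative samples only, not captured from the boards in this thread.
SAMPLE_WITH_NEON = "Features\t: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4\n"
SAMPLE_WITHOUT_NEON = "Features\t: half thumb fastmult vfp edsp java tls\n"

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            text = handle.read()
    except OSError:
        text = ""  # not Linux, or /proc unavailable: fall back to generic
    print("selected code path:", pick_code_path(text))
```

With ld.so hwcaps the choice is instead made by the dynamic linker, which loads a differently built copy of the library, so the library itself needs no detection code.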

Re: neon discussion, FAO shiftplusone

Posted: Thu May 12, 2016 9:15 pm
by plugwash
Just forward-ported the x264 changes to stretch. It is using ld.so hwcaps, so I don't think there is any need to run tests, and I will be uploading straight away.