-march=armv7-a -mtune=cortex-a72
Adding or omitting -mfloat-abi=hard makes no difference in the output binary as far as I've seen, since hard float is the default.
(pi64)[email protected]:~ $ openssl speed ecdh
...
OpenSSL 1.1.1c 28 May 2019
built on: Thu May 30 15:27:48 2019 UTC
options:bn(64,64) rc4(char) des(int) aes(partial) blowfish(ptr)
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -fdebug-prefix-map=/build/openssl-vZWY2W/openssl-1.1.1c=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
op op/s
160 bits ecdh (secp160r1) 0.0008s 1211.3
192 bits ecdh (nistp192) 0.0010s 1003.2
224 bits ecdh (nistp224) 0.0014s 707.6
256 bits ecdh (nistp256) 0.0003s 3813.7
384 bits ecdh (nistp384) 0.0043s 231.6
521 bits ecdh (nistp521) 0.0113s 88.7
163 bits ecdh (nistk163) 0.0012s 853.9
233 bits ecdh (nistk233) 0.0017s 580.3
283 bits ecdh (nistk283) 0.0036s 279.0
409 bits ecdh (nistk409) 0.0075s 132.7
571 bits ecdh (nistk571) 0.0157s 63.7
163 bits ecdh (nistb163) 0.0012s 812.8
233 bits ecdh (nistb233) 0.0018s 552.7
283 bits ecdh (nistb283) 0.0039s 257.3
409 bits ecdh (nistb409) 0.0084s 119.7
571 bits ecdh (nistb571) 0.0175s 57.1
256 bits ecdh (brainpoolP256r1) 0.0016s 632.2
256 bits ecdh (brainpoolP256t1) 0.0016s 629.7
384 bits ecdh (brainpoolP384r1) 0.0043s 231.3
384 bits ecdh (brainpoolP384t1) 0.0043s 233.3
512 bits ecdh (brainpoolP512r1) 0.0085s 117.5
512 bits ecdh (brainpoolP512t1) 0.0085s 118.3
253 bits ecdh (X25519) 0.0003s 3524.4
448 bits ecdh (X448) 0.0018s 566.0
(pi32)[email protected]:~/openssl-1.1.1c/build_shared/apps $ LD_LIBRARY_PATH=.. ./openssl speed ecdh
...
OpenSSL 1.1.1c 28 May 2019
built on: Thu May 30 15:27:48 2019 UTC # jdonald NB: "built on" misleading because it uses SOURCE_DATE_EPOCH
options:bn(64,32) rc4(char) des(long) aes(partial) blowfish(ptr)
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -Wa,--noexecstack -march=armv8-a+crc+simd -mtune=cortex-a72 -mfpu=neon-fp-armv8 -g -O2 -fdebug-prefix-map=/home/pi/openssl-1.1.1c=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
op op/s
160 bits ecdh (secp160r1) 0.0010s 978.6
192 bits ecdh (nistp192) 0.0015s 675.3
224 bits ecdh (nistp224) 0.0021s 479.1
256 bits ecdh (nistp256) 0.0004s 2348.4
384 bits ecdh (nistp384) 0.0078s 128.9
521 bits ecdh (nistp521) 0.0193s 51.9
163 bits ecdh (nistk163) 0.0011s 870.2
233 bits ecdh (nistk233) 0.0019s 522.3
283 bits ecdh (nistk283) 0.0034s 291.3
409 bits ecdh (nistk409) 0.0072s 139.6
571 bits ecdh (nistk571) 0.0167s 59.8
163 bits ecdh (nistb163) 0.0012s 819.2
233 bits ecdh (nistb233) 0.0021s 479.0
283 bits ecdh (nistb283) 0.0038s 266.2
409 bits ecdh (nistb409) 0.0081s 123.1
571 bits ecdh (nistb571) 0.0189s 52.9
256 bits ecdh (brainpoolP256r1) 0.0027s 371.9
256 bits ecdh (brainpoolP256t1) 0.0027s 372.4
384 bits ecdh (brainpoolP384r1) 0.0078s 128.6
384 bits ecdh (brainpoolP384t1) 0.0077s 129.3
512 bits ecdh (brainpoolP512r1) 0.0110s 91.3
512 bits ecdh (brainpoolP512t1) 0.0109s 91.7
253 bits ecdh (X25519) 0.0005s 1839.8
448 bits ecdh (X448) 0.0026s 383.9
(pi64)[email protected]:~ $ openssl speed ecdsa
...
OpenSSL 1.1.1c 28 May 2019
built on: Thu May 30 15:27:48 2019 UTC
options:bn(64,64) rc4(char) des(int) aes(partial) blowfish(ptr)
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -fdebug-prefix-map=/build/openssl-vZWY2W/openssl-1.1.1c=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
sign verify sign/s verify/s
160 bits ecdsa (secp160r1) 0.0009s 0.0007s 1135.5 1389.2
192 bits ecdsa (nistp192) 0.0011s 0.0009s 944.2 1152.4
224 bits ecdsa (nistp224) 0.0015s 0.0012s 668.7 848.8
256 bits ecdsa (nistp256) 0.0001s 0.0003s 8275.4 2871.0
384 bits ecdsa (nistp384) 0.0045s 0.0033s 220.2 302.9
521 bits ecdsa (nistp521) 0.0119s 0.0082s 84.3 121.4
163 bits ecdsa (nistk163) 0.0013s 0.0025s 798.6 404.2
233 bits ecdsa (nistk233) 0.0018s 0.0036s 545.0 274.5
283 bits ecdsa (nistk283) 0.0038s 0.0075s 263.3 132.6
409 bits ecdsa (nistk409) 0.0079s 0.0156s 127.1 64.1
571 bits ecdsa (nistk571) 0.0163s 0.0323s 61.2 30.9
163 bits ecdsa (nistb163) 0.0013s 0.0026s 761.6 385.7
233 bits ecdsa (nistb233) 0.0019s 0.0038s 518.8 260.3
283 bits ecdsa (nistb283) 0.0041s 0.0080s 246.9 124.4
409 bits ecdsa (nistb409) 0.0087s 0.0172s 115.3 58.1
571 bits ecdsa (nistb571) 0.0182s 0.0360s 54.9 27.8
256 bits ecdsa (brainpoolP256r1) 0.0017s 0.0014s 598.8 695.8
256 bits ecdsa (brainpoolP256t1) 0.0017s 0.0013s 599.9 755.6
384 bits ecdsa (brainpoolP384r1) 0.0046s 0.0035s 219.5 282.9
384 bits ecdsa (brainpoolP384t1) 0.0045s 0.0033s 221.7 305.6
512 bits ecdsa (brainpoolP512r1) 0.0089s 0.0064s 112.3 155.4
512 bits ecdsa (brainpoolP512t1) 0.0088s 0.0059s 113.1 168.8
(pi32)[email protected]:~/openssl-1.1.1c/build_shared/apps $ LD_LIBRARY_PATH=.. ./openssl speed ecdsa
...
OpenSSL 1.1.1c 28 May 2019
built on: Thu May 30 15:27:48 2019 UTC # jdonald NB: "built on" misleading because it uses SOURCE_DATE_EPOCH
options:bn(64,32) rc4(char) des(long) aes(partial) blowfish(ptr)
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -Wa,--noexecstack -march=armv8-a+crc+simd -mtune=cortex-a72 -mfpu=neon-fp-armv8 -g -O2 -fdebug-prefix-map=/home/pi/openssl-1.1.1c=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
sign verify sign/s verify/s
160 bits ecdsa (secp160r1) 0.0011s 0.0009s 910.1 1128.1
192 bits ecdsa (nistp192) 0.0016s 0.0012s 640.0 809.1
224 bits ecdsa (nistp224) 0.0022s 0.0017s 457.0 596.6
256 bits ecdsa (nistp256) 0.0002s 0.0006s 4250.7 1612.2
384 bits ecdsa (nistp384) 0.0081s 0.0057s 123.0 175.9
521 bits ecdsa (nistp521) 0.0202s 0.0136s 49.4 73.7
163 bits ecdsa (nistk163) 0.0012s 0.0024s 810.5 411.6
233 bits ecdsa (nistk233) 0.0020s 0.0039s 497.1 254.4
283 bits ecdsa (nistk283) 0.0036s 0.0072s 276.2 139.8
409 bits ecdsa (nistk409) 0.0077s 0.0149s 130.6 67.0
571 bits ecdsa (nistk571) 0.0178s 0.0348s 56.2 28.7
163 bits ecdsa (nistb163) 0.0013s 0.0025s 770.4 394.1
233 bits ecdsa (nistb233) 0.0022s 0.0042s 463.7 236.4
283 bits ecdsa (nistb283) 0.0040s 0.0078s 253.0 128.8
409 bits ecdsa (nistb409) 0.0085s 0.0166s 117.9 60.1
571 bits ecdsa (nistb571) 0.0199s 0.0391s 50.2 25.6
256 bits ecdsa (brainpoolP256r1) 0.0028s 0.0023s 354.0 441.2
256 bits ecdsa (brainpoolP256t1) 0.0028s 0.0021s 354.6 470.6
384 bits ecdsa (brainpoolP384r1) 0.0081s 0.0061s 123.1 162.8
384 bits ecdsa (brainpoolP384t1) 0.0081s 0.0057s 123.8 176.5
512 bits ecdsa (brainpoolP512r1) 0.0115s 0.0085s 86.7 117.2
512 bits ecdsa (brainpoolP512t1) 0.0114s 0.0079s 87.8 126.7
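To make the 64-bit vs 32-bit gap in the outputs above easier to read, here is a rough sketch that turns a few of the op/s figures into speedup ratios. The numbers are copied from the two `openssl speed` runs; the dictionary layout and function names are just for illustration.

```python
# op/s copied from the "openssl speed" outputs above, as (64-bit, 32-bit).
ECDH_OPS = {
    "nistp256": (3813.7, 2348.4),
    "nistp384": (231.6, 128.9),
    "nistk163": (853.9, 870.2),
    "brainpoolP256r1": (632.2, 371.9),
    "X25519": (3524.4, 1839.8),
}
ECDSA_SIGN_OPS = {
    "nistp256": (8275.4, 4250.7),
    "nistp384": (220.2, 123.0),
    "nistk163": (798.6, 810.5),
}


def speedup(pair):
    """Return 64-bit op/s divided by 32-bit op/s."""
    ops64, ops32 = pair
    return ops64 / ops32


def report(title, table):
    """Print one speedup line per curve."""
    print(title)
    for curve, pair in table.items():
        print(f"  {curve:16s} {speedup(pair):.2f}x")


report("ECDH, 64-bit vs 32-bit", ECDH_OPS)
report("ECDSA sign, 64-bit vs 32-bit", ECDSA_SIGN_OPS)
```

The prime-field curves come out roughly 1.6x to 1.9x faster on 64-bit, while the binary curve nistk163 is about level. One possible reason, judging only from the compiler lines, is that the 32-bit build has -DOPENSSL_BN_ASM_GF2m and the 64-bit build does not.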
=============================================================================================
64 BIT
=============================================================================================
time dd if=/dev/zero bs=10240 count=409600 | ssh -p 2222 [email protected] 'cat > /dev/zero'
---------------------------------------------------------------------------------------------
409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 60.6595 s, 69.1 MB/s
real 1m0.692s
user 0m35.075s
sys 0m11.436s
409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 57.0103 s, 73.6 MB/s
real 0m57.046s
user 0m34.734s
sys 0m9.792s
---------------------------------------------------------------------------------------
time ssh -p 2222 [email protected] 'dd if=/dev/zero bs=10240 count=409600' > /dev/null
---------------------------------------------------------------------------------------
409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 68.5412 s, 61.2 MB/s
real 1m8.837s
user 0m38.308s
sys 0m11.058s
409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 69.4922 s, 60.4 MB/s
real 1m9.784s
user 0m38.732s
sys 0m10.975s
=====================================================================================
64/32 BIT
=====================================================================================
time dd if=/dev/zero bs=10240 count=409600 | ssh [email protected] 'cat > /dev/zero'
-------------------------------------------------------------------------------------
409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 80.2948 s, 52.2 MB/s
real 1m20.338s
user 0m40.381s
sys 0m11.419s
409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 80.8875 s, 51.9 MB/s
real 1m20.931s
user 0m41.776s
sys 0m11.443s
-------------------------------------------------------------------------------
time ssh [email protected] 'dd if=/dev/zero bs=10240 count=409600' > /dev/null
-------------------------------------------------------------------------------
409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 92.5082 s, 45.3 MB/s
real 1m32.887s
user 0m44.963s
sys 0m15.377s
409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 92.8123 s, 45.2 MB/s
real 1m33.179s
user 0m44.541s
sys 0m15.030s
=====================================================================================
32 BIT
=====================================================================================
time dd if=/dev/zero bs=10240 count=409600 | ssh [email protected] 'cat > /dev/zero'
-------------------------------------------------------------------------------------
409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 91.1898 s, 46.0 MB/s
real 1m31.246s
user 0m43.491s
sys 0m12.446s
409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 90.1818 s, 46.5 MB/s
real 1m30.231s
user 0m44.946s
sys 0m10.945s
-------------------------------------------------------------------------------
time ssh [email protected] 'dd if=/dev/zero bs=10240 count=409600' > /dev/null
-------------------------------------------------------------------------------
409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 96.0945 s, 43.6 MB/s
real 1m36.693s
user 0m48.734s
sys 0m13.826s
409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 94.0497 s, 44.6 MB/s
real 1m34.669s
user 0m46.585s
sys 0m14.196s
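For anyone who wants the runs above side by side, a small sketch that averages the two runs per direction and converts to Mbit/s. The figures are copied from the dd summaries; the "send"/"recv" labels are my own shorthand for the `dd | ssh` and `ssh 'dd'` commands respectively.

```python
from statistics import mean

# MB/s copied from the dd summaries above (dd's MB is 10**6 bytes).
# "send" = dd | ssh ... 'cat > /dev/zero'; "recv" = ssh ... 'dd ...' > /dev/null
RUNS_MB_S = {
    "64 BIT": {"send": [69.1, 73.6], "recv": [61.2, 60.4]},
    "64/32 BIT": {"send": [52.2, 51.9], "recv": [45.3, 45.2]},
    "32 BIT": {"send": [46.0, 46.5], "recv": [43.6, 44.6]},
}


def mean_mb_s(label, direction):
    """Average MB/s over the runs recorded for one configuration."""
    return mean(RUNS_MB_S[label][direction])


def to_mbit_s(mb_per_s):
    """Convert decimal megabytes per second to megabits per second."""
    return mb_per_s * 8


for label in RUNS_MB_S:
    for direction in ("send", "recv"):
        avg = mean_mb_s(label, direction)
        print(f"{label:10s} {direction}: {avg:5.1f} MB/s = {to_mbit_s(avg):4.0f} Mbit/s")
```

With only two runs per case this is indicative at best, but the 64-bit send average works out to about 71 MB/s, or roughly 571 Mbit/s.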
My understanding is that many of the 64-bit ARM processors have AES cryptographic extensions in hardware as described here. If that's the case, it could explain why nobody has hand coded AES directly in 64-bit ARM assembler.
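A quick way to check whether a given board reports the crypto extensions is to look at the Features line in /proc/cpuinfo. The sketch below assumes the flag names Linux uses on ARM (aes, pmull, sha1, sha2); the function name and the fallback message are just illustrative.

```python
# Flag names Linux reports for the ARMv8 Cryptography Extensions.
CRYPTO_FLAGS = ("aes", "pmull", "sha1", "sha2")


def crypto_features(cpuinfo_text):
    """Return the crypto flags found on the first 'Features' line, if any."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "features":
            present = set(value.split())
            return [flag for flag in CRYPTO_FLAGS if flag in present]
    return []


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            text = fh.read()
    except OSError:
        text = ""  # not Linux, or /proc is unavailable
    found = crypto_features(text)
    print(", ".join(found) if found else "no ARMv8 crypto extensions reported")
```

On a SoC without the extensions the list comes back empty, and OpenSSL then has to fall back to software AES paths such as the VPAES or BSAES code named in the compiler lines above.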
I'm only interested in "out of the box" performance. I'm perfectly capable of custom builds, but anything that stops "apt-get upgrade" from keeping security fixes current is a non-starter for our general use cases.
jdonald wrote: Fri Aug 16, 2019 5:59 am
jerrm, thanks for running more tests, particularly OpenSSH.
However, in light of what pica200 and others pointed out, isn't the biggest concern with your methodology that you're comparing 64-bit programs against ARMv6 ones? Furthermore, even if you used ARMv7 binaries, it would still fail to account for the performance difference between first-gen ARMv7 and higher-end ARMv7. While you might hope this part is negligible (as it sometimes is), in the case of sysbench there's a 10x performance difference.
The Pi is built to a fixed price point for the standard model. $35. Not $36. Not $35.50. Not $35.05. $35. That is set in stone. The margins are tight. Some stuff has to be left out to get it down to that price. Charging extra for something that most people won't even notice is pointless.
pica200 wrote: Fri Aug 16, 2019 12:48 pm
There was a different thread where someone complained about poor SSH speed and I pointed out how the crypto extensions would have made a difference, but apparently the userbase was not worth the few cents more per SoC. I would have happily paid 1€ more for hardware AES and SHA1/2.
Yeah, I'd like to know what the real tradeoffs were. There was an early response to the question where jamesh didn't seem to really know the answer ("Not as far as I can ascertain."). I would have considered leaving out the extensions a major compromise that would have led to a lot of hand wringing. I hope it wasn't just an oversight.
The numbers tell a different story. I think quite a few people will notice that it doesn't reach the advertised 1 Gbit/s because it is now CPU bound, which is the result of saving money at the wrong end. This also results in much higher power usage than necessary.
It may have something to do with export restrictions for devices which have encryption hardware.
jerrm wrote: Fri Aug 16, 2019 1:05 pm
Yeah, I'd like to know what the real tradeoffs were. There was an early response to the question where jamesh didn't seem to really know the answer ("Not as far as I can ascertain."). I would have considered leaving out the extensions a major compromise that would have led to a lot of hand wringing. I hope it wasn't just an oversight.
But it does encryption even without hardware acceleration, so what is the point of restricting it any more than a device without?
It's a matter of principle. When a product says it can do X but can't in the real world, it may not be fraud (to make it clear: that's not what I'm saying), but it's disappointing at the least. It can never reach the full 1 Gbit/s due to protocol overhead and so on, which everyone knows, but what's not visible to potential customers (until they dig deeper, which few will do) is that it's now limited elsewhere.
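To put a number on the protocol overhead part, here is a back-of-the-envelope sketch of the best-case TCP payload rate on gigabit Ethernet. It assumes IPv4, a standard 1500-byte MTU, no VLAN tag and full-size frames; SSH's own framing and MAC overhead are not modelled, so real SSH throughput tops out a bit lower still.

```python
LINE_RATE_BIT_S = 1_000_000_000  # gigabit Ethernet line rate
MTU = 1500                       # standard Ethernet payload size (assumption)
# Per-frame bytes on the wire beyond the MTU:
# preamble + SFD (8), Ethernet header (14), FCS (4), inter-frame gap (12)
WIRE_OVERHEAD = 8 + 14 + 4 + 12
IP_HEADER = 20                   # IPv4 without options
TCP_HEADER = 20                  # TCP without options


def tcp_goodput_bit_s(tcp_options=0):
    """Best-case TCP payload rate in bit/s for full-size frames."""
    payload = MTU - IP_HEADER - TCP_HEADER - tcp_options
    on_wire = MTU + WIRE_OVERHEAD
    return LINE_RATE_BIT_S * payload / on_wire


for options, label in ((0, "no TCP options"), (12, "with TCP timestamps")):
    rate = tcp_goodput_bit_s(options)
    print(f"{label}: {rate / 1e6:.1f} Mbit/s = {rate / 8e6:.1f} MB/s")
```

So the wire itself allows somewhere around 940 to 950 Mbit/s of TCP payload, which is the ceiling the SSH figures in this thread should be compared against rather than a flat 1000.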
Encrypted gigabit Ethernet is useful when setting up a VPN inside a local network. It is good for any kind of secure network filesystem or sharing, again within a local network. Encryption is also used for remote desktop and login.
Heater wrote: Fri Aug 16, 2019 1:34 pm
I'm wondering who are all these people that need 100 megabytes per second in or out of their Pi, and why do they need it?
I can't collect data anywhere near that fast from any devices connected to my Pi.
If I could I cannot get it over my internet connection or mobile connection.
I'm the first to admit our Pi uses are not standard, but we'll be purchasing 100+ units (of something) vs 1 or 2. Minuscule overall for RPF, but I'm sure there are others like us.
Heater wrote: Fri Aug 16, 2019 1:34 pm
I'm wondering who are all these people that need 100 megabytes per second in or out of their Pi, and why do they need it?
I can't collect data anywhere near that fast from any devices connected to my Pi.
If I could I cannot get it over my internet connection or mobile connection.
Here, just outside the coverage area of the microwave towers, I'm lucky to get more than 1 Mbps. Even for local network use, most network protocols (ssh, samba, rdp, iscsi and realvnc) are encrypted by default. It actually takes quite a bit of effort to install and use unencrypted protocols such as telnet, ftp and nfs.
jerrm wrote: Fri Aug 16, 2019 3:51 pm
I'm the first to admit our Pi uses are not standard, but we'll be purchasing 100+ units (of something) vs 1 or 2. Minuscule overall for RPF, but I'm sure there are others like us.
Heater wrote: Fri Aug 16, 2019 1:34 pm
I'm wondering who are all these people that need 100 megabytes per second in or out of their Pi, and why do they need it?
I can't collect data anywhere near that fast from any devices connected to my Pi.
If I could I cannot get it over my internet connection or mobile connection.
Even for the home user, 1 Gbps internet is available and affordable here, 100 Mbps+ even more so.
More of a wish than an expectation.
This is arguably a reason not to use 32-bit benchmarks compiled for Cortex-A72. Restating: your 64-bit test system does not get the benefit of recompiled binaries; it runs them as the distribution ships them out of the box, likely tuned for Cortex-A15. Thus, it may be an appropriate comparison to use lower-end ARMv7 binaries as your 32-bit baseline.
Simplicity and maintainability count. I have no real desire to maintain Raspbian with a Debian chroot. But since this is just a curiosity project, I set up the Debian chroot anyway.
jdonald wrote: Fri Aug 16, 2019 6:40 pm
However, this does not appear to justify using ARMv6 binaries in your baseline. Debian and Ubuntu are two systems compiled for ARMv7 with upstream packages that can handle "apt-get upgrade" just fine. Your 64-bit measurement is already done inside a Debian arm64 chroot so it would make sense for the baseline to run inside a Debian armhf chroot. This would also better control for factors such as side effects from running inside a chroot vs metal, or subtle configuration differences between Raspbian and Debian.
As alluded to by jahboater early on in this thread, for various programs the performance delta going from ARMv6->ARMv7 exceeds that of going from 32-bit->64-bit. If we only compare ARMv6 vs 64-bit that leaves a big unknown on any test result.
=============================================================================================
64/deb32 BIT
=============================================================================================
time dd if=/dev/zero bs=10240 count=409600 | ssh -p 2222 [email protected] 'cat > /dev/null'
---------------------------------------------------------------------------------------------
409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 86.6481 s, 48.4 MB/s
real 1m26.695s
user 0m44.092s
sys 0m12.941s
409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 89.0552 s, 47.1 MB/s
real 1m29.112s
user 0m44.384s
sys 0m12.446s
---------------------------------------------------------------------------------------
time ssh -p 2222 [email protected] 'dd if=/dev/zero bs=10240 count=409600' > /dev/null
---------------------------------------------------------------------------------------
409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 99.3105 s, 42.2 MB/s
real 1m39.604s
user 0m50.835s
sys 0m13.415s
409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 101.382 s, 41.4 MB/s
real 1m41.674s
user 0m56.046s
sys 0m9.099s
=============================================================================================
32/deb32 BIT
=============================================================================================
time dd if=/dev/zero bs=10240 count=409600 | ssh -p 2222 [email protected] 'cat > /dev/null'
---------------------------------------------------------------------------------------------
409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 91.5883 s, 45.8 MB/s
real 1m31.639s
user 0m45.516s
sys 0m13.374s
409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 94.2352 s, 44.5 MB/s
real 1m34.286s
user 0m45.983s
sys 0m11.374s
---------------------------------------------------------------------------------------
time ssh -p 2222 [email protected] 'dd if=/dev/zero bs=10240 count=409600' > /dev/null
---------------------------------------------------------------------------------------
409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 95.1678 s, 44.1 MB/s
real 1m35.496s
user 0m47.133s
sys 0m15.391s
409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 98.5608 s, 42.6 MB/s
real 1m38.886s
user 0m48.183s
sys 0m15.235s
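To see whether the Debian armhf (ARMv7) chroot changed anything relative to the earlier Raspbian runs, here is a rough sketch of the percent change in mean throughput. All figures are copied from the dd summaries in this thread. I am assuming "64/deb32" pairs up with the earlier "64/32" case and "32/deb32" with the earlier "32" case; the dictionary keys are my own labels.

```python
from statistics import mean

# MB/s from the dd summaries in this thread, as (send runs, recv runs).
# Keys name the peer configuration; pairing the labels this way is an assumption.
RASPBIAN32 = {
    "64": ([52.2, 51.9], [45.3, 45.2]),  # "64/32 BIT"
    "32": ([46.0, 46.5], [43.6, 44.6]),  # "32 BIT"
}
DEBIAN32 = {
    "64": ([48.4, 47.1], [42.2, 41.4]),  # "64/deb32 BIT"
    "32": ([45.8, 44.5], [44.1, 42.6]),  # "32/deb32 BIT"
}


def pct_change(before, after):
    """Percent change of the mean of 'after' relative to the mean of 'before'."""
    return (mean(after) - mean(before)) / mean(before) * 100.0


def compare(peer):
    """Return (send, recv) percent change from Raspbian to Debian armhf."""
    send = pct_change(RASPBIAN32[peer][0], DEBIAN32[peer][0])
    recv = pct_change(RASPBIAN32[peer][1], DEBIAN32[peer][1])
    return send, recv


for peer in ("64", "32"):
    send, recv = compare(peer)
    print(f"peer {peer}: send {send:+.1f}%, recv {recv:+.1f}%")
```

On these numbers the armhf chroot is no faster than Raspbian for SSH, and a few percent slower if anything. With two runs per case, and the chroot's sshd on a different port, that is within what I would call noise rather than a firm result.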