I've run more benchmarks on the controlled setup. After sysbench I figured I'd start with elliptic curve crypto in order to put to rest the rumors from a year ago.

Below are the raw numbers: 64-bit ECDH followed by 32-bit ECDH, then 64-bit ECDSA followed by 32-bit ECDSA.

```
(pi64)[email protected]:~ $ openssl speed ecdh
...
OpenSSL 1.1.1c 28 May 2019
built on: Thu May 30 15:27:48 2019 UTC
options:bn(64,64) rc4(char) des(int) aes(partial) blowfish(ptr)
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -fdebug-prefix-map=/build/openssl-vZWY2W/openssl-1.1.1c=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
op op/s
160 bits ecdh (secp160r1) 0.0008s 1211.3
192 bits ecdh (nistp192) 0.0010s 1003.2
224 bits ecdh (nistp224) 0.0014s 707.6
256 bits ecdh (nistp256) 0.0003s 3813.7
384 bits ecdh (nistp384) 0.0043s 231.6
521 bits ecdh (nistp521) 0.0113s 88.7
163 bits ecdh (nistk163) 0.0012s 853.9
233 bits ecdh (nistk233) 0.0017s 580.3
283 bits ecdh (nistk283) 0.0036s 279.0
409 bits ecdh (nistk409) 0.0075s 132.7
571 bits ecdh (nistk571) 0.0157s 63.7
163 bits ecdh (nistb163) 0.0012s 812.8
233 bits ecdh (nistb233) 0.0018s 552.7
283 bits ecdh (nistb283) 0.0039s 257.3
409 bits ecdh (nistb409) 0.0084s 119.7
571 bits ecdh (nistb571) 0.0175s 57.1
256 bits ecdh (brainpoolP256r1) 0.0016s 632.2
256 bits ecdh (brainpoolP256t1) 0.0016s 629.7
384 bits ecdh (brainpoolP384r1) 0.0043s 231.3
384 bits ecdh (brainpoolP384t1) 0.0043s 233.3
512 bits ecdh (brainpoolP512r1) 0.0085s 117.5
512 bits ecdh (brainpoolP512t1) 0.0085s 118.3
253 bits ecdh (X25519) 0.0003s 3524.4
448 bits ecdh (X448) 0.0018s 566.0
(pi32)[email protected]:~/openssl-1.1.1c/build_shared/apps $ LD_LIBRARY_PATH=.. ./openssl speed ecdh
...
OpenSSL 1.1.1c 28 May 2019
built on: Thu May 30 15:27:48 2019 UTC # jdonald NB: "built on" misleading because it uses SOURCE_DATE_EPOCH
options:bn(64,32) rc4(char) des(long) aes(partial) blowfish(ptr)
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -Wa,--noexecstack -march=armv8-a+crc+simd -mtune=cortex-a72 -mfpu=neon-fp-armv8 -g -O2 -fdebug-prefix-map=/home/pi/openssl-1.1.1c=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
op op/s
160 bits ecdh (secp160r1) 0.0010s 978.6
192 bits ecdh (nistp192) 0.0015s 675.3
224 bits ecdh (nistp224) 0.0021s 479.1
256 bits ecdh (nistp256) 0.0004s 2348.4
384 bits ecdh (nistp384) 0.0078s 128.9
521 bits ecdh (nistp521) 0.0193s 51.9
163 bits ecdh (nistk163) 0.0011s 870.2
233 bits ecdh (nistk233) 0.0019s 522.3
283 bits ecdh (nistk283) 0.0034s 291.3
409 bits ecdh (nistk409) 0.0072s 139.6
571 bits ecdh (nistk571) 0.0167s 59.8
163 bits ecdh (nistb163) 0.0012s 819.2
233 bits ecdh (nistb233) 0.0021s 479.0
283 bits ecdh (nistb283) 0.0038s 266.2
409 bits ecdh (nistb409) 0.0081s 123.1
571 bits ecdh (nistb571) 0.0189s 52.9
256 bits ecdh (brainpoolP256r1) 0.0027s 371.9
256 bits ecdh (brainpoolP256t1) 0.0027s 372.4
384 bits ecdh (brainpoolP384r1) 0.0078s 128.6
384 bits ecdh (brainpoolP384t1) 0.0077s 129.3
512 bits ecdh (brainpoolP512r1) 0.0110s 91.3
512 bits ecdh (brainpoolP512t1) 0.0109s 91.7
253 bits ecdh (X25519) 0.0005s 1839.8
448 bits ecdh (X448) 0.0026s 383.9
(pi64)[email protected]:~ $ openssl speed ecdsa
...
OpenSSL 1.1.1c 28 May 2019
built on: Thu May 30 15:27:48 2019 UTC
options:bn(64,64) rc4(char) des(int) aes(partial) blowfish(ptr)
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -fdebug-prefix-map=/build/openssl-vZWY2W/openssl-1.1.1c=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
sign verify sign/s verify/s
160 bits ecdsa (secp160r1) 0.0009s 0.0007s 1135.5 1389.2
192 bits ecdsa (nistp192) 0.0011s 0.0009s 944.2 1152.4
224 bits ecdsa (nistp224) 0.0015s 0.0012s 668.7 848.8
256 bits ecdsa (nistp256) 0.0001s 0.0003s 8275.4 2871.0
384 bits ecdsa (nistp384) 0.0045s 0.0033s 220.2 302.9
521 bits ecdsa (nistp521) 0.0119s 0.0082s 84.3 121.4
163 bits ecdsa (nistk163) 0.0013s 0.0025s 798.6 404.2
233 bits ecdsa (nistk233) 0.0018s 0.0036s 545.0 274.5
283 bits ecdsa (nistk283) 0.0038s 0.0075s 263.3 132.6
409 bits ecdsa (nistk409) 0.0079s 0.0156s 127.1 64.1
571 bits ecdsa (nistk571) 0.0163s 0.0323s 61.2 30.9
163 bits ecdsa (nistb163) 0.0013s 0.0026s 761.6 385.7
233 bits ecdsa (nistb233) 0.0019s 0.0038s 518.8 260.3
283 bits ecdsa (nistb283) 0.0041s 0.0080s 246.9 124.4
409 bits ecdsa (nistb409) 0.0087s 0.0172s 115.3 58.1
571 bits ecdsa (nistb571) 0.0182s 0.0360s 54.9 27.8
256 bits ecdsa (brainpoolP256r1) 0.0017s 0.0014s 598.8 695.8
256 bits ecdsa (brainpoolP256t1) 0.0017s 0.0013s 599.9 755.6
384 bits ecdsa (brainpoolP384r1) 0.0046s 0.0035s 219.5 282.9
384 bits ecdsa (brainpoolP384t1) 0.0045s 0.0033s 221.7 305.6
512 bits ecdsa (brainpoolP512r1) 0.0089s 0.0064s 112.3 155.4
512 bits ecdsa (brainpoolP512t1) 0.0088s 0.0059s 113.1 168.8
(pi32)[email protected]:~/openssl-1.1.1c/build_shared/apps $ LD_LIBRARY_PATH=.. ./openssl speed ecdsa
...
OpenSSL 1.1.1c 28 May 2019
built on: Thu May 30 15:27:48 2019 UTC # jdonald NB: "built on" misleading because it uses SOURCE_DATE_EPOCH
options:bn(64,32) rc4(char) des(long) aes(partial) blowfish(ptr)
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -Wa,--noexecstack -march=armv8-a+crc+simd -mtune=cortex-a72 -mfpu=neon-fp-armv8 -g -O2 -fdebug-prefix-map=/home/pi/openssl-1.1.1c=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
sign verify sign/s verify/s
160 bits ecdsa (secp160r1) 0.0011s 0.0009s 910.1 1128.1
192 bits ecdsa (nistp192) 0.0016s 0.0012s 640.0 809.1
224 bits ecdsa (nistp224) 0.0022s 0.0017s 457.0 596.6
256 bits ecdsa (nistp256) 0.0002s 0.0006s 4250.7 1612.2
384 bits ecdsa (nistp384) 0.0081s 0.0057s 123.0 175.9
521 bits ecdsa (nistp521) 0.0202s 0.0136s 49.4 73.7
163 bits ecdsa (nistk163) 0.0012s 0.0024s 810.5 411.6
233 bits ecdsa (nistk233) 0.0020s 0.0039s 497.1 254.4
283 bits ecdsa (nistk283) 0.0036s 0.0072s 276.2 139.8
409 bits ecdsa (nistk409) 0.0077s 0.0149s 130.6 67.0
571 bits ecdsa (nistk571) 0.0178s 0.0348s 56.2 28.7
163 bits ecdsa (nistb163) 0.0013s 0.0025s 770.4 394.1
233 bits ecdsa (nistb233) 0.0022s 0.0042s 463.7 236.4
283 bits ecdsa (nistb283) 0.0040s 0.0078s 253.0 128.8
409 bits ecdsa (nistb409) 0.0085s 0.0166s 117.9 60.1
571 bits ecdsa (nistb571) 0.0199s 0.0391s 50.2 25.6
256 bits ecdsa (brainpoolP256r1) 0.0028s 0.0023s 354.0 441.2
256 bits ecdsa (brainpoolP256t1) 0.0028s 0.0021s 354.6 470.6
384 bits ecdsa (brainpoolP384r1) 0.0081s 0.0061s 123.1 162.8
384 bits ecdsa (brainpoolP384t1) 0.0081s 0.0057s 123.8 176.5
512 bits ecdsa (brainpoolP512r1) 0.0115s 0.0085s 86.7 117.2
512 bits ecdsa (brainpoolP512t1) 0.0114s 0.0079s 87.8 126.7
```

To ensure a fair baseline for 32-bit, I added -march=armv8-a+crc+simd -mtune=cortex-a72 -mfpu=neon-fp-armv8 and built 32-bit OpenSSL from source in a Debian armhf container. It was a bit tricky, as providing CFLAGS unexpectedly overrides the default flags (including -O3), leading to erroneous results. One needs to make sure the overridden CFLAGS and CXXFLAGS contain the combined set (the defaults plus the additions) before running dpkg-buildpackage. This build from source ultimately didn't change ARMv7 libcrypto performance much (certainly not the 10x difference seen in sysbench), but was done for diligence.

The median speedup of 64-bit over 32-bit here is +28.9% for ECDH and +27.8% for ECDSA. Not 3x, but higher than I expected.
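For anyone who wants to check the ECDH figure, here is a minimal sketch of how a median speedup can be derived from the op/s columns above. The dictionary layout and the function name median_speedup are my own choices for illustration, and it assumes "speedup" means the per-curve ratio of 64-bit to 32-bit op/s.

```python
from statistics import median

# op/s copied from the two ECDH tables above: curve -> (64-bit, 32-bit)
ECDH_OPS = {
    "secp160r1": (1211.3, 978.6),
    "nistp192": (1003.2, 675.3),
    "nistp224": (707.6, 479.1),
    "nistp256": (3813.7, 2348.4),
    "nistp384": (231.6, 128.9),
    "nistp521": (88.7, 51.9),
    "nistk163": (853.9, 870.2),
    "nistk233": (580.3, 522.3),
    "nistk283": (279.0, 291.3),
    "nistk409": (132.7, 139.6),
    "nistk571": (63.7, 59.8),
    "nistb163": (812.8, 819.2),
    "nistb233": (552.7, 479.0),
    "nistb283": (257.3, 266.2),
    "nistb409": (119.7, 123.1),
    "nistb571": (57.1, 52.9),
    "brainpoolP256r1": (632.2, 371.9),
    "brainpoolP256t1": (629.7, 372.4),
    "brainpoolP384r1": (231.3, 128.6),
    "brainpoolP384t1": (233.3, 129.3),
    "brainpoolP512r1": (117.5, 91.3),
    "brainpoolP512t1": (118.3, 91.7),
    "X25519": (3524.4, 1839.8),
    "X448": (566.0, 383.9),
}


def median_speedup(results):
    """Median of the per-curve 64-bit / 32-bit throughput ratios."""
    return median(ops64 / ops32 for ops64, ops32 in results.values())


# Express the median ratio as a percentage gain over 32-bit.
pct = (median_speedup(ECDH_OPS) - 1) * 100
print(f"ECDH median speedup: {pct:+.1f}%")
```

With the numbers as pasted, this should land very close to the ECDH figure quoted above. The binary curves (nistk*/nistb*) sit near 1.0x while the prime curves and X25519 carry most of the gain, which is why the median is a more honest summary than the mean here.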

I did some rough tests for RSA, DSA, and HMAC and so far those results point towards a wash for 64-bit vs 32-bit.

Which brings us to the cipher, the part of HTTPS that executes across every byte of content you download. Running AES (specifically **openssl speed -evp aes-256-gcm**) showed a major anomaly, with 64-bit achieving half the throughput of 32-bit. I investigated, and if you look closely at the compiler options above, the likely reason is there. The Debian aarch64 builds don't include -DAES_ASM, because it turns out OpenSSL doesn't yet have an aarch64 AES assembly implementation (despite having VPAES). I hope this gets fixed in the not-too-distant future.
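To make the flag difference easier to spot, here is a rough sketch that diffs the assembly-related defines from the two "compiler:" lines above. The strings are abridged to the -D flags only, and the helper name asm_defines is just something I made up for this post.

```python
def asm_defines(compiler_line):
    """Return the set of -D flags in a 'compiler:' line whose name contains _ASM."""
    return {
        tok for tok in compiler_line.split()
        if tok.startswith("-D") and "_ASM" in tok
    }


# -D flags abridged from the 64-bit (Debian aarch64) 'compiler:' line above
AARCH64_FLAGS = (
    "-DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ "
    "-DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM "
    "-DKECCAK1600_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG"
)

# -D flags abridged from the 32-bit (armhf, built from source) 'compiler:' line above
ARMHF_FLAGS = (
    "-DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ "
    "-DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM "
    "-DSHA512_ASM -DKECCAK1600_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM "
    "-DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG"
)

only_32bit = asm_defines(ARMHF_FLAGS) - asm_defines(AARCH64_FLAGS)
only_64bit = asm_defines(AARCH64_FLAGS) - asm_defines(ARMHF_FLAGS)

print("only in 32-bit build:", sorted(only_32bit))
print("only in 64-bit build:", sorted(only_64bit))
```

Besides -DAES_ASM, the 32-bit build also carries -DBSAES_ASM, -DGHASH_ASM and -DOPENSSL_BN_ASM_GF2m, while -DVPAES_ASM shows up only on the 64-bit side. A define being present doesn't prove which code path a given cipher mode actually takes, so treat this as a pointer for where to look rather than a diagnosis.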

As far as I can tell, OpenSSH does not use libcrypto's implementation of AES. I'm curious as to whether it has the same issue with its 64-bit implementation. Unfortunately, it does not appear to have a convenient benchmarking command-line option like OpenSSL's.