Thanks pica200 for pointing out the ARMv6 vs ARMv7 catch. Running
rustup show revealed I was using the stable-arm-unknown-linux-gnueabihf toolchain and the corresponding target, but the correct one to try here is stable-armv7-unknown-linux-gnueabihf.
Revised armv7 32-bit runtimes show only about a 1% difference (0.523 s vs 0.530 s real):
(pi32)pi@raspberrypi:~/insane-british-anagram-rust $ time ./target/release/iba-3 > anagrams.txt
real 0m0.523s
user 0m0.442s
sys 0m0.080s
pi@raspberrypi:~/insane-british-anagram-rust $ time ./target/release/iba-3 > anagrams.txt
real 0m0.530s
user 0m0.453s
sys 0m0.076s
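As a quick check of that figure, here is a small Rust calculation of the relative difference between the two runs, using the real and user times copied from the output above. The helper name percent_diff is just for illustration; the percentage is taken relative to the faster run.

```rust
/// Relative difference of `b` versus `a`, as a percentage of `a`.
fn percent_diff(a: f64, b: f64) -> f64 {
    (b - a) / a * 100.0
}

fn main() {
    // Timings (seconds) copied from the two runs above
    let real = percent_diff(0.523, 0.530);
    let user = percent_diff(0.442, 0.453);
    // Prints "real: 1.3%  user: 2.5%"
    println!("real: {:.1}%  user: {:.1}%", real, user);
}
```

So the real times differ by roughly 1.3%, while the user times differ a little more.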
pica200 wrote (Mon Aug 12, 2019 5:53 am):
Did you compile with "-march=armv8-a+crc+simd -mtune=cortex-a72 -mfloat-abi=hard"?
Just tried adding one arg via:
export RUSTFLAGS="-C llvm-args=-mtune=cortex-a72"
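But this fails miserably with a parsing error. It looks like a longstanding issue with passing such flags through -C llvm-args. rustc does have its own codegen option for this, -C target-cpu=cortex-a72, which selects that CPU's features and tuning. The workaround I had in mind was bindgen's Builder, something like: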
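// build.rs, using the bindgen crate
use bindgen::{Builder, RustTarget};
let builder = Builder::default()
    .rust_target(RustTarget::Stable_1_0)
    .clang_arg("-march=armv8-a+crc+simd") // clang args only affect C header parsing
    .clang_arg("-mtune=cortex-a72");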
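Preferably someone more familiar with Rust (Heater) could attempt this, bearing in mind that bindgen's clang args only affect how libclang parses C headers, not how rustc compiles the Rust code itself. I believe -mfloat-abi=hard is unnecessary, since it's already implied by the gnueabihf target.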
Heater wrote (Mon Aug 12, 2019 6:32 am):
That's right, running the anagram program as a benchmark like that is pretty hopeless. Using "time" is not a good way to do such benchmarks, and the execution time is getting so small that getting a stable result is impossible.
Yeah, with runtimes this short we can get large swings from unknowns such as network traffic or my USB keyboard interrupts. To mitigate that, here's a script that runs the program 50 times and prints a histogram of the real times:
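#!/bin/bash -e
# Start with an empty results file
: > real.txt
# Run the program 50 times, keeping only the "real" line of each timing
for ((i=1; i<=50; i++)); do
    { time ./target/release/iba-3 > anagrams.txt; } 2>&1 | grep real >> real.txt
done
# Histogram: count identical timings, most frequent first
sort real.txt | uniq -c | sort -nr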
That way the most frequent runtime gives me a consistent figure. Unfortunately, the extra script process or redirect seems to add about 45 ms to each run, but at least the ratios between runtimes remain consistent.
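For comparison, here is a minimal sketch of the same 50-run histogram written in Rust, timing each run with std::time::Instant around std::process::Command. It avoids the per-run grep pipeline, but process spawn time is still included in every sample, so I'd assume some fixed overhead remains. The default binary path and run count, and the helper names to_millis, histogram and median, are hypothetical. It also reports the minimum and median, which are usually steadier than any single `time` run.

```rust
use std::collections::BTreeMap;
use std::env;
use std::fs::File;
use std::process::{Command, Stdio};
use std::time::Instant;

/// Round a duration in seconds to whole milliseconds, matching the
/// millisecond resolution that `time` prints (e.g. 0m0.523s).
fn to_millis(secs: f64) -> u64 {
    (secs * 1000.0).round() as u64
}

/// Count identical timings and return them most frequent first,
/// mirroring `sort | uniq -c | sort -nr`.
fn histogram(samples_ms: &[u64]) -> Vec<(u64, usize)> {
    let mut counts: BTreeMap<u64, usize> = BTreeMap::new();
    for &ms in samples_ms {
        *counts.entry(ms).or_insert(0) += 1;
    }
    let mut out: Vec<(u64, usize)> = counts.into_iter().collect();
    // Highest count first; ties go to the faster time
    out.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    out
}

/// Median of the samples (upper median for an even count).
fn median(samples_ms: &[u64]) -> Option<u64> {
    if samples_ms.is_empty() {
        return None;
    }
    let mut sorted = samples_ms.to_vec();
    sorted.sort_unstable();
    Some(sorted[sorted.len() / 2])
}

fn main() {
    // Hypothetical defaults; override with: <binary> [program] [runs]
    let args: Vec<String> = env::args().collect();
    let program = args.get(1).map(String::as_str).unwrap_or("./target/release/iba-3");
    let runs: usize = args.get(2).and_then(|s| s.parse().ok()).unwrap_or(50);

    let mut samples = Vec::with_capacity(runs);
    for _ in 0..runs {
        // Equivalent of `> anagrams.txt`; the file is opened before timing starts
        let out = match File::create("anagrams.txt") {
            Ok(f) => f,
            Err(e) => { eprintln!("cannot create anagrams.txt: {e}"); return; }
        };
        let start = Instant::now();
        let status = Command::new(program).stdout(Stdio::from(out)).status();
        let elapsed = start.elapsed().as_secs_f64();
        match status {
            Ok(s) if s.success() => samples.push(to_millis(elapsed)),
            Ok(s) => { eprintln!("{program} exited with {s}"); return; }
            Err(e) => { eprintln!("cannot run {program}: {e}"); return; }
        }
    }

    if let (Some(min), Some(med)) = (samples.iter().min(), median(&samples)) {
        println!("min: {min} ms  median: {med} ms");
    }
    for (ms, count) in histogram(&samples) {
        println!("{count:>4} {ms} ms");
    }
}
```

Built with cargo build --release, it could be run as, e.g., `<bench-binary> ./target/release/iba-3 50`, where the bench binary's name is whatever you give the crate.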