sirspudd wrote: Working on cross compiling Qt for the aarch64 to see how it performs.
As far as I know the video drivers in the 64-bit distributions have no hardware acceleration, whereas the 32-bit distributions do.
ejolson wrote:
sirspudd wrote: Working on cross compiling Qt for the aarch64 to see how it performs.
As far as I know the video drivers in the 64-bit distributions have no hardware acceleration, whereas the 32-bit distributions do.
I don't know of a single reason VC4 will not fly on this 4.7 kernel; it runs nicely on Arch at 32-bit, and hinges on no BRCM-provided binaries.
The linpack benchmark for solving systems of linear equations discussed in this thread achieves about 6.5 double-precision gflops running on a well-cooled Pi 3 in 32-bit mode. As this is a well understood computational problem, it would be interesting to know whether a version optimized for 64-bit would perform any better.
sirspudd wrote: Remember all those muppets who said the iPhone adopting a 64 bit chip was dopey as they only had a gig of ram at the time; remember the egg on their faces as the chip put every other ARM based device on its face in terms of performance? We don't need to repeat these assertions.
The 64-bit chip in the iPhone was faster because it was a faster chip. The 64-bitness of it did make a difference, but was not the whole story. Apple has an ARM architecture licence, I believe, so it is able to tweak the silicon in various ways. The big one, IIRC, was in the memory subsystem silicon, which meant the chips were best in class. That chip is of course 64-bit, but it would probably have been very good in 32-bit as well.
MarkHaysHarris777 wrote: They're getting their act together at the Pine64 team... yes, the PI came in first in the survey, but the PineA64 came in 7th...
Which survey? It's OK quoting facts and figures but ... ah, forget it. Citation needed.
sirspudd wrote: @jamesh:
http://www.anandtech.com/show/7335/the- ... s-review/4
There is a lot more potentially at stake than just address space. As mentioned, I have no demonstrable proof about gains, nor potential gains, but we are now in a position to actually test this with fire and to see whether we can stretch any more out of the chip.
One obvious place to look for gains is in programs that perform lots of 64-bit integer arithmetic. I would guess that GMP optimized for ARMv8 might use such instructions, as might certain cryptographic libraries and random number generators. A synthetic test of 64-bit processor speed that will likely show a 10x improvement can be found in the Collatz algorithm.
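For anyone who wants to try the Collatz idea, here is a minimal sketch in C of what such a test could look like. It is not the benchmark referred to above; the function names collatz_steps and longest_collatz_start are made up for illustration. The point is only that every operation in the inner loop is on a 64-bit unsigned integer.

```c
#include <stdint.h>

/* Count the Collatz steps needed to reach 1 from n.
 * Returns -1 for n == 0 or if 3n + 1 would overflow 64 bits. */
int collatz_steps(uint64_t n)
{
    int steps = 0;

    if (n == 0)
        return -1;

    while (n != 1) {
        if (n % 2 == 0) {
            n /= 2;
        } else {
            if (n > (UINT64_MAX - 1) / 3)
                return -1;          /* 3n + 1 would not fit in 64 bits */
            n = 3 * n + 1;
        }
        steps++;
    }
    return steps;
}

/* Return the starting value in 1..limit with the longest chain.
 * Ties go to the smallest starting value. */
uint64_t longest_collatz_start(uint64_t limit)
{
    uint64_t best_start = 1;
    int best_steps = 0;

    for (uint64_t start = 1; start <= limit; start++) {
        int steps = collatz_steps(start);
        if (steps > best_steps) {
            best_steps = steps;
            best_start = start;
        }
    }
    return best_start;
}
```

Calling longest_collatz_start with a large limit from a small main(), building the same source once for 32-bit ARM and once for aarch64, and timing both runs would give a like-for-like comparison. Whether the gap comes out anywhere near 10x would have to be measured.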
Code:
                            Fedora        Fedora        Raspbian
                            ARM64         ARM           ARM
sysbench version            0.4.12        0.4.12        0.4.12
sysbench binary size        140592        113560        90212

sysbench --num-threads=1 --test=cpu --cpu-max-prime=20000 --validate run
Total time:                 60.4250s      727.0269s     478.9251s
Min statistic request:      6.03ms        72.66ms       47.88ms
Avg statistic request:      6.04ms        72.70ms       47.89ms
Max statistic request:      6.05ms        76.96ms       69.86ms

sysbench --num-threads=4 --test=cpu --cpu-max-prime=20000 --validate run
Total time:                 15.2510s      183.4330s     119.5340s
Min statistic request:      6.03ms        72.64ms       47.69ms
Avg statistic request:      6.1ms         73.36ms       47.80ms
Max statistic request:      6.42ms        75.76ms       104.88ms
top memory usage            0.5%          0.4%          0.2%

valgrind --tool=massif sysbench --num-threads=4 --test=cpu --cpu-max-prime=2000 --validate run
Max mem_heap_B              82592         26008         5347
With mem_heap_extra_B       4344          3296          3037
Mem_stacks_B                0             0             0
heap_tree=                  peak          peak          peak

Memtester:
Version:                    4.3.0         4.3.0         4.3.0
Binary size                 21664         17624         14236
time memtester 256M 1       14m14.095s    11m12.650s    9m5.4s

valgrind --tool=massif memtester 1M 1
Max mem_heap_B              1052672       1052672       1048576
With mem_heap_extra_B       16            16            8
Mem_stacks_B                0             0             0
heap_tree=                  peak          peak          empty
"sysbench --test=cpu" is not a floating point test (except that it unnecessarily calls sqrt).
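To make that concrete, here is a rough sketch in C of a trial-division prime count of the kind the cpu test appears to run. It is written from the description in this thread, not copied from the sysbench source, and the name count_primes_below is made up. Two assumptions are baked in: the counters are 64-bit integers, and the count starts at 3. The loop bound is written as l * l <= c to show that the sqrt call is not needed at all.

```c
#include <stdint.h>

/* Count the primes p with 3 <= p < max_prime by trial division.
 * All arithmetic is 64-bit integer; no floating point is involved. */
uint64_t count_primes_below(uint64_t max_prime)
{
    uint64_t count = 0;

    for (uint64_t c = 3; c < max_prime; c++) {
        int is_prime = 1;

        /* l * l <= c is the integer form of l <= sqrt(c) */
        for (uint64_t l = 2; l * l <= c; l++) {
            if (c % l == 0) {       /* 64-bit modulo dominates the run time */
                is_prime = 0;
                break;
            }
        }
        if (is_prime)
            count++;
    }
    return count;
}
```

If the operands really are 64-bit, then on 32-bit ARM each c % l goes through a library helper rather than a single divide instruction, while aarch64 has a native 64-bit divide. That could explain a large part of the gap in the table above, and it would mean the result says little about floating-point speed.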