Just a quick consideration, but could compile flags have an influence on the calculations and/or checks?
Possibly, but only after a re-compilation.
The problem here is related to overheating and the effectiveness of the throttling mechanism.
jojopi wrote: UPDATE: With very aggressive cooling (an 80 mm fan blowing towards the existing heatsink), N=8000 now passes reliably for me at ~6.4 Gflops, 53 s.
This is a very interesting data point. Not only do the heat sink and fan make the Pi 3B run twice as fast, they also show that without extra cooling the CPU doesn't throttle down fast enough to prevent errors when doing linear algebra. Maybe no 3B can do this calculation reliably at 1.2 GHz without a heat sink. I wonder whether there is an underclock setting that would work without the fan.
You are right about possible issues with denormals, but those would show up on every run, and the problem wouldn't go away by pointing a more powerful fan at the SoC.
Perhaps the reason why NEON keeps being mentioned is that it is so powerful. A NEON instruction on the Pi 3 operates on four 32-bit lanes at once, and there are four cores, so together they could potentially perform 16 SIMD lane operations at once.
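The lane arithmetic can be turned into a rough ceiling on throughput. This is an idealized upper bound, not a measured figure: it assumes each core completes one full 4-lane NEON instruction every cycle (the post does not claim this, and real sustained rates are lower) and uses the 1.2 GHz clock mentioned above.

```latex
4\ \text{cores} \times 4\ \text{lanes} = 16\ \text{lane operations per cycle}
\qquad
16 \times 1.2\times10^{9}\ \text{s}^{-1} = 1.92\times10^{10}\ \text{lane operations per second}
```

Sustaining anything close to this across all four cores is exactly the kind of load that drives the SoC temperature up.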
The cpuburn program uses the vaba instruction, which subtracts two numbers, takes the absolute value of the difference and adds it to an accumulator, on four separate 32-bit numbers; each instruction ...
Like the GPU, it probably takes up quite a lot of the chip.