The dummy end result exists to stop the compiler from optimizing away (or skipping) the core operation. The final value of 63 confirms that after trillions of iterations, the memory bytes set at the end are still intact; it denotes that all those iterations completed successfully. But the catch is that I need to do a dummy print on the screen. This way the compiler cannot treat the code as dead and optimize it out.
Besides, the workload is partly CPU intensive. Assume you have a network stack (a VPN, an IPS, a server) or a database. In most cases the work is about moving memory bytes to and fro, along with some amount of CPU computation (I mean basic CPU number-crunching logic). This is the reason CPU computational latency becomes a bottleneck in these applications, and for the same reason it is wise to buy a hardware-offload networking device instead of a traditional CPU-bound software networking device.
But the Raspberry Pi results show a completely different story. I am confident that in the next 2-3 years, a typical Raspberry-Pi-sized, ARM-based board can compete with Intel x86 chips. It makes even more sense to cross-check against the upcoming AMD Zen architecture.
I am currently not releasing my source code, since it is too simple. The problem with the general public (and the programmer community) is that they form collective conclusions about things. They do and believe what is common and widely practised, and sometimes fail to adapt. They would immediately start comparing this technique with other methods, which is something I do not want.
I do not want to test the scalability of a CPU, nor its multidimensional aspects. Instead I want to precisely calibrate single-dimension performance and judge with that alone. Just as we can compare 4GB vs. 8GB vs. 16GB of RAM, more CPU cores mean better support for more threads, and sometimes more processes (with threads within them), which is easy to reason about, and this is a vital aspect of server hardware design. But desktop computing (especially gaming) needs a combination of both: the best scalability as well as the best per-core performance (let us ignore the GPU aspect in this case). The GPU is useful for better frame rates (game rendering), but the CPU is needed for better game AI and overall application performance.
But I believe single-dimension tests tell the real story about the progress of CPU innovation.
I recently did a video on Threads vs. Processes on my channel, The Linux Channel, which somewhat explains this logic.