I dug up an old program I had written back in my Unix days. I do a lot of code doodling (think of a pad of paper and pencil, but with an editor and a terminal). This program is in C (my preferred language). It is based on a challenge from "Nibble" magazine (I think) back when I was learning to program on my Apple II. The challenge was to write a 6502 assembly program to count to one million as fast as possible. Back then the Apple II did not have virtual memory, so the screen was mapped to an area of memory (0x400 I think, but my memory is corrupted some). Anyway, storing the ASCII value for '0' in 6 consecutive memory locations displayed 000000. The winner of the challenge counted to a million in just a few seconds, on a CPU with a 1 MHz clock, which blew my mind. The way he did it was to not loop, but to bump the ones place ten times, then bump the tens place, repeating this ten times (basically unrolling the loop up to the thousands place, if I remember correctly). Anyway, my code in C does the same thing, but I increased the count since systems have gotten so fast.
I ran the program on my Pi 4 and it takes about 1.5 secs. A simple 'for loop' doing the same thing takes about 15 secs.
Those times are over an SSH session through MobaXterm on my desktop. If I run the same program through a remote xterm, the time becomes 13 secs.
It's not too hard to figure out why. If I run the program redirecting the output to /dev/null, the result is 0.56 secs no matter what terminal I use. The weird part is that my Pi 4 beats my Windows 10 desktop, a 3.6 GHz i5, which runs in 1010 secs (redirecting to the NUL device took 73 secs). Windows expects everyone to stay off the command line except for simple batch scripts.
The deal with benchmarks is you need to be knowledgeable about what's going on. Anyway, thought it might be of interest.
Just a note: this counting optimization is not good programming practice. The hassle of maintaining the code is terrible (but it gives you some bragging rights...). I'm thinking of doing a version that counts in hex, and maybe a binary version. I implemented this in curses and the source code is over 1 megabyte long (and it's not fast by a long shot, as I suspected). I had hoped to extend the loop unrolling to count to one billion for the curses version, but at 10 million bytes of source code both nano and gcc would break.