ejolson
Posts: 3415
Joined: Tue Mar 18, 2014 11:47 am

Re: 64-bit operating system

Sun Mar 25, 2018 7:58 pm

tkaiser wrote:
Sun Mar 25, 2018 5:13 pm
ejolson wrote:
Tue Mar 20, 2018 6:46 am
Any compute-bound program that makes heavy use of 64-bit integers will show significantly better performance in 64-bit mode.
Not all, see the C-Ray test results and explanation: https://libre.computer/2018/03/21/raspb ... omparison/

3 different Cortex-A53 SoCs, all clocked between 1.4GHz and 1.5GHz so the 32-bit vs. 64-bit difference can be studied in detail.
Agreed. If 32-bit integers and addresses are sufficient, then 32-bit mode can be faster (due in part to memory bandwidth effects). Another comparison, using code similar, but not identical, to that in sysbench, shows 32-bit integers performing at twice the speed of 64-bit integers on x86 hardware running in 64-bit mode. At the same time, 32-bit integers are 15 times faster than 64-bit integers on a Raspberry Pi 3B running in 32-bit mode. It would be interesting to run the same test with the 3B in 64-bit mode to see how things change.

Heater
Posts: 13099
Joined: Tue Jul 17, 2012 3:02 pm

Sun Mar 25, 2018 9:30 pm

ejolson,
...shows 32-bit integers performing at twice the speed as 64-bit integers on x86 hardware running in 64-bit mode.
That seemed so unlikely I had to try for myself.

My little test below runs in about 16.5 seconds on my Intel 64-bit Surface Pro, whether the type used is a 32- or 64-bit integer. After many runs I would lean toward saying the 64-bit version is half a second or so slower.


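#include <stdio.h>
#include <stdint.h>

/* Select the integer width under test by swapping the comment. */
typedef uint32_t fibo_t;
//typedef uint64_t fibo_t;

/* Naive doubly recursive Fibonacci: fibo(0) = 0, fibo(1) = 1. */
fibo_t fibo (fibo_t n) {
    if (n == 0)
    {
        return 0;
    }
    if (n < 2)
    {
        return 1;
    }
    return (fibo(n - 2) + fibo(n - 1));
}

int main(int argc, char* argv[])
{
    for (fibo_t  n = 0; n <= 46; n++)
    {
        fibo_t f = fibo (n);
        /* Cast so that %llu matches the argument for either typedef. */
        printf("fibo(%llu) = %llu\n", (unsigned long long)n, (unsigned long long)f);
    }
    return(0);
}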
I would think something is very broken if using 64 bit ints on a 64 bit machine is half as fast as using 32 bit ints.

jahboater
Posts: 4603
Joined: Wed Feb 04, 2015 6:38 pm

Sun Mar 25, 2018 10:31 pm

I suggest looking at the classic work for x86 instruction timings:-

http://www.agner.org/optimize/instruction_tables.pdf

Heater
Posts: 13099
Joined: Tue Jul 17, 2012 3:02 pm

Sun Mar 25, 2018 11:24 pm

jahboater,

Interesting document, thanks. Does Intel even still publish instruction timings for their CPUs?

I'm going to give it a miss though. There is no way I'm about to rewrite my code in assembler, spending months wading through that 300-plus-page document to find the optimal way to do everything. Nor am I about to modify the code generator of my compilers to change the instruction sequences they generate.

Also, I think you will find that it is impossible to know the execution time of any single instruction in your program anymore. Modern processors are loaded with pipelines, out of order execution, speculative execution, parallel dispatch etc. All of which makes the execution time of any instruction quite variable, depending on what is around it and how your program flows.

That's before we talk about the very large variability in execution time of any instruction depending on your program's use of multiple levels of cache memory. It's pretty much not worth the effort to micro-optimize by selecting instructions when your program can be slowed by a factor of 10, 1000 or more by bad use of cache.

jahboater
Posts: 4603
Joined: Wed Feb 04, 2015 6:38 pm

Sun Mar 25, 2018 11:40 pm

Heater wrote:
Sun Mar 25, 2018 11:24 pm
Also, I think you will find that it is impossible to know the execution time of any single instruction in your program anymore.
Definitely.

But the discussion here seemed to be going down a hole, so some actual measured numbers make a fun change.
You were comparing 32-bit and 64-bit times, both running in 64-bit mode. As you found, they are almost identical. 64-bit division takes longer for obvious reasons; oddly, 64-bit multiply is sometimes a cycle faster; and perhaps the extra prefix byte needed for 16- and 64-bit instructions makes a slight difference.

This document is interesting too http://www.agner.org/optimize/microarchitecture.pdf

Index here http://www.agner.org/optimize/#manuals

Heater
Posts: 13099
Joined: Tue Jul 17, 2012 3:02 pm

Mon Mar 26, 2018 12:02 am

jahboater,
You were comparing 32-bit and 64-bit times - both running in 64-bit mode.
Yes, because that is what ejolson was talking about when he said "shows 32-bit integers performing at twice the speed as 64-bit integers on x86 hardware running in 64-bit mode".

Which I find rather unbelievable.

I'm not sure about the prefix bytes; I did not look that hard at the generated code. But the 64-bit fibo() function I posted above is 112 bytes longer than the 783 bytes of the 32-bit version.

Actually, I'm wondering why they are both so huge for such a short, simple function?

jahboater
Posts: 4603
Joined: Wed Feb 04, 2015 6:38 pm

Mon Mar 26, 2018 12:16 am

Heater wrote:
Mon Mar 26, 2018 12:02 am
I'm not sure about the prefix bytes, did not looks so hard at the generated code. But the 64 bit fibo() function I posted above is 112 bytes longer than the 783 bytes of the 32 bit version.
They are both identical in size on my PC (gcc 7.3).


   text	   data	    bss	    dec	    hex	filename
   1169	    544	      8	   1721	    6b9	try
Actually, I'm wondering why they are both so huge for such a short, simple function?
Just the usual C overhead, I suppose: crt0 (the startup code), the GOT (global offset table) for the linker, etc.
You have to write stuff in assembler to get really tiny (a few bytes) programs.

The printf format perhaps needs adjusting when switching between 32 and 64 bits.

jahboater
Posts: 4603
Joined: Wed Feb 04, 2015 6:38 pm

Mon Mar 26, 2018 12:19 am

fibo() with 64-bit integers is:


     fibo:
0000 55            pushq   %rbp    #
0001 31ED          xorl    %ebp, %ebp  # add_acc_9
0003 53            pushq   %rbx    #
0004 488D5FFE      leaq    -2(%rdi), %rbx  #, ivtmp.15
0008 4883EC08      subq    $8, %rsp    #,
     .L3:
     # try.c:9:  if (n == 0)
000c 4883FBFE      cmpq    $-2, %rbx   #, ivtmp.15
0010 7416          je  .L4 #,
     # try.c:13:     if (n < 2)
0012 4883FBFF      cmpq    $-1, %rbx   #, ivtmp.15
0016 7414          je  .L5 #,
     # try.c:17:     return (fibo(n - 2) + fibo(n - 1));
0018 4889DF        movq    %rbx, %rdi  # ivtmp.15,
001b 48FFCB        decq    %rbx    # ivtmp.15
001e E8DDFFFFFF    call    fibo    #
0023 4801C5        addq    %rax, %rbp  # _2, add_acc_9
0026 EBE4          jmp .L3 #
     .L4:
     # try.c:11:         return 0;
0028 31C0          xorl    %eax, %eax  # _4
002a EB05          jmp .L2 #
     .L5:
     # try.c:15:         return 1;
002c B801000000    movl    $1, %eax    #, _4
     .L2:
0031 4801E8        addq    %rbp, %rax  # add_acc_9, tmp93
     # try.c:18: }
0034 5A            popq    %rdx    #
0035 5B            popq    %rbx    #
0036 5D            popq    %rbp    #
0037 C3            ret
and with 32-bit integers it is:


     fibo:
0000 55            pushq   %rbp    #
0001 31ED          xorl    %ebp, %ebp  # add_acc_9
0003 53            pushq   %rbx    #
0004 8D5FFE        leal    -2(%rdi), %ebx  #, ivtmp.15
0007 4883EC08      subq    $8, %rsp    #,
     .L3:
     # try.c:9:  if (n == 0)
000b 83FBFE        cmpl    $-2, %ebx   #, ivtmp.15
000e 7412          je  .L4 #,
     # try.c:13:     if (n < 2)
0010 83FBFF        cmpl    $-1, %ebx   #, ivtmp.15
0013 7411          je  .L5 #,
     # try.c:17:     return (fibo(n - 2) + fibo(n - 1));
0015 89DF          movl    %ebx, %edi  # ivtmp.15,
0017 FFCB          decl    %ebx    # ivtmp.15
0019 E8E2FFFFFF    call    fibo    #
001e 01C5          addl    %eax, %ebp  # _2, add_acc_9
0020 EBE9          jmp .L3 #
     .L4:
     # try.c:11:         return 0;
0022 31C0          xorl    %eax, %eax  # _4
0024 EB05          jmp .L2 #
     .L5:
     # try.c:15:         return 1;
0026 B801000000    movl    $1, %eax    #, _4
     .L2:
002b 01E8          addl    %ebp, %eax  # add_acc_9, tmp102
     # try.c:18: }
002d 5A            popq    %rdx    #
002e 5B            popq    %rbx    #
002f 5D            popq    %rbp    #
0030 C3            ret
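You can see a "48" (REX.W) prefix byte on "decq" that "decl" does not need, and likewise on the leaq, cmpq, movq and addq instructions. It's not many: seven bytes in all, making the function 0x38 (56) bytes long instead of 0x31 (49).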

ejolson
Posts: 3415
Joined: Tue Mar 18, 2014 11:47 am

Mon Mar 26, 2018 2:15 am

Heater wrote:
Sun Mar 25, 2018 9:30 pm
ejolson,
...shows 32-bit integers performing at twice the speed as 64-bit integers on x86 hardware running in 64-bit mode.
My little test below runs in about 16.5 seconds no matter if the type used is a 32 or 64 bit integer on my Intel 64 bit Surface Pro.
The speed of a recursive implementation of the Fibonacci sequence mostly reflects function call overhead. Integer arithmetic, in this case, is a trivial part of the total execution time. Try running the code I mentioned if you want to see a difference.

jahboater
Posts: 4603
Joined: Wed Feb 04, 2015 6:38 pm

Mon Mar 26, 2018 7:35 am

ejolson wrote:
Mon Mar 26, 2018 2:15 am
Try running the code I mentioned if you want to see a difference.
This does a lot of divisions (probably much like sysbench).

if(n%prime[k]==0) return 0;

64-bit division takes much longer than 32-bit division, because it's doing more, and that dominates the time.
Most other instructions take the same time, even things like popcount.
Again, see http://www.agner.org/optimize/instruction_tables.pdf

I'm not sure what the point is of comparing the speed of 32- and 64-bit operations in the same 64-bit mode.

The real difference shows when comparing 64-bit division on a 32-bit platform (Pi) with a 64-bit platform (say, an Odroid C2), both with the same CPU.

pi3+ (Cortex A53 in 32 bit mode, gcc 7.3, 1.4GHz)
Found a total of 664579 primes (64-bit)

real 0m14.691s
user 0m14.680s
sys 0m0.000s

Odroid C2 (Cortex A53 in 64 bit mode, gcc 7.3, 1.68GHz)
Found a total of 664579 primes (64-bit)

real 0m2.689s
user 0m2.680s
sys 0m0.000s

Correcting for clock speed gives 3.22 sec for 64-bit mode compared to 14.69 sec for 32-bit mode.
Note that the Pi does the 64-bit division with a library call.

Now the same thing with 32-bit division that the Pi can do:-

Pi
Found a total of 664579 primes (32-bit)

real 0m3.332s
user 0m3.330s
sys 0m0.000s

Odroid C2
Found a total of 664579 primes (32-bit)

real 0m2.493s
user 0m2.480s
sys 0m0.000s

Correcting for clock speed gives 2.99 sec for 64-bit mode compared to 3.33 sec for 32-bit mode.

Perhaps this is a plausible measure of the 32/64-bit speed difference: 64-bit mode is 11.4% faster for this little benchmark.
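The clock-speed correction used in these numbers, written out. It assumes run time scales inversely with clock frequency, which ignores memory latency that does not scale with the core clock. The 3.22 s figure matches the user time (2.68 s) and the 2.99 s figure the real time (2.493 s); using either time changes the result only slightly.

```latex
\begin{aligned}
t_{\text{norm}} &= t_{\text{C2}} \cdot \frac{f_{\text{C2}}}{f_{\text{Pi}}}
  = t_{\text{C2}} \cdot \frac{1.68\,\text{GHz}}{1.4\,\text{GHz}} = 1.2\,t_{\text{C2}} \\
\text{64-bit division:}\quad t_{\text{norm}} &= 1.2 \times 2.68\,\text{s} \approx 3.22\,\text{s},
  \qquad \frac{14.69\,\text{s}}{3.22\,\text{s}} \approx 4.6 \\
\text{32-bit division:}\quad t_{\text{norm}} &= 1.2 \times 2.493\,\text{s} \approx 2.99\,\text{s},
  \qquad \frac{3.33\,\text{s}}{2.99\,\text{s}} \approx 1.114
\end{aligned}
```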

jahboater
Posts: 4603
Joined: Wed Feb 04, 2015 6:38 pm

Mon Mar 26, 2018 8:40 am

Another way of looking at this is that the RPF did some very clever engineering with the 3B+ to extend the life of the 40nm SoC with a 16% clock speed hike. Perhaps going to 64-bit might give, say, another 10-15% speed increase with no change to the SoC, extending its life even more and delaying the expensive die shrink.

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 23362
Joined: Sat Jul 30, 2011 7:41 pm

Mon Mar 26, 2018 9:05 am

In some operations on the VPU we do hit SDRAM bandwidth problems (moving video around). I do wonder if this can happen when you move to 64-bit integers: you are accessing the RAM more, which may cause very slight slowdowns when there is a very (very) high number of SDRAM accesses going on.
Principal Software Engineer at Raspberry Pi (Trading) Ltd.
Contrary to popular belief, humorous signatures are allowed. Here's an example...
"My grief counseller just died, luckily, he was so good, I didn't care."

jahboater
Posts: 4603
Joined: Wed Feb 04, 2015 6:38 pm

Mon Mar 26, 2018 9:19 am

jamesh wrote:
Mon Mar 26, 2018 9:05 am
In some operations on the VPU we do hit SDRAM bandwidth problems (moving video around), I do wonder if this can happen when you move to 64bit integers - you are accessing the RAM more, which may be a cause of very slight slowdowns when you have a very (very) high number of SDRAM accesses going on.
Yes agreed, and 64-bit pointers too.

For scalar stuff though (e.g. not arrays or big structs), there is generally less memory access because there are twice the number of registers. For example it is much less common to put local scalar variables on the stack when you have 31 integer registers and 32 float registers - and so the stack needs adjusting less often and function calls are faster (how often do you have more than 31 integer/pointer local variables!).

Heater
Posts: 13099
Joined: Tue Jul 17, 2012 3:02 pm

Mon Mar 26, 2018 9:33 am

jahboater,
They are both identical in size on my PC (gcc 7.3)... Just the usual C overhead I suppose.
Strangely enough the resulting executables are exactly the same size when using 32 bit ints or 64.

But the sizes I quoted are for the fibo() function alone, excluding the rest of the code. The fibo() using 64-bit ints is bigger. Both seem huge for such a small, simple function. I am using GCC version 5.4.0 with -O3.

Hmm...that's odd. My generated code looks very different:

fibo() with 64 bit ints:


0000000000400600 <fibo>:
  400600:	48 85 ff             	test   %rdi,%rdi
  400603:	0f 84 36 03 00 00    	je     40093f <fibo+0x33f>
  400609:	48 83 ff 01          	cmp    $0x1,%rdi
  40060d:	0f 86 26 03 00 00    	jbe    400939 <fibo+0x339>
  400613:	41 57                	push   %r15
  400615:	41 56                	push   %r14
  400617:	48 8d 47 fe          	lea    -0x2(%rdi),%rax
  40061b:	41 55                	push   %r13
  40061d:	41 54                	push   %r12
  40061f:	55                   	push   %rbp
  400620:	53                   	push   %rbx
  400621:	48 83 ec 68          	sub    $0x68,%rsp
  400625:	48 89 04 24          	mov    %rax,(%rsp)
  400629:	48 c7 44 24 08 00 00 	movq   $0x0,0x8(%rsp)
  400630:	00 00 
  400632:	48 85 c0             	test   %rax,%rax
  400635:	0f 84 fa 02 00 00    	je     400935 <fibo+0x335>
  40063b:	48 83 f8 01          	cmp    $0x1,%rax
  40063f:	0f 84 bb 02 00 00    	je     400900 <fibo+0x300>
  400645:	48 83 e8 02          	sub    $0x2,%rax
  400649:	48 c7 44 24 38 00 00 	movq   $0x0,0x38(%rsp)
  400650:	00 00 
  400652:	48 89 44 24 10       	mov    %rax,0x10(%rsp)
  400657:	48 85 c0             	test   %rax,%rax
  40065a:	0f 84 ee 02 00 00    	je     40094e <fibo+0x34e>
  400660:	48 83 f8 01          	cmp    $0x1,%rax
  400664:	0f 84 d8 02 00 00    	je     400942 <fibo+0x342>
  40066a:	48 83 e8 02          	sub    $0x2,%rax
  40066e:	48 c7 44 24 40 00 00 	movq   $0x0,0x40(%rsp)
  400675:	00 00 
  400677:	48 89 44 24 18       	mov    %rax,0x18(%rsp)
  40067c:	48 85 c0             	test   %rax,%rax
  40067f:	0f 84 77 02 00 00    	je     4008fc <fibo+0x2fc>
  400685:	48 83 f8 01          	cmp    $0x1,%rax
  400689:	0f 84 61 02 00 00    	je     4008f0 <fibo+0x2f0>
  40068f:	48 83 e8 02          	sub    $0x2,%rax
  400693:	48 c7 44 24 48 00 00 	movq   $0x0,0x48(%rsp)
  40069a:	00 00 
  40069c:	48 89 44 24 20       	mov    %rax,0x20(%rsp)
  4006a1:	48 85 c0             	test   %rax,%rax
  4006a4:	0f 84 0b 02 00 00    	je     4008b5 <fibo+0x2b5>
  4006aa:	48 83 f8 01          	cmp    $0x1,%rax
  4006ae:	0f 84 f5 01 00 00    	je     4008a9 <fibo+0x2a9>
  4006b4:	48 83 e8 02          	sub    $0x2,%rax
  4006b8:	48 c7 44 24 50 00 00 	movq   $0x0,0x50(%rsp)
  4006bf:	00 00 
  4006c1:	48 85 c0             	test   %rax,%rax
  4006c4:	48 89 44 24 28       	mov    %rax,0x28(%rsp)
  4006c9:	0f 84 52 01 00 00    	je     400821 <fibo+0x221>
  4006cf:	48 83 f8 01          	cmp    $0x1,%rax
  4006d3:	0f 84 c1 01 00 00    	je     40089a <fibo+0x29a>
  4006d9:	48 83 e8 02          	sub    $0x2,%rax
  4006dd:	48 c7 44 24 58 00 00 	movq   $0x0,0x58(%rsp)
  4006e4:	00 00 
  4006e6:	48 85 c0             	test   %rax,%rax
  4006e9:	48 89 44 24 30       	mov    %rax,0x30(%rsp)
  4006ee:	0f 84 e3 00 00 00    	je     4007d7 <fibo+0x1d7>
  4006f4:	45 31 f6             	xor    %r14d,%r14d
  4006f7:	48 83 f8 01          	cmp    $0x1,%rax
  4006fb:	4c 8d 68 fe          	lea    -0x2(%rax),%r13
  4006ff:	0f 84 53 01 00 00    	je     400858 <fibo+0x258>
  400705:	4d 85 ed             	test   %r13,%r13
  400708:	0f 84 8e 00 00 00    	je     40079c <fibo+0x19c>
  40070e:	45 31 ff             	xor    %r15d,%r15d
  400711:	49 83 fd 01          	cmp    $0x1,%r13
  400715:	4d 8d 65 fe          	lea    -0x2(%r13),%r12
  400719:	0f 84 c1 00 00 00    	je     4007e0 <fibo+0x1e0>
  40071f:	4d 85 e4             	test   %r12,%r12
  400722:	74 47                	je     40076b <fibo+0x16b>
  400724:	0f 1f 40 00          	nopl   0x0(%rax)
  400728:	49 83 fc 01          	cmp    $0x1,%r12
  40072c:	74 72                	je     4007a0 <fibo+0x1a0>
  40072e:	4c 89 e3             	mov    %r12,%rbx
  400731:	31 ed                	xor    %ebp,%ebp
  400733:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  400738:	48 8d 7b fe          	lea    -0x2(%rbx),%rdi
  40073c:	48 83 eb 01          	sub    $0x1,%rbx
  400740:	e8 bb fe ff ff       	callq  400600 <fibo>
  400745:	48 01 c5             	add    %rax,%rbp
  400748:	48 83 fb 01          	cmp    $0x1,%rbx
  40074c:	75 ea                	jne    400738 <fibo+0x138>
  40074e:	49 83 fc ff          	cmp    $0xffffffffffffffff,%r12
  400752:	4e 8d 7c 3d 01       	lea    0x1(%rbp,%r15,1),%r15
  400757:	74 25                	je     40077e <fibo+0x17e>
  400759:	4d 85 e4             	test   %r12,%r12
  40075c:	49 8d 44 24 ff       	lea    -0x1(%r12),%rax
  400761:	74 17                	je     40077a <fibo+0x17a>
  400763:	49 89 c4             	mov    %rax,%r12
  400766:	4d 85 e4             	test   %r12,%r12
  400769:	75 bd                	jne    400728 <fibo+0x128>
  40076b:	31 c0                	xor    %eax,%eax
  40076d:	49 01 c7             	add    %rax,%r15
  400770:	4d 85 e4             	test   %r12,%r12
  400773:	49 8d 44 24 ff       	lea    -0x1(%r12),%rax
  400778:	75 e9                	jne    400763 <fibo+0x163>
  40077a:	49 83 c7 01          	add    $0x1,%r15
  40077e:	4d 01 fe             	add    %r15,%r14
  400781:	49 83 fd ff          	cmp    $0xffffffffffffffff,%r13
  400785:	74 24                	je     4007ab <fibo+0x1ab>
  400787:	4d 85 ed             	test   %r13,%r13
  40078a:	49 8d 45 ff          	lea    -0x1(%r13),%rax
  40078e:	74 17                	je     4007a7 <fibo+0x1a7>
  400790:	49 89 c5             	mov    %rax,%r13
  400793:	4d 85 ed             	test   %r13,%r13
  400796:	0f 85 72 ff ff ff    	jne    40070e <fibo+0x10e>
  40079c:	31 c0                	xor    %eax,%eax
  40079e:	eb 45                	jmp    4007e5 <fibo+0x1e5>
  4007a0:	b8 01 00 00 00       	mov    $0x1,%eax
  4007a5:	eb c6                	jmp    40076d <fibo+0x16d>
  4007a7:	49 83 c6 01          	add    $0x1,%r14
  4007ab:	4c 01 74 24 58       	add    %r14,0x58(%rsp)
  4007b0:	48 83 7c 24 30 ff    	cmpq   $0xffffffffffffffff,0x30(%rsp)
  4007b6:	74 38                	je     4007f0 <fibo+0x1f0>
  4007b8:	48 8b 4c 24 30       	mov    0x30(%rsp),%rcx
  4007bd:	48 89 c8             	mov    %rcx,%rax
  4007c0:	48 83 e8 01          	sub    $0x1,%rax
  4007c4:	48 85 c9             	test   %rcx,%rcx
  4007c7:	74 21                	je     4007ea <fibo+0x1ea>
  4007c9:	48 85 c0             	test   %rax,%rax
  4007cc:	48 89 44 24 30       	mov    %rax,0x30(%rsp)
  4007d1:	0f 85 1d ff ff ff    	jne    4006f4 <fibo+0xf4>
  4007d7:	31 c0                	xor    %eax,%eax
  4007d9:	e9 7f 00 00 00       	jmpq   40085d <fibo+0x25d>
  4007de:	66 90                	xchg   %ax,%ax
  4007e0:	b8 01 00 00 00       	mov    $0x1,%eax
  4007e5:	49 01 c6             	add    %rax,%r14
  4007e8:	eb 9d                	jmp    400787 <fibo+0x187>
  4007ea:	48 83 44 24 58 01    	addq   $0x1,0x58(%rsp)
  4007f0:	48 8b 74 24 58       	mov    0x58(%rsp),%rsi
  4007f5:	48 01 74 24 50       	add    %rsi,0x50(%rsp)
  4007fa:	48 83 7c 24 28 ff    	cmpq   $0xffffffffffffffff,0x28(%rsp)
  400800:	74 29                	je     40082b <fibo+0x22b>
  400802:	48 8b 54 24 28       	mov    0x28(%rsp),%rdx
  400807:	48 89 d0             	mov    %rdx,%rax
  40080a:	48 83 e8 01          	sub    $0x1,%rax
  40080e:	48 85 d2             	test   %rdx,%rdx
  400811:	74 12                	je     400825 <fibo+0x225>
  400813:	48 85 c0             	test   %rax,%rax
  400816:	48 89 44 24 28       	mov    %rax,0x28(%rsp)
  40081b:	0f 85 ae fe ff ff    	jne    4006cf <fibo+0xcf>
  400821:	31 c0                	xor    %eax,%eax
  400823:	eb 7a                	jmp    40089f <fibo+0x29f>
  400825:	48 83 44 24 50 01    	addq   $0x1,0x50(%rsp)
  40082b:	48 8b 4c 24 50       	mov    0x50(%rsp),%rcx
  400830:	48 01 4c 24 48       	add    %rcx,0x48(%rsp)
  400835:	48 83 7c 24 20 ff    	cmpq   $0xffffffffffffffff,0x20(%rsp)
  40083b:	74 30                	je     40086d <fibo+0x26d>
  40083d:	48 8b 74 24 20       	mov    0x20(%rsp),%rsi
  400842:	48 89 f0             	mov    %rsi,%rax
  400845:	48 83 e8 01          	sub    $0x1,%rax
  400849:	48 85 f6             	test   %rsi,%rsi
  40084c:	74 19                	je     400867 <fibo+0x267>
  40084e:	48 89 44 24 20       	mov    %rax,0x20(%rsp)
  400853:	e9 49 fe ff ff       	jmpq   4006a1 <fibo+0xa1>
  400858:	b8 01 00 00 00       	mov    $0x1,%eax
  40085d:	48 01 44 24 58       	add    %rax,0x58(%rsp)
  400862:	e9 51 ff ff ff       	jmpq   4007b8 <fibo+0x1b8>
  400867:	48 83 44 24 48 01    	addq   $0x1,0x48(%rsp)
  40086d:	48 8b 54 24 48       	mov    0x48(%rsp),%rdx
  400872:	48 01 54 24 40       	add    %rdx,0x40(%rsp)
  400877:	48 83 7c 24 18 ff    	cmpq   $0xffffffffffffffff,0x18(%rsp)
  40087d:	74 40                	je     4008bf <fibo+0x2bf>
  40087f:	48 8b 4c 24 18       	mov    0x18(%rsp),%rcx
  400884:	48 89 c8             	mov    %rcx,%rax
  400887:	48 83 e8 01          	sub    $0x1,%rax
  40088b:	48 85 c9             	test   %rcx,%rcx
  40088e:	74 29                	je     4008b9 <fibo+0x2b9>
  400890:	48 89 44 24 18       	mov    %rax,0x18(%rsp)
  400895:	e9 e2 fd ff ff       	jmpq   40067c <fibo+0x7c>
  40089a:	b8 01 00 00 00       	mov    $0x1,%eax
  40089f:	48 01 44 24 50       	add    %rax,0x50(%rsp)
  4008a4:	e9 59 ff ff ff       	jmpq   400802 <fibo+0x202>
  4008a9:	b8 01 00 00 00       	mov    $0x1,%eax
  4008ae:	48 01 44 24 48       	add    %rax,0x48(%rsp)
  4008b3:	eb 88                	jmp    40083d <fibo+0x23d>
  4008b5:	31 c0                	xor    %eax,%eax
  4008b7:	eb f5                	jmp    4008ae <fibo+0x2ae>
  4008b9:	48 83 44 24 40 01    	addq   $0x1,0x40(%rsp)
  4008bf:	48 8b 74 24 40       	mov    0x40(%rsp),%rsi
  4008c4:	48 01 74 24 38       	add    %rsi,0x38(%rsp)
  4008c9:	48 83 7c 24 10 ff    	cmpq   $0xffffffffffffffff,0x10(%rsp)
  4008cf:	0f 84 93 00 00 00    	je     400968 <fibo+0x368>
  4008d5:	48 8b 54 24 10       	mov    0x10(%rsp),%rdx
  4008da:	48 89 d0             	mov    %rdx,%rax
  4008dd:	48 83 e8 01          	sub    $0x1,%rax
  4008e1:	48 85 d2             	test   %rdx,%rdx
  4008e4:	74 6c                	je     400952 <fibo+0x352>
  4008e6:	48 89 44 24 10       	mov    %rax,0x10(%rsp)
  4008eb:	e9 67 fd ff ff       	jmpq   400657 <fibo+0x57>
  4008f0:	b8 01 00 00 00       	mov    $0x1,%eax
  4008f5:	48 01 44 24 40       	add    %rax,0x40(%rsp)
  4008fa:	eb 83                	jmp    40087f <fibo+0x27f>
  4008fc:	31 c0                	xor    %eax,%eax
  4008fe:	eb f5                	jmp    4008f5 <fibo+0x2f5>
  400900:	b8 01 00 00 00       	mov    $0x1,%eax
  400905:	48 01 44 24 08       	add    %rax,0x8(%rsp)
  40090a:	48 83 2c 24 01       	subq   $0x1,(%rsp)
  40090f:	48 8b 04 24          	mov    (%rsp),%rax
  400913:	48 83 f8 ff          	cmp    $0xffffffffffffffff,%rax
  400917:	0f 85 15 fd ff ff    	jne    400632 <fibo+0x32>
  40091d:	48 8b 44 24 08       	mov    0x8(%rsp),%rax
  400922:	48 83 c4 68          	add    $0x68,%rsp
  400926:	5b                   	pop    %rbx
  400927:	5d                   	pop    %rbp
  400928:	48 83 c0 01          	add    $0x1,%rax
  40092c:	41 5c                	pop    %r12
  40092e:	41 5d                	pop    %r13
  400930:	41 5e                	pop    %r14
  400932:	41 5f                	pop    %r15
  400934:	c3                   	retq   
  400935:	31 c0                	xor    %eax,%eax
  400937:	eb cc                	jmp    400905 <fibo+0x305>
  400939:	b8 01 00 00 00       	mov    $0x1,%eax
  40093e:	c3                   	retq   
  40093f:	31 c0                	xor    %eax,%eax
  400941:	c3                   	retq   
  400942:	b8 01 00 00 00       	mov    $0x1,%eax
  400947:	48 01 44 24 38       	add    %rax,0x38(%rsp)
  40094c:	eb 87                	jmp    4008d5 <fibo+0x2d5>
  40094e:	31 c0                	xor    %eax,%eax
  400950:	eb f5                	jmp    400947 <fibo+0x347>
  400952:	48 8b 4c 24 08       	mov    0x8(%rsp),%rcx
  400957:	48 8b 44 24 38       	mov    0x38(%rsp),%rax
  40095c:	48 8d 44 08 01       	lea    0x1(%rax,%rcx,1),%rax
  400961:	48 89 44 24 08       	mov    %rax,0x8(%rsp)
  400966:	eb a2                	jmp    40090a <fibo+0x30a>
  400968:	48 8b 4c 24 38       	mov    0x38(%rsp),%rcx
  40096d:	48 01 4c 24 08       	add    %rcx,0x8(%rsp)
  400972:	eb 96                	jmp    40090a <fibo+0x30a>
  400974:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
  40097b:	00 00 00 
  40097e:	66 90                	xchg   %ax,%ax
fibo() with 32 bit ints:


0000000000400600 <fibo>:
  400600:	85 ff                	test   %edi,%edi
  400602:	0f 84 d8 02 00 00    	je     4008e0 <fibo+0x2e0>
  400608:	83 ff 01             	cmp    $0x1,%edi
  40060b:	0f 86 c9 02 00 00    	jbe    4008da <fibo+0x2da>
  400611:	41 57                	push   %r15
  400613:	41 56                	push   %r14
  400615:	8d 47 fe             	lea    -0x2(%rdi),%eax
  400618:	41 55                	push   %r13
  40061a:	41 54                	push   %r12
  40061c:	55                   	push   %rbp
  40061d:	53                   	push   %rbx
  40061e:	48 83 ec 38          	sub    $0x38,%rsp
  400622:	89 04 24             	mov    %eax,(%rsp)
  400625:	c7 44 24 04 00 00 00 	movl   $0x0,0x4(%rsp)
  40062c:	00 
  40062d:	85 c0                	test   %eax,%eax
  40062f:	0f 84 a1 02 00 00    	je     4008d6 <fibo+0x2d6>
  400635:	83 f8 01             	cmp    $0x1,%eax
  400638:	0f 84 69 02 00 00    	je     4008a7 <fibo+0x2a7>
  40063e:	83 e8 02             	sub    $0x2,%eax
  400641:	c7 44 24 1c 00 00 00 	movl   $0x0,0x1c(%rsp)
  400648:	00 
  400649:	89 44 24 08          	mov    %eax,0x8(%rsp)
  40064d:	85 c0                	test   %eax,%eax
  40064f:	0f 84 99 02 00 00    	je     4008ee <fibo+0x2ee>
  400655:	83 f8 01             	cmp    $0x1,%eax
  400658:	0f 84 85 02 00 00    	je     4008e3 <fibo+0x2e3>
  40065e:	83 e8 02             	sub    $0x2,%eax
  400661:	c7 44 24 20 00 00 00 	movl   $0x0,0x20(%rsp)
  400668:	00 
  400669:	89 44 24 0c          	mov    %eax,0xc(%rsp)
  40066d:	85 c0                	test   %eax,%eax
  40066f:	0f 84 2e 02 00 00    	je     4008a3 <fibo+0x2a3>
  400675:	83 f8 01             	cmp    $0x1,%eax
  400678:	0f 84 1a 02 00 00    	je     400898 <fibo+0x298>
  40067e:	83 e8 02             	sub    $0x2,%eax
  400681:	c7 44 24 24 00 00 00 	movl   $0x0,0x24(%rsp)
  400688:	00 
  400689:	89 44 24 10          	mov    %eax,0x10(%rsp)
  40068d:	85 c0                	test   %eax,%eax
  40068f:	0f 84 d1 01 00 00    	je     400866 <fibo+0x266>
  400695:	83 f8 01             	cmp    $0x1,%eax
  400698:	0f 84 bd 01 00 00    	je     40085b <fibo+0x25b>
  40069e:	83 e8 02             	sub    $0x2,%eax
  4006a1:	c7 44 24 28 00 00 00 	movl   $0x0,0x28(%rsp)
  4006a8:	00 
  4006a9:	85 c0                	test   %eax,%eax
  4006ab:	89 44 24 14          	mov    %eax,0x14(%rsp)
  4006af:	0f 84 32 01 00 00    	je     4007e7 <fibo+0x1e7>
  4006b5:	83 f8 01             	cmp    $0x1,%eax
  4006b8:	0f 84 8f 01 00 00    	je     40084d <fibo+0x24d>
  4006be:	83 e8 02             	sub    $0x2,%eax
  4006c1:	c7 44 24 2c 00 00 00 	movl   $0x0,0x2c(%rsp)
  4006c8:	00 
  4006c9:	85 c0                	test   %eax,%eax
  4006cb:	89 44 24 18          	mov    %eax,0x18(%rsp)
  4006cf:	0f 84 d7 00 00 00    	je     4007ac <fibo+0x1ac>
  4006d5:	45 31 f6             	xor    %r14d,%r14d
  4006d8:	83 f8 01             	cmp    $0x1,%eax
  4006db:	44 8d 68 fe          	lea    -0x2(%rax),%r13d
  4006df:	0f 84 30 01 00 00    	je     400815 <fibo+0x215>
  4006e5:	45 85 ed             	test   %r13d,%r13d
  4006e8:	0f 84 8a 00 00 00    	je     400778 <fibo+0x178>
  4006ee:	45 31 ff             	xor    %r15d,%r15d
  4006f1:	41 83 fd 01          	cmp    $0x1,%r13d
  4006f5:	45 8d 65 fe          	lea    -0x2(%r13),%r12d
  4006f9:	0f 84 b1 00 00 00    	je     4007b0 <fibo+0x1b0>
  4006ff:	45 85 e4             	test   %r12d,%r12d
  400702:	74 43                	je     400747 <fibo+0x147>
  400704:	0f 1f 40 00          	nopl   0x0(%rax)
  400708:	41 83 fc 01          	cmp    $0x1,%r12d
  40070c:	74 6e                	je     40077c <fibo+0x17c>
  40070e:	44 89 e3             	mov    %r12d,%ebx
  400711:	31 ed                	xor    %ebp,%ebp
  400713:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  400718:	8d 7b fe             	lea    -0x2(%rbx),%edi
  40071b:	83 eb 01             	sub    $0x1,%ebx
  40071e:	e8 dd fe ff ff       	callq  400600 <fibo>
  400723:	01 c5                	add    %eax,%ebp
  400725:	83 fb 01             	cmp    $0x1,%ebx
  400728:	75 ee                	jne    400718 <fibo+0x118>
  40072a:	41 83 fc ff          	cmp    $0xffffffff,%r12d
  40072e:	46 8d 7c 3d 01       	lea    0x1(%rbp,%r15,1),%r15d
  400733:	74 25                	je     40075a <fibo+0x15a>
  400735:	45 85 e4             	test   %r12d,%r12d
  400738:	41 8d 44 24 ff       	lea    -0x1(%r12),%eax
  40073d:	74 17                	je     400756 <fibo+0x156>
  40073f:	41 89 c4             	mov    %eax,%r12d
  400742:	45 85 e4             	test   %r12d,%r12d
  400745:	75 c1                	jne    400708 <fibo+0x108>
  400747:	31 c0                	xor    %eax,%eax
  400749:	41 01 c7             	add    %eax,%r15d
  40074c:	45 85 e4             	test   %r12d,%r12d
  40074f:	41 8d 44 24 ff       	lea    -0x1(%r12),%eax
  400754:	75 e9                	jne    40073f <fibo+0x13f>
  400756:	41 83 c7 01          	add    $0x1,%r15d
  40075a:	45 01 fe             	add    %r15d,%r14d
  40075d:	41 83 fd ff          	cmp    $0xffffffff,%r13d
  400761:	74 24                	je     400787 <fibo+0x187>
  400763:	45 85 ed             	test   %r13d,%r13d
  400766:	41 8d 45 ff          	lea    -0x1(%r13),%eax
  40076a:	74 17                	je     400783 <fibo+0x183>
  40076c:	41 89 c5             	mov    %eax,%r13d
  40076f:	45 85 ed             	test   %r13d,%r13d
  400772:	0f 85 76 ff ff ff    	jne    4006ee <fibo+0xee>
  400778:	31 c0                	xor    %eax,%eax
  40077a:	eb 39                	jmp    4007b5 <fibo+0x1b5>
  40077c:	b8 01 00 00 00       	mov    $0x1,%eax
  400781:	eb c6                	jmp    400749 <fibo+0x149>
  400783:	41 83 c6 01          	add    $0x1,%r14d
  400787:	44 01 74 24 2c       	add    %r14d,0x2c(%rsp)
  40078c:	83 7c 24 18 ff       	cmpl   $0xffffffff,0x18(%rsp)
  400791:	74 2c                	je     4007bf <fibo+0x1bf>
  400793:	8b 4c 24 18          	mov    0x18(%rsp),%ecx
  400797:	89 c8                	mov    %ecx,%eax
  400799:	83 e8 01             	sub    $0x1,%eax
  40079c:	85 c9                	test   %ecx,%ecx
  40079e:	74 1a                	je     4007ba <fibo+0x1ba>
  4007a0:	85 c0                	test   %eax,%eax
  4007a2:	89 44 24 18          	mov    %eax,0x18(%rsp)
  4007a6:	0f 85 29 ff ff ff    	jne    4006d5 <fibo+0xd5>
  4007ac:	31 c0                	xor    %eax,%eax
  4007ae:	eb 6a                	jmp    40081a <fibo+0x21a>
  4007b0:	b8 01 00 00 00       	mov    $0x1,%eax
  4007b5:	41 01 c6             	add    %eax,%r14d
  4007b8:	eb a9                	jmp    400763 <fibo+0x163>
  4007ba:	83 44 24 2c 01       	addl   $0x1,0x2c(%rsp)
  4007bf:	8b 74 24 2c          	mov    0x2c(%rsp),%esi
  4007c3:	01 74 24 28          	add    %esi,0x28(%rsp)
  4007c7:	83 7c 24 14 ff       	cmpl   $0xffffffff,0x14(%rsp)
  4007cc:	74 22                	je     4007f0 <fibo+0x1f0>
  4007ce:	8b 54 24 14          	mov    0x14(%rsp),%edx
  4007d2:	89 d0                	mov    %edx,%eax
  4007d4:	83 e8 01             	sub    $0x1,%eax
  4007d7:	85 d2                	test   %edx,%edx
  4007d9:	74 10                	je     4007eb <fibo+0x1eb>
  4007db:	85 c0                	test   %eax,%eax
  4007dd:	89 44 24 14          	mov    %eax,0x14(%rsp)
  4007e1:	0f 85 ce fe ff ff    	jne    4006b5 <fibo+0xb5>
  4007e7:	31 c0                	xor    %eax,%eax
  4007e9:	eb 67                	jmp    400852 <fibo+0x252>
  4007eb:	83 44 24 28 01       	addl   $0x1,0x28(%rsp)
  4007f0:	8b 4c 24 28          	mov    0x28(%rsp),%ecx
  4007f4:	01 4c 24 24          	add    %ecx,0x24(%rsp)
  4007f8:	83 7c 24 10 ff       	cmpl   $0xffffffff,0x10(%rsp)
  4007fd:	74 29                	je     400828 <fibo+0x228>
  4007ff:	8b 74 24 10          	mov    0x10(%rsp),%esi
  400803:	89 f0                	mov    %esi,%eax
  400805:	83 e8 01             	sub    $0x1,%eax
  400808:	85 f6                	test   %esi,%esi
  40080a:	74 17                	je     400823 <fibo+0x223>
  40080c:	89 44 24 10          	mov    %eax,0x10(%rsp)
  400810:	e9 78 fe ff ff       	jmpq   40068d <fibo+0x8d>
  400815:	b8 01 00 00 00       	mov    $0x1,%eax
  40081a:	01 44 24 2c          	add    %eax,0x2c(%rsp)
  40081e:	e9 70 ff ff ff       	jmpq   400793 <fibo+0x193>
  400823:	83 44 24 24 01       	addl   $0x1,0x24(%rsp)
  400828:	8b 54 24 24          	mov    0x24(%rsp),%edx
  40082c:	01 54 24 20          	add    %edx,0x20(%rsp)
  400830:	83 7c 24 0c ff       	cmpl   $0xffffffff,0xc(%rsp)
  400835:	74 38                	je     40086f <fibo+0x26f>
  400837:	8b 4c 24 0c          	mov    0xc(%rsp),%ecx
  40083b:	89 c8                	mov    %ecx,%eax
  40083d:	83 e8 01             	sub    $0x1,%eax
  400840:	85 c9                	test   %ecx,%ecx
  400842:	74 26                	je     40086a <fibo+0x26a>
  400844:	89 44 24 0c          	mov    %eax,0xc(%rsp)
  400848:	e9 20 fe ff ff       	jmpq   40066d <fibo+0x6d>
  40084d:	b8 01 00 00 00       	mov    $0x1,%eax
  400852:	01 44 24 28          	add    %eax,0x28(%rsp)
  400856:	e9 73 ff ff ff       	jmpq   4007ce <fibo+0x1ce>
  40085b:	b8 01 00 00 00       	mov    $0x1,%eax
  400860:	01 44 24 24          	add    %eax,0x24(%rsp)
  400864:	eb 99                	jmp    4007ff <fibo+0x1ff>
  400866:	31 c0                	xor    %eax,%eax
  400868:	eb f6                	jmp    400860 <fibo+0x260>
  40086a:	83 44 24 20 01       	addl   $0x1,0x20(%rsp)
  40086f:	8b 74 24 20          	mov    0x20(%rsp),%esi
  400873:	01 74 24 1c          	add    %esi,0x1c(%rsp)
  400877:	83 7c 24 08 ff       	cmpl   $0xffffffff,0x8(%rsp)
  40087c:	0f 84 82 00 00 00    	je     400904 <fibo+0x304>
  400882:	8b 54 24 08          	mov    0x8(%rsp),%edx
  400886:	89 d0                	mov    %edx,%eax
  400888:	83 e8 01             	sub    $0x1,%eax
  40088b:	85 d2                	test   %edx,%edx
  40088d:	74 63                	je     4008f2 <fibo+0x2f2>
  40088f:	89 44 24 08          	mov    %eax,0x8(%rsp)
  400893:	e9 b5 fd ff ff       	jmpq   40064d <fibo+0x4d>
  400898:	b8 01 00 00 00       	mov    $0x1,%eax
  40089d:	01 44 24 20          	add    %eax,0x20(%rsp)
  4008a1:	eb 94                	jmp    400837 <fibo+0x237>
  4008a3:	31 c0                	xor    %eax,%eax
  4008a5:	eb f6                	jmp    40089d <fibo+0x29d>
  4008a7:	b8 01 00 00 00       	mov    $0x1,%eax
  4008ac:	01 44 24 04          	add    %eax,0x4(%rsp)
  4008b0:	83 2c 24 01          	subl   $0x1,(%rsp)
  4008b4:	8b 04 24             	mov    (%rsp),%eax
  4008b7:	83 f8 ff             	cmp    $0xffffffff,%eax
  4008ba:	0f 85 6d fd ff ff    	jne    40062d <fibo+0x2d>
  4008c0:	8b 44 24 04          	mov    0x4(%rsp),%eax
  4008c4:	48 83 c4 38          	add    $0x38,%rsp
  4008c8:	5b                   	pop    %rbx
  4008c9:	5d                   	pop    %rbp
  4008ca:	83 c0 01             	add    $0x1,%eax
  4008cd:	41 5c                	pop    %r12
  4008cf:	41 5d                	pop    %r13
  4008d1:	41 5e                	pop    %r14
  4008d3:	41 5f                	pop    %r15
  4008d5:	c3                   	retq   
  4008d6:	31 c0                	xor    %eax,%eax
  4008d8:	eb d2                	jmp    4008ac <fibo+0x2ac>
  4008da:	b8 01 00 00 00       	mov    $0x1,%eax
  4008df:	c3                   	retq   
  4008e0:	31 c0                	xor    %eax,%eax
  4008e2:	c3                   	retq   
  4008e3:	b8 01 00 00 00       	mov    $0x1,%eax
  4008e8:	01 44 24 1c          	add    %eax,0x1c(%rsp)
  4008ec:	eb 94                	jmp    400882 <fibo+0x282>
  4008ee:	31 c0                	xor    %eax,%eax
  4008f0:	eb f6                	jmp    4008e8 <fibo+0x2e8>
  4008f2:	8b 4c 24 04          	mov    0x4(%rsp),%ecx
  4008f6:	8b 44 24 1c          	mov    0x1c(%rsp),%eax
  4008fa:	8d 44 08 01          	lea    0x1(%rax,%rcx,1),%eax
  4008fe:	89 44 24 04          	mov    %eax,0x4(%rsp)
  400902:	eb ac                	jmp    4008b0 <fibo+0x2b0>
  400904:	8b 4c 24 1c          	mov    0x1c(%rsp),%ecx
  400908:	01 4c 24 04          	add    %ecx,0x4(%rsp)
  40090c:	eb a2                	jmp    4008b0 <fibo+0x2b0>
  40090e:	66 90                	xchg   %ax,%ax

Heater
Posts: 13099
Joined: Tue Jul 17, 2012 3:02 pm

Re: 64-bit operating system

Mon Mar 26, 2018 9:43 am

ejolson,
The speed of a recursive implementation of the Fibonacci sequence mostly reflects function call overhead.
Yep. Most of the code I'm running most of the time does not do intense amounts of calculation.
Try running the code I mentioned if you want to see a difference.
OK.

The 32 bit result:

$ time ./a.out; time ./a.out; time ./a.out
Found a total of 664579 primes (32-bit)

real    0m0.896s
user    0m0.859s
sys     0m0.016s
Found a total of 664579 primes (32-bit)

real    0m0.927s
user    0m0.891s
sys     0m0.016s
Found a total of 664579 primes (32-bit)

real    0m0.939s
user    0m0.906s
sys     0m0.000s
The 64 bit result:

$ time ./a.out; time ./a.out; time ./a.out
Found a total of 664579 primes (64-bit)

real    0m2.743s
user    0m2.734s
sys     0m0.000s
Found a total of 664579 primes (64-bit)

real    0m2.685s
user    0m2.609s
sys     0m0.016s
Found a total of 664579 primes (64-bit)

real    0m2.706s
user    0m2.656s
sys     0m0.000s
Quite a difference.
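The prime program ejolson referred to is not reproduced in this thread. As a rough, hypothetical stand-in, here is a trial-division prime counter built the same way as the fibo test, with a typedef to switch the integer width. The names `prime_t`, `is_prime` and `count_primes` are my own. Nearly all of the time goes into the `%` in the inner loop, which is why the width of the division matters so much here.

```c
#include <stdint.h>

/* Hypothetical stand-in for the benchmark above; swap the typedef
   to compare 32-bit and 64-bit integer arithmetic. */
typedef uint32_t prime_t;
/* typedef uint64_t prime_t; */

/* Trial division by odd candidates up to sqrt(n). Writing the bound
   as d <= n / d avoids overflowing d * d. */
int is_prime(prime_t n)
{
    if (n < 2)
        return 0;
    if (n % 2 == 0)
        return n == 2;
    for (prime_t d = 3; d <= n / d; d += 2)
    {
        if (n % d == 0)
            return 0;
    }
    return 1;
}

/* Number of primes in [2, limit]. Do not pass the type's maximum
   value: n <= limit would then never become false. */
prime_t count_primes(prime_t limit)
{
    prime_t count = 0;
    for (prime_t n = 2; n <= limit; n++)
        count += (prime_t)is_prime(n);
    return count;
}
```

The 664579 total above is the number of primes below 10,000,000, so a small main() printing `count_primes(10000000)` should report the same figure for either typedef.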

jahboater
Posts: 4603
Joined: Wed Feb 04, 2015 6:38 pm

Re: 64-bit operating system

Mon Mar 26, 2018 9:52 am

Heater wrote:
Mon Mar 26, 2018 9:43 am
Quite a difference.
Yes, because 64-bit division is slow, even on 64-bit platforms, and it is horrifically slow on 32-bit platforms.
This prime number benchmark is mostly divisions/remainders.
What platform did you run those tests on?
Have you looked at the assembler (with -S -fverbose-asm)? You might even find that the 64-bit division is being done by a library call on the Pi, whereas the 32-bit division is done with the udiv instruction.
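A minimal way to follow that suggestion, assuming GCC: put one 32-bit and one 64-bit remainder in a file (the file name `divtest.c` and the function names are hypothetical), build with `gcc -O2 -S -fverbose-asm divtest.c`, and compare the two functions in `divtest.s`. On 32-bit ARM the 64-bit one typically becomes a call to a run-time helper (`__aeabi_uldivmod` in the ARM EABI). Whether the 32-bit one gets a hardware `udiv` depends on the `-march`/`-mcpu` setting, since ARMv6 (Raspbian's baseline) has no hardware divide instruction.

```c
#include <stdint.h>

/* 32-bit remainder: can use a hardware divide where the target has one
   (udiv on ARM cores that support it, div on x86). */
uint32_t rem32(uint32_t a, uint32_t b)
{
    return a % b;
}

/* 64-bit remainder: on 32-bit ARM this is typically compiled into a
   call to a libgcc/EABI helper rather than a single instruction. */
uint64_t rem64(uint64_t a, uint64_t b)
{
    return a % b;
}
```

On x86-64 both compile to a single `div`, but the 64-bit form has historically had noticeably higher latency on many Intel cores, which would be consistent with the gap Heater measured there.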

jahboater
Posts: 4603
Joined: Wed Feb 04, 2015 6:38 pm

Re: 64-bit operating system

Mon Mar 26, 2018 10:16 am

Heater wrote:
Mon Mar 26, 2018 9:33 am
They both seem to be huge for such a small simple function. I am using GCC version 5.4.0 and -O3
That's because you are using -O3 (and a very old compiler; the last supported version was 6.4).
I never use -O3, it produces absolutely massive code that isn't always faster (it might not fit in the i-cache for example).
-Os produces the most human readable code by the way. Most people use -O2.

You can see the extra 64-bit operand-size prefix byte (0x48, the REX.W prefix) on some instructions. Here is the first instruction of your fibo function in the 64-bit and 32-bit builds:

400600: 48 85 ff test %rdi,%rdi (64-bit)

400600: 85 ff test %edi,%edi (32-bit)

It's no big deal: ARM instructions are all 4 bytes, and you need a lot more of them.
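To reproduce the prefix-byte comparison, a sketch assuming GCC on x86-64 and binutils objdump: compile these two hypothetical functions with `gcc -O2 -c rex.c` and disassemble with `objdump -d rex.o`. The 64-bit test should carry the extra 0x48 byte; the surrounding instructions may differ between compiler versions.

```c
#include <stdint.h>

/* Expected to contain "85 ff   test %edi,%edi" on x86-64. */
int nonzero32(uint32_t x)
{
    return x != 0;
}

/* Expected to contain "48 85 ff   test %rdi,%rdi": same opcode,
   plus the REX.W prefix selecting 64-bit operand size. */
int nonzero64(uint64_t x)
{
    return x != 0;
}
```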

Heater
Posts: 13099
Joined: Tue Jul 17, 2012 3:02 pm

Re: 64-bit operating system

Mon Mar 26, 2018 11:18 am

jahboater,

I'm running these tests on a Microsoft Surface Pro. Sorry I don't have Pi to hand just now.
That's because you are using -O3 (and a very old compiler; the last supported version was 6.4).
I never use -O3, it produces absolutely massive code that isn't always faster (it might not fit in the i-cache for example).
-Os produces the most human readable code by the way. Most people use -O2.
Old compiler? It's what ships in the current Ubuntu for the Windows Subsystem for Linux.

This is contrary to my past experience where -O3 has generally been the better performer.

In this case, changing to -Os shrinks the fibo() function dramatically, down to about 55 bytes. The 64-bit version is a few bytes smaller.

However, -Os increases the run time to 18 seconds for 32 bits and 21 seconds for 64 bits, a significant drop in performance.

-O2 turns in much the same timing as -Os.
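To compare -O2, -Os and -O3 without also timing process start-up and printf, one option is to time just the recursion in-process. This sketch repeats the fibo() from Heater's test program earlier in the thread; `time_fibo()` is a hypothetical helper name. `clock()` measures CPU time, so its numbers should be comparable to the `user` figures from `time`.

```c
#include <stdint.h>
#include <time.h>

typedef uint32_t fibo_t;   /* or uint64_t for the 64-bit run */

/* Same naive recursion as Heater's test program. */
fibo_t fibo(fibo_t n)
{
    if (n == 0)
        return 0;
    if (n < 2)
        return 1;
    return fibo(n - 2) + fibo(n - 1);
}

/* CPU seconds spent on one fibo(n) call. The result is written through
   *result so the compiler cannot discard the computation. */
double time_fibo(fibo_t n, fibo_t *result)
{
    clock_t start = clock();
    *result = fibo(n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Build it once per optimisation level with a small main() that calls, say, `time_fibo(46, &r)` and prints both values. As with any micro-benchmark, it is worth sanity-checking the in-process figure against `time ./a.out`.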

sakaki
Posts: 324
Joined: Sun Jul 16, 2017 1:11 pm

Re: 64-bit operating system

Wed Apr 04, 2018 2:04 pm

FYI, the bootable Gentoo 64-bit image for the RPi3 has now been updated to support the B+.
For more details, please see this post.

ejolson
Posts: 3415
Joined: Tue Mar 18, 2014 11:47 am

Re: 64-bit operating system

Wed Apr 11, 2018 4:00 am

jahboater wrote:
Mon Mar 26, 2018 7:35 am
Correcting for clock speed gives 2.99 sec for 64-bit mode compared to 3.33 sec for 32-bit mode.

Perhaps this is a plausible measure of the 32/64-bit speed difference: 11.4% faster for this little benchmark.
I performed a similar test using a 1.4GHz ARM Cortex-A53 core running in 64-bit mode and concluded that 9% of the speed increase was due to running in 64-bit mode. I suspect the 3% discrepancy in our results is due to the differing memory speeds, which have not been taken into account. I'll put your Raspberry Pi 3B+ timings into this table. I'm leaving your 64-bit timings out of the table for now because they appear to come from an overclocked system. It sure would be nice to have some timings of the Raspberry Pi 3B+ running in 64-bit mode to compare.
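A quick check of the arithmetic in the quoted figures. I'm assuming the clock-speed correction scales run time linearly with frequency, which by construction ignores memory speed (the factor suspected above):

```latex
t_{\mathrm{corr}} = t_{\mathrm{meas}} \, \frac{f_{\mathrm{meas}}}{f_{\mathrm{ref}}},
\qquad
\frac{t_{32}}{t_{64}} - 1 = \frac{3.33\ \mathrm{s}}{2.99\ \mathrm{s}} - 1 \approx 0.114 = 11.4\%
```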

Note that the 64-bit integer Raspberry Pi 3B+ timings, though slow, are much faster than I would have expected. I understand you used gcc-7.3, but what optimizer settings were used?

jahboater
Posts: 4603
Joined: Wed Feb 04, 2015 6:38 pm

Re: 64-bit operating system

Wed Apr 11, 2018 7:14 am

ejolson wrote:
Wed Apr 11, 2018 4:00 am
Note that the 64-bit integer Raspberry Pi 3B+ timings, though slow, are much faster than I would have expected. I understand you used gcc-7.3, but what optimizer settings were used?
Obviously, for a fair comparison the settings would have been the same for 32- and 64-bit ints, but I can't remember if I temporarily changed -Os to -O3 for the benchmarks in my standard makefile (sorry).

-march=native -Os -mneon-for-64bits

"native" works properly with recent GCC on ARM. I believe it results in:

-mcpu=cortex-a53 -mfpu=neon-fp-armv8
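Since -march=native expands differently on each machine, it can help to check what a given build actually targeted. This sketch relies on predefined macros. `__aarch64__`, `__arm__` and `__x86_64__` are GCC/Clang conventions, and `__ARM_NEON` comes from ARM's C Language Extensions. The function names are hypothetical. With GCC, `gcc -march=native -Q --help=target` should also list what native resolved to. As far as I know, `-mfpu` and `-mneon-for-64bits` are 32-bit ARM options, and an AArch64 GCC does not accept `-mfpu`.

```c
#include <stddef.h>

/* Instruction set this translation unit was compiled for. */
const char *target_isa(void)
{
#if defined(__aarch64__)
    return "AArch64 (64-bit)";
#elif defined(__arm__)
    return "AArch32 (32-bit ARM)";
#elif defined(__x86_64__)
    return "x86-64";
#else
    return "unknown";
#endif
}

/* 1 if the compiler was allowed to emit NEON (Advanced SIMD) code. */
int neon_enabled(void)
{
#if defined(__ARM_NEON)
    return 1;
#else
    return 0;
#endif
}

/* Pointer width in bits: 32 under Raspbian, 64 on an AArch64 system. */
size_t pointer_bits(void)
{
    return sizeof(void *) * 8;
}
```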

Gavinmc42
Posts: 3627
Joined: Wed Aug 28, 2013 3:31 am

Re: 64-bit operating system

Wed Apr 11, 2018 7:27 am

FYI, the bootable Gentoo 64-bit image for the RPi3 has now been updated to support the B+.
Booted on a Pi3B, expanded the SD card, got the desktop up, then shut down and put the SD card into a Pi3B+. Bingo, it works :D
I have no network here; I will test further at home.

I am impressed. It must be magic ;)
It will be fun to play with NEON stuff now.
I'm dancing on Rainbows.
Raspberries are not Apples or Oranges

richlion2
Posts: 75
Joined: Thu Mar 29, 2018 7:14 am

Re: 64-bit operating system

Wed Apr 11, 2018 9:17 am

jamesh wrote:
Mon Mar 19, 2018 4:01 pm
More than 4GB? Not happening for years. RAM is too expensive to put that much on and keep anywhere close to the $35 price point. And RAM prices are currently INCREASING....

I find it amazing how much RAM we 'need' nowadays, when we were doing very similar tasks in 32MB devices not that long ago. Just badly written code in my opinion.
Agreed.

The 64-bit question is a valid one, though: why was I assuming Raspbian is 64-bit? :cry:

Anyhow, for the price of £34 (incl. delivery) I do not expect the desktop to be flying. Would I want to pay more to have more RAM? I don't know; I am close to trying a Blu-ray movie using Kodi. If that works, then why pay for more? I also think many of us still have PCs or powerful laptops and are not looking to replace them with an RPi. That said, I am impressed with the performance.

Actually, isn't it true that we may need more memory for 64-bit? So everything comes at a price.
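Part of the answer is yes: on 64-bit Linux, pointers (and `long`) double in size, so pointer-heavy data structures grow even when the values they hold do not. A minimal sketch with a hypothetical linked-list node:

```c
#include <stddef.h>

/* Hypothetical linked-list node: one pointer plus one int. */
struct node {
    struct node *next;
    int value;
};

/* Bytes used by n such nodes, ignoring malloc's own overhead.
   On 32-bit ARM (ILP32) sizeof(struct node) is typically 8; on
   AArch64 or x86-64 (LP64) the pointer grows to 8 bytes and alignment
   padding brings the node to 16, doubling the memory used. */
size_t list_bytes(size_t n)
{
    return n * sizeof(struct node);
}
```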

We first need to stop using Chromium on the RPi. It's overloaded with functionality; all the additional tasks the Chromium engine does without our knowledge just show how ridiculously reliant we are on so much memory. People think it's a necessity to be able to watch full HD content on their phones. I used to code on computers that had 128K, and any compiling had to be done back when the general public had no access to computers.

Having to run stuff with fewer resources (CPU, memory) teaches us respect, and how to save what would otherwise be wasted: energy. It also lets us learn how an OS is actually built and how to tweak it. I like the idea of being in control, and that other "regular guys" don't know what I know 8-)

Just my 3c.
Richard

richlion2
Posts: 75
Joined: Thu Mar 29, 2018 7:14 am

Re: 64-bit operating system

Wed Apr 11, 2018 9:25 am

Gavinmc42 wrote:
Wed Apr 11, 2018 7:27 am
FYI, the bootable Gentoo 64-bit image for the RPi3 has now been updated to support the B+.
Booted on a Pi3B , expanded SD and got Desktop up then shutdown and put SD card into a Pi3B+, bingo it works :D
Have no network here, will test further at home.

I am impressed. it must be magic ;)
It will be fun to play with NEON stuff now.
Now that's an option I am willing to explore. Thanks for the info :)
Is this how you do it?
https://wiki.gentoo.org/wiki/Raspberry_Pi
“It’s nice to be important, but it’s more important to be nice.” ;)

Gavinmc42
Posts: 3627
Joined: Wed Aug 28, 2013 3:31 am

Re: 64-bit operating system

Wed Apr 11, 2018 10:14 am

For 64-bit Gentoo, follow Sakaki's link:
https://github.com/sakaki-/gentoo-on-rpi3-64bit

I expect LAN7515 issues; I have seen things about this in the Raspbian Linux GitHub posts.
https://github.com/raspberrypi/linux/commits/rpi-4.14.y

Might need to stick it in a Pi3B and update soon for the Pi3B+.
As Sakaki seems to know her stuff, it may be fixed soon/already.

A really usable Pi AArch64 desktop?
I've tried many so-called ones before, but the new Pi3B+ should push it into the "usable as a development PC" category.
Actually, I'm now thinking a PiCore version would be fun to do; perhaps a dCore spin?
Last edited by Gavinmc42 on Wed Apr 11, 2018 11:27 am, edited 1 time in total.
