jdonald
Posts: 270
Joined: Fri Nov 03, 2017 4:36 pm

Re: 64-bit operating system

Mon Dec 10, 2018 7:42 am

W. H. Heydt wrote:
Mon Dec 10, 2018 2:54 am
There is still the problem (when it comes to only officially supporting one image at a time) that there is no 64-bit capable Pi0/Pi0W anywhere in sight.
...
If the RPF switched to a 64-bit Raspbian tomorrow, and I wanted to be able to keep operating Pis up to date on a current image, I would have to replace at least 11 Pi2Bv1.1 boards.
Single-image may not be a dealbreaker per se. The bootloader already has code such that if it's on a Pi 2 it will ignore kernel8.img and go straight to booting kernel7.img. If the foundation's policy were single-kernel it would be another story, but that's already not the case as we have both kernel7.img and kernel.img.
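To make the single-image point concrete, here is a rough sketch of how one boot partition could serve several boards. It is a toy model only: the board names, the capability sets and the fallback order are assumptions made for illustration, not the real firmware logic. The only behaviour taken from the post is that a Pi 2 ignores kernel8.img and boots kernel7.img.

```python
# Hypothetical model of kernel selection on a shared image.
# Board identifiers and fallback order are illustrative assumptions.
ARMV8_BOARDS = {"pi3b", "pi3b+"}      # assumed 64-bit capable
ARMV7_BOARDS = {"pi2b-v1.1"}          # 32-bit only

def pick_kernel(board, files_on_boot_partition):
    """Return the first kernel image this board would try, or None."""
    if board in ARMV8_BOARDS:
        preference = ["kernel8.img", "kernel7.img", "kernel.img"]
    elif board in ARMV7_BOARDS:
        # kernel8.img is skipped entirely on a Pi 2, as described in the post.
        preference = ["kernel7.img", "kernel.img"]
    else:
        # Anything else (e.g. Pi 0/Pi 0W) is assumed to use the plain kernel.img.
        preference = ["kernel.img"]
    for name in preference:
        if name in files_on_boot_partition:
            return name
    return None

images = {"kernel.img", "kernel7.img", "kernel8.img"}
for board in ("pi0w", "pi2b-v1.1", "pi3b+"):
    print(board, "->", pick_kernel(board, images))
```

The point of the sketch is only that adding a third kernel file does not by itself require a second image; the maintenance cost discussed in the next paragraph is a separate matter.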

Now there are challenges supporting an additional kernel. Obviously there's more to maintain and test across all drivers. I find this similar to the situation with the open-source VC4 driver vs the legacy brcm one. The new driver seems to be what the foundation has chosen yet still plenty of legacy software like Kodi or Retropie requires the old one. We accept this complication, give users the freedom to choose via raspi-config, and move forward, but it isn't the same as having to support two separate distros.

ejolson
Posts: 3227
Joined: Tue Mar 18, 2014 11:47 am

Re: 64-bit operating system

Mon Dec 10, 2018 7:49 am

fruitoftheloom wrote:
Mon Dec 10, 2018 7:27 am
Have I missed something here ????
It seems you are right and 32-bit builds of recent versions of Firefox are still available, even with the most recent quantum rendering engine. For some reason I was under the impression that the newest versions of Firefox didn't run on Raspbian, but I may have misunderstood.

fruitoftheloom
Posts: 20136
Joined: Tue Mar 25, 2014 12:40 pm
Location: Delightful Dorset

Re: 64-bit operating system

Mon Dec 10, 2018 7:57 am

ejolson wrote:
Mon Dec 10, 2018 7:49 am
fruitoftheloom wrote:
Mon Dec 10, 2018 7:27 am
Have I missed something here ????
It seems you are right and 32-bit builds of recent versions of Firefox are still available, even with the most recent quantum rendering engine. For some reason I was under the impression that the newest versions didn't run on Raspbian, but I may have misunderstood.

At this point in time only Google Chrome Browser has ditched x86-32 Linux support:

https://support.google.com/chrome/answer/95346

Whilst Google only offers ARMHF & ARM64 versions for Android (Linux) OS version 4.1 onwards.


Firefox-ESR has, AFAIAA, still not been updated to the 60 engine in Raspbian Stretch; it is still on the older 52 engine.

adieu

Asus CS10 Chromebit / HP Envy 4500 Wireless Printer / Raspberry Pi Model 2B v1.1 / RealVNC Software...

ejolson
Posts: 3227
Joined: Tue Mar 18, 2014 11:47 am

Re: 64-bit operating system

Mon Dec 10, 2018 8:17 am

fruitoftheloom wrote:
Mon Dec 10, 2018 7:57 am
Firefox-ESR has, AFAIAA, still not been updated to the 60 engine in Raspbian Stretch; it is still on the older 52 engine.
Thanks, I've updated my earlier post to indicate Firefox is still supported on 32-bit systems but the version in Raspbian is old.

fruitoftheloom
Posts: 20136
Joined: Tue Mar 25, 2014 12:40 pm
Location: Delightful Dorset

Re: 64-bit operating system

Mon Dec 10, 2018 8:26 am

ejolson wrote:
Mon Dec 10, 2018 8:17 am
fruitoftheloom wrote:
Mon Dec 10, 2018 7:57 am
Firefox-ESR has, AFAIAA, still not been updated to the 60 engine in Raspbian Stretch; it is still on the older 52 engine.
Thanks, I've updated my earlier post to indicate Firefox is still supported on 32-bit system but the version in Raspbian is old.

Though firefox-esr browser is not needed as Raspbian Stretch ARMHF has chromium-browser which Scratch3 is compatible with......

So what else in the next few years will make ARM32 Raspbian Stretch not usable ????

The RPF already offers OS support for the RPi SBCs for longer than Apple & Google devices enjoy.

jdonald
Posts: 270
Joined: Fri Nov 03, 2017 4:36 pm

Re: 64-bit operating system

Mon Dec 10, 2018 8:30 am

ejolson I think you were right the first time. Firefox armhf is not supported on Stretch and has been broken since 55 or earlier. Here's the ticket where they say it's a tier-3 platform and might fix it someday. I just installed firefox-esr (60.3) on Debian and got the usual startup crash.

Earlier this year, it was a miracle when firefox started working again on 18.04 Bionic armhf, and I imagine it's still running on 18.10 Cosmic. When it comes to Debian, I haven't seen any support outside of arm64.

WebGL support in chromium-browser for Raspbian is subpar compared to firefox:arm64. I cannot run any Unity Web Player apps on Chromium for Pi. I'm guessing that Chromium support for WebGL is generally good across most OSs, but Google also felt armhf Linux was third-tier, so they could take shortcuts. With the Pi being a resource-constrained system, Scratch 3 for Pi should not have to settle for only one lousy choice.

fruitoftheloom
Posts: 20136
Joined: Tue Mar 25, 2014 12:40 pm
Location: Delightful Dorset

Re: 64-bit operating system

Mon Dec 10, 2018 8:45 am

jdonald wrote:
Mon Dec 10, 2018 8:30 am
ejolson I think you were right the first time. Firefox armhf is not supported on Stretch and has been broken since 55 or earlier. Here's the ticket where they say it's a tier-3 platform and might fix it someday. I just installed firefox-esr (60.3) on Debian and got the usual startup crash.

Earlier this year, it was a miracle when firefox started working again on 18.04 Bionic armhf, and I imagine it's still running on 18.10 Cosmic. When it comes to Debian, I haven't seen any support outside of arm64.

WebGL support in chromium-browser for Raspbian is subpar compared to firefox:arm64. I cannot run any Unity Web Player apps on Chromium for Pi. I'm guessing that Chromium support for WebGL is generally good across most OSs, but Google also felt armhf Linux was third-tier, so they could take shortcuts. With the Pi being a resource-constrained system, Scratch 3 for Pi should not have to settle for only one lousy choice.

Unity Web Player is for browser-based games.

Yes, but does Raspbian Stretch ARM32 fit the goals of education and learning, which the Raspberry Pi Foundation's charitable aims must adhere to?

We can all wish, but wishes and reality unfortunately differ in life!

sakaki
Posts: 252
Joined: Sun Jul 16, 2017 1:11 pm

Re: 64-bit operating system

Mon Dec 10, 2018 1:59 pm

ejolson wrote:
Wed Dec 05, 2018 6:00 am
jdonald wrote:
Wed Dec 05, 2018 5:31 am
Tried the image on my Pi 3B+.
In a different thread appears a short self-contained C program which computes the first Fibonacci number with a million digits. This program implements big-number arithmetic using 64-bit integers as the underlying type. The Pi 3B+ running in 32-bit compatibility mode completes the computation in 15.43 seconds. Based on rescaling the clock speeds of a different ARM-based single-board computer, it was estimated that the Pi 3B+ running in 64-bit mode should complete this same computation in only 7.49 seconds. If true, that would be a two-fold increase in speed for a particular application just by switching operating systems.

It would be nice if someone who is running a 64-bit operating system on real 3B+ hardware could confirm that this estimate is correct. The program is available in this post. The above mentioned performance results are discussed in subsequent posts of the same thread.
Not sure if anyone has posted results for this as requested, but here's a run on an RPi3B+, gcc 8.2.0, gentoo-on-rpi3-64bit image, with and without -ffast-math (as expected, on arm64 this flag makes essentially no difference), FLIRC case, on-demand governor:

Code: Select all

[email protected] ~ $ gcc -O3 -ffast-math -o fibonacci fibonacci.c -lm

[email protected] ~ $ time ./fibonacci | head -c 32
10727395641800477229364813596225
real	0m7.746s
user	0m7.713s
sys	0m0.032s

[email protected] ~ $ time ./fibonacci | tail -c 32
4856539211500699706378405156269

real	0m7.818s
user	0m7.764s
sys	0m0.033s

[email protected] ~ $ gcc -O3 -o fibonacci fibonacci.c -lm

[email protected] ~ $ time ./fibonacci | head -c 32
10727395641800477229364813596225
real	0m7.740s
user	0m7.713s
sys	0m0.024s

[email protected] ~ $ time ./fibonacci | tail -c 32
4856539211500699706378405156269

real	0m7.813s
user	0m7.795s
sys	0m0.017s
hth, sakaki

ejolson
Posts: 3227
Joined: Tue Mar 18, 2014 11:47 am

Re: 64-bit operating system

Mon Dec 10, 2018 6:29 pm

sakaki wrote:
Mon Dec 10, 2018 1:59 pm
Not sure if anyone has posted results for this as requested, but here's a run on an RPi3B+, gcc 8.2.0, gentoo-on-rpi3-64bit image, with and without -ffast-math (as expected, on arm64 this flag makes essentially no difference), FLIRC case, on-demand governor:

Thanks for running the code on the Pi 3B+ in 64-bit mode. Compared to the timing of 15.47 seconds in 32-bit mode from this post, we have

15.47 / 7.740 = 1.999

which is nearly a 2-fold increase in performance. This confirms the similar result posted here.
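As a quick cross-check of the ratio above, the same arithmetic can be repeated over all four 64-bit runs rather than only the fastest one. The timings are copied from the posts; note that the four runs mix the -ffast-math and plain builds and pipe into head or tail, so treating them as one sample is a simplifying assumption.

```python
# "real" times in seconds, copied from the posts above.
T32 = 15.47                                # Pi 3B+ in 32-bit mode
T64_RUNS = [7.746, 7.818, 7.740, 7.813]    # the four 64-bit runs

def speedup(t_old, t_new):
    """How many times faster the new timing is than the old one."""
    return t_old / t_new

best = speedup(T32, min(T64_RUNS))
mean = speedup(T32, sum(T64_RUNS) / len(T64_RUNS))
print(f"best run: {best:.3f}x, mean of runs: {mean:.3f}x")
```

Using the mean instead of the best run lowers the ratio slightly, but it stays close to 2 either way.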

From my point of view, the fibonacci.c program performs a real computation using an asymptotically reasonable algorithm. In particular, it uses Karatsuba multiplication along with the doubling formulas for the Fibonacci sequence to find the nth term. While some care has been taken with the code, it is definitely not hand-coded assembler tuned to a particular architecture. For these reasons this is not a synthetic benchmark, in my opinion, but rather a program which represents application-level performance that results from writing suitable code to solve a real problem in a high-level language.
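For readers who have not seen the doubling formulas mentioned above, here is a minimal sketch of the idea. It is not the fibonacci.c program from the other thread: it leans on Python's built-in arbitrary-precision integers for the multiplications instead of implementing Karatsuba by hand, and the function names are made up for this illustration. The identities used are F(2k) = F(k)·(2·F(k+1) − F(k)) and F(2k+1) = F(k)² + F(k+1)².

```python
def fib_pair(n):
    """Return (F(n), F(n+1)) using the doubling formulas."""
    if n == 0:
        return 0, 1
    a, b = fib_pair(n // 2)      # a = F(k), b = F(k+1) with k = n // 2
    c = a * (2 * b - a)          # F(2k)
    d = a * a + b * b            # F(2k+1)
    if n % 2 == 0:
        return c, d
    return d, c + d              # (F(2k+1), F(2k+2))

def fib(n):
    """Return the nth Fibonacci number, with F(0) = 0 and F(1) = 1."""
    return fib_pair(n)[0]

print([fib(i) for i in range(10)])
```

Each step halves n, so only about log2(n) rounds of big multiplications are needed, which is why the cost of the multiplication routine (and the width of the machine word underneath it) dominates the run time.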

It would be interesting to see an example of a reasonably written program which solves a real problem that runs 2-times slower on 64-bit compared to 32-bit. Are there any examples that can be quantitatively compared?

jahboater
Posts: 4595
Joined: Wed Feb 04, 2015 6:38 pm

Re: 64-bit operating system

Mon Dec 10, 2018 8:06 pm

ejolson wrote:
Mon Dec 10, 2018 6:29 pm
It would be interesting to see an example of a reasonably written program which solves a real problem that runs 2-times slower on 64-bit compared to 32-bit. Are there any examples that can be quantitatively compared?
Interesting challenge!

I suspect the only thing that's slower might be a program reading/writing vast numbers of pointers to and from memory.

Pointers (and the related size_t and ptrdiff_t), are the only types that change size gratuitously. You could argue about long, but a reasonably written program should be using stdint.h. Perhaps off_t, but that can be set to 64-bits in 32-bit mode.

The 31 general purpose registers, the removal of the slow instructions, the regular opcode layout, the 32 floating-point registers, and so on, means 64-bit mode is usually going to be a bit faster, like it or not.
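One way to see which types change size is simply to ask on each system. The sketch below uses Python's ctypes module as a stand-in for a C sizeof test; it reports what the running Python build sees, so it reflects the userland (32-bit or 64-bit), not the kernel.

```python
import ctypes

def type_sizes():
    """Sizes in bytes of a few C types as seen by this Python build."""
    return {
        "void *": ctypes.sizeof(ctypes.c_void_p),
        "size_t": ctypes.sizeof(ctypes.c_size_t),
        "long": ctypes.sizeof(ctypes.c_long),
        "int32_t": ctypes.sizeof(ctypes.c_int32),
        "int64_t": ctypes.sizeof(ctypes.c_int64),
    }

for name, size in type_sizes().items():
    print(f"{name:8s} {size} bytes")
```

Run on a 32-bit and a 64-bit userland, the fixed-width stdint.h types should report the same sizes on both, while the pointer-related ones are expected to differ.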

ejolson
Posts: 3227
Joined: Tue Mar 18, 2014 11:47 am

Re: 64-bit operating system

Mon Dec 10, 2018 8:31 pm

jahboater wrote:
Mon Dec 10, 2018 8:06 pm
ejolson wrote:
Mon Dec 10, 2018 6:29 pm
It would be interesting to see an example of a reasonably written program which solves a real problem that runs 2-times slower on 64-bit compared to 32-bit. Are there any examples that can be quantitatively compared?
I suspect the only thing that's slower might be a program reading/writing vast numbers of pointers to and from memory.

Pointers (and the related size_t and ptrdiff_t), are the only types that change size gratuitously. You could argue about long, but a reasonably written program should be using stdint.h.

The 31 general purpose registers, the removal of the slow instructions, the regular opcode layout, the 32 floating-point registers, and so on, means 64-bit mode is usually going to be a bit faster, like it or not.
I'm pretty sure it is possible to create a synthetic benchmark that runs 2 times slower by leveraging memory bandwidth constraints when reading 64-bit pointers.

Since development of most mainstream desktop applications now targets 64-bit platforms, I suspect most code that showed performance regressions on 64-bit platforms has already been rewritten. For example, one could use 32-bit integer offsets to a 64-bit base pointer in code where the excessive use of 64-bit pointers resulted in slowdowns. While this sounds like a lot of trouble, someone else has already done the tuning. Therefore, finding real-world examples where the 32-bit version runs faster than the 64-bit version may be rather difficult.
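The offset trick described above can be sketched as a linked list whose "next" links are small indices into one pool rather than full pointers. This is only an illustration of the idea in Python, with made-up function names; real code would do this in C with a base pointer plus 32-bit offsets.

```python
from array import array

def build_index_list(values):
    """Store a singly linked list as two parallel arrays.

    data holds 64-bit payloads; nxt holds the index of the next node,
    with len(values) acting as the end-of-list marker.
    """
    data = array("q", values)
    nxt = array("I", range(1, len(values) + 1))
    return data, nxt

def traverse(data, nxt, head=0):
    """Follow the index links from head and collect the payloads."""
    out = []
    i = head
    while i < len(data):
        out.append(data[i])
        i = nxt[i]
    return out

def link_bytes(n_nodes, bytes_per_link):
    """Memory spent on links alone for n_nodes nodes."""
    return n_nodes * bytes_per_link

data, nxt = build_index_list([10, 20, 30])
print(traverse(data, nxt), "link size:", nxt.itemsize, "bytes")
```

With a million nodes, 8-byte links cost twice the memory traffic of 4-byte ones, which is the kind of effect a pointer-heavy synthetic benchmark would try to exploit.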

jahboater
Posts: 4595
Joined: Wed Feb 04, 2015 6:38 pm

Re: 64-bit operating system

Mon Dec 10, 2018 8:47 pm

From what I have seen, modern hardware is optimized for reading 16-bytes (or more) at a time, probably for SIMD.
It will not have the slightest problem reading 8-byte pointers.

Some time ago I benchmarked a crude memcpy that used "ldp q0,q1; stp q0,q1" - 32 bytes at a time (on suitable data) which was extremely fast, 8 times faster than the library memcpy.

In 64-bit mode, the stack, returns from malloc, large static objects, etc are all 16 byte aligned. In 32-bit mode it is 8-bytes.

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 23065
Joined: Sat Jul 30, 2011 7:41 pm

Re: 64-bit operating system

Tue Dec 11, 2018 10:44 am

ejolson wrote:
Mon Dec 10, 2018 6:29 pm
Thanks for running the code on the Pi 3B+ in 64-bit mode. Compared to the timing of 15.47 seconds in 32-bit mode from this post, we have

15.47 / 7.740 = 1.999

which is nearly a 2-fold increase in performance. This confirms the similar result posted here.

From my point of view, the fibonacci.c program performs a real computation using an asymptotically reasonable algorithm. In particular, it uses Karatsuba multiplication along with the doubling formulas for the Fibonacci sequence to find the nth term. While some care has been taken with the code, it is definitely not hand-coded assembler tuned to a particular architecture. For these reasons this is not a synthetic benchmark, in my opinion, but rather a program which represents application-level performance that results from writing suitable code to solve a real problem in a high-level language.

It would be interesting to see an example of a reasonably written program which solves a real problem that runs 2-times slower on 64-bit compared to 32-bit. Are there any examples that can be quantitatively compared?
Has anyone checked the memory used 32 vs 64? Both in program size and memory used during the run?
Principal Software Engineer at Raspberry Pi (Trading) Ltd.
Contrary to popular belief, humorous signatures are allowed. Here's an example...
"My grief counseller just died, luckily, he was so good, I didn't care."

jahboater
Posts: 4595
Joined: Wed Feb 04, 2015 6:38 pm

Re: 64-bit operating system

Tue Dec 11, 2018 10:58 am

jamesh wrote:
Tue Dec 11, 2018 10:44 am
Has anyone checked the memory used 32 vs 64? Both in program size and memory used during the run?
The 64-bit version is larger in both cases.
From "top" :-
64-bit virtual 12232, resident 7916, shared 760
32-bit virtual 12108, resident 7148, shared 788

The executable is 19k (64-bit) and 14k (32-bit)

It seems to be quite variable, I have a text editor compiled on both: 65k (64-bit) and 70k (32-bit)
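To put those figures side by side, here is the relative change from 32-bit to 64-bit for each measurement. The numbers are copied from the post; the labels are my own.

```python
# (64-bit value, 32-bit value), copied from the figures above.
MEASUREMENTS = {
    "virtual": (12232, 12108),
    "resident": (7916, 7148),
    "shared": (760, 788),
    "fibonacci executable (k)": (19, 14),
    "text editor executable (k)": (65, 70),
}

def pct_change(v64, v32):
    """Percentage by which the 64-bit figure differs from the 32-bit one."""
    return 100.0 * (v64 - v32) / v32

for label, (v64, v32) in MEASUREMENTS.items():
    print(f"{label:28s} {pct_change(v64, v32):+.1f}%")
```

Resident memory grows by roughly a tenth, while the executable sizes go in both directions, which matches the "quite variable" remark.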

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 23065
Joined: Sat Jul 30, 2011 7:41 pm

Re: 64-bit operating system

Tue Dec 11, 2018 12:21 pm

jahboater wrote:
Tue Dec 11, 2018 10:58 am
jamesh wrote:
Tue Dec 11, 2018 10:44 am
Has anyone checked the memory used 32 vs 64? Both in program size and memory used during the run?
The 64-bit version is larger in both cases.
From "top" :-
64-bit virtual 12232, resident 7916, shared 760
32-bit virtual 12108, resident 7148, shared 788

The executable is 19k (64-bit) and 14k (32-bit)

It seems to be quite variable, I have a text editor compiled on both: 65k (64-bit) and 70k (32-bit)
I guess that is about what I would expect, with the greater size of pointer variables affecting both run time and static memory requirements.

DavidS
Posts: 4334
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

Re: 64-bit operating system

Tue Dec 11, 2018 12:58 pm

ejolson wrote: I'm pretty sure it is possible to create a synthetic benchmark that runs 2 times slower by leveraging memory bandwidth constraints when reading 64-bit pointers.
I think not likely. The bus on the RPi is 128-bits wide, hence why we can read 4 32-bit registers at a time in 32-bit real ARM without stalling the pipeline.
Since development of most mainstream desktop applications now targets 64-bit platforms, I suspect most code that showed performance regressions on 64-bit platforms has already been rewritten. For example, one could use 32-bit integer offsets to a 64-bit base pointer in code where the excessive use of 64-bit pointers resulted in slowdowns. While this sounds like a lot of trouble, someone else has already done the tuning. Therefore, finding real-world examples where the 32-bit version runs faster than the 64-bit version may be rather difficult.
I think you need to take a look at real-world applications. Yes, the example of the extreme Fibonacci will perform better on a 64-bit system, but most applications will not, due to the limits of the architecture.

We still do not have a single cycle 32-bit divide, and it takes longer in 64-bit, there are many more examples where 32-bit is faster than 64-bit. Also as we can move 128-bits at a time to or from RAM if not in cache on either 32-bit or 64-bit there is no advantage for that either.

With the 32-bit ARM with its MMU we have the ability to address a space way bigger than is available on any system by more than 8000 times over. So we do not need 64-bit for memory access.

There are a few examples that we all know of where 64-bit is faster, these are the exceptions not the rule. People using exceptions to make something sound faster and better does not really work out in the end.

There is a reason that 32-bit systems still persist on any platform for which 64-bit is available. Those that use the 64-bit versions do it more for the bragging value, or they do not know the truth of performance. There is a reason that many who do know still flock to 32-bit x86 Linux even when their CPU supports AMD64 long mode. There is a reason that there is a huge demand for 32-bit ReactOS, though not really anything to push the 64-bit version along.

So I must disagree on this issue. 32-bit rules and will until every advantage of the 32-bit ARM is matched on the 64-bit ARM, including the timing for execution of any given instruction.
RPi = The best ARM based RISC OS computer around
More than 95% of posts made from RISC OS on RPi 1B/1B+ computers. Most of the rest from RISC OS on RPi 2B/3B/3B+ computers

jahboater
Posts: 4595
Joined: Wed Feb 04, 2015 6:38 pm

Re: 64-bit operating system

Tue Dec 11, 2018 1:23 pm

DavidS wrote:
Tue Dec 11, 2018 12:58 pm
We still do not have a single cycle 32-bit divide, and it takes longer in 64-bit, there are many more examples where 32-bit is faster than 64-bit.
Dividing large numbers is slower than dividing small numbers. If the numbers are the same size, then a 32-bit divide takes similar time to a 64-bit divide. I mean 42/12 will take the same time on both platforms. Obviously a 64-bit divide can deal with much larger numbers and so may potentially take longer - which is obviously not relevant.
Divide will never take one cycle on any platform, even Intel.
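The point that the cost depends on the operands rather than on the register width can be illustrated with a toy shift-and-subtract divider. This is a deliberately simple model with a made-up function name; real hardware dividers work differently (for example by retiring several quotient bits per cycle), so the step counts here are illustrative only.

```python
def shift_subtract_divide(n, d):
    """Divide non-negative n by positive d, one quotient bit per step.

    Returns (quotient, remainder, steps). The number of steps depends on
    the bit lengths of the operands, not on how wide the registers are.
    """
    if d == 0:
        raise ZeroDivisionError("division by zero")
    q, r, steps = 0, n, 0
    top = max(n.bit_length() - d.bit_length(), 0)
    for s in range(top, -1, -1):
        steps += 1
        if r >= (d << s):
            r -= d << s
            q |= 1 << s
    return q, r, steps

print(shift_subtract_divide(42, 12))
print(shift_subtract_divide(2**60 + 1, 3)[2], "steps for a 61-bit dividend")
```

In this model 42/12 takes the same three steps whether the values sit in 32-bit or 64-bit registers; only a genuinely larger dividend costs more.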
DavidS wrote:
Tue Dec 11, 2018 12:58 pm
So I must disagree on this issue. 32-bit rules and will until every advantage of the 32-bit ARM is matched on the 64-bit ARM, including the timing for execution of any given instruction.
You should look at the conditional instructions, the 64-bit ones have one less dependency than the 32-bit ones, and work better with modern CPU's (CSET/CSEL/CINC/CNEG/CINV etc). LDP/STP is much much faster than LDM/STM.

Simple things like ADD take the same time even though the 64-bit version can handle much larger numbers.

DavidS
Posts: 4334
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

Re: 64-bit operating system

Tue Dec 11, 2018 4:02 pm

jahboater wrote:
Tue Dec 11, 2018 1:23 pm
DavidS wrote:
Tue Dec 11, 2018 12:58 pm
We still do not have a single cycle 32-bit divide, and it takes longer in 64-bit, there are many more examples where 32-bit is faster than 64-bit.
Dividing large numbers is slower than dividing small numbers. If the numbers are the same size, then a 32-bit divide takes similar time to a 64-bit divide. I mean 42/12 will take the same time on both platforms. Obviously a 64-bit divide can deal with much larger numbers and so may potentially take longer - which is obviously not relevant.
Divide will never take one cycle on any platform, even Intel.
Not long ago we said the same thing for multiply: everyone believed that a single-cycle multiply was not possible without increasing propagation delay to an unacceptable level. That has been proven wrong, so I can see a time when the same is true of divide. As it stands, implementing a single-cycle divide introduces too much propagation delay, and that is the same issue we had with multiply. The other solution of breaking a divide across multiple pipeline stages is not acceptable because it would make the pipeline way too deep to manage performance in a sane way (optimization would be even beyond compilers of the highest caliber).

Though just because it is not done does not mean it can not be done. And intel is a poor example of anything, except for lackluster design.
DavidS wrote:
Tue Dec 11, 2018 12:58 pm
So I must disagree on this issue. 32-bit rules and will until every advantage of the 32-bit ARM is matched on the 64-bit ARM, including the timing for execution of any given instruction.
You should look at the conditional instructions, the 64-bit ones have one less dependency than the 32-bit ones, and work better with modern CPU's (CSET/CSEL/CINC/CNEG/CINV etc).
So you are saying that it is lower latency to not be able to have every instruction conditional?
I would argue that, big time. That is the one thing missing from AARCH64 that will forever kill potential performance.

There are a bunch of cases where there is a huge advantage to have every instruction conditional (I know that a few of the newer instructions are not), and have the ability to specify which instructions set flags or not.
LDP/STP is much much faster than LDM/STM.
That is true. Though there are other ways around that issue, using NEON (OK, it is a coprocessor, still it is standard now), and equally fast on both :) .

So not really an advantage in most situations, with very few exceptions.

Also that is not an issue of the ISA, rather of the implementation. It would be fairly easy to make LDM/STM single cycle for any load up to 4 registers (128 bits), without adding much to the implementation, and without increasing any propagation delay in any stage of the pipeline.
Simple things like ADD take the same time even though the 64-bit version can handle much larger numbers.
That is a given, the propagation delay through the gates for the carry look ahead is minimally different between the two lengths when done correctly.

So I stand on my argument.

Heater
Posts: 12960
Joined: Tue Jul 17, 2012 3:02 pm

Re: 64-bit operating system

Tue Dec 11, 2018 4:35 pm

DavidS,
So I stand on my argument.
Meanwhile in the real world:

1) Division takes longer for bigger numbers. Even in the ARM world. See for example:
https://lemire.me/blog/2017/11/17/fast- ... m-edition/

2) There is no "huge advantage to have every instruction conditional".
As evidenced by the fact that the RISC V does not do that. If it were advantageous the RISC V designers would have used it. They have been studying and experimenting with these things for decades, they know. Besides, actual RISC V devices demonstrate it is not required.

3) There is nothing "lackluster" about what Intel has achieved. One can argue the x86 is a mess, but Intel, bless 'em, has invested billions in efforts to get off that to something else: i432, i860, Itanium, over the decades. It is their customers that continually demand more of the same, so they have obliged.

4) Real world applications have demanded 64 bit computing. The likes of Google would not buy all that 64 bit hardware if it was less efficient.

Is this of any relevance to the Pi? Mostly not.

jahboater
Posts: 4595
Joined: Wed Feb 04, 2015 6:38 pm

Re: 64-bit operating system

Tue Dec 11, 2018 4:58 pm

DavidS wrote:
Tue Dec 11, 2018 4:02 pm
So you are saying that it is lower latency to not be able to have every instruction conditional?
I was just saying the new conditional instructions in A64 have one less dependency than the A32 ones.
They work in a different way. They are always executed and therefore the destination register is not dependent on its previous value.

I suspect the new conditionals were chosen as being the most useful ones.
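The dependency difference can be shown with a tiny model of the two styles. These Python functions only mimic the data flow of the instructions and their names are invented for this sketch: the A64-style select computes its result from the flag and two sources, while an A32-style predicated move must keep the old destination when the condition fails, so the old value is an extra input.

```python
def csel(cond, a, b):
    """A64-style select: the result depends only on cond, a and b."""
    return a if cond else b

def predicated_mov(cond, old_dest, src):
    """A32-style conditional move: old_dest survives when cond is false,
    so the previous value of the destination is an additional input."""
    return src if cond else old_dest

print(csel(False, 1, 2), predicated_mov(False, 7, 9))
```

The extra old_dest parameter in the second function is the "one more dependency" being discussed: an out-of-order core has to wait for the previous value of the destination before it can retire the predicated form.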
DavidS wrote:
Tue Dec 11, 2018 4:02 pm
That is the one thing missing from AARCH64 that will forever kill potential performance.
The exact opposite: it was to enable high performance on future ARM architectures. Pretty obviously, any out-of-order CPU will benefit. And it frees up four bits in the opcode, enabling 32 registers instead of 16 - a huge benefit.
It sounds like you think the ARM CPU designers are wrong - which I very much doubt :)

LDP/STP is much much faster than LDM/STM. That is true.
Here is a cool thing!
I like LDP/STP because you can give the same register twice, which you can't with LDM/STM.
For example, I have a C structure that is 16 bytes in size and I want to zero it all.
The compiler changes "memset( &mystruct, 0, 16 )" into, say, "STP XZR, XZR, [X25]" (using register 31, the zero register).
You can't do that in one instruction with STM.
Edit: You can do it in two instructions with NEON - as you say!

DavidS
Posts: 4334
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

Re: 64-bit operating system

Tue Dec 11, 2018 5:26 pm

jahboater wrote:
Tue Dec 11, 2018 4:58 pm
DavidS wrote:
Tue Dec 11, 2018 4:02 pm
So you are saying that it is lower latency to not be able to have every instruction conditional?
I was just saying the new conditional instructions in A64 have one less dependency than the A32 ones.
They work in a different way. They are always executed and therefore the output register is not dependent on the previous value. That can break a dependency chain.

I suspect the new conditionals were chosen as being the most useful ones.
DavidS wrote:
Tue Dec 11, 2018 4:02 pm
That is the one thing missing from AARCH64 that will forever kill potential performance.
The exact opposite, it was to enable high performance on future ARM architectures. Pretty obviously, any out of order CPU will benefit. You sound like you think the ARM CPU designers are wrong and/or stupid - which I very much doubt :)
Not by a long shot. I think more that the advantages one way or the other are unbalanced. The AARCH64 feels like an experimental ISA. As for the dependency chain, that is on the coder.

On a personal note I still feel (because of the research we did while I was in university) that it is a better choice to use in-order multiple issue architectures than it is to use out of order multiple issue architectures. Either way you are unlikely to execute more than 4 instructions per cycle in a single stream (the limits of dataflow, regardless of number of registers), and either way you have about equal chance of issuing more instructions in parallel in a single stream. Though In order multiple issue has the advantage of being simpler to implement, and reducing potential propagation issues by being able to issue instructions without any extra pipeline delays (unlike most out of order implementations). Uses less components positive, simplifies the pipeline positive, at least equals potential performance positive. In either case there will need to be well optimized code.
LDP/STP is much much faster than LDM/STM. That is true.
Here is a cool thing!
I like LDP/STP because you can give the same register twice, which you can't with LDM/STM.
For example, I have a C structure that is 16 bytes in size and I want to zero it all.
The compiler changes "memset( &mystruct, 0, 16 )" into, say, "STP XZR, XZR, [X25]" (using register 31, the zero register).
You can't do that in one instruction with STM.
Edit: You can do it in two instructions with NEON - as you say!
Yes, there are definite advantages to the LDP/STP instructions. Now if only we could get our conditionals back and have a way to execute normal ARM code without having to go through 3 state changes each way. Either that, or have the licencing on 32-bit ARM cores go way down in cost so more companies are compelled to use the 32-bit, if ARM really wants to push the AARCH64 on the world in place of the ARM ISA.

Heater
Posts: 12960
Joined: Tue Jul 17, 2012 3:02 pm

Re: 64-bit operating system

Tue Dec 11, 2018 5:50 pm

DavidS,
Uses less components positive, simplifies the pipeline positive, at least equals potential performance positive.
Sounds reasonable to me.

I guess the RISC V guys are on the right track then. They check all those boxes.

code_exec
Posts: 271
Joined: Sun Sep 30, 2018 12:25 pm

Re: 64-bit operating system

Tue Dec 11, 2018 5:59 pm

64-bit on the Pi is possible, and very stable. I'm writing this in Chromium on a Pi 3B running 64-bit Debian MATE, with two other tabs open, and it's running smoothly for me.
Ubuntu 18.04 LTS desktop images for the Raspberry Pi 3.

https://github.com/CodeExecution/Ubuntu-ARM64-RPi

jdonald
Posts: 270
Joined: Fri Nov 03, 2017 4:36 pm

Re: 64-bit operating system

Tue Dec 11, 2018 11:52 pm

The issue with 32-bit Docker got me thinking: might it be any different with other types of containers? So I tried LXC with a 64-bit kernel on Raspbian:

Code: Select all

sudo apt install lxc

# enable bridge networking
echo 'USE_LXC_BRIDGE="true"' | sudo tee /etc/default/lxc-net

# replace default lxc.network.type = empty
cat <<EOF | sudo tee /etc/lxc/default.conf
lxc.network.type = veth
lxc.network.link = lxcbr0
lxc.network.flags = up
lxc.network.hwaddr = 00:16:3e:xx:xx:xx
EOF

sudo systemctl restart lxc-net

sudo lxc-create -t download --name pi64 -- -d debian -r stretch -a arm64
# lxc.seccomp fails on Debian ARM; see lxc#1490
echo 'lxc.seccomp =' | sudo tee -a /usr/share/lxc/config/debian.common.conf
sudo lxc-start -n pi64 -d
sudo lxc-attach -n pi64

# [email protected]:/# apt install gcc
# [email protected]:/# ...
So now you can run 64-bit software on 32-bit Raspbian without resorting to multiarch. LXC is not as user-friendly as Docker but gets the job done. With this proof-of-concept I don't see any fundamental reason that this shouldn't be possible with docker-ce:armhf, so I'll file a ticket with them.
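A quick way to confirm which side of such a mixed setup you are on is to compare what the kernel reports with the pointer width of the running program. The sketch below is a generic check, not something from the LXC tooling, and the function names are made up. On a 64-bit kernel with the 32-bit Raspbian userland I would expect it to report a 64-bit machine string with a 32-bit userland on the host, and a 64-bit userland inside the arm64 container, but that expectation is an assumption (a process personality setting can change what the kernel reports).

```python
import platform
import struct

def userland_bits():
    """Pointer width of the running Python interpreter, in bits."""
    return struct.calcsize("P") * 8

def kernel_machine():
    """Machine string as reported by the kernel (via uname)."""
    return platform.machine()

print(f"kernel reports {kernel_machine()}, userland is {userland_bits()}-bit")
```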
Last edited by jdonald on Fri Dec 21, 2018 7:00 pm, edited 1 time in total.

Gavinmc42
Posts: 3415
Joined: Wed Aug 28, 2013 3:31 am

Re: 64-bit operating system

Wed Dec 12, 2018 12:28 am

64-bit on the Pi is possible, and very stable. I'm writing this from a Pi 3B running 64-bit Debian MATE on Chromium with two other tabs open, and it's running smoothly for me.
Try Gentoo64 with Firefox, I got up to 30+ tabs and gave up adding and counting more.
It is also a bit more bleeding edge and has newer stuff than the normal Debian.
I'm dancing on Rainbows.
Raspberries are not Apples or Oranges
