DavidS wrote:
Sat Nov 17, 2018 2:18 am
I definitely look forward to seeing what it looks like.
I recommend this document "ARMv8 Instruction Set Overview"
https://www.element14.com/community/ser ... Manual.pdf
It is short and very easy to read (compared to the ARMv8 ARM). See the introduction on page 8 (the "A64 OVERVIEW" section) for a brief description of AArch64.
They would have to be extremely clever to justify the added size of programs that suddenly will have to use branch instructions where load tables, or conditional execution was done before.
Actually, programs usually get smaller in AArch64 mode (probably because of the extra registers).
Here are the sizes of the identical program built for each target:-
ARMv6 74KB (32-bit, ARM)
ARMv8 54KB (32-bit, Thumb-2)
ARMv8 70KB (32-bit, ARM)
ARMv8 67KB (64-bit, AArch64)
Intel x86 60KB (64-bit)
Among the ARM builds, only Thumb-2 beats AArch64 for size (and that is without using any conditionals at all!).
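On the worry about needing branches where conditional execution was used before: AArch64 dropped general conditional execution but kept conditional select instructions (csel and friends). As a rough sketch, here are two small C helpers (the names max_i64 and abs_diff_u32 are my own, purely for illustration) of the kind a compiler can usually lower to a compare plus a conditional select with no branch. The assembly in the comments is only what I would expect from an optimising build; the real output depends on the compiler and flags.

```c
#include <stdint.h>

/* Hypothetical helper: larger of two signed 64-bit values.
 * An optimising AArch64 compiler will typically emit something like:
 *     cmp  x0, x1
 *     csel x0, x0, x1, gt
 *     ret
 * (assumed output, not guaranteed). */
int64_t max_i64(int64_t a, int64_t b)
{
    return (a > b) ? a : b;
}

/* Hypothetical helper: absolute difference of two unsigned values.
 * Both subtractions can be computed and one selected, again without
 * a branch, if the compiler chooses to do so. */
uint32_t abs_diff_u32(uint32_t a, uint32_t b)
{
    return (a > b) ? (a - b) : (b - a);
}
```

So the short conditional sequences that 32-bit ARM code wrote with conditional execution often stay branch-free, which is one reason code size need not grow.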
I hope that the overall is positive, and as eloquent as the 32-bit ARM ISA.
I think ARM know what they are doing. They had good reasons for their choices of what went into AArch64, what was left out, and what was improved. It was streamlined and designed for high performance on future CPUs, which ARM themselves know most about.
As for eloquence, you only have to look at two similar code sequences side by side.
Trivial things are:-
AArch64 registers are called Xn (64-bit) and Wn (32-bit), which doesn't seem as nice as the Rn that AArch32 used.
In A64 you don't have to use '#' for all the immediates (as in "mov w5,42") which does make it easier to scan.
To return from a function, you just use "ret".
Register 31 reads as zero, and it's called xzr or wzr, which is extremely useful.
The addressing modes are much the same.
There is no ldm/stm (they are slow and don't handle interrupts well). Instead there are the very fast ldp and stp (load pair and store pair), so you can write, for example,
"stp xzr, xzr, [sp, -64]!" to move sp down by 64 and write out 16 bytes of zeros in one instruction, or "ldp q0, q1, [x2],32" to load 32 bytes and advance the pointer.
Having 31 general purpose and 32 floating point registers means the stack doesn't need adjusting very often.
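To make the two ldp/stp examples above a bit more concrete, here is a small C model of what the pre-indexed ("[sp, -64]!") and post-indexed ("[x2],32") forms do to memory and to the base register. This is only my own sketch: the function names and the q_reg type are made up for illustration, and it ignores alignment rules and everything else the real hardware cares about.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 128-bit Q register. */
typedef struct { uint8_t bytes[16]; } q_reg;

/* Models "stp xzr, xzr, [base, offset]!" (pre-indexed):
 * the base is adjusted first, then two 64-bit zeros (16 bytes)
 * are stored at the new address. Returns the updated base. */
uint8_t *stp_zero_pair_preindex(uint8_t *base, ptrdiff_t offset)
{
    base += offset;          /* writeback happens before the access */
    memset(base, 0, 16);     /* xzr, xzr: two zero doublewords */
    return base;
}

/* Models "ldp q0, q1, [base], offset" (post-indexed):
 * 32 bytes are loaded from the current base, then the base
 * is advanced by offset. Returns the updated base. */
const uint8_t *ldp_q_pair_postindex(const uint8_t *base,
                                    q_reg *q0, q_reg *q1,
                                    ptrdiff_t offset)
{
    memcpy(q0->bytes, base, 16);       /* first register of the pair */
    memcpy(q1->bytes, base + 16, 16);  /* second register of the pair */
    return base + offset;              /* writeback after the access */
}
```

Calling stp_zero_pair_preindex(sp, -64) corresponds to the stp example, and ldp_q_pair_postindex(x2, &q0, &q1, 32) to the ldp one; in both cases the returned pointer plays the role of the updated base register.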