LdB
Posts: 562
Joined: Wed Dec 07, 2016 2:29 pm

Help with 64bit assembler

Wed Aug 16, 2017 1:41 pm

I have a problem with AArch64 compilation of my new full-blown USB code: I can't work out why this code crashes.

In C it does nothing other than set some uint32_t array values, and with -O2 I get this code, which works:

Code:

uxtw	x0, w0
adrp	x1, .LANCHOR0
add	x1, x1, :lo12:.LANCHOR0
add	x1, x1, 4
mov	w3, -1
mov	w2, -16711936
cmp	w19, 24
stp	w0, wzr, [x1, 8]
stp	w21, w20, [x1, 16]
stp	w19, wzr, [x1, 24]
stp	wzr, w3, [x1, 32]
stp	wzr, w2, [x1, 40] 
But with -O3 it produces this rubbish:

Code:

uxtw	x0, w0
adrp	x1, .LANCHOR0
add	x1, x1, :lo12:.LANCHOR0
movi	d0, 0xffffffff
movi	v1.2s, 0
add	x1, x1, 4
mov	w2, -16711936
str	w19, [x1, 24]
cmp	w19, 24
stp	w0, wzr, [x1, 8]
stp	w21, w20, [x1, 16]
str	w2, [x1, 44]
str	d1, [x1, 28]
str	d0, [x1, 36]
I can't for the life of me work out what the second version is doing: it has opcodes I have never seen, and I can't understand why the writes go to different address offsets.

Anyone got any idea how a simple array fill can be so different, and what are d0 and d1?
Last edited by LdB on Wed Aug 16, 2017 2:47 pm, edited 1 time in total.

LdB
Posts: 562
Joined: Wed Dec 07, 2016 2:29 pm

Re: Need help with 64bit assembler

Wed Aug 16, 2017 2:35 pm

Okay, I have more info. I looked at the differences between -O2 and -O3:
https://gcc.gnu.org/onlinedocs/gcc/Opti ... tions.html

I went through each of the 13 flag differences, and it is these two that cause the issue:
-ftree-loop-vectorize -ftree-slp-vectorize

I can't find much on the net about what the flags do, but it works correctly with:
-O3 -fno-tree-loop-vectorize -fno-tree-slp-vectorize

Paeryn
Posts: 1668
Joined: Wed Nov 23, 2011 1:10 am
Location: Sheffield, England

Re: Help with 64bit assembler

Wed Aug 16, 2017 4:28 pm

LdB wrote:
Wed Aug 16, 2017 1:41 pm
Anyone got any idea how a simple array fill can be so different and what is d0 and d1?
It looks to be equivalent. The D registers refer to the lower 64 bits of the SIMD registers.
It has merged the consecutive writes of zero (at +28, +32) into a single write of D1, which it loaded with movi v1.2s, 0 (not sure why it chose that over movi d1, 0).
Also, it merged the writes of -1, 0 (at +36, +40) into a single write of D0, which it loaded with movi d0, 0xffffffff: lower 32 bits all set, upper 32 bits all clear. With little-endian data (the normal AArch64 setup), the low word lands at +36 and the high word at +40.
The words on either side of these, which were previously written with store-pair instructions, are now single stores.