Burngate
Posts: 5885
Joined: Thu Sep 29, 2011 4:34 pm
Location: Berkshire UK Tralfamadore

Assembler questions

Tue Dec 25, 2012 12:47 pm

I've been browsing the Baking Pi tutorial (http://www.cl.cam.ac.uk/freshers/raspbe ... /ok02.html), and also Bruce Smith's Assembly Language Beginners Hands on Guide, and come across something I don't understand.

It started from another thread (http://www.raspberrypi.org/phpBB3/viewt ... 72&t=26648), in which I saw

Code: Select all

   sub r3,r2      /*subtract the pulse width from the total width*/
Looking at The Baking Pi tutorial, we have

Code: Select all

sub r2,#1
Now looking in http://www.dex-os.com/DexOS_R_PI/DexBasicSource.zip, in \FBasic_R_PI.inc for example, we have

Code: Select all

	and    r1,r0,16711680
and in Bruce Smith's book we find (Program 7a)

Code: Select all

 120 ADDS R0, R0, R1
So my question is: given that the Arm requires 3 registers for ADD, SUB, AND etc., how come the first two examples only require 2?


Supplementary question: I vaguely remember when the StrongArm first came out, that there were problems with the length of the pipe-line, because the return address in the LR wasn't necessarily correct when the code was used on a different chip. So what do I have to do to make code portable?
How long is the Pi's pipe-line, anyway?

dwelch67
Posts: 954
Joined: Sat May 26, 2012 5:32 pm

Re: Assembler questions

Tue Dec 25, 2012 3:46 pm

1) Go to infocenter.arm.com and under ARM Architecture, then Reference Manuals, you can start with the ARM v5. In there you will find the arm instruction set, which will show you when and where you need two operands or three operands. It is not a fixed rule, sometimes you can have two, sometimes three, perhaps even four in some instances. Where the operands land, inside or outside brackets etc., determines what they mean. These two instructions are not the same, for example

Code: Select all

ldr r0,[r1,#4]
ldr r0,[r1],#4
You will also see there is a thumb instruction set, which is a 16 bit instruction set. Being reduced versions of arm instructions these have further restrictions on the operands, as a result the arm assemblers generally tolerate thumb like instructions when in arm mode, for example

Code: Select all

add r0,r0,r1
add r0,r1
would normally be considered the same instruction, even though the second one is not proper form for an arm instruction.

With ARMv6 and in particular ARMv7 came the thumb2 extensions to the thumb instruction set (using formerly undefined instructions). Then they made up a unified instruction syntax with the goal of allowing programs, or at least portions of programs, to be written using one syntax that would assemble for either arm or thumb2. With the gnu assembler you need to use a directive (.syntax unified) before using the unified syntax or it will not assemble properly.

The raspberry pi uses an ARMv6. From the FAQ or other places the raspberry pi uses a chip with an ARM1176JZFS arm core. So on that same arm web page go to the arm11 processors then arm1176 then get the technical reference manual. From there we see this is an ARMv6 so back up to the architecture link to find the armv6 arch manual. Generally you want both the technical reference manual (TRM) and ARM Architectural Reference Manual (ARM ARM) for the core you are using.

2) No idea what you are talking about, if you can point to more info on this strongarm problem that might be useful. The pipeline depth for an arm is not known or worried about. the strongarm and xscale are separately developed arm clones whose problems are not related to arm developed cores. (not that arm cores are problem free). when worrying about such problems you often need to focus on the specific core and version of the core (ideally when you get the TRM you want the specific version of the core even if the manual says it is obsolete, if you have a r1p0 core the r2p0 manual may have stuff that is not in your core (certainly true for the arm11)).

Burngate
Posts: 5885
Joined: Thu Sep 29, 2011 4:34 pm
Location: Berkshire UK Tralfamadore

Re: Assembler questions

Wed Dec 26, 2012 7:00 pm

Okay, I'll have to go looking at those then!

Maybe I didn't phrase things well.
According to what I thought I knew, instructions such as sub require three items and would be of the form
<instruction> <destination>, <operand-1>, <operand-2>
with added extras such as conditions, setting flags, barrel-shifting, indexed addressing thrown in for seasoning

That seems to be followed in Bruce Smith's book, and in the example I stole from DexOS

But the Baking Pi tutorial hasn't got one of those parameters. It seems to be missing the <destination>
So does the assembler used by Baking Pi assume it to be R0? Or to be the same as <operand-1>?

From what you say I'll have to go looking at the Thumb instructions and how they integrate into the normal ARM instructions


My second question is to do with the BL instruction, and is really hammering my long-term memory.
When the Arm encounters a branch with link instruction, it puts the PC contents into R14, before taking the branch. But what's in the PC is already ahead of the BL instruction, by the length of the pipeline.
So on return (mov PC, R14) the PC is one or more instructions ahead of where it should be
Now thinking about it some more, it would seem that the Arm should modify what it puts into R14 so that it gets back to the right place, but I don't really know.
I just remember the original ARM600 had a pipeline 3 instructions long, while the StrongArm had one 4 (or 5) long

Having spent Boxing Day on Google, I've come to the conclusion that it's a non-problem apart from self-modifying code

dwelch67
Posts: 954
Joined: Sat May 26, 2012 5:32 pm

Re: Assembler questions

Wed Dec 26, 2012 10:39 pm

In neither ARM nor thumb do you have a fixed format for instructions with the same number of operands. This isnt mips. For each instruction there may be a number of different addressing modes and each can have a different number of operands. This becomes immediately obvious if you look at the arm instruction set definitions in the arm documents or elsewhere.

Now I understand. No matter what the actual pipeline depth of the processor is, the instruction set works such that the program counter always reads as two instructions ahead. So for an ARM instruction at address 0x100, when it executes you as a programmer assume the pc is 0x108. For a 16 bit thumb instruction at 0x100 (followed by another 16 bit thumb instruction) the program counter is 0x104. thumb2 makes it more complicated but I think I have demonstrated that it is still two instructions ahead. All of this is documented in the arm architectural reference manual. So again the answers are all right there in the arm docs. This two ahead was true for xscale and all the arms to date, I never had experience with the strongarm.

From a random copy of the arm arm.

PC, the Program Counter
• When executing an ARM instruction, PC reads as the address of the current instruction plus 8.
• When executing a Thumb instruction, PC reads as the address of the current instruction plus 4.
• Writing an address to PC causes a branch to that address.
Most Thumb instructions cannot access PC.
The ARM instruction set provides more general access to the PC, and many ARM instructions can use the PC as a general-purpose register. However, ARM deprecates the use of PC for any purpose other than as the Program Counter. See Writing to the PC on page A2-46 for more information.
Software can refer to PC as R15.
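The quoted rule can be put into numbers. This is only a sketch in Python (not something from the ARM ARM itself); the function names are made up for illustration, and it covers only the classic 4-byte ARM and 2-byte Thumb cases. The BL helper relies on the fact that in ARM state BL stores the address of the instruction following the BL in r14, so the return address does not depend on the pipeline depth.

```python
# Hypothetical helper names; only 32-bit ARM and 16-bit Thumb are modelled.

def pc_as_read(instr_addr, mode="arm"):
    """Value an instruction sees when it reads r15 (the PC)."""
    offsets = {"arm": 8, "thumb": 4}   # two instructions ahead in each mode
    return (instr_addr + offsets[mode]) & 0xFFFFFFFF


def bl_return_address(bl_addr):
    """Value an ARM-state BL at bl_addr leaves in r14 (LR)."""
    return (bl_addr + 4) & 0xFFFFFFFF  # the instruction right after the BL


print(hex(pc_as_read(0x100, "arm")))
print(hex(pc_as_read(0x100, "thumb")))
print(hex(bl_return_address(0x100)))
```

So a plain mov pc, r14 lands on the instruction after the BL, which is where you want to be.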

johnbeetem
Posts: 945
Joined: Mon Oct 17, 2011 11:18 pm
Location: The Mountains

Re: Assembler questions

Thu Dec 27, 2012 2:08 am

Burngate wrote:So my question is: given that the Arm requires 3 registers for ADD, SUB, AND etc., how come the first two examples only require 2?
The ARM Unified Assembly Language allows you to omit a register if it's clear what the assembler is supposed to do. Usually it's the destination register you omit, in which case the assembler assumes the destination register is the same as the first operand register. For example, you can write "ADD r2, r3" and UAL assumes it's "ADD r2, r2, r3". UAL tries to use the same notation for ARM and Thumb(2) instructions so that the same source code can be used for both.
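To make the shorthand concrete, here is a rough Python sketch of the text rewrite being described. It is not how any real assembler is implemented; the helper name and the short mnemonic list are assumptions for illustration, and it does not handle condition codes or comments.

```python
# Hypothetical helper, not part of any assembler: it only rewrites text.
THREE_OPERAND = {"add", "sub", "rsb", "and", "orr", "eor", "bic"}


def expand_shorthand(line):
    """Rewrite 'op rd, x' as 'op rd, rd, x' for a few data-processing mnemonics."""
    parts = line.split(None, 1)
    if len(parts) < 2:
        return line.strip()
    mnemonic, rest = parts
    base = mnemonic.lower()
    if base.endswith("s") and base[:-1] in THREE_OPERAND:
        base = base[:-1]                  # allow the flag-setting suffix, e.g. adds
    operands = [op.strip() for op in rest.split(",")]
    if base in THREE_OPERAND and len(operands) == 2:
        operands.insert(0, operands[0])   # destination doubles as first operand
    return mnemonic + " " + ", ".join(operands)


print(expand_shorthand("add r2, r3"))
print(expand_shorthand("sub r2,#1"))
```

Instructions such as mov or cmp really do take two operands, which is why the sketch only touches a fixed list of mnemonics.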

For gory details, see the ARM Architectural Reference Manual (ARM ARM). The ARMv7-AR edition includes RasPi's ARMv6 processor.

ARMv5 documents are mostly fine for the ARMv6 processor. The additional instructions are mostly for DSP and won't be generated by a standard C compiler. ARMv7 adds 32-bit Thumb2 instructions, and it's a stretch to call it a RISC processor. For example, a RISC processor has only a few instruction formats. With ARMv7, they've given up on having a single instruction format table :-)

I've never used a GNU assembly language, so I don't know how compatible it is with UAL and hence with the ARMv7-AR ARM ARM.

Burngate
Posts: 5885
Joined: Thu Sep 29, 2011 4:34 pm
Location: Berkshire UK Tralfamadore

Re: Assembler questions

Thu Dec 27, 2012 11:42 am

johnbeetem wrote:... For example, you can write "ADD r2, r3" and UAL assumes it's "ADD r2, r2, r3"...
That makes sense. Thanks.

As regards pipeline lengths, my brain has now blown multiple poly-fuses. It will be a week or two before they reset, at which point, after judicious application of omega3 fish-oil for extra power, I'll reacquaint myself with the sources and come back to you

tritonium
Posts: 79
Joined: Tue Jan 03, 2012 7:10 pm

Re: Assembler questions

Thu Dec 27, 2012 1:25 pm

Hi
my brain has now blown multiple poly-fuses. It will be a week or two before they reset, at which point, after judicious application of omega3 fish-oil for extra power, I'll reacquaint myself with the sources and come back to you
I've been reading the arm manuals to try to get to grips with all the permutations available when writing some code. It's been like sticking needles in my eyes - perhaps it's just manuals or maybe my age - digesting the 24L01 radio module manual had a similar effect, not to mention my 3.2" lcd with touchscreen combining spi and 16bit addressing and horrendous setup routine, and then coding in (arduino) 'C' (not my favourite environment).
Anyway reading this thread was comforting and illuminating. In fact somewhere like this might be a good way to go through the arm instructions slowly and gently. Somehow the conversational way of learning and multiple participation works for me. What do you think?
For instance - if I understand correctly, you cannot load a register with a 32 bit immediate value without resorting to 'tricks'. Dex' uses a macro. I wonder how many ways there are of doing it. Also, do you store variables in a distant block or in among the code? I think some assemblers use a sort of pseudo code ie you code your 'intention' and the assembler 'makes it so' invisibly, and you have to disassemble to see what it actually did. It would be nice to code without having to rely on the assembler to fill the gaps - well until sufficiently proficient to be able to predict what it will do.
Also discussions on implementing lookup tables, perhaps sorting, that sort of thing - I bet people have found ways of doing things that I wouldn't have dreamt of.
Every week a discussion on a different instruction and all the various ways it can be used, might be the right kind of pace; of course it would require the kind efforts of someone with a mixture of experience and patience! I come from a time when combining a decrement and conditional jump was groundbreaking... ho hum.
Dave H

dwelch67
Posts: 954
Joined: Sat May 26, 2012 5:32 pm

Re: Assembler questions

Fri Dec 28, 2012 12:15 am

I am definitely from that school of thought. Learning how it really works before resorting to tricks. I sometimes resort to tricks like using ldr r0,=0x12345678 on arm, I find it to be fair game since I understand everything the assembler is doing or going to do. For folks that dont know this I think it is worth the few minutes to understand what is going on. Problem is it may take hours or days or weeks to find where to look for those few minutes, or understand what they are looking at. I fully understand that. let me think about this and craft an answer if that would be useful for anyone.

Give a man a fish and you feed him for a day. Teach a man to fish and you feed him for a lifetime.

I can certainly walk you through any arm instruction you are interested in. (and a number of other instruction sets as well)

David

dwelch67
Posts: 954
Joined: Sat May 26, 2012 5:32 pm

Re: Assembler questions

Fri Dec 28, 2012 5:01 am

ARM: Loading an immediate into a register.

When using a variable length instruction set which means some instructions can use more bytes than others, the entire immediate can be encoded in the instruction. x86, msp430, and a bunch of others are variable instruction length. ARM historically is fixed instruction length like mips and openrisc and perhaps a few others.

Some folks learned mips first in school. Like the traditional ARM instruction set the instructions are fixed at 32 bits. And both, traditionally, have 32 bit registers. Since some of the instruction bits have to be used to define the operation, the opcode, you cant have some bits for opcode and 32 bits of data and fit that into 32 bits. With mips you use a pair of instructions: one allows you to control all 32 bits by zeroing 16 of them and setting 16 of them to whatever you want, then the other instruction (you have a couple-three choices) can be used to modify the 16 zeros to be any 16 bits you want. So two instructions and you can create any 32 bit pattern. Understand though that for some of these instruction sets you may need to burn a full 32 bits of instruction space for that immediate. A mov eax,0x00000001 in x86 might end up costing you 32 bits for that immediate (might not if there is a trick). For many variable length instruction sets you lose instruction space for the whole immediate. The ARM instruction set does it differently.

Take a look at the ARM ARM (ARM Architectural Reference Manual, pretty much any of them other than the cortex-m) there will be a section that in some way says or implies an alphabetical list of instructions. Make sure it is ARM not thumb or if the ARMv7 it might be combined arm and thumb. You are looking for the mov immediate instruction. I happen to be looking at the ARMv7 right now, the raspberry pi uses an ARMv6 but that is okay because with each newer architecture the ARM ARM does indicate, per instruction, what architecture will support that instruction (encoding). So you can use the ARMv7 reference manual for everything ARMv4 to ARMv7.

What I see in my manual as Encoding A1, look for the 32 bit instruction that supports ARMv4*, ARMv5*, ARMv6* and ARMv7 or up to whatever manual you are looking at. The lower 12 bits are called imm12. This DOES NOT mean you get 12 bits of immediate. Depending on your ARM ARM it varies how and where it says it, but all manuals from the older print only ones (yes I have those, pre-thumb) to the present point you at some other section of the arm arm where the immediate bits are defined. In the pdf you can simply click on that link or go to that page.

Depending on your manual either directly or implied by a picture here is the rule:
Legitimate immediates
Not all 32-bit immediates are legitimate. Only those that can be formed by rotating an 8-bit
immediate right by an even amount are valid 32-bit immediates for this format.
When they say 8 bit value they mean up to 8 non-zero bits, the other 24 bits in the immediate must be zero. So 0x000000FF is valid, up to 8 non zero bits, the other 24 zero, and it can be rotated an even amount, zero is an even number and rotating by zero is valid. But 0x000001FE is not valid: its eight non-zero bits are 0xFF shifted left by one, and an odd rotation is not available. 0xC000003F is valid though. Lets try

Code: Select all

mov r0,#0x000000FF
mov r1,#0x000001FE
mov r2,#0xC000003F

arm-none-linux-gnueabi-as mov.s 
mov.s: Assembler messages:
mov.s:3: Error: invalid constant (1fe) after fixup
so that second constant is in fact, no good, for now lets fix it to make the assembler happy and try something interesting at the same time:

Code: Select all

mov r0,#0x000000FF
mov r1,#0x000000FC
mov r2,#0xC000003F

arm-none-linux-gnueabi-as mov.s -o mov.o
no output means no complaints right?

Code: Select all

arm-none-linux-gnueabi-objdump -D mov.o

mov.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <.text>:
   0:	e3a000ff 	mov	r0, #255	; 0xff
   4:	e3a010fc 	mov	r1, #252	; 0xfc
   8:	e3a021ff 	mov	r2, #-1073741761	; 0xc000003f
Now what really happened? From the encoding in your arm arm in some form you should see that the lower 8 bits of the instruction are the 8 bit value to be rotated, the four bits above that, bits 8 to 11, indicate the amount to rotate to the right. Take the bits 8 to 11 multiply that value by 2 then rotate the immediate right that many bits. for the first two above bits 8 to 11 are 0x0 so we rotate the lower 8 bits of the instruction 0*2 = 0 bits. 0xFF rotated right 0 is 0xFF which is the constant we wanted 0x000000FF. 0xFC rotated right zero bits is 0x000000FC. the third one though 0xFF rotated 0x1*2 = 2 bits. so take 0x000000FF and rotate it right 2 bits. 0xFF is 11111111 in binary we need to take two of those bits on the bottom 11 in binary and rotate them around to the top so our constant in binary is 11000...0000111111 or 0xC000003F. Pretty cool.
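If you want to check the decode by machine rather than by hand, here is a small Python sketch of the rule just described. The function names are my own, and it assumes the word passed in really is a data-processing instruction with an immediate operand.

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by amount bits."""
    amount %= 32
    value &= 0xFFFFFFFF
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def decode_imm12(instruction):
    """Expand the low 12 bits of an ARM mov-immediate style instruction."""
    imm8 = instruction & 0xFF          # bits 0 to 7: the 8-bit value
    rot = (instruction >> 8) & 0xF     # bits 8 to 11: rotation, in units of 2
    return ror32(imm8, 2 * rot)


for word in (0xE3A000FF, 0xE3A010FC, 0xE3A021FF):
    print(hex(word), "->", hex(decode_imm12(word)))
```

The three words are the ones from the disassembly above.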

Why did I choose 0x000000FC the second time around? If you think about it 0xFC is the same as 0x3F shifted left 2, which is the same as rotating right 30 bits. mov r1,#0x000000FC could be encoded as above as 0xe3a010fc or it could be encoded as 0xe3a01f3f. Both encodings give the same result, 0xFC rotated right zero, or 0x3F rotated right 30.
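Going the other direction, a brute-force search over the 16 possible rotations finds every way a constant can be encoded, or shows that there is none. Again this is a Python sketch with made-up names, not what any particular assembler does internally.

```python
def rol32(value, amount):
    """Rotate a 32-bit value left by amount bits."""
    amount %= 32
    value &= 0xFFFFFFFF
    return ((value << amount) | (value >> (32 - amount))) & 0xFFFFFFFF


def imm12_encodings(value):
    """Return every (rot, imm8) pair that expands to value."""
    found = []
    for rot in range(16):
        imm8 = rol32(value, 2 * rot)   # undo a rotate right by 2*rot
        if imm8 <= 0xFF:
            found.append((rot, imm8))
    return found


for constant in (0x000000FF, 0x000001FE, 0x000000FC, 0xC000003F):
    words = [hex(0xE3A01000 | (rot << 8) | imm8)
             for rot, imm8 in imm12_encodings(constant)]
    print(hex(constant), words or "not encodable")
```

The 0xE3A01000 base is just mov r1 with an empty immediate field, so the 0xFC line reproduces the two encodings mentioned above.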

Just like mips, for immediate values you cannot create in a single instruction you can use multiple instructions to end up with the same result. In mips if you want to load a register with the value 0x12345678 you would load high 0x1234, which zeros the lower bits and puts 0x12340000 in the register, then you can or in (or add) an immediate value of 0x5678, resulting in 0x12340000+0x5678 = 0x12345678 or 0x12340000|0x5678 = 0x12345678. With arm unfortunately it would take up to four instructions.

Code: Select all

mov r0,#0x12000000
orr r0,r0,#0x00340000
orr r0,r0,#0x00005600
orr r0,r0,#0x00000078
Yes, that looks as ugly to me as it does to you, it has pros and cons though. the pro to it is that if your instruction memory (rom or ram) is fast enough to feed the pipeline these will execute right through very fast, no branching or memory cycles to get in the way. but it does burn four 32 bit locations and four clock cycles at least to execute.
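The byte-at-a-time split can be generated mechanically. This Python sketch (the function name is an assumption, and it is deliberately naive rather than minimal) relies on each byte-aligned chunk being an 8 bit value at an even rotation, so every chunk is a legal immediate.

```python
def mov_orr_sequence(value, reg="r0"):
    """Build a mov plus up to three orr instructions that load value."""
    value &= 0xFFFFFFFF
    # One chunk per byte lane; shifts of 24, 16, 8 and 0 are all even.
    chunks = [value & (0xFF << shift) for shift in (24, 16, 8, 0)]
    chunks = [c for c in chunks if c] or [0]   # drop empty lanes, keep one for 0
    lines = ["mov {},#0x{:08X}".format(reg, chunks[0])]
    for chunk in chunks[1:]:
        lines.append("orr {0},{0},#0x{1:08X}".format(reg, chunk))
    return lines


for line in mov_orr_sequence(0x12345678):
    print(line)
```

It does not try to find the shortest sequence; some constants that this splits into two instructions may fit in one.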

So here is an alternative, and it is not technically an immediate. And this works for variable word length instruction sets as well as fixed. Many instruction sets allow you to load a register using an address that is computed as an offset to the program counter. What does that mean? Well lets back into this and do it this way:

Code: Select all

ldr r0,bigimm
add r1,r1,r0
bigimm: .word 0x12345678

arm-none-linux-gnueabi-as mov.s -o mov.o
arm-none-linux-gnueabi-objdump -D mov.o

Disassembly of section .text:

00000000 <bigimm-0x8>:
   0:	e51f0000 	ldr	r0, [pc, #-0]	; 8 <bigimm>
   4:	e0811000 	add	r1, r1, r0

00000008 <bigimm>:
   8:	12345678 	eorsne	r5, r4, #120, 12	; 0x7800000
Trust me I am walking you through this just bear with it. There will be a light at the end of the tunnel.

So this takes a lot of explaining unfortunately. The gnu assembler as with many assemblers did some work for you. Yes assembly is supposed to be pure and a one to one relationship with machine code. Well, fortunately any useful assembler lets you use labels so you dont have to hand count instructions and instruction lengths to put the exact offset from one instruction to the target label in there yourself. Even better, what about external labels. Anyway, first off ldr means load register, or load register from some address in memory. The part after the comma tells the assembler how to figure out where in memory this thing is to be loaded from. There are many forms of this instruction. We have specifically told the assembler to please resolve the address for us between the ldr instruction and the address of the label and use pc-relative addressing. So that is the first part, ldr r0 means load r0 with some value found in memory somewhere. The second bit of information you need is that no matter what architecture of ARM you are using, and no matter how deep the pipeline inside really is, from a programmers perspective whenever you use r15, the program counter itself, in an instruction the value of the pc (r15) is always two instructions ahead. In arm mode this means the address of the instruction plus 8. The third bit of information is that the ldr instruction wants at least one register to define the address to load from, and it allows either an immediate offset or a register offset. In this case the encoding is the address of the pc plus some offset, specifically 0. Why did it put -0? The assembler happened to encode the zero offset with the subtract form of the instruction (0xe51f0000 has the add/subtract bit clear) and the disassembler reports that faithfully; it is harmless. Likewise why did the disassembler disassemble our data value? It did, it doesnt hurt us, read on.

Why did I put an add instruction after the ldr? I wanted you to think about code flow, the ldr is an instruction then the next one is the add then the next thing in code space is our data. Wait, that is OUR DATA! Yes, if you wrote and ran that code exactly as written above it would execute the data. This is a problem. I wanted you to see that. In order to use ldr to get around the mov immediate limitation, you need to place the data in a location that will not get executed. And you may have already noticed that the assembler let you do this, without warning. So lets try to fix this before continuing:

Code: Select all

ldr r0,bigimm
add r1,r1,r0
b somewhere
bigimm: .word 0x12345678


Disassembly of section .text:

00000000 <bigimm-0xc>:
   0:	e59f0004 	ldr	r0, [pc, #4]	; c <bigimm>
   4:	e0811000 	add	r1, r1, r0
   8:	eafffffe 	b	0 <somewhere>

0000000c <bigimm>:
   c:	12345678 	eorsne	r5, r4, #120, 12	; 0x7800000
I added another instruction between the ldr and the data word I wanted to read and load into r0, which changed the address difference between the ldr instruction and the bigimm label's address, and the assembler did all the math for me. The program counter at the time of the ldr instruction, which in this case is at address 0, is address+8 in arm mode, so that means the program counter is 0x00000008 from a programmers perspective. The bigimm label, which is where our constant is in memory, is at address 0x0000000C or 0x00000008+4, so the instruction the assembler made for us is load r0 from the address [pc+4]
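The same math can be checked with a few lines of Python. This is a sketch under the assumption that the word is an ARM-state ldr with an immediate offset and r15 as the base register; the function name is mine. Bit 23 of the instruction selects add or subtract, which is also why the earlier 0xe51f0000 disassembled as #-0.

```python
def ldr_literal_target(word, instr_addr):
    """Decode ldr rd,[pc,#+/-imm12]; return (signed offset, address loaded from)."""
    assert (word >> 16) & 0xF == 15, "base register is not the pc"
    imm12 = word & 0xFFF
    add = (word >> 23) & 1             # U bit: 1 = add the offset, 0 = subtract
    offset = imm12 if add else -imm12
    pc = instr_addr + 8                # the pc reads two instructions ahead
    return offset, (pc + offset) & 0xFFFFFFFF


print(ldr_literal_target(0xE59F0004, 0x0))
print(ldr_literal_target(0xE51F0000, 0x0))
```

The first call gives offset 4 and address 0xC (bigimm); the second gives offset 0 and address 8, matching the two listings.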

Now lets try another shortcut that I know the arm assemblers allow, not sure about other instruction sets.

Code: Select all

ldr r0,=bigimm
add r1,r1,r0
b somewhere
bigimm: .word 0x12345678

00000000 <bigimm-0xc>:
   0:	e59f0008 	ldr	r0, [pc, #8]	; 10 <bigimm+0x4>
   4:	e0811000 	add	r1, r1, r0
   8:	eafffffe 	b	0 <somewhere>

0000000c <bigimm>:
   c:	12345678 	eorsne	r5, r4, #120, 12	; 0x7800000
  10:	0000000c 	andeq	r0, r0, ip
The =bigimm means "address of bigimm"; if this were a C program I would have written &bigimm, which also means "address of bigimm". What the assembler did in this case is it found a place in the code which is near enough to the ldr instruction to reach, but also not in the execution path (hopefully), and it placed the constant I wanted to load into the register in that location, then encoded an instruction to load that value into the register I specified. This is definitely a departure from assembly being a one to one relationship to the machine code: the assembler has consumed a location which I didnt directly define to place some data (which I did ask it to place somewhere).

Now I am simply assembling this code to an object and then disassembling the object, I am not linking. Why is that important? Well I could link this code to some other address. The ldr after being linked might actually be at address 0x10341000 for example making the address of bigimm 0x1034100C, the linker would fix this for me so I didnt have to. It would also expect the somewhere address to be resolved and that branch instruction to be fixed to match. Lets prove it:

Code: Select all

.globl _start
_start:
ldr r0,=bigimm
add r1,r1,r0
b somewhere
bigimm: .word 0x12345678
somewhere:
and r2,r3,r4
b _start

arm-none-linux-gnueabi-as mov.s -o mov.o

arm-none-linux-gnueabi-objdump -D mov.o

Disassembly of section .text:

00000000 <_start>:
   0:	e59f0010 	ldr	r0, [pc, #16]	; 18 <somewhere+0x8>
   4:	e0811000 	add	r1, r1, r0
   8:	ea000000 	b	10 <somewhere>

0000000c <bigimm>:
   c:	12345678 	eorsne	r5, r4, #120, 12	; 0x7800000

00000010 <somewhere>:
  10:	e0032004 	and	r2, r3, r4
  14:	eafffffe 	b	0 <_start>
  18:	0000000c 	andeq	r0, r0, ip


arm-none-linux-gnueabi-ld -Ttext 0x10341000 mov.o -o mov.elf
arm-none-linux-gnueabi-objdump -D mov.elf

mov.elf:     file format elf32-littlearm


Disassembly of section .text:

10341000 <_start>:
10341000:	e59f0010 	ldr	r0, [pc, #16]	; 10341018 <somewhere+0x8>
10341004:	e0811000 	add	r1, r1, r0
10341008:	ea000000 	b	10341010 <somewhere>

1034100c <bigimm>:
1034100c:	12345678 	eorsne	r5, r4, #120, 12	; 0x7800000

10341010 <somewhere>:
10341010:	e0032004 	and	r2, r3, r4
10341014:	eafffff9 	b	10341000 <_start>
10341018:	1034100c 	eorsne	r1, r4, ip
Notice how the b to somewhere the b to _start and the address of bigimm were all modified by the linker? good stuff to know if you disassemble the object instead of a final binary.
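You can confirm the linked branch targets from the raw words. A Python sketch, assuming an ARM-state b/bl encoding (24-bit signed word offset in the low bits, relative to the pc which reads as instruction address plus 8); the function name is made up.

```python
def branch_target(word, instr_addr):
    """Destination of an ARM-state b or bl instruction at instr_addr."""
    imm24 = word & 0x00FFFFFF
    if imm24 & 0x800000:               # sign-extend the 24-bit field
        imm24 -= 0x1000000
    return (instr_addr + 8 + (imm24 << 2)) & 0xFFFFFFFF


print(hex(branch_target(0xEA000000, 0x10341008)))
print(hex(branch_target(0xEAFFFFF9, 0x10341014)))
```

Note the offset in the first branch did not change between the object and the elf (both are ea000000), because the distance from the branch to somewhere is the same wherever the code is linked.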

Just hang on a little longer:

Code: Select all

ldr r0,bigimmadd
add r1,r1,r0
b somewhere
bigimm: .word 0x12345678
bigimmadd: .word bigimm

Disassembly of section .text:

00000000 <bigimm-0xc>:
   0:	e59f0008 	ldr	r0, [pc, #8]	; 10 <bigimmadd>
   4:	e0811000 	add	r1, r1, r0
   8:	eafffffe 	b	0 <somewhere>

0000000c <bigimm>:
   c:	12345678 	eorsne	r5, r4, #120, 12	; 0x7800000

00000010 <bigimmadd>:
  10:	0000000c 	andeq	r0, r0, ip
Doing it this way, I am explicitly telling the assembler where to put things, I am in control of where things go instead of letting it be put there for me, yes this takes more typing. And although a departure from the real discussion, notice I didnt use the equals sign on bigimm, why not? Experience and experimenting. As written above it did what I wanted it took the label which is an address and placed that value in memory at that location so that the ldr instruction could get it. That is what I wanted, I wanted the address of bigimm to be loaded into r0, not bigimm itself (WHY? is what I am slowly getting to)

What if I instead did this:

Code: Select all

ldr r0,bigimmadd
add r1,r1,r0
b somewhere
bigimm: .word 0x12345678
bigimmadd: .word =bigimm

Disassembly of section .text:

00000000 <bigimm-0xc>:
   0:	e59f0008 	ldr	r0, [pc, #8]	; 10 <bigimmadd>
   4:	e0811000 	add	r1, r1, r0
   8:	eafffffe 	b	0 <somewhere>

0000000c <bigimm>:
   c:	12345678 	eorsne	r5, r4, #120, 12	; 0x7800000
That is NOT AT ALL what I wanted, it removed the .word and label I had explicitly put in the code! Ouch! KNOW YOUR TOOLS. WHY did it do this, I dont know, but it did hurt, you only make this mistake once and then you remember. hopefully.

So back to this

Code: Select all

ldr r0,=bigimm
add r1,r1,r0
b somewhere
bigimm: .word 0x12345678
I am going to wrap this tangent up here.

Code: Select all

ldr r0,=bigimm
Means assembler please load the address bigimm into r0, bigimm is an address so you are saying

Code: Select all

ldr r0,=address or just mov r0,address in so many words.
What if I told you, or you saw somewhere that you could use a constant for the address:

Code: Select all

ldr r0,=0x12345678
add r1,r1,r0
b somewhere

Disassembly of section .text:

00000000 <.text>:
   0:	e59f0004 	ldr	r0, [pc, #4]	; c <.text+0xc>
   4:	e0811000 	add	r1, r1, r0
   8:	eafffffe 	b	0 <somewhere>
   c:	12345678 	eorsne	r5, r4, #120, 12	; 0x7800000
So we used a single instruction, ldr, plus 32 bits of instruction memory for the data (smells kinda like a variable word length instruction set doesnt it?) to load any 32 bit value we want into a register. It didnt take four instructions, it took half the instruction space of four instructions. The negative to it is that the add instruction that follows has to wait to execute until the memory cycle to read pc+4 happens. With a good pipeline this may already have happened by the time either of these instructions are actually executed, but there is still somewhere in time that an extra read was required.

Lets play with this and see another nuance/feature of the gnu assembler, perhaps other arm assemblers will do this as well. (note that all of the above is a feature of the gnu assembler, it may or may not work with other arm assemblers and that is a major drawback if you are interested in learning and writing pure arm assembly)

Code: Select all

ldr r0,=0xC000003F
ldr r1,=0x00120000
add r1,r1,r0
b somewhere

Disassembly of section .text:

00000000 <.text>:
   0:	e3a001ff 	mov	r0, #-1073741761	; 0xc000003f
   4:	e3a01812 	mov	r1, #1179648	; 0x120000
   8:	e0811000 	add	r1, r1, r0
   c:	eafffffe 	b	0 <somewhere>
Because the values I chose to load could be encoded as immediates in a mov instruction, the assembler chose to optimize the ldr into mov instructions. I can assure you that many arm gurus, including myself because I forgot until researching this again today, may not know or remember that 0xC000003F for example is a valid immediate for a mov instruction. The assembler knew and took the shortcut for me.

So if you choose you can use the

Code: Select all

ldr rd,=immed32
trick with the gnu assembler and it will convert it to a mov if it can otherwise it will put the value somewhere close enough that a

Code: Select all

ldr rd,[pc,#+/-offset]
can reach it. While writing this, the assembler did put that data in the code path which is bad. Normally it doesnt, normally the programs are not these trivial three line things that make no sense. From time to time your asm code may be so long without an unconditional branch that it cannot find a place for an immediate and you might have to provide a place

Code: Select all

b over
.pool
over:
or use .ltorg instead of .pool, either one tells the assembler I am providing a place for you to add local data.

Using the

Code: Select all

ldr rd,=immed32
may not work the same way with every version of gnu assembler and it may not work the same with other arm assemblers. I suspect it will work, but I would not expect it to do the mov shortcut with all assemblers.

It may sound like it, but wait I am not finished! READ ON.

There is another interesting mov instruction in the ARM instruction set, MVN (officially move not, I think of it as mov negative), and it has an immediate flavor, and if you look at the arm encoding it also has the same imm12 thing in the lower 12 bits. What if we tried

Code: Select all

ldr r0,=0xFFFFFFFC
add r1,r1,r0
b somewhere

Disassembly of section .text:

00000000 <.text>:
   0:	e3e00003 	mvn	r0, #3
   4:	e0811000 	add	r1, r1, r0
   8:	eafffffe 	b	0 <somewhere>
When they say negative they really mean ones complement not twos complement which means simply invert the bits.

Code: Select all

mvn r0,#3 
In C this means r0 = ~0x00000003 which is 0xFFFFFFFC so r0 = 0xFFFFFFFC.

So if your immediate has 24 or more contiguous ones (when rotated by some even amount) then you can use the mvn instruction to load that immediate, if your immediate has 24 or more contiguous zeros (when rotated by some even amount) then use the mov instruction. Otherwise it takes either an ldr and some memory location somewhere or it takes one mvn, or mov plus one, two, or three alu instructions.
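That decision can be sketched in Python. This mirrors the rule of thumb above and what the listings in this post show; it is not taken from the gnu assembler source, the names are my own, and other assembler versions or targets may choose differently.

```python
def rol32(value, amount):
    """Rotate a 32-bit value left by amount bits."""
    amount %= 32
    value &= 0xFFFFFFFF
    return ((value << amount) | (value >> (32 - amount))) & 0xFFFFFFFF


def encodable(value):
    """True if value is an 8-bit number rotated right by an even amount."""
    return any(rol32(value, 2 * rot) <= 0xFF for rot in range(16))


def choose_load(value):
    """Pick mov, mvn or a pc-relative ldr for loading a 32-bit constant."""
    value &= 0xFFFFFFFF
    if encodable(value):
        return ("mov", value)
    inverted = ~value & 0xFFFFFFFF     # mvn writes the ones complement
    if encodable(inverted):
        return ("mvn", inverted)
    return ("ldr", value)              # needs a data word somewhere nearby


for constant in (0xC000003F, 0x00120000, 0xFFFFFFFC, 0x12345678):
    kind, operand = choose_load(constant)
    print(hex(constant), kind, hex(operand))
```

The operand returned for mvn is the value you would write after the #, so 0xFFFFFFFC comes back as mvn with 3.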

If you have a hard time visualising binary or hex numbers as bits, you should try writing them out as binary, ones and zeros, then somehow draw lines or visually move or copy chunks of those bits around. Early in my career I worked in an industry where visualizing these bits and shifts was necessary to the point that decimal math like filling out a time card or balancing a checkbook became more difficult for me. I know many many programmers that struggle with hex and binary and the bitwise logic functions (and, or, xor, not), you are not alone.

Let's take another example:

Code: Select all

ldr r0,=0xFFFFFE03
add r1,r1,r0
b somewhere

Disassembly of section .text:

00000000 <.text>:
   0:	e3e00f7f 	mvn	r0, #508	; 0x1fc
   4:	e0811000 	add	r1, r1, r0
   8:	eafffffe 	b	0 <somewhere>
So somewhere you need to make yourself a hex to binary chart:

Code: Select all

0  0000
1  0001
2  0010
3  0011
4  0100
5  0101
6  0110
7  0111
8  1000
9  1001
A  1010
B  1011
C  1100
D  1101
E  1110
F  1111
If you dont have these memorized already, make a chart then print it out, post it on the side of your monitor or on the wall, and have it always there as a reference or there long enough to memorize it forever.
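If you would rather let the computer write the bits out, a few lines of Python will do it. This is only a convenience sketch; the helper name nibbles is made up.

```python
def nibbles(value):
    """Format a 32-bit value as eight groups of four bits, most significant first."""
    bits = format(value & 0xFFFFFFFF, "032b")
    return " ".join(bits[i:i + 4] for i in range(0, 32, 4))

# Print the hex to binary chart itself.
for digit in range(16):
    print(f"{digit:X}  {digit:04b}")

# And a full 32-bit value, one hex digit per group.
print(nibbles(0xFFFFFE03))
```

Removing the spaces from the output gives the run-together form that is easier to rotate by eye.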

If we start with the hex value 0xFFFFFE03 and convert that to binary

Code: Select all

1111
1111
1111
1111
1111
1110
0000
0011
In this case I put each hex value on its own line, because it got really hard to not make a mistake, another way would have been

Code: Select all

1111 1111 1111 1111 1111 1110 0000 0011
I like to use a text editor not pencil and paper because you can easily cut and paste the text in one form and easily remove the spaces for example:

Code: Select all

1111 1111 1111 1111 1111 1110 0000 0011
11111111111111111111111000000011
First we see that there are 7 zeros in a row. mov and mvn constants do not require the 8 or fewer bit chunk to be all zeros or all ones, in this case it just happens to be. If we can rotate left or right by an even number of bits so that everything outside that chunk is ones, then this is a valid constant that can be used in an mvn. I will separate two (an even number) ones off the right and then move them to the top, basically a rotate right:

Code: Select all

111111111111111111111110000000 11
11 111111111111111111111110000000
11111111111111111111111110000000
That gives me 25 ones and 7 zeros, this will work as an mvn. Just to demonstrate that you dont have to have all ones or all zeros in the 8 bit part of your immediate

Code: Select all

mov r0,#0x00000034

0x00000034 = 
0000 0000 0000 0000  0000 0000 0011 0100 
00000000000000000000000000110100 
000000000000000000000000 00110100 without rotating  there are 24 or more zeros in a row, this is a valid constant 
000000000000000000000000001101 00 rotate these two (even number) zeros
00 000000000000000000000000001101 to the right
00000000000000000000000000001101 
000000000000000000000000 00001101 and I still get 24 or more zeros, this is a valid constant.
Because there was more than one way to show a valid constant with even numbers of bits rotated, this constant can be encoded more than one way.
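To see every way a constant can be encoded, a brute force search over all 256 byte values and all 16 even rotations is enough. This is a sketch under the same assumption as before (value = imm8 rotated right by 2 * rot); the function names are made up.

```python
MASK32 = 0xFFFFFFFF

def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32

def all_encodings(value):
    """List every (imm8, rot) pair for which imm8 ROR (2 * rot) equals value."""
    return [(imm8, rot)
            for rot in range(16)
            for imm8 in range(256)
            if ror32(imm8, 2 * rot) == value]

for imm8, rot in all_encodings(0x34):
    print(f"imm8=0x{imm8:02X} rotated right by {2 * rot}")
```

For 0x34 this finds three encodings (0x34 unrotated, 0xD0 rotated right by 2, and 0x0D rotated right by 30), for 0x81000000 exactly one, and for 0x12345678 none at all.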

Another constant 0x81000000

Code: Select all

1000 0001 0000 0000  0000 0000 0000 0000
1000 0001      0000 0000 0000 0000 0000 0000 separate off an even number of zeros
0000 0000 0000 0000 0000 0000      1000 0001 rotate them around to the top
000000000000000000000000 10000001 24 or more zeros, this is a valid constant for mov
I have hopefully given you enough meat to chew on here as far as loading immediate values into a register using ARM instructions. I have left out how far away the ldr rd,[pc,#offset] can reach, leave that up to you to research for now. And leave it up to you to try to write some reasonably realistic code and see if the assembler puts the constants in a safe place for you or if it inserts them in a place where they will get executed (Ideally it wont but if your programs are as lame and simple as mine above it might).

I might do the same thing with the thumb instruction set. This would be a very short discussion compared to the above.

Anyone else that wants to jump in and help explain this mov immediate, please do so, Dex, anyone?

David (aka dwelch67)

dwelch67
Posts: 954
Joined: Sat May 26, 2012 5:32 pm

Re: Assembler questions

Fri Dec 28, 2012 6:19 am

THUMB: Loading an immediate into a register.

Read the prior ARM based mov immediate post first. This may not be as short as I thought.

Thumb originally was a 16 bit, fixed length, instruction set: a subset of the ARM instruction set with a one to one mapping of thumb instructions to ARM instructions. I have no personal knowledge but I can visualize that the thumb decoder simply shoved the ARM instruction into the pipeline and the processor was really only an ARM engine. The thumb instructions dont have the ARM instruction set feature that every instruction can be conditionally executed, most thumb instructions operate on registers r0-r7 and only some use the "high registers", alu instructions dont have the option to not change flags, alu instructions generally are limited to two register operands not three, etc. So the same program in thumb mode might use instructions that are half the size, 16 bits instead of 32 bits, but it might take more instructions to do the same thing.

So ARM did something ugly. They created the thumb2 extensions. They took formerly undefined thumb instruction patterns, and when one of these patterns is encountered it is actually a 32 bit instruction, not 16, and this 32 bit instruction doesnt have to be word aligned. Even worse, one of the places most folks saw this was the cortex-m3, which is ARMv7 based. ARMv7 added over 100 thumb2 instructions. ARMv6 only added a dozen or two, some much smaller number. So if you got used to thumb2 on a cortex-m3, then years later when the cortex-m0 became available, you, me, gnu, and lots of other folks that had gotten used to the ARMv7m based thumb2 instructions found that cortex-m0 is ARMv6 based. The ARM11 in the raspberry pi is also ARMv6 based but does it support thumb-2 instructions? Any of them?

Traditional thumb has only one mov immediate option. Depending on the ARM ARM you are reading it may show different things. I am looking at one that shows encoding T1 for the mov immediate, supported by ARMv4T, ARMv5T*, ARMv6*, ARMv7. bits 0 to 7 of the instruction are noted as imm8. In the description for the instruction it says The range of values is 0-255 for encoding T1. Other ARM ARMs will basically say the same thing.

That is it for the "all thumb variants" mov immediate thumb instruction, all you can do with a mov is values that are between 0x00000000 and 0x000000FF, no rotation tricks or anything, basically you can load one byte with the upper 24 bits zero. And there is no mvn for the "all thumb variants" thumb instructions.

If we look at the add immediate we are also limited to either 8 bits 0x00 to 0xFF or 3 bits 0 to 7, no rotation or anything. there is no thumb (not thumb2) version of orr immediate, and if there were I doubt it would rotate through 32 bits. Now there is an lsl immediate, so you could do something like

Code: Select all

.thumb

mov r0,#0x12
lsl r0,#8
add r0,#0x34
lsl r0,#8
add r0,#0x56
lsl r0,#8
add r0,#0x78
to load 0x12345678 into register r0 in thumb mode. ugly but it works.
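The same mov, lsl, add pattern works for any 32 bit constant, one byte at a time. Here is a small Python sketch that generates the sequence and simulates r0 along the way; the function name is made up, and it always emits all seven instructions even when some bytes are zero.

```python
MASK32 = 0xFFFFFFFF

def thumb_build_sequence(value):
    """Return (instruction lines, simulated r0) for mov + three lsl/add pairs."""
    value &= MASK32
    top, *rest = [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    lines = [f"mov r0,#0x{top:02X}"]
    r0 = top
    for byte in rest:
        lines.append("lsl r0,#8")
        r0 = (r0 << 8) & MASK32      # make room for the next byte
        lines.append(f"add r0,#0x{byte:02X}")
        r0 = (r0 + byte) & MASK32    # low 8 bits are zero, so add acts like orr
    return lines, r0

lines, r0 = thumb_build_sequence(0x12345678)
print("\n".join(lines))
print(f"r0 = 0x{r0:08X}")
```

Seven 16 bit instructions is 14 bytes, against 2 bytes of ldr plus 4 bytes of constant for the pc-relative load, which is a fair argument for the ldr approach.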

So here is a weakness of the ARM website and documentation. The original ARM ARM doc grew to be too big and they split it up with the new architectures. There is the ARMv5 ARM, and then they jump to ARMv6M which is cortex-m0 and -m1, nothing to do with our ARM1176 (an ARM11). Then you see the ARMv7 manuals and the ARMv8, well the ARMv7AR manuals say something about the "later" ARM11 architectures. Well where is early vs later defined? If you look at the Technical Reference Manual for the ARM1176... that the raspberry pi uses, it only mentions thumb and 16 bit instructions, it doesnt say thumb2 anywhere or 32 bit thumb instructions.

Further it goes into a list of thumb instructions in syntax form. Note that in an ARMv7-AR architectural reference manual there is an orr immediate thumb2 encoding (encoding T1) which is supported by ARMv6T2 and ARMv7. What is ARMv6T2? Does the ARM1176 qualify as an ARMv6T2? Well the ARM1176 TRM does not list an orr immediate in the thumb instructions and it only lists one mov immediate rather than the three or four covered by ARMv6T2. Even if I look at the TRM for the cortex-m0 and the ARM for the ARMv6-M, which is what the cortex-m0 is based on, it does not show for example the multiple mov immediates or the orr immediate. So even the cortex-m0, which came out much later than the ARM1176, does not support these ARMv6T2 instructions. So I really think that the ARM1176 does not support 32 bit thumb-2 instructions. Which is nice because I think thumb2 instructions are ugly.

So our choice is some flavor of the above mov with some lsl+add combinations until you have all the bits in place in your register.

Here you really want to do a ldr to get larger values into a register, and you can use the same trick.

Code: Select all

.thumb

ldr r0,=0x12345678
add r1,r0
b somewhere

Disassembly of section .text:

00000000 <.text>:
   0:	4801      	ldr	r0, [pc, #4]	; (8 <.text+0x8>)
   2:	1809      	adds	r1, r1, r0
   4:	e7fe      	b.n	0 <somewhere>
   6:	56780000 	ldrbtpl	r0, [r8], -r0
   a:	Address 0x0000000a is out of bounds.
Note the disassembly is hosed? Where is my 0x1234? And do you see another issue/problem that I didnt really cover in the prior discussion? A 32 bit ldr needs to load from a 32 bit aligned address, basically an address with the lower 2 bits being zero: 0x0, 0x4, 0x8, 0xC, 0x10, etc. What really happened here is that because I let the assembler place the constant, it placed it at a word aligned boundary, 0x8, which means the halfword (16 bits) at address 0x6 is 0x0000 padding, the halfword at 0x8 is 0x5678 and the halfword at 0xA is 0x1234. The disassembler started a 32 bit word at 0x6, so it showed 0x56780000 and then ran out of data at 0xA.
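The [pc, #4] in that listing can be decoded by hand. As a sketch, assuming the 16 bit thumb ldr literal encoding (top five bits 01001, then a 3 bit register, then an 8 bit word offset) and the thumb rule that pc reads as the instruction address plus 4, rounded down to a multiple of 4 (the function name is made up):

```python
def thumb_ldr_literal_target(insn, address):
    """Decode a 16-bit thumb 'ldr rt,[pc,#imm]' and return (rt, offset, target)."""
    if insn >> 11 != 0b01001:
        raise ValueError("not a thumb ldr literal instruction")
    rt = (insn >> 8) & 0x7
    offset = (insn & 0xFF) * 4       # imm8 counts words, not bytes
    base = (address + 4) & ~3        # pc as seen by the instruction, word aligned
    return rt, offset, base + offset

print(thumb_ldr_literal_target(0x4801, 0x0))     # ldr at address 0
print(thumb_ldr_literal_target(0x4801, 0x1000))  # same instruction linked at 0x1000
```

The same 0x4801 encoding lands on 0x8 when the code sits at 0 and on 0x1008 when it sits at 0x1000. Because the offset is always a multiple of 4 from a word aligned base, the constant itself has to be word aligned.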

To see that our 0x1234 is not lost, and the alignment is right:

Code: Select all

.thumb
.globl _start
_start:
ldr r0,=0x12345678
add r1,r0
b _start

Code: Select all

arm-none-linux-gnueabi-as mov.s -o mov.o
arm-none-linux-gnueabi-ld -Ttext 0x1000 mov.o -o mov.elf
arm-none-linux-gnueabi-objcopy mov.elf -O binary mov.bin

Code: Select all

hexdump -C mov.bin
00000000  01 48 09 18 fc e7 00 00  78 56 34 12              |.H......xV4.|
0000000c
yes, yes, yes you can use arm-none-eabi-whatever or if you are on a raspberry pi running linux then it is just as, ld, objcopy without the arm-stuff-stuff in front of it. Most if not all of my bare metal code is written such that the differences between arm-blah-blah- and arm-stuff-stuff- dont matter. I bounce around several different builds of the gnu arm tools, just happens to be what I was using today.

anyway, starting at address 6 there are two 0x00 bytes then starting at address/offset 0x8 we see our constant, complete 0x12345678 (remember the arm is running little endian here so that shows up in memory as bytes 0x78, 0x56, 0x34, 0x12). So the arm wont crash because this is an aligned address. But if we did this:

Code: Select all

.thumb
.globl _start
_start:
ldr r0,bigimm
add r1,r0
b _start
bigimm: .word 0x12345678

Code: Select all

arm-none-linux-gnueabi-as mov.s -o mov.o
mov.s: Assembler messages:
mov.s:6: Error: invalid offset, target not word aligned (0x00000002)
mov.s:6: Error: invalid offset, value too big (0x00000002)
How nice! the assembler saved us from pain.
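Here is roughly why the assembler objected, as a Python sketch. It is a reading of the thumb ldr literal rules, not gas's actual code, and the function names are made up: the offset is measured from the ldr address plus 4, rounded down to a multiple of 4, and has to be a multiple of 4 between 0 and 1020.

```python
def thumb_literal_offset(ldr_address, literal_address):
    """Return (offset, encodable) for a 16-bit thumb ldr rd,[pc,#offset]."""
    base = (ldr_address + 4) & ~3
    offset = literal_address - base
    encodable = 0 <= offset <= 1020 and offset % 4 == 0
    return offset, encodable

def align_up(address, boundary=4):
    """Round an address up to the next multiple of `boundary` (a power of two)."""
    return (address + boundary - 1) & ~(boundary - 1)

# Three 16-bit instructions put bigimm at offset 6.
print(thumb_literal_offset(0, 6))
# Padding to the next word boundary moves it to offset 8.
print(thumb_literal_offset(0, align_up(6)))
```

With bigimm at offset 6 the required offset works out to 2, which looks like the 0x00000002 in the error messages. Moving the word up to offset 8 gives an offset of 4, which encodes fine.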

to fix it we use a .align

Code: Select all

.thumb
.globl _start
_start:
ldr r0,bigimm
add r1,r0
b _start
.align
bigimm: .word 0x12345678

Code: Select all

arm-none-linux-gnueabi-as mov.s -o mov.o
arm-none-linux-gnueabi-ld -Ttext 0x1000 mov.o -o mov.elf
arm-none-linux-gnueabi-objcopy mov.elf -O binary mov.bin
hexdump -C mov.bin 
00000000  01 48 09 18 fc e7 c0 46  78 56 34 12              |.H.....FxV4.|
0000000c
It stuck something other than 0x00s in to fill in the space but the constant is still at offset 0x8, if we disassemble

Code: Select all

arm-none-linux-gnueabi-objdump -D mov.elf 

mov.elf:     file format elf32-littlearm


Disassembly of section .text:

00001000 <_start>:
    1000:	4801      	ldr	r0, [pc, #4]	; (1008 <bigimm>)
    1002:	1809      	adds	r1, r1, r0
    1004:	e7fc      	b.n	1000 <_start>
    1006:	46c0      	nop			; (mov r8, r8)

00001008 <bigimm>:
    1008:	12345678 	eorsne	r5, r4, #120, 12	; 0x7800000
It actually disassembles the whole thing and not the weirdism we saw before, and we see the fill data at offset 0x6 is a nop. And I know I said most thumb instructions are limited to r0-r7. To make thumb work there have to be some that allow for r0-r15, this is one of them and that really is the encoding for mov r8,r8 as the disassembly states.
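The hexdump and the disassembly can be tied together with a few lines of Python. This sketch takes the 12 bytes from the hexdump above, reads them as little endian halfwords and a word, and decodes the 0x46c0 fill. The decoder assumes the thumb high register mov encoding (0x46 in the top byte, then D, a 4 bit Rm and a 3 bit Rd); the function name is made up.

```python
import struct

# The 12 bytes of mov.bin as shown by hexdump -C.
image = bytes.fromhex("01 48 09 18 fc e7 c0 46 78 56 34 12")

# Four 16-bit thumb instructions, then one 32-bit constant, all little endian.
halfwords = struct.unpack_from("<4H", image, 0)
constant = struct.unpack_from("<I", image, 8)[0]

def thumb_mov_hi(insn):
    """Decode the thumb 'mov rd,rm' form that can reach r8-r15; return (rd, rm)."""
    if insn >> 8 != 0x46:
        raise ValueError("not a thumb high register mov")
    rd = ((insn >> 4) & 0x8) | (insn & 0x7)   # D bit becomes bit 3 of rd
    rm = (insn >> 3) & 0xF
    return rd, rm

print([f"0x{h:04x}" for h in halfwords])
print(f"0x{constant:08x}")
print("mov r%d, r%d" % thumb_mov_hi(halfwords[3]))
```

The halfwords come out as 0x4801, 0x1809, 0xe7fc and 0x46c0, matching the disassembly column for column, and the constant reads back as 0x12345678.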

From looking at a bunch of TRMs and ARM ARMs I dont think the ARM1176 supports thumb2 instructions so I dont think the other forms of mov immediate and orr immediate, etc are supported by the raspberry pi. Hmm, doing a little googling wikipedia shows ARMv6T2 is actually the ARM1156 but not the 1176 or 1136 nor arm11 mpcore. And yes the arm1156 trm does talk about thumb-2. If someone wants to try thumb2 instructions on their raspberry pi and see what happens or has some arm docs that clarify I am right or wrong, please speak up...

David (aka dwelch67)

tritonium
Posts: 79
Joined: Tue Jan 03, 2012 7:10 pm

Re: Assembler questions

Fri Dec 28, 2012 2:13 pm

David
A thousand thanks for that THOROUGH reply!
You must have spent half a day (or night) writing that.
Very illuminating.
I'm wondering if it is possible to switch all this clever stuff OFF in the assembler - I know that having all this 'assistance' makes coding a lot more fluid once you understand what's going on, but coming from an 8080 background when nothing like that was available (if it was I never found it) and you knew exactly what you were getting, I find having all this juggling in my head (now at least) somewhat overwhelming.
I need to study it some more for it all to sink in.
I had considered PC relative addressing with the data nearby in memory, so the immediate data then I suppose can be a variable, ie it can be written to and changed as the program progresses. I suppose though that there is a limit to the relative distance +/- ? and if the code goes out of range other problems occur.
Now Dex uses Fasmarm cross-assembler and the website says
' FASMARM currently supports the full range of ARM processors and coprocessors.'

I dont know if that is going to be a problem in view of what you have said above - Dex seems to manage ok and I can just about understand his code so we'll see. I do like the simple one click assemble rather than making etc.
I did try to copy paste some code into fasmarm but if I recall it did not like LDR but Dex has a macro which works, and at least that way I know what I am getting, and now I know enough to write a different PC relative macro (I think!).

Up to now I've taken some of Dex's code and modified it to see what happens and so far so good, but it's a lot to take in, all this conditional execution, shifting and so on, and I have read your tutorials but 'C' is not something I am comfortable with. Despite writing lots of stuff for the arduino I still seem to like assembler - don't know why.

But Its all good fun - and thanks again for taking all that time to answer my question.

Now I must study it again and see if my quick response is missing something or not.

Dave H

dwelch67
Posts: 954
Joined: Sat May 26, 2012 5:32 pm

Re: Assembler questions

Fri Dec 28, 2012 3:51 pm

In no way do I want to take away from what Dex is working on, nor the assembler or tools he uses or has created.

Understand that "assembly language" is a programming language but unlike most of the programming languages you may know or may have heard of it does not have a standard syntax. The machine code has a standard syntax that must be conformed to. The assembly language is a more human manageable form of that machine code or a way for humans to create machine code directly without having to manage bits in binary, octal or hex directly.

Usually but not always a processors creator will put some sort of syntax definition, anything from loosely defined to very specific in the document that includes the machine code definitions. Some vendors isolate the machine code and assembly code into separate sections or in some cases leave the machine code out entirely (yes quite strange). When the syntax is shown with the machine code that is often implied as the syntax an assembler should conform to but that does not mean it has to.

For a processor to have any kind of success it needs tools, at a minimum an assembler and if need be a linker to go with it. If the processors creator (as a company) wants to be successful with this product it is in their best interest to create such tools and or hire/contract someone to do so. And one would hope that the vendor/creator supported tools matches the vendor/creators machine code and assembly documentation. Not always the case, but one would hope.

If for example as the world well knows, the original assembler was not binutils, then when the gnu folks (as in "them" a collective like the borg. "us vs them") do choose to support it, for whatever reason, they have a strong desire to screw it up. x86 is the shining example, but even with arm they screwed it up. And sadly the gnu assembler tends to dominate after that, particularly if there is a gcc port. Because folks go after the compiler and tolerate the assembler.

Even though it can be used to define the language, I prefer to use the word "assembler" to define the tool/utility that converts the "assembly language" (the programming language written in a text editor) into machine code. Each assembler, if useful, needs to support an assembly language for a processor, such that the user can create any of the machine code instructions desired. For arm instructions like

Code: Select all

add r0,r1,r2
mov r3,#4
are expected and, unless you do the x86 screw up the order thing, pretty obvious: not vague, they translate directly from the implied language into machine code.

but then an assembler can define its own, assembler specific, additions, directives or whatever you want to call them, for example

Code: Select all

.equ FOUR, 4
add r0,r1,r2
mov r3,#FOUR
or

Code: Select all

#define FOUR 4
#define HELLO r0
add HELLO,r1,r2
mov r3,#FOUR
And with all assembly languages you have the problem of pc relative addressing and the similar problem of branch destinations. Both often solved with labels and a mangling of the assembly language but in a way that greatly benefits the assembly language programmer. As discussed above

Code: Select all

.text
ldr r0,foobar
add r0,r0,r1
str r0,foobar
b somewhere

.data
foobar: .word 0
And then you get into macros, etc.

At this point it should be obvious that different assemblers can 1) define any assembly language they want that meets the criteria of being able to create the desired machine code and 2) can create any extensions or directives to the machine language desired by the authors of the assembler, hopefully for the benefit of the programmer, but could also be for the benefit of a compiler that creates machine code, or whatever. At times, who knows, there may even be software patents or trademarks or copyrights that someone is afraid of or trying to reinforce with their choices.

So after that very long story, you definitely need to know your tools, it is possible to write arm assembly that mostly assembles with different assemblers. Basically portable. I generally try to avoid compiler or assembler specific nuances and features, but at times I get lazy like everyone else and cave in. I believe the ldr rd,=constant trick will work with gnu binutils going back as much as a decade (ARM7, ARMv4 days) to the present as well as the arm tools of that time, what was it, SDT? ADS, and RVCT. I only used the very early version of RVCT before the project lost funding and we were not able to purchase those tools and I switched to gnu anyway more for the wider community, not because the tools were better (they were not). I actually had the pleasure of having a phone call or two with the two RealView guys shortly after being acquired by ARM as the tools were going through the ADS to RVCT changeover. Since then ARM has purchased Keil and even the eval versions of their ARM tools include an eval of RVCT. I support the open tools so I dont dabble much with Keil or the others, but one might try these tricks with those pay-for or eval tools.

Even if you dont want to use the =address trick but want to define the value directly behind a label and have the assembler then create the pc-relative addressing for you, you might still run into how a label is defined and how a 32 bit word is defined, you might see .data and not have a .word or who knows what you might find. Maybe the C syntax 0x1234 is not supported but more of a legacy $1234 or 1234h is. If you use the gnu like label shortcuts like:

Code: Select all

1: .word 6
ldr r0,1b
ldr r1,1f
@ ldr r2,2   (the assembler rejects this one, so it is not in the build)
add r2,r2,r1
add r2,r2,r0
b 2
1: .word 5
2: 

Code: Select all

Disassembly of section .text:

00000000 <.text>:
   0:	00000006 	andeq	r0, r0, r6
   4:	e51f000c 	ldr	r0, [pc, #-12]	; 0 <.text>
   8:	e59f1008 	ldr	r1, [pc, #8]	; 18 <.text+0x18>
   c:	e0822001 	add	r2, r2, r1
  10:	e0822000 	add	r2, r2, r0
  14:	eafffffe 	b	2 <.text+0x2>
  18:	00000005 	andeq	r0, r0, r5
Those shortcuts will not work with all arm assemblers. They make life much easier by not having to invent a million label names, but can lead to human error if another one of these labels gets inserted in the middle of something, and can be difficult to read for the uninitiated.
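The two pc-relative loads in that disassembly can be decoded by hand to see where they point. This is a sketch assuming the ARM mode ldr rd,[pc,#+/-imm12] encoding (12 bit byte offset, bit 23 for add or subtract) and the ARM mode rule that pc reads as the instruction address plus 8; the function name is made up.

```python
def arm_ldr_literal_target(insn, address):
    """Decode an ARM 'ldr rd,[pc,#+/-imm12]' and return (rd, target address)."""
    # Immediate offset load, pre-indexed, no writeback, base register r15.
    if insn & 0x0F7F0000 != 0x051F0000:
        raise ValueError("not an ARM pc-relative ldr")
    rd = (insn >> 12) & 0xF
    imm12 = insn & 0xFFF
    add = (insn >> 23) & 1           # U bit: 1 adds the offset, 0 subtracts it
    base = address + 8               # pc as seen by an ARM mode instruction
    return rd, base + imm12 if add else base - imm12

print(arm_ldr_literal_target(0xE51F000C, 0x4))  # ldr r0,1b
print(arm_ldr_literal_target(0xE59F1008, 0x8))  # ldr r1,1f
```

The first load resolves to address 0x0 and the second to 0x18, the two .word lines. With 12 bits of offset the ARM mode form can reach 4095 bytes either side of pc.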

Since many of you are likely clueless as to what I just did: with gnu assembler you can create these numeric labels. You noticed I created two labels that were the number 1. When referencing I used 1b, or one before, meaning go backwards/up in the code until you find the first instance of the label 1: and use that. 1f, or one forward, means go down in the code until you find the next instance of 1:. You could write all of your asm repeating a handful of labels 1:, 2:, 3:, etc. Even though I only had one instance of 2: in the code, it still needs the b or f after it: a bare 2 is taken as the number 2, not the label, which is why the disassembly shows that branch aimed at .text+0x2. So write b 2f, and if you try to use it as ldr rd,2 the assembler will complain.
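The lookup rule for these numeric labels is simple enough to model. This Python sketch is only an illustration of the nearest-before and nearest-after search, not how gas implements it; the function name and the list of label positions (read off the listing above, with 2: assumed to sit right after the last .word at 0x1c) are assumptions.

```python
def resolve_local(definitions, reference, here):
    """Resolve a reference like '1b' or '1f' made at address `here`.

    definitions is a list of (address, label number) pairs in source order.
    """
    number, direction = int(reference[:-1]), reference[-1]
    same = [addr for addr, n in definitions if n == number]
    if direction == "b":
        candidates = [addr for addr in same if addr <= here]
        pick = max
    elif direction == "f":
        candidates = [addr for addr in same if addr > here]
        pick = min
    else:
        raise ValueError("reference must end in b or f")
    if not candidates:
        raise LookupError(f"no label for {reference}")
    return pick(candidates)

labels = [(0x00, 1), (0x18, 1), (0x1C, 2)]
print(hex(resolve_local(labels, "1b", 0x04)))
print(hex(resolve_local(labels, "1f", 0x08)))
print(hex(resolve_local(labels, "2f", 0x14)))
```

The first two results, 0x0 and 0x18, match the targets in the disassembly comments.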

Super short response: Assembly language for the same processor can vary from one assembler to another. No matter what when you learn assembly language you are to some extent learning the assembly language of a specific assembler and not necessarily a generic, standard, assembly language for that processor. You may have to re-learn the asm for each assembler you use for the same processor. x86 is a prime example, not only the ass backwards ordering of at&t syntax, but the byte ptr stuff changes from one assembler to another. I have noticed a movement in college classes to corrupt the arm syntax to use mips like register names a0, v0, etc, you can even use $r0, $r1, etc with gnu assembler, which is equally disgusting. Please resist this movement.

The reason why most of my comments are of the form ;@ comment is because "real" assembly language programmers know that ; is a comment symbol. Gnu assembler lets you use it to put more than one instruction per line! A cardinal sin. So gnu assembler used @ for the comment symbol (this is all for arm) but ;@ has a tendency to port better: if the assembler uses ; then it takes the @ as part of the comment, and gnu tolerates the ; with only a commented out line behind it.

Gnu also tolerates "Assembling" your asm with the C compiler and that lets you use the C pre-processor, so code like:

#define FOUR 4
mov r0,#FOUR

works, or C macros, etc. I also consider that to be a cardinal sin, IMO if it cannot assemble with the assembler it is not assembly language.

I will get off my soap box now, I was trying to be helpful before I started preaching...sorry.

Hopefully Dex or someone will jump in with other assemblers' details on the tricks used to move any arbitrary constant into a register in ARM assembly language.

David

tritonium
Posts: 79
Joined: Tue Jan 03, 2012 7:10 pm

Re: Assembler questions

Fri Dec 28, 2012 5:53 pm

Gosh
Way back I learned 8080 code by first drawing a flow-chart, then converting to mnemonics on paper, then looking up the actual machine code from an 8080 mnemonic book and writing that down, then entering into ram via a hex editor (ever so crude), and then pointing the pc at the start of my program. To save it, it had to go to cassette tape (not reliable).
What a revelation when I got a copy of ECAL, a cross-assembler that covered lots of processors from 8080, z8, z80, 6502, 6800, 68000, 8051..., even x86, (except for the 6800, I used them all, no pic or avr,- too early), and the syntax was always consistent (I suppose I mean syntax - the org, db, dw, name:, ;remark, and so on), but nothing clever a.f.a.i.r. To use a text editor and copy and paste was such a luxury I was in heaven. To print it off on a dot matrix fanfold paper, sometimes dozens of pages all hinged together, and then unroll on the floor and on hands and knees analyze what I had done - sometimes years later - oh the value of comments cannot be overstated.
Somehow it's stuck and minimalist seems to suit me. My method of working is, once a routine works (a to d/ uart /timer /pci/ interrupts handling) I copy them into a routines file, and then copy bits from there as I need them. Never got to using libraries (for assembler anyway). If I couldn't remember how an instruction worked I would look to see how I'd used it in the past, as a reminder. I know that's not very efficient and perhaps I wouldn't last long in the programming industry but as a hobbyist who has actually sold some working hardware (but with only 4k of code) it suits me.

It's really great that people like yourself are accessible by people like me, even if we only understand 90%, it's really appreciated.

I reckon its going to take a week to digest what you've said so far, and thats good, - it doesn't do to bolt your food. ;)

Dave H

dwelch67
Posts: 954
Joined: Sat May 26, 2012 5:32 pm

Re: Assembler questions

Fri Dec 28, 2012 6:34 pm

If you now or at one point even knew half of those instruction sets, ARM will be a breeze. No need to memorize all the nuances, just get the basics down and you are off to the races.

Not being able to use any constant you want any time you want is of course the first hurdle, get past that and a basic subset of the instruction set is easy to come by and use for creating fun.

The second hurdle will be alignment, memory accesses need to be aligned to the size. 32 bit reads and writes only on 32 bit boundaries, 16 bit reads and writes only on 16 bit boundaries. Unfortunately architectures like x86 have crippled folks into being lazy: there is a penalty on x86 for going unaligned, but the penalty is not strong enough for folks to take notice. ARM and others do not tolerate unaligned accesses (by default or in general) and that causes small to major problems for folks for a while.

Third hurdle is that arm alu instructions by default dont change flags, you have to add the 's'.

Code: Select all

and r0,r0,#0x20   @ does not change flags
ands r0,r0,#0x20  @ does change flags
And in true gnu fashion, in thumb mode they hosed the assembly syntax to confuse you further (which is why I get messed up as I will do thumb for a while then go back to arm).
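In the ARM mode machine code the difference between and and ands is a single bit, bit 20 of the instruction word. A small Python sketch of that, plus a toy model of what the s changes: the two encodings are worked out by hand from the ARM data processing format, the function names are made up, and only the N and Z flags are modelled (C is ignored here).

```python
MASK32 = 0xFFFFFFFF

AND_R0_R0_0X20 = 0xE2000020    # and  r0,r0,#0x20
ANDS_R0_R0_0X20 = 0xE2100020   # ands r0,r0,#0x20

def sets_flags(insn):
    """True if the S bit (bit 20) of an ARM data processing instruction is set."""
    return bool((insn >> 20) & 1)

def and_imm(rn, imm, update_flags, flags):
    """Model and/ands: return (result, flags); flags only change when asked."""
    result = rn & imm & MASK32
    if update_flags:
        flags = dict(flags, N=result >> 31, Z=int(result == 0))
    return result, flags

before = {"N": 0, "Z": 0}
print(and_imm(0x1F, 0x20, sets_flags(AND_R0_R0_0X20), before))
print(and_imm(0x1F, 0x20, sets_flags(ANDS_R0_R0_0X20), before))
```

Both calls produce a zero result, but only the ands form sets Z, so a beq after the plain and would be testing whatever flags some earlier instruction left behind.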

David

DavidS
Posts: 4208
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

Re: Assembler questions

Sun Jan 06, 2013 1:03 pm

Burngate wrote:My second question is to do with the BL instruction, and is really hammering my long-term memory.
When the Arm encounters a branch with link instruction, it puts the PC contents into R14, before taking the branch. But what's in the PC is already ahead of the BL instruction, by the length of the pipeline.
So on return (mov PC, R14) the PC is one or more instructions ahead of where it should be
Now thinking about it some more, it would seem that the Arm should modify what it puts into R14 so that it gets back to the right place, but I don't really know.
I just remember the original ARM600 had a pipeline 3 instructions long, while the StrongArm had one 4 (or 5) long

Having spent Boxing Day on Google, I've come to the conclusion that it's a non-problem apart from self-modifying code
There was also the issue of modifying the PC: if you told it to update the flags using ^ in some instructions it would cause some real problems when in 32-bit addressing mode (due to the old 26-bit addressing only ARM chips storing the flags in the unused bits of the PC).
Though this is still true today, all 32-bit addressing ARM cores have the same issue.
RPi = The best ARM based RISC OS computer around
More than 95% of posts made from RISC OS on RPi 1B/1B+ computers. Most of the rest from RISC OS on RPi 2B/3B/3B+ computers

DavidS
Posts: 4208
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

Re: Assembler questions

Sun Jan 06, 2013 1:09 pm

Also if you are using assembly on the ARM, why are you using the GNU assembler? I can understand using GCC for C (as LCC does a terrible job of optimization and the RISC OS DDE is commercial), but there are so many better assemblers available, that are easier to use and more capable when targeting the ARM (even the BBC BASIC assembler is better when patched to support the newer instructions and VFP). I use extASM myself.

tufty
Posts: 1456
Joined: Sun Sep 11, 2011 2:32 pm

Re: Assembler questions

Sun Jan 06, 2013 1:58 pm

DavidS wrote:Also if you are using assembly on the ARM, why are you using the GNU assembler?
Can't answer for the others, but:

1 - it's free
2 - if you install gcc it's already installed
3 - one functionally complete assembler is as good as another.

Yes, gas has horrible syntax, but the reasons for using a.n.other assembler over it are /purely/ syntactic. And syntax, frankly, doesn't matter. The reasons for using another C compiler than the GNU one, on the other hand, are related to the performance of generated code.

DavidS
Posts: 4208
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

Re: Assembler questions

Mon Jan 07, 2013 1:08 am

tufty wrote:Yes, gas has horrible syntax, but the reasons for using a.n.other assembler over it are /purely/ syntactic. And syntax, frankly, doesn't matter. The reasons for using another C compiler than the GNU one, on the other hand, are related to the performance of generated code.
And extASM is free with a simple integrated WIMP front end, not to mention that the patched BBC BASIC assembler is included with the current releases of the OS (starting with RISC OS 5.18).

So why use an assembler that has a terrible syntax when even the assembler built into the operating system has a better syntax (and follows the correct ARM syntax [with the exception of the VDUP.16 mnemonic which is changed to VDPL])?

DavidS
Posts: 4208
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

Re: Assembler questions

Mon Jan 07, 2013 1:18 am

Also, if you want something that comes with GCC, the native GCC includes ASASM.

tufty
Posts: 1456
Joined: Sun Sep 11, 2011 2:32 pm

Re: Assembler questions

Mon Jan 07, 2013 6:32 am

...if you're using RiscOS. Most people aren't. Especially the ones hanging around in this particular subforum (bare metal / no OS development).

But, like I said, syntax, quite simply, doesn't matter in the slightest. Functionally, gas is /exactly/ as capable as the ARM assembler.

DavidS
Posts: 4208
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

Re: Assembler questions

Mon Jan 07, 2013 8:05 pm

tufty wrote:...if you're using RiscOS. Most people aren't. Especially the ones hanging around in this particular subforum (bare metal / no OS development).
I understand this, though you need a good OS to develop on. I am not aware of any assembler + text editor package that runs on the bare metal.

Whatever you choose as a host environment for developing your bare metal projects, you want plenty of free RAM (this helps to speed up the compile/assemble/link process, as most toolchains allow you to cache intermediate files in RAM). And you want a system that allows your toolchain to use almost all of the processor time. Remember, we are working on a machine that only has 512MB (or 256MB on the earlier boards) and a 700MHz, 850 MIPS CPU.

And of course it is always best to do bare metal development on the target hardware (so you are not swapping storage media, or serial booting).

On the Pi we are limited in our options for a host OS for development.
  1) There are a few Linux distros. These take up a good amount of RAM, and do not allow for single-tasking (very important when compiling on slow hardware).
  2) Then there is Plan 9. I understand development under it to be quite difficult.
  3) And finally there is RISC OS, which takes very little RAM and allows for single-tasking.
So it is only logical to develop your bare metal projects on the OS that runs on the RPi and is best suited to the purpose, at least until someone writes a good text editor and assembler to run on the bare metal.
tufty wrote: But, like I said, syntax, quite simply, doesn't matter in the slightest. Functionally, gas is /exactly/ as capable as the ARM assembler.
I agree, if you do not wish everyone to be able to assemble your code. Further, how much sense does it make to waste a lot more RAM than necessary in order to assemble a program?

pygmy_giant
Posts: 1562
Joined: Sun Mar 04, 2012 12:49 am

Re: Assembler questions

Mon Jan 07, 2013 8:20 pm

There are a few positive reports of this book on the forum: http://www.amazon.co.uk/Raspberry-Pi-As ... 752&sr=8-1 . This edition centres on programming from within a BASIC shell on RISC OS, although the author has hinted that he may touch on a Linux approach in a follow-up publication. If anyone wishes to use GCC and conserve memory, I and others have managed to use GCC on RISC OS, so it's possible.

If I were to attempt Assembly Language on the Pi I would be tempted to try RISC OS + BASIC as the development environment and then compile using GCC on RISC OS. This would seem easiest (for me) as it would seem to bypass any cross-platform tool chainy head-aches, but I cannot claim to be speaking from experience.

I think I am right that RISC OS is the least bloaty OS on the Pi because it is the only one specifically designed for ARM architecture, so perhaps the best bet for native assembly as it leaves the most memory free.

rurwin
Forum Moderator
Posts: 4258
Joined: Mon Jan 09, 2012 3:16 pm
Contact: Website

Re: Assembler questions

Mon Jan 07, 2013 8:47 pm

DavidS wrote: 1) There are a few Linux distros. These take up a good amount of RAM, and do not allow for single-tasking (very important when compiling on slow hardware).
Try

Code: Select all

sudo init 1
The only processes running are the OS and drivers, and even those are tunable.
Over 99% idle CPU and 50MB used. As opposed to normal run-mode 2, over 98% idle and 60MB used.

DavidS
Posts: 4208
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

Re: Assembler questions

Mon Jan 07, 2013 11:56 pm

rurwin wrote:The only processes running are the OS and drivers, and even those are tunable.
Over 99% idle CPU and 50MB used. As opposed to normal run-mode 2, over 98% idle and 60MB used.
Yes, but then run make and monitor the CPU usage: it will peak at 80% and usually average about 45%. That tells me that on average 55% of the CPU time could be claimed if we did not have a preemptive scheduler in the way. The current port of GCC to RISC OS still leaves some dead time, but not as much. Also, you are using 50MB of RAM with no user applications other than the shell? That is a lot even for Linux.

rurwin
Forum Moderator
Posts: 4258
Joined: Mon Jan 09, 2012 3:16 pm
Contact: Website

Re: Assembler questions

Tue Jan 08, 2013 12:17 am

The shell and "top" to be pedantic, but yes. However I'm wrong. That includes the disk caches. If you subtract them (Linux uses unused memory for disk buffers, but gives it back when it's needed) run-mode 1 uses 8MB, and a fairly loaded run-mode 2 (I've got Samba for instance), uses 20MB.
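
For anyone wanting to repeat that subtraction, here is a rough sketch. It assumes /proc/meminfo has the usual MemTotal, MemFree, Buffers and Cached fields; the function name used_without_cache_kb is made up for illustration and is not a standard tool.

```shell
#!/bin/sh
# Hypothetical helper: print memory in use, in kB, with buffers and
# page cache subtracted (total - free - buffers - cached).
# Reads a file in /proc/meminfo format; defaults to /proc/meminfo.
used_without_cache_kb() {
    awk '
        $1 == "MemTotal:" { total   = $2 }
        $1 == "MemFree:"  { free    = $2 }
        $1 == "Buffers:"  { buffers = $2 }
        $1 == "Cached:"   { cached  = $2 }
        END { print total - free - buffers - cached }
    ' "${1:-/proc/meminfo}"
}

# Usage on a running Linux system:
#   used_without_cache_kb
```

This should roughly match the "-/+ buffers/cache" figure that free reports, so it is mainly useful when comparing run levels from a script.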

I'd interpret it differently. I'd say the process was I/O bound waiting for the SD 50% of the time. The fact that RiscOS gets more CPU %age indicates it is less efficient; maybe it has less intelligent, or just smaller, disk caches. The real measure of course, is how long the compilation takes.

You could try make with the -j switch to use more threads, but of course the SD is still a bottle-neck, so I don't know if that would improve matters.
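
One way to settle it is simply to time the build both ways. A minimal sketch follows: elapsed_seconds is a made-up helper name, the -j2 value is an arbitrary choice, and it assumes a project directory with a Makefile and a clean target.

```shell
#!/bin/sh
# Hypothetical helper: run a command, discard its output, and print
# the wall-clock time it took in whole seconds.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Usage (assumes a Makefile with a "clean" target in the current directory):
#   make clean; elapsed_seconds make
#   make clean; elapsed_seconds make -j2
```

Wall-clock time is the number that matters here, since CPU percentage alone cannot distinguish a busy compiler from one waiting on the SD card.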
