There are no advantages to that macro, only disadvantages.

The logic for loading a 32-bit constant x into a register on ARMv6 in ARM assembly language is as follows:

- if x can be expressed as a modified immediate value*, use MOV

- if the complement of x can be expressed as a modified immediate value, use MVN

- otherwise, use a PC-relative load from a literal pool (what you're seeing).

That gives the fastest available load for every constant. The macro you're looking at takes the "brain dead" approach of treating every constant as an arbitrary value.
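The three-way choice above can be sketched in C as a minimal model of what a smarter assembler or macro would do. The helper names (`ror32`, `is_modified_immediate`, `choose_load`, `load_kind`) are hypothetical, and the sketch only covers ARM-mode (not Thumb) immediates; it tries every even rotation and checks whether what is left fits in 8 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (0 <= n < 32). */
static uint32_t ror32(uint32_t x, unsigned n)
{
    assert(n < 32);
    return n == 0 ? x : (x >> n) | (x << (32 - n));
}

/* Nonzero if x equals some 8-bit value rotated right by an even
   number of bits (0, 2, ..., 30), i.e. an ARM-mode modified immediate. */
static int is_modified_immediate(uint32_t x)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Rotating x left by rot (= right by 32 - rot) undoes the encoding's
           right rotation; if what remains fits in 8 bits, x is encodable. */
        if (ror32(x, (32 - rot) % 32) <= 0xFFu)
            return 1;
    }
    return 0;
}

typedef enum { LOAD_MOV, LOAD_MVN, LOAD_LITERAL } load_kind;

/* Pick the cheapest way to get constant x into a register. */
static load_kind choose_load(uint32_t x)
{
    if (is_modified_immediate(x))
        return LOAD_MOV;     /* MOV Rn, #x            */
    if (is_modified_immediate(~x))
        return LOAD_MVN;     /* MVN Rn, #~x           */
    return LOAD_LITERAL;     /* LDR Rn, [pc, #offset] */
}
```

For example, `choose_load(0xFFFFFF00u)` yields `LOAD_MVN`, since its complement 0x000000FF is a plain 8-bit value, while 0x12345678 falls through to `LOAD_LITERAL`.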

Looking at the ARM1176JZF-S TRM, section "Cycle timings and interlock behaviour", we see that:

- `MOV Rn, #x` -> 1 cycle
- `MVN Rn, #x` -> 1 cycle
- `LDR Rn, [PC, #offset]` -> 1 cycle, with a result latency of 3 cycles on Rn

The macro, which consists of 4 data-processing instructions, always takes 4 cycles, so it underperforms the other options by at least 1 cycle, even in the LDR's worst case, where you do something stupid like this:

```
ldr r0, [pc, #xx] ; My constant
add r0, r0, r0 ; use r0 straight away, stalling until the 3-cycle load latency has elapsed
```

It's worth noting that a lot of compilers will generate the "worst case" code unless they have some fairly hairy ARM-specific optimisations enabled.
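To make the macro's fixed cost concrete, here is a small C model of it. The post doesn't show the macro's body, only that it is four data-processing instructions; this sketch *assumes* a byte-at-a-time form (one MOV followed by three ORRs), and `bytewise_sequence` is a hypothetical name. Each operand is a single byte left in place, so it is always a valid modified immediate, which is exactly why the approach works for any value and why it never gets cheaper.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a byte-at-a-time constant macro (assumed form):
     MOV Rn, #(x & 0x000000FF)
     ORR Rn, Rn, #(x & 0x0000FF00)
     ORR Rn, Rn, #(x & 0x00FF0000)
     ORR Rn, Rn, #(x & 0xFF000000)
   Fills operands[] with the four immediates and returns the
   instruction count, which is 4 no matter what x is. */
static unsigned bytewise_sequence(uint32_t x, uint32_t operands[4])
{
    uint32_t rn = 0;
    for (unsigned i = 0; i < 4; i++) {
        operands[i] = x & (0xFFu << (8 * i));
        rn |= operands[i];        /* i == 0 is the MOV, the rest are ORRs */
    }
    assert(rn == x);              /* the four steps rebuild x exactly */
    return 4;                     /* instruction count, independent of x */
}
```

Calling it with 0x12345678 gives operands 0x78, 0x5600, 0x340000 and 0x12000000, and it still returns 4 for a value as simple as 0, which a single `MOV Rn, #0` would cover.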

Another thing worth considering: if you can express most of your constants as modified immediate values, you can not only load them quickly, but often use them directly as immediate operands of data-processing instructions without loading them into a register at all. The macro's 4 cycles then become *zero* cycles (no register load), *and* you gain a free register. For example, here's some already fairly optimised code to count the number of entries in a linked list (r0 holds the address of the first entry, and the first word of each entry holds the address of the next):

```
mov r2, #0xff000000 ; end of list marker
mov r1, #0 ; count
loop:
cmp r0, r2
ldrne r0, [r0]
addne r1, r1, #1
bne loop
...
```

Can be expressed as:

```
mov r1, #0 ; count
loop:
cmp r0, #0xff000000 ; end of list marker
ldrne r0, [r0]
addne r1, r1, #1
bne loop
...
```

... thus freeing up r2 for other uses. Now, the above is trivial, since the constant only gets loaded once, but if the loop were more complex, the register holding it might have to be spilled (saved to the stack frame temporarily) to make room for other values. Spilling that register just *once* in the loop would incur 4 load/store operations per iteration.

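So why do the fasm boys & girls use that macro? Because fasmarm doesn't provide the facilities armasm or gcc (via the GNU assembler) do: both accept an `LDR Rn, =constant` pseudo-instruction that applies the MOV/MVN/literal-load logic above automatically. And nobody (as far as I'm aware) has bothered to implement a more intelligent macro for fasmarm. Perhaps those wizards aren't as clever as they proclaim themselves to be.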

Simon

* i.e. can be expressed as an 8-bit value rotated right by an even number of bits (0, 2, 4, ..., 30)
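The footnote's definition maps directly onto the 12-bit operand field of an ARM-mode data-processing instruction: bits 11:8 hold a 4-bit rotation r and bits 7:0 an 8-bit value, and the operand is that value rotated right by 2r. The C sketch below encodes and decodes that field; the function names are hypothetical, and when several encodings exist it simply returns the first one found (smallest rotation), which is a choice of this sketch, not a claim about any particular assembler.

```c
#include <assert.h>
#include <stdint.h>

/* Try to encode x as an ARM-mode modified immediate: an 8-bit value
   imm8 rotated right by 2 * rot, with rot in 0..15. On success, store
   the 12-bit operand field (rot << 8 | imm8) in *field and return 1. */
static int encode_modified_immediate(uint32_t x, uint32_t *field)
{
    for (uint32_t rot = 0; rot < 16; rot++) {
        unsigned amount = 2 * rot;
        /* Undo the right rotation by rotating x left by `amount`. */
        uint32_t imm8 = amount == 0 ? x : (x << amount) | (x >> (32 - amount));
        if (imm8 <= 0xFFu) {
            *field = (rot << 8) | imm8;
            return 1;
        }
    }
    return 0;
}

/* Expand a 12-bit operand field back to the 32-bit value it represents. */
static uint32_t decode_modified_immediate(uint32_t field)
{
    assert(field <= 0xFFFu);
    uint32_t imm8 = field & 0xFFu;
    unsigned amount = 2 * ((field >> 8) & 0xFu);
    return amount == 0 ? imm8 : (imm8 >> amount) | (imm8 << (32 - amount));
}
```

For instance, 0xFF000000 (the end-of-list marker above) encodes as field 0x4FF, i.e. 0xFF rotated right by 8, while 0x1FE (0xFF shifted left by one, an odd amount) has no encoding at all, which is why the rotation has to be even.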