## Quick log2????

DavidS
Posts: 4334
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

### Quick log2????

I have been writing a lot of C code in lately, that I hope to figure out how to get into my git repo at a time when my internet connection is stable. As such I have run into a difficulty:

I have a question of howto do some basic math in C. This is a task that is very simple in assembly. How do you quickly determine the maximum set bit in a number in a minimal number of operations in C? I know slow ways of doing this in C by multiple testing, though not a direct way.

For example if implementing the function unsigned long log2(unsigned long n);. In assembly it can be done like:

Code: Select all

``````;Crude log2 implementation, gives the log2 of
;  the largest number below the input value that
;  is a power of 2.
;ON ENTRY:  R0 = Value to find the base 2 log of.
;ON EXIT:   R0 = Log 2 of input value.
.log2
TST    R0,&FFFFFFFF
CLZNE  R0,R0
RSBNE  R0,R0,#31
MOV    R15,R14
``````
Though I can not figure out an equally simple way of finding the topmost set bit in C. In ARM assembly we just use CLZ (for ARMv4 and newer anyway), how do we do it as effeciently in C?

This is very important for many operations. For example many display operations, we prefer a row pitch that is a power of 2, and obtaining the log2 of the row pitch for use in quick calculations is important (even though only needed when the display resolution is setup, if it changes the virtual FB width). For example to quickly do a setpix (assumes color is 32bpp, and color value passed in is valid):

Code: Select all

``````typedef unsigned long ULONG;

int FBSetPix32(ULONG Color, ULONG x, ULONG y)
{
if (x <= ScrMaxX && y <= ScrMaxY)
FB32[(y << ScrRowShft) + x)] = Color;
}
``````
which in ARM assembly is (assuming that the variables are in range):

Code: Select all

``````;Set a 32BPP pixal at point (x,y) on a screen with a framebuffer that
; has a power of 2 row pitch.
.FBSetPix32
STMFD  R13!,{R3-R4,R14}
LDR    R3,ScrMaxX         ;Sanity check, make sure we are in bounds.
CMP    R1,R3
LDRHI  R4,ScrMaxY
LDRLS  R3,ScrRowShft      ;So we can calc the row offset.
CMPHI  R2,R4
LDRLS  R4,FB32
ADDLS  R1,R1,R2,LSL R3    ;Quick way to calculate the index.
STRLS  R0,[R1]
LDMFD  R13!,{R3-R4,R15}
``````
Not shown in the example:
FB32 is ULONG pointer to the base address of the frame buffer.
ScrMaxX is the maximum x coord for the current screen resolution.
ScrMaxY is the maximum y coord for the current screen resolution.
ScrRowShft is the log2 of the row pitch (and assumes a power of 2 for the row pitch in this case).

Needless to say the same information is needed for all frame buffer drawing and raster functions.

There are also many other uses where finding the topmost set bit quickly would be very helpful.

I think I corrected the typos in the code. Sorry about that, they are quick on the fly typed in examples to illustrate the point of the question.
RPi = The best ARM based RISC OS computer around
More than 95% of posts made from RISC OS on RPi 1B/1B+ computers. Most of the rest from RISC OS on RPi 2B/3B/3B+ computers

jojopi
Posts: 3078
Joined: Tue Oct 11, 2011 8:38 pm

### Re: Quick log2????

GCC provides __builtin_clz(), et al, which will emit the hardware bit counting instruction when available, or otherwise call a helper function for the architecture.

Traditionally you would want to try to avoid these operations in code that needs to be both fast and portable. On platforms without the hardware support, it will always be much slower. And the best algorithm or the method to get the hardware implementation would be system-specific.

DavidS
Posts: 4334
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

### Re: Quick log2????

jojopi wrote:
Sat Jun 29, 2019 12:46 pm
GCC provides __builtin_clz(), et al, which will emit the hardware bit counting instruction when available, or otherwise call a helper function for the architecture.
While I thank you, I said in C not in GCC extension to C.
Traditionally you would want to try to avoid these operations in code that needs to be both fast and portable. On platforms without the hardware support, it will always be much slower. And the best algorithm or the method to get the hardware implementation would be system-specific.
Not to worried about portability, though if there is a way in C it should be fairly portable anyway. Much of the C that I am writing is accessing HW that is very much ARM specific, or in some cases very much Raspberry Pi specific. This is what happens when we start looking at C to write a new Operating System (I still prefer ARM Assembly for this target specific type of code, though I am attempting to write as much in C as possible as a nod to other coders that like C).
RPi = The best ARM based RISC OS computer around
More than 95% of posts made from RISC OS on RPi 1B/1B+ computers. Most of the rest from RISC OS on RPi 2B/3B/3B+ computers

jahboater
Posts: 4595
Joined: Wed Feb 04, 2015 6:38 pm

### Re: Quick log2????

DavidS wrote:
Sat Jun 29, 2019 12:52 pm
I said in C not in GCC extension to C.
These builtins/intrinsics are more portable than you think.
The other main compilers such as Clang and ICC copy the extensions that GCC provides.

But if you are really really worried about portability, then

#ifdef __GNUC__
#define cls(n) __builtin_clrsb(n)
#define clz(n) __builtin_clz(n)
#define ctz(n) __builtin_ctz(n)
#else
provide some hand written C functions to do the same.
#endif

Finally I will repeat whats said above, the builtin's will give you the fastest possible code. They will use the single instructions to do all the work when the hardware supports it, otherwise efficient library routines get called.
I doubt if you can hand write assembler than can compete for speed with these builtin's.

In the above, cls() uses, for example, the brilliant ARM cls instruction - count leading redundant sign bits.
Obviously clz() uses the ARM clz instruction when needed.
There is no ARM ctz instruction, so the compiler emits some clever assembler instead.

You could of course write some inline assembler.

Code: Select all

``````static inline int
clz( const unsigned int num )
{
int bits;
asm( "clz %0,%1" : "=r" (bits) : "r" (num) );
return bits;
}``````
Which will do the same thing (emit one single CLZ instruction) - but is pointless really given the more portable builtin.
Last edited by jahboater on Sat Jun 29, 2019 7:53 pm, edited 1 time in total.

HermannSW
Posts: 1309
Joined: Fri Jul 22, 2016 9:09 pm
Location: Eberbach, Germany

### Re: Quick log2????

> How do you quickly determine the maximum set bit in a number in a minimal number of operations in C?
>
I can only answer the minimal number for minimum set bit, and that is constant:

Code: Select all

``````y = x;
x &= (x-1);   // deletes last bit set
y ^= x;``````

There are different algorithms to determine the number of bits set in binary representation of N:
• iterate over all bits and test (log N steps)
• use middle operation from above until 0 ("number of bits set" steps)
• there is a masking algorithm that completes in O(log(log(N))) steps
⇨https://stamm-wilbrandt.de/en/Raspberry_camera.html

https://github.com/Hermann-SW/Raspberry_v1_camera_global_external_shutter
https://gitlab.freedesktop.org/HermannSW/gst-template
https://github.com/Hermann-SW/fork-raspiraw

jahboater
Posts: 4595
Joined: Wed Feb 04, 2015 6:38 pm

### Re: Quick log2????

HermannSW wrote:
Sat Jun 29, 2019 7:52 pm
There are different algorithms to determine the number of bits set in binary representation of N:
Do you mean population count?

ARM has the "cnt" instruction to do that ("popcnt" on Intel).
and gcc has the __builtin_popcount() intrinsic to produce the "cnt" instruction.
*use middle operation from above until 0 ("number of bits set" steps)
If you write Brian Kernighans algorithm, a decent compiler will spot what you are doing and emit the single instruction.

HermannSW
Posts: 1309
Joined: Fri Jul 22, 2016 9:09 pm
Location: Eberbach, Germany

### Re: Quick log2????

jahboater wrote:
Sat Jun 29, 2019 8:00 pm
ARM has the "cnt" instruction to do that ("popcnt" on Intel).
and gcc has the __builtin_popcount() intrinsic to produce the "cnt" instruction.
Interesting, I did not know that.
I looked up quite some documents on cnt and popcnt and cannot find information on the cycles the instructions need.
Do you know?
⇨https://stamm-wilbrandt.de/en/Raspberry_camera.html

https://github.com/Hermann-SW/Raspberry_v1_camera_global_external_shutter
https://gitlab.freedesktop.org/HermannSW/gst-template
https://github.com/Hermann-SW/fork-raspiraw

jahboater
Posts: 4595
Joined: Wed Feb 04, 2015 6:38 pm

### Re: Quick log2????

HermannSW wrote:
Sat Jun 29, 2019 8:08 pm
jahboater wrote:
Sat Jun 29, 2019 8:00 pm
ARM has the "cnt" instruction to do that ("popcnt" on Intel).
and gcc has the __builtin_popcount() intrinsic to produce the "cnt" instruction.
Interesting, I did not know that.
I looked up quite some documents on cnt and popcnt and cannot find information on the cycles the instructions need.
Do you know?
For intel, the best info I know of is:-

https://www.agner.org/optimize/instruction_tables.pdf

which should hopefully include the "popcnt" insn.

I have just looked at what GCC emits and it looks like a NEON instruction, sorry.

Code: Select all

``````// try.c:4765:        val->i = popcount(q);
fmov    d0, x20
cnt     v0.8b, v0.8b
umov    w0, v0.b[0]
``````
More than one instruction.

For intel it just does

Code: Select all

``````# try.c:4765:         val->i = popcount(q);
popcntq %rsi, %rsi
``````

jahboater
Posts: 4595
Joined: Wed Feb 04, 2015 6:38 pm

### Re: Quick log2????

DavidS,

Sorry, we wandered off topic a bit.

If you want pure ISO standard C, then the only thing I can suggest is to
write a simple clz() function by hand. Compile it and look at the assembler code produced.

Modern compilers are very clever at spotting what the user is trying to, and common idioms, and producing much faster code. In this case it may well choose the CLZ instruction on ARM.

DavidS
Posts: 4334
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

### Re: Quick log2????

Interesting, Intel ops have come some way since I had to write a lot of Intel ASM. I remember when they added the MOVcc instructions and thinking maybe they are paying attention to ARM after all (I was wrong).

Well I appreciate the input on the intrinsics, though I am definitely NOT including assembly equivalent intrinsics in my C compiler (as I openly apposed it when some better known compilers of the 1990's started doing it). Thank you though.

My compiler is following the ANSI C standard as close as possible, with additions to support C99 and C11 (though still some questions about the standard library additions). That includes the compiler defined use of the asm keyword to implement more like usable compilers do then what the GCC and copycats do.
RPi = The best ARM based RISC OS computer around
More than 95% of posts made from RISC OS on RPi 1B/1B+ computers. Most of the rest from RISC OS on RPi 2B/3B/3B+ computers

jahboater
Posts: 4595
Joined: Wed Feb 04, 2015 6:38 pm

### Re: Quick log2????

DavidS wrote:
Sat Jun 29, 2019 8:47 pm
Interesting, Intel ops have come some way since I had to write a lot of Intel ASM. I remember when they added the MOVcc instructions and thinking maybe they are paying attention to ARM after all (I was wrong).
Yes indeed. Intel keeps producing more and more complicated instruction sets.
I think this instruction is cool (something to do with cryptography)

GF2P8AFFINEINVQB — Galois Field Affine Transformation Inverse

Description
The AFFINEINVB instruction computes an affine transformation in the Galois Field 2^8 . For this instruction, an affine
transformation is defined by A * inv(x) + b where “A” is an 8 by 8 bit matrix, and “x” and “b” are 8-bit vectors. The
inverse of the bytes in x is defined with respect to the reduction polynomial x 8 + x 4 + x 3 + x + 1.

DavidS
Posts: 4334
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

### Re: Quick log2????

jahboater wrote:
Sat Jun 29, 2019 9:29 pm
DavidS wrote:
Sat Jun 29, 2019 8:47 pm
Interesting, Intel ops have come some way since I had to write a lot of Intel ASM. I remember when they added the MOVcc instructions and thinking maybe they are paying attention to ARM after all (I was wrong).
Yes indeed. Intel keeps producing more and more complicated instruction sets.
I think this instruction is cool (something to do with cryptography)

GF2P8AFFINEINVQB — Galois Field Affine Transformation Inverse

Description
The AFFINEINVB instruction computes an affine transformation in the Galois Field 2^8 . For this instruction, an affine
transformation is defined by A * inv(x) + b where “A” is an 8 by 8 bit matrix, and “x” and “b” are 8-bit vectors. The
inverse of the bytes in x is defined with respect to the reduction polynomial x 8 + x 4 + x 3 + x + 1.
WOW, they are off there rocker at InTel. And they wonder why very few bother to keep up with the rules of optimization on there CISC (emphasis on the first C for Complex) CPU's.

Do they forget that this is what killed many other well known CISC Architectures, getting to complex with the ISA (anyone remember VAX).

Though thank you, that seals that if I ever have to deal with Intel assembly again I will stick with the original Pentium subset of the ISA.
RPi = The best ARM based RISC OS computer around
More than 95% of posts made from RISC OS on RPi 1B/1B+ computers. Most of the rest from RISC OS on RPi 2B/3B/3B+ computers

Heater
Posts: 12960
Joined: Tue Jul 17, 2012 3:02 pm

### Re: Quick log2????

DavidS,

There are more or less cunning ways to get log2 of an integer in C all over the internet. For example:
http://guihaire.com/code/?p=414

Almost certainly you don't need an ultimately optimized log2 for what you are doing there.

DavidS
Posts: 4334
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

### Re: Quick log2????

Heater wrote:
Sat Jun 29, 2019 11:05 pm
DavidS,

There are more or less cunning ways to get log2 of an integer in C all over the internet. For example:
http://guihaire.com/code/?p=414

Almost certainly you don't need an ultimately optimized log2 for what you are doing there.
You have a point there, as most of what I would need it for is stuff that rarely happens, or in a few cases on an application startup or library initialization.

Though I still prefer reasonably optimal code.

I have spent more time getting my compiler to identify certain sequences that can be highly optimized when targeting the ARM than I did getting it working in the first place. For example shifts and ALU ops where the result of the shift is one operand of the ALU op need to be put into a single instruction. Another example is looking forward to have results that become parameters of function calls end up in the correct register naturally. Another is tracking how variables are accessed within a given function to minimize the number of memory accesses. As I have said already the compiler is far from the best at optimization, though so long as you feed it reasonably optimal C it should produce reasonably optimal assembly (in other words it will optimize the code for how it is written, though it will not attempt to correct programmer mistakes in producing sub-optimal C code).
RPi = The best ARM based RISC OS computer around
More than 95% of posts made from RISC OS on RPi 1B/1B+ computers. Most of the rest from RISC OS on RPi 2B/3B/3B+ computers

jahboater
Posts: 4595
Joined: Wed Feb 04, 2015 6:38 pm

### Re: Quick log2????

DavidS,
DavidS wrote:
Sun Jun 30, 2019 2:22 am
Though I still prefer reasonably optimal code.

I have spent more time getting my compiler to identify certain sequences that can be highly optimized when targeting the ARM than I did getting it working in the first place. For example shifts and ALU ops where the result of the shift is one operand of the ALU op need to be put into a single instruction. Another example is looking forward to have results that become parameters of function calls end up in the correct register naturally. Another is tracking how variables are accessed within a given function to minimize the number of memory accesses.
Are you still using GCC 4.7 ?

If so I think you are wasting your time. That compiler is so old it never generated good ARM code. If you can, try GCC 9.1 (the latest version), or even GCC 8.3 as included in the latest Raspian.

It will do all the things you have mentioned above and much much more. What you describe is trivial stuff that a modern compiler will do easily and routinely.
DavidS wrote:
Sun Jun 30, 2019 2:22 am
though it will not attempt to correct programmer mistakes in producing sub-optimal C code).
It will. Recent compilers are getting quite good at deducing what the programmer was really trying to achieve and producing better code, including changing the algorithm sometimes.

DavidS
Posts: 4334
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

### Re: Quick log2????

jahboater wrote:
Sun Jun 30, 2019 3:10 am
DavidS,
DavidS wrote:
Sun Jun 30, 2019 2:22 am
Though I still prefer reasonably optimal code.

I have spent more time getting my compiler to identify certain sequences that can be highly optimized when targeting the ARM than I did getting it working in the first place. For example shifts and ALU ops where the result of the shift is one operand of the ALU op need to be put into a single instruction. Another example is looking forward to have results that become parameters of function calls end up in the correct register naturally. Another is tracking how variables are accessed within a given function to minimize the number of memory accesses.
Are you still using GCC 4.7 ?

If so I think you are wasting your time. That compiler is so old it never generated good ARM code. If you can, try GCC 9.1 (the latest version), or even GCC 8.3 as included in the latest Raspian.

It will do all the things you have mentioned above and much much more. What you describe is trivial stuff that a modern compiler will do easily and routinely.
DavidS wrote:
Sun Jun 30, 2019 2:22 am
though it will not attempt to correct programmer mistakes in producing sub-optimal C code).
It will. Recent compilers are getting quite good at deducing what the programmer was really trying to achieve and producing better code, including changing the algorithm sometimes.
I am using the compiler I am writing for use with my Operating System. So no version of gcc.

I do not think anyone has managed to get any newer version of GCC running on RISC OS as of yet, so for the things I do use GCC for I am stuck with GCC 4.7.4.

It is not a compilers job to correct programmers mistakes in writing optimal code. Programmers need to follow the basic rules of producing optimal implementations of there algorithm in the language of choice, then the compiler only needs to do the optimizations that better suite the target CPU, as well as some basic variable usage tracking to minimize memory access (especially in loops, or leaf functions that could be called from loops).

Writing poor code because 'The compiler will optimize it' is the worse excuse in the world, and one I hear more and more.

Ok new programmers will not produce very good code, that is ok. Though as they learn they should also learn to produce more optimal code as they go.
RPi = The best ARM based RISC OS computer around
More than 95% of posts made from RISC OS on RPi 1B/1B+ computers. Most of the rest from RISC OS on RPi 2B/3B/3B+ computers

jahboater
Posts: 4595
Joined: Wed Feb 04, 2015 6:38 pm

### Re: Quick log2????

Wow, that's a big project - both a compiler and an OS! Good luck.

You mentioned register tracking etc a couple of times. Its quite hard.
The current register allocator in GCC took 20 years to write and has been the subject of numerous research projects.

https://en.wikipedia.org/wiki/Register_ ... techniques

The new Raspberry Pi4 has introduced a new problem with its "out-of-order" CPU's.
The instruction scheduling is completely different of course.

DavidS
Posts: 4334
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

### Re: Quick log2????

jahboater wrote:
Sun Jun 30, 2019 8:05 am
Wow, that's a big project - both a compiler and an OS! Good luck.

You mentioned register tracking etc a couple of times. Its quite hard.
The current register allocator in GCC took 20 years to write and has been the subject of numerous research projects.

https://en.wikipedia.org/wiki/Register_ ... techniques

The new Raspberry Pi4 has introduced a new problem with its "out-of-order" CPU's.
The instruction scheduling is completely different of course.
I am surprised it only took them 20 years with as many CPU's as they support. I have always felt that a compiler is one tool best developed for each target separately. That is you work on the ARM target and the ARM target shares almost nothing with other targets.

I am definitely looking forward to figuring out how to best allocate resources on the RPi 4B, it is going to be a challenge. Little has changed in the rules of optimization from the original ARMv2 up through the Cortex-A53 so far in computers I have had, now soon I will have a computer that breaks all the optimization rules and makes a new set of rules. Up till now it has been stay out of memory, we added not using result reg dependent instructions back to back as much as possible, and be reasonable in the choice of ops. Now the rules are likely to look a good bit different for the Cortex-A72.

Do not get me wrong there have been changes in the rules over time, though those changes were fairly simple ones, like paying attention to the memory bus size when it exceeded 32-bits.
RPi = The best ARM based RISC OS computer around
More than 95% of posts made from RISC OS on RPi 1B/1B+ computers. Most of the rest from RISC OS on RPi 2B/3B/3B+ computers

jahboater
Posts: 4595
Joined: Wed Feb 04, 2015 6:38 pm

### Re: Quick log2????

DavidS wrote:
Sun Jun 30, 2019 11:39 am
we added not using result reg dependent instructions back to back as much as possible
Could you just skip that for the A72? The hardware should sort it all out.

DavidS
Posts: 4334
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

### Re: Quick log2????

I have no idea what happened, every one of the responses from jahboater vanished, making this thread look a bit strange, and a little broken in nature. I do apologize to anyone stumbling on this thread for this oddness.

I was about to blame Linux (actually the Web Browser) for a lost reply until I noticed that his posts vanished from existence.
RPi = The best ARM based RISC OS computer around
More than 95% of posts made from RISC OS on RPi 1B/1B+ computers. Most of the rest from RISC OS on RPi 2B/3B/3B+ computers

jahboater
Posts: 4595
Joined: Wed Feb 04, 2015 6:38 pm

### Re: Quick log2????

DavidS wrote:
Sun Jun 30, 2019 12:29 pm
I have no idea what happened, every one of the responses from jahboater vanished, making this thread look a bit strange, and a little broken in nature. I do apologize to anyone stumbling on this thread for this oddness.

I was about to blame Linux (actually the Web Browser) for a lost reply until I noticed that his posts vanished from existence.
Sorry, my bad. I didn't think my last post contributed anything of real interest, so I deleted it

DavidS
Posts: 4334
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

### Re: Quick log2????

jahboater wrote:
Sun Jun 30, 2019 12:29 pm
DavidS wrote:
Sun Jun 30, 2019 11:39 am
we added not using result reg dependent instructions back to back as much as possible
Could you just skip that for the A72? The hardware should sort it all out.
Still need to optimize for the A53 and earlier.

Though I had typed in a much more complete reply that got lost when your reply that was in this spot disappeared along with the previous responses of yours in this thread. Sorry I do not feel up to typing it in again at the moment.
RPi = The best ARM based RISC OS computer around
More than 95% of posts made from RISC OS on RPi 1B/1B+ computers. Most of the rest from RISC OS on RPi 2B/3B/3B+ computers

jahboater
Posts: 4595
Joined: Wed Feb 04, 2015 6:38 pm

### Re: Quick log2????

Just for interest, code written for the A72 or A73 works fine on the A53.
It common for the two CPU types to be in the same "big.little" group and a process could be run on either.
Obviously its not optimal for one of them!

Heater
Posts: 12960
Joined: Tue Jul 17, 2012 3:02 pm

### Re: Quick log2????

DavidS,
I am surprised it only took them 20 years with as many CPU's as they support. I have always felt that a compiler is one tool best developed for each target separately. That is you work on the ARM target and the ARM target shares almost nothing with other targets.
I am no compiler writer but a moments reflection would tell you that this is not true. As would reading about the design of GCC and LLVM.

When compiling C for many targets there are many components of a compiler that can be reused.

The pre-processor
The lexer
The parser
The Intermediate Representation(IR)/syntax tree whatever generator.
A great many optimizations, that are performed on the IR

It does not stop there:

In GCC and LLVM there is common code for code generation (assembler), creating the assemblers and linkers etc. Common code that is table driven or somehow steered to do the right thing for each target.

Even things like register allocation can be used on many architectures if written in a generic way and given descriptions of the registers and ABI required.

Proof of the pudding here is that when a new processor instruction set architecture comes along there is very soon support for it in GCC and LLVM. As has happened for the RISC V recently. They did not have to build a new compiler from the ground up.

It gets better....

When you have a compiler infrastructure in place, like GCC and LLVM, you can use it to compiler all manner of other languages by building a new front end. In this way we get compilers for C++, Fortran, etc with a lot less effort. GCC recently add the D language to it's repertoire for example.

If you think about it it's the only way to go. If you have N languages you want to compile and M architectures to target then that is N * M compilers you have to write and maintain. That soon becomes a huge number. But if you can arrange that the bulk of the work common to all those is in some kind of intermediate engine then you only need to create N front ends and M back ends. A very much smaller number.

DavidS
Posts: 4334
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

### Re: Quick log2????

@Heater:
And you mention many of the problems with the GNU Compiler Collection (as well as a couple of others). A language specific Lexer and parser will reduce compile time, reduce memory usage, and save power. Now it is true that both the Lexer and Parser are components that can be made target independent, those are among the minimal amount of code shared among different targets.

As to the codgen, there is a lot of debate as to if it is really good practice to use a platform independent intermediate code (though some kind of intermediate code does simplify some optimizations). The arguments going that having a platform independent intermediate representation means more work for the final codegen that produces the platform targeted code.

As to optimizations that are platform independent, yes there are many, though most of them are correcting the user creating poor code (which should not be in a compiler). Those that are useful across multiple processors may contain code that can be used in multiple targets, though not all targets (think about the limits of Intel x86 in register usage for instructions, or the split address and data registers of the 680x0 series, or single accumulator machines).

So yes there is code that can be reused, and hence why I said (in a post that did not get posted): "While there is some code that can be reused between targets, most of the compiler needs to be developed for the target platform".
RPi = The best ARM based RISC OS computer around
More than 95% of posts made from RISC OS on RPi 1B/1B+ computers. Most of the rest from RISC OS on RPi 2B/3B/3B+ computers