What an interesting conversation!
jahboater wrote: ↑
Sat Feb 09, 2019 3:10 pm
was able to remove one of the recursive calls.
Which is pretty impressive, I give you that. But it still hasn't changed the asymptotic behaviour.
Clang compiles this entire function to one single instruction (on x86).
Again, very impressive, nonetheless a special case. Btw, replacing "for(i=0;i<sizeof(a);i++) a[i]=0;" with a single memset() call is similar: a pre-programmed optimisation, not a universal solution.
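Just to illustrate what that pre-programmed optimisation looks like, here's a minimal C sketch (the array name and size are made up). At -O2, GCC and Clang typically pattern-match the naive loop and emit the same code as the explicit memset() call:

```c
#include <string.h>

char a[1024];

/* The naive loop: modern GCC and Clang usually recognize this idiom
   and replace it with a call to memset (or equivalent vector code),
   so it ends up no slower than the explicit call below. */
void zero_loop(void)
{
    for (size_t i = 0; i < sizeof(a); i++)
        a[i] = 0;
}

/* The hand-written equivalent the optimizer is pattern-matching. */
void zero_memset(void)
{
    memset(a, 0, sizeof(a));
}
```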
Pretty clever I'd say!
Agreed! I did not want to suggest that compiler optimisations are not good. In most cases they can do awesome things, and will probably perform better than 90% of programmers. But there's the remaining 10%, the really talented programmers with a great algorithmic perspective that no compiler can compete with. You have probably heard The Story of Mel; it's a classic. That's what I wanted to say by stressing "experienced". A typical average programmer is no match for today's compilers, no doubt.
Heater wrote:Looks like we are in agreement then. If one wants to make serious optimizations one needs to be looking for algorithmic changes. Certainly we cannot expect our compilers to magically swap our slow fibo for a fast one or a naive Fourier Transform for a Fast Fourier Transform and so on.
Absolutely! I couldn't agree more! That's exactly what I meant.
Simply tinkering around with micro-optimizations or working in assembler is not going to get you orders of magnitude improvements.
No, it usually won't, unless you know something that the compiler doesn't. With jahboater's example, an experienced programmer can use the "bsf" instruction on x86 to find a bit, but the compiler can only optimize for it if a) it can recognize what the C loop does, and b) it knows there's an equivalent assembly instruction. As jahboater pointed out, Clang can do that for finding a bit, but I bet it can't do the same with SHA-calculating code, for example (maybe it will in some future version). A good programmer can understand and utilize an instruction right on the spot, while a compiler has to wait for its next release, when that particular optimisation is added (if that happens at all).
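For a concrete picture, here's a minimal sketch of the kind of find-a-bit code we're talking about (my reconstruction, not jahboater's exact function). The loop version is what the compiler has to recognize; __builtin_ctz is what an experienced programmer can reach for directly in GCC and Clang, and it maps to bsf/tzcnt on x86:

```c
#include <stdio.h>

/* The plain C loop: Clang can pattern-match this and compile the
   whole function down to a single bit-scan instruction on x86. */
int first_set_bit_loop(unsigned x)
{
    int i = 0;
    while (!(x & 1)) {   /* assumes x != 0 */
        x >>= 1;
        i++;
    }
    return i;
}

/* What an experienced programmer might write directly:
   __builtin_ctz ("count trailing zeros") is undefined for x == 0. */
int first_set_bit_builtin(unsigned x)
{
    return __builtin_ctz(x);
}

int main(void)
{
    /* 40 = 0b101000, so both should print 3. */
    printf("%d %d\n", first_set_bit_loop(40), first_set_bit_builtin(40));
    return 0;
}
```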
trying out different algorithms and so on it is far easier in a high level language.
Probably yes. But using a high-level language also stops you from experimenting with cool close-to-hardware stuff. As you put it, "There an entire algorithm has been buried in hardware."
Further, that the small gains to be had by working in assembler are far outweighed by the increased development time, increased bug rate, lack of portability and so on.
Usually yes, but there are exceptions. The portable pixman library comes to mind, which does not leave the optimisation to the compiler; instead it's full of assembly parts, each written for a different architecture, but with the same interface, doing the same thing. I'm sure the reason the creators chose assembly over C is that they're experienced programmers who could do a much better job than any compiler optimizer. (Frankly, most of their optimisations are parallel-computing related, something compilers are still not particularly good at, though they're improving significantly.)
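For anyone curious what that "same interface, per-architecture assembly" pattern looks like, here's a hypothetical sketch (all names are made up; real pixman's dispatch is more elaborate, with runtime CPU detection). With none of the macros defined, only the portable C fallback is compiled:

```c
#include <stdint.h>
#include <stddef.h>

/* One public interface; the backend is picked at build time.
   blit_fill_neon / blit_fill_sse2 stand in for hand-written
   assembly routines with this same signature. */
void blit_fill(uint32_t *dst, size_t n, uint32_t color)
{
#if defined(USE_NEON_ASM)
    blit_fill_neon(dst, n, color);   /* hand-written NEON assembly */
#elif defined(USE_SSE2_ASM)
    blit_fill_sse2(dst, n, color);   /* hand-written SSE2 assembly */
#else
    for (size_t i = 0; i < n; i++)   /* portable C fallback */
        dst[i] = color;
#endif
}
```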
A "superior" algorithm may be excellent in general but sub-optimal for the typical data in an application.
True. The consensus here is that an algorithm which runs in constant time is always better than one which depends on the input. But you're perfectly right that the typical input data must be taken into account, another thing I believe a compiler can't do, but a programmer can (assuming he is an experienced one, not an average one, of course). For example, I would never ever use a logarithmic search (which requires sorted input) in conjunction with quicksort when I know there are only ten unsorted input items tops. Really good point!
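To make that concrete, here's a hypothetical C helper showing the alternative: for ten items tops, a plain linear scan beats sorting plus binary search, since it skips the O(n log n) sort entirely and is friendly to branch prediction and cache:

```c
#include <stddef.h>

/* Plain linear scan over unsorted data: for a handful of items,
   typically faster in practice than sort-then-binary-search,
   despite the "worse" O(n) asymptotic behaviour per lookup. */
int find_linear(const int *a, size_t n, int key)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] == key)
            return (int)i;   /* index of the match */
    return -1;               /* not found */
}
```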
Your suggested fibo equation is certainly not O(1).
It certainly is, as you can calculate sqrt(5) in advance to any arbitrary precision and use that simple constant in F(). Similarly, there's an algorithm for calculating a power of two which is O(n), sure, but you can use a bit shift instead, which is O(1). Or you could use a look-up table, which is, again, O(1). The point is, you can split the algorithm into smaller parts and apply exactly the same asymptotic optimisation to each part individually. At the end you'll have many smaller parts, each being O(1), which means their composition, the whole resulting algorithm, is O(1) too. I admit asymptotic analysis is not the easiest part of computer science; I studied it for several semesters, and it was only years later that I started to understand what I had learned.
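Here's a minimal C sketch of that closed form (Binet's formula), with the usual caveat: in 64-bit double precision it's only exact up to roughly n = 70, after which rounding error takes over. With arbitrary-precision arithmetic you'd precompute sqrt(5) to the needed accuracy, as described above:

```c
#include <math.h>
#include <stdio.h>

/* Binet's formula: F(n) = round(phi^n / sqrt(5)),
   where phi = (1 + sqrt(5)) / 2 is the golden ratio.
   The sqrt(5) here could equally be a precomputed constant.
   Link with -lm. */
unsigned long long fib_binet(unsigned n)
{
    const double phi = (1.0 + sqrt(5.0)) / 2.0;
    return (unsigned long long)llround(pow(phi, (double)n) / sqrt(5.0));
}

int main(void)
{
    for (unsigned n = 0; n <= 10; n++)
        printf("F(%u) = %llu\n", n, fib_binet(n));   /* 0 1 1 2 3 5 8 ... */
    return 0;
}
```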