Aran
Posts: 36
Joined: Fri Jan 25, 2019 5:55 pm

Re: Spider-OS a new operating system

Sat Feb 09, 2019 3:55 pm

Since Heater mentioned the Fast Fourier Transform, I'll take this opportunity to point out this site : http://www.aholme.co.uk/GPU_FFT/Main.htm
The program performs the transform by taking advantage of the power of the GPU. The code is in C, with another part in assembler to program the GPU directly.
So I would say that to optimize, we must use the right hardware components at our disposal and, of course, the best algorithm. And in some cases we have to go through assembler, because we have to inject the code into the chip.
Last edited by Aran on Sat Feb 09, 2019 4:41 pm, edited 1 time in total.

Heater
Posts: 13683
Joined: Tue Jul 17, 2012 3:02 pm

Re: Spider-OS a new operating system

Sat Feb 09, 2019 4:08 pm

Wow, what a wonderful piece of work that FFT is.

It's interesting that the new Pi with NEON can match the Videocore now.
Memory in C++ is a leaky abstraction .

Aran
Posts: 36
Joined: Fri Jan 25, 2019 5:55 pm

Re: Spider-OS a new operating system

Sat Feb 09, 2019 4:56 pm

Yes, actually from the Pi 2 onwards we can use NEON, and it's also easier to program.
A benchmark, as well as a library, is available here : https://community.arm.com/android-commu ... ft-feature

Heater
Posts: 13683
Joined: Tue Jul 17, 2012 3:02 pm

Re: Spider-OS a new operating system

Sat Feb 09, 2019 5:52 pm

Interesting.

From the graph they give we see that all the FFTs they compare are within about 15% of each other at 4096 bins.

They stop at 4096 bins. Looks to me like Ne10's execution time is rising more rapidly than FFTW's or OpenMax's before that. It would be nice to see where they converge and how well they do after that.

Do FFTW or the OpenMax DL solutions use pure C? I have no idea but have always assumed they make use of DSP/vector/GPU ops with assembler or intrinsics when they can.
Memory in C++ is a leaky abstraction .

bzt
Posts: 393
Joined: Sat Oct 14, 2017 9:57 pm

Re: Spider-OS a new operating system

Sat Feb 09, 2019 7:06 pm

What an interesting conversation! :-)
jahboater wrote:
Sat Feb 09, 2019 3:10 pm
was able to remove one of the recursive calls.
Which is pretty impressive, I'll give you that. But it still hasn't changed the asymptotic behaviour.
Clang compiles this entire function to one single instruction (on x86).
Again, very impressive, nonetheless a special case. Btw, replacing "for(i=0;i<sizeof(a);i++) a[ i ]=0;" with a single memset() call is similar: a pre-programmed optimisation, not a universal solution.
Pretty clever I'd say!
Agreed! I did not want to suggest that compiler optimisations are not good. In most cases they can do awesome things, and will probably perform better than 90% of programmers. But there's the remaining 10%, the really talented programmers with a great algorithmic perspective that no compiler can compete with. You have probably heard the story of Mel; it's a classic. That's what I wanted to say by stressing "experienced". A typical average programmer is no match for today's compilers, no doubt.
Heater wrote:Looks like we are in agreement then. If one wants to make serious optimizations one needs to be looking for algorithmic changes. Certainly we cannot expect our compilers to magically swap our slow fibo for a fast one or a naive Fourier Transform for a Fast Fourier Transform and so on.
Absolutely! I couldn't agree more! That's exactly what I meant.
Simply tinkering around with micro-optimizations or working in assembler is not going to get you orders of magnitude improvements.
No, it usually won't, unless you know something that the compiler doesn't. With jahboater's example, an experienced programmer can use the "bsf" instruction on x86 to find a bit, but the compiler can only optimize for it if a) it can recognize what the C loop does, and b) it knows that there's an equivalent assembly instruction. As jahboater pointed out, Clang can do that for finding a bit, but I bet it can't do the same with, say, SHA-calculating code (maybe it will in some future version). A good programmer can understand and utilize an instruction right on the spot, while a compiler has to wait for its next release, when that particular optimisation is added (if that happens at all).
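To make the find-a-bit case concrete, here is a minimal sketch: a good optimizer may turn the portable loop into a bsf/tzcnt, and the GCC/Clang __builtin_ctz intrinsic maps onto the instruction directly (both assume a non-zero argument):

Code: Select all

#include <stdio.h>

/* Portable loop: index of the lowest set bit (caller must pass x != 0).
   An optimizing compiler may recognize this and emit bsf/tzcnt on x86. */
static int lowest_set_bit(unsigned int x)
{
    int index = 0;
    while ((x & 1u) == 0) {
        x >>= 1;
        ++index;
    }
    return index;
}

int main(void)
{
    /* The GCC/Clang builtin compiles straight to the hardware instruction. */
    printf("%d %d\n", lowest_set_bit(40u), __builtin_ctz(40u)); /* both print 3 */
    return 0;
}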
trying out different algorithms and so on it is far easier in a high level language.
Probably yes. But using a high-level language also stops you from experimenting with cool close-to-hardware stuff. As you put it, "There an entire algorithm has been buried in hardware." :-)
Further, that the small gains to be had by working in assembler are far outweighed by the increased development time, increased bug rate, lack of portability and so on.
Usually yes, but there are exceptions. The portable pixman library pops into my mind; it does not leave the optimisation to the compiler, instead it's full of assembly parts, each written for a different architecture but with the same interface, doing the same thing. I'm sure the reason the creators chose assembly over C is that they're experienced programmers and could do a much better job than any compiler optimizer. (Frankly, most of their optimisations are parallel-computing related, something compilers are not particularly good at yet, though they're improving significantly.)
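A rough sketch of that pattern (my own toy example, not pixman's actual code): one public C interface, with an architecture-specific body chosen at build time and a plain C fallback doing the same thing:

Code: Select all

#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>

/* NEON body: stores four pixels per iteration, same interface as the C one. */
static void fill32_impl(uint32_t *dst, uint32_t value, size_t count)
{
    uint32x4_t v = vdupq_n_u32(value);
    size_t i = 0;
    for (; i + 4 <= count; i += 4)
        vst1q_u32(dst + i, v);
    for (; i < count; i++)          /* leftover tail */
        dst[i] = value;
}

#else

/* Plain C fallback body: fill a buffer with a 32-bit pixel value. */
static void fill32_impl(uint32_t *dst, uint32_t value, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dst[i] = value;
}

#endif

/* The single public interface: callers never see which body was built in. */
void fill32(uint32_t *dst, uint32_t value, size_t count)
{
    fill32_impl(dst, value, count);
}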
A "superior" algorithm may be excellent in general but sub-optimal for the typical data in an application.
True. The consensus here is that an algorithm that runs in constant time is always better than one which depends on the input. But you're perfectly right that the typical input data must be taken into account; that's another thing I believe a compiler can't do, but a programmer can. (Assuming he is an experienced one, not an average one, of course :-) ) For example, I would never ever use a logarithmic search (which requires sorted input) in conjunction with quicksort when I know there are only ten unsorted input items, tops. Really good point!
Your suggested fibo equation is certainly not O(1).
It certainly is, as you can calculate sqrt(5) in advance to any arbitrary precision, and use that simple constant in F(). Similarly, there's an algorithm for calculating a power of two which is O(n), sure, but you can use a bit shift instead, which is O(1). Or you could use a look-up table, which is, again, O(1). The point here is that you can split the algorithm into smaller parts and apply exactly the same asymptotic optimisation to each part individually. At the end you'll have many smaller parts, each being O(1), which also means that their overall composition, the whole resulting algorithm, is O(1) too. I admit asymptotic analysis is not the easiest part of computer science; I studied it for several semesters, and it was only years later that I started to understand what I had learned.
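For reference, the equation I mean is Binet's closed form, with every constant precomputable (the standard identity, written in LaTeX for clarity):

Code: Select all

F(n) = \frac{\varphi^{n} - \psi^{n}}{\sqrt{5}},
\qquad
\varphi = \frac{1 + \sqrt{5}}{2},
\quad
\psi = \frac{1 - \sqrt{5}}{2}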

Cheers,
bzt

jahboater
Posts: 4770
Joined: Wed Feb 04, 2015 6:38 pm

Re: Spider-OS a new operating system

Sat Feb 09, 2019 11:38 pm

bzt wrote:
Sat Feb 09, 2019 7:06 pm
Clang compiles this entire function to one single instruction (on x86).
Again, very impressive, nonetheless a special case. Btw, replacing "for(i=0;i<sizeof(a);i++) a[ i ]=0;" with a single memset() call is similar: a pre-programmed optimisation, not a universal solution.
No, it's not using bsf, it's cleverer than that.

Code: Select all

int CountSetBits( int a )
{
  int count = 0;
  while( a )
  {
    ++count;
    a &= (a - 1);
  }
  return count;
}
The whole function is replaced with the "popcnt" (population count) instruction.
Perhaps the compiler has recognized Brian Kernighan's bit-counting algorithm, I don't know.
That algorithm is well known, so maybe the compiler is replacing common idioms, in the same way that "n >> count | n << (64 - count)" gets converted to a "ror" instruction.
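As an aside, here is that rotate idiom written out as a function (with an extra mask so a shift count of zero is not undefined behaviour); GCC and Clang typically compile this down to a single ror:

Code: Select all

#include <stdint.h>

/* Rotate right by `count` bits.  The `& 63` keeps both shift amounts in
   range (a shift by 64 would be undefined behaviour in C); optimizing
   compilers recognize the pattern and emit one ror instruction. */
static inline uint64_t rotr64(uint64_t n, unsigned count)
{
    count &= 63;
    return (n >> count) | (n << ((64 - count) & 63));
}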

Take a closer look at the second example though (count_t_letters). That is not a well-known idiom, it is definitely not a special case, and there is no way the compiler devs have seen it before, yet the compiler has worked out what the code really does, applied it to the argument given, and replaced it with a constant.

By the way, Intel hardware can now do AES encryption and decryption.

I can't see compilers using this Intel cryptography instruction ...

GF2P8AFFINEINVQB — Galois Field Affine Transformation Inverse

Aran
Posts: 36
Joined: Fri Jan 25, 2019 5:55 pm

Re: Spider-OS a new operating system

Mon Feb 11, 2019 6:37 pm

Hi everyone, I posted an article about my window manager. I explain how it works: here

Gavinmc42
Posts: 3925
Joined: Wed Aug 28, 2013 3:31 am

Re: Spider-OS a new operating system

Tue Feb 12, 2019 12:21 am

Some of this can be done in hardware, if I have understood the VC4 manual enough.

pik33 understands some of the limitations better than I do:
https://ultibo.org/forum/viewtopic.php? ... ng+manager

Think of the desktop as a 3D scene: objects further away, behind other objects, don't get rendered.
When I was messing about with OpenVG, the order of drawing was important.
The last drawn goes over the top of the first drawn; I had no way of benchmarking or checking how this worked.
But there is hardware that does not render those pixels hidden behind others.

The whole point of the VC4 hardware was to save the CPU from doing lots of rendering in software.
This is less of an issue now with the 3B+, with its 4 cores and NEON.

Linux is a UNIX-like OS with X11 added to it, and we still suffer some of the addedness ;)
Will Spider be an OS without a GUI or a GUI-based OS?
There have been a few heavily GUI-focused OS's, but will it be 3D or 2D, software or hardware based UI?
You could spend a lot of time on just the GUI; many have done this and failed.
When they failed they generally went on to succeed once the hardware and clock speeds got fast enough.
I'm dancing on Rainbows.
Raspberries are not Apples or Oranges

Aran
Posts: 36
Joined: Fri Jan 25, 2019 5:55 pm

Re: Spider-OS a new operating system

Tue Feb 12, 2019 8:18 pm

I have planned a graphical interface for Spider-OS ;-) But at first I'm focusing on 2D. For now it works with NEON, but I'm thinking of adding copies via DMA channels, and multicore support.
I would like to use the hardware acceleration of the VideoCore, but I cannot understand how pixel depth works.
I can program the VC4 via the binning and render threads, and draw a triangle for example. But then it is impossible to draw a triangle behind the first one; it always ends up in front. Which parameters should I adjust? The vertex or fragment shader? Or should I use the control lists differently?
A great site to program the chip directly : http://jaystation2.maisonikkoku.com/201 ... -triangle/.

The drawing order does not matter. It is rather the depth parameter that counts, as explained here:
https://paroj.github.io/gltut/Positioni ... ering.html
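In standard OpenGL terms (which that tutorial uses; the raw VC4 control-list and shader-state setup is lower level, so take this only as the concept), it is the depth test that decides which fragment wins, not the drawing order:

Code: Select all

#include <GLES2/gl2.h>

/* Hypothetical helper: enable the per-fragment depth test.  The context or
   framebuffer must have been created with a depth buffer attached. */
void setup_depth_test(void)
{
    glEnable(GL_DEPTH_TEST);   /* compare each fragment's Z against the depth buffer */
    glDepthFunc(GL_LESS);      /* keep the fragment nearest to the viewer */
    glClearDepthf(1.0f);       /* the "far" value the buffer is cleared to */
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
}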

Image

Gavinmc42
Posts: 3925
Joined: Wed Aug 28, 2013 3:31 am

Re: Spider-OS a new operating system

Wed Feb 13, 2019 4:12 am

Cool link, I was looking for something that explains that depth buffer stuff.
Order seems to matter in OpenVG 2D stuff, as I made shapes and edited them by putting black shapes over them to cut off sections.
Need to learn about layers and masking, need to learn lots of things :oops:
I'm dancing on Rainbows.
Raspberries are not Apples or Oranges

LdB
Posts: 1283
Joined: Wed Dec 07, 2016 2:29 pm

Re: Spider-OS a new operating system

Wed Feb 13, 2019 4:54 am

Aran wrote:
Tue Feb 12, 2019 8:18 pm
I would like to use the hardware acceleration of Videocore, but I can not understand how pixel depth works.
There will be two triangles per square area. You can either place them at slightly different Z depths on the physical triangle vertices, or simply change the order of the indexed array; either will do what you want, and it's a toss-up which is easier.

Either way you have a block of GPU memory which holds the actual array; you simply change the triangle entries and re-issue a render, and it is done. The more complex part is text and bitmaps, which need to work through clipping areas top-down. Usually you launch the windowing system with capacity for something like 100 windows; if you hit the limit you halt, resize to a new allocation of say 200, and transfer all the window data etc. You don't want to be reallocating all the time, just occasionally. Now 100 windows is 400 vertices, and with indexed triangles you only have 32768, which is 8192 windows; if you want more than that you have to run 2 renders :-)
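For the sake of illustration, a rough sketch of that bookkeeping (my own naming, not the code in the repo mentioned below): each window is a quad of four vertices and six indices, and the arrays grow in big steps rather than per window:

Code: Select all

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct { float x, y, z; } Vertex;   /* hypothetical vertex layout */

typedef struct {
    Vertex   *verts;       /* 4 vertices per window */
    uint16_t *indices;     /* 6 indices per window (two triangles) */
    unsigned  capacity;    /* windows allocated for */
    unsigned  count;       /* windows in use */
} WindowPool;

/* Grow in big steps (100, 200, 400, ...) so we are not reallocating
   and re-uploading on every new window, only occasionally. */
static int pool_reserve(WindowPool *p, unsigned want)
{
    if (want <= p->capacity) return 0;
    unsigned cap = p->capacity ? p->capacity * 2 : 100;
    while (cap < want) cap *= 2;
    Vertex *v = realloc(p->verts, cap * 4 * sizeof *v);
    if (!v) return -1;
    p->verts = v;
    uint16_t *ix = realloc(p->indices, cap * 6 * sizeof *ix);
    if (!ix) return -1;
    p->indices = ix;
    p->capacity = cap;
    return 0;
}

/* Add one window as two indexed triangles; the 16-bit index range is what
   caps how many windows fit in a single render pass. */
static int pool_add_window(WindowPool *p, float x, float y, float w, float h, float z)
{
    if (pool_reserve(p, p->count + 1)) return -1;
    unsigned base = p->count * 4;
    Vertex *v = &p->verts[base];
    v[0] = (Vertex){ x,     y,     z };
    v[1] = (Vertex){ x + w, y,     z };
    v[2] = (Vertex){ x + w, y + h, z };
    v[3] = (Vertex){ x,     y + h, z };
    uint16_t quad[6] = { base, base + 1, base + 2, base, base + 2, base + 3 };
    memcpy(&p->indices[p->count * 6], quad, sizeof quad);
    p->count++;
    return 0;
}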

If you get stuck let me know I have a sample windows system I was messing around with which I am happy to publish.

Heater
Posts: 13683
Joined: Tue Jul 17, 2012 3:02 pm

Re: Spider-OS a new operating system

Wed Feb 13, 2019 4:42 pm

...asymptotic analysis is not the easiest part of computer science; I studied it for several semesters, and it was only years later that I started to understand what I had learned.
Certainly the asymptotic analysis of algorithms can be tricky. Perhaps when you have thought about it for a few more years you will realize why your fibo equation, Binet's formula, is not constant in run time, O(1).

I'll be generous and agree to storing root 5 and perhaps 1 +/- root 5 as well.

That leaves us with something that looks like:

F(n) = (A^n - B^n) / (2^n * C), with A = 1 + sqrt(5), B = 1 - sqrt(5) and C = sqrt(5)

I'm sure you will not say that calculating x^n or 2^n or doing that divide and multiply will take the same time no matter if n is ten, a hundred, a million, a billion, etc.

If you really think Binet's formula is constant time please do enter the fibo(4784969) challenge using it. Here: https://www.raspberrypi.org/forums/view ... 3#p1394452

The rules, such as they are, are simple: Show code, in the language of your choice, capable of calculating Fibonacci numbers up to at least one million digits, no libraries or language features outside the language standard allowed.

Actually, googling around, it seems there are all kinds of estimates of the asymptotic complexity of Binet's formula, none of them O(1):

O(log n): ycombinator: https://news.ycombinator.com/item?id=12872171

O((n + log n)^2): HollowayJames: https://ir.library.oregonstate.edu/conc ... /t435gg51w

O(log n): Martin Hock: http://pages.cs.wisc.edu/~mhock/SSL/fibcalc.pdf

O(n^k): https://www.quora.com/How-would-I-analy ... fibbonacci

It is indeed tricky. First thing is to define what you mean by "operation". Just because there is a "*" in the formula does not mean that multiply always takes the same time. Clearly it cannot as the operands get bigger.
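To make the operand-size point concrete, here is a minimal sketch (my own illustration, nothing to do with any challenge entry) of Binet's formula in plain IEEE doubles. Each call is indeed constant time, but only because the answer is squeezed into one 53-bit-mantissa double, so the result stops being exact somewhere in the seventies, never mind fibo(4784969):

Code: Select all

#include <math.h>
#include <stdio.h>

/* Binet's formula with precomputed constants (link with -lm).
   Fast and "constant time", but only exact while F(n) fits in a double,
   i.e. up to roughly F(70)..F(78) depending on rounding. */
static double fib_binet(unsigned n)
{
    const double sqrt5 = sqrt(5.0);
    const double phi   = (1.0 + sqrt5) / 2.0;
    return round(pow(phi, (double)n) / sqrt5);   /* the psi^n term is negligible */
}

int main(void)
{
    printf("F(10) = %.0f\n", fib_binet(10));   /* 55, exact */
    printf("F(90) = %.0f\n", fib_binet(90));   /* 19 digits: no longer exact */
    return 0;
}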
Memory in C++ is a leaky abstraction .

bzt
Posts: 393
Joined: Sat Oct 14, 2017 9:57 pm

Re: Spider-OS a new operating system

Fri Feb 15, 2019 3:09 pm

Heater wrote:
Wed Feb 13, 2019 4:42 pm
I'm sure you will not say that calculating x^n or 2^n or doing that divide and multiply will take the same time no matter if n is ten, a hundred, a million, a billion, etc.
Heater wrote:Just because there is a "*" in the formula does not mean that multiply always takes the same time. Clearly it cannot as the operands get bigger.
Both the x^n and 2^n "algorithms" are basic enough to have hardware implementations. Sure, they may take different numbers of CPU cycles depending on the input, but you can always give an upper-limit approximation of how long a certain instruction could take. Not to mention that, to drastically reduce the required number of instructions, it is very important to know how much memory those instructions can handle at once; for example, on x86 the AVX-512 registers are 512 bits wide. The fewer instructions you use, the more precise your approximation becomes.

2^n is special, because no matter how big n is, you can clearly give O(1) for that: calculating the index into the integer array takes fixed time (n >> 6), and shifting the bit within that particular integer also takes a fixed number of CPU cycles regardless of how big n & 63 is.
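In code, that bit-set view of 2^n is just this (a sketch, with the big number stored as an array of 64-bit words):

Code: Select all

#include <stdint.h>
#include <string.h>

/* Set bit n in a zeroed word array, i.e. store the value 2^n.
   The word index (n >> 6) and the in-word shift (n & 63) cost a fixed
   number of operations whatever n is; clearing the buffer beforehand
   is the part that still scales with its size. */
void set_power_of_two(uint64_t *words, size_t word_count, unsigned n)
{
    memset(words, 0, word_count * sizeof *words);
    words[n >> 6] |= UINT64_C(1) << (n & 63);
}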

Which leads us to the point where all the subparts are O(1), except for x^n, for which we can give an upper-limit approximation.

Cheers,
bzt

Heater
Posts: 13683
Joined: Tue Jul 17, 2012 3:02 pm

Re: Spider-OS a new operating system

Fri Feb 15, 2019 7:55 pm

bzt,
Both the x^n and 2^n "algorithms" are basic enough to have hardware implementations. Sure, they may take different numbers of CPU cycles depending on the input, but you can always give an upper-limit approximation of how long a certain instruction could take.
Yes, indeed...

Provided the "n" and the results fit within the 32-bit, 64-bit, whatever, confines of the computer you are running on.

I think the point that you are missing here is that fibo(4784969) is a very big number. In fact it is the first number of the Fibonacci sequence to have one million decimal digits.

There is no way your constant time machine instructions can handle that.

As I said before, if you really believe that you can calculate fibo(4784969) in the same time as fibo(10) and such you are welcome to demonstrate your code in the fibo(4784969) challenge (See link above).

I suspect that if you can do that you will be eligible for a Turing Award.

Cheers,
heater.
Memory in C++ is a leaky abstraction .

LdB
Posts: 1283
Joined: Wed Dec 07, 2016 2:29 pm

Re: Spider-OS a new operating system

Mon Feb 18, 2019 3:35 pm

For you, Aran, I did a quick sample showing how to move window areas around using the VC4, as I discussed above. It doesn't even have the MMU on yet, and it is a lot faster than your fastest code :-).

The movement code is a bit jittery; I should just do a random projection to the screen limit (rather than multiple micro random moves causing the jitter), something I will fix as I add title text and borders onto the window areas.

https://github.com/LdB-ECM/Raspberry-Pi ... _GUI_STEP1

As usual, if you want to test it, throw the files from this directory onto an SD card:
https://github.com/LdB-ECM/Raspberry-Pi ... P1/DiskImg

Aran
Posts: 36
Joined: Fri Jan 25, 2019 5:55 pm

Re: Spider-OS a new operating system

Mon Feb 18, 2019 9:48 pm

Thank you LdB. Nice code, very readable. I would almost be jealous ;-)
The demo is a good illustration of the concept.
If I understand correctly, you always display the rectangles in the same order. So even though they move around the screen, the rectangles are always stacked in the same way; that is, the last rectangle displayed is on top of the others.

For my part, I finally managed to modify the fragment shader code to take the Z buffer into account. So regardless of the order in which the rectangles are displayed, it is the Z depth that decides what is on top. I wanted to publish an article, but you were faster than me ;) . I think I'll do it by the end of the week.

LdB
Posts: 1283
Joined: Wed Dec 07, 2016 2:29 pm

Re: Spider-OS a new operating system

Mon Feb 18, 2019 11:56 pm

Aran wrote:
Mon Feb 18, 2019 9:48 pm
If I understand correctly, you always display the rectangles in the same order. So even though they move around the screen, the rectangles are always stacked in the same way; that is, the last rectangle displayed is on top of the others.
Correct

We are thinking alike; that was the next step up for me, changing the 1/Z field to give the Z window order.

The step after that was to add a wireframe shader to be able to draw borders on the window area.
I have not seen anyone do this bare metal, so it is a section I will have to write from scratch.

The final step was to render the text from the bitmap font onto the window area as a texture (jay, from your link, actually shows this:
http://jaystation2.maisonikkoku.com/201 ... texturing/ )

Aran
Posts: 36
Joined: Fri Jan 25, 2019 5:55 pm

Re: Spider-OS a new operating system

Fri Feb 22, 2019 8:44 pm

As promised, I posted a tutorial on the VideoCore: drawing three overlapping triangles, as in the screenshot from my Raspberry Pi below.

Image

LdB
Posts: 1283
Joined: Wed Dec 07, 2016 2:29 pm

Re: Spider-OS a new operating system

Fri Feb 22, 2019 9:26 pm

Yep, looks really good, I can't see any errors.

Also, after talking with Eric Anholt I found a really neat way to work out new shader assembler code:
https://www.mesa3d.org/envvars.html
Set VC4_DEBUG=qpu and Mesa writes out the VC4 assembler code of a shader for you :-)

That was actually my biggest problem with going further with the VC4, and it is now solved, so I will have some more complex samples in the near future; but I'm busy doing a new multicore scheduler sample write-up now.

DavidS
Posts: 4334
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

Re: Spider-OS a new operating system

Sun Mar 03, 2019 3:10 pm

I am glad to see another person playing on the metal. I am also very happy to see someone new who likes assembly language; for some of us it is easier to code in assembly than in C :) .

I look forward to seeing more of your progress.
RPi = The best ARM based RISC OS computer around
More than 95% of posts made from RISC OS on RPi 1B/1B+ computers. Most of the rest from RISC OS on RPi 2B/3B/3B+ computers

Heater
Posts: 13683
Joined: Tue Jul 17, 2012 3:02 pm

Re: Spider-OS a new operating system

Tue Mar 05, 2019 7:52 am

DavidS,
...for some of us it is easier than C to code in assembly
Don't forget you are still welcome to demonstrate that fact by submitting your assembler entry to the fibo(4784969) challenge: https://www.raspberrypi.org/forums/view ... 3#p1394452
Memory in C++ is a leaky abstraction .

LdB
Posts: 1283
Joined: Wed Dec 07, 2016 2:29 pm

Re: Spider-OS a new operating system

Tue Mar 05, 2019 9:54 am

The BASIC code for F(4784969) is simple enough:
https://github.com/ZiCog/fibo_4784969/b ... C/fibo.bas
Run it and time it.

Personally I can't take that whole thread seriously; BASIC has died out for very good reasons ... none of which has anything to do with the Pi Foundation, but more to do with lack of career opportunities :-)

Heater
Posts: 13683
Joined: Tue Jul 17, 2012 3:02 pm

Re: Spider-OS a new operating system

Tue Mar 05, 2019 11:41 am

LdB,
The BASIC code for F(4784969) is simple enough
I know, I put it there, it's my repo.

Actually I would say that whilst the BASIC code is simple enough, it's just BASIC code after all, there is nothing simple about the whole thing. One has to know the algorithm for quickly calculating fibo(), which most people do not, and that requires some maths chops. One also has to know an algorithm for quickly multiplying huge numbers, Karatsuba in this case, again not widely known.
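For anyone curious, the quick algorithm is usually the fast-doubling identities F(2k) = F(k)*(2*F(k+1) - F(k)) and F(2k+1) = F(k)^2 + F(k+1)^2, which need only O(log n) doubling steps. Here is a sketch with machine-width integers only, as an illustration rather than a challenge entry (the real thing additionally needs big-number multiplication, Karatsuba or better, which this leaves out):

Code: Select all

#include <stdint.h>
#include <stdio.h>

/* Fast-doubling Fibonacci: returns F(n) and F(n+1).
   uint64_t overflows beyond F(93), so this only shows the shape of the
   algorithm; the million-digit challenge needs arbitrary precision. */
static void fib_pair(uint64_t n, uint64_t *f_n, uint64_t *f_n1)
{
    if (n == 0) { *f_n = 0; *f_n1 = 1; return; }
    uint64_t a, b;
    fib_pair(n >> 1, &a, &b);         /* a = F(k), b = F(k+1), k = n/2 */
    uint64_t c = a * (2 * b - a);     /* F(2k)   */
    uint64_t d = a * a + b * b;       /* F(2k+1) */
    if (n & 1) { *f_n = d; *f_n1 = c + d; }
    else       { *f_n = c; *f_n1 = d; }
}

int main(void)
{
    uint64_t f, f1;
    fib_pair(90, &f, &f1);
    printf("F(90) = %llu\n", (unsigned long long)f);   /* 2880067194370816120 */
    return 0;
}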

Certainly don't take that thread seriously, but I learned a lot from it. From the algorithms, to a peek at different languages, to optimizing C++.

All we need is for someone, DavidS for example, to contribute an assembler version, and then we have a way for people to compare and contrast different languages for a somewhat non-trivial application in terms of simplicity, readability, performance or whatever they like. All the way from assembler to Haskell.

Sorry, this is all way off topic for this thread.
Memory in C++ is a leaky abstraction .

DavidS
Posts: 4334
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

Re: Spider-OS a new operating system

Tue Mar 05, 2019 4:42 pm

Heater wrote:
Tue Mar 05, 2019 11:41 am
LdB,
The BASIC code for F(4784969) is simple enough
I know, I put it there, it's my repo.

Actually I would say that whilst the BASIC code is simple enough, it's just BASIC code after all, there is nothing simple about the whole thing. One has to know the algorithm for quickly calculating fibo(), which most people do not, and that requires some maths chops. One also has to know an algorithm for quickly multiplying huge numbers, Karatsuba in this case, again not widely known.

Certainly don't take that thread seriously, but I learned a lot from it. From the algorithms, to a peek at different languages, to optimizing C++.

All we need is for someone, DavidS for example, to contribute an assembler version, and then we have a way for people to compare and contrast different languages for a somewhat non-trivial application in terms of simplicity, readability, performance or whatever they like. All the way from assembler to Haskell.

Sorry, this is all way off topic for this thread.
Yes, it is off topic. I will look through my play directories to try to find the assembly version; I thought I had already posted it in the correct thread, though I got a bit confused after being unable to log in for a month or a little more.
RPi = The best ARM based RISC OS computer around
More than 95% of posts made from RISC OS on RPi 1B/1B+ computers. Most of the rest from RISC OS on RPi 2B/3B/3B+ computers

DavidS
Posts: 4334
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

Re: Spider-OS a new operating system

Tue Mar 05, 2019 4:44 pm

I hope that good progress happens with Spider-OS. It also seems like he is getting some good bare-metal resources together in one place, which is nice to see.
RPi = The best ARM based RISC OS computer around
More than 95% of posts made from RISC OS on RPi 1B/1B+ computers. Most of the rest from RISC OS on RPi 2B/3B/3B+ computers
