
by teh_orph
Thu Nov 19, 2015 2:51 pm
Forum: Advanced users
Topic: LLVM backend for QPU development
Replies: 29
Views: 11795

Re: LLVM backend for QPU development

Yo. I completed on my house at about the same time as the last post I placed on here...I have had very little free time since then! The fun world of DIY.
I would like to pick up this project again, as I made very good progress, but it's really a full-time job and tbh very few people were interested.
by teh_orph
Fri Jun 20, 2014 9:44 pm
Forum: Bare metal, Assembly language
Topic: GEMM example on the QPU
Replies: 8
Views: 7067

Re: GEMM example on the QPU

Yes the mutexes are required because it appears you use the same VPM rows on all QPUs. No, the VPM rows are chosen by the QPU index (index * 8). I'd need mutexes around *all* the calculations if I really were overwriting the same rows, rather than just around the fetch kickoff. It's worth taking a ...
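To make the "index * 8" point concrete, here is a minimal host-side sketch of that row assignment. It is not QPU code; the helper names (`vpm_base_row`, `rows_overlap`, `all_disjoint`) and the constant `kRowsPerQpu` are made up for illustration, and the only thing taken from the post is that each QPU starts at row index * 8.

```cpp
#include <cassert>
#include <cstdint>

// Assumption taken from the "index * 8" in the post: each QPU owns 8 rows.
constexpr std::uint32_t kRowsPerQpu = 8;

// First VPM row used by a given QPU (hypothetical helper name).
constexpr std::uint32_t vpm_base_row(std::uint32_t qpu_index) {
    return qpu_index * kRowsPerQpu;
}

// True if the row ranges of two QPUs share at least one row.
inline bool rows_overlap(std::uint32_t qpu_a, std::uint32_t qpu_b) {
    const std::uint32_t a0 = vpm_base_row(qpu_a), a1 = a0 + kRowsPerQpu;
    const std::uint32_t b0 = vpm_base_row(qpu_b), b1 = b0 + kRowsPerQpu;
    return a0 < b1 && b0 < a1;
}

// Checks every pair of distinct QPUs; true when no pair shares a row.
inline bool all_disjoint(std::uint32_t num_qpus) {
    assert(num_qpus > 0);
    for (std::uint32_t a = 0; a < num_qpus; ++a)
        for (std::uint32_t b = a + 1; b < num_qpus; ++b)
            if (rows_overlap(a, b)) return false;
    return true;
}
```

With disjoint row ranges like this, the calculations themselves never write to another QPU's rows, which is the argument for only locking around the fetch kickoff.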
by teh_orph
Sat Jun 14, 2014 1:22 pm
Forum: Bare metal, Assembly language
Topic: GEMM example on the QPU
Replies: 8
Views: 7067

Re: GEMM example on the QPU

Yes the mutexes are required because it appears you use the same VPM rows on all QPUs.
by teh_orph
Sat Jun 14, 2014 9:45 am
Forum: Scratch
Topic: beta scratch performance
Replies: 21
Views: 3706

Re: beta scratch performance

I'm impressed. In those last two results, is the Pi really faster than the PC? What's going on there?
by teh_orph
Fri Jun 13, 2014 9:47 pm
Forum: Advanced users
Topic: LLVM backend for QPU development
Replies: 29
Views: 11795

Re: LLVM backend for QPU development

For those who are interested in following this I have now added:
- basic 32-bit fp arithmetic support - no conditional operations, they can't be used as function arguments (pointers are fine though) and no sqrt/divide etc
- 2/4/8/16-way vector 32-bit fp arithmetic support
- exposure of embedded C's ...
by teh_orph
Mon Jun 09, 2014 10:02 pm
Forum: Bare metal, Assembly language
Topic: Deep learning neural networks on the QPUs
Replies: 4
Views: 5348

Re: Deep learning neural networks on the QPUs

Hi, I'd be interested in seeing the QPU asm to perhaps look at your mutex problem...but I can't find it! Only the mods to the assembler.
Without seeing the code, my first guess would be sharing the same VPM address amongst all the QPUs?
by teh_orph
Sun Jun 08, 2014 8:33 pm
Forum: Advanced users
Topic: LLVM backend for QPU development
Replies: 29
Views: 11795

Re: LLVM backend for QPU development

An update: the code generator has improved quite a bit since my last post.
- rewrite of conditional branches and conditional stores
- Fib now works in O3
- rework of references to symbols (I basically started with the MIPS backend which uses hi(symbol) + lo(symbol))
- removal of (MIPS) big endian tar...
by teh_orph
Sun Jun 01, 2014 9:40 pm
Forum: Advanced users
Topic: LLVM backend for QPU development
Replies: 29
Views: 11795

Re: GPU Processing API

Cheers James for the help :) How about we start from my first post and go from there? http://www.raspberrypi.org/forums/viewtopic.php?f=33&t=6188&start=100#p550710 Let's go for something like "LLVM backend for QPU development". I dunno which sub-forum to put it in though. - it's not graphics ...
by teh_orph
Sat May 31, 2014 11:41 pm
Forum: Advanced users
Topic: LLVM backend for QPU development
Replies: 29
Views: 11795

Re: GPU Processing API

Today's new gotcha - although the spec sheet says branches through a register use lane zero's value as the destination, this does not appear to be true! Hours wasted. (I'm still not sure which lane is required, but broadcasting lane zero to all lanes equals success) In better news, the Fibonacci ser...
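The workaround described above (copy lane zero into every lane before branching) can be modelled on a host like this. It is only a sketch of the idea, not QPU assembly: `Vec16`, `broadcast_lane0` and `uniform_branch_target` are hypothetical names, and the 16-element array stands in for a 16-lane register.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 16;  // one value per SIMD lane
using Vec16 = std::array<std::uint32_t, kLanes>;

// Copy lane zero's value into every lane (host-side model only).
inline Vec16 broadcast_lane0(const Vec16& v) {
    Vec16 out;
    out.fill(v[0]);
    return out;
}

// Model of a register branch that does not care which lane the hardware
// samples: it insists that all lanes already hold the same destination.
inline std::uint32_t uniform_branch_target(const Vec16& v) {
    for (std::uint32_t lane : v) assert(lane == v[0]);
    return v[0];
}
```

After the broadcast every lane holds the same address, so whichever lane the hardware really reads, the branch goes to the intended destination.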
by teh_orph
Fri May 30, 2014 2:43 pm
Forum: C/C++
Topic: Memory management within a malloc'd block.
Replies: 1
Views: 684

Re: Memory management within a malloc'd block.

I think this is the classic one; I've used it many times. AFAIK some OS allocators are built upon it. It's one c file, and it includes mega comments in the header telling you how to use it. http://g.oswego.edu/dl/html/malloc.html Customise it the way you want by setting certain includes and off you...
by teh_orph
Fri May 30, 2014 9:16 am
Forum: Advanced users
Topic: LLVM backend for QPU development
Replies: 29
Views: 11795

Re: GPU Processing API

Ah I didn't know his middle name had an H in it too!
I've read some of his QPU stuff but see no stuff on accumulator 5 and broadcast. Might you have a link or quote?
by teh_orph
Fri May 30, 2014 8:55 am
Forum: Advanced users
Topic: LLVM backend for QPU development
Replies: 29
Views: 11795

Re: GPU Processing API

What's HHH?
by teh_orph
Thu May 29, 2014 4:05 pm
Forum: C/C++
Topic: Clang++ not compiling C++ code, undefined operator new
Replies: 6
Views: 5042

Re: Clang++ not compiling C++ code, undefined operator new

Ah, a handy explanation! Does that apply to just libraries or what if you have 'loose' object files too?
(eg I've always found that loose object files need to have all their undef symbols connected up, but if that was within an unreferenced object file within a library then there would be no problem)
by teh_orph
Thu May 29, 2014 2:43 pm
Forum: C/C++
Topic: Clang++ not compiling C++ code, undefined operator new
Replies: 6
Views: 5042

Re: Clang++ not compiling C++ code, undefined operator new

Aren't linker inputs resolved left-to-right, so a library only satisfies undefined symbols from objects listed before it? If you put the -lstdc++ on the right do you get the same thing?
by teh_orph
Thu May 29, 2014 8:44 am
Forum: Advanced users
Topic: LLVM backend for QPU development
Replies: 29
Views: 11795

Re: GPU Processing API

I've not checked it in anywhere - originally I started coding from an llvm source tar.gz but I think I need to branch their git repo, but I don't know how that works as I don't think they use github... Anyway I've not got much motivation for that at the moment as my machine at work (where I sometime...
by teh_orph
Tue May 27, 2014 5:48 pm
Forum: Advanced users
Topic: LLVM backend for QPU development
Replies: 29
Views: 11795

Re: GPU Processing API

Quite excitingly it worked first time. Of course every single thing I try needs to have its assembly looked over carefully :-) For those who are interested,

    extern "C" {
    void entry(void) {
        int *A = (int *)0x0888c000;
        for (int count = 0; count < 16; count++)
            A[count] = count - 15;
    }
    }

goes through Cl...
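For anyone wanting to check what that loop should leave in memory before comparing against the QPU result, here is a host-side sketch of the same loop body. The only change is that it writes into a caller-supplied buffer instead of the fixed address 0x0888c000 from the post; `fill_like_entry` and `run_on_host` are made-up names.

```cpp
#include <array>
#include <cassert>

// Same loop body as entry() in the post, but writing to a buffer passed in
// by the caller so it can run on an ordinary host machine.
inline void fill_like_entry(int* A) {
    assert(A != nullptr);
    for (int count = 0; count < 16; count++)
        A[count] = count - 15;
}

// Runs the loop on a local 16-element buffer and returns the contents.
inline std::array<int, 16> run_on_host() {
    std::array<int, 16> buf{};
    fill_like_entry(buf.data());
    return buf;
}
```

The buffer ends up holding the sixteen values from -15 up to 0, one per iteration.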
by teh_orph
Tue May 27, 2014 6:49 am
Forum: Bare metal, Assembly language
Topic: VideoCore IV QPUs
Replies: 6
Views: 2428

Re: VideoCore IV QPUs

I would guess you mean the bit about "4-way multiplexed over four successive cycles". I suppose it's this http://en.wikipedia.org/wiki/Time-division_multiplexing There are only really four hardware units, but they pretend to be 16 by re-running the same instruction four times. The inputs and output ...
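A rough host-side model of "four units pretending to be 16" might look like the following. It is a sketch under assumptions: which lanes are handled in which cycle is my guess (consecutive groups of four), and `multiplexed_add`, `AddResult`, `kUnits` and `kLanes` are names invented for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 16;  // lanes the programmer sees
constexpr std::size_t kUnits = 4;   // physical ALUs doing the work
using Vec16 = std::array<std::int32_t, kLanes>;

struct AddResult {
    Vec16 value;         // the 16-lane result
    std::size_t cycles;  // passes needed to produce it
};

// One 16-lane add carried out as successive passes over 4 physical units.
// Assumption: pass N handles lanes 4N..4N+3; the real lane order may differ.
inline AddResult multiplexed_add(const Vec16& a, const Vec16& b) {
    static_assert(kLanes % kUnits == 0, "lanes must divide evenly over units");
    AddResult r{};
    for (std::size_t cycle = 0; cycle < kLanes / kUnits; ++cycle) {
        for (std::size_t unit = 0; unit < kUnits; ++unit) {
            const std::size_t lane = cycle * kUnits + unit;
            r.value[lane] = a[lane] + b[lane];
        }
        ++r.cycles;
    }
    assert(r.cycles == kLanes / kUnits);
    return r;
}
```

The caller only ever sees a single 16-wide add; the four passes are hidden inside, which is the time-division multiplexing idea.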
by teh_orph
Sun May 25, 2014 11:11 pm
Forum: Advanced users
Topic: LLVM backend for QPU development
Replies: 29
Views: 11795

Re: GPU Processing API

Just an update to say this is still on-going. Previously I was limiting myself to the accumulators for nearly all operations, and only using the two register files for my procedure calling system. Also try as I might I couldn't get LLVM to insert branch delay slots (even taking the MIPS code for the...
by teh_orph
Thu May 22, 2014 3:16 pm
Forum: Bare metal, Assembly language
Topic: VideoCore IV QPUs
Replies: 6
Views: 2428

Re: VideoCore IV QPUs

It logically looks like a 16-way SIMD processor as you can issue only one new instruction every four cycles. Even though the latency of each operation is four clock cycles, you can only start something new every four, so when you're scheduling your code it looks like it has one-cycle throughput. (though...
by teh_orph
Tue May 20, 2014 9:49 am
Forum: C/C++
Topic: SHA-256 implementation on QPUs
Replies: 17
Views: 21464

Re: SHA-256 implementation on QPUs

Yeah it's the same thing. What I discovered is that when you disable it, the GPU completely disappears from the MMIO interface and when you turn it back on all sins are forgiven :)
by teh_orph
Sun May 18, 2014 8:08 pm
Forum: C/C++
Topic: SHA-256 implementation on QPUs
Replies: 17
Views: 21464

Re: SHA-256 implementation on QPUs

What's the caching like on the memory chosen to hold the program and working set? Btw I found a way of apparently resetting the GPU, by power cycling it. It did the trick for me for the rendering front-end at least (and all semaphore state). Have a look at QpuEnable: https://github.com/simonjhall/dm...
by teh_orph
Fri May 16, 2014 6:58 am
Forum: Advanced users
Topic: LLVM backend for QPU development
Replies: 29
Views: 11795

Re: GPU Processing API

Yeah I def agree with you there. A target-specific C would be a winner. I think the biggest problems are the load/store system not supporting vector loads/stores and the operations to manipulate the execution mask not being complex enough if you want to treat it like 16 scalar processors. I th...
by teh_orph
Thu May 15, 2014 9:41 am
Forum: General discussion
Topic: build videocore driver: Video :)
Replies: 18
Views: 2588

Re: build videocore driver: fail :(

ric96 wrote: But how (or even can) you get the 100+ fps as shown in the blog post http://www.raspberrypi.org/quake-iii-bo ... -a-winner/
Yes this one image is responsible for so much email in my inbox :shock:
by teh_orph
Thu May 15, 2014 9:32 am
Forum: Advanced users
Topic: LLVM backend for QPU development
Replies: 29
Views: 11795

Re: GPU Processing API

"but very few people do this as the instruction set is a nightmare" +1. The whole design is pretty whack tbh. I can just imagine pitching the architecture design. I'd like to compare it to previous designs, and see where the next version goes. There must be some enormous die space advantage by structu...
by teh_orph
Thu May 15, 2014 7:03 am
Forum: Advanced users
Topic: LLVM backend for QPU development
Replies: 29
Views: 11795

Re: GPU Processing API

That's good work - I have forwarded it to someone who mentioned this sort of thing to me the other day... Thanks. But lots of work still to go! Can I ask: when you program these devices, do you do it in assembly or in some higher-level language? who is actually after a high-level GPU processing API...
