
by eman
Fri Jun 20, 2014 5:50 pm
Forum: Bare metal, Assembly language
Topic: GEMM example on the QPU
Replies: 8
Views: 7273

Re: GEMM example on the QPU

I have to admit I haven't had a chance to look at this code yet (though I intend to), but to comment on the locking in the SHA-256 code: I believe the final version has all 12 QPU threads competing for the same 16 rows of VPM space (which is all it uses), so it needs the mutex. I was worried about con...
by eman
Tue Jun 10, 2014 5:35 am
Forum: Bare metal, Assembly language
Topic: Deep learning neural networks on the QPUs
Replies: 4
Views: 5506

Re: Deep learning neural networks on the QPUs

I'll third the request for the code, if possible. I'm a little familiar with deep learning and neural networks, and the matrix-matrix multiply (sgemm) is usually the most expensive part, but it's also highly reusable. It makes sense to write an optimized QPU sgemm routine (somewhere way down on my TOD...
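For readers unfamiliar with the operation mentioned in the snippet above, here is a minimal CPU-side sketch of what an sgemm computes, assuming dense row-major matrices. The name `sgemm_ref` and its signature are hypothetical, chosen for illustration; this is a plain reference loop, not the QPU routine discussed in the thread.

```c
#include <stddef.h>

/* Reference single-precision GEMM: C = alpha * A * B + beta * C.
 * Assumed layout (not taken from the thread): dense, row-major,
 *   A is m x k, B is k x n, C is m x n.
 * sgemm_ref is an illustrative name, not an existing library call. */
void sgemm_ref(size_t m, size_t n, size_t k,
               float alpha, const float *a, const float *b,
               float beta, float *c)
{
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            /* Dot product of row i of A with column j of B. */
            float acc = 0.0f;
            for (size_t p = 0; p < k; p++)
                acc += a[i * k + p] * b[p * n + j];
            /* Scale the product and blend with the existing C entry. */
            c[i * n + j] = alpha * acc + beta * c[i * n + j];
        }
    }
}
```

A loop like this is mainly useful as a correctness baseline: an optimized version can be checked against it element by element on small inputs.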
by eman
Mon Jun 09, 2014 7:43 pm
Forum: C/C++
Topic: SHA-256 implementation on QPUs
Replies: 17
Views: 22495

Re: SHA-256 implementation on QPUs

Nice!

I hope you don't mind if I integrate some of your changes back into my assembler? They look like useful improvements.
by eman
Tue May 20, 2014 6:15 am
Forum: C/C++
Topic: SHA-256 implementation on QPUs
Replies: 17
Views: 22495

Re: SHA-256 implementation on QPUs

Thanks. I will take a look. The mailbox interface is pretty opaque, so I'm using code very similar to the GPU FFT sample in /opt/vc. (I actually link to that mailbox.c file in the Makefile. It's a bit of a hack, but I'm pretty sure that's installed on every system.) The memory is allocated cached (at ...
by eman
Sat May 17, 2014 4:13 pm
Forum: Advanced users
Topic: LLVM backend for QPU development
Replies: 29
Views: 12507

Re: GPU Processing API

Yeah. If it makes it easier, I would consider exposing the 16-wide vector as the primitive type and letting the programmer worry about handling scalar code. To get good performance out of it, the user is going to have to understand how it works and restructure their algorithm. Just having the compiler d...
by eman
Sat May 17, 2014 4:04 pm
Forum: C/C++
Topic: SHA-256 implementation on QPUs
Replies: 17
Views: 22495

Re: SHA-256 implementation on QPUs

Thanks. Yeah, I have seen that a few times: the GPU gets into some state where it either gives garbage or hangs, but usually only with more complicated programs (for example, when playing with the synchronization operations, it's easier to make it hang for the next program). I couldn't find an...
by eman
Thu May 15, 2014 7:12 pm
Forum: Advanced users
Topic: LLVM backend for QPU development
Replies: 29
Views: 12507

Re: GPU Processing API

An LLVM back-end would be pretty cool and pretty interesting. I was thinking about taking a crack at that, but I haven't gotten much past reading the docs and cloning the bare minimum from one of the existing back-ends. I honestly didn't think the QPUs were that bad. They have about the functionality I ...
by eman
Thu May 15, 2014 6:37 pm
Forum: C/C++
Topic: SHA-256 implementation on QPUs
Replies: 17
Views: 22495

Re: SHA-256 implementation on QPUs

Good old trial and error ;-) Like the blog posts describe, I built a reference implementation first so I could check every stage as I built it out. I first tried using the VPM as a queue for the data vectors and found that as soon as I unrolled the loops, the performance dropped, which meant ...
by eman
Thu May 15, 2014 2:49 am
Forum: C/C++
Topic: SHA-256 implementation on QPUs
Replies: 17
Views: 22495

SHA-256 implementation on QPUs

For anyone interested, I've written a parallel SHA-256 implementation for the QPU: https://github.com/elorimer/rpi-playground

It does about 3.1 Mh/s (single-block hash) at about 93% efficiency (IPC), which makes it 14.6x faster than the CPU reference implementation. (Which is probably saying at least...
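As a rough reading of the two figures quoted in the announcement above, the speedup implies a CPU reference rate of about 3.1 / 14.6, or roughly 0.21 Mh/s. The helper below is a hypothetical illustration of that arithmetic, not code from the repository.

```c
/* Implied baseline rate from an accelerated rate and a speedup factor.
 * implied_baseline_rate is an illustrative name; the units of the result
 * are whatever units the accelerated rate is given in (here Mh/s). */
double implied_baseline_rate(double accelerated_rate, double speedup)
{
    return accelerated_rate / speedup;
}
```

Calling `implied_baseline_rate(3.1, 14.6)` gives a value a little above 0.212, in Mh/s.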
