petewarden
Posts: 9
Joined: Wed May 14, 2014 3:50 pm

Deep learning neural networks on the QPUs

Mon Jun 09, 2014 7:02 pm

I've been having some fun with the QPUs over the last few weeks, and I've just posted the results:
http://petewarden.com/2014/06/09/deep-l ... pberry-pi/

They've allowed me to boost my speed from around 20 seconds using Atlas for the numerics to five seconds on a stock Pi, and three seconds with GPU overclocking! I'm very grateful to eman's SHA-256 example code (http://www.raspberrypi.org/forums/viewt ... 1&p=550759) and to Herman's hard work pulling together documentation. I ended up having to extend eman's original assembler to fix a few bugs and handle some additional instructions (e.g. unpacking, horizontal shifts, multi-register immediates), so I've put the updated code up here, along with some helper m4 macros:
https://github.com/jetpacapp/qpu-asm

I learned a few things from the process. My use case was heavily dependent on the VPM memory, and unfortunately it appears that all DMA load and store operations have to be guarded with a mutex. After a lot of experimentation, I found the guard only had to be around the actual kick-off instruction. Any thoughts on why this is needed, or ideas for a workaround, would be very welcome, since it's a big performance hit.
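The locking pattern described above (hold the mutex only around the DMA kick-off, not around the wait for completion) can be sketched on the host with ordinary threads. This is an analogy, not QPU code: the `SharedDmaPort` class, its two-write `kick_off`, and the idea that interleaved setup/address writes are what the mutex protects against are all assumptions made for illustration, not a confirmed explanation of the hardware behavior.

```python
import threading

NUM_QPUS = 12  # worker threads standing in for the QPUs


class SharedDmaPort:
    """Hypothetical model of a DMA kick-off that takes two writes
    (setup, then address). If two workers interleave between the writes,
    one worker's address could be paired with the other's setup."""

    def __init__(self):
        self.setup = None
        self.log = []

    def kick_off(self, setup, addr):
        self.setup = setup                    # first write: DMA setup
        self.log.append((self.setup, addr))   # second write: address starts the transfer


dma_mutex = threading.Lock()
port = SharedDmaPort()


def worker(qpu_id, n_transfers):
    for i in range(n_transfers):
        setup = ("setup", qpu_id)
        addr = ("addr", qpu_id, i)
        # Hold the mutex only for the kick-off itself...
        with dma_mutex:
            port.kick_off(setup, addr)
        # ...any wait for completion or further compute would go here, unlocked.


threads = [threading.Thread(target=worker, args=(q, 100)) for q in range(NUM_QPUS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Keeping the critical section this small is what limits the cost: workers only serialize on the kick-off, while waiting and compute stay concurrent.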

teh_orph
Posts: 346
Joined: Mon Jan 30, 2012 2:09 pm
Location: London

Re: Deep learning neural networks on the QPUs

Mon Jun 09, 2014 10:02 pm

Hi, I'd be interested in seeing the QPU asm, to perhaps look at your mutex problem... but I can't find it! Only the mods to the assembler.
Without seeing the code, my first guess would be that you're sharing the same VPM address amongst all the QPUs?
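To avoid sharing the same VPM address between QPUs, each QPU can be given its own disjoint block of VPM rows derived from its QPU number. A minimal sketch of that bookkeeping, assuming a VPM of 64 rows (16 32-bit words per row) and 12 QPUs; `vpm_row_ranges` and `regions_overlap` are hypothetical helpers, and the VPM size actually available to user programs should be checked against the VideoCore IV documentation.

```python
# Assumed VPM geometry for this sketch: 64 rows, each 16 x 32-bit words wide
# (one word per SIMD lane). Check the real usable size in the VideoCore IV docs.
VPM_ROWS = 64
NUM_QPUS = 12


def vpm_row_ranges(num_qpus, rows_per_qpu, total_rows=VPM_ROWS):
    """Hypothetical helper: give each QPU a disjoint block of VPM rows,
    starting at qpu_index * rows_per_qpu."""
    if num_qpus * rows_per_qpu > total_rows:
        raise ValueError("not enough VPM rows for a private region per QPU")
    return [range(q * rows_per_qpu, (q + 1) * rows_per_qpu)
            for q in range(num_qpus)]


def regions_overlap(regions):
    """Return True if any VPM row is claimed by more than one QPU."""
    seen = set()
    for region in regions:
        for row in region:
            if row in seen:
                return True
            seen.add(row)
    return False


regions = vpm_row_ranges(NUM_QPUS, rows_per_qpu=4)
print(regions[1])                # range(4, 8)
print(regions_overlap(regions))  # False
```

With 12 QPUs and 4 rows each, 48 of the assumed 64 rows are used; asking for 6 rows each would exceed the budget and raise `ValueError`.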

eupton
Forum Moderator
Posts: 56
Joined: Sun Apr 15, 2012 7:28 pm

Re: Deep learning neural networks on the QPUs

Mon Jun 09, 2014 10:44 pm

Also interested in seeing the code for this. As Simon says, I suspect you're using overlapping VPM areas from different QPUs.
Btw, one golden rule (which you may already have figured out) is not to do reads from memory via the VPM/VDR - use the texture unit direct read mode instead for much higher bandwidth.
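For the texture unit direct read mode mentioned above, each of the QPU's 16 SIMD lanes supplies its own address and gets back one 32-bit word. The host-side model below only illustrates the per-lane address arithmetic for reading one 16-wide row of a row-major float32 array; `lane_addresses` and `simulated_tmu_fetch` are hypothetical names, and on the Pi the fetch itself is issued to the TMU from QPU assembly, which this sketch does not reproduce.

```python
import struct

SIMD_WIDTH = 16  # QPU vector width: 16 lanes
WORD_BYTES = 4   # each lane fetches one 32-bit word


def lane_addresses(base_addr, row):
    """Per-lane byte addresses for row `row` of a row-major float32 array
    whose rows are SIMD_WIDTH words wide (assumed layout)."""
    row_base = base_addr + row * SIMD_WIDTH * WORD_BYTES
    return [row_base + lane * WORD_BYTES for lane in range(SIMD_WIDTH)]


def simulated_tmu_fetch(memory, base_addr, addresses):
    """Model of a per-lane 32-bit gather from a little-endian byte buffer."""
    values = []
    for addr in addresses:
        (value,) = struct.unpack_from("<f", memory, addr - base_addr)
        values.append(value)
    return values


BASE = 0x1000
memory = struct.pack("<32f", *range(32))  # two rows of 16 floats
row1 = simulated_tmu_fetch(memory, BASE, lane_addresses(BASE, 1))
```

Here `row1` holds the second row, 16.0 through 31.0, one value per lane.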

eman
Posts: 9
Joined: Wed Mar 19, 2014 10:23 pm

Re: Deep learning neural networks on the QPUs

Tue Jun 10, 2014 5:35 am

I'll third the request for the code, if possible. I'm a little familiar with deep learning and neural networks, and the matrix-matrix multiply (sgemm) is usually the most expensive part, but it's also highly reusable. It makes sense to write an optimized QPU sgemm routine (somewhere way down on my TODO list) and make it available to jump-start other applications like this. It also seems like sgemm could map nicely to the QPUs, where it can take full advantage of both pipes pretty straightforwardly.
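To make the sgemm idea concrete, here is a plain-Python sketch of one possible QPU-style partitioning of C = alpha*A*B + beta*C: the columns of C are split into 16-wide strips (one lane per column), strips are dealt round-robin to 12 workers standing in for QPUs, and each row is accumulated as a vector of multiply-adds with `A[i][p]` broadcast to every lane. The strip width, worker count and round-robin scheme are assumptions for illustration, not an existing routine; on the hardware, the multiply and the add in the inner loop are the natural candidates for the QPU's separate mul and add pipes.

```python
SIMD_WIDTH = 16  # QPU vector width: one lane per output column in a strip
NUM_QPUS = 12    # workers standing in for the QPUs (assumption)


def sgemm_reference(alpha, A, B, beta, C):
    """Straightforward C = alpha*A*B + beta*C for lists of lists."""
    m, k, n = len(A), len(B), len(B[0])
    return [[alpha * sum(A[i][p] * B[p][j] for p in range(k)) + beta * C[i][j]
             for j in range(n)]
            for i in range(m)]


def sgemm_strips(alpha, A, B, beta, C, num_workers=NUM_QPUS, width=SIMD_WIDTH):
    """Same result, computed strip by strip the way a QPU kernel might."""
    m, k, n = len(A), len(B), len(B[0])
    out = [row[:] for row in C]
    strip_starts = list(range(0, n, width))
    for worker in range(num_workers):
        # Strips are independent, so on hardware the workers could run in parallel.
        for start in strip_starts[worker::num_workers]:
            cols = range(start, min(start + width, n))
            for i in range(m):
                acc = [0.0] * len(cols)          # one vector of accumulators
                for p in range(k):
                    a = A[i][p]                  # scalar broadcast to every lane
                    for lane, col in enumerate(cols):
                        acc[lane] += a * B[p][col]  # lane-wise multiply-add
                for lane, col in enumerate(cols):
                    out[i][col] = alpha * acc[lane] + beta * C[i][col]
    return out
```

For example, with a 3x5 `A`, a 5x37 `B` and a 3x37 `C`, the 37 columns become strips of 16, 16 and 5 lanes, and `sgemm_strips` should match `sgemm_reference` element for element.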

Anyway, looks like a very interesting library.

pageauc
Posts: 224
Joined: Fri Jan 04, 2013 10:52 pm

Re: Deep learning neural networks on the QPUs

Wed Jul 08, 2015 4:52 pm

Here is a post about installing the DeepBeliefSDK deep learning library on a Raspberry Pi B+ or B2. The B uses the GPU, while the B2 uses the CPUs. Here is the link to the post if anyone is interested.
viewtopic.php?p=786040#p786040
GitHub - https://github.com/pageauc
YouTube - https://www.youtube.com/user/pageaucp
