GPGPU on the R-PI.


by jbeale » Thu May 24, 2012 4:34 am
I think the GPU has floating point hardware: from http://www.raspberrypi.org/faqs
The GPU is capable of 1Gpixel/s, 1.5Gtexel/s or 24 GFLOPs of general purpose compute and features a bunch of texture filtering and DMA infrastructure.
Posts: 2095
Joined: Tue Nov 22, 2011 11:51 pm
by jdobmeier » Thu May 24, 2012 4:50 am
This is straight from JamesH: http://www.raspberrypi.org/phpBB3/viewtopic.php?p=14633#p14633
Posts: 19
Joined: Sat Dec 10, 2011 4:49 pm
by shirro » Thu May 24, 2012 5:12 am
jdobmeier wrote:This is straight from JamesH: http://www.raspberrypi.org/phpBB3/viewtopic.php?p=14633#p14633


Wow, interesting thread. An integer vector lib would be really cool though.
Posts: 248
Joined: Tue Jan 24, 2012 4:54 am
by jdobmeier » Thu May 24, 2012 5:28 am
Floats can be emulated with integer arithmetic, and there are pretty good options out there for doing the conversion; the question is whether the speed increase from the GPU isn't completely nullified in the process. My guess is there is some headroom left over: consider that for n floats, if it takes 4n operations (wild guess) to convert, that's 8n for both directions. I thought jamesh said 16 cores on the GPU, which leaves (8n)/(16n) = 1/2, or a 2x speedup. I'm trying to get those tutorials from gpgpu.org but I'm new to GLSL. I have OpenCL experience, so it makes some sense.
Posts: 19
Joined: Sat Dec 10, 2011 4:49 pm
by lb » Thu May 24, 2012 1:16 pm
Huh? The GPU most definitely has native float support. In fact, the lack of proper integer support in GLSL ES is problematic. You can emulate integer-like behavior with floats, but it's so slow it becomes uninteresting.

But in any case, GPGPU on the Pi is generally not viable with the existing API support. And it wouldn't be a panacea either, even if we had great OpenCL support.
Posts: 193
Joined: Sat Jan 28, 2012 8:07 pm
by jdobmeier » Thu May 24, 2012 7:20 pm
lb, are you possibly referring to the lack of fixed-point real numbers on the iPhone?
GLfixed: Fixed point numbers are a way of storing real numbers using integers. This was a common optimization in 3D systems used because most computer processors are much faster at doing math with integers than with floating-point variables. Because the iPhone has vector processors that OpenGL uses to do fast floating-point math, we will not be discussing fixed-point arithmetic or the GLfixed datatype.

http://iphonedevelopment.blogspot.com/2009/04/opengl-es-from-ground-up-part-1-basic.html
If not, I would certainly be interested in where to find a reference to this missing functionality, or perhaps a test case to demonstrate it...
Posts: 19
Joined: Sat Dec 10, 2011 4:49 pm
by lb » Thu May 24, 2012 8:57 pm
No, I am referring to the fact that GLSL ES (the stripped down version of the OpenGL Shader Language) does not have proper support for integers. See paragraph 4.1.3 in the GLSL ES specification. Integers are only supported as a programming aid (in loops, for instance). They have very loosely defined semantics and precision, and many typical integer operations are not supported. Most OpenGL ES implementations simply map int to float.

Applications that rely on integers simply won't work on the GPU. Good examples for that are bitcoin mining, cryptography in general and compression.
Posts: 193
Joined: Sat Jan 28, 2012 8:07 pm
by jdobmeier » Thu May 24, 2012 9:04 pm
... there is no requirement that integers in the language map to an integer type in hardware. It is not expected that underlying hardware has full support for a wide range of integer operations. An OpenGL ES Shading Language implementation may convert integers to floats to operate on them.


This is hardly an open-and-shut case about what the capabilities of the Broadcom chip actually are. Since I, for one, care little about portability to other chip designs, I will continue to move forward.
Posts: 19
Joined: Sat Dec 10, 2011 4:49 pm
by lb » Thu May 24, 2012 10:19 pm
It doesn't matter what the GPU hardware is actually capable of. The GLSL ES restrictions are the same, no matter what hardware you have. Even if you can assume int maps to a native integer type with certain semantics, you'll find that there are no bitwise operators or modulo available in GLSL ES. Oh, and there's no unsigned int either.
Posts: 193
Joined: Sat Jan 28, 2012 8:07 pm
by jdobmeier » Thu May 24, 2012 10:47 pm
How is the standard restrictive? On the contrary, it seems if anything more flexible, in that native floats and ints are not both required as long as one or the other is present. If there is hardware support for both, then all the better performance-wise.

As for the illegal operations in section 5.1 of the standard: yeah, the lack of hardware modulo is going to hurt the crypto guys for sure, but again, that is no guarantee there is no support; it is just not required by the implementation in order to conform to the standard. Really, my application does not need the bitwise operators or modulo arithmetic anyway.
Posts: 19
Joined: Sat Dec 10, 2011 4:49 pm
by lb » Thu May 24, 2012 11:35 pm
jdobmeier wrote:How is the standard restrictive? On the contrary, it seems if anything more flexible, in that native floats and ints are not both required as long as one or the other is present. If there is hardware support for both, then all the better performance-wise.


Uh... so basically you're saying, lots of undefined and platform-specific behavior is *good*? That's crazy.

As for the illegal operations in section 5.1 of the standard: yeah, the lack of hardware modulo is going to hurt the crypto guys for sure, but again, that is no guarantee there is no support; it is just not required by the implementation in order to conform to the standard. Really, my application does not need the bitwise operators or modulo arithmetic anyway.


Most OpenGL ES implementations stick to the standard as strictly as possible, and implement few or no extras. The implementations I know all require constant loop expressions, for example. I say it's quite unlikely the VideoCore IV OpenGL ES supports native integers, non-constant loop expressions or any extra operators in GLSL ES. Maybe some of the people from Broadcom can enlighten us...

Anyway, if GLSL ES is good enough for your application, that's fine. However, GPGPU is generally not viable on the Pi. The API is too restrictive, well beyond merely inconvenient. And the GPU isn't that fast anyway.
Posts: 193
Joined: Sat Jan 28, 2012 8:07 pm
by naeger » Sun Jun 03, 2012 9:54 pm
Hi,

As far as I understood the previous discussion, the only viable option for GPGPU on the Raspberry Pi would be to have Broadcom port OpenCL to this GPU. Is this correct?

Does anyone have connections to these guys? Any chance to have them discuss this issue with us? We could initiate a Kickstarter project to raise some money to make this possible!

Anyone with more info?

Greetings, Chris
Posts: 2
Joined: Sun Jun 03, 2012 9:49 pm
by jdobmeier » Wed Jun 06, 2012 4:47 pm
Apparently GPGPU is a controversial term, and some purists believe it is out of reach for all mobile devices, so I propose to call what I am doing PiGPU, which is not to be confused with GPGPU proper. However, for the purposes of this forum I would like to stipulate that whenever GPGPU is mentioned, it is understood to mean PiGPU.

That said, I have ported tutorial 0 from http://gpgpu.org/developer/legacy-gpgpu-graphics-apis to the RPi. I started with the code examples from the OpenGL ES 2.0 Programming Guide, which were kindly provided by Ben O'Steen's blog: http://benosteen.wordpress.com/ The updated code is here: http://pastebin.com/mKW0YbE0 (source) and http://pastebin.com/m2XWQmWD (new makefile). Alternatively, you can compile like this:
Code:
gcc -DRPI_NO_X ./Common/esShader.c ./Common/esTransform.c ./Common/esShapes.c ./Common/esUtil.c ./Chapter_9/helloPiGPU/helloPiGPU_GLESSL.c -o ./Chapter_9/helloPiGPU/helloPiGPU_GLESSL -I./Common -I/opt/vc/include -lGLESv2 -lEGL -lm -lbcm_host -L/opt/vc/lib
assuming you are in the Raspi directory.

I just added a directory /Raspi/Chapter_9/helloPiGPU where I put the source. The makefile goes in the Raspi directory. By the way, I'm using Raspbian.
Posts: 19
Joined: Sat Dec 10, 2011 4:49 pm
by rodonn » Sun Jun 17, 2012 11:03 am
Something I'm not following, and I'm possibly being slow here.

OpenGL ES is effectively a standard API to embedded GPUs.

The RPi GPU supports OpenGL ES, and OpenCL hooks into OpenGL and OpenGL ES.

Logically, if you're having to do a thousand man-hours of patching to get the port to run, then one of the things in the loop is NOT following the published standard. Otherwise, it should be little more than a cross-compile, surely?

I admit, I'm not a low-level hardware chap, and it's well over a decade since I touched API work (inter-platform operability: PC to AS400 DB2 to Lotus Notes and SmartSuite), but I always assumed that it sort of worked the same way...
Posts: 12
Joined: Mon May 28, 2012 4:00 pm
by dom » Sun Jun 17, 2012 12:12 pm
rodonn wrote:The RPi GPU supports OpenGL ES, and OpenCL hooks into OpenGL and OpenGL ES.


That's your misunderstanding. You can't implement OpenCL on top of OpenGL.
Raspberry Pi Engineer & Forum Moderator
Posts: 4059
Joined: Wed Aug 17, 2011 7:41 pm
Location: Cambridge
by KeithSloan » Mon Jan 21, 2013 9:03 am
Could Broadcom not hand the creation of OpenCL for the GPU to some bright summer students on an internship? Or, if it's a bigger project than that, get a university involved and have people sign the appropriate Non-Disclosure Agreements. Surely it would make a good PhD research project at least.

I would love to run World Community Grid stuff on a Pi. Low cost, low energy and doing good for humanity.
Posts: 174
Joined: Tue Dec 27, 2011 9:09 pm
by jamesh » Mon Jan 21, 2013 9:37 am
KeithSloan wrote:Could Broadcom not hand the creation of OpenCL for the GPU to some bright summer students on an internship? Or, if it's a bigger project than that, get a university involved and have people sign the appropriate Non-Disclosure Agreements. Surely it would make a good PhD research project at least.

I would love to run World Community Grid stuff on a Pi. Low cost, low energy and doing good for humanity.


I'm not sure exactly, but implementing OpenCL on a device (any device) is quite a few man-years of work, unless you have code you can base a port on, and even then it is a big job.
Volunteer at the Raspberry Pi Foundation, helper at Picademy September and October 2014.
Raspberry Pi Engineer & Forum Moderator
Posts: 12113
Joined: Sat Jul 30, 2011 7:41 pm
by KeithSloan » Mon Jan 21, 2013 3:34 pm
I'm not sure exactly, but implementing OpenCL on a device (any device) is quite a few man-years of work, unless you have code you can base a port on, and even then it is a big job.


So it would require, say, a university to get involved, rather than just a summer project.

I see they have an internship that would like OpenCL experience https://sjobs.brassring.com/1033/ASP/TG ... l.asp?SID=^XTOdWZVHG6t0DPfq26RpdS3iornjpCUau2MIBU2hHHqG7VxhZxx1F2e8i/xf63fX&jobId=924836&type=search&JobReqLang=1&recordstart=1&JobSiteId=5482&JobSiteInfo=924836_5482&GQId=0

I would have thought they could ask a local university research department to see if it was interested. Isn't that the sort of thing that industrial cooperation with universities is supposed to achieve?
Posts: 174
Joined: Tue Dec 27, 2011 9:09 pm
by jojopi » Tue Jan 22, 2013 12:52 am
This would be a lot of work for little more than the learning experience. As good as the VideoCoreIV is for its size, it will not beat a desktop PC graphics card for GPGPU on flops/dollar. It likely will not beat a laptop GPU chip on flops/watt either.

If you want to build an efficient compute platform, you do not start by designing an embedded chip with adequate HD/3D performance, then use lots of them. Performance is mostly limited by the number of gates/transistors. So you must design a GPGPU chip near the chip size where the cost per transistor is the lowest.
Posts: 2122
Joined: Tue Oct 11, 2011 8:38 pm
by respawnd » Tue Jan 22, 2013 2:00 am
lb wrote:Applications that rely on integers simply won't work on the GPU. Good examples for that are bitcoin mining, cryptography in general and compression.

For crypto, it's not the floating-point math that makes GPUs attractive; it's the fact that you have a large number of cores. Check out http://hashcat.net/oclhashcat-lite/ for a crypto-cracking lib that uses OpenCL to take advantage of GPU cores with impressive results.

With oclHashcat I can distribute the processing across many GPU cards in the same box. I can see a low-cost approach: a single 19" tray full of headless RPis cranking out hashes against a distributed password database. Hmmm. Has anyone considered an RPi based on NVIDIA Tegra instead of Broadcom? It could be called the Raspberry Mu.
Posts: 1
Joined: Tue Jan 22, 2013 1:30 am
by KeithSloan » Tue Jan 22, 2013 4:24 am
it will not beat a desktop PC graphics card for GPGPU on flops/dollar. It likely will not beat a laptop GPU chip on flops/watt either.


But my x86 dual core does not have a GPU that will run World Community Grid. I know some PCs do, but a lot do not. So I don't see it competing with a modern graphics GPU; it just has to make running BOINC stuff on a Pi worthwhile.
Posts: 174
Joined: Tue Dec 27, 2011 9:09 pm
by jamesh » Tue Jan 22, 2013 10:30 am
jojopi wrote:This would be a lot of work for little more than the learning experience. As good as the VideoCoreIV is for its size, it will not beat a desktop PC graphics card for GPGPU on flops/dollar. It likely will not beat a laptop GPU chip on flops/watt either.

If you want to build an efficient compute platform, you do not start by designing an embedded chip with adequate HD/3D performance, then use lots of them. Performance is mostly limited by the number of gates/transistors. So you must design a GPGPU chip near the chip size where the cost per transistor is the lowest.


I agree about flops/dollar. Not so sure on flops/watt.

How many flops/watt do you get with a current spec desktop graphics card? The Raspi has about 24Gflops total performance (not all at same time), so let's say half that accessible (made up number) at 500mA on 5v = 2.5W = about 4.8GFlops/watt. (let me reiterate, made up numbers just to get a rough idea)
Volunteer at the Raspberry Pi Foundation, helper at Picademy September and October 2014.
Raspberry Pi Engineer & Forum Moderator
Posts: 12113
Joined: Sat Jul 30, 2011 7:41 pm
by hermanhermitage » Tue Jan 22, 2013 12:13 pm
jamesh wrote:
jojopi wrote:This would be a lot of work for little more than the learning experience. As good as the VideoCoreIV is for its size, it will not beat a desktop PC graphics card for GPGPU on flops/dollar. It likely will not beat a laptop GPU chip on flops/watt either.

If you want to build an efficient compute platform, you do not start by designing an embedded chip with adequate HD/3D performance, then use lots of them. Performance is mostly limited by the number of gates/transistors. So you must design a GPGPU chip near the chip size where the cost per transistor is the lowest.


I agree about flops/dollar. Not so sure on flops/watt.

How many flops/watt do you get with a current spec desktop graphics card? The Raspi has about 24Gflops total performance (not all at same time), so let's say half that accessible (made up number) at 500mA on 5v = 2.5W = about 4.8GFlops/watt. (let me reiterate, made up numbers just to get a rough idea)


Green500 record is 2.5Gflops/watt (but in fairness you don't really have the memory bandwidth to offer 4.5Gflops/watt on a range of industry standard kernels).

I might also pipe in that a subset of OpenCL can be done with less effort than people imagine. And as a counterpoint to jojopi: efficiency can be measured many ways. If a large number of kids (big and little) already have the Pi, then a mini OpenCL implementation could be an efficient way of educating the next generation on SPMD.
Posts: 65
Joined: Sat Jul 07, 2012 11:21 pm
Location: Zero Page
by iso9660 » Wed Jan 23, 2013 2:06 pm
I think this library https://github.com/BradLarson/GPUImage could be a good starting point for applying GPGPU image-manipulation primitives. It is written in Objective-C, but the OpenGL ES code is exactly the same code that should run on the Raspberry Pi.
Posts: 25
Joined: Sun Sep 16, 2012 1:48 pm
by dasankir » Thu Jan 24, 2013 8:23 am
What about this (sorry for the ignorance): does it mean anything regarding GPGPU?

http://www.khronos.org/conformance/adop ... cts#opencl

Broadcom Corporation 2011-11-11 OpenGL_ES_2_0
BCM7346 (big endian) CPU: MIPS (big endian)

OS: Linux 2.6.37
API pipeline:
GL_VENDOR "Broadcom"
GL_RENDERER "VideoCore IV HW"
GL_VERSION "OpenGL ES 2.0"
GL_SHADING_LANGUAGE_VERSION "OpenGL ES GLSL ES 1.00"

Display: 1920x1080, 32bpp

http://www.khronos.org/conformance/adop ... cts#openvg

Broadcom Corporation 2011-04-03 OpenVG_1_1
CPU: VideoCore IV, OS:Threadx, Pipeline: Broadcom VideoCore IV HW/OpenVG 1.1, Display: 64x64 32bpp
Posts: 5
Joined: Thu Jan 24, 2013 8:20 am