doe300

Re: OpenCL on the VideoCore IV!

Thu Nov 30, 2017 11:59 am

MrWhiter wrote:
Wed Nov 29, 2017 8:55 pm
Could you comment on how it compares to PoCL?
In my Master's thesis I included this comparison of the clpeak results with PoCL (the PoCL values were not measured by me, but taken from here):
[Image: Performance.png, clpeak results of VC4CL vs. PoCL]
On the left is the floating-point test (in GFLOPS), in the middle the memory-bandwidth test and on the right the transfer-bandwidth test (both in MB/s). The filled bars are the median results of all tests (e.g. for floating point, the median of the 1-, 2-, 3-, 4-, 8- and 16-element vector tests), while the shaded bars give the maximum value (e.g. for the floating-point test, the result for 16-element vectors).

So as you can see, VC4CL outperforms PoCL by far on simple arithmetic operations (floating-point test), but suffers from poor memory throughput (memory-bandwidth test).
MrWhiter wrote:
Wed Nov 29, 2017 8:55 pm
Would it be possible to merge this into PoCL and re-use code, or are the two just very different?
PoCL tries to run OpenCL on every platform by running it on the CPU, while VC4CL runs OpenCL on the GPU of one particular platform. So they differ greatly.

jcyr

Fri Dec 01, 2017 5:10 am

This is interesting... glad I stumbled into it! Partial OpenCL eh? Wonder if it'd support something like ethminer in OpenCL mode?
If you want your child to get the best education possible, it is actually more important to get him assigned to a great teacher than to a great school. -- Bill Gates

doe300

Fri Dec 01, 2017 11:25 am

jcyr wrote:
Fri Dec 01, 2017 5:10 am
Wonder if it'd support something like ethminer in OpenCL mode?
If you are referring to https://github.com/ethereum-mining/ethminer, it won't work, since its OpenCL kernel requires 64-bit data-types (ulong), which are not supported.

mic_s

Fri Dec 01, 2017 11:29 am

Partial OpenCL eh?

Don't blame doe300. Blame the (by now rather old) VideoCore IV/QPU design.

As he, eupton and many others pointed out before, one of the performance restrictions is memory bandwidth.

Options?
(1) The upcoming VideoCore V (16-bit floats and other advantages, e.g. it doesn't require CMA)
(2) Add a co-processor with enough internal memory.

As a side-note to option 2:
These days, Google is introducing a new Google (AIY) Design Kit for the Pi Zero. It is based on the new Intel/Movidius 2450 (basically 12 VLIW cores, …, …., 2 simple RISCs and a clever internal memory bus, ...).
Here is the PCB in the Pi Zero form factor (target price point: 45-55 USD) for the Raspberry Pi:
https://www.blog.google/topics/machine- ... vices-see/

Gavinmc42

Fri Dec 01, 2017 1:40 pm

Does anyone else find this a little intrusive? First Google is listening, now it wants to watch.
Besides, my dog knocks at the back door; he has trained me to open it for him ;)

I had been wondering what Movidius was up to after being acquired; guess I don't have to pull apart a DJI Phantom 4 now ;)
Kind of spoils the fun, doing everything for you.
It goes via the CSI port; will that work bare-metal or only with the Raspbian driver?
ISP = In Stream processing, comes into the Zero via CSI?

Wonder if they will bring out a stereo version?
That compiler must be very interesting to split things up like that.
I wonder how the OpenCL one works now.
Connectors? One JTAG, one ? and a 6-way header.
I'm dancing on Rainbows.
Raspberries are not Apples or Oranges

Daniel Gessel

Sun Dec 03, 2017 1:54 am

Awesome work! What’s the reason for PoCL’s poor FP performance? Does the CPU emulate floating point? Do you know the theoretical limits for the GPU and CPU in terms of straight instruction throughput?

MrWhiter

Sun Dec 03, 2017 8:20 am

As far as I could find out, the RPi 3 CPU runs at 500 MHz, has 4 cores, and ARM NEON instructions are 128 bits wide (4 single-precision floats).
The theoretical peak would then be 500*4*4 = 8 GFLOPS for the CPU, versus 24 GFLOPS for the GPU.
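The same back-of-the-envelope estimate, written out as cores × lanes × clock. This is a sketch under stated assumptions: one single-precision FLOP per NEON lane per cycle on the CPU (a fused multiply-add is not counted as two), and, for the GPU, the 250 MHz clock that the commonly quoted 24 GFLOPS figure is based on, with 12 QPUs of 4 physical lanes each and the add and multiply ALUs both issuing every cycle:

```latex
\begin{aligned}
P_{\text{CPU}} &= 4\ \text{cores} \times \tfrac{128\ \text{bit}}{32\ \text{bit}}\ \text{lanes} \times 500\ \text{MHz} = 8\ \text{GFLOPS}\\
P_{\text{GPU}} &= 12\ \text{QPUs} \times 4\ \text{lanes} \times 2\ \tfrac{\text{FLOP}}{\text{cycle}} \times 250\ \text{MHz} = 24\ \text{GFLOPS}
\end{aligned}
```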

The 500 Mhz comes from: https://en.wikipedia.org/wiki/Raspberry_Pi#Overclocking

Follow-up question: The QPUs share memory with the CPU and have DMA, so naively I would expect similar memory bandwidth.
Somehow the QPU is much slower, is that really a HW limitation, or are there still some improvements in SW possible?

jamesh
Raspberry Pi Engineer & Forum Moderator

Sun Dec 03, 2017 10:05 am

Gavinmc42 wrote:
Fri Dec 01, 2017 1:40 pm
Does anyone else find this a little intrusive? First Google is listening, now it wants to watch.
Please try to keep on topic, and make posts vaguely understandable. Or have you posted this in the wrong thread?

ISP is Image System Pipeline btw.
Principal Software Engineer at Raspberry Pi (Trading) Ltd.
Please direct all questions to the forum, I do not do support via PM.

doe300

Sun Dec 03, 2017 11:17 am

MrWhiter wrote:
Sun Dec 03, 2017 8:20 am
Follow-up question: The QPUs share memory with the CPU and have DMA, so naively I would expect similar memory bandwidth.
Somehow the QPU is much slower, is that really a HW limitation, or are there still some improvements in SW possible?
In the graph I posted, there are two benchmarks related to memory access:

The graph "Speicherbandbreite" (memory bandwidth) in the middle shows the memory throughput of the OpenCL device (i.e. the GPU) from/to its own memory. In the case of the VideoCore IV, the "device's own memory" is the RAM, but the VPM sits between the QPUs (processing cores) and the memory. You are right, the memory is accessed via DMA, but the VPM is much slower at accessing the memory than the CPU is.

The third graph "Übertragungsrate" (transfer bandwidth) shows the bandwidth for accessing the OpenCL device memory from the CPU. Here, both PoCL and VC4CL could theoretically achieve the same performance. Why don't they? Both implementations do basically the same thing (a simple memcpy), but PoCL just seems to be more efficient at it ;)

Daniel Gessel

Sun Dec 03, 2017 3:11 pm

MrWhiter wrote:
Sun Dec 03, 2017 8:20 am
As far as I could find out, the RPi 3 CPU runs at 500 MHz, has 4 cores, and ARM NEON instructions are 128 bits wide (4 single-precision floats).
The theoretical peak would then be 500*4*4 = 8 GFLOPS for the CPU, versus 24 GFLOPS for the GPU.
Beautiful; thanks! I believe the RPi 3 is closer to 1 GHz, however, making it 16 GFLOPS for the CPU. I bet the PoCL benchmarks were using an emulated FPU, which I think is the default for many compilers (I read this on another forum; I haven’t programmed an ARM for 25 years, so this is all basically new to me! But super fun!)

So that’s a pocl project to work on.

Does VC4CL play well with OpenGL ES and video encoder software under Raspbian?
Do you want any help?
I’d try to get OpenCV running on it for robotics projects and might dig in enough to add/fix (I’m new to this too, so no promises).

Please let me know if you’d seriously be looking for a job in LLVM dev (in the US).

doe300

Sun Dec 03, 2017 4:12 pm

Daniel Gessel wrote:
Sun Dec 03, 2017 3:11 pm
Does VC4CL play well with OpenGL ES and video encoder software under Raspbian?
When using the Mailbox to start OpenCL kernels, VC4CL should be able to run beside other libraries that use the Mailbox interface to reserve GPU memory and start kernels. The global mutex in the mailbox driver makes sure that only one "program" runs on the GPU at a time. I haven't tested any combination of running VC4CL beside a graphics library, though, since my development Raspberry Pi runs headless.
Daniel Gessel wrote:
Sun Dec 03, 2017 3:11 pm
Do you want any help?
I’d try to get OpenCV running on it for robotics projects and might dig in enough to add/fix (I’m new to this too, so no promises).
I'm happy about every helping hand, there is still so much to do!
About running OpenCV: As far as I have tested, OpenCV won't work (without modifications), since some of its kernels require a work-group size of more than 256 work-items, whereas VC4CL supports at most 12 work-items per work-group. So, to even check whether the calculations are done correctly, you'll have to rewrite the kernels so that they still execute the correct algorithm within that limit.
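On the host side, the limit also constrains the local work size passed when a kernel is enqueued: under OpenCL 1.2 it must divide the global work size evenly, so a hard-coded group of 256 has to be replaced by something that fits. A minimal sketch, assuming the limit of 12 stated above (in practice it would be queried via CL_DEVICE_MAX_WORK_GROUP_SIZE); the helper name `pick_local_size` is hypothetical, and it does nothing for kernels whose algorithm itself assumes a large group, which still need rewriting:

```python
def pick_local_size(global_size, max_work_group_size=12):
    """Return the largest local size <= max_work_group_size that divides global_size.

    OpenCL 1.2 requires the global work size to be a multiple of the local
    work size, so smaller divisors (down to 1) are tried when needed.
    """
    if global_size < 1 or max_work_group_size < 1:
        raise ValueError("sizes must be positive")
    for local in range(min(global_size, max_work_group_size), 0, -1):
        if global_size % local == 0:
            return local
    return 1  # not reached: 1 always divides global_size


if __name__ == "__main__":
    for n in (1024, 36, 13):
        print(n, "->", pick_local_size(n))
```

For example, a global size of 1024 gets a local size of 8, since none of 9 to 12 divides 1024, while a prime global size such as 13 degrades to groups of a single work-item.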

merlz42

Sat Dec 09, 2017 3:51 am

Downloading all the bits and pieces now!

Do you think it will be possible to implement some of the 64-bit type support in software? (This could help with compatibility even if it's not fast.)

doe300

Sat Dec 09, 2017 9:11 am

merlz42 wrote:
Sat Dec 09, 2017 3:51 am
Do you think it will be possible to implement some of the 64-bit type support in software? (This could help with compatibility even if it's not fast.)
Possible yes, but it probably won't be implemented any time soon, at least not by me. If anyone else wants to support (u)long data-types (double would be way more complicated I think), feel free to do so!

doe300

Sat Dec 23, 2017 11:31 am

As promised, my Master's thesis for the VC4CL project is now available on GitHub.

A lot of things have changed since I finished my thesis, so the text might be outdated in some places. Notable changes:
  • memory is no longer loaded via the VPM, but via the TMU, resulting in a huge speed-up (see below) as well as much less use of the hardware mutex. Thanks to nomaddo for pushing me to use the TMU
  • the meta-data for the binary program has been greatly extended and now also works with larger kernels
  • a few more extensions have been implemented
  • conformance with the OpenCL CTS has improved
  • all three projects are now compiled on CircleCI and built as Debian packages (also thanks to nomaddo)
  • and so much more...

The performance measurements have also changed, see the diagram:
[Image: graphs.png, updated performance measurements]
Notable changes in performance:
  • The floating-point test case using the mailbox interface to start kernels currently freezes the system (see here)
  • The performance of the floating-point and transfer-bandwidth benchmarks has slightly deteriorated. I have not investigated this yet, but I suspect more complex host-side handling to be the culprit.
  • The performance for the global-bandwidth benchmark has increased by up to 500%, thanks to reading from memory via TMU.

merlz42

Thu Dec 28, 2017 4:06 am

Is there a development channel on IRC/Discord or something? I wouldn’t mind trying to poke at things, but I don’t have a lot of experience with compiler infrastructure. Being able to ask questions would be very helpful.

doe300

Sat Dec 30, 2017 9:58 am

merlz42 wrote:
Thu Dec 28, 2017 4:06 am
Is there a development channel on irc/discord or something?
No, there isn't at the moment. If you have any questions, you could ask them in an issue on one of the GitHub projects or write me a private message on this board.

Gavinmc42

Wed Jan 03, 2018 12:16 pm

1Mil is probably not enough for the man years needed to make OpenCL work.
as how much extra would you pay to have an open CL version
5 man years may be enough to get it working .....
Or one student who missed this post ;)
viewtopic.php?f=63&t=23321

Interesting that the TMU resulted in a speed increase.
I had just read that this is because they can access memory.
RPF has told me the QPUs cannot access the GPIO peripherals, but the TMUs have memory access; does that include the peripheral registers?

Can the TMU's be used for bit banging I/O?
Is there one TMU for each QPU, so 12 TMUs, or one for each slice, i.e. 3 TMUs?
They have access to the TU L2 cache plus their own small L1 cache; do they have bit operators?
Do they have a 1bpp black-and-white mode?
As they are lookup-from-memory units, can they do the reverse, i.e. write to memory?

Wish I could read German. ;)
Please try to keep on topic, and make posts vaguely understandable. Or have you posted this in the wrong thread?
Sorry, my brain is NOT normal; it wanders a bit off track, triggered by random extrapolation of input, all perfectly logical to me, just not to anyone else. :oops: I also like to keep my ideas in the place where they happened; for the last year or so I have been forgetting things.
So sometimes my forum posts turn out to be my Journal, google will find it again ;)
If I forget how to google, time to stick me in a home.
ISP is Image System Pipeline btw.
I did know that once :o
Thanks, now even more worried :roll:

doe300

Wed Jan 03, 2018 5:57 pm

Gavinmc42 wrote:
Wed Jan 03, 2018 12:16 pm
RPF has told me the QPUs cannot access the GPIO peripherals, but the TMUs have memory access; does that include the peripheral registers?
I don't know, but I highly doubt that. AFAIK, the peripheral registers are not located in the physical memory, but are somehow mapped into the virtual address range to be accessed by the CPU. The QPUs, on the other hand, have different virtual addresses (they have a different "view" of the memory), so I doubt that they can access the peripherals.
Gavinmc42 wrote:
Wed Jan 03, 2018 12:16 pm
Is there one TMU for each QPU, so 12 TMUs, or one for each slice, i.e. 3 TMUs?
There are 2 TMUs per slice (shared by 4 QPUs). With the Raspberry Pi's configuration of 3 slices, there are 6 TMUs (for 12 QPUs), of which each QPU can access two (the ones in its own slice).
Gavinmc42 wrote:
Wed Jan 03, 2018 12:16 pm
They have access to the TU L2 cache plus their own small L1 cache; do they have bit operators?
Do they have a 1bpp black-and-white mode?
As they are lookup-from-memory units, can they do the reverse, i.e. write to memory?
A TMU can only read from memory, and only 1) read 32-bit values from arbitrary memory addresses (which may need to be mapped into the VideoCore address space) or 2) read pixels from images. Reading pixel data supports various image types (see the official documentation, table 18); one of them is 1-bit black-and-white (0 = white, 1 = black, I guess).

To write into memory, the VPM or the tile buffer must be used. AFAIK, the tile buffer cannot be used by non-shader programs.

MahindX

Wed Jan 10, 2018 9:16 am

Hello,
I read your Master's thesis and was very impressed by your work.
I wanted to try some examples with OpenCL on my Raspberry Pi 3, but I'm having trouble building the VC4C compiler.
I'm not experienced in compiling on and for Linux systems, so it could just be a configuration problem on my side.
Is there a pre-compiled version or package of your compiler and libs available (and where) for the Raspberry Pi 3?
This would be the easiest way for me (sudo apt-get ;-))
If this is not possible, can you offer step-by-step instructions to compile and install the components?
Many thanks!

doe300

Thu Jan 11, 2018 12:04 pm

MahindX wrote:
Wed Jan 10, 2018 9:16 am
Is there a pre-compiled version or package of your compiler and libs available (and where) for the Raspberry Pi 3?
This would be the easiest way for me (sudo apt-get ;-))
Yes, there is, but it is not yet as straightforward as I want it to be (see issue):
  1. Currently, you will need to download SPIRV-LLVM into /opt/SPIRV-LLVM/ and build it according to the instructions (Notes: 1. You will have to build "only" clang and llvm-spirv. 2. I will try to remove this step.)
  2. After that, you can open this URL and download and install the vc4cl-stdlib-***.deb and vc4c-***.deb artifacts from their specific URLs.
  3. The last step can be repeated with this URL for the vc4cl-***.deb package.
The second and third steps must be repeated for every new update of the corresponding project. They can be automated with this script (thanks to nomaddo), which can be used like this:

Code:

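# Fetch the artifact list of the latest successful VC4C build on master
curl "https://circleci.com/api/v1.1/project/github/doe300/VC4C/latest/artifacts?branch=master&filter=successful" --output /tmp/dump
# get_url.py picks the matching artifact URL from the dump; quote the result for wget
wget -O /tmp/vc4cl-stdlib.deb "$(python get_url.py "vc4cl-stdlib-" "/tmp/dump")"
wget -O /tmp/vc4c.deb "$(python get_url.py "vc4c-" "/tmp/dump")"
# Same for the latest successful VC4CL build
curl "https://circleci.com/api/v1.1/project/github/doe300/VC4CL/latest/artifacts?branch=master&filter=successful" --output /tmp/dump
wget -O /tmp/vc4cl.deb "$(python get_url.py "vc4cl-" "/tmp/dump")"
# Install all three packages in one dpkg call so their inter-dependencies resolve together
sudo dpkg -i /tmp/vc4cl-stdlib.deb /tmp/vc4c.deb /tmp/vc4cl.deb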
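The get_url.py script itself is not reproduced in the thread. Below is a hypothetical stand-in with the same command-line shape, written for Python 3 and assuming that the CircleCI v1.1 artifacts endpoint returns a JSON array whose entries carry a "url" field, and that the wanted artifact is the .deb whose file name starts with the given prefix; nomaddo's actual script may work differently:

```python
import json
import posixpath
import sys


def find_artifact_url(prefix, dump_path):
    """Return the URL of the first .deb artifact whose file name starts with prefix.

    Assumes dump_path holds the JSON array returned by the CircleCI v1.1
    artifacts endpoint and that every entry has a "url" key.
    """
    with open(dump_path, encoding="utf-8") as f:
        artifacts = json.load(f)
    for artifact in artifacts:
        url = artifact.get("url", "")
        name = posixpath.basename(url)  # ".../vc4c-0.4.deb" -> "vc4c-0.4.deb"
        if name.startswith(prefix) and name.endswith(".deb"):
            return url
    raise LookupError("no .deb artifact starting with %r in %s" % (prefix, dump_path))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Same call shape as in the thread: python get_url.py "vc4c-" "/tmp/dump"
    print(find_artifact_url(sys.argv[1], sys.argv[2]))
```

Matching on the start of the file name is what keeps the prefix "vc4c-" from accidentally selecting the vc4cl-stdlib package.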
MahindX wrote:
Wed Jan 10, 2018 9:16 am
If this is not possible, can you offer step-by-step instructions to compile and install the components?
Step-by-step instructions can be found here.

sibnick

Fri Jan 12, 2018 5:01 pm

doe300 wrote:
Thu Jan 11, 2018 12:04 pm
Notes: 1. You will have to build "only" clang and llvm-spirv.
Can you please share necessary build options?

doe300

Fri Jan 12, 2018 8:31 pm

The Docker image we use builds SPIRV-LLVM in its last lines.

A simpler and faster version would be:

Code:

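# Out-of-tree release build of the SPIRV-LLVM sources in /opt/SPIRV-LLVM
mkdir -p /opt/SPIRV-LLVM/build && cd /opt/SPIRV-LLVM/build
cmake ../ -DCMAKE_BUILD_TYPE=Release -DLLVM_BUILD_RUNTIME=Off -DLLVM_INCLUDE_TESTS=Off -DLLVM_INCLUDE_EXAMPLES=Off -DLLVM_ENABLE_BACKTRACES=Off -DLLVM_TARGETS_TO_BUILD=X86
# Build only the two tools needed here, with 4 parallel jobs
make -j4 clang llvm-spirv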
The number x in make -jx should match the number of CPU cores of the machine.

sibnick

Sat Jan 13, 2018 6:30 pm

Thanks! I successfully compiled VC4C and ran the tests.

paulreimer

Mon Jan 15, 2018 2:00 am

Back to the pocl interop question; IIRC OpenCL has support for multiple devices/contexts, so we could have one CPU device and one GPU device available for executing queues?

doe300

Mon Jan 15, 2018 8:40 am

paulreimer wrote:
Mon Jan 15, 2018 2:00 am
Back to the pocl interop question; IIRC OpenCL has support for multiple devices/contexts, so we could have one CPU device and one GPU device available for executing queues?
Yes, if both pocl and VC4CL are configured with the ICD loader, you can query a cl_platform for each implementation and then execute code on them in parallel.
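With the Khronos ICD loader on Linux, each installed implementation is registered through a .icd file under /etc/OpenCL/vendors/, and clGetPlatformIDs then reports one platform per implementation. A minimal sketch that lists those platform names through Python's ctypes, assuming the loader is installed as libOpenCL; the function name `list_platform_names` is hypothetical, it returns an empty list when no loader or no platform is found, and it only enumerates platforms rather than running anything on them:

```python
import ctypes
import ctypes.util

CL_SUCCESS = 0
CL_PLATFORM_NAME = 0x0902  # value from the OpenCL header CL/cl.h


def list_platform_names():
    """Return the names of all OpenCL platforms reported by the ICD loader."""
    path = ctypes.util.find_library("OpenCL")
    if path is None:
        return []
    try:
        cl = ctypes.CDLL(path)
    except OSError:
        return []

    # cl_int clGetPlatformIDs(cl_uint, cl_platform_id *, cl_uint *)
    cl.clGetPlatformIDs.argtypes = [ctypes.c_uint32, ctypes.POINTER(ctypes.c_void_p),
                                    ctypes.POINTER(ctypes.c_uint32)]
    cl.clGetPlatformIDs.restype = ctypes.c_int32
    # cl_int clGetPlatformInfo(cl_platform_id, cl_platform_info, size_t, void *, size_t *)
    cl.clGetPlatformInfo.argtypes = [ctypes.c_void_p, ctypes.c_uint32, ctypes.c_size_t,
                                     ctypes.c_void_p, ctypes.POINTER(ctypes.c_size_t)]
    cl.clGetPlatformInfo.restype = ctypes.c_int32

    # First call only asks how many platforms (implementations) are registered.
    count = ctypes.c_uint32(0)
    if cl.clGetPlatformIDs(0, None, ctypes.byref(count)) != CL_SUCCESS or count.value == 0:
        return []
    platforms = (ctypes.c_void_p * count.value)()
    if cl.clGetPlatformIDs(count.value, platforms, None) != CL_SUCCESS:
        return []

    names = []
    for platform in platforms:
        # Query the length of the name first, then fetch the string itself.
        size = ctypes.c_size_t(0)
        if cl.clGetPlatformInfo(platform, CL_PLATFORM_NAME, 0, None, ctypes.byref(size)) != CL_SUCCESS:
            continue
        buf = ctypes.create_string_buffer(size.value)
        if cl.clGetPlatformInfo(platform, CL_PLATFORM_NAME, size.value, buf, None) == CL_SUCCESS:
            names.append(buf.value.decode(errors="replace"))
    return names


if __name__ == "__main__":
    for name in list_platform_names():
        print(name)
```

With both PoCL and VC4CL registered, each would show up as its own entry, and a context and command queue can then be created per platform.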
