doe300
Posts: 28
Joined: Thu Dec 29, 2016 1:41 pm

OpenCL on the VideoCore IV!

Mon Oct 09, 2017 9:14 am

Not really graphics programming, more like GPGPU...

I spent the last six months of my master's thesis developing an OpenCL implementation that runs on the VideoCore IV GPU!

I present to you VC4CL (VideoCore IV OpenCL):

Of course it is far from complete, but it successfully runs about 50% of the OpenCL CTS test cases for supported features, 60% of the test programs of a slightly modified Boost.Compute library, 71% of the EasyCL test cases, as well as some other test programs.

Performance-wise, it beats the results of the pocl implementation in the floating-point benchmark (reaching up to 4 GFLOPS!), while its memory-access speed is, as expected, inferior (up to 120 MB/s).

The VC4C compiler accepts OpenCL C source code, LLVM-IR intermediate code and SPIR-V via the corresponding front-ends, and can use either standard LLVM or the Khronos SPIRV-LLVM as the front-end compiler. The VC4CL library can also be used with the Khronos ICD loader.
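To illustrate how the three input kinds differ, here is a minimal C++ sketch of how a compiler driver could tell them apart by their leading bytes. It is not VC4C's actual logic: the `InputKind` enum and `classifyInput` function are hypothetical, and the check for textual LLVM-IR is only a heuristic. The SPIR-V magic number (0x07230203) and the LLVM bitcode signature ('B', 'C', 0xC0, 0xDE) are the documented file signatures.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical input kinds (not VC4C's real API).
enum class InputKind { OpenCLSource, LLVMText, LLVMBitcode, SPIRV };

// Assemble the first four bytes into a 32-bit word in the given byte order.
static std::uint32_t firstWord(const std::vector<std::uint8_t>& d, bool littleEndian) {
    std::uint32_t w = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        const std::size_t idx = littleEndian ? 3 - i : i;
        w = (w << 8) | d[idx];
    }
    return w;
}

// Guess which front-end an input needs from its content (heuristic sketch).
InputKind classifyInput(const std::vector<std::uint8_t>& d) {
    constexpr std::uint32_t kSpirvMagic = 0x07230203u;
    if (d.size() >= 4) {
        // SPIR-V modules start with the magic number, in either byte order.
        if (firstWord(d, true) == kSpirvMagic || firstWord(d, false) == kSpirvMagic)
            return InputKind::SPIRV;
        // LLVM bitcode files start with 'B', 'C', 0xC0, 0xDE.
        if (d[0] == 'B' && d[1] == 'C' && d[2] == 0xC0 && d[3] == 0xDE)
            return InputKind::LLVMBitcode;
    }
    // Textual LLVM-IR written by clang usually begins with "; ModuleID".
    const std::string text(d.begin(), d.end());
    if (text.rfind("; ModuleID", 0) == 0) return InputKind::LLVMText;
    // Anything else is treated as OpenCL C source.
    return InputKind::OpenCLSource;
}
```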

Notable features not (yet) supported:
  • 64-bit data-types (long, double)
  • linking of multiple source code files
  • images (WIP)
  • full mathematical accuracy in many places (WIP)
  • performance optimizations (mostly within the compiler)
The code can be found here (the runtime library), here (the compiler) and here (the standard library).

NOTE: Due to the lack of an MMU between the VPM and the RAM, as well as the memory mapping required to access the V3D registers, applications using VC4CL must be run as root!

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 18210
Joined: Sat Jul 30, 2011 7:41 pm

Re: OpenCL on the VideoCore IV!

Mon Oct 09, 2017 1:10 pm

That sounds like very good work indeed, nice one.
Principal Software Engineer at Raspberry Pi (Trading) Ltd.
Please direct all questions to the forum, I do not do support via PM.

eupton
Forum Moderator
Posts: 29
Joined: Sun Apr 15, 2012 7:28 pm

Re: OpenCL on the VideoCore IV!

Mon Oct 09, 2017 9:04 pm

Is a copy of your Master's thesis available online? I'd love to know a bit more about the challenges you encountered getting this to work. I'm particularly interested in your approach to writing to memory via the VDW (which, as you observe, is not the fastest thing in the world).

Gavinmc42
Posts: 1555
Joined: Wed Aug 28, 2013 3:31 am

Re: OpenCL on the VideoCore IV!

Tue Oct 10, 2017 3:03 am

Wow, someone give this poster a job quick.
I'm dancing on Rainbows.
Raspberries are not Apples or Oranges

doe300

Re: OpenCL on the VideoCore IV!

Tue Oct 10, 2017 8:12 am

eupton wrote:
Mon Oct 09, 2017 9:04 pm
Is a copy of your Master's thesis available online?
Not yet, since I haven't turned it in yet. I will upload it once I have, though it's written in German.
eupton wrote:
Mon Oct 09, 2017 9:04 pm
I'd love to know a bit more about the challenges you encountered getting this to work. I'm particularly interested in your approach to writing to memory via the VDW (which, as you observe, is not the fastest thing in the world).
Getting memory access to work (especially making it freely configurable for 1, 2, 3, 4, 8 and 16 elements of 1-byte, 2-byte and 4-byte types) was not easy. Akane was a great help in getting it working, in this thread.

The basic steps of what is done for memory writes:
  1. Lock the hardware mutex to prevent other QPUs from overwriting the configuration, since all QPUs share the same VPM
  2. Configure access from QPU to VPM (byte-size of type, number of elements and number of vectors to write)
  3. Write the correct number of vectors into the VPM
  4. Configure the DMA (byte-size of type, number of elements and number of vectors to write)
  5. Write the memory address to initiate the DMA write
  6. Read the DMA wait register to wait for the DMA access to finish
  7. Unlock the hardware mutex
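The seven steps above can be modelled in ordinary C++ to show why the ordering matters: the configuration written in step 2 is shared by all QPUs, so it must not change between the VPM write and the DMA. This is a minimal sketch assuming a software stand-in for the hardware (`SharedVPM` and `writeViaVPM` are hypothetical names); it does not touch real V3D registers or reproduce the QPU code VC4CL generates.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <mutex>
#include <stdexcept>
#include <vector>

// Software model of the shared VPM. All names are hypothetical stand-ins:
// std::mutex plays the hardware mutex, `buffer` plays the VPM memory and
// std::memcpy plays both the QPU -> VPM writes and the VDW DMA transfer.
struct SharedVPM {
    std::mutex hwMutex;
    std::vector<std::uint8_t> buffer;
    // Shared configuration that another QPU could overwrite without the lock.
    std::size_t elementSize = 0, numElements = 0, numVectors = 0;
    explicit SharedVPM(std::size_t bytes) : buffer(bytes) {}
};

// Stage `numVectors` vectors of `numElements` elements (each `elementSize`
// bytes) from `src` in the VPM, then "DMA" them to `dest`.
void writeViaVPM(SharedVPM& vpm, const void* src, void* dest,
                 std::size_t elementSize, std::size_t numElements,
                 std::size_t numVectors) {
    const std::size_t bytes = elementSize * numElements * numVectors;
    if (bytes > vpm.buffer.size()) throw std::length_error("does not fit into the VPM");

    // 1. Lock the mutex (released automatically at the end of scope, step 7).
    std::lock_guard<std::mutex> lock(vpm.hwMutex);
    // 2. Configure QPU -> VPM access: type size, element count, vector count.
    vpm.elementSize = elementSize;
    vpm.numElements = numElements;
    vpm.numVectors = numVectors;
    // 3. Write the vectors into the VPM.
    std::memcpy(vpm.buffer.data(), src, bytes);
    // 4. Configure the DMA from the same shared configuration.
    const std::size_t dmaBytes = vpm.elementSize * vpm.numElements * vpm.numVectors;
    // 5./6. Start the transfer and wait for it; memcpy is synchronous, so the
    //       wait is implicit here, unlike on the real hardware.
    std::memcpy(dest, vpm.buffer.data(), dmaBytes);
    // 7. Unlock: happens in lock_guard's destructor.
}
```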
Gavinmc42 wrote:Wow, someone give this poster a job quick.
I'd gladly accept ;)
Last edited by doe300 on Tue Oct 10, 2017 2:43 pm, edited 1 time in total.

Gavinmc42

Re: OpenCL on the VideoCore IV!

Tue Oct 10, 2017 8:58 am

How's your Japanese :lol:

I suspect Akane might have something to do with this.
https://idein.jp/
https://github.com/nineties/py-videocore
Either that or there is more than one Japanese QPU guru ;)

I do find it interesting that anyone who does AI/ML stuff with the VC4 GPU/QPUs seems to end up at MS or Intel or Google or...
Start brushing up your resume, passport photo, etc ;)

doe300

Re: OpenCL on the VideoCore IV!

Thu Oct 12, 2017 11:18 am

Gavinmc42 wrote:
Tue Oct 10, 2017 8:58 am
How's your Japanese :lol:
Google translate for the win! :?
Gavinmc42 wrote:
Tue Oct 10, 2017 8:58 am
I do find it interesting that anyone who does AI/ML stuff with the VC4 GPU/QPUs seems to end up at MS or Intel or Google or...
Start brushing up your resume, passport photo, etc ;)
Not really my first choice of employers, Intel would be okay though ;)

blackshard83
Posts: 67
Joined: Fri Jan 10, 2014 8:31 am

Re: OpenCL on the VideoCore IV!

Tue Oct 17, 2017 10:31 am

Great job indeed!
Congratulations!

jbeale
Posts: 3302
Joined: Tue Nov 22, 2011 11:51 pm

Re: OpenCL on the VideoCore IV!

Wed Oct 18, 2017 4:26 pm

This sounds very impressive but I'm not sure I understand the implications. Are there any examples of this in action? Does this mean we might be able to get better performance on compute-intensive tasks, like image recognition? Right now there are neural-network "deep learning" based object-detection programs that run on the RPi3 and take just over 1 second to process one video frame and detect the location of objects (chair, person, etc.) in it, for example: https://www.pyimagesearch.com/2017/10/1 ... ent-437929

These neural-network programs spend most of the CPU time doing a huge number of simple multiply-and-add instructions to go from an input array of pixels to the output of predicted object locations. To compare some actual numbers, the Google MobileNets project https://research.googleblog.com/2017/06 ... s-for.html offers several versions of an object detector and classifier, requiring from 14 to 569 million MACs (multiply-accumulate operations) per frame depending on what accuracy you want.

The deep learning code I've seen on the RPi so far runs entirely on the CPU. Would this OpenCL work enable such an application to leverage the GPU to reach higher frame rates? When an object-recognition program runs at 0.9 fps it is not fast enough for some real-time applications, but if for example a 2 or 3x speedup was possible, that would start to become more useful, and of course the more the better.

doe300

Re: OpenCL on the VideoCore IV!

Wed Oct 18, 2017 7:07 pm

jbeale wrote:
Wed Oct 18, 2017 4:26 pm
This sounds very impressive but I'm not sure I understand the implications. Are there any examples of this in action?
No, not yet. I can run several test cases, but I haven't really tried it with any real-world application yet.
jbeale wrote:
Wed Oct 18, 2017 4:26 pm
Does this mean we might be able to get better performance on compute-intensive tasks, like image recognition? [...] These neural-network programs spend most of the CPU time doing a huge number of simple multiply-and-add instructions to go from an input array of pixels to the output of predicted object locations.
Yes, probably. Using the GPU for OpenCL calculations definitely has the advantage that the CPU is free to do other work. For small OpenCL kernels, the performance will probably be worse than native execution on the CPU, since there is some overhead in starting kernels. For larger kernels, however, the GPU outperforms the CPU, especially for parallel tasks.
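The launch-overhead argument can be made concrete with a simple linear cost model, which is an assumption for illustration and not measured VC4CL behaviour: the GPU needs a fixed overhead plus the work divided by its throughput, while the CPU needs only the work divided by its (lower) throughput. The function name `breakEvenWork` is hypothetical.

```cpp
#include <limits>

// Linear cost model (assumed, not measured):
//   t_gpu(n) = overheadSeconds + n / gpuRate
//   t_cpu(n) = n / cpuRate
// Returns the amount of work n (e.g. in FLOPs) above which the GPU finishes
// first, or +infinity if the GPU is never faster than the CPU.
double breakEvenWork(double overheadSeconds, double cpuRate, double gpuRate) {
    if (gpuRate <= cpuRate) return std::numeric_limits<double>::infinity();
    return overheadSeconds / (1.0 / cpuRate - 1.0 / gpuRate);
}
```

With placeholder values of 1 ms launch overhead, 1 GFLOPS on the CPU and 4 GFLOPS on the GPU, `breakEvenWork(1e-3, 1e9, 4e9)` is about 1.33 million operations: smaller kernels finish sooner on the CPU, larger ones on the GPU.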

Performance-wise, I measured up to 4 GFLOPS with the clpeak floating-point benchmark (out of a theoretical maximum of 24 GFLOPS for the VC4 GPU), which is a lot more than the "original" Raspberry Pi A or B can achieve and about as high as the theoretical maximum computing power of the Raspberry Pi 3 CPU without NEON instructions.
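The 24 GFLOPS figure can be reproduced from the commonly cited QPU parameters, which are assumptions here rather than numbers stated in this thread: 12 QPUs, each processing 4 SIMD lanes per clock cycle (a 16-wide vector instruction takes 4 cycles), with one add and one multiply issued per instruction, at a 250 MHz clock.

```cpp
// Theoretical peak of the VideoCore IV QPUs under the assumptions above:
// qpus * lanes processed per cycle * operations per lane (add + mul) * clock.
constexpr double peakFlops(double clockHz, int qpus = 12,
                           int lanesPerCycle = 4, int opsPerLane = 2) {
    return qpus * lanesPerCycle * opsPerLane * clockHz;
}

// Fraction of the theoretical peak reached by a measured throughput.
constexpr double efficiency(double measuredFlops, double clockHz) {
    return measuredFlops / peakFlops(clockHz);
}
```

This gives `peakFlops(250e6)` = 24 GFLOPS, so the measured 4 GFLOPS is roughly 17% of the peak; with the 300 MHz GPU clock reported for the Raspberry Pi 3 later in the thread, the same assumptions give 28.8 GFLOPS.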

So, in theory, OpenCL on the VideoCore IV GPU should increase the performance of such applications. Currently, the greatest obstacle is not performance, but the fact that the implementation is not yet complete and will most likely produce some wrong results. You are definitely welcome to try it out and give feedback on the performance or correctness of such applications!
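As a rough check against the MobileNets numbers quoted above, a compute-only upper bound on the frame rate follows from counting each multiply-accumulate as two floating-point operations and assuming the 4 GFLOPS measured with clpeak could be sustained. This is optimistic: it ignores memory traffic, which at roughly 110 to 120 MB/s is likely to be the real limit. The function name is illustrative.

```cpp
// Optimistic frame-rate bound for a network needing `macsPerFrame`
// multiply-accumulates per frame on a device sustaining `flops`
// floating-point operations per second (1 MAC = 1 multiply + 1 add).
// Memory bandwidth and kernel-launch overhead are ignored.
constexpr double maxFramesPerSecond(double macsPerFrame, double flops) {
    return flops / (2.0 * macsPerFrame);
}
```

For the 569 million MAC variant this gives about 3.5 frames per second, and about 143 for the 14 million MAC variant. These are ceilings, not predictions.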

jbeale

Re: OpenCL on the VideoCore IV!

Wed Oct 18, 2017 8:59 pm

Thanks for that informative reply. I assume the difference between the 4 GFLOPS in practice and the 24 GFLOPS in theory has to do with memory bandwidth?

doe300

Re: OpenCL on the VideoCore IV!

Wed Oct 18, 2017 9:16 pm

jbeale wrote:
Wed Oct 18, 2017 8:59 pm
Thanks for that informative reply. I assume the difference between the 4 GFLOPS in practice and the 24 GFLOPS in theory has to do with memory bandwidth?
Yes. Partly it is due to unoptimized instructions, but most of the performance loss is due to memory access speed (currently about 110 MB/s in the clpeak bandwidth benchmark), which is a big bottleneck for the VC4 GPU.
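The effect of the bandwidth limit can be quantified with the roofline model: a kernel's attainable throughput is the smaller of the compute peak and its arithmetic intensity (FLOPs per byte transferred) times the memory bandwidth. The sketch below plugs in the figures from this thread (about 110 MB/s, 4 GFLOPS measured, 24 GFLOPS theoretical); the function names are illustrative.

```cpp
// Roofline model: throughput is limited either by compute or by how fast
// memory can deliver operands.
constexpr double attainableFlops(double flopsPerByte, double peakFlops,
                                 double bytesPerSecond) {
    return flopsPerByte * bytesPerSecond < peakFlops
               ? flopsPerByte * bytesPerSecond
               : peakFlops;
}

// Arithmetic intensity needed for memory bandwidth to stop being the limit
// at the given target throughput.
constexpr double requiredFlopsPerByte(double targetFlops, double bytesPerSecond) {
    return targetFlops / bytesPerSecond;
}
```

At 110 MB/s, `requiredFlopsPerByte(4e9, 110e6)` is about 36 FLOPs per byte, i.e. roughly 145 operations for every 4-byte float loaded; a kernel doing one operation per float (0.25 FLOPs/byte) is capped at `attainableFlops(0.25, 24e9, 110e6)` = 27.5 MFLOPS.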

Gavinmc42

Re: OpenCL on the VideoCore IV!

Thu Oct 19, 2017 1:41 am

OpenCL on the Pi QPUs means that things like the ARM Compute Library now have a chance of being ported.
The Compute Library runs on ARM, NEON and the Mali GPU; now there is more chance of it running on VC4.

The Pi 3 has bigger caches, could the code be made small enough to mostly work in those to avoid accessing the DDR?
RPF says NEON would probably be faster than QPU, but why not use both at the same time :D

Then there is quite a bit of OpenCL code out there that could now run on Pis, even Zeros.
Poor man's NEON for the BCM2835 Pis.

It is not just OpenCL, but also the toolset (LLVM etc.) needed to get OpenCL working, which can be used for other things too.
This is just the beginning, who knows where this could lead?

doe300

Re: OpenCL on the VideoCore IV!

Thu Oct 19, 2017 9:45 am

Gavinmc42 wrote:
Thu Oct 19, 2017 1:41 am
The Pi 3 has bigger caches, could the code be made small enough to mostly work in those to avoid accessing the DDR?
The size of the code depends on its purpose, so the compiler cannot force it to stay below some limit. Also, VC4CL does not really take the L2, instruction and uniform caches into account, since it cannot really influence their behaviour, except for forcing the L2 cache to be cleared.

If you are referring to the VPM cache size, there is an optimization in development that implements a write-back cache using the VPM to reduce the amount of memory access required. Currently, the VPM only caches data for successive reads or writes to consecutive memory addresses from a single QPU.
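The current behaviour, batching only successive accesses to consecutive addresses, works like a write-combining buffer. The sketch below models that idea in plain C++; the `WriteCombiner` class and its methods are hypothetical, not VC4CL code, and it does not model the planned write-back cache. Writes to consecutive addresses are gathered and flushed as one transfer, while a non-consecutive write or a full buffer forces a flush.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical model (not VC4CL code) of batching consecutive 32-bit writes:
// a run of writes to consecutive addresses is gathered and handed to `Flush`
// as one transfer, standing in for a single VPM -> RAM DMA.
class WriteCombiner {
public:
    using Flush = std::function<void(std::uint32_t startAddress,
                                     const std::vector<std::uint32_t>& words)>;

    WriteCombiner(std::size_t capacityWords, Flush flush)
        : capacity_(capacityWords), flush_(std::move(flush)) {}

    void write(std::uint32_t address, std::uint32_t value) {
        if (!pending_.empty()) {
            const std::uint32_t next =
                start_ + 4u * static_cast<std::uint32_t>(pending_.size());
            // A gap in the address sequence or a full buffer ends the run.
            if (address != next || pending_.size() == capacity_) flush();
        }
        if (pending_.empty()) start_ = address;
        pending_.push_back(value);
    }

    // Emit the gathered run (if any) as one transfer.
    void flush() {
        if (pending_.empty()) return;
        flush_(start_, pending_);
        pending_.clear();
    }

private:
    std::size_t capacity_;
    Flush flush_;
    std::uint32_t start_ = 0;
    std::vector<std::uint32_t> pending_;
};
```

Eight consecutive writes followed by one write elsewhere then cost two transfers instead of nine.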

sibnick
Posts: 5
Joined: Wed Oct 25, 2017 11:24 am

Re: OpenCL on the VideoCore IV!

Thu Oct 26, 2017 7:19 am

I can't build VC4C.
Can you give some advice?

Code:

pi@raspberrypi:~/opencl/build_VC4C $ cmake ../VC4C
-- VC4CL standard library headers found: /home/pi/opencl/VC4C/../VC4CLStdLib/include/VC4CLStdLib.h
-- The C compiler identification is GNU 4.9.2
-- The CXX compiler identification is GNU 4.9.2
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Enabling multi-threaded optimizations
-- Khronos OpenCL toolkit: /usr/local/bin
-- CLang compiler found: /usr/bin/clang
-- Compiling SPIR-V front-end...
-- Configuring done
-- Generating done
-- Build files have been written to: /home/pi/opencl/build_VC4C
pi@raspberrypi:~/opencl/build_VC4C $ make
Scanning dependencies of target vc4cl-stdlib
In file included from /home/pi/opencl/VC4C/../VC4CLStdLib/include/VC4CLStdLib.h:17:
In file included from /home/pi/opencl/VC4C/../VC4CLStdLib/include/_config.h:81:
/home/pi/opencl/VC4C/../VC4CLStdLib/include/opencl-c.h:7460:21: warning: incompatible redeclaration of library function 'acos'
float __ovld __cnfn acos(float);
                    ^
/home/pi/opencl/VC4C/../VC4CLStdLib/include/opencl-c.h:7460:21: note: 'acos' is a builtin with type 'double (double)'
/home/pi/opencl/VC4C/../VC4CLStdLib/include/opencl-c.h:7486:21: warning: incompatible redeclaration of library function 'acosh'
float __ovld __cnfn acosh(float);
...................
................... a lot of such warnings .........
....................
/home/pi/opencl/VC4C/../VC4CLStdLib/include/opencl-c.h:10080:21: note: 'abs' is a builtin with type 'int (int)'
/home/pi/opencl/VC4C/../VC4CLStdLib/include/opencl-c.h:15570:5: warning: incompatible redeclaration of library function 'printf'
int printf(__constant const char* st, ...);
    ^
/home/pi/opencl/VC4C/../VC4CLStdLib/include/opencl-c.h:15570:5: note: 'printf' is a builtin with type 'int (const char *, ...)'
In file included from /home/pi/opencl/VC4C/../VC4CLStdLib/include/VC4CLStdLib.h:19:
In file included from /home/pi/opencl/VC4C/../VC4CLStdLib/include/_conversions.h:12:
/home/pi/opencl/VC4C/../VC4CLStdLib/include/_intrinsics.h:220:49: error: can't convert between vector values of different size ('uchar16' (vector of 16 'uchar' values) and 'int')
SIMPLE_1(uchar, vc4cl_msb_set, uchar, val, (val >> 7))
                                            ~~~ ^  ~
/home/pi/opencl/VC4C/../VC4CLStdLib/include/_overloads.h:99:10: note: expanded from macro 'SIMPLE_1'
                return content; \
                       ^
In file included from /home/pi/opencl/VC4C/../VC4CLStdLib/include/VC4CLStdLib.h:19:
In file included from /home/pi/opencl/VC4C/../VC4CLStdLib/include/_conversions.h:12:
/home/pi/opencl/VC4C/../VC4CLStdLib/include/_intrinsics.h:220:49: error: can't convert between vector values of different size ('uchar8' (vector of 8 'uchar' values) and 'int')
SIMPLE_1(uchar, vc4cl_msb_set, uchar, val, (val >> 7))
                                            ~~~ ^  ~
/home/pi/opencl/VC4C/../VC4CLStdLib/include/_overloads.h:105:10: note: expanded from macro 'SIMPLE_1'
                return content; \
...................
................... a lot of such errors .........
....................

doe300

Re: OpenCL on the VideoCore IV!

Thu Oct 26, 2017 7:26 pm

sibnick wrote:
Thu Oct 26, 2017 7:19 am
I can't build VC4C
Can you give some advice?
What version of LLVM/Clang do you have? Do you use the LLVM version from the Raspbian repositories?

It looks like I broke compilation with the default LLVM somewhere along the way (I personally almost exclusively use SPIRV-LLVM). I will run more thorough tests tomorrow.


Edit:
On my Raspberry Pi, I have no problem compiling the VC4CL standard-library headers with the default Clang, version 3.9.

sibnick

Re: OpenCL on the VideoCore IV!

Fri Oct 27, 2017 5:52 pm

I upgraded clang to version 3.9 and started using SPIRV-LLVM.
Now I get the following errors:

Code:

 const Value IMAGE_CONFIG_ACCESS_OFFSET(Literal(sizeof(unsigned)), TYPE_INT32);
                                                                ^
/home/pi/opencl/VC4C/src/intrinsics/Images.cpp:20:64: note: candidates are:
In file included from /home/pi/opencl/VC4C/src/intrinsics/../Module.h:12:0,
                 from /home/pi/opencl/VC4C/src/intrinsics/Images.h:12,
                 from /home/pi/opencl/VC4C/src/intrinsics/Images.cpp:7:
/home/pi/opencl/VC4C/src/intrinsics/../Values.h:158:3: note: constexpr vc4c::Literal::Literal(vc4c::Literal&&)
   Literal(Literal&&) = default;
   ^
/home/pi/opencl/VC4C/src/intrinsics/../Values.h:157:3: note: constexpr vc4c::Literal::Literal(const vc4c::Literal&)
   Literal(const Literal&) = default;
   ^
/home/pi/opencl/VC4C/src/intrinsics/../Values.h:155:3: note: vc4c::Literal::Literal(bool)
   Literal(const bool flag);
   ^
/home/pi/opencl/VC4C/src/intrinsics/../Values.h:154:3: note: vc4c::Literal::Literal(double)
   Literal(const double real);
   ^
/home/pi/opencl/VC4C/src/intrinsics/../Values.h:153:12: note: vc4c::Literal::Literal(long unsigned int)
   explicit Literal(const long unsigned integer);
            ^
/home/pi/opencl/VC4C/src/intrinsics/../Values.h:152:3: note: vc4c::Literal::Literal(long int)
   Literal(const long integer);
After changing src/intrinsics/Images.cpp to:

Code:

const Value IMAGE_CONFIG_ACCESS_OFFSET(Literal(static_cast<long int>(sizeof(unsigned))), TYPE_INT32);
I get an error at the RegressionTest step:

Code:

[ 94%] Building CXX object build/test/CMakeFiles/TestVC4C.dir/RegressionTest.cpp.o
/home/pi/opencl/VC4C/test/RegressionTest.cpp:324:1: error: converting to ‘std::tuple<unsigned char, unsigned char, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> > >’ from initializer list would use explicit constructor ‘constexpr std::tuple< <template-parameter-1-1> >::tuple(_UElements&& ...) [with _UElements = {const unsigned char&, const unsigned char&, const char (&)[20], const char (&)[1]}; <template-parameter-2-2> = void; _Elements = {unsigned char, unsigned char, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> >}]’
 };
 ^
/home/pi/opencl/VC4C/test/RegressionTest.cpp:324:1: error: converting to ‘std::tuple<unsigned char, unsigned char, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> > >’ from initializer list would use explicit constructor ‘constexpr std::tuple< <template-parameter-1-1> >::tuple(_UElements&& ...) [with _UElements = {const unsigned char&, const unsigned char&, const char (&)[23], const char (&)[1]}; <template-parameter-2-2> = void; _Elements = {unsigned char, unsigned char, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> >}]’

doe300

Re: OpenCL on the VideoCore IV!

Sat Oct 28, 2017 12:18 pm

sibnick wrote:
Fri Oct 27, 2017 5:52 pm
Now I get the following errors:
These should all be fixed in the latest GitHub version, along with a lot of compiler warnings for both host-side projects.

sibnick

Re: OpenCL on the VideoCore IV!

Mon Oct 30, 2017 11:27 am

Thanks!
Now I can build everything (including the VC4CL tests).
Unfortunately, the VC4CL tests fail:

Code:

root@raspberrypi:/home/pi/opencl/VC4CL/build# build/test/TestVC4CL 
Running suite 'TestSystem' with 1 tests...
Test 'TestSystem::testGetSystemInfo()' failed!
	Suite: TestSystem
	File: TestSystem.cpp
	Line: 24
	Failure: Got 0, expected 16
Test-method 'TestSystem::testGetSystemInfo()' finished with errors!
Suite 'TestSystem' finished, 0/1 successful (0%) in 2175 microseconds (2.175 ms).
Running suite 'TestPlatform' with 2 tests...
Suite 'TestPlatform' finished, 2/2 successful (100%) in 266 microseconds (0.266 ms).
Running suite 'TestDevice' with 5 tests...
Test 'TestDevice::testGetDeviceInfo()' failed!
	Suite: TestDevice
	File: TestDevice.cpp
	Line: 151
	Failure: Got 300, expected 250
Test-method 'TestDevice::testGetDeviceInfo()' finished with errors!
Suite 'TestDevice' finished, 4/5 successful (80%) in 1535 microseconds (1.535 ms).
Running suite 'TestContext' with 5 tests...
Suite 'TestContext' finished, 5/5 successful (100%) in 193 microseconds (0.193 ms).
Running suite 'TestCommandQueue' with 4 tests...
Suite 'TestCommandQueue' finished, 4/4 successful (100%) in 552 microseconds (0.552 ms).
Running suite 'TestBuffer' with 17 tests...
Suite 'TestBuffer' finished, 17/17 successful (100%) in 2513 microseconds (2.513 ms).
Running suite '' with 0 tests...
Suite '' finished, 0/0 successful (0%) in 0 microseconds (0 ms).
Running suite 'TestProgram' with 11 tests...
Test 'TestProgram::testCompileProgram()' failed!
	Suite: TestProgram
	File: TestProgram.cpp
	Line: 101
	Failure: Got -3, expected 0
Test-method 'TestProgram::testCompileProgram()' finished with errors!
Test 'TestProgram::testLinkProgram()' failed!
	Suite: TestProgram
	File: TestProgram.cpp
	Line: 109
	Failure: Got -59, expected 0
	..........................

My clinfo output looks OK:

Code:

Number of platforms                               1
  Platform Name                                   OpenCL for the Raspberry Pi VideoCore IV GPU
  Platform Vendor                                 doe300
  Platform Version                                OpenCL 1.2 VC4CL 0.4
  Platform Profile                                EMBEDDED_PROFILE
  Platform Extensions                             cl_khr_il_program cl_altera_device_temperature cl_arm_shared_virtual_memory cl_khr_icd cl_vc4cl_performance_counters
  Platform Extensions function suffix             VC4CL

  Platform Name                                   OpenCL for the Raspberry Pi VideoCore IV GPU
Number of devices                                 1
  Device Name                                     VideoCore IV GPU
  Device Vendor                                   Broadcom
  Device Vendor ID                                0xa5c
  Device Version                                  OpenCL 1.2 VC4CL 0.4
  Driver Version                                  0.4
  Device OpenCL C Version                         OpenCL C 1.2 
.........................................
BTW, I tried running VC4C from the command line and got:

Code:

root@raspberrypi:/home/pi/opencl/VC4CL/test# VC4C -o /tmp/fft2 fft2_2.cl 
Compiling 'fft2_2.cl' into '/tmp/fft2' with options '' ...
[I] Mon Oct 30 11:25:44 2017: Compiling OpenCL to LLVM-IR with :/usr/bin/clang -m32  -I . -O3  -Wno-undefined-inline -Wno-unused-parameter -Wno-unused-local-typedef -Wno-gcc-compat -include-pch /home/pi/opencl/VC4C/include/VC4CLStdLib.h.pch -x cl -S -emit-llvm -o /dev/stdout fft2_2.cl
[E] Mon Oct 30 11:25:45 2017: Errors in precompilation:
[E] Mon Oct 30 11:25:45 2017: 
terminate called after throwing an instance of 'vc4c::CompilationError'
  what():  Pre-compilation: Error in precompilation: error: OpenCL version was �_�U
Aborted


doe300

Re: OpenCL on the VideoCore IV!

Mon Oct 30, 2017 7:11 pm

The TestVC4CL failures (e.g. in TestProgram) are partly because I haven't updated TestVC4CL in a while. I will fix, update and extend the TestVC4CL tests in the next few days.
The error in TestSystem is due to a V3D register somehow having the wrong value. In this case, the register is expected to report 16 hardware semaphores, but reports none. I had a similar error with the number of QPUs being way off, which I couldn't fix either.

The error in TestDevice suggests you are running on a Raspberry Pi 3, which apparently has a GPU clock of 300 MHz instead of the 250 MHz of the other models. I will have to update the test.

I don't know where the error in VC4C comes from. Can you run the pre-compile command on its own and post the output?

Code:

/usr/bin/clang -m32  -I . -O3  -Wno-undefined-inline -Wno-unused-parameter -Wno-unused-local-typedef -Wno-gcc-compat -include-pch /home/pi/opencl/VC4C/include/VC4CLStdLib.h.pch -x cl -S -emit-llvm -o /dev/stdout fft2_2.cl

sibnick

Re: OpenCL on the VideoCore IV!

Mon Oct 30, 2017 11:22 pm

Sure

Code:

root@raspberrypi:/home/pi/opencl/VC4C/example# /usr/bin/clang -m32  -I . -O3  -Wno-undefined-inline -Wno-unused-parameter -Wno-unused-local-typedef -Wno-gcc-compat -include-pch /home/pi/opencl/VC4C/include/VC4CLStdLib.h.pch -x cl -S -emit-llvm -o /dev/stdout fft2_2.cl
error: OpenCL version was �_�U9@�_�UH �_�V�� in PCH file but is currently ob� ,
l��a�
      ,
�H�\�

Could it depend on the kernel too?
I have the latest kernel:

Code:

root@raspberrypi:/home/pi/opencl/VC4C/example# uname -a
Linux raspberrypi 4.9.59-v7+ #1047 SMP Sun Oct 29 12:19:23 GMT 2017 armv7l GNU/Linux

doe300

Re: OpenCL on the VideoCore IV!

Tue Oct 31, 2017 7:52 am

Have you updated the OpenCL headers package or clang after building the PCH file (via vc4cl-stdlib)? If so, you might need to rebuild it by deleting include/VC4CLStdLib.h.pch in the VC4C project and re-running the vc4cl-stdlib target.

Since the PCH (precompiled header) is an internal state of the LLVM compiler, it is very sensitive to changes in the compiler or in any of the sources used (e.g. the VC4CLStdLib project or the system OpenCL headers), and clang generally throws errors on even the smallest difference between the PCH file and, e.g., the compiler version or the source headers.
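Since a PCH is only valid for the exact compiler and headers it was built with, a build script can detect an obviously stale PCH the same way make does, by comparing modification times. Below is a minimal C++17 sketch using `std::filesystem`; the function name `pchNeedsRebuild` and the choice of dependencies (for example the clang binary, VC4CLStdLib.h and the system OpenCL headers) are assumptions. A compiler upgraded from a package may keep an older timestamp, so this check does not replace deleting the PCH after an upgrade.

```cpp
#include <filesystem>
#include <vector>

namespace fs = std::filesystem;

// Returns true if the precompiled header is missing or older than any file it
// was built from, in which case it should be deleted and regenerated.
bool pchNeedsRebuild(const fs::path& pch, const std::vector<fs::path>& deps) {
    if (!fs::exists(pch)) return true;
    const fs::file_time_type pchTime = fs::last_write_time(pch);
    for (const fs::path& dep : deps) {
        // A missing dependency is treated as changed.
        if (!fs::exists(dep) || fs::last_write_time(dep) > pchTime) return true;
    }
    return false;
}
```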

It should not depend on the kernel, since I don't use any kernel-related software to run OpenCL kernels, except the mailbox interface.

sibnick

Re: OpenCL on the VideoCore IV!

Thu Nov 02, 2017 6:09 am

The latest version fails with:

Code:

In file included from /home/pi/opencl/VC4C/src/Locals.h:12:0,
                 from /home/pi/opencl/VC4C/src/Locals.cpp:7:
/home/pi/opencl/VC4C/src/Values.h:287:33: error: call of overloaded ‘Literal(long int)’ is ambiguous
  const Value INT_ZERO(Literal(0L), TYPE_INT8);
                                 ^
/home/pi/opencl/VC4C/src/Values.h:287:33: note: candidates are:
/home/pi/opencl/VC4C/src/Values.h:153:3: note: constexpr vc4c::Literal::Literal(vc4c::Literal&&)
   Literal(Literal&&) = default;
   ^
/home/pi/opencl/VC4C/src/Values.h:152:3: note: constexpr vc4c::Literal::Literal(const vc4c::Literal&)
   Literal(const Literal&) = default;
   ^
/home/pi/opencl/VC4C/src/Values.h:150:3: note: vc4c::Literal::Literal(bool)
   Literal(const bool flag);
   ^
/home/pi/opencl/VC4C/src/Values.h:149:3: note: vc4c::Literal::Literal(double)
   Literal(const double real);
   ^
/home/pi/opencl/VC4C/src/Values.h:148:12: note: vc4c::Literal::Literal(uint64_t)
   explicit Literal(const uint64_t integer);
            ^
/home/pi/opencl/VC4C/src/Values.h:147:3: note: vc4c::Literal::Literal(int64_t)
   Literal(const int64_t integer);
   ^
/home/pi/opencl/VC4C/src/Values.h:288:32: error: call of overloaded ‘Literal(long int)’ is ambiguous
  const Value INT_ONE(Literal(1L), TYPE_INT8);
                                ^
/home/pi/opencl/VC4C/src/Values.h:288:32: note: candidates are:
/home/pi/opencl/VC4C/src/Values.h:153:3: note: constexpr vc4c::Literal::Literal(vc4c::Literal&&)
   Literal(Literal&&) = default;
   ^
/home/pi/opencl/VC4C/src/Values.h:152:3: note: constexpr vc4c::Literal::Literal(const vc4c::Literal&)
   Literal(const Literal&) = default;
   ^
/home/pi/opencl/VC4C/src/Values.h:150:3: note: vc4c::Literal::Literal(bool)
   Literal(const bool flag);
   ^
/home/pi/opencl/VC4C/src/Values.h:149:3: note: vc4c::Literal::Literal(double)
   Literal(const double real);
   ^

doe300

Re: OpenCL on the VideoCore IV!

Thu Nov 02, 2017 1:30 pm

sibnick wrote: The latest version fails with
I hadn't actually pushed the fixes for the errors you last reported. They should now (almost all) be fixed, including this error.
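The ambiguity comes from integer widths on 32-bit ARM: there `long` is 32 bits and `int64_t` is `long long`, so `Literal(0L)` needs a conversion to reach any of the `int64_t`, `uint64_t`, `double` or `bool` constructors, and these conversions all have the same rank (the `explicit` constructor is still a candidate in a functional-style cast, as the candidate list in the error shows). The same applies to `Literal(sizeof(unsigned))` earlier in the thread, because `size_t` is `unsigned int` there. On x86-64 Linux, `int64_t` is `long`, so the same code compiles. One portable fix is to give the argument a type that matches one constructor exactly. The `Literal` struct below is a simplified stand-in, not VC4C's actual class, and `zeroLiteral`/`offsetLiteral` are hypothetical names.

```cpp
#include <cstdint>

// Simplified stand-in for vc4c::Literal with the same constructor set as in
// the error message; `kind` records which constructor was selected.
struct Literal {
    enum class Kind { Signed, Unsigned, Real, Bool };
    Kind kind;
    Literal(std::int64_t) : kind(Kind::Signed) {}
    explicit Literal(std::uint64_t) : kind(Kind::Unsigned) {}
    Literal(double) : kind(Kind::Real) {}
    Literal(bool) : kind(Kind::Bool) {}
};

// Ambiguous on 32-bit ARM, where `long` and `size_t` are 32 bits wide:
//   Literal(0L);  Literal(sizeof(unsigned));
// Portable spellings that match the int64_t constructor exactly:
const Literal zeroLiteral(INT64_C(0));
const Literal offsetLiteral(static_cast<std::int64_t>(sizeof(unsigned)));
```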

I still have a weird error in TestVC4CL, in the TestExecutions test case, where a constant value suddenly changes within a function call, resulting in failed checks and a lot of errors.

MrWhiter
Posts: 3
Joined: Sun Jun 11, 2017 5:14 pm

Re: OpenCL on the VideoCore IV!

Wed Nov 29, 2017 8:55 pm

Oh, this is so awesome!

Could you comment on how it compares to PoCL?
Would it be possible to merge this into PoCL and reuse code, or are the two just very different?
