I spent the last six months on my master's thesis, developing an OpenCL implementation that runs on the VideoCore IV GPU!
I present to you VC4CL (VideoCore IV OpenCL):
Of course it is far from complete, but it runs about 50% of the OpenCL CTS test cases for supported features, 60% of the test programs of a slightly modified Boost.Compute library, 71% of the test cases for EasyCL, as well as some other test programs.
Performance-wise, it beats the pocl implementation in the floating-point benchmark (reaching up to 4 GFLOPS!) and, as expected, has inferior memory-access speed (up to 120 MB/s).
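To put those two numbers side by side, here is a rough back-of-the-envelope sketch. It assumes a simple roofline model, which is my illustration and not something measured by the benchmarks; the helper names and the vector-add kernel are hypothetical. Taking the quoted peaks at face value, a kernel would need roughly 33 floating-point operations per byte of memory traffic before compute, rather than memory access, becomes the limit.

```python
# Back-of-the-envelope roofline estimate using the peak figures quoted above.
# Assumption: both peaks can be treated as hard ceilings (simple roofline model).
PEAK_FLOPS = 4e9        # ~4 GFLOPS, floating-point benchmark peak from the post
PEAK_BANDWIDTH = 120e6  # ~120 MB/s, memory-access peak from the post


def break_even_intensity(peak_flops=PEAK_FLOPS, peak_bandwidth=PEAK_BANDWIDTH):
    """FLOPs per byte a kernel needs before compute, not memory, is the limit."""
    return peak_flops / peak_bandwidth


def attainable_flops(flops_per_byte, peak_flops=PEAK_FLOPS,
                     peak_bandwidth=PEAK_BANDWIDTH):
    """Roofline bound: min(compute peak, bandwidth * arithmetic intensity)."""
    return min(peak_flops, peak_bandwidth * flops_per_byte)


if __name__ == "__main__":
    print(f"break-even: {break_even_intensity():.1f} FLOP/byte")
    # Hypothetical kernel: c[i] = a[i] + b[i] on 32-bit floats,
    # i.e. 1 FLOP per 12 bytes moved (two loads, one store).
    vector_add = 1 / 12
    print(f"vector add bound: {attainable_flops(vector_add) / 1e6:.0f} MFLOPS")
```

In other words, under these assumptions simple streaming kernels will be limited by memory access long before the ALUs are, so kernels that do a lot of arithmetic per loaded value are the ones most likely to benefit.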
The VC4C compiler can compile OpenCL C source code, LLVM IR and SPIR-V via the corresponding front-ends, and can use either standard LLVM or Khronos SPIRV-LLVM as its front-end compiler. The VC4CL library can also be used with the Khronos ICD loader.
Notable features not (yet) supported:
- 64-bit data-types (long, double)
- linking of multiple source code files
- images (WIP)
- a lot of mathematical correctness (WIP)
- performance (mostly within the compiler)
NOTE: Because there is no MMU between the VPM and the RAM, and because V3D registers can only be accessed via memory-mapping, applications using the VC4CL implementation must be run as root!