Since Broadcom released complete documentation for the VideoCore IV GPU back in February 2014, we’ve seen a number of fun uses of its 24 GFLOPS of QPU compute, from Andrew Holme’s FFT library to Pete Warden’s deep learning experiments. For algorithms with a decent amount of parallelism, it’s not unusual to see a 10x increase in performance over the ARM CPU.
A platform is only as good as its development tools, so it’s a great start to the New Year to see a new QPU macro assembler from Marcel Müller. It builds on Pete and Eman’s earlier QPU assemblers, adding support for macros and functions. Along the way, he’s even managed to squeeze another few percent out of the size and run time of Andrew’s FFT library. You can find source, binaries, documentation and sample code here.