Real-time depth perception with the Compute Module

Liz: We’ve got a number of good friends at Argon Design, a tech consultancy in Cambridge. (James Adams, our Director of Hardware, used to work there, as did my friend from the time of Noah, @eyebrowsofpower; the disgustingly clever Peter de Rivaz, who wrote Penguins Puzzle, is an Argon employee; and Steve Barlow, who heads Argon up, used to run AlphaMosaic, which became Broadcom’s Cambridge arm and, back in the day, employed several of the people who now work at Pi Towers.)

We gave the Argon team a Compute Module to play with this summer, and they set David Barker, one of their interns, to work with it. Here’s what he came up with: thanks David, and thanks Argon!

This summer I spent 11 weeks interning at a local tech company called Argon Design, working with the new Raspberry Pi Compute Module. “Local” in this case means Cambridge, UK, where I am currently studying for a mathematics degree. I found the experience extremely valuable and a lot of fun, and I have learnt a great deal about the hardware side of the Raspberry Pi. Here I would like to share a bit of what I did.


My assignment was to develop an example of real-time video processing on the Raspberry Pi. Argon know a lot about the Pi and its capabilities, and are experts in real-time video processing, and we wanted to create something which would demonstrate both. The problem we settled on was depth perception using the two cameras on the Compute Module. The CTO, Steve Barlow, who has a good knowledge of stereo depth algorithms, gave me a Python implementation of a suitable one.


The algorithm we used is a variant of one which is widely used in video compression. The basic idea is to divide each frame into small blocks and to find the best match with blocks from other frames; the offset of the best match tells us how far the block has moved between the two images. The video-compression version is designed to detect motion, so it tries to match against the previous few frames. The depth perception version instead matches the left and right camera images against each other, allowing it to measure the parallax between the two images.
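The block-matching idea can be sketched in a few lines of Python. This is a simplified illustration, not Argon's implementation: it uses a plain sum-of-absolute-differences (SAD) cost, and all block sizes and search ranges here are illustrative.

```python
import numpy as np

def block_match_disparity(left, right, block=8, max_disp=16):
    """For each block of the left image, search horizontally shifted
    blocks of the right image and return the shift (disparity) with the
    lowest sum-of-absolute-differences cost. Nearer objects produce
    larger disparities."""
    h, w = left.shape
    disp = np.zeros((h // block, w // block), dtype=np.int32)
    for by in range(h // block):
        for bx in range(w // block):
            y, x = by * block, bx * block
            ref = left[y:y + block, x:x + block].astype(np.int32)
            best_cost, best_d = None, 0
            for d in range(min(max_disp, x) + 1):
                cand = right[y:y + block, x - d:x - d + block].astype(np.int32)
                cost = np.abs(ref - cand).sum()
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disp[by, bx] = best_d
    return disp

# Synthetic check: shift a random image 4 px and recover that disparity.
rng = np.random.default_rng(0)
left = rng.integers(0, 256, size=(16, 64)).astype(np.uint8)
right = np.zeros_like(left)
right[:, :60] = left[:, 4:]   # right view = left view shifted left by 4 px
disp = block_match_disparity(left, right, block=8, max_disp=8)
```

The brute-force search over every block and every candidate shift is exactly what makes a naive implementation slow, and why moving the inner loops onto the VPU pays off so well.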

The other main difference from video compression is that we used a different measure of correlation between blocks. The one we used is designed to work well in the presence of sharp edges and when the exposure differs between the cameras. This means that it is considerably more accurate, at the cost of being more expensive to calculate.
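The post doesn't name the correlation measure used. One standard block-matching score with the exposure-tolerance property described is zero-mean normalised cross-correlation (ZNCC); a minimal sketch, offered as an example of the kind of measure meant rather than the actual one:

```python
import numpy as np

def zncc(a, b):
    """Zero-mean normalised cross-correlation of two equal-sized blocks.
    Subtracting each block's mean cancels brightness offsets, and dividing
    by the norms cancels contrast (gain) differences, so a uniform exposure
    change in one camera does not affect the score. The result lies in
    [-1, 1], with 1 meaning a perfect match."""
    a = a.astype(np.float64) - a.mean()
    b = b.astype(np.float64) - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom else 0.0

rng = np.random.default_rng(1)
block = rng.integers(0, 200, size=(8, 8))
overexposed = block * 1.3 + 40   # same content, different exposure
score = zncc(block, overexposed)
```

Note the extra work per block compared with a simple absolute-difference sum (means, products, square roots), which matches the post's point that the more robust measure is more expensive to calculate.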

When I arrived, my first task was to translate this algorithm from Python to C, to see what sort of speeds we could reasonably expect. While doing this, I made several algorithmic improvements. This turned out to be extremely successful – the final C version was over 1000 times as fast as the original Python version, on the same hardware! However, even with this much improvement, it was still taking around a second to process a moderate-sized image on the Pi’s ARM core. Clearly another approach was needed.

There are two other processors on the Pi: a dual-core video processing unit called the VPU and a 12-core GPU, both of which are part of the VideoCore block. They both run at a relatively slow 250MHz, but are designed in such a way that they are actually much faster than the ARM core for video and imaging tasks. The team at Argon has done a lot of VideoCore programming and is familiar with how to get the best out of these processors. So I set about rewriting the program, from C into VPU assembler. This sped up the processing on the Pi to around 90 milliseconds. Dropping the size of the image slightly, we eventually managed to get the whole process – get image from cameras, process on VPU, display on screen – to run at 12fps. Not bad for 11 weeks’ work!

I also coded up a demonstration app, which can do green-screen-free background removal, as well as producing false-colour depth maps. There are screenshots below; the results are not exactly perfect, but we are aware of several ways in which this could be improved. This was simply a matter of not having enough time – implementing the algorithm to the standard of a commercial product, rather than a proof-of-concept, would have taken quite a bit longer than the time I had for my internship.
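To illustrate the background-removal idea (a hypothetical sketch, not the demo app itself): once a disparity map exists, foreground extraction is essentially thresholding, since nearer objects have larger disparity.

```python
import numpy as np

def remove_background(image, disparity, near_thresh):
    """Keep only pixels whose disparity marks them as close to the
    camera (larger disparity = nearer object), zeroing the rest.
    A sketch of the idea only; a real implementation would upsample
    per-block disparities to per-pixel resolution and smooth the mask."""
    mask = disparity > near_thresh
    out = image.copy()
    out[~mask] = 0
    return out, mask

# Toy example: the left half of the scene is far away, the right half near.
img = np.arange(16, dtype=np.uint8).reshape(4, 4)
disp = np.array([[0, 0, 5, 5]] * 4)
fg, mask = remove_background(img, disp, near_thresh=2)
```

The same disparity map, mapped through a colour palette instead of a threshold, gives the false-colour depth views shown in the screenshots.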

To demonstrate our results, we ran the algorithm on a standard image pair produced by the University of Tsukuba. Below are the test images, the exact depth map, and our calculated one.



We also set up a simple scene in our office to test the results on some slightly more “real-world” data:




However, programming wasn’t the only task I had. I also got to design and build a camera mount, which was quite a culture shock compared to the software work I’m used to.


Liz: I know that stereo vision is something a lot of compute module customers have been interested in exploring. David has made a more technical write-up of this case study available on Argon’s website for those of you who want to look at this problem in more…depth. (Sorry.)




This story is interesting and totally not interesting at the same time!

Do not get me wrong, I know the RPi is meant to help kids get into programming for the lowest investment possible. But with the Compute Module this is not the case.

For businesses wanting to use the Compute Module in their products, the unavailability of documentation for the VideoCore, amongst other things, renders the whole CM pretty useless for any serious development.

I have a need to process images from two cameras (stereo), but the RPi camera is not suitable for the job, I am not capable of using other sensors, and nobody else is capable of doing what is done in this post.
The SoC used should be opened up completely for CM users.

Also, over time there have been ‘sudden’ releases of information, not announced and not according to a roadmap. We have to decide soon which hardware platform we will start using. The CM is at the top of the list hardware-wise, but at the same time it is also at the top of the ‘avoid’ list because of the lack of openness. The latter renders the CM completely useless for our use.
Full access to the VideoCore, CSI and DSI ports is becoming more and more ‘wanted’, and for us mandatory. Information on availability or a release date is seemingly not there. I do follow the blog posts and read the forums.

I think the RPi foundation should be more clear on this issue.



Just out of curiosity, why is the raspi’s existing camera not up to the job?


Because the ‘image’ in the ‘video’ is generated by sensing non-visible forces. This is done from two points in space, so it is like stereo imaging and can be processed by something like the VideoCore. Feeding the data in a video format means cheap and fast hardware (like the CM) can do the job in a small form factor with a low power budget, greatly reducing costs by eliminating additional hardware and software tools. I would love to support the RPi Foundation by using (and therefore buying) CMs. They are just so capable of doing so much more than being used as a media player…


Just to be clear, David had little to no help from Raspberry Pi; in fact, we probably only had about two email conversations with him. Otherwise he used only publicly available documentation and interfaces.



A demo program is mentioned. Has it been released?

It would allow others to see how to program the VideoCore.


The article mentions that the algorithm is running on the VPU, so I wonder if it’s using similar techniques to ? (i.e. it doesn’t actually require any special ‘insider knowledge’)


He used either the GPU information released earlier in the year, or Herman Hermitage’s VPU reverse-engineering work and his assembler, I believe.



I love that the “real world data” picture includes a kettle and an oscilloscope :-D


Cool, I’m glad someone has done this. I had a similar idea back in August to do distance measuring using two networked Pis.

I was trying to get my head around the trig for this using this PhD workshop:

I wanted to just detect a blob of colour and use that as the target. Like an orange circle on a big A1 sheet of card.

Is the Python you used for the trig available online?


To help anyone wanting to get started programming the VideoCore VPU, here are a few links.

Herman Hermitage has done an excellent job of reverse engineering the VPU instruction set. See:

You can run pieces of VPU code from the ARM side as described in:

We use Volker Barthelmann’s assembler vasm available at:

Although the web page doesn’t mention VideoCore, this is one of the processors it supports. The latest version on his site includes some patches we have submitted to him.

Finally, to integrate kernels you have written with the camera flow, you need to use MMAL, a Broadcom-specific API. MMAL access on the ARM side is provided by, and documented in, mmal.h:

There is a more readable version of this at:

To see how this works, look at the example app for taking stills, raspistill:

There is definitely a steep learning curve here. We’re very happy at Argon Design to help anyone wanting to use the Compute Module for multimedia projects, though we do have to charge for our time. Feel free to contact me at


This young man is brilliant! Translating from Python to C, and then to assembly, with improvements along the way. I sure hope they gave him a job; he is going to be in big demand.



That’s a very cool project!

Just thought I’d post a quick note to add that picamera theoretically supports stereo cameras on the Compute Module (I ported raspivid’s calls into the library in 1.8). I should warn, however, that it’s untested, and I’ve had one bug report from a user who couldn’t get it working (though I suspected this was down to out-of-date firmware, and haven’t heard back from him since). I’d be very interested to hear of anyone else’s experiences (feel free to post bug reports on picamera’s GitHub, or just e-mail me)!

Obviously Python’s not fast enough (on the Pi) for stuff like this (unless there’s anything similar to pyopencl for running stuff on the VPU?), but I suspect it’d be useful in other areas, and for rapidly prototyping things.


This week I started playing with extracting 3D from dual image data using a copy of Agisoft’s StereoScan. My interest is in capturing something accurately enough that I could use it for 3D printing. So far the results aren’t encouraging, and part of me wonders whether taking simultaneous pictures with similar cameras would fix my problem. I wouldn’t need real time, but I would like non-lossy pictures. Needless to say, I’m pricing this out.


Very cool to have two cameras on the Pi. Unfortunately the Pi will be a dead end for really getting something meaningful; it’s just too slow, and not a good environment for programming cutting-edge algorithms. It is, however, really nice as a front end for the camera, which can stream to a server with some real horsepower.

That was my approach in

Then you can have a feedback loop so a robot does something.


Is it possible to use this capability to do short to mid-range 2d mapping of a room?


Is it possible to merge two videos from two cameras in real time with no gap in between?
