jbeale wrote:If you want video frame rates (30 fps or more) you should probably use the video codec, which is .h264
that way you can even get full HD at 30 fps.
Just to follow up on jbeale's point (which is well made), it's probably about time I did a post on realistic processing limits on the Pi...
Whenever you're working on the Pi the major thing in your head should be: bandwidth. The Pi has pretty severe limits on bandwidth in numerous places - firstly disk (SD card or USB), but also memory. This is one of the reasons it's sometimes faster to work with JPEG output from the camera instead of unencoded YUV/RGB: although unencoded frames may involve less processing, JPEG frames mean moving much smaller chunks of memory from the GPU to the CPU.
Now if you're considering doing serious visual processing (face recognition, object tracking, etc.) you can pretty much forget doing 720p30 on the Pi (unless you're willing to get down'n'dirty with the GPU - but that's outside my realm of experience). Let's assume you go for the next simplest option: capture on the Pi, and ship the frames over the network to some beefy box capable of doing the processing fast enough.
If the box is fast enough to do the serious visual processing at that rate it's almost certainly fast enough to do the (comparatively trivial) video decoding and frame splitting as well. Hence, just record video on the Pi and dump the stream as-is over the network.
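As a sketch of the Pi side of that (this needs a Pi with a camera module and picamera installed, and the receiver's address below is purely hypothetical - substitute your own):

```python
# Sketch only: dump the camera's H264 stream straight down a socket.
# Requires a Pi camera; 192.168.0.2:8000 is an assumed receiver address.
import socket
import picamera

with picamera.PiCamera(resolution=(1280, 720), framerate=30) as camera:
    sock = socket.socket()
    sock.connect(('192.168.0.2', 8000))  # your receiving box
    stream = sock.makefile('wb')
    try:
        # Ship the encoded stream as-is; no processing on the Pi
        camera.start_recording(stream, format='h264')
        camera.wait_recording(60)
        camera.stop_recording()
    finally:
        stream.close()
        sock.close()
```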
So, why use H264 instead of MJPEG? Surely full JPEG pictures will be higher quality than a video codec? Incorrect. Of all the formats supported, H264 provides by far the best quality in the smallest space. Remember that JPEG is (by now) quite an ancient format and hasn't been upgraded (in a widely supported manner) since the late 90s. H264 keyframes (I-frames) are smaller, yet better quality than JPEGs (hardly surprising given H264 has the benefit of years of research beyond JPEG). One might argue that the P-frames (predicted frames in between) are lower quality, and in the case of a full scene change or extremely complex motion you might have a point. But you can easily configure the H264 encoder to output only I-frames (intra_period=1) and you'll still get similar or better quality than JPEG in a smaller space.
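In picamera the I-frame-only configuration is just a keyword argument to start_recording (sketch - needs a Pi with a camera attached; the filename and duration are arbitrary):

```python
import picamera

with picamera.PiCamera(resolution=(1280, 720), framerate=30) as camera:
    # intra_period=1 forces every frame to be an I-frame (keyframe)
    camera.start_recording('intra_only.h264', format='h264', intra_period=1)
    camera.wait_recording(10)
    camera.stop_recording()
```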
What about shipping the unencoded YUV/RGB frames for maximum quality? Good luck: here's the bandwidth requirement for 720p30 in RGB format:
1280 * 720 * 3 * 30 / 1048576 ≈ 79Mbytes/sec
Remember that the Pi's ethernet port (which is connected via USB anyway) is 100Mbit, so its theoretical maximum (assuming no protocol overhead or other bandwidth restrictions) is 12.5Mbytes/sec - call it 10Mbytes/sec in practice. What about YUV? That only cuts the requirement in half: YUV420 uses 1.5 bytes per pixel instead of 3, so we get down to about 40Mbytes/sec - still way outside the available capacity (and we haven't even discussed whether the Pi can shove bits around that fast in memory - in my experience it can't).
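If you want to rerun that arithmetic for other resolutions or frame rates, it's trivial to parameterise (plain Python, nothing Pi-specific):

```python
def bandwidth_mb_per_sec(width, height, bytes_per_pixel, fps):
    """Raw frame bandwidth in Mbytes/sec (1 Mbyte = 1048576 bytes)."""
    return width * height * bytes_per_pixel * fps / 1048576.0

print(bandwidth_mb_per_sec(1280, 720, 3, 30))    # RGB 720p30: ~79.1
print(bandwidth_mb_per_sec(1280, 720, 1.5, 30))  # YUV420 720p30: ~39.6
```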
So, realistic processing limits:
If you're doing your processing entirely on the Pi and you want to do things that are considered relatively hard (like face tracking), expect to be down in the single digits of fps (1-2fps on a Pi1, probably higher on a Pi2). Simpler stuff (like recognizing the dominant colour in a scene) can be achieved much faster, so you might manage 15fps or perhaps higher with a bit of cunning. Generally speaking, in this scenario you might start working with unencoded frames (because it's easiest and involves the least processing overhead), but you'll probably want to experiment with frames extracted from MJPEG too (as in the gist posted above ... I really must add that as a recipe in the next release ...) in order to find out whether bandwidth is your limiting factor.
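If you do go down the MJPEG route, splitting the stream into individual frames is simple enough to sketch (this isn't the gist's code, just an illustration: it scans for the JPEG start-of-image marker, which won't occur in entropy-coded data but could in theory be fooled by an embedded thumbnail):

```python
def split_mjpeg(data):
    """Split a concatenated MJPEG byte stream into individual JPEG frames."""
    SOI = b'\xff\xd8\xff'  # JPEG start-of-image marker plus first marker byte
    starts = []
    i = data.find(SOI)
    while i != -1:
        starts.append(i)
        i = data.find(SOI, i + 1)
    # Each frame runs from one SOI to the byte before the next (or the end)
    return [data[s:e] for s, e in zip(starts, starts[1:] + [len(data)])]
```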
If you're offloading your processing onto a big box (a big laptop with a Core i7 for example) - don't bother playing around with unencoded frames or MJPEG. Just record H264, shove it over the network and do all the processing on the other end. Fire up an appropriately configured ffmpeg subprocess, feed the H264 stream to its stdin, and read unencoded YUV/RGB frames from its stdout (or find some appropriate bindings for libav). In this scenario you should be able to manage 30fps or higher with ease (the limiting factor will be processing speed on the other machine).
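On the receiving end, that ffmpeg arrangement looks something like this (a sketch: the exact ffmpeg flags and the 1280x720 geometry are assumptions you'd match to whatever you recorded):

```python
import subprocess

def ffmpeg_decode_cmd():
    # Read raw H264 from stdin ('-'), write raw RGB24 frames to stdout ('-')
    return ['ffmpeg', '-f', 'h264', '-i', '-',
            '-f', 'rawvideo', '-pix_fmt', 'rgb24', '-']

def rgb_frames(h264_stream, width=1280, height=720):
    """Yield raw RGB frames decoded from an H264 byte stream."""
    proc = subprocess.Popen(
        ffmpeg_decode_cmd(), stdin=h264_stream,
        stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
    frame_size = width * height * 3  # bytes per RGB24 frame
    while True:
        frame = proc.stdout.read(frame_size)
        if len(frame) < frame_size:
            break
        yield frame
```

You'd pass the network socket (wrapped as a file object) as h264_stream and hand each yielded frame to your vision code.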