Simultaneous 1 decoding / 2 encodings on the GPU?


5 posts
by pyke369 » Tue Oct 02, 2012 4:09 pm
Hi everyone,

(I guess the following is for dom and/or jamesh).

I'd like to use the R.Pi as a live transcoding system, using the GPU-accelerated codec blocks (through the OMX-IL layer provided by Broadcom as part of the VideoCore SDK). I would have one incoming 720p H264/AAC stream @ ~2Mbps, and would like to re-encode it on the fly to two outgoing H264/AAC streams: one 480p @ ~900Kbps and one 360p @ ~500Kbps. The incoming stream would of course be properly constrained and framed to accommodate the hw decoder (Annex-B framing for AVC, ADTS prefixes for AAC, etc.). My questions are:

- Is the VC4 GPU embedded in the R.Pi's BCM SoC powerful enough to handle such a configuration (one 720p decode plus two encodes, 480p and 360p) at the same time?
- What about the memory needed for data buffers and tables? Is there enough on the board? (I'm already using a 128MB/128MB memory partitioning scheme, so 128MB are dedicated to the GPU.)
- Using OMX-IL, is it possible to instantiate the OMX.broadcom.video_encode component multiple times (at least 2 in my case)? What about OMX-IL tunnels between the OMX.broadcom.video_decode, OMX.broadcom.video_splitter and OMX.broadcom.video_encode components? Would that work as expected?
- I know that AVC decoding and encoding are available as OMX-IL accelerated components. What about AAC? Is it also available accelerated, or do I have to do that in software on the ARM CPU (or maybe not re-encode the audio at all and just remux, if there's not enough power for doing that on the CPU)?
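
To make the intended graph concrete, here is a minimal sketch of the topology only. It makes no real OMX-IL calls: the tunnel list is a plain Python structure, the "#480p"/"#360p" instance labels and the arm_endpoints helper are hypothetical names made up for illustration, and the downscaling step is left out entirely because where it should happen is part of what is being asked. It simply works out which components the ARM would have to feed or drain if everything else is tunnelled.

```python
# Topology sketch only: this models the graph, it does not talk to OMX-IL.
# The "#480p" / "#360p" suffixes are illustrative labels for two instances
# of the same encoder component, not real component names.
TUNNELS = [
    ("OMX.broadcom.video_decode", "OMX.broadcom.video_splitter"),
    ("OMX.broadcom.video_splitter", "OMX.broadcom.video_encode#480p"),
    ("OMX.broadcom.video_splitter", "OMX.broadcom.video_encode#360p"),
]


def arm_endpoints(tunnels):
    """Return (sources, sinks): components with no upstream or no downstream tunnel."""
    upstream = {src for src, _ in tunnels}
    downstream = {dst for _, dst in tunnels}
    sources = sorted(upstream - downstream)  # the ARM must feed these
    sinks = sorted(downstream - upstream)    # the ARM must drain these
    return sources, sinks


sources, sinks = arm_endpoints(TUNNELS)
print("ARM feeds: ", sources)
print("ARM drains:", sinks)
```

In this model the ARM only touches the decoder input and the two encoder outputs; the splitter never appears on the CPU side.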

Thanks for your reply,
Pierre-Yves
Posts: 15
Joined: Sat Feb 04, 2012 8:54 pm
by jamesh » Tue Oct 02, 2012 7:04 pm
If you assume the maximum bandwidth is a 1080p30 encode at about 20Mbits/s (can't remember exactly - might be 25 or even more), that is the ceiling, and at that point you cannot run any decode at the same time. Add up the bandwidth of all your streams: if the total exceeds that, then no, you cannot do it. You really need to leave a little more headroom, because you are now switching the blocks between encode and decode, which also takes some time/bandwidth.
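
As a rough back-of-the-envelope version of that sum, the sketch below adds the streams up two ways: by bitrate against the ~20Mbits/s figure, and by pixel throughput against 1080p30. Several things here are assumptions rather than facts from this thread: the 30 fps frame rate, the 854x480 and 640x360 frame sizes (16:9), and the idea that pixel rate is a fair proxy for codec load. It also ignores the encode/decode switching overhead mentioned above.

```python
# Rough budget check. FPS and the frame sizes are assumptions; the thread
# only gives "720p", "480p", "360p" and the approximate bitrates.
FPS = 30
MAX_ENCODE_KBPS = 20_000  # "about 20Mbits/s", possibly 25 or more

# (label, width, height, bitrate in kbps)
STREAMS = [
    ("720p decode", 1280, 720, 2000),
    ("480p encode", 854, 480, 900),
    ("360p encode", 640, 360, 500),
]


def pixel_rate(width, height, fps=FPS):
    """Pixels per second for one stream."""
    return width * height * fps


budget_pixels = pixel_rate(1920, 1080)  # 1080p30 reference ceiling
load_pixels = sum(pixel_rate(w, h) for _, w, h, _ in STREAMS)
total_kbps = sum(kbps for _, _, _, kbps in STREAMS)

pixel_share = load_pixels / budget_pixels
bitrate_share = total_kbps / MAX_ENCODE_KBPS

print(f"pixel load:   {load_pixels:,} of {budget_pixels:,} px/s ({pixel_share:.1%})")
print(f"bitrate load: {total_kbps} of {MAX_ENCODE_KBPS} kbps ({bitrate_share:.1%})")
```

Under these assumptions the three streams come to roughly three quarters of the 1080p30 pixel rate and well under the bitrate figure, so the remaining margin is what has to absorb the switching cost.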

Memory might be OK, not sure what the requirements are there.

I *think* you are OK to have multiple encode/decode OMX components going.

I'd just try it, and see what happens.....
Raspberry Pi Engineer & Forum Moderator
Posts: 11527
Joined: Sat Jul 30, 2011 7:41 pm
by pyke369 » Tue Oct 02, 2012 7:35 pm
All right James, thx for the quick reply.
What about audio (AAC) hw-accelerated support? Any insight on this?

Cheers,
Pierre-Yves
Posts: 15
Joined: Sat Feb 04, 2012 8:54 pm
by dom » Tue Oct 02, 2012 9:40 pm
My view is that it sounds possible. It may need to be done carefully to keep up (e.g. using tunnelled components and keeping the ARM off the critical path).

Theoretically the GPU can encode and decode AAC, but that requires licensing from MPLA, and is not currently supported.
For stereo AAC I'd imagine the ARM could probably handle the encode/decode, but why are you re-encoding? Just remuxing sounds better.
Raspberry Pi Engineer & Forum Moderator
Posts: 3993
Joined: Wed Aug 17, 2011 7:41 pm
Location: Cambridge
by pyke369 » Wed Oct 03, 2012 9:02 am
Hi Dom,

I understand of course that the ARM CPU must be kept out of the whole graph (using OMX-IL tunnels). Only the initial incoming 720p stream and the two final re-encoded 360p/480p streams would actually pass through the application layer running on the CPU. As for the audio, there may be cases where we would like to actually re-encode at a lower bitrate (like 128Kbps down to 96Kbps or even 64Kbps). I may be able to manage that on the CPU itself (libfaad/libfaac).

Anyways, I will give it a try (but OMX-IL is kind of a nightmare to get a grasp of) and you may expect some nice R.Pi transcoding clusters in the near future if I succeed (we are thinking of putting 48 R.Pi in a 2-3U 19" chassis, for a 1TFlops+ GPGPU total capacity!).

Thanks for your kind reply,
Pierre-Yves
Posts: 15
Joined: Sat Feb 04, 2012 8:54 pm