Cluster for HW accelerated stream transcoding


by Santa77 » Wed Dec 05, 2012 2:37 pm
Hello folks,

We are working here on a cluster (just a bunch of Raspberry Pis ;) ) that is able to transcode our IPTV streams from MPEG2 in MPEGTS multicasts to H264 streams, to reduce the load on the network and on storage (archived streams).

Currently, it's not Beauty, but it's BEAST :twisted:
The prototype contains 5 Model B boards with 512MB RAM, split 256MB for the CPU and 256MB for the GPU. All of them have passive heat sinks added to the main CPU chip and the Ethernet chip. None of them are overclocked, but after boot the governor is changed to "on demand", so the maximum CPU frequency set by that governor was 950MHz.

Performance:
Every board is able to transcode 2 SD input channels to 3 different bitrates each (so 6 streams in total), with the CPU at 700 MHz (sometimes it hops to 800 MHz). The average temperature is 39.6 °C. Audio is not transcoded; it is passed from the input stream directly to the output stream.


We are now preparing a 2U case containing 16 boards, which will be placed in a rack, so temperatures will go down.

So for questions like "is HW-accelerated transcoding possible on the RPi?" we have an answer: YES IT IS.

The RPi is a really great device. Thanks to the whole RPi team for bringing us this piece of HW.

PS: sorry for some spelling errors, English is not my native language ;)
Posts: 32
Joined: Wed Oct 31, 2012 2:08 pm
Location: Slovakia
by dom » Wed Dec 05, 2012 2:44 pm
Are you saying you are decoding MPEG2 video (with the GPU) and encoding to H264 (with the GPU)?
You've written OpenMAX code to do this?
Raspberry Pi Engineer & Forum Moderator
Posts: 4013
Joined: Wed Aug 17, 2011 7:41 pm
Location: Cambridge
by Santa77 » Wed Dec 05, 2012 2:56 pm
Yes dom, we did. We are using OpenMAX to handle decoding and encoding. For now we are using the buffer method, not tunnelling, because we need some images on the CPU (not every one, just 1 per second). But we are working on a configuration switch that will allow choosing between tunnel mode and buffer mode. Currently only buffer mode is done. We wrote a piece of code that is "compatible" with libavcodec's calls.

sample:

Code:
    if (my_config.input_1.hw_accel == 1) {
        // Decode video frame by HW
        len = hw_decoder_decode_video2(pCodecCtx, pFrame, &frameFinished, &packet);
    } else {
        // Decode video frame by SW
        len = avcodec_decode_video2(pCodecCtx, pFrame, &frameFinished, &packet);
    }
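
One way to keep a switch like this out of the per-packet loop is to pick the decode function once, when the configuration is read. This is only a minimal sketch, assuming both decoders can share one signature; every name in it (`decode_fn`, `sw_decode`, `hw_decode`, `select_decoder`) is hypothetical, and the two stubs merely stand in for the real libavcodec call and the OpenMAX-backed replacement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical common signature for a decode call: returns the number of
 * bytes consumed and sets *got_frame when a complete picture is ready. */
typedef int (*decode_fn)(const unsigned char *pkt, size_t len, int *got_frame);

/* Stub standing in for the software (libavcodec-style) path. */
int sw_decode(const unsigned char *pkt, size_t len, int *got_frame)
{
    (void)pkt;
    assert(got_frame != NULL);
    *got_frame = (len > 0);
    return (int)len;
}

/* Stub standing in for the hardware (OpenMAX-backed) path. */
int hw_decode(const unsigned char *pkt, size_t len, int *got_frame)
{
    (void)pkt;
    assert(got_frame != NULL);
    *got_frame = (len > 0);
    return (int)len;
}

/* Choose the decoder once from the config flag, not once per packet. */
decode_fn select_decoder(int hw_accel)
{
    return hw_accel == 1 ? hw_decode : sw_decode;
}
```

The caller would then store the returned pointer next to the codec context and call it for every packet, so the rest of the loop stays identical for both paths.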
by dom » Wed Dec 05, 2012 3:23 pm
Santa77 wrote:Yes dom, we did. We are using OpenMAX to handle decoding and encoding. For now we are using the buffer method, not tunnelling, because we need some images on the CPU (not every one, just 1 per second). But we are working on a configuration switch that will allow choosing between tunnel mode and buffer mode. Currently only buffer mode is done. We wrote a piece of code that is "compatible" with libavcodec's calls.


Sounds great. Passing decoded video frames to the ARM and back will be expensive.
I'd be tempted to tunnel the video_decode to video_encode. When you want a snapshot, feed the encoded frame back through another video_decode (checking it's an IDR frame).
The other option is video_decode->video_splitter->video_encode which is tunnelled. And read occasional frames from the second port of video splitter.

(I'm not an expert on OpenMAX, so the above may or may not be good advice).
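
A minimal sketch of the IDR check mentioned above, assuming the encoder output is an H.264 Annex B byte stream (NAL units separated by 00 00 01 start codes). `contains_idr` is a hypothetical helper, not part of OpenMAX or any Broadcom API; it only inspects the NAL header byte, where the low 5 bits hold nal_unit_type and type 5 is an IDR slice.

```c
#include <stddef.h>
#include <stdint.h>

/* H.264 nal_unit_type is the low 5 bits of the byte after the start code;
 * type 5 is a coded slice of an IDR picture. */
enum { NAL_TYPE_IDR = 5 };

/* Return 1 if the Annex B buffer contains at least one IDR slice NAL unit. */
int contains_idr(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i + 3 < len; i++) {
        /* Match the 3-byte start code 00 00 01 (a 4-byte one ends the same way). */
        if (buf[i] == 0x00 && buf[i + 1] == 0x00 && buf[i + 2] == 0x01) {
            if ((buf[i + 3] & 0x1F) == NAL_TYPE_IDR)
                return 1;
            i += 2; /* continue scanning from the NAL header byte */
        }
    }
    return 0;
}
```

Only buffers for which this returns 1 would be worth feeding to the second decoder, since a non-IDR frame cannot be decoded on its own.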

So are you in a position to produce an accelerated ffmpeg for command line transcoding? That sounds great.
by Santa77 » Wed Dec 05, 2012 3:39 pm
dom wrote:Passing decoded video frames to the ARM and back will be expensive.

Yes it is; that's the reason why there are currently only 2 SD input streams per board.

dom wrote:(I'm not an expert on OpenMAX, so the above may or may not be good advice).

Same here. This is my first contact with OpenMAX, and it's because of this project ;)

dom wrote:So are you in a position to produce an accelerated ffmpeg for command line transcoding?

There is no goal to implement it in ffmpeg. First, libavcodec is strictly split into separate encode and decode calls, so anyone who tries to implement it there must use buffers. Also, my knowledge of libavcodec is only at the level of a "user". Some of its internal processes are black magic to me, the same as in OpenMAX ;) My goal at the company was to build a transcoding machine that can transcode one stream to multiple streams with different bitrates in the most optimised way.

I am planning to use video_splitter there, but with multiple video_encode components connected to its outputs, so that only one component is fed by buffer instead of 3. So in the final version I will feed only one video_splitter, not 3 video_encode components as I do now. That saves a lot of memcpy calls and transfers between the ARM and the GPU, so performance will grow again. ;)
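
To get a feel for the saving, here is a rough back-of-the-envelope sketch. The numbers are assumptions, not measurements from this project: uncompressed 720x576 frames in 4:2:0 format at 25 fps, 4 raw-frame transfers per frame today (1 from the decoder to the ARM, 3 to the encoders) and 2 with the splitter (1 from the decoder, 1 to the splitter). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes in one uncompressed 4:2:0 frame: a full-size luma plane plus two
 * quarter-size chroma planes, i.e. width * height * 3 / 2. */
uint64_t yuv420_frame_bytes(uint32_t width, uint32_t height)
{
    return (uint64_t)width * height * 3 / 2;
}

/* Raw-frame bytes per second crossing the ARM/GPU boundary for one input
 * channel. copies_per_frame counts every buffer handed across, in either
 * direction. */
uint64_t transfer_bytes_per_sec(uint32_t width, uint32_t height,
                                uint32_t fps, uint32_t copies_per_frame)
{
    assert(fps > 0);
    return yuv420_frame_bytes(width, height) * fps * copies_per_frame;
}
```

Under these assumptions, `transfer_bytes_per_sec(720, 576, 25, 4)` is about 62 MB/s per input channel and `transfer_bytes_per_sec(720, 576, 25, 2)` is about 31 MB/s, so the splitter would halve the raw-frame traffic. The compressed streams are left out because they are small by comparison.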
by ghans » Thu Dec 06, 2012 7:42 am
This is great work !


ghans
• Don't like the board ? Missing features ? Change to the prosilver theme ! You can find it in your settings.
• Don't like to search the forum BEFORE posting 'cos it's useless ? Try googling : yoursearchtermshere site:raspberrypi.org
Posts: 4506
Joined: Mon Dec 12, 2011 8:30 pm
Location: Germany
by jamesh » Thu Dec 06, 2012 9:36 am
ghans wrote:This is great work !

ghans


My thoughts exactly. Anyone who can get to grips with OpenMAX deserves some plaudits!
Unemployed software engineer currently specialising in camera drivers and frameworks, but can put mind to most embedded tasks. Got a job in N.Cambridge or surroundings? I'm interested!
Raspberry Pi Engineer & Forum Moderator
Posts: 11686
Joined: Sat Jul 30, 2011 7:41 pm
by Santa77 » Thu Dec 06, 2012 8:13 pm
Update:
Today we implemented:

1) a deinterlace switch that deinterlaces the frames produced by the video decoder before we feed the encoder.
2) detection of aspect ratio changes in the middle of MPEG2/MPEGTS streams (a long-standing bug in libavcodec from the ffmpeg project).
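
For readers wondering what such a detection can look like, here is a minimal sketch. It is not the project's code. It assumes the MPEG-2 video elementary stream has already been extracted from the transport stream and that a whole sequence header sits inside one buffer; both function names are hypothetical. It relies on the sequence header layout: start code 00 00 01 B3, then 12 bits of width, 12 bits of height and a 4-bit aspect_ratio_information field (2 means 4:3, 3 means 16:9).

```c
#include <stddef.h>
#include <stdint.h>

/* Find the first MPEG-2 sequence header (start code 00 00 01 B3) and return
 * its 4-bit aspect_ratio_information, or -1 if no header is present. */
int mpeg2_aspect_code(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i + 7 < len; i++) {
        if (buf[i] == 0x00 && buf[i + 1] == 0x00 &&
            buf[i + 2] == 0x01 && buf[i + 3] == 0xB3) {
            /* Bytes i+4..i+6 carry the 12-bit width and 12-bit height;
             * the high nibble of byte i+7 is the aspect ratio code. */
            return buf[i + 7] >> 4;
        }
    }
    return -1;
}

/* Remember the last seen code in *last_code (start it at -1) and return 1
 * only when a later header carries a different code. */
int aspect_changed(int *last_code, const uint8_t *buf, size_t len)
{
    int code = mpeg2_aspect_code(buf, len);
    if (code < 0 || code == *last_code)
        return 0;
    int changed = (*last_code >= 0);
    *last_code = code;
    return changed;
}
```

When `aspect_changed` returns 1, the encoder side would need to be told about the new aspect ratio before the next frame is submitted.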

Are there any feature requests? The final product in a 2U case for IPTV operators should be ready in the middle of Q1 2013.
by ghans » Thu Dec 06, 2012 9:39 pm
So is this transcoding from HD to HD content or down to SD ?
How many frames per second does this achieve ?


ghans
by Santa77 » Fri Dec 07, 2012 7:23 am
ghans:
Currently we are working only with SD channels, so SD to SD.
Fps? It's hard to say exactly. Currently, on one Model B with 512MB, we are able to transcode 2 SD input channels on the fly to 6 SD output channels without losing a single frame. So in theory we can speak of 6x25 frames per second on output (150 fps). On input there are 50 fields (half frames) per second per channel (interlaced), so on input we can speak of 100 fps, counted in fields.
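
The arithmetic above, written out as a small sketch (the function names are hypothetical):

```c
#include <assert.h>

/* Interlaced input: each channel delivers fields_per_sec half pictures. */
unsigned input_fields_per_sec(unsigned channels, unsigned fields_per_sec)
{
    return channels * fields_per_sec;
}

/* Progressive output: every input channel is encoded in several variants. */
unsigned output_frames_per_sec(unsigned channels, unsigned variants,
                               unsigned frame_rate)
{
    assert(variants > 0);
    return channels * variants * frame_rate;
}
```

With 2 channels at 50 fields per second the input side gives 100, and 2 channels times 3 variants at 25 fps gives 150 on the output side.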

An image_fx component is inserted at the decoder output to deinterlace. We would also like to add OMX.broadcom.resize to the chain, to allow, for example, downscaling HD to SD, or SD to mobile resolutions for phone and tablet targets. In another project we have already implemented an MpegTS to HLS streamer, so if we join the two together we will be able to optimise the bitrate and size of the streams, and stream them to mobile devices too.
by ghans » Fri Dec 07, 2012 7:43 am
That's impressive. Any chance of a version for "consumers" i.e. private customers ?
(To be run on a single Raspi).

ghans
by Santa77 » Fri Dec 07, 2012 7:54 am
The business model is not my task. But at Monday's meeting I will talk about it with the partner who is responsible for the business.
by closetgeek » Fri Dec 07, 2012 4:21 pm
Looks like a great use for these little boards. Have some questions if you would answer
- What are the stats on the uptime of the nodes?
- Are you having problems with throughput due to the 10/100 Ethernet?
- I'm not very familiar with video solutions on this level, so please excuse me if the answer is obvious, but are you using the system just for video conversion (converting then storing product) or is the product being streamed to consumers? If so, are the streams from each node going from the cluster to a larger server or are they streaming to clients directly?

Thanks
Posts: 16
Joined: Wed Dec 05, 2012 10:55 am
by linuxstb » Sat Dec 08, 2012 11:50 am
For those looking to do something similar to this with open code (I'm assuming this project will be closed-source), I've uploaded some proof-of-concept transcoding code to github. This is still in the early stages, but hopefully someone with some spare time can run with it and turn it into something generally useful.

See this thread:

viewtopic.php?f=70&t=25022
Posts: 77
Joined: Sat Jul 07, 2012 11:07 pm
by Santa77 » Mon Dec 10, 2012 12:49 pm
ghans: a smaller version with fewer than 8 RPis will not be on sale for now.

closetgeek: it's in the development phase now, so uptime is not very high, but the test node (a single unit) runs continuously without crashing from one stable version to the next (days).

Ethernet throughput is not a problem, because the whole input traffic is approx. 13 Mbps (2 SD channels at the highest MPEG2 rate that we have). Total network load on the test node is no more than 40 Mbps.
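
As a quick sanity check against the 10/100 Ethernet port, a tiny sketch (the helper name is hypothetical, and 100 Mbps is the nominal link rate, not a measured one):

```c
#include <assert.h>

/* Percentage of a link used by a given load, both in Mbps. */
double link_utilisation_pct(double load_mbps, double link_mbps)
{
    assert(link_mbps > 0.0);
    return 100.0 * load_mbps / link_mbps;
}
```

With the figures above, the 13 Mbps of input is 13% of the nominal link and the 40 Mbps total is 40%.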

The output from these devices is UDP multicast, so it is multicast directly to a LAN with 40 test set-top boxes, i.e. directly to "end users". But there is another machine (a PC-based server) that archives the streams for 7 days (instant recording), and another server that produces HLS streaming from those multicast streams for tablets, mobiles and smart TVs.

linuxstb: nice and clear code. That's a nice starting point for others. Maybe when we sell enough units we will open part of the code ;)
by wozarib » Mon Dec 10, 2012 1:27 pm
I'm curious to know what you've used there to power the Pis. I'm guessing a USB hub, but what sort? I'd be interested in building a similar setup for different purposes.
Posts: 5
Joined: Tue Feb 21, 2012 5:02 pm
by Santa77 » Mon Dec 10, 2012 1:45 pm
Yes, we are powering the Pis via an active (powered) USB hub. We tested a few of them, and this one looks stable:

Trust 7 Port USB2 Powered Hub HU-5870V

This one is good too if you need, for example, switches...
http://www.softcom.sk/eshop/usb-hub-7-p ... 96755.html
by Deborah » Mon Dec 10, 2012 3:05 pm
I'm curious: when you say you're doing 2 SD decodes and 6 SD encodes, do you mean 720x576? That's 1620 macroblocks per frame, so the encodes alone (6x1620x25fps) are pretty much equivalent to a single 1080p30 stream, which is what the device is spec'd for; I'm surprised there's the capacity left to run the decoder as well!
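
The macroblock comparison above can be checked with a short sketch. The function names are hypothetical; the only facts used are that a macroblock covers 16x16 pixels and that partial blocks at the picture edge count as whole ones.

```c
#include <assert.h>

/* Macroblocks are 16x16 pixels; partial blocks at the edges round up. */
unsigned macroblocks_per_frame(unsigned width, unsigned height)
{
    return ((width + 15) / 16) * ((height + 15) / 16);
}

/* Macroblocks per second for a number of identical streams. */
unsigned long macroblock_rate(unsigned width, unsigned height,
                              unsigned fps, unsigned streams)
{
    assert(fps > 0);
    return (unsigned long)macroblocks_per_frame(width, height) * fps * streams;
}
```

Six 720x576 streams at 25 fps come to 243,000 macroblocks per second, and one 1920x1080 stream at 30 fps comes to 244,800, so the two loads are indeed nearly the same.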
Posts: 2
Joined: Mon Dec 10, 2012 12:32 pm
by Santa77 » Mon Dec 10, 2012 3:25 pm
Deborah: 6 outputs is the maximum that is configurable. The input is interlaced; only 2 of the outputs are SD, the rest are resized down (resolutions for mobiles). Of course this small but powerful device cannot encode 6 SD outputs in real time. But using OMX tunnelling with the splitter and resizer, it is able to work.

Maybe I was not so clear in my previous post.

That's the reason why I am trying to read the GPU load in thread viewtopic.php?f=67&t=23185

Maybe in the final version we will reduce the allowed outputs to just 4, depending on stress tests and GPU stats, to get a stable and fully working device ;)
by Deborah » Mon Dec 10, 2012 3:53 pm
Ah, of course; thank you for clarifying. It's a nice application for the device!
by PanamaMan » Wed Dec 12, 2012 10:40 am
I'm interested in investing into one of these transcoding clusters. Please PM me.
Posts: 1
Joined: Wed Dec 12, 2012 10:37 am
by mrlinux2u » Sat Dec 22, 2012 2:10 pm
@Santa77

Hi,

Love the look of the cluster, but I'm curious about how you mounted the pi's in the case as I'm looking for something similar, would your company be willing to sell the mounts on their own?

Cheers

mrlinux2u
Posts: 169
Joined: Sat Sep 24, 2011 8:38 pm
by Santa77 » Sat Dec 22, 2012 2:16 pm
@mrlinux2u
This was our inspiration:

http://www.thingiverse.com/thing:30563

We made new holders based on that, but I hope that base design can help you. ;)

BTW: Merry Christmas to all
by mrlinux2u » Sat Dec 22, 2012 2:20 pm
thanks very much for the quick reply (and link), much appreciated :)

And a happy xmas to you too.
by bluemoon » Sun Jan 13, 2013 3:09 pm
This sounds very interesting ! :D
Posts: 1
Joined: Sun Jan 13, 2013 2:52 pm