MPEG-2 Decoding


by jacksonliam » Wed May 30, 2012 6:00 pm
jamesh wrote:
jacksonliam wrote:
lenod wrote:I think decoding live mpeg without overclocking and reasonable quality is not really possible.
Have you tried transcoding such a video in xvid or even h264? I would be very interested to know how long it takes. I'll try it when I'll have some spare time.

If it takes a reasonable time, then we could estimate the remaining transcoding time, wait long enough, and start omxplayer on the partially transcoded file. I assume it would use almost no CPU (except for audio?), leaving this resource for transcoding.

Another possibility would be to program recording on the pi using a DVB device*, and transcode the recording once finished or during the night (using crontab or something) while it has nothing else to do.

* Has someone managed to do this? Mine isn't recognised.

There's a thread on dvb-t woes which includes a link to an image with dvb support.

Transcoding to h264 is a no, there's some research into doing it quickly but I don't think it will work for us. I think just playing back the sound could use 30 per cent of the cpu.

What I'm trying to do is use OpenGL to do the last step of decoding (colour space conversion), which is about 40 per cent of the decoding process.

It might be possible to do motion compensation in opengl too, which is a further 30 per cent saving in cpu use.

This is based on papers I've read. I read a paper where someone claimed to do mpeg2 decoding almost fully in an opengl es shader, but there was no method or code examples in any of the papers I've read.

Someone in a forum said they got 6 mpeg2 streams at the same time on the ipad 2 gpu. There's a little iphone code for colour space conversion and some opengl examples but nothing too useful!

I'm just learning OpenGL now, and it will be a long process before I get a good working MPEG-2 decoder. I'm confident it can be done though!


There's no encode support on the GPU at the moment (disabled), which limits transcoding. TBH, a licence pack that included encode would probably also include MPEG2 anyway.

The Raspi GPU could decode 6 streams of SD MPEG2 I reckon. It can certainly very easily do 1080p, which is over 6 times the data of SD video.
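
For anyone who wants to check the "times the data" figure, here is a quick sketch of the per-frame pixel arithmetic. The SD resolutions (720x576 for PAL, 720x480 for NTSC) are my assumption about what is meant by SD, and frame rate is ignored.

```python
def pixel_ratio(w1, h1, w2, h2):
    """How many times more pixels a w1 x h1 frame has than a w2 x h2 frame."""
    return (w1 * h1) / (w2 * h2)

# 1080p against the two common SD frame sizes (assumed, see above).
vs_pal = pixel_ratio(1920, 1080, 720, 576)
vs_ntsc = pixel_ratio(1920, 1080, 720, 480)
print(vs_pal, vs_ntsc)
```

So per frame it is roughly five to six times the pixels, depending on which SD format you compare against.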

Ah yeah, I never thought about encoding with the gpu and decoding at the same time. I didn't (don't?) think it would be possible. Won't the camera add on need gpu encoding? Anyway I think the best that would offer would be skipping colour space conversion, you'd probably have to decode most of the mpeg2 on the cpu anyway!

I'm confident that I'll eventually get watchable playback using open gl es to do as much as possible!
It will just take time. If someone with better skills than me decides to do it, I will be happy, as it's not easy. I'm more of a 'throw other people's code together and use a framework to write as little code from scratch as possible' kind of guy :-D

And I really do have to write this from scratch: there are a lot of Pi-specific things that need doing, and I need to research which methods of sending textures to OpenGL are the least CPU intensive in this particular OpenGL ES implementation!
Posts: 151
Joined: Tue Feb 07, 2012 10:09 pm
by MartenR » Thu May 31, 2012 6:28 am
Well, I am currently porting vomp (which is a VDR client) to the Raspberry Pi, and there MPEG-2 is a must for DVB transmission (git.vomp.tv). Currently my plan is to use ffmpeg and do the YUV conversion.

If the MPEG performance is then as bad as with mplayer, I will configure ffmpeg to output XvMC data, first handle motion compensation in GPU shaders and, if this is not enough, also handle the DCT in a GPU shader (and of course audio FFT and MDCT).
It will be a long way and it will be tricky, but this is my plan...

Marten
Posts: 46
Joined: Sat Mar 03, 2012 9:15 am
by Paul Webster » Thu May 31, 2012 7:27 am
2 of the people closely involved with RPi (Dom and Gert) have recently made very positive statements on this forum about getting more codecs licensed on the GPU (requiring those that want it to pay a bit more and associate license with serial number).
I assume that there would then be a new version of omxplayer to make the most of it and that this will satisfy a lot of people who want to play pre-existing MPEG2 content.
It won't satisfy everyone but will be a great improvement for many.

If someone can find a way to do this with GPU acceleration then that could be great.

Is the approach taken by OpenGL significantly more efficient than the handling in mplayer (and similar tools)?
Does the RPi port of it make use of Broadcom GPU to offload a large portion of the work?
Posts: 430
Joined: Sat Jul 30, 2011 4:49 am
Location: London, UK
by MartenR » Thu May 31, 2012 7:37 am
Paul Webster wrote:If someone can find a way to do this with GPU acceleration then that could be great.

Is the approach taken by OpenGL significantly more efficient than the handling in mplayer (and similar tools)?


Sure, it will be more efficient. I did not find any code in mplayer that offloads DCT and motion compensation to the GPU, because these are easily and efficiently handled by MMX or SSE, but those are not available on our Pi.
Paul Webster wrote:Does the RPi port of it make use of Broadcom GPU to offload a large portion of the work?

If you mean the vomp port, not yet. I will start on it once H.264 OMX playback and ffmpeg work, but it will take a long time to implement.

If you mean mplayer, I do not think that any part of the decoding is offloaded at the moment, since it does not use OpenGL ES.

Marten
Posts: 46
Joined: Sat Mar 03, 2012 9:15 am
by jacksonliam » Thu May 31, 2012 7:40 am
MartenR wrote:Well, I am currently porting vomp (which is a VDR client) to the Raspberry Pi, and there MPEG-2 is a must for DVB transmission (git.vomp.tv). Currently my plan is to use ffmpeg and do the YUV conversion.

If the MPEG performance is then as bad as with mplayer, I will configure ffmpeg to output XvMC data, first handle motion compensation in GPU shaders and, if this is not enough, also handle the DCT in a GPU shader (and of course audio FFT and MDCT).
It will be a long way and it will be tricky, but this is my plan...

Marten

I've tried XvMC with ffmpeg and it's as bad as mplayer. XvMC isn't hardware accelerated, and even when X is, I don't know if it will make this any faster. No one knows how long it will take to get hardware acceleration for X.

If you work on the motion compensation and get results that would be great as if I get colour space conversion working we may be able to get a decent framerate between us!

Paul Webster wrote:2 of the people closely involved with RPi (Dom and Gert) have recently made very positive statements on this forum about getting more codecs licensed on the GPU (requiring those that want it to pay a bit more and associate license with serial number).
I assume that there would then be a new version of omxplayer to make the most of it and that this will satisfy a lot of people who want to play pre-existing MPEG2 content.
It won't satisfy everyone but will be a great improvement for many.

If someone can find a way to do this with GPU acceleration then that could be great.

Is the approach taken by OpenGL significantly more efficient than the handling in mplayer (and similar tools)?
Does the RPi port of it make use of Broadcom GPU to offload a large portion of the work?
opengl is entirely gpu, except for transferring data from cpu memory to gpu memory, which is fairly intensive!

Mplayer doesn't get smooth playback no matter what vo options you use, so we could try to change mplayer but I'm using libavcodec (ffmpeg) and trying to speed it up with opengl which based on papers I've read should be 40% faster.
Posts: 151
Joined: Tue Feb 07, 2012 10:09 pm
by MartenR » Thu May 31, 2012 7:55 am
jacksonliam wrote:I've tried XvMC with ffmpeg and it's as bad as mplayer. XvMC isn't hardware accelerated, and even when X is, I don't know if it will make this any faster. No one knows how long it will take to get hardware acceleration for X.

If you work on the motion compensation and get results that would be great as if I get colour space conversion working we may be able to get a decent framerate between us!


Well, I meant using the XvMC data as input to our own shaders. It is an output format with all the data needed to do DCT and/or motion compensation externally, so we would write our own XvMC code.

As for the colour space conversion, this is really just a matrix multiplication in GLSL.
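
As a rough illustration of that matrix multiplication, here is a minimal Python sketch of the per-pixel arithmetic a fragment shader would perform. The full-range BT.601 coefficients are an assumption (broadcast MPEG-2 normally uses limited-range levels, which need an extra scale and offset), and the function name is made up for the example.

```python
# Full-range BT.601 YUV -> RGB matrix (assumed coefficients; rows are R, G, B).
YUV_TO_RGB = (
    (1.0,  0.0,       1.402),
    (1.0, -0.344136, -0.714136),
    (1.0,  1.772,     0.0),
)

def yuv_to_rgb(y, u, v):
    """Convert one 8-bit YUV pixel to an 8-bit RGB tuple."""
    # Chroma is stored with an offset of 128, so centre it on zero first.
    vec = (float(y), u - 128.0, v - 128.0)
    rgb = []
    for row in YUV_TO_RGB:
        value = sum(c * x for c, x in zip(row, vec))
        # Clamp to the displayable range and round to an integer.
        rgb.append(max(0, min(255, int(round(value)))))
    return tuple(rgb)

print(yuv_to_rgb(128, 128, 128))  # mid grey stays grey
```

In a shader the same thing would be one mat3 times vec3 per fragment, with the three planes sampled as textures.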

jacksonliam wrote:opengl is entirely gpu, except for transferring data from cpu memory to gpu memory, which is fairly intensive!

Mplayer doesn't get smooth playback no matter what vo options you use, so we could try to change mplayer but I'm using libavcodec (ffmpeg) and trying to speed it up with opengl which based on papers I've read should be 40% faster.

OpenGL is not entirely GPU, especially not on the Raspi, which has only OpenGL ES.
You have to modify the programs; otherwise they are probably using the Mesa emulation of OpenGL.

I would leave ffmpeg alone and just use the xvmc output, then you can offload all stuff (yuv conversion dct and motion compensation) to gpu without changing ffmpeg.

Marten
Posts: 46
Joined: Sat Mar 03, 2012 9:15 am
by jacksonliam » Thu May 31, 2012 11:05 am
Yeah, I meant OpenGL ES :-) The programmable pipeline is entirely on the GPU though, isn't it?

Libavcodec gives a frame of YUV data (AVFrame), which should be easy to pass to OpenGL ES, either packed or as separate Y, U and V textures, though probably using square textures. Then convert to RGB and display full screen. I think that should give close to 25 fps, because IIRC that's what people have done on the iPhone.
Posts: 151
Joined: Tue Feb 07, 2012 10:09 pm
by MartenR » Thu May 31, 2012 11:21 am
jacksonliam wrote:Yeah, I meant OpenGL ES :-) The programmable pipeline is entirely on the GPU though, isn't it?

Libavcodec gives a frame of YUV data (AVFrame), which should be easy to pass to OpenGL ES, either packed or as separate Y, U and V textures, though probably using square textures. Then convert to RGB and display full screen. I think that should give close to 25 fps, because IIRC that's what people have done on the iPhone.

Yep, the pipeline is completely on the GPU, and you do not need square textures on the Pi.
I would use separate textures; then the repacking can also be done on the GPU.
If this is already enough, it will be great; otherwise libavcodec can also give XvMC frames for doing motion compensation.
Posts: 46
Joined: Sat Mar 03, 2012 9:15 am
by AndrewS » Thu May 31, 2012 11:43 am
I'm afraid I don't have anything technical to add, but it's fascinating to see the way people are trying to squeeze more performance out of the limited RPi hardware by using novel tricks and techniques. Seems to fit the foundation's goals of "learning by experimenting" quite well :)
I guess in a roundabout way this is kinda similar to the very early days of GPGPU?
Posts: 3626
Joined: Sun Apr 22, 2012 4:50 pm
Location: Cambridge, UK
by jacksonliam » Thu May 31, 2012 11:51 am
Cool, nice to know square textures aren't needed. The only problem is that the call to send textures, glTexImage2D, can be slow (a couple of ms), and this is CPU time. Calling it three times might mean we don't make 25 fps. It may be possible to use glTexSubImage2D and pack the data into a single texture, then separate it in the shader; I'm not sure if that would be quicker. It's something I'm playing with currently (well, actually, I'm still trying to get a single frame to display, but I'm learning from scratch).

It would be good if I could pipeline this and have a couple of frames buffered on the GPU, in case the CPU gets held up for a ms or two, but I can't think of a good way to do that.
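
To make the single-texture idea concrete, here is a small Python sketch of packing the three YUV420 planes into one buffer so that only one upload is needed per frame. The layout (Y rows on top, then each U row followed by the matching V row) and the function names are hypothetical, chosen only for this example; the shader would have to un-pack using the same layout.

```python
def pack_yuv420(y_plane, u_plane, v_plane, width, height):
    """Pack planar YUV420 data into one width x (height * 3 // 2) byte buffer.

    Rows 0..height-1 hold Y; each following row holds one U row and then
    one V row, side by side (hypothetical layout for this example).
    """
    cw, ch = width // 2, height // 2  # chroma planes are half size each way
    assert len(y_plane) == width * height
    assert len(u_plane) == len(v_plane) == cw * ch
    packed = bytearray(y_plane)
    for row in range(ch):
        packed += u_plane[row * cw:(row + 1) * cw]
        packed += v_plane[row * cw:(row + 1) * cw]
    return bytes(packed)

def frame_budget_ms(fps):
    """Time available per frame, in milliseconds."""
    return 1000.0 / fps

# A 720x576 frame packs into a 720x864 single-channel buffer.
frame = pack_yuv420(bytes(720 * 576), bytes(360 * 288), bytes(360 * 288), 720, 576)
print(len(frame), frame_budget_ms(25))
```

At 25 fps there are 40 ms per frame, so three uploads at a couple of ms each would eat a noticeable slice of that budget. Whether one larger upload is actually cheaper on the Pi would need measuring.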
Posts: 151
Joined: Tue Feb 07, 2012 10:09 pm
by MartenR » Thu May 31, 2012 12:34 pm
jacksonliam wrote:Cool, nice to know square textures aren't needed. The only problem is that the call to send textures, glTexImage2D, can be slow (a couple of ms), and this is CPU time. Calling it three times might mean we don't make 25 fps. It may be possible to use glTexSubImage2D and pack the data into a single texture, then separate it in the shader; I'm not sure if that would be quicker. It's something I'm playing with currently (well, actually, I'm still trying to get a single frame to display, but I'm learning from scratch).

It would be good if I could pipeline this and have a couple of frames buffered on the GPU, in case the CPU gets held up for a ms or two, but I can't think of a good way to do that.

I see two ways. Either do the texture upload and decoding on a separate thread (you need a second EGL context with shared resources) and the rendering on another thread, as is commonly done for texture loading, or load the data into an EGL object such as an EGLImage and use that as the texture (also on a separate thread).
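
The buffering side of the two-thread approach can be sketched without any EGL at all. This is only an illustration of the pattern in Python: the decode, upload and draw steps are stand-ins, and the function name and the queue depth of 3 are assumptions for the example.

```python
import queue
import threading

def run_pipeline(frames, depth=3):
    """Decode on one thread and 'render' on another, with `depth` frames buffered."""
    buffered = queue.Queue(maxsize=depth)  # bounded: decoder blocks when full
    rendered = []
    done = object()  # sentinel marking end of stream

    def decoder():
        for frame in frames:
            buffered.put(frame)  # stands in for decode + texture upload
        buffered.put(done)

    def renderer():
        while True:
            frame = buffered.get()  # blocks until a frame is ready
            if frame is done:
                break
            rendered.append(frame)  # stands in for draw + swap buffers

    threads = [threading.Thread(target=decoder), threading.Thread(target=renderer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return rendered

print(run_pipeline(list(range(10))))
```

The bounded queue is what gives the couple of frames of slack: the renderer keeps drawing from the buffer if the decoder is held up briefly, and the decoder stalls rather than running arbitrarily far ahead.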
Posts: 46
Joined: Sat Mar 03, 2012 9:15 am
by yakovlevtx » Sat Jun 02, 2012 12:21 am
I find it very unfortunate that this is ending up where it is.

Without a license pack, and with a real need to decode MPEG-2 (particularly in the US), the next best choice is for some of us to re-invent the wheel and implement an MPEG-2 decoder in pure OpenGL. This is bad on many levels, and good on one or two.

The good is that it gets a pure OpenGL ES decoder out there. This is useful for many projects besides this one. Unfortunately, it wastes valuable developer time re-inventing the wheel where there is already a better implementation written by someone else that users just need to be allowed to pay money for. I would really rather pay $10 than having myself or others spend weeks or months implementing an MPEG-2 decoder for the Pi.

However, this is something I might be willing to help with, if I can find enough information to know how to help. It sounds like some people here know more about MPEG-2 than I do, and are better equipped to work on this kind of function.
Posts: 1
Joined: Sat Jun 02, 2012 12:07 am
by jacksonliam » Sun Jun 03, 2012 10:53 am
Here are some benchmarks of the Pi decoding MPEG-2 with libavcodec. This is a really simple player I wrote that just grabs the video stream and decodes each frame with avcodec_decode_video. It doesn't do anything with the frame once decoded and doesn't do anything with the audio. I've tried it over the network and locally, and saw no difference in the numbers.

This is a recording from the UK channel BBC 1. I'm not sure what's going on with the first value; it must be something in my code or perhaps a dodgy stream.
Code:
    Stream #0.0[0x30]: Video: mpeg2video, yuv420p, 720x576 [PAR 64:45 DAR 16:9], 15000 kb/s, 25 tbr, 90k tbn, 50 tbc
    Stream #0.1[0x40](eng): Audio: mp2, 48000 Hz, stereo, s16, 256 kb/s
    Stream #0.2[0x41](eng): Audio: mp2, 48000 Hz, mono, s16, 64 kb/s
    Stream #0.3[0x50](eng): Subtitle: dvbsub
 123 frames rendered in 2.0312 seconds -> FPS=60.5565
  51 frames rendered in 2.0146 seconds -> FPS=25.3154
  45 frames rendered in 2.0418 seconds -> FPS=22.0392
  45 frames rendered in 2.0134 seconds -> FPS=22.3503
  45 frames rendered in 2.0161 seconds -> FPS=22.3199
  53 frames rendered in 2.0118 seconds -> FPS=26.3447
  47 frames rendered in 2.0100 seconds -> FPS=23.3836
  45 frames rendered in 2.0117 seconds -> FPS=22.3686
  40 frames rendered in 2.0172 seconds -> FPS=19.8299
  37 frames rendered in 2.0070 seconds -> FPS=18.4356
  39 frames rendered in 2.0169 seconds -> FPS=19.3365
  37 frames rendered in 2.0336 seconds -> FPS=18.1940
  37 frames rendered in 2.0243 seconds -> FPS=18.2778
  36 frames rendered in 2.0357 seconds -> FPS=17.6843
  36 frames rendered in 2.0477 seconds -> FPS=17.5803
  33 frames rendered in 2.0641 seconds -> FPS=15.9875
  30 frames rendered in 2.0404 seconds -> FPS=14.7031
  33 frames rendered in 2.0451 seconds -> FPS=16.1359
  35 frames rendered in 2.0293 seconds -> FPS=17.2472
  35 frames rendered in 2.0514 seconds -> FPS=17.0611
  33 frames rendered in 2.0347 seconds -> FPS=16.2187
  35 frames rendered in 2.0244 seconds -> FPS=17.2891
  33 frames rendered in 2.0180 seconds -> FPS=16.3524
  40 frames rendered in 2.0209 seconds -> FPS=19.7929
  40 frames rendered in 2.0258 seconds -> FPS=19.7451
  33 frames rendered in 2.0385 seconds -> FPS=16.1883
  32 frames rendered in 2.0605 seconds -> FPS=15.5303
  31 frames rendered in 2.0595 seconds -> FPS=15.0519
  31 frames rendered in 2.0373 seconds -> FPS=15.2160
  30 frames rendered in 2.0564 seconds -> FPS=14.5883
  31 frames rendered in 2.0266 seconds -> FPS=15.2963
  32 frames rendered in 2.0372 seconds -> FPS=15.7078
  33 frames rendered in 2.0499 seconds -> FPS=16.0985
  33 frames rendered in 2.0195 seconds -> FPS=16.3410
  37 frames rendered in 2.0323 seconds -> FPS=18.2056
  37 frames rendered in 2.0056 seconds -> FPS=18.4480
  34 frames rendered in 2.0125 seconds -> FPS=16.8945
  42 frames rendered in 2.0006 seconds -> FPS=20.9936
  40 frames rendered in 2.0241 seconds -> FPS=19.7622
  40 frames rendered in 2.0265 seconds -> FPS=19.7383
  36 frames rendered in 2.0498 seconds -> FPS=17.5625
  31 frames rendered in 2.0548 seconds -> FPS=15.0864
  29 frames rendered in 2.0062 seconds -> FPS=14.4552
  33 frames rendered in 2.0045 seconds -> FPS=16.4630
  35 frames rendered in 2.0336 seconds -> FPS=17.2108
  32 frames rendered in 2.0084 seconds -> FPS=15.9332
  37 frames rendered in 2.0483 seconds -> FPS=18.0638
  33 frames rendered in 2.0087 seconds -> FPS=16.4286
  33 frames rendered in 2.0518 seconds -> FPS=16.0838
  32 frames rendered in 2.0214 seconds -> FPS=15.8304
  39 frames rendered in 2.0376 seconds -> FPS=19.1403
  43 frames rendered in 2.0360 seconds -> FPS=21.1194
  42 frames rendered in 2.0539 seconds -> FPS=20.4490
  45 frames rendered in 2.0315 seconds -> FPS=22.1507
  46 frames rendered in 2.0142 seconds -> FPS=22.8373
  43 frames rendered in 2.0072 seconds -> FPS=21.4233
  48 frames rendered in 2.0257 seconds -> FPS=23.6959
  41 frames rendered in 2.0135 seconds -> FPS=20.3628
  43 frames rendered in 2.0381 seconds -> FPS=21.0976
  41 frames rendered in 2.0178 seconds -> FPS=20.3194
  42 frames rendered in 2.0166 seconds -> FPS=20.8268
  43 frames rendered in 2.0371 seconds -> FPS=21.1086
  39 frames rendered in 2.0438 seconds -> FPS=19.0817
  37 frames rendered in 2.0180 seconds -> FPS=18.3349
  39 frames rendered in 2.0473 seconds -> FPS=19.0497
  39 frames rendered in 2.0071 seconds -> FPS=19.4313
  42 frames rendered in 2.0416 seconds -> FPS=20.5719
  38 frames rendered in 2.0474 seconds -> FPS=18.5604
  36 frames rendered in 2.0025 seconds -> FPS=17.9774
  36 frames rendered in 2.0311 seconds -> FPS=17.7241
  38 frames rendered in 2.0505 seconds -> FPS=18.5319
  40 frames rendered in 2.0387 seconds -> FPS=19.6205
  44 frames rendered in 2.0269 seconds -> FPS=21.7082
  44 frames rendered in 2.0057 seconds -> FPS=21.9378
  47 frames rendered in 2.0004 seconds -> FPS=23.4955
  47 frames rendered in 2.0312 seconds -> FPS=23.1396
  47 frames rendered in 2.0122 seconds -> FPS=23.3573
  46 frames rendered in 2.0281 seconds -> FPS=22.6812
  46 frames rendered in 2.0124 seconds -> FPS=22.8578
  46 frames rendered in 2.0287 seconds -> FPS=22.6748
  45 frames rendered in 2.0050 seconds -> FPS=22.4437
  45 frames rendered in 2.0130 seconds -> FPS=22.3549
  42 frames rendered in 2.0165 seconds -> FPS=20.8285
  40 frames rendered in 2.0012 seconds -> FPS=19.9884
  45 frames rendered in 2.0398 seconds -> FPS=22.0614
  46 frames rendered in 2.0237 seconds -> FPS=22.7310
  41 frames rendered in 2.0096 seconds -> FPS=20.4016
  41 frames rendered in 2.0107 seconds -> FPS=20.3912
  44 frames rendered in 2.0096 seconds -> FPS=21.8953
  43 frames rendered in 2.0032 seconds -> FPS=21.4659
  45 frames rendered in 2.0083 seconds -> FPS=22.4068
  47 frames rendered in 2.0072 seconds -> FPS=23.4152
  48 frames rendered in 2.0357 seconds -> FPS=23.5790
  49 frames rendered in 2.0063 seconds -> FPS=24.4230
  49 frames rendered in 2.0308 seconds -> FPS=24.1289
  46 frames rendered in 2.0356 seconds -> FPS=22.5983
  47 frames rendered in 2.0453 seconds -> FPS=22.9791
  42 frames rendered in 2.0085 seconds -> FPS=20.9116
  50 frames rendered in 2.0370 seconds -> FPS=24.5462
  42 frames rendered in 2.0562 seconds -> FPS=20.4263
  38 frames rendered in 2.0002 seconds -> FPS=18.9981
  39 frames rendered in 2.0223 seconds -> FPS=19.2853
  37 frames rendered in 2.0436 seconds -> FPS=18.1053
  40 frames rendered in 2.0429 seconds -> FPS=19.5799
  38 frames rendered in 2.0200 seconds -> FPS=18.8118
  37 frames rendered in 2.0589 seconds -> FPS=17.9711
  41 frames rendered in 2.0354 seconds -> FPS=20.1432
  41 frames rendered in 2.0395 seconds -> FPS=20.1029
  39 frames rendered in 2.0186 seconds -> FPS=19.3205
  35 frames rendered in 2.0049 seconds -> FPS=17.4573
  41 frames rendered in 2.0215 seconds -> FPS=20.2817
  40 frames rendered in 2.0214 seconds -> FPS=19.7883
  41 frames rendered in 2.0048 seconds -> FPS=20.4508
  40 frames rendered in 2.0387 seconds -> FPS=19.6199
  43 frames rendered in 2.0479 seconds -> FPS=20.9971

It pretty much repeats these values in a random order and stays close to 21 FPS.
Here's a video of the best mplayer does on the above clip, using frame dropping and lowres: http://www.youtube.com/watch?v=5BDZobLzsSA
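
For anyone wanting to summarise a log like the one above rather than eyeball it, here is a small Python sketch that parses the lines and computes total frames over total seconds. The function name is made up, the sample is just the first three lines of the listing, and skipping the first interval is my choice because that value looked unreliable.

```python
import re

LINE = re.compile(r"(\d+) frames rendered in ([\d.]+) seconds")

SAMPLE = """\
 123 frames rendered in 2.0312 seconds -> FPS=60.5565
  51 frames rendered in 2.0146 seconds -> FPS=25.3154
  45 frames rendered in 2.0418 seconds -> FPS=22.0392
"""

def overall_fps(log_text, skip_first=True):
    """Return total frames / total seconds over all matching log lines."""
    samples = [(int(n), float(t)) for n, t in LINE.findall(log_text)]
    if skip_first:
        samples = samples[1:]  # drop the suspicious first interval
    frames = sum(n for n, _ in samples)
    seconds = sum(t for _, t in samples)
    return frames / seconds if seconds else 0.0

print(round(overall_fps(SAMPLE), 2))
```

Feeding it the whole log instead of the three-line sample gives a single average to compare between decoders.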

This is a short rip from MTV
Code:
  Duration: 00:00:16.53, start: 2582.918122, bitrate: 3065 kb/s
    Stream #0.0[0x1e0]: Video: mpeg2video, yuv420p, 720x576 [PAR 16:15 DAR 4:3], 15000 kb/s, 25 tbr, 90k tbn, 50 tbc
    Stream #0.1[0x1c0]: Audio: mp2, 48000 Hz, stereo, s16, 192 kb/s
[mpeg @ 0xd6d560]invalid dts/pts combination
  50 frames rendered in 2.0121 seconds -> FPS=24.8501
  48 frames rendered in 2.0404 seconds -> FPS=23.5249
  37 frames rendered in 2.0114 seconds -> FPS=18.3950
  41 frames rendered in 2.0321 seconds -> FPS=20.1762
  51 frames rendered in 2.0363 seconds -> FPS=25.0460
  47 frames rendered in 2.0340 seconds -> FPS=23.1069
  38 frames rendered in 2.0249 seconds -> FPS=18.7663
  45 frames rendered in 2.0324 seconds -> FPS=22.1409
  41 frames rendered in 2.0252 seconds -> FPS=20.2448
  49 frames rendered in 2.0154 seconds -> FPS=24.3125
  47 frames rendered in 2.0167 seconds -> FPS=23.3055
  50 frames rendered in 2.0125 seconds -> FPS=24.8443
  36 frames rendered in 2.0273 seconds -> FPS=17.7573
  47 frames rendered in 2.0256 seconds -> FPS=23.2031
  41 frames rendered in 2.0261 seconds -> FPS=20.2354
  40 frames rendered in 2.0093 seconds -> FPS=19.9076
  41 frames rendered in 2.0064 seconds -> FPS=20.4346
  48 frames rendered in 2.0318 seconds -> FPS=23.6249
  46 frames rendered in 2.0590 seconds -> FPS=22.3404
  43 frames rendered in 2.0284 seconds -> FPS=21.1986
  56 frames rendered in 2.0212 seconds -> FPS=27.7068
  54 frames rendered in 2.0035 seconds -> FPS=26.9523
  47 frames rendered in 2.0339 seconds -> FPS=23.1081

That's it, it's a short clip.

So yeah, now I'm not sure if colour space conversion alone will be enough, though possibly it will be with a bit of frame dropping...
I might try to see how libmpeg2 compares with libavcodec. It's supposed to be faster, but its last update was in 2008, so I don't know if that's still true.
Posts: 151
Joined: Tue Feb 07, 2012 10:09 pm
by NisseDILLIGAF » Mon Jun 04, 2012 9:06 am
Nice work Liam..!
Good to see progress on getting mpeg2 to work... maybe soon we can watch live TV on our Raspberry Pi!! :D

Please let us know if there's anything we can help with...!
/Nisse
Posts: 7
Joined: Tue Jan 17, 2012 12:24 pm
by jacksonliam » Mon Jun 04, 2012 3:59 pm
libmpeg2 actually seems to be a little easier to use than libavcodec, at least for writing my benchmark; it might be more difficult when it comes to sound.

Here's the BBC video again, for as long as I could be bothered to wait:
Code:
  58 frames rendered in 2.0279 seconds -> FPS=28.6010
  50 frames rendered in 2.0257 seconds -> FPS=24.6829
  40 frames rendered in 2.0146 seconds -> FPS=19.8547
  38 frames rendered in 2.0360 seconds -> FPS=18.6636
  40 frames rendered in 2.0308 seconds -> FPS=19.6968
  35 frames rendered in 2.0170 seconds -> FPS=17.3524
  36 frames rendered in 2.0122 seconds -> FPS=17.8912
  45 frames rendered in 2.0394 seconds -> FPS=22.0651
  37 frames rendered in 2.0096 seconds -> FPS=18.4120
  38 frames rendered in 2.0378 seconds -> FPS=18.6472
  47 frames rendered in 2.0255 seconds -> FPS=23.2039
  52 frames rendered in 2.0010 seconds -> FPS=25.9874
  47 frames rendered in 2.0001 seconds -> FPS=23.4990
  47 frames rendered in 2.0246 seconds -> FPS=23.2140
  50 frames rendered in 2.0326 seconds -> FPS=24.5993
  53 frames rendered in 2.0221 seconds -> FPS=26.2110
  52 frames rendered in 2.0103 seconds -> FPS=25.8673
  52 frames rendered in 2.0245 seconds -> FPS=25.6856
  54 frames rendered in 2.0125 seconds -> FPS=26.8317
  53 frames rendered in 2.0087 seconds -> FPS=26.3857
  46 frames rendered in 2.0265 seconds -> FPS=22.6989
  48 frames rendered in 2.0388 seconds -> FPS=23.5432
  49 frames rendered in 2.0268 seconds -> FPS=24.1755
  51 frames rendered in 2.0367 seconds -> FPS=25.0401
  47 frames rendered in 2.0316 seconds -> FPS=23.1346
  51 frames rendered in 2.0059 seconds -> FPS=25.4248
  60 frames rendered in 2.0172 seconds -> FPS=29.7437
  55 frames rendered in 2.0135 seconds -> FPS=27.3153
  55 frames rendered in 2.0298 seconds -> FPS=27.0968
  57 frames rendered in 2.0217 seconds -> FPS=28.1947
  46 frames rendered in 2.0356 seconds -> FPS=22.5981
  48 frames rendered in 2.0046 seconds -> FPS=23.9451
  44 frames rendered in 2.0145 seconds -> FPS=21.8412
  46 frames rendered in 2.0098 seconds -> FPS=22.8882
  52 frames rendered in 2.0225 seconds -> FPS=25.7106
  33 frames rendered in 2.0304 seconds -> FPS=16.2528
  48 frames rendered in 2.0085 seconds -> FPS=23.8986
  63 frames rendered in 2.0031 seconds -> FPS=31.4518
  56 frames rendered in 2.0011 seconds -> FPS=27.9850
  71 frames rendered in 2.0026 seconds -> FPS=35.4535
  69 frames rendered in 2.0148 seconds -> FPS=34.2464
  49 frames rendered in 2.0339 seconds -> FPS=24.0911
  66 frames rendered in 2.0138 seconds -> FPS=32.7741
  84 frames rendered in 2.0304 seconds -> FPS=41.3703
  47 frames rendered in 2.0500 seconds -> FPS=22.9267
  69 frames rendered in 2.0081 seconds -> FPS=34.3614


Slightly better numbers, but not spectacular.


And this is the MTV video again. This one had much better performance under libmpeg2 than libavcodec:
Code:
   1 frames rendered in 1863182720.0000 seconds -> FPS=0.0000
  59 frames rendered in 2.0183 seconds -> FPS=29.2332
  62 frames rendered in 2.0143 seconds -> FPS=30.7801
  65 frames rendered in 2.0182 seconds -> FPS=32.2062
  62 frames rendered in 2.0135 seconds -> FPS=30.7920
  58 frames rendered in 2.0118 seconds -> FPS=28.8292
  72 frames rendered in 2.0037 seconds -> FPS=35.9338
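
The first line of that listing (one frame in 1863182720 seconds) looks like an interval measured from a start time that was never set. That is only a guess at the cause. Here is a minimal Python sketch of a frame counter that records its start time when it is created; the class name and the injectable clock are assumptions for the example, not the code used for the benchmark.

```python
import time

class FpsCounter:
    """Count frames and report a rate every `interval` seconds."""

    def __init__(self, interval=2.0, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        self.start = clock()  # set the start time up front, not lazily
        self.frames = 0

    def tick(self):
        """Call once per decoded frame; returns a report line or None."""
        self.frames += 1
        elapsed = self.clock() - self.start
        if elapsed < self.interval:
            return None
        line = "%4d frames rendered in %.4f seconds -> FPS=%.4f" % (
            self.frames, elapsed, self.frames / elapsed)
        # Start a new measurement interval.
        self.start = self.clock()
        self.frames = 0
        return line
```

Passing the clock in makes the counter easy to test with fake timestamps, and a monotonic clock avoids jumps if the system time changes mid-run.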


Now I don't know the best way forward, because if I write either of these with OpenGL colour space conversion and spend a lot of time getting the sound in sync and so on, it probably still won't get near a solid 25 fps.

I think hardware-accelerated XvMC would be useful, but I don't think we'll get it soon, if ever. So I think motion compensation and YUV-to-RGB conversion need to be done in OpenGL ES, and I don't know enough to write motion compensation.
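
For reference, the core idea of motion compensation can be sketched in a few lines of Python: copy a block from a previously decoded frame at the offset given by the motion vector, then add the decoded residual. This is a deliberately simplified illustration. It handles full-pel vectors only, whereas real MPEG-2 also needs half-pel interpolation, field/frame prediction modes and averaging of two references for B-frames. All names here are made up for the example.

```python
def motion_compensate(reference, residual, bx, by, dx, dy, size=16):
    """Rebuild one size x size block at (bx, by) from a reference frame.

    reference: 2-D list of 8-bit samples from a previously decoded frame
    residual:  size x size 2-D list of signed differences (IDCT output)
    (dx, dy):  full-pel motion vector pointing into the reference frame
    """
    block = []
    for row in range(size):
        out_row = []
        for col in range(size):
            predicted = reference[by + dy + row][bx + dx + col]
            value = predicted + residual[row][col]
            out_row.append(max(0, min(255, value)))  # clamp to 8 bits
        block.append(out_row)
    return block

# Tiny 4x4 reference frame and a 2x2 block, just to show the indexing.
ref = [[r * 4 + c for c in range(4)] for r in range(4)]
print(motion_compensate(ref, [[1, -1], [0, 300]], 0, 0, 1, 1, size=2))
```

Every output sample depends only on one reference sample and one residual value, which is why this step maps naturally onto a fragment shader reading two textures.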

MartenR seems to have a very good insight into this, I wonder if he is working on anything? Or has a direction he thinks I should go!
Posts: 151
Joined: Tue Feb 07, 2012 10:09 pm
by dom » Mon Jun 04, 2012 4:08 pm
YUV420 is natively supported by the GPU using dispmanx, so if you just want to display the video on the display, you can just add a YUV420 layer, so you may not have to factor in the YUV->RGB conversion.
However if you want video to play inside an X window, you may still need the conversion.
Raspberry Pi Engineer & Forum Moderator
Posts: 4043
Joined: Wed Aug 17, 2011 7:41 pm
Location: Cambridge
by jacksonliam » Mon Jun 04, 2012 5:25 pm
dom wrote:YUV420 is natively supported by the GPU using dispmanx, so if you just want to display the video on the display, you can just add a YUV420 layer, so you may not have to factor in the YUV->RGB conversion.
However if you want video to play inside an X window, you may still need the conversion.

That's a nice and useful snippet :D thanks! dispmanx should do scaling nicely too :D

Though if we need to use OpenGL ES for motion compensation, we might need to do all of that in OpenGL ES anyway, unless OpenGL->dispmanx is completely on the GPU? YUV conversion should be easy in OpenGL ES anyway (or so people keep telling me).
Posts: 151
Joined: Tue Feb 07, 2012 10:09 pm
by MartenR » Mon Jun 04, 2012 6:24 pm
jacksonliam wrote:MartenR seems to have a very good insight into this, I wonder if he is working on anything? Or has a direction he thinks I should go!

I am working on a port of vomp (see my Raspberry Pi git at git.vomp.tv). Since YUV conversion alone in OpenGL ES is insufficient, I will probably start implementing motion compensation next weekend.
I already have a plan for how to do this in terms of shaders.
(Note: I will put the code directly into vomp. If anything useful evolves, I will post my results here so that someone can extract it and put it into other programs.)

Marten
Posts: 46
Joined: Sat Mar 03, 2012 9:15 am
by bbb » Wed Jun 13, 2012 1:30 pm
MartenR wrote:
MartenR seems to have a very good insight into this, I wonder if he is working on anything? Or has a direction he thinks I should go!

I am working on a port of vomp (see my raspberry git at git.vomp.tv). Since only yuv conversion in opengl es is insufficient I will probably start implementing motion compensation next weekend.
I have already a plan how to do this in terms of shaders.
(Note I will put the stuff directly into vomp, if anything useful evolves I will post my results here, so that someone can extract the stuff from it in order to put into other programs),

Marten


Any updates on this, Marten, and which implementation (libmpeg2, libav, ffmpeg) are you tweaking?

I have worked on optimizing MP3 and eAAC+ decoders specifically for ARM11 - so I might be able to help get MPEG2 decoding working better.

Also, does anyone know if the issue is insufficient memory bandwidth or just not enough CPU cycles to get everything done? Has anyone done profiling to see what is taking the most time?
Posts: 51
Joined: Sat Jun 02, 2012 9:52 am
by jacksonliam » Wed Jun 13, 2012 1:46 pm
bbb wrote:
Any updates on this, Marten, and which implementation (libmpeg2, libav, ffmpeg) are you tweaking?

I have worked on optimizing MP3 and eAAC+ decoders specifically for ARM11 - so I might be able to help get MPEG2 decoding working better.

Also, does anyone know if the issue is insufficient memory bandwidth or just not enough CPU cycles to get everything done? Has anyone done profiling to see what is taking the most time?

I've done simple benchmarking, and monitoring 'top' seems to show the CPU at 99%. libmpeg2 seems to have some ARM optimisation and does seem slightly faster, but libavcodec is easier to code against.

I think Marten won't be starting till this weekend, based on what he said.
Posts: 151
Joined: Tue Feb 07, 2012 10:09 pm
by bbb » Wed Jun 13, 2012 2:27 pm
jacksonliam wrote: Ive done simple benchmarking and monitoring 'top' seems to show cpu at 99%. libmpeg2 seems to have some arm optimisation and does seem slightly faster, but libavcodec is easier to code., ...


I am not completely sure on this and need to check how Linux reports CPU usage - but I think processes that are just pushing data around in main memory and not getting a lot of processing done due to wait states will still show as high CPU usage ..

Yep from my experience libmpeg2 is usually the fastest - managed to get mythtv running on some pretty low-end hardware (old Intel PII) before by tweaking some of the compile options for libmpeg2 :)

jacksonliam wrote:.....
What I'm trying to do is use OpenGL to do the last step of decoding (colour space conversion) which is about 40 per cent of the decoding process.

It might be possible to do motion compensation in OpenGL too, which is a further 30 per cent saving in CPU use.
....


Oops, sorry, I missed this info the first time I read through the thread :) This kind of answers my question about what is taking the time. Is this an educated guess based on research, or did you do some sort of measurement (e.g. hooks in the code or the GNU profiler) to get the 40% and 30% values? It sounds very promising if you can get these two stages offloaded to the GPU using OpenGL ES.
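For reference, the colour space conversion stage mentioned above is a small per-pixel calculation, which is why it maps naturally onto a fragment shader. A minimal sketch in Python, assuming BT.601 limited-range Y'CbCr (the usual case for SD MPEG-2, though a given stream may signal something else) and the commonly quoted rounded coefficients. The function names are illustrative only and do not come from any library discussed here.

```python
def clamp8(x):
    """Round to the nearest integer and clamp to the 0..255 range."""
    return max(0, min(255, int(round(x))))


def ycbcr_to_rgb(y, cb, cr):
    """Convert one BT.601 limited-range Y'CbCr sample to 8-bit RGB."""
    c = y - 16      # luma offset: 16 is black, 235 is white
    d = cb - 128    # chroma is centred on 128
    e = cr - 128
    r = 1.164 * c + 1.596 * e
    g = 1.164 * c - 0.392 * d - 0.813 * e
    b = 1.164 * c + 2.017 * d
    return clamp8(r), clamp8(g), clamp8(b)


print(ycbcr_to_rgb(16, 128, 128))   # nominal black
print(ycbcr_to_rgb(235, 128, 128))  # nominal white
```

In 4:2:0 video each Cb/Cr pair is shared by a 2x2 block of luma samples, so a real converter also has to upsample the chroma planes; a GPU texture sampler can do that as part of the lookup.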

I will take a look at the libavcodec and libmpeg2 ARM optimizations and at the audio decoding used for DVB-T/S. From what I remember, it's MPEG-1 Audio Layer II or AC3.
Posts: 51
Joined: Sat Jun 02, 2012 9:52 am
by MartenR » Wed Jun 13, 2012 2:53 pm
In my experience, the color space conversion does not account for a large percentage of the decoding time.
At the weekend I managed to tweak libavcodec so that it outputs the XvMC pixfmt. That means I can effectively turn off the IDCT and motion compensation decoding steps in libavcodec and push the frames in XvMC pixfmt to the shaders.

I could benchmark how many frames libavcodec processes depending on whether I switched DCT or mocomp on or off (YUV conversion is done by OpenGL, so it does not contribute).

Full decoding done by CPU: 12-13 FPS (full quality DCT)
CPU decoding without motion compensation: 22-33 FPS (22 FPS is at startup, probably some hiccups in the buffers)
CPU decoding without motion compensation and without IDCT: 30-50 FPS (probably limited by VSYNC)
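To put the figures above in terms of a real-time budget: SD PAL runs at 25 FPS, which leaves 40 ms per frame. A minimal sketch of that arithmetic in Python. Taking the lower end of each quoted range is a conservative choice made for this example only, and the last configuration is probably VSYNC-limited, so its true cost per frame may be lower than shown.

```python
PAL_FPS = 25.0  # target frame rate for SD PAL material


def frame_ms(fps):
    """Time spent per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


# Lower end of each range quoted above (an assumption for illustration).
measurements = {
    "full CPU decode": 12.0,
    "no motion compensation": 22.0,
    "no motion compensation, no IDCT": 30.0,
}

budget = frame_ms(PAL_FPS)
for name, fps in measurements.items():
    ms = frame_ms(fps)
    verdict = "fits" if ms <= budget else "exceeds"
    print(f"{name}: {ms:.1f} ms per frame, {verdict} the {budget:.0f} ms budget")
```

Read this way, full CPU decoding needs roughly twice the available time per frame, and only the configuration with both stages offloaded leaves any headroom for audio and demuxing.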

Next week I will start writing the mocomp shader, since I now get the necessary data out of libavcodec.

I doubt that memory bandwidth limitations are causing the problems, since the same would apply to the Pi's hardware decoders, which share the same bus. I think it is purely CPU. I used libavcodec from the Debian Multimedia squeeze (stable) packages, since they are a bit more recent than the Debian squeeze ones. I looked at the ARM assembler optimizations in the code: there are not many for older ARM designs like the one in the Raspberry Pi, and more for NEON-capable CPUs.
For me the way to go is the GPU. I am not sure, but the SIMD ops of the Pi's ARM are really narrow compared to MMX, I think only 32 bit instead of 128 bit, so I doubt that big performance jumps can happen there.
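As a point of reference for the width comparison: ARMv6 SIMD instructions such as UHADD8 work on ordinary 32-bit registers holding four 8-bit lanes, MMX registers are 64 bits wide, and 128-bit registers arrive with SSE2 or NEON. The sketch below emulates in Python what a 32-bit packed operation buys for half-pel motion compensation, where two pixels are averaged with rounding up. It is an illustration of the packing trick only, not code from libavcodec or libmpeg2, and the function names are invented here.

```python
MASK32 = 0xFFFFFFFF
LOW_BITS_CLEARED = 0xFEFEFEFE  # clears the low bit of each byte lane


def avg4_round_up(a, b):
    """Per-byte (x + y + 1) >> 1 on four 8-bit pixels packed in 32 bits."""
    # x + y == (x | y) + (x & y), so ceil((x + y) / 2) == (x | y) - ((x ^ y) >> 1).
    # Masking before the shift stops bits leaking into the neighbouring lane.
    return ((a | b) - (((a ^ b) & LOW_BITS_CLEARED) >> 1)) & MASK32


def avg4_reference(a, b):
    """Same result computed one byte at a time, for checking."""
    out = 0
    for shift in (0, 8, 16, 24):
        x = (a >> shift) & 0xFF
        y = (b >> shift) & 0xFF
        out |= ((x + y + 1) >> 1) << shift
    return out


row_a = 0x10FF0080
row_b = 0x20FF01FF
print(hex(avg4_round_up(row_a, row_b)))
print(avg4_round_up(row_a, row_b) == avg4_reference(row_a, row_b))
```

Four pixels per operation is the ceiling for this register width, against sixteen for a 128-bit unit, which is consistent with the expectation that ARMv6 assembler tuning alone yields limited gains.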



Marten
Posts: 46
Joined: Sat Mar 03, 2012 9:15 am
by jacksonliam » Wed Jun 13, 2012 4:01 pm
bbb wrote:
jacksonliam wrote: I've done simple benchmarking, and monitoring 'top' seems to show the CPU at 99%. libmpeg2 seems to have some ARM optimisation and does seem slightly faster, but libavcodec is easier to code against. ...


I am not completely sure about this and need to check how Linux reports CPU usage, but I think processes that are just pushing data around in main memory, and not getting much processing done because of wait states, will still show as high CPU usage.

Yep, in my experience libmpeg2 is usually the fastest. I managed to get MythTV running on some pretty low-end hardware (an old Intel PII) before by tweaking some of the compile options for libmpeg2 :)

jacksonliam wrote:.....
What I'm trying to do is use OpenGL to do the last step of decoding (colour space conversion) which is about 40 per cent of the decoding process.

It might be possible to do motion compensation in OpenGL too, which is a further 30 per cent saving in CPU use.
....


Oops, sorry, I missed this info the first time I read through the thread :) This kind of answers my question about what is taking the time. Is this an educated guess based on research, or did you do some sort of measurement (e.g. hooks in the code or the GNU profiler) to get the 40% and 30% values? It sounds very promising if you can get these two stages offloaded to the GPU using OpenGL ES.

I will take a look at the libavcodec and libmpeg2 ARM optimizations and at the audio decoding used for DVB-T/S. From what I remember, it's MPEG-1 Audio Layer II or AC3.

My numbers are from papers I've read: most I found on IEEE Xplore, a few via Google and EBSCO. I'm not doing it any more, but MartenR is, for his project (as he says below), which IIRC is open source. I'd like to use his code in other things, including a basic media player.

libavcodec is newer (still maintained) and seems to handle broken streams (like TV streams) better, along with being easier to use, so I think effort is being focused on that!

Some fast audio playback would be good; there aren't going to be many CPU cycles left after 25 FPS video is decoded! All my samples are MPEG Audio Layer 2, but that's not to say AC3 isn't being used somewhere!

MartenR wrote: In my experience, the color space conversion does not account for a large percentage of the decoding time.
At the weekend I managed to tweak libavcodec so that it outputs the XvMC pixfmt. That means I can effectively turn off the IDCT and motion compensation decoding steps in libavcodec and push the frames in XvMC pixfmt to the shaders.

I could benchmark how many frames libavcodec processes depending on whether I switched DCT or mocomp on or off (YUV conversion is done by OpenGL, so it does not contribute).

Full decoding done by CPU: 12-13 FPS (full quality DCT)
CPU decoding without motion compensation: 22-33 FPS (22 FPS is at startup, probably some hiccups in the buffers)
CPU decoding without motion compensation and without IDCT: 30-50 FPS (probably limited by VSYNC)

Next week I will start writing the mocomp shader, since I now get the necessary data out of libavcodec.

I doubt that memory bandwidth limitations are causing the problems, since the same would apply to the Pi's hardware decoders, which share the same bus. I think it is purely CPU. I used libavcodec from the Debian Multimedia squeeze (stable) packages, since they are a bit more recent than the Debian squeeze ones. I looked at the ARM assembler optimizations in the code: there are not many for older ARM designs like the one in the Raspberry Pi, and more for NEON-capable CPUs.
For me the way to go is the GPU. I am not sure, but the SIMD ops of the Pi's ARM are really narrow compared to MMX, I think only 32 bit instead of 128 bit, so I doubt that big performance jumps can happen there.



Marten

Are those numbers on Pi hardware?
Your benchmarks for CPU decoding without motion compensation are similar to mine, which decode to YUV420P frames with libavcodec. Mine are posted in this thread. I thought taking out motion compensation would yield much better results than that! Are you using a particularly demanding video?
Posts: 151
Joined: Tue Feb 07, 2012 10:09 pm
by MartenR » Wed Jun 13, 2012 4:16 pm
Are those numbers on Pi hardware?
Your benchmarks for CPU decoding without motion compensation are similar to mine, which decode to YUV420P frames with libavcodec. Mine are posted in this thread. I thought taking out motion compensation would yield much better results than that! Are you using a particularly demanding video?

Yes, they are on the Pi, but my code also handles network transfer over the vomp protocol, demuxing, and uploading to the graphics card. So the absolute numbers are not comparable; only the relative numbers of my measurements should give you a hint of the speed-up (a factor of two!).
By the way, something I wanted to ask you: did you tweak the libavcodec settings? Did you compile it from current git (ffmpeg or libav), or did you use the included Debian version (which is from 2009, with few ARM optimizations)?
I ask because I was pretty disappointed that I only got 11-12 FPS with everything done in libavcodec. The vomp code should not be the problem, since it was written for the 66 MHz PowerPC in the MediaMVP.

Marten

P.S.: The video was simply the first DVB-C recording I had at hand, SD PAL.
Posts: 46
Joined: Sat Mar 03, 2012 9:15 am
by jacksonliam » Wed Jun 13, 2012 4:32 pm
I can't remember. I possibly compiled it from git, but more likely I just did apt-get install libavcodec! I certainly didn't tweak anything!

Is the vomp protocol light? Because over a mounted smb share I saw almost no change in framerate!
Posts: 151
Joined: Tue Feb 07, 2012 10:09 pm