MPEG-2 Decoding


by gogiman » Wed Aug 01, 2012 6:14 am
Hi guys.
Really great stuff going on here. Just a few useless lines from me :-)
Would it be possible to integrate all this into XBMC (RaspBMC)?
The display frame rate you are mentioning here is great. I am not sure if I understand it properly, but when I play an MPEG4 file from my NAS, XBMC shows me (when pressing the O key) approximately 24 fps, and my feeling is that it is perfectly watchable :-)
by JonathanGraham » Wed Aug 01, 2012 12:44 pm
linuxstb wrote:But now to my question - does anyone have any YUV display code they're willing to share that I can try to integrate with libmpeg2?

There's code earlier in the thread. A modified version of hello_dispmanx which is in your /opt directory.
by JonathanGraham » Wed Aug 01, 2012 12:55 pm
gogiman wrote:Hi guys.
Really great stuff going on here. Just a few useless lines from me :-)
Would it be possible to integrate all this into XBMC (RaspBMC)?
The display frame rate you are mentioning here is great. I am not sure if I understand it properly, but when I play an MPEG4 file from my NAS, XBMC shows me (when pressing the O key) approximately 24 fps, and my feeling is that it is perfectly watchable :-)

It seems reasonable. I don't know exactly what XBMC is doing (I wrote my own STB software :-) ), but I assume that, like MythTV, it invokes an external player - in this case omxplayer - and I likewise assume it can invoke different players depending on the file type. So you would just configure it to use one of our apps for MPEG-2 (and whatever else - I did some comparisons on some .ogm files between omxplayer and mplayer, and mplayer did a better job hands down).

When I'm done with MPEG2 I will try to write mplayer video codecs to handle the hardware-accelerated stuff. It will probably have to work in two parts, like VDPAU does.

Incidentally I found a single C routine which according to the profiler gets called a fair bit. It's straightforward and there's a NEON accelerated version but nothing else. So if there are any ARM experts reading I'd like some feedback on it. Essentially it takes two pointers, one to a list of floats and the other to a list of ints, also an integer counter and a float multiplier.

Each integer in the list is multiplied by the float and stored in the corresponding position in the float array. I've tried rewriting the C in a couple of different ways (including using Duff's device) to see if the assembly output would be different. So far, no dice.

From my scant knowledge of ARM assembly this looks a lot like a job for VFP and FLDM but I'd like to hear why it shouldn't be optimized too. I assume there's some overhead in invoking the VFP or something.
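
For reference, the plain-C form of the routine is essentially the following (the function name is mine, not the one from the source); a VFP/NEON version would load, convert and multiply several elements per iteration instead of one at a time:

Code: Select all
/* Hypothetical name: multiply each int by a float and store the result
 * in the corresponding position of the float array. */
void int_to_float_scaled(float *dst, const int *src, int count, float mul)
{
    int i;
    for (i = 0; i < count; i++)
        dst[i] = (float) src[i] * mul;
}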
by bbb » Wed Aug 01, 2012 2:52 pm
jacksonliam wrote:...
I haven't been able to get libmpeg2 to output to dispmanx yet; I think I need to do my own buffer management...
....


Hi, I dug out some code I wrote many, many moons ago which might help; it is a simple libmpeg2 decoder that spits the output to an SDL overlay.

(I haven't tried it on the Pi yet, but it works fine on my Ubuntu Linux install.)
Attachments
mpeg2sdl.zip
C source code
(2.03 KiB) Downloaded 106 times
by jacksonliam » Wed Aug 01, 2012 8:55 pm
linuxstb wrote:I've finally got my Pi and have started looking at libmpeg2. My approach has been to just copy the .c/.h/.S files from the libmpeg2 distribution and, along with the "simple1.c" example program, create my own test player.

So I've created my own simple Makefile instead of using the libmpeg one, and have removed all unused files to make it easier to work with.

I've enabled the official ARM optimisation (motion_comp_arm_s.S) included with libmpeg2 but haven't yet managed to get the ARM-optimised idct from Rockbox working - it just segfaults...

My simple decoder is managing faster than real-time for all my MPEG-2 clips, but I don't think this is news. My main use will be to play UK DVB-T broadcasts, and one 30-second test sample (from the channel "Dave") decodes in about 13 seconds. This is quite a low bitrate (I think around 1.5Mbits/s-2Mbits/s). A second 30-second sample (BBC1 Scotland from DVB-S) is much higher bitrate (around 5Mbits/s I think), and that takes close to 30 seconds to decode.

So this shows how dependent on bitrate the decoding speed is. Fortunately for this purpose, the UK's low bitrate DVB-T is a good thing ;)

I intend to change my test program so that it reads the entire file into RAM before decoding. I'm expecting this to provide more consistent benchmark figures, and when I do that, I'll post some numbers to compare to others.

But now to my question - does anyone have any YUV display code they're willing to share that I can try to integrate with libmpeg2?

Thanks.


I don't think anyone else has done this, and I didn't get it to work... I couldn't get the libmpeg2 buffers into a format dispmanx liked! With libavcodec I was able to look at the ppm output routine to memcpy each frame into the correct packed yuv420 format. I'll upload an old version of that below (my newer code is on the Pi, and the Pi isn't booted). The Makefile might not be entirely right.

bbb wrote:
jacksonliam wrote:...
I haven't been able to get libmpeg2 to output to dispmanx yet; I think I need to do my own buffer management...
....


Hi, I dug out some code I wrote many, many moons ago which might help; it is a simple libmpeg2 decoder that spits the output to an SDL overlay.

(I haven't tried it on the Pi yet, but it works fine on my Ubuntu Linux install.)

Thanks, I'll take a look at the weekend and see if it helps. From a quick scan it seems very similar to the samples included with libmpeg2, but it might give me some hints for buffer manipulation.
These samples are great, but there's rarely one with sound output, and that's the trickiest part IMO!
Attachments
vp01 1st Aug.zip
(3.41 KiB) Downloaded 92 times
by linuxstb » Thu Aug 02, 2012 12:26 am
jacksonliam,

Thanks for that code. Along with the "dispmanx.c" sample posted by dom (which I think you've seen!) I've been able to get libmpeg2 to output video on my Pi.

The buffer returned by libmpeg2 is an array of three pointers - pointing to the Y, U and V data. The key is that the Y data is aligned to 16 bytes, and the U/V data is aligned to 8 bytes.

The input to dispmanx is a single buffer containing the Y data, followed by the U, and then the V.
Again, if I understand correctly, dispmanx requires the data to be aligned to 32 bytes for the Y and 16 pixels for U and V. So you need to memcpy the data line by line to add this alignment (unless the data is already aligned, in which case you can memcpy the whole thing).
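
To make that copy concrete, the line-by-line repacking looks roughly like this (the ALIGN_UP macro and the exact pitch values here are illustrative assumptions rather than the exact values in the attached program):

Code: Select all
#include <stdint.h>
#include <string.h>

#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Copy libmpeg2's three planes (src[0]=Y, src[1]=U, src[2]=V) line by line
 * into one contiguous Y+U+V buffer with the alignment dispmanx wants.
 * width/height are the luma dimensions; chroma is assumed to be 4:2:0. */
static void repack_yuv420(uint8_t *dst, uint8_t *const src[3],
                          int width, int height)
{
    int y;
    int src_y_pitch  = ALIGN_UP(width, 16);      /* libmpeg2 Y stride (as above) */
    int src_uv_pitch = ALIGN_UP(width / 2, 8);   /* libmpeg2 U/V stride          */
    int dst_y_pitch  = ALIGN_UP(width, 32);      /* assumed dispmanx Y pitch     */
    int dst_uv_pitch = ALIGN_UP(width / 2, 16);  /* assumed dispmanx U/V pitch   */

    for (y = 0; y < height; y++)                 /* Y plane */
        memcpy(dst + y * dst_y_pitch, src[0] + y * src_y_pitch, width);
    dst += dst_y_pitch * height;

    for (y = 0; y < height / 2; y++)             /* U plane */
        memcpy(dst + y * dst_uv_pitch, src[1] + y * src_uv_pitch, width / 2);
    dst += dst_uv_pitch * (height / 2);

    for (y = 0; y < height / 2; y++)             /* V plane */
        memcpy(dst + y * dst_uv_pitch, src[2] + y * src_uv_pitch, width / 2);
}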

I'm attaching my full test program, including the local copy of libmpeg2, for anyone that wishes to play. It takes as input a demuxed mpeg-2 video stream (no audio), which can be created using the "extract_mpeg" program included with the official libmpeg2 distribution.

Now some benchmarks (using vc_dispmanx_update_submit, not the _sync variant) on a standard 700MHz Pi with the latest Raspbian:

Low bitrate 544x576 25fps DVB-T stream - 41.69fps
Higher bitrate (approx 5Mbits/s I think) 720x576 25fps DVB-S stream - 22.31fps
hst_2.mpg sample 29.97fps 720x480 - 24.21fps

This is without audio and without buffering/demuxing (I read the entire test file into RAM before starting the decode).

EDIT:

I've just done some tests commenting out my memcpy loops, and the speed increases to 47.94, 25.82 and 26.34. So it would seem worthwhile to see if libmpeg2 can be persuaded to output the data with the alignment required by dispmanx (and setting the mpeg2 output buffers to be contiguous in my app).

I also still need to get the ARM idct from Rockbox working - I'm still using the C version.
Attachments
test1.tgz
(37.44 KiB) Downloaded 83 times
by JonathanGraham » Thu Aug 02, 2012 4:18 am
I did some work to determine what kind of output I can expect once things like the USB interrupts are fixed:

Config
Code: Select all
arm_freq=850
fbset -xres 8 -yres 8 -vxres 8 -vyres 8 -depth 8
echo 1 > /sys/devices/platform/bcm2708_usb/bussuspend

File characteristics
Code: Select all
VIDEO:  MPEG2  720x480  (aspect 3)  29.970 fps  9800.0 kbps (1225.0 kbyte/s)
AUDIO: 48000 Hz, 2 ch, s16le, 384.0 kbit/25.00% (ratio: 48000->192000)

Code: Select all
BENCHMARKs: VC: 729.138s VO:  49.725s A: 112.107s Sys: 389.435s = 1280.406s
BENCHMARK%: VC: 56.9459% VO:  3.8835% A:  8.7556% Sys: 30.4150% = 100.0000%
BENCHMARKn: disp: 30693 (23.97 fps)  drop: 2 (0%)  total: 30695 (23.97 fps)

So clearly there's some room to optimize the audio but I'd say the movie played flawlessly.
by linuxstb » Thu Aug 02, 2012 6:16 am
Jonathan,

Are you able to post a patch with your changes to mplayer? It would be great to see how your code is working.

All,

In another thread (I think), dom suggested creating a queue of decoded frames in one thread, and then using another thread to display those frames with submit_sync, in order to even out the processing time. Has anyone tried this approach? Given that we don't have much headroom, it seems sensible to me. Memory shouldn't be an issue (one 720x576 YUV frame is only about 600KB).
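
If anyone wants to experiment with that, a minimal pthread-based sketch of the decode/display hand-off could look like this (the queue depth, the names and the opaque frame type are purely illustrative, not from any of the attached code):

Code: Select all
#include <pthread.h>

#define QUEUE_DEPTH 4                  /* a few ~600KB frames is plenty */

typedef struct {
    void *frames[QUEUE_DEPTH];          /* decoded YUV frames, opaque here */
    int head, tail, count;
    pthread_mutex_t lock;               /* PTHREAD_MUTEX_INITIALIZER */
    pthread_cond_t not_empty, not_full; /* PTHREAD_COND_INITIALIZER  */
} frame_queue;

/* Decoder thread: wait while the queue is full, then enqueue a frame. */
void frame_queue_push(frame_queue *q, void *frame)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_DEPTH)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->frames[q->tail] = frame;
    q->tail = (q->tail + 1) % QUEUE_DEPTH;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Display thread: wait while the queue is empty, dequeue a frame, then
 * show it (e.g. with submit_sync) at the right time. */
void *frame_queue_pop(frame_queue *q)
{
    void *frame;
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    frame = q->frames[q->head];
    q->head = (q->head + 1) % QUEUE_DEPTH;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return frame;
}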

Thanks.
by MartenR » Thu Aug 02, 2012 9:24 am
JonathanGraham wrote:Incidentally I found a single C routine which according to the profiler gets called a fair bit. It's straightforward and there's a NEON accelerated version but nothing else. So if there are any ARM experts reading I'd like some feedback on it. Essentially it takes two pointers, one to a list of floats and the other to a list of ints, also an integer counter and a float multiplier.

Just one quick question: which profiler are you using? My first attempts using valgrind resulted in errors, and I would also like to find out where my code spends the most time.

A quick update on my GPU efforts: after I got all the motion compensation working, the performance was not that impressive. I hit 21-23 fps; the shader for motion compensation is just too complex. Since I had no clear idea where to optimize, I froze that work and pursued a different idea.
In the last two weeks I wrote a patch for libav aka ffmpeg, which adds the ability to transcode mpeg2 (only frame pictures, but I did not see anything else in DVB) to mpeg4 ASP (this is part 2, aka divx with b frames, not h264 aka part 10); as far as I understand, this one is also licensed.

If I turn off writing to disk, I achieve a constant 29+ fps on a high-bitrate DVB-C sample (25 fps, 720x576) (on the old Debian; I will try it on Raspbian at the weekend). The good thing is that the code is not optimized at all - the naive first implementation gives me this framerate - so there is lots of room for optimization. (This is all without threading or demuxing and with no passing to OMX, but it is designed in a way that libav can write directly to OMX buffers.)

Drawback: gray bars are added to the left and right of the picture to simulate intra macroblocks in b frames (a feature missing in mpeg4). I have to find some little bugs and do some optimization, then I will probably post the code.

Marten
by linuxstb » Thu Aug 02, 2012 10:20 am
Marten,

Sounds like an interesting approach. Is there any loss in quality going this route?

Dave.
by linuxstb » Thu Aug 02, 2012 10:39 am
linuxstb wrote:I've just done some tests commenting out my memcpy loops, and the speed increases to 47.94, 25.82 and 26.34. So it would seem worthwhile to see if libmpeg2 can be persuaded to output the data with the alignment required by dispmanx (and setting the mpeg2 output buffers to be contiguous in my app).


This turned out to be relatively straightforward. libmpeg2 has a function (mpeg2_stride) specifically for this purpose. So I've now modified my test program to tell libmpeg2 to write into my own buffers and to format those buffers with the alignment required by dispmanx.

So this means my program no longer needs to copy any data - the output of libmpeg2 can be written directly to the GPU.
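
In outline, the setup looks something like this (a sketch rather than the exact code in the attachment; the 32-byte pitch is an assumption, and the full custom-fbuf hand-shake is shown by the sample programs shipped with libmpeg2):

Code: Select all
#include <stdint.h>
#include <mpeg2.h>

/* Point libmpeg2 at a caller-owned buffer laid out as contiguous Y+U+V
 * planes with the pitch dispmanx expects, so the decoded output needs
 * no further copying. */
static void use_own_buffer(mpeg2dec_t *dec, const mpeg2_sequence_t *seq,
                           uint8_t *frame /* pitch * height * 3/2 bytes */)
{
    uint8_t *planes[3];
    int pitch = (seq->width + 31) & ~31;          /* assumed dispmanx Y pitch */

    mpeg2_custom_fbuf(dec, 1);                    /* we supply the frame buffers      */
    mpeg2_stride(dec, pitch);                     /* luma stride libmpeg2 writes with */

    planes[0] = frame;                                        /* Y */
    planes[1] = planes[0] + pitch * seq->height;              /* U */
    planes[2] = planes[1] + (pitch / 2) * (seq->height / 2);  /* V */
    mpeg2_set_buf(dec, planes, NULL);             /* repeated for each frame buffer   */
}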

New benchmark figures:

544x576 @ 25fps DVB-T sample - 45.82fps
720x576 @ 25fps DVB-S sample - 25.30fps
720x480 @ 29.97fps hst_2.mpg sample - 26.02fps

Latest code attached. Any suggestions for further improvement very welcome...
Attachments
test2.tgz
(37.38 KiB) Downloaded 107 times
by MartenR » Thu Aug 02, 2012 12:45 pm
linuxstb wrote:Marten,

Sounds an interesting approach. Is there any loss in quality going this route?

Dave.

It depends. If you use the same quantization (which is possible, since mpeg4 supports mpeg2-style quantization), then you do not have to requantize and it will keep the same quality (except where you cannot keep the quantization: mpeg4 only allows the quantization level to change by 2 from macroblock to macroblock, so you have to requantize 1-2% of macroblocks if the quantization changes too rapidly).

I chose to use h263 quantization, which is easier, at a higher quality than the original mpeg2 quantization. I cannot spot a difference and it is a bit faster so far, but this can be changed later; then we might be able to keep it as it is, since for most macroblocks there will be no change.

A problem is the missing intra blocks in b frames in mpeg4. I can emulate them quite well, but there is some loss of quality, mostly quantization problems; anyway, it has been working since yesterday. But these are only 5-20 blocks out of the 1620 blocks of a b picture (without emulating them I could hardly spot them in the running movie, only when watching frame by frame).

Marten
by JonathanGraham » Thu Aug 02, 2012 2:34 pm
MartenR wrote:Just one quick question: which profiler are you using? My first attempts using valgrind resulted in errors, and I would also like to find out where my code spends the most time.

Valgrind is so invasive I was pretty sure it wouldn't work, or at the very least would be very slow. So I used oprofile - downloaded the source and compiled it; my kernel already had the module compiled in. No problems.

MartenR wrote:In the last two weeks I wrote a patch for libav aka ffmpeg, which adds the ability to transcode mpeg2 (only frame pictures, but I did not see anything else in DVB) to mpeg4 ASP (this is part 2, aka divx with b frames, not h264 aka part 10); as far as I understand, this one is also licensed.


I had thought about this and looked at some papers describing how to do it efficiently, but I don't really understand enough of the mpeg2 standard to implement it.
by JonathanGraham » Thu Aug 02, 2012 2:48 pm
linuxstb wrote:Jonathan,

Are you able to post a patch with your changes to mplayer? It would be great to see how your code is working.

All,

In another thread (I think), dom suggested creating a queue of decoded frames in one thread, and then using another thread to display those frames with submit_sync, in order to even out the processing time. Has anyone tried this approach? Given that we don't have much headroom, it seems sensible to me. Memory shouldn't be an issue (one 720x576 YUV frame is only about 600KB).

Thanks.

I just looked at your posted code; you appear to do the following:

If it's the first frame -> creates a resource
Write data to the resource
Start the update cycle
If not the first frame remove the element
Add the element to the display
Finish (submit) the update cycle

What I discovered through messing around is that as long as I just keep writing data and triggering the update, the display gets updated. You still need to create an element, but you only need to do it once.

So my inner loop is:

Start the update
Write the data
Finish the update cycle.
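
In dispmanx calls, that inner loop boils down to something like this (a sketch using the public header functions only; resource/element creation, the pitch and error checking are all omitted):

Code: Select all
#include "bcm_host.h"

/* Single-buffered display of one decoded frame: the resource and element
 * were created once at start-up, so here we only refill and resubmit. */
void show_frame(DISPMANX_RESOURCE_HANDLE_T resource,
                void *yuv_frame, int pitch, int width, int height)
{
    VC_RECT_T rect;
    DISPMANX_UPDATE_HANDLE_T update;

    vc_dispmanx_rect_set(&rect, 0, 0, width, height);

    update = vc_dispmanx_update_start(0);                     /* start the update */
    vc_dispmanx_resource_write_data(resource, VC_IMAGE_YUV420,
                                    pitch, yuv_frame, &rect); /* write the data   */
    vc_dispmanx_update_submit_sync(update);                   /* finish the update
                                 (or the non-blocking vc_dispmanx_update_submit) */
}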

The caveat here is that this is essentially single buffering, so you can see some tearing once this gets going fast. One of my next projects is to use two elements and swap writing between them. From there I can just alter the z-order to do the page flip.

This brings CPU usage pretty low, ~4%. If there were some way (OpenGL ES?) to write directly to the texture, you could probably get that down to 2%.

Wrt threading - this was originally conceived because the update_sync call blocks. Since plain update doesn't, I'm not sure threading is going to give us anything, but anyone who knows more about pthreads should feel free to contradict me.
by jacksonliam » Thu Aug 02, 2012 6:58 pm
linuxstb wrote:
linuxstb wrote:I've just done some tests commenting out my memcpy loops, and the speed increases to 47.94, 25.82 and 26.34. So it would seem worthwhile to see if libmpeg2 can be persuaded to output the data with the alignment required by dispmanx (and setting the mpeg2 output buffers to be contiguous in my app).


This turned out to be relatively straightforward. libmpeg2 has a function (mpeg2_stride) specifically for this purpose. So I've now modified my test program to tell libmpeg2 to write into my own buffers and to format those buffers with the alignment required by dispmanx.

So this means my program no longer needs to copy any data - the output of libmpeg2 can be written directly to the GPU.

New benchmark figures:

544x576 @ 25fps DVB-T sample - 45.82fps
720x576 @ 25fps DVB-S sample - 25.30fps
720x480 @ 29.97fps hst_2.mpg sample - 26.02fps

Latest code attached. Any suggestions for further improvement very welcome...

Awesome, I'll take a look and see if putting in the ARM file gives any benefit (or rip your output code into my libmpeg2), since I managed to get mine going without segfaulting - but also without displaying - and my benchmarks were promising.

JonathanGraham wrote:
In the last two weeks I wrote a patch for libav aka ffmpeg, which adds the ability to transcode mpeg2 (only frame pictures, but I did not see anything else in DVB) to mpeg4 ASP (this is part 2, aka divx with b frames, not h264 aka part 10); as far as I understand, this one is also licensed.


I had thought about this and looked at some papers describing how to do it efficiently, but I don't really understand enough of the mpeg2 standard to implement it.

I'd read about that too; it looked really complex and I wasn't sure how the sound would sync up :/ Sounds promising though!
by linuxstb » Thu Aug 02, 2012 8:56 pm
JonathanGraham wrote:What I discovered through messing around is that as long as I just keep writing data and triggering the update, the display gets updated. You still need to create an element, but you only need to do it once.
...
The caveat here is that this is essentially single buffering, so you can see some tearing once this gets going fast. One of my next projects is to use two elements and swap writing between them. From there I can just alter the z-order to do the page flip.


Thanks for the tip - I'll have a play with that. I'm also intending to double-buffer.

JonathanGraham wrote:This brings CPU usage pretty low, ~4%. If there were some way (OpenGL ES?) to write directly to the texture, you could probably get that down to 2%.


Do you mean to set up libmpeg2 so it creates the YUV image directly in the texture?

I haven't really spent much time trying to understand the various graphics libraries on the Pi (I've no previous experience with graphics libs), so this is all new to me... My experience is more "bare-bones" programming, writing directly to "dumb" LCD controllers without any GPU to help.

On a different subject, I'm losing track of the performance different people are achieving. Are people using libav/ffmpeg getting better or worse performance than my current libmpeg2 attempt?
by JonathanGraham » Thu Aug 02, 2012 10:32 pm
linuxstb wrote:Do you mean to set up libmpeg2 so it creates the YUV image directly in the texture?

Yes, though I have no idea if this is possible. The powers that be have stated that they won't give us the Broadcom docs because they are NDA-protected, so most of what I've done I figured out by reading the header files and experimenting. In dispmanx we create a buffer that we attach to an "element", write our data to the buffer, start the update, call write_data to push data from the buffer into the GPU, and then close out the update. So there's essentially an extra copy there that doesn't really benefit us, and from my profiling it's taking up the majority of the CPU cycles spent on video output.
On a different subject, I'm losing track of the performance different people are achieving. Are people using libav/ffmpeg getting better or worse performance than my current libmpeg2 attempt?


Agreed - the first thing we need to do is standardise the clips we are using as our test battery. All of us have different needs (Liam wants to use it for DVB and I need this to play DVDs). Once we have those up in an accessible place we will get a better idea of what is performing best.

After that we need to advertise in our benchmark postings (maybe in a Google doc?) what the major configuration elements were (overclocked, framebuffer minimised, USB off).
by linuxstb » Fri Aug 03, 2012 10:01 am
On a different subject, I did a couple of brief tests with audio decoding last night.

I took a 30-second 256kbps MP2 audio file (from a DVB-S broadcast) and decoded it with both madplay (based on libmad - an integer decoder) and mpg123 (a floating-point decoder).

Not surprisingly, mpg123 was twice as fast, decoding the 30-second clip in just under 1 second, whereas madplay took almost 2 seconds. So as a rough approximation, decoding needs about 700MHz/30 ≈ 23MHz of CPU.

I also did some tests running mpg123 in the background whilst playing video, and I still managed about 23fps on my DVB-S sample and about 42fps on my DVB-T. I also tried a higher bitrate DVB-T sample, and this decoded in about 27fps (with mpg123 in the background), so for my purposes (playing back DVB-T streams), it seems almost fast enough already ;)

Note that this was with the stock mpg123 and madplay from Raspbian (installed with apt-get). Just to be sure, I'll try and compile them myself from source to ensure all available optimisations are enabled.
by bbb » Fri Aug 03, 2012 10:44 am
linuxstb wrote:On a different subject, I did a couple of brief tests with audio decoding last night.

I took a 30-second 256kbps MP2 audio file (from a DVB-S broadcast) and decoded it with both madplay (based on libmad - an integer decoder) and mpg123 (a floating-point decoder).

Not surprisingly, mpg123 was twice as fast, decoding the 30-second clip in just under 1 second, whereas madplay took almost 2 seconds. So as a rough approximation, decoding needs about 700MHz/30 ≈ 23MHz of CPU.

I also did some tests running mpg123 in the background whilst playing video, and I still managed about 23fps on my DVB-S sample and about 42fps on my DVB-T. I also tried a higher bitrate DVB-T sample, and this decoded in about 27fps (with mpg123 in the background), so for my purposes (playing back DVB-T streams), it seems almost fast enough already ;)

Note that this was with the stock mpg123 and madplay from Raspbian (installed with apt-get). Just to be sure, I'll try and compile them myself from source to ensure all available optimisations are enabled.


Yep, I had around the same results with libmpg123 and libmad.

At the moment I'm just trying to get mplayer compiled with both mpg123 AND ffmpeg support to compare them (the default apt-get mplayer only has ffmpeg). I reckon libmpg123 is slightly better than ffmpeg's fixed- and floating-point mp2 decoders.

The other thing I am starting to look at is creating an -ao driver for OpenMAX.
by linuxstb » Fri Aug 03, 2012 5:46 pm
In an attempt to standardise tests, I've uploaded some mpeg2 sample files here:

http://linuxstb.cream.org/mpeg2_samples_m2v.zip

These are just the video streams from the files, extracted with "extract_mpeg" from libmpeg2. I'll try and upload the original versions with audio later tonight. These are all clips of 30 seconds or less.

Code: Select all
Sample                    Resolution   FPS     Avg. Bitrate    libmpeg FPS
                                                    (Mbits/s)
dave_dvbt_30secs.m2v      544x576      25      1.8             46.19
bbc1_dvbt_30secs.m2v      720x576      25      2.9             29.16
bbc1s_dvbs_30secs.m2v     720x576      25      4.2             24.98
centaur_2.m2v             720x480      29.97   6.0             29.70
hst_2.m2v                 720x480      29.97   6.0             26.08


The FPS in the last column is from the latest version of my libmpeg2-based decoder with video output. This is running on a stock Pi running Raspbian with no hardware/configuration changes. HDMI is set to 1920x1080i @ 50Hz.
by bbb » Fri Aug 03, 2012 6:15 pm
My target is to get the Raspberry Pi working as a MythTV frontend. I have done some audio benchmarks with mplayer using MPEG audio layer II (mp2), as used in standard-def DVB-S and DVB-T transmissions. AC3 is used for BBC HD transmissions, and possibly Channel 4 (in the UK anyway ...), along with H.264 for video - I will investigate this further when standard def is working :)

Code: Select all
SD Card Image: 2012-07-15-wheezy-raspbian.zip from raspberrypi.org

Linux raspberrypi 3.1.9+ #168 PREEMPT Sat Jul 14 18:56:31 BST 2012 armv6l GNU/Linux

Jul 14 2012 13:11:40
Copyright (c) 2012 Broadcom
version 325444 (release)

Build instructions, already had all other development libs for mplayer except libmpg123-dev
apt-get source mplayer
apt-get install libmpg123-dev
cd mplayer-1.0~rc4.dfsg1+svn34540
./configure && make mplayer

-----------------------------------------------------------------------------------
- NULL OUTPUT
-----------------------------------------------------------------------------------
./mplayer -benchmark -ao null -ac ffmp2float output-short.mp2
BENCHMARKs: VC:   0.000s VO:   0.000s A:   1.039s Sys:  30.153s =   31.192s
BENCHMARK%: VC:  0.0000% VO:  0.0000% A:  3.3306% Sys: 96.6694% = 100.0000%

./mplayer -benchmark -ao null -ac ffmp2 output-short.mp2
BENCHMARKs: VC:   0.000s VO:   0.000s A:   1.083s Sys:  30.041s =   31.124s
BENCHMARK%: VC:  0.0000% VO:  0.0000% A:  3.4791% Sys: 96.5209% = 100.0000%

./mplayer -benchmark -ao null -ac mpg123 output-short.mp2
BENCHMARKs: VC:   0.000s VO:   0.000s A:   1.156s Sys:  29.960s =   31.116s
BENCHMARK%: VC:  0.0000% VO:  0.0000% A:  3.7145% Sys: 96.2855% = 100.0000%

-----------------------------------------------------------------------------------
- ALSA OUTPUT (to 3.5mm audio jack using snd_bcm2835 module)
-----------------------------------------------------------------------------------
./mplayer -benchmark -ao alsa -ac ffmp2float output-short.mp2
[AO_ALSA] Unable to set hw-parameters: Invalid argument
Failed to initialize audio driver 'alsa'

./mplayer -benchmark -ao alsa -ac ffmp2 output-short.mp2
BENCHMARKs: VC:   0.000s VO:   0.000s A:   1.090s Sys:  28.634s =   29.724s
BENCHMARK%: VC:  0.0000% VO:  0.0000% A:  3.6679% Sys: 96.3321% = 100.0000%

./mplayer -benchmark -ao alsa -ac mpg123 output-short.mp2
BENCHMARKs: VC:   0.000s VO:   0.000s A:   1.164s Sys:  28.572s =   29.737s
BENCHMARK%: VC:  0.0000% VO:  0.0000% A:  3.9152% Sys: 96.0848% = 100.0000%

-----------------------------------------------------------------------------------
- ALSA OUTPUT again with full 30 minute audio track
-----------------------------------------------------------------------------------
 ./mplayer -benchmark -ao alsa -ac ffmp2 output.mp2
BENCHMARKs: VC:   0.000s VO:   0.000s A:  65.781s Sys:1725.153s = 1790.934s
BENCHMARK%: VC:  0.0000% VO:  0.0000% A:  3.6730% Sys: 96.3270% = 100.0000%

./mplayer -benchmark -ao alsa -ac mpg123 output.mp2
BENCHMARKs: VC:   0.000s VO:   0.000s A:  69.468s Sys:1721.471s = 1790.939s
BENCHMARK%: VC:  0.0000% VO:  0.0000% A:  3.8789% Sys: 96.1211% = 100.0000%

Notes and conclusions:-
* The difference between -ao null and -ao alsa is only 0.2%. Not worth investigating further solutions like an openmaxil 'ao' driver.
* The ALSA driver on the Pi does not accept floating-point input.
* The -ac ffmp2 code is a little quicker than -ac mpg123.
* Load is under 4% for an mp2 audio track. This matches linuxstb's results for a 256kb/s mp2 track earlier in the thread - around 3.3%.
* mplayer load is a little higher than standalone mpg123, likely due to the module architecture and virtual-table calling overheads.

The next step I am planning to look at is how to get the mplayer improvements using dispmanx into either mythfrontend or the MythTV plugin for XBMC. The mp2 audio stuff looks to be there, with two viable options (ffmp2 or mpg123).
by jacksonliam » Fri Aug 03, 2012 6:48 pm
linuxstb wrote:
linuxstb wrote:I've just done some tests commenting out my memcpy loops, and the speed increases to 47.94, 25.82 and 26.34. So it would seem worthwhile to see if libmpeg2 can be persuaded to output the data with the alignment required by dispmanx (and setting the mpeg2 output buffers to be contiguous in my app).


This turned out to be relatively straightforward. libmpeg2 has a function (mpeg2_stride) specifically for this purpose. So I've now modified my test program to tell libmpeg2 to write into my own buffers and to format those buffers with the alignment required by dispmanx.

So this means my program no longer needs to copy any data - the output of libmpeg2 can be written directly to the GPU.

New benchmark figures:

544x576 @ 25fps DVB-T sample - 45.82fps
720x576 @ 25fps DVB-S sample - 25.30fps
720x480 @ 29.97fps hst_2.mpg sample - 26.02fps

Latest code attached. Any suggestions for further improvement very welcome...

I'm getting some garbage on the screen with the Hubble sample (and all the others) - running the official Raspbian, no overclock. Here's a screenshot: https://dl.dropbox.com/u/798356/IMG_20120803_191949.jpg

ALL - Here is a spreadsheet I thought would be good to collate our FPS results (someone else mentioned making one earlier).
https://docs.google.com/spreadsheet/ccc ... lViT0J3aHc
by dom » Fri Aug 03, 2012 9:16 pm
bbb wrote:* The ALSA driver on the Pi does not accept floating-point input.

I think it will with the 'plug' plugin defined in asound.conf:
viewtopic.php?f=66&t=7107&start=25#p125513
by linuxstb » Fri Aug 03, 2012 9:36 pm
jacksonliam wrote:I'm getting some garbage on the screen with the Hubble sample (and all the others) - running the official Raspbian, no overclock. Here's a screenshot: https://dl.dropbox.com/u/798356/IMG_20120803_191949.jpg


I assume you mean with my test3.tgz program? Is this with the videos I uploaded or your own? Have you demuxed them with "extract_mpeg" ?
by jacksonliam » Fri Aug 03, 2012 10:13 pm
linuxstb wrote:
jacksonliam wrote:I'm getting some garbage on the screen with the Hubble sample (and all the others) - running the official Raspbian, no overclock. Here's a screenshot: https://dl.dropbox.com/u/798356/IMG_20120803_191949.jpg


I assume you mean with my test3.tgz program? Is this with the videos I uploaded or your own? Have you demuxed them with "extract_mpeg" ?

The Hubble clip - it has no sound, so does it need demuxing? I thought your latest upload was test2?