MPEG-2 Decoding


231 posts   Page 3 of 10
by MartenR » Wed Jun 13, 2012 8:29 pm
Is the vomp protocol light? Because over a mounted SMB share I saw almost no change in framerate!
I think so; it was designed for a 60 MHz PowerPC embedded device.
But I think most of the time is wasted in the demuxer and in filling the decoding buffers. Anyway, these things I will look at after the shader works (at least mocomp).
Posts: 46
Joined: Sat Mar 03, 2012 9:15 am
by jacksonliam » Wed Jun 13, 2012 9:18 pm
MartenR wrote:
I think so; it was designed for a 60 MHz PowerPC embedded device. But I think most of the time is wasted in the demuxer and in filling the decoding buffers.

Should be very light then!

Indeed, I'm sure the buffers take time. In my experience I saw a 20% drop in framerate when writing to the GPU memory.

I've been reading up on it and I think getting iDCT done in GPU is quite optimistic, because each pixel is transformed by a different DCT base determined by its position, rather than an identical operation being done to each pixel. However, if you do iDCT would you also be able to do Inverse Quantization in the GPU too?

There's info and a code example on JPEG iDCT in a shader on page 31 and onward of this presentation: ftp://69.31.121.43/developer/presentati ... Tricks.pdf
But whether it's applicable to MPEG-2 or OpenGL ES (since it uses floats in the shaders), I dunno!
by MartenR » Thu Jun 14, 2012 5:53 am
I've been reading up on it and I think getting iDCT done in GPU is quite optimistic, because each pixel is transformed by a different DCT base determined by its position, rather than an identical operation being done to each pixel. However, if you do iDCT would you also be able to do Inverse Quantization in the GPU too?

Actually, this is really ideal for the GPU: first you separate the 2D IDCT into two 1D IDCTs.
Then you can just put the position-dependent coefficients in a small (I think 8x8) texture and use the shader texture lookup functions. I am also aware of the nvidia implementation, but OpenGL ES does not support multiple render targets, and the RGBA textures have only 8 bits per channel while the DCT coefficients have 12 bits, so I can only store two coefficients in a pixel. The tough part is whether the GPU has enough computing power. (Floats are not a problem; in fact the IDCT in JPEG and MPEG-2 is almost identical.)

In principle inverse quantization can also be done, but for this I would have to modify libavcodec, and I think also passing the quantization matrix to the shader might hit a limit on vertex data attributes, so I think this one should be optimized on the CPU side. But first I will do motion compensation.
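[Editor's note: one possible layout for the "12-bit coefficients in 8-bit RGBA channels" problem mentioned above. This is a sketch, not MartenR's actual packing: each signed coefficient is biased to unsigned 12-bit, its high 8 bits go in a whole channel, and the two low nibbles share a third channel.]

```c
#include <assert.h>
#include <stdint.h>

/* Pack two signed 12-bit DCT coefficients (range -2048..2047) into one
 * RGBA8 texel: R and G carry the high 8 bits of each coefficient,
 * B carries the two low nibbles, A is spare. */
static void pack2(int c0, int c1, uint8_t px[4]) {
    uint16_t u0 = (uint16_t)(c0 + 2048);    /* bias to unsigned 12-bit */
    uint16_t u1 = (uint16_t)(c1 + 2048);
    px[0] = (uint8_t)(u0 >> 4);             /* R: high 8 bits of c0 */
    px[1] = (uint8_t)(u1 >> 4);             /* G: high 8 bits of c1 */
    px[2] = (uint8_t)(((u0 & 0xF) << 4) | (u1 & 0xF)); /* B: low nibbles */
    px[3] = 0;                              /* A: unused */
}

/* Inverse of pack2; in a shader this would be a few texture reads plus
 * multiplies and adds. */
static void unpack2(const uint8_t px[4], int *c0, int *c1) {
    *c0 = (int)(((unsigned)px[0] << 4) | (px[2] >> 4)) - 2048;
    *c1 = (int)(((unsigned)px[1] << 4) | (px[2] & 0xF)) - 2048;
}
```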

Marten
by Paul Webster » Thu Jun 14, 2012 6:03 am
I don't understand much of this but it is fascinating.
I'm ready to pay a few quid to the RPi Foundation for an MPEG-2 GPU acceleration licence - but would happily put the same into a fund to help your efforts should it be needed.

Also - if you want a DVB-T sample .ts from UK Freeview (in addition to the DVB-C one that you have) then let me know.
by jacksonliam » Thu Jun 14, 2012 7:34 am
MartenR wrote:
Then you can just put the position-dependent coefficients in a small (I think 8x8) texture and use the shader texture lookup functions.

Oh of course, I forgot you could use lookup textures in GL ES!
Paul Webster wrote:
I don't understand much of this but it is fascinating.


Would it be useful to post some links to the stuff I've read? I mean, if I posted some, would people read them?
by gigapixel » Fri Jun 15, 2012 5:55 am
I wouldn't mind having a look at some of that material. I doubt I'd be much use with the coding, but I'd be willing to help once my pi arrives, and I just think it's a really interesting idea.
by Paul Webster » Fri Jun 15, 2012 4:44 pm
Putting some links in the wiki to help document this ongoing project could be useful - it might induce others with skills in the area to help out.
My guess is that the right people are more likely to stumble across relevant sections in the wiki as part of their research into what RPi can do rather than hit the right thread in here.
by HenrikL » Mon Jun 18, 2012 6:30 am
Great to see that dedicated people are working on getting MPEG-2 video playing on the Pi. It will be much appreciated if you succeed.

jacksonliam wrote:Here are some benchmarks of the Pi decoding MPEG-2 with libavcodec. This is a really simple player I wrote that just grabs the video stream and decodes a frame with avcodec_decode_video. It doesn't do anything with the frame once decoded and doesn't do anything with the audio. I've tried over the network and locally - no difference seen in the numbers there.

Are the benchmarks you posted from code compiled with hardfloat? I have noticed that MPEG-2 decoding using both ffmpeg and libmpeg2 was a lot faster using mplayer under Raspbian (http://www.raspbian.org/), which is basically Debian compiled with hardfloat. Decoding 250 frames of SD MPEG-2 video with mplayer took 17 s (vc=ffmpeg2) under Debian but only 8.5 s under Raspbian (with no video or audio output).
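[Editor's note: to put those timings in perspective, a trivial arithmetic check, added for illustration. 25 fps is real time for PAL SD material.]

```c
#include <assert.h>

/* Frames decoded per wall-clock second. */
static double fps(double frames, double seconds) {
    return frames / seconds;
}

/* HenrikL's numbers: 250 frames in 17 s under Debian (softfloat) is
 * about 14.7 fps, while 250 frames in 8.5 s under Raspbian (hardfloat)
 * is about 29.4 fps - only the hardfloat build beats the 25 fps
 * real-time rate of PAL SD material. */
```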
by MartenR » Mon Jun 18, 2012 8:18 am
I guess it is the newer version of libav in Raspbian, with more optimized ARM asm code.

Marten
by jacksonliam » Mon Jun 18, 2012 8:25 am
HenrikL wrote:
Are the benchmarks you posted from code compiled with hardfloat?
Nope, those numbers are standard Debian; I will look at taking some hardfloat benchmarks!
by HenrikL » Mon Jun 18, 2012 9:44 am
MartenR wrote:
I guess it is the newer version of libav in Raspbian, with more optimized ARM asm code.


The improvement could of course be due to more optimized ARM asm code in the newer libav version, but I would put my money on better utilization of the VFP hardware. There is an old benchmark at http://pastebin.com/2NZqH2yY that shows a 10x speed increase for floating-point operations. Wouldn't this do wonders for decoding media?

Also, decoding using libmpeg2 improved as well: a reduction from 14 s to 9 s for the same 250-frame video sample.
by MartenR » Mon Jun 18, 2012 5:37 pm
Wouldn't this do wonders for decoding media?

Not really, because the compute-intensive parts (motion compensation, IDCT and quantization) are integer arithmetic or memory copying. (They can of course also be implemented with floats, as in the SSE case, but the standard implementations are integer.)
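[Editor's note: the integer arithmetic in question can be seen in MPEG-2 half-pel motion compensation, which boils down to biased integer averaging. An illustrative sketch; `mc_row_halfpel_h` is a made-up helper, not vomp code.]

```c
#include <assert.h>
#include <stdint.h>

/* MPEG-2 style half-pel prediction: all integer, rounding away from
 * truncation.  One fractional dimension averages two reference pixels,
 * two fractional dimensions average four. */
static uint8_t halfpel2(uint8_t a, uint8_t b) {
    return (uint8_t)((a + b + 1) >> 1);
}

static uint8_t halfpel4(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
    return (uint8_t)((a + b + c + d + 2) >> 2);
}

/* Motion-compensate one row of w pixels from a reference row with a
 * half-pel horizontal motion vector (ref must hold w + 1 pixels). */
static void mc_row_halfpel_h(const uint8_t *ref, uint8_t *dst, int w) {
    for (int x = 0; x < w; x++)
        dst[x] = halfpel2(ref[x], ref[x + 1]);
}
```

Nothing here needs floating point, which is why a faster VFP on its own does not speed this stage up.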
by Quando » Mon Jun 25, 2012 10:57 am
MartenR wrote:
The compute-intensive parts (motion compensation, IDCT and quantization) are integer arithmetic or memory copying.


All the standard IDCT implementations seem to be integer based, but I think that is due to the age of the initial algorithms and their original target of embedded hardware. These days an FP version can be significantly faster if the CPU has even a small vector float unit, and a lot of chips have better pipelining of FP work than integer, which can help. I once spent far too long optimising a JPEG software decode pipeline to work around some broken hardware; it got to the point that even taking out some of the shortcuts made it faster, as without the branches the loops could be unrolled further and the calculations were so cheap. There are savings to be made in the decompression of the data stream too, but it can end up at the point of one big function doing the whole decode to avoid having to pass data around, which gets pretty nasty for bug finding and maintenance.

The GPU has to be the way to go for it, though - but as mentioned above, squeezing the input data into the right format is going to be fun (in that weird coding way...).

Is anyone's work on this in a form that can be shared yet? I've had some small experience with moving code to GPU shaders as well and would like to help with this sort of optimisation work.
by MartenR » Mon Jun 25, 2012 11:59 am
@Quando
You can look at git.vomp.tv, in the vompclient for raspberry repository, on the xvmc branch (not in master yet), to see my efforts towards mocomp. I commit every weekend.

Marten
by Quando » Tue Jun 26, 2012 1:45 am
Hi,

I got the code from git.vomp.tv
After struggling for a while and failing to build the full vomp cross-compile toolchain, I tried just vompclient-raspi; that made some progress but gets stuck on some errors in mediafile.cc:


pi@raspberrypi ~/vomp/vompclient-raspi $ git pull
Already up-to-date.
pi@raspberrypi ~/vomp/vompclient-raspi $ git status
# On branch xvmc
nothing to commit (working directory clean)
pi@raspberrypi ~/vomp/vompclient-raspi $ make
raspberry normal compiler
Setting up objects
Raspberry pi flags
g++ -g -O0 -Wall -Wshadow -DDEV -D_GNU_SOURCE -DVOMP_PLATTFORM_RASPBERRY -I/opt/vc/include -D__STDC_CONSTANT_MACROS -c -o mediafile.o mediafile.cc
mediafile.cc: In member function ‘virtual MediaList* MediaFile::getMediaList(const MediaURI*)’:
mediafile.cc:153:21: error: expected primary-expression before ‘struct’
mediafile.cc:153:36: error: ‘d_name’ was not declared in this scope
mediafile.cc:153:42: error: ‘offsetof’ was not declared in this scope
mediafile.cc:153:58: error: array bound is not an integer constant before ‘]’ token
make: *** [mediafile.o] Error 1


I'm on the 'wheezy' beta, if that might be causing problems with a mismatch of tool versions.

The code that is throwing the error is:

char b[offsetof(struct dirent, d_name) + NAME_MAX + 1];

and it looks like the 'offsetof' is causing the problem. Temporarily hacking that line out leads to:

In file included from /opt/vc/include/interface/vcos/vcos_assert.h:140:0,
from /opt/vc/include/interface/vcos/vcos.h:105,
from /opt/vc/include/interface/vmcs_host/vc_dispmanx.h:25,
from /opt/vc/include/bcm_host.h:39,
from osdopengl.h:26,
from main.cc:71:
/opt/vc/include/interface/vcos/vcos_types.h:28:33: fatal error: vcos_platform_types.h: No such file or directory
compilation terminated.
make: *** [main.o] Error 1

which looks to be related to https://github.com/raspberrypi/firmware/issues/34 which seems to be an include path problem in the build script.

Have you had either of these issues? Or do they seem related to my RPi setup?
by Quando » Tue Jun 26, 2012 2:34 am
Hi,

I fixed the vcos_platform_types.h by adding -I/opt/vc/include/interface/vcos/pthreads to the RPI specific INCLUDES line in GNUMakefile.

Some more compilation (and more dependencies to install: libavcodec-dev, libxvmc-dev, libavformat-dev in case anyone else follows this way) and I've hit a missing file:

g++ -g -O0 -Wall -Wshadow -DDEV -D_GNU_SOURCE -DVOMP_PLATTFORM_RASPBERRY   -I/opt/vc/include -I/opt/vc/include/interface/vcos/pthreads -D__STDC_CONSTANT_MACROS   -c -o main.o main.cc
In file included from main.cc:71:0:
osdopengl.h:41:28: fatal error: glyuv444shader.h: No such file or directory
compilation terminated.

And given the name of the file I'm guessing it might be a new one you've added but not committed yet?
by MartenR » Tue Jun 26, 2012 5:42 am
I am still on Debian squeeze, so the issues you had are caused by the newer firmware and newer toolchain. I will add the missing files in the next few minutes (sorry for that).
Be aware that you need a VDR server with the vompserver plugin in order to use the client.
After I have finished the shader code for MPEG-2, someone will need to extract the code to make it available for other applications.

Marten

P.S.: The vomp toolchain is for the MediaMVP, not the Raspberry Pi!
And something else: in the OSD object for OpenGL there is at the moment a MILLISLEEP(1500), so a frame is displayed only every 1.5 s; I need this for debugging at the moment.
by Quando » Tue Jun 26, 2012 10:08 am
Hi Marten,

Thanks for the missing files. With those, one more package (libjpeg-dev), and putting mediafile.o back into the makefile, I can get it to link vompclient on my RPi (line 153 of mediafile.cc still needs hacking to remove the 'offsetof').

I don't have access to the necessary server to make use of vompclient, so I'll take a look at the GPU work and see if I can start to make it work in another MPEG player. Unfortunately mplayer won't compile on my system - or wouldn't; I'm trying again, but it takes a while to build to the fail point...

Anyone know of a very simple MPEG player that could be used as the basis for swapping in a GPU decompressor rather than a CPU one?

Richard
by jacksonliam » Tue Jun 26, 2012 10:59 am
Quando wrote:
Anyone know of a very simple MPEG player that could be used as the basis for swapping in a GPU decompressor rather than a CPU one?

I think Marten's stuff uses libavcodec, which is really easy to get playing an MPEG-2 file. I built a command-line player in an hour or two, learning from scratch.
by Quando » Tue Jun 26, 2012 11:04 am
jacksonliam wrote:
I think Marten's stuff uses libavcodec, which is really easy to get playing an MPEG-2 file.


Cheers - I'll take a look at some docs for that.

For what it's worth, this is the compile error I continue to get on mplayer:

cc -MD -MP -Wundef -Wall -Wno-switch -Wno-parentheses -Wpointer-arith -Wredundant-decls -Wstrict-prototypes -Wmissing-prototypes -Wdisabled-optimization -Wno-pointer-sign -Wdeclaration-after-statement -std=gnu99 -Werror-implicit-function-declaration -O4   -pipe -ffast-math -fomit-frame-pointer -fno-tree-vectorize -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE -Ilibdvdread4 -I. -Iffmpeg  -marm -D_REENTRANT  -I/usr/include/freetype2 -c -o libmpeg2/motion_comp_arm_s.o libmpeg2/motion_comp_arm_s.S
libmpeg2/motion_comp_arm_s.S: Assembler messages:
libmpeg2/motion_comp_arm_s.S:29: Error: selected processor does not support ARM mode `pld [r1]'
libmpeg2/motion_comp_arm_s.S:39: Error: selected processor does not support ARM mode `pld [r1]'
libmpeg2/motion_comp_arm_s.S:65: Error: selected processor does not support ARM mode `pld [r1]'
libmpeg2/motion_comp_arm_s.S:70: Error: selected processor does not support ARM mode `pld [r1]'

(and that error repeats for another dozen or so lines).

There is an old thread here: http://lists.mplayerhq.hu/pipermail/mplayer-dev-eng/2008-October/058726.html that talks about the same issue - it looks to be a configure problem in libmpeg2 that isn't detecting whether the PLD instruction is supported, or source code that doesn't check for it. Could probably just rip those instructions out (they are just pre-loads) but...
by bbb » Tue Jun 26, 2012 9:47 pm
Quando wrote:
libmpeg2/motion_comp_arm_s.S:29: Error: selected processor does not support ARM mode `pld [r1]'
(and that error repeats for another dozen or so lines).


Hmm, I thought the Raspberry Pi would support PLD, being an ARM11... are you sure you're running the GAS assembler with the correct CPU parameter?

Anyway, the code should work without them; they are just for triggering pre-loading of data from RAM into cache to increase memory throughput. If I get a chance (unlikely till the weekend... so busy at the moment) I will check the specifications to determine whether PLD is actually supported on the Pi.
by AndrewS » Wed Jun 27, 2012 2:33 am
bbb wrote:If I get a chance (unlikely till the weekend .. so busy at the moment) I will check the specifications to determine if PLD is actually supported on the Pi.

I had a quick search of the ARM datasheet linked to from http://elinux.org/RPi_Hardware and it says:
"ARM1176JZF-S processors support PLD"
and
"In the ARM1176JZF-S processor, in Non-secure state, the PLD instruction has no effect
on the memory system so it behaves like a NOP. In Secure state, this instruction behaves
as a cache preload instruction as implemented in ARM1136JF-S processor." (which I'm afraid I don't understand! Can somebody decipher? ;) )
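[Editor's note: on the question of ripping the PLD instructions out of C code paths, GCC's `__builtin_prefetch` is a portable way to keep the hint: it emits PLD on targets where the instruction is usable and compiles to nothing elsewhere, so the build never breaks. An illustrative sketch, not mplayer code.]

```c
#include <assert.h>

/* Sum an array while hinting the cache to fetch ahead.  The second
 * argument (0) means "read", the third (1) means low temporal locality.
 * Prefetching an address past the end of the array is safe: it is only
 * a hint and never faults. */
static long sum_with_prefetch(const long *a, int n) {
    long s = 0;
    for (int i = 0; i < n; i++) {
        __builtin_prefetch(&a[i + 16], 0, 1);
        s += a[i];
    }
    return s;
}
```

On an ARM1176 in Non-secure state the emitted PLD behaves like a NOP anyway (per the datasheet quote above), so the result is unchanged either way.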
by MartenR » Wed Jun 27, 2012 6:27 am
Please check if the offsetof error goes away by adding
#include <stdio.h>
#include <stddef.h>

as includes.
Actually, I wouldn't use mplayer for your first steps; you should do as Liam suggested and use libavcodec, it is pretty easy. Btw. the decoding-related stuff is all in glmocoshader.c/h, and the setting up of OpenGL and libavcodec can be found in videovpeogl and osdopengl. The goal would be to write an interface around glmocoshader for other programs.
(At the moment only I-frame display is working.)

Marten
by Quando » Wed Jun 27, 2012 10:47 am
@marten: Those includes fixed the problem with offsetof. Thanks for the pointers to where the GPU code is; I'll start looking at that once I can roll a simple player: mplayer definitely looks a bit heavyweight to start on.

@bbb: I had changed no settings after downloading the code, and got the same error on libmpeg2 as well. Removing the PLD instructions lets the code compile, and the built mplayer plays videos properly (very slowly, but correct output). The built mpeg2dec after the same change doesn't produce correct output.

libmpeg2 is the easier one on which to repeat the build error, as it compiles much quicker: I got the code with "svn co svn://svn.videolan.org/libmpeg2/trunk" and then just went in and built it with:

./bootstrap
./configure
make

It could, of course, be a problem with the wheezy beta OS I'm using, but it seems a bit odd for the compiler/assembler to be misconfigured - maybe libmpeg2 is just an ancient package that hasn't had much ARM work done for a while and has suffered some bitrot? Delving into the config is not something I'm qualified to do, but I'm happy to have a go at it if someone can point me at what is needed.

I've dropped a ZIP file of the build folder, as it has got to the failure-to-assemble point, at https://dl.dropbox.com/u/14599585/libmpeg2-test.zip if it is of any use to anyone.

Richard
by JonathanGraham » Thu Jul 05, 2012 6:08 am
As a quick test, just to see if we were within reach of DVD-quality MPEG-2, I compiled the aging libmpeg2. With a null video driver I get 23-30 fps. With the X11 driver that drops down to 10 fps.

I haven't delved much into the OpenMAX stuff, but with the examples in /opt and the rather simple structure of libvo, I think I could hammer out an MPEG-2 decoder using the overlay function in a day. At that point we would know if this is even worth pursuing. If, without hw decoding, there's no reasonably low-latency way to get the frame to the GPU, then we are toast.

However, if there *is*, then this might be within reach. While libmpeg2 boasts some ARM code, it's all old stuff, and given that forcing the C implementation costs no frame rate at all, I suspect the ARM code isn't even getting run on the Pi. Which means there's some opportunity for getting better performance.