horai
Posts: 43
Joined: Fri Apr 21, 2017 2:45 pm

v4l2h264dec instead of omxh264dec

Sat May 11, 2019 4:01 pm

Dear all,

would you mind helping me resolve my issue?

I am running GStreamer 1.14.4 as well as 1.16 on Raspbian, upgraded to kernel 4.19.*.
I would like to replace omxh264dec with v4l2h264dec. My original pipeline is this:
gst-launch-1.0 rtspsrc location="rtsp://10.0.0.3:8555/test" latency=0 ! rtph264depay ! h264parse ! omxh264dec ! videoconvert ! autovideosink

I wanted to replace it with this:
gst-launch-1.0 rtspsrc location="rtsp://10.0.0.3:8555/test" latency=0 ! rtph264depay ! h264parse ! v4l2h264dec capture-io-mode=4 ! autovideosink

But I am not able to get this pipeline working properly; it only starts at all when capture-io-mode=4 is set.
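For reference, the accepted capture-io-mode values can be listed with gst-inspect-1.0 (a quick check, assuming the element is installed):

Code:

# List the accepted values of capture-io-mode. In GStreamer's GstV4l2IOMode
# enum, 0 = auto, 2 = mmap, 4 = dmabuf, 5 = dmabuf-import.
gst-inspect-1.0 v4l2h264dec | grep -A 8 "capture-io-mode"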

Thank you,
Ivo

6by9
Raspberry Pi Engineer & Forum Moderator
Posts: 6707
Joined: Wed Dec 04, 2013 11:27 am
Location: ZZ9 Plural Z Alpha, aka just outside Cambridge.

Re: v4l2h264dec instead of omxh264dec

Mon May 13, 2019 1:28 pm

Please always post a test case that others can run. "rtspsrc location="rtsp://10.0.0.3:8555/test"" is going to fail for everyone except you, as it tries to stream from a network source you've given no details of.

Use of autovideosink is leaving a lot of unknowns as to how to render the video. It could use kmssink, fbdevsink, glimagesink, or potentially other things.

Which graphics stack are you using? Legacy, fake-kms, or full-kms? With full KMS you can certainly use kmssink for direct rendering of the YUV output.

Code:

gst-launch-1.0 -e filesrc location=big_buck_bunny_720p_h264.mov ! qtdemux ! queue ! h264parse ! v4l2h264dec capture-io-mode=4 ! kmssink

fbdevsink will always give dubious performance as it will be manually blitting into the frame buffer, and requires a conversion to RGB in order to do that.

omxh264dec within Raspbian has some odd hacks using resize and egl_render so that it can export GL objects, but that only works within the legacy GLES stack.
Software Engineer at Raspberry Pi Trading. Views expressed are still personal views.
I'm not interested in doing contracts for bespoke functionality - please don't ask.

horai
Posts: 43
Joined: Fri Apr 21, 2017 2:45 pm

Re: v4l2h264dec instead of omxh264dec

Wed May 15, 2019 6:58 pm

Dear sir,

thank you very much for your help. I have a Raspberry Pi 3B+, X11 (no window manager), GStreamer 1.16, and the KMS driver.
Based on your recommendation, I played test files this way:
gst-launch-1.0 -e filesrc location=/home/pi/jellyfish-15-mbps-hd-h264.mkv ! matroskademux ! h264parse ! v4l2h264dec capture-io-mode=4 ! kmssink
It works very well, though only from the console; the speed is amazing.

My problem is that kmssink cannot be used under X11 (as far as I know), so I am enclosing the following pipelines (both with a hardware-accelerated sink), with comments. These are the ones I would like to encapsulate into a GTK window, so the video renders inside it (GstOverlay):
1) gst-launch-1.0 -e filesrc location=/home/pi/jellyfish-3-mbps-hd-h264.mkv ! matroskademux ! h264parse ! v4l2h264dec capture-io-mode=4 ! clutterautovideosink
ad 1) This pipeline works but is very slow compared to kmssink (I understand X11 adds some overhead, but can it really be this serious?).
Anyway, glimagesink is generally recommended, so I also tried:
2) export GST_GL_API=opengl
gst-launch-1.0 -e filesrc location=/home/pi/jellyfish-3-mbps-hd-h264.mkv ! matroskademux ! h264parse ! v4l2h264dec capture-io-mode=4 ! glupload ! glimagesink
ad 2) This pipeline shows a single frame of the desired video and then hangs, printing this output:
Setting pipeline to PAUSED ...
0:00:03.869568671 817 0x18a6460 WARN basesrc gstbasesrc.c:3600:gst_base_src_start_complete:<filesrc0> pad not activated yet
Pipeline is PREROLLING ...
Got context from element 'sink': gst.gl.GLDisplay=context, gst.gl.GLDisplay=(GstGLDisplay)"\(GstGLDisplayX11\)\ gldisplayx11-0";
0:00:04.099781284 817 0x18816f0 WARN v4l2 gstv4l2object.c:4194:gst_v4l2_object_probe_caps:<v4l2h264dec0:src> Failed to probe pixel aspect ratio with VIDIOC_CROPCAP: Invalid argument
0:00:04.144984350 817 0x18816f0 WARN v4l2videodec gstv4l2videodec.c:810:gst_v4l2_video_dec_decide_allocation:<v4l2h264dec0> Duration invalid, not setting latency
0:00:04.184819802 817 0x6e73cac0 WARN v4l2bufferpool gstv4l2bufferpool.c:1263:gst_v4l2_buffer_pool_dqbuf:<v4l2h264dec0:pool:src> Driver should never set v4l2_buffer.field to ANY
0:00:04.270274157 817 0x6e73cac0 WARN dmabuf gstdmabuf.c:93:gst_dmabuf_mem_unmap:<dmabufallocator2> Using DMABuf without synchronization.
0:00:04.270424732 817 0x6e73cac0 WARN dmabuf gstdmabuf.c:93:gst_dmabuf_mem_unmap:<dmabufallocator2> Using DMABuf without synchronization.
0:00:04.270458014 817 0x6e73cac0 WARN dmabuf gstdmabuf.c:93:gst_dmabuf_mem_unmap:<dmabufallocator2> Using DMABuf without synchronization.
0:00:04.297981763 817 0x6e73cac0 WARN dmabuf gstdmabuf.c:93:gst_dmabuf_mem_unmap:<dmabufallocator2> Using DMABuf without synchronization.
0:00:04.298045150 817 0x6e73cac0 WARN dmabuf gstdmabuf.c:93:gst_dmabuf_mem_unmap:<dmabufallocator2> Using DMABuf without synchronization.
0:00:04.298073119 817 0x6e73cac0 WARN dmabuf gstdmabuf.c:93:gst_dmabuf_mem_unmap:<dmabufallocator2> Using DMABuf without synchronization.
0:00:05.932123123 817 0x6e73cac0 WARN dmabuf gstdmabuf.c:93:gst_dmabuf_mem_unmap:<dmabufallocator2> Using DMABuf without synchronization.
0:00:05.932190884 817 0x6e73cac0 WARN dmabuf gstdmabuf.c:93:gst_dmabuf_mem_unmap:<dmabufallocator2> Using DMABuf without synchronization.
0:00:05.932218853 817 0x6e73cac0 WARN dmabuf gstdmabuf.c:93:gst_dmabuf_mem_unmap:<dmabufallocator2> Using DMABuf without synchronization.
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...

Actually, my final goal is to run the following pipeline, which renders an RTSP stream:
sudo gst-launch-1.0 rtspsrc location="rtsp://10.0.0.2:8555/test" latency=200 ! rtph264depay ! h264parse ! v4l2h264dec capture-io-mode=4 ! videoconvert ! clutterautovideosink
I have to run it as superuser, otherwise I face issues with the size of a buffer. Even then, this pipeline shows no video image at all (neither with cluttersink under X11 nor with kmssink outside X11), just this text output:
Progress: (request) Sent PLAY request
0:00:01.178897447 1331 0x73e0e430 FIXME rtpjitterbuffer gstrtpjitterbuffer.c:1551:gst_jitter_buffer_sink_parse_caps:<rtpjitterbuffer0> Unsupported timestamp reference clock
0:00:01.179056041 1331 0x73e0e430 FIXME rtpjitterbuffer gstrtpjitterbuffer.c:1559:gst_jitter_buffer_sink_parse_caps:<rtpjitterbuffer0> Unsupported media clock
0:00:01.183729166 1331 0x703019b0 FIXME basesink gstbasesink.c:3248:gst_base_sink_default_event:<cluttergstvideosink0> stream-start event without group-id. Consider implementing group-id handling in the upstream elements
0:00:01.247296510 1331 0x703019b0 WARN v4l2 gstv4l2object.c:4194:gst_v4l2_object_probe_caps:<v4l2h264dec0:src> Failed to probe pixel aspect ratio with VIDIOC_CROPCAP: Invalid argument
Caught SIGSEGV
#0 0x76b75120 in poll () at ../sysdeps/unix/syscall-template.S:84
#1 0x76c89358 in ?? () from /lib/arm-linux-gnueabihf/libglib-2.0.so.0
0:00:12.853157860 1331 0x73e0e430 WARN rtpjitterbuffer rtpjitterbuffer.c:570:calculate_skew: delta - skew: 0:00:11.140496109 too big, reset skew
0:00:12.868475620 1331 0x73e0e430 WARN rtpjitterbuffer rtpjitterbuffer.c:570:calculate_skew: delta - skew: 0:00:10.961291352 too big, reset skew
Spinning. Please run 'gdb gst-launch-1.0 1331' to continue debugging, Ctrl-C to quit, or Ctrl-\ to dump core.
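On the buffer-size issue that forces sudo: my guess (an assumption, not confirmed by the log above) is that rtspsrc's internal udpsrc requests a socket receive buffer larger than the unprivileged kernel cap. Raising the cap should remove the need for superuser mode:

Code:

# Assumed cause: udpsrc asking for a receive buffer above net.core.rmem_max.
# Raise the cap (value in bytes) and retry without sudo:
sudo sysctl -w net.core.rmem_max=8388608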


Do you have any ideas or recommendations for a reasonable sink to use under X11?

Thank you very much
Best regards,
Ivo

6by9
Raspberry Pi Engineer & Forum Moderator
Posts: 6707
Joined: Wed Dec 04, 2013 11:27 am
Location: ZZ9 Plural Z Alpha, aka just outside Cambridge.

Re: v4l2h264dec instead of omxh264dec

Thu May 16, 2019 10:22 am

GL rendering is hideously inefficient compared to kmssink.

The Pi has very efficient composition hardware (the Hardware Video Scaler) for taking any number of YUV and RGB formats and resizing/composing them into an output frame. This is what kmssink is talking to.

For GL composition you are converting the YUV image into an RGB texture so it can be passed to the 3D hardware, which is then told to render it onto a flat surface. When you consider that every YUV frame at 1080p is ~3MB and an RGBA frame is ~8MB, at 30fps (60fps for rendering) this becomes a large hit on SDRAM bandwidth.
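Rough arithmetic, assuming 4:2:0 YUV and 32-bit RGBA: 1920 x 1080 x 1.5 bytes ≈ 3.1MB per YUV frame, and 1920 x 1080 x 4 bytes ≈ 8.3MB per RGBA frame. Converting at 30fps and compositing at 60fps means reading and writing hundreds of MB/s of pixel data before the 3D block has done any useful work.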
The support for YUV within Mesa and GL is a relatively recent addition - if you're using standard Raspbian Stretch libraries then it's probably horribly out of date.

You can get a gain by reducing the overhead of the texture conversion. GStreamer's v4l2videoconvert component can use the ISP hardware to do the YUV to RGB conversion, and downscale the image. Unfortunately it can't write out to the tile format that the 3D hardware wants, so there is still a significant overhead there. If you haven't got the absolute latest GStreamer with https://github.com/GStreamer/gst-plugin ... 4c7b3d7994 then you'll need to set the "disable_bayer=1" kernel module parameter on bcm2835-codec to get GStreamer to acknowledge that the driver does match v4l2videoconvert.
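For reference, a sketch of setting that module parameter (the modprobe.d file name below is arbitrary):

Code:

# One-off: reload the codec driver with the parameter set.
sudo rmmod bcm2835_codec
sudo modprobe bcm2835_codec disable_bayer=1

# Persistent across reboots:
echo "options bcm2835_codec disable_bayer=1" | sudo tee /etc/modprobe.d/bcm2835-codec.conf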

Code:

gst-launch-1.0 -vv -e filesrc location=foo.mkv ! matroskademux ! h264parse ! v4l2h264dec capture-io-mode=4 ! v4l2videoconvert output-io-mode=5 capture-io-mode=4 ! video/x-raw,format=BGRA,width=640,height=360 ! glimagesink

Reducing to 640x360 gives barely usable playback. 1080p renders a few frames.

The majority of effort in getting VLC and Chromium to play reasonably has been down to minimising the amount of work that X11 has to do. A custom pipeline is created within the app to resize to exactly the right size for the window, and the right colour format, and then X only has to do a blit into the frame buffer. Even this starts straining some parts of the system.
Software Engineer at Raspberry Pi Trading. Views expressed are still personal views.
I'm not interested in doing contracts for bespoke functionality - please don't ask.

horai
Posts: 43
Joined: Fri Apr 21, 2017 2:45 pm

Re: v4l2h264dec instead of omxh264dec

Thu May 16, 2019 5:02 pm

Actually, I have the latest GStreamer 1.16, with the aforementioned commit.
I don't have a v4l2videoconvert element, but I do have a v4l2convert element with the same caps and properties, so I hope "disable_bayer=1" is not necessary.
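A quick way to double-check which elements the build registered (the plugin is named video4linux2; the converter may appear as v4l2convert or v4l2videoconvert depending on the version):

Code:

# List every element registered by GStreamer's v4l2 plugin.
gst-inspect-1.0 video4linux2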
Following the advice in this thread:
http://gstreamer-devel.966125.n4.nabble ... 90637.html
I realized that I am missing DMA-buf support in my GStreamer build, which (if I understand correctly) is what enables zero-copy into glimagesink. I see this in the ./autogen.sh output:
checking for mmap... yes
checking linux/dma-buf.h usability... no
checking linux/dma-buf.h presence... no
checking for linux/dma-buf.h... no

Therefore I downloaded the sources via Notro's rpi-source, and now I have the entire kernel source in /home/pi/linux-2f5a6b906ad86ef6570863a75b204551c2c62fec/.
I tried to include the headers via:
export CPPFLAGS="-I/home/pi/linux-2f5a6b906ad86ef6570863a75b204551c2c62fec/include/linux -I/home/pi/linux-2f5a6b906ad86ef6570863a75b204551c2c62fec/include/uapi/linux"
./autogen.sh

But I still see:
checking for mmap... yes
checking linux/dma-buf.h usability... no
checking linux/dma-buf.h presence... no
checking for linux/dma-buf.h... no
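A likely reason, judging from the -I paths alone: they point inside the include/linux and include/uapi/linux directories, so the configure test for <linux/dma-buf.h> effectively looks for linux/linux/dma-buf.h. A sketch of one way that should work, using the kernel's own header-export target:

Code:

# Export the sanitised uapi headers, then point configure at the directory
# that *contains* the linux/ subdirectory:
cd /home/pi/linux-2f5a6b906ad86ef6570863a75b204551c2c62fec
make headers_install INSTALL_HDR_PATH=/home/pi/kernel-headers
export CPPFLAGS="-I/home/pi/kernel-headers/include"
./autogen.sh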

horai
Posts: 43
Joined: Fri Apr 21, 2017 2:45 pm

Re: v4l2h264dec instead of omxh264dec

Sat May 18, 2019 11:55 am

I managed to fix the dma-buf.h issue. Just to note, I am using a custom build of Mesa 19.02.
Anyway this pipeline:
gst-launch-1.0 -vv -e filesrc location=foo.mkv ! matroskademux ! h264parse ! v4l2h264dec capture-io-mode=4 ! v4l2convert output-io-mode=5 capture-io-mode=4 ! video/x-raw,format=BGRA,width=640,height=360 ! glimagesink

is much slower than this:
gst-launch-1.0 -vv -e filesrc location=foo.mkv ! matroskademux ! h264parse ! v4l2h264dec capture-io-mode=4 ! glimagesink

So I am probably missing something and don't see the benefit.
So far I get by far the best results with omxh264dec and cluttersink.

Anyway, is there any tutorial or guide on how to set up the V4L2 codecs in Raspbian? So far I have been following this tutorial, which is based on Gentoo:
https://github.com/sakaki-/gentoo-on-rp ... era-Module
But unfortunately it still consumes 95% of the CPU; I would like someone to show me that it is really hardware accelerated.
Again, I am probably missing something.


I would be very grateful if someone proves that the V4L2 codecs outperform omxh264dec. Actually, I would be happy if someone just showed me how to run the V4L2 codecs properly, as there is very little information about them.
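One way to sanity-check that the hardware block, rather than the ARM, is doing the decode is to take the renderer out of the equation and just measure decode throughput. A sketch (file name as used earlier in the thread):

Code:

# Decode to a fakesink and let fpsdisplaysink report the achieved frame rate;
# a high fps together with low CPU usage in top strongly suggests the decode
# is happening in hardware.
gst-launch-1.0 -v filesrc location=/home/pi/jellyfish-15-mbps-hd-h264.mkv ! \
  matroskademux ! h264parse ! v4l2h264dec capture-io-mode=4 ! \
  fpsdisplaysink video-sink=fakesink text-overlay=false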

6by9
Raspberry Pi Engineer & Forum Moderator
Posts: 6707
Joined: Wed Dec 04, 2013 11:27 am
Location: ZZ9 Plural Z Alpha, aka just outside Cambridge.

Re: v4l2h264dec instead of omxh264dec

Thu May 23, 2019 10:20 am

As with many things in Linux, there are multiple ways to achieve things, some better, some worse.

omxh264dec is sitting on top of OpenMAX IL. What I can say for definite is that it copies all the pixels of your image from GPU memory to ARM memory, and it can produce I420, NV12, or RGB565 output.
v4l2h264dec avoids that copy stage. If the pipeline can handle dmabufs then it should use those and avoid copies down the pipe too. It can produce I420, YV12, NV12, NV21, or RGB565 output.

Clutter appears to be some Gnome extension. I haven't the time to go investigating it and how it handles anything.

The V4L2 codecs follow the V4L2 M2M interface docs. There's not much more to say about them.
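For anyone wanting to poke at them directly, the decoder's M2M device can be inspected with v4l2-ctl (device node assumed to be /dev/video10, its usual position in Raspbian):

Code:

# For an M2M decoder, the OUTPUT side accepts the coded stream and the
# CAPTURE side produces raw frames.
v4l2-ctl -d /dev/video10 --list-formats-out   # coded formats (H264 etc.)
v4l2-ctl -d /dev/video10 --list-formats       # raw output formats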
Software Engineer at Raspberry Pi Trading. Views expressed are still personal views.
I'm not interested in doing contracts for bespoke functionality - please don't ask.
