Would you be prepared to do some experimentation for me?
We now have the V4L2 codecs and resizer merged into the 4.19 kernel branch, and the rpi-update kernel has been updated to 4.19. They use CMA for their memory allocations, so you'll need to add "cma=256M" or similar to the start of /boot/cmdline.txt. (CMA is not the same as gpu_mem: Linux can still hand the memory out to other users, but may move their allocations around in memory should a CMA request arrive that it can't otherwise fulfil.)
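A minimal sketch of making that edit safely. cmdline.txt must stay a single line, so the parameter is prepended rather than added as a new line; the example runs against a scratch copy, so point CMDLINE at /boot/cmdline.txt on the Pi itself (the sample file contents here are just placeholders):

```shell
# Demonstrated on a scratch copy; set CMDLINE=/boot/cmdline.txt on the Pi (needs root).
CMDLINE=$(mktemp)
echo "console=serial0,115200 root=/dev/mmcblk0p2 rootwait" > "$CMDLINE"

# cmdline.txt must remain a single line, so prepend the parameter in place
grep -q 'cma=' "$CMDLINE" || sed -i "1s/^/cma=256M /" "$CMDLINE"

head -1 "$CMDLINE"
```

Running it twice is harmless: the grep guard means the parameter is only added once.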
GStreamer (>1.10, IIRC) supports these via the following elements:
video4linux2: v4l2jpegdec: V4L2 JPEG Decoder
video4linux2: v4l2mpeg4dec: V4L2 MPEG4 Decoder
video4linux2: v4l2mpeg2dec: V4L2 MPEG2 Decoder
video4linux2: v4l2h263dec: V4L2 H263 Decoder
video4linux2: v4l2h264dec: V4L2 H264 Decoder
video4linux2: v4l2vp8dec: V4L2 VP8 Decoder
video4linux2: v4l2h264enc: V4L2 H.264 Encoder
video4linux2: v4l2jpegenc: V4L2 JPEG Encoder
video4linux2: v4l2convert: V4L2 Video Converter
Note that it detects MJPEG as JPEG, even though they are currently distinct formats within V4L2.
Being V4L2 they also support passing dmabufs for zero-copy transfer of the data - this should be a moderate gain over the OMX components, which always copy the images between components.
Transcoding and resizing off the command line, I'm getting about 1.2x realtime (8min13s for a 9min56s clip) for a 1080p decode, resize to 800x480 with v4l2convert, and then re-encode.
```shell
gst-launch-1.0 -e -vvv filesrc location=big_buck_bunny_1080p_h264.mov ! qtdemux name=mux ! queue ! h264parse ! v4l2h264dec capture-io-mode=4 ! v4l2convert output-io-mode=4 capture-io-mode=4 ! video/x-raw,format=I420,width=800,height=480 ! v4l2h264enc output-io-mode=5 ! h264parse ! qtmux ! filesink location=transcode.mov
```
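For reference, the numeric io-mode values are GStreamer's v4l2 buffer modes (2 = mmap, 4 = dmabuf, 5 = dmabuf-import). A sketch of the same pipeline built up element by element so the buffer-sharing choices can be annotated; it only actually runs if gst-launch-1.0 is present (and needs the 4.19 kernel drivers to do anything useful):

```shell
PIPELINE="filesrc location=big_buck_bunny_1080p_h264.mov ! qtdemux ! queue ! h264parse"
# decoder hands its decoded frames out as dmabufs (capture-io-mode=4)
PIPELINE="$PIPELINE ! v4l2h264dec capture-io-mode=4"
# converter uses dmabufs on both its queues, so frames aren't copied in or out
PIPELINE="$PIPELINE ! v4l2convert output-io-mode=4 capture-io-mode=4"
PIPELINE="$PIPELINE ! video/x-raw,format=I420,width=800,height=480"
# encoder imports the converter's dmabufs directly (output-io-mode=5 = dmabuf-import)
PIPELINE="$PIPELINE ! v4l2h264enc output-io-mode=5"
PIPELINE="$PIPELINE ! h264parse ! qtmux ! filesink location=transcode.mov"

if command -v gst-launch-1.0 >/dev/null 2>&1; then
  gst-launch-1.0 -e $PIPELINE
else
  echo "gst-launch-1.0 not found - run this on the Pi"
fi
```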
Any chance you could try dropping this into your pipelines and see what performance you get? In theory it should be faster than the OMX versions due to the reduced copying.
I haven't read this whole thread, but it looks like you're currently having to use the GStreamer deinterlace element. It may be possible to wrap the GPU deinterlace as well, but I'm not sure GStreamer supports that function through V4L2 at present. Interlacing as a whole is something I need to check through - I suspect there are various bits of plumbing missing in the signalling. Always more to do.
Thanks in advance.
Software Engineer at Raspberry Pi Trading. Views expressed are still personal views.
I'm not interested in doing contracts for bespoke functionality - please don't ask.