mpr
Posts: 22
Joined: Wed Mar 27, 2013 6:56 pm

is real-time decoding possible?

Thu Apr 18, 2013 8:07 pm

I've done some extensive testing with the decoder on the Pi, and have yet to be able to construct a real-time decoder pipeline. I'm not saying the decoder is slow (it decodes perfectly smoothly), but rather that there is some sort of delay (which varies slightly based on the type of stream I send) that is always present.

On this page:

http://home.nouwen.name/RaspberryPi/doc ... ecode.html

It says:
Incoming frames will be buffered to protect against file system read latency when playing back, and protect against the media stream interleaving in the file format. Typically multiple input frames have to be provided before any output frames are produced.
So my question to the forum: is realtime decoding possible on the Broadcom? If so, what type of pipeline would allow for that?

Note: I've been able to get real-time decoding in software on my i5, with numerous stream types, with no problem. I've tried many configuration structs and pipeline styles with OMX, as well as making modifications to my source streams, but I'm wondering if the delay is actually fixed into the hardware?

mpr
Posts: 22
Joined: Wed Mar 27, 2013 6:56 pm

Re: is real-time decoding possible?

Thu Apr 18, 2013 8:08 pm

I forgot to note, this is h264 video decoding.

OtherCrashOverride
Posts: 582
Joined: Sat Feb 02, 2013 3:25 am

Re: is real-time decoding possible?

Thu Apr 18, 2013 9:30 pm

So my question to the forum: is realtime decoding possible on the Broadcom?
From my tests with 720p h264, the Pi is capable of better than real-time decoding: It can decode video faster than the presentation interval.

Perhaps your question is more aimed at latency (the time between when data is sent and the picture displayed)? There are many factors other than video_decode that affect latency.

mpr
Posts: 22
Joined: Wed Mar 27, 2013 6:56 pm

Re: is real-time decoding possible?

Thu Apr 18, 2013 10:00 pm

You're right that once the decoding kicks into gear it keeps up fine. The specific latency I'm trying to eliminate is the time from when the first h264 frame is submitted to the decoder to when it's displayed on the screen (through a tunnel). Every frame after the first seems to be behind by this same delay.

I'm fairly certain network latency, CPU latency, etc. are not a factor. I send the same streams to other computers and they decode them with no perceptible delay.

I just recently found one test stream, encoded with x264enc, that the Pi seems to respond very well to. After analyzing which NAL packets it's sending, I noticed that it includes an SEI NAL, which I haven't tried including in mine. Maybe as soon as the Pi sees an SEI NAL it proceeds to rendering?

I will follow up after I figure out how to add the SEI NAL to my main test stream.

saintdev
Posts: 39
Joined: Mon Jun 18, 2012 10:56 pm

Re: is real-time decoding possible?

Fri Apr 19, 2013 4:00 am

mpr wrote:You're right that once the decoding kicks into gear it keeps up fine. The specific latency I'm trying to eliminate is the time from when the first h264 frame is submitted to the decoder to when it's displayed on the screen (through a tunnel). Every frame after the first seems to be behind by this same delay.
What you're referring to is decoder delay. All decoders have some delay; it's a fact you will have to deal with. Now, there are ways to (attempt to) minimize this delay on the encoder side. Unfortunately, because the RaspberryPi's decoders are a black box, the only way to determine the latency is experimentation. It can also change with firmware revisions and with the number and type of OpenMAX components in the tunnel (decoding, scaling, rendering, etc. will each have their own independent delay). Also, OpenMAX only guarantees a buffer processing time of 30ms; it makes no guarantees about decoder delay and does not offer any parameter to request lower latency.
I'm fairly certain network latency, CPU latency, etc. are not a factor. I send the same streams to other computers and they decode them with no perceptible delay.

I just recently found one test stream, encoded with x264enc, that the Pi seems to respond very well to. After analyzing which NAL packets it's sending, I noticed that it includes an SEI NAL, which I haven't tried including in mine. Maybe as soon as the Pi sees an SEI NAL it proceeds to rendering?
No. An SEI just tells the decoder that if it starts decoding from this frame, the output will be 'approximately correct' after X frames. This is going to depend on how the specific stream was encoded; you can't just insert them into a stream.

Now, to (attempt to) reduce the latency, you can try the following (a rough encoder-side sketch follows the list).
  • Reduce, or eliminate, the use of B-frames. This is the biggest one: the more B-frames, the more frames the decoder MUST buffer before it can begin outputting frames.
  • Intra-only. Intra frames have no dependencies on previous frames, so there is no need to buffer more than one frame at a time.
This does not mean the Pi's decoder will optimize these cases for latency; it's possible that it has a minimum queue of N frames before it begins decoding. If that is the case, nothing you do is going to reduce the latency below that number of frames.
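
For illustration, here is roughly what those two points look like with libx264. A sketch only: your encoder will have different knobs, and I haven't tested these exact settings against the Pi's decoder.

#include <x264.h>

/* Low-delay encoder settings: no B-frames, optionally intra-only. */
static x264_t *open_low_latency_encoder(int width, int height)
{
    x264_param_t param;
    x264_param_default_preset(&param, "veryfast", "zerolatency"); /* zerolatency already disables B-frames and lookahead */
    param.i_bframes        = 0;  /* explicitly: nothing for the decoder to reorder */
    param.i_keyint_max     = 1;  /* intra-only, if the bitrate allows it */
    param.b_repeat_headers = 1;  /* SPS/PPS in-band before every keyframe */
    param.i_width  = width;
    param.i_height = height;
    return x264_encoder_open(&param);
}

(The zerolatency tune also switches to sliced threads and drops lookahead, which only matters for encoder-side latency, but it doesn't hurt here.)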
SELECT `signature` FROM `users` WHERE `username`='saintdev';
Empty set (0.00 sec)

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 17877
Joined: Sat Jul 30, 2011 7:41 pm

Re: is real-time decoding possible?

Fri Apr 19, 2013 8:39 am

Look at it like this: there will always be a delay - decoding cannot happen instantly. Take a look at a terrestrial HD signal vs SD - the HD lags behind. I think that's due to encode/decode delays being longer for HD, as there is more to do.

What I have seen done in an attempt to reduce latency is starting to decode the frames before they have been fully received. This requires the frame to be in stripes. You could probably get down to 10ms latency like that. I think something similar is done on the Nintendo WiiU to transmit the video to the controller.

This sort of code is NOT on the Raspi. I think you will need to live with the startup delay.
Principal Software Engineer at Raspberry Pi (Trading) Ltd.
Please direct all questions to the forum, I do not do support via PM.

mpr
Posts: 22
Joined: Wed Mar 27, 2013 6:56 pm

Re: is real-time decoding possible?

Fri Apr 19, 2013 5:10 pm

Thanks all for the replies and ideas. 10ms to 30ms delay would be GREAT. The delay I'm seeing now is in the low seconds.

The SEI NAL turned out to be useless. It was just an informational packet with x264 information in a human readable string. So it didn't affect output at all. :lol:

My circumstances are a bit unusual because I have low level control over my encoded stream. I'm using VAAPI with Quicksync and writing out the NAL packets by hand. So I just have to determine if there's some sort of magic bit I can set that will trigger the Broadcom to decode more immediately.

And I have another solid lead: in the SPS packet there is a "VUI" subpacket with a "max_dec_frame_buffering" field. x264 sets this to 1, and currently I don't supply VUI information at all, so I'm gonna try adding it!

dickon
Posts: 216
Joined: Sun Dec 09, 2012 3:54 pm
Location: Home, just outside Reading

Re: is real-time decoding possible?

Fri Apr 19, 2013 5:21 pm

Make sure you're setting the right profile before starting the decoder, as this has a direct bearing on the number of B-frames (and hence latency required) expected to be in the stream. And make sure your stream actually conforms to that profile...

mpr
Posts: 22
Joined: Wed Mar 27, 2013 6:56 pm

Re: is real-time decoding possible?

Fri Apr 19, 2013 5:51 pm

The VUI bitstream restriction setting WORKS and now the Raspberry has zero latency decode. :mrgreen:

bitstream_restriction_flag : 1
motion_vectors_over_pic_boundaries_flag : 1
max_bytes_per_pic_denom : 0
max_bits_per_mb_denom : 0
log2_max_mv_length_horizontal : 10
log2_max_mv_length_vertical : 10
num_reorder_frames : 0
max_dec_frame_buffering : 1

This technique may also work for other hardware that is experiencing similar startup delays...

For future reference, since it was mentioned a few times, we are not using B frames in the stream.
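
For anyone wanting to try the same thing, here is roughly how we emit those fields by hand. Treat it as a sketch only: it assumes you are already positioned inside vui_parameters() in the SPS you are writing, it ignores emulation-prevention bytes and the rest of the SPS, and the little bit-writer is our own, not from any particular library.

#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t *buf; size_t byte; int bit; } bw_t;

static void put_u(bw_t *bw, uint32_t val, int n)      /* fixed-length u(n) */
{
    for (int i = n - 1; i >= 0; i--) {
        if (bw->bit == 0) bw->buf[bw->byte] = 0;
        bw->buf[bw->byte] |= ((val >> i) & 1) << (7 - bw->bit);
        if (++bw->bit == 8) { bw->bit = 0; bw->byte++; }
    }
}

static void put_ue(bw_t *bw, uint32_t val)            /* Exp-Golomb ue(v) */
{
    uint32_t code = val + 1;
    int len = 0;
    for (uint32_t t = code; t > 1; t >>= 1) len++;
    put_u(bw, 0, len);          /* leading zeros */
    put_u(bw, code, len + 1);   /* value in len+1 bits */
}

static void write_bitstream_restriction(bw_t *bw)
{
    put_u(bw, 1, 1);    /* bitstream_restriction_flag */
    put_u(bw, 1, 1);    /* motion_vectors_over_pic_boundaries_flag */
    put_ue(bw, 0);      /* max_bytes_per_pic_denom */
    put_ue(bw, 0);      /* max_bits_per_mb_denom */
    put_ue(bw, 10);     /* log2_max_mv_length_horizontal */
    put_ue(bw, 10);     /* log2_max_mv_length_vertical */
    put_ue(bw, 0);      /* num_reorder_frames */
    put_ue(bw, 1);      /* max_dec_frame_buffering */
}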

jldeon
Posts: 18
Joined: Thu Apr 18, 2013 2:45 pm

Re: is real-time decoding possible?

Thu May 02, 2013 7:58 pm

Just to add some hard numbers to this discussion...

I've got a set of test videos that I'm attempting to play back with my Pi. The video is being streamed over RTP, so the frames should be arriving in real-time. I'm measuring the time between when the first buffer arrives at the decoder's input port, and when the port settings changed message is emitted from the decoder. There's not any funky junk data at the start, and there are no B frames whatsoever.

From first buffer into the decoder to port settings changed message:
  • 350 msec for my 720p30 video
  • 720 msec for 360p30 video
  • 2700 msec for 180p7.5 video
The 2.7 seconds for the 180p7.5 is harsh! I plan to try the VUI settings proposed by mpr to see if there's any benefit to my app.
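
For reference, the instrumentation is nothing fancy - just the helper below dropped into my existing decode loop (decoder_handle and first_buf are placeholders for my own OMX handle and input buffer):

#include <time.h>

static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

/* usage:
     t0 = now_ms();  OMX_EmptyThisBuffer(decoder_handle, first_buf);   -- first coded buffer in
     ...and when OMX_EventPortSettingsChanged arrives on the output port:
     printf("first buffer -> port settings changed: %.1f ms\n", now_ms() - t0);           */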

mpr
Posts: 22
Joined: Wed Mar 27, 2013 6:56 pm

Re: is real-time decoding possible?

Fri May 03, 2013 2:19 pm

jldeon, one more small thing to keep in mind: if you're expecting the Pi to pick up an already-playing UDP stream, you'll need to be sending IDR packets periodically. How often you send them won't affect the delay of the playback, but it will greatly affect the _startup time_ of your stream. Some streams only have an IDR on the very first frame, and you'll hang forever if you try to pick one of those up mid-stream.

max_dec_frame_buffering=1 has been pure gold for us in reducing real-time delay. I cannot recommend it enough.
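
If it helps, the gist of what we do before starting to feed the decoder is roughly this (simplified; it assumes Annex-B start codes, and where the data comes from is up to your depacketiser):

#include <stdint.h>

/* Return 1 once an IDR slice (nal_unit_type == 5) is found in [p, end);
   don't hand anything to the decoder until this has happened. */
static int contains_idr(const uint8_t *p, const uint8_t *end)
{
    while (end - p >= 4) {
        if (p[0] == 0 && p[1] == 0 && p[2] == 1) {
            if ((p[3] & 0x1F) == 5)
                return 1;
            p += 3;
        } else {
            p++;
        }
    }
    return 0;
}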

jldeon
Posts: 18
Joined: Thu Apr 18, 2013 2:45 pm

Re: is real-time decoding possible?

Wed May 08, 2013 5:40 pm

mpr,

How were you measuring your first-frame delay? I modified the incoming SPS packet(s) to add VUI + bitstream restriction fields, and set them as you had in your example. While the first-frame delay got better, I'm still seeing 6-10 frames worth of latency at startup.

I'm wondering what else I could be doing differently here... did you set up any particular settings on the OMX decoder component?

I suppose my encoder might be to blame, but I don't have much control over that, sadly.

mpr
Posts: 22
Joined: Wed Mar 27, 2013 6:56 pm

Re: is real-time decoding possible?

Wed May 08, 2013 5:55 pm

jldeon, for our application, playback starts as soon as we send out an IDR frame. We have some flags in the encoder (Quicksync) that control how often this is done. With the VUI flags set appropriately, the Raspberry then proceeds to decode in near real-time. We don't care much about fast startup time (reducing ongoing playback delay is our goal), so we set the IDR interval to every 60 frames, and it could probably be much higher.

If you're trying to minimize the startup delay, rather than the ongoing playback delay, you just need to get an IDR frame out as soon as possible, or shorten the intra refresh interval if you're not using IDRs. Do you have any settings related to IDR or intra refresh in your encoder? May I ask what type of encoder it is?

Also you need to let us know if you're connecting on the fly, or if the encoder is started after the decoder is ready. If the decoder is ready to go when the encoder starts, it should get an IDR frame at the very beginning, and begin playback immediately.

jldeon
Posts: 18
Joined: Thu Apr 18, 2013 2:45 pm

Re: is real-time decoding possible?

Wed May 08, 2013 7:14 pm

In my code, I've added instrumentation to record the time when the first buffer arrives and again when the first frame is rendered, and that's the measurement I'm using to gauge the latency. It's short enough now that it could appear instantaneous (or nearly so) to the naked eye (i.e., less than 250-300 msec), but I wasn't sure if you'd done similar instrumentation on your decode latency. The "goal", though, is something on the order of 50 msec or so, which still seems a bit far from my current position.

Overall, I believe the goal is to reduce latency not only at startup, but in the steady state as well. It's just that this is the only measurement I can currently take with the OMX components/pipeline. Once frames have started to decode, I don't know which buffer corresponds to which frame, so I can't measure the latency as accurately as I can when the first frame is coming through.

Right now I'm using some static RTP streams that were captured between a working encoder and a working decoder. I don't really have access to the encoder (thus I'm hacking up packets in the static streams I have) and I don't know much about it at this point; I was just given the streams as a set of examples of the kinds of things that would be nice to be able to play back. They start with all the standard NAL packets (SPS, PPS, etc.) and an IDR frame, I believe.

mpr
Posts: 22
Joined: Wed Mar 27, 2013 6:56 pm

Re: is real-time decoding possible?

Wed May 08, 2013 7:32 pm

You mentioned 2.7 seconds for 1080p. Is it still that bad? 50ms _seems_ like a doable startup time, provided you get the IDR frame promptly and none of the other parts of loading and initializing your program eat into that.

What elements do you have in the OMX tunnel? If you're playing live streams you can get rid of the schedule and time elements, leaving just video decode and render.

You may want to use h264_analyze and double-check that all your test streams have an IDR at the start (NAL type 5). You could also check that those test streams play back rapidly on other decoding software or hardware. I usually compare my Raspberry decoder to a local ffdec_h264 GStreamer pipeline.

jldeon
Posts: 18
Joined: Thu Apr 18, 2013 2:45 pm

Re: is real-time decoding possible?

Wed May 08, 2013 9:04 pm

The 2.7 seconds number is for a 180p, 7.5fps stream (not 1080p). With the VUI changed, it drops to ~600-700 msec.

I stripped the RTP data off the stream, and the first three NALs are SPS, PPS, and then an IDR.

I've got a scheduler/clock in there, I could potentially rip it out if it was causing my issues... that's probably at least a few hours of rework, though. Did you do any profiling before and after? What was the impact of those elements?

mpr
Posts: 22
Joined: Wed Mar 27, 2013 6:56 pm

Re: is real-time decoding possible?

Wed May 08, 2013 9:09 pm

I don't have exact numbers on what you get from ripping those elements out, but it was easily detectable by the human eye. Several frames' worth at least, and it may vary by resolution.

They're relatively easy to rip out too. If you shrink your tunnel sizes and just remove all references to those elements, you pretty much have it. Link the decode element to the render element after you get the settings changed event. Your code will be a lot cleaner all of a sudden. 8-)
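
Roughly, using the ilclient helpers from hello_pi, it ends up looking like the sketch below (error handling stripped; video_decode and video_render are the COMPONENT_T handles you already created, and 131/90 are the decoder output and renderer input ports used in hello_video):

#include <string.h>
#include "ilclient.h"   /* /opt/vc/src/hello_pi/libs/ilclient */

/* Call repeatedly while feeding data; returns 0 once the tunnel is up. */
static int link_decode_to_render(COMPONENT_T *video_decode, COMPONENT_T *video_render, TUNNEL_T tunnel[2])
{
    /* only proceed once video_decode has raised OMX_EventPortSettingsChanged on its output port */
    if (ilclient_remove_event(video_decode, OMX_EventPortSettingsChanged, 131, 0, 0, 1) != 0)
        return -1;   /* not seen yet, keep feeding data and try again */

    memset(tunnel, 0, 2 * sizeof(tunnel[0]));
    set_tunnel(tunnel, video_decode, 131, video_render, 90);
    if (ilclient_setup_tunnel(tunnel, 0, 0) != 0)
        return -2;

    ilclient_change_component_state(video_render, OMX_StateExecuting);
    return 0;
}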

jldeon
Posts: 18
Joined: Thu Apr 18, 2013 2:45 pm

Re: is real-time decoding possible?

Tue May 21, 2013 2:38 pm

I think I've driven the latency down to "good enough for the time being."

I'll leave this thread with a few notes:

I didn't see any difference in pulling the scheduler and clock out. Maybe 10's of milliseconds, but nothing significant.

If you're going to write a VUI structure into the SPS at the decoder, you can't use a constant value for "max_dec_frame_buffering" - it has to be at least the value of "max_num_ref_frames", which is part of the standard SPS packet. I naively fixed this value at 1, but when a stream came in with more than 1 in max_num_ref_frames, all sorts of interesting glitches started happening. Instead, set max_dec_frame_buffering in the VUI to the greater of 1 or max_num_ref_frames and you should be better off.

If you're doing this at the encoder, one hopes that the encoder is smart enough to process the various values correctly...
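
In code terms it's just something like this, with max_num_ref_frames being whatever you parsed out of the incoming SPS:

/* never write a max_dec_frame_buffering smaller than the SPS's max_num_ref_frames */
uint32_t max_dec_frame_buffering =
    (max_num_ref_frames > 1) ? max_num_ref_frames : 1;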

sbakke
Posts: 3
Joined: Tue Apr 04, 2017 10:46 pm

Re: is real-time decoding possible?

Tue Apr 04, 2017 10:53 pm

Hope someone is still looking at this post after 4 years!!!

We are also trying to get the absolute lowest-latency decode: give the decoder a frame, get a frame out. However, we get a 10 to 20 frame delay through the decoder when using the hello_video example. We tried creating a pipeline with just decoder and render (removed scheduler and clock) but nothing is ever displayed. Is there more we need to do?

Thanks
Steve

6by9
Raspberry Pi Engineer & Forum Moderator
Posts: 4426
Joined: Wed Dec 04, 2013 11:27 am
Location: ZZ9 Plural Z Alpha, aka just outside Cambridge.

Re: is real-time decoding possible?

Wed Apr 05, 2017 10:48 am

sbakke wrote:Hope someone is still looking at this post after 4 years!!!

We are also trying to get the absolute lowest-latency decode: give the decoder a frame, get a frame out. However, we get a 10 to 20 frame delay through the decoder when using the hello_video example. We tried creating a pipeline with just decoder and render (removed scheduler and clock) but nothing is ever displayed. Is there more we need to do?
How much do you understand of H264 bitstreams?

My recollection of this is that the decoder waits for the DPB to be full before generating the first frame. Prediction allows references to anything within the DPB, so if it isn't full then you have special cases to handle.
The size of the DPB is specified by various parameters in the stream headers, but for level 4.0 at 1080P it would be 4 frames. If you've signalled level 4.0 but a lower resolution then you've increased the number of frames.

There is a field in the headers (max_num_ref_frames) that can specify the maximum number of reference frames. The requirement on the decoder generally was to play as many streams as possible, even those with incorrectly defined headers. For that reason I believe that max_num_ref_frames is ignored within the decoder and solely the DPB size defined by MaxDpbMbs is used.

Basic rule is to set the level appropriately for the stream. If 720P30, then setting level 3.1 will give you 5 frames in the DPB instead of 9 or 10 with level 4.0. Encode VGA but signal level 4 and you'd be looking at 27 or 28 frames.
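
As a back-of-the-envelope check (this is just the MaxDpbMbs numbers from Table A-1 of the H264 spec divided by the frame size in macroblocks, not the firmware's actual code):

/* DPB size in frames for a given level's MaxDpbMbs and picture dimensions. */
static int dpb_frames(int max_dpb_mbs, int width, int height)
{
    int mbs = ((width + 15) / 16) * ((height + 15) / 16);
    return max_dpb_mbs / mbs;
}

/* dpb_frames(32768, 1920, 1080) == 4    level 4.0 @ 1080P
   dpb_frames(18000, 1280,  720) == 5    level 3.1 @ 720P
   dpb_frames(32768, 1280,  720) == 9    level 4.0 @ 720P
   dpb_frames(32768,  640,  480) == 27   level 4.0 @ VGA  */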


You also haven't said anything about how you're feeding data into the system, but your reference to hello_video makes me worry that you're using Linux pipes. File access in Linux is buffered, so the OS can be hanging on to data waiting for full buffers. That may be a bigger source of latency than the decoder itself.
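
If that is what's happening, reading the pipe with a plain read() and handing the decoder whatever has actually arrived, rather than fread()ing a fixed-size block, avoids it. A rough sketch only (fd is your pipe descriptor, buf an input buffer already obtained from the decoder, decoder its OMX handle):

#include <unistd.h>
#include "OMX_Core.h"   /* /opt/vc/include/IL */

/* Returns bytes submitted, 0 on EOF, negative on error. */
static int feed_from_pipe(int fd, OMX_HANDLETYPE decoder, OMX_BUFFERHEADERTYPE *buf)
{
    ssize_t n = read(fd, buf->pBuffer, buf->nAllocLen);   /* returns as soon as any data is available */
    if (n <= 0)
        return (int)n;
    buf->nFilledLen = (OMX_U32)n;
    buf->nOffset    = 0;
    return OMX_EmptyThisBuffer(decoder, buf) == OMX_ErrorNone ? (int)n : -1;
}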
Software Engineer at Raspberry Pi Trading. Views expressed are still personal views.
Please don't send PMs asking for support - use the forum.
I'm not interested in doing contracts for bespoke functionality - please don't ask.

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 5099
Joined: Wed Aug 17, 2011 7:41 pm
Location: Cambridge

Re: is real-time decoding possible?

Wed Apr 05, 2017 10:55 am

sbakke wrote: We tried creating a pipeline with just decoder and render (removed scheduler and clock) but nothing is ever displayed. Is there more we need to do?
hello_video_simple is a working version of what you are describing. Might be worth a look.

sbakke
Posts: 3
Joined: Tue Apr 04, 2017 10:46 pm

Re: is real-time decoding possible?

Wed Apr 05, 2017 7:36 pm

Thanks. The new example was just what we needed; we had a missing config setup. This gave us a latency of 2-3 frames depending on the video stream. Then we inserted two "Access unit delimiter" NAL units after each frame (0,0,1,9,0,0,1,9) and that dropped the latency to 1 frame for all video streams, that is, it takes 2 frames in to get the first frame out.

Now if we could get that one frame of latency out of there it would be perfect, but this is good enough for now.

Thanks for the help.

Steve

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 17877
Joined: Sat Jul 30, 2011 7:41 pm

Re: is real-time decoding possible?

Wed Apr 05, 2017 7:39 pm

What was the missing config? Just to tidy up the thread!
Principal Software Engineer at Raspberry Pi (Trading) Ltd.
Please direct all questions to the forum, I do not do support via PM.

sbakke
Posts: 3
Joined: Tue Apr 04, 2017 10:46 pm

Re: is real-time decoding possible?

Wed Apr 05, 2017 8:05 pm

The hello_video_simple example worked fine. We had modified the hello_video example, and it was our modification that was missing the video_render configuration, which is in the hello_video_simple example.

Steve
