mpr wrote:You're right that once the decoding kicks into gear it keeps up fine. The specific latency I'm trying to eliminate is the time between when the first H.264 frame is submitted to the decoder and when it's displayed on the screen (through a tunnel). Every frame after the first seems to be behind by this same delay.
What you're referring to is decoder delay. All decoders have some sort of delay; this is a fact you will have to deal with. There are ways to (attempt to) minimize this delay on the encoder side. Unfortunately, because the Raspberry Pi's decoders are a black box, the only way to determine the latency is through experimentation. It could change with firmware revisions and with the number and type of OpenMAX components in the tunnel (decoding, scaling, rendering, etc. will each have their own independent delay). Also, OpenMAX only guarantees the processing time of a buffer to be 30 ms; it makes no guarantees about decoder delay and does not offer any parameter to request lower latency.
I'm fairly certain that network latency, CPU latency, etc. are not a factor. I send the same streams to other computers and they can decode them with no perceptible delay.
I recently found one test stream that the Pi seems to respond very well to, one produced with x264enc. After analyzing what NAL packets it's sending, I noticed that it has an SEI NAL, which I haven't tried including. Maybe as soon as the Pi sees an SEI NAL it proceeds to rendering?
No. SEIs just tell the decoder that if it starts decoding from this frame, the output will be 'approximately correct' after X frames. This depends on how the specific stream was encoded; you can't just insert them into a stream.
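If you want to compare which NAL units two streams contain, a small script is enough. The following is a minimal sketch, assuming the stream is in Annex B byte-stream format (NAL units separated by 00 00 01 or 00 00 00 01 start codes); the function name nal_types and the NAL_NAMES table are made up for illustration.

```python
# Minimal sketch: list the nal_unit_type of each NAL unit in an H.264
# Annex B byte stream. Assumes start-code-delimited input; streams in
# length-prefixed (AVCC/MP4) form would need different parsing.

# A few common nal_unit_type values from the H.264 specification.
NAL_NAMES = {
    1: "non-IDR slice",
    5: "IDR slice",
    6: "SEI",
    7: "SPS",
    8: "PPS",
    9: "access unit delimiter",
}

START_CODE = b"\x00\x00\x01"  # a 4-byte start code ends with these 3 bytes


def nal_types(data: bytes) -> list[int]:
    """Return the nal_unit_type of every NAL unit found in `data`."""
    types = []
    pos = data.find(START_CODE)
    # Stop if a start code sits at the very end with no header byte after it.
    while pos != -1 and pos + 3 < len(data):
        header = data[pos + 3]        # first byte after the start code
        types.append(header & 0x1F)   # low 5 bits hold nal_unit_type
        pos = data.find(START_CODE, pos + 3)
    return types


def describe(data: bytes) -> list[str]:
    """Human-readable names for the NAL units in `data`."""
    return [NAL_NAMES.get(t, f"type {t}") for t in nal_types(data)]
```

Reading a captured stream into a bytes object and passing it to describe() shows at a glance whether one stream carries SEI (type 6) units that the other lacks. It only reports their presence; it does not parse the SEI payload.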
Now, to (attempt to) reduce the latency, you can try the following.
- Reduce or eliminate the use of B-frames. This is the biggest one: the more B-frames, the more frames the decoder MUST buffer before it can begin outputting frames.
- Intra-only. Intra frames have no dependencies on previous frames, so there is no need to buffer more than one frame at a time.
This does not mean the Pi's decoder will optimize these cases for latency. It's possible that it has a minimum queue of N frames before it begins decoding. If this is the case, nothing you do is going to reduce the latency below that number of frames.
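To get a feel for why B-frames cost latency in a live stream, here is a rough sketch of the frame-structure part only. It assumes a deliberately simple model: frames are captured one per tick, each B-frame references the nearest following I- or P-frame, and encoding, transport and decoding take zero time. The function name b_frame_latency is made up, and the model says nothing about whatever internal queue the Pi's decoder may have.

```python
# Rough model: how many frame periods of delay does the frame structure
# alone force on a live stream? A B-frame cannot be encoded (and so cannot
# be decoded or shown) until the reference frame that follows it in display
# order has been captured.

def b_frame_latency(pattern: str) -> int:
    """Minimum structural delay, in frames, for a display-order pattern.

    `pattern` is a string of frame types in display order, e.g. "IBBP".
    Trailing B-frames with no following reference are ignored.
    """
    # Display-order positions of the reference frames (I and P).
    refs = [i for i, t in enumerate(pattern) if t in "IP"]
    worst = 0
    for i, t in enumerate(pattern):
        if t != "B":
            continue
        # Nearest reference frame displayed after this B-frame.
        nxt = next((r for r in refs if r > i), None)
        if nxt is not None:
            # This B-frame has to wait (nxt - i) frame periods.
            worst = max(worst, nxt - i)
    return worst


if __name__ == "__main__":
    for p in ("IIII", "IPPP", "IBPBP", "IBBP", "IBBBP"):
        print(p, b_frame_latency(p))
```

Under this model, intra-only and I/P-only patterns add no structural delay, while each additional consecutive B-frame adds one frame period. A real decoder may buffer more than this, so treat the result as a lower bound to compare against your measurements.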