kotnia
Posts: 3
Joined: Sat Mar 31, 2018 8:37 pm

MMAL camera - h264 encoder latency

Sat Mar 31, 2018 8:58 pm

Hi all.
One day I decided to measure the camera pipeline latency on my Raspberry Pi 3 with a camera module v1. Conditions: 1920x1080 at 30 fps, H264. The research question was: how much time elapses between the moment the first line of the image starts exposing and the moment the encoded H264 packet emerges from VideoCore into userspace?

To get decent precision I built a simple timer out of an LED matrix connected straight to the Raspberry Pi GPIOs. Given the 30 FPS target, it can display distinct 33 ms intervals modulo a 200 ms period. Here is a picture:
[Attachment: led_matrix_explained.png]

The timer thread updates the LEDs every 33 ms, right at the start of each interval. For example, LEDs l_3 and l_1 enabled means we are 33 — 65 ms into the current 200 ms period, while LEDs l_4 and l_2 mean 166 — 199 ms into it. I'm aware that l_3 and l_4 are redundant, but that small flaw doesn't really affect anything.
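The timer thread is roughly the following (a simplified sketch: it shows the current 33 ms slot in binary on three LEDs rather than the exact pattern in the picture, and gpio_set() with its pin numbers is a placeholder for whatever GPIO access you use). It reads CLOCK_REALTIME so the LED phase matches the local-time stamps the measuring app records:

Code: Select all

#include <stdint.h>
#include <time.h>

extern void gpio_set(int pin, int level);    /* placeholder GPIO helper */
static const int led_pin[3] = {17, 22, 23};  /* hypothetical BCM pins */

static uint64_t now_us(void)
{
    struct timespec ts;
    /* same clock the measuring app timestamps with, so both share the 200 ms phase */
    clock_gettime(CLOCK_REALTIME, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

void *timer_thread(void *arg)
{
    (void)arg;
    for (;;) {
        uint64_t t = now_us() % 200000;             /* position in 200 ms period */
        unsigned slot = (unsigned)(t * 6 / 200000); /* current 33 ms slot, 0..5 */
        for (int i = 0; i < 3; i++)
            gpio_set(led_pin[i], (int)((slot >> i) & 1));
        /* sleep to the next slot boundary; recomputing from the clock each
           iteration keeps the display from drifting */
        uint64_t wait = (uint64_t)(slot + 1) * 200000 / 6 - t;
        struct timespec d = {0, (long)(wait * 1000)};
        nanosleep(&d, NULL);
    }
}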

Now I point the camera at the LED matrix and start my measuring app, which collects each frame along with the time it was received. Later it decompresses the collected frames and writes them into JPEG files annotated with the timing info. This is what I get in the end:

[Attachment: std_tunnel_0.jpg]
We can see here that the LED matrix has diodes l_4 and l_0 enabled. It means the frame started exposing at 100 ms and finished at 133 ms. The label at the bottom contains: the current local time in s.us format; the current time modulo 200000 us; and the time elapsed since the previous frame was obtained, in ms. This example shows the frame was received at second 28.390588, local time. 390 % 200 = the 190th ms (the second number is the receive time in microseconds). 190 — 100 = 90 ms latency. Considering the distortion in the time intervals between frames, which can vary from 28 to 44 ms, the worst case would be a full 100 ms+. This is rather disappointing. I was prepared for some lag, but 90 ms seems way too much. Note this is not a matter of transmitting data somewhere: it is latency within the Raspberry Pi device itself. 90 ms is the cost of receiving one compressed frame from the camera.
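To make the arithmetic explicit, here is the same calculation as a tiny C helper (the function and parameter names are mine, not from the measuring app):

Code: Select all

/* recv_us      - userspace receive timestamp, microseconds since the epoch
 * led_start_ms - exposure start read off the LED matrix, 0..199 ms
 * returns the camera-to-userspace latency in ms, modulo 200 ms */
int latency_ms(long long recv_us, int led_start_ms)
{
    int recv_ms = (int)((recv_us / 1000) % 200); /* position in the 200 ms window */
    int d = recv_ms - led_start_ms;
    if (d < 0)
        d += 200;   /* wrap around the period boundary */
    return d;       /* e.g. 190 - 100 = 90 ms for the frame above */
}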

I use the MMAL API for the media pipeline. Most of the configuration code is similar to the example in raspivid.c: camera buffers in opaque format, camera output port tunneled to the encoder input port. MMAL_PARAMETER_VIDEO_ENCODE_H264_LOW_LATENCY is configured, but I didn't notice any difference. I also tried MMAL_PARAMETER_MB_ROWS_PER_SLICE, but its only effect was some VideoCore semaphores hanging in a locked state forever (I am planning to create a separate topic for that issue). The OpenMAX API gave me similar results, though it doesn't support opaque buffers for some reason.
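The relevant part of the setup looks roughly like this (a sketch in the raspivid.c style: component creation and error handling are omitted, the slice value is just an example, and applying _LOW_LATENCY to the encoder output port is my reading of the headers rather than documented practice):

Code: Select all

#include "interface/mmal/mmal.h"
#include "interface/mmal/util/mmal_connection.h"
#include "interface/mmal/util/mmal_util_params.h"

/* camera and encoder are already-created MMAL components */
static void connect_and_tune(MMAL_COMPONENT_T *camera,
                             MMAL_COMPONENT_T *encoder,
                             MMAL_CONNECTION_T **conn)
{
    /* Tunnel the camera video port (output[1]) straight into the encoder,
       so frames stay on the VideoCore side in opaque format. */
    mmal_connection_create(conn, camera->output[1], encoder->input[0],
                           MMAL_CONNECTION_FLAG_TUNNELLING |
                           MMAL_CONNECTION_FLAG_ALLOCATION_ON_INPUT);
    mmal_connection_enable(*conn);

    /* The two parameters discussed in this thread. */
    mmal_port_parameter_set_boolean(encoder->output[0],
            MMAL_PARAMETER_VIDEO_ENCODE_H264_LOW_LATENCY, MMAL_TRUE);
    /* example value; this is the setting that hung for me */
    mmal_port_parameter_set_uint32(encoder->output[0],
            MMAL_PARAMETER_MB_ROWS_PER_SLICE, 8);
}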

Now to the questions. Is it even possible to achieve reasonable latency on a Raspberry Pi 3, which I believe should be no more than 50 — 60 ms? Which API allows the lowest latency: OpenMAX or MMAL? What is the right way to split uncompressed camera frames into slices, along with the compressed H264 packets (MMAL API)? Could there be a difference with camera module v2?

6by9
Raspberry Pi Engineer & Forum Moderator
Posts: 5562
Joined: Wed Dec 04, 2013 11:27 am
Location: ZZ9 Plural Z Alpha, aka just outside Cambridge.

Re: MMAL camera - h264 encoder latency

Sun Apr 01, 2018 8:19 am

Already covered - viewtopic.php?t=153410&p=1027417#p1004792

I beg to differ that "this is not a matter of transmitting data somewhere" - you're getting data from photons to the sensor, and from the sensor into the SoC.

The VideoCore firmware was a common codebase for multiple projects. MMAL_PARAMETER_VIDEO_ENCODE_H264_LOW_LATENCY was for a different project and the camera doesn't supply the data in a suitable format to use it.
AFAIK MMAL_PARAMETER_MB_ROWS_PER_SLICE is working fine - I used it about 6 months ago. The gain is likely to be low without a stripe-based input and _LOW_LATENCY, and it loses coding efficiency (but gains stream robustness).

On what basis have you determined 50-60ms to be reasonable?
None of the operations are instantaneous, and increasing clock speeds to make things run faster costs power whereas VideoCore was aimed at low power mobile applications. The codec block is specified for 1080P30, so it has to be able to accept a new frame every 33ms and anything less just means you're burning more power than necessary.

MMAL using opaque buffers and IL using tunnelling will be nearly identical.
V1 and V2 cameras will vary slightly as V2 has slightly faster ADCs and higher pixel rate over the CSI2, but the difference is likely to be minimal.
Software Engineer at Raspberry Pi Trading. Views expressed are still personal views.
Please don't send PMs asking for support - use the forum.
I'm not interested in doing contracts for bespoke functionality - please don't ask.

kotnia
Posts: 3
Joined: Sat Mar 31, 2018 8:37 pm

Re: MMAL camera - h264 encoder latency

Sun Apr 01, 2018 10:38 pm

Thank you for your answer. I'm sorry for possibly controversial terms such as 'reasonable'. It's sometimes hard to find the correct word in a foreign language.

So if I understand your explanation from viewtopic.php?t=153410&p=1027417#p1004792 correctly, the pipeline timings should look like this:
[Attachment: mmal_timeline.png]
If that is true, I would expect to be able to get a raw YUV420 image from the camera in 43 + 13 = 56 ms. But after some research I could not achieve less than 70 + 13 = 83 ms total for an uncompressed YUV image (13 ms is the memcpy cost for a 3 MB 1920x1080 YUV420 image). It was measured with zero-copy mode enabled.
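For reference, zero-copy was enabled roughly like this (a sketch; camera is the already-created camera component, and the parameter has to be set before the port is enabled):

Code: Select all

#include "interface/mmal/util/mmal_util_params.h"

/* Share VideoCore buffers with the ARM side instead of copying each
   payload over the VCHI transport; set before enabling the port. */
mmal_port_parameter_set_boolean(camera->output[1] /* video port */,
                                MMAL_PARAMETER_ZERO_COPY, MMAL_TRUE);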

Here is a picture:
[Attachment: raw_with_zerocopy_an_zeroshutterlag.jpg]
Am I missing something important? The documentation for the picamera project says that the camera video port always contains two images, which are needed for motion estimation. Could this be the reason for so much latency on raw images?

kotnia
Posts: 3
Joined: Sat Mar 31, 2018 8:37 pm

Re: MMAL camera - h264 encoder latency

Thu Apr 05, 2018 7:35 pm

And to revive the topic a bit, I'd like to answer this question:
On what basis have you determined 50-60ms to be reasonable?
  • The encoding routine can be executed in parallel with the ISP routine (with some shift, of course)
    • otherwise, if those stages were sequential, an iteration would cost 76 ms, making it impossible to obtain 30 fps
  • The encoding routine has to take 33 ms on average (maybe with some jitter)
    • again, it wouldn't be possible to encode 30 frames per second otherwise
  • If MMAL_PARAMETER_MB_ROWS_PER_SLICE works fine, then the codec should be able to encode slices separately, without waiting for the entire frame to complete
    • we can even expect a Periodic Intra Refresh feature in a modern H264 codec; there are many IP providers offering hardware codecs with those features
    • the MMAL API allows configuring periodic intra refresh (see the sketch after the picture below)
Given that, I don't see fundamental reasons why the complete pipeline in VideoCore can't look like this:
[Attachment: mmal_timeline_expected.png]
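For reference, periodic intra refresh can be requested through MMAL roughly like this (a sketch along the lines of raspivid's intra-refresh option; the mode is just an example, and encoder->output[0] is assumed to be the H264 output port):

Code: Select all

#include "interface/mmal/util/mmal_util_params.h"

MMAL_PARAMETER_VIDEO_INTRA_REFRESH_T ir = {
    {MMAL_PARAMETER_VIDEO_INTRA_REFRESH, sizeof(ir)},
    MMAL_VIDEO_INTRA_REFRESH_CYCLIC, /* refresh a band of MBs each frame */
    0, 0, 0, 0                       /* air_mbs, air_ref, cir_mbs, pir_mbs */
};
mmal_port_parameter_set(encoder->output[0], &ir.hdr);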

6by9
Raspberry Pi Engineer & Forum Moderator
Posts: 5562
Joined: Wed Dec 04, 2013 11:27 am
Location: ZZ9 Plural Z Alpha, aka just outside Cambridge.

Re: MMAL camera - h264 encoder latency

Fri Apr 06, 2018 4:59 pm

kotnia wrote:
Thu Apr 05, 2018 7:35 pm
And to revive the topic a bit, I'd like to answer this question:
On what basis have you determined 50-60ms to be reasonable?
  • The encoding routine can be executed in parallel with the ISP routine (with some shift, of course)
    • otherwise, if those stages were sequential, an iteration would cost 76 ms, making it impossible to obtain 30 fps
Not quite. Image buffers are written to SDRAM, so there can be any number of buffers in between the source (camera) and the encoder. There is no timing requirement to achieve 30fps through the pipeline, as the source can just move on to filling the next set of buffers whilst the codec churns away.
As currently implemented (and unlikely to change), the camera will only pass off full frames to the encoder.
kotnia wrote:
  • The encoding routine has to take 33 ms on average (maybe with some jitter)
    • again, it wouldn't be possible to encode 30 frames per second otherwise
Again, not quite. Each phase of encoding must complete within the 33ms frame period, but the encoder can be pipelined. Predominantly this is the CABAC or CAVLC entropy encoding being separated out from the motion estimation and coding phase. The overall latency for a 1080P frame through the codec is around 40ms - ~32ms in motion estimation and ~8ms in entropy coding. (There are also two phases to the motion estimation, and I'm not sure if the first-phase hardware block can be passed data piecemeal.)
kotnia wrote:
  • If MMAL_PARAMETER_MB_ROWS_PER_SLICE works fine, then the codec should be able to encode slices separately, without waiting for the entire frame to complete
    • we can even expect a Periodic Intra Refresh feature in a modern H264 codec; there are many IP providers offering hardware codecs with those features
    • the MMAL API allows configuring periodic intra refresh

Given that, I don't see fundamental reasons why the complete pipeline in VideoCore can't look like this:
[Attachment: mmal_timeline_expected.png]
Yes, in theory we could overlap the camera and the encode. The reality is that we don't, and the extra effort required to save ~15ms (after overheads) probably isn't worth expending. I'm not aware of any other SoCs that offer that much parallelism - if they use V4L2 then everything is frame based!
Software Engineer at Raspberry Pi Trading. Views expressed are still personal views.
Please don't send PMs asking for support - use the forum.
I'm not interested in doing contracts for bespoke functionality - please don't ask.
