mattday
Posts: 15
Joined: Sun Jul 26, 2015 5:00 pm

BGR slower than RGB?

Wed Feb 21, 2018 2:32 pm

I have yet to put together a sufficiently deterministic test that proves this beyond doubt, so I could be jumping to conclusions. However, when processing video frames on the Pi, I have on several occasions started dropping frames after changing nothing but the video port format from RGB24 to BGR24.

My understanding is the VPU is converting to these formats from I420. The destination pixel ordering has no obvious influence on the computation required for the colour space conversion. So perhaps when BGR is requested, the VPU actually converts to RGB first, then takes a second step going to BGR?

Any insight here is very welcome. I will continue to try to get a solid test together when I have an opportunity.
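
One possible starting point for such a test is to count gaps in the frame timestamps rather than relying on eyeballing the output. The sketch below is only an illustration: `count_dropped_frames` is a hypothetical helper, and it assumes you can already collect one capture timestamp (in microseconds) per frame received and that the camera is running at a fixed, known frame rate.

```python
def count_dropped_frames(timestamps_us, fps):
    """Estimate dropped frames from per-frame capture timestamps.

    timestamps_us: timestamps in microseconds, in arrival order (assumed input).
    fps: the nominal, fixed frame rate the camera was configured for.
    """
    interval = 1_000_000 / fps
    dropped = 0
    for prev, cur in zip(timestamps_us, timestamps_us[1:]):
        # A gap of roughly N frame intervals means N - 1 frames are missing.
        missed = round((cur - prev) / interval) - 1
        if missed > 0:
            dropped += missed
    return dropped


# Synthetic 30 fps timestamps in which frames 3 and 4 never arrived.
timestamps = [round(i * 1_000_000 / 30) for i in (0, 1, 2, 5, 6)]
print(count_dropped_frames(timestamps, 30))
```

Running the same capture length once with RGB24 and once with BGR24, and comparing the two counts, would give a repeatable number to quote.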

6by9
Raspberry Pi Engineer & Forum Moderator
Posts: 9069
Joined: Wed Dec 04, 2013 11:27 am
Location: ZZ9 Plural Z Alpha, aka just outside Cambridge.

Re: BGR slower than RGB?

Wed Feb 21, 2018 4:16 pm

mattday wrote:
Wed Feb 21, 2018 2:32 pm
My understanding is the VPU is converting to these formats from I420. The destination pixel ordering has no obvious influence on the computation required for the colour space conversion. So perhaps when BGR is requested, the VPU actually converts to RGB first, then takes a second step going to BGR?

Any insight here is very welcome. I will continue to try to get a solid test together when I have an opportunity.
Correct.
There wasn't a conversion function from I420 to BGR, and time was short, so it uses a two-pass approach: it converts I420 to RGB, and then swaps the R and B samples over.
IIRC BGR was only added as part of the V4L2 driver because OpenCV only supported BGR at the time, and therefore it was a necessity to get it working. viewtopic.php?f=43&t=62364&start=300#p530358 is where I seem to have first discussed it. I seem to recall benchmarking the difference too.
Software Engineer at Raspberry Pi Trading. Views expressed are still personal views.
I'm not interested in doing contracts for bespoke functionality - please don't ask.
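
To make the two-pass idea concrete, here is a minimal per-pixel sketch comparing "convert to RGB, then swap R and B" with a direct conversion that writes the channels in BGR order. It is not the firmware code (which is VPU vector assembly). The BT.601 full-range coefficients and all function names are assumptions chosen for illustration.

```python
def clamp8(value):
    """Round and clamp to the 0..255 range of an 8-bit sample."""
    return max(0, min(255, int(round(value))))


def yuv_to_rgb(y, u, v):
    """Convert one YUV sample to RGB (assumed BT.601 full-range coefficients)."""
    d, e = u - 128, v - 128
    r = clamp8(y + 1.402 * e)
    g = clamp8(y - 0.344136 * d - 0.714136 * e)
    b = clamp8(y + 1.772 * d)
    return r, g, b


def two_pass_bgr(pixels):
    """Pass 1 produces RGB; pass 2 walks the result again to swap R and B."""
    rgb = [yuv_to_rgb(y, u, v) for (y, u, v) in pixels]
    return [(b, g, r) for (r, g, b) in rgb]


def yuv_to_bgr(y, u, v):
    """Same arithmetic as yuv_to_rgb, but emitted directly in B, G, R order."""
    d, e = u - 128, v - 128
    return (
        clamp8(y + 1.772 * d),
        clamp8(y - 0.344136 * d - 0.714136 * e),
        clamp8(y + 1.402 * e),
    )


def one_pass_bgr(pixels):
    """Single pass: each pixel is read once and written once."""
    return [yuv_to_bgr(y, u, v) for (y, u, v) in pixels]


samples = [(128, 128, 128), (255, 128, 128), (76, 85, 255), (29, 255, 107)]
print(two_pass_bgr(samples) == one_pass_bgr(samples))
```

Both routes give identical pixels; the only difference is that the two-pass version touches every output pixel twice.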

mattday

Re: BGR slower than RGB?

Wed Feb 21, 2018 9:41 pm

6by9 wrote:
Wed Feb 21, 2018 4:16 pm
IIRC BGR was only added as part of the V4L2 driver because OpenCV only supported BGR at the time, and therefore it was a necessity to get it working. viewtopic.php?f=43&t=62364&start=300#p530358 is where I seem to have first discussed it. I seem to recall benchmarking the difference too.
Right, I should have taken a look back through that V4L2 thread before now. I've just read some of my own questions and findings from 2015 together with your always helpful replies :oops:
Unfortunately, I haven't really touched any of the camera stuff at a low level since then. However, for reference, you said:
6by9 wrote:
Sun Jul 26, 2015 8:58 pm
BGR3 also takes an extra step over and above RGB3 (My brain couldn't cope with munging the coefficients at the time, so we convert I420 to RGB3, and then swap all the red and blue values. At some point I might revisit that, but very low priority).
So is there much chance of seeing a direct I420 to BGR conversion at some point? Superficially, it seems like it should be relatively straightforward to implement given access to the I420 to RGB code. Certainly OpenCV is my only reason for wanting anything in BGR, but I think that is also going to be the case for a lot of people.

6by9
Raspberry Pi Engineer & Forum Moderator

Re: BGR slower than RGB?

Thu Feb 22, 2018 7:26 am

mattday wrote:
Wed Feb 21, 2018 9:41 pm
So is there much chance of seeing a direct I420 to BGR conversion at some point? Superficially, it seems like it should be relatively straightforward to implement given access to the I420 to RGB code. Certainly OpenCV is my only reason for wanting anything in BGR, but I think that is also going to be the case for a lot of people.
It's not a priority. The vector assembly involved isn't commented, and is non-obvious. IIRC OpenCV does now support I420. RGB support would be easier in ARM NEON, as it supports reading triplets into 3 vector registers (the VPU doesn't) - I don't know if they've looked at that recently.
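
As a rough picture of what "reading triplets into 3 vector registers" buys you, the sketch below mimics a de-interleaving load and an interleaving store using plain Python slicing. It is only an analogy for NEON's structure load/store instructions (the VLD3/VST3 family), not NEON code, and the function names are made up for the example.

```python
def deinterleave3(packed):
    """Split packed 3-byte pixels into three channel planes.

    Conceptually what a de-interleaving triplet load does: byte 0 of every
    pixel lands in one register, byte 1 in the next, byte 2 in the third.
    """
    return packed[0::3], packed[1::3], packed[2::3]


def interleave3(c0, c1, c2):
    """Inverse operation: weave three planes back into packed pixels."""
    out = bytearray(len(c0) * 3)
    out[0::3] = c0
    out[1::3] = c1
    out[2::3] = c2
    return bytes(out)


def swap_red_blue(packed):
    """With the channels separated, RGB <-> BGR is just a different store order."""
    c0, c1, c2 = deinterleave3(packed)
    return interleave3(c2, c1, c0)


rgb = bytes([255, 0, 0, 0, 255, 0, 0, 0, 255])  # red, green, blue pixels
print(swap_red_blue(rgb).hex())
```

Without a triplet load, the same job needs explicit shuffling to pull every third byte apart, which is where the awkwardness comes from.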

6by9
Raspberry Pi Engineer & Forum Moderator

Re: BGR slower than RGB?

Thu Feb 22, 2018 2:49 pm

(I really must learn not to get distracted by potentially interesting things!)
I had a look. Turns out that after plotting out what goes where, the code changes aren't as hideous as they at first seemed. After a slight false start (blasted short-paths) I think I have native conversion routines handling most cases, but it could do with a bit more testing.

mattday

Re: BGR slower than RGB?

Thu Feb 22, 2018 9:50 pm

6by9 wrote: (I really must learn not to get distracted by potentially interesting things!)
I had a look. Turns out that after plotting out what goes where, the code changes aren't as hideous as they at first seemed. After a slight false start (blasted short-paths) I think I have native conversion routines handling most cases, but it could do with a bit more testing.
That's excellent! :D
Are volunteers required to help test, or do you have it covered?
6by9 wrote: IIRC OpenCV does now support I420. RGB support would be easier in ARM NEON, as it supports reading triplets into 3 vector registers (the VPU doesn't) - I don't know if they've looked at that recently.
I believe recent versions of OpenCV have a load of NEON optimisations. It looks like OpenCV video capture now supports I420, but you then have to convert to the 'native' BGR to really do anything interesting, so I'm sure a lot of people will be happy with this performance boost!
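
For reference, this is roughly how an I420 frame is laid out if you take it straight from the camera and convert it yourself. The sketch assumes even dimensions and a tightly packed buffer with no row or height padding, which real camera buffers may not satisfy, and the helper names are hypothetical.

```python
def i420_plane_offsets(width, height):
    """Return (offset, size) for the Y, U and V planes of a packed I420 frame.

    Assumes even width/height and no stride padding.
    """
    y_size = width * height
    c_size = (width // 2) * (height // 2)  # chroma is subsampled 2x2
    return {
        "Y": (0, y_size),
        "U": (y_size, c_size),
        "V": (y_size + c_size, c_size),
    }


def i420_sample(buf, width, height, x, y):
    """Fetch the (Y, U, V) values that apply to pixel (x, y)."""
    planes = i420_plane_offsets(width, height)
    chroma_width = width // 2
    luma = buf[planes["Y"][0] + y * width + x]
    chroma_index = (y // 2) * chroma_width + (x // 2)
    return (
        luma,
        buf[planes["U"][0] + chroma_index],
        buf[planes["V"][0] + chroma_index],
    )


# A tiny 4x2 frame: 8 luma bytes, then 2 U bytes, then 2 V bytes.
frame = bytes(range(8)) + bytes([100, 101]) + bytes([200, 201])
print(len(frame), i420_sample(frame, 4, 2, 3, 1))
```

An I420 frame is 1.5 bytes per pixel against 3 for RGB24/BGR24, so whichever side does the conversion ends up writing twice as much data as it reads.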

6by9
Raspberry Pi Engineer & Forum Moderator

Re: BGR slower than RGB?

Fri Feb 23, 2018 10:48 am

mattday wrote:
Thu Feb 22, 2018 9:50 pm
6by9 wrote: (I really must learn not to get distracted by potentially interesting things!)
I had a look. Turns out that after plotting out what goes where, the code changes aren't as hideous as they at first seemed. After a slight false start (blasted short-paths) I think I have native conversion routines handling most cases, but it could do with a bit more testing.
That's excellent! :D
Are volunteers required to help test, or do you have it covered?
https://drive.google.com/file/d/1CyocGJ ... sp=sharing should be the appropriate firmware files. Even more warnings on this than a normal rpi-update - absolutely no guarantees, so please back up your existing files before trying it.
I've created the internal pull request to merge the changes anyway as I'm happy with the results viewed through QV4L2, so it should be in the next rpi-update.
mattday wrote:
6by9 wrote: IIRC OpenCV does now support I420. RGB support would be easier in ARM NEON, as it supports reading triplets into 3 vector registers (the VPU doesn't) - I don't know if they've looked at that recently.
I believe recent versions of OpenCV have a load of NEON optimisations. It looks like OpenCV video capture now supports I420, but you then have to convert to the 'native' BGR to really do anything interesting, so I'm sure a lot of people will be happy with this performance boost!
In some ways, doing conversions in NEON across multiple 1GHz ARM cores has performance advantages over the dual-core VPU running at 250MHz. The usual pain is loading the data from SDRAM in the first place, so loading once, doing all the processing, and storing the result once is always preferable to loading and saving multiple times. The VPU can sometimes win on this, as it can load more data into the VRF for processing than fits in NEON registers, although NEON typically relies heavily on the data cache instead. Swings and roundabouts.
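
A back-of-envelope version of the "load once, store once" argument, applied to the old two-pass BGR path versus a single-pass conversion. This is a simplified model, not a measurement: it assumes every pass reads its input from SDRAM and writes its output back in full, with no cache or register-file reuse between passes, and the 1080p30 figures are just an example.

```python
I420_BYTES_PER_PIXEL = 1.5   # full-size Y plane plus 2x2-subsampled U and V
RGB_BYTES_PER_PIXEL = 3.0    # packed RGB24 / BGR24


def bytes_moved_two_pass(pixels):
    """I420 -> RGB, then a second pass over the RGB data to swap R and B."""
    convert = pixels * (I420_BYTES_PER_PIXEL + RGB_BYTES_PER_PIXEL)  # read + write
    swap = pixels * (RGB_BYTES_PER_PIXEL + RGB_BYTES_PER_PIXEL)      # read + write
    return convert + swap


def bytes_moved_one_pass(pixels):
    """Direct I420 -> BGR: read the source once, write the result once."""
    return pixels * (I420_BYTES_PER_PIXEL + RGB_BYTES_PER_PIXEL)


pixels = 1920 * 1080
fps = 30
two_pass = bytes_moved_two_pass(pixels) * fps
one_pass = bytes_moved_one_pass(pixels) * fps
print(f"two-pass: {two_pass / 1e6:.1f} MB/s, one-pass: {one_pass / 1e6:.1f} MB/s")
print(f"ratio: {two_pass / one_pass:.2f}x")
```

Under those assumptions the two-pass route moves a little over twice the data, which is at least consistent with BGR24 tipping a borderline pipeline into dropping frames.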
