Daniel Gessel
Posts: 117
Joined: Sun Dec 03, 2017 1:47 am
Location: Boston area, MA, US

Understanding glxgears performance on Pi4

Wed Dec 18, 2019 5:22 pm

Apologies if this is naive: glxgears, with the window enlarged to 4K, runs on my Pi4 at about 10 FPS and tears badly on an up-to-date Raspbian Buster.

My guess would be that the Pi4 should be able to do at least 2 Gpixels/s fill rate. Assuming an overdraw of 2, there’s about 16 Mpix to draw and 8 more to copy to the screen. Rounding up to 25 Mpix per frame, that’s 80 fps.
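The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: the 3840x2160 window size is an assumption (the post just says "4K"), the 2 Gpix/s fill rate is the poster's guess, and the function name is made up for illustration.

```python
# Back-of-envelope frame rate if fill rate were the only limit.
WIDTH, HEIGHT = 3840, 2160   # assumed 4K UHD window
FILL_RATE = 2e9              # guessed fill rate, pixels per second
OVERDRAW = 2                 # each pixel drawn twice on average


def estimated_fps(width, height, fill_rate, overdraw, copies=1):
    """Frames per second given pixels drawn plus pixels copied per frame."""
    pixels = width * height
    per_frame = pixels * (overdraw + copies)  # draw passes plus copy to screen
    return fill_rate / per_frame


if __name__ == "__main__":
    # About 24.9 Mpix per frame, which works out to roughly 80 fps.
    print(f"{estimated_fps(WIDTH, HEIGHT, FILL_RATE, OVERDRAW):.1f} fps")
```

The observed 10 FPS is about an eighth of this figure, which is the gap the rest of the thread tries to explain.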

It reports that the V3D driver is in use, so I believe it’s using the VC-VI to do the rendering. If I build a modified project locally that fills the screen with a texture, I see the same behavior. But if I only call swap every 10 frames, the total fill rate goes up dramatically, to the equivalent of about 40 draws per second. It is faster still at 100 draws per swap, and faster again without clearing the screen, peaking at about 1 Gpix/s.
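To compare "draws per second" with a fill rate in Gpix/s, a small conversion helps. This sketch assumes each draw covers the whole 3840x2160 window exactly once, which is my reading of "fills the screen with a texture"; the helper name is hypothetical.

```python
def fill_rate_gpix(draws_per_second, width=3840, height=2160):
    """Fill rate in Gpix/s when every draw covers the full window once."""
    return draws_per_second * width * height / 1e9


if __name__ == "__main__":
    # 40 full-screen draws per second is roughly a third of a Gpix/s.
    print(f"{fill_rate_gpix(40):.2f} Gpix/s")
    # Draws per second needed to reach the 1 Gpix/s peak mentioned above.
    print(f"{1e9 / (3840 * 2160):.0f} draws/s")
```

Under these assumptions the 1 Gpix/s peak corresponds to roughly 120 full-screen draws per second.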

I suspect that the time is going into the swap to screen (the tearing is another hint). CPU usage is low, so I don’t think the CPU is touching the data.

Thoughts, experiences?

Paeryn
Posts: 2808
Joined: Wed Nov 23, 2011 1:10 am
Location: Sheffield, England

Re: Understanding glxgears performance on Pi4

Wed Dec 18, 2019 6:53 pm

Are you running the X Composite Manager (xcompmgr)? If so that will cause a hit. At 1920x1080 glxgears gives me 35 fps with xcompmgr running, without it allows glxgears to go up to 60 fps.
She who travels light — forgot something.

Daniel Gessel
Posts: 117
Joined: Sun Dec 03, 2017 1:47 am
Location: Boston area, MA, US

Re: Understanding glxgears performance on Pi4

Wed Dec 18, 2019 7:45 pm

Thanks! My experience with Linux, Raspbian and X is pretty limited, but I found xcompmgr running using htop. If I kill it (nice that it seems to run on the side), my performance goes up to close to 15 fps (and only 50 fps on my HD monitor...). I assume xcompmgr uses the GPU, so it makes me wonder where the raw performance is going. Even at 1 Gpix/s (say, a Pi Zero) I’d expect more throughput.

It doesn’t seem like I’m memory bound, given the performance without swapping, but I’m slightly suspicious of that methodology, since if I never swap I get outrageous performance numbers (like rendering is thrown on the floor).

Memory speed could be the limiter: are there standard benchmarks (write a memcpy benchmark?) or any pointers to figuring out the theoretical limits?
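A memcpy-style benchmark along the lines suggested above could be sketched as follows. It is a rough probe, not a standard benchmark: it times a bulk `bytearray` copy from the CPU side only, the buffer size and repeat count are arbitrary choices, and the result says nothing directly about what the GPU or display hardware can achieve.

```python
import time


def copy_bandwidth(size_bytes=64 * 1024 * 1024, repeats=5):
    """Return the best observed copy rate in GB/s for a large buffer.

    The figure counts bytes copied; the memory traffic is about twice
    that, since every byte is read once and written once.
    """
    src = bytearray(size_bytes)
    dst = bytearray(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # bulk copy of the whole buffer
        best = min(best, time.perf_counter() - start)
    return size_bytes / best / 1e9


if __name__ == "__main__":
    print(f"copy rate: {copy_bandwidth():.2f} GB/s")
```

The buffer is made much larger than the CPU caches so that the copy has to go through RAM; taking the best of several runs reduces the effect of other activity on the system.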

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 25043
Joined: Sat Jul 30, 2011 7:41 pm

Re: Understanding glxgears performance on Pi4

Wed Dec 18, 2019 7:51 pm

TBH, this is something that has confused me as well. GLXGears is pretty simple, much simpler than many games etc that have much higher frame rates.

It may be down to how the image is finally rendered; there may be some weird format conversions going on in there that hammer the system's throughput.
Principal Software Engineer at Raspberry Pi (Trading) Ltd.
Contrary to popular belief, humorous signatures are allowed. Here's an example...
“I own the world’s worst thesaurus. Not only is it awful, it’s awful."

Daniel Gessel
Posts: 117
Joined: Sun Dec 03, 2017 1:47 am
Location: Boston area, MA, US

Re: Understanding glxgears performance on Pi4

Wed Dec 18, 2019 8:15 pm

This thread https://www.raspberrypi.org/forums/view ... 7&t=246983 suggests there are “bandwidth limitations” in GLX that are not present in EGL. No idea why...

But it seems worth trying EGL to create the context. Will update when I get that sorted out.

Update: no magic performance improvement with EGL.

pik33
Posts: 187
Joined: Thu Sep 10, 2015 4:26 pm

Re: Understanding glxgears performance on Pi4

Fri Dec 20, 2019 8:51 am

The keyword is: RAM bandwidth.

The theoretical maximum RAM bandwidth of the RPi4 should be 12.8 GBps. In reality you have to send commands to the RAM, etc., and the real bandwidth will be much less. The most the ARM on the RPi4 can do is somewhere between 4 GBps and 5 GBps (this is what I measured).

Now what happens when glxgears works?

(1) Simply to display a 4K screen at 60 fps, the RPi needs 1.8 GBps
(2) The GPU has to write the buffer with data (1.8 GBps)
(3) The GPU has to swap buffers. On the RPi3 this can be done without a memory copy, but that seems not to be the case for the RPi4. This costs 3.6 GBps (1.8 GBps read, 1.8 GBps write)

So we have used 7.2 GBps already
And the ARM still needs RAM bandwidth.

If the compositing is on, another 3.6 GBps is needed for it.
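The budget above can be tallied as full-frame passes per refresh. This is a sketch using pik33's own 1.8 GBps per-pass figure; the pass counts follow the numbered list, and the 4 bytes per pixel used in the second helper is my assumption, not something stated in the post.

```python
# Bandwidth budget from the list above; all figures are pik33's estimates.
PER_PASS_GBPS = 1.8  # one full read or write of a 4K frame at 60 fps


def budget_gbps(compositing=False, per_pass=PER_PASS_GBPS):
    """Sum the full-frame passes per refresh and convert to GBps."""
    passes = 1       # (1) display scanout reads the frame
    passes += 1      # (2) GPU writes the rendered frame
    passes += 2      # (3) buffer swap done as a copy: one read, one write
    if compositing:
        passes += 2  # compositor: one more read and one more write
    return passes * per_pass


def pass_gbps(width=3840, height=2160, bytes_per_pixel=4, fps=60):
    """Raw size of one pass in GB/s, assuming 4 bytes per pixel."""
    return width * height * bytes_per_pixel * fps / 1e9


if __name__ == "__main__":
    print(f"without compositing: {budget_gbps():.1f} GBps")
    print(f"with compositing:    {budget_gbps(True):.1f} GBps")
    print(f"one 4K60 pass at 4 B/pixel: {pass_gbps():.2f} GB/s")
```

With compositing on, the total comes to 10.8 GBps, which is already above the 4 to 5 GBps measured from the ARM and close to the 12.8 GBps theoretical maximum.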

There is only one way to speed things up: write the graphics drivers in a way that minimizes memory copy operations. Memory bandwidth was the bottleneck for the RPi3 and it seems to remain so in the RPi4: the RAM is much faster there, but the CPU and GPU are faster too.

Gavinmc42
Posts: 4301
Joined: Wed Aug 28, 2013 3:31 am

Re: Understanding glxgears performance on Pi4

Fri Dec 20, 2019 10:42 am

1280x1024 and 1920x1080 are both doing 60fps.
So need to get a 4K screen.
I think someone has tried with a 4K screen?

glxinfo shows which version of mesa?
I'm dancing on Rainbows.
Raspberries are not Apples or Oranges

Daniel Gessel
Posts: 117
Joined: Sun Dec 03, 2017 1:47 am
Location: Boston area, MA, US

Re: Understanding glxgears performance on Pi4

Fri Dec 20, 2019 11:55 am

pik33 wrote:
Fri Dec 20, 2019 8:51 am
The keyword is: RAM bandwidth.

Theoretical maximum RAM bandwidth in RPi4 should be 12.8 GBps. In reality you have to send commands to the RAM, etc, and the real bandwidth will be much less. For the ARM on RPi4 maximum it can do is something more than 4 GBps but less than 5 GBps (this is what I measured).
Thanks! That explains the 1Gpix/s limit I’m seeing doing repeated draws with no swaps.

My monitor is at 30 Hz, so scanout rounds up to 1 GB/s. I’m getting ~15 fps (with compositing off), so that’s ~0.5 GB/s per read or write of the buffer. Call it 3 passes to render and 2 to copy to the screen... about 3.5 GB/s total. Starting to be in the ballpark...
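Spelled out, that estimate looks roughly like this. It is a sketch under stated assumptions: a 3840x2160 framebuffer at 4 bytes per pixel (neither is given in the post), and "3 to render, 2 to copy" read as five full-buffer passes per rendered frame on top of the 30 Hz scanout.

```python
def bytes_per_second(fps, width=3840, height=2160, bytes_per_pixel=4):
    """Bytes moved per second by one full-buffer pass at the given rate."""
    return fps * width * height * bytes_per_pixel


SCANOUT = bytes_per_second(30)    # display refresh at 30 Hz
PER_PASS = bytes_per_second(15)   # one buffer read or write at ~15 fps
RENDER_PASSES = 3                 # "call it 3 to render"
COPY_PASSES = 2                   # "2 to copy to screen"

TOTAL = SCANOUT + (RENDER_PASSES + COPY_PASSES) * PER_PASS

if __name__ == "__main__":
    print(f"scanout:  {SCANOUT / 1e9:.2f} GB/s")
    print(f"per pass: {PER_PASS / 1e9:.2f} GB/s")
    print(f"total:    {TOTAL / 1e9:.2f} GB/s")
```

Without the rounding, the total is about 3.48 GB/s, which sits just under the 4 to 5 GBps that pik33 measured from the ARM.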
