tufty

Re: So who's working/worked on an Xorg server?

Tue Apr 10, 2012 7:39 am

jojopi said:


Compositing is mostly a gimmick.


Not at all.  Compositing is a necessity for any form of drawing that doesn't destroy what was there before.  You're conflating the flashy gimmicks of "modern" windowing systems with the basics of drawing.

A naive approach using the DMA engine for drawing (for example) text on a character-by-character basis will work as long as you restrict yourself to non-proportional typefaces, but any typeface where a character can overlap the "rectangle" of an adjacent character will produce really ugly output, with characters being clipped. For an extreme case, consider how you would implement the swashes in a typeface like Zapfino.
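To be concrete about what "drawing that doesn't destroy what was there before" means: it's the Porter-Duff "over" operator, applied per pixel. A minimal sketch in C for premultiplied-alpha RGBA8888 pixels:

#include <stdint.h>

/* src OVER dst for one premultiplied-alpha RGBA8888 pixel:
   each channel is src + dst * (1 - src_alpha). */
static uint32_t over(uint32_t src, uint32_t dst)
{
    uint32_t sa = src >> 24;                 /* source alpha, 0..255 */
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t s = (src >> shift) & 0xff;
        uint32_t d = (dst >> shift) & 0xff;
        out |= ((s + d * (255 - sa) / 255) & 0xff) << shift;
    }
    return out;
}

Overlapping glyphs drawn this way blend where they meet, instead of each one clipping its neighbour to a rectangle.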



Jim Manley said:


How is _Open_ GL proprietary?


The "Open" in OpenGL refers to the specification, not to the source of any particular implementation.  It's a shame that the Broadcom implementation for Videocore is proprietary (ISTR a comment from Eben saying that he wanted the implementation to be released as open source, something along the lines of "showing how the guts of a professionally developed OpenGL implementation work"), but that's the way it is.

Simon

jamesh
Raspberry Pi Engineer & Forum Moderator

Tue Apr 10, 2012 7:50 am

shirro said:


glx is totally off target as the platform supports ES not desktop opengl. More appropriate would be an EGL that allowed you to attach a surface to X windows, and I don't know how any of that works (yet). Anyway there aren't a lot of programs for linux that have gles support, and a hell of a lot that need to move rectangles around.


ES is pretty similar to desktop GL AFAIK.


I haven't delved into the Raspberry Pi kernel stuff much. Is there any framebuffer accel stuff in there that could be built upon? The support for the xorg driver will have to be in the kernel. Anyone who thinks it can be built on the userspace opengl libs is way off target I think.


Why does it need to be in the kernel? Why not use the standard library?


It is a bit cheeky, but the Freescale imx xorg driver is lgpl (and probably the TI omap ones as well) so it wouldn't be hard to see how it is done.


Probably a good starting point!

shirro

Tue Apr 10, 2012 7:56 am

Jim Manley said:


How is _Open_ GL proprietary?


The Broadcom implementation is most definitely proprietary. X.org is MIT licensed, so I don't think linking in a proprietary X driver is an issue (at least not a license one). I think it is better for applications to link to GLES and EGL libs as required for 3d stuff, as not many programs for Linux currently support embedded opengl anyway. Proprietary stuff definitely does not belong in the kernel, and since X is sharing resources between processes there has to be some kernel involvement to do things right.


The goal of any kernel implementation is to make available all of the hardware features to the OS, services, and applications. If you had extra registers or I/O ports to work with, would you just ignore them?


Well that is why the kernel has all these interfaces like KMS, DRM etc. It is up to Broadcom whether they want to release proprietary info and write to these things. It isn't the sort of thing you generally need for a set top box or a sat nav.

Just because there are some nice user space graphics libraries written to a nice open standard doesn't mean linux programs are able to immediately benefit from them. The software that will adopt support for the embedded graphics stacks first will be the single task full screen apps like XBMC or games, because that is the sort of task this class of hardware eats up. Getting proper multi-process management of graphics resources like you would find on a windowing desktop is another layer of complexity. It isn't that nobody wants 3d acceleration in X, it is more that it is orthogonal to the issue of getting X working better for the majority of users.

teh_orph

Tue Apr 10, 2012 8:53 am

As a quick update, I have:
- Xorg code base all building on PC
- forked an fbdev X server that I can modify to become my Rpi X server (this is only a couple of files, so don't have a heart attack!)
- added in EXA support to this driver
- begun implementing Solid and Copy
- started looking at Composite (this will likely be tricky)

As Dom has said, DMA ought to live in kernel space but for now I'm going to do initial work with this root mmap hack. Then once that's working, a kernel module via DRM looks the way forward.

Using EXA, a number of operations are accelerated - but not all of them. Forgetting what is/isn't accelerated for a moment: the scheme EXA uses to decide whether to even ask the hardware (ie my EXA driver) if an operation is supported is primarily based on how frequently the object is used. eg if a new glyph appears, X will render it via the CPU until it's been seen enough times to warrant a GPU upload, and only then will it do GPU rendering. CPU rendering happens a lot more frequently than you'd imagine!
I may need to tune the MI (machine independent) EXA lib, as it is making assumptions about the cost involved in a pixmap upload...
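For anyone curious what the driver side of this looks like, the hook surface is small. A sketch against the EXA driver API from exa.h (it only builds inside an Xorg driver tree) - the bodies and rpi_dma_fill() are hypothetical placeholders, not my actual code:

#include "exa.h"

static Bool
RPiPrepareSolid(PixmapPtr pPixmap, int alu, Pixel planemask, Pixel fg)
{
    /* Refuse anything the hardware can't do; returning FALSE makes
       EXA fall back to the software path for this operation. */
    if (alu != GXcopy || planemask != (Pixel)~0)
        return FALSE;
    /* stash fg and the pixmap's offset/pitch for the Solid() calls */
    return TRUE;
}

static void
RPiSolid(PixmapPtr pPixmap, int x1, int y1, int x2, int y2)
{
    /* rpi_dma_fill(pPixmap, x1, y1, x2, y2);  -- hypothetical fill path */
}

static void
RPiDoneSolid(PixmapPtr pPixmap)
{
    /* kick off / wait on whatever was queued above */
}

static Bool
RPiExaInit(ScreenPtr pScreen)
{
    ExaDriverPtr exa = exaDriverAlloc();

    exa->exa_major    = EXA_VERSION_MAJOR;
    exa->exa_minor    = EXA_VERSION_MINOR;
    /* memoryBase/memorySize/offScreenBase and the alignment fields
       also need filling in for a real driver */
    exa->PrepareSolid = RPiPrepareSolid;
    exa->Solid        = RPiSolid;
    exa->DoneSolid    = RPiDoneSolid;
    /* PrepareCopy/Copy/DoneCopy and the Composite trio hook in the
       same way */
    return exaDriverInit(pScreen, exa);
}

Returning FALSE from a Prepare hook is exactly where that frequency-of-use heuristic bites: anything refused (or not yet promoted to the GPU) goes down the software path.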

Thoughts?

GLX support is interesting. This is the scheme used to expose an OpenGL run-time to user programs attached to the X server, and allow them to render into X windows and pixmaps(?). GLX AFAIK by itself can't be used to accelerate internal X stuff, eg 2D blitting. Am I right in saying that the Broadcom libs have no OpenGL->ES translation layer? If so, then something would have to be written to allow X programs to have HW-accelerated GL.

Surely there's an off-the-shelf library that can do this? Either way my priority at the moment is 2D, not 3D.

shirro

Tue Apr 10, 2012 8:53 am

JamesH said:


ES is pretty similar to desktop GL AFAIK.


Yes, I just meant the proper way to use ES is really with the supplied libGLESv2 and libEGL rather than GLX. Assuming the EGL library has been built with X11 support, you initialise it with the X display and create a surface from an X window. While I am sure someone could write glx to target the hardware, it would be much easier just to go with the flow and use the Khronos apis.
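In code that's only a handful of calls. A minimal sketch of the X11 path, assuming an EGL built with X11 support (error checking omitted):

#include <X11/Xlib.h>
#include <EGL/egl.h>

int main(void)
{
    /* The X display doubles as the EGL native display. */
    Display *xdpy = XOpenDisplay(NULL);
    Window win = XCreateSimpleWindow(xdpy, DefaultRootWindow(xdpy),
                                     0, 0, 640, 480, 0, 0, 0);
    XMapWindow(xdpy, win);

    EGLDisplay dpy = eglGetDisplay((EGLNativeDisplayType)xdpy);
    eglInitialize(dpy, NULL, NULL);

    EGLint cfg_attrs[] = { EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT, EGL_NONE };
    EGLConfig cfg;
    EGLint n;
    eglChooseConfig(dpy, cfg_attrs, &cfg, 1, &n);

    /* The X window becomes the native window behind the EGL surface. */
    EGLSurface surf = eglCreateWindowSurface(dpy, cfg,
                                             (EGLNativeWindowType)win, NULL);
    EGLint ctx_attrs[] = { EGL_CONTEXT_CLIENT_VERSION, 2, EGL_NONE };
    EGLContext ctx = eglCreateContext(dpy, cfg, EGL_NO_CONTEXT, ctx_attrs);
    eglMakeCurrent(dpy, surf, surf, ctx);

    /* ... GLES calls here, then eglSwapBuffers(dpy, surf) per frame ... */
    return 0;
}

The framebuffer variant described below differs only in what you pass as the native display and window types.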

On the imx gear there are two sets of Open* libs. One lot target the framebuffer and the other target X. In the framebuffer ones you initialise the display with some constant and create your surface from the framebuffer. On the X11 ones you initialise with the X display and create your surface from an X window.

There isn't a lot of Linux software out there that would use ES out of the box, so while it would be great down the track if the Pi's EGL worked with X to allow people to write windowed glES stuff, it probably isn't as important as browser and editor scrolling.


Why does it need to be in the kernel? Why not use the standard library?


I don't know that it can't. The imx driver supports EXA via a library called libz160 which is one of the closed libs in the platform support package. It does stuff like setting up buffers and copying and blending and things that seem directly relevant to X acceleration. There might be some videocore lib that does something similar somewhere. I had a quick look at the symbols in the ones we have and couldn't spot anything similar looking. That might be a better match to the problem than OpenGL. I really am not qualified to say.

I went and looked at Wayland thinking that the supplied user space libs would be sufficient and got corrected since Wayland needs to be able to share surfaces between processes and anything that goes across a process boundary necessarily involves the kernel. That possibly doesn't apply to X, I don't know.

teh_orph

Tue Apr 10, 2012 8:59 am

Btw thanks for the enthusiasm guys, it's good to see the programmers coming out of the woodwork! Keep coming up with ideas everybody.

Now if we could get hardware early, a head start on preparing an accelerated X would only improve the first impression people have when they get their hardware.

(no idea where I am in the Farnell queue)

shirro

Tue Apr 10, 2012 9:45 am

Is there a repo somewhere so we could follow development?

I think I missed it, but what are you using to get GPU acceleration? Are you going to try and use gles or openvg? If so, there are plenty of cheapish boards that do that pretty well, as does Mesa, though I imagine the overhead of using these things wouldn't be great for small operations.

I still think there is probably a low level broadcom 2d library (libvcforxaccel.so ?) that is missing from this equation, but if you get something running on the OpenGL ES or OpenVG APIs and it isn't too high latency, it has applications elsewhere. And it would be easy to benchmark it against one of the other embedded xorg drivers to see if it is in the same ballpark, since that stack is available everywhere.

If you want to have a look at the imx xorg implementation, Genesi have a version of it on github: https://github.com/genesi/xorg-video-imx . As you can see, all the heavy lifting in imx_exa_z160.c is done by a proprietary low level 2d graphics lib, so it probably won't be a lot of help.

teh_orph

Tue Apr 10, 2012 10:38 am

Big thanks for those code links btw, that's actually really helpful. That code does many of the things I plan on doing, and answers a number of questions. (the code does work, right? no artefacts? is it actually faster?)

One thing that I'm having trouble with is getting decent documentation for this stuff. Some of it is a joke: http://dri.freedesktop.org/wik.....chitecture "the DRI architecture explained", http://dri.sourceforge.net/doc.....tion.phtml etc. This EXA stuff isn't too well-defined either. And how do I find proper definitions for what an X ALU operation is? "The best documentation is often the code" is something I saw late last night.

Regarding which acceleration scheme is used, I'm gonna leave those as holes to be filled once I get some kit and can measure the latency of various tasks. Plus I'll need to profile which X operations (in a common usage scenario) are slow and need to be prioritised.

In answer to someone's question earlier, it is easy to have acceleration, and the mode of acceleration, toggled via the xorg.conf file via a user-defined string. I'm not sure it can be driven whilst the server is actually running, though. It is also extensible, in that other options can be passed in too and they will reach my code to allow value wiggling...
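Something like this, say - a sketch of the Device section; the driver name and option names here are hypothetical, not the real ones:

Section "Device"
    Identifier "RPiFB"
    Driver     "rpifb"                 # hypothetical driver name
    Option     "AccelMethod" "EXA"     # select/disable the acceleration path
    Option     "WiggleValue" "42"      # arbitrary option, reaches the driver
EndSection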

(so does anyone know of other groups already working on X acc?)

Jim Manley

Tue Apr 10, 2012 11:04 am

Shirro said:


The Broadcom implementation is most definitely proprietary. X.org is MIT licensed, so I don't think linking in a proprietary X driver is an issue (at least not a license one). I think it is better for applications to link to GLES and EGL libs as required for 3d stuff, as not many programs for Linux currently support embedded opengl anyway. Proprietary stuff definitely does not belong in the kernel, and since X is sharing resources between processes there has to be some kernel involvement to do things right.


I was assuming that by kernel we were talking about things outside the Broadcom blob, which we can only send things like OGLES calls into. Since I don't know what's in the blob, I have no way of knowing why/how to interface with anything in there other than through the interfaces that have been exposed. That does include OGLES, according to what I've seen discussed, but I haven't seen the docs yet.

Well that is why the kernel has all these interfaces like KMS, DRM etc. It is up to Broadcom whether they want to release proprietary info and write to these things. It isn't the sort of thing you generally need for a set top box or a sat nav.


Isn"t OGLES one of the Broadcom-supported interfaces? That"s what Eben, et al, have been saying should be used, since they aren"t providing source for the blob.

It isn"t that nobody wants 3d acceleration in X, it is more that that is orthogonal to the issue of getting X working better for the majority of users.


I"m not talking about 3-D acceleration in X, I"m talking about using 3-D hardware to accelerate the 2-D X primitives. It never ceases to amaze me how few people realize that 2-D is just a subset of 3-D, where the z values are always zero! Everything that"s needed to perform every function required by X already exists in the GPU hardware, including blitting and everything everyone else has mentioned. How do you think SGI was running X on systems since the mid 1980s? They sure as hell haven"t been doing it strictly in software on the CPU.

If no one in the Linux community has done this integration before, is it because few people are familiar with GPUs? I find that very surprising, although, now that I think about it, I've run into very few software people outside the 3-D graphics community who actually know how matrix math is used to perform GPU transformations in the hardware, and kernel guys are kinda known for not getting out much - and we're glad someone is doing that thankless grunt work.

I guess I"m "spoiled" because I did matrix multiplications by hand with a slide rule while waiting for our first interactive 3-D system to be installed, which also happened to be examples in our professor"s 3-D graphics math textbook that we were proofing with him. We did get free copies of the first editions, though - a much bigger deal now that engineering textbooks cost upwards of $400 new - yikes! Yes, I walked to and from school both ways, too ... uphill ... into the wind ... in snowdrifts ... that lasted through the Summers ...

shirro

Tue Apr 10, 2012 12:06 pm

teh_orph said:


Big thanks for those code links btw, that's actually really helpful. That code does many of the things I plan on doing, and answers a number of questions. (the code does work, right? no artefacts? is it actually faster?)


Good questions. No artifacts. It seems to work, though it might be a slightly different revision of the code:

(II) EXA(0): Driver allocated offscreen pixmaps
(II) EXA(0): Driver registered support for the following operations:
(II) Solid
(II) Copy
(II) Composite (RENDER acceleration)
(II) UploadToScreen
(II) DownloadFromScreen

Is it faster? No idea. I am too tired to go looking for benchmarks. I thought I would just scroll a long wikipedia page in Chromium, since that is the sort of large bitmappy thing you would think would accelerate well. There was hardly any difference between the imx driver with AccelMethod none, EXA, and the frame buffer driver. If anything the EXA seemed slightly slower. Just for comparison I scrolled the same page in Chrome on a 2009 MacBook with nvidia graphics running OSX and it took over twice as long, so my browser usability metric probably doesn't adequately reflect graphics performance.

I might have something mis-configured somewhere, but so far I am not impressed with the GPU performance on these things compared with what I have seen from the Pi demos. I wouldn't doubt that the 1GHz A8 can probably render some stuff faster than the GPU on this. The Pi might be a different matter. One of the dangers of optimising something like X without actual hardware is you could easily optimise the wrong thing.

jamesh
Raspberry Pi Engineer & Forum Moderator

Tue Apr 10, 2012 12:14 pm

Worth reading this for an idea of what is closed and what is open. Also, read some of the comments, where Dom has put me right (again, and again...)

http://www.raspberrypi.org/archives/592

shirro

Tue Apr 10, 2012 1:10 pm

Hah! Either I have a dodgy board or EXA just slows this thing down.

It might not be much of a benchmark, but I just ran gtkperf because it was easy and probably closer to real work than some fancy graphics benchmark. Slowest was the default image the imx53 shipped with, which has the imx xorg driver and EXA. I changed to a simpler gtk theme and it sped up to just about match the performance of my hacked-up kernel and working image with EXA. The frame buffer driver was still faster. The imx driver with no accel enabled seems to be the sweet spot.

Based on that I wouldn't bother playing with X acceleration until hardware turns up and you can profile things.

teh_orph

Tue Apr 10, 2012 1:37 pm

Haha, love it! In one of my *quick* tests last night where I disabled all of my acceleration (yet said I could accelerate everything), scrolling in Firefox was still going through X's non-accel path. Perhaps "scrolling" is not acceleratable, and always goes via the CPU? (which would explain your Chromium example earlier)

I'm also not surprised that your EXA server got slower - I think they're tuned to certain workloads. But I don't know what those workloads are. This was an interesting read: http://cworth.org/talks/lca_2008/
It's about a team who took the Intel 965 EXA driver to pieces and put it back together again to make it 900x faster. Seems it's easy to do a sub-par job. (to be fair, X compounds the problem)

James, cheers for the linkage. Who exactly should I send my stack of questions to about Broadcom's implementation of GL+VG etc? (sorry!) Would that be you and Dom? In particular, questions about the memory management.

Jim, of course 3D=2D+1D!

(Finally found details of the functions I should be implementing http://www.x.org/releases/curr.....rproto.txt)

shirro

Tue Apr 10, 2012 2:35 pm

teh_orph, talking of Carl, I am taking his advice and going to run cairo-perf-trace to see how this thing really performs. I want to understand what works for acceleration on an embedded system and what doesn't, and hopefully it will give me something to compare the Pi with. I have some tracefiles checked out, but I have to rebuild pixman and cairo first, because I am on an antique Ubuntu to keep the proprietary library gods happy.

And Jim, I don't dispute that 3d can be a fantastic way to accelerate 2d. Although the Pi has OpenVG as well, which may be even more useful for accelerating SVG rendering and drawing libs like Cairo. I guess the issue is whether the cost of setting these things up and tearing them down is going to be more or less than having the CPU move pixels around.

There is no doubt that for compositing windows and doing all that swishy iphone/android stuff it makes sense. Rendering a web page or scrolling a terminal might be a different issue. Unless of course the toolkits were switched to rendering with gles, but then that would likely break a lot of things and require a huge development effort, and AFAIK the Pi's EGL doesn't currently support creating a surface from an X window.

jamesh
Raspberry Pi Engineer & Forum Moderator

Tue Apr 10, 2012 2:58 pm

teh_orph said:


Haha, love it! In one of my *quick* tests last night where I disabled all of my acceleration (yet said I could accelerate everything), scrolling in Firefox was still going through X's non-accel path. Perhaps "scrolling" is not acceleratable, and always goes via the CPU? (which would explain your Chromium example earlier)

I'm also not surprised that your EXA server got slower - I think they're tuned to certain workloads. But I don't know what those workloads are. This was an interesting read: http://cworth.org/talks/lca_2008/
It's about a team who took the Intel 965 EXA driver to pieces and put it back together again to make it 900x faster. Seems it's easy to do a sub-par job. (to be fair, X compounds the problem)

James, cheers for the linkage. Who exactly should I send my stack of questions to about Broadcom's implementation of GL+VG etc? (sorry!) Would that be you and Dom? In particular, questions about the memory management.

Jim, of course 3D=2D+1D!

(Finally found details of the functions I should be implementing http://www.x.org/releases/curr.....rproto.txt)


Not me! Dom knows almost everything, but Eben designed a lot of the HW and wrote a lot of the 3D stuff - it was a clean room implementation of OpenGL ES.

Actually, you can ask me, but it would take time for me to find the answers at work, as I am not a 3D expert (as shown by Dom having to correct most of my posts on the subject!)

teh_orph

Tue Apr 10, 2012 4:18 pm

Cool, I'll just list 'em out and hopefully someone can fill in the blanks.

1. Can I allocate a piece of GPU R/W memory with CPU L1 caching turned on? I want to control the cache flush explicitly. This will be my frame buffer.

2. Can I make a GL ES off-screen render target from an explicit memory address (ie the memory in #1)? Or, if not - can I make a render target and get a CPU-visible pointer back?

3. Can I texture directly from normal 'user' memory, or are there alignment/contiguous-page/page-locked/etc restrictions which *require* it to be copied first into GPU-managed memory? (I think I know the answer, haha)

4. Can the GL ES implementation provided to us support multiple render contexts? (think GL running glxgears in a window + GL doing the internal 2D composition)

5. Do the 3D engine, DMA and CPU all have access to the same physical memory bus and bandwidth? I suspect the CPU will be pants, but hope that DMA+GPU will give the same throughput.

(btw if it's "clean room", is it really all written from scratch?? even the shader compiler?)

Thanks again

dom
Raspberry Pi Engineer & Forum Moderator

Tue Apr 10, 2012 5:53 pm

1. Yes, but GPU won't see what's in the L1 cache until it is flushed.

2. No to first part. The second part is no with standard EGL. There is an extension eglCreateGlobalImageBRCM which may do what you want.

3. No. The 3D hardware requires textures in its own tiled format. There will always be a copy/convert stage (although this is done by DMA+GPU and is reasonably efficient).

4. Yes, that's standard EGL. Switching context more than once per frame may be expensive (a lot of state has to be saved and restored).

5. Yes, in theory. The ARM cannot generate accesses as fast as the GPU or DMA, so can never achieve the same bandwidth.

(shader compiler was written in house from scratch)
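To put answer 3 in code: the upload path stays the standard one - you hand GLES a linear buffer and the tiled copy/convert happens behind the call. A sketch, assuming a current GLES2 context:

#include <GLES2/gl2.h>

/* Hand GLES a linear RGBA buffer; the DMA+GPU copy/convert into the
   hardware's tiled format happens behind glTexImage2D. */
void upload_texture(GLuint tex, int w, int h, const void *pixels)
{
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, w, h, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, pixels);
}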

teh_orph

Tue Apr 10, 2012 6:55 pm

Cheers Dom. Questions inline...

dom said:


1. Yes, but GPU won't see what's in the L1 cache until it is flushed.

Cool. When the GPU writes data back to memory, does it do just dirty byte writes or might it write whole lines back at a time (that may include unchanged pixels)? So: can I write to two adjacent pixels from different devices (with the CPU cache turned off) and not worry about the GPU stamping on the CPU?

2. No to first part. The second part is no with standard EGL. There is an extension eglCreateGlobalImageBRCM which may do what you want.

I can't find this function prototype, but I can find the compiled code in libEGL.so. Where should I be looking to see what it does? (btw is the render target tiled?) I'm using the Debian release from a few months back.


Can I run by you what I'm thinking of, so you can see if this sounds reasonable?

With X, all the drawing commands from each program are serialised. Only one thread is ever doing drawing (X is single-threaded). It looks at draw calls to see if they're EXA-able (fills, copies and blends) and then runs them asynchronously. It eventually blocks on completion of this deferred work. I'm unsure if non-EXA CPU draw calls can run in parallel with the EXA GPU stuff. (this is their design, not mine)
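The "blocks on completion" step corresponds to EXA's MarkSync/WaitMarker pair - a sketch, with the fence helpers as hypothetical placeholders over whatever DMA/GPU queue ends up underneath:

#include "exa.h"

/* Hypothetical fence primitives over the hardware queue. */
extern int  rpi_emit_fence(void);
extern void rpi_wait_fence(int fence);

/* EXA asks for a marker covering all work queued so far... */
static int
RPiMarkSync(ScreenPtr pScreen)
{
    return rpi_emit_fence();
}

/* ...and blocks here before the CPU touches memory the hardware
   may still be writing. */
static void
RPiWaitMarker(ScreenPtr pScreen, int marker)
{
    rpi_wait_fence(marker);
}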

So,

- two buffers, L1 cached and visible to the GPU. The CPU draws into this just like normal.
- on vsync, L1 is flushed and these buffers are flipped. This may as well be the normal framebuffer, except there's 2x the memory (one visible, one not visible) and it's L1 cached. L1 alone should give a win.
- a GL ES render target points to this memory (assuming there's a function for this!)
- on an EXA-able function, L1 is flushed and the work is enqueued to GL ES
- (perhaps suspend CPU rendering here)
- on the wait function, call glFlush

Doable?

dom
Raspberry Pi Engineer & Forum Moderator

Tue Apr 10, 2012 7:25 pm

I did benchmark different caching methods for ioremap of the framebuffer.

This was looking at fps in glxgears (note this is unaccelerated):

nocache: L2 enabled: 15.4 fps (window), 1.257 fps (fullscreen)

wc: L2 enabled: 15.8 fps (window), 1.284 fps (fullscreen)

default: L2 enabled: 16.4 fps (window), 1.318 fps (fullscreen)

default is L1 enabled (and without the flushing it has visible artifacts)

I went for wc as a compromise.

So L1 has a small performance benefit.

dom
Raspberry Pi Engineer & Forum Moderator

Tue Apr 10, 2012 8:00 pm

teh_orph said:

- on an EXA-able function, L1 is flushed and the work is enqueued to GL ES
- (perhaps suspend CPU rendering here)
- on the wait function, call glFlush

Doable?



The 3D hardware uses tile based deferred rendering. That basically means glFlush is hugely expensive.

(actually this information is fairly applicable to us:

http://developer.apple.com/lib.....forms.html)

So how often will you call glFlush per frame? This number really has to be very small.

The question is, what are the non-EXA-able functions?

teh_orph

Tue Apr 10, 2012 8:48 pm

dom said:


I did benchmark different caching methods for iomap of framebuffer

This was looking at fps in glxgears (note this is unaccelerated)

<snip>


What's the difference between nocache/wc/default - do they all have L2 turned on? Is it nocache L1=off L2=on, wc L1=off L2=on, default L1=on L2=on? (with WC meaning that write combining is done...somewhere?)

I guess from the numbers you've posted, L1 doesn't make a mega difference. Perhaps I should leave it off, as constantly flushing and invalidating lines will take away any win?

(any chance of finding out what the L1, L2 and main memory latencies are?)

I don't really have a lot of usage numbers so far. I'd imagine the number of flushes per screen update will be small (let's say <10), and the number of draw calls in a batch will also be small - but each one could touch a decent number of pixels. How big are the tiles? If they're <32x32 I bet there would be very few polys per touched tile. The amount of work depends on the user activity too - X is idle if nothing is moving/dirty.

(btw I dunno if you saw the inline comments I had earlier - I can find the compiled code for eglCreateGlobalImageBRCM but not the actual prototype. Any help?)

Btw 2: thanks again, this is proper helpful stuff!

dom
Raspberry Pi Engineer & Forum Moderator

Tue Apr 10, 2012 9:30 pm

See post 7 of http://www.raspberrypi.org/for.....config-txt for latencies.

Since that post was written, L2 is now enabled by default.

My glxgears numbers all had L2 enabled; they just called either ioremap, ioremap_wc or ioremap_nocache. Those are L1 enabled, L1 disabled but write-combining enabled, and L1 disabled, respectively.
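In driver terms it's just which mapping call gets used - a sketch, with fb_phys/fb_len as placeholders and the cacheability comments following the description above:

#include <linux/io.h>

static void __iomem *map_fb(phys_addr_t fb_phys, unsigned long fb_len)
{
    return ioremap_wc(fb_phys, fb_len);   /* no L1, write combining (the compromise) */
    /* ioremap(fb_phys, fb_len);             "default": L1 enabled in this kernel    */
    /* ioremap_nocache(fb_phys, fb_len);     no L1, no write combining               */
}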

I've spoken to the 3D expert, and I'm not sure eglCreateGlobalImageBRCM will help you - it allows sharing of EGL images between ARM processes. It won't let you get a pointer to the 3D framebuffer.

dom
Raspberry Pi Engineer & Forum Moderator

Tue Apr 10, 2012 10:46 pm

Eben got the video framebuffer driver using DMA a while ago. Unfortunately we struggled to find any use case where it got used, and it was abandoned. I've just fixed it up for the latest kernel. It's not well tested.

The fill gets called once per row of text when scrolling the console.

The imageblit gets called for console fonts, but only with a 1bpp source, so it's not suitable for DMA.

I couldn't persuade copyarea to get called.

Once you launch lxde, none of the accelerated functions get called. I don't know if SDL or any other console apps would make use of the accelerated functions.

So we abandoned it. I'll link to the code in case anyone knows how to get some benefit from it.

http://pastebin.com/A0rvasgN
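For anyone who hasn't met them, these are the standard fbdev acceleration hooks in struct fb_ops from linux/fb.h - a sketch, with the rpi_* implementations left as hypothetical stubs:

#include <linux/fb.h>
#include <linux/module.h>

static void rpi_fillrect(struct fb_info *p, const struct fb_fillrect *r);
static void rpi_copyarea(struct fb_info *p, const struct fb_copyarea *a);
static void rpi_imageblit(struct fb_info *p, const struct fb_image *img);

static struct fb_ops rpi_fb_ops = {
    .owner        = THIS_MODULE,
    .fb_fillrect  = rpi_fillrect,   /* hit once per text row when the console scrolls */
    .fb_copyarea  = rpi_copyarea,   /* the hook I couldn't persuade to fire */
    .fb_imageblit = rpi_imageblit,  /* console fonts arrive here, 1bpp source */
};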

(note: due to the framebuffer acceleration API it is far from optimal. We have to block for DMA to complete, when ideally we'd just return and only block when required. DMA can be chained, so it would be more efficient to batch up a sequence of operations before launching.)
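The chaining works through the controller's control blocks - each one names its successor, so a whole batch launches with a single kick. A sketch of the layout, per the BCM2835 ARM Peripherals datasheet:

#include <linux/types.h>

/* One BCM2835 DMA control block; must be 32-byte aligned. */
struct bcm2835_dma_cb {
    u32 ti;          /* transfer information */
    u32 source_ad;   /* source bus address */
    u32 dest_ad;     /* destination bus address */
    u32 txfr_len;    /* transfer length in bytes */
    u32 stride;      /* 2D stride, when TDMODE is set in ti */
    u32 nextconbk;   /* bus address of the next CB, 0 to stop */
    u32 reserved[2];
} __attribute__((aligned(32)));

/* To batch: point each CB's nextconbk at its successor, terminate
   with 0, write the first CB's bus address to the channel's
   CONBLK_AD register and set ACTIVE in CS - the controller then
   walks the whole chain without further CPU involvement. */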

shirro

Wed Apr 11, 2012 2:23 am

dom said:


I've spoken to the 3D expert, and I'm not sure eglCreateGlobalImageBRCM will help you - it allows sharing of EGL images between ARM processes.


Off-topic here, but it sounds like something that might be used to composite the output of several processes together?

I notice the following are in libEGL but can't find prototypes in the headers on github. Is that just a work in progress or are they not public APIs?

eglCreateCopyGlobalImageBRCM
eglCreateGlobalImageBRCM
eglDestroyGlobalImageBRCM
eglGetDriverMonitorXMLBRCM
eglInitDriverMonitorBRCM
eglQueryGlobalImageBRCM
eglSaneChooseConfigBRCM
eglTermDriverMonitorBRCM
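
(For anyone who wants to check for themselves, a dynamic-symbol dump is enough - something like nm -D /opt/vc/lib/libEGL.so | grep BRCM, assuming that's where the lib lives on your image.)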


teh_orph

Wed Apr 11, 2012 8:54 am

Thanks again for the info+code Dom. Any chance of finding out from your 3D guy if it's in any way possible to draw to a render target that's already got some data in it? If not I'll be forced to upload the current framebuffer, draw it as a quad, then do my actual composition operation (with more quads over the top), then copy the resulting image back into the main framebuffer. Too much copying! (I guess VG has the same restrictions as GL ES?)

Btw are non-power of two textures supported?

If the framebuffer can't be shared with the CPU then I guess I'll go back to the DMA idea. DMA should be perfectly sufficient for copies and fills, and the third operation (composite) perhaps can still use the GPU.

On my commute in this morning I thought of some DMA issues though:
- the memory regions that are to be DMA'd into the framebuffer must not move during the DMA operation, eg be paged out.
- the memory regions may be contiguous in the virtual address space, but not in the bus address space, which DMA uses.

Nuts. I guess this means I'll have to use the CPU to copy to a page-locked region of memory, then use DMA to copy it into the framebuffer?
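Something like this, sketched - bounce/bounce_bus and rpi_dma_copy() are hypothetical placeholders for a page-locked, bus-contiguous staging buffer and the actual DMA kick:

#include <stdint.h>
#include <string.h>

extern uint8_t  *bounce;      /* page-locked staging buffer (virtual address) */
extern uint32_t  bounce_bus;  /* its bus address: fixed and contiguous */
extern void rpi_dma_copy(uint32_t src_bus, uint32_t dst_bus, size_t len);

void blit_via_bounce(const void *src, uint32_t fb_bus, size_t len)
{
    memcpy(bounce, src, len);               /* CPU: user memory -> staging */
    rpi_dma_copy(bounce_bus, fb_bus, len);  /* DMA: staging -> framebuffer */
}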
