teh_orph
Posts: 346
Joined: Mon Jan 30, 2012 2:09 pm
Location: London
Contact: Website

Re: So who's working/worked on an Xorg server?

Sun Jul 15, 2012 7:48 pm

An update with nothing too exciting to show for 20+ hours of work this weekend... Still lots of work to do; mainly:
1. parallelising DMA workloads, so the whole buffer doesn't need kicking when the CPU needs data back
2. nice C versions of ~5 functions that must be done on the CPU
3. assembly versions of #2
4. an assembly version of "backwards memcpy" (see the sketch below)
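For the curious, "backwards memcpy" just means copying from the highest address downwards - the behaviour memmove() needs when the destination overlaps the end of the source. A minimal C sketch (hypothetical helper name; byte-at-a-time for clarity, the real thing would work in words):

#include <stddef.h>

static void *memcpy_backwards(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst + n;
    const unsigned char *s = (const unsigned char *)src + n;

    while (n--)
        *--d = *--s;   /* copy from the highest address down */

    return dst;
}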

#1 is the most important yet the thing I'm putting off the most. I need to make big changes to both the EXA module and kernel module.

Question for anyone: my kernel is compiled with high resolution timer support. What do I call instead of clock() (and then divide by CLOCKS_PER_SEC) to see these numbers?

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 5331
Joined: Wed Aug 17, 2011 7:41 pm
Location: Cambridge

Re: So who's working/worked on an Xorg server?

Sun Jul 15, 2012 8:06 pm

Microsecond resolution is available. See here:
http://www.raspberrypi.org/phpBB3/viewt ... ers#p96770
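For a quick test from userspace, clock_gettime() is the usual call - a minimal sketch (standard POSIX; link with -lrt on older glibc):

#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec ts;

    /* CLOCK_MONOTONIC is fed by the hrtimer infrastructure when the
       kernel is built with high-resolution timer support */
    clock_gettime(CLOCK_MONOTONIC, &ts);
    printf("%ld.%09ld s\n", (long)ts.tv_sec, ts.tv_nsec);
    return 0;
}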

ssvb
Posts: 112
Joined: Sat May 19, 2012 6:15 pm

Re: So who's working/worked on an Xorg server?

Mon Jul 16, 2012 7:41 am

teh_orph wrote:Interesting stuff. Thanks for the references. Seems this function has been beaten to death already!
Nope, at least not for ARMv6.
teh_orph wrote:I was very surprised that the assembly code has those defines turned off that skip zero-masked pixels.
Yes, this is one of the problems that causes it to perform worse than C. As that mailing list message says, the "even ARMv6 is sometimes slower than generic" claim is confirmed by benchmark numbers.
teh_orph wrote:Btw rigging in a modified version of my memset into pixman_fill32 pretty much removed it from the profile, wup wup.
As I mentioned elsewhere, if you have some patches, just send them to http://lists.freedesktop.org/mailman/listinfo/pixman
Don't keep all the good stuff for yourself :)
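(For context, pixman's 32-bit solid fill boils down to roughly this in C - a sketch to show what a memset-style routine replaces, not the actual pixman source. As in pixman_fill(), the stride is in 32-bit words:)

#include <stdint.h>

static void fill32(uint32_t *bits, int stride, int x, int y,
                   int w, int h, uint32_t color)
{
    uint32_t *row = bits + y * stride + x;

    while (h--) {
        for (int i = 0; i < w; i++)
            row[i] = color;   /* the inner loop a tuned memset replaces */
        row += stride;
    }
}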

teh_orph
Posts: 346
Joined: Mon Jan 30, 2012 2:09 pm
Location: London
Contact: Website

Re: So who's working/worked on an Xorg server?

Mon Jul 16, 2012 9:41 pm

dom wrote:Microsecond resolution is available. See here:
http://www.raspberrypi.org/phpBB3/viewt ... ers#p96770
Excellent stuff - I've tested it and confirmed it's working fine on my system. Thanks again Dom.
ssvb: Do you know how the ARM versions for pixman got chosen? 32-bit fast-path fill was low-hanging fruit, ARMv6 n_8_8888 (as you say) can be slower than the C version...what's going on? Alternatively, maybe on their ARMs the 32-bit fill C code works "fast enough"?
Anyway, sure, I could submit some code, but the fact that someone else hasn't makes me wonder!
Done anything with your Pi yet...besides the L2 screen flicker thing? ;)

A mod: any chance of changing the title of this thread? I'm aiming to produce an accelerated X server, and I now know the answer to the "who's working" bit. Something like "Simon's accelerated X development thread"? :)

ssvb
Posts: 112
Joined: Sat May 19, 2012 6:15 pm

Re: So who's working/worked on an Xorg server?

Thu Jul 19, 2012 11:52 pm

teh_orph wrote: Do you know how the ARM versions for pixman got chosen? 32-bit fast-path fill was low-hanging fruit, ARMv6 n_8_8888 (as you say) can be slower than the C version...what's going on? Alternatively, maybe on their ARMs 32-bit fill C code works "fast enough"?
It's the old code from 2008: http://cgit.freedesktop.org/pixman/comm ... 6afbd28aba
I would guess that it simply was never benchmarked with any real workload or cairo-perf-trace. In any case, the commit message is not terribly informative and does not contain any explanations about the "inner_branch" define.
teh_orph wrote:Anyway, sure I could submit some code but the fact someone else hasn't makes me wonder!
The ARM11 was already outdated by the time pixman started seriously getting ARM assembly optimizations. All the modern high-performance ARM processors intended to run user applications are expected to have 128-bit NEON SIMD, which is a really good choice for software-rendered graphics. It's not really surprising that nobody worked on optimizations for something old that nobody used anymore.

But now the RPi is kinda trying to re-animate the corpse and make the ARM11 relevant again. So it's only a matter of time before pixman gets good ARM11 support. It could be implemented by you, by me, or by one of the hundreds of thousands of kids interested in programming ;)

teh_orph
Posts: 346
Joined: Mon Jan 30, 2012 2:09 pm
Location: London
Contact: Website

Re: Simon's accelerated X development thread

Fri Jul 20, 2012 6:17 pm

Yes, actually, this would be a great time to get people interested in assembly programming again... Nice and safe environment, easy ISA, something appears on the screen...sorted!

To the naive user (ie me) it does appear that pixman has grown and grown, and could perhaps do with a prune to streamline performance on the RPi, where any extra 'fluff' code does seem to noticeably impact the run-time. For instance, the discussion where you argued for the flags not to be 64-bit, yet the maintainers didn't see such "micro optimisations" as important. (Slow machines like this are a niche target, so I do understand their argument.)

factoid
Posts: 45
Joined: Tue Jul 17, 2012 5:35 am

Re: Simon's accelerated X development thread

Thu Aug 02, 2012 6:47 am

This seems to be the only thread I can find on accelerated X. I wanted to get involved, and my (limited) understanding of embedded GL systems is that implementing a proper hardware GLX/DRI2 layer would do the trick. Is anyone working on that angle, or any other approach to GPU accelerated X? Need a developer?

teh_orph
Posts: 346
Joined: Mon Jan 30, 2012 2:09 pm
Location: London
Contact: Website

Re: Simon's accelerated X development thread

Fri Aug 03, 2012 3:41 pm

I've been super busy for the last couple of weeks, so unfortunately haven't had the time to work on this. Damn those Olympics, and kind people who book their weddings at short notice, and girlfriends who choose to have their birthday at the same time!

Anyway, unfortunately the GLX layer you mention covers a different domain of X. That would be for applications that explicitly choose to do 3D operations: they ask the X server if GLX is supported, then send their OpenGL calls through it. That's easy. The hard stuff is for the applications that don't use 3D and expect a nice, software-composited display, with the back buffer readable just a few clock cycles away. Various other extensions have been added to X to support server-side acceleration of common things normal programs might want to do: for example, blitting graphics around (eg dragging a window, scrolling), fills (constant colour or gradient) and the more challenging one, compositing (eg anti-aliased font rendering). This is what I have focussed my time on.
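To make that concrete: EXA exposes exactly these operations as driver hooks. A minimal sketch of the solid-fill trio - the dma_* helpers are hypothetical stand-ins for my kernel module's queue, not my actual code:

#include "exa.h"

static Bool RPiPrepareSolid(PixmapPtr pPixmap, int alu,
                            Pixel planemask, Pixel fg)
{
    if (alu != GXcopy || planemask != (Pixel)~0UL)
        return FALSE;            /* punt odd cases back to software */
    dma_queue_set_colour(fg);    /* hypothetical helper */
    return TRUE;
}

static void RPiSolid(PixmapPtr pPixmap, int x1, int y1, int x2, int y2)
{
    /* just queue the rectangle; nothing is kicked to the VPU yet */
    dma_queue_fill(pPixmap, x1, y1, x2 - x1, y2 - y1);
}

static void RPiDoneSolid(PixmapPtr pPixmap)
{
    /* batch boundary: a driver might kick the queue here, or defer */
}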

Ordinarily these things could be batched up and sent to a co-processor like a GPU to do the work. The result could be pipelined to get the highest throughput at the expense of latency. That would be fine for GLX...the application expects this behaviour. Unfortunately, in the accelerated 2D X set-up the CPU code can ask for the image back at any time, either because it needs to inspect some pixel data or simply to flip the buffers and show the image on the display. There's no hint of when this may come, so it's not clear how much batching should be done before sending the work to the co-pro, or if/when the data should be speculatively returned to the CPU before a wait actually arrives.

Work tasks from X are not first-in-first-out either...remember X will be accepting rendering tasks from different applications. The CPU may block on some newer render tasks whilst not yet caring about the completion of older ones. Making one long dependency graph - ie a wait on a new task ensures that all previous tasks have finished - unnecessarily reduces performance. Maintaining separate task dependency graphs is the way forward...and not really possible with OpenGL.
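Roughly what I mean, as a sketch (hypothetical names, not my actual code): stamp each queued task with a sequence number, remember the last writer per pixmap, and when the CPU touches a pixmap wait only for that stamp rather than for the whole queue:

#include <stdint.h>

typedef struct {
    uint32_t last_write_seq;   /* newest queued task writing this pixmap */
} PixmapSync;

static uint32_t queued_seq;               /* bumped per submitted task */
static volatile uint32_t completed_seq;   /* advanced by the DMA completion handler */

static void submit_write(PixmapSync *dst)
{
    dst->last_write_seq = ++queued_seq;
    /* ...enqueue the control block for the DMA engine... */
}

static void sync_for_cpu(PixmapSync *p)
{
    /* wrap-safe compare; a real version would sleep in the kernel
       module rather than spin */
    while ((int32_t)(completed_seq - p->last_write_seq) < 0)
        ;
}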

I reckon changing my code from the single dep graph to proper hazard-detecting multiple dep graphs would take < 1 man week of work. Maybe I'll call in sick next week...

(ramble over)

factoid
Posts: 45
Joined: Tue Jul 17, 2012 5:35 am

Re: Simon's accelerated X development thread

Sat Aug 04, 2012 4:25 am

I'm still trying to get my head around the Xorg architecture and how it relates to DRM, but I went through this stuff a little bit with my netbook. It's an Acer Aspire One that had the misfortune of using a PowerVR GPU in conjunction with the Intel Atom processor (AKA the GMA500). I have accelerated 2D and 3D in X on it, so things are fairly snappy, and I wanted to help in any efforts to bring more of the same to the Pi.

As I understand it, the kernel DRM module provides the kernel space support for the libdrm library that sits in user space, and handles all of the drawing commands that come from X (or other applications?). I'm a graphics/game developer professionally, but I'm not super experienced with embedded systems.
On my netbook there's an open DRM driver that talks to a bunch of closed PowerVR stuff, and then there's also, I believe, a DRI2 driver and GLX driver that collectively provide X with the 2D and 3D acceleration support it requires.

But ultimately, my take is that any solution that isn't taking advantage of the GPU is probably going to sell itself short? In my head I envision a system where the X server translates all of the protocol requests to GL commands and hands them off to the hardware accelerator. Each window can get its own render surface, which provides pixel-level read/write support, and then it's just a matter of texture-mapping them onto primitives for compositing. Maybe that doesn't require any kernel-space implementation? Is that what Glamor is supposed to do?

Or is the correct path to do something kernel level that makes use of the OpenMAX IL API that's documented in the firmware repo?

At any rate, I'd like to help you get a git repo set up for your own work if you haven't done so already, just so that I, and others like me, can start looking at where you're at, get up to speed and assist. Otherwise I just run the risk of playing catch-up to your efforts and duplicating work.

But basically, the stage I'm at right now is trying to understand how the parts of the video stack fit together, and whether there's enough API access to the closed sections of the VideoCore to make GL- and VG-assisted X rendering possible and/or practical. It sounds like Dom has a lot of insight into what is/isn't possible, and what might become possible in the future. I'm assuming that with all of the Broadcom employees who are part of the Pi Foundation we essentially have 'vendor support': that there exist people with both access to the Broadcom specs and tools, and the inclination to support the needs of the open portions of the Pi, so that we can all get the most from this wonderful device?

Anyway, tl;dr version of the above is:
- What's the best way to get 2D and 3D acceleration for X running on the GPU?
- Is there enough exposed API that we non-NDA developers can even pull this off?
- If not, is there anyone we can talk to about getting those API's made available so that we can move forward?
- Generally speaking, any 2D windowing environment can be represented in an orthographic 3D space, so in theory OpenGL ES should be able to do all the heavy lifting (see the sketch after this list). There's also OpenVG to consider. Is there a sane way to hack that into X?
- I am an experienced 3D game developer, with some kernel hacking experience. How can I assist in these efforts directly?
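To illustrate the orthographic point above: a sketch under GLES 1.1 fixed-function assumptions, where win_tex and the 1920x1080 resolution are just examples, nothing Pi-specific:

#include <GLES/gl.h>

/* draw one window's contents as a textured quad, pixel-for-pixel */
void draw_window(GLuint win_tex, float x, float y, float w, float h)
{
    const GLfloat verts[] = { x, y,  x + w, y,  x, y + h,  x + w, y + h };
    const GLfloat uvs[]   = { 0, 0,  1, 0,  0, 1,  1, 1 };

    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    glOrthof(0, 1920, 1080, 0, -1, 1);   /* y grows downwards, like X */

    glEnable(GL_TEXTURE_2D);
    glBindTexture(GL_TEXTURE_2D, win_tex);
    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);
    glVertexPointer(2, GL_FLOAT, 0, verts);
    glTexCoordPointer(2, GL_FLOAT, 0, uvs);
    glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
}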

Looking forward to pitching in...

factoid
Posts: 45
Joined: Tue Jul 17, 2012 5:35 am

Re: Simon's accelerated X development thread

Tue Aug 07, 2012 6:10 pm

Update for anyone else following this thread. Simon and I are attempting to attack this issue from two different sides, hoping that our combined efforts and note-comparing will yield something. There'd been a previous suggestion to look into Glamor, which seems to use the closed OpenGL/OpenVG libraries as a way of bypassing the need for kernel DRM - or at least to translate X requests into accelerated drawing that will play nice with others. It's my hope at any rate that this system will let us take full advantage of the GPU without forcing the Pi Foundation/Broadcom guys to support more than the stock APIs.

However, at least following the build instructions from the website involves building Mesa and xf86-video-intel along with Glamor. On a Pi this is going to take a while, so I'm also trying to get a build of Raspbian emulating under QEMU on either my Ubuntu netbook or my Windows box, in order to throw more power at the problem.

So at least I've got progress being made on my Pi while I try to come up with a faster way of doing it (which hopefully isn't any less accurate).

If anyone has experience getting Raspbian to run under the qemu-system-arm command and has some relatively foolproof steps, please PM them to me, or URLs to the relevant forum/blog/website etc...
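Edit: the recipe that keeps coming up is QEMU's versatilepb machine with an ARM1176 CPU and a separately-built "kernel-qemu" kernel image - something like the below, though I haven't verified it myself yet:

qemu-system-arm -M versatilepb -cpu arm1176 -m 256 \
    -kernel kernel-qemu \
    -hda raspbian.img \
    -append "root=/dev/sda2 panic=1"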

alexchamberlain
Posts: 121
Joined: Thu Jun 14, 2012 11:20 am
Location: Leamington Spa, UK
Contact: Website

Re: Simon's accelerated X development thread

Tue Aug 07, 2012 6:18 pm

Not a lot of experience here, but would you be better off with a cross-compiler?
Developer of piimg, a utility for working with RPi images.

factoid
Posts: 45
Joined: Tue Jul 17, 2012 5:35 am

Re: Simon's accelerated X development thread

Tue Aug 07, 2012 7:09 pm

alexchamberlain wrote:Not a lot of experience here, but would you be better off with a cross-compiler?
Under Ubuntu, perhaps. But due to my own personal set of driver frustrations, I'm stuck on Ubuntu 10.10, which uses XServer 1.9; Glamor requires XServer 1.10, and Raspbian uses 1.12. So a Raspbian VM seems like the best move; second best might be a Debian Wheezy VM (or even a thumbdrive install) for testing, and then reconfiguring the build chain to cross-compile.

The whole thing is new to me, so if any authorities can weigh in while I muddle around, feel free.

Update: configure finished while writing this, and the xf86 video drivers depend on libdrm_intel, so I might be back to square one on that one.

jackokring
Posts: 816
Joined: Tue Jul 31, 2012 8:27 am
Location: London, UK
Contact: ICQ

Re: Simon's accelerated X development thread

Tue Aug 07, 2012 7:47 pm

Nice work so far. I'm looking forward to having an accelerated X. As I understand from a different thread, OpenGL ES renders to a surface in front of the framebuffer console, and some people are working on DirectFB (not sure what this will provide). It is true that all windows could be represented by quads with the window's content being a GL texture. I wouldn't even mind windows in the background becoming smaller due to z-buffer ordering. This is all well and good.

As I understand it, the main problem is not this textured window drawing, but a 3-input alpha-blend blit-and-scale to a surface texture, for font rendering (the main X load; it can be used for all blitting and, with differing strides, can draw lines and polygons too). Anti-aliased fonts may look nice, but are they essential? Does the hardware do a transparency blend of surfaces with different z-order? Maybe, but this would mean a surface search to find any pixel's content. Or if the hardware filters the display output, then anti-aliasing is not essential when most alpha values are 0 or 100%.

For some people the best use of the GPU would be generation of audio surfaces, or extra FPU ooomph. Well, good luck guys. To be honest, I'll either use GTK+ via Java gcj (potentially the direct-framebuffer version), setfont (and ANSI/VGA terminal control via ncurses?), or mmap /dev/fb0. Although a fast X would be nice, some guys are getting DMA (blit) working on the bare-hardware framebuffer device, and a square-image-to-square-image blit, although not an excellent renderer, may be good enough for my purposes. I never really was that worried about the ZX Spectrum colour flicker effect.
Pi[NFA]=B256R0USB CL4SD8GB Raspbian Stock.
Pi[Work]=A+256 CL4SD8GB Raspbian Stock.
My favourite constant 1.65056745028

factoid
Posts: 45
Joined: Tue Jul 17, 2012 5:35 am

Re: Simon's accelerated X development thread

Fri Aug 10, 2012 4:23 am

Progress Report:
Managed to download / configure / build the Mesa, xf86-video-intel, and Glamor packages. X starts up and attempts to load Glamor, but then fails like this...

LoadModule: "glamoregl"
[175747.229] (II) Loading /usr/local/lib/xorg/modules/libglamoregl.so
[175747.232] (EE) Failed to load /usr/local/lib/xorg/modules/libglamoregl.so: /usr/local/lib/xorg/modules/libglamoregl.so: undefined symbol: eglGetProcAddress
[175747.232] (II) UnloadModule: "glamoregl"

But that's a start... actually managed to get everything built, so now it's time to start digging into the source and find out where the problem lies, and if there's anything I can do about it.

The trick seemed to be that the above packages look for EGL / GL / GLES package configs, which don't exist for the videocore drivers. I just did a bunch of CFLAG / LIBS exports as required, but I'm thinking I might try to put together a pkg-config file for videocore so that things automagically resolve to the /opt/vc/ folders as required.
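What I have in mind is a hypothetical egl.pc dropped into the pkg-config search path, along these lines (paths per the stock /opt/vc firmware layout; the exact Cflags are a guess):

prefix=/opt/vc
includedir=${prefix}/include
libdir=${prefix}/lib

Name: egl
Description: Broadcom VideoCore EGL (hand-written, unofficial)
Version: 1.0
Libs: -L${libdir} -lEGL -lGLESv2
Cflags: -I${includedir} -I${includedir}/interface/vcos/pthreads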

If anyone wants build instructions so they can follow along, I'll see what I can do.

Onward!

factoid
Posts: 45
Joined: Tue Jul 17, 2012 5:35 am

Re: Simon's accelerated X development thread

Tue Aug 14, 2012 6:24 am

Got everything compiled and built; I had incorrectly been using an outdated Mesa implementation. Of course the Intel driver fails to load because the RPi doesn't have any of the Intel graphics chipsets. I'm currently writing an xf86-video-rpi driver, learning about DDX implementations as I go. So far I've got a skeleton driver that loads but fails to initialize properly, halting the X server.

Nothing really to report, but at least I know what direction I have to take this to ultimately get it to work with libglamor.so, and I've got the xf86-video-intel driver to use as a reference, plus the DDX developer's guide from the XOrg documentation.
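For anyone following along before I push a repo, the skeleton so far amounts to the standard module boilerplate - a sketch with hypothetical names (the DriverRec and the PreInit/ScreenInit functions aren't shown):

#include "xf86.h"
#include "xf86Module.h"

static MODULESETUPPROTO(RPiSetup);

static XF86ModuleVersionInfo RPiVersRec = {
    "rpi",
    MODULEVENDORSTRING,
    MODINFOSTRING1,
    MODINFOSTRING2,
    XORG_VERSION_CURRENT,
    0, 1, 0,                  /* driver version 0.1.0 */
    ABI_CLASS_VIDEODRV,
    ABI_VIDEODRV_VERSION,
    MOD_CLASS_VIDEODRV,
    { 0, 0, 0, 0 }
};

/* the loader looks for the <module>ModuleData symbol */
_X_EXPORT XF86ModuleData rpiModuleData = { &RPiVersRec, RPiSetup, NULL };

static pointer
RPiSetup(pointer module, pointer opts, int *errmaj, int *errmin)
{
    static Bool setupDone = FALSE;

    if (setupDone) {
        if (errmaj)
            *errmaj = LDR_ONCEONLY;
        return NULL;
    }
    setupDone = TRUE;
    xf86AddDriver(&RPI, module, 0);   /* RPI: the DriverRec, not shown */
    return (pointer)1;
}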

Qtree
Posts: 4
Joined: Fri Aug 17, 2012 3:49 pm

Re: Simon's accelerated X development thread

Fri Aug 17, 2012 4:00 pm

I just found something, and I think it may be a better reference than the xf86-video-intel driver. What do you think?
“Generic DDX driver for ARM SoCs; based on upstream OMAP driver”
http://git.linaro.org/gitweb?p=arm/xorg ... ;a=summary

factoid
Posts: 45
Joined: Tue Jul 17, 2012 5:35 am

Re: Simon's accelerated X development thread

Fri Aug 17, 2012 5:23 pm

Oh! Thanks for that, I'll definitely take a look. My current DDX driver doesn't get past the "Fail to allocate screen and segfault" stage. :p

I think this is mainly because the probe functions normally use PCI or ISA helper functions, which don't apply in this case (or at least I haven't seen any references to the VideoCore being on such a bus), and I haven't found enough information on how to properly probe/configure the driver state when you can't use those functions. Hopefully this driver will have some clues; it'll be informative nevertheless.
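For reference, my reading of the fbdev driver is that non-PCI hardware claims an "fb" slot instead of probing a bus - roughly like this, untested:

static Bool
RPiProbe(DriverPtr drv, int flags)
{
    GDevPtr *devSections;
    int i, numDevSections;
    Bool found = FALSE;

    /* match "rpi" Device sections from xorg.conf; no bus probing at all */
    numDevSections = xf86MatchDevice("rpi", &devSections);
    if (numDevSections <= 0)
        return FALSE;

    for (i = 0; i < numDevSections; i++) {
        int entity = xf86ClaimFbSlot(drv, 0, devSections[i], TRUE);
        ScrnInfoPtr pScrn = xf86ConfigFbEntity(NULL, 0, entity,
                                               NULL, NULL, NULL, NULL);
        if (pScrn) {
            pScrn->driverName = "rpi";
            pScrn->name       = "rpi";
            pScrn->PreInit    = RPiPreInit;     /* not shown */
            pScrn->ScreenInit = RPiScreenInit;  /* not shown */
            found = TRUE;
        }
    }
    free(devSections);
    return found;
}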

Qtree
Posts: 4
Joined: Fri Aug 17, 2012 3:49 pm

Re: Simon's accelerated X development thread

Fri Aug 17, 2012 9:55 pm

factoid wrote: At any rate, I'd like to help you get a git repo setup for your own work if you haven't done so already, just so that I, and others like me, can start looking at where you're at so that we can get up to speed and assist. Otherwise I just run the risk of playing catchup to your efforts and duplicating work.
Any chance of creating a git repository for this X driver soon? At the moment it looks like you and teh_orph are the only ones who can show some code. No doubt a git repo would speed things up, as others could participate and help.

So...who will be the first brave one? :)

factoid
Posts: 45
Joined: Tue Jul 17, 2012 5:35 am

Re: Simon's accelerated X development thread

Sun Aug 19, 2012 7:02 pm

Git repo is up - embarrassingly basic stuff here - hoping to see if I can quickly get a driver up and running with help from that ARM SoC driver referenced earlier. In typical GitHub fashion, feel free to clone the repo and submit patches.

git://github.com/Factoid/xf86-video-rpi.git

Would have had this up sooner, but I was trying to get the project properly set up for an autotools build, and to add the usual COPYING, AUTHORS, etc... files.

Cheers,
Adrian

factoid
Posts: 45
Joined: Tue Jul 17, 2012 5:35 am

Re: Simon's accelerated X development thread

Sun Aug 19, 2012 8:55 pm

Simon's GitHub is here. I'm going to assume he doesn't mind me advertising what's public anyway.

https://github.com/simonjhall/

hermanhermitage
Posts: 65
Joined: Sat Jul 07, 2012 11:21 pm
Location: Zero Page

Re: Simon's accelerated X development thread

Wed Aug 22, 2012 1:22 pm

I recommend these two links for those getting a handle on Linux graphics:
http://people.freedesktop.org/~marcheu/ ... rivers.pdf
http://cgit.freedesktop.org/xorg/proto/ ... rproto.txt

(Apologies if already posted on this thread - it's getting long now :)

dpavlin
Posts: 3
Joined: Mon Jul 16, 2012 4:17 pm
Location: Zagreb, Croatia
Contact: Website

Re: Simon's accelerated X development thread

Tue Sep 04, 2012 8:14 pm

alexchamberlain wrote:Not a lot of experience here, but would you be better off with a cross-compiler?
distcc is fairly easy to set up, and it combines the advantages of a native configure with cross-compiling on a faster machine; see http://wiki.openmoko.org/wiki/Developme ... ng_distccs

jannis
Posts: 56
Joined: Tue Jan 17, 2012 3:48 pm

Re: Simon's accelerated X development thread

Wed Sep 05, 2012 6:15 am

Here's another guide about distcc and cross-compiling:
http://www.gentoo.org/doc/en/distcc.xml

and since we're at it:
http://www.phoronix.com/scan.php?page=n ... px=MTE3NTE
does anyone here in the forums know more about that?

factoid
Posts: 45
Joined: Tue Jul 17, 2012 5:35 am

Re: Simon's accelerated X development thread

Wed Sep 05, 2012 2:44 pm

I won't hold my breath; Phoronix reports on a lot of those efforts, and I still haven't seen anything concrete show up for my Poulsbo-based netbook. I'm using Intel's official EMGD drivers (which aren't meant for netbook configurations, nor are they really deployed outside of embedded systems anymore), and they work just fine, but I'm frozen at Ubuntu 10.10 due to Xorg/kernel requirements. I'm all for open drivers, though reverse engineering those things is a pain; as long as the Foundation members are able to keep the VideoCore stack compatible with the latest releases of X and the kernel, it's a non-issue for me.

On that note, I'm in the process of understanding the /dev/fb support code, to get at the mode setting and other info I need. Once I've basically written my own drop-in replacement for the fb X renderer, I can focus on expanding it to use OGL/OVG to accelerate the rendering (I hope).
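The mode-info part of that is plain ioctl work against the standard Linux fbdev interface - a minimal sketch:

#include <fcntl.h>
#include <linux/fb.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    struct fb_var_screeninfo var;
    int fd = open("/dev/fb0", O_RDWR);

    if (fd < 0 || ioctl(fd, FBIOGET_VSCREENINFO, &var) < 0) {
        perror("fb0");
        return 1;
    }
    printf("%ux%u @ %u bpp\n", var.xres, var.yres, var.bits_per_pixel);
    close(fd);
    return 0;
}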

And I suppose if I can get the Glamor thing working for the Pi, the solution should be adaptable for my netbook as well.

dpavlin
Posts: 3
Joined: Mon Jul 16, 2012 4:17 pm
Location: Zagreb, Croatia
Contact: Website

Re: Simon's accelerated X development thread

Thu Sep 06, 2012 12:17 pm

jannis wrote:Here's another guide about distcc and cross-compiling:
http://www.gentoo.org/doc/en/distcc.xml

and since we're at it:
http://www.phoronix.com/scan.php?page=n ... px=MTE3NTE
anyone here in the forums knows more about that?
Try searching GitHub for videocore -- https://github.com/search?langOverride= ... positories

I don't know exactly which project Phoronix is referencing; there are quite a few repositories with recent activity, but nothing to show off (yet).

Return to “General discussion”