Simon's accelerated X development thread


by teh_orph » Thu Apr 12, 2012 7:33 pm
Bump - any thoughts from people in the know about these DMA questions? Surely when you call glTexImage2D there's a solution to these potential problems?
Posts: 315
Joined: Mon Jan 30, 2012 2:09 pm
Location: London
by jamesh » Thu Apr 12, 2012 9:21 pm
It's quite possible no-one has ever tried this stuff, so in fact no-one knows whether it will work, or what the solutions/workarounds to problems are. This stuff is all pretty new, but if anyone knows, it'll be Dom!
Moderator
Posts: 10528
Joined: Sat Jul 30, 2011 7:41 pm
by teh_orph » Fri Apr 13, 2012 9:28 am
Coolio. (again have you heard of any other groups working on this? I don't wanna step on any feet)

I've been looking into this DMA stuff and these page problems do look solvable. Just...need...hardware... If anyone gets a Pi early and wants to send it my way ;)

Btw if anyone with git access is reading this, can I just check the thinking in vc_mem_ioctl()? https://github.com/raspberrypi/linux/blob/rpi-patches/arch/arm/mach-bcm2708/vc_mem.c

Looks like a copy/paste bug in case VC_MEM_IOC_MEM_BASE. vc mem size is returned, not vc mem base. Plus the comments are the same as the other case statement.
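
For readers who have not opened the file, here is a minimal sketch of the kind of copy/paste slip being described. It is not the code from vc_mem.c: the enum, struct and function names below are hypothetical stand-ins, and the real ioctl copies the value out to user space rather than writing through a plain pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command codes; the real driver defines its own. */
enum mem_query { QUERY_MEM_SIZE, QUERY_MEM_BASE };

struct mem_info {
    uint32_t base;  /* start address of the region */
    uint32_t size;  /* length of the region in bytes */
};

/* Buggy variant: the BASE case was pasted from the SIZE case and not edited. */
int query_buggy(const struct mem_info *m, enum mem_query cmd, uint32_t *out)
{
    assert(m != NULL && out != NULL);
    switch (cmd) {
    case QUERY_MEM_SIZE:
        *out = m->size;  /* get the memory size */
        return 0;
    case QUERY_MEM_BASE:
        *out = m->size;  /* get the memory size (stale comment, wrong field) */
        return 0;
    }
    return -1;  /* unknown command */
}

/* Fixed variant: each case returns the field its name promises. */
int query_fixed(const struct mem_info *m, enum mem_query cmd, uint32_t *out)
{
    assert(m != NULL && out != NULL);
    switch (cmd) {
    case QUERY_MEM_SIZE:
        *out = m->size;  /* get the memory size */
        return 0;
    case QUERY_MEM_BASE:
        *out = m->base;  /* get the memory base */
        return 0;
    }
    return -1;  /* unknown command */
}
```

The duplicated comment is the giveaway: when two cases read identically, one of them was probably never finished.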
Posts: 315
Joined: Mon Jan 30, 2012 2:09 pm
Location: London
by asb » Fri Apr 13, 2012 10:42 am
At a glance, that looks like a good catch. Please do make an issue on github. https://github.com/raspberrypi/linux/issues - Dom should get an email that way.
Moderator
Posts: 757
Joined: Fri Sep 16, 2011 7:16 pm
by teh_orph » Sun Apr 15, 2012 4:25 pm
Yet more questions to those in the GL-know:

a) Does changing GL ES FBO or EGL pixmap render target invoke an expensive glFlush/glFinish?
I'd like to pick a render target, enqueue some work on it, change render target, enqueue work on that one, then wait for it all to finish. There will be no shared data between the two sets of work (i.e. the first render target is not used as a texture).

b) Any chance of opening the source to the ARM-side EGL libs? Otherwise adding in support for hardware-accelerated rendering in an X window (via GLX etc) will require the EGL library to be wrapped unnecessarily.

c) can we have a forum for programming?

Thanks again...
Posts: 315
Joined: Mon Jan 30, 2012 2:09 pm
Location: London
by teh_orph » Fri May 11, 2012 12:01 pm
Just a heads up for the interested, I got my pi a couple of days ago.
I have Xorg and associated libraries all building and running on the device. My EXA acceleration module is in there and running, yet I'm still using my software emulation of the hardware's features for debugging purposes. So basically it's running the code I'd been working on for the last month.

So hopefully this weekend I can connect it to my kernel module to allow it to kick off DMA.

The poor girlfriend will not be getting much attention ;)
Posts: 315
Joined: Mon Jan 30, 2012 2:09 pm
Location: London
by MattPurland » Fri May 11, 2012 1:44 pm
I'm very interested in this, keep up the good work!
Posts: 55
Joined: Fri Apr 13, 2012 7:37 pm
by Hexxeh » Fri May 11, 2012 2:08 pm
Great to hear things are progressing. Once you get something that "works" to any degree, I'd love to try it out with Chromium OS, see if we can get things moving faster than a few minutes per click-response... :)
Posts: 90
Joined: Thu Apr 05, 2012 3:07 pm
by teh_orph » Fri May 11, 2012 2:28 pm
Cripes that sounds painful. Why is it so slow? Do they use X anyway, or something custom?
One thing to bear in mind is that if I replace all my acceleration hooks with no-ops, things like gedit and gtkperf still aren't super fast. I have a feeling that the system bottlenecks may lie elsewhere.

Does sysprof work on ARM? At least gprof instrumentation should be doable...?
Posts: 315
Joined: Mon Jan 30, 2012 2:09 pm
Location: London
by Hexxeh » Fri May 11, 2012 2:33 pm
I don't know a whole lot about how it works, but it has its own UI system called Aura, and if you've not got acceleration, it's painfully slow with software rendering. It does use X, but there's an experimental set of patches (which I haven't managed to get my hands on yet) called NoX that let it run without X. Only tested on x86 though, iirc.
Posts: 90
Joined: Thu Apr 05, 2012 3:07 pm
by jamesh » Fri May 11, 2012 2:38 pm
teh_orph said:


Cripes that sounds painful. Why is it so slow? Do they use X anyway, or something custom?
One thing to bear in mind is that if I replace all my acceleration hooks with no-ops, things like gedit and gtkperf still aren't super fast. I have a feeling that the system bottlenecks may lie elsewhere.

Does sysprof work on ARM? At least gprof instrumentation should be doable...?


Surely if you are no-oping your acceleration, you are still using the same system as without your drivers, which you would expect to be slow, because the CPU is doing all the rendering again? Or have I completely misunderstood?
Moderator
Posts: 10528
Joined: Sat Jul 30, 2011 7:41 pm
by teh_orph » Fri May 11, 2012 3:39 pm
As in I tell X I'm doing the work (but in fact am doing nothing at all).
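
A rough sketch of that trick, for anyone following along. This is not the real EXA interface: the hook table and every name in it are hypothetical, and the framebuffer is assumed to be one byte per pixel. The point is only that a "prepare" hook returning true makes the caller skip its software fallback, so with no-op hooks nothing gets drawn and whatever time remains is overhead elsewhere in the stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hook table, loosely modelled on an acceleration layer that
 * asks the driver whether it will handle a solid fill. */
struct fill_hooks {
    bool (*prepare)(unsigned char colour);      /* true = "driver will do it" */
    void (*fill)(int x, int y, int w, int h);
    void (*done)(void);
};

unsigned long noop_fill_calls;  /* counts fills that were claimed but skipped */

bool noop_prepare(unsigned char colour) { (void)colour; return true; }
void noop_fill(int x, int y, int w, int h)
{
    (void)x; (void)y; (void)w; (void)h;
    noop_fill_calls++;          /* claim the work, draw nothing */
}
void noop_done(void) { }

/* A driver that always declines, forcing the software path. */
bool decline_prepare(unsigned char colour) { (void)colour; return false; }

const struct fill_hooks noop_hooks      = { noop_prepare, noop_fill, noop_done };
const struct fill_hooks declining_hooks = { decline_prepare, noop_fill, noop_done };

/* Caller side: use the hooks if they accept, otherwise fill on the CPU.
 * Returns true when the driver claimed the operation. */
bool fill_rect(const struct fill_hooks *hooks, unsigned char *pixels,
               size_t stride, unsigned char colour, int x, int y, int w, int h)
{
    assert(hooks != NULL && pixels != NULL);
    if (hooks->prepare(colour)) {
        hooks->fill(x, y, w, h);
        hooks->done();
        return true;
    }
    for (int row = 0; row < h; row++)
        memset(pixels + (size_t)(y + row) * stride + (size_t)x, colour, (size_t)w);
    return false;
}
```

With `noop_hooks` the screen contents are wrong, of course, but the timing gives an upper bound on what any acceleration of those hooks could buy.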
Posts: 315
Joined: Mon Jan 30, 2012 2:09 pm
Location: London
by Hexxeh » Sat May 12, 2012 4:23 pm
Any progress on this? Do you have your work in a Git repo anywhere at all?

I'm putting ebuilds together for all the RPi specific stuff, which includes the Xorg driver, so knowing how the drivers build/install works would be useful, even if it's not yet functional.
Posts: 90
Joined: Thu Apr 05, 2012 3:07 pm
by teh_orph » Sun May 13, 2012 2:21 pm
I do not have any code depot set up anywhere at the moment.
I was actually going to ask you for suggestions about how to give the code out, once it's ready :)
In terms of what it will be, it'll most likely comprise
- kernel module that handles contiguous memory allocation (for DMA), and does virt->phys mapping + DMA kicks
- custom fbdev X driver - this can be a drop-in replacement for the existing /usr/lib/modules/xorg etc fbdev_drv.so
- potentially a replacement for libfb.so (high chance)
- potentially a replacement for pixmap.so (low chance)
- potentially a libEGL wrapper, to target X (not for a while)
- modifications to xorg.conf

So nearly everything is a link library drop-in. The only gnarly thing is the kernel module. Kernel modules have to fit the kernel build config exactly, and since there are only a few distros out there at the moment this should be easy. But ultimately I think this should be built on each target machine, and that then requires gcc, kernel headers etc. This seems to be a pretty typical way of giving out a kernel module.
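
To illustrate the "virt->phys mapping" item in the list above: once a block is physically contiguous, translating an address inside it is plain offset arithmetic. The sketch below is a user-space model under that assumption, not the module's code; the struct, its field names and the 32-bit bus address are all made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one contiguous allocation. */
struct contig_region {
    uintptr_t virt_base;  /* where the block is mapped in the process */
    uint32_t  bus_base;   /* address a DMA engine would use (assumed 32-bit) */
    size_t    length;     /* size of the block in bytes */
};

/* Translate [ptr, ptr + len) to a bus address.
 * Returns false if any part of the range falls outside the region. */
bool region_virt_to_bus(const struct contig_region *r, const void *ptr,
                        size_t len, uint32_t *bus_out)
{
    assert(r != NULL && bus_out != NULL);
    uintptr_t p = (uintptr_t)ptr;
    if (p < r->virt_base)
        return false;
    size_t offset = (size_t)(p - r->virt_base);
    if (offset > r->length || len > r->length - offset)
        return false;            /* written this way so it cannot overflow */
    *bus_out = r->bus_base + (uint32_t)offset;
    return true;
}
```

Memory that is not contiguous needs a per-page lookup instead, which is one reason to have the module hand out contiguous blocks in the first place.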
Posts: 315
Joined: Mon Jan 30, 2012 2:09 pm
Location: London
by Hexxeh » Sun May 13, 2012 2:31 pm
The best thing to do is probably just to put your code into a Git repository, probably on your GitHub account, and let the distro maintainers package as appropriate for their distros. I'd be interested to see what your code looks like so far, if you could dump it into a Git repo.

Are you in any IRC channels at all? It'd be great to have you in #raspberrypi on Freenode so we can hear how you're getting on, I'm available in there most hours of the day.

As soon as your code is ready, I'll add it to my Chromium OS overlay and see if we can bring Chrome/Aura up using it. Is there any way I can check whether my application will work using your driver? Maybe some key calls that aren't supported I can grep for?
Posts: 90
Joined: Thu Apr 05, 2012 3:07 pm
by teh_orph » Mon May 14, 2012 8:31 am
So do you use the Xorg/XFree86 server or is it something custom? If it is, which version? If it's Xorg 7.6 and the fbdev driver then I don't imagine any problems.
Although one thing to note is that EXA support (the hooks I use) appears flaky in the 7.6 tarballs, yet is fine in a build from stable git. If you hit that, it's just a case of rebuilding X.

Regarding what's used, to gain any acceleration whatever is doing the drawing needs to be using/abusing the 'render' extension. How is the Aura stuff rendered? (OpenGL? ES?)

The kernel module overuses mmap to give kernel logical addresses to X. This is required to ensure pages don't move underfoot during DMA. At least one 2D-capable DMA engine is required. 32-bit SIMD ARM instructions should be compilable and not SIGILL.
(btw I am sitting on #raspberrypi at work today :))
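
For anyone wondering what "2D-capable" means for a DMA engine: it copies a rectangle, i.e. a fixed number of bytes per row, with separate row pitches for source and destination. A minimal software model of that, assuming nothing about the actual hardware registers (the struct and function names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one rectangular copy. */
struct blit2d {
    const unsigned char *src;  /* first byte of the source rectangle */
    unsigned char *dst;        /* first byte of the destination rectangle */
    size_t row_bytes;          /* bytes copied per row */
    size_t rows;               /* number of rows */
    size_t src_stride;         /* bytes between the starts of source rows */
    size_t dst_stride;         /* bytes between the starts of destination rows */
};

/* CPU model of a strided copy. Source and destination must not overlap. */
void blit2d_run(const struct blit2d *b)
{
    assert(b != NULL && b->src != NULL && b->dst != NULL);
    assert(b->src_stride >= b->row_bytes && b->dst_stride >= b->row_bytes);
    for (size_t y = 0; y < b->rows; y++)
        memcpy(b->dst + y * b->dst_stride,
               b->src + y * b->src_stride,
               b->row_bytes);
}
```

A 1D-only engine would need one transfer per row, which is where the setup cost starts to dominate for small rectangles.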
Posts: 315
Joined: Mon Jan 30, 2012 2:09 pm
Location: London
by teh_orph » Thu May 17, 2012 10:28 am
Just an update to say that work continues on this. I've been tied up for the last couple of evenings (by wine and beers) but this has allowed me time to get a lot of reading in.

In particular I've been able to re-read some chapters of this:
http://lwn.net/Kernel/LDD3/
The actual work required to accelerate X is (in a bullet point list) reasonably straightforward. However, not being able to "take over the device" nor do everything in the most privileged mode, like other optimisation projects I've worked on, makes things a bit different! Also security is surprisingly involved too. As you are effectively allowing a single program to step past all the protection the OS offers (both from crashing the system and also leaking your credit card details) you need to be careful, and design in security from the beginning. Finally, the fact that client programs connect to this one program means that malicious programs could potentially take control of the system.

Properly designing in security has probably taken the most mental effort so far. It needs to work and have no holes in it but also needs to have a minimal impact on the performance of the system.

You might say this is not important as the Pi won't be used in these sorts of scenarios but this security will also prevent me crashing my debug machine (the whole machine, not just the X server) whenever I make a mistake or some other part of the X system gives me dodgy data. And I make a lot of mistakes :)
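
One concrete form this can take is refusing any transfer that is not provably inside a buffer the module itself handed out. The following is only a sketch of such a check, with a hypothetical request layout; it says nothing about how the real module validates requests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request: a rectangle inside a buffer of known length. */
struct dma_request {
    size_t offset;     /* byte offset of the first row within the buffer */
    size_t row_bytes;  /* bytes touched per row */
    size_t rows;       /* number of rows */
    size_t stride;     /* bytes between the starts of consecutive rows */
};

/* Returns true only if every byte the request touches lies within
 * [0, buf_len). All arithmetic is arranged so it cannot wrap around. */
bool dma_request_in_bounds(const struct dma_request *req, size_t buf_len)
{
    assert(req != NULL);
    if (req->rows == 0 || req->row_bytes == 0)
        return true;                       /* touches no memory at all */
    if (req->stride < req->row_bytes)
        return false;                      /* rows would overlap */
    if (req->offset > buf_len)
        return false;
    size_t avail = buf_len - req->offset;
    size_t gaps = req->rows - 1;           /* strides between first and last row */
    if (gaps != 0 && req->stride > (SIZE_MAX - req->row_bytes) / gaps)
        return false;                      /* extent would overflow size_t */
    size_t extent = gaps * req->stride + req->row_bytes;
    return extent <= avail;
}
```

The check is a handful of compares and one multiply per request, so it should cost very little next to the transfer it guards.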

I've never written Linux kernel code before so it's all about baby steps. Bugs in this code take out the system.
Posts: 315
Joined: Mon Jan 30, 2012 2:09 pm
Location: London
by chrisost » Thu May 17, 2012 3:24 pm
I haven't looked too much into what it would take to get it working with a RPi, but I've seen various comments about an OpenGL backed XOrg server called Glamor: http://www.freedesktop.org/wiki/Software/Glamor

From a brief look at the docs, it appears to be an attempt to use any OpenGL backend (EGL included) for accelerated X. This may be another option, though I'm also not sure how mature it is at this point: it does still seem to be going through some growing pains.
Posts: 8
Joined: Thu Jan 26, 2012 5:48 pm
by shirro » Thu May 17, 2012 3:59 pm
I have compiled glamor on Raspbian and had a look at the code. I quickly decided it was way beyond me.

There seems to be a lot of interest there for anyone wanting to build an X server that targets gles - lots of clever looking stuff in there that might be reused for particular drawing algorithms and stuff if someone was building something from scratch. The trouble with using glamor is that it only does part of the job and the Intel driver does the rest and there is no driver other than the Intel one to use as a prototype. I haven't looked at the Intel driver but I imagine pulling the bits out to make something for the Pi wouldn't be fun.

So it doesn't look to be a quick fix to me anyway. If anyone with a better understanding wants to look at it I can at least confirm that it builds under Raspbian as the xorg server in Wheezy is recent enough.
Posts: 248
Joined: Tue Jan 24, 2012 4:54 am
by chrisost » Fri May 18, 2012 3:56 am
shirro wrote:The trouble with using glamor is that it only does part of the job and the Intel driver does the rest and there is no driver other than the Intel one to use as a prototype.


That's too bad. The little reading that I had done gave the impression of it being much more general.
Posts: 8
Joined: Thu Jan 26, 2012 5:48 pm
by shirro » Fri May 18, 2012 4:16 am
I think glamor is trying to be general. They look to have abstracted all the gles stuff out so it could be reused. Just saying it isn't a driver by itself, so it doesn't look like a trivial amount of work unless you're all over this stuff to begin with. And I think Intel assume you have stuff like kms, drm, gem and so on for your graphics card, which would need to be adapted for videocore. And who knows what gl and egl extensions they require. You could probably hack the frame buffer driver and add some EXA accel a lot faster. Of course I am a total noob on this stuff so I could be wrong.
Posts: 248
Joined: Tue Jan 24, 2012 4:54 am
by teh_orph » Fri May 18, 2012 4:19 pm
Early days still, but blits and fills are now jumping along at 2GB/s...
Off for some beers this evening.
Posts: 315
Joined: Mon Jan 30, 2012 2:09 pm
Location: London
by teh_orph » Sat May 19, 2012 4:44 pm
DMA is now hooked up to Xorg...and it actually works. I haven't crashed it yet either! A fair few operations are now being lifted off the CPU. The baby steps continue.
Posts: 315
Joined: Mon Jan 30, 2012 2:09 pm
Location: London
by MattPurland » Sat May 19, 2012 5:29 pm
Excellent work, keep it up!
Posts: 55
Joined: Fri Apr 13, 2012 7:37 pm
by ArborealSeer » Sat May 19, 2012 7:41 pm
MattPurland wrote:Excellent work, keep it up!
Pi Status > Farnell, Arrived 24/5- RS, Arrived 1/6
Posts: 292
Joined: Tue Jan 24, 2012 9:48 am