jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 23682
Joined: Sat Jul 30, 2011 7:41 pm

Re: Simon's accelerated X development thread

Wed Nov 07, 2012 8:04 am

Actually, quite a few cars with turbos can have their ECUs modified to produce considerably more power than the manufacturer's settings allow. These are third-party mods, reverse engineered.

That's perhaps a better analogy. Thanks!

Although in our case, we do intend to release our 'turbo' settings modification when it's done. And of course, our turbocharger is already available for use for other tasks.
Principal Software Engineer at Raspberry Pi (Trading) Ltd.
Contrary to popular belief, humorous signatures are allowed. Here's an example...
"My grief counseller just died, luckily, he was so good, I didn't care."

teh_orph
Posts: 346
Joined: Mon Jan 30, 2012 2:09 pm
Location: London

Re: Simon's accelerated X development thread

Wed Nov 07, 2012 10:15 am

hdante wrote: Simon, do you need/want to offload part of the code development?
Nah, I'm good thanks. There's not really that much code so far (about 5.5k lines) but it is pretty fiddly and all the tasks are quite serial. I also program in a team by day so it's nice to not have to work with others! :)
That said I do need some small asm functions written if anyone cares to do that. Writing assembly by day and then coming home to do the same isn't healthy.
And yes, raspberry pies coming with software rendering fbdevs are retarded (think about the above analogy).
lol wut
Though I would beg to differ on the software rendering front! X is terribly latency sensitive - throughput is rarely an issue. Frequently X hands you a task that involves drawing <100 pixels and then forces you to synchronise (think text). I can render 100 pixels on the CPU far faster than I could with the GPU.
In fact an issue that Dom discovered when testing my code is that Netsurf (?) viewing Engadget generates thousands of one-pixel tasks, with lots of synchronisation. Unfortunately I was handing these tasks off to external hardware with a set-up cost of 6 µs/task, and this pretty much hung the system when I tested it! I could have done each task on the CPU in a few nanoseconds.
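
A rough back-of-envelope sketch of those numbers, in Python. Only the 6 µs set-up cost comes from the post; the task count of 5,000 and the 5 ns per one-pixel task are illustrative assumptions standing in for "thousands" and "a few nanoseconds", and synchronisation stalls are not modelled at all.

```python
# Back-of-envelope: fixed set-up cost vs. doing tiny tasks on the CPU.
SETUP_US_PER_TASK = 6.0   # from the post: set-up cost of the external hardware
CPU_NS_PER_TASK = 5.0     # assumption: "a few nanoseconds" per one-pixel task
NUM_TASKS = 5000          # assumption: "thousands" of one-pixel tasks


def offload_setup_ms(num_tasks, setup_us=SETUP_US_PER_TASK):
    """Time spent purely on set-up when every task is offloaded, in ms."""
    return num_tasks * setup_us / 1000.0


def cpu_total_ms(num_tasks, cpu_ns=CPU_NS_PER_TASK):
    """Time to do every task directly on the CPU, in ms."""
    return num_tasks * cpu_ns / 1_000_000.0


if __name__ == "__main__":
    print(f"offload set-up only: {offload_setup_ms(NUM_TASKS):.3f} ms")
    print(f"CPU, all the work:   {cpu_total_ms(NUM_TASKS):.3f} ms")
```

Under these assumed figures the set-up alone comes to about 30 ms, against roughly 0.025 ms for simply doing the work on the CPU.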

stagefright1989
Posts: 12
Joined: Tue Nov 06, 2012 2:12 pm

Re: Simon's accelerated X development thread

Wed Nov 07, 2012 10:22 am

If only we had a better CPU, like an ARMv7 Cortex-A8, then I think this accelerated fbdev driver could have resolved this. Also, is this similar to the DirectFB project? They also have binaries. Please clarify this latter part.

teh_orph
Posts: 346
Joined: Mon Jan 30, 2012 2:09 pm
Location: London

Re: Simon's accelerated X development thread

Wed Nov 07, 2012 10:39 am

Ah I'm not fussed by having better kit - you will always want something better. The hardware we have is more than capable of driving the display, you just need to learn how to structure your code to get the best from it. Most of the performance deficit comes from the many layers the application must traverse to paint a handful of pixels on the screen. Making that drawing faster (or even zeroing its runtime entirely) doesn't actually make a big dent in performance.
For this reason I've added an option to "do no work" to xorg.conf. Of course the image is completely corrupt but you can still make out what's going on through the corruption and you can also see that it's still slow.

Worth a read if you're unfamiliar with it:
http://en.wikipedia.org/wiki/Amdahl's_l ... al_program
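
For anyone who would rather see it as code, here is a minimal sketch of Amdahl's law in Python. The function names are made up for illustration, and the example fraction is a hypothetical figure, not a measurement from the driver. The "do no work" option corresponds to the limiting case where the accelerated part takes zero time.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the runtime is made s times faster."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a fraction between 0 and 1")
    if s <= 0.0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - p) + p / s)


def amdahl_limit(p):
    """Upper bound on the speedup: the accelerated part takes no time at all."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a fraction between 0 and 1")
    return float("inf") if p == 1.0 else 1.0 / (1.0 - p)


if __name__ == "__main__":
    drawing_fraction = 0.10  # hypothetical: drawing is 10% of total time
    print(f"2x faster drawing: {amdahl_speedup(drawing_fraction, 2.0):.3f}x")
    print(f"free drawing:      {amdahl_limit(drawing_fraction):.3f}x")
```

If drawing really were only 10% of the total time, making it free would give at most about a 1.11x overall speedup, which is why the layers above the driver matter so much.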

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 23682
Joined: Sat Jul 30, 2011 7:41 pm

Re: Simon's accelerated X development thread

Wed Nov 07, 2012 12:24 pm

stagefright1989 wrote: If only we had a better CPU, like an ARMv7 Cortex-A8, then I think this accelerated fbdev driver could have resolved this. Also, is this similar to the DirectFB project? They also have binaries. Please clarify this latter part.
I remember using Sun machines running X 25 years ago; they were certainly fast enough, and their CPUs were slower than the Raspi's. An A8 will be at most twice as fast for most tasks, so not that much of an improvement.

lb
Posts: 261
Joined: Sat Jan 28, 2012 8:07 pm

Re: Simon's accelerated X development thread

Wed Nov 07, 2012 2:00 pm

Cortex-A8 has NEON SIMD, though, which greatly speeds up most important operations (copy, composite). It's possible to achieve 4-5x speedups of typical alpha blending operations (compared to scalar code), as long as memory bandwidth allows it.

teh_orph
Posts: 346
Joined: Mon Jan 30, 2012 2:09 pm
Location: London

Re: Simon's accelerated X development thread

Wed Nov 07, 2012 2:15 pm

Don't even get me started on NEON! <snip>

One thing you need to consider with composite operations is that you can take short cuts a lot of the time. For instance, with 'over' you can have short cuts if the mask=0 or 255, and also short cuts depending on the value of the destination. Since most composition ops are for text rendering a large number of pixels have trivial outcomes.
Testing for this with NEON would be a nightmare, due to the dreadful connection normally found between NEON and the ARM core. So sure, you might get a decent speed up for a large square in a benchmark but it'll likely be slower where it's used most frequently.
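
To make the shortcut idea concrete, here is a minimal per-pixel sketch in Python (for readability only; real driver code would be C or assembly). It assumes premultiplied (a, r, g, b) tuples with 8-bit channels and an 8-bit mask, and the function names are hypothetical. It shows only the mask shortcuts; shortcuts based on the destination value are left out.

```python
def mul_div_255(a, b):
    """Rounded (a * b) / 255 for 8-bit values, without a division."""
    t = a * b + 128
    return (t + (t >> 8)) >> 8


def over_general(src, mask, dst):
    """(src IN mask) OVER dst for one premultiplied (a, r, g, b) pixel."""
    s = tuple(mul_div_255(c, mask) for c in src)   # scale source by the mask
    inv_alpha = 255 - s[0]                         # how much of dst shows through
    return tuple(min(255, sc + mul_div_255(dc, inv_alpha))
                 for sc, dc in zip(s, dst))


def over_with_shortcuts(src, mask, dst):
    """Same result as over_general, but trivial cases skip the arithmetic."""
    if mask == 0:
        return dst          # fully masked out: destination is untouched
    if mask == 255 and src[0] == 255:
        return src          # opaque source, full mask: plain overwrite
    return over_general(src, mask, dst)


if __name__ == "__main__":
    white = (255, 255, 255, 255)
    black = (255, 0, 0, 0)
    for m in (0, 128, 255):  # typical glyph mask: mostly 0 or 255
        print(m, over_with_shortcuts(white, m, black))
```

In scalar code each shortcut is a cheap branch per pixel; in a SIMD lane the same test has to be made across a whole vector at once, which is the awkward part described above.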

Let's also think about the size of the register file in combination with the long latency of integer SIMD operations...<snip again>

I don't miss NEON one bit! I'd much rather have dual issue.
EDIT: ah I can't believe SIMD can get me worked up...it must be memories of fun times at work

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 23682
Joined: Sat Jul 30, 2011 7:41 pm

Re: Simon's accelerated X development thread

Wed Nov 07, 2012 2:44 pm

lb wrote:Cortex-A8 has NEON SIMD, though, which greatly speeds up most important operations (copy, composite). It's possible to achieve 4-5x speedups of typical alpha blending operations (compared to scalar code), as long as memory bandwidth allows it.
In the areas where NEON would improve things, the VideoCore GPU would improve it even further - it has a 16-way vector core running at 250 MHz, plus dedicated HW for lots of stuff. As Simon says, small stuff should be done by the CPU, big stuff handed off to the GPU (once the speed increase offsets the setup cost).

teh_orph
Posts: 346
Joined: Mon Jan 30, 2012 2:09 pm
Location: London

Re: Simon's accelerated X development thread

Wed Nov 07, 2012 3:06 pm

jamesh wrote: In the areas where NEON would improve things, the VideoCore GPU would improve it even further - it has a 16-way vector core running at 250 MHz, plus dedicated HW for lots of stuff. As Simon says, small stuff should be done by the CPU, big stuff handed off to the GPU (once the speed increase offsets the setup cost).
Exactly. However, since a lot of X work is actually a large series of small operations with frequent CPU synchronisation, a large chunk of pixel painting is best done with the CPU. Places where the GPU is best used appear to be large composite operations, such as drawing a big alpha'd selection box over the icons on your desktop. Like 200x200+.

However what gives me The Fear is that if I release this driver "with most work being done by the CPU" plenty of people will turn their noses up at it saying "I'll wait for a proper driver that uses the GPU/GLES/VG". I'll then cry!

stagefright1989
Posts: 12
Joined: Tue Nov 06, 2012 2:12 pm

Re: Simon's accelerated X development thread

Wed Nov 07, 2012 5:16 pm

U give a pleasant smooth ui experience plus d other things rpi brings to table like gpio/automation plus video playback/3D courtesy openmax/opengles and i believe a vast majority will b happy. Man would i b pleased if rpi cud offer even basic 2D accel like the old days gma900 I guess this sounds like jumping the gun.


also what abt that the directfb project ? they have some binaries for rpi.

http://directfb.org/index.php?path=Main ... 0-07-0.dok

tufty
Posts: 1456
Joined: Sun Sep 11, 2011 2:32 pm

Re: Simon's accelerated X development thread

Wed Nov 07, 2012 8:54 pm

teh_orph wrote: I'll then cry!
No, then you'll write a proper driver that uses the GPU, and then you'll get flamed for not being open enough.

ghans
Posts: 7873
Joined: Mon Dec 12, 2011 8:30 pm
Location: Germany

Re: Simon's accelerated X development thread

Wed Nov 07, 2012 9:19 pm

@stagefright1989

But the Pi offers 2D and 3D support. How else would PenguinsPuzzle work?
OpenVG and OpenGL ES are already opened up! (They could even be used before, free of cost!)


ghans
• Don't like the board ? Missing features ? Change to the prosilver theme ! You can find it in your settings.
• Don't like to search the forum BEFORE posting 'cos it's useless ? Try googling : yoursearchtermshere site:raspberrypi.org

stagefright1989
Posts: 12
Joined: Tue Nov 06, 2012 2:12 pm

Re: Simon's accelerated X development thread

Thu Nov 08, 2012 4:41 am

Dont apps have to b spacially written in openvglibs for 2D acceleration when it comes to rpi

malakai
Posts: 1382
Joined: Sat Sep 15, 2012 10:35 am

Re: Simon's accelerated X development thread

Thu Nov 08, 2012 5:27 am

teh_orph wrote:However what gives me The Fear is that if I release this driver "with most work being done by the CPU" plenty of people will turn their noses up at it saying "I'll wait for a proper driver that uses the GPU/GLES/VG". I'll then cry!
No matter what happens there will be people on both sides. What should hurt is those that have been waiting and appreciate the work but don't get a chance to have at it. Anything right now that can speed up X should be appreciated.
http://www.raspians.com - always looking for content feel free to ask to have it posted. Or sign up and message me to become a contributor to the site. Raspians is not affiliated with the Raspberry Pi Foundation. (RPi's + You = Raspians)

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 23682
Joined: Sat Jul 30, 2011 7:41 pm

Re: Simon's accelerated X development thread

Thu Nov 08, 2012 9:25 am

stagefright1989 wrote:Dont apps have to b spacially written in openvglibs for 2D acceleration when it comes to rpi
Yes. But I think you are misunderstanding what X acceleration gives you. What it means is that desktop type operations - window drawing/compositing, moving etc will become faster, and reduce load on the CPU. If you want fast 2D and 3D you STILL NEED to use the acceleration libraries for those - libVG, libOGES etc. Just having X acceleration doesn't mean your 2D/3D apps get miraculously faster.

Also, can you please stop using txt spk. It makes posts fantastically difficult to read.

teh_orph
Posts: 346
Joined: Mon Jan 30, 2012 2:09 pm
Location: London

Re: Simon's accelerated X development thread

Thu Nov 08, 2012 11:10 am

jamesh wrote:Also, can you please stop using txt spk. It makes posts fantastically difficult to read.
+1!
stagefright1989 wrote:also what abt that the directfb project ? they have some binaries for rpi.
Like the Android compositor example, this again is quite different to the X problem.
DirectFB again is just a compositor of its own 'windows'. If you want to run the normal desktop with it you'd need to run a rootless X server that generates a large window image for DirectFB to render. This rootless X server is the bottleneck, and is pretty much exactly what you are currently using. All regular applications would still get drawn through this unaccelerated rootless X. Sorry!
tufty wrote:
teh_orph wrote: I'll then cry!
No, then you'll write a proper driver that uses the GPU, and then you'll get flamed for not being open enough.
Lols :)

stagefright1989
Posts: 12
Joined: Tue Nov 06, 2012 2:12 pm

Re: Simon's accelerated X development thread

Tue Nov 13, 2012 7:18 am

Ah, I don't know much graphics programming. Is anybody able to render the pixman library through OpenVG/OpenGL ES?

Kalimar
Posts: 5
Joined: Thu Nov 08, 2012 10:23 am

Re: Simon's accelerated X development thread

Thu Nov 15, 2012 10:20 am

Any progress on this? Please keep us posted! :)

stagefright1989
Posts: 12
Joined: Tue Nov 06, 2012 2:12 pm

Re: Simon's accelerated X development thread

Sat Nov 17, 2012 6:12 am

Guys, do you think this can be of any help? It's not pixman, but still.

http://code.google.com/p/cairogles/

teh_orph
Posts: 346
Joined: Mon Jan 30, 2012 2:09 pm
Location: London

Re: Simon's accelerated X development thread

Sat Nov 17, 2012 6:28 pm

Kalimar wrote:Any progress on this? Please keep us posted! :)
Sorry for the lack of news, proper busy on this end. I still need more feedback from my testers btw...
I have spent some time debugging performance problems found by Dom and I'm not sure what to do about them - I'm tempted to ignore them! Some applications just send really poorly-structured workloads towards the driver and I can't see how any driver could do a good job with them. Only a high CPU clock speed can hide this!

Once again the applications insist on spending their X time in non-graphics related tasks. For instance, I've found that another issue is due to IDLE allocating and deallocating 1x1 pixel pixmaps on the X server many, many times per screen refresh. The amount of time spent actually drawing the screen is tiny; far more is spent in the allocator and X RPC.

To handle other ridiculous workloads I'll probably write a fallback this evening (during X factor) to ensure we don't DMA accelerate tasks which could be done far faster on the CPU.
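
A fallback of that kind could be sketched roughly as below, in Python for readability. This is not the driver's code: the names are hypothetical, the per-pixel costs are made-up assumptions, and only the 6 µs set-up cost comes from earlier in the thread. Synchronisation cost is ignored.

```python
SETUP_NS = 6000.0        # 6 us set-up cost per offloaded task (from the thread)
CPU_NS_PER_PIXEL = 5.0   # assumption, for illustration only
DMA_NS_PER_PIXEL = 1.0   # assumption, for illustration only


def break_even_pixels(setup_ns=SETUP_NS, cpu_ns=CPU_NS_PER_PIXEL,
                      dma_ns=DMA_NS_PER_PIXEL):
    """Pixel count above which offloading starts to win, or None if it never does."""
    if cpu_ns <= dma_ns:
        return None
    return setup_ns / (cpu_ns - dma_ns)


def choose_backend(width, height, setup_ns=SETUP_NS, cpu_ns=CPU_NS_PER_PIXEL,
                   dma_ns=DMA_NS_PER_PIXEL):
    """Return "cpu" or "dma" for one task, whichever has the lower estimated cost."""
    pixels = width * height
    cpu_cost = pixels * cpu_ns
    dma_cost = setup_ns + pixels * dma_ns
    return "cpu" if cpu_cost <= dma_cost else "dma"


if __name__ == "__main__":
    print("break-even pixels:", break_even_pixels())
    for w, h in ((1, 1), (10, 10), (200, 200)):
        print(f"{w}x{h} ->", choose_backend(w, h))
```

With these assumed costs the break-even point is 1,500 pixels, so a 1x1 task stays on the CPU while a 200x200 composite goes to the DMA engine. The real threshold would have to be measured.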

But just to reiterate, we could get a far greater win from some applications if they simply restructured their rendering code...

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 23682
Joined: Sat Jul 30, 2011 7:41 pm

Re: Simon's accelerated X development thread

Sat Nov 17, 2012 10:11 pm

Out of interest, which applications are the most inefficient?

ScottyPcGuy_03
Posts: 8
Joined: Thu Nov 01, 2012 8:34 pm
Location: VA

Re: Simon's accelerated X development thread

Sun Nov 18, 2012 1:01 am

Yes, do try to get a list together, that would allow some of us to start patching commonly-used packages.

(not sure what I'm getting myself into, since I'm somewhat of a novice programmer! :lol: )
Scott

What do you mean I can't eat it? It's a pie, right?

teh_orph
Posts: 346
Joined: Mon Jan 30, 2012 2:09 pm
Location: London

Re: Simon's accelerated X development thread

Sun Nov 18, 2012 10:13 am

Well I could talk for ages about rendering extensions etc, but here's a short list of things off the top of my head:
- currently at the top of my hit list would be the IDLE text editor that lives on the desktop, written in Python. I haven't yet found where the rendering code lives so I can inspect it, but I know it uses Tcl/Tk (?) so the rendering problems could be due to some side-effect within those libraries. Gedit works much better by comparison.
- the Python game demos on the desktop do not use the XRender extension, so currently I offer no acceleration for them (to be fair they're not 'slow' though!)
- some window managers are much better than others. I would currently avoid anything with a 'compositing' mode. Window Maker is my #1 "can't recommend highly enough" for the rpi!
- the huge rpi logo on the LXDE desktop saps far more performance than you would imagine. I appreciate it's part of the branding but if you want a higher refresh rate when dragging a window over it then - with a non-compositing window manager - change to a solid colour
- but saying that, leave the anti-aliased fonts turned on :)

charliedurrant
Posts: 35
Joined: Sun Mar 18, 2012 12:06 am

Re: Simon's accelerated X development thread

Sun Nov 18, 2012 10:14 am

Simon,

Happy to beta test and then patch programs accordingly.

Charlie

charliedurrant
Posts: 35
Joined: Sun Mar 18, 2012 12:06 am

Re: Simon's accelerated X development thread

Sun Nov 18, 2012 10:19 am

I posted at the same time as your reply. I'll take a look at the general rendering code in IDLE. I expect there will be no need to actually install the driver (if you think differently, just say).

Charlie
