mung
Posts: 506
Joined: Fri Nov 18, 2011 10:49 am

X11,GLES,EGL,Wayland/Weston,vc4,GPU,CPU,DMA,MMU for idiots?

Mon Aug 25, 2014 11:26 am

I have been messing around with the rpi for quite some time, and today thought to myself that I really know very little about the underlying hardware or how to optimize lower-level code.

I sort of have a fuzzy understanding of some of the parts of a cpu, but am not really au fait with how things really work or the timing and speed of hardware.

Is there any short "rpi optimization for idiots" type guide available that would give beginners general useful tips and stats about speed of different parts of the bcm hardware and how they fit together?

As I said, I really am a newbie to any sort of optimization or low-level stuff on the rpi, and what I am asking may be stupid as I am rather uninformed, but...

I am thinking about things like:
  • DMA: can it write to ARM cache memory, how fast is it (setup time, transfer times, etc.), how does it interact with the ARM MMU, and what address types does it use (virtual/flat/paged)?
  • MMU: what are the specs of the ARM MMU (transfer speed, setup time, wait states, cache interactions), does the CPU sit idle while memcpys occur, how do the MMU and DMA compare and what are the reasons for using each, and if they can interact, what problems could occur?
  • VC4 communication: how do the VC4 and ARM exchange data, how does the memory split function, what are the speeds and types of transfer, how do locking and DMA work, is it possible to share memory space between ARM and VC4, and does the VC4 have faster cache memory available or other possible ways of speeding things up?
  • ARM cache memory: how does this get used, what are the stats (speed, load times), what kernel code works out how to cache data or program code, and is it possible to lock program code into cache from userspace and prevent the kernel flushing it?
  • Mesa/X11/Wayland: memory transfers, sharing, and mapping gles/egl/openvg buffers and code into windowing systems. Are there any basic optimization or coding methods that everyone should know to get the best speed from the 3d hardware in the VC4, both low level and for those writing code that just runs on the libraries? Also, how does the rpi Wayland work with Mesa, as I got the feeling the Collabora Wayland was based on dispmanx rather than EGL?
I am sort of thinking that a reasonably simple outline of the important parts might make people realise that hacking on lower-level stuff is not so hard, and might bring in improvements/hacks/bug reports on some of the new Mesa code that anholt is working on.

I think this type of stuff needs some very talented experts to lay the foundations and then give good explanations, so others can pitch in with bug reports or bug fixes and actually use the infrastructure in the best way. I know there is no substitute for reading the source code, but an overview really makes code reading much easier, and with wayland there also seems to be very little application programming example code available yet.

I hope this post makes sense, and I hope others can offer some advice.

AndrewS
Posts: 3625
Joined: Sun Apr 22, 2012 4:50 pm
Location: Cambridge, UK

Re: X11,GLES,EGL,Wayland/Weston,vc4,GPU,CPU,DMA,MMU for idio

Thu Aug 28, 2014 1:17 am

I'm no expert in any of the areas you're asking about, but lots of it has already been discussed at various times on the forums... (of course finding that info again may be easier said than done!)

As well as laboriously looking through the source code, lots of useful info can also be found in the wikis, commit histories, and various issue-tracker discussions on the Raspberry Pi github repos. (e.g. CMA or the recent VCMEM stuff which got reverted)
And have you already had a look at http://www.raspberrypi.org/open-source-arm-userspace/ and http://www.raspberrypi.org/a-birthday-p ... -broadcom/ ?
And much of the info is outdated now, but you could also have a look around http://elinux.org/R-Pi_Hub and there's probably other collections of info/tutorials scattered haphazardly across the interwebs.
And of course there's also http://www.raspberrypi.org/documentation/

Perhaps you'd be better focusing on a single area at a time (these are all big topics) rather than trying to look at everything at once? :?

mung
Posts: 506
Joined: Fri Nov 18, 2011 10:49 am

Re: X11,GLES,EGL,Wayland/Weston,vc4,GPU,CPU,DMA,MMU for idio

Thu Aug 28, 2014 9:05 pm

AndrewS wrote: Perhaps you'd be better focusing on a single area at a time (these are all big topics) rather than trying to look at everything at once? :?
Well, I think I am really hoping for a simple overview, so I know what options are available, how they fit together, and which is best/fastest/simplest for which purpose.

My main purpose is writing a gui app that will run fast (gles/openVG acceleration), run within one of the linux windowing systems (wayland, X11, mir) and hopefully be cross platform.

I know there are libs and demos for EGL/gles/openVG in /opt/vc, but how do they fit together? E.g. can gles and openVG be used together, is there an advantage in using both gles and openVG for different parts of an application, how does compositing work, and what exactly does dispmanx do?

What do all the acronyms mean? (dmaer, mailboxes, shims, CMA, VCMEM, GPU, VPU, QPU)

All the necessary information is on the interwebs, but it could take weeks of research to find/read/understand.

I want something short, to the point and easy to understand. My guess is that perhaps everything is insufficiently mature for things to have stabilised yet; maybe when the mesa/gallium hardware acceleration is available everything will work together?

I have read a few links but still don't feel I have a full understanding:

http://www.raspberrypi.org/forums/viewt ... 71&t=47832
http://www.raspberrypi.org/forums/viewt ... 63&t=74651
http://www.raspberrypi.org/forums/viewt ... 35&p=88453
http://www.raspberrypi.org/forums/viewt ... 63&t=84554
http://www.raspberrypi.org/forums/viewt ... =63&t=4649
http://www.raspberrypi.org/forums/viewtopic.php?t=45746
http://www.raspberrypi.org/forums/viewt ... =33&t=7672
http://www.raspberrypi.org/forums/viewt ... =63&t=5532&
http://www.raspberrypi.org/forums/viewt ... 5&start=50
https://github.com/simonjhall/dma
https://github.com/simonjhall/copies-and-fills
https://github.com/ajstarks/openvg
http://mindchunk.blogspot.co.uk/2012/09 ... ry-pi.html

mimi123
Posts: 583
Joined: Thu Aug 22, 2013 3:32 pm

Re: X11,GLES,EGL,Wayland/Weston,vc4,GPU,CPU,DMA,MMU for idio

Fri Aug 29, 2014 8:56 am

Forget about the word "GPU"; it is not in the VC4 vocabulary.
The VPU is a dual-core general-purpose CPU. It has some demos at github.com/freeblob/samples, and the instructions are in the thread on the bare-metal forum.
The VC4 is the entire SoC except the ARM.
The QPUs are the shader processors; they are part of the V3D.

NF3RN0
Posts: 36
Joined: Thu Jan 24, 2013 5:28 am
Location: Texas

Re: X11,GLES,EGL,Wayland/Weston,vc4,GPU,CPU,DMA,MMU for idio

Sat Oct 25, 2014 3:23 am

I feel like I am in the same boat, trying to learn the same things. mung, have you found any information lately that was enlightening to you? Like you, I am also trying to learn how to create GUI applications that run efficiently (with hardware acceleration) on the Raspberry Pi.

mimi123
Posts: 583
Joined: Thu Aug 22, 2013 3:32 pm

Re: X11,GLES,EGL,Wayland/Weston,vc4,GPU,CPU,DMA,MMU for idio

Sat Oct 25, 2014 2:50 pm

NF3RN0 wrote: I feel like I am in the same boat, trying to learn the same things. mung, have you found any information lately that was enlightening to you? Like you, I am also trying to learn how to create GUI applications that run efficiently (with hardware acceleration) on the Raspberry Pi.
Did you develop for Android or GLX OpenGL 2.1 before?

mung
Posts: 506
Joined: Fri Nov 18, 2011 10:49 am

Re: X11,GLES,EGL,Wayland/Weston,vc4,GPU,CPU,DMA,MMU for idio

Sun Oct 26, 2014 9:39 pm

I sort of get the feeling that it's still too much of a moving target to bother yet (or maybe I am just lazy or lacking time?).

Once the vc4 mesa drivers get sorted would be the time to start work. As the mesa stuff will probably be doing strange things with dma (which I use for other parts of my program), I decided it's better to leave it until the mesa work is in general release, then look at things and see if there are any conflicts. Also, I am working with a real-time linux kernel, so that's another possible cause of conflicts, as the mesa work will add kernel changes.

I really did not spend enough time to learn anything much; I downloaded the mesa work and patched the kernel but never actually got round to testing anything.

I have browsed a bit of the huge quantity of information available on google (I could post my browser history, but I am sure everyone knows how to google).

I have not really found much in terms of stuff suitable for the idiot/amateur. There is a fair bit of stuff that is very low level (I really don't want to mess with assembly), and not much in terms of compiler options for gcc (C probably being the closest to something I am able to work with easily).

If you want to go low level, the qpu is probably going to give huge throughput, but it will take a lot of experimentation to learn, and then you have to integrate it with your ARM-based code using dma. I worked through a few of the tutorials on qpu programming and decided it's not really time to go any further yet.

http://www.raspberrypi.org/forums/viewt ... 72&t=78414
http://petewarden.com/2014/08/07/how-to ... g-its-gpu/
http://rpiplayground.wordpress.com/2014 ... ofit-pt-1/

I still don't really have much idea, and no real way to profile accurately what is happening in programs; as I have implied, I am an idiot (amateur). I have not found anything about locking threads or functions into cache, or avoiding cache pollution, using gcc compiler options (maybe there are RT options for this?). I was probably asking a silly question, because low-level stuff is very complex and time consuming to understand, definitely not quick idiot hackable, and really needs going to assembly rather than using C.

I did find a couple of things that seemed to help, which are really obvious and which I should have done first before anything else.
1) optimise loops and check what variables are used
2) copy used data from struct pointers into local 'register' variables (I am considering trying to minimise struct sizes and create some unions with smaller structs for function passing; I don't know if that will help?).

Unfortunately this sort of stuff needs more time than I have available for hobbies, nothing is simple. :lol:

mung
Posts: 506
Joined: Fri Nov 18, 2011 10:49 am

Re: X11,GLES,EGL,Wayland/Weston,vc4,GPU,CPU,DMA,MMU for idio

Mon Nov 10, 2014 1:33 am

I have been having another quick look at one of my hack projects that relates to this stuff, and thought I should chuck in some more links that may be useful. I am guessing DMA cache interaction may have something to do with some problems I am having (my guess is that the cache does not get flushed out to main memory quickly enough, so the DMA is not copying the correct data).

I am hoping to look at my code again after some research this evening, adding in some calls to the gcc built-in function 'void __builtin___clear_cache (char *begin, char *end)' and the <unistd.h> cacheflush command. This is assuming gcc and the kernel have those implemented, and that they flush the data cache as well as the instruction cache?

I found the following links that may or may not be relevant to cache flushing:

http://community.arm.com/groups/process ... fying-code
http://stackoverflow.com/questions/6046 ... nux-2-6-35
http://stackoverflow.com/questions/1581 ... spberry-pi
https://github.com/simonjhall/dma
http://mechanical-sympathy.blogspot.co. ... llacy.html
http://lxr.free-electrons.com/source/ar ... cheflush.h

NF3RN0
Posts: 36
Joined: Thu Jan 24, 2013 5:28 am
Location: Texas

Re: X11,GLES,EGL,Wayland/Weston,vc4,GPU,CPU,DMA,MMU for idio

Mon Nov 10, 2014 2:18 am

mimi123 wrote:
NF3RN0 wrote: I feel like I am in the same boat, trying to learn the same things. mung, have you found any information lately that was enlightening to you? Like you, I am also trying to learn how to create GUI applications that run efficiently (with hardware acceleration) on the Raspberry Pi.
Did you develop for Android or GLX OpenGL 2.1 before?

No I have not on either one.

mimi123
Posts: 583
Joined: Thu Aug 22, 2013 3:32 pm

Re: X11,GLES,EGL,Wayland/Weston,vc4,GPU,CPU,DMA,MMU for idio

Fri Nov 14, 2014 7:32 pm

NF3RN0 wrote:
mimi123 wrote:
NF3RN0 wrote: I feel like I am in the same boat, trying to learn the same things. mung, have you found any information lately that was enlightening to you? Like you, I am also trying to learn how to create GUI applications that run efficiently (with hardware acceleration) on the Raspberry Pi.
Did you develop for Android or GLX OpenGL 2.1 before?

No I have not on either one.
Did you program for PC? (if not, don't try first on a Pi, OpenGL 2.1 is a complicated API and is still not 100% reliable on a Pi :roll: )

mung
Posts: 506
Joined: Fri Nov 18, 2011 10:49 am

Re: X11,GLES,EGL,Wayland/Weston,vc4,GPU,CPU,DMA,MMU for idio

Thu Nov 20, 2014 10:20 am

I just thought I should post this link, as it looks like a really great piece of literate script, with lots of comments and links to relevant info on DMA and other memory stuff. I have not fully read it; it looks mostly the same as rpio and servoblaster, but the comments look as though they should give a lot more help in understanding.

https://github.com/Wallacoloo/Raspberry ... dma-gpio.c

I am still wondering what wayland/mesa will do to the DMA usage and how it may affect other programs using the DMA?

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 26442
Joined: Sat Jul 30, 2011 7:41 pm

Re: X11,GLES,EGL,Wayland/Weston,vc4,GPU,CPU,DMA,MMU for idio

Thu Nov 20, 2014 10:31 am

mimi123 wrote:
NF3RN0 wrote:
mimi123 wrote: Did you develop for Android or GLX OpenGL 2.1 before?

No I have not on either one.
Did you program for PC? (if not, don't try first on a Pi, OpenGL 2.1 is a complicated API and is still not 100% reliable on a Pi :roll: )
What do you mean by not reliable? AFAIK, it passes ALL the Khronos acceptance tests, so should be fully compliant.
Principal Software Engineer at Raspberry Pi (Trading) Ltd.
Contrary to popular belief, humorous signatures are allowed.
I've been saying "Mucho" to my Spanish friend a lot more lately. It means a lot to him.

mung
Posts: 506
Joined: Fri Nov 18, 2011 10:49 am

Re: X11,GLES,EGL,Wayland/Weston,vc4,GPU,CPU,DMA,MMU for idio

Thu Nov 20, 2014 11:33 am

jamesh wrote:
mimi123 wrote: Did you program for PC? (if not, don't try first on a Pi, OpenGL 2.1 is a complicated API and is still not 100% reliable on a Pi :roll: )
What do you mean by not reliable? AFAIK, it passes ALL the Khronos acceptance tests, so should be fully compliant.
Is OpenGL2.1 available on the pi GPU?

I thought only EGL GLES were currently supported by GPU hardware?

I am assuming the anholt work on mesa/gallium means a full hardware accelerated version of Mesa OpenGL (not just GLES) will be available on the Pi, I also assume the current Mesa libs are all handled by software running on the ARM not accelerated by the GPU (I have never actually looked into this, but it seems reasonable assumption as OpenGL runs incredibly slow in X currently).

I always assumed Mesa is fully compliant but may or may not be accelerated depending what hardware/drivers you use.

Is the Mesa software stack not reliable on the Pi? I assume unreliability means crashes or incorrect rendering; how so?

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 26442
Joined: Sat Jul 30, 2011 7:41 pm

Re: X11,GLES,EGL,Wayland/Weston,vc4,GPU,CPU,DMA,MMU for idio

Thu Nov 20, 2014 11:54 am

An awful lot of assumptions in there.

I was talking about the HW implementation of OpenGLES that is supplied with the Raspi, rather than MESA libraries.

NF3RN0
Posts: 36
Joined: Thu Jan 24, 2013 5:28 am
Location: Texas

Re: X11,GLES,EGL,Wayland/Weston,vc4,GPU,CPU,DMA,MMU for idio

Fri Nov 21, 2014 4:41 am

mung wrote: I just thought I should post this link, as it looks like a really great piece of literate script, with lots of comments and links to relevant info on DMA and other memory stuff. I have not fully read it; it looks mostly the same as rpio and servoblaster, but the comments look as though they should give a lot more help in understanding.

https://github.com/Wallacoloo/Raspberry ... dma-gpio.c

I am still wondering what wayland/mesa will do to the DMA usage and how it may affect other programs using the DMA?
Whoa! You are not kidding, this is well documented. I'm glad someone takes the time to comment their code!

mung
Posts: 506
Joined: Fri Nov 18, 2011 10:49 am

Re: X11,GLES,EGL,Wayland/Weston,vc4,GPU,CPU,DMA,MMU for idio

Fri Nov 21, 2014 3:11 pm

Well, I had a brief read of the links in the comments of the code I posted earlier, and unfortunately there was no real help.

I am now thinking maybe I should give an outline of some of my specific problems, though maybe I should start a new thread for that?

I always try to get some outline overview before going into specifics. Unfortunately I am no expert and don't know much detail about anything, I just research based on what I want to achieve, so maybe I am not understanding some important concepts?

I outline below the general system developed:

I am trying to get high speed gpio control of hardware and visualisation using openGL in real time.

Visualisation is not important in a real-time context, but gpio output and input is. The code runs on an RT_PREEMPT patched kernel, and gpio is controlled (read/write) by DMA from a buffer that is precomputed in 504us blocks (this is the smallest size possible that avoids thrashing context switches and leaves reasonable processing time available to non-realtime processes). The actual problem is that reducing the gpio switching interval below 8us causes the gpio output to stop (absolutely no output, no errors or noise, just flat nothing).

Now the DMA system is really the most important part (as far as I am aware) and is based on servoblaster or similar code (I forget which, as there were a number of similar projects I used as reference and it's almost a couple of years since I did the work). I am assuming that the problem is bus contention, but I have no way of really knowing. There are no RT faults or overruns logged and the ARM code reports no errors, so I am assuming that the DMA is stopping for some reason.

The DMA uses 7 control blocks per switch interval, so each 504us buffer needs 63*7 blocks at 8us, 72*7 at 7us and 84*7 at 6us; you can probably do the rest of the math, as I forget the control block size. I wondered about disable_pvt=1 and the 16us gpu timing refresh pause, but am assuming it's bus contention. Each sequence does 3 blocks with single 32bit copies, 2 blocks with 64bit copies (two 32bit words with strides), another single 32bit copy, then a time delay using the pwm DREQ thing, then loops on to the next 7-block sequence.
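
The control block arithmetic in that paragraph can be written out as a short Python sketch. The 504us buffer length and the 7 control blocks per switch come from the post; the 32-byte control block size is an assumption (eight 32-bit words, as I understand the BCM2835 datasheet) and should be checked against the real code, and the function name is made up for illustration.

```python
# Figures quoted in the post: 504 us precomputed buffer, 7 control blocks per switch.
BLOCK_US = 504
CBS_PER_SWITCH = 7
# Assumption, not from the post: each control block is 32 bytes (eight 32-bit words).
CB_BYTES = 32


def control_blocks(switch_us):
    """Return (switches per buffer, control blocks per buffer) for one interval."""
    if BLOCK_US % switch_us:
        raise ValueError(f"{switch_us} us does not divide {BLOCK_US} us evenly")
    switches = BLOCK_US // switch_us
    return switches, switches * CBS_PER_SWITCH


for us in (8, 7, 6):
    switches, cbs = control_blocks(us)
    print(f"{us} us: {switches} switches, {cbs} control blocks, {cbs * CB_BYTES} bytes")
```

The buffer only grows by a few kilobytes between 8us and 6us under this assumption, so control block memory on its own looks an unlikely reason for the output to stop.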

Does anyone have any suggestions how to diagnose/test the problem and workout what is going on?

The driver (gpio output precomputing) has been tested on the ARM side in a realtime thread (driver written in C) and seems to work at 8us switching, but gives nothing at 7us or below. I have also made some initial tests running code on the vc4 (a very, very basic proof of concept that takes simple data from the ARM and copies data into the buffer area for the DMA to move out to the gpio pins), and that has the same problem.

Now that I think about it, I have not tested this outside of a gui system, so maybe I should, even though it's a load of hassle and it's not how the system will ever be used?

Also, I have never tried to change the DMA priority, as I assumed it was at the maximum allowed (I think 7, though it may be possible to change it up to 16). Would that be worth trying, and would it have any bad consequences?

Another possibility is that the gpio-driven output hardware looks as if it is not receiving anything because the output is too fast, but from the datasheet this seems unlikely.

The two tests where this fails (when switching below 8us) are what will be the normal use cases: one is running the gui in X11 with pyopengl and tkwidgets, the other is using pi3d as the gui without X11 running.

Does anyone have any suggestions on how to set up the system to debug these types of problems (is there some information available in the /proc/ filesystem?), and how can I find out if bus contention is occurring?

Any other suggestions that can be tried without the need for external testing hardware are welcome (I don't have an oscilloscope).

I maybe should have started this question in a new thread as it is rather specific, but it is the main reason for the original questions in this thread. If anyone has suggestions for an alternative place to post this, please let me know.

Sorry for the long, rambling, hard-to-understand post. I think maybe I am not in a logical mood for proper description and feel like what I have said may not make sense to others; let me know and I will try to clarify if able.

mung
Posts: 506
Joined: Fri Nov 18, 2011 10:49 am

Re: X11,GLES,EGL,Wayland/Weston,vc4,GPU,CPU,DMA,MMU for idio

Tue Nov 25, 2014 2:23 pm

Yeah, so I had a look at pi things and wasted huge amounts of time this weekend, stripping the code down to a small test that writes data out into memory, switches through a number of different delay speeds, and dumps the DMA registers, PWM registers, control block data etc. into a file.

I ran it from the command line, then in X while running glxgears and other apps.

I had a check of the data and realised pebkac hits again: somehow in previous hacking I had set PCM instead of PWM and messed with some other configs. So it seems to work okay now that I have reconfigured PWM, and should be good for at least 1MHz if I ever get time to create the vc4 code (which probably will not happen).
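
One way to catch a PCM/PWM mix-up in a register dump like this is to decode the peripheral mapping field of each control block's transfer information (TI) word. This is a minimal Python sketch, assuming the BCM2835 datasheet layout (PERMAP in bits 20:16 of TI, with DREQ 2 = PCM TX, 3 = PCM RX, 5 = PWM). It also assumes the mistake was in the DREQ mapping, which the post does not actually say. The TI values and helper names are made up for illustration.

```python
# Assumed layout, from the BCM2835 peripherals datasheet: the DMA transfer
# information (TI) word carries the DREQ peripheral mapping in bits 20:16.
PERMAP_SHIFT = 16
PERMAP_MASK = 0x1F

# Assumed DREQ numbers (only the ones relevant here).
DREQ_NAMES = {0: "always on", 2: "PCM TX", 3: "PCM RX", 5: "PWM"}


def permap(ti):
    """Extract the PERMAP field from a TI word."""
    return (ti >> PERMAP_SHIFT) & PERMAP_MASK


def dreq_name(ti):
    """Name the peripheral pacing a control block, or flag an unlisted one."""
    n = permap(ti)
    return DREQ_NAMES.get(n, f"unlisted DREQ {n}")


# Hypothetical TI words, as might appear in a control block dump.
dumped = {"delay block (expected)": 0x04050040, "delay block (found)": 0x04020040}
for label, ti in dumped.items():
    print(f"{label}: TI=0x{ti:08x} paced by {dreq_name(ti)}")
```

Running the same decode over every delay block in the dump would show at a glance whether the pacing source matches the peripheral that was actually configured.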

I think most of this mistake is due to the long intervals between getting round to hacking on the code, causing memory loss, and the poor revision control that I have, with multiple SD cards and other code cross compiled in various VMs.

I also discovered in my research this post that I had totally forgotten I made: http://www.raspberrypi.org/forums/viewt ... 4&p=515612

I am still wondering if anything else will conflict in future, as I think what I am planning will be seriously pushing the pi to its limits and I need a faster X openGL lib implementation.

mimi123
Posts: 583
Joined: Thu Aug 22, 2013 3:32 pm

Re: X11,GLES,EGL,Wayland/Weston,vc4,GPU,CPU,DMA,MMU for idio

Tue Dec 09, 2014 5:23 pm

mung wrote: Yeah, so I had a look at pi things and wasted huge amounts of time this weekend, stripping the code down to a small test that writes data out into memory, switches through a number of different delay speeds, and dumps the DMA registers, PWM registers, control block data etc. into a file.

I ran it from the command line, then in X while running glxgears and other apps.

I had a check of the data and realised pebkac hits again: somehow in previous hacking I had set PCM instead of PWM and messed with some other configs. So it seems to work okay now that I have reconfigured PWM, and should be good for at least 1MHz if I ever get time to create the vc4 code (which probably will not happen).

I think most of this mistake is due to the long intervals between getting round to hacking on the code, causing memory loss, and the poor revision control that I have, with multiple SD cards and other code cross compiled in various VMs.

I also discovered in my research this post that I had totally forgotten I made: http://www.raspberrypi.org/forums/viewt ... 4&p=515612

I am still wondering if anything else will conflict in future, as I think what I am planning will be seriously pushing the pi to its limits and I need a faster X openGL lib implementation.
I think you will like the instructions at anholt.livejournal.com :-)
