dave j
Posts: 117
Joined: Mon Mar 05, 2012 2:19 pm

Re: GPU Programming Info

Sun Mar 11, 2012 12:42 pm

No - it's not THAT question again.

If you go to most mobile chipset manufacturers' websites, you can find lots of information on how to develop software for their products. They don't go into low-level details on how to program their hardware directly - rather, they provide information on how to get the most out of their products using the supported APIs. A good example is the POWERVR OpenVG Application Development Recommendations.

Little of this sort of information seems to be available from Broadcom - there's a paper about OpenGL ES 1.0 on some old hardware from 2008 but that's about it.

In a world where the ability to run Angry Birds is so important to consumers that it gets mentioned in phone adverts, it's surely in Broadcom's interest to provide more support to third party developers writing software to run on its products. Even if they don't want the expense of setting up a forum and having employees answer questions, sticking a few PDFs on a web site isn't going to cost much.

The RPi is practically a development kit for the BCM2835 and everyone, from lone bedroom coders (think the next Minecraft) upwards, can afford them. The same certainly can't be said of NVIDIA or anyone else's dev. kits. This provides a fantastic opportunity for Broadcom to get people learning how to optimise software for its platform. It would be nice if they took advantage of it.

Chromatix
Posts: 430
Joined: Mon Jan 02, 2012 7:00 pm
Location: Helsinki

Re: GPU Programming Info

Sun Mar 11, 2012 1:45 pm

The information currently available is that it supports OpenGL ES 1 and 2, and OpenVG.  There is also ballpark information about the performance.  That is already sufficient to write the bulk of a game targeting R-Pi.

The missing information is mostly about how to set up a context for use.  Informally, even that info is available: http://pastebin.com/0yRQNb4c
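
For illustration, here's a minimal sketch of the generic EGL steps involved (the native window handle is assumed to come from the platform - on the R-Pi that's the dispmanx setup shown in the pastebin):

    #include <EGL/egl.h>

    /* Sketch: create an ES2 context and make it current on a native window
       obtained elsewhere (dispmanx on the R-Pi - see the pastebin link). */
    static EGLContext init_es2(EGLNativeWindowType win,
                               EGLDisplay *out_dpy, EGLSurface *out_surf)
    {
        EGLDisplay dpy = eglGetDisplay(EGL_DEFAULT_DISPLAY);
        eglInitialize(dpy, NULL, NULL);

        static const EGLint cfg_attribs[] = {
            EGL_RED_SIZE, 8, EGL_GREEN_SIZE, 8, EGL_BLUE_SIZE, 8,
            EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT,
            EGL_NONE
        };
        EGLConfig cfg;
        EGLint n;
        eglChooseConfig(dpy, cfg_attribs, &cfg, 1, &n);

        static const EGLint ctx_attribs[] = { EGL_CONTEXT_CLIENT_VERSION, 2, EGL_NONE };
        EGLContext ctx = eglCreateContext(dpy, cfg, EGL_NO_CONTEXT, ctx_attribs);

        EGLSurface surf = eglCreateWindowSurface(dpy, cfg, win, NULL);
        eglMakeCurrent(dpy, surf, surf, ctx);

        *out_dpy = dpy;
        *out_surf = surf;
        return ctx;
    }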

A game developer might also want to know some more detailed information such as resource limits and extension support.  These will always be enhancements over the basic spec though, and most games will work fine with the basic features of ES2.
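
As an aside, some of that can at least be queried at runtime with standard calls once a context is current - a rough sketch:

    #include <GLES2/gl2.h>
    #include <stdio.h>

    /* Print a couple of implementation limits and the extension list.
       Requires a current ES2 context. */
    static void print_gl_limits(void)
    {
        GLint max_tex = 0, max_attribs = 0;
        glGetIntegerv(GL_MAX_TEXTURE_SIZE, &max_tex);
        glGetIntegerv(GL_MAX_VERTEX_ATTRIBS, &max_attribs);
        printf("max texture size %d, max vertex attribs %d\n", max_tex, max_attribs);
        printf("extensions: %s\n", (const char *)glGetString(GL_EXTENSIONS));
    }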
The key to knowledge is not to rely on people to teach you it.

Narishma
Posts: 151
Joined: Wed Nov 23, 2011 1:29 pm

Re: GPU Programming Info

Sun Mar 11, 2012 3:20 pm

Chromatix said:


The information currently available is that it supports OpenGL ES 1 and 2, and OpenVG.  There is also ballpark information about the performance.  That is already sufficient to write the bulk of a game targeting R-Pi.

The missing information is mostly about how to set up a context for use.  Informally, even that info is available: http://pastebin.com/0yRQNb4c

A game developer might also want to know some more detailed information such as resource limits and extension support.  These will always be enhancements over the basic spec though, and most games will work fine with the basic features of ES2.


I don't think that's what he's talking about. He wants stuff like the PDF he mentioned that goes into detail on how to optimize your game or application to best take advantage of the specific GPU. Stuff to do, stuff to avoid, things like that. Most other GPU designers have documentation like that available, for example:

Qualcomm: https://developer.qualcomm.com/develop/ ... ion-adreno

Intel: http://software.intel.com/en-u.....rs-guides/

AMD and Nvidia of course have vast amounts of developer guides and even entire books available on their websites.

dave j
Posts: 117
Joined: Mon Mar 05, 2012 2:19 pm

Re: GPU Programming Info

Sun Mar 11, 2012 4:31 pm

Narishma said:


I don't think that's what he's talking about. He wants stuff like the PDF he mentioned that goes into detail on how to optimize your game or application to best take advantage of the specific GPU. Stuff to do, stuff to avoid, things like that. Most other GPU designers have documentation like that available, for example:

Qualcomm: https://developer.qualcomm.com/develop/ ... ion-adreno

Intel: http://software.intel.com/en-u.....rs-guides/

AMD and Nvidia of course have vast amounts of developer guides and even entire books available on their websites.


That's exactly it. Saying "we support OpenGL ES version X" is fine for people just getting started, but for people trying to get the best performance out of a system more information is useful.

e.g. When Nvidia added support for vertex buffer objects to their desktop GPU drivers, they released an application note describing how to use them. It also warned that whilst they supported 16 and 32 bit index buffer values, as defined in the specification, 32 bit values were slower on their hardware at the time and so 16 bit values should be preferred. That "don't do this - it works but is slow" type of information allows developers to avoid performance problems with particular chipsets that may only become apparent during final testing or, for small developers who can't afford lots of test hardware, after release when customers start complaining.
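
To make that concrete with standard GL calls (just a sketch of the pattern, not Broadcom-specific advice) - preferring 16 bit indices simply means declaring the index data as unsigned shorts and telling glDrawElements so:

    #include <GLES2/gl2.h>

    /* 16 bit indices: enough for meshes of up to 65536 vertices and, on some
       hardware, faster than 32 bit ones. */
    static const GLushort quad_indices[] = { 0, 1, 2, 2, 1, 3 };

    static GLuint upload_indices(void)
    {
        GLuint ibo;
        glGenBuffers(1, &ibo);
        glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
        glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof quad_indices, quad_indices, GL_STATIC_DRAW);
        return ibo;
    }

    /* later, with vertex attributes already set up:
       glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_SHORT, 0); */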

Here's an RPi-specific issue:

There has been some discussion on the forum of the optimum amount of memory to allocate to the GPU in the bootstrap configuration file.

Typically for desktop OpenGL implementations you have a relatively small amount of separate graphics memory (GPU memory) and a much larger amount of main memory (CPU memory). OpenGL will manage caching the most frequently used textures in GPU memory and copy textures from CPU memory as needed. To do this it needs to keep a copy of all textures in CPU memory - even if they are also in GPU memory.

For software rendering, all memory is CPU memory so a software implementation doesn't need to do this.

The RPi hardware is a halfway house between these two: although it's all on the same chip, memory is split between the CPU and the GPU during bootstrap. Knowing the OpenGL memory usage strategy will be important - if a copy of all textures is kept in CPU memory, allocating 128MB to the GPU may not be the best strategy, since you'd reduce the amount available for the OS and any applications you're running, and might allocate memory to the GPU that you could never use.

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 23663
Joined: Sat Jul 30, 2011 7:41 pm

Re: GPU Programming Info

Sun Mar 11, 2012 6:15 pm

I'll try and find out what is available.

That said, I understand the host side OGLES is under development at the moment to improve performance, so anything current would be out of date soon anyway.
Principal Software Engineer at Raspberry Pi (Trading) Ltd.
Contrary to popular belief, humorous signatures are allowed. Here's an example...
"My grief counseller just died, luckily, he was so good, I didn't care."

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 5331
Joined: Wed Aug 17, 2011 7:41 pm
Location: Cambridge

Re: GPU Programming Info

Sun Mar 11, 2012 7:53 pm

dave j said:

There has been some discussion on the forum of the optimum amount of memory to allocate to the GPU in the bootstrap configuration file.
Typically for desktop OpenGL implementations you have a relatively small amount of separate graphics memory (GPU memory) and a much larger amount of main memory (CPU memory). OpenGL will manage caching the most frequently used textures in GPU memory and copy textures from CPU memory as needed. To do this it needs to keep a copy of all textures in CPU memory - even if they are also in GPU memory.

For software rendering, all memory is CPU memory so a software implementation doesn't need to do this.

The RPi hardware is a halfway house between these two: although it's all on the same chip, memory is split between the CPU and the GPU during bootstrap. Knowing the OpenGL memory usage strategy will be important - if a copy of all textures is kept in CPU memory, allocating 128MB to the GPU may not be the best strategy, since you'd reduce the amount available for the OS and any applications you're running, and might allocate memory to the GPU that you could never use.


I think you have a misunderstanding of how OpenGL works.

OpenGL will not manage caching of frequently used textures.

Basically you have some pixels in ARM memory. You call glTexImage2D. The pixels are *copied* to GPU memory. You can now free the pixels from ARM memory - the OpenGL API guarantees that the GPU won't require them again once the call has returned. The texture remains in GPU memory until glDeleteTextures is called.
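
In code, that pattern is just (a minimal sketch - the 256x256 size and the pixel buffer are only for illustration):

    #include <GLES2/gl2.h>
    #include <stdlib.h>

    /* Upload a 256x256 RGBA texture and free the client-side copy:
       once glTexImage2D returns, the GPU holds its own copy. */
    static GLuint upload_texture(void)
    {
        unsigned char *pixels = malloc(256 * 256 * 4);
        /* ...fill pixels with image data... */

        GLuint tex;
        glGenTextures(1, &tex);
        glBindTexture(GL_TEXTURE_2D, tex);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 256, 256, 0,
                     GL_RGBA, GL_UNSIGNED_BYTE, pixels);

        free(pixels);   /* safe: the driver has already copied the data */
        return tex;     /* glDeleteTextures(1, &tex) later releases the GPU copy */
    }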

So, you know exactly where the memory is being used, on client (e.g. ARM) or server (e.g. GPU), for our implementation, or NVIDIA, or a software implementation.

instance->alsa_stream = alsa_stream;

dave j
Posts: 117
Joined: Mon Mar 05, 2012 2:19 pm

Re: GPU Programming Info

Sun Mar 11, 2012 8:47 pm

dom said:

I think you have a misunderstanding of how OpenGL works.
OpenGL will not manage caching of frequently used textures.

Basically you have some pixels in ARM memory. You call glTexImage2D. The pixels are *copied* to GPU memory. You can now free the pixels from ARM memory - the OpenGL API guarantees that the GPU won't require them again once the call has returned. The texture remains in GPU memory until glDeleteTextures is called.

So, you know exactly where the memory is being used, on client (e.g. ARM) or server (e.g. GPU), for our implementation, or NVIDIA, or a software implementation.

instance->alsa_stream = alsa_stream;


From the OpenGL FAQ:

Graphics cards have limited memory, if you exceed it by allocating many buffer objects and textures and other GL resources, the driver can store some of it in system RAM. As you use those resources, the driver can swap in and out of VRAM resources as needed.

Direct3D has a similar concept with Automatic Texture Management.

This may be out of date or not apply to OpenGL ES but it certainly used to work that way.

That was just one question, though. The real purpose of starting this thread was to encourage Broadcom to provide recommendations on how to get the best from their hardware - like their competitors do.

Chromatix
Posts: 430
Joined: Mon Jan 02, 2012 7:00 pm
Location: Helsinki

Re: GPU Programming Info

Sun Mar 11, 2012 9:53 pm

OpenGL calls are allowed to cause OUT_OF_MEMORY errors, but if they don't do that, they are required to store whatever state they created permanently (or until the client indicates that it is no longer required).  This requirement did not change in OpenGL ES.

This does not require anything in particular about how memory is organised.  The driver could return OUT_OF_MEMORY as soon as GPU memory is exhausted, without attempting to also use CPU memory.  It could also use VRAM as an "exclusive cache" to minimise the amount of CPU memory required, at the cost of extra link bandwidth when a texture needs to be evicted from VRAM.
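
To illustrate what the first case looks like from the application's side (a sketch with standard calls, not a claim about what the Broadcom driver actually does):

    #include <GLES2/gl2.h>
    #include <stdio.h>

    /* Allocate a large texture and see whether the driver reports
       GL_OUT_OF_MEMORY rather than quietly spilling into CPU memory. */
    static int try_alloc_2k_texture(void)
    {
        GLuint tex;
        glGenTextures(1, &tex);
        glBindTexture(GL_TEXTURE_2D, tex);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 2048, 2048, 0,
                     GL_RGBA, GL_UNSIGNED_BYTE, NULL);
        if (glGetError() == GL_OUT_OF_MEMORY) {
            fprintf(stderr, "texture allocation failed - GPU memory exhausted\n");
            glDeleteTextures(1, &tex);
            return 0;
        }
        return 1;
    }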

I suspect that Broadcom currently falls into the former category, especially as this is technically a shared-memory architecture.  A further development might be to dynamically allocate the boundary between GPU and CPU memory, so that up to some limit of sanity, the GPU can use CPU memory directly.  When that sanity limit is hit, however, allocating new textures is likely to fail.
The key to knowledge is not to rely on people to teach you it.

dave j
Posts: 117
Joined: Mon Mar 05, 2012 2:19 pm

Re: GPU Programming Info

Sun Mar 11, 2012 10:27 pm

Chromatix said:


OpenGL calls are allowed to cause OUT_OF_MEMORY errors, but if they don't do that, they are required to store whatever state they created permanently (or until the client indicates that it is no longer required).  This requirement did not change in OpenGL ES.

This does not require anything in particular about how memory is organised.  The driver could return OUT_OF_MEMORY as soon as GPU memory is exhausted, without attempting to also use CPU memory.  It could also use VRAM as an "exclusive cache" to minimise the amount of CPU memory required, at the cost of extra link bandwidth when a texture needs to be evicted from VRAM.

I suspect that Broadcom currently falls into the former category, especially as this is technically a shared-memory architecture.  A further development might be to dynamically allocate the boundary between GPU and CPU memory, so that up to some limit of sanity, the GPU can use CPU memory directly.  When that sanity limit is hit, however, allocating new textures is likely to fail.



We can look up what the spec says and we can speculate about what drivers might be doing internally, but that doesn't really answer my earlier point:

Saying "we support OpenGL ES version X" is fine for people just getting started, but for people trying to get the best performance out of a system more information is useful.

Hence my request for recommendations on how to get the best out of the system. Implementations change over time and I would expect recommendations to anticipate likely changes as well as be updated to reflect new circumstances. This isn't limited to OpenGL ES, by the way; that's why I used an example of OpenVG recommendations in my first post.

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 23663
Joined: Sat Jul 30, 2011 7:41 pm

Re: GPU Programming Info

Sun Mar 11, 2012 10:32 pm

Two Broadcom people have already posted on this thread, so I hope some information may be forthcoming on the best usage - patience is required though, everyone is busy. And FYI, Eben, the founder of the Raspberry Pi Foundation, was the chief architect of the 3D hardware/OpenGL on the GPU.
Principal Software Engineer at Raspberry Pi (Trading) Ltd.
Contrary to popular belief, humorous signatures are allowed. Here's an example...
"My grief counseller just died, luckily, he was so good, I didn't care."

mole125
Posts: 228
Joined: Tue Jan 10, 2012 2:01 pm

Re: GPU Programming Info

Sun Mar 11, 2012 10:46 pm

dave j said:


That was just one question, though. The real purpose of starting this thread was to encourage Broadcom to provide recommendations on how to get the best from their hardware – like their competitors do.


Which competitors?  This is a mobile chip aimed at phones. It has a completely different audience than desktop chips from NVidia and the company has different priorities – the chip is probably most optimised for multimedia capture and playback for embedded systems; hi-res game performance is probably more of a secondary concern.  Producing performance tuning guides is extremely time consuming and so expensive, particularly when it isn't a key market.  Broadcom aren't in the business of providing products to Joe Developer public so they don't have any of the incentives that NVidia do in a completely different market.

If you are lucky you may get off the record hints from people who work on the product [edit: and from JamesH's post it looks like we may be lucky], but I'd be surprised if you get anything official and most things will have to be trial and error, or just accepting any performance we do get as being better than a £30 product deserves.

dave j
Posts: 117
Joined: Mon Mar 05, 2012 2:19 pm

Re: GPU Programming Info

Sun Mar 11, 2012 10:52 pm

JamesH said:


Two Broadcom people have already posted on this thread, so I hope some information may be forthcoming on the best usage - patience is required though,  everyone is busy. And FYI, Eben, the founder of the RaspberryPi foundation, was the chief architect of the 3D hardware/OpenGL on the GPU.


Thanks for that. I appreciate everyone is very busy and I'm not in a hurry for the information - it will be a while before I get an RPi anyway as I haven't even registered interest yet. This sort of recommendation is useful to have and I think it's in Broadcom's interest to help ISVs make their products look good.

dave j
Posts: 117
Joined: Mon Mar 05, 2012 2:19 pm

Re: GPU Programming Info

Sun Mar 11, 2012 11:28 pm

mole125 said:

Which competitors? This is a mobile chip aimed at phones. It has a completely different audience than desktop chips from NVidia and the company has different priorities – the chip is probably most optimised for multimedia capture and playback for embedded systems; hi-res game performance is probably more of a secondary concern. Producing performance tuning guides is extremely time consuming and so expensive, particularly when it isn't a key market. Broadcom aren't in the business of providing products to Joe Developer public so they don't have any of the incentives that NVidia do in a completely different market.

If you are lucky you may get off the record hints from people who work on the product [edit: and from JamesH's post it looks like we may be lucky], but I'd be surprised if you get anything official and most things will have to be trial and error, or just accepting any performance we do get as being better than a £30 product deserves.


I only mentioned an example with a desktop GPU to indicate the sort of information I meant.

Broadcom's competitors (and their GPUs) in this market segment include ARM (Mali), Freescale (i.MX), Imagination Technologies (PowerVR), Nvidia (Tegra), Qualcomm (Adreno) and probably others. Look at their websites to see the sort of information they provide to developers.

Broadcom might not be in the business of providing products to Joe Developer public but their products do get bought by people who put them in phones and sell them to Joe Consumer public. Joe Developer's products help make phones attractive to consumers, see the Angry Birds example, and so indirectly contribute to phone chipset manufacturers' sales. It is therefore in chipset manufacturers' interests to help Joe Developer make products that work well on their chipsets. These also don't need to be detailed performance tuning guides - documents with "try to do these things, avoid doing those things" type recommendations are all that's really needed. The cheapest way of doing this is to write some guidelines and stick them on your web site. Broadcom will of course have to do some sort of cost-benefit analysis to see if they think it's worth the effort, but it is notable that so many of their competitors seem to think it is.

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 5331
Joined: Wed Aug 17, 2011 7:41 pm
Location: Cambridge

Re: GPU Programming Info

Mon Mar 12, 2012 12:22 am

dave j said:


Graphics cards have limited memory, if you exceed it by allocating many buffer objects and textures and other GL resources, the driver can store some of it in system RAM. As you use those resources, the driver can swap in and out of VRAM resources as needed.

...
That was just one question through. The real purpose of starting this thread was to encourage Broadcom to provide recommendations on how to get the best from their hardware - like their competitors do.



I guess technically the OpenGL driver could treat the ARM side like virtual memory when the GPU runs out of memory. This isn't done on our processor, and I'd be surprised if it was done on any similar (memory limited) processors.

We do have work in progress for allowing the split between ARM and GPU to be dynamic (using CMA) which would make this less of an issue, but this is probably a month or two off.

I gave some info to a developer that may be interesting:

I'd imagine the "best practices" will be very similar to what is recommended on iOS. This is not (yet) well documented.

All OpenGL ES commands go through an asynchronous message queue to the GPU. This has a bandwidth of ~100MB/s.

Now, if all you do is push data at the GPU, it runs in a nicely pipelined way, and the ARM is not held up.

Every time you read data from the GPU, you will block until the queue is drained, there will be interrupts and task switches on the GPU and the ARM and lots of wasted time. This could take a few milliseconds. So, ideally never read from the GPU (including glGetError). If you must, then only once or twice per frame.
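
For example (a sketch - the macro name is made up), error checks can be compiled out of release builds and limited to one glGetError per frame while debugging:

    #include <GLES2/gl2.h>
    #include <stdio.h>

    /* Hypothetical debug helper: at most one glGetError round trip per frame,
       and none at all in release builds. */
    #ifdef DEBUG_GL
    #define CHECK_GL_ONCE_PER_FRAME() do { \
            GLenum e = glGetError(); \
            if (e != GL_NO_ERROR) fprintf(stderr, "GL error 0x%x\n", e); \
        } while (0)
    #else
    #define CHECK_GL_ONCE_PER_FRAME() ((void)0)
    #endif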

Uploading large amounts of data is expensive. Ideally use VBOs or vertex shaders with small numbers of attributes changing each frame.
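
A sketch of what that means in practice (standard ES2 calls; the shader and its attribute/uniform locations are assumed to exist already):

    #include <GLES2/gl2.h>

    /* Static geometry goes into a VBO once, at load time... */
    static GLuint make_static_vbo(const GLfloat *verts, GLsizeiptr bytes)
    {
        GLuint vbo;
        glGenBuffers(1, &vbo);
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        glBufferData(GL_ARRAY_BUFFER, bytes, verts, GL_STATIC_DRAW);
        return vbo;
    }

    /* ...then per frame only a small uniform (e.g. the camera matrix) changes. */
    static void draw_frame(GLuint vbo, GLint a_pos, GLint u_mvp,
                           const GLfloat *mvp, GLsizei vertex_count)
    {
        glUniformMatrix4fv(u_mvp, 1, GL_FALSE, mvp);
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        glVertexAttribPointer(a_pos, 3, GL_FLOAT, GL_FALSE, 0, 0);
        glEnableVertexAttribArray(a_pos);
        glDrawArrays(GL_TRIANGLES, 0, vertex_count);
    }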

By default we triple buffer. An environment variable V3D_DOUBLE_BUFFER=1 will switch to double buffering.

Texture size max is 2Kx2K.

I don't think you'd have a problem with a scene with a million triangles (whatever blending/culling/shading), if you use VBOs (or vertex shaders) that are mostly static, but just the camera moves each frame.

Minimise the traffic between the ARM and GPU. Avoid blocking waiting for a response.

Chromatix
Posts: 430
Joined: Mon Jan 02, 2012 7:00 pm
Location: Helsinki

Re: GPU Programming Info

Mon Mar 12, 2012 7:17 am

Good stuff.  What's the recommended number of vertices/triangles per call for maximum throughput (assuming not fragment limited)?  Experience from other GPUs suggests 100-1000 is a good ballpark.
The key to knowledge is not to rely on people to teach you it.

dave j
Posts: 117
Joined: Mon Mar 05, 2012 2:19 pm

Re: GPU Programming Info

Mon Mar 12, 2012 5:12 pm

dom said:

I guess technically the OpenGL driver could treat the ARM side like virtual memory when the GPU runs out of memory. This isn't done on our processor, and I'd be surprised if it was done on any similar (memory limited) processors.
The OpenGL specs (plus OpenVG and other Khronos administered standards) deliberately describe what should be implemented rather than how things should be implemented to allow for different implementations. Unfortunately this variability means people end up asking questions like the one I did.


We do have work in progress for allowing the split between ARM and GPU to be dynamic (using CMA) which would make this less of an issue, but this is probably a month or two off.


That will be a good solution.


I gave some info to a developer that may be interesting:

I'd imagine the "best practices" will be very similar to what is recommended on iOS. This is not (yet) well documented.

All OpenGL ES commands go through an asynchronous message queue to the GPU. This has a bandwidth of ~100MB/s.

Now, if all you do is push data at the GPU, it runs in a nicely pipelined way, and the ARM is not held up.

Every time you read data from the GPU, you will block until the queue is drained, there will be interrupts and task switches on the GPU and the ARM and lots of wasted time. This could take a few milliseconds. So, ideally never read from the GPU (including glGetError). If you must, then only once or twice per frame.

Uploading large amounts of data is expensive. Ideally use VBOs or vertex shaders with small numbers of attributes changing each frame.

By default we triple buffer. An environment variable V3D_DOUBLE_BUFFER=1 will switch to double buffering.

Texture size max is 2Kx2K.

I don't think you'd have a problem with a scene with a million triangles (whatever blending/culling/shading), if you use VBOs (or vertex shaders) that are mostly static, but just the camera moves each frame.

Minimise the traffic between the ARM and GPU. Avoid blocking waiting for a response.


Thanks for that. I'll keep an eye out for any official documentation.
