So I will step in here, too. Kind greetings from Germany!
zandzak wrote:I've found that Doom3 uses X11 for keyboard, video, mouse in/output.
I cannot confirm that: according to my logs it uses the VideoCore HW for rendering.
It uses X11 for input but opens a fullscreen EGL display in front of X11.
I was able to compile Doom3 with Henrik's package...
But even the main menu screen, which just renders Mars in the background, is stuttering.
Having a look at the GL calls, I can see that the data is pushed to the GPU in small chunks, which results in a lot of calls per frame just to upload the data (it's like how the NVIDIA driver works when filling VBOs: upload in small chunks).
I will now try to use bigger allocations and buffer the small chunks in the vertex cache... let's see.
As there is no draw call in between the uploads, this should probably work.
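Roughly what I have in mind, just a sketch (the StagingBuffer name and layout are made up here, not the actual Doom 3 vertex cache):

[code]
// Accumulate the small chunks in RAM and upload them in one go, assuming
// one big streaming VBO per frame that was allocated with glBufferData.
#include <GLES2/gl2.h>
#include <vector>

struct StagingBuffer {
    std::vector<unsigned char> cpu; // small chunks accumulate here in RAM
    GLuint vbo = 0;                 // one big VBO, allocated up front
    size_t offset = 0;              // current write position inside the VBO

    // Instead of one glBufferSubData per chunk, just copy into RAM and
    // remember the byte offset for glVertexAttribPointer later.
    size_t append(const void *data, size_t bytes) {
        size_t at = offset + cpu.size();
        const unsigned char *p = static_cast<const unsigned char *>(data);
        cpu.insert(cpu.end(), p, p + bytes);
        return at;
    }

    // One single upload before drawing starts; works because there is no
    // draw call in between the chunks.
    void flush() {
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        glBufferSubData(GL_ARRAY_BUFFER, (GLintptr)offset,
                        (GLsizeiptr)cpu.size(), cpu.data());
        offset += cpu.size();
        cpu.clear();
    }
};
[/code]

The draw code then just uses the byte offset returned by append(), so nothing else has to change at draw time.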
The rest is just plain drawing. OK, there are a lot of redundant calls to glDisable/EnableVertexAttribArray, but that shouldn't cost that much.
It can't get much better anyway: due to the lack of OES_vertex_array_object, every draw call has to set up its own attribute/uniform state as required.
Given that other GLES2 GPUs seem to have no real problem with this, I currently suspect a bad driver. Sysprof shows too much activity in vchiq...
Again, there are a lot of GL calls, we are talking about ~3400 per frame. But come on, other GLES2 GPUs can deal with that!
I've read in this topic that more than 4 glDraw* calls are not good; I can't believe that's true. Keep in mind that no instancing is possible on this GPU, and if we want to draw several models at different positions with different textures, the drawing has to be split into separate calls sometimes, as this per-model info needs to be provided via uniforms.
Otherwise you would need to pack this info into the vertex stream, which in turn means more memory streamed between CPU and GPU (see the sketch below). I don't know where the limits are here, these are just my thoughts. Correct me if I'm wrong and point me in the right direction.
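To illustrate why the draws can't simply be merged without instancing, here is a sketch (the Draw struct and modelMatrixLoc are made-up names for illustration, not Doom 3 code):

[code]
// Every model needs its own uniforms/texture set before its glDrawElements,
// so the draw calls cannot be collapsed into one.
#include <GLES2/gl2.h>

struct Draw {
    GLfloat  modelMatrix[16]; // per-model transform, different each draw
    GLuint   texture;         // per-model texture, different each draw
    GLsizei  indexCount;
    GLintptr indexOffset;     // byte offset into the bound index buffer
};

void drawModels(const Draw *draws, int n, GLint modelMatrixLoc) {
    for (int i = 0; i < n; ++i) {
        // This per-draw state is exactly the info that would otherwise
        // have to be packed into the vertex stream.
        glUniformMatrix4fv(modelMatrixLoc, 1, GL_FALSE, draws[i].modelMatrix);
        glBindTexture(GL_TEXTURE_2D, draws[i].texture);
        glDrawElements(GL_TRIANGLES, draws[i].indexCount, GL_UNSIGNED_SHORT,
                       (const void *)draws[i].indexOffset);
    }
}
[/code]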
I can provide sysprof and apitrace logs if a Broadcom dev is floating around here and wants to have a look.
EDIT: apitrace attached
The next step would be to wrap some functions like glBindBuffer or glEnableVertexAttribArray. A lot of the time the same buffer is bound again, so one could filter that out, and also check whether the driver stalls when the same buffer is rebound, or other silly things like that.
The same goes for glEnableVertexAttribArray: most of the time the same handful of attribute arrays are enabled/disabled and then enabled again for the next draw call. This could be cached: only enable an array when it isn't already enabled, and only disable it when it is really no longer used (sketch below).
This could drop a significant number of calls. The data upload at the beginning of the frame is ~200 calls; the rest is ~3100 (minus some general settings). Each draw call needs ~18 GL calls around it, so we have ~172 glDraw* calls.
glDisable/EnableVertexAttribArray is called about four times per draw call, so there is a chance of removing around 680 state changes per frame. glBindBuffer is called once per draw call; keeping in mind that we sometimes really do need to change the buffer, about 150 of those calls are redundant. So we could remove ~1000 GL calls per frame...
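The wrappers could look something like this, a minimal sketch assuming a single GL context and thread (the q* names are made up, not real driver entry points):

[code]
// Filter out redundant binds and enables by shadowing the GL state in RAM.
#include <GLES2/gl2.h>

static GLuint s_boundArrayBuffer = 0;
static bool   s_attribEnabled[16] = { false }; // GLES2 guarantees >= 8 attribs

void qglBindArrayBuffer(GLuint buf) {
    if (buf == s_boundArrayBuffer)
        return;                          // same buffer already bound: skip
    glBindBuffer(GL_ARRAY_BUFFER, buf);
    s_boundArrayBuffer = buf;
}

void qglEnableVertexAttribArray(GLuint index) {
    if (s_attribEnabled[index])
        return;                          // already enabled: skip
    glEnableVertexAttribArray(index);
    s_attribEnabled[index] = true;
}

void qglDisableVertexAttribArray(GLuint index) {
    if (!s_attribEnabled[index])
        return;                          // already disabled: skip
    glDisableVertexAttribArray(index);
    s_attribEnabled[index] = false;
}
[/code]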
As for the glBufferData calls at the beginning of the frame: mapping the buffer once and writing into it until it's full would be an easy thing to implement. But my experience says nooo, mapping a buffer has bad performance if you cannot do it with the unsynchronized flag.
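And on GLES2 there is no unsynchronized mapping anyway: core GLES2 has no buffer mapping at all, and OES_mapbuffer (if the driver exposes it) only offers a write-only map with no unsynchronized flag, so the sync stall cannot be avoided this way. Sketch of how it would have to be fetched, assuming an EGL context:

[code]
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>
#include <EGL/egl.h>

void *mapArrayBufferWriteOnly() {
    // Extension entry points have to be fetched at runtime.
    static PFNGLMAPBUFFEROESPROC mapBufferOES =
        (PFNGLMAPBUFFEROESPROC)eglGetProcAddress("glMapBufferOES");
    if (!mapBufferOES)
        return 0; // extension missing: stick with glBufferSubData
    // Maps the currently bound GL_ARRAY_BUFFER; glUnmapBufferOES must be
    // called before drawing from it.
    return mapBufferOES(GL_ARRAY_BUFFER, GL_WRITE_ONLY_OES);
}
[/code]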
Currently I'm running Ubuntu 14.04, but that shouldn't make a big difference compared to Raspbian.
I have also disabled sound as I got a lot of errors from it. Afterwards it worked a little better, but the main menu is still stuttering, and that really should run smoothly.
In contrast, my Nexus 7 runs it quite well, but I currently have no clue how the two GPUs compare in raw power.
apitrace dump:
https://www.dropbox.com/s/fk55593ts6shg ... trace?dl=0
EDIT: So far, using VBOs seems to decrease performance... passing the data directly to glVertexAttribPointer (client-side arrays) is a heck of a lot faster?! Sorry, but why is the driver stalling when using VBOs?
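For reference, these are the two paths I'm comparing, as a minimal sketch (attribute index 0 and the tightly packed vec3 position layout are assumptions):

[code]
#include <GLES2/gl2.h>

// Path A: VBO. Data already lives on the GPU, the pointer argument is just
// a byte offset into the bound buffer. This should be the fast path.
void drawFromVBO(GLuint vbo, GLsizei vertCount) {
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, (const void *)0);
    glDrawArrays(GL_TRIANGLES, 0, vertCount);
}

// Path B: client-side array. The driver has to copy from this CPU pointer
// at draw time, yet on this driver it is the faster one.
void drawFromClientMemory(const GLfloat *verts, GLsizei vertCount) {
    glBindBuffer(GL_ARRAY_BUFFER, 0);
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, verts);
    glDrawArrays(GL_TRIANGLES, 0, vertCount);
}
[/code]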
Does anyone know where I can submit bugs to Broadcom? I can provide a simple test application for them! This isn't normal, VBOs should increase your performance, not the reverse?!