JPDefault
Posts: 30
Joined: Mon Jul 30, 2012 8:12 am
Location: Wiltshire

OpenGL ES as slow as SDL?

Fri Aug 17, 2012 8:25 am

Hello folks.

I've got to state in advance that my OpenGL knowledge is very limited.
So, in the past few days I've been poking around with Snes9x 1.53 on my Raspberry Pi.
The SDL version compiles with just a few corrections, and kind of runs. If I set frameskip=0, I get 28/32 FPS on "normal" games (i.e. no fancy chips like SuperFX).

So I started to modify the render functions using OpenGL ES. It must be said that I'm not using X11.
Code looks like this:

Code: Select all

	static const GLubyte bgverts[8] =
	{
		0,	0,
		1,	0,
		0,	1,
		1,	1
	};

	static const GLubyte bgtex[8] =
	{
		0, 1,
		1, 1,
		0, 0,
		1, 0
	};

	S9xBlitPixSimple1x1((uint8 *) GFX.Screen, GFX.Pitch, egl_buffer, GFX.Pitch>>1, 256, 239);

	glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 8, SNES_WIDTH, SNES_HEIGHT_EXTENDED,
		GL_RGB, GL_UNSIGNED_SHORT_5_6_5, egl_buffer);

	glEnableClientState(GL_VERTEX_ARRAY);
	glVertexPointer(2, GL_BYTE, 0, bgverts);
	glEnableClientState(GL_TEXTURE_COORD_ARRAY);
	glTexCoordPointer(2, GL_BYTE, 0, bgtex);
	glEnable(GL_TEXTURE_2D);
	glBindTexture(GL_TEXTURE_2D, state->textures[0]);
	glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
	glDisable(GL_TEXTURE_2D);
	glDisableClientState(GL_TEXTURE_COORD_ARRAY);
	glDisableClientState(GL_VERTEX_ARRAY);
...and the Repaint function:

Code: Select all

static void Repaint (bool8 isFrameBoundry)
{
#if _RASPBERRY
	glMatrixMode(GL_MODELVIEW);
	eglSwapBuffers(state->display, state->surface);
#else
	SDL_Flip(GUI.sdl_screen);
#endif		// _RASPBERRY
}
And... I still get 28/32 FPS. :cry:
I even tried to inline and (partially) unroll the loop from S9xBlitPixSimple1x1, but all I got was one more FPS.
Maybe I should learn how to use FBOs, but I'm not sure that will make any difference.
Do you experts reckon it is a CPU limitation, or am I doing something wrong?

blu
Posts: 55
Joined: Tue Jul 17, 2012 9:57 pm

Re: OpenGL ES as slow as SDL?

Fri Aug 17, 2012 8:47 am

Hi JPDefault,

Two things before we get to the gist of it:
1. Don't use ubyte vertex attributes - even though the API supports them, chances are they are not native types for the vertex shader, so such attributes will need a silent conversion each time you pass them to the API, which takes us to..
2. Use VBOs. You can read about VBOs here: http://www.khronos.org/registry/gles/sp ... 2.0.25.pdf (chapter 2.9 Buffer Objects, and in particular 2.9.1 Vertex Arrays in Buffer Objects)

Now, the gist of it:
You upload the SNES frame texture from client space at each draw. That's inherently slow. You need an EGLImage mapped to a native bitmap/pixmap, which is then used by SNES to draw frames. As I haven't done EGLImages on the Pi yet I cannot show you a working Pi example, but here's the documentation on how those work:
http://www.khronos.org/registry/gles/ex ... _image.txt

And here's how I use them on another platform:
http://code.google.com/p/test-es/source ... _image.cpp

JPDefault
Posts: 30
Joined: Mon Jul 30, 2012 8:12 am
Location: Wiltshire

Re: OpenGL ES as slow as SDL?

Fri Aug 17, 2012 9:52 am

Thanks Blu, I've got something to study now.

Let me see if I got it right so far...

1) In my init_ogl function I would initialize my texture as usual with glGenTextures and glBindTexture, but instead of glTexImage2D, I would use eglCreateImageKHR, and then glEGLImageTargetTexture2DOES.

2) I get rid of my vertex and texture coordinates array and instead I create a struct containing vertex and texture coordinates and call glGenBuffers, glBindBuffer, glBufferData (using GL_STATIC_DRAW).

3) In the render loop, I call glBindTexture and glDrawElements.

Still unsure where I store the image data...

blu
Posts: 55
Joined: Tue Jul 17, 2012 9:57 pm

Re: OpenGL ES as slow as SDL?

Fri Aug 17, 2012 11:01 am

JPDefault wrote:Thanks Blu, I've got something to study now.

Let me see if I got it right so far...

1) In my init_ogl function I would initialize my texture as usual with glGenTextures and glBindTexture, but instead of glTexImage2D, I would use eglCreateImageKHR, and then glEGLImageTargetTexture2DOES.
Yes, that way you create a binding between the EGLImage obtained from eglCreateImageKHR() and a GL texture object.
2) I get rid of my vertex and texture coordinates array and instead I create a struct containing vertex and texture coordinates and call glGenBuffers, glBindBuffer, glBufferData (using GL_STATIC_DRAW).
Well, the AoS (array-of-structures) vs SoA (structure-of-arrays) question was not the main point (though the hw might prefer one over the other, so that's of importance too) - it's the fact that you keep all your vertex data in a VBO *and* in a GPU-native format, so when the data are static (like in your case) you upload them once and never touch them again, and no intermediate conversion steps are ever needed.
3) In the render loop, I call glBindTexture and glDrawElements.
Indeed, and the continuous texture binding is actually important. When the texture is an EGLImage, the binding tells the API that the EGLImage has been updated, so the API needs to make sure no old image data is left in the pipeline (caches, etc). As for the DrawElements call - that's only needed when you have an indexed primitive.
Still unsure where I store the image data...
The "image data" in my case comes from line 494, but that's because that code is just a conformance test. In your case, you'd use the original eglCreateImageKHR to create an image from some native bitmap object (under X11 it's usually pixmaps, but you really need to check the extension documentation I linked to, and then check in the Broadcom EGL API to see what's a 'native image' in their context) and then direct SNES to output its frames into that bitmap. Sorry that my code is not a good example of that - it's intended for other things, but that's the only relevant code I have at hand.

Something that may not have been so obvious: as the entire texture-from-EGLImage path is a very efficient shortcut to pass texture data to the GPU (an EGLImage is not copied internally by the API - its content is used as a texture directly), you need some way to make sure the EGLImage updates do not happen while the old image content is still in use (i.e. while the old draw call is still ongoing). I use an NV fence extension to ensure that.

JPDefault
Posts: 30
Joined: Mon Jul 30, 2012 8:12 am
Location: Wiltshire

Re: OpenGL ES as slow as SDL?

Fri Aug 17, 2012 2:16 pm

Thanks again for your help. Your code is actually a useful example, even if it does something different.

I'll try a step-by step approach: first of all I'll use my (slow) texture with a vertex buffer.

My variables:

Code: Select all

static GLuint textures[1];
static GLuint texture;
static GLuint vbo[3];

static const GLubyte vertex_array[8] = {
    0, 0,
    1, 0,
    0, 1,
    1, 1
};

static const GLubyte texture_coords[8] = {
    0, 1,
    1, 1,
    0, 0,
    1, 0
};

static const GLubyte idx[4] = { 0, 1, 2, 3 };
Then my init function contains:

Code: Select all

glShadeModel(GL_FLAT);
glEnableClientState(GL_VERTEX_ARRAY);
glEnableClientState(GL_TEXTURE_COORD_ARRAY);

glGenTextures(1, textures);
glBindTexture(GL_TEXTURE_2D, textures[0]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, 256, 256, 0, GL_RGB, GL_UNSIGNED_SHORT_5_6_5, NULL);
glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_REPLACE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);

glGenBuffers(sizeof(vbo) / sizeof(vbo[0]), vbo);

glBindBuffer(GL_ARRAY_BUFFER, vbo[0]);
glBufferData(GL_ARRAY_BUFFER, sizeof(vertex_array), vertex_array, GL_STATIC_DRAW);

glBindBuffer(GL_ARRAY_BUFFER, vbo[1]);
glBufferData(GL_ARRAY_BUFFER, sizeof(texture_coords), texture_coords, GL_STATIC_DRAW);

glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, vbo[2]);
glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(idx), idx, GL_STATIC_DRAW);
And eventually, the render function:

Code: Select all

glEnable(GL_TEXTURE_2D);
glEnableClientState(GL_TEXTURE_COORD_ARRAY);
glEnableClientState(GL_VERTEX_ARRAY);

S9xBlitPixSimple1x1((uint8*)GFX.Screen, GFX.Pitch, egl_buffer, GFX.Pitch>>1, 256, 239);

glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 8, 256, 239, GL_RGB, GL_UNSIGNED_SHORT_5_6_5, egl_buffer);

glBindTexture(GL_TEXTURE_2D, textures[0]);

glBindBuffer(GL_ARRAY_BUFFER, vbo[0]); // Vertex array
glVertexPointer(2, GL_BYTE, 0, NULL);

glBindBuffer(GL_ARRAY_BUFFER, vbo[1]); // Texture coords
glTexCoordPointer(2, GL_BYTE, 0, NULL);

glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);

glDisable(GL_TEXTURE_2D);
glDisableClientState(GL_TEXTURE_COORD_ARRAY);
glDisableClientState(GL_VERTEX_ARRAY);
glMatrixMode(GL_MODELVIEW);
eglSwapBuffers(display, surface);
That's what I got working after some fiddling.
Unless I've made a mistake there, I'll start looking into EGLImage now.

JPDefault
Posts: 30
Joined: Mon Jul 30, 2012 8:12 am
Location: Wiltshire

Re: OpenGL ES as slow as SDL?

Sat Aug 18, 2012 11:09 pm

I spent two sleepless nights on this, trying to make EGLImage work, but the lack of progress is frustrating.

In my init function, I replaced:

Code: Select all

        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB,
              256, 256, 0, GL_RGB, GL_UNSIGNED_SHORT_5_6_5, NULL);
        glBindTexture(GL_TEXTURE_2D, texture);
with:

Code: Select all

        native_img = (EGLImageKHR)eglCreateImageKHR(
                state->display, EGL_NO_CONTEXT,
                EGL_NATIVE_PIXMAP_KHR,
                (EGLClientBuffer)egl_buffer, attr_list);

        glBindTexture(GL_TEXTURE_2D, texture);
        glEGLImageTargetTexture2DOES(GL_TEXTURE_2D, native_img);
While in my render function, I replaced:

Code: Select all

        glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 256, 239,
                GL_RGB, GL_UNSIGNED_SHORT_5_6_5, egl_buffer);
        glBindTexture(GL_TEXTURE_2D, texture);
with:

Code: Select all

        glActiveTexture(GL_TEXTURE0);
        glBindTexture(GL_TEXTURE_2D, texture);
And I get absolutely nothing on the screen. :cry:
If I also use glDrawElements, with an index array like { 0, 1, 2, 3 }, instead of glDrawArrays, I get a black screen. I'm scratching my head so hard it hurts.

blu
Posts: 55
Joined: Tue Jul 17, 2012 9:57 pm

Re: OpenGL ES as slow as SDL?

Sun Aug 19, 2012 1:17 pm

JPDefault wrote:I spent two sleepless nights on this, trying to make EGLImage work, but the lack of progress is frustrating.
It's always darkest before dawn, etc ; )
In my init function, I replaced:

Code: Select all

        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB,
              256, 256, 0, GL_RGB, GL_UNSIGNED_SHORT_5_6_5, NULL);
        glBindTexture(GL_TEXTURE_2D, texture);
with:

Code: Select all

        native_img = (EGLImageKHR)eglCreateImageKHR(
                state->display, EGL_NO_CONTEXT,
                EGL_NATIVE_PIXMAP_KHR,
                (EGLClientBuffer)egl_buffer, attr_list);

        glBindTexture(GL_TEXTURE_2D, texture);
        glEGLImageTargetTexture2DOES(GL_TEXTURE_2D, native_img);
Where is the egl_buffer coming from? The way you've specified it, it should be an X11 pixmap - are you doing an X11 app? Keep in mind that X11 on the RPi is currently unaccelerated, so it's not clear whether X11 server-space would be accessible by the GPU - it might not be. I'll do some research on 'native buffers' in Broadcom's EGL once I get home tonight, as I'm curious myself.
While in my render function, I replaced:

Code: Select all

        glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 256, 239,
                GL_RGB, GL_UNSIGNED_SHORT_5_6_5, egl_buffer);
        glBindTexture(GL_TEXTURE_2D, texture);
with:

Code: Select all

        glActiveTexture(GL_TEXTURE0);
        glBindTexture(GL_TEXTURE_2D, texture);
And I get absolutely nothing on the screen. :cry:
Rule of thumb: always do glGetError() after each non-trivial GL call in your prototype code.
If I also use glDrawElements, with an index array like { 0, 1, 2, 3 } instead of glDrawArray, I get a black screen. I'm scratching my head so hard it hurts.
Do you have an ELEMENT_ARRAY_BUFFER bound or are you running that DrawElements off client-space?

dattrax
Posts: 52
Joined: Sat Dec 24, 2011 5:09 pm

Re: OpenGL ES as slow as SDL?

Sun Aug 19, 2012 2:17 pm

have you set your texture filtering? In OpenGL the default is mipmapping, so the texture will be incomplete.

If you want raster graphics mapped onto a plane, the best way is just use dispmax rather than OpenGL. Just using it for composition is not a very good use of bandwidth & memory.

Jim

JPDefault
Posts: 30
Joined: Mon Jul 30, 2012 8:12 am
Location: Wiltshire

Re: OpenGL ES as slow as SDL?

Sun Aug 19, 2012 2:52 pm

blu wrote: Where is the egl_buffer coming from? The way you've specified it, it should be an X11 pixmap - are you doing an X11 app? Keep in mind that X11 on the RPi is currently unaccelerated, so it's not clear whether X11 server-space would be accessible by the GPU - it might not be. I'll do some research on 'native buffers' in Broadcom's EGL once I get home tonight, as I'm curious myself.
That is of course my main problem, as I don't know the native pixmap type, and I'm trying structs from other platforms to see if I get something. I found absolutely no documentation on Broadcom's website, and Google is not my friend :cry:
I even tried X11 pixmaps, although I'm not using X11. Even with random data I should at least get a scrambled image, shouldn't I? I even tried filling a big array with 0xFF, still the screen is black.
blu wrote:Do you have an ELEMENT_ARRAY_BUFFER bound or are you running that DrawElements off client-space?
I'm doing something like:

Code: Select all

    static const GLubyte indices[] = { 0, 1, 2, 3 };

    glGenBuffers(1, &index_buffer);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, index_buffer);
    glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(GLubyte) * 4, indices, GL_STATIC_DRAW);
...
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, index_buffer);
    glDrawElements(GL_TRIANGLE_STRIP, 4, GL_BYTE, 0);
dattrax wrote:have you set your texture filtering? In OpenGL the default is mipmapping, so the texture will be incomplete.

If you want raster graphics mapped onto a plane, the best way is just use dispmax rather than OpenGL. Just using it for composition is not a very good use of bandwidth & memory.

Jim
Hi Jim. I disable mipmapping in my init function right after I generate the texture.
I admit I've got no idea what dispmax is. :oops:

blu
Posts: 55
Joined: Tue Jul 17, 2012 9:57 pm

Re: OpenGL ES as slow as SDL?

Sun Aug 19, 2012 3:25 pm

JPDefault wrote: That is of course my main problem, as I don't know the native pixmap type, and I'm trying structs from other platforms to see if I get something. I found absolutely no documentation on Broadcom's website, and Google is not my friend :cry:
I even tried X11 pixmaps, although I'm not using X11. Even with random data I should at least get a scrambled image, shouldn't I? I even tried filling a big array with 0xFF, still the screen is black.
Indeed, the documentation is amiss, but hey - where'd be the challenge if everything was documented? : ) Anyway, I intend to try first with real X11 pixmaps tonight. I'll report my findings once I have any.
I'm doing something like:

Code: Select all

    static const GLubyte indices[] = { 0, 1, 2, 3 };

    glGenBuffers(1, &index_buffer);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, index_buffer);
    glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(GLubyte) * 4, indices, GL_STATIC_DRAW);
...
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, index_buffer);
    glDrawElements(GL_TRIANGLE_STRIP, 4, GL_BYTE, 0);
DrawElements' acceptable index types are: GL_UNSIGNED_BYTE, GL_UNSIGNED_SHORT, and with the right extension - GL_UNSIGNED_INT. GL_BYTE is not a valid type (and glGetError() is your friend : )

dattrax
Posts: 52
Joined: Sat Dec 24, 2011 5:09 pm

Re: OpenGL ES as slow as SDL?

Sun Aug 19, 2012 3:27 pm

sorry I'm not used to this keyboard

should have typed 'dispmanx'. do a search, but here's a thread on it.

http://www.raspberrypi.org/phpBB3/viewt ... =33&t=7672

JPDefault
Posts: 30
Joined: Mon Jul 30, 2012 8:12 am
Location: Wiltshire

Re: OpenGL ES as slow as SDL?

Sun Aug 19, 2012 3:57 pm

dattrax wrote:sorry I'm not used to this keyboard

should have typed 'dispmanx'. do a search, but here's a thread on it.

http://www.raspberrypi.org/phpBB3/viewt ... =33&t=7672
Cheers for that. I'll look into it as well.

blu
Posts: 55
Joined: Tue Jul 17, 2012 9:57 pm

Re: OpenGL ES as slow as SDL?

Sun Aug 19, 2012 7:50 pm

Ok, here's an example of x11-pixmap-to-GLES-texture workflow which produces quite decent results on a properly-accelerated X11 platform. I'm off to chasing Broadcom's mystical extension EGL_BRCM_global_image now ; )

JPDefault
Posts: 30
Joined: Mon Jul 30, 2012 8:12 am
Location: Wiltshire

Re: OpenGL ES as slow as SDL?

Sun Aug 19, 2012 7:54 pm

blu wrote: Ok, here's an example of x11-pixmap-to-GLES-texture workflow which produces quite decent results on a properly-accelerated X11 platform. I'm off to chasing Broadcom's mystical extension EGL_BRCM_global_image now ; )
Brilliant! Thanks a lot.
blu wrote: DrawElements' acceptable index types are: GL_UNSIGNED_BYTE, GL_UNSIGNED_SHORT, and with the right extension - GL_UNSIGNED_INT. GL_BYTE is a not a valid type (and glGetError() is your friend : )
Actually, I was using GL_UNSIGNED_BYTE. Typed that wrong in my post.

Shame on me for not using glGetError() before! It reported error 0x0501 on glEGLImageTargetTexture2DOES when eglCreateImageKHR was passed EGL_NATIVE_PIXMAP_KHR.

If I pass EGL_NATIVE_PIXMAP_CLIENT_SIDE_BRCM, I get a floating point exception at runtime.
I even tried EGL_IMAGE_BRCM_RAW_PIXELS, but then I get 0x0502 from glEGLImageTargetTexture2DOES.

I see EGL/eglext_brcm.h defines EGL_BRCM_global_image and prototypes for eglCreateGlobalImageBRCM / eglDestroyGlobalImageBRCM. How to use them remains a mystery.

blu
Posts: 55
Joined: Tue Jul 17, 2012 9:57 pm

Re: OpenGL ES as slow as SDL?

Sun Aug 19, 2012 10:08 pm

JPDefault wrote: Actually, I was using GL_UNSIGNED_BYTE. Typed that wrong in my post.

Shame on me for not using glGetError() before! It reported error 0x0501 on glEGLImageTargetTexture2DOES when eglCreateImageKHR was passed EGL_NATIVE_PIXMAP_KHR.

If I pass EGL_NATIVE_PIXMAP_CLIENT_SIDE_BRCM, I get a floating point exception at runtime.
I even tried EGL_IMAGE_BRCM_RAW_PIXELS, but then I get 0x0502 from glEGLImageTargetTexture2DOES.

I see EGL/eglext_brcm.h defines EGL_BRCM_global_image and prototypes for eglCreateGlobalImageBRCM / eglDestroyGlobalImageBRCM. How to use them remains a mystery.
I haven't had much luck myself. It does not seem eglCreateGlobalImageBRCM produces 'native pixmaps' from the POV of eglCreateImageKHR. That said, I got this when I tried to tell eglCreateImageKHR that the buffer was a NATIVE_PIXMAP_KHR:

Code: Select all

Program received signal SIGSEGV, Segmentation fault.
0x40151104 in platform_get_pixmap_server_handle () from /opt/vc/lib/libEGL.so
(gdb) bt
#0  0x40151104 in platform_get_pixmap_server_handle () from /opt/vc/lib/libEGL.so
#1  0x4014ce30 in eglCreateImageKHR () from /opt/vc/lib/libEGL.so
#2  0x00010778 in testbed::hook::init_resources (argc=5, argv=0xbefff794) at app_image_native_bcm.cpp:381
#3  0x0000f1f4 in main (argc=5, argv=0xbefff794) at main_bcm.cpp:1140
(gdb)
Clearly eglCreateImageKHR knows about NATIVE_PIXMAP_KHR. I'm tempted to try with plain ol' X11 pixmaps, but I'm too tired now, so I'm calling it a day.

JPDefault
Posts: 30
Joined: Mon Jul 30, 2012 8:12 am
Location: Wiltshire

Re: OpenGL ES as slow as SDL?

Mon Aug 20, 2012 10:14 pm

Here's more or less what I've been feeding eglCreateImageKHR with:

Code: Select all

static EGLImageKHR native_img;

typedef struct _NATIVE_PIXMAP_T {
	int width;
	int height;
	int xoffset;
	int format;
	char *data;
	int byte_order;
	int bitmap_unit;
	int bitmap_bit_order;
	int bitmap_pad;
	int depth;
	int bytes_per_line;
	int bits_per_pixel;
	unsigned long red_mask;
	unsigned long green_mask;
	unsigned long blue_mask;
} NATIVE_PIXMAP_T;

NATIVE_PIXMAP_T pixmap;
char *egl_buffer = NULL;

...

	egl_buffer = (char*)malloc(256*256*2);
	pixmap.width = 256;
	pixmap.height = 256;
	pixmap.xoffset = 1;
	pixmap.format = 2;
	pixmap.data = egl_buffer;
	pixmap.byte_order = 0;
	pixmap.bitmap_unit = 16;
	pixmap.bitmap_bit_order = 0;
	pixmap.bitmap_pad = 16;
	pixmap.depth = 16;
	pixmap.bytes_per_line = 512;
	pixmap.red_mask = 0xf800;
	pixmap.green_mask = 0x07e0;
	pixmap.blue_mask = 0x001f;

	native_img = (EGLImageKHR) eglCreateImageKHR(
		state->display, EGL_NO_CONTEXT,
		EGL_NATIVE_PIXMAP_KHR,
		(EGLClientBuffer)&pixmap,
		NULL);

	glBindTexture(GL_TEXTURE_2D, texture);

	glEGLImageTargetTexture2DOES(GL_TEXTURE_2D, (GLeglImageOES)native_img);
I thought eglCreateImageKHR would return an EGLImageKHR, but if I omit the cast I get a "making pointer from integer without a cast" warning. :o It seems to always return 0x80000000.
glEGLImageTargetTexture2DOES generates a GL_INVALID_VALUE error. :cry:
I'll post more code tomorrow, surely I must be doing something wrong.

JPDefault
Posts: 30
Joined: Mon Jul 30, 2012 8:12 am
Location: Wiltshire

Re: OpenGL ES as slow as SDL?

Wed Aug 22, 2012 1:24 pm

Not much progress here, but today I've been playing around with eglCreatePixmapSurface.
If I do this:

Code: Select all

EGLint attributes[] = {
    EGL_RED_SIZE,   5,
    EGL_GREEN_SIZE, 6,
    EGL_BLUE_SIZE,  5,
    EGL_DEPTH_SIZE, 16,
    EGL_SURFACE_TYPE,        EGL_WINDOW_BIT | EGL_PIXMAP_BIT,
    EGL_RENDERABLE_TYPE,     EGL_OPENGL_ES2_BIT,
    EGL_BIND_TO_TEXTURE_RGB, EGL_TRUE,
    EGL_NONE
};
eglChooseConfig(display, attributes, &config, 1, &num_configs);

uint8_t* egl_buffer = (uint8_t*)malloc(256*256*2);

pixmap_surface = eglCreatePixmapSurface(display, config, (NativePixmapType)egl_buffer, NULL);
I get an EGL_BAD_MATCH, which either means that my attributes list is not valid for the given config, or the given config does not support rendering to pixmaps.

If I make some exotic stuff like:

Code: Select all

typedef struct {
    GLuint width;
    GLuint height;
    void *data;
} _NATIVE_PIXMAP_T;
_NATIVE_PIXMAP_T native_pixmap;
native_pixmap.width = 256;
native_pixmap.height = 256;
native_pixmap.data = (void*)egl_buffer;
pixmap_surface = eglCreatePixmapSurface(display, config, (NativePixmapType)native_pixmap, NULL);
then I get EGL_BAD_PIXMAP. So the raw buffer of my initial code seems to be a valid pixmap, but the configuration ain't good enough to render on to it?! :shock:

dattrax
Posts: 52
Joined: Sat Dec 24, 2011 5:09 pm

Re: OpenGL ES as slow as SDL?

Wed Aug 22, 2012 4:19 pm

Two problems with what you are trying to do.

1) the v3d block in the vc4 uses physical memory. You cannot expect to malloc a buffer and just give it to hardware.

2) the v3d block uses a tiled memory format to give consistent results round a rotational axis. Even if you managed to create a vc image and gave it to egl it would be really software unfriendly to the arm.

Going back to what I said a few days back, the best way to do this is via dispmanx. There's a function to get a pointer to the buffer, so you may not even have to memcpy the result from your software renderer.

Jim

JPDefault
Posts: 30
Joined: Mon Jul 30, 2012 8:12 am
Location: Wiltshire

Re: OpenGL ES as slow as SDL?

Wed Aug 22, 2012 6:42 pm

Hi Jim. Thanks for your answer.
I'll try to follow your advice, but honestly I've got no idea how to "do it with dispmanx".
All I could find was this post, which seems to be about something similar enough.
The lack of documentation is frightening :cry:

blu
Posts: 55
Joined: Tue Jul 17, 2012 9:57 pm

Re: OpenGL ES as slow as SDL?

Wed Aug 22, 2012 7:57 pm

dattrax, I'm not necessarily disagreeing with your general suggestion, but I just got intrigued by some details in it.
dattrax wrote:Two problems with what you are trying to do.

1) the v3d block in the vc4 uses physical memory. You cannot expect to malloc a buffer and just give it to hardware.
A client-space malloc clearly won't cut it, but that's why people have been trying to figure out these obscure Broadcom extensions, which supposedly provide properly GPU-space-mapped chunks of contiguous memory. Given such mechanisms (and they surely exist, whether that's what the extensions do or not), it's entirely viable to pass a client-space image down to the GPU at reasonable overhead, perhaps at the expense of a linear memcpy. Whether that's the most efficient path for such applications is another subject. In the general case, though, glTexImage* is usually the slowest possible path for streaming images to a GPU, so the interest in more efficient transports from client to GPU space is quite understandable.
2) the v3d block uses a tiled memory format to give consistent results round a rotational axis. Even if you managed to create a vc image and gave it to egl it would be really software unfriendly to the arm.
While GPUs do use forms of image tiling for improved spatial coherency, the majority of the modern devices also have support for linear texture buffers for 'low latency' client uploads - situations when the time-to-screen is critical, and texture reuse (if any) is not enough to justify much on-the-fly reformatting - that's how the various video YUV paths normally operate. So unless you know for a fact that the current Videocore stack does only tiled images, I'd assume there's a linear image path in this case as well. Whether that's exposed via the aforementioned extensions is a separate subject.

JPDefault
Posts: 30
Joined: Mon Jul 30, 2012 8:12 am
Location: Wiltshire

Re: OpenGL ES as slow as SDL?

Thu Aug 23, 2012 10:22 am

The good news: with dispmanx I get three times the FPS I got using glTex* in my test application.
The bad news: it makes absolutely no difference in Snes9x.
I reckon the CPU emulation is taking most of the resources, to the point that rendering optimizations have no impact on overall performance at all. :cry:

JPDefault
Posts: 30
Joined: Mon Jul 30, 2012 8:12 am
Location: Wiltshire

Re: OpenGL ES as slow as SDL?

Thu Aug 23, 2012 12:54 pm

dattrax wrote: 1) the v3d block in the vc4 uses physical memory. You cannot expect to malloc a buffer and just give it to hardware.

2) the v3d block uses a tiled memory format to give consistent results round a rotational axis. Even if you managed to create a vc image and gave it to egl it would be really software unfriendly to the arm.
Anyway, I still wonder how allocating a buffer and feeding it to glTexSubImage2D is different from allocating a buffer and feeding it to vc_dispmanx_resource_write_data. I suppose the buffer is still being accessed by the CPU.
Sorry if it sounds like a naive question, but I found no documentation about this anywhere.

blu
Posts: 55
Joined: Tue Jul 17, 2012 9:57 pm

Re: OpenGL ES as slow as SDL?

Thu Aug 23, 2012 6:56 pm

JPDefault wrote: Anyway, I still wonder how allocating a buffer and feeding it to glTexSubImage2D is different from allocating a buffer and feeding it to vc_dispmanx_resource_write_data. I suppose the buffer is still being accessed by the CPU.
Sorry if it sounds like a naive question, but I found no documentation about this anywhere.
The question is not naive. Actually, it's one of the fundamental questions of GPU pipelines today.

Your application's heap space is not what the GPU uses for its host-memory accesses. The latter is an entirely different physical space, usually referred to as 'aperture space' - a window in the host's physical memory where the GPU can exert DMA. Normally, when you send over any data to the GPU via the 'trivial means', the driver stack takes whatever you point it to from your app's address space and moves it over to the GPU's aperture space. In the case of glTexImage*, the API can also do various on-the-fly transformations on the client data - reordering (tiling, etc), bpp conversions & swizzling, perhaps some hw-friendly compression (pvrtc, s3tc, etc). At the very least, though, it involves a memcpy, perhaps even a user-space-to-kernel-space memcpy.

In contrast to the above, the various native GPU image allocation APIs get the work done 'from the other end' - they allocate a contiguous buffer in GPU aperture space, map that into your application's user-space, and give you the pointer. The so-mapped buffer might not be as quick for CPU accesses (could be subject to cache exclusion policies, etc, to avoid data coherency issues from the POV of the GPU), so you might not want to do read-modify-writes to it but it is still the fastest path to get client data to the GPU. The catch with that is that you usually have to provide all data in GPU-native formats, as the on-the-fly pre-processing step is eliminated.
Last edited by blu on Thu Aug 23, 2012 6:58 pm, edited 2 times in total.

dattrax
Posts: 52
Joined: Sat Dec 24, 2011 5:09 pm

Re: OpenGL ES as slow as SDL?

Thu Aug 23, 2012 8:12 pm

blu wrote:the majority of the modern devices also have support for linear texture buffers for 'low latency' client uploads.
This is true, and is the case for the v3d, however there are a number of additional factors which come into play which you may not have considered. We did some testing on raster texturing and found the bandwidth to be a few orders of magnitude greater than the number of texels fetched (on another device, but I wouldn't expect anything different for the pi).

On the pi switching off the console has a positive effect on CPU performance. More used bandwidth = more heat = more clock gating = slower performance.

There are other ways to do gpu image upload, but I'm not sure what's enabled in the pi driver tree. I'll need to scan through the code to find out.
JPDefault wrote:Anyway, I still wonder how allocating a buffer and feeding it to glTexSubImage2D is different from allocating a buffer and feeding it to vc_dispmanx_resource_write_data. I suppose the buffer is still being accessed by the CPU.
There is another point to consider, and that is lifecycle. The v3d is a deferred, pipelined GPU design. glTexSubImage2D() has a predictable mode of operation. Think of uploading a red texture and drawing a quad with it. Call glTexSubImage2D() with green data and draw a quad with that. Obviously, when you call swapbuffers you expect to see a red and a green quad on the screen. In an immediate-mode renderer this doesn't present an issue; however, in the deferred case you need to keep a copy of the red texture for the lifecycle of the frame.

The composition engine is less pipelined and doesn't have the semantic constraints, so it is a lot less code under the hood.

Jim

dattrax
Posts: 52
Joined: Sat Dec 24, 2011 5:09 pm

Re: OpenGL ES as slow as SDL?

Thu Aug 23, 2012 8:19 pm

JPDefault wrote:The good news: with dispmanx I get three times the FPS I got using glTex* in my test application.
The bad news: it makes absolutely no difference in Snes9x.
I reckon the CPU emulation is taking most of the resources, to the point that rendering optimizations have no impact on overall performance at all. :cry:
That's good. When you set up the plane, you get slightly more bandwidth available if you disable blending with the base layer:

Code: Select all

      /* this is nothing to do with the EGL window having alpha, but how its
blended to the console underneath */
      layerAlpha.flags = DISPMANX_FLAGS_ALPHA_FIXED_ALL_PIXELS;
      layerAlpha.opacity = 255;
      layerAlpha.mask = 0;

      p->dispmanElement = vc_dispmanx_element_add(p->dispmanUpdate,
                                                  p->dispmanDisplay,
                                                  0/*layer*/,
                                                  &dstRect,
                                                  0/*src*/,
                                                  &srcRect,
                                                  DISPMANX_PROTECTION_NONE,
                                                  &layerAlpha,
                                                  0/*clamp*/,
                                                  0/*transform*/);
I found I got a little performance bump from this (and by the looks of it, you need it).

What resolution are you targeting? You could lower the resolution and upscale in dispmanx.

Jim
