glDrawElements abysmally slow

Posted: Sat Jun 16, 2018 7:16 am
by cnlohr
I have just come across a problem that looks a lot like this right now, and it would be very inconvenient to use X11. I am wondering if you guys are loading "vc4-fkms-v3d" or "vc4-kms-v3d" in your config. If I do, I can't seem to use EGL without X11.

So, I'm right now living with both of those commented out in my config.

But, like you guys, my Pi is screaming fast normally... But over about 300 draw calls, the thing comes to its knees. I can even do a fair bit of stuff with screen fill, etc. and it's fine, but those calls to glDrawElements are what bring it down.

I also tried modding triangle.c from the demos that come in /opt/vc. Same deal. Over about 100-200 draw calls and things absolutely fall apart.

Does anyone have a demo of how to speak to the GPU so it doesn't go off into la-la land forever when I make the draw call?

I can't exactly c&p my code because it's pretty heavily entrenched, but, the same issues happen in triangle.c. What's going on here?

Re: glDrawElements abysmally slow

Posted: Sat Jun 16, 2018 7:47 am
by PeterO
Without seeing some example code it is hard to diagnose your problem.

Can you at least post your modified "triangle" example that shows the problem?

I have some quite complex OpenGL ES code that runs fine, but it is composed of lots of small objects that each have their own call to drawElements.


Re: glDrawElements abysmally slow

Posted: Sat Jun 16, 2018 5:03 pm
by cnlohr
Thank you for the response!

Here is the modified triangle.c. It obtains 10FPS, regardless of resolution, etc. ... 09beb629ba

If you did want to see my code, as-is... - most of the pertinent code is in src/spreadgine.c under SpreadRenderGeometry and rawdraw/CNFGEGLDriver.c.

Again, I want to ask, is it possible this could have to do with some sort of issue with the non-X driver?


Re: glDrawElements abysmally slow

Posted: Sun Jun 17, 2018 4:16 am
by cnlohr
@PeterO - Is there any way you would share with me example code of something you have that can make a lot of draw calls quickly?

Re: glDrawElements abysmally slow

Posted: Sun Jun 17, 2018 5:52 am
by PeterO
triangle.c looks like old GLES 1 code so I don't think it's worth trying to diagnose anything based on that.


Re: glDrawElements abysmally slow

Posted: Sun Jun 17, 2018 8:11 am
by Paeryn
Also, that is making a lot of 2-triangle draw calls. You really want to batch as many triangles as you can in each draw call and make the minimum number of state changes between draw calls. Every time you have to send a command to the GPU there is an overhead, and the overheads for changing texture, changing the matrix transformation and issuing a draw call every 2 triangles are going to add up to a huge amount of time.
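
As a rough illustration of the batching idea, here is a minimal sketch that pre-transforms copies of a small mesh on the CPU and appends them to one large vertex array, so the whole lot can go to the GPU in a single draw call. The function name `append_instance` and its parameters are hypothetical and not part of triangle.c, and the sketch assumes plain xyz float vertices.

```c
#include <assert.h>
#include <stddef.h>

/* Append vert_count xyz vertices from src to dst, scaled by `scale` and
 * shifted by (tx, ty, tz). dst already holds dst_used vertices and has
 * room for dst_cap vertices in total. Returns the new vertex count.
 * (Hypothetical helper for illustration only.) */
static size_t append_instance(float *dst, size_t dst_used, size_t dst_cap,
                              const float *src, size_t vert_count,
                              float scale, float tx, float ty, float tz)
{
    assert(dst_used + vert_count <= dst_cap);

    for (size_t i = 0; i < vert_count; i++) {
        const float *in = src + i * 3;
        float *out = dst + (dst_used + i) * 3;
        out[0] = in[0] * scale + tx;
        out[1] = in[1] * scale + ty;
        out[2] = in[2] * scale + tz;
    }
    return dst_used + vert_count;
}
```

Once every instance has been appended, one glVertexPointer plus one glDrawArrays(GL_TRIANGLES, 0, count) would draw them all, assuming the data is laid out as a triangle list (separate strips cannot simply be concatenated). The trade-off is that the CPU now does the transforms, so this suits static or rarely changing geometry best.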

I made a few quick modifications:
  1. Turned on the Z-buffer (otherwise things further away can end up drawn on top of things near).
  2. Changed the texture usage to load the three images onto separate quadrants of one texture.
  3. Modified the texture coordinates of the cube's faces so that each face uses the relevant quadrant.
  4. Made the draw call draw all the cube in one go.
I didn't map the rotations of each face; that should be a case of modifying the vertices and/or texture coords to match what was desired.
Nor did I re-order the triangle strips to make the faces follow on properly. As the cube is defined front face then back face, it will draw unwanted triangles joining those two faces. The Z buffering should hide most of it, but the vertices really need re-ordering so that each triangle follows on from the previous one rather than jumping around. This was a quick hack.
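
One way to avoid the unwanted joining triangles without re-ordering the vertices is to draw the cube as an indexed triangle list instead of one long strip. The sketch below is an assumption-laden illustration, not code from triangle.c: `strips_to_triangles` is a hypothetical helper, and it assumes face f owns vertices 4f to 4f+3 in strip order, as in the stock vertex table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expand face_count independent 4-vertex triangle strips into a
 * GL_TRIANGLES index list. Because every triangle only references
 * vertices of its own face, nothing is drawn between faces.
 * Returns the number of indices written (6 per face).
 * (Hypothetical helper for illustration only.) */
static size_t strips_to_triangles(uint8_t *indices, size_t face_count)
{
    /* GL_UNSIGNED_BYTE indices can address at most 256 vertices. */
    assert(face_count * 4 <= 256);

    size_t n = 0;
    for (size_t f = 0; f < face_count; f++) {
        uint8_t v = (uint8_t)(f * 4);
        /* A strip v0 v1 v2 v3 yields (v0,v1,v2) and (v2,v1,v3); the
         * second triangle is flipped so both keep the same winding. */
        indices[n++] = v;
        indices[n++] = (uint8_t)(v + 1);
        indices[n++] = (uint8_t)(v + 2);
        indices[n++] = (uint8_t)(v + 2);
        indices[n++] = (uint8_t)(v + 1);
        indices[n++] = (uint8_t)(v + 3);
    }
    return n;
}
```

For the cube this gives 36 indices, which could then be drawn with a single glDrawElements(GL_TRIANGLES, 36, GL_UNSIGNED_BYTE, indices) call.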

Modified triangle.c to draw each cube in one go. Just the functions I changed are listed (don't think I missed anything).

Code:

// Added requesting and enabling the depth buffer
static void init_ogl(CUBE_STATE_T *state)
{
   int32_t success = 0;
   EGLBoolean result;
   EGLint num_config;

   static EGL_DISPMANX_WINDOW_T nativewindow;

   DISPMANX_UPDATE_HANDLE_T dispman_update;
   VC_RECT_T dst_rect;
   VC_RECT_T src_rect;

   // Only the depth request survives in this listing; the colour-size and
   // surface-type attributes of the stock triangle.c are assumed unchanged.
   static const EGLint attribute_list[] =
   {
      EGL_DEPTH_SIZE, 16,
      EGL_NONE   // the attribute list must be terminated with EGL_NONE
   };

   EGLConfig config;

   // get an EGL display connection
   state->display = eglGetDisplay(EGL_DEFAULT_DISPLAY);

   // initialize the EGL display connection
   result = eglInitialize(state->display, NULL, NULL);
   assert(EGL_FALSE != result);

   // get an appropriate EGL frame buffer configuration
   result = eglChooseConfig(state->display, attribute_list, &config, 1, &num_config);
   assert(EGL_FALSE != result);

   // create an EGL rendering context
   state->context = eglCreateContext(state->display, config, EGL_NO_CONTEXT, NULL);

   // create an EGL window surface
   success = graphics_get_display_size(0 /* LCD */, &state->screen_width, &state->screen_height);
   assert( success >= 0 );

   dst_rect.x = 0;
   dst_rect.y = 0;
   dst_rect.width = state->screen_width;
   dst_rect.height = state->screen_height;

   src_rect.x = 0;
   src_rect.y = 0;
   src_rect.width = state->screen_width << 16;
   src_rect.height = state->screen_height << 16;

   state->dispman_display = vc_dispmanx_display_open( 0 /* LCD */);
   dispman_update = vc_dispmanx_update_start( 0 );

   state->dispman_element = vc_dispmanx_element_add ( dispman_update, state->dispman_display,
      0/*layer*/, &dst_rect, 0/*src*/,
      &src_rect, DISPMANX_PROTECTION_NONE, 0 /*alpha*/, 0/*clamp*/, 0/*transform*/);

   nativewindow.element = state->dispman_element;
   nativewindow.width = state->screen_width;
   nativewindow.height = state->screen_height;
   vc_dispmanx_update_submit_sync( dispman_update );

   state->surface = eglCreateWindowSurface( state->display, config, &nativewindow, NULL );
   assert(state->surface != EGL_NO_SURFACE);

   // connect the context to the surface
   result = eglMakeCurrent(state->display, state->surface, state->surface, state->context);
   assert(EGL_FALSE != result);

   // Set background color and clear buffers
   glClearColor(0.15f, 0.25f, 0.35f, 1.0f);

   // Enable back face culling.
   glEnable(GL_CULL_FACE);

   // Enable depth testing (assumed call; the line is missing from the listing)
   glEnable(GL_DEPTH_TEST);
}



// Draw all triangles of a cube in one go
static void redraw_scene(CUBE_STATE_T *state)
{
   // Start with a clear screen (and depth buffer)
   glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

   // Draw all faces:
   // Bind texture surface to current vertices
   glBindTexture(GL_TEXTURE_2D, state->tex[0]);

   // Need to rotate textures - do this by rotating each cube face
   glRotatef(270.f, 0.f, 0.f, 1.f ); // front face normal along z axis

   int x, y;
   for( y = 0; y < 20; y++)
   {
      // The glPushMatrix/glPopMatrix pairs are an assumption (not present
      // in the listing); without them the transforms accumulate per cube.
      glPushMatrix();
      glTranslatef( (y-10)*10, 0, 0 );
      for( x = 0; x < 20; x++)
      {
         glPushMatrix();
         glTranslatef( 0, (x-10)*10, 0 );
         glScalef( 0.1f, 0.1f, 0.1f );

         // draw all 24 vertices in one go
         glDrawArrays( GL_TRIANGLE_STRIP, 0, 24);

         //***** ok, the rotates haven't been done but that should really be fixed in the model
         // same pattern for other 5 faces - rotation chosen to make image orientation 'nice'
         //glBindTexture(GL_TEXTURE_2D, state->tex[1]);
         //glRotatef(90.f, 0.f, 0.f, 1.f ); // back face normal along z axis
         //glDrawArrays( GL_TRIANGLE_STRIP, 4, 4);

         //glBindTexture(GL_TEXTURE_2D, state->tex[2]);
         //glRotatef(90.f, 1.f, 0.f, 0.f ); // left face normal along x axis
         //glDrawArrays( GL_TRIANGLE_STRIP, 8, 4);

         //glBindTexture(GL_TEXTURE_2D, state->tex[3]);
         //glRotatef(90.f, 1.f, 0.f, 0.f ); // right face normal along x axis
         //glDrawArrays( GL_TRIANGLE_STRIP, 12, 4);

         //glBindTexture(GL_TEXTURE_2D, state->tex[4]);
         //glRotatef(270.f, 0.f, 1.f, 0.f ); // top face normal along y axis
         //glDrawArrays( GL_TRIANGLE_STRIP, 16, 4);

         //glBindTexture(GL_TEXTURE_2D, state->tex[5]);
         //glRotatef(90.f, 0.f, 1.f, 0.f ); // bottom face normal along y axis
         //glDrawArrays( GL_TRIANGLE_STRIP, 20, 4);
         glPopMatrix();
      }
      glPopMatrix();
   }

   eglSwapBuffers(state->display, state->surface);
}

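// Put all three images onto one texture so you don't need to keep binding
static void init_textures(CUBE_STATE_T *state)
{
   // load three texture buffers but use them on quadrants of one OGL|ES texture surface
   load_tex_images(state);   // assumed unchanged from the stock triangle.c
   glGenTextures(1, &state->tex[0]);

   // setup texture
   glBindTexture(GL_TEXTURE_2D, state->tex[0]);
   // Create the texture at twice the resolution of one image so we can
   // put each image into its own quadrant of the texture.
   // (Allocation call and the offsets for images 2 and 3 are reconstructed
   // from the comments; those lines are incomplete in the listing.)
   glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, IMAGE_SIZE * 2, IMAGE_SIZE * 2, 0,
                GL_RGB, GL_UNSIGNED_BYTE, NULL);

   // image 1 (old tex[0] and tex[1]) is lower left (0.0, 0.0) -> (0.5, 0.5)
   glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, IMAGE_SIZE, IMAGE_SIZE,
                   GL_RGB, GL_UNSIGNED_BYTE, state->tex_buf1);

   // image 2 (old tex[2] and tex[3]) is lower right (0.5, 0.0) -> (1.0, 0.5)
   glTexSubImage2D(GL_TEXTURE_2D, 0, IMAGE_SIZE, 0, IMAGE_SIZE, IMAGE_SIZE,
                   GL_RGB, GL_UNSIGNED_BYTE, state->tex_buf2);

   // image 3 (old tex[4] and tex[5]) is upper left (0.0, 0.5) -> (0.5, 1.0)
   glTexSubImage2D(GL_TEXTURE_2D, 0, 0, IMAGE_SIZE, IMAGE_SIZE, IMAGE_SIZE,
                   GL_RGB, GL_UNSIGNED_BYTE, state->tex_buf3);

   // Filtering, assumed as in the stock file (the default minification
   // filter expects mipmaps, which are not supplied here)
   glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, (GLfloat)GL_NEAREST);
   glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, (GLfloat)GL_NEAREST);

   // setup overall texture environment
   glTexCoordPointer(2, GL_FLOAT, 0, texCoords);
   glEnableClientState(GL_TEXTURE_COORD_ARRAY);   // assumed, as in the stock file
   glEnable(GL_TEXTURE_2D);                       // assumed, as in the stock file
}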

The modified texture coordinates in cube_texture_and_coords.h

Code:

/** Texture coordinates for the quad. */
static const GLfloat texCoords[6 * 4 * 2] = {
   0.0f,  0.0f,
   0.0f,  0.5f,
   0.5f,  0.0f,
   0.5f,  0.5f,

   0.0f,  0.0f,
   0.0f,  0.5f,
   0.5f,  0.0f,
   0.5f,  0.5f,

   0.5f,  0.0f,
   0.5f,  0.5f,
   1.0f,  0.0f,
   1.0f,  0.5f,

   0.5f,  0.0f,
   0.5f,  0.5f,
   1.0f,  0.0f,
   1.0f,  0.5f,

   0.0f,  0.5f,
   0.0f,  1.0f,
   0.5f,  0.5f,
   0.5f,  1.0f,

   0.0f,  0.5f,
   0.0f,  1.0f,
   0.5f,  0.5f,
   0.5f,  1.0f
};
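
The values in that table follow a simple rule: each face's original 0..1 coordinate is squeezed into one cell of a 2 x 2 grid. A minimal sketch of that mapping is below; `uv_t` and `atlas_uv` are hypothetical names for illustration and do not exist in the hello_triangle sources.

```c
#include <assert.h>

typedef struct { float u, v; } uv_t;

/* Remap a coordinate (u, v) in [0, 1] on a standalone image to the
 * matching point inside cell (col, row) of a cols x rows atlas.
 * (Hypothetical helper for illustration only.) */
static uv_t atlas_uv(float u, float v, int col, int row, int cols, int rows)
{
    assert(cols > 0 && rows > 0);
    assert(col >= 0 && col < cols && row >= 0 && row < rows);

    uv_t out;
    out.u = ((float)col + u) / (float)cols;
    out.v = ((float)row + v) / (float)rows;
    return out;
}
```

With cols = rows = 2, cell (0, 0) is the lower-left quadrant used for image 1, cell (1, 0) the lower-right quadrant for image 2, and cell (0, 1) the upper-left quadrant for image 3.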

Re: glDrawElements abysmally slow

Posted: Sun Jun 17, 2018 4:57 pm
by cnlohr
@PeterO: But BOTH GLES1 and GLES2 are very slow; both types of draw calls in GLES1 and GLES2 run at about the same speed (30+us per call!). I know that additional draw calls are not preferred, and I do batch a fair bit, but in practice most things expect to make a few hundred, maybe 1 or 2 thousand, calls per screen. In the project I am batching the draw calls as best as I practically can, but there are changes I want to make per instance (uniform matrices, etc.).

With your example, @Paeryn, I can't even get the 90 FPS on my display unless I reduce the number of draw calls. Can you bump up the number of draw calls by a factor of 10 and see what kind of FPS you're getting on your system? I want to make sure it's not something wrong with my Pi or software setup.

There should be no issue making 400+ draw calls per frame. I am aware there's a lot of overhead every time you make a draw call, but it should be on the order of 1-5us, not > 30us like I'm getting now!
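
To put numbers on that: at 90 FPS a frame lasts about 11.1 ms, so 400 calls at 30 us each (12 ms) already overrun it before any other work is done. A small sketch of the arithmetic, with the hypothetical helper name `max_draw_calls`:

```c
#include <assert.h>

/* Largest number of draw calls that fit in one frame if every call costs
 * us_per_call microseconds and nothing else takes any time.
 * (Hypothetical helper for illustration only.) */
static long max_draw_calls(double fps, double us_per_call)
{
    assert(fps > 0.0 && us_per_call > 0.0);

    double frame_us = 1e6 / fps;            /* frame budget in microseconds */
    return (long)(frame_us / us_per_call);  /* truncates toward zero */
}
```

By this estimate, 30 us per call leaves room for roughly 370 calls per frame at 90 FPS, while 5 us per call would allow over 2000.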

Another note, it seems that there's a huge amount of time spent "sleeping" when I make a draw call. For instance, your example only uses ~8% of one core.
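
One way to check the "sleeping" impression is to compare the CPU time the process consumed against the wall-clock time that elapsed. The sketch below assumes a POSIX system with clock_gettime; `cpu_fraction` and `blocking_frame` are hypothetical names, and `blocking_frame` merely sleeps as a stand-in for a frame that blocks inside the driver.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Read the given clock and return its value in seconds. */
static double seconds_now(clockid_t clock)
{
    struct timespec ts;
    int rc = clock_gettime(clock, &ts);
    assert(rc == 0);
    (void)rc;
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Run work(arg) once and return the fraction of the elapsed wall-clock
 * time that this process actually spent executing on a CPU.
 * (Hypothetical helper for illustration only.) */
static double cpu_fraction(void (*work)(void *), void *arg)
{
    double wall0 = seconds_now(CLOCK_MONOTONIC);
    double cpu0  = seconds_now(CLOCK_PROCESS_CPUTIME_ID);

    work(arg);

    double cpu1  = seconds_now(CLOCK_PROCESS_CPUTIME_ID);
    double wall1 = seconds_now(CLOCK_MONOTONIC);

    double wall = wall1 - wall0;
    return wall > 0.0 ? (cpu1 - cpu0) / wall : 0.0;
}

/* Stand-in for a frame that blocks: sleeps for 50 ms. */
static void blocking_frame(void *arg)
{
    (void)arg;
    struct timespec delay = { 0, 50 * 1000 * 1000 };
    nanosleep(&delay, NULL);
}
```

Wrapping the real render loop body in place of `blocking_frame` would show whether the time goes to computation (fraction near 1) or to waiting (fraction near 0).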

Re: glDrawElements abysmally slow

Posted: Sun Jun 17, 2018 7:33 pm
by PeterO
So one obvious question is: how is the memory split set up?

Re: glDrawElements abysmally slow

Posted: Sun Jun 17, 2018 8:45 pm
by cnlohr

Here is my full config: ... nt-2621338

Still very curious what your performance is like. I'm hoping I'm not up against something systemic.

Re: glDrawElements abysmally slow

Posted: Mon Jun 18, 2018 12:51 am
by jpgygax68
While I haven't measured any call durations, the author of the Castle Game Engine has succeeded in dramatically improving the framerate, which was previously kept down by glDrawElements() calls, by modifying his code so that it no longer reuses the same VBO for all instances of a given object type (in this case: images, i.e. 2D textured rectangles).

To explain this, the author referred to the OpenGL reference at ... Data.xhtml : "Consider using multiple buffer objects to avoid stalling the rendering pipeline during data store updates. If any rendering in the pipeline makes reference to data in the buffer object being updated by glBufferSubData, especially from the specific region being updated, that rendering must drain from the pipeline before the data store can be updated."
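
A common way to follow that advice is to rotate through a small ring of buffer objects, so that an update never targets the buffer the GPU was most recently asked to draw from. The sketch below shows only the bookkeeping. `vbo_ring`, `vbo_ring_acquire` and the ring depth of 3 are assumptions for illustration, not taken from the Castle Game Engine.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 3   /* assumed depth; tune to the number of frames in flight */

typedef struct {
    unsigned int buffers[RING_SIZE];  /* would hold names from glGenBuffers */
    size_t next;                      /* slot to hand out on the next acquire */
} vbo_ring;

/* Return the buffer name to update now and advance the ring, so the same
 * buffer is not handed out again until RING_SIZE - 1 others have been used.
 * (Hypothetical helper for illustration only.) */
static unsigned int vbo_ring_acquire(vbo_ring *ring)
{
    assert(ring != NULL);

    unsigned int name = ring->buffers[ring->next];
    ring->next = (ring->next + 1) % RING_SIZE;
    return name;
}
```

In use, the returned name would be bound with glBindBuffer(GL_ARRAY_BUFFER, name), filled with glBufferSubData, and then drawn from, while the previously used buffers are left alone for the GPU to finish with.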

Re: glDrawElements abysmally slow

Posted: Wed Jun 20, 2018 5:00 am
by cnlohr
@jpgygax68 - just tried. Each with a unique VBO. Actually, results are _slightly_ worse than re-using VBOs. ~33-40us each draw call, no matter what.

Can anyone else please try to see if this is an issue under all raspi 3D stuff?