audetto
Posts: 29
Joined: Fri Feb 28, 2014 8:44 pm

Fast(er) image rendering to screen on a Pi3

Sun Dec 15, 2019 7:35 pm

I am working on a computer emulator and need to display a bitmap on the screen fast (60 FPS?).

I tried Qt and got about 30 FPS at the cost of massive CPU usage.
I then tried SDL2 (640x480, 32-bit) with code that looks like this:

Code:

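  do
  {
    // Alternate between the two surfaces so that successive frames differ.
    SDL_Surface * bmp = (counter & 1) ? bmp1 : bmp2;
    // Uploads the whole surface into a brand-new texture on every frame.
    SDL_Texture * tex = SDL_CreateTextureFromSurface(ren, bmp);

    //Draw the texture
    SDL_RenderCopy(ren, tex, NULL, NULL);
    //Update the screen
    SDL_RenderPresent(ren);
    // Release the per-frame texture; otherwise one texture is leaked per frame.
    SDL_DestroyTexture(tex);

    const auto end = std::chrono::steady_clock::now();
    elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
    ++counter;
  } while (elapsed < 5000);

This just alternates between two surfaces on successive frames, so that something visibly changes.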

With this I do a bit better, 47 FPS, but CPU usage is still above 150% (and the rest of the algorithm is not even doing anything).

The question is: how do I use the Pi's GPU, or anything else, to do this more efficiently?
The image is stored in memory as an RGBA buffer.

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 25410
Joined: Sat Jul 30, 2011 7:41 pm

Re: Fast(er) image rendering to screen on a Pi3

Sun Dec 15, 2019 7:45 pm

Which display driver are you using: KMS, FKMS or legacy?

For FKMS and legacy you could use dispmanx, which would allow you to superimpose a bitmap over the display. It is very fast, but there is no real interoperability with the desktop, as it's a layer over the top.
Principal Software Engineer at Raspberry Pi (Trading) Ltd.

audetto
Posts: 29
Joined: Fri Feb 28, 2014 8:44 pm

Re: Fast(er) image rendering to screen on a Pi3

Sun Dec 15, 2019 8:59 pm

For reference the code is here
https://github.com/audetto/TwinklebearD ... in.cpp#L59

The display driver has a big impact: with Full KMS it reaches 60 FPS at 40-50% CPU.

That still feels like a lot for doing nothing, but I think the issue is the desktop integration.
I am indeed running inside the desktop environment.

audetto
Posts: 29
Joined: Fri Feb 28, 2014 8:44 pm

Re: Fast(er) image rendering to screen on a Pi3

Sun Dec 15, 2019 9:39 pm

For reference, in case anyone is reading, I've tried another version:

https://github.com/audetto/TwinklebearD ... in.cpp#L68

using SDL_BlitSurface + SDL_UpdateWindowSurface but it generates massive rendering artefacts.
On the other hand, it looks like it is faster.

Daniel Gessel
Posts: 119
Joined: Sun Dec 03, 2017 1:47 am
Location: Boston area, MA, US

Re: Fast(er) image rendering to screen on a Pi3

Thu Jan 02, 2020 1:48 am

I’m also interested in getting data to the screen quickly when running under X. I’m currently experimenting with TexSubImage and then rendering the texture, but I’d like to find a way to map the texture buffer into CPU-accessible memory, or some other technique to avoid the extra copy that TexSubImage has to do. Or just some way to maintain an image shared with the X server (I’ve done little to no X programming, so this might be quite easy if I knew how).

joyrider3774
Posts: 35
Joined: Sun Mar 13, 2016 12:21 pm

Re: Fast(er) image rendering to screen on a Pi3

Sun Feb 09, 2020 1:57 pm

Are you completely up to date with your packages?

I just ran your latest version on my Raspberry Pi 3 and I'm getting huge FPS numbers:

Code:

~/projecten/SDL2TEST $ "./SDL2Test"
1425 frames in 5001 ms = 284.943 FPS
~/projecten/SDL2TEST $ "./SDL2Test"
1923 frames in 5000 ms = 384.6 FPS

The first value is from when I was watching the screen through VNC Viewer; the second is with VNC Viewer closed (it has an impact on FPS, which I noticed in my game as well).

I'm just using the latest Raspbian Buster, on an SD card that I had initially set up on a Raspberry Pi 4; I simply inserted that card into my Pi 3 and it gave me those numbers.

However, on my Pi 4 I'm only getting about 70 FPS for the same compiled code while watching over VNC; I have not tried it without VNC.

It baffles me that the Pi 3 is so much faster than the Pi 4 with the same image, but the Pi 4 probably uses a different implementation, and I'm not sure what the Pi 3 uses. I'm guessing the brcm* files, but I'm not sure.

I really wonder why this runs so badly on my Pi 4 with the exact same SD card (I have only tried the latest version there, not the other versions).

Can you check whether you are up to date? (Mesa got updated.)

Edit:

I also tried the initial version from your first post; with that one I only get these values (on the Pi 3):

Code:

~/projecten/SDL2TEST $ "./SDL2Test"
172 frames in 5013 ms = 34.3108 FPS
~/projecten/SDL2TEST $ "./SDL2Test"
215 frames in 5020 ms = 42.8287 FPS

I also have a Pi 2, but when I tested my game on it, SDL_RENDERER_ACCELERATED was slower than SDL_RENDERER_SOFTWARE, so I won't be using hardware acceleration there; otherwise my game can't run on the Pi 2, at least not nicely. I have not tried your blitting routines on my Pi 2, but I can do so if you like.

Software rendering was also slow when I used an RGBA8888 texture, but fast when I used an ARGB8888 texture, for the surface I blit everything onto first. Something weird is going on at my end, I think. My game, which hardly does anything intensive, gets about 50 FPS with hardware acceleration enabled, 150 FPS with software + ARGB8888, and again about 50 FPS with software + RGBA8888. This is for a 320x240 screen upscaled to 640x480 (using SDL) on a Raspberry Pi 3; I must test again on the Pi 4 to see the difference.

Edit: I just upgraded the Raspbian packages again. On the Pi 4, software is now slower than hardware. Maybe something is switched between the Pi 3 and the Pi 4: on the Pi 3 I have to use software + ARGB8888 to get 150 FPS in my game, whereas on the Pi 4 I have to use the accelerated renderer to get 290 FPS, and software only gets me around 50 FPS (the same code that runs at 150 FPS in software on the Pi 3). I don't get it, although the Pi 4 numbers look more normal; the Pi 3 results with my game are the weird ones, as software seems to be faster than hardware there.

I have now also run your code (latest version) on my Pi 4. These are the results; the first run is with VNC connected, the second without VNC but still displaying on a screen:

Code:

358 frames in 5006 ms = 71.5142 FPS
~/projecten/SDL2TEST $ "./SDL2Test"
634 frames in 5004 ms = 126.699 FPS

So it seems it's slower on the Pi 4.

Edit 2: I enabled the OpenGL / fake KMS driver on the Pi 3; it was not enabled there before, only on my Pi 4. Now my game also runs fast in hardware mode and slow in software mode on the Pi 3. Retesting the latest test code on my Pi 3 with the OpenGL driver gives me these results (first with VNC connected, second without):

Code:

~/projecten/SDL2TEST $ "./SDL2Test"
237 frames in 5052 ms = 46.9121 FPS
~/projecten/SDL2TEST $ "./SDL2Test"
460 frames in 5001 ms = 91.9816 FPS


So the problem was that I had legacy mode enabled on the Pi 3 and not the OpenGL driver. Still, the latest test code somehow ran faster in legacy mode when I first tested it on my Raspberry Pi 3. I will have to verify this again to be sure, as I can't explain it. At least, with the OpenGL driver enabled on both systems (Pi 3 and Pi 4), hardware rendering gives better results than software, as it should; I'm seeing the same on my Pi 4 (software slower than hardware).

Edit 3: the high FPS results from the latest version of the test program were with the legacy driver. I just tested all three drivers: legacy reports 300 FPS, while the full and fake KMS OpenGL drivers show about the same results, around 100 FPS, when running with VNC Viewer open. I'm guessing the legacy driver is simply broken in some way, does not work correctly with hardware acceleration, and probably reports wrong FPS results. I did not test further, though; I only ran that last test with the three modes on a Pi 3.

Final edit: I can also reproduce the high FPS on the Pi 4 by disabling the OpenGL driver:

~/projecten/SDL2TEST $ ./SDL2Test
2876 frames in 5001 ms = 575.085 FPS

Also, the test seems broken: it does not specify RMASK, GMASK, BMASK and AMASK when creating the bitmap.

If I specify those, for example:

Code:

  SDL_Surface *bmp1 = SDL_CreateRGBSurface(0, 640, 480, 32, 0x000000FF, 0xFF000000, 0x00FF0000, 0x0000FF00);
 
I only get 50 FPS in the same test in legacy mode, rather than those very high numbers. So, from what I can see, the way to go is to enable the OpenGL driver and use hardware acceleration.

joyrider3774
Posts: 35
Joined: Sun Mar 13, 2016 12:21 pm

Re: Fast(er) image rendering to screen on a Pi3

Sun Feb 09, 2020 6:43 pm

Also try this in legacy mode (OpenGL driver not enabled) with SDL_RENDERER_SOFTWARE:

Code:

  // Render-target texture used as an off-screen buffer.
  SDL_Texture *buffer = SDL_CreateTexture(ren, SDL_PIXELFORMAT_ARGB8888, SDL_TEXTUREACCESS_TARGET, 640, 480);

  bool blit = true;

  auto start = std::chrono::steady_clock::now();
  long elapsed;
  int counter = 0;
  srand(time(NULL));
  SDL_Rect rect;
  do
  {
    if (blit)
    {
      // Draw into the buffer texture: clear it (with the colour left over from
      // the previous iteration), then fill a 50x50 rectangle at a random
      // position with a new random colour.
      SDL_SetRenderTarget(ren, buffer);
      SDL_RenderClear(ren);
      SDL_SetRenderDrawColor(ren, rand() % 255, rand() % 255, rand() % 255, 255);
      rect.x = rand() % 639;
      rect.y = rand() % 479;
      rect.w = 50;
      rect.h = 50;
      SDL_RenderFillRect(ren, &rect);

      // Switch back to the window and copy the whole buffer to it.
      SDL_SetRenderTarget(ren, NULL);
      SDL_RenderCopy(ren, buffer, NULL, NULL);
      SDL_RenderPresent(ren);
    }

    const auto end = std::chrono::steady_clock::now();
    elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
    ++counter;
  } while (elapsed < 5000);

  const double fps = counter / (elapsed / 1000.0);
  std::cout << counter << " frames in " << elapsed << " ms = " << fps << " FPS" << std::endl;

~/projecten/SDL2TEST $ "./SDL2Test"
1947 frames in 5001 ms = 389.322 FPS


I still get a high FPS this way (on the Pi 4; I have disconnected my Pi 3), and I do blit to a buffer texture and then render that whole buffer to the screen. If I run my game like that (where I draw things that do not change much) I get 150 FPS, but if I enable the OpenGL driver and try the same, the speed drops to a crawl.
Maybe you just need to find a way to fill a buffer fast enough, and then try the code in legacy mode with the software renderer under X11 (it only works then).

If you change SDL_PIXELFORMAT_ARGB8888 to SDL_PIXELFORMAT_RGBA8888, the FPS drops by more than half. I'm not sure why SDL_PIXELFORMAT_ARGB8888 works better with the software renderer and the legacy driver under X11, but that's what I found with my game when running in legacy mode on the Pi 3 / 4. I don't understand myself why I'm seeing this, but it's what I was initially talking about, back when I was unaware that my Pi 4 was running the OpenGL driver and my Pi 3 the legacy driver, so that the Pi 3 reported a higher FPS than the Pi 4 with the same code. Also, do not request vsync: it does not work in this mode, at least not at my end, and enabling it may cause renderer creation to fail.

joyrider3774
Posts: 35
Joined: Sun Mar 13, 2016 12:21 pm

Re: Fast(er) image rendering to screen on a Pi3

Mon Feb 10, 2020 1:40 pm

I made a video of the behaviour I'm seeing, both on Windows and on the Pi with the legacy driver enabled / OpenGL driver disabled, where I'm getting a high FPS with my game. I don't understand why the legacy driver in software mode is faster (FPS-wise) than the accelerated OpenGL driver. If someone can explain this to me, that would be great. The video is available here: https://www.youtube.com/watch?v=c46XZb-KKD0 Thanks.

Daniel Gessel
Posts: 119
Joined: Sun Dec 03, 2017 1:47 am
Location: Boston area, MA, US

Re: Fast(er) image rendering to screen on a Pi3

Mon Feb 10, 2020 2:05 pm

The legacy driver has an implementation of OpenGL ES 2.0 that is hardware accelerated. It is a proprietary Broadcom implementation, and doesn’t go through X at all. It renders into overlays, so it can get the benefit of page flipping when running its overlays on top of X. From your previous posts, I think you are using SDL, which I don’t know at all, but you may be able to get it to log some GL strings that would tell you which GL (or GLES) driver it’s using.

joyrider3774
Posts: 35
Joined: Sun Mar 13, 2016 12:21 pm

Re: Fast(er) image rendering to screen on a Pi3

Tue Feb 11, 2020 3:27 am

Ah yeah, but it still seems that the driver is pretty fast in software mode with its overlay.

By the way, audetto, you could consider running fullscreen to see if you get more FPS; there is a flag you can pass when creating the window to make it fullscreen.
