Using dispmanx without EGL - how to use resources?


19 posts
by jacksonliam » Wed Jun 06, 2012 7:36 pm
I'm trying to output a YUV420 buffer to the screen with dispmanx so that the GPU does the colour space conversion.

I have this code. It puts a green frame around a black rectangle; I was going for a green screen, but close enough!
Code: Select all
    uint32_t screen_width;
    uint32_t screen_height;
    int32_t success = 0;
    DISPMANX_ELEMENT_HANDLE_T dispman_element;
    DISPMANX_DISPLAY_HANDLE_T dispman_display;
    DISPMANX_UPDATE_HANDLE_T dispman_update;
    DISPMANX_RESOURCE_HANDLE_T dispman_resource;
    VC_RECT_T dst_rect;
    VC_RECT_T src_rect;

    uint32_t img_handle;
    uint32_t img_result;

    bcm_host_init();

    success = graphics_get_display_size(0 /* LCD */, &screen_width, &screen_height);
    assert( success >= 0 );

    dispman_display = vc_dispmanx_display_open( 0 /* LCD */);
    dispman_update = vc_dispmanx_update_start( 0 );

    dst_rect.x = 0;
    dst_rect.y = 0;
    dst_rect.width = screen_width;
    dst_rect.height = screen_height;
    src_rect.x = 0;
    src_rect.y = 0;
    src_rect.width = 720;
    src_rect.height = 576;

    dispman_element = vc_dispmanx_element_add ( dispman_update, dispman_display,
      0/*layer*/, &dst_rect, 0/*src*/,
      &src_rect, DISPMANX_PROTECTION_NONE, 0 /*alpha*/, 0/*clamp*/, 0/*transform*/);


    vc_dispmanx_display_set_background( dispman_update, dispman_display, 0x00, 0xaa, 0x00 );

    vc_dispmanx_update_submit_sync( dispman_update );


I'm not sure of the next step. Do I need to set up a YUV resource and then call vc_dispmanx_resource_write_data? And how do I pass data to that? It needs some kind of address and I'm not quite sure how to get that!

Then I guess something needs to be called to show the resource? Is it just vc_dispmanx_display_set_destination? Or do more things need to be done with updates?

Also, sorry for the re-post, but there's no edit button and my original thread wasn't very clear!
Posts: 151
Joined: Tue Feb 07, 2012 10:09 pm
by dom » Wed Jun 06, 2012 8:14 pm
I'll try to get you an example...
Raspberry Pi Engineer & Forum Moderator
Posts: 4013
Joined: Wed Aug 17, 2011 7:41 pm
Location: Cambridge
by dom » Wed Jun 06, 2012 11:04 pm
Okay, this builds in the framework of hello_triangle
https://dl.dropbox.com/u/3669512/dispmanx.c

I'm afraid dealing with YUV images is a bit ugly, but this should give you a good start.
by jacksonliam » Fri Jun 08, 2012 6:23 pm
Thanks so much! I'll have a play on Sunday, my uninterrupted Pi time!

If I hit any more problems I'll post here!
Liam
by jacksonliam » Sun Jun 10, 2012 4:49 pm
dom wrote:Okay, this builds in the framework of hello_triangle
https://dl.dropbox.com/u/3669512/dispmanx.c

I'm afraid dealing with YUV images is a bit ugly, but this should give you a good start.

I can't seem to get it to build anywhere. Do I have to put it somewhere specific?
Hopefully the below will give you an idea of how I'm trying to build it!

Code: Select all
pi@raspberrypi:/opt/vc/src/hello_pi/hello_triangle$ sudo make
cc -DSTANDALONE -D__STDC_CONSTANT_MACROS -D__STDC_LIMIT_MACROS -DTARGET_POSIX -D_LINUX -fPIC -DPIC -D_REENTRANT -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -U_FORTIFY_SOURCE -Wall -g -DHAVE_LIBOPENMAX=2 -DOMX -DOMX_SKIP64BIT -ftree-vectorize -pipe -DUSE_EXTERNAL_OMX -DHAVE_LIBBCM_HOST -DUSE_EXTERNAL_LIBBCM_HOST -DUSE_VCHIQ_ARM -Wno-psabi -I/opt/vc/include/ -I/opt/vc/include/ -I./ -I../libs -g -c dispmanx.c -o dispmanx.o -Wno-deprecated-declarations
dispmanx.c:51: error: expected ‘)’ before ‘type’
dispmanx.c: In function ‘main’:
dispmanx.c:94: error: ‘VC_IMAGE_TYPE_T’ undeclared (first use in this function)
dispmanx.c:94: error: (Each undeclared identifier is reported only once
dispmanx.c:94: error: for each function it appears in.)
dispmanx.c:94: error: expected ‘;’ before ‘type’
dispmanx.c:97: error: ‘type’ undeclared (first use in this function)
dispmanx.c:97: error: ‘VC_IMAGE_YUV420’ undeclared (first use in this function)
dispmanx.c:117: warning: implicit declaration of function ‘FillRect’
dispmanx.c:122: warning: implicit declaration of function ‘vc_dispmanx_resource_create’
dispmanx.c:127: warning: implicit declaration of function ‘vc_dispmanx_rect_set’
dispmanx.c:128: warning: implicit declaration of function ‘vc_dispmanx_resource_write_data’
dispmanx.c:167: warning: implicit declaration of function ‘vc_dispmanx_resource_delete’
make: *** [dispmanx.o] Error 1
pi@raspberrypi:/opt/vc/src/hello_pi/hello_triangle$


That's just using this makefile:
Code: Select all
OBJS=dispmanx.o
BIN=hello_dispmanx.bin

include ../Makefile.include


Cheers, Liam
by dom » Sun Jun 10, 2012 4:53 pm
Try:
http://elinux.org/R-Pi_Troubleshooting# ... g_firmware

(The basic non-YUV hello_dispmanx has been added as an example app. Just overwrite it with the linked file to get YUV.)
by jacksonliam » Sun Jun 10, 2012 9:01 pm
Cool, works great!

So say I wanted to output a sequence of YUV frames from a camera. Do I need to make a resource, set rectangles, write the frame, start an update, set more rectangles, add an element and submit sync, then start another update, remove the element, submit sync and delete the resource, and repeat all of that for every frame?

I don't know if it's my YUV data or something else, but whatever I modify in your code crashes my Pi. The best I've got is a green mess on the screen and "lt-sample1: sample1.c:79: sample1: Assertion `vars->resource' failed."

I'll keep playing around, and will try getting just one YUV frame onto the screen first!
by dom » Sun Jun 10, 2012 9:37 pm
Personally I'd double buffer, or you'll probably get tearing.

So, set rects, write new data, update start, remove old element, add new element, submit sync.
And swap the resource you write to each time.
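
To illustrate the swap described above, here is a minimal sketch of the bookkeeping only. It makes no dispmanx calls; `flipper_t` and its functions are hypothetical names, and the two indices stand in for two resources created up front.

```c
#include <assert.h>

/* Hypothetical bookkeeping for two frame buffers (for example, two
 * dispmanx resources created once at start-up). */
typedef struct {
    int on_screen; /* index (0 or 1) of the buffer being displayed, -1 if none */
} flipper_t;

static void flipper_init(flipper_t *f)
{
    f->on_screen = -1;
}

/* Index of the buffer that is safe to write the next frame into:
 * always the one that is NOT currently on screen. */
static int flipper_back(const flipper_t *f)
{
    return (f->on_screen == 0) ? 1 : 0;
}

/* Record that the update showing buffer `idx` has been submitted. */
static void flipper_present(flipper_t *f, int idx)
{
    assert(idx == 0 || idx == 1);
    assert(idx != f->on_screen); /* never re-present the visible buffer */
    f->on_screen = idx;
}
```

Per frame, the loop would write the new data into the resource at `flipper_back()`, run the update (remove old element, add new element, submit sync), and then call `flipper_present()` with that index.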
by jacksonliam » Tue Jun 12, 2012 8:17 pm
Thanks so much, I've finally got it working!

I don't know if it was supposed to be like that, but in your drawrectangle you index with [col * pitch + row]. Shouldn't it be [pitch * row + col]? I had to change it to that for my algorithm below, anyway!

I know it's a bit cheeky to ask someone else to optimise my routine, but is there any way to do this faster? If you don't have time, I more than understand!
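
For reference, a small sketch of row-major indexing into a padded plane, where `pitch` is the number of bytes per row. The helper names `pixel_offset` and `fill_rect` are hypothetical and are not taken from the linked dispmanx.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset of pixel (row, col) in a plane whose rows are `pitch`
 * bytes apart. The row is scaled by the pitch, not the column. */
static size_t pixel_offset(size_t row, size_t col, size_t pitch)
{
    assert(col < pitch);
    return row * pitch + col;
}

/* Fill a w x h rectangle whose top-left corner is at column x, row y. */
static void fill_rect(uint8_t *plane, size_t pitch,
                      size_t x, size_t y, size_t w, size_t h,
                      uint8_t value)
{
    for (size_t row = y; row < y + h; row++) {
        for (size_t col = x; col < x + w; col++) {
            plane[pixel_offset(row, col, pitch)] = value;
        }
    }
}
```

With the two terms swapped, the rectangle comes out transposed, and for tall images the writes can land past the end of the buffer.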
Code: Select all
   // Get the data (somehow)
   // Old data
   unsigned char *yptr = data[0];
   unsigned char *uptr = data[1];
   unsigned char *vptr = data[2];
   // New data
   uint8_t *im_y = vars->image;
   uint8_t *im_u = im_y + pitch * aligned_height;
   uint8_t *im_v = im_u + (pitch>>1) * (aligned_height>>1);
   int row, col;
   
   for ( row = 0; row < height; row++ )
   {
      for ( col = 0; col < width; col++ )
      {
         im_y[col + pitch * row] = yptr[row*pFrame->linesize[0]+col];
         im_u[(row>>1) * (pitch>>1) + (col>>1)] = uptr[row/2*pFrame->linesize[1]+col/2];
         im_v[(row>>1) * (pitch>>1) + (col>>1)] = vptr[row/2*pFrame->linesize[2]+col/2];   
      }
   }

I should add that the rows in the three data[] arrays may be longer than 'width', but only the data up to 'width' is proper pixel data!
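
The plane pointers in the snippet above imply a particular buffer layout. As a sketch, the offsets and total size can be computed as follows; `yuv420_layout_t` and `yuv420_layout` are hypothetical names, and `pitch` and `aligned_height` are assumed to be already padded and even.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offsets of the Y, U and V planes inside one contiguous buffer,
 * plus the total buffer size. */
typedef struct {
    size_t y;
    size_t u;
    size_t v;
    size_t total;
} yuv420_layout_t;

static yuv420_layout_t yuv420_layout(size_t pitch, size_t aligned_height)
{
    assert(pitch % 2 == 0 && aligned_height % 2 == 0);

    /* Each chroma plane has half the pitch and half the rows of Y. */
    size_t chroma_bytes = (pitch >> 1) * (aligned_height >> 1);

    yuv420_layout_t l;
    l.y = 0;
    l.u = pitch * aligned_height; /* U starts right after Y */
    l.v = l.u + chroma_bytes;     /* V starts right after U */
    l.total = l.v + chroma_bytes; /* = pitch * aligned_height * 3 / 2 */
    return l;
}
```

For a 720x576 frame with the pitch padded to 736, this gives a U offset of 423936, a V offset of 529920 and a total of 635904 bytes.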
by finnw » Wed Jun 13, 2012 10:49 am
@Liam, any particular reason you want to avoid EGL? The subsampling could probably be implemented more efficiently in a fragment shader.
Posts: 24
Joined: Wed May 16, 2012 7:05 pm
by dom » Wed Jun 13, 2012 11:07 am
@Liam

Note this is untested - you'll have to debug, but the look of the code should be right.
You can simplify the copying operation:
Code: Select all
   for ( row = 0; row < height; row++ )
   {
      memcpy(im_y + row*pitch, yptr + row*pFrame->linesize[0], width);
   }
   for ( row = 0; row < height>>1; row++ )
   {
      memcpy(im_u + row*(pitch>>1), uptr + row*(pFrame->linesize[1]), width>>1);
      memcpy(im_v + row*(pitch>>1), vptr + row*(pFrame->linesize[2]), width>>1);
   }

But if you ensure pitch==pFrame->linesize[0] and (pitch>>1)==pFrame->linesize[1]==pFrame->linesize[2],
this simplifies to:
Code: Select all
      memcpy(im_y, yptr, pFrame->linesize[0] * height);
      memcpy(im_u, uptr, pFrame->linesize[1] * (height>>1));
      memcpy(im_v, vptr, pFrame->linesize[2] * (height>>1));

and if you also ensure that the height is correctly aligned, im_u==im_y + pFrame->linesize[0]*height and im_v==im_u+pFrame->linesize[1]*(height>>1)
(i.e. the Y, U and V blocks of data are contiguous) then you can just use yptr directly, without moving the pixels around.
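
The first two variants above can be folded into one helper that picks the single-memcpy path when the pitches match. This is an untested-on-hardware sketch; `copy_plane` is a hypothetical name, and it would be called once for Y and once each for U and V with halved width and row count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy `rows` rows of `width` valid bytes from a source plane with
 * row stride `src_pitch` into a destination with stride `dst_pitch`. */
static void copy_plane(uint8_t *dst, size_t dst_pitch,
                       const uint8_t *src, size_t src_pitch,
                       size_t width, size_t rows)
{
    assert(width <= dst_pitch && width <= src_pitch);
    if (rows == 0) {
        return;
    }
    if (dst_pitch == src_pitch) {
        /* Same stride: one copy, padding included. The last row is
         * cut at `width` so no padding past the final row is read. */
        memcpy(dst, src, src_pitch * (rows - 1) + width);
        return;
    }
    /* Different strides: copy only the valid bytes of each row. */
    for (size_t row = 0; row < rows; row++) {
        memcpy(dst + row * dst_pitch, src + row * src_pitch, width);
    }
}
```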
by jacksonliam » Wed Jun 13, 2012 1:53 pm
finnw wrote:@Liam, any particular reason you want to avoid EGL? The subsampling could probably be implemented more efficiently in a fragment shader.


Probably, but I don't know OpenGL or graphics architectures, and all the examples were for spinning cubes. I just want some GPU acceleration for colour space conversion, and someone in the MPEG-2 thread said I could use dispmanx easily!

Thanks dom, I'll play with that tonight and hopefully get my framerate above 6fps!
I think those conditions may be met, but I don't think pitch=linesize[0] works; I think it has to be pitch=ALIGN_UP(width, 32).
by dom » Wed Jun 13, 2012 2:08 pm
jacksonliam wrote:I think those conditions may be met, but I don't think pitch=linesize[0] works; I think it has to be pitch=ALIGN_UP(width, 32).


Yes, pitch and height need to be padded up. You should ensure that happens in the decoder app. (i.e. don't set pitch=linesize[0], but ensure linesize[0]=pitch).
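
As a sketch of the padding discussed here: the 32-byte pitch alignment comes from the post above, while the 16-row height alignment is an assumption and should be checked against the example source. `align_up`, `padded_pitch` and `padded_height` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of a; a must be a power of two. */
static uint32_t align_up(uint32_t x, uint32_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

/* Pitch padded to 32, as suggested in the thread. */
static uint32_t padded_pitch(uint32_t width)
{
    return align_up(width, 32);
}

/* Height padded to 16: an ASSUMPTION, not confirmed in the thread. */
static uint32_t padded_height(uint32_t height)
{
    return align_up(height, 16);
}
```

For a 720x576 frame this gives a pitch of 736 and leaves the height at 576.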
by jacksonliam » Wed Jun 13, 2012 8:01 pm
dom wrote:
jacksonliam wrote:I think those conditions may be met, but I don't think pitch=linesize[0] works; I think it has to be pitch=ALIGN_UP(width, 32).


Yes, pitch and height need to be padded up. You should ensure that happens in the decoder app. (i.e. don't set pitch=linesize[0], but ensure linesize[0]=pitch).

I think linesize comes from the codec's optimisations; I don't think it's possible to change it :(

Fortunately, there doesn't seem to be an FPS hit from using the first routine you gave me!
First:
Code: Select all
  38 frames rendered in 2.0368 seconds -> FPS=18.6568
  39 frames rendered in 2.0390 seconds -> FPS=19.1267
  41 frames rendered in 2.0260 seconds -> FPS=20.2369
  40 frames rendered in 2.0027 seconds -> FPS=19.9729
  41 frames rendered in 2.0315 seconds -> FPS=20.1817
  38 frames rendered in 2.0490 seconds -> FPS=18.5457
  35 frames rendered in 2.0003 seconds -> FPS=17.4972
  44 frames rendered in 2.0313 seconds -> FPS=21.6609

Second:
Code: Select all
  38 frames rendered in 2.0296 seconds -> FPS=18.7229
  40 frames rendered in 2.0351 seconds -> FPS=19.6555
  42 frames rendered in 2.0201 seconds -> FPS=20.7911
  42 frames rendered in 2.0333 seconds -> FPS=20.6563
  40 frames rendered in 2.0413 seconds -> FPS=19.5956
  38 frames rendered in 2.0259 seconds -> FPS=18.7568
  38 frames rendered in 2.0138 seconds -> FPS=18.8697
  45 frames rendered in 2.0196 seconds -> FPS=22.2811

That's with the write to the GPU commented out.
My upload-to-GPU code for each frame seems to be quite slow (~4fps hit)!
Code: Select all
  32 frames rendered in 2.0049 seconds -> FPS=15.9607
  29 frames rendered in 2.0249 seconds -> FPS=14.3220
  33 frames rendered in 2.0048 seconds -> FPS=16.4606
  34 frames rendered in 2.0447 seconds -> FPS=16.6283
  33 frames rendered in 2.0447 seconds -> FPS=16.1396
  33 frames rendered in 2.0446 seconds -> FPS=16.1402
  30 frames rendered in 2.0045 seconds -> FPS=14.9661
  30 frames rendered in 2.0045 seconds -> FPS=14.9666
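
To put a number on the hit, the FPS figures can be averaged and converted to time per frame. The sketch below uses the values from the first and third listings above; the array and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* FPS samples copied from the listings: copy only, then copy plus upload. */
static const double fps_copy_only[8] = {
    18.6568, 19.1267, 20.2369, 19.9729, 20.1817, 18.5457, 17.4972, 21.6609
};
static const double fps_with_upload[8] = {
    15.9607, 14.3220, 16.4606, 16.6283, 16.1396, 16.1402, 14.9661, 14.9666
};

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += v[i];
    }
    return sum / (double)n;
}

/* Average time spent per frame, in milliseconds, at a given frame rate. */
static double frame_ms(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}
```

The means come out at roughly 19.5 and 15.7 fps, which is about 51.3 ms and 63.7 ms per frame, so the upload and display step adds roughly 12 ms to each frame.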


Here's the code BTW, but I suspect I'm doing the bare minimum already?
Code: Select all
vc_dispmanx_rect_set( &dst_rect, 0, 0, width, type == VC_IMAGE_YUV420 ? (3*aligned_height)/2 : height );

ret = vc_dispmanx_resource_write_data( vars->resource,
                                       type,
                                       pitch,
                                       vars->image,
                                       &dst_rect );
assert( ret == 0 );

vars->update = vc_dispmanx_update_start( 10 );
assert( vars->update );

vc_dispmanx_rect_set( &src_rect, 0, 0, width << 16, height << 16 );

vc_dispmanx_rect_set( &dst_rect, 0, 0, vars->info.width, vars->info.height );

// Remove the old element if this is not the first frame
if (totalFrames > 0) {
    ret = vc_dispmanx_element_remove( vars->update, vars->element );
    assert( ret == 0 );
}

// Add the new element
vars->element = vc_dispmanx_element_add( vars->update,
                                         vars->display,
                                         2000,               // layer
                                         &dst_rect,
                                         vars->resource,
                                         &src_rect,
                                         DISPMANX_PROTECTION_NONE,
                                         &alpha,
                                         NULL,               // clamp
                                         VC_IMAGE_ROT0 );
ret = vc_dispmanx_update_submit_sync( vars->update );
assert( ret == 0 );


I have to thank you so much, I've learned more new stuff from this thread than in my entire year at uni :D
Also, setting the dst_rect to make the frames full screen makes them look better on my TV than they do from my x86 dual-core HTPC with an NVIDIA chip!
by dom » Wed Jun 13, 2012 9:02 pm
It's probably the sync call that's costing you the fps. That will wait (by sleeping) until the next vsync.
What you probably want to do is multithread it:
You decode to a small queue of buffers in one thread, and block when all buffers are full.
You submit decoded buffers to dispman in another thread.

This way the sync time can be spent doing something useful. Having the queue also means you average out the processing.
Suppose it takes 50ms to decode a B frame, and 25ms to decode a P frame, but at 25fps you want to present frames every 40ms. With the queue you can still avoid missing any frames.
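
The arithmetic behind this can be checked with a small single-threaded simulation (no real threads involved). It is a sketch under simplifying assumptions: B and P frames strictly alternate, the display ticks every 40 ms, and the queue never fills up. All names are hypothetical.

```c
#include <assert.h>

/* Assumed decode cost: even frames are B frames (50 ms),
 * odd frames are P frames (25 ms). */
static int decode_ms(int i)
{
    return (i % 2 == 0) ? 50 : 25;
}

/* One thread: decode a frame, then sleep until the next display tick.
 * Returns the time at which the last frame is shown. */
static int serial_finish_ms(int nframes, int period_ms)
{
    assert(nframes >= 0 && period_ms > 0);
    int t = 0;
    for (int i = 0; i < nframes; i++) {
        t += decode_ms(i);
        t = ((t + period_ms - 1) / period_ms) * period_ms; /* wait for tick */
    }
    return t;
}

/* Two threads and a queue: the decoder never waits for the display.
 * Display starts once `prefill` frames are decoded, then shows one
 * frame per tick. Returns how many frames were not ready in time. */
static int queued_late_frames(int nframes, int period_ms, int prefill)
{
    assert(nframes >= 0 && period_ms > 0);
    assert(prefill >= 0 && prefill <= nframes);
    int start = 0;
    for (int i = 0; i < prefill; i++) {
        start += decode_ms(i);
    }
    int ready = 0; /* time at which frame i finishes decoding */
    int late = 0;
    for (int i = 0; i < nframes; i++) {
        ready += decode_ms(i);
        if (ready > start + i * period_ms) {
            late++;
        }
    }
    return late;
}
```

In this model the serial loop needs 120 ms for every B/P pair (about 16.7 fps), while the queued version with one frame of prefill keeps up with the 40 ms schedule because the average decode time is 37.5 ms.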
by jacksonliam » Wed Jun 13, 2012 10:02 pm
dom wrote:It's probably the sync call that's costing you the fps. That will wait (by sleeping) until the next vsync.
What you probably want to do is multithread it:
You decode to a small queue of buffers in one thread, and block when all buffers are full.
You submit decoded buffers to dispman in another thread.

This way the sync time can be spent doing something useful. Having the queue also means you average out the processing.
Suppose it takes 50ms to decode a B frame, and 25ms to decode a P frame, but at 25fps you want to present frames every 40ms. With the queue you can still avoid missing any frames.

I did think about multi threading, but that's a nice bit of info on the sync call. I shall do that!
by joco » Sun Jun 24, 2012 9:18 am
hi Dom,

Could you improve your original example to show the proper way of doing double buffering and display updates? I can easily change the color space to RGB (thanks for that), but it would be great to see how we can do the page flip with dispmanx.

I do not really understand why we need to add and remove display elements. Once an element has been added, it has a memory region, vars->image, for the raw pixel memory. Would it be possible to update the display rectangle without add/remove?
In the hello_triangle example there are no add/remove calls, but somehow OpenGL updates the one and only rectangle. I would like to do the same thing.

A working example would be great. The current one already shows how to use color spaces and pixmaps allocated in memory, but I am sure I am not the only one who wants to see the best and fastest way of updating those areas.

Thank you very much in advance
Joseph
Posts: 15
Joined: Sat Jun 02, 2012 1:54 pm
by ComputerJock » Sun Aug 19, 2012 6:34 am
Where is any of this stuff documented?

I looked in /usr/share/doc/libraspberrypi-doc; no mention of any of the libs under /opt
(It's also pretty useless -- maybe not to the person who wrote it, but it's got a bunch of stuff about ports but never defines what a port is, how one gains access to it, etc. )

I downloaded Broadcom BCM2835.pdf; no mention of any of those libraries there.
I found OpenGL_ES_2-0_ProgrammingGuide_2009.pdf; no mention there either.
I found an ARM1176JZF-S Revision r0p7 Technical Reference Manual.pdf; it didn't seem to have anything about video.

I thought I'd try modifying dispmanx.c to just display a png. It appeared that the VC_IMAGE_RGB565 was just an array of 16-bit pixels. So I modified it to use VC_IMAGE_RGBA32 and an array of 32-bit pixels. Doesn't work -- no rectangle, stuff looks like it's skewed. The VC_IMAGE_YUV420 looks completely different.

So could someone point me to where this stuff is documented? And the libraries in /opt/vc/lib/?

It's pretty hard to tinker with the RPi by just guessing.
Posts: 14
Joined: Tue Aug 07, 2012 11:21 pm
by dom » Wed Oct 31, 2012 6:10 pm
I've abused the API of
vc_dispmanx_resource_create( VC_IMAGE_TYPE_T type, uint32_t width, uint32_t height, uint32_t *native_image_handle );

to allow:
// Allow (width | pitch << 16) to be specified in width to force pitch.
// Allow (height | aligned_height << 16) to be specified in height to force aligned height
// Note: insufficiently aligned pitches/heights may break vc_image functions, but HVS should still display them.

aligned_height allows the space between Y, U and V planes to be controlled (most usefully make them tightly packed).

pitch is specified in bytes, and is the distance between vertically adjacent pixels. Most usefully:

uint32_t pitch = width * 4;
vc_dispmanx_resource_create( VC_IMAGE_RGBA32, width | pitch << 16, height, &native_image_handle );

if your images are 32bpp and tightly packed.

If you don't specify a pitch, it will use the GPU's native pitch, which is typically (width aligned up to 16) * (bytes per pixel).
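
A small sketch of the packing described above. The helper names are hypothetical; the 16-bit limit on each half follows from the bit layout and is an inference, not something stated in the post.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a dimension and its forced value (pitch or aligned height)
 * into one 32-bit argument: value in the low 16 bits, forced value
 * in the high 16 bits. Passing forced == 0 leaves the argument as a
 * plain width or height. */
static uint32_t pack_dim(uint32_t value, uint32_t forced)
{
    assert(value < 0x10000u);  /* must fit in the low half */
    assert(forced < 0x10000u); /* must fit in the high half */
    return value | (forced << 16);
}

/* Low half: the plain width or height. */
static uint32_t dim_value(uint32_t packed)
{
    return packed & 0xFFFFu;
}

/* High half: the forced pitch or aligned height (0 if none). */
static uint32_t dim_forced(uint32_t packed)
{
    return packed >> 16;
}
```

For a tightly packed 720-pixel-wide RGBA32 image, `pack_dim(720, 720 * 4)` is the value that would go in the width argument. If the 16-bit inference holds, a forced pitch must stay below 65536 bytes, which for 32bpp means a width under 16384 pixels.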