The GPU and ARM cores share the same memory interface.
How many things are you compositing to make your full scene?
The GPU has a very powerful block called the Hardware Video Scaler (HVS). It is given a list of items to compose into the frame that is output over HDMI, each with its own cropping, position, and layer information. There are limits to the amount of downscaling and the number of layers that can be composited in real time, but typically you'll get a couple of overlapping layers quite merrily.
The IL video_render component just adds one extra item into the list of things the HVS is told to display, along with the appropriate parameters.
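To illustrate, those parameters are exposed through the Broadcom-specific OMX_IndexConfigDisplayRegion config on video_render's input port (port 90). A minimal sketch, assuming you already have an OMX handle to the component; the rectangle and layer values are arbitrary:

```c
/* Sketch: set the HVS element parameters for video_render via the
 * Broadcom-specific OMX_IndexConfigDisplayRegion config.
 * Assumes `render` is an already-created OMX.broadcom.video_render
 * handle; its input port is 90. Values here are arbitrary. */
#include <string.h>
#include <IL/OMX_Core.h>
#include <IL/OMX_Broadcom.h>

static OMX_ERRORTYPE set_render_region(OMX_HANDLETYPE render)
{
   OMX_CONFIG_DISPLAYREGIONTYPE region;

   memset(&region, 0, sizeof(region));
   region.nSize = sizeof(region);
   region.nVersion.nVersion = OMX_VERSION;
   region.nPortIndex = 90;                  /* video_render input port */

   /* Only the fields flagged in `set` are applied. */
   region.set = OMX_DISPLAY_SET_DEST_RECT | OMX_DISPLAY_SET_LAYER |
                OMX_DISPLAY_SET_FULLSCREEN;
   region.fullscreen = OMX_FALSE;
   region.dest_rect.x_offset = 100;         /* position on screen */
   region.dest_rect.y_offset = 100;
   region.dest_rect.width    = 640;
   region.dest_rect.height   = 360;
   region.layer = 2;                        /* higher layers sit on top */

   return OMX_SetConfig(render, OMX_IndexConfigDisplayRegion, &region);
}
```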
You can also add items directly using the DispmanX API. There are examples at https://github.com/raspberrypi/userland ... dispmanx.c, amongst others. Unless you are layering vast numbers of objects, I'd suggest you investigate DispmanX.
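To give a flavour, here's a minimal sketch along the lines of the examples in that repo: it creates one GPU-side resource, fills it, and adds a single element to the HVS display list. Buffer size, position, and layer are arbitrary, and error checking and teardown are omitted:

```c
/* Sketch: push one extra element onto the HVS display list via DispmanX.
 * Assumes a WIDTH x HEIGHT RGBA buffer `pixels` you have already filled. */
#include <stdint.h>
#include <unistd.h>
#include <bcm_host.h>

#define WIDTH  256
#define HEIGHT 256

int main(void)
{
   static uint32_t pixels[WIDTH * HEIGHT]; /* your image data */
   uint32_t native_handle;
   VC_RECT_T src_rect, dst_rect;
   VC_DISPMANX_ALPHA_T alpha = {
      DISPMANX_FLAGS_ALPHA_FIXED_ALL_PIXELS, 255 /* opaque */, 0
   };

   bcm_host_init();

   DISPMANX_DISPLAY_HANDLE_T display = vc_dispmanx_display_open(0);
   DISPMANX_RESOURCE_HANDLE_T res =
      vc_dispmanx_resource_create(VC_IMAGE_RGBA32, WIDTH, HEIGHT,
                                  &native_handle);

   /* Copy the pixel data into the GPU-side resource. */
   vc_dispmanx_rect_set(&dst_rect, 0, 0, WIDTH, HEIGHT);
   vc_dispmanx_resource_write_data(res, VC_IMAGE_RGBA32,
                                   WIDTH * 4, pixels, &dst_rect);

   /* Source rect is in 16.16 fixed point; dest rect is in pixels. */
   vc_dispmanx_rect_set(&src_rect, 0, 0, WIDTH << 16, HEIGHT << 16);
   vc_dispmanx_rect_set(&dst_rect, 100, 100, WIDTH, HEIGHT);

   DISPMANX_UPDATE_HANDLE_T update = vc_dispmanx_update_start(0);
   vc_dispmanx_element_add(update, display, 10 /* layer */,
                           &dst_rect, res, &src_rect,
                           DISPMANX_PROTECTION_NONE, &alpha,
                           NULL /* clamp */, DISPMANX_NO_ROTATE);
   vc_dispmanx_update_submit_sync(update); /* applied on the next vsync */

   sleep(5); /* element stays on screen while the program runs */
   return 0;
}
```

On older Raspberry Pi OS images that should build with something like gcc demo.c -I/opt/vc/include -L/opt/vc/lib -lbcm_host.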
The other suggestion I'd make is to switch from IL to MMAL.
IL was written when the chip was intended to be used as a graphics co-processor, and all data had to be copied across from the host's main memory to the co-processor. MMAL uses exactly the same underlying components, but was written because (a) IL was an absolute swine to get right, and (b) there was a shift to applications processors (APs) where the CPU and GPU were more tightly integrated. MMAL has tricks that avoid a lot of the copying by mapping the GPU memory allocation into the ARM's virtual address space (and then managing the caches carefully).
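To give a flavour of the MMAL side, here's a sketch using names from the public MMAL headers (values arbitrary, error checking omitted): the same underlying render component, the MMAL equivalent of the display region config, and the zero-copy parameter that triggers that GPU-to-ARM mapping:

```c
/* Sketch: create the MMAL renderer, enable zero-copy buffer mapping,
 * and set the display region. Values are arbitrary. */
#include "interface/mmal/mmal.h"
#include "interface/mmal/util/mmal_default_components.h"
#include "interface/mmal/util/mmal_util_params.h"

static MMAL_COMPONENT_T *create_renderer(void)
{
   MMAL_COMPONENT_T *render = NULL;

   /* Same underlying component as IL's video_render. */
   mmal_component_create(MMAL_COMPONENT_DEFAULT_VIDEO_RENDERER, &render);

   /* The "trick" mentioned above: buffers live in GPU memory and are
    * mapped into the ARM's virtual address space rather than copied. */
   mmal_port_parameter_set_boolean(render->input[0],
                                   MMAL_PARAMETER_ZERO_COPY, MMAL_TRUE);

   /* Equivalent of OMX_IndexConfigDisplayRegion. */
   MMAL_DISPLAYREGION_T region = {
      .hdr = { MMAL_PARAMETER_DISPLAYREGION, sizeof(MMAL_DISPLAYREGION_T) },
      .set = MMAL_DISPLAY_SET_LAYER | MMAL_DISPLAY_SET_DEST_RECT,
      .layer = 2,
      .dest_rect = { 100, 100, 640, 360 },
   };
   mmal_port_parameter_set(render->input[0], &region.hdr);

   return render;
}
```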