The Pi0-3 and Pi4 differ significantly in their 3D hardware, so I would expect some difference in CPU load. Generally I would have expected the Pi4 to be better, but it depends on exactly which paths are being taken.
DRM supports direct rendering of planes, but it only supports a single authorised client at a time. If X is running then X is that client, and other applications can't directly add layers.
With FKMS we can cheat and ask the firmware to add layers via either DispmanX or MMAL. Generally composition is done by EGL, which isn't as efficient, and just how efficient it is depends on how the application drives it.
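If you aren't sure whether a system is on FKMS or full KMS, a rough check is to look at which vc4 overlay is enabled in config.txt. The snippet below is only a sketch: `kms_driver` is a made-up helper name, it assumes the driver is selected with a `dtoverlay=vc4-fkms-v3d` or `dtoverlay=vc4-kms-v3d` line, and it ignores conditional sections such as `[pi4]`, so treat the result as a hint rather than a definitive answer.

```python
import re

def kms_driver(config_text):
    """Return 'fkms', 'kms', or None from the text of a config.txt.

    Hypothetical helper: takes the last uncommented vc4 dtoverlay line
    and ignores conditional filters such as [pi4] (a simplification).
    """
    found = None
    for line in config_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        m = re.match(r"dtoverlay\s*=\s*vc4-(fkms|kms)-v3d\b", line)
        if m:
            found = m.group(1)  # a later line overrides an earlier one
    return found
```

To use it, read the file yourself (commonly `/boot/config.txt`, though the location is an assumption and varies between OS releases) and pass the contents in. A result of `'fkms'` suggests the firmware layer route via DispmanX or MMAL is available, while `'kms'` suggests you are limited to what the single DRM client exposes.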
Software Engineer at Raspberry Pi Trading. Views expressed are still personal views.
I'm not interested in doing contracts for bespoke functionality - please don't ask.