bullen
Posts: 315
Joined: Sun Apr 28, 2013 2:52 pm

Re: Skin Mesh Animation

Tue Jan 07, 2020 2:05 pm

I'm aware of moving GPU stuff to the CPU for devices that are bottlenecked by the GPU. But:

1) Complexity is not good, especially when GPUs will keep getting better relative to CPUs over the next 10 years anyway. See the Jetson Nano.

2) You would need multithreaded GPU access or your frames will pick up motion-to-photon latency. Just try playing any PS4 game; they are literally unplayable! Not even Vulkan gives you multithreaded parallel draw calls.

My solution is to scale the experience to fit the hardware. So on the Pi 4 you won't see all >1000 players around you.

Time solves all problems, don't over-engineer!
https://github.com/tinspin/rupy - A tiny Java async HTTP application server.

Daniel Gessel
Posts: 117
Joined: Sun Dec 03, 2017 1:47 am
Location: Boston area, MA, US
Contact: Website Twitter

Re: Skin Mesh Animation

Tue Jan 07, 2020 3:33 pm

bullen wrote:
Tue Jan 07, 2020 2:05 pm
1) Complexity is not good, especially when GPUs will keep getting better relative to CPUs over the next 10 years anyway. See the Jetson Nano.

2) You would need multithreaded GPU access or your frames will pick up motion-to-photon latency. Just try playing any PS4 game; they are literally unplayable! Not even Vulkan gives you multithreaded parallel draw calls.
The question of addressing the issue on this particular HW was raised. Falling back to the CPU is probably not that hard given that Mesa has a complete JIT CPU path based on LLVM, but unless I'm going to implement it (which I'm not) I can only argue so convincingly...

I really have no idea about issue (2) - can you explain what you mean? I do understand latency (from controller to screen), and it may or may not be made worse by doing your vertex processing on the CPU, depending on communication and on how much work it takes off the GPU. Generally, GPUs render draw commands serially from a single queue of command buffers, though I'm seeing a shift there: asynchronous compute has been around for a while and multiple asynchronous graphics queues are coming online, but not for a single rendering context (at least in existing APIs, as far as I know). APIs now support parallel GPU command buffer generation, which can reduce some sources of latency, but so far I'm unconvinced that's worth the complexity. If the CPU is keeping the GPU busy, you're at peak GPU throughput. If the work per frame is too much, you get low frame rates; if the work is too variable, you get variable latency. I know a little about the PS4 GPU architecture but nothing about what games are doing with it - I'm only a casual gamer these days. Your demands seem to outstrip not just the Pi, but all of today's systems.
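To make the "parallel command buffer generation" point concrete, here is a rough sketch of the usual Vulkan pattern: one command pool per thread, secondary command buffers recorded in parallel, then stitched into a single primary buffer submitted to one queue. I haven't run this against the Pi's Vulkan work; the RecordJob struct, record_frame and the draw slicing are made up for illustration, only the Vulkan calls themselves are real.

[code]
/* Sketch: per-thread recording of secondary command buffers in Vulkan.
 * Assumes device, render pass, framebuffer and pipeline were created elsewhere,
 * and that `primary` is already in the recording state. */
#include <vulkan/vulkan.h>
#include <pthread.h>

typedef struct {
    VkDevice        device;
    VkRenderPass    renderPass;
    VkFramebuffer   framebuffer;
    VkPipeline      pipeline;
    uint32_t        queueFamily;
    uint32_t        firstDraw, drawCount;   /* slice of the scene this thread records */
    VkCommandPool   pool;                   /* one pool per thread: pools are not thread-safe */
    VkCommandBuffer cmd;                    /* the secondary command buffer produced */
} RecordJob;

static void *record_thread(void *arg)
{
    RecordJob *job = arg;

    VkCommandPoolCreateInfo pci = {
        .sType = VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO,
        .queueFamilyIndex = job->queueFamily,
    };
    vkCreateCommandPool(job->device, &pci, NULL, &job->pool);

    VkCommandBufferAllocateInfo ai = {
        .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO,
        .commandPool = job->pool,
        .level = VK_COMMAND_BUFFER_LEVEL_SECONDARY,
        .commandBufferCount = 1,
    };
    vkAllocateCommandBuffers(job->device, &ai, &job->cmd);

    VkCommandBufferInheritanceInfo inherit = {
        .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_INHERITANCE_INFO,
        .renderPass = job->renderPass,
        .framebuffer = job->framebuffer,
    };
    VkCommandBufferBeginInfo bi = {
        .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO,
        .flags = VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT,
        .pInheritanceInfo = &inherit,
    };
    vkBeginCommandBuffer(job->cmd, &bi);
    vkCmdBindPipeline(job->cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, job->pipeline);
    for (uint32_t i = 0; i < job->drawCount; i++)
        vkCmdDraw(job->cmd, 3, 1, 0, job->firstDraw + i);  /* 3 verts is a placeholder */
    vkEndCommandBuffer(job->cmd);
    return NULL;
}

/* Main thread: spawn workers (numThreads <= 8 assumed), then stitch their work
 * into one primary command buffer - still one queue, one submit. */
void record_frame(VkCommandBuffer primary, RecordJob jobs[], int numThreads,
                  VkRenderPassBeginInfo *rpBegin)
{
    pthread_t threads[8];
    for (int t = 0; t < numThreads; t++)
        pthread_create(&threads[t], NULL, record_thread, &jobs[t]);
    for (int t = 0; t < numThreads; t++)
        pthread_join(threads[t], NULL);

    VkCommandBuffer secondaries[8];
    for (int t = 0; t < numThreads; t++)
        secondaries[t] = jobs[t].cmd;

    vkCmdBeginRenderPass(primary, rpBegin, VK_SUBPASS_CONTENTS_SECONDARY_COMMAND_BUFFERS);
    vkCmdExecuteCommands(primary, (uint32_t)numThreads, secondaries);
    vkCmdEndRenderPass(primary);
}
[/code]

The recording parallelizes; the GPU still consumes the result serially, which is why I doubt it buys much latency on its own.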

I did notice that the GBM implementation on the Pi 4 can quadruple-buffer, which seems potentially really laggy, especially on a 30 Hz UHD display... and it seems like this will happen automagically if the render load is low (not just variable, which is where multi-buffering is really beneficial). I would like (A) explicit control over the number of buffers used (including single buffering!) and (B) the ability to identify the current back buffer so partial updates can be implemented effectively.
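On point (A): as far as I know gbm_surface never lets you pick how many buffers it keeps in flight, but you can sidestep it by allocating your own scanout buffers with gbm_bo_create and flipping between them yourself via KMS. A rough sketch of double buffering done that way (error handling and the actual rendering into the BOs left out; the DRM fd, crtc_id and mode are assumed to come from the usual modesetting code):

[code]
/* Sketch: manage the scanout buffers yourself instead of letting gbm_surface decide.
 * Assumes `fd` is an open DRM device and crtc_id/width/height come from normal
 * KMS modesetting; rendering into the BOs (e.g. via EGLImage) is not shown. */
#include <gbm.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

struct scanout_buf {
    struct gbm_bo *bo;
    uint32_t       fb_id;
};

static void create_buf(struct gbm_device *gbm, int fd,
                       uint32_t width, uint32_t height, struct scanout_buf *buf)
{
    buf->bo = gbm_bo_create(gbm, width, height, GBM_FORMAT_XRGB8888,
                            GBM_BO_USE_SCANOUT | GBM_BO_USE_RENDERING);
    uint32_t handle = gbm_bo_get_handle(buf->bo).u32;
    uint32_t stride = gbm_bo_get_stride(buf->bo);
    drmModeAddFB(fd, width, height, 24, 32, stride, handle, &buf->fb_id);
}

void run_double_buffered(int fd, uint32_t crtc_id, uint32_t width, uint32_t height)
{
    struct gbm_device *gbm = gbm_create_device(fd);

    /* Exactly two buffers - the app decides, not the implementation. */
    struct scanout_buf bufs[2];
    create_buf(gbm, fd, width, height, &bufs[0]);
    create_buf(gbm, fd, width, height, &bufs[1]);

    int back = 0;
    for (;;) {
        /* ... render into bufs[back].bo here ... */

        /* Queue the flip; because we own the buffers, we always know exactly
         * which one is the back buffer next frame. */
        drmModePageFlip(fd, crtc_id, bufs[back].fb_id,
                        DRM_MODE_PAGE_FLIP_EVENT, NULL);

        /* ... wait for the page-flip event on `fd` with drmHandleEvent() ... */

        back ^= 1;   /* the other buffer is now free to render into */
    }
}
[/code]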

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 24989
Joined: Sat Jul 30, 2011 7:41 pm

Re: Skin Mesh Animation

Tue Jan 07, 2020 4:14 pm

I think it's probably worth pointing out at this juncture that you cannot expect a device costing $35 to compete at a GPU level with devices where just the GPU can cost multiples of the Pi's entire system cost. Faster GPUs take a lot of silicon, and that has a direct influence on price. The Pi uses a mobile-class GPU, and a relatively old one at that, and your expectations should reflect that.

What's important is that it is good enough for the majority of use cases we expect to see. And right now, it just about is. There will always be people wanting better, but getting to the price point is vital.
Principal Software Engineer at Raspberry Pi (Trading) Ltd.
Contrary to popular belief, humorous signatures are allowed. Here's an example...
“I own the world’s worst thesaurus. Not only is it awful, it’s awful.”

Daniel Gessel
Posts: 117
Joined: Sun Dec 03, 2017 1:47 am
Location: Boston area, MA, US
Contact: Website Twitter

Re: Skin Mesh Animation

Tue Jan 07, 2020 5:24 pm

jamesh wrote:
Tue Jan 07, 2020 4:14 pm
I think it's probably worth pointing out at this juncture that you cannot expect a device costing $35 to compete at a GPU level with devices where just the GPU can cost multiples of the Pi's entire system cost.
What? But isn’t the Pi magic, whereas all the other platforms are just science and technology?
jamesh wrote:
Tue Jan 07, 2020 4:14 pm
What's important is that it is good enough for the majority of use cases we expect to see. And right now, it just about is. There will always be people wanting better, but getting to the price point is vital.
What I'd like is a little bit of API to identify the current back buffer and indicate whether its contents have been preserved since the last draw. I almost figured out how to do this in Mesa but I'm not quite there...

Having to redraw the whole screen every frame, because you don't know whether the system just allocated a new back buffer this frame, seems like a missed optimization opportunity on a device which reaches for the sky (dual UHD monitors!) but is on a budget. My plan for my little OpenSCAD-inspired project is to have a built-in text editor that can run without X using KMS and GBM, and I'd like it to be responsive. I can see this being useful for other applications...
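For what it's worth, the closest existing API I know of is EGL_EXT_buffer_age: where the driver exposes it, you can query how many frames old the current back buffer's contents are (0 means undefined, i.e. redraw everything). I haven't checked whether the V3D Mesa driver advertises it on the Pi 4, so treat this as a sketch; the redraw helpers in the comment are hypothetical.

[code]
/* Sketch: ask EGL how old the current back buffer's contents are.
 * age == 0: contents undefined, full redraw needed.
 * age == N: the buffer holds what you drew N frames ago, so only the regions
 *           damaged in the last N frames need repainting.
 * Not verified against the Pi 4's V3D driver - check the extension string first. */
#include <string.h>
#include <EGL/egl.h>
#include <EGL/eglext.h>

EGLint query_buffer_age(EGLDisplay dpy, EGLSurface surf)
{
    const char *exts = eglQueryString(dpy, EGL_EXTENSIONS);
    if (!exts || !strstr(exts, "EGL_EXT_buffer_age"))
        return 0;   /* extension missing: behave as if contents are undefined */

    EGLint age = 0;
    if (!eglQuerySurface(dpy, surf, EGL_BUFFER_AGE_EXT, &age))
        return 0;
    return age;
}

/* Per frame, before rendering (hypothetical helpers):
 *   EGLint age = query_buffer_age(dpy, surf);
 *   if (age == 0)  redraw_everything();
 *   else           redraw_regions_damaged_in_last(age);
 *   eglSwapBuffers(dpy, surf);
 */
[/code]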

bullen
Posts: 315
Joined: Sun Apr 28, 2013 2:52 pm

Re: Skin Mesh Animation

Tue Jan 07, 2020 7:00 pm

Aha, interesting that Mesa could have some switch to automatically offload stuff to the CPU... maybe even in a separate thread!

But again, complexity hurts you even if it is abstracted away.

Here is a talk that shows what happens when you try to multithread graphics:

https://www.gdcvault.com/play/1022186/P ... Dog-Engine

The short version is: because memory is slow, you have to push frames back (pipeline them) to be able to use more cores on the same frame.
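Roughly, the pipelining works like the toy sketch below (stage count and numbers made up): input sampled for frame N isn't on screen until it has gone through simulation, command generation and GPU rendering, each stage a frame behind the previous one.

[code]
/* Toy sketch of a 3-stage pipelined game loop: while the GPU renders frame N-2
 * and the render thread builds commands for frame N-1, the game threads simulate
 * frame N. More stages keep more cores busy, but add frames between the button
 * press and the photons. */
#include <stdio.h>

#define PIPELINE_DEPTH 3   /* simulate -> build command buffers -> GPU render */

int main(void)
{
    int input_frame[PIPELINE_DEPTH] = { -1, -1, -1 };  /* which input each stage holds */

    for (int frame = 0; frame < 8; frame++) {
        /* The display shows whatever just left the last pipeline stage. */
        int displayed_input = input_frame[PIPELINE_DEPTH - 1];

        /* Shift work down the pipeline: render <- build <- simulate. */
        for (int s = PIPELINE_DEPTH - 1; s > 0; s--)
            input_frame[s] = input_frame[s - 1];

        /* Input sampled this frame enters the first stage. */
        input_frame[0] = frame;

        if (displayed_input >= 0)
            printf("frame %d displays input sampled at frame %d (%d frames of lag)\n",
                   frame, displayed_input, frame - displayed_input);
    }
    return 0;
}
[/code]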

The result is that all modern games that do not (1) run on super-fast cores (= PC) or (2) prioritize the CPU for the playable character only (= nobody yet, but soon everyone) have multi-frame lag: you press a button and the screen starts drawing that button press many frames later. I played The Last Guardian yesterday on a PS4 Slim and I just stopped playing; it's UNPLAYABLE. How that technical turd got through to release is beyond me! Even Shadow of the Colossus on the PS2 was better, and that was horrible.

Again, excuse my French; I'm half French and I don't have time to sugarcoat things for you sensitive Brits.

@jamesh Don't worry, as long as the GPU is not forcing me to code stupidly, I will push that thing to the limit. The problem is I have so many RPi 2s that are now basically dead weight for graphics, but that's OK, they serve well as servers.

Edit: I have now filmed and measured the lag in TLG on PS4 and it's almost 1 second! That's 60 frames of lag, how is that even possible!
Last edited by bullen on Wed Jan 08, 2020 6:19 pm, edited 5 times in total.
https://github.com/tinspin/rupy - A tiny Java async HTTP application server.

Daniel Gessel
Posts: 117
Joined: Sun Dec 03, 2017 1:47 am
Location: Boston area, MA, US
Contact: Website Twitter

Re: Skin Mesh Animation

Tue Jan 07, 2020 9:20 pm

Thanks - looks like an interesting video and I’m sure it will enlighten me when I take the time to watch it later on.

My first response is that big GPUs are massively threaded and that's key to their performance, so graphics and threading go together pretty well... but I do understand you're talking about multithreading on the CPU, and I haven't worked on games in years and am no expert. Larrabee failed, so there you go. ;)

I get that you have very high standards for your project(s) and the VC6 will work better for you - enough so that supporting the Pi 3 (and older) isn't of interest. Makes perfect sense given that a Pi 4 is only $35 - what is that, two months' subscription to an MMO? And you're definitely not gonna target a Pi Zero! So, for you, VC4 is out.

My gut says there is a sweet spot for some (other) use where having the ability to shift vertex workload from the GPU to the CPU could be interesting, even doing it naively in the app. My understanding is that even on the Pi 4, the VC6 has a theoretical peak of 32 GFLOPS (perhaps double that at f16 precision - decent for colors, not so much for vertices or texture coords); the combined CPU cores can peak at, I think, 4 FLOPs/cycle/core, so around 24 GFLOPS (4 cores × 4 FLOPs/cycle × 1.5 GHz). Somewhere in that balance is probably a pretty cool demo or use case, latency and memory performance challenges aside...
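For reference, the naive CPU side of that experiment is tiny: linear blend skinning is just a weighted sum of bone-matrix transforms per vertex, something like the sketch below (plain C, four influences per vertex, no NEON/SIMD; the types are invented for illustration). Whether the four cores can also stream the results back to the GPU every frame is exactly the memory/latency question.

[code]
/* Sketch: naive linear blend skinning on the CPU - the kind of vertex work you
 * could shift off the GPU. Four bone influences per vertex, no SIMD. */
#include <stddef.h>
#include <stdint.h>

typedef struct { float m[3][4]; } Mat3x4;      /* bone transform: rotation + translation */
typedef struct { float x, y, z; } Vec3;

typedef struct {
    Vec3    pos;          /* bind-pose position */
    uint8_t bone[4];      /* indices of the 4 influencing bones */
    float   weight[4];    /* weights, expected to sum to 1 */
} SkinnedVertex;

static Vec3 transform_point(const Mat3x4 *m, Vec3 p)
{
    Vec3 r = {
        m->m[0][0]*p.x + m->m[0][1]*p.y + m->m[0][2]*p.z + m->m[0][3],
        m->m[1][0]*p.x + m->m[1][1]*p.y + m->m[1][2]*p.z + m->m[1][3],
        m->m[2][0]*p.x + m->m[2][1]*p.y + m->m[2][2]*p.z + m->m[2][3],
    };
    return r;
}

/* Roughly 7 flops per matrix row, 3 rows, plus the weighting, times 4 bones:
 * on the order of 100 flops per vertex, so the ~24 GFLOPS quad-core ceiling
 * above is not the first wall you hit - memory traffic is. */
void skin_vertices(const SkinnedVertex *in, Vec3 *out, size_t count,
                   const Mat3x4 *bones)
{
    for (size_t i = 0; i < count; i++) {
        Vec3 acc = { 0.0f, 0.0f, 0.0f };
        for (int j = 0; j < 4; j++) {
            Vec3  p = transform_point(&bones[in[i].bone[j]], in[i].pos);
            float w = in[i].weight[j];
            acc.x += w * p.x;  acc.y += w * p.y;  acc.z += w * p.z;
        }
        out[i] = acc;
    }
}
[/code]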
