W. H. Heydt
Posts: 13609
Joined: Fri Mar 09, 2012 7:36 pm
Location: Vallejo, CA (US)

Re: RAM speed

Thu Oct 08, 2020 3:34 pm

bullen wrote:
Thu Oct 08, 2020 2:28 pm
Sorry to wake this one up, but latency of memory is going up with DDR5 (apparently DDR3 was faster than DDR4 too) so I found this page:

https://en.wikipedia.org/wiki/CAS_latency
Even as CAS latency increases, the basic clock speed also increases, so it's pretty much a wash. It takes more clock cycles to get a response from the DRAM module, but once you do, you get the data back faster.
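To put rough numbers on that: the absolute CAS delay is the CL cycle count multiplied by the clock period, and on double-data-rate parts the I/O clock is half the quoted transfer rate. Below is a minimal C sketch of the arithmetic. The function name and the example timings (DDR3-1600 CL9, DDR4-3200 CL16) are illustrative assumptions, not figures from this thread or from the Pi's memory.

```c
#include <assert.h>
#include <stdint.h>

/* Absolute CAS latency in picoseconds.
 * cl:           CAS latency in clock cycles (e.g. 16 for "CL16")
 * transfer_mts: quoted data rate in mega-transfers per second
 *               (e.g. 3200 for DDR4-3200); the I/O clock is half of this
 *               because data moves on both clock edges.
 * One clock period is therefore 2,000,000 / transfer_mts picoseconds. */
uint64_t cas_latency_ps(uint32_t cl, uint32_t transfer_mts)
{
    assert(transfer_mts > 0);
    return (uint64_t)cl * 2000000u / transfer_mts;
}
```

With these assumed timings, cas_latency_ps(9, 1600) gives 11250 ps and cas_latency_ps(16, 3200) gives 10000 ps: nearly twice the cycle count, but about the same wait in nanoseconds.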
I think my initial hunch about SRAM was right?
Nope. Still wrong for the reasons given. SRAM is far too expensive to use as main memory.
We need faster RAM, if not the raspberry 4 is peak CPU at 1 Gflops/watt, for ever, in the universe!?
There is another way to increase memory bandwidth. Fetch more data at one time. This is what add-in GPU cards (nVidia, Radeon) do. Anything up to 512 bits per fetch.

It is, coincidentally, what IBM did with mainframes in the 1960s. A 360/30 would fetch 1 byte in 1.5 microseconds (us). A 360/40 had slower memory, with a 2 us cycle time, but it would fetch 2 bytes at a time. The 360/50 had that same 2 us cycle, but it fetched 4 bytes. IIRC, the top systems fetched 8 bytes at a time, still with a 2 us cycle time.
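The effect of wider fetches can be worked out directly: peak bandwidth is bytes per fetch divided by cycle time. This is a small C sketch of that calculation, using the System/360 figures as recalled above. The function name is made up for illustration, and it assumes one fetch completes every memory cycle.

```c
#include <assert.h>
#include <stdint.h>

/* Peak bytes per second when every memory cycle delivers one fetch.
 * bytes_per_fetch: width of a single fetch in bytes
 * cycle_ns:        memory cycle time in nanoseconds */
uint64_t peak_bytes_per_second(uint32_t bytes_per_fetch, uint32_t cycle_ns)
{
    assert(cycle_ns > 0);
    return (uint64_t)bytes_per_fetch * 1000000000u / cycle_ns;
}
```

On those figures the 360/30 (1 byte per 1500 ns) peaks at about 0.67 MB/s, the 360/40 (2 bytes per 2000 ns) at 1 MB/s, the 360/50 at 2 MB/s and the 8-byte machines at 4 MB/s. The wider machines move more data even though each cycle is slower.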
Is the memory in the raspberry 4 3200MHz or 1600MHz? Or is it 1600, but at double rate?
IIRC, DDR4 (and possibly DDR2 and DDR3) transmit data 4 times during a single clock cycle. The original DDR did it twice: once on the rising edge of the clock and once on the falling edge.
Is there a way to measure CAS latency?
There certainly is. The manufacturers have to test their DRAM chips to make sure they are within specification, so they'd have to be able to test that. Doesn't mean there is a practical method to do so by an end user when it's installed in a PC. All you can do there is try tweaking the CAS (and other parameters) to see what you can get away with. However, the Pi does not have a means to do that.
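What an end user can estimate is total load latency (of which the CAS delay is only one part) by timing a chain of dependent loads. This is a hedged C sketch of that pointer-chasing technique. All names are hypothetical, the random number generator is a toy LCG, and clock() measures CPU time at coarse resolution, so treat any result as a rough indication only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Fill next[] with a random single-cycle permutation (Sattolo's algorithm),
 * so that following next[] visits every slot once before returning to 0. */
void build_chain(size_t *next, size_t n, uint32_t seed)
{
    assert(next != NULL && n > 0);
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        seed = seed * 1664525u + 1013904223u;  /* toy LCG, illustration only */
        size_t j = (seed >> 8) % i;            /* j is in [0, i-1] */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Follow the chain for `steps` dependent loads; returns the final index.
 * Each load needs the previous result, so loads cannot overlap. */
size_t chase(const size_t *next, size_t start, size_t steps)
{
    size_t idx = start;
    for (size_t s = 0; s < steps; s++)
        idx = next[idx];
    return idx;
}

/* Average nanoseconds of CPU time per dependent load. */
double ns_per_load(const size_t *next, size_t steps)
{
    assert(steps > 0);
    clock_t t0 = clock();
    volatile size_t sink = chase(next, 0, steps);  /* keep the result live */
    clock_t t1 = clock();
    (void)sink;
    return (double)(t1 - t0) * 1e9 / CLOCKS_PER_SEC / (double)steps;
}
```

Run over an array several times larger than the last-level cache and with many millions of steps, the figure should approach DRAM access latency. Run over a small array, it reflects cache latency instead.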
What nm process is the RAM?
Read the manufacturer and part number off the package on a Pi4B and then look for the specification sheet for that part. That *might* tell you what the manufacturing process node is.

bullen
Posts: 398
Joined: Sun Apr 28, 2013 2:52 pm

Re: RAM speed

Thu Oct 08, 2020 3:40 pm

I think most people don't understand the difference between latency and bandwidth.

The bottleneck has always been latency, not bandwidth.

Bandwidth is only interesting if you are consuming, I write games that produce something, that's why latency is important.
https://github.com/tinspin/rupy - A tiny Java async HTTP application server.

W. H. Heydt
Posts: 13609
Joined: Fri Mar 09, 2012 7:36 pm
Location: Vallejo, CA (US)

Re: RAM speed

Thu Oct 08, 2020 3:44 pm

bullen wrote:
Thu Oct 08, 2020 3:40 pm
I think most people don't understand the difference between latency and bandwidth.

The bottleneck has always been latency, not bandwidth.

Bandwidth is only interesting if you are consuming, I write games that produce something, that's why latency is important.
I question the claim that your games "produce" anything. The latency issues for games are mostly from mass storage and network, both of which are either largely or completely out of the hands of the game developer. That said, to minimize DRAM latency issues (if that's provably a problem), you need to code for locality of reference and organize your data to be "cache friendly". These issues have already been discussed at length.

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 27417
Joined: Sat Jul 30, 2011 7:41 pm

Re: RAM speed

Thu Oct 08, 2020 4:00 pm

bullen wrote:
Thu Oct 08, 2020 3:40 pm
I think most people don't understand the difference between latency and bandwidth.

The bottleneck has always been latency, not bandwidth.

Bandwidth is only interesting if you are consuming, I write games that produce something, that's why latency is important.
AIUI, latency is fixed/improved by having caching. Overall bandwidth is improved by cache size.
Principal Software Engineer at Raspberry Pi (Trading) Ltd.
Contrary to popular belief, humorous signatures are allowed.
I've been saying "Mucho" to my Spanish friend a lot more lately. It means a lot to him.

Heater
Posts: 16844
Joined: Tue Jul 17, 2012 3:02 pm

Re: RAM speed

Thu Oct 08, 2020 4:53 pm

bullen wrote:
Thu Oct 08, 2020 3:40 pm
I think most people don't understand the difference between latency and bandwidth.
That may well be true. However, I and everyone else discussing this here do.
bullen wrote:
Thu Oct 08, 2020 3:40 pm
The bottleneck has always been latency, not bandwidth.

Bandwidth is only interesting if you are consuming, I write games that produce something, that's why latency is important.
I very much doubt that is true.

There is not much that can be done about latency. If there were, it would have been done by now. We get around that by hoovering up big chunks of memory into cache whilst the CPU is munching on something else. Latency to data in cache is orders of magnitude lower than going to any external RAM. Pretty much all the latency of modern systems is masked by their multi-level caches.

If one were to build what you seem to be suggesting, a system with super-fast SRAM and the lowest possible latency for random byte-by-byte access from CPU to RAM, then that system would be subject to the access latency on every access. It would be an order of magnitude slower than what we have now!

The proviso, of course, is that one's code has to be written in such a way as to maximize the effectiveness of those caches. Totally random access to data spattered all over the address space, as is typical in object-oriented systems, is a performance killer.

Sounds like your games need a little reorganizing for cache friendliness.
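As an illustration of what reorganizing for cache friendliness can mean, here is a minimal C sketch contrasting an array-of-structs layout with a struct-of-arrays layout for a physics step. The entity fields and function names are invented for the example and have nothing to do with the actual engine under discussion.

```c
#include <assert.h>
#include <stddef.h>

/* Array-of-structs: each entity is 64 bytes (a common cache line size),
 * but the physics step reads only 8 of them, so most of every fetched
 * line is wasted. */
typedef struct {
    float x, vx;      /* hot: touched every physics step */
    char  name[56];   /* cold: hypothetical per-entity payload */
} EntityAoS;

void step_aos(EntityAoS *e, size_t n, float dt)
{
    assert(e != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        e[i].x += e[i].vx * dt;
}

/* Struct-of-arrays: the hot fields are packed contiguously, so each
 * cache line holds only values the loop actually uses. */
typedef struct {
    float *x;
    float *vx;
} EntitiesSoA;

void step_soa(EntitiesSoA *e, size_t n, float dt)
{
    assert(e != NULL);
    for (size_t i = 0; i < n; i++)
        e->x[i] += e->vx[i] * dt;
}
```

Both functions compute the same result. The second simply walks memory sequentially, which is the access pattern caches and hardware prefetchers handle best. Whether it is faster in a given program has to be measured.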

Edit: I will grant you that there are times when minimal latency to RAM, and especially deterministic cycle timing with no caches in the way, can be of benefit, typically when building real-time embedded systems with stringent timing requirements down to microseconds. I very much doubt your games require that. After all, they only have to be deterministic on time scales perceptible to humans and meet the 60fps update rate of the video output.
Memory in C++ is a leaky abstraction.

bullen
Posts: 398
Joined: Sun Apr 28, 2013 2:52 pm

Re: RAM speed

Thu Oct 08, 2020 5:49 pm

Games "produce" sound and images with "physics from input" and resources (these are consumed only)...

That "physics from input" requires very low latency even if you manage to utilize cache:

On my C+ (C/C++) 3D MMO engine I have 2% cache-misses that need to be as fast as possible!
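For a sense of how much a 2% miss rate can weigh, the usual estimate is: average access time = hit time + miss rate x miss penalty. This is a small C sketch of that formula. The function name is hypothetical, and the 1 ns hit time and 100 ns miss penalty used in the note below are assumed round numbers, not measurements of a Pi 4.

```c
#include <assert.h>
#include <stdint.h>

/* Average memory access time in picoseconds:
 *   hit_ps + miss_rate * miss_penalty_ps
 * miss_per_thousand expresses the miss rate in tenths of a percent
 * (20 means 2%). */
uint64_t avg_access_ps(uint64_t hit_ps, uint32_t miss_per_thousand,
                       uint64_t miss_penalty_ps)
{
    assert(miss_per_thousand <= 1000);
    return hit_ps + (uint64_t)miss_per_thousand * miss_penalty_ps / 1000u;
}
```

With those assumed numbers, avg_access_ps(1000, 20, 100000) is 3000 ps. The 2% of accesses that miss account for two thirds of the average, so both the miss rate and the miss penalty matter.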

I would guesstimate they are responsible for a lot of poorly optimized GPU cache misses too, which I have no way to profile, afaik?

Preferably the gameplay/animation/physics code should not run on anything but L1, but there is no way for me to do that on linux/ARM! (or Windows/X86 for that matter)

Nintendo did the right thing on Wii, but now they are back to bad structures with Nvidia...

My last hope is that RISC-V will add manual explicit cache control, hopefully to the GPU too!

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 27417
Joined: Sat Jul 30, 2011 7:41 pm

Re: RAM speed

Thu Oct 08, 2020 6:08 pm

bullen wrote:
Thu Oct 08, 2020 5:49 pm
Games "produce" sound and images with physics from input.

That requires very low latency even if you manage to utilize cache:
Sound doesn't need low latency! It's mind-bogglingly slow compared with RAM speeds. 128 kHz! Nowhere near the gigahertz rating of RAM. Sounds more like iffy code than an inherent problem with the RAM speed.

We have some fairly time-critical stuff to do when capturing camera data and/or running the H264 encoder at full tilt on the Nokia 808 (VC4), but the ONLY problem we had with memory was when the SDRAM calibration kicked in, which turned the RAM off for 90ns. This led to gaps in the data stream, which showed up as short horizontal black lines where the camera data could not get to the RAM in time. This is with DMA stuffing data from the camera into RAM at the full speed of a 4-lane CSI bus (4 Gbit/s). We simply turned off RAM calibration whilst capturing, which fixed the issue.

And that is on a VC4 on a 10 year old smart phone. It's faster now. Not sure how we calibrate nowadays.

We can run out of RAM speed of course - can happen with the camera running, encoders and decoders running at the same time with multiple HDMI displays, which are all hammering the DMA to move data around at very high speeds. But you really REALLY need to be trying hard to not meet the display FPS requirements.

bullen
Posts: 398
Joined: Sun Apr 28, 2013 2:52 pm

Re: RAM speed

Fri Oct 09, 2020 12:16 am

I think you replied to an old post... I usually make the mistake to post first and then edit my post for a while...

But I have a few questions:

1) I found this: https://github.com/nezticle/RaspberryPi ... Core-Tools is that all the tools we have to understand the GPU?

2) I heard there are ways to tell certain processors that you want to allocate things in L1 cache manually, is that possible on the raspberry 4? If so how is that possible? Just names of what I should google is enough.

cleverca22
Posts: 1893
Joined: Sat Aug 18, 2012 2:33 pm

Re: RAM speed

Fri Oct 09, 2020 12:57 am

bullen wrote:
Fri Oct 09, 2020 12:16 am
I think you replied to an old post... I usually make the mistake to post first and then edit my post for a while...

But I have a few questions:

1) I found this: https://github.com/nezticle/RaspberryPi ... Core-Tools is that all the tools we have to understand the GPU?

2) I heard there are ways to tell certain processors that you want to allocate things in L1 cache manually, is that possible on the raspberry 4? If so how is that possible? Just names of what I should google is enough.
There's also:
https://github.com/hermanhermitage/vide ... ers-Manual
https://github.com/librerpi/rpi-open-fi ... ipeline.md
https://docs.broadcom.com/doc/12358545

6by9
Raspberry Pi Engineer & Forum Moderator
Posts: 9891
Joined: Wed Dec 04, 2013 11:27 am
Location: ZZ9 Plural Z Alpha, aka just outside Cambridge.

Re: RAM speed

Fri Oct 09, 2020 10:58 am

bullen wrote:
Fri Oct 09, 2020 12:16 am
I think you replied to an old post... I usually make the mistake to post first and then edit my post for a while...

But I have a few questions:

1) I found this: https://github.com/nezticle/RaspberryPi ... Core-Tools is that all the tools we have to understand the GPU?

2) I heard there are ways to tell certain processors that you want to allocate things in L1 cache manually, is that possible on the raspberry 4? If so how is that possible? Just names of what I should google is enough.
"L1 cache" for which hardware block? VPU, QPU, TFU, HVS, or ARM?

ARM caching is under the control of the Linux kernel - mess to your heart's content if you understand the ARM MMUs well enough.

The VPU has an L1 cache, but very little is running on the VPU these days (camera and codecs).
QPUs, TFU, and HVS can generally access data through the VideoCore L2 cache, but this is normally a bad plan for large image data as it results in evicting more beneficial data. QPUs generally have enough registers to store intermediates in them rather than SDRAM.
Software Engineer at Raspberry Pi Trading. Views expressed are still personal views.
I'm not interested in doing contracts for bespoke functionality - please don't ask.

pica200
Posts: 219
Joined: Tue Aug 06, 2019 10:27 am

Re: RAM speed

Fri Oct 09, 2020 11:45 am

What you are likely looking for are the cache lockdown registers. Search in the Cortex-A72 technical reference manual and you will find them. ARM-based CPUs have had them for many years.

I would not recommend using the cache lockdown registers: they are accessible only in privileged mode, and anything locked in cache takes space away from other data, lowering performance. I am also not sure how well cache lockdown plays with a multi-process system where each process has its own address space. Because lockdown requires higher privileges, you would need a kernel module, which is a security nightmare. A game should not be running privileged code in a kernel module.

ganzgustav22
Posts: 115
Joined: Tue Feb 11, 2020 1:04 pm

Re: RAM speed

Fri Oct 09, 2020 12:20 pm

Just make the whole game a kernel module, should be much faster :D

bullen
Posts: 398
Joined: Sun Apr 28, 2013 2:52 pm

Re: RAM speed

Fri Oct 09, 2020 5:21 pm

Would the C code to access this look something like this:

Code:

u32 *ptr;
asm volatile ("pld [%0]" : : "r" (ptr));

And does this not work from user space?
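For reference, PLD is a preload hint rather than a lockdown operation: it is an unprivileged instruction, so it can be issued from user space, but it only asks the core to start fetching a line and does not pin anything in cache. The `pld [%0]` syntax is 32-bit ARM, and AArch64 uses PRFM instead. The sketch below uses the GCC/Clang builtin `__builtin_prefetch`, which lets the compiler emit whichever form suits the target. The function name and the prefetch distance are untuned assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PREFETCH_DISTANCE 16  /* elements ahead; an untuned guess */

/* Sum an array while hinting that data a little further ahead will be
 * needed soon. Arguments to __builtin_prefetch: address, 0 = read access,
 * 3 = high temporal locality (keep in all cache levels if possible). */
uint64_t sum_with_prefetch(const uint32_t *data, size_t n)
{
    assert(data != NULL || n == 0);
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 3);
        total += data[i];
    }
    return total;
}
```

For a plain sequential walk like this the hardware prefetcher usually does the job already, so an explicit hint is more likely to help with irregular but predictable access patterns. Measure before keeping it.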

Heater
Posts: 16844
Joined: Tue Jul 17, 2012 3:02 pm

Re: RAM speed

Fri Oct 09, 2020 5:32 pm

bullen wrote:
Fri Oct 09, 2020 5:21 pm
Would the C code to access this look something like this:

Code:

u32 *ptr;
asm volatile ("pld [%0]" : : "r" (ptr));

And does this not work from user space?
I would very much hope that any possible instructions that mess with the cache are not available to user space programs.

As an experiment there is no problem, just run your game with root privs and lock the caches or whatever as much as you like.

Assuming that does not crash the entire OS I look forward to your reports as to how direct memory access speeds things up.

bullen
Posts: 398
Joined: Sun Apr 28, 2013 2:52 pm

Re: RAM speed

Fri Oct 09, 2020 7:17 pm

Is there a way to profile GPU cache misses, shader inefficiencies (like using too many registers) and other more advanced things like that?

viewtopic.php?f=63&t=287895
