Gnyueh
Posts: 47
Joined: Sat Jun 27, 2020 8:15 am

Actual memory bandwidth of raspberry pi4?

Mon Jul 27, 2020 12:09 pm

Raspberry Pi 4 uses LPDDR4 memory at 3200 MT/s (often quoted as 3200MHz) on a 32-bit bus (it is a 3733 MT/s LPDDR4 part running at 3200 MT/s: https://www.micron.com/products/dram/lp ... dt-053-aat), so the theoretical bandwidth of the memory is about 12.8 GB/s. But after running a lot of tests I found that the measured memory bandwidth is only about 4 GB/s:
By the Raspberry Pi Foundation (https://magpi.raspberrypi.org/articles/ ... benchmarks):
Image
By Tom's Hardware (https://www.tomshardware.com/news/raspb ... 8gb-tested):
Image
It is quite strange that the memory throughput is only about 30% of the theoretical value. I wonder whether some structural design choice, such as the memory controller, the bus or the cache, limits memory performance.
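The theoretical figure is simply bus width times transfer rate. Below is a minimal sketch of that arithmetic. It assumes that "3200" and "3733" are transfer rates in MT/s (LPDDR4 moves data on both clock edges), and it ignores refresh and protocol overhead. The function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Theoretical peak DRAM bandwidth in bytes per second:
 * (bus width in bits / 8) * transfers per second. Ignores refresh,
 * bank conflicts and command overhead, so real systems never reach it. */
uint64_t peak_bandwidth_bytes(unsigned bus_width_bits, uint64_t mega_transfers)
{
    assert(bus_width_bits % 8 == 0);
    return (uint64_t)(bus_width_bits / 8) * mega_transfers * 1000000ULL;
}

/* Fraction of the theoretical peak reached by a measured figure (bytes/s). */
double bandwidth_efficiency(double measured_bytes, uint64_t peak_bytes)
{
    assert(peak_bytes > 0);
    return measured_bytes / (double)peak_bytes;
}

void print_pi4_estimate(void)
{
    uint64_t peak = peak_bandwidth_bytes(32, 3200); /* 32-bit LPDDR4-3200 */
    printf("peak %.1f GB/s, 4 GB/s measured = %.0f%% of peak\n",
           peak / 1e9, 100.0 * bandwidth_efficiency(4e9, peak));
}
```

Calling print_pi4_estimate() gives a 12.8 GB/s peak and about 31% for the measured 4 GB/s. peak_bandwidth_bytes(128, 3733) reproduces the 59.7 GB/s laptop figure that comes up later in the thread.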
While digging into this I found some clues.
I found that somebody ran the AIDA64 memory bandwidth benchmark on a RPi 4 running Windows on ARM (https://www.bilibili.com/video/BV1Kh411Z7to), and the results are quite interesting. The AIDA64 results are as follows:
Image
Although recompiling the x86 program for ARM may affect accuracy, the memory bandwidth matches the results from the RPi Foundation and Tom's Hardware almost exactly, so I assume these results are reliable.
The interesting point is that the L2 cache doesn't perform much better than main memory in bandwidth. This is not observed in other A72 systems (https://www.anandtech.com/show/11088/hi ... nd-power/3), and I think the shared inclusive L2 cache may be throttling memory bandwidth.
Overclocking the L2 cache, if that is possible, might help achieve better bandwidth. There could also be other limitations. However, neither the RPi Foundation nor Broadcom has disclosed detailed information about the design or the bus topology inside the chip, so it is hard to dig into these things.
Better one hour of free life than forty years of slavery and prison.

pica200
Posts: 219
Joined: Tue Aug 06, 2019 10:27 am

Re: Actual memory bandwidth of raspberry pi4?

Mon Jul 27, 2020 6:48 pm

The L2 cache as it is was probably chosen because of cost. Don't forget this is a $35 computer. Yes, it would probably benefit from a slightly bigger and faster cache, but that would also increase die area quite a bit.

As for the DRAM, I'm not sure. Is the number you stated explicitly for a 32-bit bus? The practically reachable speed also depends on many other factors.

dickon
Posts: 1799
Joined: Sun Dec 09, 2012 3:54 pm
Location: Home, just outside Reading

Re: Actual memory bandwidth of raspberry pi4?

Mon Jul 27, 2020 7:23 pm

The main reason for the L2 cache -- or L3 on some architectures -- is not to be faster than RAM, but to reduce latency and hence pipeline stalls.

cleverca22
Posts: 1838
Joined: Sat Aug 18, 2012 2:33 pm

Re: Actual memory bandwidth of raspberry pi4?

Mon Jul 27, 2020 7:27 pm

Just a brief glance at the numbers: if the DRAM can move 32 bits on every clock at 3.2GHz, then you would need the ARM to run at 3.2GHz and do a 32-bit read on every clock,
or run at 1.6GHz and do a 64-bit read on every clock.

It feels like the ARM frequency is going to be the main bottleneck, and you would need a different peripheral that can stress the RAM harder.

My first guess is that maybe they need such fast RAM to support 4K HDMI, and the ARM can only use a small fraction of it due to the ARM clocks.
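The back-of-the-envelope reasoning above can be written out as a short calculation. It assumes one access per CPU clock, no caches and no overlapping requests. This sketch applies exactly those simplifications. Real cores keep several cache-line fills in flight, so it illustrates the arithmetic and does not model the A72. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Accesses per second a single requester must issue to sustain
 * `bandwidth_bytes` bytes/s if each access moves `access_bits` bits
 * and exactly one access completes per clock. */
uint64_t required_access_rate_hz(uint64_t bandwidth_bytes, unsigned access_bits)
{
    assert(access_bits >= 8 && access_bits % 8 == 0);
    return bandwidth_bytes / (access_bits / 8);
}

void print_required_clocks(void)
{
    const uint64_t peak = 12800000000ULL; /* 32-bit LPDDR4-3200, in bytes/s */
    const unsigned widths[] = {32, 64, 128};
    for (size_t i = 0; i < sizeof widths / sizeof widths[0]; i++)
        printf("%3u-bit access per clock: %.1f GHz\n", widths[i],
               required_access_rate_hz(peak, widths[i]) / 1e9);
}
```

The three rows come out at 3.2, 1.6 and 0.8 GHz. The 128-bit row corresponds to the 800MHz figure raised later in the thread.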

ejolson
Posts: 5971
Joined: Tue Mar 18, 2014 11:47 am

Re: Actual memory bandwidth of raspberry pi4?

Mon Jul 27, 2020 7:43 pm

Gnyueh wrote:
Mon Jul 27, 2020 12:09 pm
Raspberry Pi 4 uses LPDDR4 memory at 3200 MT/s (often quoted as 3200MHz) on a 32-bit bus (it is a 3733 MT/s LPDDR4 part running at 3200 MT/s: https://www.micron.com/products/dram/lp ... dt-053-aat), so the theoretical bandwidth of the memory is about 12.8 GB/s. But after running a lot of tests I found that the measured memory bandwidth is only about 4 GB/s
An independent collection of memory bandwidth measurements for the family of Raspberry Pi computers along with some other single board computers may be found in the thread

viewtopic.php?t=271121

The results for the 4B reported there were about 5 GB/sec, depending on the exact nature of the kernel being performed. While slower than ARM Cortex designs that employ two memory chips, the speeds were comparable and consistent with expectations.

Have you computed the difference between theoretical and measured bandwidth for any x86 computers?

If anyone is listening, it would also be nice to see how the scale-up versions of IBM Power9 and System Z, which have presumably been optimised for memory bandwidth, compare with other systems.

Heater
Posts: 16827
Joined: Tue Jul 17, 2012 3:02 pm

Re: Actual memory bandwidth of raspberry pi4?

Mon Jul 27, 2020 7:56 pm

dickon wrote:
Mon Jul 27, 2020 7:23 pm
The main reason for the L2 cache -- or L3 on some architectures -- is not to be faster than RAM, but to reduce latency and hence pipeline stalls.
In other words, the reason for the cache is exactly to appear to be faster RAM so as to reduce latency and hence pipeline stalls.

Of course a key word there is "appear". Cache only works for you if your processing can keep its working set of data in cache as much as possible.
Memory in C++ is a leaky abstraction.

dickon
Posts: 1799
Joined: Sun Dec 09, 2012 3:54 pm
Location: Home, just outside Reading

Re: Actual memory bandwidth of raspberry pi4?

Mon Jul 27, 2020 8:11 pm

Yes, but raw bandwidth may not be the imperative. 'Fast' is a bit ambiguous: low-latency or bulk transfer. Quite different things, but English doesn't disambiguate with that word.

cleverca22
Posts: 1838
Joined: Sat Aug 18, 2012 2:33 pm

Re: Actual memory bandwidth of raspberry pi4?

Mon Jul 27, 2020 8:17 pm

For the VC4 line of Pis you can also measure how idle the DRAM controller is, and then see whether the ARM is even putting a load on it.
cleverca22 wrote:
Sat Jul 18, 2020 2:20 pm
I found the DRAM usage metrics on VC4: https://github.com/librerpi/rpi-open-fi ... cc#L72-L78

Code:

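#include <inttypes.h>
#include <stdio.h>

/* SD_IDL and SD_CYC are assumed to be the VC4 SDRAM controller counter
 * register macros from the firmware source linked above. */
void report_sdram_usage(void) {
  uint32_t idle = SD_IDL;   /* cycles the DRAM spent idle */
  uint32_t total = SD_CYC;  /* all DRAM cycles */
  SD_IDL = 0;               /* writing 0 to IDL clears both counters */
  float idle_percent = total ? 100.0f * (float)idle / (float)total : 0.0f;
  printf("sdram usage: %" PRIu32 " %" PRIu32 ", %.1f%% idle\t", idle, total, idle_percent);
}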
IDL increases by 1 for every clock cycle the DRAM spends idle, CYC increases on every clock cycle, and writing 0 to IDL clears both.
They are both 28-bit counters and don't wrap around when they reach 2^28, so you can easily detect when an overflow has happened.
If the DRAM is running at 400MHz, that means an overflow in just 0.67 seconds, so you need to poll it at least twice a second to get reliable data.
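The overflow arithmetic and the idle-to-busy conversion are easy to check. The sketch below assumes, as described above, that the counters are 28 bits wide and do not wrap. SD_COUNTER_BITS, SD_COUNTER_LIMIT and both helper names are hypothetical. Reading the real SD_IDL/SD_CYC registers still requires the firmware environment linked above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants: the post describes IDL and CYC as 28-bit counters. */
#define SD_COUNTER_BITS 28
#define SD_COUNTER_LIMIT ((uint32_t)1 << SD_COUNTER_BITS)

/* Seconds until the cycle counter reaches 2^28 at the given DRAM clock. */
double counter_limit_seconds(double dram_hz)
{
    assert(dram_hz > 0.0);
    return (double)SD_COUNTER_LIMIT / dram_hz;
}

/* Busy fraction of the DRAM from one (idle, total) sample pair. */
double dram_busy_fraction(uint32_t idle, uint32_t total)
{
    assert(idle <= total);
    return total ? 1.0 - (double)idle / (double)total : 0.0;
}
```

At 400MHz, counter_limit_seconds gives about 0.671 s, so polling every 0.5 s leaves some margin.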

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 27394
Joined: Sat Jul 30, 2011 7:41 pm

Re: Actual memory bandwidth of raspberry pi4?

Mon Jul 27, 2020 8:20 pm

cleverca22 wrote:
Mon Jul 27, 2020 7:27 pm
Just a brief glance at the numbers: if the DRAM can move 32 bits on every clock at 3.2GHz, then you would need the ARM to run at 3.2GHz and do a 32-bit read on every clock,
or run at 1.6GHz and do a 64-bit read on every clock.

It feels like the ARM frequency is going to be the main bottleneck, and you would need a different peripheral that can stress the RAM harder.

My first guess is that maybe they need such fast RAM to support 4K HDMI, and the ARM can only use a small fraction of it due to the ARM clocks.
It's a very good point that if the HDMI is running, it is actually using up some of the bandwidth. That is easy to work out given the resolution and frame rate, but as a quick smoke test you could pick a very low resolution mode and see whether the figures improve.
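As a rough guide, scanout traffic is width × height × bytes per pixel × frame rate. This sketch assumes a single framebuffer read once per refresh, with no compositing, rotation or overlay planes, all of which would add traffic. The bytes-per-pixel value is an assumption, since the actual framebuffer format isn't stated here.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per second needed just to scan one framebuffer out to the display. */
uint64_t scanout_bytes_per_sec(unsigned width, unsigned height,
                               unsigned bytes_per_pixel, unsigned fps)
{
    assert(width > 0 && height > 0 && bytes_per_pixel > 0 && fps > 0);
    return (uint64_t)width * height * bytes_per_pixel * fps;
}
```

With 3 bytes per pixel:

* 3840x2160 at 60 Hz gives 1,492,992,000 bytes/s. That is about 1.5 GB/s, the figure quoted in this thread.
* 3840x2160 at 30 Hz gives about 0.75 GB/s.
* 640x480 at 60 Hz gives only about 55 MB/s. This is why a low-resolution mode works as a smoke test.

A 4-byte-per-pixel framebuffer at 4K60 would need about 2 GB/s.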
Principal Software Engineer at Raspberry Pi (Trading) Ltd.
Contrary to popular belief, humorous signatures are allowed.
I've been saying "Mucho" to my Spanish friend a lot more lately. It means a lot to him.

Gnyueh
Posts: 47
Joined: Sat Jun 27, 2020 8:15 am

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 4:10 am

jamesh wrote:
Mon Jul 27, 2020 8:20 pm
cleverca22 wrote:
Mon Jul 27, 2020 7:27 pm
Just a brief glance at the numbers: if the DRAM can move 32 bits on every clock at 3.2GHz, then you would need the ARM to run at 3.2GHz and do a 32-bit read on every clock,
or run at 1.6GHz and do a 64-bit read on every clock.

It feels like the ARM frequency is going to be the main bottleneck, and you would need a different peripheral that can stress the RAM harder.

My first guess is that maybe they need such fast RAM to support 4K HDMI, and the ARM can only use a small fraction of it due to the ARM clocks.
It's a very good point that if the HDMI is running, it is actually using up some of the bandwidth. That is easy to work out given the resolution and frame rate, but as a quick smoke test you could pick a very low resolution mode and see whether the figures improve.
I have also checked some other posts in the forum. One interesting one is viewtopic.php?f=63&t=271121&hilit=l2+cache&start=25, which shows RPi memory performance shrinking as the number of threads increases:
Image
This is also observed on Ryzen x86 chips, but the effect there is much smaller than on the RPi 4:
http://fractal.math.unr.edu/~ejolson/pi ... 7p1700.svg
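The thread-scaling effect being discussed can be reproduced in spirit with a small pthreads test. It splits one buffer between N readers and reports the aggregate rate. This is a minimal sketch, not the benchmark used in the linked thread: it uses plain scalar loads and includes thread start-up in the timing, and mt_read_gbs is a hypothetical name. To measure DRAM rather than cache, the buffer should be far larger than the Pi 4's 1 MB L2, for example hundreds of MB.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define MAX_THREADS 16

struct chunk {
    const uint32_t *p;  /* start of this thread's slice */
    size_t n;           /* number of 32-bit words in the slice */
    uint64_t sum;       /* result, so the loads cannot be optimised away */
};

static void *sum_chunk(void *arg)
{
    struct chunk *c = arg;
    uint64_t s = 0;
    for (size_t i = 0; i < c->n; i++)
        s += c->p[i];
    c->sum = s;
    return NULL;
}

/* Read `n` words of `buf` with `nthreads` threads and return aggregate GB/s.
 * The combined sum is stored in *out_sum so callers can check correctness. */
double mt_read_gbs(const uint32_t *buf, size_t n, int nthreads, uint64_t *out_sum)
{
    assert(nthreads >= 1 && nthreads <= MAX_THREADS);
    pthread_t tid[MAX_THREADS];
    struct chunk ch[MAX_THREADS];
    size_t per = n / (size_t)nthreads;
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int t = 0; t < nthreads; t++) {
        ch[t].p = buf + (size_t)t * per;
        /* the last thread also takes any remainder */
        ch[t].n = (t == nthreads - 1) ? n - (size_t)t * per : per;
        ch[t].sum = 0;
        int rc = pthread_create(&tid[t], NULL, sum_chunk, &ch[t]);
        assert(rc == 0);
        (void)rc;
    }
    uint64_t total = 0;
    for (int t = 0; t < nthreads; t++) {
        pthread_join(tid[t], NULL);
        total += ch[t].sum;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *out_sum = total;
    double secs = (double)(t1.tv_sec - t0.tv_sec) + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return secs > 0.0 ? (double)(n * sizeof *buf) / secs / 1e9 : 0.0;
}
```

Run it with nthreads = 1, 2, 3 and 4 on a large buffer and compare the returned rates. This shows whether aggregate bandwidth rises, flattens or falls as threads are added.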
I think VC6 is to blame. The VC6 GPU is the de facto manager and master of the BCM2711, including the MMU and booting, and there is a 32-bit ThreadX RTOS running inside it:
Image
(An unofficial diagram of the BCM2711, from https://www.heise.de/ct/artikel/Raspber ... 14399.html)
VC6 is definitely unsuited to passing large chunks of data from the memory controller to the ARM CPUs. As the thread count increases, VC6 gets overloaded: latency increases while throughput decreases. The fabric, namely AXI, is 128 bits wide at the core frequency (500MHz), giving 8 GB/s of bandwidth. It is therefore not fully loaded by 5 GB/s of memory traffic, and according to this, overclocking the fabric doesn't seem to help:

Image
For HDMI bandwidth: HDMI takes at most 1.5 GB/s at 4K60p, but many tests with similar results were done over SSH with HDMI not enabled. When 4K60p is enabled, the fabric overclocks itself to 550MHz, giving 8.8 GB/s of bandwidth, which compensates for the extra HDMI bandwidth needed. Otherwise HDMI takes at most 0.75 GB/s at 4K30p.
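The fabric figures above follow the same width × clock arithmetic, and the slowest link on a path caps what the CPU can see. This sketch assumes one 128-bit transfer per fabric clock. That is the post's own unofficial description of the AXI fabric, not a confirmed BCM2711 figure. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Peak bytes/s of a link that moves `width_bits` bits every clock. */
uint64_t link_peak_bytes(unsigned width_bits, uint64_t clock_hz)
{
    assert(width_bits % 8 == 0);
    return (uint64_t)(width_bits / 8) * clock_hz;
}

/* The end-to-end ceiling of a path is the minimum of its links' peaks. */
uint64_t path_ceiling(const uint64_t *link_peaks, size_t n)
{
    assert(n > 0);
    uint64_t min = link_peaks[0];
    for (size_t i = 1; i < n; i++)
        if (link_peaks[i] < min)
            min = link_peaks[i];
    return min;
}
```

link_peak_bytes(128, 500000000) is 8 GB/s and link_peak_bytes(128, 550000000) is 8.8 GB/s. Combined with the 12.8 GB/s DRAM figure, the path ceiling is 8 GB/s, so a measured 5 GB/s would be 62.5% of that ceiling. Contention from other bus masters (HDMI, 3D and so on) is not modelled.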

VC6 is largely a black box to us, and Broadcom hasn't disclosed more information about it. For now, all we can do is hope that Broadcom updates the VC6 firmware (namely the ThreadX RTOS) to improve memory efficiency, which would help relieve this issue. 4 GB/s is usable for most everyday applications that are not bandwidth-hungry, but it still falls far short of what some scientific computing workloads need.
Last edited by Gnyueh on Tue Jul 28, 2020 6:12 am, edited 1 time in total.

cleverca22
Posts: 1838
Joined: Sat Aug 18, 2012 2:33 pm

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 4:15 am

The Pi also has an AXI performance counter peripheral: viewtopic.php?f=29&t=274223
If you use that, you can probably see which AXI ports are using up bandwidth, how much, and whether the ARM is coming close to maxing out the link.

Gnyueh
Posts: 47
Joined: Sat Jun 27, 2020 8:15 am

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 4:24 am

pica200 wrote:
Mon Jul 27, 2020 6:48 pm
The L2 cache as it is was probably chosen because of cost. Don't forget this is a $35 computer. Yes, it would probably benefit from a slightly bigger and faster cache, but that would also increase die area quite a bit.

As for the DRAM, I'm not sure. Is the number you stated explicitly for a 32-bit bus? The practically reachable speed also depends on many other factors.
Image
There are two parts of L2 in the BCM2711: one inside the ARM cluster and the other inside the VC6. The VC6 one also appears in earlier RPi products and has higher latency than memory for the ARM CPUs.
I assume AIDA64 treats them as a whole. The first one has high bandwidth and low latency, while the second is no faster than memory, possibly because of the large latency of the VC6 and the AXI fabric.
So the results reflect the latency of the L2 inside the ARM A72 cluster and the bandwidth of the L2 inside VC6.

Gnyueh
Posts: 47
Joined: Sat Jun 27, 2020 8:15 am

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 4:49 am

dickon wrote:
Mon Jul 27, 2020 7:23 pm
The main reason for the L2 cache -- or L3 on some architectures -- is not to be faster than RAM, but to reduce latency and hence pipeline stalls.
In most cases an AIDA64 benchmark should look like this (an Intel i7-1065G7 with 128-bit, 3733 MT/s LPDDR4 memory):
Image
Image
From L1 to memory, bandwidth decreases, latency increases and capacity increases, which helps improve the cache hit rate. In BCM2711, however, part of the L2 sits inside VC6 and is shared with the GPU, so CPU data cached there will frequently be evicted by GPU data. I don't think it helps the CPU much, and that is why the L2 was disabled for the ARM CPU on previous generations of RPi.

Gnyueh
Posts: 47
Joined: Sat Jun 27, 2020 8:15 am

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 4:54 am

cleverca22 wrote:
Mon Jul 27, 2020 7:27 pm
Just a brief glance at the numbers: if the DRAM can move 32 bits on every clock at 3.2GHz, then you would need the ARM to run at 3.2GHz and do a 32-bit read on every clock,
or run at 1.6GHz and do a 64-bit read on every clock.

It feels like the ARM frequency is going to be the main bottleneck, and you would need a different peripheral that can stress the RAM harder.

My first guess is that maybe they need such fast RAM to support 4K HDMI, and the ARM can only use a small fraction of it due to the ARM clocks.
For the CPU the front-side bus is much wider. For the A72 it should be about 128 bits, so achieving that bandwidth is not difficult at a lower frequency. HDMI only takes 1.5 GB/s of memory bandwidth, roughly a tenth (about 12%) of the total theoretical bandwidth.

cleverca22
Posts: 1838
Joined: Sat Aug 18, 2012 2:33 pm

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 4:58 am

Gnyueh wrote:
Tue Jul 28, 2020 4:54 am
cleverca22 wrote:
Mon Jul 27, 2020 7:27 pm
Just a brief glance at the numbers: if the DRAM can move 32 bits on every clock at 3.2GHz, then you would need the ARM to run at 3.2GHz and do a 32-bit read on every clock,
or run at 1.6GHz and do a 64-bit read on every clock.

It feels like the ARM frequency is going to be the main bottleneck, and you would need a different peripheral that can stress the RAM harder.

My first guess is that maybe they need such fast RAM to support 4K HDMI, and the ARM can only use a small fraction of it due to the ARM clocks.
For the CPU the front-side bus is much wider. For the A72 it should be about 128 bits, so achieving that bandwidth is not difficult at a lower frequency. HDMI only takes 1.5 GB/s of memory bandwidth, roughly a tenth (about 12%) of the total theoretical bandwidth.
Yeah, with a 128-bit AXI bus and ARM port, you only need 800MHz to fully saturate the DRAM,
but you've also quoted that the AXI is running at 500MHz, so that will be part of the bottleneck.

And what opcode in the ARM lets you initiate a 128-bit-wide read on each clock? How is the RAM tester stressing it fully?

Gnyueh
Posts: 47
Joined: Sat Jun 27, 2020 8:15 am

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 5:02 am

ejolson wrote:
Mon Jul 27, 2020 7:43 pm
Gnyueh wrote:
Mon Jul 27, 2020 12:09 pm
Raspberry Pi 4 uses LPDDR4 memory at 3200 MT/s (often quoted as 3200MHz) on a 32-bit bus (it is a 3733 MT/s LPDDR4 part running at 3200 MT/s: https://www.micron.com/products/dram/lp ... dt-053-aat), so the theoretical bandwidth of the memory is about 12.8 GB/s. But after running a lot of tests I found that the measured memory bandwidth is only about 4 GB/s
An independent collection of memory bandwidth measurements for the family of Raspberry Pi computers along with some other single board computers may be found in the thread

viewtopic.php?t=271121

The results for the 4B reported there were about 5 GB/sec, depending on the exact nature of the kernel being performed. While slower than ARM Cortex designs that employ two memory chips, the speeds were comparable and consistent with expectations.

Have you computed the difference between theoretical and measured bandwidth for any x86 computers?

If anyone is listening, it would also be nice to see how the scale-up versions of IBM Power9 and System Z, which have presumably been optimised for memory bandwidth, compare with other systems.
Here is an AIDA64 memory benchmark for my laptop with 128-bit, 3733 MT/s LPDDR4 memory. The theoretical bandwidth is 3.733G transfers/s x 128 bits / 8 = 59.7 GB/s:
Image
About 80% of the theoretical bandwidth is achieved, while the RPi 4 reaches below 40%.
Last edited by Gnyueh on Tue Jul 28, 2020 5:10 am, edited 1 time in total.

Gnyueh
Posts: 47
Joined: Sat Jun 27, 2020 8:15 am

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 5:10 am

cleverca22 wrote:
Tue Jul 28, 2020 4:58 am
Gnyueh wrote:
Tue Jul 28, 2020 4:54 am
cleverca22 wrote:
Mon Jul 27, 2020 7:27 pm
Just a brief glance at the numbers: if the DRAM can move 32 bits on every clock at 3.2GHz, then you would need the ARM to run at 3.2GHz and do a 32-bit read on every clock,
or run at 1.6GHz and do a 64-bit read on every clock.

It feels like the ARM frequency is going to be the main bottleneck, and you would need a different peripheral that can stress the RAM harder.

My first guess is that maybe they need such fast RAM to support 4K HDMI, and the ARM can only use a small fraction of it due to the ARM clocks.
For the CPU the front-side bus is much wider. For the A72 it should be about 128 bits, so achieving that bandwidth is not difficult at a lower frequency. HDMI only takes 1.5 GB/s of memory bandwidth, roughly a tenth (about 12%) of the total theoretical bandwidth.
Yeah, with a 128-bit AXI bus and ARM port, you only need 800MHz to fully saturate the DRAM,
but you've also quoted that the AXI is running at 500MHz, so that will be part of the bottleneck.

And what opcode in the ARM lets you initiate a 128-bit-wide read on each clock? How is the RAM tester stressing it fully?
If AXI at 8 GB/s were the bottleneck, memory performance should be at least 6 to 7.5 GB/s, while the actual result is only 5 GB/s.
Also, with an AXI bottleneck the bandwidth would be unlikely to decrease as the thread count increases. In most cases bandwidth increases with more threads, just as on other single-board computers.
Image
There are many NEON SIMD instructions in the ARM ISA that load 128 bits of data per instruction, and that is how the FP bandwidth is tested.
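The 128-bit loads mentioned above can be illustrated with a minimal streaming-read kernel. It uses two NEON intrinsics:

* vld1q_u32: one 128-bit load of four 32-bit words.
* vpadalq_u32: pairwise add-accumulate into 64-bit lanes.

A scalar fallback lets it build on non-ARM hosts too. This is a sketch, not how AIDA64 or the forum benchmark is implemented. A serious test would unroll the loop, use several accumulators and threads, and time a buffer far larger than the 1 MB L2. The function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Sum n 32-bit words; on NEON targets each iteration issues one 128-bit load. */
uint64_t stream_sum_u32(const uint32_t *buf, size_t n)
{
    assert(n % 4 == 0);
#if defined(__ARM_NEON)
    uint64x2_t acc = vdupq_n_u64(0);
    for (size_t i = 0; i < n; i += 4) {
        uint32x4_t v = vld1q_u32(buf + i);  /* one 128-bit load */
        acc = vpadalq_u32(acc, v);          /* widen pairs, accumulate in 64-bit lanes */
    }
    return vgetq_lane_u64(acc, 0) + vgetq_lane_u64(acc, 1);
#else
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)          /* scalar fallback for non-NEON hosts */
        total += buf[i];
    return total;
#endif
}

/* Time one pass over the buffer and return GB/s; the sum goes to *sink. */
double read_pass_gbs(const uint32_t *buf, size_t n, uint64_t *sink)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    *sink = stream_sum_u32(buf, n);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (double)(t1.tv_sec - t0.tv_sec) + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return secs > 0.0 ? (double)(n * sizeof *buf) / secs / 1e9 : 0.0;
}
```

The sum is returned through *sink so the compiler cannot discard the loads.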

Gnyueh
Posts: 47
Joined: Sat Jun 27, 2020 8:15 am

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 5:12 am

cleverca22 wrote:
Tue Jul 28, 2020 4:15 am
The Pi also has an AXI performance counter peripheral: viewtopic.php?f=29&t=274223
If you use that, you can probably see which AXI ports are using up bandwidth, how much, and whether the ARM is coming close to maxing out the link.
Thanks a lot, I will check this.

Gnyueh
Posts: 47
Joined: Sat Jun 27, 2020 8:15 am

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 5:20 am

ejolson wrote:
Mon Jul 27, 2020 7:43 pm
Gnyueh wrote:
Mon Jul 27, 2020 12:09 pm
Raspberry Pi 4 uses LPDDR4 memory at 3200 MT/s (often quoted as 3200MHz) on a 32-bit bus (it is a 3733 MT/s LPDDR4 part running at 3200 MT/s: https://www.micron.com/products/dram/lp ... dt-053-aat), so the theoretical bandwidth of the memory is about 12.8 GB/s. But after running a lot of tests I found that the measured memory bandwidth is only about 4 GB/s
An independent collection of memory bandwidth measurements for the family of Raspberry Pi computers along with some other single board computers may be found in the thread

viewtopic.php?t=271121

The results for the 4B reported there were about 5 GB/sec, depending on the exact nature of the kernel being performed. While slower than ARM Cortex designs that employ two memory chips, the speeds were comparable and consistent with expectations.

Have you computed the difference between theoretical and measured bandwidth for any x86 computers?

If anyone is listening, it would also be nice to see how the scale-up versions of IBM Power9 and System Z, which have presumably been optimised for memory bandwidth, compare with other systems.
The Jetson uses 2 DDR4 memory chips, each with a 16-bit bus. The RPi 4 uses an LPDDR4 chip with a 32-bit bus which, according to the JEDEC standard, is split into 2 independent 16-bit channels, just like 2 DDR4 chips. Other mobile devices with 32-bit LPDDR4 memory in a single chip come much closer to the theoretical bandwidth without difficulty. I don't think this accounts for the bandwidth issue.

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 27394
Joined: Sat Jul 30, 2011 7:41 pm

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 7:59 am

I don't agree with your theory that the VC6 is causing problems with memory bandwidth, and it's certainly not the case that any sort of firmware change could make any appreciable difference.

I suspect it's just the way the memory controller works. Bus contention strikes me as a major factor in slowing things down. You have multiple memory controllers all wanting access to the memory bus at the same time, for example.

Gnyueh
Posts: 47
Joined: Sat Jun 27, 2020 8:15 am

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 8:12 am

jamesh wrote:
Tue Jul 28, 2020 7:59 am
I don't agree with your theory that the VC6 is causing problems with memory bandwidth, and it's certainly not the case that any sort of firmware change could make any appreciable difference.

I suspect it's just the way the memory controller works. Bus contention strikes me as a major factor in slowing things down. You have multiple memory controllers all wanting access to the memory bus at the same time, for example.
There is only one memory controller (MCU) in the BCM2711 VC6 core. When multiple requests are issued, a modern memory controller should be able to handle them in parallel and get more bandwidth, just as Ryzen and Jetson do. The RPi 4 performs far behind them, and I don't think that is how the engineers at Broadcom want it to perform. There must be some bottleneck.
Image
Image
Image
Multithreaded memory performance relies on the memory controller handling requests out of order. On this point, I find the following quite interesting:
The BCM2711 system uses an AMBA AXI-compatible interface structure. In order to keep the system complexity low and data throughput high, the BCM2711 AXI system does not always return read data in-order. The GPU has special logic to cope with data arriving out-of-order; however the ARM core does not contain such logic. Therefore some precautions must be taken when using the ARM to access peripherals.
The memory subsystem of the ARM cluster in BCM2711 seems to lack out-of-order memory capability, which would be troublesome for multithreaded memory performance.
Last edited by Gnyueh on Tue Jul 28, 2020 9:12 am, edited 2 times in total.

Gnyueh
Posts: 47
Joined: Sat Jun 27, 2020 8:15 am

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 8:45 am

jamesh wrote:
Tue Jul 28, 2020 7:59 am
I don't agree with your theory that the VC6 is causing problems with memory bandwidth, and it's certainly not the case that any sort of firmware change could make any appreciable difference.

I suspect it's just the way the memory controller works. Bus contention strikes me as a major factor in slowing things down. You have multiple memory controllers all wanting access to the memory bus at the same time, for example.
This is the memory hierarchy for BCM2835; for BCM2711 it should be similar. Inside this odd hierarchy, the VC MMU (circled) is likely to be the bottleneck: it can only handle a data stream of around 5 GB/s and lacks multithreading capability.

Image
Last edited by Gnyueh on Tue Jul 28, 2020 8:45 am, edited 1 time in total.

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 27394
Joined: Sat Jul 30, 2011 7:41 pm

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 8:45 am

Casting aspersions on engineering gets you banned from here. You DON'T know how this stuff works, so YOU don't know why the engineers did it the way they did, so on here you don't have the right to criticise in the way you have here. Only warning. As long as you keep it polite, this discussion is fine. It won't make any difference to the memory bandwidth, because that is built into the HW, but discussion is fine.


The 2711 memory management is very different to the 2835. The 3D unit has its own memory management unit, the ARMs have their own memory management unit, and the rest of the VC cores access memory directly. So if you have the desktop running, then the 3D unit is running, the HDMI is running, the ARM MMUs are running and almost certainly other stuff as well. All accessing memory. All contending.

So many things are in contention for memory just while sitting 'idle'. I do not know what scheme was chosen for sorting out that bus contention. I do know that the guy who designed some of the MMU for the 3D is a very experienced engineer in this area. It's certain that improvements could be made, and I know that the engineers are aware of the current performance.

Gnyueh
Posts: 47
Joined: Sat Jun 27, 2020 8:15 am

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 8:52 am

jamesh wrote:
Tue Jul 28, 2020 8:45 am
Casting aspersions on engineering gets you banned from here. You DON'T know how this stuff works, so YOU don't know why the engineers did it the way they did, so on here you don't have the right to criticise in the way you have here. Only warning. As long as you keep it polite, this discussion is fine. It won't make any difference to the memory bandwidth, because that is built into the HW, but discussion is fine.


The 3D unit has its own memory management unit, the ARMs have their own memory management unit, and the rest of the VC cores access memory directly. So if you have the desktop running, then the 3D unit is running, the HDMI is running, the ARM MMUs are running and almost certainly other stuff as well. All accessing memory. All contending.

So many things are in contention for memory just while sitting 'idle'. I do not know what scheme was chosen for sorting out that bus contention. I do know that the guy who designed some of the MMU for the 3D is a very experienced engineer in this area. It's certain that improvements could be made, and I know that the engineers are aware of the current performance.
My apologies, I was overwhelmed by frustration just now; I have edited the post. But Broadcom seems reluctant to deal with these issues or provide more information about them. Again, I don't think this level of memory performance is satisfying for LPDDR4 memory. Jetson uses a much stronger GPU and its performance scales just fine. It would be great if you could provide some more information on this, especially on the memory subsystem in BCM2711.
Thanks again for reminding me.

jdb
Raspberry Pi Engineer & Forum Moderator
Posts: 2460
Joined: Thu Jul 11, 2013 2:37 pm

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 9:18 am

Gnyueh wrote:
Tue Jul 28, 2020 8:52 am

My apologies, I was overwhelmed by frustration just now; I have edited the post. But Broadcom seems reluctant to deal with these issues or provide more information about them. Again, I don't think this level of memory performance is satisfying for LPDDR4 memory. Jetson uses a much stronger GPU and its performance scales just fine. It would be great if you could provide some more information on this, especially on the memory subsystem in BCM2711.
Thanks again for reminding me.
Broadcom don't support end customers. They sell chips to other businesses. As such, don't expect a random Broadcom employee to start answering your questions.

What makes you think the ARM CPUs *should* be able to maximise transfers up to the theoretical bandwidth of the memory bus?
Hint: no CPUs are capable of this.
Rockets are loud.
https://astro-pi.org
