Heater
Posts: 17438
Joined: Tue Jul 17, 2012 3:02 pm

Re: Memory Bandwidth of the Pi 4B

Tue Apr 21, 2020 3:20 am

ejolson,
The Copy test shows a 4B-like decrease in bandwidth as more cores are added...
What?!

There is nothing 4B-like (or 3B+ like) about those curves. At least not compared to your graphs here: viewtopic.php?f=63&t=271121#p1644489.

For a start, even the most rude and crude straight line fit through the Pi data points would show a big negative slope as the performance drops with cores. Conversely it would have a positive slope for the Ryzen 7.

Then, significantly, the Ryzen shows a useful gain in performance in just moving from 1 core to 2, whereas the Pi 4 immediately suffers a drop in performance of 20% or so.

Just standing way back and squinting at those graphs we would say the Ryzen is flat across the graph but the Pi 4 is clearly falling down toward the right.

My feeling just now is that the Ryzen graphs show what I would expect if even just one core can saturate memory bandwidth. You just can't go up much more. The Pi 4 result, on the other hand, is a disaster that should not happen.
Memory in C++ is a leaky abstraction .

ejolson
Posts: 6639
Joined: Tue Mar 18, 2014 11:47 am

Re: Memory Bandwidth of the Pi 4B

Tue Apr 21, 2020 5:10 am

Heater wrote:
Tue Apr 21, 2020 3:20 am
ejolson,
Just standing way back and squinting at those graphs we would say the Ryzen is flat across the graph but the Pi 4 is clearly falling down toward the right.
For the Copy kernel the Ryzen starts out with a bandwidth of 28604.6 MB/s with one thread, then with two threads (one on core 0 and the other on core 4) achieves 33868.2 MB/s (that's slightly faster than one thread on core 0 and the other on core 1) and finally with all 16 threads selected drops back to 28735.6 MB/s. That's a drop of 15% from the maximum to when all threads are running.

For comparison Copy on the 4B starts with the maximum of 5525.8 MB/s and ends with 3670.1 MB/s when all cores are selected. This is a 34% decrease. While that's approximately twice the decrease, the main thing that makes the Ryzen graph look flatter is the scale needed to plot Copy, Scale, Add and Triad in the same figure. Here is what the SMT graph for the Ryzen looks like with just Copy plotted.

Image

As the hydroxychloroquine clinical trials were somewhat disappointing, I went looking for another ARM-based single-board computer with a tuberculosis vaccine and settled on the NanoPC T3, which employs the Exynos 8-core Cortex-A53 S5P6818 system on a chip. The results were

Image

Although there is clear plateauing, it's not clear the decrease is enough in order to relax the shelter-at-home order just yet. Otherwise, I would go to the office and test that noisy 6-core AMD Phenom.

I suspect the reason the Pi 4B shows decreasing memory bandwidth when more cores are selected is related to the fact that it has only one memory chip. This engineering decision appears to balance cost, size and performance such that after 8 years the price and size remain unchanged while the computer is 28 times faster.

Seen another way, if the 34 percent worst-case loss of memory bandwidth were eliminated, that would result in a performance gain. However, in my opinion, it's not important whether adding another memory chip is a win from a price performance point of view, because the opportunity cost that results from making the Pi more expensive might then exclude the beginners and makers for whom it was intended.
Last edited by ejolson on Tue Apr 21, 2020 7:02 pm, edited 1 time in total.

Heater
Posts: 17438
Joined: Tue Jul 17, 2012 3:02 pm

Re: Memory Bandwidth of the Pi 4B

Tue Apr 21, 2020 5:37 am

ejolson wrote:
Tue Apr 21, 2020 5:10 am
For the Copy kernel the Ryzen starts out with a bandwidth of 28604.6 MB/s with one thread, then with two threads (one on core 0 and the other on core 4) achieves 33868.2 MB/s (that's slightly faster than one on core 0 and one on core 1) and finally with all 16 threads selected drops back to 28735.6 MB/s. That's a drop of 15% from the maximum when all threads are running.
Now I'm confused. I can't match any of the numbers you mention there with points on the graph presented.
ejolson wrote:
Tue Apr 21, 2020 5:10 am
For comparison Copy on the 4B starts with the maximum of 5525.8 MB/s and ends with 3670.1 MB/s when all cores are selected. This is a 34% decrease. While that's approximately twice the percentage decrease, the main thing that makes the Ryzen graph look flatter is the scale needed to plot Copy, Scale, Add and Triad in the same figure. Here is what the SMT graph for the Ryzen looks like with just Copy plotted.
The Pi starts at about 5500MB/s for one core and immediately drops to below 4000 for two cores in the copy test. That is getting on for a 30% drop in performance. This is the opposite of what happens with all the non-Pi machines presented here. Something is wrong with the Pi.
ejolson wrote:
Tue Apr 21, 2020 5:10 am
I suspect the reason the Pi 4B shows decreasing memory bandwidth when more cores are selected is related to the fact that it has only one memory chip.
An interesting observation.

I don't buy it though. Mostly because I'm pretty sure the Jetson Nano only has one memory chip. The Nano goes up in performance from 1 to 2 cores.
Memory in C++ is a leaky abstraction .

ejolson
Posts: 6639
Joined: Tue Mar 18, 2014 11:47 am

Re: Memory Bandwidth of the Pi 4B

Tue Apr 21, 2020 5:54 am

Heater wrote:
Tue Apr 21, 2020 5:37 am
Now I'm confused. I can't match any of the numbers you mention there with points on the graph presented.
You're right. Those numbers came from a separate set of runs and seem to be about 0.5 GB/sec faster in general. The system tested is in a non air-conditioned room and the ambient temperature may have changed as the sun set. I'll check whether a 15 percent decrease in copy bandwidth when all threads are selected is consistent with the original data used for the graphs tomorrow.
Last edited by ejolson on Tue Apr 21, 2020 6:54 am, edited 1 time in total.

ejolson
Posts: 6639
Joined: Tue Mar 18, 2014 11:47 am

Re: Memory Bandwidth of the Pi 4B

Tue Apr 21, 2020 6:20 am

Heater wrote:
Tue Apr 21, 2020 5:37 am
I don't buy it though. Mostly because I'm pretty sure the Jetson Nano only has one memory chip. The Nano goes up in performance from 1 to 2 cores.
My understanding is that the Jetson has two memory chips. While I don't have one, here is a picture that I found on the web without the heatsink:
jetson_nano.jpg
The memory chips appear to be Micron part number

MT53D512M32D2DS-046 WT:D

which at great expense are sold in reels of 2000 at Mouser

https://www.mouser.com/ProductDetail/Mi ... drJY8wM%3D

Heater
Posts: 17438
Joined: Tue Jul 17, 2012 3:02 pm

Re: Memory Bandwidth of the Pi 4B

Tue Apr 21, 2020 7:46 am

Dang, that is the image I could not find when I was looking for it!

I guess 2 memory chips helps spread the load. However I'm still not convinced. Why? Because the Jetson goes up in performance all the way to 4 cores. The Pi goes down pretty much every step of the way.

I still can't fathom it because cache memory should hide the 1 chip/2 chip issue. Also because if all but one thread is starved of capacity performance should flat line, not nose dive.

I naively think of it like letting a funnel drain water down a long thin tube. The tube restricts the rate of flow and hence time taken to empty the funnel. We would not expect that if we connect two funnels to the tube and put half as much water in each that the process would take longer.

It's as if adding cores on the Pi 4 throws a blockage down the pipe!
Memory in C++ is a leaky abstraction .

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 28358
Joined: Sat Jul 30, 2011 7:41 pm

Re: Memory Bandwidth of the Pi 4B

Tue Apr 21, 2020 12:00 pm

Heater wrote:
Tue Apr 21, 2020 7:46 am
Dang, that is the image I could not find when I was looking for it!

I guess 2 memory chips helps spread the load. However I'm still not convinced. Why? Because the Jetson goes up in performance all the way to 4 cores. The Pi goes down pretty much every step of the way.

I still can't fathom it because cache memory should hide the 1 chip/2 chip issue. Also because if all but one thread is starved of capacity performance should flat line, not nose dive.

I naively think of it like letting a funnel drain water down a long thin tube. The tube restricts the rate of flow and hence time taken to empty the funnel. We would not expect that if we connect two funnels to the tube and put half as much water in each that the process would take longer.

It's as if adding cores on the Pi 4 throws a blockage down the pipe!
We did various memory bandwidth tests prior to releasing the Pi4. We are not at all concerned. It's possible to game these sorts of tests, we've never bothered to do so, no idea if other SBC manufacturers do. These tests are also pathological, in that they do not usually happen in the real world - a bit like cpuburn for cpu's - an absolute worst case scenario.

I suggest reading Dom's comments above which are as much as you need to know.
Principal Software Engineer at Raspberry Pi (Trading) Ltd.
Contrary to popular belief, humorous signatures are allowed.
I've been saying "Mucho" to my Spanish friend a lot more lately. It means a lot to him.

Heater
Posts: 17438
Joined: Tue Jul 17, 2012 3:02 pm

Re: Memory Bandwidth of the Pi 4B

Tue Apr 21, 2020 1:48 pm

jamesh wrote:
Tue Apr 21, 2020 12:00 pm
We did various memory bandwidth tests prior to releasing the Pi4. We are not at all concerned.
I'm sure you did and I'm not about to suggest anyone be greatly concerned.
jamesh wrote:
Tue Apr 21, 2020 12:00 pm
It's possible to game these sorts of tests, we've never bothered to do so, no idea if other SBC manufacturers do.
That is well known. I don't see it here though. All systems tested have shown an increase in performance with cores, except the Pi. That suggests something is different about the Pi.
jamesh wrote:
Tue Apr 21, 2020 12:00 pm
These tests are also pathological, in that they do not usually happen in the real world - a bit like cpuburn for cpu's - an absolute worst case scenario.
That is true.

However it does throw some light on something that has puzzled me for about a year now. What I could not understand is why some codes we have written for the various code challenges that have been going on in these pages did not scale at all well with cores on the Pi whereas they did on the PC. I thought it might be down to the compiler or threading libraries. Now I see it is specific to the Pi.
jamesh wrote:
Tue Apr 21, 2020 12:00 pm
I suggest reading Dom's comments above which are as much as you need to know.
Has been read with interest. Does not clearly resolve the issue.
Memory in C++ is a leaky abstraction .

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 5710
Joined: Wed Aug 17, 2011 7:41 pm
Location: Cambridge

Re: Memory Bandwidth of the Pi 4B

Tue Apr 21, 2020 3:42 pm

Heater wrote:
Tue Apr 21, 2020 1:48 pm
That is well known. I don't see it here though. All systems tested have shown an increase in performance with cores, except the Pi. That suggests something is different about the Pi.
There is only finite sdram bandwidth. On Pi4 it is 8GB/s theoretical (500MHz AXI bus x 128-bits)*, likely limited further by sdram actually doing its stuff.
I think some of your results showed up to 5.5GB/s from a single core.

That is actually showing one core being very successful at saturating the bus.
Adding additional cores thrashing away at it is not going to make that number go up (it will go down, as there are more idle cycles from sdram when contention causes additional page opens/closes).

If other platforms are getting improvements when running a multicore test, it suggests that one core is less successful at utilising the sdram, possibly due to bottlenecks in different places.

Would it be preferable if we made the single core numbers go down, so there is more scope for improvements in the multicore test?

(*) Running with hdmi_enable_4kp60=1 has the side effect of running AXI bus at 550MHz so may make your results improve.
If they go up by 10%, then it would suggest this test's bottleneck is purely AXI. If they don't change, then it's likely sdram.
Quite likely it will be a mixture of the two.
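The arithmetic behind those figures can be sketched as follows. This is only a rough illustration: it assumes one full-width transfer per bus clock and GB meaning 10^9 bytes, and the function name is made up for the example.

```python
def axi_peak_bytes_per_second(clock_hz, bus_width_bits):
    """Peak rate if the bus moves one full-width beat every clock cycle (an assumption)."""
    return clock_hz * bus_width_bits // 8

default_peak = axi_peak_bytes_per_second(500_000_000, 128)   # default AXI clock
boosted_peak = axi_peak_bytes_per_second(550_000_000, 128)   # with hdmi_enable_4kp60=1

# Single-core figure quoted above, treated here as 5.5e9 bytes/s
single_core_measured = 5.5e9

print(f"500 MHz x 128 bit: {default_peak / 1e9:.1f} GB/s")
print(f"550 MHz x 128 bit: {boosted_peak / 1e9:.1f} GB/s")
print(f"extra headroom from the faster bus: {(boosted_peak / default_peak - 1) * 100:.0f}%")
print(f"single core as a fraction of the default peak: {single_core_measured / default_peak:.0%}")
```

If the measured numbers rise by roughly the same 10% as the bus clock, the bus is the likely limit; if they stay put, the sdram is.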

ejolson
Posts: 6639
Joined: Tue Mar 18, 2014 11:47 am

Re: Memory Bandwidth of the Pi 4B

Tue Apr 21, 2020 3:51 pm

jamesh wrote:
Tue Apr 21, 2020 12:00 pm
These tests are also pathological, in that they do not usually happen in the real world - a bit like cpuburn for cpu's - an absolute worst case scenario.
While stream is definitely a synthetic RAM benchmark, in my opinion copying memory around is not uncommon, and neither are sections of code that add and rescale vectors of numbers. The stream kernels were originally developed to represent the kinds of vector operations used in a real-world ocean-modelling program and to explain why the Cray supercomputers performed so well on such calculations. More information is available at

http://www.cs.virginia.edu/~mccalpin/ST ... -01-25.pdf
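For anyone who has not looked at the benchmark source, the four kernels amount to the loops below. The real benchmark is written in C and Fortran; this Python version is only a sketch of the arithmetic and is not suitable for measuring bandwidth. The byte counts assume 8-byte doubles.

```python
def copy(a, c):
    # Copy: c = a (one array read, one written)
    for i in range(len(a)):
        c[i] = a[i]

def scale(b, c, q):
    # Scale: b = q * c (one array read, one written)
    for i in range(len(c)):
        b[i] = q * c[i]

def add(a, b, c):
    # Add: c = a + b (two arrays read, one written)
    for i in range(len(a)):
        c[i] = a[i] + b[i]

def triad(a, b, c, q):
    # Triad: a = b + q * c (two arrays read, one written)
    for i in range(len(b)):
        a[i] = b[i] + q * c[i]

# Bytes counted per loop iteration, assuming 8-byte doubles
BYTES_PER_ELEMENT = {"Copy": 16, "Scale": 16, "Add": 24, "Triad": 24}

# Tiny run in the same order the benchmark uses
n, q = 4, 3.0
a, b, c = [1.0] * n, [2.0] * n, [0.0] * n
copy(a, c)
scale(b, c, q)
add(a, b, c)
triad(a, b, c, q)
print(a, b, c)
```

The reported bandwidth is the byte count for the kernel times the array length, divided by the time the loop took.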
jamesh wrote:
Tue Apr 21, 2020 12:00 pm
I suggest reading Dom's comments above which are as much as you need to know.
Those comments refer to a limit on the number of pages (rows) of RAM that can be open at a time, then say maybe 4 or 8. Would it be possible to get an exact number to use for better understanding the results of the testing?

Heater
Posts: 17438
Joined: Tue Jul 17, 2012 3:02 pm

Re: Memory Bandwidth of the Pi 4B

Tue Apr 21, 2020 3:58 pm

dom wrote:
Tue Apr 21, 2020 3:42 pm
If other platforms are getting improvements when running a multicore test, it suggests that one core is less successful at utilising the sdram, possibly due to bottlenecks in different places.
Perhaps you misunderstand what I'm saying. Or I did not express it clearly.

I could well understand that performance does not go up with cores because the memory bandwidth is saturated. In fact that is what I would expect to happen. Even just using two cores instead of one. The expectation is that performance reaches a limit dependent on memory bandwidth and that adding cores after that does not help. So far so good.

But, in the PI case performance actually drops substantially just going from one core to two. This is not expected.

Nothing I have read here explains why that may be.

But whatever it is does seem to be the reason I have codes that scale well elsewhere but not on the Pi.
Memory in C++ is a leaky abstraction .

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 28358
Joined: Sat Jul 30, 2011 7:41 pm

Re: Memory Bandwidth of the Pi 4B

Tue Apr 21, 2020 4:58 pm

Heater wrote:
Tue Apr 21, 2020 3:58 pm
dom wrote:
Tue Apr 21, 2020 3:42 pm
If other platforms are getting improvements when running a multicore test, it suggests that one core is less successful at utilising the sdram, possibly due to bottlenecks in different places.
Perhaps you misunderstand what I'm saying. Or I did not express it clearly.

I could well understand that performance does not go up with cores because the memory bandwidth is saturated. In fact that is what I would expect to happen. Even just using two cores instead of one. The expectation is that performance reaches a limit dependent on memory bandwidth and that adding cores after that does not help. So far so good.

But, in the PI case performance actually drops substantially just going from one core to two. This is not expected.
Yes it is - Dom's explanation says exactly why it is expected.
Heater wrote:
Tue Apr 21, 2020 3:58 pm
Nothing I have read here explains why that may be.

But whatever it is does seem to be the reason I have codes that scale well elsewhere but not on the Pi.
Principal Software Engineer at Raspberry Pi (Trading) Ltd.
Contrary to popular belief, humorous signatures are allowed.
I've been saying "Mucho" to my Spanish friend a lot more lately. It means a lot to him.

ejolson
Posts: 6639
Joined: Tue Mar 18, 2014 11:47 am

Re: Memory Bandwidth of the Pi 4B

Tue Apr 21, 2020 5:10 pm

ejolson wrote:
Tue Apr 21, 2020 5:54 am
Heater wrote:
Tue Apr 21, 2020 5:37 am
Now I'm confused. I can't match any of the numbers you mention there with points on the graph presented.
You're right. Those numbers came from a separate set of runs and seem to be about 0.5 GB/sec faster in general. The system tested is in a non air-conditioned room and the ambient temperature may have changed as the sun set. I'll check whether a 15 percent decrease in copy bandwidth when all threads are selected is consistent with the original data used for the graphs tomorrow.
Note that the scale on the original plot

Image

is given in GB/sec. The conversion to GB/sec was obtained by dividing the MB/sec bandwidth results by 1024.

Here are the exact numbers from the graphs: Maximum observed copy bandwidth of the Ryzen system was 32.5 GB/sec using two threads (one on core 0 the other on core 1) while copy using all 16 hardware threads resulted in 28.1 GB/sec. The calculation

(32.5-28.1)/32.5*100

shows that bandwidth dropped 13.5 percent from the maximum to when all threads on all cores were selected. While the exact numbers appear to change depending on the time of the day, the observed percentage decrease seems consistent to me. In particular, the difference between 13.5 and 15 percent is plausibly due to the fact that running one thread on core 0 and the other thread on core 4 results in even more bandwidth for a greater estimate of the maximum.
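For reference, the same calculation applied to each set of figures quoted in this thread can be written as a short script. The helper names are made up for the example; the inputs are the numbers already posted above.

```python
def percent_drop(peak, final):
    """Percentage decrease from the peak bandwidth to the final bandwidth."""
    return (peak - final) / peak * 100

def mb_to_gb(mb_per_sec):
    """Conversion used for the plots: MB/sec divided by 1024."""
    return mb_per_sec / 1024

graph_drop = percent_drop(32.5, 28.1)          # Ryzen, GB/sec read from the graph
rerun_drop = percent_drop(33868.2, 28735.6)    # Ryzen, MB/sec from the separate runs
pi4_drop = percent_drop(5525.8, 3670.1)        # Pi 4B, MB/sec

print(f"Ryzen (graph data): {graph_drop:.1f}%")
print(f"Ryzen (later runs): {rerun_drop:.1f}%")
print(f"Pi 4B:              {pi4_drop:.1f}%")
print(f"Later-run Ryzen peak in GB/sec: {mb_to_gb(33868.2):.1f}")
```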

I've been thinking about the Jetson Nano results compared to the Pi 4B. As well as having two memory chips, it is further likely that memory bandwidth on the Jetson Nano was given high priority for the original design in order to support CUDA on the GPU.

Have you tried the suggested HDMI trick on the 4B to see if it makes a difference? I'm about to try that now.
Last edited by ejolson on Tue Apr 21, 2020 11:21 pm, edited 1 time in total.

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 5710
Joined: Wed Aug 17, 2011 7:41 pm
Location: Cambridge

Re: Memory Bandwidth of the Pi 4B

Tue Apr 21, 2020 6:19 pm

Heater wrote:
Tue Apr 21, 2020 3:58 pm
But, in the PI case performance actually drops substantially just going from one core to two. This is not expected.

Nothing I have read here explains why that may be.
I've explained this.
Imagine a system where reading a word from an open page (4K block) of sdram takes 1 cycle.
Opening a page takes 100 cycles.
You can only read from an open page.
Let's say you can only have one page open at once (the same results would hold if you could have a few pages open but each core has a few buffers to access).

Imagine one core reading linearly through memory. In general each word read takes 1 cycle, except every 4k it takes an additional 100 cycles.
Now imagine two cores trying this using different addresses. You now could get the 100 cycle overhead every access as the two cores fight over opening pages.
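That imaginary system is easy to simulate. The sketch below is a toy model only, not a description of the real memory controller: the 4-byte word size, the second core's start address and the strict alternation between the two cores are all assumptions chosen to show the worst case.

```python
PAGE_BYTES = 4096    # the "4K block" from the description
WORD_BYTES = 4       # assumed word size; not specified above
READ_CYCLES = 1      # reading a word from an open page
OPEN_CYCLES = 100    # opening a page

def cycles_for(addresses):
    """Count cycles for a sequence of byte addresses when only one page can be open."""
    open_page = None
    cycles = 0
    for address in addresses:
        page = address // PAGE_BYTES
        if page != open_page:
            cycles += OPEN_CYCLES   # pay to open the page we need
            open_page = page
        cycles += READ_CYCLES
    return cycles

def linear(start, words):
    """Byte addresses of one core reading `words` consecutive words from `start`."""
    return [start + i * WORD_BYTES for i in range(words)]

def interleave(first, second):
    """Two cores taking strict turns on the memory (worst-case assumption)."""
    return [address for pair in zip(first, second) for address in pair]

words = 4096  # words read by each core
one_core = cycles_for(linear(0, words))
two_cores = cycles_for(interleave(linear(0, words), linear(1 << 20, words)))

print(f"one core:  {one_core / words:.2f} cycles per word")
print(f"two cores: {two_cores / (2 * words):.2f} cycles per word")
```

Real hardware can keep several pages open and reorder requests, so the actual penalty would sit somewhere between these two extremes.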

ejolson
Posts: 6639
Joined: Tue Mar 18, 2014 11:47 am

Re: Memory Bandwidth of the Pi 4B

Tue Apr 21, 2020 6:22 pm

dom wrote:
Tue Apr 21, 2020 3:42 pm
Running with hdmi_enable_4kp60=1 has the side effect of running AXI bus at 550MHz so may make your results improve.
If they go up by 10%, then it would suggest this test's bottleneck is purely AXI. If they don't change, then it's likely sdram.
Quite likely it will be a mixture of the two.
I reran stream both with and without hdmi_enable_4kp60=1 from fresh reboots. The best result for each kernel of eight runs is reported in each case.

Image
Last edited by ejolson on Tue Apr 21, 2020 7:54 pm, edited 3 times in total.

ejolson
Posts: 6639
Joined: Tue Mar 18, 2014 11:47 am

Re: Memory Bandwidth of the Pi 4B

Tue Apr 21, 2020 7:15 pm

dom wrote:
Tue Apr 21, 2020 6:19 pm
Heater wrote:
Tue Apr 21, 2020 3:58 pm
But, in the PI case performance actually drops substantially just going from one core to two. This is not expected.

Nothing I have read here explains why that may be.
I've explained this.
Imagine a system where reading a word from an open page (4K block) of sdram takes 1 cycle.
Opening a page takes 100 cycles.
You can only read from an open page.
Let's say you can only have one page open at once (the same results would hold if you could have a few pages open but each core has a few buffers to access).

Imagine one core reading linearly through memory. In general each word read takes 1 cycle, except every 4k it takes an additional 100 cycles.
Now imagine two cores trying this using different addresses. You now could get the 100 cycle overhead every access as the two cores fight over opening pages.
Is it possible some sort of row-hammer mitigation would also make the overhead of switching rows greater? Do you know if mitigations exist in the Linux kernel and whether they are enabled on the Pi 4B? What about in the hardware?

I think the row-hammer problem was first described in

https://users.ece.cmu.edu/~yoonguk/pape ... isca14.pdf

Specifically, if one thread is trying to access memory for row X while another is trying to access row Y, then contention would result in the first thread repeatedly opening row X while the second thread repeatedly closes it in order to access row Y. Any mitigation to prevent a row from being hammered would notice row X was being opened and closed too quickly and further slow things down.

If something similar could be done to mitigate an overactive immune system, we might be on to a cure.

Thinkcat
Posts: 45
Joined: Wed Mar 14, 2018 10:50 pm
Location: Finland

Re: Memory Bandwidth of the Pi 4B

Thu Apr 23, 2020 2:17 pm

I think it's very simply that in those Ryzen systems, the limiting factor is a resource that is shared among the cores, with the sharing causing little or no side effects. In the Pi 4B there is some shared resource where the sharing does cause a side effect, though of limited magnitude, it seems. I'd guess that if there were an 8-core Pi 4B, many results would plateau after the third or the fourth core.

Now the resource is either the page switching delay on the memory's end, or it is somewhere in the SoC. Do these tests run in kernel or user mode? If it's not the memory, could some kind of translation table be causing it?

Thinkcat
Posts: 45
Joined: Wed Mar 14, 2018 10:50 pm
Location: Finland

Re: Memory Bandwidth of the Pi 4B

Thu Apr 23, 2020 2:19 pm

ejolson wrote:
Tue Apr 21, 2020 6:22 pm
The best result for each kernel of eight runs is reported in each case.
Would averages look different at all?

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 5710
Joined: Wed Aug 17, 2011 7:41 pm
Location: Cambridge

Re: Memory Bandwidth of the Pi 4B

Thu Apr 23, 2020 3:27 pm

ejolson wrote:
Tue Apr 21, 2020 7:15 pm
Is it possible some sort of row-hammer mitigation would also make the overhead of switching rows greater? Do you know if mitigations exist in the Linux kernel and whether they are enabled on the Pi 4B? What about in the hardware?
There is no known row-hammer issue on any Pi devices, and no mitigation.
I don't believe a software mitigation would be possible in the kernel without an unusable performance overhead.

ejolson
Posts: 6639
Joined: Tue Mar 18, 2014 11:47 am

Re: Memory Bandwidth of the Pi 4B

Thu Apr 23, 2020 5:09 pm

Thinkcat wrote:
Thu Apr 23, 2020 2:19 pm
ejolson wrote:
Tue Apr 21, 2020 6:22 pm
The best result for each kernel of eight runs is reported in each case.
Would averages look different at all?
Averages of separate runs do not look much different, because within a single run of the program there are also many tests performed among which only the best is reported. There is, however, the idea of unlucky memory placement, for which it might be interesting to perform a more detailed analysis of the results.

On NUMA systems one could measure how consistently the operating system uses the first-touch hinting present in the stream benchmark to allocate physical memory on the correct channels. While the Pi does not have any NUMA zones, there appears to be a limit on open rows of RAM, and one could instead ask how consistently the operating system allocates memory among the banks in a way that performs well.

In either case an average would be a first step towards finding a distribution that reflects the combined statistical effects of the memory allocator, time scheduler and other activities within the operating system on the stream kernels. In my opinion this is an interesting thing to do, but so far it has not been the focus.
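As a starting point, a summary along those lines might look like the sketch below. The function name is made up, and the numbers in the example are placeholders in arbitrary units, not measurements from any of the machines discussed here.

```python
from statistics import mean, pstdev

def summarize(bandwidths):
    """Return the best run alongside the mean and spread of all runs."""
    return {
        "best": max(bandwidths),
        "mean": mean(bandwidths),
        "spread": pstdev(bandwidths),
    }

# Placeholder values for illustration only, NOT results from this thread
example_runs = [100.0, 98.0, 99.0, 97.0, 100.0, 96.0, 99.0, 95.0]
summary = summarize(example_runs)

print(f"best {summary['best']:.1f}, mean {summary['mean']:.1f}, spread {summary['spread']:.2f}")
```

A large gap between best and mean, or a wide spread, would be a hint that memory placement varies from run to run.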

ejolson
Posts: 6639
Joined: Tue Mar 18, 2014 11:47 am

Re: Memory Bandwidth of the Pi 4B

Sat May 30, 2020 7:00 am

As mentioned in

viewtopic.php?f=63&t=275419&start=25#p1669473

the ODROID-C4 was released at about the same time as the 8GB Pi 4B. Odroid has set up a remotely accessible C4 for testing and I tried it.

Image

As seems typical with single-board computers that have a two-chip memory solution, the bandwidth is much higher and scales better with number of cores. I think it is amusing that at about the same time the Raspberry Pi engineers were doubling the amount of memory someone else was doubling memory bandwidth.

I also ran the pie chart benchmark.

viewtopic.php?f=63&t=227177&p=1669793#p1669793

Those results suggest more memory bandwidth can make up for a slower processor in two of four computations.
Last edited by ejolson on Sat May 30, 2020 2:56 pm, edited 1 time in total.

bensimmo
Posts: 5150
Joined: Sun Dec 28, 2014 3:02 pm
Location: East Yorkshire

Re: Memory Bandwidth of the Pi 4B

Sat May 30, 2020 11:06 am

On a dual channel setup you would expect performance to increase with the 2nd core, since each core can access a channel independently, iirc. They are normally unganged, so it opens up greater bandwidth...

Ryzen, Threadripper and other high end setups can use Quad Channel for similar benefits.

Sticking another RAM chip on a Pi4 to create a dual channel setup, compared to just replacing the existing one (ignoring the relatively small tweaks in comparison), has obvious limitations.
The advantage others have had is probably time and the ability to lay the board out how they like.
They do not have to keep to the same basic B+ board principle to keep compatibility with the now quite large add-on, HAT, DSI and camera market.

How hard it would be to stick a chip on the back again and wire it up, and whether the SoC supports dual channel (I've not looked), all for the benefit of 'a small percentage', I don't know.
:-)

ejolson
Posts: 6639
Joined: Tue Mar 18, 2014 11:47 am

Re: Memory Bandwidth of the Pi 4B

Sun May 31, 2020 7:25 am

As no one has run stream on the Pi 4B with 8GB RAM, I tried the ODROID-N2 which is a big.LITTLE architecture with two Cortex-A53 cores and four Cortex-A73 cores. It was difficult to interpret the results of mixing the two types of cores together, so I opted to test the big cores separately and then the little cores.

Image

Aside from all the arrows, I find it striking how much more memory bandwidth is available to the A73 cores compared to the A53 cores. Note that adding an A53 core results in an increase in aggregate memory bandwidth while adding an A73 core results in a decrease.

ejolson
Posts: 6639
Joined: Tue Mar 18, 2014 11:47 am

Re: Memory Bandwidth of the Pi 4B

Mon Jun 01, 2020 6:48 am

There was an older ODROID-XU4 available for comparison with the Pi 4B as well. This is a 32-bit big.LITTLE SoC with four ARM Cortex-A15 cores paired with four Cortex-A7 cores. Here is another memory bandwidth plot with too many arrows.

Image

As with the N2, the memory bandwidth of the little cores is noticeably less than the big cores. This time the bandwidth curves for the big cores intersect the Pi 4B curves. I would expect the little cores to exhibit good scaling with many parallel algorithms, because of how bandwidth increases as little cores are added.
Last edited by ejolson on Tue Jun 02, 2020 2:59 am, edited 1 time in total.

jahboater
Posts: 6715
Joined: Wed Feb 04, 2015 6:38 pm
Location: Wonderful West Dorset

Re: Memory Bandwidth of the Pi 4B

Mon Jun 01, 2020 10:35 am

ejolson wrote:
Mon Jun 01, 2020 6:48 am
I find it striking how much more memory bandwidth is available to the A73 cores compared to the A53 cores
ejolson wrote:
Mon Jun 01, 2020 6:48 am
As with the N2, the memory bandwidth of the little cores is noticeably less than the big cores.
I guess the high end cores have more sophisticated and power hungry buffering mechanisms.
Also they may not care about alignment and may even have no penalty for mis-aligned data (like x86 CPU's).
Pi4 8GB (Raspberry Pi OS 64-bit), Pi4 4GB, Pi4 2GB, Pi1 Rev 1 256MB, Pi Zero
