Gnyueh
Posts: 53
Joined: Sat Jun 27, 2020 8:15 am

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 9:43 am

jdb wrote:
Tue Jul 28, 2020 9:18 am
Gnyueh wrote:
Tue Jul 28, 2020 8:52 am

My apologies, I was overwhelmed by frustration just now; I have edited the post. But Broadcom seems reluctant to deal with these issues or provide more information about them. Again, I don't think this level of memory performance is satisfying for LPDDR4. The Jetson has a much stronger GPU, and its performance scales just fine. It would be great if you could provide some more information on this, especially on the memory subsystem in the BCM2711.
Thanks again for reminding me.
Broadcom don't support end customers. They sell chips to other businesses. As such, don't expect a random Broadcom employee to start answering your questions.

What makes you think the ARM CPUs *should* be able to maximise transfers up to the theoretical bandwidth of the memory bus?
Hint: no CPUs are capable of this.
Welp, at least a maximum utilization of 40% is totally usable, but not that great. Many ARM chips can achieve up to 60% of their theoretical bandwidth (Snapdragon 835: 30 GB/s theoretical, 18 GB/s actual; https://www.anandtech.com/show/11201/qu ... -preview/2). x86 chips from AMD (Ryzen) and Intel can easily achieve well above 80% of theoretical bandwidth.

And the way performance scales down as the thread count increases on the RPi is surprising. I hope there will be hardware or firmware improvements for this if possible, especially in future products.
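As a rough sketch of the arithmetic behind these percentages: utilization is measured bandwidth divided by the theoretical peak, and the peak is the transfer rate times the bus width in bytes. The Snapdragon 835 figures are the ones quoted above. The Pi 4 peak assumes LPDDR4-3200 on a 32-bit bus, which is an assumption for illustration rather than a figure given in this thread.

```python
def peak_mb_per_s(mega_transfers_per_s, bus_width_bits):
    """Theoretical peak bandwidth in MB/s: transfers per second times bytes per transfer."""
    return mega_transfers_per_s * bus_width_bits // 8


def utilization(measured, theoretical):
    """Fraction of the theoretical peak actually achieved (same units for both)."""
    return measured / theoretical


# Snapdragon 835 figures quoted in the post: 18 GB/s measured out of 30 GB/s.
snapdragon = utilization(18, 30)

# Assumption: Pi 4 memory is LPDDR4-3200 on a 32-bit bus.
pi4_peak = peak_mb_per_s(3200, 32)

# What "at least 40%" of that assumed peak would mean in absolute terms (MB/s).
pi4_at_40_percent = 0.40 * pi4_peak

print(f"Snapdragon 835 utilization: {snapdragon:.0%}")
print(f"Pi 4 assumed peak: {pi4_peak} MB/s, 40% of that: {pi4_at_40_percent:.0f} MB/s")
```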
Better one hour of free life than forty years of slavery and prison.

PeterO
Posts: 6041
Joined: Sun Jul 22, 2012 4:14 pm

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 9:47 am

Gnyueh wrote:
Tue Jul 28, 2020 9:43 am
jdb wrote:
Tue Jul 28, 2020 9:18 am
Gnyueh wrote:
Tue Jul 28, 2020 8:52 am

My apologies, I was overwhelmed by frustration just now; I have edited the post. But Broadcom seems reluctant to deal with these issues or provide more information about them. Again, I don't think this level of memory performance is satisfying for LPDDR4. The Jetson has a much stronger GPU, and its performance scales just fine. It would be great if you could provide some more information on this, especially on the memory subsystem in the BCM2711.
Thanks again for reminding me.
Broadcom don't support end customers. They sell chips to other businesses. As such, don't expect a random Broadcom employee to start answering your questions.

What makes you think the ARM CPUs *should* be able to maximise transfers up to the theoretical bandwidth of the memory bus?
Hint: no CPUs are capable of this.
Welp, at least a maximum utilization of 40% is totally usable, but not that great. Many ARM chips can achieve up to 60% of their theoretical bandwidth (Snapdragon 835: 30 GB/s theoretical, 18 GB/s actual; https://www.anandtech.com/show/11201/qu ... -preview/2). x86 chips from AMD (Ryzen) and Intel can easily achieve well above 80% of theoretical bandwidth.

And the way performance scales down as the thread count increases on the RPi is surprising. I hope there will be hardware or firmware improvements for this if possible, especially in future products.
It's hard to see why you are using a Pi when you are clearly so dissatisfied with its performance :shock:

PeterO
Discoverer of the PI2 XENON DEATH FLASH!
Interests: C,Python,PIC,Electronics,Ham Radio (G0DZB),1960s British Computers.
"The primary requirement (as we've always seen in your examples) is that the code is readable. " Dougie Lawson

Gnyueh
Posts: 53
Joined: Sat Jun 27, 2020 8:15 am

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 10:00 am

PeterO wrote:
Tue Jul 28, 2020 9:47 am
Gnyueh wrote:
Tue Jul 28, 2020 9:43 am
jdb wrote:
Tue Jul 28, 2020 9:18 am


Broadcom don't support end customers. They sell chips to other businesses. As such, don't expect a random Broadcom employee to start answering your questions.

What makes you think the ARM CPUs *should* be able to maximise transfers up to the theoretical bandwidth of the memory bus?
Hint: no CPUs are capable of this.
Welp, at least a maximum utilization of 40% is totally usable, but not that great. Many ARM chips can achieve up to 60% of their theoretical bandwidth (Snapdragon 835: 30 GB/s theoretical, 18 GB/s actual; https://www.anandtech.com/show/11201/qu ... -preview/2). x86 chips from AMD (Ryzen) and Intel can easily achieve well above 80% of theoretical bandwidth.

And the way performance scales down as the thread count increases on the RPi is surprising. I hope there will be hardware or firmware improvements for this if possible, especially in future products.
It's hard to see why you are using a Pi when you are clearly so dissatisfied with its performance :shock:

PeterO
It's hard to imagine there is such an issue until you really get into it. I was totally satisfied with the RPi 4's performance in most respects, especially the four strong A72 cores.

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 27721
Joined: Sat Jul 30, 2011 7:41 pm

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 11:04 am

Gnyueh wrote:
Tue Jul 28, 2020 9:43 am
jdb wrote:
Tue Jul 28, 2020 9:18 am
Gnyueh wrote:
Tue Jul 28, 2020 8:52 am

My apologies, I was overwhelmed by frustration just now; I have edited the post. But Broadcom seems reluctant to deal with these issues or provide more information about them. Again, I don't think this level of memory performance is satisfying for LPDDR4. The Jetson has a much stronger GPU, and its performance scales just fine. It would be great if you could provide some more information on this, especially on the memory subsystem in the BCM2711.
Thanks again for reminding me.
Broadcom don't support end customers. They sell chips to other businesses. As such, don't expect a random Broadcom employee to start answering your questions.

What makes you think the ARM CPUs *should* be able to maximise transfers up to the theoretical bandwidth of the memory bus?
Hint: no CPUs are capable of this.
Welp, at least a maximum utilization of 40% is totally usable, but not that great. Many ARM chips can achieve up to 60% of their theoretical bandwidth (Snapdragon 835: 30 GB/s theoretical, 18 GB/s actual; https://www.anandtech.com/show/11201/qu ... -preview/2). x86 chips from AMD (Ryzen) and Intel can easily achieve well above 80% of theoretical bandwidth.

And the way performance scales down as the thread count increases on the RPi is surprising. I hope there will be hardware or firmware improvements for this if possible, especially in future products.
This isn't a firmware thing; it's just the way the controllers work. If it was firmware, we would have already changed things for better performance. Pretty sure I cannot say much more than that, but I would expect, if there is improvement to be made in the HW, that it would be implemented in later products. So that would be some time away, and not on Pi4, unless there was a respin of the 2711, which seems unlikely - it works well enough as it is to not want to spend multiple millions of $ on something like this.
Principal Software Engineer at Raspberry Pi (Trading) Ltd.
Contrary to popular belief, humorous signatures are allowed.
I've been saying "Mucho" to my Spanish friend a lot more lately. It means a lot to him.

Gnyueh
Posts: 53
Joined: Sat Jun 27, 2020 8:15 am

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 12:40 pm

jamesh wrote:
Tue Jul 28, 2020 11:04 am
Gnyueh wrote:
Tue Jul 28, 2020 9:43 am
jdb wrote:
Tue Jul 28, 2020 9:18 am


Broadcom don't support end customers. They sell chips to other businesses. As such, don't expect a random Broadcom employee to start answering your questions.

What makes you think the ARM CPUs *should* be able to maximise transfers up to the theoretical bandwidth of the memory bus?
Hint: no CPUs are capable of this.
Welp, at least a maximum utilization of 40% is totally usable, but not that great. Many ARM chips can achieve up to 60% of their theoretical bandwidth (Snapdragon 835: 30 GB/s theoretical, 18 GB/s actual; https://www.anandtech.com/show/11201/qu ... -preview/2). x86 chips from AMD (Ryzen) and Intel can easily achieve well above 80% of theoretical bandwidth.

And the way performance scales down as the thread count increases on the RPi is surprising. I hope there will be hardware or firmware improvements for this if possible, especially in future products.
This isn't a firmware thing; it's just the way the controllers work. If it was firmware, we would have already changed things for better performance. Pretty sure I cannot say much more than that, but I would expect, if there is improvement to be made in the HW, that it would be implemented in later products. So that would be some time away, and not on Pi4, unless there was a respin of the 2711, which seems unlikely - it works well enough as it is to not want to spend multiple millions of $ on something like this.
Welp, for now I will turn the 4k60p option on to squeeze out some performance, even though I am unlikely to plug in a display. Thanks a lot for your reply on this, and I am looking forward to future RPi products with improved bandwidth.

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 27721
Joined: Sat Jul 30, 2011 7:41 pm

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 1:55 pm

Gnyueh wrote:
Tue Jul 28, 2020 12:40 pm
jamesh wrote:
Tue Jul 28, 2020 11:04 am
Gnyueh wrote:
Tue Jul 28, 2020 9:43 am


Welp, at least a maximum utilization of 40% is totally usable, but not that great. Many ARM chips can achieve up to 60% of their theoretical bandwidth (Snapdragon 835: 30 GB/s theoretical, 18 GB/s actual; https://www.anandtech.com/show/11201/qu ... -preview/2). x86 chips from AMD (Ryzen) and Intel can easily achieve well above 80% of theoretical bandwidth.

And the way performance scales down as the thread count increases on the RPi is surprising. I hope there will be hardware or firmware improvements for this if possible, especially in future products.
This isn't a firmware thing; it's just the way the controllers work. If it was firmware, we would have already changed things for better performance. Pretty sure I cannot say much more than that, but I would expect, if there is improvement to be made in the HW, that it would be implemented in later products. So that would be some time away, and not on Pi4, unless there was a respin of the 2711, which seems unlikely - it works well enough as it is to not want to spend multiple millions of $ on something like this.
Welp, for now I will turn the 4k60p option on to squeeze out some performance, even though I am unlikely to plug in a display. Thanks a lot for your reply on this, and I am looking forward to future RPi products with improved bandwidth.
That will use a lot more power. Just one reason why we leave it off by default.

It's a long wait for the next gen Pi model, just so you know.

dickon
Posts: 1874
Joined: Sun Dec 09, 2012 3:54 pm
Location: Home, just outside Reading

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 2:16 pm

jamesh wrote:
Tue Jul 28, 2020 1:55 pm
It's a long wait for the next gen Pi model, just so you know.
I'm pretty sure the last time you said something like that the Pi 4 was released a week or two later...

So, er, August..? :-)
As it is apparently board policy to disallow any criticism of anything, as it appears to criticise something is to criticise all the users of that something, I will no longer be commenting in threads which are not directly relevant to my uses of the Pi.

Gnyueh
Posts: 53
Joined: Sat Jun 27, 2020 8:15 am

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 2:19 pm

jamesh wrote:
Tue Jul 28, 2020 1:55 pm
Gnyueh wrote:
Tue Jul 28, 2020 12:40 pm
jamesh wrote:
Tue Jul 28, 2020 11:04 am


This isn't a firmware thing; it's just the way the controllers work. If it was firmware, we would have already changed things for better performance. Pretty sure I cannot say much more than that, but I would expect, if there is improvement to be made in the HW, that it would be implemented in later products. So that would be some time away, and not on Pi4, unless there was a respin of the 2711, which seems unlikely - it works well enough as it is to not want to spend multiple millions of $ on something like this.
Welp, for now I will turn the 4k60p option on to squeeze out some performance, even though I am unlikely to plug in a display. Thanks a lot for your reply on this, and I am looking forward to future RPi products with improved bandwidth.
That will use a lot more power. Just one reason why we leave it off by default.

It's a long wait for the next gen Pi model, just so you know.
That is okay, I will wait, and maybe kill some time trying other single-board PCs for fun after I finish configuring this one. The 4k60p option overclocks the AXI bus, which is quite helpful: https://www.tomshardware.com/reviews/ra ... ,6188.html. I used to overclock the Infinity Fabric on my Ryzen CPUs, and for most everyday cases the extra power is worth it: https://www.guru3d.com/articles_pages/a ... mes,8.html.
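For anyone wanting to try the same thing, the option is set in /boot/config.txt on a Pi 4. A minimal fragment, assuming stock firmware; as jamesh says above, it costs extra power, which is one reason it is off by default:

```text
# /boot/config.txt (Raspberry Pi 4)
# Allow 60 Hz refresh at 4K. Per the Tom's Hardware article linked above,
# this also raises the core/AXI clock, at the cost of extra power and heat.
hdmi_enable_4kp60=1
```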

Gnyueh
Posts: 53
Joined: Sat Jun 27, 2020 8:15 am

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 2:19 pm

dickon wrote:
Tue Jul 28, 2020 2:16 pm
jamesh wrote:
Tue Jul 28, 2020 1:55 pm
It's a long wait for the next gen Pi model, just so you know.
I'm pretty sure the last time you said something like that the Pi 4 was released a week or two later...

So, er, August..? :-)
RPi 4+ maybe?

Heater
Posts: 17115
Joined: Tue Jul 17, 2012 3:02 pm

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 2:56 pm

Gnyueh wrote:
Tue Jul 28, 2020 9:43 am
Again, I don't think this level of memory performance is satisfying for LPDDR4. The Jetson has a much stronger GPU, and its performance scales just fine.
I can see why you are not satisfied: you have not done the tests.

My measurements show that for compute-intensive work on the CPUs, the Pi 4 is faster than the Jetson Nano, even for large data sets parallelized over multiple cores. For example, a convolution over 20 million elements: https://github.com/ZiCog/rust_convolution. All this talk of memory bandwidth means nothing when you actually want to do real work rather than measure one little detail of the machine.
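For readers unfamiliar with this kind of workload, here is a minimal sketch of a direct 1D convolution in Python. It is a hypothetical stand-in, not a port of the linked Rust repository, whose details may differ. Each output element reads only a window of the input, so the output range can be split into independent chunks, one per core; `split_ranges` is a hypothetical helper showing that split.

```python
def convolve_valid(signal, kernel):
    """Direct 1D convolution in 'valid' mode.

    Returns len(signal) - len(kernel) + 1 outputs. The kernel is flipped,
    which is what distinguishes convolution from cross-correlation.
    """
    n, k = len(signal), len(kernel)
    if k == 0 or k > n:
        raise ValueError("kernel must be non-empty and no longer than the signal")
    flipped = kernel[::-1]
    out = []
    for i in range(n - k + 1):
        acc = 0
        for j in range(k):
            acc += signal[i + j] * flipped[j]
        out.append(acc)
    return out


def split_ranges(n_out, parts):
    """Split output indices [0, n_out) into `parts` contiguous (start, stop) chunks.

    Hypothetical helper: each chunk could go to a separate core, because
    output i only reads signal[i : i + len(kernel)].
    """
    base, extra = divmod(n_out, parts)
    ranges, start = [], 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


print(convolve_valid([1, 2, 3, 4], [1, 1]))  # [3, 5, 7]
print(split_ranges(10, 4))                   # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

Per output, the working set is one kernel-length window of the input, so how the kernel size relates to the cache sizes affects how much of the time goes to memory traffic.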

They both scale badly compared to Intel x86.

Please do run that on your various machines and report your results.

Yes, the Jetson is a sweet machine with its GPU, but it's also three times the price. You pays your money and you takes your choice.

In some cases the 8GB Pi 4 may be preferred over the 4GB Jetson. Your money, your choice again.
Memory in C++ is a leaky abstraction.

Gnyueh
Posts: 53
Joined: Sat Jun 27, 2020 8:15 am

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 3:34 pm

Heater wrote:
Tue Jul 28, 2020 2:56 pm
Gnyueh wrote:
Tue Jul 28, 2020 9:43 am
Again, I don't think this level of memory performance is satisfying for LPDDR4. The Jetson has a much stronger GPU, and its performance scales just fine.
I can see why you are not satisfied: you have not done the tests.

My measurements show that for compute-intensive work on the CPUs, the Pi 4 is faster than the Jetson Nano, even for large data sets parallelized over multiple cores. For example, a convolution over 20 million elements: https://github.com/ZiCog/rust_convolution. All this talk of memory bandwidth means nothing when you actually want to do real work rather than measure one little detail of the machine.

They both scale badly compared to Intel x86.

Please do run that on your various machines and report your results.

Yes, the Jetson is a sweet machine with its GPU, but it's also three times the price. You pays your money and you takes your choice.

In some cases the 8GB Pi 4 may be preferred over the 4GB Jetson. Your money, your choice again.
Welp, for CPU performance, the Jetson uses A57 cores, the generation before the A72, so worse CPU performance on the Jetson is expected.
This discussion focuses on WHY THE RPi 4 DOESN'T PERFORM BETTER ON BANDWIDTH, not on a random workload, so please show that your workload is memory-intensive and bandwidth-hungry if you want to demonstrate that the RPi 4 performs better even on memory-intensive workloads.
The RPi 4 is a great single-board PC; the performance improvement over the previous generation is impressive, and I am using it as a tiny NAS.
The Jetson is also a good product, but I won't buy it because it is basically made for CUDA programming and computer vision study, so most of its high price lies in its GPU, which is useless to me.
I mentioned the Jetson only to contrast its better memory scaling with the poor scaling of the RPi 4.
The test seems pretty interesting, but I haven't installed Linux on my laptop. I will try it and post results for my Ryzen desktop and RPi 4 after I return to school.

ejolson
Posts: 6319
Joined: Tue Mar 18, 2014 11:47 am

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 3:51 pm

Heater wrote:
Tue Jul 28, 2020 2:56 pm
Gnyueh wrote:
Tue Jul 28, 2020 9:43 am
Again, I don't think this level of memory performance is satisfying for LPDDR4. The Jetson has a much stronger GPU, and its performance scales just fine.
All this talk of memory bandwidth means nothing when you actually want to do real work rather than measure one little detail of the machine.
I think there are a number of approaches to doing real work that are valid:
  • Hack something simple that works, then run performance analysers on it to see why it's slow.
  • Understand the performance characteristics of the hardware, then choose a suitable algorithm.
The stereotype that web designers use the first method and scientists the second is likely untrue. People do both, and you find both approaches in almost every application domain.

You'll likely see some interesting cache effects as the size of the convolution kernel is changed in the Rust code.

Gnyueh
Posts: 53
Joined: Sat Jun 27, 2020 8:15 am

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 4:01 pm

ejolson wrote:
Tue Jul 28, 2020 3:51 pm
Heater wrote:
Tue Jul 28, 2020 2:56 pm
Gnyueh wrote:
Tue Jul 28, 2020 9:43 am
Again, I don't think this level of memory performance is satisfying for LPDDR4. The Jetson has a much stronger GPU, and its performance scales just fine.
All this talk of memory bandwidth means nothing when you actually want to do real work rather than measure one little detail of the machine.
I think there are a number of approaches to doing real work that are valid:
  • Hack something simple that works, then run performance analysers on it to see why it's slow.
  • Understand the performance characteristics of the hardware, then choose a suitable algorithm.
The stereotype that web designers use the first method and scientists the second is likely untrue. People do both, and you find both approaches in almost every application domain.

You'll likely see some interesting cache effects as the size of the convolution kernel is changed in the Rust code.
That is true. But some workloads are simply hungry for bandwidth even when well tuned, such as depression and encryption workloads, which are useful for NAS and VPN.

Heater
Posts: 17115
Joined: Tue Jul 17, 2012 3:02 pm

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 4:36 pm

Gnyueh wrote:
Tue Jul 28, 2020 4:01 pm
That is true. But some workloads are simply hungry for bandwidth even when well tuned, such as depression and encryption workloads, which are useful for NAS and VPN.
OK. Now we are talking.

Of course one can bend a little micro-benchmark around to prove almost anything. It does not mean much.

Do you have an actual example of an actual workload where memory bandwidth is the limiting factor?

Can you demonstrate that such a workload is actually limited by memory bandwidth, rather than that being just a suspicion?

I ask because I strongly suspect that any network-intensive task will be limited by network bandwidth first.

Encryption and compression/decompression are CPU-limited in the absence of accelerators.

Then there is the limitation on storage bandwidth...
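The argument above can be put as simple arithmetic: a streaming pipeline such as a NAS serving files runs no faster than its slowest stage, so memory bandwidth only matters once it is the minimum. A minimal Python sketch; the Gigabit Ethernet line rate (1 Gbit/s, i.e. 125 MB/s before protocol overhead) is a real figure, while the storage, encryption and memory rates are hypothetical placeholders to be replaced with measured numbers.

```python
def bottleneck(stages):
    """Return (name, rate) of the slowest stage; rates in bytes per second."""
    name = min(stages, key=stages.get)
    return name, stages[name]


GIGABIT_ETHERNET = 1_000_000_000 // 8  # 125 MB/s line rate, before protocol overhead

# Hypothetical per-stage throughputs for a NAS serving files over the network.
nas = {
    "network": GIGABIT_ETHERNET,
    "storage": 300_000_000,      # hypothetical USB 3 disk
    "encryption": 400_000_000,   # hypothetical CPU cipher throughput
    "memory": 4_000_000_000,     # hypothetical sustained copy bandwidth
}

stage, rate = bottleneck(nas)
print(f"limited by {stage} at {rate / 1e6:.0f} MB/s")
```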

I'm sure it's premature to be sad about the Pi's memory bandwidth. Until proved otherwise.

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 27721
Joined: Sat Jul 30, 2011 7:41 pm

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 5:22 pm

Heater wrote:
Tue Jul 28, 2020 4:36 pm
I'm sure it's premature to be sad about the Pi's memory bandwidth. Until proved otherwise.
Indeed. Needing to run the RAM and controllers as fast as they will go for any length of time is a very unusual use case.

ejolson
Posts: 6319
Joined: Tue Mar 18, 2014 11:47 am

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 5:33 pm

Gnyueh wrote:
Tue Jul 28, 2020 4:01 pm
That is true. But some workloads are simply hungry for bandwidth even when well tuned, such as depression and encryption workloads, which are useful for NAS and VPN.
During these times of shelter at home, it seems optimized depression algorithms are very popular.

With encryption there are many different algorithms, for example Twofish versus Rijndael. Note that Twofish has greater memory requirements and naturally runs in constant time, while Rijndael is more taxing on the processor because of the extra effort needed to make it run in constant time. While one can complain about the politicised environment in which the apparent speed of Rijndael, when implemented in a way susceptible to timing attacks, was used as a point of comparison against Twofish, the Pi doesn't implement AES in hardware, so the better choice of cypher appears to be ChaCha20 and the other Latin-dance-themed algorithms.
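To illustrate why ChaCha20 suits a CPU without AES instructions: its core operation is the quarter round from RFC 8439, built only from 32-bit additions, XORs and fixed rotations, with no key- or data-dependent table lookups, which is what lets it run in constant time without special effort. A minimal Python sketch of just the quarter round, checked against the RFC's test vector; it is not a complete cipher, and Python itself makes no timing guarantees, so the point is only the operation mix.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit value left by n bits (0 < n < 32)."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def quarter_round(a, b, c, d):
    """ChaCha20 quarter round (RFC 8439, section 2.1): only add, XOR and rotate."""
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d


# Test vector from RFC 8439, section 2.1.1.
result = quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
print([hex(x) for x in result])
```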

For a secure NAS, I would recommend tunneling unencrypted NFSv4 through a WireGuard VPN. In an enterprise setup you may still want to use Kerberos user-level authentication, but with a Pi at home I'd turn all the ID-mapping stuff off by adding sec=sys to the export options.
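A minimal /etc/exports fragment along those lines, assuming a hypothetical share at /srv/nas and a hypothetical WireGuard tunnel subnet of 10.8.0.0/24. Exporting only to the tunnel subnet means that, provided the LAN uses a different subnet, only clients coming in over the encrypted WireGuard link can mount the share.

```text
# /etc/exports: hypothetical share path and WireGuard tunnel subnet
/srv/nas  10.8.0.0/24(rw,sync,no_subtree_check,sec=sys)
```

After editing the file, `exportfs -ra` re-exports everything.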
Last edited by ejolson on Wed Jul 29, 2020 5:10 am, edited 3 times in total.

cleverca22
Posts: 2388
Joined: Sat Aug 18, 2012 2:33 pm

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 5:35 pm

Gnyueh wrote:
Tue Jul 28, 2020 8:12 am
The BCM2711 system uses an AMBA AXI-compatible interface structure. In order to keep the system complexity low and data throughput high, the BCM2711 AXI system does not always return read data in-order. The GPU has special logic to cope with data arriving out-of-order; however the ARM core does not contain such logic. Therefore some precautions must be taken when using the ARM to access peripherals.
The memory subsystem of the ARM cluster in the BCM2711 seems to lack out-of-order (OoO) memory capability. This will be troublesome for multithreaded memory performance.
The out-of-order thing only happens if you are reading from 2 different AXI slaves; all reads from a single slave (such as DRAM) come back in the expected order, so it only complicates trying to talk to 2 peripherals at once.
Gnyueh wrote:
Tue Jul 28, 2020 8:45 am
The memory hierarchy for the BCM2835; for the BCM2711 it should be similar. Inside this odd hierarchy, the VC MMU (circled) is likely to be the bottleneck: it can only handle a data stream of around 5 GB/s and lacks multithread capability.
I have configured that MMU before, and it's very simple; it could be implemented at full bus speed with ease.

On the VC4:
Basically, the ARM memory is broken up into 64 x 16 MB chunks, totalling 1 GB.
A 6-bit address is needed to know which chunk you're referring to, and that's just a simple bit shift and selecting the right bits.
That 6-bit value is then used as an index into a lookup table (all registers, not held in RAM), which returns a new 8-bit number.
That 8-bit number from the lookup table is combined with the remaining 24 bits of the original address to make a 32-bit VPU address.

That's far less complex than a typical MMU, so it should be trivial to do in effectively zero clocks (just make the clocks a bit longer).
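The VC4 lookup described above can be sketched in a few lines of Python. The table contents and the function name are hypothetical; the point is that translation is one shift, one mask, one table index and one OR, which is why it can be done in effectively zero clocks in hardware.

```python
CHUNK_BITS = 24                      # each chunk is 16 MB = 2**24 bytes
NUM_CHUNKS = 64                      # 64 chunks x 16 MB = 1 GB of ARM address space
OFFSET_MASK = (1 << CHUNK_BITS) - 1  # low 24 bits: offset within a chunk


def translate(arm_addr, lut):
    """Map an ARM address in the 1 GB window to a 32-bit VPU address.

    `lut` stands in for the 64-entry register table; each entry is 8 bits.
    """
    if not 0 <= arm_addr < NUM_CHUNKS << CHUNK_BITS:
        raise ValueError("address outside the 1 GB ARM window")
    chunk = arm_addr >> CHUNK_BITS    # top 6 bits select the chunk
    offset = arm_addr & OFFSET_MASK   # low 24 bits pass straight through
    upper = lut[chunk] & 0xFF         # 8-bit replacement for the chunk number
    return (upper << CHUNK_BITS) | offset


# Hypothetical table contents: chunk i maps to 0xC0 + i.
example_lut = [(0xC0 + i) & 0xFF for i in range(NUM_CHUNKS)]
print(hex(translate(0x01234567, example_lut)))  # 0xc1234567
```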

VC6 changes things completely due to the need for 4 GB, and I've not had time to investigate how it works now, but I would assume the engineers have kept things as good/simple.

ejolson
Posts: 6319
Joined: Tue Mar 18, 2014 11:47 am

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 7:17 pm

I wrote a somewhat naive program to test cache around the time the Sun IPC was still available and the Ultra-1 was new and shiny. I thought I had results for a SPARCstation 5 and various PCs as well, but can't find them. Anyway, I later ran this program on the Pi 2B when it came out, and on other machines of similar vintage that seemed new and shiny. Those results were:

[Image: cache test results for the Pi 2B and other machines of similar vintage]

See also:

https://www.friendlyarm.com/Forum/viewt ... p=934#p934

At any rate, the block diagram showing how the cache and memory controller are laid out reminded me that I've been planning to run the same program on the Raspberry Pi 4B. I'll post results soon.
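The program itself isn't shown in the thread, so here is only a hypothetical sketch of the general technique: time repeated copies of a buffer, doubling its size at each step, and watch for throughput drops as the buffer outgrows each cache level. Python is a poor fit for the smallest sizes, where interpreter overhead per copy dominates, so a C version would give cleaner numbers; treat this as an outline of the method, not a replica of that benchmark.

```python
import time


def copy_throughput(size, total_bytes=1 << 25):
    """Bytes per second achieved by repeatedly copying a `size`-byte buffer.

    The repetition count is chosen so roughly `total_bytes` are copied in total.
    """
    src = bytearray(size)
    dst = bytearray(size)
    reps = max(1, total_bytes // size)
    start = time.perf_counter()
    for _ in range(reps):
        dst[:] = src  # same-length slice assignment: one bulk copy done in C
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
    return reps * size / elapsed


def sweep(min_size=1 << 10, max_size=1 << 24, total_bytes=1 << 25):
    """Measure throughput for buffer sizes doubling from min_size up to max_size."""
    results = []
    size = min_size
    while size <= max_size:
        results.append((size, copy_throughput(size, total_bytes)))
        size *= 2
    return results


if __name__ == "__main__":
    for size, rate in sweep():
        print(f"{size:>10} bytes: {rate / 1e9:6.2f} GB/s")
```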

HiassofT
Posts: 323
Joined: Fri Jun 30, 2017 10:07 pm
Location: Salzburg, Austria

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 10:43 pm

Heater wrote:
Tue Jul 28, 2020 4:36 pm
Do you have an actual example of an actual workload where memory bandwidth is the limiting factor?
HEVC hardware video decoding. At least, the current working theory is that 4kp60 decoding at 50-70 Mbit/s isn't working well because the HW decoders are being starved by memory bandwidth limitations.

so long,

Hias

cleverca22
Posts: 2388
Joined: Sat Aug 18, 2012 2:33 pm

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 10:46 pm

HiassofT wrote:
Tue Jul 28, 2020 10:43 pm
Heater wrote:
Tue Jul 28, 2020 4:36 pm
Do you have an actual example of an actual workload where memory bandwidth is the limiting factor?
HEVC hardware video decoding. At least, the current working theory is that 4kp60 decoding at 50-70 Mbit/s isn't working well because the HW decoders are being starved by memory bandwidth limitations.
But is that workload starving on reads or writes? I feel like the write bandwidth after decompression would be stressing the system a lot harder.

HiassofT
Posts: 323
Joined: Fri Jun 30, 2017 10:07 pm
Location: Salzburg, Austria

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 11:11 pm

cleverca22 wrote:
Tue Jul 28, 2020 10:46 pm
HiassofT wrote:
Tue Jul 28, 2020 10:43 pm
HEVC hardware video decoding. At least, the current working theory is that 4kp60 decoding at 50-70 Mbit/s isn't working well because the HW decoders are being starved by memory bandwidth limitations.
But is that workload starving on reads or writes? I feel like the write bandwidth after decompression would be stressing the system a lot harder.
I don't know any actual details, but it's most certainly reads: like all current video codecs, HEVC needs a lot of reference information from previous frames, motion vectors, etc.

so long,

Hias

dickon
Posts: 1874
Joined: Sun Dec 09, 2012 3:54 pm
Location: Home, just outside Reading

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 11:15 pm

HiassofT wrote:
Tue Jul 28, 2020 10:43 pm
Heater wrote:
Tue Jul 28, 2020 4:36 pm
Do you have an actual example of an actual workload where memory bandwidth is the limiting factor?
HEVC hardware video decoding. At least, the current working theory is that 4kp60 decoding at 50-70 Mbit/s isn't working well because the HW decoders are being starved by memory bandwidth limitations.
I've been wondering about this, since 50-70 Mb/s seems like noise, and it is. The main problem I see is that 4kp60@10b is ~23 Gb/s (~2 GB/s), so writing that and having the GPU read it out to the HDMI display looks like it will get close to the limit once the required arbitration is taken into account. Add in the usual DRAM latencies on row and column selection, and the theory looks reasonable. All back of the envelope, of course.

4kp60, no matter the encoding, looks too tight under a general-purpose OS. You'd probably manage it flawlessly with an RTOS: get the playback engine into the L2 cache, make the data uncacheable, and you're probably fine. Interesting.
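Because these figures depend heavily on chroma subsampling and on how 10-bit samples are laid out in memory, here is the back-of-the-envelope arithmetic as a small Python function so the assumptions can be varied. Which layout the Pi 4's HEVC decoder actually uses is not stated in the thread; the 16-bit container (as in the common P010 format) is one assumption among the cases printed.

```python
from fractions import Fraction

# Samples per pixel (luma plus two chroma planes) for common subsampling schemes.
SAMPLES_PER_PIXEL = {
    "4:4:4": Fraction(3),
    "4:2:2": Fraction(2),
    "4:2:0": Fraction(3, 2),
}


def raw_video_bytes_per_second(width, height, fps, chroma="4:2:0", container_bits=16):
    """Bytes per second for one pass (one write or one read) over decoded frames.

    container_bits is the space each sample takes in memory: 10-bit video is
    often stored in 16-bit words (as in P010), but packed layouts also exist.
    """
    bits = width * height * fps * SAMPLES_PER_PIXEL[chroma] * container_bits
    return bits / 8


for chroma, bits in [("4:2:0", 10), ("4:2:0", 16), ("4:4:4", 10), ("4:4:4", 16)]:
    rate = raw_video_bytes_per_second(3840, 2160, 60, chroma, bits)
    print(f"4kp60 {chroma} @ {bits}-bit storage: {float(rate) / 1e9:.2f} GB/s")
```

This counts a single pass over the frames. The decoder writes each frame, HDMI scanout reads it at least once, and HEVC decoding also reads reference frames, so actual DRAM traffic is some multiple of these figures.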

dickon
Posts: 1874
Joined: Sun Dec 09, 2012 3:54 pm
Location: Home, just outside Reading

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 11:30 pm

HiassofT wrote:
Tue Jul 28, 2020 11:11 pm
I don't know any actual details, but it's most certainly reads: like all current video codecs, HEVC needs a lot of reference information from previous frames, motion vectors, etc.
Your stream bandwidth issues are almost certainly down to the profile/level that they imply, rather than the actual bandwidth of the data stream. Generally, the higher the bandwidth, the more capable the playback machine is expected to be, so the more reference slices and macroblocks are considered fair game; it's all documented. There's nothing stopping you from using ludicrous bandwidths with more constrained combinations of the expensive stuff, should you want to.

Fun. I hadn't realised these machines were quite so close to the limits. It certainly explains the desktop p60 speeds I'm seeing.

(Don't get me wrong: I love this kit. I still have a hard time believing that the machine driving my right-hand monitor is now the size of a credit card, but it does take a while to drag windows around the screen.)

ejolson
Posts: 6319
Joined: Tue Mar 18, 2014 11:47 am

Re: Actual memory bandwidth of raspberry pi4?

Tue Jul 28, 2020 11:51 pm

ejolson wrote:
Tue Jul 28, 2020 7:17 pm
At any rate, the block diagram showing how the cache and memory controller are laid out reminded me that I've been planning to run the same program on the Raspberry Pi 4B. I'll post results soon.
Here is a graph showing cache memory performance for the Pi 4B.

[Image: cache memory performance graph for the Pi 4B, 3B and Zero]

Each consecutive dot represents a buffer size that has been increased by a factor of two. The graphs seem to imply the 3B has more level 1 cache than the 4B but that the 4B has more level 2 cache than the 3B. Is that actually true?
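One way to read such a graph more objectively is to flag each point where throughput drops sharply relative to the previous, half-size buffer. A cache of size C typically shows up as a knee at the first size above C, give or take associativity and whatever else is sharing the cache, which may bear on the kernel and systemd footprint question in this post. A minimal Python sketch; the 25% threshold is an arbitrary assumption, and the example data is synthetic, not taken from the graphs above.

```python
def find_knees(results, drop=0.25):
    """Return the sizes at which throughput falls by more than `drop`
    (as a fraction) relative to the previous, half-as-large buffer.

    `results` is a list of (size_in_bytes, throughput) pairs in size order.
    """
    knees = []
    for (_, prev_rate), (size, rate) in zip(results, results[1:]):
        if rate < prev_rate * (1 - drop):
            knees.append(size)
    return knees


# Synthetic (made-up) numbers for illustration only: sizes 16 KB .. 2 MB.
example = list(zip([1 << n for n in range(14, 22)],
                   [10.0, 10.0, 6.0, 6.0, 6.0, 6.0, 5.8, 2.0]))
print(find_knees(example))  # [65536, 2097152]
```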

Since the 4B and Zero graphs were made using gcc version 10.1 running the current version of Raspberry Pi OS, while the graphs for the 3B were made using gcc 5.x and a much older Raspbian, could it be that the cache footprint of the new kernel and systemd makes it appear the 4B has less level 1 cache?

Gnyueh
Posts: 53
Joined: Sat Jun 27, 2020 8:15 am

Re: Actual memory bandwidth of raspberry pi4?

Wed Jul 29, 2020 4:40 am

ejolson wrote:
Tue Jul 28, 2020 5:33 pm
Gnyueh wrote:
Tue Jul 28, 2020 4:01 pm
That is true. But some workloads are simply hungry for bandwidth even when well tuned, such as depression and encryption workloads, which are useful for NAS and VPN.
During these times of shelter at home, it seems optimized depression algorithms are very popular.

With encryption there are many different algorithms, for example Twofish versus Rijndael. Note that Twofish has greater memory requirements and naturally runs in constant time, while Rijndael is more taxing on the processor because of the extra effort needed to make it run in constant time. While one can complain about the politicised environment in which the apparent speed of Rijndael, when implemented in a way susceptible to timing attacks, was used as a point of comparison against Twofish, the Pi doesn't implement AES in hardware, so the better choice of cypher appears to be ChaCha20 and the other Latin-dance-themed cyphers.

For a secure NAS, I would recommend tunneling unencrypted NFSv4 through a WireGuard VPN. In an enterprise setup you may still want to use Kerberos user-level authentication, but with a Pi at home I'd turn all the ID-mapping stuff off by adding sec=sys to the export options.
The A72 supports ARM's crypto extension, part of ARMv8, and its AES performance in Geekbench 3 is about 1 GB/s per core at 2.3 GHz (https://www.anandtech.com/show/9878/the ... 8-review/3), so for an RPi 4 overclocked to 2 GHz, about 3.5 GB/s of AES throughput across its four cores is expected, which is enough to make memory the bottleneck.
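The scaling in this estimate can be written out explicitly. A minimal Python sketch that assumes throughput scales linearly with clock speed and core count, which is optimistic, and that the cores actually implement the ARMv8 AES instructions, which ejolson's post above disputes for the Pi. On Linux on ARM those instructions are reported as `aes` in the `Features` line of /proc/cpuinfo, so `has_aes`, a hypothetical helper, checks for that flag before the estimate is relied on.

```python
def scaled_throughput(per_core_gbs, ref_ghz, target_ghz, cores):
    """Scale a single-core throughput linearly by clock ratio and core count."""
    return per_core_gbs * (target_ghz / ref_ghz) * cores


def has_aes(cpuinfo_text):
    """True if any 'Features' line in /proc/cpuinfo text lists the 'aes' flag."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "Features" and "aes" in value.split():
            return True
    return False


# 1 GB/s per core at 2.3 GHz, scaled to four cores at 2.0 GHz.
estimate = scaled_throughput(1.0, 2.3, 2.0, 4)
print(f"estimated AES throughput: {estimate:.2f} GB/s")  # 3.48 GB/s

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("AES instructions reported:", has_aes(f.read()))
    except OSError:
        print("/proc/cpuinfo not available on this system")
```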
