lurk101
Posts: 526
Joined: Mon Jan 27, 2020 2:35 pm
Location: Cumming, GA (US)

Re: Pi 4 and the Million Digit Fibonacci Challenge (Oxidized)

Fri Apr 09, 2021 6:00 pm

You could probably carry it around in a wheelbarrow.

I'm an architecture person and was mostly curious about any ARM progress with threads.
Growing old is getting old.

Heater
Posts: 17999
Joined: Tue Jul 17, 2012 3:02 pm

Re: Pi 4 and the Million Digit Fibonacci Challenge (Oxidized)

Fri Apr 09, 2021 7:03 pm

lurk101 wrote:
Fri Apr 09, 2021 6:00 pm
You could probably carry it around in a wheelbarrow.
OK. As long as I don't need a 3-phase supply to power it up, we are good to go.
Memory in C++ is a leaky abstraction.

ejolson
Posts: 7069
Joined: Tue Mar 18, 2014 11:47 am

Re: Pi 4 and the Million Digit Fibonacci Challenge (Oxidized)

Fri Apr 09, 2021 7:06 pm

Heater wrote:
Fri Apr 09, 2021 5:55 pm
What exactly does your The Dripper machine look like?

Can I carry it around without noticing the weight? Does it run all day on a charge?

I'm all for a monster PC Dripper box with a decent graphics card. But that is a different use case.
Right now the 8-core AMD Ryzen 4700G (desktop), 4750G (low power) and 4700U (mobile) APUs would seem to compete well with the Apple ARM processors. It would be interesting to see a comparison between a ThinkBook and a MacBook in terms of performance, portability and price. I suspect a suitably chosen Ryzen-based ThinkBook would win against the ARM-based MacBook in all categories, with the added advantage of Linux compatibility with the ROCm software stack and a fully open GPU driver.

Back to the Pi, my opinion is that having a well-supported GPU on 64-bit is currently the biggest problem for users on this forum, not CPU performance. For example, a similarly priced and sized quad- or even dual-core SBC based on an AMD APU would be very attractive, mostly due to the built-in Linux support for Radeon GPUs that just works.
Last edited by ejolson on Sat Apr 10, 2021 4:51 am, edited 1 time in total.

lurk101
Posts: 526
Joined: Mon Jan 27, 2020 2:35 pm
Location: Cumming, GA (US)

Re: Pi 4 and the Million Digit Fibonacci Challenge (Oxidized)

Fri Apr 09, 2021 7:24 pm

ejolson wrote:
Fri Apr 09, 2021 7:06 pm
Back to the Pi, my opinion is that having a well-supported GPU on 64-bit is currently the biggest problem.
Those are precisely the reasons I can't always use Pis. Those and the lack of external PCIe.
Growing old is getting old.

davidcoton
Posts: 6118
Joined: Mon Sep 01, 2014 2:37 pm
Location: Cambridge, UK

Re: Pi 4 and the Million Digit Fibonacci Challenge (Oxidized)

Fri Apr 09, 2021 9:18 pm

lurk101 wrote:
Fri Apr 09, 2021 7:24 pm
ejolson wrote:
Fri Apr 09, 2021 7:06 pm
Back to the Pi, my opinion is that having a well-supported GPU on 64-bit is currently the biggest problem.
Those are precisely the reasons I can't always use Pis. Those and the lack of external PCIe.
Have you checked out the CM4 and its IO board? PCIe (at least at some level).
Not sure that anyone has got an external GPU to do anything useful on a Pi, yet.
Location: 345th cell on the right of the 210th row of L2 cache

lurk101
Posts: 526
Joined: Mon Jan 27, 2020 2:35 pm
Location: Cumming, GA (US)

Re: Pi 4 and the Million Digit Fibonacci Challenge (Oxidized)

Fri Apr 09, 2021 11:03 pm

davidcoton wrote:
Fri Apr 09, 2021 9:18 pm
lurk101 wrote:
Fri Apr 09, 2021 7:24 pm
ejolson wrote:
Fri Apr 09, 2021 7:06 pm
Back to the Pi, my opinion is that having a well-supported GPU on 64-bit is currently the biggest problem.
Those are precisely the reasons I can't always use Pis. Those and the lack of external PCIe.
Have you checked out the CM4 and its IO board? PCIe (at least at some level).
Yes, ordered the pair last November. Still hasn't shipped. Did find a suitable alternate.
Growing old is getting old.

ejolson
Posts: 7069
Joined: Tue Mar 18, 2014 11:47 am

Re: Pi 4 and the Million Digit Fibonacci Challenge (Oxidized)

Sat Apr 10, 2021 3:44 am

Heater wrote:
Thu Apr 08, 2021 6:07 pm
So Mac 0.136, Pi 0.523.

So Mac is 3.8 times faster than Pi 4.
Your earlier runs of the pichart program

viewtopic.php?p=1814755#p1814755

suggested the M1 was

171.661/31.42=5.46

times faster than the 4B. Notably, the parallel scaling wasn't as good as one might have expected in that case either.

If memory bandwidth is the problem, one might detect this by running the stream memory bandwidth test

viewtopic.php?p=1644489#p1644489

on successively more cores.

To run such a test on a Mac, you might need to work around the lack of a taskset command. Maybe one could just set the number of threads and not worry about locking them to particular cores.
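
For the Linux side of such a comparison (for example on the 4B), here is a minimal sketch of what pinning the test to a growing set of cores might look like, assuming stream.c has already been built with OpenMP as streamomp.100M (the binary name used in the post below). taskset is Linux-only, so on the Mac one would drop it and vary only the thread count:

Code: Select all

#!/bin/bash
# Hedged sketch for Linux only -- taskset does not exist on macOS,
# so there one would vary OMP_NUM_THREADS alone.
for n in 1 2 3 4
do
    # pin the OpenMP threads to cores 0..n-1
    OMP_NUM_THREADS=$n taskset -c 0-$((n-1)) ./streamomp.100M \
        | grep 'Copy\|Scale\|Add\|Triad'
done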

lurk101
Posts: 526
Joined: Mon Jan 27, 2020 2:35 pm
Location: Cumming, GA (US)

Re: Pi 4 and the Million Digit Fibonacci Challenge (Oxidized)

Sat Apr 10, 2021 4:45 am

Maintaining cache coherence is another possible throughput impediment in SMP.
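
As a rough illustration of that coherence cost, here is a minimal false-sharing sketch (illustrative only, not code from this thread, assuming the same gcc with OpenMP used for stream): two threads incrementing counters that live in the same cache line keep the coherence protocol bouncing that line between cores, while padded counters do not.

Code: Select all

// false_sharing.c -- illustrative sketch only (hypothetical file name).
// Build: gcc -O2 -fopenmp false_sharing.c -o false_sharing
#include <omp.h>
#include <stdio.h>

#define N 100000000L

static struct { volatile long a, b; } shared;                             // same cache line
static struct { volatile long a; char pad[64]; volatile long b; } padded; // separate lines

int main(void) {
    double t0 = omp_get_wtime();
    #pragma omp parallel sections
    {
        #pragma omp section
        for (long i = 0; i < N; i++) shared.a++;
        #pragma omp section
        for (long i = 0; i < N; i++) shared.b++;
    }
    double t1 = omp_get_wtime();
    #pragma omp parallel sections
    {
        #pragma omp section
        for (long i = 0; i < N; i++) padded.a++;
        #pragma omp section
        for (long i = 0; i < N; i++) padded.b++;
    }
    double t2 = omp_get_wtime();
    printf("shared line: %.3f s   padded: %.3f s\n", t1 - t0, t2 - t1);
    return 0;
}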
Growing old is getting old.

Heater
Posts: 17999
Joined: Tue Jul 17, 2012 3:02 pm

Re: Pi 4 and the Million Digit Fibonacci Challenge (Oxidized)

Sat Apr 10, 2021 7:07 am

ejolson wrote:
Sat Apr 10, 2021 3:44 am
If memory bandwidth is the problem, one might detect this by running the stream memory bandwidth test
Here we go...

I build and run stream like so:

Code: Select all

#!/bin/bash

# First build stream as we like it:
gcc-10 -DSTREAM_ARRAY_SIZE=100000000 -O3 -mtune=native -march=native -fopenmp -o streamomp.100M stream.c

# Say something about the machine
uname -a

# Run stream for a selection of thread counts
for threads in {1..64}
do
   OMP_NUM_THREADS=$threads ./streamomp.100M | grep 'Copy\|Scale\|Add\|Triad\|counted'
done
With the following results:

Code: Select all

Darwin Heaters-MacBook-Pro.local 20.3.0 Darwin Kernel Version 20.3.0: Thu Jan 21 00:06:51 PST 2021; root:xnu-7195.81.3~1/RELEASE_ARM64_T8101 arm64
Number of Threads counted = 1
Copy:           39153.4     0.040973     0.040865     0.041688
Scale:          39443.8     0.040718     0.040564     0.041317
Add:            44031.8     0.054672     0.054506     0.055224
Triad:          44079.4     0.054529     0.054447     0.054853
Number of Threads counted = 2
Copy:           39440.1     0.040608     0.040568     0.040675
Scale:          39448.9     0.040609     0.040559     0.040657
Add:            43658.5     0.055022     0.054972     0.055175
Triad:          43711.9     0.054946     0.054905     0.055042
Number of Threads counted = 3
Copy:           39242.7     0.040824     0.040772     0.040879
Scale:          39328.4     0.040723     0.040683     0.040758
Add:            43447.6     0.055305     0.055239     0.055361
Triad:          43464.1     0.055266     0.055218     0.055338
Number of Threads counted = 4
Copy:           38987.3     0.041293     0.041039     0.042791
Scale:          39196.6     0.040865     0.040820     0.040900
Add:            43321.4     0.055484     0.055400     0.055722
Triad:          43338.5     0.055455     0.055378     0.055641
Number of Threads counted = 5
Copy:           38598.9     0.041575     0.041452     0.041920
Scale:          38848.1     0.041340     0.041186     0.041518
Add:            42933.7     0.056063     0.055900     0.056334
Triad:          43009.1     0.055976     0.055802     0.056085
Number of Threads counted = 6
Copy:           38337.2     0.041887     0.041735     0.042186
Scale:          38513.4     0.041590     0.041544     0.041635
Add:            42577.3     0.056448     0.056368     0.056527
Triad:          42590.2     0.056422     0.056351     0.056515
Number of Threads counted = 7
Copy:           38113.4     0.042045     0.041980     0.042150
Scale:          38288.4     0.041827     0.041788     0.041874
Add:            42307.9     0.056807     0.056727     0.056891
Triad:          42328.0     0.056770     0.056700     0.056850
Number of Threads counted = 8
Copy:           37935.2     0.042264     0.042177     0.042370
Scale:          38181.7     0.041990     0.041905     0.042107
Add:            42106.0     0.057086     0.056999     0.057256
Triad:          42097.0     0.057112     0.057011     0.057352
Number of Threads counted = 9
Copy:           38013.8     0.042296     0.042090     0.043302
Scale:          38247.4     0.042059     0.041833     0.042953
Add:            42107.6     0.057138     0.056997     0.057680
Triad:          42137.0     0.057330     0.056957     0.058515
Number of Threads counted = 10
Copy:           38092.4     0.042153     0.042003     0.042262
Scale:          38334.3     0.041867     0.041738     0.041944
Add:            42282.5     0.056966     0.056761     0.057140
Triad:          42229.7     0.057063     0.056832     0.057230
Number of Threads counted = 11
Copy:           38120.7     0.042126     0.041972     0.042256
Scale:          38364.6     0.041918     0.041705     0.042156
Add:            42189.7     0.057088     0.056886     0.057215
Triad:          42211.1     0.057044     0.056857     0.057263
Number of Threads counted = 12
Copy:           38119.6     0.042148     0.041973     0.042218
Scale:          38286.7     0.041875     0.041790     0.041980
Add:            42294.4     0.057069     0.056745     0.057191
Triad:          42019.7     0.057188     0.057116     0.057342
Number of Threads counted = 13
Copy:           37977.7     0.042206     0.042130     0.042325
Scale:          38200.6     0.041974     0.041884     0.042086
Add:            42072.1     0.057122     0.057045     0.057204
Triad:          42106.7     0.057159     0.056998     0.057547
Number of Threads counted = 14
Copy:           38090.6     0.042169     0.042005     0.042239
Scale:          38287.8     0.041926     0.041789     0.041986
Add:            42125.1     0.057091     0.056973     0.057164
Triad:          42107.6     0.057201     0.056997     0.057361
Number of Threads counted = 15
Copy:           37997.5     0.042199     0.042108     0.042269
Scale:          38267.5     0.041967     0.041811     0.042138
Add:            42256.6     0.057024     0.056796     0.057096
Triad:          42160.7     0.057133     0.056925     0.057553
Number of Threads counted = 16
Copy:           37996.6     0.042215     0.042109     0.042270
Scale:          38190.8     0.041973     0.041895     0.042033
Add:            42107.6     0.057081     0.056997     0.057163
Triad:          42105.3     0.057112     0.057000     0.057181
Number of Threads counted = 17
Copy:           37932.8     0.042290     0.042180     0.042622
Scale:          38166.3     0.041992     0.041922     0.042198
Add:            42112.5     0.057163     0.056990     0.057586
Triad:          42085.4     0.057151     0.057027     0.057278
Number of Threads counted = 18
Copy:           37948.9     0.042280     0.042162     0.042817
Scale:          38156.1     0.041999     0.041933     0.042207
Add:            42179.2     0.057220     0.056900     0.057643
Triad:          42142.3     0.057199     0.056950     0.057500
Number of Threads counted = 19
Copy:           37946.1     0.042238     0.042165     0.042309
Scale:          38191.7     0.041988     0.041894     0.042277
Add:            42086.8     0.057136     0.057025     0.057243
Triad:          42077.2     0.057157     0.057038     0.057298
Number of Threads counted = 20
Copy:           37811.6     0.042456     0.042315     0.042770
Scale:          38033.7     0.042132     0.042068     0.042210
Add:            41956.7     0.057343     0.057202     0.057572
Triad:          41922.1     0.057425     0.057249     0.057755
Number of Threads counted = 21
Copy:           37797.4     0.042382     0.042331     0.042434
Scale:          38093.5     0.042138     0.042002     0.042265
Add:            41956.5     0.057332     0.057202     0.057433
Triad:          41989.6     0.057330     0.057157     0.057465
Number of Threads counted = 22
Copy:           37892.1     0.042313     0.042225     0.042456
Scale:          38120.5     0.042021     0.041972     0.042196
Add:            42040.4     0.057170     0.057088     0.057290
Triad:          42039.6     0.057184     0.057089     0.057451
Number of Threads counted = 23
Copy:           37903.0     0.042309     0.042213     0.042460
Scale:          38125.3     0.042068     0.041967     0.042204
Add:            42045.5     0.057174     0.057081     0.057264
Triad:          42160.7     0.057078     0.056925     0.057386
Number of Threads counted = 24
Copy:           37906.5     0.042267     0.042209     0.042329
Scale:          38096.3     0.042096     0.041999     0.042407
Add:            41963.1     0.057258     0.057193     0.057311
Triad:          42024.8     0.057152     0.057109     0.057206
Number of Threads counted = 25
Copy:           37847.5     0.042329     0.042275     0.042396
Scale:          38082.4     0.042040     0.042014     0.042094
Add:            42038.2     0.057142     0.057091     0.057211
Triad:          42075.1     0.057070     0.057041     0.057164
Number of Threads counted = 26
Copy:           37839.4     0.042390     0.042284     0.042879
Scale:          38116.8     0.042107     0.041976     0.042661
Add:            42100.2     0.057226     0.057007     0.057456
Triad:          42128.1     0.057132     0.056969     0.057692
Number of Threads counted = 27
Copy:           37880.6     0.042634     0.042238     0.045223
Scale:          38087.2     0.042515     0.042009     0.046163
Add:            41998.5     0.057904     0.057145     0.063252
Triad:          42012.4     0.057360     0.057126     0.058961
Number of Threads counted = 28
Copy:           37834.0     0.042624     0.042290     0.044752
Scale:          38079.0     0.042319     0.042018     0.043918
Add:            41999.0     0.057241     0.057144     0.057499
Triad:          42016.7     0.057368     0.057120     0.058685
Number of Threads counted = 29
Copy:           37894.1     0.042339     0.042223     0.042462
Scale:          38103.4     0.042071     0.041991     0.042166
Add:            42016.7     0.057206     0.057120     0.057463
Triad:          42058.0     0.057131     0.057064     0.057231
Number of Threads counted = 30
Copy:           37848.3     0.042374     0.042274     0.042491
Scale:          38106.0     0.042105     0.041988     0.042486
Add:            41977.1     0.057239     0.057174     0.057332
Triad:          42017.6     0.057170     0.057119     0.057210
Number of Threads counted = 31
Copy:           37863.5     0.042350     0.042257     0.042451
Scale:          38143.5     0.042037     0.041947     0.042128
Add:            41966.1     0.057427     0.057189     0.058270
Triad:          41966.8     0.059300     0.057188     0.075726
Number of Threads counted = 32
Copy:           37802.7     0.042368     0.042325     0.042488
Scale:          38158.9     0.042026     0.041930     0.042103
Add:            42042.5     0.057211     0.057085     0.057264
Triad:          42052.2     0.057185     0.057072     0.057359
Number of Threads counted = 33
Copy:           37916.5     0.042273     0.042198     0.042362
Scale:          38134.4     0.042035     0.041957     0.042194
Add:            41922.8     0.057333     0.057248     0.057477
Triad:          42006.6     0.057311     0.057134     0.057773
Number of Threads counted = 34
Copy:           37896.6     0.042246     0.042220     0.042294
Scale:          38121.6     0.042045     0.041971     0.042095
Add:            42018.8     0.057240     0.057117     0.057320
Triad:          41947.1     0.057276     0.057215     0.057422
Number of Threads counted = 35
Copy:           37836.6     0.042362     0.042287     0.042563
Scale:          38121.6     0.042041     0.041971     0.042186
Add:            41908.3     0.057323     0.057268     0.057387
Triad:          41991.0     0.057228     0.057155     0.057303
Number of Threads counted = 36
Copy:           37908.4     0.042285     0.042207     0.042338
Scale:          38143.3     0.042027     0.041947     0.042091
Add:            41919.1     0.057308     0.057253     0.057395
Triad:          42025.7     0.057235     0.057108     0.057359
Number of Threads counted = 37
Copy:           37840.2     0.042329     0.042283     0.042387
Scale:          38134.4     0.042034     0.041957     0.042081
Add:            41952.8     0.057289     0.057207     0.057357
Triad:          42018.3     0.057311     0.057118     0.057600
Number of Threads counted = 38
Copy:           37857.3     0.042353     0.042264     0.042584
Scale:          38111.6     0.042078     0.041982     0.042177
Add:            41908.1     0.057343     0.057268     0.057453
Triad:          41960.2     0.057267     0.057197     0.057323
Number of Threads counted = 39
Copy:           37927.3     0.042289     0.042186     0.042371
Scale:          38092.6     0.042119     0.042003     0.042445
Add:            41910.4     0.057345     0.057265     0.057376
Triad:          41940.4     0.057293     0.057224     0.057410
Number of Threads counted = 40
Copy:           37869.9     0.042318     0.042250     0.042438
Scale:          38038.2     0.042261     0.042063     0.043370
Add:            41980.8     0.057298     0.057169     0.057721
Triad:          41974.2     0.057299     0.057178     0.057383
Number of Threads counted = 41
Copy:           37768.9     0.042459     0.042363     0.042629
Scale:          37997.5     0.042169     0.042108     0.042224
Add:            41976.4     0.057227     0.057175     0.057302
Triad:          41980.1     0.057257     0.057170     0.057314
Number of Threads counted = 42
Copy:           37791.2     0.042450     0.042338     0.042755
Scale:          38088.1     0.042195     0.042008     0.042987
Add:            41940.4     0.057451     0.057224     0.058353
Triad:          42008.0     0.057393     0.057132     0.058315
Number of Threads counted = 43
Copy:           37827.6     0.042418     0.042297     0.042558
Scale:          38068.2     0.042110     0.042030     0.042193
Add:            41922.1     0.057322     0.057249     0.057377
Triad:          41944.8     0.057278     0.057218     0.057353
Number of Threads counted = 44
Copy:           37817.2     0.042362     0.042309     0.042420
Scale:          38020.3     0.042130     0.042083     0.042194
Add:            41925.9     0.057343     0.057244     0.057420
Triad:          41899.4     0.057348     0.057280     0.057463
Number of Threads counted = 45
Copy:           37840.2     0.042374     0.042283     0.042445
Scale:          38054.4     0.042091     0.042045     0.042151
Add:            41910.4     0.057326     0.057265     0.057383
Triad:          41963.8     0.057336     0.057192     0.057442
Number of Threads counted = 46
Copy:           37775.9     0.042434     0.042355     0.042528
Scale:          38009.1     0.042172     0.042095     0.042233
Add:            41878.1     0.057366     0.057309     0.057428
Triad:          41917.7     0.057456     0.057255     0.057926
Number of Threads counted = 47
Copy:           37828.7     0.042440     0.042296     0.042664
Scale:          38044.4     0.042163     0.042056     0.042212
Add:            41890.7     0.057379     0.057292     0.057547
Triad:          41922.3     0.057339     0.057249     0.057471
Number of Threads counted = 48
Copy:           37780.6     0.042399     0.042350     0.042444
Scale:          38063.4     0.042129     0.042035     0.042180
Add:            41849.7     0.057523     0.057348     0.057884
Triad:          41879.0     0.057362     0.057308     0.057432
Number of Threads counted = 49
Copy:           37771.6     0.042414     0.042360     0.042474
Scale:          38063.4     0.042160     0.042035     0.042261
Add:            41901.5     0.057383     0.057277     0.057452
Triad:          41906.7     0.057372     0.057270     0.057544
Number of Threads counted = 50
Copy:           37751.0     0.042532     0.042383     0.042878
Scale:          38008.5     0.042161     0.042096     0.042261
Add:            41897.3     0.057372     0.057283     0.057440
Triad:          41978.5     0.057281     0.057172     0.057362
Number of Threads counted = 51
Copy:           37742.1     0.042447     0.042393     0.042529
Scale:          37989.3     0.042213     0.042117     0.042283
Add:            41896.4     0.057380     0.057284     0.057532
Triad:          41905.3     0.057356     0.057272     0.057419
Number of Threads counted = 52
Copy:           37695.9     0.042542     0.042445     0.042617
Scale:          37976.0     0.042165     0.042132     0.042232
Add:            41877.4     0.057408     0.057310     0.057495
Triad:          41923.5     0.057374     0.057247     0.057452
Number of Threads counted = 53
Copy:           37703.1     0.042676     0.042437     0.043337
Scale:          37960.5     0.042386     0.042149     0.043027
Add:            41898.0     0.057506     0.057282     0.058410
Triad:          41872.4     0.057591     0.057317     0.058437
Number of Threads counted = 54
Copy:           37701.2     0.042497     0.042439     0.042554
Scale:          37959.6     0.042196     0.042150     0.042222
Add:            41902.4     0.057367     0.057276     0.057449
Triad:          41918.4     0.057323     0.057254     0.057406
Number of Threads counted = 55
Copy:           37756.3     0.042435     0.042377     0.042520
Scale:          37955.1     0.042208     0.042155     0.042264
Add:            41868.7     0.057491     0.057322     0.057727
Triad:          41951.4     0.057347     0.057209     0.057473
Number of Threads counted = 56
Copy:           37754.4     0.042456     0.042379     0.042526
Scale:          37974.9     0.042212     0.042133     0.042362
Add:            41892.8     0.057412     0.057289     0.057522
Triad:          41860.0     0.057423     0.057334     0.057545
Number of Threads counted = 57
Copy:           37687.8     0.042571     0.042454     0.042967
Scale:          37943.3     0.042274     0.042168     0.042344
Add:            41902.4     0.057330     0.057276     0.057430
Triad:          41914.7     0.057313     0.057259     0.057413
Number of Threads counted = 58
Copy:           37730.4     0.042547     0.042406     0.042685
Scale:          37887.9     0.042290     0.042230     0.042379
Add:            41906.0     0.057356     0.057271     0.057427
Triad:          41890.0     0.057369     0.057293     0.057452
Number of Threads counted = 59
Copy:           37715.4     0.042500     0.042423     0.042625
Scale:          37966.1     0.042190     0.042143     0.042236
Add:            41808.2     0.057529     0.057405     0.057809
Triad:          41901.0     0.057428     0.057278     0.057542
Number of Threads counted = 60
Copy:           37673.7     0.042508     0.042470     0.042566
Scale:          37982.2     0.042181     0.042125     0.042256
Add:            41824.2     0.057461     0.057383     0.057588
Triad:          41833.7     0.057413     0.057370     0.057509
Number of Threads counted = 61
Copy:           37679.0     0.042573     0.042464     0.043043
Scale:          37948.9     0.042214     0.042162     0.042277
Add:            41889.1     0.057426     0.057294     0.057508
Triad:          41843.1     0.057426     0.057357     0.057554
Number of Threads counted = 62
Copy:           37681.7     0.042521     0.042461     0.042600
Scale:          37981.3     0.042248     0.042126     0.042373
Add:            41809.6     0.057474     0.057403     0.057586
Triad:          41822.1     0.057465     0.057386     0.057700
Number of Threads counted = 63
Copy:           37657.0     0.042635     0.042489     0.043402
Scale:          37918.3     0.042274     0.042196     0.042413
Add:            41829.2     0.057473     0.057376     0.057587
Triad:          41898.7     0.057462     0.057281     0.058148
Number of Threads counted = 64
Copy:           37647.0     0.042741     0.042500     0.043633
Scale:          37972.1     0.042369     0.042136     0.043056
Add:            41863.7     0.057540     0.057329     0.058371
Triad:          41864.4     0.057558     0.057328     0.058422
Which, if I understand correctly, pees all over the results for the Ryzen 7 Pro 1700 reported here: viewtopic.php?p=1644489#p1646962
Memory in C++ is a leaky abstraction.

lurk101
Posts: 526
Joined: Mon Jan 27, 2020 2:35 pm
Location: Cumming, GA (US)

Re: Pi 4 and the Million Digit Fibonacci Challenge (Oxidized)

Sat Apr 10, 2021 1:51 pm

I'm not quite sure what the results indicate, but here's the same for a mid-range Ryzen 3700X.

Code: Select all

Linux compute 5.10.27-051027-generic #202103310028 SMP Thu Apr 1 02:16:48 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Number of Threads counted = 1
Copy:           33481.1     0.047831     0.047788     0.047857
Scale:          19355.0     0.082697     0.082666     0.082743
Add:            22887.4     0.104887     0.104861     0.104936
Triad:          22902.5     0.104826     0.104792     0.104872
Number of Threads counted = 2
Copy:           36419.8     0.043990     0.043932     0.044034
Scale:          22818.4     0.070149     0.070119     0.070186
Add:            25204.0     0.095271     0.095223     0.095503
Triad:          25239.8     0.095116     0.095088     0.095150
Number of Threads counted = 4
Copy:           34958.2     0.045825     0.045769     0.045860
Scale:          22123.2     0.072357     0.072322     0.072390
Add:            24271.4     0.098937     0.098882     0.098992
Triad:          24266.7     0.098980     0.098901     0.099443
Number of Threads counted = 8
Copy:           33650.8     0.047847     0.047547     0.048005
Scale:          21404.2     0.075077     0.074752     0.075218
Add:            23610.0     0.102513     0.101652     0.102751
Triad:          23573.1     0.102490     0.101811     0.102615
Number of Threads counted = 16
Copy:           20577.7     0.077797     0.077754     0.077820
Scale:          20650.8     0.077516     0.077479     0.077553
Add:            22979.7     0.104527     0.104440     0.104783
Triad:          22986.7     0.104474     0.104408     0.104549
Number of Threads counted = 32
Copy:           20622.5     0.077814     0.077585     0.077940
Scale:          20750.1     0.077431     0.077108     0.077555
Add:            23101.8     0.104400     0.103888     0.104633
Triad:          23086.9     0.104259     0.103955     0.104543
Number of Threads counted = 64
Copy:           20561.3     0.077951     0.077816     0.078076
Scale:          20659.0     0.077575     0.077448     0.077682
Add:            22984.7     0.104681     0.104417     0.104754
Triad:          23004.1     0.104513     0.104329     0.104685
Growing old is getting old.

Heater
Posts: 17999
Joined: Tue Jul 17, 2012 3:02 pm

Re: Pi 4 and the Million Digit Fibonacci Challenge (Oxidized)

Sat Apr 10, 2021 2:41 pm

No idea really.

Looks like the Ryzen's memory bandwidth is less than the M1's, no matter how many threads/cores are on the job.

I conclude the poor scaling on the M1 is not a result of a memory bandwidth limitation.

Maybe.
Memory in C++ is a leaky abstraction.

ejolson
Posts: 7069
Joined: Tue Mar 18, 2014 11:47 am

Re: Pi 4 and the Million Digit Fibonacci Challenge (Oxidized)

Sat Apr 10, 2021 3:23 pm

Heater wrote:
Sat Apr 10, 2021 2:41 pm
No idea really.

Looks like the Ryzen's memory bandwidth is less than the M1's, no matter how many threads/cores are on the job.

I conclude the poor scaling on the M1 is not a result of a memory bandwidth limitation.

Maybe.
I did a quick web search and found
The M1's CPU is a 5nm octa-core big/little design, with four performance cores and four efficiency cores.
https://arstechnica.com/gadgets/2020/11 ... ompetitor/

This suggests the reason the M1 is not scaling well past four cores is because the other four cores are little ones. This makes one want to figure out how to set processor affinity so one can compare the speed of the little cores to the big ones.

Could it be that the four little cores will perform about the same as the Pi 4B?
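
Lacking taskset, one speculative way to attempt that comparison on macOS is the pthread QoS API: the guess here is that QOS_CLASS_BACKGROUND work gets steered to the efficiency cores while a higher class lands on the performance cores, which seems plausible on Apple Silicon but is not something I can point to a guarantee for. A minimal sketch (hypothetical file name, fixed busy loop timed under each class):

Code: Select all

// qos_spin.c -- speculative macOS-only sketch; whether the scheduler really
// maps QOS_CLASS_BACKGROUND to the little cores is an assumption to verify.
// Build: clang -O2 qos_spin.c -o qos_spin
// Run:   ./qos_spin        (user-initiated class, presumably big cores)
//        ./qos_spin bg     (background class, presumably little cores)
#include <pthread.h>
#include <pthread/qos.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

static volatile double sink;

static void *spin(void *arg) {
    pthread_set_qos_class_self_np(*(qos_class_t *)arg, 0);
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    double x = 0.0;
    for (long i = 0; i < 400000000L; i++)   // fixed amount of floating-point work
        x += (double)i * 1e-9;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    sink = x;                               // keep the loop from being optimised away
    printf("%.2f s\n", (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9);
    return NULL;
}

int main(int argc, char **argv) {
    qos_class_t qos = (argc > 1 && strcmp(argv[1], "bg") == 0)
                        ? QOS_CLASS_BACKGROUND : QOS_CLASS_USER_INITIATED;
    pthread_t t;
    pthread_create(&t, NULL, spin, &qos);
    pthread_join(t, NULL);
    return 0;
}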

lurk101
Posts: 526
Joined: Mon Jan 27, 2020 2:35 pm
Location: Cumming, GA (US)

Re: Pi 4 and the Million Digit Fibonacci Challenge (Oxidized)

Sat Apr 10, 2021 4:23 pm

ejolson wrote:
Sat Apr 10, 2021 3:23 pm
This suggests the reason the M1 is not scaling well past four cores is because the other four cores are little ones. This makes one want to figure out how to set processor affinity so one can compare the speed of the little cores to the big ones.

Could it be that the four little cores will perform about the same as the Pi 4B?
Apple's customized BSD for M1 is an unknown, but wouldn't it favor its little cores to extend battery life? I've no idea how the M1's performance governor works. I've seen many implementations of Linux for big/little architectures that make no distinction, treating all cores as equal.

EDIT - is there a version of cat /proc/cpuinfo in BSD?
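
There is no /proc on macOS, but sysctl exposes roughly the same information. A rough sketch of the closest equivalents (the machdep.cpu and hw keys are stock macOS; whether the per-cluster hw.perflevel* keys exist on a given release is an assumption):

Code: Select all

sysctl -n machdep.cpu.brand_string   # CPU model string
sysctl hw.ncpu hw.physicalcpu        # logical and physical core counts
sysctl -a | grep -i perflevel        # per-cluster (big/little) details, if exposed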
Last edited by lurk101 on Sat Apr 10, 2021 4:37 pm, edited 1 time in total.
Growing old is getting old.

ejolson
Posts: 7069
Joined: Tue Mar 18, 2014 11:47 am

Re: Pi 4 and the Million Digit Fibonacci Challenge (Oxidized)

Sat Apr 10, 2021 4:36 pm

lurk101 wrote:
Sat Apr 10, 2021 4:23 pm
ejolson wrote:
Sat Apr 10, 2021 3:23 pm
This suggests the reason the M1 is not scaling well past four cores is because the other four cores are little ones. This makes one want to figure out how to set processor affinity so one can compare the speed of the little cores to the big ones.

Could it be that the four little cores will perform about the same as the Pi 4B?
Apple's customized BSD for M1 is an unknown, but wouldn't it favor its little cores to extend battery life?
I'm pretty sure the scheduler will push a compute-bound task onto the big cores. That might even save battery compared to running for much longer on the little cores.

The laptop here reports

Code: Select all

Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            7950.6     0.202614     0.201242     0.203031
Scale:           5051.5     0.316953     0.316737     0.317504
Add:             5714.7     0.420408     0.419970     0.421199
Triad:           5622.7     0.427013     0.426838     0.427204
when running on both cores with the 100M array size. Since the letters on the keyboard haven't rubbed off yet, I still consider that laptop to be new. This shows how much has changed in only a few years. Even though the BARK™ has met with production delays, maybe technological innovation has not stopped after all.

Heater
Posts: 17999
Joined: Tue Jul 17, 2012 3:02 pm

Re: Pi 4 and the Million Digit Fibonacci Challenge (Oxidized)

Sat Apr 10, 2021 5:09 pm

"BARK"?

Binary Arithmetic Random Kalculator?
Memory in C++ is a leaky abstraction.
