My Bramble is up and running!

by diereinegier » Fri Jan 11, 2013 11:11 pm
Today I got my little cluster of 4 Raspberries up and running and gave some MPI programs a go:
Code:
georg@crumb0 /net/crumb0/home/georg/src/mpitutorial $ time ./apple_serial 4096
RE_START =                                       -2
RE_STOP  =                                        2
IM_START =                                        0
IM_STOP  =                                        2
XPIX     =                                     2048
YPIX     =                                     1024
MAXITER  =                                     4096
    1023
real    1m2.730s
user    1m1.910s
sys     0m0.250s
georg@crumb0 /net/crumb0/home/georg/src/mpitutorial $ time mpiexec -H crumb0,crumb1,crumb2,crumb3  apple_mpi_nblock.2 4096
RE_START =                                       -2
RE_STOP  =                                        2
IM_START =                                        0
IM_STOP  =                                        2
XPIX     =                                     2048
YPIX     =                                     1024
MAXITER  =                                     4096
MPI_Wtime is global
[...]
real    0m18.129s
user    0m16.790s
sys     0m0.840s

For this particular case the scaling is nearly perfect: one node takes about 63 seconds to compute, while four nodes need about 18 seconds.
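
As a quick check on those timings, speedup and parallel efficiency can be worked out from the two "real" values in the listing above. This is only a small Python sketch; the function and constant names are made up for illustration and are not part of the tutorial sources.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Ratio of serial to parallel wall-clock time."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, nodes: int) -> float:
    """Speedup divided by the number of nodes (1.0 would be ideal)."""
    return speedup(t_serial, t_parallel) / nodes


# "real" times taken from the two runs above, converted to seconds.
T_SERIAL = 62.730    # 1m2.730s, apple_serial
T_PARALLEL = 18.129  # 0m18.129s, apple_mpi_nblock.2 on 4 nodes
NODES = 4

if __name__ == "__main__":
    print(f"speedup    = {speedup(T_SERIAL, T_PARALLEL):.2f}")
    print(f"efficiency = {efficiency(T_SERIAL, T_PARALLEL, NODES):.1%}")
```

With these numbers the speedup comes out at roughly 3.5 on four nodes, so the efficiency is somewhat below 90%.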

Check out my MPI tutorial at https://github.com/GeorgBisseling/MPI-Tutorial/blob/master/tutorial.pdf?raw=true with full source code at https://github.com/GeorgBisseling/MPI-Tutorial/blob/master/mpitutorial.tar.gz?raw=true.

[mod edit: Links changed at the request of the OP]

DISCLAIMER: If you want to do fast computations, then do not even consider the Raspberry Pi! Clustering them is just done for fun and to learn about the problems and pitfalls of distributed-memory computing. You couldn't do that with lower energy and space requirements, I guess.
Download my repositories at https://github.com/GeorgBisseling
Posts: 145
Joined: Sun Dec 30, 2012 5:45 pm
Location: Bonn, Germany
by diereinegier » Sat Jan 12, 2013 1:58 pm
Image

Image

The box is stackable and contains not only the raspberries (lower right) but the power supplies (upper left), a 7-port USB-Hub (lower left) and a 5-port Ethernet-Switch (upper right).

Only a 230V power cord and a network cable leave the box.
by diereinegier » Thu Jan 24, 2013 2:05 pm
I took an n-body sample code (http://www.ids.ias.edu/~piet/act/comp/algorithms/starter/index.html) and parallelized it using MPI.

Making this scale over 4 Raspberries was much harder than for a Mandelbrot code or cpi.c.
For, say, 50 bodies the serial program was still faster than the parallel one, so I tried again with 1000 bodies.
The serial run took just under 42 minutes, the parallel run on 4 nodes just under 15 minutes. This is by no means perfect, but acceptable. I must recheck with about 100 bodies, because 1000 bodies are computed so slowly regardless of the number of Raspberries involved.

The reason is, for a start, that 100 Mbit Ethernet is too slow and, even more, that handling the network causes far too much CPU load on the Raspberry. For the program parts whose computational complexity is linear in n, I do not even think of parallelizing, because computing the values is faster than getting them from the other nodes. Only the computation of acceleration, jerk, potential energy and collision time estimation is parallelized. If anyone is interested I will post the code.
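
To give an idea of what parallelizing only the pairwise part can look like, here is a rough Python sketch. It is not the nbody_mpi code: it uses no MPI at all and only simulates the ranks with a loop, it covers acceleration only (no jerk, potential energy or collision time), it assumes G = 1, and all names (block_range, accelerations, simulated_parallel) are hypothetical. Each "rank" computes the acceleration for its own block of bodies against all bodies, and the blocks are then concatenated in rank order, which is the job a gather step would do in a real MPI program.

```python
import math


def block_range(n, size, rank):
    """Half-open index range [lo, hi) of the bodies owned by `rank`."""
    base, extra = divmod(n, size)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi


def accelerations(pos, mass, lo, hi):
    """Acceleration (G = 1) on bodies lo..hi-1 caused by all other bodies."""
    acc = []
    for i in range(lo, hi):
        ax = ay = az = 0.0
        for j in range(len(pos)):
            if i == j:
                continue
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            dz = pos[j][2] - pos[i][2]
            r2 = dx * dx + dy * dy + dz * dz
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            ax += mass[j] * dx * inv_r3
            ay += mass[j] * dy * inv_r3
            az += mass[j] * dz * inv_r3
        acc.append((ax, ay, az))
    return acc


def simulated_parallel(pos, mass, size):
    """Stand-in for the gather: join every rank's block in rank order."""
    result = []
    for rank in range(size):
        lo, hi = block_range(len(pos), size, rank)
        result.extend(accelerations(pos, mass, lo, hi))
    return result
```

Note that every rank still needs the positions of all bodies for each step, so the data to exchange grows with n while the work per rank grows with n*n/size. That would fit the observation that small systems do not benefit from the extra nodes.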
by Lob0426 » Thu Jan 24, 2013 9:03 pm
diereinegier wrote:
Image

Image

The box is stackable and contains not only the raspberries (lower right) but the power supplies (upper left), a 7-port USB-Hub (lower left) and a 5-port Ethernet-Switch (upper right).

Only a 230V power cord and a network cable leave the box.


The pictures are not showing in this post.
512MB version 2.0 as WordPress Server
Motorola Lapdock with 512MB
Modded Rev 1.0 with pin headers at USB

http://rich1.dyndns.tv/
(RS)Allied ships old stock to reward its Customers for long wait!
Posts: 1934
Joined: Fri Aug 05, 2011 4:30 pm
Location: Susanville CA.
by diereinegier » Thu Jan 24, 2013 10:40 pm
For me they show in Opera, Firefox and Chrome.

What browser do you use?
by diereinegier » Fri Jan 25, 2013 8:33 am
Some scaling measurements. Enjoy!
Attachment: nbody_mpi_times.png (20.93 KiB), Scaling of nbody_mpi
by diereinegier » Sat Jan 26, 2013 4:06 pm
The combined original sources and the parallelized one:
Attachment: nbody_mpi_2013-01-26.tar.gz (12.4 KiB)
Enjoy!
by diereinegier » Sat Jan 26, 2013 10:46 pm
diereinegier wrote:Some scaling measurements. Enjoy!


Unfortunately there was a typo in the spreadsheet: the runtime for 1000 bodies on 4 nodes was too good to be true.
Attachment: nbody_mpi_times_corrected.png (20.55 KiB), corrected version
by diereinegier » Sat Jan 26, 2013 10:55 pm
The integration scheme used in this code is called Hermite integration, and I do not quite see its advantage over other methods. The algorithm seems to be very involved. I guess that something simpler, like leapfrog with softening, would fit the Raspberries better.

Maybe I will come up with a simpler integration algorithm.
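
For reference, a leapfrog scheme with softening could look roughly like the Python sketch below. This is not taken from the nbody code or from Kali. It assumes G = 1, the common kick-drift-kick form of leapfrog and Plummer-style softening (eps squared added to the squared distance), and the names accel, leapfrog_step and total_energy are hypothetical.

```python
import math


def accel(pos, mass, eps):
    """Softened pairwise accelerations (G = 1) for all bodies."""
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = d[0] * d[0] + d[1] * d[1] + d[2] * d[2] + eps * eps
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += mass[j] * d[k] * inv_r3
                acc[j][k] -= mass[i] * d[k] * inv_r3
    return acc


def leapfrog_step(pos, vel, mass, dt, eps):
    """Advance pos and vel in place by one kick-drift-kick step."""
    acc = accel(pos, mass, eps)
    for i in range(len(pos)):
        for k in range(3):
            vel[i][k] += 0.5 * dt * acc[i][k]  # half kick
            pos[i][k] += dt * vel[i][k]        # full drift
    acc = accel(pos, mass, eps)
    for i in range(len(pos)):
        for k in range(3):
            vel[i][k] += 0.5 * dt * acc[i][k]  # second half kick


def total_energy(pos, vel, mass, eps):
    """Kinetic plus softened potential energy, useful as a sanity check."""
    kinetic = sum(0.5 * m * sum(c * c for c in v) for m, v in zip(mass, vel))
    potential = 0.0
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            r2 = sum((pos[j][k] - pos[i][k]) ** 2 for k in range(3)) + eps * eps
            potential -= mass[i] * mass[j] / math.sqrt(r2)
    return kinetic + potential
```

The step needs only positions, velocities and one force evaluation per body pair, with no jerk and no per-body time steps, which is what makes it so much simpler than the Hermite scheme. The price is a fixed time step and lower accuracy in close encounters, which the softening only papers over.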

Meanwhile consult the fine material presented here:
http://www.artcompsci.org/kali/development.html

If you would like to play around with some bigger systems on your quad-core Windows desktop machine: I ported some of the Kali code to a C# project targeted at the cost-free Visual Studio Express 2012 (or the bigger versions, of course). Some time-critical computations were moved back to C:
https://github.com/GeorgBisseling/nbodysketch

That even contains a very simple player to watch your simulation in motion.
by diereinegier » Wed Jan 30, 2013 8:41 pm
I published the code on GitHub:
https://github.com/GeorgBisseling/nbody_mpi
by diereinegier » Fri Feb 08, 2013 9:19 pm
I was not entirely happy with my cluster setup: the thick and stiff Ethernet cables exerted a lot of force on the Raspberries, and the USB hub together with all the very long cables just to bring the juice seemed a little overdone.

I found some nice flat Ethernet cables that do not cause errors or retransmissions at 100 Mbit.

Then I cut two USB-A-to-USB-A cables, got rid of the shielding and data wires and connected them to the hub's PSU. Of course I did this only after having tested that "back powering" is stable even when overclocking to 900 MHz.

The result is much less of a mess. Even the camera had an easier time to focus, or so it seems.
Attachment: CroppedSmall.jpg (63.69 KiB), Clean Room
by diereinegier » Sat Feb 23, 2013 10:51 pm
OMG, mentioning the MPI tutorial resulted in so much web traffic on my home page that I had to pay extra fees!

Nice to see that so many people are interested in MPI on the RasPi.

But - must - get - better - contract!

Please use this URL if you can:
https://github.com/GeorgBisseling/MPI-Tutorial
by TravelinMax » Fri Aug 09, 2013 8:55 pm
Could you provide details on the USB hub you used to power the Pis? I'm trying to avoid buying a phone charger for each one and would like to avoid soldering if possible because it would delay my project significantly.

Thanks!
Posts: 22
Joined: Mon Nov 26, 2012 3:46 am
Location: Michigan, USA
by diereinegier » Fri Aug 09, 2013 9:10 pm
As you can see in the last picture I now use two split USB-A-to-USB-A-Cables to distribute the power from a 3.5A power supply that came together with an otherwise unusable USB-Hub from "Logilink".

In the beginning I used a USB hub to distribute the power, but that gave me too many cables in the box, and the USB hub was not really used for any purpose other than power distribution, which was a waste.

Please have a look at the list of non-working USB hubs at eLinux.org, where I flagged the non-working Logilink UA0124 USB hub as having this beefy power supply.

cheers
Georg