Raspberry-Pi High Availability cluster.


8 posts
by zardoz99 » Sat Jun 16, 2012 7:41 pm
Just to prove it can be done, I have configured a pair of Raspis as a High Availability cluster using corosync and pacemaker, with a little help from the "pcs" tool from https://github.com/feist/pcs
An announcement and example of the utility can be found at http://www.gossamer-threads.com/lists/l ... aker/80072
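
For anyone wanting to reproduce this, here is roughly what a minimal two-node corosync 2.x config could look like (the node addresses match the "ip addr" output below; everything else is a sketch rather than my exact file):

Code: Select all
# /etc/corosync/corosync.conf - two-node sketch
totem {
        version: 2
        transport: udpu
}

nodelist {
        node {
                ring0_addr: 192.168.1.10
                nodeid: 1
        }
        node {
                ring0_addr: 192.168.1.11
                nodeid: 2
        }
}

quorum {
        provider: corosync_votequorum
        two_node: 1
}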

Here's a sample output of it running with a shared IP address as the only HA resource.

Node 1.

Code: Select all
[root@raspberry1 pcs]# pcs status
Cluster Status:
 Last updated: Sat Jun 16 20:27:16 2012
 Last change: Sat Jun 16 20:27:08 2012 via cibadmin on raspberry2
 Stack: corosync
 Current DC: raspberry1 (1) - partition with quorum
 Version: 1.1.7-2.fc17-ee0730e13d124c3d58f00016c3376a1de5323cff
 2 Nodes configured, unknown expected votes
 1 Resources configured.

Resources:
 ClusterIP (ocf::heartbeat:IPaddr2) - Started raspberry1

Pacemaker Nodes:
 Online: raspberry1 raspberry2
[root@raspberry1 pcs]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether b8:27:eb:ae:17:38 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.10/24 brd 192.168.1.255 scope global eth0
    inet 192.168.1.99/32 brd 192.168.1.99 scope global eth0
    inet6 2001:8b0:14e:1:ba27:ebff:feae:1738/64 scope global dynamic
       valid_lft 86396sec preferred_lft 14396sec
    inet6 fe80::ba27:ebff:feae:1738/64 scope link
       valid_lft forever preferred_lft forever
[root@raspberry1 pcs]#


Node 2.

Code: Select all
[root@raspberry2 pcs]# pcs resource create ClusterIP IPaddr2 ip=192.168.1.99 cidr_netmask=32 op monitor interval=30s
[root@raspberry2 pcs]# pcs status
Cluster Status:
 Last updated: Sat Jun 16 20:28:30 2012
 Last change: Sat Jun 16 20:27:08 2012 via cibadmin on raspberry2
 Stack: corosync
 Current DC: raspberry1 (1) - partition with quorum
 Version: 1.1.7-2.fc17-ee0730e13d124c3d58f00016c3376a1de5323cff
 2 Nodes configured, unknown expected votes
 1 Resources configured.

Resources:
 ClusterIP (ocf::heartbeat:IPaddr2) - Started raspberry1

Pacemaker Nodes:
 Online: raspberry1 raspberry2
[root@raspberry2 pcs]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether b8:27:eb:b7:02:92 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.11/24 brd 192.168.1.255 scope global eth0
    inet6 2001:8b0:14e:1:ba27:ebff:feb7:292/64 scope global dynamic
       valid_lft 86393sec preferred_lft 14393sec
    inet6 fe80::ba27:ebff:feb7:292/64 scope link
       valid_lft forever preferred_lft forever
[root@raspberry2 pcs]#
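
If you want to watch the address actually move, a quick check (a sketch using the standard pacemaker tools - crmsh is installed here as well, as the config dump further down shows):

Code: Select all
# put the node currently holding ClusterIP into standby...
crm node standby raspberry1
# ...then confirm the resource has restarted on raspberry2
pcs status
# and bring the first node back online afterwards
crm node online raspberry1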


Z
Posts: 137
Joined: Fri Jan 13, 2012 2:25 pm
by SN » Sat Jun 16, 2012 9:23 pm
Now this I like. I may pressgang R3 into service alongside R2 running my website...

I notice you are a Red Hat person. I have these two products running on RHEL in my 'day job' ;)
Steve N – binatone mk4->intellivision->zx81->spectrum->cbm64->cpc6128->520stfm->pc->raspi ?
Posts: 1008
Joined: Mon Feb 13, 2012 8:06 pm
Location: Romiley, UK
by zardoz99 » Sat Jun 16, 2012 11:35 pm
I don't work for RH but I am a "Certified Engineer" installing, configuring and supporting big HPC clusters for a variety of academic and industrial users. This High Availability stuff is vital, as a single point of failure can result in literally months of work being lost. Managed to get fencing working as well now using the "external/ssh" stonith method. Nice :-)

Here's my config.

Code: Select all
[root@raspberry1 ~]# crm configure show
node $id="1" raspberry1
node $id="2" raspberry2
primitive ClusterIP ocf:heartbeat:IPaddr2 \
   params ip="192.168.1.99" cidr_netmask="32" \
   op monitor interval="30s" \
   meta target-role="Started"
primitive st-ssh stonith:external/ssh \
   params hostlist="raspberry1 raspberry2"
clone fencing st-ssh
property $id="cib-bootstrap-options" \
   dc-version="1.1.7-2.fc17-ee0730e13d124c3d58f00016c3376a1de5323cff" \
   cluster-infrastructure="corosync" \
   stonith-enabled="true" \
   no-quorum-policy="ignore" \
   expected-quorum-votes="2" \
   default-resource-stickiness="5000" \
   maintenance-mode="false"
[root@raspberry1 ~]#
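
To check that the external/ssh fencing really fires, one option (stonith_admin ships with pacemaker; note that this genuinely reboots the target node):

Code: Select all
# ask the cluster to fence raspberry2 through the st-ssh clone
stonith_admin --reboot raspberry2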


Z.
Posts: 137
Joined: Fri Jan 13, 2012 2:25 pm
by ydev » Mon Sep 10, 2012 7:17 pm
Hi,

I'm a master's student at the University of Utah and I want to use the Raspberry Pi for my master's project.

The idea I have is to cluster many Pi boards and use them to build a highly available, low-cost, cloud-like cluster.

So what are the options when it comes to clustering more than 2 or 4 such boards?
Posts: 1
Joined: Sun Sep 09, 2012 7:10 am
by yv1hx » Wed Jan 02, 2013 1:56 am
ydev wrote:Hi,

I'm a master's student at the University of Utah and I want to use the Raspberry Pi for my master's project.

The idea I have is to cluster many Pi boards and use them to build a highly available, low-cost, cloud-like cluster.

So what are the options when it comes to clustering more than 2 or 4 such boards?


ydev: You should have a look here: http://www.raspberrypi.org/phpBB3/viewtopic.php?f=41&t=8740 :roll:
Marco-Luis
http://www.meteoven.org
http://yv1hx.no-ip.org
http://twitter.com/meteoven
Posts: 173
Joined: Sat Jul 21, 2012 10:09 pm
Location: Venezuela
by ChristopheDupriez » Sun Jan 27, 2013 5:42 pm
I think a Bramble of Pis would be a very nice way to teach and experiment with High Availability and Scalability issues in our Computer Science schools. I would be very happy to learn about other teachers trying to organise a corkboard (or other board) with this kind of experiment for their class. For a LAMP system, I imagine:
2 Pis + a hard disk for MySQL
2 Pis for the application
2 Pis for firewall / load balancing
2 Pis for monitoring and control (+ their 2 HDMI TVs)
Some Pis to generate load for measuring performance in different setups/configs (see the sketch below).
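
For the load-generation Pis, something as simple as ApacheBench would do as a first sketch (the address is just the example virtual IP used earlier in this thread):

Code: Select all
# 10000 requests, 50 concurrent, against the cluster's virtual IP
ab -n 10000 -c 50 http://192.168.1.99/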

Thanks! Christophe
Posts: 5
Joined: Tue Dec 25, 2012 8:38 am
by remsnet » Tue Apr 16, 2013 11:04 am
zardoz99 wrote:
....

Z


Well, I can confirm this as well.

corosync, pacemaker, heartbeat, ldirectord, crm, openais, stonith and more:
all of that already comes with the Raspberry Pi distros.
Tested on Debian Wheezy / FC 18 / openSUSE over the past 2 months - all of them worked.

Only the IP_VS kernel support was an issue, until I recompiled the kernel for it myself.
I use my own latest compiled kernel, 3.8.7: http://www.raspberrypi.org/phpBB3/viewtopic.php?f=87&t=40664

Docs:
Basics: http://clusterlabs.org/wiki/Initial_Configuration
Active/passive cluster: http://zeldor.biz/2010/12/activepassive ... -corosync/
MySQL master/slaves: http://clusterlabs.org/wiki/Load_Balanc ... ed_Cluster
MySQL cluster: http://www.mysqlperformanceblog.com/201 ... b-cluster/
DRBD on the Pi: http://blogs.linbit.com/p/406/raspberry-tau-cluster/
ldirectord setup: http://oss.clusterlabs.org/pipermail/pa ... 05816.html

My simple LB setup:

- 2 Raspberry Pi Model B boards are used.
- I start up the CRM via heartbeat.
- I have configured heartbeat over USB serial at 19k2 (a faster rate would probably load the CPU more) as the main HA link.
- Secondly, I use 2 old USB 10/100 Ethernet sticks + a crossover Ethernet cable as the second HA link;
  this link is used by CRM and Heartbeat v3 (a sketch of the matching ha.cf lines follows this list).
  Both links help to avoid an unknown cluster state.
- CRM manages ldirectord and the virtual IPs, cman, pacemaker, fencing and more.
- This setup requires at least 1200 mA to power both USB devices and the Pi (need more? Then add a powered hub).
- I have an old, used APC power switch for stonith to shoot (power off/reset) the failed Pi via SNMP if required.
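
A sketch of what the matching /etc/ha.d/ha.cf could look like (the serial device, node names and the addresses on the crossover link are placeholders, not copied from my boxes):

Code: Select all
# /etc/ha.d/ha.cf - sketch only
# main HA link: USB serial adapter at 19k2
serial /dev/ttyUSB0
baud 19200
# second HA link: crossover cable between the USB ethernet sticks
ucast eth1 10.0.0.2
node lb1 lb2
auto_failback off
# hand resource management over to the CRM (pacemaker)
crm respawn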

Code: Select all
 crm configure primitive MAILIP ocf:heartbeat:IPaddr2 params ip="192.168.1.31" nic="eth0" \
 op monitor interval="10s" meta is-managed="true"
 crm configure primitive SQLIP ocf:heartbeat:IPaddr2 params ip="192.168.1.42" nic="eth0" \
 op monitor interval="10s" meta is-managed="true"
 crm configure primitive WEBIP ocf:heartbeat:IPaddr2 params ip="192.168.1.40" nic="eth0" \
  op monitor interval="10s" meta is-managed="true"


Code: Select all
 crm configure primitive ldirectord ocf:heartbeat:ldirectord  params configfile="/etc/ha.d/ldirectord.cf" \
     op monitor interval="2m" timeout="20s" op start interval="0" timeout="90s" \
    op stop interval="0" timeout="100s" meta target-role="Started"

You may need to create a group containing at least one IP and the ldirectord resource together to get the failover to work.

Code: Select all
 crm configure group LVS  WEBIP ldirectord  meta target-role="Started"


The load balancing is then managed by ldirectord in the same way as in the Heartbeat v2 samples on the Internet.
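
As a rough idea of what the configfile referenced above can contain (the real-server addresses and the HTTP check are placeholders, not my production values):

Code: Select all
# /etc/ha.d/ldirectord.cf - sketch only
checktimeout=10
checkinterval=15
autoreload=yes
quiescent=yes

# traffic arriving on the WEBIP virtual IP is balanced across two real web servers
virtual=192.168.1.40:80
        real=192.168.1.51:80 gate
        real=192.168.1.52:80 gate
        service=http
        request="index.html"
        receive="OK"
        scheduler=rr
        protocol=tcp
        checktype=negotiate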

Code: Select all
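 # ocf:heartbeat:anything simply runs my own fake_ndb_* wrapper scripts (contents not shown here)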
  crm configure primitive mgmd ocf:heartbeat:anything \
        params binfile="/usr/local/bin/fake_ndb_mgmd" pidfile="/var/run/heartbeat/fake_ndb_mgmd.pid"

 crm configure primitive ndbcluster ocf:heartbeat:anything \
        params binfile="/usr/local/bin/fake_ndb_cluster_start" \
        pidfile="/var/run/heartbeat/fake_ndb_cluster_start.pid"

 crm configure primitive ndbd1-IP ocf:heartbeat:anything \
        params binfile="/usr/local/bin/fake_ndbd" pidfile="/var/run/heartbeat/fake_ndbd.pid"

 crm configure    clone ndbdclone ndbd1-IP \
        meta globally-unique="false" clone-max="2" clone-node-max="1"

 crm configure location loc-1 mgmd inf: NDB_MGMD_IP
 crm configure location loc-2 ndbcluster inf: NDB_MGMD_IP
 crm configure location loc-3 ndbdclone inf: dbndb1
 crm configure location loc-4 ndbdclone inf: dbndb2

 crm configure xml <rsc_order id="order-1"> \
        <resource_set id="ordered-set-1" sequential="true"> \
                <resource_ref id="mgmd"/> \
                <resource_ref id="ndbdclone"/> \
                <resource_ref id="ndbcluster"/> \
        </resource_set> \
</rsc_order>


Last edited by remsnet on Tue Apr 16, 2013 4:06 pm, edited 9 times in total.
Posts: 151
Joined: Wed Dec 19, 2012 7:32 pm
Location: Planet Gaia
by remsnet » Tue Apr 16, 2013 11:11 am
ChristopheDupriez wrote:I
..
Thanks! Christophe


Well Christophe,

MySQL Cluster 7.2 works on the Pi Model B, for example:

ndb_mgm> show
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=3 @192.168.2.91 (mysql-5.5.29 ndb-7.2.10, Nodegroup: 0)
id=4 @192.168.2.92 (mysql-5.5.29 ndb-7.2.10, Nodegroup: 0, Master)

[ndb_mgmd(MGM)] 1 node(s)
id=1 @192.168.2.92 (mysql-5.5.29 ndb-7.2.10)

[mysqld(API)] 5 node(s)
id=50 @192.168.2.91 (mysql-5.5.29 ndb-7.2.10)
id=51 @192.168.2.92 (mysql-5.5.29 ndb-7.2.10)
id=52 (not connected, accepting connect from any host)
id=53 (not connected, accepting connect from any host)
id=54 (not connected, accepting connect from any host)

# uname -a
Linux dbndb1 3.8.4+ #406 PREEMPT Mon Apr 8 21:59:50 BST 2013 armv6l GNU/Linux

# w
13:13:20 up 6 days, 14:12, 1 user,
load average: 4,88, 2,29, 0,92
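
For anyone wanting to reproduce that topology, a config.ini sketch matching the ndb_mgm output above (the memory limits are assumptions chosen to fit in the Pi's RAM, not my exact values):

Code: Select all
# config.ini - sketch matching the node layout above
[ndbd default]
NoOfReplicas=2
DataMemory=32M
IndexMemory=8M

[ndb_mgmd]
NodeId=1
HostName=192.168.2.92

[ndbd]
NodeId=3
HostName=192.168.2.91

[ndbd]
NodeId=4
HostName=192.168.2.92

[mysqld]
NodeId=50
HostName=192.168.2.91

[mysqld]
NodeId=51
HostName=192.168.2.92

# three more API slots that accept connections from any host
[mysqld]
[mysqld]
[mysqld]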




Together with Corosync + ldirectord, the load should not be a problem.

Regards.
Posts: 151
Joined: Wed Dec 19, 2012 7:32 pm
Location: Planet Gaia