atmosteam wrote: Hello,
I am designing a system which has a Raspberry Pi, but I want redundancy in case of failure. If possible, I want active redundancy, meaning that if one Raspberry Pi fails, the other one is turned on and debugs the corresponding program by itself.
I am asking whether you know if it is possible to implement something like this that works automatically.
I want to know if it is possible to have the unused Raspberry Pi not powered off, but in an idle mode with low consumption, and to wake it up with an input signal.

High Availability (or clustering in Windows parlance) is a non-starter for the Pi, as the cost of the supporting hardware infrastructure will not be worth it. For true HA (and I've worked on many), you will need at least a heartbeat network (preferably low-level), shared drives (probably over iSCSI if you can get it to work), and at least three nodes to provide a 'quorum'.
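To make the heartbeat idea concrete, here is a minimal sketch of what the standby Pi could run: a plain UDP heartbeat, with the standby taking over when the primary goes quiet. The port, timeout, and takeover() function are illustrative placeholders, not a tested design.

Code: Select all
# Hypothetical standby-side heartbeat listener (Python 3). Assumes the
# primary sends a small UDP datagram to PORT once a second.
import socket

PORT = 9999        # assumed heartbeat port
TIMEOUT = 3.0      # seconds of silence before we declare the primary dead

def takeover():
    print("Primary silent - taking over")   # placeholder: start the workload

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", PORT))
sock.settimeout(TIMEOUT)

while True:
    try:
        sock.recv(64)          # a heartbeat arrived; primary is alive
    except socket.timeout:
        takeover()
        break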
Heater wrote: asandford,
I'm going to disagree with everything you have said there. Just for the sake of argument, no hard feelings.

Fair enough, but everything I've said was taught at the week-long clustering course I took at Veritas in Reading a few (many) years ago, and it was still true for the Windows cluster that I configured today.
The acid test is to pull the mains lead(s) from your production server...

A funny story:
Code: Select all
try:
    code()        # run the normal application logic
except Exception:
    more_code()   # fall back to the recovery path on any error
atmosteam wrote: This redundancy is not a super redundancy that makes the system operative 99.99999% of the time; it is very simple.

Your application can only be down for a maximum of 32 seconds per year?
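For the record, the arithmetic is easy to check; a throwaway snippet like the one below shows that roughly 32 seconds a year corresponds to 99.9999% availability, while the 99.99999% quoted above would allow barely 3 seconds.

Code: Select all
# Permitted downtime per year for a given availability figure.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

for availability in (99.99, 99.999, 99.9999, 99.99999):
    downtime = SECONDS_PER_YEAR * (1 - availability / 100)
    print(f"{availability}% -> {downtime:.1f} seconds/year")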
Heater wrote: Shared disks sound like a disaster waiting to happen. Once a node has gone faulty and written crap to the disk, it can then die, or be killed, and the next node can inherit the crap data. Brilliant!

It's a pretty standard way of working - WFCS, for example, is only designed to protect against node failure; Availability Groups do the shared-nothing approach. Even where you have shared nothing, with replication keeping the disks in sync, you can corrupt your data and replicate the corruption. There are ways to mitigate that, for example by putting a delay on the replication, but each extra system you put in place to cover yourself must be weighed against your actual business requirement to decide whether it's worthwhile.

People frequently go in for overkill when in reality they could withstand a couple of hours of downtime while a well-rehearsed recovery and restore takes place from backups. That's actually quite an easy process if you have VM image backups and frequent backups such as transaction log backups, because you can recover to the point just before the failure or error. In practice it's a rare thing to actually need, but it does happen, like fire insurance.
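The "delay on the replication" mitigation can be pictured as a replica that deliberately lags the primary; nothing here is a real replication protocol, just an illustrative sketch in which apply_to_replica() and the five-minute lag are made up. The point is that a bad write can be stopped any time before tick() reaches it.

Code: Select all
# Illustrative delayed-apply replica: hold every change for LAG seconds
# before applying it, so replication can be halted after a bad write
# reaches the primary but before it reaches the replica.
import time
from collections import deque

LAG = 300          # assumed five-minute safety window
pending = deque()  # queue of (apply_at, change) pairs, oldest first

def apply_to_replica(change):
    print("applied:", change)      # placeholder for the real apply step

def receive(change):
    pending.append((time.time() + LAG, change))

def tick():
    # Called periodically; applies only changes older than the lag window.
    while pending and pending[0][0] <= time.time():
        _, change = pending.popleft()
        apply_to_replica(change)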
adlambert wrote: Here's your horror scenario:
An employee with too much access deletes important data at 9:30 AM and either doesn't realise or is too scared to say anything.
The business continues to operate and millions of transactions go through the system, each transaction having had its value influenced by the previously deleted data.
The problem is detected at 5 PM (it's a Friday, of course).

I had one of those 25 years ago.
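That horror scenario is exactly what the point-in-time recovery mentioned earlier is for: restore the last full backup, then replay the transaction log only up to just before 9:30 AM. A toy version of the replay step, with an invented log format, looks something like this:

Code: Select all
# Toy point-in-time replay: apply logged changes committed strictly
# before the accidental delete, then stop. The log format is invented.
from datetime import datetime

STOP_AT = datetime(2014, 6, 13, 9, 29, 59)   # just before the 9:30 AM delete

def replay(log_entries, apply):
    for committed_at, change in log_entries:  # entries in commit order
        if committed_at >= STOP_AT:
            break                             # everything later is discarded
        apply(change)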
DougieLawson wrote: HA works well for hardware problems. It's next to useless for software errors and about as useful as an ashtray on a motorbike for human errors.

It also works well for application failures (a crash, not garbage data).
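For the crash case, the usual trick is a supervisor that simply restarts the failed process. The sketch below assumes a hypothetical my_app binary; it is the same idea that systemd's Restart=on-failure or a cluster resource agent gives you for free.

Code: Select all
# Minimal crash supervisor: restart the application whenever it exits
# with a non-zero status. "my_app" is a hypothetical binary.
import subprocess
import time

while True:
    status = subprocess.call(["./my_app"])
    if status == 0:
        break                   # clean exit: stop supervising
    print(f"my_app died with status {status}, restarting")
    time.sleep(1)               # brief back-off before restarting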
Long day, CBA to split your post, so here we go:

Heater wrote: asandford,
Yes, that went very wrong, and not just in the formatting. Let's see:
I'm sure Google, Facebook, and co. know about VCS, Windows Clustering and such. However, I think you will find they don't run their businesses on Windows. Any evidence to the contrary?
We are indeed talking about fault tolerant systems. It's right there in the opening post.
Fault-tolerant systems need not cost millions. It all depends on the functionality of the system, the performance, and the level of reliability required. You could build a fault-tolerant, multiply-redundant LED flasher with four Arduinos! A network of four Raspberry Pis will let you build a fault-tolerant data store; I have one running etcd. You can build large-scale fault-tolerant databases with cheap PCs and open-source databases. You can do all that on systems distributed around the world for a few dollars a month using the services of Google, AWS, etc.
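As a taste of what such an etcd cluster gives you, here is roughly what client access looks like; this assumes the third-party python-etcd3 package and a cluster member reachable at "pi1", both of which are illustrative stand-ins.

Code: Select all
# Sketch of talking to an etcd cluster with the python-etcd3 package.
# Writes are replicated to a quorum, so a three-node cluster keeps
# working after losing one node. "pi1" is a hypothetical host name.
import etcd3

etcd = etcd3.client(host="pi1", port=2379)
etcd.put("/demo/flag", "on")          # acknowledged once a quorum has it
value, meta = etcd.get("/demo/flag")  # value comes back as bytes
print(value.decode())                 # -> "on"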
Yes, one can predict component failure and take pre-emptive action, but you are crazy to rely on that, as it defies Murphy's Law!
Depending on your setup, there may well be a stall when a node goes down. Some database systems have a "master" and a bunch of "slaves"; perhaps all writes go through the master first, and if that goes down it may take a while for the cluster to elect a new master.
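The stall comes from failure detection plus election time. A crude illustration of the idea, loosely in the spirit of Raft's randomized election timeouts rather than an implementation of it, with made-up node names and timeout range:

Code: Select all
# Crude illustration of failover delay: surviving nodes each wait a
# randomized timeout after losing the master's heartbeat, and the first
# to time out claims the role. Writes stall for roughly that long.
import random

ELECTION_TIMEOUT = (1.5, 3.0)   # assumed timeout range, seconds

def elect(surviving_nodes):
    timeouts = {n: random.uniform(*ELECTION_TIMEOUT) for n in surviving_nodes}
    new_master = min(timeouts, key=timeouts.get)   # first to time out wins
    return new_master, timeouts[new_master]

master, stall = elect(["node2", "node3"])          # node1 just died
print(f"{master} takes over after ~{stall:.1f}s of stalled writes")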
If you have multiple nodes and hence copies of your data, what is the point of sharing a disk between them? Please elaborate.
For sure, corrupt data on disk is not solely an application fault. There is plenty of hardware between the app and the disk that can fail and corrupt things silently.
The famous paper on the Byzantine Generals problem is here: [url]http://research.microsoft.com/en-us/um/ ... yz.pdf[/url]
Three nodes may well provide consensus most of the time. A three-node cluster does, however, have failure modes that can confuse it and prevent it from ever reaching consensus.
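The arithmetic behind those last two points is standard: a majority quorum of n nodes survives floor((n-1)/2) crash faults, but the Byzantine Generals result of Lamport, Shostak, and Pease needs n >= 3f + 1 to survive f traitorous nodes, so three nodes cannot tolerate even one Byzantine fault. A quick check:

Code: Select all
# Crash faults: a majority quorum of n nodes tolerates floor((n-1)/2).
# Byzantine faults: the Lamport/Shostak/Pease bound needs n >= 3f + 1.
for n in (3, 4, 5, 7):
    crash_faults = (n - 1) // 2
    byzantine_faults = (n - 1) // 3
    print(f"n={n}: tolerates {crash_faults} crash, {byzantine_faults} Byzantine")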