Sat Jul 18, 2020 5:42 am
I set up a cluster once upon a time and wound up not needing it, so my experience is pretty limited. I used the corosync and pacemaker stack. I'm sure you can set up one big cluster and designate this and that subset for a given task. But the way you asked the question suggests a misunderstanding. What's the point of having one cluster if the tasks are split into two non-interacting subsets? That's basically the same thing as two separate clusters, right? On the other hand, it is true that the corosync and pacemaker stack will let you designate which tasks go to which machines and you could, if you wanted, allow machines from one subset to take over for the other subset in a failover. So I'm not sure I'm addressing your question.
As a pure aside, I struggled, stupidly as it turns out, to get DRBD to work on my Pi cluster. I was using an SD card mounted through the USB ports, but it turns out that won't work, or at least it wouldn't back in the day. IIRC there was an issue with SD cards not playing nice with the DRBD software. Would it work with hard disks or SSDs? Dunno, and maybe that's changed by now.
If you're going for blazing speed, a Pi cluster isn't going to get you there. So is this a load balancing or a high availability cluster? I'm pretty darn sure that if a machine in the cluster fails while running a massive process, that process is dead: another machine will take over, but the process starts all over. (Ah, I'm not familiar with Minecraft, it sounds like a game.) So I'm personally skeptical about using a cluster for a compute-intensive game. If a machine crashes an hour into a game then, depending on how the game is written, I would expect that game to be vaporized. Though I'm not a gamer, so maybe every game is already capable of saving its state as it runs. Still, I have trouble imagining a Pi cluster as anything other than high availability. To get speed on a big computational problem you need to be able to send numbers around pretty fast. Even gigabit Ethernet has a theoretical max of around 15.6M doubles/sec (125 MB/s divided by 8 bytes per double), less once you count protocol overhead, and that's just not fast enough (IMHO).
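For what it's worth, the gigabit arithmetic can be sketched in a few lines of Python. The raw figure is just the nominal line rate divided by 64 bits per double. The 100 MB/s "usable" figure is my own assumption as a rough allowance for protocol overhead, not a measurement, and the names are made up for illustration.

```python
# Back-of-the-envelope: how many 8-byte doubles per second fit through gigabit Ethernet?

LINK_BITS_PER_SEC = 1_000_000_000            # nominal gigabit line rate
BITS_PER_DOUBLE = 64                         # IEEE 754 double precision
ASSUMED_USABLE_BYTES_PER_SEC = 100_000_000   # ASSUMPTION: rough payload rate after overhead


def doubles_per_sec(bits_per_sec, bits_per_value=BITS_PER_DOUBLE):
    """Return how many values of the given width fit through the link each second."""
    return bits_per_sec / bits_per_value


raw = doubles_per_sec(LINK_BITS_PER_SEC)
usable = doubles_per_sec(ASSUMED_USABLE_BYTES_PER_SEC * 8)

print(f"raw line rate:  {raw / 1e6:.3f}M doubles/sec")
print(f"assumed usable: {usable / 1e6:.1f}M doubles/sec")
```

Either way you land in the low tens of millions of doubles per second between nodes, which is the ceiling I was grumbling about.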
My cluster was running a DNS server, and I only wanted to gain experience with running a cluster. I never intended to run the Pis in production. If DNS crashes during a lookup and it fails over, chances are you only lose whatever was happening at the exact moment of the crash.
j.