Cluster, Part 2: Grand Designs
In the last part of this series, I talked about my plans for building an ARM-based cluster, because I'm growing out of the Raspberry Pi 3B+ I currently have at home. Since then, I have decided to focus on the compute cluster first, as I have a reasonable amount of room left on the 1tB WD Pidrive I have attached to my existing Raspberry Pi 3B+.
To this end, I have been busy ordering parts and organising things to get construction of the compute cluster side of things going. The most important part of the whole cluster is the compute boards themselves. I've decided to go with 4 x Raspberry Pi 4s with 4GB RAM each for the worker nodes, and 1 x Raspberry Pi 4 with 2GB of RAM as the controller (it would have been a 1GB RAM model, but a recent announcement changed my mind :D):
(Above: The Raspberry Pi 4s I'm going to be using. The colourful heatsink cases there are to dissipate heat passively if possible and reduce the need for the fan to run as often. The one with the smaller red heatsink is the controller node - I don't anticipate the load on that node being high enough to need a bigger more expensive heatsink)
My reasoning for Raspberry Pis is software support. They are hugely popular - and from experience I can tell that they are pretty well supported on the software side of things. Issues with hardware features not being supported by the operating system are minimal - and where issues do arise they are more often than not sorted out. Regular kernel security updates are also provided - something that isn't always a thing with Linux distributions for other boards I've noticed.
Although the nodes in the cluster are very important, they are far from the only component I'll need. I'll also need a way to power it - which I've settled on an using a desktop ATX power supply (generously donated by University).
(Above: The ATX power supply, with a few wires cut and other bits and bobs attached. As of this blog post I'm in the middle of wiring it up, so I haven't finished it yet)
This adds some additional complications though, because wiring an ATX power supply up to a fleet of Raspberry Pi 4s isn't as easy as it sounds. To do that, I've decided to wire the 5V and ground wires up to 5 USB type-a breakout boards, with a 3 amp self-resettable fuse on each live (red) wire. Then I can use 5 short type-a to type-c converter cables to power the Raspberry Pi 4s.
(Above: The extra bits and bobs laid out that I'll be using to wire the ATX power supply up to the USB type-a breakout boards. From left to right: 3A self-resettable fuses, 18 AWG wire, Wagos, header pins, and finally the USB type-a breakout boards themselves)
With power to the Raspberry Pis, the core compute hardware is in place. I still need a bunch of things around the edges though, such as a (very quiet) fan to keep it cool:
(Above: A Noctua NF-P14s redux-1200)
I found this particular fan on quietpc.com. While their prices and shipping are somewhat expensive (I didn't actually buy it from there - I got a better deal on Amazon instead), they are a great place to look into the different options available for really quiet fans. I'm pretty sensitive to noise, so having a quiet fan is an important part of my cluster design.
This one is the large 14cm model, so that it fits in front of all 5 Raspberry Pis if they are stood up on their sides and stacked horizontally. It takes 12 volts, so I'll be connecting it to the 12V rail from the ATX power supply. The fan speed is also controllable via PWM (pulse-width modulation), so I plan on using an Arduino (probably one of the Arduino Unos I've got lying around) to control it and present a serial interface or something to the Raspberry Pi that's acting as the controller node in the cluster.
Lastly, another extremely important part of any cluster is a solid switch. Without a great switch at the base of the network, you'll have all sorts of connection issues and the performance of the cluster will be degraded significantly. I'm anticipating that I'll want to transfer significant amounts of data around very quickly (e.g. Docker container images, and later large blocks of data during a storage cluster rebalance).
For this reason, I've bought myself a Netgear GS116v2. While its unmanaged, I can't currently afford a more expensive managed switch at this time. It is however gigabit and also has an array of other features such as energy efficient ethernet (802.3az), full duplex gigabit (i.e. 32GB bandwidth available to all ports, which is enough for all ports to be transmitting and receiving gigabit at the same time), and a silent fanless design.
(Above: The switch I'll be using. I watched eBay and got it used for much less than it's available new)
Hardware isn't the only thing I've been thinking about. While I've been waiting for packages to arrive, I've also been planning out the software I'm going to use and how I'm going to network all my Pis together.
My plans on the networking side of things are subject to significant change depending on how many responsibilities I can convince my home router to give up, but I have drawn up a network diagram showing what I'm currently aiming towards:
The cluster is represented on the left half of the diagram. This will probably entail some considerable persuasion of my router to pull off, but a quick look reveals that it's (probably) possible with some trial-and-error.
The idea is that I have a separate subnet for the cluster than the rest of the home network. Then I can do strange stuff and fiddle with it (hopefully) without affecting everyone else on the network.
Meanwhile, out of all the different aspects of building this cluster I've got the clearest picture as to the software I'm going to be using.
I've decided that I'm going to use a container-based system. I've looked at a number of different options (such as podman and Singularity) - but I'm currently of the opinion that Docker is the most suitable option for what I'm going for. It's not as enterprisey as Singularity, and it seems to be more mature than podman. It also has a huge library of prebuilt containers too - but for learning purposes I'm going to be writing almost all my container scripts from scratch - probably using some sort of Alpine Linux container as a base. If I ever run into a situation where Docker isn't suitable and I need something closer to a VM, I'll probably use LXC, which I believe sits on top of the same underlying container runtime that Docker does.
I'm anticipating that container-based tech is going to be great for managing the stuff that's running on my cluster - so you can expect more posts that go into some depth about how it all works and how I'm setting my system up in the future.
To complement my container-based tech, I'm also going to be using a workload orchestrator. The Viper High-Performance Computer I've recently gained access to has lots of nodes in it and uses Slurm for workload orchestration, but that seems more geared towards environments that have lots of jobs that each have a defined running time. Great for scientific simulations and other such things, but not so great for personal self-hosted applications and the like.
Instead, I'm probably going to use Nomad. It looks seriously cool, and an initial look at the documentation reveals that it's probably going to be much simpler easier to understand than Kubernetes (see also), which seems to be the other competing software in the business. It also seems to integrate well with other programs done by the same company (Hashicorp) like Consul for service networking management (I'm hoping I can get DNS resolution for the services running on the cluster under control with it) and Vault for secret management (e.g. API keys, passwords, and other miscellaneous secrets) - all of which I'm going to install and experiment with (expect more on that soon).
All of those for now will be backed by an NFS share on all nodes in the cluster for the persistent volumes attached to running containers.
On the controller node I mentioned earlier I'm also going to be running a few extra items to aid in the management of the cluster:
- A Docker registry, from which the worker nodes will be pulling containers for execution (worker nodes will not have access to the public Docker registry at hub.docker.com)
- An apt caching proxy - probably
apt-cacher-ng. Since all the nodes in the cluster are going to be using the same OS, have the same packages installed, and the same configuration settings etc, it doesn't make much sense for them to be downloading apt packages from the Internet every time - so I'll be caching them locally on the controller node
- Potentially some sort of reverse proxy that sits in front of all the services running on the cluster, but I haven't decided on how this will fit into the larger puzzle just yet (more research is required). I'm already very familiar with Nginx, but I've seen Traefik recommended for dynamic container-based setups, so I'm going to investigate that too.
That about covers my high-level design ideas. As of the time of typing, the next thing I need to do is organise a case for it all to go in, fix the loose connections in the screw terminals (not pictured; they arrived after I took the pictures), and then find a place to put it....