Cluster, Part 8: The Shoulders of Giants | NFS, Nomad, Docker Registry | Stardust

Cluster, Part 8: The Shoulders of Giants | NFS, Nomad, Docker Registry

Welcome back! It's been a bit of a while, but now I'm back with the next part of my cluster series. As a refresher, here's a list of all the parts in the series so far:

In this one, we're going to look at running our first job on our Nomad cluster! If you haven't read the previous posts in this series, you'll probably want to go back and read them now, as we're going to be building on the infrastructure we've setup and the groundwork we've laid in the previous posts in this series.

Before we get to that though, we need to sort out shared storage - as we don't know which node in the cluster tasks will be running on. In my case, I'll be setting up NFS. This is hardly the only solution to the issue though - other options include:

Gluster
Ceph

If you're going to choose NFS like me though, you should be warned that it's neither encrypted not authenticated. You should ensure that NFS is only run on a trusted network. If you don't have a trusted network, use the WireGuard Mesh VPN trick in part 4 of this series.

NFS: Server

Setting up a server is relatively easy. Simply install the relevant package:

sudo apt install nfs-kernel-server

....edit /etc/exports to look something like this:

/mnt/somedrive/subdirectory 10.1.2.0/24(rw,async,no_subtree_check)

/mnt/somedrive/subdirectory is the directory you'd like clients to be able to access, and 10.1.2.0/24 is the IP range that should be allowed to talk to your NFS server.

Next, open up the relevant ports in your firewall (I use UFW):

sudo ufw allow nfs

....and you're done! Pretty easy, right? Don't worry, it'll get harder later on :P

NFS: Client

The client, in theory, is relatively straightforward too. This must be done on all nodes in the cluster - except the node that's acting as the NFS server (although having the NFS server as a regular node in the cluster is probably a bad idea). First, install the relevant package:

sudo apt install nfs-common

Then, update /etc/fstab and add the following line:

10.1.2.10:/mnt/somedrive/subdirectory   /mnt/shared nfs auto,nofail,noatime,intr,tcp,bg,_netdev 0   0

Again, 10.1.2.10 is the IP of the NFS server, and /mnt/somedrive/subdirectory must match the directory exported by the server. Finally, /mnt/shared is the location that we're going to mount the directory from the NFS server to. Speaking of, we should create that directory:

sudo mkdir /mnt/shared

I have yet to properly tune the options there on both the client and the server. If I find that I have to change anything here, I'll both come back and edit this and mention it in a future post that I did.

From here, you should be able to mount the NFS share like so:

sudo mount /mnt/shared

You should see the files from the NFS server located in /mnt/shared. You should check to make sure that this auto-mounts it on boot too (that's what the auto and _netdev are supposed to do).

If you experience issues on boot (like me), you might see something like this buried in /var/log/syslog:

mount[586]: mount.nfs: Network is unreachable

....then we can quickly hack this by creating a script in the directory /etc/network/if-up.d. It should read something like this should fix the issue:

#!/usr/bin/env bash
mount /mnt/shared

Save this to /etc/network/if-up.d/cluster-shared-nfs for example, not forgetting to mark it as executable:

sudo chmod +x /etc/network/if-up.d/cluster-shared-nfs

Alternatively, there's autofs that can do this more intelligently if you prefer.

First Nomad Job: Docker Registry

Now that we've got shared storage online, it's time for the big moment. We're finally going to start our very first job on our Nomad cluster!

It's going to be a Docker registry, and in my very specific case I'm going to be marking it as insecure (gasp!) because it's only going to be accessible from the WireGuard VPN - which I figure provides the encryption and authentication for us to get started reasonably simply without jumping through too many hoops. In the future, I'll probably revisit this in a later post to tighten things up.

Tasks on a Nomad cluster take the form of a Nomad job file. These can written in JSON or HCL (Hashicorp Configuration Language). I'll be using HCL here, because it's easier to read and we're not after machine legibility yet at this stage.

Nomad job files work a little bit like Nginx config files, in that they have nested sequences of blocks in a hierarchical structure. They loosely follow the following pattern:

job > group > task

The job is the top-level block that contains everything else. tasks are the items that actually run on the cluster - e.g. a Docker container. groups are a way to logically group tasks in a job, and are not required as far as I can tell (but we'll use one here anyway just for illustrative purposes). Let's start with the job spec:

job "registry" {
    datacenters = ["dc1"]
    # The Docker registry *is* pretty important....
    priority = 80

    # If this task was a regular task, we'd use a constraint here instead & set the weight to -100
    affinity {
        attribute   = "${attr.class}"
        value       = "controller"
        weight      = 100
    }

    # .....

}

This defines a new job called registry, and it should be pretty straight forward. We don't need to worry about the datacenters definition there, because we've only got the 1 (so far?). We set a priority of 80, and get the job to prefer running on nodes with the controller class (though I observe that this hasn't actually made much of a difference to Nomad's scheduling algorithm at all).

Let's move on to the real meat of the job file: the task definition!

group "main" {
    task "registry" {
        driver = "docker"

        config {
            image = "registry:2"
            labels { group = "registry" }

            volumes = [
                "/mnt/shared/registry:/var/lib/registry"
            ]

            port_map {
                registry = 5000
            }
        }

        resources {
            network {
                port "registry" {
                    static = 5000
                }
            }
        }

        # .......
    }
}

There's quite a bit to unpack here. The task itself uses the Docker driver, which tells Nomad to run a Docker container.

In the config block, we define the Docker driver-specific settings. The docker image we're going to run is registry:2 where registry is the image name, and 2 is the tag. This will to automatically pulled from the Docker hub. Future tasks will pull docker images from our very own private Docker registry, which we're in the process of setting up :D

We also mount a directory into the Docker container to allow it to persist the images that we push to it. This is done through a volume, which is the Docker word for bind-mounting a specific directory on the host system into a given location inside the guest container. For me I'm (currently) going to store the Docker registry data at /mnt/shared/registry - you should update this if you want to store it elsewhere. Remember this this needs to be a location on your shared storage, as we don't know which node in the cluster the Docker registry is going to run on in advance.

The port_map allows us to tell Nomad the port(s) that our service inside the Docker container listens on, and attach a logical name to them. We can then expose them in the resources block. In this specific case, I'm forcing Nomad to statically allocate port 5000 on the host system to point to port 5000 inside the container, for reasons that will become apparent later. This is done with the static keyword there. If we didn't do this, Nomad would allocate a random port number (which is normally what we'd want, because then we can run lots of copies of the same thing at the same time on the same host).

The last block we need to add to complete the job spec file is the service block. with a service block, Nomad will inform Consul that a new service is running, which will then in turn allow us to query it via DNS.

service {
    name = "${TASK}"
    tags = [ "infrastructure" ]

    address_mode = "host"
    port = "registry"
    check {
        type        = "tcp"
        port        = "registry"
        interval    = "10s"
        timeout     = "3s"
    }

}

The service name here is pulled from the name of the task. We tell Consul about the port number by specifying the logical name we assigned to it earlier.

Finally, we add a health check, to allow Consul to keep an eye on the health of our Docker registry for us. This will appear as a green tick if all is well in the web interface, which we'll be getting to in a future post. The health check in question simply ensures that the Docker registry is listening via TCP on the port it should be.

Here's the completed job file:

job "registry" {
    datacenters = ["dc1"]
    # The Docker registry *is* pretty important....
    priority = 80

    # If this task was a regular task, we'd use a constraint here instead & set the weight to -100
    affinity {
        attribute   = "${attr.class}"
        value       = "controller"
        weight      = 100
    }

    group "main" {

        task "registry" {
            driver = "docker"

            config {
                image = "registry:2"
                labels { group = "registry" }

                volumes = [
                    "/mnt/shared/registry:/var/lib/registry"
                ]

                port_map {
                    registry = 5000
                }
            }

            resources {
                network {
                    port "registry" {
                        static = 5000
                    }
                }
            }

            service {
                name = "${TASK}"
                tags = [ "infrastructure" ]

                address_mode = "host"
                port = "registry"
                check {
                    type        = "tcp"
                    port        = "registry"
                    interval    = "10s"
                    timeout     = "3s"
                }

            }
        }

        // task "registry-web" {
        //  driver = "docker"
        // 
        //  config {
        //      // We're going to have to build our own - the Docker image on the Docker Hub is amd64 only :-/
        //      // See https://github.com/Joxit/docker-registry-ui
        //      image = ""
        //  }
        // }
    }
}

Save this to a file, and then run it on the cluster like so:

nomad job run path/to/job/file.nomad

I'm as of yet unsure as to whether Nomad needs the file to persist on disk to avoid it getting confused - so it's probably best to keep your job files in a permanent place on disk to avoid issues.

Give Nomad to start the job, and then you can check on it's status like so:

nomad job status

This will print a summary of the status of all jobs on the cluster. To get detailed information about our new job, do this:

nomad job status registry

It should show that 1 task is running, like this:

ID            = registry
Name          = registry
Submit Date   = 2020-04-26T01:23:37+01:00
Type          = service
Priority      = 80
Datacenters   = dc1
Namespace     = default
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
main        0       0         1        5       6         1

Latest Deployment
ID          = ZZZZZZZZ
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
main        1        1       1        0          2020-06-17T22:03:58+01:00

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created   Modified
XXXXXXXX  YYYYYYYY  main        4        run      running  6d2h ago  2d23h ago

Ignore the Failed, Complete, and Lost there in my output - I ran into some snags while learning the system and setting mine up :P

You should also be able to resolve the IP of your Docker registry via DNS:

dig +short registry.service.mooncarrot.space

mooncarrot.space is the root domain I've bought for my cluster. I highly recommend you do the same if you haven't already. Consul exposes all services under the service subdomain, so in the future you should be able to resolve the IP of all your services in the same way: service_name.service.DOMAIN_ROOT.

Take care to ensure that it's showing the right IP address here. In my case, it should be the IP address of the wgoverlay network interface. If it's showing the wrong IP address, you may need to carefully check the configuration of both Nomad and Consul. Specifically, start by checking the network_interface setting in the client block of your Nomad worker nodes from part 7 of this series.

Conclusion

We're getting there, slowly but surely. Today we've setup shared storage with NFS, and started our first Nomad job. In doing so, we've started to kick the tyres of everything we've installed so far:

wesher, our WireGuard Mesh VPN
Unbound, our DNS server
Consul, our service discovery superglue
Nomad, our task scheduler

Truly, we are standing on the shoulders of giants: a whole host of open-source software that thousands of people from across the globe have collaborated together to produce which makes this all possible.

Moving forwards, we're going to be putting that Docker registry to good use. More immediately, we're going to be setting up Fabio (who's documentation is only marginally better than Traefik's, but just good enough that I could figure out how to use it....) in order to take a peek at those cool web interfaces for Nomad and Consul that I keep talking about.

We're also going to be looking at setting up Vault for secret (and certificate, if all goes well) management.

Until then, happy cluster configuration! If you're confused about anything so far, please leave a comment below. If you've got a suggestion to make it even better, please comment also! I'd love to know.

Sources and further reading

Alternatives to NFS
- Gluster
- Ceph
/etc/network/if-up.d/ NFS automount trick
autofs
Docker Registry
- Deploy a registry server
Nomad
- Docker driver

Type this	To get this	Notes
`bold text`	bold text	-
`_italics text_`	italics text	-
`~~deleted text~~`	~~deleted~~	-
`code text`	`code text`	Inserts some monospaced code. It is preferred that large blocks of code are linked to using a service such as Pastebin, Github Gists or Ideone.
`> Quote`	Quote	-
`[display text](//google.com)`	display text	Inserts a hyperlink. Please use responsibly. `[rel=nofollow]` is in use and spam will be deleted.
`---`		Inserts a horizontal line. The previous line must be blank.

Stardust
Blog