Starbeamrainbowlabs


Saving power in Linux Systems

Hey there! It's an impromptu blog post. Originally I wrote this in response to this Reddit post, but it got rather longer than I anticipated and I ended up expanding on it just a teensy bit more and turning it into this blog post.

Saving power in a Linux system can be necessary for a number of reasons, from reducing one's electricity bill to extending battery life.

There are a number of different factors to consider to reduce power usage, which I'll be talking about in this blog post. I will be assuming a headless Linux server for the purposes of this blog post, but these suggestions can be applicable to other systems too (if there's the demand I may write a follow up specifically about Arduino and ESP-based systems, as there are a number of tricks that can be applied there that don't work the same way for a full Linux system).

Of course, power usage is highly situationally dependent, and it's all about trade-offs: less convenience, increased complexity, and so on. The suggestions below are rules of thumb that may or may not be applicable to your specific situation.

Hardware: Older hardware is less power efficient than newer hardware. So while using that 10yr old desktop as a server sounds like a great idea to reduce upfront costs, if your electricity is expensive it might be more cost-effective to buy a newer machine such as an Intel NUC or Raspberry Pi.

Even within the realms of Raspberry Pis, not every Raspberry Pi is created equal. If you need a little low-power outpost for counting cows in a field with LoRa, then something like a Raspberry Pi Zero as a base might be more suitable than a full Raspberry Pi 4B for example.

CPU architecture: Different CPU architectures have different performance / watt ratios. For example, AMD CPUs are - on the whole - more efficient than Intel CPUs as of 2021. What really matters here is the manufacturing size and density - e.g. a 7nm chip will generally be more power efficient than a 12nm or 14nm one.

ARM CPUs (e.g. Raspberry Pi and friends) are more efficient again (though the rule-of-thumb about manufacturing size & density does not hold true here). If you haven't yet bought any hardware for your next project, this is definitely worth considering.

Auto-on: Depending on your task, you might only need your device on for a short time each day. Most BIOSes will have a setting to automatically power on at a set time, so you could do this and then set the server to automatically power off when it has completed its task.

Another consideration is automatically entering standby. This can be done with the rtcwake command. While not as power efficient as turning completely off, it should still net measurable power savings.
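
As a rough sketch of the standby idea (not something from my own setup), here's how you might work out how long to sleep for and build the matching rtcwake command. The 06:30 wake time and the helper names are assumptions for illustration; -m mem (suspend to RAM) and -s (seconds to sleep) are standard rtcwake options.

```python
from datetime import datetime, timedelta


def seconds_until(wake_hour: int, wake_minute: int, now: datetime) -> int:
    """Seconds from `now` until the next occurrence of wake_hour:wake_minute."""
    target = now.replace(hour=wake_hour, minute=wake_minute, second=0, microsecond=0)
    if target <= now:
        # That time has already passed today, so wake up tomorrow instead
        target += timedelta(days=1)
    return int((target - now).total_seconds())


def rtcwake_command(seconds: int, mode: str = "mem") -> list[str]:
    """Build (but don't run) an rtcwake command that sleeps for `seconds`."""
    return ["rtcwake", "-m", mode, "-s", str(seconds)]


if __name__ == "__main__":
    # Hypothetical schedule: go to sleep now, wake at 06:30
    cmd = rtcwake_command(seconds_until(6, 30, datetime.now()))
    print("Would run (as root):", " ".join(cmd))
```

The script only prints the command, so you can check the numbers before letting anything actually suspend your server.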

Firmware: Tools such as powertop (sudo apt install powertop on Debian-based systems) can help apply a number of optimisations. In the case of powertop, don't forget to add the optimisations you choose to your /etc/rc.local to auto-apply them on boot. Example things that you can optimise using powertop include:

  • Runtime power management for WiFi / Bluetooth
  • SATA power management

Disk activity: Again situationally dependent, but if you have a lot of disks attached to your server, reducing writes can have a positive impact on power usage. Tuning this is generally done with the hdparm command (sudo apt install hdparm). See this Unix Stack Exchange question, and also this Ask Ubuntu answer for more details on how this is done.

Software: Different applications will use different amounts of system resources, which in turn will consume different amounts of power. For example, GitLab is rather resource inefficient, but Gitea is much more efficient with resources. Objectively evaluating multiple possible candidate programs that solve your given problem is important if power savings are critical to your use-case.

Measuring resource usage over time (e.g. checking the CPU Time column in htop) is probably the most effective way of measuring this, though you'd want to devise an experiment where you run each candidate program in turn for a defined length of time and measure a given set of metrics - e.g. CPU time.
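
For short-lived commands, a minimal sketch of such an experiment might look like the following. It assumes a Unix-like system (the resource module isn't available on Windows) and a command that exits on its own; a long-running server would need a different approach. The cpu_seconds name is made up for this example.

```python
import resource
import subprocess
import sys


def cpu_seconds(command: list[str]) -> float:
    """Run `command` to completion and return the CPU time (user + system) it used."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(command, check=True)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    user = after.ru_utime - before.ru_utime
    system = after.ru_stime - before.ru_stime
    return user + system


if __name__ == "__main__":
    # Stand-in workload: replace this with each candidate program in turn
    workload = [sys.executable, "-c", "sum(range(10**6))"]
    print(f"CPU time used: {cpu_seconds(workload):.3f}s")
```

Running each candidate a few times and comparing the totals should give a fairer picture than a single run.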

Measurement: Speaking of metrics, it's worth noting that while all these suggestions are interesting, you should absolutely measure the real power savings you get from implementing these suggestions. Some will give you more of a net gain for less work than others.

The best way I know of to do this is to plug your device into a power monitor like this one that I've bought previously, and then come back a given amount of time later to record the total number of watt hours of electricity used. For USB devices such as the Raspberry Pi, if I remember rightly I purchased this device a while back, and it works rather well.

This will definitively tell you whether implementing a given measure will net you a significant decrease in power usage or not, which you can then weigh against the effort required.
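
To turn a watt hour reading into something you can compare, here's a small worked example. The figures (240 Wh over 24 hours, 0.20 per kWh) are made up for illustration, so substitute your own readings and electricity price.

```python
def average_watts(watt_hours: float, hours: float) -> float:
    """Average power draw, given the energy used over a measurement period."""
    return watt_hours / hours


def yearly_cost(avg_watts: float, price_per_kwh: float) -> float:
    """Cost of running a device at `avg_watts` around the clock for a year."""
    kwh_per_year = avg_watts * 24 * 365 / 1000
    return kwh_per_year * price_per_kwh


if __name__ == "__main__":
    # Hypothetical reading: 240 Wh recorded over 24 hours
    watts = average_watts(240, 24)
    print(f"Average draw: {watts:.1f} W")
    print(f"Yearly cost at 0.20 per kWh: {yearly_cost(watts, 0.20):.2f}")
```

Doing this before and after a change gives you a number to weigh against the effort involved.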

Users and access control in the Mosquitto MQTT server

A while ago, I blogged about how to set up an MQTT server with Mosquitto. In this one, I want to talk about how to set up multiple user accounts and how to implement access control.

In this post, I'll assume that you've already followed my previous post to which I've linked above.

User accounts

User accounts are a great security measure, as they prevent anyone without a password from accessing your MQTT server. Thankfully, they are pretty easy to do too - you just need a user / password file, and a directive in the main mosquitto.conf file to get it to read from it.

First, let's create a new users file:

sudo touch /etc/mosquitto/mosquitto_users
sudo chown mosquitto:mosquitto /etc/mosquitto/mosquitto_users
sudo chmod 0640 /etc/mosquitto/mosquitto_users

Then you can create new users like this:

sudo mosquitto_passwd /etc/mosquitto/mosquitto_users new_username_1

...replacing new_username_1 with the username of the new account you want to create. Upon executing the above, it will prompt you to enter a new password. Personally I use Keepass2 for this purpose, but you can create good passwords on the command line directly too:

dd if=/dev/urandom bs=1 count=20 | base64 | tr -d '+/='
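
If you'd rather not pipe things through dd, a rough Python equivalent is sketched below. It uses the standard library's secrets module; the 24 character length and the letters-and-digits alphabet are my own assumptions rather than anything Mosquitto requires.

```python
import secrets
import string


def generate_password(length: int = 24) -> str:
    """Return a random password made up of ASCII letters and digits."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


if __name__ == "__main__":
    print(generate_password())
```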

Now that we have a users file, we can tell mosquitto about it. Add the following to your /etc/mosquitto/mosquitto.conf file:

# Require a username / password to connect
allow_anonymous false
# ....which are stored in the following file
password_file /etc/mosquitto/mosquitto_users

This disables anonymous access, and tells Mosquitto where the username / password file is.

In future if you want to delete a user, do that like this:

sudo mosquitto_passwd -D /etc/mosquitto/mosquitto_users new_username_1

Access control

Access control is similar to user accounts. First, we need an access control file - which describes who can access what - and then we need a directive in the mosquitto.conf file to tell Mosquitto about it. Let's start with that access control file. Mine is located at /etc/mosquitto/mosquitto_acls.

# Directives here affect anonymous users, but we've disabled anonymous access

user username_here
topic readwrite foo/#

user bob
topic read rockets/status

There are 2 parts to the ACL file. First, the user directive sets the current user for which any following topic directives apply.

The topic directive allows the current user to read, write, or readwrite (both at the same time) a given topic. MQTT as a protocol is built on the idea of publishing (writing) to or subscribing (reading from) topics. Mosquitto assumes that a user has no access at all unless 1 or more topic directives are present to allow access.

The topic directive is comprised of 3 parts. First, the word topic is the name of the directive.

Next, any 1 of the following words declares what kind of access is being granted:

  • read: Read-only access
  • write: Write-only access
  • readwrite: Both read and write access

Finally, the name of the topic that is being affected by the access rule is given. This may include a hash symbol (#) as a wildcard. For example, rockets/status would affect only that specific topic, but space/# would affect all topics that start with space/.

Here are some more examples:

# Allow read access to "my_app/news"
topic read my_app/news

# Allow write access to "rockets/status"
topic write rockets/status

# Allow read and write access to everything under "another_app/"
topic readwrite another_app/#
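
To make the matching rules a bit more concrete, here's a small sketch of how an ACL like the one above could be evaluated. This is an illustration of the idea only, not Mosquitto's actual implementation: the ACL dictionary and the function names are made up for this example, and the wildcard handling follows the MQTT rules for # (this level and everything below) and + (exactly 1 level).

```python
def topic_matches(pattern: str, topic: str) -> bool:
    """Check whether `topic` is covered by `pattern`, which may contain # or +."""
    pattern_parts = pattern.split("/")
    topic_parts = topic.split("/")
    for i, part in enumerate(pattern_parts):
        if part == "#":
            return True  # Matches everything from this level downwards
        if i >= len(topic_parts):
            return False  # The pattern is more specific than the topic
        if part != "+" and part != topic_parts[i]:
            return False
    return len(pattern_parts) == len(topic_parts)


# Hypothetical in-memory version of the ACL file shown earlier
ACL = {
    "username_here": [("readwrite", "foo/#")],
    "bob": [("read", "rockets/status")],
}


def is_allowed(user: str, action: str, topic: str, acl: dict = ACL) -> bool:
    """`action` is "read" or "write". No matching rule means no access."""
    for access, pattern in acl.get(user, []):
        if access in (action, "readwrite") and topic_matches(pattern, topic):
            return True
    return False


if __name__ == "__main__":
    print(is_allowed("bob", "read", "rockets/status"))
    print(is_allowed("bob", "write", "rockets/status"))
```

Note the default-deny at the end of is_allowed, which mirrors the way a user has no access at all unless a topic directive grants it.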

Once you've created your ACL file, add this to your mosquitto.conf (being careful to put it before any listener directives if you have TLS / MQTTS support enabled):

acl_file /etc/mosquitto/mosquitto_acls

This will tell Mosquitto about your new access control file.

Reloading changes

After making changes above, you'll want to tell Mosquitto to reload the configuration file. Do that like this:

sudo systemctl reload mosquitto-mqtt.service

If your systemd service file doesn't support reloading, then a restart will do. Alternatively, add this to the [Service] section of your systemd service file:

ExecReload=/bin/kill -s HUP $MAINPID

Conclusion

In this tutorially-kinda post, I've talked through how to manage user accounts for the Mosquitto MQTT server. I've also talked about how to enable and manage access control lists.

This should make your MQTT server more secure. The other thing you can do to make your MQTT server more secure is enable TLS encryption. I'm going to hold off on showing that in this post because I'm still unsure about the best way of doing it (getting Mosquitto to do it vs using Nginx as a reverse proxy - I'm currently testing the former), but if there's the demand I'll post about it in the future.

Cluster, Part 11: Lock and Key | Let's Encrypt DNS-01 for wildcard TLS certificates

Welcome one and all to another cluster blog post! Cluster blog posts always take a while to write, so sorry for the delay. As is customary, let's start this post off with a list of all the parts in the series so far:

With that out of the way, in this post we're going to look at obtaining a wildcard TLS certificate using the Let's Encrypt DNS-01 challenge. We want this because you need a TLS certificate to serve HTTPS without lighting everyone's browsers up with warnings like a Christmas tree.

The DNS-01 challenge is an alternative to the default HTTP-01 challenge you may already be familiar with.

Unlike the HTTP-01 challenge which proves you have access to a single domain by automatically placing a file on your web server, the DNS-01 challenge proves you have control over an entire domain - thus allowing you to obtain a wildcard certificate - which is valid for not only your domain, but all possible subdomains! This should save a lot of hassle - but it's important we keep it secure too.

As with regular Let's Encrypt certificates, we'll also need to ensure that our wildcard certificate we obtain will be auto-renewed, so we'll be setting up a periodic task on our Nomad cluster to do this for us.

If you don't have a Nomad cluster, don't worry. It's not required, and I'll be showing you how to do it without one too. But if you'd like to set one up, I recommend part 7 of this series.

In order to complete the DNS-01 challenge successfully, we need to automatically place a DNS record in our domain. This can be done via an API, if your DNS provider has one and it's supported. Personally, I have the domain name I'm using for my cluster (mooncarrot.space.) with Gandi. We'll be using certbot to perform the DNS-01 challenge, which has a plugin system for different DNS API providers.
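
For the curious, here's a sketch of what that DNS record looks like under the hood. Certbot and its DNS plugin handle all of this for you, so this is purely illustrative: in the ACME protocol (RFC 8555) the record is a TXT record at _acme-challenge.<domain>, and its value is the unpadded base64url-encoded SHA-256 digest of a "key authorization" string. The function names and the example key authorization below are made up.

```python
import base64
import hashlib


def challenge_record_name(domain: str) -> str:
    """Name of the TXT record that gets checked for `domain`."""
    if domain.startswith("*."):
        # A wildcard is validated against its parent domain
        domain = domain[2:]
    return f"_acme-challenge.{domain}"


def challenge_record_value(key_authorization: str) -> str:
    """TXT record value: unpadded base64url of the SHA-256 digest."""
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


if __name__ == "__main__":
    # Placeholder string: the real one is derived from a token and your account key
    print(challenge_record_name("*.bobsrockets.io"))
    print(challenge_record_value("example-token.example-thumbprint"))
```

This is also why the API token we create later only needs permission to edit DNS records for 1 domain.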

We'll be installing the challenge provider we need with pip3 (a Python 3 package manager, as certbot is written in Python), so you can find an up-to-date list of challenge providers over on PyPI here: https://pypi.org/search/?q=certbot-dns

If you don't see a plugin for your provider, don't worry. I couldn't find one for Gandi, so I added my domain name to Cloudflare and followed the setup to change the name servers for my domain name to point at them. After doing this, I can now use the Cloudflare API through the certbot-dns-cloudflare plugin.

With that sorted, we can look at obtaining that TLS certificate. I opt to put certbot in a Docker container here so that I can run it through a Nomad periodic task. This proved to be a useful tool to test the process out though, as I hit a number of snags with the process that made things interesting.

The first order of business is to install certbot and the associated plugins. You'd think that simply doing a sudo apt install certbot certbot-dns-cloudflare would do the job, but you'd be wrong.

As it turns out, it does install that way, but it installs an older version of the certbot-dns-cloudflare plugin that requires you give it your Global API Key from your Cloudflare account, which has permission to do anything on your account!

That's no good at all, because if the key gets compromised an attacker could edit any of the domain names on our account they like, which would quickly turn into a disaster!

Instead, we want to install the latest version of certbot and the associated Cloudflare DNS plugin, which support regular Cloudflare API Tokens, upon which we can set restrictive permissions to only allow it to edit the one domain name we want to obtain a TLS certificate for.

I tried multiple different ways of installing certbot in order to get a version recent enough to get it to take an API token. The way that worked for me was a script called certbot-auto, which you can download from here: https://dl.eff.org/certbot-auto.

Now we have a way to install certbot, we also need the Cloudflare DNS plugin. As I mentioned above, we can do this using pip3, a Python package manager. In our case, the pip3 package we want is certbot-dns-cloudflare - incidentally it has the same name as the outdated apt package that would have made life so much simpler if it had supported API tokens.

Now we have a plan, let's start to draft out the commands we'll need to execute to get certbot up and running. If you're planning on following this tutorial on bare metal (i.e. without Docker), go ahead and execute these directly on your target machine. If you're following along with Docker though, hang on because we'll be wrapping these up into a Dockerfile shortly.

First, let's install certbot:

sudo apt install curl ca-certificates
cd some_permanent_directory;
curl -sS https://dl.eff.org/certbot-auto -o certbot-auto
chmod +x certbot-auto
sudo ./certbot-auto --debug --noninteractive --install-only

Installation with certbot-auto comprises downloading a script and executing it with a bunch of flags. Next up, we need to shoe-horn our certbot-dns-cloudflare plugin into the certbot-auto installation. This requires some interesting trickery, because certbot-auto uses something called virtualenv to install itself and all its dependencies locally into a single directory.

sudo apt install python3-pip
cd /opt/eff.org/certbot/venv
source bin/activate
pip install certbot-dns-cloudflare
deactivate

In short, we cd into the certbot-auto installation, activate the virtualenv local environment, install our dns plugin package, and then exit out of the virtual environment again.

With that done, we can finally add a convenience symlink so that the certbot command is in our PATH:

sudo ln -s /opt/eff.org/certbot/venv/bin/certbot /usr/bin/certbot

That completes the certbot installation process. Then, to use certbot to create the TLS certificate, we'll need an API token as mentioned earlier. Navigate to the API Tokens part of your Cloudflare profile and create one, and then create an INI file in the following format:

# Cloudflare API token used by Certbot
dns_cloudflare_api_token = "YOUR_API_TOKEN_HERE"

...replacing YOUR_API_TOKEN_HERE with your API token of course.
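
Since this file holds a secret, it's worth making sure that only its owner can read it. Here's a minimal sketch of creating it with owner-only (0600) permissions from the start. The write_credentials helper is hypothetical and not part of certbot, and the demo writes to a temporary directory rather than a real location.

```python
import os
import tempfile


def write_credentials(path: str, api_token: str) -> None:
    """Write a Cloudflare credentials INI file readable only by its owner."""
    # Create the file with mode 0600 so it is never world-readable
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write("# Cloudflare API token used by Certbot\n")
        handle.write(f"dns_cloudflare_api_token = {api_token}\n")
    # Tighten permissions too, in case the file already existed
    os.chmod(path, 0o600)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        target = os.path.join(directory, "credentials.ini")
        write_credentials(target, "YOUR_API_TOKEN_HERE")
        print(oct(os.stat(target).st_mode & 0o777))
```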

Finally, with all that in place, we can create our wildcard certificate! Do that like this:

sudo certbot certonly --dns-cloudflare --dns-cloudflare-credentials path/to/credentials.ini -d 'bobsrockets.io,*.bobsrockets.io' --preferred-challenges dns-01

It'll ask you a bunch of interactive questions the first time you do this, but follow it through and it should issue you a TLS certificate (and tell you where it stored it). Actually utilising it is beyond the scope of this post - we'll be tackling that in a future post in this series.

For those following along on bare metal, this is where you'll want to skip to the end of the post. Before you do, I'll leave you with a quick note about auto-renewing your TLS certificates. Do this:

sudo certbot renew
sudo systemctl reload nginx postfix

...on a regular basis, replacing nginx postfix with a space-separated list of services that need reloading after you've renewed your certificates. A great way to do this is to set up a cron job.

Sweeping things under the carpet

For the Docker users here, we aren't quite finished yet: We need to package this mess up into a nice neat Docker container where we can forget about it :P

Some things we need to be aware of:

  • certbot has a number of data directories it interacts with that we need to ensure don't get wiped when Docker removes instances of our container.
  • Since I'm serving the shared storage of my cluster over NFS, we can't have certbot running as root as it'll get a permission denied error when it tries to access the disk.
  • While curl and ca-certificates are needed to download certbot-auto, they aren't needed by certbot itself - so we can avoid installing them in the resulting Docker container by using a multi-stage Dockerfile.

To save you the trouble, I've already gone to the trouble of developing just such a Dockerfile that takes all of this into account. Here it is:

ARG REPO_LOCATION
# ARG BASE_VERSION

FROM ${REPO_LOCATION}minideb AS builder

RUN install_packages curl ca-certificates \
    && curl -sS https://dl.eff.org/certbot-auto -o /srv/certbot-auto \
    && chmod +x /srv/certbot-auto

FROM ${REPO_LOCATION}minideb

COPY --from=builder /srv/certbot-auto /srv/certbot-auto

RUN /srv/certbot-auto --debug --noninteractive --install-only && \
    install_packages python3-pip

WORKDIR /opt/eff.org/certbot/venv
RUN . bin/activate \
    && pip install certbot-dns-cloudflare \
    && deactivate \
    && ln -s /opt/eff.org/certbot/venv/bin/certbot /usr/bin/certbot

VOLUME /srv/configdir /srv/workdir /srv/logsdir

USER 999:994
ENTRYPOINT [ "/usr/bin/certbot", \
    "--config-dir", "/srv/configdir", \
    "--work-dir", "/srv/workdir", \
    "--logs-dir", "/srv/logsdir" ]

A few things to note here:

  • We use a multi-stage dockerfile here to avoid installing curl and ca-certificates in the resulting docker image.
  • I'm using minideb as a base image that resides on my private Docker registry (see part 8). For the curious, the script I use to do this is located on my personal git server here: https://git.starbeamrainbowlabs.com/sbrl/docker-images/src/branch/master/images/minideb.
    • If you don't have minideb pushed to a private Docker registry, replace minideb with bitnami/minideb in the above.
  • We set the user and group certbot runs as to 999:994 to avoid the NFS permissions issue.
  • We define 3 Docker volumes /srv/configdir, /srv/workdir, and /srv/logsdir to contain all of certbot's data that needs to be persisted and use an elaborate ENTRYPOINT to ensure that we tell certbot about them.

Save this in a new directory with the name Dockerfile and build it:

sudo docker build --no-cache --pull --tag "certbot" .;

...if you have a private Docker registry with a local minideb image you'd like to use as a base, do this instead:

sudo docker build --no-cache --pull --tag "myregistry.seanssatellites.io:5000/certbot" --build-arg "REPO_LOCATION=myregistry.seanssatellites.io:5000/" .;

In my case, I do this on my CI server:

laminarc queue docker-rebuild IMAGE=certbot

The hows of how I set that up will be the subject of a future post. Part of the answer is located in my docker-images Git repository, but the other part is in my private continuous integration Git repo (but rest assured I'll be talking about it and sharing it here).

Anyway, with the Docker container built we can now obtain our certificates with this monster of a one-liner:

sudo docker run -it --rm -v /mnt/shared/services/certbot/workdir:/srv/workdir -v /mnt/shared/services/certbot/configdir:/srv/configdir -v /mnt/shared/services/certbot/logsdir:/srv/logsdir certbot certonly --dns-cloudflare --dns-cloudflare-credentials path/to/credentials.ini -d 'bobsrockets.io,*.bobsrockets.io' --preferred-challenges dns-01

The reason this is so long is that we need to mount the 3 different volumes into the container that contain certbot's data files. If you're running a private registry, don't forget to prefix certbot there with registry.bobsrockets.com:5000/.

Don't forget also to update the Docker volume locations on the host here to point at empty directories owned by 999:994.

Even if you want to run this on Nomad, I still advise that you execute this manually. This is because the first time you do so it'll ask you a bunch of questions interactively (which it doesn't do on subsequent times).

If you're not using Nomad, this is the point you'll want to skip to the end. As before with the bare-metal users, you'll want to add a cron job that runs certbot renew - just in your case inside your Docker container.

Nomad

For the truly intrepid Nomad users, we still have one last task to complete before our work is done: Auto-renewing our certificate(s) with a Nomad periodic task.

I found that this isn't really that complicated. Here's what I came up with:

job "certbot" {
    datacenters = ["dc1"]
    priority = 100
    type = "batch"

    periodic {
        cron = "@weekly"
        prohibit_overlap = true
    }

    task "certbot" {
        driver = "docker"

        config {
            image = "registry.service.mooncarrot.space:5000/certbot"
            labels { group = "maintenance" }
            entrypoint = [ "/usr/bin/certbot" ]
            command = "renew"
            args = [
                "--config-dir", "/srv/configdir/",
                "--work-dir", "/srv/workdir/",
                "--logs-dir", "/srv/logsdir/"
            ]
            # To generate a new cert:
            # /usr/bin/certbot --work-dir /srv/workdir/ --config-dir /srv/configdir/ --logs-dir /srv/logsdir/ certonly --dns-cloudflare --dns-cloudflare-credentials /srv/configdir/__cloudflare_credentials.ini -d 'mooncarrot.space,*.mooncarrot.space' --preferred-challenges dns-01

            volumes = [
                "/mnt/shared/services/certbot/workdir:/srv/workdir",
                "/mnt/shared/services/certbot/configdir:/srv/configdir",
                "/mnt/shared/services/certbot/logsdir:/srv/logsdir"
            ]
        }
    }
}

If you want to use it yourself, replace the various references to things like the private Docker registry and the Docker volumes (which require "docker.volumes.enabled" = "True" in the options block of the client section in your Nomad agent configuration) with values that make sense in your context.

I have some confidence that this is working as intended by inspecting logs and watching TLS certificate expiry times. Save it to a file called certbot.nomad and then run it:

nomad job run certbot.nomad

Conclusion

If you've made it this far, congratulations! We've installed certbot and used the Cloudflare DNS plugin to obtain a wildcard TLS certificate. For the more adventurous, we've packaged it all into a Docker container. Finally for the truly intrepid we implemented a Nomad periodic job to auto-renew our TLS certificates.

Even if you don't use Docker or Nomad, I hope this has been a helpful read. If you're interested in the rest of my cluster build I've done, why not go back and start reading from part 1? All the posts in my cluster series are tagged with "cluster" to make them easier to find.

Unfortunately, I haven't managed to determine a way to import TLS certificates into Hashicorp Vault automatically, as I've stalled a bit on the Vault front (permissions and policies are wildly complicated), so in future posts it's unlikely I'll be touching Vault any time soon (if anyone has an alternative that is simpler and easier to understand / configure, please comment below).

Despite this, in future posts I've got a number of topics lined up I'd like to talk about:

  • Configuring Fabio (see part 9) to serve HTTPS and force-redirect from HTTP to HTTPS (status: implemented)
  • Implementing HAProxy to terminate port forwarding (status: initial research)
  • Password protecting the private docker registry, Consul, and Nomad (status: on the todo list)
  • Semi-automatic docker image rebuilding with Laminar CI (status: implemented)

In the meantime, please comment below if you liked this post, are having issues, or have any suggestions. I'd love to hear if this helped you out!


NAS, Part 4: Time machines | Automatic snapshotting with btrfs-snapshot

In the last part in this series, I compared ZFS with Btrfs. I ended up choosing Btrfs because it was easier to install and came with a number of advantages. Since last time, I've now put Btrfs to work and have about 1.3 TiB of data stored in it (much of which is from various devices across the network automatically backing up to it). Before we continue, here's a list of the parts in the series so far:

In this post, I'm going to talk about the automatic snapshotting I've set up. Btrfs supports creating snapshots, which are defined as subvolumes that are seeded with data from another subvolume (boundaries between subvolumes are not crossed). Most of the time, these are created to be read-only. In addition because of the copy-on-write system Btrfs uses, a snapshot takes no disk space on its own (other than that required to store the fact that it exists) - it only starts to consume disk space when files that it contains are modified in the original subvolume.

To this end, we can efficiently keep a rotating series of snapshots to serve as an initial safety net should someone accidentally delete a file. Of course, we can't assume that snapshots will be ok as the only backup (I use Restic for that - I'm in the process of reconfiguring it for my new setup) - but they are still useful things to have.

To take a Btrfs snapshot, you can do this:

sudo btrfs subvolume snapshot -r path/to/source_subvolume path/to/target

The problem here, of course, is that you also need a way to delete old snapshots too. While I could roll my own solution for this, I figured that someone has already solved this problem - so it might save me some effort if I look for a pre-existing solution first.

After doing a bit of searching without success, I asked on Reddit, and the helpful folks there gave me a number of suggestions:

  • snapper
  • btrbk
  • btrfs-snapshot

Of these 3, snapper seemed to be the most popular. From some reading, it appeared to be powerful and flexible - at the cost of being harder to understand. btrbk seemed to be feature-packed too, but in the end I decided on btrfs-snapshot.

btrfs-snapshot is designed to be used with cron. For example, I have something like this for one of my subvolumes in root user's crontab:

0 * * * *       /root/btrfs-snapshot-rotation/btrfs-snapshot path/to/subvolume path/to/subvolume/.snapshots hourly 8
0 2 * * *       /root/btrfs-snapshot-rotation/btrfs-snapshot path/to/subvolume path/to/subvolume/.snapshots daily 4
0 2 * * 7       /root/btrfs-snapshot-rotation/btrfs-snapshot path/to/subvolume path/to/subvolume/.snapshots weekly 4

Given a subvolume at path/to/subvolume, it creates the following snapshots in a nested subvolume in path/to/subvolume/.snapshots (which needs to be created manually: sudo btrfs subvolume create path/to/subvolume/.snapshots):

  • 8 x hourly snapshots
  • 4 x daily snapshots
  • 4 x weekly snapshots

I find the system so beautifully simple and easy to understand. This is important for me in a system like this, as it has to be easy for me to understand when I inevitably come back to it months or even years later when I've forgotten how it works. The arguments to btrfs-snapshot are easy to understand, and are in the form path/to/source path/to/target tag_name number_of_snapshots_to_keep.

This has the added bonus that if a user deletes a file accidentally in our shared drive, they can retrieve it on their own from the .snapshots directory - without my intervention.
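
To illustrate the rotation idea, here's a minimal sketch of deciding which snapshots to prune for a given tag. This is not how btrfs-snapshot itself is implemented: the <tag>_<timestamp> naming scheme and the snapshots_to_delete function are assumptions made purely for this example.

```python
def snapshots_to_delete(names: list[str], tag: str, keep: int) -> list[str]:
    """Return the oldest snapshots for `tag`, leaving the newest `keep` in place.

    Assumes names look like "<tag>_<ISO 8601 timestamp>", so that sorting
    them alphabetically also sorts them from oldest to newest.
    """
    tagged = sorted(name for name in names if name.startswith(f"{tag}_"))
    excess = max(0, len(tagged) - keep)
    return tagged[:excess]


if __name__ == "__main__":
    existing = [
        "hourly_2021-01-01T00:00",
        "hourly_2021-01-01T01:00",
        "hourly_2021-01-01T02:00",
        "daily_2021-01-01T02:00",
    ]
    # Keep the 2 newest hourly snapshots, so only the oldest one is listed
    print(snapshots_to_delete(existing, "hourly", 2))
```

Each tag is rotated independently, which is why the hourly, daily, and weekly cron entries above don't interfere with one another.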

With this in place and the data (mostly) moved over, my NAS project is almost complete. The final task I have left to do is to set up a proper backup system with Restic to either a remote (e.g. Backblaze B2) or offline location (such as an external HDD).

The latter might prove to be a problem though, since the maximum amount of data I can store right now is 5.5 TiB and is only going to grow from there. Portable external hard drives I've seen online don't appear to go up that high, so I suspect I'll need to choose another plan.

Should I encounter some interesting issues when setting this final backup step up, I'll make an additional post in this series. If not though, this will probably be the last entry in this series. If you have any questions about my setup, please comment below! I'll do my best to answer any questions.

NAS, Part 3: Decisions | Choosing a Filesystem

It's another entry in my NAS series! It's still 2020 for me as I type this, but I hope that 2021 is going well. Before we continue, I recommend checking out the previous posts in this series:

Part 1 in particular is useful for context as to the hardware I'm using. Part 2 is a review of my experience assembling the system. In this part, we're going to look at my choice of filesystem and OS.

I left off in the last post after I'd booted into the installer for Ubuntu Server 20.04. After running through that installer, I performed my collection of initial setup tasks for any server I manage:

  • Set up an SSH server
  • Enable UFW
  • Set up my personal ~/bin folder
  • Assign a static IP address (why won't you let me choose an IP, Netgear RAX120? Your UI lets me enter a custom IP, but devices don't ultimately end up with the IP I tell you to assign to them....)
  • Set up Collectd
  • A number of other tasks I forget

With my basic setup completed, I also setup a few things specific to devices that have SMART-enabled storage devices:

  • Setup an email relay (via autossh) for mail delivery
  • Installed smartd (which sends you emails when there's something wrong with 1 of your disks)
  • Installed and configured hddtemp, and integrated it with collectd (a topic for another post, I did this for the first time)

With these out of the way and after making a mental note to sort out backups, I could now play with filesystems with a view to making a decision. The 2 contenders:

  • (Open)ZFS
  • Btrfs

Both of these filesystems are designed to be spread across multiple disks in what's known as a pool. The idea behind them is to enable multiple disks to be presented to the user as a single big directory, with the complexities as to which disk(s) a file is stored on hidden away. They also come with extra nice features, such as checksumming (which allows them to detect corruption), snapshotting (taking snapshots of what the filesystem looks like at a given point in time), automatic data deduplication, compression, snapshot send / receiving, and more!

Overview: ZFS

ZFS is a filesystem originally developed by Sun Microsystems in 2001. Since then, it has been continually developed and improved. After Oracle bought Sun Microsystems in 2010, the source code for ZFS was closed - hence the OpenZFS fork was born. It's licenced under the CDDL, which isn't compatible with the GPLv2 used by the Linux Kernel. This causes some minor installation issues.

As a filesystem, it seems to be widely accepted to be rock solid and mature. It's used across the globe by home users and businesses both large and small to store huge volumes of data. Given its long history, it has proven its capability to store data safely.

It does however have some limitations. For one, it only has limited support for adding drives to a zpool (a pool of disks in the ZFS world), which is a problem for me - as I'd prefer to have the ability to add drives 1 at a time. It also has limited support for changing key options such as the compression algorithm later, as this will only affect new files - and the only way to recompress old files is to copy them in and out of the disk again.

Overview: Btrfs

Btrfs, or B-Tree File System, is a newer filesystem. Development began in 2007, and it was accepted into the Linux Kernel in 2009 with the release of version 1.0. It's licenced under the GPLv2, the same licence as the Linux Kernel. As of 2020, many different distributions of Linux ship with btrfs installed by default - even if it isn't the default filesystem (that's ext4 in most cases).

Unlike ZFS, Btrfs isn't as well-tested in production settings. In particular, its raid5 and raid6 modes of operation are not well tested (though this isn't a problem, since raid1 operates at file/block level and not disk level as it does with ZFS, which enables us to use interesting setups like raid1 striped across 3 disks). Despite this, it does look to be stable enough - particularly as openSUSE has set it to be the default filesystem.

It has a number of tempting features over ZFS too. For example, it supports adding drives 1 at a time, and you can even convert your entire pool from 1 raid level to another dynamically while it's still mounted! The same goes for converting between compression algorithms - it's all done using a generic filter system.

Such a system is useful when adding new disks to the pool too, as it can be used to rebalance data across all the disks present - allowing for new disks to be accounted for and faulty disks to be removed, preserving the integrity of the data while a replacement disk is ordered for example.
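As a rough sketch of what adding a disk and rebalancing looks like on the command line, something like the below should do it. This isn't a record of the exact commands used on this NAS: the mount point /mnt/pool and the device /dev/sdX are placeholders, and the run helper is a hypothetical wrapper that only prints each command unless DRY_RUN=0 is set (these commands need root and modify the pool, so it's worth reading them first).

```shell
#!/usr/bin/env sh
# Sketch only: /mnt/pool and /dev/sdX are placeholders, not a real setup.
POOL="${POOL:-/mnt/pool}"
NEW_DISK="${NEW_DISK:-/dev/sdX}"
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it
run() {
    if [ "${DRY_RUN}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

grow_pool() {
    # Add the new disk to the existing (mounted) filesystem
    run btrfs device add "${NEW_DISK}" "${POOL}"
    # Rewrite data (-d) and metadata (-m) with the raid1 profile, spreading it across all disks
    run btrfs balance start -dconvert=raid1 -mconvert=raid1 "${POOL}"
    # Show how space is allocated across the disks afterwards
    run btrfs filesystem usage "${POOL}"
}

grow_pool
```

The balance step is the "generic filter system" at work: the convert filters select which chunks get rewritten and what profile they are rewritten with.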

While btrfs does have a bold list of features that they'd like to implement, they haven't gotten around to all of them yet (the status of existing features can be found here). For example, while ZFS can use an SSD as a dedicated caching device, btrfs doesn't yet have this ability - and nobody appears to have claimed the task on the wiki.

Performance

Inspired by a recent Ars Technica article, I'd like to test the performance of the 2 filesystems at hand. I ran the following tests for reading and writing separately:

  • 4k-random: Single 4KiB random read/write process
  • 64k-random-16p: 16 parallel 64KiB random read/write processes
  • 1m-random: Single 1MiB random read/write process

I did this for both ZFS in raid5 mode, and Btrfs in raid5 (though if I go with btrfs I'll be using raid1, as I later discovered - which I theorise would yield a minor performance improvement). I tested ZFS twice: once with gzip compression, and again with zstd compression. As far as I can tell, Btrfs doesn't have compression enabled by default. Other than the compression mode, no other tuning was done - all the settings were left at their defaults. Both filesystems were completely empty aside from the test files, which were created automatically in a chowned subdirectory by fio.
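For reference, the 3 tests above map onto fio invocations along the lines of those in the Ars Technica article. The sketch below is based on that article rather than being the exact commands behind the graph: the fio_cmd helper is hypothetical, the file sizes are assumptions taken from the article's examples, and it only prints the commands so they can be reviewed before being run from inside a directory on the filesystem under test.

```shell
# fio_cmd MODE BLOCKSIZE SIZE JOBS: print a fio command for one test.
# MODE is "randread" or "randwrite". Sizes are assumed, not confirmed.
fio_cmd() {
    mode="${1}"; bs="${2}"; size="${3}"; jobs="${4}"
    echo "fio --name=${bs}-${mode} --ioengine=posixaio --rw=${mode}" \
        "--bs=${bs} --size=${size} --numjobs=${jobs} --iodepth=${jobs}" \
        "--runtime=60 --time_based --end_fsync=1"
}

for mode in randread randwrite; do
    fio_cmd "${mode}" 4k 4g 1      # 4k-random
    fio_cmd "${mode}" 64k 256m 16  # 64k-random-16p
    fio_cmd "${mode}" 1m 16g 1     # 1m-random
done
```

Each test runs for 60 seconds regardless of how much data gets written, and --end_fsync=1 makes sure cached writes are flushed to disk before the result is reported.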

Graphs showing the results of the above tests. See the discussion below.

The graph uses a logarithmic scale. My initial impressions are that ZFS benefits from parallelisation to a much greater extent than btrfs - though I suspect that I may be CPU bound here, which is an unexpected finding. I may be RAM-bound too, as I observed a significant increase in RAM usage when both filesystems were under load. Buying another 8GB would probably go a long way to alleviating that issue.

Other than that, zstd appears to provide a measurable performance improvement over gzip compression. Btrfs also appears to benefit from writing larger blocks over smaller ones.

Overall, some upgrades to my NAS are on the cards should I be unsatisfied with the performance in future:

  • More RAM would assist in heavy i/o loads
  • A better CPU would probably raise the peak throughput speeds - if I can figure out what to do with the old one

But for now, I'm perfectly content with these speeds. Especially since I have a single gigabit ethernet port on my storage NAS, I'm not going to need anything above 1000Mbps - which is 119.2 MiB/s if you'd like to compare against the graph above.
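If you'd like to check that conversion yourself, here's a minimal sketch (the function name is made up for illustration). It's the theoretical line rate only, so it ignores protocol overhead - real-world transfers will come in a bit lower.

```shell
# Convert a link speed in megabits per second to mebibytes per second
mbps_to_mibs() {
    # 1 Mbps = 1,000,000 bits/s; 8 bits per byte; 1 MiB = 1,048,576 bytes
    awk -v mbps="${1}" 'BEGIN { printf "%.1f\n", mbps * 1000000 / 8 / 1048576 }'
}

mbps_to_mibs 1000   # gigabit ethernet: prints 119.2
```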

Conclusion

As for my final choice of filesystem, I think I'm going to go with btrfs. While I'm aware that it isn't as 'proven' as ZFS - and slightly less performant too - I have a number of reasons for this decision:

  1. Btrfs allows you to add disks 1 at a time, and ZFS makes this difficult
  2. Btrfs has the ability to convert to a different raid level at a later date if I change my mind
  3. Btrfs is easier to install, since it's already built-in to Ubuntu Server 20.04.

NAS, Part 2: Assembly and Installation

Welcome back! This is part 2 of a series of posts about my new NAS (network attached storage) device I'm building. If you haven't read it yet, I recommend you go back and read part 1, in which I talk about the hardware I'm using.

Since the Fractal Design Node 804 case came first, I was able to install the parts into it as they arrived. First up was the motherboard (an ASUS PRIME B450M-A) and CPU (an AMD Athlon 3000G).

The motherboard was a pain. As I read, the middle panel of the case has some flex in it, so you've got to hold it in place with one hand while you're screwing the motherboard in. This in and of itself wasn't an issue at all, but the screws for the motherboard were really stiff. I think this was just the motherboard, but it was annoying.

Thankfully I managed it though, and then set to work installing the CPU. This went well - the CPU came with thermal paste on top already, so I didn't need to buy my own. The installation process for the stock CPU heatsink + fan was unfamiliar, so it took me a moment to decipher how the mechanism worked.

Following this, I connected the front ports from the case up to the motherboard (consulting my motherboard's documentation showed me where I needed to plug these in - I remember this being something I struggled with when I last built an (old) PC when doing some IT technician work experience some years ago). The RAM - while a little stiff (to be expected) - went in fine too. I might buy another stick later if I run into memory pressure, but I thought a single 8GB stick would be a good place to start.

The case came with a dedicated fan controller board that has a high / medium / low switch on the back too, so I wired up the 3 included Noctua case fans to this instead of the slots on the motherboard. The CPU fan (nothing special yet - just the stock fan that came with the CPU) went into the motherboard though, as the fan controller didn't have room - and I thought that the motherboard would be better placed to control the speed of that one.

(Above: The inside of the 2 sides of the case. Left: The 'hot' side, Right: The 'cold' side.)

The case is split into 2 sides: 1 for 'hot' components (e.g. the motherboard and CPU), and another for 'cold' components (e.g. the HDDs and PSU). Next up were the hard disks - so I mounted the SSD for the operating system to the base of the case in the 'hot' side, as the carriage in the cold side fits only 3.5 inch disks, and my SSD is a 2.5 inch disk. While this made the cabling slightly awkward, it all worked out in the end.

For the 3.5 inch HDDs (for data storage), I found I was unable to mount them with the included pieces of bracket metal that allow you to put screws into the bottom set of holes - as the screws wouldn't fit through the top holes. I just left the metal bracket pieces out and mounted the HDDs directly into the carriage, and it seems to have worked well so far.

The PSU was uneventful too. It fit nicely into the space provided, and the semi-modular nature of the cables provided helped tremendously to avoid a mess of cables all over the place as I could remove the cables I didn't need.

Finally, the DVD writer had some stiff screws, but it seemed to mount well enough (just a note: I've been having an issue I need to investigate with this DVD drive whereby I can't take a copy of a disk - e.g. the documentation CD that came with my motherboard - with dd, as it reports an IO error. I need to investigate this further, so more on that in a later post).

The installation of the DVD drive completed the assembly process. To start it up for the first time, I connected my new NAS to my television temporarily so that I could see the screen. The machine booted fine, and I dove straight into the BIOS.

(Above: The BIOS that comes with the ASUS motherboard, before the clock was set by Ubuntu Server 20.04 - which I had yet to install)

Unlike my new laptop, the BIOS that comes with the ASUS motherboard is positively delightful. It has all the features you'd need, laid out in a friendly interface. I observed some minor input lag, but considering this is a BIOS we're talking about here I can definitely overlook that. It even has an online update feature, where you can plug in an Ethernet cable and download + install BIOS updates from the Internet.

I tweaked a few settings here, and then rebooted into my flash drive - onto which I loaded an Ubuntu Server 20.04 ISO. It booted into this without complaint (unlike a certain laptop I'm rather unhappy with at the moment), and then I selected the appropriate ISO and got to work installing the operating system (want your own multiboot flash drive? I've blogged about that already! :D).

In the next post, I'm going to talk about the filesystem I ultimately chose. I'm also going to show and discuss some performance tests I ran using fio following this Ars Technica guide.

Cluster, Part 10: Dockerisification | Writing Dockerfiles

Hey there - welcome to 2021! I'm back with another cluster post. In double digits too! I think this is the longest series yet on my blog. Before we start, here's a list of all the posts in the series so far:

We've got a pretty cool setup going so far! With Nomad for task scheduling (part 7), Consul to keep track of what's running where (part 6), and wesher keeping communications secured (part 4, although defence in depth says that we'll be returning later to shore up some stuff here) we have a solid starting point from which to work. And it's only taken 9 blog posts to get to this point :P

In this post, we'll be putting all our hard work to use by looking at the basics of writing Dockerfiles. It's taken me quite a while to get my head around them, so I want to take a moment here to document some of the things I've learnt. A few other things that I want to talk about soon are Hashicorp Vault (it's still giving me major headaches trying to understand the Nomad integration though, so this may be a while), obtaining TLS certificates, and tying in with the own your code series by showing off the Docker image management script setup I have that I've worked into my Laminar CI instance, which makes it easy to rebuild images and all their dependants.

Anyway, Dockerfiles. First question: what? Dockerfiles are essentially files containing a domain-specific language that defines how a Docker image can be built. They are usually named Dockerfile. Here I use the term image and not container:

  • Image: A Docker image that contains a bunch of files and directories that can be run
  • Container: A copy of an image that is currently running on a host system.

In short: A container is a running image, and a Docker image is the bit that a container spins up from.

Second question: why? The answer is a few different reasons. Although it adds another layer of indirection and complication, it also allows us to square applications away such that we don't care about what host they run on (too much).

A great example here would be a static file web server. In our case, this is particularly useful because Fabio - as far as I know - isn't actually capable of serving files from disk. Personally I have a fork of a rather nice dashboard I'd like to have running for my cluster too, so I found that it fits perfectly to test the waters.

Next question: how? Well, let's break the process down:

  1. Install Node.js
  2. Install the serve npm package

Thankfully, I've recently packaged Node.js in my apt repository (finally! It's only taken me multiple years.....). Since we might want to build lots of different Node.js based container images, it makes sense to make Node.js its own separate container. I'm also using my apt repository in other container images too which don't necessarily need Node.js, so I've opted to put my apt repository into my base image (If I haven't mentioned it already, I'm using minideb as my base image - which I build with a patch to make it support Raspbian - which is now called Raspberry Pi OS. It's confusing).

To better explain the plan, let's use a diagram:

(Above: A diagram I created. Link to editing file - don't forget this blog is licenced under CC-BY-SA.)

Docker images are always based on another Docker image. Our node-serve Docker image we intend to create will be based on a minideb-node Docker image (which we'll also be creating), which itself will be based on the minideb base image. Base images are special, as they don't have a parent image. They are usually imported via a .tar.gz image for example, but that's a story for another time (also for another time are images based on scratch, a special image that's completely empty).

We'll then push the final node-serve Docker image to a Docker registry. I'm running my own private Docker registry, but you can use the Docker Hub or setup your own private Docker registry.

With this in mind, let's start with a Docker image for Node.js:

ARG REPO_LOCATION

FROM ${REPO_LOCATION}minideb

RUN install_packages libatomic1 nodejs-sbrl

Let's talk about each of the above commands in turn:

  1. ARG REPO_LOCATION: This brings in an argument which is specified at build time. Here we want to allow the user to specify the location of a private Docker registry to pull the base (or parent) image from to begin the build process with.
  2. FROM ${REPO_LOCATION}minideb: This specifies the base (or parent) image to start the build with.
  3. RUN install_packages libatomic1 nodejs-sbrl: The RUN command runs the specified command inside the Docker container, saving a new layer in the process (more on those later). In this case, we call the install_packages command, which is a helper script provided by minideb to make package installation easier.

Pretty simple! This assumes that the minideb base image you're using has my apt repository setup, which may not be the case. To this end, we'd like to automatically set that up. To do this, we'll need to use an intermediate image. This took me some time to get my head around, so if you're unsure about anything, please comment below.

Let's expand on our earlier attempt at a Dockerfile:

ARG REPO_LOCATION

FROM ${REPO_LOCATION}minideb AS builder

RUN install_packages curl ca-certificates

RUN curl -o /srv/sbrl.asc https://apt.starbeamrainbowlabs.com/aptosaurus.asc

FROM ${REPO_LOCATION}minideb

COPY --from=builder /srv/sbrl.asc /etc/apt/trusted.gpg.d/sbrl-aptosaurus.asc

RUN echo "deb https://apt.starbeamrainbowlabs.com/ /" > /etc/apt/sources.list.d/sbrl.list && \
    install_packages libatomic1 nodejs-sbrl;

This one is more complicated, so let's break it down. Here, we have an intermediate Docker image (which we name builder via the AS builder bit at the end of the 1st FROM) in which we install curl (the 1st RUN command there) and use it to download the signing key for my apt repository (the 2nd RUN command). This is followed by a second image, into which we copy the file we downloaded in the first Docker image and place it in a specific place (the COPY directive).

Docker always reads Dockerfiles from top to bottom and executes them in sequence, so it will assume that the last image created is the final one - i.e. from the last FROM directive. Every FROM directive starts afresh from a brand-new copy of the specified parent image.

We've also expanded the RUN directive at the end of the file there to echo the apt sources list file out for my apt repository. We've done it like this in a single RUN command and not 2, because every time you add another directive to a Dockerfile (except ARG and FROM), it creates a new layer in the resulting Docker image. Minimising the number of layers in a Docker image is important for performance, hence the obscurity here in chaining commands together. To build our new Dockerfile, save it to a new empty directory. Then, execute this:

cd path/to/directory/containing_the_dockerfile;
docker build  --pull --tag "minideb-node" .

If you're using a private registry, add --build-arg "REPO_LOCATION=registry.example.com:5000/" just before the . there at the end of the command and prefix the tag with registry.example.com:5000/. If you're developing a new Docker image and having trouble with the cache (Docker caches the result of directives when building images), add --no-cache.
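To make it clearer where the registry address ends up, here's a small sketch that prints the full build command. The build_cmd helper is hypothetical and registry.example.com:5000/ is a placeholder - set REGISTRY to an empty string if you aren't using a private registry.

```shell
# Sketch: REGISTRY is a placeholder. Note the trailing slash, which the Dockerfile relies on.
REGISTRY="${REGISTRY-registry.example.com:5000/}"

# Hypothetical helper: print the docker build command for the given image name
build_cmd() {
    echo "docker build --pull --build-arg \"REPO_LOCATION=${REGISTRY}\" --tag \"${REGISTRY}${1}\" ."
}

build_cmd "minideb-node"
```

The same prefix is used twice: once so the FROM directive pulls the parent image from the registry, and once so the resulting image is tagged ready to be pushed back to it.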

Then, push it to the Docker registry like so:

docker push "minideb-node"

Again, prefix minideb-node there with registry.example.com:5000/ should you be using a private Docker registry.

Now, you should be able to start an interactive session inside your new Docker container:

docker run -it --rm minideb-node

As before, prefix minideb-node there with registry.example.com:5000/ if you're using a private Docker registry.

Now that we've got our Docker image for Node.js, we can write another Dockerfile for serve, our static file HTTP server. Let's take a look:

ARG REPO_LOCATION

FROM ${REPO_LOCATION}minideb-node

RUN npm install --global serve && rm -rf "$(npm get cache)";

VOLUME [ "/srv" ]

USER 80:80

ENV NODE_ENV=production
WORKDIR /srv
ENTRYPOINT [ "serve", "-l", "5000" ]

This looks similar to the previous Dockerfile, but with a few extra bits added on. Firstly, we use a RUN directive to install the serve npm package and delete the NPM cache in a single command (since we don't want the npm cache sticking around in the final Docker image).

We then use a VOLUME declaration to tell Docker that we expect /srv to have a volume mounted to it. A volume here is a directory from the host system that will be mounted into the Docker container before it starts running. In this case, it's the web root that we'll be serving files from.

A USER directive tells Docker what user and group IDs we want to run all subsequent commands as. This is important, as it's a bad idea to run Docker containers as root.

The ENV directive there is just to tell Node.js it should run in production mode. Some Node.js applications have some optimisations they enable when this environment variable is set.

The WORKDIR directive defines the current working directory for future commands. It functions like the cd command in your terminal or command line. In this case, the serve npm package always serves from the current working directory - hence we set the working directory here.

Finally, the ENTRYPOINT directive tells Docker what command to execute by default. The ENTRYPOINT can get quite involved and complex, but we're keeping it simple here and telling it to execute the serve command (provided by the serve npm package, which we installed globally earlier in the Dockerfile). We also specify the port number we want serve to listen on with -l 5000 there.

That completes the Dockerfile for the serve npm package. Build it as before, and then you should be able to run it like so:

docker run -it --rm -v /absolute/path/to/local_dir:/srv node-serve

As before, prefix node-serve with the address of your private Docker registry if you're using one. The -v bit above defines the Docker volume that mounts the webroot directory inside the Docker container.

Then, you should be able to find the IP address of the Docker container and enter it into your web browser to connect to the running server!

The URL should be something like this: http://IP_ADDRESS_HERE:5000/.

If you're not running Docker on the same machine as your web browser is running on, then you'll need to do some fancy footwork to get it to display. It's at this point that I write a Nomad job file, and wire it up to Fabio my load balancer.

In the next post, we'll talk more about Fabio. We'll also look at the networking and architecture that glues the whole system together. Finally, we'll look at setting up HTTPS with Let's Encrypt and the DNS-01 challenge (which I found relatively simple - but only once I'd managed to install a new enough version of certbot - which was a huge pain!).

Digitising old audio CDs on a Linux Server

A number of people I know own a number of audio / music CDs. This is great, but unfortunately increasingly laptops aren't coming with an optical drive any more, which makes listening to said CDs challenging. To this end, making a digital copy to add to their personal digital music collections would be an ideal solution.

Recently, I built a new storage NAS (which I'm still in the process of deciding on a filesystem for, but I think I might be going with btrfs + raid1), and the Fractal Design Node 804 case I used has a dedicated space for a slimline DVD writer (e.g. like the one you might find in a car). I've found this to be rather convenient for making digital copies of old audio CDs, and wanted to share the process by which I do it in case you'd like to do it too.

To start, I'm using Ubuntu Server 20.04. This may work on other distributions too, but there are a whole bunch of packages you'll need to install - the names and commands for which you may need to convert for your distribution.

To make the digital copies, we'll be using abcde. I can't find an updated website for it, but it stands for "A Better CD Encoder". It neatly automates much of the manual labor of digitising CDs - including the downloading of metadata from the Internet. To tidy things up after abcde has run to completion, we'll be using ffmpeg for conversion and eyeD3 for mp3 metadata manipulation.

To get started, let's install some stuff!

sudo apt install --no-install-recommends abcde
sudo apt install ffmpeg mkcue eyed3 flac glyrc cdparanoia imagemagick

Lots of dependencies here. Many of them are required by abcde for various features we'll be making use of.

Next, insert the audio CD into the DVD drive. abcde assumes your DVD drive is located at /dev/sr0 I think, so if it's different you'll have to tell it where to look with the -d flag (e.g. -d /dev/sr1).

Once done, we can call abcde and get it to make a digital copy of our CD. I recommend here that you cd to a new blank directory, as abcde creates 1 subdirectory of the current working directory for each album it copies. When you're ready, start abcde:

abcde -o flac -B -b

Here, we call abcde and ask it to save the digital copy as flac files. The reason we do this and not mp3 directly is that I've observed abcde gets rather confused with the metadata that way. By saving to flac files first, we can ensure the metadata is saved correctly.

The arguments above do the following:

  • -o flac: Save to flac files
  • -B: Automatically embed the album art into the saved music files if possible
  • -b: Preserve the relative volume differences between tracks in the album (if replaygain is enabled, which by default I don't think it is)

It will ask you a number of questions interactively. Once you've answered them, it will get to work copying the audio from the CD.

When it's done, everything should be good to go! However flac files can be large, so something more manageable is usually desired. For this, we can mass-convert our flac files to MP3. This can be done like so:

find -iname '*.flac' -type f -print0 | nice -n20 xargs -P "$(nproc)" --null --verbose -n1 -I{} sh -c 'old="{}"; new="${old%.*}.mp3"; ffmpeg -i "${old}" -ab 320k -map_metadata 0 -id3v2_version 3 -c:v copy -disposition:v:0 attached_pic "${new}";';

There's a lot to unpack here! Before I do though, let's quickly turn it into a bash function, which we can put in ~/.bash_aliases for example to make it easy to invoke in the future:

# Usage:
#   flac2mp3
#   flac2mp3 path/to/directory
flac2mp3() {
    local dir="${1}";
    if [[ -z "${dir}" ]]; then dir="."; fi
    find "${dir}" -iname '*.flac' -type f -print0 | nice -n20 xargs -P "$(nproc)" --null --verbose -n1 -I{} sh -c 'old="{}"; new="${old%.*}.mp3"; ffmpeg -i "${old}" -ab 320k -map_metadata 0 -id3v2_version 3 -c:v copy -disposition:v:0 attached_pic "${new}";';
}

Ah, that's better. Now, let's deconstruct it and figure out how it works. First, we have a dir variable which, by default, is set to the current working directory.

Next, we use the one-liner from before to mass-convert all flac files in the target directory recursively to mp3. It's perhaps easier to digest if we separate it out into multiple lines:

find "${dir}" -iname '*.flac' -type f -print0   # Recursively find all flac files, delimiting them with NULL (\0) characters
    | nice -n20 # Run at the lowest CPU priority, so other tasks on the machine aren't starved
        xargs # for each line of input, execute a command
            --null # Lines are delimited by NULL (\0) characters
            --verbose # Print the command that is about to be executed
            -P "$(nproc)" # Parallelise across as many cores as the machine has
            -n1 # Only pass 1 line to the command to be executed
            -I{} # Replace {} with the filename in question
            sh -c ' # Run this command
                old="{}"; # The flac filename
                new="${old%.*}.mp3"; # Replace the .flac file extension with .mp3
                ffmpeg # Call ffmpeg to convert it to mp3
                    -i "${old}" # Input the flac file
                    -ab 320k # Encode to 320kbps, the highest standard bitrate for mp3
                    -map_metadata 0 # Copy all the metadata
                    -id3v2_version 3 # Set the metadata tags version (may not be necessary)
                    -c:v copy -disposition:v:0 attached_pic # Copy the album art if it exists
                    "${new}"; # Output to mp3
            '; # End of command to be executed

Obviously it won't actually work when exploded and commented like this, but hopefully it gives a sense of how it functions.
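The filename handling in the middle there can be tried out on its own though. Here's a minimal sketch of the ${old%.*} parameter expansion (the function name and example filenames are made up for illustration):

```shell
# Strip the final extension from a filename and append .mp3
to_mp3_name() {
    old="${1}"
    # %.* removes the shortest match of ".*" from the end of the value
    echo "${old%.*}.mp3"
}

to_mp3_name "Album Name/01 - First Track.flac"   # prints: Album Name/01 - First Track.mp3
to_mp3_name "live.2020/track.one.flac"           # prints: live.2020/track.one.mp3
```

Because it removes the shortest match, only the last extension is replaced - any other dots in the file or directory names are left alone.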

I recommend checking that the album art has been transferred over. The -c:v copy -disposition:v:0 attached_pic bit in particular is required to ensure this happens (see this Unix Stack Exchange answer to a question I asked).

Sometimes abcde is unable to locate album art too, so you may need to find and download it yourself. If so, then this one-liner may come in handy:

find . -type f -iname '*.mp3' -print0 | xargs -0 -P "$(nproc)" eyeD3 --add-image "path/to/album_art.jpeg:FRONT_COVER:";

Replace path/to/album_art.jpeg with the path to the album art. Wrapping it in a bash function ready for ~/.bash_aliases makes it easier to use:

mp3cover() {
    local cover="${1}";
    local dir="${2}";

    # Both arguments are required: print usage to stderr and signal failure
    if [[ -z "${cover}" ]] || [[ -z "${dir}" ]]; then
        echo "Usage:" >&2;
        echo "    mp3cover path/to/cover_image.jpg path/to/album_dir" >&2;
        return 1;
    fi

    find "${dir}" -type f -iname '*.mp3' -print0 | xargs -0 -P "$(nproc)" eyeD3 --add-image "${cover}:FRONT_COVER:"
}

Use it like this:

mp3cover path/to/cover_image.jpg path/to/album_dir

By this point, you should have successfully managed to make a digital copy of an audio CD. If you're experiencing issues, comment below and I'll try to help out.

Note that if you experience any issues with copy protection (I think this is only DVDs / films and not audio CDs, which I don't intend to investigate), I can't and won't help you, because it's there for a reason (even if I don't like it) and it's illegal to remove it - so please don't comment in this specific case.

NAS, Part 1: We need a bigger rocket

In my cluster series of posts, I've been talking about how I've built a Raspberry Pi-based cluster for running compute tasks (latest update: I've got Let's Encrypt working with the DNS-01 challenge, stay tuned for a post on that soon). Currently, this has been backed by a Raspberry Pi 3 with a 1TB WD PiDrive attached. This has a number of issues:

  • The Raspberry Pi 3 has a 100Mbps network port
  • It's not redundant
  • I'm running out of storage space

I see 2 ways of solving these issues:

  1. Building a clustered file system, with 1 3.5 inch drive per Pi (or Odroid HC2 perhaps)
  2. Building a more traditional monolithic NAS

Personally, my preference here is option #1, but unfortunately due to some architectural issues in my house (read: the wiring needs redoing by an electrician) I don't actually have access to the number of wall sockets I'd need to put together a clustered setup. If I get those issues sorted, I'll certainly take a look at upgrading - but for now I've decided that I'm going to put together a more traditional monolithic NAS (maybe it can become the backup device in future, who knows) as it will only require a single wall socket (the situation is complicated. Let's just move on).

To this end, I decided to start with a case and go from there. Noise is a big concern for me, so I chose the Fractal Design Node 804, as it has a number of key features:

  • It has lots of space for disks
  • It comes with some quiet fans
  • The manufacturer appears to be quite popular and reputable

From here, I picked the basic components for the system using PC Part Picker. I haven't actually built an amd64 system from scratch before - I use laptops as my main device (see my recent review of the PC Specialist Proteus VIII), and Raspberry Pis (and an awesome little 2nd hand Netgear GS116v2 switch) currently form the backbone of my server setup.

These components included:

  • An ASUS PRIME B450M-A motherboard: 6 x SATA ports, AM4 CPU socket
  • An AMD Athlon 3000G: I don't need much compute horsepower in this build, since it's for storage (I would have got an Athlon 200GE instead as it's cheaper, but they were all out of stock)
  • 8GB Corsair Vengeance LPX DDR4 2666MHz RAM: The highest frequency the CPU supports - I got a single stick here to start with. I'll add additional sticks as and when I need them.
  • 120GB Gigabyte SSD: For the OS. Don't need a lot of storage here, since all the data is going to be on 3.5 inch HDDs instead
  • 3 x 4TB WD Red Plus WD40EFRX (CMR): These are my main data storage drives. I'm starting with three 4TB drives, and I'll add more as I need them. The Node 804 case (mentioned above) supports up to 10 disks, apparently - so I should have plenty of space.
  • SeaSonic CORE GM 500 W 80+ Gold PSU: The most efficient PSU I could afford. I would have loved an 80+ titanium (apparently they are at least 94% efficient at 50% load), but at £250+ it's too much for my budget.
  • LG GS40N DVD writer: Apparently the Node 804 case has a slimline DVD drive slot (i.e. like one you might find in a car). It wasn't too expensive and being able to ingest CD/DVDs is appealing.

For the storage in particular, my (initial) plan is to use OpenZFS in RAIDZ mode, which has a minimum requirement of 3 drives. An online calculator suggests that with the above drives I'll have 8TB of usable capacity. Initial research does suggest though that expanding a ZFS storage pool may not be as easy as I thought it was (related, see also), so more research is definitely needed before I commit to a single filesystem / set of settings there.
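As a rough sanity check on that number, the arithmetic behind it can be sketched in a few lines of shell. This is only an approximation: it assumes all drives are the same size and ignores ZFS metadata overhead and the TB vs TiB difference, so real-world figures will come out a bit lower. The function name raidz_usable_tb is made up for illustration.

```shell
#!/usr/bin/env bash
# Rough usable capacity of a single RAIDZ vdev: (drives - parity) x drive size.
# Assumptions: identical drives; filesystem overhead is ignored.
raidz_usable_tb() {
    local drives="$1" size_tb="$2" parity="${3:-1}"
    echo $(( (drives - parity) * size_tb ))
}

raidz_usable_tb 3 4    # three 4TB drives with single parity: prints 8
```

One drive's worth of space goes to parity, which is where the 8TB figure for three 4TB drives comes from.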

I've heard of BTRFS too, but I've also heard of some stability and data loss issues with it. That was several years ago though, so I'll be reviewing its suitability again before making a decision here.

In future posts, I'm going to talk about my experience assembling the build. I'm also going to look at how I eventually set up the filesystem (which filesystem I'll choose is still undecided). I'll also be running some tests on the setup to evaluate how well it performs and handles failure. Finally, I may make a bonus post in this series about the challenges I encounter migrating my existing (somewhat complicated) data storage setup to the new NAS I build.

Found this interesting? Got a suggestion? Comment below!

Resizing Encrypted LVM Partitions on Linux

I found recently that I needed to resize some partitions on my new laptop as the Ubuntu installer helpfully decided to create only a 1GB swap partition, which is nowhere near enough for hibernation (you need a swap partition that's at least as big as your computer's RAM in order to hibernate). Unfortunately resizing my swap partition didn't allow me to hibernate successfully in the end, but I thought I'd still document the process here should I need to do it again in the future.
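If you want to check whether your own swap is big enough before going any further, something like the following should do it. It's a minimal sketch that assumes a Linux system exposing /proc/meminfo; the function name swap_covers_ram is my own invention, and it only checks the rule of thumb above (swap at least as big as RAM), not whether hibernation will actually work.

```shell
#!/usr/bin/env bash
# Succeeds if the swap total (kB) is at least as large as the RAM total (kB).
swap_covers_ram() {
    local mem_kb="$1" swap_kb="$2"
    [ "$swap_kb" -ge "$mem_kb" ]
}

# Read both totals (in kB) from /proc/meminfo, if it's available.
if [ -r /proc/meminfo ]; then
    mem_kb="$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)"
    swap_kb="$(awk '/^SwapTotal:/ { print $2 }' /proc/meminfo)"
    if swap_covers_ram "${mem_kb:-0}" "${swap_kb:-0}"; then
        echo "Swap (${swap_kb:-0} kB) is at least as big as RAM (${mem_kb:-0} kB)"
    else
        echo "Swap (${swap_kb:-0} kB) is smaller than RAM (${mem_kb:-0} kB)"
    fi
fi
```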

The key problem with resizing one's root partition is that you can't resize it without unmounting it, and you can't unmount it without turning off your computer. To get around this, we need to use a live distribution of Ubuntu. It doesn't actually matter how you boot into this - personally my preferred method is by using a multiboot USB flash drive, but you could just as well flash the latest Ubuntu ISO to a flash drive directly.

Before you start though, it's worth mentioning that you really should have a solid backup strategy. While everything will probably be fine, there is a chance that you'll make a mistake and wind up losing a lot of data. My favourite website that illustrates this is The Tao of Backup. Everyone who uses a computer (technically minded or not) should read it. Another way to remember it is the 3-2-1 rule: 3 copies of your data, on 2 different types of storage, with 1 off-site (i.e. in a different physical location).

Anyway, once you've booted into a live Ubuntu environment, open the terminal and start a root shell. Your live distribution should come with LUKS and LVM already, but just in case it doesn't, execute the following:

sudo apt update && sudo apt install -y lvm2 cryptsetup

I've talked about LVM recently when I was setting up an LVM-managed partition on an extra data hard drive for my research data. If you've read that post, then the process here may feel a little familiar to you. In this case, we're interacting with a pre-existing LVM setup that's encrypted with LUKS instead of setting up a new one. The overall process looks a bit like this:

A flowchart showing the process we're going to follow. In short: open luks → LVM up → make changes → LVM down → close luks → reboot

With this in mind, let's get started. The first order of business is unlocking the LUKS encryption on the drive. This is done like so:

sudo modprobe dm-crypt
sudo cryptsetup luksOpen /dev/nvme0n1p3 crypt1

The first command there ensures that the dm-crypt kernel module (which LUKS relies on) is loaded if it isn't already, and the second unlocks the LUKS-encrypted partition, prompting you for the password and exposing the unlocked device as /dev/mapper/crypt1. Replace /dev/nvme0n1p3 with the path to your LUKS-encrypted partition - /dev/sda1 for instance.
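If you're not sure which partition is the LUKS-encrypted one, lsblk can help narrow it down. Here's a minimal sketch, assuming your version of lsblk supports the PATH output column (older releases of util-linux don't). The helper name filter_luks is made up for illustration, and it simply picks out the lines whose filesystem type is crypto_LUKS.

```shell
#!/usr/bin/env bash
# Reads "PATH FSTYPE" lines on stdin and prints the paths of LUKS containers.
filter_luks() {
    awk '$2 == "crypto_LUKS" { print $1 }'
}

# Usage on a real system (needs an lsblk with the PATH column):
#   lsblk -rno PATH,FSTYPE | filter_luks
# To double-check a single device, cryptsetup can also tell you directly:
#   sudo cryptsetup isLuks /dev/nvme0n1p3 && echo "This is a LUKS partition"
```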

Before continuing, it's worth mentioning the difference between physical partitions and LVM partitions. Physical partitions are those found in the partition table on the physical disk itself, which you may find in a partition manager like GParted.

LVM partitions - for the purpose of this blog post - are those exposed by LVM. They are virtual partitions that don't have a physical counterpart on disk and are handled internally by LVM. As far as I know, you can't ask LVM easily where it stores them on disk - this is calculated and managed automatically for you.

In order to access our logical LVM partitions, the next step is to bring up LVM. To do this, we need to get LVM to re-scan the available physical partitions since we've just unlocked the one we want it to use:

sudo vgscan --mknodes

Then, we activate it:

sudo vgchange -ay

At this point, we can now do our maintenance and make any changes we need to. A good command to remember here is lvdisplay, which lists all the available LVM partitions and their paths:

sudo lvdisplay

In my case, I have /dev/vgubuntu/root and /dev/vgubuntu/swap_1. tldr-pages (for which I'm a maintainer) has a number of great LVM-related pages that were contributed relatively recently which are really helpful here. For example, to resize a logical LVM partition to be a specific size, do something like this:

sudo lvresize -L 32G /dev/vgubuntu/root

To extend a partition to fill all the remaining available free space, do something like this:

sudo lvextend -l +100%FREE /dev/vgubuntu/root

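After growing a partition, don't forget to run resize2fs. It ensures that the ext4 filesystem on top matches the size of the logical LVM partition (you can also pass --resizefs to lvresize or lvextend to do both in one step). Note that shrinking works the other way around: the filesystem has to be shrunk before the logical LVM partition, otherwise you risk losing data.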

sudo resize2fs /dev/vgubuntu/root

In all of the above, replace /dev/vgubuntu/root with the path to your logical LVM partition in question of course.

Once you're done making changes, we need to stop LVM and close the LUKS encrypted disk to ensure all the changes are saved properly and to avoid any issues. This is done like so:

sudo vgchange -an
sudo cryptsetup luksClose crypt1

With that, you're done! You can now reboot / shutdown from inside the live Ubuntu environment and boot back into your main operating system.
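For reference, the whole sequence from this post can be sketched as a single script. Treat it as a rough outline rather than something to run blindly: the device, mapper name and logical partition path are just the examples from above, and the DRY_RUN switch and function names are my own additions. By default it only prints the commands it would run, so you can check them first.

```shell
#!/usr/bin/env bash
# Example values from this post; override them to match your own system.
LUKS_DEVICE="${LUKS_DEVICE:-/dev/nvme0n1p3}"
MAPPER_NAME="${MAPPER_NAME:-crypt1}"
LV_PATH="${LV_PATH:-/dev/vgubuntu/root}"
DRY_RUN="${DRY_RUN:-1}"   # hypothetical safety switch: 1 = print only, 0 = execute

# Either print a command or actually run it, depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# open luks -> LVM up -> make changes -> LVM down -> close luks
# Each step only runs if the previous one succeeded.
resize_sequence() {
    run sudo modprobe dm-crypt &&
    run sudo cryptsetup luksOpen "$LUKS_DEVICE" "$MAPPER_NAME" &&
    run sudo vgscan --mknodes &&
    run sudo vgchange -ay &&
    run sudo lvextend -l +100%FREE "$LV_PATH" &&
    run sudo resize2fs "$LV_PATH" &&
    run sudo vgchange -an &&
    run sudo cryptsetup luksClose "$MAPPER_NAME"
}

resize_sequence
```

If a step fails partway through when running for real, the script stops there, so you may need to run the vgchange -an and luksClose commands by hand afterwards. resize2fs may also ask you to run e2fsck -f on the partition first.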

Found this helpful? Encountering issues? Comment below! It really helps my motivation.
