Starbeamrainbowlabs

Stardust
Blog


Archive


Mailing List Articles Atom Feed Comments Atom Feed Twitter Reddit Facebook

Tag Cloud

3d 3d printing account algorithms android announcement architecture archives arduino artificial intelligence artix assembly async audio automation backups bash batch blog bookmarklet booting bug hunting c sharp c++ challenge chrome os cluster code codepen coding conundrums coding conundrums evolved command line compilers compiling compression containerisation css dailyprogrammer data analysis debugging demystification distributed computing docker documentation downtime electronics email embedded systems encryption es6 features ethics event experiment external first impressions freeside future game github github gist gitlab graphics hardware hardware meetup holiday holidays html html5 html5 canvas infrastructure interfaces internet interoperability io.js jabber jam javascript js bin labs learning library linux lora low level lua maintenance manjaro minetest network networking nibriboard node.js operating systems own your code pepperminty wiki performance phd photos php pixelbot portable privacy problem solving programming problems project projects prolog protocol protocols pseudo 3d python reddit redis reference release releases rendering resource review rust searching secrets security series list server software sorting source code control statistics storage svg talks technical terminal textures thoughts three thing game three.js tool tutorial tutorials twitter ubuntu university update updates upgrade version control virtual reality virtualisation visual web website windows windows 10 xmpp xslt

NAS Backups, Part 2: Btrfs send / receive

Hey there! In the first post of this series, I talked about my plan for a backup NAS to complement my main NAS. In this part, I'm going to show the pair of scripts I've developed to take care of backing up btrfs snapshots.

The first script is called snapshot-send.sh, and it:

  1. Calculates which snapshot it is that requires sending
  2. Uses SSH to remote into the backup NAS
  3. Pipes the output of btrfs send to snapshot-receive.sh on the backup NAS that is called with sudo

Note there that while sudo is used for calling snapshot-receive.sh, the account it uses to SSH into the backup NAS, it doesn't have completely unrestricted sudo access. Instead, a sudo rule is used to restrict it to allow only specific commands to be called (without a password, as this is intended to be a completely automated and unattended system).

The second script is called snapshot-receive.sh, and it receives the output of btrfs send and pipes it to btrfs receive. It also has some extra logic to delete old snapshots and stuff like that.

Both of these are designed to be command line programs in their own right with a simple CLI, and useful error / help messages to assist in understanding it when I come back to it to fix an issue or extend it after many months.

snapshot-send.sh

As described above, snapshot-send.sh sends btrfs snapshot to a remote host via SSH and the snapshot-receive.sh script.

Before we continue and look at it in detail, it is important to note that snapshot-send.sh depends on btrfs-snapshot-rotation. If you haven't already done so, you should set that up first before setting up my scripts here.

If you have btrfs-snapshot-rotation setup correctly, you should have something like this in your crontab:

# Btrfs automatic snapshots
0 * * * *       cronic /root/btrfs-snapshot-rotation/btrfs-snapshot /mnt/some_btrfs_filesystem/main /mnt/some_btrfs_filesystem/main/.snapshots hourly 8
0 2 * * *       cronic /root/btrfs-snapshot-rotation/btrfs-snapshot /mnt/some_btrfs_filesystem/main /mnt/some_btrfs_filesystem/main/.snapshots daily 4
0 2 * * 7       cronic /root/btrfs-snapshot-rotation/btrfs-snapshot /mnt/some_btrfs_filesystem/main /mnt/some_btrfs_filesystem/main/.snapshots weekly 4

I use cronic there to reduce unnecessary emails. I also have a subvolume there for the snapshots:

sudo btrfs subvolume create /mnt/some_btrfs_filesystem/main/.snapshots

Because Btrfs does not take take a snapshot of any child subvolumes when it takes a snapshot, I can use this to keep all my snapshots organised and associated with the subvolume they are snapshots of.

If done right, ls /mnt/some_btrfs_filesystem/main/.snapshots should result in something like this:

2021-07-25T02:00:01+00:00-@weekly  2021-08-17T07:00:01+00:00-@hourly
2021-08-01T02:00:01+00:00-@weekly  2021-08-17T08:00:01+00:00-@hourly
2021-08-08T02:00:01+00:00-@weekly  2021-08-17T09:00:01+00:00-@hourly
2021-08-14T02:00:01+00:00-@daily   2021-08-17T10:00:01+00:00-@hourly
2021-08-15T02:00:01+00:00-@daily   2021-08-17T11:00:01+00:00-@hourly
2021-08-15T02:00:01+00:00-@weekly  2021-08-17T12:00:01+00:00-@hourly
2021-08-16T02:00:01+00:00-@daily   2021-08-17T13:00:01+00:00-@hourly
2021-08-17T02:00:01+00:00-@daily   last_sent_@daily.txt
2021-08-17T06:00:01+00:00-@hourly

Ignore the last_sent_@daily.txt there for now - it's created by snapshot-send.sh so that it can remember the name of the snapshot it last sent. We'll talk about it later.

With that out of the way, let's start going through snapshot-send.sh! First up is the CLI and associated error handling:

#!/usr/bin/env bash
set -e;

dir_source="${1}";
tag_source="${2}";
tag_dest="${3}";
loc_ssh_key="${4}";
remote_host="${5}";

if [[ -z "${remote_host}" ]]; then
    echo "This script sends btrfs snapshots to a remote host via SSH.
The script snapshot-receive must be present on the remote host in the PATH for this to work.
It pairs well with btrfs-snapshot-rotation: https://github.com/mmehnert/btrfs-snapshot-rotation
Usage:
    snapshot-send.sh <snapshot_dir> <source_tag_name> <dest_tag_name> <ssh_key> <user@example.com>

Where:
    <snapshot_dir> is the path to the directory containing the snapshots
    <source_tag_name> is the tag name to look for (see btrfs-snapshot-rotation).
    <dest_tag_name> is the tag name to use when sending to the remote. This must be unique across all snapshot rotations sent.
    <ssh_key> is the path to the ssh private key
    <user@example.com> is the user@host to connect to via SSH" >&2;
    exit 0;
fi

# $EUID = effective uid
if [[ "${EUID}" -ne 0 ]]; then
    echo "Error: This script must be run as root (currently running as effective uid ${EUID})" >&2;
    exit 5;
fi

if [[ ! -e "${loc_ssh_key}" ]]; then
    echo "Error: When looking for the ssh key, no file was found at '${loc_ssh_key}' (have you checked the spelling and file permissions?)." >&2;
    exit 1;
fi
if [[ ! -d "${dir_source}" ]]; then
    echo "Error: No source directory located at '${dir_source}' (have you checked the spelling and permissions?)" >&2;
    exit 2;
fi

###############################################################################

Pretty simple stuff. snapshot-send.sh is called like so:

snapshot-send.sh /absolute/path/to/snapshot_dir SOURCE_TAG DEST_TAG_NAME path/to/ssh_key user@example.com

A few things to unpack here.

  • /absolute/path/to/snapshot_dir is the path to the directory (i.e. btrfs subvolume) containing the snapshots we want to read, as described above.
  • SOURCE_TAG: Given the directory (subvolume) name of a snapshot (e.g. 2021-08-17T02:00:01+00:00-@daily), then the source tag is the bit at the end after the at sign @ - e.g. daily.
  • DEST_TAG_NAME: The tag name to give the snapshot on the backup NAS. Useful, because you might have multiple subvolumes you snapshot with btrfs-snapshot-rotation and they all might have snapshots with the daily tag.
  • path/to/ssh_key: The path to the (unencrypted!) SSH key to use to SSH into the remote backup NAS.
  • user@example.com: The user and hostname of the backup NAS to SSH into.

This is a good time to sort out the remote user we're going to SSH into (we'll sort out snapshot-receive.sh and the sudo rules in the next section below).

Assuming that you already have a Btrfs filesystem setup and automounting on boot on the remote NAS, do this:


sudo useradd --system --home /absolute/path/to/btrfs-filesystem/backups backups
sudo groupadd backup-senders
sudo usermod -a -G backup-senders backups
cd /absolute/path/to/btrfs-filesystem/backups
sudo mkdir .ssh
sudo touch .ssh/authorized_keys
sudo chown -R backups:backups .ssh
sudo chmod -R u=rwX,g=rX,o-rwx .ssh

Then, on the main NAS, generate the SSH key:


mkdir -p /root/backups && cd /root/backups
ssh-keygen -t ed25519 -C backups@main-nas -f /root/backups/ssh_key_backup_nas_ed25519

Then, copy the generated SSH public key to the authorized_keys file on the backup NAS (located at /absolute/path/to/btrfs-filesystem/backups/.ssh/authorized_keys).

Now that's sorted, let's continue with snapshot-send.sh. Next up are a few miscellaneous functions:


# The filepath to the last sent text file that contains the name of the snapshot that was last sent to the remote.
# If this file doesn't exist, then we send a full snapshot to start with.
# We need to keep track of this because we need this information to know which
# snapshot we need to parent the latest snapshot from to send snapshots incrementally.
filepath_last_sent="${dir_source}/last_sent_@${tag_source}.txt";

## Logs a message to stderr.
# $*    The message to log.
log_msg() {
    echo "[ $(date +%Y-%m-%dT%H:%M:%S) ] remote/${HOSTNAME}: >>> ${*}";
}

## Lists all the currently available snapshots for the current source tag.
list_snapshots() {
    find "${dir_source}" -maxdepth 1 ! -path "${dir_source}" -name "*@${tag_source}" -type d;
}

## Returns an exit code of 0 if we've sent a snapshot, or 1 if we haven't.
have_sent() {
    if [[ ! -f "${filepath_last_sent}" ]]; then
        return 1;
    else
        return 0;
    fi
}

## Fetches the directory name of the last snapshot sent to the remote with the given tag name.
last_sent() {
    if [[ -f "${filepath_last_sent}" ]]; then
        cat "${filepath_last_sent}";
    fi
}

# Runs snapshot-receive on the remote host.
do_ssh() {
    ssh -o "ServerAliveInterval=900" -i "${loc_ssh_key}" "${remote_host}" sudo snapshot-receive "${tag_dest}";
}

Particularly of note is the filepath_last_sent variable - this is set to the path to that text file I mentioned earlier.

Other than that it's all pretty well commented, so let's continue on. Next, we need to determine the name of the latest snapshot:

latest_snapshot="$(list_snapshots | sort | tail -n1)";
latest_snapshot_dirname="$(dirname "${latest_snapshot}")";

With this information in hand we can compare it to the last snapshot name we sent. We store this in the text file mentioned above - the path to which is stored in the filepath_last_sent variable.

if [[ "$(dirname "${latest_snapshot_dirname}")" == "$(cat "${filepath_last_sent}")" ]]; then
    if [[ -z "${FORCE_SEND}" ]]; then
        echo "We've sent the latest snapshot '${latest_snapshot_dirname}' already and the FORCE_SEND environment variable is empty or not specified, skipping";
        exit 0;
    else
        echo "We've sent it already, but sending it again since the FORCE_SEND environment variable is specified";
    fi
fi

If the latest snapshot has the same name as the one we last send, we exit out - unless the FORCE_SEND environment variable is specified (to allow for an easy way to fix stuff if it goes wrong on the other end).

Now, we can actually send the snapshot to the remote:


if ! have_sent; then
    log_msg "Sending initial snapshot $(dirname "${latest_snapshot}")";
    btrfs send "${latest_snapshot}" | do_ssh;
else
    parent_snapshot="${dir_source}/$(last_sent)";
    if [[ ! -d "${parent_snapshot}" ]]; then
        echo "Error: Failed to locate parent snapshot at '${parent_snapshot}'" >&2;
        exit 3;
    fi

    log_msg "Sending incremental snapshot $(dirname "${latest_snapshot}") parent $(last_sent)";
    btrfs send -p "${parent_snapshot}" "${latest_snapshot}" | do_ssh;
fi

have_sent simply determines if we have previously sent a snapshot before. We know this by checking the filepath_last_sent text file.

If we haven't, then we send a full snapshot rather than an incremental one. If we're sending an incremental one, then we find the parent snapshot (i.e. the one we last sent). If we can't find it, we generate an error (it's because of this that you need to store at least 2 snapshots at a time with btrfs-snapshot-rotation).

After sending a snapshot, we need to update the filepath_last_sent text file:

log_msg "Updating state information";
basename "${latest_snapshot}" >"${filepath_last_sent}";
log_msg "Snapshot sent successfully";

....and that concludes snapshot-send.sh! Once you've finished reading this blog post and testing your setup, put your snapshot-send.sh calls in a script in /etc/cron.daily or something.

snapshot-receive.sh

Next up is the receiving end of the system. The CLI for this script is much simpler, on account of sudo rules only allowing exact and specific commands (no wildcards or regex of any kind). I put snapshot-receive.sh in /usr/local/sbin and called it snapshot-receive.

Let's get started:

#!/usr/bin/env bash

# This script wraps btrfs receive so that it can be called by non-root users.
# It should be saved to '/usr/local/sbin/snapshot-receive' (without quotes, of course).
# The following entry needs to be put in the sudoers file:
# 
# %backup-senders   ALL=(ALL) NOPASSWD: /usr/local/sbin/snapshot-receive TAG_NAME
# 
# ....replacing TAG_NAME with the name of tag you want to allow. You'll need 1 line in your sudoers file per tag you want to allow.
# Edit your sudoers file like this:
# sudo visudo

# The ABSOLUTE path to the target directory to receive to.
target_dir="CHANGE_ME";

# The maximum number of backups to keep.
max_backups="7";

# Allow only alphanumeric characters in the tag
tag="$(echo "${1}" | tr -cd '[:alnum:]-_')";

snapshot-receive.sh only takes a single argument, and that's the tag it should use for the snapshot being received:


sudo snapshot-receive DEST_TAG_NAME

The target directory it should save snapshots to is stored as a variable at the top of the file (the target_dir there). You should change this based on your specific setup. It goes without saying, but the target directory needs to be a directory on a btrfs filesystem (preferable raid1, though as I've said before btrfs raid1 is a misnomer). We also ensure that the tag contains only safe characters for security.

max_backups is the maximum number of snapshots to keep. Any older snapshots will be deleted.

Next, ime error handling:

###############################################################################

# $EUID = effective uid
if [[ "${EUID}" -ne 0 ]]; then
    echo "Error: This script must be run as root (currently running as effective uid ${EUID})" >&2;
    exit 5;
fi

if [[ -z "${tag}" ]]; then
    echo "Error: No tag specified. It should be specified as the 1st and only argument, and may only contain alphanumeric characters." >&2;
    echo "Example:" >&2;
    echo "    snapshot-receive TAG_NAME_HERE" >&2;
    exit 4;
fi

Nothing too exciting. Continuing on, a pair of useful helper functions:


###############################################################################

## Logs a message to stderr.
# $*    The message to log.
log_msg() {
    echo "[ $(date +%Y-%m-%dT%H:%M:%S) ] remote/${HOSTNAME}: >>> ${*}";
}

list_backups() {
    find "${target_dir}/${tag}" -maxdepth 1 ! -path "${target_dir}/${tag}" -type d;
}

list_backups lists the snapshots with the given tag, and log_msg logs messages to stdout (not stderr unless there's an error, because otherwise cronic will dutifully send you an email every time the scripts execute). Next up, more error handling:

###############################################################################

if [[ "${target_dir}" == "CHANGE_ME" ]]; then
    echo "Error: target_dir was not changed from the default value." >&2;
    exit 1;
fi

if [[ ! -d "${target_dir}" ]]; then
    echo "Error: No directory was found at '${target_dir}'." >&2;
    exit 2;
fi

if [[ ! -d "${target_dir}/${tag}" ]]; then
    log_msg "Creating new directory at ${target_dir}/${tag}";
    mkdir "${target_dir}/${tag}";
fi

We check:

  • That the target directory was changed from the default CHANGE_ME value
  • That the target directory exists

We also create a subdirectory for the given tag if it doesn't exist already.

With the preamble completed, we can actually receive the snapshot:

log_msg "Launching btrfs in chroot mode";

time nice ionice -c Idle btrfs receive --chroot "${target_dir}/${tag}";

We use nice and ionice to reduce the priority of the receive to the lowest possible level. If you're using a Raspberry Pi (I have a Raspberry Pi 4 with 4GB RAM) like I am, this is important for stability (Pis tend to fall over otherwise). Don't worry if you experience some system crashes on your Pi when transferring the first snapshot - I've found that incremental snapshots don't cause the same issue.

We also use the chroot option there for increased security.

Now that the snapshot is transferred, we can delete old snapshots if we have too many:

backups_count="$(echo -e "$(list_backups)" | wc -l)";

log_msg "Btrfs finished, we now have ${backups_count} backups:";
list_backups;

while [[ "${backups_count}" -gt "${max_backups}" ]]; do
    oldest_backup="$(list_backups | sort | head -n1)";
    log_msg "Maximum number backups is ${max_backups}, requesting removal of backup for $(dirname "${oldest_backup}")";

    btrfs subvolume delete "${oldest_backup}";

    backups_count="$(echo -e "$(list_backups)" | wc -l)";
done

log_msg "Done, any removed backups will be deleted in the background";

Sorted! The only thing left to do here is to setup those sudo rules. Let's do that now. Execute sudoedit /etc/sudoers, and enter the following:

%backup-senders ALL=(ALL) NOPASSWD: /usr/local/sbin/snapshot-receive TAG_NAME

Replace TAG_NAME with the DEST_TAG_NAME you're using. You'll need 1 entry in /etc/sudoers for each DEST_TAG_NAME you're using.

We assign the rights to the backup-senders group we created earlier, of which the user we are going to SSH in with is a member. This make the system more flexible should we want to extend it later.

Warning: A mistake in /etc/sudoers can leave you unable to use sudo! Make sure you have a root shell open in the background and that you test sudo again after making changes to ensure you haven't made a mistake.

That completes the setup of snapshot-receive.sh.

Conclusion

With snapshot-send.sh and snapshot-receive.sh, we now have a system for transferring snapshots from 1 host to another via SSH. If combined with full disk encryption (e.g. with LUKS), this provides a secure backup system with a number of desirable qualities:

  • The main NAS can't access the backups on the backup NAS (in case fo ransomware)
  • Backups are encrypted during transfer (via SSH)
  • Backups are encrypted at rest (LUKS)

To further secure the backup NAS, one could:

  • Disable SSH password login
  • Automatically start / shutdown the backup NAS (though with full disk encryption when it boots up it would require manual intervention)

At the bottom of this post I've included the full scripts for you to copy and paste.

As it turns out, there will be 1 more post in this series, which will cover generating multiple streams of backups (e.g. weekly, monthly) from a single stream of e.g. daily backups on my backup NAS.

Sources and further reading

Full scripts

snapshot-send.sh

#!/usr/bin/env bash
set -e;

dir_source="${1}";
tag_source="${2}";
tag_dest="${3}";
loc_ssh_key="${4}";
remote_host="${5}";

if [[ -z "${remote_host}" ]]; then
    echo "This script sends btrfs snapshots to a remote host via SSH.
The script snapshot-receive must be present on the remote host in the PATH for this to work.
It pairs well with btrfs-snapshot-rotation: https://github.com/mmehnert/btrfs-snapshot-rotation
Usage:
    snapshot-send.sh <snapshot_dir> <source_tag_name> <dest_tag_name> <ssh_key> <user@example.com>

Where:
    <snapshot_dir> is the path to the directory containing the snapshots
    <source_tag_name> is the tag name to look for (see btrfs-snapshot-rotation).
    <dest_tag_name> is the tag name to use when sending to the remote. This must be unique across all snapshot rotations sent.
    <ssh_key> is the path to the ssh private key
    <user@example.com> is the user@host to connect to via SSH" >&2;
    exit 0;
fi

# $EUID = effective uid
if [[ "${EUID}" -ne 0 ]]; then
    echo "Error: This script must be run as root (currently running as effective uid ${EUID})" >&2;
    exit 5;
fi

if [[ ! -e "${loc_ssh_key}" ]]; then
    echo "Error: When looking for the ssh key, no file was found at '${loc_ssh_key}' (have you checked the spelling and file permissions?)." >&2;
    exit 1;
fi
if [[ ! -d "${dir_source}" ]]; then
    echo "Error: No source directory located at '${dir_source}' (have you checked the spelling and permissions?)" >&2;
    exit 2;
fi

###############################################################################

# The filepath to the last sent text file that contains the name of the snapshot that was last sent to the remote.
# If this file doesn't exist, then we send a full snapshot to start with.
# We need to keep track of this because we need this information to know which
# snapshot we need to parent the latest snapshot from to send snapshots incrementally.
filepath_last_sent="${dir_source}/last_sent_@${tag_source}.txt";

## Logs a message to stderr.
# $*    The message to log.
log_msg() {
    echo "[ $(date +%Y-%m-%dT%H:%M:%S) ] remote/${HOSTNAME}: >>> ${*}";
}

## Lists all the currently available snapshots for the current source tag.
list_snapshots() {
    find "${dir_source}" -maxdepth 1 ! -path "${dir_source}" -name "*@${tag_source}" -type d;
}

## Returns an exit code of 0 if we've sent a snapshot, or 1 if we haven't.
have_sent() {
    if [[ ! -f "${filepath_last_sent}" ]]; then
        return 1;
    else
        return 0;
    fi
}

## Fetches the directory name of the last snapshot sent to the remote with the given tag name.
last_sent() {
    if [[ -f "${filepath_last_sent}" ]]; then
        cat "${filepath_last_sent}";
    fi
}

do_ssh() {
    ssh -o "ServerAliveInterval=900" -i "${loc_ssh_key}" "${remote_host}" sudo snapshot-receive "${tag_dest}";
}

latest_snapshot="$(list_snapshots | sort | tail -n1)";
latest_snapshot_dirname="$(dirname "${latest_snapshot}")";

if [[ "$(dirname "${latest_snapshot_dirname}")" == "$(cat "${filepath_last_sent}")" ]]; then
    if [[ -z "${FORCE_SEND}" ]]; then
        echo "We've sent the latest snapshot '${latest_snapshot_dirname}' already and the FORCE_SEND environment variable is empty or not specified, skipping";
        exit 0;
    else
        echo "We've sent it already, but sending it again since the FORCE_SEND environment variable is specified";
    fi
fi

if ! have_sent; then
    log_msg "Sending initial snapshot $(dirname "${latest_snapshot}")";
    btrfs send "${latest_snapshot}" | do_ssh;
else
    parent_snapshot="${dir_source}/$(last_sent)";
    if [[ ! -d "${parent_snapshot}" ]]; then
        echo "Error: Failed to locate parent snapshot at '${parent_snapshot}'" >&2;
        exit 3;
    fi

    log_msg "Sending incremental snapshot $(dirname "${latest_snapshot}") parent $(last_sent)";
    btrfs send -p "${parent_snapshot}" "${latest_snapshot}" | do_ssh;
fi


log_msg "Updating state information";
basename "${latest_snapshot}" >"${filepath_last_sent}";
log_msg "Snapshot sent successfully";

snapshot-receive.sh

#!/usr/bin/env bash

# This script wraps btrfs receive so that it can be called by non-root users.
# It should be saved to '/usr/local/sbin/snapshot-receive' (without quotes, of course).
# The following entry needs to be put in the sudoers file:
# 
# %backup-senders   ALL=(ALL) NOPASSWD: /usr/local/sbin/snapshot-receive TAG_NAME
# 
# ....replacing TAG_NAME with the name of tag you want to allow. You'll need 1 line in your sudoers file per tag you want to allow.
# Edit your sudoers file like this:
# sudo visudo

# The ABSOLUTE path to the target directory to receive to.
target_dir="CHANGE_ME";

# The maximum number of backups to keep.
max_backups="7";

# Allow only alphanumeric characters in the tag
tag="$(echo "${1}" | tr -cd '[:alnum:]-_')";

###############################################################################

# $EUID = effective uid
if [[ "${EUID}" -ne 0 ]]; then
    echo "Error: This script must be run as root (currently running as effective uid ${EUID})" >&2;
    exit 5;
fi

if [[ -z "${tag}" ]]; then
    echo "Error: No tag specified. It should be specified as the 1st and only argument, and may only contain alphanumeric characters." >&2;
    echo "Example:" >&2;
    echo "    snapshot-receive TAG_NAME_HERE" >&2;
    exit 4;
fi

###############################################################################

## Logs a message to stderr.
# $*    The message to log.
log_msg() {
    echo "[ $(date +%Y-%m-%dT%H:%M:%S) ] remote/${HOSTNAME}: >>> ${*}";
}

list_backups() {
    find "${target_dir}/${tag}" -maxdepth 1 ! -path "${target_dir}/${tag}" -type d;
}

###############################################################################

if [[ "${target_dir}" == "CHANGE_ME" ]]; then
    echo "Error: target_dir was not changed from the default value." >&2;
    exit 1;
fi

if [[ ! -d "${target_dir}" ]]; then
    echo "Error: No directory was found at '${target_dir}'." >&2;
    exit 2;
fi

if [[ ! -d "${target_dir}/${tag}" ]]; then
    log_msg "Creating new directory at ${target_dir}/${tag}";
    mkdir "${target_dir}/${tag}";
fi

log_msg "Launching btrfs in chroot mode";

time nice ionice -c Idle btrfs receive --chroot "${target_dir}/${tag}";

backups_count="$(echo -e "$(list_backups)" | wc -l)";

log_msg "Btrfs finished, we now have ${backups_count} backups:";
list_backups;

while [[ "${backups_count}" -gt "${max_backups}" ]]; do
    oldest_backup="$(list_backups | sort | head -n1)";
    log_msg "Maximum number backups is ${max_backups}, requesting removal of backup for $(dirname "${oldest_backup}")";

    btrfs subvolume delete "${oldest_backup}";

    backups_count="$(echo -e "$(list_backups)" | wc -l)";
done

log_msg "Done, any removed backups will be deleted in the background";

NAS Backups, Part 1: Overview

After building my nice NAS, the next thing on my sysadmin todo list was to ensure it is backed up. In this miniseries (probably 2 posts), I'm going to talk about the backup NAS that I've built. As of the time of typing, it is sucessfully backing up my Btrfs subvolumes.

In this first post, I'm going to give an overview of the hardware I'm using and the backup system I've put together. In future posts, I'll go into detail as to how the system works, and which areas I still need to work on moving forwards.

Personally, I find that the 3-2-1 backup strategy is a good rule of thumb:

  • 3 copies of the data
  • 2 different media
  • 1 off-site

What this means is tha you should have 3 copies of your data, with 2 local copies and one remote copy in a different geographical location. To achieve this, I have solved first the problem of the local backup copy, since it's a lot easier and less complicated than the remote one. Although I've talked about backups before (see also), in this case my solution is slightly different - partly due to the amount of data involved, and partly due to additional security measures I'm trying to get into place.

Hardware

For hardware, I'm using a Raspberry Pi 4 with 4GB RAM (the same as the rest of my cluster), along with 2 x 4 TB USB 3 external hard drives. This is a fairly low-cost and low-performance solution. I don't particularly care how long it takes the backup to complete, and it's relatively cheap to replace if it fails without being unreliable (Raspberry Pis are tough little things).

Here's a diagram of how I've wired it up:

(Can't see the above? Try a direct link to the SVG. Created with drawio.)

I use USB Y-cables to power the hard rives directly from the USB power supply, as the Pi is unlikely to be able to supply enough power for mulitple external hard drives on it's own.

Important Note: As I've discovered with a different unrelated host on my network, if you do this you can back-power the Pi through the USB Y cable, and potentially corrupt the microSD card by doing so. Make sure you switch off the entire USB power supply at once, rather than unplug just the Pi's power cable!

For a power supply, I'm using an Anker 10 port device (though I bought through Amazon, since I wasn't aware that Anker had their own website) - the same one that powers my Pi cluster.

Strategy

To do the backup itself I'm using the fact that I store my data in Btrfs subvolumes and the btrfs send / btrfs receive commands to send my subvolumes to the remote backup host over SSH. This has a number of benefits:

  1. The host doing the backing up has no access to the resulting backups (so if it gets infected it can't also infect the backups)
  2. The backups are read-only Btrfs snapshots (so if the backup NAS gets infected my backups can't be altered without first creating a read-write snapshot)
  3. Backups are done incrementally to save time, but a full backup is done automatically on the first run (or if the local metadata is missing)

While my previous backup solution using Restic for the server that sent you this web page has point #3 on my list above, it doesn't have points 1 and 2.

Restic does encrypt backups at rest though, which the system I'm setting up doesn't do unless you use LUKS to encrypt the underlying disks that Btrfs stores it's data on. More on that in the future, as I have tentative plans to deal with my off-site backup problem using a similar technique to that which I've used here that also encrypts data at rest when a backup isn't taking place.

In the next post, I'll be diving into the implementation details for the backup system I've created and explaining it in more detail - including sharing the pair of scripts that I've developed that do the heavy lifting.

NAS, Part 4: Time machines | Automatic snapshotting with btrfs-snapshot

In the last part in this series, I compared ZFS with Btrfs. I ended up choosing Btrfs because it was easier to install and came with a number of advantages. Since last time, I've now put Btrfs to work and have about ~1.3 TiB of data stored in it (much of which is from various devices across the network automatically backing up to it). Before we continue, here's a list of the parts in the series so far:

In this post, I'm going to talk about the automatic snapshotting I've setup. Btrfs supports creating snapshots, which are defined as subvolumes that are seeded with data from another subvolume (boundaries between subvolumes are not crossed). Most of the time, these are created to be read-only. In addition because of the copy-on-write system Btrfs uses, a snapshot takes no disk space on its own (other than that required to store the fact that it exists) - it only starts to consume disk space when files that it contains are modified in the original subvolume.

To this end, we can efficiently keep a rotating series of snapshots to serve as an initial safety net should a someone accidentally delete a file. Of course, we can't assume that snapshots will be ok as the only backup (I use Restic for that - I'm in the process of reconfiguring it for my new setup) - but they are still useful things to have.

To take a Btrfs snapshot, you can do this:

sudo btrfs subvolume snapshot -r path/to/source_subvolume path/to/target

The problem here, of course, is that you also need a way to delete old snapshots too. While I could roll my own solution for this, I figured that someone has already solved this problem - so it might save me some effort if I look for a pre-existing solution first.

After doing a bit of searching without success, I asked on Reddit, and the helpful folks there gave me a number of suggestions:

Of these 3, snapper seemed to be the most popular. From some reading, it appeared to be powerful and flexible - at the cost of being easy to understand. btrbk seemed to be feature-packed too, but in the end I decided on btrfs-snapshot.

btrfs-snapshot is designed to be used with cron. For example, I have something like this for one of my subvolumes in root user's crontab:

0 * * * *       /root/btrfs-snapshot-rotation/btrfs-snapshot path/to/subvolume path/to/subvolume/.snapshots hourly 8
0 2 * * *       /root/btrfs-snapshot-rotation/btrfs-snapshot path/to/subvolume path/to/subvolume/.snapshots daily 4
0 2 * * 7       /root/btrfs-snapshot-rotation/btrfs-snapshot path/to/subvolume path/to/subvolume/.snapshots weekly 4

Given a subvolume at path/to/subvolume, it creates the following snapshots in a nested subvolume in path/to/subvolume/.snapshots (which needs to be created manually: sudo btrfs subvolume create path/to/subvolume/.snapshots):

  • 8 x hourly snapshots
  • 4 x daily snapshots
  • 4 x weekly snapshots

I find the system so beautifully simple and easy to understand. This is important for me in a system like this, as it has to be easy for me to understand when I inevitably come back to it months or even years later when I've forgotten how it works. The arguments to btrfs-snapshot are easy to understand, and are in the form path/to/source path/to/target tag_name number_of_snapshots_to_keep.

This has the added bonus that if a user deletes a file accidentally in our shared drive, they can retrieve it on their own from the .snapshots directory - without my intervention.

With this in place and the data (mostly) moved over, my NAS project is almost complete. The final task I have left to do is to setup a proper backup system with Restic to either a remote (e.g. Backblaze B2) or offline location (such as an external HDD).

The latter might prove to be a problem though, since the maximum amount of data I can store right now is 5.5 TiB and is only going to grow from there. Portable external hard drives I've seen online don't appear to go up that high, so I suspect I'll need to choose another plan.

Should I encounter some interesting issues when setting this final backup step up, I'll make an additional post in this series. If not though, this will probably be the last entry in this series. If you have any questions about my setup, please comment below! I'll dod my best to answer any questions.

NAS, Part 3: Decisions | Choosing a Filesystem

It's another entry in my NAS series! It's still 2020 for me as I type this, but I hope that 2021 is going well. Before we continue, I recommend checking out the previous posts in this series:

Part 1 in particular is useful for context as to the hardware I'm using. Part 2 is a review of my experience assembling the system. In this part, we're going to look at my choice of filesystem and OS.

I left off in the last post after I'd booted into the installer for Ubuntu Server 20.04. After running through that installer, I performed my collection of initial setup tasks for any server I manage:

  • Setup an SSH server
  • Enable UFW
  • Setup my personal ~/bin folder
  • Assign a static IP address (why won't you let me choose an IP, Netgear RAX120? Your UI lets me enter a custom IP, but it devices don't ultimately end up with the IP I tell you to assign to them....)
  • Setup Collectd
  • A number of other tasks I forget

With my basic setup completed, I also setup a few things specific to devices that have SMART-enabled storage devices:

  • Setup an email relay (via autossh) for mail delivery
  • Installed smartd (which sends you emails when there's something wrong with 1 your disks)
  • Installed and configured hddtemp, and integrated it with collectd (a topic for another post, I did this for the first time)

With these out of the way and after making a mental note to sort out backups, I could now play with filesystems with a view to making a decision. The 2 contenders:

  • (Open)ZFS
  • Btrfs

Both of these filesystems are designed to be spread across multiple disks in what's known as a pool thereof. The idea behind them is to enable multiple disks to be presented to the user as a single big directory, with the complexities as to which disk(s) a file is/are stored on. They also come with extra nice features, such as checksumming (which allows them to detect corruption), snapshotting (taking snapshots of what the filesystem looks like at a given point in time), automatic data deduplication, compression, snapshot send / receiving, and more!

Overview: ZFS

ZFS is a filesystem originally developed by Sun Microsystems in 2001. Since then, it has been continually developed and improved. After Oracle bought Sun Microsystems in 2010, the source code for ZFS was closed - hence the OpenZFS fork was born. It's licenced under the CDDL, which isn't compatible with the GPLv2 used by the Linux Kernel. This causes some minor installation issues.

As a filesystem, it seems to be widely accepted to be rock solid and mature. It's used across the globe by home users and businesses both large and small to store huge volumes of data. Given its long history, it has proven its capability to store data safely.

It does however have some limitations. For one, it only has limited support for adding drives to a zpool (a pool of disks in the ZFS world), which is a problem for me - as I'd prefer to have the ability to add drives 1 at a time. It also has limited support for changing key options such as the compression algorithm later, as this will only affect new files - and the only way to recompress old files is to copy them in and out of the disk again.

Overview: Btrfs

Btrfs, or B-Tree File System is a newer filesystem that development upon which began in 2007, and was accepted into the Linux Kernel in 2009 with the release of version 1.0. It's licenced under the GPLv2, the same licence as the Linux Kernel. As of 2020, many different distributions of Linux ship with btrfs installed by default - even if it isn't the default filesystem (that's ext4 in most cases).

Unlike ZFS, Btrfs isn't as well-tested in production settings. In particular, it's raid5 and raid6 modes of operation are not well tested (though this isn't a problem, since raid1 operates at file/block level and not disk level as it does with ZFS, which enables us to use interesting setups like raid1 striped across 3 disks). Despite this, it does look to be stable enough - particularly as openSUSE has set it to be the default filesystem.

It has a number of tempting features over ZFS too. For example, it supports adding drives 1 at a time, and you can even convert your entire pool from 1 raid level to another dynamically while it's still mounted! The same goes for converting between compression algorithms - it's all done using a generic filter system.

Such a system is useful when adding new disks to the pool too, as they it can be used to rebalance data across all the disks present - allowing for new disks to be accounted for and faulty disks to be removed, preserving the integrity of the data while a replacement disk is ordered for example.

While btrfs does have a bold list of features that they'd like to implement, they haven't gotten around to all of them yet (the status of existing features can be found here). For example, while ZFS can use an SSD as a dedicated caching device, btrfs doesn't yet have this ability - and nobody appears to have claimed the task on the wiki.

Performance

Inspired by a recent Ars Technica article, I'd like to test the performance of the 2 filesystems at hand. I ran the following tests for reading and writing separately:

  • 4k-random: Single 4KiB random read/write process
  • 64k-random-16p: 16 parallel 64KiB random read/write processes
  • 1m-random: Single 1MiB random write process

I did this for both ZFS in raid5 mode, and Btrfs in raid5 (though if I go with btrfs I'll be using raid1, as I later discovered - which I theorise would yield a minor performance improvement). I tested ZFS twice: once with gzip compression, and again with zstd compression. As far as I can tell, Btrfs doesn't have compression enabled by default. Other than the compression mode, no other tuning was done - all the settings were left at their defaults. Both filesystems were completely empty aside from the test files, which were created automatically in a chowned subdirectory by fio.

Graphs showing the results of the above tests. See the discussion below.

The graph uses a logarithmic scale. My initial impressions are that ZFS benefits from parallelisation to a much greater extent than btrfs - though I suspect that I may be CPU bound here, which is an unexpected finding. I may also be RAM-bound too, as I observed a significant increase in RAM usage when both filesystems were under load. Buying another 8GB would probably go a long way to alleviating that issue.

Other than that, zstd appears to provide a measurable performance improvement over gzip compression. Btrfs also appears to benefit from writing larger blocks over smaller ones.

Overall, some upgrades to my NAS are on the cards should I be unsatisfied with the performance in future:

  • More RAM would assist in heavy i/o loads
  • A better CPU would probably raise the peak throughput speeds - if I can figure out what to do with the old one

But for now, I'm perfectly content with these speeds. Especially since I have a single gigabit ethernet port on my storage NAS, I'm not going to need anything above 1000Mbps - which is 119.2 MiB/s if you'd like to compare against the graph above.

Conclusion

As for my final choice of filesystem, I think I'm going to go with btrfs. While I'm aware that it isn't as 'proven' as ZFS - and slightly less performant too - I have a number of reasons for this decision:

  1. Btrfs allows you to add disks 1 at a time, and ZFS makes this difficult
  2. Btrfs has the ability to convert to a different raid level at a later date if I change my mind
  3. Btrfs is easier to install, since it's already built-in to Ubuntu Server 20.04.

NAS, Part 2: Assembly and Installation

Welcome back! This is part 2 of a series of posts about my new NAS (network attached storage) device I'm building. If you haven't read it yet, I recommend you go back and read part 1, in which I talk about the hardware I'm using.

Since the Fractal Design Node 804 case came first, I was able to install the parts into it as they arrived. First up was the motherboard (an ASUS PRIME B450M-A) and CPU (an AMD Athlon 3000G).

The motherboard was a pain. As I read, the middle panel of the case has some flex in it, so you've got to hold it in place with one hand we you're screwing the motherboard in. This in and of itself wasn't an issue at all, but the screws for the motherboard were really stiff. I think this was just the motherboard, but it was annoying.

Thankfully I managed it though, and then set to work installing the CPU. This went well - the CPU came with thermal paste on top already, so I didn't need to buy my own. The installation process for the stock CPU heatsink + fan was unfamiliar, which took me a moment to decipher how the mechanism worked.

Following this, I connected the front ports from the case up to the motherboard (consulting my motherboard's documentation showed me where I needed to plug these in - I remember this being something I struggled with when I last built an (old) PC when doing some IT technician work experience some years ago). The RAM - while a little stiff (to be expected) - went in fine too. I might buy another stick later if I run into memory pressure, but I thought a single 8GB stick would be a good place to start.

The case came with a dedicated fan controller board that has a high / medium / low switch on the back too, so I wired up the 3 included Noctua case fans to this instead of the slots on the motherboard. The CPU fan (nothing special yet - just the stock fan that came with the CPU) went into the motherboard though, as the fan controller didn't have room - and I thought that the motherboard would be better placed to control the speed of that one.

The inside of the 2 sides of the case.

(Above: The inside of the 2 sides of the case. Left: The 'hot' side, Right: The 'cold' side.)

The case is split into 2 sides: 1 for 'hot' components (e.g. the motherboard and CPU), and another for 'cold' components (e.g. the HDDs and PSU). Next up were the hard disks - so I mounted the SSD for the operating system to the base of the case in the 'hot' side, as the carriage in the cold side fits only 3.5 inch disks, and my SSD is a 2.5 inch disk. While this made the cabling slightly awkward, it all worked out in the end.

For the 3.5 inch HDDs (for data storage), I found I was unable to mount them with the included pieces of bracket metal that allow you to put screws into the bottom set of holes - as the screws wouldn't fit through the top holes. I just left the metal bracket pieces out and mounted the HDDs directly into the carriage, and it seems to have worked well so far.

The PSU was uneventful too. It fit nicely into the space provided, and the semi-modular nature of the cables provided helped tremendously to avoid a mess of cables all over the place as I could remove the cables I didn't need.

Finally, the DVD writer had some stiff screws, but it seemed to mount well enough (just a note: I've been having an issue I need to investigate with this DVD drive whereby I can't take a copy of a disk - e.g. the documentation CD that came with my motherboard - with dd, as it reports an IO error. I need to investigate this further, so more on that in a later post).

The installation of the DVD drive completed the assembly process. To start it up for the first time, I connected my new NAS to my television temporarily so that I could see the screen. The machine booted fine, and I dove straight into the BIOS.

The BIOS that comes with the ASUS motherboard I bought

(Above: The BIOS that comes with the ASUS motherboard, before the clock was set by Ubuntu Server 20.04 - which I had yet to install)

Unlike my new laptop, the BIOS that comes with the ASUS motherboard is positively delightful. It has all the features you'd need, laid out in a friendly interface. I observed some minor input lag, but considering this is a BIOS we're talking about here I can definitely overlook that. It even has an online update feature, where you can plug in an Ethernet cable and download + install BIOS updates from the Internet.

I tweaked a few settings here, and then rebooted into my flash drive - onto which I loaded an Ubuntu Server 20.04 ISO. It booted into this without complaint (unlike a certain laptop I'm rather unhappy with at the moment), and then I selected the appropriate ISO and got to work installing the operating system (want your own multiboot flash drive? I've blogged about that already! :D).

In the next post, I'm going to talk about the filesystem I ultimately chose. I'm also going to show and discuss some performance tests I ran using fio following this Ars Technica guide.

A comparison of compression formats for storing JSON

Happy new year, everyone!

I've blogged about different aspects of a (not so?) little project of mine several times now (exhibits A, B, C, D, and finally E - even if I didn't know it at the time), but it appears that I end up running into all sorts of interesting problems that I invent cool solutions for. I also find myself doing a bunch of research that I'm surprised nobody's compiled into a single place yet - as is the case in this blog post. (Also, Happy 2018 everyone! First post of the year :D)

I've been refactoring the subsystem that saves a considerable amount of JSON data to a bunch of different files on disk. Obviously, I'm interested in minimising the amount of space this JSON data takes up on disk. As this saving process happens in the background on a separate thread, I'm not too concerned about performance - other than it can't be too slow. With this in mind, I've found myself testing a bunch of different compression algorithms. Let's introduce our test data:

  1. A 17MiB card game dataset, as a single minified JSON file
  2. A 40KiB 'live' specimen chunk's data, saved as a single minified JSON file.

I can't remember which card game the first dataset is from, but I do know that I found it in the awesome JSON datasets list. Next up, here's our cast of compression algorithms we'll be testing:

  1. The venerable GZip
  2. BZip2 - Apparently GZIP, but smaller and slightly more computationally expensive
  3. XZ (the newer child of LZMA2)
  4. 7zip
  5. Google's Brotli

A colourful cast, to be sure! Let's run them through their paces - starting with the card game dataset. Here are the results I observe with each set to their default settings:

Format Size
Uncompressed 17M
gzip 2.4M
bzip2 1.6M
xz 1.3M
7zip 1.4M
brotli 1.3M

Very interesting. It looks like xz and brotli are tied in first place - though I observed that brotli took ages in comparison to all the other algorithms I tested - and upon closer inspection xz beat it by 17.3KiB. Numbers are all very well, but to really see what's going on here, let's plot it on a graph:

A graph of the data in the table above.

That's better! I can actually make some comparisons now. From this graph we can observe that gzip is the worst of the lot, followed by bzip2. 7zip is surprisingly in third place, but then again it is designed for multiple files, whereas the rest of them are designed for a single stream of data. In second place is the terribly slow brotli, and finally in first place is xz.

Hrm - very interesting. How do our algorithms measure up when confronted a smaller load though? Here are the results for the sample chunk data:

Format Size
Uncompressed 40K
gzip 5.4K
bzip2 4.3K
xz 4.6K
7zip 4.9K
Brotli 4.6K

Interesting results, to be sure, but I can't discern much from that. Let's plot a graph:

A graph of the data in the above table. Further explanation below.

Very interesting. With smaller loads, it appears that bzip2 performs much better with smaller loads than any other algorithm. While gzip is still the worst performing algorithm, while xz and brotli, surprisingly, performed much worse than bzip2.

To that end, I'm think I'm going to be choosing bzip2 as my compression of choice for this job, as it produces the best results for the type of work I'm going to be doing.

I'm really surprised about brotli though, actually. I had high hopes for it, considering it's a new algorithm invented by Google. They claimed that it would provide xz-like compression with gzip-like speeds - but from what I'm seeing, it does anything but.

Sources and Further Reading

Art by Mythdael