Archive

## Mounting LVM partitions from the terminal on Linux

Hello there! Recently I found myself with the interesting task of mounting an LVM partition by hand. It wasn't completely straightforward and there was a bunch of guesswork involved, so I thought I'd document the process here.

For those who aren't aware, LVM stands for the Logical Volume Manager, and it's present on Linux systems to make managing partitions easier. It can:

• Move and resize partitions while they are still mounted
• Span multiple disks

....but to my knowledge it doesn't have any redundancy (use Btrfs) or encryption (use LUKS) built in. It is commonly used to manage the partitions on your Linux desktop, as then you don't need to reboot it into a live Linux environment to fiddle with your partitions as much.

LVM works on a layered system. There are 3 layers to it:

1. Physical Volumes: Normal physical partitions on the disk.
2. Volume Groups: Groups of logical (LVM) partitions.
3. Logical Volumes: LVM-managed partitions.

In summary, logical volumes are part of a volume group, which spans 1 or more physical disks.

With this in mind, first list the available volume groups, and identify which is the one you want to mount:

sudo vgdisplay

Notice the VG Size in the output. Comparing it with the output of lsblk -o NAME,RO,SIZE,RM,TYPE,MOUNTPOINT,LABEL,VENDOR,MODEL can be helpful to identify which one is which.

I encountered a situation where I had 2 volume groups with the same name - one from the host system I was working on, and another from the target disk I was trying to mount. In my situation each disk had its own volume group assigned to it, so I needed to rename one of the volume groups.

To do this, take the value of the VG UUID field of the volume group you want to rename from the output of sudo vgdisplay above, and then rename it like this:

sudo vgrename SOME_ID NEW_NAME

...for example, I did this:

sudo vgrename 5o1LoG-jFdv-v1Xm-m0Ca-vYmt-D5Wf-9AAFLm examplename

With that done, we can now locate the logical volume we want to mount. Do this by listing the logical volumes in the volume group you're interested in:

sudo lvdisplay vg_name

Note down the name of the logical volume you want to mount. Now we just need to figure out where it is actually located in /dev so that we can mount it. Despite the LV Path field appearing to show us this, it's not actually correct - at least on my system.

Instead, list the contents of /dev/mapper:

ls /dev/mapper

You should see the name of the logical volume that you want to mount in the form volumegroup-logicalvolumename. Once found, you should be able to mount it like so:

sudo mount /dev/mapper/volumegroup-logicalvolumename path/to/directory

...replacing path/to/directory with the path to the (empty) directory you want to mount it to.
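One detail worth knowing when reading /dev/mapper: device-mapper joins the volume group and logical volume names with a single hyphen, and doubles any hyphens that appear inside the names themselves. The following is a minimal sketch of that naming rule; the function names escape_dm and lv_mapper_path are my own and are not part of LVM.

```shell
#!/usr/bin/env bash
# Doubles every hyphen in a name, as device-mapper does for VG and LV names.
# (escape_dm and lv_mapper_path are hypothetical helper names.)
escape_dm() {
    printf '%s' "$1" | sed 's/-/--/g'
}

# $1    The volume group name.
# $2    The logical volume name.
# Prints the expected path under /dev/mapper.
lv_mapper_path() {
    echo "/dev/mapper/$(escape_dm "$1")-$(escape_dm "$2")"
}

lv_mapper_path "examplename" "root"     # /dev/mapper/examplename-root
lv_mapper_path "my-vg" "home-data"      # /dev/mapper/my--vg-home--data
```

This explains why a volume group called my-vg shows up with a double hyphen in the listing.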

If you can't find it, then it is probably because you plugged the drive in question in after you booted up. In this case, it's probable that the volume group is not active. You can check whether this is the case like so:

sudo lvscan

If it isn't active, then you can activate it like this:

sudo lvchange -a y vg_name

...replacing vg_name with the name of the volume group you want to activate. Once done, you can then mount the logical volume as I mentioned above.

Once you are done, unmounting it is a case of reversing these steps. First, unmount the partition:

sudo umount path/to/mount_point

Then, disable the volume group again:

sudo lvchange -a n vg_name

Finally, flush any cached writes to disk, just in case:

sync

Now, you can unplug the device from your machine.
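If you do this often, the three teardown steps can be wrapped in a small function. This is only a sketch: the names run and lvm_detach and the DRY_RUN variable are my own inventions, and the dry-run mode simply prints the commands so you can check them before running anything as root.

```shell
#!/usr/bin/env bash
# Runs a command, or only prints it when DRY_RUN=1 is set.
# (run, lvm_detach and DRY_RUN are hypothetical names for this sketch.)
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1    The mount point to unmount.
# $2    The volume group to deactivate afterwards.
lvm_detach() {
    local mount_point="$1"
    local vg_name="$2"
    run sudo umount "${mount_point}" || return 1
    run sudo lvchange -a n "${vg_name}" || return 1
    run sync
}

# Print the commands without executing them
DRY_RUN=1 lvm_detach /mnt/target examplename
```

Dropping the DRY_RUN=1 prefix runs the commands for real, stopping at the first one that fails.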

That wraps up this quick tutorial. If you spot any mistakes in this, please do leave a comment below and I'll correct it.

## Centralising logs with rsyslog

I manage quite a number of servers at this point, and something that's been on my mind for a while now is centralising all the log files generated by them. By this, specifically I mean that I want to automatically gather all logs generated by all the systems I manage into a single place in real time.

While there are enterprise-grade log management setups such as the ELK stack (elasticsearch, logstash, and kibana), as far as I'm aware they are all quite heavy and given my infrastructure is Raspberry Pi based (seriously, they use hardly any electricity at all compared to a regular desktop PC), with such a setup I would likely need multiple Pis to run it.

With this in mind, I'm opting for a different kind of log management system, which I'm basing on rsyslog (which is installed by default in most Linux distros) and lnav (which I've blogged about before: lnav basics tutorial). This runs much lighter, requiring only a fraction of a Raspberry Pi to operate. That's good, since the Raspberry Pi I've dedicated to monitoring the rest of the infrastructure currently also handles:

1. Continuous Integration: Laminar (this will eventually be a Docker container on my Hashicorp Nomad cluster)
2. Collectd (Collectd is really easy to set up and runs so light, I love it)

I'm sure you might be asking yourself what the purpose of this is. My reasoning is fourfold:

1. Having all the logs in one place makes them easier to analyse all at once, without having to SSH into many different servers
2. If a box goes down, then I can read the logs from it before I start attempting to fix it, giving me a heads up as to what the problem is (this, in conjunction with my collectd monitoring system)
3. On the Raspberry Pis I manage, this prolongs the life of the microSD cards by reducing the number of writes thereto
4. I gain a little bit of security, in that if a box is compromised, then unless the attacker also gains access to my logging server, they can't erase their tracks as easily as they might otherwise have done

With all this in mind, I thought that it's about time I actually did something about this. I've found that while the solution is actually really quite simple, it's not particularly easy to find, so I thought I'd post about it here.

In my setup, the server upon which I centralise my logs is a Raspberry Pi 4 with 4GB RAM that I've dubbed eldarion. It is the successor to an earlier Raspberry Pi 3B+ I called elessar, which died some years prior. It has a 120GB SATA SSD attached in a case that used to house a WD PiDrive (they don't sell those anymore :-/) that I had lying around, which I've formatted with Btrfs.

Before we begin, let's outline the setup we're aiming for with a diagram to avoid confusion:

eldarion will host the rsyslog server (which is essentially just a reconfiguration of the existing rsyslog server it is most likely already running), while other servers connect using the syslog protocol via a TCP connection, which is encrypted with TLS, using the GnuTLS engine (the default built into rsyslog). TLS here is important, since logs are naturally rather sensitive as I'm sure you can imagine.

To follow along here, you will need a valid Let's Encrypt certificate. It just so happens that I have a web server hosting my collectd graph panel interface, so I'm using that.

Of course, rsyslog can be configured in arbitrarily complex ways (such as having clients send logs to servers that they themselves forward to yet other servers), but at least for now I'm keeping it (relatively) simple.

### Preparing the server

To start this process, we want to ensure the logs for the local system are stored in the right place. In my case, I have my SSD mounted to /mnt/eldarion-data2, so I want to put my logs in /mnt/eldarion-data2/syslog/localhost. There are 2 ways of accomplishing this:

1. Reconfigure rsyslog to save logs elsewhere
2. Be lazy, and bind mount the target location to /var/log

Since I'm feeling lazy today, I'm going to go with option 2 here. It's also a good idea if a program is badly written and decides it's a brilliant idea to write logs directly to /var/log itself instead of going through syslog.

If you're using DietPi, before you continue, do sudo dietpi-software and remove the existing logging system.

A bind mount is like a hard link of a directory, in that it makes a directory appear in multiple places at once. It acts as a separate "filesystem" though I assume to allow for avoiding infinite loops. They are also the tech behind volumes in Docker's backend containerd.

Open /etc/fstab for editing, and add something like this on a new line:

/mnt/eldarion-data2/syslog/localhost    /var/log    none    auto,defaults,bind  0   0

...where /mnt/eldarion-data2/syslog/localhost is the location we want the data to be stored, and /var/log is the location we want to bind mount it to. Save and close /etc/fstab, and then mount the bind mount like so. Make sure /var/log is empty before mounting!

sudo mount /var/log

Next, we need to install some dependencies:

sudo apt install rsyslog rsyslog-gnutls

For some strange reason, TLS support is in a separate package on Debian-based systems. You'll need to investigate package names and translate this command for your distribution, of course.

### Configuring the server

Now we have that taken care of, we can actually configure our server. Open /etc/rsyslog.conf for editing, and at the top put this:

# The $Thing syntax is apparently 'legacy', but I can't find how else we're supposed to do this
$DefaultNetstreamDriver gtls
$DefaultNetstreamDriverCAFile /etc/letsencrypt/live/mooncarrot.space/chain.pem
$DefaultNetstreamDriverCertFile /etc/letsencrypt/live/mooncarrot.space/cert.pem
$DefaultNetstreamDriverKeyFile /etc/letsencrypt/live/mooncarrot.space/privkey.pem

# StreamDriver.Mode=1 means TLS-only mode
module(load="imtcp" MaxSessions="500" StreamDriver.Mode="1" StreamDriver.AuthMode="anon")
input(type="imtcp" port="514")

$template remote-incoming-logs,"/mnt/eldarion-data2/syslog/hosts/%HOSTNAME%/%PROGRAMNAME%.log"
*.* ?remote-incoming-logs

You'll need to edit these bits to match your own setup:

• /etc/letsencrypt/live/mooncarrot.space/: Path to the live directory there that contains the symlinks to the certs your Let's Encrypt client obtained for you
• /mnt/eldarion-data2/syslog/hosts: The path to the directory we want to store the logs in

Save and close this, and then restart your server like so:

sudo systemctl restart rsyslog.service

Then, check to see if there were any errors:

sudo systemctl status rsyslog.service

Lastly, I recommend assigning a DNS subdomain to the server hosting the logs, such as logs.mooncarrot.space in my case. A single server can have multiple domain names of course, and this just makes it convenient if we ever move the rsyslog server elsewhere - as we won't have to go around and edit a dozen config files (which would be very annoying and tedious).

### Configuring a client

Now that we have our rsyslog server setup, it should be relatively straightforward to configure a client box to send logs there. This is a 3 step process:

1. Configure the existing /var/log to be an in-memory tmpfs to avoid any potential writes to disk
2. Add a cron script to wipe /var/log every hour to avoid it getting full by accident
3. Reconfigure (and install, if necessary) rsyslog to send logs to our shiny new server rather than save them to disk

If you haven't already configured /var/log to be an in-memory tmpfs, it is relatively simple. If you're unsure whether it is or not, do df -h.

First, open /etc/fstab for editing, and add the following line somewhere:

tmpfs /var/log tmpfs size=50M,noatime,lazytime,nodev,nosuid,noexec,mode=1777

Then, save + close it, and mount /var/log. Again, make sure /var/log is empty before mounting! Weird things happen if you don't.

sudo mount /var/log

Secondly, save the following to /etc/cron.hourly/clear-logs:

#!/usr/bin/env bash
rm -rf /var/log/*

Then, mark it executable:

sudo chmod +x /etc/cron.hourly/clear-logs

Lastly, we can reconfigure rsyslog. The specifics of how you do this varies depending on what you want to achieve, but for a host where I want to send all the logs to the rsyslog server and avoid saving them to the local in-memory tmpfs at all, I have a config file like this:

#################
#### MODULES ####
#################

module(load="imuxsock") # provides support for local system logging
module(load="imklog")   # provides kernel logging support
#module(load="immark")  # provides --MARK-- message capability

###########################
#### GLOBAL DIRECTIVES ####
###########################

$IncludeConfig /etc/rsyslog.d/*.conf

# Where to place spool and state files
$WorkDirectory /var/spool/rsyslog

###############
#### RULES ####
###############
$DefaultNetstreamDriverCAFile   /etc/ssl/isrg-root-x1-cross-signed.pem
$DefaultNetstreamDriver         gtls
$ActionSendStreamDriverMode     1 # Require TLS
$ActionSendStreamDriverAuthMode anon
*.* @@(o)logs.mooncarrot.space:514  # Forward everything to our rsyslog server

#
# Emergencies are sent to everybody logged in.
#
*.emerg             :omusrmsg:*

This needs to be saved to the rsyslog config file, which is located at /etc/rsyslog.conf. In this case, I replace the entire config file with the above, but you can pick and choose (e.g. on some hosts I want to save to the local disk and also to the rsyslog server).

In the above you'll need to change the logs.mooncarrot.space bit - this should be the (sub)domain that you pointed at your rsyslog server earlier. The number after the colon (514) is the port number. The *.* tells it to send everything to the remote rsyslog server.

Before we're done here, we need to provide the rsyslog client with the CA certificate of the server (because, apparently, it isn't capable of ferreting around in /etc/ssl/certs like everyone else is). Since I'm using Let's Encrypt here, I downloaded their root certificate like this and it seemed to do the job:

sudo curl -sSL https://letsencrypt.org/certs/isrg-root-x1-cross-signed.pem -o /etc/ssl/isrg-root-x1-cross-signed.pem

Of course, one could generate their own CA and do mutual authentication for added security, but that's complicated, lots of effort, and probably unnecessary for my purposes as far as I can tell. I'll leave a link in the sources and further reading on how to do this if you're interested.

If you have a different setup, it's the $DefaultNetstreamDriverCAFile in the above you need to change to point at your actual CA certificate.

With that all configured, we can now restart the rsyslog client:

sudo systemctl restart rsyslog.service

...and, of course, check to see if there were any errors:

sudo systemctl status rsyslog.service

Finally, we also need to configure logrotate to rotate all these new log files. First, install logrotate if the logrotate command doesn't exist:

sudo apt install logrotate

Then, place the following in the file /etc/logrotate.d/centralisedlogging:

/mnt/eldarion-data2/syslog/hosts/*/*.log {
    rotate 12
    weekly
    missingok
    notifempty
    compress
    delaycompress
}

Of course, you'll want to replace /mnt/eldarion-data2/syslog/hosts/ with the directory you're storing the logs from the remote server in, and also customise the log rotation. For example, the 12 there is the number of old log files to keep, and weekly can be swapped for daily or even monthly if you like.

### Conclusion

This has been a very quick whistle-stop tour of setting up an rsyslog server to centralise your logs. We've set up our rsyslog server to use a TLS encrypted connection to receive logs, which 1 or more clients can send logs to. We've also configured /var/log on both the server and the client to avoid awkward issues.

Moving forwards, I recommend reading my lnav basics tutorial blog post, which should be rather helpful in analysing the resulting log files.

lnav was not helpful however when I asked it to look at all the log files separately with sudo lnav */*.log, deciding to treat them as "generic logs" rather than "syslog logs", meaning that it didn't colour them properly, and also didn't allow for proper filtering. To this end, it may be beneficial to store all the logs in 1 file rather than in separate files. I'll keep an eye on this, and update this post if I figure out how to convince lnav to treat them properly.

Another slight snag with my approach here is that for some reason all the logs from elsewhere also end up in the generic /var/log/syslog file (hence how I found a 'workaround' for the above issue), resulting in duplicated logs. I have yet to find a solution to this issue, but I'm also not sure whether I want to keep the logs in 1 big file or in many smaller files yet.

These issues aside, I'm pretty satisfied with the results. Together with my existing collectd-based monitoring system (which I'll blog about how I've set that up if there's any interest - collectd is really easy to use), this is another step towards greater transparency into the infrastructure I manage.

In the future, I want to investigate generating notification alerts for issues in my infrastructure. These could come either from collectd, or from rsyslog, and I envision them going to a variety of places:

1. Email (a daily digest perhaps?)
2. XMPP (I've bridged to it from shell scripts before)

Given that my infrastructure is just something I run at home and I don't mind so much if it's down for a few hours, my focus here is not on notifying myself as soon as possible, but notifying myself in a way that doesn't disturb me so I can check into it in my own time.

If you found this tutorial / guide useful, please do comment below! It's really cool and motivating to see that the stuff I post on here helps others out.

### Sources and further reading

## How to pin an apt repository for preferential package installation

As described in my last post, pinning apt repositories is now necessary if you want to install Firefox from an apt repository (e.g. if you want to install Firefox Beta). This is not an especially difficult process, but it is significantly confusing, so I thought I'd write a post about it.

Pinning an apt repository means that even if there's a newer version of a package elsewhere, the 'older' version will still be installed from the apt repository you pin.

Be very careful with this technique. You can easily cause major issues with your system if you pin the wrong repository!

Firstly, you want to head to /etc/apt/sources.list.d/ and find the .list file for the repository you want to pin. Take note of the URL inside that file, and then run this command:

apt-cache policy

No root is necessary here, as it's still a read-only command. Depending on how many apt repositories you have installed in your system, there may be a significant amount of output. Find the lines that correspond to the apt repository you want to preferentially install from in this output. For this example, I'm going to pin the excellent nautilus-typeahead apt repository, so the bit I'm looking for looks like this:

999 http://ppa.launchpad.net/lubomir-brindza/nautilus-typeahead/ubuntu jammy/main amd64 Packages
    release v=22.04,o=LP-PPA-lubomir-brindza-nautilus-typeahead,a=jammy,n=jammy,l=nautilus-typeahead,c=main,b=amd64
    origin ppa.launchpad.net

From here, take a note of the o= bit. In my case, it's o=LP-PPA-lubomir-brindza-nautilus-typeahead. Then, create a new file in /etc/apt/preferences.d with the following content:

Package: *
Pin: release o=LP-PPA-lubomir-brindza-nautilus-typeahead
Pin-Priority: 1001

See that o=.... bit there? Replace it with the one for the repository you want to pin. The number there is the new priority of the repository. The numbers at the beginning of each line in the output of the apt-cache policy command are the priorities of your existing apt repositories, so this should give you an idea as to what number you need to use here - a higher number means a higher priority regardless of the version number of the packages contained therein.

Then, simply sudo apt update and sudo apt dist-upgrade, and apt should pick up the "upgrades" from your newly pinned repository! In some situations you may need to remove and reinstall the offending package if you encounter issues.

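If you'd rather not pick the o= value out by eye, the lookup and the preferences stanza can be sketched in a few lines of shell. The function names extract_origin and make_pin are hypothetical, and the default priority of 1001 is simply the value used in the example above.

```shell
#!/usr/bin/env bash
# Reads apt-cache policy output on stdin and prints the o= (origin) value
# of every "release" line it finds.
# (extract_origin and make_pin are hypothetical names for this sketch.)
extract_origin() {
    grep -E '^[[:space:]]*release ' \
        | sed -E 's/^[[:space:]]*release //' \
        | tr ',' '\n' \
        | sed -n 's/^o=//p'
}

# $1    The origin to pin.
# $2    The pin priority (optional, defaults to 1001).
# Prints a stanza suitable for a file in /etc/apt/preferences.d.
make_pin() {
    local origin="$1"
    local priority="${2:-1001}"
    printf 'Package: *\nPin: release o=%s\nPin-Priority: %s\n' "${origin}" "${priority}"
}

policy_line="    release v=22.04,o=LP-PPA-lubomir-brindza-nautilus-typeahead,a=jammy,n=jammy,l=nautilus-typeahead,c=main,b=amd64"
make_pin "$(echo "${policy_line}" | extract_origin)"
```

In real use you would pipe a filtered portion of apt-cache policy into extract_origin, since the full output contains one release line per repository.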
### Sources and further reading

## Using whiptail for text-based user interfaces

One of my ongoing projects is to implement a Bash-based Raspberry Pi provisioning system for hosts in my Raspberry Pi cluster. This is particularly important given that Debian 11 bullseye was released a number of months ago, and while it is technically possible to upgrade a host in-place from Debian 10 buster to Debian 11 bullseye, this is a lot of work that I'd rather avoid.

In implementing a Bash-based provisioning system, I'll have a system that allows me to rapidly provision a brand-new DietPi (or potentially other OSes in the future, but that's out-of-scope of version 1) automatically. Once the provisioning process is complete, I need only reboot it and potentially set a static IP address on my router and I'll then have a fully functional cluster host that requires no additional intervention (except to update it regularly of course).

The difficulty here is I don't yet have enough hosts in my cluster that I can have a clear server / worker division, since my Hashicorp Nomad and Consul clusters both have 3 server nodes for redundancy rather than 1. It is for this reason I need a system in my provisioning system that can ask me what configuration I want the new host to have.

To do this, I rediscovered the whiptail command, which is installed by default on pretty much every system I've encountered so far, and it allows you to develop surprisingly flexible text based user interfaces with relatively little effort, so I wanted to share it here.

Unfortunately, while it's very cool and also relatively easy to use, it also has a lot of options and can result in command invocations like this:

whiptail --title "Some title" --inputbox "Enter a hostname:" 10 40 "default_value" 3>&1 1>&2 2>&3;

...and it only gets more complicated from here. In particular the 3>&1 1>&2 2>&3 bit there is a fancy way of flipping the standard output and standard error.

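To see what that redirection does without launching whiptail at all, here is a minimal sketch. fake_dialog is a hypothetical stand-in that mimics whiptail's behaviour of drawing on the standard output and reporting the answer on the standard error.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for whiptail: the "interface" goes to stdout,
# and the user's answer goes to stderr.
fake_dialog() {
    echo "drawing the user interface"
    echo "the-answer" >&2
}

# Swap stdout and stderr, using file descriptor 3 as a temporary:
#   3>&1    fd 3 now points where stdout points (the $(...) capture)
#   1>&2    stdout now points where stderr points (the terminal)
#   2>&3    stderr now points where fd 3 points (the $(...) capture)
answer="$(fake_dialog 3>&1 1>&2 2>&3)"

echo "captured: ${answer}"   # captured: the-answer
```

The interface text still reaches the terminal via the standard error, while the command substitution captures only the answer.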
I thought to myself that surely there must be a way that I can simplify this down to make it easier to use, so I implemented a number of wrapper functions:

ask_yesno() {
    local question="$1";

    whiptail --title "Step ${step_current} / ${step_max}" --yesno "${question}" 40 8;
    return "$?"; # Not actually needed, but best to be explicit
}

This first one asks a simple yes/no question. Use it like this:

if ask_yesno "Some question here"; then
    echo "Yep!";
else
    echo "Nope :-/";
fi

Next up, to ask the user for a string of text:

# Asks the user for a string of text.
# $1    The window title.
# $2    The question to ask.
# $3    The default text value.
# Returns the answer as a string on the standard output.
ask_text() {
    local title="$1";
    local question="$2";
    local default_text="$3";
    whiptail --title "${title}" --inputbox "${question}" 10 40 "${default_text}" 3>&1 1>&2 2>&3;
    return "$?"; # Not actually needed, but best to be explicit
}

# Asks the user for a password.
# $1    The window title.
# $2    The question to ask.
# $3    The default text value.
# Returns the answer as a string on the standard output.
ask_password() {
    local title="$1";
    local question="$2";
    local default_text="$3";
    whiptail --title "${title}" --passwordbox "${question}" 10 40 "${default_text}" 3>&1 1>&2 2>&3;
    return "$?"; # Not actually needed, but best to be explicit
}

These both work in the same way - it's just that with ask_password it uses asterisks instead of the actual characters the user is typing to hide what they are typing. Use them like this:

new_hostname="$(ask_text "Provisioning step 1 / 4" "Enter a hostname:" "${HOSTNAME}")";
sekret="$(ask_password "Provisioning step 2 / 4" "Enter a sekret:")";

The default value there is of course optional, since in Bash if a variable does not hold a value it is simply considered to be empty.

Finally, I needed a mechanism to ask the user to choose at most 1 value from a predefined list:

# Asks the user to choose at most 1 item from a list of items.
# $1        The window title.
# $2..$n    The items that the user must choose between.
# Returns the chosen item as a string on the standard output.
ask_multichoice() {
    local title="$1"; shift;
    local args=();
    while [[ "$#" -gt 0 ]]; do
        args+=("$1");
        args+=("$1");
        shift;
    done
    whiptail --nocancel --notags --menu "$title" 15 40 5 "${args[@]}" 3>&1 1>&2 2>&3;
    return "$?"; # Not actually needed, but best to be explicit
}

This one is a bit special, as it stores the items in an array before passing it to whiptail. This works because of word splitting, which is when the shell will substitute a variable with its contents before splitting the arguments up. Here's how you'd use it:

choice="$(ask_multichoice "How should I install Consul?" "Don't install" "Client mode" "Server mode")";

As an aside, the underlying mechanics as to why this works is best explained by example. Consider the following:

oops="a value with spaces";

node src/index.mjs --text $oops;

Here, we store the value we want to pass to the --text argument in a variable. Unfortunately, we didn't quote $oops when we passed it to our fictional Node.js script, so the shell actually interprets that Node.js call like this:

node src/index.mjs --text a value with spaces;

That's not right at all! Without the quotes around a value with spaces there, process.argv will actually look like this:

[
'/usr/local/lib/node/bin/node',
'/tmp/test/src/index.mjs',
'--text',
'a',
'value',
'with',
'spaces'
]

The a value with spaces there has been considered by the Node.js subprocess as 4 different values!

Now, if we include the quotes there instead like so:

oops="a value with spaces";

node src/index.mjs --text "$oops";

...the shell will correctly expand it to look like this:

node src/index.mjs --text "a value with spaces";

...which then looks like this to our Node.js subprocess:

[
'/usr/local/lib/node/bin/node',
'/tmp/test/src/index.mjs',
'--text',
'a value with spaces'
]

Much better! This is important to understand, as when we start talking about arrays in Bash things start to work a little differently. Consider this example:

items=("an apple" "a banana" "an orange")

/tmp/test.mjs --text "${items[@]}"

Can you guess what process.argv will look like? The result might surprise you:

[
'/usr/local/lib/node/bin/node',
'/tmp/test.mjs',
'--text',
'an apple',
'a banana',
'an orange'
]

Each element of the Bash array has been turned into a separate item - even when we quoted it and the items themselves contain spaces! What's going on here?

In this case, we used [@] when addressing our items Bash array, which causes Bash to expand it like this:

/tmp/test.mjs --text "an apple" "a banana" "an orange"

....so it quotes each item in the array separately. If we forgot the quotes instead like this:

/tmp/test.mjs --text ${items[@]}

...we would get this in process.argv:

[
'/usr/local/lib/node/bin/node',
'/tmp/test.mjs',
'--text',
'an',
'apple',
'a',
'banana',
'an',
'orange'
]

Here, Bash still expands each element separately, but does not quote each item. Because each item isn't quoted, when the command is actually executed, it splits everything a second time!

As a side note, if you want all the items in a Bash array in a single quoted item, you need to use an asterisk * instead of an at-sign @ like so:

/tmp/test.mjs --text "${items[*]}";

....which would yield the following process.argv:

[
'/usr/local/lib/node/bin/node',
'/tmp/test.mjs',
'--text',
'an apple a banana an orange'
]
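You can check all three behaviours yourself without Node.js. This sketch uses a tiny helper, count_args (a name I've made up for illustration), which just prints how many arguments it was given.

```shell
#!/usr/bin/env bash
# Prints the number of arguments it received.
# (count_args is a hypothetical helper for this sketch.)
count_args() {
    echo "$#"
}

items=("an apple" "a banana" "an orange")

count_args "${items[@]}"    # 3: quoted [@] gives one argument per element
count_args ${items[@]}      # 6: unquoted, so each element is split on spaces too
count_args "${items[*]}"    # 1: quoted [*] joins everything into one argument
```

The counts assume the default value of IFS, which is what you'll have unless a script has changed it.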

With that, we have a set of functions that make whiptail much easier to use. Once it's finished, I'll write a post on my Bash-based cluster host provisioning script and explain my design philosophy behind it and how it works.

## systemquery, part 1: encryption protocols

Unfortunately, my autoplant project is taking longer than I anticipated to set up and debug. In the meantime, I'm going to talk about systemquery - another (not so) little project I've been working on in my spare time.

As I've acquired more servers of various kinds (mostly consisting of Raspberry Pis), I've found myself with an increasing need to get a high-level overview of the status of all the servers I manage. At the moment, this need is satisfied by my monitoring system's (collectd, which while I haven't blogged about my setup directly, I have posted about it here and here) web-based dashboard called Collectd Graph Panel (sadly now abandonware, but still very useful):

This is great and valuable, but if I want to ask questions like "are all apt updates installed", or "what's the status of this service on all hosts?", or "which host haven't I upgraded to Debian bullseye yet?", or "is this mount still working", I currently have to SSH into every host to find the information I'm looking for.

To solve this problem, I discovered the tool osquery. Osquery is a tool to extract information from a network of hosts with SQL-like queries. This is just what I'm looking for, but unfortunately it does not support the armv7l architecture - which most of my cluster currently runs on - thereby making it rather useless to me.

Additionally, from looking at the docs it seems to be extremely complicated to set up. Finally, it does not seem to have a web interface. While not essential, it's a nice-to-have.

To this end, I decided to implement my own system inspired by osquery, and I'm calling it systemquery. I have the following goals:

1. Allow querying all the hosts in the swarm at once
2. Make it dead-easy to install and use (just like Pepperminty Wiki)
3. Make it peer-to-peer and decentralised
4. Make it tolerate random failures of nodes participating in the systemquery swarm
5. Make it secure, such that any given node must first know a password before it is allowed to join the swarm, and all network traffic is encrypted

As a stretch goal, I'd also like to implement a mesh message routing system too, so that it's easy to connect multiple hosts in different networks and monitor them all at once.

Another stretch goal I want to work towards is implementing a nice web interface that provides an overview of all the hosts in a given swarm.

### Encryption Protocols

With all this in mind, the first place to start is to pick a language and platform (Javascript + Node.js) and devise a peer-to-peer protocol by which all the hosts in a given swarm can communicate. My vision here is to encrypt everything using a join secret. Such a secret would lend itself rather well to a symmetrical encryption scheme, as it could act as a pre-shared key.

A number of issues stood in the way of actually implementing this though. At first, I thought it best to use Node.js' built-in TLS-PSK (stands for Transport Layer Security - Pre-Shared Key) implementation. Unlike regular TLS which uses asymmetric cryptography (which works best in client-server situations), TLS-PSK uses a pre-shared key and symmetrical cryptography.

Unfortunately, although Node.js advertises support for TLS-PSK, it isn't actually implemented or is otherwise buggy. This not only leaves me with the issue of designing an encryption protocol, but also:

1. The problem of transferring binary data
2. The problem of perfect forward secrecy
3. The problem of actually encrypting the data

Problem #1 here turned out to be relatively simple. I ended up abstracting away a raw TCP socket into a FramedTransport class, which implements a simple protocol that sends and receives messages in the form <length_in_bytes><data....>, where <length_in_bytes> is a 32 bit unsigned integer.
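To make the wire format concrete, here's a minimal sketch of that framing written in shell, rather than the Node.js FramedTransport class itself. It assumes the length header is big-endian (the byte order isn't stated above), and the function names frame_encode and frame_decode are made up for illustration. It only handles text payloads, since shell variables can't hold NUL bytes.

```shell
# frame_encode PAYLOAD: write a 4-byte length header, then the payload.
# Assumption: the length is big-endian (most significant byte first).
frame_encode() {
	local LC_ALL=C	# so that ${#payload} counts bytes rather than characters
	local payload="$1" len bits
	len=${#payload}
	for bits in 24 16 8 0; do
		# Emit 1 header byte via an octal escape, e.g. \005
		printf "\\$(printf '%03o' $(( (len >> bits) & 255 )))"
	done
	printf '%s' "$payload"
}

# frame_decode: read 1 frame from stdin and write its payload to stdout.
frame_decode() {
	local header len
	# dd with bs=1 reads exactly 4 bytes and leaves the rest of the stream alone
	header="$(dd bs=1 count=4 2>/dev/null | od -An -tu1)"
	set -- $header	# split into the 4 byte values
	[ "$#" -eq 4 ] || return 1	# stream ended before a full header arrived
	len=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
	dd bs=1 count="$len" 2>/dev/null
}

# 2 frames written back-to-back can be split apart again on the other side
{ frame_encode "hello"; frame_encode "world"; } | { frame_decode; echo; frame_decode; echo; }
```

The point of the length prefix is that the receiver always knows how many bytes belong to the current message, so messages never bleed into one another even though TCP itself is just a stream of bytes.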

With that sorted and the nasty buffer manipulation safely abstracted away, I could turn my attention to problems 2 and 3. Let's start with problem 3 here. There's a saying when programming things relating to cryptography: never roll your own. Existing implementations are often much more rigorously checked for security flaws than anything you'd write yourself.

In the spirit of this, I sought out an existing implementation of a symmetric encryption algorithm, and found tweetnacl. Security audited, it provides what looks to be a secure symmetric encryption API, which is the perfect foundation upon which to build my encryption protocol. My hope is that by simply exchanging messages I've encrypted with a secure existing algorithm, I can reduce the risk of a security flaw.

This is a good start, but there's still the problem of forward secrecy to tackle. To explain, perfect forward secrecy means that should an attacker record your conversation and later learn your encryption key (in this case the join secret), they are still unable to decrypt your data.

This is achieved by using session keys and a key exchange algorithm. Instead of encrypting the data with the join secret directly, we use it only to encrypt the initial key-exchange process, which then allows 2 communicating parties to exchange a session key, which is used to encrypt all data from then on. By re-running the key-exchange process and generating new session keys at regular intervals, forward secrecy can be achieved: even if the attacker learns a session key, it does not help them to obtain any other session keys, because even knowledge of the key exchange algorithm messages is not enough to derive the resulting session key.

Actually implementing this in practice is another question entirely however. I did some research though and located a pre-existing implementation of JPAKE on npm: jpake.

With this in hand, the problem of forward secrecy was solved for now. The jpake package provides a simple API by which a key exchange can be done, so then it was just a case of plugging it into the existing system.

### Where next?

After implementing an encryption protocol as above (please do comment below if you have any suggestions), the next order of business was to implement a peer-to-peer swarm system where agents connect to the network and share peers with one another. I have the basics of this implemented already: I just need to test it a bit more to verify it works as I intend.

It would also be nice to refactor this system into a standalone library for others to use, as it's taken quite a bit of effort to implement. I'll be holding off on doing this until it's more stable though, as refactoring it now would just slow down development.

On top of this system, the plan is to implement a protocol by which any peer can query any other peer for system information, and then create a command-line interface for easily querying it.

To make querying flexible, I plan on utilising some form of in-memory database that is populated with queries to other hosts based on the tables mentioned in the user's query. SQLite3 is the obvious choice here, but I'm reluctant to choose it as it requires compilation upon installation - and given that I have experienced issues with this in the past, I feel this has the potential to limit compatibility with some system configurations. I'm going to investigate some other in-memory database libraries for Javascript - giving preference to those which are both light and devoid of complex installation requirements (pure JS is best if I can manage it I think). If you know of a pure Javascript in-memory database that has a query syntax, do let me know in the comments below!

As for querying system information directly, that's an easy one. I've previously found systeminformation - which seems to have an API to fetch pretty much anything you'd ever want to know about the host system!

## How to contribute code to git repositories that aren't hosted on GitHub

With just over 48 million public repositories (and growing fast [^repos]), GitHub is pretty much the de-facto place to host code, as pretty much everyone has an account there. By far the most useful feature GitHub provides is the ability to open pull requests (PRs).

Not all code repositories are hosted on GitHub, however - and these repositories do not get the same exposure, and hence the same level of participation and collaboration, that those on GitHub do. I suspect this is due in no small part (though other reasons exist too) to the fact that contributing to these repositories is unfortunately more complicated than opening a PR.

It needn't be this way though - so in this post I'll show you how to unlock the power of contributing code to quite literally any project that is under git version control. While knowledge of your command line is necessary, basic familiarity will suffice (see also my blog post on learning your terminal). I'll also assume that you have git installed, and that Windows users have already opened Git Bash and navigated to the cloned repository in question with cd.

### Step 0: Making your changes

This is the easy part. After cloning your repository in the normal way, make a new branch for your changes. GUI users should be able to navigate their interfaces. For those using the command line, do this from the source branch you want to branch from:

git switch -c new_branch_name

Then, make your changes in the usual way.

### Step 1: Find contact details

Once you have your changes, you need to find somewhere to send them. This is different for every repository, but here are some common places to check for contact details:

• The project's website (if it has one)
• Track down the author's name on other websites

### Step 2: Make a patch file

Now that you've found a place to send your contribution to, we need to pack it into a nice neat box that can be transported (usually via email as an attachment). Doing so is fairly simple. You need to first identify the hashes of the commits you want to include. Do that with this command:

git log --oneline --graph --decorate

You might get some output that looks a bit like this:

* c443459 (HEAD -> some-patch) wireframe/corner_set: fix luacheck warnings
* 3d12345 //smake: fix luacheck warnings
* 4c7bb6a //sfactor: fix luacheck warnings; fix crash
* ee46507 fixup
* 58933c6 README: Update command list
* 6c49b9d fixup again
* 364de73 fixup

In your terminal it will probably be coloured. The 7 digit hexadecimal value (e.g. 4c7bb6a) there is the commit hash. Copy the commit hash of the oldest and the newest commits in question, and then do this:

git format-patch --stdout OLDHASH..NEWHASH >somefilename.patch

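...replacing OLDHASH and NEWHASH with the oldest and newest commit hashes respectively. If the newest commit hash is the latest commit on the branch, then the keyword HEAD can also be used instead. Note that git treats OLDHASH..NEWHASH as excluding OLDHASH itself, so if the oldest commit should be part of the patch, write OLDHASH^..NEWHASH (the ^ means the parent of that commit).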

### Step 3: Submit patch file

Now that you have a patch file, you can send it to the author. By email, instant messaging, or avian carriers - any means of communication will do!

This is all there is to it. If you've received such a patch and are unsure about what to do though, keep reading.

### But what happens if I receive a contribution?

If you've received a patch file generated by the above method and don't know what to do with it, read on! You may have received a patch file for a variety of reasons:

• Someone's interested in improving your project
• You've previously sent a contribution to someone else, and they've sent back a patch of their own along with a code review of things you need to change or improve

Either way, it's easy to apply it to your git repository. First, make sure you have the branch in question you want to apply the commits to checked out. Then, download the patch file, and do this:

git am path/to/somefile.patch

...this will apply the commits contained within to the currently checked out branch for you. If you're unsure about what they contain, don't forget that you can always open the patch file in your text editor and inspect it, or do this to see a quick summary:

grep Subject: path/to/somefile.patch

Once a patch file is applied, you can handle things in the usual way - for example you'll probably want to use git push to push the commit(s) to your remote, or perhaps git rebase -i to clean them up first.
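To see both halves of the process working together, here's a small sketch that plays the contributor and the maintainer in a throwaway repository. The function name patch_roundtrip_demo, the branch names, the file name, and the Demo identity are all placeholders invented for this example, and it assumes a git version recent enough to have git switch.

```shell
# Throwaway demo: pack 2 commits into a patch file, then apply it with git am.
patch_roundtrip_demo() {
	local work base
	work="$(mktemp -d)" || return 1
	(
		set -e
		# git needs an identity to create commits; these values are placeholders
		export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
		export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

		cd "$work"
		git init -q repo
		cd repo
		echo "v1" >notes.txt
		git add notes.txt
		git commit -q -m "Initial commit"
		base="$(git rev-parse HEAD)"

		# Contributor side: 2 commits on a new branch, packed into 1 patch file.
		# The range base..HEAD excludes the base commit itself.
		git switch -q -c some-patch
		echo "v2" >>notes.txt; git commit -q -a -m "notes: add v2"
		echo "v3" >>notes.txt; git commit -q -a -m "notes: add v3"
		git format-patch --stdout "${base}..HEAD" >../changes.patch

		# Maintainer side: start a branch at the base commit and apply the patch
		git switch -q -c apply-target "${base}"
		git am -q ../changes.patch
		git log --format=%s	# newest commit first
	)
	local status=$?
	rm -rf "$work"
	return "$status"
}

patch_roundtrip_demo
```

The commit messages and authorship travel inside the patch file, which is why the log on the receiving branch shows the same 2 commits the contributor made.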

### Conclusion

In this post, I've shown you how to create and apply patch files. This is extremely useful when sending patches to code repositories that are either on servers where you can't create an account to open a pull request (e.g. Gitea) or simply don't have a pull request system at all. It can even be used in extreme situations where a given code repository doesn't have a central remote server at all - this is surely where git gets its reputation as a distributed version control system.

[^repos]: Ref https://github.com/search?q=is:public as of 2022-01-06

## Backing up with tar, curl, and SFTP with key-based authentication

I have multiple backup strategies, from restic (which was preceded by duplicity) to btrfs snapshots that I sync over ssh. You can never have too many backups though (especially for your most valuable data that can't be easily replaced), so in this post I want to share another of the mechanisms I employ.

Backup systems have to suit the situation at hand, and in this case I have a personal git server which I back up daily to Backblaze B2. In order to be really absolutely sure that I don't lose it though, I also back it up to my home NAS (see also the series that I wrote on it). As you might have guessed from the title of this post, it takes backups using tar. I have recently upgraded it to transfer these backups over SFTP (SSH File Transfer Protocol).

Given that the sftp command exists, one might wonder why I use curl instead. Unfortunately, sftp as far as I can tell does not support uploading a file passed in through stdin - which is very useful when you have limited disk space on the source host! But using curl, we can pipe the output of tar directly to curl without touching the disk.

Documentation is sadly rather sparse on using curl to upload via SFTP, so it took some digging to figure out how to do it using SSH keys. SSH keys are considerably more secure than using a password (and a growing number of my systems are set up to disallow password authentication altogether), so I'll be using SSH key based authentication in this post.

To start, you'll need to generate a new SSH keypair. I like to use ed25519:

ssh-keygen -t ed25519

When prompted, choose where you want to save it to (preferably with a descriptive name), and then do not put a password on it. This is important, because at least in my case I want this to operate completely autonomously without any user input.

Then, copy the public SSH key to your remote server (I strongly recommend using an account that is locked to be SFTP-only and no shell access - this tutorial seems to be good at explaining the steps involved in doing this), and then on the device doing the backing up do a test to both make sure it works and add the remote server to the known_hosts file:

sudo -u backupuser bash
ssh -i path/to/keyfile -T remoteuser@remotehost

Now we've got our SSH / SFTP setup done, we can do the backup itself:

ionice -c Idle nice -n20 tar --create --exclude-tag .BACKUP_IGNORE --gzip --file - path/to/dir_to_backup | curl -sS --user "remoteuser:" --key "path/to/sshkey_ed25519" --pubkey "path/to/sshkey_ed25519.pub" -T - "sftp://example.com/path/on/remote/upload_filename.tar.gz"

Let's break this down a bit:

• ionice -c Idle nice -n20: Push the backup job into the background - both for the CPU and disk priorities. Optional.
• tar --create --exclude-tag .BACKUP_IGNORE --gzip --file - path/to/dir_to_backup: An example tar command. Use whatever you want here, so long as the archive is written to the standard output (that's what --file - does).
• --user "remoteuser:": The remoteuser bit there is the user to login to the remote host with. The bit after the colon is technically the password, but we're leaving that blank 'cause we're using SSH keys instead.
• --key "path/to/sshkey_ed25519": The path to the SSH private key.
• -T -: Upload the standard input instead of a file on disk
• --pubkey "path/to/sshkey_ed25519.pub": The path to the SSH public key.
• sftp://example.com/path/on/remote/upload_filename.tar.gz: The host to upload to and path thereon to upload the standard input to. If you need to specify a custom port here, do sftp://example.com:20202/path/blah/.... instead, where 20202 is your custom port number.
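If you'd like to check what the tar half of that pipeline produces before pointing it at a real server, here's a small sketch that swaps curl for a second tar that just lists the stream. It assumes GNU tar (for --exclude-tag), and the stream_backup function and the example directory names are made up for this demo.

```shell
# stream_backup DIR: write a gzipped tar of DIR to stdout, with no temporary file.
# "--file -" tells tar to write the archive to standard output.
stream_backup() {
	tar --create --exclude-tag .BACKUP_IGNORE --gzip --file - -C "$1" .
}

# Build a tiny example tree in a temporary directory
demo_dir="$(mktemp -d)"
mkdir -p "${demo_dir}/docs" "${demo_dir}/cache"
echo "keep me" >"${demo_dir}/docs/keep.txt"
echo "skip me" >"${demo_dir}/cache/big.bin"
touch "${demo_dir}/cache/.BACKUP_IGNORE"	# tag file: skip this directory's contents

# Pipe the stream into a consumer. Here tar lists it; in the real command, curl uploads it.
stream_backup "${demo_dir}" | tar --list --gzip --file -

rm -rf "${demo_dir}"
```

The listing should include docs/keep.txt but not cache/big.bin, since GNU tar skips the contents of any directory containing the tag file (keeping only the directory entry and the tag file itself).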

Personally, I'm using this technique with an SSH tunnel, so my variant of the above command looks a bit like this (extra bits around the edges stripped away for clarity):

git_backup_user="sftpbackups";
git_backup_location="sftp://localhost:20204/git-backups";
git_backup_key="path/to/sshkey_ed25519";
upload_filename="git-$(date +"%Y-%m-%d").tar.gz";
nice -n20 tar --create --exclude-tag .BACKUP_IGNORE --gzip --file - git/{data,gitea,repos}/ www/blog | curl -sS --user "${git_backup_user}:" --key "${git_backup_key}" --pubkey "${git_backup_key}.pub" -T - "${git_backup_location}/${upload_filename}"

That's it for this post. If you've got any questions or comments, please post them below.

## lnav basics tutorial

Last year, I blogged about lnav. lnav is a fantastic tool for analysing log files, and after getting a question from CrimsonTome I thought I'd write up a longer-form tutorial on the basics of using it, as I personally find it exceedingly useful.

I'll be using an Ubuntu Server 20.04 instance for this tutorial, but anything Linuxy will work just fine. As mentioned in my previous post, it's available in the default repositories for your distribution. For apt-based systems, install like so:

sudo apt install lnav

Adjust for your own package manager. For example, pacman-based distributions should do this:

sudo pacman -S lnav

lnav operates on 1 or more input files. It's common to use logrotate to rotate log files, so this is what I'd recommend to analyse all your logs of a particular type in 1 go (here I analyse generic syslog logs):

lnav /var/log/syslog*

On your system you may need to sudo that. Once you've got lnav started, you may need to wait a moment for it to parse all the log files - especially if you have multi-million line logfiles.

After it's finished loading, we can get to analysing the logs at hand. The most recent logs appear at the bottom, and you'll notice that lnav will have coloured various parts of each log message - the reason for this will become apparent later on. lnav should also livestream log lines from disk too.

Use the arrow keys or scroll up / down to navigate log messages.

lnav operates via a command palette system, which if you use GitHub's [Atom IDE](https://atom.io/) or Sublime Text (which is apparently where the feature originated) may already be familiar to you. In lnav's case, it's also crossed with a simple shell. Let's start with the most important command: :filter-out.

To execute a command, simply start typing. Commands in lnav are prefixed with a colon :. :filter-out takes a regular expression as its only argument, and hides all log lines which match it. Sticking with our earlier syslog theme, here's an example:

:filter-out kernel:

You'll notice that once you've finished typing :filter-out, lnav will show you some help in a pane at the bottom of the screen showing you how to use that command.

:filter-out has a twin that's also useful to remember: :filter-in. Unlike :filter-out, :filter-in does the opposite - anything that doesn't match the specified pattern is hidden from view. Very useful if you know what kind of log messages you're looking for, and they are a (potentially very small) subset of a much larger and more unstructured log file.

:filter-in dovecot:

To delete all existing filters and reset the view, hit Ctrl + R.

lnav has many other built-in commands. Check out the full reference here: https://docs.lnav.org/en/latest/commands.html.

The other feature that lnav comes with is also the most powerful: SQLite3 support. By parsing common log file formats (advanced users can extend lnav by defining their own custom formats, but the specifics of how to do this are best left to the lnav documentation), it can enable you to query your log files by writing arbitrary SQLite queries!

To understand how to query a file, first hit the p key. This will show you how lnav has parsed the log line at the top of the screen (scroll as normal to look at different lines, and hit p again to hide).

Using this information, we can then make an SQL query against the data. Press semicolon ; to open the SQL query prompt, and then enter something like this:

SELECT * FROM syslog_log WHERE log_procname = 'gitea';

....hit the enter key when you're done composing your query, and the results should then appear! You can scroll through them just like you do with the regular log viewer - you just can't use :filter-in and :filter-out until you leave the query results window with the q key (this would be a really useful feature though!).

If you're running lnav on your Nginx logs (located in /var/log/nginx/ by default), then I find this query to be of particular use:

SELECT COUNT(cs_referer) AS count, cs_referer FROM access_log GROUP BY cs_referer ORDER BY COUNT(cs_referer) DESC

That concludes this basic tutorial on lnav. There are many more features that lnav offers:

• :filter-expr for filtering the main view by SQL query
• Analysing files on remote hosts over SSH
• Search logs for a given string (press / and start typing)
• Too many others to list here

Check out the full documentation here: https://docs.lnav.org/

## Tips for training (large numbers of) AI models

As part of my PhD, I'm training AI models. The specifics as to what for don't particularly matter for this post (though if you're curious I recommend my PhD update blog post series). Over the last year or so, I've found myself training a lot of AI models, and dealing with a lot of data. In this post, I'm going to talk about some of the things I've found helpful and some of the things I've found that are best avoided. Note that this is just a snapshot of my current practices - this will probably gradually change over time.

I've been working with Tensorflow.js and Tensorflow for Python on various Linux systems. If you're on another OS or not working with AI then what I say here should still be somewhat relevant.

### Datasets

First up: a quick word on datasets. While this post is mainly about AI models, datasets are important too. Keeping them organised is vitally important. Keeping all the metadata that is associated with them is also vitally important. Keeping a good directory hierarchy is the best way to achieve this.

I also recommend sticking with a standard format that's easy to parse using your preferred language - and preferably lots of other languages too. Json Lines is my personal favourite format for data - potentially compressed with Gzip if the filesize is very large.
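As a quick illustration of why this combination is pleasant to work with, here's a sketch of inspecting a gzipped JSON Lines file using nothing but standard command line tools. The filename, the record fields, and the 2 helper functions are invented for this example.

```shell
# Hypothetical dataset: 1 JSON object per line, compressed with gzip
dataset_dir="$(mktemp -d)"
dataset="${dataset_dir}/tweets.jsonl.gz"
printf '%s\n' \
	'{"id": 1, "label": "positive"}' \
	'{"id": 2, "label": "negative"}' \
	'{"id": 3, "label": "positive"}' \
	| gzip >"${dataset}"

# count_records FILE: number of records in a gzipped JSON Lines file
count_records() {
	gzip -dc "$1" | wc -l | tr -d ' '
}

# count_matching FILE STRING: number of records containing a fixed string
count_matching() {
	gzip -dc "$1" | grep -c -F -- "$2"
}

count_records "${dataset}"
count_matching "${dataset}" '"label": "positive"'

rm -rf "${dataset_dir}"
```

Because every record sits on its own line, line-oriented tools work on the decompressed stream without ever loading the whole file into memory. Matching on a fixed string is only a rough check though, as it depends on the exact spacing in the file - a proper JSON parser is the safer option for anything serious.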

### AI Models

There are multiple facets to the problem of wrangling AI models:

1. Code that implements the model itself and supporting code
2. Checkpoints from the training process
3. Analysis results from analysing such models

All of these are important for different reasons - and are also affected by where it is that you're going to be training your model.

By far the most important thing I recommend doing is using Git with a remote such as GitHub and committing regularly. I can't stress enough how critical this is - it's the best way to both keep a detailed history of the code you've written and keep a backup at the same time. It also makes working on multiple computers easy. Getting into the habit of using Git for any project (doesn't matter what it is) will make your life a lot easier. At the beginning of a programming session, pull down your changes. Then, as you work, commit your changes and describe them properly. Finally, push your changes to the remote after committing to keep them backed up.

Coming in at a close second is implementing a command line interface with the ability to change the behaviour of your model. This includes:

• Setting input datasets
• Specifying output directories
• Model hyperparameters (e.g. input size, number of layers, number of units per layer, etc)

This is invaluable for running many different variants of your model quickly to compare results. It is also very useful when training your model in headless environments, for example on High Performance Computers (HPCs) like Viper at my University.

For HPCs that use Slurm, a great tip here is that when you call sbatch on your job file (e.g. sbatch path/to/jobfile.job), it will preserve your environment. This lets you pass in job-specific parameters by writing a script like this:

#!/usr/bin/env bash
#SBATCH -J TwImgCCT
#SBATCH -N 1
#SBATCH -n 4
#SBATCH --gres=gpu:1
#SBATCH -o %j.%N.%a.out
#SBATCH -e %j.%N.%a.err
#SBATCH -p gpu05,gpu
#SBATCH --time=5-00:00:00
#SBATCH --mem=25600
# 25600 = 25GiB memory required

# Viper use Trinity ClusterVision: https://clustervision.com/trinityx-cluster-management/ and https://github.com/clustervision/trinityX

echo ">>> Installing requirements";
conda run -n py38 pip install -r requirements.txt;
echo ">>> Training model";
/usr/bin/env time --verbose conda run -n py38 src/my_model.py ${PARAMS}
echo ">>> exited with code $?";

....which you can call like so:

PARAMS="--size 4 --example 'something else' --input path/to/file --output outputs/20211002-resnet" sbatch path/to/jobfile.job
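One caveat worth knowing about with this approach: when ${PARAMS} is expanded unquoted, the shell splits it on whitespace only and does not re-parse any quotes inside it, so a value containing spaces gets broken up. Here's a small sketch that shows the effect. The count_args and run_model functions are stand-ins invented for this demo, not part of Slurm or the job file above.

```shell
# count_args: print how many separate arguments were received
count_args() { echo "$#"; }

PARAMS="--size 4 --example 'something else'"

# Unquoted expansion splits on whitespace only. The inner quotes are not
# re-parsed, so 'something else' arrives as 2 separate words.
count_args ${PARAMS}

# Stand-in for a job script that forwards its own arguments with "$@",
# which keeps each argument intact
run_model() { count_args "$@"; }
run_model --size 4 --example 'something else'
```

The first call reports 5 arguments and the second reports 4. If you need values with spaces in them, one option is that sbatch passes anything after the job file on to the script as arguments, so the job file can use "$@" in place of ${PARAMS}.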

You may end up finding you have rather a lot of code behind your model - especially for data preprocessing depending on your dataset. To handle this, I go by 2 rules of thumb:

1. If a source file of any language is more than 300 lines long, it should be split into multiple files
2. If a collection of files do a thing together rather nicely, they belong in a separate Git repository.

To elaborate on these, having source code files become very long makes them difficult to maintain, understand, and re-use in future projects. Splitting them up makes your life much easier.

Going further, modularising your code is also an amazing paradigm to work with. I've broken many parts of my various codebases I've implemented for my PhD out as open-source projects on npm (the Node Package Manager) - most notably applause-cli, terrain50, terrain50-cli, nimrod-data-downloader, and twitter-academic-downloader.

By making them open-source, I'm not only making my research and methods more transparent and easier for others to independently verify, but I'm also allowing others to benefit from them (and potentially improve them) too! As they say, there's no need to re-invent the wheel.

Eventually, I will be making the AI models I'm implementing for my PhD open-source too - but this will take some time as I want to ensure that the models actually work before doing so (I've got 1 model I implemented fully and documented too, but in the end it has a critical bug that means the whole thing is useless.....).

Saving checkpoints from the training process of your model is also essential. I recommend doing so at the end of each epoch. As part of this, it's also useful to have a standard format for your output artefacts from the training process. Ideally, these artefacts can be used to identify precisely what dataset and hyperparameters that model and checkpoints were trained with.

At the moment, my models output something like this:

+ output_dir/
    + summary.txt       Summary of the layers of the model and their output shapes
    + metrics.tsv       TSV file containing training/validation loss/accuracy and epoch numbers
    + settings.toml     The TOML settings that the model was trained with
    + checkpoints/      Directory containing the checkpoints - 1 per epoch
        + checkpoint_e1_val_acc0.699.hdf5   Example checkpoint filename [Tensorflow for Python]
        + 0/            OR, if using Tensorflow.js instead of Tensorflow for Python, 1 directory per checkpoint
    + this_run.log      Logfile for this run [depends on where the program is being executed]

settings.toml leads me on to settings files. Personally I use TOML for mine, and I use 2 files:

• settings.default.toml - Contains all the default values of the settings, and is located alongside the code for my model
• example.toml - Custom settings that override values in the default settings file can be specified using my standard --config CLI argument.

Having a config file is handy when you have multiple dataset input files that rarely change. Generally speaking you want to ensure that you minimise the number of CLI arguments that you have to specify when running your model, as then it reduces cognitive load when you're training many variants of a model at once (I've found that wrangling dozens of different dataset files and model variants is hard enough to focus on and keep organised :P).

Analysis results are the final aspect here that it's important to keep organised - and the area in which I have the least experience. I've found it's important to keep track of which model checkpoint it was that the analysis was done with and which dataset said model was trained on. Keeping the entire chain of dataflow clear and easy to follow is difficult because the analysis one does is usually ad-hoc, and often has to be repeated many times on different model variants.

For this, so far I generate statistics and some graphs on the command line. If you're not already familiar with the terminal / command line of your machine, I can recommend checking out my earlier post Learn Your Terminal, which has a bunch of links to tutorials for this. In addition, jq is an amazing tool for manipulating JSON data. It's not installed by default on most systems, but it's available in most default repositories and well worth the install.
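As an example of the kind of quick statistic that's easy to get this way, here's a sketch that pulls the best epoch out of a metrics file with awk. The column layout (epoch, loss, val_acc, with a header row) is an assumption made for this example, as is the best_epoch function name, so adjust the column numbers to match your own files.

```shell
# best_epoch FILE: print the epoch with the highest validation accuracy.
# Assumes a header row and tab-separated columns: epoch, loss, val_acc
# (hypothetical layout - change $1 and $3 to match your own file).
best_epoch() {
	awk -F '\t' 'NR > 1 && $3 > best { best = $3; epoch = $1 }
		END { print epoch "\t" best }' "$1"
}

# Example metrics file with made-up values
metrics="$(mktemp)"
printf 'epoch\tloss\tval_acc\n1\t0.90\t0.610\n2\t0.70\t0.699\n3\t0.65\t0.680\n' >"${metrics}"

best_epoch "${metrics}"

rm -f "${metrics}"
```

For the made-up values above, this prints epoch 2 with a validation accuracy of 0.699.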

For some graphs, I use Gnuplot. Usually though this is only for more complex plots, as it takes a moment to write a .plt file to generate the graph I want in it.

I'm still looking for a good tool that makes it easy to generate basic graphs from the command line, so please get in touch if you've found one.

I'm also considering integrating some of the basic analysis into my model training program itself, such that it generates e.g. confusion matrices automatically as part of the training process. matplotlib seems to do the job here for plotting graphs in Python, but I have yet to find an equivalent library for Javascript. Again, if you've found one please get in touch by leaving a comment below.

### Conclusion

In this post, I've talked about some of the things I've found helpful so far while I've been training models. From using Git to output artefacts to implementing command line interfaces and wrangling datasets, implementing the core AI model itself is actually only a very small part of an AI project.

Hopefully this post has given you some insight into the process of developing an AI model / AI-powered system. While I've been doing some of these things since before I started my PhD (like Git), others have taken me a while to figure out - so I've noted them down here so that you don't have to spend ages figuring out the same things!

If you've got some good tips you'd like to share on developing AI models (or if you've found the tips here in this blog post helpful!), please do share them below.

## NAS Backups, Part 2: Btrfs send / receive

Hey there! In the first post of this series, I talked about my plan for a backup NAS to complement my main NAS. In this part, I'm going to show the pair of scripts I've developed to take care of backing up btrfs snapshots.

The first script is called snapshot-send.sh, and it:

1. Calculates which snapshot it is that requires sending
2. Uses SSH to remote into the backup NAS
3. Pipes the output of btrfs send to snapshot-receive.sh on the backup NAS that is called with sudo

Note that while sudo is used for calling snapshot-receive.sh, the account used to SSH into the backup NAS doesn't have completely unrestricted sudo access. Instead, a sudo rule restricts it to calling only specific commands (without a password, as this is intended to be a completely automated and unattended system).

The second script is called snapshot-receive.sh, and it receives the output of btrfs send and pipes it to btrfs receive. It also has some extra logic to delete old snapshots and stuff like that.

Both of these are designed to be command line programs in their own right with a simple CLI, and useful error / help messages to assist in understanding it when I come back to it to fix an issue or extend it after many months.

### snapshot-send.sh

As described above, snapshot-send.sh sends btrfs snapshots to a remote host via SSH and the snapshot-receive.sh script.

Before we continue and look at it in detail, it is important to note that snapshot-send.sh depends on btrfs-snapshot-rotation. If you haven't already done so, you should set that up first before setting up my scripts here.

If you have btrfs-snapshot-rotation set up correctly, you should have something like this in your crontab:

# Btrfs automatic snapshots
0 * * * *       cronic /root/btrfs-snapshot-rotation/btrfs-snapshot /mnt/some_btrfs_filesystem/main /mnt/some_btrfs_filesystem/main/.snapshots hourly 8
0 2 * * *       cronic /root/btrfs-snapshot-rotation/btrfs-snapshot /mnt/some_btrfs_filesystem/main /mnt/some_btrfs_filesystem/main/.snapshots daily 4
0 2 * * 7       cronic /root/btrfs-snapshot-rotation/btrfs-snapshot /mnt/some_btrfs_filesystem/main /mnt/some_btrfs_filesystem/main/.snapshots weekly 4

I use cronic there to reduce unnecessary emails. I also have a subvolume there for the snapshots:

sudo btrfs subvolume create /mnt/some_btrfs_filesystem/main/.snapshots

Because Btrfs does not take a snapshot of any child subvolumes when it takes a snapshot, I can use this to keep all my snapshots organised and associated with the subvolume they are snapshots of.

If done right, ls /mnt/some_btrfs_filesystem/main/.snapshots should result in something like this:

2021-07-25T02:00:01+00:00-@weekly  2021-08-17T07:00:01+00:00-@hourly
2021-08-01T02:00:01+00:00-@weekly  2021-08-17T08:00:01+00:00-@hourly
2021-08-08T02:00:01+00:00-@weekly  2021-08-17T09:00:01+00:00-@hourly
2021-08-14T02:00:01+00:00-@daily   2021-08-17T10:00:01+00:00-@hourly
2021-08-15T02:00:01+00:00-@daily   2021-08-17T11:00:01+00:00-@hourly
2021-08-15T02:00:01+00:00-@weekly  2021-08-17T12:00:01+00:00-@hourly
2021-08-16T02:00:01+00:00-@daily   2021-08-17T13:00:01+00:00-@hourly
2021-08-17T02:00:01+00:00-@daily   last_sent_@daily.txt
2021-08-17T06:00:01+00:00-@hourly

Ignore the last_sent_@daily.txt there for now - it's created by snapshot-send.sh so that it can remember the name of the snapshot it last sent. We'll talk about it later.
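To show how these names can be worked with, here's a minimal sketch of splitting the tag off a snapshot name and finding the newest snapshot for a given tag. This is not how snapshot-send.sh itself does it - the 2 function names are made up for illustration, and it relies on the timestamps all sharing the same format so that they sort chronologically.

```shell
# snapshot_tag NAME: print the tag, i.e. everything after the last @ sign
snapshot_tag() {
	printf '%s\n' "${1##*@}"
}

# latest_snapshot DIR TAG: print the name of the newest snapshot with that tag
latest_snapshot() {
	local dir="$1" tag="$2" latest="" path
	# Globs expand in sorted order, so the last match is the newest snapshot
	for path in "$dir"/*-@"$tag"; do
		[ -e "$path" ] || continue	# no matches: the pattern is left unexpanded
		latest="${path##*/}"
	done
	printf '%s\n' "$latest"
}

snapshot_tag "2021-08-17T02:00:01+00:00-@daily"
```

Given the directory listing above, latest_snapshot /mnt/some_btrfs_filesystem/main/.snapshots daily would pick out 2021-08-17T02:00:01+00:00-@daily, and files like last_sent_@daily.txt are skipped because they don't end with the tag.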

With that out of the way, let's start going through snapshot-send.sh! First up is the CLI and associated error handling:

#!/usr/bin/env bash
set -e;

dir_source="${1}";
tag_source="${2}";
tag_dest="${3}";
loc_ssh_key="${4}";
remote_host="${5}";

if [[ -z "${remote_host}" ]]; then
    echo "This script sends btrfs snapshots to a remote host via SSH.
The script snapshot-receive must be present on the remote host in the PATH for this to work.
It pairs well with btrfs-snapshot-rotation: https://github.com/mmehnert/btrfs-snapshot-rotation
Usage:
snapshot-send.sh <snapshot_dir> <source_tag_name> <dest_tag_name> <ssh_key> <user@example.com>

Where:
<snapshot_dir> is the path to the directory containing the snapshots
<source_tag_name> is the tag name to look for (see btrfs-snapshot-rotation).
<dest_tag_name> is the tag name to use when sending to the remote. This must be unique across all snapshot rotations sent.
<ssh_key> is the path to the ssh private key
<user@example.com> is the user@host to connect to via SSH" >&2;
    exit 0;
fi

# $EUID = effective uid
if [[ "${EUID}" -ne 0 ]]; then
    echo "Error: This script must be run as root (currently running as effective uid ${EUID})" >&2;
    exit 5;
fi

if [[ ! -e "${loc_ssh_key}" ]]; then
    echo "Error: When looking for the ssh key, no file was found at '${loc_ssh_key}' (have you checked the spelling and file permissions?)." >&2;
    exit 1;
fi

if [[ ! -d "${dir_source}" ]]; then
    echo "Error: No source directory located at '${dir_source}' (have you checked the spelling and permissions?)" >&2;
    exit 2;
fi

###############################################################################

Pretty simple stuff. snapshot-send.sh is called like so:

snapshot-send.sh /absolute/path/to/snapshot_dir SOURCE_TAG DEST_TAG_NAME path/to/ssh_key user@example.com

A few things to unpack here.

• /absolute/path/to/snapshot_dir is the path to the directory (i.e. btrfs subvolume) containing the snapshots we want to read, as described above.
• SOURCE_TAG: The bit at the end of a snapshot's directory (subvolume) name after the at sign @. For example, for 2021-08-17T02:00:01+00:00-@daily the source tag is daily.
• DEST_TAG_NAME: The tag name to give the snapshot on the backup NAS. Useful, because you might have multiple subvolumes you snapshot with btrfs-snapshot-rotation and they all might have snapshots with the daily tag.
• path/to/ssh_key: The path to the (unencrypted!) SSH key to use to SSH into the remote backup NAS.
• user@example.com: The user and hostname of the backup NAS to SSH into.

This is a good time to sort out the remote user we're going to SSH into (we'll sort out snapshot-receive.sh and the sudo rules in the next section below).
Assuming that you already have a Btrfs filesystem set up and automounting on boot on the remote NAS, do this:

sudo useradd --system --home /absolute/path/to/btrfs-filesystem/backups backups
sudo groupadd backup-senders
sudo usermod -a -G backup-senders backups
cd /absolute/path/to/btrfs-filesystem/backups
sudo mkdir .ssh
sudo touch .ssh/authorized_keys
sudo chown -R backups:backups .ssh
sudo chmod -R u=rwX,g=rX,o-rwx .ssh

Then, on the main NAS, generate the SSH key:

mkdir -p /root/backups && cd /root/backups
ssh-keygen -t ed25519 -C backups@main-nas -f /root/backups/ssh_key_backup_nas_ed25519

Then, copy the generated SSH public key to the authorized_keys file on the backup NAS (located at /absolute/path/to/btrfs-filesystem/backups/.ssh/authorized_keys).

Now that's sorted, let's continue with snapshot-send.sh. Next up are a few miscellaneous functions:

# The filepath to the last sent text file that contains the name of the snapshot that was last sent to the remote.
# If this file doesn't exist, then we send a full snapshot to start with.
# We need to keep track of this because we need this information to know which
# snapshot we need to parent the latest snapshot from to send snapshots incrementally.
filepath_last_sent="${dir_source}/last_sent_@${tag_source}.txt";

## Logs a message to stdout.
# $*    The message to log.
log_msg() {
    echo "[ $(date +%Y-%m-%dT%H:%M:%S) ] remote/${HOSTNAME}: >>> ${*}";
}

## Lists all the currently available snapshots for the current source tag.
list_snapshots() {
    find "${dir_source}" -maxdepth 1 ! -path "${dir_source}" -name "*@${tag_source}" -type d;
}

## Returns an exit code of 0 if we've sent a snapshot, or 1 if we haven't.
have_sent() {
    if [[ ! -f "${filepath_last_sent}" ]]; then
        return 1;
    else
        return 0;
    fi
}

## Fetches the directory name of the last snapshot sent to the remote with the given tag name.
last_sent() {
    if [[ -f "${filepath_last_sent}" ]]; then
        cat "${filepath_last_sent}";
    fi
}

# Runs snapshot-receive on the remote host.
do_ssh() {
    ssh -o "ServerAliveInterval=900" -i "${loc_ssh_key}" "${remote_host}" sudo snapshot-receive "${tag_dest}";
}

Particularly of note is the filepath_last_sent variable - this is set to the path to that text file I mentioned earlier.
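To illustrate how that text file drives the choice between a full and an incremental send, here's a small self-contained sketch. It uses ordinary directories in a temporary location instead of real btrfs snapshots and only prints its decision. The send_mode function is hypothetical and doesn't exist in snapshot-send.sh.

```shell
#!/usr/bin/env bash
# Sketch only: plain directories stand in for btrfs snapshots, and
# nothing is actually sent anywhere.

dir_source="$(mktemp -d)";
tag_source="daily";
filepath_last_sent="${dir_source}/last_sent_@${tag_source}.txt";

# Same idea as have_sent in snapshot-send.sh
have_sent() {
    [[ -f "${filepath_last_sent}" ]];
}

# Hypothetical helper: reports which kind of send would happen next.
send_mode() {
    if have_sent; then
        echo "incremental from $(cat "${filepath_last_sent}")";
    else
        echo "full";
    fi
}

mkdir "${dir_source}/2021-08-16T02:00:01+00:00-@daily";
mkdir "${dir_source}/2021-08-17T02:00:01+00:00-@daily";

first_mode="$(send_mode)";
echo "Before any send: ${first_mode}";

# Pretend the older snapshot was sent successfully
echo "2021-08-16T02:00:01+00:00-@daily" >"${filepath_last_sent}";

second_mode="$(send_mode)";
echo "After the first send: ${second_mode}";

rm -r "${dir_source}";
```

Deleting the text file therefore causes the next run to send a full snapshot again.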

Other than that it's all pretty well commented, so let's continue on. Next, we need to determine the name of the latest snapshot:

latest_snapshot="$(list_snapshots | sort | tail -n1)";
# The state file stores just the directory name, so we want the basename here
latest_snapshot_dirname="$(basename "${latest_snapshot}")";

With this information in hand we can compare it to the last snapshot name we sent. We store this in the text file mentioned above - the path to which is stored in the filepath_last_sent variable.

if have_sent && [[ "${latest_snapshot_dirname}" == "$(last_sent)" ]]; then
    if [[ -z "${FORCE_SEND}" ]]; then
        echo "We've sent the latest snapshot '${latest_snapshot_dirname}' already and the FORCE_SEND environment variable is empty or not specified, skipping";
        exit 0;
    else
        echo "We've sent it already, but sending it again since the FORCE_SEND environment variable is specified";
    fi
fi

If the latest snapshot has the same name as the one we last sent, we exit out - unless the FORCE_SEND environment variable is specified (to allow for an easy way to fix stuff if it goes wrong on the other end).

Now, we can actually send the snapshot to the remote:

if ! have_sent; then
    log_msg "Sending initial snapshot $(basename "${latest_snapshot}")";
    btrfs send "${latest_snapshot}" | do_ssh;
else
    parent_snapshot="${dir_source}/$(last_sent)";
    if [[ ! -d "${parent_snapshot}" ]]; then
        echo "Error: Failed to locate parent snapshot at '${parent_snapshot}'" >&2;
        exit 3;
    fi

    log_msg "Sending incremental snapshot $(basename "${latest_snapshot}") parent $(last_sent)";
    btrfs send -p "${parent_snapshot}" "${latest_snapshot}" | do_ssh;
fi

have_sent simply determines whether we have sent a snapshot before. We know this by checking for the filepath_last_sent text file.

If we haven't, then we send a full snapshot rather than an incremental one.

If we're sending an incremental one, then we find the parent snapshot (i.e. the one we last sent). If we can't find it, we generate an error (it's because of this that you need to store at least 2 snapshots at a time with btrfs-snapshot-rotation).

After sending a snapshot, we need to update the filepath_last_sent text file:

log_msg "Updating state information";
basename "${latest_snapshot}" >"${filepath_last_sent}";
log_msg "Snapshot sent successfully";

....and that concludes snapshot-send.sh! Once you've finished reading this blog post and testing your setup, put your snapshot-send.sh calls in a script in /etc/cron.daily or something.

### snapshot-receive.sh

Next up is the receiving end of the system. The CLI for this script is much simpler, on account of sudo rules only allowing exact and specific commands (no wildcards or regex of any kind). I put snapshot-receive.sh in /usr/local/sbin and called it snapshot-receive.

Let's get started:

#!/usr/bin/env bash

# This script wraps btrfs receive so that it can be called by non-root users.
# It should be saved to '/usr/local/sbin/snapshot-receive' (without quotes, of course).
# The following entry needs to be put in the sudoers file:
#
# %backup-senders ALL=(ALL) NOPASSWD: /usr/local/sbin/snapshot-receive TAG_NAME
#
# ....replacing TAG_NAME with the name of tag you want to allow. You'll need 1 line in your sudoers file per tag you want to allow.
# Edit your sudoers file like this:
# sudo visudo

# The ABSOLUTE path to the target directory to receive to.
target_dir="CHANGE_ME";
# The maximum number of backups to keep.
max_backups="7";

# Allow only alphanumeric characters, hyphens and underscores in the tag
tag="$(echo "${1}" | tr -cd '[:alnum:]-_')";

snapshot-receive.sh only takes a single argument, and that's the tag it should use for the snapshot being received:

sudo snapshot-receive DEST_TAG_NAME

The target directory it should save snapshots to is stored as a variable at the top of the file (the target_dir there). You should change this based on your specific setup. It goes without saying, but the target directory needs to be a directory on a btrfs filesystem (preferably raid1, though as I've said before btrfs raid1 is a misnomer). We also ensure that the tag contains only safe characters for security.

max_backups is the maximum number of snapshots to keep. Any older snapshots will be deleted.

Next, some error handling:

###############################################################################

# $EUID = effective uid
if [[ "${EUID}" -ne 0 ]]; then
    echo "Error: This script must be run as root (currently running as effective uid ${EUID})" >&2;
    exit 5;
fi

if [[ -z "${tag}" ]]; then
    echo "Error: No tag specified. It should be specified as the 1st and only argument, and may only contain alphanumeric characters." >&2;
    echo "Example:" >&2;
    echo " snapshot-receive TAG_NAME_HERE" >&2;
    exit 4;
fi

Nothing too exciting. Continuing on, a pair of useful helper functions:

###############################################################################

## Logs a message to stdout.
# $*    The message to log.
log_msg() {
    echo "[ $(date +%Y-%m-%dT%H:%M:%S) ] remote/${HOSTNAME}: >>> ${*}";
}

list_backups() {
    find "${target_dir}/${tag}" -maxdepth 1 ! -path "${target_dir}/${tag}" -type d;
}

list_backups lists the snapshots with the given tag, and log_msg logs messages to stdout (not stderr unless there's an error, because otherwise cronic will dutifully send you an email every time the scripts execute).

Next up, more error handling:

###############################################################################

if [[ "${target_dir}" == "CHANGE_ME" ]]; then
    echo "Error: target_dir was not changed from the default value." >&2;
    exit 1;
fi

if [[ ! -d "${target_dir}" ]]; then
    echo "Error: No directory was found at '${target_dir}'." >&2;
    exit 2;
fi

if [[ ! -d "${target_dir}/${tag}" ]]; then
    log_msg "Creating new directory at ${target_dir}/${tag}";
    mkdir "${target_dir}/${tag}";
fi

We check:

• That the target directory was changed from the default CHANGE_ME value
• That the target directory exists

We also create a subdirectory for the given tag if it doesn't exist already.
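Since the tag ends up as part of a directory path, the sanitisation step near the top of the script matters here. The sketch below shows what that kind of tr filter does to a few inputs. The sanitise_tag wrapper is a made-up name for this example, and the hyphen is placed last in the set so that it can't be read as a range.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the same kind of tr filter that
# snapshot-receive.sh applies to its first argument.
# Deletes every character that is not alphanumeric, a hyphen or an underscore.
sanitise_tag() {
    echo "${1}" | tr -cd '[:alnum:]_-';
}

echo "main-daily       -> '$(sanitise_tag "main-daily")'";
echo "../../etc/passwd -> '$(sanitise_tag "../../etc/passwd")'";
echo "///              -> '$(sanitise_tag "///")'";
```

Slashes and dots are removed, so a tag can't be used to climb out of the target directory, and a tag made up entirely of disallowed characters comes out empty and is rejected by the empty tag check.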

With the preamble completed, we can actually receive the snapshot:

log_msg "Launching btrfs in chroot mode";

time nice ionice -c Idle btrfs receive --chroot "${target_dir}/${tag}";

We use nice and ionice to reduce the priority of the receive to the lowest possible level. If you're using a Raspberry Pi (I have a Raspberry Pi 4 with 4GB RAM) like I am, this is important for stability (Pis tend to fall over otherwise). Don't worry if you experience some system crashes on your Pi when transferring the first snapshot - I've found that incremental snapshots don't cause the same issue.

We also use the chroot option there for increased security.

Now that the snapshot is transferred, we can delete old snapshots if we have too many:

backups_count="$(echo -e "$(list_backups)" | wc -l)";

log_msg "Btrfs finished, we now have ${backups_count} backups:";
list_backups;

while [[ "${backups_count}" -gt "${max_backups}" ]]; do
    oldest_backup="$(list_backups | sort | head -n1)";
    log_msg "Maximum number of backups is ${max_backups}, requesting removal of backup for $(basename "${oldest_backup}")";
    btrfs subvolume delete "${oldest_backup}";

    backups_count="$(echo -e "$(list_backups)" | wc -l)";
done

log_msg "Done, any removed backups will be deleted in the background";
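If you'd like to try the pruning logic without touching any real snapshots, here's a rough sketch of the same loop using plain directories in a temporary location. In it, rm -r stands in for btrfs subvolume delete, and the directory names and the limit of 3 are made up for the example.

```shell
#!/usr/bin/env bash
# Sketch of the pruning loop using plain directories in a temporary
# location. 'rm -r' stands in for 'btrfs subvolume delete' here.
target="$(mktemp -d)";
max_backups=3;

for day in 13 14 15 16 17; do
    mkdir "${target}/2021-08-${day}T02:00:01+00:00-@daily";
done

list_backups() {
    find "${target}" -mindepth 1 -maxdepth 1 -type d;
}

backups_count="$(list_backups | wc -l)";
while [[ "${backups_count}" -gt "${max_backups}" ]]; do
    # Names start with an ISO 8601 timestamp, so the first entry
    # after sorting is the oldest one.
    oldest_backup="$(list_backups | sort | head -n1)";
    echo "Removing $(basename "${oldest_backup}")";
    rm -r "${oldest_backup}";
    backups_count="$(list_backups | wc -l)";
done

remaining="$(list_backups | sort | xargs -n1 basename)";
echo "Remaining:";
echo "${remaining}";

rm -r "${target}";
```

The two oldest directories (the 13th and 14th) are removed, leaving the 3 most recent ones.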

Sorted! The only thing left to do here is to set up those sudo rules. Let's do that now. Execute sudo visudo (which, unlike editing /etc/sudoers directly, checks the syntax before saving), and enter the following:

%backup-senders ALL=(ALL) NOPASSWD: /usr/local/sbin/snapshot-receive TAG_NAME

Replace TAG_NAME with the DEST_TAG_NAME you're using. You'll need 1 entry in /etc/sudoers for each DEST_TAG_NAME you're using.
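As an illustration, with 2 destination tags the entries could be generated like this. The tag names main-daily and photos-daily are placeholders, and the sudoers_entries function is just a convenience invented for this example.

```shell
#!/usr/bin/env bash
# Prints one sudoers entry per destination tag.
# The tag names passed in below are placeholders for illustration.
sudoers_entries() {
    local tag;
    for tag in "$@"; do
        echo "%backup-senders ALL=(ALL) NOPASSWD: /usr/local/sbin/snapshot-receive ${tag}";
    done
}

sudoers_entries main-daily photos-daily;
```

The printed lines can then be pasted into the editor that visudo opens. Running sudo visudo -c afterwards checks the syntax of the result.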

We assign the rights to the backup-senders group we created earlier, of which the user we are going to SSH in with is a member. This makes the system more flexible should we want to extend it later.

Warning: A mistake in /etc/sudoers can leave you unable to use sudo! Make sure you have a root shell open in the background and that you test sudo again after making changes to ensure you haven't made a mistake.

That completes the setup of snapshot-receive.sh.

### Conclusion

With snapshot-send.sh and snapshot-receive.sh, we now have a system for transferring snapshots from 1 host to another via SSH. If combined with full disk encryption (e.g. with LUKS), this provides a secure backup system with a number of desirable qualities:

• The main NAS can't access the backups on the backup NAS (in case of ransomware)
• Backups are encrypted during transfer (via SSH)
• Backups are encrypted at rest (LUKS)

To further secure the backup NAS, one could:

• Automatically start / shutdown the backup NAS (though with full disk encryption when it boots up it would require manual intervention)

At the bottom of this post I've included the full scripts for you to copy and paste.

As it turns out, there will be 1 more post in this series, which will cover generating multiple streams of backups (e.g. weekly, monthly) from a single stream of e.g. daily backups on my backup NAS.

### Full scripts

#### snapshot-send.sh

#!/usr/bin/env bash
set -e;

dir_source="${1}";
tag_source="${2}";
tag_dest="${3}";
loc_ssh_key="${4}";
remote_host="${5}";

if [[ -z "${remote_host}" ]]; then
    echo "This script sends btrfs snapshots to a remote host via SSH.
The script snapshot-receive must be present on the remote host in the PATH for this to work.
It pairs well with btrfs-snapshot-rotation: https://github.com/mmehnert/btrfs-snapshot-rotation
Usage:
snapshot-send.sh <snapshot_dir> <source_tag_name> <dest_tag_name> <ssh_key> <user@example.com>

Where:
<snapshot_dir> is the path to the directory containing the snapshots
<source_tag_name> is the tag name to look for (see btrfs-snapshot-rotation).
<dest_tag_name> is the tag name to use when sending to the remote. This must be unique across all snapshot rotations sent.
<ssh_key> is the path to the ssh private key
<user@example.com> is the user@host to connect to via SSH" >&2;
    exit 0;
fi

# $EUID = effective uid
if [[ "${EUID}" -ne 0 ]]; then
    echo "Error: This script must be run as root (currently running as effective uid ${EUID})" >&2;
    exit 5;
fi

if [[ ! -e "${loc_ssh_key}" ]]; then
    echo "Error: When looking for the ssh key, no file was found at '${loc_ssh_key}' (have you checked the spelling and file permissions?)." >&2;
    exit 1;
fi

if [[ ! -d "${dir_source}" ]]; then
    echo "Error: No source directory located at '${dir_source}' (have you checked the spelling and permissions?)" >&2;
    exit 2;
fi

###############################################################################

# The filepath to the last sent text file that contains the name of the snapshot that was last sent to the remote.
# If this file doesn't exist, then we send a full snapshot to start with.
# We need to keep track of this because we need this information to know which
# snapshot we need to parent the latest snapshot from to send snapshots incrementally.
filepath_last_sent="${dir_source}/last_sent_@${tag_source}.txt";

## Logs a message to stdout.
# $*    The message to log.
log_msg() {
    echo "[ $(date +%Y-%m-%dT%H:%M:%S) ] remote/${HOSTNAME}: >>> ${*}";
}

## Lists all the currently available snapshots for the current source tag.
list_snapshots() {
    find "${dir_source}" -maxdepth 1 ! -path "${dir_source}" -name "*@${tag_source}" -type d;
}

## Returns an exit code of 0 if we've sent a snapshot, or 1 if we haven't.
have_sent() {
    if [[ ! -f "${filepath_last_sent}" ]]; then
        return 1;
    else
        return 0;
    fi
}

## Fetches the directory name of the last snapshot sent to the remote with the given tag name.
last_sent() {
    if [[ -f "${filepath_last_sent}" ]]; then
        cat "${filepath_last_sent}";
    fi
}

# Runs snapshot-receive on the remote host.
do_ssh() {
    ssh -o "ServerAliveInterval=900" -i "${loc_ssh_key}" "${remote_host}" sudo snapshot-receive "${tag_dest}";
}

latest_snapshot="$(list_snapshots | sort | tail -n1)";
# The state file stores just the directory name, so we want the basename here
latest_snapshot_dirname="$(basename "${latest_snapshot}")";

if have_sent && [[ "${latest_snapshot_dirname}" == "$(last_sent)" ]]; then
    if [[ -z "${FORCE_SEND}" ]]; then
        echo "We've sent the latest snapshot '${latest_snapshot_dirname}' already and the FORCE_SEND environment variable is empty or not specified, skipping";
        exit 0;
    else
        echo "We've sent it already, but sending it again since the FORCE_SEND environment variable is specified";
    fi
fi

if ! have_sent; then
    log_msg "Sending initial snapshot $(basename "${latest_snapshot}")";
    btrfs send "${latest_snapshot}" | do_ssh;
else
    parent_snapshot="${dir_source}/$(last_sent)";
    if [[ ! -d "${parent_snapshot}" ]]; then
        echo "Error: Failed to locate parent snapshot at '${parent_snapshot}'" >&2;
        exit 3;
    fi

    log_msg "Sending incremental snapshot $(basename "${latest_snapshot}") parent $(last_sent)";
    btrfs send -p "${parent_snapshot}" "${latest_snapshot}" | do_ssh;
fi

log_msg "Updating state information";
basename "${latest_snapshot}" >"${filepath_last_sent}";
log_msg "Snapshot sent successfully";

#### snapshot-receive.sh

#!/usr/bin/env bash

# This script wraps btrfs receive so that it can be called by non-root users.
# It should be saved to '/usr/local/sbin/snapshot-receive' (without quotes, of course).
# The following entry needs to be put in the sudoers file:
#
# %backup-senders ALL=(ALL) NOPASSWD: /usr/local/sbin/snapshot-receive TAG_NAME
#
# ....replacing TAG_NAME with the name of tag you want to allow. You'll need 1 line in your sudoers file per tag you want to allow.
# Edit your sudoers file like this:
# sudo visudo

# The ABSOLUTE path to the target directory to receive to.
target_dir="CHANGE_ME";
# The maximum number of backups to keep.
max_backups="7";

# Allow only alphanumeric characters, hyphens and underscores in the tag
tag="$(echo "${1}" | tr -cd '[:alnum:]-_')";

###############################################################################

# $EUID = effective uid
if [[ "${EUID}" -ne 0 ]]; then
    echo "Error: This script must be run as root (currently running as effective uid ${EUID})" >&2;
    exit 5;
fi

if [[ -z "${tag}" ]]; then
    echo "Error: No tag specified. It should be specified as the 1st and only argument, and may only contain alphanumeric characters." >&2;
    echo "Example:" >&2;
    echo " snapshot-receive TAG_NAME_HERE" >&2;
    exit 4;
fi

###############################################################################

## Logs a message to stdout.
# $*    The message to log.
log_msg() {
    echo "[ $(date +%Y-%m-%dT%H:%M:%S) ] remote/${HOSTNAME}: >>> ${*}";
}

list_backups() {
    find "${target_dir}/${tag}" -maxdepth 1 ! -path "${target_dir}/${tag}" -type d;
}

###############################################################################

if [[ "${target_dir}" == "CHANGE_ME" ]]; then
    echo "Error: target_dir was not changed from the default value." >&2;
    exit 1;
fi

if [[ ! -d "${target_dir}" ]]; then
    echo "Error: No directory was found at '${target_dir}'." >&2;
    exit 2;
fi

if [[ ! -d "${target_dir}/${tag}" ]]; then
    log_msg "Creating new directory at ${target_dir}/${tag}";
    mkdir "${target_dir}/${tag}";
fi

log_msg "Launching btrfs in chroot mode";

time nice ionice -c Idle btrfs receive --chroot "${target_dir}/${tag}";

backups_count="$(echo -e "$(list_backups)" | wc -l)";

log_msg "Btrfs finished, we now have ${backups_count} backups:";
list_backups;

while [[ "${backups_count}" -gt "${max_backups}" ]]; do
    oldest_backup="$(list_backups | sort | head -n1)";
    log_msg "Maximum number of backups is ${max_backups}, requesting removal of backup for $(basename "${oldest_backup}")";
    btrfs subvolume delete "${oldest_backup}";

    backups_count="$(echo -e "$(list_backups)" | wc -l)";
done

log_msg "Done, any removed backups will be deleted in the background";