Starbeamrainbowlabs


Defending against DDoS attacks hammering my git server

It's no sekret that I have a git server. I host all sorts of stuff on there - from stuff I've talked about on this blog to many other things I have, and still others that are private repositories that I can't share / yet for one reason or another.

While I can't remember exactly when I first set it up, I do remember that gitea wasn't even a thing back then, and I originally set up go git service.

Nowadays, I run forgejo, which is a fork of gitea, which is itself a fork of go git service.

Either way, it's been around for a while!

Unfortunately, now that smaller git servers are becoming more common (we still need a social/federated git standard like e.g. ActivityPub), so are attacks against such servers, and one I dealt with yesterday was particularly nasty, so I decided to make a blog post about it.

I'll have CPU for breakfast, lunch, and tea thank you

Before I explain how I dealt with it (mitigated is the technical term I understand), it's important to know the anatomy of the attack. After all, security is important but we can only be secure if we know what we're defending against.

The threat model, if you will.

In this case, the attacker sent random requests to random files on random commits in a large git repository I have on my aforementioned git server.

Yesterday, I measured almost 1 million unique IP addresses making exactly 2 requests at a time each.

If I had the energy, I'd plot em all on a hilbert curve with a colour gradient for age, maybe even with an animation.

The result of all of this is 100% CPU usage on my 3rd generation dedicated server I rent and a slow terminal experience, because to serve each request Forgejo has to call a git subprocess to inspect the repository and extract the version of the file requested.

That's a very expensive way to handle a HTTP/S request!

At first, I thought I was infected, but further inspection of the logs revealed it not to be so.

With all this in mind, the goal of my expedition was to avoid the spammy HTTP/S calls from hitting the application server (forgejo).

This is all interesting, because it means that a number of common steps to achieve this won't work:

  1. We can't just block the IP addresses, because there are too many and most of them will be compromised IoT (Internet of Terrible security) devices etc in people's homes that are roped into being a botnet.
  2. We can't keep the git server turned off, because I need to use it
  3. I can't block access to the problematic paths on the server, because then the attacker will switch to another set and access to the git server is still impaired
  4. I can't just allow specific IP addresses through, as I have blog post stuff hosted on there and you, one of my readers, would be cut off from accessing it (and I access from my phone sometimes which doesn't have a fixed IP)

...so that just leaves us stuck right?

Teh solutionses!

Not so. There's still a strategy that we haven't tried: a Web Application Firewall. Traditionally, such tools are big and very very expensive, but the other week I discovered a tool that does the job inside an envelope (a couple of megabytes) and at a price point (free!) I could afford.

That tool is Anubis, and despite the.... interesting name it acts like something of a firewall that sits in front of the application server, but behind your reverse proxy:

 Public Internet ║ Inside server                                               
                 ║                                                             
                 ╟───────────────┐         ┌───────────────┐  ┌───────────────┐
                 ║     Caddy     │         │    Anubis     │  │    Forgejo    │
Inbound  ────────▶       •       ├─────────▶       •       ├──▶       •       │
requests   80/tcp║ Reverse proxy │         │   Firewall    │  │  App server   │
          443/tcp╟───────────────┘         └───────────────┘  └───────────────┘
                 ║                                localhost           localhost
                 ║                                 2999/tcp            3000/tcp

Essentially, when each request comes in, Anubis weighs the risk of that request. 'High-risk' requests, such as those coming from browsers which attackers love to impersonate, get served a small challenge that they must solve to gain access to the website. Low-risk clients, such as git or curl or elinks, can go straight through.

This is in the form of a hashing problem: the browser must tell the server a nonce (number only used once) that, when hashed alongside a given unique challenge string, produces a hash with a certain number of zeroes (0).

Correctly completing the challenge (which doesn't take very long), sets a cookie for that client to gain access to the website without completing another challenge for a certain period of time.
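To make the idea a bit more concrete, here's a minimal sketch of that kind of hashing challenge in shell. This is an illustration only and NOT Anubis' actual implementation: the function names are made up, SHA-256 via coreutils' sha256sum is my assumption, and so is counting the zeroes as leading hexadecimal digits.

```shell
#!/bin/sh
# Hypothetical proof-of-work sketch. NOT Anubis' real code.
# Assumption: "difficulty" means the number of leading zero hex digits
# in sha256(challenge + nonce).

# Hash a challenge string together with a nonce, printing the hex digest.
pow_hash() {
    printf '%s%s' "$1" "$2" | sha256sum | cut -d' ' -f1
}

# Client side: brute-force the smallest nonce that satisfies the difficulty.
solve_challenge() {
    challenge="$1"; difficulty="$2"
    prefix="$(printf '%*s' "$difficulty" '' | tr ' ' '0')"
    nonce=0
    while :; do
        case "$(pow_hash "$challenge" "$nonce")" in
            "$prefix"*) printf '%s\n' "$nonce"; return 0 ;;
        esac
        nonce=$((nonce + 1))
    done
}

# Server side: checking an answer costs a single hash, which is the point.
verify_challenge() {
    challenge="$1"; difficulty="$2"; nonce="$3"
    prefix="$(printf '%*s' "$difficulty" '' | tr ' ' '0')"
    case "$(pow_hash "$challenge" "$nonce")" in
        "$prefix"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example run with a made-up challenge string and a low difficulty
nonce="$(solve_challenge "example-challenge" 2)"
echo "nonce=$nonce"
verify_challenge "example-challenge" 2 "$nonce" && echo "accepted"
```

The asymmetry is what makes this useful: the client has to try lots of nonces, while the server only has to hash once to check the answer.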

I could go on, but the official documentation explains it pretty well.

Essentially, by serving challenges to high-risk clients instead of allowing their requests straight through to expensive HTTP/S endpoints (such as loading a random file from a random commit in a random git repo), a server's resources can be protected to give a better experience to the users who use it on a day-to-day basis.

This isn't without its flaws - namely inadvertently blocking good bots - but it does strike enough of a balance that I can keep my git server online without giving up the entirety of my server's resources in the process, which I need to use for other things.

But how?!

I'll assume you already have some sort of reverse proxy in front of some sort of application server. In my case, that's caddy and forgejo.

Anubis' latest release can be downloaded from here, but for Debian/Ubuntu users who want an apt repository I'm rehosting the .deb files from Anubis' releases page in my personal apt repository:

https://apt.starbeamrainbowlabs.com/

Assuming you have an e.g. Ubuntu server, you'll want to install anubis and then navigate to /etc/anubis, in which you should create a configuration file with the name of the user account you'll be starting anubis under.

Each instance of anubis can only handle 1 domain/app at a time, so you'll want 1 system user account per application you want to protect.

For example, I have a config file at /etc/anubis/anubis-git.env with the following content:

TARGET=http://[::1]:3000
BIND=:2999
METRICS_BIND=:2998

....my internal git server is listening on port 3000 on the IPv6 localhost address ::1 for HTTP requests, so that's the target that anubis should forward requests to, as in the ASCII diagram above (made in monosketch).

Then, start the new anubis instance like so:

sudo systemctl enable --now anubis@anubis-git.service

....in my case, the username I created (sudo useradd --system anubis-git etc etc) was anubis-git, so that's what goes in the filename above and after the @ sign when we start the service.

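If you haven't seen this syntax before in systemd service names, it's what systemd calls a template unit: whatever comes after the @ sign is an instance name that the supporting service file can substitute into its settings, which here ties the service to the anubis-git username and config file. syncthing does the same thing with the default systemd service definition it provides, using the instance name as the username to run as.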

In other words, it lets you start multiple instances of the same service without them clashing with each other.
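As a rough sketch of what a second instance might look like, here's a small helper that writes one config file per instance using the same 3 settings as above. The helper name, the anubis-wiki instance, and its ports are all made up for illustration, and it writes into a scratch directory rather than /etc/anubis so nothing on your system is touched.

```shell
#!/bin/sh
# Hypothetical helper (not part of Anubis): write <instance>.env into a directory.
# Arguments: directory, instance name, target URL, bind port, metrics port
write_anubis_env() {
    dir="$1"; instance="$2"; target="$3"; bind_port="$4"; metrics_port="$5"
    cat > "$dir/$instance.env" <<EOF
TARGET=$target
BIND=:$bind_port
METRICS_BIND=:$metrics_port
EOF
}

# Dry run into a scratch directory
scratch="$(mktemp -d)"
write_anubis_env "$scratch" anubis-git  "http://[::1]:3000" 2999 2998
write_anubis_env "$scratch" anubis-wiki "http://[::1]:4000" 3999 3998  # made-up second app
ls "$scratch"
```

Once a file like that is in /etc/anubis and the matching system user exists, the second instance would be started the same way as the first, e.g. sudo systemctl enable --now anubis@anubis-wiki.service.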

At any rate, the final piece of the puzzle is telling your reverse proxy to talk to anubis:

git.starbeamrainbowlabs.com {
    log

    reverse_proxy http://[::1]:2999 {
        # ref anubis config setup both of these are required
        header_up X-Http-Version {http.request.proto}
        # ref anubis config, this is esp. required
        header_up X-Real-Ip {remote_host}
    }
}

Replace http://[::1]:2999 with the address of your Anubis instance (the reverse proxy should now point at Anubis instead of at your application server directly), then check the config and reload:

sudo caddy validate -c /etc/caddy/Caddyfile && sudo systemctl reload caddy

(replacing /etc/caddy/Caddyfile with the path to your Caddyfile of course)

Conclusion

....and you're done!

We've successfully put an application server behind anubis to protect it from malicious requests.

Over time, I assume I will need to tweak the anubis settings, which is possible through what seems to be a rather detailed policy file system (which allows RSS/Atom files through by default, if you're crazy enough to be subbed to any feeds from my git server).

If something seems broken to you now that I've set this up, please do get in touch and I'll try my best to help you out.

I'll be continuing to keep an eye on my web server traffic to see if anything gets through that shouldn't, and adjusting my response as necessary.

Thanks for sticking with me, and when I have the energy I have lots of other cool things to talk about here soon.

--Starbeamrainbowlabs

Aside: IP blocking with Caddy

While implementing the above approach, I found I did need to bring my git server up for my Continuous Integration system (I implemented it well before forgejo got workers and I haven't checked out the latter yet) to work.

To do this, I temporarily implemented an IP address-based allowlist.

If you're curious, here's the code for that:

# temp solution to block anyone who isn't in the allowlist outright
# note that given the sheer range of IPs from what's probably a compromised device-based botnet, we can't just IP block this long-term.
@denied not client_ip 1.2.3.4 5.6.7.8/24 127.0.0.1/8 ::1/128
abort @denied

....throw this in one of the server blocks in your Caddyfile before a reverse_proxy directive - changing the allowed IP addresses of course (leave the IPv4 & IPv6 localhost ones!) - validate & reload, and you should have an instant IP address allowlist system in place!

Compiling the wacom driver from source to fix tilt & rotation support

I was all sat down and setup to do some digital drawing the other day, and then I finally snapped. My graphics tablet (a secondhand Wacom Intuos Pro S from Vinted) - which supports pen tilt - was not functioning correctly. Due to a bug that has yet to be patched, the tilt X/Y coordinates were being wrongly interpreted as unsigned integers (i.e. uint32) instead of signed integers (e.g. int32). This had the effect of causing the rotational calculation to jump around randomly, making it difficult when drawing.
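To illustrate why that kind of signed/unsigned mix-up makes values jump around, here's a tiny sketch using shell arithmetic. It isn't taken from the driver: the function names are made up, and the 32-bit width is an assumption for the sake of the example.

```shell
#!/bin/sh
# Illustration only: reinterpret a signed value as an unsigned 32-bit one,
# which is roughly what happens to a negative tilt reading under this bug.
as_uint32() {
    echo $(( $1 & 0xFFFFFFFF ))
}

# The reverse: recover the signed value from its unsigned 32-bit form.
as_int32() {
    v=$(( $1 & 0xFFFFFFFF ))
    if [ $(( v >= 2147483648 )) -eq 1 ]; then
        v=$(( v - 4294967296 ))
    fi
    echo "$v"
}

as_uint32 -5          # prints 4294967291
as_uint32 5           # prints 5
as_int32 4294967291   # prints -5
```

So tilting the pen slightly one way gives a small number, and tilting it slightly the other way gives an enormous one, which is exactly the sort of thing that sends an angle calculation haywire.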

So, given that someone had kindly posted a source patch, I set about compiling the driver from source. For some reason that is currently unclear to me, it is not being merged into the main wacom tablet driver repository. This leaves compiling from source with the patch the only option here that is currently available.

It worked! I was so ecstatic. I had tilt functionality for the first time!

Fast-forward to yesterday....... and it broke again, and I first noticed because I am left-handed and I have a script that flips the mapping of the pad around so I can use it the opposite way around.

I have since fixed it, but the entire process took me long enough to figure out that I realised that I was halfway there to writing a blog post as a comment on the aforementioned GitHub issue, so I decided to just go the rest of the way and write this up into a full blog post / tutorially kinda thing and do the drawing I wanted to do in the first place tomorrow.

In short, there are 2 parts to this:

  • input-wacom, the kernel driver
  • xf86-input-wacom, the X11 driver that talks to the kernel driver

....and they both have to be compiled separately, as I discovered yesterday.

Who is this for?

If you've got a Wacom Intuos tablet that supports pen tilt / rotation, then this blog post is for you.

Mine is a Wacom Intuos Pro S PTH-460.

This tutorial has been written on Ubuntu 24.04, but it should work for other systems too.

If there's the demand I might put together a package and put it in my apt repo, though naturally this will be limited to the versions of Ubuntu I personally use on my laptop - though I do tend to upgrade through the 6-monthly updates.

I could also put together an AUR package, but currently on the devices I run Artix (Arch derivative) I don't usually have a tilt-supporting graphics tablet physically nearby when I'm using them and they run Wayland for unavoidable reasons.

Linux MY_DEVICE_NAME 6.8.0-48-generic #48-Ubuntu SMP PREEMPT_DYNAMIC Fri Sep 27 14:04:52 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Kompiling the kernel module

Navigate to a clean directory somewhere persistent, as you may need to get back to it later.

If you have the kernel driver installed, then uninstall it now.

On Ubuntu / apt-based systems, they bundle the kernel module and the X11 driver bit all in a single package..... hence the reason why we hafta do all the legwork of compiling and installing both the kernel module and the X11 driver from source :-/

e.g. on Ubuntu:

sudo apt remove xserver-xorg-input-wacom

Then, clone the git repo and checkout the right branch:

git clone https://github.com/jigpu/input-wacom.git -b fix-445
cd input-wacom;

....then, as per the official instructions, install the build-time dependencies if required:

sudo apt-get install build-essential autoconf linux-headers-$(uname -r)

...check if you have these installed already by replacing apt-get install with apt-cache policy.

Then, build and install all-in-one:

if test -x ./autogen.sh; then ./autogen.sh; else ./configure; fi && make && sudo make install || echo "Build Failed"

....this will prompt for a password to install directly into your system. I think they recommend to do it this way to simplify the build process for people.

This should complete our khecklist for the kernel module, but to activate it you'll need to reboot.

Don't bother doing that right now though on Ubuntu, since we have the X11 driver to go. Users on systems lucky enough to split the 2 drivers up can just reboot here.

You can check (after rebooting!) if you've got the right input-wacom kernel module with this command:

grep "" /sys/module/wacom*/version

....my research suggests you need to have a wacom tablet plugged in for this to work.

If you get something like this:

$ grep "" /sys/module/wacom*/version
v2.00

....then you're still using your distribution-provided wacom kernel module. Go uninstall it!

The output you're looking for should look a bit like this:

$ grep "" /sys/module/wacom*/version
v2.00-1.2.0.37.g2c27caa
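If you want to script that check, something like the following sketch could work. It's a heuristic based purely on the 2 example outputs above (a bare version vs one with a suffix after a dash), and the function name is made up.

```shell
#!/bin/sh
# Heuristic sketch: guess whether a wacom kernel module version string came
# from a distribution build (bare version) or a source build (dash + suffix).
classify_wacom_version() {
    case "$1" in
        v*-*) echo "source build" ;;
        v*)   echo "distribution build" ;;
        *)    echo "unknown" ;;
    esac
}

# Check whatever wacom modules are currently loaded (prints nothing if none are)
for f in /sys/module/wacom*/version; do
    if [ -r "$f" ]; then
        printf '%s: %s\n' "$f" "$(classify_wacom_version "$(cat "$f")")"
    fi
done
```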

Compiling the X11 driver

Next up is xf86-input-wacom, the X11 side of things.

Instructions for this are partially sourced from https://github.com/linuxwacom/xf86-input-wacom/wiki/Building-The-Driver#building-with-autotools.

First, install dependencies:

sudo apt-get install autoconf pkg-config make xutils-dev libtool xserver-xorg-dev$(dpkg -S $(which Xorg) | grep -Eo -- "-hwe-[^:]*") libx11-dev libxi-dev libxrandr-dev libxinerama-dev libudev-dev

Then, clone the git repository and checkout the latest release:

git clone https://github.com/linuxwacom/xf86-input-wacom.git
cd "xf86-input-wacom";
git tag; # Pick the latest one from this list
git switch --detach "$(git tag | tail -n1)"; # Basically git switch --detach TAG_NAME (git switch needs --detach for tags)

It should be at the bottom, or at least that's what I found. For me, that was xf86-input-wacom-1.2.3.

Then, to build and install the software from source, run these 2 commands one at a time:

set -- --prefix="/usr" --libdir="$(readlink -e $(ls -d /usr/lib*/xorg/modules/input/../../../ | head -n1))"
if test -x ./autogen.sh; then ./autogen.sh "$@"; else ./configure "$@"; fi && make && sudo make install || echo "Build Failed"

Now you should have the X11 side of things installed. In my case that includes xsetwacom, the (questionably designed) CLI for managing the properties of connected graphics tablets.

If that is not the case for you, you can extract it from the Ubuntu apt package:

apt download xserver-xorg-input-wacom
dpkg -x DEB_FILEPATH_HERE .
ar xv DEB_FILEPATH_HERE # or, if you don't have dpkg for some reason

....then, go locate the tool and put it somewhere in your PATH. I recommend somewhere towards the end in case you forget and fiddle with your setup some more later, so it gets overridden automatically. When I was fiddling around, that was /usr/local/games for me.

Making X11 like the kernel Driver

Also known as enabling hotplug support. Or getting the kernel module and X11 to play nicely with each other.

This is required to make udev (the daemon that listens for devices to be plugged into the machine and then performs custom actions on them) tell the X server that you've plugged in your graphics tablet, or X11 to recognise that tablet devices are indeed tablet devices, or something else vaguely similar to that effect.

Thankfully, this just requires the installation of a single configuration file in a directory that may not exist for you yet - especially if you uninstalled your distro's wacom driver package.

Do it like this:

sudo mkdir -p /etc/X11/xorg.conf.d/;
sudo curl -sSv https://raw.githubusercontent.com/linuxwacom/xf86-input-wacom/refs/heads/master/conf/70-wacom.conf -o /etc/X11/xorg.conf.d/70-wacom.conf

Just in case they move things around as I've seen happen in far too many tutorials with broken links before, the direct link to the exact commit of this file I used is:

https://github.com/linuxwacom/xf86-input-wacom/blob/47552e13e714ab6b8c2dcbce0d7e0bca6d8a8bf0/conf/70-wacom.conf

Final steps

With all that done and out of the way, reboot. This serves 2 purposes:

  1. Reloading the correct kernel module
  2. Restarting the X11 server so it has the new driver.

Make sure to use the above instructions to check you are indeed running the right version of the input-wacom kernel module.

If all goes well, tilt/rotation support should now work in the painting program of your choice.

For me, that's Krita, the AppImage of which I bundle into my apt repository because I like the latest version:

https://apt.starbeamrainbowlabs.com/

The red text "Look! Negative TX/TY (TiltX / TiltY) numbers!" crudely overlaid using the Shutter screenshotting tool on top of a screenshot of the Krita tablet tester with a red arrow pointing at the TX/TY values highlighted in yellow.

Conclusion

Phew, I have no idea where this blog post has come from. Hopefully it is useful to someone else out there who also owns a tilt-supporting wacom tablet and is encountering a similar kinda issue.

Ref teaching and the previous post, preparing teaching content is starting to slow down now thankfully. Ahead are the uncharted waters of assessment - it is unclear to me how much energy that will take to deal with.

Hopefully though there will be more PhD time (post on PhD corrections..... eventually) and free energy to spend on writing more blog posts for here! This one was enjoyable to write, if rather unexpected.

Has this helped you? Are you still stuck? Do report any issues to the authors of the above two packages I've shown in this post!

Comments below are also appreciated, both large and small.

Ubuntu 24.04 upgrade report

Heya! I just upgraded from Ubuntu 23.10 to Ubuntu 24.04 today, so I thought I'd publish a quick blog post on my experience. There are a number of issues to watch out for on this one.

tldr: Do not upgrade a machine to which you do not have physical access to 24.04 until the first point-release comes out!

While the do-release-upgrade itself went relatively well, I encountered a number of problematic issues that significantly affected the stability of my system afterwards, which I describe below, along with the fixes and workarounds that I applied.

Illustration of a striped numbat, looking up at fireflies against a pink and purple gradient background with light rays coming from the top corners

(Above: One of the official wallpapers for Ubuntu 24.04 Noble Numbat entitled "Little numbat boy", drawn by azskalt in Krita)

apt sources

Of course, any do-release-upgrade you run is going to disable third-party sources. But this time there's a new mysterious format for apt sources that looks a bit like this:

Enabled: yes
Signed-By: /etc/apt/trusted.gpg.d/sbrl.asc
Types: deb
URIs: https://apt.starbeamrainbowlabs.com/
Suites: ./
Components: 

....pretty strange, right? As it turns out, Ubuntu 24.04 has decided to switch to this new "DEB822" apt sources format by default, though I believe the existing format that looks like this:

deb [signed-by=/etc/apt/trusted.gpg.d/sbrl.asc] https://apt.starbeamrainbowlabs.com/ ./ # apt.starbeamrainbowlabs.com

....should still work. Something else to note: the signed-by there is now required, and sources won't work without it.

For more information, see steeldriver's Ask Ubuntu Answer here:

Where is the documentation for the new apt sources format used in 24.04? - Ask Ubuntu

Boot failure: plymouth and the splash screen

Another issue I encountered was this bug:

boot - Kubuntu 24.04 Black Screen / Not Booting After Upgrade - Ask Ubuntu

...basically, there's a problem with the splash screen which crashes the system because it tries to load an image before the graphics drivers load. The solution here is to disable the splash option in the grub settings.

This can be done either before you reboot into 24.04, or if you have already rebooted into 24.04, in the grub menu you can simply hit e on the default Ubuntu entry in your grub menu and then remove the word splash from the boot line there.

If you are lucky enough to see this post before you reboot, then simply edit /etc/default/grub and change quiet splash under GRUB_CMDLINE_LINUX_DEFAULT to be an empty string:

GRUB_CMDLINE_LINUX_DEFAULT=""

...and then update grub like so:

sudo update-grub
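If you'd rather script the /etc/default/grub edit than do it by hand, here's a minimal sketch. The helper name is made up, it relies on GNU sed's -i flag (fine on Ubuntu), and the demo runs against a scratch file rather than the real thing.

```shell
#!/bin/sh
# Hypothetical helper (not from any package): blank out the default kernel
# command line in a grub defaults file. Assumes GNU sed for the -i flag.
clear_default_cmdline() {
    sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT=.*/GRUB_CMDLINE_LINUX_DEFAULT=""/' "$1"
}

# Try it on a scratch copy first rather than the real /etc/default/grub
demo="$(mktemp)"
printf '%s\n' 'GRUB_DEFAULT=0' 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"' > "$demo"
clear_default_cmdline "$demo"
cat "$demo"
```

On a real system you'd need to run the sed command with sudo against /etc/default/grub (after taking a backup!), and then still run sudo update-grub afterwards.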

Boot failure: unable to even reach grub

A strange one I encountered was an inability to even reach grub, even if I manually select the grub.efi as a boot target via my UEFI firmware settings (I'm on an entroware laptop so that's F2, but your key will vary).

This one kinda stumped me, so I found this page:

Boot-Repair - Community Help Wiki

...which suggests a boot repair tool. Essentially it reinstalls grub and fixes a number of other common issues, such as a missing nvram entry for grub (UEFI systems need bootloaders registering against them), missing packages - I suspect this was the issue this time - and other common issues.

It did claim that my nvram was locked, but it still seems to have resolved the issue anyway. I do recommend booting into the live Ubuntu session with the toram kernel parameter (press e in the grub menu → add kernel parameter → press ctrl + x) and then removing your flash drive before running this tool, just to avoid it getting confused and messing with the bootloader on your flash drive - thus rendering it unusable - by accident.

Essentially, boot into a live environment, connect to the Internet, and then run these commands:

sudo add-apt-repository ppa:yannubuntu/boot-repair && sudo apt update
sudo apt install -y boot-repair
boot-repair

sudo is not required for the last command (boot-repair itself) for some strange reason.

indicator-keyboard-service memory leak

Finally, there is a significant memory leak in indicator-keyboard-service - which I assume provides the media/function key functionality, which I only noticed because I have a system resource monitor running in my system tray (indicator-multiload; multiload-ng is an alternative version that may work if you have issues with the former).

The workaround I implemented was to move the offending binary aside and install a stub script in its place:

cd /usr/libexec/indicator-keyboard
sudo mv indicator-keyboard-service indicator-keyboard-service.bak
sudo nano indicator-keyboard-service

In the text editor for the replacement for indicator-keyboard-service, paste the following content:

#!/usr/bin/env sh
exit 0

...save and exit. Then, chmod +x:

sudo chmod +x indicator-keyboard-service

....this should at least work around the issue so that you can regain system stability.

I run the Unity desktop, but this will likely affect the GNOME desktop and others too. There's already a bug report on Launchpad here:

Bug #2055388 "suspected memory leak with indicator-keyboard (causing gnome-session-flashback to freeze after startup)" : Bugs : indicator-keyboard package : Ubuntu

...if this issue affects you, do make sure to go and click the green text at the top-ish of the page to say so. The more people that say it affects them, the higher it will be on the priority list to fix.

Conclusion

A number of significant issues currently plague the upgrade process to 24.04:

  • Memory leaks from indicator-keyboard-service
  • Multiple issues preventing systems from booting by default

...I recommend that upgrading to 24.04 is done cautiously at this time. If you do not have physical access to a given system or do not have the time/energy to fix issues that prevent your system from booting successfully, I strongly recommend waiting for the first or second point release (i.e. 24.04.1 / 24.04.2) before upgrading.

If you haven't already, I also strongly recommend configuring timeshift to take automated snapshots of your system so that you can easily roll back in case of a failure.

Finally, I also recommend upgrading via the command line with this command:

sudo do-release-upgrade

...and carefully monitoring the logs as the upgrade process is running. Then, do not reboot as it asks you to until you have checked and resolved all of the above issues.

That's all I have at the moment for upgrading Ubuntu. I have 3 other systems to upgrade from 22.04, but I'll be waiting for the first point release before attempting that. I'll make another post (or a comment on this one) to let everyone know how it went when I do begin the process of upgrading them.

If you've encountered any issues in the upgrade process to 24.04 (or have any further insight into the issues I describe here), please do leave a comment below!

A memory tester for the days of UEFI

For the longest time, memtest86+ was a standard for testing sticks of RAM that one suspects may be faulty. I haven't used it in a while, but when I do use it I find that an OS-independent tool (i.e. one that you boot into instead of your normal operating system) is the most reliable way to identify faults with RAM.

It may surprise you, but I've had this post mostly written up for about 2 years...! I remembered about this post recently, and decided to rework some of it and post it here.

Since UEFI (Unified Extensible Firmware Interface) was invented and replaced the traditional BIOS for booting systems around the world, booting memtest86+ suddenly became more challenging, as for a long time it was not compatible with UEFI. It has since been updated to support UEFI though, so I thought I'd write a blog post about it - mainly because there are very rarely guides on booting images like memtest86+ from a multiboot flash drive, like the one I have blogged about before.

Before we begin, worthy of note is memtest86. While it has a very similar name, it is a variant of memtest86+ that is not open source. I have tried it though, and it works well too - brief instructions can be found for it at the end of this blog post.

I will assume that you have already followed my previous guide on setting up a multiboot flash drive. You can find that guide here:

Multi-boot + data + multi-partition = octopus flash drive 2.0?

Alternatively, anywhere you can find a grub config file you can probably follow this guide. I have yet to find an actually decent reference for the grub configuration file language, but if you know of one, please do post it in the comments.

Memtest86+ (the open source one)

Personally, I recommend the open-source Memtest86+. Since the update to version 7.0, it is now compatible with both BIOS and UEFI-based systems without any additional configuration, which is nice. See the above link to one of my previous blog posts if you would like a flash drive that boots both BIOS and UEFI grub at the same time.

To start, visit the official website, and scroll down to the download section. From here, you want to download the "Binary Files (.bin/.efi) For PXE and chainloading" version. Unzip the file you download, and you should see the following files:

memtest32.bin
memtest32.efi
memtest64.bin
memtest64.efi

....discard the files with the .efi file extension - these are for booting directly instead of being loaded with grub's linux command as in the config below. As the names suggest, the ones with 64 in the filename are the ones for 64-bit systems, which includes most systems today. Copy these to the device of your choice, and then open up your relevant grub.cfg (or equivalent grub configuration file - on an already-installed Debian/Ubuntu system custom menu entries go in /etc/grub.d/40_custom, as /etc/default/grub only holds settings) for editing. Then, somewhere in there add the following:

submenu "Memtest86+" {
    if loadfont unicode ; then
        set gfxmode=1024x768,800x600,auto
        set gfxpayload=800x600,1024x768
        terminal_output gfxterm
    fi

    insmod linux

    menuentry "[amd64] Start Memtest86+, use built-in support for USB keyboards" {
        linux /images/memtest86/memtest64.bin keyboard=both
    }
    menuentry "[amd64] Start Memtest86+, use BIOS legacy emulation for USB keyboards" {
        linux /images/memtest86/memtest64.bin keyboard=legacy
    }
    menuentry "[amd64] Start Memtest86+, disable SMP and memory identification" {
        linux /images/memtest86/memtest64.bin nosmp nosm nobench
    }
}

...replace /images/memtest86/memtest64.bin with the path to the memtest64.bin (or memtest32.bin) file, relative to your grub.cfg file. I forget where I took the above config file from, and I can't find it in my history.

If you are doing this on an installed OS instead of a USB flash drive, then things get a little more complicated. You will need to dig around and find what your version of grub considers paths to be relative to, and put your memtest64.bin file somewhere nearby. If you have experience with this, then please do leave a comment below.

This should be all you need. Those using a grub setup for an already-installed OS (e.g. via /etc/grub.d/40_custom on Debian/Ubuntu) will need to run a command for the changes to take effect:

sudo update-grub

Adding shutdown/reboot/reboot to bios setup/firmware options

Another thing I discovered recently is how to add options to my grub menu to reboot, shutdown, and reboot into firmware settings. rEFInd (an alternative bootloader to grub that I like very much, but I haven't yet explored for booting multiple ISOs on a flash drive) has these in its menus by default, but grub doesn't - so since I discovered how to do it recently I thought I'd include the config here for reference.

Simply add the following somewhere in your grub configuration file:

menuentry "Reboot" {
    reboot
}

menuentry "Shut Down" {
    halt
}

menuentry "UEFI Firmware / BIOS Settings" {
    fwsetup
}

Bonus: Memtest86 (non open-source)

I followed this guide [https://www.yosoygames.com.ar/wp/2020/03/installing-memtest86-on-uefi-grub2-ubuntu/], but ended up changing a few things, so I'll outline the process here. Again, I'll assume you already have a multiboot flash drive.

Firstly, download memtest86-usb.zip and extract the contents. Then, find the memtest86-usb.img file and find the offset of the partition that contains the actual EFI image that is the memtest86 program:


fdisk -lu memtest86-usb.img

Disk memtest86-usb.img: 500 MiB, 524288000 bytes, 1024000 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 68264C0F-858A-49F0-B692-195B64BE4DD7

Device              Start     End Sectors  Size Type
memtest86-usb.img1   2048  512000  509953  249M Microsoft basic data
memtest86-usb.img2 514048 1023966  509919  249M EFI System

Then, take the start position of the second partition (the last line of the output above, of type EFI System), and multiply it by 512, the sector size. In my case, that's 514048 × 512 = 263192576. Then, mount the partition into a directory you have already created:

sudo mount -o loop,offset=263192576 memtest86-usb.img /absolute/path/to/dir
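If you'd rather not do the multiplication by hand, the offset arithmetic above can be sketched as a tiny shell function. The function name partition_offset is my own invention for illustration, and the numbers are the ones from the fdisk output above:

```shell
#!/usr/bin/env bash
# Byte offset of a partition = start sector * sector size.
# partition_offset is a hypothetical helper name, not part of any tool.
partition_offset() {
    local start_sector="$1"
    local sector_size="${2:-512}"   # fdisk reported 512-byte sectors above
    echo $(( start_sector * sector_size ))
}

# Start sector of the EFI System partition from the fdisk output
partition_offset 514048 512
```

The value it prints is what goes after offset= in the mount command.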

Then, browse the contents of the mounted partition and copy the EFI/BOOT directory off to your flash drive, and rename it to memtest86 or something.

Now, update your grub.cfg and add the following:

menuentry "memtest86" {
    chainloader /images/memtest/BOOTX64.efi
}

....replacing /images/memtest/BOOTX64.efi with the path to the BOOTX64.efi file that should be directly in the BOOT directory you copied off.

Finally, you should be able to try it out! Boot into your multiboot flash drive as normal, and then select the memtest86 option from the grub menu.

Extra note: booting from hard drives

This post is really turning into a random grab-bag of items in my grub config file, isn't it? Anyway, something I don't use all that often (but that is very useful when I do need it) is a set of options to boot from the different hard drives in a machine. Since you can't get grub to figure out how many there are in advance, you have to statically define them ahead of time:



submenu "Boot from Hard Drive" {
    menuentry "Hard Drive 0" {
        set root=(hd0)
        chainloader +1
    }
    menuentry "Hard Drive 1" {
        set root=(hd1)
        chainloader +1
    }
    menuentry "Hard Drive 2" {
        set root=(hd2)
        chainloader +1
    }
    menuentry "Hard Drive 3" {
        set root=(hd3)
        chainloader +1
    }
}

....chainloading (aka calling another bootloader) is a wonderful thing :P

Of course, expand this as much as you like. I believe this approach also works with specific partitions with the syntax (hd0,X), where X is the partition number. Note that in grub 2, drives are numbered from 0 but partitions are numbered from 1.

Again, add to your grub.cfg file and update as above.
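Since the entries are all identical apart from the drive number, you could generate the submenu with a few lines of shell instead of copy-pasting. This is just a sketch: generate_hd_submenu is a made-up function name, and it only prints text for you to paste into your grub.cfg - it doesn't touch any files.

```shell
#!/usr/bin/env bash
# Print a "Boot from Hard Drive" submenu with one entry per drive index.
# generate_hd_submenu is a hypothetical helper; $1 is how many drives to list.
generate_hd_submenu() {
    local count="$1"
    local i=0
    echo 'submenu "Boot from Hard Drive" {'
    while [ "$i" -lt "$count" ]; do
        echo "    menuentry \"Hard Drive ${i}\" {"
        echo "        set root=(hd${i})"
        echo "        chainloader +1"
        echo "    }"
        i=$(( i + 1 ))
    done
    echo '}'
}

# Same 4 drives as the hand-written version above
generate_hd_submenu 4
```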

Conclusion

This post is more chaotic and disorganised than I expected, but I thought it would be useful to document some of the tweaks I've made to my multiboot flash drive setup over the years - something that has more than proven its worth many times since I first set it up.

We've added a memory (RAM) tester to our setup, using the open-source Memtest86+, and the alternative non-open-source version. We've also added options to reboot, shutdown, and enter the bios/uefi firmware settings.

Finally, we took a quick look at adding options to boot from different hard drives and partitions. If anyone knows how to add a menu item that could allow one to distinguish between different hard disks, partitions, their sizes, and their content more easily, please do leave a comment below.


Encrypting and formatting a disk with LUKS + Btrfs

Hey there, a wild tutorial appeared! This is just a quick one for self-reference, but I hope it helps others too.

The problem at hand is that of formatting a data disk (if you want to format your root / disk please look elsewhere - it usually has to be done before or during installation unless you like fiddling around in a live environment) with Btrfs.... but also encrypting the disk, which isn't something that Btrfs natively supports.

I'm copying over some data to my new lab PC, and I've decided to up the security on the data disk I store my research data on.

Unfortunately, both GParted and KDE Partition Manager were unable to help me (the former not supporting LUKS, and the latter crashing with a strange error), so I ended up looking through more posts than should be reasonable to find a solution that didn't involve encrypting either / or /boot.

It's actually quite simple. First, find your disk's name via lsblk, and ensure you have created the partition in question. You can format it with anything (e.g. using the above) since we'll be overwriting it anyway.

Note: You may need to reboot after creating the partition (or after some of the steps below) if you encounter errors, as Linux sometimes doesn't much like new partitions appearing out of the blue with names that were already used earlier in that boot.

Then, format it with LUKS, the most common encryption scheme on Linux:

sudo cryptsetup luksFormat /dev/nvmeXnYpZ

...then, formatting with Btrfs is a 2-step process. First we hafta unlock the LUKS encrypted partition:

sudo cryptsetup luksOpen /dev/nvmeXnYpZ SOME_MAPPER_NAME

...this creates a virtual 'mapper' block device we can hit like any other normal (physical) partition. Change SOME_MAPPER_NAME to anything you like so long as it doesn't match anything else in lsblk/df -h and also doesn't contain spaces. Avoid unicode/special characters too, just to be safe.
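If you want to sanity-check a name before using it, the rules above could be sketched like this. is_safe_mapper_name is a hypothetical helper of my own, and it's deliberately stricter than cryptsetup itself requires: it only allows ASCII letters, digits, underscores, and hyphens, and it only checks /dev/mapper for clashes.

```shell
#!/usr/bin/env bash
# Returns 0 if the name looks safe to use as a mapper name, 1 otherwise.
# is_safe_mapper_name is a hypothetical helper, stricter than cryptsetup needs.
is_safe_mapper_name() {
    local name="$1"
    case "$name" in
        "")               return 1 ;;  # empty name
        *[!A-Za-z0-9_-]*) return 1 ;;  # spaces, unicode, or other special characters
    esac
    # Refuse names that are already in use
    [ ! -e "/dev/mapper/${name}" ]
}

for candidate in "research_data" "my data" "données"; do
    if is_safe_mapper_name "$candidate"; then
        echo "ok: ${candidate}"
    else
        echo "rejected: ${candidate}"
    fi
done
```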

Then, format it with Btrfs:

sudo mkfs.btrfs --metadata single --data single --label "SOME_LABEL" /dev/mapper/SOME_MAPPER_NAME

...replacing SOME_MAPPER_NAME (same value you chose earlier) and SOME_LABEL as appropriate. If you have multiple disks, rinse and repeat the above steps for them, and then bung them on the end:

sudo mkfs.btrfs --metadata raid1 --data raid1 --label "SOME_LABEL" /dev/mapper/MAPPER_NAME_A /dev/mapper/MAPPER_NAME_B ... /dev/mapper/MAPPER_NAME_N

Note the change from single to raid1. Btrfs's raid1 stores 2 copies of everything on different disks, regardless of how many disks there are - it's a bit of a misnomer as I've talked about before.

Now that you have a kewl Btrfs-formatted partition, mount it as normal:

sudo mount /dev/mapper/SOME_MAPPER_NAME /absolute/path/to/mount/point

For Btrfs filesystems with multiple disks, it shouldn't matter which source partition you pick here as Btrfs should pick up on the other disks.

Automation

Now that we have it formatted, we don't want to hafta keep typing all those commands again. The simple solution to this is to create a shell script and put it somewhere in our $PATH.

To do this, we should ensure we have a robust name for the disk instead of /dev/nvme, which could point to a different disk in future if your motherboard or kernel decides to present them in a different order for a giggle. That's easy by looking over the output of blkid and cross-referencing it with lsblk and/or df -h:

sudo lsblk
sudo df -h
sudo blkid # → UUID

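The value you're after is in the UUID="" field of the line for the LUKS partition itself (the one with TYPE="crypto_LUKS"), not the Btrfs filesystem inside it. The shell script I came up with is short and sweet: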

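#!/usr/bin/env bash
# Stop at the first error, so we never try to mount if unlocking fails
set -e;

disk_id="ID_FROM_BLKID";
mapper_name="SOME_NAME";
mount_path="/absolute/path/to/mount/dir";

# Unlock the LUKS container (prompts for the passphrase), then mount the filesystem inside it
sudo cryptsetup luksOpen "/dev/disk/by-uuid/${disk_id}" "${mapper_name}";
sudo mount "/dev/mapper/${mapper_name}" "${mount_path}"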

Fill in the values as appropriate:

  • disk_id: The UUID of the disk in question from blkid.
  • mapper_name: A name of your choosing that doesn't clash with anything else in /dev/mapper on your system
  • mount_path: The absolute path to the directory that you want to mount into - usually in /mnt or /media.

Put this script in e.g. $HOME/.local/bin or somewhere else in $PATH that suits you and your setup. Don't forget to run chmod +x path/to/script!
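A counterpart script to unmount and lock the disk again might look something like the following. This is a sketch rather than something from my own setup: the run and lock_disk helpers are names I've made up, and it defaults to a dry run that just prints the commands, so set DRY_RUN=0 when you actually want it to do something.

```shell
#!/usr/bin/env bash
# Same placeholder values as in the mount script
mapper_name="SOME_NAME"
mount_path="/absolute/path/to/mount/dir"

# Dry-run by default: print each command instead of running it.
# Call with DRY_RUN=0 to actually unmount and lock the disk.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Unmount first, and only close the LUKS mapping if the unmount succeeded
lock_disk() {
    run sudo umount "${mount_path}" &&
        run sudo cryptsetup luksClose "${mapper_name}"
}

lock_disk
```

The order matters here: the mapper device can't be closed while the filesystem on it is still mounted.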

Conclusion

We've formatted an existing partition with LUKS and Btrfs, and written a quick-and-dirty shell script to semi-automate the process of mounting it here.

If this has been useful or if you have any suggestions, please do leave a comment below!


.desktop files: Launcher icons on Linux

Heya! Just thought I'd write a quick reminder post for myself on the topic of .desktop files. In most Linux distributions, launcher icons for things are dictated by files with the file extension .desktop.

Of course, most programs these days come with a .desktop file automatically, but if you for example download an AppImage, then you might not get an auto-generated one. You might also be packaging something for your distro's package manager (go you!) - something I do semi-regularly when apt repos for software I need aren't updated (see my apt repository!).

They can live either locally to your user account (~/.local/share/applications) or globally (/usr/share/applications), and they follow the XDG desktop entry specification (see also the Arch Linux docs page, which is fabulous as usual ✨). It's basically a fancy .ini file:

[Desktop Entry]
Encoding=UTF-8
Type=Application
Name=Krita
Comment=Krita: Professional painting and digital art creation program
Version=1.0
Terminal=false
Exec=/usr/local/bin/krita
Icon=/usr/share/icons/krita.png

Basically, leave the first line, the Type, the Version, the Terminal, and the Encoding directives alone, but the others you'll want to customise:

  • Name: The name of the application to be displayed in the launcher
  • Comment: The short 1-line description. Some distros display this as the tooltip on hover, others display it in other ways.
  • Exec: Path to the binary to execute. Prepend with env Foo=bar etc if you need to set some environment variables (e.g. running on a discrete card - 27K views? wow O.o)
  • Icon: Path to the icon to display as the launcher icon. For global .desktop files, this should be located somewhere in /usr/share/icons.

This is just the basics. There are many other directives you can include - like the Categories directive, which describes - if your launcher supports categories - what categories a given launcher icon should appear under.

Troubleshooting: It took me waaay too long to realise this, but if you have put your .desktop file in the right place and it isn't appearing - even after a relog - then the desktop-file-validate command could come in handy:

desktop-file-validate path/to/file.desktop

It validates the content of a given .desktop file. If it contains any errors, then it will complain about them for you - unlike your desktop environment which just ignores .desktop files that are invalid.
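Putting the pieces together, here's a rough sketch of writing a launcher for an AppImage and validating it in one go. The app name and the /opt/myapp paths are invented for illustration, and it writes to a temporary directory by default - set APP_DIR to ~/.local/share/applications to install it for real.

```shell
#!/usr/bin/env bash
# Hypothetical app name and paths, for illustration only.
# Writes to a temporary directory unless APP_DIR is set.
app_dir="${APP_DIR:-$(mktemp -d)}"
desktop_file="${app_dir}/myapp.desktop"

cat > "${desktop_file}" <<'EOF'
[Desktop Entry]
Type=Application
Version=1.0
Name=My App
Comment=Example launcher for an AppImage
Terminal=false
Exec=/opt/myapp/MyApp.AppImage
Icon=/opt/myapp/icon.png
EOF

# Validate the result if the tool is installed, otherwise say so and move on
if command -v desktop-file-validate >/dev/null 2>&1; then
    if desktop-file-validate "${desktop_file}"; then
        echo "valid: ${desktop_file}"
    else
        echo "invalid: ${desktop_file}"
    fi
else
    echo "desktop-file-validate not found; skipping validation"
fi
```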

If you found this useful, please do leave a comment below about what you're creating launcher icons for!


NAS Series List

Somehow, despite posting about my NAS back in 2021 I have yet to post a proper series list post about it! I'm rectifying that now with this quick post.

I wrote this series of 4 posts back when I first built my new NAS box.

Here's the full list of posts in the main NAS series:

Additionally, as a bonus, later in 2021 I also wrote a pair of posts about how I was backing up my NAS to a backup NAS. Here they are:

How (not) to recover a consul cluster

Hello again! I'm still getting used to a new part-time position at University which I'm not quite ready to talk about yet, but in the mean time please bear with me as I shuffle my schedule around.

As I've explained previously on here, I have a consul cluster (superglue service discovery!) that forms the backbone of my infrastructure at home. Recently, I had a small powercut that knocked everything offline, and as the recovery process was quite interesting I thought I'd blog about it here.

The issue happened at about 5pm, but I only discovered it was a problem a few hours later when I got home. Essentially, a small powercut knocked everything offline. While my NAS rebooted automatically afterwards, my collection of Raspberry Pis weren't so lucky. I can only suspect that they were caught in some transient state or something. None of them responded when I pinged them, and later inspection of the logs on my collectd instance revealed that they were essentially non-functional until after they were rebooted manually.

A side effect of this was that my Consul cluster (and, by extension, my Nomad cluster) was knocked offline.

Anyway, at first I only rebooted the controller host (that has both a Consul and Nomad server running on it, but does not accept and run jobs). This rebooted just fine and came back online, so I then rebooted my monitoring box (that also runs continuous integration), which also came back online.

Due to the significantly awkward physical location I keep my cluster in with the rest of the Pis, I decided to flip the power switch on the extension to restart all my hosts at the same time.

While this worked..... it also caused my cluster controller node to reboot, which caused its raft epoch number to increment by 1... which broke the quorum (agreement) of my cluster, and required manual intervention to resolve.

Raft quorum

To understand the specific issue here, we need to look at the Raft consensus algorithm. Raft is, as the name suggests, a consensus algorithm. Such an algorithm is useful when you have a cluster of servers that need to work together in a redundant fault-tolerant fashion on some common task, such as in our case Consul (service discovery) and Nomad (task scheduling).

The purpose of a raft server is to maintain agreement amongst all nodes in a cluster as to the global state of an application. It does this using a distributed log that it replicates through a fancy but surprisingly simple algorithm.

At the core of this algorithm is the concept of a leader. The cluster leader is responsible for managing and committing updates to the global state, as well as sending out the global state to everyone else in the cluster. In the case of Consul, the Consul servers are the cluster (the clients simply connect back to whichever servers are available) - and I have 3 of them, since Raft works best with an odd number of nodes.

When the cluster first starts up or the leader develops a fault (e.g. someone sets off a fork bomb on it just for giggles), an election occurs to decide on a new leader. The election term number (or epoch number) is incremented by one, and everyone votes on who the new leader should be. The node with the most votes becomes the new leader, and quorum (agreement) is achieved across the entire cluster.
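To put some numbers on this, here's a quick sketch of the majority arithmetic as it's usually described for Raft. The function names are my own, and this is the general majority rule rather than anything pulled out of Consul's source code:

```shell
#!/usr/bin/env bash
# Votes needed for a majority in a cluster of N voting servers
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# Number of servers that can fail while a majority is still reachable
fault_tolerance() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for servers in 1 2 3 4 5; do
    echo "servers=${servers} quorum=$(quorum_size "$servers") tolerates=$(fault_tolerance "$servers")"
done
```

Going from 3 servers to 4 raises the number of votes needed without letting you lose any more servers, which is why odd cluster sizes are the usual recommendation.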

Consul and Raft

In the case of Consul, everyone must cast a vote for the vote to be considered valid, otherwise the vote is considered invalid and the election process must begin again. Crucially, the election term number must also be the same across everyone voting.

In my case, because I started my cluster controller and then rebooted it before it had a chance to achieve quorum, it incremented its election term number one more time than the rest of the cluster did. This caused the cluster to fail to reach quorum, as the other 2 nodes in the Consul server cluster considered the controller node's vote to be invalid, yet they still demanded that all servers vote to elect a new leader.

The practical effect of this was that because the Consul cluster failed to agree on who the leader should be, the Nomad cluster (which hangs off the Consul cluster, using it to find each other) also failed to start and subsequently reach quorum, which knocked all my jobs offline.

The solution

Thankfully, the Hashicorp Consul documentation for this specific issue is fabulous:

https://developer.hashicorp.com/consul/tutorials/datacenter-operations/recovery-outage#failure-of-a-server-in-a-multi-server-cluster

To summarise:

  1. Boot the cluster as normal if it isn't booted already
  2. Stop the failed node
  3. Create a special config file (raft/peers.json) that will cause the failed node to drop its state and accept the state of the incomplete cluster, allowing it to rejoin and the cluster to gain collective quorum once more.

The documentation to perform this recovery protocol is quite clear. While there is an option to recover a failed node if you still have a working cluster with a leader, in my case I didn't so I had to use the alternate route.
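For reference, here's a sketch of what generating such a file might look like. As far as I understand the linked tutorial, for Raft protocol version 3 the file is a JSON array with one entry per server, each with an id, an address, and a non_voter flag. The node IDs and IP addresses below are placeholders I've made up, and the script writes to a temporary directory rather than a real Consul data directory - do check the tutorial against your own Consul version before relying on this.

```shell
#!/usr/bin/env bash
# Placeholder node IDs and addresses: substitute the real values for your servers.
# Writes to a temporary directory unless OUT_DIR is set; the real file lives in
# the raft/ subdirectory of the Consul data directory.
out_dir="${OUT_DIR:-$(mktemp -d)}"
peers_file="${out_dir}/peers.json"

cat > "${peers_file}" <<'EOF'
[
  { "id": "00000000-0000-0000-0000-000000000001", "address": "10.0.0.1:8300", "non_voter": false },
  { "id": "00000000-0000-0000-0000-000000000002", "address": "10.0.0.2:8300", "non_voter": false },
  { "id": "00000000-0000-0000-0000-000000000003", "address": "10.0.0.3:8300", "non_voter": false }
]
EOF

echo "Wrote ${peers_file}"
```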

Conclusion

I've talked briefly about an interesting issue that caused my Consul cluster to break quorum, which inadvertently brought my entire infrastructure down until I resolved the issue.

While Consul is normally really quite resilient, you can break it if you aren't careful. Having an understanding of the underlying consensus algorithm Raft is very helpful to diagnosing and resolving issues, though the error messages and documentation I looked through were generally clear and helpful.

Considerations on monitoring infrastructure

I like Raspberry Pis. I like them so much that by my count I have at least 8 in operation at the time of typing performing various functions for me, including a cluster for running various services.

Having Raspberry Pis and running services on servers is great, but once you have some infrastructure setup hosting something you care about your thoughts naturally turn to mechanisms by which you can ensure that such infrastructure continues to run without incident, and if problems do occur they can be diagnosed and fixed efficiently.

Such is the thought that is always on my mind when managing my own infrastructure, which sprawls across multiple physical locations. To this end, I thought I'd blog about what my monitoring system looks like - what its strengths are, and what it could do better.

A note before we begin: I continue to have a long-term commitment to posting on this blog - I have just started a part-time position alongside my PhD due to the end of my primary research period, which has been taking up a lot of my mental energy. Things should get slowly back to normal soon-ish.

Keep in mind as you read this that my situation may be different to your own. For example, monitoring a network primarily consisting of Raspberry Pis demands a very different approach than an enterprise setup (if you're looking for a monitoring solution for a bunch of big powerful servers, I've heard the TICK stack is a good place to start).

Monitoring takes many forms and purposes. Broadly speaking, I split the monitoring I have on my infrastructure into the following categories:

  1. Logs (see my earlier post on Centralising logs with rsyslog)
  2. System resources (e.g. CPU/RAM/disk/etc usage) - I use collectd for this
  3. Service health - I use Consul for my cluster, and Uptime Robot for this website.
  4. Server health (e.g. whether a server is down or not, hanging due to a bad mount, etc.)

I've found that as there are multiple categories of things that need monitoring, there isn't a single one-size-fits-all solution to the problem, so different tools are needed to monitor different things.

Logs - centralised rsyslog

At the moment, monitoring logs is a solved problem for me. I've talked about my setup previously, in which I have a centralised rsyslog server which receives and stores all logs from my entire infrastructure (barring a few select boxes I need to enrol in this system). Storing logs nets me 2 things:

  1. The ability to reference them (e.g. with lnav) later in the event of an issue for diagnostic purposes
  2. The ability to inspect the logs during routine maintenance for any anomalies, issues, or errors that might become problematic later if left unattended

System information - collectd

Similarly, storing information about system resource usage - such as CPU load or disk usage for instance - is more useful than you'd think for spotting and pinpointing issues with one's infrastructure - be it a single server or an entire fleet. In my case, this also includes monitoring network latency (useful should my ISP encounter issues, as then I can identify if it's a me or a them problem) and HTTP response times.

For this, I use collectd, backed by rrd (round-robin database) files. These are fixed-size files that contain ring buffers that it iteratively writes over, allowing efficient storage of up to 1 year's worth of history.

To visualise this in the browser, I use Collectd Graph Panel, which is unfortunately pretty much abandonware (I haven't found anything better).

To start with the strengths of this system, it's very computationally efficient. I have tried previously to setup a TICK (Telegraf, InfluxDB, Chronograf, and Kapacitor) stack on a Raspberry Pi, but it was way too heavy - especially considering the Raspberry Pi my monitoring system runs on is also my continuous integration server. Collectd, on the other hand, runs quietly in the background, barely using any resources at all.

Another strength is that it's easy and simple. You throw a config file at it (which could be easily standardised across an entire fleet of servers), and collectd will dutifully send encrypted system metrics to a given destination for you with minimal fuss. Meanwhile, the browser-based dashboard I use automatically plots graphs and displays them for you without any tedious creation of a custom dashboard.

Having a system monitor things is good, but having it notify you in the event of an anomaly is even better. While collectd does have the ability to generate and send notifications, its capacity to do this is unfortunately rather limited.

Another limitation of collectd is that accessing and processing the stored system metrics data is not a trivial process, since it's stored in rrd databases, the parsing of which is surprisingly difficult due to a lack of readily available libraries to do this. This makes it difficult to integrate it with other systems, such as n8n for example, which I have recently setup to replace some functions of IFTTT to automatically repost my blog posts here to Reddit and Discord.

Collectd can write to multiple sources however (e.g. MQTT), so I might look into this as an option to connect it to some other program to deliver more flexible notifications about issues.

Service health

Service health is what most people might think of when I initially said that this blog post would be about monitoring. In many ways, it's one of the most important things to monitor - especially if other people rely on infrastructure which is managed by you.

Currently, I achieve this in 2 ways. Firstly, for services running on the server that hosts this website I have a free Uptime Robot account which monitors my server and website. It costs me nothing, and I get monitoring of my server from a completely separate location. In the event my server or the services thereon that it monitors are down, I will get an email telling me as such - and another email once it goes back up again.

Secondly, for services running on my cluster I use Consul's inbuilt service monitoring functionality, though I don't yet have automated emails to notify me of failures (something I need to investigate a solution for).

The monitoring system you choose here depends on your situation, but I strongly recommend having at least some form of external monitoring of whether your target boxes go down that can notify you of this. If your monitoring is hosted on the box that goes down, it's not really of much use...!

Monitoring service health more robustly and notifying myself about issues is currently on my todo list.

Server health

Server health ties into service health, and perhaps also system information too. Knowing which servers are up and which ones are down is important - not least because of the services running thereon.

The tricky part of this is that if a server goes down, it could be because of any one of a number of issues - ranging from a simple software/hardware failure, all the way up to an entire-building failure (e.g. a powercut) or a natural disaster. With this in mind, it's important to plan your monitoring carefully such that you still get notified in the event of a failure.

Conclusion

In this post, I've talked a bit about my monitoring infrastructure, and things to consider more generally when planning monitoring for new or existing infrastructure.

It's never too late to iteratively improve your infrastructure monitoring system - whether it be enrolling that box in the corner that never got added to the system, or implementing a totally new kind of monitoring - e.g. centralised logging, or in my case I need to work on more notifications for when things go wrong.

On a related note, what do your backups look like right now? Are they automated? Do they cover all your important data? Could you restore them quickly and efficiently?

If you've found this interesting, please leave a comment below!

NSD, Part 2: Dynamic DNS

Hey there! In the last post, I showed you how to setup nsd, the Name Server Daemon, an authoritative DNS server to serve records for a given domain. In this post, I'm going to talk through how to extend that configuration to support Dynamic DNS.

Normally, if you query, say, the A or AAAA records for a domain or subdomain like git.starbeamrainbowlabs.com, it will return the same IP address that you manually set in the DNS zone file, or if you use some online service then the value you manually set there. This is fine if your IP address does not change, but becomes problematic if your IP address may change unpredictably.

The solution, as you might have guessed, lies in dynamic DNS. Dynamic DNS is a fancy word for some kind of system where the host system that a DNS record points to (e.g. compute.bobsrockets.com) informs the DNS server about changes to its IP address.

This is done by making a network request from the host system to some kind of API that automatically updates the DNS server - usually over HTTP (though anything else could work too, but please make sure it's encrypted!).

You may already be familiar with using a HTTP API to inform your cloud-based registrar (e.g. Cloudflare, Gandi, etc) of IP address changes, but in this post we're going to set dynamic DNS up with the nsd server we configured in the previous post mentioned above.

The first order of business is to find some software to do this. You could also write a thing yourself (see also setting up a systemd service). There are several choices, but I went with dyndnsd (I may update this post if I ever write my own daemon for this).

Next, you need to determine what subdomain you'll use for dynamic DNS. Since DNS is hierarchical, an entire subdomain is required - you can't just do dynamic DNS for, say, wiki.bobsrockets.com. As dyndnsd will manage its own DNS zone file, all dynamic DNS hostnames will be under that subdomain - e.g. wiki.dyn.bobsrockets.com.

Configuring the server

For the server, I will be assuming that the dynamic dns daemon will be running on the same server as the nsd daemon.

For this tutorial, we'll be setting it up unencrypted. This is a security risk if you are setting it up to accept requests over the Internet rather than a local trusted network! Notes on how to fix this at the end of this post.

Since this is a Ruby-based program (something I generally recommend avoiding, as in my experience Ruby is a rather inefficient language to write a program in), first we need to install gem, the Ruby package manager:

sudo apt install ruby ruby-rubygems ruby-dev

Then, we can use gem to install dyndnsd itself:

sudo gem install dyndnsd

Now, we need to configure it. dyndnsd is configured using a YAML (ew) configuration file. It's probably best to show an example configuration file and explain it afterwards:

# listen address and port
host: "0.0.0.0"
port: 5354
# The internal database file. We'll create this in a moment.
db: "/var/lib/dyndnsd/db.json"
# enable debug mode?
debug: false
# all hostnames are required to be cool-name.dyn.bobsrockets.com
domain: "dyn.bobsrockets.com"
# configure the updater, here we use command_with_bind_zone, params are updater-specific
updater:
  name: "command_with_bind_zone"
  params:
    zone_file: "/etc/dyndnsd/zones/dyn.bobsrockets.com.zone"
    command: "systemctl reload nsd"
    ttl: "5m"
    dns: "bobsrockets.com."
    email_addr: "bob.bobsrockets.com"
# Users with the hostnames they are allowed to create/update
users:
  computeuser: # <--- Username
    password: "alongandrandomstring"
    hosts:
      - compute1.dyn.bobsrockets.com
  computeuser2:
    password: "anotherlongandrandomstring"
    hosts:
      - compute2.dyn.bobsrockets.com
      - compute3.dyn.bobsrockets.com

...several things to note here that I haven't already noted in comments.

  • zone_file: "/etc/dyndnsd/zones/dyn.bobsrockets.com.zone": This is the path to the zone file dyndnsd should update.
  • dns: "bobsrockets.com.": This is the fully-qualified hostname with a dot at the end of the DNS server that will be serving the DNS records (i.e. the nsd server).
  • email_addr: "bob.bobsrockets.com": This sets the email address of the administrator of the system, but with the @ sign replaced by a dot (.). If your email address contains a dot in the user part (e.g. bob.rockets@example.com), then it won't work as expected here.
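The email address conversion described above is simple enough to sketch in shell. email_to_rname is a hypothetical helper name of my own, and it just refuses addresses with a dot in the user part rather than trying to escape them:

```shell
#!/usr/bin/env bash
# Convert an email address to the dotted form used for email_addr above.
# email_to_rname is a hypothetical helper, not something dyndnsd provides.
email_to_rname() {
    local email="$1"
    local user="${email%@*}"     # everything before the @
    local domain="${email#*@}"   # everything after the @
    case "$user" in
        *.*)
            echo "error: user part '${user}' contains a dot" >&2
            return 1
            ;;
    esac
    echo "${user}.${domain}"
}

email_to_rname "bob@bobsrockets.com"
```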

Also important here: although when dealing with domains like this it is less confusing to always put a dot . at the end of fully qualified domain names, not every value here takes one - note that dns has a trailing dot in the config above, while domain and email_addr do not.

Once you've written the config file, create the directory /etc/dyndnsd and write it to /etc/dyndnsd/dyndnsd.yaml.

With the config file written, we now need to create and assign permissions to the data directory it will be using. Do that like so:

sudo useradd --no-create-home --system --home /var/lib/dyndnsd dyndnsd
sudo mkdir /var/lib/dyndnsd
sudo chown dyndnsd:dyndnsd /var/lib/dyndnsd

Also, we need to create the zone file and assign the correct permissions so that it can write to it:

sudo mkdir /etc/dyndnsd/zones
sudo chown dyndnsd:dyndnsd /etc/dyndnsd/zones
# symlink the zone file into the nsd zones directory. This way dyndnsd isn't allowed to write to all of /etc/nsd/zones - just the 1 zone file it is supposed to update.
sudo ln -s /etc/dyndnsd/zones/dyn.bobsrockets.com.zone /etc/nsd/zones/dyn.bobsrockets.com.zone

Now, we can write a systemd service file to run dyndnsd for us:

[Unit]
Description=dyndnsd: Dynamic DNS record updater
Documentation=https://github.com/cmur2/dyndnsd

[Service]
User=dyndnsd
Group=dyndnsd
ExecStart=/usr/local/bin/dyndnsd /etc/dyndnsd/dyndnsd.yaml
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=dyndnsd

[Install]
WantedBy=multi-user.target

Save this to /etc/systemd/system/dyndnsd.service. Then, start the daemon like so:

sudo systemctl daemon-reload
sudo systemctl enable --now dyndnsd.service

Finally, don't forget to update your firewall to allow requests through to dyndnsd. For UFW, do this:

sudo ufw allow 5354/tcp comment dyndnsd

That completes the configuration of dyndnsd on the server. Now we just need to update the nsd config file to tell it about the new zone.

nsd's config file should be at /etc/nsd/nsd.conf. Open it for editing, and add the following to the bottom:

zone:
    name: dyn.bobsrockets.com
    zonefile: dyn.bobsrockets.com.zone

...and you're done on the server!

Configuring the client(s)

For the clients, all that needs doing is configuring them to make regular requests to the dyndnsd server to keep it apprised of their IP addresses. This is done by making a HTTP request, so we can test it with curl like this:

curl 'http://computeuser:alongandrandomstring@bobsrockets.com:5354/nic/update?hostname=compute1.dyn.bobsrockets.com'

...where computeuser is the username, alongandrandomstring is the password, and compute1.dyn.bobsrockets.com is the hostname it should update.

The server will be able to tell what the IP address is it should set for the subdomain compute1.dyn.bobsrockets.com by the IP address of the client making the request.

The simplest way of automating this is using cron. Add the following cronjob (sudo crontab -e to edit the crontab):

*/5 * * * *     curl -sS 'http://computeuser:alongandrandomstring@bobsrockets.com:5354/nic/update?hostname=compute1.dyn.bobsrockets.com'

....and that's it! It really is that simple. Windows users will need to setup a scheduled task instead and install curl, but that's outside the scope of this post.
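If you'd prefer a small script to a long cron line, something like the following sketch could work. The variable and function names are my own, the values are the example ones from the config file earlier, and I'm assuming that passing the credentials with curl's --user flag is equivalent to embedding them in the URL. It only contacts the server when called with --update - otherwise it just prints the URL it would request.

```shell
#!/usr/bin/env bash
# Example values from the config file earlier; replace with your own.
dyn_user="computeuser"
dyn_pass="alongandrandomstring"
dyn_server="bobsrockets.com:5354"
dyn_host="compute1.dyn.bobsrockets.com"

# Build the update URL (without credentials)
update_url() {
    echo "http://${dyn_server}/nic/update?hostname=${dyn_host}"
}

# --user keeps the credentials out of the URL itself, and
# --fail makes curl exit non-zero on HTTP errors so cron can report them
update_dns() {
    curl -sS --fail --max-time 10 --user "${dyn_user}:${dyn_pass}" "$(update_url)"
}

# Only contact the server when explicitly asked to
if [ "${1:-}" = "--update" ]; then
    update_dns
else
    update_url
fi
```

The cronjob would then call the script with --update instead of calling curl directly.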

Conclusion

In this post, I've given a whistle-stop tour of setting up a simple dynamic DNS server. This can be useful if a host has a dynamic IP address on a local network but it still needs a (sub)domain for some reason.

Note that this is not suitable for untrusted networks! For example, setting dyndnsd to accept requests over the Internet is a Bad Idea, as this simple setup is not encrypted.

If you do want to set this up over an untrusted network, you must encrypt the connection to avoid nasty DNS poisoning attacks. Assuming you already have a working reverse proxy setup on the same machine (e.g. Nginx), you'll need to add a new virtual host (a server { } block in Nginx) that reverse-proxies to your dyndnsd daemon and sets the X-Real-IP HTTP header, and then ensure port 5354 is closed on your firewall to prevent direct access.

This is beyond the scope of this post and slightly different depending on your setup, but if there's the demand I can blog about how to do this.

