Starbeamrainbowlabs

Stardust
Blog


Archive


Mailing List Articles Atom Feed Comments Atom Feed Twitter Reddit Facebook

Tag Cloud

3d 3d printing account algorithms android announcement architecture archives arduino artificial intelligence artix assembly async audio automation backups bash batch blender blog bookmarklet booting bug hunting c sharp c++ challenge chrome os cluster code codepen coding conundrums coding conundrums evolved command line compilers compiling compression containerisation css dailyprogrammer data analysis debugging demystification distributed computing dns docker documentation downtime electronics email embedded systems encryption es6 features ethics event experiment external first impressions freeside future game github github gist gitlab graphics hardware hardware meetup holiday holidays html html5 html5 canvas infrastructure interfaces internet interoperability io.js jabber jam javascript js bin labs learning library linux lora low level lua maintenance manjaro minetest network networking nibriboard node.js open source operating systems optimisation own your code pepperminty wiki performance phd photos php pixelbot portable privacy problem solving programming problems project projects prolog protocol protocols pseudo 3d python reddit redis reference releases rendering resource review rust searching secrets security series list server software sorting source code control statistics storage svg systemquery talks technical terminal textures thoughts three thing game three.js tool tutorial tutorials twitter ubuntu university update updates upgrade version control virtual reality virtualisation visual web website windows windows 10 worldeditadditions xmpp xslt

NAS Series List

Somehow, despite posting about my NAS back in 2021 I have yet to post a proper series list post about it! I'm rectifying that now with this quick post.

I wrote this series of 4 posts back when I first built my new NAS box.

Here's the full list of posts in the main NAS series:

Additionally, as a bonus, I also later in 2021 I wrote a pair of posts back how I was backing up my NAS to a backup NAS. Here they are:

How (not) to recover a consul cluster

Hello again! I'm still getting used to a new part-time position at University which I'm not quite ready to talk about yet, but in the mean time please bear with me as I shuffle my schedule around.

As I've explained previously on here, I have a consul cluster (superglue service discovery!) that forms the backbone of my infrastructure at home. Recently, I had a small powercut that knocked everything offline, and as the recovery process was quite interesting I thought I'd blog about it here.

The issue at had happened at about 5pm, but I only discovered it was a problem until a few horus later when I got home. Essentially, a small powercut knocked everything offline. While my NAS rebooted automatically afterwards, my collection of Raspberry Pis weren't so lucky. I can only suspect that they were caught in some transient state or something. None of them responded when I pinged them, and later inspection of the logs on my collectd instance revealed that they were essentially non-functional until after they were rebooted manually.

A side effect of this was that my Consul (and, by extension, my Nomad cluster) cluster was knocked offline.

Anyway, at first I only rebooted the controller host (that has both a Consul and Nomad server running on it, but does not accept and run jobs). This rebooted just fine and came back online, so I then rebooted my monitoring box (that also runs continuous integration), which also came back online.

Due to the significantly awkward physical location I keep my cluster in with the rest of the Pis, I decided to flip the power switch on the extension to restart all my hosts at the same time.

While this worked..... it also caused my cluster controller node to reboot, which caused its raft epoch number to increment by 1... which broke the quorum (agreement) of my cluster, and required manual intervention to resolve.

Raft quorum

To understand the specific issue here, we need to look at the Raft consensus algorithm. Raft is, as the name suggests, a consensus algorithm. Such an algorithm is useful when you have a cluster of servers that need to work together in a redundant fault-tolerant fashion on some common task, such as in our case Consul (service discovery) and Nomad (task scheduling).

The purpose of a raft server is to maintain agreement amongst all nodes in a cluster as to the global state of an application. It does this using a distributed log that it replicates through a fancy but surprisingly simple algorithm.

At the core of this algorithm is the concept of a leader. The cluster leader is responsible for managing and committing updates to the global state, as well as sending out the global state to everyone else in the cluster. In the case of Consul, the Consul servers are the cluster (the clients simply connect back to whichever servers are available) - and I have 3 of them, since Raft requires an odd number of nodes.

When the cluster first starts up or the leader develops a fault (e.g. someone sets off a fork bomb on it just for giggles), an election occurs to decide on a new leader. The election term number (or epoch number) is incremented by one, and everyone votes on who the new leader should be. The node with the most votes becomes the new leader, and quorum (agreement) is achieved across the entire cluster.

Consul and Raft

In the case of Consul, everyone must cast a vote for the vote to be considered valid, otherwise the vote is considered invalid and the election process must begin again. Crucially, the election term number must also be the same across everyone voting.

In my case, because I started my cluster controller and then rebooted it before it had a chance to achieve quorum, it incremented it's election term number and additional time than the rest of the cluster did, which caused the cluster to fail to reach quorum as the other 2 nodes in the Consul server cluster consider the controller node's vote to be invalid, yet they still demanded that all servers vote to elect a new leader.

The practical effect of this was tha because the Consul cluster failed to agree on who the leader should be, the Nomad cluster (which hangs off the Consul cluster, using it to find each other) also failed to start and subsequently reach quorum, which knocked all my jobs offline.

The solution

Thankfully, the Hashicorp Consul documentation for this specific issue is fabulous:

https://developer.hashicorp.com/consul/tutorials/datacenter-operations/recovery-outage#failure-of-a-server-in-a-multi-server-cluster

To summarise:

  1. Boot the cluster as normal if it isn't booted already
  2. Stop the failed node
  3. Create a special config file (raft/peers.json) that will cause the failed node to drop it's state and accept the state of the incomplete cluster, allowing it to rejoin and the cluster gain collective quorum once more.

The documentation to perform this recovery protocol is quite clear. While there is an option to recover a failed node if you still have a working cluster with a leader, in my case I didn't so I had to use the alternate route.

Conclusion

I've talked briefly about an interesting issue that caused my Consul cluster to break quorum, which inadvertently brought my entire infrastructure down until resolved the issue.

While Consul is normally really quite resilient, you can break it if you aren't careful. Having an understanding of the underlying consensus algorithm Raft is very helpful to diagnosing and resolving issues, though the error messages and documentation I looked through were generally clear and helpful.

Considerations on monitoring infrastructure

I like Raspberry Pis. I like them so much that by my count I have at least 8 in operation at the time of typing performing various functions for me, including a cluster for running various services.

Having Raspberry Pis and running services on servers is great, but once you have some infrastructure setup hosting something you care about your thoughts naturally turn to mechanisms by which you can ensure that such infrastructure continues to run without incident, and if problems do occur they can be diagnosed and fixed efficiently.

Such is the thought that is always on my mind when managing my own infrastructure, sprawls across multiple physical locations. To this end, I thought I'd blog what my monitoring system looks like - what it's strengths are, and what it could do better.

A note before we begin: I continue to have a long-term commitment to posting on this blog - I have just started a part-time position alongside my PhD due to the end of my primary research period, which has been taking up a lot of my mental energy. Things should get slowly back to normal soon-ish.

Keep in mind as you read this that my situation may be different to your own. For example, monitoring a network primary consisting of Raspberry Pis demands a very different approach than an enterprise setup (if you're looking for a monitoring solution for a bunch of big powerful servers, I've heard the TICK stack is a good place to start).

Monitoring takes many forms and purposes. Broadly speaking, I split the monitoring I have on my infrastructure into the following categories:

  1. Logs (see my earlier post on Centralising logs with rsyslog)
  2. System resources (e.g. CPU/RAM/disk/etc usage) - I use collectd for this
  3. Service health - I use Consul for my cluster, and Uptime Robot for this website.
  4. Server health (e.g. whether a server is down or not, hanging due to a bad mount, etc.)

I've found that as there are multiple categories of things that need monitoring, there isn't a single one-size-fits-all solution to the problem, so different tools are needed to monitor different things.

Logs - centralised rsyslog

At the moment, monitoring logs is a solved problem for me. I've talked about my setup previously, in which I have a centralised rsyslog server which receives and stores all logs from my entire infrastructure (barring a few select boxes I need to enrol in this system). Storing logs nets me 2 things:

  1. The ability to reference them (e.g. with lnav) later in the event of an issue for diagnostic purposes
  2. The ability to inspect the logs during routine maintenance for any anomalies, issues, or errors that might become problematic later if left unattended

System information - collectd

Similarly, storing information about system resource usage - such as CPU load or disk usage for instance - is more useful than you'd think for spotting and pinpointing issues with one's infrastructure - be it a single server or an entire fleet. In my case, this also includes monitoring network latency (useful should my ISP encounter issues, as then I can identify if it's a me or a them problem) and HTTP response times.

For this, I use collectd, backed by rrd (round-robin database) files. These are fixed-size files that contain ring buffers that it iteratively writes over, allowing efficient storage of up to 1 year's worth of history.

To visualise this in the browser, I use Collectd Graph Panel, which is unfortunately pretty much abandonware (I haven't found anything better).

To start with the strengths of this system, it's very computationally efficient. I have tried previously to setup a TICK (Telegraf, InfluxDB, Chronograf, and Kapacitor) stack on a Raspberry Pi, but it was way too heavy - especially considering the Raspberry Pi my monitoring system runs on is also my continuous integration server. Collectd, on the other hand, runs quietly in the background, barely using any resources at all.

Another strength is that it's easy and simple. You throw a config file at it (which could be easily standardised across an entire fleet of servers), and collectd will dutifully send encrypted system metrics to a given destination for you with minimal fuss. Meanwhile, the browser-based dashboard I use automatically plots graphs and displays them for you without any tedious creation of a custom dashboard.

Having a system monitor things is good, but having it notify you in the event of an anomaly is even better. While collectd does have the ability to generate and send notifications, its capacity to do this is unfortunately rather limited.

Another limitation of collectd is that accessing and processing the stored system metrics data is not a trivial process, since it's stored in rrd databases, the parsing of which is surprisingly difficult due to a lack of readily available libraries to do this. This makes it difficult to integrate it with other systems, such as n8n for example, which I have recently setup to replace some functions of IFTTT to automatically repost my blog posts here to Reddit and Discord.

Collectd can write to multiple sources however (e.g. MQTT), so I might look into this as an option to connect it to some other program to deliver more flexible notifications about issues.

Service health

Service health is what most people might think of when I initially said that this blog post would be about monitoring. In many ways, it's one of the most important things to monitor - especially if other people rely on infrastructure which is managed by you.

Currently, I achieve this in 2 ways. Firstly, for services running on the server that hosts this website I have a free Uptime Robot account which monitors my server and website. It costs me nothing, and I get monitoring of my server from a completely separate location. In the event my server or the services thereon that it monitors are down, I will get an email telling me as such - and another email once it goes back up again.

Secondly, for services running on my cluster I use Consul's inbuilt service monitoring functionality, though I don't yet have automated emails to notify me of failures (something I need to investigate a solution for).

The monitoring system you choose here depends on your situation, but I strongly recommend having at least some form of external monitoring of whether your target boxes go down that can notify you of this. If your monitoring is hosted on the box that goes down, it's not really of much use...!

Monitoring service health more robustly and notifying myself about issues is currently on my todo list.

Server health

Server health ties into service health, and perhaps also system information too. Knowing which servers are up and which ones are down is important - not least because of the services running thereon.

The tricky part of this is that if a server goes down, it could be because of any one of a number of issues - ranging from a simple software/hardware failure, all the way up to an entire-building failure (e.g. a powercut) or a natural disaster. With this in mind, it's important to plan your monitoring carefully such that you still get notified in the event of a failure.

Conclusion

In this post, I've talked a bit about my monitoring infrastructure, and things to consider more generally when planning monitoring for new or existing infrastructure.

It's never too late to iteratively improve your infrastructure monitoring system - whether it be enrolling that box in the corner that never got added to the system, or implementing a totally kind of monitoring - e.g. centralised logging, or in my case I need to work on more notifications for when things go wrong.

On a related note, what do your backups look like right now? Are they automated? Do they cover all your important data? Could you restore them quickly and efficiently?

If you've found this interesting, please leave a comment below!

NSD, Part 2: Dynamic DNS

Hey there! In the last post, I showed you how to setup nsd, the Name Server Daemon, an authoritative DNS server to serve records for a given domain. In this post, I'm going to talk through how to extend that configuration to support Dynamic DNS.

Normally, if you query, say, the A or AAAA records for a domain or subdomain like git.starbeamrainbowlabs.com, it will return the same IP address that you manually set in the DNS zone file, or if you use some online service then the value you manually set there. This is fine if your IP address does not change, but becomes problematic if your IP address may change unpredictably.

The solution, as you might have guessed, lies in dynamic DNS. Dynamic DNS is a fancy word for some kind of system where the host system that a DNS record points to (e.g. compute.bobsrockets.com) informs the DNS server about changes to its IP address.

This is done by making a network request from the host system to some kind of API that automatically updates the DNS server - usually over HTTP (though anything else could work too, but please make sure it's encrypted!).

You may already be familiar with using a HTTP API to inform your cloud-based registrar (e.g. Cloudflare, Gandi, etc) of IP address changes, but in this post we're going to set dynamic DNS up with the nsd server we configured in the previous post mentioned above.

The first order of business is to find some software to do this. You could also write a thing yourself (see also setting up a systemd service). There are several choices, but I went with dyndnsd (I may update this post if I ever write my own daemon for this).

Next, you need to determine what subdomain you'll use for dynamic dns. Since DNS is hierarchical, an entire subdomain is required - you can't just do dynamic DNS for, say, wiki.bobsrockets.com - since dyndnsd will manage it's own DNS zone file, all dynamic DNS hostnames will be under that subdomain - e.g. wiki.dyn.bobsrockets.com.

Configuring the server

For the server, I will be assuming that the dynamic dns daemon will be running on the same server as the nsd daemon.

For this tutorial, we'll be setting it up unencrypted. This is a security risk if you are setting it up to accept requests over the Internet rather than a local trusted network! Notes on how to fix this at the end of this post.

Since this is a Ruby-based program (which I do generally recommend avoiding since Ruby is generally an inefficient language to write a program in I've observed), first we need to install gem, the Ruby package manager:

sudo apt install ruby ruby-rubygems ruby-dev

Then, we can install the gem Ruby package manager:

sudo gem install dyndnsd

Now, we need to configure it. dyndnsd is configured using a YAML (ew) configuration file. It's probably best to show an example configuration file and explain it afterwards:

# listen address and port
host: "0.0.0.0"
port: 5354
# The internal database file. We'll create this in a moment.
db: "/var/lib/dyndnsd/db.json"
# enable debug mode?
debug: false
# all hostnames are required to be cool-name.dyn.bobsrockets.com
domain: "dyn.bobsrockets.com"
# configure the updater, here we use command_with_bind_zone, params are updater-specific
updater:
  name: "command_with_bind_zone"
  params:
    zone_file: "/etc/dyndnsd/zones/dyn.bobsrockets.com.zone"
    command: "systemctl reload nsd"
    ttl: "5m"
    dns: "bobsrockets.com."
    email_addr: "bob.bobsrockets.com"
# Users with the hostnames they are allowed to create/update
users:
  computeuser: # <--- Username
    password: "alongandrandomstring"
    hosts:
      - compute1.dyn.bobsrockets.com
  computeuser2:
    password: "anotherlongandrandomstring"
    hosts:
      - compute2.dyn.bobsrockets.com
      - compute3.dyn.bobsrockets.com

...several things to note here that I haven't already noted in comments.

  • zone_file: "/etc/nsd/zones/dyn.bobsrockets.com.zone": This is the path to the zone file dyndnsd should update.
  • dns: "bobsrockets.com.": This is the fully-qualified hostname with a dot at the end of the DNS server that will be serving the DNS records (i.e. the nsd server).
  • email_addr: "bob.bobsrockets.com": This sets the email address of the administrator of the system, but the @ at sign is replaced with a dot .. If your email address contains a dot . in the user (e.g. bob.rockets@example.com), then it won't work as expected here.

Also important here is that although when dealing with domains like this it is less confusing to always require a dot . at the end of fully qualified domain names, this is not always the case here.

Once you've written the config file,, create the directory /etc/dyndnsd and write it to /etc/dyndnsd/dyndnsd.yaml.

With the config file written, we now need to create and assign permissions to the data directory it will be using. Do that like so:

sudo useradd --no-create-home --system --home /var/lib/dyndnsd dyndnsd
sudo mkdir /var/lib/dyndnsd
sudo chown dyndnsd:dyndnsd /var/lib/dyndnsd

Also, we need to create the zone file and assign the correct permissions so that it can write to it:

sudo mkdir /etc/dyndnsd/zones
sudo chown dyndnsd:dyndnsd /etc/dyndnsd/zones
# symlink the zone file into the nsd zones directory. This way dyndns isn't allowed to write to all of /etc/nsd/zones - just the 1 zone file it is supposed to update.
sudo ln -s /etc/dyndnsd/zones/dyn.bobsrockets.com.zone /etc/nsd/zones/dyn.bobsrockets.com.zone

Now, we can write a systemd service file to run dyndnsd for us:

[Unit]
Description=dyndnsd: Dynamic DNS record updater
Documentation=https://github.com/cmur2/dyndnsd

[Service]
User=dyndnsd
Group=dyndnsd
ExecStart=/usr/local/bin/dyndnsd /etc/dyndnsd/dyndnsd.yaml
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=dyndnsd

[Install]
WantedBy=multi-user.target

Save this to /etc/systemd/system/dyndnsd.service. Then, start the daemon like so:

sudo systemctl daemon-reload
sudo systemctl enable --now dyndnsd.service

Finally, don't forget to update your firewall to allow requests through to dyndnsd. For UFW, do this:

sudo ufw allow 5354/tcp comment dyndnsd

That completes the configuration of dyndnsd on the server. Now we just need to update the nsd config file to tell it about the new zone.

nsd's config file should be at /etc/nsd/nsd.conf. Open it for editing, and add the following to the bottom:

zone:
    name: dyn.bobsrockets.com
    zonefile: dyn.bobsrockets.com.zone

...and you're done on the server!

Configuring the client(s)

For the clients, all that needs doing is configuring them to make regular requests to the dyndnsd server to keep it appraised of their IP addresses. This is done by making a HTTP request, so we can test it with curl like this:

curl http://computeuser:alongandrandomstring@bobsrockets.com:5354/nic/update?hostname=compute1.dyn.bobsrockets.com

...where computeuser is the username, alongandrandomstring is the password, and compute1.dyn.bobsrockets.com is the hostname it should update.

The server will be able to tell what the IP address is it should set for the subdomain compute1.dyn.bobsrockets.com by the IP address of the client making the request.

The simplest way of automating this is using cron. Add the following cronjob (sudo crontab -e to edit the crontab):

*/5 * * * *     curl -sS http://computeuser:alongandrandomstring@bobsrockets.com:5354/nic/update?hostname=compute1.dyn.bobsrockets.com

....and that's it! It really is that simple. Windows users will need to setup a scheduled task instead and install curl, but that's outside the scope of this post.

Conclusion

In this post, I've given a whistle-stop tour of setting up a simple dynamic dns server. This can be useful if a host as a dynamic IP address on a local network but it still needs a (sub)domain for some reason.

Note that this is not suitable for untrusted networks! For example, setting dyndnsd to accept requests over the Internet is a Bad Idea, as this simple setup is not encrypted.

If you do want to set this up over an untrusted network, you must encrypt the connection to avoid nasty DNS poisoning attacks. Assuming you already have a working reverse proxy setup on the same machine (e.g. Nginx), you'll need to add a new virtual host (a server { } block in Nginx) that reverse-proxies to your dyndnsd daemon and sets the X-Real-IP HTTP header, and then ensure port 5354 is closed on your firewall to prevent direct access.

This is beyond this scope of this post and slightly different depending on your setup, but if there's the demand I can blog about how to do this.

Sources and further reading

The NSD Authoritative DNS Server: What, why, and how

In a previous blog post, I explained how to setup unbound, a recursive resolving DNS server. I demonstrated how to setup a simple split-horizon DNS setup, and forward DNS requests to an upstream DNS server - potentially over DNS-over-TLS.

Recently, for reasons that are rather complicated, I found myself in an awkward situation which required an authoritative DNS server - and given my love of explaining complicated and rather niche concepts here on my blog, I thought this would be a fabulous opportunity to write a 2-part series :P

In this post, I'm going to outline the difference between a recursive resolver and an authoritative DNS server, and explain why you'd want one and how to set one up. I'll explain how it fits as a part of a wider system.

Go grab your snacks - you'll be learning more about DNS than you ever wanted to know....

DNS in a (small) nutshell

As I'm sure you know if you're reading this, DNS stands for the Domain Name System. It translates domain names (e.g. starbeamrainbowlabs.com.) into IP addresses (e.g. 5.196.73.75, or 2001:41d0:e:74b::1). Every network-connected system will make use of a DNS server at one point or another.

DNS functions on records. These define how a given domain name should be resolved to it's corresponding IP address (or vice verse, but that's out-of-scope of this post). While there are many different types of DNS record, here's a quick reference for the most common one's you'll encounter when reading this post.

  • A: As simple as it gets. An A record defines the corresponding IPv4 address for a domain name.
  • AAAA: Like an A record, but for IPv6.
  • CNAME: An alias, like a symlink in a filesystem [Linux] or a directory junction [Windows]
  • NS: Specifies the domain name of the authoritative DNS server that holds DNS records for this domain. See more on this below.

A tale of 2 (DNS) servers

Consider your laptop, desktop, phone, or other device you're reading this on right now. Normally (if you are using DHCP, which is a story for another time), your router (which usually acts as the DHCP server on most home networks) will tell you what DNS server(s) to use.

These servers that your device talks to is what's known as a recursive resolving DNS server. These DNS servers do not have any DNS records themselves: their entire purpose is to ask other DNS servers to resolve queries for them.

At first this seems rather counterintuitive. Why bother when you can have a server that actually hosts the DNS records themselves and just ask that every time instead?

Given the size of the Internet today, this is unfortunately not possible. If we all used the same DNS server that hosted all DNS records, it would be drowned in DNS queries that even the best Internet connection would not be abel to handle. It would also be a single point of failure - bringing the entire Internet crashing down every time maintenance was required.

To this end, a more scaleable system was developed. By having multiple DNS servers between users and the authoritative DNS servers that actually hold the real DNS records, we can ensure the system scales virtually infinitely.

The next question that probably comes to mind is where the name recursive resolvers DNS server comes from. This name comes from the way that these recursive DNS servers ask other DNS servers for the answer to a query, instead of answering based on records they hold locally (though most recursive resolving DNS servers also have a cache for performance, but this is also a tale for another time).

Some recursive resolving DNS servers - such as the one built into your home router - simply asks 1 or 2 upstream DNS servers - usually either provided by your ISP or manually set by you (I recommend 1.1.1.1/1.0.0.1), but others are truly recursive.

Take peppermint.mooncarrot.space. for example. If we had absolutely no idea where to start resolving this domain, we would first ask a DNS root server for help. Domain names are hierarchical in nature - sub.example.com. is a subdomain of example.com.. The same goes then that mooncarrot.space. is a subdomain of space., which is itself a subdomain of ., the DNS root zone. It is no accident that all the domain names in this blog post have a dot at the end of them (try entering starbeamrainbowlabs.com. into your browser, and watch as your browser auto-hides the trailing dot .).

In this way, if we know the IP address of a DNS root server (e.g. 193.0.14.129, or 2001:7fd::1), we can recurse through this hierarchical tree to discover the IP address associated with a domain name we want to resolve

First, we'd ask a root server to tell us the authoritative DNS server for the space. domain name. We do this by asking it for the NS record for the space. domain.

Once we know the address of the authoritative DNS server for space., we can ask it to give us the NS record for mooncarrot.space. for us. We may repeat this process a number of times - I'll omit the specific details of this for brevity (if anyone's interested, I can write a full deep dive post into this, how it works, and how it's kept secure - comment below) - and then we can finally ask the authoritative DNS server we've tracked down to resolve the domain name peppermint.mooncarrot.space. to an IP address for us (e.g. by asking for the associated A or AAAA record).

Authoritative DNS servers

With this in mind, we can now move on to the main purpose of this post: setting up an authoritative DNS server. As you might have guessed by now, the purpose of an authoritative DNS server is to hold records about 1 or more domain names.

While most of the time the authoritative DNS server for your domain name will be either your registrar or someone like Cloudflare, there are a number of circumstances in which it can be useful to run your own authoritative DNS server(s) and not rely on your registrar:

  • If you need more control over the DNS records served for your domain than your registrar provides
  • Serving complex DNS records for a domain name on an internal network (split-horizon DNS)
  • Setting up your own dynamic DNS system (i.e. where you dynamically update the IP address(es) that a domain name resolves to via an API call)

Other situations certainly exist, but these are 2 that come to mind at the moment (comment below if you have any other uses for authoritative DNS servers).

The specific situation I found myself was a combination of the latter 2 points here, so that's the context in which I'll be talking.

To set one up, we first need some software to do this. There are a number of DNS servers out there:

  • Bind9 [recursive; authoritative]
  • Unbound [recursive; not really authoritative; my favourite]
  • Dnsmasq [recursive]
  • systemd-resolved [recursive; it always breaks for me so I don't use it]

As mentioned Unbound is my favourite, so for this post I'll be showing you how to use it's equally cool sibling, nsd (Name Server Daemon).

The Name Server Daemon

Now that I've explained what an authoritative DNS server is and why it's important, I'll show you how to install and configure one, and then convince another recursive resolving DNS server that's under your control to ask your new authoritative DNS server instead of it's default upstream to resolve DNS queries for a given domain name.

It goes without saying that I'll be using Linux here. If you haven't already, I strongly recommend using Linux for hosting a DNS server (or any other kind of server). You'll have a bad day if you don't.

I will also be assuming that you have a level of familiarity with the Linux terminal. If you don't learn your terminal and then come back here.

nsd is available in all major distributions of Linux in the default repositories. Adjust as appropriate for your distribution:

sudo apt install nsd

nsd has 2 configuration files that are important. First is /etc/nsd/nsd.conf, which configures the nsd daemon itself. Let's do this one first. If there's an existing config file here, move it aside and then paste in something like this:

server:
    port: 5353

    server-count: 1
    username: nsd

    logfile: "/var/log/nsd.log"
    pidfile: "/run/nsd.pid"

    # The zonefile directive(s) below is prefixed by this path
    zonesdir: /etc/nsd/zones

zone:
    name: example.com
    zonefile: example.com.zone

...replace example.com with the domain name that you want the authoritative DNS server to serve DNS records for. You can also have multiple zone: blocks for different (sub)domains - even if those domain names are subdomains of others.

For example, I could have a zone: block for both example.com and dyn.example.com. This can be useful if you want to run your own dynamic DNS server, which will write out a full DNS zone file (a file that contains DNS records) without regard to any other DNS records that might have been in that DNS zone.

Replace also 5353 with the port you want nsd to listen on. In my case I have my authoritative DNS server running on the same box as the regular recursive resolver, so I've had to move the authoritative DNS server aside to a different port as dnsmasq (the recursive DNS server I have running on this particular box) has already taken port 53.

Next up, create the directory /etc/nsd/zones, and then open up example.com.zone for editing inside that new directory. In here, we will put the actual DNS records we want nsd to serve.

The format of this file is governed by RFC1035 section 5 and RFC1034 section 3.6.1, but the nsd docs provide a simpler example. See also the wikipedia page on DNS zone files.

Here's an example:

; example.com.
$TTL 300
example.com. IN     SOA    a.root-servers.net. admin.example.com. (
                2022090501  ; Serial
                3H          ; refresh after 3 hours
                1H          ; retry after 1 hour
                1W          ; expire after 1 week
                1D)         ; minimum TTL of 1 day

; Name Server
IN  NS  dns.example.com.

@                   IN A        5.196.73.75
example.com.        IN AAAA     2001:41d0:e:74b::1
www                 IN CNAME    @
ci                  IN CNAME    @

Some notes about the format to help you understand it:

  • Make sure ALL your fully-qualified domain names have the trailing dot at the end otherwise you'll have a bad day.
  • $TTL 300 specifies the default TTL (Time To Live, or the time DNS records can be cached for) in seconds for all subsequent DNS records.
  • Replace example.com. with your domain name.
  • admin.example.com. should be the email address of the person responsible for the DNS zone file, with the @ replaced with a dot instead.
  • dns.example.com. in the NS record must be set to the domain name of the authoritative DNS server serving the zone file.
  • @ IN A 5.196.73.75 is the format for defining an A record (see the introduction to this blog post) for example.com. - @ is automatically replaced with the domain name in question - in this case example.com.
  • When declaring a record, if you don't add the trailing dot then it is assumed you're referring to a subdomain of the domain this DNS zone file is for - e.g. if you put www it assumes you mean www.example.com.

Once you're done, all that's left for configuring nsd is to start it up for the first time (and on boot). Do that like so:

sudo systemctl restart nsd
sudo systemctl enable nsd

Now, you should be able to query it to test it. I like to use dig for this:

dig -p 5353 +short @dns.example.com example.com

...this should return a result based on the DNS zone file you defined above. Replace 5353 with the port number your authoritative DNS server is running on, or omit -p 5353 altogether if it's running on port 53.

Try it out by updating your DNS zone file and reloading nsd: sudo systemctl reload nsd

Congratulations! You now have an authoritative DNS server under your control! This does not mean that it will be queried by any other DNS servers on your network though - read on.....

Integration with the rest of your network

The final part of this post will cover integrating an authoritative DNS server with another DNS server on your network - usually a recursive one. How you do this will vary depending on the target DNS server you want to convince to talk to your authoritative DNS server.

For Unbound:

I've actually covered this in a previous blog post. Simply update /etc/unbound/unbound.conf with a new block like this:

forward-zone:
    name: "example.com."
    forward-addr: 127.0.0.1@5353

...where example.com. is the domain name to forward for (WITH THE TRAILING DOT; and all subdomains thereof), 127.0.0.1 is the IP address of the authoritative DNS server, and 5353 is the port number of the authoritative DNS server.

Then, restart Unbound like so:

sudo systemctl restart unbound

For dnsmasq:

Dnsmasq's main config file is located at /etc/dnsmasq.conf, but there may be other config files located in /etc/dnsmasq.d/ that might interfere. Either way, update dnsmasq's config file with this directive:

server=/example.com./127.0.0.1#5353

...where example.com. is the domain name to forward for (WITH THE TRAILING DOT; and all subdomains thereof), 127.0.0.1 is the IP address of the authoritative DNS server, and 5353 is the port number of the authoritative DNS server.

If there's another server=/example.com./... directive elsewhere in your dnsmasq config, it may override your new definition.

Then, restart dnsmasq like so:

sudo systemctl restart dnsmasq

If there's another DNS server that I haven't included here that you use, please leave a comment on how to reconfigure it to forward a specific domain name to a different DNS server.

Conclusion

In this post, I've talked about the difference between an authoritative DNS server and a recursive resolving DNS server. I've shown why authoritative DNS servers are useful, and alluded to reasons why running your own authoritative DNS server can be beneficial.

In the second post in this 2-part miniseries, I'm going to go into detail on dynamic DNS, why it's useful, and how to set up a dynamic dns server.

As always, this blog post is a starting point - not an ending point. DNS is a surprisingly deep subject: from DNS root hint files to mDNS (multicast DNS) to the various different DNS record types, there are many interesting and useful things to learn about it.

After all, it's always DNS..... especially when you don't think it is.

Sources and further reading

Mounting LVM partitions from the terminal on Linux

Hello there! Recently I found myself with the interesting task of mounting an LVM partition by hand. It wasn't completely straightforward and there was a bunch of guesswork involved, so I thought I'd document the process here.

For those who aren't aware, LVM stands for the Logical Volume Manager, and it's present on Linux system to make managing partitions easier. It can:

  • Move and resize partitions while they are still mounted
  • Span multiple disks

....but to my knowledge it doesn't have any redundancy (use Btrfs) or encryption (use LUKS) built in. It is commonly used to manage the partitions on your Linux desktop, as then you don't need to reboot it into a live Linux environment to fiddle with your partitions as much.

LVM works on a layered system. There are 3 layers to it:

  1. Physical Volumes: Normal physical partitions on the disk.
  2. Volume Groups: Groups of logical (LVM) partitions.
  3. Logical Volumes: LVM-managed partitions.

In summary, logical volumes are part of a volume group, which spans 1 or more physical disks.

With this in mind, first list the available physical volumes and their associated volume groups, and identify which is the one you want to mount:

sudo vgdisplay

Notice the VG Size in the output. Comparing it with the output of lsblk -o NAME,RO,SIZE,RM,TYPE,MOUNTPOINT,LABEL,VENDOR,MODEL can be helpful to identify which one is which.

I encountered a situation where I had 2 with the same name - one from my host system I was working on, and another from the target disk I was trying to mount. In my situation each disk had it's own volume group assigned to it, so I needed to rename one of the volumes.

To do this, take the value of the VG UUID field of the volume group you want to rename from the output of sudo vgdisplay above, and then rename it like this:

sudo vgrename SOME_ID NEW_NAME

...for example, I did this:

sudo vgrename 5o1LoG-jFdv-v1Xm-m0Ca-vYmt-D5Wf-9AAFLm examplename

With that done, we can now locate the logical volume we want to mount. Do this by listing the logical volumes in the volume group you're interested in:

sudo lvdisplay vg_name

Note down the name of the logical volume you want to mount. Now we just need to figure out where it is actually located in /dev so that we can mount it. Despite the LV Path field appearing to show us this, it's not actually correct - at least on my system.

Instead, list the contents of /dev/mapper:

ls /dev/mapper

You should see the name of the logical volume that you want to mount in the form volumegroup-logicalvolumename. Once found, you should be able to mount it like so:

sudo mount /dev/mapper/volumegroup-logicalvolumename path/to/directory

...replacing path/to/directory with the path to the (empty) directory you want to mount it to.

If you can't find it, then it is probably because you plugged the drive in question in after you booted up. In this case, it's probable that the volume group is not active. You can check this is the case or not like so:

sudo lvscan

If it isn't active, then you can activate it like this:

sudo lvchange -a y vg_name

...replacing vg_name with the name of the volume group you want to activate. Once done, you can then mount the logical volume as I mentioned above.

Once you are done, unmounting it is a case of reversing these steps. First, unmount the partition:

sudo umount path/to/mount_point

Then, disable the volume group again:

sudo lvchange -a n vg_name

Finally, flush any cached writes to disk, just in case:

sync

Now, you can unplug the device from your machine.

That wraps up this quick tutorial. If you spot any mistakes in this, please do leave a comment below and I'll correct it.

Configuring an endlessh honeypot with rsyslog email notifications

Security is all about defence in depth, so I'm always looking for ways to better secure my home network. For example, I have cluster management traffic running over a Wireguard mesh VPN. Now, I'm turning my attention to the rest of my network.

To this end, while I have a guest network with wireless isolation enabled, I do not currently have a way to detect unauthorised devices connecting to my home WiFi network, or fake WiFi networks with the same name, etc. Detecting this is my next focus. While I've seen nzyme recently and it looks fantastic, it also looks more complicated to setup.

While I look into the documentation for nzyme, inspired by this reddit post I decided to setup a honeypot on my home network.

The goal of a honeypot is to detect threats moving around in a network. In my case, I want to detect if someone has connected to my network who shouldn't have done. Honeypots achieve this by pretending to be a popular service, but in reality they are there to collect information about potential threats.

To set one up, I found endlessh, which pretends to be an SSH server - but instead slowly sends an endless banner to the client, keeping the connection open as long as possible. It can also connection attempts to syslog, which allows us to detect connections and send an alert.

Implementing this comes in 2 steps. First, we setup endlessh and configure it to log connection attempts. Then, we reconfigure rsyslog to send email alerts.

Setting up endlessh

I'm working on one of the Raspberry Pis running Raspberry Pi OS in my network, but this should with with other machines too.

If you're following along to implement this yourself, make sure you've moved SSH to another port number before you continue, as we'll be configuring endlessh to listen on port 22 - the default port for ssh, as this is the port I imagine that an automated network scanner might attempt to connect to by default if it were looking for ssh servers to attempt to crack.

Conveniently, endlessh has a package in the default Debian repositories:

sudo apt install endlessh

...adjust this for your own package manager if you aren't on an apt-based system.

endlessh has a configuration file at /etc/endlessh/config by default. Open it up for editing, and make it look something like this:

# The port on which to listen for new SSH connections.
Port 22

# Set the detail level for the log.
#   0 = Quiet
#   1 = Standard, useful log messages
#   2 = Very noisy debugging information
LogLevel 1

Beforee we can start the endlessh service, we need to reconfigure it to allow it to listen on port 22, as this is a privileged port number. Doing this requires 2 steps. First, allow the binary to listen on privileged ports:

sudo setcap CAP_NET_BIND_SERVICE=+eip "$(which "endlessh")";

Then, if you are running systemd (most distributions do by default), execute the following command:

sudo systemctl edit endlessh.service

This will allow you to append some additional directives to the service definition for endlessh, without editing the original apt-managed systemd service file. Add the following, and then save and quit:

[Service]
AmbientCapabilities=CAP_NET_BIND_SERVICE
PrivateUsers=false

Finally, we can restart the endlessh service:

sudo systemctl restart endlessh
sudo systemctl enable --now endlessh

That completes the setup of endlessh!

Configuring rsyslog to send email alerts

The second part of this process is to send automatic alerts whenever anyone connects to our endlessh service. Since endlessh forwards logs to syslog by default, reconfiguring rsyslog to send the alerts seems like the logical choice. In my case, I'm going to send email alerts - but other ways of sending alerts do exist - I just haven't looked into them yet.

To do this requires that you have either a working email server (I followed the Ars Technica taking email back series, but whatever you do it's not for the faint for heart! Command line experience is definitely required - if you're looking for a nice first project to try, a web server instead), or an email account you can use. Note that I do not recommend using your own personal email account, as you'll have to store the password in plain text!

In my case, I have my own email server, and I have forwarded port 25 down an SSH tunnel so that I can use it to send emails (in the future I want to configure a proper smart host that listen on port 25 and forwards emails by authenticating against my server properly, but that's for another time as I have yet to find a relay-only MTA that also listens on port 25).

In a previous post, implemented centralised logging - so I'm going to be reconfiguring my main centralised rsyslog instance.

To do this, open up /etc/rsyslog.d/10-endlessh.conf for editing, and paste in something like this:

template (name="mailSubjectEndlessh" type="string" string="[HONEYPOT] endlessh connection on %hostname%")

if ( ($programname == 'endlessh') and (($msg contains "ACCEPT") or ($msg contains "CLOSE")) ) then {
    action(type="ommail" server="localhost" port="20205"
        mailfrom="sender@example.com"
        mailto=["bill@billsboosters.net"]
        subject.template="mailSubjectEndlessh"
        action.execonlyonceeveryinterval="3600"
    )
}

...where:

  • [HONEYPOT] endlessh connection on %hostname% is the subject name, and %hostname% is substituted for the actual hostname the honeypot is running on
  • sender@example.com is the address that you want to send the alert FROM
  • bill@billsboosters.net is the address that you want to send the alert TO
  • 3600 is the minimum interval between emails, in seconds. Log lines are not collected up - only 1 log line is sent at a time, and others logged in-between are ignored and handled as if the above email directive doesn't exist until the given number of seconds expires - at which point it will then email for the next log line that comes through, and the cycle then repeats. If anyone knows how to change that, please leave a command below.

Note that the template line is outside the if statement. This is important - I got a syntax error if I put it inside the if statement.

The if statement specifically looks for log messages with a tag of endlessh that contain either the substring ACCEPT or CLOSE. Only if those conditions are true will it send an email.

I have yet to learn how to configure rsyslog to authenticate while sending emails. I would suspect though that the easiest way of achieving this is to setup a local SMTP relay-only MTA (Mail Transfer Agent) that rsyslog can connect to and send emails, and then the relay will authenticate against the real server and send the email on rsyslog's behalf. I have yet to find such an MTA however other than Postfix - which, while great, can be hugely complicated to setup. Other alternatives I've tried include:

....but they all implement sendmail and while that's useful they do not listen on port 25 (or any other port for that matter) as far as I can tell.

Anyway, the other file you need to edit is /etc/rsyslog.conf. Open it up for editing, and put this near the top:

module(load="ommail")

...this loads the mail output plugin that sends the emails.

Now that we've reconfigured rsyslog, we need to restart it:

sudo systemctl restart rsyslog

rsyslog is picky about it's config file syntax, so make sure to check it's status for error messages:

sudo systemctl status rsyslog

You can also use lnav analyse your logs and find any error messages there too.

Conclusion

We've setup endlessh as a honeypot, and then reconfigured rsyslog to send email alerts. Test the system like so on your local machine:

ssh -vvv -p 22 someuser@yourserver

...and watch your inbox for the email alert that will follow shortly!

While this system isn't particularly useful on it's own, it's a small part of a larger strategy for securing my network. It's also been a testing ground for me to configure rsyslog to send email alerts - something I may want to configure my centralised rsyslog logging system to do for other things in the future.

If you've found this post useful or you have some suggestions, please leave a comment below!

Sources and further reading

Centralising logs with rsyslog

I manage quite a number of servers at this point, and something that's been on my mind for a while now is centralising all the log files generated by them. By this, specifically I mean that I want to automatically gather all logs generated by all the systems I manage into a single place in real time.

While there are enterprise-grade log management setups such as the ELK stack (elasticsearch, logstash, and kibana), as far as I'm aware they are all quite heavy and given my infrastructure is Raspberry Pi based (seriously, they use hardly any electricity at all compared to a regular desktop PC), with such a setup I would likely need multiple Pis to run it.

With this in mind, I'm opting for a different kind of log management system, which I'm basing on rsyslog (which is installed by default in most Linux distros) and lnav (which I've blogged about before: lnav basics tutorial), which runs much lighter, requiring only a fraction of a Raspberry Pi to operate, which is good since the Raspberry Pi I've dedicated to monitoring the rest of the infrastructure currently also handles:

  1. Continuous Integration: Laminar (this will eventually be a Docker container on my Hashicorp Nomad cluster)
  2. Collectd (Collectd is really easy to setup and runs so light, I love it)

I'm sure you might be asking yourself what the purpose of this is. My reasoning is fourfold:

  1. Having all the logs in one place makes them easier to analyse all at once, without having to SSH into many different servers
  2. If a box goes down, then I can read the logs from it before start attempting to fix it, giving me a heads up as to what the problem is (this, in conjunction with my collectd monitoring system)
  3. On the Raspberry Pis I manage, this prolongs the life of the microSD cards by reducing the number of writes thereto
  4. I gain a little bit of security, in that if a box is compromised, then unless the attacker also gains access to my logging server, then they can't erase their tracks as easily as might otherwise have done

With all this in mind, I thought that it's about time I actually did something about this. I've found that while the solution is actually really quite simple, it's not particularly easy to find, so I thought I'd post about it here.

In my setup, I'm going to be using a Raspberry Pi 4 4GB RAM I've dubbed eldarion, which is the successor to an earlier Raspberry Pi 3B+ that died some years prior I called elessar as the server upon which I centralise my logs. It has a 120GB SATA SSD attached in a case that used to house a WD PiDrive (they don't sell those anymore :-/) that I had lying around, which I've formatted with Btrfs.

Before we begin, let's outline the setup we're aiming for with a diagram to avoid confusion:

A diagram of the rsyslog setup we're aiming for. See explanation below.

eldarion will host the rsyslog server (which is essentially just a reconfiguration of the existing rsyslog server it is most likely already running), while other servers connect using the syslog protocol via a TCP connection, which is encrypted with TLS, using the GnuTLS engine (the default built into rsyslog). TLS here is important, since logs are naturally rather sensitive as I'm sure you can imagine.

To follow along here, you will need a valid Let's Encrypt certificate. It just so happens that I have a web server hosting my collectd graph panel interface, so I'm using that.

Of course, rsyslog can be configured in arbitrarily complex ways (such as having clients send logs to servers that they themselves forward to yet other servers), but at least for now I'm keeping it (relatively) simple.

Preparing the server

To start this process, we want to ensure the logs for the local system are stored in the right place. In my case, I have my SSD mounted to /mnt/eldarion-data2, so I want to put my logs in /mnt/eldarion-data2/syslog/localhost. There are 2 ways of accomplishing this:

  1. Reconfigure rsyslog to save logs elsewhere
  2. Be lazy, and bind mount the target location to /var/log

Since I'm feeling lazy today, I'm going to go with option 2 here. It's also a good idea if a program is badly written and decides it's a brilliant idea to write logs directly to /var/log itself instead of going through syslog.

If you're using DietPi, before you continue, do sudo dietpi-software and remove the existing logging system.

A bind mount is like a hard link of a directory, in that it makes a directory appear in multiple places at once. It acts as a separate "filesystem" though I assume to allow for avoiding infinite loops. They are also the tech behind volumes in Docker's backend containerd.

Open /etc/fstab for editing, and something like this on a new line:

/mnt/eldarion-data2/syslog/localhost    /var/log    none    auto,defaults,bind  0   0

..where /mnt/eldarion-data2/syslog/localhost is the location we want the data to be stored, and /var/log is the location we want to bind mount it to. Save and close /etc/fstab, and then mount the bind mount like so. Make sure /var/log is empty before mounting!

sudo mount /var/log

Next, we need to install some dependencies:

sudo apt install rsyslog rsyslog-gnutls

For some strange reason, TLS support is in a separate package on Debian-based systems. You'll need to investigate package names and translate this command for your distribution, of course.

Configuring the server

Now we have that taken care of, we can actually configure our server. Open /etc/rsyslog.conf for editing, and at the top put this:

# The $Thing syntax is apparently 'legacy', but I can't find how else we're supposed to do this
$DefaultNetstreamDriver gtls
$DefaultNetstreamDriverCAFile   /etc/letsencrypt/live/mooncarrot.space/chain.pem
$DefaultNetstreamDriverCertFile /etc/letsencrypt/live/mooncarrot.space/cert.pem
$DefaultNetstreamDriverKeyFile  /etc/letsencrypt/live/mooncarrot.space/privkey.pem

# StreamDriver.Mode=1 means TLS-only mode
module(load="imtcp" MaxSessions="500" StreamDriver.Mode="1" StreamDriver.AuthMode="anon")
input(type="imtcp" port="514")

$template remote-incoming-logs,"/mnt/eldarion-data2/syslog/hosts/%HOSTNAME%/%PROGRAMNAME%.log"
*.* ?remote-incoming-logs

You'll need to edit these bits to match your own setup:

  • /etc/letsencrypt/live/mooncarrot.space/: Path to the live directory there that contains the symlinks to the certs your Let's Encrypt client obtained for you
  • /mnt/eldarion-data2/syslog/hosts: The path to the directory we want to store the logs in

Save and close this, and then restart your server like so:

sudo systemctl restart rsyslog.service

Then, check to see if there were any errors:

sudo systemctl status rsyslog.service

Lastly, I recommend assigning a DNS subdomain to the server hosting the logs, such as logs.mooncarrot.space in my case. A single server can have multiple domain names of course, and this just makes it convenient if we every move the rsyslog server elsewhere - as we won't have to go around and edit like a dozen config files (which would be very annoying and tedious).

Configuring a client

Now that we have our rsyslog server setup, it should be relatively straightforward to configure a client box to send logs there. This is a 3 step process:

  1. Configure the existing /var/log to be an in-memory tmpfs to avoid any potential writes to disk
  2. Add a cron script to wipe /var/log every hour to avoid it getting full by accident
  3. Reconfigure (and install, if necessary) rsyslog to send logs to our shiny new server rather than save them to disk

If you haven't already confgiured /var/log to be an in-memory tmpfs, it is relatively simple. If you're unsure whether it is or not, do df -h.

First, open /etc/fstab for editing, and add the following line somewhere:

tmpfs /var/log tmpfs size=50M,noatime,lazytime,nodev,nosuid,noexec,mode=1777

Then, save + close it, and mount /var/log. Again, make sure /var/log is empty before mounting! Weird things happen if you don't.

sudo mount /var/log

Secondly, save the following to /etc/cron.hourly/clear-logs:

#!/usr/bin/env bash
rm -rf /var/log/*

Then, mark it executable:

sudo chmod +x /etc/cron.hourly/clear-logs

Lastly, we can reconfigure rsyslog. The specifics of how you do this varies depending on what you want to achieve, but for a host where I want to send all the logs to the rsyslog server and avoid saving them to the local in-memory tmpfs at all, I have a config file like this:

#################
#### MODULES ####
#################

module(load="imuxsock") # provides support for local system logging
module(load="imklog")   # provides kernel logging support
#module(load="immark")  # provides --MARK-- message capability

###########################
#### GLOBAL DIRECTIVES ####
###########################

$IncludeConfig /etc/rsyslog.d/*.conf

# Where to place spool and state files
$WorkDirectory /var/spool/rsyslog

###############
#### RULES ####
###############
$DefaultNetstreamDriverCAFile   /etc/ssl/isrg-root-x1-cross-signed.pem
$DefaultNetstreamDriver         gtls
$ActionSendStreamDriverMode     1       # Require TLS
$ActionSendStreamDriverAuthMode anon
*.* @@(o)logs.mooncarrot.space:514  # Forward everything to our rsyslog server

#
# Emergencies are sent to everybody logged in.
#
*.emerg             :omusrmsg:*

The rsyslog config file in question this needs to be saved to is located at /etc/rsyslog.conf. In this case, I replace the entire config file with the above, but you can pick and choose (e.g. on some hosts I want to save to the local disk and and to the rsyslog server).

Un the above you'll need to change the logs.mooncarrot.space bit - this should be the (sub)domain that you pointed at your rsyslog server earlier. The number after the colon (514) is the port number. The *.* tells it to send everything to the remote rsyslog server.

Before we're done here, we need to provide the rsyslog client with the CA certificate of the server (because, apparently, it isn't capable of ferreting around in /etc/ssl/certs like everyone else is). Since I'm using Let's Encrypt here, I downloaded their root certificate like this and it seemed to do the job:

sudo curl -sSL https://letsencrypt.org/certs/isrg-root-x1-cross-signed.pem -o /etc/ssl/isrg-root-x1-cross-signed.pem

Of course, one could generate their own CA and do mutual authentication for added security, but that's complicated, lots of effort, and probably unnecessary for my purposes as far as I can tell. I'll leave a link in the sources and further reading on how to do this if you're interested.

If you have a different setup, it's the $DefaultNetstreamDriverCAFile in the above you need to change to point at your actual CA certificate.

With that all configured, we can now restart the rsyslog client:

sudo systemctl restart rsyslog.service

...and, of course, check to see if there were any errors:

sudo systemctl status rsyslog.service

Finally, we also need to configure logrotate to rotate all these new log files. First, install logrotate if the logrotate command doesn't exist:

sudo apt install logrotate

Then, place the following in the file /etc/logrotate.d/centralisedlogging:

/mnt/eldarion-data2/syslog/hosts/*/*.log {
    rotate 12
    weekly
    missingok
    notifempty
    compress
    delaycompress
}

Of course, you'll want to replace /mnt/eldarion-data2/syslog/hosts/ with the directory you're storing the logs from the remote server in, and also customise the log rotation. For example, the 12 there is the number of old log files to keep, and weekly can be swapped for daily or even monthly if you like.

Conclusion

This has been a very quick whistle-stop tour of setting up an rsyslog server to centralise your logs. We've setup our rsyslog server to use a TLS encrypted connection to receive logs, which 1 or more clients can send logs to. We've also configured /var/log on both the server and the client to avoid awkward issues.

Moving forwards, I recommend reading my lnav basics tutorial blog post, which should be rather helpful in analysing the resulting log files.

lnav was not helpful however when I asked it to look at all the log files separately with sudo lnav */*.log, deciding to treat them as "generic logs" rather than "syslog logs", meaning that it didn't colour them properly, and also didn't allow for proper filter. To this end, it may be benefical to store all the logs in 1 file rather than in separate files. I'll keep an eye on this, and update this post if figure out how to convince lnav to treat them properly.

Another slightly snag with my approach here is that for some reason all the logs from elsewhere also end up in the generic /var/log/syslog file (hence how I found a 'workaround' the above issue), resulting in duplicated logs. I have yet to find a solution to this issue, but I'm also not sure whether I want to keep the logs in 1 big file or in many smaller files yet.

These issues aside, I'm pretty satisfied with the results. Together with my existing collectd-based monitoring system (which I'll blog about how I've set that up if there's any interest - collectd is really easy to use), this is another step towards greater transparency into the infrastructure I manage.

In the future, I want to investigate generating notifications alerts for issues in my infrastructure. These could come either from collectd, or from rsyslog, and I envision them going to a variety of places:

  1. Email (a daily digest perhaps?)
  2. XMPP (I've bridged to it from shell scripts before)

Given that my infrastructure is just something I run at home and I don't mind so much if it's down for a few hours, my focus here is not on notifying my as soon as possible, but notifying myself in a way that doesn't disturb me so I can check into it in my own time.

If you found this tutorial / guide useful, please do comment below! It's really cool and motivating to see that the stuff I post on here helps others out.

Sources and further reading

How to pin an apt repository for preferential package installation

As described in my last post, pinning apt repositories is now necessary if you want to install Firefox from an apt repository (e.g. if you want to install Firefox Beta). This is not an especially difficult process, but it is significantly confusing, so I thought I'd write a post about it.

Pinning an apt repository means that even if there's a newer version of a package elsewhere, the 'older' version will still be installed from the apt repository you pin.

Be very careful with this technique. You can easily cause major issues with your system if you pin the wrong repository!

Firstly, you want to head to /etc/apt/sources.list.d/ and find the .list file for the repository you want to pin. Take note of the URL inside that file, and then run this command:

apt-cache policy

No root is necessary here, as it's still a read-only command. Depending on how many apt repositories you have installed in your system, there may be a significant amount of output. Find the lines that correspond to the apt repository you want to preferentially install from in this output. For this example, I'm going to pin the excellent nautilus-typeahead apt repository, so the bit I'm looking for looks like this:

999 http://ppa.launchpad.net/lubomir-brindza/nautilus-typeahead/ubuntu jammy/main amd64 Packages
    release v=22.04,o=LP-PPA-lubomir-brindza-nautilus-typeahead,a=jammy,n=jammy,l=nautilus-typeahead,c=main,b=amd64
    origin ppa.launchpad.net

From here, take a note of the o= bit. In my case, it's o=LP-PPA-lubomir-brindza-nautilus-typeahead. Then, create a new file in /etc/apt/preferences.d with the following content:

Package: *
Pin: release o=LP-PPA-lubomir-brindza-nautilus-typeahead
Pin-Priority: 1001

See that o=.... bit there? Replace it with the one for the repository you want to pin. The number there is the new priority of the repository. The numbers at the beginning of each line in the output of the apt-cache policy command are the priorities of your existing apt repositories, so this should give you an idea as to what number you need to use here - a higher number means a higher priority regardless of the version number of the packages contained therein.

Then, simply sudo apt update and sudo apt dist-upgrade, and apt should pick up the "upgrades" from your newly pinned repository! In some situations you may need to remove and reinstall the offending package if you encounter issues.

Sources and further reading

Ubuntu 22.04 upgrade report

A slice of the official Ubuntu 22.04 Jammy Jellyfish wallpaper (Above: A slice of the official Ubuntu 22.04 Jammy Jellyfish wallpaper)

Hey there! Since Ubuntu 22.04 Jammy Jellyfish has recently been released, I've upgraded multiple machines to it, and I have enough to talk about that I thought it would be a good idea to write them up into a proper blog post for the benefit of others.

For reference, I've upgraded my main laptop on 20th May 2022 (10 days ago as of writing this psot), and I've also upgraded one of my desktops I use at University. I have yet to upgrade starbeamrainbowlabs.com - the server this blog post is hosted on - as I'm waiting for 22.04.1 for that (it would be very awkward indeed if the upgrade failed or there was some other issue I'm not yet aware of).

The official release notes for the Ubuntu 22.04 can be found here: https://discourse.ubuntu.com/t/jammy-jellyfish-release-notes/24668

There's also an official blog post that's ranked much higher in search engines, but it's not really very informative for me as I don't use the GNOME desktop - you're better off reading the real release notes above.

Thankfully, I have not encountered as many issues (so far!) with this update as I have with previous updates. While this update doesn't seem to change all that much aside from a few upgrades here and there, by far the biggest annoyance is shipping Firefox as a snapd by default.

Not only are they shipping it as a snap package, but they have bumped the epoch number, which means that the packages in the official firefox apt repository (beta users like me, use this one instead) are ignored in favour of the new snap package! I mean I get that shipping packages simplifies build systems for large projects like Firefox, but I have a number of issues with snapd:

  • Extra disk space usage: every snap package has it's own version of it's dependencies
  • Permissions: as far as I'm aware (please comment below if this is now fixed), there are permissions issues if you try to load a file from some places on disk when you're running an app installed via snapd, as it runs in a sandbox (this is also true of apps installed with flatpak). This makes using most applications completely impractical
  • Ease of updates: A minor annoyance, but with apps installed via snap I have 2 different package managers to worry about
  • Observability: Another minor concern, but with every package having it's own local dependencies, I'm makes it more difficult to observe and understand what's going on, and fix any potential issues

This aside, apt does allow for pinning apt repositories to work around this issue. I'll be posting a blog post on how this works more generally hopefully soon, but for now, you want to put this in a file at /etc/apt/preferences.d/firefox (after installing one of the above 2 apt repositories if you haven't done so already):

Package: *
Pin: release o=LP-PPA-mozillateam
Pin-Priority: 1001

...then run this sequence of commands:

sudo apt update
sudo apt purge firefox # This will *not* delete your user data - that's stored in your local user profile 
sudo apt install firefox

The above works for both the stable and beta versions. Optionally: sudo apt purge snapd.

I also found this necessary for the wonderful nautilus-typeahead apt repository.

This was the most major issue I encountered. Other than this, I ran into a number of little things that are worth noting before you decide to upgrade. Firstly, for those who dual (or triple or even more!) boot, the version of the grub bootloader shipped with Ubuntu does not detect other bootable partitions!

Warning: os-prober will not be executed to detect other bootable partitions.
Systems on them will not be added to the GRUB boot configuration.
Check GRUB_DISABLE_OS_PROBER documentation entry.

....so if you do run more than a single OS on your system, make sure you correct this after upgrading.

Another thing is that, as usual, Ubuntu disables all third party apt repositories on upgrade. I strongly recommend paying very close attention to the list of packages that do-release-upgrade decides it needs to remove, as if you install e.g. Inkscape or Krita via an apt repository to get the latest versions thereof, you'll need to reinstall them after re-enabling your apt repositories. Personally, I say "no" to the reboot at the end of the upgrade process and fix my apt repositories before then running:

sudo apt update
sudo apt dist-upgrade
sudo apt autoremove
sudo apt autoclean
# See also https://gitlab.com/sbrl/bin/-/blob/master/update-system

...and only then rebooting.

While GitHub's Atom seems to be more and more inactive these days as people move over to Visual Studio Code, I still find myself using it regularly as my primary code editor. Unfortunately, I encountered this bug, so I needed to edit /usr/share/applications/atom.desktop to add --no-sandbox to the execution line when starting Atom. The Exec= line in that file now reads:

Exec=env ATOM_DISABLE_SHELLING_OUT_FOR_ENVIRONMENT=false /usr/bin/atom --no-sandbox %F

This issue only occurred on 1 of the 2 systems I've upgraded though, so I'm not sure of the root cause. Other random issues I encountered:

  • GDM has a truly awful shade of grey in the background now. This repository gives a way to fix this problem. Try to avoid an image that's too light in colour, as the white text of the lock screen becomes rather difficult to see.
  • Speaking of backgrounds, the upgrade reset my desktop background on both machines I upgraded. Make sure you have a copy of it stored away somewhere, as you'll need it. lightdm (the login screen I use on my main laptop in place of gdm) seems to be fine though.
  • tumbler - a d-bus thumbnailing service - was also automatically removed. This does not appear to have caused me any problems so far (though image previews now make transparent pixels appear white, which is really annoying and I haven't yet looked into a fix on that one), so I need to look into this one further.
  • If you're a regular user of Memtest86+, it may disappear from your grub bootloader menu if you use EFI boot now for some strange reason.
  • The colour scheme in the address bar of Nautilus (the file manager) seems a bit messed up for me, but this may have more to do with the desktop theme I'm using.

If I encounter any other issues while upgrading my servers in the future, I'll make another post here about it if it's a significant issue, or comment on/edit this post if it's a minor thing.

If you encounter any other issues upgrading that aren't mentioned here, please do leave a comment below with the issue you encountered and the solution / workaround you implemented to fix it.

Art by Mythdael