Starbeamrainbowlabs

Stardust
Blog


Archive


Mailing List Articles Atom Feed Comments Atom Feed Twitter Reddit Facebook

Tag Cloud

3d 3d printing account algorithms android announcement architecture archives arduino artificial intelligence artix assembly async audio automation backups bash batch blog bookmarklet booting bug hunting c sharp c++ challenge chrome os cluster code codepen coding conundrums coding conundrums evolved command line compilers compiling compression containerisation css dailyprogrammer data analysis debugging demystification distributed computing documentation downtime electronics email embedded systems encryption es6 features ethics event experiment external first impressions future game github github gist gitlab graphics hardware hardware meetup holiday holidays html html5 html5 canvas infrastructure interfaces internet interoperability io.js jabber jam javascript js bin labs learning library linux lora low level lua maintenance manjaro network networking nibriboard node.js operating systems own your code pepperminty wiki performance phd photos php pixelbot portable privacy problem solving programming problems projects prolog protocol protocols pseudo 3d python reddit redis reference releases rendering resource review rust searching secrets security series list server software sorting source code control statistics storage svg talks technical terminal textures thoughts three thing game three.js tool tutorial twitter ubuntu university update updates upgrade version control virtual reality virtualisation visual web website windows windows 10 xmpp xslt

NAS, Part 3: Decisions | Choosing a Filesystem

It's another entry in my NAS series! It's still 2020 for me as I type this, but I hope that 2021 is going well. Before we continue, I recommend checking out the previous posts in this series:

Part 1 in particular is useful for context as to the hardware I'm using. Part 2 is a review of my experience assembling the system. In this part, we're going to look at my choice of filesystem and OS.

I left off in the last post after I'd booted into the installer for Ubuntu Server 20.04. After running through that installer, I performed my collection of initial setup tasks for any server I manage:

  • Setup an SSH server
  • Enable UFW
  • Setup my personal ~/bin folder
  • Assign a static IP address (why won't you let me choose an IP, Netgear RAX120? Your UI lets me enter a custom IP, but it devices don't ultimately end up with the IP I tell you to assign to them....)
  • Setup Collectd
  • A number of other tasks I forget

With my basic setup completed, I also setup a few things specific to devices that have SMART-enabled storage devices:

  • Setup an email relay (via autossh) for mail delivery
  • Installed smartd (which sends you emails when there's something wrong with 1 your disks)
  • Installed and configured hddtemp, and integrated it with collectd (a topic for another post, I did this for the first time)

With these out of the way and after making a mental note to sort out backups, I could now play with filesystems with a view to making a decision. The 2 contenders:

  • (Open)ZFS
  • Btrfs

Both of these filesystems are designed to be spread across multiple disks in what's known as a pool thereof. The idea behind them is to enable multiple disks to be presented to the user as a single big directory, with the complexities as to which disk(s) a file is/are stored on. They also come with extra nice features, such as checksumming (which allows them to detect corruption), snapshotting (taking snapshots of what the filesystem looks like at a given point in time), automatic data deduplication, compression, snapshot send / receiving, and more!

Overview: ZFS

ZFS is a filesystem originally developed by Sun Microsystems in 2001. Since then, it has been continually developed and improved. After Oracle bought Sun Microsystems in 2010, the source code for ZFS was closed - hence the OpenZFS fork was born. It's licenced under the CDDL, which isn't compatible with the GPLv2 used by the Linux Kernel. This causes some minor installation issues.

As a filesystem, it seems to be widely accepted to be rock solid and mature. It's used across the globe by home users and businesses both large and small to store huge volumes of data. Given its long history, it has proven its capability to store data safely.

It does however have some limitations. For one, it only has limited support for adding drives to a zpool (a pool of disks in the ZFS world), which is a problem for me - as I'd prefer to have the ability to add drives 1 at a time. It also has limited support for changing key options such as the compression algorithm later, as this will only affect new files - and the only way to recompress old files is to copy them in and out of the disk again.

Overview: Btrfs

Btrfs, or B-Tree File System is a newer filesystem that development upon which began in 2007, and was accepted into the Linux Kernel in 2009 with the release of version 1.0. It's licenced under the GPLv2, the same licence as the Linux Kernel. As of 2020, many different distributions of Linux ship with btrfs installed by default - even if it isn't the default filesystem (that's ext4 in most cases).

Unlike ZFS, Btrfs isn't as well-tested in production settings. In particular, it's raid5 and raid6 modes of operation are not well tested (though this isn't a problem, since raid1 operates at file/block level and not disk level as it does with ZFS, which enables us to use interesting setups like raid1 striped across 3 disks). Despite this, it does look to be stable enough - particularly as openSUSE has set it to be the default filesystem.

It has a number of tempting features over ZFS too. For example, it supports adding drives 1 at a time, and you can even convert your entire pool from 1 raid level to another dynamically while it's still mounted! The same goes for converting between compression algorithms - it's all done using a generic filter system.

Such a system is useful when adding new disks to the pool too, as they it can be used to rebalance data across all the disks present - allowing for new disks to be accounted for and faulty disks to be removed, preserving the integrity of the data while a replacement disk is ordered for example.

While btrfs does have a bold list of features that they'd like to implement, they haven't gotten around to all of them yet (the status of existing features can be found here). For example, while ZFS can use an SSD as a dedicated caching device, btrfs doesn't yet have this ability - and nobody appears to have claimed the task on the wiki.

Performance

Inspired by a recent Ars Technica article, I'd like to test the performance of the 2 filesystems at hand. I ran the following tests for reading and writing separately:

  • 4k-random: Single 4KiB random read/write process
  • 64k-random-16p: 16 parallel 64KiB random read/write processes
  • 1m-random: Single 1MiB random write process

I did this for both ZFS in raid5 mode, and Btrfs in raid5 (though if I go with btrfs I'll be using raid1, as I later discovered - which I theorise would yield a minor performance improvement). I tested ZFS twice: once with gzip compression, and again with zstd compression. As far as I can tell, Btrfs doesn't have compression enabled by default. Other than the compression mode, no other tuning was done - all the settings were left at their defaults. Both filesystems were completely empty aside from the test files, which were created automatically in a chowned subdirectory by fio.

Graphs showing the results of the above tests. See the discussion below.

The graph uses a logarithmic scale. My initial impressions are that ZFS benefits from parallelisation to a much greater extent than btrfs - though I suspect that I may be CPU bound here, which is an unexpected finding. I may also be RAM-bound too, as I observed a significant increase in RAM usage when both filesystems were under load. Buying another 8GB would probably go a long way to alleviating that issue.

Other than that, zstd appears to provide a measurable performance improvement over gzip compression. Btrfs also appears to benefit from writing larger blocks over smaller ones.

Overall, some upgrades to my NAS are on the cards should I be unsatisfied with the performance in future:

  • More RAM would assist in heavy i/o loads
  • A better CPU would probably raise the peak throughput speeds - if I can figure out what to do with the old one

But for now, I'm perfectly content with these speeds. Especially since I have a single gigabit ethernet port on my storage NAS, I'm not going to need anything above 1000Mbps - which is 119.2 MiB/s if you'd like to compare against the graph above.

Conclusion

As for my final choice of filesystem, I think I'm going to go with btrfs. While I'm aware that it isn't as 'proven' as ZFS - and slightly less performant too - I have a number of reasons for this decision:

  1. Btrfs allows you to add disks 1 at a time, and ZFS makes this difficult
  2. Btrfs has the ability to convert to a different raid level at a later date if I change my mind
  3. Btrfs is easier to install, since it's already built-in to Ubuntu Server 20.04.

NAS, Part 2: Assembly and Installation

Welcome back! This is part 2 of a series of posts about my new NAS (network attached storage) device I'm building. If you haven't read it yet, I recommend you go back and read part 1, in which I talk about the hardware I'm using.

Since the Fractal Design Node 804 case came first, I was able to install the parts into it as they arrived. First up was the motherboard (an ASUS PRIME B450M-A) and CPU (an AMD Athlon 3000G).

The motherboard was a pain. As I read, the middle panel of the case has some flex in it, so you've got to hold it in place with one hand we you're screwing the motherboard in. This in and of itself wasn't an issue at all, but the screws for the motherboard were really stiff. I think this was just the motherboard, but it was annoying.

Thankfully I managed it though, and then set to work installing the CPU. This went well - the CPU came with thermal paste on top already, so I didn't need to buy my own. The installation process for the stock CPU heatsink + fan was unfamiliar, which took me a moment to decipher how the mechanism worked.

Following this, I connected the front ports from the case up to the motherboard (consulting my motherboard's documentation showed me where I needed to plug these in - I remember this being something I struggled with when I last built an (old) PC when doing some IT technician work experience some years ago). The RAM - while a little stiff (to be expected) - went in fine too. I might buy another stick later if I run into memory pressure, but I thought a single 8GB stick would be a good place to start.

The case came with a dedicated fan controller board that has a high / medium / low switch on the back too, so I wired up the 3 included Noctua case fans to this instead of the slots on the motherboard. The CPU fan (nothing special yet - just the stock fan that came with the CPU) went into the motherboard though, as the fan controller didn't have room - and I thought that the motherboard would be better placed to control the speed of that one.

The inside of the 2 sides of the case.

(Above: The inside of the 2 sides of the case. Left: The 'hot' side, Right: The 'cold' side.)

The case is split into 2 sides: 1 for 'hot' components (e.g. the motherboard and CPU), and another for 'cold' components (e.g. the HDDs and PSU). Next up were the hard disks - so I mounted the SSD for the operating system to the base of the case in the 'hot' side, as the carriage in the cold side fits only 3.5 inch disks, and my SSD is a 2.5 inch disk. While this made the cabling slightly awkward, it all worked out in the end.

For the 3.5 inch HDDs (for data storage), I found I was unable to mount them with the included pieces of bracket metal that allow you to put screws into the bottom set of holes - as the screws wouldn't fit through the top holes. I just left the metal bracket pieces out and mounted the HDDs directly into the carriage, and it seems to have worked well so far.

The PSU was uneventful too. It fit nicely into the space provided, and the semi-modular nature of the cables provided helped tremendously to avoid a mess of cables all over the place as I could remove the cables I didn't need.

Finally, the DVD writer had some stiff screws, but it seemed to mount well enough (just a note: I've been having an issue I need to investigate with this DVD drive whereby I can't take a copy of a disk - e.g. the documentation CD that came with my motherboard - with dd, as it reports an IO error. I need to investigate this further, so more on that in a later post).

The installation of the DVD drive completed the assembly process. To start it up for the first time, I connected my new NAS to my television temporarily so that I could see the screen. The machine booted fine, and I dove straight into the BIOS.

The BIOS that comes with the ASUS motherboard I bought

(Above: The BIOS that comes with the ASUS motherboard, before the clock was set by Ubuntu Server 20.04 - which I had yet to install)

Unlike my new laptop, the BIOS that comes with the ASUS motherboard is positively delightful. It has all the features you'd need, laid out in a friendly interface. I observed some minor input lag, but considering this is a BIOS we're talking about here I can definitely overlook that. It even has an online update feature, where you can plug in an Ethernet cable and download + install BIOS updates from the Internet.

I tweaked a few settings here, and then rebooted into my flash drive - onto which I loaded an Ubuntu Server 20.04 ISO. It booted into this without complaint (unlike a certain laptop I'm rather unhappy with at the moment), and then I selected the appropriate ISO and got to work installing the operating system (want your own multiboot flash drive? I've blogged about that already! :D).

In the next post, I'm going to talk about the filesystem I ultimately chose. I'm also going to show and discuss some performance tests I ran using fio following this Ars Technica guide.

Cluster, Part 10: Dockerisification | Writing Dockerfiles

Hey there - welcome to 2021! I'm back with another cluster post. In double digits too! I think this is the longest series yet on my blog. Before we start, here's a list of all the posts in the series so far:

We've got a pretty cool setup going so far! With Nomad for task scheduling (part 7), Consul to keep track of what's running where (part 6), and wesher keeping communications secured (part 4, although defence in depth says that we'll be returning later to shore up some stuff here) we have a solid starting point from which to work from. And it's only taken 9 blog posts to get to this point :P

In this post, we'll be putting all our hard work to use by looking at the basics of writing Dockerfiles. It's taken me quite a while to get my head around them, so I want to take a moment here to document some of the things I've learnt. A few other things that I want to talk about soon are Hashicorp Vault (it's still giving me major headaches trying to understand the Nomad integration though, so this may be a while), obtaining TLS certificates, and tying in with the own your code series by showing off the Docker image management script setup I have that I've worked into my Laminar CI instance, which makes it easy to rebuild images and all their dependants.

Anyway, Dockerfiles. First question: what? Dockerfiles are essentially a file containing a domain-specific language that defines how a Docker image can be built. They are usually named Dockerfile. Here I use the term image and not container:

  • Image: A Docker image that contains a bunch of files and directories that can be run
  • Container: A copy of an image that is currently running on a host system.

In short: A container is a running image, and a Docker image is the bit that a container spins up from.

Second question: why? The answer is a few different reasons. Although it adds another layer of indirection and complication, it also allows us to square applications away such that we don't care about what host they run on (too much).

A great example here is would be a static file web server. In our case, this is particularly useful because Fabio - as far as I know - isn't actually capable of serving files from disk. Personally I have a fork of a rather nice dashboard I'd like to have running for my cluster too, so I found that it fits perfectly to test the waters.

Next question: how? Well, let's break the process down:

  1. Install Node.js
  2. Install the serve npm package

Thankfully, I've recently packaged Node.js in my apt repository (finally! It's only taken me multiple years.....). Since we might want to build lots of different Node.js based container images, it makes sense to make Node.js its own separate container. I'm also using my apt repository in other container images too which don't necessarily need Node.js, so I've opted to put my apt repository into my base image (If I haven't mentioned it already, I'm using minideb as my base image - which I build with a patch to make it support Raspbian - which is now called Raspberry Pi OS. It's confusing).

To better explain the plan, let's use a diagram:

(Above: A diagram I created. Link to editing file - don't forget this blog is licenced under CC-BY-SA.)

Docker images are always based on another Docker image. Our node-serve Docker image we intend to create will be based on a minideb-node Docker image (which we'll also be creating), which itself will be based on the minideb base image. Base images are special, as they don't have a parent image. They are usually imported via a .tar.gz image for example, but that's a story for another time (also for another time are image based on scratch, a special image that's completely empty).

We'll then push the final node-serve Docker image to a Docker registry. I'm running my own private Docker registry, but you can use the Docker Hub or setup your own private Docker registry.

With this in mind, let's start with a Docker image for Node.js:

ARG REPO_LOCATION

FROM ${REPO_LOCATION}minideb

RUN install_packages libatomic1 nodejs-sbrl

Let's talk about each of the above commands in turn:

  1. ARG REPO_LOCATION: This brings in an argument which is specified at build time. Here we want to allow the user to specify the location of a private Docker registry to pull the base (or parent) image from to begin the build process with.
  2. FROM ${REPO_LOCATION}minideb: This specifies the base (or parent) image to start the build with.
  3. RUN install_packages libatomic1 nodejs-sbrl: The RUN command runs the specified command inside the Docker container, saving a new layer in the process (more on those later). In this case, we call the install_packages command, which is a helper script provided by minideb to make package installation easier.

Pretty simple! This assumes that the minideb base image you're using has my apt repository setup, which make not be the case. To this end, we'd like to automatically set that up. To do this, we'll need to use an intermediate image. This took me some time too get my head around, so if you're unsure about anything, please comment below.

Let's expand on our earlier attempt at a Dockerfile:

ARG REPO_LOCATION

FROM ${REPO_LOCATION}minideb AS builder

RUN install_packages curl ca-certificates

RUN curl -o /srv/sbrl.asc https://apt.starbeamrainbowlabs.com/aptosaurus.asc

FROM ${REPO_LOCATION}minideb

COPY --from=builder /srv/sbrl.asc /etc/apt/trusted.gpg.d/sbrl-aptosaurus.asc

RUN echo "deb https://apt.starbeamrainbowlabs.com/ /" > /etc/apt/sources.list.d/sbrl.list && \
    install_packages libatomic1 nodejs-sbrl;

This one is more complicated, so let's break it down. Here, we have an intermediate Docker image (which we name builder via the AS builder bit at the end of the 1st FROM) in which we download and install curl (the 1st RUN command there), followed by a second image in which we copy the file we downloaded from the first Docker image and place it in a specific place in the second (the COPY directive).

Docker always reads Dockerfiles from top to bottom and executes them in sequence, so it will assume that the last image created is the final one - i.e. from the last FROM directive. Every FROM directive starts afresh from a brand-new copy of the specified parent image.

We've also expanded the RUN directive at the end of the file there to echo the apt sources list file out for my apt repository. We've done it like this in a single RUN command and not 2, because every time you add another directive to a Dockerfile (except ARG and FROM), it creates a new layer in the resulting Docker image. Minimising the number of layers in a Docker image is important for performance, hence the obscurity here in chaining commands together. To build our new Dockerfile, save it to a new empty directory. Then, execute this:

cd path/to/directory/containing_the_dockerfile;
docker build  --pull --tag "minideb-node" .

If you're using a private registry, add --build-arg "REPO_LOCATION=registry.example.com:5000/" just before the . there at the end of the command and prefix the tag with registry.example.com:5000/. If you're developing a new Docker image and having trouble with the cache (Docker caches the result of directives when building images), add --no-cache.

Then, push it to the Docker registry like so:

execute docker push "minideb-node"

Again, prefix minideb-node there with registry.example.com:5000/ should you be using a private Docker registry.

Now, you should be able to start an interactive session inside your new Docker container:

docker run -it --rm minideb-node

As before, prefix minideb-node there with registry.example.com/ if you're using a private Docker registry.

Now that we've got our Docker image for Node.js, we can write another Dockerfile for serve, our static file HTTP server. Let's take a look:

ARG REPO_LOCATION

FROM ${REPO_LOCATION}minideb-node

RUN npm install --global serve && rm -rf "$(npm get cache)";

VOLUME [ "/srv" ]

USER 80:80

ENV NODE_ENV production
WORKDIR /srv
ENTRYPOINT [ "serve", "-l", "5000" ]

This looks similar to the previous Dockerfile, but with a few extra bits added on. Firstly, we use a RUN directive to install the serve npm package and delete the NPM cache in a single command (since we don't want the npm cache sticking around in the final Docker image).

We then use a VOLUME declaration to tell Docker that we expect the /srv to have a volume mounted to it. A volume here is a directory from the host system that will be mounted into the Docker container before it starts running. In this case, it's the web root that we'll be serving files from.

A USER directive tells Docker what user and group IDs we want to run all subsequent commands as. This is important, as it's a bad idea to run Docker containers as root.

The ENV directive there is just to tell Node.js it should run in production mode. Some Node.js applications have some optimisations they enable when this environment variable is set.

The WORKDIR directive defines the current working directory for future commands. It functions like the cd command in your terminal or command line. In this case, the serve npm package always serves from the current working directory - hence we set the working directory here.

Finally, the ENTRYPOINT directive tells Docker what command to execute by default. The ENTRYPOINT can get quite involved and complex, but we're keeping it simple here and telling it to execute the serve command (provided by the serve npm package, which we installed globally earlier in the Dockerfile). We also specify the port number we want serve to listen on with -l 5000 there.

That completes the Dockerfile for the serve npm package. Build it as before, and then you should be able to run it like so:

docker run -it --rm -v /absolute/path/to/local_dir:/srv node-serve

As before, prefix node-serve with the address of your private Docker registry if you're using one. The -v bit above defines the Docker volume that mounts the webroot directory inside the Docker container.

Then, you should be able to find the IP address of the Docker container and enter it into your web browser to connect to the running server!

The URL should be something like this: http://IP_ADDRESS_HERE:5000/.

If you're not running Docker on the same machine as your web browser is running on, then you'll need to do some fancy footwork to get it to display. It's at this point that I write a Nomad job file, and wire it up to Fabio my load balancer.

In the next post, we'll talk more about Fabio. We'll also look at the networking and architecture that glues the whole system together. Finally, we'll look at setting up HTTPS with Let's Encrypt and the DNS-01 challenge (which I found relatively simple - but only once I'd managed to install a new enough version of certbot - which was a huge pain!).

Digitising old audio CDs on a Linux Server

A number of people I know own a number of audio / music CDs. This is great, but unfortunately increasingly laptops aren't coming with an optical drive any more, which makes listening to said CDs challenging. To this end, making a digital copy to add to their personal digital music collections would be an ideal solution.

Recently, I build a new storage NAS (which I'm still in the process of deciding on a filesystem for, but I think I might be going with btrfs + raid1), and the Fractal Design Node 804 case I used has a dedicated space for a slimline DVD writer (e.g. like the one you might find in a car). I've found this to be rather convenient for making digital copies of old audio CDs, and wanted to share the process by which I do it in case you'd like to do it too.

To start, I'm using Ubuntu Server 20.04. This may work on other distributions too, but there are a whole bunch of packages you'll need to install - the names and commands for which you may need to convert for your distribution.

To make the digital copies, we'll be using abcde. I can't find an updated website for it, but it stands for "A Better CD Encoder". It neatly automates much of the manual labor of digitising CDs - including the downloading of metadata from the Internet. To tidy things up after abcde has run to completion, we'll be using ffmpeg for conversion and eyeD3 for mp3 metadata manipulation.

To get started, let's install some stuff!

sudo apt install --no-install-recommends abcde
sudo apt install ffmpeg mkcue eyed3 flac glyrc cdparanoia imagemagick

Lots of dependencies here. Many of them are required by abcde for various features we'll be making use of.

Next, insert the audio CD into the DVD drive. abcde assumes your DVD drive is located at /dev/sr0 I think, so if it's different you'll have to adjust the flags you pass to it.

Once done, we can call abcde and get it to make a digital copy of our CD. I recommend here that you cd to a new blank directory, as abcde creates 1 subdirectory of the current working directory for each album it copies. When you're ready, start abcde:

abcde -o flac -B -b

Here, we call abcde and ask it to save the digital copy as flac files. The reason we do this and not mp3 directly is that I've observed abcde gets rather confused with the metadata that way. By saving to flac files first, we can ensure the metadata is saved correctly.

The arguments above do the following:

  • -o flac: Save to flac files
  • -B: Automatically embed the album art into the saved music files if possible
  • -b: Preserve the relative volume differences between tracks in the album (if replaygain is enabled, which by default I don't think it is)

It will ask you a number of questions interactively. Once you've answered them, it will get to work copying the audio from the CD.

When it's done, everything should be good to go! However flac files can be large, so something more manageable is usually desired. For this, we can mass-convert our flac files to MP3. This can be done like so:

find -iname '*.flac' -type f -print0 | nice -n20 xargs -P "$(nproc)" --null --verbose -n1 -I{} sh -c 'old="{}"; new="${old%.8}.mp3"; ffmpeg -i "${old}" -ab 320k -map_metadata 0 -id3v2_version 3 "${new}";';

There's a lot to unpack here! Before I do though, let's turn it into a bash function real quick which we can put in ~/.bash_aliases for example to make it easy to invoke in the future:

# Usage:
#   flac2mp3
#   flac2mp3 path/to/directory
flac2mp3() {
    dir="${1}";
    if [[ -z "${dir}" ]]; then dir="."; fi
    find "${dir}" -iname '*.flac' -type f -print0 | nice -n20 xargs -P "$(nproc)" --null --verbose -n1 -I{} sh -c 'old="{}"; new="${old%.*}.mp3"; ffmpeg -i "${old}" -ab 320k -map_metadata 0 -id3v2_version 3 "${new}";';
}

Ah, that's better. Now, let's deconstruct it and figure out how it works. First, we have a dir variable which, by default, is set to the current working directory.

Next, we use the one-liner from before to mass-convert all flac files in the target directory recursively to mp3. It's perhaps easier to digest if we separate it out int multiple lines:

find "${dir}" -iname '*.flac' -type f -print0   # Recursively find all flac files, delimiting them with NULL (\0) characters
    | nice -n20 # Push the task into the background
        xargs # for each line of input, execute a command
            --null # Lines are delimited by NULL (\0) characters
            --verbose # Print the command that is about to be executed
            -P "$(nproc)" # Parallelise across as many cores as the machine has
            -n1 # Only pass 1 line to the command to be executed
            -I{} # Replace {} with the filename in question
            sh -c ' # Run this command
                old="{}"; # The flac filename
                new="${old%.*}.mp3"; # Replace the .flac file extension with .mp3
                ffmpeg # Call ffmpeg to convert it to mp3
                    -i "${old}" # Input the flac file
                    -ab 320k # Encode to 320kbps, the max supported by ffmpeg
                    -map_metadata 0 # Copy all the metadata
                    -id3v2_version 3 # Set the metadata tags version (may not be necessary)
                    -c:v copy -disposition:v:0 attached_pic # Copy the album art if it exists
                    "${new}"; # Output to mp3
            '; # End of command to be executed

Obviously it won't actually work when exploded and commented like this, but hopefully it gives a sense of how it functions.

I recommend checking that the album art has been transferred over. The -c:v copy -disposition:v:0 attached_pic bit in particular is required to ensure this happens (see this Unix Stack Exchange answer to a question I asked).

Sometimes abcde is unable to locate album too, so you may need to find and download it yourself. If so, then this one-liner may come in handy:

find , -type f -iname '*.mp3' -print0 | xargs -0 -P "$(nproc)" eyeD3 --add-image "path/to/album_art.jpeg:FRONT_COVER:";

Replace path/to/album_art.jpeg with the path to the album art. Wrapping it in a bash function ready for ~/.bash_aliases makes it easier to use:

mp3cover() {
    cover="${1}";
    dir="${2}";

    if [[ -z "${cover}" ]] || [[ -z "${dir}" ]]; then
        echo "Usage:" >&2;
        echo "    mp3cover path/to/cover_image.jpg path/to/album_dir";
        return 0;
    fi

    find "${dir}" -type f -iname '*.mp3' -print0 | xargs -0 -P "$(nproc)" eyeD3 --add-image "${cover}:FRONT_COVER:"
}

Use it like this:

mp3cover path/to/cover_image.jpg path/to/album_dir

By this point, you should have successfully managed to make a digital copy of an audio CD. If you're experiencing issues, comment below and I'll try to help out.

Note that if you experience any issues with copy protection (I think this is only DVDs / films and not audio CDs, which I don't intend to investigate), I can't and won't help you, because it's there for a reason (even if I don't like it) and it's illegal to remove it - so please don't comment in this specific case.

NAS, Part 1: We need a bigger rocket

In my cluster series of posts, I've been talking about how I've built a Raspberry Pi-based cluster for running compute tasks (latest update: I've got Let's Encrypt working with the DNS-01 challenge, stay tuned for a post on that soon). Currently, this has been backed by a Raspberry Pi 3 with a 1TB WD PiDrive attached. This has a number of issues:

  • The Raspberry Pi 3 has a 100mbps network port
  • It's not redundant
  • I'm running out of storage space

I see 2 ways of solving these issues:

  1. Building a clustered file system, with 1 3.5 inch drive per Pi (or Odroid HC2 perhaps)
  2. Building a more traditional monolithic NAS

Personally, my preference here is option #2, but unfortunately due to some architectural issues in my house (read: the wiring needs redoing by an electrician) I don't actually have access to the number of wall sockets I'd need to put together a clustered setup. If I get those issues sorted, I'll certainly take a look at upgrading - but for now I've decided that I'm going to put together a more traditional monolithic NAS (maybe it can become the backup device in future, who knows) as it will only require a single wall socket (the situation is complicated. Let's just move on).

To this end, I decided to start with a case and go from there. Noise is a big concern for me, so I chose the Fractal Design Node 804, as it has a number of key features:

  • It has lots of space for disks
  • It comes with some quiet fans
  • The manufacturer appears to be quite popular and reputable

From here, I picked the basic components for the system using PC Part Picker. I haven't actually built an amd64 system from scratch before - I use laptops as my main device (see my recent review of the PC Specialist Proteus VIII), and Raspberry Pis (and an awesome little 2nd hand Netgear GS116v2 switch) currently form the backbone of my server setup.

These components included:

  • An ASUS PRIME B450M-A motherboard: 6 x SATA ports, AM4 CPU socket
  • An AMD Athlon 3000G: I don't need much compute horsepower in this build, since it's for storage (I would have got an Athlon 200GE instead as it's cheaper, but they were all out of stock)
  • 8GB Corsair Vengeance LPX DDR4 2666MHz RAM: The highest frequency the CPU supports - I got a single stick here to start with. I'll add additional sticks as and when I need them.
  • 120GB Gigabyte SSD: For the OS. Don't need a lot of storage here, since all the data is going to be on 3.5 inch HDDs instead
  • 3 x 4TB WD Red Plus WD40EFRX (CMR): These are my main data storage drives. I'm starting with 3 4TB drives, and I'll add more as I need them. The Node 804 case (mentioned above) supports up to 10 disks, apparently - so I should have plenty of space.
  • SeaSonic CORE GM 500 W 80+ Gold PSU: The most efficient PSU I could afford. I would have loved an 80+ titanium (apparently they are at least 94% efficient at 50% load), but at £250+ it's too much for my budget.
  • LG GS40N DVD writer: Apparently the Node 804 case as a slimline DVD drive slot (i.e. like one you might find in a car). It wasn't too expensive and being able to ingest CD/DVDs is appealing.

For the storage there, in particular my (initial) plan is to use OpenZFS in RAIDZ mode, which has a minimum requirement of 3 drives. Using an online calculator suggests that with the above drives I'll have 8TB of usable capacity. Initial research does suggest though that expanding a ZFS storage pool may not be as easy as I thought it was (related, see also), so more research is definitely needed before I commit to a single filesystem / set of settings there.

I've heard of BTRFS too, but I've also heard of some stability and data loss issues too. That was several years ago though, so I'll be reviewing its suitability again before making a decision here.

In future posts, I'm going to talk about my experience assembling the build. I'm also going to look at how I eventually setup the filesystem (as of yet which filesystem I'll choose is still undecided). I'll also be running some tests on the setup to evaluate how well it performs and handles failure. Finally, I may make a bonus post in this series about the challenges I encounter migrating my existing (somewhat complicated) data storage setup to the new NAS I build.

Found this interesting? Got a suggestion? Comment below!

Resizing Encrypted LVM Partitions on Linux

I found recently that I needed to resize some partitions on my new laptop as the Ubuntu installer helpfully decided to create only a 1GB swap partition, which is nowhere near enough for hibernation (you need a swap partition that's at least as big as your computer's RAM in order to hibernate). Unfortunately resizing my swap partition didn't allow me to hibernate successfully in the end, but I thought I'd still document the process here for future reference should I need to do it again in the future.

The key problem with resizing one's root partition is that you can't resize it without unmounting it, and you can't unmount it without turning off your computer. To get around this, we need to use a live distribution of Ubuntu. It doesn't actually matter how you boot into this - personally my preferred method is by using a multiboot USB flash drive, but you could just as well flash the latest ubuntu ISO to a flash drive directly.

Before you start though, it's worth mentioning that you really should have a solid backup strategy. While everything will probably be fine, there is a chance that you'll make a mistake and wind up loosing a lot of data. My favourite website that illustrates this is The Tao of Backup. Everyone who uses a computer (technically minded or not) should read it. Another way to remember it is the 3-2-1 rule: 3 backups, in 2 locations, with 1 off-site (i.e. in a different physical location).

Anyway, once you've booted into a live Ubuntu environment, open the terminal, and start a root shell. Your live distribution should come with LUKS and LVM already, but just in case it doesn't execute the following:

sudo apt update && sudo apt install -y lvm2 cryptsetup

I've talked about LVM recently when I was setting up an LVM-managed partition on an extra data hard drive for my research data. If you've read that post, then the process here may feel a little familiar to you. In this case, we're interacting with a pre-existing LVM setup that's encrypted with LUKS instead of setting up a new one. The overall process look a bit like this:

A flowchart showing the process we're going to follow. In short: open luks → LVM up → make changes → LVM down → close luks → reboot

With this in mind, let's get started. The first order of business is unlocking the LUKS encryption on the drive. This is done like so:

sudo modprobe dm-crypt
sudo cryptsetup luksOpen /dev/nvme0n1p3 crypt1

The first command there ensures that the LUKS kernel module is loaded if it isn't already, and the second unlocks the LUKS-encrypted drive. Replace /dev/nvme0n1p3 with the path to your LVM partition - e.g. /dev/sda1 for instance. The second command will prompt you for the password to unlock the drive.

It's worth mentioning here before continuing the difference between physical partitions and LVM partitions. Physical partitions are those found in the partition table on the physical disk itself, that you may find in a partition manage like GParted.

LVM partitions - for the purpose of this blog post - are those exposed by LVM. They are virtual partitions that don't have a physical counterpart on disk and are handled internally by LVM. As far as I know, you can't ask LVM easily where it stores them on disk - this is calculated and managed automatically for you.

In order to access our logical LVM partitions, the next step is to bring up LVM. To do this, we need to get LVM to re-scan the available physical partitions since we've just unlocked the one we want it to use:

sudo vgscan --mknodes

Then, we activate it:

sudo vgchange -ay

At this point, we can now do our maintenance and make any changes we need to. A good command to remember here is lvdisplay, which lists all the available LVM partitions and their paths:

sudo lvdisplay

In my case, I have /dev/vgubuntu/root and /dev/vgubuntu/swap_1. tldr-pages (for which I'm a maintainer) has a number of great LVM-related pages that were contributed relatively recently which are really helpful here. For example, to resize a logical LVM partition to be a specific size, do something like this:

sudo lvresize -L 32G /dev/vgubuntu/root

To extend a partition to fill all the remaining available free space, do something like this:

sudo lvextend -l +100%FREE /dev/vgubuntu/root

After resizing a partition, don't forget to run resize2fs. It ensures that the ext4 filesystem on top matches the same size as the logical LVM partition:

sudo resize2fs /dev/vgubuntu/root

In all of the above, replace /dev/vgubuntu/root with the path to your logical LVM partition in question of course.

Once you're done making changes, we need to stop LVM and close the LUKS encrypted disk to ensure all the changes are saved properly and to avoid any issues. This is done like so:

sudo vgchange -an
sudo cryptsetup luksClose crypt1

With that, you're done! You can now reboot / shutdown from inside the live Ubuntu environment and boot back into your main operating system. All done!

Found this helpful? Encountering issues? Comment below! It really helps my motivation.

Running multiple local versions of CUDA on Ubuntu without sudo privileges

I've been playing around with Tensorflow.js for my PhD (see my PhD Update blog post series), and I had some ideas that I wanted to test out on my own that aren't really related to my PhD. In particular, I've found this blog post to be rather inspiring - where the author sets up a character-based recurrent neural network to generate text.

The idea of transcoding all those characters to numerical values and back seems like too much work and too complicated just for a quick personal project though, so my plan is to try and develop a byte-based network instead, in the hopes that I can not only teach it to generate text as in the blog post, but valid Unicode as well.

Obviously, I can't really use the University's resources ethically for this (as it's got nothing to do with my University work) - so since I got a new laptop recently with an Nvidia GeForce RTX 2060, I thought I'd try and use it for some machine learning instead.

The problem here is that Tensorflow.js requires only CUDA 10.0, but since I'm running Ubuntu 20.10 with all the latest patches installed, I have CUDA 11.1. A quick search of the apt repositories on my system reveals nothing that suggests I can install older versions of CUDA alongside the newer one, so I had to devise another plan.

I discovered some months ago (while working with Viper - my University's HPC - for my PhD) that you can actually extract - without sudo privileges - the contents of the CUDA .run installers. By then fiddling with your PATH and LD_LIBRARY_PATH environment variables, you can get any program you run to look for the CUDA libraries elsewhere instead of loading the default system libraries.

Since this is the second time I've done this, I thought I'd document the process for future reference.

First, you need to download the appropriate .run installer for the CUDA libraries. In my case I need CUDA 10.0, so I've downloaded mine from here:

https://developer.nvidia.com/cuda-10.0-download-archive?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_type=runfilelocal

Next, we need to create a new subdirectory and extract the .run file into it. Do that like so:

cd path/to/runfile_directory;
mkdir cuda-10.0
./cuda_10.0.130_410.48_linux.run --extract=${PWD}/cuda-10.0/

Make sure that the current working directory contains no spaces, no preferably no other special characters either. Also, adjust the file and directory names to suit your situation.

Once done, this will have extract 3 subfiles - which also have the suffix .run. We're only interested in CUDA itself, so we only need to extract the the one that starts with cuda-linux. Do that like so (adjusting file/directory names as before):

cd cuda-10.0;
./cuda-linux.10.0.130-24817639.run -noprompt -prefix=$PWD/cuda;
rm *.run;
mv cuda/* .;
rmdir cuda;

If you run ./cuda-linux.10.0.130-24817639.run --help, it's actually somewhat deceptive - since there's a typo in the help text! I corrected it for this above though. Once done, this should leave the current working directory containing the CUDA libraries - that is a subdirectory next to the original .run file:

+ /path/to/some_directory/
    + ./cuda_10.0.130_410.48_linux.run
    + cuda-10.0/
        + version.txt
        + bin/
        + doc/
        + extras/
        + ......

Now, it's just a case of fiddling with some environment variables and launching your program of choice. You can set the appropriate environment variables like this:

export PATH="/absolute/path/to/cuda-10.0/bin:${PATH}";
if [[ ! -z "${LD_LIBRARY_PATH}" ]]; then
    export LD_LIBRARY_PATH="/absolute/path/to/cuda-10.0/lib64:${LD_LIBRARY_PATH}";
else
    export LD_LIBRARY_PATH="/absolute/path/to/cuda-10.0/lib64";
fi

You could save this to a shell script (putting #!/usr/bin/env bash before it as the first line, and then running chmod +x path/to/script.sh), and then execute it in the context of the current shell for example like so:

source path/to/activate-cuda-10.0.sh

Many deep learning applications that use CUDA also use CuDNN, a deep learning library provided by Nvidia for accelerating deep learning applications. The archived versions of CuDNN can be found here: https://developer.nvidia.com/rdp/cudnn-archive

When downloading (you need an Nvidia developer account, but thankfully this is free), pay attention to the version being requested in the error messages generated by your application. Also take care to download the version of CUDA you're using, and match the CuDNN version appropriately.

When you download, select the "cuDNN Library for Linux" option. This will give you a tarball, which contains a single directory cuda. Extract the contents of this directory over the top of your CUDA directory from following my instructions above, and it should work as intended. I used my graphical archive manager for this purpose.

Proteus VIII Laptop from PC Specialist in Review

Recently I bought a new laptop from PC Specialist. Unfortunately I'm lost the original quote / specs that were sent to me, but it was a Proteus VIII. It has the following specs:

  • CPU: Intel i7-10875H
  • RAM: 32 GiB DDR4 2666MHz
  • Disk: 1 TiB SSD (M.2; nvme)
  • GPU: Nvidia GeForce RTX 2060

In this post, I want to give a review now that I've had the device for a short while. I'm still experiencing some teething issues (more on those later), but I've experienced enough of the device to form an opinion on it. This post will also serve as a sort-of review of the installation process of Ubuntu too.

It arrived in good time - thankfully I didn't have any issues with their choice of delivery service (DPD in my area have some problems). I did have to wait a week or 2 for them to build the system, but I wasn't in any rush so this was fine for me. The packaging it arrived it was ok. It came in a rather large cardboard box, inside which there was some plastic padding (sad face), inside which there was another smaller cardboard box. Work to be done in the eco-friendly department, but on the whole good here.

I ordered without an operating system, as my preferred operating system is Ubuntu (the latest version is currently 20.10 Groovy Gorilla). The first order of business was the OS installation here. This went went fine - but only after I could actually get the machine to boot! It turns out that despite it appearing to have support for booting from USB flash drives as advertised in the boot menu, this feature doesn't actually work. I tried the following:

  • The official Ubuntu ISO flashed to a USB 3 flash drive
  • A GRUB installation on a USB 3 flash drive
  • A GRUB installation on a USB 2 flash drive
  • Ubuntu 20.10 burned to a DVD in an external DVD drive (ordered with the laptop)

....and only the last one worked. I've worked with a diverse range of different devices, but never have I encountered one that completely refused to boot at all from a USB drive. Clearly some serious work is required on the BIOS. The number of different settings in the BIOS were also somewhat limited compared to other systems I've poked around on, but I can't give any specific examples here of things that were missing (other than a setting to toggle the virtualisation extensions, which was on by default) - so I guess it doesn't matter all that much. The biggest problem is the lack of USB flash drive boot support - that was really frustrating.

When installing Ubuntu this time around, I decided to try enabling LVM (Logical Volume Management, it's very cool I've discovered) and a LUKS encrypted hard drive. Although I've encountered these technologies before, this will be my first time using them regularly myself. Thankfully, the Ubuntu installer did a great job of setting this up automatically (except the swap partition, which was too small to hibernate, but I'll talk about that in a moment).

Once installed, I got to doing the initial setup. I'm particularly picky here - I use the Unity 7.5 Desktop (yes, I know Ubuntu now uses the GNOME shell, and no I haven't yet been able to get along with it). I'll skip over the details of the setup here, as it's not really relevant to the review. I will mention though that I'm also using X11, not Wayland at the moment - and that I have the propriety Nvidia driver installed (version 450 at the time of typing).

Although I've had a discrete graphics card before (most recently an AMD Radeon R7 M445, and an Nvidia 525M), this is the first time I've had one that's significantly more powerful than the integrated graphics that's built into the CPU. My experience with this so far is mostly positive (it's rather good at rendering in Blender, but I have yet to stress it significantly), and in some graphical tests it gives significantly higher frame rates than the integrated graphics. If you use the propriety graphics drivers, I recommend going into the Nvidia X server settings (accessed through the launcher) → PRIME Profiles, changing it to "On-Demand", and then rebooting. This will prolong your battery life and reduce the noise from the fans by using the integrated graphics by default, but allow you to run select applications on the GPU (see my recent post on how to do this).

It's not without its teething issues though. I think I'm just unlucky, but I have yet to setup a system with an Nvidia graphics card where I haven't had some kind of problem. In this case, it's screen flickering. To alleviate this somewhat, I found and followed the instructions in this Ask Ubuntu Answer. I also found I had to enable the Force synchronization between X and GLX workaround (and maybe another one as well, I can't remember). Even with these enabled, sometimes I still get flickering after it resumes from suspension / stand by.

Speaking of stand by mode, I've found that this laptop does not like hibernation at all. I'm unsure as to whether this is just because I'm using LVM + LUKS, or whether it's an issue with the device more generally, but if I try sudo pm-hibernate from the terminal, the screen flashes a bit, the mouse cursor disappears, and then the fan spins up - with the screen still on and all my windows apparently still open.

I haven't experimented with the quirks / workarounds provided yet, but I guess ties into the early issues with the BIOS, in that there are some clear issues with the BIOS that need to be resolved.

This hibernation issue also ties into the upower subsystem, in that even if you tell it (in both the Unity and GNOME desktop shells) to "do nothing" on low battery, it will forcefully turn the device off - even if you're in the middle of typing a sentence! I think this is because upower doesn't seem to have an option for suspend or "do nothing" in /etc/Upower/UPower.conf or something? I'm still investigating this issue (if you have any suggestions, please do get in touch!).

Despite these problems, the build quality seems good. It's certainly nice having a metal frame, as it feels a lot more solid than my previous laptop. The keyboard feels great too - the feedback from pressing the keys enhances the feeling of a solid frame. The keyboard is backlit too, which makes more a more pleasant experience in dimly lit rooms (though proper lighting is a must in any workspace).

The layout of the keyboard feels a little odd to me. It's a UK keyboard yes (I use a UK keyboard myself), but it doesn't have dedicated Home / End / Page Up / Page Down keys - these are built into the number pad at the right hand side of the keyboard. It's taken some getting used to toggling the number lock every time I want to use these keys, which increases cognitive load.

It does have a dedicated SysRq key though (which my last laptop didn't have), so now I can articles like this one and use the SysRq feature to talk to the Linux Kernel directly in case of a lock-up or crash (I have had the screen freeze on me once or twice - I later discovered this was because it had attempted to hibernate and failed, and I also ran into this problem, which I have yet to find a resolution to), or in case I accidentally set off a program that eats all of the available RAM.

The backlight of the keyboard goes from red at the left-hand side to green in the middle, and blue at the right-hand side. According to the PC Specialist forums, there's a driver that you can install to control this, but the installation seems messy - and would probably need recompiling every time you install a new kernel since DKMS (Dynamic Kernel Module System, I think) isn't used. I'm ok with the default for now, so I haven't bothered with this.

The touchpad does feel ok. It supports precision scrolling, has a nice feel to it, and isn't too small, so I can't complain about it.

The laptop doesn't have an inbuilt optical drive, which is another first for me. I don't use optical disks often, but it was nice having a built-in drive for this in previous laptops. An external one just feels clunky - but I guess I can't complain too much because of the extra components and power that are built-in to the system.

The airflow of the system - as far as I can tell so far, is very good. Air comes in through the bottom, and is then pushed out again through the back and the back of the sides by 2 different fans. These fans are, however, rather noisy at times - and have taken some getting used to as my previous Dell laptop's fans were near silent until I started to stress the system. The noise they make is also slightly higher pitched too, which makes it more noticeable - and sound like a jet engine (though I admit I've never heard a real one in person, and I'm also somewhat hypersensitive to sound) when at full blast. Curiously, there's a dedicated key on the keyboard that - as far as I can tell - toggles between the normal on-demand fan mode and locking the fans at full blast. Great to quickly cool down the system if the fans haven't kicked in yet, but not so great for your ears!

I haven't tested the speakers much, but from what I can tell they are appropriately placed in front of the keyboard just before the hinge for the screen - which is a much better placement than on the underside at the front in my last laptop! Definitely a positive improvement there.

I wasn't sure based on the details on the PC specialist website, but the thickness of the base is 17.5mm at the thickest point, and 6mm for the screen - making ~23.5mm in total (although my measurements may not be completely accurate).

To summarise, the hardware I received was great - overlooking a few pain points such as the BIOS and poor keyboard layout decisions. Some work is still needed on environmental issues and sustainability, but packaging was on the whole ok. Watch out for the delivery service, as my laptop was delivered by DPD who don't have a great track record in my area.

Overall, the hardware build quality is excellent. I'm not sure if I can recommend them yet, but if you want a new PC or laptop they are certainly not a bad place to look.

Found this helpful? Got a suggestion? Want to say hi? Comment below!

Run a Program on your dedicated Nvidia graphics card on Linux

I've got a new laptop, and in it I have an Nvidia graphics card. The initial operating system installation went ok (Ubuntu 20.10 Groovy Gorilla is my Linux distribution of choice here), but after I'd done a bunch of configuration and initial setup tasks it inevitably came around to the point in time that I wanted to run an application on my dedicated Nvidia graphics card.

Doing so is actually really quite easy - it's figuring out the how that was the hard bit! To that end, this is a quick post to document how to do this so I don't forget next time.

Before you continue, I'll assume here that you're using the Nvidia propriety drivers. If not, consult your distribution's documentation on how to install and enable this.

With Nvidia cards, it depends greatly on what you want to run as to how you go about doing this. For CUDA applications, you need to install the nvidia-cuda-toolkit package:

sudo apt install nvidia-cuda-toolkit

...and then there will be an application-specific menu you'll need to navigate to tell it which device to use.

For traditional graphical programs, the process is different.

In the NVIDIA X Server Settings, go to PRIME Profiles, and then ensure NVIDIA On-Demand mode is selected. If not, select it and then reboot.

The NVIDIA X Server Settings dialog showing the correct configuration as described above.

Then, to launch a program on your Nvidia dedicated graphics card, do this:

__NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia command_name arguments

In short, the __NV_PRIME_RENDER_OFFLOAD environment variable must be set to 1 , and the __GLX_VENDOR_LIBRARY_NAME environment variable must be set to nvidia. If you forget the latter here or try using DRI_PRIME=1 instead of both of these (as I advise in my previous post about AMD dedicated graphics cards), you'll get an error like this:

libGL error: failed to create dri screen
libGL error: failed to load driver: nouveau

The solution is to set the above environment variables instead of DRI_PRIME=1. Thanks to this Ask Ubuntu post for the answer!

Given that I know I'll be wanting to use this regularly, I've created a script in my bin folder for it. It looks like this:

#!/usr/bin/env bash
export __NV_PRIME_RENDER_OFFLOAD=1;
export __GLX_VENDOR_LIBRARY_NAME=nvidia;

$@;

Save the above to somewhere in your PATH (e.g. your bin folder, if you have one). Don't forget to run chmod +x path/to/file to mark it executable.

Found this useful? Does this work on other systems and setups too? Comment below!

Stitching videos from frames with ffmpeg (and audio/video editing tricks)

ffmpeg is awesome. In case you haven't heard, ffmpeg is a command-line tool for editing and converting audio and video files. It's taken me a while to warm up to it (the command-line syntax is pretty complicated), but since I've found myself returning to it again and again I thought I'd blog about it so you can use it too (and also for my own reference :P).

I have 2 common tasks I perform with ffmpeg:

  1. Converting audio/video files to a more efficient format
  2. Stitching PNG images into a video file

I've found that most tasks will fall into 1 of the 2 boxes. Sometimes I want to do something more complicated, but I'll usually just look that up and combine the flags I find there with the ones I usually use for one of the 2 above tasks.

horizontal rule made up of an orange rotating maze cube

Stitching PNG images into a video

I don't often render an animation, but when I do I always render to individual images - that way if it crashes half way through I can more easily restart where I left off (and also divide the workload more easily across multiple machines if I need it done quickly).

Individual image files are all very well, but they have 2 problems:

  1. They take up lots of space
  2. You can't easily watch them like a video

Solving #1 is relatively easy with optipng (sudo apt install optipng), for example:

find . -iname "*.png" -print0 | xargs -0 -P4 -n1 optipng -preserve

I cooked that one up a while ago and I've had it saved in my favourites for ages. It finds all png images in the current directory (recursively) and optimises them with optipng.

To resolve #2, we can use ffmpeg as I mentioned earlier in this post. Here's one I use:

ffmpeg -r 24 -i path/to/frame_%04d.png -c:v libvpx-vp9 -b:v 2000k -crf 31 path/to/output.webm

A few points on this one:

  • -r 24: This sets the frames per second.
  • -i path/to/frame_%04d.png: The path to the png images to stitch. %04d refers to 4 successive digits in a row that count up from 1 - e.g. 0001, 0002, etc. %d would mean 1, 2, ...., and %02d for example would mean 01, 02, 03....
  • -c:v libvpx-vp9: The codec. We're exporting to webm, and vp9 is the latest available codec for webm video files.
  • -b:v 2000k: The bitrate. Higher values mean more bits will be used per second of video to encode detail. Raise to increase quality.
  • -crf 31: The encoding quality. Should be anumber betwee 0 and 63 - higher means lower quality and a lower filesize.

Read more about encoding VP9 webm videos here: Encode/VP9- ffmpeg

Converting audio / video files

I've talked about downmuxing audio before, so in this post I'm going to be focusing on video files instead. The above command for encoding to webm can be used with minimal adjustment to transcode videos from 1 format to another:

ffmpeg -hide_banner -i "path/to/input.avi" -c:v libvpx-vp9 -c:a libopus -crf 31 -b:v 0 "path/to/output/webm";

In this one, we handle the audio as well as the video all in 1 go, encoding the audio to use the opus codec, and the video as before. This is particularly useful if you've got a bunch of old video files generated by an old camera. I've found that often old cameras like to save videos as raw uncompressed AVI files, and that I can reclaim a sigificant amount of disk space if I transcode them to something more efficient.

Naturally, I cooked up a one-liner that finds all relevant video files recursively in a given directory and transcodes them to 1 standard efficient format:

find . -iname "*.AVI" -print0 | nice -n20 xargs --verbose -0 -n1 -I{} sh -c 'old="{}"; new="${old%.*}.webm"; ffmpeg -hide_banner -i "${old}" -c:v libvpx-vp9 -c:a libopus -crf 30 -b:v 0 "${new}" && rm "${old}"';

Let's break this down. First, let's look at the commands in play:

  • find: Locates the target files to transcode
  • nice: Runs the following command in the background
  • xargs: Runs the following command over and over again for every line of input
  • sh: A shell, to execute multiple command in sequence
  • ffmpeg: Does the transcoding
  • rm: Deletes the old file if transcoding completes successfully

The old="{}"; new="${old%.*}.webm"; bit is a bit of gymnastics to pull in the path to the target file (the -I {} tells xargs where to substitute it in) and determine the new filepath by replacing the file extension with .webm.

I've found that this does the job quite nicely, and I can set it running and go off to do something else while it happily sits there and transcodes all my old videos, reclaiming lots of disk space in the process.

Found this useful? Got a cool command for processing audio/video/image files in bulk? Comment below!

Art by Mythdael