Own Your Code Series List

Hey there! It's time for another series list. This time it's for my Own Your Code series, where I take a look into Gitea and Laminar CI.

Following this series, I plan to also post about my apt repository, which is hosting a growing list of software - including the tiled map editor (support them with a donation if you can), gossa (a minimalist file browser interface), and webhook - if you find any issues, you can always get in touch.

Anyway, here's the full list of posts in the Own Your Code series:

In the unlikely event I post another entry in this series, I'll come back and update this list. Most likely though I'll be posting related things standalone, rather than part of this series - so subscribe for updates with your favourite method if you'd like to stay up-to-date with my latest blog posts (Atom/RSS, Email, Twitter, Reddit, and Facebook are all supported - just ask if there's something missing).

Multi-boot + data + multi-partition = octopus flash drive 2.0?

A while ago, I posted about a multi-boot flash drive. That approach has served me well, but I got a new flash drive a while ago - and for some reason I could never get it to be bootable in the same way.

After a frustrating experience trying to image yet another machine and not being able to find a free flash drive, I decided that enough was enough and that I'd do something about it. My requirements are as follows:

  1. It has to be bootable via legacy BIOS
  2. It has to be bootable via (U)EFI
  3. I don't want multiple configuration files for each booting method
  4. I want to be able to store other files on it too
  5. I want it to be recognised by all major operating systems
  6. I want to be able to fiddle with the grub configuration without manually mounting a partition

Quite the list! I can confirm that this is all technically achievable - it just takes a bit of work to do so. In this post, I'll outline how you can do it too - with reasoning at each step as to why it's necessary.

Start by finding a completely free flash drive. Note that you'll lose all the data that's currently stored on it, because we need to re-partition it.

I used the excellent GParted for this purpose, which is included in the Ubuntu live CD for those without a supported operating system.

Start by creating a brand-new gpt partition table. We're using GPT here because I believe it's required for (U)EFI booting. I haven't run into a machine that doesn't understand it yet, but there's always a hybrid partition table that you can look into if you have issues.

Once done, create a FAT32 partition that fills all but the last 128MiB or so of the disk. Let's call this one DATA.

Next, create another partition that fills the remaining ~128MiB of the disk. Let's call this one EFI.

Write these to disk. Once done, right click on each partition in turn and click "manage flags". Set them as such:

Partition Filesystem Flags
DATA FAT32 msftdata
EFI FAT32 esp, boot

This is important, because only partitions marked with the boot flag can be booted from via EFI. Partitions marked boot also have to be marked esp apparently, which is mutually exclusive with the msftdata flag. The other problem is that only partitions marked with msftdata will be auto-detected by operating systems in a GPT partition table.

It is for this reason that we need to have a separate partition marked as esp and boot - otherwise operating systems wouldn't detect and automount our flash drive.
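As an aside, if you'd rather do the partitioning from a terminal (or script it), something like this parted invocation should be roughly equivalent. This is just a minimal sketch, with /dev/sdX standing in for your flash drive - double-check the device name with lsblk first!

sudo parted --script /dev/sdX \
    mklabel gpt \
    mkpart DATA fat32 1MiB -128MiB \
    mkpart EFI fat32 -128MiB 100% \
    set 1 msftdata on \
    set 2 esp on
# Create the filesystems themselves
sudo mkfs.fat -F 32 -n DATA /dev/sdX1
sudo mkfs.fat -F 32 -n EFI /dev/sdX2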

Once you've finished setting the flags, close GParted and mount the partitions. Windows users may have to use a Linux virtual machine and pass the flash drive in via USB passthrough.

Next, we'll need to copy a pair of binary files to the EFI partition to allow it to boot via EFI. These can be found in this zip archive, which is part of the tutorial I linked to in my previous post above. Extract the EFI directory from the zip archive to the EFI partition we created, and leave the rest.

Next, we need to install grub to the EFI partition. We need to do this twice:

  • Once for (U)EFI booting
  • Once for legacy bios booting

Before you continue, make sure that your host machine is not running Ubuntu 19.10. This is really important - as there's a bug in the grub 2.04 version used in Ubuntu 19.10 that basically renders the loopback command (used for booting ISOs) useless when booting via UEFI! Try Ubuntu 18.04 - hopefully it'll get fixed soon.

This can be done like so:

# Install for UEFI boot:
sudo grub-install --target=x86_64-efi --force --removable --boot-directory=/media/sbrl/EFI --efi-directory=/media/sbrl/EFI /dev/sdb
# Install for legacy BIOS boot:
sudo grub-install --target=i386-pc --force --removable --boot-directory=/media/sbrl/EFI /dev/sdb

It might complain a bit, but you should be able to (mostly) ignore it.

This is actually ok: as this Unix Stack Exchange post explains, the two installations don't actually clash with each other - they just happen to load and use the same configuration file in the end.

If you have trouble, make sure that you've got the right packages installed with your package manager (apt on Debian-based systems). Most systems will be missing one of the following, as it seems that the installer will only install the one that's required for your system:

  • For BIOS booting, grub-pc-bin needs to be installed via apt.
  • For UEFI booting, grub-efi-amd64-bin needs to be installed via apt.

Note that installing these packages won't mess with the booting of your host machine you're working on - it's the grub-pc and grub-efi-amd64 packages that do that.
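If you want to be sure, it shouldn't hurt to simply install both up-front (package names as per Debian / Ubuntu):

sudo apt install grub-pc-bin grub-efi-amd64-bin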

Next, we can configure grub. This is a 2-step process, as we don't want the main grub configuration file on the EFI partition because of requirement #6 above.

Thankfully, we can achieve this by getting grub to dynamically load a second configuration file, in which we will store our actual configuration.

Create the file grub/grub.cfg on the EFI partition, and paste this inside:

# Load the configfile on the main partition
configfile (hd0,gpt1)/images/grub.cfg

In grub, partitioned block devices are called hdX, where X is a number indexed from 0. Partitions on a block device are specified by a comma, followed by the partition table type (gpt here) and the number of the partition (which starts from 1, oddly enough). The block device grub booted from is always device 0.

In the above, we specify that we want to dynamically load the configuration file that's located on the first partition (the DATA partition) of the disk that it booted from. I did it this way around, because I suspect that Windows still has that age-old bug where it will only look at the first partition of a flash drive - which would be marked as esp + boot and thus hidden if we had them the other way around. I haven't tested this though, so I could be wrong.
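To make the device naming a little more concrete, here are a few example references, along with a couple of stock grub shell commands that can help you figure out which device is which (the paths shown are just examples):

# (hd0)      - the whole of the first disk grub can see
# (hd0,gpt1) - the first GPT partition on that disk
# (hd1,gpt3) - the third GPT partition on the second disk
ls (hd0,gpt1)/                              # List the files on a partition from the grub shell
search --file --set=root /images/grub.cfg   # Set $root to whichever partition holds that file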

Now, we can create that other grub configuration file on the DATA partition. I'm storing all my ISOs and the grub configuration file in question in a folder called images (specifically my main grub configuration file is located at /images/grub.cfg on the DATA partition), but you can put it wherever you like - just remember to update the grub configuration file on the EFI partition above to match, otherwise grub will get confused and complain that it can't find the configuration file on the DATA partition.

For example, here's a (cut-down) portion of my grub configuration file:

# Just a header message - selecting this basically has no effect
menuentry "*** Bootable Images ***" { true }

submenu "Ubuntu" {
    set isofile="/images/ubuntu-18.04.3-desktop-amd64.iso"
    set isoversion="18.04 Bionic Beaver"
    #echo "ISO file: ${isofile}, version: ${isoversion}";

    loopback loop $isofile

    menuentry "[x64] Ubuntu Desktop ${isoversion}" {
        linux (loop)/casper/vmlinuz boot=casper setkmap=uk eject noprompt splash  iso-scan/filename=${isofile} --
        initrd (loop)/casper/initrd
    }
    menuentry "[x64] [ejectable] Ubuntu Desktop ${isoversion}" {
        linux (loop)/casper/vmlinuz boot=casper setkmap=uk eject noprompt splash toram iso-scan/filename=${isofile} --
        initrd (loop)/casper/initrd
    }
    menuentry "[x64] [install] Ubuntu Desktop ${isoversion}" {
        linux (loop)/casper/vmlinuz file=/cdrom/preseed/ubuntu.seed only-ubiquity quiet iso-scan/filename=${isofile} --
        initrd (loop)/install/initrd
    }
}


# Artix Linux
menuentry "Artix Linux" {
    set isofile="/images/artix-lxqt-openrc-20181008-x86_64.iso"

    probe -u $root --set=rootuuid
    set imgdevpath="/dev/disk/by-uuid/$rootuuid"

    loopback loop $isofile
    probe -l loop --set=isolabel

    linux (loop)/arch/boot/x86_64/vmlinuz archisodevice=/dev/loop0 img_dev=$imgdevpath img_loop=$isofile archisolabel=$isolabel earlymodules=loop
    initrd (loop)/arch/boot/x86_64/archiso.img
}

menuentry "Fedora Workstation 31" {
    set isofile="/images/Fedora-Workstation-Live-x86_64-31-1.9.iso"

    echo "Setting up loopback"
    loopback loop "${isofile}" 
    probe -l loop --set=isolabel
    echo "ISO Label is ${isolabel}"

    echo "Booting...."
    linux (loop)/isolinux/vmlinuz iso-scan/filename="${isofile}" root=live:CDLABEL=$isolabel  rd.live.image
    initrd (loop)/isolinux/initrd.img
}

menuentry "Offline Password Changer [01/02/2014]" {
    loopback loop /images/offline_password_changer.iso
    linux (loop)/VMLINUZ setkmap=uk isoloop=$isofile
    # initrd (loop)/initrd.cgz
    initrd (loop)/initrd
}

menuentry "Memtest 86+ 5.01" {
    linux16 /images/memtest86+.bin
}

submenu "Boot from Hard Drive" {
    menuentry "Hard Drive 0" {
        set root=(hd0)
        chainloader +1
    }
    menuentry "Hard Drive 1" {
        set root=(hd1)
        chainloader +1
    }
    menuentry "Hard Drive 2" {
        set root=(hd2)
        chainloader +1
    }
    menuentry "Hard Drive 3" {
        set root=(hd3)
        chainloader +1
    }
}

If you're really interested in building on your grub configuration file, I'll include some useful links at the bottom of this post. Specifically, having an understanding of the Linux boot process can be helpful for figuring out how to boot a specific Linux ISO if you can't find any instructions on how to do so. These steps might help if you are having issues figuring out the right parameters to boot a specific ISO:

  • Use your favourite search engine and search for Boot DISTRO_NAME_HERE iso with grub or something similar
  • Try the links at the bottom of this post to see if they have the parameters you need
  • Try looking for a configuration for a more recent version of the distribution
  • Try using the configuration from a similar distribution (e.g. Artix is similar to Manjaro - it's the successor to Manjaro OpenRC, which is derived from Arch Linux)
  • Open the ISO up and look for the grub configuration file for a clue
  • Try booting it with memdisk
  • Ask on the distribution's forums

Memdisk is a tool that copies a given ISO into RAM, and then chainloads it (as far as I'm aware). It can actually be used with grub (despite the fact that you might read that it's only compatible with syslinux):

menuentry "Title" {
    linux16 /images/memdisk iso
    initrd16 /path/to/linux.iso
}

Sometimes it can help with particularly stubborn ISOs. If you're struggling to find a copy of it out on the web, here's the version I use - though I don't remember where I got it from (if you know, post a comment below and I'll give you attribution).

That concludes this (quite lengthy!) tutorial on creating the, in my opinion, ultimate multi-boot everything flash drive. My future efforts with respect to my flash drive will be directed in the following areas:

  • Building a complete portable environment for running practically all the software I need when out and about
  • Finding useful ISOs to include on my flash drive
  • Anything else that increases the usefulness of the flash drive that I haven't thought of yet

If you've got any cool suggestions (or questions about the process) - comment below!

Sources and Further Reading

Own your code, part 6: The Lantern Build Engine

It's time again for another installment in the own your code series! In the last post, we looked at the git post-receive hook that calls the main git-repo Laminar CI task, which is the core of our Continuous Integration system (which we discussed in the post before that). You can see all the posts in the series so far here.

In this post we're going to travel in the other direction, and look at the build script / task automation engine that I've developed that goes hand-in-hand with the Laminar CI system - though it can and does stand on its own too.

Introducing the Lantern Build Engine! Finally, after far too long I'm going to formally post here about it.

Originally developed out of a need to automate the boring and repetitive parts of building and packing my assessed coursework (ACWs) at University, the lantern build engine is my personal task automation system. It's written in 100% Bash, and allows tasks to be easily defined like so:

task_dostuff() {
    task_begin "Doing a thing";
    do_work;
    task_end "$?" "Oops, do_work failed!";

    task_begin "Doing another thing";
    do_hard_work;
    task_end "$?" "Yikes! do_hard_work failed.";
}

When the above task is run, Lantern will automatically detect the dostuff task, since it's a bash function that's prefixed with task_. task_begin and task_end are 2 other bash functions provided by Lantern, which generate pretty output to inform the user that a task is starting or ending. The $? there grabs the exit code from the last command - and if it fails, task_end will automatically display the provided error message.

Tasks are defined in a build.sh file, for which Lantern provides a template. Currently, the template file contains some additional logic such as the help text output if no tasks were specified - which is left-over from the time when Lantern was small enough to fit in the same file as the build tasks themselves.

I'm in the process of adding support for all the logic in the template file, so that I can cut down on the extra boilerplate there even further. After defining your tasks in a copy of the template build file, it's really easy to call them:

./build dostuff

Of course, don't forget to mark the copy of the template file executable with chmod +x ./build.

The above initial example only scratches the surface of what Lantern can do though. It can easily check to see if a given command is installed with check_command:

task_go-to-the-moon() {
    task_begin "Checking requirements";
    check_command git true;
    check_command node true;
    check_command npm true;
    task_end 0;
}

If any of the check_command calls fail, then an error message is printed and the build terminated.

Work that needs doing in Lantern can be expressed with 3 levels of logical separation: stages, tasks, and subtasks:

task_build-rocket() {
    stage_begin "Preparation";

    task_begin "Gathering resources";
    gather_resources;
    task_end "$?" "Failed to gather resources";

    task_begin "Hiring engineers";
    hire_engineers;
    task_end "$?" "Failed to hire engineers";

    stage_end "$?";

    stage_begin "Building Rocket";
    build_rocket --size big --boosters 99;
    stage_end "$?";

    stage_begin "Launching rocket";
    task_begin "Preflight checks";
    subtask_begin "Checking fuel";
    check_fuel --level full;
    subtask_end "$?" "Error: The fuel tank isn't full!";
    subtask_begin "Loading snacks";
    load_items --type snacks --from warehouse;
    subtask_end "$?" "Error: Failed to load snacks!";
    task_end "$?";

    task_begin "Launching!";
    launch --countdown 10;
    task_end "$?";

    stage_end "$?";
}

Come to think of it, I should probably rename the function prefix from task to job. Stages, tasks, and subtasks each look different in the output - so it's down to personal preference as to which one you use and where. Subtasks in particular are best for commands that don't return any output.

Popular services such as Travis CI have a thing where they display the versions of programs related to the build in the build transcript, like this:

$ uname -a
Linux MachineName 5.3.0-19-generic #20-Ubuntu SMP Fri Oct 18 09:04:39 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
$ node --version
v13.0.1
$ npm --version
6.12.1

Lantern provides support for this with the execute command. Prefixing commands with execute will cause them to be printed before being executed, just like the above:

task_prepare() {
    task_begin "Displaying environment details";
    execute uname -a;
    execute node --version;
    execute npm --version;
    task_end "$?";
}

As build tasks get more complicated, it's logical to split them up into multiple tasks that can be called independently and conditionally. Lantern makes this easy too:

task_build() {
    task_begin "Building";
    # Do build stuff here
    task_end "$?";
}
task_deploy() {
    task_begin "Deploying";
    # Do deploy stuff here
    task_end "$?";
}

task_all() {
    tasks_run build deploy;
}

The all task in the above runs both the build and deploy tasks. In fact, the template build script uses tasks_run at the very bottom to treat every argument passed to it as a task name, leading to the behaviour described above.
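For reference, the dispatch logic at the bottom of the template looks something like this - a rough sketch from memory rather than the canonical version (check the Lantern repository for the real template):

# Treat every command-line argument as a task name, and run them in order
if [[ "$#" -lt 1 ]]; then
    echo "Usage: ./build {task_name} [{task_name} ...]" >&2;
    exit 1;
fi

tasks_run "$@";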

Lantern also provides an array of other useful functions to make expressing build sequences easy, concise, and readable - from custom colours to testing environment variables to see if they exist. It's all fully documented in the README of the project too.

As described 2 posts ago, the git-repo Laminar CI task (once it's spawned a hologram of itself) currently checks for the existence of a build or build.sh executable script in the root of the repository it is running on, and passes ci as the first and only argument.

This provides easy integration with Lantern, since Lantern build scripts can be called anything we like, and with a tasks_run call at the bottom as in the template file, we can simply define a ci Lantern task function that runs all our continuous integration jobs that we need to execute.
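For example, a ci task might just chain together a few of the other tasks defined in the same build script (the task names here are hypothetical, of course):

task_ci() {
    tasks_run setup build test package;
}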

If you're interested in trying out Lantern for yourself, check out the repository!

https://gitlab.com/sbrl/lantern-build-engine#lantern-build-engine

Personally, I use it for everything from CI to rapid development environment setup.

This concludes my (epic) series about my git hosting and continuous integration. We've looked at git hosting, and taken a deep dive into integrating it into a continuous integration system, which we've augmented with a bunch of scripts of our own design. The system we've ended up with, while a lot of work to set up, is extremely flexible, allowing for modifications at will (for example, I have a webhook script that's similar to the git post-receive hook, but is designed to receive notifications from GitHub instead of Gitea and queue the git-repo job just the same).

I'll post a series list post soon. After that, I might blog about my personal apt repository that I've set up, which is somewhat related to this.

Own your code, part 5: git post-receive hook

In the last post, I took a deep dive into the master git-repo job that powers my entire build system. In the next few posts, I'm going to take a look at the bits around the edges that interact with this Laminar job - starting with the git post-receive hook in this post.

When you push commits to a git repository, the remote server does a bunch of work to integrate your changes into the remote master copy of the repository. At various points in the process, git allows you to run scripts to augment your repository, and potentially alter the way git ultimately processes the push. You can send content back to the pushing user too - which is how you get those messages on the command-line occasionally when you push to a GitHub repository.

In our case, we want to queue a new Laminar CI job when new commits are pushed to a private Gitea server, for instance (like mine). Doing this isn't particularly difficult, but we do need to collect a bunch of information about the environment we're running in so that we can correctly inform the git-repo task where it needs to pull the repository from, who pushed the commits, and which commits need testing.

In addition, we want to write one universal git post-receive hook script that will work everywhere - regardless of the server the repository is hosted on. Of course, on GitHub you can't run a script directly, but if I ever come into contact with another git server that supports server-side hooks, I want to minimise the amount of extra work I've got to do to hook it up.

Let's jump into the script:

#!/usr/bin/env bash
if [ "${GIT_HOST}" == "" ]; then
    GIT_HOST="git.starbeamrainbowlabs.com";
fi

Fairly standard stuff. Here we set a shebang and specify the GIT_HOST variable if it's not set already. This is mainly just a placeholder for the future, as explained above.

Next, we determine the git repository's url, because I'm not sure that Gitea (my git server, for which this script is intended) actually tells you directly in a git post-receive hook. The post-receive hook script does actually support HTTPS, but this support isn't currently used and I'm unsure how the git-repo Laminar CI job would handle a HTTPS url:

# The url of the repository in question. SSH is recommended, as then you can use a deploy key.
# SSH:
GIT_REPO_URL="git@${GIT_HOST}:${GITEA_REPO_USER_NAME}/${GITEA_REPO_NAME}.git";
# HTTPS:
# git_repo_url="https://git.starbeamrainbowlabs.com/${GITEA_REPO_USER_NAME}/${GITEA_REPO_NAME}.git";

With the repository url determined, next on the list is the identity of the pusher. At this stage it's a simple matter of grabbing the value of 1 variable and putting it in another as we're only supporting Gitea at the moment, but in the future we may have some logic here to intelligently determine this value.

GIT_AUTHOR="${GITEA_PUSHER_NAME}";

With the basics taken care of, we can start getting to the more interesting bits. Before we do that though, we should define a few common settings:

###### Internal Settings ######

version="0.2";

# The job name to queue.
job_name="git-repo";

###############################

job_name refers to the name of the Laminar CI job that we should queue to process new commits. version is a value that we can increment should we iterate on this script in the future, so that we can then tell which repositories have the new version of the post-receive hook and which ones don't.

Next, we need to calculate the virtual name of the repository. This is used by the git-repo job to generate a 'hologram' copy of itself that acts differently, as explained in the previous post. This is done through a series of Bash transformations on the repository URL:

# 1. Make lowercase
repo_name_auto="${GIT_REPO_URL,,}";
# 2. Trim git@ & .git from url
repo_name_auto="${repo_name_auto/git@}";
repo_name_auto="${repo_name_auto/.git}";
# 3. Replace unknown characters to make it 'safe'
repo_name_auto="$(echo -n "${repo_name_auto}" | tr -c '[:lower:]' '-')";

The result is quite like 'slugification'. For example, this URL:

git@git.starbeamrainbowlabs.com:sbrl/Linux-101.git

...will get turned into this:

git-starbeamrainbowlabs-com-sbrl-linux----

I actually forgot to allow digits in step #3, but it's a bit awkward to change it at this point :P Maybe at some later time when I'm feeling bored I'll update it and fiddle with Laminar's data structures on disk to move all the affected repositories over to the new naming scheme.
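For the record, allowing digits would just be a small tweak to the tr character set - something along these lines (untested, since I haven't actually migrated anything over to it):

# 3. Replace unknown characters to make it 'safe' (this time allowing digits too)
repo_name_auto="$(echo -n "${repo_name_auto}" | tr -c '[:lower:]0-9' '-')";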

Now that we've got everything in place, we can start to process the commits that the user has pushed. The documentation on how this is done in a post-receive hook is a bit sparse, so it took some experimenting before I had it right. Turns out that the information we need is provided on the standard input, so a while-read loop is needed to process it:

while read next_line
do
    # .....
done

For each line on the standard input, 3 variables are provided:

  • The old commit reference (i.e. the commit before the one that was pushed)
  • The new commit reference (i.e. the one that was pushed)
  • The name of the reference (usually the branch that the commit being pushed is on)

Commits on multiple branches can be pushed at once, so the name of the branch each commit is being pushed to is kind of important.

Anyway, I pull these into variables like so:

oldref="$(echo "${next_line}" | cut -d' ' -f1)";
newref="$(echo "${next_line}" | cut -d' ' -f2)";
refname="$(echo "${next_line}" | cut -d' ' -f3)";

I think there's some clever Bash trick I've used elsewhere that allows you to pull them all in at once in a single line, but I believe I implemented this before I discovered that trick.
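For the curious, the trick in question is (I'm fairly sure) just letting read split the line itself:

# Equivalent to the 3 cut calls above, in a single line
while read -r oldref newref refname; do
    # .....
done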

With that all in place, we can now (finally) queue the Laminar CI job. This is quite a monster, as it needs to pass a considerable number of variables to the git-repo job itself:

LAMINAR_HOST="127.0.0.1:3100" LAMINAR_REASON="Push from ${GIT_AUTHOR} to ${GIT_REPO_URL}" laminarc queue "${job_name}" GIT_AUTHOR="${GIT_AUTHOR}" GIT_REPO_URL="${GIT_REPO_URL}" GIT_COMMIT_REF="${newref}" GIT_REF_NAME="${refname}" GIT_AUTHOR="${GIT_AUTHOR}" GIT_REPO_NAME="${repo_name_auto}";

Laminar CI's management socket listens on the abstract unix socket laminar (IIRC). Since you can't yet forward abstract sockets over SSH with OpenSSH, I opt to use a TCP socket instead. To this end, the LAMINAR_HOST prefix there is needed to tell laminarc where to find the management socket that it can use to talk to the Laminar daemon, laminard - since Gitea and Laminar CI run on different servers.
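The idea is then to forward that TCP socket between the 2 servers over SSH - something along these lines (the hostname is a placeholder, and in practice you'd want autossh or a systemd unit to keep the tunnel alive):

# Forward local port 3100 on the Gitea box to port 3100 on the Laminar CI box
ssh -N -L 127.0.0.1:3100:127.0.0.1:3100 ci.example.com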

The LAMINAR_REASON there is the message that is displayed in the Laminar CI web interface. Said interface is read-only (by design), but very useful for inspecting what's going on. Messages like this add context as to why a given job was triggered.

Lastly, we should send a message to the pushing user, to let them know that a job has been queued. This can be done with a simple echo, as the standard output is sent back to the client:

echo "[Laminar git hook ${version}] Queued Laminar CI build ("${job_name}" -> ${repo_name_auto}).";

Note that we display the version number of the post-receive hook here. This is how I tell whether I need to go into the Gitea settings to update the hook or not.

With that, the post-receive hook script is complete. It takes a bunch of information lying around, transforms it into a common universal format, and then passes the information on to my continuous integration system - which is then responsible for building the code itself.

Here's the completed script:

#!/usr/bin/env bash

##############################
########## Settings ##########
##############################

# Useful environment variables (gitea):
#   GITEA_REPO_NAME         Repository name
#   GITEA_REPO_USER_NAME    Repo owner username
#   GITEA_PUSHER_NAME       The username that pushed the commits

#   GIT_HOST                Domain name the repo is hosted on. Default: git.starbeamrainbowlabs.com

if [ "${GIT_HOST}" == "" ]; then
    GIT_HOST="git.starbeamrainbowlabs.com";
fi

# The url of the repository in question. SSH is recommended, as then you can use a deploy key.
# SSH:
GIT_REPO_URL="git@${GIT_HOST}:${GITEA_REPO_USER_NAME}/${GITEA_REPO_NAME}.git";
# HTTPS:
# git_repo_url="https://git.starbeamrainbowlabs.com/${GITEA_REPO_USER_NAME}/${GITEA_REPO_NAME}.git";

# The user that pushed the commits
GIT_AUTHOR="${GITEA_PUSHER_NAME}";

##############################

###### Internal Settings ######

version="0.2";

# The job name to queue.
job_name="git-repo";

###############################

# 1. Make lowercase
repo_name_auto="${GIT_REPO_URL,,}";
# 2. Trim git@ & .git from url
repo_name_auto="${repo_name_auto/git@}";
repo_name_auto="${repo_name_auto/.git}";
# 3. Replace unknown characters to make it 'safe'
repo_name_auto="$(echo -n "${repo_name_auto}" | tr -c '[:lower:]' '-')";

while read next_line
do
    oldref="$(echo "${next_line}" | cut -d' ' -f1)";
    newref="$(echo "${next_line}" | cut -d' ' -f2)";
    refname="$(echo "${next_line}" | cut -d' ' -f3)";
    # echo "********";
    # echo "oldref: ${oldref}";
    # echo "newref: ${newref}";
    # echo "refname: ${refname}";
    # echo "********";

    LAMINAR_HOST="127.0.0.1:3100" LAMINAR_REASON="Push from ${GIT_AUTHOR} to ${GIT_REPO_URL}" laminarc queue "${job_name}" GIT_AUTHOR="${GIT_AUTHOR}" GIT_REPO_URL="${GIT_REPO_URL}" GIT_COMMIT_REF="${newref}" GIT_REF_NAME="${refname}" GIT_AUTHOR="${GIT_AUTHOR}" GIT_REPO_NAME="${repo_name_auto}";
    # GIT_REF_NAME and GIT_AUTHOR are used for the LAMINAR_REASON when the git-repo task recursively calls itself
    # GIT_REPO_NAME is used to auto-name hologram copies of the git-repo.run task when recursing
    echo "[Laminar git hook ${version}] Queued Laminar CI build ("${job_name}" -> ${repo_name_auto}).";
done

#cat -;
# YAY what we're after is on the first line of stdin! :D
# The format appears to be documented here: https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks#_server_side_hooks
# Line format:
# oldref newref refname
# There may be multiple lines that all need handling.

In the next post, I want to finally introduce my very own home-brew build engine: lantern. I've used it in over half a dozen different projects by now, so it's high time I talked about it a bit more formally.

Found this interesting? Spotted a mistake? Got a suggestion to improve it? Comment below!

Saving space on Linux

While Linux is a whole lot lighter than Windows, there does come a point at which one has to look at reducing the amount of stuff that's on one's hard drive.

Thankfully, there are a number of possible things that we can do on Linux to find and delete large, bulky, and extraneous files, and I thought I'd post about them here.

Firstly, there's the Disk Usage Analyser, or baobab. It's a graphical interface that shows you what your hard drive looks like:

The disk usage analyser, showing my root partition.

Personally, I really appreciate the diagram on the right-hand side - it's a wonderfully visual way of displaying hard disk usage. By right clicking on a directory, you can send it to the recycle bin (don't forget to empty the recycle bin later! Recycle bins on Linux are per-user, so you'll need to make sure you empty them all - you can do root's by doing sudo nautilus and navigating to the recycle bin).

By default the program runs under your current user, so to analyse directories your user can't read you'll need to launch it with sudo:

sudo baobab

If you don't have a GUI (e.g. if you're trying to clear out a server's hard disk), there's always ncdu. This, unlike the Disk Usage Analyser, isn't installed by default (at least on Ubuntu Server), so you'll need to install it:

sudo apt install ncdu

Then, you can get it to scan a partition:

sudo ncdu -x /

In the above, I'm scanning / (the root partition) with sudo - as not all the files are under my ownership. The -x ensures that ncdu doesn't cross partition boundaries and end up scanning something silly like /proc.

ncdu, showing my server's root partition.

By using these tools, not only was I able to clear out a bunch of files my systems don't need, but I also discovered that /var/log/journal was taking up 4GiB (!) on my laptop's disk. 4 GiB! On systems that use systemd, journald is used to store and manage some log files. It's strange, weird, and I'm not sure I like the opaque storage format, but there you go.

Unlike syslog and logrotate though, it doesn't appear to have a limit set on when it should delete logs. This has to be done manually:

# Show journald disk space usage beforehand
journalctl --disk-usage
sudo nano /etc/systemd/journald.conf
# Add "SystemMaxUse=500M" to the bottom
sudo systemctl kill --kill-who=main --signal=SIGUSR2 systemd-journald.service
sudo systemctl restart systemd-journald.service
# Show journald disk space usage afterwards
journalctl --disk-usage

(Source: this Unix StackExchange answer)
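If you just want to claw the space back immediately without waiting for the config change to take effect, journald can also vacuum its archived logs on demand:

# Prune archived journal files until they take up less than 500M
sudo journalctl --vacuum-size=500M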

Found this helpful? Got another great tip to save space on disk? Comment below!

Own your code, part 4: Laminar CI

In the last post, I talked at a high level about the infrastructure behind my continuous integration and deployment system. In this post, I'm going to dive into the details of the Laminar CI job that is the engine driving the whole system.

Laminar CI is based on a concept of jobs. The docs explain it quite well, but in short each job is a file in the jobs folder with the file extension run and a shebang. In my case, I'm using Bash - and I'll continue to do so at regular intervals throughout this series.

Unlike most other setups, the Laminar CI job that we'll be writing here won't actually do any of the actual CI tasks itself - it will simply act as a proxy script to setup & manage the execution of the actual build system - which, in this case, will be the lantern build engine, an engine I wrote to aid me with automating repetitive tasks when working on my University ACWs (Assessed CourseWork).

Every job has its own workspace, which acts as a common area to store and cache various files across all the runs of that job. Each run of a job also has its very own private area too - which will be useful later on.

The first step in this proxy script is to extract the parameters of the run that we're supposed to be doing. For me, I store this in a number of environment variables, which are set when queuing the job run from the git post-receive (or web) hook:

  • GIT_REPO_NAME (e.g. git-starbeamrainbowlabs-com-sbrl-rhinoreminds): The safe name of the repository that we're running against, with potentially troublesome characters removed.
  • GIT_REF_NAME (e.g. refs/heads/master): Basically the branch that we're working on. Useful for logging purposes.
  • GIT_REPO_URL (e.g. git@git.starbeamrainbowlabs.com:sbrl/rhinoreminds.git): The URL of the repository that we're running against.
  • GIT_COMMIT_REF (e.g. e23b2e0....): The exact commit to check out and build.
  • GIT_AUTHOR: The friendly name of the author that pushed the commit. Useful for logging purposes.

Before we do anything else, we need to make sure that these variables are defined:

set -e; # Don't allow errors

# Check that all the right variables are present
if [ -z "${GIT_REPO_NAME}" ]; then echo -e "Error: The environment variable GIT_REPO_NAME isn't set." >&2; exit 1; fi
if [ -z "${GIT_REF_NAME}" ]; then echo -e "Error: The environment variable GIT_REF_NAME isn't set." >&2; exit 1; fi
if [ -z "${GIT_REPO_URL}" ]; then echo -e "Error: The environment variable GIT_REPO_URL isn't set." >&2; exit 1; fi
if [ -z "${GIT_COMMIT_REF}" ]; then echo -e "Error: The environment variable GIT_COMMIT_REF isn't set." >&2; exit 1; fi
if [ -z "${GIT_AUTHOR}" ]; then echo -e "Error: The environment variable GIT_AUTHOR isn't set." >&2; exit 1; fi

There are a bunch of other variables that I'm omitting here, since they are dynamically determined from the build variables above. I extract many of these additional variables using regular expressions. For example:

GIT_REF_TYPE="$(regex_match "${GIT_REF_NAME}" 'refs/([a-z]+)')";

GIT_REF_TYPE is the bit after the refs/ and before the actual branch or tag name. It basically tells us whether we're building against a branch or a tag. That regex_match function is a utility function that I found in the pure bash bible - which is an excellent resource on various tips and tricks to do common tasks without spawning subprocesses - and therefore obtaining superior performance and lower resource usage. Here it is:

# @source https://github.com/dylanaraps/pure-bash-bible#use-regex-on-a-string
# Usage: regex "string" "regex"
regex_match() {
    [[ $1 =~ $2 ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

Very cool. For completeness, here are the remainder of the secondary environment variables. Many of them aren't actually used directly - instead they are used indirectly by other scripts and lantern build engine tasks that we call from the main Laminar CI job.

if [[ "${GIT_REF_TYPE}" == "tags" ]]; then
    GIT_TAG_NAME="$(regex_match "${GIT_REF_NAME}" 'refs/tags/(.*)$')";
fi

# NOTE: These only work with SSH urls.
GIT_REPO_OWNER="$(echo "${GIT_REPO_URL}" | grep -Po '(?<=:)[^/]+(?=/)')";
GIT_REPO_NAME_SHORT="$(echo "${GIT_REPO_URL}" | grep -Po '(?<=/)[^/]+(?=\.git$)')";
GIT_SERVER_DOMAIN="$(echo "${GIT_REPO_URL}" | grep -Po '(?<=@)[^/]+(?=:)')";

GIT_TAG_NAME is the name of the tag that we're building against - but only if we've been passed a tag as the GIT_REF_TYPE.

The GIT_SERVER_DOMAIN is important for sending the status reports to the right place. Gitea supports a status API that we can hook into to report on how we're doing. You can see it in action here on my RhinoReminds repository. Those green ticks are the build status that was reported by the Laminar CI job that we're writing in this post. Unfortunately you won't be able to click on it to see the actual build output, as that is currently protected behind a username and password, since the Laminar CI web interface exposes all the git projects I've currently got set up on it - including a number of private ones that I can't share.

Anyway, with all our environment variables in order, it's time to do something with them. Before we do though, we should tell Gitea that we're starting the build process:

send-status-gitea "${GIT_COMMIT_REF}" "pending" "Executing build....";

I haven't yet implemented support for sending notifications to GitHub, but it's on my todo list. In theory it's pretty easy to do - this is why I've got that GIT_SERVER_DOMAIN variable above in anticipation of this.

That send-status-gitea function there is another helper script I've written that does what you'd expect - it sends a status message to Gitea. It does this by using the environment variables we deduced earlier (that are also exported - though I didn't include that in the above code snippet) and curl.

There's still a bunch of stuff to get through in this post, so I'm going to omit the source of that script from this post for brevity. I've got no particular issue with releasing it though - if you're interested, contact me using the details on my homepage.
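To give a rough idea of the API it's wrapping though, a bare-bones commit status update with curl looks something like this. This is an illustrative example rather than the actual helper script - GITEA_TOKEN and the laminar context string here are placeholders:

# POST a commit status to Gitea's API (see the Gitea API docs for the full details)
curl -X POST \
    -H "Authorization: token ${GITEA_TOKEN}" \
    -H "Content-Type: application/json" \
    -d '{"state": "pending", "description": "Executing build....", "context": "laminar"}' \
    "https://${GIT_SERVER_DOMAIN}/api/v1/repos/${GIT_REPO_OWNER}/${GIT_REPO_NAME_SHORT}/statuses/${GIT_COMMIT_REF}"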

Next, we need to set an exit trap. This is a function that will run when the Bash process exits - regardless of whether this was because we finished our work successfully, or otherwise. This can be very useful to make absolutely sure that your script cleans up after itself. In our case, we're only going to be using it to report the build status back to Gitea:

# Runs on exit, no matter what
cleanup() {
    original_exit_code="$?";

    status="success";
    description="Build ${RUN} succeeded in $(human-duration "${SECONDS}").";
    if [[ "${original_exit_code}" -ne "0" ]]; then
        status="failed";
        description="Build failed with exit code ${original_exit_code} after $(human-duration "${SECONDS}")";
    fi

    send-status-gitea "${GIT_COMMIT_REF}" "${status}" "${description}";
}

trap cleanup EXIT;

Very cool. The RUN variable there is provided by Laminar CI, and SECONDS is a bash built-in that tells us the number of seconds that the current Bash process has been running for. human-duration is yet another helper script because I like nice readable durations in my status messages - not something unreadable like Build 3 failed in 345 seconds. It's also somewhat verbose - I adapted it from this StackExchange answer.
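I won't reproduce my adapted version here, but a minimal human-duration could be as simple as this sketch (the real one handles more edge cases):

# Usage: human-duration {seconds}
human-duration() {
    local total="$1";
    local days=$(( total / 86400 ));
    local hours=$(( (total % 86400) / 3600 ));
    local minutes=$(( (total % 3600) / 60 ));
    local seconds=$(( total % 60 ));
    local result="";
    [[ "${days}" -gt 0 ]] && result+="${days}d ";
    [[ "${hours}" -gt 0 ]] && result+="${hours}h ";
    [[ "${minutes}" -gt 0 ]] && result+="${minutes}m ";
    result+="${seconds}s";
    echo "${result}";
}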

With that all out of the way, the next item on the list is to work out what job name we're running under. I've chosen git-repo for the name of the master 'virtual' job - that is to say the one whose entire purpose is to queue the actual job. That's pretty easy, since Laminar gives us an environment variable:

if [ "${JOB}" == "git-repo" ]; then
    # ...
fi

If the job name is git-repo, then we need to queue the actual job name. Since I don't want to have to manually alter the system every time I'm setting up a new repo on my CI system, I've automated the process with symbolic links. The main git-repo job creates a symbolic link to itself in the name of the repository that it's supposed to be running against, and then queues a new job to run itself under the different job name. This segment takes place nested in the above if statement:

# If the job file doesn't exist, create it
# We create a symlink here because this is a 'smart' job - whose
# behaviour changes dynamically based on the job name.
if [ ! -e "${LAMINAR_HOME}/cfg/jobs/${repo_job_name}.run" ]; then
    pushd "${LAMINAR_HOME}/cfg/jobs";
    ln -s "git-repo.run" "${repo_job_name}.run";
    popd
fi

Once we're sure that the symbolic link is in place, we can queue the virtual copy:

# Queue our new hologram
LAMINAR_REASON="git push by ${GIT_AUTHOR} to ${GIT_REF_NAME}" laminarc queue "${repo_job_name}" GIT_REPO_NAME="${GIT_REPO_NAME}" GIT_REF_NAME="${GIT_REF_NAME}" GIT_REPO_URL="${GIT_REPO_URL}" GIT_COMMIT_REF="${GIT_COMMIT_REF}" GIT_AUTHOR="${GIT_AUTHOR}";
# If we got to here, we queued the hologram successfully
# Clear the trap, because we know that the trap for the hologram will fire
# This avoids sending a 2nd status to Gitea, linking the user to the wrong place
trap - EXIT;

exit 0;

This also ensures that if we make any changes to the main job file, all the copies will get updated automatically too. After all, they are only pointers to the actual job on disk.

Notice that we also clear the trap there before exiting - that's important, since we're queuing a copy of ourselves, we don't want to report the completed status before we've actually finished.

At this point, we can now look at what happens if the job name isn't git-repo. In this case, we need to do a few things:

  1. Clone the git repository in question to the shared workspace (if it hasn't been done already)
  2. Fetch new commits on the shared repository copy
  3. Check out the right commit
  4. Copy it to the run-specific directory
  5. Execute the build script

Additionally, we need to ensure that points #1 to #4 are not done by multiple jobs that are running at the same time, since that would probably confuse things and induce weird and undesirable behaviour. This might happen if we push multiple commits at once, for example - since the git post-receive hook (which I'll be talking about in a future post) queues 1 run per commit.

We can make sure of this by using flock. It's actually a feature provided by the Linux Kernel, which allows a single process to obtain exclusive access to a resource on disk. Since each Laminar job has its own workspace as described above, we can abuse this by taking out a flock on the workspace directory. This will ensure that only 1 run per job is accessing the workspace area at once:

# Acquire a lock for this repo
exec 9<"${WORKSPACE}";
flock --exclusive 9;

echo "[${SECONDS}] Lock acquired";

Nice. Next, we need to clone the repository into the shared workspace if we haven't already:

cd "${WORKSPACE}";

# If we haven't already, clone the repository
git_directory="$(echo "${GIT_REPO_URL}" | grep -oP '(?<=/)(.+)(?=.git$)')";
if [ ! -d "${git_directory}" ]; then
    echo "[${SECONDS}] Cloning repository";
    git clone "${GIT_REPO_URL}";
fi
cd "${git_directory}";

Then, we need to fetch any new commits:

# Pull down any updates that are available
echo "[${SECONDS}] Downloading commits";
git fetch origin;

....and check out the one we're supposed to be building:

# Checkout the commit we're interested in testing
echo "[${SECONDS}] Checking out ${GIT_COMMIT_REF}";
git checkout "${GIT_COMMIT_REF}";

Then, we need to copy the repo to the run-specific directory. This is important, since the run might create new files - and we don't want multiple runs running in the same directory at the same time.


echo "[${SECONDS}] Linking source to run directory";
# Hard-link the repo content to the run directory
# This is important because then we can allow multiple runs of the same repo at the same time without using extra disk space
# -r    Recursive mode
# -a    Preserve permissions
# -l    Hardlink instead of copy
cp -ral ./ "${run_directory}";
# Don't forget the .git directory, .gitattributes, .gitmodules, .gitignore, etc.
# This is required for submodules and other functionality, but likely won't be edited - hence we can hardlink here (I think).
# NOTE: If we see weirdness with multiple runs at a time, then we'll need to do something about this.
cp -ral ./.git* "${run_directory}/.git";

I'm using hard linking here for efficiency - I'm banking on the fact that the build script I call isn't going to modify any existing files. Thinking about it, I should do a git reset --hard there just in case - though then I'd have all sorts of nasty issues with timing problems.

So far, I haven't had any issues. If I do, then I'll just disable the hard linking and copy instead. This entire script assumes a trusted environment - i.e. it trusts that the code being executed is not malicious. To this end, it's only suitable for personal projects and the like.

For it to be useful in untrusted environments, it would need to avoid hard linking and execute the build script inside a container - e.g. using LXD or Docker.

Moving on, we next need to release that flock and return to the run-specific directory:

# Go back to the job-specific run directory
cd "${run_directory}";

# Release the lock
exec 9>&- # Close file descriptor 9 and release lock

echo "[${SECONDS}] Lock released";

At this point, we're all set up to run the build script. We need to find it first though. I've currently got 2 standards I'm using across my repositories: build and build.sh. This is easy to automate:

build_script="./build";
if [ ! -x "${build_script}" ]; then build_script="./build.sh"; fi
# FUTURE: Add Makefile support here?
if [ ! -x "${build_script}" ]; then
    echo "[${SECONDS}] Error: Couldn't find the build script, or it wasn't marked as executable." >&2;
    exit 1;
fi

Now that we know where it is, we can execute it. Before we do though, as a little extra I like to run shellcheck over it - since we assume that it's a shell script too (though it might call something that isn't a shell script):

echo "----------------------------------------------------------------";
echo "------------------ Shellcheck of build script ------------------";
set +e; # Allow shellcheck errors - we just warn about them
shellcheck "${build_script}";
set -e;
echo "----------------------------------------------------------------";

I can highly recommend shellcheck - it finds a number of potential issues in both style and syntax that might cause your shell scripts to behave in unexpected ways. I've learnt a bunch about shell scripting and really improved my skills from using it on a regular basis.

Finally, we can now actually execute the build script:

echo "[${SECONDS}] Executing '${build_script} ci'";

nice -n10 ${build_script} ci

I pass the argument ci here, since the lantern build engine takes task names as arguments on the command line. If it's not a lantern script, then it can be interpreted as a helpful hint as to the environment that it's running in.

I also run it through nice to lower its scheduling priority, since I actually have my Laminar CI server running on a Raspberry Pi and its resources are rather limited. I found, oddly, that I'd lose other essential services (e.g. SSH) if I didn't do this - since build tasks are usually quite computationally expensive.

That completes the build script. Of course, when the above finishes executing the trap that we set earlier will trigger and the build status reported. I'll include the full script at the bottom of this post.

This was a long post! We've taken a deep dive into the engine that powers my build system. In the next few posts, I'd like to talk about the git post-receive hook I've been mentioning that triggers this job. I'd also like to talk formally about the lantern build engine - what it is, where it came from, and how it works.

Found this interesting? Spotted a mistake? Got a suggestion? Confused about something? Comment below!

#!/usr/bin/env bash
set -e; # Don't allow errors

# Check that all the right variables are present
if [ -z "${GIT_REPO_NAME}" ]; then echo -e "Error: The environment variable GIT_REPO_NAME isn't set." >&2; exit 1; fi
if [ -z "${GIT_REF_NAME}" ]; then echo -e "Error: The environment variable GIT_REF_NAME isn't set." >&2; exit 1; fi
if [ -z "${GIT_REPO_URL}" ]; then echo -e "Error: The environment variable GIT_REPO_URL isn't set." >&2; exit 1; fi
if [ -z "${GIT_COMMIT_REF}" ]; then echo -e "Error: The environment variable GIT_COMMIT_REF isn't set." >&2; exit 1; fi
if [ -z "${GIT_AUTHOR}" ]; then echo -e "Error: The environment variable GIT_AUTHOR isn't set." >&2; exit 1; fi

# It's checked directly anyway
# shellcheck disable=SC1091
source source_regex_match.sh;

GIT_REF_TYPE="$(regex_match "${GIT_REF_NAME}" 'refs/([a-z]+)')";

if [[ "${GIT_REF_TYPE}" == "tags" ]]; then
    GIT_TAG_NAME="$(regex_match "${GIT_REF_NAME}" 'refs/tags/(.*)$')";
fi

# NOTE: These only work with SSH urls.
GIT_REPO_OWNER="$(echo "${GIT_REPO_URL}" | grep -Po '(?<=:)[^/]+(?=/)')";
GIT_REPO_NAME_SHORT="$(echo "${GIT_REPO_URL}" | grep -Po '(?<=/)[^/]+(?=\.git$)')";
GIT_SERVER_DOMAIN="$(echo "${GIT_REPO_URL}" | grep -Po '(?<=@)[^/]+(?=:)')";

export GIT_REPO_OWNER GIT_REPO_NAME_SHORT GIT_SERVER_DOMAIN GIT_REF_TYPE GIT_TAG_NAME;

###############################################################################

# Example URL: git@git.starbeamrainbowlabs.com:sbrl/rhinoreminds.git
# Environment variables:
#   GIT_REPO_NAME           git-starbeamrainbowlabs-com-sbrl-rhinoreminds
#   GIT_REF_NAME            refs/heads/master, refs/tags/v0.1.1-build7
#   GIT_REF_TYPE            heads, tags
#       Determined dynamically from GIT_REF_NAME.
#   GIT_TAG_NAME            v0.1.1-build7
#       Determined dynamically from GIT_REF_NAME, only set if GIT_REF_TYPE == "tags".
#   GIT_REPO_URL            git@git.starbeamrainbowlabs.com:sbrl/rhinoreminds.git
#   GIT_COMMIT_REF          e23b2e0f3c0b9f48effebca24db48d9a3f028a61
#   GIT_AUTHOR              bob
# Generated:
#   GIT_SERVER_DOMAIN       git.starbeamrainbowlabs.com
#   GIT_REPO_OWNER          sbrl
#   GIT_REPO_NAME_SHORT     rhinoreminds
#   GIT_RUN_SOURCE          github
#       Not always set. If not set then assume git.starbeamrainbowlabs.com


send-status-gitea "${GIT_COMMIT_REF}" "pending" "Executing build....";

# Runs on exit, no matter what
cleanup() {
    original_exit_code="$?";

    status="success";
    description="Build ${RUN} succeeded in $(human-duration "${SECONDS}").";
    if [[ "${original_exit_code}" -ne "0" ]]; then
        status="failed";
        description="Build failed with exit code ${original_exit_code} after $(human-duration "${SECONDS}")";
    fi

    send-status-gitea "${GIT_COMMIT_REF}" "${status}" "${description}";
}

trap cleanup EXIT;

###############################################################################


repo_job_name="$(echo "${GIT_REPO_NAME}" | tr '/' '--')";
if [ "${JOB}" == "git-repo" ]; then
    # If the job file doesn't exist, create it
    # We create a symlink here because this is a 'smart' job - whose
    # behaviour changes dynamically based on the job name.
    if [ ! -e "${LAMINAR_HOME}/cfg/jobs/${repo_job_name}.run" ]; then
        pushd "${LAMINAR_HOME}/cfg/jobs";
        ln -s "git-repo.run" "${repo_job_name}.run";
        popd
    fi

    # Queue our new hologram
    LAMINAR_REASON="git push by ${GIT_AUTHOR} to ${GIT_REF_NAME}" laminarc queue "${repo_job_name}" GIT_REPO_NAME="${GIT_REPO_NAME}" GIT_REF_NAME="${GIT_REF_NAME}" GIT_REPO_URL="${GIT_REPO_URL}" GIT_COMMIT_REF="${GIT_COMMIT_REF}" GIT_AUTHOR="${GIT_AUTHOR}";
    # If we got to here, we queued the hologram successfully
    # Clear the trap, because we know that the trap for the hologram will fire
    # This avoids sending a 2nd status to Gitea, linking the user to the wrong place
    trap - EXIT;

    exit 0;
fi

# We're running in hologram mode!

# Remember the run directory - we'll need it later
run_directory="$(pwd)";

# Important directories:
# $WORKSPACE        Shared between all runs of a job
# $run_directory    The initial directory a run lands in. Empty and run-specific.
# $ARCHIVE          Also run-specific, but the contents are persisted after the run ends


# Acquire a lock for this repo
#laminarc lock "${JOB}-workspace";
exec 9<"${WORKSPACE}";
flock --exclusive 9;
###############################################################################
# No need to allow errors here, because the lock will automagically be released 
# if the process crashes, as that'll close the file descriptor anyway :P
echo "[${SECONDS}] Lock acquired";

cd "${WORKSPACE}";

# If we haven't already, clone the repository
git_directory="$(echo "${GIT_REPO_URL}" | grep -oP '(?<=/)(.+)(?=.git$)')";
if [ ! -d "${git_directory}" ]; then
    echo "[${SECONDS}] Cloning repository";
    git clone "${GIT_REPO_URL}";
fi
cd "${git_directory}";


# Pull down any updates that are available
echo "[${SECONDS}] Downloading commits";
git fetch origin;
# Checkout the commit we're interested in testing
echo "[${SECONDS}] Checking out ${GIT_COMMIT_REF}";
git checkout "${GIT_COMMIT_REF}";

echo "[${SECONDS}] Linking source to run directory";
# Hard-link the repo content to the run directory
# This is important because then we can allow multiple runs of the same repo at the same time without using extra disk space
# -r    Recursive mode
# -a    Preserve permissions
# -l    Hardlink instead of copy
cp -ral ./ "${run_directory}";
# Don't forget the .git directory, .gitattributes, .gitmodules, .gitignore, etc.
# This is required for submodules and other functionality, but likely won't be edited - hence we can hardlink here (I think).
# NOTE: If we see weirdness with multiple runs at a time, then we'll need to do something about this.
cp -ral ./.git* "${run_directory}/.git";
echo "[${SECONDS}] done";

# Go back to the job-specific run directory
cd "${run_directory}";

###############################################################################
# Release the lock
exec 9>&- # Close file descriptor 9 and release lock
#laminarc release "${JOB}-workspace";


echo "[${SECONDS}] Lock released";

echo "[${SECONDS}] Finding build script";

build_script="./build";
if [ ! -x "${build_script}" ]; then build_script="./build.sh"; fi
# FUTURE: Add Makefile support here?
if [ ! -x "${build_script}" ]; then
    echo "[${SECONDS}] Error: Couldn't find the build script, or it wasn't marked as executable." >&2;
    exit 1;
fi


echo "[${SECONDS}] Executing '${build_script} ci'";

echo "----------------------------------------------------------------";
echo "------------------ Shellcheck of build script ------------------";
set +e; # Allow shellcheck errors - we just warn about them
shellcheck "${build_script}";
set -e;
echo "----------------------------------------------------------------";


nice -n10 ${build_script} ci

Quick File Management with Gossa

Recently a family member needed to access some documents at a remote location that didn't support USB flash drives. Awkward to be sure, but I did some searching around and found a nice little solution that I thought I'd blog about here.

At first, I thought about setting up Filestash - but I discovered that only installation through Docker is officially supported (if it's written in Go, then shouldn't it end up as a single binary? What's Docker needed for?).

Docker might be great, but for a quick solution to an awkward issue I didn't really want to go to the trouble of installing Docker and figuring out all the awkward plumbing problems for the first time. It definitely appeared to me that it's better suited to a setup where you're already using Docker.

Anyway, I then discovered Gossa. It's also written in Go, and is basically a web interface that lets you upload, download, and rename files (click on a file or directory's icon to rename).

A screenshot of Gossa listing the contents of my CrossCode music folder. CrossCode is awesome, and you should totally go and play it - after finishing reading this post of course :P

Is it basic? Yep.

Do the icons look like something from 1995? Sure.

(Is that Times New Roman I spy? I hope not)

Does it do the job? Absolutely.

For what it is, it's solved my problem fabulously - and it's so easy to set up! First, I downloaded the binary from the latest release for my CPU architecture, and put it somewhere on disk:

curl -o gossa -L https://github.com/pldubouilh/gossa/releases/download/v0.0.8/gossa-linux-arm

chmod +x gossa
sudo chown root: gossa
sudo mv gossa /usr/local/bin/gossa;

Then, I created a systemd service file to launch Gossa with the right options:

[Unit]
Description=Gossa File Manager (syncthing)
After=syslog.target rsyslog.service network.target

[Service]
Type=simple
User=gossa
Group=gossa
WorkingDirectory=/path/to/dir
ExecStart=/usr/local/bin/gossa -h [::1] -p 5700 -prefix /gossa/ /path/to/directory/to/serve
Restart=always

StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=gossa


[Install]
WantedBy=multi-user.target

(Top tip! Use systemctl cat service_name to quickly see the service file definition for any given service)

Here I start Gossa listening on the IPv6 local loopback address on port 5700, set the prefix to /gossa/ (I'm going to be reverse-proxying it later on using a subdirectory of a pre-existing subdomain), and send the standard output & error to syslog. Speaking of which, we should tell syslog what to do with the logs we send it. I put this in /etc/rsyslog.d/gossa.conf:

if $programname == 'gossa' then /var/log/gossa/gossa.log
if $programname == 'gossa' then stop
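
Note that the systemd service file above assumes a dedicated gossa user exists, and rsyslog needs a directory it can actually write that log file to - so you may need to create both first. Something like this should do the trick (the syslog:adm ownership is an assumption based on Debian / Ubuntu defaults - adjust it to match whatever user your rsyslog runs as):

# Create an unprivileged system user for Gossa to run as
sudo useradd --system --no-create-home --shell /usr/sbin/nologin gossa
# Create the directory rsyslog will write the log file to
sudo mkdir -p /var/log/gossa
sudo chown syslog:adm /var/log/gossa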

After that, I configured log rotate by putting this into /etc/logrotate.d/gossa:

/var/log/gossa/*.log {
    daily
    missingok
    rotate 14
    compress
    delaycompress
    notifempty
    create 0640 root adm
    postrotate
        invoke-rc.d rsyslog rotate >/dev/null
    endscript
}

Very similar to the configuration I used for RhinoReminds, which I blogged about here.

Lastly, I configured Nginx on the machine I'm running this on to reverse-proxy to Gossa:

server {

    # ....

    location /gossa {
        proxy_pass http://[::1]:5700;
    }

    # ....

}

I've configured authentication elsewhere in my Nginx server block to protect my installation against unauthorised access (and you probably should too - there's a quick sketch of one way to do that at the end of this post). All that's left to do is start Gossa and reload Nginx:

sudo systemctl daemon-reload
sudo systemctl start gossa
# Check that Gossa is running
sudo systemctl status gossa

# Test the Nginx configuration file changes before reloading it
sudo nginx -t
sudo systemctl reload nginx

Note that reloading Nginx is more efficient than restarting it, since it doesn't kill the process - it only reloads the configuration from disk. It doesn't matter much here, but in a production environment that receives a high volume of traffic it's a great way to make configuration changes without dropping client connections.

In your web browser, you should see something like the image at the top of this post.
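
As an aside, if you don't have any authentication in your Nginx server block yet, HTTP basic auth is a quick way to add some. Here's a rough sketch, assuming the apache2-utils package for the htpasswd tool (the file path and realm name are just examples):

# Generate a password file for Nginx to check credentials against
sudo apt install apache2-utils
sudo htpasswd -c /etc/nginx/.gossa_htpasswd your_username
# Then add these 2 directives to the location /gossa block and reload Nginx:
#     auth_basic "Gossa";
#     auth_basic_user_file /etc/nginx/.gossa_htpasswd;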

Found this interesting? Got another quick solution to an otherwise awkward issue? Comment below!

Own your code, part 3: Shell scripting infrastructure

In the last post, I told the curious tale of my unreliable webhook. In the post before that, I talked about my Gitea-powered git server and how I set it up. In this one, we're going to back up a bit and look at setting up Laminar CI.

Laminar CI is a continuous integration program that takes a decidedly different approach to the one you see in solutions like GitLab CI and Travis. It takes a much more minimal approach, instead preferring to provide you with the tools you need and letting you get on with setting it up however you like and integrating it with whatever you like.

I recommend looking at its website and user manual to get a feel for how it works. In short, it lets you do things like this:

laminarc queue build-code
laminarc show-jobs

It is, of course, entirely command-line based. In order to integrate it with other services, webhooks are needed. In my case, I've used webhook for this purpose. Before we get into that though, we should outline how the system we build should work.

For me, just filling a folder with job scripts isn't good enough. I want my CI system to have the following properties:

  • I want to have all the configuration files and scripts under version control
  • I want to keep project-specific CI scripts in their appropriate repositories
  • I don't want to have to alter the CI configuration every time I start a new project.
  • The CI system should be stable - it shouldn't fall over if multiple jobs for the same repository are running at the same time.

Bold claims. Achieving this was actually quite complicated, and demanded a pretty sophisticated infrastructure made up of multiple independent shell scripts. It's best explained with a diagram:

There are 3 different machines at play here.

  1. The local computer, where we write our code
  2. The git server, where the code is stored
  3. The CI Server, which runs continuous integration tasks

Somehow, we need to notify the CI server that it needs to do something when new commits end up at the git server. We can achieve this with a git post-receive hook, which is basically a shell script (yep, we'll be seeing a lot of those) that can perform some logic on the server just after a push completes, but before the client that pushed has disconnected.

In this post-receive hook we need to trigger the webhook that notifies the CI server that there are new commits for it to test. GitHub makes this easy, as it lets you configure webhooks via a GUI - but as far as I know it doesn't let you edit the server-side post-receive hook script directly.

Alternatively, should we run into issues with the webhook and we have control over the git server, we can trigger the CI build directly from a post-receive git hook, utilising SSH port forwarding. This is what I did in the end for my personal git server, though as I noted in part 2 I did end up working around the webhook issues so that I could have it work with GitHub too.
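
To make that concrete, a stripped-down post-receive hook might look something like this. It's only a sketch: the hostname, job name, and parameter name are placeholders, and for brevity it just runs laminarc over a plain SSH connection rather than setting up port forwarding:

#!/usr/bin/env bash
# Git feeds us one line per updated ref on stdin: <old-sha> <new-sha> <ref-name>
while read -r oldrev newrev refname; do
    # Ask the CI server to queue a run against the newly-pushed commit
    ssh ci@ci.example.com laminarc queue my-repo "GIT_COMMIT_REF=${newrev}";
done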

For the webhook to work, we'll need a receiving script that will parse the JSON body of the webhook itself, and queue the laminar job.
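
Something along these lines, perhaps - a sketch that assumes jq is installed and that the JSON payload arrives on standard input (the field names here match Gitea / GitHub push payloads, but check against what your git server actually sends):

#!/usr/bin/env bash
# Pull the interesting bits out of the push payload....
payload="$(cat)";
repo_name="$(echo "${payload}" | jq --raw-output '.repository.name')";
repo_url="$(echo "${payload}" | jq --raw-output '.repository.clone_url')";
commit_ref="$(echo "${payload}" | jq --raw-output '.after')";
# ....and queue a laminar job named after the repository
laminarc queue "${repo_name}" "GIT_REPO_URL=${repo_url}" "GIT_COMMIT_REF=${commit_ref}";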

In order for the laminar job to work without modification when we add a new project, it will have to come in 2 parts. The first is a generic 'virtual' or 'smart' job, to which we create a symbolic link under the name of each repository that we want to run CI tasks for.

When Laminar calls it under a repository-specific name, we want it to run the CI tasks - but only on a copy of the main repository. Additionally, we don't want to re-clone the repository each time, as that's slow and wastes bandwidth.
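
Registering a new repository with such a 'smart' job then becomes a one-liner. Assuming Laminar's default configuration directory and some illustrative job names, it might look like this:

# Laminar jobs are just executable files in its jobs directory, so a symlink
# to the generic job under the repository's name is all that's needed
cd /var/lib/laminar/cfg/jobs;
sudo ln -s git-repo.run my-cool-project.run;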

Finally, we need a unified standard for defining CI tasks in our repositories for which we want to enable continuous integration.

This we can achieve with the use of the lantern build engine, which I'll talk about (and its history!) in a future post.

Putting all this together has been quite the undertaking - hence this series of blog posts! Since this post somehow seems to be getting rather long already and we've laid down the foundations of quite a complicated system, I think I'll leave it here for now.

In the next post, we'll look at building the core of the system: The main laminar CI job that will organise the execution of project-specific CI tasks.

Setting up a Mosquitto MQTT server

I recently found myself setting up a mosquitto instance (yep, for this) due to a migration we're in the middle of, and it got quite interesting, so I thought I'd post about it here. This post is also partly documentation of what I did and why, just in case future people come across it and wonder how it's set up - though I have tried to make it fairly self-documenting.

At first, I started by doing sudo apt install mosquitto and seeing if it would work. I can't remember if it did or not, but it certainly didn't after I played around with the configuration files. To this end, I decided that enough was enough and I turned the entire configuration upside-down. First up, I needed to disable the existing sysV init-based service that ships with the mosquitto package:

sudo systemctl stop mosquitto # Just in case
sudo systemctl disable mosquitto

Next, I wrote a new systemd service file:

[Unit]

Description=Mosquitto MQTT Broker
After=syslog.target rsyslog.service network.target

[Service]
Type=simple
PIDFile=/var/run/mosquitto/mosquitto.pid
User=mosquitto

PermissionsStartOnly=true
ExecStartPre=-/bin/mkdir /run/mosquitto
ExecStartPre=/bin/chown -R mosquitto:mosquitto /run/mosquitto

ExecStart=/usr/sbin/mosquitto --config-file /etc/mosquitto/mosquitto.conf
ExecReload=/bin/kill -s HUP $MAINPID

StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=mosquitto


[Install]
WantedBy=multi-user.target

This is broadly similar to the service file I developed in my earlier tutorial post, but it's slightly more complicated.

For one, I use PermissionsStartOnly=true and a series of ExecStartPre directives to allow mosquitto to create a PID file in a directory in /run. /run is a special directory on Linux for PID files and other such things, but normally only root can modify it. mosquitto will be running under the mosquitto user (surprise surprise), so we need to create a subdirectory for it and chown it so that it has write permissions.

A PID file is just a regular file on disk that contains the PID (Process IDentifier) number of the primary process of a system service. System service managers such as systemd and OpenRC use this number to manage the health of the service while it's running and send it various signals (such as to ask it to reload its configuration file).
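
As a quick example, you can use that PID file yourself to ask mosquitto to reload its configuration by hand - this is the same SIGHUP that the ExecReload line in the service file above sends when you run systemctl reload:

# Send SIGHUP to the process whose PID is stored in the PID file
sudo kill -HUP "$(cat /run/mosquitto/mosquitto.pid)";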

With this in place, I then added an rsyslog definition at /etc/rsyslog.d/mosquitto.conf to tell it where to put the log files:

if $programname == 'mosquitto' then /var/log/mosquitto/mosquitto.log
if $programname == 'mosquitto' then stop

Thinking about it, I should probably check that a log rotation definition file is also in place.

Just in case, I then chowned the pre-existing log files to ensure that rsyslog could read & write to them:

sudo chown -R syslog: /var/log/mosquitto

Then, I filled out /etc/mosquitto/mosquitto.conf with a few extra directives and restarted the service. Here's the full configuration file:

# Place your local configuration in /etc/mosquitto/conf.d/
#
# A full description of the configuration file is at
# /usr/share/doc/mosquitto/examples/mosquitto.conf.example

# NOTE: We can't use tab characters here, as mosquitto doesn't like it.

pid_file /run/mosquitto/mosquitto.pid

# Persistence configuration
persistence true
persistence_location /var/lib/mosquitto/


# Not a file today, thanks
# Log files will actually end up at /var/log/mosquitto/mosquitto.log, but will go via syslog
# See /etc/rsyslog.d/mosquitto.conf
#log_dest file /var/log/mosquitto/mosquitto.log
log_dest syslog


include_dir /etc/mosquitto/conf.d


# Documentation: https://mosquitto.org/man/mosquitto-conf-5.html

# Require a username / password to connect
allow_anonymous false
# ....which are stored in the following file
password_file /etc/mosquitto/mosquitto_users

# Make a log entry when a client connects & disconnects, to aid debugging
connection_messages true

# TLS configuration
# Disabled at the moment, since we don't yet have a letsencrypt cert
# NOTE: I don't think that the sensors currently connect over TLS. We should probably fix this.
# TODO: Point these at letsencrypt
#cafile /etc/mosquitto/certs/ca.crt
#certfile /etc/mosquitto/certs/hostname.localdomain.crt
#keyfile /etc/mosquitto/certs/hostname.localdomain.key

As you can tell, I've still got some work to do here - namely the TLS setup. It's a bit of a chicken-and-egg problem, because I need the domain name to be pointing at the MQTT server in order to get a Let's Encrypt TLS certificate, but that'll break all the sensors using the current one..... I'm sure I'll figure it out.

But wait! We forgot the user accounts. Before I started the new service, I added some user accounts for client applications to connect with:

sudo mosquitto_passwd /etc/mosquitto/mosquitto_users username1
sudo mosquitto_passwd /etc/mosquitto/mosquitto_users username2

The mosquitto_passwd program prompts for a password - that way you don't end up with the passwords in your ~/.bash_history file.
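
One gotcha: if the password file doesn't exist yet, you'll need the -c flag on the very first call to create it (and only the first call, since -c overwrites any existing file):

# Create the password file along with the first user; subsequent users are added without -c
sudo mosquitto_passwd -c /etc/mosquitto/mosquitto_users username1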

With all that taken care of, I started the systemd service:

sudo systemctl daemon-reload
sudo systemctl start mosquitto-broker.service

Of course, I ended up doing a considerable amount of debugging in between all this - I've edited it down to make it more readable and fit better in a blog post :P

Lastly, because I'm paranoid, I double-checked that it was running with htop and netstat:


sudo netstat -peanut | grep -i mosquitto
tcp        0      0 0.0.0.0:1883            0.0.0.0:*               LISTEN      112        2676558    5246/mosquitto      
tcp        0      0 x.y.z.w:1883           x.y.z.w:54657       ESTABLISHED 112        2870033    1234/mosquitto      
tcp        0      0 x.y.z.w:1883           x.y.z.w:39365       ESTABLISHED 112        2987984    1234/mosquitto      
tcp        0      0 x.y.z.w:1883           x.y.z.w:58428       ESTABLISHED 112        2999427    1234/mosquitto      
tcp6       0      0 :::1883                 :::*                    LISTEN      112        2676559    1234/mosquitto      

...no idea why it wants to connect to itself, but hey! Whatever floats its boat.
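
If you want to go a step further than netstat, the mosquitto-clients package makes it easy to check that the broker actually works end-to-end. Something like this, with the username, password, and topic being placeholders:

# In one terminal, subscribe to a test topic....
mosquitto_sub -h localhost -u username1 -P 'the_password' -t test/topic
# ....and in another, publish a message to it
mosquitto_pub -h localhost -u username1 -P 'the_password' -t test/topic -m "Hello, world!"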

Orange Pi 3 in review

An Orange Pi 3, along with its logo. Of course, I'm not affiliated with the manufacturers in any way. In fact, they are probably not aware that this post even exists.

I recently bought an Orange Pi 3 (based on the Allwinner H6 chipset) to perform a graphics-based task, and I've had an interesting enough time with it that I thought I'd share my experiences in a sort of review post here.

The first problem when it arrived was finding an operating system that supports it. My initial thought was to use Devuan, but I quickly realised that practically the only operating system that supports it at the moment is Armbian.

Not to be deterred, after a few false starts I got Armbian based on Ubuntu 18.04 Bionic Beaver installed. The next order of business was to install the software I wanted to use.

For the most part, I didn't have too much trouble with this - though it was definitely obvious that the arm64 (specifically sunxi64) architecture isn't a build target that's often supported by apt repository owners. This wasn't helped by the fact that apt has a habit of throwing really weird error messages when you try to install something that exists in an apt repository, but only for a different architecture.

After I got Kodi installed, the next order of business was to get it to display on the screen. I ended up managing this (eventually) with the help of a lot of tutorials and troubleshooting, but the experience was really rather unpleasant. I kept getting odd errors, like failed to load driver sun4i-drm when trying to start Kodi via an X11 server and other strangeness.

The trick in the end was to force X11 to use the fbdev driver, but I'm not entirely sure what that means or why it fixed the issue.
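
For anyone fighting the same battle, the general shape of the fix is an X11 configuration fragment that pins the driver to fbdev. The exact filename and framebuffer device below are assumptions rather than my actual config, so treat this as a starting point only:

# Requires the xserver-xorg-video-fbdev package to be installed
sudo tee /etc/X11/xorg.conf.d/99-fbdev.conf >/dev/null <<'EOF'
Section "Device"
    Identifier "Allwinner framebuffer"
    Driver     "fbdev"
    Option     "fbdev" "/dev/fb0"
EndSection
EOF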

Moving on, I then started to explore the other capabilities of the device. Here, too, I discovered a number of shortcomings in the software support provided by Linux, such as a lack of support for audio via HDMI and for Bluetooth. I found the status matrix of the SunXI project, which is the community working to add support for the Allwinner H6 chipset to the Linux Kernel.

They do note that support for the H6 chipset is currently under development and is incomplete at the moment - and I wish I'd checked on software support before choosing a device to purchase.

The other big problem I encountered was a lack of kernel headers provided by Armbian. Normally, you can install the headers for your kernel by installing the linux-headers-XXXXXX package with your favourite package manager, where XXXXXX is the same as the string present in the linux-image-XXXXXX package you've got installed that contains the kernel itself.
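
On a typical Debian-based system, for example, installing them looks something like this (package naming schemes vary between distributions, and on Armbian no matching headers package existed for the H6 at the time):

# Install the headers matching the kernel that's currently running
sudo apt install "linux-headers-$(uname -r)"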

This is actually kind of a problem, because without the associated header files you can't compile any software that calls kernel functions yourself - which prevents you from installing the various dkms-based kernel modules that automatically recompile themselves against the kernel you've got installed.

I ended up finding this forum thread, but the response from someone who I assume is an Armbian developer was less than stellar - they basically said that if you want kernel headers, you need to compile the kernel yourself! That's a significant undertaking, for those not in the know, and certainly not something that should be undertaken lightly.

While I've encountered a number of awkward issues that I haven't seen before, the device does have some good things worth noting. For one, it actually packs a pretty significant punch: it's much more powerful than a Raspberry Pi 3B+ (of which I have one; I bought this device before the Raspberry Pi 4 was released). This makes it an ideal choice for more demanding workloads, which a Raspberry Pi wouldn't quite be suitable for.

In conclusion, while it's a nice device, I can't recommend it to people just yet. Software support is definitely only half-baked at this point with some glaring holes (HDMI audio is one of them, which doesn't look like it's coming any time soon).

I think part of the problem is that Xunlong (the company that makes the device and others in its family) don't appear to be interested in supporting the community at all, choosing instead to dump custom low-quality firmware for people to use as binary blobs (which apparently don't work) - leaving the SunXI community with a lot of extra work to reverse-engineer it all and figure out how everything works before they can start implementing support in the Linux Kernel.

If you're interested in buying a similar embedded board, I can recommend instead using HackerBoards to find one that suits your needs. Don't forget to check for operating system support!

Found this interesting? Thinking of buying a board yourself? Had a different experience? Comment below!
