
Question: How do you recover a deleted file that's been overwritten?

Answer: With the greatest of difficulty.

The blog post following this one in a few days' time is, ironically, about backing things up. However, I actually ended up losing the entire post during the upload process to my server (it replaced both the source and destination files with an empty file!). I'd already saved it to disk, but still almost lost it anyway...

Recovery of deleted files is awkward at best. It relies on the fact that when you 'delete' something, it isn't actually erased from disk at all - the sectors it was taking up are simply deallocated and returned to the free pool of space, which is then re-used at will.

The most important thing to remember when you've just lost something is to not touch anything. Shut down your computer, and, if you're not confident enough yourself, call someone who knows what they're doing to help you out.

The best way to recover a file is to boot into a live CD. This is a CD (or flash drive) that holds one (or more!) entire operating systems. This way, no additional writing is done to the disk containing the deleted file - which could potentially corrupt it.

After fiddling about with this (I had to update my bootable flash drive, as Ubuntu 15.10 is out of support and I couldn't download the extundelete tool, which I'll mention shortly), I found that I'd hit a dead-end.

I was using the extundelete tool (sudo apt install extundelete), and it claimed that it couldn't restore the file because it had been reallocated. Here's the command I used:

sudo extundelete --restore-file /absolute/path/to/file /dev/sda7

I suspect that it was getting confused because I had a file by that name on disk that was now empty.

Anyway, after doing something else for a while, I had an idea. Since my blog posts are just text files on disk, shouldn't it be on my disk somewhere? Could I locate it at all?

As it turns out, the answer is yes. Remembering a short sentence from the post I'd just written, I started a brute-force search of my disk:

sudo dd if=/dev/sda7 | strings | grep -i "AWS S3"

This has several components to it. Explain Shell is great at providing an explanation of each bit in turn. Here's a short summary:

  • dd - This reads in the entire contents of a partition and pushes it into the following command. Find the partition name with lsblk.
  • strings - This extracts all runs of printable characters from the input stream.
  • grep - This searches (case-insensitively with -i) for a specified string in the input.

I started to get results - a whole line from the blog post that had supposedly been deleted and overwritten! This wasn't really enough though. Taking a longer snippet to reduce the noise in the output, I tried again:

sudo dd if=/dev/sda7 | strings | grep -i -C100 "To start, we'll need an AWS S3 bucket"

This time, I added -C100. This tells grep that I want to see 100 lines before and after any lines that contain the specified search string.
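One word of warning if you try this yourself: send the output somewhere other than the disk you're recovering from, otherwise you risk overwriting the very data you're looking for. Something along these lines should do the trick (the mount point here is just an example):

sudo dd if=/dev/sda7 | strings | grep -i -C100 "To start, we'll need an AWS S3 bucket" >/media/usb/recovered.txt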

With this, I managed to recover enough of the blog post to quickly re-edit and upload it. It did appear to remove blank lines and the back-ticks at the end of a code block, but they are easy to replace.

Note to self: Always copy first when crossing file system boundaries, and delete later. Don't move all in one go!

Write an XMPP bot in half an hour

Recently I've looked at using AI to extract key information from natural language, and creating a system service with systemd. The final piece of the puzzle is to write the bot itself - and that's what I'm posting about today.

Since I not only use XMPP for instant messaging already, but it's also an open, federated standard, I'll be building my bot on top of it for maximum flexibility.

To talk over XMPP programmatically, we're going to need a library. Thankfully, I've located just such a library which appears to work well enough, called S22.Xmpp. Especially nice is the comprehensive documentation that makes development go much more smoothly.

With our library in hand, let's begin! Our first order of business is to get some scaffolding in place to parse out the environment variables we'll need to login to an XMPP account.

using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

using S22.Xmpp;
using S22.Xmpp.Client;
using S22.Xmpp.Im;

namespace XmppBotDemo
{
    public static class MainClass
    {
        // Needed later
        private static XmppClient client;

        // Settings
        private static Jid ourJid = null;
        private static string password = null;

        public static int Main(string[] args)
        {
            // Read in the environment variables
            string rawJid = Environment.GetEnvironmentVariable("XMPP_JID");
            password = Environment.GetEnvironmentVariable("XMPP_PASSWORD");

            // Ensure they are present
            if (rawJid == null || password == null) {
                Console.Error.WriteLine("XMPP Bot Demo");
                Console.Error.WriteLine("=============");
                Console.Error.WriteLine("");
                Console.Error.WriteLine("Usage:");
                Console.Error.WriteLine("    ./XmppBotDemo.exe");
                Console.Error.WriteLine("");
                Console.Error.WriteLine("Environment Variables:");
                Console.Error.WriteLine("    XMPP_JID         Required. Specifies the JID to login with.");
                Console.Error.WriteLine("    XMPP_PASSWORD    Required. Specifies the password to login with.");
                return 1;
            }

            ourJid = new Jid(rawJid);

            // TODO: Connect here

            return 0;
        }
    }
}

Excellent! We're reading in & parsing 2 environment variables: XMPP_JID (the username), and XMPP_PASSWORD. It's worth noting that you can call these environment variables anything you like! I chose those names as they describe their contents well. It's also worth mentioning that it's important to use environment variables for secrets: passing them as command-line arguments causes them to be much more visible to other users of the system!
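As an aside, if you're running the bot with Mono on Linux, launching it might look something like this (the JID and password here are made up, and the binary name is just what this demo project produces):

XMPP_JID="bot@bobsrockets.com" XMPP_PASSWORD="sekret" mono XmppBotDemo.exe

Bear in mind that typing it like this leaves the password in your shell history - the systemd service post elsewhere on this page shows a tidier way of passing secrets in.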

Let's connect to the XMPP server with our newly read-in credentials:

// Create the client instance
client = new XmppClient(ourJid.Domain, ourJid.Node, password);

client.Error += errorHandler;
client.SubscriptionRequest += subscriptionRequestHandler;
client.Message += messageHandler;

client.Connect();

// Wait for a connection
while (!client.Connected)
    Thread.Sleep(100);

Console.WriteLine($"[Main] Connected as {ourJid}.");

// Wait forever.
Thread.Sleep(Timeout.Infinite);

// TODO: Automatically reconnect to the server when we get disconnected.

Cool! Here, we create a new instance of the XmppClient class, and attach 3 event handlers, which we'll look at later. We then connect to the server, wait until the connection completes, and then write a message to the console. It looks like S22.Xmpp spins up a new thread, so unfortunately we can't catch any errors it throws with a traditional try-catch statement. Instead, we'll have to be really careful to catch any exceptions we throw accidentally ourselves - otherwise we'll get disconnected!

It does appear that XmppClient catches some errors though, which trigger the Error event - so we should attach an event handler to that.

/// <summary>
/// Handles any errors thrown by the XMPP client engine.
/// </summary>
private static void errorHandler(object sender, ErrorEventArgs eventArgs) {
    Console.Error.WriteLine($"Error: {eventArgs.Reason}");
    Console.Error.WriteLine(eventArgs.Exception);
}

Before a remote contact is able to talk to our bot, they will send us a subscription request - which we'll need to either accept or reject. This is also done via an event handler. It's the SubscriptionRequest one this time:

/// <summary>
/// Handles requests to talk to us.
/// </summary>
/// <remarks>
/// Only allow people to talk to us if they are on the same domain we are.
/// You probably don't want this for production, but for developmental purposes
/// it offers some measure of protection.
/// </remarks>
/// <param name="from">The JID of the remote user who wants to talk to us.</param>
/// <returns>Whether we're going to allow the requester to talk to us or not.</returns>
public static bool subscriptionRequestHandler(Jid from) {
    Console.WriteLine($"[Handler/SubscriptionRequest] {from} is requesting access, I'm saying {(from.Domain == ourJid.Domain?"yes":"no")}");
    return from.Domain == ourJid.Domain;
}

This simply allows anyone on our own domain to talk to us. For development purposes this will offer us some measure of protection, but for production you should probably implement a whitelisting or logging system here.

The other interesting thing we can do here is send a user a chat message to either welcome them to the server, or explain why we rejected their request. To do this, we need to write a pair of utility methods, as sending chat messages with S22.Xmpp is somewhat over-complicated:

#region Message Senders

/// <summary>
/// Sends a chat message to the specified JID.
/// </summary>
/// <param name="to">The JID to send the message to.</param>
/// <param name="message">The messaage to send.</param>
private static void sendChatMessage(Jid to, string message)
{
    //Console.WriteLine($"[Bot/Send/Chat] Sending {message} -> {to}");
    client.SendMessage(
        to, message,
        null, null, MessageType.Chat
    );
}
/// <summary>
/// Sends a chat message in direct reply to a given incoming message.
/// </summary>
/// <param name="originalMessage">Original message.</param>
/// <param name="reply">Reply.</param>
private static void sendChatReply(Message originalMessage, string reply)
{
    //Console.WriteLine($"[Bot/Send/Reply] Sending {reply} -> {originalMessage.From}");
    client.SendMessage(
        originalMessage.From, reply,
        null, originalMessage.Thread, MessageType.Chat
    );
}

#endregion

The difference between these 2 methods is that one sends a reply directly to a message that we've received (like a threaded reply), and the other simply sends a message directly to another contact.

Now that we've got all of our ducks in a row, we can write the bot itself! This is done via the Message event handler. For this demo, we'll write a bot that echoes any messages sent to it in reverse:

/// <summary>
/// Handles incoming messages.
/// </summary>
private static void messageHandler(object sender, MessageEventArgs eventArgs) {
    Console.WriteLine($"[Bot/Handler/Message] {eventArgs.Message.Body.Length} chars from {eventArgs.Jid}");
    char[] messageCharArray = eventArgs.Message.Body.ToCharArray();
    Array.Reverse(messageCharArray);
    sendChatReply(
        eventArgs.Message,
        new string(messageCharArray)
    );
}

Excellent! That's our bot complete. The full program is at the bottom of this post.

Of course, this is a starting point - not an ending point! A number of issues with this demo stand out. There isn't a whitelist, and putting the whole program in a single file doesn't sound like a good idea. The XMPP logic should probably be refactored out into a separate file, in order to keep the input settings parsing separate from the bot itself.

Other issues that probably need addressing include better error handling and more - but fixing them all here would rather complicate the example.

Edit: The code is also available in a git repository if you'd like to clone it down and play around with it :-)

Found this interesting? Got a cool use for it? Still confused? Comment below!

Complete Program

using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using S22.Xmpp;
using S22.Xmpp.Client;
using S22.Xmpp.Im;

namespace XmppBotDemo
{
    public static class MainClass
    {
        private static XmppClient client;
        private static Jid ourJid = null;
        private static string password = null;

        public static int Main(string[] args)
        {
            // Read in the environment variables
            string rawJid = Environment.GetEnvironmentVariable("XMPP_JID");
            password = Environment.GetEnvironmentVariable("XMPP_PASSWORD");

            // Ensure they are present
            if (rawJid == null || password == null) {
                Console.Error.WriteLine("XMPP Bot Demo");
                Console.Error.WriteLine("=============");
                Console.Error.WriteLine("");
                Console.Error.WriteLine("Usage:");
                Console.Error.WriteLine("    ./XmppBotDemo.exe");
                Console.Error.WriteLine("");
                Console.Error.WriteLine("Environment Variables:");
                Console.Error.WriteLine("    XMPP_JID         Required. Specifies the JID to login with.");
                Console.Error.WriteLine("    XMPP_PASSWORD    Required. Specifies the password to login with.");
                return 1;
            }

            ourJid = new Jid(rawJid);

            // Create the client instance
            client = new XmppClient(ourJid.Domain, ourJid.Node, password);

            client.Error += errorHandler;
            client.SubscriptionRequest += subscriptionRequestHandler;
            client.Message += messageHandler;

            client.Connect();

            // Wait for a connection
            while (!client.Connected)
                Thread.Sleep(100);

            Console.WriteLine($"[Main] Connected as {ourJid}.");

            // Wait forever.
            Thread.Sleep(Timeout.Infinite);

            // TODO: Automatically reconnect to the server when we get disconnected.

            return 0;
        }

        #region Event Handlers

        /// <summary>
        /// Handles requests to talk to us.
        /// </summary>
        /// <remarks>
        /// Only allow people to talk to us if they are on the same domain we are.
        /// You probably don't want this for production, but for developmental purposes
        /// it offers some measure of protection.
        /// </remarks>
        /// <param name="from">The JID of the remote user who wants to talk to us.</param>
        /// <returns>Whether we're going to allow the requester to talk to us or not.</returns>
        public static bool subscriptionRequestHandler(Jid from) {
            Console.WriteLine($"[Handler/SubscriptionRequest] {from} is requesting access, I'm saying {(from.Domain == ourJid.Domain?"yes":"no")}");
            return from.Domain == ourJid.Domain;
        }

        /// <summary>
        /// Handles incoming messages.
        /// </summary>
        private static void messageHandler(object sender, MessageEventArgs eventArgs) {
            Console.WriteLine($"[Handler/Message] {eventArgs.Message.Body.Length} chars from {eventArgs.Jid}");
            char[] messageCharArray = eventArgs.Message.Body.ToCharArray();
            Array.Reverse(messageCharArray);
            sendChatReply(
                eventArgs.Message,
                new string(messageCharArray)
            );
        }

        /// <summary>
        /// Handles any errors thrown by the XMPP client engine.
        /// </summary>
        private static void errorHandler(object sender, ErrorEventArgs eventArgs) {
            Console.Error.WriteLine($"Error: {eventArgs.Reason}");
            Console.Error.WriteLine(eventArgs.Exception);
        }

        #endregion

        #region Message Senders

        /// <summary>
        /// Sends a chat message to the specified JID.
        /// </summary>
        /// <param name="to">The JID to send the message to.</param>
        /// <param name="message">The messaage to send.</param>
        private static void sendChatMessage(Jid to, string message)
        {
            //Console.WriteLine($"[Rhino/Send/Chat] Sending {message} -> {to}");
            client.SendMessage(
                to, message,
                null, null, MessageType.Chat
            );
        }
        /// <summary>
        /// Sends a chat message in direct reply to a given incoming message.
        /// </summary>
        /// <param name="originalMessage">Original message.</param>
        /// <param name="reply">Reply.</param>
        private static void sendChatReply(Message originalMessage, string reply)
        {
            //Console.WriteLine($"[Rhino/Send/Reply] Sending {reply} -> {originalMessage.From}");
            client.SendMessage(
                originalMessage.From, reply,
                null, originalMessage.Thread, MessageType.Chat
            );
        }

        #endregion
    }
}

Creating a system service with systemd

While I've got some grumblings with systemd over how it handles (or not) certain things, it's the most popular service manager on Linux systems today. By this, I mean it starts and stops the various services (like your SSH server, web server, cron) automatically, according to the rules laid out in special service files.

Since it's so popular and I keep having to write services (and look up how to do so every time), I thought I'd write a post here about it to save me the trouble :P

Bear in mind that systemd isn't the only service manager out there. Others include OpenRC, runit, upstart, and more! If you're using one of these (I'm looking to investigate using one of these on my next server rebuild), then this tutorial isn't for you. I will probably be releasing a tutorial down the road for OpenRC though, if I get around to having a server running an OS that uses it.

systemd stores its service files in /etc/systemd/system/, so to start we need to create a new file in there:

sudo sensible-editor /etc/systemd/system/service_name.service

With the new file open in your favourite editor, it's time to set out our service definition. This is done with an ini-like syntax. Here's an example:

[Unit]
Description=Gitea
After=syslog.target rsyslog.service network.target

[Service]
Type=simple
User=git
Group=git
WorkingDirectory=/srv/git/gitea
ExecStart=/srv/git/gitea/gitea web
Restart=always
Environment=USER=git HOME=/srv/git

[Install]
WantedBy=multi-user.target

The above is a simple service file for Gitea, which is the engine behind my personal git server. Let's go through each section one by one.

Firstly, the [Unit] section defines the metadata about the service. It's fairly self-explanatory actually - we set the description of the service here, and also the other services (space-separated) that we want our service to be started after with the After property.

Next comes the [Service] section. This section specifies how the service should be started. We tell it that it's a simple service (in other words, it doesn't do anything fancy - other types are available, but we won't use them here), the user and group it should run under, the working directory of the process, and the command (and its arguments) to execute in order to start the process.

In addition, we also tell it to restart the service if it crashes, and set a few environment variables to refine the way Gitea behaves. Very cool!

The final section, [Install], simply specifies the systemd-equivalent of which run-level this service should start on. It's very interesting from a how-does-my-system-work perspective - I recommend reading this Stack Exchange answer and this article for more information - it's a topic for another post here on this blog :-)

To start this new service, do the following:

sudo systemctl daemon-reload
sudo systemctl enable service_name.service
sudo systemctl start service_name.service

This starts our new service and configures it to automatically start when the system first boots.
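To check how it's getting on afterwards, systemctl and journalctl can help - something along these lines, substituting your own service name of course:

sudo systemctl status service_name.service
sudo journalctl -u service_name.service --since today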

With that taken care of, we've now got the basics down of our very own service file! We can take this further though. What if there's a secret key that we need to pass to a service on startup in an environment variable, but we don't want to specify it in the service file directly, because the service file is world-readable?

The answer here is a clever bit of shell scripting. Consider the following service file:

[Unit]
Description=Awesome XMPP Bot
After=network.target prosody.service

[Service]
Type=simple
User=bot
WorkingDirectory=/srv/bot
ExecStart=/srv/bot/start_service.sh
Restart=on-failure
# Other Restart options: always, on-abort, etc.

[Install]
WantedBy=multi-user.target

In this case, we've defined a service file for an XMPP bot (public server directory). In order for it to connect to an XMPP server, it needs a JID (a username - formatted like an email address) and password. However, we don't want to specify these directly in the service file because they are secret!

Instead, we've specified that it should start a shell script that's located at /srv/bot/start_service.sh instead of the bot itself. Here's the contents of that shell script:

#!/usr/bin/env bash

source .xmpp_credentials

export XMPP_JID;
export XMPP_PASSWORD;

exec /usr/bin/mono Bot.exe

This simple shell script loads the contents of the file .xmpp_credentials, specifies that the XMPP_JID and XMPP_PASSWORD environment variables should be passed on to any further (sub)processes, and asks for the current process to be terminated and replaced with an instance of Mono executing our bot's code that's stored in Bot.exe (this way we don't have an extra bash process sitting around doing nothing, since its job is done as soon as we start the bot itself).

In this way, we can store our precious private details in a file that we can lock down so that only the bot's user account can read it. Here's what that .xmpp_credentials file might look like:

#!/usr/bin/env bash
XMPP_JID="bot@bobsrockets.com";
XMPP_PASSWORD="sekret";

....and if I run ls -l .xmpp_credentials, I might see something like this:

-r-x------ 1 bot bot 104 Nov 10 21:27 .xmpp_credentials

Here the file permissions allow only the bot user to read and execute the file, but not modify it (sudo chown bot:bot .xmpp_credentials and sudo chmod 0500 .xmpp_credentials set these permissions for the curious).

This completes the tutorial on setting up services with systemd. We've seen how to create service files and make them start on boot (much easier than alternatives like running a command manually or using screen!). We've also learnt a simple way to hide credentials (though more advanced alternatives do exist).

Found this useful? Found a better way to do it? Comment below!


How to set up a shared PDF printer on your local network

I've recently ended up setting up a PDF printer on my local network in an effort to transfer some pictures out of a ridiculous i-device (I tell you, Apple's iOS is the worst for being a walled garden). Since the process for doing so wasn't entirely obvious, I'm documenting it in this blog post to remind myself for later. If you find it useful, please let me know in the comments below!

Firstly, you'll need a machine running Linux. Any distribution will do, but I'll be using an apt-based distribution, so you may need to alter some of the commands here to suit your system.

Next, we need to install the cups-pdf printer driver (CUPS stands for the Common Unix Printing System). It comes with a lot of junk if you're not careful, so here I use --no-install-recommends to avoid installing any unnecessary packages.

sudo apt install printer-driver-cups-pdf --no-install-recommends

If you've got a firewall running (which you really should - see this post of mine for more information on that), then you'll need to open port 631 for TCP traffic to allow people to print. If you're using ufw, then this should do the trick:

sudo ufw allow cups

If not, then you may need to specify the port number explicitly:

sudo ufw allow 631/tcp

With the printer installed, we next need to open it to the world. Before that though, we should make some changes to the configuration file, which is located at /etc/cups-pdf.conf. Firstly, I wanted to put the resulting PDFs into my file server's shared folder. This is achieved by editing the Out and AnonDirName settings. They should already be present in the configuration file - it's just a matter of changing their values:

Out         /absolute/path/to/output/dir
AnonDirName /absolute/path/to/output/dir

I also wanted to customise the user account and permissions that it saves the pdfs with. I did this through the AnonUser and AnonUMask settings - which should also be present by default:

AnonUser    username
AnonUMask   0007

The umask is basically an inverted permission octal. I found a good calculator online to do it for me :P (Don't forget the preceding 0 - it's important!)
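In case you'd rather not reach for a calculator: the resulting permissions are the default mode with the umask's bits knocked out. Assuming the usual default file mode of 0666, a umask of 0007 works out roughly like this:

0666 (default file mode) & ~0007 (umask) = 0660  ->  rw-rw----  (owner and group can read/write; everyone else gets nothing)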

Finally, I experienced an issue whereby cups kept overwriting the same file again and again because the iPad wasn't smart enough to send the photos to print with their actual filenames - instead opting to send them all as Photo.pdf. Thankfully though, cups-pdf has the Label option (also specified by default) that ensures that output filenames don't clash. Setting it to 1 instead of 0 solved the problem for me:

Label       1

Note that some of these properties may be prefixed with a hash (#). You'll need to remove this in order for it to take effect.

With the new PDF printer configured, it's time to open it up to our local network. Here's how to do that:

sudo cupsctl --share-printers
sudo lpadmin -p pdf -o printer-is-shared=true
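If you want to check the share from the server's end, lpstat can tell you whether the queue is up (the queue name pdf below matches the lpadmin command above - adjust if yours differs). Other machines on the network should then be able to add it as a network printer, usually at a URI like the one in the comment:

lpstat -p pdf
# Clients can normally reach a shared CUPS queue at:
# ipp://your-server-hostname:631/printers/pdf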

Note that if you want to open it up to more than your local subnet you'll need to do some additional configuration - such as configuring authentication, for instance. Such things are beyond the scope of this blog post, but if there's the demand (comment below!) I can certainly investigate writing something up.

Found this useful? Got a better / different solution? Comment below!

Acorn Validator

Edit: Corrected title and a bunch of grammatical mistakes. I typed this out on my phone with Monospace - and it seems that my keyboard (and Phone-Laptop Bluetooth connection!) leave something to be desired.....

Over the last week, I've been hard at work on an entry for #LOWREZJAM. While it's not finished yet (submission is on Thursday), I've found some time to write up a quick blog post about a particular aspect of it. Of course, I'll be blogging about the rest of it later once it's finished :D

The history of my entry is somewhat.... complicated. Originally, I started work on it a few months back as an independent project, but due to time constraints and other issues I was unable to get very far with it. Once I discovered #LOWREZJAM, I made the decision to throw away the code I had written and start again from (almost) scratch.

It is for this reason that I have a JavaScript validator script lying around. I discovered when writing it originally that my editor, Atom, didn't have syntax validation support for JavaScript. While there are extensions that do the job, it looked like a complicated business setting one up for just syntax checking (I don't want your code style guideline suggestions! I have my own style!). To this end, I wrote myself a quick bash script to automatically check the syntax of all my JavaScript files that I can then include as a build step - just before webpack.

Over the time I've been working on my #LOWREZJAM entry here, I've been tweaking and improving it - and thought I'd share it here. In the future, I'm considering turning it into a linting provider for the Atom editor I mentioned above (it depends on how complicated that is, and how good the documentation is to help me understand the process).

The script (which can be found in full at the bottom of this post), has 2 modes of operation. In the first mode, it acts as a co-ordinator process that starts all the sub-processes that validate the javascript. In the second mode, it validates a single file - outputting the result to the standard output and also logging any errors in a special place that the co-ordinator process can find them later.

It decides which mode to operate in based on whether it receives an argument telling it which file to validate:

if [ "$1" != "" ]; then
    validate_file "$1" "$2";
    exit $?;
fi

If it detects an argument, then it calls the validate_file function and exits with the returned exit code.

If not, then the script continues into co-ordinator mode. In this mode it chains a bunch of commands together like lego bricks to start subprocesses that validate all of the JavaScript files it can find as fast as possible - acorn, the validator itself, can only check one file at a time, it would appear. It does this like so:

find . -not -path "./node_modules/" -not -path "./dist/" | grep -iP '\.mjs$' | xargs -n1 -I{} -P32 $0 "{}" "${counter_dirname}";

This looks complicated, but it can be broken down into smaller, easy-to-understand chunks. explainshell.com is rather good at demonstrating this. Particularly of note here is the $0. This variable holds the path to the currently executing script - allowing the co-ordinator to call itself in validator mode.

The validator function itself is also quite simple. In short, it runs the validator, storing the result in a variable. It then also saves the exit code for later analysis. Once done, it outputs to the standard output, and then also outputs the validator's output - but only if there was an error, to keep things neat and tidy. Finally, if there was an error, it outputs a file to a temporary directory (whose name is determined by the co-ordinator and passed to sub-processes via the 2nd argument) with a name of its PID (the content doesn't matter - I just output 1, but anything will do). This allows the co-ordinator to count the number of errors that the subprocesses encounter, without having to deal with complicated locks arising from updating the value stored in a single file. Here's that in bash:

counter_dirname=$2;

# ....... 

# Use /dev/shm here since apparently while is in a subshell, so it can't modify variables in the main program O.o
    if ! [ "${validate_exit_code}" -eq 0 ]; then
        echo 1 >"${counter_dirname}/$$";
    fi

Once all the subprocesses have finished up, the co-ordinator counts up all the errors and outputs the total at the bottom:

error_count=$(ls ${counter_dirname} | wc -l);

echo 
echo Errors: $error_count
echo 

Finally, the co-ordinator cleans up after the subprocesses, and exits with the appropriate error code. This last bit is important for automation, as a non-zero exit code tells the parent process that it failed. My build script (which uses my lantern build engine, which deserves a post of its own) picks up on this and halts the build if any errors were found.

rm -rf "${counter_dirname}";

if [ ${error_count} -ne 0 ]; then
    exit 1;
fi

exit 0;

That's about all there is to it! The complete code can be found at the bottom of this post. To use it, you'll need to run npm install acorn in the directory that you save it to.

I've done my best to optimize it - it can process a dozen or so files in ~1 second - but I think I can do much better if I rewrite it in Node.JS - as I can eliminate the subprocesses by calling the acorn API directly (it's a Node.JS library), rather than spawning many subprocesses via the CLI.

Found this useful? Got a better solution? Comment below!

#!/usr/bin/env sh

validate_file() {
    filename=$1;
    counter_dirname=$2;

    validate_result=$(node_modules/.bin/acorn --silent --allow-hash-bang --ecma9 --module $filename 2>&1);
    validate_exit_code=$?;
    validate_output=$([ ${validate_exit_code} -eq 0 ] && echo ok || echo ${validate_result});
    echo "${filename}: ${validate_output}";
    # Use /dev/shm here since apparently while is in a subshell, so it can't modify variables in the main program O.o
    if ! [ "${validate_exit_code}" -eq 0 ]; then
        echo 1 >"${counter_dirname}/$$";
    fi
}

if [ "$1" != "" ]; then
    validate_file "$1" "$2";
    exit $?;
fi

counter_dirname=$(mktemp -d -p /dev/shm/ -t acorn-validator.XXXXXXXXX.tmp);
# Parallelisation trick from https://stackoverflow.com/a/33058618/1460422
# Updated to use xargs
find . -not -path "./node_modules/" -not -path "./dist/" | grep -iP '\.mjs$' | xargs -n1 -I{} -P32 $0 "{}" "${counter_dirname}";

error_count=$(ls ${counter_dirname} | wc -l);

echo 
echo Errors: $error_count
echo 

rm -rf "${counter_dirname}";

if [ ${error_count} -ne 0 ]; then
    exit 1;
fi

exit 0;

Job Scheduling on Linux

Scheduling jobs to happen at a later time on a Linux based machine can be somewhat confusing. Confused by 5 4 8-10/4 6/4 *? Baffled by 5 */4 * * *? All will be revealed!

cron

Scheduling jobs on a Linux machine can be done in several ways. Let's start with cron - the primary program that orchestrates the whole proceeding. Its name comes from the Greek word Chronos, which means time. By filling in a crontab (read cron-table), you can tell it what to do when. It's essentially a time-table of jobs you'd like it to run.

Your Linux machine should come with cron installed already. You can check if cron is installed and running by entering this command into your terminal:

if [[ "$(pgrep -c cron)" -gt 0 ]]; then echo "Cron is installed :D"; else echo "Cron is not installed :-("; fi

If it isn't installed or running, then you'll have to investigate why this isn't the case. The most common reason is that it isn't installed. It's normally in the official repositories for most distributions - on a Debian-based system sudo apt install cron should suffice. Arch-based users may need to check to make sure that the system service is enabled and do so manually - see below for an example.
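For example, on an Arch-based system the cron implementation is usually the cronie package, and you might need to enable the service yourself with something like this (check your distribution's documentation to be sure):

sudo pacman -S cronie
sudo systemctl enable --now cronie.service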

With cron set up and ready to go, we can start adding jobs to it. This is done by way of a crontab, as explained above. Each user has their own crontab such that they can each configure their own individual set of jobs. To edit it, type this:

crontab -e

This will open your favourite editor with your crontab ready for editing (if you'd like to change your editor, do sudo update-alternatives --config editor or change the EDITOR environment variable). You should see a bunch of lines like this:

# Edit this file to introduce tasks to be run by cron.
# 
# Each task to run has to be defined through a single line
# indicating with different fields when the task will be run
# and what command to run for the task
# 
# To define the time you can provide concrete values for
# minute (m), hour (h), day of month (dom), month (mon),
# and day of week (dow) or use '*' in these fields (for 'any').
# 
# Notice that tasks will be started based on the cron's system
# daemon's notion of time and timezones.
# 
# Output of the crontab jobs (including errors) is sent through
# email to the user the crontab file belongs to (unless redirected).
# 
# For example, you can run a backup of all your user accounts
# at 5 a.m every week with:
# 0 5 * * 1 tar -zcf /var/backups/home.tgz /home/
# 
# For more information see the manual pages of crontab(5) and cron(8)
# 
# m h  dom mon dow   command

I'd advise you keep this for future reference - just in case you find yourself in a pinch later - so scroll down to the bottom and start adding your jobs there.

Let's look at the syntax for telling cron about a job next. This is best done by example:

0 1 * * 7   cd /root && /root/run-backup

This job, as you might have guessed, runs a custom backup script. It's one I wrote myself, but that's a story for another time (comment below if you'd like me to post about that). What we're interested in is the bit at the beginning: 0 1 * * 7. Scheduling a cron job is done by specifying 5 space-separated values. In the case of the above, the job will run at 1am every Sunday morning. The order is as follows:

  • Minute
  • Hour
  • Day of the Month
  • Month
  • Day of the week

For each of these values, a number of different specifiers can be used. For example, specifying an asterisk (*) will cause the job to run at every interval of that column - e.g. every minute or every hour. If you want to run something on every minute of the day (such as a logging or monitoring script), use * * * * *. Be aware of the system resources you can use up by doing that though!

Specifying a number will restrict it to a specific point in the interval. For example, 10 * * * * will run the job at 10 minutes past every hour, and 22 3 * * * will run a job at 03:22 in the morning every day (I find such times great for maintenance jobs).

Sometimes, every hour or every minute is too often. Cron can handle this too! For example 3 */2 * * * will run a job at 3 minutes past every second hour. You can alter this at your leisure: The value after the forward slash (/) decides the interval (i.e. */3 would be every third, */15 would be every 15th, etc.).

The last column, the day of the week, is an alternative to the day of the month column. It lets you specify, as you may assume, the day of the week a job should run on. This can be specified in 2 ways: with the numbers 0-6, or with 3-letter short codes such as MON or SAT. For example, 6 20 * * WED runs at 6 minutes past 8 in the evening on Wednesday, and 0 */4 * * 0 runs every 4th hour on a Sunday.

The combinations are endless! Since it can be a bit confusing combining all the options to get what you want, crontab.guru is great for piecing cron-job specifications together. It describes your cron-job spec in plain English for you as you type!

(Above: crontab.guru displaying a random cronjob spec)

What if I turn my computer off?

Ok, so cron is all very well, but what if you turn your machine off? Well, if cron isn't running at the time a job should be run, then it won't get executed. For those of us who don't leave their laptops on all the time, all is not lost! It's time to introduce the second piece of software at our disposal.

Enter stage left: anacron. Built to be a complement to cron, anacron sets up 3 folders:

  • /etc/cron.daily
  • /etc/cron.weekly
  • /etc/cron.monthly

Any executable scripts in these folders will be run at daily, weekly, and monthly intervals respectively by anacron, and it respects the hash-bang (that #! line at the beginning of the script) too!
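For example, dropping something like the following into /etc/cron.daily (and making it executable with chmod +x) would give you a daily version of the backup job from the example crontab comment above - the filename and paths are just illustrative:

#!/usr/bin/env bash
# /etc/cron.daily/home-backup - run once a day by anacron
tar -zcf /var/backups/home.tgz /home/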

Most server systems do not come with anacron pre-installed, though it should be present in your distribution's official repositories. Once you've installed it, edit root's crontab (with sudo crontab -e if you can't remember how) and add a job that executes anacron every hour like so:

# Run anacron every hour
5 * * * *   /usr/sbin/anacron

This is important, as anacron does not in itself run all the time like cron does (a program that runs in the background all the time like that is called a daemon in the Linux world) - it needs a helping hand to get it to run.

If you've got more specific requirements, then anacron also has its own configuration file you can edit. It's found at /etc/anacrontab, and has a different syntax. In the anacron table, jobs follow the following pattern:

  • period - The interval, in days, that the job should run
  • delay - The offset, in minutes, that the job should run at
  • job identifier - A textual identifier (without spaces, of course) that identifies the job
  • command - The command that should be executed

You'll notice that there are 3 jobs specified already - one for each of the 3 folders mentioned above. You can specify your own jobs too. Here's an example:

# Do the weekly backup
7   20  run-backup  cd /root/data-shape-backup && ./do-backup;

The above job runs every 7 days, with an offset of 20 minutes. Note that I've included a comment (the line starting with a hash #) to remind myself as to what the job does - I'd recommend you always include such a comment for your own reference - whether you're using cron, anacron, or otherwise.

I'd also recommend that you test your anacron configuration file after editing it to ensure it's valid. This is done like so:

anacron -T

I'm not an administrator, can I still use this?

Sure you can! If you've got anacron installed (you could even compile it from source locally if you haven't) and want to specify some jobs for your local account, then that's easily done too. Just create an anacrontab file anywhere you please, and then in your regular crontab (crontab -e), tell anacron where you put it like this:

# Run anacron every hour
5 * * * *   /usr/sbin/anacron -t "path/to/anacrontab"

What about one-off jobs?

Good point. cron and anacron are great for repeating jobs, but what if you want to set up a one-off job to auto-disable your firewall before enabling it just in case you accidentally lock yourself out? Thankfully, there's even an answer for this use-case too: atd.

atd is similar to cron in that it runs a daemon in the background, but instead of executing jobs specified in a crontab, you tell it when you want it to execute a series of commands, and then enter the commands themselves. For example:

$ at now + 10 minutes
warning: commands will be executed using /bin/sh
at> echo -e "Testing"  
at> uptime
at> <EOT>
job 4 at Thu Jul 12 14:36:00 2018

In the above, I tell it to run the job 10 minutes from now, and enter a pair of commands. To end the command list, I hit CTRL + D on an empty line. The output of the job will be emailed to me automatically if I've got that set up (cron and anacron also do this).

Specifying a time can be somewhat fiddly, but it's also quite flexible:

  • at tomorrow
  • at now + 5 hours
  • at 16:06
  • at next month
  • at 2018 09 25

....and so on. Listing the current scheduled jobs is also just as easy:

atq

This will output a list of scheduled jobs that haven't been run yet. You can't see any jobs that aren't created by you unless you're root (use sudo), though. You can use the job ids listed here to cancel a job too:

# Remove job id 4:
atrm 4

Conclusion

That just about concludes this whirlwind tour of job scheduling on Linux systems. We've looked at how to schedule jobs with cron, and how to ensure our jobs get run - even when the target machine isn't turned on all the time with anacron. We've also looked at one-time jobs with atd, and how to manage the job queue.

As usual, this is a starting point - not an ending point! Job scheduling is just the beginning. From here, you can look at setting up automated backups. You could investigate setting up an email server, and how that integrates with cron. You can utilise cron to perform maintenance for your next great web (or other!) application. The possibilities are endless!

Found this useful? Still confused? Comment below!

Read / Write Disk Performance Testing in Bash

Recently I needed to quickly (and non-destructively) test the read / write performance of a flash drive of mine. Naturally, I turned my attention to my terminal. This post is me documenting what I did so that I can remember for next time :P

Firstly, to test the speed of a disk, we need some data to test with. Since lots of small files will inevitably cause slowdowns due to the overhead of writing each file's metadata and inode information, it makes the most sense to use one gigantic file rather than tons of small ones. Here's what I did to generate a 1 Gigabyte file filled with zeroes:

dd if=/dev/zero of=/tmp/testfile.bin bs=1M count=1024

Cool. Next, we need to copy it to the target disk and measure the time it took. Then, since we know the size of the file (1073741824 bytes, to be exact), we can calculate the speed at which the copy took place. Here's my first attempt:

time dd if=/tmp/testfile.bin >testfile.bin

If you run this, you might find that it doesn't take very long at all, and you get a speed of something like ~250MiB / sec! While impressive, I seriously doubt that my flash drive has that kind of speed behind it. Typically, flash memory takes longer to write to than to read from - and I'm pretty sure my drive can't read that fast either. So what's going on?

Well, it turns out that Linux is caching the disk write operations in a buffer, and then doing them in the background for us. Whilst fine for ordinary operation, this doesn't give us an accurate representation of how fast it's actually writing to the disk. Thankfully, there's something we can do about this: Use the sync command. sync will flush all cached write operations to disk for us, giving us the actual time it took to write the 1 GiB file to disk. Here's the altered command:

sync;
time sh -c 'dd if=/tmp/testfile.bin >testfile.bin; sync'

Very cool! Now, we can just take the time it took and do some simple maths to calculate the write speed of our disk. What about the read speed though? Well, to test that, we'll first need to clear out the page cache - another one of Linux's (many) caches that holds portions of files that have recently been accessed for faster retrieval - because as before, we're not interested in the speed of the cache! Here's how to do that:

echo 1 | sudo tee /proc/sys/vm/drop_caches

With the correct cache cleared, we can test the read speed accurately. Here's how I did it:

time dd if=testfile.bin of=/dev/null

Fairly simple, right? At a later date I might figure out a way of automating this, but for the occasional use now and again this works just fine :)
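If you fancy automating it, here's a rough sketch of what such a script might look like - it just strings together the commands from above, so treat it as a starting point rather than a finished tool (the paths are examples, and it cleans up both test files at the end):

#!/usr/bin/env bash
# Rough disk speed test. Usage: ./disktest.sh /path/to/mounted/target/directory
target="$1/testfile.bin";

# Generate 1GiB of test data
dd if=/dev/zero of=/tmp/testfile.bin bs=1M count=1024;

# Write test: copy to the target disk, syncing so we measure the disk and not the cache
sync;
echo "--- Write test ---";
time sh -c "dd if=/tmp/testfile.bin of=$target; sync";

# Read test: drop the page cache first, then read the file back
echo 1 | sudo tee /proc/sys/vm/drop_caches >/dev/null;
echo "--- Read test ---";
time dd if="$target" of=/dev/null;

# Clean up
rm "$target" /tmp/testfile.bin;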

Found this useful? Got a better way of doing it? Want to say hi? Post in the comments below!

Securing a Linux Server Part 2: SSH

Wow, it's been a while since I posted something in this series! Last time, I took a look at the Uncomplicated Firewall, and how you can use it to control the traffic coming in (and going out) of your server. This time, I'm going to take a look at steps you can take to secure another vitally important part of most servers: SSH. Used by servers and their administrators across the world to talk to one another, if someone manages to get in who isn't supposed to, they could do all kinds of damage!

The first, and easiest, thing we can do to improve security is to prevent the root user logging in. If you haven't done so already, you should create a new user on your server, set a good password, and give it superuser privileges. Login with the new user account, and then edit /etc/ssh/sshd_config, finding the line that says something like

PermitRootLogin yes

....and change it to

PermitRootLogin no

Once done, restart the ssh server. Your config might be slightly different (e.g. it might be PermitRootLogin without-password) - but the principle is the same. This adds an extra barrier to getting into your server, as now attackers must not only guess your password, but your username as well (some won't even bother, and will keep trying to login to the root account :P).
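On most modern distributions that use systemd, restarting it looks something like this (the service may be called sshd rather than ssh depending on your distribution):

sudo systemctl restart ssh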

Next, we can move SSH to a non-standard port. Some might argue that this isn't a good security measure to take and that it doesn't actually make your server more secure, but I find that it's still worth doing for 2 reasons: defence in depth, and preventing excessive CPU load from all the dumb bots that try to get in on the default port. With that, let's make another modification to /etc/ssh/sshd_config. Make sure you test at every step you take, as if you lock yourself out, you'll have a hard time getting back in again....

Port 22

Change 22 in the above to any other number between 1 and 65535. Next, make sure you've allowed the new port through your firewall! If you're using ufw, my previous post (link above) gives a helpful guide on how to do this. Once done, restart your SSH server again - and try logging in before you close your current session. That way if you make a mistake, you can fix it through your existing session.
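As a concrete example of the firewall step: if you picked port 2222 (just an example - choose your own!), the ufw rule would look like this:

sudo ufw allow 2222/tcp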

Once you're confident that you've got it right, you can close port 22 on your firewall.

So we've created a new user account with a secure password (tip: use a password manager if you have trouble remembering it :-)), disabled root login, and moved the ssh port to another port number that's out of the way. Is there anything else we can do? Turns out there is.

Passwords are not the only way we can authenticate against an SSH server. Public-private keypairs can be used too - and are much more secure (and convenient) than passwords if used correctly. You can generate your own public-private keypair like so:

ssh-keygen -t ed25519

It will ask you a few questions, such as a password to encrypt the private key on disk, and where to save it. Once done, we need to tell ssh to use the new public-private keypair. This is fairly easy to do, actually (though it took me a while to figure out how!). Simply edit ~/.ssh/config (or create it if it doesn't exist), and create (or edit) an entry for your ssh server, making it look something like this:

Host bobsrockets.com
    Port            {port_number}
    IdentityFile    {path/to/private/keyfile}

It's the IdentityFile line that's important. The port line simply makes it such that you can type ssh bobsrockets.com (or whatever your server is called) and it will figure out the port number for you.
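One thing to double-check before going any further: the public half of your keypair needs to end up in ~/.ssh/authorized_keys on the server, otherwise key-based login won't work at all. The easiest way to get it there is ssh-copy-id - adjust the username, hostname, and key path to match your setup:

ssh-copy-id -i ~/.ssh/id_ed25519.pub user@bobsrockets.com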

With a public-private keypair now in use, there's just one step left: disable password-based logins. I'd recommend trialling it for a while to make sure you haven't messed anything up - because once you disable it, if you lose your private key, you won't be getting back in again any time soon!

Again, open /etc/ssh/sshd_config for editing. Find the line that starts with PasswordAuthentication, and comment it out with a hash symbol (#), if it isn't already. Directly below that line, add PasswordAuthentication no.

Once done, restart ssh for a final time, and check it works. If it does, congratulations! You've successfully secured your SSH server (to the best of my knowledge, of course). Got a tip I haven't covered here? Found a mistake? Let me know in a comment below!

Rendering LaTeX documents to PDF: Attempt #2

It was all going rather well, actually - until I discovered that pandoc doesn't support regular bibliographies / references. Upon discovering this, I ended up with a bit of a problem. Thankfully, the answer lay in pdflatex - but getting to the point where I could use it without having it crash on me (which, by the way, it can't even do properly - it gives an exit code of 0 when crashing! O.o) was not a trivial journey.

This blog post is a follow up to my first post on rendering LaTeX documents with pandoc, and is my attempt to document what I did to get it to work. To start with, I installed texlive properly. Here's how to do that on apt-based systems:

sudo apt install texlive-latex-extra --no-install-recommends

The --no-install-recommends flag is useful here to avoid ~450MiB of useless documentation (in PDF form, apparently) being dumped to your hard drive. I've also got an arch-based system (it's actually Artix Linux, which I've blogged about) which I've done this on, so here's the install command for those kinds of systems:

sudo pacman -S texlive-latexextra

Once that's installed, we can use it to render our LaTeX document to PDF. Upon discussing my issues with my Lecturer at University, I discovered that you actually have to run 3 commands in succession in order to render a single PDF. Here they are:

bibtex filename
pdflatex --output-directory=. filename.tex
pdflatex --output-directory=. filename.tex

The first one compiles the bibliography using BiBTeX. If it isn't installed already, you might need to search your distribution's repositories and install it. Next, we run the LaTeX file through pdflatex from TeXLive not once but twice - as it apparently needs to resolve the references on the first pass (why it can't do them all in one pass I have no idea :P).

It's also worth noting that the bibtex command doesn't like you to append the filename extension - it does it automatically, apparently.

That's about everything I've got on the process so far. If you've got anything else to add, please let me know in the comments below (I'm rather new to this whole LaTeX thing....)


Rendering LaTeX documents to PDF on Linux (and maybe Windows too)

I'm starting to write another report for University, and unlike other reports, this one apparently has to be a rather specific format. To that end, I've got two choices, apparently: Use the provided Word / LibreOffice template, or use a LaTeX template instead. After the trouble and frustration I had with LibreOffice for my previous report, I've naturally decided that using the LaTeX template might be a good idea.

After downloading it, I ended up doing some research and troubleshooting to get it to render properly to a PDF. Now that I've figured it out, I thought I'd share it here for anyone else who ends up experiencing difficulties or is unsure on how it's done.

The way I'm going to be using it is with a tool called Pandoc. First, install it like so:

sudo apt install pandoc texlive-fonts-recommended

Adjust as necessary for your distribution - Windows users will need to read the download instructions. The texlive-fonts-recommended package is ~66MiB(!), but it contains a bunch of fonts that are needed when you're rendering LaTeX documents, apparently.

With the dependencies installed, here's the command to convert a LaTeX document to a PDF:

pandoc -s input.tex -o output.pdf

Replace input.tex with the path to your input file, and output.pdf with the desired path to the output file. I haven't figured out how to set the font to sans-serif yet, but I'll probably make another post about it when I do.

Found this helpful? Still having issues? Let me know below! I don't have analytics on here, so that's the only way I'll know if anyone reads this :-)

