
Debug your systemd services with journalctl

Some nice bulbs and bits of wood.

The chances are that if you're using linux, you will probably have run into systemd. If you find yourself in the situation where you've got a systemd service that keeps dying and you don't know why (I've been there several times!), and there's nothing helpful in /var/log, before you give up, you might want to give journalctl a try. It's systemd's way of capturing the output of a service and storing it in its own logging system, the journal.

When I first found out about it, I read that apparently journalctl -xe servicename would show me the logs for any given service. It turned out that it wasn't the case (it just threw a nasty error), so I went trawling through the man pages and found the correct command-line switch. If you've got a service called rocketbooster.service, and you want to see if systemd has any logs stored for it, then you can execute this command:

journalctl --unit rocketbooster.service

...or for short

journalctl -u rocketbooster.service

It should open the logs (if there are any) in less - with the oldest logs at the top, so you might need to scroll all the way down to the bottom to see anything that's relevant to your problem (shift + G will take you to the bottom of the file).
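
If you'd rather jump straight to the end of the log or watch new entries appear live, journalctl has switches for that too - something like this, using the same imaginary service as above:

# Jump straight to the last 50 lines
journalctl -u rocketbooster.service -n 50
# Follow the log live, a bit like tail -f
journalctl -u rocketbooster.service --follow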

I've found that systemd has a habit of rotating the logs too - and journalctl doesn't appear to know how to access the rotated logs, so it's best if you use this command as soon as possible after a failure (suggestions on how to access these rotated logs are welcome! Post down in the comments :D).

I thought I'd document it here in case it was useful to anyone - and so I don't forget myself! :P

Access your home linux box from anywhere with SSH tunnels

An abstract tunnel that doesn't hold much relevance to the blog post :P

(Header by GDJ from openclipart.org. Source page)

....and other things! Recently, I bought a Raspberry Pi 3. Now that the rest of the components have arrived, I've got a rather nice little home server that's got a 1 terabyte WD PiDrive attached to it to provide lots of lovely shared storage, which is rather nice.

However, within a few weeks I was faced with a problem. How do I access my new box to configure it from my internship when I'm on lunch? Faced with such a challenge, I did what anyone would, and took to the internet to find a solution.

It didn't take long. A while ago I heard about these things called 'SSH tunnels', which, while not designed for a high throughput, are more than adequate for a low-intensity SSH connection that runs a few kilobytes a second in either direction. After reading this excellent answer by erik on the Unix & Linux StackExchange, I had an understanding of how SSH tunnels work, and was ready to put together a solution. You should go and read that answer if you'd like to understand SSH tunnels too - it explains it much better than I ever could :P

With that knowledge in hand, I went about planning the SSH tunnel. I already have a server with a public IP address (it's hosting this website!), so I needed a reverse tunnel to allow me to access a port local to my linux box at home (called elessar - a virtual cookie for anyone who gets the reference!) from starbeamrainbowlabs.com.

Important! Ask yourself whether it's moral and ethical to set up an ssh tunnel before you think about following along with this article! If you find yourself behind a firewall or something similar, then the chances are that it's there for a good reason - and you might get into trouble if you try and circumvent it. I won't be held responsible for any loss or damages of any description caused by the reading of this post.

First job: create a limited account on starbeamrainbowlabs.com for elessar to SSH into. That's easy:

sudo useradd --system ssh-tunnel

Then, with a few quick lines in /etc/ssh/sshd_config:

Match User ssh-tunnel
    ForceCommand echo 'This account can only be used for ssh tunnelling.'

....we can prevent the ssh-tunnel user from being abused to gain shell access to the server (let me know if there are any further measures I can put in place here).

Now that I had a user account to ssh in as, I could set up a public / private keypair to authenticate with starbeamrainbowlabs.com, and cook up an SSH command for elessar that would set up the appropriate tunnel. After fiddling around a bit, I came up with this that did the job:

ssh -TN -R30582:localhost:5724 ssh-tunnel@starbeamrainbowlabs.com

Very cool. So with that command executing on elessar, I could ssh into elessar from starbeamrainbowlabs.com! In short, it sets up a tunnel that will make port 30582 on starbeamrainbowlabs.com tunnel through to port 5724 on elessar - the port on elessar that has SSH running on it, without allocating a pseudo-tty to save resources. explainshell.com can, well, explain it in more detail if you're interested.
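
To actually make use of it from the other end, I can SSH into starbeamrainbowlabs.com as normal and then hop through the forwarded port - something along these lines (substitute your own username on the home box, of course):

# On starbeamrainbowlabs.com, once the tunnel is up:
ssh -p 30582 username@localhost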

Having an SSH command that would set up the tunnel is nice, but it's not very useful, since I have to execute it first before I can actually SSH into elessar from afar.

The solution was actually a little bit complicated. First, I wrote a simple systemd service file (systemd is what I have installed, since it's vanilla raspbian - this should be easily adaptable to other systems and setups) to start the SSH tunnel automagically on boot:

[Unit]
Description=SSH tunnel from starbeamrainbowlabs.com to local ssh server.

[Service]
Type=simple
ExecStart=/usr/bin/ssh -TN -R30582:localhost:5724 ssh-tunnel@starbeamrainbowlabs.com

[Install]
WantedBy=network-online.target

I quickly realised that there were a few flaws with this approach. Firstly, it tried to start the SSH connection before my router had connected to the internet, since my router starts faster than the box that initialises the fibre connection to my ISP. Secondly, it fails to retry when the connection dies.

The first problem can be solved relatively easily, by wrapping the ssh command in a clever bit of shell scripting:

/bin/sh -c 'until ping -c1 starbeamrainbowlabs.com &>/dev/null && sleep 5; do :; done && /usr/bin/ssh -TN -R30582:localhost:5724 ssh-tunnel@starbeamrainbowlabs.com'

The above tries to ping starbeamrainbowlabs.com every 5 seconds until it succeeds, and only then does it attempt to open the SSH connection. This solves the first problem. To solve the second, we need to look at autossh. Autossh is a small tool that monitors an ssh connection in a variety of configurable ways and restarts the connection if it ever dies for whatever reason. You can install it with your favourite package manager:

sudo apt install autossh

Substitute apt with whatever package manager you use on your system. With it installed, we can use a command like this:

autossh -o "UserKnownHostsFile /home/ssh-tunnel/.ssh/known_hosts" -o "IdentityFile /home/ssh-tunnel/.ssh/ssh-tunnel_ed25519" -o "PubkeyAuthentication=yes" -o "PasswordAuthentication=no" -o "ServerAliveInterval 900" -TN -R30582:localhost:5724 -p 7261 ssh-tunnel@starbeamrainbowlabs.com

to automatically start our ssh tunnel, and restart it if anything goes wrong. Note all the extra settings I had to specify here. Even though I had many of them specified in ~/.ssh/config for the ssh-tunnel user, systemd's weird environment when it starts a service meant I found I had to specify everything on the command line with absolute paths (ugh).

Basically, the above tells autossh where the known_hosts file is (important for automation!), that it should only attempt public / private keypair authentication and not password authentication, that it should check the server's still there every 15 minutes, and all the other things we figured out above.

Finally, I combined the solutions I came up with for both problems, which left me with this:

[Unit]
Description=SSH tunnel from starbeamrainbowlabs.com to local ssh server.

[Service]
Type=simple
ExecStart=/bin/sh -c 'until ping -c1 starbeamrainbowlabs.com &>/dev/null && sleep 5; do :; done && /usr/bin/autossh -o "UserKnownHostsFile /home/pi/.ssh/known_hosts" -o "IdentityFile /home/pi/.ssh/ssh-tunnel_ed25519" -o "PubkeyAuthentication=yes" -o "PasswordAuthentication=no" -o "ServerAliveInterval 900" -TN -R30582:localhost:5724 -p 7261 ssh-tunnel@starbeamrainbowlabs.com'

[Install]
WantedBy=network-online.target

Here's a version that utilises the -f parameter of autossh to put autossh into the background, which eliminates the sh parent process:

[Unit]
Description=SSH tunnel from starbeamrainbowlabs.com to local ssh server.

[Service]
Type=forking
Environment=AUTOSSH_PIDFILE=/var/run/sbrl-ssh-tunnel/ssh-tunnel.pid
PIDFile=/var/run/sbrl-ssh-tunnel/ssh-tunnel.pid
ExecStartPre=/bin/mkdir -p /var/run/sbrl-ssh-tunnel
ExecStartPre=-/bin/chown ssh-tunnel:ssh-tunnel /var/run/sbrl-ssh-tunnel
ExecStart=/bin/sh -c 'until ping -c1 starbeamrainbowlabs.com &>/dev/null && sleep 5; do :; done && /usr/bin/autossh -f -o "UserKnownHostsFile /home/pi/.ssh/known_hosts" -o "IdentityFile /home/pi/.ssh/ssh-tunnel_ed25519" -o "PubkeyAuthentication=yes" -o "PasswordAuthentication=no" -o "ServerAliveInterval 900" -TN -R30582:localhost:5724 -p 7261 ssh-tunnel@starbeamrainbowlabs.com'

[Install]
WantedBy=network-online.target

I ended up further modifying the above to set up an additional tunnel to allow elessar to send emails via the postfix email server that's running on starbeamrainbowlabs.com. Let me know if you'd be interested in a tutorial on this!

Sources and Further Reading

Learn your terminal (or command line)

Enter stage left: the terminal (or command line, on windows). That window with strange white text on a black background. You might not see it, but every operating system has one - humming away in the background, just waiting to be used, but epic arcane skills are needed to navigate this bizarre and perhaps dated window into your computer.... or so it seems.

When you think of your computer, you will probably think of a GUI (a.k.a. goo-ey), with windows, a cursor, and perhaps a few buttons. GUIs make it easy for newcomers to find their way around a computer by referencing things that exist in the real world (e.g. folders and files, a floppy disk on the save button, etc.), but they can be inherently slower to use - especially for long series of repetitive tasks that stay essentially the same.

A terminal (linux and friends) or a command line (windows) is another view into your computer. It's a way of controlling your computer with text - text that follows a particular set of rules, and that can be saved and repeated at will through the use of scripts. It's built on commands, each of which does one thing and does it well. On their own they're mildly useful, but together they form a powerful framework that can perform almost any task. It's certainly different (and there's a bit of a learning curve, to be sure), but not as hard or arcane as you might currently think.

A knowledge of the terminal or command line on your computer can be rather useful - especially so for those involved in computer science or technical support. How long would it take you to flatten a large set of deeply nested folders with a GUI? Or convert and recompress a few folders' worth of videos? Or even renew all the ssl certificates on your web server? All of these things can be automated through the use of a terminal or command line.
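
To give you a taste, here's a rough sketch of that first example in bash. The folder names are made up, and since files with duplicate names will overwrite each other, you'd want to try it on a copy of your files first:

# Move every file buried somewhere under ./nested into ./flattened,
# discarding the directory structure
mkdir -p ./flattened
find ./nested -type f -exec mv {} ./flattened/ \;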

Even if you're just a casual computer user who's not into programming, it's still worth at least looking into. Perhaps it'll save you some time! Perhaps it'll save you from asking your friend where something is on their computer when you can't find it. Maybe it'll even save you if your computer suddenly decides it doesn't want to boot up properly. And you'll look cool doing it too :P (What better reason is there?)

If I've somehow managed to convince you to dive in and take up the challenge of learning it, then I'll end this somewhat different post with a collection of places you can go to get started.

Make your linux learning experience painless with tldr-pages!

If you've been learning linux for a little while, you'll probably have encountered man pages. They are the complete documentation of all the tools, commands (and kernel functions) available on the system you're currently on (read them online here!). If you have encountered them, you'll also know that they are usually somewhat... verbose.

Enter stage left: tldr-pages!

tldr-pages is an ongoing effort to create a repository of simplified man pages that document the most common usages of a command. How about this, for the tar command?

# tar

> Archiving utility.
> Often combined with a compression method, such as gzip or bzip.

- Create an archive from files:

`tar cf {{target.tar}} {{file1 file2 file3}}`

- Create a gzipped archive:

`tar czf {{target.tar.gz}} {{file1 file2 file3}}`

- Extract an archive in a target folder:

`tar xf {{source.tar}} -C {{folder}}`

- Extract a gzipped archive in the current directory:

`tar xzf {{source.tar.gz}}`

- Extract a bzipped archive in the current directory:

`tar xjf {{source.tar.bz2}}`

- Create a compressed archive, using archive suffix to determine the compression program:

`tar caf {{target.tar.xz}} {{file1 file2 file3}}`

- List the contents of a tar file:

`tar tvf {{source.tar}}`

...or this for git reset?

# git reset

> Undo commits or unstage changes, by resetting the current git HEAD to the specified state.
> If a path is passed, it works as "unstage"; if a commit hash or branch is passed, it works as "uncommit".

- Unstage everything:

`git reset`

- Unstage specific file(s):

`git reset {{path/to/file(s)}}`

- Unstage portions of a file:

`git reset -p {{path/to/file}}`

- Undo the last commit, keeping its changes (and any further uncommitted changes) in the filesystem:

`git reset HEAD~`

- Undo the last two commits, adding their changes to the index, i.e. staged for commit:

`git reset --soft HEAD~2`

- Discard any uncommitted changes, staged or not (for only unstaged changes, use `git checkout`):

`git reset --hard`

- Reset the repository to a given commit, discarding committed, staged and uncommitted changes since then:

`git reset --hard {{commit}}`

For those learning linux and the terminal, I think it's an invaluable tool. It helps you out by showing you how to perform common tasks. As you get more experienced though, it becomes useful in another way: showing you how to do those things that you don't do often enough to remember off the top of your head.

I'm probably a bit biased, since I've been contributing to the project for a while (and the nice folks over there recently promoted me to the rank of maintainer :D), so you should check it out for yourself! There's even an online client that you can use without installing anything :-) Once you're ready to install a client directly in your terminal, there's an extensive list of clients documented on the repository wiki, with one available for every environment and platform.
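
For example, if you've got Node.js installed, I believe the most popular client can be set up something like this - though check the clients list on the wiki for the exact, up-to-date instructions:

# Install the Node.js tldr client globally, then look up a command
npm install -g tldr
tldr tar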

If you encounter a command that hasn't been documented yet, then they've also made it easy to contribute a page yourself.

I think the idea is rather cool, actually - as you've probably guessed by now! Let me know what you think of it in the comments.

Weekend Challenge: Detecting and Decoding Morse Code in an Audio File

A banner full of morse code. The depicted audio processing program in the background is ocenaudio.

Recently I received a message in morse code from a family member using this site. It said that the sender had hidden the message, so I was presented with 2 options: I could sit and decode the message by listening to it over and over again, or write a program to do it for me.

Naturally, as a computer science student and enthusiast, I chose the second option. My first problem: Capture a recording of the target morse code. This was easy - the audio-recorder package in the ubuntu repositories solved that one, as it has an option to record the audio output of my laptop.

Second problem: Figure out how to read the recording in and extract the samples in C♯. This one wasn't so easy. Amidst issues with flatpak and Monodevelop (flatpak is terrible!), I eventually found the NAudio (Codeplex, GitHub, NuGet) package to do the job. After some digging, I discovered that NAudio is actually really powerful! It's got some pretty advanced functions for handling audio that I'll have to explore at a later date.

Anyway, with a plan of action I set to work. I decided to work in reverse, so the first thing I needed was a chart that converted morse code into the latin alphabet. Wikipedia to the rescue:

A chart showing how to convert morse code into the latin alphabet.

(Source, Direct link, My Mirror)

With a handy-dandy conversion chart, it was relatively simple to create a class to handle the conversion from dots and dashes to the latin alphabet automatically:

using System;
using System.Collections.Generic;

namespace SBRL.Algorithms.MorseCodeTranslator
{
    /// <summary>
    /// A simple class to translate a morse code string into a normal string.
    /// </summary>
    /// <license>Mozilla Public License version 2.0</license>
    /// <origin></origin>
    /// <author>Starbeamrainbowlabs (https://starbeamrainbowlabs.com/)</author>
    /// <changelog>
    /// v0.1 - 26th May 2017:
    ///      - Creation! 😁
    /// </changelog>
    public static class MorseDecoder
    {
        /// <summary>
        /// The morse code lookup table. Use the methods in this class if possible,
        /// rather than accessing this lookup table directly!
        /// </summary>
        public static Dictionary<string, char> morseCodeLookup = new Dictionary<string, char>()
        {
            [".-"] = 'a',
            ["-..."] = 'b',
            ["-.-."] = 'c',
            ["-.."] = 'd',
            ["."] = 'e',
            ["..-."] = 'f',
            ["--."] = 'g',
            ["...."] = 'h',
            [".."] = 'i',
            [".---"] = 'j',
            ["-.-"] = 'k',
            [".-.."] = 'l',
            ["--"] = 'm',
            ["-."] = 'n',
            ["---"] = 'o',
            [".--."] = 'p',
            ["--.-"] = 'q',
            [".-."] = 'r',
            ["..."] = 's',
            ["-"] = 't',
            ["..-"] = 'u',
            ["...-"] = 'v',
            [".--"] = 'w',
            ["-..-"] = 'x',
            ["-.--"] = 'y',
            ["--.."] = 'z',
            [".----"] = '1',
            ["..---"] = '2',
            ["...--"] = '3',
            ["....-"] = '4',
            ["....."] = '5',
            ["-...."] = '6',
            ["--..."] = '7',
            ["---.."] = '8',
            ["----."] = '9',
            ["-----"] = '0',
        };

        /// <summary>
        /// Translates a single letter from morse code.
        /// </summary>
        /// <param name="morseSource">The morse code to translate.</param>
        /// <returns>The translated letter.</returns>
        public static char TranslateLetter(string morseSource)
        {
            return morseCodeLookup[morseSource.Trim()];
        }

        /// <summary>
        /// Translates a string of space-separated morse code strings from morse code.
        /// </summary>
        /// <param name="morseSource">The morse code to translate.</param>
        /// <returns>The translated word.</returns>
        public static string TranslateWord(string morseSource)
        {
            string result = string.Empty;

            string[] morseLetters = morseSource.Split(" ".ToCharArray());

            foreach(string morseLetter in morseLetters)
                result += TranslateLetter(morseLetter);

            return result;
        }

        /// <summary>
        /// Translates a list of morse-encoded words.
        /// </summary>
        /// <param name="morseSources">The morse-encoded words to decipher.</param>
        /// <returns>The decoded text.</returns>
        public static string TranslateText(IEnumerable<string> morseSources)
        {
            string result = string.Empty;
            foreach(string morseSource in morseSources)
                result += $"{TranslateWord(morseSource)} ";
            return result.Trim();
        }
    }
}

That was easy! The next part was considerably more challenging though: reading in the audio file and analysing the samples. I came up with what I think is a rather ingenious design. It's best explained with a diagram:

The algorithm I created to decode a morse code signal from an audio file.

  1. Read the raw samples into a buffer. If there isn't enough space to hold it all at once, then we handle it in chunks.
  2. Move a sliding-window along the raw buffer, with a width of 100 samples and sliding along 25 samples at a time. Extracts the maximum value from the window each time and places it in the windowed buffer.
  3. Analyse the windowed buffer and extract context-free tokens that mark the start or end of a tone.
  4. Convert the context-free tokens into ones that hold the starting point and length of the tones.
  5. Analyse the contextual tokens to extract the morse code as a string.
  6. Decipher the morse code string.

It's a pretty complicated problem when you first think about it, but breaking it down into steps as I did in the above diagram really helps in figuring out how you're going to tackle it. I, however, ended up drawing the diagram after I'd finished writing the program.... I appear to find it easy to break things down in my head - it's only when it gets too big to remember all at once or if I'm working with someone else that I draw diagrams :P

Having drawn up an algorithm and 6 steps I needed to follow to create the program, I spent a happy afternoon writing some C♯. While the remainder of the algorithm is not too long (only ~202 lines), it's a bit too long to explain bit by bit here. I have uploaded the full program to a repository on my personal git server, which you can find here: sbrl/AudioMorseDecoder.
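
To give you a rough flavour of step 2 though, here's a minimal C♯ sketch of the sliding-window pass, assuming the usual using System; and using System.Collections.Generic; statements. The method and variable names here are my own invention for illustration - the real implementation in the repository may well look different:

/// <summary>
/// Slides a window along the raw samples, keeping the peak amplitude of each window.
/// Window width: 100 samples, moving along 25 samples at a time (as in step 2 above).
/// </summary>
public static List<float> WindowMaximums(float[] samples, int windowSize = 100, int step = 25)
{
    List<float> windowed = new List<float>();
    for(int start = 0; start + windowSize <= samples.Length; start += step)
    {
        float max = 0;
        for(int i = start; i < start + windowSize; i++)
            max = Math.Max(max, Math.Abs(samples[i]));
        windowed.Add(max);
    }
    return windowed;
}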

If you're confused about any part of it, ask away in the comments below! Binaries available on request.

I'll leave you with a pair of challenging messages of my own to decode. Try not to use my decoder - write your own!

Message A (easy), Message B (hard) (hard message generated with cwwav)

Let's build a weighted random number generator!

Ever wondered how random loot in a dungeon is generated? Or how the rooms in a procedurally generated castle might be picked? Perhaps you need to skew the number of times an apple is picked by your game engine over a banana. If you've considered any of these things, then you want a weighted random number generator. In this post, I'll be showing you how I built one, and how you can build one too.

If you're interested in trying to build one for yourself first though, then look away now! Come back when you're done (or stuck) to see my solution.

To start with, let's consider what a weighted random number generator actually is. Let's say we've got 3 rewards for a treasure chest: a cool-looking shield, a health potion, and a fancy ring. We want to give the player 1 of the 3 when they open the chest, making sure that the health potion is more common than the others. We can represent that as a ratio: $3 : 4 : 3$.

The introduction image that explains my weighted random number generator.

(Above: The ratio between the different items. See below for the explanation of the math!).

In order to pick one of the 3 items using the ratio, we need to normalise the ratio so that it's between $0$ and $1$. That's rather easy, as far as maths goes: All we have to do is convert each part of the ratio into a fraction, and that into a decimal. Let's calculate the denominator of the fraction first. That's easy-peasy too - we just add up all the parts of the ratio, as we want to represent each part as a fraction of a whole: $3 + 4 + 3 = 10$. With our denominator sorted, we can convert each part into a fraction:

$$ \frac{3}{10} + \frac{4}{10} + \frac{3}{10} = 1 $$

Fractions are nice, but it'd be better to have that as a decimal:

$$ 0.3 + 0.4 + 0.3 = 1 $$

That's much better. Now, with the initial theory out of the way, let's start writing a class for it.

using System;
using System.Collections.Generic;
using System.Linq;

namespace SBRL.Algorithms
{
    public class WeightedRandom<ItemType>
    {
        protected Random rand = new Random();

        protected Dictionary<double, ItemType> weights = new Dictionary<double, ItemType>();

        /// <summary>
        /// Creates a new weighted random number generator.
        /// </summary>
        /// <param name="items">The dictionary of weights and their corresponding items.</param>
        public WeightedRandom(IDictionary<double, ItemType> items)
        {
            if(items.Count == 0)
                throw new ArgumentException("Error: The items dictionary provided is empty!");

            double totalWeight = items.Keys.Aggregate((double a, double b) => a + b);
            foreach(KeyValuePair<double, ItemType> itemData in items)
                weights.Add(itemData.Key / totalWeight, itemData.Value);
        }
    }
}

I've created a generic class here, to allow the caller to provide us with any type of item (so long as they are all the same). That's what the <ItemType> bit is on the end of the class name - it's the same syntax behind the List class:

List<TreasureReward> rewards = new List<TreasureReward>() {
    TreasureReward.FromFile("./treasure/coolsword.txt"),
    TreasureReward.FromFile("./treasure/healthpotion.txt"),
    TreasureReward.FromFile("./treasure/fancyring.txt"),
};

Next, let's go through that constructor bit by bit. First, we make sure that we actually have some weights in the first place:

if(items.Count == 0)
    throw new ArgumentException("Error: The items dictionary provided is empty!");

Then, it's more Linq to the rescue in calculating the total of the weights we've been provided with:

double totalWeight = items.Keys.Aggregate((double a, double b) => a + b);

Finally, we loop over each of the items in the provided dictionary, dividing them by the sum of the weights and adding them to our internal dictionary of normalised weights.

foreach(KeyValuePair<double, ItemType> itemData in items)
    weights.Add(itemData.Key / totalWeight, itemData.Value);

Now that we've got our items loaded and the weights normalised, we can start picking things from our dictionary. For this part, I devised a sort of 'sliding window' algorithm to work out which item to pick. It's best explained through a series of whiteboard images:

Basically, I have 2 variables: lower and higher. When I loop over each of the weights, I do the following things:

  1. Add the current normalised weight to higher
  2. Check if the target is between lower and higher:
       a. If it is, then return the current item
       b. If not, then keep going
  3. Bring lower up to the same value as higher
  4. Loop around again until we find the weight in which the target lies.

With that in mind, here's the code I cooked up:

/// <summary>
/// Picks a new random item from the list provided at initialisation, based
/// on the weights assigned to them.
/// </summary>
/// <returns>A random item, picked according to the assigned weights.</returns>
public ItemType Next()
{
    double target = rand.NextDouble();

    double lower = 0;
    double higher = 0;
    foreach(KeyValuePair<double, ItemType> weightData in weights)
    {
        higher += weightData.Key;
        if(target >= lower && target <= higher)
            return weightData.Value;
        lower += weightData.Key;
    }

    throw new Exception($"Error: Unable to find the weight that matches {target}");
}

That pretty much completes the class. While it seems daunting at first, it's actually quite easy once you get your head around it. Personally, I find whiteboards very useful in that regard! Here's the completed class:

(License: MPL-2.0)
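
To round the post off, here's roughly how you'd use it. The item names are just made-up examples - and note that, because the weights double up as dictionary keys, each item needs a distinct weight value:

// Build a loot table with 3 possible rewards and pick one
Dictionary<double, string> lootTable = new Dictionary<double, string>() {
    [5] = "health potion",
    [3] = "cool-looking shield",
    [2] = "fancy ring"
};
WeightedRandom<string> chestLoot = new WeightedRandom<string>(lootTable);

Console.WriteLine($"You open the chest and find a {chestLoot.Next()}!");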

Found this interesting? Got stuck? Have a suggestion for another cool algorithm I could implement? Comment below!

Markov Chains Part 2: Unweighted Chains

Hello and welcome to the second part of this mini-series about markov chains. In the last part, I explained what an n-gram was, and how I went about generating them.

In this part, I'll get to the meat of the subject: The markov chain itself. To start with (to simplify matters) I'll be looking at unweighted markov chains.

A markov chain, in essence, takes the n-grams we generated last time, and picks one to start with. It then takes all but the first character of the n-gram it chose, and finds all the n-grams in its library that begin with that sequence of characters. After drawing up a list of suitable n-grams, it picks one at random, and tacks the last character in the n-gram it chose onto the end of the first n-gram.

Then, it starts the whole process all over again with the 2nd n-gram it chose, and then the 3rd, and so on until it either a) hits a brick wall and can't find any suitable n-grams to use next, or b) reaches the desired length of word it was asked to generate.

An unweighted markov chain, as I call it, does not take the frequency of the source n-grams in the original text into account - it just picks the next n-gram from the list randomly.

With explanations and introductions out of the way, let's get down to some code! Since the markov chain is slightly more complicated, I decided to write a class for it. Let's start with one of those, then:

using System;
using System.Collections.Generic;
using System.Linq;

namespace SBRL.Algorithms.MarkovGrams
{
    /// <summary>
    /// An unweighted character-based markov chain.
    /// </summary>
    public class UnweightedMarkovChain
    {

    }
}

I've also added a few using statements for later. Our new class is looking a bit bare. How about some methods to liven it up a bit?

/// <summary>
/// Creates a new character-based markov chain.
/// </summary>
/// <param name="inNgrams">The ngrams to populate the new markov chain with.</param>
public UnweightedMarkovChain(IEnumerable<string> inNgrams)
{

}

/// <summary>
/// Returns a random ngram that's currently loaded into this UnweightedMarkovChain.
/// </summary>
/// <returns>A random ngram from this UnweightedMarkovChain's cache of ngrams.</returns>
public string RandomNgram()
{

}

/// <summary>
/// Generates a new random string from the currently stored ngrams.
/// </summary>
/// <param name="length">
/// The length of ngram to generate.
/// Note that this is a target, not a fixed value - e.g. passing 2 when the n-gram order is 3 will
/// result in a string of length 3. Also, depending on the current ngrams this markov chain contains,
/// it may end up being cut short. 
/// </param>
/// <returns>A new random string.</returns>
public string Generate(int length)
{

}

That's much better. Let's keep going - this time with some member variables:

/// <summary>
/// The random number generator
/// </summary>
Random rand = new Random();

/// <summary>
/// The ngrams that this markov chain currently contains.
/// </summary>
List<string> ngrams;

We'll need that random number generator later! As for the List<string>, we'll be using that to store our n-grams - but you probably figured that one out for yourself :P

The class isn't looking completely bare anymore, but we can still do something about those methods. Let's start with that constructor:

public UnweightedMarkovChain(IEnumerable<string> inNgrams)
{
    ngrams = new List<string>(inNgrams);
}

Easy peasy! It just turns the IEnumerable<string> into a List<string> and stores it. Let's do another one:

public string RandomNgram()
{
    return ngrams[rand.Next(0, ngrams.Count)];
}

We're on a roll here! This is another fairly simple method - it just picks a random n-gram from the list. We'll need this for our 3rd, and most important, method, Generate(). This one's a bit more complicated, so let's take it in a few stages. Firstly, we need an n-gram to start the whole thing off. We also need to return it at the end of the method.

string result = RandomNgram();

return result;

While we're at it, we'll also need a variable to keep track of the last n-gram in the chain, so we can find an appropriate match to come next.

string lastNgram = result;

Then we'll need a loop to keep adding n-grams to the chain. Since we're not entirely sure how long we'll be looping for (and we've got fairly complicated stop conditions, as far as that kind of thing goes), I decided to use a while loop here.

while(result.Length < length)
{

}

That's the first of our 2 stop conditions in place, too! We want to stop when the word we're working on reaches its desired length. Now, we can write the bit that works out which n-gram should come next! This bit goes inside the while loop we created above (as you might suspect). First, let's fetch a list of n-grams that would actually make sense coming next.

// The substring that the next ngram in the chain needs to start with
string nextStartsWith = lastNgram.Substring(1);
// Get a list of possible n-grams we could choose from next
List<string> nextNgrams = ngrams.FindAll(gram => gram.StartsWith(nextStartsWith));

With a bit of Linq (Language-INtegrated Query), that isn't too tough :-) If you haven't seen linq before, then I'd highly recommend you check it out! It makes sorting and searching datasets much easier. The above is quite simple - I just filter our list of n-grams through a function that extracts all the ones that start with the appropriate substring.

It's at this point that we can insert the second of our two stopping conditions. If there aren't any possible n-grams to pick from, then we can't continue.

// If there aren't any choices left, we can't exactly keep adding to the new string any more :-(
if(nextNgrams.Count == 0)
    break;

With our list of possible n-grams, we're now in a position to pick one at random to add to the word. It's LINQ to the rescue again:

// Pick a random n-gram from the list
string nextNgram = nextNgrams.ElementAt(rand.Next(0, nextNgrams.Count));

This is another simple one - it just extracts the element at a random location in the list. In hindsight I could have used the array operator syntax here ([]), but it doesn't really matter :-)

Now that we've picked the next n-gram, we can add it to the word we're building:

// Add the last character from the n-gram to the string we're building
result += nextNgram[nextNgram.Length - 1];

and that's the markov chain practically done! Oh, we mustn't forget to update the lastNgram variable (I forgot this when building it :P):

lastNgram = nextNgram;

And that wraps up our unweighted markov chain. Here's the whole class in full:

using System;
using System.Collections.Generic;
using System.Linq;

namespace SBRL.Algorithms.MarkovGrams
{
    /// <summary>
    /// An unweighted character-based markov chain.
    /// </summary>
    public class UnweightedMarkovChain
    {
        /// <summary>
        /// The random number generator
        /// </summary>
        Random rand = new Random();

        /// <summary>
        /// The ngrams that this markov chain currently contains.
        /// </summary>
        List<string> ngrams;

        /// <summary>
        /// Creates a new character-based markov chain.
        /// </summary>
        /// <param name="inNgrams">The ngrams to populate the new markov chain with.</param>
        public UnweightedMarkovChain(IEnumerable<string> inNgrams)
        {
            ngrams = new List<string>(inNgrams);
        }

        /// <summary>
        /// Returns a random ngram that's currently loaded into this UnweightedMarkovChain.
        /// </summary>
        /// <returns>A random ngram from this UnweightedMarkovChain's cache of ngrams.</returns>
        public string RandomNgram()
        {
            return ngrams[rand.Next(0, ngrams.Count)];
        }

        /// <summary>
        /// Generates a new random string from the currently stored ngrams.
        /// </summary>
        /// <param name="length">
        /// The length of ngram to generate.
        /// Note that this is a target, not a fixed value - e.g. passing 2 when the n-gram order is 3 will
        /// result in a string of length 3. Also, depending on the current ngrams this markov chain contains,
        /// it may end up being cut short. 
        /// </param>
        /// <returns>A new random string.</returns>
        public string Generate(int length)
        {
            string result = RandomNgram();
            string lastNgram = result;
            while(result.Length < length)
            {
                // The substring that the next ngram in the chain needs to start with
                string nextStartsWith = lastNgram.Substring(1);
                // Get a list of possible n-grams we could choose from next
                List<string> nextNgrams = ngrams.FindAll(gram => gram.StartsWith(nextStartsWith));
                // If there aren't any choices left, we can't exactly keep adding to the new string any more :-(
                if(nextNgrams.Count == 0)
                    break;
                // Pick a random n-gram from the list
                string nextNgram = nextNgrams.ElementAt(rand.Next(0, nextNgrams.Count));
                // Add the last character from the n-gram to the string we're building
                result += nextNgram[nextNgram.Length - 1];
                lastNgram = nextNgram;
            }

            return result;
        }
    }
}
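
For completeness, here's roughly how it plugs together with the n-gram generator from the first part of this mini-series. I'm calling the static class from part 1 NGrams here purely for illustration - substitute whatever you named yours (and the usual using statements are assumed):

// Generate order-3 n-grams from a (tiny!) wordlist, then build some new words
IEnumerable<string> ngrams = NGrams.GenerateFlat(new List<string>() {
    "refractive", "reflective", "transmissive"
}, 3);
UnweightedMarkovChain chain = new UnweightedMarkovChain(ngrams);

for(int i = 0; i < 5; i++)
    Console.WriteLine(chain.Generate(10));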

I've released the full code for my markov generator (with a complete command line interface!) on my personal git server. The repository can be found here: sbrl/MarkovGrams. To finish this post off, I'll leave you with a few more words that I've generated using it :D

1 2 3 4 5
mecuc uipes jeraq acrin nnvit
blerbopt drsacoqu yphortag roirrcai elurucon
pnsemophiqub omuayplisshi udaisponctec mocaltepraua rcyptheticys
eoigemmmpntartrc rattismemaxthotr hoaxtancurextudu rrgtryseumaqutrc hrpiniglucurutaj

How to set up a WebDav share with Nginx

I've just been setting up a WebDav share on a raspberry pi 3 for my local network (long story), and since it was a bit of a pain to set up (and I had to combine a bunch of different tutorials out there to make mine work), I thought I'd share how I did it here.

I'll assume you have a raspberry pi all set up and up-to-date in headless mode, with ufw set up as your firewall (if you need help with this, post in the comments below or check out the Raspberry Pi Stack Exchange). To start with, we need to install the nginx-full package:

sudo apt update
sudo apt install nginx-full

Note that we need the nginx-full package here, because the nginx-extras or plain nginx packages don't include the required additional webdav support modules. Next, we need to configure Nginx. Nginx's configuration files live at /etc/nginx/nginx.conf, and in /etc/nginx/conf.d. I did something like this for my nginx.conf:

user www-data;
worker_processes 4;
pid /run/nginx.pid;

events {
    worker_connections 768;
    # multi_accept on;
}

http {

    ##
    # Basic Settings
    ##

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    # server_tokens off;

    # server_names_hash_bucket_size 64;
    # server_name_in_redirect off;

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    ##
    # SSL Settings
    ##

    ssl_protocols TLSv1 TLSv1.1 TLSv1.2; # Dropping SSLv3, ref: POODLE
    ssl_prefer_server_ciphers on;

    ##
    # Logging Settings
    ##

    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;

    ##
    # Gzip Settings
    ##

    gzip on;

    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_buffers 16 8k;
    gzip_http_version 1.1;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

    ##
    # Virtual Host Configs
    ##

    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}

Not many changes here. Then, I created a file called 0-webdav.conf in the conf.d directory, and this is what I put in it:

server {
    listen 80;
    listen [::]:80;

    server_name plans.helenshydrogen.be;

    auth_basic              realm_name;
    auth_basic_user_file    /etc/nginx/.passwords.list;

    dav_methods     PUT DELETE MKCOL COPY MOVE;
    dav_ext_methods PROPFIND OPTIONS;
    dav_access      user:rw group:rw all:r;

    client_body_temp_path   /tmp/nginx/client-bodies;
    client_max_body_size    0;
    create_full_put_path    on;

    root /mnt/hydroplans;
}

Now this is where the magic happens. The dav_access directive tells nginx to allow everyone to read, but only logged in users to write to the share. This isn't actually particularly relevant, because of the auth_basic and auth_basic_user_file directives, which tell nginx to require people to login to the share before they are allowed to access it.

It's also important to note that I found that Windows (10, at least), didn't like the basic authentication - even though Ubuntu's Nautilus accepted it just fine - so I had to comment that bit out :-(

If you do still want authentication (hey! Maybe you'll have better luck than I :P), then you'll need to set up the passwords file. Here's how you create it:


echo -n 'helen:' | sudo tee /etc/nginx/.passwords.list
openssl passwd -apr1 | sudo tee -a /etc/nginx/.passwords.list 
Password:

The above creates a user called helen, and asks you to type a password. If you're adding another user to the file, simply change the first tee to be tee -a to avoid overwriting the first one.
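
For example, adding a hypothetical second user called bob would look like this:

echo -n 'bob:' | sudo tee -a /etc/nginx/.passwords.list
openssl passwd -apr1 | sudo tee -a /etc/nginx/.passwords.list
Password: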

With that all configured, it's time to test the configuration file, and, if we're lucky, restart nginx!


sudo nginx -t
sudo systemctl restart nginx
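
If everything's gone to plan, you should be able to poke the new share from another machine - for example with curl (substituting your own domain, username, and test file, naturally):

# Upload a file to the share (curl -T performs an HTTP PUT)
curl --user helen -T ./test.txt http://plans.helenshydrogen.be/test.txt
# ....and fetch it back again
curl --user helen http://plans.helenshydrogen.be/test.txt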

That's all you should need to do to set up a simple WebDav share. Remember that this is a starting point, and not an ending point - there are a few big holes in the above that you'll need to address, depending on your use case (for example, I haven't included the setup of https / encryption - try letsencrypt for that).
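
If you do want to add https, then once you've got certbot installed (how depends on your distribution), something along these lines should sort out both the certificate and the Nginx configuration for you - though do check the letsencrypt documentation for the currently recommended method:

sudo certbot --nginx -d plans.helenshydrogen.be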

Here are the connection details for the above for a few different clients:

  • Ubuntu / Nautilus: (Go to "Other Locations" and paste this into the "Connect to Server" box) dav://plans.helenshydrogen.be/
  • Windows: (Go to "Map Network Drive" and paste this in) http://plans.helenshydrogen.be/

Did this work for you? Have any problems? Got instructions for a WebDav client not listed here? Let me know in the comments!

Markov Chains Part 1: N-Grams

After wanting to create a markov chain to generate random words for ages, I've recently had the time to actually write one :D Since I had a lot of fun writing it, I thought I'd share it here.

A markov chain, in simple terms, is an algorithm to take a bunch of input, and generate a virtually unlimited amount of output in the style of the input. If I put my 166-strong wordlist of sciencey words through a markov chain, I get a bunch of words like this:

a b c d
raccession bstrolaneu aticl lonicretiv
mpliquadri tagnetecta subinverti catorp
ssignatten attrotemic surspertiv tecommultr
ndui coiseceivi horinversp icreflerat
landargeog eograuxila omplecessu ginverceng
evertionde chartianua spliqui ydritangt
grajecubst ngintagorp ombintrepe mbithretec
trounicabl ombitagnai ccensorbit holialinai
cessurspec dui mperaneuma yptintivid
ectru llatividet imaccellat siondl
tru coo treptinver gnatiartia
nictrivide pneumagori entansplan uatellonic

Obviously, some of the above aren't particularly readable, but the majority are ok (I could do with a longer input wordlist, I think).

To create our very own markov chain that can output words like the above, we need 2 parts: An n-gram generator, to take in the word list and convert it into a form that we can feed into the second part - the markov chain itself. In this post, I'm going to just look at the n-gram generator - I'll cover the markov chain itself in the second part of this mini-series.

An n-gram is best explained by example. Take the word refractive, for example. Let's split it up into chunks:

ref
efr
fra
rac
act
cti
tiv
ive

See what I've done? I've taken the original word and split it into chunks of 3, but I've only moved along the word by 1 character at a time, so some characters have been duplicated. These are n-grams of order 3. The order, in the case of an n-gram, is the number of characters per chunk. We could use any order we like:

refra
efrac
fract
racti
activ
ctive

The order of the above is 5. If you're wondering how this could possibly be useful - don't worry: All will be explained in due time :-) For now though, writing all these n-grams out manually is rather annoying and tedious. Let's write some code!

Generating n-grams from a single word like we did above is actually pretty simple. Here's what I came up with:

/// <summary>
/// Generates a unique list of n-grams from the given string.
/// </summary>
/// <param name="str">The string to n-gram-ise.</param>
/// <param name="order">The order of n-gram to generate.</param>
/// <returns>A unique list of n-grams found in the specified string.</returns>
public static IEnumerable<string> GenerateFlat(string str, int order)
{
    List<string> results = new List<string>();
    for(int i = 0; i <= str.Length - order; i++) // <=, so the final n-gram at the end of the string is included
    {
        results.Add(str.Substring(i, order));
    }
    return results.Distinct();
}

I'm using C♯ here, but you can use whatever language you like. Basically, I enter a loop and crawl along the word, adding the n-grams I find to a list, which I then de-duplicate and return.
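
As a quick sanity check, feeding the example word from earlier through it should hand back the same 8 chunks (I'm calling the method directly here, so adjust the class name to match wherever you put it):

IEnumerable<string> chunks = GenerateFlat("refractive", 3);
Console.WriteLine(string.Join(" ", chunks));
// Output: ref efr fra rac act cti tiv ive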

Generating n-grams for just one word is nice, but we need to process a whole bunch of words. Thankfully, that's easy to automate too with a sneaky overload:

/// <summary>
/// Generates a unique list of n-grams from the given list of words.
/// </summary>
/// <param name="words">The words to turn into n-grams.</param>
/// <param name="order">The order of n-gram to generate.</param>
/// <returns>A unique list of n-grams found in the given list of words.</returns>
public static IEnumerable<string> GenerateFlat(IEnumerable<string> words, int order)
{
    List<string> results = new List<string>();
    foreach(string word in words)
    {
        results.AddRange(GenerateFlat(word, order));
    }
    return results.Distinct();
}

All the above does is take a list of words, run them all through the n-gram generation method we wrote above and return the de-duplicated results. Here's a few that it generated from the same wordlist I used above in order 3:

1 2 3 4 5 6 7 8
hor sig ign gna str tre ren ngt
sol old lde oli sor sou oun tel
lla sub ubs bst tem emp mpe atu
tur err ert thr hre dim ime men
nsi ack cki kin raj aje jec tor
ans nsa sat nsf sfe nsl sla slu
luc uce nsm smi nsp are nsu tan

Next time, I'll show you my (unweighted) markov chain I've written that uses the n-grams generated by these methods.

Sources and Further Reading

Website Integrations (Mini!) Series List

Since I ended up doing a mini-series on the various website integrations I implemented, I thought that you might find a (mini-)series list useful. Here it is:

That's just about all I've got to mention here. Do you have any suggestions and / or requests on what I should blog about? Let me know below in the comments!
