Starbeamrainbowlabs

Stardust
Blog


Archive

Mailing List Articles Atom Feed Comments Atom Feed Twitter Reddit Facebook

Tag Cloud

3d account algorithms announcement architecture archives arduino artificial intelligence artix assembly async audio bash batch blog bookmarklet booting c sharp c++ challenge chrome os code codepen coding conundrums coding conundrums evolved command line compilers compiling css dailyprogrammer debugging demystification distributed computing downtime electronics email embedded systems encryption es6 features event experiment external first impressions future game github github gist graphics hardware hardware meetup holiday html html5 html5 canvas infrastructure interfaces internet io.js jabber javascript js bin labs learning library linux low level lua maintenance manjaro network networking node.js operating systems performance photos php pixelbot portable privacy programming problems project projects prolog protocol protocols pseudo 3d python reddit reference release releases resource review rust secrets security series list server software sorting source code control statistics svg technical terminal textures three thing game three.js tool tutorial tutorials twitter ubuntu university update updates upgrade version control virtual reality virtualisation visual web website windows windows 10 xmpp xslt

Line Simplification: Visvalingam's Algorithm

An screenshot of my demo of my implementation of Visvalingam's Algorithm. (Above: A screenshot of the demo of my implementation of Visvalingam's line simplification algorithm. Link below!)

For a secret project of mine I've been working on since about February time (if I recall correctly), I've discovered that I could make some considerable use of a line simplification algorithm. The tricky thing is though that I need an implementation in both Javascript and C♯ - which will both return identical results.

Initially, I chose the Ramer-Douglas-Peucker Algorithm, but I ended up implementing Visvalingam's Algorithm instead, as I encountered issues with calculating the shortest distance from a point to a line reliably along with other algorithmic problems that I determined weren't worth the time to fix.

Visvalingam's algorithm is actually really simple. Suppose we take a line:

A line with 6 points in it.

If we create a sliding window with a width of 3 and slide it along the list of points, then we get a set of triangles. To simplify the line, we can calculate the area of each of these triangles, and remove the centre point of the triangle with the smallest area.

The same line with the triangles highlighted.

The same line with a point removed.

Then we can continue removing the centre point of the smallest triangle until we reach a triangle with an area that's above a threshold we set - and this is Visvalingam's Algorithm.

Though I haven't written the C♯ version yet, I've completed the Javascript implementation - and created a demo for you to play around with! Here's a link:

Visvalingam's Algorithm Demo

Note that you'll need to enable ES6 Module support in your browser to get it to work, as I've used ES6 Modules whilst building it.

In Firefox this can be done by setting dom.moduleScripts.enabled to true in about:config, and in chrome by visiting chrome://flags/#enable-javascript-harmony (sorry, hyperlinks don't work for chrome:// urls IIRC!), enabling it, and restarting your browser.

It's open-source, of course - under the Mozilla Public License 2.0. You can find my code on GitHub - and pull requests are welcome :D

Finally, I've released it as an npm package. If you aren't aware of npm, it's really cool. It's the primary package manager for Javascript - I've written a blog post on this here.

Once I've written the C♯ version I'll have another bash at trying to get Nuget to package it. I think I know what the issue has been so far - so hopefully it works this time! If it does I'll blog about that too.

Found this useful? Think it's cool? Let me know in the comments below!

Weekend Challenge: Detecting and Decoding Morse Code in an Audio File

A banner full of morse code. The depicted audio processing program in the background is ocenaudio. Recently I received a message in morse code from a family member using this site. It said that the sender had hidden the message, so I was presented with 2 options: I could sit and decode the message by listening to it over and over again, or write a program to do it for me.

Naturally, as a computer science student and enthusiast, I chose the second option. My first problem: Capture a recording of the target morse code. This was easy - the audio-recorder package in the ubuntu repositories solved that one easily, as it has an option to record the audio output of my laptop.

Second problem: Figure out how to read the recording in and extract the samples in C♯. This one wasn't so easy. Amidst issues with flatpak and Monodevelop (flatpak is terrible!), I eventually found the NAudio (Codeplex, GitHub, NuGet) package to do the job. After some digging, I discovered that NAudio is actually really powerful! It's got some pretty advanced functions for handling audio that I'll have to explore at a later date.

Anyway, with a plan of action I set to work. - decided to work in reverse, so the first thing I needed was a chart that converted morse code into the latin alphabet. Wikipedia to the rescue:

A chart showing how to convert morse code into the latin alphabet.

(Source, Direct link, My Mirror)

With a handy-dandy conversion chart, it was relatively simple to create a class to handle the conversion from dots and dashes to the latin alphabet automatically:

using System;
using System.Collections.Generic;

namespace SBRL.Algorithms.MorseCodeTranslator
{
    /// <summary>
    /// A simple class to translate a morse code string into a normal string.
    /// </summary>
    /// <license>Mozilla Public License version 2.0</license>
    /// <origin></origin>
    /// <author>Starbeamrainbowlabs (https://starbeamrainbowlabs.com/)</author>
    /// <changelog>
    /// v0.1 - 26th May 2017:
    ///      - Creation! 😁
    /// </changelog>
    public static class MorseDecoder
    {
        /// <summary>
        /// The morse code lookup table. Use the methods in this class is possible,
        /// rather than accessing this lookup table directly!
        /// </summary>
        public static Dictionary<string, char> morseCodeLookup = new Dictionary<string, char>()
        {
            [".-"] = 'a',
            ["-..."] = 'b',
            ["-.-."] = 'c',
            ["-.."] = 'd',
            ["."] = 'e',
            ["..-."] = 'f',
            ["--."] = 'g',
            ["...."] = 'h',
            [".."] = 'i',
            [".---"] = 'j',
            ["-.-"] = 'k',
            [".-.."] = 'l',
            ["--"] = 'm',
            ["-."] = 'n',
            ["---"] = 'o',
            [".--."] = 'p',
            ["--.-"] = 'q',
            [".-."] = 'r',
            ["..."] = 's',
            ["-"] = 't',
            ["..-"] = 'u',
            ["...-"] = 'v',
            [".--"] = 'w',
            ["-..-"] = 'x',
            ["-.--"] = 'y',
            ["--.."] = 'z',
            [".----"] = '1',
            ["..---"] = '2',
            ["...--"] = '3',
            ["....-"] = '4',
            ["....."] = '5',
            ["-...."] = '6',
            ["--..."] = '7',
            ["---.."] = '8',
            ["----."] = '9',
            ["-----"] = '0',
        };

        /// <summary>
        /// Translates a single letter from morse code.
        /// </summary>
        /// <param name="morseSource">The morse code to translate.</param>
        /// <returns>The translated letter.</returns>
        public static char TranslateLetter(string morseSource)
        {
            return morseCodeLookup[morseSource.Trim()];
        }

        /// <summary>
        /// Translates a string of space-separated morse code strings from morse code.
        /// </summary>
        /// <param name="morseSource">The morse code to translate.</param>
        /// <returns>The translated word.</returns>
        public static string TranslateWord(string morseSource)
        {
            string result = string.Empty;

            string[] morseLetters = morseSource.Split(" ".ToCharArray());

            foreach(string morseLetter in morseLetters)
                result += TranslateLetter(morseLetter);

            return result;
        }

        /// <summary>
        /// Translates a list of morse-encoded words.
        /// </summary>
        /// <param name="morseSources">The morse-encoded words to decipher.</param>
        /// <returns>The decoded text.</returns>
        public static string TranslateText(IEnumerable<string> morseSources)
        {
            string result = string.Empty;
            foreach(string morseSource in morseSources)
                result += $"{TranslateWord(morseSource)} ";
            return result.Trim();
        }
    }
}

That was easy! The next challenge to tackle was considerably more challenging though: Read in the audio file and analyse the samples. I came up with that I think is a rather ingenious design. It's best explained with a diagram:

The algorithm I created to decode an morse code signal from an audio file.

  1. Read the raw samples into a buffer. If there isn't enough space to hold it all at once, then we handle it in chunks.
  2. Move a sliding-window along the raw buffer, with a width of 100 samples and sliding along 25 samples at a time. Extracts the maximum value from the window each time and places it in the windowed buffer.
  3. Analyse the windowed buffer and extract context-free tokens that mark the start or end of a tone.
  4. Convert the context-free tokens into ones that hold the starting point and length of the tones.
  5. Analyse the contextual tokens to extract the morse code as a string
  6. Decipher the morse code string

It's a pretty complicated problem when you first think about it, but breaking it down into steps as I did in the above diagram really helps in figuring out how you're going to tackle it. I, however, ended up drawing the diagram after Id finished writing the program.... I appear to find it easy to break things down in my head - it's only when it gets too big to remember all at once or if I'm working with someone else that I draw diagrams :P

Having drawn up an algorithm and 6 steps I needed to follow to create the program, I spent a happy afternoon writing some C♯. While the remainder of the algorithm is not too long (only ~202 lines), it's a bit too long to explain bit by bit here. I have uploaded the full program to a repository on my personal git server, which you can find here: sbrl/AudioMorseDecoder.

If you're confused about any part of it, ask away in the comments below! Binaries available on request.

I'll leave you with a pair of challenging messages of my own to decode. Try not to use my decoder - write your own!

Message A (easy), Message B (hard) (hard message generated with cwwav)

Let's build a weighted random number generator!

Ever wondered how random loot in a dungeon is generated? Or how the rooms in a procedurally generated castle might be picked? Perhaps you need to skew the number of times an apple is picked by your game engine over a banana. If you've considered any of these things, then you want a weighted random number generator. In this post, I'll be showing you how I built one, and how you can build one too.

If you're interested in trying to build one for yourself first though, then look away now! Come back when you're done (or stuck) to see my solution.

To start with, let's consider what a weighted random number generator actually is. Let's say we've got 3 rewards for a treasure chest: a cool-looking shield, a health potion, and a fancy ring. We want to give the player 1 of the 3 when they option the chest, making sure that the health potion is more common than the others. We can represent that as a ratio: $3 : 4 : 3$.

The introduction image that explains my weighted random number generator.

(Above: The ratio between the different items. See below for the explanation of the math!).

In order to pick one of the 3 items using the ratio, we need to normalise the ratio so that it's between $0$ and $1$. That's rather easy, as far as maths goes: All we have to do is convert each part of the ratio into a fraction, and that into a decimal. Let's calculate the denominator of the fraction first. That's easy-peasy too - we just add up all the parts of the ratio, as we want to represent each part as a fraction of a whole: $3 + 4 + 3 = 10$. With our denominator sorted, we can convert each part into a fraction:

$$ \frac{3}{10} + \frac{4}{10} + \frac{3}{10} = 1 $$

Fractions are nice, but it's be better to have that as a decimal:

$$ 0.3 + 0.4 + 0.3 = 10 $$

That's much better. Now, with the initial theory out of the way, let's start writing a class for it.

using System;
using System.Collections.Generic;
using System.Linq;

namespace SBRL.Algorithms
{
    public class WeightedRandom<ItemType>
    {
        protected Random rand = new Random();

        protected Dictionary<double, ItemType> weights = new Dictionary<double, ItemType>();

        /// <summary>
        /// Creates a new weighted random number generator.
        /// </summary>
        /// <param name="items">The dictionary of weights and their corresponding items.</param>
        public WeightedRandom(IDictionary<double, ItemType> items)
        {
            if(items.Count == 0)
                throw new ArgumentException("Error: The items dictionary provided is empty!");

            double totalWeight = items.Keys.Aggregate((double a, double b) => a + b);
            foreach(KeyValuePair<double, ItemType> itemData in items)
                weights.Add(itemData.Key / totalWeight, itemData.Value);
        }
    }
}

I've created a template class here, to allow the caller to provide us with any type of item (so long as they are all the same). That's what the <ItemType> bit is on the end of the class name - it's the same syntax behind the List class:

List<TreasureReward> rewards = new List<TreasureReward>() {
    TreasureReward.FromFile("./treasure/coolsword.txt"),
    TreasureReward.FromFile("./treasure/healthpotion.txt"),
    TreasureReward.FromFile("./treasure/fancyring.txt"),
};

Next, let's go through that constructor bit by bit. First, we make sure that we actually have some weights in the first place:

if(items.Count == 0)
    throw new ArgumentException("Error: The items dictionary provided is empty!");

Then, it's more Linq to the rescue in calculating the total of the weights we've been provided with:

double totalWeight = items.Keys.Aggregate((double a, double b) => a + b);

Finally, we loop over each of the items in the provided dictionary, dividing them by the sum of the weights and adding them to our internal dictionary of normalised weights.

foreach(KeyValuePair<double, ItemType> itemData in items)
    weights.Add(itemData.Key / totalWeight, itemData.Value);

Now that we've got our items loaded and the weights normalised, we can start picking things from our dictionary. For this part, I devised a sort of 'sliding window' algorithm to work out which item to pick. It's best explained through a series of whiteboard images:

Basically, I have 2 variables: lower and higher. When I loop over each of the weights, I do the following things:

  1. Add the current normalised weight to higher
  2. Check if the target is between lower and higher a. If it is, then return the current item b. If not, then keep going
  3. Bring lower up to the same value as higher
  4. Loop around again until we find the weight in which the target lies.

With that in mind, here's the code I cooked up:

/// <summary>
/// Picks a new random item from the list provided at initialisation, based
/// on the weights assigned to them.
/// </summary>
/// <returns>A random item, picked according to the assigned weights.</returns>
public ItemType Next()
{
    double target = rand.NextDouble();

    double lower = 0;
    double higher = 0;
    foreach(KeyValuePair<double, ItemType> weightData in weights)
    {
        higher += weightData.Key;
        if(target >= lower && target <= higher)
            return weightData.Value;
        lower += weightData.Key;
    }

    throw new Exception($"Error: Unable to find the weight that matches {target}");
}

That pretty much completes the class. While it seems daunting at first, it's actually quite easy once you get your head around it. Personally, I find whiteboards very useful in that regard! Here's the completed class:

(License: MPL-2.0)

Found this interesting? Got stuck? Have a suggestion for another cool algorithm I could implement? Comment below!

Markov Chains Part 2: Unweighted Chains

Hello and welcome to the second part of this mini-series about markov chains. In the last part, I explained what an n-gram was, and how I went about generating them.

In this part, I'll get to the meat of the subject: The markov chain itself. To start with (to simplify matters) I'll be looking at unweighted markov chains.

A markov chain, in essence, takes the n-grams we generated last time, and picks one to start with. It then takes the all but the first character of the n-gram it chose, and finds all the n-grams in it's library that begin with that sequence of characters. After drawing up a list of suitable n-grams, it picks one at random, and tacks the last character in the n-gram it chose onto the end of the first n-gram.

Then, it starts the whole process all over again with the 2nd n-gram it chose, and then the 3rd, and so on until it either a) hits a brick wall and can't find any suitable n-grams to use next, or b) reaches the desired length of word it was asked to generate.

An unweighted markov chain, as I call it, does not take the frequency of the source n-grams in the original text into account - it just picks the next n-gram from the list randomly.

With explanations and introductions out of the way, let's get down to some code! Since the markov chain is slightly more complicated, I decided to write a class for it. Let's start with one of those, then:

using System;
using System.Collections.Generic;
using System.Linq;

namespace SBRL.Algorithms.MarkovGrams
{
    /// <summary>
    /// An unweighted character-based markov chain.
    /// </summary>
    public class UnweightedMarkovChain
    {

    }
}

I've also added a few using statements for later. Our new class is looking a bit bare. how about some methods to liven it up a bit?

/// <summary>
/// Creates a new character-based markov chain.
/// </summary>
/// <param name="inNgrams">The ngrams to populate the new markov chain with.</param>
public UnweightedMarkovChain(IEnumerable<string> inNgrams)
{

}

/// <summary>
/// Returns a random ngram that's currently loaded into this UnweightedMarkovChain.
/// </summary>
/// <returns>A random ngram from this UnweightMarkovChain's cache of ngrams.</returns>
public string RandomNgram()
{

}

/// <summary>
/// Generates a new random string from the currently stored ngrams.
/// </summary>
/// <param name="length">
/// The length of ngram to generate.
/// Note that this is a target, not a fixed value - e.g. passing 2 when the n-gram order is 3 will
/// result in a string of length 3. Also, depending on the current ngrams this markov chain contains,
/// it may end up being cut short. 
/// </param>
/// <returns>A new random string.</returns>
public string Generate(int length)
{

}

That's much better. Let's keep going - this time with some member variables:

/// <summary>
/// The random number generator
/// </summary>
Random rand = new Random();

/// <summary>
/// The ngrams that this markov chain currently contains.
/// </summary>
List<string> ngrams;

We'll need that random number generator later! As for the List<string>, we'll be using that to store our n-grams - but you probably figured that one out for yourself :P

The class isn't looking completely bare anymore, but we can still do something about those methods. Let's start with that constructor:

public UnweightedMarkovChain(IEnumerable<string> inNgrams)
{
    ngrams = new List<string>(inNgrams);
}

Easy peasy! It just turns the IEnumerable<string> into a List<string> and stores it. Let's do another one:

public string RandomNgram()
{
    return ngrams[rand.Next(0, ngrams.Count)];
}

We're on a roll here! This is another fairly simple method - it just picks a random n-gram from the dictionary. We'll need this for our 3rd, and most important, method, Generate(). This one's a bit more complicated, so let's take it in a few stages. Firstly, we need an n-gram to start the whole thing off. We also need to return it at the end of the method.

string result = RandomNgram();

return result;

While we're at it, we'll also need a variable to keep track of the last n-gram in the chain, so we can find an appropriate match to come next.

string lastNgram = result;

Then we'll need a loop to keep adding n-grams to the chain. Since we're not entirely sure how long we'll be looping for (and we've got fairly complicated stop conditions, as far as that kind of thing goes), I decided to use a while loop here.

while(result.Length < length)
{

}

That's the first of our 2 stop conditions in place, too! We want to stop when the word we're working on reaches it's desired length. Now, we can write the bit that works out which n-gram should come next! This bit goes inside the while loop we created above (as you might suspect). First, let's fetch a list of n-grams that would actually make sense coming next.

// The substring that the next ngram in the chain needs to start with
string nextStartsWith = lastNgram.Substring(1);
// Get a list of possible n-grams we could choose from next
List<string> nextNgrams = ngrams.FindAll(gram => gram.StartsWith(nextStartsWith));

With a bit of Linq (Language-INtrgrated Query), that isn't too tough :-) If you haven't seen linq before, then I'd highly recommend you check it out! It makes sorting and searching datasets much easier. The above is quite simple - I just filter our list of n-grams through a function that extracts all the ones that start with the appropriate letter.

It's at this point that we can insert the second of our two stopping conditions. If there aren't any possible n-grams to pick from, then we can't continue.

// If there aren't any choices left, we can't exactly keep adding to the new string any more :-(
if(nextNgrams.Count == 0)
    break;

With our list of possible n-grams, we're now in a position to pick one at random to add to the word. It's LINQ to the rescue again:

// Pick a random n-gram from the list
string nextNgram = nextNgrams.ElementAt(rand.Next(0, nextNgrams.Count));

This is another simple one - it just extract the element in the list at a random location in the list. In hindsight I could have used the array operator syntax here ([]), but it doesn't really matter :-)

Now that we've picked the next n-gram, we can add it to the word we're building:

// Add the last character from the n-gram to the string we're building
result += nextNgram[nextNgram.Length - 1];

and that's the markov chain practically done! Oh, we mustn't forget to update the lastNgram variable (I forgot this when building it :P):

lastNgram = nextNgram;

And that wraps up our unweighted markov chain. Here's the whole class in full:

using System;
using System.Collections.Generic;
using System.Linq;

namespace SBRL.Algorithms.MarkovGrams
{
    /// <summary>
    /// An unweighted character-based markov chain.
    /// </summary>
    public class UnweightedMarkovChain
    {
        /// <summary>
        /// The random number generator
        /// </summary>
        Random rand = new Random();

        /// <summary>
        /// The ngrams that this markov chain currently contains.
        /// </summary>
        List<string> ngrams;

        /// <summary>
        /// Creates a new character-based markov chain.
        /// </summary>
        /// <param name="inNgrams">The ngrams to populate the new markov chain with.</param>
        public UnweightedMarkovChain(IEnumerable<string> inNgrams)
        {
            ngrams = new List<string>(inNgrams);
        }

        /// <summary>
        /// Returns a random ngram that's currently loaded into this UnweightedMarkovChain.
        /// </summary>
        /// <returns>A random ngram from this UnweightMarkovChain's cache of ngrams.</returns>
        public string RandomNgram()
        {
            return ngrams[rand.Next(0, ngrams.Count)];
        }

        /// <summary>
        /// Generates a new random string from the currently stored ngrams.
        /// </summary>
        /// <param name="length">
        /// The length of ngram to generate.
        /// Note that this is a target, not a fixed value - e.g. passing 2 when the n-gram order is 3 will
        /// result in a string of length 3. Also, depending on the current ngrams this markov chain contains,
        /// it may end up being cut short. 
        /// </param>
        /// <returns>A new random string.</returns>
        public string Generate(int length)
        {
            string result = RandomNgram();
            string lastNgram = result;
            while(result.Length < length)
            {
                // The substring that the next ngram in the chain needs to start with
                string nextStartsWith = lastNgram.Substring(1);
                // Get a list of possible n-grams we could choose from next
                List<string> nextNgrams = ngrams.FindAll(gram => gram.StartsWith(nextStartsWith));
                // If there aren't any choices left, we can't exactly keep adding to the new string any more :-(
                if(nextNgrams.Count == 0)
                    break;
                // Pick a random n-gram from the list
                string nextNgram = nextNgrams.ElementAt(rand.Next(0, nextNgrams.Count));
                // Add the last character from the n-gram to the string we're building
                result += nextNgram[nextNgram.Length - 1];
                lastNgram = nextNgram;
            }

            return result;
        }
    }
}

I've released the full code for my markov generator (with a complete command line interface!) on my personal git server. The repository can be found here: sbrl/MarkovGrams. To finish this post off, I'll leave you with a few more words that I've generated using it :D

1 2 3 4 5
mecuc uipes jeraq acrin nnvit
blerbopt drsacoqu yphortag roirrcai elurucon
pnsemophiqub omuayplisshi udaisponctec mocaltepraua rcyptheticys
eoigemmmpntartrc rattismemaxthotr hoaxtancurextudu rrgtryseumaqutrc hrpiniglucurutaj

Markov Chains Part 1: N-Grams

After wanting to create a markov chain to generate random words for ages, I've recently had the time to actually write one :D Since I had a lot of fun writing it, I thought I'd share it here.

A markov chain, in simple terms, is an algorithm to take a bunch of input, and generate a virtually unlimited amount of output in the style of the input. If I put my 166 strong wordlist of sciencey words through a markov chain, I get a bunch of words like this:

a b c d
raccession bstrolaneu aticl lonicretiv
mpliquadri tagnetecta subinverti catorp
ssignatten attrotemic surspertiv tecommultr
ndui coiseceivi horinversp icreflerat
landargeog eograuxila omplecessu ginverceng
evertionde chartianua spliqui ydritangt
grajecubst ngintagorp ombintrepe mbithretec
trounicabl ombitagnai ccensorbit holialinai
cessurspec dui mperaneuma yptintivid
ectru llatividet imaccellat siondl
tru coo treptinver gnatiartia
nictrivide pneumagori entansplan uatellonic

Obviously, some of the above aren't particularly readable, but the majority are ok (I could do with a longer input wordlist, I think).

To create our very own markov chain that can output words like the above, we need 2 parts: An n-gram generator, to take in the word list and convert it into a form that we can feed into the second part - the markov chain itself. In this post, I'm going to just look at the n-gram generator - I'll cover the markov chain itself in the second part of this mini-series.

An n-gram is best explained by example. Take the word refractive, for example. Let's split it up into chunks:

ref
efr
fra
rac
act
cti
tiv
ive

See what I've done? I've taken the original word and split it into chunks of 3, but I've only moved along the word by 1 character at a time, so some characters have been duplicated. These are n-grams of order 3. The order, in the case of an n-gram, is the number of characters per chunk. We could use any order we like:

refra
efrac
fract
racti
activ
ctive

The order of the above is 5. If you're wondering how this could possibly be useful - don't worry: All will be explained in due time :-) For now though, writing all these n-grams out manually is rather annoying and tedious. Let's write some code!

Generating n-grams from a single word like we did above is actually pretty simple. Here's what I came up with:

/// <summary>
/// Generates a unique list of n-grams from the given string.
/// </summary>
/// <param name="str">The string to n-gram-ise.</param>
/// <param name="order">The order of n-gram to generate.</param>
/// <returns>A unique list of n-grams found in the specified string.</returns>
public static IEnumerable<string> GenerateFlat(string str, int order)
{
    List<string> results = new List<string>();
    for(int i = 0; i < str.Length - order; i++)
    {
        results.Add(str.Substring(i, order));
    }
    return results.Distinct();
}

I'm using C♯ here, but you can use whatever language you like. Basically, I enter a loop and crawl along the word, adding the n-grams I find to a list, which I then de-duplicate and return.

Generating n-grams for just one word is nice, but we need to process a whole bunch of words. Thankfully, that's easy to automate too with a sneaky overload:

/// <summary>
/// Generates a unique list of n-grams that the given list of words.
/// </summary>
/// <param name="words">The words to turn into n-grams.</param>
/// <param name="order">The order of n-gram to generate..</param>
/// <returns>A unique list of n-grams found in the given list of words.</returns>
public static IEnumerable<string> GenerateFlat(IEnumerable<string> words, int order)
{
    List<string> results = new List<string>();
    foreach(string word in words)
    {
        results.AddRange(GenerateFlat(word, order));
    }
    return results.Distinct();
}

All the above does is take a list of words, run them all through the n-gram generation method we wrote above and return the de-duplicated results. Here's a few that it generated from the same wordlist I used above in order 3:

1 2 3 4 5 6 7 8
hor sig ign gna str tre ren ngt
sol old lde oli sor sou oun tel
lla sub ubs bst tem emp mpe atu
tur err ert thr hre dim ime men
nsi ack cki kin raj aje jec tor
ans nsa sat nsf sfe nsl sla slu
luc uce nsm smi nsp are nsu tan

Next time, I'll show you my (unweighted) markov chain I've written that uses the n-grams generated by these methods.

Sources and Further Reading

Chaikin Curves in C#: An alternative curve generation algorithm

A chaikin curve, increasing in order each frame.

A little while ago I was curious to know if there were any other ways to generate a smooth curve other than with a Bezier Curve. Turns out the answer is yes, and it comes in the form of a Chaikin Curve, which was invented in 1974 by a lecturer in America by the name of George Chaikin. A few days (and a lot of debugging) later, I found myself with a Chaikin curve generator written in pure C♯ (I seem to have this fascination with implementing algorithms :P), so I thought I'd share it here.

Before I do though, I should briefly explain how Chaikin's algorithm actually works. It's actually quite simple. If you have a list of control points, and you were to draw a line through them all, you'd get this:

Some lines before the chaikin algorithm has been run.

The magic of the algorithm happens when you interpolate between your control points. If you build a new list of points that contains points that are ¼ and ¾ along each of the lines between the current control points and draw a line though them instead, then the line suddenly gets a lot smoother. This process can be repeated multiple times to further refine the curve, as is evidenced in the animation above.

This page is very helpful in understanding the algorithm if you're having trouble getting your head around it.

My implementation makes use of the PointF class in the System.Drawing namespace, and also has the ability to generate an SVG version of any generated curve, so that it can be inspected and debugged.

You can find my implementation here: Chaikin Generator - comments and improvements are welcome!

Instructions on how to use to use it are available in the README, and the class is fully documented with Intellisense comments, so it should feel fairly intuitive to use. I've tried to use patterns that are present in the rest of the .NET framework too, so you can probably even guess how to use it correctly.

Additionally, I 'm going to try put it up as a Nuget package, but currently I can't get Nuget to pack it currently on linux (when I do, you can expect a tutorial on here!)

Pocketblock: Simple encryption tutorials

The Pocketblock logo.

Recently I found a project that aims to explain cryptography and encryption in a simple fashion through this Ars Technica article. The repository is called Pocketblock and is being created by an insanely clever guy called Justin Troutman. Initially the repository didn't have anything in it (which was confusing to say the least), but now that the first guide of sorts has been released I'd like to take the time to recommend it here.

The first article explains an encryption algorithm called 'Pockenacci', an encryption algorithm that is from the same family as AES. It's a great start to what I hope will be an awesome series! If you're interested in encryption or interested in getting into encryption, you should certainly go and check it out.

Set and forget async tasks

Banner image. (Banner image from here by GDJ)

Recently I've been using asynchronous C# quite a bit, and I've run into the problem of 'setting and forgetting' an asynchronous task more than once. You might want to do this when handling requests in some sort of server, for example.

I looked into it and came up with a few snippets of code I thought someone else might find useful, so I'm posting them here.

Without further delay, here's the first snippet:

/// <summary>
/// Call this method to allow a given task to complete in the background.
/// Errors will be handled correctly.
/// Useful in fire-and-forget scenarios, like a TCP server for example.
/// From http://stackoverflow.com/a/22864616/1460422
/// Adapted by Starbeamrainbowlabs
/// </summary>
/// <param name="task">The task to forget about.</param>
/// <param name="acceptableExceptions">Acceptable exceptions. Exceptions specified here won't cause a crash.</param>
public static async void ForgetTask(Task task, params Type[] acceptableExceptions)
{
    try
    {
        await task.ConfigureAwait(false);
    }
    catch (Exception ex)
    {
        // TODO: consider whether derived types are also acceptable.
        if (!acceptableExceptions.Contains(ex.GetType()))
            throw;
    }
}

All asynchronous methods in C♯ return some form of Task - and these Task s can be reconfigured and manipulated to make them run in the background on the thread pool, as in the above. The above also handles exceptions correctly so that your asynchronous methods won't just silently fail.

Talking about exceptions, if you await an asynchronous method, it's highly likely that if they do throw an exception it'll be an AggregateException. This is not helpful. It doesn't tell us anything about the actual exception that was thrown in the first place! It gets annoying manually inspecting the innerExceptions property of the AggregateException very quickly. Thankfully, I've found a solution to that too:

try
{
    await DoAsyncWork();
}
catch(AggregateException agError)
{
    agError.Handle((error) => {
        ExceptionDispatchInfo.Capture(error).Throw();
        throw error;
    });
}
catch
{
    Console.Error.WriteLine("Something went very wrong O.o");
    throw;
}

I can't remember where I found the ExceptionDispatchInfo bit (if it was your idea, please let me know so I can give you appropriate credit!), but the rest I wrote myself. It essentially unwraps the AggregateException and rethrows each exception in turn, whilst preserving the original stack trace. That way you can track the issue that threw the exception in the first place down.

Random Number Generation: The what, why and how

Many computer scientists are absolutely crazy about random numbers. On first thought it sounds a little bit odd, but upon further inspection it's easy to see why. You can use them for generating random loot in a game, or making a monster walk around randomly. Or in cryptography. The possibilities are endless! In addition, sometimes it is desirable to repeat a particular sequence of numbers without storing them all, and sometimes the opposite is true. In this post I intend to explain what a pseudo-random number generator is, why you'd want one, and where you can get your own.

Unfortunately, although computers are really good at complicated calculations, they are totally rubbish at generating true random numbers (that's what random.org is for). All is not lost though - we can still generate long sequences of numbers using a pseudo-random number generator (PRNG).

There are many different algorithms in several different families, but they all rely on a few basic principles. All PRNGs start with a seed, do something to transform the seed, and produce an output. Some algorithms store a few of the previously generated numbers to feed them back into the algorithm too. All PRNGs also have a period, which is the number of random values they can produce before they start to repeat themselves. Usually this value is so high that it doesn't mattter.

The seed, in this case, is a value that is used to initialise the random number generator and give it something to work with. Because PRNGs aren't truly random, any given seed will always produce the same sequence of random numbers. This can be useful if you want to allow players of your game to share cool maps that they've found without having to store details of every single item.

PRNGs can also be measured in terms of the 'quality' of the random numbers they produce. This sounds like a difficult thing to measure, and it is. The easiest (but probably not the best - I don't know, I'm not an expert!) way to test the quality of a random number generator is to generate a whole bunch of random bytes, save them to a file, and try to compress it. A true sequence of random numbers should be uncompressible. Besides, nobody wants a poor-quality random number generator.

After all that, we can finally get around to the algorithms themselves. there are 3 categories that I know of:

Linear Congruential Generators (LCGs)

Your bog standard (poor quality) generator. Usually has relatively short period too. The only good thing about these is their speed.

Mersenne Twister

Slower, but have an insanely large period (2219937 − 1 to be exact). Output is of a high quality.

XOR bitshifters

A family of fast, high-quality generators. Variable period, depending on the algorithm you pick. Also very easy to implement. See below for more information.

There may be others, but these are the 3 that I've seen around (suggestions of algorithm families are welcome in the comments).

Since XOR bit shifters are the new up-and-coming thing, I'll elaborate on them a little bit. There are actually a bunch of different algorithms in this family:

  • xorshift
  • xorshift1024*
  • xorshift128+
  • xoroshiro128+
  • ....

Their history is a bit complicated, so I won't go into any detail, but basically it all started with the xorshift algorithm. The xorshift* family followed as an improvement afterwards, and the xorshift+ family is the result of another (but different) improvement made by tweaking the original algorithm slightly. Finally, the xoroshiro+ set are new and include yet another improvement based on xorshiro+. This article sums them up nicely, along with a suggestion as to when you should use each.

The number in the names of the above refer to the number of bits that each uses to store its state. Apparently each algorithm is available in 64, 128, 256, 1024, and probably even more flavours, but the above are the most popular.

Implementations

To end this post, I'm going to include some links to implementations of the algorithms mentioned in this post in various languages. This (certainly) isn't an exhaustive list, but should serve as a good starting point if you are on the hunt for a random number generator for your next project.

Sources

  1. The multiply-with-carry (aka MWC) algorithm apparently come under xor bitshifters, but I' haven't mentioned it to as to keep things (relatively) simple. More information can be found here.

Update 5th Jan 2017: Fixed a typo.

An introduction to L Systems

Recently I've been taking a look at L Systems. Apparently, they are used for procedural content generation. After playing around with them for a little while, I discovered that you can create some rather cool patterns with them, like this Sierpinski Triangle for instance.

A Sierpinski Triangle

Before we get into how I made the above triangle grow, it's important to understand how an L System works first. The best way to describe an L System is to show you one. Let's start with a single letter:

f

Not very interesting, is it? Let's run a few find and replace rules over it. Here are a few I found lying around:

h=f+h+f f=h-f-h

After running those, here's what we got back:

h-f-h

Hrm. Interesting. What happens if I do it again?

f+h+f-h-f-h-f+h+f

Ah. Now we're getting somewhere. Here are the next 2 runs for reference:

Run 3:

h-f-h+f+h+f+h-f-h-f+h+f-h-f-h-f+h+f-h-f-h+f+h+f+h-f-h

Run 4:

f+h+f-h-f-h-f+h+f+h-f-h+f+h+f+h-f-h+f+h+f-h-f-h-f+h+f-h-f-h+f+h+f+h-f-h-f+h+f-h-f-h-f+h+f-h-f-h+f+h+f+h-f-h-f+h+f-h-f-h-f+h+f+h-f-h+f+h+f+h-f-h+f+h+f-h-f-h-f+h+f

Now that we have run it a few times, it's started to really blow up in size. The proper term for the letter we started with is the axiom, and each consecutive run we did with the find and replace rules are really called generations. This is the underlying principle of an L System. While this is cool, I'm sure you're asking how I turned the long string above into the animation at the beginning of this post.

Enter the turtle

For this next part you are going to need a (virtual) pet turtle. My turtle isn't just any turtle - he's a super fast racing turtle that I've trained to follow simple instructions like "go forwards", or "turn right". Now suppose I attach a pen to my turtle so that he leaves a line everywhere he walks.

Now if I give my turtle the following rules and the output from one of the generations above, I'll get back a picture:

f means go forwards one pace

h also mean go forwards one pace

+ means turn right

means turn left

ignore any other characters

Sierpinski triangle generation #4

Writing some code

Now that I've introduced how L Systems work and how you can use them to draw pretty pictures, we can start writing some code. I decided to use C♯ to simulate the L System, along with Mono.Cairo to draw the output. Mono.Cairo is a wonderful graphics drawing library that comes bundled along with the Mono runtime as standard.

I decided to split my implementation into 2 classes: the L System and the Turtle. Here's the code for the L System:

using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;
using System.Security.Policy;

namespace LSystem
{
    public class Rule
    {
        public string Find;
        public string Replace;

        public Rule(string inFind, string inReplace)
        {
            Find = inFind;
            Replace = inReplace;
        }
    }

    /// <summary>
    /// Simulates an L-System.
    /// Implemented according to http://www.cs.unm.edu/~joel/PaperFoldingFractal/L-system-rules.html
    /// </summary>
    public class LSystem
    {
        public readonly string Root;
        private List<Rule> rules = new List<Rule>();

        public int GenerationCount { get; private set; }
        public string CurrentGeneration { get; private set; }

        public Dictionary<string, string> Definitions { get; private set; }

        public LSystem(string inRoot)
        {
            CurrentGeneration = Root = inRoot;
            Definitions = new Dictionary<string, string>();
        }

        public void AddRule(string find, string replace)
        {
            rules.Add(new Rule(find, replace));
        }

        public string Simulate()
        {
            List<KeyValuePair<int, Rule>> rulePositions = new List<KeyValuePair<int, Rule>>();
            // Find all the current positions
            foreach(Rule rule in rules)
            {
                List<int> positions = AllIndexesOf(CurrentGeneration, rule.Find);
                foreach (int pos in positions)
                    rulePositions.Add(new KeyValuePair<int, Rule>(pos, rule));
            }
            rulePositions.Sort(compareRulePairs);

            string nextGeneration = CurrentGeneration;
            int replaceOffset = 0;
            foreach(KeyValuePair<int, Rule> rulePos in rulePositions)
            {
                int nextPos = rulePos.Key + replaceOffset;
                nextGeneration = nextGeneration.Substring(0, nextPos) + rulePos.Value.Replace + nextGeneration.Substring(nextPos + rulePos.Value.Find.Length);
                replaceOffset += rulePos.Value.Replace.Length - rulePos.Value.Find.Length;
            }
            CurrentGeneration = nextGeneration;
            GenerationCount++;
            return CurrentGeneration;
        }

        private int compareRulePairs(KeyValuePair<int, Rule> a, KeyValuePair<int, Rule> b)
        {
            return a.Key - b.Key;
        }

        /// <summary>
        /// From http://stackoverflow.com/a/2641383/1460422
        /// </summary>
        /// <param name="str"></param>
        /// <param name="value"></param>
        /// <returns></returns>
        private List<int> AllIndexesOf(string str, string value) {
            if (String.IsNullOrEmpty(value))
                throw new ArgumentException("the string to find may not be empty", "value");
            List<int> indexes = new List<int>();
            for (int index = 0;; index += value.Length) {
                index = str.IndexOf(value, index);
                if (index == -1)
                    return indexes;
                indexes.Add(index);
            }
        }

        public static LSystem FromFile(string filename)
        {
            StreamReader source = new StreamReader(filename);
            LSystem resultSystem = new LSystem(source.ReadLine());

            string nextLine = string.Empty;
            while (true) {
                nextLine = source.ReadLine();
                if (nextLine == null)
                    break;
                if (!nextLine.Contains("=") || nextLine.StartsWith("#") || nextLine.Trim().Length == 0)
                    continue;
                string[] parts = nextLine.Split(new char[]{'='}, 2);

                if(parts[0].StartsWith("!"))
                {
                    // This is a definition
                    resultSystem.Definitions.Add(parts[0].Trim('!'), parts[1]);
                }
                else
                {
                    resultSystem.AddRule(parts[0].Trim(), parts[1].Trim());
                }
            }
            return resultSystem;
        }
    }
}

There's quite a lot of code here, so I'll break it down. The most important bit is highlighted on lines 46-69. This Simulate() function takes the current generation and runs all the find and replace rules in parallel. A regular find and replace wouldn't work here because subsequent rules would pick up on new characters are were added when the previous rules added in a single generation.

Other important bits include the Rule class at the top, which represents a single find and replace rule, and the static FromFile() method. The FromFile() method loads a ruleset from a text file like this one, which produces a Dragon Curve:

f
f=f-h
h=f+h

....or this one, which produces a Sierpinski triangle, like the one above:

f
!angle=1.0472
h=f+h+f
f=h-f-h

The line beginning with an exclamation mark is a definition or a directive, which gives the turtle an instruction on how it should do something. They are stored in the string-to-string dictionary Definitions. The turtle class (which I'll show you in a moment) picks up on these and configures itself accordingly.

The turtle class I wrote is not as neat as the above. In fact it's kind of hacky. Here's what I ended up with:


using System;
using Cairo;
using System.Text.RegularExpressions;
using System.Resources;
using System.Collections.Generic;

namespace SimpleTurtle
{
    public class Area
    {
        public double X;
        public double Y;
        public double Width;
        public double Height;

        public Area(double inX, double inY, double inWidth, double inHeight)
        {
            X = inX;
            Y = inY;
            Width = inWidth;
            Height = inHeight;
        }
    }
    public class Turtle
    {
        private bool strictMode = false;
        private string commandQueue;

        private PointD position;
        private Area bounds;
        private double heading;
        private double headingStep;
        public double HeadingStep
        {
            get { return headingStep; }
            set { headingStep = value; }
        }
        private double movementStep ;
        public double MovementStep
        {
            get { return movementStep; }
            set { movementStep = value; }
        }

        public Turtle ()
        {
            Reset();
        }

        public void ApplyDefinitions(Dictionary<string, string> definitions)
        {
            foreach(KeyValuePair<string, string> definition in definitions)
            {
                switch(definition.Key.ToLower())
                {
                    case "angle":
                        HeadingStep = double.Parse(definition.Value);
                        break;
                }
            }
        }

        public bool Commands(string commandText)
        {
            // Remove all whitespace
            commandText = Regex.Replace(commandText, @"\s+", "");
            commandText = commandText.Replace("h", "f");

            string okCommands = "f+-";

            foreach(char ch in commandText)
            {
                if(okCommands.Contains(ch.ToString()))
                {
                    switch(ch)
                    {
                        case 'f':
                            Forwards();
                            break;
                        case '+':
                            Turn(false);
                            break;
                        case '-':
                            Turn(true);
                            break;
                        default:
                            if (strictMode) {
                                Console.WriteLine("The unexpected character '{0}' slipped through the net!", ch);
                                return false;
                            }
                            break;
                    }
                }
                else if(strictMode)
                {
                    Console.Error.WriteLine("Error: unexpected character '{0}'", ch);
                    return false;
                }
            }
            commandQueue += commandText;
            return true;
        }

        public void Draw(string filename, bool reset = true)
        {
            ImageSurface canvas = new ImageSurface(Format.ARGB32, (int)Math.Ceiling(bounds.Width + 10), (int)Math.Ceiling(bounds.Height + 10));
            Context context = new Context(canvas);
            PointD position = new PointD(-bounds.X + 5, -bounds.Y + 5);
            double heading = 0;

            context.LineWidth = 3;
            context.MoveTo(position);
            foreach(char ch in commandQueue)
            {
                switch(ch)
                {
                    case 'f':
                        PointD newPosition = new PointD(
                            position.X + movementStep * Math.Sin(heading),
                            position.Y + movementStep * Math.Cos(heading)
                        );
                        context.LineTo(newPosition);
                        position = newPosition;
                        break;
                    case '+':
                        heading += headingStep;
                        break;
                    case '-':
                        heading -= headingStep;
                        break;
                }
            }

            context.Stroke();
            canvas.WriteToPng(string.Format(filename));
            context.Dispose();
            canvas.Dispose();
            if(reset)
                Reset();
        }

        public void Forwards()
        {
            PointD newPosition = new PointD(
                position.X + MovementStep * Math.Sin(heading),
                position.Y + MovementStep * Math.Cos(heading)
            );

            if (newPosition.X > bounds.X + bounds.Width)
                bounds.Width += newPosition.X - position.X;
            if (newPosition.Y > bounds.Y + bounds.Height)
                bounds.Height += newPosition.Y - position.Y;
            if (newPosition.X < bounds.X)
            {
                bounds.X = newPosition.X;
                bounds.Width += position.X - newPosition.X;
            }
            if (newPosition.Y < bounds.Y)
            {
                bounds.Y = newPosition.Y;
                bounds.Height += position.Y - newPosition.Y;
            }

            position = newPosition;
        }

        public void Turn(bool anticlockwise = false)
        {
            if (!anticlockwise)
                heading += HeadingStep;
            else
                heading -= HeadingStep;
        }

        public void Reset()
        {
            commandQueue = string.Empty;
            position = new PointD(0, 0);
            bounds = new Area(position.X, position.Y, 1, 1);
            heading = 0;
            headingStep = Math.PI / 2;
            movementStep = 25;
        }
    }
}

Basically, the problem I found myself with was that I had to tell Cairo how large the canvas should be ahead of time, before I'd actually finished following all the commands that had been given. In order to get around this, I implemented the commands twice (bad practice, I know!).

The first implementation are the public interface methods, like Commands() (which processes a bunch of commands all at once), and the Forwards() and Turn() methods. The Commands() method follows a bunch of commands, using the private member variables at the top of the Turtle class to keep track of it's position. Each command it understands is then added to a store of valid commands ready for processing by the Draw() function (highlighted, lines 104-140). Whilst processing all these commands, the bounds of the area that the turtle has visited are tracked in the private bounds member variable (highlighted, line 30), which is of type Area (see the class at the top of the file).

The Draw() function creates a new Cairo canvas (lines 106-107) to the size of the bounds calculated previously (plus a bit extra of a border), and replays all the processed commands back on top of it. The final image is then dumped to disk.

After all that, we are missing one last piece of the puzzle: A bridge to connect the two together and provide an interface through which we can interact with the program. This is what I used:

using System;
using System.Collections.Generic;
using System.IO;

class Program
{
    public static void Main(string[] args)
    {
        if(args.Length < 2)
        {
            Console.WriteLine("Usage: ");
            Console.WriteLine(@"    ./Simpleturtle.exe <filename> <generations> [<stepCount> [<startingGeneration>]]");
            return;
        }

        int generations = int.Parse(args[1]);
        string filename = args[0];
        int movementStep = 25;
        if (args.Length > 2)
            movementStep = int.Parse(args[2]);
        int startingGeneration = 0;
        if (args.Length > 3)
            startingGeneration = int.Parse(args[3]);

        LSystem lsystem = LSystem.FromFile(filename);
        foreach(KeyValuePair<string, string> kvp in lsystem.Definitions)
        {
            Console.WriteLine("Setting {0} to {1}.", kvp.Key, kvp.Value);
        }
        Turtle turtle = new Turtle();

        for(int i = startingGeneration; i < generations; i++)
        {
            turtle.ApplyDefinitions(lsystem.Definitions);
            turtle.MovementStep = movementStep;
            turtle.Commands(lsystem.CurrentGeneration);
            turtle.Draw(string.Format("lsystem_generation_{0}.png", i));
            lsystem.Simulate();
            File.WriteAllText(string.Format("lsystem_generation_{0}.txt", i), lsystem.CurrentGeneration);
            Console.WriteLine("Generation {0} completed.", i);
        }
    }
}

There's nothing too interesting here, just a help message and a parameter reading code. If you put all of this code together, though, you can generate pretty pictures like the ones I showed you above.

This is only the beginning of what you can do with L Systems though. You could extend it to work in 3D, or give your turtle teleporting powers (look half way down) and draw some trees. you could even edit a few characters randomly after completing the main simulation to make every result slightly different. The possibilities are endless!

Sources and Further Reading

Art by Mythdael