## Stardust Blog

Archive

Mailing List Articles Atom Feed Comments Atom Feed Twitter Reddit Facebook

## Tag Cloud

3d 3d printing account algorithms android announcement architecture archives arduino artificial intelligence artix assembly async audio automation backups bash batch blog bookmarklet booting bug hunting c sharp c++ challenge chrome os code codepen coding conundrums coding conundrums evolved command line compilers compiling compression css dailyprogrammer data analysis debugging demystification distributed computing documentation downtime electronics email embedded systems encryption es6 features ethics event experiment external first impressions future game github github gist gitlab graphics hardware hardware meetup holiday holidays html html5 html5 canvas infrastructure interfaces internet interoperability io.js jabber jam javascript js bin labs learning library linux lora low level lua maintenance manjaro network networking nibriboard node.js operating systems own your code performance phd photos php pixelbot portable privacy problem solving programming problems project projects prolog protocol protocols pseudo 3d python reddit redis reference releases resource review rust searching secrets security series list server software sorting source code control statistics storage svg talks technical terminal textures thoughts three thing game three.js tool tutorial twitter ubuntu university update updates upgrade version control virtual reality virtualisation visual web website windows windows 10 xmpp xslt

## Summer Project Series List

At this point, it is basically the end of my summer project series - at least for a while while I start my PhD (more on that in a future post). To this end, I'm releasing the series list for it.

## Next Gen Search, Part 2: Pushing the limits

In the last part, we looked at how I built a new backend for storing inverted indexes for Pepperminty Wiki, which allows for partial index deserialisation and other nice features that boost performance considerably.

Since the last post, I've completed work on the new search system - though there are a few bits around the edges that I still want to touch up and do some more work on.

In this post though, I want to talk about how I generated test data to give my full-text search engine something to chew on. I've done this before for my markov chain program I wrote a while back, and that was so much fun I did it again for my search engine here.

After scratching my head for a bit to think of a data source I could use, I came up with the perfect plan. Ages ago I downloaded a Wikipedia dump - just the content pages in Wikitext markup. Why not use that?

As it turns out, it was a rather good idea. Some processing of said dump was required though to transform it into a format that Pepperminty Wiki can understand, though. Pepperminty Wiki stores pages on disk as flat text files in markdown, and indexes them in pageindex.json. If pageindex.json doesn't exist, then Pepperminty Wiki rebuilds it automagically by looking for content pages on disk.

This makes it easy to import batches of new pages into Pepperminty Wiki, so all we need to do is extract the wiki text, convert to markdown, and import! This ended up requiring a number of different separate steps though, so let's take it 1 at a time

First, we need a Wikipedia database dump in the XML format. These are available from dumps.wikimedia.org. There are many different ones available, but I suggest grabbing one that has a filename similar to enwiki-20180201-pages-articles.xml - i.e. just content pages - no revision history, user pages, or additional extras. I think the most recent one as of the time of posting is downloadable here - though I'd warn you that it's 15.3GiB in size! You can see a list of available dump dates for the English Wikipedia here.

Now that we've got our dump, let's extract the pages from it. This is nice and easy to do with wikiextractor on GitHub:

nice -n20 wikiextractor enwiki-20180201-pages-articles.xml --no_templates --html --keep_tables --lists --links --sections --json --output wikipages --compress --bytes 25M >progress.log 2>&1 

This will parse the dump and output a number of compressed files to the wikipages directory. These will have 1 JSON object per line, each containing information about a single page on Wikipedia - with page content pre-converted to HTML for us. The next step is to extract the page content and save it to a file with the correct name. This ended up being somewhat complicated, so I wrote a quick Node.js script to do the job:

#!/usr/bin/env node

const fs = require("fs");

if(!fs.existsSync("pages"))
fs.mkdirSync("pages", { mode: 0o755 });

// From https://stackoverflow.com/a/44195856/1460422
function html_unentities(encodedString) {
var translate_re = /&(nbsp|amp|quot|lt|gt);/g;
var translate = {
"nbsp":" ",
"amp" : "&",
"quot": "\"",
"lt"  : "<",
"gt"  : ">"
};
return encodedString.replace(translate_re, function(match, entity) {
return translate[entity];
}).replace(/&#(\d+);/gi, function(match, numStr) {
var num = parseInt(numStr, 10);
return String.fromCharCode(num);
});
}

const interface = readline.createInterface({
input: process.stdin,
//output: process.stdout
});

interface.on("line", (text) => {
const obj = JSON.parse(text);

fs.writeFileSync(pages/${obj.title.replace(/\//, "-")}.html, html_unentities(obj.text)); console.log(${obj.id}\t${obj.title}); }); This basically takes the stream of JSON object on the standard input, parses them, and saves the relevant content to disk. We can invoke it like so: bzcat path/to/*.bz2 | ./parse.js Don't forget to chmod +x parse.js if you get an error here. The other important thing about the above script ist hat we have to unescape the HTML entities (e.g. &gt;), because otherwise we'll have issues later with HTML conversation and page names will look odd. This is done by the html_unentities() function in the above script. This should result in a directory containing a large number of files - 1 file per content page. This is much better, but we're still not quite there yet. Wikipedia uses wiki markup (which we converted to HTML with wikiextractor) and Pepperminty Wiki uses Markdown - the 2 of which are, despite all their similarities, inherently incompatible. Thankfully, pandoc is capable of converting from HTML to markdown. Pandoc is great at this kind of thing - it uses an intermediate representation and allows you to convert almost any type of textual document format to any other format. Markdown to PDF, EPUB to plain text, ..... and HTML to markdown (just to name a few). It actually looks like it shares a number of features with traditional compilers like GCC. Anyway, let's use it to convert our folder full of wikitext files to a folder full of markdown: mkdir -p pages_md; find pages/ -type f -name "*.html" -print0 | nice -n20 xargs -P4 -0 -n1 -I{} sh -c 'filename="{}"; title="${filename##*/}"; title="${title%.*}"; pandoc --from "html" --to "markdown+backtick_code_blocks+pipe_tables+strikeout" "${filename}" -o "pages_md/${title}.md"; echo "${title}";';

_(See this on explainshell.com - doesn't include the nice -n20 due to a bug on their end)_

This looks complicated, but it really isn't. Let's break it down a bit:

find pages/ -type f -name "*.html" -print0

This finds all the HTML files that we want to convert to Markdown, and delimits the output with a NUL byte - i.e. 0x0. This makes the next step easier:

... | nice -n20 xargs -P4 -0 -n1 -I{} sh -c '....'

This pushes a new xargs instance into the background, which will execute 4 commands at a time. xargs executes a command for each line of input it receives. In our case, we're delimiting with NUL 0x0 instead though. We explicitly specify that we can 1 command per line of input though, as xargs tries to optimise and do command file1 file2 file3 instead.

The sh -c bit is starting a subshell, in which we execute a small wrapper script that then calls pandoc. This is of course inefficient, but I couldn't find any way around spawning a subshell in this instance.

filename="{}";
title="${filename##*/}"; title="${title%.*}";
pandoc --from "html" --to "markdown+backtick_code_blocks+pipe_tables+strikeout" "${filename}" -o "pages_md/${title}.md";
echo "${title}"; I've broken the sh -c subshell script down into multiple liens for readability. Simply put, it extracts the page title from the filename, converts the HTML to Markdown, and saves it to a new file in a different directory with the .md replacing the original .html extension. When you put all these components together, you get a script that converts a folder full of HTML files to Markdown. Just like with the markov chains extraction I mentioned at the beginning of this post, Bash and shell scripting really is all about lego bricks. This is due in part to the Unix philosophy: Make each program do one thing well. There is more to it, but this is the most important point to remember. Many of the core utilities you'll find on the terminal follow this way of thinking. There's 1 last thing we need to take care of before we have them in the right format though - we need to convert the [display text](page name) markdown-format links back into the Wikipedia [[internal link]] format that Pepperminty Wiki also uses. Thankfully, another command-line tool I know of called repren is well-suited to this: repren --from '$([^$]+)\]$([^):]+)$' --to '[[\1]]' pages_md/*.md It took some fiddling, but I got all the escaping figured out and the above converts back into the [[internal link]] format well enough. Now that we've got our folder full of markdown files, we need to extract a random portion of them to act as a test for Pepperminty Wiki - as the whole lot might be a bit much for it to handle (though if Pepperminty Wiki was capable of handling it all eventually that'd be awesome :D). Let's try 500 pages to start: find path/to/wikipages/ -type f -name "*.md" -print0 | shuf --zero-terminated | head -n500 --zero-terminated | xargs -0 -n1 -I{} cp "{}" . (See this on explainshell.com) This is another lego-brick style command. Let's break it down too: find path/to/wikipages/ -type f -name "*.md" -print0 This lists all the .md files in a directory, delimiting them with a NUL character, as before. It's better to do this than use ls, as find is explicitly designed to be machine-readable. .... | shuf --zero-terminated The shuf command randomly shuffles the input lines. In this case, we're telling it that the input is delimited by the NUL byte. .... | head -n500 --zero-terminated Similar deal here. head takes the top N lines of input, and discards the rest. .... | xargs -0 -n1 -I{} cp "{}" . Finally, xargs calls cp to copy the selected files to the current directory - which is, in this case, the root directory of my test Pepperminty Wiki instance. Since I'm curious, let's now find out roughly how many words we're dealing with here: cat data_test/*.md | wc --words 1593190  1.5 million words! That's a lot. I wonder how quickly we can search that? 24.8ms? Awesome! That's so much better than before. If you're wondering about new coat of paint in the screenshot - Pepperminty Wiki is getting dark theme, thanks to prefers-color-scheme :D I wonder what happens if we push it to 2K pages? This time we get ~120ms for 5.9M total words - wow! I wasn't expecting it to perform so well. At this scale, rebuilding the entire index is particularly costly - so if I was to push it even further it would make sense to implement an incremental approach that spreads the work over multiple requests, assuming I can't squeeze any more performance out the system as-is. The last thing I want to do here is make a rough estimate about time time complexity the search system as-is, given the data we have so far. This isn't particularly difficult to do. Given the results above, we can calculate that at 1.5M total words, an increase of ~60K total words results in an increase of 1ms of execution time. At 5.9M words, it's only ~49K words / ms of execution time - a drop of ~11K words / ms of execution time. From this, we can speculate that for every million words total added to a wiki, we can expect a ~2.5K words / ms of execution time drop - not bad! We'd need more data points to make any reasonable guess as to the Big-O complexity function that it conforms to. My guess would be something like$O(xN^2)$, where x is a constant between ~0.2 and 2. Maybe at some point I'll go to the trouble of running enough tests to calculate it, but with all the variables that affect the execution time (number of pages, distribution of words across pages, etc.), I'm not in any hurry to calculate it. If you'd like to do so, go ahead and comment below! Next time, I'll unveil the inner working of the STAS: my new search-term analysis system. Found this interesting? Got your own story about some cool code you've written to tell? Comment below! ## Powahroot: Client and Server-side routing in Javascript If I want to really understand something, I usually end up implementing it myself. This is the case with my latest library - powahroot, but also because I didn't really like the way any of the alternatives functioned because I'm picky. Originally I wrote it for this project (although it's actually for a little satellite project that isn't open-source unfortunately - maybe at some point in the future!) - but I liked it so much that I decided that I had to turn it into a full library that I could share here. In short, a routing framework helps you get requests handled in the right places in your application. I've actually blogged about this before, so I'd recommend you go and read that post first before continuing with this one. For all the similarities between the server side (as mentioned in my earlier post) and the client side, the 2 environments are different enough that they warrant having 2 distinctly separate routers. In powahroot, I provide both a ServerRouter and a ClientRouter. The ServerRouter is designed to handle Node.js HTTP request and response objects. It provides shortcut methods .get(), .post(), and others to quickly create routes for different request types - and also supports middleware to enable logical separation of authentication, request processing, and response generation. The ClientRouter, on the other hand, is essentially a stripped-down version of the ServerRouter that's tailored to functioning in a browser environment. It doesn't support middleware (yet?), but it does support the pushstate that's part of the History API. I've also published it on npm, so you can install it like this: npm install --save powahroot Then you can use it like this: # On the server import ServerRouter from 'powahroot/Server.mjs'; // .... const router = new ServerRouter(); router.on_all(async (context, next) => { console.debug(context.url); await next()}) router.get("/files/::filepath", (context, _next) => context.send.plain(200, You requested${context.params.filepath}));
// .....
# On the client
import ClientRouter from 'powahroot/Client.mjs';

// ....

const router = new ClientRouter({
// Options object. Default settings:
verbose: false, // Whether to be verbose in console.log() messages
listen_pushstate: true, // Whether to react to browser pushstate events (excluding those generated by powahroot itself, because that would cause an infinite loop :P)
});

As you can see, powahroot uses ES6 Modules, which makes it easy to split up your code into separate independently-operating sections.

In addition, I've also generated some documentation with the documentation tool on npm. It details the API available to you, and should serve as a good reference when using the library.

You can find that here: https://starbeamrainbowlabs.com/code/powahroot/docs/

It's automatically updated via continuous integration and continuous deployment, which I really do need to get around to blogging about (I've spent a significant amount of time setting up the base system upon which powahroot's CI and CD works. In short I use Laminar CI and a GitHub Webhook, but there's a lot of complicated details).

Found this interesting? Used it in your own project? Got an idea to improve powahroot? Comment below!

## Easy Ansi Escape Code in C♯ and Javascript

I've been really rather ill over the weekend, so I wasn't able to write a post when I wanted to. I'm starting to recover now though, so I thought I'd write a quick post about a class I've written in C♯ (and later ported to Javascript via an ES6 Module) that makes using VT100 Ansi escape codes easy.

Such codes are really useful for changing the text colour & background in a terminal or console, and for moving the cursor around to draw fancy UI elements with just text.

I initially wrote it a few months ago as part of an ACW (Assessed CourseWork), but out of an abundance of caution I've held back on open-sourcing this particular code fragment that drove the code I submitted for that assessment.

Now that the assessment is done & marked though (and I've used this class in a few projects since), I feel comfortable with publishing the code I've written publically. Here it is in C♯:

(Can't see the above? Try this direct link instead)

In short, it's a static class that contains a bunch of Ansi escape codes that you can include in strings that you send out via Console.WriteLine() - no special fiddling required! Though, if you're on Windows, don't forget you need to enable escape sequences first. Here's an example that uses the C♯ class:

// ....
catch(Exception error) {
Console.WriteLine($"{Ansi.FRed}Error: Oops! Something went wrong."); Console.WriteLine($"    {error.Message}{Ansi.Reset}");
return;
}
// ....

Of course, you can get as elaborate as you like. In addition, if you need to disable all escape sequence output for some reason (e.g. you know you're writing to a log file), simply set Ansi.Enabled to false.

Some time later, I found myself needing this class in Javascript (for a Node.js application I'm pretty sure) quite badly (reason - I might blog about it soon :P). To that end, I wound up porting it from C♯ to Javascript. The process wasn't actually that painful - probably because it's a fairly small and simple class.

With porting complete, I intend to keep both versions in sync with each other as I add more features - but no promises :P

Here's the Javascript version:

(Can't see the above? Try this direct link instead)

The Javascript version isn't actually a static class as like the C♯ version, due to the way ES6 modules work. In the future, I'd like to do some science and properly figure out the ins and outs of ES6 modules and refactor this into the JS equivalent of a globally static class (or a Singleton, though in JS we don't have to worry about thread safety 'cause the only way to communicate between threads (also) in JS is through messaging).

Still, it works well enough for my purposes for now.

Found this interesting? Got an improvement? Just want to say hi? Comment below!

## Building Javascript (and other things) with Rollup

Hey, another blog post!

Recently I've been somewhat distracted by another project which has been most interesting. I've learnt a bunch of things (including getting started with LeafletJS and Chart.JS), but most notably I've been experimenting with another Javascript build system.

I've used quite a few build systems over the years. For the uninitiated, their primary purpose is to turn lots of separate source files into as few output files as possible. This is important with things that run in the browser, because the browser has to download everything before it can execute it. Fewer files and less data means a speedier application - so everyone wins.

I've been rather unhappy with the build systems I've used so far:

• Gulp.js was really heavy and required a lot of configuration in order to get things moving.
• Browserify was great, but I found the feature set lacking and ended up moving on because it couldn't do what I wanted it to.
• Webpack was ok, but it also has a huge footprint and was super slow too.

One thing led to another, and after trying a few other systems (I can't remember their names, but one begin with P and has a taped-up cardboard box as a logo) I found Rollup (I think Webpack requires Rollup as a dependency under the hood, if I'm not mistaken).

Straight away, I found that Rollup was much faster than Webpack. It also requires minimal configuration (which doesn't need changing every time you create a new source file) - which allows me to focus on the code I'm writing instead of battling with the build system to get it to do what I want.

It also supports plugins - and with a bunch available on npm, there's no need to write your own plugin to perform most common tasks. For my recent project I linked to above, I was able to put a bunch of plugins together with a configuration file - and, with a bit of fiddling around, I've got it to build both my Javascript and my CSS (via a postcss integration plugin) and putt he result in a build output folder that I've added to my .gitignore file.

I ended up using the following plugins:

• rollup-plugin-node-resolve - essential. Allows import statements to be automatically resolved to their respective files.
• rollup-plugin-commonjs - An addition to the above, if I recall correctly
• rollup-plugin-postcss - Provides integration with Postcss. I've found Postcss with be an effective solution to the awkward issue of CSS
• rollup-plugin-terser - Uses Terser to minify modern Javascript. I didn't know this existed until I found the Rollup plugin - it's a fork of and the successor to UglifyJS2, which only supports ES5 syntax.

....and the following Postcss plugins:

• postcss-import - Inlines @import statements into a single CSS source file, allowing you to split your CSS up into multiple files (finally! :P)
• postcss-copy - after trying 5 different alternatives, I found this to be the most effective at automatically copying images etc. to the build output folder - potentially renaming them - and altering the corresponding references in the CSS source code

The definition of plugins is done in a Javascript configuration file in a array - making it clear that your code goes through a pipeline of plugins that transform step-by-step to form the final output - creating and maintaining a source map along the way (very handy for debugging code in the browser. It's basically a file that lets the browser reverse-engineer the final output to present you with the original source code again whilst debugging) - which works flawlessly in my browser (the same can't be said for Webpack - I had lots of issues with the generated source maps there).

I ended up with a config file like this:

(Can't see the file above? Try viewing it directly.)

In short, the pipeline goes something like this:

(Source file; Created in Draw.io)

Actually calling Rollup is done in a Bash-based general build task file. Rollup is installed locally to the project (npm install --save rollup), so I call it like this:

node_modules/.bin/rollup --sourcemap --config rollup.config.js;

I should probably make a blog post about that build task system I wrote and why/how I made it, but that's a topic for another time :P

I'll probably end up tweaking the setup I've described here a bit in the future, but for now it does everything I'd like it to :-)

Found this interesting? Got a different way of doing it? Still confused? Comment below!

## Generating word searches for fun and profit

(Above: A Word search generated with the tool below)

A little while ago I was asked about generating a wordsearch in a custom shape. I thought to myself "someone has to have built this before...", and while I was right to an extent, I couldn't find one that let you use any shape you liked.

This, of course, was unacceptable! You've probably guessed it by now, but I went ahead and wrote my own :P

While I wrote it a little while ago, I apparently never got around to posting about it on here.

In short, it works by using an image you drop into the designated area on the page as the shape the word search should take. Each pixel is a single cell of the word search - with the alpha channel representing whether or not a character is allowed to be placed there (transparent means that it can't contain a character, and opaque means that it can).

Creating such an image is simple. Personally, I recommend Piskel or GIMP for this purpose.

Once done, you can start building a wordlist in the wordlist box at the right-hand-side. It should rebuild the word search as soon as you click out of the box. If it doesn't, then you've found a bug! Please report it here.

With the word search generated, you can use the Question Sheet and Answer Sheet links to open printable versions for export.

You can find my word search generator here:

I've generated a word search of the current tags in the tag cloud on this blog too: Question Sheet [50.3KiB], Answer Sheet [285.6KiB]

The most complicated part of this was probably the logistics behind rude word removal. Thankfully, I did't have to find and maintain such a list of words, as the futility npm package does this for me, but algorithmically guaranteeing that by censoring 1 rude word another is not accidentally created in another direction is a nasty problem.

If you're interested in a more technical breakdown of one (or several!) particular aspects of this - let me know! While writing about all of it would probably make for an awfully long post, a specific aspect or two should be more manageable.

In the future, I'll probably revisit this and add additional features to it, such as the ability to restrict which directions words are placed in, for example. If you've got a suggestion of your own, open an issue (or even better, open a pull request :D)!

## Converting my timetable to ical with Node.JS and Nightmare

(Source: Taken by me!)

My University timetable is a nightmare. I either have to use a terrible custom app for my phone, or an awkwardly-built website that feels like it's at least 10 years old!

Thankfully, it's not all doom and gloom. For a number of years now, I've been maintaining a Node.JS-based converter script that automatically pulls said timetable down from the JSON backend of the app - thanks to a friend who reverse-engineered said app. It then exports it as a .ical file that I can upload to my server & subscribe to in my pre-existing calendar.

Unfortunately, said backend changed quite dramatically recently, and broke my script. With the only alternative being the annoying timetable website that really don't like being scraped.

Where there's a will, there's a way though. Not to be deterred, I gave it a nightmare of my own: a scraper written with Nightmare.JS - a Node.JS library that acts, essentially, as a scriptable web-browser!

While the library has some kinks (especially with .wait("selector")), it worked well enough for me to implement a scraper that pulled down my timetable in HTML form, which I then proceeded to parse with cheerio.

The code is open-source (find it here!) - and as of this week I've updated it to work with the new update to the timetabling system this semester. A further update will be needed in early December time, which I'll also be pushing to the repository.

The README of the repository should contain adequate instructions for getting it running yourself, but if not, please open an issue!

Note that I am not responsible for anything that happens as a result of using this script! I would strongly recommend setting up the secure storage of your password if you intend to automate it. I've just written this to solve a problem in order to ensure that I can actually get to my lectures on time - and not an hour late or on the wrong week because I've misread the timetable (again)!

In the future, I'd like to experiment with other scriptable web-browser frameworks to compare them with my experiences with NightmareJS.

Found this interesting? Found a better way to do this? Comment below!

## What I've learnt from #LOWREZJAM 2018

(Above: The official LOWREZJAM 2018 logo)

While I didn't manage to finish my submission in time for #LOWREZJAM, I've had ton of fun building it - and learnt a lot too! This post is a summary of my thoughts and the things I've learnt.

Firstly, completing a game for a game jam requires experience - and preparation - I think. As this was my first game jam, I didn't know how I should prepare for such a thing in order to finish my submission successfully - in subsequent jams I now know what it is that I need to prepare in advance. This mainly includes learning the technology(ies) and libraries I intend to use.

(Sorry, but this image is © Starbeamrainbowlabs 2018)

This time around, I went with building my own game engine. While this added a lot of extra work (which probably wasn't helpful) and contributed to the architectural issues that prevented me from finishing my submission in time, I found it a valuable experience in learning how a game engine is put together and how it works. This information, I feel, will help inform my choice in which other game engine I learn next. I've got my eye on Crafty.JS, Phaser (v3 vs the community edition, anyone? I'm confused), and perhaps MelonJS - your thoughts on the matter are greatly appreciated! I may blog my choice after exploring the available options at a later date.

I've also got some ideas as to how to improve my Sprite Packer, which I wrote a while ago. While it proved very useful in the jam, I think it could benefit from some tweaks to the algorithm to ensure it packs things in neater than it currently does.

I'd also like to investigate other things such as QuadTrees (used to quickly search an area for nearby objects - and also to compress images, apparently) and Particle Systems, as they look really interesting.

I'd also like to talk a little about my motivation behind participating in this game jam. For me, it's a couple of reasons. Not only does it help me figure out what I need to work on (see above), but it's always been a personal dream of mine to make a game that tells a story. I find games to be an art form for communicating ideas, stories, and emotions in a interactive way that simply isn't possible with other media formats, such as films, for instance (films are great too though).

While I'll never be able to make a game of triple-A quality on my own, many developers have shown time and again that you don't need an army of game developers, artists, musicians, and more to make a great game.

As my art-related skills aren't amazing, I decided on participating in #LOWREZJAM, because lower-resolution sprites are much easier to draw (though I've had lots of help from the same artist who helped me design my website here). Coupled with the 2-week timespan (I can't even begin to fathom building a game in 4 hours - let alone the 24 hours from 3 Thing Game!), it was the perfect opportunity to participate and improve my skills. While I haven't managed to complete my entry this year, I'll certainly be back again for more next year!

(Sorry, but this image is © Starbeamrainbowlabs 2018)

Found this interesting? Had your own experience with a game jam? Comment below!

## Acorn Validator

Edit: Corrected title and a bunch of grammatical mistakes. I typed this out on my phone with Monospace - and I seems that my keyboard (and Phone-Laptop Bluetooth connection!) leave something to be desired.....

Over the last week, I've been hard at work on an entry for #LOWREZJAM. While it's not finished yet (submission is on Thursday), I've found some time to write up a quick blog post about a particular aspect of it. Of course, I'll be blogging about the rest of it later once it's finished :D

The history of my entry is somewhat.... complicated. Originally, I started work on it a few months back as an independent project, but due to time constraints and other issues I was unable to get very far with it. Once I discovered #LOWREZJAM, I made the decision to throw away the code I had written and start again from (almost) scratch.

It is for this reason that I have a Javascript validator script lying around. I discovered when writing it originally that my editor Atom didn't have syntax validation support for Javascript. While there are extensions that do the job, it looked like a complicated business setting one up for just syntax checking (I don't want your code style guideline suggestions! I have my own style!). To this end, I wrote myself a quick bash script to automatically check the syntax of all my javascript files that I can then include as a build step - just before webpack.

Over the time I've been working on my #LOWREZJAM entry here, I've been tweaking and improving it - and thought I'd share it here. In the future, I'm considering turning it into a linting provider for the Atom editor I mentioned above (it depends on how complicated that is, and how good the documentation is to help me understand the process).

The script (which can be found in full at the bottom of this post), has 2 modes of operation. In the first mode, it acts as a co-ordinator process that starts all the sub-processes that validate the javascript. In the second mode, it validates a single file - outputting the result to the standard output and also logging any errors in a special place that the co-ordinator process can find them later.

It decides which mode to operate in based on whether it recieves an argument telling it which file to validate:

if [ "$1" != "" ]; then validate_file "$1" "$2"; exit$?;
fi

If it detects an argument, then it calls the validate_file function and exits with the returned exit code.

If not, then the script continues into co-ordinator mode. In this mode it chains a bunch of commands together like lego bricks to start subprocesses to validate all of the javascript files it can find a fast as possible - acorn, the validator itself, can only check one file as a time it would appear. It does this like so:

find . -not -path "./node_modules/" -not -path "./dist/" | grep -iP '\.mjs$' | xargs -n1 -I{} -P32$0 "{}" "${counter_dirname}"; This looks complicated, but it can be broken down into smaller, easy-to-understand chunks. explainshell.com is rather good at demonstrating this. Particularly of note here is the $0. This variable holds the path to the currently executing script - allowing the co-ordinator to call itself in validator mode.

The validator function itself is also quite simple. In short, it runs the validator, storing the result in a variable. It then also saves the exit code for later analysis. Once done, it outputs to the standard output, and then also outputs the validator's output - but only is there was an error to keep things neat and tidy. Finally, if there was an error, it outputs a file to a temporary directory (whose name is determined by the co-ordinator and passed to sub-processes via the 2nd argument) with a name of its PID (the content doesn't matter - I just output 1, but anything will do). This allows the co-ordinator to count the number of errors that the subprocesses encounter, without having to deal with complicated locks arising from updating the value stored in a single file. Here's that in bash:

counter_dirname=$2; # ....... # Use /dev/shm here since apparently while is in a subshell, so it can't modify variables in the main program O.o if ! [ "${validate_exit_code}" -eq 0 ]; then
echo 1 &gt;"${counter_dirname}/$$"; fi Once all the subprocesses have finished up, the co-ordinator counts up all the errors and outputs the total at the bottom: error_count=(ls {counter_dirname} | wc -l); echo echo Errors: error_count echo  Finally, the co-ordinator cleans up after the subprocesses, and exits with the appropriate error code. This last bit is important for automation, as a non-zero exit code tells the patent process that it failed. My build script (which uses my lantern build engine, which deserves a post of its own) picks up on this and halts the build if any errors were found. rm -rf "{counter_dirname}"; if [ {error_count} -ne 0 ]; then exit 1; fi exit 0; That's about all there is to it! The complete code can be found at the bottom of this post. To use it, you'll need to run npm install acorn in the directory that you save it to. I've done my best to optimize it - it can process a dozen or so files in ~1 second - but I think I can do much better if I rewrite it in Node.JS - as I can eliminate the subprocesses by calling the acorn API directly (it's a Node.JS library), rather than spawning many subprocesses via the CLI. Found this useful? Got a better solution? Comment below! #!/usr/bin/env sh validate_file() { filename=1; counter_dirname=2; validate_result=(node_modules/.bin/acorn --silent --allow-hash-bang --ecma9 --module filename 2&gt;&amp;1); validate_exit_code=?; validate_output=([ {validate_exit_code} -eq 0 ] &amp;&amp; echo ok || echo {validate_result}); echo "{filename}: {validate_output}"; # Use /dev/shm here since apparently while is in a subshell, so it can't modify variables in the main program O.o if ! [ "{validate_exit_code}" -eq 0 ]; then echo 1 &gt;"{counter_dirname}/$$"; fi } if [ "$1" != "" ]; then
validate_file "$1" "$2";
exit $?; fi counter_dirname=$(mktemp -d -p /dev/shm/ -t acorn-validator.XXXXXXXXX.tmp);
# Parallelisation trick from https://stackoverflow.com/a/33058618/1460422
# Updated to use xargs
find . -not -path "./node_modules/" -not -path "./dist/" | grep -iP '\.mjs$' | xargs -n1 -I{} -P32$0 "{}" "${counter_dirname}"; error_count=$(ls ${counter_dirname} | wc -l); echo echo Errors:$error_count
echo

rm -rf "${counter_dirname}"; if [${error_count} -ne 0 ]; then
exit 1;
fi

exit 0;

## Routers: Essential, everywhere, and yet exasperatingly elusive

Now that I've finished my University work for the semester (though do have a few loose ends left to tie up), I've got some time on my hands to do a bunch of experimenting that I haven't had the time for earlier in the year.

In this case, it's been tracking down an HTTP router that I used a few years ago. I've experimented with a few now (find-my-way, micro-http-router, and rill) - but all of them few something wrong with them, or feel too opinionated for my taste.

I'm getting slightly ahead of myself though. What's this router you speak of, and why is it so important? Well, it call comes down to application design. When using PHP, you can, to some extent, split your application up by having multiple files (though I would recommend filtering everything through a master index.php). In Node.JS, which I've been playing around with again recently, that's not really possible.

Unlike PHP, which gets requests handed to it from a web server like Nginx via CGI (Common Gateway Interface), Node.JS is the server. You can set up your very own HTTP server listening on port 9898 like this:

import http from 'http';
const http_server = http.createServer((request, response) => {
});
response.end("Hello, world!");
}).listen(9898, () => console.log("Listening on pot 9898"));

This poses a problem. How do we know what the client requested? Well, there's the request object for that - and I'm sure you can guess what the response object is for - but the other question that remains is how to we figure out which bit of code to call to send the client the correct response?

That's where a request router comes in handy. They come in all shapes and sizes - ranging from a bare-bones router to a full-scale framework - but their basic function is the same: to route a client's request to the right place. For example, a router might work a little bit like this:

import http from 'http';
import Router from 'my-awesome-router-library';

// ....

const router = new Router();

const http_server = http.createServer(router.handler()).listen(20202);

Pretty simple, right? This way, every route can lead to a different function, and each of those functions can be in a separate file! Very cool. It all makes for a nice and neat way to structure one's application, preventing any issues relating to any one file getting too big - whilst simultaneously keeping everything orderly and in its own place.

Except when you're picky like me and you can't find a router you like, of course. I've got some pretty specific requirements. For one, I want something flexible and unopinionated enough that I can do my own thing without it getting in the way. For another, I'd like first-class support for middleware.

What's middleware you ask? Well, I've only just discovered it recently, but I can already tell that's its a very powerful method of structuring more complex applications - and devastatingly dangerous if used incorrectly (the spaghetti is real).

Basically, the endpoint of a route might parse some data that a client has sent it, and maybe authenticate the request against a backend. Perhaps a specific environment needs to be set up in order for a request to be fulfilled.

While we could do these things in the end route, it would clutter up the code in the end route, and we'd likely have more boilerplate, parsing, and environment setup code than we have actual application logic! The solution here is middleware. Think of it as an onion, with the final route application logic in the middle, and the parsing, logging, and error handling code as the layers on the outside.

In order to reach the application logic at the centre, an incoming request must first make its way through all the layers of middleware that are in the way. Similarly, it must also churn through the layers of middleware in order to get out again. We could represent this in code like so:

// Middleware that runs for every request
router.use(middleware_error_handler);
router.use(middleware_request_logger);

// Decode all post data with middleware
// This won't run for GET / HEAD / PUT / etc. requests - only POST requests
router.post(middleware_decode_post_data);

// For GET requestsin under /inbox, run some middleware
router.get("/inbox", middleware_setup_user_area);

// Endpoint routes
// These function just like middleware too (i.e. we could
// pass the request through to another layer if we wanted
// to), but that don't lead anywhere else, so it's probably
// better if we keep them separate
router.any("/honeypot", route_spambot_trap);

router.post("/login", route_do_login);

Quite a neat way of looking at it, right? Lets take a look at some example middleware for our fictional router:

async function middleware_catch_errors(context, next) {
try {
await next();
} catch(error) {
console.error(error.stack);
"content-type": "text/plain"
});
// todo make this fancier
context.response.end("Ouch! The server encountered an error and couldn't respond to your request. Please contact bob at bob@bobsrockets.com!");
}
}

See that next() call there? That function call there causes the application to enter the next layer of middleware. We can have as many of these layers as we like - but don't go crazy! It'll cause you problems later whilst debugging.....

What I've shown here is actually very similar to the rill` framework - it just has a bunch of extras tagged on that I don't like - along with some annoying limitations when it comes to defining routes.

To that end, I think I'll end up writing my own router, since none of the ones I've found will do the job just right. It kinda fits with the spirit of the project that this is for, too - teaching myself new things that I didn't know before.

If you're curious as to how a Node.JS application is going to fit in with a custom HTTP + WebSockets server written in C♯, then the answer is a user management panel. I'm not totally sure where this is going myself - I'll see where I end up! After all, with my track record, you're bound to find another post or three showing up on here again some time soon.

Until then, Goodnight!

Found this useful? Still got questions? Comment below!

Art by Mythdael