## Stardust Blog

Archive

Mailing List Articles Atom Feed Comments Atom Feed Twitter Reddit Facebook

## Tag Cloud

3d account algorithms android announcement architecture archives arduino artificial intelligence artix assembly async audio automation backups bash batch blog bookmarklet booting bug hunting c sharp c++ challenge chrome os code codepen coding conundrums coding conundrums evolved command line compilers compiling compression css dailyprogrammer debugging demystification distributed computing documentation downtime electronics email embedded systems encryption es6 features event experiment external first impressions future game github github gist gitlab graphics hardware hardware meetup holiday holidays html html5 html5 canvas infrastructure interfaces internet interoperability io.js jabber jam javascript js bin labs learning library linux lora low level lua maintenance manjaro network networking nibriboard node.js operating systems performance photos php pixelbot portable privacy problem solving programming problems projects prolog protocol protocols pseudo 3d python reddit redis reference releases resource review rust searching secrets security series list server software sorting source code control statistics storage svg technical terminal textures three thing game three.js tool tutorial twitter ubuntu university update updates upgrade version control virtual reality virtualisation visual web website windows windows 10 xmpp xslt

## Animated PNG for all!

I recently discovered that Animated PNGs are now supported by most major browsers:

I stumbled across the concept of an animated PNG a number of years ago (on caniuse.com actually if I remember right!), but at the time browser support was very bad (read: non-existent :P) - so I moved on to other things.

I ended up re-discovering it a few weeks ago (also through caniuse.com!), and since browser support is so much better now I decided that I just had to play around with it.

It hasn't disappointed. Traditional animated GIFs (Graphics Interchange Format for the curious) are limited to 256 colours, have limited transparency support (a pixel is either transparent, or it isn't - there's no translucency support), and don't compress terribly well.

Animated PNGs (Portable Network Graphics), on the other hand, let you enjoy all the features of a regular PNG (as many colours as you want, translucent pixels, etc.) while also supporting animation, and compressing better as well! Best of all, if a browser doesn't support the animated PNG standard, they will just see and render the first frame as a regular PNG and silently ignore the rest.

Let's take it out for a spin. For my test, I took an image and created a 'panning' animation from one side of it to the other. Here's the image I've used:

(Credit: The background on the download page for Mozilla's Firefox Nightly builds. It isn't available on the original source website anymore (and I've lost the link), but can still be found on various wallpaper websites.)

The first task is to generate the frames from the original image. I wrote a quick shell script for this purpose. Firstly, I defined a bunch of variables:

#!/usr/bin/env bash
set -e; # Crash if we hit an error

input_file="Input.png";
output_file="Output.apng"

# The maximum number of frames
max_frame=32;
end_x=960; end_y=450;
start_x=0; start_y=450;

crop_size_x=960; crop_size_y=540;

mkdir -p ./frames;
Variable Meaning
input_file The input file to generate frames from
output_file The file to write the animated png to
max_frame The number of frames (plus 1) to generate
start_x The starting position to pan from on the x axis
start_y The starting position to pan from on the y axis
end_x The ending position to pan to on the x axis
end_y The ending position to pan to on the y axis
crop_size_x The width of the cropped frames
crop_size_y The height of the cropped frames

It's worth noting here that it's probably a good idea to implement a proper CLI to this script at this point, but since it's currently only a one-off script I didn't bother. Perhaps in the future I'll come back and improve it if I find myself needing it again for some other purpose.

With the parameters set up (and a temporary directory created - note that you should use mktemp -d instead of the approach I've taken here!), we can then use a for loop to repeatedly call ImageMagick to crop the input image to generate our frames. This won't run in parallel unfortunately, but since it's only a few frames it shouldn't take too long to render. This is only a quick shell script after all :P


for ((frame=0; frame <= "${max_frame}"; frame++)); do this_x="$(calc -p "${start_x}+((${end_x}-${start_x})*(${frame}/${max_frame}))")"; this_y="$(calc -p "${start_y}+((${end_y}-${start_y})*(${frame}/${max_frame}))")"; percent="$(calc -p "round((${frame}/${max_frame})*100, 2)")";

convert "${input_file}" -crop "${crop_size_x}x${crop_size_y}+${this_x}+${this_y}" "frames/Frame-$(printf "%02d" "${frame}").jpeg"; echo -ne "${frame} / ${max_frame} (${percent}%)          \r";
done
echo ""; # Move to the line after the progress indicator

This looks complicated, but it really isn't. The for loop iterates over each of the frame numbers. We do some maths to calculate the (x, y) co-ordinates of the frame we want to extract, and then get ImageMagick's convert command to do the cropping for us. Finally we write a quick progress indicator out to stdout (\r resets the cursor to the beginning of the line, but doesn't go down to the next one).

The maths there is probably better represented like this:

$this_x=start_x+((end_x-start_x)*\frac{currentframe}{max_frame})$

$this_y=start_y+((end_y-start_y)*\frac{currentframe}{max_frame})$

Much better :-) In short, we convert the current frame number into a percentage (between 0 and 1) of how far through the animation we are and use that as a multiplier to find the distance between the starting and ending points.

I use the calc command-line tool (in the apcalc package on Ubuntu) here to do the calculations, because the bash built-in result=$(((4 + 5))) only supports integer-based maths, and I wanted floating-point support here. With the frames generated, we only need to stitch them together to make the final animated png. Unfortunately, an easy-to-use tool does not yet exist (like gifsicle for GIFs) - so we'll have to make-do with ffpmeg (which, while very powerful, has a rather confusing CLI!). Here's how I stitched the frames together: ffmpeg -r 10 -i frames/Frame-%02d.jpeg -plays 0 "${output_file}";
• -r - The frame rate
• -i - The input filename
• -plays - The number of times to loop (0 = infinite; defaults to no looping if omitted)
• "${output_file}" - The output file Here's the final result: (Filesize: ~2.98MiB) Of course, it'd be cool to compare it to a traditional animated gif. Let's do that too! First, we'll need to convert the frames to gif (gifsicle, our tool of choice, doesn't support anything other than GIFs as far as I'm aware): mogrify -format gif frames/*.jpeg Easy-peasy! mogrify is another tool from ImageMagick that makes such in-place conversions trivial. Note that the frames themselves are stored as JPEGs because I was experiencing an issue whereby each of the frames apparently had a slightly different colour palette, and ffmpeg wasn't smart enough to correct for this - choosing instead to crash. With the frames converted, we can make our animated GIF like so: gifsicle --optimize --colors=256 --loopcount=10 --delay=10 frames/*.gif >Output.gif; Lastly, we probably want to delete the intermediate frames: rm -r ./frames Here's the animated GIF version: (Filesize: ~3.28MiB) Woah! That's much bigger. (Generated (and then extracted & edited with the Firefox developer tools) from Meta-Chart) It's ~9.7% bigger in fact! Though not a crazy amount, smaller resulting files are always good. I suspect that this effect will stack the more frames you have. Others have tested this too, finding pretty similar results to those that I've found here - though it does of course depend on your specific scenario. With that observation, I'll end this blog post. The next time you think about inserting an animation into a web page or chat window, consider making it an Animated PNG. Found this interesting? Found a cool CLI tool for manipulating APNGs? Having trouble? Comment below! ### Sources and Further Reading ## Easy Node.JS Dependencies Updates Once you've had a project around for a while, it's inevitable that dependency updates will become available. Unfortunately, npm (the Node Package Manager), while excellent at everything else, is completely terrible at notifying you about updates. The solution to this is, of course, to use an external tool. Personally, I use npm-check, which is also installable via npm. It shows you a list of updates to your project's dependencies, like so: (Can't see the above? View it directly on asciinema.org) It even supports the packages that you've install globally too, which no other tool appears to do as far as I can tell (although it does appear to miss some packages, such as npm and itself). To install it, simply do this: sudo npm install --global npm-check Once done, you can then use it like this: # List updates, but don't install them npm-check # Interactively choose the updates to install npm-check -u # Interactively check the globally-installed packages sudo npm-check -gu The tool also checks to see which of the dependencies are actually used, and prompts you to check the dependencies it think you're not using (it doesn't always get it right, so check carefully yourself before removing!). There's an argument to disable this behaviour: npm-check --skip-unused Speaking of npm package dependencies, the other big issue is security vulnerabilities. GitHub have recently started giving maintainers of projects notifications about security vulnerabilities in their dependencies, which I think is a brilliant idea. Actually fixing said vulnerabilities is a whole other issue though. If you don't want to update all the dependencies of a project to the latest version (perhaps you're just doing a one-off contribution to a project or aren't very familiar with the codebase yet), there's another tool - this time built-in to npm - to help out - the npm audit subcommand. # Check for reported security issues in your dependencies npm audit # Attempt to fix said issues by updating packages *just* enough npm audit fix (Can't see the above? View it directly on asciinema.org) This helps out a ton with contributing to other projects. Issues arise when the vulnerabilities are not in packages you directly depend on, but instead in packages that you indirectly depend on via dependencies of the packages you've installed. Thankfully the vulnerabilities in the above can all be traced back to development dependencies (and aren't essential for Peppermint Wiki itself), but it's rather troubling that I can't install updated packages because the packages I depend on haven't updated their dependencies. I guess I'll be sending some pull requests to update some dependencies! To help me in this, the output of npm audit even displays the dependency graph of why a package is installed. If this isn't enough though, there's always the npm-why which, given a package name, will figure out why it's installed. Found this interesting? Got a better solution? Comment below! ## Generating Atom 1.0 Feeds with PHP (the proper way) I've generated Atom feeds in PHP before, but recently I went on the hunt to discover if PHP has something like C♯'s XMLWriter class - and it turns out it does! Although poorly documented (probably why I didn't find it in the first place :P), it's actually quite logical and easy to pick up. To this end, I thought I'd blog about how I used to write the Atom 1.0 feed generator for recent changes on Pepperminty Wiki that I've recently implemented (coming soon in the next release!), as it's so much cleaner than atom.gen.php that I blogged about before! It's safer too - as all the escaping is handled automatically by PHP - so there's no risk of an injection attack because I forgot to escape a character in my library code. It ends up being a bit verbose, but a few helper methods (or a wrapper class?) should alleviate this nicely - I might refactor it at a later date. To begin, we need to create an instance of the aptly-named XMLLWriter class. It's probable that you'll need the php-xml package installed in order to use this. $xml = new XMLWriter();
$xml->openMemory();$xml->setIndent(true); $xml->setIndentString("\t");$xml->startDocument("1.0", "utf-8");

In short, the above creates a new XMLWriter instance, instructs it to write the XML to memory, enables pretty-printing of the output, and writes the standard XML document header.

With the document created, we can begin to generate the Atom feed. To figure out the format (I couldn't remember from when I wrote atom.gen.php - that was ages ago :P), I ended following this helpful guide on atomenabled.org. It seems familiar - I think I might have used it when I wrote atom.gen.php. To start, we need something like this:

<feed xmlns="http://www.w3.org/2005/Atom">
......
</feed>

In PHP, that translates to this:

$xml->startElement("feed");$xml->writeAttribute("xmlns", "http://www.w3.org/2005/Atom");

$xml->endElement(); // </feed> Next, we probably want to advertise how the Atom feed was generated. Useful for letting the curious know what a website is powered by, and for spotting usage of your code out in the wild! Since I'm writing this for Pepperminty Wiki, I'm settling on something not unlike this: <generator uri="https://github.com/sbrl/Pepperminty-Wiki/" version="v0.18-dev">Pepperminty Wiki</generator> In PHP, this translates to this: $xml->startElement("generator");
$xml->writeAttribute("uri", "https://github.com/sbrl/Pepperminty-Wiki/");$xml->writeAttribute("version", $version); // A variable defined elsewhere in Pepperminty Wiki$xml->text("Pepperminty Wiki");
$xml->endElement(); Next, we need to add a <link rel="self" /> tag. This informs clients as to where the feed was fetched from, and the canonical URL of the feed. I've done this: xml->startElement("link");$xml->writeAttribute("rel", "self");
$xml->writeAttribute("type", "application/atom+xml");$xml->writeAttribute("href", full_url());
$xml->endElement(); That full_url() function is from StackOverflow, and calculates the full URI that was used to make a request. As Pepperminty Wiki can be run in any directory on ayn server, I can't pre-determine this url - hence the complexity. Note also that I output type="application/atom+xml" here. This specifies the type of content that can be found at the supplied URL. The idea here is that if you represent the same data in different ways, you can advertise them all in a standard way, with other formats having rel="alternate". Pepperminty Wiki does this - generating the recent changes list in HTML, CSV, and JSON in addition to the new Atom feed I'm blogging about here (the idea is to make the wiki data as accessible and easy-to-parse as possible). Let's advertise those too: $xml->startElement("link");
$xml->writeAttribute("rel", "alternate");$xml->writeAttribute("type", "text/html");
$xml->writeAttribute("href", "$full_url_stem?action=recent-changes&format=html");
$xml->endElement();$xml->startElement("link");
$xml->writeAttribute("rel", "alternate");$xml->writeAttribute("type", "application/json");
$xml->writeAttribute("href", "$full_url_stem?action=recent-changes&format=json");
$xml->endElement();$xml->startElement("link");
$xml->writeAttribute("rel", "alternate");$xml->writeAttribute("type", "text/csv");
$xml->writeAttribute("href", "$full_url_stem?action=recent-changes&format=csv");
$xml->endElement(); Before we can output the articles themselves, there are a few more pieces of metadata left on our laundry list - namely <updated />, <id />, <icon />, <title />, and <subtitle />. There are others in the documentation too, but aren't essential (as far as I can tell) - and not appropriate in this specific case. Here's what they might look like: <updated>2019-02-02T21:23:43+00:00</updated> <id>https://wiki.bobsrockets.com/?action=recent-changes&amp;format=atom</id> <icon>https://wiki.bobsrockets.com/rocket_logo.png</icon> <title>Bob's Wiki - Recent Changes</title> <subtitle>Recent Changes on Bob's Wiki</subtitle> The <updated /> tag specifies when the feed was last updated. It's unclear as to whether it's the date/time the last change was made to the feed or the date/time the feed was generated, so I've gone with the latter. If this isn't correct, please let me know and I'll change it. The <id /> element can contain anything, but it must be a globally-unique string that identifies this feed. I've seen other feeds use the canonical url - and I've gone to the trouble of calculating it for the <link rel="self" /> - so it seems a shame to not use it here too. The remaining elements (<icon />, <title />, and <subtitle />) are pretty self explanatory - although it's worth mentioning that the icon must be square apparently. Let's whip those up with some more PHP: $xml->writeElement("updated", date(DateTime::ATOM));
$xml->writeElement("id", full_url());$xml->writeElement("icon", $settings->favicon);$xml->writeElement("title", "$settings->sitename - Recent Changes");$xml->writeElement("subtitle", "Recent Changes on $settings->sitename"); PHP even has a present for generating a date string in the correct format required by the spec :D $settings is an object containing the wiki settings that's a parsed form of peppermint.json, and contains useful things like the wiki's name, icon, etc.

Finally, with all the preamble done, we can turn to the articles themselves. In the case of Pepperminty Wiki, the final result will look something like this:

<entry>
<title type="text">Edit to Compute Core by Sean</title>
<id>https://seanssatellites.co.uk/wiki/?page=Compute%20Core</id>
<updated>2019-01-29T10:21:43+00:00</updated>
<content type="html">&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Change type:&lt;/strong&gt; edit&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;User:&lt;/strong&gt;  Sean&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Page name:&lt;/strong&gt; Compute Core&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Timestamp:&lt;/strong&gt; Tue, 29 Jan 2019 10:21:43 +0000&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;New page size:&lt;/strong&gt; 1.36kb&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Page size difference:&lt;/strong&gt; +1&lt;/li&gt;
&lt;/ul&gt;</content>
<link rel="alternate" type="text/html" href="https://seanssatellites.co.uk/wiki/?page=Compute%20Core"/>
<author>
<name>Sean</name>
<uri>https://seanssatellites.co.uk/wiki/?page=Users%2FSean</uri>
</author>
</entry>

There are a bunch of elements here that deserve attention:

• <title /> - The title of the article. Easy peasy!
• <id /> - Just like the id of the feed itself, each article entry needs an id too. Here I've followed the same system I used for the feed, and given the url of the page content.
• <updated /> - The last time the article was updated. Since this is part of a feed of recent changes, I've got this information readily at hand.
• <content /> - The content to display. If the content is HTML, it must be escaped and type="html" present to indicate this.
• <link rel="alternate" /> Same deal as above, but on an article-by-article level. In this case, it should link to the page the article content is from. In this case, I link to the page & revision of the change in question. In other cases, you might link to the blog post in question for example.
• <author /> - Can contain <name />, <uri />, and <email />, and should indicate the author of the content. In this case, I use the name of the user that made the change, along with a link to their user page.

Here's all that in PHP:

foreach($recent_changes as$recent_change) {
if(empty($recent_change->type))$recent_change->type = "edit";

$xml->startElement("entry"); // Change types: revert, edit, deletion, move, upload, comment$type = $recent_change->type;$url = "$full_url_stem?page=".rawurlencode($recent_change->page);

$content = ".......";$xml->startElement("title");
$xml->writeAttribute("type", "text");$xml->text("$type$recent_change->page by $recent_change->user");$xml->endElement();

$xml->writeElement("id",$url);
$xml->writeElement("updated", date(DateTime::ATOM,$recent_change->timestamp));

$xml->startElement("content");$xml->writeAttribute("type", "html");
$xml->text($content);
$xml->endElement();$xml->startElement("link");
$xml->writeAttribute("rel", "alternate");$xml->writeAttribute("type", "text/html");
$xml->writeAttribute("href",$url);
$xml->endElement();$xml->startElement("author");
$xml->writeElement("name",$recent_change->user);
$xml->writeElement("uri", "$full_url_stem?page=".rawurlencode("$settings->user_page_prefix/$recent_change->user"));
$xml->endElement();$xml->endElement();
}

I've omitted the logic that generates the value of the <content /> tag, as it's not really relevant here (you can check it out here if you're curious :D).

This about finishes the XML we need to generate for our feed. To extract the XML from the XMLWriter, we can do this:

$atom_feed =$xml->flush();

Then we can do whatever we want to with the generated XML!

When the latest version of Pepperminty Wiki comes out, you'll be able to see a live demo here! Until then, you'll need to download a copy of the latest master version and experiment with it yourself. I'll also include a complete demo feed below:

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<generator uri="https://github.com/sbrl/Pepperminty-Wiki/" version="v0.18-dev">Pepperminty Wiki</generator>
<link rel="self" type="application/atom+xml" href="http://[::]:35623/?action=recent-changes&amp;format=atom&amp;count=3"/>
<link rel="alternate" type="text/html" href="http://[::]:35623/?action=recent-changes&amp;format=html"/>
<link rel="alternate" type="application/json" href="http://[::]:35623/?action=recent-changes&amp;format=json"/>
<link rel="alternate" type="text/csv" href="http://[::]:35623/?action=recent-changes&amp;format=csv"/>
<updated>2019-02-03T17:25:10+00:00</updated>
<id>http://[::]:35623/?action=recent-changes&amp;format=atom&amp;count=3</id>
<icon>data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAMAAAAoLQ9TAAAB3VBMVEXhERHbKCjeVVXjb2/kR0fhKirdHBziDg6qAADaHh7qLy/pdXXUNzfMAADYPj7ZPDzUNzfbHx/fERHpamrqMTHgExPdHx/bLCzhLS3fVFTjT0/ibm7kRkbiLi7aKirdISHeFBTqNDTpeHjgERHYJCTVODjYQkLaPj6/AADVOTnpbW3cIyPdFRXcJCThMjLiTU3ibW3fVVXaKyvcERH4ODj+8fH/////fHz+Fxf4KSn0UFD/CAj/AAD/Xl7/wMD/EhL//v70xMT/+Pj/iYn/HBz/g4P/IyP/Kyv/7Oz0QUH/9PT/+vr/ior/Dg7/vr7/aGj/QED/bGz/AQH/ERH/Jib/R0f/goL/0dH/qan/YWH/7e3/Cwv4R0f/MTH/enr/vLz/u7v/cHD/oKD/n5//aWn+9/f/k5P/0tL/trb/QUH/cXH/dHT/wsL/DQ3/p6f/DAz/1dX/XV3/kpL/i4v/Vlb/2Nj/9/f/pKT+7Oz/V1f/iIj/jIz/r6//Zmb/lZX/j4//T0//Dw/4MzP/GBj/+fn/o6P/TEz/xMT/b2//Tk7/OTn/HR3/hIT/ODj/Y2P/CQn/ZGT/6Oj0UlL/Gxv//f3/Bwf/YmL/6+v0w8P/Cgr/tbX0QkL+9fX4Pz/qNzd0dFHLAAAAAXRSTlMAQObYZgAAAAFiS0dEAIgFHUgAAAAJcEhZcwAACxMAAAsTAQCanBgAAAAHdElNRQfeCxINNSdmw510AAAA5ElEQVQYGQXBzSuDAQCA8eexKXOwmSZepa1JiPJxsJOrCwcnuchBjg4O/gr7D9zk4uAgJzvuMgcTpYxaUZvSm5mUj7TX7ycAqvoLIJBwStVbP0Hom1Z/ejoxrbaR1Jz6nWinbKWttGRgMSSjanPktRY6mB9WtRNTn7Ilh7LxnNpKq2/x5LnBitfz+hx0qxUaxhZ6vwqq9bx6f2XXvuUl9SVQS38NR7cvln3v15tZ9bQpuWDtZN3Lgh5DWJex3Y+z1KrVhw21+CiM74WZo83DiXq0dVBDYNJkFEU7WrwDAZhRtQrwDzwKQbT6GboLAAAAAElFTkSuQmCC</icon>
<title>Pepperminty Wiki - Recent Changes</title>
<subtitle>Recent Changes on Pepperminty Wiki</subtitle>
<entry>
<title type="text">Edit to Internal link by admin</title>
<id>http://[::]:35623/?page=Internal%20link</id>
<updated>2019-01-29T19:55:08+00:00</updated>
<content type="html">&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Change type:&lt;/strong&gt; edit&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;User:&lt;/strong&gt;  admin&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Page name:&lt;/strong&gt; Internal link&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Timestamp:&lt;/strong&gt; Tue, 29 Jan 2019 19:55:08 +0000&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;New page size:&lt;/strong&gt; 2.11kb&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Page size difference:&lt;/strong&gt; +2007&lt;/li&gt;
&lt;/ul&gt;</content>
<link rel="alternate" type="text/html" href="http://[::]:35623/?page=Internal%20link"/>
<author>
<name>admin</name>
<uri>http://[::]:35623/?page=Users%2FInternal%20link</uri>
</author>
</entry>
<entry>
<title type="text">Edit to Main Page by admin</title>
<id>http://[::]:35623/?page=Main%20Page</id>
<updated>2019-01-05T20:14:07+00:00</updated>
<content type="html">&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Change type:&lt;/strong&gt; edit&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;User:&lt;/strong&gt;  admin&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Page name:&lt;/strong&gt; Main Page&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Timestamp:&lt;/strong&gt; Sat, 05 Jan 2019 20:14:07 +0000&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;New page size:&lt;/strong&gt; 317b&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Page size difference:&lt;/strong&gt; +68&lt;/li&gt;
&lt;/ul&gt;</content>
<link rel="alternate" type="text/html" href="http://[::]:35623/?page=Main%20Page"/>
<author>
<name>admin</name>
<uri>http://[::]:35623/?page=Users%2FMain%20Page</uri>
</author>
</entry>
<entry>
<title type="text">Edit to Main Page by admin</title>
<id>http://[::]:35623/?page=Main%20Page</id>
<updated>2019-01-05T17:53:08+00:00</updated>
<content type="html">&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Change type:&lt;/strong&gt; edit&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;User:&lt;/strong&gt;  admin&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Page name:&lt;/strong&gt; Main Page&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Timestamp:&lt;/strong&gt; Sat, 05 Jan 2019 17:53:08 +0000&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;New page size:&lt;/strong&gt; 249b&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Page size difference:&lt;/strong&gt; +31&lt;/li&gt;
&lt;/ul&gt;</content>
<link rel="alternate" type="text/html" href="http://[::]:35623/?page=Main%20Page"/>
<author>
<name>admin</name>
<uri>http://[::]:35623/?page=Users%2FMain%20Page</uri>
</author>
</entry>
</feed>

...this is from my local development instance.

Found this interesting? Confused about something? Want to say hi? Comment below!

## Keeping the Internet free and open for years to come - #ForTheWeb

I've recently come across contractfortheweb.org. It sounds obvious, but certain qualities of the web that we take for granted aren't, in fact, universal. Things like a lack of censorship (China & their great firewall; Cuba; Venezuela), consumer privacy (US Mobile Carriers; Google; Google/Android Antitrust), and fair pricing (AT&T; BT) are rather a problem in more places than is noticeable at first glance.

The Contract for the Web is a set of principles - backed by the famous Tim Berners-Lee - for a free, open, and fair Internet. The aim is to build a full contract based on these principles to guide the evolution of the web for year to come.

Personally, I first really started to use the web to learn how to program. By reading (lots) of tutorials freely available on the web, I learnt to build websites, write Javascript, and more!

Since then, I've used the web to share the things I've been learning on this blog, stay up-to-date with the latest technology news, research new ideas, and play games that deliver amazing experiences.

The web (and, by extension, the Internet - the web refers to just HTTP + HTTPS) for me represents freedom of information. Freedom to express (and learn about) new thoughts and ideas without fear of being censored. Freedom to communicate with anyone in the world - regardless of physical distances.

It is for this reason that I've signed the Contract for the Web. It's my hope that this effort will ensure that the Internet becomes more open and neutral going forwards, so that everyone can experience the benefits of the open web for a long time to come.

## Building Javascript (and other things) with Rollup

Hey, another blog post!

Recently I've been somewhat distracted by another project which has been most interesting. I've learnt a bunch of things (including getting started with LeafletJS and Chart.JS), but most notably I've been experimenting with another Javascript build system.

I've used quite a few build systems over the years. For the uninitiated, their primary purpose is to turn lots of separate source files into as few output files as possible. This is important with things that run in the browser, because the browser has to download everything before it can execute it. Fewer files and less data means a speedier application - so everyone wins.

I've been rather unhappy with the build systems I've used so far:

• Gulp.js was really heavy and required a lot of configuration in order to get things moving.
• Browserify was great, but I found the feature set lacking and ended up moving on because it couldn't do what I wanted it to.
• Webpack was ok, but it also has a huge footprint and was super slow too.

One thing led to another, and after trying a few other systems (I can't remember their names, but one begin with P and has a taped-up cardboard box as a logo) I found Rollup (I think Webpack requires Rollup as a dependency under the hood, if I'm not mistaken).

Straight away, I found that Rollup was much faster than Webpack. It also requires minimal configuration (which doesn't need changing every time you create a new source file) - which allows me to focus on the code I'm writing instead of battling with the build system to get it to do what I want.

It also supports plugins - and with a bunch available on npm, there's no need to write your own plugin to perform most common tasks. For my recent project I linked to above, I was able to put a bunch of plugins together with a configuration file - and, with a bit of fiddling around, I've got it to build both my Javascript and my CSS (via a postcss integration plugin) and putt he result in a build output folder that I've added to my .gitignore file.

I ended up using the following plugins:

• rollup-plugin-node-resolve - essential. Allows import statements to be automatically resolved to their respective files.
• rollup-plugin-commonjs - An addition to the above, if I recall correctly
• rollup-plugin-postcss - Provides integration with Postcss. I've found Postcss with be an effective solution to the awkward issue of CSS
• rollup-plugin-terser - Uses Terser to minify modern Javascript. I didn't know this existed until I found the Rollup plugin - it's a fork of and the successor to UglifyJS2, which only supports ES5 syntax.

....and the following Postcss plugins:

• postcss-import - Inlines @import statements into a single CSS source file, allowing you to split your CSS up into multiple files (finally! :P)
• postcss-copy - after trying 5 different alternatives, I found this to be the most effective at automatically copying images etc. to the build output folder - potentially renaming them - and altering the corresponding references in the CSS source code

The definition of plugins is done in a Javascript configuration file in a array - making it clear that your code goes through a pipeline of plugins that transform step-by-step to form the final output - creating and maintaining a source map along the way (very handy for debugging code in the browser. It's basically a file that lets the browser reverse-engineer the final output to present you with the original source code again whilst debugging) - which works flawlessly in my browser (the same can't be said for Webpack - I had lots of issues with the generated source maps there).

I ended up with a config file like this:

(Can't see the file above? Try viewing it directly.)

In short, the pipeline goes something like this:

(Source file; Created in Draw.io)

Actually calling Rollup is done in a Bash-based general build task file. Rollup is installed locally to the project (npm install --save rollup), so I call it like this:

node_modules/.bin/rollup --sourcemap --config rollup.config.js;

I should probably make a blog post about that build task system I wrote and why/how I made it, but that's a topic for another time :P

I'll probably end up tweaking the setup I've described here a bit in the future, but for now it does everything I'd like it to :-)

Found this interesting? Got a different way of doing it? Still confused? Comment below!

## Bridging the gap between XMPP and shell scripts

In a previous post, I set up a semi-automated backup system for my Raspberry Pi using duplicity, sendxmpp, and an external drive. It's been working fabulously for a while now, but unfortunately the other week sendxmpp suddenly stopped working with no obvious explanation. Given the long list of arguments I had to pass it:

sendxmpp --file "${xmpp_config_file}" --resource "${xmpp_resource}" --tls --chatroom "${xmpp_target_chatroom}" ........... ....and the fact that I've had to tweak said arguments on a number of occasions, I thought it was time to switch it out for something better suited to the task at hand. Unfortunately, finding such a tool proved to be a challenge. I even asked on Reddit - but nobody had anything that fit the bill (xmpp-bridge wouldn't compile correctly - and didn't support multi-user chatrooms anyway, and xmpppy was broken too). If you're unsure as to what XMPP is, I'd recommend checkout out either this or this tutorial. They both give a great introduction to what it is, what it does, and how it works - and the rest of this post will make much more sense if you read that first :-) To this end, I finally gave in and wrote my own tool, which I've called xmppbridge. It's a global Node.JS script that uses the simple-xmpp to forward the standard input to a given JID over XMPP - which can optionally be a group chat. In this post, I'm going to look at how I put it together, some of the issues I ran into along the way, and how I solved them. If you're interested in how to install and use it, then the package page on npm will tell you everything you need to know: xmppbridge on npm ### Architectural Overview The script consists of 3 files: • index.sh - Calls the main script with ES6 modules enabled • index.mjs - Parses the command-line arguments and environment variables out, and provides a nice CLI • XmppBridge.mjs - The bit that actually captures input from stdin and sends it via XMPP Let's look at each of these in turn - starting with the command-line interface. ### CLI Parsing The CLI itself is relatively simple - and follows a paradigm I've used extensively in C♯ (although somewhat modified of course to get it to work in Node.JS, and without fancy ANSI colouring etc.). #!/usr/bin/env node "use strict"; import XmppBridge from './XmppBridge.mjs'; const settings = { jid: process.env.XMPP_JID, destination_jid: null, is_destination_groupchat: false, password: process.env.XMPP_PASSWORD }; let extras = []; // The first arg is the script name itself for(let i = 1; i < process.argv.length; i++) { if(!process.argv[i].startsWith("-")) { extras.push(process.argv[i]); continue; } switch(process.argv[i]) { case "-h": case "--help": // ........ break; // ........ default: console.error(Error: Unknown argument '${process.argv[i]}'.);
process.exit(2);
break;
}
}

We start with a shebang, telling Linux-based systems to execute the script with Node.JS. Following that, we import the XmppBridge class that's located in XmppBrdige.mjs (we'll come back to this later). Then, we define an object to hold our settings - and pull in the environment variables along with defining some defaults for other parameters.

With that setup, we can then parse the command-line arguments themselves - using the exact same paradigm I've used time and time again in C♯.

Once the command-line arguments are parsed, we validate the final settings to ensure that the user hasn't left any required parameters undefined:

for(let environment_varable of ["XMPP_JID", "XMPP_PASSWORD"]) {
if(typeof process.env[environment_varable] == "undefined") {
console.error(Error: The environment variable ${environment_varable} wasn't found.); process.exit(1); } } if(typeof settings.destination_jid != "string") { console.error("Error: No destination jid specified."); process.exit(5); } That's basically all that index.mjs does. All that's really left is passing the parameters to an instance of XmppBridge: const bridge = new XmppBridge( settings.destination_jid, settings.is_destination_groupchat ); bridge.start(settings.jid, settings.password); ### Shebang Trouble Because I've used ES6 modules here, currently Node must be informed of this via the --experimental-modules CLI argument like this: node --experimental-modules ./index.mjs If we're going to make this a global command-line tool via the bin directive in package.json, then we're going to have to ensure that this flag gets passed to Node and not our program. While we could alter the shebang, that comes with the awkward problem that not all systems (in fact relatively few) support using both env and passing arguments. For example, this: #!/usr/bin/env node --experimental-modules Wouldn't work, because env doesn't recognise that --experimental-modules is actually a command-line argument and not part of the binary name that it should search for. I did see some Linux systems support env -S to enable this functionality, but it's hardly portable and doesn't even appear to work all the time anyway - so we'll have to look for another solution. Another way we could do it is by dropping the env entirely. We could do this: #!/usr/local/bin/node --experimental-modules ...which would work fine on my system, but probably not on anyone else's if they haven't installed Node to the same place. Sadly, we'll have to throw this option out the window too. We've still got some tricks up our sleeve though - namely writing a bash wrapper script that will call node telling it to execute index.mjs with the correct arguments. After a little bit of fiddling, I came up with this: #!/usr/bin/env bash install_dir="$(dirname "$(readlink -f$0)")";
exec node --experimental-modules "${install_dir}/index.mjs"$@

2 things are at play here. Firstly, we have to deduce where the currently executing script actually lies - as npm uses a symbolic link to allow a global command-line tool to be 'found'. Said symbolic link gets put in /usr/local/bin/ (which is, by default, in the user's PATH), and links to where the script is actually installed to.

To figure out the directory that we've been installed to is (and hence the location of index.mjs), we need to dereference the symbolic link and strip the index.sh filename away. This can be done with a combination of readlink -f (dereferences the symbolic link), dirname (get the parent directory of a given file path), and $0 (holds the path to the currently executing script in most circumstances) - which, in the case of the above, gets put into the install_dir variable. The other issue is passing all the existing command-line arguments to index.mjs unchanged. We do this with a combination of $@ (which refers to all the arguments passed to this script except the script name itself) and exec (which replaces the currently executing process with a new one - in this case it replaces the bash shell with node).

This approach let's us customise the CLI arguments, while still providing global access to our script. Here's an extract from xmppbridge's package.json showing how I specify that I want index.sh to be a global script:

{
.....

"bin": {
"xmppbridge": "./index.sh"
},

.....
}

### Bridging the Gap

Now that we've got Node calling our script correctly and the arguments parsed out, we can actually bridge the gap. This is as simple as some glue code between simple-xmpp and readline. simple-xmpp is an npm package that makes programmatic XMPP interaction fairly trivial (though I did have to look at examples in the GitHub repository to figure out how to send a message to a multi-user chatroom).

readline is a Node built-in that allows us to read the standard input line-by-line. It does other things too (and is great for interactive scripts amongst other things), but that's a tale for another time.

The first task is to create a new class for this to live in:

"use strict";

import readline from 'readline';

import xmpp from 'simple-xmpp';

class XmppBridge {

/**
* Creates a new XmppBridge instance.
* @param   {string}    in_login_jid        The JID to login with.
* @param   {string}    in_destination_jid  The JID to send stdin to.
* @param   {Boolean}   in_is_groupchat     Whether the destination JID is a group chat or not.
*/
constructor(in_destination_jid, in_is_groupchat) {
// ....
}
}

export default XmppBridge;

Very cool! That was easy. Next, we need to store those arguments and connect to the XMPP server in the constructor:

this.destination_jid = in_destination_jid;
this.is_destination_groupchat = in_is_groupchat;

this.client = xmpp;
this.client.on("online", this.on_connect.bind(this));
this.client.on("error", this.on_error.bind(this));
this.client.on("chat", ((_from, _message) => {
// noop
}).bind(this));

I ended up having to define a chat event handler - even though it's pointless, as I ran into a nasty crash if I didn't do so (I suspect that this use-case wasn't considered by the original package developer).

The next area of interest is that online event handler. Note that I've bound the method to the current this context - this is important, as it would be able to access the class instance's properties otherwise. Let's take a look at the code for that handler:

console.log([XmppBridge] Connected as ${data.jid}.); if(this.is_destination_groupchat) { this.client.join(${this.destination_jid}/bot_{data.jid.user}); } this.stdin = readline.createInterface({ input: process.stdin, output: process.stdout, terminal: false }); this.stdin.on("line", this.on_line_handler.bind(this)); this.stdin.on("close", this.on_stdin_close_handler.bind(this)); This is the point at which we open the standard input and start listening for things to send. We don't do it earlier, as we don't want to end up in a situation where we try sending something before we're connected! If we're supposed to be sending to a multi-user chatroom, this is also the point at which it joins said room. This is required as you can't send a message to a room that you haven't joined. The resource (the bit after the forward slash /), for a group chat, specifies the nickname that you want to give to yourself when joining. Here, I automatically set this to the user part of the JID that we used to login prefixed with bot_. The connection itself is established in the start method: start(jid, password) { this.client.connect({ jid, password }); } And every time we receive a line of input, we execute the send() method: on_line_handler(line_text) { this.send(line_text); } I used a full method here, as initially I had some issues and wanted to debug which methods were being called. That send method looks like this: send(message) { this.client.send( this.destination_jid, message, this.is_destination_groupchat ); } The last event handler worth mentioning is the close event handler on the readline interface: on_stdin_close_handler() { this.client.disconnect(); } This just disconnects from the XMXPP server so that Node can exit cleanly. That basically completes the script. In total, the entire XmppBridge.mjs class file is 72 lines. Not bad going! You can install this tool for yourself with sudo npm install -g xmppbridge. I've documented how it use it in the README, so I'd recommend heading over there if you're interested in trying it out. Found this interesting? Got a cool use for XMPP? Comment below! ### Sources and Further Reading ## Compilers, VMs, and JIT: Spot the difference It's about time for another demystification post, I think :P This time, I'm going to talk about Compilers, Virtual Machines (VMs), and Just-In-Time Compilation (JIT) - and the way that they are both related and yet different. ### Compilers To start with, a compiler is a program that converts another program written in 1 language into another language (usually of a lower level). For example, gcc compiles C++ into native machine code. A compiler usually has both a backend and a frontend. The frontend is responsible for the lexing and parsing of the source programming language. It's the part of the compiler that generates compiler warnings and errors. The frontend outputs an abstract syntax tree (or parse tree) that represents the source input program. The backend then takes this abstract syntax tree, walks it, and generates code in the target output language. Such code generators are generally recursive in nature. Depending on the compiler toolchain, this may or may not be the last step in the build process. gcc, for example, generates code to object files - which are then strung together by a linking program later in the build process. Additional detail on the structure of a compiler (and how to build one yourself!) is beyond the scope of this article, but if you're interested I recommend reading my earlier Compilers 101 post. ### Virtual Machines A virtual machine comes in several flavours. Common to all types is the ability to execute instructions - through software - as if they were being executed on a 'real' hardware implementation of the programming language in question. Examples here include: • KVM - The Linux virtual machine engine. Leverages hardware extensions to support various assembly languages that are implemented in real hardware - everything from Intel x86 Assembly to various ARM instruction sets. Virtual Machine Manager is a good GUI for it. • VirtualBox - Oracle's offering. Somewhat easier to use than the above - and cross-platform too! • .NET / Mono - The .NET runtime is actually a virtual machine in it's own right. It executes Common Intermediate Language in a managed environment. It's also important to note the difference between a virtual machine and an emulator. An emulator is very similar to a virtual machine - except that it doesn't actually implement a language or instruction set itself. Instead, it 'polyfills' hardware (or environment) features that don't exist in the target runtime environment. A great example here is WINE or Parallels, which allow programs written for Microsoft Windows to be run on other platforms without modification. ### Just-In-Time Compilation JIT is sort of a combination of the above. The .NET runtime (officially knows as the Common Language Runtime, or CLR) is an example of this too - in that it compiles the CIL in the source assembly to native code just before execution. Utilising such a mechanism does result in an additional delay during startup, but usually pays for itself in the vast performance improvements that can be made over an interpreter - as a JITting VM can automatically optimise code as it's executing to increase performance in real-time. This is different to an interpreter, which reads a source program line-by-line - parsing and executing it as it goes. If it needs to go back to another part of the program, it will usually have to re-parse the code before it can execute it. ### Conclusion In this post, we've taken a brief look at compilers, VMs, and JIT. We've looked at the differences between them - and how they are different from their siblings (emulators and interpreters). In many ways, the line between the 3 can become somewhat blurred (hello, .NET!) - so learning the characteristics of each is helpful for disentangling different components of a program. If there's anything I've missed - please let me know! If you're still confused on 1 or more points, I'm happy to expand on them if you comment below :-) Found this useful? Spotted a mistake? Comment below! ## Setup your very own VPN in 10 minutes flat Hey! Happy new year :-) I've been looking to setup a personal VPN for a while, and the other week I discovered a rather brilliant project called PiVPN, which greatly simplifies the process of setting one up - and managing it thereafter. It's been working rather well so far, so I thought I'd post about it so you can set one up for yourself too. But first though, we should look at the why. Why a VPN? What does it do? Basically, a VPN let you punch a great big hole in the network that you're connected to and appear as if you're actually on a network elsewhere. The extent to which this is the case varies depending on the purpose, (for example a University or business might setup a VPN that allows members to access internal resources, but doesn't route all traffic through the VPN), but the general principle is the same. It's best explained with a diagram. Imagine you're at a Café: Everyone on the Café's WiFi can see the internet traffic you're sending out. If any of it is unencrypted, then they can additionally see the content of said traffic - e.g. emails you send, web pages you load, etc. Even if it's encrypted, statistical analysis can reveal which websites you're visiting and more. If you don't trust a network that you're connected to, then by utilising a VPN you can create an encrypted tunnel to another location that you do trust: Then, all that the other users of the Café's WiFi will see is an encrypted stream of packets - all heading for the same destination. All they'll know is roughly how much traffic you're sending and receiving, but not to where. This is the primary reason that I'd like my own VPN. I trust the network I've got setup in my own house, so it stands to reason that I'd like to setup a VPN server there, and pretend that my devices when I'm out and about are still at home. In theory, I should be able to access the resources on my home network too when I'm using such a VPN - which is an added bonus. Other reasons do exist for using a VPN, but I won't discuss them here. In terms of VPN server software, I've done a fair amount of research into the different options available. My main criteria are as follows: • Fairly easy to install • Easy to understand what it's doing once installed (transparency) • Easy to manage The 2 main technologies I came across were OpenVPN and IPSec. Each has their own strengths & weaknesses. An IPSec VPN is, apparently, more efficient - especially since it executes on the client in kernel-space instead of user-space. It's a lighter protocol, too - leading to less overhead. It's also much more likely to be detected and blocked when travelling through strict firewalls, making me slightly unsure about it. OpenVPN, on the other hand, executes entirely in user-space on both the client and the server - leading to a slightly greater overhead (especially with the mitigations for the recent Spectre & Meltdown hardware bugs). It does, however, use TLS (though over UDP by default). This characteristic makes it much more likely it'll slip through stricter firewalls. I'm unsure if that's a quality that I'm actually after or not. Ultimately, it's the ease of management that points the way to my final choice. Looking into it, with both choices there's complex certificate management to be done whenever you want to add a new client to the VPN. For example, with StrongSwan (an open-source IPSec VPN program), you've got to generate a number of certificates with a chain of rather long commands - and the users themselves have passwords stored in plain text in a file! While I've got no problem with reading and understanding such commands, I do have a problem with rememberability. If I want to add a new client, how easy is that to do? How long would I have to spend re-reading documentation to figure out how to do it? Sure, I could write a program to manage the configuration files for me, but that would also require maintenance - and probably take much longer than I anticipate to write. I forget where I found it, but it is for this reason that I ultimately decided to choose PiVPN. It's a set of scripts that sets up and manages one's an OpenVPN installation. To this end, it provides a single command - pivpn - that can be used to add, remove, and list clients and their statistics. With a concise help text, it makes it easy to figure out how to perform common tasks utilising existing terminal skills by conforming to established CLI interface norms. If you want to install it yourself, then simply do this: curl -L https://install.pivpn.io | bash Of course, simply downloading and executing a random script from the Internet is never a good idea. Let's read it first: curl -L https://install.pivpn.io | less Once you're happy that it's not going to do anything malign to your system, proceed with the installation by executing the 1st command. It should guide you through a number of screens. Some important points I ran into: • The static IP address it talks about is the IP address of your server on the local network. The installation asks about the public IP address in a later step. If you've already got a static IP setup on your server (and you probably have), then you don't need to worry about this. • It asks you to install and enable unattended-upgrades. You should probably do this, but I ended up skipping this - as I've already got apticron setup and sending me regular emails - as I rather like to babysit the upgrade of packages on the main machines I manage. I might look into unattended-upgrades in the future if I acquire more servers than are comfortable to manage this way. • Make sure you fully update your system before running the installation. I use this command: sudo apt update && sudo apt-get dist-upgrade && sudo apt-get autoclean && sudo apt-get autoremove • Changing the port of the VPN isn't a bad idea, since PiVPN will automatically assemble .ovpn configuration files for you. I didn't end up doing this to start with, but I can always change it in the NAT rule I configured on my router later. • Don't forget to allow OpenVPN through your firewall! For ufw users (like me), then it's something like sudo ufw allow <port_number>/udp. • Don't forget to setup a NAT rule / port forwarding on your router if said server doesn't have a public IP address (if it's IPv4 it probably doesn't). If you're confused on this point, comment below and I'll blog about it. It's..... a complicated topic. If you'd like a more in-depth guide to setting up PiVPN, then I can recommend this guide. It's a little bit dated (PiVPN now uses elliptical-curve cryptography by default), but still serves to illustrate the process pretty well. If you're confused about some of the concepts I've presented here - leave a comment below! I'm happy to explain them in more detail. Who knows - I might end up writing another blog post on the subject.... ## Where in the world does spam come from? Answer: The US, apparently. I was having a discussion with someone recently, and since I have a rather extensive log of comment failures for debugging & analysis purposes (dating back to February 2015!) they suggested that I render a map of where the spam is coming from. It was such a good idea that I ended up doing just that - and somehow also writing this blog post :P First, let's start off by looking at the format of said log file: [ Sun, 22 Feb 2015 07:37:03 +0000] invalid comment | ip: a.b.c.d | name: Nancyyeq | articlepath: posts/015-Rust-First-Impressions.html | mistake: longcomment [ Sun, 22 Feb 2015 14:55:50 +0000] invalid comment | ip: e.f.g.h | name: Simonxlsw | articlepath: posts/015-Rust-First-Impressions.html | mistake: invalidkey [ Sun, 22 Feb 2015 14:59:59 +0000] invalid comment | ip: x.y.z.w | name: Simontuxc | articlepath: posts/015-Rust-First-Impressions.html | mistake: invalidkey Unfortunately, I didn't think about parsing it programmatically when I designed the log file format.... Oops! It's too late to change it now, I suppose :P Anyway, as an output, we want a list of countries in 1 column, and a count of the number of IP addresses in another. First things first - we need to extract those IP addresses. awk is ideal for this. I cooked this up just quickly: BEGIN { FS="|" } { gsub(" ip: ", "",2);
print $2; } This basically tells awk to split lines on the solid bar character (|), extracts the IP address bit (ip: p.q.r.s), and then strips out the ip: bit. With this done, we're ready to lookup all these IP addresses to find out which country they're from. Unfortunately, IP addresses can change hands semi-regularly - even across country borders, so my approach here isn't going to be entirely accurate. I don't anticipate the error generated here to be all that big though, so I think it's ok to just do a simple lookup. If I was worried about it, I could probably investigate cross-referencing the IP addresses with a GeoIP database from the date & time I recorded them. The effort here would be quite considerable - and this is a 'just curious' sort of thing, so I'm not going to do that here. If you have done this, I'd love to hear about it though - post a comment below. Actually doing a GeoIP lookup itself is fairly easy to do, actually. While for the odd IP address here and there I usually use ipinfo.io, when there are lots of lookups to be done (10,479 to be exact! Wow.), it's probably best to utilise a local database. A quick bit of research reveals that Ubuntu Server has a package I can install that should do the job called geoip-bin:  sudo apt install geoip-bin (....) geoiplookup 1.1.1.1 # CloudFlare's 1.1.1.1 DNS service GeoIP Country Edition: AU, Australia  Excellent! We can now lookup IP addresses automagically via the command line. Let's plug that in to the little command chain we got going on here: cat failedcomments.log | awk 'BEGIN { FS="|" } { gsub(" ip: ", "",$2); print $2 }' | xargs -n1 geoiplookup It doesn't look like geoiplookup supports multiple IP addresses at once, which is a shame. In that case, the above will take a while to execute for 10K IP addresses.... :P Next up, we need to remove the annoying label there. That's easy with sed: (...) | sed -E 's/^[A-Za-z: ]+, //g' I had some trouble here getting sed to accept a regular expression. At some point I'll have to read the manual pages more closely and write myself a quick reference guide. Come to think about it, I could use such a thing for awk too - their existing reference guide appears to have been written by a bunch of mathematicians who like using single-letter variable names everywhere. Anyway, now that we've got our IP address list, we need to strip out any errors, and then count them all up. The first point is somewhat awkward, since geoiplookup doesn't send errors to the standard error for some reason, but we can cheese it with grep -v: (...) | grep -iv 'resolve hostname' The -v here tells grep to instead remove any lines that match the specified string, instead of showing us only the matching lines. This appeared to work at first glance - I simply copied a part of the error message I saw and worked with that. If I have issues later, I can always look at writing a more sophisticated regular expression with the -P option. The counting bit can be achieved in bash with a combination of the sort and uniq commands. sort will, umm, sort the input lines, and uniq with de-duplicate multiple consecutive input lines, whilst optionaly counting them. With this in mind, I wound up with the following: (...) | sort | uniq -c | sort -n The first sort call sorts the input to ensure that all identical lines are next to each other, reading for uniq. uniq -c does the de-duplication, but also inserts a count of the number of duplicates for us. Lastly, the final sort call with the -n argument sorts the completed list via a natural sort, which means (in our case) that it handles the numbers as you'd expect it too. I'd recommend you read the Wikipedia article on the subject - it explains it quite well. This should give us an output like this:  1 Antigua and Barbuda 1 Bahrain 1 Bouvet Island 1 Egypt 1 Europe 1 Guatemala 1 Ireland 1 Macedonia 1 Mongolia 1 Saudi Arabia 1 Tuvalu 2 Bolivia 2 Croatia 2 Luxembourg 2 Paraguay 3 Kenya 3 Macau 4 El Salvador 4 Hungary 4 Lebanon 4 Maldives 4 Nepal 4 Nigeria 4 Palestinian Territory 4 Philippines 4 Portugal 4 Puerto Rico 4 Saint Martin 4 Virgin Islands, British 4 Zambia 5 Dominican Republic 5 Georgia 5 Malaysia 5 Switzerland 6 Austria 6 Belgium 6 Peru 6 Slovenia 7 Australia 7 Japan 8 Afghanistan 8 Argentina 8 Chile 9 Finland 9 Norway 10 Bulgaria 11 Singapore 11 South Africa 12 Serbia 13 Denmark 13 Moldova, Republic of 14 Ecuador 14 Romania 15 Cambodia 15 Kazakhstan 15 Lithuania 15 Morocco 17 Latvia 21 Pakistan 21 Venezuela 23 Mexico 23 Turkey 24 Honduras 24 Israel 29 Czech Republic 30 Korea, Republic of 32 Colombia 33 Hong Kong 36 Italy 38 Vietnam 39 Bangladesh 40 Belarus 41 Estonia 44 Thailand 50 Iran, Islamic Republic of 53 Spain 54 GeoIP Country Edition: IP Address not found 60 Poland 88 India 113 Netherlands 113 Taiwan 124 Indonesia 147 Sweden 157 Canada 176 United Kingdom 240 Germany 297 China 298 Brazil 502 France 1631 Russian Federation 2280 Ukraine 3224 United States Very cool. Here's the full command for reference explainshell explanation: cat failedcomments.log | awk 'BEGIN { FS="|" } { gsub(" ip: ", "",$2); print \$2 }' | xargs -n1 geoiplookup | sed -e 's/GeoIP Country Edition: //g' | sed -E 's/^[A-Z]+, //g' | grep -iv 'resolve hostname' | sort | uniq -c | sort -n

With our list in hand, I imported it into LibreOffice Calc to parse it into a table with the fixed-width setting (Google Sheets doesn't appear to support this), and then pulled that into a Google Sheet in order to draw a heat map:

At first, the resulting graph showed just a few countries in red, and the rest in white. To rectify this, I pushed the counts through the natural log (log()) function, which yielded a much better map, where the countries have been spamming just a bit are still shown in a shade of red.

From this graph, we can quite easily conclude that the most 'spammiest' countries are:

1. The US
2. Russia
3. Ukraine (I get lots of spam emails from here too)
4. China (I get lots of SSH intrusion attempts from here)
5. Brazil (Wat?)

Personally, I was rather surprised to see the US int he top spot. I figured that with with tough laws on that sort of thing, spammers wouldn't risk attempting to buy a server and send spam from here.

On further thought though, it occurred to me that it may be because there are simply lots of infected machines in the US that are being abused (without the knowledge of their unwitting users) to send lots of spam.

At any rate, I don't appear to have a spam problem on my blog at the moment - it's just fascinating to investigate where the spam I do block comes from.

Found this interesting? Got an observation of your own? Plotted a graph from your own data? Comment below!

## Happy Christmas 2018!

Happy Christmas! I hope you have a restful and peaceful holiday. Have a picture of a cool snow-globe:

Art by Mythdael