Archive

## Tag Cloud

3d account algorithms android announcement architecture archives arduino artificial intelligence artix assembly async audio bash batch blog bookmarklet booting c sharp c++ challenge chrome os code codepen coding conundrums coding conundrums evolved command line compilers compiling compression css dailyprogrammer debugging demystification distributed computing documentation downtime electronics email embedded systems encryption es6 features event experiment external first impressions future game github github gist gitlab graphics hardware hardware meetup holiday holidays html html5 html5 canvas infrastructure interfaces internet io.js jabber jam javascript js bin labs learning library linux low level lua maintenance manjaro network networking nibriboard node.js operating systems performance photos php pixelbot portable privacy problem solving programming problems projects prolog protocol protocols pseudo 3d python reddit redis reference release releases resource review rust searching secrets security series list server software sorting source code control statistics storage svg technical terminal textures three thing game three.js tool tutorial tutorials twitter ubuntu university update updates upgrade version control virtual reality virtualisation visual web website windows windows 10 xmpp xslt

## Placeholder Image Generator

(Above: An example output with debugging turned on from my placeholder generation service)

For a quite a considerable amount of time now, I've been running my own placeholder image generation service here at starbeamrainbowlabs.com - complete with etag generation and custom colour selection. Although it's somewhat of an Easter Egg, it's not actually that hard to find if you know what you're looking for (hint: I use it on my home page, but you may need to view source to find it).

I decided to post about it now because I've just finished fixing the angle GET parameter - and it was interesting (and hard) enough to warrant a post to remind myself how I did it for future reference. Comment below if you knew about it before this blog post came out!

The script itself is split into 3 loose parts:

• The initial settings / argument parsing
• The polyfills and utility functions
• The image generation.

My aim here was to keep the script contained within a single file - so that it fits well in a gist (hence why it currently looks a bit messy!). It's probably best if show the code in full:

(Having trouble viewing code above? Visit it directly here: placeholder.php)

If you append ?help to the url, it will send back a plain text file detailing the different supported parameters and what they do:

(Can't see the above? View it directly here)

Aside from implementing the random value for fg_colour and bg_colour, the angle has been a huge pain. I use GD - a graphics drawing library that's bundled with practically every PHP installation ever - to draw the placeholder image, and when you ask it to draw some rotated text, it decides that it's going to pivot it around the bottom-left corner instead of the centre.

Naturally, this creates some complications. Though some people on the PHP manual page said method (imagettftext) have attempetd to correct for this (exhibits a and b), neither of their solutions work for me. I don't know if it's because their code is just really old (13 and 5 years respectively as of the time of typing) or what.

Anyway, I finally decided recently that enough was enough - and set about fixing it myself. Basing my code on the latter of the 2 pre-existing solutions, I tried fixing it - but ended up deleting most of it and starting again. It did give me some idea as to how to solve the problem though - all I needed to do was find the centre of where the text would be drawn when it is both not rotated and rotated and correct for it (these are represented by the green and blue crosses respectively on the image at the top of this post).

PHP does provide a method for calculating the bounding box of some prospective text that you're thinking of drawing via imagettfbbox. Finding the centre of the box though sounded like a horrible maths-y problem that would take me ages to work through on a whiteboard first, but thankfully (after some searching around) Wikipedia had a really neat solution for finding the central point of any set of points.

It calls it the centroid, and claims that the geometric centre of a set of points is simply the average of all the points involved. It just so happens that the geometric centre is precisely what I'm after:

$$C=\frac{a+b+c+d+....}{N}$$

...Where $C$ is the geometric centre of the shape as an $\binom{x}{y}$ vector, $a$, $b$, $c$, $d$, etc. are the input points as vectors, and $N$ is the number of points in question. This was nice and easy to program:

$bbox = imagettfbbox($size, 0, $fontfile,$text);
$orig_centre = [ ($bbox[0] + $bbox[2] +$bbox[4] + $bbox[6]) / 4, ($bbox[1] + $bbox[3] +$bbox[5] + $bbox[7]) / 4 ];$text_width = $bbox[2] -$bbox[0];
$text_height =$bbox[3] - $bbox[5];$bbox = imagettfbbox($size,$angle, $fontfile,$text);
$rot_centre = [ ($bbox[0] + $bbox[2] +$bbox[4] + $bbox[6]) / 4, ($bbox[1] + $bbox[3] +$bbox[5] + $bbox[7]) / 4 ]; The odd and even indexes of $bbox there are referring to the $x$ and $y$ co-ordinates of the 4 corners of the bounding boxes - imagettfbbox outputs the co-ordinates in 1 single array for some reason. I also calculate the original width and height of the text - in order to perform an additional corrective translation later.

With these in hand, I could calculate the actual position I need to draw it (indicated by the yellow cross in the image at the top of this post):

$$delta = C_{rotated} - C_{original}$$

.

$$pivot = \frac{pos_x - ( delta_x + \frac{text_width}{2})}{pos_y - delta_y + \frac{text_height}{2}}$$

In short, I calculate the distance between the 2 centre points calculated above, and then find the pivot point by taking this distance plus half the size of the text from the original central position we wanted to draw the text at. Here's that in PHP:

// Calculate the x and y shifting needed
$dx =$rot_centre[0] - $orig_centre[0];$dy = $rot_centre[1] -$orig_centre[1];

// New pivot point
$px =$x - ($dx +$text_width/2);
$py =$y - $dy +$text_height/2;

I generated a video of it in action (totally not because I thought it would look cool :P) with a combination of curl, and ffmpeg:

This was really quite easy to generate. In a new folder, I executed the following:

echo {0..360} | xargs -P3 -d' ' -n1 -I{} curl 'https://starbeamrainbowlabs.com/images/placeholder/?width=1920&height=1920&fg_colour=inverse&bg_colour=white&angle={}' -o {}.jpeg
ffmpeg -r 60 -i %d.jpeg -q:v 3 /tmp/text-rotation.webm

The first command downloads all the required frames (3 at a time), and the second stitches them together. The -q:v 3 bit of the ffmpeg command is of note - by default webm videos apparently have a really low quality - this corrects that. Lower is better, apparently - it goes up to about 40 I seem to remember reading somewhere. 3 to 5 is supposed to be the right range to get it to look ok without using too much disk space.

That's about everything I wanted to talk about to remind myself about what I did. Let me know if there's anything you're confused about in the comments below, and I'll explain it in more detail.

(Found this interesting? Comment below!)

## Just another day: More spam, more defences

When post is released I'll be in an exam, but I wanted to post again about the perfectly fascinating spam situation here on my blog. I've blogged about fending off spam on here before (exhibits a, b, c), but I find the problem is detecting it in a transparent manner that you as the reader don't notice very interesting, so I think I'll write another post on the subject. I could use a service like Google's ReCAPTCHA, but that would be boring :P

Recently I've had a trio of spam comments make it all the way through my (rather extensive) checks and onto my blog here. I removed them, of course, but it still baffled me as to why they made it through.

It didn't take long to find out. When I was first implementing comments on here, I added a logger specifically for purposes such as this that saves everything about current environmental state to a log file for later inspection - for both comments that make it through, and those that don't. It's not available publically available, of course (but if you'd like to take a look, just ask and I'll consider it). Upon isolating the entries for the spam comments, I discovered a few interesting things.

• The comment keys were aged 21, 21, and 17 seconds respectively (the lower limit I have set is 10 seconds)
• All 3 comments claimed that they were Firefox 57
• 2 out of 3 comments used HTTP 1.0 (even though they claimed to be Firefox 57, and despite my server offering HTTP/1.1 and HTTP/2.0)
• All 3 comments utilised HTTPS
• The IP Addresses that the comments came form were in Ukraine, Russia, and Canada (hey?) respectively
• All 3 appear to be phishing scams, with a link leading to a likely malicious website
• The 2 using HTTP/1.0 also asked my server to close the connection after sending a response
• All 3 asked not to be tracked via the DNT HTTP header
• The last comment had some really weird capitalisation. After consulting someone experienced on the subject, I learnt that the writer likely natively spoke an eastern language, such as Chinese

This was most interesting. From this, I can conclude:

• The last comment was likely submitted by a Chinese operator - even though the source IP address is located in Ukraine
• All three are spoofing their user agent string.
• Firefox 57 uses HTTP/2.0 by default if you're really in a browser, and the spam comments utilised HTTP/1.0 and HTTP/1.1
• Curiously, all of this took place over HTTPS. I'd be really curious to log which cipher was used for the connection here.
• In light of this, if I knew more about HTTP client libraries, I could probably identify what software was really used to submit the spam comments (and possibly even what operating system it was running on). If you know, please comment below!

To combat this development, I thought of a few options. Firstly, raising the minimum comment age, whilst effective, may disrupt the user experience, which I don't want to do. Plus, the bot owners could just increase the delay even more. To that end, I decided not to do this.

Secondly, with the amount of data I've collected, I could probably write an AI that takes the environment in and spits out a 'spaminess' score, much like SpamAssassin and rspamd do for email. Perhaps a multi-weighted system would work, with a series of tests that add or take away from the final score? I might investigate upgrading my spam detection system to do this in the future, as it would not only block spam more effectively, but provide a more distilled overview of the characteristics of each comment submission than I have currently.

Lastly, I could block HTTP/1.0 requests. While not perfect (1 out of 3 requests used HTTP/1.1), it would still catch some more bots out without disrupting user experience - as normal browsers (include text-based ones IIRC) use HTTP/1.1 or above. HTTP/1.1 has been around since 1991 (27 years!), so if you're not using it by now - upgrade! For now, this is the best option I can see.

From today, if you try to submit a comment and get a HTTP 505 HTTP Version Not Supported error and see a message saying something like this:

You sent your request via HTTP/1.0, but this is not supported for submitting comments due to high volume of spam. Please retry with HTTP/1.1 or higher.

## How to prevent php-fpm from overriding your PHP-based caching logic

A while ago I implemented ETag support to the dynamic preview generator in Pepperminty Wiki. While I thought it worked during testing, for some reason on a private instance of Pepperminty Wiki I discovered recently that my browser didn't seen to be taking advantage of it. This got me curious, so I decided to do a little bit of digging to find out why.

It didn't take long to find the problem. For some reason, all the responses from the server had a Cache-Control: no-cache, no-store, must-revalidate header attached to them. How strange! Even more so that it was in capital letters - my convention in Pepperminty Wiki is to always make the headers lowercase.

I checked the codebase with via the Project Find feature of Atom just to make sure that I hadn't left in a debugging statement or anything (I hadn't), and then I turned my attention to Nginx (engine-x) - the web server that I use on my server. Maybe it had added a caching header?

A quick grep later revealed that it wasn't responsible either - which leaves just one part of the system unchecked - php-fpm, the PHP FastCGI server that sits just behind Nginx that's responsible for running the various PHP scripts I have that power this website and other corners of my server. Another quick grep returned a whole bunch of garbage, so after doing some research I discovered that php-fpm, by default, is configured to send this header - and that it has to be disabled by editing your php.ini (for me it's in /etc/php/7.1/fpm/php.ini), and changing

;session.cache_limiter = nocache

to be explicitly set to an empty string, like so:

session.cache_limiter = ''

This appears to have solved by problem for now - allowing me to regain control over how the resources I send back via PHP are cached. Hopefully this saves someone else the hassle of pulling their entire web server stack apart and putting it back together again the future :P

## Deterring spammers with a comment key system

I recently found myself reimplementing the comment key system I use on this blog (I posted about it here) for a another purpose. Being more experienced now, my new implemention (which I should really put into use on this blog actually) is stand-alone in a separate file - so I'm blogging about it here both to help out anyone who reads this other than myself - and for myself as I know I'll forget otherwise :P

The basic algorithm hasn't changed much since I first invented it: take the current timestamp, apply a bunch or arbitrary transformations to it, put it as a hidden field in the comment form, and then reverse the transformations on the comment key the user submits as part of the form to discover how long they had the page loaded for. Bots will have it loaded for either less than 10-ish seconds, or more than 24 hours. Humans will be somewhere in the middle - at least according to my observations!

Of course, any determined spammer can easily bypass this system if they spend even a little bit of time analysing the system - but I'm banking on the fact that my blog is too low-value for a spammer to bother reverse-engineering my system to figure out how it works.

This time, I chose to use simple XOR encryption, followed by reversing the string, followed by base64 encoding. It should be noted that XOR encryption is certainly not secure - but in this case it doesn't really matter. If my website becomes a high-enough value target for that to matter, I'll investigate proper AES encryption - which will probably be a separate post in and of itself, as a quick look revealed that it's rather involved - and will probably require quite a bit of research and experimentation working correctly.

Let's take a look at the key generation function first:

function key_generate($pass) {$new_key = strval(time());
// Repeat the key so that it's long enough to XOR the key with
$pass_enc = str_repeat($pass, (strlen($new_key) / strlen($pass)) + 1);
$new_key =$new_key ^ $pass_enc; return base64_encode(strrev($new_key));
}

As I explained above, this first XORs the timestamp against a provided 'passcode' of sorts, and then it reverses it, base64 encodes it, and then returns it. I discovered that I needed to repeat the passcode to make sure it's at least as long as the timestamp - because otherwise it cuts the timestamp short! Longer passwords are always desirable for certain, but I wanted to make sure I addressed it here - just in case I need to lift this algorithm from here for a future project.

Next up is the decoding algorithm, that reverses the transformations we apply above:

    function key_decode($key,$pass) {
$key_dec = strrev(base64_decode($key));
// Repeat the key so that it's long enough to XOR the key with
$pass_dec = str_repeat($pass, (strlen($key_dec) / strlen($pass)) + 1);
return intval($key_dec ^$pass_dec);
}

Very similar. Again, the XOR passphrase has to be repeated to make it long enough to apply to the whole encoded key without inadvertently chopping some off the end. Additionally, we also convert the timestamp back into an integer - since it is the number of seconds since the last UNIX epoch (1st January 1970 as of the time of typing).

With the ability to create and decode keys, let's write a helper method to make the verification process a bit easier:

    function key_verify($key,$pass, $min_age,$max_age) {
$age = time() - key_decode($key, $pass); return$age >= $min_age &&$age <= $max_age; } It's fairly self-explanatory, really. It takes an encoded key, decodes it, and verifies that it's age lies between the specified bounds. Here's the code in full (it updates every time I update the code in the GitHub Gist): (Above: The full comment key code. Can't see it? Check it out on GitHub Gist here.) ## Website Integrations (Mini!) Series List Since I ended up doing a mini-series on the various website integrations I implemented, I thought that you might find a (mini-)series list useful. Here it is: That's just about all I've got to mention here. Do you have any suggestions and / or requests on what I should blog about? Let me know below in the comments! ## Website Integrations #3: Twitter cards (The posts featured in the above images are this one about my new Raspberry Pi 3, and the latest coding conundrums evolved post). You have arrived in the third of three parts in my mini-series on how I implemented rich snippets. In the last two parts I tackled open graph and becoming an oEmbed provider. In this part, I'll be talking a bit about twitter cards, and how I implemented them. Twitter's take on the problem seems to be much simpler than Facebook's, which makes for easy implementing :D Like in the other two protocols too, they decided to have multiple different types of, well, in this case, cards. I decided to implement the summary card type. Like open graph, it adds a bunch of <meta /> tags to the header. Sigh. Anyway, here are the property names I needed to implement: • twitter:card - The type of card. In my case this is set to summary • twitter:site - This one's confusing. Although it's called 'site', it should actually be set to your own twitter handle - mine is @SBRLabs. • twitter:title - The title of the content. Practically identical to open graph's og:title. • twitter:description - The description of the content. The same as og:description. • twitter:image - A url pointing to an image that should be displayed next to the title and description. Unlike Facebook's open graph, twitter appears to support https urls here with no problem at all. Since after implementing open graph I already had 90% of the infrastructure and calculations in place already, throwing together values for the above wasn't too difficult. Here's an example set of twitter card <meta /> tags generated by the updated code: <meta property="twitter:card" content="summary" /> <meta property="twitter:site" content="@SBRLabs" /> <meta property="twitter:title" content="Running Prolog on Linux" /> <meta property="twitter:description" content="Hello! I hope you had a nice restful Easter. I've been a bit busy this last 6 months, but I've got a holiday at the moment, and I've just received a .... (click to read more)" /> <meta property="twitter:image" content="https://starbeamrainbowlabs.com/blog/images/20151015-learning-swi-prolog-banner.svg" /> Easy peasy. Next up was testing time. Thankfully, Twitter made this easy too by providing an official testing tool. Interestingly, they whitelist domains based on whether the webmaster has run a url through their tool - so if you want twitter cards to show up, make sure you plug at least one of your website's page urls through their tool. After a few tweaks, I got this: With that, my work was complete. This brings us to the end of my mini-series on rich-snippet integrations (unless I've missed a protocol O.o Comment below if I have)! I hope you've found it useful. If you have (or even if you haven't!) please let me know in the comments below :D ## Website Integrations #2: oEmbed Welcome to part 2 of this impromptu miniseries! In this second part of three, I'll be showing you a little about how I set up and tested a simple oEmbed provider for my blog posts - I've seen lots of oEmbed client information out there, but not much in the way of provider (or server) implementations. If you haven't read part one about the open graph protocol yet, then you might find it interesting. oEmbed is a bit different to open graph in that instead of throwing a bunch of meta tags into your <head />, you instead use a special <link /> element that points interested parties in the direction of some nice tasty json. Personally, I find this approach to be more sensible and easier to handle - the kind of thing you'd expect from an open standard. To start with, I took a read of their specification, as I did with open graph. It doesn't have as many examples as I'd have liked, and I had to keep jumping around, but it's certainly not the worst I've seen. oEmbed is built on the idea of providers (that's me!) and consumers (the programs and website you use). Providers, erm, provide machine-readable information about urls passed to them, and consumers take this information provided to them and display it to the user in a manner they think is appropriate. To start with, I created a new PHP file to act as my provider over at https://starbeamrainbowlabs.com/blog/oembed.php and took a look at the different oEmbed types available - oEmbed has a type system of sorts, similar to open graph. I decided on link - while a rich would look cool, it would be almost impossible to test with every client out there, and I can't guarantee how the html would be rendered or what space it would have either. With that decided, I made a list of the properties that I'd need to include in the json response: • version - The version of oEmbed. Currently 1.0 as of the time of typing. • type - The oEmbed type. I chose link here. • title - The title of the page • author_name - The name of the author • author_url - A link to the author's homepage. • provider_name - The provider's name. • provider_url - A link to the provider's homepage. I chose my blog index, since this script will only serve my blog. • cache_age - How long consumers should cache the response for. I put 1 hour (3600 seconds) here, since I usually correct mistakes after posting that I've missed, and I want them to go out fairly quickly. • thumbnail_url - A link to a suitable thumbnail picture. • thumbnail_width - The width of the thumbnail image, in pixels. • thumbnail_height - The width of the thumbnail image, in pixels. Then I looked at the data I'd be getting from the client. It all comes in the form of GET parameters: • format - Either json or xml. Personally, I only support json. • url - The url to send oEmbed information for. With all the information close at hand, I spent a happy hour or so writing code, and ended up with a script that outputs something like this: { "version": "1.0", "type": "link", "title": "Website Integrations #1: Open Graph", "author_name": "Starbeamrainbowlabs", "author_url": "https:\/\/starbeamrainbowlabs.com\/", "provider_name": "Stardust | Starbeamrainbowlabs' Blog", "provider_url": "https:\/\/starbeamrainbowlabs.com\/blog\/", "cache_age": 3600, "thumbnail_url": "https:\/\/starbeamrainbowlabs.com\/images\/logos\/open-graph.png", "thumbnail_width": 300, "thumbnail_height": 300 } Though the specification includes requirements for satisfying 2 extra GET parameters, maxwidth and maxheight, I chose to ignore them since writing a dynamic thumbnail rescaling script is both rather complicated and requires a not insignificant amount of processing power every time it is used. After finishing the oEmbed script, I turned my attention to one final detail: The special <link /> tag required for auto-discovery. A quick bit of PHP in the article page renderer adds something like this to the header: <link rel="alternate" type="application/json+oembed" href="https://starbeamrainbowlabs.com/blog/oembed.php?format=json&url=https%3A%2F%2Fstarbeamrainbowlabs.com%2Fblog%2Farticle.php%3Farticle%3Dposts%252F229-Website-Integrations-1-Open-Graph.html" /> and with that, my oEmbed provider implementation is complete - but it still needs testing! Unfortunately, testing tool for oEmbed are few and far between, but I did manage to find a few: • oEmbed Tester - A basic testing tool. Appears to work well for the most part - except the preview. Not sure why it says "Preview not available." all the time. • Iframely URL Debugger - Actually a testing tool for some commercial tool or other, but it still appears to accurately test not only oEmbed, but open graph and twitter cards (more on them in the next post!) too! After testing and fixing a few bugs, my oEmbed provider was complete! Next time, I'll be taking a look at twitter's take on the subject: Twitter cards. Found this interesting? Comment below! Share it with a friend! ## Website Integrations #1: Open Graph These days, if you share a link to a website or a blog post with a friend or on a social networking site, sometimes the link expands to a preview of the link you've just posted. Personally, I find this behaviour to be quite helpful, as it lets me get an idea as to what it is that I'm about to click on. Unfortunately, when it comes to the code behind these previews, there are no less than 3(!) different protocols that you need to implement in order to get it to work, since facebook, twitter, and the rest of the web community haven't been talking to each other quite like they should have been. Anyway, after implementing these 3 protocols and having a bit of trouble with them, I thought I'd write up a mini-series on the process I went through, the problems I encountered, and how I solved them. In this post, I'm going to explain Facebook's Open Graph protocol. I decided that I'd implement these 3 protocols on my home page and each blog post's page. Open Graph was the easiest - all it requires is a bunch of meta tags. These tags are split into 2 parts - the common tags, which all page types should have, and the type-specific tags, which depend on the type of page you're implementing them on. Here's the list of common tags I implemented: • og:title - The title of your page • og:description - A short description of your page • og:image, og:image:url, and og:image:secure_url - The url of an image that would fit as a preview for the page • og:url - The url of the page (not sure why this is required, since you have to know the url in order to require the page... :P Perhaps it's to help with deduplication - I'm not sure) These were fairly easy for my home page: <meta property="og:title" content="Starbeamrainbowlabs" /> <meta property="og:description" content="Hi! I am a computer science student who is in their second year at Hull University. I started out teaching myself about various web technologies, and then I managed to get a place at University, where I am now." /> <meta property="og:image" content="http://starbeamrainbowlabs.com/favicon.png" /> <meta property="og:image:url" content="http://starbeamrainbowlabs.com/favicon.png" /> <meta property="og:image:secure_url" content="https://starbeamrainbowlabs.com/favicon.png" /> <meta property="og:url" content="https://starbeamrainbowlabs.com/" /> When I went to test it using Facebook's official testing tool, the biggest problem I had was that the image wouldn't show up - no matter what I did. I eventually found this stackoverflow answer which explained that Facebook doesn't support https urls in anything other than the og:image:secure_url meta tag (even though they say they do) - so changing the urls to regular http solved the problem. Next, I took a look at the type-specific tags. There's a whole bunch of them (check out this section of the spec) - I decided on the profile type for the index page of my website here: <meta property="og:type" content="profile" /> The profile type has a few extra specific meta tags that need setting too, so I added those: <meta property="profile:first_name" content="Starbeamrainbowlabs" /> <meta property="profile:last_name" content="Tjovik" /> <meta property="profile:username" content="Starbeamrainbowlabs" /> With that done, I turned my attention to my blog posts. Since the page is rendered in PHP (and typing out all those meta tags was a rather annoying), I created a teensy little framework to output the meta tags for me $metaTags = [];
$metaTags["property"] = "value";$renderedMetaTags = "";
foreach($metaTags as$metaKey => $metaValue)$renderedMetaTags .= "\t<meta property=\"$metaKey\" content=\"$metaValue\" />";

Now I can add as many meta tags as I like, with a fraction of the typing - and it looks neater too :D With that done, I implemented the basic meta tags. Here's some example output from the last post:

<meta property="og:title" content="4287 Reasons why your comments weren't posted" />
<meta property="og:description" content="I don't get a lot of real comments on here from what I can tell, as you've probably noticed. I don't particularly mind (though it's always awesome whe.... (click to read more)" />
<meta property="og:image" content="http://starbeamrainbowlabs.com/blog/images/20170406-Spammer-Mistakes.png" />
<meta property="og:image:url" content="http://starbeamrainbowlabs.com/blog/images/20170406-Spammer-Mistakes.png" />
<meta property="og:image:secure_url" content="https://starbeamrainbowlabs.com/blog/images/20170406-Spammer-Mistakes.png" />
<meta property="og:url" content="https://starbeamrainbowlabs.com/blog/article.php?article=posts%2F228-4287-Reasons-Your-Comments-Were-Not-Posted.html" />

That wasn't too tough. Next, I looked at the list of types again, and chose the article type for my blog posts.

<meta property="og:type" content="article" />

Like the profile type earlier, the article type also comes with a few type-specific meta tags (what they mean by not fitting into a 'vertical' I have no idea). I decided not to implement all the type-specific meta tags available here, since not all of them were practical to implement. Here's some more example output for the new tags:

<meta property="article:author" content="https://starbeamrainbowlabs.com/" />
<meta property="article:published_time" content="2017-04-08T12:56:46+01:00" />

Unfortunately, the article published time is really awkward to get hold of actually (even though it's outputted at the bottom of every article) , so I went with the 'last modified' time instead. The published time is marked up with html microdata Hopefully it doesn't cause too many issues later - though I can always change it :P

With that (and a final test), it looked like my Open Graph implementation was working as intended. Next time, I'll show you how I implemented a simple oEmbed provider.

I don't get a lot of real comments on here from what I can tell, as you've probably noticed. I don't particularly mind (though it's always awesome when I do get one!) - but what I do mind about is the spam. Since February 2015, I've gotten 4287 spam comments. 4287! It's actually quite silly, when you think about it.

The other day I was fiddling with the code behind this blog (posts about that coming soon!), and I discovered that I implemented a log ages ago that records each and every spammer, and the mistake they made - and I thought I'd share some statistics here, and some tips for dealing with spam yourself (I've posted about tactics before here.

Here's an extract from my logs (full logs available on request):

[ Sun, 02 Apr 2017 23:17:25 +0100] invalid comment | ip: 94.181.153.194 | name: ghkkll | articlepath: posts/120-Cpu-Registers.html | mistake: shortcomment
[ Mon, 03 Apr 2017 02:16:58 +0100] invalid comment | ip: 191.96.242.17 | name: exercise pants | articlepath: posts/010-Gif-Renderer.html | mistake: invalidkey
[ Mon, 03 Apr 2017 02:16:58 +0100] invalid comment | ip: 191.96.242.17 | name: exercise pants | articlepath: posts/010-Gif-Renderer.html | mistake: invalidkey
[ Mon, 03 Apr 2017 02:16:59 +0100] invalid comment | ip: 191.96.242.17 | name: exercise pants | articlepath: posts/010-Gif-Renderer.html | mistake: invalidkey
[ Mon, 03 Apr 2017 02:16:59 +0100] invalid comment | ip: 191.96.242.17 | name: exercise pants | articlepath: posts/010-Gif-Renderer.html | mistake: invalidkey

Since the output format I chose is nice and regular, I could use a quick bit of bash magic to whip up some statistics:

cat failedcomments.log | sed -e 's/^.*mistake\: //' | grep -iv '\[' | sort | uniq -c | sort -nr

(explanation, courtesy of explainshell.com )

That gave me this output:

Count Reason
3922 invalidkey
148 website
67 nokey
67 noarticleid
56 noname
17 shortcomment
3 longcomment
2 invalidemail
1 shortname

My first thoughts here were firstly along the lines of "wow, that's a lot of spam", secondly "that comment key is working really well!", and thirdly "I didn't realise how helpful that fake website field is". Still, though I had a table, I thought a visualisation might help to put things into perspective.

There - much better :D As you might have suspected if you've been following my blog here for a while, having an invalid comment key is the most common mistake spammers make.

The comment key is a hidden field in the comment form that is actually a transformed timestamp of the time you loaded the page. Working it backwards, I can work out how long it took you to submit a comment from first loading the page.

Using a companion log file to the one that I generated the above pie chart from, I've calculated that 3558 potential comments were submitted within 10 seconds of loading the page! No ordinary humans are that fast.... (especially considering you probably want to read the article before commenting!) they have to be bots. Here's a graph to illustrate the dropoff (the time is in seconds):

Out of the other reasons that people failed, "website" was the second most common mistake, with ~3.45% of spammers getting caught out on it. This mistake refers to another of my little spam traps - defense in depth is always good! This particular one is a regular website address hidden field, which is hidden via some fancy CSS. Curious, I decided to investigate further - and what I found was fascinating.

About 497 spammers entered an invalid website address (i.e. one that doesn't start with http) into the website box - which I really can't understand, since it's got a (hidden) label and an appropriate name and type to match - 90 of which decided that "seo plugin" was a brilliant thing to fill it with! It's important to note here that spammers who got caught by the invalid comment key filter above are included in these statistics - here's the bash command I used here:

grep -i '"website"' rawsubmits.jsonlog  | sed -e 's/^.*"website": "//' -e 's/",//' -e 's/\\\///' | uniq | egrep -iv '^http' | wc -l

Other examples include "watch live sports free", "Samantha", "just click the following web site", long strings of html-encoded unicode characters (japanese I think, after decoding one), and more. Perfectly baffling, if you ask me (if you can shed some light on this one, please comment below!).

57 spambots forgot their own name. This could be because the box you put your name in below has a name of 'name', but an id of 'namebox' - which may have caused some confusion for some of the more stupid bots.

After all that, there were 3 long comments (probably a bunch of word salad), 2 invalid email addresses that weren't caught by any filters above, and 1 short name (under 3 characters).

That's about it for this impromptu analysis of my comments log! This took far longer than I thought it would to type up. Did you find it interesting? Thinking of putting some of these techniques into practice yourself? Comment below!

## I now have a public website status page!

Just recently Uptime Robot (the awesome service that I use to monitor my server's uptime) have released a new feature: Public status pages! Status pages appear to be free (for now), so I've gone and set one up. Now all of you can see what's up with my website if it's down.

They even allow you to point a (sub)domain at it too. I did this too, so you can visit my status page at status.starbeamrainbowlabs.com.

Art by Mythdael