Starbeamrainbowlabs

Stardust
Blog


Archive

Mailing List Articles Atom Feed Comments Atom Feed Twitter

Tag Cloud

3d account algorithms announcement archives arduino artificial intelligence assembly async audio bash batch blog bookmarklet booting c sharp c++ challenge chrome os code codepen coding conundrums coding conundrums evolved command line compiling css dailyprogrammer debugging demystification distributed computing downtime embedded systems encryption es6 features event experiment external first impressions future game github github gist graphics hardware hardware meetup holiday html html5 html5 canvas interfaces internet io.js jabber javascript js bin labs learning library linux low level lua maintenance network networking node.js operating systems performance photos php pixelbot portable privacy programming problems project projects prolog protocol protocols pseudo 3d python reddit reference release releases resource review rust secrets security series list server software sorting source code control statistics svg technical terminal textures three thing game three.js tool tutorial tutorials twitter ubuntu university update updates upgrade version control virtualisation visual web website windows windows 10 xmpp xslt

/r/dailyprogrammer hard challenge #322: Static HTTP 1.0 server

Recently I happened to stumble across the dailyprogrammer subreddit's latest challenge. It was for a static HTTP 1.0 server, and while I built something similar for my networking ACW, I thought I'd give this one a go to create an extendable http server that I can use in other projects. If you want to follow along, you can find the challenge here!

My language of choice, as you might have guessed, was C♯ (I know that C♯ has a HttpServer class inbuilt already, but to listen on 0.0.0.0 on Windows it requires administrative privileges).

It ended up going rather well, actually. In a little less than 24 hours after reading the post, I had myself a working solution, and I thought I'd share here how I built it. Let's start with a class diagram:

A class diagram for the gliding squirrel. (Above: A class diagram for the GlidingSquirrel. Is this diagram better than the last one I drew?)

I'm only showing properties on here, as I'll be showing you the methods attached to each class later. It's a pretty simple design, actually - HttpServer deals with all the core HTTP and networking logic, FileHttpServer handles the file system calls (and can be swapped out for your own class), and HttpRequest, HttpResponse, HttpMethod, HttpResponseCode all store the data parsed out from the raw request coming in, and the data we're about to send back out again.

With a general idea as to how it's put together, lets dive into how it actually works. HttpServer would probably be a good place to start:

public abstract class HttpServer
{
    public static readonly string Version = "0.1-alpha";

    public readonly IPAddress BindAddress;
    public readonly int Port;

    public string BindEndpoint { /* ... */ }

    protected TcpListener server;

    private Mime mimeLookup = new Mime();
    public Dictionary<string, string> MimeTypeOverrides = new Dictionary<string, string>() {
        [".html"] = "text/html"
    };

    public HttpServer(IPAddress inBindAddress, int inPort)
    { /* ... */ }
    public HttpServer(int inPort) : this(IPAddress.IPv6Any, inPort)
    {
    }

    public async Task Start() { /* ... */ }

    public string LookupMimeType(string filePath) { /* ... */ }

    protected async void HandleClientThreadRoot(object transferredClient) { /* ... */ }

    public async Task HandleClient(TcpClient client) { /* ... */ }

    protected abstract Task setup();

    public abstract Task HandleRequest(HttpRequest request, HttpResponse response);
}

(Full version)

It's heavily abbreviated because there's actually quite a bit of code to get through here, but you get the general idea. The start method is the main loop that accepts the TcpClients, and calls HandleClientThreadRoot for each client it accepts. I decided to use the inbuilt ThreadPool class to do the threading for me here:

TcpClient nextClient = await server.AcceptTcpClientAsync();
ThreadPool.QueueUserWorkItem(new WaitCallback(HandleClientThreadRoot), nextClient);

C♯ handles all the thread spawning and killing for me internally this way, which is rather nice. Next, HandleClientThreadRoot sets up a net to catch any errors that are thrown by the next stage (as we're now in a new thread, which can make debugging a nightmare otherwise), and then calls the main HandleClient:

try
{
    await HandleClient(client);
}
catch(Exception error)
{
    Console.WriteLine(error);
}
finally
{
    client.Close();
}

No matter what happens, the client's connection will always get closed. HandleClient is where the magic start to happen. It attaches a StreamReader and a StreamWriter to the client:

StreamReader source = new StreamReader(client.GetStream());
StreamWriter destination = new StreamWriter(client.GetStream()) { AutoFlush = true };

...and calls a static method on HttpRequest to read in and decode the request:

HttpRequest request = await HttpRequest.FromStream(source);
request.ClientAddress = client.Client.RemoteEndPoint as IPEndPoint;

More on that later. With the request decoded, HandleClient hands off the request to the abstract method HandleRequest - but not before setting up a secondary safety net first:

try
{
    await HandleRequest(request, response);
}
catch(Exception error)
{
    response.ResponseCode = new HttpResponseCode(503, "Server Error Occurred");
    await response.SetBody(
        $"An error ocurred whilst serving your request to '{request.Url}'. Details:\n\n" +
        $"{error.ToString()}"
    );
}

This secondary safety net means that we can send a meaningful error message back to the requesting client in the case that the abstract request handler throws an exception for some reason. In the future, I'll probably make this customisable - after all, you don't always want to let the client know exactly what crashed inside the server's internals!

The FileHttpServer class that handles the file system logic is quite simple, actually. The magic is in it's implementation of the abstract HandleRequest method that the HttpServer itself exposes:

public override async Task HandleRequest(HttpRequest request, HttpResponse response)
{
    if(request.Url.Contains(".."))
    {
        response.ResponseCode = HttpResponseCode.BadRequest;
        await response.SetBody("Error the requested path contains dangerous characters.");
        return;
    }

    string filePath = getFilePathFromRequestUrl(request.Url);
    if(!File.Exists(filePath))
    {
        response.ResponseCode = HttpResponseCode.NotFound;
        await response.SetBody($"Error: The file path '{request.Url}' could not be found.\n");
        return;
    }

    FileInfo requestFileStat = null;
    try {
        requestFileStat = new FileInfo(filePath);
    }
    catch(UnauthorizedAccessException error) {
        response.ResponseCode = HttpResponseCode.Forbidden;
        await response.SetBody(
            "Unfortunately, the server was unable to access the file requested.\n" + 
            "Details:\n\n" + 
            error.ToString() + 
            "\n"
        );
        return;
    }

    response.Headers.Add("content-type", LookupMimeType(filePath));
    response.Headers.Add("content-length", requestFileStat.Length.ToString());

    if(request.Method == HttpMethod.GET)
    {
        response.Body = new StreamReader(filePath);
    }
}

With all the helper methods and properties on HttpResponse, it's much shorter than it would otherwise be! Let's go through it step by step.

if(request.Url.Contains(".."))

This first step is a quick check for anything obvious that could be used against the server to break out of the web root. There are probably other dangerous things you can do(or try to do, anyway!) to a web server to attempt to trick it into returning arbitrary files, but I can't think of any of the top of my head that aren't covered further down. If you can, let me know in the comments!

string filePath = getFilePathFromRequestUrl(request.Url);

Next, we translate the raw path received in the request into a path to a file on disk. Let's take a look inside that method:

protected string getFilePathFromRequestUrl(string requestUrl)
{
    return $"{WebRoot}{requestUrl}";
}

It's rather simplistic, I know. I can't help but feel that there's something I missed here.... Let me know if you can think of anything. (If you're interested about the dollar syntax there - it's called an interpolated string, and is new in C♯ 6! Fancy name, I know. Check it out!)

if(!File.Exists(filePath))
{
    response.ResponseCode = HttpResponseCode.NotFound;
    await response.SetBody($"Error: The file path '{request.Url}' could not be found.\n");
    return;
}

Another obvious check. Can't have the server crashing every time it runs into a 404! A somewhat interesting note here: File.Exists only checks to see if there's a file that exists under the specified path. To check for the existence of a directory, you have to use Directory.Exists - which would make directory listing rather easy to implement. I might actually try that later - with an option to turn it off, of course.

FileInfo requestFileStat = null;
try {
    requestFileStat = new FileInfo(filePath);
}
catch(UnauthorizedAccessException error) {
    response.ResponseCode = HttpResponseCode.Forbidden;
    await response.SetBody(
        "Unfortunately, the server was unable to access the file requested.\n" + 
        "Details:\n\n" + 
        error.ToString() + 
        "\n"
    );
    return;
}

Ok, on to something that might be a bit more unfamiliar. The FileInfo class can be used to get, unsurprisingly, information about a file. You can get all sorts of statistics about a file or directory with it, such as the last modified time, whether it's read-only from the perspective of the current user, etc. We're only interested in the size of the file though for the next few lines:

response.Headers.Add("content-type", LookupMimeType(filePath));
response.Headers.Add("content-length", requestFileStat.Length.ToString());

These headers are important, as you might expect. Browsers to tend to like to know the type of content they are receiving - and especially it's size.

if(request.Method == HttpMethod.GET)
{
    response.Body = new StreamReader(filePath);
}

Lastly, we send the file's contents back to the user in the response - but only if it's a GET request. This rather neatly takes care of HEAD requests - but might cause issues elsewhere. I'll probably end up changing it if it does become an issue.

Anyway, now that we've covered everything right up to sending the response back to the client, let's end our tour with a look at the request parsing system. It's a bit backwards, but it does seem to work in an odd sort of way! It all starts in HttpRequest.FromStream.

public static async Task<HttpRequest> FromStream(StreamReader source)
{
    HttpRequest request = new HttpRequest();

    // Parse the first line
    string firstLine = await source.ReadLineAsync();
    var firstLineData = ParseFirstLine(firstLine);

    request.HttpVersion = firstLineData.httpVersion;
    request.Method = firstLineData.requestMethod;
    request.Url = firstLineData.requestPath;

    // Extract the headers
    List<string> rawHeaders = new List<string>();
    string nextLine;
    while((nextLine = source.ReadLine()).Length > 0)
        rawHeaders.Add(nextLine);

    request.Headers = ParseHeaders(rawHeaders);

    // Store the source stream as the request body now that we've extracts the headers
    request.Body = source;

    return request;
}

It looks deceptively simple at first glance. To start with, I read in the first line, extract everything useful from it, and attach them to a new request object. Then, I read in all the headers I can find, parse those too, and attach them to the request object we're building.

Finally, I attach the StreamReader to the request itself, as it's now pointing at the body of the request from the user. I haven't actually tested this, as I don't actually use it anywhere just yet, but it's a nice reminder just in case I do end up needing it :-)

Now, let's take a look at the cream on the cake - the method that parses the first line of the incoming request. I'm quite pleased with this actually, as it's my first time using a brand new feature of C♯:

public static (float httpVersion, HttpMethod requestMethod, string requestPath) ParseFirstLine(string firstLine)
{
    List<string> lineParts = new List<string>(firstLine.Split(' '));

    float httpVersion = float.Parse(lineParts.Last().Split('/')[1]);
    HttpMethod httpMethod = MethodFromString(lineParts.First());

    lineParts.RemoveAt(0); lineParts.RemoveAt(lineParts.Count - 1);
    string requestUrl = lineParts.Aggregate((string one, string two) => $"{one} {two}");

    return (
        httpVersion,
        httpMethod,
        requestUrl
    );
}

Monodevelop, my C♯ IDE, appears to go absolutely nuts over this with red squiggly lines everywhere, but it still compiles just fine :D

As I was writing this, a thought popped into my head that a tuple would be perfect here. After reading somewhere a month or two ago about a new tuple syntax that's coming to C♯ I thought I'd get awesomely distracted and take a look before continuing, and what I found was really cool. In C♯ 7 (the latest and quite possibly greatest version of C♯ to come yet!), there's a new feature called value tuples, which let's you dynamically declare tuples like I have above. They're already fully supported by the C♯ compiler, so you can use them today! Just try to ignore your editor if it gets as confused as mine did... :P

If you're interested in learning more about them, I'll leave a few links at the bottom of this post. Anyway, back to the GlidingSquirrel! Other than the new value tuples in the above, there's not much going on, actually. A few linq calls take care of the heavy lifting quite nicely.

And finally, here's my header parsing method.

public static Dictionary<string, string> ParseHeaders(List<string> rawHeaders)
{
    Dictionary<string, string> result = new Dictionary<string, string>();

    foreach(string header in rawHeaders)
    {
        string[] parts = header.Split(':');
        KeyValuePair<string, string> nextHeader = new KeyValuePair<string, string>(
            parts[0].Trim().ToLower(),
            parts[1].Trim()
        );
        if(result.ContainsKey(nextHeader.Key))
            result[nextHeader.Key] = $"{result[nextHeader.Key]},{nextHeader.Value}";
        else
            result[nextHeader.Key] = nextHeader.Value;
    }

    return result;
}

While I have attempted to build in support for multiple definitions of the same header according to the spec, I haven't actually encountered a time when it's actually been needed. Again, this is one of those things I've built in now for later - as I do intend on updating this and adding more features later - and perhaps even work it into another secret project I might post about soon.

Lastly, I'll leave you with a link to the repository I'm storing the code for the GlidingSquirrel, and a few links for your enjoyment:

GlidingSquirrel

Sources and Further Reading

How to set up a WebDav share with Nginx

I've just been setting up a WebDav share on a raspberry pi 3 for my local network (long story), and since it was a bit of a pain to set up (and I had to combine a bunch of different tutorials out there to make mine work), I thought I'd share how I did it here.

I'll assume you have a raspberry pi all set up and up-to-date in headless mode, with a ufw for your firewall (if you need help with this, post in the comments below or check out the Raspberry Pi Stack Exchange). To start with, we need to install the nginx-full package:

sudo apt update
sudo apt install  nginx-full

Note that we need the nginx-full package here, because the nginx-extras or just simply nginx packages don't include the required additional webdav support modules. Next, we need to configure Nginx. Nginx's configuration files live at /etc/nginx/nginx.conf, and in /etc/nginx/conf.d. I did something like this for my nginx.conf:

user www-data;
worker_processes 4;
pid /run/nginx.pid;

events {
    worker_connections 768;
    # multi_accept on;
}

http {

    ##
    # Basic Settings
    ##

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    # server_tokens off;

    # server_names_hash_bucket_size 64;
    # server_name_in_redirect off;

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    ##
    # SSL Settings
    ##

    ssl_protocols TLSv1 TLSv1.1 TLSv1.2; # Dropping SSLv3, ref: POODLE
    ssl_prefer_server_ciphers on;

    ##
    # Logging Settings
    ##

    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;

    ##
    # Gzip Settings
    ##

    gzip on;

    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_buffers 16 8k;
    gzip_http_version 1.1;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

    ##
    # Virtual Host Configs
    ##

    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}

Not many changes here. Then, I created a file called 0-webdav.conf in the conf.d directory, and this is what I put in it:

server {
    listen 80;
    listen [::]:80;

    server_name plans.helenshydrogen.be;

    auth_basic              realm_name;
    auth_basic_user_file    /etc/nginx/.passwords.list;

    dav_methods     PUT DELETE MKCOL COPY MOVE;
    dav_ext_methods PROPFIND OPTIONS;
    dav_access      user:rw group:rw all:r

    client_body_temp_path   /tmp/nginx/client-bodies;
    client_max_body_size    0;
    create_full_put_path    on;

    root /mnt/hydroplans;
}

Now this is where the magic happens. The dav_access directive tells nginx to allow everyone to read, but only logged in users to write to the share. This isn't actually particularly relevant, because of the auth_basic and auth_basic_user_file directives, which tell nginx to require people to login to the share before they are allowed to access it.

It's also important to note that I found that Windows (10, at least), didn't like the basic authentication - even though Ubuntu's Nautilus accepted it just fine - so I had to comment that bit out :-(

If you do still want authentication (hey! May you'll have better luck than I :P), then you'll need to set up the passwords file. Here's how you create it:


echo -n 'helen:' | sudo tee /etc/nginx/.passwords.list
openssl passwd -apr1 | sudo tee -a /etc/nginx/.passwords.list 
Password:

The above creates a user called helen, and asks you to type a password. If you're adding another user to the file, simply change the first tee to be tee -a to avoid overwriting the first one.

With that all configured, it's time to test the configuration file, and, if we're lucky, restart nginx!


sudo nginx -t
sudo systemctl restart nginx

That's all you should need to do to set up a simple WebDav share. Remember that this is a starting point, and not an ending point - there are a few big holes in the above that you'll need to address, depending on your use case (for example, I haven't included the setup of https / encryption - try letsencrypt for that).

Here are the connection details for the above for a few different clients:

  • Ubuntu / Nautilus: (Go to "Other Locations" and paste this into the "Connect to Server" box) dav://plans.helenshydrogen.be/
  • Windows: (Go to "Map Network Drive" and paste this in) http://plans.helenshydrogen.be/

Did this work for you? Have any problems? Got instructions for a WebDav client not listed here? Let me know in the comments!

The HTTPS version of my website is insecure? Nonsense!

A chrome privacy warning.

I'm still rather ill, but I wanted to post about an issue I've just had with my website. Upon visiting my website in the latest version of chrome beta (57 as of the time of typing), I discovered that chrome had decided that the connection was 'insecure'. It didn't tell me precisely what the problem was (even in the developer tools :-) - why would I possibly need to know that? - only that it considered it insecure.

After googling around a bit, I didn't find any specific articles on the subject - their recent move to start considering regular http connections insecure is swamping all the relevant articles in the search results I suspect.

The big clue came when I discovered that one of my subdomains that uses a letsencrypt works as expected. You see, the main website actually used a StartSSL certificate. My running theory is that even though my certificate was an SHA2 cerrtificate, chrome decided that it was not trustworthy as there was an SHA1 certificate in the trust chain somewhere.

The fix: Replace all my existing StartSSL certificates with Let's Encrypt ones. It seems to have fixed the issue for now. I also discovered that Let's Encrypt certificates can also be used in mail servers (i.e. SMTP and IMAP) too - so I don't have to go and fiddle about with finding an alternative certificate provider.

In future, it would certainly be helpful if Google actually told people precisely what they were going to do before they do it....!

Was this useful? Could it be improved? Would you like a Let's encrypt tutorial? Let me know in the comments below!

Fancy message of the day over SSH

Since my time to sit down for a good chunk of time and write some code has been extremely limited as of late, I've been playing around with a few smaller projects. One of those is a fancy message of the day when you log into a remote machine (in my case the server this website is hosted on!), and I thought I'd share it here.

My take on a fancy SSH message of the day.

The default message shown at the top when you login via ssh is actually generated by something called update-motd, and is generated from a set of scripts in /etc/update-motd.d. By customising these scripts, we can do almost anything we like!

To start off with, I disabled the execution of all the scripts in the directory (sudo chmod -x /etc/update-motd.d/*), and created a subfolder to store the script in that actually generated the system information (sudo mkdir /etc/update-motd.d/parts). Here's the script I wrote to generate the system information:

#!/usr/bin/env bash

. /etc/lsb-release

LOAD=$(cat /proc/loadavg | cut -d' ' -f 2);

CPU_COUNT=$(cat /proc/cpuinfo | grep -i "core id" | uniq | wc -l);
THREAD_COUNT=$(cat /proc/cpuinfo | grep -i "core id" | wc -l);

APT_UPDATE_DETAILS="$(/usr/lib/update-notifier/apt-check --human-readable | fold -w 40 -s)"

IPV4_ADDRESS=$(dig +short myip.opendns.com A @resolver1.opendns.com)
IPV6_ADDRESS=$(dig +short myip.opendns.com AAAA @2620:0:ccc::2);

LAST_LOGIN=$(last -1 | head -n 1 | awk '{ print $1,"at",$4,$5,$6,$7,"from",$3 }');

REBOOT_REQUIRED=$(/usr/lib/update-notifier/update-motd-reboot-required);

echo 
echo Welcome to $(hostname)
echo "  running ${DISTRIB_DESCRIPTION}"
echo 
echo Kernel: $(uname -r)
echo Uptime: $(uptime --pretty | sed -e 's/up //')
echo Load: ${LOAD}
echo 
echo IPs: ${IPV4_ADDRESS}, ${IPV6_ADDRESS}
echo 
echo "${APT_UPDATE_DETAILS}"
echo 
echo "${REBOOT_REQUIRED}"
#echo 
#echo Last login: ${LAST_LOGIN}

exit 0

Basically, I collect a bunch of information from random places on my system (several of which were taken from the existing scripts in /etc/update-motd.d/) and re-output them in a different format.

Then, I converted an image of my favicon logo with the brilliant catimg by posva to a set of unicode characters and sent that to a file (catimg -w 35 image.png >/etc/update-motd.d/sbrl-logo.txt) - you could alternatively use some ascii art from the internet (e.g. this site). Once done, I put the two together with the following script directly in my /etc/update-motd.d/ folder:

#!/usr/bin/env bash

### Settings ###
TMP_FILENAME=/run/sysinfo.txt

#/etc/update-motd.d/parts/sysinfo

################

/etc/update-motd.d/parts/sysinfo >$TMP_FILENAME

### Output ###
echo 
pr -mtJ /etc/update-motd.d/sbrl-logo.txt $TMP_FILENAME

##############

### Cleanup ###
rm $TMP_FILENAME

###############

Finally, I manually cleared and regenerated the message of of the day with sudo update-motd, giving the result you see at the top of this blog post. I also made sure to re-enable the execution of the other scripts I didn't use in my fancy motd so as to not miss out on their notifications.

If you're interested, I've generated an archive of my final /etc/update-motd.d folder (minus my logo in text format), which you can find here: 20170203-Fancy-Motd.7z.

Can you do better? Got a cool enhancement of your own? Post about it below!

I now have a public website status page!

My new status page! Just recently Uptime Robot (the awesome service that I use to monitor my server's uptime) have released a new feature: Public status pages! Status pages appear to be free (for now), so I've gone and set one up. Now all of you can see what's up with my website if it's down.

They even allow you to point a (sub)domain at it too. I did this too, so you can visit my status page at status.starbeamrainbowlabs.com.

Server migration!

The Kimsufi logo

I've been watching Kimsufi's server page for what feels like absolutely ages now, waiting to get my hands on an ultra-cheap €4.99 per month (excluding VAT of course) KS-1 dedicated server. Unfortunately I've never been quite fast enough, so yesterday I decided that enough was enough and went ahead and bought a KS-2B at €9.99 per month (again excluding VAT). After all is said and done it works out to about £8.39 per month, which, for 2 cores / 4 threads, 4GB RAM, and a 40GB SSD, is an absolute bargain in my eyes.

I've been busy moving things across and it's going well, but I haven't finished yet - I still have the web server and the mail server to set up. I'm also looking at using the Hiawatha webserver instead of Nginx. Hiawatha is a security-first and easy to configure web server. Apparently it's also lightweight, but we'll see about that...! Nginx's config files have been annoying me for a while now, so I think that it's high time I tried something different.

Set up your own Git server with Go Git Service

Go Git Service

Recently I've been finding myself with several private codebases (University ACWs and such) that I've wanted to work on in several places at different times, and that I've also wanted to backup in case of emergency. Git, along with the cloud, were naturally my first choice. At the time, GitHub only offered 5 free private repositories to students, so I started looking around at few different self hosted solutions.

I found software like GitLab and GitList, but the one I found that best suited my needs was Go Git Service. GitLab in particular looked really cool, but it has rather steep minimum requirements that I can't meet.

An example diff view from Go Git Service.

Go Git Service has low minimum requirements, supports multiple users, and allows unlimited private repositories. It even has a forking system that's based on GitHub. If that wasn't enough, the icing on the cake is that it's so ridiculously easy to set up. In fact it's so simple I managed to set a fully working git server up (with all the extras) in just half an hour.

If anyone would like a full tutorial on how to set up Go Git Service, I'll gladly write one up and post it here. Let me know in the comments!

Public Service Announcement: Web Server Switch

Hello again - Today's post is a public service announcement instead of the usual ES6 post. Hopefully that will be coming out on Thursday.

This website is now powered by a new piece of web server software: Nginx (pronounced engine-x). Ever since I started this website, I have been using lighttpd. While lighttpd has been my favourite web server software for ages (mainly because of the flexible configuration file syntax, the light footprint, and the speed), it seems that development of lighttpd's core codebase has been moving too slowly for me. Lighttpd, while fast and light, has been missing several features that I would rather like to have - and no release date for the next has been announced yet either.

Nginx, on the other hand, is much more feature-complete. It will support HTTP/2 by the end of 2015, and has a slew of other features to play around with. While, it's configuration files are kind of a pain (it only matches against one location block per request), I feel that Nginx is a better solution for this website in the long run. If development resumes on lighttpd, perhaps I will move back to it - but only if I am sure that development will actually continue.

So the switch has been made! Please notify me if you notice any issues and/or problems with the new setup and I will fix them as soon as I can.

Upgrading to Ubuntu 15.04 Vivid Vervet

Hello!

Yesterday you probably noticed some downtime. This is because I was upgradting this server's operating system from Ubuntu 14.10 to Ubuntu 15.04! Since I noticed a few things that you should watch out for when upgrading, I thought that I would make a post about it.

For the most part the upgrade went smoothly, but I did hit a few snags. Firstly, if you have got any custom ppas set up for apt-get, you will want to make a list of them (they are located in /etc/apt/sources.list.d) because the upgrade will annoyingly disable them all :( It's not too much trouble to fix but it is annoying.

Secondly, there are two new mime types that have been added /etc/mime.types. If you have made any customisations to this file (I have added text/x-markdown), then you will want to make a note of them and re-add them afterwards. Don't forget to restart your http servers after changing it!

There are some changes that require the ssh daemon to be stopped, so make sure you don't do the upgrade over ssh.

You will get asked which interfaces DCHPv6 should listen on / send requests to. If you use your linux box as a router and have it handing out IP addresses, then you will need to take note of which interfaces you have on your box and which one is which.

By far the biggest problem for me though was the switch from upstart to systemd. This server is hosted by OVH under one of their VPS hosting plans (which are great by the way!), which means that it is virtualised using OpenVZ. It also means that I can't choose my kernel :( I suspect that this is the reason that I can't use systemd, though if anyone has any other ideas, I would love to hears them - leave them in the comments below. When it has finished the upgrade, it couldn't reboot, instead telling me that it couldn't find an alternative telinit implementation to spawn. The solution to this is simple though (don't forget to run as root):

apt-get install upstart
apt-get remove systemd
apt-get install upstart-sysv

The last package in the above (upstart-sysv) should be install automatically, but you should make sure that it is installed - it is the package that prevents it from automatically trying to switch you back to systemd at the next available opportunity.

I hope this post is useful! If you do find it helpful, please leave a comment. If people seem to like it I might start posting full upgrade guides.

Fighting Spam on your blog

Since I have written my own blog script from scratch, I have learnt a lot about how spambots spam my site in order to implement measures to stop them. This post is a compilation of all the methods that I have discovered so far.

Currently I have yet to rate the effectiveness of each of these measures since at the time of writing this post I have only just finished rewiring the commenting script so that I can 'measure' the effectiveness of each of the methods described below.

Method 1: Honeypots

If you don't take either an email address or a web address on your blog, try adding a email or website field and hiding it via CSS. The more complex, indirect, and obscure the CSS you hide it with, the better. Just make sure that is actually hidden.

This blog uses a hidden website field along with a warning for users who see it due to poor browser support.

Method 2: No super long comments

This isn't really a proper method, but I found that spam comment on my blog were generally really long. So I am imposing a 2000 character limit on comments. If people have more to say, then they can reply to their own comment, and use service like pastebin or hastebin for code.

Method 3: Keys

This is the really important one. I was finding that while the above 2 methods were stopping some of the spam, I was getting some smart spambots with chrome/firefox-like user agent strings that I can only summarise knew how to tell whether a from control was hidden or not by reading the CSS or my website.

The hidden key field is basically a timestamp of when page was served to the user by the server. In it's simplest form, it can just be the output of PHP's time() function.

In this blog, however, the timestamp is run through a number of different functions, such as base64_encode() and strrev(). Pick a few string manipulation functions that are reversible.

This timestamp can then be analysed by the server. If the timestamp is too far in the past (say 24 hours old), or under 10 seconds old, then the comment is rejected. Spambots will either fetch and cache your page for longer than 24 hours, or they will fetch your page and post a comment immediately. As soon as I set this blog to reject comments posted within 10 seconds of loading the page, I haven't had a single spam comment :)

Summary

So there you go: 2 1/2 methods to banish spam on your blog - for now. The real secret here to log as much information about your commenters as possible (in my case I have been capturing the contents of $_POST, $_GET, and $_SERVER) and working your way through it comparing the requests of legitimate commenters and spammers. The above are simply exploits of the differences I found (with some help from Google). If you can think of any more tricks, please post a comment below!

Art by Mythdael