/r/dailyprogrammer hard challenge #322: Static HTTP 1.0 server

Recently I happened to stumble across the dailyprogrammer subreddit's latest challenge. It was for a static HTTP 1.0 server, and while I built something similar for my networking ACW, I thought I'd give this one a go to create an extendable http server that I can use in other projects. If you want to follow along, you can find the challenge here!

My language of choice, as you might have guessed, was C♯ (I know that C♯ has a HttpServer class inbuilt already, but to listen on 0.0.0.0 on Windows it requires administrative privileges).

It ended up going rather well, actually. In a little less than 24 hours after reading the post, I had myself a working solution, and I thought I'd share here how I built it. Let's start with a class diagram:

(Above: A class diagram for the GlidingSquirrel. Is this diagram better than the last one I drew?)

I'm only showing properties on here, as I'll be showing you the methods attached to each class later. It's a pretty simple design, actually - HttpServer deals with all the core HTTP and networking logic, FileHttpServer handles the file system calls (and can be swapped out for your own class), and HttpRequest, HttpResponse, HttpMethod, HttpResponseCode all store the data parsed out from the raw request coming in, and the data we're about to send back out again.

With a general idea as to how it's put together, lets dive into how it actually works. HttpServer would probably be a good place to start:

public abstract class HttpServer
{
    public static readonly string Version = "0.1-alpha";

    public readonly IPAddress BindAddress;
    public readonly int Port;

    public string BindEndpoint { /* ... */ }

    protected TcpListener server;

    private Mime mimeLookup = new Mime();
    public Dictionary<string, string> MimeTypeOverrides = new Dictionary<string, string>() {
        [".html"] = "text/html"
    };

    public HttpServer(IPAddress inBindAddress, int inPort)
    { /* ... */ }
    public HttpServer(int inPort) : this(IPAddress.IPv6Any, inPort)
    {
    }

    public async Task Start() { /* ... */ }

    public string LookupMimeType(string filePath) { /* ... */ }

    protected async void HandleClientThreadRoot(object transferredClient) { /* ... */ }

    public async Task HandleClient(TcpClient client) { /* ... */ }

    protected abstract Task setup();

    public abstract Task HandleRequest(HttpRequest request, HttpResponse response);
}

(Full version)

It's heavily abbreviated because there's actually quite a bit of code to get through here, but you get the general idea. The start method is the main loop that accepts the TcpClients, and calls HandleClientThreadRoot for each client it accepts. I decided to use the inbuilt ThreadPool class to do the threading for me here:

TcpClient nextClient = await server.AcceptTcpClientAsync();
ThreadPool.QueueUserWorkItem(new WaitCallback(HandleClientThreadRoot), nextClient);

C♯ handles all the thread spawning and killing for me internally this way, which is rather nice. Next, HandleClientThreadRoot sets up a net to catch any errors that are thrown by the next stage (as we're now in a new thread, which can make debugging a nightmare otherwise), and then calls the main HandleClient:

try
{
    await HandleClient(client);
}
catch(Exception error)
{
    Console.WriteLine(error);
}
finally
{
    client.Close();
}

No matter what happens, the client's connection will always get closed. HandleClient is where the magic start to happen. It attaches a StreamReader and a StreamWriter to the client:

StreamReader source = new StreamReader(client.GetStream());
StreamWriter destination = new StreamWriter(client.GetStream()) { AutoFlush = true };

...and calls a static method on HttpRequest to read in and decode the request:

HttpRequest request = await HttpRequest.FromStream(source);
request.ClientAddress = client.Client.RemoteEndPoint as IPEndPoint;

More on that later. With the request decoded, HandleClient hands off the request to the abstract method HandleRequest - but not before setting up a secondary safety net first:

try
{
    await HandleRequest(request, response);
}
catch(Exception error)
{
    response.ResponseCode = new HttpResponseCode(503, "Server Error Occurred");
    await response.SetBody(
        $"An error ocurred whilst serving your request to '{request.Url}'. Details:\n\n" +
        $"{error.ToString()}"
    );
}

This secondary safety net means that we can send a meaningful error message back to the requesting client in the case that the abstract request handler throws an exception for some reason. In the future, I'll probably make this customisable - after all, you don't always want to let the client know exactly what crashed inside the server's internals!

The FileHttpServer class that handles the file system logic is quite simple, actually. The magic is in it's implementation of the abstract HandleRequest method that the HttpServer itself exposes:

public override async Task HandleRequest(HttpRequest request, HttpResponse response)
{
    if(request.Url.Contains(".."))
    {
        response.ResponseCode = HttpResponseCode.BadRequest;
        await response.SetBody("Error the requested path contains dangerous characters.");
        return;
    }

    string filePath = getFilePathFromRequestUrl(request.Url);
    if(!File.Exists(filePath))
    {
        response.ResponseCode = HttpResponseCode.NotFound;
        await response.SetBody($"Error: The file path '{request.Url}' could not be found.\n");
        return;
    }

    FileInfo requestFileStat = null;
    try {
        requestFileStat = new FileInfo(filePath);
    }
    catch(UnauthorizedAccessException error) {
        response.ResponseCode = HttpResponseCode.Forbidden;
        await response.SetBody(
            "Unfortunately, the server was unable to access the file requested.\n" + 
            "Details:\n\n" + 
            error.ToString() + 
            "\n"
        );
        return;
    }

    response.Headers.Add("content-type", LookupMimeType(filePath));
    response.Headers.Add("content-length", requestFileStat.Length.ToString());

    if(request.Method == HttpMethod.GET)
    {
        response.Body = new StreamReader(filePath);
    }
}

With all the helper methods and properties on HttpResponse, it's much shorter than it would otherwise be! Let's go through it step by step.

if(request.Url.Contains(".."))

This first step is a quick check for anything obvious that could be used against the server to break out of the web root. There are probably other dangerous things you can do(or try to do, anyway!) to a web server to attempt to trick it into returning arbitrary files, but I can't think of any of the top of my head that aren't covered further down. If you can, let me know in the comments!

string filePath = getFilePathFromRequestUrl(request.Url);

Next, we translate the raw path received in the request into a path to a file on disk. Let's take a look inside that method:

protected string getFilePathFromRequestUrl(string requestUrl)
{
    return $"{WebRoot}{requestUrl}";
}

It's rather simplistic, I know. I can't help but feel that there's something I missed here.... Let me know if you can think of anything. (If you're interested about the dollar syntax there - it's called an interpolated string, and is new in C♯ 6! Fancy name, I know. Check it out!)

if(!File.Exists(filePath))
{
    response.ResponseCode = HttpResponseCode.NotFound;
    await response.SetBody($"Error: The file path '{request.Url}' could not be found.\n");
    return;
}

Another obvious check. Can't have the server crashing every time it runs into a 404! A somewhat interesting note here: File.Exists only checks to see if there's a file that exists under the specified path. To check for the existence of a directory, you have to use Directory.Exists - which would make directory listing rather easy to implement. I might actually try that later - with an option to turn it off, of course.

FileInfo requestFileStat = null;
try {
    requestFileStat = new FileInfo(filePath);
}
catch(UnauthorizedAccessException error) {
    response.ResponseCode = HttpResponseCode.Forbidden;
    await response.SetBody(
        "Unfortunately, the server was unable to access the file requested.\n" + 
        "Details:\n\n" + 
        error.ToString() + 
        "\n"
    );
    return;
}

Ok, on to something that might be a bit more unfamiliar. The FileInfo class can be used to get, unsurprisingly, information about a file. You can get all sorts of statistics about a file or directory with it, such as the last modified time, whether it's read-only from the perspective of the current user, etc. We're only interested in the size of the file though for the next few lines:

response.Headers.Add("content-type", LookupMimeType(filePath));
response.Headers.Add("content-length", requestFileStat.Length.ToString());

These headers are important, as you might expect. Browsers to tend to like to know the type of content they are receiving - and especially it's size.

if(request.Method == HttpMethod.GET)
{
    response.Body = new StreamReader(filePath);
}

Lastly, we send the file's contents back to the user in the response - but only if it's a GET request. This rather neatly takes care of HEAD requests - but might cause issues elsewhere. I'll probably end up changing it if it does become an issue.

Anyway, now that we've covered everything right up to sending the response back to the client, let's end our tour with a look at the request parsing system. It's a bit backwards, but it does seem to work in an odd sort of way! It all starts in HttpRequest.FromStream.

public static async Task<HttpRequest> FromStream(StreamReader source)
{
    HttpRequest request = new HttpRequest();

    // Parse the first line
    string firstLine = await source.ReadLineAsync();
    var firstLineData = ParseFirstLine(firstLine);

    request.HttpVersion = firstLineData.httpVersion;
    request.Method = firstLineData.requestMethod;
    request.Url = firstLineData.requestPath;

    // Extract the headers
    List<string> rawHeaders = new List<string>();
    string nextLine;
    while((nextLine = source.ReadLine()).Length > 0)
        rawHeaders.Add(nextLine);

    request.Headers = ParseHeaders(rawHeaders);

    // Store the source stream as the request body now that we've extracts the headers
    request.Body = source;

    return request;
}

It looks deceptively simple at first glance. To start with, I read in the first line, extract everything useful from it, and attach them to a new request object. Then, I read in all the headers I can find, parse those too, and attach them to the request object we're building.

Finally, I attach the StreamReader to the request itself, as it's now pointing at the body of the request from the user. I haven't actually tested this, as I don't actually use it anywhere just yet, but it's a nice reminder just in case I do end up needing it :-)

Now, let's take a look at the cream on the cake - the method that parses the first line of the incoming request. I'm quite pleased with this actually, as it's my first time using a brand new feature of C♯:

public static (float httpVersion, HttpMethod requestMethod, string requestPath) ParseFirstLine(string firstLine)
{
    List<string> lineParts = new List<string>(firstLine.Split(' '));

    float httpVersion = float.Parse(lineParts.Last().Split('/')[1]);
    HttpMethod httpMethod = MethodFromString(lineParts.First());

    lineParts.RemoveAt(0); lineParts.RemoveAt(lineParts.Count - 1);
    string requestUrl = lineParts.Aggregate((string one, string two) => $"{one} {two}");

    return (
        httpVersion,
        httpMethod,
        requestUrl
    );
}

Monodevelop, my C♯ IDE, appears to go absolutely nuts over this with red squiggly lines everywhere, but it still compiles just fine :D

As I was writing this, a thought popped into my head that a tuple would be perfect here. After reading somewhere a month or two ago about a new tuple syntax that's coming to C♯ I thought I'd get awesomely distracted and take a look before continuing, and what I found was really cool. In C♯ 7 (the latest and quite possibly greatest version of C♯ to come yet!), there's a new feature called value tuples, which let's you dynamically declare tuples like I have above. They're already fully supported by the C♯ compiler, so you can use them today! Just try to ignore your editor if it gets as confused as mine did... :P

If you're interested in learning more about them, I'll leave a few links at the bottom of this post. Anyway, back to the GlidingSquirrel! Other than the new value tuples in the above, there's not much going on, actually. A few linq calls take care of the heavy lifting quite nicely.

And finally, here's my header parsing method.

public static Dictionary<string, string> ParseHeaders(List<string> rawHeaders)
{
    Dictionary<string, string> result = new Dictionary<string, string>();

    foreach(string header in rawHeaders)
    {
        string[] parts = header.Split(':');
        KeyValuePair<string, string> nextHeader = new KeyValuePair<string, string>(
            parts[0].Trim().ToLower(),
            parts[1].Trim()
        );
        if(result.ContainsKey(nextHeader.Key))
            result[nextHeader.Key] = $"{result[nextHeader.Key]},{nextHeader.Value}";
        else
            result[nextHeader.Key] = nextHeader.Value;
    }

    return result;
}

While I have attempted to build in support for multiple definitions of the same header according to the spec, I haven't actually encountered a time when it's actually been needed. Again, this is one of those things I've built in now for later - as I do intend on updating this and adding more features later - and perhaps even work it into another secret project I might post about soon.

Lastly, I'll leave you with a link to the repository I'm storing the code for the GlidingSquirrel, and a few links for your enjoyment:

GlidingSquirrel

Update 2018-05-01: Fixed a few links.

Type this	To get this	Notes
`bold text`	bold text	-
`_italics text_`	italics text	-
`~~deleted text~~`	~~deleted~~	-
`code text`	`code text`	Inserts some monospaced code. It is preferred that large blocks of code are linked to using a service such as Pastebin, Github Gists or Ideone.
`> Quote`	Quote	-
`[display text](//google.com)`	display text	Inserts a hyperlink. Please use responsibly. `[rel=nofollow]` is in use and spam will be deleted.
`---`		Inserts a horizontal line. The previous line must be blank.

Stardust
Blog