Rendering LaTeX documents to PDF: Attempt #2

It was all going rather well, actually - until I discovered that pandoc doesn't support regular bibliographies / references. Upon discovering this, I ended up with a bit of a problem. Thankfully, the answer lay in pdflatex - but getting to the point where I could use it without having it crash on me (which, by the way, it can't even do properly - it gives an exit code of 0 when it crashes! O.o) was not a trivial journey.

This blog post is a follow up to my first post on rendering LaTeX documents with pandoc, and is my attempt to document what I did to get it to work. To start with, I installed texlive properly. Here's how to do that on apt-based systems:

sudo apt install texlive-latex-extra --no-install-recommends

The --no-install-recommends flag is useful here to avoid ~450MiB of useless documentation (in PDF form, apparently) being dumped to your hard drive. I've also done this on an arch-based system (it's actually Artix Linux, which I've blogged about), so here's the install command for those kinds of systems:

sudo pacman -S texlive-latexextra

Once that's installed, we can use it to render our LaTeX document to PDF. Upon discussing my issues with my Lecturer at University, I discovered that you actually have to run 3 commands in succession in order to render a single PDF. Here they are:

bibtex filename
pdflatex --output-directory=. filename.tex
pdflatex --output-directory=. filename.tex

The first one compiles the bibliography using BiBTeX. If it isn't installed already, you might need to search your distribution's repositories and install it. Next, we run the LaTeX file through pdflatex from TeXLive not once but twice - as it apparently needs to resolve the references on the first pass (why it can't do them all in one pass I have no idea :P).

It's also worth noting that the bibtex command doesn't like it when you append the filename extension - it adds it automatically, apparently.
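
If you get tired of typing those three commands out every time, they wrap up neatly into a little script. Here's a quick sketch (it simply follows the order above and assumes a single .tex file; pass the filename without the extension):

#!/usr/bin/env bash
# render.sh - wraps the 3-step render process above
# Usage: ./render.sh report     (no .tex extension needed)
name="${1%.tex}"; # strip the extension anyway, just in case

bibtex "$name";                             # compile the bibliography
pdflatex --output-directory=. "$name.tex";  # first pass: notes down the references
pdflatex --output-directory=. "$name.tex";  # second pass: resolves them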

That's about everything I've got on the process so far. If you've got anything else to add, please let me know in the comments below (I'm rather new to this whole LaTeX thing....)

Further Reading

Rendering LaTeX documents to PDF on Linux (and maybe Windows too)

I'm starting to write another report for University, and unlike other reports, this one apparently has to be in a rather specific format. To that end, I've got two choices: use the provided Word / LibreOffice template, or use a LaTeX template instead. After the trouble and frustration I had with LibreOffice for my previous report, I've naturally decided that using the LaTeX template might be a good idea.

After downloading it, I ended up doing some research and troubleshooting to get it to render properly to a PDF. Now that I've figured it out, I thought I'd share it here for anyone else who ends up experiencing difficulties or is unsure on how it's done.

The way I'm going to be using it is with a tool called Pandoc. First, install it like so:

sudo apt install pandoc texlive-fonts-recommended

Adjust as necessary for your distribution - Windows users will need to read the download instructions. The texlive-fonts-recommended package is ~66MiB(!), but it contains a bunch of fonts that are needed when you're rendering LaTeX documents, apparently.

With the dependencies installed, here's the command to convert a LaTeX document to a PDF:

pandoc -s input.tex -o output.pdf

Replace input.tex with the path to your input file, and output.pdf with the desired path to the output file. I haven't figured out how to set the font to sans-serif yet, but I'll probably make another post about it when I do.

Found this helpful? Still having issues? Let me know below! I don't have analytics on here, so that's the only way I'll know if anyone reads this :-)

Sources and Further Reading

A comparison of compression formats for storing JSON

Happy new year, everyone!

I've blogged about different aspects of a (not so?) little project of mine several times now (exhibits A, B, C, D, and finally E - even if I didn't know it at the time), but it appears that I end up running into all sorts of interesting problems that I invent cool solutions for. I also find myself doing a bunch of research that I'm surprised nobody's compiled into a single place yet - as is the case in this blog post. (Also, Happy 2018 everyone! First post of the year :D)

I've been refactoring the subsystem that saves a considerable amount of JSON data to a bunch of different files on disk. Obviously, I'm interested in minimising the amount of space this JSON data takes up on disk. As this saving process happens in the background on a separate thread, I'm not too concerned about performance - other than that it can't be too slow. With this in mind, I've found myself testing a bunch of different compression algorithms. Let's introduce our test data:

  1. A 17MiB card game dataset, as a single minified JSON file
  2. A 40KiB 'live' specimen chunk's data, saved as a single minified JSON file.

I can't remember which card game the first dataset is from, but I do know that I found it in the awesome JSON datasets list. Next up, here's our cast of compression algorithms we'll be testing:

  1. The venerable GZip
  2. BZip2 - Apparently GZIP, but smaller and slightly more computationally expensive
  3. XZ (the newer child of LZMA2)
  4. 7zip
  5. Google's Brotli
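
If you fancy reproducing a comparison like this yourself, the process boils down to something like the following. Note that this is just a sketch: the exact CLI names and flags vary between distributions and versions (brotli in particular shipped under the name bro in some older packages), and cards.json is simply a stand-in filename:

# -k keeps the original file around, so every tool compresses the same input
gzip -k cards.json                  # -> cards.json.gz
bzip2 -k cards.json                 # -> cards.json.bz2
xz -k cards.json                    # -> cards.json.xz
7z a cards.json.7z cards.json       # 7zip builds an archive rather than a stream
brotli -o cards.json.br cards.json  # newer brotli CLI syntax

ls -lh cards.json*                  # compare the resulting sizes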

A colourful cast, to be sure! Let's run them through their paces - starting with the card game dataset. Here are the results I observe with each set to their default settings:

Format        Size
Uncompressed  17M
gzip          2.4M
bzip2         1.6M
xz            1.3M
7zip          1.4M
brotli        1.3M

Very interesting. It looks like xz and brotli are tied in first place - though I observed that brotli took ages in comparison to all the other algorithms I tested - and upon closer inspection xz beat it by 17.3KiB. Numbers are all very well, but to really see what's going on here, let's plot it on a graph:

A graph of the data in the table above.

That's better! I can actually make some comparisons now. From this graph we can observe that gzip is the worst of the lot, followed by bzip2. 7zip is surprisingly in third place, but then again it is designed for multiple files, whereas the rest of them are designed for a single stream of data. In second place is the terribly slow brotli, and finally in first place is xz.

Hrm - very interesting. How do our algorithms measure up when confronted with a smaller load though? Here are the results for the sample chunk data:

Format        Size
Uncompressed  40K
gzip          5.4K
bzip2         4.3K
xz            4.6K
7zip          4.9K
brotli        4.6K

Interesting results, to be sure, but I can't discern much from that. Let's plot a graph:

A graph of the data in the above table. Further explanation below.

Very interesting. It appears that bzip2 performs much better on smaller loads than any other algorithm here. gzip is still the worst performer, while xz and brotli, surprisingly, did noticeably worse than bzip2.

To that end, I think I'm going to be choosing bzip2 as my compression format of choice for this job, as it produces the best results for the type of work I'm going to be doing.

I'm really surprised about brotli though, actually. I had high hopes for it, considering it's a new algorithm invented by Google. They claimed that it would provide xz-like compression with gzip-like speeds - but from what I'm seeing, it does anything but.

Sources and Further Reading

Happy (belated?) Christmas 2017!

A pretty decoration from my Christmas tree! :D

Hello again! This is just a small post to wish you a relaxing and peaceful Christmas and new year. Have a picture of a nice Christmas tree decoration from my tree :-)

GlidingSquirrel is now on NuGet with automatic API documentation!

(This post is a bit late as I'm rather under the weather.)

A while ago, I posted about a HTTP/1.0 server library I'd written in C♯ for a challenge. One thing led to another, and I've now ended up adding support for HTTP/1.1, WebSockets, and more....

While these enhancements have been driven mainly by a certain project of mine that keeps stealing all of my attention, they've been fun to add - and now that I've (finally!) worked out how to package it as a NuGet package, it's available on NuGet for everyone to use, under the Mozilla Public License 2.0!

Caution should be advised though, as I haven't thoroughly tested it yet to weed out the slew of bugs and vulnerabilities I'm sure are hiding in there somewhere - it's designed mainly to sit behind a reverse-proxy such as Nginx (not that that's any excuse, I know!) - and I don't have a good set of test cases to check it against either (I thought for sure there would be at least one test suite out there for HTTP servers, considering the age of the protocol, but apparently not!).

With that out of the way, specifying dependencies didn't turn out to be that hard, actually. Here's the extra bit I added to the .nuspec file to specify the GlidingSquirrel's dependencies:

<dependencies>
    <dependency id="MimeSharp" version="1.0.0" />
    <dependency id="System.ValueTuple" version="4.4.0" />
</dependencies>

The above goes inside the <metadata> element. If you're curious about what this strange .nuspec file is and its format, I blogged about it a while back when I figured out how to package my TeleConsole Client - in which I explain my findings and how you can do it yourself too!
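
For context, here's roughly the overall shape of a .nuspec file so that you can see where that snippet slots in - the values below are placeholders rather than the real ones:

<?xml version="1.0" encoding="utf-8"?>
<package>
    <metadata>
        <id>GlidingSquirrel</id>
        <version>0.1.0</version>
        <authors>Starbeamrainbowlabs</authors>
        <description>An example description - NuGet refuses to pack without one.</description>

        <dependencies>
            <dependency id="MimeSharp" version="1.0.0" />
            <dependency id="System.ValueTuple" version="4.4.0" />
        </dependencies>
    </metadata>
</package>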

I've still not figured out the strange $version$ and $description$ things that are supposed to reference the appropriate values in the .csproj file (if you know how those work, please leave a comment below, as I'd love to know!) - as it keeps telling me that <description> cannot be empty.....

In addition to packaging it and putting it up on NuGet, I've also taken a look at automatic documentation generation. For a while I've been painstakingly documenting the entire project with intellisense comments especially for this purpose - and I'm glad I started early, as it's been a real pain and taken a rather decent chunk of time to do.

Anyway, with the last of the intellisense comments finished today, I set about generating automatic documentation - which turned out to be a surprisingly difficult thing to achieve.

  • Sandcastle, Microsoft's offering, has been helpfully discontinued.
  • Sandcastle Help File Builder, the community open-source fork of Sandcastle, appears to be Windows-only (I use Linux myself).
  • DocFX, Microsoft's new offering, appears to be more of a static site generator that uses your code and a bunch of other things as input, rather than the simple HTML API documentation generator I was after.
  • While I found Doxygen produced an acceptable result out-of-the-box, I discovered that configuring it was not a trivial task - the initial configuration file the init action created was over 100 KiB!

I found a few other options too, but they were either Windows-only, or commercial offerings that I can't justify using for an open-source project.

When I was about to give up the search for the day, I stumbled across this page on generating documentation by the mono project, who just happen to be the ones behind mono, the cross-platform .NET runtime that runs on Linux, macOS, and, of course, Windows. They're also the ones who have built mcs, the C♯ compiler that complements the mono runtime.

Apparently, they've also built a documentation generation tool that can export to HTML. While it's certainly nothing fancy, it does look like you've got all the power you need at your fingertips to customise the look and feel by tweaking the XSLT stylesheet it renders with, should you need to.

After a bit of research, I found this pair of commands to build the documentation for me quite nicely:

mdoc update -o docs_xml/ -i GlidingSquirrel.xml GlidingSquirrel.dll
mdoc export-html --out docs/ docs_xml

The first command builds the multi-page XML tree from the XML documentation generated during the build process (make sure the "Generate xml documentation" option is ticked in the project build options!), and the second one transforms that into HTML that you can view in your browser.
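
If you'd rather edit the .csproj directly than hunt through the GUI, that option corresponds (as far as I can tell) to a property along these lines in the relevant PropertyGroup - the path here is just an example:

<PropertyGroup>
    <!-- Tells the compiler to emit the intellisense comments as an XML file alongside the build output -->
    <DocumentationFile>bin\Debug\GlidingSquirrel.xml</DocumentationFile>
</PropertyGroup>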

With the commands figured out, I set about including it in my build process. Here's what I came up with:

<Target Name="AfterBuild">
    <Message Importance="high" Text="----------[ Building documentation ]----------" />

    <Exec Command="mdoc update -o docs_xml/ -i GlidingSquirrel.xml GlidingSquirrel.dll" WorkingDirectory="$(TargetDir)" IgnoreExitCode="true" />
    <Exec Command="mdoc export-html --out $(SolutionDir)/docs docs_xml" WorkingDirectory="$(TargetDir)" IgnoreExitCode="true" />
</Target>

This outputs the documentation to the docs/ folder in the root of the solution. If you're a bit confused as to how to utilise this in your own project, then perhaps the Untangling MSBuild post I wrote to figure out how MSBuild works can help! You can view the GlidingSquirrel's automatically-generated documentation here. It's nothing fancy (yet), but it certainly does the job. I'll have to look into prettifying it later!

Untangling MSBuild: MSBuild for normal people

I don't know about you, but I find the documentation on MSBuild to be rather confusing. Even their definition of what MSBuild is is a bit misleading:

MSBuild is the build system for Visual Studio.

Whilst having to pull apart the .csproj file of a project of mine and put it back together again to get it to do what I wanted, I spent a considerable amount of time reading Microsoft's (bad) documentation and various web tutorials on what MSBuild is and what it does. I'm bound to forget what I've learnt, so I'm detailing it here both to save myself the bother of looking everything up again and to make sense of everything I've read and experimented with myself.

Before continuing, you might find my previous post, Understanding your compiler: C#, an interesting read if you aren't already aware of some of the basics of Visual Studio solutions, project files, msbuild, and the C♯ compiler.

Let's start with a real definition. MSBuild is Microsoft's build framework that ties into Visual Studio, Mono, and basically anything in the .NET world. It has an XML-based syntax that allows one to describe what MSBuild has to do to build a project - optionally depending on other projects elsewhere in the file system. It's most commonly used to build C♯ projects - but it can be used to build other things, too.

The structure of your typical Visual Studio solution might look a bit like this:

The basic structure of a Visual Studio solution. Explained below.

As you can see, the .sln file references one or more .csproj files, each of which may reference other .csproj files - not all of which have to be tied to the same .sln file, although they usually are (this can be handy if you're using Git Submodules). The .sln file also specifies a default project to build.

The file extension .csproj isn't the only one recognised by MSBuild, either - there are others, such as .pssproj (PowerShell project), .vcxproj (Visual C++ project), and .targets (shared tasks + targets - we'll get to these later) - though the generic extension is simply .proj.

So far, I've observed that MSBuild is pretty intelligent about automatically detecting project / solution files in its working directory - you can call it with msbuild in a directory and most of the time it'll find and build the right project - even if it finds a .sln file that it has to parse first.
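
To give you a rough idea of what invoking it looks like (MyProject.csproj is just a stand-in name here; on older mono installs the tool is called xbuild instead):

msbuild                                             # build whatever it finds in the current directory
msbuild MyProject.csproj                            # build a specific project
msbuild MyProject.csproj /t:Rebuild                 # ask for a particular target
msbuild MyProject.csproj /p:Configuration=Release   # set a property / variable for the build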

Let's get to the real meat of the subject: targets. Consider the following:

<?xml version="1.0" encoding="utf-8"?>
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003" DefaultTargets="Build" ToolsVersion="4.0">
    <Import Project="$(MSBuildBinPath)\Microsoft.CSharp.targets" />

    <Target Name="BeforeBuild">
        <Message Importance="high" Text="Before build executing" />
    </Target>
</Project>

I've simplified it a bit to make it clearer, but in essence the above imports the predefined C♯ set of targets, which includes (amongst others) BeforeBuild, Build itself, and AfterBuild - the first of which is overridden by the local MSBuild project file. Sound complicated? It is a bit!

MSBuild uses a system of targets. When you ask it to do something, you're actually asking it to reach a particular target. By default, this is the Build target. Targets can have dependencies, too - in the case of the above, the Build target depends on the (blank) BeforeBuild target in the default C♯ target set, which is then overridden by the MSBuild project file above.
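
Your own targets can declare their dependencies explicitly too, via the DependsOnTargets attribute. Here's a quick hypothetical sketch:

<Target Name="Package" DependsOnTargets="Build">
    <!-- Asking MSBuild for the Package target runs Build (and anything Build depends on) first -->
    <Message Importance="high" Text="Packaging the build output..." />
</Target>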

The second key component of MSBuild project files is tasks. I've used one in the example above - the Message task which, obviously enough, outputs a message to the build output. There are lots of other types of task, too:

  • Reference - reference another assembly or core namespace
  • Compile - specify a C♯ file to compile
  • EmbeddedResource - specify a file to include as an embedded resource
  • MakeDir - create a directory
  • MSBuild - recursively build another MSBuild project
  • Exec - Execute a shell command (on Windows this is with cmd.exe)
  • Copy - Copy file(s) and/or director(y/ies) from one place to another

This is just a small sample of what's available - check out the MSBuild task reference for a full list and more information about how each is used. It's worth noting that MSBuild targets can include multiple tasks one after another - and they'll all be executed in sequence.
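
For instance, something like this (with made-up paths) creates a directory, copies a file into it, and then reports what it did - in that order:

<Target Name="CopyAssets">
    <MakeDir Directories="$(TargetDir)assets" />
    <Copy SourceFiles="$(ProjectDir)README.md" DestinationFolder="$(TargetDir)assets" />
    <Message Importance="high" Text="Copied assets into $(TargetDir)assets" />
</Target>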

Unfortunately, if you try to specify a wildcard in an EmbeddedResource directive, both Visual Studio and Monodevelop 'helpfully' try to auto-expand these wildcards to reference the actual files themselves. To avoid this, a clever hack can be instituted that uses the CreateItem task:

<CreateItem Include="$(ProjectDir)/obj/client_dist/**">
    <Output ItemName="EmbeddedResource" TaskParameter="Include" />
</CreateItem>

The above loops over all the files in $(ProjectDir)/obj/client_dist, and dynamically creates an EmbeddedResource directive for each - thereby preventing the annoying auto-expansion.

The $(ProjectDir) bit is a variable - the final key component of MSBuild project files. There are a number of built-in variables for different things, but you can also define your own.

  • $(ProjectDir) - The current project's root directory
  • $(SolutionDir) - The root directory of the current solution. Undefined if a project is being built directly.
  • $(TargetDir) - The output directory that the result of the build will be written to.
  • $(Configuration) - The selected configuration to build. Most commonly Debug or Release.

As with the tasks, there's a full list available that you can check out for more information.

That's the basics of how MSBuild works, as far as I'm aware at the moment. If I discover anything else of note, I'll post again in the future about what I've learnt. If this has helped, please comment below! If you're still confused, let me know in the comments too, and I'll try my best to help you out :-)

Want a tutorial on Git Submodules? Let me know by posting a comment below!

Finding the distance to a (finite) line from a point in Javascript

A screenshot of the library I've written. Explanation below.

For a project of mine (which I might post about once it's more stable), I'm going to need a way to find the distance from a point (the mouse cursor) to a line in order to implement an eraser. I've attempted this problem before - but it didn't exactly go to plan. To that end, I decided to implement the algorithm on its own to start with - so that I could debug it properly without all the (numerous) moving parts of the project I'm writing it for getting in the way.

As you may have guessed since you're reading this post, it actually went rather well! Using the C++ implementation on this page as a reference, it didn't take more than an hour or two to get a reasonable implementation working - and it didn't take a huge amount of time to tidy it up into an npm package for everyone to use!

My implementation uses ES6 Modules - so you may need to enable them in about:config or chrome://flags if you haven't already (don't believe the pages online that say you need Firefox / Chrome nightly - it's available in stable, just disabled by default) before taking a look at the demo, which you can find here:

Line Distance Calculator

(Click and drag to draw a line - your distance from it is shown in the top left)

The code behind it is actually quite simple - just rather full of nasty maths that will give you a headache if you try and understand it all at once (I broke it down, which helped). The library exposes multiple methods to detect a point's distance from different kinds of line - one for multi-segmented lines (which I needed in the first place), one for a single (finite) line (which the multi-segmented line employs), and one for a single infinite line - which I implemented first, using this Wikipedia article - before finding that it was buggy because it was for an infinite line (even though the article's name is apparently correct)!
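
To give you a rough idea of the maths involved, here's a sketch of the single (finite) line case - note that this isn't the library's actual API, just an illustration of the approach: project the point onto the line, clamp the projection to the ends of the segment, and measure the distance to that closest point.

// A sketch of the single (finite) line case - not the library's actual API.
// p, a, and b are plain { x, y } objects; a and b are the ends of the segment.
function distanceToSegment(p, a, b) {
    const dx = b.x - a.x, dy = b.y - a.y;
    const lengthSquared = dx*dx + dy*dy;
    if (lengthSquared === 0) // Degenerate case: a and b are the same point
        return Math.hypot(p.x - a.x, p.y - a.y);
    // Project p onto the infinite line, then clamp to the segment itself
    let t = ((p.x - a.x)*dx + (p.y - a.y)*dy) / lengthSquared;
    t = Math.max(0, Math.min(1, t));
    // Measure the distance to the closest point on the segment
    return Math.hypot(p.x - (a.x + t*dx), p.y - (a.y + t*dy));
}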

I've written up a usage guide if you're interested in playing around with it yourself.

I've also got another library that I've released recently (also for Nibriboard) that simplifies multi-segmented lines instead of finding the distance to them, which I may post about soon too!

Update: Looks like I forgot that I've already posted about the other library! You can read about it here: Line Simplification: Visvalingam's Algorithm

Got a question? Wondering why I've gone to the trouble of implementing such an algorithm? Comment below - I'd love to hear your thoughts!

Change the way you think: Languages and Compilers in Review

Change the way you think.

I haven't seen it used recently, but when I first arrived at Hull University to start my degree, that's the phrase I saw on a number of posters about the place. I've been thinking about it a lot over the course of my degree - and I've found that it has rung true more than once. My understanding of programming has been transformed 3 times that I can count, and before I get to the Languages and Compilers review that this post is supposed to be about, I'd like to talk a little bit about that first.

The first time my understanding transformed was when I arrived in my first year. Up until that point, I'd been entirely self-taught - learning languages such as GML and later Javascript. All of a sudden, I was introduced to the concept of Object-Oriented Programming, and suddenly I understood that by representing things as classes and objects, it was possible to build larger programs without having them fall apart because they were a nightmare to maintain.

Then, when I was finishing up my year in industry, my understanding was transformed again. I realised the value of the experiences I'd had while I was out on my year in industry - they have not only shown me mechanisms by which a project can be effectively managed, but they have also given me a bit more of an idea of what I'd like to do when I finish my degree.

Finally, I think that Languages and Compilers is the third time I've changed the way I think about programming. It's transformed my understanding of how programming languages are built, how their compilers and interpreters work, and why things are the way they are. It's also shown me that there's no best programming language - there's only the best one for the task at hand.

With this in hand, I have the tools I need to pick up and understand a new language much more easily than I could before, by comparing its features to the ones that I already know about. I've found a totally new way of looking at programming languages: looking at them not on their own, but at how their lexical style and paradigms compare to those employed by other languages.

If you're considering whether a degree in Computer Science is worth it, I'd say that if you're serious about programming for a living, then the answer is a resounding yes.

Have your own thoughts to add about (your?) CS degree? Have a question? Post a comment below!

Nasty Example SPL Programs

I've written a few example test programs to debug various edge cases whilst writing a compiler for my coursework at University, and I thought I'd post them here to help others out. As this coursework is ongoing I'm not going to say anything else, but I hope that these help someone else out :-)

Anyone posting coursework compiler code here will have it deleted (but additional example SPL programs are welcome). In addition, I make no guarantees that these programs are even valid SPL that should compile.

Nasty A

NastyA:
DECLARATIONS
    a OF TYPE INTEGER;
CODE
    READ(a);
    WRITE(a);
    WRITE(45, 'c', 5.68249)
ENDP NastyA.

Nasty B

NastyB:

DECLARATIONS

    i, toggle, step, limit OF TYPE INTEGER;

CODE

    1 -> toggle;
    1 -> step;
    10 -> limit;
    FOR i IS 1 BY step TO limit DO
        IF toggle = 1 THEN
            -1 -> step;
            -1 -> toggle;
            -10 -> limit
        ENDIF;
        WRITE(i);
        NEWLINE;
        WRITE(toggle);
        NEWLINE;
        WRITE(step);
        NEWLINE;
        NEWLINE
    ENDFOR;
    NEWLINE

ENDP NastyB.

Nasty C

NastyC:

DECLARATIONS
    a,b,c OF TYPE INTEGER;
    d,e,f,g OF TYPE REAL;
CODE
    5 + 8 -> a;
    4.5 + 6.8 -> d;
    WRITE(a);
    NEWLINE;
    WRITE(d);
    NEWLINE;
    WRITE((8 - 4));
    NEWLINE
ENDP NastyC.

Jump around a filesystem with a bit of bash

Your shell in the middle of a teleporter.

(Banner remixed from images found on openclipart)

I've seen things like jump, which allow you to bookmark places on your system so that you can return to them faster. The trouble is, I keep forgetting to use it. I open the terminal and realise that I need to be in a specific directory, and forget to bookmark it once I cd to it - or I forget that I bookmarked it and cd my way there anyway :P

To solve the problem, I thought I'd try implementing my own simplified system, under the names teleport, telepeek, and telepick. Obviously, we'll have to put these in something like .bash_aliases as functions - otherwise they won't be able to cd in the terminal session itself. Let's start with teleport:

function teleport() {
    cd $(find . -type d | grep -iP "$1" | head -n1);
}

Not bad for a first attempt! Basically, it does a find to list all the subdirectories in the current directory, filters the results with the specified regex, and changes directory to the first result returned. Here's an example of how it's used:

~ $ teleport 'pep.*mint'
~/Documents/code/some/path/pepperminty-wiki/ $ 

We can certainly improve it though. Let's start by removing that head call:

function teleport() {
    cd $(find . -type d | grep -m1 -iP "$1");
}

What about all those Permission denied messages that pop up when you're jumping around places you don't have permission to access? Let's suppress those too:

function teleport() {
    cd $(find . -type d 2>/dev/null | grep -m1 -iP "$1");
}

Much better. With a teleport command in hand, it might be nice to inspect the list of directories the find + grep combo finds. For that, let's invent a telepeek variant:

function telepeek() {
    find . -type d 2>/dev/null | grep -iP "$1" | less
}

Very cool. It doesn't have line numbers though, and they're useful. Let's fix that:

function telepeek() {
    find . -type d 2>/dev/null | grep -iP "$1" | less -N
}

Better, but I'd prefer them to be highlighted so that I can tell them apart from the directory paths. For that, we've got to change our approach to the problem:

function telepeek() {
    find . -type d 2>/dev/null | grep -iP "$1" | cat -n | sed 's/^[ 0-9]*[0-9]/\o033[34m&\o033[0m/' | less -R
}

By using a clever combination of cat -n to add the line numbers and a strange sed recipe (which I found in a comment on this Stack Overflow answer) to highlight the numbers themselves, we can get the result we want.

This telepeek command has given me an idea. Since we've gone to the trouble of displaying line numbers, why not ask for an index and jump to that directory? Let's cook up a telepick command!

function telepick() {
    telepeek $1;
    read -p "jump to index: " line_number;
    cd $(find . -type d 2>/dev/null | grep -iP "$1" | sed "${line_number}q;d");
}

That wasn't too hard. By using a few different commands rather like lego bricks, we can very easily create something that does what we want with minimal effort. The read -p "jump to index: " line_number bit fetches the index that the user wants to jump to, and sed comes to the rescue again to pick out the line number we're interested in with sed "${line_number}q;d".
