Archive

## Tag Cloud

3d 3d printing account algorithms android announcement architecture archives arduino artificial intelligence artix assembly async audio automation backups bash batch blog bookmarklet booting bug hunting c sharp c++ challenge chrome os cluster code codepen coding conundrums coding conundrums evolved command line compilers compiling compression containerisation css dailyprogrammer data analysis debugging demystification distributed computing docker documentation downtime electronics email embedded systems encryption es6 features ethics event experiment external first impressions future game github github gist gitlab graphics hardware hardware meetup holiday holidays html html5 html5 canvas infrastructure interfaces internet interoperability io.js jabber jam javascript js bin labs learning library linux lora low level lua maintenance manjaro minetest network networking nibriboard node.js operating systems own your code pepperminty wiki performance phd photos php pixelbot portable privacy problem solving programming problems project projects prolog protocol protocols pseudo 3d python reddit redis reference release releases rendering resource review rust searching secrets security series list server software sorting source code control statistics storage svg talks technical terminal textures thoughts three thing game three.js tool tutorial tutorials twitter ubuntu university update updates upgrade version control virtual reality virtualisation visual web website windows windows 10 xmpp xslt

## PhD Update 1: Directions

Welcome to my first PhD update post. I intend to post these at bimonthly intervals. In the last post, I talked a bit about my PhD project that I'm doing and my initial thoughts. Since then, I've done heaps of investigation into a number of different potential directions I could take the project. For reference, my PhD title is actually as follows:

Using the Internet of Things, Big Data, and AI to dynamically map flood risk.

There are 3 main elements to this project:

• Big Data
• Artificial Intelligence (AI)
• The Internet of Things (IoT)

I'm pretty sure that each of them will have an important role to play in the final product - even if I'm not sure what those roles are just yet :P

Particularly of concern at the moment is this blog post by Google. It talks about they've managed to significantly improve flood forecasting with AI along with a seriously impressive visualisation to back it up - but I can't find a paper on it anywhere. I'm concerned that anything I try to do in the area won't be useful if they are already streets ahead of everyone else like that.

I guess one of the strong points I should try to hit is the concept of explainable AI if possible.

### All the data sources!

As it stands right now, I'm currently evaluating various different potential data sources that I've managed to gain access to. My aim here is to evaluate how useful they will be in solving the wider problem - and whether they are useful enough to be worth investigating further.

#### Environment Agency

Some great people from the environment agency came into University recently to chat with us about what they did. The discussion we had was very interesting - but they also asked if there was anything they could do to help our PhD projects out.

Seeing the opportunity, I jumped at the chance to get a hold of some of their historical datasets. They actually maintain a network of high-quality sensors across the country that monitor everything from rainfall to river statistics. While they have a real-time API that you can use to download recent measurements, it doesn't appear to go back further than March 2017. To this end, I asked for data from 2005 up to the end of 2017, so that I could get a clearer picture of the 2007 and 2013 floods for AI training purposes.

So far, this dataset has proved very useful at least initially as a testbed for training various kinds of AI as I learn PyTorch (see my recent post for how that has been going - I've started with a basic LSTM first. For reference, an LSTM is a neural network architecture that is good at processing time-series data - but is quite computationally expensive to run.

#### Met Office

I've also been investigating the datasets that the Met Office provide. These chiefly appear to be in the form of their free DataPoint API. Particularly of interest are their rainfall radar images, which are 500x500 pixels and are released every 15 minutes. Sadly they are only available for a few hours at best, so you have to grab them fast if you want to be able to analyse particularly interesting ones later.

Annoyingly though, their API does not appear to give any hints as to the bounding boxes of these images - and neither can I find any information about this online. I posted in their support forum, but it doesn't appear that anyone actually monitors it - so at this point I suspect that I'm unlikely to receive a response. Without knowing the (lat, lng) co-ordinates of the images produced by the API, they are little more use than pretty wall art.

#### Internet of Things

On the Internet of Things front, I'm already part of Connected Humber, which have a network of sensors setup that are monitoring everything from air quality to temperature, humidity, and air pressure. While these things aren't directly related to my project, the dataset that we're collecting as a group may very well come in handy as an input to a model of some description.

I'm pretty sure that I'll need to setup some additional custom sensors of my own at some point (probably soonish too) to collect the measurement readings that I'm missing from other pre-existing datasets.

Whilst I've been doing this, I've also been reading up a storm. I've started by reading into traditional physics-based flood modelling simulations (such as caesar-lisflood) - which appear to fall into a number of different categories, which also have sub-categories. It's quite a rabbit hole - but apparently I'm diving all the way down to the very bottom.

The most interesting paper on this subject I found was this one from 2017. It splits physics-based models up into 3 categories:

• Empirical models (i.e. ones that just display sensor readings, calculate some statistics, and that's about it)
• Hydrodynamic models - the best-known models that simulate water flow etc - can be categorised as either 1D, 2D, or 3D - also very computationally expensive - especially in higher dimensions
• Simplified conceptual models - don't actually simulate water flow, but efficient enough to be used on large areas - also can be quite inaccurate with complex terrain etc.

As I'm going to be using artificial intelligence as the core of my project, it quickly became evident that this is just stage-setting for the actual kind of work I'll be doing. After winding my way through a bunch of other less interesting papers, I found my way to this paper from 2018 next, which is similar to the previous one I linked to - just for AI and flood modelling.

While I haven't yet had a chance to follow up on all the interesting papers referenced, it has a number of interesting points to keep in mind:

• Artificial Intelligences need lots of diverse data points to train well
• It's important to measure a trained network's ability to generalise what it's learnt to other situations it hasn't seen yet

The odd thing about this paper is that it claims that regular neural networks were better than recurrent neural network structures - despite the fact that it is only citing a single old 2013 paper (which I haven't yet read). This led me on to read a few more papers - all of which were mildly interesting and had at least something to do with neural networks.

I certainly haven't read everything yet about flood modelling and AI, so I've got quite a way to go until I'm done in this department. Also of interest are 2 newer neural network architectures which I'm currently reading about:

### Next steps

I want to continue to read about the above neural networks. I also want to implement a number of the networks I've read about in PyTorch to continue to learn the library.

Lastly, I want to continue to find new datasets to explore. If you're aware of a dataset that I haven't yet talked about on here, comment below!

## Summer Project Part 5: When is a function not a function?

Another post! Looks like I'm on a roll in this series :P

In the last post, I looked at the box I designed that was ready for 3D printing. That process has now been completed, and I'm now in possession of an (almost) luminous orange and pink box that could almost glow in the dark.......

I also looked at the libraries that I'll be using and how to manage the (rather limited) amount of memory available in the AVR microprocessor.

Since last time, I've somehow managed to shave a further 6% program space off (though I'm not sure how I've done it), so most recently I've been implementing 2 additional features:

• An additional layer of AES encryption, to prevent The Things Network for having access to the decrypted data
• GPS delta checking (as I'm calling it), to avoid sending multiple messages when the device hasn't moved

After all was said and done, I'm now at 97% program space and 47% global variable RAM usage.

To implement the additional AES encryption layer, I abused LMiC's IDEETRON AES-128 (ECB mode) implementation, which is stored in src/aes/ideetron/AES-128_V10.cpp.

It's worth noting here that if you're doing crypto yourself, it's seriously not recommended that you use ECB mode. Please don't. The only reason that I used it here is because I already had an implementation to hand that was being compiled into my program, I didn't have the program space to add another one, and my messages all start with a random 32-bit unsigned integer that will provide a measure of protection against collision attacks and other nastiness.

Specifically, it's the method with this signature:

void lmic_aes_encrypt(unsigned char *Data, unsigned char *Key);

Since this is an internal LMiC function declared in a .cpp source file with no obvious header file twin, I needed to declare the prototype in my source code as above - as the method will only be discovered by the compiler when linking the object files together (see this page for more information about the C++ compilation process. While it's for regular Linux executable binaries, it still applies here since the Arduino toolchain spits out a very similar binary that's uploaded to the microprocessor via a programmer).

However, once I'd sorted out all the typing issues, I slammed into this error:

/tmp/ccOLIbBm.ltrans0.ltrans.o: In function transmit_send':
sketch/transmission.cpp:89: undefined reference to lmic_aes_encrypt(unsigned char*, unsigned char*)'
collect2: error: ld returned 1 exit status

Very strange. What's going on here? I declared that method via a prototype, didn't I?

Of course, it's not quite that simple. The thing is, the file I mentioned above isn't the first place that a prototype for that method is defined in LMiC. It's actually in other.c, line 35 as a C function. Since C and C++ (for all their similarities) are decidedly different, apparently to call a C function in C++ code you need to declare the function prototype as extern "C", like this:

extern "C" void lmic_aes_encrypt(unsigned char *Data, unsigned char *Key);

This cleaned the error right up. Turns out that even if a function body is defined in C++, what matters is where the original prototype is declared.

I'm hoping to release the source code, but I need to have a discussion with my supervisor about that at the end of the project.

Found this interesting? Come across some equally nasty bugs? Comment below!

## Summer Project Part 4: Threading the needle and compacting it down

In the last part, I put the circuit for the IoT device together and designed a box for said circuit to be housed inside of.

In this post, I'm going to talk a little bit about 3D printing, but I'm mostly going to discuss the software aspect of the firmware I'm writing for the Arduino Uno that's going to be control the whole operation out in the field.

Since last time, I've completed the design for the housing and sent it off to my University for 3D printing. They had some great suggestions for improving the design like making the walls slightly thicker (moving from 2mm to 4mm), and including an extra lip on the lid to keep it from shifting around. Here are some pictures:

(Left: The housing itself. Right: The lid. On the opposite side (not shown), the screw holes are indented.)

At the same time as handling sending the housing off to be 3D printed, I've also been busily iterating on the software that the Arduino will be running - and this is what I'd like to spend the majority of this post talking about.

I've been taking an iterative approach to writing it - adding a library, interfacing with it and getting it to do what I want on it's own, then integrating it into the main program.... and then compacting the whole thing down so that it'll fit inside the Arduino Uno. The thing is, the Uno is powered by an ATmega328P (datasheet). Which has 32K of program space and just 2K of RAM. Not much. At all.

The codebase I've built for the Uno is based on the following libraries.

• LMiC (the matthijskooijman fork) - the (rather heavy and needlessly complicated) LoRaWAN implementation
• Entropy for generating random numbers as explained in part 2
• TinyGPS, for decoding NMEA messages from the NEO-6M
• SdFat, for interfacing with microSD cards over SPI

### Memory Management

Packing the whole program into a 32K + 2K box is not exactly an easy challenge, I discovered. I chose to first deal with the RAM issue. This was greatly aided by the FreeMemory library, which tells you how much RAM you've got left at a given point in the execution of your program. While it's a bit outdated, it's still a useful tool. It works a bit like this:

#include <MemoryFree.h>;

void setup() {
Serial.begin(115200);
Serial.println(freeMemory, DEC);
char test[] = "Bobs Rockets";
Serial.println(freeMemory, DEC); // Should be lower than the above call
}

void loop() {
// Nothing here
}

It's worth taking a moment to revise the way stacks and heaps work - and the differences between how they work in the Arduino environment and on your desktop. This is going to get rather complicated quite quickly - so I'd advise reading this stack overflow answer first before continuing.

First, let's look at the locations in RAM for different types of allocation:

• Things on the stack
• Things on the heap
• Global variables

Unlike on the device you're reading this on, the Arduino does not support multiple processes - and therefore the entirety of the RAM available is allocated to your program.

Since I wasn't sure about preecisly how the Arduino does it (it's processor architecture-specific), I wrote a simple test program to tell me:

#include <Arduino.h>

struct Test {
uint32_t a;
char b;
};

Test global_var;

void setup() {
Serial.begin(115200);

Test stack;
Test* heap = new Test();

Serial.print(F("Stack location: "));
Serial.println((uint32_t)(&stack), DEC);

Serial.print(F("Heap location: "));
Serial.println((uint32_t)heap, DEC);

Serial.print(F("Global location: "));
Serial.println((uint32_t)&global_var, DEC);
}

void loop() {
// Nothing here
}

This prints the following for me:

Stack location: 2295
Heap location: 461
Global location: 284

From this we can deduce that global variables are located at the beginning of the RAM space, heap allocations go on top of globals, and the stack grows down starting from the end of RAM space. It's best explained with a diagram:

Now for the differences. On a normal machine running an operating system, there's an extra layer of abstraction between where things are actually located in RAM and where the operating system tells you they are located. This is known as virtual memory address translation (see also virtual memory, virtual address space).

It's a system whereby the operating system maintains a series of tables that map physical RAM to a virtual address space that the running processes actually use. Usually each process running on a system will have it's own table (but this doesn't mean that it will have it's own physical memory - see also shared memory, but this is a topic for another time). When a process accesses an area of memory with a virtual address, the operating system will transparently translate the address using the table to the actual location in RAM (or elsewhere) that the process wants to access.

This is important (and not only for security), because under normal operation a process will probably allocate and deallocate a bunch of different lumps of memory at different times. With a virtual address space, the operating system can defragment the physical RAM space in the background and move stuff around without disturbing currently running processes. Keeping the free memory contiguous speeds up future allocations, and ensures that if a process asks for a large block of contiguous memory the operating system will be able to allocate it without issue.

As I mentioned before though, the Arduino doesn't have a virtual memory system - partly because it doesn't support multiple processes (it would need an operating system for that). The side-effect here is that it doesn't defragment the physical RAM. Since C/C++ isn't a managed language, we don't get _heap compaction_ either like in .NET environments such as Mono.

All this leads us to an environment in which heap allocation needs to be done very carefully, in order to avoid fragmenting the heap and causing a stack crash. If an object somewhere in the middle of the heap is deallocated, the heap will not shrink until everything to the right of it is also deallocated. This post has a good explanation of the problem too.

Other things we need to consider are keeping global variables to a minimum, and trying to keep most things on the stack if we can help it (though this may slow the program down if it's copying things between stack frames all the time).

To this end, we need to choose the libraries we use with care - because they can easily break these guidelines that we've set for ourselves. For example, the inbuilt SD library is out, because it uses a global variable that eats over 50% of our available RAM - and it there's no way (that I can see at least) to reclaim that RAM once we're finished with it.

This is why I chose SdFat instead, because it's at least a little better at allowing us to reclaim most of the RAM it used once we're finished with it by letting the instance fall out of scope (though in my testing I never managed to reclaim all of the RAM it used afterwards).

Alternatives like µ-Fat do exist and are even lighter, but they have restrictions such as no appending to files for example - which would make the whole thing much more complicated since we'd have to pre-allocate the space for the file (which would get rather messy).

The other major tactic you can do to save RAM is to use the F() trick. Consider the following sketch:

#include

void setup() {
Serial.begin(115200);
Serial.println("Bills boosters controller, version 1");
}

void loop() {
// Nothing here
}

On the highlighted line we've got an innocent Serial.println() call. What's not obvious here is that the string literal here is actually copied to RAM before being passed to Serial.println() - using up a huge amount of our precious working memory. Wrapping it in the F() macro forces it to stay in your program's storage space:

Serial.println(F("Bills boosters controller, version 1"));

### Saving storage

With the RAM issue mostly dealt with, I then had to deal with the thorny issue of program space. Unfortunately, this is not as easy as saving RAM because we can't just 'unload' something when it's not needed.

My approach to reducing program storage space was twofold:

• Picking lightweight alternatives to libraries I needed
• Messing with flags of said libraries to avoid compiling parts of libraries I don't need

It is for these reasons that I ultimately went with TinyGPS instead of TinyGPS++, as it saved 1% or so of the program storage space.

It's also for this reason that I've disabled as much of LMiC as possible:

#define DISABLE_JOIN
#define DISABLE_PING
#define DISABLE_BEACONS
#define DISABLE_MCMD_DCAP_REQ
#define DISABLE_MCMD_DN2P_SET

This disables OTAA, Class B functionality (which I don't need anyway), receiving messaages, the duty cycle cap system (which I'm not sure works between reboots), and a bunch of other stuff that I'd probably find rather useful.

In the future, I'll probably dedicate an entire microcontroller to handling LoRaWAN functionality - so that I can still use the features I've had to disable here.

Even doing all this, I still had to trim down my Serial.println() calls and remove any non-essential code to bring it under the 32K limit. As of the time of typing, I've got jut 26 bytes to spare!

Next time, after tuning the TPL5110 down to the right value, we're probably going to switch gears and look at the server-side of things - and how I'm going to be storing the data I receive from the Arudino-based device I've built.

Found this interesting? Got a suggestion? Comment below!

## Summer Project Part 3: Putting it together

In the first post in this series, I outlined my plans for my Msc summer project and what I'm going to be doing. In the second post, I talked about random number generation for the data collection.

In this post, I'm going to give a general progress update - which will mostly centre around the Internet of Things device I'm building to collect the signal strength data.

Since the last post, I've got nearly all the parts I need for the project, except the TPL5111 power manager and 4 rechargeable AA batteries (which should be easy to come by - I'm sure I've got some lying around somewhere).

I've also wired the thing up, with a cable standing in for the TPL5111.

The power management board there technically doesn't need a breadboard, but it makes mounting it in the box easier.

I still need to splice the connector onto the battery box I had lying around with some soldering and electrical tape - I'll do that later this week.

The wiring there is kind of messy, but I've tested each device individually and they all appear to work as intended. Here's a clearer diagram of what's going on that drew up in Fritzing (sudo apt install fritzing for Linux users):

Speaking of mounting things in the box, I've discovered OpenSCAD thanks to help from a friend and have been busily working away at designing a box to put everything in that can be 3D printed:

I've just got the lid to do next (which I'm going to do after writing this blog post), and then I'm going to get it printed.

With this all done, it's time to start working on the transport for the messages - namely using LMIC to connection to the network and send the GPS location to the application server, which is also unfinished.

The lovely people at the hardware meetup have lent me a full 8-channel LoRaWAN gateway that's connected to The Things Network for my project, which will make this process a lot easier.

Next time, I'll likely talk about 3D printing and how I've been 'threading the needle', so to speak.

## Easy Paper Referencing and Research

For my Msc (Masters in Science) degree that I'm currently working towards, I'm having to do an increasing amount of research and 'official' referencing in my University's referencing style.

Unlike when I first started at University, however, by now I've developed a number of strategies to aid me in doing this referencing properly with minimal effort - and actually finding papers in the first place. To this end, I wanted to write a quick post on what I do for my own reference - and hopefully someone else will find it useful too (comment below!).

Although I really like using Markdown for writing blog posts and other documents, when I've got to do proper referencing I often end up using LaTeX instead. I first used LaTeX for my interim report in my final year of my undergraduate degree, and while I'm still looking for a decent renderer (I think I'm using TeX Live, and it's a pain), it's great for referencing because of BibTeX. With the template provided by my University, I can enter something like this:

@Misc{Techopedia2017,
author = {{Techopedia Inc.}},
year = {2017},
title = {What is Hamming Distance? - Definition from Techopedia},
howpublished = {Available online: \url{https://www.techopedia.com/definition/19723/hamming-distance} [Accessed 04/04/2019]}
}

...and it will automagically convert it into the right style according to the template my University has given me. Then I just reference it (\citep{Techopedia2017}), and I'm away!

For actually finding papers, I've been finding Google Scholar useful. It even has a button you can press that generates the BibTeX definition as shown above for you to paste into your references file!

Finally, I've just recently discovered Microsoft Academic. While I haven't used it too much yet, it seems to be a great alternative to Google Scholar. It's got a cool AI-based semantic search engine, and an interesting summary view of how a given paper relates to other papers to help you find more papers on a topic. It shows a papers references,the papers that cite that paper, and papers that it determines to be related - giving you plenty to choose from.

Using a combination of these approaches, I've been able to effectively focus on the finding and referencing papers, without getting bogged down too much with the semantics of how I should actually do the referencing itself.

Found this useful? Got another tip? Comment below!

## Delivering Linux 101

Achievement get: Deliver workshop!

At the beginning of my time here at University I never thought I'd be planning and leading the delivery an entire workshop on the basics of Linux. Assessed coursework presentations have nothing on this!

Overall, I think it went rather well, actually. About a dozen people attended in total, and most people seemed to manage to get near the end of the tasks I had prepared:

1. Installing Ubuntu
2. Installing Mono
3. Investigating Monodevelop

I think next time I want to better prepare for the gap when installing the operating system, as it took much longer than I expected. Perhaps choosing the "minimal" installation instead of the "normal" installation would help here?

Preparing some slides on things like the folder structure and layout, and re-ordering the slides about package management would would all help.

If I can't cut down on the installation time, pre-installed virtual machines would also work - but I'd like to keep the OS installation if possible to show that it's an easy process installing Ubuntu on their own machines.

Moving forwards, I've already received a bunch of feedback on what future sessions could contain:

1. Setting up remote access
• This would be SSH, which is already installed & pre-setup on a server installation of Ubuntu
2. Gaming
• I unsure precisely what's meant by this. Is it installation of various games? Or maybe it's configuration of various platforms such as Steam? Perhaps someone could elaborate on it?
3. Server installation & maintenance
• Installation is largely similar to a desktop
• I'd want to measure how long it takes to install, because much of the work with a server is the post-install tasks
• Perhaps looking into a pre-installed server might be beneficial here, but security would be a slight concern

I think for anything more advanced, I'll probably go with a lab sheet-style setup instead, so that people can work at their own pace - especially since something like server configuration has many different steps to it.

I'd certainly want a goal to work towards for such a session. I've had some ideas already:

• Setting up a web server
• Installing Nginx
• Writing and understanding configuration files
• Possibly some FastCGI? PHP / Python? Probably not, what with everything else
• Setting up a server to host a custom application
• Writing systemd service files
• Setting up log rotation

Common to both of these ideas would be:

• Basic terminal skills
• Basic hardening
• Using sudo instead of root if it doesn't come configured with that setup already
• Securing SSH
• Enabling UFW

I'm pretty sure I'll be doing another one of these sessions, although I'm unsure as to whether there's the demand for a repeat of this one.

If you've got any thoughts, let me know in the comments below!

Thanks also to @MoirkoB and everyone else who provided both time and resources to enable this to go ahead. Without, I'm sure it wouldn't have happened.

If you'd like to view the slide deck I used, you can do so here:

Linux 101 Slide Deck

If you missed it, but would like to be notified of future sessions, then fill out this Google Form:

Linux 101 Overflow

## Compilers, VMs, and JIT: Spot the difference

It's about time for another demystification post, I think :P This time, I'm going to talk about Compilers, Virtual Machines (VMs), and Just-In-Time Compilation (JIT) - and the way that they are both related and yet different.

### Compilers

To start with, a compiler is a program that converts another program written in 1 language into another language (usually of a lower level). For example, gcc compiles C++ into native machine code.

A compiler usually has both a backend and a frontend. The frontend is responsible for the lexing and parsing of the source programming language. It's the part of the compiler that generates compiler warnings and errors. The frontend outputs an abstract syntax tree (or parse tree) that represents the source input program.

The backend then takes this abstract syntax tree, walks it, and generates code in the target output language. Such code generators are generally recursive in nature. Depending on the compiler toolchain, this may or may not be the last step in the build process.

gcc, for example, generates code to object files - which are then strung together by a linking program later in the build process.

Additional detail on the structure of a compiler (and how to build one yourself!) is beyond the scope of this article, but if you're interested I recommend reading my earlier Compilers 101 post.

### Virtual Machines

A virtual machine comes in several flavours. Common to all types is the ability to execute instructions - through software - as if they were being executed on a 'real' hardware implementation of the programming language in question.

Examples here include:

• KVM - The Linux virtual machine engine. Leverages hardware extensions to support various assembly languages that are implemented in real hardware - everything from Intel x86 Assembly to various ARM instruction sets. Virtual Machine Manager is a good GUI for it.
• VirtualBox - Oracle's offering. Somewhat easier to use than the above - and cross-platform too!
• .NET / Mono - The .NET runtime is actually a virtual machine in it's own right. It executes Common Intermediate Language in a managed environment.

It's also important to note the difference between a virtual machine and an emulator. An emulator is very similar to a virtual machine - except that it doesn't actually implement a language or instruction set itself. Instead, it 'polyfills' hardware (or environment) features that don't exist in the target runtime environment. A great example here is WINE or Parallels, which allow programs written for Microsoft Windows to be run on other platforms without modification.

### Just-In-Time Compilation

JIT is sort of a combination of the above. The .NET runtime (officially knows as the Common Language Runtime, or CLR) is an example of this too - in that it compiles the CIL in the source assembly to native code just before execution.

Utilising such a mechanism does result in an additional delay during startup, but usually pays for itself in the vast performance improvements that can be made over an interpreter - as a JITting VM can automatically optimise code as it's executing to increase performance in real-time.

This is different to an interpreter, which reads a source program line-by-line - parsing and executing it as it goes. If it needs to go back to another part of the program, it will usually have to re-parse the code before it can execute it.

### Conclusion

In this post, we've taken a brief look at compilers, VMs, and JIT. We've looked at the differences between them - and how they are different from their siblings (emulators and interpreters). In many ways, the line between the 3 can become somewhat blurred (hello, .NET!) - so learning the characteristics of each is helpful for disentangling different components of a program.

If there's anything I've missed - please let me know! If you're still confused on 1 or more points, I'm happy to expand on them if you comment below :-)

Found this useful? Spotted a mistake? Comment below!

## Disassembling .NET Assemblies with Mono

As part of the Component-Based Architectures module on my University course, I've been looking at what makes the .NET ecosystem tick, and how .NET assemblies (i.e. .NET .exe / .dll files) are put together. In the process, we looked as disassembling .NET assemblies into the text-form of the Common Intermediate Language (CIL) that they contain. The instructions on how to do this were windows-specific though - so I thought I'd post about the process on Linux and other platforms here.

Our tool of choice will be Mono - but before we get to that we'll need something to disassemble. Here's a good candidate for the role:

using System;

namespace SBRL.Demo.Disassembly {
static class Program {
public static void Main(string[] args) {
int a = int.Parse(Console.ReadLine()), b = 10;
Console.WriteLine(
"{0} + {1} = {2}",
a, b,
a + b
);
}
}
}

Excellent. Let's compile it:

csc Program.cs

This should create a new Program.exe file in the current directory. Before we get to disassembling it, it's worth mentioning how the compilation and execution process works in .NET. It's best explained with the aid of a diagram:

As is depicted in the diagram above, source code in multiple languages get compiled (maybe not with the same compiler, of course) into Common Intermediate Language, or CIL. This CIL is then executed in an Execution Environment - which is usually a virtual machine (Nope! not as in Virtual Box and KVM. It's not a separate operating system as such, rather than a layer of abstraction), which may (or may not) decide to compile the CIL down into native code through a process called JIT (Just-In-Time compilation).

It's also worth mentioning here that the CIL code generated by the compiler is in binary form, as this take up less space and is (much) faster for the computer to operate on. After all, CIL is designed to be efficient for a computer to understand - not people!

We can make it more readable by disassembling it into it's textual equivalent. Doing so with Mono is actually quite simple:

monodis Program.exe >Program.il

Here I redirect the output to a file called Program.il for convenience, as my editor has a plugin for syntax-highlighting CIL. For those reading without access to Mono, here's what I got when disassembling the above program:

.assembly extern mscorlib
{
.ver 4:0:0:0
.publickeytoken = (B7 7A 5C 56 19 34 E0 89 ) // .z\V.4..
}
.assembly 'Program'
{
.custom instance void class [mscorlib]System.Runtime.CompilerServices.CompilationRelaxationsAttribute::'.ctor'(int32) =  (01 00 08 00 00 00 00 00 ) // ........

.custom instance void class [mscorlib]System.Runtime.CompilerServices.RuntimeCompatibilityAttribute::'.ctor'() =  (
01 00 01 00 54 02 16 57 72 61 70 4E 6F 6E 45 78   // ....T..WrapNonEx
63 65 70 74 69 6F 6E 54 68 72 6F 77 73 01       ) // ceptionThrows.

.custom instance void class [mscorlib]System.Diagnostics.DebuggableAttribute::'.ctor'(valuetype [mscorlib]System.Diagnostics.DebuggableAttribute/DebuggingModes) =  (01 00 07 01 00 00 00 00 ) // ........

.hash algorithm 0x00008004
.ver  0:0:0:0
}

.namespace SBRL.Demo.Disassembly
{
.class private auto ansi beforefieldinit Program
extends [mscorlib]System.Object
{

// method line 1
.method public static hidebysig
default void Main (string[] args)  cil managed
{
// Method begins at RVA 0x2050
.entrypoint
// Code size 47 (0x2f)
.maxstack 5
.locals init (
int32   V_0,
int32   V_1)
IL_0000:  nop
IL_0006:  call int32 int32::Parse(string)
IL_000b:  stloc.0
IL_000c:  ldc.i4.s 0x0a
IL_000e:  stloc.1
IL_000f:  ldstr "{0} + {1} = {2}"
IL_0014:  ldloc.0
IL_0015:  box [mscorlib]System.Int32
IL_001a:  ldloc.1
IL_001b:  box [mscorlib]System.Int32
IL_0020:  ldloc.0
IL_0021:  ldloc.1
IL_0023:  box [mscorlib]System.Int32
IL_0028:  call void class [mscorlib]System.Console::WriteLine(string, object, object, object)
IL_002d:  nop
IL_002e:  ret
} // end of method Program::Main

// method line 2
.method public hidebysig specialname rtspecialname
instance default void '.ctor' ()  cil managed
{
// Method begins at RVA 0x208b
// Code size 8 (0x8)
.maxstack 8
IL_0000:  ldarg.0
IL_0001:  call instance void object::'.ctor'()
IL_0006:  nop
IL_0007:  ret
} // end of method Program::.ctor

} // end of class SBRL.Demo.Disassembly.Program
}


Very interesting. There are a few things of note here:

• The metadata at the top of the CIL tells the execution environment a bunch of useful things about the assembly, such as the version number, the classes contained within (and their signatures), and a bunch of other random attributes.
• An extra .ctor method has been generator for us automatically. It's the class' constructor, and it automagically calls the base constructor of the object class, since all classes are descended from object.
• The ints a and b are boxed before being passed to Console.WriteLine. Exactly what this does and why is quite complicated, and best explained by this Stackoverflow answer.
• We can deduce that CIL is a stack-based language form the add instruction, as it has no arguments.

I'd recommend that you explore this on your own with your own test programs. Try changing things and see what happens!

• Try making the Program class static
• Try refactoring the int.Parse(Console.ReadLine()) into it's own method. How is the variable returned?

This isn't all, though. We can also recompile the CIL back into an assembly with the ilasm code:

ilasm Program.il

This makes for some additional fun experiments:

• See if you can find where b's value is defined, and change it
• What happens if you alter the Console.WriteLine() format string so that it becomes invalid?
• Can you get ilasm to reassemble an executable into a .dll library file?

Found this interesting? Discovered something cool? Comment below!

## Converting my timetable to ical with Node.JS and Nightmare

(Source: Taken by me!)

My University timetable is a nightmare. I either have to use a terrible custom app for my phone, or an awkwardly-built website that feels like it's at least 10 years old!

Thankfully, it's not all doom and gloom. For a number of years now, I've been maintaining a Node.JS-based converter script that automatically pulls said timetable down from the JSON backend of the app - thanks to a friend who reverse-engineered said app. It then exports it as a .ical file that I can upload to my server & subscribe to in my pre-existing calendar.

Unfortunately, said backend changed quite dramatically recently, and broke my script. With the only alternative being the annoying timetable website that really don't like being scraped.

Where there's a will, there's a way though. Not to be deterred, I gave it a nightmare of my own: a scraper written with Nightmare.JS - a Node.JS library that acts, essentially, as a scriptable web-browser!

While the library has some kinks (especially with .wait("selector")), it worked well enough for me to implement a scraper that pulled down my timetable in HTML form, which I then proceeded to parse with cheerio.

The code is open-source (find it here!) - and as of this week I've updated it to work with the new update to the timetabling system this semester. A further update will be needed in early December time, which I'll also be pushing to the repository.

The README of the repository should contain adequate instructions for getting it running yourself, but if not, please open an issue!

Note that I am not responsible for anything that happens as a result of using this script! I would strongly recommend setting up the secure storage of your password if you intend to automate it. I've just written this to solve a problem in order to ensure that I can actually get to my lectures on time - and not an hour late or on the wrong week because I've misread the timetable (again)!

In the future, I'd like to experiment with other scriptable web-browser frameworks to compare them with my experiences with NightmareJS.

Found this interesting? Found a better way to do this? Comment below!

## Achievement Get: Complete Degree!

Hey! I've just realised that this was my 300th post. Wow! Thanks for all the support so far. Here's to the next 300 :D

I've just finish my degree at University this week. I'm still waiting on results, but I thought I'd make a post about it documenting my thoughts so far before I forget (also, happy 300th post! Wow, have I reached that already?). Note that this doesn't mean the end of this blog - far from it! I'll be doing a masters this next academic year.

It's been a great journey to have the chance to go on. I feel like I've improved in so many different ways having gone to University. Make no mistake: University isn't for everyone (if you're considering it, make sure you do your research about all your options!) - but I've found that it's been right for me.

I'm glad that Rob Miles' suggestion of starting a blog has been a great investment of my time. For one, I've been able to document the things that I've been learning so that I can come back to them later, and read a more personalised guide to the thing I blogged about. I've also learnt a ton about Linux server management too - as I manage the server that this blog runs on entirely through the terminal (sorry, hackers who want to get into my non-existent management web interface - I know you're out there - you leave pawprints all over my server logs!). All very valuable experiences - I highly suggest that you start one too (you won't regret it. I promise!).

I've also found that my eyes have been opened in more ways than one whilst doing my degree - both to new ways of approaching problems, and new ways of solving them - and many other things that would take too long to list here). I've blogged about some of my favourite modules in this regard before - particularly Virtual Reality and Languages and Compilers.

Thanks to all the amazing people I've met along the way, I've ended up in a much better place than when I started.

Art by Mythdael