Disassembling .NET Assemblies with Mono
As part of the Component-Based Architectures module on my University course, I've been looking at what makes the .NET ecosystem tick, and how .NET assemblies (i.e. .NET .exe / .dll files) are put together. In the process, we looked at disassembling .NET assemblies into the text form of the Common Intermediate Language (CIL) that they contain. The instructions on how to do this were Windows-specific though - so I thought I'd post about the process on Linux and other platforms here.
Our tool of choice will be Mono - but before we get to that we'll need something to disassemble. Here's a good candidate for the role:
using System;
namespace SBRL.Demo.Disassembly {
    class Program {
        // Read a number from standard input, then print the sum of it and 10
        public static void Main(string[] args) {
            int a = int.Parse(Console.ReadLine()), b = 10;
            Console.WriteLine(
                "{0} + {1} = {2}",
                a, b,
                a + b
            );
        }
    }
}
Excellent. Let's compile it:
csc Program.cs
This should create a new Program.exe file in the current directory. Before we get to disassembling it, it's worth mentioning how the compilation and execution process works in .NET. It's best explained with the aid of a diagram:
As depicted in the diagram above, source code in multiple languages gets compiled (maybe not with the same compiler, of course) into Common Intermediate Language, or CIL. This CIL is then executed in an Execution Environment - which is usually a virtual machine (Nope! Not as in VirtualBox or KVM. It's not a separate operating system as such, but rather a layer of abstraction), which may (or may not) decide to compile the CIL down into native code through a process called JIT (Just-In-Time compilation).
It's also worth mentioning here that the CIL code generated by the compiler is in binary form, as this takes up less space and is (much) faster for the computer to operate on. After all, CIL is designed to be efficient for a computer to understand - not people!
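If you'd like to see that binary form for yourself without reaching for a disassembler, here's a small C# sketch (my own addition; the IlBytesDemo class and AddTen method are invented for the purpose) that uses reflection's MethodBody.GetILAsByteArray() to dump a method's raw CIL bytes as hex.

```csharp
using System;
using System.Linq;
using System.Reflection;

static class IlBytesDemo
{
    // A tiny, hypothetical method whose compiled CIL we will inspect.
    public static int AddTen(int a)
    {
        int b = 10;
        return a + b;
    }

    // Returns the raw CIL bytes the compiler emitted for a public static method of this class.
    public static byte[] GetIlBytes(string methodName)
    {
        MethodInfo method = typeof(IlBytesDemo).GetMethod(methodName);
        return method.GetMethodBody().GetILAsByteArray();
    }

    public static void Main()
    {
        byte[] il = GetIlBytes(nameof(AddTen));
        // Print each byte as two hex digits, separated by spaces.
        Console.WriteLine(string.Join(" ", il.Select(x => x.ToString("X2"))));
    }
}
```

The exact bytes depend on your compiler settings (a debug build adds extra nop instructions, for instance), but you should be able to spot 58, the one-byte add opcode, and the last byte should be 2A, which is ret.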
We can make it more readable by disassembling it into its textual equivalent. Doing so with Mono is actually quite simple:
monodis Program.exe >Program.il
Here I redirect the output to a file called Program.il for convenience, as my editor has a plugin for syntax-highlighting CIL. For those reading without access to Mono, here's what I got when disassembling the above program:
.assembly extern mscorlib
{
  .ver 4:0:0:0
  .publickeytoken = (B7 7A 5C 56 19 34 E0 89 ) // .z\V.4..
}
.assembly 'Program'
{
  .custom instance void class [mscorlib]System.Runtime.CompilerServices.CompilationRelaxationsAttribute::'.ctor'(int32) = (01 00 08 00 00 00 00 00 ) // ........
  .custom instance void class [mscorlib]System.Runtime.CompilerServices.RuntimeCompatibilityAttribute::'.ctor'() = (
    01 00 01 00 54 02 16 57 72 61 70 4E 6F 6E 45 78 // ....T..WrapNonEx
    63 65 70 74 69 6F 6E 54 68 72 6F 77 73 01 ) // ceptionThrows.
  .custom instance void class [mscorlib]System.Diagnostics.DebuggableAttribute::'.ctor'(valuetype [mscorlib]System.Diagnostics.DebuggableAttribute/DebuggingModes) = (01 00 07 01 00 00 00 00 ) // ........
  .hash algorithm 0x00008004
  .ver 0:0:0:0
}
.module Program.exe // GUID = {D6162DAD-AD98-45B3-814F-C646C6DD7998}
.namespace SBRL.Demo.Disassembly
{
  .class private auto ansi beforefieldinit Program
    extends [mscorlib]System.Object
  {
    // method line 1
    .method public static hidebysig
           default void Main (string[] args) cil managed
    {
      // Method begins at RVA 0x2050
      .entrypoint
      // Code size 47 (0x2f)
      .maxstack 5
      .locals init (
        int32 V_0,
        int32 V_1)
      IL_0000: nop
      IL_0001: call string class [mscorlib]System.Console::ReadLine()
      IL_0006: call int32 int32::Parse(string)
      IL_000b: stloc.0
      IL_000c: ldc.i4.s 0x0a
      IL_000e: stloc.1
      IL_000f: ldstr "{0} + {1} = {2}"
      IL_0014: ldloc.0
      IL_0015: box [mscorlib]System.Int32
      IL_001a: ldloc.1
      IL_001b: box [mscorlib]System.Int32
      IL_0020: ldloc.0
      IL_0021: ldloc.1
      IL_0022: add
      IL_0023: box [mscorlib]System.Int32
      IL_0028: call void class [mscorlib]System.Console::WriteLine(string, object, object, object)
      IL_002d: nop
      IL_002e: ret
    } // end of method Program::Main
    // method line 2
    .method public hidebysig specialname rtspecialname
           instance default void '.ctor' () cil managed
    {
      // Method begins at RVA 0x208b
      // Code size 8 (0x8)
      .maxstack 8
      IL_0000: ldarg.0
      IL_0001: call instance void object::'.ctor'()
      IL_0006: nop
      IL_0007: ret
    } // end of method Program::.ctor
  } // end of class SBRL.Demo.Disassembly.Program
}
Very interesting. There are a few things of note here:
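- The metadata at the top of the CIL tells the execution environment a bunch of useful things about the assembly, such as the external assemblies it depends on (here mscorlib, along with its version and public key token), its own version number, and a bunch of other attributes. Further down, it also describes the classes contained within and the signatures of their methods.
- An extra .ctor method has been generated for us automatically. It's the class's default constructor, and it automagically calls the base constructor of the object class, since all classes are descended from object.
- The ints a and b are boxed before being passed to Console.WriteLine. In short, the overload being called, WriteLine(string, object, object, object), takes object parameters, so each int value has to be copied into an object on the heap first. Exactly what this involves and why is quite complicated, and best explained by this Stackoverflow answer.
- We can deduce that CIL is a stack-based language from the add instruction, as it takes no operands of its own: it pops its two inputs (pushed by the ldloc.0 and ldloc.1 just before it) off the evaluation stack and pushes the result back on.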
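The metadata isn't only visible through monodis. As a minimal sketch (the Sample and MetadataDemo classes are invented for illustration), C#'s reflection API can read the same information back out of a compiled assembly, and it reports the compiler-generated .ctor even though Sample's source doesn't declare a constructor.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical stand-in for the demo's Program class: no constructor is declared here.
class Sample
{
    public static void Greet() { Console.WriteLine("Hello!"); }
}

static class MetadataDemo
{
    // Returns the metadata names of a type's own constructors and methods.
    public static List<string> DescribeMembers(Type type)
    {
        const BindingFlags flags = BindingFlags.Public | BindingFlags.NonPublic |
                                   BindingFlags.Instance | BindingFlags.Static |
                                   BindingFlags.DeclaredOnly;
        var names = new List<string>();
        foreach (ConstructorInfo ctor in type.GetConstructors(flags))
            names.Add(ctor.Name);   // instance constructors are named ".ctor" in metadata
        foreach (MethodInfo method in type.GetMethods(flags))
            names.Add(method.Name);
        return names;
    }

    public static void Main()
    {
        // Lists ".ctor" (added by the compiler) followed by "Greet".
        foreach (string name in DescribeMembers(typeof(Sample)))
            Console.WriteLine(name);
    }
}
```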
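For a quick feel for what boxing means in practice, here's a small C# sketch (the BoxingDemo class and BoxThenChange method are made up for this example). Assigning an int to an object variable makes the compiler emit a box instruction, and the box holds its own copy of the value, so changing the original int afterwards leaves the boxed copy alone.

```csharp
using System;

static class BoxingDemo
{
    // Boxes an int, then changes the original, showing the box holds a separate copy.
    public static (int original, object boxed) BoxThenChange(int value)
    {
        object boxed = value; // the compiler emits a box instruction here
        value += 1;           // only the local int changes
        return (value, boxed);
    }

    public static void Main()
    {
        var (original, boxed) = BoxThenChange(10);
        // prints: 11 vs 10 (System.Int32)
        Console.WriteLine("{0} vs {1} ({2})", original, boxed, boxed.GetType());
    }
}
```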
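To make the stack-based idea concrete, here's a toy evaluator in C# for a handful of the instructions in Main above. It's only a sketch: the StackMachine class, the text-based instruction format and the fixed two local slots are simplifications of my own, not how Mono actually executes CIL (real CIL is binary and is normally JIT-compiled).

```csharp
using System;
using System.Collections.Generic;

static class StackMachine
{
    // Runs a tiny, text-based subset of CIL against an evaluation stack and two
    // local-variable slots, then returns the value left on top of the stack.
    // The toy treats `input` as argument 0, so ldarg.0 pushes it.
    public static int Run(string[] program, int input)
    {
        var stack = new Stack<int>();
        var locals = new int[2];
        foreach (string line in program)
        {
            string[] parts = line.Split(' ');
            switch (parts[0])
            {
                case "ldarg.0": stack.Push(input); break;
                // Operand written in hex, as monodis prints it (negative operands not handled).
                case "ldc.i4.s": stack.Push(Convert.ToInt32(parts[1], 16)); break;
                case "stloc.0": locals[0] = stack.Pop(); break;
                case "stloc.1": locals[1] = stack.Pop(); break;
                case "ldloc.0": stack.Push(locals[0]); break;
                case "ldloc.1": stack.Push(locals[1]); break;
                case "add":
                {
                    // add has no operands: both inputs come off the stack.
                    int right = stack.Pop();
                    int left = stack.Pop();
                    stack.Push(left + right);
                    break;
                }
                default:
                    throw new NotSupportedException("Unsupported instruction: " + parts[0]);
            }
        }
        return stack.Peek();
    }

    public static void Main()
    {
        // The arithmetic part of Main from the listing, with ldarg.0 standing in
        // for the result of int.Parse(Console.ReadLine()).
        string[] program =
        {
            "ldarg.0", "stloc.0",
            "ldc.i4.s 0x0a", "stloc.1",
            "ldloc.0", "ldloc.1", "add",
        };
        Console.WriteLine(Run(program, 5)); // prints 15
    }
}
```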
I'd recommend that you explore this on your own with your own test programs. Try changing things and see what happens!
- Try making the Program class static
- Try refactoring the int.Parse(Console.ReadLine()) call into its own method. How is the result returned?
This isn't all, though. We can also recompile the CIL back into an assembly with the ilasm command:
ilasm Program.il
This makes for some additional fun experiments:
- See if you can find where b's value is defined, and change it
- What happens if you alter the Console.WriteLine() format string so that it becomes invalid?
- Can you get ilasm to reassemble an executable into a .dll library file?
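One thing to watch out for in the first experiment: ldc.i4.s only takes a one-byte signed operand, so it can only load values from -128 to 127. Larger constants need the four-byte ldc.i4 form, while -1 and 0 to 8 have dedicated short forms such as ldc.i4.m1 and ldc.i4.3. Here's a hedged C# sketch of that choice; ChooseLoadInstruction is a made-up helper rather than part of any tool, though as far as I can tell it matches the compiler's pick of ldc.i4.s for the 10 in the listing.

```csharp
using System;

static class ConstantLoader
{
    // Picks the shortest instruction from the ldc.i4 family that can push the
    // given 32-bit constant onto the evaluation stack (operand not included).
    public static string ChooseLoadInstruction(int value)
    {
        if (value == -1)
            return "ldc.i4.m1";                  // dedicated form, no operand
        if (value >= 0 && value <= 8)
            return "ldc.i4." + value;            // ldc.i4.0 .. ldc.i4.8, no operand
        if (value >= sbyte.MinValue && value <= sbyte.MaxValue)
            return "ldc.i4.s";                   // 1-byte signed operand
        return "ldc.i4";                         // 4-byte operand
    }

    public static void Main()
    {
        foreach (int b in new[] { 5, 10, -1, 100, 200 })
            Console.WriteLine("{0,4} -> {1}", b, ChooseLoadInstruction(b));
    }
}
```

So if you try setting b to, say, 200, you'd want to change the instruction itself to ldc.i4 rather than only editing the operand.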
Found this interesting? Discovered something cool? Comment below!