Starbeamrainbowlabs


Flexible Bison: Compiler Theory

One of the modules I've picked to do in the first semester of my third year at university is Languages and their Compilers. Naturally, this entails building a compiler to compile a program that's written in a source language (spec provided, thankfully! :D) into plain old ANSI C.

The tools we're going to be using for this and the steps involved in actually compiling something into another language are somewhat complicated, and I'm having a bit of difficulty getting my head around the different steps a compiler goes through and how these steps relate to the tools we're going to be using. This blog post is my attempt to make sense of what I've learnt so far.

Firstly, let me introduce the tools I'll be using: GNU flex and GNU bison. Apparently they have a much shallower learning curve than other tools out there. At first, this doesn't appear to be the case - but the more I think about it the more I realise that this is true.

Flex, as far as I can tell, is a regular-expression based scanner generator. You give it a set of regular expressions, and it generates a C function (yylex()) that breaks the input down into a series of tokens: each time it's called, it finds and returns the next token from the source.
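
To get my head around the "call it and get the next token" idea, here's a rough hand-written sketch in C. It is not what flex generates (flex builds its scanner from regular expression rules), and the token kinds, the Scanner struct and next_token() are all names I've made up for a toy language, not anything from the module's spec.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds for a toy language (an assumption for this sketch). */
typedef enum {
    TOK_EOF, TOK_NUMBER, TOK_IDENT, TOK_PLUS, TOK_STAR, TOK_UNKNOWN
} TokenKind;

typedef struct {
    TokenKind kind;
    const char *start; /* first character of the token's text in the source */
    size_t length;     /* number of characters in the token's text */
} Token;

/* Scanner state: just a cursor into a NUL-terminated source string. */
typedef struct {
    const char *cursor;
} Scanner;

/* Find and return the next token, moving the cursor past it. */
Token next_token(Scanner *s) {
    assert(s != NULL && s->cursor != NULL);

    /* Whitespace separates tokens but is not a token itself. */
    while (isspace((unsigned char)*s->cursor)) s->cursor++;

    Token t = { TOK_UNKNOWN, s->cursor, 0 };
    unsigned char c = (unsigned char)*s->cursor;

    if (c == '\0') {                        /* end of input: cursor stays put */
        t.kind = TOK_EOF;
        return t;
    }

    if (isdigit(c)) {                       /* roughly the regex [0-9]+ */
        while (isdigit((unsigned char)*s->cursor)) s->cursor++;
        t.kind = TOK_NUMBER;
    } else if (isalpha(c) || c == '_') {    /* roughly [A-Za-z_][A-Za-z0-9_]* */
        while (isalnum((unsigned char)*s->cursor) || *s->cursor == '_') s->cursor++;
        t.kind = TOK_IDENT;
    } else {                                /* single-character tokens */
        s->cursor++;
        if (c == '+') t.kind = TOK_PLUS;
        else if (c == '*') t.kind = TOK_STAR;
    }

    t.length = (size_t)(s->cursor - t.start);
    return t;
}
```

To use it, point a Scanner at a source string and keep calling next_token() until it returns a TOK_EOF token. As I understand it, that's the same shape of loop a parser runs when it repeatedly asks the scanner for the next token.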

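Bison generates a parser from a set of grammar rules. That parser takes the tokenised output from flex and, through the actions attached to each rule, can construct a parse tree. This parse tree can then be optimised with redundant nodes removed, loops optimised, and other such tweaks, though as far as I can tell that part is down to our own code rather than bison itself. Finally, the optimised tree is used to generate the output code.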
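
Here's a small sketch in C of what "constructing a parse tree" can look like. Big caveat: this is a hand-written recursive descent parser, which is a different technique from the table-driven parser bison generates. It's only here to show the tree being built. The Node struct, the toy grammar (whole numbers with + and *), and every function name are my own assumptions. To keep it self-contained it reads characters directly instead of going through a separate tokeniser.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

typedef enum { NODE_NUM, NODE_ADD, NODE_MUL } NodeKind;

typedef struct Node {
    NodeKind kind;
    int value;                 /* only meaningful for NODE_NUM */
    struct Node *left, *right; /* only meaningful for NODE_ADD / NODE_MUL */
} Node;

static Node *new_node(NodeKind kind, int value, Node *left, Node *right) {
    Node *n = malloc(sizeof *n);
    assert(n != NULL); /* sketch only: real code should handle allocation failure */
    n->kind = kind;
    n->value = value;
    n->left = left;
    n->right = right;
    return n;
}

void free_tree(Node *n) {
    if (n == NULL) return;
    free_tree(n->left);
    free_tree(n->right);
    free(n);
}

static void skip_spaces(const char **p) {
    while (isspace((unsigned char)**p)) (*p)++;
}

/* factor : NUMBER */
static Node *parse_factor(const char **p) {
    skip_spaces(p);
    /* sketch only: a real parser would report a syntax error here */
    assert(isdigit((unsigned char)**p));
    int value = 0;
    while (isdigit((unsigned char)**p)) {
        value = value * 10 + (**p - '0');
        (*p)++;
    }
    return new_node(NODE_NUM, value, NULL, NULL);
}

/* term : factor ('*' factor)*   -- '*' binds tighter than '+' */
static Node *parse_term(const char **p) {
    Node *left = parse_factor(p);
    for (;;) {
        skip_spaces(p);
        if (**p != '*') return left;
        (*p)++;
        left = new_node(NODE_MUL, 0, left, parse_factor(p));
    }
}

/* expr : term ('+' term)* */
Node *parse_expr(const char **p) {
    Node *left = parse_term(p);
    for (;;) {
        skip_spaces(p);
        if (**p != '+') return left;
        (*p)++;
        left = new_node(NODE_ADD, 0, left, parse_term(p));
    }
}
```

Calling parse_expr() with a pointer to the string "1 + 2 * 3" should give a NODE_ADD at the root, with the number 1 on the left and a NODE_MUL of 2 and 3 on the right. The operator precedence ends up encoded in the shape of the tree. Pass the result to free_tree() when you're done with it.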

With the cast introduced, I can get to the stages of a compiler:

  1. Lexical Analysis - Tokenisation
  2. Syntactical Analysis - Conversion of the token stream into a parse tree
  3. Semantic Analysis - Correction of the tree - e.g. automatic type conversion
  4. Intermediate Code Generation - Sometimes the compiler outputs an intermediate representation, such as three-address code stored as a list of tuples. This was needed in older computers that couldn't hold all the steps of a compiler in memory at once! In my case, I'll be outputting the tree that comes out of step 3 I guess - but not to disk, as today we can have all the passes of the compiler in memory at the same time :D
  5. Optimisation - Redundant parts of the parse tree are removed etc. - loops in particular get a lot of attention
  6. Code Generation - The output code in the target language is generated here - whether that be in C (very common), Assembly, or another language.
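
To convince myself that the last two stages really can be plain C working on a tree, here's a minimal sketch of one optimisation (constant folding, where an operation on two known numbers is replaced by its result) followed by code generation into a C expression. The Node struct, make_node(), fold_constants() and emit_c() are all hypothetical names for a toy expression tree, not anything from the module or from bison.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef enum { NODE_NUM, NODE_VAR, NODE_ADD, NODE_MUL } NodeKind;

typedef struct Node {
    NodeKind kind;
    int value;                 /* NODE_NUM only */
    const char *name;          /* NODE_VAR only (not owned by the node) */
    struct Node *left, *right; /* NODE_ADD / NODE_MUL only */
} Node;

Node *make_node(NodeKind kind, int value, const char *name, Node *left, Node *right) {
    Node *n = malloc(sizeof *n);
    assert(n != NULL); /* sketch only: real code should handle allocation failure */
    n->kind = kind;
    n->value = value;
    n->name = name;
    n->left = left;
    n->right = right;
    return n;
}

/* Optimisation: fold operations whose operands are both constants.
   Works bottom-up, so 1 + 2 * 3 folds all the way down to 7.
   Sketch only: integer overflow is not checked. */
Node *fold_constants(Node *n) {
    if (n->kind != NODE_ADD && n->kind != NODE_MUL) return n;
    n->left = fold_constants(n->left);
    n->right = fold_constants(n->right);
    if (n->left->kind == NODE_NUM && n->right->kind == NODE_NUM) {
        int a = n->left->value, b = n->right->value;
        int result = (n->kind == NODE_ADD) ? a + b : a * b;
        free(n->left);
        free(n->right);
        n->kind = NODE_NUM;
        n->value = result;
        n->left = n->right = NULL;
    }
    return n;
}

/* Append text to the NUL-terminated buffer out without overflowing cap bytes. */
static void append(char *out, size_t cap, const char *text) {
    size_t used = strlen(out);
    if (used + 1 < cap) strncat(out, text, cap - used - 1);
}

/* Code generation: append the tree to out as a fully parenthesised C expression.
   out must already hold a valid (possibly empty) string. */
void emit_c(const Node *n, char *out, size_t cap) {
    char number[32];
    switch (n->kind) {
    case NODE_NUM:
        snprintf(number, sizeof number, "%d", n->value);
        append(out, cap, number);
        break;
    case NODE_VAR:
        append(out, cap, n->name);
        break;
    case NODE_ADD:
    case NODE_MUL:
        append(out, cap, "(");
        emit_c(n->left, out, cap);
        append(out, cap, n->kind == NODE_ADD ? " + " : " * ");
        emit_c(n->right, out, cap);
        append(out, cap, ")");
        break;
    }
}
```

For a tree representing x + 2 * 3, fold_constants() collapses the multiplication into a single number node, and emit_c() then writes (x + 6) into a buffer that starts out as an empty string. Wrapping every operation in brackets is a lazy way of not having to think about operator precedence in the generated C.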

This seems somewhat familiar. The Lexical Analysis phase seems to be rather similar to what flex is designed for, and the Syntactical Analysis stage appears to be what bison does. As for the other stages, I'm not really sure. I'm guessing that it'll become clear later as we build this compiler in stages - but I suspect that we'll be writing them in plain C - unless I've missed something about bison :P

If you've made it this far, thanks for reading! If this feels somewhat disorganised - then it probably is - after all, this is mainly to get it all straight in my own head :P

If you've got any questions, please ask away in the comments below :-)

