Starbeamrainbowlabs

Stardust
Blog

Flexible Bison: Compiler Theory

One of the modules I've picked to do in my first semester of my third year at university is Lanuguages and their Compilers. Naturally, this entails building a compiler to compile a program that's written in a source language (spec provided, thankfully! :D) into plain old ANSI C.

The tools we're going to be using for this and the steps involved in actually compiling something into another language are somewhat complicated, and I'm having a bit of difficulty getting my head around the different steps a compiler goes through and how these steps relate to the tools we're going to be using. This blog post is my attempt to make sense of what I've learnt so far.

Firstly, let me introduce the tools I'll be using: GNU flex and GNU bison. Apparently they have a much shallower learning curve than other tools out there. At first, this doesn't appear to be the case - but the more I think about it the more I realise that this is true.

Flex, as far as I can tell, is a regular-expression based scanning tokeniser. In other words, it breaks down an input string into a series of tokens. It has a method that, when called, finds and returns the next token from the source string.

Bison uses tokenised output from flex to construct a parse tree. This parse tree is then optimised with redundant nodes removed, loops optimised, and other such tweaks. Finally, this optimised tree is then used to generate the output code.

With the cast introduced, I can get to the stages of a compiler:

  1. Lexical Analysis - Tokenisation
  2. Syntactical Analysis - Conversion of the token stream into a parse tree
  3. Semantic Analysis - Correction of the tree - e.g. automatic type conversion
  4. Intermediate Code Generation - Sometimes the compiler outputs sets of 3 values in a list of tuples. This was needed in older computers that couldn't hold all the steps of a compiler in memory at once! In my case, I'll be outputting the parse tree generated in step 3 I guess - but not to disk, as today we can have all the passes of the compiler in memory at the same time :D
  5. Optimisation - Redundant parts of the parse tree are removed etc. - loops are focused in particular
  6. Code Generation - The output code in the target language is generated here - whether that be in C (very common), Assembly, or another language.

This seems somewhat familiar. The Lexical Analysis phase seems to be rather similar to what flex is designed for, and the Semantic Analysis stage appears to what bison does. As for the other stages, I'm not really sure. I'm guessing that it'll become clear later as we build this compiler in stages - but I'm suspecting that we'll be writing them in plain C - unless I've missed something about bison :P

If you've made it this far, thanks for reading! If this feels somewhat disorganised - then it probably is - after all, this is mainly to get it all straight in my own head :P

If you've got any questions, please ask away in the comments below :-)

Tag Cloud

3d account algorithms announcement archives arduino artificial intelligence artix assembly async audio bash batch blog bookmarklet booting c sharp c++ challenge chrome os code codepen coding conundrums coding conundrums evolved command line compilers compiling css dailyprogrammer debugging demystification distributed computing downtime electronics email embedded systems encryption es6 features event experiment external first impressions future game github github gist graphics hardware hardware meetup holiday html html5 html5 canvas infrastructure interfaces internet io.js jabber javascript js bin labs learning library linux low level lua maintenance manjaro network networking node.js operating systems performance photos php pixelbot portable privacy programming problems project projects prolog protocol protocols pseudo 3d python reddit reference release releases resource review rust secrets security series list server software sorting source code control statistics svg technical terminal textures three thing game three.js tool tutorial tutorials twitter ubuntu university update updates upgrade version control virtualisation visual web website windows windows 10 xmpp xslt

Archive

Art by Mythdael