Starbeamrainbowlabs

Stardust
Blog


Archive

Mailing List Articles Atom Feed Comments Atom Feed Twitter Reddit Facebook

Tag Cloud

3d account algorithms android announcement architecture archives arduino artificial intelligence artix assembly async audio bash batch blog bookmarklet booting c sharp c++ challenge chrome os code codepen coding conundrums coding conundrums evolved command line compilers compiling compression css dailyprogrammer debugging demystification distributed computing documentation downtime electronics email embedded systems encryption es6 features event experiment external first impressions future game github github gist graphics hardware hardware meetup holiday holidays html html5 html5 canvas infrastructure interfaces internet io.js jabber javascript js bin labs learning library linux low level lua maintenance manjaro network networking node.js operating systems performance photos php pixelbot portable privacy programming problems project projects prolog protocol protocols pseudo 3d python reddit reference release releases resource review rust secrets security series list server software sorting source code control statistics storage svg technical terminal textures three thing game three.js tool tutorial tutorials twitter ubuntu university update updates upgrade version control virtual reality virtualisation visual web website windows windows 10 xmpp xslt

Android app architecture: First steps and impressions

The android logo.

This post, obviously, is not endorsed by Google or the Android Open-Source Project at all in any way. It's just my attempt to consolidate what I've learnt about it so far.

I've been learning about Android development at University recently - this post is my attempt to consolidate what I've learnt. I'm by no means as confused as I have been in the past at similar stages of a module (take AI and compilers for example, though later on I figured it out). If you notice a mistake, please do let me know by posting a comment below, and I'll correct it.

Note that this post isn't meant to be a complete tutorial on the subject - just to consolidate what you've already learnt (or guide your learning if you're just starting out). I'd recommend taking a course at University, or reading an tutorial on the web on the subject.

Android apps, unlike a regular C or C# program, are made up of one or more activities. They don't have any particular entry point, such as the main method of a C or C# program - instead an activity is selected as the one that should be launched when the user taps the icon on their home screen. Other entry point to the app are possible too - for example services (persistent background processes) and scheduled tasks (broadcast receivers I think). Other apps can even launch your app's activities themselves!

An activity is like a single screen - it's job is to perform a single, focused task. For example a contact list, or a contact details screen.

When an app is launched, a new 'back stack' is created - into which new activities are inserted. It's this mechanism that makes the back button go back to the contacts list from the contact details screen when you press the back button on your phone. Activities can choose to launch an activity in a ne back stack if they want - though I think this only implies to implicit intents (more on these later) that target activities in other apps.

Intents are used to instruct Android as to which child activity a parent activity would like to launch, and to carry data (serialised to a string) around between activities. There are two kinds: implicit and explicit.

Explicit intents are useful when you know the exact name of the intent that you want to launch (their names are like C♯ namespaces I believe). They are most useful when you want to launch another activity that's part of your app (or an extension or your app I suppose).

Implicit intents are used when you know what kind of app you want to launch, but not what it's called. Examples of this include selecting a contact from the address book, opening a URL in a web browser, and pre-filling an email or text message for the user to send.

Fairly simple, right? Unfortunately, this is complicated by 2 things: a large number of Android versions (or API Versions) in use currently (I think Google are working on this long-term), and fragments.

Fragments are like mini-activities. Multiple fragments can be displayed at once, and can be detached / reattached to and from the screen by way of the fragment manager. This is most useful for tablets and other devices with larger screens - An activity can dynamically fetch multiple fragments and display them at the same time. Sticking with the address book / contacts theme, on a tablet one might have the contact list down the left-hand-side of the screen, and the details of the currently selected contact down the right-hand-side of the screen.

The activity is responsible for shuffling messages around between fragments (or other activities) - fragments should be completely self-contained and shouldn't be aware of other fragments that may or may not be displayed at the same time as it is.

From what I can tell, the Android ecosystem has plenty of structure to it. It's (in theory) easy to put together an app even if you haven't written any Java (I can see how Java is said to be C♯'s predecessor) before - especially with the assistance that Android Studio provides, though it does feel somewhat heavy-handed and opinionated at times. I suppose any sufficiently advanced IDE carries considerable risk of being opinionated.

I anticipate, going forwards, that the real problems will start to occur when I start considering compatibility between different Android API versions. Thankfully, I've got experience dealing with web browser compatibility issues, so I'm hoping that Android won't be much more problematic than that - especially since everything appears to be well-documented as to which API versions they were introduced / deprecated in.

Compilers 101: Build your own flex + bison compiler in a few easy(?) steps

So you want to build your own compiler? Great! Don't know where to start? This guide should help! At University, we're building our own compiler for a custom programming language invented by our lecturer with a pair of GNU tools by the name of flex and bison - which I've blogged about before. Since that post, I've learnt a ton about how the whole process works, so I thought I'd write up a more coherent blog post on the subject :P

A diagram explaining the process of building a compiler. Explained below.

Stage 1: Planning

The whole process starts with railroad diagrams (also known as flowcharts) of the language you want to write a compiler for. Having an accurate set of railroad diagrams is essential to understanding precisely how the language is put together, which is rather useful for the next step.

Converting the railroad diagrams into plain BNF (Backus Naur Form). Unfortunately, bison doesn't support EBNF-like notation at the current time, so only plain-old BNF will do.

Stage 2: Lexing

With your railroad diagrams converted into BNF, you can start writing code! The first chunk of code that needs writing is the lexer. Lexing is what flex is good at - and involves converting the input source code into lexemes - discrete sequences of characters that match a particular pattern and can be assigned a particular category name, turning it into a token. Perhaps an example would help. Consider the following:

void do_awesome_stuff(int a, string b) {
    /* Code here */
}

The above can be turned into a sequence of tokens, not unlike the following (ignoring whitespace tokens, of course):

TYPE: void
IDENTIFIER: do_awesome_stuff
OPEN_BRACKET: (
TYPE: int
IDENTIFIER: a,
COMMA: ,
TYPE: string
IDENTIFIER: b
CLOSE_BRACKET: )
OPEN_BRACE: {
COMMENT: /* Code here */*
CLOSE_BRACE: }

See? We can identify 8 token types in the source string: TYPE, IDENTIFIER, COMMA, OPEN_BRACKET, CLOSE_BRACKET, OPEN_BRACE, COMMENT, and CLOSE_BRACE. These types and the rules to match them can be found by analysing a combination of the railroad diagrams and the BNF you created earlier.

Stage 3: Parser the first

With a lexer in hand, we can now look at writing the parser. This is done in two stages. The parser itself, and upgrading said parser to generate a parse tree.

Let's talk about the parser first. The parser can be largely created simply by running a few regular-expression find and replace rules on your BNF, actually. From there, it's just a case of adding the header and the footer to complete the document.

Let's take a look at some example BNF:

<instructions> ::= START <lines> END

<lines> ::= <lines> <line>
    | <line>

<line> ::= <command>

<command> ::= <cmd_name> <number>

<cmd_name> ::= FD
    | BK
    | LT
    | RT

<number> ::= <number> <digit>
    | <digit>

<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

The above matches something like the following:

START
FD 100
RT 180
FD 125
LT 90
BK 50
END

Very interesting (a virtual cookie is available for anyone who gets the reference as to what this grammar represents!). Let's look at converting that into something bison can understand:

instructions : START lines END
    ;

lines : lines line
    | line
    ;

line : command
    ;

command : cmd_name number
    ;

cmd_name : FD
    | BK
    | LT
    | RT
    ;

number : number digit
    | digit
    ;

digit : 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

That's looking much better already! Simply by using the regular expression substitutions:

  1. <([a-z_]+)> -> $1
  2. ::= -> :
  3. \n\n -> \n\t;\n\n

....we can get most of the way there to something that bison can understand. Next, we need to refactor it a bit to tell it which tokens are coming from the lexer (which I'll leave up to you to write as an exercise), so it doesn't get them confused with the rules - which are defined in the BNF(-like) rules.

Let's get rid of the rule for number and digit first, since we can do those in the lexer quite easily. Next, let's add a %token definition to the top to tell it which are coming from the lexer. It's good practice to define everything that comes from the lexer in uppercase, and everything that's a rule that exists only in bison in lowercase:

%start instructions
%token FD BK LT RT NUMBER START END

We've also defined a start symbol - the one which when bison reaches it, it knows that it's completed the parsing process, as bison is a bottom-up parser.

Lastly, we need to reference the lexer itself. Thankfully that's easy too by appending to your new bison file:

%%

#include "lex.yy.c"

Very nice. Don't forget about the new line at the end of the file - flex and bison will complain if it isn't present! Here's the completed bison file:

%start instructions
%token FD BK LT RT NUMBER START END

instructions : START lines END
    ;

lines : lines line
    | line
    ;

line : command
    ;

command : cmd_name NUMBER
    ;

cmd_name : FD
    | BK
    | LT
    | RT
    ;

%%

#include "lex.yy.c"

With a brand-new bison file completed, there's just one component of the parser left - a plain-old C file that calls it. Let's create one of those quickly:


#include 

int yyparse(void);

int main(void)
{       
    # if YYDEBUG == 1
    extern int yydebug;
    yydebug = 1;
    #endif

    return yyparse();
}

void yyerror(char *error_message)
{
    fprintf(stderr, "Error: %s\nExiting\n", error_message);
}

The highlighted lines enable a special debugging mode built-in to bison if the standard compile-time symbol YYDEBUG is specified - and bison is run with a few special parameters. Here's the sequence of commands needed to compile this:

flex lexer.l
bison -tv parser.y
gcc -Wall -Wextra -g parser.tab.c main.c -lfl -ly -DYYDEBUG -D_XOPEN_SOURCE=700

The gcc command is probably a bit long-winded, but it does several useful things for us:

  • Shows additional warnings just in case we've made a mistake that might be an issue later (-Wall -Wextra)
  • Include additional debugging information in the output file to allow debugging with gdb (the GNU Debugger) if necessary (-g)
  • Fix strange errors on some systems (-D_XOPEN_SOURCE=700)

If you're on a Windows system, you may need to remove the -ly - which appears to be required on the Linux machines I use - it tells gcc that we'll be referencing the bison library.

Stage 4: Parser again

Congratulations on getting this far! You've now got a lexer and a parser - so it's time to put them to use. This is done by utilising the parser to build a parse tree - a tree of nodes that represent the input. Here's an example tree:

An example parse tree.

As you can see, each high-level node references one or more lower-level nodes, and the structure of the tree represents the first 2 lines of the example input above. The nodes in yellow are lexical tokens that come directly from flex - these are called terminals, or leaf nodes. The ones in purple come from the bison rules (which we derived from the BNF we wrote at the beginning of this post) - and are non-terminals, or tree nodes.

With this in mind, let's introduce another feature or two of bison. Firstly, let's take a look at revising that %token declaration we created above:

%token FD BK LT RT START END
%token<val_num> NUMBER

The important bit here is the <val_num>. Here we tell bison that a value should be attached to the token NUMBER - and that it will be of type int. After telling bison that it should expect a value, we need to give it a place to put it. Let's write some more code to go just below the %token declarations:

%union {
    int val_num;
}

There we go! Excellent - we've got a place to put it. Now we just need to alter the lexer to convert the token value to an int and put it there. That's not too tough, thankfully - but if you're having trouble with it, here's a hint:

{number}        { yylval.val_num = atoi(yytext); return(NUMBER); }

Now we have it passing numbers correctly, let's look briefly at generating that parse tree. I'm not going to give the game away - just a few helpful hints as to what you need to do here - otherwise it's not as fun :P

Generating the parse tree can be considered the both the most challenging part of the experience (especially if you don't know what you're doing) and the easiest to deal with at same time. Knowing your stuff and your end goal before you start makes the whole process a lot easier.

The first major step is to create a struct that can represent a type of node in your parse tree. It might be useful to store several properties here - such as the node type (An enum will come in handy here), a spot for the value of a lexical token (or a reference to it in a symbol table if you have one), and references to child nodes in the parse tree.

The second major step of the process is to create a utility method that creates a new node of the tree on the heap, and then revise the bison file to get each rule to create new nodes on the tree in such a way that it creates a parse tree when it reaches start node (or top node of the tree - which, in the case of the above, is instructions). For the purposes of this post, I'll be using a method with this prototype:

TreeNode create_node(int item, int node_type, TreeNode left, TreeNode right);

Your tree node struct and subsequent creation method may vary. With this in hand, we can revise the bison rules we created above to create these nodes we've been talking about. Here's a quick pointer on how to revise the rule for command above:

command : cmd_name NUMBER   { $$ = create_node($2, NODE_COMMAND, $1, NULL); }
    ;

This might look a bit strange, but let's break it down. The bit in curly braces is some (almost) plain C code that creates the node and returns a pointer to it to bison. The $$ is the return value for that node - which, I might add reminds me of something I forgot above. We need to tell bison about our new tree node data type and which rules should return it:

%type<val_tnode> instructions lines line command cmd_name

/* And in %union { ... } ..... */
TreeNode val_tnode;

This is almost the same as the %token<val_num> we did before, but we're defining the return value of a rule this type - not a token. With that little interlude out of the way, let's return to the code above. $1 and $2 refer to the first and second items in the rule definition respectively - and hold the type that we defined above in the %token and %type directives. Since bison is a bottom-up parser - this means that by the time this code executes, all it's child nodes have (hopefully) been created - and we just have to tie them all up together with a new node. In the case of my little example above, $1 is of type TreeNode, and $2 is of type int (that is if I didn't make any mistakes further up!).

Stage 5: Blasting off to code generation and beyond

Phew! That's a lot of work. If you've read this far, thank you and well done! It's been a long journey for both you the reader and me the writer, but you're almost done.

While conceptually simple when broken down, the whole process actually gets rather involved - especially when writing the BNF and the parser (the latter of which can be a particular pain due to shift/reduce and reduce/reduce errors), and the amount of code and head scratching you've got to do to get to this point is enormous. My best advice is to take it slow and don't rush - you'll only cause most problems for yourself if you try and jump the gun. Make sure that each stage works as you intend before you continue - back-pedalling to fix an issue can be particularly annoying as it can be bothersome to work out which stage the bug is actually in.

The last step of the whole process is to actually do something with the parse tree we've worked so hard to create. Thankfully, that's not too difficult - as we can put some additional code in the { } block of the starting symbol to call methods that will do things like perform some optimisation, print the tree to the console - or generate some sweet code. While the actual generation of code is beyond the scope of this article, I may end up posting about some optimisation techniques you can use on a parse tree after I've finished fiddling with float handling, symbol tables, and initial code generation in my ACW (Assessed Course Work).

Found this useful? Found a bug in this post? Got a suggestion? Comment below! Since I don't have any real analytics on this blog besides the server logs, I've no idea if you've read it really unless you comment :P

Sources and Further Reading

Art by Mythdael