Archive

## Tag Cloud

3d account algorithms android announcement architecture archives arduino artificial intelligence artix assembly async audio bash batch blog bookmarklet booting c sharp c++ challenge chrome os code codepen coding conundrums coding conundrums evolved command line compilers compiling compression css dailyprogrammer debugging demystification distributed computing documentation downtime electronics email embedded systems encryption es6 features event experiment external first impressions future game github github gist graphics hardware hardware meetup holiday holidays html html5 html5 canvas infrastructure interfaces internet io.js jabber javascript js bin labs learning library linux low level lua maintenance manjaro network networking node.js operating systems performance photos php pixelbot portable privacy problem solving programming problems project projects prolog protocol protocols pseudo 3d python reddit reference release releases resource review rust secrets security series list server software sorting source code control statistics storage svg technical terminal textures three thing game three.js tool tutorial tutorials twitter ubuntu university update updates upgrade version control virtual reality virtualisation visual web website windows windows 10 xmpp xslt

## Shift-Reduce Parser Part 2: Building Furniture (1)

Hello and welcome! I got a bit distracted by other things as you can tell, but I'm back with part 2 of my series on building a shift-reduce parser. If you're not sure what I'm talking about, then I'd advise reading part 1 first and then coming back here. It might be a good idea to re-read it anyway, juts to refresh your memory :-)

Last time, we created some data classes to store the various rules and tokens that we'll be generating. Today, we're going to build on that and start turning a set of rules into a parse table. Let's introduce the rules we'll working with:

<start> ::= <expression>

<expression> ::= <expression> PLUS <value>
| <term>

<term> ::= <term> DIVIDE <value>
| <value>

<value> ::= <number>
| BRACKET_OPEN <expression> BRACKET_CLOSE

<number> ::= DIGIT
| <number> DIGIT

The above represents a very basic calculator-style syntax, which only supports adding and dividing. It's written in Backus-Naur Form, which is basically a standardised way of writing parsing rules.

To build a parse table, we first must understand what such a thing actually is. Let's take a look at an example:

state action goto
* + 0 1 $E B 0 s1 s2 3 4 1 r4 r4 r4 r4 r4 2 r5 r5 r5 r5 r5 3 s5 s6 goal 4 r3 r3 r3 r3 r3 5 s1 s2 7 6 s1 s2 8 7 r1 r1 r1 r1 r1 8 r2 r2 r2 r2 r2 _(Source: Adapted from the LR Parser on Wikipedia.)_ While it looks complex at first, let's break it down. There are 3 parts to this table: The state, the action, and the goto. The action and goto represent What should happen if a particular token is encountered. In this case, the input stream contains both terminal (i.e. DIGIT, DIVIDE, BRACKET_CLOSE, etc. in the case of our BNF above) and non-terminal (i.e. number, term, expression, etc. in the case of our BNF above) symbols - if understand it correctly, so there are actually 2 parts to the table here to make sure that both are handled correctly. We'll be connecting this to our lexer, which outputs only terminal symbols, so we should be ok I believe (if you know better, please post a comment below!). The state refers to the state in the table. As I've mentioned before, a given state may contain one or more configurations. It's these configurations that give rise to the actions in the table above, such as s2 (shift and then go to state 2) or r3 (reduce and jump to state 3). To use the table, the parser must know what state it's in, and then take a look across the top row for the next symbol it has in the token stream. Once found, it can follow it down to figure out what action it should take, as explained above. If there isn't an action in the box, then there must be an error in the input, as the table doesn't tell us what to do in this situation. To that end, we should try and generate a meaningful error message to help the user to find the mistake in the input (or the developer in the parser!). We're kind of getting ahead of ourselves here though. We need to build this table first, and to do that, we need to figure out which configurations go in which state. And, going down the rabbit hole, we need to know what a configuration is. Again, it's best if I demonstrate. Consider the following parsing rule from our example BNF at the beginning of this post: <value> ::= BRACKET_OPEN <expression> BRACKET_CLOSE A single configuration represent a possible state of the parser at a particular instant in time. I could split that above rule up like so: <value> ::= BRACKET_OPEN * <expression> BRACKET_CLOSE <value> ::= BRACKET_OPEN <expression> * BRACKET_CLOSE <value> ::= BRACKET_OPEN <expression> BRACKET_CLOSE * The asterisk represent where the parser might have gotten up to. Everything to the left is on the stack of the parser, and everything to the right hasn't happened yet. Since this isn't a top-level rule (in our example that honour goes to the rule for the start non-terminal), the parser will never be in a position where the first item there doesn't exist yet on the stack, so we can ignore the configuration in which the asterisk would be to the left of BRACKET_OPEN. Confused? Let me try and help here. Let's draw a diagram of how our parser is going to operate: _(Source: Made by me, but adapted from the LR Parser article on Wikipedia)_ Basically, the parser will be taking in the input token stream and either shift a new terminal token onto the stack, or reduce one or more existing tokens on the stack into a single non-terminal token, which replaces those existing tokens on the stack. The configurations above represent possible states of the stack (the bit to the left of the asterisk), and possible directions that the parser could take when parsing (the bit to th right of the asterisk). Finally, when the goal is reached, the output is returned to the caller (which, by the time we're done, should be a parse tree). Said tree can then be optimised and processed for whatever purpose we desire! With this knowledge, we can deduce that we can build the entire table by recursing over the tree of rules from the start state. That way, we'll visit every rule that we'll need to parse everything required to reach the goal state by recursing over all the rules for all the non-terminals referenced by all the rules we visit. We could even generate a warning if we detect that some rules don't connect to this 'tree'. Here's a tree of our example ruleset from the beginning of this post: It's a bit spaghetti-ish, but it should be legible enough :P This gives us an idea as to how we're going to tackle this. Taking into account the data classes we created in the last post, we need to make sure we keep the following in mind: 1. Since the main ShiftReduceParser class is going to hold the rules, the ParseTable class will need a reference to its parent ShiftReduceParser in order to query the rules. 2. In light of this, the SHiftReduceParser should be responsible for satisfying any queries the ParseTable has about rules - the ParseTable should not have to go looking & filtering the rule list held by ShiftReduceParser itself. 3. ParseTable will need a recursive method that will take a single top-level rule and recurse over it and its child rules (according to the tree I've talked about above) 4. This method in ParseTale will need to be extremely careful it doesn't get stuck in a loop. To that end, it'll have to keep track of whether it's already processed a rule or not. 5. It'll probably also have to keep track of which configurations it has added to the table class structure we defined in the last post to avoid adding rules twice. 6. Once ParseTable has figured out all the configurations and grouped them all into the right states, it will then have to recurse over the generated table and fill in all the shift / reduce / goal action(s) - not forgetting about the links to the other states they should point to. It's quite the laundry list! Thankfully, most of this is quite simple if we tackle it one step at a time. The most annoying bit is the grouping of configurations into states. This is done by looking at the token immediately before the asterisk in each configuration - all the configurations with the same token here will get grouped into the same state (while there are more complex algorithms that allow for more complex grammars, we'll stick with this for now as anything else makes my head hurt! Maybe in the future I'll look as figuring out precisely what kind of LR-style parser this is, and upgrading it to be a canonical LR(1) parser - the most advanced type I know of). This is quite a lot to take in, so I think I'll leave this post here for you to digest - and we'll get to writing some code in the next one. Found this useful? Spotted a mistake? Having trouble getting your head around it? Post a comment below! ## Shift-reduce Parser Part 1: First Steps Now that I've done the Languages and Compilers module at University, it's opened my eyes to a much better and more extensible way of handling complex text in a way that can easily be read by any subsequent code I write. To that end, I've found that at least 3 different various projects of mine could benefit from the inclusion of a shift-reduce parser, but I haven't been able to track one down for C♯ yet. With this in mind, I thought to myself: "Why not build one myself?" Sure, my Lecturer didn't go into too many details as to how they work, but it can't be that difficult, can it? Oh, I had no idea..... In this mini-series, I'm going to take you through the process of building a shift-reduce parser of your very own. As I write this, I haven't actually finished mine yet - I've just got to the important milestone of building a parse table! Thankfully, that's going to be a few posts away, as there's a fair amount of ground to cover until we get to that point. Warning: This series is not for the faint of heart! It's rather complicated, and extremely abstract - making it difficult to get your head around. I've had great difficulty getting mine around it - and ended up writing it in multiple stages. If you want to follow along, be prepared for lots of research, theory, and preparation! Let's start out by taking a look at what a shift-reduce parser does. If you haven't already, I'd recommend reading my previous compilers 101 post, which explains how to write a compiler, and the different stages involved. I'd also recommend checking out my earlier post on building a lexer, as it ties in nicely with the shift-reduce parser that we'll be building. In short, a shift-reduce parser compiles a set of BNF-style rules into a Parse Table, which it then utilises as a sort of state-machine when parsing a stream on input tokens. We'll take a look at this table compilation process in a future blog post. In this post, let's set up some data structures to help us along when we get to the compilation process in the next blog post. Here's the class structure we'll be going for: Let's start with a class to represent a single token in a rule: public enum ParserTokenClass { Terminal, NonTerminal } public struct ParserToken { public readonly ParserTokenClass Class; public readonly string Type; public ParserToken(ParserTokenClass inTokenType, string inType) { Class = inTokenType; Type = inType; } public override bool Equals(object obj) { ParserToken otherTokenType = (ParserToken)obj; return Class == otherTokenType.Class && Type == otherTokenType.Type; } public override int GetHashCode() { return$"{Class}:{Type}".GetHashCode();
}

public override string ToString()
{
string terminalDisplay = Class == ParserTokenClass.Terminal ? "T" : "NT";
return $"[ParserToken {terminalDisplay}: {Type}]"; } public static ParserToken NonTerm(string inType) { return new ParserToken(ParserTokenClass.NonTerminal, inType); } public static ParserToken Term(string inType) { return new ParserToken(ParserTokenClass.Terminal, inType); } } Pretty simple! A token in a rule can either be a terminal (basically a token straight from the lexer), or a non-terminal (a token that the parser reduces a set of other tokens into), and has a type - which we represent as a string. Unfortunately due to the complex comparisons we'll be doing later, it's a huge hassle to use an enum with a template class as I did in the lexer I built that I linked to earlier. Later on (once we've built the parse table), we'll extend this class to support attaching values and other such pieces of information to it, but for now we'll leave that out to aid simplicity. I also override Equals() and GetHashCode() in order to make comparing tokens easier later on. Overriding ToString() makes the debugging process much easier later, as we'll see in the next post! With a class to represent a token, we also need one to represent a rule. Let's create one now: public class ParserRule { /// <summary> /// A function to call when a reduce operation utilises this rule. /// </summary> public Action MatchingAction; public ParserToken LeftSide; public ParserToken[] RightSideSequence; public ParserRule(Action inMatchingAction, ParserToken inLeftSide, params ParserToken[] inRightSideSequence) { if (inLeftSide.Class != ParserTokenClass.NonTerminal) throw new ArgumentException("Error: The left-hand side must be a non-terminal token type."); MatchingAction = inMatchingAction; LeftSide = inLeftSide; RightSideSequence = inRightSideSequence; } public bool RightSideSequenceMatches(IEnumerable<ParserToken> otherRhs) { int i = 0; foreach (ParserToken nextToken in otherRhs) { if (!nextToken.Equals(RightSideSequence[i])) return false; i++; } return true; } public override string ToString() { StringBuilder result = new StringBuilder(); result.Append($"ParserRule: {LeftSide} = ");
foreach (ParserToken nextToken in RightSideSequence)
result.Append($" {nextToken}"); result.Append(";"); return result.ToString(); } } The above represents a single parser rule, such as <number> ::= <digit> <number>. Here we have the token on the left-hand-side (which we make sure is a non-terminal), and an array of tokens (which can be either terminal or non-terminal) for the right-hand-side. We also have an Action (which is basically a lamba function) that we'll call when we match against the rule, so that we have a place to hook into when we write code that actually does the tree building (not to be confused with the shift-reduce parser itself). Here I also add a method that we'll need later, which compares an array of tokens against the current rule, to see if they match - and we override ToString() here again to aid debugging. Now that we can represent tokens and rules, we can start thinking about representing configurations and states. Not sure what these are? All will be explained in the next post, don't worry :-) For now, A state can be seen as a row in the parse table, and it contains a number of configurations - which are like routes to different other states that the parser decides between, depending where it's gotten to in the token stream. public enum ParseTableAction { Shift, Reduce, Goal, Error } public class ParseTableConfiguration { public readonly ParserRule Rule; public readonly int RhsPosition; public ParseTableAction LinkingAction = ParseTableAction.Error; public ParseTableState LinkingState = null; public ParserToken TokenAfterDot { get { return Rule.RightSideSequence[RhsPosition]; } } public ParserToken TokenBeforeDot { get { return Rule.RightSideSequence[RhsPosition - 1]; } } /// <summary> /// Whether this configuration is the last in the sequence of configurations for the specified rule or not. /// </summary> /// <value><c>true</c> if is last in rule; otherwise, <c>false</c>.</value> public bool IsLastInRule { get { return RhsPosition > Rule.RightSideSequence.Length - 1; } } public ParseTableConfiguration(ParserRule inRule, int inRhsPosition) { Rule = inRule; RhsPosition = inRhsPosition; } public IEnumerable<ParserToken> GetParsedRhs() { return Rule.RightSideSequence.TakeWhile((ParserToken token, int index) => index <= RhsPosition); } public bool MatchesRhsSequence(ParserRule otherRule) { int i = 0; foreach (ParserToken nextToken in otherRule.RightSideSequence) { if (i > RhsPosition) break; if (!nextToken.Equals(otherRule.RightSideSequence[i])) return false; i++; } return true; } public override bool Equals(object obj) { ParseTableConfiguration otherConfig = obj as ParseTableConfiguration; if (otherConfig == null) return false; return Rule == otherConfig.Rule && RhsPosition == otherConfig.RhsPosition; } public override int GetHashCode() { return$"{Rule}:{RhsPosition}".GetHashCode();
}

public override string ToString()
{
StringBuilder result = new StringBuilder();

result.Append($"Configuration: {LinkingAction} "); if (LinkingState != null) result.Append($"to State {LinkingState.Id} ");
result.Append($"{Rule.LeftSide} = "); for (int i = 0; i <= Rule.RightSideSequence.Length; i++) { if (i == RhsPosition) result.Append(" * "); if (i == Rule.RightSideSequence.Length) continue; result.Append($"{Rule.RightSideSequence[i]} ");
}
result.Append(";");
return result.ToString();
}
}

This class is slightly more complicated. First, we define an enum that holds information about what the parser should do if it chooses this configuration. Then, we declare the configuration class itself. This entails specifying which parse rule we're deriving the configuration from, and both which tokens in the right-hand-side of the rule should have been parsed already, and which should still be somewhere in the token stream. Again, I'll explain this in more detail in the next post!

Then, we declare a few utility methods and properties to fetch different parts of the configuration's rule, such as the token to the immediate left and right of the right-hand-side position (which was represented as a dot . in the book I followed), all the tokens before the dot ., and whether a given rule matches this configuration in the basis of everything before the dot ..

Finally, I continue with the trend of overriding the equality checking methods and ToString(), as it makes a world of difference in the code coming up in future blog posts!

Now that we've got a class for configurations, the last one on our list is one for the states themselves. Let's do that now:

public class ParseTableState
{

public int Id {
get {
return ParentTable.FindStateId(this);
}
}

public List<ParseTableConfiguration> Configurations = new List<ParseTableConfiguration>();

public ParseTableState(ParseTable inParentTable)
{
ParentTable = inParentTable;
}

public override string ToString()
{
StringBuilder result = new StringBuilder();
foreach(ParseTableConfiguration nextConfiguration in Configurations)
result.AppendLine($" - {nextConfiguration}"); return result.ToString(); } } Much simpler than the configuration rule class, right? :P As I mentioned earlier, all a state consists of is a list of configurations in that state. In our case, we'll be assigning an id to the states in our parse table, so I include a property here that fetches a state's id from the parent parse table that it's part to make the later code as simple as possible. Still with me? Congratulations! You've got the beginnings of a shift-reduce parser. Next time, we'll expand on some of theory behind the things I've touched on in this post, and possibly look at building the start of the recursive parse table builder itself. Found this interesting? Confused about something? Comment below! ## Securing a Linux Server Part 2: SSH Wow, it's been a while since I posted something in this series! Last time, I took a look at the Uncomplicated Firewall, and how you can use it to control the traffic coming in (and going out) of your server. This time, I'm going to take a look at steps you can take to secure another vitally important part of most servers: SSH. Used by servers and their administrators across the world to talk to one another, if someone manages to get in who isn't supposed to, they could do all kinds of damage! The first, and easiest thing we can do it improve security is to prevent the root user logging in. If you haven't done so already, you should create a new user on your server, set a good password, and give it superuser privileges. Login with the new user account, and then edit /etc/ssh/sshd_config, finding the line that says something like PermitRootLogin yes ....and change it to PermitRootLogin no Once done, restart the ssh server. Your config might be slightly different (e.g. it might be PermitRootLogin without-password) - but the principle is the same. This adds an extra barrier to getting into your server, as now attackers must not only guess your password, but your username as well (some won't even bother, and keep trying to login to the root account :P). Next, we can move SSH to a non-standard port. Some might argue that this isn't a good security measure to take and that it doesn't actually make your server more secure, but I find that it's still a good measure to take for 2 reasons: defence in depth, and preventing excessive CPU load from all the dumb bots that try to get in on the default port. With that, it's make another modification to /etc/ssh/sshd_config. Make sure you test at every step you take, as if you lock yourself out, you'll have a hard time getting back in again.... Port 22 Change 22 in the above to any other number between about 1 and 65535. Next, make sure you've allowed the new port through your firewall! If you're using ufw, my previous post (link above) gives a helpful guide on how to do this. Once done, restart your SSH server again - and try logging in before you close your current session. That way if you make a mistake, you can fix through your existing session. Once you're confident that you've got it right, you can close port 22 on your firewall. So we've created a new user account with a secure password (tip: use a password manager if you have trouble remembering it :-)), disabled root login, and moved the ssh port to another port number that's out of the way. Is there anything else we can do? Turns out there is. Passwords are not the only we can authenticate against an SSH server. Public private keypairs can be used too - and are much more secure - and convenient - than passwords if used correctly. You can generate your own public-private keypair like so: ssh-keygen -t ed25519 It will ask you a few questions, such as a password to encrypt the private key on disk, and where to save it. Once done, we need to tell ssh to use the new public-private keypair. This is fairly easy to do, actually (though it took me a while to figure out how!). Simply edit ~/.ssh/config (or create it if it doesn't exist), and create (or edit) an entry for your ssh server, making it look something like this: Host bobsrockets.com Port {port_name} IdentityFile {path/to/private/keyfile} It's the IdentityFile line that's important. The port line simply makes it such that you can type ssh bobsrockets.com (or whatever your server is called) and it will figure out the port number for you. With a public-private keypair now in use, there's just one step left: disable password-based logins. I'd recommend trailing it for a while to make sure you haven't messed anything up - because once you disable it, if you lose your private key, you won't be getting back in again any time soon! Again, open /etc/ssh/sshd_config for editing. Find the line that starts with PasswordAuthentication, and comment it out with a hash symbol (#), if it isn't already. Directly below that line, add PasswordAuthentication no. Once done, restart ssh for a final time, and check it works. If it does, congratulations! You've successfully secured your SSH server (to the best of my knowledge, of course). Got a tip I haven't covered here? Found a mistake? Let me know in a comment below! ## Compilers 101: Build your own flex + bison compiler in a few easy(?) steps So you want to build your own compiler? Great! Don't know where to start? This guide should help! At University, we're building our own compiler for a custom programming language invented by our lecturer with a pair of GNU tools by the name of flex and bison - which I've blogged about before. Since that post, I've learnt a ton about how the whole process works, so I thought I'd write up a more coherent blog post on the subject :P ### Stage 1: Planning The whole process starts with railroad diagrams (also known as flowcharts) of the language you want to write a compiler for. Having an accurate set of railroad diagrams is essential to understanding precisely how the language is put together, which is rather useful for the next step. Converting the railroad diagrams into plain BNF (Backus Naur Form). Unfortunately, bison doesn't support EBNF-like notation at the current time, so only plain-old BNF will do. ### Stage 2: Lexing With your railroad diagrams converted into BNF, you can start writing code! The first chunk of code that needs writing is the lexer. Lexing is what flex is good at - and involves converting the input source code into lexemes - discrete sequences of characters that match a particular pattern and can be assigned a particular category name, turning it into a token. Perhaps an example would help. Consider the following: void do_awesome_stuff(int a, string b) { /* Code here */ } The above can be turned into a sequence of tokens, not unlike the following (ignoring whitespace tokens, of course): TYPE: void IDENTIFIER: do_awesome_stuff OPEN_BRACKET: ( TYPE: int IDENTIFIER: a, COMMA: , TYPE: string IDENTIFIER: b CLOSE_BRACKET: ) OPEN_BRACE: { COMMENT: /* Code here */* CLOSE_BRACE: } See? We can identify 8 token types in the source string: TYPE, IDENTIFIER, COMMA, OPEN_BRACKET, CLOSE_BRACKET, OPEN_BRACE, COMMENT, and CLOSE_BRACE. These types and the rules to match them can be found by analysing a combination of the railroad diagrams and the BNF you created earlier. ### Stage 3: Parser the first With a lexer in hand, we can now look at writing the parser. This is done in two stages. The parser itself, and upgrading said parser to generate a parse tree. Let's talk about the parser first. The parser can be largely created simply by running a few regular-expression find and replace rules on your BNF, actually. From there, it's just a case of adding the header and the footer to complete the document. Let's take a look at some example BNF: <instructions> ::= START <lines> END <lines> ::= <lines> <line> | <line> <line> ::= <command> <command> ::= <cmd_name> <number> <cmd_name> ::= FD | BK | LT | RT <number> ::= <number> <digit> | <digit> <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 The above matches something like the following: START FD 100 RT 180 FD 125 LT 90 BK 50 END Very interesting (a virtual cookie is available for anyone who gets the reference as to what this grammar represents!). Let's look at converting that into something bison can understand: instructions : START lines END ; lines : lines line | line ; line : command ; command : cmd_name number ; cmd_name : FD | BK | LT | RT ; number : number digit | digit ; digit : 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 That's looking much better already! Simply by using the regular expression substitutions: 1. <([a-z_]+)> -> $1
2. ::= -> :
3. \n\n -> \n\t;\n\n

....we can get most of the way there to something that bison can understand. Next, we need to refactor it a bit to tell it which tokens are coming from the lexer (which I'll leave up to you to write as an exercise), so it doesn't get them confused with the rules - which are defined in the BNF(-like) rules.

Let's get rid of the rule for number and digit first, since we can do those in the lexer quite easily. Next, let's add a %token definition to the top to tell it which are coming from the lexer. It's good practice to define everything that comes from the lexer in uppercase, and everything that's a rule that exists only in bison in lowercase:

%start instructions
%token FD BK LT RT NUMBER START END

We've also defined a start symbol - the one which when bison reaches it, it knows that it's completed the parsing process, as bison is a bottom-up parser.

Lastly, we need to reference the lexer itself. Thankfully that's easy too by appending to your new bison file:

%%

#include "lex.yy.c"


Very nice. Don't forget about the new line at the end of the file - flex and bison will complain if it isn't present! Here's the completed bison file:

%start instructions
%token FD BK LT RT NUMBER START END

instructions : START lines END
;

lines : lines line
| line
;

line : command
;

command : cmd_name NUMBER
;

cmd_name : FD
| BK
| LT
| RT
;

%%

#include "lex.yy.c"


With a brand-new bison file completed, there's just one component of the parser left - a plain-old C file that calls it. Let's create one of those quickly:


#include

int yyparse(void);

int main(void)
{
# if YYDEBUG == 1
extern int yydebug;
yydebug = 1;
#endif

return yyparse();
}

void yyerror(char *error_message)
{
fprintf(stderr, "Error: %s\nExiting\n", error_message);
}


The highlighted lines enable a special debugging mode built-in to bison if the standard compile-time symbol YYDEBUG is specified - and bison is run with a few special parameters. Here's the sequence of commands needed to compile this:

flex lexer.l
bison -tv parser.y
gcc -Wall -Wextra -g parser.tab.c main.c -lfl -ly -DYYDEBUG -D_XOPEN_SOURCE=700

The gcc command is probably a bit long-winded, but it does several useful things for us:

• Shows additional warnings just in case we've made a mistake that might be an issue later (-Wall -Wextra)
• Include additional debugging information in the output file to allow debugging with gdb (the GNU Debugger) if necessary (-g)
• Fix strange errors on some systems (-D_XOPEN_SOURCE=700)

If you're on a Windows system, you may need to remove the -ly - which appears to be required on the Linux machines I use - it tells gcc that we'll be referencing the bison library.

### Stage 4: Parser again

Congratulations on getting this far! You've now got a lexer and a parser - so it's time to put them to use. This is done by utilising the parser to build a parse tree - a tree of nodes that represent the input. Here's an example tree:

As you can see, each high-level node references one or more lower-level nodes, and the structure of the tree represents the first 2 lines of the example input above. The nodes in yellow are lexical tokens that come directly from flex - these are called terminals, or leaf nodes. The ones in purple come from the bison rules (which we derived from the BNF we wrote at the beginning of this post) - and are non-terminals, or tree nodes.

With this in mind, let's introduce another feature or two of bison. Firstly, let's take a look at revising that %token declaration we created above:

%token FD BK LT RT START END
%token<val_num> NUMBER

The important bit here is the <val_num>. Here we tell bison that a value should be attached to the token NUMBER - and that it will be of type int. After telling bison that it should expect a value, we need to give it a place to put it. Let's write some more code to go just below the %token declarations:

%union {
int val_num;
}

There we go! Excellent - we've got a place to put it. Now we just need to alter the lexer to convert the token value to an int and put it there. That's not too tough, thankfully - but if you're having trouble with it, here's a hint:

{number}        { yylval.val_num = atoi(yytext); return(NUMBER); }

Now we have it passing numbers correctly, let's look briefly at generating that parse tree. I'm not going to give the game away - just a few helpful hints as to what you need to do here - otherwise it's not as fun :P

Generating the parse tree can be considered the both the most challenging part of the experience (especially if you don't know what you're doing) and the easiest to deal with at same time. Knowing your stuff and your end goal before you start makes the whole process a lot easier.

The first major step is to create a struct that can represent a type of node in your parse tree. It might be useful to store several properties here - such as the node type (An enum will come in handy here), a spot for the value of a lexical token (or a reference to it in a symbol table if you have one), and references to child nodes in the parse tree.

The second major step of the process is to create a utility method that creates a new node of the tree on the heap, and then revise the bison file to get each rule to create new nodes on the tree in such a way that it creates a parse tree when it reaches start node (or top node of the tree - which, in the case of the above, is instructions). For the purposes of this post, I'll be using a method with this prototype:

TreeNode create_node(int item, int node_type, TreeNode left, TreeNode right);

Your tree node struct and subsequent creation method may vary. With this in hand, we can revise the bison rules we created above to create these nodes we've been talking about. Here's a quick pointer on how to revise the rule for command above:

command : cmd_name NUMBER   { $$= create_node(2, NODE_COMMAND, 1, NULL); } ; This might look a bit strange, but let's break it down. The bit in curly braces is some (almost) plain C code that creates the node and returns a pointer to it to bison. The $$ is the return value for that node - which, I might add reminds me of something I forgot above. We need to tell bison about our new tree node data type and which rules should return it:

%type<val_tnode> instructions lines line command cmd_name

/* And in %union { ... } ..... */
TreeNode val_tnode;

This is almost the same as the %token<val_num> we did before, but we're defining the return value of a rule this type - not a token. With that little interlude out of the way, let's return to the code above. $1 and $2 refer to the first and second items in the rule definition respectively - and hold the type that we defined above in the %token and %type directives. Since bison is a bottom-up parser - this means that by the time this code executes, all it's child nodes have (hopefully) been created - and we just have to tie them all up together with a new node. In the case of my little example above, $1 is of type TreeNode, and $2 is of type int (that is if I didn't make any mistakes further up!).

### Stage 5: Blasting off to code generation and beyond

Phew! That's a lot of work. If you've read this far, thank you and well done! It's been a long journey for both you the reader and me the writer, but you're almost done.

While conceptually simple when broken down, the whole process actually gets rather involved - especially when writing the BNF and the parser (the latter of which can be a particular pain due to shift/reduce and reduce/reduce errors), and the amount of code and head scratching you've got to do to get to this point is enormous. My best advice is to take it slow and don't rush - you'll only cause most problems for yourself if you try and jump the gun. Make sure that each stage works as you intend before you continue - back-pedalling to fix an issue can be particularly annoying as it can be bothersome to work out which stage the bug is actually in.

The last step of the whole process is to actually do something with the parse tree we've worked so hard to create. Thankfully, that's not too difficult - as we can put some additional code in the { } block of the starting symbol to call methods that will do things like perform some optimisation, print the tree to the console - or generate some sweet code. While the actual generation of code is beyond the scope of this article, I may end up posting about some optimisation techniques you can use on a parse tree after I've finished fiddling with float handling, symbol tables, and initial code generation in my ACW (Assessed Course Work).

Found this useful? Found a bug in this post? Got a suggestion? Comment below! Since I don't have any real analytics on this blog besides the server logs, I've no idea if you've read it really unless you comment :P

## TeleConsole Client is available on NuGet!

(Above: Some cool-looking circuits that feature on the NuGet website)

Hey! After a large amount of research, I've finally figured out how to create a simple NuGet package. Since I ended up using TeleConsole in a recent ACW and it doesn't have any dependencies (making packaging easier!), I decided to use it to test the system.

I think it's turned out rather well actually - you can find it on NuGet here.

Since it's been such a complicated process, rather than talking about TeleConsole itself, I'd like to give a sort-of tutorial on how I did it instead (if you'd like to read more about TeleConsole, I posted about it when I first released it here).

To start with, you'll need to install NuGet. Once done, the next step is to create a .nuspec file for your project. It goes in the same directory as the .csproj file for the project you want to publish on NuGet. I haven't yet figured out how to reference another project in the same solution and have it work with NuGet, but I should imagine it's mostly automatic. Here's the .nuspec file for TeleConsole:

<?xml version="1.0"?>
<package>
<id>TeleConsoleClient</id>
<version>0.3</version>
<title>$title$</title>
<authors>Starbeamrainbowlabs</authors>
<owners>Starbeamrainbowlabs</owners>
<projectUrl>https://github.com/sbrl/TeleConsole/</projectUrl>
<iconUrl>https://github.com/sbrl/TeleConsole/blob/master/logo.png?raw=true</iconUrl>
<description>$description$</description>
<releaseNotes>Initial nuget release.</releaseNotes>
<tags>Debugging Networking Console Remote</tags>
</package>

As you can see, it's actually a fairly simple format - based on XML of course, since C♯ seems to love it for some reason. The bits in $ signs are special - they are references to the corresponding values in the .csproj file. You can do it for <version> too - but I was experiencing issues with it not picking this up correctly, so I'm specifying it manually. More information about this can be found on the Microsoft website - links are available at the bottom of this post. With that done, the next step is to package it into a .nupkg file 0 which is basically just a renamed .zip file with a specific structure. The nuget command-line application has the ability to do this for us - but doesn't work on Linux without an extra argument (thanks to @ArtOfSettling on GitHub for discovering this!): nuget pack -MsbuildPath /usr/lib/mono/msbuild/15.0/bin/ ...Windows users don't need to include the -MsbuildPath /usr/lib/mono/msbuild/15.0/bin/ bit. This command will output a .nupkg file, which you can then upload to nuget.org here, once you're signed in. ### Sources and Further Reading ## Deep dive: Email, Trust, DKIM, SPF, and more (Above: Lots of parcels. Hopefully you won't get this many through the door at once..... Source) Now that I'm on holiday, I've got some time to write a few blog posts! As I've promised a few people a post on the email system, that's what I'll look at this this post. I'm going to take you on a deep dive through the email system and trust. We'll be journeying though the fields of DKIM signatures, and climb the SPF mountain. We'll also investigate why the internet needs to take this journey in the first place, and look at some of the challenges one faces when setting up their own mail server. Hang on to your hats, ladies and gentlemen! If you get to the end, give yourself a virtual cookie :D Before we start though, I'd like to mention that I'll be coming at this from the perspective of my own email server that I set up myself. Let me introduce to you the cast: Postfix (the SMTP MTA), Dovecot (the IMAP MDA), rspamd (the spam filter), and OpenDKIM (the thing that deals with DKIM signatures). With that out of the way, let's begin! We'll start of our journey by mapping out the journey a typical email undertakes. Let's say Bob Kerman wants to send Bill an email. Here's what happens: 1. Bill writes the email and hits send. His email client connects to his email server, logs in, and asks the server to deliver a message for him. 2. The server takes the email and reads the From header (in this case it's bill@billsboosters.com), figures out where the mail server is located, connects to it, and asks it to deliver Bob's message to Bill. mail.billsboosters.com takes the email and files it in Bill's inbox. 3. Bill connects to his mail server and retrieves Bob's message. Of course, this is simplified in several places. mail.bobsrockets.com will obviously need to do a few DNS lookups to find billsboosters.com's mail server and fiddle with the headers of Bob's message a bit (such as adding a Received header etc.), and smtp.billsboosters.com won't just accept the message for delivery without checking out the server it came from first. How does it check though? What's preventing seanssatellites.net pretending to be bobsrockets.com and sending an imposter? Until relatively recently, the answer was, well, nothing really. Anyone could send an email to anyone else without having to prove that they could indeed send email in the name of a domain. Try it out for yourself by telnetting to a mail server on port 25 (unencrypted SMTP) and trying in something like this: HELO mail.bobsrockets.com MAIL From: <frank@franksfuel.io> RCPT TO <bill@billsboosters.com> DATA From: sean@seanssatellites.net To: bill@billsboosters.com Hello! This is a email to remind you..... . QUIT Oh, my! Frank at franksfuel.io can connect to any mail server and pretend that sean@seanssatellites.net is sending a message to bill@billsboosters.com! Mail servers that allow this are called open relays, and today they usually find themselves on several blacklists within minutes. Ploys like these are easy to foil, thankfully (by only accepting mail for your own domains), but it still leaves the problem of what to do about random people connecting to your mail server delivering spam to your inbox that claims to be from someone they aren't supposed to be sending mail for. In response, some mail servers demanded things like the IP that connects to send an email must reverse to the domain name that they want to send email from. Clever, but when you remember that anyone can change their own PTR records, you realise that it's just a minor annoyance to the determined spammer, and another hurdle to the legitimate person in setting up their own mail server! Clearly, a better solution is needed. Time to introduce our first destination: SPF. SPF stands for sender policy framework, and defines a mechanism by which a mail server can determine which IP addresses a domain allows mail to be sent from in it's name. It's a TXT record that sites at the root of a domain. It looks something like this: v=spf1 a mx ptr ip4:5.196.73.75 ip6:2001:41d0:e:74b::1 a:starbeamrainbowlabs.com a:mail.starbeamrainbowlabs.com -all The above is my SPF TXT record for starbeamrainbowlabs.com. It's quite simple, really - let's break it down. v=spf1 This just defines the version of the SPF standard. There's only one version so far, so we include this to state that this record is an SPF version 1 record. a mx ptr This says that the domain that the sender claims to be from must have an a and an mx record that matches the IP address that's sending the email. It also says that the ptr record associated with the sender's IP must resolve to the domain the sender claims to be sending from, as described above (it does help with dealing with infected machines and such). ip4:5.196.73.75 ip6:2001:41d0:e:74b::1 This bit says that the IP addresses 5.196.73.75 and 2001:41d0:e:74d::1 are explicitly allowed to send mail in the name of starbeamrainbowlabs.com. a:starbeamrainbowlabs.com a:mail.starbeamrainbowlabs.com After all of the above, this bit isn't strictly necessary, but it says that all the IP addresses found in the a records for starbeamrainbowlabs.com and mail.starbeamrainbowlabs.com are allowed to send mail in the name of starbeamrainbowlabs.com. -all Lastly, this says that if you're not on the list, then your message should be rejected! Other variants on this include ~all (which says "put it in the spam box instead"), and +all (which says "accept it anyway", though I can't see how that's useful :P). As you can see, SPF allows a mail server to verify if a given client is indeed allowed to send an email in the name of any particular domain name. For a while, this worked a treat - until a new problem arose. Many of the mail servers on the internet don't (and probably still don't!) support encryption when connecting to and delivering mail, as certificates were expensive and difficult to get hold of (nowadays we've got LetsEncrypt who give out certificates for free!). The encryption used when mail servers connect to one another is practically identical to that used in HTTPS - so if done correctly, the identity of the remote server can be verified and the emails exchanged encrypted, if the world's certification authorities aren't corrupted, of course. Since most emails weren't encrypted when in transit, a new problem arose: man-in-the-middle attacks, whereby an email is altered by one or more servers in the delivery chain. Thinking about it - this could still happen today even with encryption, if any one server along an email's route is compromised. To this end, another mechanism was desperately needed - one that would allow the receiving mail server to verify that an email's content / headers hadn't been surreptitiously altered since it left the origin mail server - potentially preventing awkward misunderstandings. Enter stage left: DKIM! DKIM stands for Domain Keys Identified Mail - which, in short, means that it provides a method by which a receiving mail server can cryptographically prove that a message hasn't been altered during transit. It works by having a public-private keypair, in which the public key can only decrypt things, but the private key is capable of encrypting things. A hash of the email's headers / content is computed and encrypted with the private key. Then the encrypted hash is attached to the email in the DKIM-Signature header. The receiving mail server does a DNS lookup to find the public key, and decrypts the hash. It then computes it's own hash of the email headers / content, and compares it against the decrypted hash. If it matches, then the email hasn't been fiddled with along the way! Of course, not all the headers in the email are hashed - only a specific subset are included in the hash, since some headers (like Received and X-Spam-Result) are added and altered during transit. If you're interested in implementing DKIM yourself - DigitalOcean have a smashing tutorial on the subject, which should adapt easily to whatever system you're running yourself. With both of those in place, billsboosters.com's mail server can now verify that mail.bobsrockets.com is allowed to send the email on behalf of bobsrockets.com, and that the message content hasn't been tampered with since it left mail.bobsrockets.com. mail.billsboosters.com can also catch franksfuel.io in the act of trying to deliver spam from seanssatellites.net! There is, however, one last piece of the puzzle left to reveal. With all this in place, how do you know if your mail was actually delivered? Is it possible to roll SPF and DKIM out gradually so that you can be sure you've done it correctly? This can be a particular issue for businesses and larger email server setups. This is where DMARC comes in. It's a standard that lets you specify an email address you'd like to receive DMARC reports at, which contain statistics as to how many messages receiving mail servers got that claimed to be from you, and what they did with them. It also lets you specify what percentage of messages should be subject to DMARC filtering, so you can roll everything out slowly. Finally, it lets you specify what should happen to messages that fail either SPF, DKIM, or both - whether they should be allowed anyway (for testing purposes), quarantined, or rejected. DMARC policies get specified (yep, you guessed it!) in a DNS record. unlike SPF though, they go in _dmarc.megsmicroprocessors.org as a TXT record, substituting megsmicroprocessors.org for your domain name. Here's an example: v=DMARC1; p=none; rua=mailto:dmarc@megsmicroprocessors.org This is just a simple example - you can get much more complex ones than this! Let's go through it step by step. v=DMARC1; Nothing to see here - just a version number as in SPF. p=none; This is the policy of what should happen to messages that fail. In this example we've used none, so messages that fail will still pass right on through. You can set it to quarantine or even reject as you gain confidence in your setup. rua=mailto:dmarc@megsmicroprocessors.org This specifies where you want DMARC reports to be sent. Each mail server that receives mail from your mail server will bundle up statistics and send them once a day to this address. The format is in XML (which won't be particularly easy to read), but there are free DMARC record parsers out there on the internet that you can use to decode the reports, like dmarcian. That completes the puzzle. If you're still reading, then congratulations! Post in the comments and say hi :D We've climbed the SPF mountain and discovered how email servers validate who is allowed to send mail in the name of another domain. We've visited the DKIM signature fields and seen how the content of email can be checked to see if it's been altered during transit. Lastly, we took a stroll down DMARC lane to see how it's possible to be sure what other servers are doing with your mail, and how a large email server setup can implement DMARC, DKIM, and SPF more easily. Of course, I'm not perfect - if there's something I've missed or got wrong, please let me know! I'll try to correct it as soon as possible. Lastly, this is, as always, a starting point - not an ending point. An introduction if you will - it's up to you to research each technology more thoroughly - especially if you're thinking of implementing them yourself. I'll leave my sources at the bottom of this post if you'd like somewhere to start looking :-) ### Sources and Further Reading ## Semi-automated backups with duplicity and an external drive (Above: A bunch of hard drives. The original can be found here.) Since I've recently got myself a raspberry pi to act as a server, I naturally needed a way to back it up. Not seeing anything completely to my tastes, I ended up putting something together that did the job for me. For this I used an external hard drive, duplicity, sendxmpp (sudo apt install sendxmpp), and a bit of bash. Since it's gone rather well for me so far, I thought I'd write a blog post on how I did it. It still needs some tidying up, of course - but it works in it's current state, and perhaps it will help someone else put together their own system! ### Step 1: Configuring the XMPP server I use XMPP as my primary instant messaging server, so it's only natural that I'd want to integrate the system in with it to remind me when to plug in the external drive, and so that it can tell me when it's done and what happened. Since I use prosody as my XMPP server, I can execute the following on the server: sudo prosodyctl adduser rasperrypi@bobsrockets.com ...and then enter a random password for the new account. From there, I set up a new private persistent multi-user chatroom for the messages to filter into, and set my client to always notify when a message is posted. After that, it was a case of creating a new config file in a format that sendxmpp will understand: rasperrypi@bobsrockets.com:5222 thesecurepassword ### Step 2: Finding the id of the drive partition With the XMPP side of things configured, next I needed a way to detect if the drie was plugged in or not. Thankfully all partitions have a unique id built-in, which you can use to see if it's plugged in or not. It's easy to find, too: sudo blkid The above will list all available partitions and their UUID - the unique id I mentioned. With that in hand, we can now check if it's plugged in or not with a cleverly crafted use of the readlink command: readlink /dev/disk/by-uuid/${partition_uuid} 1>/dev/null 2>&2;
partition_found=$? if [[ "${partition_found}" -eq "0" ]]; then
echo "It's plugged in!";
else
echo "It's not plugged in :-(";
fi

Simple, right? readlink has an exit code of 0 if it managed to read the symbolik link in /dev/disk/by-uuid ok, and 1 if it didn't. The symbolic links in /deve/disk/by-uuid are helpfuly created automatically for us :D From here, we can take it a step further to wait until the drive is plugged in:

# Wait until the drive is available
while true
do
readlink "${partition_uuid}"; if [[ "$?" -eq 0 ]]; then
break
fi

sleep 1;
done

### Step 3: Mounting and unmounting the drive

Raspberry Pis don't mount drive automatically, so we'll have do that ourselves. Thankfully, it's not so tough:

# Create the fodler to mount the drive into
mkdir -p ${backup_drive_mount_point}; # Mount it in read-write mode mount "/dev/disk/by-uuid/${partition_uuid}" "${backup_drive_mount_point}" -o rw; # Do backup thingy here # Sync changes to disk sync # Unmount the drive umount "${backup_drive_mount_point}";

Make sure you've got the ntfs-3g package installed if you want to back up to an NTFS volume (Raspberry Pis don't come with it by default!).

### Step 4: Backup all teh things!

There are more steps involved in getting to this point than I thought there were, but if you've made it this far, than congrats! Have a virtual cookie :D 🍪

The next part is what you probably came here for: duplicity itself. I've had an interesting time getting this to work so far, actually. It's probably easier if I show you the duplicity commands I came up with first.

# Create the archive & temporary directories
mkdir -p /mnt/data_drive/.duplicity/{archives,tmp}/{os,data_drive}
# Do a new backup
PASSPHRASE=${encryption_password} duplicity --full-if-older-than 2M --archive-dir /mnt/data_drive/.duplicity/archives/os --tempdir /mnt/data_drive/.duplicity/tmp/os --exclude /proc --exclude /sys --exclude /tmp --exclude /dev --exclude /mnt --exclude /var/cache --exclude /var/tmp --exclude /var/backups / file://${backup_drive_mount_point}/duplicity-backups/os/
PASSPHRASE=${data_drive_encryption_password} duplicity --full-if-older-than 2M --archive-dir /mnt/data_drive/.duplicity/archives/data_drive --tempdir /mnt/data_drive/.duplicity/tmp/data_drive /mnt/data_drive --exclude '**.duplicity/**' file://${backup_drive_mount_point}/duplicity-backups/data_drive/

# Remove old backups
PASSPHRASE=${encryption_password} duplicity remove-older-than 6M --force --archive-dir /mnt/data_drive/.duplicity/archives/os file:///${backup_drive_mount_point}/duplicity-backups/os/
PASSPHRASE=${data_drive_encryption_password} duplicity remove-older-than 6M --force --archive-dir /mnt/data_drive/.duplicity/archives/data_drive file:///${backup_drive_mount_point}/duplicity-backups/data_drive/

Path names have been altered for privacy reasons. The first duplicity command in the above was fairly straight forward - backup everything, except a few folders with cache files / temporary / weird stuff in them (like /proc).

I ended up having to specify the archive and temporary directories here to be on another disk because the Raspberry Pi I'm running this on has a rather... limited capacity on it's internal micro SD card, so the default location for both isn't a good idea.

The second duplicity call is a little more complicated. It backs up the data disk I have attached to my Raspberry Pi to the external drive I've got plugged in that we're backing up to. The awkward bit comes when you realise that the archive and temporary directories are located on this same data-disk that we're trying to back up. To this end, I eventually found (through lots of fiddling) that you can exclude a folder duplicity via the --exclude '**.duplicity/**' syntax. I've no idea why it's different when you're not backing up the root of the filesystem, but it is (--exclude ./.duplicity/ didn't work, and neither did /mnt/data_drive/.duplicity/).

The final two duplicity calls just clean up and remove old backups that are older than 6 months, so that the drive doesn't fill up too much :-)

### Step 5: What? Where? Who?

We've almost got every piece of the puzzle, but there's still one left: letting us know what's going on! This is a piece of cake in comparison to the above:

function xmpp_notify {
echo $1 | sendxmpp --file "${xmpp_config_file}" --resource "${xmpp_resource}" --tls --chatroom "${xmpp_target_chatroom}"
}

Easy! All we have to do is point sendxmpp at our config file we created waaay in step #1, and tell it where the chatroom is that we'd like it to post messages in. With that, we can put all the pieces of the puzzle together:

#!/usr/bin/env bash

source .backup-settings

function xmpp_notify {
echo $1 | sendxmpp --file "${xmpp_config_file}" --resource "${xmpp_resource}" --tls --chatroom "${xmpp_target_chatroom}"
}

xmpp_notify "Waiting for the backup disk to be plugged in.";

# Wait until the drive is available
while true
do
readlink "${backup_drive_dev}"; if [[ "$?" -eq 0 ]]; then
break
fi

sleep 1;
done

xmpp_notify "Backup disk detected - mounting";

mkdir -p ${backup_drive_mount_point}; mount "${backup_drive_dev}" "${backup_drive_mount_point}" -o rw xmpp_notify "Mounting complete - performing backup"; # Create the archive & temporary directories mkdir -p /mnt/data_drive/.duplicity/{archives,tmp}/{os,data_drive} echo '--- Root Filesystem ---' >/tmp/backup-status.txt # Create the archive & temporary directories mkdir -p /mnt/data_drive/.duplicity/{archives,tmp}/{os,data_drive} # Do a new backup PASSPHRASE=${encryption_password} duplicity --full-if-older-than 2M --archive-dir /mnt/data_drive/.duplicity/archives/os --tempdir /mnt/data_drive/.duplicity/tmp/os --exclude /proc --exclude /sys --exclude /tmp --exclude /dev --exclude /mnt --exclude /var/cache --exclude /var/tmp --exclude /var/backups / file://${backup_drive_mount_point}/duplicity-backups/os/ 2>&1 >>/tmp/backup-status.txt echo '--- Data Disk ---' >>/tmp/backup-status.txt PASSPHRASE=${data_drive_encryption_password} duplicity --full-if-older-than 2M --archive-dir /mnt/data_drive/.duplicity/archives/data_drive --tempdir /mnt/data_drive/.duplicity/tmp/data_drive /mnt/data_drive --exclude '**.duplicity/**' file://${backup_drive_mount_point}/duplicity-backups/data_drive/ 2>&1 >>/tmp/backup-status.txt xmpp_notify "Backup complete!" cat /tmp/backup-status.txt | sendxmpp --file "${xmpp_config_file}" --resource "${xmpp_resource}" --tls --chatroom "${xmpp_target_chatroom}"
rm /tmp/backup-status.txt

xmpp_notify "Performing cleanup."

PASSPHRASE=${encryption_password} duplicity remove-older-than 6M --force --archive-dir /mnt/data_drive/.duplicity/archives/os file:///${backup_drive_mount_point}/duplicity-backups/os/
PASSPHRASE=${data_drive_encryption_password} duplicity remove-older-than 6M --force --archive-dir /mnt/data_drive/.duplicity/archives/data_drive file:///${backup_drive_mount_point}/duplicity-backups/data_drive/

sync;
umount "\${backup_drive_mount_point}";

xmpp_notify "Done! Backup completed. You can now remove the backup disk."

I've tweaked a few of the pieces to get them to work better together, and created a separate .backup-settings file to store all the settings in.

That completes my backup script! Found this useful? Got an improvement? Use a different strategy? Post a comment below!

## An (unscientific) Introduction to I2C

I've recently bought an LCD display for a project. Since I don't have many pins to play with, I ended up buying an I2C-driven display to cut the data pins down to just 2: One for outgoing messages, and one for receiving incoming messages from other devices.

It's taken me some time to get to grips with the idea of I2C, so I thought I'd write up what I've learnt so far here, along with some helpful hints if you run into problems yourself.

In effect, I2C is a wire protocol that allows multiple devices to talk to each other over a single pair of cables. Every I2C device has an 8 bit hardware address burned into it that it uses to address itself - much like the Internet Protocol when it comes to it, actually. Devices can send messages to one another using these addresses - though not all at the same time, obviously!

If you want to talk directly over I2C with a device, then Wire.h is the library you want to use. Normally though, devices will come with their own library that utilises Wire.h and communicates with it for you.

As a good first test to see if I2C is working, I found an I2C scanner that scans for connected devices. Since the address space is so limited, it doesn't take long at all:

/* --------------------------------------
* i2c_scanner
*
* Version 1
*  This program (or code that looks like it)
*  can be found in many places.
*  For example on the Arduino.cc forum.
*  The original author is not know.
* Version 2, Juni 2012, Using Arduino 1.0.1
*  Adapted to be as simple as possible by Arduino.cc user Krodal
* Version 3, Feb 26  2013
*  V3 by louarnold
* Version 4, March 3, 2013, Using Arduino 1.0.3
*  by Arduino.cc user Krodal.
*  Changes by louarnold removed.
*  Scanning addresses changed from 0...127 to 1...119,
*  according to the i2c scanner by Nick Gammon
*
* Version 5, March 28, 2013
*  As version 4, but address scans now to 127.
*  A sensor seems to use address 120.
* Version 6, November 27, 2015.
*  Added waiting for the Leonardo serial communication.
* This sketch tests the standard 7-bit addresses
* Devices with higher bit address might not be seen properly.
*/

#include <Wire.h>

void setup()
{
Wire.begin();

Serial.begin(9600);
while (!Serial);             // Leonardo: wait for serial monitor
Serial.println("\nI2C Scanner");
}

void loop()
{
int nDevices;

Serial.println("Scanning...");

nDevices = 0;
{
// The i2c_scanner uses the return value of
// the Write.endTransmission to see if
// a device did acknowledge to the address.
error = Wire.endTransmission();

if (error == 0)
{
Serial.print("I2C device found at address 0x");
Serial.print("0");
Serial.println("  !");

nDevices++;
}
else if (error==4)
{
Serial.print("0");
}
}
if (nDevices == 0)
Serial.println("No I2C devices found\n");
else
Serial.println("done\n");

delay(5000);           // wait 5 seconds for next scan
}

As the initial comment mentions, I can't claim ownership of this code! I got it from here.

With the code in mind, it's time to look at the circuit design.

(Above: A simple I2C circuit. Credits go to openclipart.org for the images.)

The above connects an Arduino Uno version 3 with a simple LCD display via a breadboard to allow for expansion to connect future devices. The power (the red and blue cables) link the 5V and GND pins from the Arduino to the appropriate pins on the back of the LCD (the image of an LCD I found didn't have the pins showing :P), and the I2C pins (green and yellow) connect the SDA and SCL pins on the Arduino to the LCD display.

With the circuit done, that completes the system! All that remains now is to build something cool with the components we've put together :D

## Access your home linux box from anywhere with SSH tunnels

(Header by GDJ from openclipart.org. Source page)

....and other things! Recently, I bought a Raspberry Pi 3. Now that the rest of the components have arrived, I've got a rather nice little home server that's got a 1 terabyte WD PiDrive attached to it to provide lots of lovely shared storage, which is rather nice.

However, within a few weeks I was faced with a problem. How do I access my new box to configure it from my internship when I'm on lunch? Faced with such a challenge, I did what anyone would, and took to the internet to find a solution.

It didn't take long. A while ago I heard about these things called 'SSH tunnels', which, while not designed for a high throughput, are more than adequate for a low-intensity SSH connection that runs a few kilobytes a second in either direction. After reading this excellent answer by erik on the Unix & Linux StackExchange, I had an understanding of how SSH tunnels work, and was ready to put together a solution. You should go and read that answer if you'd like to understand SSH tunnels too - it explains it much better than I ever could :P

With that knowledge in hand, I went about planning the SSH tunnel. I already have a server a public IP address (it's hosting this website!), so I needed a reverse tunnel to allow me to access a port local to my linux box at home (called elessar - a virtual cookie for anyone who gets the reference!) from starbeamrainbowlabs.com.

Important! Ask yourself whether it's moral and ethical to set up an ssh tunnel before you think about following along with this article! If you find yourself behind a firewall or something similar, then the chances are that it's there for a good reason - and you might get into trouble if you try and circumvent it. I won't be held responsible for any loss or damages of any description caused by the reading of this post.

First job: create a limited account on starbeamrainbowlabs.com for elessar to SSH into. That's easy:

sudo useradd --system ssh-tunnel

Then, with a few quick lines in /etc/ssh/sshd_config:

Match User ssh-tunnel
ForceCommand echo 'This account can only be used for ssh tunnelling.'

....we can prevent the ssh-tunnel user from being abused to gain shell access to the server (let me know if there are any further measures I can put in place here).

Now that I had a user account to ssh in as, I could set up a public / private keypair to authenticate with starbeamrainbowlabs.com, and cook up an SSH command for elessar that would set up the appropriate tunnel. After fiddling around a bit, I came up with this that did the job:

ssh -TN -R30582:localhost:5724 ssh-tunnel@starbeamrainbowlabs.com

Very cool. So with that command executing on elessar, I could ssh into elessar from starbeamrainbowlabs.com! In short, it sets up a tunnel that will make port 30582 on starbeamrainbowlabs.com tunnel through to port 5724 on elessar - the port on elessar that has SSH running on it, without allocating a pseudo-tty to save resources. explainshell.com can, well, explain it in more detail if you're interested.

Having an SSH command that would set up the tunnel is nice, but it's not very useful, since I have to execute it first before I can actually SSH into elessar from afar.

The solution was actually a little bit complicated. First, I wrote a simple systemd service file (systemd is what I have installed, since it's vanilla raspbian - this should be easily adaptable to other systems and setups) to start the SSH tunnel automagically on boot:

[Unit]
Description=SSH tunnel from starbeamrainbowlabs.com to local ssh server.

[Service]
Type=simple
ExecStart=/usr/bin/ssh -TN -R30582:localhost:5724 ssh-tunnel@starbeamrainbowlabs.com

[Install]
WantedBy=network-online.target

I quickly realised that there were a few flaws with this approach. Firstly, it tried to start the SSH connection before my router had connected to the internet, since my router starts faster than the box that initialises the fibre connection to my ISP. Secondly, it fails to retry when the connection dies.

The first problem can be solved relatively easily, by wrapping the ssh command in a clever bit of shell scripting:

/bin/sh -c 'until ping -c1 starbeamrainbowlabs.com &>/dev/null && sleep 5; do :; done && /usr/bin/ssh -TN -R30582:localhost:5724 ssh-tunnel@starbeamrainbowlabs.com

The above tries to ping starbeamrainbowlabs.com every 5 seconds until it succeeds, and only then does it attempt to open the SSH connection. This solves the first problem. To solve the second, we need to look at autossh. Autossh is a small tool that monitors an ssh connection in a variety of configurable ways and restarts the connection if ever dies for whatever reason. You can install it with your favourite package manager:

sudo apt install autossh

Substitute apt with whatever package manager you use on your system. With it installed, we can use a command like this:

autossh -o "UserKnownHostsFile /home/ssh-tunnel/.ssh/known_hosts" -o "IdentityFile /home/ssh-tunnel/.ssh/ssh-tunnel_ed25519" -o "PubkeyAuthentication=yes" -o "PasswordAuthentication=no" -o "ServerAliveInterval 900" -TN -R30582:localhost:5724 -p 7261 ssh-tunnel@starbeamrainbowlabs.com

to automatically start our ssh tunnel, and restart it if anything goes wrong. Note all the extra settings I had to specify here. This is because even though I had many of them specified in ~/.ssh/config for the ssh-tunnel user, because of systemd's weird environment when it starts a service, I found I had to specify everything in the command line with absolute paths (ugh).

Basically, the above tells autossh where the known_hosts file is (important for automation!), that it should only attempt public / private keypair authentication and not password authentication, that it should check the server's still there every 15 minutes, and all the other things we figured out above.

Finally, I combined the solutions I came up with for both problems, which left me with this:

[Unit]
Description=SSH tunnel from starbeamrainbowlabs.com to local ssh server.

[Service]
Type=simple
ExecStart=/bin/sh -c 'until ping -c1 starbeamrainbowlabs.com &>/dev/null && sleep 5; do :; done && /usr/bin/autossh -o "UserKnownHostsFile /home/pi/.ssh/known_hosts" -o "IdentityFile /home/pi/.ssh/ssh-tunnel_ed25519" -o "PubkeyAuthentication=yes" -o "PasswordAuthentication=no" -o "ServerAliveInterval 900" -TN -R30582:localhost:5724 -p 7261 ssh-tunnel@starbeamrainbowlabs.com'

[Install]
WantedBy=network-online.target

Here's a version that utilises the -f parameter of autossh to put the autossh into the background, which eliminates the sh parent process:

[Unit]
Description=SSH tunnel from starbeamrainbowlabs.com to local ssh server.

[Service]
Type=forking
Environment=AUTOSSH_PIDFILE=/var/run/sbrl-ssh-tunnel/ssh-tunnel.pid
PIDFile=/var/run/sbrl-ssh-tunnel/ssh-tunnel.pid
ExecStartPre=/bin/mkdir -p /var/run/sbrl-ssh-tunnel
ExecStartPre=-/bin/chown ssh-tunnel:ssh-tunnel /var/run/sbrl-ssh-tunnel
ExecStart=/bin/sh -c 'until ping -c1 starbeamrainbowlabs.com &>/dev/null && sleep 5; do :; done && /usr/bin/autossh -f -o "UserKnownHostsFile /home/pi/.ssh/known_hosts" -o "IdentityFile /home/pi/.ssh/ssh-tunnel_ed25519" -o "PubkeyAuthentication=yes" -o "PasswordAuthentication=no" -o "ServerAliveInterval 900" -TN -R30582:localhost:5724 -p 7261 ssh-tunnel@starbeamrainbowlabs.com'

[Install]
WantedBy=network-online.target

I ended up further modifying the above to set up an additional tunnel to allow elessar to send emails via the postfix email server that's running on starbeamrainbowlabs.com. Let me know if you'd be interested in a tutorial on this!

## How to set up a WebDav share with Nginx

I've just been setting up a WebDav share on a raspberry pi 3 for my local network (long story), and since it was a bit of a pain to set up (and I had to combine a bunch of different tutorials out there to make mine work), I thought I'd share how I did it here.

I'll assume you have a raspberry pi all set up and up-to-date in headless mode, with a ufw for your firewall (if you need help with this, post in the comments below or check out the Raspberry Pi Stack Exchange). To start with, we need to install the nginx-full package:

sudo apt update
sudo apt install  nginx-full

Note that we need the nginx-full package here, because the nginx-extras or just simply nginx packages don't include the required additional webdav support modules. Next, we need to configure Nginx. Nginx's configuration files live at /etc/nginx/nginx.conf, and in /etc/nginx/conf.d. I did something like this for my nginx.conf:

user www-data;
worker_processes 4;
pid /run/nginx.pid;

events {
worker_connections 768;
# multi_accept on;
}

http {

##
# Basic Settings
##

sendfile on;
tcp_nopush on;
tcp_nodelay on;
keepalive_timeout 65;
types_hash_max_size 2048;
# server_tokens off;

# server_names_hash_bucket_size 64;
# server_name_in_redirect off;

include /etc/nginx/mime.types;
default_type application/octet-stream;

##
# SSL Settings
##

ssl_protocols TLSv1 TLSv1.1 TLSv1.2; # Dropping SSLv3, ref: POODLE
ssl_prefer_server_ciphers on;

##
# Logging Settings
##

access_log /var/log/nginx/access.log;
error_log /var/log/nginx/error.log;

##
# Gzip Settings
##

gzip on;

gzip_vary on;
gzip_proxied any;
gzip_comp_level 6;
gzip_buffers 16 8k;
gzip_http_version 1.1;
gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

##
# Virtual Host Configs
##

include /etc/nginx/conf.d/*.conf;
include /etc/nginx/sites-enabled/*;
}

Not many changes here. Then, I created a file called 0-webdav.conf in the conf.d directory, and this is what I put in it:

server {
listen 80;
listen [::]:80;

server_name plans.helenshydrogen.be;

auth_basic              realm_name;

dav_methods     PUT DELETE MKCOL COPY MOVE;
dav_ext_methods PROPFIND OPTIONS;
dav_access      user:rw group:rw all:r

client_body_temp_path   /tmp/nginx/client-bodies;
client_max_body_size    0;
create_full_put_path    on;

root /mnt/hydroplans;
}

Now this is where the magic happens. The dav_access directive tells nginx to allow everyone to read, but only logged in users to write to the share. This isn't actually particularly relevant, because of the auth_basic and auth_basic_user_file directives, which tell nginx to require people to login to the share before they are allowed to access it.

It's also important to note that I found that Windows (10, at least), didn't like the basic authentication - even though Ubuntu's Nautilus accepted it just fine - so I had to comment that bit out :-(

If you do still want authentication (hey! May you'll have better luck than I :P), then you'll need to set up the passwords file. Here's how you create it:


echo -n 'helen:' | sudo tee /etc/nginx/.passwords.list
openssl passwd -apr1 | sudo tee -a /etc/nginx/.passwords.list


The above creates a user called helen, and asks you to type a password. If you're adding another user to the file, simply change the first tee to be tee -a to avoid overwriting the first one.

With that all configured, it's time to test the configuration file, and, if we're lucky, restart nginx!


sudo nginx -t
sudo systemctl restart nginx


That's all you should need to do to set up a simple WebDav share. Remember that this is a starting point, and not an ending point - there are a few big holes in the above that you'll need to address, depending on your use case (for example, I haven't included the setup of https / encryption - try letsencrypt for that).

Here are the connection details for the above for a few different clients:

• Ubuntu / Nautilus: (Go to "Other Locations" and paste this into the "Connect to Server" box) dav://plans.helenshydrogen.be/
• Windows: (Go to "Map Network Drive" and paste this in) http://plans.helenshydrogen.be/`

Did this work for you? Have any problems? Got instructions for a WebDav client not listed here? Let me know in the comments!

Art by Mythdael