Archive

Tag Cloud

3d account algorithms android announcement architecture archives arduino artificial intelligence artix assembly async audio automation backups bash batch blog bookmarklet booting bug hunting c sharp c++ challenge chrome os code codepen coding conundrums coding conundrums evolved command line compilers compiling compression css dailyprogrammer debugging demystification distributed computing documentation downtime electronics email embedded systems encryption es6 features event experiment external first impressions future game github github gist gitlab graphics hardware hardware meetup holiday holidays html html5 html5 canvas infrastructure interfaces internet interoperability io.js jabber jam javascript js bin labs learning library linux lora low level lua maintenance manjaro network networking nibriboard node.js operating systems performance photos php pixelbot portable privacy problem solving programming problems projects prolog protocol protocols pseudo 3d python reddit redis reference releases resource review rust searching secrets security series list server software sorting source code control statistics storage svg technical terminal textures three thing game three.js tool tutorial twitter ubuntu university update updates upgrade version control virtual reality virtualisation visual web website windows windows 10 xmpp xslt

Automatically rotating log files on Linux

I'm rather busy at the moment with University, but I thought I'd post about Linux's log rotating system, which I've discovered recently. This post is best read as a follow-up to my earlier post, creating a system service with systemd, in which I talk about how to write a systemd service file - and how to send the output of your program to syslog - which will put it in /var/log for you.

Log rotating is the practice of automatically renaming and moving log files around at regular intervals - and keeping only so many log files at once. For example, I might define the following rules:

• Rotate the log files every week
• Keep 10 log files in total
• Compress log files past the 2nd one

This would yield me a set of log files like this, for instance:

dpkg.log
dpkg.log.1
dpkg.log.2.gz
dpkg.log.3.gz
dpkg.log.4.gz
dpkg.log.5.gz
dpkg.log.6.gz
dpkg.log.7.gz
dpkg.log.8.gz
dpkg.log.9.gz
dpkg.log.10.gz

When the logs are next rotated, the last one is deleted and all the rest are renamed sequentially - like 10 in the bed.

Compressing log files is good for saving space, but in order to read them again we have to fiddle about with zcat / gzip.

The log rotating system on Linux is a cron job that runs at regular intervals - it doesn't run as a system service. It's configured by a series of files in /etc/logrotate.d/ - 1 for each service that has log files that want rotating automatically. Here's an example definition file:

/var/log/rhinoreminds/rhinoreminds.log {
rotate 12
weekly
missingok
notifempty
compress
delaycompress
}

Basically you specify the filename first, and then a bunch of directives to tell it what to do inside { }. The above is for RhinoReminds, an XMPP reminder bot I've written, and defines the following:

• Keep 12 log files in the rotation cycle
• Rotate the logs every week
• It's ok if the log file doesn't exist
• Don't rotate the log file if it's empty
• Compress log files on rotation if they aren't already
• .....but delay this by 1 rotation cycle

Very cool! This should produce the following:

/var/log/rhinoreminds/rhinoreminds.log
/var/log/rhinoreminds/rhinoreminds.log.1
/var/log/rhinoreminds/rhinoreminds.log.2.gz
/var/log/rhinoreminds/rhinoreminds.log.3.gz
/var/log/rhinoreminds/rhinoreminds.log.4.gz
/var/log/rhinoreminds/rhinoreminds.log.5.gz
/var/log/rhinoreminds/rhinoreminds.log.6.gz
/var/log/rhinoreminds/rhinoreminds.log.7.gz
/var/log/rhinoreminds/rhinoreminds.log.8.gz
/var/log/rhinoreminds/rhinoreminds.log.9.gz
/var/log/rhinoreminds/rhinoreminds.log.10.gz
/var/log/rhinoreminds/rhinoreminds.log.11.gz
/var/log/rhinoreminds/rhinoreminds.log.12.gz

Setup your very own VPN in 10 minutes flat

Hey! Happy new year :-)

I've been looking to setup a personal VPN for a while, and the other week I discovered a rather brilliant project called PiVPN, which greatly simplifies the process of setting one up - and managing it thereafter.

It's been working rather well so far, so I thought I'd post about it so you can set one up for yourself too. But first though, we should look at the why. Why a VPN? What does it do?

Basically, a VPN let you punch a great big hole in the network that you're connected to and appear as if you're actually on a network elsewhere. The extent to which this is the case varies depending on the purpose, (for example a University or business might setup a VPN that allows members to access internal resources, but doesn't route all traffic through the VPN), but the general principle is the same.

It's best explained with a diagram. Imagine you're at a Café:

Everyone on the Café's WiFi can see the internet traffic you're sending out. If any of it is unencrypted, then they can additionally see the content of said traffic - e.g. emails you send, web pages you load, etc. Even if it's encrypted, statistical analysis can reveal which websites you're visiting and more.

If you don't trust a network that you're connected to, then by utilising a VPN you can create an encrypted tunnel to another location that you do trust:

Then, all that the other users of the Café's WiFi will see is an encrypted stream of packets - all heading for the same destination. All they'll know is roughly how much traffic you're sending and receiving, but not to where.

This is the primary reason that I'd like my own VPN. I trust the network I've got setup in my own house, so it stands to reason that I'd like to setup a VPN server there, and pretend that my devices when I'm out and about are still at home.

In theory, I should be able to access the resources on my home network too when I'm using such a VPN - which is an added bonus. Other reasons do exist for using a VPN, but I won't discuss them here.

In terms of VPN server software, I've done a fair amount of research into the different options available. My main criteria are as follows:

• Fairly easy to install
• Easy to understand what it's doing once installed (transparency)
• Easy to manage

The 2 main technologies I came across were OpenVPN and IPSec. Each has their own strengths & weaknesses. An IPSec VPN is, apparently, more efficient - especially since it executes on the client in kernel-space instead of user-space. It's a lighter protocol, too - leading to less overhead. It's also much more likely to be detected and blocked when travelling through strict firewalls, making me slightly unsure about it.

OpenVPN, on the other hand, executes entirely in user-space on both the client and the server - leading to a slightly greater overhead (especially with the mitigations for the recent Spectre & Meltdown hardware bugs). It does, however, use TLS (though over UDP by default). This characteristic makes it much more likely it'll slip through stricter firewalls. I'm unsure if that's a quality that I'm actually after or not.

Ultimately, it's the ease of management that points the way to my final choice. Looking into it, with both choices there's complex certificate management to be done whenever you want to add a new client to the VPN. For example, with StrongSwan (an open-source IPSec VPN program), you've got to generate a number of certificates with a chain of rather long commands - and the users themselves have passwords stored in plain text in a file!

While I've got no problem with reading and understanding such commands, I do have a problem with rememberability. If I want to add a new client, how easy is that to do? How long would I have to spend re-reading documentation to figure out how to do it?

Sure, I could write a program to manage the configuration files for me, but that would also require maintenance - and probably take much longer than I anticipate to write.

I forget where I found it, but it is for this reason that I ultimately decided to choose PiVPN. It's a set of scripts that sets up and manages one's an OpenVPN installation. To this end, it provides a single command - pivpn - that can be used to add, remove, and list clients and their statistics. With a concise help text, it makes it easy to figure out how to perform common tasks utilising existing terminal skills by conforming to established CLI interface norms.

If you want to install it yourself, then simply do this:

curl -L https://install.pivpn.io | bash

curl -L https://install.pivpn.io | less

Once you're happy that it's not going to do anything malign to your system, proceed with the installation by executing the 1st command. It should guide you through a number of screens. Some important points I ran into:

• It asks you to install and enable unattended-upgrades. You should probably do this, but I ended up skipping this - as I've already got apticron setup and sending me regular emails - as I rather like to babysit the upgrade of packages on the main machines I manage. I might look into unattended-upgrades in the future if I acquire more servers than are comfortable to manage this way.
• Make sure you fully update your system before running the installation. I use this command: sudo apt update && sudo apt-get dist-upgrade && sudo apt-get autoclean && sudo apt-get autoremove
• Changing the port of the VPN isn't a bad idea, since PiVPN will automatically assemble .ovpn configuration files for you. I didn't end up doing this to start with, but I can always change it in the NAT rule I configured on my router later.
• Don't forget to allow OpenVPN through your firewall! For ufw users (like me), then it's something like sudo ufw allow <port_number>/udp.
• Don't forget to setup a NAT rule / port forwarding on your router if said server doesn't have a public IP address (if it's IPv4 it probably doesn't). If you're confused on this point, comment below and I'll blog about it. It's..... a complicated topic.

If you'd like a more in-depth guide to setting up PiVPN, then I can recommend this guide. It's a little bit dated (PiVPN now uses elliptical-curve cryptography by default), but still serves to illustrate the process pretty well.

If you're confused about some of the concepts I've presented here - leave a comment below! I'm happy to explain them in more detail. Who knows - I might end up writing another blog post on the subject....

Backing up to AWS S3 with duplicity

The server that this website runs on backs up automatically to the Simple Storage Service, provided by Amazon Web Services. Such an arrangement is actually fairly cheap - only ~20p/month! I realised recently that although I've blogged about duplicity before (where I discussed using an external hard drive), I never covered how I fully automate the process here on starbeamrainbowlabs.com.

(Above: A bunch of hard drives. The original can be found here.)

It's fairly similar in structure to the way it works backing up to an external hard drive - just with a few different components here and there, as the script that drives this is actually older than the one that backs up to an external hard drive.

To start, we'll need an AWS S3 bucket. I'm not going to cover how to do this here, as the AWS interface keeps changing, and this guide will likely become outdated quickly. Instead, the AWS S3 documentation has an official guide on how to create one. Make sure it's private, as you don't want anyone getting a hold of your backups!

With that done, you should have both an access key and a secret. Note these down in a file called .backup-password in a new directory that will hold the backup script like this:

#!/usr/bin/env bash
AWS_ACCESS_KEY_ID="INSERT_AWS_ACCESS_KEY_HERE";
AWS_SECRET_ACCESS_KEY="INSERT_AWS_SECRET_KEY_HERE";

The PASSPHRASE here should be a long and unintelligible string of random characters, and will encrypt your backups. Note that down somewhere safe too - preferably in your password manager or somewhere else at least as secure.

If you're on Linux, you should also set the permissions on the .backup-password file to ensure nobody gets access to it who shouldn't. Here's how I did it:

sudo chown root:root .backup-password
sudo chmod 0400 .backup-password

This ensures that only the root user is able to read the file - and nobody can write to it. With our secrets generated and safely stored, we can start writing the backup script itself. Let's start by reading in the secrets:

#!/usr/bin/env bash
source /root/.backup-password

I stored my .backup-password file in /root. Next, let's export these values. This enables the subprocesses we invoke to access these environment variables:

export PASSPHRASE;
export AWS_ACCESS_KEY_ID;
export AWS_SECRET_ACCESS_KEY;

Now it's time to do the backup itself! Here's what I do:

duplicity \
--full-if-older-than 2M \
--exclude /proc \
--exclude /sys \
--exclude /tmp \
--exclude /dev \
--exclude /mnt \
--exclude /var/cache \
--exclude /var/tmp \
--exclude /var/backups \
--exclude /srv/www-mail/rainloop/v \
--s3-use-new-style --s3-european-buckets --s3-use-ia \
/ s3://s3-eu-west-1.amazonaws.com/INSERT_BUCKET_NAME_HERE

Compressed version:

duplicity --full-if-older-than 2M --exclude /proc --exclude /sys --exclude /tmp --exclude /dev --exclude /mnt --exclude /var/cache --exclude /var/tmp --exclude /var/backups --exclude /srv/www-mail/rainloop/v --s3-use-new-style --s3-european-buckets --s3-use-ia / s3://s3-eu-west-1.amazonaws.com/INSERT_BUCKET_NAME_HERE

This might look long and complicated, but it's mainly due to the large number of directories that I'm excluding from the backup. The key options here are --full-if-older-than 2M and --s3-use-ia, which specify I want a full backup to be done every 2 months and to use the infrequent access pricing tier to reduce costs.

The other important bit here is to replace INSERT_BUCKET_NAME_HERE with the name of the S3 bucket that you created.

Backing is all very well, but we want to remove old backups too - in order to avoid ridiculous bills (AWS are terrible for this - there's no way that you can set a hard spending limit! O.o). That's fairly easy to do:

duplicity remove-older-than 4M \
--force \
--s3-use-new-style --s3-european-buckets --s3-use-ia \
s3://s3-eu-west-1.amazonaws.com/INSERT_BUCKET_NAME_HERE

Again, don't forget to replace INSERT_BUCKET_NAME_HERE with the name of your S3 bucket. Here, I specify I want all backups older than 4 months (the 4M bit) to be deleted.

It's worth noting here that it may not actually be able to remove backups older than 4 months here, as it can only delete a full backup if there are not incremental backups that depend on it. To this end, you'll need to plan for potentially storing (and being charged for) an extra backup cycle's worth of data. In my case, that's an extra 2 months worth of data.

That's the backup part of the script complete. If you want, you could finish up here and have a fully-working backup script. Personally, I want to know how much data is in my S3 bucket - so that I can get an idea as to how much I'll be charged when the bill comes through - and also so that I can see if anything's going wrong.

Unfortunately, this is a bit fiddly. Basically, we have to utilise the AWS command-line interface to recursively list the entire contents of our S3 bucket in summarising mode in order to get it to tell us what we want to know. Here's how to do that:

aws s3 ls s3://INSERT_BUCKET_BAME_HERE --recursive --human-readable --summarize

Don't forget to replace INSERT_BUCKET_BAME_HERE wiith your bucket's name. The output from this is somewhat verbose, so I ended up writing an awk script to process it and output something nicer. Said awk script looks like this:

/^\s*Total\s+Objects/ { parts[i++] = $3 } /^\s*Total\s+Size/ { parts[i++] =$3; parts[i++] = $4; } END { print( "AWS S3 Bucket Status:", parts[0], "objects, totalling " parts[1], parts[2] ); } If we put all that together, it should look something like this: aws s3 ls s3://INSERT_BUCKET_BAME_HERE --recursive --human-readable --summarize | awk '/^\s*Total\s+Objects/ { parts[i++] =$3 } /^\s*Total\s+Size/ { parts[i++] = $3; parts[i++] =$4; } END { print("AWS S3 Bucket Status:", parts[0], "objects, totalling " parts[1], parts[2]); }'

...it's a bit of a mess. Perhaps I should look at putting that awk script in a separate file :P Anyway, here's some example output:

AWS S3 Bucket Status: 602 objects, totalling 21.0 GiB Very nice indeed. To finish off, I'd rather like to know how long it took to do all this. Thankfully, bash has an inbuilt automatic variable that holds the number of seconds since the current process has started, so it's just a case of parsing this out into something readable:

echo "Done in $(($SECONDS / 3600))h $((($SECONDS / 60) % 60))m $(($SECONDS % 60))s.";

...I forget which Stackoverflow answer it was that showed this off, but if you know - please comment below and I'll update this to add credit. This should output something like this:

Done in 0h 12m 51s.

Awesome! We've now got a script that backs up to AWS S3, deletes old backups, and tells us both how much space on S3 is being used and how long the whole process took.

I'm including the entire script at the bottom of this post. I've changed it slightly to add a single variable for the bucket name - so there's only 1 place on line 9 (highlighted) you need to update there.

(Above: A Geopattern, tiled using the GNU Image Manipulation Program)


#!/usr/bin/env bash

# Make sure duplicity exists
test -x $(which duplicity) || exit 1; # Pull in the password . /root/.backup-password AWS_S3_BUCKET_NAME="INSERT_BUCKET_NAME_HERE"; # Allow duplicity to access it export PASSPHRASE; export AWS_ACCESS_KEY_ID; export AWS_SECRET_ACCESS_KEY; # Actually do the backup # Backup strategy: # 1 x backup per week: # 1 x full backup per 2 months # incremental backups in between # S3 Bucket URI: https://${AWS_S3_BUCKET_NAME}/
echo [ $(date +%F%r) ] Performing backup. duplicity --full-if-older-than 2M --exclude /proc --exclude /sys --exclude /tmp --exclude /dev --exclude /mnt --exclude /var/cache --exclude /var/tmp --exclude /var/backups --exclude /srv/www-mail/rainloop/v --s3-use-new-style --s3-european-buckets --s3-use-ia / s3://s3-eu-west-1.amazonaws.com/${AWS_S3_BUCKET_NAME}

# Remove old backups
# You have to plan for 1 extra full backup cycle when
# calculating space requirements - duplicity only
# removes a backup if it won't invalidate those further
# along the chain - the oldest backup will always be
# a full one.
echo [ $(date +%F%r) ] Backup complete. Removing old volumes. duplicity remove-older-than 4M --force --encrypt-key F2A6D8B6 --s3-use-new-style --s3-european-buckets --s3-use-ia s3://s3-eu-west-1.amazonaws.com/${AWS_S3_BUCKET_NAME}
echo [ $(date +%F%r) ] Cleanup complete. aws s3 ls s3://${AWS_S3_BUCKET_NAME} --recursive --human-readable --summarize | awk '/^\s*Total\s+Objects/ { parts[i++] = $3 } /^\s*Total\s+Size/ { parts[i++] =$3; parts[i++] = $4; } END { print("AWS S3 Bucket Status:", parts[0], "objects, totalling " parts[1], parts[2]); }' echo "Done in$(($SECONDS / 3600))h$((($SECONDS / 60) % 60))m$(($SECONDS % 60))s.";  Write an XMPP bot in half an hour Recently I've looked at using AI to extract key information from natural language, and creating a system service with systemd. The final piece of the puzzle is to write the bot itself - and that's what I'm posting about today. Since not only do I use XMPP for instant messaging already but it's an open federated standard, I'll be building my bot on top of it for maximum flexibility. To talk over XMPP programmatically, we're going to need library. Thankfully, I've located just such a library which appears to work well enough, called S22.XMPP. Especially nice is the comprehensive documentation that makes development go much more smoothly. With our library in hand, let's begin! Our first order of business is to get some scaffolding in place to parse out the environment variables we'll need to login to an XMPP account. using System; using System.Linq; using System.Threading; using System.Threading.Tasks; using S22.Xmpp; using S22.Xmpp.Client; using S22.Xmpp.Im; namespace XmppBotDemo { public static class MainClass { // Needed later private static XmppClient client; // Settings private static Jid ourJid = null; private static string password = null; public static int Main(string[] args) { // Read in the environment variables ourJid = new Jid(Environment.GetEnvironmentVariable("XMPP_JID")); password = Environment.GetEnvironmentVariable("XMPP_PASSWORD"); // Ensure they are present if (ourJid == null || password == null) { Console.Error.WriteLine("XMPP Bot Demo"); Console.Error.WriteLine("============="); Console.Error.WriteLine(""); Console.Error.WriteLine("Usage:"); Console.Error.WriteLine(" ./XmppBotDemo.exe"); Console.Error.WriteLine(""); Console.Error.WriteLine("Environment Variables:"); Console.Error.WriteLine(" XMPP_JID Required. Specifies the JID to login with."); Console.Error.WriteLine(" XMPP_PASSWORD Required. Specifies the password to login with."); return 1; } // TODO: Connect here return 0; } } } Excellent! We're reading in & parsing 2 environment variables: XMPP_JID (the username), and XMPP_PASSWORD. It's worth noting that you can call these environment variables anything you like! I chose those names as they describe their contents well. It's also worth mentioning that it's important to use environment variables for secrets passing them as command-line arguments cases them to be much more visible to other uses of the system! Let's connect to the XMPP server with our newly read-in credentials: // Create the client instance client = new XmppClient(ourJid.Domain, ourJid.Node, password); client.Error += errorHandler; client.SubscriptionRequest += subscriptionRequestHandler; client.Message += messageHandler; client.Connect(); // Wait for a connection while (!client.Connected) Thread.Sleep(100); Console.WriteLine($"[Main] Connected as {ourJid}.");

// Wait forever.

// TODO: Automatically reconnect to the server when we get disconnected.

Cool! Here, we create a new instance of the XMPPClient class, and attach 3 event handlers, which we'll look at later. We then connect to the server, and then wait until it completes - and then write a message to the console. It looks like S22.Xmpp spins up a new thread, so unfortunately we can't catch any errors it throws with a traditional try-catch statement. Instead, we'll have to ensure we're really careful that we catch any exceptions we throw accidentally - otherwise we'll get disconnected!

It does appear that XmppClient catches some errors though, which trigger the Error event - so we should attach an event handler to that.

/// <summary>
/// Handles any errors thrown by the XMPP client engine.
/// </summary>
private static void errorHandler(object sender, ErrorEventArgs eventArgs) {
Console.Error.WriteLine($"Error: {eventArgs.Reason}"); Console.Error.WriteLine(eventArgs.Exception); } Before a remote contact is able to talk to our bot, they will send us a subscription request - which we'll need to either accept or reject. This is also done via an event handler. It's the SubscriptionRequest one this time: /// <summary> /// Handles requests to talk to us. /// </summary> /// <remarks> /// Only allow people to talk to us if they are on the same domain we are. /// You probably don't want this for production, but for developmental purposes /// it offers some measure of protection. /// </remarks> /// <param name="from">The JID of the remote user who wants to talk to us.</param> /// <returns>Whether we're going to allow the requester to talk to us or not.</returns> public static bool subscriptionRequestHandler(Jid from) { Console.WriteLine($"[Handler/SubscriptionRequest] {from} is requesting access, I'm saying {(from.Domain == ourJid.Domain?"yes":"no")}");
return from.Domain == ourJid.Domain;
}

This simply allows anyone on our own domain to talk to us. For development purposes this will offer us some measure of protection, but for production you should probably implement a whitelisting or logging system here.

The other interesting thing we can do here is send a user a chat message to either welcome them to the server, or explain why we rejected their request. To do this, we need to write a pair of utility methods, as sending chat messages with S22.Xmpp is somewhat over-complicated:

#region Message Senders

/// <summary>
/// Sends a chat message to the specified JID.
/// </summary>
/// <param name="to">The JID to send the message to.</param>
/// <param name="message">The messaage to send.</param>
private static void sendChatMessage(Jid to, string message)
{
//Console.WriteLine($"[Bot/Send/Chat] Sending {message} -> {to}"); client.SendMessage( to, message, null, null, MessageType.Chat ); } /// <summary> /// Sends a chat message in direct reply to a given incoming message. /// </summary> /// <param name="originalMessage">Original message.</param> /// <param name="reply">Reply.</param> private static void sendChatReply(Message originalMessage, string reply) { //Console.WriteLine($"[Bot/Send/Reply] Sending {reply} -> {originalMessage.From}");
client.SendMessage(
);
}

#endregion


The difference between these 2 methods is that one sends a reply directly to a message that we've received (like a threaded reply), and the other simply sends a message directly to another contact.

Now that we've got all of our ducks in a row, we can write the bot itself! This is done via the Message event handler. For this demo, we'll write a bot that echo any messages to it in reverse:

/// <summary>
/// Handles incoming messages.
/// </summary>
private static void messageHandler(object sender, MessageEventArgs eventArgs) {
Console.WriteLine($"[Bot/Handler/Message] {eventArgs.Message.Body.Length} chars from {eventArgs.Jid}"); char[] messageCharArray = eventArgs.Message.Body.ToCharArray(); Array.Reverse(messageCharArray); sendChatReply( eventArgs.Message, new string(messageCharArray) ); } Excellent! That's our bot complete. The full program is at the bottom of this post. Of course, this is a starting point - not an ending point! A number of issues with this demo stand out. There isn't a whitelist, and putting the whole program in a single file doesn't sound like a good idea. The XMPP logic should probably be refactored out into a separate file, in order to keep the input settings parsing separate from the bot itself. Other issues that probably need addressing include better error handling and more - but fixing them all here would complicate the example rather. Edit: The code is also available in a git repository if you'd like to clone it down and play around with it :-) Found this interesting? Got a cool use for it? Still confused? Comment below! Complete Program using System; using System.Linq; using System.Threading; using System.Threading.Tasks; using S22.Xmpp; using S22.Xmpp.Client; using S22.Xmpp.Im; namespace XmppBotDemo { public static class MainClass { private static XmppClient client; private static Jid ourJid = null; private static string password = null; public static int Main(string[] args) { // Read in the environment variables ourJid = new Jid(Environment.GetEnvironmentVariable("XMPP_JID")); password = Environment.GetEnvironmentVariable("XMPP_PASSWORD"); // Ensure they are present if (ourJid == null || password == null) { Console.Error.WriteLine("XMPP Bot Demo"); Console.Error.WriteLine("============="); Console.Error.WriteLine(""); Console.Error.WriteLine("Usage:"); Console.Error.WriteLine(" ./XmppBotDemo.exe"); Console.Error.WriteLine(""); Console.Error.WriteLine("Environment Variables:"); Console.Error.WriteLine(" XMPP_JID Required. Specifies the JID to login with."); Console.Error.WriteLine(" XMPP_PASSWORD Required. Specifies the password to login with."); return 1; } // Create the client instance client = new XmppClient(ourJid.Domain, ourJid.Node, password); client.Error += errorHandler; client.SubscriptionRequest += subscriptionRequestHandler; client.Message += messageHandler; client.Connect(); // Wait for a connection while (!client.Connected) Thread.Sleep(100); Console.WriteLine($"[Main] Connected as {ourJid}.");

// Wait forever.

// TODO: Automatically reconnect to the server when we get disconnected.

return 0;
}

#region Event Handlers

/// <summary>
/// Handles requests to talk to us.
/// </summary>
/// <remarks>
/// Only allow people to talk to us if they are on the same domain we are.
/// You probably don't want this for production, but for developmental purposes
/// it offers some measure of protection.
/// </remarks>
/// <param name="from">The JID of the remote user who wants to talk to us.</param>
/// <returns>Whether we're going to allow the requester to talk to us or not.</returns>
public static bool subscriptionRequestHandler(Jid from) {
Console.WriteLine($"[Handler/SubscriptionRequest] {from} is requesting access, I'm saying {(from.Domain == ourJid.Domain?"yes":"no")}"); return from.Domain == ourJid.Domain; } /// <summary> /// Handles incoming messages. /// </summary> private static void messageHandler(object sender, MessageEventArgs eventArgs) { Console.WriteLine($"[Handler/Message] {eventArgs.Message.Body.Length} chars from {eventArgs.Jid}");
char[] messageCharArray = eventArgs.Message.Body.ToCharArray();
Array.Reverse(messageCharArray);
eventArgs.Message,
new string(messageCharArray)
);
}

/// <summary>
/// Handles any errors thrown by the XMPP client engine.
/// </summary>
private static void errorHandler(object sender, ErrorEventArgs eventArgs) {
Console.Error.WriteLine($"Error: {eventArgs.Reason}"); Console.Error.WriteLine(eventArgs.Exception); } #endregion #region Message Senders /// <summary> /// Sends a chat message to the specified JID. /// </summary> /// <param name="to">The JID to send the message to.</param> /// <param name="message">The messaage to send.</param> private static void sendChatMessage(Jid to, string message) { //Console.WriteLine($"[Rhino/Send/Chat] Sending {message} -> {to}");
client.SendMessage(
to, message,
null, null, MessageType.Chat
);
}
/// <summary>
/// Sends a chat message in direct reply to a given incoming message.
/// </summary>
/// <param name="originalMessage">Original message.</param>
{
//Console.WriteLine($"[Rhino/Send/Reply] Sending {reply} -> {originalMessage.From}"); client.SendMessage( originalMessage.From, reply, null, originalMessage.Thread, MessageType.Chat ); } #endregion } } Creating a system service with systemd While I've got some grumblings with systemd over how it handles (or not) certain things, it's the most popular service manager on Linux systems today. By this, I mean it starts and stops the various services (like your SSH server, web server, cron) automatically, according to the rules laid out in special service files. Since it's so popular and I keep having to write services (and look up how to do so every time), I thought I'd write a post here about it to save me the trouble :P Bear in mind that systemd isn't the only service manager out there. Others include OpenRC, runit, upstart, and more! If you're using one of these (I'm looking to investigate using one of these on my next server rebuild), then this tutorial isn't for you. I will probably be releasing a tutorial down the road for OpenRC though, if I get around to having a server running an OS that uses it. systemd stores it's service files in /etc/systemd/system/, so to start we need to create a new file in there: sudo sensible-editor /etc/systemd/system/service_name.service With the new file open in your favourite editor, it's time to set out our service definition. This is done with an ini-like syntax. Here's an example: [Unit] Description=Gitea After=syslog.target rsyslog.service network.target [Service] Type=simple User=git Group=git WorkingDirectory=/srv/git/gitea ExecStart=/srv/git/gitea/gitea web Restart=always Environment=USER=git HOME=/srv/git [Install] WantedBy=multi-user.target The above is a simple service file for Gitea, which is the engine behind my personal git server. Let's go through each section one by one. Firstly, the [Unit] section defines the metadata about the service. It's fairly self explanatory actually - we set the description of the service here, and also the other services (space  separated) that we want our service to be started after with theAfter property. Next comes the [Service] section. This section specifies how it should start the service. We tell it that it's a simple service (in other words it doesn't do anything fancy - other types are available, but we won't use them here), the user and group it should run under, and working directory of the process, and the command (and it's arguments) to execute in order to start the process. In addition, we also tell it to restart the service if it crashes, and set a few environment variables to refine the way Gitea behaves. Very cool! The final section, [Install], simply specifies the systemd-equivalent of which run-level this service should start on. It's very interesting from a how-does-my-system-work perspective - I recommend reading this Stack Exchange answer and this article for more information - it's a topic for another post here on this blog :-) To start this new service, do the following: sudo systemctl daemon-reload sudo systemctl enable service_name.service sudo systemctl start service_name.service This starts our new service and configures it to automatically start when the system first boots. With that taken care of, we've now got the basics down of our very own service file! We can take this further though. What if there's a secret key that we need to pass to a service on startup in an environment variable, but we don't want to specify it in the service because it's world-readable? The answer here is a clever bit of shell scripting. Consider the following service file: [Unit] Description=Awesome XMPP Bot After=network.target prosody.service [Service] Type=simple User=bot WorkingDirectory=/srv/bot ExecStart=/srv/bot/start_service.sh Restart=on-failure # Other Restart options: or always, on-abort, etc [Install] WantedBy=multi-user.target In this case, we've defined a service file for an XMPP bot (public server directory). In order for it to connect to an XMPP server, it needs a JID (a username - formatted like an email address) and password. However, we don't want to specify these directly in the service file because they are secret! Instead, we've specified that it should start a shell script that's located at /srv/bot/start_service.sh instead of the bot itself. Here's the contents of that shell script: #!/usr/bin/env bash source .xmpp_credentials export XMPP_JID; export XMPP_PASSWORD; exec /usr/bin/mono Bot.exe This simple shell script loads the contents of the file .xmpp_credentials, specifies that the XMPP_JID and XMPP_PASSWORD environment variables should be passed to any further (sub) processes, and asks for the current process to be terminated and replaced with an instance of Mono executing our bot's code that stored in Bot.exe (this way we don't have an extra bash process sitting around doing nothing, since it's job is done as soon as we start the bot itself). In this way, we can store our precious private details in a file that we can lock down so that only the bot's user account can read it. Here's what that .xmpp_credentials file might look like: #!/usr/bin/env bash XMPP_JID="bot@bobsrockets.com"; XMPP_PASSWORD="sekret"; ....and if I run ls .xmpp_credentials, I might see something like this: -r-x------ 1 bot bot 104 Nov 10 21:27 .xmpp_credentials Here the file permissions allow only the bot user to read and execute the file, but not modify it (sudo chown bot:bot .xmpp_credentials and sudo chmod 0500 .xmpp_credentials set these permissions for the curious). These completes the tutorial on setting up services with systemd. We've seen how to create service files and make them start on boot (much easier than alternatives like running a command manually or using screen!). We've also learnt a simple way to hide credentials (though more advanced alternatives do exist). Found this useful? Found a better way to do it? Comment below! Sources and Further Reading C# & .NET Terminology Demystified: A Glossary After my last glossary post on LoRa, I thought I'd write another one of C♯ and .NET, as (in typical Microsoft fashion it would seem), they're seems to be a lot of jargon floating around whose meaning is not always obvious. If you're new to C♯ and the .NET ecosystems, I wouldn't recommend tackling all of this at once - especially the bottom ~3 definitions - with those in particular there's a lot to get your head around. C♯ C♯ is an object-oriented programming language that was invented by Microsoft. It's cross-platform, and is usually written in an IDE (Integrated Development Environment), which has a deeper understanding of the code you write than a regular text editor. IDEs include Visual Studio (for Windows) and MonoDevelop (for everyone else). Solution A Solution (sometimes referred to as a Visual Studio Solution) is the top-level definition of a project, contained in a file ending in .sln. Each solution may contain one or more Project Files (not to be confused with the project you're working on itself), each of which gets compiled into a single binary. Each project may have its own dependencies too: whether they be a core standard library, another project, or a NuGet package. Project A project contains your code, and sits 1 level down from a solution file. Normally, a solution file will sit in the root directory of your repository, and the projects will each have their own sub-folders. While each project has a single output file (be that a .dll class library or a standalone .exe executable), a project may have multiple dependencies - leading to many files in the build output folder. The build process and dependency definitions for a project are defined in the .csproj file. This file is written in XML, and can be edited to perform advanced build steps, should you need to do something that the GUI of your IDE doesn't support. I've blogged about the structuring of this file before (see here, and also a bit more here), should you find yourself curious. CIL Known as Common Intermediate Language, CIL is the binary format that C♯ (also Visual Basic and F♯ code) code gets compiled into. From here, the .NET runtime (on Windows) or Mono (on macOS, Linux, etc.) can execute it to run the compiled project. MSBuild The build system for Solutions and Projects. It reads a .sln or .csproj (there are others for different languages, but I won't list them here) file and executes the defined build instructions. .NET Framework The .NET Framework is the standard library of C♯ it provides practically everything you'll need to perform most common tasks. It does not provide a framework for constructing GUIs and Graphical Interfaces. You can browse the API reference over at the official .NET API Browser. WPF The Windows Presentation Foundation is a Windows-only GUI framework. Powered by XAML (eXtensible Application Markup Language) definitions of what the GUI should look like, it provides everything you need to create a native-looking GUI on Windows. It does not work on macOS and Linux. To create a cross-platform program that works on all 3 operating systems, you'll need to use an alternative GUI framework, such as XWT or Gtk# (also: Glade). A more complete list of cross-platform frameworks can be found here. It's worth noting that Windows Forms, although a tempting option, aren't as flexible as the other options listed here. C♯ 7 The 7th version of the C♯ language specification. This includes the syntax of the language, but not the .NET Framework itself. .NET Standard A specification of the .NET Framework, but not the C♯ Language. As of the time of typing, the latest version is 2.0, although version 1.6 is commonly used too. The intention here is the improve cross-platform portability of .NET programs by defining a specification for a subset of the full .NET Framework standard library that all platforms will always be able to use. This includes Android and iOS through the use of Xamarin. Note that all .NET Standard projects are class libraries. In order to create an executable, you'll have to add an additional Project to your Solution that references your .NET Standard class library. ASP.NET A web framework for .NET-based programming languages (in our case C♯). Allows you to write C♯ code to handle HTTP (and now WebSockets) requests in a similar manner to PHP, but different in that your code still needs compiling. Compiled code is then managed by a web server IIS web server (on Windows). With the release of .NET Core, ASP.NET is now obsolete. .NET Core Coming in 2 versions so far (1.0 and 2.0), .NET Core is the replacement for ASP.NET (though this is not its exclusive purpose). As far as I understand it, .NET Core is a modular runtime that allows programs targeting it to run multiple platforms. Such programs can either be ASP.NET Core, or a Universal Windows Platform application for the Windows Store. This question and answer appears to have the best explanation I've found so far. In particular, the displayed diagram is very helpful: ....along with the pair of official "Introducing" blog posts that I've included in the Sources and Further Reading section below. Conclusion We've looked at some of the confusing terminology in the .NET ecosystems, and examined each of them in turn. We started by defining and untangling the process by which your C♯ code is compiled and run, and then moved on to the different variants and specifications related to the .NET Framework and C♯. As always, this is a starting point - not an ending point! I'd recommend doing some additional reading and experimentation to figure out all the details. Found this helpful? Still confused? Spotted a mistake? Comment below! Sources and Further Reading LoRa Terminology Demystified: A Glossary (Above: My 2 RFM95s. One works, but the other doesn't yet....) I've been doing some more experimenting with LoRa recently, as I've got 1 of my 2 RFM95 working (yay)! While the other is still giving me trouble (meaning that I can't have 1 transmit and the other receive yet :-/), I've still been able to experiment with other people's implementations. To that end, I've been learning about a bunch of different words and concepts - and thought that I'd document them all here. LoRa The radio protocol itself is called LoRa, which stands for Long Range. It provides a chirp-based system (more on that later under Bandwidth) to allow 2 devices to communicate over great distances. LoRaWAN LoRaWAN builds on LoRa to provide a complete end-to-end protocol stack to allow Internet of Things (IoT) devices to communicate with an application server and each other. It provides: • Standard device classes (A, B, and C) with defined behaviours • Class A devices can only receive for a short time after transmitting • Class B devices receive on a regular, timed, basis - regardless of when they transmit • Class C devices send and receive whenever they like • The concept of a Gateway for picking up packets and forwarding them across the rest of the network (The Things Network is the largest open implementation to date - you should definitely check it out if you're thinking of using LoRa in a project) • Secure multiple-layered encryption of messages via AES ...amongst many other things. The Things Network The largest open implementation of LoRaWAN that I know of. If you hook into The Things Network's LoRaWAN network, then your messages will get delivered to and from your application server and LoRaWAN-enabled IoT device, wherever you are in the world (so long as you've got a connection to a gateway). It's often abbreviated to TTN. Check out their website. (Above: A coverage map for The Things Network. The original can be found here) Data Rate The data rate is the speed at which a message is transmitted. This is measured in bits-per-second, as LoRa itself is an 'unreliable' protocol (it doesn't guarantee that anyone will pick anything up at the other end). There are a number of preset data rates: Code Speed (bits/second) DR0 250 DR1 440 DR2 980 DR3 1760 DR4 3125 DR5 5470 DR6 11000 DR7 50000 These values are a little different in different places - the above are for Europe on 868MHz. Maximum Payload Size Going hand-in-hand with the Data Rate, the Maximum Payload Size is the maximum number of bytes that can be transmitted in a single packet. If more than the maximum number of bytes needs to be transmitted, then it will be split across multiple packets - much like TCP's Maximum Transmission Unit (MTU), when it comes to that. With LoRa, the maximum payload size varies with the Data Rate - from 230 bytes at DR7 to just 59 at DF2 and below. Spreading Factor Often abbreviated to just simply SF, the spreading factor is also related to the Data Rate. In LoRa, the Spreading Factor refers to the duration of a single chirp. There are 6 defined Spreading Factors: ranging from SF7 (the fastest transmission speed) to SF12 (the slowest transmission speed). Which one you use is up to you - and may be automatically determined by the driver library you use (it's always best to check). At first glance, it may seem optimal to choose SF7, but it's worth noting that the slower speeds achieved by the higher spreading factors can net you a longer range. Data Rate Configuration bits / second Max payload size (bytes) DR0 SF12/125kHz 250 59 DR1 SF11/125kHz 440 59 DR2 SF10/125kHz 980 59 DR3 SF9/125kHz 1 760 123 DR4 SF8/125kHz 3 125 230 DR5 SF7/125kHz 5 470 230 DR6 SF7/250kHz 11 000 230 DR7 FSK: 50kpbs 50 000 230 _(Again, from Exploratory Engineering: Data Rate and Spreading Factor)_ Duty Cycle A Duty Cycle is the amount of time something is active as a percentage of a total time. In the case of LoRa(/WAN?), there is an imposed 1% Duty Cycle, which means that you aren't allowed to be transmitting for more than 1% of the time. Bandwidth Often understood, the Bandwidth is the range of frequencies across which LoRa transmits. The LoRa protocol itself uses a system of 'chirps', which are spread form one end of the Bandwidth to the other going either up (an up-chirp), or down (a down-chirp). LoRahas 2 bandwidths it uses: 125kHz, 250kHz, and 500kHz. (Some example LoRa Chirps. Source: This Article on Link Labs) Frequency Frequency is something that most of us are familiar with. Different wireless protocols utilise different frequencies - allowing them to go about their business in peace without interfering with each other. For example, 2.4GHz and 5GHz are used by WiFi, and 800MHz is one of the frequencies used by 4G. In the case of LoRa, different frequencies are in use in different parts of the world. ~868MHz is used in Europe (443MHz can also be used, but I haven't heard of many people doing so), 915MHz is used in the US, and ~780MHz is used in China. Location Frequency Europe 863 - 870MHz US 902 - 928MHz China 779 - 787MHz (Source: RF Wireless World) Found this helpful? Still confused? Found a mistake? Comment below! Sources and Further Reading https://electronics.stackexchange.com/a/305287/180059 Proxies: What's the difference? You've probably heard of proxies. Perhaps you used one when you were at school to access a website you weren't supposed to. But did you know that there are multiple different types of proxies that are used for different things? For example, a reverse proxy perform load-balancing and caching for your web application? And that a transparent proxy can be used to filter the traffic of your internet connection without you knowing (well, almost)? In this post, I'll be explaining the difference between the different types of proxy I'm aware of, why you'd want one, and how to detect their presence. Reverse Proxies A reverse proxy is one that, when it receives a request, repeats it to an upstream server. For example, I use nginx to reverse-proxy PHP requests to a backend PHP-FPM instance. Reverse proxies also come in really handy if you want to run multiple, perhaps unrelated, servers on a single machine with a single IP address, as they can reverse proxy requests to the right place based on the requested subdomain. For example, on my server I not only serve my website (which in and of itself reverse-proxies PHP requests), but also serves my git server - which is a separate process listening on a different port behind my firewall. Caching is another key feature of reverse proxies that comes in dead useful if you're running a medium-high traffic website. Instead of forwarding every single request to your backend for processing, if you've got a blog, for instance, you could cache the responses to requests for the posts themselves and serve them directly from the reverse proxy, leaving the slower backend free to process comments that people make, for example. Both nginx and Varnish have support for this. This with method, it's possible to serve 1000s of requests a minute from a very modestly sized virtual machine (say, 512MB RAM, 1 CPU) if configured correctly. Take that, Apache! Finally, when 1 server isn't enough any more, your can get reverse proxies like nginx to act as a load balancer. In this scenario, there are multiple backend servers (probably running on different machines, with a fast internal LAN connecting them all), and a single front-facing load balancer sitting in front of them all distributing requests to the backend servers. nginx in particular can get very fancy with the logic here, should you need that kind of control. It can even monitor the health of the backend application servers, and avoid sending any requests to unresponsive servers - giving them time to recover from a crash. Forward Proxies Forward proxies are distinctly different to reverse proxies, in that they make requests to the destination client wants to connect to on their behalf. Such a proxy can be instituted for many reasons. Sometimes, it's for security reasons - for example to ensure that all those connecting to a backend local network are authenticated (authentication with a forward proxy is done via a set of special Proxy- HTTP headers). Other times, it's to preserve data on limited and/or expensive internet connections. More often though, it's to censor and surveil the internet connection of the users on a network - and also to bypass such censoring. It is in this manner that HTTP(S) has become so pervasive - in that companies, institutions, (and, in rare cases), Internet Service Providers install forward proxies to censor the connections of their users - as such proxies usually only understand HTTP and HTTPS (clients request that a forward proxy retrieve something for them via a GET https://bobsrockets.net/ HTTP/1.1 request for example). If you're curious though, some forward proxies these days support the CONNECT HTTP method, allowing one to set up a TLS connection with another server (whether that be an HTTPS, SSH, SMTPS, or other protocol server). In addition, the SOCKS protocol now allows for arbitrary TCP connection to be proxied through as well. Forward proxies nearly always require some client-side configuration. If you've wondered what the proxy settings are in your operating system and web browser's settings - this is what they're for. Such can usually by identified by the Via and other headers that they attach to outgoing requests, as per RFC 2616. Online tools exist that exploit this - allowing you to detect whether such a proxy exists. Transparent Proxies Transparent proxies are similar to forward proxies, but do not require any client-side configuration. Instead, they utilise clever networking tricks to intercept network traffic being sent to and from the clients on a network. In this manner, they can cache responses, filter content, and protect the users from attacks without the client necessarily being aware of their existence. It is important to note here though that utilising a proxy is by no means a substitute for maintaining proper defences on your own computer, such as installing and configuring a firewall, ensuring your system has all the latest updates, and, if you're running windows, ensuring you have an antivirus program installing and running (Windows 10 comes with one automatically these days). Even though they don't usually attach the Via header (as they are supposed to), such proxies can usually be detected by cleverly designed tests that exploit their tendency to cache requests, thankfully. Conclusion So there you have it. We've taken a look at Forward proxies, and the benefits (and drawbacks) they can provide to users. We've also investigated Transparent proxies, and how to detect them. Finally, we've looked at Reverse proxies and the advantages they can provide to enable you to scale and structure your next great web (and other protocol! Nginx supports all sorts of other protocols besides HTTP(S)) application. Job Scheduling on Linux Scheduling jobs to happen at a later time on a Linux based machine can be somewhat confusing. Confused by 5 4 8-10/4 6/4 * baffled by 5 */4 * * *? All will be revealed! cron Scheduling jobs on a Linux machine can be done in several ways. Let's start with cron - the primary program that orchestrates the whole proceeding. Its name comes from the Greek word Chronos, which means time. By filling in a crontab (read cron-table), you can tell it what to do when. It's essentially a time-table of jobs you'd like it to run. Your Linux machine should come with cron installed already. You can check if cron is installed and running by entering this command into your terminal: if [[ "$(pgrep -c cron)" -gt 0 ]]; then echo "Cron is installed :D"; else echo "Cron is not installed :-("; fi

If it isn't installed or running, then you'll have to investigate why this isn't the case. The most common is that it isn't installed. It's normally in the official repositories for most distributions - on Debian-based system sudo apt install cron should suffice. Arch-based users may need to check to make sure that the system service is enabled and do so manually.

With cron setup and ready to go, we can start adding jobs to it. This is done by way of a crontab, as explained above. Each user has their own crontab such that they can each configure their own individual sets jobs. To edit it, type this:

crontab -e

This will open your favourite editor with your crontab ready for editing (if you'd like to change your editor, do sudo update-alternatives --config editor or change the EDITOR environment variable). You should see a bunch of lines like this:

# Edit this file to introduce tasks to be run by cron.
#
# Each task to run has to be defined through a single line
# indicating with different fields when the task will be run
# and what command to run for the task
#
# To define the time you can provide concrete values for
# minute (m), hour (h), day of month (dom), month (mon),
# and day of week (dow) or use '*' in these fields (for 'any').#
# Notice that tasks will be started based on the cron's system
# daemon's notion of time and timezones.
#
# Output of the crontab jobs (including errors) is sent through
# email to the user the crontab file belongs to (unless redirected).
#
# For example, you can run a backup of all your user accounts
# at 5 a.m every week with:
# 0 5 * * 1 tar -zcf /var/backups/home.tgz /home/
#
# For more information see the manual pages of crontab(5) and cron(8)
#
# m h  dom mon dow   command

I'd advise you keep this for future reference - just in case you find yourself in a pinch later - so scroll down to the bottom and start adding your jobs there.

Let's look at the syntax for telling cron about a job next. This is best done by example:

0 1 * * 7   cd /root && /root/run-backup

This job, as you might have guessed, runs a custom backup script. It's one I wrote myself, but that's a story for another time (comment below if you'd like me to post about that). What we're interested in is the bit at the beginning: 0 1 * * 7. Scheduling a cron job is done by specifying 5 space-separated values. In the case of the above, the job will run at 1am every Sunday morning. The order is as follows:

• Minute
• Hour
• Day of the Month
• Month
• Day of the week

For of these values, a number of different specifiers can be used. For example, specifying an asterisk (*) will cause the job to run at every interval of that column - e.g. every minute or every hour. If you want to run something on every minute of the day (such as a logging or monitoring script), use * * * * *. Be aware of the system resources you can use up by doing that though!

Specifying number will restrict it to a specific time in an interval. For example, 10 * * * * will run the job at 10 minutes past every hour, and 22 3 * * * will run a job at 03:22 in the morning every day (I find such times great for maintenance jobs).

Sometimes, every hour or every minute is too often. Cron can handle this too! For example 3 */2 * * * will run a job at 3 minutes past every second hour. You can alter this at your leisure: The value after the forward slash (/) decides the interval (i.e. */3 would be every third, */15 would be every 15th, etc.).

The last column, the day of the week, is an alternative to the day of the month column. It lets you specify, as you may assume, the day oft he week a job should run on. This can be specified in 2 way: With the numbers 0-6, or with 3-letter short codes such as MON or SAT. For example, 6 20 * * WED runs at 6 minutes past 8 in the evening on Wednesday, and 0 */4 * * 0 runs every 4th hour on a Sunday.

The combinations are endless! Since it can be a bit confusing combining all the options to get what you want, crontab.guru is great for piecing cron-job specifications together. It describes your cron-job spec in plain English for you as you type!

(Above: crontab.guru displaying a random cronjob spec)

What if I turn my computer off?

Ok, so cron is all very well, but what if you turn your machine off? Well, if cron isn't running at the time a job should be run, then it won't get executed. For those of us who don't leave their laptops on all the time, all is not lost! It's time to introduce the second piece of software at our disposal.

Enter stage left: anacron. Built to be a complement to cron, anacron sets up 3 folders:

• /etc/cron.daily
• /etc/cron.weekly
• /etc/cron.monthly

Any executable scripts in this folder will be run at daily, weekly, and monthly intervals respectively by anacron, and it respects the hash-bang (that #! line at the beginning of the script) too!

Most server systems do not come with anacron pre-installed, though it should be present if your distributions official repositories. Once you've installed it, edit root's crontab (with sudo crontab -e if you can't remember how) and add a job that executes anacron every hour like so:

# Run anacron every hour
5 * * * *   /usr/sbin/anacron

This is important, as anacron does not in itself run all the time like cron does (this behaviour is called a daemon in the Linux world) - it needs a helping hand to get it to run.

If you've got more specific requirements, then anacron also has it's own configuration file you can edit. It's found at /etc/anacrontab, and has a different syntax. In the anacron table, jobs follow the following pattern:

• period - The interval, in days, that the job should run
• delay - The offset, in minutes, that the job should run at
• job identifier - A textual identifier (without spaces, of course) that identifies the job
• command - The command that should be executed

You'll notice that there are 3 jobs specified already - one for each of the 3 folders mentioned above. You can specify your own jobs too. Here's an example:

# Do the weekly backup
7   20  run-backup  cd /root/data-shape-backup && ./do-backup;

The above job runs every 7 days, with an offset of 20 minutes. Note that I've included a command (the line starting with a hash #) to remind myself as to what the job does - I'd recommend you always include such a comment for your own reference - whether you're using cron, anacron, or otherwise.

I'd also recommend that you test your anacron configuration file after editing it to ensure it's valid. This is done like so:

anacron -T

I'm not an administrator, can I still use this?

Sure you can! If you've got anacron installed (you could even compile it from source locally if you haven't) and want to specify some jobs for your local account, then that's easily done too. Just create an anacrontab file anywhere you please, and then in your regular crontab (crontab -e), tell anacron where you put it like this:

# Run anacron every hour
5 * * * *   /usr/sbin/anacron -t "path/to/anacrontab"

Good point. cron and anacron are great for repeating jobs, but what if you want to set up a one-off job to auto-disable your firewall before enabling it just in case you accidentally lock yourself out? Thankfully, there's even an answer for this use-case too: atd.

atd is similar to cron in that it runs a daemon in the background, but instead of executing jobs specified in a crontab, you tell it when you want it to execute a series of commands, and then enter the commands themselves. For example:

$at now + 10 minutes warning: commands will be executed using /bin/sh at> echo -e "Testing" at> uptime at> <EOT> job 4 at Thu Jul 12 14:36:00 2018 In the above, I tell it to run the job 10 minutes from now, and enter a pair of commands. To end the command list, I hit CTRL + D on an empty line. The output of the job will be emailed to me automatically if I've got that set up (cron and anacron also do this). Specifying a time can be somewhat fiddly, but its also quite flexible: • at tomorrow • at now + 5 hours • at 16:06 • at next month • at 2018 09 25 ....and so on. Listing the current scheduled jobs is also just as easy: atq This will output a list of scheduled jobs that haven't been run yet. You can't see any jobs that aren't created by you unless you're root (use sudo), though. You can use the job ids listed here to cancel a job too: # Remove job id 4: atrm 4 Conclusion That just about concludes this whirlwind tour of job scheduling on Linux systems. We've looked at how to schedule jobs with cron, and how to ensure our jobs get run - even when the target machine isn't turned on all the time with anacron. We've also looked at one-time jobs with atd, and how to manage the job queue. As usual, this is a starting point - not an ending point! Job scheduling is just the beginning. From here, you can look at setting up automated backups. You could investigate setting up an email server, and how that integrates with cron. You can utilise cron to perform maintenance for your next great web (or other!) application. The possibilities are endless! Found this useful? Still confused? Comment below! Demystifying Inverted Indexes (The magnifying glass in the above banner came from openclipart) After writing the post that will be released after this one, I realised that I made a critical assumption that everyone knew what an inverted index was. Upon looking for an appropriate tutorial online, I couldn't find one that was close enough to what I did in Pepperminty Wiki, so I decided to write my own. First, some context. What's Pepperminty Wiki? Well, it's a complete wiki engine in a single file of PHP. The source files are obviously not a single file, but it builds into a single file - making it easy to drop into any PHP-enabled web server. One of its features is a full-text search engine. A personal wiki of mine has ~75k words spread across ~550 pages, and it manages to search them all in just ~450ms! It does this with the aid of an inverted index - which I'll be explaining in this post. First though, we need some data to index. How about the descriptions of some video games? Kerbal Space Program In KSP, you must build a space-worthy craft, capable of flying its crew out into space, without killing them. At your disposal is a collection of parts, which must be assembled to create a functional ship. Each part has its own function and will affect the way a ship flies (or doesn't). So strap yourself in, and get ready to try some Rocket Science! Cross Code Meet Lea as she logs into an MMO of the distant future. Follow her steps as she discovers a vast world, meets other players and overcomes all the challenges of the game. Fort Meow Fort Meow is a physics game by Upper Class Walrus about a girl, an old book and a house full of cats! Meow. Factory Balls Factory balls is the brand new bonte game I already announced yesterday. Factory balls takes part in the game design competition over at jayisgames. The goal of the design competition was to create a 'ball physics'-themed game. I hope you enjoy it! Very cool, this should provide us with plenty of data to experiment with. Firstly, let's consider indexing. Take the Factory Balls description. We can split it up into tokens like this: T o k e n s V V factory balls is the brand new bonte game i already announced yesterday factory balls takes part in the game design competition over at jayisgames the goal of the design competition was to create a ball physics themed game i hope you enjoy it Notice how we've removed punctuation here, and made everything lowercase. This is important for the next step, as we want to make sure we consider Factory and factory to be the same word - otherwise when querying the index we'd have to remember to get the casing correct. With our tokens sorted, we can now count them to create our index. It's like a sort of tally chart I guess, except we'll be including the offset in the text of every token in the list. We'll also be removing some of the most common words in the list that aren't very interesting - these are known as stop words. Here's an index generated from that Factory Balls text above: Token Frequency Offsets factory 2 0, 12 balls 2 1, 13 brand 1 4 new 1 5 bonte 1 6 game 3 7, 18, 37 i 2 8, 38 announced 1 10 yesterday 1 11 takes 1 14 design 2 19, 28 competition 2 20, 29 jayisgames 1 23 goal 1 25 create 1 32 ball 1 34 physics 1 35 themed 1 36 hope 1 39 enjoy 1 41 Very cool. Now we can generate an index for each page's content. The next step is to turn this into an inverted index. Basically, the difference between the normal index and a inverted index is that an entry in an inverted index contains not just the offsets for a single page, but all the pages that contain that token. For example, the Cross-Code example above also contains the token game, so the inverted index entry for game would contain a list of offsets for both the Factory Balls and Cross-Code pages. Including the names of every page under every different token in the inverted index would be both inefficient computationally and cause the index to grow rather large, so we should assign each page a unique numerical id. Let's do that now: Id Page Name 1 Kerbal Space Program 2 Cross Code 3 Fort Meow 4 Factory Balls There - much better. In Pepperminty Wiki, this is handled by the ids class, which has a pair of public methods: getid($pagename) and getpagename(\$id). If an id can't be found for a page name, then a new id is created and added to the list (Pepperminty Wiki calls this the id index) transparently. Similarly, if a page name can't be found for an id, then null should be returned.

Now that we've got ids for our pages, let's look at generating that inverted index entry for game we talked about above. Here it is:

• Term: game
Id Frequency Offsets
2 1 31
3 1 5
4 3 5, 12, 23

Note how there isn't an entry for page id 1, as the Kerbal Space Program page doesn't contain the token game.

This, in essence, is the basics of inverted indexes. A full inverted index will contain an entry for every token that's found in at least 1 source document - though the approach used here is far from the only way of doing it (I'm sure there are much more advanced ways of doing it for larger datasets, but this came to mind from reading a few web articles and is fairly straight-forward and easy to understand).

Can you write a program that generates a full inverted index like I did in the example above? Try testing it on the test game descriptions at the start of this post.

You may also have noticed that the offsets used here are of the tokens in the list. If you wanted to generate contexts (like Duck Duck Go or Google do just below the title of a result), you'd need to use the character offsets from the source document instead. Can you extend your program to support querying the inverted index, generating contexts based on the inverted index too?

Liked this post? Got your own thoughts on the subject? Having trouble with the challenges at the end? Comment below!

Art by Mythdael