# Archive

## Tag Cloud

3d 3d printing account algorithms android announcement architecture archives arduino artificial intelligence artix assembly async audio automation backups bash batch blog bookmarklet booting bug hunting c sharp c++ challenge chrome os code codepen coding conundrums coding conundrums evolved command line compilers compiling compression css dailyprogrammer data analysis debugging demystification distributed computing documentation downtime electronics email embedded systems encryption es6 features ethics event experiment external first impressions future game github github gist gitlab graphics hardware hardware meetup holiday holidays html html5 html5 canvas infrastructure interfaces internet interoperability io.js jabber jam javascript js bin labs learning library linux lora low level lua maintenance manjaro network networking nibriboard node.js operating systems own your code pepperminty wiki performance phd photos php pixelbot portable privacy problem solving programming problems project projects prolog protocol protocols pseudo 3d python reddit redis reference releases resource review rust searching secrets security series list server software sorting source code control statistics storage svg talks technical terminal textures thoughts three thing game three.js tool tutorial twitter ubuntu university update updates upgrade version control virtual reality virtualisation visual web website windows windows 10 xmpp xslt

## PhD Update 2: The experiment, the data, and the supercomputers

Welcome to another PhD update post. Since last time, a bunch of different things have happened - which I'll talk about here. In particular, 2 distinct strands have become evident: the reading-papers-and-theory bit, and the writing-code-and-testing-stuff-out bit.

At the moment, I'm focusing much more heavily on the writing code and experimental side of things, as I've recently gained access to the 1km resolution rainfall radar dataset from CEDA. While I'm not allowed to share any data that I've now got, I'm pretty sure I'm safe to talk about how terribly it's formatted.

The data itself is organised with 1 directory per year, and 1 file per day inside those directories. Logical so far, right? Each of those files is a tar archive, inside which are the binary files that contain the data itself - with 1 file for every 5 minutes. These are individually compressed with gzip - which seems odd, since they could probably achieve greater compression ratios if they were tarred first and compressed second (the compression algorithm would then be able to exploit the similarities between files too).

The problems arise when you start to parse the binary files themselves. They are in a proprietary format - which has 3 different versions that don't conform to the (limited) documentation. To this end, it's proving somewhat of a challenge to parse them and extract the bits I'm interested in.

To tackle this, I've been using Node.js and a bunch of libraries from npm (noisy pirate mutiny? nearest phase modulator? nasty popsicle machine? nah, it's probably node package manager):

• binary-parser - For parsing the binary files themselves. Allows you to define the format of the file programmatically, and it'll parse it out into a nice object you can then manipulate.
• gunzip-maybe - A streaming library that transparently decompresses a stream if it's gzip-compressed, and passes it through untouched otherwise
• @icetee/ftp - An FTP client library for downloading the files (I know that FTP is insecure, that's all they offer at this time :-/)
• tar-stream - For parsing tar files
• nnng - Stands for No! Not National Grid!. It helps with the conversion between OS national grid references and regular latitude / longitude.
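To give a flavour of the parsing, here's a minimal sketch of reading a binary header in Node.js. The real files are in a proprietary format, so the field layout below is entirely hypothetical - and while I use the binary-parser library for the real thing, this sketch sticks to Node's built-in Buffer API to stay self-contained:

```javascript
// Hypothetical example: the real CEDA format is proprietary, so the
// 8-byte header layout below is invented purely for illustration.
function parseHeader(buffer) {
	return {
		version: buffer.readUInt16BE(0),  // format version
		rows: buffer.readUInt16BE(2),     // grid height
		cols: buffer.readUInt16BE(4),     // grid width
		interval: buffer.readUInt16BE(6), // minutes between scans
	};
}

// Build a test buffer and round-trip it through the parser
const buf = Buffer.alloc(8);
buf.writeUInt16BE(3, 0);    // version 3
buf.writeUInt16BE(2175, 2); // rows
buf.writeUInt16BE(1725, 4); // cols
buf.writeUInt16BE(5, 6);    // 5-minute interval

const header = parseHeader(buf);
console.log(header);
```

binary-parser lets you declare a layout like this once and get a parsed object back, which is much nicer than hand-maintained byte offsets when the format has 3 different versions.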

Aside from the binary file format, I encountered 3 main issues:

1. The data is only a rectangle when using ordnance survey national grid references
2. There's so much data, it needs to be streamed from the remote server
3. Generating a valid gzip file is harder than you expect

Problem 1 here took me a while to figure out. Since, as I mentioned, the documentation is rather limited, I spent much longer than I would have liked attempting to parse the data in latitude / longitude - only to find that it didn't work.

Problem 2 was rather interesting. Taking a cursory glance over the data beforehand revealed that each daily tar file was about 80MiB - and with roughly 5.7K days worth of data (the dataset appears to go back to May 2004-ish), it quickly became clear that I couldn't just download them all and process them later.
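A quick back-of-the-envelope calculation with the rough figures above shows why downloading everything first is a non-starter:

```javascript
// Rough figures from above: ~80MiB per daily tar, ~5.7K days of data
const perDayMiB = 80;
const days = 5700;

const totalGiB = (perDayMiB * days) / 1024;
console.log(totalGiB.toFixed(1) + " GiB"); // → 445.3 GiB
```

Nearly half a terabyte - before you even start decompressing anything.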

It is for this reason that I chose Node.js for this in the first place. For those who haven't encountered it before, it's JavaScript for the server - and it's brilliant for 2 main use-cases: networking and streaming data. Both are characteristics of the problem at hand - so the answer was obvious.

I'm still working on tweaking and improving my final solution, but as it stands, after implementing the extractor on its own, I've also implemented a wrapper that streams the tar archives from the FTP server, stream-reads the tar archives, streams the files inside them into a gzip decompressor, parses the result, and then streams the interesting bits back to disk via a gzip compressor.

That's a lot of streams. The great part about this is that I never accidentally end up with huge chunks of binary files in memory. The only bits that can't be streamed are the binary file parser and the code that extracts the interesting bits.

I'm still working on the last issue, but I've been encountering nasty problems with the built-in zlib gzip compressor transformation stream. When I send a SIGINT (Ctrl + C) to the Node.js process, it doesn't seem to want to finish writing the gzip file correctly - leading to invalid gzip files with chunks missing from the end.

Since the zlib gzip transformation stream is so badly documented, I've ended up replacing it with a different solution that spawns a gzip child process instead (so you've got to have gzip installed on the machine you're running the script on, which shouldn't be a huge deal on Linux). This solution is better, but still requires some tweaks because it transpires that Node.js automatically propagates signals it receives to child processes - before you've had a chance to tie up all your loose ends. Frustrating.

Even so, I'm hopeful that I've pretty much got it to a workable state for now - though I'll need to implement a daemon-type script at some point to automatically download and process the new files as they are uploaded - it is a living dataset that's constantly being added to after all.

### Papers

The other strand (that's less active at the minute) is reading papers. Last time, I mentioned the summary papers I'd read, and the direction I was considering reading in. Since then, I've both read a number of new papers and talked to a bunch of very talented people on-campus - so I've got a little bit of a better idea as to the direction I'm headed in now.

Firstly, I've looked into a few cutting-edge recurrent neural network types:

• Grid LSTMs - Basically multi-dimensional LSTMs
• Dilated LSTMs - Make LSTMs less computationally intensive and better at learning long-term relationships
• Transformer Neural Networks - more reading required here
• NARX Networks

Many of these recurrent neural network structures appear to show promise for mapping floods. The last experiment into a basic LSTM didn't go too well (it turned out to be hugely computationally expensive), but learning from that experiment I've got lots of options for my next one.

A friend of mine managed to track down the paper behind Google's AI blog post - which turned out to be an interesting read. It transpires that despite the bold words in the blog post, the paper is more of an initial proposal for a research project - rather than a completed project itself. Most of the work they've done is actually using a traditional physics-based model - which they've basically thrown Google-scale compute power at to make pretty graphs - which they've then critically evaluated and identified a number of areas in which they can improve. They've been a bit light on details - which is probably because they either haven't started - or don't want to divulge their sekrets.

I also saw another interesting paper from Google entitled "Machine Learning for Precipitation Nowcasting from Radar Images", which I found because it was reported on by Ars Technica. It describes the short-term forecasting of rain from rainfall radar in the US (I'm in the UK) using a convolutional neural network-based model (specifically U-Net, apparently - I have yet to read up on it).

The model they use is composed in part of convolutional neural network (CNN) layers that downsample 256x256 tiles to a smaller size, and then upscale them back to the original size. It also has some extra connections that skip parts of the model. They claim that their model improves on existing approaches for up to 6 hours in advance - so their network structure seems promising as inspiration for my own research.

Initial thoughts include theories as to whether I can use CNN layers like this to sandwich a more complex recurrent layer of some description that remembers the long-term relationships? I'll have to experiment.....

Found this interesting? Got a suggestion? Confused on a point? Comment below!

## I've got an apt repository, and you can too

Hey there!

In this post, I want to talk about my apt repository. I've had it for a while, but since it's been working well for me, I thought I'd announce it to the wider world here.

For those not in the know, an apt repository is a repository of software in a particular format that the apt package manager (found on Debian-based distributions such as Ubuntu) uses to keep the software on a machine up-to-date.

The apt package manager queries all repositories it has configured to find out what versions of which packages they have available, and then compares this with those locally installed. Any packages out of date then get upgraded, usually after prompting you to install the updates.

Linux distributions based on Debian come with a large repository of software, but it doesn't have everything. For this reason, extra repositories are often used to deliver updates to software automatically from third parties.

In my case, I've increasingly found myself wanting to deliver updates to a number of different machines for software that isn't packaged for installation with apt. Every time I got around to installing one update, it felt like it was already time to install another - so naturally I got frustrated enough that I decided to automate my problems away by scripting my own apt repository!

My apt repository can be found here: https://starbeamrainbowlabs.com/

It comes in 2 parts. Firstly, there's the repository itself - which is managed by a script that's based on my lantern build engine. It's this I'll be talking about in this post.

Secondly, I have a number of (as yet ad-hoc) custom Laminar job scripts for automatically downloading various software projects from GitHub, such that all I have to do is run laminarc queue apt-softwarename and it'll automatically package the latest version and upload it to the repository itself, which has a cron job set to fold in all of the new packages at 2am every night. The specifics of this are best explained in another post.

Currently this process requires me to login and run the laminarc command manually, but I intend to automate this too in the future (I'm currently waiting for a new release of beehive to fix a nasty bug for this).

Anyway, currently I have the following software packaged in my repository:

• Gossa - A simple HTTP file browser
• The Tiled Map Editor - An amazing 2D tile-based graphical map editor. You should sponsor the developer via any of the means on the Tiled Map Editor's website before using my apt package.
• tldr-missing-pages - A small utility script for finding tldr-pages to write
• webhook - A flexible webhook system that calls binaries and shell scripts when an HTTP call is made
• I've also got a pleaserun-based service file generator packaged for this too in the webhook-service package

Of course, more will be coming as and when I discover and start using cool software.

The repository itself is driven by a set of scripts. These scripts were inspired by a Stack Overflow post that I have since lost, but I made a number of usability improvements and rewrote it to use my lantern build engine as I described above. I call this improved script aptosaurus, because it sounds cool.

To use it, first clone the repository:

git clone https://git.starbeamrainbowlabs.com/sbrl/aptosaurus.git

Then, create a new GPG key to sign your packages with:

gpg --full-generate-key

Next, we need to export the new keypair to disk so that we can use it in scripts. Do that like this:

# Identify the key's ID in the list this prints out
gpg --list-secret-keys
# Export the secret key
gpg --export-secret-keys --armor INSERT_KEY_ID_HERE >secret.gpg
chmod 0600 secret.gpg # Don't forget to lock down the permissions
# Export the public key
gpg --export --armor INSERT_KEY_ID_HERE >public.gpg

Then, run the setup script:

./aptosaurus.sh setup

It should warn you if anything's amiss.

With the setup complete, you can now put your .deb packages in the sources subdirectory. Once done, run the update command to fold them into the repository:

./aptosaurus.sh update

Now you've got your own repository! Your next step is to set up a static web server to serve the repo subdirectory (which contains the repository itself) to the world. Personally, I use Nginx with the following config:

server {
listen  80;
listen  [::]:80;
listen  443 ssl http2;
listen  [::]:443 ssl http2;

server_name apt.starbeamrainbowlabs.com;
ssl_certificate     /etc/letsencrypt/live/$server_name/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/$server_name/privkey.pem;

add_header link '<https://starbeamrainbowlabs.com$request_uri>; rel="canonical"';
index index.html;
root /srv/aptosaurus/repo;

include /etc/nginx/snippets/letsencrypt.conf;

autoindex off;
fancyindex on;
fancyindex_exact_size off;
fancyindex_header header.html;

#location ~ /.well-known {
#	root /srv/letsencrypt;
#}
}

This requires the fancyindex module for Nginx, which can be installed with sudo apt install libnginx-mod-http-fancyindex on Ubuntu-based systems.

To add your new apt repository to a machine, simply follow the instructions for my repository, replacing the domain name and the key IDs with yours.

Hopefully this release announcement-turned-guide has been either interesting, helpful, or both! Do let me know in the comments if you encounter any issues. If there's enough interest, I'll migrate the code from my personal Git server to GitHub so that people can make contributions (express said interest in the comments below).

It's worth noting that this is only a very simple apt repository. Larger apt repositories are sectioned off into multiple categories by distribution and release status (e.g. the Ubuntu repositories have xenial, bionic, eoan, etc. for the version of Ubuntu, and main, universe, multiverse, restricted, etc. for the different categories of software).

If you set up your own simple apt repository using this guide, I'd love it if you could let me know with a comment below too.

## Switching TOTP providers from Authy to andOTP

Since I first started using 2-factor authentication with TOTP (Time-based One-Time Passwords), I've been using Authy to store my TOTP secrets. This has worked well for a number of years, but recently I decided that I wanted to change. This was for a number of reasons:

1. I've acquired a large number of TOTP secrets for various websites and services, and I'd like a better way of sorting the list
2. Most of the web services I have TOTP secrets for don't have an icon in Authy - and there are only so many times you can repeat the 6 generic colours before it becomes totally confusing
3. I'd like the backups of my TOTP secrets to be completely self-hosted (i.e. completely on my own infrastructure)

After asking on Reddit, I received a recommendation to use andOTP (F-Droid, Google Play). After installing it, I realised that I needed to export my TOTP secrets from Authy first.

Unfortunately, it turns out that this isn't an easy process. Many guides tell you to alter the code behind the official Authy Chrome app - and since I don't have Chrome installed (I'm a Firefox user :D), that's not particularly helpful.

Thankfully, all is not lost. During my research I found the authy project on GitHub - a command-line app, written in Go, that temporarily registers as a 'TOTP provider' with Authy and then exports all of your TOTP secrets to a standard text file of URIs.

These can then be imported into whatever TOTP-supporting authenticator app you like. Personally, I did this by generating QR codes for each URI and scanning them into my phone. The URIs, when converted to QR codes, are in the same format as the ones you originally scanned on each website - which makes for an easy time importing them, at least from a walled garden.

Generating all those QR codes manually isn't much fun though, so I automated the process. This was pretty simple:

#!/usr/bin/env bash
exec 3<&0; # Copy stdin
while read url; do
    echo "${url}" | qr --error-correction=H;
    read -p "Press enter to continue" <&3; # Pipe in stdin, since we override it with the read loop
done <secrets.txt;

The exec 3<&0 bit copies the standard input to file descriptor 3 for later. Then we enter a while loop, and read in the file that contains the secrets and iterate over it.

For each line, we convert it to a QR code that displays in the terminal with VT-100 ANSI escape codes with the Python program qr.

Finally, after generating each QR code we pause for a moment until we press the enter key, so that we can generate the QR codes 1 at a time. We pipe in file descriptor 3 here that we copied earlier, because inside the while loop the standard input is the file we're reading line-by-line and not the keyboard input.
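For reference, each line in secrets.txt is a standard otpauth:// URI - the same format websites embed in the QR codes you scan when first setting up TOTP. Here's a sketch of what one looks like (all values are made up for illustration):

```javascript
// All values here are hypothetical - the real ones come from the Authy export
const issuer = "ExampleService";
const account = "alice@example.com";
const secret = "JBSWY3DPEHPK3PXP"; // base32-encoded shared secret

// Assemble the otpauth:// URI that the QR code encodes
const uri = "otpauth://totp/"
	+ encodeURIComponent(`${issuer}:${account}`)
	+ `?secret=${secret}`
	+ `&issuer=${encodeURIComponent(issuer)}`
	+ "&digits=6&period=30";

console.log(uri);
```

Any authenticator app that supports the standard format can consume a URI like this, which is what makes the migration possible at all.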

With my secrets migrated, I set to work changing the labels, images, and tags for each of them. I'm impressed by the number of different icons it supports - and since it's open-source if there's one I really want that it doesn't have, I'm sure I can open a PR to add it. It also encrypts the TOTP secrets database at rest on disk, which is pretty great.

Lastly came the backups. It looks like andOTP is pretty flexible when it comes to backups - supporting plain text files as well as various forms of encrypted file. I opted for the latter, with GPG encryption instead of a password or PIN. I'm sure it'll come back to bite me later when I struggle to decrypt the database in an emergency because I find the gpg CLI terribly difficult to use - perhaps I should take multiple backups encrypted with a long and difficult password too.

To encrypt the backups with GPG, you need to have a GPG provider installed on your phone. It recommended that I install OpenKeychain for managing my GPG private keys on Android, which I did. So far, it seems to be functioning as expected too - additionally providing me with a mechanism by which I can encrypt and decrypt files easily and perform other GPG-related tasks...... if only it was this easy in the Linux terminal!

Once set up, I saved my encrypted backups directly to my Nextcloud instance, since it turns out that in Android 10 (or maybe before? I'm not sure), if you have the Nextcloud app installed, it appears as a file system provider when saving things. I'm certainly not complaining!

While I'm still experimenting with my new setup, I'm pretty happy with it at the moment. I'm still considering how I can make my TOTP backups even more secure while not compromising the '2nd factor' nature of the thing, so it's possible I might post again in the future about that.

Next on my security / privacy todo list is to configure my Keepass database to use my Solo for authentication, and possibly figure out how I can get my phone to pretend to be a keyboard to input passwords into machines I don't have my password database configured on :D

Found this interesting? Got a suggestion? Comment below!

## Planning a cluster to replace my home server

At home, I have a Raspberry Pi 3B+ as a home file server. Lately though, I've been noticing that I've been starting to grow out of it (both in terms of compute capacity and storage) - so I've decided to get thinking early about what I can do about it.

I thought of 2 different options pretty quickly:

• Build a 'proper' server
• Build a cluster out of lots of small machines

While both of these options are perfectly viable and would serve my needs well, one of them is distinctly more interesting than the other - that being a cluster. While having a 'proper' server would be much simpler, perhaps slightly more power efficient (though I would need tests to confirm, since ARM - the CPU architecture I'm planning on using for the cluster - is more power efficient), and would put all my system resources on the same box, I like the idea of building a cluster for a number of reasons.

For one, I'll learn new skills setting it up and managing it. So far, I've been mostly managing my servers by hand. When you start to acquire a number of machines though, this quickly becomes unwieldy. I'd like to experiment with a containerisation technology (I'm not sure which one yet) and play around with distributing containers across hosts - and auto-restarting them on a different host if 1 host goes down. If this is decentralised, even better!

For another, having a single larger server is a single point of failure - which would be relatively expensive to replace. If I use lots of small machines instead, then if 1 dies then not only is it cheaper to replace, but it's also not as urgent since the other machines in the cluster can take over while I order a replacement.

Finally, having a cluster is just cool. Do we really need more of a reason than this?

With all this in mind, I've been thinking quite a bit about the architecture of such a cluster. I haven't bought anything yet (and probably won't for a while yet) - because as you may have guessed from the title of this post I've been running into a number of issues that all need researching.

First though let's talk about which machines I'm planning on using. I'm actually considering 2 clusters, to solve 2 different issues: compute and storage. Compute refers to running applications (e.g. Nextcloud etc), and storage refers to a distributed storage mechanism with multiple hosts - each with 1 drive attached - though I'm unsure about the storage cluster at this stage.

For the compute cluster, I'm leaning towards 4 x Raspberry Pi 4 with 4GiB of RAM each. For the storage cluster, I'm considering a number of different boards. 3 identical boards of 1 of the following:

I do seem to remember a board that had USB 3 onboard, which would be useful for connecting the external drives. Currently the plan is to use SATA-to-USB converters connected to internal HDDs (e.g. WD Greens) - but I have yet to find one that doesn't include the power connector, or that splits the power off into a separate USB cable (more on power later). This would all be backed by a Gigabit switch of some description (so the Rock Pi S is not a particularly attractive option, since it would be limited to 100Mbps Ethernet).

I've been using HackerBoards.com to discover different boards which may fit my project - but I'm not particularly satisfied with any of the options here so far. Specifically, I'm after Gigabit Ethernet and USB 3 on the same board if possible.

The next issue is software support. I've been bitten by this before, so I'm being extra cautious this time. I'm after a board that provides good software support, so that I can actually use all the hardware I've paid for.

The other thing relating to software that I'd like if possible is the ability to use a systemd-free operating system. Just like before, when I selected Manjaro OpenRC (which is now called Artix Linux), since I already have a number of systems using systemd I would like to balance it out a bit with some systems that use something else. I don't really mind if this is OpenRC, S6, or RunIt - just that it's something different to broaden my skill set.

Unfortunately, it's been a challenge to locate a distribution of Linux that both has broad support for ARM SoCs and does not use systemd. I suspect that I may have to give up on this, but I'm still holding out hope that there's a distribution out there that can do what I want - even if I have to prepare the system image myself (Alpine Linux looks potentially promising, but at the moment it's a huge challenge to figure out whether a given chipset is supported or not....). Either way, from my research it looks like having mainline Linux kernel support for my board of choice is critically important to ensure continued support and updates (both feature and security) in the future.

Lastly, I also have power problems. Specifically, how to power the cluster. The big problem is that the Raspberry Pi 4 requires 3A of power max - instead of the usual 2.4A for the 3B+ model. Of course, it won't be drawing this all the time, but it's apparently important that the ceiling of the power supply is 3A to avoid issues. Problem is, most multi-port chargers can barely provide 2A to each connected device - and I have not yet seen one that would provide 3A to 4+ devices while also supporting additional peripherals such as hard drives and the other supporting boards described above.
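To put some rough numbers on it, here's the worst-case power budget for the compute cluster alone (the figures are the assumptions from above):

```javascript
// Worst-case draw for the compute cluster (assumed figures from above)
const volts = 5;     // USB supply voltage
const ampsPerPi = 3; // Raspberry Pi 4 maximum draw
const piCount = 4;

const totalAmps = ampsPerPi * piCount;
const totalWatts = totalAmps * volts;
console.log(`${totalAmps}A / ${totalWatts}W`); // → 12A / 60W
```

And that's before the storage cluster, the switch, or any hard drives - which is part of why an old ATX supply starts to look attractive.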

To this end, I may end up having to build my own power supply from an old ATX supply that you can find in an old desktop PC. These can generally supply plenty of power (though it's always best to check) - but the problem here is that I'd need to do a ton of research to make sure that I wire it up correctly and safely, to avoid issues there too (I'm scared of blowing a fuse or electrocuting someone etc).

This concludes my first blog post on my cluster plans. It may be a while until the next one, as I have lots more research to do before I can continue. Suggestions and tips are welcomed in the comments below.

## Demystifying VPNs

After seeing yet another article that misunderstands and misrepresents VPNs, I just had to make a post about it. This post actually started life as a reddit comment, but I decided to expand on it and make it a full post here on my blog.

VPNs are a technology that simply sends your traffic through an encrypted tunnel that pops out somewhere else.

For the curious, they do this by creating what's known as a virtual tunnel interface on your computer (on Linux-based machines this is often tun0), and altering your machine's routing table to funnel all the network traffic destined for the Internet into the tunnel interface.

The tunnel interface actually encrypts your data and streams it to a VPN server (though there are exceptions) that then forwards it on to the wider Internet for you.

This is great if:

• You live in a country that censors your Internet connection
• You don't trust your ISP
• You are connected to a public open WiFi hotspot
• You need to access resources on a remote network that only allow those physically present to use them

By using a VPN, you can make your device appear as though it is somewhere else. You can also hide your Internet traffic from the rest of your network that you are connected to.

However, VPNs are not a magic bullet. They are not so great at:

• Blocking trackers
• Blocking mining scripts that suck up your CPU
• Limiting the amount of your data online services get a hold of

This is because a VPN simply makes you appear as though you are somewhere else - it doesn't block or alter any of the traffic coming and going from your device, so online services can still see all your personal data. All that's changed when you use a VPN is that your data goes via a waypoint on its journey to its final destination.

All hope is not lost, however - for there are steps you can take to deal with these issues. Try these steps instead:

You may already be aware of these points - but in particular multi-account containers are quite interesting. By using an extension like the one I link to above, you can in effect have multiple browsers open at the same time. By this, I mean that you can have multiple 'sandboxes' - and the site data (e.g. cookies etc) will not cross over from 1 sandbox to another.

This gives websites the illusion of being loaded in multiple different environments - with limited options to figure out that they are in fact on the same machine - especially when combined with other measures.

Hopefully this clears up some of the confusion. If you know anyone else who's confused about VPNs, please share a link to this post with them! The fewer people who get the wrong idea about VPNs, the better.

Found this interesting? Have another privacy related question? Found an error in this post? Comment below!

## Own Your Code Series List

Hey there! It's time for another series list. This time it's for my Own Your Code series, where I take a look into Gitea and Laminar CI.

Following this series, I plan to also post about my apt repository, which is hosting a growing list of software - including the tiled map editor (support them with a donation if you can), gossa (a minimalist file browser interface), and webhook - if you find any issues, you can always get in touch.

Anyway, here's the full list of posts in the series in the Own Your Code series:

In the unlikely event I post another entry in this series, I'll come back and update this list. Most likely though I'll be posting related things standalone, rather than part of this series - so subscribe for updates with your favourite method if you'd like to stay up-to-date with my latest blog posts (Atom/RSS, Email, Twitter, Reddit, and Facebook are all supported - just ask if there's something missing).

## Why the TICK stack probably isn't for me

Recently, I've been experimenting with upgrading my monitoring system. The TICK stack consists of a number of different elements:

• Telegraf - the metrics collection agent
• InfluxDB - the time-series database
• Chronograf - the web interface for graphs and dashboards
• Kapacitor - the alerting engine

Together, these 4 programs provide everything you need to monitor your infrastructure, generate graphs, and send alerts via many different channels when things go wrong. This works reasonably well - and to give the developers credit, the level of integration present is pretty awesome. Telegraf seamlessly inserts metrics into InfluxDB, and Chronograf is designed to integrate with the metrics generated by Telegraf.

I haven't tried Kapacitor much yet, but it has an impressive list of integrations. For reference, I've been testing the TICK stack on an old Raspberry Pi 2 that I had lying around. I also tried Grafana too, which I'll talk about later.

The problems start when we talk about the system I've been using up until now (and am continuing to use). I've got a Collectd setup going - with Collectd Graph Panel (CGP) as a web interface, which is backed by RRD databases.

CGP, while it has its flaws, is pretty cool. Unlike Chronograf, it doesn't require manual configuration when you enable new metric types - it generates graphs automatically. For a small personal home network, I don't really want to be spending hours manually specifying what all the graphs should look like for all the metrics I'm collecting. It's seriously helpful to have it done automatically.

Grafana also stumbles here. Before I installed the CK part of the TICK stack, I tried Grafana. After some initial installation issues (the Raspberry Pi 2's CPU only supports up to ARMv6, and the Grafana build used ARMv7 instructions, causing some awkward and unfortunate issues that were somewhat difficult to track down), I got it running. While it has an incredible array of different graphs and visualisations you can configure, like Chronograf it doesn't generate any of these graphs for you automatically.

Both solutions do have an import / export system for dashboards, which allows you to share prebuilt dashboards - but this isn't the same as automatic graph generation.

The other issue with the TICK stack is how heavy it is. Spoiler: it's very heavy indeed - especially InfluxDB. It managed to max out my poor Raspberry Pi 2's CPU - and ate all my RAM too! It took quite a bit of tuning to configure it such that it didn't eat all of my RAM for breakfast and knock my SSH session offline.
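For anyone else attempting this on similarly constrained hardware, the knobs worth investigating live in the `[data]` section of influxdb.conf. This is a sketch of the kind of settings involved, not my exact configuration - the values are illustrative, so check the InfluxDB 1.x documentation for your version:

```toml
[data]
  # Cap the in-memory write cache (the default is a whole gigabyte)
  cache-max-memory-size = "200m"
  # Snapshot the cache to disk sooner, so less data sits in RAM
  cache-snapshot-memory-size = "25m"
  # Use the on-disk TSI index instead of the in-memory one
  index-version = "tsi1"
```

Even with settings like these, expect InfluxDB to remain the hungriest part of the stack.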

I'm sure that in a business setting you'd have heaps of resources just waiting to be dedicated to monitoring everything from your mission-critical servers to your cat's lunch - but in a home setting it takes up more resources passively when it isn't even doing anything than everything else I'm monitoring..... combined!

It's for these reasons that I'm probably not going to end up using the TICK (or TIG, for that matter) stack. For the reasons I've explained above, while it's great - it's just not for me. What I'm going to use instead though, I'm not sure. Development on CGP ceased in 2017 (or probably before that) - and I've got a growing list of features I'd like to add to it - including (but not limited to) fixing the SMART metrics display, reconfiguring the length of time metrics are stored for, and fixing a super annoying bug that makes the graphs go nuts when you scroll on them on a touchpad with precise scrolling enabled.

Got a suggestion for another different system I could try? Comment below!

## Happy Christmas 2019!

I've been otherwise occupied enjoying my Christmas holiday this year - and I hope you have been too. Have a picture of a bauble:

## Multi-boot + data + multi-partition = octopus flash drive 2.0?

A while ago, I posted about a multi-boot flash drive. That approach has served me well, but I got a new flash drive a while ago - and for some reason I could never get it to be bootable in the same way.

After a frustrating experience trying to image yet another machine and not being able to find a free flash drive, I decided that enough was enough and that I'd do something about it. My requirements are as follows:

1. It has to be bootable via legacy BIOS
2. It has to be bootable via (U)EFI
3. I don't want multiple configuration files for each booting method
4. I want to be able to store other files on it too
5. I want it to be recognised by all major operating systems
6. I want to be able to fiddle with the grub configuration without manually mounting a partition

Quite the list! I can confirm that this is all technically achievable - it just takes a bit of work to do so. In this post, I'll outline how you can do it too - with reasoning at each step as to why it's necessary.

Start by finding a completely free flash drive. Note that you'll lose all the data that's currently stored on it, because we need to re-partition it.

I used the excellent GParted for this purpose, which is included in the Ubuntu live CD for those without a supported operating system.

Start by creating a brand-new GPT partition table. We're using GPT here because I believe it's required for (U)EFI booting. I haven't run into a machine that doesn't understand it yet, but there's always a hybrid MBR/GPT partition table that you can look into if you have issues.

Once done, create a FAT32 partition that fills all but the last 128MiB or so of the disk. Let's call this one DATA.

Next, create another partition that fills the remaining ~128MiB of the disk. Let's call this one EFI.

Write these to disk. Once done, right click on each partition in turn and click "manage flags". Set them as such:

| Partition | Filesystem | Flags      |
|-----------|------------|------------|
| DATA      | FAT32      | msftdata   |
| EFI       | FAT32      | esp, boot  |

This is important, because only partitions marked with the boot flag can be booted from via EFI. Partitions marked boot also have to be marked esp apparently, which is mutually exclusive with the msftdata flag. The other problem is that only partitions marked with msftdata will be auto-detected by operating systems in a GPT partition table.

It is for this reason that we need to have a separate partition marked as esp and boot - otherwise operating systems wouldn't detect and automount our flash drive.

Once you've finished setting the flags, close GParted and mount the partitions. Windows users may have to use a Linux virtual machine and pass the flash drive in via USB passthrough.

Next, we'll need to copy a pair of binary files to the EFI partition to allow it to boot via EFI. These can be found in this zip archive, which is part of the tutorial I linked to in my previous post above. Extract the EFI directory from the zip archive to the EFI partition we created, and leave the rest.

Next, we need to install grub to the EFI partition. We need to do this twice:

• Once for (U)EFI booting
• Once for legacy BIOS booting

Before you continue, make sure that your host machine is not Ubuntu 19.10. This is really important - as there's a bug in the grub 2.04 version used in Ubuntu 19.10 that basically renders the loopback command (used for booting ISOs) useless when booting via UEFI! Try Ubuntu 18.04 - hopefully it'll get fixed soon.

This can be done like so:

```shell
# Install for UEFI boot:
sudo grub-install --target=x86_64-efi --force --removable --boot-directory=/media/sbrl/EFI --efi-directory=/media/sbrl/EFI /dev/sdb
# Install for legacy BIOS boot:
sudo grub-install --target=i386-pc --force --removable --boot-directory=/media/sbrl/EFI /dev/sdb
```

It might complain a bit, but you should be able to (mostly) ignore it.

This is actually ok - as this Unix Stack Exchange post explains - as the two installations don't actually clash with each other and just happen to load and use the same configuration file in the end.

If you have trouble, make sure that you've got the right packages installed with your package manager (apt on Debian-based systems). Most systems will be missing one of the following, as it seems that the installer will only install the one that's required for your system:

• For BIOS booting, grub-pc-bin needs to be installed via apt.
• For UEFI booting grub-efi-amd64-bin needs to be installed via apt.

Note that installing these packages won't mess with the booting of your host machine you're working on - it's the grub-pc and grub-efi-amd64 packages that do that.
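To quickly check which (if either) of these is already present on a Debian-based host, a query like this works - just a sketch:

```shell
# List any installed grub packages (Debian / Ubuntu).
# Prints a fallback message rather than erroring if none are found.
dpkg -l 2>/dev/null | grep -E 'grub-(pc|efi-amd64)(-bin)?' || echo "no grub packages found"
```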

Next, we can configure grub. This is a 2-step process, as we don't want the main grub configuration file on the EFI partition because of requirement #6 above.

Thankfully, we can achieve this by getting grub to dynamically load a second configuration file, in which we will store our actual configuration.

Create the file grub/grub.cfg on the EFI partition, and paste this inside:

```
# Load the configfile on the main partition
configfile (hd0,gpt1)/images/grub.cfg
```

In grub, partitioned block devices are called hdX, where X is a number indexed from 0. Partitions on a block device are specified with a comma, followed by the partition table type and the partition number (which starts from 1, oddly enough). The block device grub booted from is always device 0.
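A few concrete examples of this naming scheme, as entered at the grub command prompt (the paths are illustrative):

```
ls                               # list every device grub can see
ls (hd0,gpt1)/                   # list files on the 1st GPT partition of the boot disk
cat (hd0,gpt1)/images/grub.cfg   # print a file - handy for checking paths
```

The `ls` command in particular is invaluable when an entry refuses to boot, as it shows you exactly how grub has named everything.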

In the above, we specify that we want to dynamically load the configuration file that's located on the first partition (the DATA partition) of the disk that it booted from. I did it this way around, because I suspect that Windows still has that age-old bug where it will only look at the first partition of a flash drive - which would be marked as esp + boot and thus hidden if we had them the other way around. I haven't tested this though, so I could be wrong.

Now, we can create that other grub configuration file on the DATA partition. I'm storing all my ISOs and the grub configuration file in question in a folder called images (specifically, my main grub configuration file is located at /images/grub.cfg on the DATA partition), but you can put it wherever you like - just remember to update the grub configuration file on the EFI partition above to match, otherwise grub will get confused and complain that it can't find the configuration file on the DATA partition.

For example, here's a (cut-down) portion of my grub configuration file:

```
# Just a header message - selecting this basically has no effect
menuentry "*** Bootable Images ***" { true }

# Ubuntu
set isofile="/images/ubuntu-18.04.3-desktop-amd64.iso"
set isoversion="18.04 Bionic Beaver"
#echo "ISO file: ${isofile}, version: ${isoversion}";

loopback loop $isofile
menuentry "[x64] Ubuntu Desktop ${isoversion}" {
	linux (loop)/casper/vmlinuz boot=casper setkmap=uk eject noprompt splash iso-scan/filename=${isofile} --
	initrd (loop)/casper/initrd
}
menuentry "[x64] [ejectable] Ubuntu Desktop ${isoversion}" {
	linux (loop)/casper/vmlinuz boot=casper setkmap=uk eject noprompt splash toram iso-scan/filename=${isofile} --
	initrd (loop)/casper/initrd
}
menuentry "[x64] [install] Ubuntu Desktop ${isoversion}" {
	linux (loop)/casper/vmlinuz file=/cdrom/preseed/ubuntu.seed only-ubiquity quiet iso-scan/filename=${isofile} --
	initrd (loop)/install/initrd
}

# Artix Linux
set isofile="/images/artix-lxqt-openrc-20181008-x86_64.iso"

probe -u $root --set=rootuuid
set imgdevpath="/dev/disk/by-uuid/$rootuuid"

menuentry "[x64] Artix Linux" {
	loopback loop $isofile
	probe -l loop --set=isolabel
	linux (loop)/arch/boot/x86_64/vmlinuz archisodevice=/dev/loop0 img_dev=$imgdevpath img_loop=$isofile archisolabel=$isolabel earlymodules=loop
	initrd (loop)/arch/boot/x86_64/archiso.img
}

# Fedora
set isofile="/images/Fedora-Workstation-Live-x86_64-31-1.9.iso"

menuentry "[x64] Fedora Workstation 31" {
	echo "Setting up loopback"
	loopback loop "${isofile}"
	probe -l loop --set=isolabel
	echo "ISO Label is ${isolabel}"
	echo "Booting...."
	linux (loop)/isolinux/vmlinuz iso-scan/filename="${isofile}" root=live:CDLABEL=$isolabel rd.live.image
	initrd (loop)/isolinux/initrd.img
}

# (another entry, title trimmed in this cut-down version)
menuentry "..." {
	linux (loop)/VMLINUZ setkmap=uk isoloop=$isofile
	# initrd (loop)/initrd.cgz
	initrd (loop)/initrd
}

menuentry "Memtest86+" {
	linux16 /images/memtest86+.bin
}

submenu "Boot from Hard Drive" {
	menuentry "Hard Drive 0" {
		set root=(hd0)
		chainloader +1
	}
	menuentry "Hard Drive 1" {
		set root=(hd1)
		chainloader +1
	}
	menuentry "Hard Drive 2" {
		set root=(hd2)
		chainloader +1
	}
	menuentry "Hard Drive 3" {
		set root=(hd3)
		chainloader +1
	}
}
```

If you're really interested in building on your grub configuration file, I'll include some useful links at the bottom of this post. Specifically, having an understanding of the Linux boot process can be helpful for figuring out how to boot a specific Linux ISO if you can't find any instructions on how to do so. These steps might help if you are having issues figuring out the right parameters to boot a specific ISO:

• Use your favourite search engine and search for Boot DISTRO_NAME_HERE iso with grub or something similar
• Try the links at the bottom of this post to see if they have the parameters you need
• Try looking for a configuration for a more recent version of the distribution
• Try using the configuration from a similar distribution (e.g. Artix is similar to Manjaro - it's the successor to Manjaro OpenRC, which is derived from Arch Linux)
• Open the ISO up and look for the grub configuration file for a clue
• Try booting it with memdisk
• Ask on the distribution's forums

Memdisk is a tool that copies a given ISO into RAM, and then chainloads it (as far as I'm aware). It can actually be used with grub (despite the fact that you might read that it's only compatible with syslinux):

```
menuentry "Title" {
	linux16 /images/memdisk iso
	initrd16 /path/to/linux.iso
}
```

Sometimes it can help with particularly stubborn ISOs. If you're struggling to find a copy of it out on the web, here's the version I use - though I don't remember where I got it from (if you know, post a comment below and I'll give you attribution).

That concludes this (quite lengthy!) tutorial on creating the, in my opinion, ultimate multi-boot everything flash drive. My future efforts with respect to my flash drive will be directed in the following areas:

• Building a complete portable environment for running practically all the software I need when out and about
• Finding useful ISOs to include on my flash drive
• Anything else that increases the usefulness of the flash drive that I haven't thought of yet

If you've got any cool suggestions (or questions about the process) - comment below!

## PhD Update 1: Directions

Welcome to my first PhD update post. I intend to post these at bimonthly intervals. In the last post, I talked a bit about my PhD project that I'm doing and my initial thoughts. Since then, I've done heaps of investigation into a number of different potential directions I could take the project. For reference, my PhD title is actually as follows:

Using the Internet of Things, Big Data, and AI to dynamically map flood risk.

There are 3 main elements to this project:

• Big Data
• Artificial Intelligence (AI)
• The Internet of Things (IoT)

I'm pretty sure that each of them will have an important role to play in the final product - even if I'm not sure what those roles are just yet :P

Particularly of concern at the moment is this blog post by Google. It talks about how they've managed to significantly improve flood forecasting with AI, along with a seriously impressive visualisation to back it up - but I can't find a paper on it anywhere. I'm concerned that anything I try to do in the area won't be useful if they are already streets ahead of everyone else like that.

I guess one of the strong points I should try to hit is the concept of explainable AI if possible.

### All the data sources!

As it stands right now, I'm currently evaluating various different potential data sources that I've managed to gain access to. My aim here is to evaluate how useful they will be in solving the wider problem - and whether they are useful enough to be worth investigating further.

#### Environment Agency

Some great people from the Environment Agency came into University recently to chat with us about what they do. The discussion we had was very interesting - but they also asked if there was anything they could do to help our PhD projects out.

Seeing the opportunity, I jumped at the chance to get a hold of some of their historical datasets. They actually maintain a network of high-quality sensors across the country that monitor everything from rainfall to river statistics. While they have a real-time API that you can use to download recent measurements, it doesn't appear to go back further than March 2017. To this end, I asked for data from 2005 up to the end of 2017, so that I could get a clearer picture of the 2007 and 2013 floods for AI training purposes.

So far, this dataset has proved very useful - at least initially - as a testbed for training various kinds of AI as I learn PyTorch (see my recent post for how that has been going). I've started with a basic LSTM first. For reference, an LSTM is a neural network architecture that is good at processing time-series data - but is quite computationally expensive to run.
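As an entirely illustrative sketch of what such a model looks like in PyTorch - the class name, feature count, and window size below are made up for the example, and aren't my actual model:

```python
import torch
from torch import nn

class RiverLevelLSTM(nn.Module):
    """Sketch: predict the next sensor reading from a window of
    previous readings. All names and shapes are hypothetical."""
    def __init__(self, n_features=3, hidden=32):
        super().__init__()
        # batch_first=True means inputs are (batch, time, features)
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                # x: (batch, time_steps, n_features)
        out, _ = self.lstm(x)            # out: (batch, time_steps, hidden)
        return self.head(out[:, -1, :])  # predict from the last time step

model = RiverLevelLSTM()
batch = torch.randn(8, 10, 3)  # 8 windows of 10 readings, 3 sensors each
pred = model(batch)            # shape: (8, 1)
```

The computational expense mentioned above comes from the LSTM's recurrence: each time step has to be processed in sequence, so longer windows mean proportionally more work.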

#### Met Office

I've also been investigating the datasets that the Met Office provide. These chiefly appear to be in the form of their free DataPoint API. Particularly of interest are their rainfall radar images, which are 500x500 pixels and are released every 15 minutes. Sadly they are only available for a few hours at best, so you have to grab them fast if you want to be able to analyse particularly interesting ones later.

Annoyingly though, their API does not appear to give any hints as to the bounding boxes of these images - and neither can I find any information about this online. I posted in their support forum, but it doesn't appear that anyone actually monitors it - so at this point I suspect that I'm unlikely to receive a response. Without knowing the (lat, lng) co-ordinates of the images produced by the API, they are little more use than pretty wall art.

#### Internet of Things

On the Internet of Things front, I'm already part of Connected Humber, which has a network of sensors set up that monitor everything from air quality to temperature, humidity, and air pressure. While these things aren't directly related to my project, the dataset that we're collecting as a group may very well come in handy as an input to a model of some description.

I'm pretty sure that I'll need to set up some additional custom sensors of my own at some point (probably soonish too) to collect the measurement readings that I'm missing from other pre-existing datasets.

Whilst I've been doing this, I've also been reading up a storm. I've started by reading into traditional physics-based flood modelling simulations (such as caesar-lisflood) - which appear to fall into a number of different categories, which also have sub-categories. It's quite a rabbit hole - but apparently I'm diving all the way down to the very bottom.

The most interesting paper on this subject I found was this one from 2017. It splits physics-based models up into 3 categories:

• Empirical models (i.e. ones that just display sensor readings, calculate some statistics, and that's about it)
• Hydrodynamic models - the best-known models that simulate water flow etc - can be categorised as either 1D, 2D, or 3D - also very computationally expensive - especially in higher dimensions
• Simplified conceptual models - don't actually simulate water flow, but efficient enough to be used on large areas - also can be quite inaccurate with complex terrain etc.

As I'm going to be using artificial intelligence as the core of my project, it quickly became evident that this is just stage-setting for the actual kind of work I'll be doing. After winding my way through a bunch of other less interesting papers, I found my way to this paper from 2018 next, which is similar to the previous one I linked to - just for AI and flood modelling.

While I haven't yet had a chance to follow up on all the interesting papers referenced, it has a number of interesting points to keep in mind:

• Artificial Intelligences need lots of diverse data points to train well
• It's important to measure a trained network's ability to generalise what it's learnt to other situations it hasn't seen yet

The odd thing about this paper is that it claims that regular neural networks were better than recurrent neural network structures - despite the fact that it is only citing a single old 2013 paper (which I haven't yet read). This led me on to read a few more papers - all of which were mildly interesting and had at least something to do with neural networks.

I certainly haven't read everything yet about flood modelling and AI, so I've got quite a way to go until I'm done in this department. Also of interest are 2 newer neural network architectures which I'm currently reading about.

### Next steps

I want to continue to read about the above neural networks. I also want to implement a number of the networks I've read about in PyTorch to continue to learn the library.

Lastly, I want to continue to find new datasets to explore. If you're aware of a dataset that I haven't yet talked about on here, comment below!

Art by Mythdael