Demystificating VPNs

After seeing yet another article that misunderstands and misrepresents VPNs, I just had to make a post about it. This post actually started life as a reddit comment, but I decided to expand on it and make it a full post here on my blog.

VPNs are a technology that simply sends your traffic through an encrypted tunnel that pops out somewhere else.

For the curious, they do this by creating what's known as a virtual tunnel interface on your computer (on Linux-based machines this is often tun0) and altering your machine's routing table to funnel all your network traffic destined for the Internet into that tunnel interface.

The tunnel interface actually encrypts your data and streams it to a VPN server (though there are exceptions) that then forwards it on to the wider Internet for you.
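
If you're curious what this looks like in practice, here's a quick sketch for a Linux machine with an OpenVPN-style client connected (I'm assuming here that the client creates tun0 - the interface name will vary):

# Show the virtual tunnel interface the VPN client created
ip addr show tun0
# Ask the routing table which interface traffic to an example Internet host would leave through
ip route get 1.1.1.1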

This is great if:

  • You live in a country that censors your Internet connection
  • You don't trust your ISP
  • You are connected to a public open WiFi hotspot
  • You need to access resources on a remote network that only allow those physically present to use them

By using a VPN, you can make your device appear as though it is somewhere else. You can also hide your Internet traffic from the rest of your network that you are connected to.

However, VPNs are not a magic bullet. They are not so great at:

  • Blocking trackers
  • Blocking Ads
  • Blocking mining scripts that suck up your CPU
  • Limiting the amount of your data online services get a hold of

This is because a VPN simply makes you appear as though you are somewhere else - it doesn't block or alter any of the traffic coming and going from your device, so online services can still see all your personal data. All that's changed when you use a VPN is that your data goes via a waypoint on its journey to its final destination.

All hope is not lost, however - for there are steps you can take to deal with these issues. Try these steps instead:

You may already be aware of these points - but multi-account containers in particular are quite interesting. By using an extension like the one I link to above, you can in effect have multiple browsers open at the same time. By this, I mean that you can have multiple 'sandboxes' - and site data (e.g. cookies) will not cross over from one sandbox to another.

This gives websites the illusion of being loaded in multiple different environments - with limited options to figure out that they are in fact on the same machine - especially when combined with other measures.

Hopefully this clears up some of the confusion. If you know anyone else who's confused about VPNs, please share a link to this post with them! The fewer people who get the wrong idea about VPNs, the better.

Found this interesting? Have another privacy related question? Found an error in this post? Comment below!

Own Your Code Series List

Hey there! It's time for another series list. This time it's for my Own Your Code series, where I take a look into Gitea and Laminar CI.

Following this series, I plan to also post about my apt repository, which is hosting a growing list of software - including the tiled map editor (support them with a donation if you can), gossa (a minimalist file browser interface), and webhook - if you find any issues, you can always get in touch.

Anyway, here's the full list of posts in the Own Your Code series:

In the unlikely event I post another entry in this series, I'll come back and update this list. Most likely though I'll be posting related things standalone, rather than part of this series - so subscribe for updates with your favourite method if you'd like to stay up-to-date with my latest blog posts (Atom/RSS, Email, Twitter, Reddit, and Facebook are all supported - just ask if there's something missing).

Why the TICK stack probably isn't for me

Recently, I've been experimenting with upgrading my monitoring system. The TICK stack consists of a number of different elements:

  • Telegraf - the agent that collects metrics
  • InfluxDB - the time-series database that stores those metrics
  • Chronograf - the web interface for graphing and exploring the stored metrics
  • Kapacitor - the data processing and alerting engine

Together, these 4 programs provide everything you need to monitor your infrastructure, generate graphs, and send alerts via many different channels when things go wrong. This works reasonably well - and to give the developers credit, the level of integration present is pretty awesome. Telegraf seamlessly inserts metrics into InfluxDB, and Chronograf is designed to integrate with the metrics generated by Telegraf.

I haven't tried Kapacitor much yet, but it has an impressive list of integrations. For reference, I've been testing the TICK stack on an old Raspberry Pi 2 that I had lying around. I also tried Grafana too, which I'll talk about later.

The problems start when we talk about the system I've been using up until now (and am continuing to use). I've got a Collectd setup going - with Collectd Graph Panel (CGP) as a web interface, which is backed by RRD databases.

CGP, while it has its flaws, is pretty cool. Unlike Chronograf, it doesn't require manual configuration when you enable new metric types - it generates graphs automatically. For a small personal home network, I don't really want to spend hours manually specifying what all the graphs should look like for all the metrics I'm collecting. It's seriously helpful to have it done automatically.

Grafana also stumbles here. Before I installed the CK part of the TICK stack, I tried Grafana. I ran into some initial installation issues (the Raspberry Pi 2's CPU only supports up to ARMv6, while the Grafana build I tried used ARMv7 instructions, causing some awkward and unfortunate issues that were somewhat difficult to track down), but got it working eventually. While it has an incredible array of different graphs and visualisations you can configure, like Chronograf it doesn't generate any of these graphs for you automatically.

Both solutions do have an import / export system for dashboards, which allows you to share prebuilt dashboards - but this isn't the same as automatic graph generation.

The other issue with the TICK stack is how heavy it is. Spoiler: it's very heavy indeed - especially InfluxDB. It managed to max out my poor Raspberry Pi 2's CPU - and ate all my RAM too! It took quite a bit of tuning to configure it such that it didn't eat all of my RAM for breakfast and knock my SSH session offline.

I'm sure that in a business setting you'd have heaps of resources just waiting to be dedicated to monitoring everything from your mission-critical servers to your cat's lunch - but in a home setting it takes up more resources passively when it isn't even doing anything than everything else I'm monitoring..... combined!

It's for these reasons that I'm probably not going to end up using the TICK (or TIG, for that matter) stack. For the reasons I've explained above, while it's great - it's just not for me. What I'm going to use instead though, I'm not sure. Development on CGP ceased in 2017 (or probably before that) - and I've got a growing list of features I'd like to add to it - including (but not limited to) fixing the SMART metrics display, reconfiguring the length of time metrics are stored for, and fixing a super annoying bug that makes the graphs go nuts when you scroll on them on a touchpad with precise scrolling enabled.

Got a suggestion for another different system I could try? Comment below!

Happy Christmas 2019!

I've been otherwise occupied enjoying my Christmas holiday this year - and I hope you have been too. Have a picture of a bauble:

Multi-boot + data + multi-partition = octopus flash drive 2.0?

A while ago, I posted about a multi-boot flash drive. That approach has served me well, but I got a new flash drive a while ago - and for some reason I could never get it to be bootable in the same way.

After a frustrating experience trying to image yet another machine and not being able to find a free flash drive, I decided that enough was enough and that I'd do something about it. My requirements are as follows:

  1. It has to be bootable via legacy BIOS
  2. It has to be bootable via (U)EFI
  3. I don't want multiple configuration files for each booting method
  4. I want to be able to store other files on it too
  5. I want it to be recognised by all major operating systems
  6. I want to be able to fiddle with the grub configuration without manually mounting a partition

Quite the list! I can confirm that this is all technically achievable - it just takes a bit of work to do so. In this post, I'll outline how you can do it too - with reasoning at each step as to why it's necessary.

Start by finding a completely free flash drive. Note that you'll lose all the data that's currently stored on it, because we need to re-partition it.

I used the excellent GParted for this purpose, which is included in the Ubuntu live CD for those without a supported operating system.

Start by creating a brand-new GPT partition table. We're using GPT here because I believe it's required for (U)EFI booting. I haven't run into a machine that doesn't understand it yet, but there's always a hybrid MBR/GPT partition table that you can look into if you have issues.

Once done, create a FAT32 partition that fills all but the last 128MiB or so of the disk. Let's call this one DATA.

Next, create another partition that fills the remaining ~128MiB of the disk. Let's call this one EFI.

Write these to disk. Once done, right click on each partition in turn and click "manage flags". Set them as such:

Partition   Filesystem   Flags
DATA        FAT32        msftdata
EFI         FAT32        esp, boot

This is important, because only partitions marked with the boot flag can be booted from via EFI. Partitions marked boot also have to be marked esp apparently, which is mutually exclusive with the msftdata flag. The other problem is that only partitions marked with msftdata will be auto-detected by operating systems in a GPT partition table.

It is for this reason that we need to have a separate partition marked as esp and boot - otherwise operating systems wouldn't detect and automount our flash drive.

Once you've finished setting the flags, close GParted and mount the partitions. Windows users may have to use a Linux virtual machine and pass the flash drive in via USB passthrough.
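
If you'd prefer to do the partitioning from a terminal instead of GParted, something like the following should produce an equivalent layout with parted and mkfs.fat. This is only a sketch though - I'm assuming the flash drive is /dev/sdX here, so double-check which device is which with lsblk before running anything destructive:

# Create a brand-new GPT partition table (this wipes the drive!)
sudo parted --script /dev/sdX mklabel gpt
# DATA partition: everything except the last ~128MiB
sudo parted --script /dev/sdX mkpart DATA fat32 1MiB -128MiB
# EFI partition: the final ~128MiB
sudo parted --script /dev/sdX mkpart EFI fat32 -128MiB 100%
# Set the flags: msftdata on DATA, esp (which also implies boot on GPT) on EFI
sudo parted --script /dev/sdX set 1 msftdata on
sudo parted --script /dev/sdX set 2 esp on
# Format both partitions as FAT32
sudo mkfs.fat -F 32 -n DATA /dev/sdX1
sudo mkfs.fat -F 32 -n EFI /dev/sdX2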

Next, we'll need to copy a pair of binary files to the EFI partition to allow it to boot via EFI. These can be found in this zip archive, which is part of the tutorial I linked to in my previous post mentioned above. Extract the EFI directory from the zip archive to the EFI partition we created, and leave the rest.

Next, we need to install grub to the EFI partition. We need to do this twice:

  • Once for (U)EFI booting
  • Once for legacy BIOS booting

Before you continue, make sure that your host machine is not Ubuntu 19.10. This is really important - as there's a bug in the grub 2.04 version used in Ubuntu 19.10 that basically renders the loopback command (used for booting ISOs) useless when booting via UEFI! Try Ubuntu 18.04 - hopefully it'll get fixed soon.

This can be done like so:

# Install for UEFI boot:
sudo grub-install --target x86_64-efi --force --removable --boot-directory=/media/sbrl/EFI --efi-directory=/media/sbrl/EFI /dev/sdb
# Install for legacy BIOS boot:
sudo grub-install --target=i386-pc --force --removable --boot-directory=/media/sbrl/EFI /dev/sdb

It might complain a bit, but you should be able to (mostly) ignore it.

This is actually ok - as this Unix Stack Exchange post explains - as the two installations don't actually clash with each other and just happen to load and use the same configuration file in the end.

If you have trouble, make sure that you've got the right packages installed with your package manager (apt on Linux-based systems). Most systems will be missing 1 of the following, as it seems that the installer will only install the one that's required for your system:

  • For BIOS booting, grub-pc-bin needs to be installed via apt.
  • For UEFI booting, grub-efi-amd64-bin needs to be installed via apt.

Note that installing these packages won't mess with the booting of your host machine you're working on - it's the grub-pc and grub-efi-amd64 packages that do that.
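
If you want to make sure both are present ahead of time, installing them explicitly should be harmless on apt-based systems (a sketch - package names may differ on other distributions):

sudo apt update
sudo apt install grub-pc-bin grub-efi-amd64-bin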

Next, we can configure grub. This is a 2-step process, as we don't want the main grub configuration file on the EFI partition because of requirement #6 above.

Thankfully, we can achieve this by getting grub to dynamically load a second configuration file, in which we will store our actual configuration.

Create the file grub/grub.cfg on the EFI partition, and paste this inside:

# Load the configfile on the main partition
configfile (hd0,gpt1)/images/grub.cfg

In grub, partitioned block devices are called hdX, where X is a number indexed from 0. Partitions on a block device are specified with a comma, followed by the partition table type (gpt here) and the number of the partition (which starts from 1, oddly enough). The block device grub booted from is always device 0.

In the above, we specify that we want to dynamically load the configuration file that's located on the first partition (the DATA partition) of the disk that it booted from. I did it this way around, because I suspect that Windows still has that age-old bug where it will only look at the first partition of a flash drive - which would be marked as esp + boot and thus hidden if we had them the other way around. I haven't tested this though, so I could be wrong.

Now, we can create that other grub configuration file on the DATA partition. I'm storing all my ISOs and the grub configuration file in question in a folder called images (specifically, my main grub configuration file is located at /images/grub.cfg on the DATA partition), but you can put it wherever you like - just remember to update the grub configuration file on the EFI partition above to match, otherwise grub will get confused and complain that it can't find the configuration file on the DATA partition.
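
If grub does complain, the grub command line (press c at the menu) is handy for checking what it can actually see. These are standard grub shell commands:

# List all the devices and partitions grub has detected
ls
# Show the device grub currently considers to be $root
echo $root
# List the contents of the first GPT partition on the boot device
ls (hd0,gpt1)/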

For example, here's a (cut-down) portion of my grub configuration file:

# Just a header message - selecting this basically has no effect
menuentry "*** Bootable Images ***" { true }

submenu "Ubuntu" {
    set isofile="/images/ubuntu-18.04.3-desktop-amd64.iso"
    set isoversion="18.04 Bionic Beaver"
    #echo "ISO file: ${isofile}, version: ${isoversion}";

    loopback loop $isofile

    menuentry "[x64] Ubuntu Desktop ${isoversion}" {
        linux (loop)/casper/vmlinuz boot=casper setkmap=uk eject noprompt splash  iso-scan/filename=${isofile} --
        initrd (loop)/casper/initrd
    }
    menuentry "[x64] [ejectable] Ubuntu Desktop ${isoversion}" {
        linux (loop)/casper/vmlinuz boot=casper setkmap=uk eject noprompt splash toram iso-scan/filename=${isofile} --
        initrd (loop)/casper/initrd
    }
    menuentry "[x64] [install] Ubuntu Desktop ${isoversion}" {
        linux (loop)/casper/vmlinuz file=/cdrom/preseed/ubuntu.seed only-ubiquity quiet iso-scan/filename=${isofile} --
        initrd (loop)/install/initrd
    }
}


# Artix Linux
menuentry "Artix Linux" {
    set isofile="/images/artix-lxqt-openrc-20181008-x86_64.iso"

    probe -u $root --set=rootuuid
    set imgdevpath="/dev/disk/by-uuid/$rootuuid"

    loopback loop $isofile
    probe -l loop --set=isolabel

    linux (loop)/arch/boot/x86_64/vmlinuz archisodevice=/dev/loop0 img_dev=$imgdevpath img_loop=$isofile archisolabel=$isolabel earlymodules=loop
    initrd (loop)/arch/boot/x86_64/archiso.img
}

menuentry "Fedora Workstation 31" {
    set isofile="/images/Fedora-Workstation-Live-x86_64-31-1.9.iso"

    echo "Setting up loopback"
    loopback loop "${isofile}" 
    probe -l loop --set=isolabel
    echo "ISO Label is ${isolabel}"

    echo "Booting...."
    linux (loop)/isolinux/vmlinuz iso-scan/filename="${isofile}" root=live:CDLABEL=$isolabel  rd.live.image
    initrd (loop)/isolinux/initrd.img
}

menuentry "Offline Password Changer [01/02/2014]" {
    loopback loop /images/offline_password_changer.iso
    linux (loop)/VMLINUZ setkmap=uk isoloop=$isofile
    # initrd (loop)/initrd.cgz
    initrd (loop)/initrd
}

menuentry "Memtest 86+ 5.01" {
    linux16 /images/memtest86+.bin
}

submenu "Boot from Hard Drive" {
    menuentry "Hard Drive 0" {
        set root=(hd0)
        chainloader +1
    }
    menuentry "Hard Drive 1" {
        set root=(hd1)
        chainloader +1
    }
    menuentry "Hard Drive 2" {
        set root=(hd2)
        chainloader +1
    }
    menuentry "Hard Drive 3" {
        set root=(hd3)
        chainloader +1
    }
}

If you're really interested in building on your grub configuration file, I'll include some useful links at the bottom of this post. Specifically, having an understanding of the Linux boot process can be helpful for figuring out how to boot a specific Linux ISO if you can't find any instructions on how to do so. These steps might help if you are having issues figuring out the right parameters to boot a specific ISO:

  • Use your favourite search engine and search for Boot DISTRO_NAME_HERE iso with grub or something similar
  • Try the links at the bottom of this post to see if they have the parameters you need
  • Try looking for a configuration for a more recent version of the distribution
  • Try using the configuration from a similar distribution (e.g. Artix is similar to Manjaro - it's the successor to Manjaro OpenRC, which is derived from Arch Linux)
  • Open the ISO up and look for the grub configuration file for a clue
  • Try booting it with memdisk
  • Ask on the distribution's forums

Memdisk is a tool that copies a given ISO into RAM, and then chainloads it (as far as I'm aware). It can actually be used with grub (despite the fact that you might read that it's only compatible with syslinux):

menuentry "Title" {
    linux16 /images/memdisk iso
    initrd16 /path/to/linux.iso
}

Sometimes it can help with particularly stubborn ISOs. If you're struggling to find a copy of it out on the web, here's the version I use - though I don't remember where I got it from (if you know, post a comment below and I'll give you attribution).

That concludes this (quite lengthy!) tutorial on creating the, in my opinion, ultimate multi-boot everything flash drive. My future efforts with respect to my flash drive will be directed in the following areas:

  • Building a complete portable environment for running practically all the software I need when out and about
  • Finding useful ISOs to include on my flash drive
  • Anything else that increases the usefulness of the flash drive that I haven't thought of yet

If you've got any cool suggestions (or questions about the process) - comment below!

Sources and Further Reading

PhD Update 1: Directions

Welcome to my first PhD update post. I intend to post these at bimonthly intervals. In the last post, I talked a bit about the PhD project I'm doing and my initial thoughts. Since then, I've done heaps of investigation into a number of different potential directions I could take the project. For reference, my PhD title is actually as follows:

Using the Internet of Things, Big Data, and AI to dynamically map flood risk.

There are 3 main elements to this project:

  • Big Data
  • Artificial Intelligence (AI)
  • The Internet of Things (IoT)

I'm pretty sure that each of them will have an important role to play in the final product - even if I'm not sure what those roles are just yet :P

Particularly of concern at the moment is this blog post by Google. It talks about how they've managed to significantly improve flood forecasting with AI, along with a seriously impressive visualisation to back it up - but I can't find a paper on it anywhere. I'm concerned that anything I try to do in this area won't be useful if they are already streets ahead of everyone else like that.

I guess one of the strong points I should try to hit is the concept of explainable AI if possible.

All the data sources!

As it stands right now, I'm currently evaluating various different potential data sources that I've managed to gain access to. My aim here is to evaluate how useful they will be in solving the wider problem - and whether they are useful enough to be worth investigating further.

Environment Agency

Some great people from the Environment Agency came into the University recently to chat with us about what they do. The discussion we had was very interesting - but they also asked if there was anything they could do to help our PhD projects out.

Seeing the opportunity, I jumped at the chance to get a hold of some of their historical datasets. They actually maintain a network of high-quality sensors across the country that monitor everything from rainfall to river statistics. While they have a real-time API that you can use to download recent measurements, it doesn't appear to go back further than March 2017. To this end, I asked for data from 2005 up to the end of 2017, so that I could get a clearer picture of the 2007 and 2013 floods for AI training purposes.

So far, this dataset has proved very useful, at least initially, as a testbed for training various kinds of AI as I learn PyTorch (see my recent post for how that has been going). I've started with a basic LSTM first. For reference, an LSTM is a neural network architecture that is good at processing time-series data - but is quite computationally expensive to run.

Met Office

I've also been investigating the datasets that the Met Office provide. These chiefly appear to be in the form of their free DataPoint API. Particularly of interest are their rainfall radar images, which are 500x500 pixels and are released every 15 minutes. Sadly they are only available for a few hours at best, so you have to grab them fast if you want to be able to analyse particularly interesting ones later.

Annoyingly though, their API does not appear to give any hints as to the bounding boxes of these images - and neither can I find any information about this online. I posted in their support forum, but it doesn't appear that anyone actually monitors it - so at this point I suspect that I'm unlikely to receive a response. Without knowing the (lat, lng) co-ordinates of the images produced by the API, they are little more use than pretty wall art.

Internet of Things

On the Internet of Things front, I'm already part of Connected Humber, which has a network of sensors set up that monitor everything from air quality to temperature, humidity, and air pressure. While these things aren't directly related to my project, the dataset that we're collecting as a group may very well come in handy as an input to a model of some description.

I'm pretty sure that I'll need to set up some additional custom sensors of my own at some point (probably soonish too) to collect the measurement readings that I'm missing from other pre-existing datasets.

Reading a library

Whilst I've been doing this, I've also been reading up a storm. I've started by reading into traditional physics-based flood modelling simulations (such as caesar-lisflood) - which appear to fall into a number of different categories, which also have sub-categories. It's quite a rabbit hole - but apparently I'm diving all the way down to the very bottom.

The most interesting paper on this subject I found was this one from 2017. It splits physics-based models up into 3 categories:

  • Empirical models (i.e. ones that just display sensor readings, calculate some statistics, and that's about it)
  • Hydrodynamic models - the best-known models that simulate water flow etc - can be categorised as either 1D, 2D, or 3D - also very computationally expensive - especially in higher dimensions
  • Simplified conceptual models - don't actually simulate water flow, but efficient enough to be used on large areas - also can be quite inaccurate with complex terrain etc.

As I'm going to be using artificial intelligence as the core of my project, it quickly became evident that this is just stage-setting for the actual kind of work I'll be doing. After winding my way through a bunch of other less interesting papers, I found my way to this paper from 2018 next, which is similar to the previous one I linked to - just for AI and flood modelling.

While I haven't yet had a chance to follow up on all the interesting papers referenced, it has a number of interesting points to keep in mind:

  • Artificial Intelligences need lots of diverse data points to train well
  • It's important to measure a trained network's ability to generalise what it's learnt to other situations it hasn't seen yet

The odd thing about this paper is that it claims that regular neural networks were better than recurrent neural network structures - despite the fact that it is only citing a single old 2013 paper (which I haven't yet read). This led me on to read a few more papers - all of which were mildly interesting and had at least something to do with neural networks.

I certainly haven't read everything yet about flood modelling and AI, so I've got quite a way to go until I'm done in this department. Also of interest are 2 newer neural network architectures which I'm currently reading about:

Next steps

I want to continue to read about the above neural networks. I also want to implement a number of the networks I've read about in PyTorch to continue to learn the library.

Lastly, I want to continue to find new datasets to explore. If you're aware of a dataset that I haven't yet talked about on here, comment below!

PyTorch and the GPU: A tale of graphics cards

Recently, I've been learning PyTorch - which is an artificial intelligence / deep learning framework in Python. While I'm not personally a huge fan of Python, it seems to be the only library of its kind out there at the moment (and Tensorflow.js has terrible documentation) - so it would seem that I'm stuck with it.

Anyway, as I've been trying to learn it I inevitably came to the bit where I need to learn how to take advantage of a GPU to accelerate the neural network training process. I've been implementing a few test networks to see how it performs (my latest one is a simple LSTM, loosely following this tutorial).

In PyTorch, this isn't actually done for you automatically. The basic building blocks of PyTorch are tensors (potentially multi-dimensional arrays that hold data). Each tensor is bound to a specific compute device - by default the CPU (in which case the data is stored in regular RAM). To do the calculations on a graphics card, you need to bind the data to the GPU in order to load it into the GPU's own memory - so that the GPU can access it and do the calculation. The same goes for any models you create - they have to be explicitly loaded onto the GPU in order to run the calculations in the right place. Thankfully, this is fairly trivial:

tensor = torch.rand(3, 4)
tensor = tensor.to(COMPUTE_DEVICE)

....where COMPUTE_DEVICE is the PyTorch device object you want to load the tensor onto. I found that this works to determine the device that the data should be loaded onto quite well:

COMPUTE_DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

Unfortunately, PyTorch (like most other AI frameworks out there) only supports a technology called CUDA for GPU acceleration. This is a proprietary Nvidia technology - which means that you can only use Nvidia GPUs for accelerated deep learning. Since I don't actually own an Nvidia GPU (far too expensive, and in my current laptop I have an AMD Radeon R7 M445 - and I don't plan on spending large sums of money to replace a perfectly good laptop), I've been investigating hardware at my University that I can use for development purposes - since this is directly related to my PhD after all.

Initially, I've found a machine with an Nvidia GeForce GTX 650 in it. If you run torch.cuda.is_available(), it will tell you if CUDA is available or not:

print(torch.cuda.is_available()) # Prints True if CUDA is available

.....but, as always, there's got to be a catch. Just because CUDA is available doesn't mean to say that PyTorch can actually use it. After a bunch of testing, it transpired that PyTorch only supports CUDA devices with a compute capability greater than or equal to 3.5 - and the GTX 650 has a compute capability of just 3.0. You can see where this is going. I found this webpage helpful - it lists all of Nvidia's GPUs and their CUDA compute capabilities.

You can also get PyTorch to tell you more about the CUDA device it has found:

def display_compute_device():
    """Displays information about the compute device that PyTorch is using."""

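    # Note: log() here is assumed to be a custom print-style logging helper from the surrounding project - it isn't part of PyTorch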
    log(f"Using device: {COMPUTE_DEVICE}", newline=False)
    if COMPUTE_DEVICE.type == 'cuda':
        print(" {0} [Memory: {1}GB allocated, {2}GB cached]".format(
            torch.cuda.get_device_name(0),
            round(torch.cuda.memory_allocated(0)/1024**3, 1),
            round(torch.cuda.memory_cached(0)/1024**3, 1)
        ))

    print()

If you execute the above method, it will tell you more about the compute device it has found. Note that you can actually make use of multiple compute devices at the same time - I just haven't done any research into that yet.

Crucially, it will also generate a warning message if your CUDA device is too old. To this end, I'll be doing some more investigating as to the resources that the Department of Computer Science has available for PhD students to use....

If anyone knows of an artificial intelligence framework that can take advantage of any GPU (e.g. via OpenCL, oneAPI, or other similar technologies), do get in touch. I'm very interested to explore other options.

Exporting an SQLite3 database to a directory of CSV files

Recently I was working with a dataset I acquired for my PhD, and to pre-process said dataset into something more sensible I imported it into an SQLite3 database. Once I was finished processing it, I then needed to export it again into regular CSV files so that I could do other things, such as plot it with GNUPlot, or import it into InfluxDB (more on InfluxDB in a later post).

With the help of Stack Overflow and the SQLite3 man page, this didn't prove to be too difficult. To export a single SQLite3 table to a CSV file, you do this:

sqlite3 -bail -header -csv "bobsrockets.sqlite3" "SELECT * FROM 'table_name';" >"path/to/output_file.csv";

This is great for a single table, but what if we want to export all the tables? Well, we can iterate over all the tables in an SQLite3 database like so:

while read table_name; do
    echo "Exporting ${table_name}";

    # Do stuff
done < <(sqlite3 "bobsrockets.sqlite3" ".tables");

If we combine this with the previous snippet, we can export all the tables like so:

while read table_name; do
    log "Exporting ${table_name}";

    sqlite3 -bail -header -csv "bobsrockets.sqlite3" "SELECT * FROM '${table_name}';" >"${table_name}.csv"; 
done < <(sqlite3 "bobsrockets.sqlite3" ".tables");

Cool! We can make it even better with some simple improvements though:

  1. It's a pain to have to edit the script every time we want to change the database we're exporting
  2. It would be nice to be able to specify the output directory without editing the script too

Satisfying both of these points isn't particularly challenging. 10 minutes of fiddling got me this final completed script:

#!/usr/bin/env bash
set -e; # Don't allow errors

show_usage() {
    echo -e "Usage:";
    echo -e "\t./sqlite2csv.sh {db_filename} {output_dir}";
}

log() {
    echo -e "[ $(date +"%F %T") ] ${@}";
}

###############################################################################

db_filename="${1}";
output_dir="${2}";

if [ -z "${db_filename}" ]; then
    echo "Error: No database filename specified.";
    show_usage; exit;
fi
if [ -z "${output_dir}" ]; then
    echo "Error: No output directory specified.";
    show_usage; exit;
fi

if [ ! -d "${output_dir}" ]; then
    mkdir -p "${output_dir}"; 
fi

log "Output directory is ${output_dir}";

while read table_name; do
    log "Exporting ${table_name}";

    sqlite3 -bail -header -csv "${db_filename}" "SELECT * FROM '${table_name}';" >"${output_dir}/${table_name}.csv";    
done < <(sqlite3 "${db_filename}" ".tables");

log "Complete!";

Found this useful? Comment below!

Pepperminty Wiki is 5 today!

....let's celebrate with the release of v0.20. I got a notification from my calendar system yesterday that Pepperminty Wiki's birthday is today, and since I did a beta release a few days ago and there haven't been any major issues, I thought I'd time the full release to coincide with its birthday.

I'm timing it from the first commit I ever made in Pepperminty Wiki's git repository. 5 years is a long time - and as a program Pepperminty Wiki has come such a long way since then.

Today, it's actually a really useful piece of open-source software, which is evidenced by the fact that people recommend it to others unprompted. Seeing such things and hearing about where it's used is really amazing - and gives me lots of motivation to improve Pepperminty Wiki even more.

While the number of commits a project has isn't always an indicator of quality or completeness, it usually gives you a pretty good idea of how much work has been done on a project. At the time of writing Pepperminty Wiki has 1,415 commits, which is more than any other project I have ever worked on - past or present. The air quality web interface (which is now more of a general sensor web interface) is my 2nd place project unless I've missed one - and at 425 commits it doesn't even come close!

To summarise the features in the latest release:

  • 🌜 New automatic dark mode in the default theme! Uses prefers-color-scheme under-the-hood
  • 🌈 Added theme gallery! Read more here
  • ⛵ Vastly improved search engine performance, with new advanced query syntax (with even more syntax along the way)
  • Accessibility improvements - if you're a screen-reader or accessibility tool user, I want to hear from you if you think anything (big or small!) could be improved!

Personally, I'm most proud of the optimisations to the search engine. I've actually blogged about how I did it in a 3 part series and tested it on a test wiki with ~5.9M words - while search times vary depending on your input (the new -exclude syntax will actually speed up queries) and your server hardware, a single word query for ~5.0M word wikis takes ~50ms O.o

Unfortunately, this does mean that the search index will need to be rebuilt under the new format - and will be slightly larger than before. To get a progress bar for this operation, go to the master settings and click the rebuild button.

Another notable change is the new 'mega-menu' style more menu:

(Screenshot: the new 'mega-menu' style more menu)

That menu has been bothering me for a while, and thanks to the kind people on Reddit, I've now got a solution.

Note that you'll need to delete nav_links_extra from your peppermint.json in order for it to take effect.

Please also test the theme gallery in particular. It's brand-new in this release and quite complicated under-the-hood, so I'd appreciate some extra eyes on that.

As for when I'll release v1.0, I'm not sure. As a program, Pepperminty Wiki is certainly stable enough to be used in production scenarios today - so perhaps incrementing the version number to v1.0 would be a good idea to reflect that. At the same time though, there are a number of missing features - most notably watchlists and further improvements to the page history system - so I'm not sure when I'll be confident enough to bump it to v1.0.

Either way, I'm pretty sure that I'll keep working on Pepperminty Wiki for years to come - I have no plans to cease development at this time. While Pepperminty Wiki releases don't move at the most rapid of paces, I aim to get about 2 releases out per year about 6 months apart from each other.

Special thanks to @SeanFromIT for reporting a number of bugs which have been squashed.

If you use Pepperminty Wiki, tweet me @SBRLabs! I'd love to hear about how you're using it.

Lastly, don't forget to take a backup of your wiki before updating. While I've made every effort to squash bugs, you can never be too careful :P

Check out v0.20 here:

Pepperminty Wiki v0.20

MDNS: Simple device addressing for home networks

We all know about DNS, and how it forms one of the foundations of the Internet. With a hierarchical system of caching DNS resolvers, it provides a scalable system by which domain names (such as starbeamrainbowlabs.com) can be translated into their associated IP address (such as 2001:41d0:e:74b::1 or 5.196.73.75). You can register your own domain name for a modest fee, and point it at a web server to host a website.
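
You can watch this translation happen yourself with the dig tool (usually found in the dnsutils or bind-utils package):

# Look up the IPv4 (A) and IPv6 (AAAA) records for a domain
dig +short starbeamrainbowlabs.com A
dig +short starbeamrainbowlabs.com AAAA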

But what about a local home network? In such an environment, where devices get switched on and off and enter and leave the network on a regular basis, manually specifying DNS records for devices which may even have dynamic IP addresses is a chore (and dynamic DNS solutions are complex to set up). Is there an easier way?

As I discovered the other day, it turns out the answer is yes - and it comes in the form of Multicast DNS, which abbreviates to MDNS. MDNS is a decentralised peer-to-peer protocol that lets devices on a small home network announce their names and their IP addresses in a standard fashion. It's also (almost) zero-configuration, so as long as UDP port 5353 is allowed through all your devices' firewalls, it should start working automatically.

Linux users will need avahi-daemon installed and running, which should be the default on popular distributions such as Ubuntu. Windows users with a recent build of Windows 10 should have it enabled by default too - and if I understand it right, macOS users should also have it enabled by default (though I don't have a mac, or a Windows machine, to check these on).
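
If it isn't already present on a Linux machine, installing and enabling it should be straightforward - here's a sketch for apt-based distributions (package names may differ elsewhere):

sudo apt install avahi-daemon
sudo systemctl enable --now avahi-daemon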

For example, if Bob has a home network with a file server on it, that file server might announce its name as bobsfiles. This is automatically translated into the fully-qualified domain name bobsfiles.local.. When Bill comes around to Bob's house and turns on his laptop, it will send a multicast DNS message out to ask all the supporting hosts on the network what their names and IP addresses are, and add them to its cache. Then, all Bill has to do is enter bobsfiles.local. into his web browser (or file manager, SSH client, or any other networked application) to connect to Bob's file server and access Bob's cool rocket designs and cat pictures.

This greatly simplifies the setup of a home network, and allows for pseudo-hostnames even in a local setting! Very cool. At some point, I'd like to refactor my home network to make better use of this - and have 1 MDNS name per service I'm running, rather than using subfolders for everything. This fits in nicely with some clustering plans I have on the horizon too.....

With a bit of fiddling, you can assign multiple MDNS names to a single host too. On Linux, you can use avahi-publish:

avahi-publish --address -R bobsrockets.local X.Y.Z.W

...where X.Y.Z.W is your local machine's IP, and bobsrockets.local is the .local MDNS domain name you want to assign. Apparently this is a daemon process that needs to keep running in the background, which is a bit of a pain - but hopefully there's a better solution out there somewhere.
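
To check what's visible from another Linux machine on the network, the avahi-utils package provides some handy tools (again, assuming an apt-based distribution - the names below are the example ones from earlier in this post):

# Resolve a specific .local name to an IP address
avahi-resolve --name bobsrockets.local
# Browse all the services being announced on the local network, then exit
avahi-browse --all --terminate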
