Archive

## Tag Cloud

3d 3d printing account algorithms android announcement architecture archives arduino artificial intelligence artix assembly async audio automation backups bash batch blog bookmarklet booting bug hunting c sharp c++ challenge chrome os cluster code codepen coding conundrums coding conundrums evolved command line compilers compiling compression containerisation css dailyprogrammer data analysis debugging demystification distributed computing docker documentation downtime electronics email embedded systems encryption es6 features ethics event experiment external first impressions future game github github gist gitlab graphics hardware hardware meetup holiday holidays html html5 html5 canvas infrastructure interfaces internet interoperability io.js jabber jam javascript js bin labs learning library linux lora low level lua maintenance manjaro network networking nibriboard node.js operating systems own your code pepperminty wiki performance phd photos php pixelbot portable privacy problem solving programming problems project projects prolog protocol protocols pseudo 3d python reddit redis reference releases rendering resource review rust searching secrets security series list server software sorting source code control statistics storage svg talks technical terminal textures thoughts three thing game three.js tool tutorial tutorials twitter ubuntu university update updates upgrade version control virtual reality virtualisation visual web website windows windows 10 xmpp xslt

## simple-dash fork: now with directory support!

A while back (I still have all sorts of projects I've forgotten to blog about - with many more to come), I forked an excellent project called simple-dash, which is a web dashboard. You can configure it to display 1 or more links, and it presents them nice and cleanly in the middle of the page.

I don't make forks lightly, but in this case I liked the project a lot - but I wanted to add enough features that I felt that I might be taking it in a different direction than the original project. The original project also hasn't been touched in 2+ years, and the author hasn't had any contributions on GitHub in that time either - so think it's fair to say that it's unlikely that any pull request I open wouldn't be looked at either (if the original author is reading this, I'm happy to open one!).

Anyway, before I continue too far, here's a screenshot of my improvements in action:

I use simple-dash in multiple places to provide a dashboard of links to the various services that I run so I both don't lose them and, in some cases, other people in my family can easily access said services.

I added a number of features here. The first is invisible, but I completely re-implemented the layout to use the CSS Grid (see also: a, b). If you've played with CSS before but aren't yet aware of the CSS grid yet - I can thoroughly recommend you take a moment to investigate - it will blow you away and solve all your layout problems all at the same time! In short, it's like a 2d version of the flexbox.

Since the original has full mobile support, I continue that trend in the rewrite with some CSS media queries to change the number of items per row based on the width of your screen.

The other invisible change is that I changed the language the configuration file is written in to TOML, which is a much more friendly language to write configuration files in.

Anyway, in terms of more visible changes, I also added the ability to set a background image, as well as the default random triangles background. Icons also got the same treatment - gaining the ability to display an image instead of a Font Awesome icon (I haven't actually used Font Awesome before, so this was an interesting experience - even if it was already setup in this project).

Last but certainly not least, I added the ability group pages into folders. Here's a screenshot of what the contents of that folder in the top left looks like when opened:

You can't see it here, but it's even animated! Link to a demo at the end of this post.

There were a number of different challenges to overcome to get this working right actually - it was not trivial at all. There are 2 components to it: The CSS to style it, and the Javascript to fiddle the class list on the folder itself to add / remove the active class so that I could distinguish between open and closed folders in the CSS, and also prevent the click event from propagating through to the <a href="https://example.com/">links</a> links when the folder is closed.

Thinking about it, it may be possible use a clever pointer-events: none to avoid the Javascript.

The CSS does the heavy lifting here though. For inactive folders, I use a CSS grid with overflow: none to display the 1st 4 icons in a preview. When the folder becomes active, position: fixed breaks it out of the layout of the rest of the page (sadly leaving a placeholder behind would require an additional html element), and the content reflows to use the same CSS as the main grid of tiles.

Through some CSS grid wizardry (you can do anything with CSS grid, it's amazing) and a container element, I can even fade out the rest of the page while the folder is open.

Clicking on the items in a folder when the folder is open takes you to their destination as usual, while clicking anywhere else closes the folder again.

I've got a demo running over here if you'd like to play around with it:

sbrl's simple-dash fork demo

The background is set to a random image from Unsplash. It loads fine for me, but sometimes it takes a moment.

If this looks like something, you'd like to use for yourself, my fork is open-source! Check it out here:

sbrl/simple-dash on GitHub

You can find instructions on how to set it up for yourself in the README. You'll need npm to install dependencies - this should come bundled with Node.js. You can also find a lovingly-commented example configuration file here:

config.sample.toml

If you have any difficulties setting it up, want to request a feature, or even (gasp!) report a bug, please open an issue. While I do monitor the comments here on this blog, GitHub issues are a much better place to track bugs and feature requests.

## Cluster, Part 11: Lock and Key | Let's Encrypt DNS-01 for wildcard TLS certificates

Welcome one and all to another cluster blog post! Cluster blog posts always take a while to write, so sorry for the delay. As is customary, let's start this post off with a list of all the parts in the series so far:

With that out of the way, in this post we're going to look at obtaining a wildcard TLS certificate using the Let's Encrypt DNS-01 challenge. We want this because you need a TLS certificate to serve HTTPS without lighting everyone's browsers up with warnings like a Christmas tree.

The DNS-01 challenge is an alternate challenge to the default HTTP-01 challenge you may already me familiar with.

Unlike the HTTP-01 challenge which proves you have access to single domain by automatically placing a file on your web server, the DNS-01 challenge proves you have control over an entire domain - thus allowing you to obtain a wildcard certificate - which is valid for not only your domain, but all possible subdomains! This should save a lot of hassle - but it's important we keep it secure too.

As with regular Let's Encrypt certificates, we'll also need to ensure that our wildcard certificate we obtain will be auto-renewed, so we'll be setting up a periodic task on our Nomad cluster to do this for us.

If you don't have a Nomad cluster, don't worry. It's not required, and I'll be showing you how to do it without one too. But if you'd like to set one up, I recommend part 7 of this series.

In order to complete the DNS-01 challenge successfully, we need to automatically place a DNS record in our domain. This can be done via an API, if your DNS provider has one and it's supported. Personally, I have the domain name I'm using for my cluster (mooncarrot.space.) with Gandi. We'll be using certbot to perform the DNS-01 challenge, which has a plugin system for different DNS API providers.

We'll be installing the challenge provider we need with pip3 (a Python 3 package manager, as certbot is written in Python), so you can find an up-to-date list of challenge providers over on PyPi here: https://pypi.org/search/?q=certbot-dns

If you don't see a plugin for your provider, don't worry. I couldn't find one for Gandi, so I added my domain name to Cloudflare and followed the setup to change the name servers for my domain name to point at them. After doing this, I can now use the Cloudflare API through the certbot-dns-cloudflare plugin.

With that sorted, we can look at obtaining that TLS certificate. I opt to put certbot in a Docker container here so that I can run it through a Nomad periodic task. This proved to be a useful tool to test the process out though, as I hit a number of snags with the process that made things interesting.

The first order of business is to install certbot and the associate plugins. You'd think that simply doing an sudo apt install certbot certbot-dns-cloudflare would do the job, but you'd be wrong.

As it turns out, it does install that way, but it installs an older version of the certbot-dns-cloudflare plugin that requires you give it your Global API Key from your Cloudflare account, which has permission to do anything on your account!

That's no good at all, because if the key gets compromised an attacker could edit any of the domain names on our account they like, which would quickly turn into a disaster!

Instead, we want to install the latest version of certbot and the associated Cloudflare DNS plugin, which support regular Cloudflare API Tokens, upon which we can set restrictive permissions to only allow it to edit the one domain name we want to obtain a TLS certificate for.

I tried multiple different ways of installing certbot in order to get a version recent enough to get it to take an API token. The way that worked for me was a script called certbot-auto, which you can download from here: https://dl.eff.org/certbot-auto.

Now we have a way to install certbot, we also need the Cloudflare DNS plugin. As I mentioned above, we can do this using pip3, a Python package manager. In our case, the pip3 package we want is certbot-dns-cloudflare - incidentally it has the same name as the outdated apt package that would have made life so much simpler if it had supported API tokens.

Now we have a plan, let's start to draft out the commands we'll need to execute to get certbot up and running. If you're planning on following this tutorial on bare metal (i.e. without Docker), go ahead and execute these directly on your target machine. If you're following along with Docker though, hang on because we'll be wrapping these up into a Dockerfile shortly.

First, let's install certbot:

sudo apt install curl ca-certificates
cd some_permanent_directory;
curl -sS https://dl.eff.org/certbot-auto -o certbot-auto
chmod +x certbot-auto
sudo certbot-auto --debug --noninteractive --install-only

Installation with certbot-auto comprises downloading a script and executing it. with a bunch of flags. Next up, we need to shoe-horn our certbot-dns-cloudflare plugin into the certbot-auto installation. This requires some interesting trickery here, because certbot-auto uses something called virtualenv to install itself and all its dependencies locally into a single directory.

sudo apt install python3-pip
cd /opt/eff.org/certbot/venv
source bin/activate
pip install certbot-dns-cloudflare
deactivate

In short, we cd into the certbot-auto installation, activate the virtualenv local environment, install our dns plugin package, and then exit out of the virtual environment again.

With that done, we can finally add a convenience synlink so that the certbot command is in our PATH:

ln -s /opt/eff.org/certbot/venv/bin/certbot /usr/bin/certbot

That completes the certbot installation process. Then, to use certbot to create the TLS certificate, we'll need an API as mentioned earlier. Navigate to the API Tokens part of your profile and create one, and then create an INI file in the following format:

# Cloudflare API token used by Certbot
dns_cloudflare_api_token = "YOUR_API_TOKEN_HERE"

...replacing YOUR_API_TOKEN_HERE with your API token of course.

Finally, with all that in place, we can create our wildcard certificate! Do that like this:

sudo certbot certonly --dns-cloudflare --dns-cloudflare-credentials path/to/credentials.ini -d 'bobsrockets.io,*.bobsrockets.io' --preferred-challenges dns-01

It'll ask you a bunch of interactive questions the first time you do this, but follow it through and it should issue you a TLS certificate (and tell you where it stored it). Actually utilising it is beyond the scope of this post - we'll be tackling that in a future post in this series.

For those following along on bare metal, this is where you'll want to skip to the end of the post. Before you do, I'll leave you with a quick note about auto-renewing your TLS certificates. Do this:

sudo letsencrypt renew
sudo systemctl reload nginx postfix

....on a regular basis, replacing nginx postfix with a space-separated list of services that need reloading after you've renewed your certificates. A great way to do this is to setup a cron job.

### Sweeping things under the carpet

For the Docker users here, we aren't quite finished yet: We need to package this mess up into a nice neat Docker container where we can forget about it :P

Some things we need to be aware of:

• certbot has a number of data directories it interacts with that we need to ensure don't get wiped when the Docker ends instances of our container.
• Since I'm serving the shared storage of my cluster over NFS, we can't have certbot running as root as it'll get a permission denied error when it tries to access the disk.
• While curl and ca-certificates are needed to download certbot-auto, they aren't needed by certbot itself - so we can avoid installing them in the resulting Docker container by using a multi-stage Dockerfile.

To save you the trouble, I've already gone to the trouble of developing just such a Dockerfile that takes all of this into account. Here it is:

ARG REPO_LOCATION
# ARG BASE_VERSION

FROM ${REPO_LOCATION}minideb AS builder RUN install_packages curl ca-certificates \ && curl -sS https://dl.eff.org/certbot-auto -o /srv/certbot-auto \ && chmod +x /srv/certbot-auto FROM${REPO_LOCATION}minideb

COPY --from=builder /srv/certbot-auto /srv/certbot-auto

RUN /srv/certbot-auto --debug --noninteractive --install-only && \
install_packages python3-pip

WORKDIR /opt/eff.org/certbot/venv
RUN . bin/activate \
&& pip install certbot-dns-cloudflare \
&& deactivate \
&& ln -s /opt/eff.org/certbot/venv/bin/certbot /usr/bin/certbot

VOLUME /srv/configdir /srv/workdir /srv/logsdir

USER 999:994
ENTRYPOINT [ "/usr/bin/certbot", \
"--config-dir", "/srv/configdir", \
"--work-dir", "/srv/workdir", \
"--logs-dir", "/srv/logsdir" ]

A few things to note here:

• We use a multi-stage dockerfile here to avoid installing curl and ca-certificates in the resulting docker image.
• I'm using minideb as a base image that resides on my private Docker registry (see part 8). For the curious, the script I use to do this located on my personal git server here: https://git.starbeamrainbowlabs.com/sbrl/docker-images/src/branch/master/images/minideb.
• If you don't have minideb pushed to a private Docker registry, replace minideb with bitnami/minideb in the above.
• We set the user and group certbot runs as to 999:994 to avoid the NFS permissions issue.
• We define 3 Docker volumes /srv/configdir, /srv/workdir, and /srv/logsdir to contain all of certbot's data that needs to be persisted and use an elaborate ENTRYPOINT to ensure that we tell certbot about them.

Save this in a new directory with the name Dockerfile and build it:

sudo docker build --no-cache --pull --tag "certbot" .;

...if you have a private Docker registry with a local minideb image you'd like to use as a base, do this instead:

sudo docker build --no-cache --pull --tag "myregistry.seanssatellites.io:5000/certbot" --build-arg "REPO_LOCATION=myregistry.seanssatellites.io:5000/" .;

In my case, I do this on my CI server:

laminarc queue docker-rebuild IMAGE=certbot

The hows of how I set that up will be the subject of a future post. Part of the answer is located in my docker-images Git repository, but the other part is in my private continuous integration Git repo (but rest assured I'll be talking about it and sharing it here).

Anyway, with the Docker container built we can now obtain our certificates with this monster of a one-liner:

sudo docker run -it --rm -v /mnt/shared/services/certbot/workdir:/srv/workdir -v /mnt/shared/services/certbot/configdir:/srv/configdir -v /mnt/shared/services/certbot/logsdir:/srv/logsdir certbot certonly --dns-cloudflare --dns-cloudflare-credentials path/to/credentials.ini -d 'bobsrockets.io,*.bobsrockets.io' --preferred-challenges dns-01

The reason this is so long is that we need to mount the 3 different volumes into the container that contain certbot's data files. If you're running a private registry, don't forget to prefix certbot there with registry.bobsrockets.com:5000/.

Don't forget also to update the Docker volume locations on the host here to point a empty directories owned by 999:994.

Even if you want to run this on Nomad, I still advise that you execute this manually. This is because the first time you do so it'll ask you a bunch of questions interactively (which it doesn't do on subsequent times).

If you're not using Nomad, this is the point you'll want to skip to the end. As before with the bare-metal users, you'll want to add a cron job that runs certbot renew - just in your case inside your Docker container.

For the truly intrepid Nomad users, we still have one last task to complete before our work is done: Auto-renewing our certificate(s) with a Nomad periodic task.

This isn't really that complicated I found. Here's what I came up with:

job "certbot" {
datacenters = ["dc1"]
priority = 100
type = "batch"

periodic {
cron = "@weekly"
prohibit_overlap = true
}

driver = "docker"

config {
image = "registry.service.mooncarrot.space:5000/certbot"
labels { group = "maintenance" }
entrypoint = [ "/usr/bin/certbot" ]
command = "renew"
args = [
"--config-dir", "/srv/configdir/",
"--work-dir", "/srv/workdir/",
"--logs-dir", "/srv/logsdir/"
]
# To generate a new cert:
# /usr/bin/certbot --work-dir /srv/workdir/ --config-dir /srv/configdir/ --logs-dir /srv/logsdir/ certonly --dns-cloudflare --dns-cloudflare-credentials /srv/configdir/__cloudflare_credentials.ini -d 'mooncarrot.space,*.mooncarrot.space' --preferred-challenges dns-01

volumes = [
"/mnt/shared/services/certbot/workdir:/srv/workdir",
"/mnt/shared/services/certbot/configdir:/srv/configdir",
"/mnt/shared/services/certbot/logsdir:/srv/logsdir"
]
}
}
}

If you want to use it yourself, replace the various references to things like the private Docker registry and the Docker volumes (which require "docker.volumes.enabled" = "True" in clientoptions in your Nomad agent configuration) with values that make sense in your context.

I have some confidence that this is working as intended by inspecting logs and watching TLS certificate expiry times. Save it to a file called certbot.nomad and then run it:

nomad job run certbot.nomad

### Conclusion

If you've made it this far, congratulations! We've installed certbot and used the Cloudflare DNS plugin to obtain a DNS wildcard certificate. For the more adventurous, we've packaged it all into a Docker container. Finally for the truly intrepid we implemented a Nomad periodic job to auto-renew our TLS certificates.

Even if you don't use Docker or Nomad, I hope this has been a helpful read. If you're interested in the rest of my cluster build I've done, why not go back and start reading from part 1? All the posts in my cluster series are tagged with "cluster" to make them easier to find.

Unfortunately, I haven't managed to determine a way to import TLS certificates into Hashicorp Vault automatically, as I've stalled a bit on the Vault front (permissions and policies are wildly complicated), so in future posts it's unlikely I'll be touching Vault any time soon (if anyone has an alternative that is simpler and easier to understand / configure, please comment below).

Despite this, in future posts I've got a number of topics lined up I'd like to talk about:

• Configuring Fabio (see part 9) to serve HTTPS and force-redirect from HTTP to HTTPS (status: implemented)
• Implementing HAProxy to terminate port forwarding (status: initial research)
• Password protecting the private docker registry, Consul, and Nomad (status: on the todo list)
• Semi-automatic docker image rebuilding with Laminar CI (status: implemented)

In the meantime, please comment below if you liked this post, are having issues, or have any suggestions. I'd love to hear if this helped you out!

## consulstatus: public status pages drawn from Consul

In my cluster series of blog posts, I've been talking about how I've been building my cluster from scratch. Now that I've got it into some sorta stable state (though I'm still working on it), one of the things I discovered might be helpful for other users of my cluster is a status page.

(Above: The logo for consulstatus. Consulstatus is written by me and not endorsed by Hashicorp or the Consul project.)

To this end, I ended up implementing a quick solution to this problem in PHP. Here's a screenshot of what it looks like:

The colour scheme changes depending on your browser's prefers-colour-scheme. The circles to the right of each service are either green (indicating no issues), yellow (some problems are occurring), or red (it's down and everything's terrible))

As the name suggests, it's backed by Hashicorp Consul (which I blogged about in cluster, part 6: superglue service discovery). I recommend reading my blog post about it, but in short Consul allows you to register services that it should keep track of, and checks that define whether said services are healthy or not.

It supports a TOML config file that allows you to specify where Consul is, along with the names of the services you'd like to display:

title = "Cluster status page"

[consul]

base_url = "http://consul.service.bobsrockets.com:8500"

services = [
"some_service",
"another_service"
# .....
]

The status page is designed to be as simple to understand to understand as possible, so that anyone (even those who aren't technically skilled) can get an idea as to what is working and what isn't at any given time.

So far, it's been moderately successful. The status page itself is stable and behaves expectedly (which is always a plus), and it does reflect the status of the services in question.

I did initially toy with the idea of exposing more information about the specific checks in Consul that have failed, but then I thought that I'd be then doing what the Consul web interface already does, which seems a bit pointless.

Instead, I decided to keep it rather minimalist instead, such that it could be exposed publicly (in theory, though my instance is only accessible on my local LAN) in a way that the main Consul web interface really can't.

Moving forwards, I'm quite happy with consulstatus as-is, so if I make any changes they aren't likely to be too drastic. I'd like to look at adding a description to each service so that it's more obvious what it is, or maybe have display names that are shown instead of the Consul service names.

I'd also maybe like to display an icon to the left of each service as well to further help with visual identification and understanding, and perhaps allow grouping services too.

Out of scope though is logging service status history. That can be done elsewhere if desired (and I don't particularly have a need for that) - and PHP isn't particularly suited to that anyway.

Found this interesting? Got a suggestion? Comment below!

## Website change detection with headless Firefox and ImageMagick

This wasn't the script I had in mind in the previous blog post (so you can look forward to another blog post about it), but have you ever wanted to know when a web page changes? If it does change, it's almost impossible to tell where on the page it's changed. Recently, I was thinking about the problem, and realised a few things:

• Firefox can be operated headlessly (with --headless) to take screenshots
• ImageMagick must be advanced enough to diff images

With this in mind, I set about implementing a script. Before we continue, here's an example diff image:

It's rather tall because of the webpage I chose, but the bits that have changed appear in red. The script I've written also generates an animated PNG showing the difference too:

Again, it's very tall because of the page I tested with, but I think it's pretty cool!

If you'd like to check the script out for yourself, you find it in the following git repository: sbrl/url-diff

For the curious, the script in question is written in Bash. It uses apcalc (available in Debian / Ubuntu based Linux distributions with sudo apt install apcalc) to crunch the numbers, and headless Firefox + Imagemagick as described above to take the screenshots and do the image processing. It should in theory work on Windows, but you'll need to jump through a number of hoops:

• Install call url-diff.sh from [git bash]()
• Install [ImageMagick]() and make sure the binaries are in your PATH
• Install Firefox and make sure firefox is in your PATH
• Explicitly set the URLDIFF_STORAGE_DIR environment variable when calling the script (do this by prefixing the command at the bottom of this post with URLDIFF_STORAGE_DIR=path/to/directory)

With my fancy new embed system, I can show you the code behind it:

(Can't see the above? Check it out in the git repository.)

I'm working on line numbers (sadly the author of highlight.js doesn't like them, so an alternative solution is required).

Anyway, the basic layout of the script is as follows:

1. First, the settings are read in and the default values set
2. Then, I define some utility functions.
• The calculate_percentage_colour function is integral to the image change detection algorithm. It counts percentage of an image that is a given colour.
3. Next, the help text is displayed if necessary
4. The case statement that follows allows multiple subcommands to be implemented. Currently I only have a check subcommand, but you never know!
5. Inside this case statement, the screenshots are taken and compared.
• A new screenshot is taken with headless Firefox
• If we don't have a screenshot stored away already, we stash the new screenshot and exit
• If we do have a pre-existing screenshot, we continue with the comparison, starting by generating a diff image where pixels that have changed are given 1 colour, and pixels that haven't changed another
• It's at this point that calculate_percentage_colour is called to calculate how much of the image has changed - the diff image is passed in and the changed pixels are counted
• If more than 2% (by default) has changed, then we continue on to generate the output images
• The first output image consists of the new screenshot with the diff image overlaid - this is generated with some ImageMagick wizardry: -compose over -composite
• The second is an animated PNG comprised of the old and new screenshots. This is generated with ffmpeg - which supports animated PNGs
• Finally, the old screenshot that we have stored away is replaced with the new one

It sounds more complicated than it is - hopefully my above explanation makes sense (post a comment below if you're confused about something!).

You can call the script like so:

git clone https://git.starbeamrainbowlabs.com/sbrl/url-diff.git
cd url-diff;
./url-diff.sh check URL_HERE path/to/output_diff.png path/to/output.apng

....replacing URL_HERE with the URL to check, and the paths with the places you'd like to write the output images to.

## Pure CSS spoilers with the CSS :target selector

For 1 reason or another, I've been working on some parser improvements for Pepperminty Wiki recently. Pepperminty Wiki uses Markdown for the page content syntax - specifically Parsedown. Markdown has a number of variations and extensions, some of which are more widely accepted than others. For Pepperminty Wiki, I try to stick as closely to existing Markdown conventions as possible (such as the CommonMark spec). Where that's not possible, I try to make sure there's an existing precedent (e.g. internal links use the same syntax as MediaWiki).

Anyway, as part of this I thought it would be cool to implement a spoiler tag. The problem here is that nobody can agree on the canonical syntax. Discord has recently implemented a vertical bar syntax like a spoiler wall:

Some text ||spoiler text|| more text

Reddit, on the other hand, uses a different syntax:

Some text >!spoiler text!< more text

Anyway, I've ended up supporting both of the above 2 syntaxes. My Parsedown extension generates something like the following HTML:

<p>Some text <a class="spoiler" id="spoiler-RSSZTkNA30-OGJQf_7VivKtJAaoNhbx" href="#spoiler-RSSZTkNA30-OGJQf_7VivKtJAaoNhbx" title="Click / tap to reveal spoiler">spoiler text</a> more text</p>

The next question here is how to make it function as a spoiler. If you're not already aware, to reveal to text in a spoiler, one first has to click on it or perform some other action. Personally, I'd prefer to avoid Javascript if possible for this, as not all users have it enabled and it complicates matters in Pepperminty Wiki.

To this end, if you search for "Pure CSS spoiler" with your favourite search engine, you'll find loads of different solutions out there. Some require Javascript, and others only show the text in a tooltip on hover (which doesn't work on mobile). All this isn't very cool, so I decided to implement my own solution and share it here :-)

It's actually pretty concise:

.spoiler {
background: #333333;
color: transparent;
cursor: pointer;
}
.spoiler:target {
background: transparent;
color: inherit;
}

By setting the text colour to transparent and the background to an obvious colour, we can give the user an obvious hint that there's a spoiler that can be clicked on. Setting the cursor to a hand on platforms with a mouse further helps to support this suggestion.

When the link is clicked, it sets the anchor to spoiler-RSSZTkNA30-OGJQf_7VivKtJAaoNhbx, which is also the id of the spoiler. This triggers the :target selector, which makes the spoiler text visible.

Here's a demo:

See the Pen Pure CSS Spoiler by Starbeamrainbowlabs (@sbrl) on CodePen.

The only issue here is that it doesn't support accessibility tools such as screen readers very well. Using a trick I've found on the Mozilla Developer Net, we can do this to improve that:

.spoiler::before, .spoiler::after {
clip-path: inset(100%);
clip: rect(1px, 1px, 1px, 1px);
height: 1px;
overflow: hidden;
position: absolute;
white-space: nowrap;
width: 1px;
}
.spoiler::before {
content: " [spoiler start] ";
}
.spoiler::after {
content: " [spoiler end] ";
}

...but this still doesn't "fix" the issue, because we're only giving the user warning. Not being a screen-reader user myself, I'm not sure whether this is adequate (is there a 'skip' command that allows skipping to the end of the element or something?) and what isn't.

If you've got a better idea for screen-reader users, please do comment below - I'd love to know.

Found this useful? Got a suggestion to make it even better? Comment below!

## Switching TOTP providers from Authy to andOTP

Since I first started using 2-factor authentication with TOTP (Time based One Time Passwords), I've been using Authy to store my TOTP secrets. This has worked well for a number of years, but recently I decided that I wanted to change. This was for a number of reasons:

1. I've acquired a large number of TOTP secrets for various websites and services, and I'd like a better way of sorting the list
2. Most of the web services I have TOTP secrets for don't have an icon in Authy - and there are only so many times you can repeat the 6 generic colours before it becomes totally confusing
3. I'd like the backups of my TOTP secrets to be completely self-hosted (i.e. completely on my own infrastructure)

After asking on Reddit, I received a recommendation to use andOTP (F-Droid, Google Play). After installing it, I realised that I needed to export my TOTP secrets from Authy first.

Unfortunately, it turns out that this isn't an easy process. Many guides tell you to alter the code behind the official Authy Chrome app - and since I don't have Chrome installed (I'm a Firefox user :D), that's not particularly helpful.

Thankfully, all is not lost. During my research I found the authy project on GitHub, which is a command-line app - written in Go - temporarily registers as a 'TOTP provider' with Authy and then exports all of your TOTP secrets to a standard text file of URIs.

These can then be imported into whatever TOTP-supporting authenticator app you like. Personally, I did this by generating QR codes for each URI and scanning them into my phone. The URIs generated, when converted to a QR code, are actually in the same format that they were originally when you scan them in the first place on the original website. This makes for an easy time importing them - at least from a walled garden.

Generating all those QR codes manually isn't much fun though, so I automated the process. This was pretty simple:

#!/usr/bin/env bash
exec 3<&0; # Copy stdin
echo "${url}" | qr --error-correction=H; read -p "Press a enter to continue" <&3; # Pipe in stdin, since we override it with the read loop done <secrets.txt; The exec 3<&0 bit copies the standard input to file descriptor 3 for later. Then we enter a while loop, and read in the file that contains the secrets and iterate over it. For each line, we convert it to a QR code that displays in the terminal with VT-100 ANSI escape codes with the Python program qr. Finally, after generating each QR code we pause for a moment until we press the enter key, so that we can generate the QR codes 1 at a time. We pipe in file descriptor 3 here that we copied earlier, because inside the while loop the standard input is the file we're reading line-by-line and not the keyboard input. With my secrets migrated, I set to work changing the labels, images, and tags for each of them. I'm impressed by the number of different icons it supports - and since it's open-source if there's one I really want that it doesn't have, I'm sure I can open a PR to add it. It also encrypts the TOTP secrets database at rest on disk, which is pretty great. Lastly came the backups. It looks like andOTP is pretty flexible when it comes to backups - supporting plain text files as well as various forms of encrypted file. I opted for the latter, with GPG encryption instead of a password or PIN. I'm sure it'll come back to bite me later when I struggle to decrypt the database in an emergency because I find the gpg CLI terribly difficult to use - perhaps I should take multiple backups encrypted with long and difficult password too. To encrypt the backups with GPG, you need to have a GPG provider installed on your phone. It recommended that I install OpenKeychain for managing my GPG private keys on Android, which I did. So far, it seems to be functioning as expected too - additionally providing me with a mechanism by which I can encrypt and decrypt files easily and perform other GPG-related tasks...... if only it was this easy in the Linux terminal! Once setup, I saved my encrypted backups directly to my Nextcloud instance, since it turns out that in Android 10 (or maybe before? I'm not sure) it appears that if you have the Nextcloud app installed it appears as a file system provider when saving things. I'm certainly not complaining! While I'm still experimenting with my new setup, I'm pretty happy with it at the moment. I'm still considering how I can make my TOTP backups even more secure while not compromising the '2nd factor' nature of the thing, so it's possible I might post again in the future about that. Next on my security / privacy todo list is to configure my Keepass database to use my Solo for authentication, and possibly figure out how I can get my phone to pretend to be a keyboard to input passwords into machines I don't have my password database configured on :D Found this interesting? Got a suggestion? Comment below! ## Using prefers-color-scheme to display alternate website themes Support for the prefers-color-scheme landed in Firefox 67 beta recently. While it's a little bit awkward to use at the moment, it enables support for a crucial new feature on the web: Changing the colour scheme of a webpage based on the user's preference. If a user is outside on a sunny day, they might prefer a light theme whilst browsing the web. However, if they are at home on an evening when it's dark, they might prefer a dark theme. Traditionally, some websites have supported enabling a site-specific option in their settings, but this requires that the user enable an option manually when they visit the website - and has to be reset on every device and in every private browsing session. The solution to this is the new prefers-color-scheme CSS level 5 media query. It lets you do something like this: body { background: red; /* Fallback */ } /* Dark theme */ @media (prefers-color-scheme: dark) { body { background: #333; color: #efefef; } } /* Light theme */ @media (prefers-color-scheme: light), (prefers-color-scheme: no-preference) { body { background: #efefef; color: #333; } } Hey presto - the theme will now auto-change based on the user's preference! Here's a live demo of that in action: See the Pen prefers-color-scheme demo by Starbeamrainbowlabs (@sbrl) on CodePen. (Can't see the above? Try a direct link.) ### Current Support Currently, Firefox 67+ and Safari (wow, who knew?) support the new media query - as of the time of typing. For Firefox users, the preferred colour scheme is autodetected based on your operating system's colour scheme. Here's the caniuse.com support table, which should always be up-to-date: If this doesn't work correctly for some reason (like it didn't for me on Ubuntu with the Unity Desktop - apparently it's based on the GTK theme on Ubuntu), then you can set it manually in about:config. Create a new integer value called ui.systemUsesDarkTheme, and set it to 1. If you accidentally create a string by mistake, you'll need to close Firefox, find your profile folder, and edit prefs.js to force-change it, as about:config isn't capable of deleting or altering configuration directive types at this time (yep, I ran into this one!). ### Making use of CSS Properties Altering your website for dark-mode readers isn't actually all that hard. If you're using CSS Properties already, you might do something like this: :root { --background: #efefef; --text-colour: #333; } @media (prefers-color-scheme: dark) { :root { --background: #333; --text-colour: #efefef; } } body { background: var(--background); color: var(--text-colour); } (Want a demo? Click here!) This way, you can keep all the 'settings' that you might want to change later at the beginning of your CSS files, which makes it easy to add support for dark mode later. I know that I'm going to be investigating adding support to my website at some point soon (stay tuned for that!). Found this interesting? Got a cool resource to aid in colour scheme design? Found another great CSS feature? Comment below! ### Sources and Further Reading ## Animated PNG for all! I recently discovered that Animated PNGs are now supported by most major browsers: I stumbled across the concept of an animated PNG a number of years ago (on caniuse.com actually if I remember right!), but at the time browser support was very bad (read: non-existent :P) - so I moved on to other things. I ended up re-discovering it a few weeks ago (also through caniuse.com!), and since browser support is so much better now I decided that I just had to play around with it. It hasn't disappointed. Traditional animated GIFs (Graphics Interchange Format for the curious) are limited to 256 colours, have limited transparency support (a pixel is either transparent, or it isn't - there's no translucency support), and don't compress terribly well. Animated PNGs (Portable Network Graphics), on the other hand, let you enjoy all the features of a regular PNG (as many colours as you want, translucent pixels, etc.) while also supporting animation, and compressing better as well! Best of all, if a browser doesn't support the animated PNG standard, they will just see and render the first frame as a regular PNG and silently ignore the rest. Let's take it out for a spin. For my test, I took an image and created a 'panning' animation from one side of it to the other. Here's the image I've used: (Credit: The background on the download page for Mozilla's Firefox Nightly builds. It isn't available on the original source website anymore (and I've lost the link), but can still be found on various wallpaper websites.) The first task is to generate the frames from the original image. I wrote a quick shell script for this purpose. Firstly, I defined a bunch of variables: #!/usr/bin/env bash set -e; # Crash if we hit an error input_file="Input.png"; output_file="Output.apng" # The maximum number of frames max_frame=32; end_x=960; end_y=450; start_x=0; start_y=450; crop_size_x=960; crop_size_y=540; mkdir -p ./frames; Variable Meaning input_file The input file to generate frames from output_file The file to write the animated png to max_frame The number of frames (plus 1) to generate start_x The starting position to pan from on the x axis start_y The starting position to pan from on the y axis end_x The ending position to pan to on the x axis end_y The ending position to pan to on the y axis crop_size_x The width of the cropped frames crop_size_y The height of the cropped frames It's worth noting here that it's probably a good idea to implement a proper CLI to this script at this point, but since it's currently only a one-off script I didn't bother. Perhaps in the future I'll come back and improve it if I find myself needing it again for some other purpose. With the parameters set up (and a temporary directory created - note that you should use mktemp -d instead of the approach I've taken here!), we can then use a for loop to repeatedly call ImageMagick to crop the input image to generate our frames. This won't run in parallel unfortunately, but since it's only a few frames it shouldn't take too long to render. This is only a quick shell script after all :P  for ((frame=0; frame <= "${max_frame}"; frame++)); do
this_x="$(calc -p "${start_x}+((${end_x}-${start_x})*(${frame}/${max_frame}))")";
this_y="$(calc -p "${start_y}+((${end_y}-${start_y})*(${frame}/${max_frame}))")";
percent="$(calc -p "round((${frame}/${max_frame})*100, 2)")"; convert "${input_file}" -crop "${crop_size_x}x${crop_size_y}+${this_x}+${this_y}" "frames/Frame-$(printf "%02d" "${frame}").jpeg";

echo -ne "${frame} /${max_frame} (${percent}%) \r"; done echo ""; # Move to the line after the progress indicator This looks complicated, but it really isn't. The for loop iterates over each of the frame numbers. We do some maths to calculate the (x, y) co-ordinates of the frame we want to extract, and then get ImageMagick's convert command to do the cropping for us. Finally we write a quick progress indicator out to stdout (\r resets the cursor to the beginning of the line, but doesn't go down to the next one). The maths there is probably better represented like this:$this_x=start_x+((end_x-start_x)*\frac{currentframe}{max_frame})this_y=start_y+((end_y-start_y)*\frac{currentframe}{max_frame})$Much better :-) In short, we convert the current frame number into a percentage (between 0 and 1) of how far through the animation we are and use that as a multiplier to find the distance between the starting and ending points. I use the calc command-line tool (in the apcalc package on Ubuntu) here to do the calculations, because the bash built-in result=$(((4 + 5))) only supports integer-based maths, and I wanted floating-point support here.

With the frames generated, we only need to stitch them together to make the final animated png. Unfortunately, an easy-to-use tool does not yet exist (like gifsicle for GIFs) - so we'll have to make-do with ffpmeg (which, while very powerful, has a rather confusing CLI!). Here's how I stitched the frames together:

ffmpeg -r 10 -i frames/Frame-%02d.jpeg -plays 0 "${output_file}"; • -r - The frame rate • -i - The input filename • -plays - The number of times to loop (0 = infinite; defaults to no looping if omitted) • "${output_file}" - The output file

Here's the final result:

(Filesize: ~2.98MiB)

Of course, it'd be cool to compare it to a traditional animated gif. Let's do that too! First, we'll need to convert the frames to gif (gifsicle, our tool of choice, doesn't support anything other than GIFs as far as I'm aware):

mogrify -format gif frames/*.jpeg

Easy-peasy! mogrify is another tool from ImageMagick that makes such in-place conversions trivial. Note that the frames themselves are stored as JPEGs because I was experiencing an issue whereby each of the frames apparently had a slightly different colour palette, and ffmpeg wasn't smart enough to correct for this - choosing instead to crash.

With the frames converted, we can make our animated GIF like so:

gifsicle --optimize --colors=256 --loopcount=10 --delay=10 frames/*.gif >Output.gif;

Lastly, we probably want to delete the intermediate frames:

rm -r ./frames

Here's the animated GIF version:

(Filesize: ~3.28MiB)

Woah! That's much bigger.

(Generated (and then extracted & edited with the Firefox developer tools) from Meta-Chart)

It's ~9.7% bigger in fact! Though not a crazy amount, smaller resulting files are always good. I suspect that this effect will stack the more frames you have. Others have tested this too, finding pretty similar results to those that I've found here - though it does of course depend on your specific scenario.

With that observation, I'll end this blog post. The next time you think about inserting an animation into a web page or chat window, consider making it an Animated PNG.

Found this interesting? Found a cool CLI tool for manipulating APNGs? Having trouble? Comment below!

## Keeping the Internet free and open for years to come - #ForTheWeb

I've recently come across contractfortheweb.org. It sounds obvious, but certain qualities of the web that we take for granted aren't, in fact, universal. Things like a lack of censorship (China & their great firewall; Cuba; Venezuela), consumer privacy (US Mobile Carriers; Google; Google/Android Antitrust), and fair pricing (AT&T; BT) are rather a problem in more places than is noticeable at first glance.

The Contract for the Web is a set of principles - backed by the famous Tim Berners-Lee - for a free, open, and fair Internet. The aim is to build a full contract based on these principles to guide the evolution of the web for year to come.

Personally, I first really started to use the web to learn how to program. By reading (lots) of tutorials freely available on the web, I learnt to build websites, write Javascript, and more!

Since then, I've used the web to share the things I've been learning on this blog, stay up-to-date with the latest technology news, research new ideas, and play games that deliver amazing experiences.

The web (and, by extension, the Internet - the web refers to just HTTP + HTTPS) for me represents freedom of information. Freedom to express (and learn about) new thoughts and ideas without fear of being censored. Freedom to communicate with anyone in the world - regardless of physical distances.

It is for this reason that I've signed the Contract for the Web. It's my hope that this effort will ensure that the Internet becomes more open and neutral going forwards, so that everyone can experience the benefits of the open web for a long time to come.

## Markov Chains Part 4: Test Data

With a shiny-new markov chain engine (see parts 1, 2, and 3), I found that I had a distinct lack of test data to put through it. Obviously this was no good at all, so I decided to do something about it.

Initially, I started with a list of HTML colours (direct link; 8.6KiB), but that didn't produce very good output:

MarkovGrams/bin/Debug/MarkovGrams.exe markov-w --wordlist wordlists/Colours.txt --length 16
errobiartrawbear
frelecteringupsy
vendellorazanigh
arvanginklectrit
dighoonbottlaven
ndiu
llighoolequorain
indeesteadesomiu

I see a few problems here. Firstly, it's treating each word as it's entity, where in fact I'd like it to generate n-grams on a line-by-line basis. Thankfully, this is easy enough with my new --no-split option:

MarkovGrams/bin/Debug/MarkovGrams.exe markov-w --wordlist wordlists/Colours.txt --no-split --length 16
med carrylight b
jungin pe red dr
ureelloufts blue
uamoky bluellemo
trinaterry aupph
utatellon reep g
bian reep mardar
ght burnse greep
atimson-phloungu

Hrm, that's still rather unreadable. What if we make the n-grams longer by bumping the order?

MarkovGrams/bin/Debug/MarkovGrams.exe markov-w --wordlist wordlists/Colours.txt --length 16 --order 4
on fuchsia blue
rsity of carmili
e blossom per sp
ngel
ulean red au lav
as green yellowe
indigri
ly gray aspe
disco blus
berry pine blach

Better, but it looks like it's starting the generation process from inside the middle of words. We can fix that with my new --start-uppercase option, which ensures that each output always stars with an n-gram that begins with a capital letter. Unfortunately the wordlist is all lowercase:

air force blue
alice blue
alizarin crimson
almond
amaranth
amber
american rose
amethyst
android green
anti-flash white

This is an issue. The other problem is that with an order of 4 the choice-point ratio is dropping quite low - I got a low of just ~0.97 in my testing.

The choice-point ratio is a measure I came up with of the average number of different directions the engine could potential go in at each step of the generation process. I'd like to keep this number consistently higher than 2, at least - to ensure a good variety of output.

### Greener Pastures

Rather than try fix that wordlist, let's go in search of something better. It looks like the CrossCode Wiki has a page that lists all the items in the entire game. That should do the trick! The only problem is extracting it though. Let's use a bit of bash! We can use curl to download the HTML of the page, and then xidel to parse out the text from the <a> tags inside tables. Here's what I came up with:

curl https://crosscode.gamepedia.com/Items | xidel --data - --css "table a"

This is a great start, but we've got blank lines in there, and the list isn't sorted alphabetically (not required, but makes it look nice :P). Let's fix that:

curl https://crosscode.gamepedia.com/Items | xidel --data - --css "table a" | awk "NF > 0" | sort

Very cool. Tacking wc -l on the end of the pipe chain I can we've got ourselves a list of 527(!) items! Here's a selection of input lines:

Rough Branch
Raw Meat
Blue Grass
Crystal Plate
Humming Razor
Everlasting Amber
Tracker Chip
Lawkeeper's Fist

Let's run it through the engine. After a bit of tweaking, I came up with this:

cat wordlists/Cross-Code-Items.txt | MarkovGrams/bin/Debug/MarkovGrams.exe markov-w --start-uppercase --no-split --length 16 --order 3
Capt Keboossauci
Fajiz Keblathfin
King Steaf Sharp
Stintze Geakralt
Fruisty 'olipe F
Apper's TN
Prow Peptumn's C
Rus Recreetan Co
Veggiel Spiragma
Laver's Bolden M

That's quite interesting! With a choice-point ratio of ~5.6 at an order of 3, we've got a nice variable output. If we increase the order to 4, we get ~1.5 - ~2.3:

Edgy Hoo
Junk Petal Goggl
Red Metal Need C
Samurai Shel
Echor
Krystal Wated Li
Sweet Residu
Raw Stomper Thor
Purple Fruit Dev
Smokawa

It appears to be cutting off at the end though. Not sure what we can do about that (ideas welcome!). This looks interesting, but I'm not done yet. I'd like it to work on word-level too!

### Going up a level

After making some pretty extensive changes, I managed to add support for this. Firstly, I needed to add support for word-level n-gram generation. Currently, I've done this with a new GenerationMode enum.

public enum GenerationMode
{
CharacterLevel,
WordLevel
}

Under the hood I've just used a few if statements. Fortunately, in the case of the weighted generator, only the bottom method needed adjusting:

/// <summary>
/// Generates a dictionary of weighted n-grams from the specified string.
/// </summary>
/// <param name="str">The string to n-gram-ise.</param>
/// <param name="order">The order of n-grams to generate.</param>
/// <returns>The weighted dictionary of ngrams.</returns>
private static void GenerateWeighted(string str, int order, GenerationMode mode, ref Dictionary<string, int> results)
{
if (mode == GenerationMode.CharacterLevel) {
for (int i = 0; i < str.Length - order; i++) {
string ngram = str.Substring(i, order);
if (!results.ContainsKey(ngram))
results[ngram] = 0;
results[ngram]++;
}
}
else {
string[] parts = str.Split(" ".ToCharArray());
for (int i = 0; i < parts.Length - order; i++) {
string ngram = string.Join(" ", parts.Skip(i).Take(order)).Trim();
if (ngram.Trim().Length == 0) continue;
if (!results.ContainsKey(ngram))
results[ngram] = 0;
results[ngram]++;
}
}
}

Full code available here. After that, the core generation algorithm was next. The biggest change - apart from adding a setting for the GenerationMode enum - was the main while loop. This was a case of updating the condition to count the number of words instead of the number of characters in word mode:

(Mode == GenerationMode.CharacterLevel ? result.Length : result.CountCharInstances(" ".ToCharArray()) + 1) < length

A simple ternary if statement did the trick. I ended up tweaking it a bit to optimise it - the above is the end result (full code available here). Instead of counting the words, I count the number fo spaces instead and add 1. That CountCharInstances() method there is an extension method I wrote to simplify things. Here it is:

public static int CountCharInstances(this string str, char[] targets)
{
int result = 0;
for (int i = 0; i < str.Length; i++) {
for (int t = 0; t < targets.Length; t++)
if (str[i] == targets[t]) result++;
}
return result;
}

### Recursive issues

After making these changes, I needed some (more!) test data. Inspiration struck: I could run it recipe names! They've quite often got more than 1 word, but not too many. Searching for such a list proved to be a challenge though. My first thought was BBC Food, but their terms of service disallow scraping :-(

A couple of different websites later, and I found the Recipes Wikia. Thousands of recipes, just ready and waiting! Time to get to work scraping them. My first stop was, naturally, the sitemap (though how I found in the first place I really can't remember :P).

What I was greeted with, however, was a bit of a shock:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap><loc>http://recipes.wikia.com/sitemap-newsitemapxml-NS_0-p1.xml</loc></sitemap>
<sitemap><loc>http://recipes.wikia.com/sitemap-newsitemapxml-NS_0-p2.xml</loc></sitemap>
<sitemap><loc>http://recipes.wikia.com/sitemap-newsitemapxml-NS_0-p3.xml</loc></sitemap>
<sitemap><loc>http://recipes.wikia.com/sitemap-newsitemapxml-NS_0-p4.xml</loc></sitemap>
<sitemap><loc>http://recipes.wikia.com/sitemap-newsitemapxml-NS_0-p5.xml</loc></sitemap>
<sitemap><loc>http://recipes.wikia.com/sitemap-newsitemapxml-NS_0-p6.xml</loc></sitemap>
<sitemap><loc>http://recipes.wikia.com/sitemap-newsitemapxml-NS_0-p7.xml</loc></sitemap>
<sitemap><loc>http://recipes.wikia.com/sitemap-newsitemapxml-NS_0-p8.xml</loc></sitemap>
<sitemap><loc>http://recipes.wikia.com/sitemap-newsitemapxml-NS_0-p9.xml</loc></sitemap>
<sitemap><loc>http://recipes.wikia.com/sitemap-newsitemapxml-NS_14-p1.xml</loc></sitemap>
<sitemap><loc>https://services.wikia.com/discussions-sitemap/sitemap/3355</loc></sitemap>
</sitemapindex>
<!-- Generation time: 26ms -->
<!-- Generation date: 2018-10-25T10:14:26Z -->

Like who has a sitemap of sitemaps, anyways?! We better do something about this: Time for some more bash! Let's start by pulling out those sitemaps.

curl http://recipes.wikia.com/sitemap-newsitemapxml-index.xml | xidel --data - --css "loc"

Easy peasy! Next up, we don't want that bottom one - as it appears to have a bunch of discussion pages and other junk in it. Let's strip it out before we even download it!

curl http://recipes.wikia.com/sitemap-newsitemapxml-index.xml | xidel --data - --css "loc" | grep -i NS_0

With a list of sitemaps extract from the sitemap (completely coconuts I tell you) extracted, we need to download them all in turn and extract the page urls therein. This is, unfortunately, where it starts to get nasty. While a simple xargs call downloads them all easily enough (| xargs -n1 -I{} curl "{}" should do the trick), this outputs them all to stdout, and makes it very difficult for us to parse them.

I'd like to avoid shuffling things around on the file system if possible, as this introduces further complexity. We're not out of options yet though, as we can pull a subshell out of our proverbial hat:

curl http://recipes.wikia.com/sitemap-newsitemapxml-index.xml | xidel --data - --css "loc" | grep -i NS_0 | xargs -n1 -I{} sh -c 'curl {} | xidel --data - --css "loc"'

Yay! Now we're getting a list of urls to all the pages on the entire wiki:

http://recipes.wikia.com/wiki/Mexican_Black_Bean_Soup
http://recipes.wikia.com/wiki/Eggplant_and_Roasted_Garlic_Babakanoosh
http://recipes.wikia.com/wiki/Bathingan_bel_Khal_Wel_Thome
http://recipes.wikia.com/wiki/Lebanese_Tabbouleh
http://recipes.wikia.com/wiki/Lebanese_Hummus_Bi-tahini
http://recipes.wikia.com/wiki/Baba_Ghannooj
http://recipes.wikia.com/wiki/Lebanese_Falafel
http://recipes.wikia.com/wiki/Kebab_Koutbane
http://recipes.wikia.com/wiki/Moroccan_Yogurt_Dip

One problem though: We want recipes names, not urls! Let's do something about that. Our next special guest that inhabits our bottomless hat is the illustrious sed. Armed with the mystical power of find-and-replace, we can make short work of these urls:

... | sed -e 's/^.*\///g' -e 's/_/ /g'

The rest of the command is omitted for clarity. Here I've used 2 sed scripts: One to strip everything up to the last forward slash /, and another to replace the underscores _ with spaces. We're almost done, but there are a few annoying hoops left to jump through. Firstly, there are A bunch of unfortunate escape sequences lying around (I actually only discovered this when the engine started spitting out random ones :P). Also, there are far too many page names that contain the word Nutrient, oddly enough.

The latter is easy to deal with. A quick grep sorts it out:

... | grep -iv "Nutrient"

The former is awkward and annoying. As far as I can tell, there's no command I can call that will decode escape sequences. To this end, I wound up embedding some Python:

... | python -c "import urllib, sys; print urllib.unquote(sys.argv[1] if len(sys.argv) > 1 else sys.stdin.read()[0:-1])"

This makes the whole thing much more intimidating that it would otherwise be. Lastly, I'd really like to sort the list and save it to a file. Compared to the above, this is chicken feed!

| sort >Dishes.txt

And there we have it. Bash is very much like lego bricks when you break it down. The trick is to build it up step-by-step until you've got something that does what you want it to :)

Here's the complete command:

curl http://recipes.wikia.com/sitemap-newsitemapxml-index.xml | xidel --data - --css "loc" | grep -i NS_0 | xargs -n1 -I{} sh -c 'curl {} | xidel --data - --css "loc"' | sed -e 's/^.*\///g' -e 's/_/ /g' | python -c "import urllib, sys; print urllib.unquote(sys.argv[1] if len(sys.argv) > 1 else sys.stdin.read()[0:-1])" | grep -iv "Nutrient" | sort >Dishes.txt

After all that effort, I think we deserve something for our troubles! With ~42K(!) lines in the resulting file (42,039 to be exact as of the last time I ran the monster above :P), the output (after some tweaking, of course) is pretty sweet:

cat wordlists/Dishes.txt | mono --debug MarkovGrams/bin/Debug/MarkovGrams.exe markov-w --words --start-uppercase --length 8
Lemon Lime Ginger
Seared Tomatoes and Summer Squash
Cabbage and Potato au
Endive stuffed with Lamb and Winter Vegetable
Stuffed Chicken Breasts in Yogurt Turmeric Sauce with
Blossoms on Tomato
Morning Shortcake with Whipped Cream Graham
Orange and Pineapple Honey
Tempura with a Southwestern
Rice Florentine with
Cabbage Slaw with Pecans and Parmesan
Pork Sandwiches with Tangy Sweet Barbecue
Tea with Lemongrass and Chile Rice
Butterscotch Mousse Cake with Fudge
Fish and Shrimp -
Beans in the Slow
California Avocado Chinese Chicken Noodle Soup with Arugula

...I really need to do something about that cutting off issue. Other than that, I'm pretty happy with the result! The choice-point ratio is really variable, but most of the time it's hovering around ~2.5-7.5, which is great! The output if I lower the order from 3 to 2 isn't too bad either:

Salata me Htapodi kai
Hot 'n' Cheese Sandwich with Green Fish and
Poisson au Feuilles de Milagros
Valentines Day Cookies with Tofu with Walnut Rice
Up Party Perfect Turkey Tetrazzini
Olives and Strawberry Pie with Iceberg Salad with
Mashed Sweet sauce for Your Mood with Dried
Zespri Gold Corn rice tofu, and Corn Roasted
California Avocado and Rice Casserole with Dilled Shrimp
Egyptian Tomato and Red Bell Peppers, Mango Fandango

This gives us a staggering average choice-point ratio of ~125! Success :D

### One more level

After this, I wanted to push the limits of the engine, so see what it's capable of. The obvious choice here is Shakespeare's Complete Works (~5.85MiB). Pushing this through the engine required some work, as ~30 seconds is far too slow - namely optimising the pipeline as much as possible.

The Mono Profiler helped a lot here. With it, I discovered that string.StartsWith() is really slow. Like, ridiculously slow (though this is relative, since I'm calling it hundreds of thousand of times), as it's culture-aware. In our case, we can't be bothering with any of that, as it's not relevant anyway. The easiest solution is to write another extension method:

public static bool StartsWithFast(this string str, string target) {
if (str.Length < target.Length) return false;
return str.Substring(0, target.Length) == target;
}

string.Substring() is faster, so by utilising this instead of the regular string.StartsWith() yields us a perfectly gigantic boost! Next up, I noticed that I can probably parallelize the Linq query that builds the list of possible n-grams we can choose from next, so that it runs on all the CPU cores:

Parallel.ForEach(ngrams, (KeyValuePair<string, double> ngramData) => {
if (!ngramData.Key.StartsWithFast(nextStartsWith)) return;
throw new Exception("Error: Failed to add to staging ngram concurrent dictionary");
});

Again, this netted a another huge gain. With this and a few other architectural changes, I was able to chop the time down to a mere ~4 seconds (for a standard 10 words)! In the future, I might experiment with selective unmanaged code via the unsafe keyword to see if I can do any better.

For now, it's fast enough to enjoy some random Shakespeare on-demand:

What should they since that to that tells me Nero
He doth it turn and and too, gentle
Ha! you shall not come hither; since that to provoke
ANTONY. No further, sir; a so, farewell, Sir
Bona, joins with
From fairies and the are like an ass which is
The very life-blood of our blood, no, not with the
Alas, why is here-in which
Transform'd and weak'ned? Hath Bolingbroke

Very interesting. The choice-point ratios sit at ~20 and ~3 for orders 3 and 4 respectively, though I got as high as 188 for an order of 3 during my testing. Definitely plenty of test data here :P

### Conclusion

My experiments took me to several other places - which, if I included them all here, would result in an post much much longer than this! I scripted the download of several other wordlists in download.sh (direct link, 4.2KiB), if you're interested, with ready-downloaded copies in the wordlists folder of the repository.

I would advise reading the table in the README that gives credit to where I sourced each list, because of course I didn't put any of them together myself - I just wrote the script :P

Particularly of note is the Starbound list, which contains a list of all the blocks and items from the game Starbound. I might post about that one separately, as it ended up being a most interesting adventure.

In the future, I'd like to look at implementing a linguistic drift algorithm, to try and improve the output of the engine. The guy over at Here Dragons Abound has a great post on the subject, which you should definitely read if you're interested.

Found this interesting? Got an idea for another wordlist I can push though my engine? Confused by something? Comment below!