Starbeamrainbowlabs

Stardust
Blog


Markdown editors compared

Parts of the 3 markdown editors I'll be comparing in this post.

If you didn't know already, I write all my blog posts here in markdown. I've used several markdown editors over the years (wow it's strange to write that), and I thought I'd talk a little bit about the ones I've used, what I liked about them (and the things I didn't), and what my current preference is.

Firstly though, why would you want one? Couldn't you just use a regular text editor like Notepad++? Well, yes - but a dedicated editor has several benefits: Proper spell-checking for one, a live-preview for another, and other nice features that make the experience just that little bit better (I'm even writing one of my reports for University in Markdown, and I have to say that the experience is much more pleasurable than using Microsoft Word :P).

I like Markdown itself rather a lot too. First invented by John Gruber over on daringfireball.net, Markdown is a simple markup language that's inspired by the things that people already do in instant messaging and other text-based mediums. It's designed to be both easy to read and understand on its own, and easy to write in - such that it doesn't break your flow as a writer by requiring you to look up how to apply that particular bit of formatting (I find myself having to do that with LaTeX and others a lot).


(Above: A Screenshot of StackEdit.)

The first contender up is StackEdit. It's an in-browser offering, which saves its data to your local machine (or the cloud). It comes with a number of nice features - apart from not having to install it, of course - such as synchronised scrolling in the live-preview, and a 'publish' button to send your document to a number of different destinations automatically.

Since I used it last (which was quite a while ago, actually), it appears to have received a sizeable update, updating the user-interface to be more polished and aesthetically pleasing, and adding a toggleable folder structure to the left-hand-side, amongst other things.

If you can't install anything or run portable programs from a flash drive, StackEdit would be my recommendation.


(Above: A Screenshot of Classeur.)

Next up on my list is Classeur. It's another browser-based offering, with many of the same features, just with a different UI. When I discovered it I was using StackEdit, and at the time the interface of Classeur was vastly superior.

The main thing I don't like about it is that it's 'freemium', meaning that you get to keep about 100 documents on it, and then you either have to delete something or pay. While Markdown documents are just text files I can keep on my computer, if I'm going to use a browser-based solution I would prefer to keep them all in the same place (though I never did hit this limit :P).

A screenshot of me writing this post in ghostwriter. Meta!

More recently, now that I've got a travel-laptop that runs Linux (and not Chrome OS - as nice as that was, I ended up out-growing it), I've been using ghostwriter. It's a desktop application for both Windows and Linux. While it doesn't have synchronised scrolling for the live-preview as StackEdit does, it allows you to save your text files to your local disk (or other mounted partition!), and open them as you would a spreadsheet or any other file - in a way that you can't with a browser-based tool.

The interface is also highly customisable - if you don't like the built-in themes, you can write your own. You can also write your own stylesheet for exported documents. In addition, it automatically detects several different markdown renderers that you may or may not have installed, allowing you to switch between them (and the inbuilt sundown processor) at will to get the exported document (e.g. HTML, PDF, EPUB, etc.) looking just the way you want it to.

For me, you can't beat the feeling of a native desktop application, so currently ghostwriter is my markdown editor of choice. If I can't use ghostwriter, I'll probably use StackEdit, with Classeur coming at the bottom of the pile.

If you're thinking of doing some writing, I'd highly suggest considering a proper markdown editor such as the ones I've mentioned here. If you're not familiar with markdown, fear not! It's easy to learn, and all 3 of the editors featured here include a quick-reference guide sidebar (or floating window) that you can enable to help you along.

Found this useful? Got a different editor of choice? Comment below!

An epic journey awaits: The hows and whys of DNS (and why DNS privacy is important)

(Above: The logo of Cloudflare's new 1.1.1.1 service. Read on to find out more!)

Hello! I hope everyone had a nice restful Easter. Cloudflare made an exciting announcement recently (more on that later), which inspired me to sit down and write about a vital, but invisible, part of the internet we know today.

It's called DNS (Domain Name System), and I'd like to take you on a journey - showing you what DNS is, how it works, how it can be exploited, and what we can do about it. After all, privacy is important! How does that relate to DNS, you ask? Well, I'll show you - but we're getting a little ahead of ourselves. Let's introduce DNS first. I'll explain what it is, how it works, and why we need it.

Enter Stage Left

DNS is, in many ways, the backbone of the modern internet. While it isn't directly responsible for delivering billions of packets across the internet every day like the Internet Protocol is, its role is still vitally important. DNS is responsible for translating domain names, such as starbeamrainbowlabs.com, bobsrockets.com, or billsboosters.net into an IP address that your device can connect to in order to do whatever else it needs to do.

It does this by sending a UDP datagram (rather than opening a TCP connection) to a DNS server, asking it for a specific type of response - usually the IP address associated with a specific domain name. The following query types are most common:

  • A - Returns the IPv4 address(es) associated with the specified domain name
  • AAAA - Same as A, but returns IPv6 addresses instead
  • CNAME - Acts as an alias to another domain name. Usually immediately followed by either an A or AAAA record in the DNS server's response to save time (a DNS server can return multiple items in a single response)
  • MX - A bit like a CNAME, but returns a prioritised list of domains that handle email for the specified domain.
  • TXT - Contains an arbitrary text string. Usually used for easter eggs or for domain ownership verification by various analytics services (e.g. Google Analytics, Bing Webmaster Tools, etc.)
  • NS - Specifies which DNS servers can be queried about the domain.
  • SOA - Specifies what the primary DNS server is that holds the authoritative copy of the DNS records for the specified domain.

Let's try it out

With that in mind, let's try some queries.

(Can't see the above asciicast? Try viewing it over on asciinema.org, or entering the below commands into a computer with DiG)

dig starbeamrainbowlabs.com
dig bbc.co.uk AAAA
dig cloudflare.com MX
dig contact.starbeamrainbowlabs.com TXT
dig github.com TXT

DiG is a command-line DNS client for Linux-like operating systems (if you don't have it already, try sudo apt install dnsutils, or equivalent for your distribution. If you're on Windows without access to a Linux-like machine, try following along with nslookup.). In the above asciicast I make a variety of queries for demonstrative purposes. Note the QUESTION SECTION and ANSWER SECTION bits - they tell us what the query was for, and what the response to that query was. For example, here's an extract from the question and answer sections respectively from the bbc.co.uk lookup in the asciicast:

;bbc.co.uk.         IN  AAAA
bbc.co.uk.      300 IN  AAAA    2a04:4e42:600::81

The bit in the question section is quite straightforward - it's asking for an AAAA record for bbc.co.uk.. The answer section is a bit more complicated. From left to right:

  • bbc.co.uk. - The domain name the response is for.
  • 300 - The time-to-live. In other words, the number of seconds that the response can be cached for.
  • IN - The record class. Stands for INternet; the other classes (such as CH for Chaosnet) are largely legacy.
  • AAAA - The type of response record.
  • 2a04:4e42:600::81 - The IPv6 address that the domain name corresponds to.
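Since every answer line follows that same left-to-right layout, it can be pulled apart mechanically. Here's a minimal sketch in POSIX shell, assuming the record's data is whitespace-separated (quoted TXT strings containing runs of spaces would get collapsed). `parse_answer` is a hypothetical helper of my own, not part of DiG - though `dig +noall +answer bbc.co.uk AAAA` will print just the answer lines for you to feed it.

```shell
#!/bin/sh
# parse_answer RECORD -> prints the fields of one dig answer-section line.
# Hypothetical helper; assumes whitespace-separated fields (spaces or tabs).
parse_answer() {
    printf '%s\n' "$1" | awk '{
        # Fields 1-4 are fixed: owner name, TTL, class, record type.
        printf "name=%s\nttl=%s\nclass=%s\ntype=%s\n", $1, $2, $3, $4
        # Everything after the type is record data (e.g. "10 mail.host." for MX).
        data = $5
        for (i = 6; i <= NF; i++) data = data " " $i
        printf "data=%s\n", data
    }'
}

parse_answer 'bbc.co.uk.      300 IN  AAAA    2a04:4e42:600::81'
# name=bbc.co.uk.
# ttl=300
# class=IN
# type=AAAA
# data=2a04:4e42:600::81
```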

Am I being spied on?

DNS works rather well, most of the time. The problems start to occur when you start thinking about privacy. With more websites than ever now being served over HTTPS, the data that we transfer between these websites and our devices is now much more secure - it can't easily be read or modified in transit.

DNS, however, is not currently encrypted - which poses a rather serious problem. Anyone able to get hold of your device's network traffic - such as another device on your network in promiscuous mode, your ISP, or literally anyone in between you and your DNS server - can spy on the DNS lookups your device is doing, and even poison your DNS cache - sending you to an attacker's website when you typed in a legitimate domain name!

DNS Cache Poisoning in Action

(Above: DNS cache poisoning in action. The attacker responds with a spoofed UDP datagram before the legitimate server has a chance to reply!)

Thankfully, after 35 years of DNS, the internet has some solutions to some of these problems. First up: DNSSEC. Often misunderstood, the protocol tries to prevent man-in-the-middle and timing attacks (such as the one shown in the diagram above) by cryptographically verifying the DNS records returned to the client. Though it's actually 20 years old already, it's still overly complicated - and consequently hasn't been rolled out by an awful lot of people. It's also rather weighty - requiring the transfer of cryptographic keys, signatures, and other associated information.

Preventing cache poisoning is one thing, but it would be nice to prevent nosy onlookers from peering at the DNS queries we're making - and here's where it gets complicated. As of early 2018, there are currently no less than 3 competing standards to provide proper client-server connection encryption:

  • DNS-over-HTTPS - Basically a protocol for sending DNS requests via a standard HTTPS web server. As you can imagine, this can be rather weighty.
  • DNS-over-TLS - As the name implies - DNS queries over a raw TLS connection - which is, in short, a HTTPS connection without the HTTP bit. Now supported natively in Android.
  • DNSCurve - An augmentation to the existing DNS protocol that adds encryption by way of elliptic-curve cryptography. The supposed official website appears to be a bit biased and inaccurate, so I'd recommend the Wikipedia article on it instead.

A bit of a mess, isn't it? Furthermore, many applications don't yet support some (or any) of these protocols. In that regard, it's currently a waiting game. Still, it's interesting to compare the different approaches taken here. Most of these protocols carry significantly more weight than plain-old DNS - with DNS-over-HTTPS being the most weighty, and DNSCurve being the lightest, I should imagine.

I find it especially curious that DNS-over-HTTPS is as popular as it is. Surely it's a bit flawed if you've got to look up the domain name of the HTTPS server that you need to contact in order to do a 'secure' lookup? A safe is only as strong as its weakest point, after all....

But wait, there's more!

Encrypted and verified responses are all very well, but they're no good if the owner of the DNS server is logging all the queries you send to them! Google's 8.8.8.8 service permanently logs a percentage of the queries made to it, and OpenDNS don't appear to have many details on their website about what data they collect and what they don't!

Furthermore, some DNS servers (especially those controlled by ISPs) tend to have some domain names censored due to agreements with their country's government - preventing you from 'accessing' a website by stopping your device from figuring out where on the internet to talk to.

Clearly, these are serious issues - and the solutions boil down to trust. Who do you trust to send your DNS queries to? If you don't trust any of the aforementioned providers (Google Public DNS or OpenDNS), then you could always run a DNS resolver yourself.

How does it work if you run it yourself? Well, basically instead of your device sending queries to a remote DNS server, they send it to your personal DNS server instead. Your personal DNS server then performs a recursive resolve. Basically, this means that it traverses the requested domain name from right-to-left, analysing and resolving each part in turn. For example, gateway.discord.gg. would be resolved like so:

  • gg.
  • discord.
  • gateway.

For each successive part of the domain name, the resolver asks the servers it already knows about which other DNS servers hold the authoritative records for that part of the name (using SOA / NS records), and then repeats the cycle with the servers provided in the response.

Quite quickly you can see that there's an issue here - how does it know where to start? That's where root servers come in. They hold the authoritative information about the Internet's top-level domains. These servers can be queried to figure out which servers hold information about each top-level domain, such as the country codes (e.g. .uk or .io) or generic ones (e.g. .com or .edu). The servers that the root servers point to can then be queried to ask who holds information about the various domain names you and I are used to typing in our address bars, such as seanssatellites.io, or billsboosters.edu for instance.
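The right-to-left walk can be illustrated by listing the zones a resolver works through for a given name, root first. This is a sketch of the order of the steps only: `zones_to_query` is a hypothetical helper that sends no queries itself. If you'd like to watch a real resolution happen, `dig +trace gateway.discord.gg` performs the walk from the root servers and prints each step.

```shell
#!/bin/sh
# zones_to_query NAME -> prints each zone a recursive resolver works through,
# starting at the root. Hypothetical helper: it only prints names.
# The ( ) body runs in a subshell so its variables don't leak out.
zones_to_query() (
    name=${1%.}                  # drop the trailing root dot, if present
    zone=""
    printf '.\n'                 # every recursive resolve starts at the root
    while [ -n "$name" ]; do
        label=${name##*.}        # the rightmost remaining label, e.g. "gg"
        zone="$label.$zone"      # prepend it to the zone built so far
        printf '%s\n' "$zone"
        case $name in
            *.*) name=${name%.*} ;;  # strip the label we just used
            *)   name="" ;;          # that was the last (leftmost) label
        esac
    done
)

zones_to_query gateway.discord.gg.
# .
# gg.
# discord.gg.
# gateway.discord.gg.
```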

A simpler alternative

This brings me to the announcement by Cloudflare I mentioned at the beginning of this post. By now, you can probably guess what it is: they've set up a new public DNS server! Apparently, they did a deal with APNIC to let them study the garbage traffic that ends up at 1.1.1.1 in exchange for running a DNS server on it.

Either way, I think it's a brilliant thing for the Internet at large to have another public DNS network to choose from - especially considering how privacy-conscious they appear to have been in setting it up: they never store client IP addresses, and they delete the anonymised logs after 24 hours. Assuming what they've said is true, I think that's rather great. For my own personal reference, here are the IP addresses of Cloudflare's new service:

  • 1.1.1.1
  • 1.0.0.1
  • 2606:4700:4700::1111
  • 2606:4700:4700::1001

Conclusion

That brings us to the end of our journey through DNS. We've seen what DNS is and how it works. We've also seen how it can be attacked, and what is being done about it. Lastly, we've taken a look at how running your own recursive resolver works, and looked at Cloudflare's new service.

If you'd like to continue on and explore DNS further, I've left some links below.

Found this informative? Still confused about something? Comment below!

Sources and Further Reading

Read / Write Disk Performance Testing in Bash

Recently I needed to quickly (and non-destructively) test the read / write performance of a flash drive of mine. Naturally, I turned my attention to my terminal. This post is me documenting what I did so that I can remember for next time :P

Firstly, to test the speed of a disk, we need some data to test with. Since lots of small files will inevitably cause slowdowns due to the overhead of writing each file's metadata (inodes, directory entries, and so on), it makes the most sense to use one gigantic file rather than tons of small ones. Here's what I did to generate a 1 GiB file filled with zeroes:

dd if=/dev/zero of=/tmp/testfile.bin bs=1M count=1024

Cool. Next, we need to copy it to the target disk and measure the time it took. Then, since we know the size of the file (1073741824 bytes, to be exact), we can calculate the speed at which the copy took place. Here's my first attempt:

time dd if=/tmp/testfile.bin of=testfile.bin bs=1M

If you run this, you might find that it doesn't take very long at all, and you get a speed of something like ~250MiB / sec! While impressive, I seriously doubt that my flash drive has that kind of speed behind it. Flash drives are typically much slower than that - especially when writing - and I'm pretty sure mine can't even read that fast. So what's going on?

Well, it turns out that Linux is caching the disk write operations in a buffer, and then doing them in the background for us. Whilst fine for ordinary operation, this doesn't give us an accurate representation of how fast it's actually writing to the disk. Thankfully, there's something we can do about this: Use the sync command. sync will flush all cached write operations to disk for us, giving us the actual time it took to write the 1 GiB file to disk. Here's the altered command:

sync  # flush any writes already pending, so they aren't counted below
time sh -c 'dd if=/tmp/testfile.bin of=testfile.bin bs=1M; sync'

Very cool! Now, we can just take the time it took and do some simple maths to calculate the write speed of our disk. What about the read speed though? Well, to test that, we'll first need to clear out the page cache - another one of Linux's (many) caches that holds portions of files that have recently been accessed for faster retrieval - because as before, we're not interested in the speed of the cache! Here's how to do that:

echo 1 | sudo tee /proc/sys/vm/drop_caches

With the correct cache cleared, we can test the read speed accurately. Here's how I did it:

time dd if=testfile.bin of=/dev/null bs=1M

Fairly simple, right? At a later date I might figure out a way of automating this, but for the occasional use now and again this works just fine :)
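The "simple maths" from earlier is just the file size divided by the elapsed time. As a first step towards automating it, here's a minimal sketch: `mib_per_sec` and `time_bytes` are hypothetical helper names of my own, and `time_bytes` assumes GNU `date` (for its `%N` nanoseconds format). The cache-dropping step still needs `sudo`, so it's deliberately left out.

```shell
#!/bin/sh
# mib_per_sec BYTES SECONDS -> prints throughput in MiB/s to one decimal place.
mib_per_sec() {
    awk -v bytes="$1" -v secs="$2" 'BEGIN {
        if (secs <= 0) exit 1                      # refuse to divide by zero
        printf "%.1f\n", bytes / 1048576 / secs    # 1 MiB = 1048576 bytes
    }'
}

# time_bytes BYTES COMMAND... -> runs COMMAND, then prints the MiB/s achieved
# for BYTES. Hypothetical helper; relies on GNU date's %N (nanoseconds).
time_bytes() (
    bytes=$1; shift
    start=$(date +%s.%N)
    "$@" || exit 1
    end=$(date +%s.%N)
    secs=$(awk -v s="$start" -v e="$end" 'BEGIN { printf "%.6f", e - s }')
    mib_per_sec "$bytes" "$secs"
)

# Example (the write test from above):
# time_bytes 1073741824 sh -c 'dd if=/tmp/testfile.bin of=testfile.bin bs=1M; sync'
mib_per_sec 1073741824 8   # a 1 GiB copy taking 8 seconds -> 128.0
```

GNU dd also prints its own throughput figure when it finishes, but that figure doesn't include the trailing `sync`, which is why the whole `sh -c '...'` command is timed instead.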

Found this useful? Got a better way of doing it? Want to say hi? Post in the comments below!

Java: First Impressions

The logos of a few of the tools and language I've been using recently.

(Above: The Android, Android Studio, and Java logos. I don't own any of these - nor is this post endorsed by any of the entities represented here - they are just for illustrative purposes.)

I've been using Java pretty extensively recently, as I've been doing a module on Android development at University. It's a pretty interesting language, so I thought I'd share my first impressions here. Later on, in a separate post, I'll also talk a little bit about Kotlin, the JetBrains-developed language that Google is now championing for development on the Android platform.

Firstly, Android Studio has made it really easy to get started. The code hinting / autocompletion is fairly intelligent, and provides enough support that it's not too much of a bother programming in a new environment that you've never seen before - lessening the burden of learning a new language.

It seems to me that the whole build process for Java applications has been greatly overcomplicated though. It's slow, and keeps throwing random errors - especially when I've only just opened Android Studio. This non-determinism proves especially challenging for beginners, such as myself - as sometimes there's no real way to know what's gone wrong (the error messages are not particularly helpful - I've seen several languages with much more helpful ones).

There seem to be a bunch of assumptions that the developers have made too about the user's setup and programming style - leading to confusing situations in which it doesn't work - but there's no real way to know why, as there aren't any obvious error messages.

Despite this, Java as a language has some interesting features. As a whole, I can definitely see where Microsoft got their inspiration for C♯ from, as it's very similar - just without a lot of the syntactic sugar I'm used to in C♯ that makes expressing complex data structures and algorithms much easier, such as properties (C♯'s built-in getter and setter syntax).

Particularly of note is the exception system. In Java, if a method can throw a checked exception, you have to either catch it or declare it by adding throws ExceptionName to the method signature. Since the methods in your main activity in Android override framework methods that don't declare those exceptions, you can't add throws clauses to them - so you have to use lots of try..catch blocks to trap exceptions and deal with them before they bubble up to higher levels - otherwise it's a compilation error!

While this can be helpful, I've found that it can lead to awkward bugs in which an exception is eaten higher up, and the default value that's returned by the method that eats the exception causes strange things to happen that aren't immediately obvious - and it's only when you check the log that you realise what happened.....

The other bothersome thing I've found is the deeply-nested folder structure that a Java project appears to generate for even the simplest of projects. This makes it a rather difficult and involved process to find any code outside of the IDE - which I often do because Android Studio is far too slow and bulky just to check on or reference something quickly.

Finally, the last issue that concerns me are the licensing issues that have plagued Java in recent years. If you haven't heard, Google and Oracle (the company that owns Java) have been in disagreement over licensing fees which Oracle claims Google should pay them because they used Java in the making of Android (which is an open-source project). If Oracle are going after Google over licensing fees for just using a language, then what does that say about any projects I do? It's not exactly confidence inspiring, that's for sure. I for one will be keeping as much of my code library out of the Java ecosystem as possible.

Java seems to be the kind of language with a lot of history. While some of this has led to innovations that have ultimately improved the language, I feel that as a language it's being bogged down by lots of bloat and unnecessary garbage that it could really do without. C♯ has done a brilliant job of cutting through this clutter and rubbish, creating a language that both works with you and is easy to understand (except .NET Standard and .NET Core, but that's a story for another time :P).

Issues with Android Studio

I don't know about you, but I've been having a spot of bother with Android Studio - the IDE we're using for our Mobile Development ACW, in which we are building an app for Android. I thought I'd document some of the challenges I've encountered in the process of installing and using version 3.0.1 on Linux - and issues I've seen in the University labs too.

Disclaimer: This is by no means a complete list. Take advice from this list at your own risk! Additionally, any issues with the University lab machines must be reported to ICTD, whose email address you can find on your desktop background when you login.

Android Studio can't find the SDK

This issue is fairly trivial - it means that the Android SDK is probably not installed. There are two solutions here - download it through Android Studio itself, or, if you're on Linux, install the appropriate SDK package using your package manager.

Using Ubuntu it's the android-sdk package - on Arch-based distributions you'll have to consult the Arch User Repository. Don't forget to point the IDE at the location that it installed it to in the settings! You might have to hunt around a bit, but it's nothing a sudo find / -mount -iname "*sdk*" or something similar won't fix :P

Android Studio doesn't have permission to download the SDK to disk

This issue is specific to multi-user machines on which you don't have administrative privileges. The solution? Create a new directory that you have write access to, and specify that as the Android SDK path before asking it to download the SDK for you.

After downloading the SDK, the Gradle sync fails

This is probably because the SDK version specified in the Gradle file doesn't match the one you have installed. Updating this should resolve the issue.

If not, then check the build tools version too. You can find the version it should be by opening the root of the SDK in your favourite file manager, going into the build-tools folder, and observing the name of the only folder in that directory.
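If there's more than one folder in build-tools, the highest version is usually the one to use. A minimal sketch for picking it out on Linux: `latest_build_tools` is a hypothetical helper, and the SDK path in the example is an assumption, so use whichever path you set in the IDE. The result goes in the `buildToolsVersion` setting of your module's build.gradle.

```shell
#!/bin/sh
# latest_build_tools SDK_DIR -> prints the highest build-tools version installed.
# Each sub-folder of build-tools is named after a version (e.g. 27.0.3), so
# sort numerically on each dot-separated component and keep the last entry.
latest_build_tools() {
    ls -1 "$1/build-tools" | sort -t. -k1,1n -k2,2n -k3,3n | tail -n 1
}

# Example (the SDK path here is an assumption):
# latest_build_tools "$HOME/Android/Sdk"
```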

Android Studio claims that adb doesn't exist

If you're on Linux, then it's likely that you don't have the Android Debug Bridge (adb) installed. Find and install it with your package manager (it's probably called adb or similar).

If you're on Windows, check that you've set the SDK path correctly - adb can be found in the platform-tools folder of the SDK. Also make sure that you have execute privileges on the drive you installed adb to.

Other than that, I suspect that your installation of Android Studio might be broken, and require a re-install.
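Before resorting to a re-install, it can help to check which adb (if any) is actually reachable. A minimal sketch for Linux, assuming the SDK layout described above: `find_adb` is a hypothetical helper that prefers the SDK's own platform-tools copy and falls back to whatever is on your `PATH`.

```shell
#!/bin/sh
# find_adb SDK_DIR -> prints the adb that would be used, preferring the SDK's copy.
# Hypothetical helper; on Windows the binary would be adb.exe instead.
find_adb() {
    if [ -x "$1/platform-tools/adb" ]; then
        printf '%s\n' "$1/platform-tools/adb"    # the SDK's own copy, if executable
    elif command -v adb >/dev/null 2>&1; then
        command -v adb                           # otherwise, whatever is on PATH
    else
        echo "adb not found in $1/platform-tools or on PATH" >&2
        return 1
    fi
}

# Example: find_adb "$HOME/Android/Sdk"
```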

Android Studio claims that the emulator is out of date

I've had this one several times - simply press the update button when prompted (if you've got administrative privileges). I've found that the updates have made the emulator progressively more stable, so if you're experiencing issues, it's worth installing any updates it asks you about.

Android Studio claims that the "Google Maven repository" doesn't exist

Again, simply click the "add" button or whatever it is when prompted. Unlike the emulator update though, this is project-specific and doesn't require administrative privileges.

Intel HAXM errors

Another issue that I've heard of happening in the lab. I've heard that the following help:

  1. Make sure that Hyper-V is turned off, as it's mutually exclusive with Intel's HAXM.
  2. Delete the Intel folder in C:\ProgramData

Other various compatibility issues with the Android Studio project

If you experience any random compatibility issues when trying to open an existing project that was created with an older version of Android Studio, delete the .idea folder and then open Android Studio again. The .idea folder actually just contains auto-generated files - all of which can be regenerated from the rest of your project. To that end, I'd avoid committing it to source code control too.

Pressing start next to a virtual device doesn't do anything

I've seen this a few times - and I think it might be an Intel HAXM issue. Try reading the solution above.

Android Studio claims that the module SDK is not defined

This only happens on startup. Wait it out, and it should disappear once the Gradle sync finishes. It'll prompt you to delete a Gradle project file because it's "not part of the project", but I haven't had the courage to allow it to delete it yet :P

Errors relating to the integrated source-code-control support

I've seen many of these, but I ignore them as the external tools I use to manage my repository work just fine - and I've no desire to allow a complicated and opinionated IDE to take control over how I commit my code. If anyone knows how to disable the integrated SVN/Git support, I'm all ears!

After updating Android Studio from 3.0.1 to 3.1, all the Android API calls in my code turn to unresolved references, despite the Gradle build being successful!

I've just experienced this (~March 31st) with a Kotlin project. The solution, according to a nice person on Stack Overflow, is to delete the following folders in your project whilst Android Studio is closed:

  • .idea/
  • build/
  • app/build/

Once done, open Android Studio again and the problem should be resolved - once it's rebuilt all its cache files, of course.
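If this happens more than once, the deletion can be scripted. A minimal sketch, assuming the standard single-module layout (an app/ folder) and that Android Studio is closed first; `reset_ide_caches` is a hypothetical helper name, and it only touches the three folders listed above.

```shell
#!/bin/sh
# reset_ide_caches PROJECT_DIR -> deletes .idea/, build/ and app/build/ if present.
# Hypothetical helper; close Android Studio before running it.
# The ( ) body runs in a subshell, so the cd doesn't affect the caller.
reset_ide_caches() (
    cd "$1" || exit 1
    for dir in .idea build app/build; do
        if [ -d "$dir" ]; then
            printf 'Removing %s\n' "$dir"
            rm -rf -- "$dir"
        fi
    done
)

# Example (path is illustrative): reset_ide_caches ~/AndroidStudioProjects/MyApp
```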


That about concludes the list of issues I've seen and experienced. If you've experienced any of the above (or even a different issue) and found a different workaround, then let me know below! Did a solution work / not work? Let me know too.

Another reminder: I take no responsibility for any damage that might happen to your computer / project / work as a result of following these suggestions. Always have backups! Additionally, as mentioned above, if you're having an issue with the machines in the University labs, you need to let ICTD know by emailing the address on your desktop background. If you don't, then they won't know about the issue!

Quest Get: Search large amounts of code!


(Above: A map of the Linux Kernel source code. Source: this post on medium.)

Recently I was working on a little project of mine (nope, not this for once! :P), and I needed a C♯ class I'd written a while ago. Forgetful as I am, I had no idea which of my projects I'd written it for. And so the quest began to find it! I did in the end, but it left me wondering whether there was a better way to search all my code quickly. This post is the culmination of everything I've discovered so far about the process of searching one's code.

Before I started, I already knew about grep, which is built into almost every Linux system around. It's even available for Windows via the MSYS Tools. Unfortunately though, despite its prevalence, it's not particularly good at searching large numbers of git repositories, as it keeps descending into the .git folder and displaying a whole load of useless results.

Something had to change. After asking reddit, I was introduced to OpenGrok. Written in Java, it indexes all of your code, and provides a web interface through which you can search it. Very nice. Unfortunately, I had trouble figuring out the logistics of actually getting it to run - and discovered that it takes multiple hours to set up correctly.

Moving on, I was re-introduced to ack. Written in plain-old Perl, it apparently runs on practically any system that Perl does - though it's not installed by default like grep is. Looking into it, I found it to be much like grep - only smarter. It ignores version control directories (like the .git folder) and common package folders (like node_modules) by default, and even has a system by which results can be filtered by language (with support for hash-bangs too!). The results themselves are coloured by default - making it easy to skim through quickly. Coupled with the flexible configuration file system, ack makes for a wonderfully flexible way to search through large amounts of code quickly.
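For what it's worth, grep can approximate the directory-skipping part of ack's defaults. A minimal sketch, assuming GNU grep (for `--exclude-dir` and `-I`); `search_code` is just a hypothetical wrapper name, and it does none of ack's language filtering.

```shell
#!/bin/sh
# search_code PATTERN DIR -> recursive, line-numbered search that skips
# version control and package folders. Hypothetical wrapper around GNU grep.
search_code() {
    # -r recurse, -n show line numbers, -I skip binary files
    grep -rnI --exclude-dir=.git --exclude-dir=node_modules -- "$1" "$2"
}

# Example: search_code 'class MyClass' ~/Projects
```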

Though ack looks good, I still didn't have a way to search through all my code that's scattered across multiple devices at once, so I kept looking. The next project I found (through AlternativeTo, actually) was Text Sherlock. It positions itself as an alternative to OpenGrok that's much simpler to configure.

True to its word, I managed to get a test instance set up running from my /tmp directory in 15 minutes - though it did take a while to index the code I had locally. It also took several seconds to consult its index when I entered a query. I suspect I could alleviate both of these issues by installing Xapian (an open-source high-performance search library), which it appears to have support for.

While the interface was cool, it didn't appear to let me tell it which directories not to index, so it ended up trawling through all my .git directories - just like grep did. It also doesn't appear to be multi-threaded, so it took much longer to index my code than it really needed to (I've got a solid-state drive and enough RAM for a few GBs of cache, so the indexing operation was CPU-bound, not I/O-bound).

In the end, I've rediscovered the awesome search tool ack, and taken a look at the current state of code search tools today. While I haven't yet found precisely what I'm looking for, I'm further forward than when I started.

Other honourable mentions include GNU Global (which apparently needs several GiBs per ~300MiB of source code for its generated static HTML web interface), insight.io (an IDE-like freemium cloud product that 'understands your code'), CodeQuery (which only supports C, C++, Java, Python, Ruby, JavaScript, and Go), and ripgrep (a Rust-based program similar to ack and grep, for which a feature comparison is available). The official ack website has a good page listing more tools that are worth a look, too.

Got a cool way to search through all your code? Did this help you out? Comment below!

Retinex: Correct your low-light images today!

I was processing some images for someone recently, and I ended up encountering issues with colour balance. The images looked okay on my monitor, but as soon as I printed them out, they took on a slight red-orange tint. Very interesting. I suspect that the root cause lies in some complex colourspace or device colour profile issue (which will take me ages to debug and track down), but I stumbled upon a filter in GIMP called Retinex, which provided a very useful workaround.

According to the GIMP documentation, retinex is an algorithm that improves the appearance of images that were taken in sub-optimal lighting conditions. It's probably best illustrated with an example:


(Above: An example of the retinex filter in action. Image source: The official GIMP documentation.)

As you can see, the things on the desk are much easier to pick out in the right image than in the left one. Apparently, the algorithm was invented at NASA's Langley Research Centre in 2004 to automatically enhance astronomical photographs - and has the full name of Multi-Scale Retinex with Color Restoration (MSRCR) - which is a bit of a mouthful!
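For the curious, the multi-scale part of the algorithm is usually written like this in the literature (this is the textbook formulation, not necessarily the exact variant GIMP implements). For each colour channel $i$ of the image $I$, the output sums, over $N$ scales, the difference between the log of each pixel and the log of its Gaussian-blurred surroundings:

$$
R_{\mathrm{MSR},i}(x, y) = \sum_{n=1}^{N} w_n \Big[ \log I_i(x, y) - \log\big( (F_n * I_i)(x, y) \big) \Big]
$$

Here $F_n$ is a normalised Gaussian surround function with scale $\sigma_n$, $*$ denotes convolution, and the weights $w_n$ sum to 1. The colour restoration step then multiplies each channel's result by a factor based on that channel's share of the pixel's total intensity, with tuning constants $\alpha$ and $\beta$:

$$
R_{\mathrm{MSRCR},i}(x, y) = \beta \Big[ \log\big(\alpha \, I_i(x, y)\big) - \log \sum_{j} I_j(x, y) \Big] \, R_{\mathrm{MSR},i}(x, y)
$$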

During my own testing, I've found it to be most effective on outdoor pictures, or pictures with poor lighting. I've also found it to be rather prone to introducing noise into the image - so if a simple automatic white balance correction will suffice, then that's probably a better filter to apply than this one.

It's one of those things that's really useful to know about - because it might just solve your problem one day! To that end, I wanted to blog about it so that I don't forget :P


Markov Chains Part 3: Weighted Chains

Recently I remembered that I had all the pieces I needed to make a weighted version of the unweighted markov chain I built in part 2 of this series - specifically the weighted random number generator I built shortly after the unweighted markov chain. With this in mind, I decided on a whim to put all the pieces of the puzzle together - and this post is the result!

Where to start... hrm. I know. To start with, I needed to perform a minor upgrade to the WeightedRandom class I had. If you haven't read the original post about it, I'd recommend doing so now.

Finished reading that? Great! Let's talk about the changes I've made (they may show up at the bottom of that post, since I embedded it via a GitHub Gist). Firstly, I needed a way to work out whether the weighted random number generator was currently empty or not, leading me to add a Count property:

public int Count {
    get {
        return weights.Count;
    }
}

With a Count property in place, I also found I was going to need a way to dynamically swap out the weightings of the random number generator without creating a completely new instance - which would end up resetting the Random class instance it was working with, leading to a reduction in the quality of the random numbers it produces under high load (see this article for more information on that).

To that end, I ended up refactoring the constructor into a pair of methods: SetContents, and a companion method ClearContents. Since the weight calculations happen when the items are first added to the generator, and I'd need to completely recalculate them if another item were added, I wasn't able to emulate the API of an existing .NET class such as List, as I usually like to do.

Finally, I found later on I needed a way to initialise an empty weighted random generator, so I added a new empty constructor to facilitate that, along with an additional check in the Next() method that throws an InvalidOperationException if the generator is empty and you try to ask it to pick a random item.
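To make the API described above concrete, here's a minimal sketch of a generic WeightedRandom<T> with a Count property, SetContents, ClearContents, an empty constructor, and a Next() that throws InvalidOperationException when empty. The internals (a running total of cumulative weights, searched linearly) are my assumption for illustration and may well differ from the real class.

```csharp
using System;
using System.Collections.Generic;

// A minimal sketch of a weighted random picker with the API described above.
// The internals are assumed for illustration; the real class may differ.
public class WeightedRandom<T>
{
    // One shared Random instance, kept across SetContents / ClearContents calls.
    private readonly Random rand = new Random();
    private readonly List<T> items = new List<T>();
    // cumulativeWeights[i] holds the sum of the weights of items[0..i].
    private readonly List<double> cumulativeWeights = new List<double>();
    private double totalWeight = 0;

    public WeightedRandom() { }

    public WeightedRandom(IDictionary<T, double> contents)
    {
        SetContents(contents);
    }

    public int Count => items.Count;

    // Replaces the current contents, recalculating all the weights from scratch.
    public void SetContents(IDictionary<T, double> contents)
    {
        ClearContents();
        foreach (KeyValuePair<T, double> pair in contents)
        {
            totalWeight += pair.Value;
            items.Add(pair.Key);
            cumulativeWeights.Add(totalWeight);
        }
    }

    public void ClearContents()
    {
        items.Clear();
        cumulativeWeights.Clear();
        totalWeight = 0;
    }

    // Picks an item with probability proportional to its weight.
    public T Next()
    {
        if (items.Count == 0)
            throw new InvalidOperationException("Can't pick an item from an empty WeightedRandom.");
        double target = rand.NextDouble() * totalWeight;
        for (int i = 0; i < cumulativeWeights.Count; i++)
        {
            if (target < cumulativeWeights[i])
                return items[i];
        }
        return items[items.Count - 1]; // Guard against floating-point rounding at the top end.
    }
}
```

Keeping a single Random instance inside the object, and only ever swapping the contents, is exactly why SetContents and ClearContents are handy here: the generator can be refilled as often as needed without creating a new Random.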

The updated WeightedRandom class is available on GitHub as a Gist.

With the weighted random number generator updated to properly support the future weighted markov chain, let's get down to the markov chain itself. Firstly, let's create a skeleton that's based on the UnweightedMarkovChain class I wrote in the last post:

using System;
using System.Collections.Generic;
using System.Linq;
using MarkovGrams.Utilities;
using SBRL.Algorithms;

namespace MarkovGrams
{
    /// <summary>
    /// A weighted character-based markov chain.
    /// </summary>
    public class WeightedMarkovChain
    {
        private WeightedRandom<string> wrandom = new WeightedRandom<string>();

        /// <summary>
        /// The ngrams that this markov chain currently contains.
        /// </summary>
        Dictionary<string, double> ngrams;

        /// <summary>
        /// Creates a new character-based markov chain.
        /// </summary>
        /// <param name="inNgrams">The ngrams to populate the new markov chain with.</param>
        public WeightedMarkovChain(IEnumerable<string> inNgrams) { throw new NotImplementedException(); }

        /// <summary>
        /// Returns a random ngram that's currently loaded into this WeightedMarkovChain.
        /// </summary>
        /// <returns>A random ngram from this WeightedMarkovChain's cache of ngrams.</returns>
        public string RandomNgram() { throw new NotImplementedException(); }

        /// <summary>
        /// Generates a new random string from the currently stored ngrams.
        /// </summary>
        /// <param name="length">
        /// The length of the string to generate.
        /// Note that this is a target, not a fixed value - e.g. passing 2 when the n-gram order is 3 will
        /// result in a string of length 3. Also, depending on the current ngrams this markov chain contains,
        /// it may end up being cut short. 
        /// </param>
        /// <returns>A new random string.</returns>
        public string Generate(int length) { throw new NotImplementedException(); }
    }
}

As you can see, it is rather similar to the unweighted version. Fear not however, for the differences will become more apparent shortly. The only real difference so far is the extra private WeightedRandom<string> wrandom declaration at the top of the class. Let's change that though, by filling out the constructor:

ngrams = new Dictionary<string, double>();
foreach (string ngram in inNgrams)
{
    if (ngrams.ContainsKey(ngram))
        ngrams[ngram]++;
    else
        ngrams.Add(ngram, 1);
}

Here, we read in the raw n-grams and build a dictionary that records the number of times each n-gram has been seen. The dictionary's value type has to be double, as apparently the C♯ compiler isn't clever enough to convert a Dictionary<string, int> to a Dictionary<string, double>. Hrm. Maybe they'll fix that in the future (or if not, does anyone know why not?).
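As it turns out, the conversion isn't implicit because generic classes in C♯ are invariant: Dictionary<string, int> and Dictionary<string, double> are simply unrelated types as far as the compiler is concerned (and the variance that generic interfaces can opt into only covers reference conversions, never numeric ones like int to double). If you'd rather count with ints, one option is to convert explicitly afterwards with Linq's ToDictionary. The NgramCounting class and CountAsWeights method below are hypothetical names for this sketch:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NgramCounting
{
    // Hypothetical helper: counts each distinct n-gram with ints, then converts
    // the counts into the double weights the weighted random generator expects.
    public static Dictionary<string, double> CountAsWeights(IEnumerable<string> ngrams)
    {
        Dictionary<string, int> counts = ngrams
            .GroupBy(ngram => ngram)
            .ToDictionary(group => group.Key, group => group.Count());
        // Generic classes are invariant, so convert element by element.
        return counts.ToDictionary(pair => pair.Key, pair => (double)pair.Value);
    }
}
```

For this class, storing doubles from the start (as the constructor does) is simpler, since the weights never need to be ints anyway.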

Anyway, let's move on to RandomNgram(). Here it is:

if (wrandom.Count == 0)
    wrandom.SetContents(ngrams);
return wrandom.Next();

Quite simple, right? Basically, we populate the weighted random generator if it's currently empty, and then we simply ask it for a random item. We're on a roll, here! Let's keep going with Generate(). Here's the first part:

string result = RandomNgram();
string lastNgram = result;
while(result.Length < length)
{
    // ......
}

Here, we declare an accumulator-like variable result to hold the word we're generating as we construct it, and another one to hold the last n-gram we picked. We also create a while loop to keep adding to the word until we reach the desired length (we'll add a stop condition later, just in case we run into a brick wall). Next, let's put some code inside that while loop. First up is the (re)population of the weighted random number generator:

wrandom.ClearContents();
// The substring that the next ngram in the chain needs to start with
string nextStartsWith = lastNgram.Substring(1);
// Get a list of possible n-grams we could choose from next
Dictionary<string, double> convNextNgrams = new Dictionary<string, double>();
ngrams.Where(gram_data => gram_data.Key.StartsWith(nextStartsWith))
      .ForEach((KeyValuePair<string, double> ngramData) => convNextNgrams.Add(ngramData.Key, ngramData.Value));

Ah, good ol' Linq to the rescue again! But wait, what's that ForEach() call there? I don't remember that being in core .NET! You'd be right of course, but through the power of extension methods one can extend a class with an additional method that can then be used as if it were an integral part of that class, when in reality that isn't the case! Here's my definition for the ForEach() extension method I used above:

public static class LinqExtensions
{
    public static void ForEach<T>(this IEnumerable<T> enumerable, Action<T> action)
    {
        foreach (T item in enumerable)
        {
            action(item);
        }
    }
}
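As an aside, Linq's built-in ToDictionary can also build the candidate dictionary directly, without needing a custom ForEach. Here's a sketch of that filtering step pulled out into a hypothetical helper, NextCandidates, assuming every n-gram in the chain has the same order. It also passes StringComparison.Ordinal to StartsWith, since the single-argument overload performs a culture-sensitive comparison, which isn't really what we want when matching raw characters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChainSteps
{
    // Hypothetical helper: returns the n-grams (and their weights) that could follow
    // lastNgram, i.e. those that start with everything after lastNgram's first character.
    public static Dictionary<string, double> NextCandidates(Dictionary<string, double> ngrams, string lastNgram)
    {
        string nextStartsWith = lastNgram.Substring(1);
        return ngrams
            .Where(pair => pair.Key.StartsWith(nextStartsWith, StringComparison.Ordinal))
            .ToDictionary(pair => pair.Key, pair => pair.Value);
    }
}
```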

Next, we need to add that stop condition we talked about earlier before I forget! Here it is:

// If there aren't any choices left, we can't exactly keep adding to the new string any more :-(
if (convNextNgrams.Count == 0)
    break;

Observant readers will notice that we haven't actually finished the (re)population process yet, so we should do that next. Once done, we can also obtain a random n-gram from the generator and process it:

wrandom.SetContents(convNextNgrams);
// Pick a random n-gram from the list
string nextNgram = wrandom.Next();
// Add the last character from the n-gram to the string we're building
result += nextNgram[nextNgram.Length - 1];
lastNgram = nextNgram;

That completes my initial weighted markov chain implementation. Here's the class in full:

using System;
using System.Collections.Generic;
using System.Linq;
using MarkovGrams.Utilities;
using SBRL.Algorithms;

namespace MarkovGrams
{
    /// <summary>
    /// A weighted character-based markov chain.
    /// </summary>
    public class WeightedMarkovChain
    {
        private WeightedRandom<string> wrandom = new WeightedRandom<string>();

        /// <summary>
        /// The ngrams that this markov chain currently contains.
        /// </summary>
        Dictionary<string, double> ngrams;

        /// <summary>
        /// Creates a new character-based markov chain.
        /// </summary>
        /// <param name="inNgrams">The ngrams to populate the new markov chain with.</param>
        public WeightedMarkovChain(IEnumerable<string> inNgrams)
        {
            ngrams = new Dictionary<string, double>();
            foreach (string ngram in inNgrams)
            {
                if (ngrams.ContainsKey(ngram))
                    ngrams[ngram]++;
                else
                    ngrams.Add(ngram, 1);
            }
        }

        /// <summary>
        /// Returns a random ngram that's currently loaded into this WeightedMarkovChain.
        /// </summary>
        /// <returns>A random ngram from this WeightedMarkovChain's cache of ngrams.</returns>
        public string RandomNgram()
        {
            if (wrandom.Count == 0)
                wrandom.SetContents(ngrams);
            return wrandom.Next();
        }

        /// <summary>
        /// Generates a new random string from the currently stored ngrams.
        /// </summary>
        /// <param name="length">
        /// The length of the string to generate.
        /// Note that this is a target, not a fixed value - e.g. passing 2 when the n-gram order is 3 will
        /// result in a string of length 3. Also, depending on the current ngrams this markov chain contains,
        /// it may end up being cut short. 
        /// </param>
        /// <returns>A new random string.</returns>
        public string Generate(int length)
        {
            string result = RandomNgram();
            string lastNgram = result;
            while(result.Length < length)
            {
                wrandom.ClearContents();
                // The substring that the next ngram in the chain needs to start with
                string nextStartsWith = lastNgram.Substring(1);
                // Get a list of possible n-grams we could choose from next
                Dictionary<string, double> convNextNgrams = new Dictionary<string, double>();
                ngrams.Where(gram_data => gram_data.Key.StartsWith(nextStartsWith))
                      .ForEach((KeyValuePair<string, double> ngramData) => convNextNgrams.Add(ngramData.Key, ngramData.Value));
                // If there aren't any choices left, we can't exactly keep adding to the new string any more :-(
                if (convNextNgrams.Count == 0)
                    break;
                wrandom.SetContents(convNextNgrams);
                // Pick a random n-gram from the list
                string nextNgram = wrandom.Next();
                // Add the last character from the n-gram to the string we're building
                result += nextNgram[nextNgram.Length - 1];
                lastNgram = nextNgram;
            }
            wrandom.ClearContents();
            return result;
        }
    }
}
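The class above takes a list of pre-computed n-grams as its input. For completeness, here's a minimal sketch of producing fixed-order character n-grams from a list of words. NgramExtractor and Extract are hypothetical names for illustration, not necessarily the n-gram generator from earlier in this series.

```csharp
using System;
using System.Collections.Generic;

public static class NgramExtractor
{
    // Hypothetical helper: splits each word into overlapping character n-grams of the given order.
    // Words shorter than the order contribute nothing.
    public static List<string> Extract(IEnumerable<string> words, int order)
    {
        if (order < 1)
            throw new ArgumentOutOfRangeException(nameof(order), "The n-gram order must be at least 1.");
        List<string> result = new List<string>();
        foreach (string word in words)
        {
            for (int i = 0; i + order <= word.Length; i++)
                result.Add(word.Substring(i, order));
        }
        return result;
    }
}
```

The resulting list could then be handed straight to the chain, e.g. `new WeightedMarkovChain(NgramExtractor.Extract(words, 3)).Generate(8)`. Duplicates are kept deliberately: an n-gram like "ell" that appears in both "hello" and "yellow" ends up with a higher weight in the chain's dictionary.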

You can find it on my private git server here, if you're interested in any future improvements I might have made to it since writing this post. Speaking of which, I've got a few in mind - mainly refactoring both this class and its unweighted cousin to utilise lists of objects instead of strings. This way, I'll be able to apply it to anything I like - such as sentence generation, music improvisation, and more!

I'd also like to extend it such that I can specify the weights manually, giving me even more flexibility as to how I can put the engine to use.

(Found a cool use for a Markov Chain? Comment about it below!)


LoRaWAN talks at C4DI!

(Above: The LoRaWAN logo. Of course, this post isn't endorsed (or even read?) by them at all.)

Hello again! I decided to write a quick post about the trio of talks I attended at C4DI yesterday. We had Rob Miles, Robin, and a very knowledgeable Paul from Norfolk come and talk to us about all things LoRa.

Rob Miles started off with an introduction to how it all works, and how we as hobbyists can get started with it and build an excellent cow tracking program :D

Robin took it further by showing us how he took his idea for a temperature graph from first principles to a working device, all the steps along the way, and solutions to the problems he encountered whilst building it.

Finally, Paul showed us what he has been doing with LoRa down in Norfolk, and went into further detail as to how LoRa devices communicate with your application server. He also talked more about The Things Network, and how the people behind it are creating a public LoRa network that everyone can both use and contribute to by running a gateway. Apparently, private commercial companies will soon be able to deploy private LoRa infrastructure that can also route public messages through to The Things Network - since those messages are picked up anyway due to the nature of radio!

All in all, it was an excellent set of talks - even if I didn't know very many people there, and had to leave a bit before the end to attend a meeting!

If any of these 3 talks sound interesting to you, Rob Miles should have the slides available on his blog soon. I've also got a recording of all 3 talks (minus the last bit of Paul's talk of course). If you'd like a copy of the recordings, get in touch (IRL if you know me, by email - check my homepage for the address, or by commenting below and I can pull your email address from the comment)!

Securing a Linux Server Part 2: SSH

Wow, it's been a while since I posted something in this series! Last time, I took a look at the Uncomplicated Firewall, and how you can use it to control the traffic coming in (and going out) of your server. This time, I'm going to take a look at steps you can take to secure another vitally important part of most servers: SSH. Used by servers and their administrators across the world to talk to one another, if someone manages to get in who isn't supposed to, they could do all kinds of damage!

The first and easiest thing we can do to improve security is to prevent the root user from logging in. If you haven't done so already, you should create a new user on your server, set a good password, and give it superuser privileges. Log in with the new user account, and then edit /etc/ssh/sshd_config, finding the line that says something like

PermitRootLogin yes

....and change it to

PermitRootLogin no

Once done, restart the ssh server. Your config might be slightly different (e.g. it might be PermitRootLogin without-password) - but the principle is the same. This adds an extra barrier to getting into your server, as now attackers must not only guess your password, but your username as well (some won't even bother, and keep trying to login to the root account :P).

Next, we can move SSH to a non-standard port. Some might argue that this isn't a good security measure to take and that it doesn't actually make your server more secure, but I find that it's still a good measure to take for 2 reasons: defence in depth, and reducing the CPU load from all the dumb bots that try to get in on the default port. With that, let's make another modification to /etc/ssh/sshd_config. Make sure you test at every step you take, as if you lock yourself out, you'll have a hard time getting back in again...

Port 22

Change 22 in the above to any other unused port number between 1 and 65535. Next, make sure you've allowed the new port through your firewall! If you're using ufw, my previous post (link above) gives a helpful guide on how to do this. Once done, restart your SSH server again - and try logging in before you close your current session. That way, if you make a mistake, you can fix it through your existing session.

Once you're confident that you've got it right, you can close port 22 on your firewall.

So we've created a new user account with a secure password (tip: use a password manager if you have trouble remembering it :-)), disabled root login, and moved the ssh port to another port number that's out of the way. Is there anything else we can do? Turns out there is.

Passwords are not the only way we can authenticate against an SSH server. Public-private keypairs can be used too - and they are much more secure (and convenient) than passwords if used correctly. You can generate your own public-private keypair like so:

ssh-keygen -t ed25519

It will ask you a few questions, such as where to save it and a password to encrypt the private key on disk. Next, copy the public key (the file ending in .pub) to your server, so that it gets added to ~/.ssh/authorized_keys for your user there - the ssh-copy-id command can do this for you. Once done, we need to tell ssh on your local machine to use the new keypair. This is fairly easy to do, actually (though it took me a while to figure out how!). Simply edit ~/.ssh/config (or create it if it doesn't exist), and create (or edit) an entry for your ssh server, making it look something like this:

Host bobsrockets.com
    Port            {port_number}
    IdentityFile    {path/to/private/keyfile}

It's the IdentityFile line that's important. The port line simply makes it such that you can type ssh bobsrockets.com (or whatever your server is called) and it will figure out the port number for you.

With a public-private keypair now in use, there's just one step left: disable password-based logins. I'd recommend trialling the keypair for a while to make sure you haven't messed anything up - because once you disable passwords, if you lose your private key, you won't be getting back in again any time soon!

Again, open /etc/ssh/sshd_config for editing. Find the line that starts with PasswordAuthentication, and comment it out with a hash symbol (#), if it isn't already. Directly below that line, add PasswordAuthentication no.

Once done, restart ssh for a final time, and check it works. If it does, congratulations! You've successfully secured your SSH server (to the best of my knowledge, of course). Got a tip I haven't covered here? Found a mistake? Let me know in a comment below!

Art by Mythdael