Mailing List Articles Atom Feed Comments Atom Feed Twitter Reddit Facebook

Tag Cloud

3d 3d printing account algorithms android announcement architecture archives arduino artificial intelligence artix assembly async audio automation backups bash batch blender blog bookmarklet booting bug hunting c sharp c++ challenge chrome os cluster code codepen coding conundrums coding conundrums evolved command line compilers compiling compression containerisation css dailyprogrammer data analysis debugging demystification distributed computing dns docker documentation downtime electronics email embedded systems encryption es6 features ethics event experiment external first impressions freeside future game github github gist gitlab graphics hardware hardware meetup holiday holidays html html5 html5 canvas infrastructure interfaces internet interoperability io.js jabber jam javascript js bin labs learning library linux lora low level lua maintenance manjaro minetest network networking nibriboard node.js open source operating systems optimisation own your code pepperminty wiki performance phd photos php pixelbot portable privacy problem solving programming problems project projects prolog protocol protocols pseudo 3d python reddit redis reference release releases rendering resource review rust searching secrets security series list server software sorting source code control statistics storage svg systemquery talks technical terminal textures thoughts three thing game three.js tool tutorial tutorials twitter ubuntu university update updates upgrade version control virtual reality virtualisation visual web website windows windows 10 worldeditadditions xmpp xslt

The plan to caption and index images

Something that has been on my mind for a while are the photos that I take. At last count on my NAS I have 8564 pictures I have taken so far since I first got a phone to take them with, and many more belonging to other family members.

I have blogged before about a script I've written that automatically processes photos graphs and files them in by year and month. It fixes the date taken, set the thumbnail for rapid preview loading, automatically rotates them to be the right way up, losslessly optimises them, and more.

The one thing it can't do though is to help me locate a specific photo I'm after, so given my work with AI recently I have come up with a plan to do something about this, and I want to blog about it here.

By captioning the images with an AI, I plan to index the captions (and other image metadata) and have a web interface in the form of a search engine. In this blog post, I'm going to outline the AI I intend to use, and the architecture of the image search engine I have already made a start on implementing.

AI for image captioning

The core AI to do image captioning will be somewhat based on work I've done for my PhD. The first order of business was finding a dataset to train on, and I stumbled across Microsoft's Common Objects in Context dataset. The next and more interesting part was to devise a model architecture that translate an image into text.

When translating 1 thing (or state space) into another in AI, it is generally done with an encoder-decoder architecture. In my case here, that's an encoder for the image - to translate it into an embedded feature space - and a decoder to turn that embedded feature space into text.

There are many options for these - especially for encoding images - which I'll look at first. While doing my PhD, I've come across many different encoders for images, which I'd roughly categorise into 2 main categories:

Since the transformer model was invented, they have been widely considered to be the best option. Swin Transformers adapt this groundbreaking design for images - transformers originally handled text - from what I can tell better than the earlier Vision Transformer architecture.

On the other side, a number of encoders were invented before transformers were a thing - the most famous of which was ResNet (I think I have the right paper), which was basically just a bunch of CNN layers stacked on top of one another with a few extra bits like normalisation and skip connections.

Recently though, a new CNN-based architecture that draws inspiration from the strong points of transformers - and it's called ConvNeXt. Based on the numbers in the paper, it even beats the swin transformer model mentioned earlier. Best of all, it's much simpler in design so it makes it relatively easy to implement. It is this model architecture I will be using.

For the text, things are both straight forward - the model architecture I'll be using is a transformer (of course - I even implemented it myself from scratch!) - but the trouble is representation. Particularly the representation of the image caption we want the model to predict.

There are many approaches to this problem, but the one I'm going to try first is a word-based solution using one-hot encoding. There are about 27K different unique words in the dataset, so I've assigned each one a unique number in a dictionary file. Then, I can turn this:

[ "a", "cat", "sat", "on", "a", "mat" ]

....into this:

[ 0, 1, 2, 3, 0, 4 ]

...then, the model would predict something like this:

    [ 1, 0, 0, 0, 0, 0, ... ],
    [ 0, 1, 0, 0, 0, 0, ... ],
    [ 0, 0, 1, 0, 0, 0, ... ],
    [ 0, 0, 0, 1, 0, 0, ... ]
    [ 1, 0, 0, 0, 0, 0, ... ]
    [ 0, 0, 0, 0, 1, 0, ... ]

...where each sub-array is a word.

This will as you might suspect use a lot of memory - especially with 27K words in the dictionary. By my calculations, with a batch size of 64 and a maximum caption length of 25, each output prediction tensor will use a whopping 172.8 MiB memory as float32, or 86.4 MiB memory as float16 (more on memory usage later).

I'm considering a variety of techniques to combat this if it becomes an issue. For example, reducing the dictionary size by discarding infrequently used words.

Another option would be to have the model predict GloVe vectors as an output and then compare the output to the GloVe dictionary to tell which one to pick. This would come with it's own set of problems however, like lots of calculations to compare each word to every word in the dictionary.

My final thought was that I could maybe predict individual characters instead of full words. There would be more items in the sequence predicted, but each character would only have up to 255 choices (probably more like 36-ish), potentially saving memory.

I have already implemented this AI - I just need to debug and train it now. To summarise, here's a diagram:

The last problem with the AI though is memory usage. I plan on eventually running the AI on a raspberry pi, so much tuning will be required to reduce memory usage and latency as much I can. In particular, I'll be trying out quantisating my model and writing the persistent daemon to use Tensorflow Lite to reduce memory usage. Models train using the float32 data type - which uses 32 bits per value, but quantising it after training to use float16 (16 bits / value) or even uint8 (8 bits / value) would significantly reduce memory usage.

Search engine and indexing

The second part of this is the search engine. The idea here is to index all my photos ahead of time, and then have a web interface I can use to search and filter them. The architecture I plan on using to achieve this is rather complicated, and best explained with a diagram:

The backend I have chosen for the index is called meilisearch. It's written in Rust, and provides advanced as-you-type search functionality. This is for 2 reasons:

  1. While I'd love to implement my own, meilisearch is an open source project where they have put in more hours into making it cool than I ever would be able to
  2. Being a separate daemon means I can schedule it on my cluster as a separate task, which potentially might end up on a different machine

With this in mind, the search engine has 2 key parts to it: the crawler / indexer, and the HTTP server that serves the web interface. The web interface will talk to meilisearch to perform searches (not directly; requests will be proxied and transformed).

The crawler will periodically scan the disk for new, updated, and deleted files, and pass them on to the indexer queue. The indexer will do 4 things:

  1. Caption the image, by talking to a persistent Python child process via Inter Process Communication (IPC) - captions will be written as EXIF data to images
  2. Thumbnail images and store them in a cache (perhaps some kinda key-value store, as lots of little files on disk would be a disaster for disk space)
  3. Extract EXIF (and other) metadata
  4. Finally, push the metadata to meilisearch for indexing

Tasks 2 and 3 can be done in parallel, but the others will need to be done serially - though multiple images can of course be processed concurrently. I anticipate much asynchronous code here, which I'm rather looking forward to finishing writing :D

I already have a good start on the foundation of the search engine here. Once I've implemented enough that it's functional, I'll open source everything.

To finish this post, I have a mockup screenshot of what the main search page might look like:

Obviously the images are all placeholders (append ?help to this URL see the help page) for now and I don't yet have a name for it (suggestions in the comments are most welcome!), but the rough idea is there.

PhD Aside 2: Jupyter Lab / Notebook First Impressions

Hello there! I'm back with another PhD Aside blog post. In the last one, I devised an extremely complicated and ultimately pointless mechanism by which multiple Node.js processes can read from the same file handle at the same time. This post hopefully won't be quite as useless, as it's a cross with the other reviews / first impressions posts I've made previously.

I've had Jupyter on my radar for ages, but it's only very recently that I've actually given it a try. Despite being almost impossible to spell (though it does appear to be getting easier with time), both it's easy to install and extremely useful when plotting visualisations, so I wanted to talk about it here.

I tried Jupyter Lab, which is apparently more complicated than Jupyter Notebook. Personally though I'm not sure I see much of a difference, aside from a file manager sidebar in Jupyter Lab that is rather useful.

A Jupyter Lab session of mine, in which I was visualising embeddings from a pretrained CLIP model.

(Above: A Jupyter Lab session of mine, in which I was visualising embeddings from a pretrained CLIP model.)

Jupyter Lab is installed via pip (pip3 for apt-based systems): Once installed, you can start a server with jupyter-lab in a terminal (or command line), and then it will automatically open a new tab in your browser that points to the server instance (http://localhost:8888/ by default).

Then, you can open 1 or more Jupyter Notebooks, which seem to be regular files (e.g. Javascript, Python, and more) but are split into 'cells', which can be run independently of one another. While these cells are usually run in order, there's nothing to say that you can't run them out of order, or indeed the same cell over and over again as you prototype a graph.

The output of each cell is displayed directly below it. Be that a console.log()/print() call or a graph visualisation (see the screenshot above), it seems to work just fine. It also saves the output of a cell to disk alongside the code in the Jupyter Notebook, can be a double-edged sword: On the one hand, it's very useful to have the plot and other output be displayed to remind you what you were working on, but on the other hand if the output somehow contains sensitive data, then you need to remember to clear it before saving & committing to git each time, which is a hassle. Similarly, every time the output changes the notebook file on disk also changes, which can result in unnecessary extra changes committed to git if you're not careful.

In the same vein, I have yet to find a way to define a variable in a notebook file whose value is not saved along with the notebook file, which I'd rather like since the e.g. tweets I work with for the social media side of my PhD are considered sensitive information, and so I don't want to commit them to a git repository which will no doubt end up open-source.

You can also import functions and classes from other files. Personally, I see Jupyter notebooks to be most useful when used in conjunction with an existing codebase: while you can put absolutely everything in your Jupyter notebook, I wouldn't recommend it as you'll end up with spaghetti code that's hard to understand or maintain - just like you would in a regular codebase in any other language.

Likewise, I wouldn't recommend implementing an AI model in a Jupyter notebook directly. While you can, it makes it complicated to train it on a headless server - which you'll likely want to do if you want to train a model at any scale.

The other minor annoyance is that by using Jupyter you end up forfeiting thee code intelligence of e.g. Atom or Visual Studio Code, which is a shame since a good editor can e.g. check syntax on the fly, inform you of unused variables, provide autocomplete, etc.

These issues aside, Jupyter is a great fit for plotting visualisations due to the very short improve → rerun → inspect/evaluate output loop. It's also a good fit for writing tutorials I suspect, as it apparently has support for markdown cells too. At some point, I may try writing a tutorial in Jupyter notebook, rendering it to regular markdown, and posting it here.

Tensorflow and PyTorch compared

Hey there! Since I've used both Tensorflow and PyTorch a bit now, I thought it was time to write a post comparing the two and their respective strengths and weaknesses.

For reference, I've used Tensorflow both for Javascript (less popular) and for Python (more popular) for a number of different models, relating to both my rainfall radar and social media halves to my PhD. While I definitely have less experience with PyTorch, I feel like I have a good enough grasp on it to get a first impression.

Firstly, let's talk about how PyTorch is different from Tensorflow, and what Tensorflow could learn from the former. The key thing I noticed about PyTorch is that it's easily the more flexible of the two. I'm pretty sure that you can create layers and even whole models that do not explicitly define the input and output shapes of the tensors they operate on - e.g. using CNN layers. This gives them a huge amount of power for handling variable sized images or sentences without additional padding, and would be rather useful in Tensorflow - where you must have a specific input shape for every layer.

Unfortunately, this comes at the cost of complexity. Whereas Tensorflow has a .fit() method, in PyTorch you have to implement it yourself - which, as you can imagine - result in a lot of additional code you have to write and test. This was quite the surprise to me when I first used PyTorch!

The other thing I like about PyTorch is the data processing pipeline and it's simplicity. It's easy to understand and essentially guides you to the most optimal solution all on it's own - leading to greater GPU usage, faster model training times, less waiting around, and tighter improve → run → evaluate & inspect → repeat loops.

While in most cases you need to know the number of items in your dataset in advance, this is not necessarily a bad thing - as it gently guides you to the realisation that by changing the way your dataset is stored, you can significantly improve CPU and disk utilisation by making your dataset more amenable to be processed in parallel.

Tensorflow on the other hand has a rather complicated data processing pipeline with multiple ways to do things and no clear guidance I could easily find on building a generic data processing pipeline that didn't make enormous assumptions like "Oh, you want to load images right? Just use this function!" - which really isn't helpful when you want to do something unusual for a research project.

Those tutorials I do find suggest you use a generator function, which can't be parallelised and makes training a model a slow and painful process. Things aren't completely without hope though - Tensorflow has a .map() method on their Dataset objects and also have a .interleave() method (if I recall correctly) to interleave multiple Dataset objects together - which I believe is a relatively recent addition. This is quite a clever way of doing things, if a bit more complicated than PyTorch's solution.

It would be nice though if the feature for automatically managing the number of parallel workers to use when parallelising things was more intelligent. I recently discovered that it doesn't max out my CPU if I have multiple .map() calls I parallelise for example, when it really should look at the current CPU usage and notice that the CPU is sitting e.g. 50% idle.

Tensorflow for Python has a horrible API more generally. It's a confusing mess as there's both Tensorflow and the inbuilt Keras, which means that it's not obvious where that function you need is - or, indeed, which version thereof you want to call. I know it's a holdover from when Keras wasn't bundled with Tensorflow by default, but the API really should be imagined and tf.keras merged into the main tf namespace somehow.

It can also be unclear when you mix Tensorflow Tensors, numpy arrays and numbers, and plain Python numbers. In some cases, it's impossible to tell where one begins and the other ends, which can be annoying since they all behave differently, so you can in some cases get random error messages when you accidentally mix the types (e.g. "I want a Tensor, not a numpy array", or "I want a plain Python number, not a numpy number").

A great example of what's possible is demonstrated by Tensorflow's own Javascript bindings - to a point. They are much better organised than the Python library for Tensorflow, although they require explicit memory management and disposal of Tensors (which isn't necessarily a bad thing, though it's difficult to compare performance improvements without comparing apples and oranges).

The difficulties start though if you want to do anything in even remotely uncharted territory - Tensorflow.js doesn't have a very wide selection of layers like the Python bindings do (e.g. multi-headed attention). It also seems to have some a number of bugs, meaning you can't just port code from the Python bindings and expect it to work. For example, I tried implementing an autoencoder, but found that that it didn't work as I wanted it to - and for the life of me I couldn't find the bug at all (despite extensive searching).

Another annoyance with Tensorflow.js is that the documentation for exactly which CUDA version you need is very poor - and sometimes outright wrong! In addition, there's no table of versions and associated CUDA + CuDNN versions required like there is for Tensorflow for Python.

It is for these reasons that I find myself using Python much more regularly - even if I dislike Python as a language and ecosystem.

At some point, I'd love to build a generic Tensor library on top of GPU.js. It would naturally support almost any GPU (since GPU.js isn't limited to CUDA-capable devices like Tensorflow is - while you can recompile it with support for other GPUs, I don't recommend it unless you have lots of time on your hands), be applicable to everything from machine to simulation to cellular automata, and run in server, desktop, and browser environments with minimal to no changes to your codebase!


There's no clear answer to whether you should use PyTorch or Tensorflow for your next project. As a rule of thumb, I suggest starting in Tensorflow due to the reduced boilerplate code, and use PyTorch if you find yourself with a wacky model that Tensorflow doesn't like very much - or you want to use a pretrained model that's only available in one or the other.

Having said this, I can certainly recommend experiencing both libraries, as there are valuable things to be learnt from both frameworks. Unfortunately, I can't recommend Tensorflow.js for anything more than basic tensor manipulations (which it is very good at, despite supporting only a limited range of GPUs without recompilation in Node.js) - even though it's API is nice and neat (and the Python bindings should take significant inspiration from it).

In the near future - one way or another - I will be posting about contrastive learning here soon. It's very cool indeed - I just need to wrap my head around and implement the loss function....

If you have experience with handling matrices, please get in touch as I'd really appreciate some assistance :P

Tensorflow / Tensorflow.js in Review

For my PhD, I've been using both Tensorflow.js (Tensorflow for Javascript) and more recently Tensorflow for Python (including the bundled Keras) extensively for implementing multiple different models. Given the experiences I've had so far, I thought it was high time I put my thoughts to paper so to speak and write a blog post reviewing the 2 frameworks.

Tensorflow logo

Tensorflow for Python

Let's start with Tensorflow for Python. I haven't been using it as long as Tensorflow.js, but as far as I can tell they've done a great job of ensuring it comes with batteries included. It has layers that come in an enormous number of different flavours for doing everything you can possibly imagine - including building Transformers (though I ended up implementing the time signal encoding in my own custom layer).

Building custom layers is not particularly difficult either - though you do have to hunt around a bit for the correct documentation, and I haven't yet worked out the all the bugs with loading model checkpoints that use custom layers back in again.

Handling data as a generic "tensor" that contains an n-dimension slab of data is - once you get used to it - a great way of working. It's not something I would recommend to the beginner however - rather I would recommend checking out Brain.js. It's easier to setup, and also more transparent / easier to understand what's going on.

Data preprocessing however is where things start to get complicated. Despite a good set of API reference docs to refer to, it's not clear how one is supposed to implement a performant data preprocessing pipeline. There are multiple methods for doing this (, tf.utils.Sequence, and others), and I have as of yet been unable to find a definitive guide on the subject.

Other small inconsistencies are also present, such as both the Keras website and the Tensorflow API docs both documenting the Keras API, which in and of itself appears to be an abstraction of the Tensorflow API.... it gets confusing. Some love for the docs more generally is also needed, as I found some the wording in places ambiguous as to what it meant - so I ended up guessing and having to work it out by experimentation.

By far the biggest issue I encountered though (aside from the data preprocessing pipeline, which is really confusing and frustrating) is that a highly specific version of CUDA is required for each version of Tensorflow. Thankfully, there's a table of CUDA / CuDNN versions to help you out, but it's still pretty annoying that you have to have a specific version. Blender manages to be CUDA enabled while supporting enough different versions of CUDA that I haven't had an issue on stock Ubuntu with the propriety Nvidia drivers and the system CUDA version, so whhy can't Tensorflow do it too?


This brings me on to Tensorflow.js, the Javascript bindings for libtensorflow (the underlying C++ library). This also has the specific version of CUDA issue, but in addition the version requirement documented in the README is often wrong, leaving you to make random wild guesses as to which version is required!

Despite this flaw, Tensorflow.js fixes a number of inconsistencies in Tensorflow for Python - I suspect that it was written after Tensorflow for Python was first implemented. The developers have effectively learnt valuable lessons from the Python version of Tensorflow, which has resulted in a coherent and cohesive API that makes a much more sense than the API in Python. A great example of this is tf.Dataset: The data preprocessing pipeline in Tensorflow.js is well designed and easy to use. The Python version could learn a lot from this.

While Tensorflow.js doesn't have quite the same set of features (a number of prebuilt layers that exist in Python don't yet existing in Tensorflow.js), it still provides a reasonable set of features that satisfy most use-cases. I have noticed a few annoying inconsistencies in the loss functions though and how they behave - for example I implemented an autoencoder in Tensorflow.js, but it only returned black and white pixels - whereas Tensorflow for Python returned greyscale as intended.

Aside from improving CUDA support and adding more prebuilt layers, the other thing that Tensorflow.js struggles with is documentation. It has comparatively few guides with respect to Tensorflow for Python, and it also has an ambiguity problem in the API docs. This could be mostly resolved by being slightly more generous with explanations as to what things are and do. Adding some examples to the docs in question would also help - as would also fixing the bug where it does not highlight the item you're currently viewing in the navigation page (DevDocs support would be awesome too).


Tensorflow for Python and Tensorflow.js are feature-filled and performant frameworks for machine learning and processing large datasets with GPU acceleration. Many tutorials are provided to help newcomers to the frameworks, but once you've followed a tutorial or 2 you're left very much to be on your own. A number of caveats and difficulties such as CUDA versions and confusing APIs / docs make mastering the framework difficult.

A much easier way to install custom versions of Python

Recently, I wrote a rather extensive blog post about compiling Python from source: Installing Python, Keras, and Tensorflow from source.

Since then, I've learnt of multiple other different ways to do that which are much easier as it turns out to achieve that goal.

For context, the purpose of running a specific version of Python in the first place was because on my University's High-Performance Computer (HPC) Viper, it doesn't have a version of Python new enough to run the latest version of Tensorflow.

Using miniconda

After contacting the Viper team at the suggestion of my supervisor, I discovered that they already had a mechanism in place for specifying which version of Python to use. It seems obvious in hindsight - since they are sure to have been asked about this before, they already had a solution in the form of miniconda.

If you're lucky enough to have access to Viper, then you can load miniconda like so:

module load python/anaconda/4.6/miniconda/3.7

If you don't have access to Viper, then worry not. I've got other methods in store which might be better suited to your environment in later sections.

Once loaded, you can specify a version of Python like so:

conda create -n py python=3.8

The -n py specifies the name of the environment you'd like to create, and can be anything you like. Perhaps you could use the name of the project you're working on would be a good idea. The python=3.8 is the version of Python you want to use. You can list the versions of Python available like so:

conda search -f python

Then, to activate the new environment, do this:

conda init bash
conda activate py
exec bash

Replace py with the name of the environment you created above.

Now, you should have the specific version of Python you wanted installed and ready to use.

Edit 2022-03-30: Added conda install pip step, as some systems don't natively have pip by default which causes issues.

The last thing we need to do here is to install pip inside the virtual conda environment. Do that like so:

conda install pip

You can also install packages with pip, and it should all come out in the wash.

For Viper users, further information about miniconda can be found here: Applications/Miniconda Last

Gentoo Project Prefix

Another option I've been made aware of is Gentoo's Project Prefix. Essentially, it installs Gentoo (a distribution of Linux) inside a directory without root privileges. It doesn't work very well on Ubuntu, however due to this bug, but it should work on other systems.

They provide a bootstrap script that you can run that helps you bootstrap the system. It asks you a few questions, and then gets to work compiling everything required (since Gentoo is a distribution that compiles everything from source).

If you have multiple versions of gcc available, try telling it about a slightly older version of GCC if it fails to install.

If you can get it to install, a Gentoo Prefix install allows the installation whatever software you like!


The last solution to the problem I'm aware of is pyenv. It automates the process of downloading and compiling specified versions of Python, and also updates you shell automatically. It does require some additional dependencies to be installed though, which could be somewhat awkward if you don't have sudo access to your system. I haven't actually tried it myself, but it may be worth looking into if the other 2 options don't work for you.


There's always more than 1 way to do something, and it's always worth asking if there's a better way if the way you're currently using seems hugely complicated.

Installing Python, Keras, and Tensorflow from source

I found myself in the interesting position recently of needing to compile Python from source. The reasoning behind this is complicated, but it boils down to a need to use Python with Tensorflow / Keras for some natural language processing AI, as Tensorflow.js isn't going to cut it for the next stage of my PhD.

The target upon which I'm aiming to be running things currently is Viper, my University's high-performance computer (HPC). Unfortunately, the version of Python on said HPC is rather old, which necessitated obtaining a later version. Since I obviously don't have sudo permissions on Viper, I couldn't use the default system package manager. Incredibly, pre-compiled Python binaries are not distributed for Linux either, which meant that I ended up compiling from source.

I am going to be assuming that you have a directory at $HOME/software in which we will be working. In there, there should be a number of subdirectories:

  • bin: For binaries, already added to your PATH
  • lib: For library files - we'll be configuring this correctly in this guide
  • repos: For git repositories we clone

Make sure you have your snacks - this was a long ride to figure out and write - and it's an equally long ride to follow. I recommend reading this all the way through before actually executing anything to get an overall idea as to the process you'll be following and the assumptions I've made to keep this post a reasonable length.

Setting up

Before we begin, we need some dependencies:

  • gcc - The compiler
  • git - For checking out the cpython git repository
  • readline - An optional dependency of cpython (presumably for the REPL)

On Viper, we can load these like so:

module load utilities/multi
module load gcc/10.2.0
module load readline/7.0

Compiling openssl

We also need to clone the openssl git repo and build it from source:

cd ~/software/repos
git clone git://;    # Clone the git repo
cd openssl;                                     # cd into it
git checkout OpenSSL_1_1_1-stable;              # Checkout the latest stable branch (do git branch -a to list all branches; Python will complain at you during build if you choose the wrong one and tell you what versions it supports)
./config;                                       # Configure openssl ready for compilation
make -j "$(nproc)"                              # Build openssl

With openssl compiled, we need to copy the resulting binaries to our ~/software/lib directory:

cp lib*.so* ~/software/lib;
# We're done, cd back to the parent directory
cd ..;

To finish up openssl, we need to update some environment variables to let the C++ compiler and linker know about it, but we'll talk about those after dealing with another dependency that Python requires.

Compiling libffi

libffi is another dependency of Python that's needed if you want to use Tensorflow. To start, go to the libgffi GitHub releases page in your web browser, and copy the URL for the latest release file. It should look something like this:

Then, download it to the target system:

cd ~/software/lib

Note that we do it this way, because otherwise we'd have to run the script which requires yet more dependencies that you're unlikely to have installed.

Then extract it and delete the tar.gz file:

tar -xzf libffi-3.3.tar.gz
rm libffi-3.3.tar.gz

Now, we can configure and compile it:

./configure --prefix=$HOME/software
make -j "$(nproc)"

Before we install it, we need to create a quick alias:

cd ~/software;
ln -s lib lib64;
cd -;

libffi for some reason likes to install to the lib64 directory, rather than our pre-existing lib directory, so creating an alias makes it so that it installs to the right place.

Updating the environment

Now that we've dealt with the dependencies, we now need to update our environment so that the compiler knows where to find them. Do that like so:

export LDFLAGS="-L$HOME/software/lib -L$HOME/software/include $LDFLAGS";
export CPPFLAGS="-I$HOME/software/include -I$HOME/software/repos/openssl/include -I$HOME/software/repos/openssl/include/openssl $CPPFLAGS"

It is also advisable to update your ~/.bashrc with these settings, as you may need to come back and recompile a different version of Python in the future.

Personally, I have a file at ~/software/ which I run with source $HOME/software/ in my ~/.bashrc file to keep things neat and tidy.

Compiling Python

Now that we have openssl and libffi compiled, we can turn our attention to Python. First, clone the cpython git repo:

git clone
cd cpython;

Then, checkout the latest tag. This essentially checks out the latest stable release:

git checkout "$(git tag | grep -ivP '[ab]|rc' | tail -n1)"

Important: If you're intention is to use tensorflow, check the Tensorflow Install page for supported Python versions. It's probable that it doesn't yet support the latest version of Python, so you might need to checkout a different tag here. For some reason, Python is really bad at propagating new versions out to the community quickly.

Before we can start the compilation process, we need to configure it. We're going for performance, so execute the configure script like so:

./configure --with-lto --enable-optimizations --with-openssl=/absolute/path/to/openssl_repo_dir

Replace /absolute/path/to/openssl_repo with the absolute path to the above openssl repo.

Now, we're ready to compile Python. Do that like so:

make -j "$(nproc)"

This will take a while, but once it's done it should have built Python successfully. For a sanity check, we can also test it like so:

make -j "$(nproc)" test

The Python binary compiled should be called simply python, and be located in the root of the git repository. Now that we've compiled it, we need to make a few tweaks to ensure that our shell uses our newly compiled version by default and not the older version from the host system. Personally, I keep my ~/bin folder under version control, so I install host-specific to ~/software, and put ~/software/bin in my PATH like so:

export PATH=$HOME/software/bin

With this in mind, we need to create some symbolic links in ~/software/bin that point to our new Python installation:

cd $HOME/software/bin;
ln -s relative/path/to/python_binary python
ln -s relative/path/to/python_binary python3
ln -s relative/path/to/python_binary python3.9

Replace relative/path/to/python_binary with the relative path tot he Python binary we compiled above.

To finish up the Python installation, we need to get pip up and running, the Python package manager. We can do this using the inbuilt ensurepip module, which can bootstrap a pip installation for us:

python -m ensurepip --user

This bootstraps pip into our local user directory. This is probably what you want, since if you try and install directly the shebang incorrectly points to the system's version of Python, which doesn't exist.

Then, update your ~/.bash_aliases and add the following:

export LD_LIBRARY_PATH=/absolute/path/to/openssl_repo_dir/lib:$LD_LIBRARY_PATH;
alias pip='python -m pip'
alias pip3='python -m pip'

...replacing /absolute/path/to/openssl_repo_dir with the path to the openssl git repo we cloned earlier.

The next stage is to use virtualenv to locally install our Python packages that we want to use for our project. This is good practice, because it keeps our dependencies locally installed to a single project, so they don't clash with different versions in other projects.

Before we can use virtualenv though, we have to install it:

pip install virtualenv

Unfortunately, Python / pip is not very clever at detecting the actual Python installation location, so in order to actually use virtualenv, we have to use a wrapper script - because the [shebang]() in the main ~/.local/bin/virtualenv entrypoint does not use /usr/bin/env to auto-detect the python binary location. Save the following to ~/software/bin (or any other location that's in your PATH ahead of ~/.local/bin):

#!/usr/bin/env bash

exec python ~/.local/bin/virtualenv "$@"

For example:

# Write the script to disk
nano ~/software/bin/virtualenv;
# chmod it to make it executable
chmod +x ~/software/bin/virtualenv

Installing Keras and tensorflow-gpu

With all that out of the way, we can finally use virtualenv to install Keras and tensorflow-gpu. Let's create a new directory and create a virtual environment to install our packages in:

mkdir tensorflow-test
cd tensorflow-test;
virtualenv "$PWD";
source bin/activate;

Now, we can install Tensorflow & Keras:

pip install tensorflow-gpu

It's worth noting here that Keras is a dependency of Tensorflow.

Tensorflow has a number of alternate package names you might want to install instead depending on your situation:

  • tensorflow: Stable tensorflow without GPU support - i.e. it runs on the CPU instead.
  • tf-nightly-gpu: Nightly tensorflow for the GPU. Useful if your version of Python is newer than the version of Python supported by Tensorflow

Once you're done in the virtual environment, exit it like this:


Phew, that was a huge amount of work! Hopefully this sheds some light on the maddenly complicated process of compiling Python from source. If you run into issues, you're welcome to comment below and I'll try to help you out - but you might be better off asking the Python community instead, as they've likely got more experience with Python than I have.

Sources and further reading

PyTorch and the GPU: A tale of graphics cards

Recently, I've been learning PyTorch - which is an artificial intelligence / deep learning framework in Python. While I'm not personally a huge fan of Python, it seems to be the only library of it's kind out there at the moment (and Tensorflow.js has terrible documentation) - so it would seem that I'm stuck with it.

Anyway, as I've been trying to learn it I inevitably came to the bit where I need to learn how to take advantage of a GPU to accelerate the neural network training process. I've been implementing a few test networks to see how it performs (my latest one is a simple LSTM, loosely following this tutorial).

In PyTorch, this isn't actually done for you automatically. The basic building blocks of PyTorch are tensors (potentially multi-dimensional arrays that hold data). Each tensor is bound to a specific compute device - by default the CPU (in which the data is stored in regular RAM). TO do the calculations on a graphics card, you need to bind the data to the GPU in order to load the data into the GPU's own memory - so that the GPU can access it and do the calculation. The same goes for any models you create - they have to be explicitly loaded onto the GPU in order to run the calculations in the right place. Thankfully, this is fairly trivial:

tensor = torch.rand(3, 4)
tensor =

....where COMPUTE_DEVICE is the PyTorch device object you want to load the tensor onto. I found that this works to determine the device that the data should be loaded onto quite well:

COMPUTE_DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

Unfortunately, PyTorch (and all other AI frameworks out there) only support a technology called CUDA for GPU acceleration. This is a propriety Nvidia technology - which means that you can only use Nvidia GPUs for accelerated deep learning. Since I don't actually own an Nvidia GPU (far too expensive, and in my current laptop I have an AMD Radeon R7 M445 - and I don't plan on spending large sums of money to replace a perfectly good laptop), I've been investigating hardware at my University that I can use for development purposes - since this is directly related to my PhD after all.

Initially, I've found a machine with an Nvidia GeForce GTX 650 in it. If you run torch.cuda.is_available(), it will tell you if CUDA is available or not:

print(torch.cuda.is_available()) # Prints True if CUDA is available

.....but, as always, there's got to be a catch. Just because CUDA is available, doesn't mean to say that PyTorch can actually use it. After a bunch of testing, it transpired that PyTorch only supports CUDA devices with a capability index greater than or equal to 3.5 - and the GTX 650 has a capability index of just 3.0. You can see where this is going. I foound this webpage was helpful - it lists all of Nvidia's GPUs and their CUDA capability indices.

You can also get PyTorch to tell you more about the CUDA device it has found:

def display_compute_device():
    """Displays information about the compute device that PyTorch is using."""

    log(f"Using device: {COMPUTE_DEVICE}", newline=False)
    if COMPUTE_DEVICE.type == 'cuda':
        print(" {0} [Memory: {1}GB allocated, {2}GB cached]".format(
            round(torch.cuda.memory_allocated(0)/1024**3, 1),
            round(torch.cuda.memory_cached(0)/1024**3, 1)


If you execute the above method, it will tell you more about the compute device it has found. Note that you can actually make use of multiple compute devices at the same time - I just haven't done any research into that yet.

Crucially, it will also generate a warning message if your CUDA device is too old. To this end, I'll be doing some more investigating as to the resources that the Department of Computer Science has available for PhD students to use....

If anyone knows of an artificial intelligence framework that can take advantage of any GPU (e.g. via OpenCL, oneAPI, or other similar technologies), do get in touch. I'm very interested to explore other options.

Binary Searching

We had our first Algorithms lecture on wednesday. We were introduced to two main things: complexity and binary searching. Why it is called binary searching, I do not know (leave a comment below if you do!). The following diagram I created explains it better than I could in words:

Binary Search Algorithm

I have implementated the 'binary search' algorithm in Javascript (should work in Node.JS too), PHP, and Python 3 (not tested in Python 2).

Javascript (editable version here):

 * @summary Binary Search Implementation.
 * @description Takes a sorted array and the target number to find as input.
 * @author Starbeamrainbowlabs
 * @param arr {array} - The *sorted* array to search.
 * @param target {number} - The number to search array for.
 * @returns {number} - The index at which the target was found.
function binarysearch(arr, target)
    console.log("searching", arr, "to find", target, ".");
    var start = 0,
        end = arr.length,
        midpoint = Math.floor((end + start) / 2);

    do {
        console.log("midpoint:", midpoint, "start:", start, "end:", end);
        if(arr[midpoint] !== target)
            console.log("at", midpoint, "we found", arr[midpoint], ", the target is", target);
            if(arr[midpoint] > target)
                console.log("number found was larger than midpoint - searching bottom half");
                end = midpoint;
                console.log("number found was smaller than midpoint - searching top half");
                start = midpoint;
            midpoint = Math.floor((end + start) / 2);
            console.log("new start/end/midpoint:", start, "/", end, "/", midpoint);
    } while(arr[midpoint] !== target);
    console.log("found", target, "at position", midpoint);
    return midpoint;

The javascript can be tested with code like this:

//utility function to make generating random number easier
function rand(min, max)
    if(min > max)
        throw new Error("min was greater than max");
    return Math.floor(Math.random()*(max-min))+min;

var tosearch = [];
for(var i = 0; i < 10; i++)
    tosearch.push(rand(0, 25));
tosearch.sort(function(a, b) { return a - b;});
var tofind = tosearch[rand(0, tosearch.length - 1)];
console.log("result:", binarysearch(tosearch, tofind));


//utility function
function logstr($str) { echo("$str\n"); }

 * @summary Binary Search Implementation.
 * @description Takes a sorted array and the target number to find as input.
 * @author Starbeamrainbowlabs
 * @param arr {array} - The *sorted* array to search.
 * @param target {number} - The number to search array for.
 * @returns {number} - The index at which the target was found.
function binarysearch($arr, $target)
    logstr("searching [" . implode(", ", $arr) . "] to find " . $target . ".");
    $start = 0;
    $end = count($arr);
    $midpoint = floor(($end + $start) / 2);

    do {
        logstr("midpoint: " . $midpoint . " start: " . $start . " end: " . $end);
        if($arr[$midpoint] != $target)
            logstr("at " . $midpoint . " we found " . $arr[$midpoint] . ", the target is " . $target);
            if($arr[$midpoint] > $target)
                logstr("number found was larger than midpoint - searching bottom half");
                $end = $midpoint;
                logstr("number found was smaller than midpoint - searching top half");
                $start = $midpoint;
            $midpoint = floor(($end + $start) / 2);
            logstr("new start/end/midpoint: " . $start . "/" . $end . "/" . $midpoint);
    } while($arr[$midpoint] != $target);
    logstr("found " . $target . " at position " . $midpoint);
    return $midpoint;

The PHP version can be tested with this code:

$tosearch = [];
for($i = 0; $i < 10; $i++)
    $tosearch[] = rand(0, 25);

$tofind = $tosearch[array_rand($tosearch)];
logstr("result: " . binarysearch($tosearch, $tofind));

And finally the Python 3 version:

#!/usr/bin/env python
import math;
import random;

" @summary Binary Search Implementation.
" @description Takes a sorted list and the target number to find as input.
" @author Starbeamrainbowlabs
" @param tosearch {list} - The *sorted* list to search.
" @param target {number} - The number to search list for.
" @returns {number} - The index at which the target was found.
def binarysearch(tosearch, target):
    print("searching [" + ", ".join(map(str, tosearch)) + "] to find " + str(target) + ".");
    start = 0;
    end = len(tosearch);
    midpoint = int(math.floor((end + start) / 2));

    while True:
        print("midpoint: " + str(midpoint) + " start: " + str(start) + " end: " + str(end));
        if tosearch[midpoint] != target:
            print("at " + str(midpoint) + " we found " + str(tosearch[midpoint]) + ", the target is " + str(target));
            if tosearch[midpoint] > target:
                print("number found was larger than midpoint - searching bottom half");
                end = midpoint;
                print("number found was smaller than midpoint - searching top half");
                start = midpoint;

            midpoint = int(math.floor((end + start) / 2));
            print("new start/end/midpoint: " + str(start) + "/" + str(end) + "/" + str(midpoint));


    print("found " + str(target) + " at position " + str(midpoint));
    return midpoint;

The python code can be tested with something like this:

tosearch = [];
for i in range(50):
    tosearch.append(random.randrange(0, 75));

tofind = random.choice(tosearch);

print("result: " + str(binarysearch(tosearch, tofind)));

That's a lot of code for one blog post.....

Art by Mythdael