Archive


## The plan to caption and index images

Something that has been on my mind for a while is the photos that I take. At last count, my NAS holds 8564 pictures I have taken since I first got a phone to take them with, and many more belonging to other family members.

I have blogged before about a script I've written that automatically processes photographs and files them by year and month. It fixes the date taken, sets the thumbnail for rapid preview loading, automatically rotates them to be the right way up, losslessly optimises them, and more.

The one thing it can't do though is to help me locate a specific photo I'm after, so given my work with AI recently I have come up with a plan to do something about this, and I want to blog about it here.

By captioning the images with an AI, I plan to index the captions (and other image metadata) and have a web interface in the form of a search engine. In this blog post, I'm going to outline the AI I intend to use, and the architecture of the image search engine I have already made a start on implementing.

## AI for image captioning

The core AI to do image captioning will be somewhat based on work I've done for my PhD. The first order of business was finding a dataset to train on, and I stumbled across Microsoft's Common Objects in Context dataset. The next and more interesting part was to devise a model architecture that translates an image into text.

When translating 1 thing (or state space) into another in AI, it is generally done with an encoder-decoder architecture. In my case here, that's an encoder for the image - to translate it into an embedded feature space - and a decoder to turn that embedded feature space into text.

There are many options for these - especially for encoding images - which I'll look at first. While doing my PhD, I've come across many different encoders for images, which I'd roughly categorise into 2 main categories: transformer-based encoders and CNN-based encoders.

Since transformer models were invented, they have been widely considered to be the best option. Swin Transformers adapt this groundbreaking design for images (transformers originally handled text) and, from what I can tell, do so better than the earlier Vision Transformer architecture.

On the other side, a number of encoders were invented before transformers were a thing - the most famous of which was ResNet (I think I have the right paper), which was basically just a bunch of CNN layers stacked on top of one another with a few extra bits like normalisation and skip connections.

Recently though, a new CNN-based architecture has appeared that draws inspiration from the strong points of transformers, and it's called ConvNeXt. Based on the numbers in the paper, it even beats the Swin Transformer model mentioned earlier. Best of all, it's much simpler in design, which makes it relatively easy to implement. It is this model architecture I will be using.

For the text, things are more straightforward - the model architecture I'll be using is a transformer (of course - I even implemented it myself from scratch!) - but the trouble is representation, particularly the representation of the image caption we want the model to predict.

There are many approaches to this problem, but the one I'm going to try first is a word-based solution using one-hot encoding. There are about 27K different unique words in the dataset, so I've assigned each one a unique number in a dictionary file. Then, I can turn this:

[ "a", "cat", "sat", "on", "a", "mat" ]

...into this:

[ 0, 1, 2, 3, 0, 4 ]

...then, the model would predict something like this:

[
[ 1, 0, 0, 0, 0, 0, ... ],
[ 0, 1, 0, 0, 0, 0, ... ],
[ 0, 0, 1, 0, 0, 0, ... ],
[ 0, 0, 0, 1, 0, 0, ... ],
[ 1, 0, 0, 0, 0, 0, ... ],
[ 0, 0, 0, 0, 1, 0, ... ]
]

...where each sub-array is a word.
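As a rough illustration of the encoding described above, here's a minimal Python sketch. The function names (`build_dictionary`, `encode`, `one_hot`) are hypothetical, and the dictionary here is built from a single sentence rather than the whole dataset, so treat it as a toy rather than the actual implementation.

```python
def build_dictionary(words):
    """Assign each unique word an ID, in order of first appearance."""
    dictionary = {}
    for word in words:
        if word not in dictionary:
            dictionary[word] = len(dictionary)
    return dictionary


def encode(words, dictionary):
    """Convert a list of words into a list of integer IDs."""
    return [dictionary[word] for word in words]


def one_hot(ids, dictionary_size):
    """Expand each ID into a vector with a single 1 at that ID's index."""
    return [
        [1 if i == word_id else 0 for i in range(dictionary_size)]
        for word_id in ids
    ]


caption = ["a", "cat", "sat", "on", "a", "mat"]
dictionary = build_dictionary(caption)
ids = encode(caption, dictionary)
vectors = one_hot(ids, len(dictionary))

print(ids)         # [0, 1, 2, 3, 0, 4]
print(vectors[1])  # [0, 1, 0, 0, 0]
```

With the real dictionary each vector would be about 27K items long instead of 5, which is where the memory concerns come from.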

This will, as you might suspect, use a lot of memory - especially with 27K words in the dictionary. By my calculations, with a batch size of 64 and a maximum caption length of 25, each output prediction tensor will use a whopping 172.8 MB (about 165 MiB) of memory as float32, or 86.4 MB (about 82 MiB) as float16 (more on memory usage later).
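The arithmetic behind those figures can be checked with a few lines of Python. This sketch assumes the dictionary holds exactly 27,000 words (the post only says "about 27K"), and the function name is made up for illustration. It prints both MB (10^6 bytes) and MiB (2^20 bytes), as the two are easy to mix up.

```python
def prediction_tensor_bytes(batch_size, caption_length, dictionary_size, bytes_per_value):
    """Size in bytes of a [batch, caption length, dictionary size] tensor."""
    return batch_size * caption_length * dictionary_size * bytes_per_value


# Assumption: "about 27K" words is taken to be exactly 27,000
float32_bytes = prediction_tensor_bytes(64, 25, 27_000, 4)
float16_bytes = prediction_tensor_bytes(64, 25, 27_000, 2)

for label, size in [("float32", float32_bytes), ("float16", float16_bytes)]:
    print(f"{label}: {size / 1000**2:.1f} MB = {size / 1024**2:.1f} MiB")
# float32: 172.8 MB = 164.8 MiB
# float16: 86.4 MB = 82.4 MiB
```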

I'm considering a variety of techniques to combat this if it becomes an issue. For example, reducing the dictionary size by discarding infrequently used words.

Another option would be to have the model predict GloVe vectors as an output and then compare the output to the GloVe dictionary to tell which one to pick. This would come with its own set of problems however, like lots of calculations to compare each word to every word in the dictionary.
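To give a feel for the comparison step, here's a small sketch that picks the dictionary word closest to a predicted vector. It assumes cosine similarity as the comparison (the post doesn't say which measure would be used), and the 3-dimensional vectors are invented toy values, not real GloVe embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_word(predicted, embeddings):
    """Compare the prediction against every word and return the best match."""
    return max(embeddings, key=lambda word: cosine_similarity(predicted, embeddings[word]))


# Toy values for illustration only - real GloVe vectors have 50+ dimensions
toy_embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "mat": [0.1, 0.9, 0.0],
    "sat": [0.0, 0.2, 0.9],
}
predicted_vector = [0.8, 0.2, 0.1]

print(nearest_word(predicted_vector, toy_embeddings))  # cat
```

Note that `nearest_word` loops over the whole dictionary for every predicted word, which is the cost mentioned above.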

My final thought was that I could maybe predict individual characters instead of full words. There would be more items in the sequence predicted, but each character would only have up to 255 choices (probably more like 36-ish), potentially saving memory.

I have already implemented this AI - I just need to debug and train it now. To summarise, here's a diagram:

The last problem with the AI though is memory usage. I plan on eventually running the AI on a Raspberry Pi, so much tuning will be required to reduce memory usage and latency as much as I can. In particular, I'll be trying out quantising my model and writing the persistent daemon to use Tensorflow Lite to reduce memory usage. Models train using the float32 data type, which uses 32 bits per value, but quantising a model after training to use float16 (16 bits / value) or even uint8 (8 bits / value) would significantly reduce memory usage.
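To show the general idea behind uint8 quantisation, here's a framework-free sketch that maps floats onto the range 0-255 with a scale and an offset. This is only the basic concept: the schemes Tensorflow Lite actually uses differ in their details, and the function names and example weights are invented for illustration.

```python
def quantise_uint8(values):
    """Map floats onto the integers 0..255 using a scale and an offset."""
    low, high = min(values), max(values)
    scale = (high - low) / 255 or 1.0  # fall back to 1.0 if all values are equal
    quantised = [round((value - low) / scale) for value in values]
    return quantised, scale, low


def dequantise(quantised, scale, low):
    """Recover approximate float values from the quantised integers."""
    return [q * scale + low for q in quantised]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]  # made-up example weights
quantised, scale, low = quantise_uint8(weights)
restored = dequantise(quantised, scale, low)
worst_error = max(abs(a - b) for a, b in zip(weights, restored))

print(quantised)
print(f"worst rounding error: {worst_error:.5f}")
```

Each value now fits in 1 byte instead of 4, at the cost of a rounding error of at most half of `scale` per value.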

## Search engine and indexing

The second part of this is the search engine. The idea here is to index all my photos ahead of time, and then have a web interface I can use to search and filter them. The architecture I plan on using to achieve this is rather complicated, and best explained with a diagram:

The backend I have chosen for the index is called meilisearch. It's written in Rust, and provides advanced as-you-type search functionality. I chose it for 2 reasons:

1. While I'd love to implement my own, meilisearch is an open source project whose developers have put more hours into making it cool than I ever would be able to
2. Being a separate daemon means I can schedule it on my cluster as a separate task, which potentially might end up on a different machine

With this in mind, the search engine has 2 key parts to it: the crawler / indexer, and the HTTP server that serves the web interface. The web interface will talk to meilisearch to perform searches (not directly; requests will be proxied and transformed).

The crawler will periodically scan the disk for new, updated, and deleted files, and pass them on to the indexer queue. The indexer will do 4 things:

1. Caption the image, by talking to a persistent Python child process via Inter Process Communication (IPC) - captions will be written as EXIF data to images
2. Thumbnail images and store them in a cache (perhaps some kinda key-value store, as lots of little files on disk would be a disaster for disk space)
3. Extract EXIF (and other) metadata
4. Finally, push the metadata to meilisearch for indexing

Tasks 2 and 3 can be done in parallel, but the others will need to be done serially - though multiple images can of course be processed concurrently. I anticipate much asynchronous code here, which I'm rather looking forward to finishing writing :D
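The ordering described above can be sketched with Python's `asyncio`. Python is used here purely for illustration and isn't necessarily the language the indexer is written in, and every function below is a hypothetical stub that stands in for the real work (captioning over IPC, thumbnailing, EXIF extraction, pushing to meilisearch).

```python
import asyncio


async def caption_image(path):
    await asyncio.sleep(0)  # stand-in for IPC with the captioning process
    return f"caption for {path}"


async def make_thumbnail(path):
    await asyncio.sleep(0)  # stand-in for resizing + writing to the cache
    return f"thumbnail:{path}"


async def extract_metadata(path):
    await asyncio.sleep(0)  # stand-in for reading EXIF and other metadata
    return {"path": path}


async def push_to_index(document):
    await asyncio.sleep(0)  # stand-in for a request to the search backend
    return document


async def index_image(path):
    # Task 1 runs first, since the caption is written into the image's EXIF data
    caption = await caption_image(path)
    # Tasks 2 and 3 are independent of each other, so run them concurrently
    _thumbnail_key, metadata = await asyncio.gather(
        make_thumbnail(path), extract_metadata(path)
    )
    metadata["caption"] = caption
    # Task 4 runs last, once all of the metadata is available
    return await push_to_index(metadata)


async def index_all(paths):
    # Multiple images can be in flight at the same time
    return await asyncio.gather(*(index_image(path) for path in paths))


documents = asyncio.run(index_all(["a.jpg", "b.jpg"]))
print(documents)
```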

I already have a good start on the foundation of the search engine here. Once I've implemented enough that it's functional, I'll open source everything.

To finish this post, I have a mockup screenshot of what the main search page might look like:

Obviously the images are all placeholders for now (append ?help to this URL to see the help page) and I don't yet have a name for it (suggestions in the comments are most welcome!), but the rough idea is there.

## PhD Aside 2: Jupyter Lab / Notebook First Impressions

Hello there! I'm back with another PhD Aside blog post. In the last one, I devised an extremely complicated and ultimately pointless mechanism by which multiple Node.js processes can read from the same file handle at the same time. This post hopefully won't be quite as useless, as it's a cross with the other reviews / first impressions posts I've made previously.

I've had Jupyter on my radar for ages, but it's only very recently that I've actually given it a try. Despite being almost impossible to spell (though it does appear to be getting easier with time), it's both easy to install and extremely useful when plotting visualisations, so I wanted to talk about it here.

I tried Jupyter Lab, which is apparently more complicated than Jupyter Notebook. Personally though I'm not sure I see much of a difference, aside from a file manager sidebar in Jupyter Lab that is rather useful.

(Above: A Jupyter Lab session of mine, in which I was visualising embeddings from a pretrained CLIP model.)

Jupyter Lab is installed via pip (pip3 for apt-based systems): https://jupyter.org/install. Once installed, you can start a server with jupyter-lab in a terminal (or command line), and then it will automatically open a new tab in your browser that points to the server instance (http://localhost:8888/ by default).

Then, you can open 1 or more Jupyter Notebooks, which seem to be regular files (e.g. Javascript, Python, and more) but are split into 'cells', which can be run independently of one another. While these cells are usually run in order, there's nothing to say that you can't run them out of order, or indeed the same cell over and over again as you prototype a graph.

The output of each cell is displayed directly below it. Be that a console.log()/print() call or a graph visualisation (see the screenshot above), it seems to work just fine. It also saves the output of a cell to disk alongside the code in the Jupyter Notebook, which can be a double-edged sword. On the one hand, it's very useful to have the plot and other output be displayed to remind you what you were working on. On the other hand, if the output somehow contains sensitive data, then you need to remember to clear it before saving & committing to git each time, which is a hassle. Similarly, every time the output changes the notebook file on disk also changes, which can result in unnecessary extra changes committed to git if you're not careful.
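Since notebook files are JSON under the hood, one possible workaround is a small script that strips the saved outputs before committing. The sketch below assumes the version 4 notebook format, in which code cells carry `outputs` and `execution_count` keys. The function name and example notebook are invented for illustration.

```python
import json


def strip_outputs(notebook):
    """Return a copy of a notebook dict with all code cell outputs removed."""
    cleaned = json.loads(json.dumps(notebook))  # cheap deep copy via JSON
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# A tiny hand-written notebook for demonstration purposes
example = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["secret\n"]}],
            "source": ["print('secret')"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

cleaned = strip_outputs(example)
print(cleaned["cells"][1]["outputs"])  # []
```

To apply this to a real file, load it with `json.load()`, pass the result through `strip_outputs()`, and write it back out with `json.dump()`.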

In the same vein, I have yet to find a way to define a variable in a notebook file whose value is not saved along with the notebook file. I'd rather like this, since e.g. the tweets I work with for the social media side of my PhD are considered sensitive information, and so I don't want to commit them to a git repository which will no doubt end up open-source.

You can also import functions and classes from other files. Personally, I see Jupyter notebooks to be most useful when used in conjunction with an existing codebase: while you can put absolutely everything in your Jupyter notebook, I wouldn't recommend it as you'll end up with spaghetti code that's hard to understand or maintain - just like you would in a regular codebase in any other language.

Likewise, I wouldn't recommend implementing an AI model in a Jupyter notebook directly. While you can, it makes it complicated to train it on a headless server - which you'll likely want to do if you want to train a model at any scale.

The other minor annoyance is that by using Jupyter you end up forfeiting the code intelligence of e.g. Atom or Visual Studio Code, which is a shame since a good editor can e.g. check syntax on the fly, inform you of unused variables, provide autocomplete, etc.

These issues aside, Jupyter is a great fit for plotting visualisations due to the very short improve → rerun → inspect/evaluate output loop. It's also a good fit for writing tutorials I suspect, as it apparently has support for markdown cells too. At some point, I may try writing a tutorial in Jupyter notebook, rendering it to regular markdown, and posting it here.

## Tensorflow and PyTorch compared

Hey there! Since I've used both Tensorflow and PyTorch a bit now, I thought it was time to write a post comparing the two and their respective strengths and weaknesses.

For reference, I've used Tensorflow both for Javascript (less popular) and for Python (more popular) for a number of different models, relating to both the rainfall radar and social media halves of my PhD. While I definitely have less experience with PyTorch, I feel like I have a good enough grasp on it to give a first impression.

Firstly, let's talk about how PyTorch is different from Tensorflow, and what Tensorflow could learn from the former. The key thing I noticed about PyTorch is that it's easily the more flexible of the two. I'm pretty sure that you can create layers and even whole models that do not explicitly define the input and output shapes of the tensors they operate on - e.g. using CNN layers. This gives them a huge amount of power for handling variable sized images or sentences without additional padding, and would be rather useful in Tensorflow - where you must have a specific input shape for every layer.

Unfortunately, this comes at the cost of complexity. Whereas Tensorflow has a .fit() method, in PyTorch you have to implement it yourself - which, as you can imagine, results in a lot of additional code you have to write and test. This was quite the surprise to me when I first used PyTorch!
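To give a flavour of what a hand-written training loop involves, here's a framework-free sketch in plain Python. It is not PyTorch code: it fits a 1-variable linear model with mini-batch gradient descent, but the epoch → batch → forward pass → loss → gradient → update structure is roughly what you end up writing yourself. All names and data are invented for illustration.

```python
def fit(data, epochs=500, batch_size=2, learning_rate=0.05):
    """Fit y = w*x + b to (x, y) pairs using mini-batch gradient descent."""
    w, b = 0.0, 0.0
    history = []
    for _epoch in range(epochs):
        epoch_loss = 0.0
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            grad_w = grad_b = 0.0
            for x, y in batch:
                error = (w * x + b) - y                 # forward pass
                epoch_loss += error ** 2                # squared error loss
                grad_w += 2 * error * x / len(batch)    # gradient of the loss wrt w
                grad_b += 2 * error / len(batch)        # gradient of the loss wrt b
            w -= learning_rate * grad_w                 # parameter update
            b -= learning_rate * grad_b
        history.append(epoch_loss / len(data))          # mean loss for this epoch
    return w, b, history


data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # points on y = 2x + 1
w, b, history = fit(data)
print(f"w={w:.3f} b={b:.3f} final loss={history[-1]:.6f}")
```

A real loop would also need validation, checkpointing, and metrics logging, which is where much of the additional code comes from.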

The other thing I like about PyTorch is the data processing pipeline and its simplicity. It's easy to understand and essentially guides you to the optimal solution all on its own - leading to greater GPU usage, faster model training times, less waiting around, and tighter improve → run → evaluate & inspect → repeat loops.

While in most cases you need to know the number of items in your dataset in advance, this is not necessarily a bad thing - as it gently guides you to the realisation that by changing the way your dataset is stored, you can significantly improve CPU and disk utilisation by making your dataset more amenable to being processed in parallel.

Tensorflow on the other hand has a rather complicated data processing pipeline with multiple ways to do things and no clear guidance I could easily find on building a generic data processing pipeline that didn't make enormous assumptions like "Oh, you want to load images right? Just use this function!" - which really isn't helpful when you want to do something unusual for a research project.

The tutorials I did find suggest using a generator function, which can't be parallelised and makes training a model a slow and painful process. Things aren't completely without hope though - Tensorflow has a .map() method on its Dataset objects and also has a .interleave() method (if I recall correctly) to interleave multiple Dataset objects together - which I believe is a relatively recent addition. This is quite a clever way of doing things, if a bit more complicated than PyTorch's solution.

It would be nice though if the tf.data.AUTOTUNE feature for automatically managing the number of parallel workers to use when parallelising things was more intelligent. I recently discovered that it doesn't max out my CPU if I have multiple .map() calls I parallelise for example, when it really should look at the current CPU usage and notice that the CPU is sitting e.g. 50% idle.

Tensorflow for Python has a horrible API more generally. It's a confusing mess as there's both Tensorflow and the inbuilt Keras, which means that it's not obvious where that function you need is - or, indeed, which version thereof you want to call. I know it's a holdover from when Keras wasn't bundled with Tensorflow by default, but the API really should be reimagined and tf.keras merged into the main tf namespace somehow.

It can also be unclear when you mix Tensorflow Tensors, numpy arrays and numbers, and plain Python numbers. In some cases, it's impossible to tell where one begins and the other ends, which can be annoying since they all behave differently, so you can in some cases get random error messages when you accidentally mix the types (e.g. "I want a Tensor, not a numpy array", or "I want a plain Python number, not a numpy number").

A great example of what's possible is demonstrated by Tensorflow's own Javascript bindings - to a point. They are much better organised than the Python library for Tensorflow, although they require explicit memory management and disposal of Tensors (which isn't necessarily a bad thing, though it's difficult to compare performance improvements without comparing apples and oranges).

The difficulties start though if you want to do anything in even remotely uncharted territory - Tensorflow.js doesn't have as wide a selection of layers as the Python bindings do (e.g. multi-headed attention). It also seems to have a number of bugs, meaning you can't just port code from the Python bindings and expect it to work. For example, I tried implementing an autoencoder, but found that it didn't work as I wanted it to - and for the life of me I couldn't find the bug at all (despite extensive searching).

Another annoyance with Tensorflow.js is that the documentation for exactly which CUDA version you need is very poor - and sometimes outright wrong! In addition, there's no table of versions and associated CUDA + CuDNN versions required like there is for Tensorflow for Python.

It is for these reasons that I find myself using Python much more regularly - even if I dislike Python as a language and ecosystem.

At some point, I'd love to build a generic Tensor library on top of GPU.js. It would naturally support almost any GPU (since GPU.js isn't limited to CUDA-capable devices like Tensorflow is - while you can recompile it with support for other GPUs, I don't recommend it unless you have lots of time on your hands), be applicable to everything from machine learning to simulation to cellular automata, and run in server, desktop, and browser environments with minimal to no changes to your codebase!

### Conclusion

There's no clear answer to whether you should use PyTorch or Tensorflow for your next project. As a rule of thumb, I suggest starting in Tensorflow due to the reduced boilerplate code, and switching to PyTorch if you find yourself with a wacky model that Tensorflow doesn't like very much - or you want to use a pretrained model that's only available in one or the other.

Having said this, I can certainly recommend experiencing both libraries, as there are valuable things to be learnt from both frameworks. Unfortunately, I can't recommend Tensorflow.js for anything more than basic tensor manipulations (which it is very good at, despite supporting only a limited range of GPUs without recompilation in Node.js) - even though its API is nice and neat (and the Python bindings should take significant inspiration from it).

In the near future - one way or another - I will be posting about contrastive learning here. It's very cool indeed - I just need to wrap my head around and implement the loss function...

If you have experience with handling matrices, please get in touch as I'd really appreciate some assistance :P

## Tips for training (large numbers of) AI models

As part of my PhD, I'm training AI models. The specifics as to what for don't particularly matter for this post (though if you're curious I recommend my PhD update blog post series). Over the last year or so, I've found myself training a lot of AI models, and dealing with a lot of data. In this post, I'm going to talk about some of the things I've found helpful and some of the things I've found that are best avoided. Note that this is just a snapshot of my current practices - this will probably gradually change over time.

I've been working with Tensorflow.js and Tensorflow for Python on various Linux systems. If you're on another OS or not working with AI then what I say here should still be somewhat relevant.

### Datasets

First up: a quick word on datasets. While this post is mainly about AI models, datasets are important too. Keeping them organised is vitally important. Keeping all the metadata that is associated with them is also vitally important. Keeping a good directory hierarchy is the best way to achieve this.

I also recommend sticking with a standard format that's easy to parse using your preferred language - and preferably lots of other languages too. JSON Lines is my personal favourite format for data - potentially compressed with Gzip if the filesize is very large.
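As a quick illustration of how little code the format needs, here's a sketch that writes and reads a gzip-compressed JSON Lines file using only the Python standard library. The function names, records, and filename are made up for the example.

```python
import gzip
import json
import os
import tempfile


def write_jsonl_gz(path, records):
    """Write one JSON object per line to a gzip-compressed file."""
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")


def read_jsonl_gz(path):
    """Lazily yield one parsed object per non-empty line."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


records = [{"id": 1, "text": "hello"}, {"id": 2, "text": "world"}]
path = os.path.join(tempfile.mkdtemp(), "example.jsonl.gz")

write_jsonl_gz(path, records)
loaded = list(read_jsonl_gz(path))
print(loaded == records)  # True
```

Because `read_jsonl_gz` is a generator, the whole file never needs to be in memory at once, which helps with very large datasets.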

### AI Models

There are multiple facets to the problem of wrangling AI models:

1. Code that implements the model itself and supporting code
2. Checkpoints from the training process
3. Analysis results from analysing such models

All of these are important for different reasons - and are also affected by where it is that you're going to be training your model.

By far the most important thing I recommend doing is using Git with a remote such as GitHub and committing regularly. I can't stress enough how critical this is - it's the best way to both keep a detailed history of the code you've written and keep a backup at the same time. It also makes working on multiple computers easy. Getting into the habit of using Git for any project (doesn't matter what it is) will make your life a lot easier. At the beginning of a programming session, pull down your changes. Then, as you work, commit your changes and describe them properly. Finally, push your changes to the remote after committing to keep them backed up.

Coming in at a close second is implementing a command line interface with the ability to change the behaviour of your model. This includes:

• Setting input datasets
• Specifying output directories
• Model hyperparameters (e.g. input size, number of layers, number of units per layer, etc)

This is invaluable for running many different variants of your model quickly to compare results. It is also very useful when training your model in headless environments, such as on High Performance Computers (HPCs) like Viper, which my University has.
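As a rough sketch of what such a command line interface might look like in Python, here's a minimal example using `argparse` from the standard library. The flag names and default values are hypothetical and chosen just for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse CLI arguments (reads sys.argv when argv is None)."""
    parser = argparse.ArgumentParser(description="Train a model (illustrative sketch)")
    # Input datasets and output directories
    parser.add_argument("--input", required=True, help="Path to the input dataset")
    parser.add_argument("--output", required=True, help="Directory to write output artefacts to")
    # Model hyperparameters, with defaults so they rarely need specifying
    parser.add_argument("--layers", type=int, default=4, help="Number of layers")
    parser.add_argument("--units", type=int, default=128, help="Number of units per layer")
    return parser.parse_args(argv)


args = parse_args([
    "--input", "path/to/file",
    "--output", "outputs/20211002-resnet",
    "--layers", "6",
])
print(args.layers, args.units)  # 6 128
```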

For HPCs that use Slurm, a great tip here is that when you call sbatch on your job file (e.g. sbatch path/to/jobfile.job), it will preserve your environment. This lets you pass in job-specific parameters by writing a script like this:

#!/usr/bin/env bash
#SBATCH -J TwImgCCT
#SBATCH -N 1
#SBATCH -n 4
#SBATCH --gres=gpu:1
#SBATCH -o %j.%N.%a.out
#SBATCH -e %j.%N.%a.err
#SBATCH -p gpu05,gpu
#SBATCH --time=5-00:00:00
#SBATCH --mem=25600
# 25600 = 25GiB memory required

# Viper uses TrinityX from ClusterVision: https://clustervision.com/trinityx-cluster-management/ and https://github.com/clustervision/trinityX

echo ">>> Installing requirements";
conda run -n py38 pip install -r requirements.txt;
echo ">>> Training model";
/usr/bin/env time --verbose conda run -n py38 src/my_model.py ${PARAMS}
echo ">>> exited with code $?";

...which you can call like so:

PARAMS="--size 4 --example 'something else' --input path/to/file --output outputs/20211002-resnet" sbatch path/to/jobfile.job

You may end up finding you have rather a lot of code behind your model - especially for data preprocessing depending on your dataset. To handle this, I go by 2 rules of thumb:

1. If a source file of any language is more than 300 lines long, it should be split into multiple files
2. If a collection of files do a thing together rather nicely, they belong in a separate Git repository.

To elaborate on these, having source code files become very long makes them difficult to maintain, understand, and re-use in future projects. Splitting them up makes your life much easier.

Going further, modularising your code is also an amazing paradigm to work with. I've broken out many parts of the various codebases I've implemented for my PhD as open-source projects on npm (the Node Package Manager) - most notably applause-cli, terrain50, terrain50-cli, nimrod-data-downloader, and twitter-academic-downloader.

By making them open-source, I'm not only making my research and methods more transparent and easier for others to independently verify, but I'm also allowing others to benefit from them (and potentially improve them) too! As they say, there's no need to re-invent the wheel.

Eventually, I will be making the AI models I'm implementing for my PhD open-source too - but this will take some time as I want to ensure that the models actually work before doing so (I've got 1 model I implemented fully and documented too, but in the end it has a critical bug that means the whole thing is useless.....).

Saving checkpoints from the training process of your model is also essential. I recommend doing so at the end of each epoch. As part of this, it's also useful to have a standard format for your output artefacts from the training process. Ideally, these artefacts can be used to identify precisely what dataset and hyperparameters that model and checkpoints were trained with.

At the moment, my models output something like this:

+ output_dir/
    + summary.txt       Summary of the layers of the model and their output shapes
    + metrics.tsv       TSV file containing training/validation loss/accuracy and epoch numbers
    + settings.toml     The TOML settings that the model was trained with
    + checkpoints/      Directory containing the checkpoints - 1 per epoch
        + checkpoint_e1_val_acc0.699.hdf5   Example checkpoint filename [Tensorflow for Python]
        + 0/            OR, if using Tensorflow.js instead of Tensorflow for Python, 1 directory per checkpoint
    + this_run.log      Logfile for this run [depends on where the program is being executed]
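A couple of helpers along these lines can keep the output format consistent between runs. This is only a sketch: the filename pattern mirrors the example checkpoint filename above, but the TSV column names and function names are assumptions made for illustration.

```python
import os
import tempfile


def checkpoint_filename(epoch, val_acc):
    """Build a checkpoint filename like checkpoint_e1_val_acc0.699.hdf5."""
    return f"checkpoint_e{epoch}_val_acc{val_acc:.3f}.hdf5"


def append_metrics(output_dir, epoch, loss, val_loss, acc, val_acc):
    """Append 1 row per epoch to metrics.tsv, writing a header row first if needed."""
    path = os.path.join(output_dir, "metrics.tsv")
    is_new = not os.path.exists(path)
    with open(path, "a", encoding="utf-8") as handle:
        if is_new:
            # Assumption: these column names are illustrative only
            handle.write("epoch\tloss\tval_loss\taccuracy\tval_accuracy\n")
        handle.write(f"{epoch}\t{loss}\t{val_loss}\t{acc}\t{val_acc}\n")
    return path


output_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(output_dir, "checkpoints"), exist_ok=True)

metrics_path = append_metrics(output_dir, 1, 0.91, 0.85, 0.652, 0.699)
print(checkpoint_filename(1, 0.699))  # checkpoint_e1_val_acc0.699.hdf5
```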

settings.toml leads me on to settings files. Personally I use TOML for mine, and I use 2 files:

• settings.default.toml - Contains all the default values of the settings, and is located alongside the code for my model
• example.toml - Custom settings that override values in the default settings file can be specified using my standard --config CLI argument.

Having a config file is handy when you have multiple dataset input files that rarely change. Generally speaking you want to ensure that you minimise the number of CLI arguments that you have to specify when running your model, as then it reduces cognitive load when you're training many variants of a model at once (I've found that wrangling dozens of different dataset files and model variants is hard enough to focus on and keep organised :P).
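The default-plus-override arrangement boils down to a recursive merge of 2 dictionaries. The sketch below uses plain dictionaries with made-up setting names. In practice they would come from a TOML parser (for example `tomllib` in the Python 3.11+ standard library), but that part is left out to keep the example self-contained.

```python
def merge_settings(defaults, overrides):
    """Recursively overlay custom settings on top of the default settings."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are tables, so merge them key by key
            merged[key] = merge_settings(merged[key], value)
        else:
            # Otherwise the custom value simply wins
            merged[key] = value
    return merged


# Hypothetical contents of settings.default.toml and a custom config file
defaults = {"model": {"layers": 4, "units": 128}, "train": {"epochs": 50}}
custom = {"model": {"layers": 6}}

settings = merge_settings(defaults, custom)
print(settings)
# {'model': {'layers': 6, 'units': 128}, 'train': {'epochs': 50}}
```

Only the values that differ from the defaults need to appear in the custom file, which keeps each variant's config short.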

Analysis results are the final aspect here that it's important to keep organised - and the area in which I have the least experience. I've found it's important to keep track of which model checkpoint it was that the analysis was done with and which dataset said model was trained on. Keeping the entire chain of dataflow clear and easy to follow is difficult because the analysis one does is usually ad-hoc, and often has to be repeated many times on different model variants.

For this, so far I generate statistics and some graphs on the command line. If you're not already familiar with the terminal / command line of your machine, I can recommend checking out my earlier post Learn Your Terminal, which has a bunch of links to tutorials for this. In addition, jq is an amazing tool for manipulating JSON data. It's not installed by default on most systems, but it's available in most default repositories and well worth the install.

For some graphs, I use Gnuplot. Usually though this is only for more complex plots, as it takes a moment to write a .plt file to generate the graph I want.

I'm still looking for a good tool that makes it easy to generate basic graphs from the command line, so please get in touch if you've found one.

I'm also considering integrating some of the basic analysis into my model training program itself, such that it generates e.g. confusion matrices automatically as part of the training process. matplotlib seems to do the job here for plotting graphs in Python, but I have yet to find an equivalent library for Javascript. Again, if you've found one please get in touch by leaving a comment below.

### Conclusion

In this post, I've talked about some of the things I've found helpful so far while I've been training models. From using Git to output artefacts to implementing command line interfaces and wrangling datasets, implementing the core AI model itself is actually only a very small part of an AI project.

Hopefully this post has given you some insight into the process of developing an AI model / AI-powered system. While I've been doing some of these things since before I started my PhD (like Git), others have taken me a while to figure out - so I've noted them down here so that you don't have to spend ages figuring out the same things!

If you've got some good tips you'd like to share on developing AI models (or if you've found the tips here in this blog post helpful!), please do share them below.

## Running multiple local versions of CUDA on Ubuntu without sudo privileges

I've been playing around with Tensorflow.js for my PhD (see my PhD Update blog post series), and I had some ideas that I wanted to test out on my own that aren't really related to my PhD. In particular, I've found this blog post to be rather inspiring - where the author sets up a character-based recurrent neural network to generate text.

The idea of transcoding all those characters to numerical values and back seems like too much work and too complicated just for a quick personal project though, so my plan is to try and develop a byte-based network instead, in the hopes that I can not only teach it to generate text as in the blog post, but valid Unicode as well.

Obviously, I can't really use the University's resources ethically for this (as it's got nothing to do with my University work) - so since I got a new laptop recently with an Nvidia GeForce RTX 2060, I thought I'd try and use it for some machine learning instead.

The problem here is that Tensorflow.js only supports CUDA 10.0, but since I'm running Ubuntu 20.10 with all the latest patches installed, I have CUDA 11.1. A quick search of the apt repositories on my system reveals nothing that suggests I can install older versions of CUDA alongside the newer one, so I had to devise another plan.

I discovered some months ago (while working with Viper - my University's HPC - for my PhD) that you can actually extract - without sudo privileges - the contents of the CUDA .run installers. By then fiddling with your PATH and LD_LIBRARY_PATH environment variables, you can get any program you run to look for the CUDA libraries elsewhere instead of loading the default system libraries.

Since this is the second time I've done this, I thought I'd document the process for future reference.

First, you need to download the appropriate .run installer for the CUDA libraries. In my case I need CUDA 10.0, so I've downloaded mine from here:

Next, we need to create a new subdirectory and extract the .run file into it. Do that like so:

cd path/to/runfile_directory;
mkdir cuda-10.0
./cuda_10.0.130_410.48_linux.run --extract=${PWD}/cuda-10.0/

Make sure that the current working directory contains no spaces, and preferably no other special characters either. Also, adjust the file and directory names to suit your situation. Once done, this will have extracted 3 subfiles - which also have the suffix .run. We're only interested in CUDA itself, so we only need to extract the one that starts with cuda-linux. Do that like so (adjusting file/directory names as before):

cd cuda-10.0;
./cuda-linux.10.0.130-24817639.run -noprompt -prefix=$PWD/cuda;
rm *.run;
mv cuda/* .;
rmdir cuda;

If you run ./cuda-linux.10.0.130-24817639.run --help, it's actually somewhat deceptive - since there's a typo in the help text! I've corrected for this above though. Once done, this should leave the current working directory containing the CUDA libraries - that is, a subdirectory next to the original .run file:

+ /path/to/some_directory/
    + ./cuda_10.0.130_410.48_linux.run
    + cuda-10.0/
        + version.txt
        + bin/
        + doc/
        + extras/
        + ......

Now, it's just a case of fiddling with some environment variables and launching your program of choice. You can set the appropriate environment variables like this:

export PATH="/absolute/path/to/cuda-10.0/bin:${PATH}";
if [[ ! -z "${LD_LIBRARY_PATH}" ]]; then
    export LD_LIBRARY_PATH="/absolute/path/to/cuda-10.0/lib64:${LD_LIBRARY_PATH}";
else
    export LD_LIBRARY_PATH="/absolute/path/to/cuda-10.0/lib64";
fi

You could save this to a shell script (putting #!/usr/bin/env bash before it as the first line, and then running chmod +x path/to/script.sh), and then execute it in the context of the current shell for example like so:

source path/to/activate-cuda-10.0.sh

Many deep learning applications that use CUDA also use CuDNN, a library provided by Nvidia for accelerating deep learning applications. The archived versions of CuDNN can be found here:

https://developer.nvidia.com/rdp/cudnn-archive

When downloading (you need an Nvidia developer account, but thankfully this is free), pay attention to the version being requested in the error messages generated by your application. Also take care to pick the CuDNN build that matches the version of CUDA you're using. When you download, select the "cuDNN Library for Linux" option. This will give you a tarball, which contains a single directory cuda. Extract the contents of this directory over the top of the CUDA directory you created by following my instructions above, and it should work as intended. I used my graphical archive manager for this purpose.

## PyTorch and the GPU: A tale of graphics cards

Recently, I've been learning PyTorch - which is an artificial intelligence / deep learning framework in Python. While I'm not personally a huge fan of Python, it seems to be the only library of its kind out there at the moment (and Tensorflow.js has terrible documentation) - so it would seem that I'm stuck with it.

Anyway, as I've been trying to learn it I inevitably came to the bit where I need to learn how to take advantage of a GPU to accelerate the neural network training process. I've been implementing a few test networks to see how it performs (my latest one is a simple LSTM, loosely following this tutorial).

In PyTorch, this isn't actually done for you automatically. The basic building blocks of PyTorch are tensors (potentially multi-dimensional arrays that hold data). Each tensor is bound to a specific compute device - by default the CPU (in which case the data is stored in regular RAM). To do the calculations on a graphics card, you need to bind the data to the GPU in order to load the data into the GPU's own memory - so that the GPU can access it and do the calculation.

The same goes for any models you create - they have to be explicitly loaded onto the GPU in order to run the calculations in the right place. Thankfully, this is fairly trivial:

tensor = torch.rand(3, 4)
tensor = tensor.to(COMPUTE_DEVICE)

...where COMPUTE_DEVICE is the PyTorch device object you want to load the tensor onto. I found that this works quite well to determine the device that the data should be loaded onto:

COMPUTE_DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

Unfortunately, PyTorch (and all other AI frameworks out there) only support a technology called CUDA for GPU acceleration. This is a proprietary Nvidia technology - which means that you can only use Nvidia GPUs for accelerated deep learning. Since I don't actually own an Nvidia GPU (far too expensive, and in my current laptop I have an AMD Radeon R7 M445 - and I don't plan on spending large sums of money to replace a perfectly good laptop), I've been investigating hardware at my University that I can use for development purposes - since this is directly related to my PhD after all.

Initially, I've found a machine with an Nvidia GeForce GTX 650 in it. If you run torch.cuda.is_available(), it will tell you if CUDA is available or not:

print(torch.cuda.is_available()) # Prints True if CUDA is available

...but, as always, there's got to be a catch. Just because CUDA is available, doesn't mean to say that PyTorch can actually use it.

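To make the device binding described above a bit more concrete, here is a small sketch that moves both a toy model and a batch of data onto the same device. The helper choose_device_name and the tiny nn.Linear model are invented for illustration, and the PyTorch part only runs if PyTorch happens to be installed.

```python
import importlib.util


def choose_device_name(cuda_available):
    """Mirror the 'cuda if available, else cpu' choice as a plain string."""
    return "cuda" if cuda_available else "cpu"


if importlib.util.find_spec("torch") is not None:
    import torch
    from torch import nn

    device = torch.device(choose_device_name(torch.cuda.is_available()))

    # Both the model's parameters and the input data must live on the same device.
    model = nn.Linear(4, 2).to(device)
    batch = torch.rand(3, 4).to(device)

    output = model(batch)
    print(output.shape, output.device)
```

If the model and the tensor end up on different devices, PyTorch raises an error at the point of the calculation, so it's worth doing both moves in the same place.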
After a bunch of testing, it transpired that PyTorch only supports CUDA devices with a capability index greater than or equal to 3.5 - and the GTX 650 has a capability index of just 3.0. You can see where this is going. I found this webpage helpful - it lists all of Nvidia's GPUs and their CUDA capability indices.

You can also get PyTorch to tell you more about the CUDA device it has found:

def display_compute_device():
    """Displays information about the compute device that PyTorch is using."""
    log(f"Using device: {COMPUTE_DEVICE}", newline=False)
    if COMPUTE_DEVICE.type == 'cuda':
        print(" {0} [Memory: {1}GB allocated, {2}GB cached]".format(
            torch.cuda.get_device_name(0),
            round(torch.cuda.memory_allocated(0)/1024**3, 1),
            round(torch.cuda.memory_cached(0)/1024**3, 1)
        ))
    print()

If you execute the above function, it will tell you more about the compute device it has found. Note that you can actually make use of multiple compute devices at the same time - I just haven't done any research into that yet. Crucially, it will also generate a warning message if your CUDA device is too old.

To this end, I'll be doing some more investigating as to the resources that the Department of Computer Science has available for PhD students to use....

If anyone knows of an artificial intelligence framework that can take advantage of any GPU (e.g. via OpenCL, oneAPI, or other similar technologies), do get in touch. I'm very interested to explore other options.

## Easy AI with Microsoft.Text.Recognizers

I recently discovered that there's an XMPP client library (NuGet) for .NET that I overlooked a few months ago, and so I promptly investigated the building of a bot!

The actual bot itself needs some polishing before I post about it here, but in writing said bot I stumbled across a perfectly brilliant library - released by Microsoft of all companies - that can be used to automatically extract common data-types from a natural-language sentence.

While said library is the underpinnings of the Azure Bot Framework, it's actually free and open-source. To that end, I decided to experiment with it - and ended up writing this blog post.

Data types include (but are not limited to) dates and times (and ranges thereof), numbers, percentages, monetary amounts, email addresses, phone numbers, simple binary choices, and more! While it also lands you with a terrific number of DLL dependencies in your build output folder, the result is totally worth it! How about pulling a DateTime from this:

in 5 minutes

or this:

the first Monday of January

or even this:

next Monday at half past six

Pretty cool, right? You can even pull multiple things out of the same sentence. For example, from the following:

The host 1.2.3.4 has been down 3 times over the last month - the last of which was from 5pm and lasted 30 minutes

It can extract an IP address (1.2.3.4), a number (3), and a few dates and times (last month, 5pm, 30 minutes).

I've written a test program that shows it in action. Here's a demo of it working:

(Can't see the asciicast above? View it on asciinema.org)

The source code is, of course, available on my personal Git server: Demos/TextRecogniserDemo

If you can't check out the repo, here's the basic gist. First, install the Microsoft.Recognizers.Text package(s) for the types of data that you'd like to recognise. Then, to recognise a date or time, do this:

List<ModelResult> result = DateTimeRecognizer.RecognizeDateTime(nextLine, Culture.English);

The awkward bit is unwinding the ModelResult to get at the actual data. The resolved data is stored in the ModelResult.Resolution property, but that's a SortedDictionary<string, object>. The interesting key inside it is values, but depending on the data type you're recognising - that can be an array too!

The best way I've found to decipher the data types is to print the value of ModelResult.Resolution as a string to the console:

Console.WriteLine(result[0].Resolution.ToString());

The .NET runtime will helpfully convert this into something like this:

System.Collections.Generic.SortedDictionary`2[System.String,System.Object]

Very helpful. Then we can continue to drill down:

Console.WriteLine(result[0].Resolution["values"]);

This produces this:

System.Collections.Generic.List`1[System.Collections.Generic.Dictionary`2[System.String,System.String]]

Quite a mouthful, right? By cross-referencing this against the JSON (thanks, Newtonsoft.JSON!), we can figure out how to drill the rest of the way. I ended up writing myself a pair of little utility methods for dates and times:

public static DateTime RecogniseDateTime(string source, out string rawString)
{
    List<ModelResult> aiResults = DateTimeRecognizer.RecognizeDateTime(source, Culture.English);
    if (aiResults.Count == 0)
        throw new Exception("Error: Couldn't recognise any dates or times in that source string.");

    /* Example contents of the below dictionary:
        [0]: {[timex, 2018-11-11T06:15]}
        [1]: {[type, datetime]}
        [2]: {[value, 2018-11-11 06:15:00]}
    */

    rawString = aiResults[0].Text;
    Dictionary<string, string> aiResult = unwindResult(aiResults[0]);
    string type = aiResult["type"];
    if (!(new string[] { "datetime", "date", "time", "datetimerange", "daterange", "timerange" }).Contains(type))
        throw new Exception($"Error: An invalid type of {type} was encountered ('datetime' expected).");

    // Range types carry "start" and "end" keys instead of a single "value"
    string result = Regex.IsMatch(type, @"range$") ? aiResult["start"] : aiResult["value"];
    return DateTime.Parse(result);
}

private static Dictionary<string, string> unwindResult(ModelResult modelResult)
{
    // Take the first candidate resolution from the "values" list
    return (modelResult.Resolution["values"] as List<Dictionary<string, string>>)[0];
}

Of course, it depends on your use-case as to precisely how you unwind it, but the above should be a good starting point.
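To show the shape of that unwinding without needing the library at all, here is a small Python sketch that works on a hand-written dictionary modelled on the example contents shown in the comment above. It doesn't call the recogniser: the function names, the example dictionaries, and the "end" key of the range example are assumptions made purely for illustration.

```python
from datetime import datetime


def unwind_resolution(resolution):
    """Return the first candidate from a resolution's 'values' list."""
    values = resolution.get("values") or []
    if not values:
        raise ValueError("The resolution doesn't contain any values.")
    return values[0]


def resolved_datetime(resolution):
    """Pick 'start' for range types and 'value' otherwise, then parse it.

    Only date and datetime strings are handled here; a bare time such as
    06:15:00 would need separate treatment.
    """
    first = unwind_resolution(resolution)
    key = "start" if first["type"].endswith("range") else "value"
    return datetime.fromisoformat(first[key])


# Stand-in data shaped like the dictionary contents shown above.
single = {"values": [{"timex": "2018-11-11T06:15", "type": "datetime", "value": "2018-11-11 06:15:00"}]}
# Hypothetical range example: assumed to carry start and end instead of value.
ranged = {"values": [{"type": "daterange", "start": "2018-11-05", "end": "2018-11-12"}]}

print(resolved_datetime(single))
print(resolved_datetime(ranged))
```

Keeping the "which key do I read" decision in one small function makes it easier to extend later if you decide you want the end of a range too.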

Once I've polished the bot I've written a bit, I might post about it on here.

Found this interesting? Run into an issue? Got a neat use for it? Comment below!

## Semantic Nets in Prolog

Yesterday a few friends were puzzling over a few Prolog exam questions, and I thought I'd write up a post about what we learnt before I forget :-)

The first part of the question asked us to convert a paragraph of knowledge into a semantic net (isa / hasa) diagram. Here's the paragraph in question:

Charles and Wilbert are rats which are brown coloured European animals. Charles has a brown collar. Animals are defined as having DNA and being able to move. They include African animals, European animals and Australian animals. Skippy is a kangaroo; kangaroos are brown coloured Australian animals. Wallabies are dark brown Australian animals, Willy being one of them. They have a diet of eucalyptus leaves. Gnu are antelopes and come from Africa, and they have stripes, as are Nyala. Stella is a Gnu and Madge a Nyala.

This first part wasn't too tough. It doesn't quite fit in some places, but here's what I came up with:

(Generated using mermaid by Knut Sveidqvist)

The blue nodes are the isa nodes, while the green nodes are the hasa nodes. The next part asked us to convert the above into Prolog. Again, this wasn't particularly hard - it's just a bunch of isa/2's and hasa/2's:

isa(charles, rat).
isa(wilbert, rat).
isa(rat, european_animal).
isa(european_animal, animal).
isa(african_animal, animal).
isa(australian_animal, animal).
isa(skippy, kangaroo).
isa(kangaroo, australian_animal).
isa(wallaby, australian_animal).
isa(willy, wallaby).
isa(gnu, antelope).
isa(antelope, african_animal).
isa(stella, gnu).
isa(nyala, antelope).
isa(madge, nyala).
hasa(animal, dna).
hasa(animal, able_to_move).
hasa(charles, collar(brown)).
hasa(rat, colour(brown)).
hasa(kangaroo, colour(brown)).
hasa(wallaby, colour(dark_brown)).
hasa(wallaby, diet(eucalyptus_leaves)).
hasa(gnu, stripes).
hasa(nyala, stripes).

After converting the diagram into Prolog, we were then asked to write some Prolog that interacts with the above knowledge base. Here's the first challenge:

Define a predicate called appearance which behaves as follows:


appearance(wilbert,Colour).
Colour=dark_brown
true.
appearance(skippy,Colour).
Colour=brown
true.


Upon first sight, this looks rather complicated, but it's not actually as bad as it looks. Basically, it is asking for a predicate that, given the name of a thing, returns the colour of that thing. For example, wilbert would produce the answer brown, and wallaby would return dark_brown. The trick here is to get Prolog to recurse up the isa / hasa tree if it doesn't find the answer at the current node.

When thinking about recursion, a good idea is to consider the stopping condition first. In our case, we want it to stop when it finds a thing that has a colour. Here's that in Prolog:

appearance(Name, Colour) :-
    hasa(Name, colour(Colour)).

Now we've got a stopping condition in place, we can think about the recursion itself. If it doesn't find a colour at the current node, we want Prolog to follow the appropriate isa fact and travel to the next level up. We can do that like so:

appearance(Name, Colour) :-
    isa(Name, Thing),
    appearance(Thing, Colour).

That completes the first challenge. If you put the above together this is what you'll get:

appearance(Name, Colour) :-
    hasa(Name, colour(Colour)).
appearance(Name, Colour) :-
    isa(Name, Thing),
    appearance(Thing, Colour).
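For anyone more comfortable outside Prolog, the same walk-up-the-tree idea can be sketched in Python. This is only an illustration and not part of the exam answer: the ISA and HASA_COLOUR dictionaries and the appearance function are names made up here, with the facts re-typed from the paragraph of knowledge.

```python
from typing import Optional

# isa facts as child -> parent (each thing has a single parent in this net).
ISA = {
    "charles": "rat", "wilbert": "rat", "rat": "european_animal",
    "skippy": "kangaroo", "kangaroo": "australian_animal",
    "willy": "wallaby", "wallaby": "australian_animal",
    "european_animal": "animal", "australian_animal": "animal",
}

# hasa facts, restricted to colours.
HASA_COLOUR = {"rat": "brown", "kangaroo": "brown", "wallaby": "dark_brown"}


def appearance(name: str) -> Optional[str]:
    """Return the colour of a thing, climbing the isa chain if needed."""
    # Stopping condition: this node has a colour of its own.
    if name in HASA_COLOUR:
        return HASA_COLOUR[name]
    # Otherwise follow the isa link one level up and try again.
    parent = ISA.get(name)
    if parent is None:
        return None  # Ran out of tree without finding a colour
    return appearance(parent)


print(appearance("wilbert"))
print(appearance("willy"))
```

One difference worth noting: Prolog will backtrack and offer further solutions if several colours are reachable, whereas this sketch stops at the first one it finds.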

The second challenge, however, was much harder:

Write a predicate that takes two arguments and is true if both animals live on the same continent. Thus

?- same_continent(skippy,willy).

is true, whilst

?- same_continent(stella,skippy).

is not.

The problem with this challenge is that unlike the first challenge, there isn't any way (that I could think of anyway) to determine the continent that an animal comes from. I managed to hack around this by always going up 2 levels before comparing the things to see if they are the same:

same_continent(NameA, NameB) :-
    isa(NameA, AnimalTypeA),
    isa(AnimalTypeA, ContA),

    isa(NameB, AnimalTypeB),
    isa(AnimalTypeB, ContB),

    ContA = ContB.

For example, if wilbert and charles were plugged in, both ContA and ContB would be set to european_animal, and so Prolog would return true. Prolog would tell us that skippy and wilbert are not of the same continent because ContA and ContB would be set to different values (australian_animal and european_animal).
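The fixed two-level hop works for the animals tested here, but it depends on every animal sitting exactly two steps below its continent. As a possible alternative (not the approach taken above, and written in Python rather than Prolog), one could keep climbing until reaching the node whose parent is animal, and treat that node as the continent. The ISA dictionary and function names below are made up for this sketch, and the nyala and madge entries are assumptions drawn from the paragraph of knowledge.

```python
# isa facts as child -> parent, re-typed from the paragraph of knowledge.
ISA = {
    "charles": "rat", "wilbert": "rat", "rat": "european_animal",
    "skippy": "kangaroo", "kangaroo": "australian_animal",
    "willy": "wallaby", "wallaby": "australian_animal",
    "stella": "gnu", "gnu": "antelope",
    "madge": "nyala", "nyala": "antelope",
    "antelope": "african_animal",
    "european_animal": "animal", "african_animal": "animal",
    "australian_animal": "animal",
}


def continent_of(name):
    """Climb the isa chain and return the node sitting directly below 'animal'."""
    current = name
    while current in ISA:
        parent = ISA[current]
        if parent == "animal":
            return current
        current = parent
    return None  # Not connected to 'animal' at all


def same_continent(name_a, name_b):
    """True if both things share the same node directly below 'animal'."""
    continent = continent_of(name_a)
    return continent is not None and continent == continent_of(name_b)


print(same_continent("skippy", "willy"))
print(same_continent("stella", "skippy"))
```

Because the climb is driven by the data rather than a fixed depth, stella (three levels below african_animal) and an animal only two levels below its continent are handled by the same code.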

Art by Mythdael