Mailing List Articles Atom Feed Comments Atom Feed Twitter Reddit Facebook

Tag Cloud

3d 3d printing account algorithms android announcement architecture archives arduino artificial intelligence artix assembly async audio automation backups bash batch blender blog bookmarklet booting bug hunting c sharp c++ challenge chrome os cluster code codepen coding conundrums coding conundrums evolved command line compilers compiling compression containerisation css dailyprogrammer data analysis debugging demystification distributed computing dns docker documentation downtime electronics email embedded systems encryption es6 features ethics event experiment external first impressions freeside future game github github gist gitlab graphics hardware hardware meetup holiday holidays html html5 html5 canvas infrastructure interfaces internet interoperability io.js jabber jam javascript js bin labs learning library linux lora low level lua maintenance manjaro minetest network networking nibriboard node.js open source operating systems optimisation own your code pepperminty wiki performance phd photos php pixelbot portable privacy problem solving programming problems project projects prolog protocol protocols pseudo 3d python reddit redis reference releases rendering resource review rust searching secrets security series list server software sorting source code control statistics storage svg systemquery talks technical terminal textures thoughts three thing game three.js tool tutorial tutorials twitter ubuntu university update updates upgrade version control virtual reality virtualisation visual web website windows windows 10 worldeditadditions xmpp xslt

Visualising Tensorflow model summaries

It's no secret that my PhD is based in machine learning / AI (my specific title is "Using Big Data and AI to dynamically map flood risk"). Recently a problem I have been plagued with is quickly understanding the architecture of new (and old) models I've come across at a high level. I could read the paper a model comes from in detail (and I do this regularly), but it's much less complicated and much easier to understand if I can visualise it in a flowchart.

To remedy this, I've written a neat little tool that does just this. When you're training a new AI model, one of the things that it's common to print out is the summary of the model (I make it a habit to always print this out for later inspection, comparison, and debugging), like this:

model = setup_model()

This might print something like this:

Model: "sequential"
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, 500, 16)           160000    

lstm (LSTM)                  (None, 32)                6272      

dense (Dense)                (None, 1)                 33        
Total params: 166,305
Trainable params: 166,305
Non-trainable params: 0

(Source: ChatGPT)

This is just a simple model, but it is common for larger ones to have hundreds of layers, like this one I'm currently playing with as part of my research:

(Can't see the above? Try a direct link.)

Woah, that's some model! It must be really complicated. How are we supposed to make sense of it?

If you look closely, you'll notice that it has a Connected to column, as it's not a linear tf.keras.Sequential() model. We can use that to plot a flowchart!

This is what the tool I've written generates, using a graphing library called nomnoml. It parses the Tensorflow summary, and then compiles it into a flowchart. Here's a sample:

It's a purely web-based tool - no data leaves your client. Try it out for yourself here:

For the curious, the source code for this tool is available here:

NAS Series List

Somehow, despite posting about my NAS back in 2021 I have yet to post a proper series list post about it! I'm rectifying that now with this quick post.

I wrote this series of 4 posts back when I first built my new NAS box.

Here's the full list of posts in the main NAS series:

Additionally, as a bonus, I also later in 2021 I wrote a pair of posts back how I was backing up my NAS to a backup NAS. Here they are:

How (not) to recover a consul cluster

Hello again! I'm still getting used to a new part-time position at University which I'm not quite ready to talk about yet, but in the mean time please bear with me as I shuffle my schedule around.

As I've explained previously on here, I have a consul cluster (superglue service discovery!) that forms the backbone of my infrastructure at home. Recently, I had a small powercut that knocked everything offline, and as the recovery process was quite interesting I thought I'd blog about it here.

The issue at had happened at about 5pm, but I only discovered it was a problem until a few horus later when I got home. Essentially, a small powercut knocked everything offline. While my NAS rebooted automatically afterwards, my collection of Raspberry Pis weren't so lucky. I can only suspect that they were caught in some transient state or something. None of them responded when I pinged them, and later inspection of the logs on my collectd instance revealed that they were essentially non-functional until after they were rebooted manually.

A side effect of this was that my Consul (and, by extension, my Nomad cluster) cluster was knocked offline.

Anyway, at first I only rebooted the controller host (that has both a Consul and Nomad server running on it, but does not accept and run jobs). This rebooted just fine and came back online, so I then rebooted my monitoring box (that also runs continuous integration), which also came back online.

Due to the significantly awkward physical location I keep my cluster in with the rest of the Pis, I decided to flip the power switch on the extension to restart all my hosts at the same time.

While this worked..... it also caused my cluster controller node to reboot, which caused its raft epoch number to increment by 1... which broke the quorum (agreement) of my cluster, and required manual intervention to resolve.

Raft quorum

To understand the specific issue here, we need to look at the Raft consensus algorithm. Raft is, as the name suggests, a consensus algorithm. Such an algorithm is useful when you have a cluster of servers that need to work together in a redundant fault-tolerant fashion on some common task, such as in our case Consul (service discovery) and Nomad (task scheduling).

The purpose of a raft server is to maintain agreement amongst all nodes in a cluster as to the global state of an application. It does this using a distributed log that it replicates through a fancy but surprisingly simple algorithm.

At the core of this algorithm is the concept of a leader. The cluster leader is responsible for managing and committing updates to the global state, as well as sending out the global state to everyone else in the cluster. In the case of Consul, the Consul servers are the cluster (the clients simply connect back to whichever servers are available) - and I have 3 of them, since Raft requires an odd number of nodes.

When the cluster first starts up or the leader develops a fault (e.g. someone sets off a fork bomb on it just for giggles), an election occurs to decide on a new leader. The election term number (or epoch number) is incremented by one, and everyone votes on who the new leader should be. The node with the most votes becomes the new leader, and quorum (agreement) is achieved across the entire cluster.

Consul and Raft

In the case of Consul, everyone must cast a vote for the vote to be considered valid, otherwise the vote is considered invalid and the election process must begin again. Crucially, the election term number must also be the same across everyone voting.

In my case, because I started my cluster controller and then rebooted it before it had a chance to achieve quorum, it incremented it's election term number and additional time than the rest of the cluster did, which caused the cluster to fail to reach quorum as the other 2 nodes in the Consul server cluster consider the controller node's vote to be invalid, yet they still demanded that all servers vote to elect a new leader.

The practical effect of this was tha because the Consul cluster failed to agree on who the leader should be, the Nomad cluster (which hangs off the Consul cluster, using it to find each other) also failed to start and subsequently reach quorum, which knocked all my jobs offline.

The solution

Thankfully, the Hashicorp Consul documentation for this specific issue is fabulous:

To summarise:

  1. Boot the cluster as normal if it isn't booted already
  2. Stop the failed node
  3. Create a special config file (raft/peers.json) that will cause the failed node to drop it's state and accept the state of the incomplete cluster, allowing it to rejoin and the cluster gain collective quorum once more.

The documentation to perform this recovery protocol is quite clear. While there is an option to recover a failed node if you still have a working cluster with a leader, in my case I didn't so I had to use the alternate route.


I've talked briefly about an interesting issue that caused my Consul cluster to break quorum, which inadvertently brought my entire infrastructure down until resolved the issue.

While Consul is normally really quite resilient, you can break it if you aren't careful. Having an understanding of the underlying consensus algorithm Raft is very helpful to diagnosing and resolving issues, though the error messages and documentation I looked through were generally clear and helpful.

Considerations on monitoring infrastructure

I like Raspberry Pis. I like them so much that by my count I have at least 8 in operation at the time of typing performing various functions for me, including a cluster for running various services.

Having Raspberry Pis and running services on servers is great, but once you have some infrastructure setup hosting something you care about your thoughts naturally turn to mechanisms by which you can ensure that such infrastructure continues to run without incident, and if problems do occur they can be diagnosed and fixed efficiently.

Such is the thought that is always on my mind when managing my own infrastructure, sprawls across multiple physical locations. To this end, I thought I'd blog what my monitoring system looks like - what it's strengths are, and what it could do better.

A note before we begin: I continue to have a long-term commitment to posting on this blog - I have just started a part-time position alongside my PhD due to the end of my primary research period, which has been taking up a lot of my mental energy. Things should get slowly back to normal soon-ish.

Keep in mind as you read this that my situation may be different to your own. For example, monitoring a network primary consisting of Raspberry Pis demands a very different approach than an enterprise setup (if you're looking for a monitoring solution for a bunch of big powerful servers, I've heard the TICK stack is a good place to start).

Monitoring takes many forms and purposes. Broadly speaking, I split the monitoring I have on my infrastructure into the following categories:

  1. Logs (see my earlier post on Centralising logs with rsyslog)
  2. System resources (e.g. CPU/RAM/disk/etc usage) - I use collectd for this
  3. Service health - I use Consul for my cluster, and Uptime Robot for this website.
  4. Server health (e.g. whether a server is down or not, hanging due to a bad mount, etc.)

I've found that as there are multiple categories of things that need monitoring, there isn't a single one-size-fits-all solution to the problem, so different tools are needed to monitor different things.

Logs - centralised rsyslog

At the moment, monitoring logs is a solved problem for me. I've talked about my setup previously, in which I have a centralised rsyslog server which receives and stores all logs from my entire infrastructure (barring a few select boxes I need to enrol in this system). Storing logs nets me 2 things:

  1. The ability to reference them (e.g. with lnav) later in the event of an issue for diagnostic purposes
  2. The ability to inspect the logs during routine maintenance for any anomalies, issues, or errors that might become problematic later if left unattended

System information - collectd

Similarly, storing information about system resource usage - such as CPU load or disk usage for instance - is more useful than you'd think for spotting and pinpointing issues with one's infrastructure - be it a single server or an entire fleet. In my case, this also includes monitoring network latency (useful should my ISP encounter issues, as then I can identify if it's a me or a them problem) and HTTP response times.

For this, I use collectd, backed by rrd (round-robin database) files. These are fixed-size files that contain ring buffers that it iteratively writes over, allowing efficient storage of up to 1 year's worth of history.

To visualise this in the browser, I use Collectd Graph Panel, which is unfortunately pretty much abandonware (I haven't found anything better).

To start with the strengths of this system, it's very computationally efficient. I have tried previously to setup a TICK (Telegraf, InfluxDB, Chronograf, and Kapacitor) stack on a Raspberry Pi, but it was way too heavy - especially considering the Raspberry Pi my monitoring system runs on is also my continuous integration server. Collectd, on the other hand, runs quietly in the background, barely using any resources at all.

Another strength is that it's easy and simple. You throw a config file at it (which could be easily standardised across an entire fleet of servers), and collectd will dutifully send encrypted system metrics to a given destination for you with minimal fuss. Meanwhile, the browser-based dashboard I use automatically plots graphs and displays them for you without any tedious creation of a custom dashboard.

Having a system monitor things is good, but having it notify you in the event of an anomaly is even better. While collectd does have the ability to generate and send notifications, its capacity to do this is unfortunately rather limited.

Another limitation of collectd is that accessing and processing the stored system metrics data is not a trivial process, since it's stored in rrd databases, the parsing of which is surprisingly difficult due to a lack of readily available libraries to do this. This makes it difficult to integrate it with other systems, such as n8n for example, which I have recently setup to replace some functions of IFTTT to automatically repost my blog posts here to Reddit and Discord.

Collectd can write to multiple sources however (e.g. MQTT), so I might look into this as an option to connect it to some other program to deliver more flexible notifications about issues.

Service health

Service health is what most people might think of when I initially said that this blog post would be about monitoring. In many ways, it's one of the most important things to monitor - especially if other people rely on infrastructure which is managed by you.

Currently, I achieve this in 2 ways. Firstly, for services running on the server that hosts this website I have a free Uptime Robot account which monitors my server and website. It costs me nothing, and I get monitoring of my server from a completely separate location. In the event my server or the services thereon that it monitors are down, I will get an email telling me as such - and another email once it goes back up again.

Secondly, for services running on my cluster I use Consul's inbuilt service monitoring functionality, though I don't yet have automated emails to notify me of failures (something I need to investigate a solution for).

The monitoring system you choose here depends on your situation, but I strongly recommend having at least some form of external monitoring of whether your target boxes go down that can notify you of this. If your monitoring is hosted on the box that goes down, it's not really of much use...!

Monitoring service health more robustly and notifying myself about issues is currently on my todo list.

Server health

Server health ties into service health, and perhaps also system information too. Knowing which servers are up and which ones are down is important - not least because of the services running thereon.

The tricky part of this is that if a server goes down, it could be because of any one of a number of issues - ranging from a simple software/hardware failure, all the way up to an entire-building failure (e.g. a powercut) or a natural disaster. With this in mind, it's important to plan your monitoring carefully such that you still get notified in the event of a failure.


In this post, I've talked a bit about my monitoring infrastructure, and things to consider more generally when planning monitoring for new or existing infrastructure.

It's never too late to iteratively improve your infrastructure monitoring system - whether it be enrolling that box in the corner that never got added to the system, or implementing a totally kind of monitoring - e.g. centralised logging, or in my case I need to work on more notifications for when things go wrong.

On a related note, what do your backups look like right now? Are they automated? Do they cover all your important data? Could you restore them quickly and efficiently?

If you've found this interesting, please leave a comment below!

PhD Update 15: Finding what works when

Hey there! Sorry this post is a bit late again: I've been unwell.

In the last post, I revisited my rainfall radar model, and shared how I had switched over to using .tfrecord files to store my data, and the speed boost to training I got from doing that. I also took an initial look at applying contrastive learning to my rainfall radar problem. Finally, I looks a bit into dimensionality reduction algorithms - in short: use UMAP (paper).

Before we continue, here's the traditional list of previous posts:

In addition, there's also an additional intermediate post about my PhD entitled "A snapshot into my PhD: Rainfall radar model debugging", which I posted since the last PhD update blog post. If you're interested in details of the process I go through stumbling around in the dark doing research on my PhD, do give it a read:

Since last time, I have been noodling around with the rainfall radar dataset and image segmentation models to see if I can find something that works. The results are mixed, but the reasons for this are somewhat subtle, so I'll explain those below.

Social media journal article

Last time I mentioned that I had written up my work with social media sentiment analysis models, but I had yet to finalise it to send to be published. This process is now completed, and it's currently under review! It's unlikely I'll have anything more to share on this for a good 3-6 months, but know that it's a process happening in the background. The journal I've submitted it to is Elsevier's Computers and Geosciences, though of course since it's under review I don't yet know if they will accept it or not.

Image segmentation, sort of

It doesn't feel like I've done much since last time, but looking back I've done a lot, so I'll summarise what I've been doing here. Essentially, the bulk of my work has been into different image segmentation models and strategies to see what works and what doesn't.

The specific difficulty here is that while I'm modelling my task of going from rainfall radar data (plus heightmap) to water depth in 2 dimensions as an image segmentation task, it's not exactly image segmentation, in that the output is significantly different in nature to the input I'm feeding the model, which complicates matters as this significantly increases the difficulty of the learning task I'm attempting to get the model to work on.

As a consequence of this, it is not obvious which model architecture to try first, or which ones will perform well or not, so I've been trying a variety of different approaches to see what works and what doesn't. My rough plan of model architectures to try is as follows:

  1. Split: contrastive pretraining
  2. Mono / autoencoder: encoder [ConvNeXt] → decoder [same as #1]
  3. Different loss functions:
    • Categorical Crossentropy
    • Binary Crossentropy
    • Dice
  4. DeepLabV3+ (almost finished)
  5. Encoder-only [ConvNeXt/ResNet/maybe Swin Transformer] (pending)

Out of all of these approaches, I'm almost done with DeepLabV3+ (#4), and #5 (encoder-only) is the backup plan.

My initial hypothesis was that a more connected image segmentation model such as the pre-existing PSPNet, DeepLabV3+, etc would not be a suitable choice for this task, since regular image segmentation models such as these place emphasis on the output being proportional to the input. Hence, I theorised that an autoencoder-style model would be the best place to start - especially so since I've fiddled around with an autoencoder before, albeit for a trivial problem.

However, I discovered with approaches #1 and #2 that autoencoder-style models with this task have a tendency to get 'lost amongst the weeds', and ignore the minority class:

To remedy this, I attempted to use a different loss function called Dice, but this did not help the situation (see the intermediary A snapshot into my PhD post for details).

I ended up cutting the contrastive pretraining temporarily (#1), as it added additional complexity to the model that made it difficult to debug. In the future, when the model actually works, I intend to revisit the idea of contrastive pretraining to see if I can boost the performance of the working model at all.

If there's one thing that doing a PhD teaches you, it's to keep on going in the face of failure. I should know: my PhD has been full of failed attempts. I saying I found online (I forget who said it, unfortunately) definitely rings true here: "The difference between the novice and the master is that the master has failed more times than the novice has tried"

In the spirit of this, this brings us to the next step of proving (or disproving) that this task is possible, which is to try a pre-existing image segmentation model to see what happens. After some research (against, see the intermediary A snapshot into my PhD post for details), I discovered that DeepLabV3+ is the current state of the art for image segmentation.

After verifying that DeepLabV3+ actually works with it's intended dataset, I've now just finished adapting it to take my rainfall radar (plus heightmap) dataset as an input instead. It's currently training as I write this post, so I'll definitely have some results for next time.

The plan from here depends on the performance of DeepLabV3+. Should it work, then I'm going to first post an excited social media post, and then secondly try adding an attention layer to further increase performance (if I have time). CBAM will probably be my choice of attention mechanism here - inspired by this paper.

If DeepLabV3+ doesn't work, then I'm going to go with my backup plan (#5), and quickly try training a classification-style model that takes a given area around a central point, and predicts water / no water for the pixel in the centre. Ideally, I would train this with a large batch size, as this will significantly boost the speed at which the model can make predictions after training. In terms of the image encoder, I'll probably use ConvNeXt and at least one other image encoder for comparison - probably ResNet - just in case there's a bug in the ConvNeXt implementation I have (I'm completely paranoid haha).

Ideally I want to get a basic grasp on a model that works soon though, and leave too much of the noodling around with improving performance until later, as if at all possible it would be very cool to attend IJCAI 2023. At this point it feels unlikely I'll be able to scrape something together for submitting to the main conference (the deadline for full papers is 18th January 2023, abstracts by 11th January 2023), but submitting to an IJCAI 2023 workshop is definitely achievable I think - they usually open later on.

Long-term plans

Looking into the future, my formal (and funded) PhD research period is coming to an end this month, so I will be taking on some half time work alongside my PhD - I may publish some details of what this entails at a later time. This does not mean that I will be stopping my PhD, just doing some related (and paid) work on the side as I finish up.

Hopefully, in 6 months time I will have cracked this rainfall radar model and be a good way into writing my thesis.


Although I've ended up doing things a bit back-to-front on this rainfall radar model (doing DeepLabV3+ first would have been a bright idea), I've been trying a selection of different model architectures and image segmentation models with my rainfall radar (plus heightmap) to water depth problem to see which ones work and which ones don't. While I'm still in the process of testing these different approaches, it will not take long for me to finish this process.

Between now and the next post in this series in 2 months time, I plan to finish trying DeepLabV3+, and then try an encoder-only (image classification) style model should that not work out. I'm also going to pay particularly close attention to the order of my dimensions and how I crop them, as I found yesterday that I mixed up the order of the width and height dimensions, feeding one of the models I've tested data in the form [batch_size, width, height, channels] instead of [batch_size, height, width, channels] as you're supposed to.

If I can possibly manage it I'm going to begin the process of writing up my thesis by writing a paper for IJCAI 2023, because it would be very cool to get the chance to go to a real conference in person for the first time.

Finalyl, if anyone knows of any good resources on considerations in the design of image segmentation heads for AI models, I'm very interested. Please do leave comment below.

A snapshot into my PhD: Rainfall radar model debugging

Hello again!

The weather is cold over here right now, and it's also been a while since I posted about my PhD in some detail, so I thought while I get my thoughts in order for a new PhD update blog post I'd give you a snapshot into what I've been doing in the last 2 weeks.

If you're not interested in nitty gritty details, I'll be posting a higher-level summary soon in my next PhD update blog post.

For context, since wrapping up (for now, more on this in the PhD update blog post) the social media side of my PhD, I've returned to the rainfall radar half of my PhD and implementing and debugging several AI models to predict water depth in real time. If you're thinking of doing a PhD yourself, this is in no way representative of what a PhD is like! Each PHD is different - mine just happens to include lots of banging my head against a wall debugging.

To start with, recently I've found and fixed a nasty bug in the thresholding function, which defaulted to a value of 1.5 instead of 0.1. My data is stored in .tfrecord files with pairs of rainfall radar and water depth 'images'. When the model reads these in, it will 'threshold' the water depth: for each pixel setting a value of 0 for pixels with water depth lower than a given threshold, and 1 for pixels above.

The bug in question manifested itself as an accuracy of 99%/100%, which is extremely unlikely given the nature of the task I'm asking it to predict. After some extensive debugging (including implementing a custom loss function that wrapped several different other loss functions, though not all at the same time), I found that the default value for the threshold was 1.5 (metres) instead of what it should have been - 0.1 (again, metres).

After fixing this, the accuracy lowered to 83% - the proportion of pixels in the input that were not water.

The model in question predicts water depth in 2D, taking in rainfall radar data (also in 2D) as an input. It uses ConvNeXt as an encoder, and an inverted ConvNeXt as the decoder. For the truly curious, the structure of this model as of the end of this section of the post can be found in the summary.txt file here.

Although I'd fixed the bug, I still had a long way to go. An accuracy of 83% is, in my case, no better than random guessing..... unfortunately completely ignoring the minority class.

In an attempt to get it to stop ignoring the minority class, I tried (in no particular order):

  • Increasing the learning rate to 0.1
  • Summing the output of the loss function instead of doing tf.math.reduce_sum(loss_output) / batch_size, as is the default
  • Multiplying the rainfall radar values by 100 (standard deviation of the rainfall radar data is in the order fo magnitude of 0.01, mean 0.0035892173182219267, min 0, max 1)
  • Adding an extra input channel with the heightmap (cross-entropy, loss: 0.583, training accuracy: 0.719)
  • Using the dice loss function instead of one-hot categorical cross-entropy
  • Removing the activation function (GeLU in my case) from the last few layers of the model (helpful, but I tried this with dice and not cross entropy loss, and switching back would be a bit of a pain)

Unfortunately, none of these things fixed the underlying issue of the model not learning anything.

Dice loss was an interesting case - I have some Cool Graphs to show on this:

Here, I compare removing the activation function (GeLU) from the last few layers (ref) with not removing the activation function from the last few layers. Clearly, removing it helps significantly, as the loss actually has a tendency to go in a downward direction instead of rocketing sky high.

This shows the accuracy and validation accuracy for the model without the activation function the last few layers. Unfortunately, the Dice loss function has some way to go before it can compete with cross-entropy. I speculate that while dice is cool, it isn't as useful on it's own in this scenario.

It would be cool to compare having no activation function in the last few layers and using cross-entropy loss to my previous attempts, but I'm unsure if I'll have time to noodle around with that.

In terms of where I got the idea to use the dice loss function from, it's from this GitHub repo and its associated paper:

It has a nice summary of loss functions for image segmentation and their uses / effects. If/when DeepLabV3+ actually works (see below) and I have some time, I might return to this to see if I can extract a few more percentage points of accuracy from whatever model I end up with.


Simultaneously with the above, I've been reading into existing image segmentation models. Up until now, my hypothesis has been that a model well connected with skip connections, such as U-Net, would not be ideal in this situation, as the input (rainfall radar) is so drastically different from the output (water depth) it would not be ideal to have a model with skip connections, as they encourage the output to be more similar to the input, which is not really what I want.

Now, however, I am (finally) going to test this (long running) hypothesis to see if it's really true. To do this, I needed to find the existing state-of-the-art image segmentation model. To summary long hours of reading, I found the following models:

  • SegNet (bad)
  • FCN (bog standard, also bad, maybe this paper)
  • U-Net (heard of this before)
  • PSPNet (like a pyramid structure, was state of the art but got beaten recently)
  • DeepLabV3 (PSPNet but not quite as good)
  • DeepLabV3+ (terrible name to search for but the current state of the art, beats PSPNet)

To this end, I've found myself a DeepLabV3+ implementation on (the code is terrible and so full of spaghetti I could eat it for breakfast and still have some left over) and I've tested it with the provided dataset, which seems to work fine:

....though there seems to be a bug in the graph plotting code, in that it doesn't clear the last line plotted.

Not sure if I can share the actual segmentations that it produces, but I can say that while they are a bit rough around the edges, it seems to work fine.

The quality of the segmentation is somewhat lacking given the training data only consisted of ~1k images and it was only trained for ~25 epochs. It has ~11 million parameters or so. I'm confident that more epochs and more data would improve things, which is good enough for me so my next immediate task will be to push my own data though it and see what happens.

I hypothesise that models like DeepLabV3+ etc also bring a crucial benefit to the table: training stability. Given the model's more interconnectedness with skip connections etc, backpropagation needs to travel overall less far to cover the entire model and update the weights.

If you're interested, a full summary of this DeepLabV3+ model can be found here:

If it does work, I've implemented recently a cool attention mechanism called Convolutional Block Attention Module (CBAM), which looks seriously cool. I'd like to try adding it to the DeepLabV3+ model to see if it increases the accuracy of the output.

Finally, a backup plan is in order in case it doesn't work. My plan is to convolve over the input rainfall radar data and make a prediction for a single water depth pixel at a time, using ConvNeXt as an image encoder backbone (though I may do tests with other backbones such as its older cousin ResNet simultaneously just in case, see also my post on image encoders), keeping the current structure of 7 channels rainfall radar + 1 channel heightmap.

While this wouldn't be ideal (given you'd need to push though multiple batches just to get a single 2D prediction), the model to make such predictions would be simpler and more likely to work right off the bat.


I've talked a bunch about my process and thoughts on debugging my rainfall radar to water depth model and trying to get it to work. Taking a single approach at a time to problems like this isn't usually the best idea, so I'm also trying something completely new in DeepLabV3+ to see if it will work.

I also have a backup plan in a more traditional image encoder-style model that will predict a single pixel at a time. As I mentioned at the beginning of this blog post, every PhD is different, so this is not representative of what you'd be doing on yours if you decide to do one / are doing one / have done one. If you are thinking of doing a PhD, please do get in touch if you're interested in hearing more about my experiences doing one and what you could expect.

I'm on Mastodon/Fediverse!

An ocean at sunset, vector done by me in Inkscape.

Hello there once again!

This is just a quick post to let you know that I can now be found on the fediverse! My account name is on a public Mastodon instance for now, but it may change in the future (I'll update this post if/when I change it).

At the moment, posts are copied from Twitter automatically by this neat cool, which means that automated tweets about new blog posts and Pepperminty Wiki releases, and other tweets I make will also be copied over to my fediverse account.

In the future, I would like to use my new n8n instance that's now running on my cluster to automate blog post notifications instead.

While I'm on the topic, I also now have a Discord server you can join, which also has an announcements channel where blog posts are posts by the above n8n instance I have setup. It also has a channel or two for chatting, which I'll expand and change as necessary. You can join it with this link:

Finally, as more of a maintenance thing, the automated posts to the subreddit have also been switched over to n8n from their previous home in IFTTT. The (eventual) plan is to switch everything over to my self-hosted n8n instance.

If there are any other social media networks you'd like me to post blog post updates etc to that are not currently listed on my homepage, please leave a comment below and I'll take a look to see how feasible it is.

In terms of new blog posts coming, I have on memtest86+ that's almost finished, but I just need to disentangle memtest86+ and figure out where to download a stable version that supports UEFI from. It's also about time I write another PhD update blog post. And, of course, there are also the random posts that spring out of what I'm currently working on, which recently has been about my PhD AI development (hit a really nasty snag recently which was fun to solve and debug; if I can adapt it into a blog post I'll post it here).

If there's anything you'd like me to blog about, please let me know in a comment below.


AI encoders demystified

When beginning the process of implementing a new deep learning / AI model, alongside deciding on the input and outputs of the model and converting your data to tfrecord files (file extension: .tfrecord.gz; you really should, the performance boost is incredible) is designing the actual architecture of the model itself. It might be tempting to shove a bunch of CNN layers at the problem to make it go away, but in this post I hope to convince you to think again.

As with any field, extensive research has already been conducted, and when it comes to AI that means that the effective encoding of various different types of data with deep learning models has already been carefully studied.

To my understanding, there are 2 broad categories of data that you're likely to encounter:

  1. Images: photos, 2D maps of sensor readings, basically anything that could be plotted as an image
  2. Sequenced data: text, sensor readings with only a temporal and no spatial dimension, audio, etc

Sometimes you'll encounter a combination of these two. That's beyond the scope of this blog post, but my general advice is:

  • If it's 3D, consider it a 2D image with multiple channels to save VRAM
  • Go look up specialised model architectures for your problem and how well they work/didn't work
  • Use a Convolutional variant of a sequenced model or something
  • Pick one of these encoders and modify it slightly to fix your use case

In this blog post, I'm going to give a quick overview of the various state of the art encoders for the 2 categories of data I've found on my travels so far. Hopefully, this should give you a good starting point for building your own model. By picking an encoder from this list as a starting point for your model and reframing your problem just a bit to fit, you should find that you have not only a much simpler problem/solution, but also a more effective one too.

I'll provide links to the original papers for each model, but also links to any materials I've found that I found helpful in understanding how they work. This can be useful if you find yourself in the awkward situation of needing to implement them yourself, but it's also generally good to have an understanding of the encoder you ultimately pick.

I've read a number of papers, but there's always the possibility that I've missed something. If you think this is the case, please comment below. This is especially likely if you are dealing with audio data, as I haven't looked into handling that too much yet.

Finally, also out of scope of this blog post are the various ways of framing a problem for an AI to understand it better. If this interests you, please comment below and I'll write a separate post on that.

Images / spatial data

ConvNeXt: Image encoders are a shining example of how a simple stack of CNNs can be iteratively improved through extensive testing, and in no encoder is this more apparent than in ConvNeXt.

In 1 sentence, a ConvNeXt model can be summarised as "a ResNet but with the features of a transformer". Researchers took a bog-standard ResNet, and then iteratively tested and improved it by adding various features that you'd normally find in a transformer, which results in the model performant (*read: had the highest accuracy) encoder when compared to the other 2 on this list.

ConvNeXt is proof that CNN-based models can be significantly improved by incorporating more than just a simple stack of CNN layers.

Vision Transformer: Born from when someone asked what should happen if they tried to put an image into a normal transformer (see below), Vision Transformers are a variant of a normal transformer that handles images and other spatial data instead of sequenced data (e.g. sentences).

The current state of the art is the Swin Transformer to the best of my knowledge. Note that ConvNeXt outperforms Vision Transformers, but the specifics of course depend on your task.

ResNet: Extremely popular, you may have heard of the humble ResNet before. It was originally made a thing when someone wanted to know what would happen if they took the number of layers in a CNN-based model to the extreme. It has a 'residual connection' between blocks of layers, which avoids the 'vanishing gradient problem' which in short is when your model is too deep, and when it backpropagates the error the gradient it adjusts the weight by becomes so small that it has hardly any effect.

Since this model was invented, skip connections (or 'residual connections', as they are sometimes known) have become a regular feature in all sorts of models - especially those deeper than just a couple of layers.

Despite this significant advancement though, I recommend using a ConvNeXt encoder instead of images - it's better than and the architecture more well tuned than a bog standard ResNet.

Sequenced Data

Transformer: The big one that is quite possibly the most famous encoder-decoder ever invented. The Transformer replaces the LSTM (see below) for handling sequenced data. It has 2 key advantages over an LSTM:

  • It's more easily paralleliseable, which is great for e.g. training on GPUs
  • The attention mechanism it implements revolutionises the performance of like every network ever

The impact of the Transformer on AI cannot be overstated. In short: If you think you want an LSTM, use a transformer instead.

  • Paper: Attention is all you need
  • Explanation: The Illustrated Transformer
  • Implementation: Many around, but I have yet to find a simple enough one, so I implemented it myself from scratch. While I've tested the encoder and I know it works, I have yet to fully test the decoder. Once I have done this, I will write a separate blog post about it and add the link here.

LSTM: Short for Long Short-Term Memory, LSTMs were invented in 1997 to solve the vanishing gradient problem, in which gradients when backpropagating back through recurrent models that handle sequenced data shrink to vanishingly small values, rendering them ineffective at learning long-term relationships in sequenced data.

Superceded by the (non recurrent) Transformer, LSTMs are a recurrent model architecture, which means that their output feeds back into themselves. While this enables some unique model architectures for learning sequenced data (exhibit A), it also makes them horrible at being parallelised on a GPU (the dilated LSTM architecture attempts to remedy this, but Transformers are just better). The only thing that stands them apart from the Transformer is that the Transformer is somehow not built-in into Tensorflow as standard yet, whereas LSTMs are.

Just like Vision Transformers adapted the Transformer architecture for multidimensional data, so too are Grid LSTMs to normal LSTMs.


In summary, for images encoders in priority order are: ConvNeXt, Vision Transformer (Swin Transformer), ResNet, and for text/sequenced data: Transformer, LSTM.

We've looked at a small selection of model architectures for handling a variety of different data types. This is not an exhaustive list, however. You might have another awkward type of data to handle that doesn't fit into either of these categories - e.g. specialised models exist for handling audio, but the general rule of thumb is an encoder architecture probably already exists for your use-case - even if I haven't listed it here.

Also of note are alternative use cases for data types I've covered here. For example, if I'm working with images I would use a ConvNeXt, but if model prediction latency and/or resource consumption mattered I would consider using a MobileNet](, which while a smaller model is designed for producing rapid predictions in lower resource environments - e.g. on mobile phones.

Finally, while these are encoders decoders also exist for various tasks. Often, they are tightly integrated into the encoder. For example, the U-Net is designed for image segmentation. Listing these is out of scope of this article, but if you are getting into AI to solve a specific problem (as is often the case), I strongly recommend looking to see if an existing model/decoder architecture has been designed to solve your particular problem. It is often much easier to adjust your problem to fit an existing model architecture than it is to design a completely new architecture to fit your particular problem (trust me, I've tried this already and it was a Bad Idea).

The first one you see might even not be the best / state of the art out there - e.g. Transformers are better than the more widely used LSTMs. Surveying the landscape for your particular task (and figuring out how to frame it in the first place) is critical to the success of your model.

Easily write custom Tensorflow/Keras layers

At some point when working on deep learning models with Tensorflow/Keras for Python, you will inevitably encounter a need to use a layer type in your models that doesn't exist in the core Tensorflow/Keras for Python (from here on just simply Tensorflow) library.

I have encountered this need several times, and rather than e.g. subclassing tf.keras.Model, there's a much easier way - and if you just have a simple sequential model, you can even keep using tf.keras.Sequential with custom layers!


First, some brief background on how Tensorflow is put together. The most important thing to remember is Tensorflow likes very much to compile things into native code using what we can think of as an execution graph.

In this case, by execution graph I mean a directed graph that defines the flow of information through a model or some other data processing pipeline. This is best explained with a diagram:

A simple stack of Keras layers illustrated as a directed graph

Here, we define a simple Keras AI model for classifying images which you might define with the functional API. I haven't tested this model - it's just to illustrate an example (use something e.g. like MobileNet if you want a relatively small model for image classification).

The layer stack starts at the top and works it's way downwards.

When you call model.compile(), Tensorflow complies this graph into native code for faster execution. This is important, because when you define a custom layer, you may only use Tensorflow functions to operate on the data, not Python/Numpy/etc ones.

You may have already encountered this limitation if you have defined a Tensorflow function with tf.function(some_function).

The reason for this is the specifics of how Tensorflow compiles your model. Now consider this graph:

A graph of Tensorflow functions

Basic arithmetic operations on tensors as well as more complex operators such as tf.stack, tf.linalg.matmul, etc operate on tensors as you'd expect in a REPL, but in the context of a custom layer or tf.function they operate on not a real tensor, but symbolic ones instead.

It is for this reason that when you implement a tf.function to use with for example, it only gets executed once.

Custom layers for the win!

With this in mind, we can relatively easily put together a custom layer. It's perhaps easiest to show a trivial example and then explain it bit by bit.

I recommend declaring your custom layers each in their own file.

import tensorflow as tf

class LayerMultiplier(tf.keras.layers.Layer):
    def __init__(self, multiplier=2, **kwargs):
        super(LayerMultiplier, self).__init__(**kwargs)

        self.param_multiplier = multiplier
        self.tensor_multiplier = tf.constant(multiplier, dtype=tf.float32)

    def get_config(self):
        config = super(LayerMultiplier, self).get_config()

            "multiplier": self.param_multiplier

        return config

    def call(self, input_thing, training, **kwargs):
        return input_thing * self.tensor_multiplier

Custom layers are subclassed from tf.keras.layers.Layer. There are a few parts to a custom layer:

The constructor (__init__) works as you'd expect. You can take in custom (hyper)parameters (which should not be tensors) here and use then to control the operation of your custom layer.

get_config() must ultimately return a dictionary of arguments to pass to instantiate a new instance of your layer. This information is saved along with the model when you save a model e.g. with tf.keras.callbacks.ModelCheckpoint in .hdf5 mode, and then used when you load a model with tf.keras.models.load_model (more on loading a model with custom layers later).

A paradigm I usually adopt here is setting self.param_ARG_NAME_HERE fields in the constructor to the value of the parameters I've taken in, and then spitting them back out again in get_config().

call() is where the magic happens. This is called when you call model.compile() with a symbolic tensor which stands in for the shape of the real tensor to build an execution graph as explained above.

The first argument is always the output of the previous layer. If your layer expects multiple inputs, then this will be an array of (potentially symbolic) tensors rather then a (potentially symbolic) tensor directly.

The second argument is whether you are in training mode or not. You might not be in training mode if:

  1. You are spinning over the validation dataset
  2. You are making a prediction / doing inference
  3. Your layer is frozen for some reason

Sometimes you may want to do something differently if you are in training mode vs not training mode (e.g. dataset augmentation), and Tensorflow is smart enough to ensure this is handled as you'd expect.

Note also here that I use a native multiplication with the asterisk * operator. This works because Tensorflow tensors (whether symbolic or otherwise) overload this and other operators so you don't need to call tf.math.multiply, tf.math.divide, etc explicitly yourself, which makes your code neater.

That's it, that's all you need to do to define a custom layer!

Using and saving

You can use a custom layer just like a normal one. For example, using tf.keras.Sequential:

import tensorflow as tf

from .components.LayerMultiply import LayerMultiply

def make_model(batch_size, multiplier):
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(10, activation="softmax"),
    ])[ batch_size, 32 ]))
    return model

The same goes here for the functional API. I like to put my custom layers in a components directory, but you can put them wherever you like. Again here, I haven't tested the model at all, it's just for illustrative purposes.

Saving works as normal, but for loading a saved model that uses a custom layer, you need to provide a dictionary of custom objects:

loaded_model = tf.keras.models.load_model(filepath_checkpoint, custom_objects={
    "LayerMultiply": LayerMultiply,

If you have multiple custom layers, define all the ones you use here. It doesn't matter if you define extra it seems, it'll just ignore the ones that aren't used.

Going further

This is far from all you can do. In custom layers, you can also:

  • Instantiate sublayers or models (tf.keras.Model inherits from tf.keras.layers.Layer)
  • Define custom trainable weights (tf.Variable)

Instantiating sublayers is very easy. Here's another example layer:

import tensorflow as tf

class LayerSimpleBlock(tf.keras.layers.Layer):
    def __init__(self, units, **kwargs):
        super(LayerSimpleBlock, self).__init__(**kwargs)

        self.param_units = units

        self.block = tf.keras.Sequential([

    def get_config(self):
        config = super(LayerSimpleBlock, self).get_config()

            "units": self.param_units

        return config

    def call(self, input_thing, training, **kwargs):
        return self.block(input_thing, training=training)

This would work with a single sublayer too.

Custom trainable weights are also easy, but require a bit of extra background. If you're reading this post, you have probably heard of gradient descent. The specifics of how it works are out of scope of this blog post, but in short it's the underlying core algorithm deep learning models use to reduce error by stepping bit by bit towards lower error.

Tensorflow goes looking for all the weights in a model during the compilation process (see the explanation on execution graphs above) for you, and this includes custom weights.

You do, however, need to mark a tensor as a weight - otherwise Tensorflow will assume it's a static value. This is done through the use of tf.Variable:

tf.Variable(name="some_unique_name", initial_value=tf.random.uniform([64, 32]))

As far as I've seen so far, tf.Variable()s need to be defined in the constructor of a tf.keras.layers.Layer, for example:

import tensorflow as tf

class LayerSimpleBlock(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super(LayerSimpleBlock, self).__init__(**kwargs)

        self.weight = tf.Variable(name="some_unique_name", initial_value=tf.random.uniform([64, 32]))

    def get_config(self):
        config = super(LayerSimpleBlock, self).get_config()
        return config

    def call(self, input_thing, training, **kwargs):
        return input_thing * weight

After you define a variable in the constructor, you can use it like a normal tensor - after all, in Tensorflow (and probably other deep learning frameworks too), tensors don't always have to hold an actual value at the time of execution as I explained above (I call tensors that don't contain an actual value like this symbolic tensors, since they are like stand-ins for the actual value that gets passed after the execution graph is compiled).


We've looked at defining custom Tensorflow/Keras layers that you can use without giving tf.keras.Sequential() or the functional API. I've shown how by compiling Python function calls into native code using an execution graph, many orders of magnitude of performance gains can be obtained, fully saturating GPU usage.

We've also touched on defining custom weights in custom layers, which can be useful depending on what you're implementing. As a side note, should you need a weight in a custom loss function, you'll need to define it in the constructor of a tf.keras.layers.Layer and then pull it out and pass it to your subclass of tf.keras.losses.Loss.

By defining custom Tensorflow/Keras layers, we can implement new cutting-edge deep learning logic that are easy to use. For example, I have implemented a Transformer with a trio of custom layers, and CBAM: Convolutional Block Attention Module also looks very cool - I might implement it soon too.

I haven't posted a huge amount about AI / deep learning on here yet, but if there's any topic (machine learning or otherwise) that you'd like me to cover, I'm happy to consider it - just leave a comment below.

NSD, Part 2: Dynamic DNS

Hey there! In the last post, I showed you how to setup nsd, the Name Server Daemon, an authoritative DNS server to serve records for a given domain. In this post, I'm going to talk through how to extend that configuration to support Dynamic DNS.

Normally, if you query, say, the A or AAAA records for a domain or subdomain like, it will return the same IP address that you manually set in the DNS zone file, or if you use some online service then the value you manually set there. This is fine if your IP address does not change, but becomes problematic if your IP address may change unpredictably.

The solution, as you might have guessed, lies in dynamic DNS. Dynamic DNS is a fancy word for some kind of system where the host system that a DNS record points to (e.g. informs the DNS server about changes to its IP address.

This is done by making a network request from the host system to some kind of API that automatically updates the DNS server - usually over HTTP (though anything else could work too, but please make sure it's encrypted!).

You may already be familiar with using a HTTP API to inform your cloud-based registrar (e.g. Cloudflare, Gandi, etc) of IP address changes, but in this post we're going to set dynamic DNS up with the nsd server we configured in the previous post mentioned above.

The first order of business is to find some software to do this. You could also write a thing yourself (see also setting up a systemd service). There are several choices, but I went with dyndnsd (I may update this post if I ever write my own daemon for this).

Next, you need to determine what subdomain you'll use for dynamic dns. Since DNS is hierarchical, an entire subdomain is required - you can't just do dynamic DNS for, say, - since dyndnsd will manage it's own DNS zone file, all dynamic DNS hostnames will be under that subdomain - e.g.

Configuring the server

For the server, I will be assuming that the dynamic dns daemon will be running on the same server as the nsd daemon.

For this tutorial, we'll be setting it up unencrypted. This is a security risk if you are setting it up to accept requests over the Internet rather than a local trusted network! Notes on how to fix this at the end of this post.

Since this is a Ruby-based program (which I do generally recommend avoiding since Ruby is generally an inefficient language to write a program in I've observed), first we need to install gem, the Ruby package manager:

sudo apt install ruby ruby-rubygems ruby-dev

Then, we can install the gem Ruby package manager:

sudo gem install dyndnsd

Now, we need to configure it. dyndnsd is configured using a YAML (ew) configuration file. It's probably best to show an example configuration file and explain it afterwards:

# listen address and port
host: ""
port: 5354
# The internal database file. We'll create this in a moment.
db: "/var/lib/dyndnsd/db.json"
# enable debug mode?
debug: false
# all hostnames are required to be
domain: ""
# configure the updater, here we use command_with_bind_zone, params are updater-specific
  name: "command_with_bind_zone"
    zone_file: "/etc/dyndnsd/zones/"
    command: "systemctl reload nsd"
    ttl: "5m"
    dns: ""
    email_addr: ""
# Users with the hostnames they are allowed to create/update
  computeuser: # <--- Username
    password: "alongandrandomstring"
    password: "anotherlongandrandomstring"

...several things to note here that I haven't already noted in comments.

  • zone_file: "/etc/nsd/zones/": This is the path to the zone file dyndnsd should update.
  • dns: "": This is the fully-qualified hostname with a dot at the end of the DNS server that will be serving the DNS records (i.e. the nsd server).
  • email_addr: "": This sets the email address of the administrator of the system, but the @ at sign is replaced with a dot .. If your email address contains a dot . in the user (e.g., then it won't work as expected here.

Also important here is that although when dealing with domains like this it is less confusing to always require a dot . at the end of fully qualified domain names, this is not always the case here.

Once you've written the config file,, create the directory /etc/dyndnsd and write it to /etc/dyndnsd/dyndnsd.yaml.

With the config file written, we now need to create and assign permissions to the data directory it will be using. Do that like so:

sudo useradd --no-create-home --system --home /var/lib/dyndnsd dyndnsd
sudo mkdir /var/lib/dyndnsd
sudo chown dyndnsd:dyndnsd /var/lib/dyndnsd

Also, we need to create the zone file and assign the correct permissions so that it can write to it:

sudo mkdir /etc/dyndnsd/zones
sudo chown dyndnsd:dyndnsd /etc/dyndnsd/zones
# symlink the zone file into the nsd zones directory. This way dyndns isn't allowed to write to all of /etc/nsd/zones - just the 1 zone file it is supposed to update.
sudo ln -s /etc/dyndnsd/zones/ /etc/nsd/zones/

Now, we can write a systemd service file to run dyndnsd for us:

Description=dyndnsd: Dynamic DNS record updater

ExecStart=/usr/local/bin/dyndnsd /etc/dyndnsd/dyndnsd.yaml


Save this to /etc/systemd/system/dyndnsd.service. Then, start the daemon like so:

sudo systemctl daemon-reload
sudo systemctl enable --now dyndnsd.service

Finally, don't forget to update your firewall to allow requests through to dyndnsd. For UFW, do this:

sudo ufw allow 5354/tcp comment dyndnsd

That completes the configuration of dyndnsd on the server. Now we just need to update the nsd config file to tell it about the new zone.

nsd's config file should be at /etc/nsd/nsd.conf. Open it for editing, and add the following to the bottom:


...and you're done on the server!

Configuring the client(s)

For the clients, all that needs doing is configuring them to make regular requests to the dyndnsd server to keep it appraised of their IP addresses. This is done by making a HTTP request, so we can test it with curl like this:


...where computeuser is the username, alongandrandomstring is the password, and is the hostname it should update.

The server will be able to tell what the IP address is it should set for the subdomain by the IP address of the client making the request.

The simplest way of automating this is using cron. Add the following cronjob (sudo crontab -e to edit the crontab):

*/5 * * * *     curl -sS

....and that's it! It really is that simple. Windows users will need to setup a scheduled task instead and install curl, but that's outside the scope of this post.


In this post, I've given a whistle-stop tour of setting up a simple dynamic dns server. This can be useful if a host as a dynamic IP address on a local network but it still needs a (sub)domain for some reason.

Note that this is not suitable for untrusted networks! For example, setting dyndnsd to accept requests over the Internet is a Bad Idea, as this simple setup is not encrypted.

If you do want to set this up over an untrusted network, you must encrypt the connection to avoid nasty DNS poisoning attacks. Assuming you already have a working reverse proxy setup on the same machine (e.g. Nginx), you'll need to add a new virtual host (a server { } block in Nginx) that reverse-proxies to your dyndnsd daemon and sets the X-Real-IP HTTP header, and then ensure port 5354 is closed on your firewall to prevent direct access.

This is beyond this scope of this post and slightly different depending on your setup, but if there's the demand I can blog about how to do this.

Sources and further reading

Art by Mythdael