A snapshot into my PhD: Rainfall radar model debugging

Hello again!

The weather is cold over here right now, and it's also been a while since I posted about my PhD in some detail, so I thought while I get my thoughts in order for a new PhD update blog post I'd give you a snapshot into what I've been doing in the last 2 weeks.

If you're not interested in nitty gritty details, I'll be posting a higher-level summary soon in my next PhD update blog post.

For context, since wrapping up (for now, more on this in the PhD update blog post) the social media side of my PhD, I've returned to the rainfall radar half and have been implementing and debugging several AI models to predict water depth in real time. If you're thinking of doing a PhD yourself, this is in no way representative of what a PhD is like! Each PhD is different - mine just happens to include lots of banging my head against a wall debugging.

To start with, I've recently found and fixed a nasty bug in the thresholding function, which defaulted to a value of 1.5 instead of 0.1. My data is stored in .tfrecord files with pairs of rainfall radar and water depth 'images'. When the model reads these in, it will 'threshold' the water depth: each pixel is set to 0 if its water depth is lower than a given threshold, and 1 if it is above.
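As a minimal sketch of that thresholding step: this is plain Python on nested lists rather than the real TensorFlow pipeline, and the function name and the choice to treat a depth exactly equal to the threshold as 0 are assumptions made for illustration.

```python
from typing import List

Grid = List[List[float]]


def threshold_water_depth(depth: Grid, threshold: float = 0.1) -> List[List[int]]:
    """Binarise a 2D water depth grid (metres): 1 where depth > threshold, else 0."""
    return [[1 if value > threshold else 0 for value in row] for row in depth]


# Made-up depths in metres, purely for illustration.
depth = [
    [0.00, 0.05, 0.30],
    [0.12, 1.60, 0.00],
]

print(threshold_water_depth(depth))       # intended threshold of 0.1 m
print(threshold_water_depth(depth, 1.5))  # the buggy default of 1.5 m
```

With the threshold at 1.5 m almost every pixel ends up as 0, so the labels become nearly uniform and the task becomes trivially easy.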

The bug in question manifested itself as an accuracy of 99-100%, which is extremely unlikely given the nature of the task I'm asking the model to perform. After some extensive debugging (including implementing a custom loss function that wrapped several other loss functions, though not all at the same time), I found that the default value for the threshold was 1.5 (metres) instead of what it should have been - 0.1 (again, metres).

After fixing this, the accuracy dropped to 83%, which is the proportion of pixels in the input that were not water.

The model in question predicts water depth in 2D, taking in rainfall radar data (also in 2D) as an input. It uses ConvNeXt as an encoder, and an inverted ConvNeXt as the decoder. For the truly curious, the structure of this model as of the end of this section of the post can be found in the summary.txt file here.

Although I'd fixed the bug, I still had a long way to go. An accuracy of 83% is, in my case, no better than random guessing: the model was unfortunately predicting 'not water' everywhere and completely ignoring the minority class.
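To see why a respectable-looking accuracy can hide a model that has learnt nothing, here's a toy sketch. The 2x3 mask is made up for illustration, and the helper names are hypothetical, not taken from my codebase.

```python
from typing import List, Tuple

Mask = List[List[int]]


def _pairs(predicted: Mask, target: Mask) -> List[Tuple[int, int]]:
    """Flatten two equally sized masks into (predicted, target) pixel pairs."""
    return [
        (p, t)
        for p_row, t_row in zip(predicted, target)
        for p, t in zip(p_row, t_row)
    ]


def pixel_accuracy(predicted: Mask, target: Mask) -> float:
    """Fraction of pixels where the prediction matches the target."""
    pairs = _pairs(predicted, target)
    return sum(p == t for p, t in pairs) / len(pairs)


def water_recall(predicted: Mask, target: Mask) -> float:
    """Fraction of true water pixels (target == 1) that were predicted as water."""
    water = [p for p, t in _pairs(predicted, target) if t == 1]
    return sum(p == 1 for p in water) / len(water) if water else 0.0


target = [[0, 0, 0], [0, 1, 0]]      # 5 of 6 pixels are 'not water'
always_dry = [[0, 0, 0], [0, 0, 0]]  # a 'model' that ignores the minority class

print(f"accuracy: {pixel_accuracy(always_dry, target):.3f}")
print(f"water recall: {water_recall(always_dry, target):.3f}")
```

The accuracy simply equals the share of dry pixels, while the recall for the water class is zero, which is why a per-class metric is a more honest check here than plain accuracy.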

In an attempt to get it to stop ignoring the minority class, I tried (in no particular order):

Unfortunately, none of these things fixed the underlying issue of the model not learning anything.

Dice loss was an interesting case - I have some Cool Graphs to show on this:

Here, I compare removing the activation function (GELU) from the last few layers (ref) with not removing the activation function from the last few layers. Clearly, removing it helps significantly, as the loss actually has a tendency to go in a downward direction instead of rocketing sky high.

This shows the accuracy and validation accuracy for the model without the activation function in the last few layers. Unfortunately, the Dice loss function has some way to go before it can compete with cross-entropy. I speculate that while Dice is cool, it isn't as useful on its own in this scenario.
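For reference, one common formulation of the (soft) Dice loss looks like the sketch below. This is the textbook version on flattened pixel lists, not necessarily the exact variant I used; the `smooth` term and its default value are assumptions to avoid dividing by zero.

```python
from typing import Sequence


def dice_loss(
    predicted: Sequence[float],
    target: Sequence[float],
    smooth: float = 1e-6,
) -> float:
    """Soft Dice loss: 1 - (2 * overlap) / (sum of predictions + sum of targets)."""
    intersection = sum(p * t for p, t in zip(predicted, target))
    denominator = sum(predicted) + sum(target)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


target = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]  # made-up mask: one water pixel in six
all_dry = [0.0] * 6                      # predicts 'not water' everywhere
perfect = list(target)

print(f"all dry: {dice_loss(all_dry, target):.4f}")
print(f"perfect: {dice_loss(perfect, target):.4f}")
```

Because the loss is driven by the overlap on the water class, predicting 'not water' everywhere scores close to the worst possible value, even though its pixel accuracy looks fine.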

It would be cool to compare having no activation function in the last few layers and using cross-entropy loss to my previous attempts, but I'm unsure if I'll have time to noodle around with that.

In terms of where I got the idea to use the dice loss function from, it's from this GitHub repo and its associated paper:

It has a nice summary of loss functions for image segmentation and their uses / effects. If/when DeepLabV3+ actually works (see below) and I have some time, I might return to this to see if I can extract a few more percentage points of accuracy from whatever model I end up with.


Simultaneously with the above, I've been reading into existing image segmentation models. Up until now, my hypothesis has been that a model well connected with skip connections, such as U-Net, would not be ideal in this situation: the input (rainfall radar) is drastically different from the output (water depth), and skip connections encourage the output to be more similar to the input, which is not really what I want.

Now, however, I am (finally) going to test this (long running) hypothesis to see if it's really true. To do this, I needed to find the existing state-of-the-art image segmentation model. To summarise long hours of reading, I found the following models:

To this end, I've found myself a DeepLabV3+ implementation on (the code is terrible and so full of spaghetti I could eat it for breakfast and still have some left over) and I've tested it with the provided dataset, which seems to work fine:

....though there seems to be a bug in the graph plotting code, in that it doesn't clear the last line plotted.

Not sure if I can share the actual segmentations that it produces, but I can say that while they are a bit rough around the edges, it seems to work fine.

The quality of the segmentation is somewhat lacking, given the training data only consisted of ~1k images and it was only trained for ~25 epochs. It has ~11 million parameters or so. I'm confident that more epochs and more data would improve things, which is good enough for me, so my next immediate task will be to push my own data through it and see what happens.

I hypothesise that models like DeepLabV3+ also bring a crucial benefit to the table: training stability. Because the model is more interconnected via skip connections etc, gradients have less far to travel overall during backpropagation to reach every part of the model and update the weights.

If you're interested, a full summary of this DeepLabV3+ model can be found here:

If it does work, I've recently implemented an attention mechanism called Convolutional Block Attention Module (CBAM), which looks seriously cool. I'd like to try adding it to the DeepLabV3+ model to see if it increases the accuracy of the output.

Finally, a backup plan is in order in case DeepLabV3+ doesn't work. My plan is to convolve over the input rainfall radar data and make a prediction for a single water depth pixel at a time, using ConvNeXt as an image encoder backbone and keeping the current structure of 7 channels rainfall radar + 1 channel heightmap. I may also test other backbones, such as its older cousin ResNet, simultaneously just in case (see also my post on image encoders).

While this wouldn't be ideal (given you'd need to push through multiple batches just to get a single 2D prediction), the model to make such predictions would be simpler and more likely to work right off the bat.
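A rough sketch of that pixel-at-a-time idea is below. It's heavily simplified: a single channel instead of 7 + 1, zero padding at the edges, and a stand-in function where the real image encoder would go. All of the names, the window size and the padding choice are assumptions for illustration only.

```python
from typing import Callable, List

Grid = List[List[float]]


def extract_patch(grid: Grid, row: int, col: int, radius: int, pad: float = 0.0) -> Grid:
    """Square window centred on (row, col); cells outside the grid take `pad`."""
    height, width = len(grid), len(grid[0])
    return [
        [
            grid[r][c] if 0 <= r < height and 0 <= c < width else pad
            for c in range(col - radius, col + radius + 1)
        ]
        for r in range(row - radius, row + radius + 1)
    ]


def predict_per_pixel(
    grid: Grid, model: Callable[[Grid], int], radius: int = 1
) -> List[List[int]]:
    """Run `model` once per output pixel and assemble the results into a 2D mask."""
    return [
        [model(extract_patch(grid, r, c, radius)) for c in range(len(grid[0]))]
        for r in range(len(grid))
    ]


def stand_in_model(patch: Grid) -> int:
    """Placeholder for the real encoder: 'water' if mean rainfall in the window > 0.5."""
    cells = [value for row in patch for value in row]
    return 1 if sum(cells) / len(cells) > 0.5 else 0


rainfall = [[1.0 for _ in range(3)] for _ in range(3)]  # made-up uniform rainfall
print(predict_per_pixel(rainfall, stand_in_model))
```

Note that the model is called once for every output pixel, which is exactly the cost mentioned above: a full 2D prediction needs height x width forward passes rather than one.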


I've talked a bunch about my process and thoughts on debugging my rainfall radar to water depth model and trying to get it to work. Taking a single approach at a time to problems like this isn't usually the best idea, so I'm also trying something completely new in DeepLabV3+ to see if it will work.

I also have a backup plan in a more traditional image encoder-style model that will predict a single pixel at a time. As I mentioned at the beginning of this blog post, every PhD is different, so this is not representative of what you'd be doing on yours if you decide to do one / are doing one / have done one. If you are thinking of doing a PhD, please do get in touch if you're interested in hearing more about my experiences doing one and what you could expect.


