A snapshot into my PhD: Rainfall radar model debugging
Hello again!
The weather is cold over here right now, and it's also been a while since I posted about my PhD in some detail, so I thought that while I get my thoughts in order for a new PhD update blog post, I'd give you a snapshot of what I've been doing over the last 2 weeks.
If you're not interested in nitty gritty details, I'll be posting a higher-level summary soon in my next PhD update blog post.
For context, since wrapping up (for now, more on this in the PhD update blog post) the social media side of my PhD, I've returned to the rainfall radar half of my PhD, implementing and debugging several AI models to predict water depth in real time. If you're thinking of doing a PhD yourself, this is in no way representative of what a PhD is like! Each PhD is different - mine just happens to include lots of banging my head against a wall debugging.
To start with, recently I've found and fixed a nasty bug in the thresholding function, which defaulted to a value of 1.5 instead of 0.1. My data is stored in .tfrecord files with pairs of rainfall radar and water depth 'images'. When the model reads these in, it 'thresholds' the water depth: for each pixel, setting a value of 0 where the water depth is lower than a given threshold, and 1 where it is above.
The bug in question manifested itself as an accuracy of 99%/100%, which is extremely unlikely given the nature of the task I'm asking it to perform. After some extensive debugging (including implementing a custom loss function that wrapped several other loss functions, though not all at the same time), I found that the default value for the threshold was 1.5 (metres) instead of what it should have been - 0.1 (again, metres).
After fixing this, the accuracy lowered to 83% - the proportion of pixels in the input that were not water.
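For illustration, the thresholding step boils down to something like this (a minimal sketch with made-up names, not my actual code):

```python
import tensorflow as tf

# A minimal sketch of the thresholding step (hypothetical names, not my actual code).
# water_depth: a tensor of water depths in metres.
def threshold_water_depth(water_depth, threshold=0.1):  # the buggy default was 1.5
    # 1 for pixels deeper than the threshold, 0 otherwise
    return tf.cast(water_depth > threshold, tf.float32)
```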
The model in question predicts water depth in 2D, taking in rainfall radar data (also in 2D) as an input. It uses ConvNeXt as an encoder, and an inverted ConvNeXt as the decoder. For the truly curious, the structure of this model as of the end of this section of the post can be found in the summary.txt
file here.
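To give a rough idea of the overall shape (a schematic sketch only, with plain convolutions and made-up sizes rather than my actual ConvNeXt-based model):

```python
import tensorflow as tf

# Schematic sketch only: the real model uses ConvNeXt blocks, not plain convolutions,
# and the input size here (128×128, 7 rainfall radar channels) is made up.
def make_model(input_shape=(128, 128, 7), filters=(32, 64, 128)):
    inputs = tf.keras.layers.Input(shape=input_shape)
    x = inputs
    # Encoder: repeatedly downsample while increasing the channel count
    for f in filters:
        x = tf.keras.layers.Conv2D(f, 3, strides=2, padding="same", activation="gelu")(x)
    # Decoder: a mirror of the encoder, upsampling back to the input resolution
    for f in reversed(filters):
        x = tf.keras.layers.Conv2DTranspose(f, 3, strides=2, padding="same", activation="gelu")(x)
    # 2 output channels (water / not water), to pair with one-hot categorical cross-entropy
    outputs = tf.keras.layers.Conv2D(2, 1, padding="same", activation="softmax")(x)
    return tf.keras.Model(inputs=inputs, outputs=outputs)
```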
Although I'd fixed the bug, I still had a long way to go. An accuracy of 83% is, in my case, no better than random guessing: the model was unfortunately completely ignoring the minority class.
In an attempt to get it to stop ignoring the minority class, I tried (in no particular order):
- Increasing the learning rate to 0.1
- Summing the output of the loss function instead of doing `tf.math.reduce_sum(loss_output) / batch_size`, as is the default (see the sketch after this list)
- Multiplying the rainfall radar values by 100 (the standard deviation of the rainfall radar data is of the order of magnitude of 0.01, mean 0.0035892173182219267, min 0, max 1)
- Adding an extra input channel with the heightmap (cross-entropy, loss: 0.583, training accuracy: 0.719)
- Using the dice loss function instead of one-hot categorical cross-entropy
- Removing the activation function (GeLU in my case) from the last few layers of the model (helpful, but I tried this with dice and not cross entropy loss, and switching back would be a bit of a pain)
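To clarify the loss reduction tweak above, the difference between the two reductions is roughly this (a sketch; the numbers are made up):

```python
import tensorflow as tf

loss_output = tf.constant([0.2, 0.9, 0.4, 0.1])  # made-up per-sample losses for a batch of 4
batch_size = 4

# The default behaviour: average the loss over the batch
loss_mean = tf.math.reduce_sum(loss_output) / batch_size  # → 0.4

# What I tried: just sum it, which effectively scales the gradients by the batch size
loss_sum = tf.math.reduce_sum(loss_output)  # → 1.6
```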
Unfortunately, none of these things fixed the underlying issue of the model not learning anything.
Dice loss was an interesting case - I have some Cool Graphs to show on this:
Here, I compare removing the activation function (GeLU) from the last few layers (ref) with leaving it in place. Clearly, removing it helps significantly: the loss actually has a tendency to go in a downward direction instead of rocketing sky high.
This shows the accuracy and validation accuracy for the model without the activation function in the last few layers. Unfortunately, the Dice loss function has some way to go before it can compete with cross-entropy. I speculate that while dice is cool, it isn't as useful on its own in this scenario.
It would be cool to compare having no activation function in the last few layers and using cross-entropy loss to my previous attempts, but I'm unsure if I'll have time to noodle around with that.
In terms of where I got the idea to use the dice loss function from, it's from this GitHub repo and its associated paper: https://github.com/shruti-jadon/Semantic-Segmentation-Loss-Functions.
It has a nice summary of loss functions for image segmentation and their uses / effects. If/when DeepLabV3+ actually works (see below) and I have some time, I might return to this to see if I can extract a few more percentage points of accuracy from whatever model I end up with.
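For reference, the core of a dice loss for a one-hot segmentation output looks roughly like this (a minimal sketch based on my understanding, not the repo's exact code):

```python
import tensorflow as tf

# A minimal sketch of a soft dice loss for one-hot segmentation outputs
# (based on my understanding, not the repo's exact code).
def dice_loss(y_true, y_pred, smooth=1e-6):
    y_true = tf.cast(y_true, tf.float32)
    # Sum over height, width and channels, leaving one loss value per sample
    intersection = tf.math.reduce_sum(y_true * y_pred, axis=[1, 2, 3])
    total = tf.math.reduce_sum(y_true, axis=[1, 2, 3]) + tf.math.reduce_sum(y_pred, axis=[1, 2, 3])
    return 1.0 - (2.0 * intersection + smooth) / (total + smooth)
```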
DeepLabV3+
Simultaneously with the above, I've been reading into existing image segmentation models. Up until now, my hypothesis has been that a model heavily wired with skip connections, such as U-Net, would not be ideal in this situation: the input (rainfall radar) is so drastically different from the output (water depth) that skip connections, which encourage the output to be more similar to the input, would work against what I want.
Now, however, I am (finally) going to test this (long running) hypothesis to see if it's really true. To do this, I needed to find the existing state-of-the-art image segmentation model. To summarise long hours of reading, I found the following models:
- SegNet (bad)
- FCN (bog standard, also bad, maybe this paper)
- U-Net (heard of this before)
- PSPNet (like a pyramid structure, was state of the art but got beaten recently)
- DeepLabV3 (PSPNet but not quite as good)
- DeepLabV3+ (terrible name to search for but the current state of the art, beats PSPNet)
To this end, I've found myself a DeepLabV3+ implementation on keras.io (the code is terrible and so full of spaghetti I could eat it for breakfast and still have some left over) and I've tested it with the provided dataset, which seems to work fine:
....though there seems to be a bug in the graph plotting code, in that it doesn't clear the last line plotted.
Not sure if I can share the actual segmentations that it produces, but I can say that while they are a bit rough around the edges, it seems to work fine.
The quality of the segmentation is somewhat lacking, given the training data only consisted of ~1k images and it was only trained for ~25 epochs (the model has ~11 million parameters or so). I'm confident that more epochs and more data would improve things, which is good enough for me, so my next immediate task will be to push my own data through it and see what happens.
I hypothesise that models like DeepLabV3+ also bring a crucial benefit to the table: training stability. Because such models are more interconnected thanks to skip connections and the like, gradients from backpropagation have a shorter path to travel to reach and update every weight in the model.
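As a schematic illustration (made-up shapes, not DeepLabV3+ itself), a skip connection gives gradients a shortcut past intermediate layers:

```python
import tensorflow as tf

inputs = tf.keras.layers.Input(shape=(128, 128, 8))  # made-up shape
x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="gelu")(inputs)
y = tf.keras.layers.Conv2D(32, 3, padding="same", activation="gelu")(x)
# The skip connection: gradients flowing back from the output can reach x directly
# through the Add, without having to pass through the second convolution
outputs = tf.keras.layers.Add()([x, y])
model = tf.keras.Model(inputs=inputs, outputs=outputs)
```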
If you're interested, a full summary of this DeepLabV3+ model can be found here: https://starbeamrainbowlabs.com/blog/images/20221215-DeeplabV3+_summary.txt
If it does work, I'd like to try adding the Convolutional Block Attention Module (CBAM) to it - an attention mechanism I implemented recently that looks seriously cool - to see if it increases the accuracy of the output.
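For the curious, the gist of CBAM is channel attention followed by spatial attention. My understanding of it looks roughly like this (a sketch, not my exact implementation):

```python
import tensorflow as tf

def cbam(x, reduction=8, spatial_kernel=7):
    """Rough sketch of a Convolutional Block Attention Module (CBAM)."""
    channels = x.shape[-1]

    # Channel attention: squeeze the spatial dims with average & max pooling,
    # then pass both through a shared MLP
    mlp = tf.keras.Sequential([
        tf.keras.layers.Dense(channels // reduction, activation="relu"),
        tf.keras.layers.Dense(channels),
    ])
    avg_pool = tf.keras.layers.GlobalAveragePooling2D()(x)
    max_pool = tf.keras.layers.GlobalMaxPooling2D()(x)
    channel_attention = tf.keras.activations.sigmoid(mlp(avg_pool) + mlp(max_pool))
    x = x * tf.reshape(channel_attention, (-1, 1, 1, channels))

    # Spatial attention: squeeze the channel dim with average & max, then a single conv
    avg_map = tf.math.reduce_mean(x, axis=-1, keepdims=True)
    max_map = tf.math.reduce_max(x, axis=-1, keepdims=True)
    spatial_attention = tf.keras.layers.Conv2D(
        1, spatial_kernel, padding="same", activation="sigmoid"
    )(tf.concat([avg_map, max_map], axis=-1))
    return x * spatial_attention
```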
Finally, a backup plan is in order in case it doesn't work. My plan is to convolve over the input rainfall radar data and make a prediction for a single water depth pixel at a time, using ConvNeXt as an image encoder backbone (though I may do tests with other backbones such as its older cousin ResNet simultaneously just in case, see also my post on image encoders), keeping the current structure of 7 channels rainfall radar + 1 channel heightmap.
While this wouldn't be ideal (given you'd need to push through multiple batches just to get a single 2D prediction), the model to make such predictions would be simpler and more likely to work right off the bat.
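As a rough illustration of how the input side of that might work (a sketch; the patch size and helper name are made up), you could cut a patch of rainfall radar around each target pixel and then classify only the centre pixel of each patch:

```python
import tensorflow as tf

def extract_patches(rainfall, patch_size=32):
    """Cut the rainfall radar (batch, height, width, 8 channels: 7 radar + 1 heightmap)
    into overlapping patches, one per output pixel (stride 1, SAME padding)."""
    patches = tf.image.extract_patches(
        images=rainfall,
        sizes=[1, patch_size, patch_size, 1],
        strides=[1, 1, 1, 1],
        rates=[1, 1, 1, 1],
        padding="SAME",
    )
    # Each patch becomes one input sample; a small classifier (e.g. a ConvNeXt backbone)
    # would then predict the water / no-water class of the patch's centre pixel
    return tf.reshape(patches, (-1, patch_size, patch_size, rainfall.shape[-1]))
```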
Conclusion
I've talked a bunch about my process and thoughts on debugging my rainfall radar to water depth model and trying to get it to work. Taking only a single approach at a time to problems like this isn't usually the best idea, so I'm also trying something completely new with DeepLabV3+ to see if it will work.
I also have a backup plan in a more traditional image encoder-style model that will predict a single pixel at a time. As I mentioned at the beginning of this blog post, every PhD is different, so this is not representative of what you'd be doing on yours if you decide to do one / are doing one / have done one. If you are thinking of doing a PhD, please do get in touch if you're interested in hearing more about my experiences doing one and what you could expect.