PhD Update 6: The road ahead
Hey, I'm back early with another post in my PhD series! Turns out there was a bit of mix up last time, and I misplaced update 4 - so I've renamed the duplicate update 4 to update 5. Before we continue, here are all the posts in this series so far:
- PhD Update 1: Directions
- PhD Update 2: The experiment, the data, and the supercomputers
- PhD Update 3: Simulating simulations with some success
- PhD Update 4: Ginormous Data
- PhD Update 5: Hyper optimisation and frustration
Earlier today, I had my PhD panel 2. For those reading who don't know, the primary assessments for a PhD (at least in my case) take the form of 6 panels - in which you write a big report, and then your supervisors get together with you and discuss it. The even numbered ones at the end of each year are the more important ones, I'm led to believe - so I was understandably nervous.
Thankfully, it went well, and I ended up writing a report that's a third longer than my undergraduate dissertation at ~9k words O.o Anyway, I wanted to make another post in this series, as the process of writing the report (and research plan) for my panel has made me look at the bigger picture of where my PhD is going, and how it's all going to tie together - and I wanted to share this here.
In the last few posts, I've given some insights into the process I've been working through to train a Temporal CNN to predict the output of HAIL-CAESAR. This is starting to reach its conclusion - though there are a number of tasks I have yet to complete to close this chapter of the story. In particular, I've (finally!) got the results from the hyperparameter optimisation I've been doing. Let's check out a heatmap:
I plotted the above with GNUPlot, but ran into a number of issues (of course) while generating it, which I'd like to briefly discuss here. If you're interested in a more detailed discussion of this, please get in touch (I'll also need to know your real name and where you're working / studying, just in case) via a private communication method - such as my email address on my website.
Firstly, those who are particularly particularly perceptive will notice the rather strange filename for the above image. As indicated, I ran into some instability in my early stopping algorithm. For each epoch, the algorithm I implemented looked at the error from 3 epochs ago - and if it's greater than the error value for the epoch that's just finished it allows training to continue. Unfortunately, due to the nature of the task at hand, this sometimes caused it to stop training too soon. Further analysis of the results revealed that it sometimes allowed it to continue training too long as well (which is reflected in the above chart), but I'm baffled on that one.
For this reason, all hyperparameter combinations that didn't train for quite long enough were omitted in the above graph.
Secondly, there's a large area of the chart that isn't filled in. This is because both increasing the number of filters and the temporal depth increases memory usage - and the area that's unfilled is the area in which it failed to train because it ran out of GPU memory (I used 4 x Nvidia GeForce V100 - thanks very much to my department for providing this!).
Finally, the most important thing to note is that after reviewing other similar models, I discovered that this really isn't the best way to evaluate the performance of this kind of model. Rather, it would be better to consider this as a classification-based task instead. This can be achieved by creating a number of different bins for the water depth values, and assign a probability to each pixel for each water depth bin. The technical name for this is cross-entropy loss as far as I'm aware, which in my case I've been tending to prefix with pixel-based.
In this fashion, I can use things like a confusion matrix (can't find a good explanation of this, so I'll talk about it in a future blog post) to evaluate the performance of the model - which allows me to see the models strengths and weaknesses. What are its strengths? What are its weaknesses? I hope to find out - though I'm a little nervous about it, as I haven't yet looked into how to generate one with Tensorflow.js or tested my initial cross-entropy loss implementation with my kind of dataset yet.
Having a probability-based system like this should also make the output more useful. By this, I mean that having probabilities assigned to different water depths should be easier to interpret than a single value which nobody knows how the model came to that conclusion, or how uncertain the model is about the result.
Let's take look at 1 more set of graphs before we move on:
In this set of 4 graphs, I've taken the training and validation root mean squared error from 4 different combinations of hyperparameters. These rather effectively illustrate the early stopping stability here (top left), and they also demonstrate some instability in the training process (all graphs). I'm guessing the latter is probably caused by insufficient training data - this I can remedy quite easily as I have plenty more besides the 5K time steps I trained on (~1.5M time steps to be precise), but in the interests of time I limited the training dataset so that I could train a variety of different models.
It also shows the shortcomings of the root mean-squared-error approach I've been using so far - is an average error of 2 biased towards deeper or shallower water? What can it predict well, and what does it struggle with?
My supervisor also wants me to write a publication on the Temporal CNN stuff I've done when it's complete, so that's going to be an interesting new experience for me too (I'm both excited and terribly nervous at the same time).
Social Media Analysis
Moving forwards, I'm hoping to complement my Temporal CNN model with some social media analysis (subject to ethical approval of course - filling out the paperwork for this is part of my task list for the rest of this week). If I get the go-ahead, my intention is to bring some natural language processing AI model to the subject of flood mapping using social media - as I've noticed that there's.... limited existing material on this so far.
I have yet to read up properly on the subject, but most recently I've found this paper rather interesting. It uses an unsupervised model to identify the topic(s) a tweet is talking about, which they then follow up with some statistical analysis.
I'm a little fuzzy on how they identify where tweets are talking about (especially since this paper mentions that only a very small percentage of tweets have a geotag attached, and even of these a fraction will be wrong or misleading for 1 reason or another), but the researchers put together a rather nice map by chaining together several statistical techniques techniques I'm currently unfamiliar with. It shows hot spots and cold spots, which indicate where the damage for an earthquake has occurred.
If possible, it would be very nice indeed to plot a flood on a map (stretch goal: in real-time!). I anticipate this to be a key issue I'll need to pay attention to.
In terms of my most immediate starting point, I'm going to be doing some more reading on the subject, and then have a chat with my supervisor about the next steps (she's particularly knowledgable about natural language processing :D).
In conclusion, I'm starting to come to the end of the Temporal CNN chapter of my PhD, although I suspect it's going to keep coming back over and over again to say hello. Moving forwards, I'm hoping to complement work I've done so far with some social media analysis (subject to ethical approval) using AI-based Natural Language Processing - which will probably consist of improving an existing model or something more directly computer science related (my existing work is classified more as a contribution to environmental sciences, apparently).
Look out for more posts in this series in the future, as I'm sure I'll have plenty to talk about soon (I might do another one in a month's time, or it might end up being nearer 2 months depending on circumstances).