PhD, Update 7: Just out of reach
Oops! I must have forgotten about writing an entry for this series. Things have been complicated with the current situation, but I've got some time now to talk about what's been happening since my last post about my PhD. Before we continue though, here's a list of all the parts so far:
- PhD Update 1: Directions
- PhD Update 2: The experiment, the data, and the supercomputers
- PhD Update 3: Simulating simulations with some success
- PhD Update 4: Ginormous Data
- PhD Update 5: Hyper optimisation and frustration
- PhD Update 6: The road ahead
In this post, there are 2 different distinct areas to talk about. Firstly, the (limited) progress I've made on the Temporal CNN - and secondly the social media content.
Rainfall Radar / Temporal CNN
Things on the Temporal CNN front have been.... interesting. In the last post, I talked about how I was planning to update the model to use a cross-entropy loss function instead of mean squared error. In short, the idea here is to bin the water depth values we want to predict into a number of different categories, and then get the AI to predict which category each pixel belongs in.
The point of this is to allow for evaluating what the model is good at, and what it struggles with more effectively with the help of a confusion matrix.
Unfortunately, after going to a considerable amount of effort, the model hasn't yet been able to learn anything at all when using the cross-entropy loss error function. I've tried a whole array of different things by this point:
- Using sparse / non-sparse categorical cross-entropy loss (ref)
- Changing the number of filters in the model
- Changing the format of the input data
- Moving from average pooling → max pooling
- Fiddling with the 2D CNN layer at the end of the model
- ...and many more things - too many to list or remember here
Unfortunately, none of these have had a meaningful impact on the model's ability to learn anything. Despite this, we haven't run out of ideas yet. My current plan is to rebuild the model based on a known good model.
The known-good model in question is one I built earlier for a talk I did. It's purpose is classifying images, which it does fabulously with the well-known MNIST handwriting digits dataset. It's structure has 1 2D CNN layer, followed by a dense layer that outputs the probabilities as a 1D array:
I devised 2 learning tasks to test the model with here. The "hard" task, which is to predict the exact digit in the picture, and the "easy" task - which is to predict whether the digit is greater than or equal to 5 or not (this simulates a binary cross-entropy task I've been trying my original model with). The original model works brilliantly with the hard task, gaining an easy 98% accuracy after 12 epochs.
After forking it and then refactoring significantly to decouple its various components, I started to modify the model's structure step by step to more closely match that of the Temporal CNN.
The initial results of this process (which only really got going on Tuesday 23rd February 2021) have been fascinating, as I've been running the MNIST dataset through it in between each step to check that it's still working as intended.
For example, I've discovered that the model has an intense dislike of pooling layers (both average and max). I suspect this might be because I'm not using it correctly, but I discovered that I could only get about 40% accuracy with the pooling layer in place, compared to ~99% without it.
Another thing I've done is removing the dense layer from the model, but this comes with its own set of problems though. The eventual goal is to do what is essentially image-to-video translation, so a key part of this process is to get the model to produce at least 2D tensor as an output instead of a 1D list of predictions for a single pixel.
To simulate this with the MNIST dataset, copied the output prediction. For the "hard" task, I copied the array of probabilities for each category into a cube, with 1 copy of the array for each pixel of the output. I found while doing this though that I got about 22% accuracy - though I suspect that the model was slower to converge than normal and if I'd maybe made the model a bit larger or let it train for longer, I'd be able to improve that somewhat.
It fared much better on the "easy" task though - easily achieving 99% accuracy fairly quickly with just 2 x 2D CNN layers in a row.
With these tests in mind, I'll be continuing the process of tweaking my new model bit by bit to match the original Temporal CNN, with the eventual goal of running my actual dataset through the model.
I thought I'd be well into the social media part of my PhD by now, but things have been getting in the way (e.g. life stuff with respect to the current situation, and the temporal cnn being awkward) so I haven't yet been able to make a serious start on the social media side of things yet.
Still, I've been working away at the paperwork. I've now got ethical approval to work with publicly-available social media data (so long as I anonymise it, of course), and I've also been applying to get access to Twitter's new Academic API (which apparently went through successfully, but I'm currently troubleshooting the reason why it's asking me to apply for an account all over again).
I've also been reading a paper or 2, but since most of my energy has been spent elsewhere I have yet to dive seriously into this (I re-discovered the other day a folder full of interesting papers my supervisor sent me, so I'm going to dive into that as soon as I get a moment).
Papers looking into analysing social media data with advanced AI models appear to be in short supply - most papers I've read so far are either talking about analysing longer texts such as newspaper articles, or are using keyword-based and statistics-based methodologies to analyse data.
While this makes for an interesting research gap, I do feel slightly nervous that I've somehow missed something (which I guess I'll find out soon enough after reading some more papers). At any rate, my supervisor and I have some promising ideas and directions to look into moving forwards, so I'm not too worried here. I've also had some interesting discussions with people from the humanities side of my PhD (if you can call it that? I'm not sure what the right terminology is) over potential research questions too, so there's lots of scope here for investigation.
I anticipate that social media data isn't going to be as difficult a dataset to wrangle as the rainfall radar either (it's got to be better than a badly documented propriety binary format), as it's already encoded in JSON - so I'm not expecting I'll need to spend ages and ages writing programs to reformat and parse the data.
Things have been moving slowly recently, due in part to difficulties with the Temporal CNN, and due in part to life in general suddenly becoming rather challenging recently. Things are starting to calm down now though, so I'm starting to have more time to work on my PhD (but it's going to be a number of months yet until things are properly back to normal).
By changing tack with the Temporal CNN, I feel like I'm starting to make some more progress again, and the social media track of my PhD is showing lots of promise even though it's too early to tell exactly what direction I'll be heading in with it.
Hopefully by the time I make another post here in 2 months time, I'll have a working Temporal CNN and a start to the social media side of things - but this seems a tad ambitious based on how things have been going so far.
If you've got any comments or suggestions, do leave them below! I'd love to hear from you.