PhD Update 11: Answers to our questions

Heya! It's time for another PhD update blog post. Sometimes, the answers to the questions one asks come in the form of more questions. In this post, as I predicted in the last post, I'll be talking mainly about my work with tweets from Twitter, as I haven't yet had time to return to the Temporal ~~CNN~~ Autoencoder. Before we start though, here's a list of all the parts in this series so far:

As usual, none of the things I present here are finalised, and are subject to significant change as I double check everything.

Conferences part 2

In the last post, I talked about the conferences I have applied to and not applied to. In this one, I can now say that I have been accepted for the AAAI-22 Doctoral Consortium! It was going to be held in Vancouver, Canada - but has since been moved to be fully virtual and online. While I both understand the reasoning behind the decision and am relieved I don't have to travel, it is a bit of a shame that I won't get the chance to have those face-to-face conversations you don't get when in a video call.

Despite the move to being virtual, I'm still both excited to attend and mildly terrified about presenting the things I've been doing to a potentially large audience.

Looking forward, at the suggestion of my supervisor I plan to finish writing up that AI+HADR paper I mentioned in the last update as a journal article instead, and then submit that. While I'm working through the review process for that paper, I'll return to the Temporal Autoencoder / rainfall radar subproject and work on implementing my idea for it.

Tweet sentiment analysis

After the rather rushed analysis of the data in the last post, I've now taken the time to analyse the data more thoroughly. A number of things became apparent here, but first let's look at the research questions I've asked:

Is there a more negative response to more sudden / severe floods?
Can we classify images by the sentiment of the associated tweet?

Answering these questions has not been straightforward, but I'm now at a point where I have some preliminary answers (which, I stress, are not double checked and not peer reviewed).

Unfortunately, the answer to the first question there is that it can't easily be answered with the current data available. As it turns out, I have been unable to find an objective and consistent measure of how severe or sudden a flood was.

One might think that, say, the amount of insurance damages would be a good choice. This doesn't work out though, because not all flooding events have (public) damage estimates. Those that do are often measured against different goalposts: flood A might be measured in property damages, and flood B in economic impact.

Another measure I investigated was the number of homes destroyed or people displaced. This too, as it turns out, has multiple issues. For one, while multiple estimates are available for some floods, they don't always agree. For a single given flood, estimates might range from 800 to 2400 homes destroyed, and it's often unclear at what stage in a flood's lifecycle each estimate was made.

Even if such estimates were consistent (which they really aren't), there's another issue too: they are often limited to country borders. For example, a government agency may estimate the number of homes destroyed for their country, but not other countries. This is totally reasonable: a government is concerned first and foremost with the people within its borders. Sadly floods, storms, and hurricanes rarely respect such borders. Take Storm Christoph for example. It hit the UK in January 2021, but after that it continued on and hit Scandinavia too.

The other question above is thankfully much easier to answer - it deserves its own section in this post though, so see below for that. It's not all bad news on the tweet sentiment analysis front though - I did find some unexpected results while I was analysing the data. To explain, let's look at a fancy new chart I've plotted:

(Above: Various floods and their overall sentiment - both with replies included and excluded.)

This fancy chart shows the overall sentiment of a number of different flooding events, with replies included (going down) and excluded (going up). What's fascinating here is that it appears to suggest that the replies contribute significantly towards the percentage of positive tweets. Perhaps people are most likely to tweet words of encouragement (e.g. "stay safe :hugs:")?
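To make the "replies included vs excluded" comparison concrete, here's a minimal sketch of how such a percentage could be computed. The record layout (`sentiment` and `is_reply` keys) and the sample tweets are hypothetical and purely for illustration - they are not my actual data or pipeline.

```python
def positive_percentage(tweets, include_replies=True):
    """Percentage of tweets labelled positive, optionally ignoring replies."""
    selected = [t for t in tweets if include_replies or not t["is_reply"]]
    if not selected:
        return float("nan")  # No tweets left to measure
    positive = sum(1 for t in selected if t["sentiment"] == "positive")
    return 100.0 * positive / len(selected)

# Hypothetical tweets for a single flood (illustration only)
tweets = [
    {"sentiment": "negative", "is_reply": False},
    {"sentiment": "negative", "is_reply": False},
    {"sentiment": "positive", "is_reply": False},
    {"sentiment": "positive", "is_reply": True},
    {"sentiment": "positive", "is_reply": True},
    {"sentiment": "negative", "is_reply": True},
]

with_replies = positive_percentage(tweets, include_replies=True)
without_replies = positive_percentage(tweets, include_replies=False)
print(f"with replies: {with_replies:.1f}%, without replies: {without_replies:.1f}%")
```

In this made-up example the replies pull the positive percentage upwards, which is the kind of effect the chart appears to suggest.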

With this in mind, I correlated the total number of tweets made in a flood with the overall sentiment (as a percentage), and got a Pearson correlation coefficient of -0.54, which indicates a medium negative correlation. Apparently, if more people tweet about a flood, it's more likely to have a more positive overall sentiment. If replies are excluded, it works out to -0.31, which would indicate a weaker negative correlation.
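For reference, here's a minimal sketch of how a Pearson correlation coefficient can be calculated in plain Python. The `pearson` function name and the per-flood numbers are hypothetical and only there to show the calculation - they are not the figures behind the results above.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations, and sums of squared deviations for each variable
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-flood figures (illustration only, not real results)
tweet_counts = [1200, 5400, 800, 15000, 3100]
sentiment_pct = [62.0, 48.5, 70.1, 35.2, 55.0]

r = pearson(tweet_counts, sentiment_pct)
print(f"r = {r:.2f}")
```

The result always lies between -1 and 1: values near 0 mean little linear relationship, while values near either extreme mean a strong one.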

Image classification

Another thing I've been working on is classifying images associated with tweets using the sentiment of the tweet as a label. With roughly 175K images to work with, this has proved to be a useful exercise - resulting in ~75.23% validation accuracy over 9 epochs (training accuracy rises to 96.9% by epoch 50, suggesting it's overfitting and that I need more data to improve validation accuracy any further). While unfortunately I can't share a sample of positive/negative images predicted by the model due to data privacy rules, I can talk about the structure of the model.

I started out with a Compact Convolutional Transformer model, as it looks very cool and has some significant benefits over more traditional model architectures. Unfortunately, there's a bug in my implementation somewhere that I can't spot, and it only yields 10% to 20% accuracy on Fashion MNIST. To avoid wasting too much time, I'm now using a prebuilt ResNet50 initialised with random weights that takes in images of size 128 x 128 pixels (images larger or smaller than this are automatically resized without preserving aspect ratio).

While the accuracy of the model is slightly lower than that of the model that predicts the sentiment of the tweets themselves, looking at samples that I quickly threw together with a bit of Bash and ImageMagick reveals that it is in fact doing something useful. Here's that Bash 2-liner:

shuf path/to/tweets-labelled.tsv | awk '/positive$/ { print("/absolute/path/to/media_dir/" $1); }' | head -n64 >/tmp/sample-pos.txt;
montage @/tmp/sample-pos.txt -geometry 960x540+10+10 -tile 8x8 /tmp/sample-pos.jpeg

The @path/to/file.txt syntax in the montage command call there reads a list of filepaths from a file instead of directly specifying them on the command line. By replacing positive in the awk filter with negative, and sample-pos with sample-neg in the filenames, the same procedure can generate a random sample for the negative category too.
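As a rough sketch, the selection half of that pipeline (the shuf, awk, and head steps) could also be written in Python, which makes it easy to handle both categories with one function. This assumes a tab-separated file with the media filename in the first column and the sentiment label in the last column; the `sample_media_paths` function name and the example rows are hypothetical.

```python
import random

def sample_media_paths(lines, label, media_dir, count=64, seed=None):
    """Randomly pick up to `count` media paths whose row ends with `label`."""
    rng = random.Random(seed)
    matching = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Assumption: first column is the filename, last column is the label
        if len(fields) >= 2 and fields[-1] == label:
            matching.append(f"{media_dir.rstrip('/')}/{fields[0]}")
    rng.shuffle(matching)
    return matching[:count]

# Hypothetical rows standing in for tweets-labelled.tsv (illustration only)
rows = [
    "a.jpg\tpositive",
    "b.jpg\tnegative",
    "c.jpg\tpositive",
    "d.jpg\tnegative",
]

positive_sample = sample_media_paths(rows, "positive", "/absolute/path/to/media_dir", count=64, seed=42)
negative_sample = sample_media_paths(rows, "negative", "/absolute/path/to/media_dir", count=1, seed=42)
print("\n".join(positive_sample))
```

Writing each returned list to a text file, one path per line, gives a file that can be passed to montage with the @ syntax described above.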

From looking at a single sample of 64 positive images and 64 negative images generated using the method above, positive images generally include:

Cats (by far the most popular of course)
Cupcakes
People, some of whom are helping others in a flood

Whereas negative images generally include:

Floods
Damage to homes, buildings, etc

My next immediate step here is to plot a confusion matrix to better understand how the model is performing, as I'm slightly concerned that it's ignoring the minority class (in this case, positive tweets). I've mostly completed this already, but it's just not quite ready to show here yet as I need to double check and revise some stuff.
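To illustrate why overall accuracy alone can hide this problem, here's a minimal sketch of a confusion matrix and per-class recall in plain Python. The function names and the labels are hypothetical: they describe an imaginary model that always predicts the majority class, and are not my model's actual predictions.

```python
from collections import Counter

def confusion_matrix(true_labels, predicted_labels, classes=("positive", "negative")):
    """Nested dict of counts, indexed as matrix[true_label][predicted_label]."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("true and predicted labels must be the same length")
    counts = Counter(zip(true_labels, predicted_labels))
    return {t: {p: counts[(t, p)] for p in classes} for t in classes}

def recall_per_class(matrix):
    """Fraction of each true class that the model labelled correctly."""
    recalls = {}
    for true_label, row in matrix.items():
        total = sum(row.values())
        recalls[true_label] = row[true_label] / total if total else float("nan")
    return recalls

# Hypothetical labels: an imaginary model that always predicts "negative"
y_true = ["negative"] * 6 + ["positive"] * 2
y_pred = ["negative"] * 8

matrix = confusion_matrix(y_true, y_pred)
recalls = recall_per_class(matrix)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(matrix)
print(recalls, accuracy)
```

In this made-up case the accuracy is 75%, yet the recall for the positive class is 0 - exactly the kind of failure that a confusion matrix makes visible.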

Of course, given that the source dataset is very noisy (social media data generally is) and relatively difficult for AI models to understand, I think this is a good result.

From here, if I have time I'd like to combine this image classification model with the earlier tweet sentiment analyser model to create a single model that can more accurately predict the sentiment of both the text of a tweet and the associated image at the same time. To do this, I'm probably going to investigate and use CLIP - more on this in a future post.

Conclusion

While I still haven't done anything with the Temporal Autoencoder due to other priorities, I'm hoping to return to it once I've wrapped up this social media section of the project. I have made significant progress on analysing the social media data - both the textual tweets and the associated images, and I plan to combine the models I've trained to classify both text and images into a single model. It's not the end of the road yet though: while I've found some answers, they are just leading to more questions.
