PhD Update 10: Sharing with the world
Hey there - is it that time already? Another PhD update blog post! And in double digits too! In this one, I'll be talking mainly about the social media project I've been working on, as I haven't had time to work on the temporal CNN much since last time. I also took some time off in August, so technically this is only just over 1 month's worth of progress.
Before we begin, here's a list of posts in this series so far. They give useful context - you probably won't understand this one without reading them first.
- PhD Update 1: Directions
- PhD Update 2: The experiment, the data, and the supercomputers
- PhD Update 3: Simulating simulations with some success
- PhD Update 4: Ginormous Data
- PhD Update 5: Hyper optimisation and frustration
- PhD Update 6: The road ahead
- PhD Update 7: Just out of reach
- PhD Update 8: Eggs in Baskets
- PhD Update 9: Results?
As with the last post, none of the graphs here are finalised - they are all works in progress. There ~~may~~ probably are multiple nasty bugs that invalidate the results that I haven't found yet.
AAAI Doctoral Consortium 2022
The main thing I've done since last time is apply for the AAAI Doctoral Consortium 2022. As I understand it, this is a specific part of the main AAAI conference (Association for the Advancement of Artificial Intelligence) that is designed for PhD student researchers. To apply, you have to submit a cover page, your CV, and a 2 page thesis summary.
The cover page wasn't too bad, and updating my CV and getting it checked by the careers service at my university was simply a case of doing it. The thesis summary on the other hand was more of a challenge than I expected - it's quite difficult to summarise your entire 3 years of (planned) work in just 2 pages! It helped me to picture it as a high-level overview / conversation starter rather than a free-standing paper in its own right - even if it looks like one.
Although this took longer than I thought to prepare, I did in fact get my submission together in time. I was glad that I left some extra time at the end though, as the rules were both very strict about what you can and can't do with your paper and unclear, since they were written for the main AAAI conference. In addition, I found at least 2 mistakes in the instructions and rules which appeared to be left over from previous years and hadn't been updated.
AI + HADR 2021
The other conference which I attempted to apply for at the suggestion of my supervisor is AI + HADR 2021. This stands for Artificial Intelligence for Humanitarian Assistance and Disaster Response, and it's a virtual workshop that is happening in December 2021. The submission calls for a 6 page paper (5 for content, and 1 for references) on things related to disaster response.
My plan here was to write a paper on the social media work I've been doing and submit that, with the idea of talking about the existing sentiment analysis I've done (see update 9) and focusing on answering a research question using the sentiment analysis model I've implemented.
Unfortunately, things didn't go to plan as I ran out of time - even with my supervisor helping to write some parts of the paper. After finding a number of bugs in my data processing pipeline and failing to see any obvious trends in the sentiment analysis graphs I plotted, I realised that it was unlikely that I was going to be able to submit for it this time around.
The plan from here is to take some more time over it and answer my other research question instead, then tidy up and re-use our existing unfinished submission for IJCNN 2022 (I think that's the right link) - the submission deadline for which I think is going to be in January 2022.
While I've spent time writing submissions for various conferences and workshops, I have also been doing a bit of social media data analysis too.
To start with, I implemented a new endpoint for labelling tweets using a given saved model checkpoint. After using this to label the various datasets I've acquired (with the best transformer-based model checkpoint I have, which I think is ~78% accurate - got to double check everything to make sure I haven't mixed anything up), I then got to work plotting some graphs. First, I plotted a simple bar graph of the overall sentiment of the different datasets I've downloaded.
(Above: A bar graph showing the overall sentiment of some of the floods in my dataset.)
I'm not really sure what to make of this - I suspect context-specific information is required to fully interpret it. I couldn't find any reliable context-specific information on short notice for the AI + HADR paper, so I'm going to keep looking if I can find the time to do so. Asking someone from the energy and environment institute may be a good idea.
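As a rough illustration of the tallying step behind a bar graph like the one above, here is a minimal Python sketch. It is not the actual labelling endpoint: the `sentiment_share` function, the `positive` / `negative` label names, and the example rows are all hypothetical and made up purely for illustration. It assumes each tweet has already been labelled and simply works out what fraction of each dataset falls under each label.

```python
from collections import Counter

# Hypothetical labelled tweets: (dataset name, predicted sentiment label).
# These rows are invented for illustration and are NOT real results.
labelled_tweets = [
    ("Hurricane Iota", "positive"),
    ("Hurricane Iota", "negative"),
    ("Hurricane Iota", "negative"),
    ("Storm Christoph", "positive"),
    ("Storm Christoph", "negative"),
]


def sentiment_share(rows):
    """Return {dataset: {label: fraction of that dataset's tweets}}."""
    counts = {}
    for dataset, label in rows:
        # One Counter per dataset, created the first time we see it
        counts.setdefault(dataset, Counter())[label] += 1
    return {
        dataset: {label: n / sum(tally.values()) for label, n in tally.items()}
        for dataset, tally in counts.items()
    }


shares = sentiment_share(labelled_tweets)
for dataset, share in shares.items():
    print(dataset, share)
```

Using fractions rather than raw counts keeps datasets of very different sizes comparable on the same axis, though whether that is the right normalisation depends on the question being asked.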
After this, I binned the tweets over time and used this to plot a combined graph showing both the tweet frequency and sentiment over time using Gnuplot.
(Above: A graph showing the frequency (blue) and sentiment (green and red) of tweets over time.)
Again, I'm not sure what to make of this. Even cropping out the long tail of people talking about the flooding after the event, to make it easier to see the sentiment over time as the actual event occurred, doesn't seem to help uncover any clear trends. It could be said that for Hurricane Iota the sentiment got more positive over time at the beginning, but this does not really hold true for Storm Christoph and others - and without context-specific information it's difficult to tell whether any meaningful conclusions can be drawn here.
To this end, after talking with my supervisor we've got some idea of things I'm going to try - so more on this in an upcoming PhD update blog post.
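For anyone curious what binning tweets over time can look like in practice, here is a minimal Python sketch under some stated assumptions. The `bin_tweets` function, the 1 hour bin width, the binary `positive` / `negative` labels, and the example timestamps are all hypothetical - the real pipeline may well differ. It floors each timestamp to the start of its bin and counts the total, positive, and negative tweets per bin.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bin_tweets(tweets, bin_size=timedelta(hours=1)):
    """Group (timestamp, label) pairs into fixed-width time bins.

    Assumes labels are either "positive" or "negative" (an assumption).
    Returns a list of (bin_start, total, positive, negative), sorted by time.
    Bins that contain no tweets are not emitted.
    """
    bins = defaultdict(lambda: {"positive": 0, "negative": 0})
    for timestamp, label in tweets:
        # Floor the timestamp to the start of the bin it falls into
        index = (timestamp - EPOCH) // bin_size
        bins[EPOCH + index * bin_size][label] += 1
    return [
        (start, c["positive"] + c["negative"], c["positive"], c["negative"])
        for start, c in sorted(bins.items())
    ]


# Invented example data, NOT real tweets
tweets = [
    (datetime(2021, 1, 20, 9, 5, tzinfo=timezone.utc), "negative"),
    (datetime(2021, 1, 20, 9, 40, tzinfo=timezone.utc), "positive"),
    (datetime(2021, 1, 20, 11, 15, tzinfo=timezone.utc), "negative"),
]

rows = bin_tweets(tweets)
for start, total, positive, negative in rows:
    # One whitespace-separated row per bin
    print(start.isoformat(), total, positive, negative)
```

Each printed row holds the bin start followed by the three counts, which is the sort of whitespace-separated layout a plotting tool can read as columns. The choice of bin width matters: too narrow and the sentiment counts get noisy, too wide and short-lived changes are smoothed away.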
Finally, I've also had a discussion with my supervisor and when I'm ready to publish something on my social media work, I'm going to make the code behind it open source (probably either GPLv3 or MPL-2.0). If you're interested, the code for downloading tweets using Twitter's Academic API is already open source: https://www.npmjs.com/package/twitter-academic-downloader.
I haven't really done anything on the Temporal CNN since last time, but I wanted to make sure it wasn't left out of this post! It's definitely still on the cards - the plan is that once I've got this data analysis done and some meaningful social media results, I'm going to return to the Temporal CNN and put the plan I described in the last post into action.
Since last time I've mainly been writing and analysing social media data. While I didn't manage to apply to AI + HADR 2021, I did manage to submit for AAAI Doctoral Consortium 2022 - I'll find out if I've been accepted on the 15th October 2021.
Next up, I'm going to be working on answering my other research question first - more on this in a later blog post. If I have time, I'll put some effort into the Temporal CNN - though I doubt I'll have anything to show on that front next time. Finally, I'm going to be arranging my PhD Panel 4 (where did all the time go?) - hopefully for before the end of November, availability of those involved permitting.
If you are finding this series of blog posts on my PhD interesting, please do comment below. It's great to see that the stuff I'm working on for my PhD is actually interesting to someone.
Sources and further reading
- The Illustrated Transformer
- Attention Is All You Need [paper]