Starbeamrainbowlabs


How to read a paper

So you've got a paper. Maybe even a few papers. Okay, it's a whole stack of them and you don't have the time to read them all (they do have a habit of multiplying when you're not looking). What is one to do? I've had this question asked of me a few times, so I thought I'd write up a quick post to answer it, organise my thoughts, and explain my personal process for sorting through and reading scientific papers (I generally find regular 'news'papers to be of questionable reliability, lacking depth, and just not worth the effort).

(A bunch of papers I've read.... and one that I've written.)

Finding papers

If you are in a position where you don't have any papers to begin with, then search engines are your best friend. Just like DuckDuckGo, Ecosia, and others provide an interface to search the web, there are special search engines designed to search for scientific papers. The two main ones I suggest are:

Personally, Semantic Scholar is my paper search engine of choice. Enter some general search terms for the field / thing you want to read about, and relevant papers will be displayed. It can be useful to change the sort order from relevance to citation count or most influential papers to get a look at what are likely to be the seminal papers in that field (i.e. the ones that first introduced a thing - e.g. the Attention is all you need paper first introduced the transformer), though they may be less relevant.

The other nice feature these search engines have is copying out BibTeX to paste into your bibliography in LaTeX (see also the LaTeX templates I maintain for reports/papers/dissertations/theses).

A note on reliability: Papers on preprint servers like arXiv have not been peer reviewed. Avoid these unless there's no other option.

Sorting through them

So you know how to find papers now, but how do you actually read them? Personally, I use a tiered system for this.

Reading the abstract: Firstly, I'll read the abstract. Just like you read the title of a search result to decide whether you want to click on the search result, so do I read the abstract to decide whether a paper is worth my time to read it.

Sometimes I'll stop there. Maybe the paper isn't what I thought it was, or I've simply got all the information I need from it. The latter is most common when I'm writing some paper or report: often I'll need a paper as a reliable source for something, and I won't need to read the whole paper to know that it has the information I need.

Okay, so suppose a paper passes a quick look at the title and abstract, and I want to go deeper. You'd think it's time to jump right in and read it from top to bottom, but you'd be wrong. Reading an entire paper in detail is significantly time consuming, and I want to be really sure it's worth the effort before I commit to it.

Skim reading: The next test is a quick skim read. If it's a journal article, there might be some key contributions at the top of the paper - these are a good place to start. If not, then they can often be found at the end of the introduction - this also goes for conference papers as well. The introduction is usually my second stop (though remember I'm still not reading it word for word yet), followed by the end of the results/experimental discussion section to understand the key points of what they did and how that went for them.

AI summarisation: Another option if a paper is dense and/or long is to use an AI summarisation tool. These must always be taken with a grain of salt, but can help to direct my search when I'm having difficulty extracting a specific piece of information. AI summarisation can also be a good start if an abstract is bad or missing the information I want but the subject itself is interesting. I often find AI-generated summaries can be quite generic, so it's not a complete solution.

A note on ChatGPT: ChatGPT is a generic language model, and as such isn't ideal for generating summaries of documents. It's best to use a model specifically trained for this purpose, and to take any output you get with a grain of salt.

AI document discussion: Occasionally the abstract of a paper suggests that it contains a significantly interesting nugget of information I'm interested in acquiring (again, most often when writing a paper rather than initial research), but the paper is long, dense, I'm having difficulty finding it, or some combination of the three.

This is where AI-driven document discussion can be invaluable. As I noted earlier, AI-generated summaries tend to be quite generic, so it's not great if there's something highly specific I'm after. The only place I'm currently aware of that ships this feature in a useful form is Kagi, a paid-for search engine with AI features (document summarisation and document discussion) built-in. I'm sure others have shipped the feature, but I haven't seen them yet.

Essentially, AI-driven document discussion is where you ask a natural language question about the target paper, and it does the reading comprehension for you by answering your question with useful quotes from the paper. Then once you have the answer you can go and look at that specific part of the paper (use your browser's find tool) to get additional context.

I've found this to be a great time saver. It can also be useful if I'm unsure if a paper actually talks about the thing I'm interested in or not.

Kagi: Specifically, Kagi (my current main search engine) implements both of the aforementioned features. They can be accessed via the Discuss document option next to search results, or by dedicated !bangs (Kagi implements all of DuckDuckGo's !bangs too), which are significantly helpful as I touched on above.

  • AI summarisation: !sum <url_of_paper_or_webpage>
  • AI discuss document: !discuss <url_of_paper_or_webpage>

A disclaimer: I have received no money or other forms of compensation for mentioning Kagi here. Kagi have not asked me to mention them here at all, I just think their product is helpful, useful, produces good search results, and saves me time. AI models can be computationally expensive, so I speculate it would be difficult to find a free version without strings attached.

(Above: A screenshot of a sample discuss document discussion about the paper Attention is all you need)

How to read a paper effectively

So a paper has somehow made it through all of those steps unscathed, and yet I still haven't extracted everything I want to know from it. By this point it must be a significantly interesting paper that I likely want lots of details from.

The process of actually reading a paper from top to bottom is an inherently time consuming one: hence all the other steps above to filter papers out with minimal effort before I commit to spending what is typically an hour or more of my time to a single paper.

My general advice is to do a re-read of the abstract to confirm, and then start with the introduction and make your way down. Take it slow.

Making notes: When I read a paper, I always make notes as I go. Having 2 monitors is also helpful, as I can make notes on 1 and have the paper on the other. My current tool of choice here is Obsidian, a fabulous note taking system that I'll wholeheartedly recommend to everyone. It's Markdown-based and has a tagging system (nested tags are supported too!) to keep papers organised. The directed graph and canvas features are also pretty cool. The general template I use at the moment for making notes on papers is as follows:

---
tags: some, tags/here
---

> - URL: <https://example.com/paper_url_here/doi_if_possible.pdf>
> - Year: YEAR_PAPER_WAS_PUBLISHED

- Bulleted notes go here
    - I nest bullet points based on the topic
        - To as many levels as needed
    - These notes are very casual
- [I contain my own thoughts in square brackets]
    - This keeps the things that the paper says separate from the things that I think about it
- Sometimes if I'm making a lot of notes I'll split them up into sections derived from the paper


## PDF
The last section contains the PDF of the paper itself. Obsidian supports dragging and dropping PDFs in, and it also has a dedicated PDF viewer.

Complete with an explanation of what each section is for!
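
If you make notes on a lot of papers, the template above can be stamped out by a small script. The following is a minimal sketch in Python, not a feature of Obsidian: the helper names `render_note` and `write_note`, and the idea of writing straight into a vault folder, are assumptions made for illustration.

```python
from pathlib import Path

# Hypothetical template mirroring the note layout shown above.
TEMPLATE = """---
tags: {tags}
---

> - URL: <{url}>
> - Year: {year}

- Bulleted notes go here

## PDF
"""


def render_note(url: str, year: int, tags: list[str]) -> str:
    """Fill in the template for a single paper and return the Markdown text."""
    return TEMPLATE.format(tags=", ".join(tags), url=url, year=year)


def write_note(vault: Path, title: str, url: str, year: int, tags: list[str]) -> Path:
    """Write the rendered note to <vault>/<title>.md and return its path."""
    path = vault / f"{title}.md"
    path.write_text(render_note(url, year, tags), encoding="utf-8")
    return path
```

Calling `write_note(Path("my_vault"), "Some paper title", "https://example.com/paper.pdf", 2017, ["ai", "ai/transformers"])` would create a fresh note ready to fill in, assuming `my_vault` already exists.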

You don't have to use Obsidian (it's the best one I've found), but I strongly recommend making notes while you read a paper. This way you have some distilled notes in your own words to refer back to later. It also helps to further your own understanding of the topic of a paper by putting it into your own words. Other tools I'm aware of include OneNote and QOwnNotes (I still use this for making notes in meetings and recording random stuff that's not necessarily related to research. I keep Obsidian quite focused atm).

Make sure these notes are digital. You'll thank me later. The number of times I've used Obsidian's search function to find the notes I made about a specific paper is absolutely unreal. Over time you'll get a good sense for what you need to make notes on, to avoid both having to refer back to the paper again later and having so many notes that it takes longer than hunting around in the source paper for the information you were after.

(Above: A screenshot of my Obsidian workspace.)

Sometimes your research project will change direction, and the notes you made are suddenly less relevant. Or you've learned something elsewhere and now come back with fresh and more experienced eyes. I often update the notes I took initially to add more information, or references to other related papers that go together.

Continual evaluation: As I read, I'm continually evaluating in the back of my mind whether it's worth continuing to read. I'm asking questions like "is this paper going on a tangent?", and "is the solution to their problem the researchers employed actually interesting to me?", and "is this paper getting too dense for me to understand?", and "is the explanation the paper gives actually intelligible?" (yes, papers do vary in explanatory quality). If the exercise of reading a paper becomes not worth the time, stop reading it and move on.

Sometimes it's worth jumping into skim-reading mode for a bit if something's irrelevant etc to see if it gets better.

But I don't understand something!

This is a normal part of reading a paper. This can be for a number of reasons:

  1. The paper is bad
  2. The paper is good, but is terrible at explaining things
  3. The paper contains more maths than explanation of the variables contained therein
  4. I'm lacking some prerequisite knowledge that the paper doesn't properly explain
  5. Some other issue

It is not always obvious which of these cases I find myself in when I encounter difficulty reading a paper. Nevertheless, I employ a number of strategies to deal with the situation:

  • Reading around: As in most things, reading around the area of the paper that is causing an issue may yield additional information. Sometimes returning to the related works / background / approach section can help.
  • Search for related papers: There are many papers that have been written, so it can be worth going looking for a related paper. It might be a better paper, or be worded differently in a way that makes it easier to understand.
  • Look through the paper's references: This can also be a good way to trace back to the source of an idea. Semantic Scholar's References tab below the abstract lists all the references too, and the related works section of a paper will tell you how each cited work is relevant to the problem, motivations, and subsequent method and results thereof.
  • Look for seminal papers: See above. Finding the original paper on a given idea can help a lot, as it's often explained in much more detail than later papers that assume you've read the so-called seminal work.
  • Web search: For specific terms or concepts. Sometimes just a quick definition is needed. Other times it's more substantial and requires reading an entire separate blog post - compare Attention is all you need with the blog post the illustrated transformer. Each provides a different perspective. In this case I actually read both at the same time to fully understand the topic. Make sure you properly assess anything you find for reliability as usual.

Supervision: It's very unlikely that after all of these steps I'll still be stumped on how to proceed, but it has happened. In these situations it can be extremely helpful to have someone more experienced in the field to discuss with. For me, this is my PhD supervisor Nina.

Whoever they are, keeping in regular contact is best as you work through a project. Frequency varies, but for my PhD supervision this has fallen somewhere between 1 week and 3 weeks between each meeting, and each meeting is no less than an hour long. Their advice and insight can guide your efforts as you progress through a research project.

They will also likely be busy people, so make sure you properly prepare before meeting them. Summarise what you've read and how it relates to your project and what you want to do. Make a list of questions that you want to ask them. Gather your thoughts. This will help you make the most of your discussion with them.

Conclusion

I've outlined my personal process I employ when reading a paper (in perhaps more detail than was necessary). It's designed to save me time and allow me to cover ground relatively quickly (though quickly is still a relative term, as in a worst-case with a completely new broad field it can take weeks to cover it enough to gain a good understanding thereof).

This is my process: you need to find something that works for you. It's okay if this takes time. Maybe lots of time... but you'll get there in the end. The more you read, the more you'll get an instinctive sense of the stuff I ramble about here. My method isn't perfect either - I'm still learning, so my process will likely evolve over time.

If you've got any comments or questions, do leave them in the comments section below and I'll do my best to answer them.

PhD Update 16: Realising the possibilities of the past

Hey there! It's been a while. As I explained in a previous post, I've been adjusting to a new part-time position in my department! In short, posts will continue on here but may be slightly less frequent than before.

Before we begin, here is the customary list of previous posts in this series:

A lot has happened since last time! I'm going to split this up into sections as I have in previous posts, but in summary the noodling around I've done since the last post has really paid off.

Publication

After another round of revisions, my journal article on my social media research has now been accepted and published!

View it here:

https://doi.org/10.1016/j.cageo.2023.105405

It's my first published journal article, so I am quite excited about it :D

If I haven't already (I'm writing this post first, as I only got notified about its publication while I was writing this post!), I'll definitely be making another post about it here!

Rainfall Radar

The main thing I've focused on a lot is my rainfall radar model, and beating it into some kind of shape that actually works. This has not been a simple process, but I think that this graph speaks for itself:

It works! I can scarcely believe that after nearly 3 and a half years it finally produces a useful output. This feels like a big personal achievement - as those who have been following this series will know, I have tried many different things before reaching this point.

The first question I know will be on your mind is "How did we get here?", and the answer to this lies in 2 things:

  1. Connectedness
  2. Resolution and boundary difficulties

Let's tackle connectedness first. By connectedness, I mean specifically parts of a given model connecting back on themselves. This is important for multiple reasons, not least because it reduces the effect of the vanishing gradient problem. This is also the reason that ResNet adds skip-connections, as then the gradients used to update the weights during backpropagation can flow all the way up to the top of the model. Without this, the weights in the initial layers can't update, and information is then lost before it even makes it very far into the model.
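
To make the vanishing gradient point concrete, here is a toy sketch in Python. It is not ResNet and not the model from this post: it assumes a chain of scalar tanh layers with a single shared weight (the `input_gradient` function and its parameters are made up for illustration), and compares how much gradient reaches the input with and without an identity skip path around each layer.

```python
import math


def input_gradient(depth: int, weight: float, x: float, skip: bool) -> float:
    """Return d(output)/d(input) for a chain of `depth` scalar tanh layers.

    Without skips each layer computes h = tanh(weight * h).
    With skips each layer computes h = h + tanh(weight * h).
    """
    grad = 1.0
    h = x
    for _ in range(depth):
        act = math.tanh(weight * h)
        # Derivative of tanh(weight * h) with respect to h.
        local = weight * (1.0 - act ** 2)
        if skip:
            # The identity path contributes a constant 1 to every factor.
            grad *= 1.0 + local
            h = h + act
        else:
            grad *= local
            h = act
    return grad


plain = input_gradient(depth=20, weight=0.5, x=1.0, skip=False)
residual = input_gradient(depth=20, weight=0.5, x=1.0, skip=True)
print(f"plain chain: {plain:.3e}, with skip connections: {residual:.3e}")
```

In the plain chain every factor is at most 0.5, so after 20 layers almost nothing is left of the gradient. With the skip path every factor is at least 1, so the gradient cannot shrink on its way back to the early layers.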

From what I can tell, this is the primary problem with the models that I have tried so far, in one way or another. Autoencoders, for example, do not have very much connectedness, making it difficult for backpropagation to do its thing.... especially when the task at hand is a significantly difficult one.

The solution I ended up employing here was DeepLabV3+. It uses an image encoder first, then a PSPNet-style pyramid scheme for analysing multiple scales of features, and finally an image segmentation head. It also has a skip connection from halfway up the image encoder to the segmentation head, further increasing the connectedness of the model.

Once I had something from the initial DeepLabV3+ model, improving the output was a matter of increasing the resolution of the output and adjusting things so that the model can better resolve the boundaries of the water / no water regions I was asking it to predict.

To this end, with the tweaks I describe, DeepLabV3+ has turned out to be the ideal model for the task. The changes and hyperparameters I used can be summarised like so:

  • Loss function: Add dice loss to cross-entropy loss.
  • Learning rate: Reduce to 0.00001 from the default 0.001
  • Upscaling: Hack the model to upscale the input/outputs. This increases the resolution the model operates at, improving performance significantly
  • Removing isolated pixels: I removed water pixels with no neighbour pixels being water. This is like a band-aid on the real problem (a bad physics-based model run), but it did help.
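
Two of the items above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the code used for the model: it assumes a binary water / no water mask, binary (rather than categorical) cross-entropy, an unweighted sum of the two losses, and an 8-pixel neighbourhood for the isolated pixel check. All function names are hypothetical.

```python
import numpy as np


def dice_loss(y_true, y_pred, eps=1e-6):
    """Soft Dice loss for a binary mask; y_pred holds probabilities in [0, 1]."""
    intersection = np.sum(y_true * y_pred)
    total = np.sum(y_true) + np.sum(y_pred)
    return float(1.0 - (2.0 * intersection + eps) / (total + eps))


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy, clipping predictions to avoid log(0)."""
    p = np.clip(y_pred, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))


def combined_loss(y_true, y_pred):
    """Cross-entropy plus Dice (assumed here to be a plain, unweighted sum)."""
    return binary_cross_entropy(y_true, y_pred) + dice_loss(y_true, y_pred)


def remove_isolated_pixels(mask):
    """Clear water pixels that have no water pixel among their 8 neighbours."""
    mask = np.asarray(mask, dtype=bool)
    height, width = mask.shape
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    neighbours = np.zeros(mask.shape, dtype=int)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue  # Skip the pixel itself.
            # Shifted view of the padded mask, aligned with the original grid.
            neighbours += padded[1 + dy : 1 + dy + height, 1 + dx : 1 + dx + width]
    return mask & (neighbours > 0)
```

The Dice term rewards overlap with the water class directly, which is why it is a common companion to cross-entropy when one class is rare.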

With this working model, I can now reasonably consider the other avenues I was exploring in part 15 of this series as dead ends, though I have learned a lot by investigating them.

I consider this model to be a proof of concept only. The idea needs a lot more adjusting and improving before it will be actually useful to anyone. Still, I think it might improve short-medium term (i.e. up to a few hours in advance) flooding forecasts with a lot more work. While my focus currently is writing up my thesis (see below), I do plan on continuing to work on this and other research projects on the side on a long-term basis. One of my many goals is to wrangle this model into something more than just a proof of concept, and somehow measure its effectiveness more precisely.

The long road to thesis and beyond

With my funding for the main research period of my PhD at an end, my focus has been shifting to the writing of my thesis. The feeling is actually quite surreal - for the longest time writing my thesis has been a mystical objective far off in the distance in a blurry haze, but now the details are very much resolving into something more tangible.

So far I have a draft chapter based on my recent journal article, and part of a chapter on the rainfall radar model I've talked about briefly above. I also have part of an introduction and a background section, but these require significant reworking because I wrote them ages ago (they are just bad).

The plan is to have my thesis complete by December 2023, potentially giving the required 3 months submission notice in ~September 2023 - depending on how things go with writing.

When my PhD comes to a close, that will also mean the end of this series of blog posts. I think this is the longest series of blog posts I've ever posted here, and certainly one of the most personal. This does not mean the end of posts on here about my research though, as I plan to continue blogging about it on here. The form this will take is likely to be similar to the form that the posts for my PhD have taken.

As I don't currently know what form my research will take after my PhD, I cannot say what will happen about blogging about it, only that it will happen ;-)

Thanks for sticking with me throughout this long and at times difficult process - it's been a wonderful and wild ride! Even as this part of my journey is beginning to come to a close, I really appreciate all the help and support everyone has given me throughout the process.

I'll try my best to keep up with this series again, now that I've had some time to adjust to being an Experimental Officer. Until next time!

Achievement get: Experimental Officer position!

Hey there, everyone! It's been a while since I last posted here. Rest assured that I haven't abandoned this blog.

What I have been doing is adjusting to a new job! I haven't talked about it yet because this adjustment process takes time, and I needed space to do this process at my own pace. While I'm still adjusting and will be for some time, I'm now at a point where I feel comfortable to share with everyone what I've been up to.

My job title is Experimental Officer in the Department of Computer Science at the University of Hull - the same place as I've been doing my PhD (as is listed on the homepage of this site). I started this in January 2023.

This is an academic position, and it consists of academic teaching support (i.e. supporting lecturers in the delivery of their course content) combined with systems administration and managing the use of specialist equipment.

This role feels ideal for me at this time, as it's a mix of academicy teaching support stuff and a technical role. It also has some teaching on the side, which is something I have very limited experience with, so it's a great chance to learn.

I'm doing this role part time at the moment while I finish my PhD.

In practical terms for this blog, it means that my posting frequency will be somewhat lower than it has been in times past, as you may have been noticing in the months leading up to my three month break. I'm going to personally aim for a blog post every 2 weeks, but it might be longer or shorter than that depending on how much energy I have to write posts and whether there's something I really want to talk about without waiting.

Given that I've been posting here since June 2014 (~9 years!), this blog is important to me, and so I thought it would be fitting to let you who read this blog know first about this news. I'll also continue to document my journey through the world of computer science into the future.

This will include continuing to blog about my research post-PhD (no, I still haven't had enough of it :P) in some sort of sequel series to my PhD update blog posts that I've been writing. It will also include the random blog posts you've surely come to expect from me about neat things I've discovered and interesting things I've done.

Another milestone I am about to hit soon is my first published journal article! It has been accepted for publication, so I'm currently working through that process. The title will be "Real-time social media sentiment analysis for rapid impact assessment of floods", and I will definitely be posting here as soon as it's published.

I'd like to thank everyone who has supported me in this journey so far up until this point. I really appreciate it!

I hope you'll continue to stick around here with me as I move forwards into this new era!

PhD Update 15: Finding what works when

Hey there! Sorry this post is a bit late again: I've been unwell.

In the last post, I revisited my rainfall radar model, and shared how I had switched over to using .tfrecord files to store my data, and the speed boost to training I got from doing that. I also took an initial look at applying contrastive learning to my rainfall radar problem. Finally, I looked a bit into dimensionality reduction algorithms - in short: use UMAP (paper).

Before we continue, here's the traditional list of previous posts:

There's also an intermediate post about my PhD entitled "A snapshot into my PhD: Rainfall radar model debugging", which I posted since the last PhD update blog post. If you're interested in details of the process I go through stumbling around in the dark doing research on my PhD, do give it a read:

https://starbeamrainbowlabs.com/blog/article.php?article=posts/519-phd-snapshot.html

Since last time, I have been noodling around with the rainfall radar dataset and image segmentation models to see if I can find something that works. The results are mixed, but the reasons for this are somewhat subtle, so I'll explain those below.

Social media journal article

Last time I mentioned that I had written up my work with social media sentiment analysis models, but I had yet to finalise it to send to be published. This process is now completed, and it's currently under review! It's unlikely I'll have anything more to share on this for a good 3-6 months, but know that it's a process happening in the background. The journal I've submitted it to is Elsevier's Computers and Geosciences, though of course since it's under review I don't yet know if they will accept it or not.

Image segmentation, sort of

It doesn't feel like I've done much since last time, but looking back I've done a lot, so I'll summarise what I've been doing here. Essentially, the bulk of my work has been into different image segmentation models and strategies to see what works and what doesn't.

The specific difficulty here is that while I'm modelling my task of going from rainfall radar data (plus heightmap) to water depth in 2 dimensions as an image segmentation task, it's not exactly image segmentation. The output is significantly different in nature to the input I'm feeding the model, which significantly increases the difficulty of the learning task I'm attempting to get the model to work on.

As a consequence of this, it is not obvious which model architecture to try first, or which ones will perform well or not, so I've been trying a variety of different approaches to see what works and what doesn't. My rough plan of model architectures to try is as follows:

  1. Split: contrastive pretraining
  2. Mono / autoencoder: encoder [ConvNeXt] → decoder [same as #1]
  3. Different loss functions:
    • Categorical Crossentropy
    • Binary Crossentropy
    • Dice
  4. DeepLabV3+ (almost finished)
  5. Encoder-only [ConvNeXt/ResNet/maybe Swin Transformer] (pending)

Out of all of these approaches, I'm almost done with DeepLabV3+ (#4), and #5 (encoder-only) is the backup plan.

My initial hypothesis was that a more connected image segmentation model such as the pre-existing PSPNet, DeepLabV3+, etc would not be a suitable choice for this task, since regular image segmentation models such as these place emphasis on the output being proportional to the input. Hence, I theorised that an autoencoder-style model would be the best place to start - especially so since I've fiddled around with an autoencoder before, albeit for a trivial problem.

However, I discovered with approaches #1 and #2 that autoencoder-style models with this task have a tendency to get 'lost amongst the weeds', and ignore the minority class:

To remedy this, I attempted to use a different loss function called Dice, but this did not help the situation (see the intermediary A snapshot into my PhD post for details).

I ended up cutting the contrastive pretraining temporarily (#1), as it added additional complexity to the model that made it difficult to debug. In the future, when the model actually works, I intend to revisit the idea of contrastive pretraining to see if I can boost the performance of the working model at all.

If there's one thing that doing a PhD teaches you, it's to keep on going in the face of failure. I should know: my PhD has been full of failed attempts. A saying I found online (I forget who said it, unfortunately) definitely rings true here: "The difference between the novice and the master is that the master has failed more times than the novice has tried"

In the spirit of this, this brings us to the next step of proving (or disproving) that this task is possible, which is to try a pre-existing image segmentation model to see what happens. After some research (again, see the intermediary A snapshot into my PhD post for details), I discovered that DeepLabV3+ is the current state of the art for image segmentation.

After verifying that DeepLabV3+ actually works with its intended dataset, I've now just finished adapting it to take my rainfall radar (plus heightmap) dataset as an input instead. It's currently training as I write this post, so I'll definitely have some results for next time.

The plan from here depends on the performance of DeepLabV3+. Should it work, then I'm going to first post an excited social media post, and then secondly try adding an attention layer to further increase performance (if I have time). CBAM will probably be my choice of attention mechanism here - inspired by this paper.

If DeepLabV3+ doesn't work, then I'm going to go with my backup plan (#5), and quickly try training a classification-style model that takes a given area around a central point, and predicts water / no water for the pixel in the centre. Ideally, I would train this with a large batch size, as this will significantly boost the speed at which the model can make predictions after training. In terms of the image encoder, I'll probably use ConvNeXt and at least one other image encoder for comparison - probably ResNet - just in case there's a bug in the ConvNeXt implementation I have (I'm completely paranoid haha).
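
The data preparation for such a classification-style model could look something like the sketch below. It is a simplified illustration, not the pipeline from this project: it assumes a single-channel 2D input grid (the real input also includes a heightmap), zero padding at the edges, and the hypothetical helper names `extract_patch` and `make_samples`.

```python
import numpy as np


def extract_patch(grid, row, col, radius):
    """Return the (2 * radius + 1)-square window centred on (row, col).

    Pixels that would fall outside the grid are filled with zeros.
    """
    padded = np.pad(grid, radius, mode="constant", constant_values=0)
    # After padding, the original (row, col) sits at (row + radius, col + radius),
    # so the window starts at (row, col) in padded coordinates.
    return padded[row : row + 2 * radius + 1, col : col + 2 * radius + 1]


def make_samples(rainfall, water_mask, radius):
    """Pair the patch around every pixel with that pixel's water / no water label."""
    height, width = water_mask.shape
    patches = np.stack([
        extract_patch(rainfall, r, c, radius)
        for r in range(height)
        for c in range(width)
    ])
    # Row-major flattening matches the loop order used to build `patches`.
    labels = water_mask.reshape(-1)
    return patches, labels
```

Each patch becomes one training example, so a single image yields height × width examples that can be grouped into large batches.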

Ideally I want to get a basic grasp on a model that works soon though, and leave most of the noodling around with improving performance until later, as if at all possible it would be very cool to attend IJCAI 2023. At this point it feels unlikely I'll be able to scrape something together for submitting to the main conference (the deadline for full papers is 18th January 2023, abstracts by 11th January 2023), but submitting to an IJCAI 2023 workshop is definitely achievable I think - they usually open later on.

Long-term plans

Looking into the future, my formal (and funded) PhD research period is coming to an end this month, so I will be taking on some half time work alongside my PhD - I may publish some details of what this entails at a later time. This does not mean that I will be stopping my PhD, just doing some related (and paid) work on the side as I finish up.

Hopefully, in 6 months time I will have cracked this rainfall radar model and be a good way into writing my thesis.

Conclusion

Although I've ended up doing things a bit back-to-front on this rainfall radar model (doing DeepLabV3+ first would have been a bright idea), I've been trying a selection of different model architectures and image segmentation models with my rainfall radar (plus heightmap) to water depth problem to see which ones work and which ones don't. While I'm still in the process of testing these different approaches, it will not take long for me to finish this process.

Between now and the next post in this series in 2 months time, I plan to finish trying DeepLabV3+, and then try an encoder-only (image classification) style model should that not work out. I'm also going to pay particularly close attention to the order of my dimensions and how I crop them, as I found yesterday that I mixed up the order of the width and height dimensions, feeding one of the models I've tested data in the form [batch_size, width, height, channels] instead of [batch_size, height, width, channels] as you're supposed to.
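A minimal sketch of the sort of sanity check that catches this kind of mix-up early. The function name and the sizes in the usage note are hypothetical, and it only looks at the shape, so it can't tell width from height if the two happen to be equal:

```python
def check_nhwc(shape, height, width, channels):
    """Raise ValueError unless shape is [batch_size, height, width, channels]."""
    if len(shape) != 4:
        raise ValueError(f"expected 4 dimensions, got {len(shape)}")
    _, h, w, c = shape  # the batch size is allowed to be anything
    if (h, w, c) != (height, width, channels):
        # A swap is only detectable when height and width differ
        swapped = (h, w) == (width, height) and height != width
        hint = " (width and height look swapped)" if swapped else ""
        raise ValueError(
            f"expected [batch, {height}, {width}, {channels}], "
            f"got {list(shape)}{hint}"
        )
    return True
```

Calling something like `check_nhwc(batch.shape, height=100, width=150, channels=8)` on the first batch out of the dataset pipeline is cheap, and testing with a deliberately non-square area makes a swap show up straight away.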

If I can possibly manage it I'm going to begin the process of writing up my thesis by writing a paper for IJCAI 2023, because it would be very cool to get the chance to go to a real conference in person for the first time.

Finally, if anyone knows of any good resources on considerations in the design of image segmentation heads for AI models, I'm very interested. Please do leave a comment below.

A snapshot into my PhD: Rainfall radar model debugging

Hello again!

The weather is cold over here right now, and it's also been a while since I posted about my PhD in some detail, so I thought while I get my thoughts in order for a new PhD update blog post I'd give you a snapshot into what I've been doing in the last 2 weeks.

If you're not interested in nitty gritty details, I'll be posting a higher-level summary soon in my next PhD update blog post.

For context, since wrapping up (for now, more on this in the PhD update blog post) the social media side of my PhD, I've returned to the rainfall radar half of my PhD and have been implementing and debugging several AI models to predict water depth in real time. If you're thinking of doing a PhD yourself, this is in no way representative of what a PhD is like! Each PhD is different - mine just happens to include lots of banging my head against a wall debugging.

To start with, recently I've found and fixed a nasty bug in the thresholding function, which defaulted to a value of 1.5 instead of 0.1. My data is stored in .tfrecord files with pairs of rainfall radar and water depth 'images'. When the model reads these in, it will 'threshold' the water depth: for each pixel setting a value of 0 for pixels with water depth lower than a given threshold, and 1 for pixels above.
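To make the effect of that default concrete, here's a minimal sketch of the thresholding step in plain Python (the real thing operates on tensors inside the dataset pipeline). The function name and sample depths are made up for illustration, and treating a depth exactly equal to the threshold as 'no water' is an assumption:

```python
def threshold_depth(depth, threshold=0.1):
    """Binarise a 2D grid of water depths in metres: 1 = water, 0 = no water."""
    return [[1 if value > threshold else 0 for value in row] for row in depth]

# Hypothetical 2x3 water depth 'image', in metres
depth = [
    [0.00, 0.05, 0.30],
    [0.12, 1.20, 2.00],
]

print(threshold_depth(depth))                 # → [[0, 0, 1], [1, 1, 1]]
print(threshold_depth(depth, threshold=1.5))  # → [[0, 0, 0], [0, 0, 1]]
```

With the 1.5 metre threshold almost every pixel ends up labelled as 'no water', which makes the task look far easier than it really is.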

The bug in question manifested itself as an accuracy of 99%/100%, which is extremely unlikely given the nature of the task I'm asking it to predict. After some extensive debugging (including implementing a custom loss function that wrapped several different other loss functions, though not all at the same time), I found that the default value for the threshold was 1.5 (metres) instead of what it should have been - 0.1 (again, metres).

After fixing this, the accuracy lowered to 83% - the proportion of pixels in the input that were not water.

The model in question predicts water depth in 2D, taking in rainfall radar data (also in 2D) as an input. It uses ConvNeXt as an encoder, and an inverted ConvNeXt as the decoder. For the truly curious, the structure of this model as of the end of this section of the post can be found in the summary.txt file here.

Although I'd fixed the bug, I still had a long way to go. An accuracy of 83% is, in my case, no better than always guessing 'no water': the model was unfortunately completely ignoring the minority class.
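To illustrate why 83% is nothing to celebrate here, this sketch scores a 'model' that predicts no water for every pixel. The labels are a made-up stand-in with the same 83% / 17% split, and the helper names are hypothetical:

```python
def accuracy(predicted, actual):
    """Fraction of pixels where the prediction matches the label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def recall(predicted, actual, positive=1):
    """Fraction of truly positive pixels that were predicted as positive."""
    hits = [p == positive for p, a in zip(predicted, actual) if a == positive]
    return sum(hits) / len(hits)

# Hypothetical flattened labels: 83 'no water' pixels and 17 'water' pixels
labels = [0] * 83 + [1] * 17
always_dry = [0] * len(labels)  # a 'model' that never predicts water

print(accuracy(always_dry, labels))  # → 0.83
print(recall(always_dry, labels))    # → 0.0
```

Accuracy alone hides the problem, whereas something like recall on the water class makes it obvious that the minority class is being ignored.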

In an attempt to get it to stop ignoring the minority class, I tried (in no particular order):

  • Increasing the learning rate to 0.1
  • Summing the output of the loss function instead of doing tf.math.reduce_sum(loss_output) / batch_size, as is the default
  • Multiplying the rainfall radar values by 100 (standard deviation of the rainfall radar data is in the order of magnitude of 0.01, mean 0.0035892173182219267, min 0, max 1)
  • Adding an extra input channel with the heightmap (cross-entropy, loss: 0.583, training accuracy: 0.719)
  • Using the dice loss function instead of one-hot categorical cross-entropy
  • Removing the activation function (GeLU in my case) from the last few layers of the model (helpful, but I tried this with dice and not cross entropy loss, and switching back would be a bit of a pain)

Unfortunately, none of these things fixed the underlying issue of the model not learning anything.

Dice loss was an interesting case - I have some Cool Graphs to show on this:

Here, I compare removing the activation function (GeLU) from the last few layers (ref) with not removing the activation function from the last few layers. Clearly, removing it helps significantly, as the loss actually has a tendency to go in a downward direction instead of rocketing sky high.

This shows the accuracy and validation accuracy for the model without the activation function in the last few layers. Unfortunately, the Dice loss function has some way to go before it can compete with cross-entropy. I speculate that while dice is cool, it isn't as useful on its own in this scenario.
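For anyone unfamiliar with it, dice loss measures the overlap between the predicted and true water pixels. Below is a minimal sketch of one common formulation on flattened pixel values, in plain Python. The smoothing term is an assumption (it's a popular variant that avoids dividing by zero), and this isn't necessarily the exact version I trained with:

```python
def dice_loss(predicted, target, smooth=1.0):
    """1 - Dice coefficient for flattened per-pixel water probabilities.

    predicted: floats in [0, 1]; target: 0 / 1 labels of the same length.
    """
    intersection = sum(p * t for p, t in zip(predicted, target))
    total = sum(predicted) + sum(target)
    return 1.0 - (2.0 * intersection + smooth) / (total + smooth)

target = [0, 1, 1, 0]
print(dice_loss([0, 1, 1, 0], target))  # → 0.0 (perfect overlap)
print(dice_loss([0, 0, 0, 0], target))  # roughly 0.67 (no water predicted at all)
```

Unlike accuracy, predicting 'no water' everywhere is punished heavily here, which is why dice loss looked like an attractive fix for the class imbalance.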

It would be cool to compare having no activation function in the last few layers and using cross-entropy loss to my previous attempts, but I'm unsure if I'll have time to noodle around with that.

In terms of where I got the idea to use the dice loss function from, it's from this GitHub repo and its associated paper: https://github.com/shruti-jadon/Semantic-Segmentation-Loss-Functions.

It has a nice summary of loss functions for image segmentation and their uses / effects. If/when DeepLabV3+ actually works (see below) and I have some time, I might return to this to see if I can extract a few more percentage points of accuracy from whatever model I end up with.

DeepLabV3+

Simultaneously with the above, I've been reading into existing image segmentation models. Up until now, my hypothesis has been that a model well connected with skip connections, such as U-Net, would not be ideal in this situation. The input (rainfall radar) is drastically different from the output (water depth), and skip connections encourage the output to be more similar to the input, which is not really what I want.

Now, however, I am (finally) going to test this (long running) hypothesis to see if it's really true. To do this, I needed to find the existing state-of-the-art image segmentation model. To summarise long hours of reading, I found the following models:

  • SegNet (bad)
  • FCN (bog standard, also bad, maybe this paper)
  • U-Net (heard of this before)
  • PSPNet (like a pyramid structure, was state of the art but got beaten recently)
  • DeepLabV3 (PSPNet but not quite as good)
  • DeepLabV3+ (terrible name to search for but the current state of the art, beats PSPNet)

To this end, I've found myself a DeepLabV3+ implementation on keras.io (the code is terrible and so full of spaghetti I could eat it for breakfast and still have some left over) and I've tested it with the provided dataset, which seems to work fine:

....though there seems to be a bug in the graph plotting code, in that it doesn't clear the last line plotted.

Not sure if I can share the actual segmentations that it produces, but I can say that while they are a bit rough around the edges, it seems to work fine.

The quality of the segmentation is somewhat lacking given the training data only consisted of ~1k images and it was only trained for ~25 epochs. It has ~11 million parameters or so. I'm confident that more epochs and more data would improve things, which is good enough for me, so my next immediate task will be to push my own data through it and see what happens.

I hypothesise that models like DeepLabV3+ etc also bring a crucial benefit to the table: training stability. Given the model's greater interconnectedness through skip connections etc, backpropagation has less far to travel overall to cover the entire model and update the weights.

If you're interested, a full summary of this DeepLabV3+ model can be found here: https://starbeamrainbowlabs.com/blog/images/20221215-DeeplabV3+_summary.txt

If it does work, I've recently implemented a cool attention mechanism called Convolutional Block Attention Module (CBAM), which looks seriously cool. I'd like to try adding it to the DeepLabV3+ model to see if it increases the accuracy of the output.

Finally, a backup plan is in order in case it doesn't work. My plan is to convolve over the input rainfall radar data and make a prediction for a single water depth pixel at a time, using ConvNeXt as an image encoder backbone (though I may do tests with other backbones such as its older cousin ResNet simultaneously just in case, see also my post on image encoders), keeping the current structure of 7 channels rainfall radar + 1 channel heightmap.

While this wouldn't be ideal (given you'd need to push through multiple batches just to get a single 2D prediction), the model to make such predictions would be simpler and more likely to work right off the bat.
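As a rough illustration of the backup plan, the sketch below cuts a square window out of a 2D grid around each pixel in turn, with each window becoming one input to the classifier. It's plain Python on a single channel to keep it short (the real input has 8 channels), and the function names, the zero padding at the edges and the window radius are all assumptions:

```python
def extract_patch(grid, row, col, radius, pad_value=0.0):
    """Return the (2 * radius + 1)-wide square window centred on (row, col).

    Positions that fall outside the grid are filled with pad_value.
    """
    height, width = len(grid), len(grid[0])
    patch = []
    for r in range(row - radius, row + radius + 1):
        patch_row = []
        for c in range(col - radius, col + radius + 1):
            inside = 0 <= r < height and 0 <= c < width
            patch_row.append(grid[r][c] if inside else pad_value)
        patch.append(patch_row)
    return patch

def iter_patches(grid, radius):
    """Yield ((row, col), patch) for every pixel, one prediction target each."""
    for row in range(len(grid)):
        for col in range(len(grid[0])):
            yield (row, col), extract_patch(grid, row, col, radius)
```

A full 2D prediction then needs one patch per output pixel, which is exactly why this approach means pushing so many batches through the model.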

Conclusion

I've talked a bunch about my process and thoughts on debugging my rainfall radar to water depth model and trying to get it to work. Taking a single approach at a time to problems like this isn't usually the best idea, so I'm also trying something completely new in DeepLabV3+ to see if it will work.

I also have a backup plan in a more traditional image encoder-style model that will predict a single pixel at a time. As I mentioned at the beginning of this blog post, every PhD is different, so this is not representative of what you'd be doing on yours if you decide to do one / are doing one / have done one. If you are thinking of doing a PhD, please do get in touch if you're interested in hearing more about my experiences doing one and what you could expect.

PhD Update 14: An old enemy

Hello again! This post is rather late due to one thing and another, but I've finally gotten around to writing it. In the last post, I talked about the CLIP model I trained to predict sentiment using both tweets and their associated images in pairs, and the augmentation system I devised to increase the size of the dataset. I also talked about the plan for a next-generation rainfall radar model, and a journal article I'm writing.

Before we begin though, let's start with the customary list of previous posts:

Since that last post, I've pretty much finished my initial draft of the journal article - though it is rather overlength, and I've also made a significant start on the rainfall radar model, which is what I will be focusing on in this blog post as there isn't all that much to talk about with the journal article at the moment (I'm unsure how much I'm allowed to share). I will make a separate post when I (finally) publish the journal article.

Rainfall radar model, revisited

As you might remember, I have dealt with rainfall radar data before (exhibit A, B, C, D), and it didn't go too well. After the part of my PhD on social media, I have learnt a lot about AI models and how to build them. I have also learnt a lot about data preprocessing. With all this in hand, I am now better equipped to do battle once more with an old enemy: the 1.5M time step rainfall radar dataset.

For those who are somewhat confused, the dataset in question is in 2 dimensions (i.e. like greyscale images). It is comprised of 3 things:

  • A heightmap
  • Rainfall radar data every 5 minutes
  • Water depth information, calculated by HAIL-CAESAR and binarised to water / no water for each pixel with a simple threshold

Given that the rainfall radar dataset has an extremely restrictive licence, I am unfortunately unable to share sample images from the dataset here.

My first objective was to tame the beast. To do this, I needed to convert the data to .tfrecord.gz files (applying all the preprocessing transformations ahead of time) instead of the split .asc.stream.gz and .jsonl.gz files I was using. At first, I thought I could use a TextLineDataset (it even supports reading from gzipped files!), but the snag here is that Tensorflow does not have a JSON parsing function.

The reason this is a problem is due to the new way I am parsing my dataset. Before, I used tf.data.Dataset.from_generator() and a regular Python function, but I have since discovered that there is a much more efficient way of doing things. The key revelation here was that Tensorflow does not just simply execute e.g. your custom layers you implement and call .call() each time. No, instead it calls it once and constructs a graph of operations, before then compiling this into machine code that the GPU can understand. The implication of this is twofold:

  1. It is significantly more efficient to take advantage of Tensorflow's execution graph functionality where available
  2. Once (any part of) your dataset becomes a Tensor, it must stay a Tensor

This not only goes for custom layers, loss functions, etc, but it also goes for the dataset pipeline too! I strongly recommend using the .map() function on tf.data.Dataset with a tf.function. Avoid .from_generator() if you can possibly help it!

To take advantage of this, I needed to convert my dataset to a set of .tfrecord.gz files (to support parallel reading, esp. since Viper has a high read latency). Given my code to parse my dataset is in Javascript/Node.js, I first tried using the tfrecord npm package to write .tfrecord files in Javascript directly. This did not work out though, as it kept crashing. I also tried variant packages like tfrecords and tfrecord-stream and more, but none of them worked. In the end, I settled on a multi-step process:

  1. Convert split data into .jsonl.gz files, 4K records per file. Do all preprocessing / correction steps here.
  2. Make all records unique: hash all records in all files, mark records for deletion, then delete them from files
  3. Recompress .jsonl.gz files to 4K records per file
  4. Convert .jsonl.gz → .tfrecord.gz with Python child processes managed by Node.js
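The deduplication in step 2 boils down to hashing a canonical form of every record and dropping anything already seen. My actual implementation is in Node.js and marks records for deletion before removing them, so treat the following as a simplified single-pass sketch in Python with hypothetical function names, writing to a single output file:

```python
import gzip
import hashlib
import json

def record_hash(record):
    """Hash a record via a canonical JSON form so that key order doesn't matter."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def deduplicate(filepaths_in, filepath_out):
    """Copy unique records from several .jsonl.gz files into one .jsonl.gz file.

    Returns the number of records kept.
    """
    seen = set()
    kept = 0
    with gzip.open(filepath_out, "wt", encoding="utf-8") as handle_out:
        for filepath in filepaths_in:
            with gzip.open(filepath, "rt", encoding="utf-8") as handle_in:
                for line in handle_in:
                    if not line.strip():
                        continue  # skip blank lines
                    record = json.loads(line)
                    digest = record_hash(record)
                    if digest in seen:
                        continue  # duplicate of an earlier record
                    seen.add(digest)
                    handle_out.write(json.dumps(record) + "\n")
                    kept += 1
    return kept
```

Keeping only the hashes in memory rather than the records themselves is what makes this workable on a dataset of this size.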

Overcomplicated? Perhaps. Do I have a single command I can execute to do all of this? Nope! Does it work? Absolutely :P

With the data converted I turned my attention to the model itself. As I have discussed previously, my current hypothesis is that the previous models failed because the relationship between the rainfall radar and water depth data is non-obvious (and that the model designs were terrible. 5K parameters? hahahaha, 5M parameters is probably the absolute minimum I would need). To this end, I will be first training a contrastive learning model to find relationships between the dataset items. Only then will I train a model to predict water depth, which I'll model as an image segmentation task (I have yet to find a segmentation decoder to implement, so suggestions here are welcome).

The first step here is to implement the contrastive learning algorithm. This is non-trivial however, so I implemented a test model using images from Reddit (r/cats, r/fish, and r/dogs) to test it and test the visualisations that I will require to determine the effectiveness of the model. In doing this, I found that the algorithm for contrastive learning in the CLIP paper (Learning Transferable Visual Models From Natural Language Supervision) was wrong and completely different to that which is described in the code, and I couldn't find the training loop or core loss function at all - so I had to piece together something from a variety of different sources.
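For reference, the general shape of a CLIP-style contrastive loss is a symmetric cross-entropy over the similarity matrix of a batch of pairs. The sketch below is plain Python rather than Tensorflow and is not the implementation I pieced together. The fixed temperature is an assumption (CLIP learns it during training), and all the names are hypothetical:

```python
import math

def l2_normalise(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector]

def cross_entropy(logits_row, target_index):
    """Softmax cross-entropy for one row, via a numerically stable log-sum-exp."""
    peak = max(logits_row)
    log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits_row))
    return log_sum_exp - logits_row[target_index]

def contrastive_loss(embeddings_a, embeddings_b, temperature=0.07):
    """Symmetric contrastive loss where item i of each list forms a matching pair."""
    a = [l2_normalise(v) for v in embeddings_a]
    b = [l2_normalise(v) for v in embeddings_b]
    # logits[i][j] = scaled similarity between item i of a and item j of b
    logits = [
        [sum(x * y for x, y in zip(va, vb)) / temperature for vb in b]
        for va in a
    ]
    n = len(logits)
    loss_a = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    columns = [list(column) for column in zip(*logits)]
    loss_b = sum(cross_entropy(columns[i], i) for i in range(n)) / n
    return (loss_a + loss_b) / 2
```

The loss is low when each item is most similar to its own partner and high when it is more similar to somebody else's, which is what pushes matching pairs together in the embedding space.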

To visualise the model, I needed a new approach. While the loss function value over time plotted on a graph is useful, it's difficult to tell if the resulting embedded representation the model outputs is actually doing what it is supposed to. Reading online, there are 2 ways of visualising embedded representations I've found:

  1. Dimensionality reduction
  2. Parallel coordinates plot

I can even include here a cool plot that demonstrates both of them with the pretrained CLIP model I used in the social media half of my project:

The second one is the easier to explain so I'll start with that. If you imagine that the output of the model is of shape [ batch_size, embedding_dim ] / [ 64, 200 ], then for every record in the dataset we can plot a line across a set of vertical lines, where each vertical line stands for each successive dimension of the embedding. This is what I have done in the plot on the right there.

The plot on the left uses the UMAP dimensionality reduction algorithm (paper), which to my knowledge is the best dimensionality reduction algorithm out there at the moment. For the uninitiated, a dimensionality reduction algorithm takes a vector with many dimensions - such as one with an embedding dimension of size 200 - and converts it into a lower-dimensional value (e.g. in 2 or 3 dimensions most commonly) so that it can be plotted and visualised. This is particularly helpful in AI when you want to check if your model is actually doing what you expect.

I took some time to look into this, as there are a number of other algorithms out there and it seems like it's far too easy to pick the wrong one for the task. In short, there are 3 different algorithms you'll see most often:

  • PCA: Stands for Principal Component Analysis, and while popular it does not support non-linear transformations, and most AI models are non-linear.
  • tSNE: A non-linear alternative (designed for AI applications, in part) that is also rather popular. It does not preserve the global structure of the dataset (i.e. relationships and distances between different values) very well though.
  • UMAP: Stands for Uniform Manifold Approximation and Projection. It is designed as an alternative to tSNE and preserves global structure much better.

Sources for this are at the end of this post. If you're applying PCA or tSNE for dimensionality reduction in an AI context, consider switching it out to UMAP.

In the plot above, it is obvious that the pretrained CLIP model can differentiate between the 3 types of pet that I gave it as a test dataset. The next step was to train a model with the contrastive learning and the test dataset.

To do this, I needed an encoder. In the test, I used ResNetV2, which is apparently an improved version of the ResNet architecture (I have yet to read the paper on it). Since I implemented it though, I have discovered an implementation of the state-of-the-art image encoder ConvNeXt (paper), so I'm using that in the main model. See my recent post on my image captioning project for more details on image encoders, but in short to the best of my knowledge ConvNeXt is the current state of the art.

Anyway, when I plotted the output of this model it gave me this plot:

I notice a few issues with this. Firstly and most obviously, the points are all jumbled up! It has not learnt the difference between cats, fish, and dogs. I suspect this is because the input to the test model I trained got 2 variants of the same image altered randomly in different ways (flipping, hue change, etc) rather than an image and a textual label. I'm not too worried though, 'cause the real model will have 2 different items as inputs - I was avoiding doing extra work here.

Secondly, the parallel coordinates plot does not show a whole lot of variance between the different items. This is more worrying, but I'm again hoping that this issue will fix itself when I give the model 'real pairs' of rainfall radar <-> water depth images (with the heightmap thrown in there somewhere probably, I haven't decided yet).

Finally, I plotted a UMAP graph with completely random points to ensure it represented them properly:

As you can see, it plots them in a roughly spherical shape with no clear form or separation between the points. I'm glad I did this, because at first I was passing the labels to the UMAP plotter in the wrong way, and it instead artificially moved the points into groups.

With the test model done, I have moved swiftly on to (pretraining) the actual model itself. This is currently underway so I don't have anything to show just yet (it is still training and I have yet to implement code to plot the output), but I can say that thanks to my realisations about Tensorflow graph execution and tensors, I'm seeing a GPU utilisation of 95% and above at all times :D

Conclusion

I've got a journal article written, but it's overlength so my job there isn't quite done just yet. When it is published, I will definitely make a dedicated post here!

Now, I have moved from writing to implementing a new model to tackle the rainfall radar part of my project. By using contrastive learning, I hope to enable the model to learn the relationship between the rainfall radar data and the water depth information. Once I've trained a contrastive learning model, I'll attach and train another model for image segmentation to predict the water depth information.

If you know of any state-of-the-art image segmentation decoder AI architectures, please leave a comment below. Bonus points if I can configure it to have >= 5M parameters without running out of memory. I'm currently very unsure what I'm going to choose.

Additionally, if you have any suggestions for additional tests I can do to verify my contrastive learning model is actually learning something, please leave a comment below also. The difficulty is that while the loss value goes down, it's extremely difficult to tell whether what it's learning is actually sensible or not.

PhD Aside 2: Jupyter Lab / Notebook First Impressions

Hello there! I'm back with another PhD Aside blog post. In the last one, I devised an extremely complicated and ultimately pointless mechanism by which multiple Node.js processes can read from the same file handle at the same time. This post hopefully won't be quite as useless, as it's a cross with the other reviews / first impressions posts I've made previously.

I've had Jupyter on my radar for ages, but it's only very recently that I've actually given it a try. Despite being almost impossible to spell (though it does appear to be getting easier with time), it's both easy to install and extremely useful when plotting visualisations, so I wanted to talk about it here.

I tried Jupyter Lab, which is apparently more complicated than Jupyter Notebook. Personally though I'm not sure I see much of a difference, aside from a file manager sidebar in Jupyter Lab that is rather useful.

(Above: A Jupyter Lab session of mine, in which I was visualising embeddings from a pretrained CLIP model.)

Jupyter Lab is installed via pip (pip3 for apt-based systems): https://jupyter.org/install. Once installed, you can start a server with jupyter-lab in a terminal (or command line), and then it will automatically open a new tab in your browser that points to the server instance (http://localhost:8888/ by default).

Then, you can open 1 or more Jupyter Notebooks, which seem to be regular files (e.g. Javascript, Python, and more) but are split into 'cells', which can be run independently of one another. While these cells are usually run in order, there's nothing to say that you can't run them out of order, or indeed the same cell over and over again as you prototype a graph.

The output of each cell is displayed directly below it. Be that a console.log()/print() call or a graph visualisation (see the screenshot above), it seems to work just fine. It also saves the output of a cell to disk alongside the code in the Jupyter Notebook, which can be a double-edged sword: on the one hand, it's very useful to have the plot and other output be displayed to remind you what you were working on, but on the other hand if the output somehow contains sensitive data, then you need to remember to clear it before saving & committing to git each time, which is a hassle. Similarly, every time the output changes the notebook file on disk also changes, which can result in unnecessary extra changes committed to git if you're not careful.
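Since a notebook file is just JSON under the hood, clearing the saved output can be scripted rather than done by hand each time. This is a minimal sketch, assuming the version 4 notebook format (a top-level "cells" list where code cells carry "outputs" and "execution_count"). The function names are hypothetical:

```python
import json

def strip_outputs(notebook):
    """Remove saved outputs and execution counts from a parsed notebook dict."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook

def strip_outputs_file(filepath):
    """Clear the saved outputs of a .ipynb file in place."""
    with open(filepath, "r", encoding="utf-8") as handle:
        notebook = json.load(handle)
    strip_outputs(notebook)
    with open(filepath, "w", encoding="utf-8") as handle:
        json.dump(notebook, handle, indent=1)
        handle.write("\n")
```

Running something like this before each commit (e.g. from a git pre-commit hook) keeps both sensitive output and noisy diffs out of the repository, though it does nothing for the values of variables held in memory while the notebook is running.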

In the same vein, I have yet to find a way to define a variable in a notebook file whose value is not saved along with the notebook file, which I'd rather like since e.g. the tweets I work with for the social media side of my PhD are considered sensitive information, and so I don't want to commit them to a git repository which will no doubt end up open-source.

You can also import functions and classes from other files. Personally, I see Jupyter notebooks to be most useful when used in conjunction with an existing codebase: while you can put absolutely everything in your Jupyter notebook, I wouldn't recommend it as you'll end up with spaghetti code that's hard to understand or maintain - just like you would in a regular codebase in any other language.

Likewise, I wouldn't recommend implementing an AI model in a Jupyter notebook directly. While you can, it makes it complicated to train it on a headless server - which you'll likely want to do if you want to train a model at any scale.

The other minor annoyance is that by using Jupyter you end up forfeiting the code intelligence of e.g. Atom or Visual Studio Code, which is a shame since a good editor can e.g. check syntax on the fly, inform you of unused variables, provide autocomplete, etc.

These issues aside, Jupyter is a great fit for plotting visualisations due to the very short improve → rerun → inspect/evaluate output loop. It's also a good fit for writing tutorials I suspect, as it apparently has support for markdown cells too. At some point, I may try writing a tutorial in Jupyter notebook, rendering it to regular markdown, and posting it here.

PhD Update 13: A half complete

...almost! In the last post, I talked about the AAAI-22 doctoral consortium, the sentiment analysis models I've implemented, and finally LDA topic analysis. Before we continue to what I've been doing since then, here's a list of all the posts in this series so far:

As always, you can follow all my PhD-related blog posts in the PhD tag on my blog.

Since the last post, I've participated in both the AAAI-22 Doctoral Consortium and a Hackathon in AI for Sustainability! I've written separate posts about these topics to avoid cluttering this post, so if you're interested I can recommend checking those posts out:

CLIP works.... kinda

In the last post, I mentioned I was implementing a sentiment analysis model based on CLIP. I've been doing this in PyTorch as the pretrained CLIP model is also implemented in PyTorch. This has caused a number of issues, since it requires a GPU with a CUDA compute capability index of 3.7+, which excludes a number of the GPUs I currently have access to, making things rather awkward. Thankfully, a few months ago I built a GPU server which I have somehow forgotten to blog about, so I have been able to use this for the majority of the CLIP experiments I've been running.

Anyway, this process is now complete, so I can share a graph or two on the training progress:

These graphs are as always provisional and not final results, so please don't take them as such. The graph on the left is the training accuracy, and the graph on the right is the validation accuracy. I used the ViT-B/32 variant of CLIP, with 512 units for 2 x dense layers after it before the final softmax dense layer that made the prediction (full model summary available upon request - please ensure you send requests by email from an official email account I can verify). What's astonishing here is CLIP's ability to 'zero-shot' - the ability to make a prediction in a target domain it hasn't seen yet with no additional training or fine tuning. It's one thing seeing it in a blog post, but quite another seeing it in person on your own dataset.

The reasoning for multiple lines here on each graph takes some explanation. Because the CLIP model is trained on tweets both with an image and an emoji, the number of tweets in my ~700K+ dataset of tweets that satisfy both of these requirements is only ~14K. With this in mind, I implemented a system to augment tweets that had a supported emoji but didn't have an image with the image that CLIP thought best matched them. It was done with the following algorithm:

  1. Rank each image against the tweet in question
  2. Pick a random image from those CLIP has at least 75% confidence in
  3. If it doesn't have at least 75% confidence in any image, pick the next best image
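In code, the selection step looks roughly like the sketch below. It assumes the CLIP confidences have already been computed and are passed in as a dictionary of hypothetical image ids to scores between 0 and 1, and it reads 'the next best image' in step 3 as the single highest-ranked image, which is an interpretation rather than a given:

```python
import random

def pick_image(scores, threshold=0.75, rng=random):
    """Pick an image id for a tweet from a dict of image id -> CLIP confidence."""
    # Step 1: rank every candidate image against the tweet, best first
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    # Step 2: choose at random among the images CLIP is confident about
    confident = [image_id for image_id, score in ranked if score >= threshold]
    if confident:
        return rng.choice(confident)
    # Step 3: nothing reached the threshold, so fall back to the best-ranked image
    return ranked[0][0]
```

For example, `pick_image({"a": 0.9, "b": 0.8, "c": 0.2})` returns either "a" or "b", while `pick_image({"a": 0.4, "b": 0.6})` always returns "b". Passing a seeded `random.Random` instance as `rng` makes the choice reproducible.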

The reason for this somewhat convoluted algorithm is to avoid a situation where CLIP picks the same image for every tweet. With this in place I increased the size of the dataset up to a peak of ~55K (it should be higher still, but I have yet to find the bug even after combing through all related code multiple times). I could then train multiple CLIP models, each with a different threshold as to how confident CLIP had to be in the augmented dataset - this is what's shown in the above graphs.

From the graphs above, I can tell that interestingly any image is better than none at all - at least in terms of training accuracy. With a peak validation accuracy of 86.48% (vs 84.61% without dataset augmentation), this outstrips the transformer encoder I trained earlier by a fair margin.

It's cool to compare the validation accuracy, but what would be really fascinating (and also more objective) would be to compare this to human-labelled tweets as a ground truth. While I'm unsure if I can publish the exact results and details of this experiment at this time, I can say that the results were very surprising: the transformer encoder narrowly beat CLIP in accuracy when comparing them against the ~2K human-labelled tweets!

The effect of this is that the images may not contain much information that's useful when predicting the positive/negative sentiment, so attempts to extract information from the images likely need to use a different strategy. I speculate here that the reason it appeared to boost the validation accuracy of CLIP is that it assisted CLIP in figuring out what was actually being asked of it - similar to the "prompt engineering" the authors of CLIP mention in their section on CLIP's limitations.

Wrapping this half up

To wrap the social media half of my project up (for now at least), I'm writing a journal article to summarise the (sub)project. This will also include data and experiments from some of the students who participated in the Hackathon in AI for Sustainability 2022. I doubt that the journal I ultimately end up submitting to would like it very much if I release too many more details about this at this time, so a deeper discussion on the results, the journal I've chosen with my PhD supervisor's help to submit to, and the paper will have to wait until I finish it and it (hopefully!) gets accepted and published.

It's been slow-going on writing this journal article - both because it's my first one and because I'm drawing content together from many different sources, but I think I'm getting there.

Once I've finished writing this journal article, I believe I'll be turning my attention to the rainfall radar half of my project while I wait for a decision on whether it'll be published or not - so you can expect more on this in the next post in this series.

The plan

Going on a bit of a tangent, the CLIP portion of the project has been very helpful in introducing me to how important optimising the data preprocessing pipeline is - especially the data augmentation part. By preprocessing in parallel and reshuffling some things, I was able to bump the average usage of my Nvidia GeForce 3060 GPU from around 10% to well over 80%, speeding up the process of augmenting the data from ~10 minutes per tweet to just 1.5 seconds per tweet! It's well worth spending a few hours on your data processing pipeline if you know you'll be training and retraining your model a bunch of times as you tweak it, as you could save yourself many hours of training time.
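
As a rough illustration of the "preprocessing in parallel" idea, here is a minimal sketch using only Python's standard library. It is not the pipeline from this project: the `augment` function is a hypothetical stand-in for whatever expensive, CPU-bound step your own pipeline performs, and the `chunksize` value is an arbitrary assumption you would want to tune.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def augment(sample: list[int]) -> list[int]:
    """Hypothetical stand-in for an expensive, CPU-bound augmentation step."""
    return [(value * 31 + 7) % 256 for value in sample]


def preprocess_serial(samples):
    """Baseline: process every sample one after another on a single core."""
    return [augment(sample) for sample in samples]


def preprocess_parallel(samples, workers=None):
    """Spread independent samples across worker processes."""
    workers = workers or os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # chunksize batches several samples into each task, which reduces
        # the overhead of passing data between processes.
        return list(pool.map(augment, samples, chunksize=16))


if __name__ == "__main__":
    samples = [[i, i + 1, i + 2] for i in range(1000)]
    # Both paths must give identical results, in the same order.
    assert preprocess_parallel(samples) == preprocess_serial(samples)
```

Processes rather than threads are used here because pure-Python CPU-bound work does not speed up across threads. Whether the overhead of moving data between processes is worth it depends on how expensive each sample is, so it's worth measuring on your own dataset.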

A number of key things to watch out for that I've found so far, in no particular order:

  • Preprocessing data in parallel is very important. For CPU-bound steps, the speed-up can approach the number of CPU cores you have!
  • Reading data from a stream makes it awkward to parallelise. It's much easier and simpler to handle e.g. 1 image per file than a stream of images in a single file.
  • Image decoding is expensive, meaning that you'll most likely hit a CPU bottleneck if your model handles images. Ensuring images are JPEG can help, as PNGs are more expensive to decode.
    • Similarly, the image decoder you use can significantly affect performance. I used simplejpeg, but I've heard that wrapping Tensorflow's native image decoding in an input pipeline can also work well, as Tensorflow can then compile it into something more efficient. Test different methods with your own dataset to see which is best.
  • Given that your preprocessing pipeline will run for every epoch, investigate if you can do any expensive steps just once before training begins.
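
The last point in the list above can be sketched as follows. This is a minimal, framework-free illustration rather than the code used in this project: `expensive_decode` and `cheap_augment` are hypothetical placeholders for a costly deterministic step (such as image decoding) and a cheap random step that has to stay inside the training loop.

```python
import random


def expensive_decode(raw: bytes) -> list[int]:
    """Hypothetical stand-in for a costly, deterministic step such as image decoding."""
    return list(raw)


def cheap_augment(pixels: list[int], rng: random.Random) -> list[int]:
    """Hypothetical random augmentation that must be redone every epoch."""
    return pixels[::-1] if rng.random() < 0.5 else pixels


def run_epochs(raw_samples: list[bytes], epochs: int, seed: int = 0) -> tuple[int, int]:
    """Return (number of expensive decodes, number of samples fed to the model)."""
    # Deterministic work is done once, before training begins...
    cache = []
    decode_calls = 0
    for raw in raw_samples:
        cache.append(expensive_decode(raw))
        decode_calls += 1

    rng = random.Random(seed)
    samples_seen = 0
    for _ in range(epochs):
        # ...so each epoch only pays for the cheap, random part.
        for pixels in cache:
            _ = cheap_augment(pixels, rng)  # a real loop would train on this
            samples_seen += 1
    return decode_calls, samples_seen
```

With 2 samples and 5 epochs, the expensive step runs 2 times instead of 10. If the decoded data doesn't fit in memory, the same idea applies with the cache written to disk instead.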

In the future I'd like to write a blog post that more thoroughly compares PyTorch and Tensorflow now that I have more experience with both of them. They have different strengths and weaknesses which make them both good fits for different types of models and projects.

All this experience will be very useful indeed when I turn my attention back to the rainfall radar portion of my project. My current plan is to investigate using a CLIP-style model to contrastively train the rainfall radar + heightmap data and the water depth data against one another. As of now I haven't looked into the specifics and details of how CLIP's training process actually works, but I'm hoping it's not too complicated to either re-use their code or implement my own.
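
For reference, the core of CLIP's training objective, as outlined in the pseudocode in the CLIP paper, is a symmetric contrastive loss over a batch of paired embeddings. The sketch below is a pure-Python illustration of that loss only, not an implementation from this project: the names `embeddings_a` and `embeddings_b` are assumptions (here they would stand for encoded rainfall radar + heightmap samples and encoded water depth samples respectively), and the temperature is fixed at 0.07 for simplicity, whereas CLIP learns it during training.

```python
import math


def l2_normalise(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def cross_entropy(logits, target_index):
    """Cross-entropy of one row of logits, using a numerically stable log-sum-exp."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[target_index]


def clip_style_loss(embeddings_a, embeddings_b, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    Row i of embeddings_a is assumed to belong with row i of embeddings_b;
    every other combination in the batch is treated as a negative example.
    """
    a = [l2_normalise(v) for v in embeddings_a]
    b = [l2_normalise(v) for v in embeddings_b]
    # Similarity of every item in a with every item in b, scaled by temperature.
    logits = [[sum(x * y for x, y in zip(va, vb)) / temperature for vb in b] for va in a]
    n = len(a)
    # Classify "which b matches this a?" and "which a matches this b?".
    loss_a = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    columns = [list(column) for column in zip(*logits)]
    loss_b = sum(cross_entropy(columns[i], i) for i in range(n)) / n
    return (loss_a + loss_b) / 2
```

The loss is close to zero when each item is most similar to its own partner, and large when the pairs are mismatched. In a real model the embeddings would come from two trainable encoders, and this loss would be minimised with a framework such as PyTorch or Tensorflow.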

In training such a CLIP model, it should in theory tell me whether there's any relationship between the two at all that a model can learn. If there is, I can then move on to the next step and connect a decoder of some description to the model that will produce an image as an output. If anyone has any good resources on this, please do comment below as I'm rather unsure as to where to begin (I've tried an autoencoder design in the past for this model - albeit without CLIP - and it didn't go very well).

Conclusion

Since last time, I've trained a bunch of CLIP models, and compared these (in more ways than one) to the transformer encoder I trained earlier. To extract useful information from images, a different strategy is likely needed as it doesn't appear that they contain much useful information about sentiment in the context of a flooding situation.

In training the CLIP models however, I've gained a lot of very valuable experience that will greatly help me in implementing an efficient model and pipeline for the rainfall radar half of my project. If I could go back and do this all again, I would start the social media half of my project first, as it's taught me a whole bunch of very useful things that would have saved me a lot of time on my rainfall radar project.

If you've found this interesting, are confused about anything here, or have any suggestions, please do comment below! I'd love to hear from you.

Hackathon in AI for Sustainability 2022

The other week, I took part in the Hackathon in AI for Sustainability 2022. While this was notable because it was my first hackathon, what was more important was that it was partially based on my research! For those who aren't aware, I'm currently doing a PhD at the University of Hull with the project title "Using Big Data and AI to Dynamically Predict Flood Risk". While part of it really hasn't gone according to plan (I do have a plan to fix it, I just need to find time to implement it), the second half of my project on social media has been coming together much more easily.

To this end, my supervisor asked me about a month ago whether I wanted to help organise a hackathon, so I took the plunge and said yes. The hackathon had 3 projects for attendees to choose from:

  • Project 1: Hedge identification from earth observation data with interpretable computer vision algorithms
  • Project 2: Monopile fatigue estimation from nonlinear waves using deep learning
  • Project 3: Live sentiment tracking during floods from social media data (my project!)

When doing research, I've found that there are often many more avenues to explore than there is time to explore them. To this end, a hackathon is an ideal time to explore these avenues that I have not had the time to explore previously.

To prepare, I put together some datasets of tweets and associated images - some that I'd actually trained models on, and others (such as one based on the hashtag #StormFranklin) that I downloaded specially for the occasion. Alongside this, I also trained and prepared a model and some sample code for students to use as a starting point.

On the first day of the event, the leaders of the 3 projects presented the background and objectives of the 3 projects available for students to choose from, and then we headed to the lab to get started. While unfortunate technical issues were a problem for all 3 projects, we managed to find ways to work around them.

Over the next few days, the students participating in the hackathon tackled the 3 projects and explored different directions. At first, I wasn't really sure about what to do or how to help the students, but I soon started to figure out how I could assist students by explaining things, helping them with their problems, fetching and organising more data, and other such things.

While I can't speak for the other projects, the outputs of the hackathon for my project are fascinating insights into things I haven't had time to look into myself - and I anticipate that we may be able to draw them together into something more formal.

Just some of the approaches taken in my project include:

  • Automatically captioning images to extract additional information
  • Using other sentiment classification models to compare performance
    • VADER: A rule-based model that classifies to positive/negative/neutral
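    • BART: A transformer encoder-decoder model that pairs a BERT-style bidirectional encoder with a GPT-style autoregressive decoder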
  • Resolving and inferring geolocations of tweets and plotting them on a map, with the goal of increasing relevance of tweets
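
To give a flavour of what "rule-based" means for a sentiment model like VADER, here is a deliberately tiny sketch. It is not VADER and not code from the hackathon: the lexicon, the single negation rule, and the threshold are all invented for illustration, while the real VADER uses a much larger hand-scored lexicon and many more rules.

```python
# Toy lexicon invented for this example; it is NOT VADER's real lexicon.
TOY_LEXICON = {
    "safe": 1.5,
    "thanks": 1.0,
    "rescued": 2.0,
    "flooded": -1.5,
    "damage": -2.0,
    "scared": -1.8,
}
NEGATIONS = {"not", "no", "never"}


def classify(text: str, threshold: float = 0.5) -> str:
    """Classify text as positive / negative / neutral by summing word scores."""
    words = [word.strip(".,!?#").lower() for word in text.split()]
    score = 0.0
    for index, word in enumerate(words):
        valence = TOY_LEXICON.get(word, 0.0)
        # Rule: a negation directly before a word flips its polarity.
        if index > 0 and words[index - 1] in NEGATIONS:
            valence = -valence
        score += valence
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"
```

Because there is nothing to train, a model like this is cheap to run and easy to interpret, which makes it a useful baseline to compare learned models against.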

The outputs of the hackathon have been beyond my wildest dreams, so I'm hugely thankful to all who participated in my project as part of the hackathon!

While I don't have many fancy visuals to show right now, I'll definitely keep you updated with progress on drawing it all together in my PhD Update blog post series.

A learning experience | AAAI-22 in review

Hey there! As you might have guessed, it's time for my review of the AAAI-22 conference I attended recently (AAAI being the Association for the Advancement of Artificial Intelligence). It's definitely been a learning experience, and I think I've now got my thoughts in order well enough to write about them here.

Attending a conference has always been on the cards - right from the very beginning of my PhD - but it's only recently that I have had something substantial enough that it would be worth attending one. To this end, I wrote a 2 page paper last year and submitted it to the Doctoral Consortium, which is a satellite event that takes place slightly before the actual AAAI-22 conference. To my surprise I got accepted!

Unfortunately in January AAAI-22 was switched from being an in-person conference to being a virtual conference instead. While I appreciate and understand the reasons why they made that decision (safety must come first, after all), it made some things rather awkward. For example, the registration form didn't mention a timezone, so I had to reach out to the helpdesk to ask about it.

For some reason, the Doctoral Consortium wanted me to give a talk. While I was nervous beforehand, the talk itself seemed to go ok (even though I forgot to create a slide somewhere in the middle) - people seemed to find the subject interesting. They also assigned a virtual mentor to me, who was very helpful in checking my slide deck for me.

The other Doctoral Consortium talks were also really interesting. I think the one that stood out to me was "AI-Driven Road Condition Monitoring Across Multiple Nations" by Deeksha Arya, in which the presenter was using CNNs to detect damage to roads - and found that a model trained on data from 1 country didn't work so well in another - and talked about ways in which they were going to combat the issue. The talk on "Creating Interpretable Data-Driven Approaches for Tropical Cyclones Forecasting" by Fan Meng also sounded fascinating, but I didn't get a chance to attend on account of their session being when I was asleep.

As part of the conference, I also submitted a poster. I've actually done a poster session before, so I sort of knew what to expect with this one. After a brief hiccup and rescheduling of the poster session I was part of, I got a 35 minute slot to present my poster, and had some interesting conversations with people.

Technical issues were a constant theme throughout the event. While the Doctoral Consortium went well on Zoom (there was a last minute software change - I'm glad I took the night before to install and check multiple different video conferencing programs, otherwise I wouldn't have made it), the rest of the conference wasn't so lucky. AAAI-22 was held on something called VirtualChair / Gather.town, which as it turned out was not suited to the scale of the conference in question (200 people in each room? yikes). I found myself with the seemingly impossible task of using a website that was so laggy it was barely usable - even on my i7-10750H I bought back in 2020. While the helpdesk were helpful and suggested some things I could try, nothing seemed to help. This severely limited the benefit I could gain from the conference.

At times, there were also a number of communication issues that made the experience a stressful one. Some emails contradicted each other, and others were unclear - so I had to email the organisers at multiple points to request clarification. The wording on some of the forms (especially the registration form) left a lot to be desired. All in all, this led to a very large number of wasted hours figuring things out and going back and forth to resolve confusion.

It also seemed as though everyone assumed that I knew how a big conference like this worked and what each event was about, when this was not the case. For example, after the start of the conference I received an email saying that they hoped I'd been enjoying the plenary sessions, when I didn't know that plenary sessions existed, let alone what they were about. Perhaps in future it would be a good idea to distribute a beginner's guide to the conference - perhaps by email or something.

For future reference, my current understanding of the different events in a conference is as follows:

  • Doctoral Consortium: A series of talks - perhaps over several sessions - in which PhD students submit a 2 page paper in advance and then present their projects.
  • Workshop: A themed event in which a bunch of presenters submit longer papers and talk about their work
  • Tutorial: In which the organisers deliver content centred around a specific theme with the aim of educating the audience on a particular topic
  • Plenary session: While workshops and tutorials may run in parallel, plenary sessions are talks at a time when everyone can attend. They are designed to be general enough that they are applicable to the entire audience.
  • Poster session: A bunch of people create a poster about their research, and all of these posters are put up in a room. Then, researchers are designated specific sessions in which they stand by their poster and people come by and chat with them about their research. At other times, researchers are free to browse other researchers' posters.

Conclusion

Even though the direct benefit from talks, workshops, and other activities at the conference has been extremely limited due to technical, communication, and timezone issues, the experience of attending this conference has been a beneficial one. I've learnt about how a conference is structured, and also had the chance to present my research to a global audience for the first time!

In the future, I hope that I get the chance to attend my first actual conference as I feel I'm much better prepared, and have a better understanding as to what I'm getting myself in for.

Art by Mythdael