PhD Update 1: Directions

Welcome to my first PhD update post. I intend to post these at bimonthly intervals. In the last post, I talked a bit about my PhD project and my initial thoughts on it. Since then, I've done heaps of investigation into a number of potential directions I could take the project in. For reference, my PhD title is:

Using the Internet of Things, Big Data, and AI to dynamically map flood risk.

There are 3 main elements to this project: the Internet of Things, Big Data, and AI.

I'm pretty sure that each of them will have an important role to play in the final product - even if I'm not sure what those roles are just yet :P

Of particular concern at the moment is this blog post by Google. It describes how they've managed to significantly improve flood forecasting with AI, backed up by a seriously impressive visualisation, but I can't find a paper on it anywhere. I'm concerned that anything I try to do in this area won't be useful if they're already streets ahead of everyone else.

I guess one of the strong points I should try to hit, if possible, is explainable AI.

All the data sources!

At the moment, I'm evaluating the various potential data sources I've managed to gain access to. My aim is to work out how useful each one will be in solving the wider problem, and whether it's useful enough to be worth investigating further.

Environment Agency

Some great people from the Environment Agency came into the University recently to chat with us about what they do. The discussion we had was very interesting, and they also asked if there was anything they could do to help with our PhD projects.

I jumped at the chance to get hold of some of their historical datasets. They maintain a network of high-quality sensors across the country that monitor everything from rainfall to river statistics. While they have a real-time API that you can use to download recent measurements, it doesn't appear to go back further than March 2017. Because of this, I asked for data from 2005 up to the end of 2017, so that I could get a clearer picture of the 2007 and 2013 floods for AI training purposes.

So far, this dataset has proved very useful, at least as an initial testbed for training various kinds of AI as I learn PyTorch (see my recent post for how that has been going). I've started with a basic LSTM. For reference, an LSTM (long short-term memory network) is a recurrent neural network architecture that is good at processing time-series data, but is quite computationally expensive to run.
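To make the LSTM description a little more concrete, here is a minimal sketch of the forward pass of a single-unit LSTM cell in plain Python (no PyTorch), using the standard input/forget/output gate formulation. The weights in `example_params` and the "river level" sequence are made-up illustrative values, not trained parameters or real Environment Agency data. The loop in `run_lstm` also hints at why LSTMs are costly: each time step needs the hidden and cell state from the previous step, so the sequence can't be processed in parallel along the time axis.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def lstm_step(x, h, c, params):
    """One time step of a single-unit LSTM cell.

    `params` maps each gate name to an (input weight, recurrent weight, bias) triple.
    """
    def pre_activation(gate):
        w, u, b = params[gate]
        return w * x + u * h + b

    i = sigmoid(pre_activation("input"))        # how much new information to write
    f = sigmoid(pre_activation("forget"))       # how much of the old cell state to keep
    o = sigmoid(pre_activation("output"))       # how much of the cell state to expose
    g = math.tanh(pre_activation("candidate"))  # candidate value to write
    c_new = f * c + i * g                       # updated long-term memory
    h_new = o * math.tanh(c_new)                # new hidden state / output
    return h_new, c_new


def run_lstm(sequence, params):
    """Feed a sequence through the cell, returning the hidden state at every step."""
    h, c = 0.0, 0.0
    outputs = []
    for x in sequence:
        # Each step depends on the previous (h, c), so steps cannot run in parallel.
        h, c = lstm_step(x, h, c, params)
        outputs.append(h)
    return outputs


# Hypothetical, untrained parameters purely for illustration.
example_params = {
    "input": (0.5, 0.1, 0.0),
    "forget": (0.0, 0.0, 1.0),
    "output": (0.5, 0.1, 0.0),
    "candidate": (1.0, 0.2, 0.0),
}

# Made-up, normalised river level readings.
levels = [0.1, 0.3, 0.8, 0.6, 0.2]
print(run_lstm(levels, example_params))
```

In PyTorch, `torch.nn.LSTM` packages the same gate computations for many hidden units and whole batches at once, which is what a real model would use.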

Met Office

I've also been investigating the datasets that the Met Office provide, which chiefly appear to be available through their free DataPoint API. Of particular interest are their rainfall radar images, which are 500x500 pixels and are released every 15 minutes. Sadly, each one is only available for a few hours at best, so you have to grab them fast if you want to be able to analyse particularly interesting ones later.
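Because the radar frames disappear after a few hours, any collection script has to poll regularly and fetch whatever it hasn't saved yet. The sketch below only works out which 15-minute slots should still be downloadable. It assumes frames are stamped on :00/:15/:30/:45 boundaries and stay available for a hypothetical 3-hour `retention` window; it deliberately doesn't call the DataPoint API, since the actual download step depends on the endpoint and API key.

```python
from datetime import datetime, timedelta, timezone

FRAME_INTERVAL = timedelta(minutes=15)  # radar frames are released every 15 minutes
DEFAULT_RETENTION = timedelta(hours=3)  # ASSUMPTION: hypothetical availability window


def latest_frame_time(now):
    """Round `now` down to the most recent 15-minute boundary."""
    return now.replace(minute=now.minute - now.minute % 15, second=0, microsecond=0)


def available_frame_times(now, retention=DEFAULT_RETENTION):
    """Frame timestamps assumed to still be downloadable, oldest first."""
    times = []
    t = latest_frame_time(now)
    while now - t <= retention:
        times.append(t)
        t -= FRAME_INTERVAL
    return times[::-1]


def frames_to_fetch(now, already_saved, retention=DEFAULT_RETENTION):
    """Slots that are (assumed to be) still available but not yet saved locally."""
    return [t for t in available_frame_times(now, retention) if t not in already_saved]


if __name__ == "__main__":
    # Run this on a schedule (e.g. every 15 minutes) and download each returned slot.
    for slot in frames_to_fetch(datetime.now(timezone.utc), already_saved=set()):
        print(slot.isoformat())
```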

Annoyingly though, their API does not appear to give any hints as to the bounding boxes of these images - and neither can I find any information about this online. I posted in their support forum, but it doesn't appear that anyone actually monitors it - so at this point I suspect that I'm unlikely to receive a response. Without knowing the (lat, lng) co-ordinates of the images produced by the API, they are little more use than pretty wall art.
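Once a bounding box is known, converting between pixels and coordinates is straightforward. This is a minimal sketch that assumes the image is a simple equirectangular grid spanning a known (north, south, west, east) box, with pixel (0, 0) at the top-left. The real radar composite may well use a different map projection, in which case this linear mapping would be wrong. `PLACEHOLDER_BBOX` is invented purely for illustration and is not the Met Office's actual extent.

```python
import math
from typing import NamedTuple

IMAGE_SIZE = 500  # the DataPoint radar images are 500x500 pixels


class BoundingBox(NamedTuple):
    north: float  # latitude of the top edge
    south: float  # latitude of the bottom edge
    west: float   # longitude of the left edge
    east: float   # longitude of the right edge


def pixel_to_latlng(x, y, bbox, size=IMAGE_SIZE):
    """(lat, lng) of the centre of pixel (x, y); (0, 0) is the top-left pixel."""
    lng = bbox.west + (x + 0.5) * (bbox.east - bbox.west) / size
    lat = bbox.north - (y + 0.5) * (bbox.north - bbox.south) / size
    return lat, lng


def latlng_to_pixel(lat, lng, bbox, size=IMAGE_SIZE):
    """Pixel (x, y) containing the given point, e.g. the location of a river gauge."""
    x = math.floor((lng - bbox.west) / (bbox.east - bbox.west) * size)
    y = math.floor((bbox.north - lat) / (bbox.north - bbox.south) * size)
    if not (0 <= x < size and 0 <= y < size):
        raise ValueError("point lies outside the image")
    return x, y


# PLACEHOLDER: invented values, NOT the real extent of the Met Office radar images.
PLACEHOLDER_BBOX = BoundingBox(north=61.0, south=49.0, west=-11.0, east=3.0)
print(pixel_to_latlng(250, 250, PLACEHOLDER_BBOX))
```

The inverse function is the more useful one in practice: it picks out the radar pixel covering a particular sensor, so rainfall intensity there can be paired with that sensor's readings.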

Internet of Things

On the Internet of Things front, I'm already part of Connected Humber, which has a network of sensors set up that monitor everything from air quality to temperature, humidity, and air pressure. While these measurements aren't directly related to my project, the dataset that we're collecting as a group may well come in handy as an input to a model of some description.
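Sensor networks like this tend to report at irregular intervals, so before the readings can be combined with other datasets they usually need to be put onto a common time grid. One simple way to do that is to average them into fixed-width bins. The sketch below assumes readings arrive as (timestamp, value) pairs; the 15-minute bin width is a hypothetical choice, picked here to match the radar frame interval.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

BIN_WIDTH = timedelta(minutes=15)  # ASSUMPTION: chosen to match the radar frame interval


def bin_start(ts, width=BIN_WIDTH):
    """Round a timestamp down to the start of its fixed-width bin."""
    epoch = datetime(1970, 1, 1, tzinfo=ts.tzinfo)
    return epoch + ((ts - epoch) // width) * width


def resample_mean(readings, width=BIN_WIDTH):
    """Average (timestamp, value) readings into bins, returned in time order."""
    bins = defaultdict(list)
    for ts, value in readings:
        bins[bin_start(ts, width)].append(value)
    return {start: mean(values) for start, values in sorted(bins.items())}


# Hypothetical temperature readings arriving at irregular times.
readings = [
    (datetime(2019, 1, 1, 12, 1), 4.0),
    (datetime(2019, 1, 1, 12, 14), 5.0),
    (datetime(2019, 1, 1, 12, 16), 6.0),
]
print(resample_mean(readings))
```

Empty bins are simply absent from the result; a real pipeline would need to decide whether to interpolate across such gaps or leave them missing.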

I'm pretty sure that I'll need to set up some custom sensors of my own at some point (probably soonish, too) to collect the readings that are missing from the existing datasets.

Reading a library

Whilst I've been doing this, I've also been reading up a storm. I started with traditional physics-based flood modelling simulations (such as CAESAR-Lisflood), which seem to fall into a number of categories, each with its own sub-categories. It's quite a rabbit hole, but apparently I'm diving all the way down to the very bottom.

The most interesting paper on this subject I found was this one from 2017. It splits physics-based models up into 3 categories:

As I'm going to be using artificial intelligence as the core of my project, it quickly became evident that this is just stage-setting for the kind of work I'll actually be doing. After winding my way through a bunch of other, less interesting papers, I next found this paper from 2018, which is similar to the previous one but covers AI and flood modelling.

While I haven't yet had a chance to follow up on all the interesting papers it references, it makes a number of points worth keeping in mind:

The odd thing about this paper is that it claims regular (non-recurrent) neural networks performed better than recurrent neural network structures, despite citing only a single, older 2013 paper for this (which I haven't read yet). This led me on to read a few more papers, all of which were mildly interesting and had at least something to do with neural networks.

I certainly haven't read everything yet about flood modelling and AI, so I've got quite a way to go until I'm done in this department. Also of interest are 2 newer neural network architectures which I'm currently reading about:

Next steps

I want to keep reading about the above neural networks. I also want to implement a number of the networks I've read about in PyTorch, to continue learning the library.

Lastly, I want to continue to find new datasets to explore. If you're aware of a dataset that I haven't yet talked about on here, comment below!
