PhD Update 3: Simulating simulations with some success
Hey there! Welcome to another PhD update blog post. The last time I posted, I was still working away at getting the rainfall radar data downloader working as intended.
Thankfully, since then I've managed to get it to complete (wow, that too much longer than expected) - and I've now turned my attention to running the physics-based simulation, and beginning to implement the AI(s) that will (hopefully) implicitly learn the parameters of the model in question.
Physics-based simulation patching
Getting to this point, as you might imagine, wasn't quite as straight-forward as I initially thought. The physics-based model I'm (currently) using is HAIL-CAESAR, a (supposedly) high-performance version of CAESAR-Lisflood (yes, it's SourceForge shudder). Unfortunately, the format for the rainfall data that it takes in is especially space inefficient - after writing a converter, I found that my 4.5GiB compress JSON stream files (1 per year) would have turned into about 66GB of uncompressed ASCII! Theoretically speaking by my calculations. I don't have that much disk space free - so clearly another approach is in order.
This approach I speak of is convincing HAIL-CAESAR to take the data in via the standard input. I initially tried using a FIFO (also known as a named pipe), but I ran into this bug in Node.js.
HAIL-CAESAR by default doesn't support taking the data in on the standard input though, so I had to patch HAIL-CAESAR to add support. I did this by getting it to interpret the filename
- to mean "use the standard input instead", which from my previous experiences seem to be an unofficial convention that lots of other programs follow. Perhaps at some point soon I should consider contributing my patch back to HAIL-CAESAR for others to enjoy.
With that sorted, I also had to mess around with the heightmap (I got this through my University's "Digimap" service thingy) I obtained to get it to be precisely the same size as the rainfall radar data I have.
It turned out that the service I got the heightmap from isn't smart enough to give you precisely the bit you asked for - instead giving you just the tiles that overlap the area you specify. In the end I found myself with ~170 separate tiles (some of which I had to get after the fact because I found I was missing some) - so I ended up implementing a program to stitch them all back together again.
That program ended up turning out much more complete as a separate whole than I thought it would. I'm pretty sure that these heightmap files I've been dealing with are in a standard format, but I'm not aware of its name (if you know, I'd love to hear from you - post a comment below!). It's for these reasons that I ended up releasing it as a pair of packages on npm.
You can find them here:
terrain50- OS Terrain 50 manipulation library
terrain50-cli- Command-line interface for the above to make it easy to manipulate heightmaps from the command line
I'll probably make a separate blog post about them at some point soon. For the curious, the API docs (there's a link in the README of the library package too) are automatically updated with my Laminar CI setup :D
It is with a considerable amount of anticipation that I'm finally reaching the really interesting part of this experiment. This week, I've started work on implementing a Temporal Convolutional Neural Network (see also this paper). A Temporal CNN is a network type I discovered recently that takes advantage of multiple 3-dimensional CNN layers to allow a CNN-based model to learn temporal-based relationships in a dataset.
I'm not sure how well it's going to work on my particular dataset, given that the existing papers I've found on it use it for classification-based tasks, but I'm pretty hopeful that, with some tweaking, it should perform pretty well. While I haven't yet finished writing up the dataset input logic, I have implemented the core model using the Tensorflow.js layers API:
From here, I intend to finish up my Temporal CNN implementation and get it running on the data I have so far from the HAIL-CAESAR model (which isn't unfortunately a lot - so far I've only got ~8K 5-minute time-steps worth of output which, if I'm calculating correctly, is just 29 days worth of simulation). I'm probably going to have to swap HAIL-CAESAR out at some point though, because it's really slow. Or perhaps I just don't know how to use it properly (maybe I should find someone more experienced with it and ask them first).
Anyway, I'm also going to try implementing a model inspired by the Google rainfall radar nowcasting paper I mentioned in my last post in this series. With both of these implemented, I can start to compare them and see which one is better suited for the task of flood prediction. I might even implement the Grid LSTM model I saw too.
In addition, I have my PhD panel 1 review coming up soon too - so apparently I've got a list of things I need to do to prepare for that - including writing a ~5K word report. I'll probably do this pretty soon - I don't want to be rushing it at the last minute.
Found this interesting? Got a suggestion? Want to say hi? Comment below!