Stardust | Starbeamrainbowlabs

PhD Update 20: Like a bad smell.....

Hi again! Another wild blog post appeared. PhD corrections, I have come to realise, have a habit of hanging around long past the time you want to have had them finished and done.

Before we get into all of that though, here's the customary list of posts:

See also Doing a 3-way dataset split in Tensorflow // PhD Aside 3, which I posted since the last one of these PhD update blog posts.

Things have not been easy over the last 9 months, but I am making it through. I can't promise when blog posts will come, but know that I have a lot of ideas and it's just a case of having the energy to write them. See also my fediverse account @sbrl@fediscience.org (rss feed) for smaller updates in between times.

Corrections

One of the things I did not realise when I started my PhD was just how much work you are expected to do outside of your main scholarshiped research period. There's writing the thesis, doing the viva, and, of course, the corrections afterwards.

While I'm not sure how much I can share about the corrections I've been given, I can say that the process of completing them has been both long and annoying.

More importantly, it is now coming to a close!

Yep, that's right: I'm almost done with my corrections! I just need a laundry list of people to check them and say they are okay, and then I can finally get this PhD thing over with and move on to cooler researchy things that I'll talk about later in this blog post.

As with my viva, the corrections I received were mainly organisational and big picture stuff in nature, though I also had my fair share of experiment redos to complete (ewwww), which have been very time consuming.

Once everything is done, I do plan on making my thesis available for free here on my website if my institution will let me. I doubt it will get indexed in any scholarly search engines any time soon, but hopefully it will be useful to someone here.

Speaking of, I'd like to share a Cool Graph™ I created whilst doing my corrections:

(Above: A grid of graphs showing the stability of the metrics for my rainfall radar models over 7 runs. Areas are shaded on the graphs to show the standard deviation and min/max values for each epoch.)

This is, as the caption suggests, a random cross-validation (because anything else would be far too complicated to implement) run of my rainfall radar model I have mentioned before. This was one of the things I was asked to do in my corrections.

The code behind this is kinda cool, as it aggregates a metrics.tsv file from every experiment directory in a given directory.

As I write this I realise that's kinda confusing, so let me show you what the directory of experiments actually looks like:

+ 202412_crossval-stbl7
    + 2024-12-12_deeplabv3+_rainfall_csgpu_ri_celdice_lr0.00001_us2_t0.1_bs32_crossval-stbl7-A
    + 2024-12-12_deeplabv3+_rainfall_csgpu_ri_celdice_lr0.00001_us2_t0.1_bs32_crossval-stbl7-B
    + 2024-12-12_deeplabv3+_rainfall_csgpu_ri_celdice_lr0.00001_us2_t0.1_bs32_crossval-stbl7-C
    + 2024-12-12_deeplabv3+_rainfall_csgpu_ri_celdice_lr0.00001_us2_t0.1_bs32_crossval-stbl7-D
    + 2024-12-12_deeplabv3+_rainfall_csgpu_ri_celdice_lr0.00001_us2_t0.1_bs32_crossval-stbl7-E
    + 2024-12-12_deeplabv3+_rainfall_csgpu_ri_celdice_lr0.00001_us2_t0.1_bs32_crossval-stbl7-F
    + 2024-12-12_deeplabv3+_rainfall_csgpu_ri_celdice_lr0.00001_us2_t0.1_bs32_crossval-stbl7-G

The experiment series directory (202412_crossval-stbl7) contains 7 different runs, matching the stbl7 part of the experiment series name crossval-stbl7.
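
With that layout in mind, the aggregation step could be sketched roughly like this. This is only an illustration under stated assumptions, not the actual code behind the graph: the function name aggregate_metrics, the 'epoch' column, and the idea that each metrics.tsv has a header row with one column per metric are all hypothetical.

```python
import csv
import statistics
from collections import defaultdict
from pathlib import Path


def aggregate_metrics(series_dir, metric):
    """Summarise one metric per epoch across every run in a series directory.

    ASSUMPTION: each run directory holds a metrics.tsv with a header row
    containing an 'epoch' column and one column per metric.
    """
    values_by_epoch = defaultdict(list)
    # One metrics.tsv per experiment directory (e.g. ...-A, ...-B, ...)
    for tsv_path in sorted(Path(series_dir).glob("*/metrics.tsv")):
        with open(tsv_path, newline="") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                values_by_epoch[int(row["epoch"])].append(float(row[metric]))

    summary = {}
    for epoch, values in sorted(values_by_epoch.items()):
        summary[epoch] = {
            "mean": statistics.fmean(values),
            # Population standard deviation, so a single run still works
            "std": statistics.pstdev(values),
            "min": min(values),
            "max": max(values),
            "runs": len(values),
        }
    return summary
```

The mean would be the line on each graph, with std and min/max giving the two shaded areas. Keeping a count of runs per epoch makes it obvious if one run stopped early.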

The directory names there look pretty complicated, but it's actually just made up of all the parameters that I'm currently interested in for that experiment series. Each part is separated by an underscore _.

So, to break down 2024-12-12_deeplabv3+_rainfall_csgpu_ri_celdice_lr0.00001_us2_t0.1_bs32_crossval-stbl7-C:

2024-12-12: The date the (individual) experiment was run. These all ran in parallel on an HPC at my university, hence all the dates are the same even though it takes more than 24 hours to train the rainfall radar model.
deeplabv3+: The architectural backbone of the model in question - this time DeepLabV3+.
rainfall: Identifier code for the project, rainfall is the traditional informal and short name I gave to the rainfall radar model.
csgpu: The place it was trained on. In this case csgpu is a small HPC cluster in my department.
ri: This marks the start of the experiment hyperparams I'm interested in, in no particular order (but the order usually remains constant for a given project). ri stands for Remove Isolated.
celdice: The loss function. Cross-Entropy Loss + Dice loss.
lr0.00001: The Learning Rate, this time 0.00001
us2: UpScale 2 - a property of the model in which it upscales the input x2 and downscales it just before the output. This improves the fidelity of the output at the cost of a higher memory usage.
t0.1: Threshold of 0.1 - the delimiter between water and no water. At some point, I want to split into multiple bins.
bs32: Batch Size of 32.
crossval-stbl7-C: The experiment series code - see above.
- crossval-stbl: The main part of the experiment series code
- 7: The number of cross-validation runs
- C: The differentiator. In this case it's part C of the experiment series since they are all the same, but usually each model has something unique about it, e.g. regresstest-regress vs regresstest-class.
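
The breakdown above maps fairly directly onto a small parser. The following is a sketch only: the field names in FIELDS and the function parse_experiment_name are made-up labels for illustration, and it assumes every name in a series has exactly these 11 underscore-separated parts in this order.

```python
import re

# Hypothetical labels for each underscore-separated part, in the order listed above
FIELDS = ("date", "backbone", "project", "host", "flag", "loss",
          "learning_rate", "upscale", "threshold", "batch_size", "series")

# e.g. crossval-stbl7-C -> code "crossval-stbl", 7 runs, part "C"
SERIES_RE = re.compile(r"^(?P<code>.+?)(?P<runs>\d+)-(?P<part>[^-]+)$")


def _strip_prefix(text, prefix):
    """Remove a known prefix such as 'lr' or 'bs', failing loudly if absent."""
    if not text.startswith(prefix):
        raise ValueError(f"expected {text!r} to start with {prefix!r}")
    return text[len(prefix):]


def parse_experiment_name(name):
    parts = name.split("_")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} parts, got {len(parts)}")
    info = dict(zip(FIELDS, parts))

    # Convert the prefixed hyperparameters into numbers
    info["learning_rate"] = float(_strip_prefix(info["learning_rate"], "lr"))
    info["upscale"] = int(_strip_prefix(info["upscale"], "us"))
    info["threshold"] = float(_strip_prefix(info["threshold"], "t"))
    info["batch_size"] = int(_strip_prefix(info["batch_size"], "bs"))

    # Split the series code where it follows the <code><runs>-<part> pattern;
    # otherwise keep the raw string as the code
    match = SERIES_RE.match(info["series"])
    if match:
        info["series"] = {"code": match["code"],
                          "runs": int(match["runs"]),
                          "part": match["part"]}
    else:
        info["series"] = {"code": info["series"], "runs": None, "part": None}
    return info
```

Calling parse_experiment_name() on the -C directory name above gives a learning_rate of 0.00001, a batch_size of 32, and a series of code crossval-stbl, 7 runs, part C. The date, backbone, and ri flag are left as plain strings.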

Hmmm, looking at this it might be a bit more complicated a system than I first expected, but it makes sense to me. I wonder if I've blogged about how I organise experiments already? If not, that should go on the todo list.

Anyway, this is the foundation of my entire organisational system for running experiments. I've developed quite an intricate system since I started running experiments in 2020, but fundamentally it is based on the principle of preserving as much information about any given experiment that I've run as possible, as I am sure to need it later.

Even if I don't think I'll need it!

In fact, especially if I don't think I'll need it, because I've been bitten enough times to know that it's not a case of if, it's most certainly a case of when.

Corrections? What corrections?

Not to get too distracted, while I don't think the University would like it very much if I shared my exact list of corrections, it boiled down to the following basic principles, in no particular order:

There wasn't a clear narrative carrying the problem forwards through the thesis
They wanted more experiments running to confirm the stability of the models trained - hence this post on a 3-way split since they wanted a 3-way split, and also hence the stability testing done to produce the graph above amongst others
They wanted a regression model training for the rainfall radar model and a comparative analysis against my existing classification-based approach

It doesn't sound like much, but it has been quite a lot of work to get to this point, especially since I have been doing more teaching than I expected starting in September last year. I'm glad now that I applied for a 6 month extension and for the help of the people around me (and 2 people in particular - not sure if I can mention your names, but you know who you are), otherwise I would have run out of time to complete my corrections long ago.

Future research

Now that my corrections are (hopefully) coming to an end and I'm starting to get a handle on the teaching I've been asked to do (wow, and that isn't even the half of it), I'm finally starting to get myself into a place in which I can FINALLY start to look forwards to some more research that is actually useful, as opposed to making seemingly endless corrections to my thesis (the social media chapter in particular I can do SO MUCH BETTER).

First of all are real improvements to my rainfall radar model. These improvements largely fall into a few categories:

Analysing and improving the model's ability to actually predict floods, and applying sample weighting (psst, secret second graph for those of you who are still reading!) to hopefully measurably improve my model's ability to make actually useful predictions
Swapping out the physics-based model the model is trained on because it's bad and I didn't prepare the data very well and it's all bad
Expanding the model's ability to predict multiple bins instead of just a binarised water/no-water situation

These are not necessarily in order, but I imagine I'll likely tackle them in something like this order.
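
To make the difference between the binarised and multiple-bin setups concrete, here is a tiny sketch. The 0.1 threshold is the t0.1 value from the experiment names earlier; the bin edges, the function names, and the choice of a strict greater-than comparison are all assumptions made purely for illustration.

```python
from bisect import bisect_left


def binarise(value, threshold=0.1):
    """Binary label: 1 for water, 0 for no water.

    ASSUMPTION: values strictly above the threshold count as water.
    """
    return 1 if value > threshold else 0


def to_bin(value, edges=(0.1, 0.5, 1.0)):
    """Index of the bin a value falls into.

    ASSUMPTION: the edges are hypothetical. With 3 edges there are 4 bins,
    and bin 0 lines up with the 'no water' class from binarise().
    """
    return bisect_left(edges, value)
```

With these hypothetical edges, bin 0 is exactly the old no-water class, so the binary labels can always be recovered from the binned ones by checking for a bin index above 0.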

On the social media side, I know that I can do so much better than my social media paper which somehow has 41 citations (just HOW??!). Binary sentiment analysis is cute and all, but at the intersection of AI, disaster situational awareness, and user interface (UI) design and user experience (UX) I believe that I can do much better in the organisation of unstructured data.

With the use of contemporary AI algorithms and UI/UX, the extraction and presentation of richer information should be possible.

Even though these research plans won't be part of my PhD, I will still continue blogging about it! Who knows, I might even start a new long-running blog post series to mark the beginning of a new era in my life.

And, of course, I'll continue to share Cool Graphs™!

BlueSky

As a last thing (I'm going to blog about it at some point): I'm now using Bridgy Fed to allow you to follow me on Bluesky! I'm @sbrl.fediscience.org.ap.brid.gy, and to interact with me you'll need to follow @ap.brid.gy.

BlueSky seems to be becoming very popular, especially in my circles - but while it pretends to be decentralised, it isn't. For this reason and others, my primary social media will remain on the Fediverse to ensure and preserve the long-term viability of my account.

I encourage you to join the fediverse too - it's a nice and friendly place :D

Final thoughts

It has been a long road, but I am finally nearing the end of one book and the beginning of another. This is not the last post in this series - I have at least 1 more planned. When I have the energy, I want to talk about my experiences learning to teach (I'm doing a course called PCAP right now, as it was a stipulation of my contract) in what may be a longer blog post than I expect.

I'm looking forward to continuing my research journey and blogging along the way right here at my stardust blog (I think this is the first time I've mentioned my blog's name!).

I'll see you next time, in what might be one of the last blog posts in this series: PhD Update 21: Where the water meets the sky.

--Starbeamrainbowlabs

Teaching this September

Hello!

Believe it or not, I'm going to be teachificatinating a thing at University this semester, which starts at the end of this month and lasts until around December-ish time (yeah, I'm surprised too).

It's called Secure Digital Infrastructure, and I'll be teaching Linux and system administration skills, so that includes the following sorta-areas:

Bash (Learn your terminal (or command line))
Linux file and account permissions (no posts I could find yet - plenty of resources online tho)
Practical networking and web server configuration (see also The NSD Authoritative DNS Server: What, why, and how and part 2, An epic journey awaits: The hows and whys of DNS (and why DNS privacy is important), Securing a Linux Server Part 1: Firewall, and others)
Docker (see also Cluster, Part 10: Dockerisification | Writing Dockerfiles)
Clustering - at least the theory (see also Cluster Series List)
Reverse proxies (see also related posts: Securing your port-forwarded reverse proxy and the unreasonably popular How to set up a WebDav share with Nginx)

(related posts aren't necessarily the exact content I'm going to cover, but are related)

To this end, it is quite stressful and is taking significantly more energy than I expected to prepare for this.

I definitely want to talk about it here, but that will likely happen after the fact - probably some time in January or February.

Please be patient with me as I navigate this new and unexpected experience :-)

--Starbeamrainbowlabs

PhD Update 19: The Reckoning

The inevitability of all PhDs. At first it seems distant and ephemeral, but it is also the inescapable and unavoidable destination for all on the epic journey of the PhD.

Sit down and listen as I tell my own tale of the event I speak of.

I am, of course, talking about the PhD Viva. It differs from country to country, but here in the UK the viva is an "exam" that happens a few months after you have submitted your thesis (PhD Update 18: The end and the beginning). Unlike across the pond in the US, in the UK vivas are a much more private affair, with only you, the chair, and your internal and external examiners normally attending.

In my case, that was 2 externals (as I am also staff, ref Achievement get: Experimental Officer Position!), an internal, and of course the chair. I won't name them as I'm unsure of policy there, but they were experts in the field and very kind people.

I write this a few weeks removed from the actual event (see also my post on Fediscience at the time), and I thought that my viva itself deserved a special entry in this series dedicated to it.

My purpose in this post is to talk about my experience as honestly and candidly as I can, and offer some helpful advice from someone who has now been through the process.

The Structure

The viva itself took about 4 hours. It's actually a pretty complicated affair: all your examiners (both internal and external) have to read your thesis and come up with a list of questions (hidden from you of course). Then, on the day but before you enter the room they have to debate who is going to ask what to avoid duplication.

In practice this usually means that the examiners will meet in the morning to discuss, before having lunch and then convening for the actual viva bit where they ask the questions. In my case, I entered the room to meet the examiners and say hi, before leaving again for them to sort out who was going to ask what.

Then, the main part of the viva simply consists of you answering all the questions that they have for you. Once all the questions are answered, then the viva is done.

You are usually allowed a copy of your thesis in one form or another to assist you while answering their questions. The exact form this will take varies from institution to institution, so I recommend always checking this with someone in charge (e.g. the Doctoral College in my case) well in advance - you don't want to be hit with paperwork and confusion minutes before your viva is scheduled to start!

After the questions, you leave the room again for the examiners to deliberate over what the outcome will be, before calling you back into the room to give you the news.

Once they have done this: the whole thing is over and you can go sleep (trust me, you will not want to do anything else).

My experience

As I alluded to in the aforementioned post on fediscience (a node in the fediverse), I found the viva a significantly intense experience - and one I'm not keen on repeating any time soon. I strongly recommend having someone nearby as emotional support for after the viva and during those periods when you have to step out of the room. I am not ashamed to admit that there were tears after the exam had ended.

More of the questions than I expected focused on the 'big picture' kinda stuff, like how my research questions linked in with the rest of the thesis, and how the thesis flowed. I was prepared for technical questions -- and there were some technical questions -- but the 'fluffy stuff' kinda questions caught me a little off guard. For example, there were some questions about my introduction and how while I introduced the subject matter well, the jump into the technical stuff with the research questions was quite jarring, with concepts mentioned that weren't introduced beforehand.

To this end, I can recommend looking over the 'big picture' stuff beforehand so that you are prepared for questions that quiz you on your motivations for doing your research in the first place and question different aspects of your research questions.

It can also feel quite demoralising, being questioned for hours on what has been your entire life for multiple years. It can feel like all you have done is pointless, and you need to start over. While it is certainly true that you could improve upon your methods if you started from scratch, remember that you have worked hard to get to this point! You have discovered things that were not known to the world before your research began, and that is a significant accomplishment!

Try not to think too hard about the corrections you will need to make once the viva is done. Institutions differ, but in my case it is the job of the chair to compile the list of corrections and then send them to you (in one form or another). The list of corrections - even if they are explained to you verbally when you go back in to receive the result - may surprise you.

Outcome

As I am sure that most of you reading this are wondering, what was my result?! Before I tell you, I will preface the answer to your burning question with a list of the possible outcomes:

Pass with no corrections (extremely rare)
Pass with X months corrections (common, where X is a multiple of 3)
Fail (also extremely rare)

In my case, I passed with corrections!

It is complicated by the fact that while the panel decided that I had 6 months of corrections to do, I am not able to spend 100% of my time doing them. To this end, it is currently undefined how long I will have to do them - paperwork is still being sorted out.

The reasons for this are many, but chief among them is that I will be doing some teaching in September - more to come on my experience doing that in a separate post (series?) just as soon as I have clarified what I can talk about and what I can't.

I have yet to receive a list of the corrections themselves (although I have not checked my email recently as I'm on holiday now as I write this), but it is likely that the corrections will include re-running some experiments - a process I have begun already.

Looking ahead

So here we are. I have passed my viva with corrections! This is not the end of this series - I will keep everyone updated in future posts as I work through the corrections.

I also intend to write a post or two about my experience learning to teach - a (side)quest that I am currently pursuing in my capacity as Experimental Officer (research is still my focus - don't worry!)

Hopefully this post has provided some helpful insight into the process of the PhD viva - and my experience in mine.

The viva is not a destination: only a waypoint on a longer journey.

If you have any questions, I am happy to answer them in the comments, and chat on the fediverse and via other related channels.

PhD Update 18: The end and the beginning

Hello! It has been a while. Things have been most certainly happening, and I'm sorry I haven't had the energy to update my blog here as often as I'd like. Most notably, I submitted my thesis last week (gasp!)! This does not mean the end of this series though - see below.

Before we continue, here's our traditional list of past posts:

Since last time, that detecting persuasive tactic challenge has ended too, and we have a paper going through at the moment: BDA at SemEval-2024 Task 4: Detection of Persuasion in Memes Across Languages with Ensemble Learning and External Knowledge.

Theeeeeeeeeeeeesis

Hi! A wild thesis appeared! Final counts are 35,417 words, 443 separate sources, 167 pages, and 50 pages of bibliography - making that 217 pages in total. No wonder it took so long to write! I submitted at 2:35pm BST on Friday 10th May 2024.

I. can. finally. rest.

It has been such a long process, and taken a lot of energy to complete it, especially since large amounts of formal academic writing isn't usually my thing. I would like to extend a heartfelt thanks especially to my supervisor for being there from beginning to end and beyond to support me through this endeavour - and everyone else who has helped out in one way or another (you know who you are).

Next step is the viva, which will be some time in July. I know who my examiners are going to be, but I'm unsure whether it would be wise to say here. Between now and then, I want to ~~stalk~~ investigate my examiners' research histories, which should give me an insight into their perspective on my research.

Once the viva is done, I expect to have a bunch of corrections to do. Once those are completed, I will to the best of my ability be releasing my thesis for all to read for free. I still need to talk to people to figure out how to do that, but rest assured that if you can't get enough of my research via the papers I've written for some reason, then my thesis will not be far behind.

Coming to the end of my PhD and submitting my thesis has been surprisingly emotionally demanding, so I thank everyone who is still here for sticking around and being patient as I navigate these unfamiliar events.

Researchy things

While my PhD may be coming to a close (I still can't believe this is happening), I have confirmed that I will have dedicated time for research-related activities. Yay!

This means, of course, that as one ending draws near, a new beginning is also starting. Today's task after writing this post is to readificate around my chosen idea to figure out where there's a gap in existing research for me to make a meaningful contribution. In a very real way, it's almost like I am searching for directions as I did in my very first post in this series.

My idea is connected to the social media research that I did previously on multimodal natural language processing of flooding tweets and images with respect to sentiment analysis (it sounded better in my head).

Specifically, I think I can do better than just sentiment analysis. Imagine an image of a street that's partially underwater. Is there a rescue team on a boat rescuing someone? What about the person on the roof waving for help? Perhaps it's a bridge that's about to be swept away, or a tree that has fallen down? Can we both identify these things in images and map them to physical locations?

Existing approaches to e.g. detect where the water is in the image are prone to misidentifying water that is in fact where it should be for once, such as in rivers and lakes. To this end, I propose looking for the people and things in the water rather than the water itself, and going for a people-centred approach to flood information management.

I imagine that while I'll probably use data from social media I already have (getting a hold of new data from social media is very difficult at the moment) - filtered for memes and misinformation this time - if you know of any relevant sources of data or datasets, I'm absolutely interested and please get in touch. It would be helpful but not required if it's related to a specific natural disaster event (I'm currently looking at floods, branching out to others is absolutely possible and on the cards but I will need to submit a new ethics form for that before touching any data).

Another challenge I anticipate is that of unlabelled data. It is often the case that large volumes of data are generated during an unfolding natural disaster, and processing it all can be a challenge. To this end, somehow I want my approach here to make sense of unlabelled images. Of course, generalist foundational models like CLIP are great, but lack the ability to be specific and accurate enough with natural disaster images.

I also intend that this idea would be applicable to images from a range of sources, and not just with respect to social media. I don't know what those sources could be just yet, but if you have some ideas, please let me know.

Finally, I am particularly interested if you or someone you know are in any way involved in natural disaster management. What kinds of challenges do you face? Would this be in any way useful? Please do get in touch either in the comments below or sending me an email (my email address is on the homepage of this website).

Persuasive tactics challenge

The research group I'm part of were successful in completing the SemEval Task 4: Multilingual Detection of Persuasion Techniques in Memes! I implemented the 'late fusion engine', which is a fancy name for an algorithm that uses basic probability to combine categorical predictions from multiple different models depending on how accurate each model was on a per-category basis.
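
As a rough illustration of the general idea (and definitely not the engine from the paper), a late fusion step along these lines could look something like the sketch below. The function name late_fusion, the input shapes, and the simple accuracy-weighted voting rule are all assumptions; the real task also involves more than picking a single category.

```python
from collections import defaultdict


def late_fusion(predictions, accuracy):
    """Combine one categorical prediction per model into a single answer.

    predictions: {model_name: predicted_category}
    accuracy:    {model_name: {category: accuracy in [0, 1]}}, measured
                 beforehand (ASSUMPTION: e.g. on a validation set)
    """
    scores = defaultdict(float)
    for model, category in predictions.items():
        # A model's vote counts for as much as it has been right
        # about this particular category in the past
        scores[category] += accuracy[model].get(category, 0.0)

    total = sum(scores.values())
    if total == 0:
        raise ValueError("no model has a non-zero accuracy for its prediction")

    # Normalise the weighted votes so that they sum to 1
    fused = {category: score / total for category, score in scores.items()}
    best = max(fused, key=fused.get)
    return best, fused
```

The point of weighting per category rather than per model is that a model which is mediocre overall can still win the vote on the one category it happens to be very good at.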

I'm unsure of the status of the paper, but I think it's been through peer-review so you can find that here: BDA at SemEval-2024 Task 4: Detection of Persuasion in Memes Across Languages with Ensemble Learning and External Knowledge.

I wasn't the lead on that challenge, but I believe the lead person (a friend of mine, if you are reading this and want me to link to somewhere here get in touch) on that project will be going to Mexico to present it.

Teaching

I'm still not sure what I can say and what I can't, but starting in September I have been asked to teach a module on basic system administration skills. It's a rather daunting prospect, but I have a bunch of people much more experienced than me to guide me through the process. At the moment the plan is for 21 lecture-ish things, 9 labs, and the assessment stuff, so I'm rather nervous about preparing all of this content.

Of course, as a disclaimer nothing written in this section should be taken as absolute. (Hopefully) more information at some point, though unfortunately I doubt that I would be allowed to share the content created given it's University course material.

As always though, if there's a specific topic that lies anywhere within my expertise that you'd like explaining, I'm happy to write a blog post about it (in my own time, of course).

Conclusion

We've taken a little look at what has been going on since I last posted, and while this post has been rather talky (will try for some kewl graphics next time!), nonetheless I hope this has been an interesting read. I've submitted my thesis, started initial readificating for my next research project (the ideas for which we've explored here), helped out with a group research challenge project thingy, and been invited to do some teaching!

Hopefully the next post in this series will come out on time - long-term the plan is to absolutely continue blogging about the research I'm doing.

Until next time, the journey continues!

(Oh yeah! and finally finally, to the person who asked a question by email about this old post (I think?), I'm sorry for the delay and I'll try to get back to you soon.)

LaTeX templates for writing with the University of Hull's referencing style

Hello, 2024! I'm writing this while it is still some time before the new year, but I realised just now (a few weeks ago for you), that I never blogged about the LaTeX templates I have been maintaining for a few years by now.

It's no secret that I do all of my formal academic writing in LaTeX - a typesetting language that is the industry standard in the field of Computer Science (and others too, I gather). While it's a very flexible (and at times obtuse, but this is a tale for another time) language, actually getting started is a pain. To make this process easier, I have developed over the years a pair of templates for writing that make starting off much easier.

A key issue (and skill) in academic writing is properly referencing things, and most places have their own specific referencing style you have to follow. The University of Hull is no different, so I knew from the very beginning that I needed a solution.

I can't remember who I received it from, but someone (comment below if you remember who it was, and I'll properly credit!) gave me a .bst BibTeX referencing style file that matches the University of Hull's referencing style.

I've been using it ever since, and I have also applied a few patches to it for some edge cases I have encountered that it doesn't handle. I do plan on keeping it up to date for the foreseeable future with any changes they make to the aforementioned referencing style.

My templates also include this .bst file to serve as a complete starting template. There's one with a full page title (e.g. for theses, dissertations, etc), and another with just a heading that sits at the top of the document just like a paper you might find on Semantic Scholar.

Note that I do not guarantee that the referencing style matches the University of Hull's style. All I can say is that it works for me and implements this specific referencing style.

With that in mind, I'll leave the README of the git repository to explain the specifics of how to get started with them:

https://git.starbeamrainbowlabs.com/Demos/latex-templates

They are stored on my personal git server, but you should be able to clone them just fine. Patches are most welcome via email (check the homepage of my website!)!

PhD Update 17: Light at the end of the tunnel

Wow..... it's been what, 5 months since I last wrote one of these? Oops. I'll do my best to write them at the proper frequency in the future! Things have been busy. Before I talk about what's been happening, here's the ever-lengthening list of posts in this series:

As I sit here at the very bitter end of the very last day of a long but fulfilling semester, I'm feeling quite reflective about the past year and how things have gone on my PhD. One of these posts is definitely long overdue.

Timescales

Naturally the first question here is about timescales. "What happened?" I hear you ask. "I thought you said you were aiming for intent to submit September 2023 for December 2023 finish?"

Well, about that.......

As it turns out, spending half of one's week working as Experimental Officer throws off one's estimation of how much work they do. To this end, it's looking more likely that I will be submitting my thesis in early-mid semester 2 this year. In other words, that's around about March 2024 time - give or take a month or two.

After submission the next step will be my viva. Hoping I pass, it's then likely followed by corrections that must be completed based on the feedback from the viva.

What is a viva though? From what I understand, it is an oral exam in which you, your primary supervisor, and 2 examiners comb through your thesis with a fine toothcomb and ask you lots of questions. I've heard it can take several hours to complete. While the standard is to have 1 examiner be chosen internally from your department / institute and one to be chosen externally (chosen by your primary supervisor), in my case I will be having both chosen from external sources as I am now a (part-time) staff member in the Department of Computer Science at the University of Hull (my home institution).

While it's still a little ways out yet, I can't deny that the thought of my viva is making me rather nervous - having everything I've done over the past 4.5 years scrutinised by completely unknown people. In a sense, it feels like once it is time for my viva, there will be nothing more I can do. I will either know the answers to their questions.... or I will not.

Writing

As you might have guessed by now, writing has been the name - and, indeed, aim - of the game since the last post in this series. Everything is coming together rather nicely. It's looking like I'm going to end up with the following structure:

Introduction (not written*)
Background (almost there! currently working on this)
Rainfall radar for 2d flood forecasting (needs expanding)
Social media sentiment analysis (done!)
Conclusion
Acknowledgements, Appendices, etc
Dictionary of terms; List of acronyms (grows organically as I write - I need to go through and make sure I \gls all the terms I've added later)
Bibliography (currently 27 pages and counting O.o)

* Technically I have written it, it's just outdated and very bad and needs throwing out the window of the tallest building I can find. Rewrite is pending - see below.

A sneak preview of my thesis as a PDF.

(Above: A sneak preview of my thesis PDF. I'm writing in LaTeX - check out my templates with the University of Hull reference style here! Evidently the pictured section needs some work.....)

I've finished the chapter on social media work, barring some minor adjustments I need to apply to ensure consistency. My current focus is the background chapter. This is most of the way there, but I need some more detail in several sections so I'm working my way through them one at a time. This is resulting in a bunch more reading (especially for vision-based water detection via satellite data), so this is taking some time.

Once I've wrapped up the background section, it will be time to turn my attention to the content chapter #2: Rainfall radar for 2d flood forecasting. Currently, it sits at halfway between a conference paper (check it out! You can read it now, though a DOI is pending and should be available after the conference) and a thesis chapter - so I need to push (pull? drag?) it the rest of the way to the finish line. This will primarily entail 2 things:

Filling out the chapter-specific related works, which are currently rather brief given space and time limitations in a conference paper
Elaborating on things like the data preprocessing, experiments, discussion, etc.

This will also take some time, which together with the background section explains the uncertainty I still have in my finish date. Once these are both complete, I will be submitting my intent to submit! This will start a 3 month timer, by the end of which I must have submitted my thesis. During this timer period, I will be working on the introduction and conclusion chapters, which I do not expect to take nearly as long as any of the other chapters.

Once I am done writing and have submitted my thesis, I will do everything I can to ensure it is available under an open source licence for everyone to read. I believe strongly in the power of open source (and, open science) to benefit everyone, and want to share everything I've learned with all of you reading this.

At 102 pages A4 single space so far and counting though (not including the aforementioned bibliography), it's a big time investment to read. To this end, I have various publications I've written and posted about here previously that cover most of the stuff I've done (namely the rainfall radar conference paper and social media journal article), and I also want to somehow condense the content of my thesis down into a 'mini-thesis' that's about 3-6 pages ish and post that alongside my main thesis here on my website. I hope that this should provide the broad strokes and a navigation aid for the main document.

Predicting Persuasive Posts

All this writing is going to drive me crazy if I don't do something practical alongside it. Unfortunately I have long since run out of excuses to run more experiments on my PhD work, so a good friend of mine who is also doing a PhD (they've published this paper) came along at the perfect time the other day asking for some help with a challenge competition submission they want to do. Of course, I had to agree to help out in a support role as the project sounds really interesting¹.

The official title of the challenge is thus: Multilingual Detection of Persuasion Techniques in Memes

The challenge is part of SemEval-2024 and it's basically about classifying memes from some social media network (it's unclear which one they are from) as to which persuasion tactic they are employing to manipulate the reader's opinions / beliefs.

The full challenge page can be found here: https://propaganda.math.unipd.it/semeval2024task4/index.html

We had a meeting earlier this week to discuss, and one of the key problems we identified was that to score challengers they will be using posts in multiple unseen languages. To this end, it strikes me that it is important to have multiple languages embedded in the same space for optimal results.

This is not what GloVe does (it embeds them to different 'spaces', so a model trained on data in 1 language won't necessarily work well with another) - as I discovered in my demo for the Hull Science Festival - I definitely want to write about this in the final post in that series - so as my role in the team I'm going to push a number of different word embeddings through the system I have developed for the aforementioned science demo to identify which one is best for embedding multilingual text. Expect some additional entries to be added to the demo and an associated blog post on my findings very soon!

Currently, I have the following word embedding systems on my list:

Word2vec
FastText
CLIP
BERT/mBERT
XLM/XLM-RoBERTa

If you know of any other good word embedding models / algorithms, please do leave a comment below.

It also occurs to me while writing this that I'll have to make sure the multilingual dataset I used for the online demo has the same or similar words translated to every language to rule out any difference in embeddings there.
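One rough way to compare candidate embeddings on this front would be to take pairs of words that are translations of each other and measure how close together their vectors land. The sketch below is illustrative only: the vectors are made-up toy values rather than output from any real model, and `meanPairSimilarity` is a hypothetical helper name rather than part of any library.

```javascript
// Cosine similarity between 2 equal-length vectors:
// close to 1 = pointing the same way, close to 0 = unrelated
function cosineSimilarity(a, b) {
    let dot = 0, normA = 0, normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Average similarity across translation pairs.
// `embed` is any function that maps a word to its vector.
function meanPairSimilarity(pairs, embed) {
    const total = pairs.reduce(
        (sum, [wordA, wordB]) => sum + cosineSimilarity(embed(wordA), embed(wordB)),
        0
    );
    return total / pairs.length;
}

// Toy vectors (NOT from a real model), purely to show the shape of the comparison
const toyVectors = new Map([
    ["cat", [0.9, 0.1, 0.0]], ["chat", [0.8, 0.2, 0.1]],
    ["tree", [0.1, 0.9, 0.2]], ["arbre", [0.2, 0.8, 0.1]],
]);
const score = meanPairSimilarity(
    [["cat", "chat"], ["tree", "arbre"]],
    word => toyVectors.get(word)
);
console.log(score.toFixed(3));
```

A higher average would hint that an embedding places translated words near each other, though a real comparison would need far more word pairs than this.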

A nice challenge for the Christmas holidays! My experience of collaborating with other researchers is rather limited at the moment, so I'm looking forward to working in a team to achieve a goal much faster than would otherwise be possible.

Beyond the edge

Something that has been a constant nagging presence in my mind, and steadily growing, is the question of what happens next after my thesis. While the details have not been confirmed yet, once everything PhD-related is wrapped up I will most likely be increasing my hours by some amount such that I work Monday - Friday rather than just Monday - Wednesday lunchtime as I have been doing so far.

This extra time will consist of 2 main activities. To the best of my current understanding, this will include some additional teaching responsibilities - I will probably be teaching a module that lies squarely within 1 of my strong points. It will also, crucially, include some dedicated time for research.

This time for research I believe I will be able to spend on research related activities, including for example collaborating with other researchers, reading papers, designing and running experiments, and writing up results into publication form. Essentially what I've been doing on my PhD, just minus the thesis writing!

Of course, the things I talk about here are not set in stone, and me talking about them here is not a declaration of such.

Either way, I do feel that the technical is a strong point of mine that I am rather passionate about, so I do desire very much to continue dedicating a significant portion of my energy towards doing practical research tasks.

I'm not sure how much I am allowed to talk about the teaching I will be doing, but do expect some updates on that here on my blog too - however high-level and broad strokesy they happen to be. What kind of teaching-related things would you be interested in being updated about here? Please do leave a comment below.

Talking more specifically, I do have a number of research ideas - one of which I have alluded to above - that I want to explore after my PhD. Most of these are based on what I have learnt from doing my PhD and the logical next steps to analyse complex real-time data sources with a view to extracting and processing information to increase situational awareness in natural disaster scenarios. When I get around to this, I will be blogging about my progress in detail here on my blog.

It should probably be mentioned that I am still quite a long way off actually putting any of these ideas into practice (I would definitely not recommend trusting any predictions my current rainfall radar → binarised water depth model makes in the real world yet!), but if you or someone you know works in the field of managing natural disasters, I would be fascinated to know what you would find most useful related to this - please leave a comment below.

Conclusion

This post has ended up being a lot longer than I expected! I've talked about my current writing progress, a rather interesting side-project (more details in a future blog post!), and initial conceptual future plans - both researchy and otherwise.

While my thesis is drawing close to completion (relatively, at least), I hope you will join me here beyond the end of this long journey that is almost at an end. As one book closes, so does another one open. A new journey is / will be only just beginning - one I can't wait to share with everyone here in future blog posts.

If you've got any thoughts, it would be cool if you could share them below.

¹ It goes without saying, but I won't let it impact my writing progress. I divide my day up into multiple slices - one of which is dedicated to focused PhD work - and I'll be pulling from a different slice of time other than the one for my PhD writing to help out with this project.

Building the science festival demo: How to monkeypatch an npm package

A pink background dotted with bananas, with the patch-package logo front and centre, and the npm logo small in the top-left. Small brown package boxes are present in the bottom 2 corners.

In a previous post, I talked about the nuts and bolts of the demo on a technical level, and how it's all put together. I alluded to the fact that I had to monkeypatch Babylon.js to disable the gamepad support because it was horribly broken, and I wanted to dedicate an entire post to the subject.

Partly because it's a clever hack I used, and partly because if I ever need to do something similar again I want a dedicated tutorially-style post on how I did it so I can repeat the process.

Monkeypatching an npm package after installation in a reliable way is an inherently fragile task: it is not something you want to do if you can avoid it. In some cases though, it's unavoidable:

If you're short on time, and need something to work
If you are going to submit a pull request to fix something now, but need an interim workaround until your pull request is accepted upstream
If upstream doesn't want to fix the problem, and you're forced to either maintain a patch or fork upstream into a new project, which is a lot more work.

We'll assume that one of these 3 cases is true.

In the game Factorio, there's a saying 'there's a mod for that' that is often repeated in response to questions in discourse about the game. The same is true of Javascript: If you need to do a non-trivial thing, there's usually an npm package that does it that you can lean on instead of reinventing the wheel.

In this case, that package is called patch-package. patch-package is a lovely little tool that enables you to do 2 related things:

a) Generate patch files simply by editing a given npm package in-situ
b) Automatically and transparently apply generated patch files on npm install, requiring no additional setup steps should you clone your project down from its repository and run npm install.

Assuming you have a working setup with the target npm package you want to patch already installed, first install patch-package:

npm install --save patch-package

Note: We don't --save-dev here, because patch-package needs to run any time the target package is installed... not just in your development environment - unless the target package to patch is also a development dependency.

Next, delve into node_modules/ and directly edit the files associated with the target package you want to edit.

Sometimes, projects will ship multiple npm packages, with one containing the pre-minified build distribution, and the other distributing the raw source - e.g. if you have your own build system like esbuild and want to tree-shake it.

This is certainly the case for Babylon.js, so I had to switch from the main babylonjs package to @babylonjs/core, which contains the source. Unfortunately official documentation for Babylon.js is rather inconsistent which can lead to confusion using the latter, but once I figured out how the imports worked it all came out in the wash.

Once done, generate the patch file for the target package like so:

npx patch-package your-package-name-here

This should create a patch file in the directory patches/ alongside your package.json file.

The final step is to enable automatic and transparent application of the new patch file on package installation. To do this, open up your package.json for editing, and add the following to the scripts object:

"scripts": {
    "postinstall": "patch-package"
}

...so a complete example might look a bit like this:

{
    "name": "research-smflooding-vis",
    "version": "1.0.0",
    "description": "Visualisations of the main smflooding research for outreach purposes",
    "main": "src/index.mjs",

    // ....

    "scripts": {
        "postinstall": "patch-package",
        "test": "echo \"No tests have been written yet.\"",
        "build": "node src/esbuild.mjs",
        "watch": "ESBUILD_WATCH=yes node src/esbuild.mjs"
    },

    // ......

    "dependencies": {
        // .....
    }
}

That's really all you need to do!

After you've applied the patch like this, don't forget to commit your changes to your git/mercurial/whatever repository.

I would also advise being a bit careful installing updates to any packages you've patched in future, in case of changes - though of course installing dependency package updates is vitally important to keep your code updated and secure.

As a rule of thumb, I recommend actively working to minimise the number of patches you apply to packages, and only use this method as a last resort.
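To make the point about updates a bit more concrete: as far as I'm aware, patch-package names each patch file after the package and the version it was generated against (e.g. `some-package+1.2.3.patch`, with the `/` in a scoped package name swapped for a `+`). Assuming that naming scheme holds, a small check like the one below could flag patches made for a different version than the one installed. Both function names are hypothetical and this only handles the simple, non-nested case.

```javascript
// Parse a patch filename like "some-package+1.2.3.patch" or "@scope+name+1.2.3.patch".
// Assumes the simple naming scheme described above; returns null for anything else.
function parsePatchFilename(filename) {
    const match = /^(.+)\+(\d+\.\d+\.\d+[^+]*)\.patch$/.exec(filename);
    if (match === null) return null;
    const rawName = match[1];
    // Patches for nested dependencies use "++" as a separator; not handled in this sketch
    if (rawName.includes("++")) return null;
    return {
        // Scoped packages: turn the first "+" back into a "/"
        name: rawName.startsWith("@") ? rawName.replace("+", "/") : rawName,
        version: match[2],
    };
}

// True if the installed version differs from the version the patch was generated for
function patchIsStale(filename, installedVersion) {
    const parsed = parsePatchFilename(filename);
    return parsed !== null && parsed.version !== installedVersion;
}

console.log(parsePatchFilename("some-package+1.2.3.patch"));
console.log(patchIsStale("some-package+1.2.3.patch", "1.3.0"));
```

In practice you would feed this the filenames in patches/ and the version field from each installed package's package.json after an update.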

That's all for this post. In future posts, I want to look more at the AI theory behind the demo, its implications, and what it could mean for research in the field in the future (is there even a kind of paper one writes about things one learns from outreach activities that accidentally have a bearing on my actual research? and would it even be worth writing something formal? a question for my supervisor ~~and commenters on that blog post when it comes out~~ I think).

See you in the next post!

(Background to post banner: Unsplash)

Building the science festival demo: technical overview

Hello and welcome to the technical overview of the Hull Science Festival demo I did on the 9th September 2023. If you haven't already, I recommend reading the main release post for context, and checking out the live online demo here: https://starbeamrainbowlabs.com/labs/research-smflooding-vis/

I suspect that a significant percentage of the readers of my blog here love technical nuts and bolts of things (though you're usually very quiet in the comments :P), so I'm writing a series of posts about various aspects of the demo, because it was a very interesting project.

In this post, we'll cover the technical nuts and bolts of how I put it together, the software and libraries I used, and the approach I took. I also have another post written I'll be posting after this one on monkeypatching npm packages after you install them, because I wanted that to be its own post. In a post after that we'll look at the research and the theory behind the project and how it fits into my PhD and wider research.

To understand the demo, we'll work backwards and deconstruct it piece by piece - starting with what you see on screen.

Browsing for a solution

As longtime readers of my blog here will know, I'm very partial to cramming things into the browser that probably shouldn't run in one. This is also the case for this project, which uses WebGL and the HTML5 Canvas.

Of course, I didn't implement using the WebGL API directly. That's far too much effort. Instead, I used a browser-based game engine called Babylon.js. Babylon abstracts the complicated bits away, so I can just focus on implementing the demo itself and not reinventing the wheel.

Writing code in Javascript is often an exercise in putting lego bricks together (which makes it very enjoyable, since you rarely have to deviate from your actual task due to the existence of npm). To this end, in the process of implementing the demo I collected a bunch of other npm packages together to which I could then delegate various tasks:

tsv for parsing TSV files
chroma-js for handling colours
pako for unpacking .gz files with pure JS in the browser
readablestream-lines for making a ReadableStream a line-by-line iterator
octree-es for an Octree implementation
a few other minor packages

Graphics are easy

After picking a game engine, it is perhaps unsurprising that the graphics were easy to implement - even with 70K points to display. I achieved this with Babylon's PointsCloudSystem class, which made the display of the point cloud a trivial exercise.

After adapting and applying a clever plugin (thanks, @sebavan!), I had points that were further away displaying smaller and closer ones larger. Dropping in a perceptually uniform colour map (I wonder if anyone's designed a perceptually uniform mesh map for a 3D volume?) and some fog made the whole thing look pretty cool and intuitive to navigate.

Octopain

Now that I had the points displaying, the next step was to get the text above each point displaying properly. Clearly with 70K points (140K in the online demo!) I can't display text for all of them at once (and it would look very messy if I did), so I needed to index them somehow and efficiently determine which points were near to the player in real time. This is actually quite a well studied problem, and from prior knowledge I remember that Octrees were reasonably efficient. If I had some time to sit down and read papers (a great pastime), this one (some kind of location recognition from point clouds; potentially indoor/outdoor tracking) and this one (AI semantic segmentation of point clouds) look very interesting.

Unfortunately, the task of extracting a list of points within a given radius was not something commonly implemented in octree implementations on npm, and combined with a bit of headache figuring out the logic of this and how to hook it up to the existing Babylon renderer resulted in this step taking some effort before I found octree-es and got it working the way I wanted it to.

In the end, I had the octree as a completely separate point indexing data structure, and I used the word as a key to link it with the PointsCloudSystem in babylon.
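To give a flavour of what "all points within a given radius" looks like with an octree, here's a from-scratch minimal sketch. To be clear, this is not the octree-es API and not the code from the demo: the class and method names are made up for illustration, and each stored point simply carries a key (the word) that could be used to look it up elsewhere.

```javascript
// Minimal point octree: each node covers a cube and holds up to `capacity`
// points before splitting into 8 child cubes.
class Octree {
    constructor(centre, halfSize, capacity = 8) {
        this.centre = centre;     // [x, y, z] centre of this node's cube
        this.halfSize = halfSize; // half the cube's edge length
        this.capacity = capacity;
        this.points = [];         // entries of the form { pos: [x, y, z], key }
        this.children = null;     // null until this node splits
    }

    contains(pos) {
        return pos.every((v, i) => Math.abs(v - this.centre[i]) <= this.halfSize);
    }

    insert(pos, key) {
        if (!this.contains(pos)) return false;
        if (this.children === null) {
            // Store here if there's room (or the cube is too tiny to split further)
            if (this.points.length < this.capacity || this.halfSize < 1e-6) {
                this.points.push({ pos, key });
                return true;
            }
            this.split();
        }
        // Hand the point to a child cube; keep it here as a fallback if
        // floating-point rounding means no child accepts it
        if (!this.children.some(child => child.insert(pos, key)))
            this.points.push({ pos, key });
        return true;
    }

    split() {
        const h = this.halfSize / 2;
        const [cx, cy, cz] = this.centre;
        this.children = [];
        for (const dx of [-h, h]) for (const dy of [-h, h]) for (const dz of [-h, h])
            this.children.push(new Octree([cx + dx, cy + dy, cz + dz], h, this.capacity));
        // Push the points this node was holding down into the new children
        const old = this.points;
        this.points = [];
        for (const p of old) this.insert(p.pos, p.key);
    }

    // Collect the keys of all points within `radius` of `target`
    queryRadius(target, radius, found = []) {
        // Squared distance from the target to this node's cube (0 if inside it)
        const cubeDistSq = this.centre.reduce((sum, c, i) => {
            const d = Math.max(Math.abs(target[i] - c) - this.halfSize, 0);
            return sum + d * d;
        }, 0);
        if (cubeDistSq > radius * radius) return found; // cube can't touch the sphere
        for (const p of this.points) {
            const distSq = p.pos.reduce((s, v, i) => s + (v - target[i]) ** 2, 0);
            if (distSq <= radius * radius) found.push(p.key);
        }
        if (this.children !== null)
            for (const child of this.children) child.queryRadius(target, radius, found);
        return found;
    }
}
```

The saving comes from the early return in queryRadius: whole cubes of points that are too far from the player get skipped without checking each point individually.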

Gasp, is that a memory leak I see?!

Given I was in a bit of a hurry to get the whole demo thing working, it should come as no surprise that I ended up with a memory leak. I didn't actually have time to fix it before the big day either, so I had the demo on the big monitor while I kept an eye on the memory usage of my laptop on my laptop screen!

A photo of my demo up and running on a PC with a PS4 controller on a wooden desk. An Entroware laptop sits partially obscured by a desktop PC monitor, the latter of which has the demo full screen.

(Above: A photo of my demo in action.... I kept an eye on the memory graph in the taskbar on my laptop the whole time. It only crashed once!)

Anyone who has done anything with graphics and game engines probably suspects where the memory leak was already. When rendering the text above each point with a DynamicTexture, I didn't reuse the instance when the player moved, leading to a build-up of unused textures in memory that would eventually crash the machine. After the day was over, I had time to sit down and implement a pool to re-use these textures over and over again, which didn't take nearly as long as I thought it would.
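The pooling idea can be sketched roughly like so. This isn't the demo's actual code: `TexturePool` is a hypothetical name, and the `create` function passed in stands in for whatever builds the real texture (a DynamicTexture in the demo's case).

```javascript
// A tiny object pool: hand out a spare object if one is available,
// and only create a new one when the pool is empty.
class TexturePool {
    constructor(create) {
        this.create = create; // factory function that makes a brand new texture
        this.free = [];       // textures that are currently unused
        this.created = 0;     // how many textures have ever been created
    }

    acquire() {
        if (this.free.length > 0) return this.free.pop();
        this.created++;
        return this.create();
    }

    // Call this when a label goes out of view instead of throwing the texture away
    release(texture) {
        this.free.push(texture);
    }
}

// Usage with a stand-in object instead of a real texture
const pool = new TexturePool(() => ({ text: "" }));
const label = pool.acquire();
label.text = "cat";
pool.release(label);
const reused = pool.acquire(); // same object as before, ready to be redrawn
reused.text = "apple";
```

The total number of textures then stays bounded by the most labels ever visible at once, rather than growing every time the player moves.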

Gamepad support

You would think that, being a well known game engine, Babylon would have working gamepad support. The documentation even suggests as much, but sadly this is not the case. When I discovered that gamepad support was broken in Babylon (at least for my PS4 controller), I ended up monkeypatching Babylon to disable the inbuilt support (it caused a crash even when disabled O.o) and then hacking together a custom implementation.

This custom implementation is actually quite flexible, so if I ever have some time I'd like to refactor it into its own npm package. Believe it or not I tried multiple other npm packages for wrapping the Gamepad API, and none worked reliably (it's a polling API, which can make designing an efficient and stable wrapper an interesting challenge).

To do that though I would need to have some other controllers to test with, as currently it's designed only for the PS4 dualshock controller I have on hand. Some time ago I initially purchased an Xbox 360 controller wanting something that worked out of the box with Linux, but it didn't work out so well so I ended up selling it on and buying a white PS4 dualshock controller instead (pictured below).

I'm really impressed with how well the PS4 dualshock works with Linux - it functions perfectly out of the box in the browser (useful test website), and even appears to have native Linux mainline kernel support which is a big plus. The little touchpad on it is cute and helpful in some situations too, but most of the time you'd use a real pointing device.

(Above: A white PS4 dualshock controller.)
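Going back to the polling nature of the Gamepad API mentioned above: since the browser only tells you the current state when you ask, a wrapper has to compare snapshots itself to work out when a button was pressed or released. Here's a minimal sketch of that comparison step. It's not my actual implementation, and `diffButtons`, `last`, and `handle` are hypothetical names.

```javascript
// Compare 2 snapshots of button states (arrays of booleans) and list what changed
function diffButtons(previous, current) {
    const events = [];
    for (let i = 0; i < current.length; i++) {
        const wasPressed = previous[i] === true;
        if (current[i] && !wasPressed) events.push({ button: i, type: "press" });
        if (!current[i] && wasPressed) events.push({ button: i, type: "release" });
    }
    return events;
}

// In a browser, this would be called once per frame, something like:
//   const pad = navigator.getGamepads()[0];
//   if (pad) {
//       const now = pad.buttons.map(button => button.pressed);
//       for (const event of diffButtons(last, now)) handle(event);
//       last = now;
//   }

console.log(diffButtons([false, true], [true, false]));
```

Keeping the comparison as a pure function like this means it can be tested without a controller plugged in, which is handy given how fiddly gamepads can be.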

How does it fit in a browser anyway?!

Good question. The primary answer to this is the magic of esbuild: a magical build tool that packages your Javascript and CSS into a single file. It can also handle other associated files like images too, and on top of that it's suuuper easy to use. It tree-shakes by default, and is just all-around a joy to use.

Putting it to use resulted in my ~1.5K lines of code (wow, I thought it was more than that) along with ~300K lines in libraries being condensed into a single 4MiB .js and a 0.7KiB .css file, which I could serve to the browser along with the main index.html file. It's even really easy to implement subresource integrity, so I did that just for giggles.

Datasets, an origin story

Using the Fetch API, I could fetch a pre-prepared dataset from the server, unpack it, and do cool things with it as described above. The dataset itself was prepared using a little Python script I wrote (source).

The script uses GloVe to vectorise words (I think I used 50 dimensions since that's what fit inside my laptop at the time), and then UMAP (paper, good blog post on why UMAP is better than tSNE) to do dimensionality reduction down to 3 dimensions, whilst still preserving global structure. Judging by the experiences we had on the day, I'd say it was pretty accurate, if not always obvious why given words were related (more on why this is the case in a separate post).

(Above: My social media data, plotted in 2D with PCA (left), tSNE (centre), and UMAP (right). Points are blue against a white background, plotted with the Python datashader package.)

I like Javascript, but I had the code written in Python due to prior research, so I just used Python (looking now there does seem to be a package that implements UMAP in JS, so I might look at that another time). The script is generic enough that I should be able to adapt it for other projects in the future to do similar kinds of analyses.

For example, if I were to look at a comparative analysis of e.g. language used by social media posts from different hashtags or something, I could use the same pipeline and just label each group with a different colour to see the difference between the 2 visually.

The data itself comes from 2 different places, depending on where you see the demo. If you were lucky enough to see it in person, then it's directly extracted from my social media data. The online one comes from page abstracts from various Wikipedia language dumps to preserve privacy of the social media dataset, just in case.

With the data converted, the last piece of the puzzle is that of how it ends up in the browser. My answer is a gzip-compressed headerless tab-separated-values file that looks something like this (uncompressed, of course):

cat    -10.147051      2.3838716       2.9629934
apple   -4.798643       3.1498482       -2.8428414
tree -2.1351748      1.7223179       5.5107193

With the data stored in this format, it was relatively trivial to load it into the browser, decompress it as mentioned previously, and then display it with Babylon.js. There's also room here to expand and add additional columns later if needed, to e.g. control the colour of each point, label each word with a group, or something else.
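As a rough illustration of how little code a format like this needs on the loading side, here's a minimal parsing sketch. It's a simplified stand-in rather than the demo's actual loader (which uses the tsv and readablestream-lines packages listed earlier), and it assumes exactly the 4 columns shown above: the word, then x, y, and z. `parsePoints` is a hypothetical name.

```javascript
// Parse headerless TSV text into { word, position } records.
// Assumes 4 tab-separated columns per line: word, x, y, z.
function parsePoints(tsvText) {
    const points = [];
    for (const line of tsvText.split("\n")) {
        if (line.trim() === "") continue; // skip blank lines (e.g. a trailing newline)
        const [word, x, y, z] = line.split("\t").map(cell => cell.trim());
        const position = [x, y, z].map(Number);
        if (position.some(Number.isNaN)) continue; // skip rows with missing or malformed numbers
        points.push({ word, position });
    }
    return points;
}

const sample = "cat\t-10.147051\t2.3838716\t2.9629934\napple\t-4.798643\t3.1498482\t-2.8428414\n";
console.log(parsePoints(sample));
```

Any extra columns added later would simply be ignored by this version until the destructuring is extended to pick them up.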

Conclusion

We've pulled the demo apart piece by piece, and seen at a high level how it's put together and the decisions I made while implementing it. We've seen how I implemented the graphics - aided by Babylon.js and a clever hack. I've explained how I optimised the location polling to achieve real-time performance using an octree, and how reusing textures is very important. Finally, we took a brief look at the dataset and where it came from.

In the next post, we'll take a look at how to monkeypatch an npm package and when you'd want to do so. In a later post, we'll look at the research behind the demo, what makes it tick, what I learnt while building and showing it off, and how that fits in with the wider field from my perspective.

Until then, I'll see you in the next post!

Edit 2023-11-30: Oops! I forgot to link to the source code....! If you'd like to take a gander at the source code behind the demo, you can find it here: https://github.com/sbrl/research-smflooding-vis

My Hull Science Festival Demo: How do AIs understand text?

Hello there! On Saturday 9th September 2023, I was on the supercomputing stand for the Hull Science Festival with a cool demo illustrating how artificial intelligences understand and process text. Since then, I've been hard at work tidying that demo up, and today I can announce that it's available to view online here on my website!

This post is a general high-level announcement post. A series of technical posts will follow on the nuts and bolts of both the theory behind the demo and the actual code itself and how it's put together, because it's quite interesting and I want to talk about it.

I've written this post to serve as a foreword / quick explanation of what you're looking at (similar to the explanation I gave in person), but if you're impatient you can just find it here.

All AIs currently developed are essentially complex parametrised mathematical models. We train these models by updating their parameters little by little until the output of the model is similar to the output of some ground truth label.

In other words, an AI is just a bunch of maths. So how does it understand text? The answer to this question lies in converting text to numbers - a process often called 'word embedding'.

This is done by splitting an input sentence into words, and then individually converting each word into a series of numbers, which is what you will see in the demo at the link below - just converted with some magic to 3 dimensions to make it look fancy.

Similar sorts of words will have similar sorts of numbers (or positions in 3D space in the demo). As an example here, at the science festival we found a group of footballers, a group of countries, and so on.
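If you'd like to see the idea in code, here's a toy sketch of the splitting and converting steps described above. The lookup table is entirely made up for illustration (it is not output from GloVe or any real model), and real word embeddings have far more than 3 numbers per word.

```javascript
// Toy lookup table: made-up 3D vectors, NOT output from any real model
const toyEmbedding = new Map([
    ["flood", [0.8, 0.1, 0.3]],
    ["river", [0.7, 0.2, 0.4]],
    ["goal", [-0.5, 0.9, 0.1]],
]);

// Split a sentence into lowercase words, then swap each known word for its numbers
function embedSentence(sentence, table) {
    return sentence
        .toLowerCase()
        .split(/[^\p{L}\p{N}]+/u) // split on anything that isn't a letter or digit
        .filter(word => table.has(word)) // words missing from the table are dropped
        .map(word => ({ word, vector: table.get(word) }));
}

// Straight-line distance between 2 sets of numbers: smaller = more similar
function distance(a, b) {
    return Math.sqrt(a.reduce((sum, value, i) => sum + (value - b[i]) ** 2, 0));
}

console.log(embedSentence("The river will flood!", toyEmbedding));
console.log(distance(toyEmbedding.get("flood"), toyEmbedding.get("river")));
console.log(distance(toyEmbedding.get("flood"), toyEmbedding.get("goal")));
```

In this toy table, 'flood' sits much closer to 'river' than it does to 'goal', which is the same effect that causes related words to clump together in the demo.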

In the demo below, you will see clouds of words processed from Wikipedia. I downloaded a bunch of page abstracts for Wikipedia in a number of different languages (source), extracted a list of words, converted them to numbers (GloVe → UMAP), and plotted them in 3D space. Can you identify every language displayed here?

Find the demo here: https://starbeamrainbowlabs.com/labs/research-smflooding-vis/

A screenshot of the initial attract screen of the demo. A central box allows one to choose a file to load, with a large load button directly beneath it. The background is a blurred + bloomed screenshot of a point cloud from the demo itself.

If you were one of the lucky people to see my demo in person, you may notice that this online demo looks very different to the one I originally presented at the science festival. That's because the in-person demo uses data from social media, but this one uses data from Wikipedia to preserve privacy, just in case.

I hope you enjoy the demo! Time permitting, I will be back with some more posts soon to explain how I did this and the AI/NLP theory behind it at a more technical level. Some topics I want to talk about, in no particular order:

General technical outline of the nuts and bolts of how the demo works and what technologies I used to throw it together
How I monkeypatched Babylon.js's gamepad support
A detailed and technical explanation of the AI + NLP theory behind the demo, the things I've learnt about word embeddings while doing it, and what future research could look like to improve word embeddings based on what I've learnt
Word embeddings, the options available, how they differ, and which one to choose.

Until next time, I'll leave you with 2 pictures I took on the day. See you in the next post!

A photo of my demo up and running on a PC with a PS4 controller on a wooden desk. An Entroware laptop sits partially obscured by a desktop PC monitor, the latter of which has the demo full screen.

(Above: A photo of my demo in action!)

A photo of some piles of postcards arranged on a light wooden desk. My research is not shown, but visuals from other researchers' projects are printed, such as microbiology to disease research to jellyfish galaxies.

(Above: A photo of the postcards on the desk next to my demo. My research is not shown, but visuals from other researchers' projects are printed, with everything from microbiology to disease research to jellyfish galaxies.)

I've submitted a paper on my rainfall radar research to NLDL 2024!

A screenshot of the nldl.org conference website.

(Above: A screenshot of the NLDL website)

Hey there! I'm excited that last week I submitted a paper to what I hope will become my very first conference! I've attended the AAAI-22 doctoral consortium online, but I haven't had the opportunity to attend a conference until now. Of course, I had to post about it here.

First things first, which conference have I chosen? With the help of my supervisor, we chose the Northern Lights Deep Learning Conference. It's relatively close to the UK (where I live), it's relevant to my area and the paper I wanted to submit (I've been working on the paper since ~July/August 2023), and the deadline wasn't too tight. There were a few other conferences I was considering, but they either had really awkward deadlines (sorry, HADR! I've missed you twice now), or got moved to an unsafe country (IJCAI → China).

The timeline is roughly as follows:

~early - ~mid November 2023: acceptance / rejection notification
somewhere in the middle: paper revision time
9th - 11th January 2024: conference time!

Should I get accepted, I'll be attending in person! I hope to meet some cool new people in the field of AI/machine learning and have lots of fascinating discussions about the field.

As longtime readers of my blog here might have guessed, the paper I've submitted is on my research using rainfall radar data and ~~abusing~~ image segmentation to predict floods. The exact title is as follows:

Towards AI for approximating hydrodynamic simulations as a 2D segmentation task

As the paper is unreviewed, I don't feel comfortable with releasing it publicly yet. However, feel free to contact me if you'd like to read it and I'm happy to hand out a copy of the unreviewed paper individually.

Most of the content has been covered quite casually in my PhD update blog post series (16 posts in the series so far! easily my longest series by now), just explained in formal language.

This paper will also form the foundation of the second of two big meaty chapters of my thesis, the first being based on my social media journal article. I'm currently at 80 pages of thesis (including appendices, excluding bibliography, single spaced a4), and I still have a little way to go before it's done.

I'll be back soon with another PhD update blog post with more details about the thesis writing process and everything else I've been up to over the last 2 months. I may also write a post on the Hull Science Festival which I'll be attending on the supercomputing stand with a Cool Demo™, 'cause the demo is indeed very cool.

See you then!
