Starbeamrainbowlabs

PhD, Update 8: Eggs in Baskets

I'm back again with another PhD update blog post! Before we begin, here's a list of all the parts in the series so far:

As in the previous post, progress since last time is split in 2: the Temporal CNN, and the social media side of things. I've started to split my time more evenly between the 2 sides, as it seems like the Temporal CNN is going to take a lot more work than anticipated and I'd rather not put all my eggs in 1 basket.

Temporal CNN

As you might have guessed, the Temporal CNN still isn't learning anything, but at least now I think I know what the problem is. Since last time, I've done a bunch of debugging and testing to track it down. During that process, I've managed to reach a record of ~20% accuracy, which at least gives me hope that it's going to work!

Specifically, I used the MNIST handwritten digit dataset with my "easy" task as explained in the previous post, but with a small difference: instead of a pair of tensors filled with 0s or 1s respectively, I pre-generated 2 random tensors to serve as the "below 5" and "5 and above" targets to predict. The model didn't like this at all, which is how I now know what the problem is.
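To make the target swap concrete, here's a minimal sketch in plain Python. The 4x4 target shape, the seed, and the names `make_random_targets` and `target_for_digit` are all assumptions for illustration; the real targets would match whatever shape the Temporal CNN outputs.

```python
import random


def make_random_targets(shape, seed=0):
    """Pre-generate 2 fixed random tensors (as nested lists), 1 per class.

    The shape is an assumption for this sketch; a real model would use
    the shape of its own output tensor.
    """
    rng = random.Random(seed)
    rows, cols = shape

    def random_tensor():
        return [[rng.random() for _ in range(cols)] for _ in range(rows)]

    return {"below_5": random_tensor(), "5_and_above": random_tensor()}


def target_for_digit(digit, targets):
    """Pick the target tensor for an MNIST label (0 to 9)."""
    if not 0 <= digit <= 9:
        raise ValueError(f"Expected a digit from 0 to 9, got {digit}")
    return targets["below_5"] if digit < 5 else targets["5_and_above"]


# The targets are generated once up front and then reused for every sample
targets = make_random_targets((4, 4))
```

The important part is that the 2 tensors are generated once and then held fixed, so every "below 5" digit maps to exactly the same random target.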

For those interested, here's the laundry list of other things I've tried since last time:

Knowing what the problem is is 1 thing, but solving it is another matter entirely. Thankfully, my supervisor and I have a plan: take a modified version of the latter half of a variational autoencoder and squidge it onto the tail end of the Temporal CNN. If it works, then I'm imagining that we'll need a new name for the Temporal CNN (suggestions?), but I'll tackle that once I've finished revising the model.

For context, a variational autoencoder is a modified "vanilla" autoencoder, and is 1 of 2 different main classes of generative AI model architecture - the other being Generative Adversarial Networks (GAN). In contrast to a GAN, a variational autoencoder does image-to-image translation with a single model, and maps an input parameter space onto an output parameter space. It first encodes the input to the model into a smaller tensor of features, before upscaling that back into an image again. In this fashion, it can learn to translate between 2 different images - for example putting glasses on people's faces.
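The encode-then-upscale flow described above can be sketched as a single forward pass in plain Python. This is only an illustration under stated assumptions: the weights are random and untrained, the layers are single dense layers, the latent size of 8 is arbitrary, and every name here is hypothetical. A real variational autoencoder would typically use convolutional layers and be trained with a reconstruction loss plus a KL divergence term.

```python
import math
import random

rng = random.Random(42)

INPUT_SIZE = 28 * 28  # a flattened MNIST-sized image
LATENT_SIZE = 8       # assumption: size of the smaller feature tensor


def random_matrix(rows, cols):
    """Small random weights standing in for trained parameters."""
    return [[rng.gauss(0.0, 0.05) for _ in range(cols)] for _ in range(rows)]


def dense(vector, weights):
    """Multiply a vector by a (len(vector) x outputs) weight matrix."""
    return [
        sum(value * row[j] for value, row in zip(vector, weights))
        for j in range(len(weights[0]))
    ]


w_mean = random_matrix(INPUT_SIZE, LATENT_SIZE)
w_log_var = random_matrix(INPUT_SIZE, LATENT_SIZE)
w_decode = random_matrix(LATENT_SIZE, INPUT_SIZE)


def encode(image):
    """Encoder half: image -> mean and log-variance of the latent features."""
    return dense(image, w_mean), dense(image, w_log_var)


def sample_latent(mean, log_var):
    """Draw a latent vector: z = mean + standard deviation * noise."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]


def decode(latent):
    """Decoder half: latent vector -> pixel intensities between 0 and 1."""
    return [1.0 / (1.0 + math.exp(-x)) for x in dense(latent, w_decode)]


image = [rng.random() for _ in range(INPUT_SIZE)]  # stand-in for a real digit
mean, log_var = encode(image)
latent = sample_latent(mean, log_var)
reconstruction = decode(latent)
```

In this sketch, `decode` plays the role of the "latter half" of the autoencoder: it takes a small feature tensor and upscales it back to image size.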

To do this, I'm going to implement a vanilla variational autoencoder using the MNIST dataset. Once I've done this, I'll lift part of the model structure and transplant it onto the tail end of my existing Temporal CNN. By doing it this way, I'll ensure that I have a known-good model to work with that is definitely capable of image-to-image translation.

Social Media

In other news, I've started to make some real progress on the social media side of things. I've downloaded and anonymised some tweets (the code for which is open source on npm under the package name twitter-academic-downloader - I intend to write a separate blog post about it at some point soon-ish), and I've also put together an LSTM-based model to start looking at doing some text classification.

I decided to implement said model in Python instead of Javascript, because from what I can tell Tensorflow.js doesn't come with as many batteries included as Tensorflow for Python does for natural language processing-based tasks. This has caused some interesting adventures (and a number of frustrating crashes), but I think I'm starting to get the hang of it.

In particular, it's interesting coming from Tensorflow.js (which is the later project), because Tensorflow for Python seems much more disjointed as a library by comparison. Tensorflow.js has learnt and applied lessons from the Python implementation, resulting in a much more cohesive and well thought out API. A prime example of this is tf.data.Dataset vs tf.keras.utils.Sequence in the Python version, which isn't an issue in Tensorflow.js, as in the Javascript bindings we have a single tf.data.Dataset.

This aside, my next step here is to train a model that's significantly larger than the mini model with a single layer and 100 units I've been using for testing purposes (that's my task for this afternoon - which I've likely done by the time you're reading this post).

In terms of literature, I've read a bunch more papers on the subject since last time - but I still feel like I've got more to read. Recently I read a series of papers about word embeddings (converting words into numerical tensors), which was very interesting. The process has evolved over the years, starting from a simple dictionary mapping incrementing numbers to words, to training an AI to generate said representations in increasingly sophisticated ways (starting with word2vec, then moving on to ELMo and GloVe in no particular order, and finally BERT - transformers are pretty incredible models). It was a fascinating read - I can recommend it to anyone who's interested in natural language processing (along with this excellent post).
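As a quick illustration of the simplest approach mentioned above (a dictionary of incrementing numbers), here's a minimal sketch. The function names, the whitespace tokenisation, and the `-1` placeholder for unknown words are all assumptions made for this example.

```python
def build_vocabulary(sentences):
    """Assign each new word the next unused integer, in order of appearance."""
    word_to_id = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in word_to_id:
                word_to_id[word] = len(word_to_id)
    return word_to_id


def encode(sentence, word_to_id, unknown_id=-1):
    """Convert a sentence into a list of word ids."""
    return [word_to_id.get(word, unknown_id) for word in sentence.lower().split()]


vocabulary = build_vocabulary(["the river is rising", "the flood is here"])
encoded = encode("the flood is rising", vocabulary)
```

The catch with this scheme is that the numbers carry no meaning: "river" and "flood" are no closer to each other than any other pair of words, which is the gap that learned embeddings like word2vec and GloVe aim to fill.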

In the model I've implemented, I've ultimately decided to go with GloVe (Global Vectors for Word Representation), as the pre-trained model is simply a text file containing a lookup table one can read into a dictionary or hash table.
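Reading such a lookup table might look something like the sketch below, assuming each line holds a word followed by its vector components, all separated by single spaces. The function name `load_glove` and the tiny 3-dimensional sample are made up for illustration; real pre-trained vectors have many more dimensions.

```python
import io


def load_glove(lines):
    """Parse GloVe-style text lines into a dictionary of word -> vector.

    Assumes each line is a word followed by its vector components,
    separated by single spaces.
    """
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        embeddings[parts[0]] = [float(value) for value in parts[1:]]
    return embeddings


# A tiny in-memory stand-in for a real pre-trained file
sample = io.StringIO("the 0.1 -0.2 0.3\nflood 0.4 0.5 -0.6\n")
embeddings = load_glove(sample)
```

Because the function accepts any iterable of lines, an open file handle can be passed in directly, so a large file is read 1 line at a time rather than all at once.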

Conclusion

Things have been moving forwards - albeit slowly. I've got an idea as to how I can resolve the issues I've been facing with the Temporal CNN (pending a new name once I'm done with all the modifications and I know what the model architecture is going to be like), though it's going to take a lot of work.

Things are finally starting to move in social media land - hopefully the accuracy of the LSTM-based model will be higher than that of the mini model I trained, which was only 50% on a balanced dataset - no better than blind guessing!

See you again in 2 months or so, when hopefully I'll have some real results to show (though of course I'll be keeping up with weekly posts about other things in the meantime). If you have any comments or questions about any of this - please leave a comment below! I'd love to hear your thoughts.

Sources and further reading

Art by Mythdael