Starbeamrainbowlabs

Stardust
Blog


Archive


Mailing List Articles Atom Feed Comments Atom Feed Twitter Reddit Facebook

Tag Cloud

3d 3d printing account algorithms android announcement architecture archives arduino artificial intelligence artix assembly async audio automation backups bash batch blender blog bookmarklet booting bug hunting c sharp c++ challenge chrome os cluster code codepen coding conundrums coding conundrums evolved command line compilers compiling compression conference conferences containerisation css dailyprogrammer data analysis debugging defining ai demystification distributed computing dns docker documentation downtime electronics email embedded systems encryption es6 features ethics event experiment external first impressions freeside future game github github gist gitlab graphics guide hardware hardware meetup holiday holidays html html5 html5 canvas infrastructure interfaces internet interoperability io.js jabber jam javascript js bin labs latex learning library linux lora low level lua maintenance manjaro minetest network networking nibriboard node.js open source operating systems optimisation outreach own your code pepperminty wiki performance phd photos php pixelbot portable privacy problem solving programming problems project projects prolog protocol protocols pseudo 3d python reddit redis reference release releases rendering research resource review rust searching secrets security series list server software sorting source code control statistics storage svg systemquery talks technical terminal textures thoughts three thing game three.js tool tutorial twitter ubuntu university update updates upgrade version control virtual reality virtualisation visual web website windows windows 10 worldeditadditions xmpp xslt

Defending against DDoS attacks hammering my git server

It's no sekret that I have a git server. I host all sorts of stuff on there - from stuff I've talked about on this blog to many other things I have, and still others that are private repositories that I can't share / yet for one reason or another.

While I can't remember exactly when I first set it up, I do remember that gitea wasn't even a thing back then, and I originally setup go git service.

Nowadays, I run the fork of gitea called forgejo, which is a fork of go git service.

Either way, it's been around for a while!

Unfortunately, now that smaller git servers are becoming more common (we still need a social/federated git standard like e.g. ActivityPub), so are attacks against such servers², and one I dealt with yesterday was particularly nasty, so I decided to make a blog post about it.

I'll have CPU for breakfast, lunch, and tea thank you

Before I explain how I dealt with it (mitigated is the technical term I understand), it's important to know the anatomy of the attack. After all, security is important but we can only be secure if we know what we're defending against.

The threat model, if you will.

In this case, the attacker sent random requests to random files on random commits in a large git repository I have on my aforementioned git server.

Yesterday, I measured almost 1 million unique IP addresses making exactly 2 requests at a time each.

If I had the energy, I'd plot em all on a hilbert curve² with a colour gradient for age, maybe even with an animation.

The result of all of this is 100% CPU usage on my 3rd generation dedicated server I rent and a slow terminal experience, because to serve each request Forgejo has to call a git subprocess to inspect the repository and extract the version of the file requested.

That's a very expensive way to handle a HTTP/S request!

At first, I thought I was infected, but further inspection of the logs revealed it not to be so.

With all this in mind, the goal of my expedition was to avoid the spammy HTTP/S calls from hitting the application server (forgejo).

This is all interesting, because it means that a number of common steps to achieve this won't work:

  1. We can't just block the IP address, because there are too many and most of them will be compromised IoT (Internet of Terrible security) devices etc in peoples' homes that are roped into being a botnet.
  2. We can't keep the git server turned off, because I need to use it
  3. I can't block access to the problematic paths on the server, because then the attacker will switch to another set and access to the git server is still impaired
  4. I can't just allow specific IP addresses through, as I have blog post stuff hosted on there and you, one of my readers, would be cut off from accessing it (and I access from my phone sometimes which doesn't have a fixed IP)

...so that just leaves us stuck right?

Teh solutionses!

No so. There's still a strategy that we haven't tried: a Web Application Firewall. Traditionally, such tools are big and very very expensive, but I discovered the other week a tool that did the job and inside an envelope (a couple of megabytes) and price point (free!) I could afford.

That tool is Anubis, and despite the.... interesting name it acts like something of a firewall that sits in front of on the application server, but behind your reverse proxy:

 Public Internet ║ Inside server                                               
                 ║                                                             
                 ╟───────────────┐         ┌───────────────┐  ┌───────────────┐
                 ║     Caddy     │         │    Anubis     │  │    Forgejo    │
Inbound  ────────▶       •       ├─────────▶       •       ├──▶       •       │
requests   80/tcp║ Reverse proxy │         │   Firewall    │  │  App server   │
          443/tcp╟───────────────┘         └───────────────┘  └───────────────┘
                 ║                                localhost           localhost
                 ║                                 2999/tcp            3000/tcp

Essentially, when each request comes in it weighs the risk of a request. 'high-risk' requests, such as those coming from browsers which attackers love to impersonate, get served a small challenge that they must solve to gain access to the website. Low-risk clients, such as git or curl or elinks can go straight through.

This is in the form of a hashing problem: the browser must tell the server what nonce (number only used once) that, alongside a given unique challenge string, produces a hash with a certain number of zeroes (0) when hashed.

Correctly completing the challenge (which doesn't take very long), sets a cookie for that client to gain access to the website without completing another challenge for a certain period of time.

I could go on, but the official documentation explains it pretty well.

Essentially, by serving challenges to high-risk clients instead of allowing requests straight through attempts to access expensive HTTP/S calls (such as loading a random file from a random commit in a random git repo) a server's resources can be protected to give a better experience to the users who use it on a day-to-day basis.

This isn't without its flaws - namely inadvertently blocking good bots - but it does strike enough of a balance that I can keep my git server online without giving up the entirety of my server's resources in the process, which I need to use for other things.

But how?!

I'll assume you already have some sort of reverse proxy in front of some sort of application server. In my case, that's caddy and forgejo.

Anubis' latest release can be downloaded from here, but for Debian/Ubuntu users who want an apt repository I'm rehosting the .deb files from Anubis' releases page in my personal apt repository:

https://apt.starbeamrainbowlabs.com/

Assuming you have an e.g. Ubuntu server, you'll want to install anubis and then navigate to /etc/anubis, in which you should create a configuration file with the name of the user account you'll be starting anubis under.

Each instance of anubis can only handle 1 domain/app at a time, so you'll want 1 system user account per application you want to protect.

For example, I have a config file at /etc/anubis/anubis-git.env with the following content:

TARGET=http://[::1]:3000
BIND=:2999
METRICS_BIND=:2998

....my internal git server is listening on port 3000 on the IPv6 localhost address ::1 for HTTP requests, so that's the target that anubis should forward requests to, as in the ASCII diagram above (made in monosketch).

Then, start the new anubis instance like so:

sudo systemctl enable --now anubis@anubis-git.service

....in my case, the username I created (sudo useradd --system anubis-git etc etc) was anubis-git, so that's what goes in the filename above and after the @ sign when we start the service.

If you haven't seen this syntax before in systemd service names, it allows you to set the username that a supporting service file will start a service with. syncthing does the same thing with the default systemd service definition it provides.

In other words, it lets you start multiple instances of the same service without them clashing with each other.

At any rate, the final piece of the puzzle is telling your reverse proxy to talk to anubis:

git.starbeamrainbowlabs.com {
    log

    reverse_proxy http://[::1]:2999 {
        # ref anubis config setup both of these are required
        header_up X-Http-Version {http.request.proto}
        # ref anubis config, this is esp. required
        header_up X-Real-Ip {remote_host}
    }
}

Replace http://[::1]:2999 with the address of Anubis instead of your application server directly, then check the config and reload:

sudo caddy validate -c /etc/caddy/Caddyfile && sudo systemctl reload caddy

(replacing /etc/caddy/Caddyfile with the path to your Caddyfile of course)

Conclusion

....and you're done!

We've successfully put an application server behind anubis to protect it from malicious requests.

Over time, I assume I will need to tweak the anubis settings, which is possible through what seems to be a rather detailed policy file system (which allows RSS/Atom files through by default, if you're crazy enough to be subbed to any feeds from my git server).

If something seems broken to you now that I've set this up, please do get in touch and I'll try my best to help you out.

I'll be continuing to keep an eye on my web server traffic to see if anything gets through that shouldn't, and adjusting my response as necessary.

Thanks for sticking with me, and when I have the energy I have lots of other cool things to talk about here soon.

--Starbeamrainbowlabs

Aside: IP blocking with Caddy

While implementing the above approach, I found I did need to bring my git server up for my Continuous Integration system (I implemented it well before forgejo got workers and I haven't checked out the latter yet) to work.

To do this, I temporarily implemented an IP address-based allowlist.

If you're curious, here's the code for that:

# temp solution to block anyone who isn't in the allowlist outright
# note that given the sheer range of IPs from what's probably a compromised device-based botnet, we can't just IP block this long-term.
@denied not client_ip 1.2.3.4 5.6.7.8/24 127.0.0.1/8 ::1/128
abort @denied

....throw this in one of the server blocks in your Caddyfile before a reverse_proxy directive - changing the allowed IP addresses of course (leave the IPv4 & IPv6 ones!) - validate & reload, and you should have an instant IP address allowlist system in place!

PhD Update 20: Like a bad smell.....

Hi again! Another wild blog post appeared. PhD corrections, I have come to realise, have a habit of hanging around long past the time you want to have had them finished and done.

Before we get into all of that though, here's the customary list of posts:

See also Doing a 3-way dataset split in Tensorflow // PhD Aside 3, which I posted since the last one of these PhD update blog posts.

Things have not been easy over the last 9 months, but I am making it through. I can't promise when blog posts will come, but know that I have a lot of ideas and it's just a case of having the energy to write them. See also my fediverse account @sbrl@fediscience.org (rss feed) for smaller updates in between times.

Corrections

One of the things I did not realise when I started my PhD was just how much work you are expected to do outside of your main scholarshiped research period. There's writing the thesis, doing the viva, and, of course, the corrections afterwards.

While I'm not sure how much I can share about the corrections I've been given I can say that the process of completing them has been both long and annoying.

More importantly, it is now coming to a close!

Yep, that's right: I'm almost done with my corrections! I just need a laundry list of people to check them and say they are okay, and then I can finally get this PhD thing over with and move on to cooler researchy things that I'll talk about later in this blog post.

As with my viva, the corrections I received were mainly organisational and big picture stuff in nature, though I also had my fair share of experiment redos to complete (ewwww), which have been very time consuming.

Once everything is done, I do plan on making my thesis available for free here on my website if my institution will let me. I doubt it will get indexed in any scholarly search engines any time soon, but hopefully it will be useful to someone here.

Speaking of, I'd like to share a Cool Graph™ I created whilst doing my corrections:

A grid of graphs showing the stability of the metrics for my rainfall radar models over 7 runs. There are areas shaded on the graphs to show the standard deviation and min/max values for each epoch

(Above: A grid of graphs showing the stability of the metrics for my rainfall radar models over 7 runs)

This is, as the caption suggests, a random cross-validation (because anything else would be far too complicated to implement) run of my rainfall radar model I have mentioned before. This was one of the things I was asked to do in my corrections.

The code behind this is kinda cool, as it aggregates a metrics.tsv file from every experiment directory in in a given directory.

As I write this I realise that's kinda confusing, so let me show you what the directory of experiments actually looks like:

+ 202412_crossval-stbl7
    + 2024-12-12_deeplabv3+_rainfall_csgpu_ri_celdice_lr0.00001_us2_t0.1_bs32_crossval-stbl7-A
    + 2024-12-12_deeplabv3+_rainfall_csgpu_ri_celdice_lr0.00001_us2_t0.1_bs32_crossval-stbl7-B
    + 2024-12-12_deeplabv3+_rainfall_csgpu_ri_celdice_lr0.00001_us2_t0.1_bs32_crossval-stbl7-C
    + 2024-12-12_deeplabv3+_rainfall_csgpu_ri_celdice_lr0.00001_us2_t0.1_bs32_crossval-stbl7-D
    + 2024-12-12_deeplabv3+_rainfall_csgpu_ri_celdice_lr0.00001_us2_t0.1_bs32_crossval-stbl7-E
    + 2024-12-12_deeplabv3+_rainfall_csgpu_ri_celdice_lr0.00001_us2_t0.1_bs32_crossval-stbl7-F
    + 2024-12-12_deeplabv3+_rainfall_csgpu_ri_celdice_lr0.00001_us2_t0.1_bs32_crossval-stbl7-G

The experiment series directory (202412_crossval-stbl7) contains 7 different runs, matching the stbl7 part of the experiment series name crossval-stbl7.

The directory names there look pretty complicated, but it's actually just made of up all the parameters that I'm currently interested in for that experiment series. Each part is separated by an underscore _.

So, to break down 2024-12-12_deeplabv3+_rainfall_csgpu_ri_celdice_lr0.00001_us2_t0.1_bs32_crossval-stbl7-C:

  • 2024-12-12: The date the (individual) experiment was run. These all ran in parallel on a HPC at my university, hence all the dates are the same even though it takes more than 24 hours to train the rainfall radar model.
  • deeplabv3+: The architectural backbone of the model in question - this time DeepLabV3+.
  • rainfall: Identifier code for the project, rainfall is the traditional informal and short name I gave to the rainfall radar model.
  • csgpu: The place it was trained on. In this case csgpu is a small HPC cluster in my department.
  • ri: This marks the start of the experiment hyperparams I'm interested in, in no particular order (but the order usually remains constant for a given project). ri stands for Remove Isolated.
  • celdice: The loss function. Cross-Entropy Loss + Dice loss.
  • lr0.00001: The Learning Rate, this time 0.00001
  • us2: UpScale 2 - a property of the model in which it upscales the input x2 and downscales it just before the output. This improves the fidelity of the output at the cost of a higher memory usage.
  • t0.1: Threshold of 0.1 - the delimiter between water and no water. At some point, I want to split into multiple bins.
  • bs32: Batch Size of 32.
  • crossval-stbl7-C: The experiment series code - see above.
    • crossval-stbl: The main part of the experiment series code
    • 7: The number of cross-validation runs
    • C: The differentiator. In this case it's part C of the experiment series since they are all the same, but usually each model has something unique about it, e.g. regresstest-regress vs regresstest-class.

Hmmm, looking at this it might be a bit more complicated a system than I first expected, but it makes sense to me. I wonder if I've blogged about how I organise experiments already? If not, that should go on the todo list.

Anyway, this is the foundation of my entire organisational system for running experiments. I've developed quite an intricate system since I started running experiments in 2020, but fundamentally it is based on the principle of preserving as much information about any given experiment that I've run as possible, as I am sure to need it later.

Even if I don't think I'll need it!

In fact, especially if I don't think I'll need it, because I've been bitten enough times to know that it's not a case of if, it's most certainly a case of when.

Corrections? What corrections?

Not to get too distracted, while I don't think the University would like it very much if I shared my exact list of corrections, it boiled down to the following basic principles, in no particular order:

  • There wasn't a clear narrative carrying the problem forwards through the thesis
  • They wanted more experiments running to confirm the stability of the models trained - hence this post on a 3-way split since they wanted a 3-way split, and also hence the stability testing done to produce the graph above amongst others
  • They wanted a regression model training for the rainfall radar model and a comparative analysis against my existing classification-based approach

It doesn't sound like much, but it has been quite a lot of work to get to this point, especially since I have been doing more teaching than I expected starting in September last year. I'm glad now that I applied for a 6 month extension and for the help of the people around me (and 2 people in particular - not sure if I can mention your names, but you know who you are), otherwise I would have run out of time to complete my corrections long ago.

Future research

Now that my corrections are (hopefully) coming to and end and I'm starting to get a handle on the teaching I've been asked to do (wow, and that isn't even the half of it), I'm finally starting to get myself into a place in which I can FINALLY start to look forwards to some more research that is actually useful, as opposed to making seemingly endless corrections to my thesis (the social media chapter in particular I can do SO MUCH BETTER).

First of all are real improvements to my rainfall radar model. These improvements largely fall into a few categories:

  • Analysing and improving the model's ability to actually predict floods, and applying sample weighting (psst, secret second graph for those of you who are still reading!) to hopefully measuably improve my model's ability to make actually useful predictions
  • Swapping out the physics-based model the model is trained on because it's bad and I didn't prepare the data very well and it's all bad
  • Exanding the model's ability to predict multiple bins instead of just a binarised water/no-water situation

These are not necessarily in order, but I imagine I'll likely tackle them in something like this order.

On the social media side, I know that I can do so much better than my social media paper which somehow has 41 citations (just HOW??!). Binary sentiment analysis is cute and all, but at the intersection of AI, disaster situational awareness, and user interface (UI) design and user experience (UX) I believe that I can do much better in the organisation of unstructured data.

With the use of contemporary AI algorithms and UI/UX, the extraction and presentation of richer information should be possible.

Even though these research plans won't be part of my PhD, I will still continue blogging about it! Who knows, I might even start a new long-running blog post series to mark the beginning of a new era in my life.

And, of course, I'll continue to share Cool Graphs™!

BlueSky

BlueSky: As a last thing, I'm going to blog about it at some point but I'm now using bridgy fed to allow you to follow me on Bluesky! I'm @sbrl.fediscience.org.ap.brid.gy, and to interact with me you'll need to follow @ap.brid.gy.

BlueSky seems to be becoming very popular, especially my circles - but while it pretends to be decentralised, it isn't. For this reason and others, my primary social media will remain on the Fediverse to ensure and preserve the long-term viability of my account.

I encourage you to join the fediverse too - it's a nice and friendly place :D

Final thoughts

It has been a long road, but I am finally nearing the end of one book and the beginning of another. This is not the last post in this series - I have at least 1 more planned. When I have the energy, I want to talk about my experiences learning to teach (I'm doing a course called PCAP right now, as it was a stipulation of my contract) in what may be a longer blog post than I expect.

I'm looking forward to continuing my research journey and blogging along the way right here at my stardust blog (I think this is the first time I've mentioned my blog's name!).

I'll see you next time, in what might be one of the last blog posts in this series: PhD Update 21: Where the water meets the sky.

--Starbeamrainbowlabs

Doing a 3-way dataset split in Tensorflow // PhD Aside 3

Heya!

It's been a while since I talked about my PhD, so I wanna change that today!

Teaching post coming just as soon as I find the energy to write it.

Anyway, one of the things I have been asked to do as part of my thesis corrections is to split my dataset into 3 parts and then run some random experiments.

This seemed like an odd request to me when there are more important things I need to be doing to stablise the models I'm training, but in the process of implementing support for this in a number of my models I hit upon a snag:

Tensorflow doesn't have native support for 3-way dataset splits!

If Tensorflow (tf) doesn't have support for it, then clearly it can't be that important :P

In all seriousness though, it did mean that I needed a solution and after searching around and getting nowhere very quickly (you really don't need to go to all this trouble to solve this one), I decided that something must be done!

So, I ended up implementing a thing that I thought was rather cool, so I thought I'd share it here.

Other parts in this PhD aside series:

More coming soon as I get distracted and find cool shiny things while doing my PhD! If you want more PhD-related posts, do check the PhD tag here on my blog.

Splitting into multiple pieces

As a quick reminder, when training an AI we usually split the dataset we're training it on into 2 parts:

  1. Training (usually ~80%)
  2. Validation (usually ~20%)

....the model only gets to learn from the training data, and we hold the validation data back so we can test the model on it later. If it does a lot worse on the validation dataset than on the training dataset, then we can reasonably conclude that the model isn't generalising very well to new data/samples/etc/ it hasn't seen before.

However, sometimes people decide it's a great idea to split a dataset into 3 parts rather than 2.

The reasoning behind this - as far as I'm aware, is that while you're working on optimising your model by trying different architectures, hyperparameters, etc, you are in a sense optimising for the model's performance on the validation dataset.

So, to this end it is suggested that by having a third split - called the test split, one can evaluate the model again and make really really sure how well it's generalising (or not) to new data.

This is just as much as I know at the moment and it doesn't really make sense in my head, so if you have a better way of explaining it please do leave a comment below.

Yeah, Tensorflow sucks

.....maybe not really (though CUDA / GPU support does suck very much >_<). The thing to remember about AI model frameworks like Tensorflow is that data loading efficiency is everything. A tangent for another time perhaps (is your model training slowly? then this is most likely your issue!), but what matters here is that the signature of tf.keras.Model.fit() (the function that actually trains the shiny new model you've just created / loaded from disk / etc) looks a bit like this:

model.fit(
    x, # probably either a tf.Tensor or a tf.data.Dataset 
    y, # probably the same as x immediately above
    validation_split=0.0, # Percentage of x to treat as the validation dataset
    validation_data=None, # The validation dataset - see above
    callbacks=None, # A list of functions to call at different times - this will become important later
    # .....
)

....this is just terrible design, if you ask me. x and y here are the input(s) and ground truth labels respectively.

Single letter variable names should not be allowed!

Unfortunately, the only options we have for inputting another dataset to calculate metrics on (e.g. cross-validation, dice coefficient, intersection-over-union - IoU, etc etc etc) is to pass a validation dataset - there's no option to pass e.g. a list() of datasets to evaluate on, which is a shame as it would almost make sense.

Cracking the glass

So, we're at a loss right? What to do? We have 3 nice neat tf.data.Datasets ready do go and no way to ensure we get metrics calculated reliably on the 3rd one.

The solution here that I came up with is to write a custom callback that manually iterates over the dataset and calculates the metrics, before then sneakily appending them to TF's main metrics log system so that TF never suspects a thing, and writes them out along with all the other metrics for us :D

Learning to work within the framework of your choice is a key part of the process. Don't just try and hack your way around it - most frameworks - TF included - provide many different ways to manipulate tensors etc if you learn how they work.

In this case, we know that Tensorflow has a callback system, because TF ships with a number of default callbacks like tf.keras.callbacks.CSVLogger (which can also output as TSV, my favourite file format when I'm not using jsonl).

Writing a custom callback is a 2 step process:

  1. Write a custom class that inherits from tf.keras.callbacks.Callback
  2. Instantiate an instance of our new class and pass it to tf.keras.Model.fit()

Let's go through these 1 by 1.

Writing a callback

As mentioned, inheriting from tf.keras.callbacks.Callback is the aim of the game here. We can make it really quite lightweight and do it within 10 lines of code:

import tensorflow as tf

class CallbackExtraValidation(tf.keras.callbacks.Callback):
    def __init__(self):
        super(CallbackExtraValidation, self).__init__()
        pass

    def on_epoch_end(self, epoch, logs=None):
        pass

The way callbacks work in TF is that they are a class with a bunch of methods. The main tf.keras.callbacks.Callback class defines a bunch of empty methods, and then you override the ones you're interested in.

Some examples of things you can add a callback for:

  • The start/end of training as a whole
  • The start/end of every batch (sometimes called a step, but a step cans ometimes mean multiple batches at once)
  • The start/end of every epoch (which most of the time but not always is before/after the entire dataset has been seen by the model once)
  • The start/end of validation

....and so on.

You get the picture: it's a way that you can add simple hooks that let you run custom functions that do stuff at a time of your choosing.

In our case, we know that we wanna evaluate our model on an extra dataset at the end of every epoch, so we added a method for that.

Check the docs for a comprehensive look at the possible functions you can override here:

https://devdocs.io/tensorflow~2.9/keras/callbacks/callback

Now we have a custom function running when we want it to, it's just a case of grabbing the dataset(s) and evaluating the model with em:

import tensorflow as tf

class CallbackExtraValidation(tf.keras.callbacks.Callback):
    def __init__(self, datasets, verbose="auto"):
        super(CallbackExtraValidation, self).__init__()
        self.datasets = datasets # Dictionary in the form { string: tf.data.Dataset }, where `string` here is the name of the dataset
        self.verbose = verbose

    def on_epoch_end(self, epoch, logs=None):
        for name, dataset in self.datasets.items():
            metrics = self.model.evaluate(
                dataset,
                verbose=self.verbose,
                return_dict=True
            )

Excellent! See, we can even throw in enumerating over a list of datasets with minimal effort (I'm looking at you, Tensorflow!).

Okay, so we have metrics by cheesing the issue and asking TF to evaluate a model. We even have a reference to the model in question provided for us by Tensorflow - how very kind!

....but how do we hoodwink Tensorflow and slip the extra metrics we've calculated into the main metrics stream without it noticing?

The solution has actually been provided by Tensorflow itself: The logs argument that is passed to on_epoch_end is the metrics log for that current epoch, so we can just update it.

We don't wanna overwrite any of the existing metrics though, so the solution is just to prepend a static string to the name of each metric as we copy it over:

for metric_name, metric_value in metrics.items():
    logs[f"{name}_{metric_name}"] = metric_value

See? Easy peasy!

That's all there is to it.

Of course, a few additional checks need to be added because I'm paranoid and Tensorflow sometimes passes None instead of the logs instance for a giggle, but it does work reliably.

Find the full version of this class with those checks here:

https://github.com/sbrl/research-rainfallradar/blob/897bfa3/aimodel/src/lib/ai/components/CallbackExtraValidation.py

Find the full version with updates (though the link miiight break if I rejig the repo's directory structure or rename the class etc):

https://github.com/sbrl/research-rainfallradar/blob/897bfa3/aimodel/src/lib/ai/components/CallbackExtraValidation.py

How to pass to .fit()?

Just for completeness, this is how you pass the above in .fit():

model.fit(
    dataset_input,
    dataset_labels,
    callbacks = [
        CallbackExtraValidation({
            "test": my_extra_dataset
        })
    ]
)

..in short, the callbacks argument of tf.keras.Model takes a list() of tf.keras.callbacks.Callback instances. Simply instantiate an instance of your custom callback class, and you're away, just as if it were an official part of the Tensorflow framework!

Conclusion

This has been a not-so-quick post (I swear I meant this to be shorter.....!) about doing a 3-way dataset split in Tensorflow because Tensorflow doesn't support it natively.

I hope you found this useful - let me know if you have any other questions or comments - I'm happy to answer them - both in the comments below and as subjeccts for future blog posts!

The end of the teaching is on the horizon (though I'm having to do a teaching course which is meh because it's eating into my time again) - so I'm hoping that my energy levels will start to recover once I've found a new rhythm for this new semester that is just starting.

Happy new year 2025! o/

Heya there!

This is just a short post to say hi, and happy new year! I hope everyone had a great christmas / winter break.

As I battle a bit of burnout from doing rather too much teaching related tasks in a short space of time (hoping to do a detailed post on my teaching experiences soon), I have actually achieved a lot this past year despite this:

Looking ahead

Looking ahead, a significant portion of my energy is going to be spent on getting my corrections done for my PhD. It is requiring the re-running of a bunch of experiments with a grab-bag of new features being added to the codebases, which if you have been following the commit history have been arriving slowly.

I'll hopefully be at a good enough point soon-ish to write another PhD update blog post about this.

Last year, I said that I hoped 2024 would be the year I finally finish my PhD. It was..... kinda sorta maybe okay-not-really. This time I really do want 2025 to be the year I finally finish this PhD....... I'm ready to just be done with the stupid thing now.

In 2025 I want to dedicate more time to blogging here. 2024 has not been a great year for this blog, so I want to try and change that this year. I have had lots of ideas for blog posts..... I just haven't had the energy to write them. Hopefully this will mean lots of cool new blog posts about things I've learnt and found!

If you're interested in keeping up to date with what I've been up to, I can recommend following me on the fediverse (@sbrl@fediscience.org). I post smaller stuff there that either isn't bug enough for a blog post, or I don't have the energy to blog about at the time.

Mastodon (the fediverse software used by the instance I'm on) has a built-in RSS feed, if that's your jam: https://fediscience.org/@sbrl.rss

I want to do some incremental improvements to my website here soon - including tidying up and finishing the list of researchy things I've been doing on my homepage. Nothing too ground breaking (though I have bigger plans for a better backend to this blog, but I need lotsa time to impleent it)

Final thoughts

This last year has been rather stressful and emotional in many different ways - including some I have not mentioned here (like people I know very well leaving the University, including my primary supervisor, though I am maintaining regular and normal contact). I hope that 2025 is less stressful than 2024.

If there's something you'd like me to blog about that I've been doing that I haven't blogged about yet, I've probably forgotten about it. Please get in touch by leaving a comment below!

A cute wooden bauble in the dark on a christmas tree from a few years ago. It has a snowman against the night sky with a postbox with a bird sitting on it and some trees and stuff. The multicoloured christmas lights are turned on and shining brightly. I like this bauble very much.

(Source: Taken by me. See alt text for detailed description.)

Thanks for sticking with me for these last 10 years. There is ALWAYS hope. Especially when you can't see it.

--Starbeamrainbowlabs, your friendly but very tired blogger

Compiling the wacom driver from source to fix tilt & rotation support

I was all sat down and setup to do some digital drawing the other day, and then I finally snapped. My graphics tablet (a secondhand Wacom Intuos Pro S from Vinted) - which supports pen tilt - was not functioning correctly. Due to a bug that has yet to be patched, the tilt X/Y coordinates were being wrongly interpreted as unsigned integers (i.e. uint32) instead of signed integers (e.g. int32). This had the effect of causing the rotational calculation to jump around randomly, making it difficult when drawing.

So, given that someone had kindly posted a source patch, I set about compiling the driver from source. For some reason that is currently unclear to me, it is not being merged into the main wacom tablet driver repository. This leaves compiling from source with the patch the only option here that is currently available.

It worked! I was so ecstatic. I had tilt functionality for the first time!

Fast-forward to yesterday....... and it broke again, and I first noticed because I am left-handed and I have a script that flips the mapping of the pad around so I can use it the opposite way around.

I have since fixed it, but the entire process took me long enough to figure out that I realised that I was halfway there to writing a blog post as a comment on the aforementioned GitHub issue, so I decided to just go the rest of the way and write this up into a full blog post / tutorially kinda thing and do the drawing I wanted to do in the first place tomorrow.

In short, there are 2 parts to this:

  • input-wacom, the kernel driver
  • xf86-input-wacom, the X11 driver that talks to the kernel driver

....and they both have to be compiled separately, as I discovered yesterday.

Who is this for?

If you've got a Wacom Intuos tablet that supports pen tilt / rotation, then this blog post is for you.

Mine is a Wacom Intuos Pro S PTH-460.

This tutorial has been written on Ubuntu 24.04, but it should work for other systems too.

If there's the demand I might put together a package and put it in my apt repo, though naturally this will be limited to the versions of Ubuntu I personally use on my laptop - though do tend to upgrade through the 6-monthly updates.

I could also put together an AUR package, but currently on the devices I run Artix (Arch derivative) I don't usually have a tilt-supporting graphics tablet physically nearby when I'm using them and they run Wayland for unavoidable reasons.

Linux MY_DEVICE_NAME 6.8.0-48-generic #48-Ubuntu SMP PREEMPT_DYNAMIC Fri Sep 27 14:04:52 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Kompiling the kernel module

Navigate to a clean directory somewhere persistent, as you may need to get back to it later.

If you have the kernel driver installed, then uninstall it now.

On Ubuntu / apt-based systems, they bundle the kernel module and the X11 driver bit all in a single package..... hence the reason why we hafta do all the legwork of compiling and installing both the kernel module and the X11 driver from source :-/

e.g. on Ubuntu:

sudo apt remove xserver-xorg-input-wacom

Then, clone the git repo and checkout the right branch:

git clone https://github.com/jigpu/input-wacom.git -b fix-445
cd input-wacom;

....then, ref the official instructions install build-time dependencies if required:

sudo apt-get install build-essential autoconf linux-headers-$(uname -r)

...check if you have these installed already by replacing apt-get install with apt-cache policy.

Then, build and install all-in-one:

if test -x ./autogen.sh; then ./autogen.sh; else ./configure; fi && make && sudo make install || echo "Build Failed"

....this will prompt for a password to install directly into your system. I think they recommend to do it this way to simplify the build process for people.

This should complete our khecklist for the kernel module, but to activate it you'll need to reboot.

Don't bother doing that right now though on Ubuntu, since we have the X11 driver to go. For users on systems lucky enough to split the 2 drivers up, then you can just reboot here.

You can check (after rebooting!) if you've got the right input-wacom kernel module with this command:

grep "" /sys/module/wacom*/version

....my research suggests you need to have a wacom tablet plugged in for this to work.

If you get something like this:

$ grep "" /sys/module/wacom*/version
v2.00

....then you're still using your distribution-provided wacom kernel module. Go uninstall it!

The output you're looking for should look a bit like this:

$ grep "" /sys/module/wacom*/version
v2.00-1.2.0.37.g2c27caa

Compiling the X11 driver

Next up is xf86-input-wacom, the X11 side of things.

Instructions for this are partially sourced from https://github.com/linuxwacom/xf86-input-wacom/wiki/Building-The-Driver#building-with-autotools.

First, install dependencies:

sudo apt-get install autoconf pkg-config make xutils-dev libtool xserver-xorg-dev$(dpkg -S $(which Xorg) | grep -Eo -- "-hwe-[^:]*") libx11-dev libxi-dev libxrandr-dev libxinerama-dev libudev-dev

Then, clone the git repository and checkout the latest release:

git clone https://github.com/linuxwacom/xf86-input-wacom.git
cd "xf86-input-wacom";
git tag; # Pick the latest one from this list
git switch "$(git tag | tail -n1)"; # Basically git switch TAG_NAME

It should be at the bottom, or at least that's what I found. For me, that was xf86-input-wacom-1.2.3.

Then, to build and install the software from source, run these 2 commands one at a time:

set -- --prefix="/usr" --libdir="$(readlink -e $(ls -d /usr/lib*/xorg/modules/input/../../../ | head -n1))"
if test -x ./autogen.sh; then ./autogen.sh "$@"; else ./configure "$@"; fi && make && sudo make install || echo "Build Failed"

Now you should have the X11 side of things installed. In my case that includes xsetwacom, the (questionably designed) CLI for managing the properties of connected graphics tablets.

If that is not the case for you, you can extract it from the Ubuntu apt package:

apt download xserver-xorg-input-wacom
dpkg -x DEB_FILEPATH_HERE .
ar xv DEB_FILEPATH_HERE # or, if you don't have dpkg for some reason

....then, go locate the tool and put it somewhere in your PATH. I recommend somewhere towards the end in case you forget and fiddle with your setup some more later, so it gets overridden automatically. When I was fidddling around, that was /usr/local/games for me.

Making X11 like the kernel Driver

Or also known as enabling hotplug support. Or getting the kernel module and X11 to play nicely with each other.

This is required to make udev (the daemon that listens for devices to be plugged into the machine and then performs custom actions on them) tell the X server that you've plugged in your graphics tablet, or X11 to recognise that tablet devices are indeed tablet devices, or something else vaguely similar to that effect.

Thankfully, this just requires the installation of a single configuration file in a directory that may not exist for you yet - especially if you uninstalled your distro's wacom driver package.

Do it like this:

mkdir -p /etc/X11/xorg.conf.d/;
sudo curl -sSv https://raw.githubusercontent.com/linuxwacom/xf86-input-wacom/refs/heads/master/conf/70-wacom.conf -o /etc/X11/xorg.conf.d/70-wacom.conf

Just case they move things around as I've seen happen in far too many tutorials with broken links before, the direct link to the exact commit of this file I used is:

https://github.com/linuxwacom/xf86-input-wacom/blob/47552e13e714ab6b8c2dcbce0d7e0bca6d8a8bf0/conf/70-wacom.conf

Final steps

With all that done and out of the way, reboot. This serves 2 purposes:

  1. Reloading the correct kernel module
  2. Restarting the X11 server so it has the new driver.

Make sure to use the above instructions to check you are indeed running the right version of the input-wacom kernel module.

If all goes well, tilt/rotation support should now work in the painting program of your choice.

For me, that's Krita, the AppImage of which I bundle into my apt repository because I like the latest version:

https://apt.starbeamrainbowlabs.com/

The red text "Look! Negative TX/TY (TiltX / TiltY) numbers!" crudely overlaid using the Shutter screenshotting tool on top of a screenshot of the Krita tablet tester with a red arrow pointing at the TX/TY values highlighted in yellow.

Conclusion

Phew, I have no idea where this blog post has come from. Hopefully it is useful to someone else out there who also owns an tilt-supporting wacom tablet who is encountering a similar kinda issue.

Ref teaching and the previous post, preparing teaching content is starting to slwo down now thankfully. Ahead are the uncharted waters of assessment - it is unclear to me how much energy that will take to deal with.

Hopefully though there will be more PhD time (post on PhD corrections..... eventually) and free energy to spend on writing more blog posts for here! This one was enjoyable to write, if rather unexpected.

Has this helped you? Are you still stuck? Do report any issues to the authors of the above two packages I've shown in this post!

Comments below are also appreciated, both large and small.

Ducks

Heya!

I like ducks. Have some ducks:

Some duck wallpaper

(Source: Unknown. If you're the creator and can prove it, comment below and I'll attribute properly)

....teaching is not easy, and I don't like preparing content week-by-week (i.e. preparing content to teach the next week) very much at all.

I recognise this is the longest break in blog posts there has been since I began this on 29th June 2014 (wow, it has been 10 years here already?!). Both energy and time are extraordinarily low at the moment (I have not had even a moment to work on my PhD corrections in over a month at this point).

However, there is hope that this is not a permanent state of affairs (and if I have anything to say about it, it won't be).

Hopefully in a few weeks things should improve to the point that I have energy to work on my PhD and post here again.

I've got several cool ideas for posts that I want to write:

  • Most obviously, I want to write a blog post about my experiences teaching
  • I've found a really neat solution to a 3-way split of a dataset in Tensorflow

And others that are a thing in the background.

Just a short check in to let everyone know that while I am very exhausted I am also very determined to keep this blog going as a permanent thing.

It's not much of a 10 year celebration, but if you've been reading here for a while I thank you SO MUCH for continuing to stick around, even if you don't comment.

There is always hope.

--Starbeamrainbowlabs

Teaching this September

A banner from a game long lost. Maybe I'll remake it and finish it someday.

Hello!

Believe it or not, I'm going to be teachificatinating a thing at University this semester, which starts at the end of this month and lasts until around December-ish time (yeah, I'm surprised too).

It's called Secure Digital Infrastructure, and I'll be teaching Linux and system administration skills, so that includes the following sorta-areas:

(related posts aren't necessarily the exact content I'm going to cover, but are related)

To this end, it is quite stressful and is taking significantly more energy than I expected to prepare for this.

I definitely want to talk about it here, but that will likely happen after the fact - probably some time in January or February.

Please be patient with me as I navigate this new and unexpected experience :-)

--Starbeamrainbowlabs

A banner from a game long lost. Maybe I'll remake it and finish it someday.

PhD Update 19: The Reckoning

The inevitability of all PhDs. At first it seems distant and ephemeral, but it is also the inescapable and unavoidable destination for all on the epic journey of the PhD.

Sit down and listen as I tell my own tale of the event I speak of.

I am, of course, talking about the PhD Viva. It differs from country to country, but here in the UK the viva is an "exam" that happens a few months after you have submitted your thesis (PhD Update 18: The end and the beginning). Unlike across the pond in the US, in the UK vivas are a much more private affair, with only you, the chair, and your internal and external examiners normally attending.

In my case, that was 2 externals (as I am also staff, ref Achievement get: Experimental Officer Position!), an internal, and of course the chair. I won't name them as I'm unsure of policy there, but they were experts in the field and very kind people.

I write this a few weeks removed from the actual event (see also my post on Fediscience at the time), and I thought that my viva itself deserved a special entry in this series dedicated to it.

My purpose in this post is to talk about my experience as honestly and candidly as I can, and offer some helpful advice from someone who has now been through the process.

The Structure

The viva itself took about 4 hours. It's actually a pretty complicated affair: all your examiners (both internal and external) have to read your thesis and come up with a list of questions (hidden from you of course). Then, on the day but before you enter the room they have to debate who is going to ask what to avoid duplication.

In practice this usually means that the examiners will meet in the morning to discuss, before having lunch and then convening for the actual viva bit where they ask the questions. In my case, I entered the room to meet the examiners and say hi, before leaving again for them to sort out who was going to ask what.

Then, the main part of the viva simply consists of you answering all the questions that they have for you. Once all the questions are answered, then the viva is done.

You are usually allowed a copy of your thesis in one form or another to assist you while answering their questions. The exact form this will take varies from institution to institution, so I recommended always checking this with someone in charge (e.g. the Doctoral College in my case) well in advance - you don't want to be hit with paperwork and confusion minutes before your viva is scheduled to start!

After the questions, you leave the room again for the examiners to deliberate over what the outcome will be, before calling you back into the room to give you the news.

Once they have done this: the whole thing is over and you can go sleep (trust me, you will not want to do anything else).

My experience

As I alluded to in the aforementioned post on fediscience (a node in the fediverse), I found the viva a significantly intense experience - and one I'm not keen on repeating any time soon. I strongly recommend having someone nearby as emotional support for after the viva and during those periods when you have to step out of the room. I am not ashamed to admit that there were tears after the exam had ended.

More of the questions than I expected focused on the 'big picture' kinda stuff, like how my research questions linked in with the rest of the thesis, and how the thesis flowed. I was prepared for technical questions -- and there were some technical questions -- but the 'fluffy stuff' kinda questions caught me a little off guard. For example, there were some questions about my introduction and how while I introduced the subject matter well, the jump into the technical stuff with the research questions was quite jarring, with concepts mentioned that weren't introduced beforehand.

To this end, I can recommend looking over the 'big picture' stuff beforehand so that you are prepared for questions that quiz you on your motivations for doing your research in the first place and question different aspects research questions.

It can also feel quite demoralising, being questioned for hours on what has been your entire life for multiple years. It can feel like all you have done is pointless, and you need to start over. While it is sure that you could improve upon your methods if you started from scratch, remember that you have worked hard to get to this point! You have discovered things that were not known to the world before your research began, and that is a significant accomplishment!

Try not to think too hard about the corrections you will need to make once the viva is done. Institutions differ, but in my case it is the job of the chair to compile the list of corrections and then send them to you (in one form or another). The list of corrections - even if they are explained to you verbally when you go back in to receive the result - may surprise you.

Outcome

As I am sure that most of you reading this are wondering, what was my result?! Before I tell you, I will preface the answer to your burning question with a list of the possible outcomes:

  • Pass with no corrections (extremely rare)
  • Pass with X months corrections (common, where X is a multiple of 3)
  • Fail (also extremely rare)

In my case, I passed with corrections!

It is complicated by the fact that while the panel decided that I had 6 months of corrections to do, I am not able to spend 100% of my time doing them. To this end, it is currently undefined how long I will have to do them - paperwork is still being sorted out.

The reasons for this are many, but chief among them is that I will be doing some teaching in September - more to come on my experience doing that in a separate post (series?) just as soon as I have clarified what I can talk about and what I can't.

I have yet to recieve a list of the corrections themselves (although I have not checked my email recently as I'm on holiday now as I write this), but it is likely that the corrections will include re-running some experiments - a process I have begun already.

Looking ahead

So here we are. I have passed my viva with corrections! This is not the end of this series - I will keep everyone updated in future posts as I work through the corrections.

I also intend to write a post or two about my experience learning to teach - a (side)quest that I am currently persuing in my capacity as Experimental Officer (research is still my focus - don't worry!)

Hopefully this post has provided some helpful insight into the process of the PhD viva - and my experience in mine.

The viva is not a destination: only a waypoint on a longer journey.

If you have any questions, I am happy to anwser them in the comments, and chat on the fediverse and via other related channels.

Ubuntu 24.04 upgrade report

Heya! I just upgraded to from Ubuntu 23.10 to Ubuntu 24.04 today, so I thought I'd publish a quick blog post on my experience. There are a number of issues to watch out for on this one.

tldr: Do not upgrade a machine to which you do not have physical access to 24.04 until the first point-release comes out!

While the do-release-upgrade itself went relatively well, I encountered a number of problematic issues that significantly affected the stability of my system afterwards, which I describe below, along with the fixes and workarounds that I applied.

Illustration of a striped numbat, looking up at fireflies against a pink and purple gradient background with light rays coming from the top corners

(Above: One of the official wallpapers for Ubuntu 24.04 Noble Numbat entitled "Little numbat boy", drawn by azskalt in Krita)

apt sources

Of course, any do-release-upgrade you run is going to disable third-party sources. But this time there's a new mysterious format for apt sources that looks a bit like this:

Enabled: yes
Signed-By: /etc/apt/trusted.gpg.d/sbrl.asc
Types: deb
URIs: https://apt.starbeamrainbowlabs.com/
Suites: ./
Components: 

....pretty strange, right? As it turns out, Ubuntu 24.04 has decided to switch to this new "DEB822" apt sources format by default, though I believe the existing format that looks like this:

deb [signed-by=/etc/apt/trusted.gpg.d/sbrl.asc] https://apt.starbeamrainbowlabs.com/ ./ # apt.starbeamrainbowlabs.com

....should still work. Something else to note: the signed-by there is now required, and sources won't work without it.

For more information, see steeldriver's Ask Ubuntu Answer here:

Where is the documentation for the new apt sources format used in 24.04? - Ask Ubuntu

Boot failure: plymouth and the splash screen

Another issue I encountered was this bug:

boot - Kubuntu 24.04 Black Screen / Not Booting After Upgrade - Ask Ubuntu

...basically, there's a problem with the splash screen which crashes the system because it tries to load an image before the graphics drivers load. The solution here is to disable the splash option in the grub settings.

This can be done either before you reboot into 24.04, or if you have already rebooted into 24.04, in the grub menu you can simply hit e on the default Ubuntu entry in your grub menu and then remove the word splash from the boot line there.

If you are lucky enough to see this post before you reboot, then simply edit /etc/default/grub and change quiet splash under GRUB_CMDLINE_LINUX_DEFAULT to be an empty string:

GRUB_CMDLINE_LINUX_DEFAULT=""

...and then update grub like so:

sudo update-grub

Boot failure: unable to even reach grub

A strange one I encountered was an inability to even reach grub, even if I manually select the grub.efi as a boot target via my UEFI firmware settings (I'm on an entroware laptop so that's F2, but your key will vary).

This one kinda stumped me, so I found this page:

Boot-Repair - Community Help Wiki

...which suggests a boot repair tool. Essentially it reinstalls grub and fixes a number of other common issues, such as a missing nvram entry for grub (UEFI systems need bootloaders registering against them), missing packages - I suspect this was the issue this time - and other common issues.

It did claim that my nvram was locked, but it still seems to have resolved the issue anyway. I do recommend booting into the live Ubuntu session with the toram kernel parameter (press e in the grub menu → add kernel parameter → press ctrl + x) and them removing your flash drive before running this tool, just to avoid it getting confused and messing with the bootloader on your flash drive - thus rendering it unusable - by accident.

Essentially, boot into a live environment, connect to the Internet, and run then these commands:

sudo add-apt-repository ppa:yannubuntu/boot-repair && sudo apt update
sudo apt install -y boot-repair
boot-repair

sudo is not required for some strange reason.

indicator-keyboard-service memory leak

Finally, there is a significant memory leak in indicator-keyboard-service - which I assume provides the media/function key functionality, which I only noticed because I have a system resource monitor running in my system tray (indicator-multiload; multiload-ng is an alternative version that may work if you have issues with the former).

The workaround I implemented was to move the offending binary aside and install a stub script in its place:

cd /usr/libexec/indicator-keyboard
sudo mv indicator-keyboard-service indicator-keyboard-service.bak
sudo nano indicator-keyboard-service

In the text editor for the replacement for indicator-keyboard-service, paste the following content:

#!/usr/bin/env sh
exit 0

...save and exit. Then, chmod +x:

sudo chmod +x indicator-keyboard-service

....this should at least workaround the issue so that you can regain system stability.

I run the Unity desktop, but this will likely affect the GNOME desktop and others too. There's already a bug report on Launchpad here:

Bug #2055388 "suspected memory leak with indicator-keyboard (causing gnome-session-flashback to freeze after startup)" : Bugs : indicator-keyboard package : Ubuntu

...if this issue affects you, do make sure to go and click the green text at this top-ish of the page to say so. The more people that say it affects them, the higher it will be on the priority list to fix.

Conclusion

A number of significant issues currently plague the upgrade process to 24.04:

  • Memory leaks from indicator-keyboard-service
  • Multiple issues preventing systems from booting by default

...I recommend that upgrading to 24.04 is done cautiously at this time. If you do not have physical access to a given system or do not have the time/energy to fix issues that prevent your system from booting successfully, I strongly recommend waiting for the first or second point release (i.e. 24.04.1 / 24.04.2) before upgrading.

If you haven't already, I also strongly recommend configuring timeshift to take automated snapshots of your system so that you can easily roll back in case of a failure.

Finally, I also recommend upgrading via the command line with this command:

sudo do-release-upgrade

...and carefully monitoring the logs as the upgrade process is running. Then, do not reboot as it asks you to until you have checked and resolved all of the above issues.

That's all I have at the moment for upgrading Ubuntu. I have 3 other systems to upgrade from 22.04, but I'll be waiting for the first point release before attempting that. I'll make another post (or a comment on this one) to let everyone know how it went when I do begin the process of upgrading them.

If you've encountered any issues in the upgrade process to 24.04 (or have any further insight into the issues I describe here), please do leave a comment below!

A memory tester for the days of UEFI

For the longest time, memtest86+ was a standard for testing sticks of RAM that one suspects may be faulty. I haven't used it in a while, but when I do use it I find that an OS-independent tool (i.e. one that you boot into instead of your normal operating system) is the most reliable way to identify faults with RAM.

It may surprise you, but I've had this post mostly written up for about 2 years...! I remembered about this post recently, and decided to rework some of it and post it here.

Since UEFI was invented (Unified Extensible Firmware Interface) and replaced the traditional BIOS for booting systems around the world, booting memtest86+ suddenly became more challenging, as it is not currently compatible with UEFI. Now, it has been updated to support UEFI though, so I thought I'd write a blog post about it - mainly because there are very rarely guides on booting images like memtest86+ from a multiboot flash drive, like the one I have blogged about before.

Before we begin, worthy of note is memtest86. While it has a very similar name, it is a variant of memtest86+ that is not open source. I have tried it though, and it works well too - brief instructions can be found for it at the end of this blog post.

I will assume that you have already followed my previous guide on setting up a multiboot flash drive. You can find that guide here:

Multi-boot + data + multi-partition = octopus flash drive 2.0?

Alternatively, anywhere you can find a grub config file you can probably follow this guide. I have yet to find an actually decent reference for the grub configuration file language, but if you know of one, please do post it in the comments.

Memtest86+ (the open source one)

Personally, I recommend the open-source Memtest86+. Since the update to version 7.0, it is now compatible with both BIOS and UEFI-based systems without any additional configuration, which is nice. See the above link to one of my previous blog posts if you would like a flash drive that boots both BIOS and UEFI grub at the same time.

To start, visit the official website, and scroll down to the download section. From here, you want to download the "Binary Files (.bin/.efi) For PXE and chainloading" version. Unzip the file you download, and you should see the following files:

memtest32.bin
memtest32.efi
memtest64.bin
memtest64.efi

....discard the files with the .efi file extension - these are for booting directly instead of being chainloaded by grub. As the names suggest, the ones with 64 in the filename are the ones for 64-bit systems, which includes most systems today. Copy these to the device of your choice, and the open up your relevant grub.cfg (or equivalent grub configuration file - /etc/default/grub on an already-installed system) for editing. Then, somewhere in there add the following:

submenu "Memtest86+" {
    if loadfont unicode ; then
        set gfxmode=1024x768,800x600,auto
        set gfxpayload=800x600,1024x768
        terminal_output gfxterm
    fi

    insmod linux

    menuentry "[amd64] Start Memtest86+, use built-in support for USB keyboards" {
        linux /images/memtest86/memtest64.bin keyboard=both
    }
    menuentry "[amd64] Start Memtest86+, use BIOS legacy emulation for USB keyboards" {
        linux /images/memtest86/memtest64.bin keyboard=legacy
    }
    menuentry "[amd64] Start Memtest86+, disable SMP and memory identification" {
        linux /images/memtest86/memtest64.bin nosmp nosm nobench
    }
}

...replace /images/memtest86/memtest64.bin with the path to the memtest64.bin (or memtest32.bin) file, relative to your grub.cfg file. I forget where I took the above config file from, but I can't find it in my history.

If you are doing this on an installed OS instead of a USB flash drive, then things get a little more complicated. You will need to dig around and find what your version of grub considers paths to be relative to, and put your memtest64.bin file somewhere nearby. If you have experience with this, then please do leave a comment below.

This should be all you need. For those using a grub setup for an already-installed OS (e.g. via /etc/default/grub), then you will need to run a command for your changes to take effect:

sudo update-grub

Adding shutdown/reboot/reboot to bios setup/firmware options

Another thing I discovered recently is how to add options to my grub menu to reboot, shutdown, and reboot into firmware settings. rEFInd (an alternative bootloader to grub that I like very much, but I haven't yet explored for booting multiple ISOs on a flash drive) has these in its menus by default, but grub doesn't - so since I discovered how to do it recently I thought I'd include the config here for reference.

Simply add the following somewhere in your grub configuration file:

menuentry "Reboot" {
    reboot
}

menuentry "Shut Down" {
    halt
}

menuentry "UEFI Firmware / BIOS Settings" {
    fwsetup
}

Bonus: Memtest86 (non open-source)

I followed [https://www.yosoygames.com.ar/wp/2020/03/installing-memtest86-on-uefi-grub2-ubuntu/] this guide, but ended up changing a few things, so I'll outline the process here. Again, I'll assume you alreaady have a multiboot flash drive.

Firstly, download memtest86-usb.zip and extract the contents. Then, find the memtest86-usb.img file and find the offset of the partition that contains the actual EFI image that is the memtest86 program:


fdisk -lu memtest86-usb.img

Disk memtest86-usb.img: 500 MiB, 524288000 bytes, 1024000 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 68264C0F-858A-49F0-B692-195B64BE4DD7

Device              Start     End Sectors  Size Type
memtest86-usb.img1   2048  512000  509953  249M Microsoft basic data
memtest86-usb.img2 514048 1023966  509919  249M EFI System

Then, take the start position of the second partition (the last line that is highlighted), and multiply it by 512, the sector size. In my case, the number is 263192576. Then, mount the partition into a directory you have already created:

sudo mount -o loop,offset=263192576 memtest86-usb.img /absolute/path/to/dir

Then, browse the contents of the mounted partition and copy the EFI/BOOT directory off to your flash drive, and rename it to memtest86 or something.

Now, update your grub.cfg and add the following:

menuentry "memtest86" {
    chainloader /images/memtest/BOOTX64.efi
}

....replacing /images/memtest/BOOTX64.efi with the path to the BOOTX64.efi file that should be directly in the BOOT directory you copied off.

Finally, you should be able to try it out! Boot into your multiboot flash drive as normal, and then select the memtest86 option from the grub menu.

Extra note: booting from hard drives

This post is really turning into a random grab-bag of items in my grub config file, isn't it? Anyway, An option I don't use all that often (but is very useful when I do need it), are options to boot from the different hard drives in a machine. Since you can't get grub to figure out how many there are in advance, you have to statically define them ahead of time:



submenu "Boot from Hard Drive" {
    menuentry "Hard Drive 0" {
        set root=(hd0)
        chainloader +1
    }
    menuentry "Hard Drive 1" {
        set root=(hd1)
        chainloader +1
    }
    menuentry "Hard Drive 2" {
        set root=(hd2)
        chainloader +1
    }
    menuentry "Hard Drive 3" {
        set root=(hd3)
        chainloader +1
    }
}

....chainloading (aka calling another bootloader) is a wonderful thing :P

Of course, expand this as much as you like. I believe this approach also works with specific partitions with the syntax (hd0,X), where X is the partition number starting from 0.

Again, add to your grub.cfg file and update as above.

Conclusion

This post is more chaotic and disorganised than I expected, but I thought it would be useful to document some of the tweaks I've made to my multiboot flash drive setup over the years - something that has more proven its worth many many times since I first set it up.

We've added a memory (RAM) tester to our setup, using the open-source Memtest86+, and the alternative non-open-source version. We've also added options to reboot, shutdown, and enter the bios/uefi firmware settings.

Finally, we took a quick look at adding options to boot from different hard drives and partitions. If anyone knows how to add a menu item that could allow one to distinguish between different hard disks, partitions, their sizes, and their content more easily, please do leave a comment below.

Sources and further reading

Art by Mythdael