Starbeamrainbowlabs

About

Hello!

I am a computer science student researcher who is doing a PhD at the University of Hull. My project title is Using Big Data and AI to dynamically predict flood risk.

I started out teaching myself about various web technologies, and then I managed to get a place at University, where I am now. I've previously done a degree (BSc Computer Science) and a Masters (MSc Computer Science with Security and Distributed Computing) at the University of Hull. I've done a year in industry too, which I found to be particularly helpful in learning about the workplace and the world.

I currently know JavaScript (Browser + Node.js), CSS3, HTML5 etc, Python (TensorFlow, PyTorch, etc), Jupyter Notebook (best for small things not large things), PHP, C / C++ (mainly for Arduino), some Rust, and C# + MonoGame / XNA (+ WPF). Oh yeah, and I can use XSLT too. I should format this list better at some point.

I love to experiment and learn about new things on a regular basis. You can find some of the things that I've done in the labs and code sections of this website, or on GitHub (both in my personal time and for my PhD). My current big personal projects are Pepperminty Wiki, an entire wiki engine in a single file (the source code is spread across multiple files - don't worry!), and WorldEditAdditions.

I can also be found in a number of other different places around the web. I've compiled a list of the places that I can remember below.

Social

Other

I can be contacted at the email address webmaster at starbeamrainbowlabs dot com. Suggestions, bug reports and constructive criticism are always welcome.

For those looking for my GPG key, you can find it here. My key id is C2F7843F9ADF9FEE264ACB9CC1C6C0BB001E1725, and is uploaded to the public keyserver network, so you can download it with GPG like so: gpg --keyserver hkps://keyserver.ubuntu.com:443/ --recv-keys C2F7843F9ADF9FEE264ACB9CC1C6C0BB001E1725

Blog

Blog Roll | Article Atom Feed | Mailing List

Latest Post

Defending against DDoS attacks hammering my git server

It's no sekret that I have a git server. I host all sorts of stuff on there - from things I've talked about on this blog to many other things I have, plus private repositories that I can't share (yet) for one reason or another.

While I can't remember exactly when I first set it up, I do remember that gitea wasn't even a thing back then, and I originally set up go git service.

Nowadays, I run forgejo, which is a fork of gitea, which is itself a fork of go git service.

Either way, it's been around for a while!

Unfortunately, now that smaller git servers are becoming more common (we still need a social/federated git standard like e.g. ActivityPub), so are attacks against such servers. One I dealt with yesterday was particularly nasty, so I decided to make a blog post about it.

I'll have CPU for breakfast, lunch, and tea thank you

Before I explain how I dealt with it (mitigated is the technical term I understand), it's important to know the anatomy of the attack. After all, security is important but we can only be secure if we know what we're defending against.

The threat model, if you will.

In this case, the attacker sent random requests to random files on random commits in a large git repository I have on my aforementioned git server.

Yesterday, I measured almost 1 million unique IP addresses making exactly 2 requests at a time each.

If I had the energy, I'd plot em all on a Hilbert curve with a colour gradient for age, maybe even with an animation.
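For the curious, a measurement like the one above (unique addresses, and how many requests each one made) could be pulled out of JSON-formatted access logs with something like the rough sketch below. It isn't the exact method used here: the function name count_requests_per_ip is made up for illustration, and the request.remote_ip field name is an assumption about the log format, so check it against your own logs first.

```python
import json
from collections import Counter


def count_requests_per_ip(log_lines):
    """Count requests per client IP in JSON-formatted access log lines."""
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that isn't valid JSON
        if not isinstance(entry, dict):
            continue
        request = entry.get("request")
        if not isinstance(request, dict):
            continue
        ip = request.get("remote_ip")  # ASSUMPTION: field name in your log format
        if ip:
            counts[ip] += 1
    return counts


# Made-up sample lines using documentation-only IP ranges
sample_lines = [
    '{"request": {"remote_ip": "203.0.113.5", "uri": "/a"}}',
    '{"request": {"remote_ip": "203.0.113.5", "uri": "/b"}}',
    '{"request": {"remote_ip": "198.51.100.7", "uri": "/c"}}',
    "this line is not json",
]
counts = count_requests_per_ip(sample_lines)
exactly_two = sum(1 for n in counts.values() if n == 2)
print(f"{len(counts)} unique IPs, {exactly_two} of which made exactly 2 requests")
```

For a real log file, pass the open file object in instead of the sample list, since iterating over a file yields one line at a time.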

The result of all of this is 100% CPU usage on the 3rd-generation dedicated server I rent and a slow terminal experience, because to serve each request Forgejo has to call a git subprocess to inspect the repository and extract the requested version of the file.

That's a very expensive way to handle a HTTP/S request!

At first, I thought I was infected, but further inspection of the logs revealed it not to be so.

With all this in mind, the goal of my expedition was to stop the spammy HTTP/S calls from hitting the application server (forgejo).

This is all interesting, because it means that a number of common steps to achieve this won't work:

We can't just block the IP addresses, because there are too many and most of them will be compromised IoT (Internet of Terrible security) devices etc in people's homes that have been roped into a botnet.
We can't keep the git server turned off, because I need to use it.
I can't block access to the problematic paths on the server, because then the attacker will just switch to another set and access to the git server would still be impaired.
I can't just allow specific IP addresses through, as I have blog post stuff hosted on there and you, one of my readers, would be cut off from accessing it (and I sometimes access it from my phone, which doesn't have a fixed IP).

...so that just leaves us stuck, right?

Teh solutionses!

Not so. There's still a strategy that we haven't tried: a Web Application Firewall. Traditionally, such tools are big and very very expensive, but the other week I discovered a tool that does the job within a size envelope (a couple of megabytes) and at a price point (free!) I can afford.

That tool is Anubis, and despite the.... interesting name it acts like something of a firewall that sits in front of the application server, but behind your reverse proxy:

 Public Internet ║ Inside server                                               
                 ║                                                             
                 ╟───────────────┐         ┌───────────────┐  ┌───────────────┐
                 ║     Caddy     │         │    Anubis     │  │    Forgejo    │
Inbound  ────────▶       •       ├─────────▶       •       ├──▶       •       │
requests   80/tcp║ Reverse proxy │         │   Firewall    │  │  App server   │
          443/tcp╟───────────────┘         └───────────────┘  └───────────────┘
                 ║                                localhost           localhost
                 ║                                 2999/tcp            3000/tcp

Essentially, Anubis weighs the risk of each request as it comes in. 'High-risk' requests, such as those coming from browsers (which attackers love to impersonate), get served a small challenge that they must solve to gain access to the website. Low-risk clients, such as git, curl, or elinks, can go straight through.
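To make the idea concrete, here's a deliberately tiny sketch of that kind of risk weighing. It is not Anubis' actual policy engine: the function name weigh_request and its single rule (challenge anything whose User-Agent contains "Mozilla", which browser User-Agent strings do) are assumptions purely for illustration, and the User-Agent strings are just plausible examples.

```python
def weigh_request(user_agent: str) -> str:
    """Decide what to do with a request based only on its User-Agent header.

    Hypothetical single-rule policy: browser-like clients get a challenge,
    everything else is allowed straight through.
    """
    if "Mozilla" in user_agent:
        return "challenge"
    return "allow"


# Example User-Agent strings (illustrative values, not taken from real logs)
examples = [
    "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0",
    "git/2.43.0",
    "curl/8.5.0",
]
for user_agent in examples:
    print(f"{weigh_request(user_agent):>9}  {user_agent}")
```

A real policy has many more rules than this, which is exactly why good bots can get caught out by accident.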

This is in the form of a hashing problem: the browser must find a nonce (number only used once) that, when hashed alongside a given unique challenge string, produces a hash with a certain number of zeroes (0), and then tell the server what that nonce is.

Correctly completing the challenge (which doesn't take very long) sets a cookie for that client, which lets it access the website without completing another challenge for a certain period of time.
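A hashing problem of this kind can be sketched in a few lines. This is a minimal illustration of the general proof-of-work technique and not Anubis' real implementation: the use of SHA-256, counting zeroes as leading hex digits, joining the challenge and nonce by plain concatenation, and the function names are all assumptions made for the example.

```python
import hashlib


def hash_attempt(challenge: str, nonce: int) -> str:
    """Hash the challenge string together with a candidate nonce."""
    # ASSUMPTION: SHA-256 over plain concatenation; a real system may differ
    return hashlib.sha256(f"{challenge}{nonce}".encode("utf-8")).hexdigest()


def solve_challenge(challenge: str, difficulty: int) -> int:
    """Client side: brute-force the first nonce that meets the difficulty."""
    prefix = "0" * difficulty
    nonce = 0
    while not hash_attempt(challenge, nonce).startswith(prefix):
        nonce += 1
    return nonce


def verify_solution(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: checking an answer takes just one hash."""
    return hash_attempt(challenge, nonce).startswith("0" * difficulty)


challenge = "example-challenge-string"  # hypothetical value issued by the server
nonce = solve_challenge(challenge, difficulty=3)
print("nonce:", nonce)
print("hash:", hash_attempt(challenge, nonce))
print("valid:", verify_solution(challenge, nonce, 3))
```

The asymmetry is the point: the client has to grind through many hashes to find a valid nonce, while the server verifies it with a single hash. That's cheap for one human visitor, but it adds up quickly for a botnet making millions of requests.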

I could go on, but the official documentation explains it pretty well.

Essentially, by serving challenges to high-risk clients instead of letting their requests straight through to expensive HTTP/S endpoints (such as loading a random file from a random commit in a random git repo), a server's resources can be protected, giving a better experience to the users who use it on a day-to-day basis.

This isn't without its flaws - namely inadvertently blocking good bots - but it does strike enough of a balance that I can keep my git server online without giving up the entirety of my server's resources in the process, which I need to use for other things.

But how?!

I'll assume you already have some sort of reverse proxy in front of some sort of application server. In my case, that's caddy and forgejo.

Anubis' latest release can be downloaded from here, but for Debian/Ubuntu users who want an apt repository I'm rehosting the .deb files from Anubis' releases page in my personal apt repository:

https://apt.starbeamrainbowlabs.com/

Assuming you have e.g. an Ubuntu server, you'll want to install anubis and then navigate to /etc/anubis, in which you should create a configuration file named after the user account you'll be starting anubis under (with a .env file extension).

Each instance of anubis can only handle 1 domain/app at a time, so you'll want 1 system user account per application you want to protect.

For example, I have a config file at /etc/anubis/anubis-git.env with the following content:

TARGET=http://[::1]:3000
BIND=:2999
METRICS_BIND=:2998

....my internal git server is listening on port 3000 on the IPv6 localhost address ::1 for HTTP requests, so that's the target that anubis should forward requests to, as in the ASCII diagram above (made in monosketch).

Then, start the new anubis instance like so:

sudo systemctl enable --now anubis@anubis-git.service

....in my case, the username I created (sudo useradd --system anubis-git etc etc) was anubis-git, so that's what goes in the filename above and after the @ sign when we start the service.

If you haven't seen this syntax before in systemd service names, it allows you to set the username that a supporting service file will start a service with. syncthing does the same thing with the default systemd service definition it provides.

In other words, it lets you start multiple instances of the same service without them clashing with each other.

At any rate, the final piece of the puzzle is telling your reverse proxy to talk to anubis:

git.starbeamrainbowlabs.com {
    log

    reverse_proxy http://[::1]:2999 {
        # ref anubis config setup: both of these headers are required
        header_up X-Http-Version {http.request.proto}
        # ref anubis config: this one is especially required
        header_up X-Real-Ip {remote_host}
    }
}

Point reverse_proxy at the address of Anubis (http://[::1]:2999 in my case) instead of directly at your application server, then check the config and reload:

sudo caddy validate -c /etc/caddy/Caddyfile && sudo systemctl reload caddy

(replacing /etc/caddy/Caddyfile with the path to your Caddyfile of course)

Conclusion

....and you're done!

We've successfully put an application server behind anubis to protect it from malicious requests.

Over time, I assume I will need to tweak the anubis settings, which is possible through what seems to be a rather detailed policy file system (which allows RSS/Atom files through by default, if you're crazy enough to be subbed to any feeds from my git server).

If something seems broken to you now that I've set this up, please do get in touch and I'll try my best to help you out.

I'll be continuing to keep an eye on my web server traffic to see if anything gets through that shouldn't, and adjusting my response as necessary.

Thanks for sticking with me! I have lots of other cool things to talk about here soon, when I have the energy.

--Starbeamrainbowlabs

Aside: IP blocking with Caddy

While implementing the above approach, I found I did need to bring my git server up for my Continuous Integration system (I implemented it well before forgejo got workers and I haven't checked out the latter yet) to work.

To do this, I temporarily implemented an IP address-based allowlist.

If you're curious, here's the code for that:

# temp solution to block anyone who isn't in the allowlist outright
# note that given the sheer range of IPs from what's probably a compromised device-based botnet, we can't just IP block this long-term.
@denied not client_ip 1.2.3.4 5.6.7.8/24 127.0.0.1/8 ::1/128
abort @denied

....throw this in one of the server blocks in your Caddyfile before a reverse_proxy directive - changing the allowed IP addresses of course (but leave the IPv4 & IPv6 localhost ones!) - validate & reload, and you should have an instant IP address allowlist system in place!
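If you'd like to sanity-check which addresses a list like that would let through before reloading, here's a small sketch using Python's standard ipaddress module. It mirrors the placeholder ranges from the snippet above and is only an approximation of the matching logic, not how Caddy itself implements client_ip. The is_allowed name is made up for the example.

```python
import ipaddress

# The same placeholder ranges as in the Caddyfile snippet above
ALLOWED_RANGES = ["1.2.3.4", "5.6.7.8/24", "127.0.0.1/8", "::1/128"]

# strict=False tolerates host bits being set, so 5.6.7.8/24 becomes 5.6.7.0/24
ALLOWED_NETWORKS = [
    ipaddress.ip_network(item, strict=False) for item in ALLOWED_RANGES
]


def is_allowed(client_ip: str) -> bool:
    """Return True if client_ip falls inside any of the allowed ranges."""
    address = ipaddress.ip_address(client_ip)
    return any(
        address.version == network.version and address in network
        for network in ALLOWED_NETWORKS
    )


for candidate in ["1.2.3.4", "5.6.7.200", "127.0.0.53", "::1", "203.0.113.9"]:
    verdict = "allowed" if is_allowed(candidate) else "denied"
    print(f"{candidate}: {verdict}")
```

Note that a bare address like 1.2.3.4 is treated as a single-host range, and that 5.6.7.8/24 covers the whole of 5.6.7.0 to 5.6.7.255.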

Labs

Code

Research

Note: This section is still under construction, so excuse the mess :-) TODO ALL the icons to make this look pretty; add recent posters + papers etc

A map of the UK with yellow and purple circles dotted all over it. The circles represent positive and negative sentiment of social media posts analysed and plotted on a map. This image was produced by an MSc student named on the paper.

Real-time social media sentiment analysis for rapid impact assessment of floods

Classifying multimodal text + images via sentiment analysis from social media with contrastive learning aids situational awareness in floods.

DOI

Using multimodal data and AI to dynamically map flood risk

Coming soon :-)

PDF

Tools

I find useful tools on the internet occasionally. I will list them here.