## Just another day: More spam, more defences

When post is released I'll be in an exam, but I wanted to post again about the perfectly fascinating spam situation here on my blog. I've blogged about fending off spam on here before (exhibits a, b, c), but I find the problem is detecting it in a transparent manner that you as the reader don't notice very interesting, so I think I'll write another post on the subject. I could use a service like Google's ReCAPTCHA, but that would be boring :P

Recently I've had a trio of spam comments make it all the way through my (rather extensive) checks and onto my blog here. I removed them, of course, but it still baffled me as to why they made it through.

It didn't take long to find out. When I was first implementing comments on here, I added a logger specifically for purposes such as this that saves everything about current environmental state to a log file for later inspection - for both comments that make it through, and those that don't. It's not available publically available, of course (but if you'd like to take a look, just ask and I'll consider it). Upon isolating the entries for the spam comments, I discovered a few interesting things.

• The comment keys were aged 21, 21, and 17 seconds respectively (the lower limit I have set is 10 seconds)
• All 3 comments claimed that they were Firefox 57
• 2 out of 3 comments used HTTP 1.0 (even though they claimed to be Firefox 57, and despite my server offering HTTP/1.1 and HTTP/2.0)
• All 3 comments utilised HTTPS
• The IP Addresses that the comments came form were in Ukraine, Russia, and Canada (hey?) respectively
• All 3 appear to be phishing scams, with a link leading to a likely malicious website
• The 2 using HTTP/1.0 also asked my server to close the connection after sending a response
• All 3 asked not to be tracked via the DNT HTTP header
• The last comment had some really weird capitalisation. After consulting someone experienced on the subject, I learnt that the writer likely natively spoke an eastern language, such as Chinese

This was most interesting. From this, I can conclude:

• The last comment was likely submitted by a Chinese operator - even though the source IP address is located in Ukraine
• All three are spoofing their user agent string.
• Firefox 57 uses HTTP/2.0 by default if you're really in a browser, and the spam comments utilised HTTP/1.0 and HTTP/1.1
• Curiously, all of this took place over HTTPS. I'd be really curious to log which cipher was used for the connection here.
• In light of this, if I knew more about HTTP client libraries, I could probably identify what software was really used to submit the spam comments (and possibly even what operating system it was running on). If you know, please comment below!

To combat this development, I thought of a few options. Firstly, raising the minimum comment age, whilst effective, may disrupt the user experience, which I don't want to do. Plus, the bot owners could just increase the delay even more. To that end, I decided not to do this.

Secondly, with the amount of data I've collected, I could probably write an AI that takes the environment in and spits out a 'spaminess' score, much like SpamAssassin and rspamd do for email. Perhaps a multi-weighted system would work, with a series of tests that add or take away from the final score? I might investigate upgrading my spam detection system to do this in the future, as it would not only block spam more effectively, but provide a more distilled overview of the characteristics of each comment submission than I have currently.

Lastly, I could block HTTP/1.0 requests. While not perfect (1 out of 3 requests used HTTP/1.1), it would still catch some more bots out without disrupting user experience - as normal browsers (include text-based ones IIRC) use HTTP/1.1 or above. HTTP/1.1 has been around since 1991 (27 years!), so if you're not using it by now - upgrade! For now, this is the best option I can see.

From today, if you try to submit a comment and get a HTTP 505 HTTP Version Not Supported error and see a message saying something like this:

You sent your request via HTTP/1.0, but this is not supported for submitting comments due to high volume of spam. Please retry with HTTP/1.1 or higher.