Fighting Spam on your blog
Since I have written my own blog script from scratch, I have learnt a lot about how spambots spam my site in order to implement measures to stop them. This post is a compilation of all the methods that I have discovered so far.
Currently I have yet to rate the effectiveness of each of these measures since at the time of writing this post I have only just finished rewiring the commenting script so that I can 'measure' the effectiveness of each of the methods described below.
Method 1: Honeypots
If you don't take either an email address or a web address on your blog, try adding a email
or website
field and hiding it via CSS. The more complex, indirect, and obscure the CSS you hide it with, the better. Just make sure that is actually hidden.
This blog uses a hidden website
field along with a warning for users who see it due to poor browser support.
Method 2: No super long comments
This isn't really a proper method, but I found that spam comment on my blog were generally really long. So I am imposing a 2000 character limit on comments. If people have more to say, then they can reply to their own comment, and use service like pastebin or hastebin for code.
Method 3: Keys
This is the really important one. I was finding that while the above 2 methods were stopping some of the spam, I was getting some smart spambots with chrome/firefox-like user agent strings that I can only summarise knew how to tell whether a from control was hidden or not by reading the CSS or my website.
The hidden key
field is basically a timestamp of when page was served to the user by the server. In it's simplest form, it can just be the output of PHP's time()
function.
In this blog, however, the timestamp is run through a number of different functions, such as base64_encode()
and strrev()
. Pick a few string manipulation functions that are reversible.
This timestamp can then be analysed by the server. If the timestamp is too far in the past (say 24 hours old), or under 10 seconds old, then the comment is rejected. Spambots will either fetch and cache your page for longer than 24 hours, or they will fetch your page and post a comment immediately. As soon as I set this blog to reject comments posted within 10 seconds of loading the page, I haven't had a single spam comment :)
Summary
So there you go: 2 1/2 methods to banish spam on your blog - for now. The real secret here to log as much information about your commenters as possible (in my case I have been capturing the contents of $_POST
, $_GET
, and $_SERVER
) and working your way through it comparing the requests of legitimate commenters and spammers. The above are simply exploits of the differences I found (with some help from Google). If you can think of any more tricks, please post a comment below!