I recently noticed an upswing in the traffic my blog gets from comment spam bots. They're never successfully able to post comments, of course, but it still results in a lot of hits to the Movable Type script that handles comment submissions: mt-comments.cgi
Notice the "cgi" there? That's right. This is an old-school stand-alone Perl CGI script. I'm not running it under mod_perl, so for each request Apache must fork() and exec() to start the Perl interpreter. Then Perl has to parse and compile the script, along with all of its supporting modules.
This all culminates in an error message back to the spam bot--a message that is surely discarded. In short, it's a lot of effort to tell a spam bot to go fuck off. And it causes my 4-year-old web server to strain at times.
So I recently decided to add a new layer to my defenses. I added mod_security to my Apache setup and crafted a few rules to combat most of the poorly written bots as well as those that are slightly better designed.
You see, mod_security provides a decent framework for request filtering within Apache. You can craft all sorts of rules to validate input and check various conditions before control continues in the request handling.
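For selective filters to take effect at all, the filtering engine has to be switched on, and inspecting the body of a POST needs its own switch. Here is a minimal sketch of that baseline, assuming the mod_security 1.x directive names; the default action and status code are illustrative assumptions, not a copy of any particular setup:

```apache
<IfModule mod_security.c>
    # Turn the request filtering engine on
    SecFilterEngine On

    # Also inspect POST bodies (needed for any rule that looks at POST_PAYLOAD)
    SecFilterScanPOST On

    # Assumed default: reject and log any request that matches a rule
    # and does not specify its own action
    SecFilterDefaultAction "deny,log,status:403"
</IfModule>
```

Rules that don't name an action of their own fall back to whatever SecFilterDefaultAction specifies.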
Here are a few of the rules I use:
SecFilterSelective REQUEST_METHOD "^GET$" chain
SecFilterSelective REQUEST_URI "^/mt/mt-comments.cgi"
That basically looks for GET requests attempting to access the comments script. Even though the only references on my entire site to mt-comments.cgi are in forms that specify POST, some bots try to use GET anyway. This is a simple way to guard against them.
A keen observer might point out that I should write a rule that allows only POST requests, rather than denying GETs. You never know when someone might try to use PUT requests or something equally useless.
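That stricter, allow-only-POST variant could be sketched as follows. It assumes mod_security 1.x's convention that a leading "!" inverts the pattern, and it relies on the configured default action to do the actual rejecting; treat it as an untested illustration rather than a rule taken from a live setup:

```apache
# Reject any request to the comments script whose method is not POST
SecFilterSelective REQUEST_URI "^/mt/mt-comments\.cgi" chain
SecFilterSelective REQUEST_METHOD "!^POST$"
```

Because the second pattern is inverted, the chain matches GET, PUT, HEAD, and anything else that isn't exactly POST, so there's no need to list the unwanted methods one by one.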
# Don't allow POST to mt-comments.cgi without 'jeremy'
SecFilterSelective REQUEST_URI "^/mt/mt-comments.cgi" chain
SecFilterSelective POST_PAYLOAD "!jeremy" "redirect:http://jeremy.zawodny.com/comments-jeremy.html"
That rule doesn't allow anyone to hit mt-comments.cgi unless the POST payload (the data being submitted) contains the string "jeremy" (case-insensitive). The custom field I've added to the comment form on all my blog entries requires that you type my name anyway. But this pushes a loose version of that check into Apache itself.
This rule will let through any request that contains my name anywhere (in the comments, the name, the URL, whatever), but that doesn't concern me. The few that do make it through will still be checked by the Perl code anyway.
Rather than merely returning an error code, I redirect the client to a page that explains what went wrong--just in case it's a human, not a bot.
The results are encouraging. I've been running this setup for about 3 days now and I've blocked over 1,000 attempts. No unusual complaints have come in from would-be commenters so far.
I first learned of mod_security from a couple of ONLamp.com articles:
In addition to providing a good introduction, they also provide some useful rules to plug into your configuration. I've used a handful of them in my setup, but I omitted them in the examples above.
Posted by jzawodn at September 17, 2006 09:38 PM