I recently noticed an upswing in the traffic my blog gets from comment spam bots. They're never successfully able to post comments, of course, but it still results in a lot of hits to the Movable Type script that handles comment submissions: mt-comments.cgi

Notice the "cgi" there? That's right. This is an old-school stand-alone Perl CGI script. I'm not running it under mod_perl, so for each request Apache must fork() and exec() to start the Perl interpreter. Then Perl has to parse and compile the script, along with all of its supporting modules.

This all culminates in an error message back to the spam bot--a message that is surely discarded. In short, it's a lot of effort to tell a spam bot to go fuck off. And it causes my 4-year-old web server to strain at times.

So I recently decided to add a new layer to my defenses. I added mod_security to my Apache setup and crafted a few rules to combat most of the poorly written bots as well as those that are slightly better designed.

You see, mod_security provides a decent framework for request filtering within Apache. You can craft all sorts of rules to validate input and check various conditions before control continues in the request handling.

Here are a few of the rules I use:

SecFilterSelective REQUEST_METHOD "^GET$" chain
SecFilterSelective REQUEST_URI "^/mt/mt-comments.cgi"

That basically looks for GET requests attempting to access the comments script. Even though the only references to mt-comments.cgi on my entire site are in forms that specify POST, some bots try to use GET anyway. This is a simple way to guard against them.

A keen observer might point out that I should write a rule that allows only POST requests, rather than denying GETs. You never know when someone might try to use PUT requests or something equally useless.
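
That stricter rule could look something like the following. It is a sketch, not a rule from the setup described here, and it assumes mod_security 1.x syntax, where a leading "!" inverts the pattern and a chained pair only fires when both conditions hold:

```apache
# Hypothetical variant: block every method except POST on the comments script.
# With no action listed on the last rule, the configured default action applies.
SecFilterSelective REQUEST_URI "^/mt/mt-comments.cgi" chain
SecFilterSelective REQUEST_METHOD "!^POST$"
```

Written this way, GET, PUT, and anything else unexpected are all rejected by one rule instead of being listed one method at a time.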

# Don't allow POST to mt-comments.cgi without 'jeremy'
SecFilterSelective REQUEST_URI "^/mt/mt-comments.cgi" chain
SecFilterSelective POST_PAYLOAD "!jeremy" "redirect:http://jeremy.zawodny.com/comments-jeremy.html"

That rule doesn't allow anyone to hit mt-comments.cgi unless the POST payload (the data being submitted) contains the string "jeremy" (case-insensitive). The custom field I've added to the comment form on all my blog entries requires that you type my name anyway. But this pushes a loose version of that check into Apache itself.

This rule will let requests through that contain my name anywhere (in the comments, the name, the URL, whatever), but that doesn't concern me. The few that do make it through will still be checked by the Perl code anyway.

Rather than merely returning an error code, I redirect the bot to a page that tells them what was wrong--just in case it's a human, not a bot.
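
Rules like these only take effect with a few global directives in place. Here is a minimal sketch of that scaffolding, assuming mod_security 1.x; the 403 status and the choice to log are illustrative assumptions, not settings taken from this server:

```apache
<IfModule mod_security.c>
    # Turn the filtering engine on
    SecFilterEngine On

    # Request bodies are not scanned unless this is enabled,
    # so rules that inspect POST_PAYLOAD depend on it
    SecFilterScanPOST On

    # Applied to any rule that names no action of its own
    # (assumed values; adjust to taste)
    SecFilterDefaultAction "deny,log,status:403"

    # ... SecFilterSelective rules go here ...
</IfModule>
```

A rule with an explicit action, such as the redirect above, overrides the default for that rule only.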

Results

The results are encouraging. I've been running this setup for about 3 days now and I've blocked over 1,000 attempts. No unusual complaints have come in from would-be commenters so far.

Further Reading

I first learned of mod_security from a couple of ONLamp.com articles:

In addition to providing a good introduction, they also provide some useful rules to plug into your configuration. I've used a handful of them in my setup, but I omitted them in the examples above.

Posted by jzawodn at September 17, 2006 09:38 PM

Reader Comments
# John Engler said:

Jeremy, you're so old school.... upgrade to WordPress and Akismet, all the cool kids are doing it ;)

Hope you're well, and hope to meet you at SxSW this coming year, bud.

on September 18, 2006 01:38 AM
# Ryan said:

nah John, the key isn't to upgrade to WordPress... it's to write your own CMS.

I know it's a pain.. but programs like WordPress and Movable Type are big time spam targets... since once a spammer knows how to spam one, chances are pretty good it'll work on hundreds more.

The key is to do something different in your blog that isn't universally applicable.... like jeremy's "type jeremy below"

on September 18, 2006 06:03 AM
# Oscar Merida said:

Ryan: I've written my own blog software and eventually the spammers figure out how to spam it - although I haven't gone the "do something different" route yet.

Jeremy: check out the prebuilt rulesets from gotroot.com, tons of pre-written rules for keeping out a lot of spam. It's reduced the comment spam that gets through to one or two per day at most. It might be a little more overhead, if you're worried about the impact on server performance, but I think it's worth it. You will need to tweak the rules a little bit, as some of them are overly broad, like blocking anything with HTTP_REFERER from blogspot.com.

I'm a huge fan of mod_security because it works, and you add a security layer across all your web applications without having to modify them. In the case of comment spam, you can protect multiple blogs this way.

on September 18, 2006 09:03 AM
# Ryan said:

Oscar... yeah that was my point.. the point was that the blog software be specific to one blog... if you start distributing it, or using it in more than one place.... then it becomes a viable target for the spam bots.

once you're on the list.... you never get off. My Blog (linked above) got on that list... and at one point when I didn't have anything in place, I was getting 10 megs / day of porn, pills and poker links posted... they'd post one on every old article (thousands) in the archive every day... like it went on infinite loop.

on September 18, 2006 11:59 AM
# Brad Choate said:

I highly recommend running MT under FastCGI or mod_perl. The difference is dramatic and you don't incur the process load you do with CGIs. Even so, mod_security has become a requirement for any blog that accepts comments. Killing as many spam requests as high in the chain as possible is a good thing to do regardless of how fast the request is handled.

One of the things we strived to do with MT is provide a spam filtering solution that is highly configurable and winds up making each MT install different in terms of how it blocks spam. The effect being that spammers have a harder time targeting MT with a script that works for everyone.

And lastly, the last thing you should do when fighting spam is to reveal exactly how you're fighting spam. :)

on September 18, 2006 02:26 PM
# Rob Said said:

Ryan, I agree with Oscar on this one.

I've never been one to use off-the-shelf software for servers. I wrote some blog software specifically for my own personal use. It doesn't get many visitors, yet it only took a couple of months for the spammers to target it.

It's a perfect example of how security through obscurity just doesn't work.

on September 23, 2006 09:26 AM
# chris said:

"Security through Obscurity" ?

There's no other kind.

Even the most powerful security in the world still relies on the obscure fact that nobody's worked out how to factor the products of large primes yet.

Can *anyone* think of any kind of security that's not 100% based on something obscure? Even your own front door lock is just based on some obscure spacing of nicks in steel tumblers (or the corresponding grooves on your key).

on October 3, 2006 05:46 AM
# Antony Shen said:

Thank you for your example on setting up the mod_security filter. Not that I don't know Regular Expression, but my brain was just not working on how to set the filter after fighting comment spam.
I have MT-Scode on, and none of the spam/attacks got through, however, it did use a lot of CPU resources, which left the server unresponsive. You might be interested to know about the count... 20555 requests of mt-comments.cgi in 12 days, and in one particular minute there were 114 requests to mt-comments.cgi (not the most intensive one).

It really helps. Thank you very much.

on November 14, 2006 05:54 AM
# Sarah Thomson said:

SecFilterSelective REQUEST_URI "^/mt/mt-comments.cgi" chain
SecFilterSelective POST_PAYLOAD "!jeremy" "redirect:http://jeremy.zawodny.com/comments-jeremy.html"

Dude that is freakin sweet! I've used mod_rewrite to do something similar but I like this better.

List of all 27 request methods- http://www.askapache.com/2007/htaccess/27-request-methods-for-use-with-apache-and-rewritecond-and-htaccess.html

on March 7, 2007 02:49 AM
# pjm said:

Have you done an upgrade to the 2.x series of mod_security? Apparently the rules need a fair amount of rewriting; I'm trying to find a similar how-to for the 2.x rules.

on May 24, 2008 08:10 AM
# Jeremy Zawodny said:

No, I'm still in the 1.x world.

on May 24, 2008 04:41 PM