September 28, 2003

Blog Comment Spam on the Rise

Yesterday Jay noticed that I was having a comment spam problem. A few low-life moron assholes have been using my blog to try to boost the PageRank of their various businesses: search engine optimization, porn, and cheap prescription drugs.

He suggested that I look at his Killing Comment Spam Dead posting, which contains some very good ideas and pointers to prior art, such as queuing submissions, URL-matching blacklists, form tricks, and Mark's discussion of why most approaches suck.

After looking over it all and getting sick of censoring things myself, I'm not sure what I want to do about the problem. I've considered:

  1. turning off comments (bad)
  2. turning off comments after an entry is more than a few days old (might help, easy to do)
  3. sending confirmation URLs via e-mail to the poster (valid e-mail address required but not displayed on the site)
  4. writing a bit of content scanning code (there are certain features in common with all my comment spam)
  5. keeping all comments for each post in a separate file that's included at display time via an IFRAME or FRAME in the page. Then I'd drop in a robots.txt file that tells Google to ignore all comments. That'd defeat the spammer's main goal: higher PageRank.
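The robots.txt half of #5 might look something like the sketch below. The /comments/ path and the example.com URLs are hypothetical; nothing above says where the comment files would live. Python's standard urllib.robotparser is used only to check that the rule would keep a well-behaved crawler out of the comment files while leaving the entries themselves crawlable.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: assumes every per-post comment file is served
# from a /comments/ directory, which is an illustration, not a real layout.
ROBOTS_TXT = """\
User-agent: *
Disallow: /comments/
"""


def crawler_may_fetch(url, user_agent="Googlebot"):
    """Return True if a crawler that obeys ROBOTS_TXT may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The entry page stays crawlable; the included comment file does not.
    print(crawler_may_fetch("http://example.com/archives/000123.html"))
    print(crawler_may_fetch("http://example.com/comments/000123.html"))
```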

But so far I haven't decided what to do. I'm inclined to try #2 and #3 but am still mulling things over and deleting 5-15 spams per day.
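A rough sketch of #2 in Python, mostly to show how little code it takes. The seven-day window and the function name are assumptions for illustration: "a few days" isn't pinned down above, and this isn't any blog package's real API.

```python
from datetime import datetime, timedelta

# Hypothetical cutoff; pick whatever "a few days" should mean.
COMMENT_WINDOW = timedelta(days=7)


def comments_open(entry_posted_at, now=None):
    """Return True while an entry is still young enough to accept comments."""
    if now is None:
        now = datetime.now()
    return now - entry_posted_at <= COMMENT_WINDOW


if __name__ == "__main__":
    posted = datetime(2003, 9, 28, 17, 53)
    # Three days after posting: still open.
    print(comments_open(posted, now=datetime(2003, 10, 1, 12, 0)))
    # A month later: closed, so the comment form can simply be left out.
    print(comments_open(posted, now=datetime(2003, 10, 28, 12, 0)))
```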

#5 is interesting and, to my knowledge, I'm the first to suggest it. Anyone else tried this yet? There are a few more tricky variations I'm thinking of too, such as noticing googlebot requests and feeding them slightly different content (hyperlinks stripped from comments, maybe?).
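The googlebot variation could be sketched roughly like this. It assumes comments are stored as HTML strings and that the crawler identifies itself with "googlebot" somewhere in its User-Agent header, which anyone can fake. The function names are made up for illustration, and the regular expression is a quick hack rather than a real HTML parser.

```python
import re

# Matches <a ...>text</a> and captures the link text. This is a simple
# pattern for a sketch; it is not a full HTML parser.
ANCHOR_RE = re.compile(r"<a\b[^>]*>(.*?)</a\s*>", re.IGNORECASE | re.DOTALL)


def strip_links(comment_html):
    """Replace each hyperlink with its plain link text."""
    return ANCHOR_RE.sub(r"\1", comment_html)


def is_googlebot(user_agent):
    """Guess from the User-Agent header whether the request is from googlebot."""
    return "googlebot" in (user_agent or "").lower()


def render_comment(comment_html, user_agent):
    """Serve link-free comments to googlebot and the original to everyone else."""
    if is_googlebot(user_agent):
        return strip_links(comment_html)
    return comment_html


if __name__ == "__main__":
    spam = 'Buy <a href="http://example.com/pills">cheap pills</a> now'
    print(render_comment(spam, "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"))
    print(render_comment(spam, "Mozilla/4.0 (compatible; MSIE 6.0)"))
```

Regular readers still see the links, so this only removes the PageRank payoff; it does nothing about the spam itself showing up on the page.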

Would any of you frequent comment posters be offended by having to click a URL that arrived via e-mail to confirm your posting? What if you only had to do it once--ever? Think of it as lightweight semi-anonymous registration.
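Here is one way the confirm-once idea could work, as a sketch under stated assumptions rather than a finished design. The secret key, the example.com confirmation URL, and the in-memory set of confirmed addresses are all hypothetical stand-ins; a real setup would keep the confirmed list somewhere persistent and actually send the mail.

```python
import hashlib
import hmac
from urllib.parse import quote

SECRET_KEY = b"change-me"  # hypothetical server-side secret
CONFIRMED = set()          # stand-in for a persistent list of confirmed addresses


def normalize(email):
    """Treat addresses case-insensitively and ignore stray whitespace."""
    return email.strip().lower()


def make_token(email):
    """Derive a token that only the server can compute for this address."""
    return hmac.new(SECRET_KEY, normalize(email).encode("utf-8"),
                    hashlib.sha256).hexdigest()


def confirmation_url(email, base="http://example.com/confirm"):
    """Build the URL that would be e-mailed to the poster."""
    return "%s?email=%s&token=%s" % (base, quote(normalize(email), safe=""),
                                     make_token(email))


def confirm(email, token):
    """Mark the address as confirmed if the token from the clicked URL matches."""
    expected = make_token(email).encode("utf-8")
    if hmac.compare_digest(expected, token.encode("utf-8")):
        CONFIRMED.add(normalize(email))
        return True
    return False


def needs_confirmation(email):
    """Only addresses that have never confirmed get the e-mail round trip."""
    return normalize(email) not in CONFIRMED
```

The first comment from an address gets held until the URL is clicked; after that, needs_confirmation returns False and comments from that address go straight through.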

I'm not saying I'm gonna do it, but I clearly need to do something. I just need to figure out the right compromise between (1) keeping things free and open, (2) wasting my time, and (3) wasting your time.

Hmm.

Update: I've been using Jay Allen's cool MT-Blacklist for the last few weeks. It's not perfect, but it does 95% of what I need.

Posted by jzawodn at 05:53 PM