Would you be surprised to know that some people who work in the search engine "industry" know who is responsible for a lot of the comment spam out there? I met some of them recently. And some of them even have blogs of their own. Seriously.
I haven't written much about this yet, but the recent problems exposed in MovableType (see: Comment Spam Load Issue, More on Comment Spamming, and Spam and the Tragedy of the Commons) make this a good time to start.
One of the comment spammers asked me: "You know why we spam blogs, don't you?" And I knew the answer. They do it because blogs are easy targets and because, just like e-mail spam, it works.
Jay Allen said:
If I chase the spammers out of my yard and onto the neighbors, it's only a matter of time until they come back. No, we all need to disincentivize these fuckers now.
He's right. And there's an 80/20 solution that ought to go a long way toward solving this problem. We know that spam works because of web page ranking algorithms based on link counting (PageRank, WebRank, whatever). But as humans, we can clearly distinguish between content posted by a blog's owner and that posted by random, anonymous, and possibly malicious users (or spambots). Search engines today don't seem to make that distinction, but there's a reasonable argument that it's worth putting some effort into.
If you assume the following:
- 80% of blogs are hosted by or produced on one of the more popular blogging platforms
- 80% of people don't significantly tweak the default templates available in their blogging software
- those people are the least likely to be actively fighting spam and, as a result, have more spam than the 20% of blogs where the owner is more defensive
Then a partial solution is fairly clear. I've heard and seen others discuss it over the past few months. The search engines need to be smarter about reading and indexing content.
When folks like Tim build software that classifies pages, the software needs to be able to recognize the difference between links produced by the blog owner(s) and those contributed by readers and spambots.
Once you can identify the difference between those two types of links, you simply stop using the second type of link when calculating rank. Sure, you can still count them for the purpose of providing link counts--just don't factor them into the ranking.
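Here's a rough sketch of the idea in Python. It leans on the assumption above that most blogs use an unmodified default template, so reader comments sit inside a predictable container. The "comments" class name, LinkClassifier, and classify_links are all made up for illustration; this is not how any real search engine or blogging platform works, and it only copes with reasonably well-formed markup.

```python
from html.parser import HTMLParser

# Hypothetical marker: assume the default template wraps reader comments
# in an element whose class list contains "comments".
COMMENT_CLASS = "comments"

# Void elements have no closing tag, so they must not affect depth tracking.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class LinkClassifier(HTMLParser):
    """Splits a page's links into owner links and reader (comment) links."""

    def __init__(self):
        super().__init__()
        self.depth = 0          # nesting depth inside the comment container
        self.owner_links = []   # links that would count toward ranking
        self.reader_links = []  # links that would be ignored for ranking

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self.depth > 0:
            self.depth += 1     # one level deeper inside the comments
        elif COMMENT_CLASS in classes:
            self.depth = 1      # just entered the comment container
        href = attrs.get("href")
        if tag == "a" and href:
            bucket = self.reader_links if self.depth > 0 else self.owner_links
            bucket.append(href)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1     # reaches 0 when the container closes


def classify_links(html):
    """Return (owner_links, reader_links) for one page of HTML."""
    parser = LinkClassifier()
    parser.feed(html)
    parser.close()
    return parser.owner_links, parser.reader_links
```

A ranking pass would then feed only the first list into its link counting, while the second list could still show up in plain link counts. Unclosed tags inside the comment area would throw off the depth tracking, so anything real would need a more forgiving parser.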
How's that for removing the incentive?
I bet you'd like to know what software the blog spammers use to run their own weblogs, wouldn't you?
Posted by jzawodn at December 18, 2004 06:53 PM