In reading Scott's post about weblog comment spam, I was reminded of a thought I've had for some time now. But rather than just tell you, I'll tell you how I came upon the idea and see how quickly you come to the same conclusion.
When I'm asked to interview job candidates at work, it's usually in one of a few capacities. Most often it's "the database interview" in which I get to figure out how much the interviewee knows about relational databases--specifically MySQL. (Gee, I wonder why they pick me for that.)
Other times I'm interviewing folks for what I call a "cultural fit." That basically means I'm trying to figure out if he or she will "fit in" at Yahoo while also conveying an idea of what it's like to work at Yahoo, both the good and bad.
A few times I've also been asked to interview folks in a general technical capacity and to see how well they think about thorny issues, solve problems, etc. When I do that, one of my favorite lines of questioning involves search engine technology and the challenges of indexing the whole web.
At some point we end up discussing PageRank and similar techniques for figuring out site popularity and the various ways that one can abuse those techniques. So I eventually ask something like this:
Assuming that you have a map of the entire web (a link map or "graph" if you want to get all computer science about it), can you think of ways that you might try to detect and ultimately combat link spammers who are clearly trying to game the system?
The ensuing discussion is usually interesting, mostly because the candidate has rarely thought about the issue. But when prompted to do so, a light bulb usually goes on. Sometimes it takes a few seconds, but it happens.
Think about it a bit. I bet you'll come up with a few approaches. They may not be perfect, but that's hardly the point.
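To make that concrete, here's a rough sketch of one such imperfect approach: flag pages whose inbound links come almost entirely from pages they link right back to, which is one pattern a link farm tends to produce. The function name, the thresholds, and the (source, target) edge-list format are all made up for illustration. This is not what any real search engine does, just one guess at a starting point.

```python
from collections import defaultdict

def suspicious_pages(edges, min_inbound=3, threshold=0.9):
    """Flag pages whose inbound links are almost all reciprocated.

    edges: iterable of (source, target) pairs from a hypothetical link map.
    min_inbound and threshold are arbitrary illustrative values.
    """
    outbound = defaultdict(set)
    inbound = defaultdict(set)
    for src, dst in edges:
        if src != dst:  # ignore self-links
            outbound[src].add(dst)
            inbound[dst].add(src)

    flagged = []
    for page, sources in inbound.items():
        if len(sources) < min_inbound:
            continue  # too few links to say anything
        # Inbound sources that this page also links out to.
        reciprocated = sources & outbound.get(page, set())
        if len(reciprocated) / len(sources) >= threshold:
            flagged.append(page)
    return sorted(flagged)
```

A popular page that lots of strangers link to without being linked back scores low here, while a small cluster of pages that all link to each other scores high. Plenty of honest sites cross-link heavily too, so at best this is one signal among several.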
Assuming that you have a sizable list of all the weblogs around (meaning that you're Feedster or maybe Technorati), you crawl them regularly (or at least fetch their RSS feeds), know how often they update, and even know which ones frequently cross-link, can you think of one or more techniques for detecting weblog comment spam almost as it happens?
Yeah? Me too.
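In case it helps, here's a rough sketch of one candidate technique, and only a guess at the general idea: watch for the same linked domain showing up in comments on several different weblogs within a short window. The record format (timestamp in seconds, blog, linked domain), the function name, and the one-hour and three-blog thresholds are all assumptions for illustration, not anything Feedster or Technorati is known to do.

```python
from collections import defaultdict

def burst_domains(comments, window_seconds=3600, min_blogs=3):
    """Return domains linked from comments on many blogs in a short window.

    comments: iterable of (timestamp_seconds, blog, linked_domain) tuples,
    a hypothetical format for data pulled from crawled comment feeds.
    """
    by_domain = defaultdict(list)
    for ts, blog, domain in comments:
        by_domain[domain].append((ts, blog))

    flagged = set()
    for domain, hits in by_domain.items():
        hits.sort()  # order by timestamp
        start = 0
        for end in range(len(hits)):
            # Shrink the window until it spans at most window_seconds.
            while hits[end][0] - hits[start][0] > window_seconds:
                start += 1
            blogs = {blog for _, blog in hits[start:end + 1]}
            if len(blogs) >= min_blogs:
                flagged.add(domain)
                break
    return sorted(flagged)
```

A domain that gets mentioned on a few weblogs over several days passes quietly, while one that lands on three or more weblogs within the hour gets flagged. A story that genuinely takes off would trip this too, so knowing which weblogs normally cross-link would be a sensible way to filter the results.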
Posted by jzawodn at May 12, 2004 09:08 PM