I was listening to the Jan 14th Gilmor Gang a little while ago. This is the first time I've listened to the show. I tuned in mainly to hear what their featured guest, Adam Bosworth, had to say.
In one part of the conversation, they discussed something I do frequently: look for references to my blog posts on the three services that provide this feature in a somewhat timely fashion: Technorati, Feedster, and Bloglines.
In fact, if you read my blog by visiting its home page in your browser (rather than looking at individual posts in an aggregator), you may have noticed that I provide per-post navigation links for this:
I check those links several times a day to see what people are saying about things I've written recently. While I've not kept a detailed log of my findings, I've noticed that the speed, freshness, and comprehensiveness of the three services vary quite a bit. Speed is all about query speed: how quickly I see results. Freshness is a measure of how quickly the services discover new links. Comprehensiveness is determined by finding the most links to me (or any other URL, of course).
Technorati
Technorati has the slowest searches but it's the most comprehensive. It also seems to find more links to me in a timely manner (freshness) than anyone else. The query speed is less of an issue than it was in the past, though there are still occasional 120+ second queries. It's generally the slowest of the three for this use.
Bloglines
Bloglines is almost always the fastest of the three at returning results. However, it's generally the least fresh and not as comprehensive as Technorati or Feedster.
Feedster
Feedster varies a lot. At times I've been impressed with the freshness and speed. Other times it's disappointing. Over time, though, the comprehensiveness is good, but not as good as Technorati.
Summary
To put this in tabular form, if I were to rate the three services against each other, it'd look something like this. (1 is best, 2 is good, 3 is worst. The score is the sum of the three ratings, so lower is better.)
Service | Search Speed | Freshness | Comprehensiveness | Score |
Bloglines | 1 | 3 | 3 | 7 |
Feedster | 2 | 2 | 2 | 6 |
Technorati | 3 | 1 | 1 | 5 |
For my use, Technorati has the edge today.
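The score column is just the sum of the three ranks, which is easy to reproduce. Here's a minimal sketch of that arithmetic in Python. The ranks are copied from the table; the function names and the optional `weights` argument are my own illustration, not something any of the services provide.

```python
# Ranks copied from the table above (1 is best, 3 is worst).
ranks = {
    "Bloglines": {"speed": 1, "freshness": 3, "comprehensiveness": 3},
    "Feedster": {"speed": 2, "freshness": 2, "comprehensiveness": 2},
    "Technorati": {"speed": 3, "freshness": 1, "comprehensiveness": 1},
}


def total_scores(ranks, weights=None):
    """Sum each service's ranks; lower is better.

    `weights` is a hypothetical extra: a dict mapping a metric name to a
    multiplier, for readers who care about one metric more than the others.
    Metrics without an entry get a weight of 1.
    """
    weights = weights or {}
    return {
        service: sum(rank * weights.get(metric, 1) for metric, rank in metrics.items())
        for service, metrics in ranks.items()
    }


def ordering(scores):
    """Return service names from best (lowest score) to worst, ties by name."""
    return sorted(scores, key=lambda service: (scores[service], service))


scores = total_scores(ranks)
print(scores)
print(ordering(scores))
```

With equal weights this reproduces the 7, 6, and 5 in the table. The ordering is sensitive to what you care about, though: counting search speed double, `total_scores(ranks, {"speed": 2})`, puts all three services level at 8.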
What This Means
It's probably safe to assume that comprehensiveness is a measure of the index size (or number of feeds each service reads). Since Feedster and Technorati are both in the business of trying to read all of them, it's a useful comparison. Bloglines happens to provide this service, but it really exists to serve the needs of readers. The fact that Bloglines is competitive at all speaks to Mark's capabilities as a developer and architect.
Based on what I've seen, I'd guess that Technorati has the largest index (they give out the number on their home page), followed by Feedster and Bloglines.
So I Ask
This leaves me with two questions to ask:
- What have your experiences been?
- Has anyone built a tool that tracks the speed, freshness, and comprehensiveness of the three services? I know that tools like this exist for web search.
It'd be interesting to simply measure the lag time in the index updates (freshness) for each service. I'd probably use PubSub as a baseline to do that. PubSub is fresh and comprehensive, but it doesn't keep any history. It provides notification or alerts based on pre-defined queries, not ad-hoc queries that look back in time.
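To make the idea concrete, here's a rough sketch of the bookkeeping such a tool would need. It assumes you've already recorded, for each inbound link, when the baseline first reported it and when a given service first showed it. How you'd collect those timestamps is left out entirely, and nothing here calls a real PubSub, Technorati, Feedster, or Bloglines API. The function names and the sample data are made up for illustration.

```python
from datetime import datetime
from statistics import median


def freshness_lags(baseline_seen, service_seen):
    """Lag in seconds, per link, between the baseline and the service.

    Both arguments map a linking URL to the datetime it was first seen.
    Links the service never found are skipped here and show up as
    missing coverage in summarize().
    """
    return {
        link: (service_seen[link] - first_seen).total_seconds()
        for link, first_seen in baseline_seen.items()
        if link in service_seen
    }


def summarize(baseline_seen, service_seen):
    """Return coverage (comprehensiveness) and median lag (freshness)."""
    lags = freshness_lags(baseline_seen, service_seen)
    return {
        "coverage": len(lags) / len(baseline_seen) if baseline_seen else 0.0,
        "median_lag_seconds": median(list(lags.values())) if lags else None,
    }


# Hypothetical observations, not real measurements.
baseline = {
    "http://example.com/a": datetime(2005, 1, 15, 12, 0),
    "http://example.com/b": datetime(2005, 1, 15, 13, 0),
    "http://example.com/c": datetime(2005, 1, 15, 14, 0),
}
service = {
    "http://example.com/a": datetime(2005, 1, 15, 12, 30),
    "http://example.com/b": datetime(2005, 1, 15, 15, 0),
}

print(summarize(baseline, service))
```

Running `summarize` once per service against the same baseline would give freshness and comprehensiveness numbers that can be compared directly. Query speed would need separate timing around each search request.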
Posted by jzawodn at January 15, 2005 10:50 PM
I of course have also had to steal this little nugget of niceness ;)
Since many publishers offer only snippets in xml feeds, comprehensiveness is also greatly affected by whether full html is indexed. Only technorati does this AFAIK.
Ben Hammersley may have used a tool for latencies in this related post:
http://www.benhammersley.com/weblog/2004/08/01/small_pieces_slowly_moving.html
It's an interesting idea... I'm going to put these links on my blog, too.
Jeremy, you made a typo in the title of your post: you should replace "Blogines" by "Bloglines"
I noticed (and stole) those links a month or so ago. It is kinda depressing since they are usually blank.
I was recently wondering to myself about this same topic. My blog isn't nearly as popular, so my data set is much smaller. I came up with a different rank for the big three:
technorati: I agree with them being the most comprehensive, generally picking up references faster than the other two. Unfortunately, like you mentioned, they easily have the slowest response time to queries. They also drop references in an inconsistent way, leaving some on for a year and dropping others after a few months.
feedster: Whenever I look at references using feedster the word that comes to mind is spastic. I've seen references show up one day, disappear the next only to reappear a day or two later. As for freshness, I'd say they rank about the same as bloglines, perhaps a tiny bit better.
bloglines: I've found that they take longer to find references (freshness), usually less than a day behind technorati. I agree with you that they are almost always the fastest in returning data. I'd rank bloglines second overall and much closer to first based on consistency, because bloglines shows all references, without dropping them at odd intervals like technorati.
You are probably able to see these things in more detail because of the popularity of your blog. Seems like with a little bit of work bloglines could take the #1 spot for this type of search. Presumably technorati and feedster are working on improving things too.
The slowness of technorati drives me up the wall, however the new ability to search Technorati, Flickr and Delicious at the same time definitely makes it worth using.
Jeremy,
Thanks for the kind words, and for the honest critique. We've been working 110% on scaling the Technorati service, making it faster, better, and more responsive. Are you still finding that it is very slow most of the time? We know that our "Rank by Authority" search is still pretty slow, we're working on fixing that, but we also find that most people aren't doing that kind of ranking. Are there any particular searches other than the Authority ranking that are consistently timing out on you? If so, please let me know (email is dsifry at technorati dot com, direct phone is 415 846-0232) and we'll fix the bugs immediately.
The rate at which the blogosphere is growing combined with our growth rate has been a tremendous challenge and opportunity. Thanks for using the service, and I hope that we can improve and continue to live up to your expectations.
Dave
PubSub lagtime 15-18 mins on the "MyStack", while XML feeds can take more than 24 hrs.
Since blogs are maybe 1% of the web and quite a bit more structured, why are searches orders of magnitude slower than Google? And why won't Google obliterate Technorati and Feedster in 20% time?
Why would you check it several times a day? Subscribe to the Feedster link feed for jeremy.zawodny.com
http://feedster.com/links.php?url=jeremy.zawodny.com&type=rss
Scott, I already do subscribe to a few general feedster searches. But this is for zeroing in on a specific post.
Mud wrote: "PubSub lagtime 15-18 mins on the "MyStack", while XML feeds can take more than 24 hrs."
MyStack is an application that I built that consumes and reformats "XML Feeds" from PubSub. It stands outside PubSub and it gets data no faster than any other consumer of PubSub's data. Thus, there can be no difference between the time it takes to get something via MyStack and the time it takes to get it via any other consumer of PubSub XML feeds. There is one exception. The PubSub generated RSS and Atom files are only updated every 15 minutes. This is to discourage people from polling our feeds more often than every 15 minutes. With that one exception, all PubSub XML feeds (REST, XMPP, SOAP or telnet) receive new results the instant that they are discovered by PubSub. "Delays", if any, can be measured in milliseconds or seconds at most.
Admittedly, there will be some delay between when something is posted on a website and when PubSub finds it. If the site pings us, it usually only takes a few seconds or minutes before we fetch the site's feed and process it. However, if a site doesn't ping us, we'll simply scan their feeds on schedule -- 1, 3, or more hours depending on the site and its history. In any case, once PubSub actually finds some data, it is matched in milliseconds. We don't store data, we don't index it. There is no delay in our system.
bob wyman
I'm surprised this is the first Gang you've heard. But it's attention metadata in any case; in this one, it's what you haven't heard that's interesting. Thanks for listening.
And it's G-i-l-l-m-o-r, Jeremy. Your post didn't come through on any of the services, but did show up in my attention feed on Rojo.
Bob Wyman,
You should know better than to make a statement like that. Your statement has no credibility.
You do not know the limitations and problems with your own system.
There are problems with PubSub that cause delays, and Bob Wyman doesn't even know it. If you want to read more, go to the link under my name.
The detailed questions are there. Bob Wyman is free to answer them, or pretend the problem doesn't exist.
There are delays in the PubSub system, and Bob Wyman doesn't know why.