Sometimes I'm a little surprised by how long some ideas take to bubble up. Other times I'm surprised by the form they take.
I'm doubly surprised this time.
Google Sitemaps (BETA, of course) has me scratching my head a bit. Rather than build on existing work, it seems that Google wants people to build up and submit sitemaps to them so they can increase the freshness and coverage (or comprehensiveness) of their web search index.
Of course, those are two of the four critical variables for Getting Search Right. Around these halls we call them RCFP:
- Relevance
- Comprehensiveness (or coverage, which is a smaller word)
- Freshness
- Presentation
So it's clear what the motivations here are. Nicely, they've decided to apply a Creative Commons License to the work. It's good to see more and more CC licenses out there, especially from the Big Players.
Last summer, I wrote something titled Feed Search vs. Web Search in which I talked about some of the differences between the Googles and Yahoos of the world and the Technoratis and Feedsters.
Under the heading of "Real-Time Pings", I wrote:
Many of these new-fangled content publishing systems (MovableType, WordPress, you name it) have the built-in ability to "ping" services like weblogs.com, Technorati, Feedster, My Yahoo, and so on. They do this to let those services know that something is new. The services typically react by fetching an updated copy of the feed within seconds and extracting the relevant info.
These real-time pings mean that we don't have to wait for a full polling or crawling cycle before getting the latest content. But the old school "web" search engines don't listen for these pings. Instead of seeing this post moments after I click the 'post' button, they're generally 6-36 hours behind.
But what if they did listen for pings? Or maybe offered a compatible ping API?
Emphasis is, of course, mine.
I wonder why they're not simply offering to extend the current weblog ping protocol a bit to work toward the goals of freshness and coverage? It seems to me that with an installed base of millions of ping-generating tools, that'd be a no-brainer. I'm surprised that Danny Sullivan didn't ask this either.
If I had my way, we'd be plugging ping server streams directly into our web crawlers.
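To make that a bit more concrete, here's a rough sketch of what I have in mind. It assumes the common two-argument form of the weblog ping (an XML-RPC call named weblogUpdates.ping carrying a site name and a site URL). The helper names build_ping and handle_ping, the example URLs, and the in-memory deque standing in for a crawler's work queue are all made up for illustration; this is not how any real crawler or ping server is built.

```python
import xmlrpc.client
from collections import deque


def build_ping(site_name, site_url):
    """Serialize a weblogUpdates.ping call as an XML-RPC request body.

    This is what a publishing tool would POST to a ping server.
    """
    return xmlrpc.client.dumps((site_name, site_url),
                               methodname="weblogUpdates.ping")


def handle_ping(request_body, crawl_queue):
    """Parse an incoming ping and move its URL to the front of the queue.

    Hypothetical receiver: a real crawler would validate the URL,
    rate-limit pingers, and persist the queue somewhere durable.
    """
    params, method = xmlrpc.client.loads(request_body)
    if method != "weblogUpdates.ping" or len(params) < 2:
        return False
    site_url = params[1]
    # If the URL was already waiting its turn, pull it out so it
    # is not crawled twice.
    if site_url in crawl_queue:
        crawl_queue.remove(site_url)
    crawl_queue.appendleft(site_url)
    return True


# Usage: a freshly pinged site jumps ahead of the regular crawl schedule.
queue = deque(["http://example.com/old-page"])
body = build_ping("Example Blog", "http://example.org/blog/")
handle_ping(body, queue)
print(list(queue))
```

The point of the sketch is only that the ping already carries the one thing a crawler needs, a URL that just changed, so the receiving end can be little more than "parse, then reprioritize."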
See Also: So, You'd Like To Map Your Site by Anil Dash, with more "prior art" that wasn't used.
Posted by jzawodn at June 07, 2005 01:34 PM