So I've been wondering why Google doesn't set up a URL that weblog software can "ping" each time a new entry is posted. I already ping weblogs.com and movabletype.org each time I post a new entry. Why not rig up Google to do nearly real-time weblog indexing?
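For readers who haven't seen one, here is a minimal sketch of what such a ping could look like. It assumes the XML-RPC convention used for weblogs.com pings, a method called weblogUpdates.ping that takes the weblog's name and URL. The Google endpoint is purely hypothetical (no such URL exists), and the helper names are made up for illustration.

```python
import xmlrpc.client

# Hypothetical endpoint: Google does not offer a ping URL like this.
HYPOTHETICAL_GOOGLE_PING_URL = "http://ping.google.example/RPC2"


def build_ping_request(blog_name: str, blog_url: str) -> str:
    """Return the XML-RPC request body for a weblogUpdates.ping call."""
    # dumps() serializes the parameter tuple into a <methodCall> document.
    return xmlrpc.client.dumps((blog_name, blog_url),
                               methodname="weblogUpdates.ping")


def send_ping(endpoint: str, blog_name: str, blog_url: str):
    """POST the ping to an XML-RPC endpoint and return whatever it answers."""
    server = xmlrpc.client.ServerProxy(endpoint)
    return server.weblogUpdates.ping(blog_name, blog_url)


if __name__ == "__main__":
    # Only prints the request body; nothing is sent over the network.
    print(build_ping_request("Example Weblog", "http://blog.example.com/"))
```

Weblog software would call something like send_ping() right after publishing an entry. The receiving service still has to crawl the page itself, since the ping carries only the name and URL.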
Now, I already know from my own stats that Google crawls my blog daily. Certainly we can improve on that.
Why does this matter? Simple. Virtually all of the traffic to my weblog that doesn't come from someone else linking to it comes from a Google search. I'd guess that Google generates 80-90% of my non-directly-linked hits. (Yahoo is a distant second.)
Sure, Dave has suggested that Google use the info available at weblogs.com to do this, but why not eliminate the middle-man entirely? (No offense, Dave.)
Hmm. This gives me an idea for work that's sorta related to another idea I had after a co-worker showed me something that is best described as the opposite of Google Sets. Well, sort of. Lots of fun stuff to hack on there.
Posted by jzawodn at December 21, 2002 02:11 PM
or dave could implement the blo.gs cloud interface (http://blo.gs/cloud.php), and google and other services like technorati and blogrolling.com (and blo.gs) could get real-time updates from weblogs.com instead of polling hourly.
movabletype.org could implement it, too, and then people could ping whoever they want, and the various services would just share the data instead of people having to ping multiple services.
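The "ping once, share everywhere" idea can be sketched as a small relay. This is not the blo.gs cloud interface's actual protocol, just an in-memory illustration; PingRelay and its methods are hypothetical names.

```python
from typing import Callable, Dict, List

Ping = Dict[str, str]


class PingRelay:
    """Accepts one ping per update and forwards it to every subscriber."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[Ping], None]] = []

    def subscribe(self, callback: Callable[[Ping], None]) -> None:
        """Register a service (a search engine, a blog tracker) for updates."""
        self._subscribers.append(callback)

    def ping(self, name: str, url: str) -> int:
        """Forward one update to all subscribers; return how many were told."""
        event = {"name": name, "url": url}
        for callback in self._subscribers:
            callback(event)
        return len(self._subscribers)


if __name__ == "__main__":
    relay = PingRelay()
    relay.subscribe(lambda event: print("search engine saw", event["url"]))
    relay.subscribe(lambda event: print("blog tracker saw", event["url"]))
    relay.ping("Example Weblog", "http://blog.example.com/")
```

A real relay would push updates over the network and would have to cope with slow or failing subscribers, which this sketch ignores.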
but why might google not be eager to rush into this? the seo crowd. this seems like it would be a ripe target for them to try abusing, so i expect any steps google takes to be measured.
With Movable Type it's a supporters thing. If you donate money, they give you a code that allows pinging their site.
Well, I'd love to see this happen. I've offered Sergey Brin to come in and help them do it, but so far, no response.
I meant, I offered TO Sergey that I'd come in and help them do it...
In fact, I'm pretty sure this is going to be one of the major developments next year. Robots.txt is only a restriction list. Meta tags are useless. Google will probably want webmasters to adopt some kind of format or procedure to keep them posted about updates. They'll need to verify that the system is not abused to attract attention to sites that weren't really updated, but this would potentially allow them to keep a fresher index with fewer computing resources.
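One simple way to check that a pinged site was really updated is to fetch the page and compare a fingerprint of its content with the one from the previous crawl. The sketch below assumes that approach; nothing here says it is how Google does or would do it, and UpdateVerifier is a hypothetical name.

```python
import hashlib
from typing import Dict


class UpdateVerifier:
    """Remembers a content fingerprint per URL to spot pings with no change."""

    def __init__(self) -> None:
        self._fingerprints: Dict[str, str] = {}

    def is_real_update(self, url: str, content: bytes) -> bool:
        """Return True if the content differs from what was last seen for url.

        The first time a URL is seen counts as an update.
        """
        digest = hashlib.sha256(content).hexdigest()
        changed = self._fingerprints.get(url) != digest
        self._fingerprints[url] = digest
        return changed


if __name__ == "__main__":
    verifier = UpdateVerifier()
    url = "http://blog.example.com/"
    print(verifier.is_real_update(url, b"first post"))
    print(verifier.is_real_update(url, b"first post"))  # same bytes again
```

A hash of the raw page is too strict for real sites, where dates, counters, or ads change on every fetch, so a real check would probably compare only the extracted entry text. A site that pings repeatedly without changing could then have its pings ignored for a while.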
Let's take a hint from Froogle, their new shopping engine: they're asking merchants to provide them with data feeds (anywhere from once a day to once a month). They're most probably pondering how to do it most efficiently from a technical standpoint, but they might also have business model questions.
Google doesn't do paid inclusion, but what about paid refresh, i.e. it's free to be in the monthly Google dance, but daily refreshes cost money?