Dave says:

Isn't it obvious that either Google or Yahoo will buy Feedster so their search engine can understand RSS. Then the other guy is going to wonder why they missed the boat. After that, they can make their search engines understand OPML and throw out the antiquated centralized directories and let the amateurs compete to create the best directory for a given topic, the same way we compete for page rank.

What makes Dave think that Yahoo and Google's technology doesn't already "understand" RSS, I wonder? RSS is simple. Really simple. And structured. Hardly the mess that HTML is. It's not a really hard problem if you already have crawling infrastructure and the ability to query structured data.
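To make the "structured" point concrete, here is a minimal sketch of pulling items out of a feed with Python's standard library. The sample feed and the `parse_items` name are made up for illustration, and it assumes a clean, well-formed RSS 2.0 document, which real feeds often are not.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, used only for illustration.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <link>http://example.com/</link>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
      <description>Hello, world.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
      <description>More text.</description>
    </item>
  </channel>
</rss>"""


def parse_items(rss_text):
    """Return a list of dicts, one per <item>, from well-formed RSS text."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        items.append({
            # findtext returns the default when the child element is absent.
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        })
    return items


for entry in parse_items(SAMPLE_RSS):
    print(entry["title"], entry["link"])
```

This covers only the parsing step. Crawling, indexing, and ranking are assumed to exist elsewhere.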

Heck, I fully expect Microsoft's search engine to grok RSS and/or OPML.

I don't mean to detract from Scott's work on Feedster. It clearly fills a hole that nobody else has.

(Reminder: I don't speak for my employer or any of my employer's technology providers. Not that that will stop anyone from reading all sorts of stuff into what I say...)

Posted by jzawodn at July 16, 2003 09:42 PM

Reader Comments
# Justin said:

I agree completely about RSS being an order of magnitude easier to parse than HTML. I have been working on a homebrew crawler for some time now, and I appreciate how clean XML and its derivatives are to work with.

You don't have to tell me that Yahoo, or Google for that matter, knows how to parse RSS or XML. Most of the news section on Yahoo Finance is driven by XML, from what I can tell. And I assume that Google News has something very similar with its partners.

It makes sense that Google and Yahoo would spend the R&D time to develop a solution to these problems.

on July 16, 2003 11:40 PM
# Reverend Jim said:

I agree. I don't mean to knock Feedster or the people behind it... but, it's really nothing difficult... it just happens to be the first.

Google's search engine is a million times better. And parsing RSS is trivial at best. If Google wanted to build an RSS Search engine they certainly wouldn't need to buy Feedster. They've already got the equipment, facilities and programming talent.

on July 17, 2003 04:57 AM
# Scott Johnson said:

Sigh. RSS is simple on the surface. I'll grant you that. But like anything else, it gets complex because people do odd things with it. Anyone want to guess how many feeds are standards compliant? Anyone wonder why we have routines like guess_permalink(), which handles the guid / permalink controversy? (Note: this has since been renamed.)
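For readers unfamiliar with the guid / permalink issue, here is a rough sketch of the kind of fallback logic a routine like this might need. It is not Feedster's code: the function body, the URL check, and the fallback order are all assumptions. The one grounded detail is that RSS 2.0 treats a guid as a permalink unless isPermaLink="false" is set.

```python
import xml.etree.ElementTree as ET


def _looks_like_url(text):
    # Crude heuristic (an assumption): only http/https values count as URLs.
    return text.startswith(("http://", "https://"))


def guess_permalink(item):
    """Best-effort permalink for an RSS <item> element, or None.

    Illustrative sketch only; not the routine Feedster actually uses.
    """
    link = (item.findtext("link") or "").strip()
    guid = item.find("guid")
    guid_text = (guid.text or "").strip() if guid is not None else ""

    if guid_text:
        # RSS 2.0: isPermaLink defaults to "true" when the attribute is absent.
        is_permalink = guid.get("isPermaLink", "true").lower() == "true"
        # Some feeds claim a permalink but put an opaque id there, so check.
        if is_permalink and _looks_like_url(guid_text):
            return guid_text

    if link:
        return link

    # Last resort: a guid flagged as non-permalink that still looks like a URL.
    if guid_text and _looks_like_url(guid_text):
        return guid_text
    return None
```

Even this toy version has three branches, and it ignores relative URLs and feeds that omit both elements.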

Autodiscovery is also a joke in that it's rarely supported. So we have a lot of specific knowledge about how to figure out "For blogging system X, where is the RSS feed?"
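When a site does support autodiscovery, it advertises its feed with a link element in the page head. A minimal sketch of that lookup follows, with the `FeedLinkFinder` and `discover_feeds` names invented for illustration. It handles only pages that include the tag, which, as noted above, is the uncommon case.

```python
from html.parser import HTMLParser

# MIME type conventionally used for RSS autodiscovery links.
RSS_TYPE = "application/rss+xml"


class FeedLinkFinder(HTMLParser):
    """Collects href values from <link rel="alternate" type="application/rss+xml">."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # HTMLParser lowercases attribute names; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        if ("alternate" in rels
                and attr.get("type", "").lower() == RSS_TYPE
                and attr.get("href")):
            self.feeds.append(attr["href"])


def discover_feeds(html_text):
    """Return the feed URLs a page advertises, in document order."""
    finder = FeedLinkFinder()
    finder.feed(html_text)
    return finder.feeds
```

Pages without the tag return an empty list, and that is where the per-system knowledge has to take over.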

As for the comments on our search engine, check out http://feedster.com/help/. We've got over 5 man-years of R&D in engine technology, and our engine has been tested against the TREC data (OK, so we are search engine geeks through and through).

Every problem looks trivial on the surface. Scaling it is hard. Making it deal with real world data and what users do is awful.

That's the real IP we own -- making this stuff work scalably.

on July 17, 2003 06:57 AM
# Jon Gales said:

As I note in the trackbacked post (I guess Jeremy and I have a similar thought process, as I wrote my piece a few days ago), none of the code in Feedster could be used in a production environment for something like Google. Sure, it's "scalable", but try throwing a million queries an hour at it. Google obviously knows how to scale, and already has the Googlebot to get data (HTML, RSS, whatever it may be). I just don't see a buyout happening, but I'd love to see one, as Scott deserves some cash.

on July 17, 2003 08:10 AM
# Jeremy Zawodny said:


(I also said this via e-mail.)

Just to clarify, I didn't say it was an easy problem. I said it wasn't a
"really hard problem" and compared it to web/HTML indexing and search.
But hopefully that was clear in my post. I was surprised that Dave
thought the big guys couldn't do it given their existing knowledge and
infrastructure. These are companies with an army of engineers tackling
their problems. The fact that you two have come this far this fast
is indeed impressive (as I've noted many times).

on July 17, 2003 12:11 PM