September 12, 2003

RSS Auto-Discovery 2.0

As I've noted recently, I've been playing (and fighting) with RSS Discovery issues.

It has occurred to me that there's some non-existent infrastructure that we (whoever "we" really is) need to build if RSS is going to really, really take off the way it should. The first piece of it builds on existing work, while the second is relatively uncharted territory as far as I can tell. And I'll only touch on it briefly at the end of this post.

Today

The current breed of RSS Auto-Discovery is quite handy and simple. By embedding this bit of XML:

<link rel="alternate" type="application/rss+xml" title="RSS" href="http://jeremy.zawodny.com/blog/index.rdf">

In my blog home page, I'm advertising to aggregators and other software tools that there's a feed available for their use. They don't have to do any guesswork or check all the links on my page or look for the little orange XML button like a human might. For automated tools, that's great. The user goes to my blog home page and clicks a "Subscribe" button in their aggregator (or maybe uses a bookmarklet) and the computer does all the hard work.

This works really well for blogs. Many blogging tools seem to be providing it by default now. But RSS is ultimately about more than blogs, right? At Yahoo, we've begun to provide RSS feeds for Yahoo! News (info), Ask Yahoo! (no info page, feed here), and Yahoo! Buzz (info page). And there's more RSS to come.

If you look under the covers, you'll find the necessary <link> tag in various places. Like on the Technology News page:

<link rel="alternate" type="application/rss+xml" title="RSS" href="http://rss.news.yahoo.com/rss/tech">

But let's think about this from the point of view of someone writing aggregation software that wants the user to understand nothing about RSS and little if anything about the structure of a site like Y! News. Notice that the tag is not on every story page, like this one. (Maybe it should be?)

Tomorrow

Keeping track of all that is going to become difficult as RSS goes more and more mainstream. We can't expect users like my Dad to know they're supposed to "subscribe" on the category page but not from the article pages. We can't expect them to know that there's even a difference, really. Nor can we expect software to figure this out--yet. And we can't expect news aggregators to add a whole category just for Yahoo content to their default feed list. That'd be crazy for a number of presumably obvious reasons. We can't expect users to run a web search for "Yahoo RSS" and end up on my blog. Remember, users won't know what RSS is. So the tools have to be able to figure out that Yahoo offers several categories of RSS feeds (News, Buzz, etc.) and many feeds within each of them.

We need more than what today's auto-discovery provides. And before you start think that this is just a Yahoo problem, consider News.com. On their pages (like this one) they provide the orange XML button but no auto-discovery support. I suspect that if a better solution was available, someone could convince them to use it too. After all, the more widespread their headlines are distributed, the more traffic (and presumably money) they get. Like Yahoo, they have a complex offering of content from a discovery point of view.

Proposal

What we need is "RSS Auto-Discovery 2.0." At least that's what I'm calling it.

What is it? It's a bit more infrastructure that could go a long way toward scaling RSS for big sites and making the aggregators job easier. It should solve the problems I tried to introduce above.

I don't claim to have this all figured out--at least not exactly. But some of us at Yahoo think this is an important problem. And we'd like to see it solved. Soon. But rather than dictate how we (or *I*) think it ought to work, let's come up with something that will work. You know, an on-line group effort but without all the politics. Can we do that? I hope so.

Having said that, I do have some ideas that I'd like to start with--in the hopes that a productive discussion will follow.

I think this is all pointing at a per-site machine-readable directory of the available RSS feeds. And the obvious format for that directory is OPML. It seems that most aggregators have settled on it as the de-facto standard for subscription interchange. And some even use it as their native "database" format.

So let's use it. I've talked with Dave Winer and he agrees that it's a sensible use of OPML.

Now, how might automated tools learn of the existence of this directory? I see two obvious ways that could happen.

  1. The file, like /robots.txt could reside at a well known location with a well known name, say /feeds.opml or /rss.opml? This has the advantage of being very simple.
  2. We add a new, optional <link> tag that tells tools where they can find a relevant OPML file that contains information about the various feeds offered. This has the advantage of being like what we do today and it provides a bit of flexibility.

There was a more complex variation that I've omitted. Why complicate something more than it needs to be? That's a compelling argument in my book.

Starting from here, where do we go next? Discuss here? On the syndication list? Or the aggregators list? Or on a Wiki?

Update: Mark Fletcher (of Bloglines) suggests that we need a machine readable way to specify a blogroll (in OPML, I'd assume). I completely agree. Any other meta-data we're missing today?

Update #2: Diego Doval has mockups in RSS and OPML.

What else?

Oh, yeah. I said there was a second thing, didn't I?

After this issue fizzles out, maybe we can move on to solving the next problem: Machine Readable Licenses for RSS feeds. This was partly prompted by Derek's Syndication vs. Aggregation post a few weeks ago. We had a few IM discussions about this, but it really is something that we need to resolve in a standard and machine-readable way. There needs to be a tag or link that means "you can syndicate this, but you absolutely cannot charge a fee" or whatever.

It's a can of worms, I'm sure.

Posted by jzawodn at 01:19 PM

I've Reverted

To my natural state, apparently.

For whatever reason, I haven't managed to get to bed before 2am for a single day this week. And one day it was 3:30am when I hit the sack.

Needless to say, I'm a night person.

On the plus side, I've still been getting up at a decent hour. But instead of heading to work, I've been working from home in the AM and then heading to work around noonish. It seems to be working out very well. It's a good compromise between the productivity of being at home where it's quiet and the necessary face-to-face interaction of being in the office.

Posted by jzawodn at 12:36 AM