RSS Auto-Discovery 2.0 (by Jeremy Zawodny)

As I've noted recently, I've been playing (and fighting) with RSS Discovery issues.

It has occurred to me that there's some non-existent infrastructure that we (whoever "we" really is) need to build if RSS is going to really, really take off the way it should. The first piece of it builds on existing work, while the second is relatively uncharted territory as far as I can tell. And I'll only touch on it briefly at the end of this post.

Today

The current breed of RSS Auto-Discovery is quite handy and simple. By embedding this bit of XML:

In my blog home page, I'm advertising to aggregators and other software tools that there's a feed available for their use. They don't have to do any guesswork or check all the links on my page or look for the little orange XML button like a human might. For automated tools, that's great. The user goes to my blog home page and clicks a "Subscribe" button in their aggregator (or maybe uses a bookmarklet) and the computer does all the hard work.
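To make the aggregator's side of this concrete, here is a minimal sketch in Python of how a tool might pull the advertised feeds out of a page. It assumes the usual form of the tag (rel="alternate" with type="application/rss+xml"); the class and function names, the sample page, and the example.com URLs are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FeedLinkParser(HTMLParser):
    """Collects <link rel="alternate"> tags that advertise an RSS feed."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []  # list of (title, absolute feed URL)

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute values can be None (bare attributes), so normalize them.
        a = {name: (value or "") for name, value in attrs}
        rels = a.get("rel", "").lower().split()
        is_rss = a.get("type", "").lower() == "application/rss+xml"
        if "alternate" in rels and is_rss and a.get("href"):
            # Resolve relative hrefs against the page the tag was found on.
            self.feeds.append((a.get("title", ""), urljoin(self.base_url, a["href"])))


def discover_feeds(html, base_url):
    """Return every RSS feed advertised in the given HTML."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds


# Hypothetical page: the host and the /index.rdf path are placeholders.
SAMPLE_PAGE = """<html><head>
<title>Example blog</title>
<link rel="alternate" type="application/rss+xml" title="RSS" href="/index.rdf">
</head><body>...</body></html>"""

if __name__ == "__main__":
    for title, url in discover_feeds(SAMPLE_PAGE, "http://example.com/blog/"):
        print(title, url)
```

A page that carries several such tags simply yields several entries, so the same loop covers a page that belongs to more than one feed.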

This works really well for blogs. Many blogging tools seem to be providing it by default now. But RSS is ultimately about more than blogs, right? At Yahoo, we've begun to provide RSS feeds for Yahoo! News (info), Ask Yahoo! (no info page, feed here), and Yahoo! Buzz (info page). And there's more RSS to come.

If you look under the covers, you'll find the necessary <link> tag in various places. Like on the Technology News page:

But let's think about this from the point of view of someone writing aggregation software that wants the user to understand nothing about RSS and little if anything about the structure of a site like Y! News. Notice that the tag is not on every story page, like this one. (Maybe it should be?)

Tomorrow

Keeping track of all that is going to become difficult as RSS goes more and more mainstream. We can't expect users like my Dad to know they're supposed to "subscribe" on the category page but not from the article pages. We can't expect them to know that there's even a difference, really. Nor can we expect software to figure this out--yet. And we can't expect news aggregators to add a whole category just for Yahoo content to their default feed list. That'd be crazy for a number of presumably obvious reasons. We can't expect users to run a web search for "Yahoo RSS" and end up on my blog. Remember, users won't know what RSS is. So the tools have to be able to figure out that Yahoo offers several categories of RSS feeds (News, Buzz, etc.) and many feeds within each of them.

We need more than what today's auto-discovery provides. And before you start thinking that this is just a Yahoo problem, consider News.com. On their pages (like this one) they provide the orange XML button but no auto-discovery support. I suspect that if a better solution were available, someone could convince them to use it too. After all, the more widely their headlines are distributed, the more traffic (and presumably money) they get. Like Yahoo, they have a complex offering of content from a discovery point of view.

Proposal

What we need is "RSS Auto-Discovery 2.0." At least that's what I'm calling it.

What is it? It's a bit more infrastructure that could go a long way toward scaling RSS for big sites and making the aggregator's job easier. It should solve the problems I tried to introduce above.

I don't claim to have this all figured out--at least not exactly. But some of us at Yahoo think this is an important problem. And we'd like to see it solved. Soon. But rather than dictate how we (or *I*) think it ought to work, let's come up with something that will work. You know, an on-line group effort but without all the politics. Can we do that? I hope so.

Having said that, I do have some ideas that I'd like to start with--in the hopes that a productive discussion will follow.

I think this is all pointing at a per-site, machine-readable directory of the available RSS feeds. And the obvious format for that directory is OPML. It seems that most aggregators have settled on it as the de facto standard for subscription interchange. And some even use it as their native "database" format.

So let's use it. I've talked with Dave Winer and he agrees that it's a sensible use of OPML.
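As a rough illustration of what such a directory could look like and how a tool might read it, here is a minimal Python sketch. Nothing here is a settled format: the nesting of categories, the sample feed URLs, and the function name are assumptions, and the reader lowercases attribute names because spellings such as xmlUrl and xmlurl both turn up in OPML files.

```python
import xml.etree.ElementTree as ET

# Hypothetical directory: category names echo the post, URLs are placeholders.
SAMPLE_OPML = """<?xml version="1.0"?>
<opml version="1.0">
  <head><title>Feeds on example.com</title></head>
  <body>
    <outline text="News">
      <outline text="Technology" type="rss" xmlUrl="http://example.com/news/tech.xml"/>
      <outline text="Sports" type="rss" xmlurl="http://example.com/news/sports.xml"/>
    </outline>
    <outline text="Buzz" type="rss" xmlUrl="http://example.com/buzz.xml"/>
  </body>
</opml>"""


def list_feeds(opml_text):
    """Return (category path, feed URL) pairs for every feed in the directory."""
    root = ET.fromstring(opml_text)
    feeds = []

    def walk(node, path):
        for outline in node.findall("outline"):
            # Lowercase attribute names to tolerate differing spellings.
            attrs = {name.lower(): value for name, value in outline.attrib.items()}
            label = attrs.get("text") or attrs.get("title") or ""
            here = path + [label]
            if attrs.get("xmlurl"):
                feeds.append(("/".join(here), attrs["xmlurl"]))
            # Outlines may nest, which is how categories are modeled here.
            walk(outline, here)

    body = root.find("body")
    if body is not None:
        walk(body, [])
    return feeds


if __name__ == "__main__":
    for path, url in list_feeds(SAMPLE_OPML):
        print(path, url)
```

With a listing like this, an aggregator could show the user "News/Technology" and "Buzz" as choices without the user ever seeing the word RSS.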

Now, how might automated tools learn of the existence of this directory? I see two obvious ways that could happen.

1. The file, like /robots.txt, could reside at a well-known location with a well-known name, say /feeds.opml or /rss.opml. This has the advantage of being very simple.
2. We add a new, optional <link> tag that tells tools where they can find a relevant OPML file that contains information about the various feeds offered. This has the advantage of being like what we do today, and it provides a bit of flexibility.
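The two approaches can coexist on the tool side. Here is a minimal Python sketch of how an aggregator might build its list of places to look: first any directory advertised by a <link> tag on the page, then the well-known paths. The rel value "feeds" is purely hypothetical (no such value has been agreed on), as are the function and class names and the example.com URLs; the two paths are just the names floated above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Names floated in the post; nothing has been standardized.
WELL_KNOWN_PATHS = ("/feeds.opml", "/rss.opml")
# Hypothetical rel value for the proposed <link> tag.
DIRECTORY_REL = "feeds"


class DirectoryLinkParser(HTMLParser):
    """Collects hrefs of <link> tags that point at a feed directory."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = {name: (value or "") for name, value in attrs}
        if DIRECTORY_REL in a.get("rel", "").lower().split() and a.get("href"):
            self.hrefs.append(a["href"])


def candidate_directory_urls(page_url, html):
    """Return directory URLs to try, explicit <link> hints first."""
    parser = DirectoryLinkParser()
    parser.feed(html)
    candidates = [urljoin(page_url, href) for href in parser.hrefs]
    candidates += [urljoin(page_url, path) for path in WELL_KNOWN_PATHS]
    # Drop duplicates while keeping the order of first appearance.
    ordered = []
    for url in candidates:
        if url not in ordered:
            ordered.append(url)
    return ordered


if __name__ == "__main__":
    page = '<html><head><link rel="feeds" href="/directory.opml"></head></html>'
    print(candidate_directory_urls("http://example.com/news/story.html", page))
```

Trying the explicit hint first keeps the number of blind requests (and 404s) down, while the well-known paths still give tools something to fall back on.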

There was a more complex variation that I've omitted. Why complicate something more than it needs to be? That's a compelling argument in my book.

Starting from here, where do we go next? Discuss here? On the syndication list? Or the aggregators list? Or on a Wiki?

Update: Mark Fletcher (of Bloglines) suggests that we need a machine readable way to specify a blogroll (in OPML, I'd assume). I completely agree. Any other meta-data we're missing today?

Update #2: Diego Doval has mockups in RSS and OPML.

What else?

Oh, yeah. I said there was a second thing, didn't I?

After this issue fizzles out, maybe we can move on to solving the next problem: Machine Readable Licenses for RSS feeds. This was partly prompted by Derek's Syndication vs. Aggregation post a few weeks ago. We had a few IM discussions about this, but it really is something that we need to resolve in a standard and machine-readable way. There needs to be a tag or link that means "you can syndicate this, but you absolutely cannot charge a fee" or whatever.

It's a can of worms, I'm sure.
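For a sense of what "machine-readable" could mean in practice, here is a minimal Python sketch that reads a license URL from a feed. It assumes the feed declares its license with the creativeCommons RSS module (a license element in the http://backend.userland.com/creativeCommonsRssModule namespace), which is one existing candidate rather than an agreed answer. The sample feed, its URLs, and the function name are placeholders.

```python
import xml.etree.ElementTree as ET

# Namespace of the creativeCommons RSS module.
CC_NS = "http://backend.userland.com/creativeCommonsRssModule"

# Hypothetical feed: the channel and its URLs are placeholders.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"
     xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule">
  <channel>
    <title>Example feed</title>
    <link>http://example.com/</link>
    <description>Placeholder channel</description>
    <creativeCommons:license>http://creativecommons.org/licenses/by-nc/1.0/</creativeCommons:license>
  </channel>
</rss>"""


def channel_licenses(feed_text):
    """Return the license URLs declared at the channel level, if any."""
    root = ET.fromstring(feed_text)
    channel = root.find("channel")
    if channel is None:
        return []
    # ElementTree addresses namespaced elements as {namespace}localname.
    elements = channel.findall("{%s}license" % CC_NS)
    return [el.text.strip() for el in elements if el.text]


if __name__ == "__main__":
    print(channel_licenses(SAMPLE_FEED))
```

A license URL only tells a tool which terms apply; deciding what an aggregator is allowed to do under those terms is the part that still needs agreement.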

Posted by jzawodn at September 12, 2003 01:19 PM

Reader Comments

# Breyten said:

1. No OPML, please. Using OCS for this is much more sensible anyway :). We could then create a tag like
We don't need more 404's :)

2. Maybe I'm wrong here, but isn't Creative Commons all about machine-readable licenses?

on September 12, 2003 01:59 PM

# Russ said:

Why use OPML? RSS could do the same thing. See: http://www.myelin.co.nz/post/2003/9/5/#200309052 But if you're all hot and heavy on OPML, I don't want to start Yet Another Weblog Standards War. It's just a suggestion.

Please. For the love of all things holy. Don't use a Wiki.

-Russ

on September 12, 2003 02:14 PM

# Gay Gilmore said:

Very interested in an apolitical discussion of this (I work on an online aggregator), but I'm not sure I understand what value the OPML provides.

It would be great if every node/page of the site knew to put the right tag for subscribing to the appropriate rss -- so you could just say "subscribe me to be notified of new stories like this" from the article about Madonna's book, and boom, you've just got the new children's book news. (There is a side problem that this article might be listed in several feeds...do you want "new children's books" or "celebrity gossip"? ...and don't get me started on the full text v. summary debate...)

What would be in the opml? I mean it would be great for us (a centralized aggregator) to visit any page on a site and download a file that provides every available rss feed on that site so we could add them all to our directory in one swift move, but after that...I just don't know how this same file will help the end user. In my example above, would the aggregator (or universal subscription mechanism, a whole 'nother ball of wax) be required to somehow parse the opml to figure out which feed is appropriate for a "subscription" command from this random page?

how do the dots connect?

(p.s. thanks for bringing it up because those wee orange buttons cannot last if we are ever to cross the chasm...)

on September 12, 2003 02:24 PM

# Jeremy Zawodny said:

Russ:

> Why use OPML?

Fast adoption? Most weblog/aggregator software already groks it.

> I don't want to start Yet Another Weblog Standards War.

None of us do, I hope.

> Please. For the love of all things holy. Don't use a Wiki.

Yeah, I don't like 'em either. But some people seem to think they're terribly helpful.

on September 12, 2003 02:29 PM

# Timothy Appnel said:

Jeremy...

Can you clarify whether you mean the "blogrolling" format when you say OPML? They are two different, though related, things.

I concur on avoiding Yet Another Weblog Standards War and using a wiki for this. This is something that is worthwhile discussing and developing.

on September 12, 2003 02:43 PM

# Diego said:

Jeremy, I think this is a great idea! I second Russ' suggestion (just as my preference) that RSS be used. My reasons: It seems to me that this meshes as the other half of using RSS for archiving. You'd get the list of feeds on one feed, the actual daily feed on another and the archive on the last. There's a certain symmetry to it :) Additionally, the aggregator could automatically subscribe itself to the feed of feeds and present new items when they come along (something that will be more useful for large sites like Yahoo! or News.com I assume), and RSS already includes update frequencies, etc. Finally, the tags title, link, and description mesh perfectly with the concept.

All that said, whichever way it's done, if it works, I'm all for it.

on September 12, 2003 04:23 PM

# JY said:

* For the machine readable licenses, it's solved out there :

http://backend.userland.com/creativeCommonsRssModule

So we can move on to the more difficult dogs readable licenses :-)

* For specifying the blogroll, check the HTML source of http://www.scripting.com/ to see the Subscriptions and Blogroll tags in the head :

* When you talk about a /rss.opml file, do you mean it has to be at the root of the base URL, or can it be in any directory at any level? (I'm thinking of hosting sites like radio.weblogs.com, which would need an rss.opml file for each blog.)

on September 12, 2003 04:23 PM

# Mark said:

Please don't use OPML. At least use a format that can be validated in some minimal sense of the word. OPML is essentially just a container of angle brackets with a sequence of undocumented attributes. In fact, developers are encouraged to add their own undocumented attributes, something Userland has done many times.

http://diveintomark.org/archives/2002/04/15/investigating_opml

on September 12, 2003 06:58 PM

# On-Vacation said:

I know! What we need is a search engine for RSS feeds so that they are easy to find on any topic... and maybe some little meta-tags that the authors can use to add a title, description and maybe some keywords to their feed to help the search engine out... but what if someone misuses that?

Hmmm... this is all starting to sound oddly familiar somehow...

on September 12, 2003 07:29 PM

# Zoe said:

I would second Diego's proposal to stick to RSS...

Aside from the pleasant effect of symmetry... there is also the benefit of not having to specify the entire site hierarchy up front...

on September 13, 2003 02:01 AM

# mpt said:

Well, you could just add the <link> element to those individual article pages that are missing it. But why use a perfectly good existing standard, when you can invent a new one to make Web publishing even more complicated? :-(

(“But what if an article belongs in multiple feeds?” Then give it multiple <link> elements. That way software can present a human with an understandable choice of feeds containing that article, rather than a confusing choice of all feeds on what might be a very large site like Yahoo.)

on September 13, 2003 02:26 AM

# Todd Larason said:

Just saying "Use OPML" isn't nearly enough; the OPML spec (such as it is) doesn't define the attributes used for RSS-in-OPML lists; in the real world, I've seen "htmlUrl" & "xmlUrl" vs "htmlurl" & "xmlurl", type="rss" vs. no type attribute, "title" and "description" vs. "text", and version="RSS" & version="RSS2" vs. no version attribute.

on September 13, 2003 05:02 AM

# Jeff Waugh said:

Another voice in the night recommending that OPML be avoided. Not for any political reason, simply that it is "not good XML". Someone mentioned the lack of definition for the format, which is a problem, but potentially worse: Using attributes for content is a very bad idea. They're meant to be for metadata, not human data. A very practical result of this is inability to provide translations for that content in a standard manner... Sure, you could provide OPML for each language in a different file, or use content negotiation, but that's a bad hack around the initial problem. :-(

The RSS idea sounds pretty sane. Only one thing missing, if you think it's important: hierarchy in the blogroll.

on September 13, 2003 05:18 AM

# SARTRE said:

Jeremy,

Just secured the domain name http://oldright.com/ awaiting activation of redirect to http://batrgroup.blogspot.com/

Have applied with NewsIsFree and NewsKnowledge to become a content partner provider. Main site is: http://batr.org which has commentary and related archives.

Desire script to place on LEFT column of Blog to allow daily grabbing of title and link back to similar sites that I select.

Posting using RSS from a particular NewsIsFree link works great. Seek ability to automate the process. (Blogger posts on the right column)

Key is be able to access the correct content. Moreover content like the NYT is not acceptable.

Hope it's OK to post this inquiry?

Thanks,

SARTRE

on September 13, 2003 07:25 AM

# Phillip Pearson said:

FYI, the Topic Exchange has a publicly-accessible list of channels in OPML format at:

http://topicexchange.com/topics/opml

It's done this way to be compatible with the k-collector channel list, available at:

http://k-collector.evectors.it/itentdirectory/topicRoll.opml?dir=140

I'd just like to say that I'm happy to publish something like this in either OPML or RSS format (or whatever people want) with links to the RSS feeds instead.

BTW the resulting RSS will give pretty much the same info as this:

http://topicexchange.com/topics/rss

on September 14, 2003 12:24 AM

# Danny said:

I'm sure you've noticed, improving autodiscovery is in scope for the Atom project too (I think it would make sense to talk about autodiscovery in general, rather than tying it just to RSS).
Anyhow, I notice there's a fair bit of anti-OPML feeling. Some alternatives are here :

http://dannyayers.com/2003/08/atom-s.htm

on September 14, 2003 02:14 AM

# Diego said:

As far as I can see Jeremy's initial proposal is not the target of other autodiscovery standards that have been proposed or used, including the well-known RSD and the proposals that Danny mentioned for Atom, since they deal with autodiscovery of APIs (blog posting and programmatic access) rather than automated feed listing, which is what Jeremy was looking for. Changing the current auto-discovery techniques from dealing with APIs to dealing with only feeds would imply changing them or expanding them as well--which means that having to spec this properly is inescapable--and in that case, why not simply spec on top of something that already exists and is used, like OPML or RSS? True, Atom might in the end spec something that includes this concept more cleanly if you will, but then again Atom is redefining everything, from feed formats, to APIs, to auto-discovery.

Particularly since Jeremy said that he'd like to see this problem solved "Soon" (which is a plus I think!) this, in my opinion, precludes the use of technologies still under development for things that are currently in use and deployed.

And, btw, Jeremy was wondering how to deal with this process: mailing list, Wiki, etc. Not much has been said regarding that.

I propose we use weblogs themselves, with maybe one or two pages with pointers to all the discussion. I think this will keep the discussion focused and avoid problems.

on September 14, 2003 08:57 AM

# Timothy Appnel said:

I have to agree with Danny that we consider autodiscovery in general -- even if he forgot to list WSIL in his list of alternates after I emailed him on it. (ahem.)

As I've said before I'm of the mind that feeds (like RSS) are simple forms of web services. It would seem a bit silly to do something so specific to RSS feeds when something more inclusive may be possible with a little bit of additional effort.

Just a thought.

on September 15, 2003 07:25 AM

# -lc- said:

Here's my attempt at an OPML feed-list for multi-blog sites using MovableType:

http://www.movabletype.org/support/index.php?act=ST&f=14&t=27995

Strangely, I researched this before I saw your recent thoughts on this issue. Group consciousness at work, eh?

By the way, consider "index.opml" in the root of the site for auto-discovery. It seems logical to me.

on September 23, 2003 02:08 AM

# polonus said:

Dear Jeremy,

A good start is to use Rocket RSS Reader, which starts conveniently inside your browser on an account tied to your e-mail address. It has an excellent search function. It really makes using RSS and adding RSS feeds easy-peasy. Try it and enjoy; it is za darmo, that is, free.

Regards,

Polonus

on July 20, 2004 11:59 AM

# Ken Dreger said:

The "RSS" Feed team will research using an RSS feed to add relevant articles to our forum areas from various RSS feed sites around the world.

Research and develop a relevant RSS feed that will feed into our MySQL database on a nightly basis.

Will this product do this for us?
Ken Dreger
kdreger@hspig.org

on October 3, 2004 08:43 PM

# Won said:

Article Search??

on February 19, 2006 04:38 AM

# bing said:

Good article. You knew this back in 2003; good on you.

on May 23, 2007 11:13 PM

# Armin said:

RSS Auto Discovery 2.0 is a very useful thing. I use this plugin on my blog and it works great.

on October 18, 2007 09:32 AM

# Roberto said:

Just a (simple?) question. Is it possible to indicate rel=alternate for an external content page? (for example a translated page in a different language)

Thank you

on March 30, 2008 10:24 AM

# Robert Spychala said:

I'm playing with an idea for short URL auto-discovery.

http://sites.google.com/a/snaplog.com/wiki/short_url_hints

Would love to get your feedback if possible.

r.S.

on April 2, 2009 10:49 AM
