September 12, 2003

RSS Auto-Discovery 2.0

As I've noted recently, I've been playing (and fighting) with RSS Discovery issues.

It has occurred to me that there's some infrastructure we (whoever "we" really is) still need to build if RSS is going to really, really take off the way it should. The first piece of it builds on existing work, while the second is relatively uncharted territory as far as I can tell, so I'll only touch on it briefly at the end of this post.


The current breed of RSS Auto-Discovery is quite handy and simple. By embedding this bit of XML:

<link rel="alternate" type="application/rss+xml" title="RSS" href="">

in my blog home page, I'm advertising to aggregators and other software tools that there's a feed available for their use. They don't have to do any guesswork, check all the links on my page, or look for the little orange XML button like a human might. For automated tools, that's great. The user goes to my blog home page and clicks a "Subscribe" button in their aggregator (or maybe uses a bookmarklet), and the computer does all the hard work.
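
For the curious, here's roughly what the aggregator side of that looks like. This is a minimal sketch using Python's standard html.parser module, not any particular aggregator's code; the function name find_feeds and the example.com URLs are made up for illustration. It collects the href of every <link> whose rel includes "alternate" and whose type is application/rss+xml.

```python
from html.parser import HTMLParser


class FeedLinkFinder(HTMLParser):
    """Collect hrefs from <link rel="alternate" type="application/rss+xml"> tags."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names (so <LINK REL=...> works),
        # but not attribute values, so those are lowercased here.
        if tag != "link":
            return
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        mime = (attrs.get("type") or "").lower()
        href = attrs.get("href")
        if "alternate" in rel and mime == "application/rss+xml" and href:
            self.feeds.append(href)
    # XHTML-style <link ... /> tags reach handle_starttag too, because the
    # default handle_startendtag calls it.


def find_feeds(html):
    """Return the advertised feed hrefs, in page order (hypothetical helper)."""
    finder = FeedLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.feeds
```

The hrefs come back exactly as written in the page, so relative ones still need to be resolved against the page URL before fetching.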

This works really well for blogs. Many blogging tools seem to be providing it by default now. But RSS is ultimately about more than blogs, right? At Yahoo, we've begun to provide RSS feeds for Yahoo! News (info), Ask Yahoo! (no info page, feed here), and Yahoo! Buzz (info page). And there's more RSS to come.

If you look under the covers, you'll find the necessary <link> tag in various places, like on the Technology News page:

<link rel="alternate" type="application/rss+xml" title="RSS" href="">

But let's think about this from the point of view of someone writing aggregation software for users who understand nothing about RSS and little, if anything, about the structure of a site like Y! News. Notice that the tag is not on every story page, like this one. (Maybe it should be?)


Keeping track of all that is going to become difficult as RSS goes more and more mainstream. We can't expect users like my Dad to know they're supposed to "subscribe" on the category page but not from the article pages. We can't expect them to know that there's even a difference, really. Nor can we expect software to figure this out--yet. And we can't expect news aggregators to add a whole category just for Yahoo content to their default feed list. That'd be crazy for a number of presumably obvious reasons. We can't expect users to run a web search for "Yahoo RSS" and end up on my blog. Remember, users won't know what RSS is. So the tools have to be able to figure out that Yahoo offers several categories of RSS feeds (News, Buzz, etc.) and many feeds within each of them.

We need more than what today's auto-discovery provides. And before you start thinking that this is just a Yahoo problem, consider On their pages (like this one) they provide the orange XML button but no auto-discovery support. I suspect that if a better solution were available, someone could convince them to use it too. After all, the more widely their headlines are distributed, the more traffic (and presumably money) they get. Like Yahoo, they have a complex offering of content from a discovery point of view.


What we need is "RSS Auto-Discovery 2.0." At least that's what I'm calling it.

What is it? It's a bit more infrastructure that could go a long way toward scaling RSS for big sites and making the aggregators' job easier. It should solve the problems I tried to introduce above.

I don't claim to have this all figured out--at least not exactly. But some of us at Yahoo think this is an important problem. And we'd like to see it solved. Soon. But rather than dictate how we (or *I*) think it ought to work, let's come up with something that will work. You know, an on-line group effort but without all the politics. Can we do that? I hope so.

Having said that, I do have some ideas that I'd like to start with--in the hopes that a productive discussion will follow.

I think this all points toward a per-site, machine-readable directory of the available RSS feeds. And the obvious format for that directory is OPML. It seems that most aggregators have settled on it as the de facto standard for subscription interchange. And some even use it as their native "database" format.

So let's use it. I've talked with Dave Winer and he agrees that it's a sensible use of OPML.
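
To make that concrete, here's a minimal sketch of how a tool might read such a directory with Python's standard xml.etree module. It assumes the directory uses the same <outline type="rss" xmlUrl="..."> convention aggregators already use for subscription lists, with plain grouping outlines (News, Buzz, etc.) for categories. The function name and the sample file are hypothetical.

```python
import xml.etree.ElementTree as ET


def feeds_from_opml(opml_text):
    """Return (title, xmlUrl) pairs for every <outline> that names a feed."""
    root = ET.fromstring(opml_text)
    feeds = []
    # iter() walks nested outlines in document order, so category
    # groupings are flattened into a single list of feeds.
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if url:
            title = outline.get("title") or outline.get("text") or url
            feeds.append((title, url))
    return feeds


# Hypothetical per-site directory: one category with two feeds, plus a top-level feed.
SAMPLE = """<?xml version="1.0"?>
<opml version="1.1">
  <head><title>Example site feeds</title></head>
  <body>
    <outline text="News">
      <outline text="Technology" type="rss" xmlUrl="http://example.com/tech.xml"/>
      <outline text="Science" type="rss" xmlUrl="http://example.com/science.xml"/>
    </outline>
    <outline text="Buzz" type="rss" xmlUrl="http://example.com/buzz.xml"/>
  </body>
</opml>"""
```

Flattening is the simplest choice; a real aggregator would probably keep the category structure so it could show the user "News > Technology" rather than a bare list.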

Now, how might automated tools learn of the existence of this directory? I see two obvious ways that could happen.

  1. Like /robots.txt, the file could reside at a well-known location with a well-known name, say /feeds.opml or /rss.opml. This has the advantage of being very simple.
  2. We add a new, optional <link> tag that tells tools where they can find a relevant OPML file containing information about the various feeds offered. This has the advantage of resembling what we do today, and it provides a bit of flexibility.
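
The two options aren't mutually exclusive. A tool could check for an explicit <link> first and fall back to the well-known names. Here's a minimal sketch of that ordering, assuming the hrefs from any such OPML <link> tags have already been extracted (no MIME type or rel value for that tag has been agreed on, so that part is left to the caller). The function name is hypothetical, and /feeds.opml and /rss.opml are only the candidate names floated in option 1, not a standard.

```python
from urllib.parse import urljoin

# Candidate well-known names from option 1; hypothetical, not a standard.
WELL_KNOWN_NAMES = ("/feeds.opml", "/rss.opml")


def directory_candidates(page_url, opml_link_hrefs=()):
    """Order the places a tool might look for a site's OPML feed directory.

    opml_link_hrefs: hrefs already pulled from the page's (hypothetical) OPML
    <link> tags, i.e. option 2. They come first because the site chose them
    explicitly; the well-known locations from option 1 are the fallback.
    """
    candidates = [urljoin(page_url, href) for href in opml_link_hrefs]
    for name in WELL_KNOWN_NAMES:
        # A leading "/" resolves against the site root, just like /robots.txt.
        candidates.append(urljoin(page_url, name))
    # Drop duplicates while keeping the preferred order.
    return list(dict.fromkeys(candidates))
```

Because the well-known names resolve against the site root, a tool can find the directory even from a deep story page that carries no <link> tag at all, which is exactly the Y! News case described above.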

There was a more complex variation that I've omitted. Why complicate something more than it needs to be? That's a compelling argument in my book.

Starting from here, where do we go next? Discuss here? On the syndication list? Or the aggregators list? Or on a Wiki?

Update: Mark Fletcher (of Bloglines) suggests that we need a machine-readable way to specify a blogroll (in OPML, I'd assume). I completely agree. Any other metadata we're missing today?

Update #2: Diego Doval has mockups in RSS and OPML.

What else?

Oh, yeah. I said there was a second thing, didn't I?

After this issue fizzles out, maybe we can move on to solving the next problem: Machine Readable Licenses for RSS feeds. This was partly prompted by Derek's Syndication vs. Aggregation post a few weeks ago. We had a few IM discussions about this, but it really is something that we need to resolve in a standard and machine-readable way. There needs to be a tag or link that means "you can syndicate this, but you absolutely cannot charge a fee" or whatever.

It's a can of worms, I'm sure.

Posted by jzawodn at 01:19 PM

I've Reverted

To my natural state, apparently.

For whatever reason, I haven't managed to get to bed before 2am for a single day this week. And one day it was 3:30am when I hit the sack.

Needless to say, I'm a night person.

On the plus side, I've still been getting up at a decent hour. But instead of heading to work, I've been working from home in the AM and then heading to work around noonish. It seems to be working out very well. It's a good compromise between the productivity of being at home where it's quiet and the necessary face-to-face interaction of being in the office.

Posted by jzawodn at 12:36 AM

September 11, 2003

Google Copycat

Interesting. It seems that Google is testing Related Searches on their site. That's so nice of them, copying something we launched on Yahoo Search about 6 months ago.

Of course, I'm biased. I wrote the first production version of the system that built the lists of related terms, fine-tuned it, and whatnot. It was a pain in the ass for various reasons, but hey, it was innovative, right? :-)

Wow, and all this time later, the Search Easter Egg still lives.

Oh, another interesting tidbit. In roughly 6 months, the core code has gone from Perl to Java and is now C++. Heh.

Posted by jzawodn at 08:45 PM

September 09, 2003

More RSS Hacking

I've finally gotten back to the RSS autodiscovery work that I mentioned a few weeks ago.

Since then, I've scrapped all my code and started over. I'm not relying on third party code to parse RSS, HTML, or XML anymore. I just began coding up support for the most common cases and things have taken off. The code can reliably find the RSS feed for nearly every blog on my blogroll.

Very cool. It's not quite the hell I thought it'd be. And it took far less code than expected. I'm not done by any means, but it's a good start.

There are a few notable exceptions, of course. Blogs that don't support autodiscovery and don't point to any obvious-looking files. And Slashdot. I have no idea how this happened, but they missed the "http:" portion of the URL! Seriously. Their HTML says:

<LINK REL="alternate" TITLE="Slashdot RSS" HREF="//" TYPE="application/rss+xml">
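
Whether or not Slashdot meant it, an href that starts with "//" is a scheme-relative (network-path) reference under the URI spec, and browsers resolve it by borrowing the page's scheme. So one way to cope is to resolve every discovered href against the page URL instead of special-casing it. This is a minimal sketch using Python's urllib.parse.urljoin; the function name and the example.org URLs are hypothetical, and it isn't the code I'm actually using.

```python
from urllib.parse import urljoin, urlparse


def resolve_feed_url(page_url, href):
    """Turn an auto-discovery href into an absolute URL, or None if unusable."""
    href = href.strip()
    if not href:
        return None
    # urljoin handles absolute URLs, root-relative paths ("/index.rss"),
    # page-relative paths ("index.rss"), and scheme-relative "//host/path"
    # references, which inherit the scheme of page_url.
    absolute = urljoin(page_url, href)
    # Anything that doesn't end up as http(s) (mailto:, javascript:, ...) is rejected.
    if urlparse(absolute).scheme not in ("http", "https"):
        return None
    return absolute
```

For example, resolve_feed_url("http://example.org/blog/", "//example.org/index.rss") gives "http://example.org/index.rss", while a plain "index.rss" resolves relative to the page, to "http://example.org/blog/index.rss".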

Anyway... Other than a few anomalies it's not bad. Tomorrow I'll try much harder to find odd cases for it to cope with. I'd like to see my test suite go from 15 sites to about 50 or 80 representative URLs.

It's fun to code once in a while. :-)

Posted by jzawodn at 10:19 PM

September 08, 2003

Next time I'll call Dell

I'm so fucking sick of the PC Hardware industry.

Since I started mucking with PCs a long time ago, I've been a fan of ordering parts and building my own systems. And when it came time to upgrade, I'd do it myself.

Screw that.

After having built roughly 15 computers in the past 15 years or so (some for me, some for friends/family), I give up. I don't have time for the inevitable bullshit that comes with realizing that something just isn't working right--both in the hardware itself and the associated software and drivers. It's really, really, really not worth it.

What prompted this, you wonder?

I've wasted an entire damned day doing what should have been a trivial upgrade. I recently sold my venerable ThinkPad 600E to a friend. And I found another friend to buy the guts of my P3-866 desktop machine at work (the one that I brought in, not the one Yahoo supplies--long story). Anyway, with the combined funds I planned to upgrade the guts of that desktop a bit.

The Story

A week or so ago, the new parts arrived: a Pentium 4 2.4GHz processor, 1GB of 400MHz RAM (two 512MB DIMMs), and an Intel D865PERL motherboard.

Yesterday I went to work in the morning to help with a database server switch. After that was done, I headed over to my desk to perform the swap. I had brought in one of my two LCD monitors and planned to just leave it at work where it'll get more use. That was roughly 10:30am. Six hours later, I felt a lot like Mark Pilgrim trying to install Windows XP.

I removed the old motherboard, leaving the CPU, fan, and RAM installed. I figure I'll just ship it to Andy that way. I installed the new motherboard, RAM, and CPU. But when I powered it up, it didn't do much. The CPU fan came on and a few things lit up on the motherboard, but the hard disk didn't spin at all.

So far it was pretty much in line with my expectations. I've never (and I mean never) had a motherboard work on the first try. So I carefully reseated everything, looked for shorts, etc. No dice.

Figuring there might be some useful on-screen info, I decided to plug in the video card. But I couldn't. My old AGP card (a 3dfx Voodoo 3 3000) didn't fit. Okay, something was weird. The AGP connector looked as if it had been mounted backwards on the board. The little piece of plastic in the socket that's there to make sure you only plug the card in the right way appeared to be in the wrong place.

I double-checked the little picture of the motherboard. Yup, it says "AGP" there, so this is where the video card goes. Considering that all the other slots are PCI slots, there wasn't a lot of choice in the matter.

At noon I decided to head home, taking all the pieces and parts with me. I have a few spare video cards in my collection and I figured something would fit.


All my other cards had the same problem. So I headed over to Intel's web site and looked more closely at the product specs. The pictures that Intel provides told me that the socket on my board was certainly not on backwards. Then I noticed that it had a "Universal 0.8/1.5 V AGP 3.0 connector (with integrated retention mechanism) supporting 4x and 8x AGP cards."

Hmm. I don't think AGP 4x even existed when I got my Voodoo several years back. And something told me that "universal" doesn't mean what I thought it should...


A visit to Fry's

I headed off to Fry's in search of a cheap video card that was fancy enough to work in the motherboard and which came from a vendor that had decent XFree86 support.

After a bit of browsing, I settled on an ATI Radeon 9200. It wasn't the absolute cheapest but at ~$120 it was much cheaper than most of the apparently high-end cards they had. And it had a DVI port for my LCD.

I returned home at 2pm (Lawrence Expressway was all carved up for repaving). Oh, I should note that a trip to Fry's and home is two-thirds of a trip to Yahoo and home. Keep that in mind later.

Before I opened the box, I visited the XFree86 web site and checked the Driver Status to make sure that the ATI Radeon was on the list. If it wasn't, I'd take the card back without breaking the shrink wrap seal and find something that was on the list.

It was on the list! So I opened and installed the card. I connected the VGA cable (since I wasn't sure where the DVI cable for my LCD monitor was) and turned it on.

Same problem.

I spent the next 45-60 minutes trying everything I could think of doing. I moved memory around, reseated the CPU, re-checked connections, etc. Eventually I had the motherboard completely removed from the case and sitting on anti-static bags. I figured that would eliminate the chance of any electrical shorts between the case and the board.

Same problem.

I figured the board was fried. But I decided to browse the motherboard installation docs one more time to see if I could find anything I missed. I did.

Apparently, the Pentium 4 CPU is such a fucking pig that P4 boards require a second power connector (12V) on the board. Guess what? The power supply in my 3 year old case doesn't have one of those.


Not only was the case a pain in the ass to work inside (the P4 board was just long enough to get in the way of cabling the drives), the power supply was useless for the new board.

Another visit to Fry's

I headed back out in search of a replacement power supply or a whole new case. After looking at the prices and selection, I opted for a new case--one with more room. I got a case for ~$79 and headed home. I arrived home at roughly 5pm to finish the job.

I removed all the crap from the old case and installed it in the much nicer new case. After everything was plugged in, it worked on the first try.


Next I proceeded to download the latest Knoppix release (that's what I use on non-servers now), burned a CD and began the process of migrating data off the old hard disks (an 8.4GB and a 20GB disk). It seems that 2003-09-05 had just come out, so I was using very fresh code. Anyway, I figured I might as well put my two spare 80GB disks to use, so I spent the next 1.5 hours moving data around and then installed the drives along with the DVD drive and CD burner.

Video Hassles

Then I booted Knoppix into the GUI mode to poke around and then run the hard disk installer. It came up in 1024x768 mode but I didn't worry. I can tweak the video after the fact. I've managed to make my home "desktop" machine speak 1600x1200 to the LCD using its built-in, less powerful ATI card before.

The install finished and I got everything set the way I wanted, so I set about making the video work right. For whatever reason, Knoppix hadn't figured out that I had an ATI card and was using the vesa driver.

I performed many Google searches and quickly noticed that nasty feeling forming in the pit of my stomach. Getting the Radeon 9200 working with XFree86 is not a trivial proposition. At first, this information looked promising, except that I'm not running RedHat. But the magic seemed to be telling XFree86 the ChipId and using the "ati" or "radeon" driver in the config. (Note: I've always hated X configuration.)

No go. My monitor claims not to be getting a signal when I try that stuff.

More searching.

Found some Debian-specific notes. But they require way more effort than I'd like to invest. At 11pm, incredibly pissed off at the PC hardware industry for requiring me to upgrade my power supply and video card, pissed off about having wasted an entire day on this ordeal, I decided to just take the machine to work and get it back on the network. I could always do a bit more searching and muck with the X stuff in the morning.

So I drove to Yahoo and home. Again.

The Next Day

When I got to work today, I experimented with X configs a bit more. All told, I figure I've tried 30-40 different configurations and I'm still using 1024x768 and the vesa driver. I can't get the "ati" or "radeon" drivers to do shit. And I don't even care anymore. I'll use a slow-ass VGA driver if it can drive my monitor at 1600x1200. Hell, I'd settle for 8 bit color at this point. I'm not gaming. Just using xterms and a browser.

So, here I sit with a blazingly fast CPU, lots of disk space, and a beautiful monitor wondering what the hell I should do with it all. Just give up and install Windows? Buy another video card and try to sell this one locally? Throw it all in the dumpster and become a park ranger in Montana?

Seriously, why is this shit so damned difficult?

A Resolution

I resolve to never do this again. From now on, I will "upgrade" by selling my old computer and using the cash to offset the purchase of a brand new, pre-assembled and tested computer. Just like I do with laptops. The only "upgrades" I will ever do myself will involve adding memory or disk space. That's it. Just like I do with laptops.

At this point I really wonder how much time and money I'd have saved by just calling Dell.


There are so many other things I had planned to get done yesterday.

Posted by jzawodn at 01:33 PM