I've finally gotten back to the RSS autodiscovery work that I mentioned a few weeks ago.
Since then, I've scrapped all my code and started over. I'm not relying on third-party code to parse RSS, HTML, or XML anymore. I just began coding up support for the most common cases and things have taken off. The code can reliably find the RSS feed for nearly every blog on my blogroll.
Very cool. It's not quite the hell I thought it'd be. And it took far less code than I expected. I'm not done by any means, but it's a good start.
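To give a flavor of the common case, here's a minimal sketch in Python. It isn't the actual code: the function name and the regular expressions are made up for illustration, and it assumes attribute values are quoted and that the feed is advertised with TYPE="application/rss+xml".

```python
import re

# Matches a whole <link ...> tag, regardless of case.
LINK_TAG = re.compile(r"<link\b[^>]*>", re.IGNORECASE)
# Matches name="value" or name='value' pairs inside a tag.
ATTR = re.compile(r"""([a-zA-Z-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def find_feed_links(html):
    """Return the HREF of every RSS autodiscovery <link> tag, in page order."""
    hrefs = []
    for tag in LINK_TAG.findall(html):
        attrs = {}
        for name, double_quoted, single_quoted in ATTR.findall(tag):
            attrs[name.lower()] = double_quoted or single_quoted
        rel_values = attrs.get("rel", "").lower().split()
        feed_type = attrs.get("type", "").lower()
        if ("alternate" in rel_values
                and feed_type == "application/rss+xml"
                and attrs.get("href")):
            hrefs.append(attrs["href"])
    return hrefs


page = """<html><head>
<link rel="stylesheet" type="text/css" href="/style.css">
<link rel="alternate" type="application/rss+xml" title="RSS" href="/index.rdf">
</head></html>"""
print(find_feed_links(page))  # ['/index.rdf']
```

The stylesheet link is skipped because its REL and TYPE don't match. Unquoted attribute values and tags split in odd ways would need more work than this.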
There are a few notable exceptions, of course: blogs that don't support autodiscovery and don't point to any obvious-looking files. And Slashdot. I have no idea how this happened, but they left off the "http:" portion of the URL! Seriously. Their HTML says:
<LINK REL="alternate" TITLE="Slashdot RSS" HREF="//slashdot.org/index.rss" TYPE="application/rss+xml">
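One way to cope with an HREF like that is to treat it as relative to the page it came from and resolve it against that page's URL. A reference that starts with "//" then simply picks up the page's scheme. Here's a small sketch using urljoin from Python's standard library; the helper name resolve_feed_url and the example.com URLs are hypothetical, and this is only an illustration, not the code described above.

```python
from urllib.parse import urljoin


def resolve_feed_url(page_url, href):
    """Turn an HREF found on page_url into an absolute feed URL."""
    # urljoin fills in whatever the HREF omits (scheme, host, or path)
    # from the URL of the page it was found on.
    return urljoin(page_url, href.strip())


# Slashdot's scheme-less HREF borrows "http:" from the page URL.
print(resolve_feed_url("http://slashdot.org/", "//slashdot.org/index.rss"))
# http://slashdot.org/index.rss

# Ordinary relative HREFs are handled the same way.
print(resolve_feed_url("http://example.com/blog/", "index.rdf"))
# http://example.com/blog/index.rdf
```

Running every discovered HREF through the same step means the Slashdot case stops being special.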
Anyway... Other than a few anomalies, it's not bad. Tomorrow I'll try much harder to find odd cases for it to cope with. I'd like to see my test suite go from 15 sites to about 50 or 80 representative URLs.
It's fun to code once in a while. :-)