I've finally gotten back to the RSS autodiscovery work that I mentioned a few weeks ago.
Since then, I've scrapped all my code and started over. I'm not relying on third-party code to parse RSS, HTML, or XML anymore. I just began coding up support for the most common cases and things have taken off. The code can reliably find the RSS feed for nearly every blog on my blogroll.
Very cool. It's not quite the hell I thought it'd be. And it took far less code than I expected. I'm not done by any means, but it's a good start.
There are a few notable exceptions, of course: blogs that don't support autodiscovery and don't point to any obvious-looking files. And Slashdot. I have no idea how this happened, but they missed the "http:" portion of the URL! Seriously. Their HTML says:
<LINK REL="alternate" TITLE="Slashdot RSS" HREF="//slashdot.org/index.rss" TYPE="application/rss+xml">
Anyway... Other than a few anomalies it's not bad. Tomorrow I'll try much harder to find odd cases for it to cope with. I'd like to see my test suite go from 15 sites to about 50 or 80 representative URLs.
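For the curious, here's roughly what the common case looks like. This is a minimal Python sketch, not the code I've been describing: the names find_feed_links, LINK_TAG, and ATTR are made up for illustration, and it assumes the page advertises its feed with a LINK tag whose REL is "alternate" and whose TYPE is "application/rss+xml". It uses a couple of regular expressions rather than a real HTML parser, so it will miss plenty of odd cases.

```python
import re

# Hypothetical names for illustration; not the code described in the post.
# Match a whole <link ...> tag, whatever the case of the tag name.
LINK_TAG = re.compile(r"<link\b[^>]*>", re.IGNORECASE)
# Match name="value", name='value', or name=value inside a tag.
ATTR = re.compile(r"""([\w:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))""")
FEED_TYPE = "application/rss+xml"


def find_feed_links(html):
    """Return the HREF of every <link rel="alternate"> tag with an RSS TYPE."""
    feeds = []
    for tag in LINK_TAG.findall(html):
        attrs = {}
        for name, dq, sq, bare in ATTR.findall(tag):
            # Exactly one of the three value groups is non-empty (or all are
            # empty for an empty quoted value).
            attrs[name.lower()] = dq or sq or bare
        rel = attrs.get("rel", "").lower().split()
        is_feed = attrs.get("type", "").lower() == FEED_TYPE
        if "alternate" in rel and is_feed and "href" in attrs:
            feeds.append(attrs["href"])
    return feeds


slashdot = ('<LINK REL="alternate" TITLE="Slashdot RSS" '
            'HREF="//slashdot.org/index.rss" TYPE="application/rss+xml">')
print(find_feed_links(slashdot))
```

Note that it hands back the HREF exactly as written in the page, so a value like Slashdot's still has to be turned into a full URL before it can be fetched.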
It's fun to code once in a while. :-)
Posted by jzawodn at September 09, 2003 10:19 PM
//slashdot.org/ works in IE - guess it assumes http: protocol by default... I wonder how many browsers it'd break though. ;)
Actually, this is part of relative URIs as defined in RFC 2396.
"A relative reference beginning with two slash characters is termed a network-path reference, as defined by <net_path> in Section 3. Such references are rarely used."
This is so "[...] it is possible for a single set of hypertext documents to be simultaneously accessible and traversable via each of the "file", "http", and "ftp" schemes if the documents refer to each other using relative URI [...]"
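To make that concrete, here's a small Python sketch of resolving such a reference against the page it came from. The helper name resolve_feed_url is my own invention for illustration. urljoin is the standard library's resolver; as far as I know it follows the later RFC 3986 rules, which handle a leading "//" the same way: the reference keeps its own host and path and borrows only the scheme from the base.

```python
from urllib.parse import urljoin


def resolve_feed_url(page_url, href):
    """Resolve an HREF found on a page against that page's own URL.

    resolve_feed_url is a hypothetical helper name; urljoin does the work.
    """
    return urljoin(page_url, href)


# The network-path reference picks up whatever scheme the page was served over.
print(resolve_feed_url("http://slashdot.org/", "//slashdot.org/index.rss"))
print(resolve_feed_url("ftp://slashdot.org/", "//slashdot.org/index.rss"))
```

So a feed finder that resolves every HREF against the page URL before fetching should cope with Slashdot's tag without any special casing.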
Boy, did you hit the nail on the head!
I don't know *how* many times I've started off with another's library and ended up scrapping it because it was poorly written, or bloated beyond belief (notable exception is curl--curl rocks).
Usually, with good programming practices and a bit of experience, a task turns out to be much simpler and more elegant than originally feared.
Aah yes, I too have cursed Slashdot's RSS autodiscovery.
Stupid question regarding Slashdot -- are there actually any RSS feeds other than /index.rss? E.g. feeds for people's /. journals would be nice, but I failed to find any.
Sorry if this is a FAQ or OT.
Hey Jeremy, I assume MT has autodiscovery on by default, right? I can't see it in the settings and would hate to be one of the blogs pissing you off ;)
Yup. It's built into the default templates in the recent versions. Yours is all set. :-)