After spending most of the day on the problem of "how to find the RSS feed (if there is one) given a URL" I've come to the conclusion that it's a pain in the ass.
I have a much better appreciation for the pain that aggregator developers have been through. And that includes Feedster, Technorati, and others who have the same (or similar) problems.
Ugh.
Posted by jzawodn at August 20, 2003 11:59 PM
Actually, "sucks" is the understatement here. Try it sometime on a Russian or Hebrew blog and you're in for a real treat. God bless the little orange icon.
Every blog platform should support the RSS autodiscovery standard.
Yes, they should support the RSS autodiscovery standard, but the majority don't, and those that do don't always stick to it: I have run into three different alternatives. You can also probe for file names, but there is no standard there either, and we have identified 100+ file names for RSS feeds. Finally, some sites (Blogspot, for example) use a specially constructed URL; those are good because they are easy to put together.
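For anyone following along, here is a rough Python sketch of the two approaches described above: reading autodiscovery `<link>` tags out of a page, and building a list of file names to probe. The names `FeedLinkParser`, `autodiscover`, `candidate_urls`, and `COMMON_NAMES` are made up for illustration, and the short file-name list is an assumption (only `backend.php` comes from this thread); it is nowhere near the 100+ names mentioned above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types commonly advertised by autodiscovery <link> tags.
FEED_TYPES = {"application/rss+xml", "application/rdf+xml", "application/atom+xml"}

# Hypothetical, deliberately short list of file names to probe.
COMMON_NAMES = ["index.xml", "index.rdf", "rss.xml", "backend.php"]


class FeedLinkParser(HTMLParser):
    """Collect hrefs of <link rel="alternate"> tags that advertise a feed."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute values can be None (e.g. <link href>), so normalise them.
        a = {name: (value or "") for name, value in attrs}
        rel = a.get("rel", "").lower().split()
        mime = a.get("type", "").strip().lower()
        if "alternate" in rel and mime in FEED_TYPES and a.get("href"):
            # Resolve relative hrefs against the page URL.
            self.feeds.append(urljoin(self.base_url, a["href"]))


def autodiscover(html, base_url):
    """Return feed URLs advertised in the page via autodiscovery."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds


def candidate_urls(base_url):
    """Return URLs to probe when the page advertises nothing."""
    return [urljoin(base_url, name) for name in COMMON_NAMES]
```

A reasonable order is to try `autodiscover` first and only fall back to probing `candidate_urls` when it returns nothing, since probing costs one request per guess.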
The fun part is that some sites return an HTML error page with a 200 status when you request a bad URL (as happens when you probe for file names), so you have to check that the returned page is actually RSS/RDF, which requires a regex from hell.
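One way to do that check, sketched in Python under some assumptions: try to parse the body as XML and look at the root element, and only fall back to a (much smaller) regex when the XML is malformed. The function name `looks_like_feed` and the 2 KB sniffing window are arbitrary choices for this example, not anything the commenter described.

```python
import re
import xml.etree.ElementTree as ET

# Root element names (without namespace) that indicate RSS, RDF, or Atom.
FEED_ROOTS = {"rss", "RDF", "feed"}

# Fallback for malformed XML: look for a feed root tag near the top.
_SNIFF = re.compile(rb"<\s*(rss|rdf:RDF|feed)[\s>]", re.IGNORECASE)


def looks_like_feed(body: bytes) -> bool:
    """Guess whether a response body is a feed rather than an HTML error page."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Many real feeds are not well-formed, so sniff the first 2 KB instead.
        return _SNIFF.search(body[:2048]) is not None
    # ElementTree writes namespaced tags as "{uri}name"; keep only the name.
    local_name = root.tag.rsplit("}", 1)[-1]
    return local_name in FEED_ROOTS
```

This will still be fooled by an HTML page that happens to contain something like `<rss ` in its first 2 KB, so treat a `True` result as "worth handing to a real feed parser", not as proof.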
Expect to put about a week or two of work into this if you want to do this reliably.
...and then sometimes, you just KNOW that the feed is at backend.php :)
I've had very good luck with this bit of PHP written by Keith Devens. I use it in Feed on Feeds, and it works great. Enough blogs support auto-discovery now that this works almost all the time, in my experience. Maybe I'm just subscribing to extra-geeky blogs, though.
If Webmasters make their feeds impossible to find, that's their problem. I don't see why it should be the aggregator's responsibility.
I really can't agree with Wes Felter. A lot of weblog users have no idea whether they produce an RSS feed or not, but they do want to see their weblog indexed. It is an education issue, and as more and more people enter the weblog space, you will find that you need to do more and more educating, because these people are not early adopters (read 'geeky') but users of the technology who don't necessarily understand all the ins and outs of weblogging.
There's also Mark's ultra-liberal RSS Finder, which seems like it tries a variety of methods:
http://diveintomark.org/projects/rss_finder/index.html
A description of it is here:
http://diveintomark.org/archives/2002/08/15/ultraliberal_rss_locator
jeremy, you know what i do? i select part of the blog posting text, copy it to clipboard, paste it into feedster search box, and look for the rss feed url.
unfortunately, feedster isn't as good as google, so this never works from first search. i have to delete some words, and try at least 5 or 6 times, but eventually, i find it ;)
will this work for you?
http://www.blogstreet.com/rssdiscovery.html
Hi,
I too have tried BlogStreet's RSS Discovery and it works pretty well. Why don't you ask them what approach they have taken?
Regards
Harsh
Syndic8 has a search function which will do exactly what Jeremy wants.
More info in my blog entry.
It can be accessed from the site and via the XML-RPC interface.
No, it is not hard to discover the RSS feed. Just look at the source code of the page and try to find:
-rss
-rdf
-xml or
-backend
It is my personal formula ;D cheers :)
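That formula can be sketched in a few lines of Python, assuming "find" means "look for links whose address contains one of those words". The names `HintedLinkParser` and `hinted_links` are invented for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# The four substrings from the list above.
HINTS = ("rss", "rdf", "xml", "backend")


class HintedLinkParser(HTMLParser):
    """Collect <a> and <link> targets whose href contains one of HINTS."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        href = dict(attrs).get("href") or ""
        if any(hint in href.lower() for hint in HINTS):
            url = urljoin(self.base_url, href)
            # Keep first-seen order and skip duplicates.
            if url not in self.candidates:
                self.candidates.append(url)


def hinted_links(html, base_url):
    """Return absolute URLs on the page that look like they might be feeds."""
    parser = HintedLinkParser(base_url)
    parser.feed(html)
    return parser.candidates
```

Plain substring matching is generous (an `xmlrpc.php` link matches too), so each candidate still needs to be fetched and checked before you trust it.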