The Sun's broken RSS still affecting Chipwrapper one month on
Well, it has been a month now since Dave Cross pointed out on his blog that The Sun's re-design utterly broke their RSS feeds, and we are still waiting for them to be fixed. The re-design also broke several aspects of Chipwrapper. Until today I'd resisted the temptation to poke around and try to fix things, on the grounds that surely The Sun themselves would put things right. That doesn't seem to be the case.
The first problem was that The Sun changed the URLs of some of their feeds. That meant that the Yahoo! Pipes that power the sport and football headline feeds on Chipwrapper were no longer getting any content from The Sun.
I worked on the assumption that once somebody realised, the trivial technical changes would be made to make sure that a request for the old address http://www.thesun.co.uk/rssFeed/rssIndexDisplay/0,,2006070000,00.xml got served the new content from http://www.thesun.co.uk/sol/homepage/feeds/rss/article247739.ece. It isn't exactly rocket surgery. And, as Dave pointed out, it loses The Sun a slice of their audience. The Sun's feeds may still feature in my Google Reader Top 100, but the people subscribed to them are being served no content.
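As a rough illustration of how small that change could be, here is a minimal sketch in Python. It is only an assumption about one way to do it: the `MOVED_FEEDS` table and the `redirect_for` helper are hypothetical names, only the one old/new pair quoted above is known, and nothing here reflects how The Sun's servers are actually set up.

```python
from urllib.parse import urlsplit

# Old feed path -> new feed path. Only the single pair quoted in the post
# is known; a real table would need an entry for every feed that moved.
MOVED_FEEDS = {
    "/rssFeed/rssIndexDisplay/0,,2006070000,00.xml":
        "/sol/homepage/feeds/rss/article247739.ece",
}


def redirect_for(url):
    """Return (status, location) for a requested URL.

    A feed that has moved gets a 301 pointing at its new address.
    Anything else returns (None, None) so normal handling can carry on.
    """
    parts = urlsplit(url)
    new_path = MOVED_FEEDS.get(parts.path)
    if new_path is None:
        return None, None
    return 301, f"{parts.scheme}://{parts.netloc}{new_path}"
```

A permanent redirect of this kind would let existing subscribers keep their old feed address while still receiving the new content.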
That isn't the only problem. The new feeds are malformed, and the URL given for each story is a relative one rather than an absolute one. Even with all the headlines appearing again in the Chipwrapper feeds, clicking on one of them generates a 404 error.
I was hoping to put a little hack into my Perl scripts to re-write The Sun's URLs by prepending the necessary "http://www.thesun.co.uk" to them - but alas it isn't as simple as that. In order, I guess, to count the number of click-throughs, The Sun's RSS feed is syndicated by MediaFed, and they perform a re-direct on the URL. In the feeds themselves, The Sun's story URLs are given as a hashed re-direct reference - and it is impossible to deduce from the key the ultimate destination URL on The Sun's site.
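For a feed that carried ordinary relative paths, the hack really would be tiny. The sketch below is in Python rather than the Perl that Chipwrapper uses, and the `absolutise` helper and the example paths are made up for illustration. It only repairs genuinely relative links; it can do nothing with an opaque re-direct key.

```python
from urllib.parse import urljoin

SUN_BASE = "http://www.thesun.co.uk"


def absolutise(link, base=SUN_BASE):
    """Resolve a possibly-relative story link against the site root.

    Links that are already absolute are returned unchanged.
    """
    return urljoin(base + "/", link)


# Hypothetical story path, not taken from a real feed.
print(absolutise("/sol/homepage/news/article0000.ece"))
```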
I did think about pinging The Sun's homepage at the same time as I compiled the Chipwrapper feed, and extracting a URL to match the story headline in the feed - but honestly, why should I go to the effort to fix their content syndication mistakes?
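For the curious, the headline-matching idea could look something like the following. This is a sketch under stated assumptions, not something Chipwrapper does: it assumes the homepage HTML has already been fetched, that each story appears there as a plain link whose text is the headline, and that a case- and whitespace-insensitive exact match is good enough. `LinkCollector`, `normalise` and `url_for_headline` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (link text, href) pairs for every <a> element with an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside a link we are tracking.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text), self._href))
            self._href = None


def normalise(text):
    """Lower-case and collapse whitespace so trivial differences don't matter."""
    return " ".join(text.lower().split())


def url_for_headline(headline, homepage_html, base="http://www.thesun.co.uk/"):
    """Return the absolute URL of the first link whose text matches, else None."""
    parser = LinkCollector()
    parser.feed(homepage_html)
    wanted = normalise(headline)
    for text, href in parser.links:
        if normalise(text) == wanted:
            return urljoin(base, href)
    return None
```

Exact matching is the fragile part: any headline that is worded differently on the homepage than in the feed would simply come back with no URL.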
For now, I've left The Sun's headlines in the Chipwrapper feeds, knowing that they will 404. Hopefully, someone at The Sun's web operation will eventually wake up to the fact that they are suddenly not getting any traffic through from their feeds.