Converting BBC Sport RSS feeds to WML

 by Martin Belam, 23 June 2004

About five years ago, if I got up late and didn't get to see the football news in the morning, I would have to wait until the Evening Standard hit the streets, or try to find a chance to listen to Radio Five Live during working hours in a busy record shop. Once I moved to working in an office with an always-on connection, it meant that if I got up late, missed the telly and wanted to find out the latest football news, I would have to wait until I got to work and then use the internet. Then I got a phone with the mini-interweb on it, and it meant I could check the latest football news on the bus on the way to work, regardless of whether I was early, late or indifferent.

And in a classic example of Loosemore's Law - that has still proved not to be fast enough...

Don't get me wrong - the BBC's mobile service publishes excellent content - it's just that when I'm dealing with only seconds of access time that I know is costing me money per byte, I get impatient. What with my T610's browser being fiddly to use, and with Leeds United dropping off the front page of the BBC Sport site into the nether regions of the old English Division One, I was finding it a pain having to navigate through the menu and download individual stories - whilst clinging on to a handrail on the 212 bus.

What I needed was a magic bullet that would display the headlines and story summaries from all of the main indexes I was interested in from BBC Sport all on one page - and voila - bbcrss2wml.pl was born.

It takes the BBC Sport RSS feeds I'm interested in, i.e. the top football stories, the soon-to-be-renamed The Championship index, plus the Leeds United and Leyton Orient sections - and munges them all into one WML page for my T610, with the story summaries displayed.
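To give a flavour of the approach, here is a minimal sketch in Python. It is not bbcrss2wml.pl itself, which is a Perl script built on XML::DOM; the function names, the card id and title, and the sample feed (including its placeholder summary text) are all assumptions made for illustration. It also assumes plain RSS 0.91/2.0 with un-namespaced item elements, and it leaves out the fetching of the feeds altogether.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Standard WML 1.1 prologue expected by WAP browsers.
WML_HEADER = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN" '
    '"http://www.wapforum.org/DTD/wml_1.1.xml">\n'
)


def parse_items(rss_text):
    """Return (title, summary, link) tuples for every <item> in an RSS document."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        summary = (item.findtext("description") or "").strip()
        link = (item.findtext("link") or "").strip()
        items.append((title, summary, link))
    return items


def feeds_to_wml(feeds):
    """Combine several feeds into one WML deck.

    feeds is a list of (section name, RSS text) pairs.
    """
    parts = [WML_HEADER, '<wml>\n<card id="sport" title="BBC Sport">\n']
    for section, rss_text in feeds:
        parts.append("<p><b>%s</b></p>\n" % escape(section))
        for title, summary, link in parse_items(rss_text):
            parts.append(
                "<p><a href=%s>%s</a><br/>%s</p>\n"
                % (quoteattr(link), escape(title), escape(summary))
            )
    parts.append("</card>\n</wml>\n")
    return "".join(parts)


# Made-up sample feed: the description is placeholder text, not a real summary.
SAMPLE_RSS = """<rss version="0.91"><channel>
<title>Leeds United</title>
<item>
<title>Cadamarteri set for Leeds</title>
<description>Placeholder summary text.</description>
<link>http://news.bbc.co.uk/sport1/hi/football/teams/l/leeds_united/3825773.stm</link>
</item>
</channel></rss>"""

if __name__ == "__main__":
    print(feeds_to_wml([("Leeds United", SAMPLE_RSS)]))
```

Keeping everything in a single card means one download on the phone, which is the whole point; a real version would want to cap the number of items per feed, since WAP browsers of this era tend to have small deck size limits.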

The advantages are that all the downloading and processing overhead is handled by the currybet.net web server, which costs me a fixed rate, rather than adding to my phone's download limit or costing me money per byte, and that I get my football fix all in one place.

The disadvantage is that if I want to download the whole story, I can't always do it directly from the page. I was disappointed to find that you can't reliably guess what the mobile URL of an individual story will be from its web URL. In the script, each RSS feed has a variable that represents the specific transform you have to apply to the URL - but these transforms only work 95% of the time. For example, today's web story "Cadamarteri set for Leeds" at http://news.bbc.co.uk/sport1/hi/football/teams/l/leeds_united/3825773.stm had the URL http://www.bbc.co.uk/mobile/bbc_sport/football/english_div_1/story3829913.wml on the BBC's mobile service. If the seven-digit identifying number isn't the same, it doesn't matter how clever you are in understanding the way the different path structures work!
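The kind of per-feed transform described above might look something like the following Python sketch. The real transforms live in the Perl script and aren't reproduced here, so the MOBILE_BASE table, the feed key and the function name are all hypothetical; the only things taken from the example above are the two URLs and the "story" plus seven-digit number plus ".wml" pattern.

```python
import re

# Hypothetical per-feed mapping from a feed name to the mobile directory its
# stories appear under. The single entry is inferred from the example URLs.
MOBILE_BASE = {
    "leeds_united": "http://www.bbc.co.uk/mobile/bbc_sport/football/english_div_1/",
}


def guess_mobile_url(web_url, feed):
    """Guess the mobile URL for a web story, or return None if we can't."""
    match = re.search(r"/(\d{7})\.stm$", web_url)
    if match is None or feed not in MOBILE_BASE:
        return None
    # Reuse the web story's seven-digit number, which is exactly the
    # assumption that sometimes turns out to be wrong.
    return "%sstory%s.wml" % (MOBILE_BASE[feed], match.group(1))


web = "http://news.bbc.co.uk/sport1/hi/football/teams/l/leeds_united/3825773.stm"
actual = "http://www.bbc.co.uk/mobile/bbc_sport/football/english_div_1/story3829913.wml"

guess = guess_mobile_url(web, "leeds_united")
print(guess)
print(guess == actual)  # False: the mobile story number differs
```

For this story the guess gets the path right but keeps 3825773 where the mobile service uses 3829913, so the link would miss. No amount of path rewriting fixes that, because the correct number simply isn't in the web URL.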

Anyway I should also give thanks at this point to The Wireless FAQ, and LeoN at Puget Sound Software, whose grabrss.pl didn't do exactly what I wanted, but acted as an excellent tutorial in how I needed to use XML::DOM - and also to Murray and Iain who have gradually taught me at work that someone somewhere has probably done something similar before, and that a little digging around for modules can save you a lot of time.

3 Comments

For simpler html to parse you always have the low graphics version.

BTW, did you notice all the commented-out football links in this page? http://news.bbc.co.uk/sport1/hi/football/teams/default.stm

Though probably mostly foreign leagues on hiatus.

D'oh! I didn't think of that - but I did notice that BBC Sport have changed their left-hand nav to reflect the re-branding of the Football League but have left the URLs alone - the directories are still eng_div_1, eng_div_2 etc.
