Human readable linked data URIs - a follow up
The constant avalanche of borderline 'dofollow' spam in my comment inbox means I'd almost forgotten what it was like to have a thread where real people I actually know discuss the thing I've written. [1]
That happened yesterday though, with my plea for human readable open linked data URIs.
There was a good debate on here, and on Twitter, about whether it was desirable or possible to make 100% persistent URIs that are 100% human readable, and a question about what we are trying to achieve through human readability. And Karen Loasby, information architect at the RNIB, popped up on Twitter to remind us that, from her point of view, we are discussing human listenability.
A lot of the discussion centred around the URLs for the BBC /programmes site. One of the risks for the BBC, having such a high profile website, is that it becomes an easy target for criticism. As Michael Smethurst put it on Twitter last night:
"repeating myself, ish, but wondering if we should run 'redesign the /programmes uri scheme' as a blue peter competition?!?"
In all seriousness, I worked alongside the team developing Programme Information Pages in White City, and I well understand the arguments that took the BBC down the path of randomly generated computer squiggles for URLs.
What concerns me, though, is that unless people are vocal about the benefits of human readable URIs, others will look at the BBC website and assume that its implementation is the right, and possibly the only, way of doing things. For /iplayer and /programmes it almost certainly is. It is a question, though, of workflow and scale.
With The Guardian's tag pages, we could very easily mint our URLs to be some combination of creation timestamp and database ID. That way, whether she was /culture/cheryl-cole or /culture/cheryl-tweedy, or became /culture/cheryl-terry or /culture/cheryl-andre, she'd always be 20060715-2d499150-1c42-4ffb-a90c-1cc635519d33 to us.
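An opaque identifier like that is trivial to mint. Here is a minimal sketch of the idea - the function name and exact format are my illustration, not the Guardian's actual scheme:

```python
import re
import uuid
from datetime import date

def opaque_tag_id(created):
    """Mint a persistent but unreadable tag identifier: the creation
    date plus a random UUID, so later renames never touch the URL."""
    return f"{created.strftime('%Y%m%d')}-{uuid.uuid4()}"

tag_id = opaque_tag_id(date(2006, 7, 15))
# e.g. '20060715-2d499150-1c42-4ffb-a90c-1cc635519d33'
# - perfectly stable, and perfectly meaningless to a human
```

The identifier survives any number of celebrity name changes precisely because it encodes nothing a human would want to read.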
But when a new Guardian tag is created, we require the journalist/sub-editor/keyword manager/production editor to give it a human readable label that is unique in its namespace. Since that is already part of the workflow, it doesn't make sense not to use it.
You can see another example of this on The Guardian site, with the URLs for individual articles. Our previous CMS assigned a random bunch of numbers to a story. Our R2 platform allows the user to set a URL slug - what-is-information-architecture for example. However, if the URL is not manually set, which happens a lot with archive content, the story inherits a readable URL from the mandatory tags that have been applied to it, e.g. guardian.co.uk/sport/2006/feb/27/winterolympics2006.winterolympics.
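That slug-or-tags fallback can be sketched in a few lines - the function and parameter names here are hypothetical, not the real R2 code, which is internal to the Guardian's CMS:

```python
from datetime import date

def article_path(section, published, tags, slug=None):
    """Build a human readable article path.

    If an editor set a URL slug manually, use it; otherwise fall back
    to joining the article's mandatory tags with dots, mimicking URLs
    like /sport/2006/feb/27/winterolympics2006.winterolympics.
    """
    day = published.strftime("%Y/%b/%d").lower()
    leaf = slug if slug else ".".join(tags)
    return f"/{section}/{day}/{leaf}"

# An editor-chosen slug wins when one exists
article_path("technology", date(2009, 7, 15), [],
             "what-is-information-architecture")
# Archive content with no manual slug inherits its tags
article_path("sport", date(2006, 2, 27),
             ["winterolympics2006", "winterolympics"])
# → "/sport/2006/feb/27/winterolympics2006.winterolympics"
```

Either way the reader gets a path that says something about the story, rather than a database key.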
If you have the volume of output that the BBC does across its vast network of channels and stations, then I'm easily convinced that computer generated URLs are the right way to go. But let's not take an aspiration to be human readable off the table just because it would be prohibitively expensive for that example.
[1] This is thanks to the site being listed on loads of directories and mailing lists as a high PR dofollow blog, particularly in Turkey for some reason.
[2] Only teasing. Michael quite rightly pointed out to me that http://www.bbc.co.uk/programmes/bluepeter works too :-)