Human readable linked data URIs - a follow up

 by Martin Belam, 2 March 2010

The constant avalanche of borderline 'dofollow' spam in my comment inbox means I'd almost forgotten what it was like to have a thread where real people that I actually know really discuss the thing I've written. [1]

That happened yesterday though, with my plea for human readable open linked data URIs.

There was a good debate on here, and on Twitter, about whether it was desirable or possible to make 100% persistent URIs that are 100% human readable, and a question about what we are trying to achieve through human readability. And Karen Loasby, information architect at the RNIB, popped up on Twitter to remind us that, from her point of view, we are discussing human listenability.

A lot of the discussion centred around the URLs for the BBC /programmes site. One of the risks for the BBC, having such a high profile website, is that it becomes an easy target for criticism. As Michael Smethurst put it on Twitter last night:

"repeating myself, ish, but wondering if we should run 'redesign the /programmes uri scheme' as a blue peter competition?!?"

Although I think he might have meant a b006md2v competition. [2]

Blue Peter and the mysterious URL

In all seriousness, I worked alongside the team developing Programme Information Pages in White City, and I well understand the arguments that took the BBC down the path of randomly generated computer squiggles for URLs.

What concerns me, though, is that unless people are vocal about the benefits of human readable URIs, others will look at the BBC website and assume that its implementation is the right, and possibly the only, way of doing things. For /iplayer and /programmes it almost certainly is. It is a question, though, of workflow and scale.

With The Guardian's tag pages, we could very easily mint our URLs from some combination of creation timestamp and database ID. That way, whether she was /culture/cheryl-cole or /culture/cheryl-tweedy, or becomes /culture/cheryl-terry or /culture/cheryl-andre, she'd always be 20060715-2d499150-1c42-4ffb-a90c-1cc635519d33 to us.
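To make that concrete, here is a minimal Python sketch of the opaque-identifier approach. It is an illustration only, not how any real CMS mints its URLs: the function name `mint_opaque_id` and the `aliases` lookup table are hypothetical, and the date-plus-UUID format is simply read off the example identifier above.

```python
import uuid
from datetime import date


def mint_opaque_id(created: date, db_id: uuid.UUID) -> str:
    """Join a creation date and a database ID into one opaque, persistent identifier."""
    # Assumption: the identifier is YYYYMMDD followed by the UUID, as in the example above.
    return f"{created:%Y%m%d}-{db_id}"


tag_id = mint_opaque_id(
    date(2006, 7, 15),
    uuid.UUID("2d499150-1c42-4ffb-a90c-1cc635519d33"),
)

# Hypothetical alias table: readable paths can come and go,
# while the opaque identifier they resolve to never changes.
aliases = {
    "/culture/cheryl-tweedy": tag_id,
    "/culture/cheryl-cole": tag_id,
}
```

The trade-off is visible in the sketch: persistence comes for free, but nothing about the identifier tells a person what the page is about.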

But at the time a new Guardian tag is created, we force the journalist/sub-editor/keyword manager/production editor to give it a human readable label that is unique in its namespace. Since we have that as part of the workflow, it doesn't make sense not to use it.
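The "unique in its namespace" rule could be sketched roughly like this in Python. The `TagRegistry` class and its `create` method are hypothetical names invented for illustration, and the in-memory dictionary stands in for whatever storage a real system would use.

```python
class TagRegistry:
    """Toy registry that refuses duplicate labels within one namespace."""

    def __init__(self):
        # Maps a namespace such as "culture" to the set of labels already used in it.
        self._labels = {}

    def create(self, namespace: str, label: str) -> str:
        taken = self._labels.setdefault(namespace, set())
        if label in taken:
            raise ValueError(f"'{label}' already exists in '{namespace}'")
        taken.add(label)
        # The readable URL path falls straight out of the namespace and the label.
        return f"/{namespace}/{label}"
```

Because uniqueness is only checked per namespace, the same label can exist under two different sections, while a second attempt to create it in the same section raises an error.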

You can see another example of this on The Guardian site, with the URLs for individual articles. Our previous CMS assigned a random bunch of numbers to a story. Our R2 platform allows the user to set a URL slug - what-is-information-architecture for example. However, if the URL is not manually set, which happens a lot with archive content, the story inherits a readable URL from the mandatory tags that have been applied to it, e.g. guardian.co.uk/sport/2006/feb/27/winterolympics2006.winterolympics.
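That fallback can be sketched in a few lines of Python. This is a guess at the shape of the logic, not the R2 implementation: the function `article_path`, the section/year/month/day layout with a zero-padded day, and the joining of tags with a full stop are all assumptions inferred from the single example URL above.

```python
from datetime import date
from typing import Optional, Sequence

# Spelled out explicitly so the result does not depend on the system locale.
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def article_path(section: str, published: date, tags: Sequence[str],
                 slug: Optional[str] = None) -> str:
    """Build an article path, preferring a manual slug over the tag-derived one."""
    # Assumption: with no manual slug, the mandatory tags are joined with full stops.
    last_segment = slug if slug else ".".join(tags)
    month = MONTHS[published.month - 1]
    return f"/{section}/{published.year}/{month}/{published.day:02d}/{last_segment}"


archive_path = article_path(
    "sport", date(2006, 2, 27), ["winterolympics2006", "winterolympics"]
)
```

Passing `slug="what-is-information-architecture"` would replace the tag-derived last segment with the hand-written one.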

If you have the volume of output that the BBC does across its vast network of channels and stations, then I'm easily convinced that computer generated URLs are the right way to go. But let's not take an aspiration to be human readable off the table just because it would be prohibitively expensive for that example.



[1] This is thanks to the site being listed on loads of directories and mailing lists as a high PR dofollow blog, particularly in Turkey for some reason.

[2] Only teasing. Michael quite rightly pointed out to me that http://www.bbc.co.uk/programmes/bluepeter works too :-)

6 Comments

Just wanted to add another genuine-comment-from-someone-you-know...

Could you explain why the URLs for your topic pages include the 'section' of the paper/website in the path (eg http://www.guardian.co.uk/culture/cheryl-cole and http://www.guardian.co.uk/uk/alan-sugar) rather than 'type' (which might give us something like http://www.guardian.co.uk/people/cheryl-cole and http://www.guardian.co.uk/people/alan-sugar)?

And how do you decide whether a person belongs in 'culture' or in 'UK' or 'World' or 'Media'?

Re Blue Peter, you would express it this way in an HTML page:

<a href="http://www.bbc.co.uk/programmes/b006md2v">Blue Peter</a>

Human Happy, Machine Happy.

There are a few things at play there, Frankie. Firstly, pages inherit their livery and navigation from their top-level zone, so /football/ashley-cole gets green and sports links, and /culture/cheryl-cole gets #D1008B. Secondly, we like the namespace in the URL so that we can have /world/egypt focus on news and /football/egypt focus on 'The Pharaohs', and need neither disambiguation for the user nor negotiation about who 'owns' a tag. We've discussed a lot internally whether people, businesses and locations merit a special 'class' of tag that is treated differently. So far I'm unconvinced that there is a compelling business case to make a change, although I can see the arguments either way. Incidentally, we do have a People A-Z on the site.

Hi Martin,

I'm still a bit confused. How do you decide whether cheryl-cole goes in "culture" or "uk" or "world"? And what happens when someone moves from one section to another (eg Vinnie Jones stops being a footballer and starts being a filmmaker)? Do you have both /sport/vinnie-jones and /culture/vinnie-jones (and live with the duplication), or move the URL, or keep it wherever it started out?

"I'd almost forgotten what it was like to have a thread where real people that I actually know really discuss the thing I've written."

If you want to sit around with people you know, try sitting around a campfire with your parents; the internet is for those who like meeting new people.

As for the "dofollow" spam, you should know you are listed at or near the top of multiple dofollow lists floating around out there. Get your name removed from them and you will solve all your problems and go back to happy land where you don't have to meet new people. Simply Google High PR dofollow blogs or something of the sort, and you will be in every article on page 1. Just a heads up "mate"....

Marty B, your sarcasm is only rendered slightly less effective by the fact that you clearly didn't read the footnote where I specifically mention the repeated listing of this site as a high PR "dofollow" blog :-)
