Internal and external links on guardian.co.uk - comments follow up

by Martin Belam, 18 August 2010

Last week I wrote a couple of posts: "5 ways that The Guardian puts external links onto web pages" and "Inline article links to tag pages on guardian.co.uk". They've both generated a few comments, which I've replied to on the original posts. The comments raised a bunch of interesting points, so I figured it was worth gathering them together into a blog post of their own.

Darren Ratcliffe asked about how we maintain the quality of links in something like the Guardian Environment Network:

"Is there a regular review of sites once they have been submitted to the network... i.e. if you add www.????.com this week and then in three months time it's no longer about football, it's about something far less appropriate... how do they pick it up?"

I've answered that the sites and the content in the network are regularly checked, so we would pretty quickly pick up a site that had gone 'rogue'.

A couple of people were interested in the mechanics of how our auto-linking to tag pages works. Dustin asked:

"Do you have something in your algorithm that only links the first listed keyword, or does it link every matching keyword in the article?"

As far as I'm aware, the tool auto-links the first instance of every keyword, unless it occurs within a hyperlink that has already been inserted manually, or the keyword is on a 'blacklist' of things not to auto-link.
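
To make those three rules concrete, here is a minimal sketch in Python. It is not the Guardian's actual tool, which I haven't reproduced here: the function name `autolink`, the `tag_urls` mapping and the tag page paths in the example are all made up for illustration.

```python
import re
from html import escape


def autolink(body, tag_urls, stop_list=()):
    """Link the first plain-text occurrence of each keyword to its tag page.

    Illustrative sketch only: `tag_urls` (keyword -> tag page URL) and
    `stop_list` (keywords never to link) are assumed inputs.
    """
    stopped = {word.lower() for word in stop_list}
    # Keywords eligible for linking, lower-cased for case-insensitive lookup.
    candidates = {k.lower(): url for k, url in tag_urls.items()
                  if k.lower() not in stopped}
    if not candidates:
        return body

    # Longest keywords first, so a longer phrase wins over a keyword inside it.
    ordered = sorted(candidates, key=len, reverse=True)
    keyword_re = re.compile(
        r"\b(?:%s)\b" % "|".join(re.escape(k) for k in ordered),
        re.IGNORECASE,
    )
    # Existing links, and any other tags, are split out and left untouched.
    protected_re = re.compile(r"(<a\b[^>]*>.*?</a>|<[^>]+>)",
                              re.IGNORECASE | re.DOTALL)
    linked = set()

    def link_first(match):
        key = match.group(0).lower()
        if key in linked:
            return match.group(0)  # only the first instance gets a link
        linked.add(key)
        href = escape(candidates[key], quote=True)
        return '<a href="%s">%s</a>' % (href, match.group(0))

    parts = protected_re.split(body)
    # With one capturing group in the pattern, odd indices are protected chunks.
    return "".join(part if i % 2 else keyword_re.sub(link_first, part)
                   for i, part in enumerate(parts))


example = autolink(
    'This week, Microsoft announced a deal. '
    '<a href="http://vimeo.com">Vimeo</a> and Microsoft commented.',
    {"Microsoft": "/technology/microsoft", "Vimeo": "/technology/vimeo"},
)
print(example)
```

One assumption worth flagging: in this sketch a keyword that sits inside a manual link is simply skipped, so a later plain-text mention of the same keyword would still be linked. The real tool may treat that case differently.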

Daniel Rose was one of several people who wondered about the value of auto-linking within the body copy of articles on guardian.co.uk:

"I guess I can see how in some circumstances inline links add value for the reader. If they don't add value for reader though, is there any sense including them? I mean, does it add any value for the site?"

I replied that we don't generally tag articles with every single topic mentioned in the body copy, and so auto-linking extends the lateral navigation possibilities from a story - particularly when it may have been repurposed from print copy and be completely devoid of hyperlinks. The issue for me, I guess, is whether we should be investing time and effort in perfecting the world's greatest auto-linking algorithm, or whether we should be looking at workflow improvements that would render such a tool unnecessary.

Mo left what he described as a 'rant' about the auto-linking, which didn't seem much like a rant to me as it was very well argued:

"Sometimes, especially with tech and media stories, it's incredibly annoying -- things you'd definitely expect to be hyperlinks out to an external site end up at tag pages...While you could argue that linking to a company's homepage isn't necessarily the most useful thing to do, it is what most [handwaving alert] people expect the target of a link to be when they see 'This week, ((Microsoft)) announced...'.

The key here is that automatic links are no substitute for properly-written hypertext, which is -- after all -- what it's supposed to be. By all means augment the text with internal tag links (although I'd hope a document containing a mixture would style the tag links differently to external ones). I don't think there's anything bad about linking to tagged collections, either, but the conclusion I've reached after a few years of using guardian.co.uk is that this means of doing it generally errs towards violating the principle of least surprise (and makes for articles which read quite oddly in some cases, given the link styling acts as a highlight of sorts).

Personally -- and this is all my opinion and so effectively worth squat, despite it being a fairly well-considered one -- I think inline links should generally link to the most authoritative/canonical location of the thing you're linking: companies should link to the company website, names of websites should link to the website you're talking about ('The video-sharing website Vimeo today announced...' should never link the word Vimeo to an internal tag collection page for heaven's sake!).

As you suggest, generic terms are fairer game, but there needs to be clear differentiation between internal and external links and the purpose of the link should be comparatively obvious from the outset."

I've replied:

"If it was up to me, all news articles everywhere would include relevant hand-picked hyperlinks to both external and internal stories, sources, topic pages and websites, and we'd go back into the archive adding them in where they were lacking. But I think we have to accept that with the time and technology constraints that exist in newsrooms, that this isn't going to happen anytime soon. Auto-linking is a crude tool to at least get some of our tag pages exposed to the reader. It may well be that we don't make enough use of our 'stop' list of keywords that shouldn't be auto-linked."

Malcolm Coles has also joined in the thread and, like Mo, made some points about how styling and sign-posting could give the user a better sense of what to expect from clicking an inline link. Malcolm points out:

"The biggest problem, I think, is the inconsistency - some links are auto-added by the CMS, some are manually added to external sites, and some are manually added by the author. It's only by mousing over a given link that you can tell where it's going to go."

It is an issue that I looked at in another blog post last week - "External links from news sites - what should the user experience be?". I pointed out that:

"Using different colours on different types of links within an article won't make it obvious to the user what is going on, and littering body copy with icons and (External link) parentheses doesn't make for a great reading experience."

Malcolm Coles left another comment pointing out just what a tricky problem that is to crack:

"I once had a blog post planned on this very subject as I'd been reading a Guardian article with three links in it - one to an internal tag page, one to an external site's home page, and one to a previous Guardian article. It was impossible to tell where they were going (from memory, I think they were all one-word links) without checking each one. I never wrote the blog post as I couldn't come up with a solution that worked for everyone - although I did think colour coding could work for the Guardian as you don't change the link colour of already visited links - so your current pale blue could be used for internal links and, say, dark blue for external ones. You could maybe use the title tag to make it clearer from an accessibility point of view. But then i realised that new users would probably think that the dark blue links were links they had already visited. So no one would ever click them. Then I gave up."

Next...

I've been fascinated by the debate around external links from news websites. I do think the way that most 'legacy media' sites behave with regard to external links is a direct result of their web operations being a 'new-fangled' bit of the business bolted onto the side, rather than an enterprise in their own right. In my next blog post on this topic I'll have some examples of how workflow breaks the hyperlink experience even in some of the most recent digital news products to hit the market.

3 Comments

Speaking as a reader, rather than a tech, I'm very pro internal / external links whether auto generated or not. Even if it is linking a trade name such as Vimeo to an internal tag list - I'm on a news site so it is logical to assume that I am looking for news about that link rather than the company's website. Surely people hover over a link and check the URL in the bottom of the browser before clicking?

I'd meant to say "some are manually added to previous articles" not "some are manually added by the author". Feel free to correct and delete this comment - or leave this and make my inability to proof my own comments even more obvious.

There's really no reason not to style external links differently if they are genuinely performing two different roles, as they do in the case of automated/manual in-line links. I think Malcolm gave up too easily: there are loads of ways to style it meaningfully and non-disruptively.

Wikipedia's icon approach works quite well. I use user CSS to iconize all mailto: and pdf links. Other sites use little arrows or angles to good effect.

Is the occasional »<a>external link</a> really going to ruin the reading experience? I'd argue that it's an efficient way of adding human-readable markup to the hypertext. Reading the web is different from reading newspapers, and I think people are, and will increasingly become, comfortable with parsing metadata along with the words.

Or the iconless way, use different colours (underlined body text colour for internal links, blue for external?), or different mouseover behaviour (dashed underlining?, italics?), all very simple styling. There must be a compromise of subtlety where users who don't notice the change don't lose anything, but users who do notice the change can appreciate the gain.

I don't have much experience with screen-readers, but no change would be a downgrade, surely? Ideally the audio cue should be as simple as the visual. Make the title text a !Kung click, or something.

New windows: should default to same window. How much effort is it to middle-click or shift-click? Browser settings should be taking care of this.

Legal disclaimers: Surely unnecessary? Bury them somewhere in the T&Cs if the lawyers insist.

Linkrot: Is a fact of life. This goes hand in hand with "$MediaCo is not responsible for the content of external sites": we know, guys, relax. I like Stijns' caching suggestion. Maybe re-writing to point at the Wayback Machine would dodge the legal and infrastructural issues.

I totally agree with your conclusion, by the way. All of these issues are the consequence of looking at the web through a slightly inapt lens. Patches only go so far: the workflow needs to change.
