Inline article links to tag pages on

 by Martin Belam, 10 August 2010

In yesterday's post, "5 ways that The Guardian puts external links onto web pages", I mentioned how Patrick Smith had sparked off a lot of debate with a blog post entitled "Link to the past: why do some news sites STILL not link out in 2010?". In it, he suggested The Guardian website as being one of the best examples of linking out - an assertion he partially retracted, saying:

"On the subject of and linking - it does do a good job directing readers to interesting and relevant things on its blogs and in the technology/media section, but I am swayed by some commenters below criticising my assertion that the site is 'good at linking', as the majority of links do appear to be internal-facing subject page links."

In our web CMS, we have a check-box that offers the option to 'Automate linking of keywords'. We have literally thousands of topic keywords, and the CMS will automatically insert a hyperlink to a tag page if the text in the body of an article matches the keyword. The automatic link occurs once on the first mention of the keyword, and we maintain a 'blacklist' of terms that don't get linked. The tool skips over any words that are already forming part of a hyperlink.
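To make those rules concrete, here is a minimal Python sketch of how such a tool might behave. It is not the CMS's actual code: the function name `autolink`, the sample keywords and the tag page URLs are all invented for illustration, and I have assumed that "first mention" means the first mention that is not already inside a link.

```python
import re

# Hypothetical sample data: keyword -> tag page URL (not real Guardian URLs).
TAG_PAGES = {
    "credit crunch": "/business/credit-crunch",
    "Microsoft": "/technology/microsoft",
    "business": "/business",
}
# Terms that should never be linked automatically.
BLACKLIST = {"business"}

# Matches an existing hyperlink in full, or any other single HTML tag.
SKIP = re.compile(r"(<a\b[^>]*>.*?</a>|<[^>]+>)", re.IGNORECASE | re.DOTALL)


def autolink(html, tag_pages=TAG_PAGES, blacklist=BLACKLIST):
    """Link the first unlinked mention of each keyword to its tag page."""
    # Splitting on a capturing group leaves plain text at even indexes
    # and the skipped markup (existing links, tags) at odd indexes.
    parts = SKIP.split(html)
    # Try longer keywords first so a phrase wins over a word inside it.
    for keyword in sorted(tag_pages, key=len, reverse=True):
        if keyword.lower() in blacklist:
            continue
        pattern = re.compile(r"\b%s\b" % re.escape(keyword), re.IGNORECASE)
        for i in range(0, len(parts), 2):
            match = pattern.search(parts[i])
            if match is None:
                continue
            text = parts[i]
            link = '<a href="%s">%s</a>' % (tag_pages[keyword], match.group(0))
            # Put the new link at an odd index so later keywords
            # cannot match inside it.
            parts[i:i + 1] = [text[:match.start()], link, text[match.end():]]
            break  # only the first mention is linked
    return "".join(parts)


body = ('<p>The credit crunch hit <a href="http://www.microsoft.com">'
        'Microsoft</a>. Microsoft said the credit crunch hurt business.</p>')
print(autolink(body))
```

In this sketch the manually linked 'Microsoft' is left alone, the second, unlinked mention picks up the tag page link, 'credit crunch' is linked only once, and the blacklisted 'business' is never touched.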

Whilst it certainly serves to increase the number of internal links pointing to those keyword resource pages, I think the benefits for the end user here are obvious in a lot of cases. If a journalist uses the phrase 'credit crunch' in an article, and it is automatically turned into a hyperlink to our credit crunch tag page, and that page opens with explainers on 'Credit crisis - how it began' and 'How the bubble burst', then that is a valuable and useful service to readers.

Credit Crunch tag page on

It is, however, a much less convincing user experience when the keyword in question is a company or organisation name.

If you click on the hyperlinked word 'Microsoft' in the middle of a Guardian news article, as a user are you expecting to see more news stories about the company? To get a stock-quote for MSFT? Or to stop reading news entirely and instead go to Or directly to if the piece in question was about that particular bit of their software portfolio?

The automatic linking is an admittedly blunt tool for putting topic-based hyperlinks into articles that would otherwise be without any inline links at all - and as a result some are more useful than others for the end user. What could be better, I think, is if there were some finer-grained control over what got automatically hyperlinked. Personally I'd prefer to see the tags for people, companies and organisations exempt from it, as I think those are the types of links where there is the most disconnect between expecting an external link and receiving an internal one.
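As a rough illustration of what that finer-grained control might look like, the Python sketch below filters tags by type before any automatic linking happens. It assumes each tag carries a category, which I have called `kind`. That field, the `Tag` class and the sample tags are hypothetical and not a description of how our tags are actually stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    keyword: str
    url: str
    kind: str  # hypothetical category: "topic", "person", "company", ...


# Tag types where readers are most likely to expect an external link.
EXEMPT_KINDS = {"person", "company", "organisation"}


def autolinkable(tags, exempt_kinds=EXEMPT_KINDS):
    """Return only the tags that may still be linked automatically."""
    return [tag for tag in tags if tag.kind not in exempt_kinds]


tags = [
    Tag("credit crunch", "/business/credit-crunch", "topic"),
    Tag("Microsoft", "/technology/microsoft", "company"),
]
print([tag.keyword for tag in autolinkable(tags)])
```

Company, person and organisation names would then only ever be linked by hand, leaving the choice between an internal and an external target with the journalist or sub-editor.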

The question is, of course, how much effort do you put into devising an algorithm to perfect the automatic linking of keywords, versus optimising workflow so that you don't need to automate links on an ongoing basis?


As I said, Patrick's blog post rekindled a debate about when and how news organisations should include external links, a theme I hope to return to in a couple of further blog posts.


Sometimes, especially with tech and media stories, it's incredibly annoying -- things you'd definitely expect to be hyperlinks out to an external site end up at tag pages (I've a feeling I've seen this happen, though I wouldn't swear on it, where the source article was the only piece with that tag in the first place). While you could argue that linking to a company's homepage isn't necessarily the most useful thing to do, it is what most [handwaving alert] people expect the target of a link to be when they see "This week, ((Microsoft)) announced...".

The key here is that automatic links are no substitute for properly-written hypertext, which is -- after all -- what it's supposed to be. By all means augment the text with internal tag links (although I'd hope a document containing a mixture would style the tag links differently to external ones). I don't think there's anything bad about linking to tagged collections, either, but the conclusion I've reached after a few years of using is that this means of doing it generally errs towards violating the principle of least surprise (and makes for articles which read quite oddly in some cases, given the link styling acts as a highlight of sorts).

Personally -- and this is all my opinion and so effectively worth squat, despite it being a fairly well-considered one -- I think inline links should generally link to the most authoritative/canonical location of the thing you're linking: companies should link to the company website, names of websites should link to the website you're talking about ("The video-sharing website Vimeo today announced..." should never link the word Vimeo to an internal tag collection page for heaven's sake!).

As you suggest, generic terms are fairer game, but there needs to be clear differentiation between internal and external links and the purpose of the link should be comparatively obvious from the outset (only ever linking to internal tagged collections is cheating, as it creates a mismatch between expected and actual behaviour IMO).

er, rant over, sorry.

I guess I can see how in some circumstances inline links add value for the reader. If they don't add value for the reader, though, is there any sense in including them? I mean, do they add any value for the site?

Automatically linking keywords might not be a very good idea. I do not think any algorithm is as intelligent as the human brain. Do exactly what you want to do yourself. It might take some of your time, but still, do not leave this work to any "intelligent software".

Auto linking is an interesting concept. Do you have something in your algorithm that only links the first listed keyword, or does it link every matching keyword in the article?

I agree that inline links can add a great deal of value to the reader. I guess the issue these days is that, as a post or article reader, I never know whether I should trust the links I see or not. Are they going to assist me to find further, relevant information, or are they really part of an affiliate marketing campaign? I can see why some sites continue to stay away from them.

Bit late to this as I was on holiday. But I often find the Guardian's autolinking really jarring - I noticed it this morning when reading this article about the Liverpool sale:

The word business is autolinked to the business section in this paragraph:

"Yesterday the family announced the death at 95 of Sheikha Maryam al-Shamsi, the mother of Sharjah's ruler, Sheikh Sultan bin Mohammed al-Qassimi. All members of the family will now observe seven days mourning during which no business will be conducted."

That aside, the biggest problem, I think, is the inconsistency - some links are auto-added by the CMS, some are manually added to external sites, and some are manually added by the author. It's only by mousing over a given link that you can tell where it's going to go. As a result, I have all but given up clicking on in-line links (I tend to head for the tags list in the right hand column). Don't suppose you'd care to share any stats on CTRs on the in-line links?!? I just find it hard to believe that people, in the middle of reading an article about a subject, click on the auto-link to a topic page ... I can see they might want to do it (a) once they've finished or (b) as a way to escape something they've lost interest in. But hunting around for a link in the copy doesn't seem the best way to accomplish those goals.

Returning to how it works ...

On this page, say:

The two instances of (I imagine) autolinked words are "Everton" and "Transfer window", which are the terms listed in the tag/keyword list in your skinny middle column. Does that mean the autolinking is only done to words that appear there (which would appear to make it even more redundant from a reader's point of view)? And I couldn't work out why Arsenal wasn't linked in the copy or listed as a tag.

All that aside, I do think the Guardian is one of the best newspapers at linking out - but it's clearly much more common in the more bloggy bits than in the hard news bits.
