Internal and external links on guardian.co.uk - comments follow up
Last week I wrote a couple of posts, "5 ways that The Guardian puts external links onto web pages" and "Inline article links to tag pages on guardian.co.uk". Both have generated a few comments, which I've replied to on the original posts. The comments raised a bunch of interesting points, so I figured it might be worth gathering them together into a blog post of their own.
One commenter asked about the sites in our external link network:
"Is there a regular review of sites once they have been submitted to the network... i.e. if you add www.????.com this week and then in three months time it's no longer about football, it's about something far less appropriate... how do they pick it up?"
I replied that the sites and the content in the network are checked regularly, so we would pick up a site that had gone 'rogue' pretty quickly.
A couple of people were interested in the mechanics of how our auto-linking to tag pages works. Dustin asked:
"Do you have something in your algorithm that only links the first listed keyword, or does it link every matching keyword in the article?"
As far as I'm aware, the tool auto-links the first instance of every keyword, unless it occurs within a hyperlink that has already been inserted manually, or the keyword is on a 'blacklist' of terms not to auto-link.
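To make that rule concrete, here is a minimal Python sketch of a first-occurrence auto-linker. It is not the Guardian's actual tool: the `autolink` function, the keyword-to-URL mapping and the tag-page paths are all hypothetical, and regular expressions over raw HTML stand in for whatever the real CMS does. It also assumes one reading of the rule: if the first occurrence of a keyword sits inside an existing link, the next occurrence outside a link is used instead.

```python
# Hypothetical sketch of a first-occurrence auto-linker; not the Guardian's CMS code.
import re
from html import escape

# Spans that must never receive an auto-link: existing <a>...</a> elements
# and any other markup tag (so attribute values such as href are never matched).
EXISTING_LINK = re.compile(r"<a\b[^>]*>.*?</a>", re.IGNORECASE | re.DOTALL)
ANY_TAG = re.compile(r"<[^>]+>")


def protected_spans(html):
    """Return (start, end) ranges covered by existing links or tags."""
    spans = [m.span() for m in EXISTING_LINK.finditer(html)]
    spans += [m.span() for m in ANY_TAG.finditer(html)]
    return spans


def overlaps(start, end, spans):
    """True if [start, end) intersects any of the given spans."""
    return any(start < s_end and s_start < end for s_start, s_end in spans)


def autolink(html, tag_pages, blacklist=frozenset()):
    """Link the first eligible occurrence of each keyword to its tag page.

    tag_pages: dict mapping keyword -> tag-page URL (hypothetical data).
    blacklist: keywords that must never be auto-linked.
    """
    blocked = protected_spans(html)
    insertions = []  # (start, end, url) for each link we decide to add
    for keyword, url in tag_pages.items():
        if keyword in blacklist:
            continue
        pattern = re.compile(r"\b" + re.escape(keyword) + r"\b")
        for m in pattern.finditer(html):
            if not overlaps(m.start(), m.end(), blocked):
                insertions.append((m.start(), m.end(), url))
                blocked.append(m.span())  # keep later keywords out of this link
                break  # only the first eligible occurrence is linked

    # Apply insertions right-to-left so earlier offsets stay valid.
    for start, end, url in sorted(insertions, reverse=True):
        html = (html[:start]
                + f'<a href="{escape(url)}">{html[start:end]}</a>'
                + html[end:])
    return html
```

Given `<p><a href="http://www.microsoft.com/">Microsoft</a> said Microsoft and Yahoo would talk to Google.</p>` with "Google" blacklisted, the sketch leaves the manual link alone, links the second "Microsoft" and the first "Yahoo", and leaves "Google" unlinked. A production version would work on a parsed document rather than regular expressions, which are fragile on malformed markup.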
Daniel Rose was one of several people who wondered about the value of auto-linking within the body copy of articles on guardian.co.uk:
"I guess I can see how in some circumstances inline links add value for the reader. If they don't add value for reader though, is there any sense including them? I mean, does it add any value for the site?"
I replied that we don't generally tag articles with every single topic mentioned in the body copy, so auto-linking extends the lateral navigation possibilities from a story, particularly when it may have been repurposed from print copy and be completely devoid of hyperlinks. The real question for me, I guess, is whether we should be investing time and effort in perfecting the world's greatest auto-linking algorithm, or whether we should be looking at workflow improvements that would render such a tool unnecessary.
Mo left what he described as a 'rant' about the auto-linking, which didn't read like a rant to me at all, as it was very well argued:
"Sometimes, especially with tech and media stories, it's incredibly annoying -- things you'd definitely expect to be hyperlinks out to an external site end up at tag pages...While you could argue that linking to a company's homepage isn't necessarily the most useful thing to do, it is what most [handwaving alert] people expect the target of a link to be when they see 'This week, ((Microsoft)) announced...'.
The key here is that automatic links are no substitute for properly-written hypertext, which is -- after all -- what it's supposed to be. By all means augment the text with internal tag links (although I'd hope a document containing a mixture would style the tag links differently to external ones). I don't think there's anything bad about linking to tagged collections, either, but the conclusion I've reached after a few years of using guardian.co.uk is that this means of doing it generally errs towards violating the principle of least surprise (and makes for articles which read quite oddly in some cases, given the link styling acts as a highlight of sorts).
Personally -- and this is all my opinion and so effectively worth squat, despite it being a fairly well-considered one -- I think inline links should generally link to the most authoritative/canonical location of the thing you're linking: companies should link to the company website, names of websites should link to the website you're talking about ('The video-sharing website Vimeo today announced...' should never link the word Vimeo to an internal tag collection page for heaven's sake!).
As you suggest, generic terms are fairer game, but there needs to be clear differentiation between internal and external links and the purpose of the link should be comparatively obvious from the outset."
My reply to Mo was:
"If it was up to me, all news articles everywhere would include relevant hand-picked hyperlinks to both external and internal stories, sources, topic pages and websites, and we'd go back into the archive adding them in where they were lacking. But I think we have to accept that, with the time and technology constraints that exist in newsrooms, this isn't going to happen anytime soon. Auto-linking is a crude tool to at least get some of our tag pages exposed to the reader. It may well be that we don't make enough use of our 'stop' list of keywords that shouldn't be auto-linked."
Malcolm Coles has also joined in the thread and, like Mo, made some points about how styling and sign-posting could give the user a better sense of what to expect from clicking an inline link. Malcolm points out:
"The biggest problem, I think, is the inconsistency - some links are auto-added by the CMS, some are manually added to external sites, and some are manually added by the author. It's only by mousing over a given link that you can tell where it's going to go."
It's an issue I looked at in another blog post last week, "External links from news sites - what should the user experience be?". I pointed out that:
"Using different colours on different types of links within an article won't make it obvious to the user what is going on, and littering body copy with icons and (External link) parentheses doesn't make for a great reading experience."
Malcolm Coles left another comment pointing out just how tricky a problem this is to crack:
"I once had a blog post planned on this very subject as I'd been reading a Guardian article with three links in it - one to an internal tag page, one to an external site's home page, and one to a previous Guardian article. It was impossible to tell where they were going (from memory, I think they were all one-word links) without checking each one. I never wrote the blog post as I couldn't come up with a solution that worked for everyone - although I did think colour coding could work for the Guardian as you don't change the link colour of already visited links - so your current pale blue could be used for internal links and, say, dark blue for external ones. You could maybe use the title tag to make it clearer from an accessibility point of view. But then I realised that new users would probably think that the dark blue links were links they had already visited. So no one would ever click them. Then I gave up."
I've been fascinated by the debate around external links from news websites. I do think the way most 'legacy media' sites behave with regard to external links is a direct result of their web operations being a 'new-fangled' bit of the business bolted onto the side, rather than an enterprise in their own right. In my next blog post on this topic I'll have some examples of how workflow breaks the hyperlink experience, even in some of the most recent digital news products to hit the market.