HTML5 for journalists

 by Martin Belam, 5 August 2010

I've contributed before to the debate about whether journalists need to be programmers, but whether or not you've ever wrestled with Ruby, PHP or Python, if you are involved in online publishing in any way you are almost certainly familiar with some HTML. Even if it is only by pressing buttons in your CMS, you'll know how to add mark-up like <a href="">, <strong>, <blockquote> and <em> to your articles.

Well, HTML is changing significantly for the first time in the best part of a decade, and you'll need to learn to at least recognise, if not use, some new tags.

HTML5 adds nearly thirty new tags to the language. Geeks, technologists and Apple are most excited about <canvas> and <video>, which promise to replace a lot of the functionality provided by browser plugins like Flash. As a journalist, though, you are much more likely to encounter some of the new tags which provide additional semantic mark-up to articles.

Page structure

Several of these new tags are devoted to marking up the information structure of pages, namely <header>, <nav>, <article> and <footer>. These are intended to replace the common habit of having code like <div id="header">, <div id="nav"> and <div id="content">, all of which you will see if you view the source of this page.
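
As a rough sketch of what that might look like in a template, here is a bare-bones page using the new structural tags. The title, link targets and text are placeholders invented purely for illustration, and a real template would carry a good deal more than this.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Example article page</title>
</head>
<body>
  <!-- Previously <div id="header"> -->
  <header>
    <h1>Example News Site</h1>
  </header>

  <!-- Previously <div id="nav"> -->
  <nav>
    <a href="/news">News</a>
    <a href="/sport">Sport</a>
  </nav>

  <!-- Previously <div id="content"> -->
  <article>
    <h2>Example headline</h2>
    <p>The body of the story goes here.</p>
  </article>

  <footer>
    <p>Placeholder copyright notice.</p>
  </footer>
</body>
</html>
```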

These types of tags will most likely be in the templates of your pages, and you won't need to bother with them too much. They may, though, end up having a huge impact on the way web content is consumed. They are going to make it increasingly easy to deliver the same article content over multiple platforms, and in multiple formats, in the way that Safari 5's Reader mode, Instapaper or Phil Gyford's "Today's Paper" already strip articles to their bare bones to provide an 'enhanced' reading experience.

Article structure

As a journalist, web production person or sub-editor, where you may get more entangled with the new language is with tags like <section> and <aside>. The first of these, <section>, represents a slightly vague generic 'section' of an article, so that, for example, a single article that contained two or three different TV reviews or recipes can now have them individually surrounded by <section> tags.

The second tag, <aside>, is used to mark up something tangentially related to the main body of text. In a news context, that might represent a factbox, some links to related stories, or a sidebar detailing the key points of a story's timeline.
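
To make that a little more concrete, a single article containing two TV reviews and a box of related links might be marked up along these lines. This is only a sketch: the headings, programme names and link targets are made up for the example.

```html
<article>
  <h1>Last night's TV</h1>

  <!-- Each review is a self-contained part of the article -->
  <section>
    <h2>First programme (placeholder title)</h2>
    <p>Review of the first programme.</p>
  </section>

  <section>
    <h2>Second programme (placeholder title)</h2>
    <p>Review of the second programme.</p>
  </section>

  <!-- Tangentially related material: a box of related stories -->
  <aside>
    <h2>Related stories</h2>
    <ul>
      <li><a href="/example-story-1">A related story</a></li>
      <li><a href="/example-story-2">Another related story</a></li>
    </ul>
  </aside>
</article>
```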

Enhanced mark-up

Several new HTML5 tags are there to enhance the way that content is understood by machines, or is rendered by browsers and devices, all of which should ultimately result in a better user experience for humans. Examples that you may start coming across include:

<time>

The <time> tag involves including a more precise machine-readable version of a time when you mention it in an article, or when a 'published' date is included. We've already started using it at The Guardian. This tag will most likely be added automatically by content management systems that output HTML5, but you may find that interfaces start including an option to add a little more precision when you are writing 'yesterday' or 'on Friday'.
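
As an illustration, the machine-readable value goes in a datetime attribute while the reader still sees the ordinary wording. The sentences below are invented for the example, and assume an article published on 5 August 2010, so that 'yesterday' means 4 August.

```html
<!-- A 'published' date: the reader sees the text, machines read the attribute -->
<p>Published on <time datetime="2010-08-05">5 August 2010</time></p>

<!-- A relative date in the copy, pinned to a precise day -->
<p>The report was released <time datetime="2010-08-04">yesterday</time>.</p>
```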

<details>

<details> is intended to provide a way of including extra content or information that can be optionally expanded or collapsed by the user. That might be something like additional information about an album release, or the opening times of an exhibition.
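
A minimal sketch of the exhibition example might look like this. The opening times are invented, and the <summary> tag, which supplies the always-visible label that the reader clicks to expand, is a companion tag not mentioned above.

```html
<details>
  <!-- The summary stays visible; the rest is collapsed until expanded -->
  <summary>Opening times</summary>
  <p>Tuesday to Sunday, 10am to 6pm. Closed on Mondays.</p>
</details>
```

In a browser that does not yet support the tag, the content should simply appear expanded as ordinary text.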

<figure> & <figcaption>

HTML5 is going to allow mark-up to replicate the print feature of having an image, chart or diagram that sits in the main flow of content, but that is not actually part of the main article. These elements will be marked up as a <figure>, with <figcaption> specifying, as its name suggests, a caption to go with the image.
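
For example, a chart with a caption might be marked up something like this. The file name, alt text and caption are placeholders rather than a real graphic.

```html
<figure>
  <img src="turnout-chart.png" alt="Bar chart of voter turnout by region">
  <figcaption>Voter turnout by region (illustrative caption)</figcaption>
</figure>
```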

<mark>

In the same way that you can currently use <em> and <strong> within a paragraph to stress particular words, the new <mark> tag provides a way to highlight or signify text. Visually you can style it however you want, although a yellow highlighter pen effect seems to be the favourite so far, but the point is to allow you to distinguish portions of text.
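
One plausible use is drawing attention to part of a quotation that the original speaker did not emphasise. The quote and the CSS rule below are invented for illustration; the yellow background is just the highlighter-pen styling mentioned above, and any styling would do.

```html
<style>
  /* Highlighter-pen effect; purely a presentational choice */
  mark { background-color: yellow; }
</style>

<blockquote>
  <p>We expect the scheme to be running <mark>by the end of the year</mark>,
  subject to funding.</p>
</blockquote>
```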

<wbr>

<wbr> is a tag to insert when there is a 'word break opportunity', rather than an arbitrary line break as enforced by <br> or <br />. An example might be to suggest potential breaks in a long word like:

'Super<wbr>cali<wbr>fragi<wbr>listic<wbr>expiali<wbr>docious'

That will almost certainly have come out horribly on this website, but in the HTML5 world, your browser would, if necessary, have broken the word across two lines at one of those points. Note that <wbr> only permits a line break; it does not insert a hyphen.

28 Comments

Really useful post Martin. Will make a great pointer for my students. Thanks.

There are some limitations with the TIME element, in that it doesn't support dates without years (such as anniversaries: "every year on 5 August"), fuzzy dates ("around 21 June 1980"; "19th century") or non-Gregorian (pre 1750) dates. Those of us who feel it should are making a case on the HTML working group wiki, at: wiki.whatwg.org/wiki/Time_element. Related Twitter discussion is tagged #html5time

Great article. Points out the most relevant changes in HTML5, I think.

Very good post! However, this is bad news for Microsoft. The Windows Phone 7 operating system will be launched soon, but will not be able to process HTML5 or Flash.

Another point. I suppose that the big CMSs will be changed radically by this new standard.

To provide a different perspective on .html. I do run a small online publishing site. I've really had to learn a fair amount about .css, .html, and other technical aspects. It's not easy keeping up with the flow of information and developments with coding.

I looked at the list of sites using HTML5 markup and it's hard to notice much of a difference from a purely visual perspective.

I don't believe that HTML5 is as much of an issue as CSS3 is. Yet neither is going to be widely adopted immediately, in particular because Internet Explorer remains the world's dominant browser and isn't able to handle large parts of the new technology.

Microsoft is certainly doing well with the latest version, Internet Explorer 9, and the CSS doesn't seem to be an issue that is bothering them. They are getting the bugs out.

With HTML5, many of the new features constitute threats on their own, due to how they increase the number of ways an attacker could harness the user's browser to do harm of some sort.

Wow, now I have to learn the new HTML? I just learned how to write articles, and now it's changing! Just my luck! Great post, very informative!

I still think IE 9 is sluggish compared to Firefox. Also, with regard to the other comment on HTML5 - I'm certain this will be used extensively and be very popular with most sites in the future. As a programmer I think the next BIG wave of technologies will be HTML5, Ajax, and mobile marketing/technologies.

Thanks for the post, I am starting to use some of these in my site-building. It's a shame that so many people still stay with IE - I stopped using it a long time ago because it is not very html/css friendly anymore. But unfortunately we still have to cater to Microsoft when making websites because so many people use it.

As more journalists are publishing online it is important for them to understand basic HTML commands. We understand that HTML 5 is going to be capable of doing wonders, but for journalists it is best to focus on what is going to help their readers. This post does a great job at providing that.

I'm actually looking forward to HTML5 being fully integrated. Granted, it's a dumbed down version of how it was written before, but it does make things faster and easier. The <canvas> and <video> tags are definitely nice.
Thanks for the post, I didn't know about some of the html expressions.

Super post. I am clued up a little with HTML, basically as you say because it has become part of everyday life writing for my blog. I'm excited for the changes in HTML5 and perhaps am in a position now to make the most of it.

I do not think that a journalist has to learn to write programs in HTML5. But it is true that HTML5 is a fantastic development and it seems that all programs will run in the browser in the near future.

Update: HTML5 has so many useful features it's crazy. I know of a web designer who said JavaScript will soon be replaced by it. Even Google showed the power of HTML5 when they had the balls that would move when your mouse neared them. It almost looked like Flash! Unreal!

HTML5 will change the mobile Web. Magazines will have to start considering iPad apps or HTML5-ready websites if they still want to be on the market in 10 years. Anybody tested HTML5 on IE9 yet?

Just stumbled across your post. I'd honestly never thought about who formats articles on newspaper websites like the Guardian et al. Great if journos are actually submitting articles with their formatting, links etc. already in place, and even better if they're doing it in HTML5.

P.S. Lol at your CAPTCHA! Love it.

Although most of these new tags look very useful, I wonder about backward compatibility. I mean, how would you deal with the many users who haven't yet installed the newest browsers that support HTML5?

So happy to see even journalists are starting to use some new tech like HTML5. But sadly many ICPs (internet content providers) don't know a thing about it or won't bother to adapt. How many times have we experienced "your device doesn't support Flash, please download the latest version from Adobe, blablablah"? Yup, you know I'm talking about my poor iPhone.

Hi. Thanks for this great blog.
Just one thing on <wbr>: it is something really special, as it has nothing to do with hyphenation.
<wbr> (word break) means: "The browser may insert a line break here, if it wishes." If the browser does not think a line break is necessary, nothing happens. Its intent is to allow word-breaking without showing any ‘-’, for example in veryveryveryverylongwords or URLs. That is unlike &shy;.
The ‘good’ way to get proper hyphenation in web browsers is to use &shy;, the ‘soft hyphen’, an invisible character that tells the browser it can break the word at this place, showing a ‘-’ at the end of the line.
For further reference, visit quirksmode.

I think HTML5 usage in mainstream media, and usage by journalists, should be held off until HTML5-compliant browsers are more prominent. For now, fall-back, or graceful degradation, needs to be used, as a lot of users still have outdated browsers that do not support all the HTML5 goodness.

Very useful post. Looks like it makes some powerful functions easily accessible to writers. I wonder how long the uptake curve will be though.

Thank you for this post. I'm trying to learn HTML5 at the moment and these tips are really useful for me. In my opinion, all the web will move to HTML5 in 2-3 years.

Thanks for this introduction to the new HTML5. I really love HTML5, the big problem is the browser support though. When even the W3C says to hold off on deploying HTML5 in websites why should we be implementing HTML5 right now?

As a few people have already mentioned, the problem is browser support, in particular Internet Explorer. Even though IE9 will be much improved it's only available on Vista upwards. That still leaves millions of XP users on versions 6-8. Ideally, IE9 should be a required automatic update for all versions of windows. Now that's wishful thinking!
