"1267 and all that..." - John Sheridan on turning UK law into linked data at Online Information

by Martin Belam, 8 December 2010

Last week I was presenting at the Online Information conference in London, and I've been blogging my notes from some of the other sessions I attended. On Monday I blogged about social media in the enterprise featuring talks by Gordon Vala-Webb, Helen Clegg, Hugo Evans & Angela Ashenden. Yesterday I shared my notes from a talk about optimising site and enterprise search by Iain Fletcher.

Today I want to blog about one of the most interesting and entertaining sessions I saw at Online Information. John Sheridan was talking about "parsing data structure from legislation". Not, it must be said, for most people, a promising topic. In fact, via Twitter, the Conservatives' head of digital Samuel Coates confessed he was "impressed he made that interesting".

His talk was based on the work done at legislation.gov.uk, which has digital versions of British law from 1267 to the present day. Which, he said, had given them "some formatting issues over the years".

These days, it takes 14 years of legal training before you are allowed to write legislation, with good reason. There are rules around the judicial weight given to individual points in an act of law, depending on the structure. As Sheridan put it: "Law is fantastically well architected information". Even the use of quotes is enormously precise, and the typographical layout conveys some of the legal meaning. Sheridan asked: "Wouldn't it be great if we could start to squeeze out of the statute books some of that structure?"

The task before them therefore was to try and take that written word and turn it into linked data with a clear semantic model. It is a very complex and rich set of information to try and represent as pure data. What I particularly liked was the way they had structured their URLs. It is common in law for a new Act to insert some text into the body of a previous one. This gives a versioning problem. As John Sheridan put it: "The statute book has known pasts, known futures, and unknown futures. All at the same time".

Their chosen solution is to append things to the end of the URL to represent the different states. Adding /proposed will give a version of the known future, whilst adding a date on the end will show how the text of an Act will appear at that point in time.

The example Sheridan was using here was the Academies Act 2010, which will add some lines to the 1993 Charities Act on the first day of next year. The original version of the Charities Act as passed in 1993 appears at legislation.gov.uk/ukpga/1993/10/contents/enacted, the current text at legislation.gov.uk/ukpga/1993/10/contents, and the future revised version at legislation.gov.uk/ukpga/1993/10/contents/2011-01-01. It seemed a very elegant solution.
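To make the pattern concrete, here is a minimal sketch in Python of how those three URLs could be built from one Act identifier. This is only my illustration of the scheme as described in the talk, not legislation.gov.uk's own code: the function name act_contents_url and the constant CHARITIES_ACT_1993 are made up for the example, and I have only covered the "enacted" and dated suffixes shown above.

```python
from datetime import date
from typing import Union

# Host as written in the examples above (no scheme prefix).
BASE = "legislation.gov.uk"

# Hypothetical constant: the path segment identifying the Charities Act 1993.
CHARITIES_ACT_1993 = "ukpga/1993/10"


def act_contents_url(act_path: str, version: Union[str, date, None] = None) -> str:
    """Build the contents URL for an Act (hypothetical helper).

    version=None        -> the current text
    version="enacted"   -> the text as originally passed
    version=date(...)   -> the text as it stands on that date
    """
    url = f"{BASE}/{act_path}/contents"
    if version is None:
        return url
    if isinstance(version, date):
        # Dates are appended in ISO form, e.g. 2011-01-01.
        return f"{url}/{version.isoformat()}"
    return f"{url}/{version}"


print(act_contents_url(CHARITIES_ACT_1993, "enacted"))
# legislation.gov.uk/ukpga/1993/10/contents/enacted
print(act_contents_url(CHARITIES_ACT_1993))
# legislation.gov.uk/ukpga/1993/10/contents
print(act_contents_url(CHARITIES_ACT_1993, date(2011, 1, 1)))
# legislation.gov.uk/ukpga/1993/10/contents/2011-01-01
```

The point of the design is that the Act keeps one stable identifier, and the version is just an optional extra on the end, so a link without a suffix always means "the law as it stands now".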

If you are of that mind, you can see how cross-referencing and versioning lend themselves very well to some of the metaphors we use in computing, and to the open linked data principles in particular. These rely on permanent unique URIs for things - in this case documents, sub-sections of documents, and definitions. The Academies Act 2010 relies, for example, on definitions set out in the 2006 Companies Act.

Next...

John Sheridan said something else that resonated with the theme of the whole linked data track - "We are governed by data". In my next blog post about Online Information I'll have my notes from a session I watched concerning linked data and local government.
