Let a million data structures bloom...

by Martin Belam, 9 September 2010

I didn't go to dConstruct this year, but it has certainly made plenty of ripples. And there was one slide in particular which I thought I should respond to...

And rather like Cennydd's response to that UX debate on Monday, my response might just surprise you.

Because I pretty much agree with 99% of everything Tom Coates said in Brighton.

He made a passionate argument that the networking protocols that have brought us the Internet are the equivalent of investing in massive infrastructure projects like roads. It isn't the road that transforms society, but what the traffic on it enables. And the traffic on the network is increasingly not web 'pages', but data built up from the way people use services, both while browsing the web and while using data-enabled devices in the physical world.

But "Death to the semantic web" is certainly an eye-catcher of a slide.

I wouldn't go that far, but one thing that has concerned me about the Linked Data ecosphere is that it is mostly driven by academic research rather than real business applications. That creates an expectation that everything will be perfectly marked up, be correct, validate 100% against the standard, and be free from spam and malicious intent.

Well, I don't know about you, but I haven't seen that hope come true anywhere else on the web, so I don't see why it should start now.

What does work on the web is...well...things that work.

Some data standards just have to be 'good enough' rather than perfect. Tom cited Lanyrd as an example of something that could be put together rapidly using loose data exchange on the network, building services on top of services. The work Glenn Jones has done on "Re-using data people have left around the web" is another example of building things on rough-and-ready, half-implemented microformats that don't require boffins and OWL and SPARQL and ontologies and all that jazz.

However, I don't see the necessity of Tom's distinction between 'the semantic web that must die', and 'the web of joined data' that he champions. There is no reason why we shouldn't be striving to produce structured linked data and RDF and semantic web technologies where appropriate. The more metadata the merrier. Nor is there any need to unpick the Linked Data cloud just because it doesn't feel agile enough.

[Image: Linked Data Universe, July 2009]

To my mind, the top-down, perfect linked data solution and the bottom-up re-use of data to build things greater than the sum of their parts will happily co-exist, in the same way that a service like Flickr combines industry-standard EXIF data, geodata and human-supplied 'folksonomy' tags to compile meaningful data about a photo. The EXIF data follows a standard, while the tags are subject to uncontrolled synonyms, abbreviations, colloquialisms and misspellings. But together they work.
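As a rough illustration of that mix of standard and messy metadata, here is a minimal Python sketch that combines standard fields (the EXIF tags Model and DateTimeOriginal, plus a latitude/longitude pair) with free-text tags normalised through a small synonym table. The photo record, the synonym table and the function names are all hypothetical; this is not a description of how Flickr actually works internally.

```python
# Hypothetical synonym table mapping messy human tags to a preferred form.
# This is an illustrative assumption, not Flickr's actual data.
SYNONYMS = {
    "nyc": "new york",
    "newyork": "new york",
    "big apple": "new york",
    "colour": "color",
}


def normalise_tag(tag: str) -> str:
    """Lower-case a free-text tag, collapse whitespace, map known synonyms."""
    cleaned = " ".join(tag.lower().split())
    return SYNONYMS.get(cleaned, cleaned)


def describe_photo(exif: dict, geo: tuple, tags: list) -> dict:
    """Merge standard fields (EXIF, geodata) with normalised folksonomy tags."""
    return {
        "camera": exif.get("Model"),
        "taken": exif.get("DateTimeOriginal"),
        "location": {"lat": geo[0], "lon": geo[1]},
        # A set collapses tags that normalise to the same value.
        "tags": sorted({normalise_tag(t) for t in tags}),
    }


record = describe_photo(
    {"Model": "Canon EOS 5D", "DateTimeOriginal": "2010:09:09 12:00:00"},
    (40.7128, -74.0060),
    ["NYC", "newyork", "  Big   Apple ", "taxi"],
)
print(record["tags"])  # ['new york', 'taxi']
```

The standard fields can be read directly, while the human tags need a cleaning step before they agree with each other; neither half is much use without the other.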

And that is, after all, the end goal - a web of data that works for end users, regardless of the technologies underpinning the data.

You can listen to Tom's excellent talk, and all the presentations at dConstruct, on their podcast page, and you can download a PDF of the original slides.

16 Comments

Leigh Dodds says (via Twitter): "One comment: I think the linkeddata approach is a bottom up one too, the emphasis is very different to previous semweb approaches"

I think the point that I was trying to make above all others was precisely what you've written above - I have no problem with Semantic Web technologies if people want to use them - I'm in favour of a Web of Data by any means necessary. What I think is wrong is the full aspirational goal of the Semantic Web or Giant Global Graph that TBL is advocating, and particularly the W3C's attempts to make any future 'web of data' one that is true to the Semantic Web. I think that the Semantic Web rhetoric colonises any attempts to talk about a connected web of data and services, and I think that's a genuinely bad thing. We need a more humble idea of the Semantic Web, as one of a competing range of technologies and ideas that are all being explored in order to reach a goal that is larger than any one set of technologies or ideologies.

I used to use the phrase 'Dirty Semantics' a lot: use what works for you, and over time the disparate parties will find the right way to derive value from them. The most important job is to explain the goals, possibilities and aspirations in a way that makes sense to individual organisations or people, and let them choose how to implement it. Standards emerge slowly, but that's okay: at least they'll be standards based on what works, rather than on the discourse of a community that is, frankly, rather dominated by library scientists. If the end result is RDF, OWL and SPARQL everywhere, then I'll be as happy as anyone. I don't suppose it will be, but surely the goal in this case is more important than the specifics of how we get there...

I totally agree with Tom's position. He is basically articulating (and reiterating) sentiments similar to those expressed here:

1. Data 3.0 Manifesto (decoupling Linked Data from RDF)

2. http://bit.ly/cAWhQ4 -- John F. Sowa's comments on there being nothing new about Linked Data, the Semantic Web, etc.

Another link apropos the theme of this post:

1. I Have Yet to Metadata I Didn’t Like -- a Mike Bergman post from a few weeks ago.

There's a difference between saying that the phrase 'Semantic Web' is dead and saying that the concept and goals of the Semantic Web are dead. I wrote an article about the need to rebrand the term Semantic Web: The Semantics of the Semantic Web: Don’t Confuse the Concept with the Movement.

The Internet is so fascinating because it is both organized and chaotic... innovations can come from the top, from the bottom and, in fact, from any axis surrounding the cloud. The most novice users and the most experienced programmers alike can create great and useful things with the data available.

This is the beauty of the net... all is alive, nothing is dead... yet.

I liken the internet to an ocean, and each site to a particular make of water filter.

Some water filters are slapped together fairly haphazardly and filter information coarsely, whereas others may be fine enough to catch individual bits of plankton, bacteria or precious metals. The person retrieving the filter at the end of the day knows what to expect, because they've molded the filter to their own specifications and tolerances.

They gather what they want, but a defect in the filter (a bug), or a catch the filter should never have let through (a hacker exploiting some vulnerability in the platform), may mean they also retrieve something they didn't intend to.

For a perhaps less watery analogy, consider Q&A sites such as Mahalo and Yahoo Answers. Mahalo offers an incentive plan in the form of virtual tips (Mahalo Dollars, which can be converted to actual currency at a specific exchange rate) to encourage healthy competition among answerers, whereas Yahoo offers a virtual points system with no incentive other than user recognition.

Mahalo's "filter" is the finer of the two in my opinion, and the information it obtains from the internet at large is more valuable to me as a user, so I for one would gravitate towards their system in search of an answer.

Both services make choices about how much functionality they expose to their users, and those decisions shape how much they appeal to those users, tempered by the various factors that can doom them to mediocrity, whether Yahoo's "kitchen sink" approach or Mahalo's capacity problems, particularly around the initial release of its Answers service.

Anyway, thanks for the enlightening post, I'm still trying to soak up all the terminology but I think I get the picture.

The line is truly an eye-catcher... killing the semantic web is a divisive statement. I have clicked on the link provided by Jeff, thanks for that supplement.

I agree with your sentiments, Martin. For me, the web is like an intricate, intertwining channel that hosts whatever we put into it.

Agreed, Tom. Great way to catch one's attention.
Darth - a little out there for me! Thanks.

The thing I like about this post is your comparison with what already works. Flickr is a great example of using the built-in metadata AND the data that we add as humans. I totally agree that having as much data as possible is going to be a good thing, and we shouldn't be wishing death on the semantic web.

The age-old problem is that academia, rather than business processes themselves, often becomes the vehicle for driving changes in technology. Just look at most technology: it follows a standardised approach, which business then has to adapt before it can operate in a live business environment.

Anyone finding semantic web technologies a little, shall we say, overwhelming might like to look at microformats: they re-use the data on the page as its own metadata, simply by applying HTML classes the way they were meant to be used, to classify content.
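To make that concrete, here is a toy Python sketch (standard library only) of how a consumer might pull hCard properties out of a page purely by their class names. The sample markup, the person and organisation names, and the simplified rules are assumptions for illustration; the real microformats parsing rules cover far more (nested microformats, values taken from href or title attributes, and so on).

```python
from html.parser import HTMLParser

# hCard class names this toy parser looks for (a small, assumed subset).
HCARD_PROPERTIES = {"fn", "org", "locality"}
# HTML void elements never get an end tag, so they must not go on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class HCardParser(HTMLParser):
    """Collect the text of elements whose class names an hCard property."""

    def __init__(self):
        super().__init__()
        self._stack = []   # property name (or None) for each open element
        self._chunks = {}  # property name -> list of raw text fragments

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        classes = (dict(attrs).get("class") or "").split()
        self._stack.append(next((c for c in classes if c in HCARD_PROPERTIES), None))

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Credit text to the innermost open element carrying a property class.
        for prop in reversed(self._stack):
            if prop:
                self._chunks.setdefault(prop, []).append(data)
                break

    def properties(self):
        """Return each property's text with whitespace collapsed."""
        return {k: " ".join("".join(v).split()) for k, v in self._chunks.items()}


# Hypothetical hCard markup: the visible text doubles as the metadata.
SAMPLE = """
<div class="vcard">
  <span class="fn">Jane Example</span>,
  <span class="org">Example Ltd</span>,
  <span class="adr"><span class="locality">Brighton</span></span>
</div>
"""

parser = HCardParser()
parser.feed(SAMPLE)
parser.close()
print(parser.properties())
# {'fn': 'Jane Example', 'org': 'Example Ltd', 'locality': 'Brighton'}
```

The point is that no separate metadata file is needed: the same text a human reads on the page is what the parser extracts, identified only by ordinary HTML class attributes.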

Again, I think it's important that people listen to what I said in context. I do think the Semantic Web as a concept should die, because it needs to be replaced with a new, similar, but non-prescriptive, collaboratively generated web of data instead. The term Semantic Web is so laden with meaning, and comes with such governance structures and constraints around it that it actually needs to be thrown away. It pollutes any debate that is happening in the world about making the web more joined up.

I definitely want a more joined up web, and I don't care if people want to use SemWeb technologies, but the dream needs to be separated from the specific ideologies and technologies of the SemWeb community if it's going to survive.

Tom, I think I get the crux of your argument. When you say non-prescriptive: it feels at times as though the semantic web has become, is becoming, or could become something of a self-fulfilling prophecy, and one that, as you quite correctly point out, is inherently self-limiting and/or self-defeating.
