The English-Speaking Panel at IBERSID 2005 in Zaragoza

 by Martin Belam, 4 November 2005

I've been lucky enough this week to be in Zaragoza speaking at the University there, which has been hosting IBERSID 2005.

First off I must credit the patience / endurance of the audience, who listened to nearly two hours of talks on quite technical subjects, in a tongue other than their own, and secondly thank the organiser, Prof. Francisco Javier García Marco, who was so courteous and made us feel so welcome.

I formed part of a four-person session that had been put together as a programme by Dr. Alan Gilchrist, and I also had the benefit of seeing him present his paper.

Alan Gilchrist was presenting on the attempt by a small group in the UK to set a new standard for the construction of thesauri - BS8723 - to replace two existing British Standards established back in the 1980s, which take no account of the digital revolution. His working group have been consulting on the issue, as an increasing number of people and interest groups use thesauri in digital media, and there are now several software packages on the market aimed at people wishing to construct thesauri and taxonomies.

He made two points that stuck with me. The first was that a recent survey had found that, of the taxonomies being created in businesses today, around 14% are constructed by trained librarians and 45% by line managers. That is a very worrying trend for information professionals.

The second thing he said that stood out for me was his comment that in this kind of area a standard isn't strictly a standard:

"This isn't electric plugs and sockets. We are looking to build a consensus of best practice."

Speaking first in our English-language session was Jennifer Boyle from Scottish Natural Heritage. She presented a case study of how SNH had begun to implement the e-Government metadata standard.

The problems seemed to be the same ones I hear over and over again in case studies like this - legacy technology, a central solution that doesn't quite meet precise requirements (in this case the geo-spatial e-Government standards are not granular enough for their purposes), and a large number of staff who need to be brought around to a change in culture.

One thing that seemed to have worked well for them was that, as Jennifer put it, "instead of getting everyone into a room and talking about metadata all morning" they built a couple of demonstrations that showed how enhanced metadata would improve the information situation for everyone, and let the system's advantages sell themselves.
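To make that concrete, here is a minimal sketch (in Python, with invented records and field names - not SNH's actual system) of the kind of demonstration being described: the same handful of records searched as free text versus against structured metadata fields. e-GMS is a Dublin Core-based standard, so elements along the lines of subject and coverage are the sort of thing the metadata search would exploit.

    # Hypothetical records with e-GMS / Dublin Core style fields.
    RECORDS = [
        {"title": "Otter survey 2004", "subject": "otters", "coverage": "Skye"},
        {"title": "Site condition report", "subject": "peatland", "coverage": "Caithness"},
        {"title": "Otter habitat guidance", "subject": "otters", "coverage": "national"},
    ]

    def free_text_search(query):
        """Naive search: match the query anywhere in the title only."""
        return [r["title"] for r in RECORDS if query.lower() in r["title"].lower()]

    def metadata_search(subject=None, coverage=None):
        """Search against the structured metadata fields instead."""
        hits = RECORDS
        if subject:
            hits = [r for r in hits if r["subject"] == subject]
        if coverage:
            hits = [r for r in hits if r["coverage"] == coverage]
        return [r["title"] for r in hits]

    print(free_text_search("otter"))                           # both otter titles, no way to narrow by place
    print(metadata_search(subject="otters", coverage="Skye"))  # just the Skye survey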

Prof. Dennis Nicholson was on next, from the Centre for Digital Library Research.

Their research premise was that museums, libraries and collections all over the UK use different subject classification schemes. Everybody thinks being able to look material up across collections would be a great idea, but nobody appears willing to make the first move towards harmonising their subject vocabularies. In fact they are all looking for *someone else* to fix it.

Enter stage right HILT, or the High-Level Thesaurus Project. This is a system that takes the Dewey Decimal System as a classification spine, and maps subject vocabularies onto it. The example search was for 'teeth', which produced a choice between several Dewey categories. The user then selected one of these, and the system brought back a list of collections that had information classified under that category, allowing the user to choose where to carry out their search.

They are looking to develop a version of this as a SOAP service over the next 15 months.
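As I understood it, the basic lookup works roughly as follows. This is only an illustrative sketch - the terms, Dewey classes and collections are invented for the example, not taken from HILT's actual data or API - but it captures the 'spine' idea: a query term resolves to candidate Dewey classes, and each class resolves to the collections that hold material under it.

    # Local vocabulary terms mapped onto Dewey classes (the spine).
    TERM_TO_DEWEY = {
        "teeth": ["611 Human anatomy", "617.6 Dentistry"],
    }

    # Collections holding material classified under each Dewey class.
    DEWEY_TO_COLLECTIONS = {
        "611 Human anatomy": ["University medical library"],
        "617.6 Dentistry": ["Dental school archive", "Science museum collection"],
    }

    def lookup(term):
        """For each candidate Dewey class, list the collections to search in."""
        return {
            dewey_class: DEWEY_TO_COLLECTIONS.get(dewey_class, [])
            for dewey_class in TERM_TO_DEWEY.get(term.lower(), [])
        }

    # The user picks one of the Dewey classes offered, then chooses a
    # collection in which to carry out the actual search.
    for dewey_class, collections in lookup("teeth").items():
        print(dewey_class, "->", collections)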

Dennis did mention the potential costs in terms of maintenance overhead and constant mapping and re-mapping, and I couldn't help feeling that although this might prove useful in the short term, as an approach it simply papered over the cracks in the information infrastructure until there was a genuine will to fix the problem at source.

Professor Peter Enser followed with a fascinating insight into the problems of image retrieval. He illustrated the problem at the start of his presentation by demonstrating how you could retrieve exactly the same picture with seven vastly different queries, depending on whether you were requesting a generic or specific object, an abstract concept, a specific title, or something to which the image had a defined relationship.

He highlighted Blobworld as state-of-the-art, but even then demonstrated how an image query for more pictures of a zebra could return pictures of an elephant and a canal boat, because mathematically the images were rated as similar. In the end it was Peter who had the only heartening news from the panel - that human picture editors are going to be required to add context to images for a good while to come.
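A toy example may help show why purely mathematical similarity can go wrong like this. The sketch below is not Blobworld's actual method (which segments images into regions and compares colour and texture features); it just compares crude grey-level histograms, which is enough to show that two visually unrelated images can come out as 'close'.

    # A toy illustration of feature-based similarity: two images with similar
    # grey-level histograms rank as "close", even if one shows a zebra and
    # the other a canal boat.
    import numpy as np

    def grey_histogram(image, bins=8):
        """Normalised grey-level histogram of a 2-D image array (values 0-255)."""
        hist, _ = np.histogram(image, bins=bins, range=(0, 255))
        return hist / hist.sum()

    def histogram_distance(img_a, img_b):
        """Euclidean distance between the two images' histograms."""
        return float(np.linalg.norm(grey_histogram(img_a) - grey_histogram(img_b)))

    # Synthetic stand-ins: a striped "zebra" and a "canal boat" that happens
    # to be half dark hull, half white cabin - very different pictures,
    # nearly identical histograms.
    zebra = np.tile([0, 255], (64, 32))                          # alternating black/white stripes
    canal_boat = np.vstack([np.zeros((32, 64)), np.full((32, 64), 255)])

    print(histogram_distance(zebra, canal_boat))                 # ~0.0, i.e. "very similar"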

Out of the other presentations came a theme of what Alan Gilchrist called "disintermediation" in the information sphere. By this I understand he meant that the growth of digital technology had vastly increased the amount and complexity of the information available, yet at the same time Google and other search engines had removed the cataloguing and indexing middlemen (and women) and encouraged an approach of typing just one or two words and expecting precise results. In all of the English-language presentations I think there was a sense that an expectation had been set up - by e-Government, the BBC, or the UK museum industry - yet for the technology to work as expected it needed to rely on a fundamental metadata and data structure that simply isn't there yet.

I enjoyed the session, although as I mentioned, I thought the audience had a harder job on their hands than the speakers. I found the delegates I spoke to (with my two words of Spanish and their mostly immaculate English) very friendly.

1 Comment

Hmm. Information Architecture. Taxonomies. A tricky subject.

If you ask Clay Shirky, ontology is overrated anyway...
