
Dominic Oldman: “SKOS is the obvious choice for representing our thesauri in semantic form”

June 20, 2011



Dominic Oldman is Deputy Head of the Information Systems department at the British Museum. He is Principal Investigator of the ResearchSpace project, a project funded by the Andrew W. Mellon Foundation aiming to develop a semantic research environment for the culture and heritage sector.

The PoolParty team had the chance to talk with Dominic about the importance of semantic technologies and thesauri (SKOS) in the cultural heritage sector, and about the British Museum's plans to integrate these technologies into its information systems.

What is the purpose of your thesaurus project?

The British Museum already uses thesauri as part of its collection record system. They include:

  • Object type (e.g. pin, cup)
  • Material (e.g. paper, stone)
  • Technique of manufacture (e.g. carved, incised)
  • Material Culture/Period (e.g. 13th dynasty, Late Minoan)
  • Ware (specialised thesaurus for pottery, e.g. Black Glaze Ware, Samian)
  • School (used for artworks, e.g. Italian, Aesthetic Movement)
  • Escapement type (specialist thesaurus for clocks and watches)
  • Subject (e.g. animal, acupuncture)
  • Ethnic Name (e.g. Aztec, Yoruba)
  • Place (with modern and archaic types)

These examples are typical for the cultural and heritage sector, but many organisations build their own vocabularies, which means that different terms can be used to describe the same type of object. The British Museum leads a cross-organisational project, ResearchSpace (www.researchspace.org), which aims to harmonise cultural data supplied by different organisations using the Resource Description Framework (RDF) standard. The project will use a high-level ontology, the CIDOC Conceptual Reference Model (CIDOC-CRM), to provide a common framework for all the imported data, but it also requires that terminology is harmonised. This means creating mapping links between terms in the different thesauri supplied by the users of ResearchSpace.

Why did you choose thesauri to organize your information? What kind of problems are you able to solve with this approach?

Thesauri allow museums to quickly locate and use the correct, precise terms for object records. Without them, the sheer number of terms held in the different thesauri would make it difficult for staff documenting the Museum’s collection to find the correct terms efficiently and accurately. The thesauri are used both to control data entry and to allow narrower-term searching of our data (for example, we can run searches such as ‘find all vessels’ without the word ‘vessel’ having to be present in the object descriptions). We can also retrieve the right records using synonym or near-synonym search terms. ResearchSpace will import collection records that are supported by controlled thesaurus terms, so although thesaurus management is not a key objective of the project, linking between terms in the different thesauri is.
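To make narrower-term and synonym searching concrete, here is a minimal Python sketch. It is not how the British Museum's collection system works. It assumes a tiny, invented thesaurus in which each concept has a preferred label (SKOS `prefLabel`), alternative labels (`altLabel`) and at most one broader concept (`broader`). The concept IDs, terms and record numbers are all hypothetical. A search resolves the term to a concept through either kind of label, then collects that concept plus everything narrower than it.

```python
from collections import defaultdict, deque

# Hypothetical toy thesaurus loosely modelled on SKOS: preferred label,
# alternative labels (synonyms) and one broader concept per entry.
CONCEPTS = {
    "vessel": {"pref": "vessel", "alt": [], "broader": None},
    "cup": {"pref": "cup", "alt": ["beaker"], "broader": "vessel"},
    "jar": {"pref": "jar", "alt": [], "broader": "vessel"},
    "amphora": {"pref": "amphora", "alt": [], "broader": "jar"},
    "pin": {"pref": "pin", "alt": ["dress pin"], "broader": None},
}

# Hypothetical object records indexed with controlled concept IDs, not free text.
RECORDS = {
    "rec-1": {"cup"},
    "rec-2": {"amphora"},
    "rec-3": {"pin"},
}


def narrower_index(concepts):
    """Invert the broader links into a concept -> narrower concepts map."""
    narrower = defaultdict(set)
    for cid, c in concepts.items():
        if c["broader"] is not None:
            narrower[c["broader"]].add(cid)
    return narrower


def resolve_label(label, concepts):
    """Match a search term against preferred or alternative labels (case-insensitive)."""
    needle = label.casefold()
    for cid, c in concepts.items():
        if needle == c["pref"].casefold() or needle in (a.casefold() for a in c["alt"]):
            return cid
    return None


def expand(cid, narrower):
    """Return cid plus all transitively narrower concepts (breadth-first)."""
    seen, queue = {cid}, deque([cid])
    while queue:
        for child in narrower[queue.popleft()]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def search(label):
    """Find records tagged with the matching concept or any narrower concept."""
    cid = resolve_label(label, CONCEPTS)
    if cid is None:
        return []
    wanted = expand(cid, narrower_index(CONCEPTS))
    return sorted(rec for rec, tags in RECORDS.items() if tags & wanted)
```

Here `search("vessel")` returns both the cup and the amphora records even though neither is tagged ‘vessel’, and `search("beaker")` finds the cup record through its alternative label. A real thesaurus may give a concept several broader terms, which this sketch does not model.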

What role do SKOS and/or Linked Data play in achieving your goals?

SKOS is the only well-established semantic standard for thesauri, and it is the obvious choice for representing our thesauri in semantic form. Storing data in RDF means that it can be easily linked, and this principle can also be applied to the controlled terms embedded in the different datasets. Mappings between terms in different thesauri can be used to strengthen the connections between data supplied by different organisations. If a search understands that different terms mean the same thing, or something similar, it can establish more relationships and allow scholars to find new and interesting pathways, or stories, through the data. We are also interested in extending our collection vocabularies to other internal information systems so that connections can be made between collection data and, say, the events data that we publish on our website. Establishing a consistent vocabulary across internal systems should improve the integration we can achieve when publishing to the web, and therefore improve our service to visitors.
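Cross-thesaurus mapping can be sketched in the same spirit. This is a hypothetical illustration, not the ResearchSpace design: the `bm:` and `org2:` identifiers and the records are invented, and mappings are held as simple (subject, property, object) triples using the SKOS mapping properties `exactMatch` and `closeMatch`. Following the SKOS Reference, both properties are treated as symmetric; `exactMatch` is followed transitively, while `closeMatch`, which is not transitive, is followed for at most one hop and only when the caller asks for near matches.

```python
from collections import defaultdict

EXACT = "skos:exactMatch"
CLOSE = "skos:closeMatch"

# Hypothetical mapping links between two organisations' thesauri.
MAPPINGS = [
    ("bm:samian-ware", EXACT, "org2:terra-sigillata"),
    ("bm:cup", CLOSE, "org2:drinking-vessel"),
]

# Hypothetical records, each indexed with its own organisation's term.
RECORDS = {
    "bm-rec-1": "bm:samian-ware",
    "org2-rec-7": "org2:terra-sigillata",
    "org2-rec-9": "org2:drinking-vessel",
}


def build_links(mappings):
    """Index mapping triples by property; both SKOS properties are symmetric."""
    exact, close = defaultdict(set), defaultdict(set)
    for subj, prop, obj in mappings:
        if prop == EXACT:
            table = exact
        elif prop == CLOSE:
            table = close
        else:
            continue  # ignore properties this sketch does not handle
        table[subj].add(obj)
        table[obj].add(subj)
    return exact, close


def linked_concepts(cid, mappings, allow_close=False):
    """Return cid, its transitive exactMatch closure and, optionally, its direct closeMatches."""
    exact, close = build_links(mappings)
    result, stack = {cid}, [cid]
    while stack:  # exactMatch is transitive: keep following links
        for nxt in exact[stack.pop()]:
            if nxt not in result:
                result.add(nxt)
                stack.append(nxt)
    if allow_close:  # closeMatch is not transitive: one hop from cid only
        result |= close[cid]
    return result


def search_across(cid, allow_close=False):
    """Find records from any organisation indexed with a linked concept."""
    wanted = linked_concepts(cid, MAPPINGS, allow_close)
    return sorted(rec for rec, term in RECORDS.items() if term in wanted)
```

A query on `bm:samian-ware` also returns the other organisation's ‘terra sigillata’ record, while `bm:cup` only reaches `org2:drinking-vessel` when `allow_close=True`. Keeping near matches opt-in is one way a researcher could control how loosely terms are linked; that choice is an assumption of this sketch.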

What are the most important values you generate for your stakeholders?

  1. To allow structured searching across multiple heritage-related data repositories, eventually, as we and other museums and galleries expose our data in semantic form.
  2. To provide enhanced data exploration and visualisation facilities for the curatorial community, by linking with other semantic data repositories, such as GeoNames.

What are the most important arguments for using Semantic Web standards and Linked Data, especially in the cultural heritage domain?

The British Museum is a museum of the world and collaborates with people and organisations to enhance our understanding of history and culture through objects. Bringing together data from different organisations can be expensive and time-consuming. Even when such projects deliver, their scope can be limited if the data standards they use are not accessible to all. The Semantic Web provides a framework for making data more accessible and easier to harmonise. It has the potential to unlock information that would be difficult to uncover using traditional data technologies, and it allows more people and organisations across the world to put the data to more uses than the Museum could alone.

What kind of applications can be built or have been built on top of your thesauri?

The ResearchSpace project aims to build terminology mapping tools that allow researchers to build mapping profiles to support searching for their particular research projects. These profiles will support a ResearchSpace semantic search tool. However, these tools would be independent of PoolParty.

Why did you choose PoolParty to manage your thesauri?

PoolParty is not currently used as the primary way to manage our thesauri but is part of the research and development being undertaken as the Museum moves towards semantic data. It is currently used for examining and experimenting with our thesauri and investigating ways of utilising semantic technology further.

How will you get your thesauri used, and how are you going to build an “eco-system” around your work?

This is still in planning.

Do you plan to publish your thesauri, or parts of them, on the LOD cloud? Under which licenses?

Yes, we hope to publish them in semantic form. The licence has yet to be agreed.

What are your future plans and next steps?

Hopefully, the British Museum will publish its collection records and thesauri as linked data. The ResearchSpace project will be developed over the next year into a working prototype with a view to then providing a full production system for the research community.
