Project Overview

Project Proposal

Abstract

Metadata, or ‘data about data’, is a crucial component of information infrastructures, supporting the finding, retrieving and sharing of resources. In the social sciences and humanities, metadata helps researchers, educators, students and others to share data and information, and to collaborate in research, education and learning. Creating metadata is highly resource intensive, however, and metadata quality can suffer from significant technical problems. These issues are compounded by the fact that interoperability requires a standardized metadata format: many collections of digital resources are described using local and specialized standards, and their metadata must first be mapped and crosswalked to a standard format before it is useful in cyberinfrastructure. Taken together, these constraints create a significant ‘metadata gap’ between the number of digital resources that require standardized description for use in cyberinfrastructure and the number of people who can generate such descriptions. Machine-based approaches to metadata generation show promise but can run into difficulties when describing heterogeneous resources. Social classification techniques (tagging, folksonomies) are also promising, but they are uncontrolled and often non-technical in nature.

We propose to develop a novel method for automatically generating tags for digital resources. The method digs into metadata records and digital resource content, performing text analyses of the existing metadata for each resource, the content of the resource itself, and any further descriptive information that can be associated with the resource. The technique will be automatic and scalable and, through tag clustering, will permit resource discovery across multiple heterogeneous collections without first having to crosswalk metadata to a standard format. This should increase productivity and reduce the cost of metadata generation.
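
To make the idea concrete, the following is a minimal sketch of one way such tag generation and tag clustering could work. It is not the project's method: TF-IDF term weighting is swapped in here as a simple stand-in for the proposed text analyses, and the sample records, field names, stopword list and function names are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "and", "of", "the", "in", "on", "for", "to",
             "from", "with", "by", "about"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords and short tokens."""
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if len(t) > 2 and t not in STOPWORDS]


def record_text(record):
    """Flatten every string field of a record, whatever its local schema."""
    return " ".join(v for v in record.values() if isinstance(v, str))


def generate_tags(records, top_n=3):
    """Return {record_id: [tag, ...]}, ranking terms by a smoothed TF-IDF score."""
    tokens = {rid: tokenize(record_text(rec)) for rid, rec in records.items()}
    doc_freq = Counter()
    for toks in tokens.values():
        doc_freq.update(set(toks))
    n_docs = len(records)
    tags = {}
    for rid, toks in tokens.items():
        tf = Counter(toks)
        scores = {t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
                  for t, c in tf.items()}
        # Sort by descending score, then alphabetically, so output is deterministic.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        tags[rid] = ranked[:top_n]
    return tags


def cluster_by_tag(tags):
    """Group record ids under each shared tag (a simple inverted index)."""
    clusters = defaultdict(list)
    for rid in sorted(tags):
        for tag in tags[rid]:
            clusters[tag].append(rid)
    return dict(clusters)


# Hypothetical records from two collections with different local schemas.
records = {
    "dc:001": {"title": "Census of Ireland 1901",
               "description": "Household census returns for Dublin"},
    "local:17": {"ItemName": "Dublin census ledger",
                 "Notes": "Handwritten census ledger, 1911"},
    "local:42": {"ItemName": "Oral history interview",
                 "Notes": "Interview about farming in Kerry"},
}

tags = generate_tags(records)
clusters = cluster_by_tag(tags)
```

In this sketch the two census records end up under the same tag even though one uses title/description fields and the other uses ItemName/Notes, which illustrates discovery across collections without a crosswalk. A real system would need far more robust text analysis than this toy scoring.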

The broader impact of this proposed work lies in its novel approach to the design, production and use of interoperable resource descriptions in information infrastructure. This approach would improve federated-like discovery across heterogeneous repositories for humanities and social science researchers. The results of the project would likely extend to many other small and medium-sized multi-disciplinary metadata repositories that seek to add value for their users.
