Controlled Vocabularies

“Whether you realize it or not, you’re already familiar with controlled vocabularies. The Library of Congress subject headings and Yahoo’s search criteria are a couple of examples. So, as you’ve probably guessed by now, controlled vocabularies are predetermined sets of terms that fit together to describe a specific domain such as kitchen appliances, nuclear engineering, or dirt biking.”

Controlled Vocabularies encompass a number of sub-types:

  • thesauri
  • spelling variations
  • semantic synonyms
  • broader terms
  • narrower terms
  • related terms
  • taxonomy
    • categories
    • facets
    • keywords
  • authority files
  • Stop words and others also belong somewhere in that mix.

    from [Cuisinarts, E-Commerce, and … Controlled Vocabularies] by Lou Rosenfeld
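
As a rough illustration of how the sub-types listed above fit together, the sketch below models a tiny thesaurus in Python. Everything in it is an assumption made for illustration: the `Term` class, the `STOP_WORDS` set, the `normalize` and `lookup` helpers, and the kitchen-appliance entries are invented here and do not come from the article quoted above. Each preferred term carries its variants (spelling variations and synonyms) plus broader, narrower, and related terms, and a reverse index maps any variant back to its preferred term.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    """One preferred term with the usual thesaurus relationships."""
    preferred: str
    variants: set[str] = field(default_factory=set)   # spelling variations, synonyms
    broader: set[str] = field(default_factory=set)
    narrower: set[str] = field(default_factory=set)
    related: set[str] = field(default_factory=set)


# Hypothetical stop words and entries, invented for this example.
STOP_WORDS = {"a", "an", "the", "of", "for"}

THESAURUS = {
    "food processor": Term(
        preferred="food processor",
        variants={"cuisinart", "food-processor", "foodprocessor"},
        broader={"kitchen appliance"},
        narrower={"mini chopper"},
        related={"blender"},
    ),
    "blender": Term(
        preferred="blender",
        variants={"liquidiser", "liquidizer"},
        broader={"kitchen appliance"},
        related={"food processor"},
    ),
}

# Reverse index: every variant, and the preferred term itself,
# points back to the preferred term.
VARIANT_INDEX = {
    variant: term.preferred
    for term in THESAURUS.values()
    for variant in term.variants | {term.preferred}
}


def normalize(query: str) -> str:
    """Lowercase the query and drop stop words."""
    words = [w for w in query.lower().split() if w not in STOP_WORDS]
    return " ".join(words)


def lookup(query: str) -> Term | None:
    """Return the thesaurus entry for a query, or None if it is not covered."""
    preferred = VARIANT_INDEX.get(normalize(query))
    return THESAURUS.get(preferred) if preferred else None
```

With these made-up entries, `lookup("The Cuisinart")` resolves to the "food processor" entry, whose `broader` set contains "kitchen appliance", while `lookup("toaster")` returns `None` because the term lies outside the vocabulary. A real vocabulary would need richer normalization (stemming, punctuation handling) than this sketch attempts.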

    [frequently asked questions about online searching]
    What is Controlled Vocabulary?
    What is the best way to understand controlled vocabulary?
    What are the terms used in the print materials?
    Do all databases use Controlled Vocabulary?
    What is the best way to use Controlled Vocabulary in searching?

    [Creating Controlled Vocabularies for image databases]
    Controlled vocabulary and thesauri creation for describing images in a database
    This site offers excellent links to sources of controlled vocabularies for many different fields of study. Much of the information explains how photographers can use the IPTC metadata in the digital image file “header” to get the most out of searches in existing image database and cataloging programs. It also includes articles on other issues of interest to those creating an image database (such as file naming considerations, image security, etc.).

    [WWW = Wealth, Weariness or Waste]
    Controlled vocabulary and thesauri in support of online information access
    D-Lib Magazine November 1998
    This article offers some thoughts on the problems of access to information in a machine-sensible environment, and the potential of modern library techniques to help in solving them. It explains how authors and publishers can make information more accessible by providing indexing information that uses controlled vocabulary, terms from a thesaurus, or other linguistic assistance to searchers and readers.

    [Vocabulary as a central concept in library and information science]
    The nature and role of vocabulary in information systems is examined. “Vocabulary” commonly refers to the stylized adaptation of natural language to form indexes and thesauri. Much of bibliographic access, filtering, and information retrieval can be viewed as matching or translating across vocabularies. Multiple vocabularies are simultaneously present. A simple query in an online catalog normally involves at least five distinct vocabularies: those of the authors; the cataloger; the syndetic structure; the searcher; and the formulated query.

    This paper delves deeply into the problems found in existing controlled vocabulary systems (e.g. Library of Congress). Very interesting reading.

    Further Reading

    There are related conversations about the relationships among controlled vocabularies, thesauri, taxonomies, etc. on the [Elegant Hack blog] and the [SIGIA-L mailing list].

    [Functions of a thesaurus / classification / ontological knowledge base (PDF)] has a list of functions a thesaurus should provide, written by Dagobert Soergel at the University of Maryland School of Information Studies.

    This reading gives a fairly complete list of functions that should convince anybody of the importance of studying classification. It starts with an overview and then gives details for each major functional area. A list of thesauri / classification schemes at the end illustrates further the practical importance of this topic.

    [Content Organization Methods Overview (.DOC)] is a short comparison and explanation of types of Controlled Vocabularies written by Andrew Otwell. It was written to help explain these options to a client, and is meant to be a basic overview of the topic.

    See also: