The Legal Thesaurus Project
“Users of … intranets frequently express frustration with how much time it takes to find items—both when searching for known items and when browsing to see if items on a particular topic exist in the
system. . . Browsing and search functions are much enhanced if the indexing and topic hierarchy, or taxonomy, make sense to the user and are customized to reflect the content of the source
documents.” Jan Sykes, Information Management Services, February 2001
“Keyword search captures only 33% of relevant information.” Chris Wilkie, BBC Information and Archives, Sept. 2002
- Cost of finding (time, frustration)
- Cost of not finding (bad decisions)
- Cost of training (staff turnover)
- Value of discovery (related information, browsing)
- Language is ambiguous – synonyms, abbreviations, acronyms, misspellings, homonyms, antonyms, etc.
- What are the specific objectives of the project?
- Are essential objects hidden in a lot of chaff?
- Are a few good objects sufficient? Or is it necessary to find the best, the one that makes a difference, or everything on a topic?
Process of Thesaurus Design
- Understand user and organizational needs
- Define the subject scope (the Thesaurus scope)
- Identify sources of ‘raw’ vocabulary
- Harvest terms (wordstock) that are likely to be search terms in the field
- Group the terms into broad categories, subcategories and sub-subcategories
- Establish relationships
- Collect feedback and revise until stable
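The "establish relationships" step can be sketched in code. This is a minimal illustration, not the project's implementation: it assumes the conventional thesaurus relationship types (broader term, narrower term, related term, and USE references from non-preferred to preferred terms), which the outline above does not name explicitly. The class name `Thesaurus` and the sample legal terms are hypothetical.

```python
from collections import defaultdict


class Thesaurus:
    """Hypothetical minimal store for thesaurus relationships."""

    def __init__(self):
        self.broader = defaultdict(set)   # term -> broader terms (BT)
        self.narrower = defaultdict(set)  # term -> narrower terms (NT)
        self.related = defaultdict(set)   # term -> related terms (RT)
        self.use = {}                     # non-preferred term -> preferred term

    def add_hierarchy(self, broader, narrower):
        # BT and NT are reciprocal, so record both directions at once.
        self.narrower[broader].add(narrower)
        self.broader[narrower].add(broader)

    def add_related(self, a, b):
        # RT is symmetric.
        self.related[a].add(b)
        self.related[b].add(a)

    def add_synonym(self, non_preferred, preferred):
        self.use[non_preferred] = preferred

    def resolve(self, term):
        # Preferred terms resolve to themselves.
        return self.use.get(term, term)


# Illustrative terms only; they are not taken from the Legal Thesaurus.
t = Thesaurus()
t.add_hierarchy("Contracts", "Leases")
t.add_related("Leases", "Landlord and tenant")
t.add_synonym("Rental agreements", "Leases")
print(t.resolve("Rental agreements"))
```

Recording reciprocal links in one call keeps the two directions from drifting apart during the revise-until-stable step.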
Who are the users?
- How expert are they in the field?
- Do they understand the use of thesauri?
- Do they prefer natural or controlled vocabulary?
Scope of the Thesaurus
- Core topics immediately pertinent to the main subject (most terms, more specific)
- Fringe topics supportive of the main area (fewer terms, less specific)
- Which Display?
Literary vs. User Warrant
- Identify sources for the raw vocabulary
- Sample the literature that users read or otherwise consult
- Sample general literature (conference proceedings, core journals)
- Sample user questions
- Existing thesauri, dictionaries, encyclopedias
- Titles, abstracts of articles in the field
Refine the raw vocabulary: Recommendations
- Be sure terms are acceptable to users
- Distinguish among words with different meanings
- Give instruction, clarify how terms should be used, add scope notes
- Construct and identify facets
- Check and add relationships
- The more users are involved, the better the thesaurus suits actual use
- Take every opportunity to involve users
- Start from user search logs to find commonly used terms
- User experience focus groups
- Solicit community feedback
- Online discussion groups
- Term submissions
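Mining search logs for commonly used terms might look like the following sketch. It assumes the log is already available as a list of query strings; the function name `candidate_terms`, the `min_count` threshold, and the sample queries are all hypothetical.

```python
from collections import Counter


def candidate_terms(queries, min_count=2):
    """Return normalized queries seen at least min_count times, most frequent first."""
    # Lowercase and collapse whitespace so trivial variants count together.
    counts = Counter(" ".join(q.lower().split()) for q in queries if q.strip())
    return [(term, n) for term, n in counts.most_common() if n >= min_count]


# Illustrative log entries only.
log = ["Eviction", "eviction ", "lease  renewal", "Lease renewal", "eviction", "probate"]
print(candidate_terms(log))
```

Frequent queries are only candidates: each still needs review against the refinement recommendations above before it becomes a preferred or non-preferred term.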
- Searchers want to search multiple databases at once
- Indexers want to use a vocabulary they are familiar with to index objects in a different domain
- Content producers want to merge multiple databases indexed using different vocabularies
- User communities want a single thesaurus that spans multiple domains
- International organizations want a single vocabulary that supports searching in multiple languages
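One common way to serve these needs is a crosswalk that maps terms in one vocabulary to their equivalents in others. The sketch below is an assumption about how such a mapping could be represented, not a description of the project; the database names `db_a` and `db_b`, the function `translate`, and the terms are hypothetical.

```python
# Hypothetical crosswalk: local term -> equivalent terms per target vocabulary.
CROSSWALK = {
    "Leases": {
        "db_a": ["Leases"],
        "db_b": ["Rental agreements", "Tenancy"],
    },
}


def translate(term, target, crosswalk):
    """Return the target vocabulary's terms, or the term itself when unmapped."""
    return crosswalk.get(term, {}).get(target, [term])


print(translate("Leases", "db_b", CROSSWALK))
print(translate("Torts", "db_a", CROSSWALK))
```

Falling back to the original term keeps an unmapped query searchable, at the cost of possibly missing material indexed under a different term.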
Testing & Evaluation: Methods
- What are some useful criteria for evaluating a controlled vocabulary?
- Heuristic Evaluation: Evaluation by an expert or a panel of experts
- Affinity Modeling: Task a sample of users with organizing your terms and compare to your own organization of the terms
- Usability Testing: Holistic evaluation of the information system, including the content, interface, etc.
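For affinity modeling, one simple way to compare a user's grouping with your own is to count the term pairs on which the two groupings agree (both place the pair together, or both place it apart). This pairwise measure is the Rand index; choosing it here is an assumption, since the outline does not prescribe a metric. The function name and sample groupings are hypothetical.

```python
from itertools import combinations


def pair_agreement(ours, theirs):
    """Fraction of term pairs on which two groupings agree (the Rand index)."""
    terms = sorted(set(ours) & set(theirs))
    pairs = list(combinations(terms, 2))
    if not pairs:
        return 1.0
    agree = sum(
        (ours[a] == ours[b]) == (theirs[a] == theirs[b]) for a, b in pairs
    )
    return agree / len(pairs)


# Illustrative groupings: term -> category label.
ours = {"Leases": "Property", "Evictions": "Property",
        "Wills": "Estates", "Probate": "Estates"}
user = {"Leases": "Housing", "Evictions": "Housing",
        "Wills": "Family", "Probate": "Courts"}
print(pair_agreement(ours, user))
```

Category labels do not need to match between the two groupings; only which terms end up together matters.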
The Legal Thesaurus is displayed in several formats, including an alphabetical format and a hierarchical format. Preferred and non-preferred concepts are interfiled in the alphabetical listing. The two displays support the user in different ways. The alphabetical display lets the user look up a known concept to determine whether it is preferred or, if it is non-preferred, to find the corresponding preferred concept. The hierarchical display supports subject browsing, so the user can find the appropriate concept to assign to the material being indexed.
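The two displays can be sketched as follows. This is an illustration under stated assumptions, not the Legal Thesaurus's actual rendering: the "USE" reference wording, the two-space indentation, the function names, and the sample terms are all hypothetical.

```python
def alphabetical_display(preferred, use_refs):
    """Interfile preferred terms and non-preferred USE references in one A-Z list."""
    entries = [(term, term) for term in preferred]
    entries += [(np, f"{np} USE {p}") for np, p in use_refs.items()]
    return [line for _, line in sorted(entries, key=lambda e: e[0].lower())]


def hierarchical_display(narrower, roots, indent="  "):
    """List each root term with its narrower terms indented beneath it."""
    lines = []

    def walk(term, depth):
        lines.append(indent * depth + term)
        for child in sorted(narrower.get(term, [])):
            walk(child, depth + 1)

    for root in sorted(roots):
        walk(root, 0)
    return lines


# Illustrative terms only.
narrower = {"Contracts": ["Leases", "Warranties"], "Leases": ["Commercial leases"]}
preferred = ["Contracts", "Leases", "Warranties", "Commercial leases"]
use_refs = {"Rental agreements": "Leases"}

print("\n".join(alphabetical_display(preferred, use_refs)))
print("\n".join(hierarchical_display(narrower, ["Contracts"])))
```

Both displays are generated from the same underlying data, so a change to a term or relationship shows up consistently in each.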