Introduction to Classification
Humans have been classifying for as long as they have used written language. One of the earliest examples is the Pinakes at the Library of Alexandria. This short history takes us up to Ranganathan’s Colon Classification in the twentieth century and observes that the first enterprise use was by publishers of indexes. On the web we have seen the classification structures of the Yahoo directory and the Open Directory Project. Today, “Pattern matching is the basis for much of what occurs in these systems for rules based categorization.”
The paper argues that second-order properties, or metaproperties, are essential for the classification and navigation of information, for example in faceted classification and the navigation it generates. It observes that metaproperties are not well accommodated by standard schemes such as Z39.19, description logics (DLs), the ontology language OWL, and the formal ontologies BFO and DOLCE.
This is a very detailed article with diagrams. Classification specialists will be most interested.
The loose tagging that people do when indexing articles for personal use can be mined to reveal facets, as this article shows: “This paper illustrates how a facet analysis of a broad folksonomy based on the postulational approach can reveal underlying conceptual categories and facets to which the folksonomy’s aggregated tags belong. In this way, facet analysis techniques are used to manually expose a faceted classification ontology in the flat tag space, thus revealing user-generated relationships between information items.”
In case you are wondering, the postulational approach “to facet analysis refers to a methodology used for both the creation (by a classificationist) and subsequent usage (by classifiers) of a faceted classification scheme.”
The study used data from LibraryThing, a social networking site where people catalog their book collections. Analysis of the tags assigned to 76 history books revealed two universes, book and subject, and within these the facets were identified with the help of Ranganathan’s framework of Personality, Matter, Energy, Space and Time (PMEST).
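To make the idea concrete, here is a minimal sketch of what the output of such an analysis might look like once aggregated tags have been assigned to PMEST facets. The tag-to-facet assignments below are hypothetical and illustrative only; in the study those judgments were made manually by the analyst, not by code. Tags with no subject facet (such as a book-universe tag like “to-read”) fall into an “Unassigned” bucket.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-made assignments of subject tags to Ranganathan's PMEST
# categories. In a real facet analysis these are the analyst's judgments;
# this dict only stands in for that manual step.
SUBJECT_FACETS = {
    "napoleon": "Personality",
    "soldiers": "Personality",
    "weapons": "Matter",
    "warfare": "Energy",
    "revolution": "Energy",
    "france": "Space",
    "europe": "Space",
    "19th century": "Time",
    "medieval": "Time",
}

def facet_view(tagged_books):
    """Aggregate tag counts across all books, then group them by facet."""
    counts = Counter(tag.lower() for tags in tagged_books.values() for tag in tags)
    facets = defaultdict(dict)
    for tag, n in counts.items():
        # Tags the analyst has not placed in a subject facet stay unassigned.
        facets[SUBJECT_FACETS.get(tag, "Unassigned")][tag] = n
    return dict(facets)

# Hypothetical tagging data for two books.
books = {
    "Book A": ["Napoleon", "France", "19th century", "warfare"],
    "Book B": ["Europe", "revolution", "19th century", "to-read"],
}
view = facet_view(books)
for facet, tags in sorted(view.items()):
    print(facet, tags)
```

The flat tag space becomes a set of facet groups, which is the structure a faceted navigation interface could then expose to users.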
Of particular interest to librarians is this conclusion: “As will be discussed, the inclusion of users in a faceted classification may provide novel ways to personalize faceted navigation.”
Classify your content
A very readable, well-illustrated article on simple classification schemes for organizing content.
Text analytics to help in classification
Classification is essential, but the volume of content can overwhelm staff. Automated classification is therefore needed, and text analytics software can help.
“The latest advancements in text analytics use sophisticated techniques to determine the conceptual meanings within each file to compensate for shortcomings and extend the functionality of the applications that use policy rule engines. Use of text analytics greatly increases the accuracy of the classification by interpreting the meaning of terms in their context instead of being limited by the character strings inherent in policy rule engines.”
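For contrast, the sketch below shows the kind of character-string matching the quote attributes to policy rule engines. The categories and regular expressions are hypothetical and do not follow any particular product’s rule syntax. The point is the limitation: a rule fires on matching strings regardless of context, and misses documents that express the same idea in different words.

```python
import re

# Hypothetical policy rules: each category fires when its regular expression
# matches anywhere in the raw text. This only mimics string-based rule
# matching; it is not the rule language of any real product.
RULES = {
    "Confidential": re.compile(r"\b(confidential|do not distribute)\b", re.IGNORECASE),
    "Finance": re.compile(r"\b(invoice|balance sheet|quarterly results)\b", re.IGNORECASE),
    "HR": re.compile(r"\b(salary|performance review)\b", re.IGNORECASE),
}

def categorize(text):
    """Return, in sorted order, every category whose pattern matches the text."""
    return sorted(cat for cat, pattern in RULES.items() if pattern.search(text))

print(categorize("Quarterly results attached. Confidential."))      # ['Confidential', 'Finance']
# Same idea, different words: no rule fires, although this is an HR matter.
print(categorize("Her pay rise was approved after the appraisal."))  # []
# Right string, wrong context: a history text is filed under HR.
print(categorize("The salary of a Roman legionary"))                 # ['HR']
```

The last two calls show the two failure modes (a miss and a false positive) that context-aware text analytics is meant to reduce.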
Controlled vocabulary in folksonomies
This article answers the question “what can a thesaurus do for a folksonomy-based system?” It also makes clear how a folksonomy differs from a classification system.
“Folksonomy is a bottom-up approach where users themselves join the classification, compared to top-down taxonomy and library classifications. By this nature, folksonomy classification can reflect users’ actual interest in real time (Niwa et al., 2006). In contrast to hierarchical library classifications (e.g., DDC or LCC) and thesauri, there is usually no limit for choice of tags in folksonomy; so many similar tags are generated.”
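One thing a thesaurus can do about the many similar tags is map variant tags to a preferred term, in the spirit of a USE / USED FOR relationship, so that their counts are merged. The sketch below assumes a tiny, hypothetical thesaurus stored as a plain dictionary; real thesauri also carry broader, narrower, and related-term relationships that are not modeled here.

```python
from collections import Counter

# Hypothetical thesaurus fragment: each non-preferred (variant) tag points to
# its preferred term, like a USE reference in a controlled vocabulary.
USE = {
    "ww2": "world war, 1939-1945",
    "wwii": "world war, 1939-1945",
    "second world war": "world war, 1939-1945",
    "world war 2": "world war, 1939-1945",
    "bio": "biography",
    "biographies": "biography",
}

def normalize(tag):
    """Trim and lower-case a tag, then replace it with its preferred term if one exists."""
    t = tag.strip().lower()
    return USE.get(t, t)

def merged_counts(tags):
    """Count tags after normalization, so variant spellings collapse into one term."""
    return Counter(normalize(t) for t in tags)

raw = ["WW2", "wwii", "Second World War", "biography", "Bio", "churchill"]
counts = merged_counts(raw)
print(counts)
```

Six raw tags collapse into three terms, while tags the thesaurus does not cover (here “churchill”) pass through unchanged, so the folksonomy keeps its openness.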
Forrester paper on Information Classification
A free report from Forrester for information and knowledge management (I&KM) professionals. In addition to information retrieval, there are several critical reasons for classifying information, “such as ensuring security, implementing a retention policy, and optimizing the use of storage”. “Now is the time for I&KM professionals to sync up with security and IT operations professionals to identify and then augment existing classification policies. Create a classification template that meets the 80/20 rule, enabling all team members to quickly classify about 80% of information in the organization.”
This monograph, published by the University of Maryland, examines methods for exploratory web search, where the subject is complex or new to the searcher. It begins with theories of search, information retrieval, and information seeking, surveys existing information systems, and concludes with some thoughts on evaluation.
From the abstract:
“This monograph offers fresh ways to think about search-related cognitive processes and describes innovative design approaches to browsers and related tools. For instance, while key word search presents users with results for specific information (e.g., what is the capitol of Peru), other methods may let users see and explore the contexts of their requests for information (related or previous work, conflicting information), or the properties that associate groups of information assets (group legal decisions by lead attorney).”
The “search environments review” describes the value and use of a variety of classification methods (hierarchical, faceted, automatic clustering, and social tagging), with examples drawn from the public Web. These techniques help users refine or clarify a search.
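As an illustration of how faceted classification supports refining a search, here is a minimal sketch of faceted navigation over a small in-memory collection. The items, facet names, and helper functions are hypothetical; the sketch only shows the two basic operations such interfaces rely on: filtering by a selected facet value, and counting how many remaining items fall under each value of another facet.

```python
from collections import Counter

# Hypothetical collection: each item has a value for several facets.
ITEMS = [
    {"title": "Report A", "type": "report", "year": 2006, "topic": "taxonomy"},
    {"title": "Report B", "type": "report", "year": 2007, "topic": "search"},
    {"title": "Slides C", "type": "slides", "year": 2007, "topic": "taxonomy"},
    {"title": "Paper D", "type": "paper", "year": 2007, "topic": "taxonomy"},
]

def refine(items, **selected):
    """Keep only the items that match every selected facet value."""
    return [it for it in items if all(it.get(f) == v for f, v in selected.items())]

def facet_counts(items, facet):
    """Count how many of the given items fall under each value of a facet."""
    return Counter(it[facet] for it in items)

# Step 1: the user picks topic = taxonomy; the interface shows counts per type.
step1 = refine(ITEMS, topic="taxonomy")
print(facet_counts(step1, "type"))
# Step 2: the user narrows further to year = 2007.
step2 = refine(step1, year=2007)
print([it["title"] for it in step2])  # ['Slides C', 'Paper D']
```

Showing counts next to each facet value lets users see in advance how much a refinement will narrow the results, which is one reason faceted interfaces suit exploratory search.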
Next comes the matter of viewing the results, and of presenting them in ways that go beyond a linear list: 2D formats (treemaps, hyperbolic trees, scatter plots, cluster maps) or 3D (the Data Mountain browser).
Both sections give a good overview of the design principles and their relative merits.
Measuring the Success of a Classification System by Iain Barker, Boxes and Arrows (April 2007)
Barker adapted Donna Maurer’s work on evaluating card-based classification and applied it to show, quantitatively, the improvements gained from a new classification system for the company intranet.