Analysis and Indexing Methods
Contents
8.1. Research Comparing Automatic and Human Indexing.
8.2. Human Analysis and Indexing.
8.2.1. Cognition Versus Social Construction in Human Analysis and Indexing.
8.2.2. Human Indexing Rules.
8.2.2.1. Human Indexing Rules for Image Text.
8.2.2.2. Human Indexing Rules Based on Probabilistic Analysis.
8.3. Automatic Analysis and Indexing.
8.3.1. In the Beginning Was the Word.
8.3.2. Simple Keyword Indexing.
8.3.3. Negative Vocabulary Control: Stop Lists.
8.3.4. Counting Words.
8.3.5. Comparative Counting and Weighting.
8.3.6. Improving the Count: Stemming.
8.3.7. Natural Word Distributions.
8.3.8. Words Versus Phrases.
8.3.9. Managing Vocabulary in Automatic Indexing.
8.3.10. Automatic Vocabulary Management.
8.3.11. Clustering.
8.3.11.1. Latent Semantic Indexing.
8.3.12. Citation Indexes.
8.3.12.1. Bibliographic Coupling.
8.3.12.2. Co-Citation.
8.3.13. Relevance Feedback.
8.4. Subject Analysis and Indexing in Indexing and Abstracting Services.
8.5. Growing Role of Automatic Analysis and Indexing.
8.5.1. Censorship or Guidance?
8.6. Our Examples.
8.6.1. A Book Index.
8.6.2. An Indexing and Abstracting Service.
8.6.3. A Full-Text Encyclopedia/Digital Library.
● 1
● 2 human indexing versus automatic indexing
● 3 results of human indexing versus automatic indexing
● 4 multiple approaches to indexing in IR databases
● 5 automatic indexing of language texts versus image texts and other non-language texts
● 6 recommended resources on indexing processes
8.1. Research Comparing Automatic and Human Indexing.
● 7
● 8 role of users in IR research
● 9 variables in IR research
● 10 size of documentary units among variables in IR research
● 11 extent of indexable matter among variables in IR research
● 12 exhaustivity among variables in IR research
● 12a specificity among variables in IR research
● 13 browsability among variables in IR research
● 14 syntax among variables in IR research
● 15
● 16 vocabulary management among variables in IR research
● 17 surrogation among variables in IR research
● 18 conflation of variables in IR research
● 19 views of Cooper (William S.) on variables in IR research
● 20 conflation of variables in IR research
● 21 role of users in IR research at TREC
● 22 evidence from use of automatic indexing versus human indexing
● 23 user preferences for automatic indexing versus human indexing
● 24 effectiveness of automatic indexing
● 25 cost-benefit analysis of human indexing versus automatic indexing
● 26
8.2. Human Analysis for Indexing.
● 27 methods of human analysis for human indexing
● 28 cognitive processes in human indexing
● 29
● 30 role of documentary features in human indexing
● 31 cognitive processes in human indexing
● 32 analysis steps in human indexing
● 33 views of Mulvany (Nancy) on human indexing
● 34
● 35
● 36
● 37 cultural factors in human indexing versus automatic indexing
● 38 cultural factors in automatic indexing
● 39 views of Chan (Lois Mai) on human indexing
● 40
● 41
● 42 views of Chicago Manual of Style on human indexing
● 43 views of Fugmann (Robert) on human indexing
● 44 views of Soergel (Dagobert) on human indexing
● 45 views of Lancaster (F. W.) on human indexing
● 46 views of Fairthorne (Robert) on human indexing
● 47 views of O’Connor (Brian) on human indexing
● 48 views of Wellisch (Hans) on human indexing
● 49
● 50
● 51 views of Wilson (Patrick) on human indexing
● 52
● 53
● 54
● 55 concrete entity and event databases versus IR databases
● 56 views of Taylor (Arlene) on human indexing
● 57 views of Hjørland (Birger) on human indexing
● 58 activity theory: treatment of knowledge organization
● 59 paradigms of information science
● 60 role of domain analysis in information understanding
● 61 views of Hjørland (Birger) on nature of subjects
● 62
● 63 variability in human indexing
● 64 consistency in human indexing
● 65
● 66 inconsistency in searching
8.2.1. Cognition Versus Social Construction in Human Analysis and Indexing.
● 67 views of Frohmann (Bernd) on human indexing
● 68
● 69 views of Foskett (A. C.) on human indexing
● 70 views of Farradane (Jason) on human indexing
● 71
● 72 views of Beghtol (Clare) on human indexing
● 73 views of Anderson (James D.) on human indexing
● 74
● 75 views of Artandi (Susan) on human indexing
● 76 human indexing as model for automatic indexing
● 77 positive attributes of human indexing
● 78 application of views of Wittgenstein (Ludwig) to human indexing
● 79
● 80 application of views of Wittgenstein (Ludwig) to social construction of indexing rules
● 81 queer theory compared to indexing theory
● 82 queer theory
● 83 essentialism versus social constructionism in gender studies
● 84
● 85 role of gender in human indexing
● 86 social construction of gender
● 87 culture versus cognition in human indexing
● 88 views of Frohmann (Bernd) on social context of human indexing
● 89
8.2.2. Human Indexing Rules.
● 90 human indexing as two-step process
● 91 rules for analysis in human indexing
● 92 standards for analysis in human indexing: British and international
● 93 guidelines for analysis in cataloging and classification at Rutgers University
● 94 subjective nature of guidelines for indexing
● 95 views of Hjørland (Birger) on guidelines for indexing
● 96 relation of subject scope and documentary scope to rules for human indexing
● 97 specialized rules for human indexing
● 98 rules for indexing for MLA International Bibliography
● 99 rules for indexing about diesel engines by Ranganathan
● 100 role of specialized categories in human indexing
● 101 limitations of rules for human indexing
● 102 qualitative judgments in request-oriented human indexing
● 103 views of Frohmann (Bernd) on rules for human indexing
● 104 purposes of information retrieval for diverse users
● 105 domain analysis as basis for rules for human indexing
● 106 wants versus needs in information retrieval
● 107 political aspects of information retrieval
● 108 identification of non-topical features in human indexing; bibliographic coupling and co-citation as basis for indexing
● 109
8.2.2.1. Human Indexing Rules for Image Text.
● 110 views of Jorgensen (Corinne) on indexing of image texts
● 111 views of Pérez-López (Kathleen Golitko) on automatic indexing of image texts
● 112 recommended resources on human indexing of image texts
● 113 terminology for image texts and sound texts
8.2.2.2. Human Indexing Rules Based on Probabilistic Analysis.
● 114 views of Frohmann (Bernd) on rules of Cooper (William S.) for human indexing
● 115 views of Cooper (William S.) on human indexing
● 116 decision theory, utility theory, and gedanken experimentation in rules for human indexing
● 117
● 118
● 119 odds-payoff indexing chart
Figure 8.1. Cooper’s odds-payoff indexing chart: “Possible format for a graphic aid to gedanken indexers. The data are fictitious” (Cooper 1978, p. 117).
● 120
● 121
● 122
● 123
● 124 numerical values for decision making in human indexing
8.3. Automatic Indexing.
● 125 automatic indexing versus human searching
● 126 automatic indexing of language texts versus image texts and sound texts
● 127 indexing of image texts by AltaVista web search engine
● 128 theoretical models for automatic indexing: vector-space model, probabilistic model
● 129 language model for automatic indexing
● 130 recommended resources on automatic indexing
8.3.1. In the Beginning Was the Word.
● 131 definitions of words in automatic indexing
● 132 definitions of words in Chinese language
● 133 treatment of punctuation in automatic indexing
● 134 treatment of hyphens in automatic indexing
● 135 treatment of slashes in automatic indexing
● 136 treatment of underscores and full stops (periods) in automatic indexing
● 137 treatment of parentheses in automatic indexing
● 138 treatment of apostrophes in automatic indexing
● 139 treatment of numbers in automatic indexing
● 140
● 141
● 142 treatment of single characters in automatic indexing
● 143 definition of words in automatic indexing
● 144 treatment of upper- and lower-case letters in automatic indexing
8.3.2. Simple Keyword Indexing.
● 145
8.3.3. Negative Vocabulary Control: Stop Lists.
● 146 stop lists for reducing size of indexes
● 147 choice of words for stop lists
● 148 number of words in stop lists
● 149 negative vocabulary control
8.3.4. Counting Words.
● 150 use of frequency of words for ranking texts
8.3.5. Comparative Counting and Weighting.
● 151 inverse document frequency of words
● 152 calculation of document weights
● 153
● 154
8.3.6. Improving the Count: Stemming.
● 155 impact of stemming on frequency of words
● 156 identification of word roots in stemming
● 157 stemming of plural “s” suffixes
● 158 stemming of multiple suffixes
● 159 impact of stemming
8.3.7. Natural Word Distributions.
● 160 Zipf’s law on distributions of words in texts
● 161 identification of keywords based on transition points in Zipfian distributions
● 162 automatic indexing compared to human indexing
● 163 Zipfian distribution of words in article by Booth (A. D.)
● 164 transition point in Zipfian distribution of words
● 165 identification of keywords based on transition points in Zipfian distributions
● 166 effectiveness of keywords
● 167 keywords based on Zipfian distributions compared to human indexing
● 168 incompatibility of human indexing compared to automatic indexing
● 169 automatic indexing compared to human indexing
● 170
● 171
● 172
● 173
● 174
● 175
● 176
● 177
8.3.8. Words Versus Phrases.
● 178 importance of phrases in automatic indexing
● 179 proper nouns in indexing
● 180 cost versus benefits in identification of phrases in automatic indexing
● 181 identification of phrases in automatic indexing and in searching
● 182
● 183
● 184
● 185 identification of phrases in automatic indexing
● 186 role of phrases in browsing
● 187
● 188
● 189
8.3.9. Managing Vocabulary in Automatic Indexing.
● 190
● 191 positive vocabulary management in automatic indexing
● 192 vocabulary management of equivalent and synonymous terms
● 193 vocabulary management of minor terms
● 194 vocabulary management in automatic indexing
● 195 vocabulary management for displayed indexes
● 196 vocabulary management for electronic searching
● 197 addition of terms to thesauri in automatic indexing
● 198 bypassing vocabulary management in electronic searching
8.3.10. Automatic Vocabulary Management.
● 199
● 200 Associative Interactive Dictionary as example of automatic vocabulary management
● 201 identification of related terms by co-occurrence
● 202 ranking of related terms by frequency of co-occurrence
● 203
● 204
● 205
● 206
● 207
● 208 impact of automatic vocabulary management
8.3.11. Clustering.
● 209 definitions of classing and clustering
● 210 criteria for clusters
● 211 clusters in searching
● 212 document similarity as basis for clustering
● 213 types of clusters: string clusters
● 214 star clusters
● 215 clique clusters
● 216 clump clusters
Figure 8.2. Types of clusters, based on Salton (1975a).
● 217 thresholds in automatic clustering
● 218 automatic clustering techniques: static clustering, dynamic clustering, scatter-gather clustering
● 219 static clustering
● 220
8.3.11.1. Latent Semantic Indexing.
● 221
● 222 vocabulary management in latent semantic indexing
8.3.12. Citation Indexes.
● 223 citation links to older documents
● 224 citation indexes to newer documents
8.3.12.1. Bibliographic Coupling.
● 225 definition of bibliographic coupling
● 226 bibliographic coupling compared to co-citation
8.3.12.2. Co-Citation.
● 227 definition of co-citation; identification of research fronts by co-citation
8.3.13. Relevance Feedback.
● 228 feedback in automatic indexing and in searching
● 229 purpose of relevance feedback
● 230
● 231 procedures in relevance feedback
● 232
● 233 relevance feedback in selective dissemination of information and filtering
● 234 role of human searching behavior in automatic indexing
● 235 pseudo relevance feedback
8.4. Subject Analysis and Indexing in Indexing and Abstracting Services.
● 236
● 237
● 238 MedIndEx as example of expert system for subject analysis and indexing
● 239 use of checktags in subject analysis and indexing
● 240 computer-aided subject analysis and indexing for indexing and abstracting services
8.5. Growing Role of Automatic Analysis and Indexing.
● 241 allocation of automatic indexing versus human indexing
● 242 allocation of human indexing to important documents
● 243
● 244 use of human indexing for identification of useful documents
● 245 views of Bates (Marcia J.) on role of human indexing
● 246
● 247 criteria for allocation of human indexing
8.5.1. Censorship or Guidance?
● 248 measures of use versus censorship
● 249 expert judgment versus use in evaluation of importance
● 250 selection of useful documents by advisory groups and indexing staff
● 251 expert judgment versus user preferences in IR database design
● 252 expert judgment in indexing
● 253 role of human indexers in assessments of authority
● 254 identification of contributions by human indexers
● 255 discovery of controversial documents
● 256 inequality of documents
● 257 application of expert judgment to world-wide web and internet
● 258 machines versus humans in indexing
8.6. Our Examples.
8.6.1. A Book Index.
● 259
8.6.2. An Indexing and Abstracting Service.
● 260
8.6.3. A Full-Text Encyclopedia/Digital Library.