*** Welcome to piglix ***

Subject indexing


Subject indexing is the act of describing or classifying a document by index terms or other symbols in order to indicate what the document is about, to summarize its content or to increase its findability. In other words, it is about identifying and describing the subject of documents. Indexes are constructed, separately, on three distinct levels: terms in a document such as a book; objects in a collection such as a library; and documents (such as books and articles) within a field of knowledge.

Subject indexing is used in information retrieval especially to create bibliographic indexes to retrieve documents on a particular subject. Examples of academic indexing services are Zentralblatt MATH, Chemical Abstracts and PubMed. The index terms were mostly assigned by experts but author keywords are also common.

The process of indexing begins with any analysis of the subject of the document. The indexer must then identify terms which appropriately identify the subject either by extracting words directly from the document or assigning words from a controlled vocabulary. The terms in the index are then presented in a systematic order.

Indexers must decide how many terms to include and how specific the terms should be. Together this gives a depth of indexing.

The first step in indexing is to decide on the subject matter of the document. In manual indexing, the indexer would consider the subject matter in terms of answer to a set of questions such as "Does the document deal with a specific product, condition or phenomenon?". As the analysis is influenced by the knowledge and experience of the indexer, it follows that two indexers may analyze the content differently and so come up with different index terms. This will impact on the success of retrieval.

Automatic indexing follows set processes of analyzing frequencies of word patterns and comparing results to other documents in order to assign to subject categories. This requires no understanding of the material being indexed. This therefore leads to more uniform indexing but this is at the expense of the true meaning being interpreted. A computer program will not understand the meaning of statements and may therefore fail to assign some relevant terms or assign incorrectly. Human indexers focus their attention on certain parts of the document such as the title, abstract, summary and conclusions, as analyzing the full text in depth is costly and time consuming An automated system takes away the time limit and allows the entire document to be analyzed, but also has the option to be directed to particular parts of the document.


...
Wikipedia

...