
Corpus linguistics


Corpus linguistics is the study of language as expressed in corpora (samples) of "real world" text. The text-corpus method is a digestive approach that derives, from texts, a set of abstract rules governing a natural language, and explores how that language relates to other languages. Originally derived manually, corpora are now automatically derived from the source texts. Corpus linguistics proposes that reliable language analysis is more feasible with corpora collected in the field, in their natural contexts, and with minimal experimental interference.

The field of corpus linguistics features divergent views about the value of corpus annotation. At one end, John McHardy Sinclair advocates minimal annotation so as to allow texts to speak for themselves; at the other, the Survey of English Usage team (University College London) advocates annotation as allowing greater linguistic understanding through rigorous recording.

Some of the earliest efforts at grammatical description were based at least in part on corpora of particular religious or cultural significance. For example, Prātiśākhya literature described the sound patterns of Sanskrit as found in the Vedas, and Pāṇini's grammar of classical Sanskrit was based at least in part on analysis of that same corpus. Similarly, the early Arabic grammarians paid particular attention to the language of the Quran. In the Western European tradition, scholars prepared concordances to allow detailed study of the language of the Bible and other canonical texts.

A landmark in modern corpus linguistics was the publication by Henry Kučera and W. Nelson Francis of Computational Analysis of Present-Day American English in 1967, a work based on the analysis of the Brown Corpus, a carefully compiled selection of current American English, totalling about a million words drawn from a wide variety of sources. Kučera and Francis subjected it to a variety of computational analyses, from which they compiled a rich and variegated opus, combining elements of linguistics, language teaching, psychology, statistics, and sociology. A further key publication was Randolph Quirk's 'Towards a description of English Usage' (1960) in which he introduced The Survey of English Usage.
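Computational analyses of the kind described above typically begin with word-frequency counts and concordance lines (Key Word In Context, or KWIC). The following is a minimal Python sketch on a toy two-sentence sample. It does not reproduce the Brown Corpus or Kučera and Francis's actual procedure. The function names `tokenize`, `word_frequencies`, and `kwic`, as well as the simple letters-only tokenizer, are illustrative assumptions.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase everything and treat runs of letters/apostrophes as words.
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens: list[str]) -> Counter:
    # Count how often each word form (type) occurs among all tokens.
    return Counter(tokens)


def kwic(tokens: list[str], keyword: str, width: int = 3) -> list[tuple[str, str, str]]:
    # Key Word In Context: return (left context, keyword, right context) for each hit,
    # with up to `width` tokens on either side.
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


if __name__ == "__main__":
    sample = "The cat sat on the mat. The dog saw the cat."  # toy sample, not real corpus data
    tokens = tokenize(sample)
    freq = word_frequencies(tokens)
    print(freq.most_common(2))
    print(f"type/token ratio: {len(freq)}/{len(tokens)}")
    for left, kw, right in kwic(tokens, "cat", width=2):
        print(f"{left:>12} [{kw}] {right}")
```

A real corpus study would need richer tokenization rules (hyphens, numbers, punctuation) and would often add lemmatization or part-of-speech tags. The aligned KWIC output resembles the layout of a printed concordance.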


Wikipedia