*** Welcome to piglix ***

Cambridge English Corpus


The Cambridge English Corpus (formerly the Cambridge International Corpus) is a multi-billion word corpus of English language (containing both text corpus and spoken corpus data). The Cambridge English Corpus (CEC) contains data from a number of sources including written and spoken, British and American English. The CEC also contains the Cambridge Learner Corpus, a 40m word corpus made up from English exam responses written by English language learners.

The Cambridge English Corpus is used to inform Cambridge University Press English Language Teaching publications as well as for research in corpus linguistics. Access is currently restricted to authors and researchers working on projects and publications for Cambridge University Press, and researchers at Cambridge English Language Assessment.

The Cambridge English Corpus contains instances of modern written English, taken from newspapers, magazines, novels, letters, emails, textbooks, websites, and many other sources.

The Cambridge English Corpus contains a wide variety of spoken English language, taken from many sources, including everyday conversations, telephone calls, radio broadcasts, presentations, speeches, meetings, TV programmes and lectures.

The Cambridge Learner Corpus (CLC) is a collection of exam scripts written by students learning English, built in collaboration with Cambridge English Language Assessment. The CLC contains scripts from over 180,000 students, from around 200 countries, speaking 138 different first languages and is growing all the time. The exams currently included are:

A unique feature of the Cambridge Learner Corpus is its error coding system. Language specialists identify and annotate errors in the exam scripts. This means that the Corpus can be used to find out about the frequency of different types of errors, the contexts that the errors are made in and the student groups that find particular language areas difficult.

Authors of Cambridge English Language Teaching resources can use this information to target common errors – for example, the Cambridge Advanced Learner’s Dictionary contains ‘Common mistake’ features which highlight frequent learner errors.


...
Wikipedia

...