British National Corpus

The British National Corpus (BNC) is a 100-million-word text corpus of samples of written and spoken English from a wide range of sources. The corpus covers British English of the late 20th century from a wide variety of genres, with the intention that it be a representative sample of spoken and written British English of that time.

The project to create the BNC involved the collaboration of three publishers (with the Oxford University Press as the lead collaborator, Longman and W. & R. Chambers), two universities (the University of Oxford and Lancaster University), and the British Library. The creation of the BNC started in 1991 under the management of the BNC consortium, and the project was finished by 1994. There have been no additions of new samples after 1994, but the BNC underwent slight revisions before the release of the second edition BNC World (2001) and the third edition BNC XML Edition (2007).

The BNC was the vision of computational linguists whose goal was a corpus of modern (at the time of building the corpus), naturally occurring language in the form of speech and text or writing that could be analyzed by a computer. Hence, it was compiled as a general corpus to pave the way for automatic search and processing in the field of corpus linguistics. One of the ways the BNC was to be differentiated from existing corpora at that time was to open up the data not just to academic research, but also to commercial and educational uses.

The corpus was restricted to just British English, and was not extended to cover World Englishes. This was partly because a significant portion of the cost of the project was being funded by the British government which was logically interested in supporting documentation of its own linguistic variety. Because of its potentially unprecedented size, the BNC required funds from the commercial and academic institutions as well. In turn, BNC data then became available for commercial and academic research.

...
Wikipedia