Sketch Engine

[Screenshot: Sketch Engine concordance page]
Original author(s): Adam Kilgarriff, Pavel Rychlý
Developer(s): Lexical Computing Ltd.
Initial release: 23 July 2003
Development status: Active
Written in: C++, Python, JavaScript, jQuery
Operating system: Linux, Mac OS X
Platform: IA-32, x64 or IA-64
Standard(s): Unicode
Available in: 12 languages
Type: Corpus manager, database management system
License: Proprietary software; both commercial and freeware editions are available
Website: www.sketchengine.co.uk

Sketch Engine is a corpus manager and text analysis software developed by Lexical Computing Limited since 2003. Its purpose is to enable people studying language behaviour (lexicographers, researchers in corpus linguistics, translators or language learners) to search large text collections using complex, linguistically motivated queries. Sketch Engine takes its name from one of its key features, word sketches: one-page, automatic, corpus-derived summaries of a word's grammatical and collocational behaviour.
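To illustrate what a word sketch summarises, the toy Python sketch below groups the collocates of a headword by grammatical relation and ranks them with the logDice association score, which is widely associated with Sketch Engine. It is a simplified illustration under stated assumptions, not Sketch Engine's implementation: the (head, relation, collocate) triples are assumed to come from a corpus that has already been parsed, word frequencies are counted from the triples themselves, and the functions word_sketch and log_dice are hypothetical names.

```python
from collections import Counter, defaultdict
from math import log2


def log_dice(f_xy, f_x, f_y):
    """logDice score: 14 + log2(2 * f_xy / (f_x + f_y)); the maximum is 14."""
    return 14 + log2(2 * f_xy / (f_x + f_y))


def word_sketch(triples, headword, top_n=5):
    """Toy word sketch: collocates of `headword` grouped by grammatical relation.

    `triples` is a hypothetical list of (head, relation, collocate) tuples,
    assumed to come from an already parsed corpus. Frequencies f_x and f_y are
    counted from the triples themselves, a simplification for this sketch.
    """
    freq = Counter()
    for head, _rel, coll in triples:
        freq[head] += 1
        freq[coll] += 1
    # f_xy: how often each (relation, collocate) pair occurs with the headword
    pairs = Counter((rel, coll) for head, rel, coll in triples if head == headword)
    sketch = defaultdict(list)
    for (rel, coll), f_xy in pairs.items():
        score = log_dice(f_xy, freq[headword], freq[coll])
        sketch[rel].append((coll, round(score, 2)))
    # Highest-scoring collocates first within each relation
    return {rel: sorted(items, key=lambda item: -item[1])[:top_n]
            for rel, items in sketch.items()}


triples = [
    ("tea", "object_of", "drink"),
    ("tea", "object_of", "drink"),
    ("tea", "object_of", "make"),
    ("tea", "modifier", "green"),
    ("coffee", "object_of", "drink"),
]
print(word_sketch(triples, "tea"))
# {'object_of': [('drink', 13.19), ('make', 12.68)], 'modifier': [('green', 12.68)]}
```

logDice depends only on the pair frequency and the two item frequencies, not on corpus size, which keeps scores comparable across corpora of different sizes.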

Sketch Engine is a product of Lexical Computing Limited, a company founded in 2003 by the lexicographer and research scientist Adam Kilgarriff. Kilgarriff began collaborating with Pavel Rychlý, a computer scientist at the Natural Language Processing Centre at Masaryk University and the developer of Manatee and Bonito, and introduced the concept of word sketches.

Since then, Sketch Engine has been commercial software; however, all the core features of Manatee and Bonito that had been developed by 2003 (and extended since) are freely available under the GPL within the NoSketch Engine suite.

Sketch Engine consists of three main components: an underlying database management system called Manatee, a web-based search front end called Bonito, and a web interface for corpus building and management called Corpus Architect.

Manatee is a database management system specifically designed for efficient indexing of large text corpora. It is based on inverted indexing (keeping an index of all positions of a given word in the text) and has been used to index billion-word text corpora.
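The idea of inverted indexing can be sketched in a few lines of Python. This is a minimal in-memory illustration, not Manatee's actual storage format; build_inverted_index and concordance are hypothetical names. Because every position of a word is stored, its frequency is simply the length of its position list, and concordance lines can be assembled around each position without rescanning the text.

```python
from collections import defaultdict


def build_inverted_index(tokens):
    """Map each word form to the list of positions (token offsets) where it occurs."""
    index = defaultdict(list)
    for pos, word in enumerate(tokens):
        index[word].append(pos)  # positions arrive in increasing order, so lists stay sorted
    return dict(index)


def concordance(tokens, index, word, width=2):
    """Keyword-in-context lines built directly from the stored positions."""
    lines = []
    for pos in index.get(word, []):
        left = " ".join(tokens[max(0, pos - width):pos])
        right = " ".join(tokens[pos + 1:pos + 1 + width])
        lines.append(f"{left} [{word}] {right}".strip())
    return lines


tokens = "the cat sat on the mat".split()
index = build_inverted_index(tokens)
print(index["the"])                       # [0, 4]
print(concordance(tokens, index, "the"))  # ['[the] cat sat', 'sat on [the] mat']
```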

Corpora indexed by Manatee are searched with queries written in the Corpus Query Language (CQL).
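CQL queries describe sequences of tokens through constraints on positional attributes whose values are regular expressions; for example, [lemma="make"] [tag="N.*"] looks for a form of "make" followed by a token whose tag matches N.*. The toy Python sketch below evaluates an already-parsed query of this kind against an attribute index: each regular expression is tested once per distinct attribute value, the position lists of matching values are merged, and a match is a start position p such that token p + i satisfies constraint i. It is an illustration under stated assumptions, not Manatee's query engine, and it does not parse CQL syntax itself; the corpus, tags and the functions build_attr_index, positions_matching and query are hypothetical.

```python
import re
from collections import defaultdict


def build_attr_index(corpus):
    """Index: attribute -> value -> sorted list of token positions."""
    index = defaultdict(lambda: defaultdict(list))
    for pos, token in enumerate(corpus):
        for attr, value in token.items():
            index[attr][value].append(pos)
    return index


def positions_matching(index, attr, pattern):
    """Positions whose `attr` value fully matches the regex `pattern`.

    The regex is tested against each distinct value (the lexicon), not each
    token, and the position lists of the matching values are merged.
    """
    rx = re.compile(pattern)
    hits = set()
    for value, postings in index[attr].items():
        if rx.fullmatch(value):
            hits.update(postings)
    return hits


def query(index, constraints):
    """Start positions of token sequences matching a non-empty list of (attr, regex) pairs.

    E.g. [lemma="make"] [tag="N.*"] is passed as [("lemma", "make"), ("tag", "N.*")].
    """
    sets = [positions_matching(index, attr, rx) for attr, rx in constraints]
    return sorted(p for p in sets[0]
                  if all(p + i in s for i, s in enumerate(sets[1:], start=1)))


corpus = [
    {"word": "She", "lemma": "she", "tag": "PP"},
    {"word": "made", "lemma": "make", "tag": "VVD"},
    {"word": "tea", "lemma": "tea", "tag": "NN"},
    {"word": "and", "lemma": "and", "tag": "CC"},
    {"word": "makes", "lemma": "make", "tag": "VVZ"},
    {"word": "cakes", "lemma": "cake", "tag": "NNS"},
]
idx = build_attr_index(corpus)
print(query(idx, [("lemma", "make"), ("tag", "N.*")]))  # [1, 4]
```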

Manatee is written in C++ and has APIs for a number of other programming languages, including Python, Java, Perl and Ruby. More recently, it has been rewritten in Go for faster processing of corpus queries.


...
Wikipedia

...