Text Encoding Initiative


The Text Encoding Initiative (TEI) is a text-centric community of practice in the academic field of digital humanities, operating continuously since the 1980s. The community currently runs a mailing list and a series of meetings and conferences, and maintains an eponymous technical standard, a journal, a SourceForge repository and a toolchain.

The TEI Guidelines, which collectively define an XML format, are the defining output of the community of practice. The format differs from other well-known open formats for text (such as HTML and OpenDocument) in that it is primarily semantic rather than presentational; the semantics and interpretation of every tag and attribute are specified. Some 500 different textual components and concepts are defined (word, sentence, character, glyph, person, etc.); each is grounded in one or more academic disciplines, and examples are given.

The standard is split into two parts: a discursive textual description with extended examples and discussion, and a set of tag-by-tag definitions. Schemata in most of the modern formats (DTD, RELAX NG and W3C Schema) are generated automatically from the tag-by-tag definitions. A number of tools support the production of the Guidelines and their application to specific projects.

A number of special tags are used to circumvent restrictions imposed by the underlying Unicode: glyph, to allow representation of characters that do not qualify for Unicode inclusion, and choice, to overcome the required strict linearity.
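
As a rough illustration of how choice sidesteps strict linearity, the sketch below parses a small hand-written TEI fragment in which choice pairs an original spelling (sic) with an editorial correction (corr), and then linearises the paragraph either way. The sample sentence and the helper name read_text are assumptions made for this example; they are not taken from the Guidelines.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hand-written sample: one paragraph containing a single <choice>.
SAMPLE = (
    '<p xmlns="http://www.tei-c.org/ns/1.0">'
    "The <choice><sic>recieved</sic><corr>received</corr></choice> text."
    "</p>"
)


def read_text(elem, prefer):
    """Linearise elem, keeping only the `prefer` child of each <choice>.

    Hypothetical helper for this sketch, not part of any TEI tool.
    """
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == f"{{{TEI_NS}}}choice":
            # Keep the preferred alternative and skip the others.
            for alt in child:
                if alt.tag == f"{{{TEI_NS}}}{prefer}":
                    parts.append(read_text(alt, prefer))
        else:
            parts.append(read_text(child, prefer))
        # Text that follows the child inside the parent element.
        parts.append(child.tail or "")
    return "".join(parts)


paragraph = ET.fromstring(SAMPLE)
print(read_text(paragraph, "sic"))   # reading as found in the source
print(read_text(paragraph, "corr"))  # reading with the editor's correction
```

Both readings live in the same position of the same document, so a consumer chooses at processing time which one to present.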

Most users of the format do not use the complete range of tags but produce a customization, using a project-specific subset of the tags and attributes defined by the Guidelines. The TEI defines a sophisticated customization mechanism known as ODD for this purpose. In addition to documenting and describing each TEI tag, an ODD specification specifies its content model and other usage constraints, which may be expressed using Schematron.
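
To make the idea of a customization more concrete, the sketch below reads a minimal ODD-style schemaSpec and reports which modules it selects and which elements it removes. The fragment is a hand-written illustration (the ident value "my_project" and the particular modules are assumptions), and the Python only inspects the specification; it does not generate a schema, which in practice is the job of the TEI toolchain.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hand-written ODD fragment: four modules selected, one element deleted.
ODD_SAMPLE = """
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="my_project" start="TEI">
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
  <elementSpec ident="said" module="core" mode="delete"/>
</schemaSpec>
"""

spec = ET.fromstring(ODD_SAMPLE)

# Modules the customization pulls in.
modules = [m.get("key") for m in spec.findall(f"{TEI}moduleRef")]

# Elements explicitly removed from the resulting schema.
deleted = [
    e.get("ident")
    for e in spec.findall(f"{TEI}elementSpec")
    if e.get("mode") == "delete"
]

print("schema:", spec.get("ident"))
print("modules:", ", ".join(modules))
print("deleted elements:", ", ".join(deleted))
```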


Project                             URL                               Strengths
British National Corpus             http://www.natcorp.ox.ac.uk       100 million word snapshot of current English
Oxford Text Archive                 http://ota.ox.ac.uk/              >1 GB of Linguistic data and electronic texts in 25 languages
Perseus Project                     http://www.perseus.tufts.edu/     Greek and Latin texts
EpiDoc                              http://epidoc.sourceforge.net/    Epigraphy and Papyrology
Women Writers Project               http://www.wwp.northeastern.edu/  Early modern women writers (Margaret Cavendish, Eliza Haywood, etc.)
New Zealand Electronic Text Centre  http://www.nzetc.org/             New Zealand and Pacific Islands texts
The SWORD Project                   http://www.crosswire.org/sword/   Bible software, dictionaries, Christian literature
FreeDict                            http://freedict.org               Bilingual dictionaries
Text Creation Partnership           http://www.lib.umich.edu/tcp/     Early English and American books

  • 1987 Work on what would become the TEI was started by the Association for Computers and the Humanities, the Association for Computational Linguistics, and the Association for Literary and Linguistic Computing. This culminated in the Closing statement of the Vassar Planning Conference.
  • 1994 TEI P3 released, co-edited by Lou Burnard (at Oxford University) and Michael Sperberg-McQueen (then at the University of Illinois at Chicago, later at the W3C).
  • 1999 TEI P3 updated.
  • 2002 TEI P4 released, moving from SGML to XML; adoption of Unicode, which XML parsers are required to support.
  • 2007 TEI P5 released, including integration with the xml:lang and xml:id attributes from the W3C (these had previously been attributes in the TEI namespace), regularization of local pointing attributes to use the hash sign (as used in HTML) and unification of the ptr and xptr tags. Together with many other additions, these changes make P5 more regular and bring it closer to current XML practice as promoted by the W3C and as used by other XML variants. Maintenance and feature update versions of TEI P5 have been released at least twice a year since 2007.
  • 2011 TEI P5 v2.0.1 released with support for genetic editing (among many other additions, the genetic editing features allow encoding of texts without interpretation as to their specific semantics).
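
The P5 pointing convention mentioned in the 2007 entry can be sketched as follows: an element carries an xml:id, and a pointer elsewhere in the document refers to it with a hash followed by that identifier. The sample fragment, the identifier n1 and the helper name resolve are assumptions made for this illustration, and only local pointers within one document are handled.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hand-written sample: a pointer and the note it refers to.
SAMPLE = """
<div xmlns="http://www.tei-c.org/ns/1.0">
  <p>See the note<ptr target="#n1"/>.</p>
  <note xml:id="n1">A hand-written sample note.</note>
</div>
"""

root = ET.fromstring(SAMPLE)

# Index every element that carries an xml:id.
by_id = {e.get(XML_ID): e for e in root.iter() if e.get(XML_ID)}


def resolve(pointer):
    """Return the element a local '#id' pointer refers to (hypothetical helper)."""
    if not pointer.startswith("#"):
        raise ValueError("only local #id pointers are handled in this sketch")
    return by_id[pointer[1:]]


for ptr in root.iter(f"{TEI}ptr"):
    target = resolve(ptr.get("target"))
    print(ptr.get("target"), "->", target.text)
```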
...
Wikipedia
