Serial analysis of gene expression

Serial analysis of gene expression (SAGE) is a technique used by molecular biologists to produce a snapshot of the messenger RNA population in a sample of interest, in the form of small tags that correspond to fragments of those transcripts. The original technique was developed by Dr. Victor Velculescu at the Oncology Center of Johns Hopkins University and published in 1995. Several variants have been developed since, most notably LongSAGE (a more robust version), RL-SAGE and, most recently, SuperSAGE. Many of these improve on the original technique by capturing longer tags, which enables more confident identification of the source gene.

Briefly, SAGE experiments proceed as follows:

The output of SAGE is a list of short sequence tags and the number of times each tag is observed. Using sequence databases, a researcher can usually determine, with some confidence, from which original mRNA (and therefore which gene) a tag was extracted.
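The tag-and-count output described above can be illustrated with a minimal Python sketch. It assumes the tags have already been extracted as plain strings; the function names, the tag sequences and the tag-to-gene table in the usage note are hypothetical stand-ins for a real sequence database lookup, not part of any SAGE software.

```python
from collections import Counter


def count_tags(tags):
    """Return a Counter mapping each tag sequence to the number of times it was observed."""
    return Counter(tags)


def annotate(tag_counts, tag_to_gene):
    """List (tag, count, gene) triples, most frequent tag first.

    tag_to_gene is a hypothetical lookup table standing in for a
    sequence database; tags without a match get None as the gene.
    """
    return [
        (tag, n, tag_to_gene.get(tag))
        for tag, n in tag_counts.most_common()
    ]
```

For example, calling `annotate(count_tags(observed), lookup)` with a list of made-up tags and a small dictionary such as `{"AAACCCGGGT": "gene_A"}` yields one row per distinct tag, with `None` marking tags that could not be assigned to a gene.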

Statistical methods can be applied to the tag and count lists from different samples to determine which genes are more highly expressed in one sample than in another. For example, a normal tissue sample can be compared against a corresponding tumour to determine which genes tend to be more (or less) active.
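As a rough illustration of comparing two tag lists, the sketch below scales each sample's counts to tags per million and reports a log2 ratio per tag. This is a descriptive comparison only, not one of the statistical tests used in practice; the function names and the pseudocount of 1.0 are assumptions made for the example.

```python
import math


def tags_per_million(counts):
    """Scale raw tag counts so that each sample sums to one million tags."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n * 1_000_000 / total for tag, n in counts.items()}


def log2_fold_changes(sample_a, sample_b, pseudocount=1.0):
    """Return log2(abundance in b / abundance in a) for every tag seen in either sample.

    The pseudocount (an assumed value) avoids division by zero for
    tags that are absent from one of the two samples.
    """
    tpm_a = tags_per_million(sample_a)
    tpm_b = tags_per_million(sample_b)
    return {
        tag: math.log2(
            (tpm_b.get(tag, 0.0) + pseudocount)
            / (tpm_a.get(tag, 0.0) + pseudocount)
        )
        for tag in set(tpm_a) | set(tpm_b)
    }
```

With `sample_a` holding counts from normal tissue and `sample_b` counts from the tumour, a positive value suggests a tag is more abundant in the tumour and a negative value suggests it is less abundant. Deciding whether such a difference is significant still requires a proper statistical test.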

Although SAGE was originally conceived for use in cancer studies, it has been successfully used to describe the transcriptomes of other diseases and of a wide variety of organisms.

In 1979, teams at Harvard and Caltech extended the basic idea of making DNA copies of mRNAs in vitro to amplifying a library of such copies in bacterial plasmids.

In 1982-83, the idea of selecting random or semi-random clones from such a cDNA library for sequencing was explored by Greg Sutcliffe and coworkers, and by Putney et al., who sequenced 178 clones from a rabbit muscle cDNA library.

In 1991 Adams and co-workers coined the term Expressed Sequence Tag (EST) and initiated more systematic sequencing of cDNAs as a project (starting with 600 brain cDNAs). The identification of ESTs has proceeded rapidly, with approximately 72.6 million ESTs now available in public databases (e.g. GenBank 11 May 2011, all species).

...
Wikipedia