Serial analysis of gene expression (SAGE) is a transcriptomic technique used by molecular biologists to produce a snapshot of the messenger RNA population in a sample of interest, in the form of small tags that correspond to fragments of those transcripts. Several variants have since been developed, most notably a more robust version, LongSAGE, as well as RL-SAGE and the most recent, SuperSAGE. Many of these improve on the original technique by capturing longer tags, which enables more confident identification of the source gene.
Briefly, SAGE experiments proceed as follows:
The output of SAGE is a list of short sequence tags and the number of times each tag is observed. Using sequence databases, a researcher can usually determine, with some confidence, from which original mRNA (and therefore which gene) a tag was extracted.
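The tag-and-count list described above can be illustrated with a minimal Python sketch. The tag sequences, the gene names and the tag_to_gene lookup table are all made up for illustration; a real analysis would build the lookup from a sequence database rather than a hand-written dictionary.

```python
from collections import Counter


def count_tags(tags):
    """Count how many times each tag sequence was observed."""
    return Counter(tag.upper() for tag in tags)


def annotate(counts, tag_to_gene):
    """List (tag, count, gene) rows, most frequent tag first.

    Tags missing from the lookup table get None as their gene.
    """
    return [(tag, n, tag_to_gene.get(tag)) for tag, n in counts.most_common()]


# Made-up 10 bp tags standing in for the reads of one experiment.
observed_tags = [
    "AAACCCGGGT", "TTTGGGCCCA", "AAACCCGGGT",
    "AAACCCGGGT", "GATTACAGAT", "TTTGGGCCCA",
]

# Hypothetical lookup table (assumption); not taken from any real database.
tag_to_gene = {"AAACCCGGGT": "geneA", "TTTGGGCCCA": "geneB"}

counts = count_tags(observed_tags)
for tag, n, gene in annotate(counts, tag_to_gene):
    print(tag, n, gene or "unmatched")
```

A tag with no entry in the lookup is reported as unmatched rather than dropped, reflecting that a tag cannot always be assigned to a gene.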
Statistical methods can be applied to tag and count lists from different samples in order to determine which genes are more highly expressed in one sample than in another. For example, a normal tissue sample can be compared against a corresponding tumour to determine which genes tend to be more (or less) active.
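As a rough illustration of such a comparison, the sketch below normalises counts to tags per million and applies a two-proportion z-test to each tag. The z-test is only one simple choice among many possible statistical methods and is used here as an assumption, not as the method prescribed for SAGE. The tag names and counts are invented, and no correction for multiple testing is applied.

```python
import math


def tags_per_million(count, total):
    """Scale a raw tag count by library size."""
    return count * 1_000_000 / total


def two_proportion_z(count_a, total_a, count_b, total_b):
    """z statistic for the difference between two tag proportions (A minus B)."""
    p_a = count_a / total_a
    p_b = count_b / total_b
    pooled = (count_a + count_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        return 0.0  # tag absent from both libraries (or present in every read)
    return (p_a - p_b) / se


def two_sided_p(z):
    """Two-sided p-value under the standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))


# Invented counts for two libraries of 10,000 tags each.
normal = {"tagA": 20, "tagB": 50}
tumour = {"tagA": 60, "tagB": 52}
normal_total = 10_000
tumour_total = 10_000

for tag in sorted(set(normal) | set(tumour)):
    n = normal.get(tag, 0)
    t = tumour.get(tag, 0)
    z = two_proportion_z(t, tumour_total, n, normal_total)
    print(tag,
          tags_per_million(n, normal_total),
          tags_per_million(t, tumour_total),
          round(z, 2),
          two_sided_p(z))
```

A positive z value here means the tag is proportionally more abundant in the tumour library; the normal approximation becomes unreliable for very low counts, where an exact test would be a safer choice.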
In 1979 teams at Harvard and Caltech extended the basic idea of making DNA copies of mRNAs in vitro to amplifying a library of such copies in bacterial plasmids. In 1982 and 1983, the idea of selecting random or semi-random clones from such a cDNA library for sequencing was explored by Greg Sutcliffe and coworkers and by Putney et al., who sequenced 178 clones from a rabbit muscle cDNA library. In 1991 Adams and co-workers coined the term expressed sequence tag (EST) and initiated more systematic sequencing of cDNAs as a project (starting with 600 brain cDNAs). The identification of ESTs proceeded rapidly, and millions of ESTs are now available in public databases (e.g. GenBank).
In 1995, the idea of reducing the tag length from between 100 and 800 bp to between 10 and 22 bp helped reduce the cost of mRNA surveys. In the same year, the original SAGE protocol was published by Victor Velculescu at the Oncology Center of Johns Hopkins University. Although SAGE was originally conceived for use in cancer studies, it has been successfully used to describe the transcriptome of other diseases and of a wide variety of organisms.