The Vertebrate Genome Annotation (VEGA) project is a biological database that helps researchers locate specific regions of the genome and annotate genes or regions of vertebrate genomes. The VEGA browser is built on Ensembl web code and infrastructure and gives the scientific community public access to manually curated annotation of known vertebrate genes. The VEGA website is updated frequently to keep its information on vertebrate genomes current, and it aims to present consistently high-quality annotation for all of its published vertebrate genomes and genome regions. VEGA was developed by the Wellcome Trust Sanger Institute and works closely with other annotation resources and groups, such as ZFIN (The Zebrafish Information Network), the Havana Group and GenBank. Manual annotation is currently more accurate than automated methods at identifying splice variants, pseudogenes, polyadenylation features, non-coding regions and complex gene arrangements.
The Vertebrate Genome Annotation (VEGA) database was first made public in 2004 by the Wellcome Trust Sanger Institute (WTSI). It was designed to display manual annotations of human, mouse and zebrafish genomic sequences, and it is the central repository where genome sequencing centers deposit their annotation of human chromosomes. Manual annotation of genomic data is extremely valuable for producing an accurate reference gene set, but it is expensive compared with automatic methods and has therefore been limited to model organisms. Annotation tools developed at the WTSI are now being used to fill that gap: because they can be used remotely, they make community annotation collaborations viable. The HAVANA and VEGA projects are run by Dr. Jennifer Harrow of the WTSI.
Since the original VEGA publication, the number of annotated human gene loci has more than doubled to over 49,000 (September 2012 release), of which over 20,000 are predicted to be protein-coding. As part of the Consensus Coding Sequence (CCDS) collaboration and the whole-genome extension of the ENCODE project, the Havana Group has fully manually annotated the human genome. This annotation is available in VEGA for reference, comparative analysis and sequence searches.