Content | |
---|---|
Description | WormBase: a comprehensive resource for nematode research. |
Organisms | Caenorhabditis elegans |
Contact | |
Primary citation | PMID 19910365 |
Access | |
Website | http://www.wormbase.org/ |
WormBase is an online biological database about the biology and genome of the nematode model organism Caenorhabditis elegans and contains information about other related nematodes. WormBase is used by the C. elegans research community both as an information resource and as a place to publish and distribute their results. The database is regularly updated with new versions being released every two months. WormBase is one of the organizations participating in the Generic Model Organism Database (GMOD) project.
WormBase comprises the following main data sets:
In addition, WormBase contains an up-to-date searchable bibliography of C. elegans research and is linked to the WormBook project.
WormBase offers many ways of searching and retrieving data from the database:
Sequence curation at WormBase refers to the maintenance and annotation of the primary genomic sequence and a consensus gene set.
Even though the C. elegans genome sequence is the most accurate and complete eukaryotic genome sequence, it has continually needed refinement as new evidence has emerged. Many of these changes were single-nucleotide insertions or deletions; however, several large mis-assemblies have also been uncovered. For example, in 2005 a 39 kb cosmid had to be inverted. Other improvements have come from comparing genomic DNA to cDNA sequences and from analysis of high-throughput RNA-Seq data. When differences between the genomic sequence and transcripts are identified, re-analysis of the original genomic data often leads to modifications of the genomic sequence. These changes make it difficult to compare the chromosomal coordinates of data derived from different releases of WormBase. To aid these comparisons, a coordinate re-mapping program and data are available from:
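The idea behind coordinate re-mapping can be illustrated with a minimal sketch. It is not the WormBase re-mapping program and does not use its data format: the `Change` record and the `remap` function are hypothetical names, and the sketch assumes that the differences between two releases can be described as a list of insertions and deletions on one chromosome (inversions, such as the cosmid mentioned above, are not handled).

```python
from typing import Iterable, NamedTuple, Optional


class Change(NamedTuple):
    """One sequence change between two releases (hypothetical record).

    start   : 1-based old coordinate of the first affected base; for a pure
              insertion, the old base immediately after the inserted bases.
    old_len : number of bases removed from the old sequence.
    new_len : number of bases that take their place in the new sequence.
    """
    start: int
    old_len: int
    new_len: int


def remap(pos: int, changes: Iterable[Change]) -> Optional[int]:
    """Map a 1-based coordinate from the old release to the new one.

    Returns None when the position lies inside a region that was altered,
    because it has no unambiguous counterpart in the new sequence.
    """
    offset = 0
    for change in sorted(changes):
        if pos < change.start:
            break  # every remaining change lies downstream of pos
        last_old_base = change.start + change.old_len - 1
        if change.old_len > 0 and pos <= last_old_base:
            return None  # pos falls inside the replaced or deleted bases
        offset += change.new_len - change.old_len
    return pos + offset


changes = [
    Change(start=100, old_len=0, new_len=1),  # 1 bp insertion before base 100
    Change(start=200, old_len=2, new_len=0),  # 2 bp deletion of bases 200-201
]
print(remap(50, changes))   # upstream of all changes
print(remap(150, changes))  # shifted by the insertion
print(remap(200, changes))  # deleted base
print(remap(202, changes))  # shifted by both changes
```

In this sketch, positions upstream of every change are unchanged, positions downstream are shifted by the net length difference of all preceding changes, and positions inside an altered region are reported as unmappable rather than guessed.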
All the gene sets of the WormBase species were initially generated by gene prediction programs. These programs give a reasonable set of gene structures, but even the best of them predict only about 80% of complete gene structures correctly. They have difficulty predicting genes with unusual structures, genes with a weak translation start signal or weak splice sites, and single-exon genes. They can incorrectly predict a coding gene model where the gene is actually a pseudogene, and they predict the isoforms of a gene poorly, if at all.
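To make concrete what predicting a complete gene structure correctly means, the following sketch scores a predicted gene set against a curated one at the gene level: a gene counts as correct only if every exon boundary matches exactly. The function name, the gene identifiers, and the coordinates are invented for illustration; this is one common way to define such a measure, not necessarily the one behind the figure quoted above.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]            # (start, end) of one exon, 1-based inclusive
GeneSet = Dict[str, List[Exon]]   # gene identifier -> list of exons


def gene_level_accuracy(predicted: GeneSet, curated: GeneSet) -> float:
    """Fraction of curated genes whose predicted exon structure is identical."""
    if not curated:
        raise ValueError("curated gene set is empty")
    exact = sum(
        1
        for gene_id, exons in curated.items()
        # A missing prediction, or any shifted boundary, counts as incorrect.
        if sorted(predicted.get(gene_id, [])) == sorted(exons)
    )
    return exact / len(curated)


# Invented example data: five curated genes, one predicted with a wrong splice site.
curated = {
    "geneA": [(100, 200), (300, 400)],
    "geneB": [(1000, 1500)],
    "geneC": [(2000, 2100), (2200, 2300)],
    "geneD": [(5000, 5200)],
    "geneE": [(7000, 7100), (7300, 7400)],
}
predicted = dict(curated)
predicted["geneC"] = [(2000, 2100), (2205, 2300)]  # second exon starts 5 bp late

print(gene_level_accuracy(predicted, curated))  # 4 of 5 genes match exactly: 0.8
```

Because a single misplaced splice site makes the whole gene count as wrong, gene-level accuracy is much stricter than exon-level or nucleotide-level accuracy.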
The gene models of C. elegans, C. briggsae, C. remanei, and C. brenneri are manually curated. The majority of gene structure changes have been based on transcript data from large-scale projects such as Yuji Kohara’s EST libraries, Marc Vidal’s ORFeome project (worfdb.dfci.harvard.edu/), Waterston and Hillier’s Illumina data, and Makedonka Mitreva’s 454 data. However, other data types (e.g. protein alignments, ab initio prediction programs, trans-splice leader sites, poly-A signals and addition sites, SAGE and TEC-RED transcript tags, mass-spectrometry peptides, and conserved protein domains) are useful in refining the structures, especially where expression is low and few transcripts are available. When genes are conserved between the available nematode species, comparative analysis can also be very informative.