Content | |
---|---|
Description | The Rfam database provides alignments, consensus secondary structures and covariance models for RNA families. |
Data types captured | RNA families |
Organisms | all |
Contact | |
Research center | EBI |
Primary citation | PMID 23125362 |
Access | |
Data format | |
Website | rfam |
Download URL | FTP |
Miscellaneous | |
License | Public domain |
Bookmarkable entities | yes |
Rfam is a database containing information about non-coding RNA (ncRNA) families and other structured RNA elements. It is an annotated, open access database originally developed at the Wellcome Trust Sanger Institute in collaboration with Janelia Farm, and currently hosted at the European Bioinformatics Institute. Rfam is designed to be similar to the Pfam database for annotating protein families.
Unlike proteins, ncRNAs often have similar secondary structure without sharing much similarity in the primary sequence. Rfam divides ncRNAs into families based on evolution from a common ancestor. Producing multiple sequence alignments (MSA) of these families can provide insight into their structure and function, similar to the case of protein families. These MSAs become more useful with the addition of secondary structure information. Rfam researchers also contribute to 's .
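The value of adding secondary structure to an MSA can be shown in a few lines. Once a consensus structure says which alignment columns pair, you can check whether those columns keep the pairing while both bases change (for example G-C in one sequence, A-U in another). Such compensatory changes are evidence for a conserved structure even when the primary sequence diverges. The sketch below is illustrative only. It assumes a consensus structure written as a bracket string aligned column-for-column with the sequences, similar to the `#=GC SS_cons` line in Rfam's alignment files. `base_pairs` and `covariation_report` are hypothetical helpers, not part of Rfam or INFERNAL.

```python
# Hypothetical helpers illustrating covariation analysis on an aligned family.
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}  # Watson-Crick plus G-U wobble
OPEN_FOR = {")": "(", ">": "<", "]": "[", "}": "{"}
GAPS = {"-", "."}


def base_pairs(ss):
    """Return sorted (i, j) column pairs from a bracket-notation structure.

    Each bracket type is matched on its own stack, so e.g. "(<)>" yields
    crossing pairs. Other characters are treated as unpaired.
    """
    stacks = {opener: [] for opener in OPEN_FOR.values()}
    pairs = []
    for pos, ch in enumerate(ss):
        if ch in stacks:
            stacks[ch].append(pos)
        elif ch in OPEN_FOR:
            opener = OPEN_FOR[ch]
            if not stacks[opener]:
                raise ValueError(f"unmatched {ch!r} at column {pos}")
            pairs.append((stacks[opener].pop(), pos))
    leftover = [p for stack in stacks.values() for p in stack]
    if leftover:
        raise ValueError(f"unmatched opening bracket(s) at columns {leftover}")
    return sorted(pairs)


def covariation_report(seqs, ss):
    """For each consensus pair, report how often it is canonical and whether
    two canonical pair types differing at BOTH positions were observed."""
    report = {}
    for i, j in base_pairs(ss):
        observed, canonical, counted = set(), 0, 0
        for s in seqs:
            a = s[i].upper().replace("T", "U")
            b = s[j].upper().replace("T", "U")
            if a in GAPS or b in GAPS:
                continue  # a gap in either column gives no evidence
            counted += 1
            if a + b in CANONICAL:
                canonical += 1
                observed.add(a + b)
        compensatory = any(p[0] != q[0] and p[1] != q[1]
                           for p in observed for q in observed)
        report[(i, j)] = {
            "canonical_fraction": canonical / counted if counted else 0.0,
            "compensatory": compensatory,
        }
    return report
```

With the toy alignment `["GCAAAGC", "ACAAAGU", "GCAAAGC"]` and structure `"((...))"`, the outer pair (columns 0 and 6) shows G-C and A-U, so it is flagged as compensatory. The inner pair is always C-G, so it is conserved but shows no covariation.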
The Rfam website lets users search for ncRNAs by keyword, family name, or genome, as well as by ncRNA sequence or EMBL accession number. [1] The database is also available for download and local installation, for use with the INFERNAL software package. INFERNAL can also be used with Rfam to annotate sequences (including complete genomes) for homologues of known ncRNAs.
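Genome annotation with INFERNAL is typically done with its `cmscan` program, run against the Rfam covariance-model library after it has been indexed with `cmpress`. The sketch below builds a `cmscan` command line and parses the tabular hit list written by `--tblout`. It makes several assumptions: the flags `--cut_ga` (use each family's curated gathering threshold), `--rfam` and `--nohmmonly` are those of INFERNAL 1.1, and the 18-column layout parsed here is the default (`--fmt 1`) tabular format. Verify the column order against the `#` header lines of your own output. `Hit`, `cmscan_command`, `parse_tblout` and `annotate` are hypothetical names.

```python
import subprocess
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Hit:
    family: str      # target name: the Rfam model that matched
    accession: str   # Rfam accession of that model
    sequence: str    # query sequence (e.g. a chromosome) that was scanned
    seq_from: int
    seq_to: int
    strand: str
    score: float     # bit score
    evalue: float
    included: bool   # "!" in the inc column = meets the inclusion threshold


def cmscan_command(cm_db: str, fasta: str, tblout: str) -> List[str]:
    """Argument list for cmscan; assumes cm_db was already indexed by cmpress."""
    return ["cmscan", "--cut_ga", "--rfam", "--nohmmonly",
            "--tblout", tblout, cm_db, fasta]


def parse_tblout(lines: Iterable[str]) -> List[Hit]:
    """Parse assumed default-format tabular output (18 whitespace-separated fields)."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and header/comment lines
        f = line.split(maxsplit=17)  # the final description field may contain spaces
        hits.append(Hit(family=f[0], accession=f[1], sequence=f[2],
                        seq_from=int(f[7]), seq_to=int(f[8]), strand=f[9],
                        score=float(f[14]), evalue=float(f[15]),
                        included=(f[16] == "!")))
    return hits


def annotate(cm_db: str, fasta: str, tblout: str) -> List[Hit]:
    """Run cmscan (must be on PATH) and return the parsed hits."""
    subprocess.run(cmscan_command(cm_db, fasta, tblout), check=True)
    with open(tblout) as fh:
        return parse_tblout(fh)
```

A call such as `annotate("Rfam.cm", "genome.fa", "genome.tbl")` would then return one `Hit` per reported match. Filtering on `included` keeps only hits above the family thresholds. Building the argument list as a Python list avoids shell quoting issues with unusual file names.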
In the database, the secondary structure and the primary sequence, represented by the MSA, are combined in statistical models called profile stochastic context-free grammars (SCFGs), also known as covariance models. These are analogous to the profile hidden Markov models used for protein family annotation in the Pfam database. Each family in the database is represented by two multiple sequence alignments in and an SCFG.
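A toy grammar helps show how an SCFG models secondary structure. This example is deliberately not a covariance model. A real profile SCFG, as used by INFERNAL, has position-specific states built from a family's consensus structure. The grammar here has a single nonterminal `S` with four rules: emit an unpaired base, emit a base pair, bifurcate into two substructures, or end. All probabilities are made-up illustrative values, and `viterbi_fold` is a hypothetical name. The dynamic programme is a Viterbi (CYK-style) parse. It finds the most probable derivation and reports it as a dot-bracket structure, enforcing an assumed minimum of three unpaired bases inside each pair.

```python
import math

# Toy SCFG:  S -> x S | x S y | S S | (empty)
# Hypothetical rule probabilities (they sum to 1); NOT Rfam parameters.
RULES = {"unpaired": 0.4, "pair": 0.25, "bifurcate": 0.05, "end": 0.3}
SINGLE = {b: 0.25 for b in "ACGU"}                        # unpaired emissions
PAIR = {a + b: 0.02 for a in "ACGU" for b in "ACGU"}     # non-canonical pairs
PAIR.update({"GC": 0.2, "CG": 0.2, "AU": 0.15, "UA": 0.15,
             "GU": 0.05, "UG": 0.05})                    # 16 entries sum to 1
MIN_LOOP = 3  # assumed minimum number of bases enclosed by any pair


def viterbi_fold(seq):
    """Return (best log-probability, dot-bracket structure) under the toy SCFG."""
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    log = math.log
    # best[i][j]: best log-probability that S derives seq[i:j] (half-open)
    best = [[float("-inf")] * (n + 1) for _ in range(n + 1)]
    back = [[None] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        best[i][i], back[i][i] = log(RULES["end"]), ("end",)
    for length in range(1, n + 1):          # shorter spans first
        for i in range(n - length + 1):
            j = i + length
            # S -> x S : seq[i] unpaired
            score = log(RULES["unpaired"]) + log(SINGLE[seq[i]]) + best[i + 1][j]
            choice = ("unpaired",)
            # S -> x S y : seq[i] pairs with seq[j-1]
            if length - 2 >= MIN_LOOP:
                s = (log(RULES["pair"]) + log(PAIR[seq[i] + seq[j - 1]])
                     + best[i + 1][j - 1])
                if s > score:
                    score, choice = s, ("pair",)
            # S -> S S : split into two non-empty halves
            for k in range(i + 1, j):
                s = log(RULES["bifurcate"]) + best[i][k] + best[k][j]
                if s > score:
                    score, choice = s, ("bifurcate", k)
            best[i][j], back[i][j] = score, choice
    # Trace back the winning derivation into dot-bracket notation.
    structure, stack = ["."] * n, [(0, n)]
    while stack:
        i, j = stack.pop()
        choice = back[i][j]
        if choice[0] == "unpaired":
            stack.append((i + 1, j))
        elif choice[0] == "pair":
            structure[i], structure[j - 1] = "(", ")"
            stack.append((i + 1, j - 1))
        elif choice[0] == "bifurcate":
            stack.extend([(i, choice[1]), (choice[1], j)])
        # "end" derives the empty string: nothing to record
    return best[0][n], "".join(structure)
```

Under these parameters `viterbi_fold("GGGAAAACCC")` returns the hairpin `"(((....)))"`. Each G-C pair (probability 0.25 × 0.2 = 0.05) outscores leaving both bases unpaired (0.1 × 0.1 = 0.01). The bifurcation loop makes the parse O(n³) in time. This cubic cost is the general price of SCFG parsing compared with the linear-time HMMs used for protein families.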