Pfam

Content
  Description: The Pfam database provides alignments and hidden Markov models for protein domains.
  Data types captured: Protein families
  Organisms: all
Contact
  Research center: EBI
  Primary citation: PMID 19920124
Access
  Website: pfam.xfam.org
  Download URL: FTP 1, FTP 2
Miscellaneous
  License: GNU Lesser General Public License
  Version: 30.0
  Bookmarkable entities: yes

Pfam is a database of protein families that includes their annotations and multiple sequence alignments generated using hidden Markov models. Version 30.0, released in July 2016, contains 16,306 families.

The general purpose of the Pfam database is to provide a complete and accurate classification of protein families and domains. Originally, the rationale behind creating the database was to have a semi-automated method of curating information on known protein families to improve the efficiency of annotating genomes. The Pfam classification of protein families has been widely adopted by biologists because of its wide coverage of proteins and sensible naming conventions.

Pfam is used by experimental biologists researching specific proteins, by structural biologists to identify new targets for structure determination, by computational biologists to organise sequences, and by evolutionary biologists tracing the origins of proteins. Early genome projects, such as those for human and fly, used Pfam extensively for functional annotation of genomic data.

The Pfam website allows users to submit protein or DNA sequences to search for matches to families in the database. If DNA is submitted, a six-frame translation is performed and each frame is then searched. Rather than performing a typical BLAST search, Pfam uses profile hidden Markov models, which give greater weight to matches at conserved sites. This allows better detection of remote homology and makes the models more suitable for annotating genomes of organisms with no well-annotated close relatives.
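The six-frame translation step can be illustrated with a short Python sketch. This is not Pfam's own code: the function names are hypothetical, the standard genetic code is assumed, and any codon containing a character other than A, C, G or T is reported as X.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Translate three offsets on each strand (hypothetical helper).

    Keys "+1".."+3" are the forward frames and "-1".."-3" are the frames
    read from the reverse complement. '*' marks a stop codon.
    """
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frame_translation("ATGGCC").items():
        print(name, protein)
```

Each of the six resulting protein strings would then be searched separately against the family models.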

Pfam has also been used in the creation of other resources such as iPfam, which catalogs domain-domain interactions within and between proteins, based on information in structure databases and mapping of Pfam domains onto these structures.

For each family in Pfam one can:

Entries can be of several types: family, domain, repeat or motif. Family is the default class, which simply indicates that members are related. A domain is defined as an autonomous structural unit or a reusable sequence unit that can be found in multiple protein contexts. Repeats are not usually stable in isolation; they generally need to occur in tandem to form a domain or extended structure. Motifs are usually shorter sequence units found outside of globular domains.
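As a minimal sketch, the four entry types described above could be modelled in Python as an enumeration. The class and function names are hypothetical and do not come from any Pfam software. Falling back to family for a missing label is an assumption based only on the statement that family is the default class.

```python
from enum import Enum


class EntryType(Enum):
    """The four entry types named in the text (hypothetical model)."""
    FAMILY = "family"
    DOMAIN = "domain"
    REPEAT = "repeat"
    MOTIF = "motif"


def parse_entry_type(label=None):
    """Map a text label to an EntryType, ignoring case and whitespace.

    An empty or missing label falls back to FAMILY (an assumption, see the
    lead-in); an unrecognised label raises ValueError.
    """
    if label is None or not label.strip():
        return EntryType.FAMILY
    return EntryType(label.strip().lower())


if __name__ == "__main__":
    print(parse_entry_type("Domain"))
    print(parse_entry_type(""))
```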

