Developer(s) | Johannes Söding, Michael Remmert, Andreas Biegert, Andreas Hauser, Markus Meier, Martin Steinegger |
---|---|
Stable release | 2.0.16 / 18 February 2013 |
Written in | C++ |
Available in | English |
Type | Bioinformatics tool |
License | GPL v3 |
Website | https://github.com/soedinglab/hh-suite |
The HH-suite is an open-source software package for sensitive protein sequence searching. It contains programs that can search for similar protein sequences in protein sequence databases. These sequence searches are a standard tool in modern biology with which the function of unknown proteins can be inferred from their sequence.
Since the sequence of the human genome was determined in 2000, sequencing costs have dropped dramatically. Sequences for thousands of bacteria and hundreds of animals, plants, and fungi are filling the public sequence databases, but the functions of only a small fraction of the proteins encoded in these sequences are known. Even among the approximately 20,000 human proteins, the structures and functions of a large fraction remain unknown. To predict the function, structure, or other properties of a protein for which only the amino acid sequence is known, the protein sequence is compared to the sequences of other proteins in public databases. If a protein with a sufficiently similar sequence is found, the two proteins are likely to be evolutionarily related ("homologous"). In that case, they are likely to share similar structures and functions. Therefore, if a sufficiently similar protein with known function and/or structure can be found by the sequence search, the unknown protein's function, structure, and domain composition can be predicted. Such predictions greatly facilitate the determination of the function or structure by targeted validation experiments.
The HH-suite contains HHsearch and HHblits among other programs and utilities. HHsearch is among the most popular methods for the detection of remotely related sequences and for protein structure prediction, having been cited over 600 times in Google Scholar. The HHsearch and HHblits programs owe their power to the fact that both the query and the database sequences are represented by multiple sequence alignments (MSAs). In these MSAs, the query or database sequence is written in a table together with homologous (related) sequences in such a way that each column contains homologous amino acid residues, that is, residues that have descended from the same residue in the ancestral sequence. The frequencies of amino acids in the columns of such an MSA can be interpreted as the probabilities of observing each amino acid at that position in a further homologous sequence. To facilitate automatic scoring of candidate sequences for their relatedness to the sequences in the MSA, the MSAs are succinctly described by profile hidden Markov models (HMMs). These are extensions of position-specific scoring matrices (PSSMs). The core algorithms for HMM-HMM alignment give HH-suite its name.
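The idea that column frequencies in an MSA act as position-specific probabilities can be illustrated with a minimal Python sketch. It is not HH-suite code and does not perform HMM-HMM alignment: it builds a simple PSSM-like profile from a toy, gap-free alignment and scores a sequence of the same length against it without gaps. The function names, the flat pseudocount, the uniform background frequency of 1/20, and the toy alignment are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def column_probabilities(msa, pseudocount=1.0):
    """Estimate per-column amino acid probabilities from an aligned MSA.

    `msa` is a list of equal-length strings; characters outside
    AMINO_ACIDS (for example '-' for gaps) are ignored. A flat
    pseudocount (an illustrative choice, assumed > 0) keeps unseen
    residues from getting probability zero.
    """
    if not msa:
        raise ValueError("MSA must contain at least one sequence")
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all aligned sequences must have the same length")

    profile = []
    for col in range(length):
        counts = Counter(row[col] for row in msa if row[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile


def log_odds_score(profile, sequence):
    """Sum of log2(p_column(residue) / background) over all positions.

    Assumes a uniform background of 1/20 and a gapless, position-by-
    position comparison; residues outside AMINO_ACIDS contribute 0.
    """
    if len(sequence) != len(profile):
        raise ValueError("sequence and profile must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    score = 0.0
    for column, residue in zip(profile, sequence):
        if residue in column:
            score += math.log2(column[residue] / background)
    return score


# Toy alignment (hypothetical data): three homologous 3-residue sequences.
toy_msa = ["ACD", "ACE", "AGD"]
toy_profile = column_probabilities(toy_msa)

print(log_odds_score(toy_profile, "ACD"))  # matches the conserved columns
print(log_odds_score(toy_profile, "WWW"))  # residues never seen in the MSA
```

A sequence that matches the conserved columns receives a positive score, while one composed of residues absent from the alignment scores below zero. Profile HMMs extend this picture with position-specific insertion and deletion probabilities, which this sketch leaves out.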