Original author(s) | Burkhard Rost |
---|---|
Developer(s) | Guy Yachdav, Laszlo Kajan |
Initial release | 1992 |
Stable release | 1.0.88 |
Operating system | UNIX-based |
Type | Bioinformatics |
License | GPLv2 |
Website | www |
PredictProtein (PP) is an automatic service that searches up-to-date public sequence databases, creates alignments, and predicts aspects of protein structure and function. Users send a protein sequence and receive a single file with results from database comparisons and prediction methods. PP went online in 1992 at the European Molecular Biology Laboratory; since 1999 it has operated from Columbia University, and in 2009 it moved to the Technische Universität München. Although many servers have implemented particular aspects, PP remains the most widely used public server for structure prediction: over 1.5 million requests from users in 104 countries have been handled, and over 13,000 users have submitted 10 or more different queries. PP web pages are mirrored in 17 countries on 4 continents. The system is optimized to meet the demands of experimentalists not experienced in bioinformatics. This meant that we focused on incorporating only high-quality methods and tried to collate results, omitting less reliable or less important ones.
The attempt to ‘pre-digest’ as much information as possible, so that the results are easier to interpret, is a unique pillar of PP. For example, by default PP returns only those proteins found in the database that are very likely to have a similar structure to the query protein. Particular predictions, such as those for membrane helices, coiled-coil regions, signal peptides and nuclear localization signals, are not returned if they fall below given probability thresholds.
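The threshold-based filtering described above can be illustrated with a minimal Python sketch. It is not PP's actual code: the `Prediction` record, the feature names, and the threshold values are all hypothetical and chosen only to show the idea of suppressing predictions that fall below a per-feature probability cutoff.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """One predicted feature on the query sequence (hypothetical record)."""
    feature: str        # e.g. "membrane_helix", "coiled_coil"
    start: int          # first residue of the predicted region (1-based)
    end: int            # last residue of the predicted region (inclusive)
    probability: float  # confidence reported by the prediction method


# Hypothetical cutoffs; the real thresholds used by PP are not given here.
THRESHOLDS = {
    "membrane_helix": 0.7,
    "coiled_coil": 0.9,
    "signal_peptide": 0.8,
    "nuclear_localization_signal": 0.8,
}


def filter_predictions(predictions, thresholds=THRESHOLDS):
    """Drop predictions whose probability is below the cutoff for their feature.

    Features without a configured cutoff are always kept.
    """
    return [
        p for p in predictions
        if p.probability >= thresholds.get(p.feature, 0.0)
    ]


if __name__ == "__main__":
    raw = [
        Prediction("membrane_helix", 10, 30, 0.95),
        Prediction("coiled_coil", 50, 80, 0.40),
    ]
    for kept in filter_predictions(raw):
        print(kept)
```

In this sketch the low-confidence coiled-coil prediction is simply omitted from the output, which mirrors the stated goal of showing experimentalists only results that are likely to be reliable.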
Users receive a single output file with the following results. Database searches: similar sequences are reported and aligned by a standard pairwise BLAST and by an iterated PSI-BLAST search. Although the pairwise BLAST searches are identical to those obtainable from the NCBI site, the iterated PSI-BLAST is performed on a carefully filtered database to avoid accumulating false positives during the iteration. Functional motifs are identified through a standard search of the PROSITE database. PP now also identifies putative boundaries for structural domains through the CHOP procedure. Structure prediction methods: secondary structure, solvent accessibility and membrane helices are predicted by the PHD and PROF programs, membrane strands by PROFtmb, coiled-coil regions by COILS, and inter-residue contacts by PROFcon; low-complexity regions are marked by SEG, and long regions with no regular secondary structure are identified by NORSp. The PHD/PROF programs are only available through PP. The particular way in which PP automatically iterates PSI-BLAST searches and the way in which we decide what to include in sequence families are also unique to PP. The particular aspects of function that are currently embedded explicitly in PP are all somehow related to sub-cellular localization: we detect nuclear localization signals through PredictNLS and predict localization independent of targeting signals through LOCnet; we also annotate homology to proteins involved in cell-cycle control.
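To give a sense of what a PROSITE motif search involves, the following Python sketch converts a PROSITE-style pattern into a regular expression and scans a sequence for matches. It is an illustration only, not the code PP runs: the function names `prosite_to_regex` and `find_motifs` are made up for this example, the example sequence is invented, and the converter handles only the common pattern syntax (residues, `x`, `[...]`, `{...}`, repeat counts, and `<`/`>` anchors outside brackets).

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Handles: single residues, x (any residue), [ABC] (allowed set),
    {ABC} (forbidden set), (n) and (n,m) repeats, and the < and >
    anchors when they appear outside brackets.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = repeat = ""
        if element.startswith("<"):          # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        counted = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if counted:                          # x(2,4) -> .{2,4}
            element, repeat = counted.group(1), "{" + counted.group(2) + "}"
        if element == "x":
            core = "."
        elif element.startswith("{") and element.endswith("}"):
            core = "[^" + element[1:-1] + "]"
        else:
            core = element                   # single residue or [ABC]
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_motifs(sequence: str, pattern: str):
    """Return (1-based start, matched text) for every hit, overlaps included."""
    # A lookahead with a capture group lets finditer report overlapping hits.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    # N-glycosylation site pattern (PROSITE PS00001) on an invented sequence.
    print(find_motifs("MKNGSAANPTQ", "N-{P}-[ST]-{P}"))
```

Short patterns like this one match very often by chance, which is one reason a service aimed at non-specialists would filter or annotate such hits rather than return them all unqualified.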