Protein engineering is the process of developing useful or valuable proteins. It is a young discipline, with much research taking place into the understanding of protein folding and recognition for protein design principles. It is also a product and services market, with an estimated value of $168 billion by 2017.
There are two general strategies for protein engineering: rational protein design and directed evolution. These methods are not mutually exclusive; researchers will often apply both. In the future, more detailed knowledge of protein structure and function, and advances in high-throughput screening, may greatly expand the abilities of protein engineering. Eventually, even unnatural amino acids may be included, via newer methods, such as expanded genetic code, that allow encoding novel amino acids in genetic code.
In rational protein design, a scientist uses detailed knowledge of the structure and function of a protein to make desired changes. In general, this has the advantage of being inexpensive and technically easy, since site-directed mutagenesis methods are well-developed. However, its major drawback is that detailed structural knowledge of a protein is often unavailable, and, even when available, it can be very difficult to predict the effects of various mutations.
Computational protein design algorithms seek to identify novel amino acid sequences that are low in energy when folded to the pre-specified target structure. While the sequence-conformation space that needs to be searched is large, the most challenging requirement for computational protein design is a fast, yet accurate, energy function that can distinguish optimal sequences from similar suboptimal ones.
Without structural information about a protein, sequence analysis is often useful in elucidating information about the protein. These techniques involve alignment of target protein sequences with other related protein sequences. This alignment can show which amino acids are conserved between species and are important for the function of the protein. These analyses can help to identify hot spot amino acids that can serve as the target sites for mutations. Multiple sequence alignment utilizes data bases such as PREFAB, SABMARK, OXBENCH, IRMBASE, and BALIBASE in order to cross reference target protein sequences with known sequences. Multiple sequence alignment techniques are listed below.