The interdisciplinary research field of computational and statistical genetics draws on genomics, quantitative genetics, computational sciences, bioinformatics, and statistics to develop and apply computationally efficient and statistically robust methods for analyzing increasingly rich and massive genome-wide data sets. These methods are used to identify complex genetic patterns, gene functions and interactions, and disease and phenotype associations in the genomes of various organisms. The field is also often referred to as computational genomics, and it is an important discipline within the umbrella field of computational biology.
During the last two decades, there has been great interest in understanding the genetic and genomic makeup of various species, including humans, aided primarily by rapidly developing genome sequencing technologies. However, these technologies are still limited, and computational and statistical methods are needed to detect and handle errors and to piece together the partial information that sequencing and genotyping technologies produce.
A haplotype is defined as the sequence of nucleotides (A, G, T, C) along a single chromosome. Humans are diploid, with 23 pairs of chromosomes; maize is another diploid example, with 10 pairs. However, with current technology, it is difficult to separate the two chromosomes within a pair, so assays report the two haplotypes combined, called the genotype, at each nucleotide. The objective of haplotype phasing is to determine the phase of the two haplotypes, that is, which alleles lie together on the same chromosome, given the combined genotype information. Knowledge of the haplotypes is extremely important: it gives a complete picture of an individual's genome and also aids other computational genomic processes such as imputation, among many other significant biological motivations.
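The loss of phase information in a genotype can be illustrated with a minimal Python sketch. The function name `genotype_from_haplotypes` and the toy sequences are hypothetical, chosen only for illustration; real genotype data is usually encoded differently (for example as allele counts per site).

```python
def genotype_from_haplotypes(hap_a, hap_b):
    """Return the unordered allele pair at each site (phase is discarded)."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must have the same length")
    # Sorting each pair removes any record of which chromosome carried which allele.
    return [tuple(sorted(pair)) for pair in zip(hap_a, hap_b)]


# Two different haplotype pairs (hypothetical toy sequences)...
pair_1 = ("ACGT", "ATGA")
pair_2 = ("ATGT", "ACGA")

# ...produce the same genotype, so the genotype alone cannot distinguish them.
genotype_1 = genotype_from_haplotypes(*pair_1)
genotype_2 = genotype_from_haplotypes(*pair_2)
print(genotype_1 == genotype_2)
```

Both pairs differ only in how the alleles at the two heterozygous sites are assigned to chromosomes, which is exactly the ambiguity that phasing has to resolve.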
For diploid organisms such as humans and maize, each individual has two copies of each chromosome, one from each parent. The two copies are highly similar to each other. A haplotype is the sequence of nucleotides in a chromosome. The haplotype phasing problem focuses on the nucleotides where the two homologous chromosomes differ. Computationally, for a genomic region with K differing nucleotide sites, there are 2^K possible haplotypes and 2^(K-1) possible haplotype pairs consistent with the genotype, so the phasing problem focuses on efficiently finding the most probable haplotypes given an observed genotype. For more information, see Haplotype.
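To make the size of the search space concrete, the following Python sketch enumerates every haplotype pair consistent with a small genotype by brute force. It is an illustration only, not a practical phasing method: real methods rely on statistical models rather than enumeration. The function name `enumerate_phasings` and the genotype encoding (one tuple of two alleles per site) are assumptions made for this example. Each of the K heterozygous sites can be oriented in two ways, and swapping the two haplotypes gives the same pair, so there are 2^(K-1) distinct unordered pairs.

```python
from itertools import product


def enumerate_phasings(genotype):
    """Yield every unordered haplotype pair consistent with a genotype.

    `genotype` is a list of (allele, allele) tuples, one per site
    (a hypothetical encoding chosen for this sketch).
    """
    het_sites = [i for i, (a, b) in enumerate(genotype) if a != b]
    if not het_sites:
        # Fully homozygous region: both haplotypes are identical.
        hap = "".join(a for a, _ in genotype)
        yield hap, hap
        return
    # Fix the orientation of the first heterozygous site so each unordered
    # pair is produced exactly once; the remaining K - 1 sites are free.
    for flips in product([False, True], repeat=len(het_sites) - 1):
        flip_at = dict(zip(het_sites, (False,) + flips))
        hap_1, hap_2 = [], []
        for i, (a, b) in enumerate(genotype):
            if flip_at.get(i, False):
                a, b = b, a
            hap_1.append(a)
            hap_2.append(b)
        yield "".join(hap_1), "".join(hap_2)


# Toy genotype with K = 3 heterozygous sites (positions 1, 3 and 4).
genotype = [("A", "A"), ("C", "T"), ("G", "G"), ("A", "T"), ("C", "G")]
phasings = list(enumerate_phasings(genotype))
for hap_1, hap_2 in phasings:
    print(hap_1, hap_2)
```

Because the number of candidates doubles with every additional heterozygous site, enumeration of this kind is only feasible for very short regions, which is why efficient methods for finding the most probable haplotypes are needed.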