The McDonald–Kreitman test is a statistical test often used by evolution and population biologists to detect and measure the amount of adaptive evolution within a species by determining whether adaptive evolution has occurred, and the proportion of substitutions that resulted from positive selection (also known as directional selection). To do this, the McDonald–Kreitman test compares the amount of variation within a species (polymorphism) to the divergence between species (substitutions) at two types of sites, neutral and nonneutral. A substitution refers to a nucleotide that is fixed within one species, but a different nucleotide is fixed within a second species at the same base pair of homologous DNA sequences. A site is nonneutral if it is either advantageous or deleterious. The two types of sites can be either synonymous or nonsynonymous within a protein-coding region. In a protein-coding sequence of DNA, a site is synonymous if a point mutation at that site would not change the amino acid, also known as a silent mutation. Because the mutation did not result in a change in the amino acid that was originally coded for by the protein-coding sequence, the phenotype, or the observable trait, of the organism is generally unchanged by the silent mutation. A site in a protein-coding sequence of DNA is nonsynonymous if a point mutation at that site results in a change in the amino acid, resulting in a change in the organism's phenotype. Typically, silent mutations in protein-coding regions are used as the "control" in the McDonald–Kreitman test.
In 1991, John H. McDonald and Martin Kreitman derived the McDonald–Kreitman test while performing an experiment with Drosophila (fruit flies) and their differences in amino acid sequence of the alcohol dehydrogenase gene. McDonald and Kreitman proposed this method to estimate the proportion of substitutions that are fixed by positive selection rather than by genetic drift.
In order to set up the McDonald–Kreitman test, we must first set up a two-way contingency table of our data on the species being investigated as shown below:
To quantify the values for Ds, Dn, Ps, and Pn, you count the number of differences in the protein-coding region for each type of variable in the contingency table.