HKA test

The HKA test, named after Richard R. Hudson, Martin Kreitman, and Montserrat Aguadé, is a statistical test used in genetics to evaluate the predictions of the neutral theory of molecular evolution. By comparing the polymorphism within each species with the divergence observed between two species at two or more loci, the test assesses whether the observed pattern is consistent with neutral evolution or is more likely the result of adaptive evolution. Developed in 1987, the HKA test is a precursor to the McDonald-Kreitman test, which was derived in 1991. The HKA test is best used to look for balancing selection, recent selective sweeps, or other variation-reducing forces.

The neutral theory of molecular evolution, first proposed by Kimura in a 1968 paper and later fully defined and published in 1983, is the basis for many statistical tests that detect selection at the molecular level. Kimura noted that the rate of mutation within the genome (reflected in high polymorphism) was far too high to be explained strictly by directional evolution. Furthermore, functionally less important regions of the genome evolve at a faster rate. Kimura therefore postulated that most changes to the genome are neutral or nearly neutral and evolve by random genetic drift. Under the neutral model, polymorphism within a species and divergence between related species at homologous sites will therefore be highly correlated. The neutral theory has become the null model on which tests for selection are based, and departures from this model can be explained by directional or selective evolution.

The population mutation parameter is θ = 4N_eμ, where N_e is the effective population size and μ is the mutation rate (substitutions per site per unit of time). It can be estimated from sequence data using the Watterson estimator. Hudson et al. proposed applying these quantities in a chi-squared goodness-of-fit test.
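
As an illustration of how θ is estimated in practice, Watterson's estimator is commonly written θ_W = S / a_n, where S is the number of segregating (polymorphic) sites in a sample of n sequences and a_n = 1 + 1/2 + ... + 1/(n-1). The Python sketch below is a minimal version of that calculation. The function names are hypothetical and chosen only for this example, and the result is a per-locus estimate (it would need to be divided by the sequence length to give a per-site value).

```python
def harmonic_sum(sample_size):
    """Return a_n = 1 + 1/2 + ... + 1/(n - 1) for a sample of n sequences."""
    return sum(1.0 / i for i in range(1, sample_size))


def watterson_theta(segregating_sites, sample_size):
    """Estimate theta for one locus as S / a_n (Watterson's estimator).

    segregating_sites: number of polymorphic sites observed in the sample (S)
    sample_size: number of sequences sampled (n), which must be at least 2
    """
    if sample_size < 2:
        raise ValueError("at least two sequences are needed")
    if segregating_sites < 0:
        raise ValueError("the number of segregating sites cannot be negative")
    return segregating_sites / harmonic_sum(sample_size)


if __name__ == "__main__":
    # Hypothetical sample: 11 segregating sites among 4 sequences.
    print(watterson_theta(11, 4))
```

For the hypothetical sample above, a_n = 1 + 1/2 + 1/3 = 11/6, so the estimate works out to 11 / (11/6) = 6.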

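The test statistic proposed by Hudson et al., X², is:

$$X^2 = \sum_{i=1}^{L} \frac{\left(S_i^A - \hat{E}(S_i^A)\right)^2}{\widehat{\mathrm{Var}}(S_i^A)} + \sum_{i=1}^{L} \frac{\left(S_i^B - \hat{E}(S_i^B)\right)^2}{\widehat{\mathrm{Var}}(S_i^B)} + \sum_{i=1}^{L} \frac{\left(D_i - \hat{E}(D_i)\right)^2}{\widehat{\mathrm{Var}}(D_i)}$$

where S_i^A and S_i^B are the numbers of polymorphic sites observed at locus i in the samples from species A and B, D_i is the divergence between the two species at locus i, and the hats denote estimates of the expectation and variance.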

In words: for each locus (of which there must be at least two), the estimated expected number of polymorphic sites in the sample from species A is subtracted from the observed number, the difference is squared, and the result is divided by the estimated variance; these terms are then summed over all L loci. The same calculation is applied to the sample from species B, and then to the divergence between the two species. The sum of these three components is the test statistic (X²). If the polymorphism within species A, the polymorphism within species B, and the divergence between them are all independent, then the test statistic should approximately follow a chi-squared distribution.
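
The calculation described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it takes the expected values and variances as inputs instead of estimating them under the neutral model, the function names are hypothetical, and the example numbers are invented. It also assumes 2L - 2 degrees of freedom (3L observations less L + 2 estimated parameters, as in the original two-species formulation), which should be checked against the model actually fitted. Because 2L - 2 is always even, the upper tail of the chi-squared distribution has a simple closed form and no external library is needed.

```python
import math


def hka_statistic(poly_a, poly_b, divergence):
    """Return (X2, degrees_of_freedom) for L loci.

    Each argument is a list with one (observed, expected, variance) tuple
    per locus. Expectations and variances must already have been estimated
    under the neutral model; this sketch does not estimate them.
    """
    n_loci = len(poly_a)
    if n_loci < 2:
        raise ValueError("the test needs at least two loci")
    if len(poly_b) != n_loci or len(divergence) != n_loci:
        raise ValueError("all three components need one entry per locus")

    x2 = 0.0
    for component in (poly_a, poly_b, divergence):
        for observed, expected, variance in component:
            if variance <= 0:
                raise ValueError("variances must be positive")
            x2 += (observed - expected) ** 2 / variance

    # Assumption: 3L observations minus L + 2 estimated parameters.
    return x2, 2 * n_loci - 2


def chi2_upper_tail(x, df):
    """P(chi-squared variable with even df >= x), using the closed form
    exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!."""
    if df < 2 or df % 2 != 0:
        raise ValueError("this closed form only covers even df >= 2")
    half = x / 2.0
    term = 1.0   # (x/2)^j / j! for j = 0
    total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    # Invented values for two loci: (observed, expected, variance).
    species_a = [(10, 8, 4), (5, 7, 4)]
    species_b = [(6, 6, 3), (4, 4, 2)]
    between = [(20, 16, 16), (12, 16, 16)]
    x2, df = hka_statistic(species_a, species_b, between)
    print(x2, df, chi2_upper_tail(x2, df))
```

A small upper-tail probability would indicate that the data fit the neutral expectation poorly. In a real analysis the expectations and variances come from parameter estimates fitted to the same data, so the chi-squared approximation should be treated with some caution.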

...
Wikipedia