Statistical power

The power or sensitivity of a binary hypothesis test is the probability that the test correctly rejects the null hypothesis (H₀) when the alternative hypothesis (H₁) is true. Equivalently, it is the probability of accepting the alternative hypothesis (H₁) when it is true, which measures the ability of a test to detect an effect if the effect actually exists. That is,

power = Pr(reject H₀ | H₁ is true)

Less formally, the power of a test sometimes refers to the probability of rejecting the null hypothesis when it is not correct, though this is not the formal definition stated above. The power is in general a function of the possible distributions under the alternative hypothesis, which are often determined by a parameter. As the power increases, the probability of a Type II error (false negative) decreases. This probability is also called the false negative rate (β), and the power equals 1−β, again under the alternative hypothesis. A related concept is the probability of a Type I error, also called the false positive rate or the level (α) of the test, which is the probability of rejecting the null hypothesis when the null hypothesis is true.
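The relationship between the level, β, and the power can be illustrated with a small numerical sketch. The setting here is an assumption chosen for simplicity and does not come from the text above: a one-sided z-test of H₀: mean = 0 against H₁: mean > 0, with a known standard deviation. The function name `rejection_probability` and the values of `n`, `sigma`, and `alpha` are hypothetical.

```python
from statistics import NormalDist

def rejection_probability(true_mean, n, sigma=1.0, alpha=0.05):
    """Probability that a one-sided z-test of H0: mean = 0 rejects H0.

    Assumes n independent normal observations with known standard deviation
    sigma (an illustrative setting, not a general recipe).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)      # reject H0 when the z statistic exceeds this
    shift = true_mean * n ** 0.5 / sigma        # mean of the z statistic under the true mean
    return 1 - std_normal.cdf(z_crit - shift)

# Under H0 (true mean 0) the rejection probability is the level of the test.
print(f"level: {rejection_probability(0.0, n=25):.3f}")

# Under H1 the rejection probability is the power, and beta = 1 - power.
for true_mean in (0.2, 0.5, 0.8):
    power = rejection_probability(true_mean, n=25)
    print(f"true mean {true_mean:.1f}: power = {power:.3f}, beta = {1 - power:.3f}")
```

In this sketch the same function gives the level when evaluated under the null hypothesis and the power when evaluated at any parameter value in the alternative, which is why power is described as a function of the parameter.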

Power analysis can be used to calculate the minimum sample size required so that one can be reasonably likely to detect an effect of a given size. For example: “how many times do I need to toss a coin to conclude it is rigged?” Power analysis can also be used to calculate the minimum effect size that is likely to be detected in a study using a given sample size. In addition, the concept of power is used to make comparisons between different statistical testing procedures: for example, between a parametric and a nonparametric test of the same hypothesis.
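The coin-tossing question can be sketched as a sample-size search. The following is a minimal illustration under stated assumptions, not a definitive procedure: it uses an exact one-sided binomial test of H₀: p = 0.5 against p > 0.5, and the assumed bias (`p_true = 0.75`), the target power of 0.8, the level of 0.05, and all function names are hypothetical choices.

```python
from math import comb

def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def coin_test_power(n, p_true, alpha=0.05):
    """Power of the exact one-sided test of H0: p = 0.5 against p > 0.5 with n tosses."""
    # Critical value: the smallest head count whose upper-tail probability
    # under a fair coin is at most alpha. k = n + 1 always qualifies (tail = 0),
    # in which case the test never rejects.
    k_crit = next(k for k in range(n + 2) if upper_tail(k, n, 0.5) <= alpha)
    # Power: probability of reaching the critical value when the coin is biased.
    return upper_tail(k_crit, n, p_true)

def min_tosses(p_true, target_power=0.8, alpha=0.05, n_max=200):
    """First n (up to n_max) at which the power reaches target_power, else None."""
    for n in range(1, n_max + 1):
        if coin_test_power(n, p_true, alpha) >= target_power:
            return n
    return None

n = min_tosses(p_true=0.75)
print(f"tosses needed: {n}, power at that n: {coin_test_power(n, 0.75):.3f}")
```

Because the number of heads is discrete, the power of an exact test does not increase smoothly with n, so the first n that reaches the target is not guaranteed to be followed only by larger powers. A smaller assumed bias requires more tosses for the same power.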

Statistical tests use data from samples to assess, or make inferences about, a statistical population. In the concrete setting of a two-sample comparison, the goal is to assess whether the mean values of some attribute obtained for individuals in two sub-populations differ. For example, to test the null hypothesis that the mean scores of men and women on a test do not differ, samples of men and women are drawn, the test is administered to them, and the mean score of one group is compared to that of the other group using a statistical test such as the two-sample z-test. The power of the test is the probability that the test will find a statistically significant difference between men and women, as a function of the size of the true difference between those two populations.
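The dependence of power on the true difference can be made concrete for the two-sample z-test. The sketch below assumes a two-sided test, equal group sizes, and a common known standard deviation. The numbers (50 people per group, a standard deviation of 10 points) and the function name are hypothetical and are not taken from any study.

```python
from statistics import NormalDist

def two_sample_z_power(true_diff, n_per_group, sigma, alpha=0.05):
    """Power of a two-sided two-sample z-test.

    Assumes equal group sizes and a common, known standard deviation sigma.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # two-sided critical value
    se = sigma * (2 / n_per_group) ** 0.5        # standard error of the difference in sample means
    shift = true_diff / se                       # mean of the z statistic under the true difference
    # H0 is rejected when |Z| > z_crit, where Z ~ Normal(shift, 1).
    return std_normal.cdf(-z_crit - shift) + 1 - std_normal.cdf(z_crit - shift)

# Power as a function of the true difference in mean scores (in points).
for diff in (0, 2, 4, 6, 8):
    power = two_sample_z_power(diff, n_per_group=50, sigma=10)
    print(f"true difference {diff}: probability of a significant result = {power:.3f}")
```

At a true difference of zero the probability of a significant result equals the level of the test, and it rises toward 1 as the true difference grows relative to the standard error.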

...
Wikipedia