Misunderstandings of p-values are an important problem in scientific research and scientific education. P-values are often used or interpreted incorrectly. Comparing a p-value to a significance level yields one of two results: either the null hypothesis is rejected (which does not imply that the null hypothesis is false), or the null hypothesis cannot be rejected at that significance level (which does not imply that the null hypothesis is true). In Ronald Fisher's formulation, there is a logical disjunction: a low p-value means either that the null hypothesis is true and a highly improbable event has occurred, or that the null hypothesis is false.
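Both caveats can be illustrated with a small simulation. The following is a minimal sketch, not a definitive analysis: it assumes a two-sided z-test on normally distributed data with known standard deviation 1, and the function names, sample size, effect size and number of trials are illustrative choices rather than anything taken from the literature.

```python
import math
import random


def two_sided_z_p_value(sample, null_mean=0.0, sigma=1.0):
    """Two-sided p-value of a z-test with known standard deviation sigma."""
    n = len(sample)
    z = (sum(sample) / n - null_mean) * math.sqrt(n) / sigma
    # 2 * (1 - Phi(|z|)), written with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))


def rejection_rate(true_mean, n, alpha=0.05, trials=20000, seed=0):
    """Fraction of simulated experiments in which p < alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        if two_sided_z_p_value(sample) < alpha:
            rejections += 1
    return rejections / trials


# Null hypothesis true (mean 0): it is still rejected in about alpha of experiments.
rate_null_true = rejection_rate(true_mean=0.0, n=10)
# Null hypothesis false (mean 0.3) but a small sample: most experiments fail to reject.
rate_null_false = rejection_rate(true_mean=0.3, n=10)
print(f"rejections when the null is true:  {rate_null_true:.3f}")
print(f"rejections when the null is false: {rate_null_false:.3f}")
```

Under these assumptions, a rejection occurs in roughly 5% of experiments even though the null hypothesis is true, and a failure to reject is the most common outcome even though the null hypothesis is false.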
Several common misconceptions regarding p-values are corrected below.
The p-value fallacy is a common misinterpretation of the p-value in which hypotheses are classified as true or false according to whether the corresponding p-values are statistically significant. The term "p-value fallacy" was coined in 1999 by Steven N. Goodman.
This fallacy is contrary to the intent of the statisticians who originally supported the use of p-values in research. As described by Sterne and Smith, "An arbitrary division of results, into 'significant' or 'non-significant' according to the P value, was not the intention of the founders of statistical inference." In contrast, common interpretations of p-values make it harder to distinguish statistical results from scientific conclusions, and discourage the consideration of background knowledge such as previous experimental results. It has been argued that the correct use of p-values is to guide behavior rather than to classify results: that is, to inform a researcher's choice of which hypothesis to accept, not to provide an inference about which hypothesis is true.
The p-value does not in itself allow reasoning about the probabilities of hypotheses. Such reasoning requires multiple hypotheses or a range of hypotheses, with a prior probability distribution over them, in which case Bayesian statistics could be used. There, one uses a likelihood function evaluated for every hypothesis under consideration, combined with the prior, instead of the p-value for a single null hypothesis. The p-value describes a property of data when compared to a specific null hypothesis; it is not a property of the hypothesis itself. For the same reason, p-values do not give the probability that the data were produced by random chance alone.
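The dependence on a prior can be made concrete with Bayes' theorem. The sketch below is illustrative only and rests on strong simplifying assumptions: every tested hypothesis is either the null (mean 0) or one fixed alternative, the test is a two-sided z-test with known standard deviation, and the function name, prior values, effect size and sample size are hypothetical. It computes the probability that the null is true given a significant result (p below the significance level), which is a simpler quantity than the probability given one exact p-value.

```python
from statistics import NormalDist


def prob_null_given_significant(prior_null, effect, n, alpha=0.05, sigma=1.0):
    """P(null true | p < alpha) for a two-sided z-test, by Bayes' theorem.

    Assumes each tested hypothesis is either the null (mean 0) or one fixed
    alternative (mean `effect`), with prior probability `prior_null` for the null.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect * n ** 0.5 / sigma
    # Power: probability of p < alpha when the alternative is true.
    power = std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)
    significant_and_null = alpha * prior_null
    significant_and_alt = power * (1 - prior_null)
    return significant_and_null / (significant_and_null + significant_and_alt)


for prior_null in (0.5, 0.9):
    posterior = prob_null_given_significant(prior_null, effect=0.5, n=20)
    print(f"prior P(null) = {prior_null}: P(null | p < 0.05) = {posterior:.2f}")
```

With the same test and the same significance level, the probability that the null hypothesis is true after a significant result changes substantially with the prior, and in neither case is it equal to 0.05. This is why the p-value alone cannot be read as the probability of the hypothesis.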