In statistics, a confounding variable (also confounding factor, a confound, a lurking variable or a confounder) is a variable in a statistical model that correlates (directly or inversely) with both the dependent variable and an independent variable, in a way that "explains away" some or all of the correlation between these two variables.
While specific definitions may vary, in essence a confounding variable fits the following four criteria, here given in a hypothetical situation with variable of interest "V", confounding variable "C" and outcome of interest "O":
The preceding correlation-based definition, however, is metaphorical at best – a growing number of analysts agree that confounding is a causal concept, and as such, cannot be described in terms of correlations nor associations (see causal definition).
The concept of confounding must be defined, and managed, in terms of the data generating model (as in the Figure above). Specifically, let X be some independent variable, Y some dependent variable. To estimate the effect of X on Y, the statistician must suppress the effects of extraneous variables that influence both X and Y. We say that, X and Y are confounded by some other variable Z whenever Z is a cause of both X and Y.
In the causal framework, denote as the probability of event Y = y under the hypothetical intervention X = x. X and Y are not confounded if and only if the following holds: