Berkson's paradox, also known as Berkson's bias or Berkson's fallacy, is a result in conditional probability and statistics that many people find counterintuitive, and hence a veridical paradox. It is a complicating factor arising in statistical tests of proportions. Specifically, it arises when there is an ascertainment bias inherent in a study design. The effect is related to the explaining away phenomenon in Bayesian networks.
It is often described in the fields of medical statistics or biostatistics, as in the original description of the problem by Joseph Berkson.
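The result is that two independent events become conditionally dependent (negatively dependent) given that at least one of them occurs. Symbolically: if 0 < P(A) < 1 and 0 < P(B) < 1, and P(A | B) = P(A) (that is, A and B are independent), then P(A | B, C) < P(A | C), where C = A ∪ B (that is, A or B).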
In words, given two independent events, if you only consider outcomes where at least one occurs, then they become negatively dependent.
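The effect can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function names `simulate` and `conditional_rate`, the probabilities of 1/2, the sample size, and the seed are all chosen here for illustration and are not part of the original description.

```python
import random

def simulate(p_a=0.5, p_b=0.5, n=200_000, seed=0):
    """Draw independent events A and B; keep only trials where at least one occurs."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is repeatable
    kept = []
    for _ in range(n):
        a = rng.random() < p_a
        b = rng.random() < p_b
        if a or b:  # selection step: discard outcomes where neither occurs
            kept.append((a, b))
    return kept

def conditional_rate(pairs, given_b):
    """Estimate P(A | B == given_b) within the selected sample."""
    hits = [a for a, b in pairs if b == given_b]
    return sum(hits) / len(hits)

kept = simulate()
rate_a = sum(a for a, _ in kept) / len(kept)        # roughly 2/3 in the selected sample
rate_a_given_b = conditional_rate(kept, True)       # roughly 1/2
rate_a_given_not_b = conditional_rate(kept, False)  # exactly 1: if B failed, A must have occurred
print(rate_a, rate_a_given_b, rate_a_given_not_b)
```

Within the selected sample, learning that B occurred lowers the estimated probability of A, even though A and B were generated independently.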
The cause is that the conditional probability of event A occurring, given that A or B occurs, is inflated: it is higher than the unconditional probability, because we have excluded cases where neither occurs.
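The inflation can be written out directly. The following is a short derivation, assuming A and B are independent and that both probabilities lie strictly between 0 and 1:

```latex
\[
P(A \mid A \cup B)
  = \frac{P\bigl(A \cap (A \cup B)\bigr)}{P(A \cup B)}
  = \frac{P(A)}{P(A \cup B)}
  > P(A),
\]
\[
\text{since } P(A \cup B) = 1 - \bigl(1 - P(A)\bigr)\bigl(1 - P(B)\bigr) < 1 .
\]
```

The denominator is smaller than 1 exactly because the outcomes where neither event occurs have been removed.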
One can see this in tabular form as follows, where ~A means "not A". The outcomes where at least one event occurs are all cells except ~A & ~B.

          A          ~A
  B       A & B      ~A & B
  ~B      A & ~B     ~A & ~B
For instance, if one has a sample of 100, and both A and B occur independently half the time (so P(A) = P(B) = 1/2), one obtains:

          A     ~A
  B       25    25
  ~B      25    25
So in 75 outcomes, either A or B occurs, of which 50 have A occurring, so P(A | A ∪ B) = 50/75 = 2/3.
Thus the probability of A is higher in the subset of outcomes where A or B occurs, 2/3, than in the overall population, 1/2. On the other hand, the probability of A given both B and C, where C denotes (A or B), is simply the unconditional probability of A. Conditioning on both B and (A or B) is the same as conditioning on B alone, since B itself implies (A or B), and A is independent of B. In the numerical example, we have conditioned on being in the top row (the 50 outcomes where B occurs), of which 25 have A occurring, so P(A | B) = 25/50 = 1/2.
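The arithmetic of the 100-outcome example can be checked exactly. This is a minimal sketch; the `counts` dictionary and the helper `prob_a` are names introduced here for illustration, with the counts taken from the example above (25 outcomes in each of the four combinations).

```python
from fractions import Fraction

# Counts from the 100-outcome example: (A occurs, B occurs) -> number of outcomes.
counts = {
    (True, True): 25,
    (True, False): 25,
    (False, True): 25,
    (False, False): 25,
}

def prob_a(condition):
    """Return P(A | condition), where condition is a predicate on (a, b)."""
    total = sum(n for (a, b), n in counts.items() if condition(a, b))
    with_a = sum(n for (a, b), n in counts.items() if a and condition(a, b))
    return Fraction(with_a, total)

p_a = prob_a(lambda a, b: True)                          # unconditional: 50/100
p_a_given_c = prob_a(lambda a, b: a or b)                # given A or B: 50/75
p_a_given_b_and_c = prob_a(lambda a, b: b and (a or b))  # given B and (A or B): 25/50
print(p_a, p_a_given_c, p_a_given_b_and_c)  # prints: 1/2 2/3 1/2
```

Using `Fraction` keeps the results exact, so the comparison P(A | B, C) < P(A | C) holds without any rounding concerns.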