Post hoc theorizing

In statistics, hypotheses suggested by a given data set, when tested with the same data set that suggested them, are likely to be accepted even when they are not true. This is because circular reasoning (double dipping) is involved: something seems true in the limited data set, so we hypothesize that it is true in general, and then we (wrongly) test it on the same limited data set, which seems to confirm that it is true. Generating hypotheses from data already observed, without testing them on new data, is referred to as post hoc theorizing (from Latin post hoc, "after this").

The correct procedure is to test any hypothesis on a data set that was not used to generate the hypothesis.
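The difference between the two procedures can be illustrated with a small simulation. The following is a minimal sketch, not a definitive method: the function names, the choice of 20 candidate features, the 100-row data set, the 5% significance level, and the use of the Fisher z-transform as an approximate test are all assumptions made for illustration. Every feature is pure noise, so any "significant" result is a false positive.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def is_significant(xs, ys, z_crit=1.96):
    """Approximate two-sided 5% test of zero correlation (Fisher z-transform)."""
    r = pearson_r(xs, ys)
    z = math.atanh(r) * math.sqrt(len(xs) - 3)
    return abs(z) > z_crit


def run_trial(rng, n_features=20, n_rows=100):
    """One pure-noise data set: no feature is truly related to the outcome."""
    half = n_rows // 2
    outcome = [rng.gauss(0, 1) for _ in range(n_rows)]
    features = [[rng.gauss(0, 1) for _ in range(n_rows)]
                for _ in range(n_features)]

    # Hypothesis generation: pick the feature that looks best on the first half.
    best = max(features,
               key=lambda f: abs(pearson_r(f[:half], outcome[:half])))

    double_dip = is_significant(best[:half], outcome[:half])  # same data
    held_out = is_significant(best[half:], outcome[half:])    # unseen data
    return double_dip, held_out


def false_positive_rates(n_trials=500, seed=0):
    """Fraction of trials in which each procedure declares significance."""
    rng = random.Random(seed)
    results = [run_trial(rng) for _ in range(n_trials)]
    double_dip_rate = sum(d for d, _ in results) / n_trials
    held_out_rate = sum(h for _, h in results) / n_trials
    return double_dip_rate, held_out_rate


if __name__ == "__main__":
    dd, ho = false_positive_rates()
    print(f"double dipping: {dd:.2f}, held-out test: {ho:.2f}")
```

Under these assumptions, testing the selected feature on the data that suggested it should flag a spurious relationship far more often than the nominal 5%, while testing it on the held-out half should stay close to that level.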

Suppose fifty different researchers run clinical trials to test whether Vitamin X is efficacious in treating cancer. The vast majority of them find no significant differences between measurements taken from patients who received Vitamin X and those who received a placebo. However, due to statistical noise, one study finds a significant correlation between taking Vitamin X and being cured of cancer.

Taking into account all 50 studies as a whole, the only conclusion that can be drawn with great certainty is that there remains no evidence that Vitamin X has any effect on treating cancer. However, someone trying to achieve greater publicity for the one outlier study could try to create a hypothesis suggested by the data, by finding some aspect unique to that one study and claiming that this aspect is the key to its differing results. Suppose, for instance, that this study was the only one conducted in Denmark. It could be claimed that this set of 50 studies shows that Vitamin X is more efficacious in Denmark than elsewhere. However, while the data do not contradict this hypothesis, they do not strongly support it either. Only one or more additional studies could bolster this additional hypothesis.
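A short calculation shows why a single significant result among fifty studies is unsurprising. This sketch assumes, for illustration only, that each study uses a 5% significance level, that the studies are independent, and that Vitamin X truly has no effect; none of these figures is given in the example itself, and the function names are hypothetical.

```python
def prob_at_least_one_false_positive(n_studies, alpha):
    """Chance that one or more independent null studies reach p < alpha."""
    return 1 - (1 - alpha) ** n_studies


def expected_false_positives(n_studies, alpha):
    """Average number of null studies that reach p < alpha."""
    return n_studies * alpha


n_studies = 50  # number of trials in the example
alpha = 0.05    # assumed significance level, not stated in the example

print(f"{prob_at_least_one_false_positive(n_studies, alpha):.3f}")  # 0.923
print(f"{expected_false_positives(n_studies, alpha):.1f}")          # 2.5
```

Under these assumptions, about 92% of such collections of fifty studies would contain at least one "significant" result produced by noise alone.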

Testing a hypothesis suggested by the data can very easily result in false positives (type I errors). If one looks long enough and in enough different places, eventually data can be found to support any hypothesis. Yet these positive data do not by themselves constitute evidence that the hypothesis is correct. The negative test data that were thrown out are just as important, because they give one an idea of how common the positive results are compared to chance. Running an experiment, seeing a pattern in the data, proposing a hypothesis from that pattern, and then using the same experimental data as evidence for the new hypothesis is extremely suspect, because data from all other experiments, completed or potential, have essentially been "thrown out" by choosing to look only at the experiments that suggested the new hypothesis in the first place.
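How long "long enough" is can be estimated with a brief simulation. The sketch below relies on the standard result that, when no effect exists, the p-value of a test with a continuous test statistic is uniformly distributed between 0 and 1. The 5% threshold, the independence of successive looks, and the function names are assumptions made for illustration.

```python
import random


def looks_until_first_hit(rng, alpha=0.05):
    """Count independent null tests run until one has p < alpha."""
    looks = 1
    # Each draw stands in for the p-value of one test with no real effect.
    while rng.random() >= alpha:
        looks += 1
    return looks


def mean_looks(n_runs=20000, alpha=0.05, seed=0):
    """Average number of looks needed to find a spurious positive."""
    rng = random.Random(seed)
    total = sum(looks_until_first_hit(rng, alpha) for _ in range(n_runs))
    return total / n_runs


if __name__ == "__main__":
    print(f"average looks before a false positive: {mean_looks():.1f}")
```

The average should come out near 1/0.05 = 20: under these assumptions, a researcher who reports only the first "hit" has typically discarded about nineteen negative results to get it.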

...
Wikipedia