Galton's problem

Galton's problem, named after Sir Francis Galton, is the problem of drawing inferences from cross-cultural data, due to the statistical phenomenon now called . The problem is now recognized as a general one that applies to all nonexperimental studies and to experimental design as well. It is most simply described as the problem of external dependencies in making statistical estimates when the elements sampled are not statistically independent. Asking two people in the same household whether they watch TV, for example, does not give you statistically independent answers. The sample size, n, for independent observations in this case is one, not two. Once proper adjustments are made that deal with external dependencies, then the axioms of probability theory concerning statistical independence will apply. These axioms are important for deriving measures of variance, for example, or tests of statistical significance.

In 1888, Galton was present when Sir Edward Tylor presented a paper at the Royal Anthropological Institute. Tylor had compiled information on institutions of marriage and descent for 350 cultures and examined the correlations between these institutions and measures of societal complexity. Tylor interpreted his results as indications of a general evolutionary sequence, in which institutions change focus from the maternal line to the paternal line as societies become increasingly complex. Galton disagreed, pointing out that similarity between cultures could be due to borrowing, could be due to common descent, or could be due to evolutionary development; he maintained that without controlling for borrowing and common descent one cannot make valid inferences regarding evolutionary development. Galton's critique has become the eponymous Galton's Problem, as named by Raoul Naroll, who proposed the first statistical solutions.

By the early 20th century unilineal evolutionism was abandoned and along with it the drawing of direct inferences from correlations to evolutionary sequences. Galton's criticisms proved equally valid, however, for inferring functional relations from correlations. The problem of autocorrelation remained.

Statistician William S. Gosset in 1914 developed methods of eliminating spurious correlation due to how position in time or space affects similarities. Today's election polls have a similar problem: the closer the poll to the election, the less individuals make up their mind independently, and the greater the unreliability of the polling results, especially the margin of error or confidence limits. The effective n of independent cases from their sample drops as the election nears. Statistical significance falls with lower effective sample size.

...
Wikipedia