
Spurious correlation of ratios


In statistics, spurious correlation of ratios is a form of spurious correlation that arises between ratios of absolute measurements which themselves are uncorrelated.

The phenomenon of spurious correlation of ratios is one of the main motives for the field of compositional data analysis, which deals with the analysis of variables that carry only relative information, such as proportions, percentages and parts-per-million.

Spurious correlation is distinct from misconceptions about correlation and causality.

Pearson gives a simple example of spurious correlation:

Select three numbers within certain ranges at random, say x, y, z, these will be pair and pair uncorrelated. Form the proper fractions x/y and z/y for each triplet, and correlation will be found between these indices.
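Pearson's example is easy to check numerically. The following is a minimal Python sketch (standard library only). Pearson does not state the ranges, so it assumes that x, y and z are drawn independently and uniformly from the hypothetical range [1, 10]. The helper names `pearson_r` and `pearson_triplets` are illustrative and do not come from the source.

```python
import random
import statistics


def pearson_r(a, b):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / (saa * sbb) ** 0.5


def pearson_triplets(n=10_000, low=1.0, high=10.0, seed=0):
    """Draw n independent triplets (x, y, z), each uniform on [low, high]
    (an assumed range), and return (r(x, z), r(x/y, z/y))."""
    rng = random.Random(seed)
    xs = [rng.uniform(low, high) for _ in range(n)]
    ys = [rng.uniform(low, high) for _ in range(n)]
    zs = [rng.uniform(low, high) for _ in range(n)]
    r_raw = pearson_r(xs, zs)
    # Both indices share the divisor y.
    r_ratio = pearson_r([x / y for x, y in zip(xs, ys)],
                        [z / y for z, y in zip(zs, ys)])
    return r_raw, r_ratio


r_raw, r_ratio = pearson_triplets()
print(f"r(x, z)     = {r_raw:+.3f}")    # near zero: x and z are independent
print(f"r(x/y, z/y) = {r_ratio:+.3f}")  # clearly positive: shared divisor y
```

For this assumed range the raw values remain essentially uncorrelated. The two indices that share the divisor y, however, come out with a correlation of about 0.6.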

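The scatter plot on the right illustrates this example using 500 observations of x, y, and z. The variables x, y and z are drawn independently from normal distributions with means 10, 10, and 30, respectively, and standard deviations 1, 1, and 3, respectively, i.e.,

$$x \sim \mathcal{N}(10,\, 1^2), \qquad y \sim \mathcal{N}(10,\, 1^2), \qquad z \sim \mathcal{N}(30,\, 3^2).$$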

Even though x, y, and z are statistically independent and therefore uncorrelated, the ratios x/z and y/z in the depicted typical sample have a correlation of 0.53. The cause is the common divisor z. The effect becomes clearer if the points in the scatter plot are coloured by their z-value. Triples (x, y, z) with relatively large z values tend to appear in the bottom left of the plot, because a large z makes both ratios small. Triples with relatively small z values tend to appear in the top right.
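The scatter-plot experiment can be re-run in a few lines. This minimal Python sketch uses only the standard library, and its function names are hypothetical. It draws 500 independent triples from the three normal distributions above and computes the correlation of x/z and y/z. Instead of colouring the plot by z, it compares the mean z of the points in the bottom-left and top-right quadrants, splitting each axis at the median of its ratio.

```python
import random
import statistics


def pearson_r(a, b):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / (saa * sbb) ** 0.5


def ratio_example(n=500, seed=0):
    """Independent x ~ N(10, 1^2), y ~ N(10, 1^2), z ~ N(30, 3^2),
    plus the two ratios x/z and y/z that share the divisor z."""
    rng = random.Random(seed)
    xs = [rng.gauss(10, 1) for _ in range(n)]   # gauss(mu, sigma)
    ys = [rng.gauss(10, 1) for _ in range(n)]
    zs = [rng.gauss(30, 3) for _ in range(n)]
    u = [x / z for x, z in zip(xs, zs)]         # horizontal axis: x/z
    v = [y / z for y, z in zip(ys, zs)]         # vertical axis: y/z
    return xs, ys, zs, u, v


xs, ys, zs, u, v = ratio_example()
print(f"r(x, y)     = {pearson_r(xs, ys):+.3f}")
print(f"r(x/z, y/z) = {pearson_r(u, v):+.3f}")

# Stand-in for colouring by z: mean z in the bottom-left vs top-right quadrant.
mu, mv = statistics.median(u), statistics.median(v)
bottom_left = [z for z, a, b in zip(zs, u, v) if a < mu and b < mv]
top_right = [z for z, a, b in zip(zs, u, v) if a > mu and b > mv]
print(f"mean z, bottom left: {statistics.fmean(bottom_left):.1f}")
print(f"mean z, top right:   {statistics.fmean(top_right):.1f}")
```

Because the random sample differs, the ratio correlation will not be exactly 0.53, but it should come out near 0.5. The mean z in the bottom-left quadrant should be above 30 and the mean z in the top-right quadrant below 30.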

Pearson derived an approximation of the correlation that would be observed between two indices ($x_1/x_3$ and $x_2/x_4$), i.e., ratios of the absolute measurements $x_1, x_2, x_3, x_4$:


...
Wikipedia

...