Rubin Causal Model

The Rubin causal model (RCM), also known as the Neyman–Rubin causal model, is an approach to the statistical analysis of cause and effect based on the framework of potential outcomes, named after Donald Rubin. The name "Rubin causal model" was first coined by Rubin's graduate school colleague, Paul W. Holland. The potential outcomes framework was first proposed by Jerzy Neyman in his 1923 Master's thesis, though he discussed it only in the context of completely randomized experiments. Rubin, together with other contemporary statisticians, extended it into a general framework for thinking about causation in both observational and experimental studies.

The Neyman potential outcomes framework is based on the idea of potential outcomes and the assignment mechanism: every unit has different potential outcomes depending on their "assignment" to a condition. Potential outcomes are expressed in the form of counterfactual conditional statements, which state what would be the case conditional on a prior event occurring. For instance, a person would have a particular income at age 40 if they had attended a private college, whereas they would have a different income at age 40 had they attended a public college. To measure the causal effect of going to a public versus a private college, the investigator should look at the outcome for the same individual in both alternative futures. Since it is impossible to see both potential outcomes at once, one of the potential outcomes is always missing. This observation is described as the "fundamental problem of causal inference". A randomized experiment works by assigning people randomly to treatments (in this case, public or private college). Because the assignment was random, the groups are (on average) equivalent, and the difference in income at age 40 can be attributed to the college assignment since that was the only difference between the groups. The assignment mechanism is the explanation for why some units received the treatment and others the control.

Rubin, together with a number of other contributors such as Cochran, developed this approach into a formal framework for assessing causation in observational data. In such data, there is a non-random assignment mechanism: in the case of college attendance, people may choose to attend a private versus a public college based on their financial situation, parents' education, relative ranks of the schools they were admitted to, etc. If all of these factors can be balanced between the two groups of public and private college students, then the effect of the college attendance can be attributed to the college choice.

...
Wikipedia