Algorithmic probability

In algorithmic information theory, algorithmic (Solomonoff) probability is a mathematical method of assigning a prior probability to a given observation. It was invented by Ray Solomonoff in the 1960s. It is used in inductive inference theory and analyses of algorithms. In his general theory of inductive inference, Solomonoff uses the prior obtained by this formula, in Bayes' rule for prediction.

In the mathematical formalism used, the observations have the form of finite binary strings, and the universal prior is a probability distribution over the set of finite binary strings. The prior is universal in the Turing-computability sense, i.e. no string has zero probability. It is not computable, but it can be approximated.

Algorithmic probability deals with the questions: Given a body of data about some phenomenon that we want to understand, how can we select the most probable hypothesis of how it was caused from among all possible hypotheses and how can we evaluate the different hypotheses? How can we predict future data and how can we measure the likelihood of that prediction being the right one?

Four principle inspirations for Solomonoff's algorithmic probability were: Occam's razor; Epicurus' principle of multiple explanations; modern computing theory (e.g. use of a universal Turing machine) and Bayes rule for prediction.

Occam's razor and Epicurus' principle are essentially two different nonmathematical approximations of the universal prior.

Occam's razor means 'among the theories that are consistent with the observed phenomena, one should select the simplest theory'.

Epicurus' principle of multiple explanations proposes that if more than one theory is consistent with the observations, keep all such theories.

At the heart of the universal prior is an abstract model of a computer, such as a universal Turing machine. Any abstract computer will do, as long as it is Turing-complete, i.e. every finite binary string has at least one program that will compute it on the abstract computer.

The abstract computer is used to give precise meaning to the phrase `simple explanation'. In the formalism used, explanations, or theories of phenomena are computer programs that generate observation strings when run on the abstract computer. A simple explanation is a short computer program; a complex explanation is a long computer program. Simple explanations are more likely, so a high-probability observation string is one generated by a short computer program, or perhaps by any of a large number of slightly longer computer programs. A low-probability observation string is one that can only be generated by a long computer program.

...
Wikipedia