Information gain in decision trees


In information theory and machine learning, information gain is a synonym for Kullback–Leibler divergence. However, in the context of decision trees, the term is sometimes used synonymously with mutual information, which is the expected value of the Kullback–Leibler divergence of the conditional distribution of one variable given the other from that variable's univariate (marginal) distribution.

In particular, the information gain about a random variable X obtained from an observation that a random variable A takes the value A = a is the Kullback–Leibler divergence DKL(p(x | a) || p(x | I)) of the posterior distribution p(x | a) for x given a from the prior distribution p(x | I) for x.
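
As a concrete illustration (the code below is a sketch, not part of the article; the function name and the toy distributions are made up), this quantity can be computed for a discrete X using base-2 logarithms, giving a result in bits:

import numpy as np

def information_gain_from_observation(posterior, prior):
    # D_KL(posterior || prior) in bits, for discrete distributions over the same states.
    posterior = np.asarray(posterior, dtype=float)
    prior = np.asarray(prior, dtype=float)
    mask = posterior > 0  # states with p(x | a) = 0 contribute nothing to the sum
    return float(np.sum(posterior[mask] * np.log2(posterior[mask] / prior[mask])))

# Example: a uniform prior over 4 states; observing A = a rules out two of them.
prior = [0.25, 0.25, 0.25, 0.25]
posterior = [0.5, 0.5, 0.0, 0.0]
print(information_gain_from_observation(posterior, prior))  # 1.0 bit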

The expected value of the information gain is the mutual information I(X; A) of X and A – i.e. the reduction in the entropy of X achieved by learning the state of the random variable A.
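
To make this equivalence explicit, the following sketch (illustrative only; the joint distribution is an assumed toy example given as an array joint[x, a] = p(X = x, A = a)) computes I(X; A) both as the expected Kullback–Leibler divergence of p(x | a) from p(x) and as the entropy reduction H(X) − H(X | A):

import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def mutual_information(joint):
    # Expected KL divergence of the posterior p(x | a) from the prior p(x),
    # weighted by the probability p(a) of each possible observation.
    joint = np.asarray(joint, dtype=float)
    p_x = joint.sum(axis=1)   # marginal (prior) distribution of X
    p_a = joint.sum(axis=0)   # marginal distribution of A
    mi = 0.0
    for a, pa in enumerate(p_a):
        if pa == 0:
            continue
        post = joint[:, a] / pa          # posterior p(x | a)
        mask = post > 0
        mi += pa * float(np.sum(post[mask] * np.log2(post[mask] / p_x[mask])))
    return mi

joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])           # p(X = x, A = a)
h_x = entropy(joint.sum(axis=1))
h_x_given_a = sum(joint[:, a].sum() * entropy(joint[:, a] / joint[:, a].sum())
                  for a in range(joint.shape[1]))
print(mutual_information(joint))  # ~0.278 bits
print(h_x - h_x_given_a)          # same value: H(X) - H(X | A)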

In machine learning, this concept can be used to define a preferred sequence of attributes to investigate so as to most rapidly narrow down the state of X. Such a sequence (which depends on the outcome of the investigation of previous attributes at each stage) is called a decision tree. At each stage, the attribute with the highest mutual information (information gain) with respect to X is usually preferred over the other attributes, as in the sketch below.
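
A minimal ID3-style sketch (the dataset, attribute names, and helper functions below are hypothetical, chosen only to illustrate the selection rule) ranks candidate attributes by the expected information gain about the class label:

from collections import Counter
import math

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    # H(labels) minus the expected entropy of labels after splitting on the attribute.
    total = entropy(labels)
    n = len(labels)
    remainder = 0.0
    for value in set(row[attribute] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == value]
        remainder += (len(subset) / n) * entropy(subset)
    return total - remainder

# Tiny toy data: predict whether to play tennis from two attributes.
rows = [{"outlook": "sunny", "windy": False}, {"outlook": "sunny", "windy": True},
        {"outlook": "rainy", "windy": False}, {"outlook": "rainy", "windy": True}]
labels = ["no", "no", "yes", "yes"]

best = max(("outlook", "windy"), key=lambda a: information_gain(rows, labels, a))
print(best)  # "outlook": it determines the label exactly, so its gain is highest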

In general terms, the expected information gain is the change in information entropy from a prior state to a state that takes some information as given:


E[IG(X, A)] = I(X; A) = H(X) − H(X | A),

where H(X) is the entropy of the prior distribution of X and H(X | A) is the conditional entropy of X given A, i.e. the expected entropy of the posterior distribution of X after A is observed.
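
For example (with illustrative numbers), if X is a uniformly distributed binary variable, so that H(X) = 1 bit, and learning A leaves a conditional entropy H(X | A) of 0.25 bits, then the expected information gain from observing A is 1 − 0.25 = 0.75 bits.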