*** Welcome to piglix ***

Earth mover's distance


In statistics, the earth mover's distance (EMD) is a measure of the distance between two probability distributions over a region D. In mathematics, this is known as the Wasserstein metric. Informally, if the distributions are interpreted as two different ways of piling up a certain amount of dirt over the region D, the EMD is the minimum cost of turning one pile into the other; where the cost is assumed to be amount of dirt moved times the distance by which it is moved.

The above definition is valid only if the two distributions have the same integral (informally, if the two piles have the same amount of dirt), as in normalized histograms or probability density functions. In that case, the EMD is equivalent to the 1st Mallows distance or 1st Wasserstein distance between the two distributions.

Assume that we have a set of points in (dimension ). Instead of assigning one distribution to the set of points, we can cluster them and represent the point set in terms of the clusters. Thus, each cluster is a single point in and the weight of the cluster is decided by the fraction of the distribution present in that cluster. This representation of a distribution by a set of clusters is called the signature. Two signatures can have different sizes, for example, a bimodal distribution has shorter signature (2 clusters) than complex ones. One cluster representation (mean or mode in ) can be thought of as a single feature in a signature. The distance between each of the features is called as ground distance.


...
Wikipedia

...