*** Welcome to piglix ***

Normalized compression distance


Normalized compression distance is a way of measuring the similarity between two objects, be it two documents, two letters, two emails, two music scores, two languages, two programs, two pictures, two systems, two genomes, to name a few. Such a measurement should not be application dependent or arbitrary. A reasonable definition for the similarity between two objects is how difficult it is to transform them into each other.

We assume that the objects one talks about are finite strings of 0s and 1s. Thus we mean string similarity. Every computer file is of this form, that is, if an object is a file in a computer it is of this form. One can define the information distance between strings and as the length of the shortest program that computes from and vice versa. This shortest program is in a fixed programming language. For technical reasons one uses the theoretical notion of Turing machines. Moreover, to express the length of one uses the notion of Kolmogorov complexity. Then, it has been shown


...
Wikipedia

...