Similarity function

In statistics and related fields, a similarity measure or similarity function is a real-valued function that quantifies the similarity between two objects. Although no single definition of a similarity measure exists, usually such measures are in some sense the inverse of distance metrics: they take on large values for similar objects and either zero or a negative value for very dissimilar objects. E.g., in the context of cluster analysis, Frey and Dueck suggest defining a similarity measure

where $\|x-y\|_{2}^{2}$ is the squared Euclidean distance.

Cosine similarity is a commonly used similarity measure for real-valued vectors, used in (among other fields) information retrieval to score the similarity of documents in the vector space model. In machine learning, common kernel functions such as the RBF kernel can be viewed as similarity functions.

In spectral clustering, a similarity, or affinity, measure is used to transform data to overcome difficulties related to lack of convexity in the shape of the data distribution. The measure gives rise to an $(n,n)$ -sized similarity matrix for a set of $n$ points, where the entry $(i,j)$ in the matrix can be simply the (negative of the) Euclidean distance between $i$ and $j$ , or it can be a more complex measure of distance such as the Gaussian $e^{-\|s_{1}-s_{2}\|^{2}/2\sigma ^{2}}$ . Further modifying this result with network analysis techniques is also common.

...
Wikipedia