Silhouette refers to a method of interpretation and validation of consistency within clusters of data. The technique provides a succinct graphical representation of how well each object lies within its cluster. It was first described by Peter J. Rousseeuw in 1986.
The silhouette value is a measure of how similar an object is to its own cluster (cohesion) compared to other clusters (separation). The silhouette ranges from -1 to 1, where a high value indicates that the object is well matched to its own cluster and poorly matched to neighboring clusters. If most objects have a high value, then the clustering configuration is appropriate. If many points have a low or negative value, then the clustering configuration may have too many or too few clusters.
The silhouette can be calculated with any distance metric, such as the Euclidean distance or the Manhattan distance.
Assume the data have been clustered via any technique, such as k-means, into $k$ clusters. For each datum $i$, let $a(i)$ be the average dissimilarity of $i$ with all other data within the same cluster. We can interpret $a(i)$ as how well $i$ is assigned to its cluster (the smaller the value, the better the assignment). We then define the average dissimilarity of point $i$ to a cluster $c$ as the average of the distance from $i$ to all points in $c$.
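The two averages defined above can be illustrated with a minimal Python sketch. The function names (`mean_dissimilarity_to_cluster`, `a`), the toy points, and the convention of returning 0 for a cluster with no other members are assumptions made for this illustration, not part of the original description. The distance function is passed as a parameter, since any metric may be used.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def manhattan(p, q):
    """Manhattan distance between two points given as tuples."""
    return sum(abs(x - y) for x, y in zip(p, q))

def mean_dissimilarity_to_cluster(points, labels, i, cluster, dist=euclidean):
    """Average distance from points[i] to the members of `cluster`.

    The point i itself is skipped, so the same function serves for
    i's own cluster and for any other cluster.
    """
    others = [points[j] for j in range(len(points))
              if labels[j] == cluster and j != i]
    if not others:
        # Assumption for this sketch: no other members means 0.
        return 0.0
    return sum(dist(points[i], q) for q in others) / len(others)

def a(points, labels, i, dist=euclidean):
    """Average dissimilarity of points[i] to the rest of its own cluster."""
    return mean_dissimilarity_to_cluster(points, labels, i, labels[i], dist)

# Hypothetical toy data: two clusters of two points each.
points = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 3.0)]
labels = [0, 0, 1, 1]

print(a(points, labels, 0))                                 # 2.0
print(mean_dissimilarity_to_cluster(points, labels, 0, 1))  # 4.5
print(mean_dissimilarity_to_cluster(points, labels, 0, 1, dist=manhattan))  # 5.5
```

In this toy example, point 0 is on average closer to its own cluster (2.0) than to the other cluster (4.5), which is the situation the text describes as a good assignment.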