Category utility is a measure of "category goodness" defined in Gluck & Corter (1985) and Corter & Gluck (1992). It attempts to maximize both the probability that two objects in the same category have attribute values in common, and the probability that objects from different categories have different attribute values. It was intended to supersede more limited measures of category goodness such as "cue validity" (Reed 1972; Rosch & Mervis 1975) and "collocation index" (Jones 1983). It provides a normative information-theoretic measure of the predictive advantage gained by the observer who possesses knowledge of the given category structure (i.e., the class labels of instances) over the observer who does not possess knowledge of the category structure. In this sense the motivation for the category utility measure is similar to the information gain metric used in decision tree learning. In certain presentations, it is also formally equivalent to the mutual information, as discussed below. A review of category utility in its probabilistic incarnation, with applications to machine learning, is provided in Witten & Frank (2005, pp. 260–262).
The probability-theoretic definition of category utility given in Fisher (1987) and Witten & Frank (2005) is as follows:
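$$CU(C,F) = \frac{1}{p} \sum_{c_j \in C} p(c_j) \left[ \sum_{f_i \in F} \sum_{k=1}^{m} p(f_{ik} \mid c_j)^2 - \sum_{f_i \in F} \sum_{k=1}^{m} p(f_{ik})^2 \right]$$

where $F = \{f_i\},\ i = 1 \ldots n$ is a size-$n$ set of $m$-ary features, and $C = \{c_j\},\ j = 1 \ldots p$ is a set of $p$ categories. The term $p(f_{ik})$ designates the marginal probability that feature $f_i$ takes on value $k$, and the term $p(f_{ik} \mid c_j)$ designates the category-conditional probability that feature $f_i$ takes on value $k$ given that the object in question belongs to category $c_j$.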
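As an illustration, the definition can be sketched in Python for a small labelled data set. This is a minimal sketch, not a reference implementation: it assumes that every probability is estimated by a relative frequency in the sample, that each instance is a tuple of categorical feature values, and that the category structure is given as one label per instance. The function name `category_utility` and the helper `sum_sq_probs` are hypothetical names chosen for this example.

```python
from collections import Counter, defaultdict


def category_utility(instances, labels):
    """Estimate category utility from labelled categorical instances.

    Assumption: all probabilities are relative frequencies in the sample.
    """
    n_total = len(instances)
    n_features = len(instances[0])

    # Group the instances by their category label c_j.
    by_category = defaultdict(list)
    for instance, label in zip(instances, labels):
        by_category[label].append(instance)

    def sum_sq_probs(rows):
        # Sum over features i and observed values k of p(f_i = k)^2,
        # with p estimated from the given rows.
        total = 0.0
        for i in range(n_features):
            counts = Counter(row[i] for row in rows)
            total += sum((c / len(rows)) ** 2 for c in counts.values())
        return total

    # Marginal term: does not depend on the category.
    baseline = sum_sq_probs(instances)

    # Weight each category's gain over the baseline by p(c_j).
    weighted_gain = sum(
        (len(rows) / n_total) * (sum_sq_probs(rows) - baseline)
        for rows in by_category.values()
    )

    # Divide by the number of categories (the 1/p factor).
    return weighted_gain / len(by_category)


data = [("red", "round"), ("red", "round"),
        ("blue", "square"), ("blue", "square")]
print(category_utility(data, ["a", "a", "b", "b"]))  # two clean categories
print(category_utility(data, ["a", "a", "a", "a"]))  # one category, no gain
```

In this toy data set the two-category labelling makes both features perfectly predictable within each category, so it scores higher than the single-category labelling, whose conditional and marginal terms coincide.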