Multi-label classification

In machine learning, multi-label classification and the strongly related problem of multi-output classification are variants of the classification problem where multiple target labels must be assigned to each instance. Multi-label classification should not be confused with multiclass classification, which is the problem of categorizing instances into one of more than two classes. Formally, multi-label learning can be phrased as the problem of finding a model that maps inputs x to binary vectors y, rather than scalar outputs as in the ordinary classification problem.

There are two main methods for tackling the multi-label classification problem: problem transformation methods and algorithm adaptation methods. Problem transformation methods transform the multi-label problem into a set of binary classification problems, which can then be handled using single-class classifiers. Algorithm adaptation methods adapt the algorithms to directly perform multi-label classification. In other words, rather than trying to convert the problem to a simpler problem, they try to address the problem in its full form.

Several problem transformation methods exist for multi-label classification; the baseline approach, called the binary relevance method, amounts to independently training one binary classifier for each label. Given an unseen sample, the combined model then predicts all labels for this sample for which the respective classifiers predict a positive result. This method of dividing the task into multiple binary tasks has something in common with the one-vs.-all (OvA, or one-vs.-rest, OvR) method for multiclass classification. Note though that it is not the same method: in binary relevance we train one classifier for each label, not one classifier for each possible value for the label.

Various other transformations exist. Of these, the label powerset (LP) transformation creates one binary classifier for every label combination attested in the training set. The random $k$ -labelsets (RAKEL) algorithm uses multiple LP classifiers, each trained on a random subset of the actual labels; prediction using this ensemble method proceeds by a voting scheme.

Classifier chains are an alternative ensembling methods that have been applied, for instance, in HIV drug resistance prediction.

...
Wikipedia