In machine learning, the kernel perceptron is a variant of the popular perceptron learning algorithm that can learn kernel machines, i.e. non-linear classifiers that employ a kernel function to compute the similarity of unseen samples to training samples. The algorithm was invented in 1964, making it the first kernel classification learner.
The perceptron algorithm is an old but popular online learning algorithm that operates by a principle called "error-driven learning". It iteratively improves a model by running it on training samples, then updating the model whenever it finds it has made an incorrect classification with respect to a supervised signal. The model learned by the standard perceptron algorithm is a linear binary classifier: a vector of weights w (and optionally an intercept term b, omitted here for simplicity) that is used to classify a sample vector x as class "one" or class "minus one" according to
where a zero is arbitrarily mapped to one or minus one. (The "hat" on ŷ denotes an estimated value.)
In pseudocode, the perceptron algorithm is given by:
By contrast with the linear models learned by the perceptron, a kernel machine is a classifier that stores a subset of its training examples xi, associates with each a weight αi, and makes decisions for new samples x' by evaluating
Here, K is some kernel function. Formally, a kernel function is a non-negative semidefinite kernel (see Mercer's condition), representing an inner product between samples in a high-dimensional space, as if the samples had been expanded to include additional features by a function Φ: K(x, x') = Φ(x) · Φ(x'). Intuitively, it can be thought of as a similarity function between samples, so the kernel machine establishes the class of a new sample by weighted comparison to the training set. Each function x' ↦ K(xᵢ, x') serves as a basis function in the classification.