In the mathematical theory of artificial neural networks, the universal approximation theorem states that a feedforward network with a single hidden layer containing a finite number of neurons (i.e., a multilayer perceptron) can approximate continuous functions on compact subsets of $\mathbb{R}^n$ to arbitrary accuracy, under mild assumptions on the activation function. The theorem thus states that simple neural networks can represent a wide variety of interesting functions when given appropriate parameters; however, it does not address the algorithmic learnability of those parameters.
One of the first versions of the theorem was proved by George Cybenko in 1989 for sigmoid activation functions.
Kurt Hornik showed in 1991 that it is not the specific choice of activation function, but rather the multilayer feedforward architecture itself, that gives neural networks the potential to be universal approximators. The output units are always assumed to be linear. For notational convenience, only the single-output case is shown here; the general case can easily be deduced from it.
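To make the single-hidden-layer, linear-output form concrete, the following Python sketch builds such a network by hand rather than by training. It is only an illustration under stated assumptions: the input is one-dimensional on [0, 1], the activation is the logistic sigmoid, and the staircase construction, the function names (`build_network`, `evaluate`, `max_error`), the `steepness` parameter, and the target function are all hypothetical choices made here, not part of the theorem or of any cited proof.

```python
import math


def sigmoid(z):
    """Logistic function, written so that large |z| cannot overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def build_network(f, n_steps, steepness=50.0):
    """Return (v, w, b) for F(x) = sum_i v[i] * sigmoid(w[i] * x + b[i]).

    Hand-picked (not learned) parameters: each hidden unit is a steep
    sigmoid that adds one step of a staircase following f on [0, 1].
    """
    # Unit 0 has zero input weight, so it outputs sigmoid(0) = 0.5 for
    # every x; the output weight 2 * f(0) turns it into the constant f(0).
    v, w, b = [2.0 * f(0.0)], [0.0], [0.0]
    k = steepness * n_steps  # slope grows with the number of steps
    for i in range(1, n_steps + 1):
        left, right = (i - 1) / n_steps, i / n_steps
        centre = 0.5 * (left + right)
        v.append(f(right) - f(left))  # height of the i-th step
        w.append(k)
        b.append(-k * centre)         # step is centred on the cell midpoint
    return v, w, b


def evaluate(params, x):
    """Linear output unit: weighted sum of the hidden activations."""
    v, w, b = params
    return sum(vi * sigmoid(wi * x + bi) for vi, wi, bi in zip(v, w, b))


def max_error(f, params, n_samples=2001):
    """Largest absolute error on an evenly spaced grid over [0, 1]."""
    xs = [j / (n_samples - 1) for j in range(n_samples)]
    return max(abs(evaluate(params, x) - f(x)) for x in xs)


def target(x):
    """Example continuous function to approximate (an arbitrary choice)."""
    return math.sin(2.0 * math.pi * x)


if __name__ == "__main__":
    for n in (10, 50, 200):
        print(n, max_error(target, build_network(target, n)))
```

Running the script prints the sampled maximum error for an increasing number of hidden units; the error shrinks as units are added. Because the parameters are written down directly from the target function, the sketch illustrates representability only and says nothing about whether a learning algorithm would find such parameters.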
The theorem in mathematical terms:
Let $\varphi(\cdot)$ be a nonconstant, bounded, and monotonically increasing continuous function. Let $I_m$ denote the $m$-dimensional unit hypercube $[0,1]^m$. The space of continuous functions on $I_m$ is denoted by $C(I_m)$. Then, given any $\varepsilon > 0$ and any function $f \in C(I_m)$, there exist an integer $N$, real constants $v_i, b_i \in \mathbb{R}$, and real vectors $w_i \in \mathbb{R}^m$, where $i = 1, \ldots, N$, such that we may define: