Gradient boosting


Gradient boosting is a machine learning technique for regression and classification problems that produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. It builds the model in a stage-wise fashion, as other boosting methods do, and generalizes them by allowing optimization of an arbitrary differentiable loss function.
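Written out, the stage-wise construction takes the following form (a standard formulation in Friedman's notation; this is a sketch of the usual presentation rather than a quotation from a specific source):

```latex
% At stage m the model grows by one weak learner h_m, scaled by a
% step length \gamma_m (typically chosen by line search):
\[
  F_m(x) = F_{m-1}(x) + \gamma_m \, h_m(x)
\]
% The weak learner h_m is fit to the pseudo-residuals: the negative
% gradient of the loss L at the current model's predictions,
\[
  r_{im} = -\left[ \frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}
           \right]_{F = F_{m-1}}, \qquad i = 1, \ldots, n.
\]
```

For the squared-error loss $L(y, F) = \tfrac{1}{2}(y - F)^2$, the pseudo-residuals reduce to the ordinary residuals $y_i - F_{m-1}(x_i)$; this is the special case worked through below.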

The idea of gradient boosting originated in the observation by Leo Breiman that boosting can be interpreted as an optimization algorithm on a suitable cost function. Explicit regression gradient boosting algorithms were subsequently developed by Jerome H. Friedman, simultaneously with the more general functional gradient boosting perspective of Llew Mason, Jonathan Baxter, Peter Bartlett and Marcus Frean. The latter two papers introduced the abstract view of boosting algorithms as iterative functional gradient descent algorithms, that is, algorithms that optimize a cost function over function space by iteratively choosing a function (a weak hypothesis) that points in the negative gradient direction. This functional gradient view of boosting has led to the development of boosting algorithms in many areas of machine learning and statistics beyond regression and classification.

(This section follows the exposition of gradient boosting by Li.)

Like other boosting methods, gradient boosting combines weak "learners" into a single strong learner in an iterative fashion. It is easiest to explain in the least-squares regression setting, where the goal is to "teach" a model $F$ to predict values of the form $\hat{y} = F(x)$ by minimizing the mean squared error $\tfrac{1}{n}\sum_{i}(\hat{y}_i - y_i)^2$, averaged over some training set of actual values of the output variable $y$.
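A minimal sketch of this least-squares procedure in Python may make the recursion concrete. It uses scikit-learn's DecisionTreeRegressor as the weak learner; the function name gradient_boost_ls and the parameter values are illustrative choices, not part of any standard API:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_ls(X, y, n_stages=100, learning_rate=0.1, max_depth=2):
    """Least-squares gradient boosting: each stage fits a shallow tree
    to the current residuals (the negative gradient of squared error)."""
    # Stage 0: the constant prediction minimizing squared error is the mean.
    f0 = np.mean(y)
    residuals = y - f0
    trees = []
    for _ in range(n_stages):
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)
        # With squared error, the negative gradient equals the residual,
        # so the new tree is trained on what the ensemble still misses.
        residuals = residuals - learning_rate * tree.predict(X)
        trees.append(tree)

    def predict(X_new):
        # Final model: initial constant plus the shrunken sum of all trees.
        return f0 + learning_rate * sum(t.predict(X_new) for t in trees)

    return predict

# Usage: recover a noisy sine curve with 100 depth-2 trees.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 6.0, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)
model = gradient_boost_ls(X, y)
print(model(X[:5]))
```

The learning_rate factor is the shrinkage commonly applied in practice; because the negative gradient of the squared error is exactly the residual, each stage simply fits a tree to what the current ensemble still gets wrong.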


...