Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a function. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point. If instead one takes steps proportional to the positive of the gradient, one approaches a local maximum of that function; the procedure is then known as gradient ascent.
Gradient descent is also known as steepest descent. However, gradient descent should not be confused with the method of steepest descent for approximating integrals.
Gradient descent is a popular method in the field of machine learning because part of the process of machine learning is to find the highest accuracy, or to minimize the error rate, given a set of training data. Gradient descent is used to find the minimum error by minimizing a "cost" function.
Gradient descent is based on the observation that if the multi-variable function is defined and differentiable in a neighborhood of a point , then decreases fastest if one goes from in the direction of the negative gradient of at , . It follows that, if