(15) OPTIMIZATION: Momentum Gradient Descent

Another way to improve Gradient Descent convergence

Carla Martins
4 min read · Apr 11, 2024

In article 13 about optimization, we discussed optimization techniques, focusing in particular on gradient descent. That article highlighted a potential issue with gradient descent: slow convergence, especially when gradients are relatively flat. Conjugate gradient descent offers one way to accelerate convergence in such cases. Momentum gradient descent is another effective approach, and it is the one we explore in this article.

In the context of optimization algorithms like gradient descent, momentum can be pictured as a ball rolling down a nearly flat incline. Just as a ball gathers momentum while descending under gravity, momentum in optimization refers to the accumulated influence of past gradients on the current update direction. When the gradient is relatively flat, as on a gentle slope, traditional gradient descent may proceed slowly. By incorporating momentum, the optimization process gains a ‘memory’ of past gradients, allowing it to maintain or even accelerate its pace, much like the rolling ball gains speed over time. Consequently, momentum gradient descent navigates flat regions and shallow local minima more efficiently, facilitating faster convergence toward the minimum.
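To make the ‘memory’ idea concrete, here is a minimal sketch of a common formulation of momentum gradient descent: a velocity term v accumulates past gradients as v = beta * v + gradient, and the parameters are updated with x = x - learning_rate * v. The function names (momentum_gradient_descent, grad_f), the test function, and the parameter values below are illustrative choices, not taken from the article.

import numpy as np

def momentum_gradient_descent(grad, x0, lr=0.05, beta=0.9, n_iter=1000, tol=1e-8):
    # Minimize a function given its gradient, using momentum.
    # Update rule (one common formulation):
    #   v <- beta * v + grad(x)
    #   x <- x - lr * v
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(n_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:   # stop when the gradient is nearly zero
            break
        v = beta * v + g              # accumulate past gradients (the 'memory')
        x = x - lr * v                # step along the accumulated direction
    return x

# Example: f(x, y) = 0.5*x^2 + 10*y^2, an elongated bowl with a flat direction along x
grad_f = lambda p: np.array([p[0], 20.0 * p[1]])
minimum = momentum_gradient_descent(grad_f, x0=[5.0, 2.0])
print(minimum)  # close to [0, 0]

With beta = 0, the velocity is just the current gradient and the method reduces to plain gradient descent; values of beta close to 1 give the update a longer memory, which is what speeds up progress along the flat x direction in this example.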
