
(20) OPTIMIZATION: Adaptive Moment Estimation — ADAM

The popular optimization algorithm and the last one in this series

Carla Martins
5 min read · Jul 31, 2024

ADAM has become one of the most widely used optimization algorithms in machine learning thanks to its robustness and efficiency. It combines ideas from two other methods discussed earlier in this series: AdaGrad and RMSprop.

First, let’s take a look at the ADAM update equation and see how familiar it looks:
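$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\,\hat{m}_t$$

where θ_t are the parameters at step t, η is the learning rate, and ε is a small constant (typically 1e-8) that prevents division by zero.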

If you have read my previous posts, you will find it similar to the earlier algorithms, except for the terms m̂_t and v̂_t. These are called the first-moment estimate (m̂_t) and the second-moment estimate (v̂_t); the hats indicate that they are bias-corrected, compensating for the fact that both estimates are initialized at zero.

First Moment Estimate

The first-moment estimate in ADAM is essentially an exponential moving average of the gradients. Its purpose is to capture the mean of the gradients and smooth the updates, reducing variance and making the optimization process more stable. Mathematically, it is represented by:
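$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\,g_t, \qquad \hat{m}_t = \frac{m_t}{1 - \beta_1^t}$$

where g_t is the gradient at step t and β_1 (typically 0.9) controls how quickly older gradients are forgotten. Dividing by 1 - β_1^t is the bias correction mentioned above.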

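To see how the pieces fit together, here is a minimal NumPy sketch of a single ADAM step. The function name adam_step and the toy example are illustrative choices of mine; the default hyperparameters (β_1 = 0.9, β_2 = 0.999, η = 0.001, ε = 1e-8) are the values commonly recommended for ADAM:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Perform one ADAM update; t is the 1-based step counter."""
    m = beta1 * m + (1 - beta1) * grad       # first moment: moving average of gradients
    v = beta2 * v + (1 - beta2) * grad**2    # second moment: moving average of squared gradients
    m_hat = m / (1 - beta1**t)               # bias correction (m is initialized at zero)
    v_hat = v / (1 - beta2**t)               # bias correction (v is initialized at zero)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize f(x) = x^2, whose gradient is 2x.
theta = np.array(5.0)
m, v = np.zeros_like(theta), np.zeros_like(theta)
for t in range(1, 2001):
    grad = 2 * theta
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.1)  # larger lr for this toy problem
print(theta)  # ends up very close to 0
```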