(19) OPTIMIZATION: AdaDelta — Stability and Adaptability
An alternative algorithm for when AdaGrad and RMSprop do not provide the necessary stability
AdaDelta is an advanced optimization algorithm developed to address a limitation shared by Gradient Descent, Momentum, AdaGrad, and RMSprop: in their update equations, the units of the step do not match the units of the parameters being updated.
To address this inconsistency, the researchers extended the RMSprop mechanism: in addition to the exponentially decaying average of squared gradients, AdaDelta maintains an exponentially decaying average of the squared parameter updates and uses its root mean square in place of the learning rate. Because this quantity carries the same units as the parameters, the resulting step is dimensionally consistent, giving a more cohesive and logical framework for the optimization process.
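For concreteness, the standard AdaDelta update rules (Zeiler, 2012) can be written as follows, where ρ is the decay rate and ε a small constant for numerical stability:

```latex
E[g^2]_t = \rho\, E[g^2]_{t-1} + (1-\rho)\, g_t^2
\qquad \text{(decaying average of squared gradients)}

\Delta\theta_t = -\frac{\sqrt{E[\Delta\theta^2]_{t-1} + \epsilon}}{\sqrt{E[g^2]_t + \epsilon}}\; g_t
\qquad \text{(RMS of past updates replaces the learning rate)}

E[\Delta\theta^2]_t = \rho\, E[\Delta\theta^2]_{t-1} + (1-\rho)\, \Delta\theta_t^2
\qquad \text{(decaying average of squared updates)}

\theta_{t+1} = \theta_t + \Delta\theta_t
```

Since the numerator \(\sqrt{E[\Delta\theta^2]}\) has the same units as \(\theta\) while the denominator cancels the units of the gradient, the step \(\Delta\theta_t\) comes out in parameter units, which is exactly the mismatch the earlier methods leave unresolved.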
Moreover, by leveraging this exponentially decaying average, AdaDelta eliminates the need for a fixed learning rate parameter. This is particularly advantageous because the effective step size adjusts dynamically to the history of gradients and updates, keeping it stable and adaptable throughout the training process. Note, however, that this does not necessarily mean it will converge faster!
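As an illustration, here is a minimal NumPy sketch of this update rule applied to a toy problem; the quadratic objective, the hyperparameter values (rho=0.95, eps=1e-6), and the function name adadelta_step are choices made for this example, not part of any particular library's API:

```python
import numpy as np

def adadelta_step(theta, grad, eg2, ed2, rho=0.95, eps=1e-6):
    """One AdaDelta update; eg2 and ed2 are the running averages of
    squared gradients and squared updates. No learning rate is needed."""
    eg2 = rho * eg2 + (1 - rho) * grad ** 2                    # E[g^2]_t
    delta = -(np.sqrt(ed2 + eps) / np.sqrt(eg2 + eps)) * grad  # same units as theta
    ed2 = rho * ed2 + (1 - rho) * delta ** 2                   # E[Δθ^2]_t
    return theta + delta, eg2, ed2

# Toy example: minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x = np.array([0.0])
eg2, ed2 = np.zeros_like(x), np.zeros_like(x)
for _ in range(5000):
    g = 2 * (x - 3)
    x, eg2, ed2 = adadelta_step(x, g, eg2, ed2)
print(x)  # should end up close to the minimum at 3.0
```

Notice that the very first steps are tiny because the running average of squared updates starts at zero; the step size then grows on its own as consistent progress accumulates, which is one reason AdaDelta's stability does not automatically translate into faster convergence.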
This change in how the learning rate is determined not only corrects the unit mismatch but also significantly enhances the overall performance and adaptability of the optimization procedure. AdaDelta ensures that the…