🏁 Neural Network Optimizer Training Visualization

📊 Function Selection & Parameters
Himmelblau Function
f(x,y) = (x² + y - 11)² + (x + y² - 7)²
Adjustable X and Y ranges
Rosenbrock Function
f(x,y) = 100(y - x²)² + (1 - x)²
Adjustable X and Y ranges
Quadratic Bowl
f(x,y) = x² + y²
Adjustable X and Y ranges
Booth Function
f(x,y) = (x + 2y - 7)² + (2x + y - 5)²
Adjustable X and Y ranges
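For reference, the four test surfaces and their analytic gradients can be written in a few lines of Python. This is a minimal sketch assuming NumPy; the function names are illustrative and not the tool's own code.

```python
import numpy as np

# Himmelblau: f(x, y) = (x² + y - 11)² + (x + y² - 7)²
def himmelblau(p):
    x, y = p
    return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

def himmelblau_grad(p):
    x, y = p
    a, b = x**2 + y - 11, x + y**2 - 7
    return np.array([4*x*a + 2*b, 2*a + 4*y*b])

# Rosenbrock: f(x, y) = 100(y - x²)² + (1 - x)²
def rosenbrock(p):
    x, y = p
    return 100*(y - x**2)**2 + (1 - x)**2

def rosenbrock_grad(p):
    x, y = p
    return np.array([-400*x*(y - x**2) - 2*(1 - x), 200*(y - x**2)])

# Quadratic bowl: f(x, y) = x² + y²
def quadratic(p):
    return p[0]**2 + p[1]**2

def quadratic_grad(p):
    return 2 * np.asarray(p, dtype=float)

# Booth: f(x, y) = (x + 2y - 7)² + (2x + y - 5)²
def booth(p):
    x, y = p
    return (x + 2*y - 7)**2 + (2*x + y - 5)**2

def booth_grad(p):
    x, y = p
    a, b = x + 2*y - 7, 2*x + y - 5
    return np.array([2*a + 4*b, 4*a + 2*b])
```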
Optimizers Control Panel (6 enabled)
SGD
Learning Rate
Momentum
Learning Rate
Beta
AdaGrad
Learning Rate
Epsilon
Adam
Learning Rate
Beta1
Beta2
Epsilon
RMSprop
Learning Rate
Beta
Epsilon
AdaDelta
Rho
Epsilon
🎨 Color Configuration
Cold Color (Low Heights)
Hot Color (High Heights)
📊 Detailed Logging
Select Optimizer for Detailed Logging:
🎯 3D Surface View
📊 2D Contour View
📊 Optimizer Performance Comparison: Loss & Gradient Analysis
📉 Loss Function Over Time
📈 Gradient Magnitude Over Time
🧮 Training Loop & Detailed Calculations
Select an optimizer to see calculations
📝 Algorithm Pseudocode
Select an optimizer to see pseudocode
Step-by-step Calculations
Select an optimizer and run the optimization to see detailed calculations
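The step-by-step panel essentially replays a training loop that records the loss and gradient magnitude plotted above. The following is a rough sketch of such a loop, assuming NumPy; the quadratic bowl with plain SGD is used only as a concrete example, and all names (run_optimizer, sgd_step) are illustrative rather than the tool's API.

```python
import numpy as np

def quadratic(p):                    # the "Quadratic Bowl" surface: f(x, y) = x² + y²
    return p[0]**2 + p[1]**2

def quadratic_grad(p):
    return 2 * np.asarray(p, dtype=float)

def run_optimizer(f, grad_f, step_fn, theta0, n_steps=100):
    """Run step_fn from theta0, recording loss and gradient magnitude per step."""
    theta = np.asarray(theta0, dtype=float)
    state = {}                                   # per-optimizer state (velocity, moments, ...)
    losses, grad_norms, path = [], [], [theta.copy()]
    for t in range(1, n_steps + 1):
        g = grad_f(theta)
        losses.append(f(theta))
        grad_norms.append(np.linalg.norm(g))     # data for the loss / gradient charts
        theta = step_fn(theta, g, state, t)
        path.append(theta.copy())
    return np.array(path), losses, grad_norms

def sgd_step(theta, g, state, t, lr=0.1):        # plain SGD as the simplest step rule
    return theta - lr * g

path, losses, grad_norms = run_optimizer(quadratic, quadratic_grad, sgd_step, [3.0, -2.0])
print(f"final point {path[-1]}, final loss {losses[-1]:.3e}")
```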

📐 Optimizer Mathematical Formulas

SGD (Stochastic Gradient Descent)
θₜ₊₁ = θₜ - α∇f(θₜ)
Where α is the learning rate and ∇f(θₜ) is the gradient at step t
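A one-line NumPy sketch of this update (the name sgd_step and the lr parameter are illustrative):

```python
import numpy as np

def sgd_step(theta, grad, lr=0.01):
    # θ_{t+1} = θ_t - α∇f(θ_t): step against the gradient, scaled by the learning rate
    return theta - lr * grad

# Example: one step on the quadratic bowl f(x, y) = x² + y² starting from (3, -2)
theta = np.array([3.0, -2.0])
theta = sgd_step(theta, 2 * theta, lr=0.1)   # gradient of x² + y² is (2x, 2y)
print(theta)                                 # -> [ 2.4 -1.6]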
Momentum
vₜ = βvₜ₋₁ + ∇f(θₜ)
θₜ₊₁ = θₜ - αvₜ
Where β is the momentum coefficient (typically 0.9) and vₜ is the velocity
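A sketch of one momentum step in NumPy, carrying the velocity as explicit state; it follows the formulation above, in which the learning rate scales the velocity (names are illustrative):

```python
import numpy as np

def momentum_step(theta, grad, velocity, lr=0.01, beta=0.9):
    # v_t = β·v_{t-1} + ∇f(θ_t)
    velocity = beta * velocity + grad
    # θ_{t+1} = θ_t - α·v_t
    return theta - lr * velocity, velocity

# The caller keeps the velocity between steps, starting from zero:
theta, v = np.array([3.0, -2.0]), np.zeros(2)
theta, v = momentum_step(theta, 2 * theta, v, lr=0.1)
```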
AdaGrad
Gₜ = Gₜ₋₁ + ∇f(θₜ)²
θₜ₊₁ = θₜ - α∇f(θₜ)/√(Gₜ + ε)
Where ε is a small constant (1e-8) and Gₜ accumulates the squared gradients
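A minimal NumPy sketch of one AdaGrad step, with the accumulator G kept as explicit state (names and the default learning rate are illustrative):

```python
import numpy as np

def adagrad_step(theta, grad, G, lr=0.5, eps=1e-8):
    # G_t = G_{t-1} + ∇f(θ_t)²  (element-wise accumulation of squared gradients)
    G = G + grad**2
    # θ_{t+1} = θ_t - α·∇f(θ_t)/√(G_t + ε)
    return theta - lr * grad / np.sqrt(G + eps), G

theta, G = np.array([3.0, -2.0]), np.zeros(2)
theta, G = adagrad_step(theta, 2 * theta, G)
```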
Adam (Adaptive Moment Estimation)
mₜ = β₁mₜ₋₁ + (1-β₁)∇f(θₜ)
vₜ = β₂vₜ₋₁ + (1-β₂)∇f(θₜ)²
m̂ₜ = mₜ/(1-β₁ᵗ)
v̂ₜ = vₜ/(1-β₂ᵗ)
θₜ₊₁ = θₜ - α·m̂ₜ/(√v̂ₜ + ε)
Where β₁=0.9, β₂=0.999, and m̂ₜ, v̂ₜ are the bias-corrected moment estimates
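A sketch of the Adam update in NumPy, with both moment estimates and the 1-based step counter t passed in explicitly (names are illustrative; defaults follow the values above):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # first- and second-moment estimates
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    # bias correction; t is the 1-based step counter
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    # parameter update
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Example: run Adam on the quadratic bowl f(x, y) = x² + y² (gradient is 2θ)
theta, m, v = np.array([3.0, -2.0]), np.zeros(2), np.zeros(2)
for t in range(1, 101):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.1)
```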
RMSprop (Root Mean Square Propagation)
E[g²]ₜ = ρE[g²]ₜ₋₁ + (1-ρ)∇f(θₜ)²
θₜ₊₁ = θₜ - α∇f(θₜ)/√(E[g²]ₜ + ε)
Where ρ is the decay rate (typically 0.9) and E[g²]ₜ is an exponentially weighted average of squared gradients
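A minimal NumPy sketch of one RMSprop step, carrying the running average of squared gradients as state (names are illustrative):

```python
import numpy as np

def rmsprop_step(theta, grad, avg_sq_grad, lr=0.01, rho=0.9, eps=1e-8):
    # E[g²]_t = ρ·E[g²]_{t-1} + (1-ρ)·∇f(θ_t)²
    avg_sq_grad = rho * avg_sq_grad + (1 - rho) * grad**2
    # θ_{t+1} = θ_t - α·∇f(θ_t)/√(E[g²]_t + ε)
    return theta - lr * grad / np.sqrt(avg_sq_grad + eps), avg_sq_grad
```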
AdaDelta (Adaptive Learning Rate)
E[g²]ₜ = ρE[g²]ₜ₋₁ + (1-ρ)∇f(θₜ)²
Δθₜ = -(√(E[Δθ²]ₜ₋₁ + ε)/√(E[g²]ₜ + ε))·∇f(θₜ)
E[Δθ²]ₜ = ρE[Δθ²]ₜ₋₁ + (1-ρ)Δθₜ²
θₜ₊₁ = θₜ + Δθₜ
Where ρ is the decay rate (typically 0.95), and E[g²]ₜ and E[Δθ²]ₜ are running averages of the squared gradients and squared updates
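A sketch of one AdaDelta step in NumPy; note there is no explicit learning rate, since the step is scaled by the ratio of the two running RMS values (names and the ε default are illustrative):

```python
import numpy as np

def adadelta_step(theta, grad, avg_sq_grad, avg_sq_delta, rho=0.95, eps=1e-6):
    # accumulate squared gradients: E[g²]_t = ρ·E[g²]_{t-1} + (1-ρ)·∇f(θ_t)²
    avg_sq_grad = rho * avg_sq_grad + (1 - rho) * grad**2
    # step is scaled by RMS(Δθ)/RMS(g); no explicit learning rate
    delta = -np.sqrt(avg_sq_delta + eps) / np.sqrt(avg_sq_grad + eps) * grad
    # accumulate squared updates: E[Δθ²]_t = ρ·E[Δθ²]_{t-1} + (1-ρ)·Δθ_t²
    avg_sq_delta = rho * avg_sq_delta + (1 - rho) * delta**2
    return theta + delta, avg_sq_grad, avg_sq_delta
```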

ℹ️ About

👨‍💻 Developed by Thanh-Sach LE
📧 Email: ltsach@hcmut.edu.vn
🏛️ Ho Chi Minh City University of Technology (HCMUT) - VNUHCM

This interactive visualization tool demonstrates various optimization algorithms used in neural network training, including SGD, Momentum, AdaGrad, Adam, RMSprop, and AdaDelta. Users can compare their performance on different mathematical functions and observe detailed training dynamics.