🏁 Neural Network Optimizer Training Visualization

📊 Function Selection & Parameters
Himmelblau Function
f(x,y) = (x² + y - 11)² + (x + y² - 7)²
Adjustable X and Y ranges
Rosenbrock Function
f(x,y) = 100(y - x²)² + (1 - x)²
Adjustable X and Y ranges
Quadratic Bowl
f(x,y) = x² + y²
Adjustable X and Y ranges
Booth Function
f(x,y) = (x + 2y - 7)² + (2x + y - 5)²
Adjustable X and Y ranges
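For reference, the four test surfaces and their analytic gradients can be written in a few lines of Python. This is a minimal sketch assuming NumPy; the function names are illustrative and not the tool's own code.

```python
import numpy as np

# Himmelblau: f(x, y) = (x² + y - 11)² + (x + y² - 7)²
def himmelblau(p):
    x, y = p
    return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

def himmelblau_grad(p):
    x, y = p
    a, b = x**2 + y - 11, x + y**2 - 7
    return np.array([4*x*a + 2*b, 2*a + 4*y*b])

# Rosenbrock: f(x, y) = 100(y - x²)² + (1 - x)²
def rosenbrock(p):
    x, y = p
    return 100*(y - x**2)**2 + (1 - x)**2

def rosenbrock_grad(p):
    x, y = p
    return np.array([-400*x*(y - x**2) - 2*(1 - x), 200*(y - x**2)])

# Quadratic bowl: f(x, y) = x² + y²
def quadratic(p):
    return p[0]**2 + p[1]**2

def quadratic_grad(p):
    return 2 * np.asarray(p, dtype=float)

# Booth: f(x, y) = (x + 2y - 7)² + (2x + y - 5)²
def booth(p):
    x, y = p
    return (x + 2*y - 7)**2 + (2*x + y - 5)**2

def booth_grad(p):
    x, y = p
    a, b = x + 2*y - 7, 2*x + y - 5
    return np.array([2*a + 4*b, 4*a + 2*b])
```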
Optimizers Control Panel (6 enabled)
SGD
Learning Rate
Momentum
Learning Rate
Beta
AdaGrad
Learning Rate
Epsilon
Adam
Learning Rate
Beta1
Beta2
Epsilon
RMSprop
Learning Rate
Beta
Epsilon
AdaDelta
Rho
Epsilon
🎨 Color Configuration
Cold Color (Low Heights)
Hot Color (High Heights)
📊 Detailed Logging
Select Optimizer for Detailed Logging:
🎯 3D Surface View
📊 2D Contour View
📊 Optimizer Performance Comparison: Loss & Gradient Analysis
📉 Loss Function Over Time
📈 Gradient Magnitude Over Time
🧮 Training Loop & Detailed Calculations
Select an optimizer to see calculations
📝 Algorithm Pseudocode
Select an optimizer to see pseudocode
Step-by-step Calculations
Select an optimizer and run the optimization to see detailed calculations
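The step-by-step panel essentially replays a training loop that records the loss and gradient magnitude plotted above. The following is a rough sketch of such a loop, assuming NumPy; the quadratic bowl with plain SGD is used only as a concrete example, and all names (run_optimizer, sgd_step) are illustrative rather than the tool's API.

```python
import numpy as np

def quadratic(p):                    # the "Quadratic Bowl" surface: f(x, y) = x² + y²
    return p[0]**2 + p[1]**2

def quadratic_grad(p):
    return 2 * np.asarray(p, dtype=float)

def run_optimizer(f, grad_f, step_fn, theta0, n_steps=100):
    """Run step_fn from theta0, recording loss and gradient magnitude per step."""
    theta = np.asarray(theta0, dtype=float)
    state = {}                                   # per-optimizer state (velocity, moments, ...)
    losses, grad_norms, path = [], [], [theta.copy()]
    for t in range(1, n_steps + 1):
        g = grad_f(theta)
        losses.append(f(theta))
        grad_norms.append(np.linalg.norm(g))     # data for the loss / gradient charts
        theta = step_fn(theta, g, state, t)
        path.append(theta.copy())
    return np.array(path), losses, grad_norms

def sgd_step(theta, g, state, t, lr=0.1):        # plain SGD as the simplest step rule
    return theta - lr * g

path, losses, grad_norms = run_optimizer(quadratic, quadratic_grad, sgd_step, [3.0, -2.0])
print(f"final point {path[-1]}, final loss {losses[-1]:.3e}")
```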

📐 Optimizer Mathematical Formulas

SGD (Stochastic Gradient Descent)
θₜ₊₁ = θₜ - α∇f(θₜ)
Where α is the learning rate and ∇f(θₜ) is the gradient at step t
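A one-line NumPy sketch of this update (the name sgd_step and the lr parameter are illustrative):

```python
import numpy as np

def sgd_step(theta, grad, lr=0.01):
    # θ_{t+1} = θ_t - α∇f(θ_t): step against the gradient, scaled by the learning rate
    return theta - lr * grad

# Example: one step on the quadratic bowl f(x, y) = x² + y² starting from (3, -2)
theta = np.array([3.0, -2.0])
theta = sgd_step(theta, 2 * theta, lr=0.1)   # gradient of x² + y² is (2x, 2y)
print(theta)                                 # -> [ 2.4 -1.6]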
Momentum
vₜ = βvₜ₋₁ + ∇f(θₜ)
θₜ₊₁ = θₜ - αvₜ
Where β is the momentum coefficient (typically 0.9) and vₜ is the velocity
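A sketch of one momentum step in NumPy, carrying the velocity as explicit state; it follows the formulation above, in which the learning rate scales the velocity (names are illustrative):

```python
import numpy as np

def momentum_step(theta, grad, velocity, lr=0.01, beta=0.9):
    # v_t = β·v_{t-1} + ∇f(θ_t)
    velocity = beta * velocity + grad
    # θ_{t+1} = θ_t - α·v_t
    return theta - lr * velocity, velocity

# The caller keeps the velocity between steps, starting from zero:
theta, v = np.array([3.0, -2.0]), np.zeros(2)
theta, v = momentum_step(theta, 2 * theta, v, lr=0.1)
```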
AdaGrad
Gₜ = Gₜ₋₁ + ∇f(θₜ)²
θₜ₊₁ = θₜ - α∇f(θₜ)/√(Gₜ + ε)
Where ε is a small constant (1e-8) and Gₜ accumulates the squared gradients
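A minimal NumPy sketch of one AdaGrad step, with the accumulator G kept as explicit state (names and the default learning rate are illustrative):

```python
import numpy as np

def adagrad_step(theta, grad, G, lr=0.5, eps=1e-8):
    # G_t = G_{t-1} + ∇f(θ_t)²  (element-wise accumulation of squared gradients)
    G = G + grad**2
    # θ_{t+1} = θ_t - α·∇f(θ_t)/√(G_t + ε)
    return theta - lr * grad / np.sqrt(G + eps), G

theta, G = np.array([3.0, -2.0]), np.zeros(2)
theta, G = adagrad_step(theta, 2 * theta, G)
```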
Adam (Adaptive Moment Estimation)
mₜ = β₁mₜ₋₁ + (1-β₁)∇f(θₜ)
vₜ = β₂vₜ₋₁ + (1-β₂)∇f(θₜ)²
m̂ₜ = mₜ/(1-β₁ᵗ)
v̂ₜ = vₜ/(1-β₂ᵗ)
θₜ₊₁ = θₜ - α·m̂ₜ/(√v̂ₜ + ε)
Where β₁=0.9, β₂=0.999, and m̂ₜ, v̂ₜ are the bias-corrected moment estimates
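A sketch of the Adam update in NumPy, with both moment estimates and the 1-based step counter t passed in explicitly (names are illustrative; defaults follow the values above):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # first- and second-moment estimates
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    # bias correction; t is the 1-based step counter
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    # parameter update
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Example: run Adam on the quadratic bowl f(x, y) = x² + y² (gradient is 2θ)
theta, m, v = np.array([3.0, -2.0]), np.zeros(2), np.zeros(2)
for t in range(1, 101):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.1)
```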
RMSprop (Root Mean Square Propagation)
E[g²]ₜ = ρE[g²]ₜ₋₁ + (1-ρ)∇f(θₜ)²
θₜ₊₁ = θₜ - α∇f(θₜ)/√(E[g²]ₜ + ε)
Where ρ is the decay rate (typically 0.9) and E[g²]ₜ is an exponentially weighted average of squared gradients
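A minimal NumPy sketch of one RMSprop step, carrying the running average of squared gradients as state (names are illustrative):

```python
import numpy as np

def rmsprop_step(theta, grad, avg_sq_grad, lr=0.01, rho=0.9, eps=1e-8):
    # E[g²]_t = ρ·E[g²]_{t-1} + (1-ρ)·∇f(θ_t)²
    avg_sq_grad = rho * avg_sq_grad + (1 - rho) * grad**2
    # θ_{t+1} = θ_t - α·∇f(θ_t)/√(E[g²]_t + ε)
    return theta - lr * grad / np.sqrt(avg_sq_grad + eps), avg_sq_grad
```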
AdaDelta (Adaptive Learning Rate)
E[g²]ₜ = ρE[g²]ₜ₋₁ + (1-ρ)∇f(θₜ)²
Δθₜ = -(√(E[Δθ²]ₜ₋₁ + ε)/√(E[g²]ₜ + ε))·∇f(θₜ)
E[Δθ²]ₜ = ρE[Δθ²]ₜ₋₁ + (1-ρ)Δθₜ²
θₜ₊₁ = θₜ + Δθₜ
Where ρ is the decay rate (typically 0.95), and E[g²]ₜ and E[Δθ²]ₜ are running averages of the squared gradients and squared updates
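A sketch of one AdaDelta step in NumPy; note there is no explicit learning rate, since the step is scaled by the ratio of the two running RMS values (names and the ε default are illustrative):

```python
import numpy as np

def adadelta_step(theta, grad, avg_sq_grad, avg_sq_delta, rho=0.95, eps=1e-6):
    # accumulate squared gradients: E[g²]_t = ρ·E[g²]_{t-1} + (1-ρ)·∇f(θ_t)²
    avg_sq_grad = rho * avg_sq_grad + (1 - rho) * grad**2
    # step is scaled by RMS(Δθ)/RMS(g); no explicit learning rate
    delta = -np.sqrt(avg_sq_delta + eps) / np.sqrt(avg_sq_grad + eps) * grad
    # accumulate squared updates: E[Δθ²]_t = ρ·E[Δθ²]_{t-1} + (1-ρ)·Δθ_t²
    avg_sq_delta = rho * avg_sq_delta + (1 - rho) * delta**2
    return theta + delta, avg_sq_grad, avg_sq_delta
```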

ℹ️ About

👨‍💻 Developed by Thanh-Sach LE
📧 Email: ltsach@hcmut.edu.vn
🏛️ Ho Chi Minh City University of Technology (HCMUT) - VNUHCM

This interactive visualization tool demonstrates various optimization algorithms used in neural network training, including SGD, Momentum, AdaGrad, Adam, RMSprop, and AdaDelta. Users can compare their performance on different mathematical functions and observe detailed training dynamics.