Interactive visualization of ResNet50, a 50-layer residual network with skip connections
Transformer-based architectures for computer vision
Residual Network with 50 layers; uses skip connections to mitigate the vanishing-gradient problem
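The skip connection mentioned above can be sketched in a few lines of plain Python. This is a toy illustration, not ResNet code: `layer_f` is a hypothetical stand-in for a conv/BN/ReLU stack, operating on a list of floats.

```python
def layer_f(x):
    # Hypothetical transformation standing in for a conv/BN/ReLU stack.
    return [0.1 * v for v in x]

def residual_block(x):
    # The skip connection adds the input back onto the layer output,
    # giving gradients an identity path around the transformation.
    fx = layer_f(x)
    return [a + b for a, b in zip(fx, x)]

out = residual_block([1.0, 2.0])  # input passes through almost unchanged
```

Because the identity term dominates when `layer_f` is near zero, deep stacks of such blocks stay trainable even when individual layers learn little at first.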
Very Deep Convolutional Network (VGG16) with 16 weight layers; uses stacks of small 3×3 filters
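A quick sketch of why stacked 3×3 filters are used: n stacked 3×3 convolutions (stride 1) cover the same receptive field as a single (2n+1)×(2n+1) filter but with fewer parameters. The helper names below are illustrative, not from any library.

```python
def receptive_field(n_3x3):
    # n stacked 3x3 convs see a (2n+1)x(2n+1) input patch
    return 2 * n_3x3 + 1

def params_3x3_stack(n, c):
    # n 3x3 convs, each mapping c channels to c channels
    return n * 9 * c * c

def params_single(k, c):
    # one kxk conv mapping c channels to c channels
    return k * k * c * c
```

For example, two 3×3 layers match a 5×5 receptive field while using roughly 28% fewer weights at equal channel width, and they interleave an extra nonlinearity.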
EfficientNet with compound scaling, which scales depth, width, and resolution together to balance accuracy and efficiency
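Compound scaling can be sketched as a single coefficient φ driving three multipliers. The constants below are the grid-searched values reported in the EfficientNet paper (α=1.2 for depth, β=1.1 for width, γ=1.15 for resolution); the function itself is an illustrative simplification.

```python
def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15):
    depth_mult = alpha ** phi   # multiply layer count
    width_mult = beta ** phi    # multiply channel count
    res_mult = gamma ** phi     # multiply input resolution
    return depth_mult, width_mult, res_mult

# The constants satisfy alpha * beta**2 * gamma**2 ≈ 2, so each
# increment of phi roughly doubles FLOPs.
flops_factor = 1.2 * 1.1 ** 2 * 1.15 ** 2  # ≈ 1.92
```

Scaling all three dimensions in lockstep is the point: widening or deepening alone saturates quickly, while the joint schedule keeps the accuracy/FLOPs trade-off favorable.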
Mobile-optimized network (MobileNetV2) with inverted residuals and linear bottlenecks
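The inverted-residual block can be summarized by its channel bookkeeping, sketched below under the default expansion factor t=6 from the MobileNetV2 paper; no real convolutions are performed, and the function name is illustrative.

```python
def inverted_residual_channels(c_in, c_out, t=6):
    expanded = c_in * t        # 1x1 "expand" conv with ReLU6
    depthwise = expanded       # 3x3 depthwise conv keeps channel count
    projected = c_out          # 1x1 "project" conv, linear (no ReLU)
    use_skip = c_in == c_out   # residual connection only when shapes match
    return expanded, depthwise, projected, use_skip
```

The block is "inverted" because the skip connection joins the narrow bottleneck ends rather than the wide middle, and the final projection stays linear to avoid destroying information in the low-dimensional representation.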
Base Vision Transformer (ViT-Base) with 12 transformer blocks and an embedding dimension of 768
Large Vision Transformer (ViT-Large) with 24 transformer blocks and an embedding dimension of 1024
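The two ViT configurations above can be sanity-checked with a rough parameter count, assuming each transformer block contributes about 12·d² weights (4·d² for the attention projections, 8·d² for an MLP with 4× hidden expansion); embeddings and norms are ignored, so these are order-of-magnitude figures only.

```python
def approx_vit_params(blocks, dim):
    # ~12 * d^2 parameters per transformer block
    return 12 * blocks * dim ** 2

base = approx_vit_params(12, 768)    # ~85M, close to ViT-Base's ~86M
large = approx_vit_params(24, 1024)  # ~302M, close to ViT-Large's ~307M
```

Doubling the depth and growing the width from 768 to 1024 multiplies the parameter count by roughly 3.6×, which is why ViT-Large needs substantially more data or regularization to train well.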
Swin Transformer with shifted-window self-attention for efficient hierarchical vision modeling
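The shifted-window idea can be illustrated on a toy 4×4 feature map partitioned into 2×2 windows: attention (not shown) runs within each window, and a cyclic shift between layers lets information cross window boundaries. Function names here are illustrative, not Swin's actual API.

```python
def partition_windows(grid, win):
    # split an n x n grid into non-overlapping win x win windows
    n = len(grid)
    windows = []
    for i in range(0, n, win):
        for j in range(0, n, win):
            windows.append([row[j:j + win] for row in grid[i:i + win]])
    return windows

def cyclic_shift(grid, s):
    # roll rows and columns by s, as done before the second block
    # of each Swin block pair
    n = len(grid)
    return [[grid[(i + s) % n][(j + s) % n] for j in range(n)]
            for i in range(n)]

grid = [[r * 4 + c for c in range(4)] for r in range(4)]
regular = partition_windows(grid, 2)
shifted = partition_windows(cyclic_shift(grid, 1), 2)
```

Because each window holds a fixed number of tokens, attention cost grows linearly with image size instead of quadratically, which is the efficiency claim in the description above.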
Data-efficient Image Transformer (DeiT) with knowledge distillation from a teacher network
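The distillation objective can be sketched as follows. This is a deliberately simplified, assumption-laden toy of DeiT's hard-label variant: the student minimizes cross-entropy against both the ground-truth label and the teacher's predicted class, with no real model involved.

```python
import math

def cross_entropy(logits, label):
    # numerically stable -log softmax(logits)[label]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[label]

def hard_distill_loss(student_logits, true_label, teacher_label):
    # equal-weight average of the two objectives, as in DeiT's
    # hard-distillation variant
    return 0.5 * (cross_entropy(student_logits, true_label)
                  + cross_entropy(student_logits, teacher_label))
```

When the teacher agrees with the ground truth the loss reduces to plain cross-entropy; when it disagrees, the teacher term pulls the student toward the teacher's output, which is what makes transformer training data-efficient here.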