Computer Vision classification pipeline with CNN architectures
Raw images from the Oxford Pets dataset are loaded and prepared for the CNN model. Each image is resized to 224×224 pixels to match the input requirements of pre-trained models.
import torch
from torchvision import transforms
from PIL import Image

# Load image; convert to RGB so grayscale or RGBA files also yield 3 channels
image = Image.open('path/to/image.jpg').convert('RGB')

# Define transforms for input
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor()
])

# Apply transforms
input_tensor = transform(image)
print(f"Input shape: {input_tensor.shape}")  # torch.Size([3, 224, 224])
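Images are normalized with the ImageNet channel means and standard deviations, and training images are additionally augmented (random crop, horizontal flip, color jitter) to improve generalization. Normalization matters for transfer learning because the pre-trained weights expect inputs with the same statistics they were trained on.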
# Training transforms with augmentation
train_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.RandomCrop((224, 224)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

# Validation transforms (no augmentation)
val_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])
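To make the normalization step concrete, the sketch below applies the per-channel formula (x - mean) / std to single RGB pixels in plain Python, after scaling them to [0, 1] the way `ToTensor` does. The helper name `normalize_pixel` and the sample pixels are illustrative assumptions and not part of the pipeline itself.

```python
# ImageNet channel statistics, as used in the transforms above
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb_uint8):
    """Scale one RGB pixel to [0, 1], then apply (x - mean) / std per channel.

    Hypothetical helper for illustration only.
    """
    scaled = [value / 255.0 for value in rgb_uint8]
    return [(x - m) / s for x, m, s in zip(scaled, IMAGENET_MEAN, IMAGENET_STD)]


# Black, mid-gray, and white pixels after normalization
black = normalize_pixel([0, 0, 0])
mid_gray = normalize_pixel([128, 128, 128])
white = normalize_pixel([255, 255, 255])

for name, pixel in [("black", black), ("mid_gray", mid_gray), ("white", white)]:
    print(name, [round(v, 3) for v in pixel])
```

After normalization the values are no longer confined to [0, 1]: dark pixels become negative and bright pixels exceed 1, which is the input range the pre-trained weights were fitted to.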
We use pre-trained models from the timm library: ResNet50, EfficientNet-B3, and MobileNetV3. These models were trained on ImageNet and serve as strong feature extractors.
import timm
import torch.nn as nn

# Create models using timm
models = {
    'resnet50': timm.create_model('resnet50', pretrained=True, num_classes=37),
    'efficientnet_b3': timm.create_model('efficientnet_b3', pretrained=True, num_classes=37),
    'mobilenetv3_large_100': timm.create_model('mobilenetv3_large_100', pretrained=True, num_classes=37)
}

# Model summary
for name, model in models.items():
    print(f"{name}: {sum(p.numel() for p in model.parameters())} parameters")
ResNet50: a deep residual network with 50 layers that balances accuracy and efficiency.
EfficientNet-B3: uses compound scaling of depth, width, and resolution to reach high accuracy with comparatively few parameters.
MobileNetV3: a mobile-optimized architecture with the fastest inference of the three and good accuracy.
The final classification layer is adapted from 1000 ImageNet classes to 37 pet breed classes. This is the key step in transfer learning: we freeze the feature extractor and train only the classifier head.
# Freeze feature extractor
for param in model.parameters():
    param.requires_grad = False

# Replace classifier head (the attribute name depends on the architecture)
if hasattr(model, 'classifier'):
    model.classifier = nn.Linear(model.classifier.in_features, 37)
    head = model.classifier
elif hasattr(model, 'fc'):
    model.fc = nn.Linear(model.fc.in_features, 37)
    head = model.fc
elif hasattr(model, 'head'):
    model.head = nn.Linear(model.head.in_features, 37)
    head = model.head
else:
    raise AttributeError("No known classifier attribute found on this model")

# Only train the new classifier, whichever attribute it lives under
for param in head.parameters():
    param.requires_grad = True
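The freezing step only pays off if the optimizer receives the trainable parameters alone. The following is a minimal sketch of one training step under that setup. It uses a tiny stand-in network with arbitrary layer sizes and random data instead of a real pre-trained model, and the optimizer, learning rate, and loss are assumptions rather than the pipeline's actual settings.

```python
import torch
import torch.nn as nn

# Stand-in for a pre-trained network: a frozen "backbone" plus a new head.
# Layer sizes are arbitrary and chosen only to keep the example small.
backbone = nn.Sequential(nn.Linear(16, 8), nn.ReLU())
head = nn.Linear(8, 37)
model = nn.Sequential(backbone, head)

# Freeze the backbone; the head stays trainable
for param in backbone.parameters():
    param.requires_grad = False

# Hand only the trainable parameters to the optimizer (assumed: Adam, lr=1e-3)
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One training step on random data: 8 samples, 37 classes
inputs = torch.randn(8, 16)
targets = torch.randint(0, 37, (8,))
frozen_before = backbone[0].weight.clone()

optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()
optimizer.step()

num_trainable = sum(p.numel() for p in trainable)
print(f"Trainable parameters: {num_trainable}")
print(f"Backbone unchanged: {torch.equal(frozen_before, backbone[0].weight)}")
```

Only the head's weights and biases (8 × 37 + 37 values here) are updated, and the frozen layers never receive gradients.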
The model outputs one score (logit) for each of the 37 breed classes, which softmax converts to probabilities. We use top-1 and top-5 accuracy to evaluate performance. The best models achieve 95%+ accuracy on the test set.
# Model inference
model.eval()

# Use the validation transform so the input is normalized like the training data,
# then add a batch dimension: [3, 224, 224] -> [1, 3, 224, 224]
input_batch = val_transform(image).unsqueeze(0)

with torch.no_grad():
    outputs = model(input_batch)
    probabilities = torch.softmax(outputs, dim=1)
    predicted_class = torch.argmax(probabilities, dim=1)

# Get top-5 predictions
top5_probs, top5_indices = torch.topk(probabilities, 5, dim=1)
print(f"Predicted class: {predicted_class.item()}")
print(f"Top-5 classes: {top5_indices[0].tolist()}")
print(f"Top-5 probabilities: {top5_probs[0].tolist()}")
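The snippet above handles a single image. Top-1 and top-5 accuracy come from aggregating the same ranking over a whole evaluation set, and the sketch below shows that calculation in plain Python. The function name `topk_accuracy` and the toy scores are hypothetical, and the example uses 4 classes and k=2 only to stay readable.

```python
def topk_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Hypothetical scores for 3 samples over 4 classes
scores = [
    [0.1, 0.6, 0.2, 0.1],  # true class 1: ranked first
    [0.5, 0.1, 0.3, 0.1],  # true class 2: ranked second
    [0.4, 0.3, 0.2, 0.1],  # true class 3: ranked last
]
labels = [1, 2, 3]

top1 = topk_accuracy(scores, labels, k=1)
top2 = topk_accuracy(scores, labels, k=2)
print(f"top-1: {top1:.3f}, top-2: {top2:.3f}")
```

Top-k accuracy can only rise as k grows, which is why top-5 figures are always at least as high as top-1.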
Download and organize the Oxford Pets dataset with 37 breed classes.
Choose between ResNet50, EfficientNet, ViT, or MobileNet architectures.
Adapt pre-trained models from ImageNet (1000 classes) to our 37 breed classes.
import torch
from torchvision import transforms, models

# Load pre-trained model (the weights argument requires torchvision >= 0.13;
# older releases use models.resnet50(pretrained=True) instead)
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = torch.nn.Linear(model.fc.in_features, 37)  # 37 classes

# Data transforms
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])
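For the dataset organization step mentioned earlier, one simple approach is to derive each image's breed from its filename. The sketch below assumes the dataset's usual naming convention of `<breed>_<index>.jpg` (for example `american_bulldog_100.jpg`). The helper names and the sample filenames are hypothetical.

```python
from collections import defaultdict


def breed_from_filename(filename):
    """Return the breed part of a name such as 'american_bulldog_100.jpg'."""
    stem = filename.rsplit(".", 1)[0]           # drop the extension
    breed, _, _index = stem.rpartition("_")     # split off the trailing index
    return breed


def group_by_breed(filenames):
    """Map each breed name to the list of its image filenames."""
    groups = defaultdict(list)
    for name in filenames:
        groups[breed_from_filename(name)].append(name)
    return dict(groups)


# Hypothetical file listing
files = ["Abyssinian_1.jpg", "Abyssinian_2.jpg", "american_bulldog_100.jpg"]
groups = group_by_breed(files)
print({breed: len(names) for breed, names in groups.items()})
```

Sorting the resulting breed names gives a stable mapping from breed to class index, which should contain 37 entries for the full dataset.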
Run the complete pipeline in Google Colab with free GPU access.
Compare performance metrics across CNN models
| Model | Precision ↑ | Recall ↑ | F1-Score ↑ | Accuracy ↑ | Top-5 Acc ↑ |
|---|---|---|---|---|---|
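The columns above can all be derived from a confusion matrix. The sketch below computes accuracy together with macro-averaged precision, recall, and F1 in plain Python. Macro averaging is an assumption here, since the page does not state which averaging it uses, and the 3-class matrix is made-up illustration data, not a result from these models.

```python
def macro_metrics(confusion):
    """confusion[i][j] = number of samples with true class i predicted as j."""
    n = len(confusion)
    precisions, recalls, f1s = [], [], []
    for c in range(n):
        tp = confusion[c][c]
        predicted = sum(confusion[r][c] for r in range(n))  # column sum
        actual = sum(confusion[c])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[c][c] for c in range(n))
    return {
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
        "accuracy": correct / total,
    }


# Made-up 3-class confusion matrix (rows: true class, columns: predicted class)
example = [
    [9, 1, 0],
    [2, 7, 1],
    [0, 2, 8],
]
metrics = macro_metrics(example)
print({name: round(value, 3) for name, value in metrics.items()})
```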
| Model | Training Time ↓ | Inference Time ↓ | FLOPs ↓ | Parameters ↓ | Model Size ↓ |
|---|---|---|---|---|---|
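The Parameters and Model Size columns are closely related: the size of the stored weights is roughly the parameter count times the bytes per parameter. The sketch below shows that arithmetic. The function name is hypothetical, the parameter count is a round placeholder and not a measurement of any model above, and real checkpoint files are somewhat larger because they also store buffers and metadata.

```python
def model_size_mb(num_parameters, bytes_per_parameter=4):
    """Approximate size of the weights alone, in megabytes (10**6 bytes)."""
    return num_parameters * bytes_per_parameter / 1e6


# Hypothetical round parameter count, not taken from a real model
example_params = 25_000_000

fp32_mb = model_size_mb(example_params)                         # 4 bytes each
fp16_mb = model_size_mb(example_params, bytes_per_parameter=2)  # 2 bytes each
print(f"float32: {fp32_mb:.1f} MB, float16: {fp16_mb:.1f} MB")
```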