Classification - Oxford Pets

🔄 Pipeline

Computer Vision classification pipeline with CNN architectures

🖼️

Image Input

224×224 RGB images

→

🔧

Preprocessing

Fixed

Normalize, Resize

→

🧠

CNN Model

Choose Below

ResNet50, EfficientNet, MobileNet...

→

🎯

Classifier Head

Adapted

1000 → 37 classes

→

✨

Classification Results

37 breed predictions

🖼️ Input Image Processing

What happens here?

Raw images from the Oxford Pets dataset are loaded and prepared for the CNN model. Each image is resized to 224×224 pixels to match the input requirements of pre-trained models.

Code Example

import torch
from torchvision import transforms
from PIL import Image

# Load image
image = Image.open('path/to/image.jpg')

# Define transforms for input
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor()
])

# Apply transforms
input_tensor = transform(image)
print(f"Input shape: {input_tensor.shape}")  # [3, 224, 224]

🔧 Image Preprocessing

Normalization & Augmentation

Images are normalized using ImageNet statistics and augmented during training to improve model generalization. This step is crucial for transfer learning.

Code Example

# Training transforms with augmentation
train_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.RandomCrop((224, 224)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], 
                        std=[0.229, 0.224, 0.225])
])

# Validation transforms (no augmentation)
val_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], 
                        std=[0.229, 0.224, 0.225])
])

🧠 CNN Model Architecture

Pre-trained CNN Models

We use pre-trained models from timm library: ResNet50, EfficientNet-B3, and MobileNetV3. These models were trained on ImageNet and provide excellent feature extraction capabilities.

Code Example

import timm
import torch.nn as nn

# Create models using timm
models = {
    'resnet50': timm.create_model('resnet50', pretrained=True, num_classes=37),
    'efficientnet_b3': timm.create_model('efficientnet_b3', pretrained=True, num_classes=37),
    'mobilenetv3_large_100': timm.create_model('mobilenetv3_large_100', pretrained=True, num_classes=37)
}

# Model summary
for name, model in models.items():
    print(f"{name}: {sum(p.numel() for p in model.parameters())} parameters")

Available Models

🏗️

ResNet50

95.2%

Accuracy

25.6M

Parameters

4.1B

FLOPs

Deep residual network with 50 layers. Excellent balance of accuracy and efficiency.

⚡

EfficientNet-B3

96.1%

Accuracy

12M

Parameters

1.8B

FLOPs

Compound scaling with optimal efficiency. Best accuracy with fewer parameters.

📱

MobileNetV3

93.8%

Accuracy

5.5M

Parameters

0.2B

FLOPs

Mobile-optimized architecture. Fastest inference with good accuracy.

🎯 Classifier Head Adaptation

Transfer Learning Adaptation

The final classification layer is adapted from 1000 ImageNet classes to 37 pet breed classes. This is the key step in transfer learning - we freeze the feature extractor and only train the classifier head.

Code Example

# Freeze feature extractor
for param in model.parameters():
    param.requires_grad = False

# Replace classifier head
if hasattr(model, 'classifier'):
    model.classifier = nn.Linear(model.classifier.in_features, 37)
elif hasattr(model, 'fc'):
    model.fc = nn.Linear(model.fc.in_features, 37)
elif hasattr(model, 'head'):
    model.head = nn.Linear(model.head.in_features, 37)

# Only train the new classifier
for param in model.classifier.parameters():
    param.requires_grad = True

✨ Classification Results

Model Predictions & Evaluation

The model outputs probability scores for all 37 breed classes. We use top-1 and top-5 accuracy to evaluate performance. The best models achieve 95%+ accuracy on the test set.

Code Example

# Model inference
model.eval()
with torch.no_grad():
    outputs = model(input_tensor)
    probabilities = torch.softmax(outputs, dim=1)
    predicted_class = torch.argmax(probabilities, dim=1)
    
# Get top-5 predictions
top5_probs, top5_indices = torch.topk(probabilities, 5, dim=1)
print(f"Predicted class: {predicted_class.item()}")
print(f"Top-5 classes: {top5_indices[0].tolist()}")
print(f"Top-5 probabilities: {top5_probs[0].tolist()}")

📚 Step-by-Step Tutorial

1

Data Preparation

Download and organize the Oxford Pets dataset with 37 breed classes.

2

Model Selection

Choose between ResNet50, EfficientNet, ViT, or MobileNet architectures.

3

Transfer Learning

Adapt pre-trained models from ImageNet (1000 classes) to our 37 breed classes.

💻 Implementation Code

import torch
import torchvision
from torchvision import transforms, models

# Load pre-trained model
model = models.resnet50(pretrained=True)
model.fc = torch.nn.Linear(model.fc.in_features, 37)  # 37 classes

# Data transforms
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], 
                        std=[0.229, 0.224, 0.225])
])

🚀 Google Colab Notebook

Run the complete pipeline in Google Colab with free GPU access.

🚀 Open in Colab 📁 View on GitHub

📊 Model Comparison

Compare performance metrics across CNN models

Model	Precision ↑	Recall ↑	F1-Score ↑	Accuracy ↑	Top-5 Acc ↑

Model	Training Time ↓	Inference Time ↓	FLOPs ↓	Parameters ↓	Model Size ↓

📊 Confusion Matrix

Select a model from the comparison table above to view its confusion matrix

Click on a model row in the comparison table to view its confusion matrix