📊 Classification Overview

37-class fine-grained breed classification task

Total Images: 7,349 (for classification)
Breeds: 37 (fine-grained classes)
Dogs: 4,978 images
Cats: 2,371 images

📊 Class Distribution

Analysis of breed distribution and class balance

Breed Distribution

Species Distribution

Class Balance Analysis

Max Count: 200
Min Count: 184
Imbalance Ratio: 1.1x
Balance Level: ✅ Balanced
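
The imbalance ratio above is the largest class count divided by the smallest (200 / 184 ≈ 1.09, shown as 1.1x). A minimal sketch of that calculation follows; the function name and the toy label list are illustrative assumptions, not part of the original report.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return (max_count, min_count, max/min ratio) over a list of class labels."""
    counts = Counter(labels)
    largest = max(counts.values())
    smallest = min(counts.values())
    return largest, smallest, largest / smallest


# Toy labels that only reproduce the reported extremes (200 and 184 images);
# the class names are placeholders, not real breeds.
labels = ["largest_class"] * 200 + ["smallest_class"] * 184
largest, smallest, ratio = imbalance_ratio(labels)
print(f"Max Count: {largest}, Min Count: {smallest}, Imbalance Ratio: {ratio:.1f}x")
```

With real data, `labels` would hold one breed label per image across all 37 classes.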

Train/Val/Test Split Distribution

Distribution of images across train, validation, and test splits for both species and breeds

Train Images: 2,941 (40.0%)
Val Images: 739 (10.1%)
Test Images: 3,669 (49.9%)

Species-Level Split Distribution

Train/Val/Test distribution for Dog vs Cat classification

Overall Split Ratio

Percentage of images in each split

Breed-Level Split Distribution (Top 15)

Stacked bar chart showing train/val/test distribution for each breed

💡 Split Quality Indicators:
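  • Good split: Each breed has ~60-70% train, ~10-15% val, ~20-30% test (the overall split here is 40.0% / 10.1% / 49.9%, so the train share falls below this guideline and the test share above it)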
  • Warning: Breeds with < 5% val or < 10% test may have insufficient validation/test samples
  • Species balance: Check if Dog/Cat ratio is consistent across splits
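
The indicators above can be checked mechanically. The sketch below computes split shares from raw counts and flags the two warning conditions (val < 5%, test < 10%); the function names and the small "sparse breed" counts are hypothetical, while the overall counts are the ones reported above.

```python
def split_fractions(counts):
    """Map each split name to its share of the total image count."""
    total = sum(counts.values())
    return {split: n / total for split, n in counts.items()}


def split_warnings(counts, min_val=0.05, min_test=0.10):
    """Return warning strings for splits below the thresholds listed above."""
    fractions = split_fractions(counts)
    warnings = []
    if fractions.get("val", 0.0) < min_val:
        warnings.append(f"val share {fractions.get('val', 0.0):.1%} is below {min_val:.0%}")
    if fractions.get("test", 0.0) < min_test:
        warnings.append(f"test share {fractions.get('test', 0.0):.1%} is below {min_test:.0%}")
    return warnings


# Overall counts as reported in this section.
overall = {"train": 2941, "val": 739, "test": 3669}
for split, share in split_fractions(overall).items():
    print(f"{split}: {share:.1%}")
print(split_warnings(overall))

# Made-up counts for a single breed, only to show a warning being raised.
sparse_breed = {"train": 95, "val": 2, "test": 3}
print(split_warnings(sparse_breed))
```

Running the same two functions per breed and per species gives the breed-level and Dog/Cat consistency checks the bullets describe.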

Hierarchical Structure

Two-level hierarchy: Species (Cat/Dog) → Breeds

Species → Breeds Hierarchy

🐕 Dogs (25 breeds)

American Bulldog, American Pit Bull Terrier, Basset Hound, Beagle,
Boxer, Chihuahua, English Cocker Spaniel, English Setter,
German Shorthaired, Great Pyrenees, Havanese, Japanese Chin,
Keeshond, Leonberger, Miniature Pinscher, Newfoundland,
Pomeranian, Pug, Saint Bernard, Samoyed,
Scottish Terrier, Shiba Inu, Staffordshire Bull Terrier, Wheaten Terrier,
Yorkshire Terrier

🐈 Cats (12 breeds)

Abyssinian, Bengal, Birman, Bombay,
British Shorthair, Egyptian Mau, Maine Coon, Persian,
Ragdoll, Russian Blue, Siamese, Sphynx

📊 Hierarchy Levels:
  • Level 1: 2-class classification (Dog vs Cat) - Coarse-grained
  • Level 2: 37-class classification (All breeds) - Fine-grained
  • Multi-task: Can train hierarchical models that predict both levels
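
One way to represent the two levels is to derive a (species, breed) label pair from each breed name, which a multi-task model could then use as targets for two output heads. This is a sketch under assumptions: the identifiers follow the spelling used in the similarity and cluster listings of this report, and the index ordering and the `hierarchical_label` helper are arbitrary choices made for illustration.

```python
# Breed identifiers spelled as in the report's similarity and cluster listings.
CAT_BREEDS = [
    "Abyssinian", "Bengal", "Birman", "Bombay", "British_Shorthair",
    "Egyptian_Mau", "Maine_Coon", "Persian", "Ragdoll", "Russian_Blue",
    "Siamese", "Sphynx",
]
DOG_BREEDS = [
    "american_bulldog", "american_pit_bull_terrier", "basset_hound", "beagle",
    "boxer", "chihuahua", "english_cocker_spaniel", "english_setter",
    "german_shorthaired", "great_pyrenees", "havanese", "japanese_chin",
    "keeshond", "leonberger", "miniature_pinscher", "newfoundland",
    "pomeranian", "pug", "saint_bernard", "samoyed", "scottish_terrier",
    "shiba_inu", "staffordshire_bull_terrier", "wheaten_terrier",
    "yorkshire_terrier",
]

SPECIES = ["cat", "dog"]          # Level 1; index order is an arbitrary choice
BREEDS = CAT_BREEDS + DOG_BREEDS  # Level 2; 37 fine-grained classes
BREED_TO_SPECIES = {
    **{breed: "cat" for breed in CAT_BREEDS},
    **{breed: "dog" for breed in DOG_BREEDS},
}


def hierarchical_label(breed):
    """Return (level-1 species index, level-2 breed index) for a breed name."""
    species = BREED_TO_SPECIES[breed]
    return SPECIES.index(species), BREEDS.index(breed)


print(len(BREEDS), "breeds,", len(SPECIES), "species")
print(hierarchical_label("Sphynx"))
print(hierarchical_label("beagle"))
```

Because the species label is fully determined by the breed, the Level 1 target costs nothing extra to annotate.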

🔬 Feature Visualization

Dimensionality reduction: t-SNE and UMAP for breed feature visualization

Features extracted using ResNet50 pretrained on ImageNet

t-SNE 2D Projection

UMAP 2D Projection

PCA Explained Variance

Cumulative variance explained by principal components
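
The cumulative explained-variance curve is the running sum of the per-component variances (the eigenvalues of the feature covariance matrix) divided by their total. The sketch below shows only that arithmetic; the eigenvalues are toy numbers, not values from this analysis, and both function names are illustrative.

```python
from itertools import accumulate


def cumulative_explained_variance(eigenvalues):
    """Return the cumulative fraction of variance explained, largest component first."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    return [running / total for running in accumulate(ordered)]


def components_needed(eigenvalues, threshold=0.95):
    """Return the smallest number of components whose cumulative share reaches threshold."""
    curve = cumulative_explained_variance(eigenvalues)
    for k, share in enumerate(curve, start=1):
        if share >= threshold:
            return k
    return len(curve)


# Toy eigenvalues chosen for easy arithmetic (hypothetical, not from the dataset).
toy_eigenvalues = [4.0, 2.0, 1.0, 1.0]
print(cumulative_explained_variance(toy_eigenvalues))  # [0.5, 0.75, 0.875, 1.0]
print(components_needed(toy_eigenvalues, threshold=0.8))  # 3
```

In practice the eigenvalues would come from a PCA fitted on the extracted ResNet50 features.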

🔍 Breed Similarity Analysis

Feature-based similarity between breeds using cosine similarity

Similarity Matrix Heatmap

Pairwise similarities between all breeds. Diagonal = 1.0 (breed vs. itself)

Most Similar Breed Pairs

#   Breed 1                    Breed 2                     Similarity
1   Birman                     Ragdoll                     0.964
2   British_Shorthair          Russian_Blue                0.960
3   Birman                     Siamese                     0.958
4   Bengal                     Egyptian_Mau                0.942
5   Abyssinian                 Russian_Blue                0.935
6   american_pit_bull_terrier  staffordshire_bull_terrier  0.933
7   Abyssinian                 Bengal                      0.933
8   american_bulldog           american_pit_bull_terrier   0.932
9   Maine_Coon                 Ragdoll                     0.931
10  British_Shorthair          Maine_Coon                  0.926

Similarity Insights

High Similarity (>0.8): Breeds may be visually very similar - challenging to classify
Low Similarity (<0.5): Distinct breeds - easier to distinguish

Similarity is computed from ResNet50 features using cosine similarity (1.0 = identical direction in feature space). Higher values indicate more similar visual characteristics.
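
The scores above behave like cosine similarity: 1.0 on the diagonal, and higher means more alike. A minimal sketch of how such a ranking could be produced is shown below. It assumes each breed is summarized by a single feature vector (for example, the mean of its image features); the report does not state how features were aggregated, and the 2-D vectors and breed names here are toy values rather than real ResNet50 features.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar_pairs(breed_vectors, top_k=10):
    """Rank all unordered breed pairs by cosine similarity, highest first."""
    scored = [
        (cosine_similarity(breed_vectors[a], breed_vectors[b]), a, b)
        for a, b in combinations(sorted(breed_vectors), 2)
    ]
    scored.sort(reverse=True)
    return [(a, b, score) for score, a, b in scored[:top_k]]


# Toy per-breed vectors (hypothetical; real ones would be high-dimensional).
toy_vectors = {
    "breed_a": [1.0, 0.0],
    "breed_b": [0.9, 0.1],
    "breed_c": [0.0, 1.0],
}
for first, second, score in most_similar_pairs(toy_vectors, top_k=3):
    print(f"{first} {second} {score:.3f}")
```

Cosine distance, if needed, is simply one minus this value.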

Breed Clusters

Cluster 1 (8 breeds)

great_pyrenees keeshond leonberger newfoundland pomeranian saint_bernard samoyed shiba_inu

Cluster 2 (11 breeds)

Abyssinian Bengal Birman Bombay British_Shorthair Egyptian_Mau Maine_Coon Persian Ragdoll Russian_Blue Siamese

Cluster 3 (7 breeds)

Sphynx american_bulldog american_pit_bull_terrier boxer chihuahua miniature_pinscher staffordshire_bull_terrier

Cluster 4 (5 breeds)

basset_hound beagle english_cocker_spaniel english_setter german_shorthaired

Cluster 5 (6 breeds)

havanese japanese_chin pug scottish_terrier wheaten_terrier yorkshire_terrier

Key Insights

Important findings and recommendations

Dataset Characteristics

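✓ Well-Balanced Dataset: Classes are fairly balanced, with an imbalance ratio of 1.1x (200 vs. 184 images per class), well below 2.0
Most Similar Breeds: Birman ↔ Ragdoll (similarity: 0.964)
Classification Challenge: Fine-grained classification across 37 breeds, where the ten closest pairs all score above 0.92 in similarity, calls for deep learning models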

Recommendations

Model Selection:
  • Transfer Learning (ResNet, EfficientNet)
  • Fine-tune on Pet dataset
  • Data augmentation (rotation, flip, color jitter)
Expected Performance:
  • ResNet50: ~85-90% accuracy
  • EfficientNet: ~90-93% accuracy