Fast, interpretable baselines for text classification
Converts text into numerical vectors by weighting words based on frequency and uniqueness. Common words (e.g., "the", "is") get low weights, while distinctive words get high weights.
Formula: TF-IDF(t,d) = TF(t,d) × IDF(t) = (term freq in doc) × log(total docs / docs with term)
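The formula above can be computed by hand in a few lines. This is a minimal sketch using raw counts and the natural log; library implementations such as scikit-learn's `TfidfVectorizer` apply smoothing and normalization, so their numbers differ slightly:

```python
import math

# Toy corpus: three tokenized documents.
docs = [["the", "cat", "sat"],
        ["the", "dog", "sat"],
        ["the", "cat", "ran"]]

def tf_idf(term, doc, docs):
    tf = doc.count(term)               # term frequency in this doc
    df = sum(term in d for d in docs)  # number of docs containing the term
    return tf * math.log(len(docs) / df)

# "the" appears in every doc -> IDF = log(3/3) = 0, so its weight is 0.
print(tf_idf("the", docs[0], docs))
# "cat" appears in 2 of 3 docs -> weight = log(3/2), a distinctive word.
print(tf_idf("cat", docs[0], docs))
```

This makes the intuition concrete: ubiquitous words are zeroed out, while words concentrated in few documents keep a positive weight.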
Probabilistic classifier based on Bayes' theorem. Fast and efficient; works well even on small datasets.
Linear model with sigmoid activation. Simple, interpretable, good baseline.
Finds the optimal separating hyperplane between classes. Powerful for high-dimensional data.
Ensemble of decision trees. Robust, handles non-linearity, reduces overfitting.
Multi-layer perceptron with hidden layers. Can learn complex patterns.
Gradient boosting algorithm. Often wins ML competitions. High performance.
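The six classifiers above all share the same fit/predict workflow on TF-IDF features. A minimal sketch with scikit-learn, using a tiny invented corpus; `GradientBoostingClassifier` stands in for XGBoost here so the example needs no extra dependency (the actual notebook may use the `xgboost` package):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier

# Toy data for illustration only (1 = positive, 0 = negative).
texts = ["great movie", "terrible film", "loved it", "hated it",
         "wonderful acting", "awful plot", "great acting", "awful movie"]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

# TF-IDF feature extraction, then the same API for every model.
X = TfidfVectorizer().fit_transform(texts)

classifiers = {
    "Naive Bayes": MultinomialNB(),
    "Logistic Regression": LogisticRegression(),
    "SVM": LinearSVC(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Neural Network": MLPClassifier(max_iter=500, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

for name, clf in classifiers.items():
    clf.fit(X, labels)
    print(name, clf.score(X, labels))
```

In practice you would split into train/test sets before scoring; the point here is that swapping classifiers is a one-line change once features are extracted.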
Ready-to-run Python code with all 6 classifiers. Includes dataset download, TF-IDF extraction, training, and evaluation.
Open this notebook in Google Colab for interactive execution with free GPU access.
All dependencies pre-installed
Modify code and see results instantly
Download plots and models
Note: The notebook will open in a new tab. You may need to sign in with your Google account.