dpb24/fake-news-detector

📰 Fake News Detector: Binary Classification Model

Libraries: scikit-learn, XGBoost, matplotlib, pandas, numpy
Dataset: ISOT Fake News Detection Dataset

In this project we use the 🐍 Python libraries scikit-learn and XGBoost to build a machine learning model that classifies news articles as fake or real. We combine classical machine learning techniques with engineered textual features to improve model generalisability and performance.

🧠 Approach

  • Text vectorisation: Bag of Words (BoW)
  • Feature engineering: % of special characters & % of capitalised characters
  • Baseline model: DecisionTreeClassifier with GridSearchCV
  • Ensemble model: XGBClassifier with RandomizedSearchCV
  • Robustness: Removed dataset-specific artefacts (e.g. "reuters") from the BoW vocabulary to improve generalisability
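
The feature-engineering and artefact-removal steps above can be sketched with the standard library alone. This is a minimal illustration, not the notebook's code. The exact feature definitions are assumptions: here a "special" character is anything that is neither alphanumeric nor whitespace, and both percentages are taken over all characters in the text. The function names, the `headline_special` key, and the sample strings are hypothetical. Only `headline_capitalised` and the "reuters" artefact come from this README.

```python
import re
from collections import Counter

# Assumption: tokens treated as dataset-specific artefacts.
# Only "reuters" is named in the README; extend the set as needed.
ARTEFACT_TOKENS = {"reuters"}


def pct_special(text: str) -> float:
    """Percentage of characters that are neither alphanumeric nor whitespace."""
    if not text:
        return 0.0
    special = sum(1 for ch in text if not ch.isalnum() and not ch.isspace())
    return 100.0 * special / len(text)


def pct_capitalised(text: str) -> float:
    """Percentage of characters that are uppercase letters."""
    if not text:
        return 0.0
    upper = sum(1 for ch in text if ch.isupper())
    return 100.0 * upper / len(text)


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts with artefact tokens dropped."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in ARTEFACT_TOKENS)


# Hypothetical sample article, for illustration only.
headline = "BREAKING: Senate votes!"
body = "WASHINGTON (Reuters) - The Senate voted on the bill, and the bill passed."

features = {
    "headline_capitalised": pct_capitalised(headline),
    "headline_special": pct_special(headline),  # hypothetical feature name
    "bow": bag_of_words(body),
}
```

With scikit-learn, a similar filter could probably be applied by passing the artefact tokens to the `stop_words` parameter of `CountVectorizer`. Whether the notebook does it that way is not stated here.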

✅ Results

  • 🤖 XGBoost ensemble achieved ~99.8% accuracy, precision, recall, and F1 score
  • Top feature: headline_capitalised (engineered)
  • Fun insight: the second most important vectorised word for classification was "Trump" 🇺🇸
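
For readers unfamiliar with the four metrics reported above, the sketch below shows how each is derived from confusion-matrix counts. It assumes "fake" is the positive class, which this README does not state. The counts are made up for illustration and are not the project's results, and `classification_metrics` is a hypothetical helper.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    tp/fp/tn/fn are true/false positives/negatives for the positive class
    (assumed here to be "fake").
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Illustrative counts only, not taken from the project.
metrics = classification_metrics(tp=90, fp=10, tn=80, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

When all four metrics sit at roughly the same value, as reported here, false positives and false negatives are both rare and about equally frequent.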

🔭 Future Work

  • Test on more diverse, real-world datasets
  • Experiment with advanced text vectorisation (e.g. word embeddings, transformer models)
  • Compare with alternative classifiers (e.g. Support Vector Machines)

📖 Jupyter Notebook: GitHub | Colab | Kaggle