anush-data-portfolio/Classification-20NewsGroup

20 Newsgroups Text Classification Project

Text Classification for News Articles

This project classifies news articles by topic. The dataset comprises 20 article categories, and the objective is to assign each article to its correct category. The project combines machine learning models, feature engineering, and preprocessing strategies to improve classification accuracy.
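
As a rough illustration of the kind of workflow described above, the following minimal sketch trains a text classifier with scikit-learn. The toy corpus, the two category labels, and the choice of TF-IDF features with a Naive Bayes model are assumptions made for illustration; the notebooks may use different data loading, features, and classifiers.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy stand-in corpus (hypothetical); the project itself works with 20 categories.
train_docs = [
    "the rocket launch reached orbit around the moon",
    "nasa plans a new space shuttle mission",
    "the goalie stopped every shot in the hockey game",
    "the team won the hockey playoffs on the ice",
]
train_labels = ["sci.space", "sci.space", "rec.sport.hockey", "rec.sport.hockey"]

# TF-IDF features feeding a Naive Bayes classifier (an assumed baseline choice).
baseline = make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB())
baseline.fit(train_docs, train_labels)

# Classify two unseen snippets.
predictions = baseline.predict(
    ["astronauts orbit the moon in a shuttle", "a hockey game on the ice"]
)
print(list(predictions))
```

Wrapping the vectorizer and the classifier in one pipeline keeps the feature extraction fitted on training data only, which matters once cross-validation is involved.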

Project Structure

  • Notebooks:
    • 01_Looking_into_data.ipynb: Initial exploration of the dataset.
    • 02_baseline.ipynb: Creation of the baseline with various classifiers.
    • 03_preprocessing.ipynb: Feature engineering and preprocessing steps.
    • 04_feature_extraction.ipynb: Analysis of discriminative features.
    • 05_Grid_search_VC.ipynb: Grid search for model tuning.
    • 06_GD.ipynb: Further exploration, gradient descent, and additional feature engineering.
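
The grid search step listed for 05_Grid_search_VC.ipynb can be sketched roughly as follows. This is not the notebook's code: the toy corpus, the pipeline step names (`tfidf`, `clf`), the classifier, and the parameter grid are all hypothetical, chosen only to show how scikit-learn's `GridSearchCV` tunes a text pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy stand-in corpus (hypothetical), four documents per class.
docs = [
    "the rocket reached orbit",
    "nasa launched a space shuttle",
    "the telescope observed a distant planet",
    "astronauts walked on the moon",
    "the goalie stopped the puck",
    "the team won the hockey game",
    "a fast skater scored on the ice",
    "the playoffs start next week for the hockey team",
]
labels = ["space"] * 4 + ["hockey"] * 4

# Named steps let the grid address parameters as "<step>__<parameter>".
pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("clf", MultinomialNB())])

# Assumed grid: unigrams vs. unigrams+bigrams, and two smoothing strengths.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__alpha": [0.1, 1.0],
}

# cv=2 only because the toy corpus is tiny; real data would allow more folds.
search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
search.fit(docs, labels)

print(search.best_params_)
print(search.best_score_)
```

Tuning the vectorizer and the classifier together in one grid means each candidate is scored on features refitted inside every fold.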

Instructions for Use

  1. Clone the Repository:

    git clone https://github.com/anush-data-portfolio/Classification-20NewsGroup
    cd Classification-20NewsGroup
  2. Run the Notebooks:

    • Each notebook is self-contained and handles the installation of necessary libraries. Execute the notebooks in the following order:
      1. 01_Looking_into_data.ipynb
      2. 02_baseline.ipynb
      3. 03_preprocessing.ipynb
      4. 04_feature_extraction.ipynb
      5. 05_Grid_search_VC.ipynb
      6. 06_GD.ipynb
  3. Follow Notebook Instructions:

    • Each notebook provides detailed explanations, code comments, and instructions. Follow the steps outlined in each notebook to understand the project's progression and outcomes.
  4. Internet Connection:

    • Ensure a stable internet connection, because the notebooks install libraries from online repositories.
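
For readers who prefer running the notebooks without opening them, the execution order above could be scripted along these lines. This is a sketch that assumes Jupyter and nbconvert are installed and that the script runs from the repository root; `build_command` and `run_all` are hypothetical helper names, not part of the repository.

```python
import subprocess

# Notebook names as listed in the instructions, in execution order.
NOTEBOOKS = [
    "01_Looking_into_data.ipynb",
    "02_baseline.ipynb",
    "03_preprocessing.ipynb",
    "04_feature_extraction.ipynb",
    "05_Grid_search_VC.ipynb",
    "06_GD.ipynb",
]


def build_command(notebook):
    """Return the nbconvert command that executes one notebook in place."""
    return ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", notebook]


def run_all(notebooks=NOTEBOOKS):
    """Execute each notebook in order, stopping at the first failure."""
    for notebook in notebooks:
        subprocess.run(build_command(notebook), check=True)


# Preview the commands without running anything; call run_all() to execute them.
for notebook in NOTEBOOKS:
    print(" ".join(build_command(notebook)))
```

Because `check=True` raises on a non-zero exit code, a failing notebook stops the sequence before later notebooks run.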

Libraries Used

The project relies on the following Python libraries:

  • scikit-learn (sklearn)
  • pandas
  • numpy
  • nltk
  • matplotlib
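
A quick way to see which of these libraries are already available is sketched below. It relies only on the standard library; the `REQUIRED` mapping and the `missing_packages` helper are hypothetical names introduced for this example. Note that scikit-learn is installed under the name `scikit-learn` but imported as `sklearn`.

```python
from importlib.util import find_spec

# Install name -> import name for the libraries listed above.
REQUIRED = {
    "scikit-learn": "sklearn",
    "pandas": "pandas",
    "numpy": "numpy",
    "nltk": "nltk",
    "matplotlib": "matplotlib",
}


def missing_packages(required=REQUIRED):
    """Return the install names of packages whose modules cannot be found."""
    return [name for name, module in required.items() if find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed libraries are available.")
```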

Important Notes

  • Recreating Results:
    • Rerun the notebooks in the order listed under Instructions for Use to recreate the results. Each notebook installs the libraries it needs.
