20 Newsgroups Text Classification Project

Text Classification for News Articles

This project classifies news articles into topics using a range of data science techniques. The dataset spans 20 article categories, and the objective is to assign each article to its correct category. The project combines preprocessing, feature engineering, and machine learning models to improve classification accuracy.
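The overall approach (text preprocessing, feature extraction, then a classifier) can be sketched with a scikit-learn pipeline. This is a minimal illustration, not the notebooks' actual code: the four-document corpus is hypothetical, and TF-IDF with multinomial Naive Bayes is one common choice assumed here for brevity.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus standing in for real 20 Newsgroups posts.
train_texts = [
    "the rocket launch reached orbit",
    "nasa plans a new space telescope",
    "the goalie saved the penalty in the hockey game",
    "the team won the hockey playoff",
]
train_labels = ["sci.space", "sci.space", "rec.sport.hockey", "rec.sport.hockey"]

# TF-IDF features (English stop words removed) feeding a Naive Bayes classifier.
model = make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB())
model.fit(train_texts, train_labels)

predictions = model.predict(["a telescope in orbit", "overtime hockey game"])
print(predictions)
```

Wrapping the vectorizer and classifier in one pipeline ensures the same preprocessing is applied at training and prediction time.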

Project Structure

Notebooks:
- 01_Looking_into_data.ipynb: Initial exploration of the dataset.
- 02_baseline.ipynb: Baseline models built with several classifiers.
- 03_preprocessing.ipynb: Feature engineering and preprocessing steps.
- 04_feature_extraction.ipynb: Analysis of discriminative features.
- 05_Grid_search_VC.ipynb: Grid search for model tuning.
- 06_GD.ipynb: Further exploration, gradient descent, and additional feature engineering.
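A baseline of the kind built in 02_baseline.ipynb can be sketched by cross-validating several classifiers on identical TF-IDF features. The specific classifiers, the 3-fold setup, and the toy corpus below are assumptions for illustration; the notebook may use different models and data splits.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus: two newsgroup-style topics, six posts each.
texts = [
    "rocket launch to orbit", "nasa space telescope", "shuttle mission in orbit",
    "mars rover lands", "satellite enters orbit", "astronaut space walk",
    "hockey playoff game", "goalie saves penalty", "team wins hockey final",
    "puck crosses the line", "overtime hockey goal", "coach benches the goalie",
]
labels = ["sci.space"] * 6 + ["rec.sport.hockey"] * 6

# Candidate baseline classifiers, each preceded by the same TF-IDF step.
candidates = {
    "naive_bayes": MultinomialNB(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "linear_svm": LinearSVC(),
}

baseline_scores = {}
for name, clf in candidates.items():
    pipeline = make_pipeline(TfidfVectorizer(), clf)
    # 3-fold stratified cross-validation; the mean accuracy is the baseline score.
    scores = cross_val_score(pipeline, texts, labels, cv=3, scoring="accuracy")
    baseline_scores[name] = scores.mean()

for name, score in sorted(baseline_scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.2f}")
```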
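A typical cleaning step of the kind covered in 03_preprocessing.ipynb is sketched below. The rules (lowercasing, dropping e-mail addresses, digits, punctuation, stop words, and tokens of two characters or fewer) are assumptions, not the notebook's documented steps. scikit-learn's built-in English stop-word list stands in for NLTK's here only to avoid a corpus download.

```python
import re

from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

# Assumed cleaning rules; the project's real preprocessing may differ.
EMAIL_RE = re.compile(r"\S+@\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def clean_text(text: str) -> str:
    """Lowercase, remove e-mail addresses, non-letters, stop words and short tokens."""
    text = text.lower()
    text = EMAIL_RE.sub(" ", text)        # newsgroup headers often contain addresses
    text = NON_LETTER_RE.sub(" ", text)   # drop digits and punctuation
    tokens = [
        tok for tok in text.split()
        if tok not in ENGLISH_STOP_WORDS and len(tok) > 2
    ]
    return " ".join(tokens)


print(clean_text("From: jdoe@example.com -- The Shuttle launched at 9:30 AM!"))
```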
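Model tuning as in 05_Grid_search_VC.ipynb can be sketched with GridSearchCV over a pipeline, which lets vectorizer and classifier settings be searched together. The notebook's actual search space is not documented in this README; the parameter grid and toy corpus below are illustrative assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus: two newsgroup-style topics, six posts each.
texts = [
    "rocket launch to orbit", "nasa space telescope", "shuttle mission in orbit",
    "mars rover lands", "satellite enters orbit", "astronaut space walk",
    "hockey playoff game", "goalie saves penalty", "team wins hockey final",
    "puck crosses the line", "overtime hockey goal", "coach benches the goalie",
]
labels = ["sci.space"] * 6 + ["rec.sport.hockey"] * 6

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# "step__param" names address parameters of individual pipeline steps.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "tfidf__sublinear_tf": [False, True],
    "clf__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(texts, labels)
print(search.best_params_, round(search.best_score_, 2))
```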
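For the gradient descent work in 06_GD.ipynb, one option is scikit-learn's SGDClassifier, which trains a linear model with stochastic gradient descent. Whether the notebook uses this class is an assumption; the hinge loss (a linear SVM objective), hyperparameters, and toy corpus below are illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus: two newsgroup-style topics, six posts each.
texts = [
    "rocket launch to orbit", "nasa space telescope", "shuttle mission in orbit",
    "mars rover lands", "satellite enters orbit", "astronaut space walk",
    "hockey playoff game", "goalie saves penalty", "team wins hockey final",
    "puck crosses the line", "overtime hockey goal", "coach benches the goalie",
]
labels = ["sci.space"] * 6 + ["rec.sport.hockey"] * 6

sgd_model = make_pipeline(
    TfidfVectorizer(sublinear_tf=True),
    # Hinge loss with an L2 penalty of strength alpha, optimized by SGD.
    # tol=None runs exactly max_iter epochs without early stopping.
    SGDClassifier(loss="hinge", alpha=1e-4, max_iter=50, tol=None, random_state=42),
)
sgd_model.fit(texts, labels)

preds = sgd_model.predict(["orbit telescope", "hockey goalie"])
print(preds)
```

SGD scales well to large sparse text matrices because each update touches only the features present in a sample; random_state fixes the shuffling order so runs are reproducible.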

Instructions for Use

Clone the Repository:

git clone https://github.com/anush-data-portfolio/Classification-20NewsGroup
cd Classification-20NewsGroup

Run the Notebooks:
- Each notebook is self-contained and handles the installation of necessary libraries. Execute the notebooks in the following order:
  1. 01_Looking_into_data.ipynb
  2. 02_baseline.ipynb
  3. 03_preprocessing.ipynb
  4. 04_feature_extraction.ipynb
  5. 05_Grid_search_VC.ipynb
  6. 06_GD.ipynb

Follow Notebook Instructions:
- Each notebook contains explanations, code comments, and instructions. Follow the steps in each notebook to understand how the project progresses and what it produces.

Internet Connection:
- A stable internet connection is required, because the notebooks install libraries from online package repositories.

Libraries Used

The project relies on the following Python libraries:

scikit-learn (sklearn)
pandas
numpy
nltk
matplotlib

Important Notes

Recreating Results:
- Users can rerun the notebooks to recreate the results. The notebooks are designed to handle the necessary library installations.

Repository Contents

- 01_Looking_into_data.ipynb to 06_GD.ipynb: the six project notebooks listed under Project Structure
- Pipfile, Pipfile.lock: Pipenv dependency files
- README.md
- cleaned_data.csv.gz