
An offline and web-based plagiarism + AI content detector for research writing. Built using Python, Streamlit, BERT, and GPT-2. Supports PDF & DOCX files.


tripathy-ji/PlagiarismStudio


🧠 Plagiarism & AI Text Detector (Offline + Streamlit Web App)

A tool for detecting plagiarism and AI-generated text in research writing. It combines cosine similarity, BERT semantic similarity, and GPT-2 perplexity scoring, and can be used either from the local command line or through a browser-based interface built with Streamlit.


🚀 Features

  • 📂 Supports .docx and .pdf uploads
  • 📏 Cosine Similarity – for basic surface-level plagiarism detection
  • 🧠 BERT Semantic Similarity – detects paraphrased or reworded content
  • 🤖 AI Text Detection – estimates likelihood of AI-generated text using GPT-2 perplexity scoring
  • 🌐 Streamlit Web UI – intuitive browser-based interface (no coding required)
  • 🔐 Runs offline in a virtual environment (safe and private)
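To make the "surface-level" cosine check concrete, here is a minimal, dependency-free sketch. It is an illustration only: the function names `tokenize` and `cosine_similarity` are hypothetical, and the project's own `check_cosine.py` may well use scikit-learn (listed in the requirements) with a different weighting such as TF-IDF. The sketch compares raw word counts, which is why it catches copied wording but not paraphrases.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between the two texts' word-count vectors."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Only words that occur in both texts contribute to the dot product.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare against
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    draft = "Neural networks learn representations from data."
    source = "Neural networks learn useful representations from raw data."
    score = cosine_similarity(draft, source)
    print(f"Cosine Similarity: {score * 100:.2f}%")
```

Because the score depends only on shared vocabulary, two passages that say the same thing in different words score low here; that gap is what the BERT-based check is meant to cover.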

🗂️ Project Structure

PlagiarismStudio/
├── extract_text.py
├── check_cosine.py
├── check_bert.py
├── detect_ai.py
├── streamlit_app.py
├── requirements.txt
├── README.md
├── test1.docx
├── source1.docx
└── venv/

💻 How to Use

🔧 Setup

  1. Clone the repository
git clone https://github.com/tripathy-ji/PlagiarismStudio.git
cd PlagiarismStudio
  2. Create and activate a virtual environment (Windows shown; on macOS/Linux, activate with source venv/bin/activate)
python -m venv venv
venv\Scripts\activate
  3. Install all dependencies
pip install -r requirements.txt

▶️ Run the Web App

streamlit run streamlit_app.py

Then open the link in your browser (usually http://localhost:8501). Upload your research draft and a reference paper to get results.


🧪 Module Overview

| File | Description |
|------|-------------|
| extract_text.py | Extracts raw text from PDF and DOCX files |
| check_cosine.py | Calculates the cosine similarity of two texts |
| check_bert.py | Uses Sentence-BERT for deep similarity detection |
| detect_ai.py | Estimates AI generation using GPT-2 perplexity |
| streamlit_app.py | Full browser UI for file upload and report |
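As an illustration of the extraction step, the sketch below dispatches on file extension to `pdfminer.six` or `python-docx`, the two parsers named in the requirements. The names `extract_pdf`, `extract_docx`, `extract_any`, and `EXTRACTORS` are hypothetical and not taken from `extract_text.py`, which may be organized differently.

```python
from pathlib import Path


def extract_pdf(path: str) -> str:
    # Imported lazily so DOCX extraction works without pdfminer.six installed.
    from pdfminer.high_level import extract_text
    return extract_text(path)


def extract_docx(path: str) -> str:
    # The python-docx package is imported under the name "docx".
    from docx import Document
    # Joins body paragraphs only; text inside tables is not included.
    return "\n".join(p.text for p in Document(path).paragraphs)


# Maps a lowercase file extension to the function that can read it.
EXTRACTORS = {".pdf": extract_pdf, ".docx": extract_docx}


def extract_any(path: str) -> str:
    """Return the raw text of a PDF or DOCX file, chosen by extension."""
    suffix = Path(path).suffix.lower()
    try:
        extractor = EXTRACTORS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}") from None
    return extractor(path)
```

With the dependencies installed, `extract_any("test1.docx")` would return the text of the sample document shipped in the repository; any other extension raises `ValueError` before a parser is loaded.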

📈 Output Example

  • Cosine Similarity: 85.23%
  • BERT Similarity: 91.78%
  • Perplexity Score: 472.10 → ✅ Likely human-written
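The perplexity score can be read as follows: perplexity is the exponential of the average negative log-likelihood per token, so text that GPT-2 finds predictable scores low and surprising text scores high. The sketch below shows that arithmetic plus one plausible way to obtain the score with `transformers`. Everything here is an assumption for illustration: the function names are hypothetical, the `threshold` of 100.0 is an arbitrary placeholder rather than the cutoff used by `detect_ai.py`, and the label wording only loosely mirrors the example output.

```python
import math
from typing import Sequence


def perplexity_from_logprobs(token_logprobs: Sequence[float]) -> float:
    """exp of the mean negative natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def gpt2_perplexity(text: str, model_name: str = "gpt2") -> float:
    """Score text with GPT-2 (requires torch and transformers)."""
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained(model_name)
    model = GPT2LMHeadModel.from_pretrained(model_name)
    model.eval()
    # GPT-2 accepts at most 1024 tokens, so longer texts are truncated.
    ids = tokenizer(
        text, return_tensors="pt", truncation=True, max_length=1024
    ).input_ids
    with torch.no_grad():
        # With labels supplied, the model returns the mean cross-entropy loss.
        loss = model(ids, labels=ids).loss
    return math.exp(loss.item())


def label(perplexity: float, threshold: float = 100.0) -> str:
    # Hypothetical cutoff: higher perplexity means less predictable text.
    if perplexity >= threshold:
        return "Likely human-written"
    return "Possibly AI-generated"


if __name__ == "__main__":
    # Four tokens, each predicted with probability 0.5.
    print(perplexity_from_logprobs([math.log(0.5)] * 4))
    print(label(472.10))
```

A single threshold on perplexity is a rough heuristic: short, formulaic, or heavily edited human text can also score low, so the label is best treated as a hint rather than a verdict. Note also that `from_pretrained` fetches the model weights the first time it runs unless they are already cached locally.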

📦 Requirements

  • Python 3.8+
  • streamlit
  • pdfminer.six
  • python-docx
  • scikit-learn
  • sentence-transformers
  • transformers
  • torch

(Or use requirements.txt)


📄 License

MIT License – Free to use, share, and modify for research or educational use.


🙌 Acknowledgements
