A professional-grade tool for detecting plagiarism and AI-generated text using NLP, machine learning, and deep learning techniques. It supports both a local command-line workflow and a browser-based interface built with Streamlit.
- 📂 Supports `.docx` and `.pdf` uploads
- 📏 Cosine Similarity – for basic surface-level plagiarism detection
- 🧠 BERT Semantic Similarity – detects paraphrased or reworded content
- 🤖 AI Text Detection – estimates the likelihood that text is AI-generated, using GPT-2 perplexity scoring
- 🌐 Streamlit Web UI – intuitive browser-based interface (no coding required)
- 🔐 Runs locally in a virtual environment, so your documents stay on your machine (the BERT and GPT-2 models must be downloaded once before offline use)
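
The two similarity scores measure different things: cosine similarity on word counts only rewards shared vocabulary, while Sentence-BERT compares dense sentence embeddings, so a paraphrase with different wording can still score high. The sketch below illustrates both under stated assumptions; it is not the code in `check_cosine.py` or `check_bert.py`. It uses raw whitespace-split word counts (the project may use a scikit-learn vectorizer such as TF-IDF instead), the embedding model name `all-MiniLM-L6-v2` is an assumption, and the helper names `cosine`, `surface_similarity`, and `semantic_similarity` are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List, Optional, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def surface_similarity(text_a: str, text_b: str) -> float:
    """Bag-of-words cosine: only identical (lower-cased) words contribute."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    # Every word seen in either text becomes one dimension of the count vectors.
    vocab = sorted(counts_a.keys() | counts_b.keys())
    return cosine([counts_a[w] for w in vocab], [counts_b[w] for w in vocab])


def semantic_similarity(
    text_a: str,
    text_b: str,
    encode: Optional[Callable[[List[str]], Sequence[Sequence[float]]]] = None,
) -> float:
    """Embedding cosine: paraphrases can score high even with little word overlap."""
    if encode is None:
        # Imported lazily so the surface-level check works without sentence-transformers.
        from sentence_transformers import SentenceTransformer

        # Assumed model name; the README does not say which Sentence-BERT model is used.
        encode = SentenceTransformer("all-MiniLM-L6-v2").encode
    emb_a, emb_b = encode([text_a, text_b])
    return cosine(emb_a, emb_b)
```

Multiplying either score by 100 gives a percentage like those reported by the app. The optional `encode` parameter is a design choice for the sketch: it lets the embedding step be swapped for another model or stubbed out in tests without touching the cosine logic.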

Project structure:

```
PlagiarismStudio/
├── extract_text.py
├── check_cosine.py
├── check_bert.py
├── detect_ai.py
├── streamlit_app.py
├── requirements.txt
├── README.md
├── test1.docx
├── source1.docx
└── venv/
```

To install and run locally:

- Clone the repository
  ```bash
  git clone https://github.com/your-username/PlagiarismStudio.git
  cd PlagiarismStudio
  ```
- Create and activate a virtual environment (Windows)
  ```
  python -m venv venv
  venv\Scripts\activate
  ```
  On macOS or Linux, activate it with `source venv/bin/activate` instead.
- Install all dependencies
  ```bash
  pip install -r requirements.txt
  ```
- Launch the web app
  ```bash
  streamlit run streamlit_app.py
  ```

Then open the link Streamlit prints in your terminal (usually http://localhost:8501), upload your research draft and a reference paper, and review the results.

| File | Description |
|---|---|
| `extract_text.py` | Extracts raw text from PDF and DOCX files |
| `check_cosine.py` | Calculates the cosine similarity of two texts |
| `check_bert.py` | Uses Sentence-BERT for deep (semantic) similarity detection |
| `detect_ai.py` | Estimates the likelihood of AI generation using GPT-2 perplexity |
| `streamlit_app.py` | Full browser UI for file upload and reporting |

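
A minimal sketch of what a text-extraction helper like `extract_text.py` could look like, assuming it dispatches on the file extension and uses the two extraction libraries listed in this README (pdfminer.six and python-docx). The function name `extract_text` and its signature are assumptions, not the module's actual API.

```python
from pathlib import Path


def extract_text(path: str) -> str:
    """Return the raw text of a .pdf or .docx file (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        # pdfminer.six's high-level helper returns the text of all pages as one string.
        from pdfminer.high_level import extract_text as pdf_extract_text

        return pdf_extract_text(path)
    if suffix == ".docx":
        # python-docx exposes the document body as a list of paragraphs.
        from docx import Document

        return "\n".join(p.text for p in Document(path).paragraphs)
    raise ValueError(f"Unsupported file type: {suffix!r} (expected .pdf or .docx)")
```

The imports sit inside each branch so a missing library only fails for the format that needs it. Note that python-docx's `Document.paragraphs` covers top-level body paragraphs only, so text inside tables, headers, and footers is skipped by this sketch.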

Sample output:

- Cosine Similarity: 85.23%
- BERT Similarity: 91.78%
- Perplexity Score: 472.10 → ✅ Likely human-written
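
Perplexity is the exponential of the average negative log-probability a language model assigns to each token given the tokens before it, $\mathrm{PPL} = \exp\left(-\frac{1}{N}\sum_{i=1}^{N}\log p(t_i \mid t_{<i})\right)$. Lower values mean the text is more predictable to GPT-2, which perplexity-based detectors treat as weak evidence of machine generation; that is why a high score like the one above maps to "likely human-written". The sketch below computes it with Hugging Face `transformers`; it is an illustration, not the logic in `detect_ai.py`. The function names, the `threshold` parameter, and the "Possibly AI-generated" label are hypothetical, and this README does not state which cut-off the tool uses.

```python
import math
from typing import Sequence


def perplexity_from_logprobs(token_logprobs: Sequence[float]) -> float:
    """exp of the mean negative natural-log probability per scored token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def gpt2_perplexity(text: str) -> float:
    """GPT-2 perplexity of `text`; downloads the 'gpt2' weights on first use."""
    import torch  # lazy imports: the heavy dependencies are only needed here
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    # Loaded on every call for simplicity; a real app would cache the model.
    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
    # GPT-2's context window is 1024 tokens, so longer text is truncated here.
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        # With labels=input_ids the model shifts labels internally and returns the
        # mean cross-entropy of each token given its prefix (the first token is not scored).
        loss = model(enc.input_ids, labels=enc.input_ids).loss
    return math.exp(loss.item())


def verdict(perplexity: float, threshold: float) -> str:
    """Map a perplexity to a label; `threshold` is a hypothetical, caller-chosen cut-off."""
    # Higher perplexity = less predictable to GPT-2 = treated as more human-like.
    return "✅ Likely human-written" if perplexity >= threshold else "⚠️ Possibly AI-generated"
```

Perplexity alone is a noisy signal: short passages and formulaic academic prose can land on either side of any fixed threshold, so the verdict is best read as an estimate rather than proof.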

Requirements:

- Python 3.8+
- streamlit
- pdfminer.six
- python-docx
- scikit-learn
- sentence-transformers
- transformers
- torch

(Or install everything at once with `pip install -r requirements.txt`.)

MIT License – free to use, share, and modify, whether for research, education, or other purposes.