This project implements federated fine-tuning of GPT-2 on the GLUE MRPC dataset, combining:
- LoRA (Low-Rank Adaptation) for parameter-efficient fine-tuning, and
- FedAvg (Federated Averaging) to aggregate client-side adapters while preserving data privacy.

Key features:
- Federated learning with HuggingFace Transformers and PEFT (see the LoRA sketch below).
- LoRA adapters for efficient fine-tuning of GPT-2.
- Evaluation on the GLUE MRPC task (paraphrase detection) with Accuracy and F1.
- Privacy-preserving NLP training across distributed clients.
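The core idea is to train only small LoRA adapters on top of a frozen GPT-2 backbone. The following is a minimal sketch of that setup, not the exact notebook code; the model name `gpt2` matches the project, but the LoRA hyperparameters (`r`, `lora_alpha`, `lora_dropout`, `target_modules`) are illustrative assumptions and may differ from what the notebook uses.

```python
# Minimal sketch: attach LoRA adapters to GPT-2 for MRPC sequence classification
# using HuggingFace Transformers + PEFT. Hyperparameters below are assumptions.
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=2)
model.config.pad_token_id = tokenizer.pad_token_id

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,                       # assumed rank
    lora_alpha=16,             # assumed scaling
    lora_dropout=0.1,
    target_modules=["c_attn"],  # GPT-2's fused attention projection
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the low-rank adapters are trainable
```

Because only the adapter weights are trainable, each federated client ships a few megabytes of parameters per round instead of the full GPT-2 checkpoint.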
```
federated-gpt2-lora-mrpc/
├── LICENSE
├── README.md
├── NLP_and_LLMs_Sarah_Altalhi.ipynb   # Main notebook with experiments
├── configs/         # YAML configs for experiments
├── data/            # Dataset cache (ignored by git)
├── experiments/     # Logs, metrics, figures
├── models/          # Saved LoRA adapters/checkpoints
├── notebooks/       # Optional extra notebooks
├── scripts/         # Setup/run helper scripts
├── src/
│   ├── federated/   # FedAvg client/server logic
│   ├── models/      # GPT-2 + LoRA model wrappers
│   └── utils/       # Metrics, data, seeding utils
└── tests/           # Optional unit tests
```
git clone https://github.com/Sarah-Altalhi/federated-gpt2-lora-mrpc.git cd federated-gpt2-lora-mrpc
python -m venv .venv .venv\Scripts\activate # On Windows
source .venv/bin/activate # On Linux/Mac
pip install --upgrade pip pip install -r requirements.txt
Open the notebook in Jupyter or VSCode:
```bash
jupyter notebook NLP_and_LLMs_Sarah_Altalhi.ipynb
```
The notebook contains:

- Data loading (GLUE MRPC).
- GPT-2 fine-tuning with LoRA adapters.
- Federated averaging simulation across distributed clients (see the FedAvg sketch after this list).
- Evaluation metrics (Accuracy / F1).
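The federated averaging step boils down to a weighted average of the clients' LoRA adapter weights. The helper below is an assumed sketch of that aggregation, not the notebook's actual implementation; the `fedavg` function name and the commented usage with `clients` are hypothetical.

```python
# Sketch of FedAvg over LoRA adapter state dicts (assumed, not the notebook code).
def fedavg(client_state_dicts, client_sizes):
    """Weighted average of per-client state dicts, weighted by local example count."""
    total = float(sum(client_sizes))
    averaged = {}
    for key in client_state_dicts[0]:
        averaged[key] = sum(
            sd[key] * (n / total)
            for sd, n in zip(client_state_dicts, client_sizes)
        )
    return averaged

# Hypothetical usage: each round, clients fine-tune their adapters locally and
# send only those weights; the server averages them and broadcasts the result.
# global_adapters = fedavg([c.lora_state_dict() for c in clients],
#                          [len(c.dataset) for c in clients])
```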
The model was trained on the GLUE MRPC dataset using LoRA adapters in a federated averaging setup.
| Setting | # Clients | Rounds | MRPC Accuracy | MRPC F1 |
|---|---|---|---|---|
| GPT-2 + LoRA (baseline) | 5 | 10 | 0.84 | 0.82 |
| GPT-2 + FedAvg | 10 | 20 | 0.86 | 0.84 |
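Accuracy and F1 are the standard GLUE metrics for MRPC. A minimal sketch of how they could be computed with the `evaluate` library is shown below; the `compute_metrics` helper follows the HuggingFace `Trainer` convention but is an assumption, not necessarily what the notebook does.

```python
# Sketch of Accuracy / F1 computation for MRPC using the `evaluate` library.
import evaluate
import numpy as np

mrpc_metric = evaluate.load("glue", "mrpc")  # reports both accuracy and F1

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return mrpc_metric.compute(predictions=predictions, references=labels)
```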
An example record from the MRPC dataset (label 0 = not a paraphrase):

```json
{
  "sentence1": "Negotiators talked with the boy for more than an hour , and SWAT officers surrounded the classroom , Bragdon said .",
  "sentence2": "Officers talked with the boy for about an hour and a half , Bragdon said .",
  "label": 0,
  "idx": 3149
}
```
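Records like the one above are loaded from the HuggingFace Hub and tokenized as sentence pairs before being split across simulated clients. The sketch below assumes this flow; the `max_length` of 128 and the 5-way client shard are illustrative choices, not values taken from the notebook.

```python
# Sketch of MRPC loading and tokenization (assumed values for max_length and shards).
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

raw = load_dataset("glue", "mrpc")  # train / validation / test splits

def tokenize(batch):
    return tokenizer(
        batch["sentence1"], batch["sentence2"],
        truncation=True, padding="max_length", max_length=128,
    )

encoded = raw.map(tokenize, batched=True)

# Hypothetical client split: shard the training set across 5 simulated clients.
client_datasets = [encoded["train"].shard(num_shards=5, index=i) for i in range(5)]
```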
- Hu et al., 2022. *LoRA: Low-Rank Adaptation of Large Language Models.*
- McMahan et al., 2017. *Communication-Efficient Learning of Deep Networks from Decentralized Data.*
- HuggingFace Transformers and Datasets libraries.