AReaL: A Large-Scale Asynchronous Reinforcement Learning System

AReaL is an open-source, fully asynchronous reinforcement learning training system for large reasoning and agentic models, developed by the AReaL Team at Ant Group. AReaL builds on the open-source project ReaLHF, and we are fully committed to open-source principles: we provide the training details, data, and infrastructure required to reproduce our results, along with the models themselves. AReaL aims to help everyone build their own AI agents easily and affordably. Our team loves milk tea because it's delicious, customizable, and affordable. We hope you enjoy our project just as you enjoy real-world milk tea (cheers).

AReaL Highlights

⚡ Flexibility: Seamless customization for multi-turn agentic rollout workflows within a single file, and smooth integration with other agentic tooling frameworks.
🚀 Scalability: Through algorithm-system co-design, AReaL delivers stable fully asynchronous RL training with industry-leading speed. AReaL seamlessly adapts to diverse computational environments, scaling from a single node to 1,000+ GPUs.
🔪 Cutting-Edge Performance: AReaL produces state-of-the-art math, coding, and search agents with exceptional capabilities.

📰 News

[2025/08/30] Introducing ASearcher, a state-of-the-art search agent built with AReaL's end-to-end asynchronous RL training. Check out the paper and the open-source repository!

[2025/07/31] (AReaL-lite) We introduce AReaL-lite, a lightweight version of AReaL designed specifically for AI researchers and rapid prototyping. AReaL-lite features an algorithm-first API design that prioritizes ease of use and algorithm development, while natively supporting fully asynchronous agentic RL. With 80% fewer lines of code, AReaL-lite maintains 90% of AReaL's performance and core functionality. Check out our AReaL-lite design documentation and the quickstart guide to begin your journey with AReaL-lite!

📋 Previous Releases

[2025/06/03] (v0.3, boba²) We release boba² (double-boba) for fully asynchronous RL training, which achieves a 2.77× speedup over synchronous systems while delivering comparable or superior training performance. Asynchronous RL also significantly simplifies the setup of multi-turn agentic RL training. Check out our v0.3 overview blog and the research paper.

[2025/03/31] (v0.2, boba) Introducing our milestone release: boba! Please call it A-ReaL-boba! This release features significantly faster training with SGLang support and state-of-the-art 7B and 32B models for mathematical reasoning. Check out our v0.2 technical blog.

[2025/02/24] (v0.1) Our initial release includes reproducible results for 1.5B and 7B Large Reasoning Models (LRMs). Check out our v0.1 technical blog.

📚 Examples

Task	Description	Performance
Math	Mathematical problem solving (SFT, GRPO, or PPO)	TBA
LoRA Math	Math Agent Trained With LoRA	TBA
VLM Math	CLEVR visual counting tasks	TBA
Reasoning	Countdown numbers game with custom rewards	Training Curve
Search Agent	An agent with end-to-end reasoning, search, browsing, and summarization capabilities	ASearcher Repo
Tool-Integrated Reasoning	An agent that can invoke tools during reasoning	TIR Example
RLHF	RLHF for LLM Alignment	RLHF Example

🔧 Support Matrix

🧠 Algorithms

Algorithm	Documentation	Paper	Configuration
GRPO	📖 Docs	📄 Paper	🔗 GSM8K Example
PPO	-	📄 Paper	🔗 GSM8K Example
DAPO	📖 Docs	📄 Paper	🔗 GSM8K Example
LitePPO	📖 Docs	📄 Paper	-
Dr.GRPO	📖 Docs	📄 Paper	-
REINFORCE++	-	📄 Paper	🔗 GSM8K Example
RLOO	📖 Docs	📄 Paper	🔗 GSM8K Example
RLHF Reward Modeling	-	-	🔗 RLHF Example
SFT	-	-	🔗 GSM8K Example

Models

Model Family	Megatron	PyTorch FSDP	Notes
Qwen2/3	✅	✅	-
Qwen3-MoE	✅	✅	-
Qwen2.5-VL	❌	✅	Vision-language model
Gemma 3	❌	✅	Vision-language model
Other Hugging Face LLMs	❌	✅	Compatibility depends on the installed version of `transformers`

Training Backends

Backend	DP	Tensor Parallel	Sequence Parallel within TP	Context Parallel	Pipeline Parallel	Expert Parallel	1D Sequence Packing	LoRA
Megatron	✅ (ZeRO-1)	✅	✅	✅	✅	✅	✅	❌
PyTorch FSDP	✅ (FSDP2)	✅	✅	✅	❌	❌	✅	✅

Inference Backends

Backend	Tensor Parallel	Context Parallel	Pipeline Parallel	Data Parallel Attention	Expert Parallel
vLLM	✅	❓	❓	❓	❓
SGLang	✅	❌	❌	✅	✅

🚀 Getting Started

Our training scripts automatically download the required dataset (openai/gsm8k) and model (Qwen/Qwen2-1.5B-Instruct). To run on a single node:

python3 -m areal.launcher.local \
  examples/math/gsm8k_grpo.py \
  --config examples/math/gsm8k_grpo.yaml

To run on a Ray cluster with 2 nodes and 8 GPUs per node (remember to update paths in the YAML file to point to your shared storage):

python3 -m areal.launcher.ray \
  examples/math/gsm8k_grpo.py \
  --config examples/math/gsm8k_grpo.yaml \
  cluster.n_nodes=2 \
  cluster.n_gpus_per_node=8
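
The trailing `cluster.n_nodes=2` and `cluster.n_gpus_per_node=8` arguments appear to act as dotted overrides on top of the values in the YAML file. The following is a minimal sketch of that general pattern, not AReaL's actual parsing code (which may rely on a configuration library). The helper names `parse_value` and `apply_overrides` and the `base_config` dictionary are hypothetical and exist only for illustration.

```python
from copy import deepcopy
from typing import Any


def parse_value(raw: str) -> Any:
    """Convert a command-line string to bool, int, or float; otherwise keep it as str."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw


def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Return a copy of `config` with dotted `key=value` overrides applied."""
    result = deepcopy(config)
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # Create intermediate sections when the YAML did not define them.
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw_value)
    return result


# Hypothetical stand-in for values loaded from the YAML config file.
base_config = {"cluster": {"n_nodes": 1, "n_gpus_per_node": 8}}

merged = apply_overrides(
    base_config,
    ["cluster.n_nodes=2", "cluster.n_gpus_per_node=8"],
)
total_gpus = merged["cluster"]["n_nodes"] * merged["cluster"]["n_gpus_per_node"]
print(merged, total_gpus)
```

Under these assumptions, the two-node command above corresponds to 2 × 8 = 16 GPUs in total, and the original dictionary is left unchanged because the overrides are applied to a copy.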

For comprehensive setup instructions, see our quickstart guide.

📖 Resources

Code Walkthrough

Running GRPO on GSM8K dataset with AReaL-lite

Customization

🗺️ Future Roadmap

2025 Q3 Roadmap

AReaL is under active development with planned minor releases weekly and major releases monthly. We warmly welcome community engagement and contributions. We are also actively hiring interns and full-time employees with open positions in both the US and China.

🙏 Acknowledgments

We gratefully acknowledge that major contributors are from the AReaL Team at Ant Group and the Institute for Interdisciplinary Information Sciences, Tsinghua University.

We have also received invaluable assistance from the following groups (listed alphabetically):

The Data Intelligence Lab at Ant Research for their data support
The Relaxed System Lab from HKUST for seamless collaboration on numerous system-related aspects
The SGLang team for supporting custom weight update features and their contributions during AReaL-lite development
The Super Computing Technology (SCT) team at Ant Group for their expertise in large-scale cluster operations and maintenance
Special thanks to @Lyken17 for providing valuable suggestions throughout our development process

We also deeply appreciate all pioneering work from the community, particularly the ReaLHF project from OpenPsi Inc. and other outstanding projects, including but not limited to DeepScaleR, Open-Reasoner-Zero, OpenRLHF, VeRL, SGLang, QwQ, Light-R1, and DAPO.

📄 Citation

@inproceedings{mei2025real,
  author       = {Mei, Zhiyu and Fu, Wei and Li, Kaiwei and Wang, Guangju and Zhang, Huanchen and Wu, Yi},
  title        = {ReaL: Efficient RLHF Training of Large Language Models with Parameter Reallocation},
  booktitle    = {Proceedings of the Eighth Conference on Machine Learning and Systems,
                  MLSys 2025, Santa Clara, CA, USA, May 12-15, 2025},
  publisher    = {mlsys.org},
  year         = {2025},
}

@misc{fu2025areal,
      title={AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning},
      author={Wei Fu and Jiaxuan Gao and Xujie Shen and Chen Zhu and Zhiyu Mei and Chuyi He and Shusheng Xu and Guo Wei and Jun Mei and Jiashu Wang and Tongkai Yang and Binhang Yuan and Yi Wu},
      year={2025},
      eprint={2505.24298},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2505.24298},
}
