TL;DR. We introduce MobileRL, an online agentic reinforcement learning framework that turns general-purpose vision-language models into strong mobile GUI agents. By combining a staged reasoning warm-up with difficulty-adaptive online RL, MobileRL achieves state-of-the-art success rates on AndroidWorld and AndroidLab.
- Evaluation framework
- MobileRL-9B checkpoint — Will be open-sourced soon upon legal approval.
This guide will help you get started quickly with our evaluation framework.
Please follow the steps in the order provided.
The Android Emulator requires KVM (Kernel-based Virtual Machine) support on the host machine.
You can verify if your system supports KVM by running:
apt-get install cpu-checker
kvm-okWe provide packaged test environments for AndroidWorld and AndroidLab as Docker images to simplify setup and ensure reproducibility. Before proceeding, pull the required Docker images:
docker pull xuyifan0731/mobilerl-androidlab-eval
docker pull xuyifan0731/mobilerl-androidworld-evalWe support two modes of usage:
- Local Testing – Recommended for quick debugging and making modifications.
- Docker-based Deployment with AgentRL – Provides a consistent, containerized environment for convenient deployment.
For detailed usage instructions, please refer to inference/README.md.
Building general-purpose graphical user interface (GUI) agents has become increasingly promising with the progress in vision language models. However, developing effective mobile GUI agents with reinforcement learning (RL) remains challenging due to the heavy-tailed distribution of task difficulty and the inefficiency of large-scale environment sampling. We present an online agentic reinforcement learning framework MobileRL to enhance GUI agents in mobile environments. Its core component is the Difficulty-Adaptive GRPO (AdaGRPO) algorithm. In AdaGRPO, we design difficulty-adaptive positive replay and failure curriculum filtering to adapt the model to different task difficulties. We introduce the shortest path reward adjustment strategy to reshape rewards concerning the task length in multi-turn agentic tasks. Those strategies jointly stabilize RL training, improve sample efficiency, and generate strong performance across diverse mobile apps and tasks. We apply MobileRL to two open models (Qwen2.5-VL-7B-Instruct and GLM-4.1V-9B-Base). The resultant MobileRL-9B model achieves state-of-the-art results in terms of success rates on both Android World (75.8%) and Android Lab (46.8%). The MobileRL framework is adopted in the AutoGLM products.
Mobile GUI agents must follow complex instructions, reason over cluttered screens, and act under sparse, delayed rewards—all while task difficulty is heavy-tailed and environment sampling is expensive.
MobileRL addresses these challenges with a two-stage recipe:
- 
Reasoning Warm-up: - reasoning-free sft on large expert data.
- reasoning sft to inject and polish rationale-driven planning and transparency.
 
- 
Online Agentic RL (Difficulty–Adaptive GRPO, AdaGRPO): - Adaptive Positive Replay (AdaPR): store high-quality trajectories and re-use them efficiently.
- Failure Curriculum Filtering (FCF): prune low-quality rollouts and focus learning on actionable tasks.
- Shortest-Path Reward Adjustment (SPA): reward shaping that stabilizes credit assignment for long-horizon interactions.
 
We evaluate on two interactive Android benchmarks:
- AndroidWorld (rule-based trajectory rewards)
- AndroidLab (LM-based reward model; see paper appendix for details)
Success Rate (SR, %) — higher is better
| Models (Proprietary & Open) | #Params | AndroidWorld | AndroidLab | 
|---|---|---|---|
| GPT-4o-2024-11-20 | – | 34.5 | 31.2 | 
| Claude-Sonnet-4-20250514-thinking | – | 41.0 | 40.6 | 
| Qwen2.5-VL-7B-Instruct | 7B | 27.6 | 10.1 | 
| GLM-4.1V-9B-Thinking | 9B | 41.7 | 24.6 | 
| UI-Tars-7B | 7B | 33.0 | 32.6 | 
| V-Droid | 8B | 59.5 | 38.3 | 
| UI-Tars-1.5 | - | 64.2 | - | 
| UI-Genie-Agent | 72B | – | 41.2 | 
Our method (MobileRL)
| MobileRL Variant | #Params | AndroidWorld | AndroidLab | 
|---|---|---|---|
| MobileRL w/ Qwen2.5-VL-7B | 7B | 72.0 | 42.5 | 
| MobileRL w/ GLM-4.1V-9B-Base | 9B | 75.8 | 46.8 | 
If you find MobileRL useful, please cite the paper:
@misc{xu2025mobilerlonlineagenticreinforcement,
      title={MobileRL: Online Agentic Reinforcement Learning for Mobile GUI Agents}, 
      author={Yifan Xu and Xiao Liu and Xinghan Liu and Jiaqi Fu and Hanchen Zhang and Bohao Jing and Shudan Zhang and Yuting Wang and Wenyi Zhao and Yuxiao Dong},
      year={2025},
      eprint={2509.18119},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2509.18119}, 
}
@misc{xu2024androidlabtrainingsystematicbenchmarking,
      title={AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents}, 
      author={Yifan Xu and Xiao Liu and Xueqiao Sun and Siyi Cheng and Hao Yu and Hanyu Lai and Shudan Zhang and Dan Zhang and Jie Tang and Yuxiao Dong},
      year={2024},
      eprint={2410.24024},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2410.24024}, 
}
