(May 19, 2025) 🔥🔥🔥 We're excited to introduce our new model, SE-GUI! It achieves 47.2% accuracy with a 7B model and 35.9% with a 3B model, trained on just 3k open-source samples. Check out the arXiv paper!
(Feb 21, 2025) We’re excited to see our work acknowledged and used as a benchmark in several great projects: Omniparser v2, Qwen2.5-VL, UI-TARS, UGround, AGUVIS, ...
(Jan 4, 2025) The paper and dataset are released. Please also check out ScreenSpot-v2-variants, which contains more instruction styles (original instruction, action, target UI description, and negative instructions).
Before you begin, ensure the required environment variable is set:

- `OPENAI_API_KEY`: your OpenAI API key.
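For example, you can export the variable in your shell before launching the scripts (the key value below is a placeholder; substitute your own):

```shell
# Export the API key so the evaluation scripts can read it from the environment.
# "your-api-key-here" is a placeholder, not a real key.
export OPENAI_API_KEY="your-api-key-here"
```

Alternatively, you can prefix a single command with the assignment (e.g. `OPENAI_API_KEY=... bash run_ss_pro.sh`) to scope the key to one run.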
Use the shell scripts to launch the evaluation:

```bash
bash run_ss_pro.sh
```

or

```bash
bash run_ss_pro_cn.sh
```
If you find our work useful, please consider citing it:
```bibtex
@inproceedings{li2025screenspotpro,
  title={ScreenSpot-Pro: {GUI} Grounding for Professional High-Resolution Computer Use},
  author={Kaixin Li and Meng Ziyang and Hongzhan Lin and Ziyang Luo and Yuchen Tian and Jing Ma and Zhiyong Huang and Tat-Seng Chua},
  booktitle={Workshop on Reasoning and Planning for Large Language Models},
  year={2025},
  url={https://openreview.net/forum?id=XaKNDIAHas}
}
```