likaixin2000/ScreenSpot-Pro-GUI-Grounding

ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use

📢 Updates

(May 19, 2025) 🔥🔥🔥 We're excited to introduce our new model, SE-GUI! It achieves 47.2% accuracy with a 7B model and 35.9% with a 3B model, trained on just 3k open-source samples. Check out the arXiv paper!

(Feb 21, 2025) We’re excited to see our work acknowledged and used as a benchmark in several great projects: Omniparser v2, Qwen2.5-VL, UI-TARS, UGround, AGUVIS, ...

(Jan 4, 2025) The paper and dataset are released. Please also check out ScreenSpot-v2-variants, which contains more instruction styles (original instruction, action, target UI description, and negative instructions).

Set Up

Before you begin, make sure the following environment variable is set:

  • OPENAI_API_KEY: Your OpenAI API key.
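
As a minimal sketch of how to verify this before launching an evaluation, the shell function below checks that OPENAI_API_KEY is set and non-empty. The function name check_openai_key is hypothetical and not part of this repository; only the variable name comes from the instructions above.

```shell
# Hypothetical helper (not part of the repository): reports whether
# OPENAI_API_KEY is set to a non-empty value in the current shell.
check_openai_key() {
    if [ -z "${OPENAI_API_KEY:-}" ]; then
        # Print the problem to stderr and signal failure to the caller.
        echo "OPENAI_API_KEY is not set; export it before running the evaluation." >&2
        return 1
    fi
    echo "OPENAI_API_KEY is set."
}
```

After exporting your own key in the current shell, running check_openai_key should print a confirmation, and it can be chained in front of a launch command so that the evaluation only starts when the key is present.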

Evaluation

Use the shell scripts to launch the evaluation.

bash run_ss_pro.sh

or

bash run_ss_pro_cn.sh

Citation

Please consider citing if you find our work useful:

@inproceedings{
    li2025screenspotpro,
    title={ScreenSpot-Pro: {GUI} Grounding for Professional High-Resolution Computer Use},
    author={Kaixin Li and Meng Ziyang and Hongzhan Lin and Ziyang Luo and Yuchen Tian and Jing Ma and Zhiyong Huang and Tat-Seng Chua},
    booktitle={Workshop on Reasoning and Planning for Large Language Models},
    year={2025},
    url={https://openreview.net/forum?id=XaKNDIAHas}
}
