OpenCUA: Open Foundations for Computer-Use Agents
Abstract
Vision-language models have demonstrated impressive capabilities as computer-use agents (CUAs) capable of automating diverse computer tasks. As their commercial potential grows, critical details of the most capable CUA systems remain closed. Because these agents will increasingly mediate digital interactions and execute consequential decisions on our behalf, the research community needs access to open CUA frameworks to study their capabilities, limitations, and risks. To bridge this gap, we propose OpenCUA, a comprehensive open-source framework for scaling CUA data and foundation models. Our framework consists of: (1) an annotation infrastructure that seamlessly captures human computer-use demonstrations; (2) AgentNet, the first large-scale computer-use task dataset spanning 3 operating systems and 200+ applications and websites; and (3) a scalable pipeline that transforms demonstrations into state-action pairs with reflective long Chain-of-Thought reasoning that sustains robust performance gains as data scales. Our end-to-end agent models demonstrate strong performance across CUA benchmarks. In particular, OpenCUA-72B achieves an average success rate of 45.0% on OSWorld-Verified, establishing a new state-of-the-art (SOTA) among open-source models. Further analysis confirms that our approach generalizes well across domains and benefits significantly from increased test-time computation. We release our annotation tool, datasets, code, and models to build open foundations for further CUA research.
Authors (42)
Xinyuan Wang
Bowen Wang
Dunjie Lu
Junlin Yang
Tianbao Xie
Junli Wang
Jiaqi Deng
Xiaole Guo
Yiheng Xu
Chen Henry Wu
Zhennan Shen
Zhuokai Li
Ryan Li
Xiaochuan Li
Junda Chen
Boyuan Zheng
Peihang Li
Fangyu Lei
Ruisheng Cao
Yeqiao Fu
Dongchan Shin
Martin Shin
Jiarui Hu
Yuyan Wang
Jixuan Chen
Yuxiao Ye
Danyang Zhang
Dikang Du
Hao Hu
Huarong Chen
Zaida Zhou
Haotian Yao
Ziwei Chen
Qizheng Gu
Yipu Wang
Heng Wang
Diyi Yang
Victor Zhong
Flood Sung
Y. Charles
Zhilin Yang
Tao Yu
Quick Access
- Publication Year: 2025
- Language: en
- Source Database: arXiv
- Access: Open Access ✓