Toward Engineering AGI: Benchmarking the Engineering Design Capabilities of LLMs
Abstract
Modern engineering, spanning electrical, mechanical, aerospace, civil, and computer disciplines, stands as a cornerstone of human civilization and the foundation of our society. However, engineering design poses a fundamentally different challenge for large language models (LLMs) compared with traditional textbook-style problem solving or factual question answering. Although existing benchmarks have driven progress in areas such as language understanding, code synthesis, and scientific problem solving, real-world engineering design demands the synthesis of domain knowledge, navigation of complex trade-offs, and management of the tedious processes that consume much of practicing engineers' time. Despite these shared challenges across engineering disciplines, no benchmark currently captures the unique demands of engineering design work. In this work, we introduce EngDesign, an Engineering Design benchmark that evaluates LLMs' abilities to perform practical design tasks across nine engineering domains. Unlike existing benchmarks that focus on factual recall or question answering, EngDesign uniquely emphasizes LLMs' ability to synthesize domain knowledge, reason under constraints, and generate functional, objective-oriented engineering designs. Each task in EngDesign represents a real-world engineering design problem, accompanied by a detailed task description specifying design goals, constraints, and performance requirements. EngDesign pioneers a simulation-based evaluation paradigm that moves beyond textbook knowledge to assess genuine engineering design capabilities and shifts evaluation from static answer checking to dynamic, simulation-driven functional verification, marking a crucial step toward realizing the vision of engineering Artificial General Intelligence (AGI).
Authors (65)
Xingang Guo
Yaxin Li
Xiangyi Kong
Yilan Jiang
Xiayu Zhao
Zhihua Gong
Yufan Zhang
Daixuan Li
Tianle Sang
Beixiao Zhu
Gregory Jun
Yingbing Huang
Yiqi Liu
Yuqi Xue
Rahul Dev Kundu
Qi Jian Lim
Yizhou Zhao
Luke Alexander Granger
Mohamed Badr Younis
Darioush Keivan
Nippun Sabharwal
Shreyanka Sinha
Prakhar Agarwal
Kojo Vandyck
Hanlin Mai
Zichen Wang
Aditya Venkatesh
Ayush Barik
Jiankun Yang
Chongying Yue
Jingjie He
Libin Wang
Licheng Xu
Hao Chen
Jinwen Wang
Liujun Xu
Rushabh Shetty
Ziheng Guo
Dahui Song
Manvi Jha
Weijie Liang
Weiman Yan
Bryan Zhang
Sahil Bhandary Karnoor
Jialiang Zhang
Rutva Pandya
Xinyi Gong
Mithesh Ballae Ganesh
Feize Shi
Ruiling Xu
Yifan Zhang
Yanfeng Ouyang
Lianhui Qin
Elyse Rosenbaum
Corey Snyder
Peter Seiler
Geir Dullerud
Xiaojia Shelly Zhang
Zuofu Cheng
Pavan Kumar Hanumolu
Jian Huang
Mayank Kulkarni
Mahdi Namazifar
Huan Zhang
Bin Hu
Quick Access
- Publication Year: 2025
- Language: en
- Source Database: arXiv
- Access: Open Access ✓