arXiv Open Access 2025

Mini Amusement Parks (MAPs): A Testbed for Modelling Business Decisions

Stéphane Aroca-Ouellette, Ian Berlot-Attwell, Panagiotis Lymperopoulos, Abhiramon Rajasekharan, Tongqi Zhu, +3 others

Abstract

Despite rapid progress in artificial intelligence, current systems struggle with the interconnected challenges that define real-world decision making. Practical domains, such as business management, require optimizing an open-ended and multi-faceted objective, actively learning environment dynamics from sparse experience, planning over long horizons in stochastic settings, and reasoning over spatial information. Yet existing human-AI benchmarks isolate subsets of these capabilities, limiting our ability to assess holistic decision-making competence. We introduce Mini Amusement Parks (MAPs), an amusement-park simulator designed to evaluate an agent's ability to model its environment, anticipate long-term consequences under uncertainty, and strategically operate a complex business. We provide human baselines and a comprehensive evaluation of state-of-the-art LLM agents, finding that humans outperform these systems by 6.5x on easy mode and 9.8x on medium mode. Our analysis reveals persistent weaknesses in long-horizon optimization, sample-efficient learning, spatial reasoning, and world modelling. By unifying these challenges within a single environment, MAPs offers a new foundation for benchmarking agents capable of adaptable decision making. Code: https://github.com/Skyfall-Research/MAPs

Authors (8)

Stéphane Aroca-Ouellette
Ian Berlot-Attwell
Panagiotis Lymperopoulos
Abhiramon Rajasekharan
Tongqi Zhu
Herin Kang
Kaheer Suleman
Sam Pasupalak

Citation Format

Aroca-Ouellette, S., Berlot-Attwell, I., Lymperopoulos, P., Rajasekharan, A., Zhu, T., Kang, H. et al. (2025). Mini Amusement Parks (MAPs): A Testbed for Modelling Business Decisions. https://arxiv.org/abs/2511.15830

Journal Information

Publication Year: 2025
Language: en
Source Database: arXiv
Access: Open Access