arXiv Open Access 2024

STEER: Assessing the Economic Rationality of Large Language Models

Narun Raman, Taylor Lundy, Samuel Amouyal, Yoav Levine, Kevin Leyton-Brown, +1 more

Abstract

There is increasing interest in using LLMs as decision-making "agents." Doing so includes many degrees of freedom: which model should be used; how should it be prompted; should it be asked to introspect, conduct chain-of-thought reasoning, etc.? Settling these questions -- and more broadly, determining whether an LLM agent is reliable enough to be trusted -- requires a methodology for assessing such an agent's economic rationality. In this paper, we provide one. We begin by surveying the economic literature on rational decision making, taxonomizing a large set of fine-grained "elements" that an agent should exhibit, along with dependencies between them. We then propose a benchmark distribution that quantitatively scores an LLM's performance on these elements and, combined with a user-provided rubric, produces a "STEER report card." Finally, we describe the results of a large-scale empirical experiment with 14 different LLMs, characterizing both the current state of the art and the impact of different model sizes on models' ability to exhibit rational behavior.

Topics & Keywords

cs.CL econ.GN

Authors (6)

Narun Raman

Taylor Lundy

Samuel Amouyal

Yoav Levine

Kevin Leyton-Brown

Moshe Tennenholtz

Citation Format

Raman, N., Lundy, T., Amouyal, S., Levine, Y., Leyton-Brown, K., & Tennenholtz, M. (2024). STEER: Assessing the Economic Rationality of Large Language Models. https://arxiv.org/abs/2402.09552

Journal Information

Publication Year: 2024
Language: en
Source Database: arXiv
Access: Open Access ✓