arXiv Open Access 2024

STEER: Assessing the Economic Rationality of Large Language Models

Narun Raman Taylor Lundy Samuel Amouyal Yoav Levine Kevin Leyton-Brown +1 lainnya
Lihat Sumber

Abstrak

There is increasing interest in using LLMs as decision-making "agents." Doing so includes many degrees of freedom: which model should be used; how should it be prompted; should it be asked to introspect, conduct chain-of-thought reasoning, etc? Settling these questions -- and more broadly, determining whether an LLM agent is reliable enough to be trusted -- requires a methodology for assessing such an agent's economic rationality. In this paper, we provide one. We begin by surveying the economic literature on rational decision making, taxonomizing a large set of fine-grained "elements" that an agent should exhibit, along with dependencies between them. We then propose a benchmark distribution that quantitatively scores an LLMs performance on these elements and, combined with a user-provided rubric, produces a "STEER report card." Finally, we describe the results of a large-scale empirical experiment with 14 different LLMs, characterizing the both current state of the art and the impact of different model sizes on models' ability to exhibit rational behavior.

Topik & Kata Kunci

Penulis (6)

N

Narun Raman

T

Taylor Lundy

S

Samuel Amouyal

Y

Yoav Levine

K

Kevin Leyton-Brown

M

Moshe Tennenholtz

Format Sitasi

Raman, N., Lundy, T., Amouyal, S., Levine, Y., Leyton-Brown, K., Tennenholtz, M. (2024). STEER: Assessing the Economic Rationality of Large Language Models. https://arxiv.org/abs/2402.09552

Akses Cepat

Lihat di Sumber
Informasi Jurnal
Tahun Terbit
2024
Bahasa
en
Sumber Database
arXiv
Akses
Open Access ✓