arXiv Open Access 2025

EconEvals: Benchmarks and Litmus Tests for Economic Decision-Making by LLM Agents

Sara Fish, Julia Shephard, Minkai Li, Ran I. Shorrer, Yannai A. Gonczarowski

Abstract

We develop evaluation methods for measuring the economic decision-making capabilities and tendencies of LLMs. First, we develop benchmarks derived from key problems in economics (procurement, scheduling, and pricing) that test an LLM's ability to learn from the environment in context. Second, we develop the framework of litmus tests: evaluations that quantify an LLM's choice behavior on a stylized decision-making task with multiple conflicting objectives. Each litmus test outputs three scores: a litmus score, which quantifies an LLM's tradeoff response; a reliability score, which measures the coherence of an LLM's choice behavior; and a competency score, which measures an LLM's capability at the same task when the conflicting objectives are replaced by a single, well-specified objective. Evaluating a broad array of frontier LLMs, we (1) investigate changes in LLM capabilities and tendencies over time, (2) derive economically meaningful insights from the LLMs' choice behavior and chain-of-thought, and (3) validate our litmus test framework by testing self-consistency, robustness, and generalizability. Overall, this work provides a foundation for evaluating LLM agents as they are further integrated into economic decision-making.
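The abstract does not define how the three litmus-test scores are computed. The following is a minimal Python sketch of how such a three-part result might be aggregated, under stated assumptions; it is not the paper's scoring method. The names `LitmusResult` and `score_litmus_test` are hypothetical, and so is the encoding of each choice as a boolean, where True means the choice favors one of two conflicting objectives. The specific definitions are hypothetical stand-ins: the litmus score is the share of choices favoring that objective, the reliability score is average agreement with each scenario's majority choice across repeated trials, and the competency score is accuracy on the single-objective control variant.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class LitmusResult:
    # Hypothetical definitions for illustration only, not the paper's.
    litmus_score: float       # share of all choices favoring objective A
    reliability_score: float  # mean agreement with each scenario's majority choice
    competency_score: float   # accuracy on the single-objective control task


def score_litmus_test(tradeoff_choices, control_correct):
    """Aggregate raw choices into three scores (hypothetical sketch).

    tradeoff_choices: one list per scenario, holding the model's repeated
        choices for that scenario (True = favors objective A).
    control_correct: one bool per control question, True if the model
        answered the single-objective variant correctly.
    """
    # Litmus score: pool every choice across scenarios and repeats.
    all_choices = [c for scenario in tradeoff_choices for c in scenario]
    litmus = mean(1.0 if c else 0.0 for c in all_choices)

    # Reliability score: within each scenario, the fraction of repeats that
    # match the majority choice, then averaged over scenarios.
    per_scenario = []
    for scenario in tradeoff_choices:
        share_a = mean(1.0 if c else 0.0 for c in scenario)
        per_scenario.append(max(share_a, 1.0 - share_a))
    reliability = mean(per_scenario)

    # Competency score: plain accuracy on the well-specified control task.
    competency = mean(1.0 if ok else 0.0 for ok in control_correct)

    return LitmusResult(litmus, reliability, competency)
```

Consider two scenarios, each run four times. `score_litmus_test([[True, True, False, True], [False, False, False, False]], [True, True, False, True])` yields a litmus score of 0.375, a reliability score of 0.875, and a competency score of 0.75. Keeping the competency score separate from the tradeoff scores mirrors the abstract's distinction: it separates what a model prefers when objectives conflict from whether it can do the task at all.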

Citation Format

Fish, S., Shephard, J., Li, M., Shorrer, R.I., Gonczarowski, Y.A. (2025). EconEvals: Benchmarks and Litmus Tests for Economic Decision-Making by LLM Agents. https://arxiv.org/abs/2503.18825

Journal Information
Year Published
2025
Language
en
Database Source
arXiv
Access
Open Access ✓