arXiv Open Access 2025

EconEvals: Benchmarks and Litmus Tests for Economic Decision-Making by LLM Agents

Sara Fish, Julia Shephard, Minkai Li, Ran I. Shorrer, Yannai A. Gonczarowski

Abstract

We develop evaluation methods for measuring the economic decision-making capabilities and tendencies of LLMs. First, we develop benchmarks derived from key problems in economics (procurement, scheduling, and pricing) that test an LLM's ability to learn from the environment in context. Second, we develop the framework of litmus tests: evaluations that quantify an LLM's choice behavior on a stylized decision-making task with multiple conflicting objectives. Each litmus test outputs three scores: a litmus score, which quantifies an LLM's tradeoff response; a reliability score, which measures the coherence of an LLM's choice behavior; and a competency score, which measures an LLM's capability at the same task when the conflicting objectives are replaced by a single, well-specified objective. Evaluating a broad array of frontier LLMs, we (1) investigate changes in LLM capabilities and tendencies over time, (2) derive economically meaningful insights from the LLMs' choice behavior and chain-of-thought, and (3) validate our litmus test framework by testing self-consistency, robustness, and generalizability. Overall, this work provides a foundation for evaluating LLM agents as they are further integrated into economic decision-making.

Topics & Keywords

cs.AI cs.CL cs.GT

Citation

Fish, S., Shephard, J., Li, M., Shorrer, R. I., & Gonczarowski, Y. A. (2025). EconEvals: Benchmarks and Litmus Tests for Economic Decision-Making by LLM Agents. https://arxiv.org/abs/2503.18825

Journal Information

Publication Year: 2025
Language: en
Source Database: arXiv
Access: Open Access