Semantic Scholar, Open Access, 2023, 1194 citations

FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation

Sewon Min, Kalpesh Krishna, Xinxi Lyu, M. Lewis, Wen-tau Yih, +4 others

Abstract

Evaluating the factuality of long-form text generated by large language models (LMs) is non-trivial because (1) generations often contain a mixture of supported and unsupported pieces of information, making binary judgments of quality inadequate, and (2) human evaluation is time-consuming and costly. In this paper, we introduce FACTSCORE, a new evaluation that breaks a generation into a series of atomic facts and computes the percentage of atomic facts supported by a reliable knowledge source. We conduct an extensive human evaluation to obtain FACTSCOREs of people biographies generated by several state-of-the-art commercial LMs -- InstructGPT, ChatGPT, and the retrieval-augmented PerplexityAI -- and report new analysis demonstrating the need for such a fine-grained score (e.g., ChatGPT only achieves 58%). Since human evaluation is costly, we also introduce an automated model that estimates FACTSCORE using retrieval and a strong language model, with less than a 2% error rate. Finally, we use this automated metric to evaluate 6,500 generations from a new set of 13 recent LMs that would have cost $26K if evaluated by humans, with various findings: GPT-4 and ChatGPT are more factual than public models, and Vicuna and Alpaca are some of the best public models. FACTSCORE is available for public use via `pip install factscore`.
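The score described above is a ratio. For each generation, it is the number of atomic facts judged supported by the knowledge source divided by the total number of atomic facts in that generation. The following is a minimal Python sketch, not the `factscore` package's API. It assumes the atomic facts have already been extracted and labeled (by human annotators or by the automated estimator the abstract mentions). It also assumes that per-generation scores are averaged into a single model-level number. The function names `factual_precision` and `fact_score` are hypothetical.

```python
from statistics import mean


def factual_precision(support_labels: list[bool]) -> float:
    """Fraction of one generation's atomic facts judged supported.

    Hypothetical helper: each entry is one atomic fact, True if the
    knowledge source supports it, False otherwise.
    """
    if not support_labels:
        raise ValueError("a generation must contain at least one atomic fact")
    return sum(support_labels) / len(support_labels)


def fact_score(generations: list[list[bool]]) -> float:
    """Model-level score, assumed here to be the mean per-generation precision."""
    return mean(factual_precision(labels) for labels in generations)


# Hypothetical labels for two generated biographies.
bio_a = [True, True, False, True]          # 3 of 4 atomic facts supported
bio_b = [True, False, False, True, True]   # 3 of 5 atomic facts supported

print(factual_precision(bio_a))      # 0.75
print(fact_score([bio_a, bio_b]))    # mean of 3/4 and 3/5
```

Scoring each atomic fact separately is what lets a partially correct biography receive partial credit, where a binary judgment would mark the whole generation as simply right or wrong.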


Authors (9)

Sewon Min
Kalpesh Krishna
Xinxi Lyu
M. Lewis
Wen-tau Yih
Pang Wei Koh
Mohit Iyyer
Luke Zettlemoyer
Hannaneh Hajishirzi

Citation Format

Min, S., Krishna, K., Lyu, X., Lewis, M., Yih, W., Koh, P.W. et al. (2023). FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation. https://doi.org/10.48550/arXiv.2305.14251

Quick Access

View at source: doi.org/10.48550/arXiv.2305.14251
Journal Information
Publication year: 2023
Language: en
Total citations: 1194
Database source: Semantic Scholar
DOI: 10.48550/arXiv.2305.14251
Access: Open Access ✓