arXiv Open Access 2024

ChatGPT "contamination": estimating the prevalence of LLMs in the scholarly literature

Andrew Gray

Abstract

The use of ChatGPT and similar Large Language Model (LLM) tools in scholarly communication and academic publishing has been widely discussed since they became easily accessible to a general audience in late 2022. This study uses keywords known to be disproportionately present in LLM-generated text to provide an overall estimate for the prevalence of LLM-assisted writing in the scholarly literature. For the publishing year 2023, it is found that several of those keywords show a distinctive and disproportionate increase in their prevalence, individually and in combination. It is estimated that at least 60,000 papers (slightly over 1% of all articles) were LLM-assisted, though this number could be extended and refined by analysis of other characteristics of the papers or by identification of further indicative keywords.
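The keyword approach described in the abstract can be illustrated with a minimal sketch. It assumes the estimate comes from comparing how often a keyword appears in a pre-LLM baseline period with how often it appears in the year of interest; the abstract does not give the exact calculation, so the function name `excess_papers`, its parameters, and the counts below are hypothetical and chosen only for illustration.

```python
def excess_papers(baseline_count, baseline_total, observed_count, observed_total):
    """Estimate how many papers contain a keyword beyond the expected number.

    This is an illustrative assumption, not the paper's actual procedure.
    baseline_count / baseline_total: papers with the keyword / all papers
        in a pre-LLM reference period.
    observed_count / observed_total: the same two counts for the year
        being examined.
    """
    # Papers expected to contain the keyword if its share had stayed at
    # the baseline level.
    expected = baseline_count * observed_total / baseline_total
    # Only a surplus counts as a signal; a drop is clamped to zero.
    return round(max(observed_count - expected, 0))


# Toy counts invented for this example (not taken from the paper).
toy_estimate = excess_papers(
    baseline_count=2_000,
    baseline_total=1_000_000,
    observed_count=5_000,
    observed_total=1_000_000,
)
print(toy_estimate)
```

A surplus computed this way for a single keyword is a lower bound at best, which fits the abstract's "at least" framing: LLM-assisted papers that avoid the keyword are not counted, and combining several keywords would require handling papers that match more than one.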

Topics & Keywords

cs.DL

Authors (1)

Andrew Gray

Citation Format

Gray, A. (2024). ChatGPT "contamination": estimating the prevalence of LLMs in the scholarly literature. https://arxiv.org/abs/2403.16887

Journal Information

Publication Year: 2024
Language: en
Source Database: arXiv
Access: Open Access ✓