arXiv Open Access 2024

ChatGPT "contamination": estimating the prevalence of LLMs in the scholarly literature

Andrew Gray

Abstract

The use of ChatGPT and similar Large Language Model (LLM) tools in scholarly communication and academic publishing has been widely discussed since they became easily accessible to a general audience in late 2022. This study uses keywords known to be disproportionately present in LLM-generated text to provide an overall estimate for the prevalence of LLM-assisted writing in the scholarly literature. For the publishing year 2023, it is found that several of those keywords show a distinctive and disproportionate increase in their prevalence, individually and in combination. It is estimated that at least 60,000 papers (slightly over 1% of all articles) were LLM-assisted, though this number could be extended and refined by analysis of other characteristics of the papers or by identification of further indicative keywords.
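The keyword-based estimate described in the abstract can be illustrated with a minimal sketch. This is not the paper's actual procedure: the function name `excess_documents`, the choice of a simple mean over earlier years as the baseline, and all numbers below are assumptions made purely for illustration.

```python
def excess_documents(shares_by_year, target_year, total_in_target_year):
    """Estimate how many documents exceed the baseline for one keyword.

    shares_by_year: dict mapping year -> fraction of documents that
        contain the keyword.
    The baseline is the mean share over all years before target_year.
    That baseline choice is an assumption of this sketch.
    """
    baseline_years = [y for y in shares_by_year if y < target_year]
    if not baseline_years:
        raise ValueError("need at least one year before target_year")

    baseline = sum(shares_by_year[y] for y in baseline_years) / len(baseline_years)

    # Only count an increase; a drop below baseline contributes nothing.
    excess_share = max(0.0, shares_by_year[target_year] - baseline)
    return excess_share * total_in_target_year


# Made-up illustrative numbers, not data from the paper.
shares = {2020: 0.010, 2021: 0.010, 2022: 0.010, 2023: 0.015}
estimate = excess_documents(shares, 2023, 1_000_000)
print(f"Estimated excess documents: {estimate:.0f}")
```

An estimate built this way is a lower bound at best, since it only captures documents that happen to use the chosen keyword, which is consistent with the abstract's "at least" framing.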

Citation

Gray, A. (2024). ChatGPT "contamination": estimating the prevalence of LLMs in the scholarly literature. https://arxiv.org/abs/2403.16887

Journal Information
Publication year: 2024
Language: en
Source database: arXiv
Access: Open Access