Semantic Scholar · Open Access · 2026 · 1 citation

Efficient detection of AI-generated scientific abstracts with a lightweight transformer

Cuilian Zhang, Weijun Zhou

Abstract

The rapid growth of advanced large language models challenges the authenticity of scientific work, creating a need for reliable methods to detect AI-generated scientific text. This paper addresses this challenge by developing and evaluating an efficient text classifier. We first constructed a balanced dataset focused on the Computer Vision (cs.CV) domain and then expanded it to four additional, diverse scientific domains (5,000 abstracts in total), pairing human-written abstracts from arXiv with corresponding AI-generated versions produced by Google’s Gemini 2.0 Flash. We then fine-tuned a lightweight Transformer model, DistilBERT, for the classification task. On the primary in-domain (cs.CV) test set, our approach achieved an accuracy of 99.4% and an area under the ROC curve (AUC) of 0.9999. Subsequent cross-domain evaluations showed robust generalization (Macro-F1 = 0.948). Further analysis revealed that our model surpasses traditional machine learning baselines not only in accuracy but also in robustness, as it learns deep semantic patterns rather than relying on superficial statistical cues. This work provides a practical, high-performance tool for safeguarding scientific authenticity and establishes a valuable benchmark for future research in AI text detection.
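To make the reported metrics concrete, the following is a minimal sketch of how accuracy, macro-averaged F1, and ROC AUC can be computed for a binary AI-text detector. It is not the authors' evaluation code. The function names, the label convention (1 = AI-generated, 0 = human-written), the 0.5 decision threshold, and the toy scores are all assumptions made for illustration.

```python
"""Illustrative metrics for a binary AI-text detector (not the paper's code).

Assumed label convention: 1 = AI-generated abstract, 0 = human-written abstract.
"""
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int],
             labels: Sequence[int] = (0, 1)) -> float:
    """Unweighted mean of per-class F1, so each class counts equally."""
    per_class = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class.append(f1)
    return sum(per_class) / len(per_class)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Threshold-free AUC via the Mann-Whitney formulation: the probability
    that a randomly chosen positive is scored above a randomly chosen
    negative, with ties counted as one half. O(P*N), fine for small sets."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy example: hypothetical detector probabilities for five abstracts.
    y_true = [1, 1, 1, 0, 0]
    probs = [0.9, 0.8, 0.3, 0.6, 0.2]
    y_pred = [int(p >= 0.5) for p in probs]  # assumed 0.5 threshold
    print(f"accuracy = {accuracy(y_true, y_pred):.3f}")  # 0.600
    print(f"macro-F1 = {macro_f1(y_true, y_pred):.3f}")  # 0.583
    print(f"ROC AUC  = {roc_auc(y_true, probs):.3f}")    # 0.833
```

Because AUC is computed from the raw scores rather than from thresholded labels, it measures how well the detector ranks AI-generated abstracts above human-written ones. This is one way an AUC very close to 1.0 can coexist with an accuracy slightly below 100%: a few abstracts near the decision threshold may still be misclassified. Macro-F1 weights both classes equally, which keeps a strong score on one class from masking weak performance on the other.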


Citation

Zhang, C., & Zhou, W. (2026). Efficient detection of AI-generated scientific abstracts with a lightweight transformer. https://doi.org/10.1038/s41598-026-35203-3

Journal Information
Publication year: 2026
Language: English
Total citations: 1
Database source: Semantic Scholar
DOI: 10.1038/s41598-026-35203-3
Access: Open Access