arXiv Open Access 2025

Unlocking the Potential of Large Language Models in the Nuclear Industry with Synthetic Data

Muhammad Anwar, Daniel Lau, Mishca de Costa, Issam Hammad

Abstract

The nuclear industry possesses a wealth of valuable information locked away in unstructured text data. This data, however, is not readily usable for advanced Large Language Model (LLM) applications that require clean, structured question-answer pairs for tasks like model training, fine-tuning, and evaluation. This paper explores how synthetic data generation can bridge this gap, enabling the development of robust LLMs for the nuclear domain. We discuss the challenges of data scarcity and privacy concerns inherent in the nuclear industry and how synthetic data provides a solution by transforming existing text data into usable Q&A pairs. This approach leverages LLMs to analyze text, extract key information, generate relevant questions, and evaluate the quality of the resulting synthetic dataset. By unlocking the potential of LLMs in the nuclear industry, synthetic data can pave the way for improved information retrieval, enhanced knowledge sharing, and more informed decision-making in this critical sector.

Topics & Keywords

cs.CL

Authors (4)

Muhammad Anwar

Daniel Lau

Mishca de Costa

Issam Hammad

Citation

Anwar, M., Lau, D., de Costa, M., & Hammad, I. (2025). Unlocking the Potential of Large Language Models in the Nuclear Industry with Synthetic Data. arXiv. https://arxiv.org/abs/2506.08750

Journal Information

Publication Year: 2025
Language: en
Database Source: arXiv
Access: Open Access ✓