A Study on the Extraction of Training Dataset from Fine-Tuned Language Models
Abstract
Large language models (LLMs) excel at various natural language tasks, even those beyond their explicit training. Fine-tuning these models on smaller datasets enhances their performance on specific tasks, but it can also lead to memorization of training data, raising privacy concerns. This study explores the extraction of private training data from fine-tuned LLMs through a series of experiments. The focus is on assessing the ease of data extraction using various techniques and on examining how factors such as training dataset size, number of epochs, training sample length and content, and fine-tuning parameters influence this process. Our results indicate that data extraction is relatively straightforward with direct model access, especially when the training loss is computed over entire prompts. Models fine-tuned at higher precision (8-bit and 16-bit) demonstrate greater memorization than 4-bit quantized models. Even without direct access, insights into the training data can be obtained by comparing output probability scores across multiple queries. Furthermore, the study reveals that, for a fixed number of epochs, the proportion of extractable data increases with training dataset size. These findings highlight the privacy risks faced both by individuals whose data is used in fine-tuning and by organizations deploying fine-tuned models in public applications.
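The extraction signal described in the abstract, that a model assigns a noticeably lower loss (higher probability) to sequences it memorized during fine-tuning, can be sketched with a toy stand-in for the fine-tuned model. The paper does not publish its code, so the `ToyBigramLM` class, the smoothing parameters, and the example corpus below are all illustrative assumptions; the point is only to show how comparing per-token negative log-likelihood separates a memorized training sample from a perturbed non-member:

```python
import math
from collections import defaultdict

class ToyBigramLM:
    """Tiny stand-in for a fine-tuned LM: assigns token probabilities
    from bigram counts over its 'fine-tuning' corpus. (Illustrative only,
    not the model family used in the study.)"""
    def __init__(self, corpus):
        self.counts = defaultdict(lambda: defaultdict(int))
        for text in corpus:
            tokens = ["<s>"] + text.split()
            for prev, cur in zip(tokens, tokens[1:]):
                self.counts[prev][cur] += 1

    def token_prob(self, prev, cur, vocab_size=1000, alpha=1.0):
        # Add-alpha smoothing so unseen bigrams keep nonzero probability.
        total = sum(self.counts[prev].values())
        return (self.counts[prev][cur] + alpha) / (total + alpha * vocab_size)

def avg_negative_log_likelihood(model, text):
    """Per-token NLL of `text` under the model -- the 'training loss
    computed over the entire prompt' used as a memorization signal."""
    tokens = ["<s>"] + text.split()
    nll = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        nll -= math.log(model.token_prob(prev, cur))
    return nll / (len(tokens) - 1)

# Hypothetical fine-tuning corpus containing a 'private' record.
corpus = ["the secret code is 4217", "the weather is fine today"]
model = ToyBigramLM(corpus)

member = avg_negative_log_likelihood(model, "the secret code is 4217")
nonmember = avg_negative_log_likelihood(model, "the secret code is 9999")
# The memorized sample scores a lower loss than the perturbed variant.
print(member < nonmember)
```

With a real fine-tuned LLM the same comparison would use the model's per-token log-probabilities (or generation loss) instead of bigram counts, but the decision rule, flagging candidates whose loss is anomalously low, is the same.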
Topics & Keywords
Authors (3)
Raja Vavekanand
Aybek Kalandarov Ruzimbaevich
Muhabbat Jumaniyozova
Quick Access
- Publication Year
- 2025
- Source Database
- DOAJ
- DOI
- 10.24423/cames.2025.1781
- Access
- Open Access ✓