Machine Unlearning: A Perspective, Taxonomy, and Benchmark Evaluation
Abstract
Machine Learning (ML) models trained on large-scale datasets learn useful predictive patterns, but they may also memorize undesired information, leading to risks such as information leakage, bias, copyright violations, and privacy attacks. As these models are increasingly deployed in real-world and regulated settings, the consequences of such memorization become practical and high-stakes, reinforced by data-protection frameworks that grant individuals a Right to be Forgotten (e.g., the GDPR). Simply removing a record from the training dataset does not guarantee the elimination of its influence from the model, while retrain-from-scratch procedures are often prohibitive for modern architectures, including Transformers and Large Language Models (LLMs). In this work, we provide a perspective on Machine Unlearning (MU) in supervised learning settings, with a particular focus on Natural Language Processing (NLP) scenarios, grounded in a PRISMA-driven systematic review. We propose a multi-level taxonomy that organizes MU techniques along practical and conceptual dimensions, including exactness (exact versus approximate), unlearning granularity, guarantees, and application constraints. To complement this perspective, we run an illustrative benchmark evaluation using a standardized unlearning protocol on DistilBERT trained on a public corpus of news headlines for topic classification, contrasting the retraining gold standard with representative design-for-unlearning and approximate post hoc techniques. For completeness, we also report two oracle-assisted upper-bound baselines (distillation and scrubbing) that rely on a clean retrained reference model, and we account for their incremental cost separately. Our analysis jointly considers model utility, probabilistic quality, forgetting and privacy indicators, as well as computational efficiency. 
The results highlight systematic trade-offs between accuracy, computational cost, and removal effectiveness, providing practical guidance for selecting machine unlearning techniques in realistic deployment scenarios.
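The exact-versus-approximate distinction discussed above can be illustrated with a minimal shard-based "design-for-unlearning" (SISA-style) sketch: the training set is partitioned into disjoint shards, one sub-model is trained per shard, and deleting a record only requires retraining the single shard that contained it, rather than the full model. This is a toy sketch under stated assumptions; the dataset, shard layout, and nearest-centroid sub-models are illustrative choices, not the protocol or models used in the paper.

```python
import random

random.seed(0)

# Toy 1-D dataset: label is 1 if the point is positive, else 0.
data = [random.uniform(-1, 1) for _ in range(90)]
labels = [1 if x > 0 else 0 for x in data]

# Disjoint shards: record i is assigned to shard i % N_SHARDS.
N_SHARDS = 3
shard_ids = [list(range(i, 90, N_SHARDS)) for i in range(N_SHARDS)]

def fit_shard(ids):
    # Per-class mean (nearest-centroid classifier), trained on one shard only.
    sums, counts = {0: 0.0, 1: 0.0}, {0: 0, 1: 0}
    for i in ids:
        sums[labels[i]] += data[i]
        counts[labels[i]] += 1
    return {c: sums[c] / counts[c] for c in (0, 1) if counts[c]}

models = [fit_shard(ids) for ids in shard_ids]

def predict(x):
    # Majority vote over the per-shard sub-models.
    votes = [min(m, key=lambda c: abs(x - m[c])) for m in models]
    return round(sum(votes) / len(votes))

def unlearn(record_id):
    # Remove the record and retrain ONLY its shard; the other
    # sub-models are untouched, so deletion cost is per-shard.
    for s, ids in enumerate(shard_ids):
        if record_id in ids:
            ids.remove(record_id)
            models[s] = fit_shard(ids)
            return s
```

Because each sub-model never saw data outside its own shard, retraining the affected shard yields exact removal of the record's influence, which is the efficiency argument behind sharded training compared with full retrain-from-scratch.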
Authors (4)
Cristian Cosentino
Simone Gatto
Pietro Liò
Fabrizio Marozzo
Quick Access
- Year of Publication
- 2026
- Database Source
- DOAJ
- DOI
- 10.3390/fi18030174
- Access
- Open Access ✓