arXiv Open Access 2025

SIMU: Selective Influence Machine Unlearning

Anu Agarwal Mihir Pamnani Dilek Hakkani-Tur


Abstract

The undesired memorization of sensitive information by Large Language Models (LLMs) has emphasized the need for safety mechanisms that can regulate model behavior. This has led to the development of machine unlearning techniques that enable models to precisely forget sensitive and unwanted information. For machine unlearning, first-order and second-order optimizer-based methods have shown significant progress in enabling LLMs to forget targeted information. However, in doing so, these approaches often compromise the model's original capabilities, resulting in unlearned models that struggle to retain their prior knowledge and overall utility. To address this, we propose Selective Influence Machine Unlearning (SIMU), a two-step framework that enhances second-order optimizer-based unlearning by selectively updating only the critical neurons responsible for encoding the forget-set. By constraining updates to these targeted neurons, SIMU achieves comparable unlearning efficacy while substantially outperforming current methods in retaining the model's original knowledge.
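The two-step idea described above (first locate the neurons most tied to the forget-set, then restrict a curvature-aware update to them) can be illustrated with a minimal sketch. The abstract does not say how neurons are scored or which second-order optimizer is used, so everything here is an assumption made for illustration: the forget-to-retain gradient-norm ratio as the scoring rule, a diagonal curvature estimate as the second-order term, gradient ascent on the forget loss as the unlearning step, and all function and variable names (`select_critical_neurons`, `masked_second_order_step`, `hessian_diag`) are hypothetical. It is not the authors' implementation.

```python
import numpy as np


def select_critical_neurons(grad_forget, grad_retain, top_k):
    """Step 1 (assumed scoring rule): rank neurons (rows of the weight
    matrix) by the ratio of forget-set to retain-set gradient norm and
    return a boolean mask marking the top_k highest-scoring rows."""
    forget_score = np.linalg.norm(grad_forget, axis=1)
    retain_score = np.linalg.norm(grad_retain, axis=1)
    score = forget_score / (retain_score + 1e-8)
    critical = np.argsort(score)[::-1][:top_k]
    mask = np.zeros(grad_forget.shape[0], dtype=bool)
    mask[critical] = True
    return mask


def masked_second_order_step(weights, grad_forget, hessian_diag, mask,
                             lr=0.1, damping=1e-3):
    """Step 2 (assumed update): gradient ascent on the forget loss,
    preconditioned by a damped diagonal curvature estimate, applied
    only to the rows selected by `mask`."""
    step = lr * grad_forget / (hessian_diag + damping)
    updated = weights.copy()
    updated[mask] += step[mask]  # non-selected neurons stay untouched
    return updated


# Toy data standing in for one layer's weights, gradients, and curvature.
rng = np.random.default_rng(0)
weights = rng.normal(size=(6, 4))
grad_forget = rng.normal(size=(6, 4))
grad_retain = rng.normal(size=(6, 4))
hessian_diag = np.abs(rng.normal(size=(6, 4)))

mask = select_critical_neurons(grad_forget, grad_retain, top_k=2)
new_weights = masked_second_order_step(weights, grad_forget, hessian_diag, mask)
```

In this sketch the mask is what protects retained knowledge: rows outside it are copied through unchanged, so any utility loss can only come from the selected neurons.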

Topics & Keywords

cs.LG cs.AI

Authors (3)

Anu Agarwal

Mihir Pamnani

Dilek Hakkani-Tur

Citation Format


Agarwal, A., Pamnani, M., & Hakkani-Tur, D. (2025). SIMU: Selective Influence Machine Unlearning. https://arxiv.org/abs/2510.07822


Journal Information

Year Published: 2025
Language: en
Database Source: arXiv
Access: Open Access ✓