Semantic Scholar Open Access 2021 75 citations

How Should Pre-Trained Language Models Be Fine-Tuned Towards Adversarial Robustness?

Xinshuai Dong, Anh Tuan Luu, Min Lin, Shuicheng Yan, Hanwang Zhang

Abstract

Fine-tuning pre-trained language models has achieved great success in many NLP fields. Yet it is strikingly vulnerable to adversarial examples: for instance, word substitution attacks that use only synonyms can easily fool a BERT-based sentiment analysis model. In this paper, we demonstrate that adversarial training, the prevalent defense technique, does not directly fit a conventional fine-tuning scenario, because it suffers severely from catastrophic forgetting: it fails to retain the generic and robust linguistic features that the pre-trained model has already captured. In this light, we propose Robust Informative Fine-Tuning (RIFT), a novel adversarial fine-tuning method from an information-theoretical perspective. In particular, RIFT encourages an objective model to retain the features learned from the pre-trained model throughout the entire fine-tuning process, whereas a conventional method uses the pre-trained weights only for initialization. Experimental results show that RIFT consistently outperforms state-of-the-art methods on two popular NLP tasks, sentiment analysis and natural language inference, under different attacks and across various pre-trained language models.
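The idea of retaining pre-trained features during the whole fine-tuning run can be illustrated with a small sketch. The abstract does not give the RIFT objective, so everything below is an assumption made for illustration: the feature-retention term is an InfoNCE contrastive loss (a standard lower bound on mutual information), the names `info_nce`, `robust_finetune_loss`, `alpha`, and `temperature` are hypothetical, and the inputs are toy vectors rather than outputs of a real language model. It is not the authors' implementation.

```python
import math


def dot(u, v):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cross_entropy(logits, label):
    """Standard softmax cross-entropy for a single example."""
    return log_sum_exp(logits) - logits[label]


def info_nce(tuned_feats, pretrained_feats, temperature=0.1):
    """InfoNCE loss between fine-tuned and frozen pre-trained features.

    Each fine-tuned feature must pick out the pre-trained feature of the
    same input among all inputs in the batch. Minimizing this loss raises
    a lower bound on the mutual information between the two feature sets.
    (Assumed stand-in for an information-theoretic retention term.)
    """
    n = len(tuned_feats)
    total = 0.0
    for i in range(n):
        scores = [dot(tuned_feats[i], pretrained_feats[j]) / temperature
                  for j in range(n)]
        total += log_sum_exp(scores) - scores[i]
    return total / n


def robust_finetune_loss(task_logits, labels, tuned_feats,
                         pretrained_feats, alpha=0.5):
    """Task loss (computed on adversarial inputs in practice) plus a
    retention term weighted by the hypothetical coefficient alpha."""
    task = sum(cross_entropy(l, y)
               for l, y in zip(task_logits, labels)) / len(labels)
    return task + alpha * info_nce(tuned_feats, pretrained_feats)


# Toy check: features that still match the pre-trained ones incur a
# smaller retention penalty than features that have drifted away.
pretrained = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]
drifted = [[0.0, 1.0], [1.0, 0.0]]
assert info_nce(aligned, pretrained) < info_nce(drifted, pretrained)
```

In this sketch the retention term is applied at every training step, which mirrors the contrast drawn in the abstract with conventional fine-tuning, where the pre-trained model contributes only the initial weights.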

Topics & Keywords

Computer Science

Authors (5)

Xinshuai Dong

Anh Tuan Luu

Min Lin

Shuicheng Yan

Hanwang Zhang

Citation Format

Dong, X., Luu, A. T., Lin, M., Yan, S., & Zhang, H. (2021). How Should Pre-Trained Language Models Be Fine-Tuned Towards Adversarial Robustness? https://www.semanticscholar.org/paper/a4f533f2b7d77b667e1f05b210924ec7c90cc5d1

Journal Information

Publication Year: 2021
Language: en
Total Citations: 75
Source Database: Semantic Scholar
Access: Open Access ✓