Semantic Scholar · Open Access · 2019 · 597 citations

SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization

Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, +1 more

Abstract

Transfer learning has fundamentally changed the landscape of natural language processing (NLP). Many state-of-the-art models are first pre-trained on a large text corpus and then fine-tuned on downstream tasks. However, due to limited data resources from downstream tasks and the extremely high complexity of pre-trained models, aggressive fine-tuning often causes the fine-tuned model to overfit the training data of downstream tasks and fail to generalize to unseen data. To address such an issue in a principled manner, we propose a new learning framework for robust and efficient fine-tuning for pre-trained models to attain better generalization performance. The proposed framework contains two important ingredients: 1. Smoothness-inducing regularization, which effectively manages the complexity of the model; 2. Bregman proximal point optimization, which is an instance of trust-region methods and can prevent aggressive updating. Our experiments show that the proposed framework achieves new state-of-the-art performance on a number of NLP tasks including GLUE, SNLI, SciTail and ANLI. Moreover, it also outperforms the state-of-the-art T5 model, which is the largest pre-trained model containing 11 billion parameters, on GLUE.
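The two ingredients named in the abstract can be sketched numerically. Below is a minimal, illustrative toy: a linear-softmax classifier stands in for the fine-tuned encoder, the smoothness penalty compares model outputs before and after a small input perturbation via symmetric KL divergence, and the Bregman proximal term compares the current model's outputs to those of the previous iterate. This is not the paper's implementation — in particular, SMART finds the perturbation adversarially with projected gradient ascent, whereas this sketch approximates the inner maximization with a few random draws inside the epsilon-ball; all function and variable names here are illustrative.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def predict(W, x):
    """Toy stand-in for f(x; theta): a single linear layer + softmax."""
    return softmax(x @ W)

def sym_kl(p, q, eps=1e-12):
    """Symmetric KL divergence between output distributions
    (the divergence SMART uses for classification tasks)."""
    p, q = p + eps, q + eps
    return np.sum(p * np.log(p / q) + q * np.log(q / p), axis=-1)

def smoothness_penalty(W, x, radius=1e-3, n_draws=8, rng=None):
    """R_s(theta): worst-case change in model output under a small
    perturbation of the input. SMART solves this inner maximization
    adversarially; here it is approximated by the worst of a few
    random perturbations on the epsilon-sphere (a rough sketch)."""
    rng = rng or np.random.default_rng(0)
    p = predict(W, x)
    worst = np.zeros(x.shape[0])
    for _ in range(n_draws):
        delta = rng.normal(size=x.shape)
        delta *= radius / (np.linalg.norm(delta, axis=-1, keepdims=True) + 1e-12)
        worst = np.maximum(worst, sym_kl(p, predict(W, x + delta)))
    return worst.mean()

def bregman_penalty(W, W_prev, x):
    """D_Breg(theta, theta_prev): penalizes output-space drift from the
    previous iterate, so each update stays inside a trust region."""
    return sym_kl(predict(W, x), predict(W_prev, x)).mean()
```

A fine-tuning step would then minimize `task_loss + lambda_s * smoothness_penalty(...) + mu * bregman_penalty(...)`, where `lambda_s` and `mu` trade off fit against smoothness and against aggressive updating, respectively.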

Authors (6)

Haoming Jiang

Pengcheng He

Weizhu Chen

Xiaodong Liu

Jianfeng Gao

T. Zhao

Citation Format

Jiang, H., He, P., Chen, W., Liu, X., Gao, J., & Zhao, T. (2020). SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.197

Journal Information
Year Published
2019
Language
en
Total Citations
597×
Source Database
Semantic Scholar
DOI
10.18653/v1/2020.acl-main.197
Access
Open Access ✓