arXiv Open Access 2025

Adding Alignment Control to Language Models

Wenhong Zhu Weinan Zhang Rui Wang

Abstract

Post-training alignment has increasingly become a crucial factor in enhancing the usability of language models (LMs). However, the desired strength of alignment varies with individual preferences. This paper proposes a method to incorporate alignment control into a single model, referred to as CLM. The approach adds one identity-initialized layer before the model's initial layers and performs preference learning on this layer alone, mapping unaligned input token embeddings into the aligned space. Experimental results demonstrate that this efficient fine-tuning method performs comparably to full fine-tuning. During inference, the input embeddings are processed through both the aligned and unaligned layers, whose outputs are merged via an interpolation coefficient. By controlling this coefficient, the alignment strength exhibits clear interpolation and extrapolation behavior.
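The merging step described above can be sketched in a few lines. This is a hedged illustration, not the authors' code: `aligned_map` is a hypothetical stand-in for the learned alignment layer (in CLM, an identity-initialized layer trained via preference learning), and the blend weight `alpha` plays the role of the interpolation coefficient.

```python
# Minimal sketch of CLM-style embedding interpolation (assumed names).
# alpha = 0 -> unaligned behavior; alpha = 1 -> aligned behavior;
# alpha > 1 -> extrapolation beyond the trained alignment strength.

def aligned_map(embedding):
    """Toy stand-in for the learned alignment layer.

    A real CLM layer would be a trained transformer layer; here we
    apply a fixed shift purely for illustration.
    """
    return [x + 0.5 for x in embedding]

def merge_embeddings(embedding, alpha):
    """Blend unaligned and aligned embeddings with coefficient alpha."""
    aligned = aligned_map(embedding)
    return [(1 - alpha) * u + alpha * a for u, a in zip(embedding, aligned)]

emb = [1.0, -2.0, 0.0]
print(merge_embeddings(emb, 0.0))  # unaligned:    [1.0, -2.0, 0.0]
print(merge_embeddings(emb, 1.0))  # aligned:      [1.5, -1.5, 0.5]
print(merge_embeddings(emb, 2.0))  # extrapolated: [2.0, -1.0, 1.0]
```

Because the merge is a simple convex (or, for `alpha > 1`, affine) combination, a single trained model can serve users with different preferred alignment strengths by varying one scalar at inference time.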


Citation

Zhu, W., Zhang, W., & Wang, R. (2025). Adding Alignment Control to Language Models. https://arxiv.org/abs/2503.04346

Publication Information
Year: 2025
Language: en
Source Database: arXiv
Access: Open Access ✓