arXiv Open Access 2024

Data-Centric Human Preference with Rationales for Direct Preference Alignment

Hoang Anh Just, Ming Jin, Anit Sahu, Huy Phan, Ruoxi Jia

Abstract

Aligning language models with human preferences through reinforcement learning from human feedback is crucial for their safe and effective deployment. Human preference is typically represented as a pairwise comparison in which one response is chosen over another for a given prompt. However, standard preference datasets often lack explicit information on why a particular choice was made. This ambiguity can hinder efficient learning and robust alignment, especially given the high cost of acquiring extensive human annotations. While many studies focus on algorithmic improvements, this work adopts a data-centric perspective, exploring how to enhance learning from existing preference data. We propose augmenting standard preference pairs with rationales that explain the reasoning behind the human preference. Specifically, we introduce a simple and principled framework that leverages machine-generated rationales to enrich preference data for preference optimization algorithms. Our comprehensive analysis demonstrates that incorporating rationales improves learning efficiency. Extensive experiments show two advantages: rationale-augmented learning accelerates convergence and can achieve higher final model performance. Furthermore, this approach is versatile and compatible with various direct preference optimization algorithms. Our findings showcase the potential of thoughtful data design in preference learning, demonstrating that enriching existing datasets with explanatory rationales can help unlock improvements in model alignment and annotation efficiency.
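
As a rough illustration of the data-centric idea described in the abstract, the Python sketch below shows one way a standard preference pair might be extended with a rationale field. The names `PreferenceExample`, `attach_rationale`, and `dpo_loss` are hypothetical and do not come from the paper. The loss shown is the widely used standard DPO objective on a single pair, included only as the baseline that such data would feed into; it is not the paper's rationale-augmented objective, which the abstract does not specify.

```python
import math
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class PreferenceExample:
    """A standard preference pair, optionally enriched with a rationale."""
    prompt: str
    chosen: str
    rejected: str
    # Hypothetical extra field: a machine-generated explanation of why
    # `chosen` was preferred over `rejected`.
    rationale: Optional[str] = None


def attach_rationale(example: PreferenceExample, rationale: str) -> PreferenceExample:
    """Return a copy of the pair with a rationale attached (original unchanged)."""
    return replace(example, rationale=rationale.strip())


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one pair: -log(sigmoid(margin)).

    The inputs are sequence log-probabilities under the policy and a frozen
    reference model. The value of `beta` here is an assumed default.
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(m)) equals softplus(-m); use the numerically stable form.
    neg = -margin
    return max(neg, 0.0) + math.log1p(math.exp(-abs(neg)))
```

A pair would first be enriched with `attach_rationale(example, generated_text)`, where `generated_text` is assumed to come from some rationale-generating model. How the rationale then enters the training objective is left open here, since that detail is part of the paper's framework rather than of this sketch.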

Authors (5)

Hoang Anh Just
Ming Jin
Anit Sahu
Huy Phan
Ruoxi Jia

Citation Format

Just, H.A., Jin, M., Sahu, A., Phan, H., Jia, R. (2024). Data-Centric Human Preference with Rationales for Direct Preference Alignment. https://arxiv.org/abs/2407.14477

Journal Information

Publication Year: 2024
Language: en
Database Source: arXiv
Access: Open Access