Semantic Scholar Open Access 2020 858 citations

Aligning AI With Shared Human Values

Dan Hendrycks, Collin Burns, Steven Basart, Andrew Critch, J. Li, +2 others

Abstract

We show how to assess a language model's knowledge of basic concepts of morality. We introduce the ETHICS dataset, a new benchmark that spans concepts in justice, well-being, duties, virtues, and commonsense morality. Models predict widespread moral judgments about diverse text scenarios. This requires connecting physical and social world knowledge to value judgements, a capability that may enable us to steer chatbot outputs or eventually regularize open-ended reinforcement learning agents. With the ETHICS dataset, we find that current language models have a promising but incomplete understanding of basic ethical knowledge. Our work shows that progress can be made on machine ethics today, and it provides a steppingstone toward AI that is aligned with human values.

Topics & Keywords

Computer Science

Authors (7)

Dan Hendrycks

Collin Burns

Steven Basart

Andrew Critch

J. Li

D. Song

J. Steinhardt

Citation Format

Hendrycks, D., Burns, C., Basart, S., Critch, A., Li, J., Song, D. et al. (2020). Aligning AI With Shared Human Values. https://www.semanticscholar.org/paper/65906e6027246ae9e4ecd18d6e019a24505c842e

Journal Information

Publication Year: 2020
Language: en
Total Citations: 858
Source Database: Semantic Scholar
Access: Open Access ✓