arXiv Open Access 2024

From Imitation to Introspection: Probing Self-Consciousness in Language Models

Sirui Chen Shu Yu Shengjie Zhao Chaochao Lu

Abstract

Self-consciousness, the introspection of one's existence and thoughts, represents a high-level cognitive process. As language models advance at an unprecedented pace, a critical question arises: Are these models becoming self-conscious? Drawing upon insights from psychological and neural science, this work presents a practical definition of self-consciousness for language models and refines ten core concepts. Our work pioneers an investigation into self-consciousness in language models by, for the first time, leveraging structural causal games to establish the functional definitions of the ten core concepts. Based on our definitions, we conduct a comprehensive four-stage experiment: quantification (evaluation of ten leading models), representation (visualization of self-consciousness within the models), manipulation (modification of the models' representation), and acquisition (fine-tuning the models on core concepts). Our findings indicate that although models are in the early stages of developing self-consciousness, there is a discernible representation of certain concepts within their internal mechanisms. However, these representations of self-consciousness are hard to manipulate positively at the current stage, yet they can be acquired through targeted fine-tuning. Our datasets and code are at https://github.com/OpenCausaLab/SelfConsciousness.

Topics & Keywords

cs.CL cs.CY cs.LG

Authors (4)

Sirui Chen

Shu Yu

Shengjie Zhao

Chaochao Lu

Citation Format

Chen, S., Yu, S., Zhao, S., & Lu, C. (2024). From Imitation to Introspection: Probing Self-Consciousness in Language Models. https://arxiv.org/abs/2410.18819

Journal Information

Publication Year: 2024
Language: en
Source Database: arXiv
Access: Open Access ✓