arXiv Open Access 2024

Investigating Representation Universality: Case Study on Genealogical Representations

David D. Baek Yuxiao Li Max Tegmark

Abstract

Motivated by interpretability and reliability, we investigate whether large language models (LLMs) deploy universal geometric structures to encode discrete, graph-structured knowledge. To this end, we present two complementary pieces of experimental evidence that may support the universality of graph representations. First, on an in-context genealogy Q&A task, we train a cone probe to isolate a tree-like subspace in residual-stream activations and use activation patching to verify its causal role in answering related questions. We validate our findings across five different models. Second, we conduct model stitching experiments across models of diverse architectures and parameter counts (OPT, Pythia, Mistral, and LLaMA, 410 million to 8 billion parameters), quantifying representational alignment via the relative degradation in next-token prediction loss. Overall, we conclude that the lack of ground-truth representations of graphs makes it challenging to study how LLMs represent them. Ultimately, improving our understanding of LLM representations could facilitate the development of more interpretable, robust, and controllable AI systems.
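The abstract measures stitching quality by how much the next-token loss worsens. The sketch below shows one plausible way to compute such a quantity. It assumes that "relative degradation" means the fractional increase of the stitched model's mean cross-entropy over the unmodified receiving model's loss. That definition, and the helper names `mean_next_token_loss` and `relative_degradation`, are illustrative assumptions, not the authors' code or their exact formula.

```python
import math
from typing import Sequence


def mean_next_token_loss(correct_token_probs: Sequence[float]) -> float:
    """Mean cross-entropy in nats, given the probability the model assigned
    to the correct next token at each position (hypothetical helper)."""
    if not correct_token_probs:
        raise ValueError("need at least one token position")
    return -sum(math.log(p) for p in correct_token_probs) / len(correct_token_probs)


def relative_degradation(loss_stitched: float, loss_baseline: float) -> float:
    """Assumed definition: fractional loss increase caused by stitching.
    0.0 means the stitched model matches the baseline; larger is worse."""
    return (loss_stitched - loss_baseline) / loss_baseline


# Toy usage with made-up probabilities (not data from the paper).
baseline = mean_next_token_loss([0.5, 0.25])    # receiving model alone
stitched = mean_next_token_loss([0.25, 0.125])  # source layers stitched in
print(relative_degradation(stitched, baseline))
```

Under this assumed definition a well-aligned pair of models yields a value near zero. Normalizing by the baseline loss keeps comparisons meaningful across models whose absolute losses differ.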

Topics & Keywords

cs.LG cs.AI

Authors (3)

David D. Baek

Yuxiao Li

Max Tegmark

Citation

Baek, D. D., Li, Y., & Tegmark, M. (2024). Investigating Representation Universality: Case Study on Genealogical Representations. arXiv. https://arxiv.org/abs/2410.08255

Journal Information

Year Published: 2024
Language: en
Database Source: arXiv
Access: Open Access