arXiv Open Access 2024

Words in Motion: Extracting Interpretable Control Vectors for Motion Transformers

Omer Sahin Tas, Royden Wagner

Abstract

Transformer-based models generate hidden states that are difficult to interpret. In this work, we analyze hidden states and modify them at inference, with a focus on motion forecasting. We use linear probing to analyze whether interpretable features are embedded in hidden states. Our experiments reveal high probing accuracy, indicating latent space regularities with functionally important directions. Building on this, we use the directions between hidden states with opposing features to fit control vectors. At inference, we add our control vectors to hidden states and evaluate their impact on predictions. Remarkably, such modifications preserve the feasibility of predictions. We further refine our control vectors using sparse autoencoders (SAEs). This leads to more linear changes in predictions when scaling control vectors. Our approach enables mechanistic interpretation as well as zero-shot generalization to unseen dataset characteristics with negligible computational overhead.
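
The pipeline described in the abstract (probe hidden states, fit a control vector from states with opposing features, add it at inference) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not specify the fitting procedure, so the sketch uses a plain difference of class means, a least-squares linear probe, and synthetic Gaussian "hidden states" in place of a real motion transformer. The function names `probe_accuracy`, `fit_control_vector`, and `apply_control_vector` are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def probe_accuracy(states, labels):
    """Fit a least-squares linear probe and return its training accuracy."""
    X = np.hstack([states, np.ones((len(states), 1))])  # append a bias column
    targets = 2.0 * labels - 1.0  # map labels {0, 1} to {-1, +1}
    weights, *_ = np.linalg.lstsq(X, targets, rcond=None)
    predictions = (X @ weights) > 0
    return float(np.mean(predictions == labels.astype(bool)))


def fit_control_vector(pos_states, neg_states):
    """Assumed fitting rule: difference between the two class means."""
    return pos_states.mean(axis=0) - neg_states.mean(axis=0)


def apply_control_vector(hidden, control_vector, scale=1.0):
    """Add the scaled control vector to every hidden state (row-wise broadcast)."""
    return hidden + scale * control_vector


# Synthetic stand-in for hidden states: two groups with opposing features
# that differ along a single latent direction (an assumption for this demo).
rng = np.random.default_rng(0)
dim = 16
feature_direction = np.zeros(dim)
feature_direction[0] = 1.0

pos = rng.normal(size=(200, dim)) + 2.0 * feature_direction
neg = rng.normal(size=(200, dim)) - 2.0 * feature_direction

states = np.vstack([pos, neg])
labels = np.concatenate([np.ones(200, dtype=int), np.zeros(200, dtype=int)])

# Step 1: check whether the feature is linearly decodable.
accuracy = probe_accuracy(states, labels)

# Step 2: fit a control vector from the opposing groups.
control_vector = fit_control_vector(pos, neg)

# Step 3: steer the "negative" states toward the "positive" feature.
steered = apply_control_vector(neg, control_vector, scale=1.0)
```

With a difference-of-means vector, the shift applied to each hidden state is exactly `scale * control_vector`, so the change in the hidden state is linear in the scale by construction. Whether the model's predictions also change linearly is a separate, empirical question, which is the property the abstract reports improving through SAE-based refinement.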

Topics & Keywords

cs.LG cs.CL cs.CV

Authors (2)

Omer Sahin Tas

Royden Wagner

Citation Format

Tas, O.S., Wagner, R. (2024). Words in Motion: Extracting Interpretable Control Vectors for Motion Transformers. https://arxiv.org/abs/2406.11624

Journal Information

Publication Year: 2024
Language: en
Source Database: arXiv
Access: Open Access