arXiv Open Access 2023

AttributionLab: Faithfulness of Feature Attribution Under Controllable Environments

Yang Zhang, Yawei Li, Hannah Brown, Mina Rezaei, Bernd Bischl, and 3 others

Abstract

Feature attribution explains neural network outputs by identifying relevant input features. The attribution has to be faithful, meaning that the attributed features must mirror the input features that actually influence the output. One recent trend for testing faithfulness is to fit a model on designed data with known relevant features and then compare attributions with ground-truth input features. This idea assumes that the model learns to use all and only these designed features, for which there is no guarantee. In this paper, we solve this issue by designing the network and manually setting its weights, along with designing data. The setup, AttributionLab, serves as a sanity check for faithfulness: if an attribution method is not faithful in a controlled environment, it can be unreliable in the wild. The environment also serves as a laboratory for controlled experiments through which we can analyze attribution methods and suggest improvements.
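To illustrate the core idea, here is a minimal sketch (not the paper's actual setup): a tiny linear "network" whose weights are set by hand, so the ground-truth relevant features are known by construction, and a simple Gradient × Input attribution is checked against them. All names and values below are illustrative assumptions.

```python
import numpy as np

# Designed ground truth: which input features are relevant, by construction.
relevant = np.array([True, True, False, False, True, False])

# Manually set weights: only designed-relevant features influence the output.
w = np.where(relevant, 1.0, 0.0)

def model(x):
    """Linear 'network' with hand-set weights."""
    return w @ x

def grad_times_input(x):
    """Gradient x Input attribution; for a linear model the gradient is w."""
    return w * x

# A fixed test input (nonzero in every feature).
x = np.array([0.5, -1.2, 0.3, 0.7, 2.0, -0.4])
attr = grad_times_input(x)

# Faithfulness sanity check: attributed features must match the ground truth.
identified = np.abs(attr) > 1e-8
assert np.array_equal(identified, relevant)
```

Because the weights are set by hand rather than learned, there is no ambiguity about which features the model uses, so any mismatch between `identified` and `relevant` is attributable to the attribution method itself.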


Authors (8)

Yang Zhang
Yawei Li
Hannah Brown
Mina Rezaei
Bernd Bischl
Philip Torr
Ashkan Khakzar
Kenji Kawaguchi

Citation

Zhang, Y., Li, Y., Brown, H., Rezaei, M., Bischl, B., Torr, P. et al. (2023). AttributionLab: Faithfulness of Feature Attribution Under Controllable Environments. https://arxiv.org/abs/2310.06514

Publication Information
Year: 2023
Language: en
Source Database: arXiv
Access: Open Access ✓