arXiv Open Access 2025

Machine-Learning-Powered Specification Testing in Linear Instrumental Variable Models

Cyrill Scheidegger, Malte Londschien, Peter Bühlmann

Abstract

The linear instrumental variable (IV) model is widely used in observational studies, yet its validity hinges on strong assumptions. Classical specification tests such as the Sargan-Hansen J test are limited to overidentified settings and are therefore not applicable in the common just-identified case, where the number of instruments is equal to the number of endogenous variables. We propose a novel test for the well-specification of the linear IV model under the assumption that the structural error is mean independent of the instruments. This assumption enables specification testing even in the just-identified setting. Our approach uses the idea of residual prediction: if the two-stage least squares residuals can be predicted from the instruments better than chance, this indicates misspecification. The resulting test employs sample splitting and a user-chosen machine learning method, and we show asymptotic type I error control and consistency against a broad class of alternatives. We further show how the proposed testing principle can be adapted to settings with weak or many instruments via an Anderson-Rubin-type inversion, thereby substantially extending the applicability. The tests accommodate heteroskedasticity- and cluster-robust inference and are implemented in the R package RPIV and the ivmodels software package for Python.
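The residual prediction idea described in the abstract can be illustrated with a minimal sketch. It is a simplified illustration under stated assumptions, not the authors' test statistic and not the API of RPIV or ivmodels: it assumes a single endogenous regressor and a single instrument (the just-identified case), uses a cubic polynomial fit as a stand-in for the user-chosen machine learning method, and applies a heteroskedasticity-robust normalization with a one-sided normal approximation. All function names (`poly_features`, `residual_prediction_test`, `simulate`) are hypothetical.

```python
import math
import numpy as np


def poly_features(z, degree=3):
    """Polynomial basis 1, z, ..., z^degree (stand-in for any ML learner)."""
    return np.column_stack([z**d for d in range(degree + 1)])


def residual_prediction_test(y, x, z, seed=0):
    """One-sided residual prediction test, just-identified linear IV sketch."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    train, test = idx[: n // 2], idx[n // 2:]
    X = np.column_stack([np.ones(n), x])  # regressors with intercept
    Z = np.column_stack([np.ones(n), z])  # instruments with intercept

    def tsls_residuals(rows):
        # Just-identified 2SLS estimate: beta = (Z'X)^{-1} Z'y
        beta = np.linalg.solve(Z[rows].T @ X[rows], Z[rows].T @ y[rows])
        return y[rows] - X[rows] @ beta

    # Training half: learn to predict the 2SLS residuals from the instrument.
    coef = np.linalg.lstsq(
        poly_features(z[train]), tsls_residuals(train), rcond=None
    )[0]

    # Test half: check whether the learned predictions align with fresh residuals.
    w = poly_features(z[test]) @ coef
    Xt, Zt = X[test], Z[test]
    # Adjust the predictions so that they are orthogonal to the regressors;
    # this accounts for the estimation error of the 2SLS coefficients.
    w_adj = w - Zt @ np.linalg.solve(Xt.T @ Zt, Xt.T @ w)
    prod = w_adj * tsls_residuals(test)
    stat = prod.sum() / math.sqrt((prod**2).sum())
    # Large positive values mean residuals are predictable: reject for small p.
    p_value = 0.5 * math.erfc(stat / math.sqrt(2))
    return stat, p_value


def simulate(n, misspecified, seed):
    """Toy data with confounder u; optionally a direct z^2 effect on y."""
    rng = np.random.default_rng(seed)
    z, u = rng.normal(size=n), rng.normal(size=n)
    x = z + u + rng.normal(size=n)
    y = 1.0 + 2.0 * x + u + rng.normal(size=n)
    if misspecified:
        y = y + z**2  # violates mean independence of the error from z
    return y, x, z


stat_null, p_null = residual_prediction_test(*simulate(2000, False, seed=1))
stat_alt, p_alt = residual_prediction_test(*simulate(2000, True, seed=1))
print(f"well-specified: stat={stat_null:.2f}, p={p_null:.3f}")
print(f"misspecified:   stat={stat_alt:.2f}, p={p_alt:.3g}")
```

In the misspecified simulation the instrument affects the outcome through a term that the linear model cannot absorb, so the polynomial learner picks up structure in the residuals and the statistic becomes large. Sample splitting keeps the learned prediction function independent of the half on which the statistic is computed.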

Citation Format

Scheidegger, C., Londschien, M., Bühlmann, P. (2025). Machine-Learning-Powered Specification Testing in Linear Instrumental Variable Models. https://arxiv.org/abs/2506.12771

Journal Information
Year of Publication: 2025
Language: en
Database Source: arXiv
Access: Open Access