We show how state-of-the-art large language models (LLMs), seemingly inapplicable to the small samples typical of macroeconomics, can be trained effectively for macroeconomic forecasting. We estimate a dynamic stochastic general equilibrium (DSGE) model on an initial segment of the data to obtain a posterior distribution over structural parameters. We sample from this posterior to generate millions of theory-consistent synthetic panels that, when mixed with actual macroeconomic data, form the training corpus for a time-series transformer with attention. The trained model is then used to forecast out-of-sample through 2025. The results show that this hybrid forecaster, which combines the theoretical coherence of DSGE models with the representational power of modern LLMs, learns key features of the macroeconomic language.
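The synthetic-panel step described above can be illustrated with a toy sketch. Here a uniform distribution stands in for the DSGE posterior and an AR(1) stands in for the model's law of motion; both are illustrative assumptions, not the paper's actual specification:

```python
import numpy as np

rng = np.random.default_rng(4)

# Stand-in "posterior" over a single persistence parameter; the real
# pipeline would use draws from the estimated DSGE posterior instead.
posterior_rho = rng.uniform(0.6, 0.9, size=1000)

def simulate_panel(rho, T=100):
    """Simulate one theory-consistent series under parameter draw rho."""
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = rho * y[t - 1] + rng.normal()
    return y

# Each posterior draw yields one synthetic training series.
corpus = np.stack([simulate_panel(r) for r in posterior_rho[:50]])
```

In the paper's setting, such synthetic panels are mixed with actual macroeconomic data before training the transformer.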
Matrix completion estimators are employed in causal panel data models to regulate the rank of the underlying factor model using nuclear norm minimization. This convex optimization problem enables concurrent regularization of a potentially high-dimensional set of covariates to shrink the model size. For valid finite sample inference, we adopt a permutation-based approach and prove its validity for any treatment assignment mechanism. Simulations illustrate the consistency of the proposed estimator in parameter estimation and variable selection. An application to public health policies in Germany demonstrates the data-driven model selection feature on empirical data and finds no effect of travel restrictions on the containment of severe COVID-19 infections.
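The nuclear-norm step behind such estimators can be sketched with singular value thresholding, the proximal operator of the nuclear norm; the iterative soft-impute scheme below is a minimal illustration of the convex program, not the paper's estimator:

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: prox operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def complete(Y, mask, tau, n_iter=200):
    """Impute missing entries of Y (mask True where observed) by iterating SVT."""
    X = np.zeros_like(Y)
    for _ in range(n_iter):
        Z = np.where(mask, Y, X)   # keep observed entries, impute the rest
        X = svt(Z, tau)
    return X

# Rank-1 panel with one missing entry, recovered up to shrinkage bias.
rng = np.random.default_rng(0)
u, v = rng.normal(size=(6, 1)), rng.normal(size=(1, 8))
Y = u @ v
mask = np.ones_like(Y, dtype=bool)
mask[2, 3] = False
X = complete(Y, mask, tau=0.05)
```

The soft-thresholding of singular values is what regulates the rank of the fitted factor structure.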
This paper extends the work of Arcidiacono and Miller (2011, 2019) by introducing a novel characterization of finite dependence within dynamic discrete choice models, demonstrating that numerous models display 2-period finite dependence. We recast finite dependence as a problem of sequentially searching for weights and introduce a computationally efficient method for determining these weights by utilizing the Kronecker product structure embedded in state transitions. With the estimated weights, we develop a computationally attractive Conditional Choice Probability estimator with 2-period finite dependence. The computational efficacy of our proposed estimator is demonstrated through Monte Carlo simulations.
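The computational gain from Kronecker-structured state transitions can be seen in a small hypothetical example: when the state splits into independent components, the joint transition matrix is the Kronecker product of the component matrices, so it never needs to be formed explicitly:

```python
import numpy as np

A = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.1, 0.2, 0.7]])   # transitions of component 1 (3 states)
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])        # transitions of component 2 (2 states)

P = np.kron(A, B)                 # 6 x 6 joint transition matrix

# The identity (A kron B) vec(X) = vec(B X A') lets one apply P through
# the small factors only (vec = column-major stacking):
v = np.arange(6, dtype=float)
X = v.reshape((2, 3), order="F")               # column-major unvec
via_factors = (B @ X @ A.T).flatten(order="F")
```

Working with the small factors rather than the full joint matrix is what makes the sequential search for finite-dependence weights cheap as the state space grows.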
This paper reviews the methodology of (modern) causal inference methods for addressing causal estimands with observational or survey data, as used in social science research. The paper is divided into two parts. The first concerns inference from a statistical estimand to the causal estimand: we review the assumptions required for causal identification and the methodological strategies available when some of these assumptions are violated. The second discusses the asymptotic analysis connecting sample measures from the observational data to their theoretical counterparts, and replicates the derivation of the efficient, doubly robust average treatment effect estimator commonly used in current social science analysis.
In theory, two-stage least squares (TSLS) identifies a weighted average of covariate-specific local average treatment effects (LATEs) from a saturated specification, without making parametric assumptions on how available covariates enter the model. In practice, TSLS is severely biased as saturation leads to a large number of control dummies and an equally large number of, arguably weak, instruments. This paper derives asymptotically valid tests and confidence intervals for the weighted average of LATEs that is targeted, yet missed by saturated TSLS. The proposed inference procedure is robust to unobserved treatment effect heterogeneity, covariates with rich support, and weak identification. We find LATEs statistically significantly different from zero in applications in criminology, finance, health, and education.
Interval identification of parameters such as average treatment effects, average partial effects and welfare is particularly common when using observational data and experimental data with imperfect compliance due to the endogeneity of individuals' treatment uptake. In this setting, the researcher is typically interested in a treatment or policy that is either selected from the estimated set of best-performers or arises from a data-dependent selection rule. In this paper, we develop new inference tools for interval-identified parameters chosen via these forms of selection. We develop three types of confidence intervals for data-dependent and interval-identified parameters, discuss how they apply to several examples of interest and prove their uniform asymptotic validity under weak assumptions.
The present paper proposes a new treatment effects estimator that is valid when the number of time periods is small and the parallel trends condition holds conditional on covariates and unobserved heterogeneity in the form of interactive fixed effects. The estimator also allows the control variables to be affected by treatment, and it enables estimation of the resulting indirect effect on the outcome variable. The asymptotic properties of the estimator are established, and its accuracy in small samples is investigated using Monte Carlo simulations. The empirical usefulness of the estimator is illustrated using as an example the effect of increased trade competition on firm markups in China.
This paper studies inference in first- and second-price sealed-bid auctions with many bidders, using an asymptotic framework where the number of bidders increases while the number of auctions remains fixed. Relevant applications include online, treasury, spectrum, and art auctions. Our approach enables asymptotically exact inference on key features such as the winner's expected utility, the seller's expected revenue, and the tail of the valuation distribution using only transaction price data. Our simulations demonstrate the accuracy of the methods in finite samples. We apply our methods to Hong Kong vehicle license auctions, focusing on high-priced, single-letter plates.
We extend nonparametric regression smoothing splines to a context where there is endogeneity and instrumental variables are available. Unlike popular existing estimators, the resulting estimator is one-step and relies on a unique regularization parameter. We derive rates of convergence for the estimator and its first derivative, which are uniform in the support of the endogenous variable. We also address the issue of imposing monotonicity in estimation and extend the approach to a partly linear model. Simulations confirm the good performance of our estimator compared to two-step procedures. Our method yields economically sensible results when used to estimate Engel curves.
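The one-step, single-penalty idea can be illustrated with a discrete smoothing-spline analogue (a Whittaker-type smoother with a second-difference penalty); this is a sketch of penalized smoothing in general, not of the paper's instrumental variable estimator:

```python
import numpy as np

def penalized_smoother(y, lam):
    """One-step fit: argmin ||y - f||^2 + lam * ||D2 f||^2,
    with a single regularization parameter lam."""
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)   # second-difference operator
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)

x = np.linspace(0, 1, 50)
rng = np.random.default_rng(1)
signal = np.sin(2 * np.pi * x)
y = signal + 0.1 * rng.normal(size=50)
f = penalized_smoother(y, lam=5.0)
```

The fit is obtained in closed form from one linear system, in contrast with two-step procedures that require a separate first-stage estimate.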
In this article, we present a method to forecast the Portuguese gross domestic product (GDP) in each current quarter (nowcasting). It combines bridge equations of the real GDP on readily available monthly data, such as the Economic Sentiment Indicator (ESI), the industrial production index, cement sales, or exports and imports, with forecasts for the ragged-edge missing values computed with the well-known Hodrick and Prescott (HP) filter. As shown, this simple multivariate approach can perform as well as a Targeted Diffusion Index (TDI) model and slightly better than the univariate Theta method in terms of out-of-sample mean errors.
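A bridge equation of the kind described above can be sketched as follows: a monthly indicator is aggregated to quarterly frequency, a regression of GDP on the aggregate is fitted, and the current quarter's indicator values are "bridged" into a nowcast. The data here are simulated and the ragged-edge filling step is omitted:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: 8 quarters of a monthly indicator and of GDP growth.
monthly = rng.normal(size=8 * 3)
quarterly_x = monthly.reshape(8, 3).mean(axis=1)   # quarterly aggregate
gdp = 0.5 + 2.0 * quarterly_x + 0.05 * rng.normal(size=8)

# Fit the bridge equation on the first 7 quarters by OLS.
X = np.column_stack([np.ones(7), quarterly_x[:7]])
beta, *_ = np.linalg.lstsq(X, gdp[:7], rcond=None)

# Bridge the current quarter's monthly data into a nowcast.
nowcast = beta[0] + beta[1] * quarterly_x[7]
```

In the paper's setting, months not yet observed within the current quarter would first be filled with HP-filter-based forecasts before aggregation.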
In this paper we study treatment assignment rules in the presence of social interaction. We construct an analytical framework under the anonymous interaction assumption, where the decision problem becomes choosing a treatment fraction. We propose a multinomial empirical success (MES) rule that includes the empirical success rule of Manski (2004) as a special case. We investigate the non-asymptotic bounds of the expected utility based on the MES rule. Finally, we prove that the MES rule achieves the asymptotic optimality with the minimax regret criterion.
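The MES rule can be sketched in a stylized simulation: the planner compares the empirical mean outcome observed under each candidate treatment fraction and picks the best one. The welfare function and sample sizes below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(3)
fractions = np.array([0.0, 0.25, 0.5, 0.75, 1.0])   # candidate fractions

def draw_outcomes(frac, n):
    """Hypothetical outcomes: welfare peaks at a treatment fraction of 0.5."""
    return 1.0 - (frac - 0.5) ** 2 + 0.1 * rng.normal(size=n)

# Observe n units under each fraction, then apply the empirical success rule.
samples = {f: draw_outcomes(f, 200) for f in fractions}
mes_choice = max(fractions, key=lambda f: samples[f].mean())
```

This generalizes the binary empirical success rule of Manski (2004) to a multinomial choice over treatment fractions.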
Uncovering the heterogeneity of causal effects of policies and business decisions at various levels of granularity provides substantial value to decision makers. This paper develops estimation and inference procedures for multiple treatment models in a selection-on-observed-variables framework by modifying the Causal Forest approach (Wager and Athey, 2018) in several dimensions. The new estimators have desirable theoretical, computational, and practical properties for various aggregation levels of the causal effects. While an Empirical Monte Carlo study suggests that they outperform previously suggested estimators, an application to the evaluation of an active labour market programme shows their value for applied research.
The widespread co-existence of misspecification and weak identification in asset pricing has led to an overstated performance of risk factors. Because the conventional Fama and MacBeth (1973) methodology is jeopardized by misspecification and weak identification, we infer risk premia by using a double robust Lagrange multiplier test that remains reliable in the presence of these two empirically relevant issues. Moreover, we show how the identification, and the resulting appropriate interpretation, of the risk premia is governed by the relative magnitudes of the misspecification J-statistic and the identification IS-statistic. We revisit several prominent empirical applications and all specifications with one to six factors from the factor zoo of Feng, Giglio, and Xiu (2020) to emphasize the widespread occurrence of misspecification and weak identification.
In this chapter, we review variance selection for time-varying parameter (TVP) models for univariate and multivariate time series within a Bayesian framework. We show how both continuous as well as discrete spike-and-slab shrinkage priors can be transferred from variable selection for regression models to variance selection for TVP models by using a non-centered parametrization. We discuss efficient MCMC estimation and provide an application to US inflation modeling.
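The key trick of the non-centered parametrization can be shown in a small simulation: writing a time-varying coefficient as beta_t = beta_0 + sqrt(theta) * btilde_t turns the state standard deviation sqrt(theta) into a regression coefficient on the standardized state path, so variable-selection priors apply to it directly. The least-squares fit below is a frequentist stand-in for the Bayesian shrinkage step:

```python
import numpy as np

rng = np.random.default_rng(5)
T = 200
btilde = np.cumsum(rng.normal(size=T))   # standardized random-walk state path
x = rng.normal(size=T)
beta0, sqrt_theta = 1.0, 0.0             # true coefficient is constant
y = (beta0 + sqrt_theta * btilde) * x + 0.1 * rng.normal(size=T)

# Non-centered form: y = beta0 * x + sqrt(theta) * (btilde * x) + eps,
# i.e. a static regression with regressors x and btilde * x.
X = np.column_stack([x, btilde * x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
```

A spike-and-slab prior on the second coefficient then shrinks sqrt(theta) exactly to zero when the parameter is in fact time-invariant, which is the variance-selection mechanism reviewed in the chapter.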
We propose confidence regions for the parameters of incomplete models with exact coverage of the true parameter in finite samples. Our confidence region inverts a test that generalizes Monte Carlo tests to incomplete models. The test statistic is a discrete analogue of a new optimal transport characterization of the sharp identified region. Both the test statistic and the critical values rely on simulated draws from the distribution of the latent variables and are computed by solving discrete optimal transport, hence linear programming, problems. We also propose a fast preliminary search in the parameter space with an alternative, more conservative yet consistent test based on a parameter-free critical value.
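The linear-programming step can be illustrated with the smallest possible discrete optimal transport problem; the two-point distributions and cost matrix below are illustrative, not from the paper:

```python
import numpy as np
from scipy.optimize import linprog

p = np.array([0.5, 0.5])          # source masses
q = np.array([0.25, 0.75])        # target masses
C = np.array([[0.0, 1.0],         # cost of moving mass from i to j
              [1.0, 0.0]])

# Variables: the 4 entries of the coupling matrix, flattened row-major.
A_eq = np.zeros((4, 4))
A_eq[0, :2] = 1                   # row sums of the coupling equal p
A_eq[1, 2:] = 1
A_eq[2, [0, 2]] = 1               # column sums of the coupling equal q
A_eq[3, [1, 3]] = 1
res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.concatenate([p, q]),
              bounds=[(0, None)] * 4)
cost = res.fun                    # minimal transport cost
```

Here the optimum moves 0.25 units of mass across at unit cost, so the minimal cost is 0.25; in the paper, problems of this form deliver both the test statistic and its simulated critical values.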
A factor copula model is proposed in which factors are either simulable or estimable from exogenous information. Point estimation and inference are based on a simulated method of moments (SMM) approach with non-overlapping simulation draws. Consistency and limiting normality of the estimator are established, and the validity of bootstrap standard errors is shown. In doing so, previous results from the literature are verified under low-level conditions imposed on the individual components of the factor structure. Monte Carlo evidence confirms the accuracy of the asymptotic theory in finite samples, and an empirical application illustrates the usefulness of the model in explaining the cross-sectional dependence between stock returns.
We develop a novel test of the instrumental variable identifying assumptions for heterogeneous treatment effect models with conditioning covariates. We assume semiparametric dependence between potential outcomes and conditioning covariates. This allows us to obtain testable equality and inequality restrictions among the subdensities of estimable partial residuals. We propose jointly testing these restrictions. To improve power, we introduce distillation, where a trimmed sample is used to test the inequality restrictions. In Monte Carlo exercises we find gains in finite sample power from testing restrictions jointly and distillation. We apply our test procedure to three instruments and reject the null for one.
Abstract: In the context of Global Value Chains (GVCs), this article aims to demonstrate the impacts of the Chinese economy on international trade and on the distribution of world value added (VA) in the 21st century. To this end, we first discuss the evolution of the literature on how to measure the value of trade in the way that best reflects the contemporary dynamics of GVCs. Second, we apply the decomposition of Borin and Mancini (2017) to data from the World Input-Output Database (WIOD) for the period 2000-2014. The article contributes to the literature by presenting the evolution of the methodology on this topic, and by demonstrating the structural changes that Chinese growth has caused in the fragmentation of production along GVCs and in the sectoral distribution of world VA. The data also align with the previous literature, which suggests a rebalancing of Chinese growth and an import-substitution process in the country.
The paper develops a Bayesian seasonally cointegrated model for quarterly data. We propose the prior structure, derive the set of full conditional posterior distributions, and propose the sampling scheme. The identification of cointegrating spaces is obtained via orthonormality restrictions imposed on the vectors spanning them. In the case of annual frequency, the cointegrating vectors are complex, which should be taken into account when identifying them. The point estimation of the cointegrating spaces is also discussed. The presented methods are illustrated by a simulation experiment and are employed in an analysis of money and prices in the Polish economy.
The properties of the maximum likelihood estimator in mixed causal and noncausal models with a generalized Student's t error process are reviewed. Several existing methods are typically not applicable in this heavy-tailed framework. To this end, a new approach to inference on causal and noncausal parameters in finite samples is proposed. It exploits the empirical variance of the generalized Student's t, which is well defined even when the population variance does not exist. Monte Carlo simulations show the good performance of the new variance construction for fat-tailed series. Finally, different existing approaches are compared using three empirical applications: the variation of daily COVID-19 deaths in Belgium, monthly wheat prices, and the monthly inflation rate in Brazil.