Bayesian structural equation modelling (BSEM) offers many advantages, such as principled uncertainty quantification, small-sample regularisation, and flexible model specification. However, the Markov chain Monte Carlo (MCMC) methods on which it relies are computationally prohibitive for the iterative cycle of specification, criticism, and refinement that careful psychometric practice demands. We present INLAvaan, an R package for fast, approximate Bayesian SEM built around the Integrated Nested Laplace Approximation (INLA) framework for structural equation models developed by Jamil & Rue (2026, arXiv:2603.25690 [stat.ME]). This paper serves as a companion manuscript that describes the architectural decisions and computational strategies underlying the package. Two substantive applications -- a 256-parameter bifactor circumplex model and a multilevel mediation model with full-information missing-data handling -- demonstrate the approach on specifications where MCMC would require hours of run time and careful convergence work. In contrast, INLAvaan delivers calibrated posterior summaries in seconds.
We show that the activation knot of a potentially non-stationary regressor on the adaptive Lasso solution path in autoregressions can be leveraged for selection-free inference about a unit root. The resulting test has asymptotic power against local alternatives in $1/T$ neighbourhoods, unlike post-selection inference methods based on consistent model selection. Exploiting the information enrichment principle devised by Reinschlüssel and Arnold (arXiv:2402.16580 [stat.ME]) to improve the Lasso-based selection of ADF models, we propose a composite statistic and analyse its asymptotic distribution and local power function. Monte Carlo evidence shows that the combined test dominates the comparable post-selection inference methods of Tibshirani et al. [JASA, 2016, 514, 600-620] and may surpass the power of established unit root tests against local alternatives. We apply the new tests to groundwater level time series for Germany and find evidence against stochastic trends as an explanation for the observed long-term declines in mean water levels.
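As background for the unit root setting, the simplest Dickey-Fuller regression can be sketched in a few lines. This is not the adaptive Lasso test described above; it only illustrates the ADF-type regression in which the lagged level is the potentially non-stationary regressor, together with a local alternative of the form $\phi = 1 - c/T$. The function names, the constant `c`, and the simulated data are assumptions made for illustration.

```python
import math
import random


def dickey_fuller_t(y):
    """t-statistic for rho in dy_t = rho * y_{t-1} + e_t (no constant, no lagged differences)."""
    lagged = y[:-1]
    diffs = [b - a for a, b in zip(y[:-1], y[1:])]
    sxx = sum(x * x for x in lagged)
    sxy = sum(x * d for x, d in zip(lagged, diffs))
    rho_hat = sxy / sxx  # OLS slope through the origin
    resid = [d - rho_hat * x for x, d in zip(lagged, diffs)]
    sigma2 = sum(r * r for r in resid) / (len(diffs) - 1)  # residual variance
    return rho_hat / math.sqrt(sigma2 / sxx)


def simulate_ar1(phi, n, seed):
    """Simulate y_t = phi * y_{t-1} + e_t with standard normal shocks and y_0 = 0."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n):
        y.append(phi * y[-1] + rng.gauss(0.0, 1.0))
    return y


if __name__ == "__main__":
    T = 500
    c = 5.0  # hypothetical localising constant
    # Unit root (phi = 1) versus a local alternative in a 1/T neighbourhood.
    print(dickey_fuller_t(simulate_ar1(1.0, T, seed=1)))
    print(dickey_fuller_t(simulate_ar1(1.0 - c / T, T, seed=1)))
```

Under the null the statistic follows a non-standard Dickey-Fuller distribution, so it must be compared with tabulated or simulated critical values rather than normal quantiles.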
Terrie Vasilopoulos, Amy Crisp, Gerard Garvan
et al.
Academic research productivity relies upon the contribution of statisticians, who are typically clustered in statistics and biostatistics departments, isolated from clinical researchers. Most academic health centres have created consultation hubs or research incubators to make statisticians available for individual collaboration to support the clinical research enterprise. Additionally, some clinical departments within academic health centres have recognized the value in colocating statisticians within their clinical departments to improve availability for collaboration with physicians/researchers. Embedded statisticians encounter the same challenges as isolated statisticians regarding professional support and networking, mentorship and clear role expectations. While for all collaborative statisticians, it is important to effectively communicate value to both collaborators and supervisors, this may be especially problematic for embedded statisticians in clinical departments where their supervisors may not have backgrounds in research or statistics. Previous papers have reported valuable metrics for statisticians, particularly those associated with Biostatistics, Epidemiology and Research Design Cores. There is a knowledge gap regarding metrics tailored to meet the needs of the embedded statistician and clinical supervisors. This paper is a first step towards addressing this important need. In this paper, we explore (1) the critical role of collaborative statisticians and the benefits and challenges of the embedded statistician model, (2) the need for additional metrics specific to embedded statisticians which measure value and (3) how to design a value report. We offer a framework for evaluation of the contributions of the embedded statistician with the following domains: (1) collaboration, (2) research output/productivity, (3) mentoring and (4) education.
Metrics that are particularly specific to embedded statisticians and that are not routinely captured include time from project initiation to completion/outcome, time from initial statistical consultation to statistical outcome completion and summary of level of contribution for manuscripts and presentations in addition to author order. We conclude with thoughts on future directions for development of metrics and reporting systems for statisticians embedded in clinical departments.
arXiv:2206.10812v1 [stat.ME] proposes a useful algorithm, named the generalized Diversity Subsampling (g-DS) algorithm, to select a subsample following some target probability distribution from a finite data set and demonstrates its effectiveness numerically. While the asymptotic performance of g-DS when the true data distribution is known was discussed in arXiv:2206.10812v1 [stat.ME], it remains an interesting question how the estimation errors in the density estimation step, which is unavoidable when applying g-DS to real-world data sets, influence its asymptotic performance. In this paper, we study the pointwise convergence rate of the probability density function (p.d.f.) of the g-DS subsample to the target p.d.f. value, as the data set size approaches infinity, taking into account the pointwise bias and variance of the estimated data p.d.f.
We study community detection in the contextual stochastic block model arXiv:1807.09596 [cs.SI], arXiv:1607.02675 [stat.ME]. In arXiv:1807.09596 [cs.SI], the second author studied this problem in the setting of sparse graphs with high-dimensional node-covariates. Using the non-rigorous cavity method from statistical physics, they conjectured the sharp limits for community detection in this setting. Further, the information theoretic threshold was verified, assuming that the average degree of the observed graph is large. It is expected that the conjecture holds as soon as the average degree exceeds one, so that the graph has a giant component. We establish this conjecture, and characterize the sharp threshold for detection and weak recovery.
Emil A Stoltenberg, Hedvig ME Nordeng, Eivind Ystrom
et al.
In the statistical literature, the class of survival analysis models known as cure models has received much attention in recent years. Cure models seem not, however, to be part of the statistical toolbox of perinatal epidemiologists. In this paper, we demonstrate that in perinatal epidemiological studies where one investigates the relation between a gestational exposure and a condition that can only be ascertained after several years, cure models may provide the correct statistical framework. The reason for this is that the hypotheses being tested often concern an unobservable outcome that, in view of the hypothesis, should be thought of as occurring at birth, even though it is only detectable much later in life. The outcome of interest can therefore be viewed as a censored binary variable. We illustrate our argument with a simple cure model analysis of the possible relation between gestational exposure to paracetamol and attention-deficit hyperactivity disorder, using data from the Norwegian Mother, Father and Child Cohort Study conducted by the Norwegian Institute of Public Health, and information about the attention-deficit hyperactivity disorder diagnoses obtained from the Norwegian Patient Registry.
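For readers unfamiliar with cure models, the generic mixture cure formulation can be written as below. This is a textbook sketch, not necessarily the specification fitted in the paper; the symbols $\pi(x)$, $S_u$, $f_u$, $\beta_0$, $\beta_1$, $t_i$ and $\delta_i$ are notation assumed here.

```latex
\begin{align*}
% Probability that a child with gestational exposure x has the condition
% (the "uncured" fraction), modelled here with a logistic link.
\pi(x) &= \frac{\exp(\beta_0 + \beta_1 x)}{1 + \exp(\beta_0 + \beta_1 x)}, \\
% Population survival function: those without the condition are never diagnosed;
% S_u(t) is the survival function of time to diagnosis among those with it.
S(t \mid x) &= 1 - \pi(x) + \pi(x)\, S_u(t), \\
% Likelihood: delta_i = 1 if a diagnosis is observed at t_i,
% delta_i = 0 if child i is censored at t_i.
L &= \prod_{i=1}^{n}
     \bigl\{ \pi(x_i)\, f_u(t_i) \bigr\}^{\delta_i}
     \bigl\{ 1 - \pi(x_i) + \pi(x_i)\, S_u(t_i) \bigr\}^{1 - \delta_i}.
\end{align*}
```

In this formulation the exposure effect $\beta_1$ acts on the binary outcome itself, while $S_u$ only describes how long the condition takes to be detected, which matches the view of the outcome as a censored binary variable.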
agtboost is an R package implementing fast gradient tree boosting computations in a manner similar to other established frameworks such as xgboost and LightGBM, but with significant decreases in computation time and required mathematical and technical knowledge. The package automatically takes care of split/no-split decisions and selects the number of trees in the gradient tree boosting ensemble, i.e., agtboost adapts the complexity of the ensemble automatically to the information in the data. All of this is done during a single training run, which is made possible by utilizing developments in information theory for tree algorithms (arXiv:2008.05926v1 [stat.ME]). agtboost also comes with a feature importance function that eliminates the common practice of inserting noise features. Further, a useful model validation function performs the Kolmogorov-Smirnov test on the learned distribution.
Diffusion over a network refers to the phenomenon of a change of state of a cross-sectional unit in one period leading to a change of state of its neighbors in the network in the next period. One may estimate or test for diffusion by estimating a cross-sectionally aggregated correlation between neighbors over time from data. However, the estimated diffusion can be misleading if the diffusion is confounded by omitted covariates. This paper focuses on the measure of diffusion proposed by He and Song (2022, Preprint, arXiv:1812.04195v4 [stat.ME]), provides a method of decomposition analysis to measure the role of the covariates on the estimated diffusion, and develops an asymptotic inference procedure for the decomposition analysis in such a situation. This paper also presents results from a Monte Carlo study on the small sample performance of the inference procedure.
HEGY test under seasonal heterogeneity
Nan Zou* and Dimitris Politis
Department of Mathematics, University of California-San Diego, La Jolla, CA 92093
arXiv:1608.04039v1 [stat.ME] 14 Aug 2016
Abstract
Both seasonal unit roots and seasonal heterogeneity are common in seasonal data. When testing seasonal unit roots under seasonal heterogeneity, it is unclear if we can apply tests designed for seasonal homogeneous settings, for example the non-periodic HEGY test (Hylleberg, Engle, Granger, and Yoo, 1990). In this paper, the validity of both augmented HEGY test and unaugmented HEGY test is analyzed. The asymptotic null distributions of the statistics testing the single roots at 1 or −1 turn out standard and pivotal, but the asymptotic null distributions of the statistics testing any coexistence of roots at 1, −1, i, or −i are non-standard, non-pivotal, and not directly pivotable. Therefore, the HEGY tests are not directly applicable to the joint tests for the concurrence of the roots. As a remedy, we bootstrap augmented HEGY with seasonal independent and identically distributed (iid) bootstrap, and unaugmented HEGY with seasonal block bootstrap. The consistency of both bootstrap procedures is established. Simulations indicate that for roots at 1 and −1 seasonal iid bootstrap augmented HEGY test prevails, but for roots at ±i seasonal block bootstrap unaugmented HEGY test enjoys better performance.
Keywords: Seasonality, Unit root, AR sieve bootstrap, Block bootstrap, Functional central limit theorem.
Introduction
Seasonal unit roots and seasonal heterogeneity often coexist in seasonal data, hence the importance to design seasonal unit root tests that allow for seasonal heterogeneity. In particular, given the following heterogeneous quarterly data $\{Y_{4t+s} : t = 1, \ldots, T,\ s = -3, \ldots, 0\}$ (see also Ghysels and Osborn, 2001, and Franses and Paap, 2004), generated by $\alpha_s(L) Y_{4t+s} = V_{4t+s}$. Suppose $V_t = (V_{4t-3}, \ldots, V_{4t})'$ is a weakly stationary vector-valued process. Suppose for all $s = -3, \ldots, 0$, the roots of $\alpha_s(L)$ are on or outside the unit circle. If for some $s$, the roots of $\alpha_s(L)$ are all outside the unit circle, suppose the data are a stretch of a process $\{Y_{4t+s},\ t = 1, 2, \ldots,\ s = -3, \ldots, 0\}$; otherwise, suppose $Y_{-3} = Y_{-2} = Y_{-1} = Y_0 = 0$, all $\alpha_s(L)$ share the same set of roots on the unit circle, and this set of roots on the unit circle is a subset of $\{1, -1, \pm i\}$. We aim to test if all $\alpha_s(L)$ share roots at $1$, $-1$, or $\pm i$. To address this task, Franses (1994) and Boswijk, Franses, and Haldrup (1997) limit their scope to finite order seasonal AutoRegressive (AR) data, and apply Johansen's method (1988) to seasonal unit root tests in seasonal heterogeneous setting. However,
* Corresponding author. Email address: nzou@ucsd.edu.
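As a reference point for the non-periodic HEGY test mentioned above, the standard quarterly HEGY transformations of Hylleberg, Engle, Granger, and Yoo (1990) can be sketched as follows. This is a minimal sketch of the unaugmented regression variables only: deterministic terms, augmentation lags, and the bootstrap procedures are omitted, and the function name and tuple layout are assumptions made for illustration.

```python
def hegy_regressors(y):
    """Build rows (y4_t, y1_{t-1}, y2_{t-1}, y3_{t-2}, y3_{t-1}) from quarterly data y.

    With lag operator L, the standard HEGY filters are
        y1_t = (1 + L + L^2 + L^3) y_t    (isolates the root at  1)
        y2_t = -(1 - L + L^2 - L^3) y_t   (isolates the root at -1)
        y3_t = -(1 - L^2) y_t             (isolates the roots at +i and -i)
        y4_t = (1 - L^4) y_t              (seasonal difference, the regressand)
    and the unaugmented test regression is
        y4_t = pi1*y1_{t-1} + pi2*y2_{t-1} + pi3*y3_{t-2} + pi4*y3_{t-1} + e_t.
    """
    n = len(y)
    y1 = {t: y[t] + y[t - 1] + y[t - 2] + y[t - 3] for t in range(3, n)}
    y2 = {t: -(y[t] - y[t - 1] + y[t - 2] - y[t - 3]) for t in range(3, n)}
    y3 = {t: -(y[t] - y[t - 2]) for t in range(2, n)}
    rows = []
    for t in range(4, n):
        y4 = y[t] - y[t - 4]
        rows.append((y4, y1[t - 1], y2[t - 1], y3[t - 2], y3[t - 1]))
    return rows


if __name__ == "__main__":
    # A purely periodic series: the seasonal difference y4_t is zero throughout.
    series = [1, 2, 3, 4] * 3
    for row in hegy_regressors(series):
        print(row)
```

In the homogeneous setting, a t-test on pi1 addresses the root at 1, a t-test on pi2 the root at −1, and a joint test on (pi3, pi4) the roots at ±i; the rows returned here can be passed to any least-squares routine.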
Measures of linear dependence (coherence) and nonlinear dependence (phase synchronization) between any number of multivariate time series are defined. The measures are expressed as the sum of lagged dependence and instantaneous dependence. The measures are non-negative, and take the value zero only when there is independence of the pertinent type. These measures are defined in the frequency domain and are applicable to stationary and non-stationary time series. These new results extend and refine significantly those presented in a previous technical report (Pascual-Marqui 2007, arXiv:0706.1776 [stat.ME]), and have been largely motivated by the seminal paper on linear feedback by Geweke (1982 JASA 77:304-313). One important field of application is neurophysiology, where the time series consist of electric neuronal activity at several brain locations. Coherence and phase synchronization are interpreted as "connectivity" between locations. However, any measure of dependence is highly contaminated with an instantaneous, non-physiological contribution due to volume conduction and low spatial resolution. The new techniques remove this confounding factor considerably. Moreover, the measures of dependence can be applied to any number of brain areas jointly, i.e. distributed cortical networks, whose activity can be estimated with eLORETA (Pascual-Marqui 2007, arXiv:0710.3341 [math-ph]).