<p>Accurate forward models, particularly radiative transfer models, are essential for the assimilation of both passive and active satellite observations in modern data assimilation frameworks. The Community Radiative Transfer Model (CRTM), widely used in the assimilation of satellite observations within numerical weather prediction systems, especially in the United States, has recently been expanded to include a radar module. This study assesses the new module across multiple radar frequencies using observations from the Earth Clouds, Aerosols and Radiation Explorer Cloud Profiling Radar (EarthCARE CPR), the CloudSat CPR, and the Global Precipitation Measurement Dual-Frequency Precipitation Radar (GPM DPR). Simulated radar reflectivities were compared with the spaceborne measurements to evaluate the impacts of hydrometeor profiles, particle size distributions (PSDs), and frozen hydrometeor habits. The results indicate that both PSD selection and particle shape strongly influence the simulated reflectivities, with snow particle habits introducing differences of up to 4 <span class="inline-formula">dBZ</span> in W-band comparisons. For the GPM DPR, reflectivities simulated using the Thompson PSD showed closer agreement with the observations than those using the Abel PSD; this agreement should be interpreted in the context of the limited independence between the observations and the retrievals used as input to the CRTM, which themselves rely on PSD-related assumptions. The sensitivity of forward radar simulations to microphysical assumptions underscores their importance in the assimilation of radar observations in numerical weather prediction models.</p>
Pretraining for electroencephalogram (EEG) foundation models has predominantly relied on self-supervised masked reconstruction, a paradigm largely adapted from and inspired by the success of vision and language foundation models. However, unlike images and text, EEG datasets are notoriously expensive to collect and characterized by a low signal-to-noise ratio. These challenges make it difficult to scale EEG foundation models and to capture the underlying neural semantics through reconstruction. In this work, we ask: can we stand on the shoulders of well-established foundation models from well-represented modalities to bootstrap the pretraining of EEG foundation models? We first demonstrate that mainstream foundation models, such as those from vision and time series, transfer surprisingly well to the EEG domain. Building on this observation, we propose the Multi-Teacher Distillation Pretraining (MTDP) framework for pretraining EEG foundation models via a two-stage multi-teacher distillation. In the first stage, we introduce a learnable gating network to fuse representations from diverse teachers (e.g., DINOv3 and Chronos) via a masked latent denoising objective. In the second stage, we distill the fused representation into an EEG foundation model. Extensive evaluations across 9 downstream tasks and 12 datasets demonstrate that our MTDP-based EEG foundation model outperforms its self-supervised counterparts while requiring only 25% of the pretraining data.
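The first-stage gating idea can be illustrated with a minimal sketch. This is not the actual MTDP architecture or its masked latent denoising objective: the teachers are assumed to be already projected to a shared embedding dimension, and the gating network is reduced to a single hypothetical linear layer producing per-sample mixture weights over teachers.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_fusion(teacher_embs, gate_w):
    """Fuse per-teacher embeddings with a (toy) learnable gating network.

    teacher_embs: (n_teachers, batch, dim), teacher outputs in a shared space.
    gate_w: (n_teachers * dim, n_teachers), weights of one linear gating layer.
    Returns a per-sample convex combination of teacher embeddings, (batch, dim).
    """
    n_teachers, batch, dim = teacher_embs.shape
    # Concatenate every teacher's view of a sample and score each teacher.
    concat = teacher_embs.transpose(1, 0, 2).reshape(batch, n_teachers * dim)
    gates = softmax(concat @ gate_w)              # (batch, n_teachers)
    # Mixture: sum_t gates[b, t] * teacher_embs[t, b, :].
    return np.einsum("bt,tbd->bd", gates, teacher_embs)

rng = np.random.default_rng(0)
embs = rng.standard_normal((2, 4, 8))             # 2 teachers, batch 4, dim 8
w = rng.standard_normal((2 * 8, 2)) * 0.1
fused = gated_fusion(embs, w)
print(fused.shape)  # (4, 8)
```

In training, `gate_w` would be optimized jointly with the denoising objective so the network learns which teacher to trust per input; here it is random for illustration.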
Prior-data fitted networks (PFNs) have emerged as powerful foundation models for tabular causal inference, yet their extension to time series remains limited by the absence of synthetic data generators that provide interventional targets. Existing time series benchmarks generate observational data with ground-truth causal graphs but lack the interventional data required for training causal foundation models. To address this, we propose CausalTimePrior, a principled framework for generating synthetic temporal structural causal models (TSCMs) with paired observational and interventional time series. Our prior supports configurable causal graph structures, nonlinear autoregressive mechanisms, regime-switching dynamics, and multiple intervention types (hard, soft, time-varying). We demonstrate that PFNs trained on CausalTimePrior can perform in-context causal effect estimation on held-out TSCMs, establishing a pathway toward foundation models for time series causal inference.
Jan Tagscherer, Sarah de Boer, Lena Philipp
et al.
Developing foundation models in medical imaging requires continuous monitoring of downstream performance. Researchers are burdened with tracking numerous experiments, design choices, and their effects on performance, often relying on ad-hoc, manual workflows that are inherently slow and error-prone. We introduce EvalBlocks, a modular, plug-and-play framework for efficient evaluation of foundation models during development. Built on Snakemake, EvalBlocks supports seamless integration of new datasets, foundation models, aggregation methods, and evaluation strategies. All experiments and results are tracked centrally and are reproducible with a single command, while efficient caching and parallel execution enable scalable use on shared compute infrastructure. Demonstrated on five state-of-the-art foundation models and three medical imaging classification tasks, EvalBlocks streamlines model evaluation, enabling researchers to iterate faster and focus on model innovation rather than evaluation logistics. The framework is released as open source software at https://github.com/DIAGNijmegen/eval-blocks.
The second part of the article presents the training procedure for the artificial neural network (ANN) whose architecture was developed in a previous study. The large volume of training data (85,451 training pairs), the batch size (2000), the number of training rounds (500,000), and the depth of the ANN yielded low training and validation errors (1.52·10⁻⁶ and 1.99·10⁻⁶, respectively). Moreover, over almost the entire test set the ANN also predicted the coefficients of the optimal regulator with high accuracy; the maximum and root-mean-square prediction errors were computed to verify this.
However, some of the coefficient prediction errors cast doubt on the quality of the resulting optimal motion control. To assess this quality, the worst result in terms of prediction error was examined. This showed that the deviations of the coefficient values (at most 7.86%) do not cause a significant deviation of the "crane-cargo" system dynamics from the motion obtained with the optimal coefficients of the linear-quadratic regulator. To demonstrate this, plots of the phase portrait of the pendulum oscillations of the cargo, the control function, the driving force, and the crane travel speed were constructed and analyzed.
The article notes one of the advantages of the trained ANN: the speed with which the optimal control is obtained. It stems from the fact that querying the ANN requires far fewer computational resources than solving the Riccati equations.
The concluding part of the article gives recommendations for applying the results in practice. The ANN receives an input vector containing the normalized values of the cargo mass, the length of the flexible suspension, and the control-weight coefficient. This yields predicted values of the optimal regulator coefficients, which are then used to find the optimal control strategy. The latter, in turn, is implemented by the crane's controlled electric drive mechanisms.
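The deployment workflow (normalize the three physical inputs, query the network, read off the regulator coefficients) might be sketched as follows. The operating ranges, the network weights, and the four-coefficient output are illustrative assumptions, not the trained ANN or the system parameters from the article.

```python
import numpy as np

def normalize(x, lo, hi):
    """Map a physical value into [0, 1] given its assumed operating range."""
    return (x - lo) / (hi - lo)

def mlp_forward(x, layers):
    """Plain feed-forward pass: linear layers with tanh between them."""
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.tanh(x)
    return x

# Hypothetical operating ranges for cargo mass (kg), suspension length (m),
# and the control-weight coefficient of the LQR cost functional.
ranges = {"mass": (100.0, 5000.0), "length": (1.0, 20.0), "weight": (0.01, 10.0)}

# Random stand-in weights; in practice these come from the trained ANN.
rng = np.random.default_rng(0)
layers = [(rng.standard_normal((3, 16)) * 0.3, np.zeros(16)),
          (rng.standard_normal((16, 4)) * 0.3, np.zeros(4))]

x = np.array([[normalize(1200.0, *ranges["mass"]),
               normalize(8.5, *ranges["length"]),
               normalize(0.5, *ranges["weight"])]])
gains = mlp_forward(x, layers)   # predicted regulator coefficients k1..k4
print(gains.shape)  # (1, 4)
```

The predicted gains would then parameterize the state-feedback control law executed by the crane's drive controllers, in place of solving the Riccati equations online.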
Anthony M. Barrett, Jessica Newman, Brandie Nonnecke
et al.
Increasingly multi-purpose AI models, such as cutting-edge large language models or other 'general-purpose AI' (GPAI) models, 'foundation models,' generative AI models, and 'frontier models' (typically all referred to hereafter with the umbrella term 'GPAI/foundation models' except where greater specificity is needed), can provide many beneficial capabilities, but they also pose risks of adverse events with profound consequences. This document provides risk-management practices or controls for identifying, analyzing, and mitigating risks of GPAI/foundation models. We intend this document primarily for developers of large-scale, state-of-the-art GPAI/foundation models; others who can benefit from this guidance include downstream developers of end-use applications that build on a GPAI/foundation model. This document facilitates conformity with or use of leading AI risk-management standards, adapting and building on the generic voluntary guidance in the NIST AI Risk Management Framework and ISO/IEC 23894, with a focus on the unique issues faced by developers of GPAI/foundation models.
This paper introduces function alignment, a novel theory of mind and intelligence that is both intuitively compelling and structurally grounded. It explicitly models how meaning, interpretation, and analogy emerge from interactions among layered representations, forming a coherent framework capable not only of modeling minds but also of serving as a blueprint for building them. One of the key theoretical insights derived from function alignment is bounded interpretability, which provides a unified explanation for previously fragmented ideas in cognitive science, such as bounded rationality, symbol grounding, and analogy-making. Beyond modeling, the function alignment framework bridges disciplines often kept apart, linking computational architecture, psychological theory, and even contemplative traditions such as Zen. Rather than building on any philosophical systems, it offers a structural foundation upon which multiple ways of understanding the mind may be reconstructed.
As industrial products become abundant and sophisticated, visual industrial defect detection has received much attention, spanning both two-dimensional and three-dimensional visual feature modeling. Traditional methods use statistical analysis, abnormal-data synthesis modeling, and generation-based models to separate product defect features and perform defect detection. Recently, the emergence of foundation models has brought visual and textual semantic prior knowledge. Many methods build on foundation models (FM) to improve detection accuracy, but at the same time they increase model complexity and slow inference. Some FM-based methods have begun to explore lightweight modeling approaches, which have gradually attracted attention and deserve systematic analysis. In this paper, we conduct a systematic survey with comparisons and discussions of foundation model methods from different aspects and briefly review recently published non-foundation model (NFM) methods. Furthermore, we discuss the differences between FM and NFM methods in terms of training objectives, model structure and scale, and model performance, and we outline potential directions for future exploration. Through this comparison, we find that FM methods are better suited to few-shot and zero-shot learning, which matches actual industrial application scenarios and merits in-depth research.
Graphs have emerged as an important foundation for a variety of applications, including capturing and reasoning over factual knowledge, semantic data integration, social networks, and providing factual knowledge for machine learning algorithms. To formalise certain properties of the data and to ensure data quality, there is a need to describe the schema of such graphs. Because of the breadth of applications and availability of different data models, such as RDF and property graphs, both the Semantic Web and the database community have independently developed graph schema languages: SHACL, ShEx, and PG-Schema. Each language has its unique approach to defining constraints and validating graph data, leaving potential users in the dark about their commonalities and differences. In this paper, we provide formal, concise definitions of the core components of each of these schema languages. We employ a uniform framework to facilitate a comprehensive comparison between the languages and identify a common set of functionalities, shedding light on both overlapping and distinctive features of the three languages.
Ana Trišović, Alex Fogelson, Janakan Sivaloganathan
et al.
We present the first large-scale analysis of AI foundation model usage in science - not just citations or keywords. We find that adoption has grown rapidly, at nearly exponential rates, with the highest uptake in Linguistics, Computer Science, and Engineering. Vision models are the most used foundation models in science, although language models' share is growing. Open-weight models dominate. As AI builders increase the parameter counts of their models, scientists have followed suit, but at a much slower rate: in 2013, the median foundation model built was 7.7x larger than the median one adopted in science; by 2024 this gap had jumped to 26x. We also present suggestive evidence that scientists' use of these smaller models may be limiting the full benefits of AI-enabled science, as papers that use larger models appear in higher-impact journals and accrue more citations.
Time series foundation models excel at diverse time series forecasting tasks, but their capacity for continuous improvement through incremental learning remains unexplored. We present the first comprehensive study investigating these models' temporal plasticity - their ability to progressively enhance performance through continual learning while maintaining existing capabilities. Through experiments on real-world datasets exhibiting distribution shifts, we evaluate both conventional deep learning models and foundation models using a novel continual learning framework. Our findings reveal that while traditional models struggle with performance deterioration during incremental fine-tuning, foundation models like Time-MoE and Chronos demonstrate sustained improvement in predictive accuracy. This suggests that optimizing foundation model fine-tuning strategies may be more valuable than developing domain-specific small models. Our research introduces new evaluation methodologies and insights for developing foundation time series models with robust continuous learning capabilities.
Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar
et al.
In this work, we introduce the task of singing voice deepfake source attribution (SVDSA). We hypothesize that multimodal foundation models (MMFMs), such as ImageBind and LanguageBind, will be most effective for SVDSA, as their cross-modality pre-training better equips them to capture subtle source-specific characteristics, such as the unique timbre, pitch manipulation, or synthesis artifacts of each singing voice deepfake source. Our experiments with MMFMs, speech foundation models, and music foundation models verify the hypothesis that MMFMs are the most effective for SVDSA. Furthermore, inspired by related research, we also explore fusion of foundation models (FMs) for improved SVDSA. To this end, we propose COFFE, a novel framework that employs the Chernoff distance as a novel loss function for effective fusion of FMs. With COFFE applied to MMFMs, we attain the best performance in comparison to all individual FMs and baseline fusion methods.
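The abstract does not give COFFE's exact loss formulation, but the Chernoff distance has a well-known closed form between Gaussians. A minimal sketch for the diagonal-covariance case (an assumption made here purely for illustration), where λ = 0.5 recovers the Bhattacharyya distance:

```python
import numpy as np

def chernoff_distance_diag(mu1, var1, mu2, var2, lam=0.5):
    """Chernoff distance between diagonal Gaussians N(mu1, var1), N(mu2, var2).

    lam in (0, 1); lam = 0.5 yields the Bhattacharyya distance.
    mu*, var* are per-dimension mean and variance vectors.
    """
    var_lam = lam * var2 + (1.0 - lam) * var1       # blended variance
    quad = lam * (1.0 - lam) / 2.0 * np.sum((mu1 - mu2) ** 2 / var_lam)
    logdet = 0.5 * np.sum(np.log(var_lam)
                          - (1.0 - lam) * np.log(var1) - lam * np.log(var2))
    return quad + logdet

# Two unit-variance Gaussians whose means differ by 1 in each of 4 dims.
mu_a, var_a = np.zeros(4), np.ones(4)
mu_b, var_b = np.ones(4), np.ones(4)
d = chernoff_distance_diag(mu_a, var_a, mu_b, var_b)
print(round(d, 3))  # 0.5  (equal unit variances: (1/8) * ||mu_a - mu_b||^2)
```

In a fusion setting, one could treat each FM's embedding statistics as such a Gaussian and penalize (or reward) the distance between them; how COFFE actually wires this into training is described in the paper, not here.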
<p>Ground-based remote sensing instruments have been widely used for atmospheric research, but applications for air quality monitoring remain limited. Compared to an in situ instrument that provides air quality conditions at the ground level, most remote sensing instruments (nadir viewing) are sensitive to a broad range of altitudes, often providing only integrated column observations. These column data can be more difficult to interpret and to relate to surface values and hence to “nose-height-level” health factors. This research utilized ground-based remote sensing and in situ air quality observations in Canada's Athabasca oil sands region to investigate some of their differences.</p>
<p>Vertical column densities (VCDs) of SO<span class="inline-formula"><sub>2</sub></span> and NO<span class="inline-formula"><sub>2</sub></span> retrieved by Pandora spectrometers located at the Oski-Otin site at Fort McKay (Alberta, Canada) from 2013–2019 were analyzed along with measurements of SO<span class="inline-formula"><sub>2</sub></span> and NO<span class="inline-formula"><sub>2</sub></span> surface concentrations and meteorological data. Aerosol optical depth (AOD) observations by a CIMEL sunphotometer were compared with surface PM<span class="inline-formula"><sub>2.5</sub></span> data. The Oski-Otin site is surrounded by several large bitumen mining operations within the Athabasca oil sands region with significant NO<span class="inline-formula"><sub>2</sub></span> emissions from the mining fleet. Two major bitumen upgraders that are 20 km south-east of the site have total SO<span class="inline-formula"><sub>2</sub></span> and NO<span class="inline-formula"><sub>2</sub></span> emissions of about 40 and 20 kt yr<span class="inline-formula"><sup>−1</sup></span>, respectively. It was demonstrated that remote sensing data from Pandora and CIMEL combined with high-vertical-resolution wind profiles can provide information about pollution sources and plume characteristics. Elevated SO<span class="inline-formula"><sub>2</sub></span> VCDs were clearly observed for times with south and south-eastern winds, particularly at 200–300 m altitude (above ground level). High NO<span class="inline-formula"><sub>2</sub></span> VCD values were observed from other directions (e.g., north-west) with less prominent impacts from 200–300 m winds. In situ ground observations of SO<span class="inline-formula"><sub>2</sub></span> and NO<span class="inline-formula"><sub>2</sub></span> show a different sensitivity to wind profiles, indicating they are less sensitive to elevated plumes than remote sensing instruments. 
In addition to measured wind data and lidar-observed boundary layer height (BLH), modelled wind profiles and BLH from ECMWF Reanalysis v5 (ERA5) have been used to further examine the correlation between column and surface observations. The results show that the height of emission sources (e.g., emissions from high stacks or near the surface) will determine the ratio of measured column and surface concentration values (i.e., could show positive or negative correlation with BLH). This effect will have an impact on the comparison between column observations (e.g., from the satellite or ground-based remote sensing instruments) with surface in situ measurements.</p>
<p>This study explores differences between remote sensing and in situ instruments in terms of their vertical, horizontal, and temporal sampling differences. Understanding and resolving these differences are critical for future analyses linking satellite, ground-based remote sensing and in situ observations in air quality monitoring and research.</p>
Existing few-shot segmentation (FSS) methods mainly focus on designing novel support-query matching and self-matching mechanisms to exploit implicit knowledge in pre-trained backbones. However, the performance of these methods is often constrained by models pre-trained on classification tasks. The exploration of what types of pre-trained models can provide more beneficial implicit knowledge for FSS remains limited. In this paper, inspired by the representation consistency of foundational computer vision models, we develop a FSS framework based on foundation models. To be specific, we propose a simple approach to extract implicit knowledge from foundation models to construct coarse correspondence and introduce a lightweight decoder to refine coarse correspondence for fine-grained segmentation. We systematically summarize the performance of various foundation models on FSS and discover that the implicit knowledge within some of these models is more beneficial for FSS than models pre-trained on classification tasks. Extensive experiments on two widely used datasets demonstrate the effectiveness of our approach in leveraging the implicit knowledge of foundation models. Notably, the combination of DINOv2 and DFN exceeds previous state-of-the-art methods by 17.5% on COCO-20i. Code is available at https://github.com/DUT-CSJ/FoundationFSS.
Leo Fillioux, Julio Silva-Rodríguez, Ismail Ben Ayed
et al.
Recent advances in self-supervision and contrastive learning have brought the performance of foundation models to unprecedented levels in a variety of tasks. Fueled by this progress, these models are becoming the prevailing approach for a wide array of real-world vision problems, including risk-sensitive and high-stakes applications. However, ensuring safe deployment in these scenarios requires a more comprehensive understanding of their uncertainty modeling capabilities, which has received little attention. In this work, we delve into the behaviour of vision and vision-language foundation models under Conformal Prediction (CP), a statistical framework that provides theoretical guarantees of marginal coverage of the true class. Across extensive experiments including popular vision classification benchmarks, well-known foundation vision models, and three CP methods, our findings reveal that foundation models are well-suited for conformalization procedures, particularly those integrating Vision Transformers. We also show that calibrating the confidence predictions of these models, a popular strategy to improve their uncertainty quantification, actually leads to efficiency degradation of the conformal set on adaptive CP methods. Furthermore, few-shot adaptation of Vision-Language Models (VLMs) to downstream tasks, whose popularity is surging, enhances conformal scores compared to zero-shot predictions. Last, our empirical study exposes APS as particularly promising in the context of vision foundation models, as it does not violate the marginal coverage guarantees across multiple challenging, yet realistic scenarios.
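For readers unfamiliar with APS (Adaptive Prediction Sets), a minimal split-conformal sketch on toy data follows. It assumes the standard cumulative-probability conformity score without randomization; it is a generic illustration of the method, not the experimental setup of the study.

```python
import numpy as np

def aps_scores(probs, labels):
    """APS conformity score: probability mass of classes ranked at or
    above the true class (deterministic variant, no randomization)."""
    order = np.argsort(-probs, axis=1)
    sorted_p = np.take_along_axis(probs, order, axis=1)
    cum = np.cumsum(sorted_p, axis=1)
    rank = np.argmax(order == labels[:, None], axis=1)
    return cum[np.arange(len(labels)), rank]

def aps_sets(probs, qhat):
    """Prediction set: smallest prefix of ranked classes with mass >= qhat."""
    order = np.argsort(-probs, axis=1)
    cum = np.cumsum(np.take_along_axis(probs, order, axis=1), axis=1)
    return [set(order[i, :int(np.searchsorted(cum[i], qhat)) + 1].tolist())
            for i in range(len(probs))]

# Toy calibration split: 5 samples, 3 classes.
cal_probs = np.array([[0.7, 0.2, 0.1], [0.5, 0.3, 0.2], [0.8, 0.1, 0.1],
                      [0.4, 0.4, 0.2], [0.6, 0.3, 0.1]])
cal_labels = np.array([0, 1, 0, 0, 2])
alpha = 0.1                                  # target 90% marginal coverage
scores = aps_scores(cal_probs, cal_labels)
n = len(scores)
# Finite-sample-corrected quantile of the calibration scores.
k = min(n - 1, int(np.ceil((n + 1) * (1 - alpha))) - 1)
qhat = np.sort(scores)[k]
test_sets = aps_sets(np.array([[0.6, 0.3, 0.1]]), qhat)
print(test_sets[0])  # {0, 1, 2} — tiny calibration set forces a large qhat
```

With realistic calibration sizes the quantile drops below 1 and the sets shrink; the set size ("efficiency") is exactly the quantity the calibration experiments in the study degrade.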
Recent advancements in foundation models have yielded impressive performance across a wide range of tasks. Meanwhile, for specific applications, practitioners have been developing specialized application models. To enjoy the benefits of both kinds of models, one natural path is to transfer the knowledge in foundation models into specialized application models, which are generally more efficient for serving. Techniques from knowledge distillation may be applied here, where the application model learns to mimic the foundation model. However, specialized application models and foundation models have substantial gaps in capacity, employing distinct architectures, using different input features from different modalities, and being optimized on different distributions. These differences in model characteristics lead to significant challenges for distillation methods. In this work, we propose creating a teaching committee comprising both foundation model teachers and complementary teachers. Complementary teachers possess model characteristics akin to the student's, aiming to bridge the gap between the foundation model and specialized application models for a smoother knowledge transfer. Further, to accommodate the dissimilarity among the teachers in the committee, we introduce DiverseDistill, which allows the student to understand the expertise of each teacher and extract task knowledge. Our evaluations demonstrate that adding complementary teachers enhances student performance. Finally, DiverseDistill consistently outperforms baseline distillation methods, regardless of the teacher choices, resulting in significantly improved student performance.
Anastasios N. Angelopoulos, Rina Foygel Barber, Stephen Bates
This book is about conformal prediction and related inferential techniques that build on permutation tests and exchangeability. These techniques are useful in a diverse array of tasks, including hypothesis testing and providing uncertainty quantification guarantees for machine learning systems. Much of the current interest in conformal prediction is due to its ability to integrate into complex machine learning workflows, solving the problem of forming prediction sets without any assumptions on the form of the data generating distribution. Since contemporary machine learning algorithms have generally proven difficult to analyze directly, conformal prediction's main appeal is its ability to provide formal, finite-sample guarantees when paired with such methods. The goal of this book is to teach the reader about the fundamental technical arguments that arise when researching conformal prediction and related questions in distribution-free inference. Many of these proof strategies, especially the more recent ones, are scattered among research papers, making it difficult for researchers to understand where to look, which results are important, and how exactly the proofs work. We hope to bridge this gap by curating what we believe to be some of the most important results in the literature and presenting their proofs in a unified language, with illustrations, and with an eye towards pedagogy.
A growing number of music foundation models have recently been released, promising a general, largely task-independent encoding of musical information. Common ways of adapting music foundation models to downstream tasks are probing and fine-tuning. These common transfer learning approaches, however, face challenges: probing may yield suboptimal performance because the pre-trained weights are frozen, while fine-tuning is computationally expensive and prone to overfitting. Our work investigates parameter-efficient transfer learning (PETL) for music foundation models, which combines the advantages of probing and fine-tuning. We examine three types of PETL methods: adapter-based methods, prompt-based methods, and reparameterization-based methods. These methods train only a small number of parameters and therefore do not require significant computational resources. Results show that PETL methods outperform both probing and fine-tuning on music auto-tagging. On key detection and tempo estimation, they achieve results similar to fine-tuning at significantly lower training cost. However, the usefulness of the current generation of foundation models on key and tempo tasks is called into question by the comparable results achieved by training a small model from scratch. Code is available at https://github.com/suncerock/peft-music/
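Of the three PETL families, adapter-based methods are the simplest to illustrate. A minimal sketch of a bottleneck adapter follows; the dimensions are hypothetical, and in practice such adapters sit inside each transformer block of the frozen backbone rather than operating on final features as here.

```python
import numpy as np

def adapter(h, W_down, W_up):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual.

    Only W_down and W_up are trained; the backbone weights stay frozen.
    """
    z = np.maximum(h @ W_down, 0.0)   # ReLU bottleneck
    return h + z @ W_up               # residual connection

rng = np.random.default_rng(0)
dim, bottleneck = 768, 16             # ~2 * 768 * 16 trainable params per adapter
W_down = rng.standard_normal((dim, bottleneck)) * 0.02
W_up = np.zeros((bottleneck, dim))    # zero init: adapter starts as identity

h = rng.standard_normal((4, dim))     # a batch of frozen backbone features
out = adapter(h, W_down, W_up)
print(np.allclose(out, h))  # True — identity at initialization
```

The zero-initialized up-projection is a common trick: training starts from the unmodified backbone and the adapter only gradually deviates, which is part of why these methods resist overfitting compared to full fine-tuning.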