Results for "Earthwork. Foundations"

Showing 20 of ~249,319 results · from arXiv, CrossRef

arXiv Open Access 2026
Omni-fMRI: A Universal Atlas-Free fMRI Foundation Model

Mo Wang, Wenhao Ye, Junfeng Xia et al.

Self-supervised fMRI foundation models have shown promising transfer performance, yet most rely on predefined region-level parcellations that discard fine-grained voxel information and introduce atlas-dependent biases. We propose Omni-fMRI, an atlas-free foundation model that operates directly on voxel-level signals. To enable scalable pretraining on 49,497 fMRI sessions across nine datasets, Omni-fMRI introduces a dynamic patching mechanism that substantially reduces computational cost while preserving informative spatial structure. To support reproducibility and fair comparison, we establish a comprehensive benchmark suite spanning 11 datasets and a diverse set of resting-state and task-based fMRI tasks. Experimental results demonstrate that Omni-fMRI consistently outperforms existing foundation models, providing a scalable and reproducible framework for atlas-free brain representation learning. Code and logs are available.

en cs.CE, q-bio.QM
arXiv Open Access 2025
TimeFound: A Foundation Model for Time Series Forecasting

Congxi Xiao, Jingbo Zhou, Yixiong Xiao et al.

We present TimeFound, an encoder-decoder transformer-based time series foundation model for out-of-the-box zero-shot forecasting. To handle time series data from various domains, TimeFound employs a multi-resolution patching strategy to capture complex temporal patterns at multiple scales. We pre-train our model with two sizes (200M and 710M parameters) on a large time-series corpus comprising both real-world and synthetic datasets. Over a collection of unseen datasets across diverse domains and forecasting horizons, our empirical evaluations suggest that TimeFound can achieve superior or competitive zero-shot forecasting performance, compared to state-of-the-art time series foundation models.

en cs.LG
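The multi-resolution patching idea above can be sketched in a few lines: the same series is cut into non-overlapping patches at several scales, so coarse patches capture long-range patterns while fine patches preserve local detail. The patch sizes and the ragged-tail handling below are illustrative assumptions, not TimeFound's actual configuration.

```python
import numpy as np

def multi_resolution_patches(series, patch_sizes=(8, 16, 32)):
    """Split a 1-D series into non-overlapping patches at several scales.

    `patch_sizes` is an illustrative choice; the paper's actual patch
    configuration is not reproduced here.
    """
    views = {}
    for p in patch_sizes:
        n = len(series) // p  # drop the ragged tail for simplicity
        views[p] = np.asarray(series[: n * p]).reshape(n, p)
    return views

patches = multi_resolution_patches(np.arange(64, dtype=float))
# patches[8] has shape (8, 8); patches[32] has shape (2, 32)
```

Each scale's patch matrix would then be embedded and fed to the transformer, letting the model attend over the same signal at several temporal granularities.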
arXiv Open Access 2025
TerraTorch: The Geospatial Foundation Models Toolkit

Carlos Gomes, Benedikt Blumenstiel, Joao Lucas de Sousa Almeida et al.

TerraTorch is a fine-tuning and benchmarking toolkit for Geospatial Foundation Models built on PyTorch Lightning and tailored for satellite, weather, and climate data. It integrates domain-specific data modules, pre-defined tasks, and a modular model factory that pairs any backbone with diverse decoder heads. These components allow researchers and practitioners to fine-tune supported models in a no-code fashion by simply editing a training configuration. By consolidating best practices for model development and incorporating the automated hyperparameter optimization extension Iterate, TerraTorch reduces the expertise and time required to fine-tune or benchmark models on new Earth Observation use cases. Furthermore, TerraTorch directly integrates with GEO-Bench, allowing for systematic and reproducible benchmarking of Geospatial Foundation Models. TerraTorch is open sourced under Apache 2.0, available at https://github.com/IBM/terratorch, and can be installed via pip install terratorch.

en cs.CV, cs.LG
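Since TerraTorch is built on PyTorch Lightning and is fine-tuned "by simply editing a training configuration", a config would follow Lightning-CLI conventions. The fragment below is a hypothetical sketch only; every key name and value is illustrative and should be checked against the real examples at https://github.com/IBM/terratorch.

```yaml
# Hypothetical sketch following generic PyTorch Lightning CLI conventions;
# key names are NOT taken from the TerraTorch docs.
trainer:
  max_epochs: 100
  precision: 16
model:
  backbone: some_pretrained_geospatial_backbone   # hypothetical name
  decoder: some_segmentation_decoder              # hypothetical name
data:
  datamodule: some_earth_observation_datamodule   # hypothetical name
```

The no-code claim in the abstract amounts to this: swapping backbone, decoder, or data module is a config edit, with the model factory pairing the components at runtime.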
arXiv Open Access 2025
Can Foundation Models Predict Fitness for Duty?

Juan E. Tapia, Christoph Busch

Biometric capture devices have been utilised to estimate a person's alertness through near-infrared iris images, expanding their use beyond biometric recognition alone. However, capturing a substantial number of images related to alcohol consumption, drug use, and sleep deprivation to create a dataset for training an AI model presents a significant challenge. Typically, a large quantity of images is required to implement a deep learning approach effectively. Training downstream models on top of foundation models now offers a real opportunity to advance this area, thanks to the generalisation capabilities of self-supervised models. This work examines the application of deep learning and foundation models to predicting fitness for duty, defined as a subject's state of alertness for work.

en cs.CV
arXiv Open Access 2025
QuarkMed Medical Foundation Model Technical Report

Ao Li, Bin Yan, Bingfeng Cai et al.

Recent advancements in large language models have significantly accelerated their adoption in healthcare applications, including AI-powered medical consultations, diagnostic report assistance, and medical search tools. However, medical tasks often demand highly specialized knowledge, professional accuracy, and customization capabilities, necessitating a robust and reliable foundation model. QuarkMed addresses these needs by leveraging curated medical data processing, medical-content Retrieval-Augmented Generation (RAG), and a large-scale, verifiable reinforcement learning pipeline to develop a high-performance medical foundation model. The model achieved 70% accuracy on the Chinese Medical Licensing Examination, demonstrating strong generalization across diverse medical benchmarks. QuarkMed offers a powerful yet versatile personal medical AI solution, already serving millions of users at ai.quark.cn.

en cs.AI
arXiv Open Access 2025
A Vector-Quantized Foundation Model for Patient Behavior Monitoring

Rodrigo Oliver, Josué Pérez-Sabater, Leire Paz-Arbaizar et al.

Foundation models have achieved remarkable success across various domains, yet their adoption in healthcare remains limited. While significant advances have been made in medical imaging, genetic biomarkers, and time series from electronic health records, the potential of foundation models for patient behavior monitoring through personal digital devices remains underexplored. The data generated by these devices are inherently heterogeneous, multisource, and often exhibit high rates of missing data, posing unique challenges. This paper introduces a novel foundation model based on a modified vector quantized variational autoencoder, specifically designed to process real-world data from smartphones and wearable devices. We leveraged the discrete latent representation of this model to effectively perform two downstream tasks, suicide risk assessment and emotional state prediction, on different held-out clinical cohorts without the need of fine-tuning. We also highlight the existence of a trade-off between discrete and continuous latent structures, suggesting that hybrid models may be optimal for balancing accuracy across various supervised and unsupervised tasks.

en cs.LG
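The discrete bottleneck at the heart of a vector-quantized VAE is a nearest-codebook lookup: each continuous latent is replaced by the closest learned code vector, and the resulting integer index is the discrete representation used downstream. A minimal sketch of that step, assuming a toy 2-D codebook (this illustrates the generic VQ mechanism, not the paper's modified model):

```python
import numpy as np

def vector_quantize(z, codebook):
    """Map each latent vector in `z` to its nearest codebook entry.

    z        : (n, d) batch of encoder outputs
    codebook : (k, d) learned code vectors
    Returns the discrete indices and the quantized vectors.
    """
    # squared Euclidean distance between every latent and every code
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)
    return idx, codebook[idx]

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
idx, zq = vector_quantize(np.array([[0.1, -0.1], [0.9, 1.2]]), codebook)
# idx is [0, 1]: each latent snaps to its nearest code
```

It is these discrete indices, rather than the continuous latents, that the paper feeds to the downstream risk-assessment and emotion-prediction tasks, which is also where its discrete-vs-continuous trade-off arises.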
arXiv Open Access 2025
LLMic: Romanian Foundation Language Model

Vlad-Andrei Bădoiu, Mihai-Valentin Dumitru, Alexandru M. Gherghescu et al.

Recent advances in Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks with commercial models leading the way. While open models usually operate at a smaller scale, they maintain competitiveness through specialization and fine-tuning. However, a significant challenge persists: open models often underperform in low-resource languages due to limited representation in the training corpus. In this paper, we present LLMic, a bilingual foundation language model designed specifically for the Romanian Language. We document the complete process of pretraining a foundation model for a low-resource language, including corpus construction, architecture selection, and hyper-parameter optimization. Our evaluation demonstrates that LLMic can be specialized for tasks in the target language, achieving results comparable to other much larger open models. We show that fine-tuning LLMic for language translation after the initial pretraining phase outperforms existing solutions in English-to-Romanian translation tasks. This opens the path for efficient large-scale processing for the Romanian language community, using the much smaller LLMic model

en cs.CL
arXiv Open Access 2025
Axiomatic Foundations of Fractal Analysis and Fractal Number Theory

Stanislav Semenov

We develop an axiomatic framework for fractal analysis and fractal number theory grounded in hierarchies of definability. Central to this approach is a sequence of formal systems F_n, each corresponding to a definability level S_n contained in R of constructively accessible mathematical objects. This structure refines classical analysis by replacing uncountable global constructs with countable, syntactically constrained approximations. The axioms formalize:

- A hierarchy of definability levels S_n, indexed by syntactic and ordinal complexity;
- Fractal topologies and the induced notions of continuity, compactness, and differentiability;
- Layered integration and differentiation with explicit convergence and definability bounds;
- Arithmetic and function spaces over the stratified continuum R_{S_n}, which is a subset of R.

This framework synthesizes constructive mathematics, proof-theoretic stratification, and fractal geometric intuition into a unified, finitistically structured model. Key results include the definability-based classification of real numbers (e.g., algebraic, computable, Liouville), a stratified fundamental theorem of calculus with syntactic error bounds, and compatibility with base systems such as RCA_0 and ACA_0. The framework enables constructive approximation and syntactic regularization of classical analysis, with applications to proof assistants, computable mathematics, and foundational studies of the continuum.

en math.GM
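The stratification the abstract describes can be summarized as an increasing chain of countable definability levels inside the reals. The notation below is a sketch inferred from the abstract, not the paper's exact axioms:

```latex
% Increasing chain of countable definability levels inside R,
S_0 \subseteq S_1 \subseteq \cdots \subseteq S_n \subseteq \cdots \subset \mathbb{R},
% each level carrying its own formal system F_n and a stratified
% continuum R_{S_n} \subseteq \mathbb{R} of constructively accessible reals.
```

Classical analysis over the uncountable continuum is then replaced, level by level, by analysis over these countable approximations.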
arXiv Open Access 2024
Social Science Is Necessary for Operationalizing Socially Responsible Foundation Models

Adam Davies, Elisa Nguyen, Michael Simeone et al.

With the rise of foundation models, there is growing concern about their potential social impacts. Social science has a long history of studying the social impacts of transformative technologies in terms of pre-existing systems of power and how these systems are disrupted or reinforced by new technologies. In this position paper, we build on prior work studying the social impacts of earlier technologies to propose a conceptual framework studying foundation models as sociotechnical systems, incorporating social science expertise to better understand how these models affect systems of power, anticipate the impacts of deploying these models in various applications, and study the effectiveness of technical interventions intended to mitigate social harms. We advocate for an interdisciplinary and collaborative research paradigm between AI and social science across all stages of foundation model research and development to promote socially responsible research practices and use cases, and outline several strategies to facilitate such research.

en cs.AI
arXiv Open Access 2024
VISTA3D: A Unified Segmentation Foundation Model For 3D Medical Imaging

Yufan He, Pengfei Guo, Yucheng Tang et al.

Foundation models for interactive segmentation in 2D natural images and videos have sparked significant interest in building 3D foundation models for medical imaging. However, the domain gaps and clinical use cases for 3D medical imaging require a dedicated model that diverges from existing 2D solutions. Specifically, such foundation models should support a full workflow that can actually reduce human effort. Treating 3D medical images as sequences of 2D slices and reusing interactive 2D foundation models seems straightforward, but 2D annotation is too time-consuming for 3D tasks. Moreover, for large cohort analysis, it's the highly accurate automatic segmentation models that reduce the most human effort. However, these models lack support for interactive corrections and lack zero-shot ability for novel structures, which is a key feature of "foundation". While reusing pre-trained 2D backbones in 3D enhances zero-shot potential, their performance on complex 3D structures still lags behind leading 3D models. To address these issues, we present VISTA3D, Versatile Imaging SegmenTation and Annotation model, that targets to solve all these challenges and requirements with one unified foundation model. VISTA3D is built on top of the well-established 3D segmentation pipeline, and it is the first model to achieve state-of-the-art performance in both 3D automatic (supporting 127 classes) and 3D interactive segmentation, even when compared with top 3D expert models on large and diverse benchmarks. Additionally, VISTA3D's 3D interactive design allows efficient human correction, and a novel 3D supervoxel method that distills 2D pretrained backbones grants VISTA3D top 3D zero-shot performance. We believe the model, recipe, and insights represent a promising step towards a clinically useful 3D foundation model. Code and weights are publicly available at https://github.com/Project-MONAI/VISTA.

en cs.CV
arXiv Open Access 2024
GenRL: Multimodal-foundation world models for generalization in embodied agents

Pietro Mazzaglia, Tim Verbelen, Bart Dhoedt et al.

Learning generalist embodied agents, able to solve multitudes of tasks in different domains is a long-standing problem. Reinforcement learning (RL) is hard to scale up as it requires a complex reward design for each task. In contrast, language can specify tasks in a more natural way. Current foundation vision-language models (VLMs) generally require fine-tuning or other adaptations to be adopted in embodied contexts, due to the significant domain gap. However, the lack of multimodal data in such domains represents an obstacle to developing foundation models for embodied applications. In this work, we overcome these problems by presenting multimodal-foundation world models, able to connect and align the representation of foundation VLMs with the latent space of generative world models for RL, without any language annotations. The resulting agent learning framework, GenRL, allows one to specify tasks through vision and/or language prompts, ground them in the embodied domain's dynamics, and learn the corresponding behaviors in imagination. As assessed through large-scale multi-task benchmarking in locomotion and manipulation domains, GenRL enables multi-task generalization from language and visual prompts. Furthermore, by introducing a data-free policy learning strategy, our approach lays the groundwork for foundational policy learning using generative world models. Website, code and data: https://mazpie.github.io/genrl/

en cs.AI, cs.CV
arXiv Open Access 2024
Universal Definitions of the Roman Factorial: Introduction to Foundational Functions and the Generalization Process

Leonidas Liponis

This paper introduces a new method for redefining the Roman factorial using universally applicable functions that are not expressed in closed form. We present a set of foundational functions, similar to Boolean operations, to simplify the factorial expression. Through a systematic process of generalization, termed generalization process, we aim to use these foundational functions to create recursive and non-recursive, global definitions of the Roman factorial.

en math.CO
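For context, the object being generalized has a standard closed-form definition from umbral calculus: the Roman factorial equals n! for n >= 0 and (-1)^(-n-1)/(-n-1)! for n < 0. The paper's contribution is to rebuild this without closed forms; the sketch below only illustrates the target values.

```python
from fractions import Fraction
from math import factorial

def roman_factorial(n):
    """Roman factorial: n! for n >= 0, (-1)^(-n-1) / (-n-1)! for n < 0.

    Standard closed-form definition (umbral calculus); the paper's
    non-closed-form construction is not reproduced here.
    """
    if n >= 0:
        return Fraction(factorial(n))
    return Fraction((-1) ** (-n - 1), factorial(-n - 1))

# Satisfies the recursion |n]! = |n] * |n-1]!,
# where |n] = n for n != 0 and |0] = 1.
```

For example, the values at -1, -2, -3 are 1, -1, 1/2, and each step is consistent with the recursion above.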
arXiv Open Access 2024
Sampling Foundational Transformer: A Theoretical Perspective

Viet Anh Nguyen, Minh Lenhat, Khoa Nguyen et al.

The versatility of the self-attention mechanism has earned transformers great success in almost all data modalities, limited mainly by quadratic complexity and difficulty of training. To apply transformers across different data modalities, practitioners have to make specific, clever, data-modality-dependent constructions. In this paper, we propose the Sampling Foundational Transformer (SFT), which can work on multiple data modalities (e.g., point cloud, graph, and sequence) and constraints (e.g., rotational invariance). The existence of such a model is important, as contemporary foundational modeling requires operability on multiple data sources. For efficiency on a large number of tokens, our model relies on a context-aware sampling-without-replacement mechanism for both linear asymptotic computational complexity and real inference-time gains. For faster training, we rely on our newly discovered pseudoconvex formulation of the transformer layer to increase the model's convergence rate. As a model working on multiple data modalities, SFT has achieved competitive results on many benchmarks while being faster in inference than other, more specialized models.

en cs.LG, cs.CV
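One common way to sample tokens without replacement in proportion to learned scores is the Gumbel-top-k trick: perturb each score with Gumbel noise and keep the k largest. This is a generic stand-in for illustration; the paper's actual context-aware mechanism is not reproduced here.

```python
import numpy as np

def gumbel_topk_sample(scores, k, rng):
    """Sample k distinct indices with probability proportional to
    softmax(scores), via the Gumbel-top-k trick.

    Illustrative stand-in for a sampling-without-replacement step,
    not the SFT paper's mechanism.
    """
    g = rng.gumbel(size=len(scores))        # i.i.d. Gumbel(0, 1) noise
    return np.argsort(scores + g)[::-1][:k]  # indices of the k largest

rng = np.random.default_rng(0)
idx = gumbel_topk_sample(np.array([2.0, 0.5, 1.0, -1.0]), 2, rng)
# two distinct token indices, biased toward high-scoring tokens
```

Because only k of the n tokens survive the sampling step, downstream attention cost scales with k rather than n, which is the kind of saving the abstract's linear-complexity claim refers to.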
arXiv Open Access 2024
On the Generalizability of Foundation Models for Crop Type Mapping

Yi-Chia Chang, Adam J. Stewart, Favyen Bastani et al.

Foundation models pre-trained using self-supervised learning have shown powerful transfer learning capabilities on various downstream tasks, including language understanding, text generation, and image recognition. The Earth observation (EO) field has produced several foundation models pre-trained directly on multispectral satellite imagery for applications like precision agriculture, wildfire and drought monitoring, and natural disaster response. However, few studies have investigated the ability of these models to generalize to new geographic locations, and potential concerns of geospatial bias -- models trained on data-rich developed nations not transferring well to data-scarce developing nations -- remain. We evaluate three popular EO foundation models, SSL4EO-S12, SatlasPretrain, and ImageNet, on five crop classification datasets across five continents. Results show that pre-trained weights designed explicitly for Sentinel-2, such as SSL4EO-S12, outperform general pre-trained weights like ImageNet. While only 100 labeled images are sufficient for achieving high overall accuracy, 900 images are required to mitigate class imbalance and improve average accuracy.

en cs.CV, cs.LG
arXiv Open Access 2024
Towards Scalable Foundation Models for Digital Dermatology

Fabian Gröger, Philippe Gottfrois, Ludovic Amruthalingam et al.

The growing demand for accurate and equitable AI models in digital dermatology faces a significant challenge: the lack of diverse, high-quality labeled data. In this work, we investigate the potential of domain-specific foundation models for dermatology in addressing this challenge. We utilize self-supervised learning (SSL) techniques to pre-train models on a dataset of over 240,000 dermatological images from public and private collections. Our study considers several SSL methods and compares the resulting foundation models against domain-agnostic models like those pre-trained on ImageNet and state-of-the-art models such as MONET across 12 downstream tasks. Unlike previous research, we emphasize the development of smaller models that are more suitable for resource-limited clinical settings, facilitating easier adaptation to a broad range of use cases. Results show that models pre-trained in this work not only outperform general-purpose models but also approach the performance of models 50 times larger on clinically relevant diagnostic tasks. To promote further research in this direction, we publicly release both the training code and the foundation models, which can benefit clinicians in dermatological applications.

en cs.CV, cs.AI
arXiv Open Access 2024
HEMM: Holistic Evaluation of Multimodal Foundation Models

Paul Pu Liang, Akshay Goindani, Talha Chafekar et al.

Multimodal foundation models that can holistically process text alongside images, video, audio, and other sensory modalities are increasingly used in a variety of real-world applications. However, it is challenging to characterize and study progress in multimodal foundation models, given the range of possible modeling decisions, tasks, and domains. In this paper, we introduce Holistic Evaluation of Multimodal Models (HEMM) to systematically evaluate the capabilities of multimodal foundation models across a set of 3 dimensions: basic skills, information flow, and real-world use cases. Basic multimodal skills are internal abilities required to solve problems, such as learning interactions across modalities, fine-grained alignment, multi-step reasoning, and the ability to handle external knowledge. Information flow studies how multimodal content changes during a task through querying, translation, editing, and fusion. Use cases span domain-specific challenges introduced in real-world multimedia, affective computing, natural sciences, healthcare, and human-computer interaction applications. Through comprehensive experiments across the 30 tasks in HEMM, we (1) identify key dataset dimensions (e.g., basic skills, information flows, and use cases) that pose challenges to today's models, and (2) distill performance trends regarding how different modeling dimensions (e.g., scale, pre-training data, multimodal alignment, pre-training, and instruction tuning objectives) influence performance. Our conclusions regarding challenging multimodal interactions, use cases, and tasks requiring reasoning and external knowledge, the benefits of data and model scale, and the impacts of instruction tuning yield actionable insights for future work in multimodal foundation models.

en cs.LG, cs.AI
arXiv Open Access 2023
A Critical Look at the Current Usage of Foundation Model for Dense Recognition Task

Shiqi Yang, Atsushi Hashimoto, Yoshitaka Ushiku

In recent years, large models trained on huge amounts of cross-modality data, usually termed foundation models, have achieved conspicuous accomplishments in many fields, such as image recognition and generation. Though achieving great success in their original application cases, it is still unclear whether those foundation models can be applied to other, different downstream tasks. In this paper, we conduct a short survey on current methods for discriminative dense recognition tasks that are built on pretrained foundation models. We also provide some preliminary experimental analysis of an existing open-vocabulary segmentation method based on Stable Diffusion, which indicates that the current way of deploying diffusion models for segmentation is not optimal. This aims to provide insights for future research on adopting foundation models for downstream tasks.

en cs.CV
arXiv Open Access 2023
On the foundations of entropic cosmologies: inconsistencies, possible solutions and dead end signs

Hussain Gohar, Vincenzo Salzano

In this letter we explore the foundations of entropic cosmology and highlight some important flaws which have emerged and been adopted in the recent literature. We argue that, when applying entropy and temperature on the cosmological horizon by assuming the holographic principle for all thermodynamic approaches to cosmology and gravity, one must derive the consistent thermodynamic quantities following the Clausius relation. One key assumption, which is generally overlooked, is that in this process one must assume a mass-to-horizon relation, which is generally taken as a linear one. We show that, regardless of the type of entropy chosen on the cosmological horizon, when a thermodynamically consistent corresponding temperature is considered, all modified entropic force models are equivalent to and indistinguishable from the original entropic force models based on standard Bekenstein entropy and Hawking temperature. As such, they are also plagued by the same problems and inability to describe in a satisfactory qualitative and quantitative way the cosmological dynamics as it emerges from the probes we have. We also show that the standard accepted parameterization for Hawking temperature (including a $\gamma$ rescaling) is actually not correctly applied, namely, it is not related to entropy in a thermodynamically consistent way. Finally, we clearly state that the explicit form of the entropic force on cosmological horizons is mostly dictated by the assumption on the mass-to-horizon relation. As such, we discuss what should be done in order to fix all such issues, and what conceptually could be implied by its correct implementation in order to advance in the field.

en gr-qc
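The standard horizon quantities the abstract refers to can be written down explicitly. This is a sketch of the textbook relations only; the letter's own modified parameterizations are not reproduced here:

```latex
% Bekenstein entropy of the Hubble horizon of radius r_H = c/H,
S = \frac{k_B c^3 A}{4 G \hbar}, \qquad A = 4\pi r_H^2 ,
% the Hawking-like horizon temperature,
T = \frac{\hbar H}{2\pi k_B} ,
% and the Clausius relation tying them together,
\mathrm{d}E = T\,\mathrm{d}S .
% The linear mass-to-horizon relation the letter highlights is
% commonly taken as M \propto r_H, e.g. M = c^2 r_H / (2G).
```

The letter's argument is that once T is fixed consistently with S through the Clausius relation, the freedom in modified entropic force models collapses onto the choice of the mass-to-horizon relation.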
CrossRef Open Access 2022
Non-Equilibrium Thermodynamic Foundations of the Origin of Life

Karo Michaelian

There is little doubt that life's origin followed from the known physical and chemical laws of Nature. The most general scientific framework incorporating the laws of Nature and applicable to most known processes to good approximation is that of thermodynamics and its extensions to treat out-of-equilibrium phenomena. The event of the origin of life should therefore also be amenable to such an analysis. In this review paper, I describe the non-equilibrium thermodynamic foundations of the origin of life for the non-expert from the perspective of the "Thermodynamic Dissipation Theory for the Origin of Life", which is founded on the Classical Irreversible Thermodynamic theory developed by Lars Onsager, Ilya Prigogine, and coworkers. A Glossary of Thermodynamic Terms can be found at the end of the article to aid the reader.
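The Classical Irreversible Thermodynamics the review builds on rests on a few core relations, summarized here as a sketch (standard textbook form, not the review's own notation):

```latex
% Entropy production as a bilinear form in fluxes J_i and forces X_i,
\sigma = \sum_i J_i X_i \ge 0 ,
% linear phenomenological laws near equilibrium,
J_i = \sum_j L_{ij} X_j ,
% with Onsager's reciprocal relations for the coefficients,
L_{ij} = L_{ji} .
```

Dissipative structuring, the organizing concept of the theory named in the abstract, concerns regimes where such coupled flux-force systems are driven far from equilibrium while continuously producing entropy.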

Page 53 of 12,466