Results for "Earthwork. Foundations"

Showing 20 of ~639766 results · from arXiv, DOAJ, Semantic Scholar, CrossRef

arXiv Open Access 2026
gUFO: A Gentle Foundational Ontology for Semantic Web Knowledge Graphs

João Paulo A. Almeida, Giancarlo Guizzardi, Tiago Prince Sales et al.

gUFO is a lightweight implementation of the Unified Foundational Ontology (UFO) suitable for Semantic Web OWL 2 DL applications. UFO is a mature foundational ontology with a rich axiomatization that has been employed in a significant number of projects in research and industry. Moreover, it is currently in the process of standardization by the International Organization for Standardization as ISO/IEC CD 21838-5. gUFO stands out from other foundational ontology implementations (such as those provided for BFO and DOLCE) given its unique support for a typology of types (operationalizing OntoClean guidelines), its reification patterns for intrinsic and relational aspects, and its support for situations and high-order types. gUFO provides well-founded patterns to address recurrent problems in Semantic Web knowledge graphs. In this paper, we present gUFO with its constituting categories, relations and constraints, discuss how it differs from the original UFO reference ontology, elaborate on its community adoption, and systematically position it in relation to existing OWL-based implementations of popular alternative foundational ontologies.

en cs.AI, cs.DB
arXiv Open Access 2026
TVWorld: Foundations for Remote-Control TV Agents

Zhantao Ma, Quanfeng Lu, Shuai Zhong et al.

Recent large vision-language models (LVLMs) have demonstrated strong potential for device control. However, existing research has primarily focused on point-and-click (PnC) interaction, while remote-control (RC) interaction commonly encountered in everyday TV usage remains largely underexplored. To fill this gap, we introduce TVWorld, an offline graph-based abstraction of real-world TV navigation that enables reproducible and deployment-free evaluation. On this basis, we derive two complementary benchmarks that comprehensively assess TV-use capabilities: TVWorld-N for topology-aware navigation and TVWorld-G for focus-aware grounding. These benchmarks expose a key limitation of existing agents: insufficient topology awareness for focus-based, long-horizon TV navigation. Motivated by this finding, we propose a Topology-Aware Training framework that injects topology awareness into LVLMs. Using this framework, we develop TVTheseus, a foundation model specialized for TV navigation. TVTheseus achieves a success rate of 68.3% on TVWorld-N, surpassing strong closed-source baselines such as Gemini 3 Flash and establishing state-of-the-art (SOTA) performance. Additional analyses further provide valuable insights into the development of effective TV-use agents.

en cs.CV, cs.AI
arXiv Open Access 2025
MoFM: A Large-Scale Human Motion Foundation Model

Mohammadreza Baharani, Ghazal Alinezhad Noghre, Armin Danesh Pazho et al.

Foundation Models (FM) have increasingly drawn the attention of researchers due to their scalability and generalization across diverse tasks. Inspired by the success of FMs and the principles that have driven advancements in Large Language Models (LLMs), we introduce MoFM as a novel Motion Foundation Model. MoFM is designed for the semantic understanding of complex human motions in both time and space. To facilitate large-scale training, MotionBook, a comprehensive human motion dictionary of discretized motions is designed and employed. MotionBook utilizes Thermal Cubes to capture spatio-temporal motion heatmaps, applying principles from discrete variational models to encode human movements into discrete units for a more efficient and scalable representation. MoFM, trained on a large corpus of motion data, provides a foundational backbone adaptable to diverse downstream tasks, supporting paradigms such as one-shot, unsupervised, and supervised tasks. This versatility makes MoFM well-suited for a wide range of motion-based applications.

en cs.CV, cs.LG
arXiv Open Access 2025
Revisiting Convolution Architecture in the Realm of DNA Foundation Models

Yu Bo, Weian Mao, Yanjun Shao et al.

In recent years, a variety of methods based on Transformer and state space model (SSM) architectures have been proposed, advancing foundational DNA language models. However, there is a lack of comparison between these recent approaches and the classical architecture convolutional networks (CNNs) on foundation model benchmarks. This raises the question: are CNNs truly being surpassed by these recent approaches based on transformer and SSM architectures? In this paper, we develop a simple but well-designed CNN-based method termed ConvNova. ConvNova identifies and proposes three effective designs: 1) dilated convolutions, 2) gated convolutions, and 3) a dual-branch framework for gating mechanisms. Through extensive empirical experiments, we demonstrate that ConvNova significantly outperforms recent methods on more than half of the tasks across several foundation model benchmarks. For example, in histone-related tasks, ConvNova exceeds the second-best method by an average of 5.8%, while generally utilizing fewer parameters and enabling faster computation. In addition, the experiments observed findings that may be related to biological characteristics. This indicates that CNNs are still a strong competitor compared to Transformers and SSMs. We anticipate that this work will spark renewed interest in CNN-based methods for DNA foundation models.

en cs.LG, cs.AI
arXiv Open Access 2025
SafeGenes: Evaluating the Adversarial Robustness of Genomic Foundation Models

Huixin Zhan, Clovis Barbour, Jason H. Moore

Genomic Foundation Models (GFMs), such as Evolutionary Scale Modeling (ESM), have demonstrated significant success in variant effect prediction. However, their adversarial robustness remains largely unexplored. To address this gap, we propose SafeGenes: a framework for Secure analysis of genomic foundation models, leveraging adversarial attacks to evaluate robustness against both engineered near-identical adversarial Genes and embedding-space manipulations. In this study, we assess the adversarial vulnerabilities of GFMs using two approaches: the Fast Gradient Sign Method (FGSM) and a soft prompt attack. FGSM introduces minimal perturbations to input sequences, while the soft prompt attack optimizes continuous embeddings to manipulate model predictions without modifying the input tokens. By combining these techniques, SafeGenes provides a comprehensive assessment of GFM susceptibility to adversarial manipulation. Targeted soft prompt attacks induced severe degradation in MLM-based shallow architectures such as ProteinBERT, while still producing substantial failure modes even in high-capacity foundation models such as ESM1b and ESM1v. These findings expose critical vulnerabilities in current foundation models, opening new research directions toward improving their security and robustness in high-stakes genomic applications such as variant effect prediction.

en cs.CR, cs.AI
arXiv Open Access 2025
Foundation Models For Seismic Data Processing: An Extensive Review

Fabian Fuchs, Mario Ruben Fernandez, Norman Ettrich et al.

Seismic processing plays a crucial role in transforming raw data into high-quality subsurface images, pivotal for various geoscience applications. Despite its importance, traditional seismic processing techniques face challenges such as noisy and damaged data and the reliance on manual, time-consuming workflows. The emergence of deep learning approaches has introduced effective and user-friendly alternatives, yet many of these deep learning approaches rely on synthetic datasets and specialized neural networks. Recently, foundation models have gained traction in the seismic domain, due to their success in the natural image domain. Therefore, we investigate the application of natural image foundation models on the three seismic processing tasks: demultiple, interpolation, and denoising. We evaluate the impact of different model characteristics, such as pre-training technique and neural network architecture, on performance and efficiency. Rather than proposing a single seismic foundation model, we critically examine various natural image foundation models and suggest some promising candidates for future exploration.

en cs.CV
arXiv Open Access 2025
Integrating Genomics into Multimodal EHR Foundation Models

Jonathan Amar, Edward Liu, Alessandra Breschi et al.

This paper introduces an innovative Electronic Health Record (EHR) foundation model that integrates Polygenic Risk Scores (PRS) as a foundational data modality, moving beyond traditional EHR-only approaches to build more holistic health profiles. Leveraging the extensive and diverse data from the All of Us (AoU) Research Program, this multimodal framework aims to learn complex relationships between clinical data and genetic predispositions. The methodology extends advancements in generative AI to the EHR foundation model space, enhancing predictive capabilities and interpretability. Evaluation on AoU data demonstrates the model's predictive value for the onset of various conditions, particularly Type 2 Diabetes (T2D), and illustrates the interplay between PRS and EHR data. The work also explores transfer learning for custom classification tasks, showcasing the architecture's versatility and efficiency. This approach is pivotal for unlocking new insights into disease prediction, proactive health management, risk stratification, and personalized treatment strategies, laying the groundwork for more personalized, equitable, and actionable real-world evidence generation in healthcare.

en cs.LG, cs.AI
arXiv Open Access 2025
Beyond holography: the entropic quantum gravity foundations of image processing

Ginestra Bianconi

Recently, thanks to the development of artificial intelligence (AI), there is increasing scientific interest in establishing the connections between theoretical physics and AI. Traditionally, these connections have focused mostly on the relation between string theory and image processing and involve important theoretical paradigms such as holography. Recently G. Bianconi has formulated the Gravity from Entropy (GfE) approach to quantum gravity in which gravity is derived from the geometric quantum relative entropy (GQRE) between two metrics associated with the Lorentzian spacetime. Here it is demonstrated that the famous Perona-Malik algorithm for image processing is the gradient flow that maximizes the GfE action in its simple warm-up scenario. Specifically, this algorithm is the outcome of the maximization of the GfE action calculated between two Euclidean metrics: the one of the support of the image and the one induced by the image. As the Perona-Malik algorithm is known to preserve sharp contours, this implies that the GfE action does not in general lead to uniform images upon iteration of the gradient flow dynamics, as would be intuitively expected from entropic actions maximising classical entropies. Rather, the outcome of the maximization of the GfE action is compatible with the preservation of complex structures. These results provide the geometrical and information theory foundations for the Perona-Malik algorithm and might contribute to establishing deeper connections between GfE, machine learning and brain research.

en cond-mat.dis-nn, cond-mat.stat-mech
arXiv Open Access 2025
Foundation versus Domain-specific Models: Performance Comparison, Fusion, and Explainability in Face Recognition

Redwan Sony, Parisa Farmanifard, Arun Ross et al.

In this paper, we address the following question: How do generic foundation models (e.g., CLIP, BLIP, GPT-4o, Grok-4) compare against a domain-specific face recognition model (viz., AdaFace or ArcFace) on the face recognition task? Through a series of experiments involving several foundation models and benchmark datasets, we report the following findings: (a) In all face benchmark datasets considered, domain-specific models outperformed zero-shot foundation models. (b) The performance of zero-shot generic foundation models improved on over-segmented face images compared to tightly cropped faces, thereby suggesting the importance of contextual clues. (c) A simple score-level fusion of a foundation model with a domain-specific face recognition model improved the accuracy at low false match rates. (d) Foundation models, such as GPT-4o and Grok-4, are able to provide explainability to the face recognition pipeline. In some instances, foundation models are even able to resolve low-confidence decisions made by AdaFace, thereby reiterating the importance of combining domain-specific face recognition models with generic foundation models in a judicious manner.

en cs.CV, cs.AI
arXiv Open Access 2025
Foundations of Digital Circuits: Denotation, Operational, and Algebraic Semantics

George Kaye

This thesis details a project to define a fully compositional theory of synchronous sequential circuits built from primitive components, motivated by applying techniques successfully used in programming languages to hardware. The first part of the thesis defines the syntactic foundations of sequential circuit morphisms, and then builds three different semantic theories: denotational, operational and algebraic. We characterise the denotational semantics of sequential circuits as certain causal stream functions, as well as providing a link to existing circuit methodologies by mapping between circuit morphisms, stream functions and Mealy machines. The operational semantics is defined as a strategy for applying some global transformations followed by local reductions to demonstrate how a circuit processes a value, leading to a notion of observational equivalence. The algebraic semantics consists of equations for bringing circuits into a pseudo-normal form, and then encoding between different state sets. This part of the thesis concludes with a discussion of some novel applications, such as those for using partial evaluation for digital circuits. While mathematically rigorous, the categorical string diagram formalism is not suited for reasoning computationally. The second part of this thesis details an extension of string diagram rewriting with hypergraphs so that it is compatible with the traced comonoid structure present in the category of digital circuits. We identify the properties that characterise cospans of hypergraphs corresponding to traced comonoid terms, and demonstrate how to identify rewriting contexts valid for rewriting modulo traced comonoid structure. We apply the graph rewriting framework to fixed point operators as well as the operational semantics from the first part, and present a new hardware description language based on these theoretical developments.

en cs.LO, cs.PL
arXiv Open Access 2025
Saving Foundation Flow-Matching Priors for Inverse Problems

Yuxiang Wan, Ryan Devera, Wenjie Zhang et al.

Foundation flow-matching (FM) models promise a universal prior for solving inverse problems (IPs), yet today they trail behind domain-specific or even untrained priors. How can we unlock their potential? We introduce FMPlug, a plug-in framework that redefines how foundation FMs are used in IPs. FMPlug combines an instance-guided, time-dependent warm-start strategy with a sharp Gaussianity regularization, adding problem-specific guidance while preserving the Gaussian structures. This leads to a significant performance boost across image restoration and scientific IPs. Our results point to a path for making foundation FM models practical, reusable priors for IP solving.

en cs.LG, cs.CV
DOAJ Open Access 2025
Parameterization of 3D cloud geometry and a neural-network-based fast forward operator for polarized radiative transfer

A. Weber, G. Köcher, B. Mayer

Clouds generally have a complex three-dimensional geometry. However, realistic three-dimensional radiative transfer simulations of clouds are computationally expensive, so most retrievals of cloud properties assume one-dimensional clouds, which introduces retrieval biases. In this work, a fast forward operator for polarized 3D radiative transfer in the visible wavelength range is presented. To this end, a new approximation for 3D radiative transfer, the InDEpendent column local halF-sphere ApproXimation (IDEFAX), is introduced. The basic idea behind this approximation is similar to the independent column approximation assuming plane-parallel clouds. However, every column is approximated by an independent field of 3D half-spherical clouds instead of a plane-parallel homogeneous cloud. This field of half-spherical clouds is defined by the local cloud surface orientation angles and the cloud fraction. Thus, the IDEFAX has only three more parameters compared to the plane-parallel approximation. To obtain a fast forward operator, artificial neural networks are trained for both the plane-parallel and the half-spherical cloud assumptions. The IDEFAX and the neural network forward operators are validated against polarized 3D radiative transfer simulations with MYSTIC for low-level Arctic mixed-phase clouds using a realistic cloud field simulated with the WRF model. The use of the IDEFAX significantly improves the representation of 3D radiative effects in the simulated radiance fields compared to the plane-parallel independent column approximation. Due to the implementation of the forward operator with neural networks, the computation time for both approximations is comparable and about 5 orders of magnitude faster than full 3D radiative transfer simulations for the shown example. The introduced neural network forward operators are constructed to be used in retrievals of cloud properties with the specMACS instrument. However, the methods are also applicable to other measurements in the visible wavelength range and to model data.

Environmental engineering, Earthwork. Foundations
DOAJ Open Access 2025
The ATMONSYS water vapor DIAL: advanced measurements of short-term variability in the planetary boundary layer

J. Speidel, H. Vogelmann, A. Behrendt et al.

High-resolution measurements of water vapor concentrations and their transport throughout the turbulent planetary boundary layer (PBL) and beyond are key for an enhanced understanding of atmospheric processes. This study presents data from the mobile Atmospheric Monitoring System (ATMONSYS) Differential Absorption Lidar (DIAL), operated with a novel titanium sapphire (Ti:Sa) laser concept, for the first time. The ATMONSYS DIAL aims to resolve turbulence throughout the PBL with a sampling frequency of 10 s and vertical resolutions of less than 200 m. General measuring capabilities during high-noon, clear-sky, summer conditions with a maximum vertical measurement range of >3 km and statistical uncertainties of <5 % are demonstrated. The analysis of turbulence spectra shows good agreement with Kolmogorov's law, demonstrating the system's capability to resolve turbulence. However, deviations from Kolmogorov behavior are observed at certain frequency ranges. By combining the ATMONSYS DIAL with an adjacent high-quality Doppler wind lidar, some of these deviations are mitigated in the co-spectra due to independent noise from both instruments. However, intermediate deviations from Kolmogorov behavior persist, likely due to surrounding surface heterogeneities. The agreement of the co-spectra with Kolmogorov's law at the highest frequencies demonstrates that the ATMONSYS DIAL is capable of resolving turbulent latent energy fluxes down to the measurement's Nyquist frequency of 5 × 10⁻² Hz. A system cross-intercomparison of the ATMONSYS DIAL with two adjacent water vapor Raman lidars and radiosondes shows overall good agreement between the sensors, despite minor DIAL deficiencies under certain conditions with broken clouds passing over the lidar. The observed profile-to-profile DIAL fluctuations and sensor-to-sensor deviations, in combination with low statistical uncertainty, highlight the advantage of humidity lidars, such as the ATMONSYS DIAL, in capturing both short-term and small-scale dynamics of the lowermost atmosphere.

Environmental engineering, Earthwork. Foundations
arXiv Open Access 2024
Music Foundation Model as Generic Booster for Music Downstream Tasks

WeiHsiang Liao, Yuhta Takida, Yukara Ikemiya et al.

We demonstrate the efficacy of using intermediate representations from a single foundation model to enhance various music downstream tasks. We introduce SoniDo, a music foundation model (MFM) designed to extract hierarchical features from target music samples. By leveraging hierarchical intermediate features, SoniDo constrains the information granularity, leading to improved performance across various downstream tasks including both understanding and generative tasks. We specifically evaluated this approach on representative tasks such as music tagging, music transcription, music source separation, and music mixing. Our results reveal that the features extracted from foundation models provide valuable enhancements in training downstream task models. This highlights the capability of using features extracted from music foundation models as a booster for downstream tasks. Our approach not only benefits existing task-specific models but also supports music downstream tasks constrained by data scarcity. This paves the way for more effective and accessible music processing solutions.

en cs.SD, cs.IR
arXiv Open Access 2024
Annotation Free Semantic Segmentation with Vision Foundation Models

Soroush Seifi, Daniel Olmeda Reino, Fabien Despinoy et al.

Semantic Segmentation is one of the most challenging vision tasks, usually requiring large amounts of training data with expensive pixel level annotations. With the success of foundation models and especially vision-language models, recent works attempt to achieve zeroshot semantic segmentation while requiring either large-scale training or additional image/pixel level annotations. In this work, we generate free annotations for any semantic segmentation dataset using existing foundation models. We use CLIP to detect objects and SAM to generate high quality object masks. Next, we build a lightweight module on top of a self-supervised vision encoder, DinoV2, to align the patch features with a pretrained text encoder for zeroshot semantic segmentation. Our approach can bring language-based semantics to any pretrained vision encoder with minimal training, uses foundation models as the sole source of supervision and generalizes from little training data with no annotation.

en cs.CV
arXiv Open Access 2024
OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning

Lingyi Hong, Shilin Yan, Renrui Zhang et al.

Visual object tracking aims to localize the target object of each frame based on its initial appearance in the first frame. Depending on the input modality, tracking tasks can be divided into RGB tracking and RGB+X (e.g. RGB+N, and RGB+D) tracking. Despite the different input modalities, the core aspect of tracking is the temporal matching. Based on this common ground, we present a general framework to unify various tracking tasks, termed OneTracker. OneTracker first performs large-scale pre-training on an RGB tracker called Foundation Tracker. This pretraining phase equips the Foundation Tracker with a stable ability to estimate the location of the target object. Then we regard other modality information as a prompt and build Prompt Tracker upon Foundation Tracker. By freezing the Foundation Tracker and only adjusting some additional trainable parameters, Prompt Tracker inherits the strong localization ability of Foundation Tracker and achieves parameter-efficient finetuning on downstream RGB+X tracking tasks. To evaluate the effectiveness of our general framework OneTracker, which consists of Foundation Tracker and Prompt Tracker, we conduct extensive experiments on 6 popular tracking tasks across 11 benchmarks, and our OneTracker outperforms other models and achieves state-of-the-art performance.

en cs.CV
arXiv Open Access 2024
Real-World Robot Applications of Foundation Models: A Review

Kento Kawaharazuka, Tatsuya Matsushima, Andrew Gambardella et al.

Recent developments in foundation models, like Large Language Models (LLMs) and Vision-Language Models (VLMs), trained on extensive data, facilitate flexible application across different tasks and modalities. Their impact spans various fields, including healthcare, education, and robotics. This paper provides an overview of the practical application of foundation models in real-world robotics, with a primary emphasis on the replacement of specific components within existing robot systems. The summary encompasses the perspective of input-output relationships in foundation models, as well as their role in perception, motion planning, and control within the field of robotics. This paper concludes with a discussion of future challenges and implications for practical robot applications.

en cs.RO, cs.AI
DOAJ Open Access 2023
A semi-Lagrangian method for detecting and tracking deep convective clouds in geostationary satellite observations

W. K. Jones, M. W. Christensen, P. Stier

Automated methods for the detection and tracking of deep convective clouds in geostationary satellite imagery have a vital role in both the forecasting of severe storms and research into their behaviour. Studying the interactions and feedbacks between multiple deep convective clouds (DCC), however, poses a challenge for existing algorithms due to the necessary compromise between false detection and missed detection errors. We utilise an optical flow method to determine the motion of deep convective clouds in GOES-16 ABI imagery in order to construct a semi-Lagrangian framework for the motion of the cloud field, independently of the detection and tracking of cloud objects. The semi-Lagrangian framework allows severe storms to be simultaneously detected and tracked in both spatial and temporal dimensions. For the purpose of this framework we have developed a novel Lagrangian convolution method and a number of novel implementations of morphological image operations that account for the motion of observed objects. These novel methods allow the accurate extension of computer vision techniques to the temporal domain for moving objects such as DCCs. By combining this framework with existing methods for detecting DCCs (including detection of growing cores through cloud top cooling and detection of anvil clouds using brightness temperature), we show that the novel framework enables reductions in errors due to both false and missed detections compared to any of the individual methods, reducing the need to compromise when compared with existing frameworks. The novel framework enables the continuous tracking of anvil clouds associated with detected deep convection after convective activity has stopped, enabling the study of the entire life cycle of DCCs and their associated anvils. Furthermore, we expect this framework to be applicable to a wide range of cases including the detection and tracking of low-level clouds and other atmospheric phenomena. In addition, this framework may be used to combine observations from multiple sources, including satellite observations, weather radar and reanalysis model data.

Environmental engineering, Earthwork. Foundations
