Results for "Earthwork. Foundations"

Showing 20 of ~639,800 results · from DOAJ, arXiv, Semantic Scholar, CrossRef

DOAJ Open Access 2026
Dynamic quantification of methane emissions at facility scale using laser tomography: demonstration of a farm deployment

K. Scheel, E. Vänskä et al.

Detecting and quantifying greenhouse gas (GHG) emissions is essential for understanding global GHG budgets, updating emission inventories, and evaluating climate change mitigation efforts. Most anthropogenic emissions occur at the scale of facilities, and emission distribution in time and space relates to facility operations. This paper presents a novel GHG monitoring technique for facility-scale, dynamic emission quantification under complex wind conditions, referred to as laser dispersion tomography (LDT), which integrates laser dispersion spectroscopy (LDS) with Bayesian inversion methods. It uses sequential multi-beam open-path LDS measurements and wind data to infer dynamic GHG concentration and source maps at facility scale. In this work, the use of LDT for monitoring methane emissions in agriculture is demonstrated by deploying it on an operational farm. To this end, the computational methods used in LDT data analysis are also further developed. In particular, we introduce spatial constraints to the tomographic reconstruction based on prior knowledge of potential source locations – information often available in facility-scale GHG monitoring applications. We investigate numerically whether such constraints could improve the tolerance of LDT to misrepresentations induced by complex wind fields caused by building effects and/or by the presence of interfering external emission sources, both highly likely in a real-world farm environment. The results of the numerical studies indicate that including spatial constraints reduces the uncertainty and improves the reliability of source quantification in such conditions, with one simulation case showing an average reduction in posterior uncertainty of 36.2 %. In the experimental study, dynamic emission patterns caused by various operations on the farm, such as slurry and dry manure management, are well captured, both temporally and spatially. The results support the feasibility of LDT as a tool for robust quantification of GHG mass emission rates at farms, especially when spatial constraining of sources is possible. Owing to the fine spatial and temporal resolution of LDT, we foresee its use in improving GHG emission inventories through fine parametrization, as well as its extension to other GHGs and to other sectors contributing to global emissions.
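The abstract does not spell out the inversion equations. As a rough, hypothetical sketch of the general mechanism it describes — a spatial prior on candidate source locations shrinking posterior uncertainty in a linear Gaussian source inversion — one might write the following; the grid size, beam geometry, kernel, and noise levels are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

n_cells, n_beams = 25, 12           # hypothetical 5x5 source grid, 12 open-path beams
A = rng.random((n_beams, n_cells))  # stand-in for a path-integrated dispersion kernel
x_true = np.zeros(n_cells)
x_true[12] = 5.0                    # one active source cell
y = A @ x_true + rng.normal(0.0, 0.1, n_beams)

def posterior(A, y, prior_var, noise_var=0.01):
    """Posterior mean/covariance of x for y = A x + noise,
    with x ~ N(0, diag(prior_var)) and iid Gaussian noise."""
    P = np.diag(prior_var)
    S = A @ P @ A.T + noise_var * np.eye(len(y))
    K = P @ A.T @ np.linalg.inv(S)
    return K @ y, P - K @ A @ P

# Unconstrained prior: every grid cell may host a source.
m0, C0 = posterior(A, y, np.full(n_cells, 1.0))

# Spatially constrained prior: variance ~0 outside known candidate cells.
pv = np.full(n_cells, 1e-6)
pv[[12, 13]] = 1.0
m1, C1 = posterior(A, y, pv)

# Total posterior variance drops when sources are spatially constrained.
print(np.trace(C1) < np.trace(C0))
```

The constraint enters only through the prior variance: cells outside the known candidate locations are pinned near zero, which is the same qualitative effect the paper reports as reduced posterior uncertainty.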

Environmental engineering, Earthwork. Foundations
arXiv Open Access 2026
EEG Foundation Models: Progresses, Benchmarking, and Open Problems

Dingkun Liu, Yuheng Chen, Zhu Chen et al.

Electroencephalography (EEG) foundation models have recently emerged as a promising paradigm for brain-computer interfaces (BCIs), aiming to learn transferable neural representations from large-scale heterogeneous recordings. Despite rapid progress, fair and comprehensive comparisons of existing EEG foundation models are lacking, due to inconsistent pre-training objectives, preprocessing choices, and downstream evaluation protocols. This paper fills this gap. We first review 50 representative models and organize their design choices into a unified taxonomic framework covering data standardization, model architectures, and self-supervised pre-training strategies. We then evaluate 12 open-source foundation models and competitive specialist baselines across 13 EEG datasets spanning nine BCI paradigms. Emphasizing real-world deployments, we consider both cross-subject generalization under a leave-one-subject-out protocol and rapid calibration under a within-subject few-shot setting. We further compare full-parameter fine-tuning with linear probing to assess the transferability of pre-trained representations, and examine the relationship between model scale and downstream performance. Our results indicate that: 1) linear probing is frequently insufficient; 2) specialist models trained from scratch remain competitive across many tasks; and 3) larger foundation models do not necessarily yield better generalization performance under current data regimes and training practices.

en cs.LG, cs.CV
arXiv Open Access 2026
Mechanistic Foundations of Goal-Directed Control

Alma Lago

Mechanistic interpretability has transformed the analysis of transformer circuits by decomposing model behavior into competing algorithms, identifying phase transitions during training, and deriving closed-form predictions for when and why strategies shift. However, this program has remained largely confined to sequence-prediction architectures, leaving embodied control systems without comparable mechanistic accounts. Here we extend this framework to sensorimotor-cognitive development, using infant motor learning as a model system. We show that foundational inductive biases give rise to causal control circuits, with learned gating mechanisms converging toward theoretically motivated uncertainty thresholds. The resulting dynamics reveal a clean phase transition in the arbitration gate whose commitment behavior is well described by a closed-form exponential moving-average surrogate. We identify context window k as the critical parameter governing circuit formation: below a minimum threshold (k$\leq$4) the arbitration mechanism cannot form; above it (k$\geq$8), gate confidence scales asymptotically as log k. A two-dimensional phase diagram further reveals task-demand-dependent route arbitration consistent with the prediction that prospective execution becomes advantageous only when prediction error remains within the task tolerance window. Together, these results provide a mechanistic account of how reactive and prospective control strategies emerge and compete during learning. More broadly, this work sharpens mechanistic accounts of cognitive development and provides principled guidance for the design of interpretable embodied agents.
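The closed-form surrogate itself is not given in the abstract. As an illustration of the generic fact being invoked — an exponential moving average of a constant commitment signal admits a closed form — here is a minimal sketch; the gate values and smoothing rate are hypothetical, not the paper's:

```python
# Recursive EMA of a constant signal s, starting from g_0 = 0:
#   g_t = (1 - a) * g_{t-1} + a * s   ==>   g_t = s * (1 - (1 - a)**t)

def ema(a: float, steps: int, s: float = 1.0) -> float:
    """Run the EMA recursion for `steps` updates of a constant input s."""
    g = 0.0
    for _ in range(steps):
        g = (1 - a) * g + a * s
    return g

a, steps = 0.2, 10
recursive = ema(a, steps)
closed = 1 - (1 - a) ** steps   # closed form for s = 1
print(abs(recursive - closed) < 1e-12)  # the two forms agree
```

A surrogate of this shape makes "commitment" analyzable in closed form: the gate approaches its asymptote geometrically at rate (1 - a), which is the kind of prediction the paper checks against the learned arbitration gate.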

en cs.LG, eess.SY
DOAJ Open Access 2025
A method to retrieve mixed-phase cloud vertical structure from airborne lidar

E. Crosbie, J. W. Hair et al.

A technique was developed to provide cloud phase information using data collected by the NASA Langley airborne High Spectral Resolution Lidar systems, with a particular emphasis on mixed-phase cloud conditions, where boundaries and gradients in the distribution of ice and liquid water are critically important for microphysical and radiative processes. The method is based on the established use of depolarization to identify ice particles but incorporates a new method to separate the ice depolarization from the depolarization produced by multiple scattering in dense liquid clouds. Clouds known to be liquid-only based on ambient temperature were used to train an empirical model of the multiple-scattering depolarization that results at different ranges from the lidar. The method classifies lidar observations as liquid-dominant, mixed-phase, and ice-dominant, with an additional categorization for oriented ice. For evaluation of the retrieval, a two-aircraft approach was used, with the lidar observing the same clouds that were concurrently being sampled with in situ microphysical probes. Aircraft matchups were able to track individual cloud elements and capture marked changes in the distribution of liquid and ice across flight segments of typically 20–100 km. Qualitative features relating to localized changes in the cloud-top temperature, cloud morphology, and convective circulations were generally replicated between the lidar phase classification and the in situ microphysical data. Quantitative evaluation of the phase classification was carried out using a subset of 15 cloud scenes that satisfied strict aircraft collocation and microphysical requirements. Using the in situ microphysical data, it was found that ice extinction fractions of 14 % and 76 % most closely matched the lower and upper bounds, respectively, of the lidar mixed-phase classification.

Environmental engineering, Earthwork. Foundations
DOAJ Open Access 2025
The Arctic Weather Satellite radiometer

P. Eriksson, A. Emrich, K. Kempe et al.

The Arctic Weather Satellite (AWS) is a project led by the European Space Agency (ESA) that has several novel aspects. From a technical perspective, it serves as a demonstrator of how to expand the network of operational satellite-based microwave sensors cost-effectively, and it acts as the proto-flight model for a proposed constellation of satellites, denoted the EUMETSAT Polar System (EPS) Sterna. The design philosophy has been to reduce complexity and instead focus the effort on critical parts and on characterising the instrument well before launch. The single instrument onboard is a 19-channel microwave cross-track radiometer. Fifteen channels cover ranges around 54, 89 and 174 GHz. These channels are similar to ones found on existing sensors; however, the short development process allowed the use of more modern technology, so their performance and resolution on AWS match or exceed those of similar sensors, despite AWS being a small satellite. Additionally, four channels around 325.15 GHz form a completely new frequency band for observations from space. The addition of these new channels aims to improve sensitivity to ice hydrometeors.

In this article, we outline the mission and describe the instrument in detail, to support the usage of radiances measured by AWS. The satellite was launched in August 2024, and the status towards the end of the commissioning phase is reflected here. For example, a characterisation of the noise performance is provided, showing that the target specifications have been met, for most channels with a margin. The exception is two channels identified as having technical issues before launch. If EPS-Sterna is selected by EUMETSAT, these and other identified problems will be corrected; otherwise, the constellation is expected to consist of recurrent models of AWS with minor modifications.

Environmental engineering, Earthwork. Foundations
arXiv Open Access 2025
CytoFM: The first cytology foundation model

Vedrana Ivezić, Ashwath Radhachandran, Ekaterina Redekop et al.

Cytology is essential for cancer diagnostics and screening due to its minimally invasive nature. However, the development of robust deep learning models for digital cytology is challenging due to the heterogeneity in staining and preparation methods of samples, differences across organs, and the limited availability of large, diverse, annotated datasets. Developing a task-specific model for every cytology application is impractical, and non-cytology-specific foundation models struggle to generalize to tasks in this domain, where the emphasis is on cell morphology. To address these challenges, we introduce CytoFM, the first cytology self-supervised foundation model. Using iBOT, a self-supervised Vision Transformer (ViT) training framework incorporating masked image modeling and self-distillation, we pretrain CytoFM on a diverse collection of cytology datasets to learn robust, transferable representations. We evaluate CytoFM on multiple downstream cytology tasks, including breast cancer classification and cell type identification, using an attention-based multiple instance learning framework. Our results demonstrate that CytoFM performs better on two out of three downstream tasks than existing foundation models pretrained on histopathology (UNI) or natural images (iBOT-Imagenet). Visualizations of learned representations demonstrate that our model is able to attend to cytologically relevant features. Despite a small pre-training dataset, CytoFM's promising results highlight the ability of task-agnostic pre-training approaches to learn robust and generalizable features from cytology data.

en cs.CV
arXiv Open Access 2025
Multi-Modal Foundation Models for Computational Pathology: A Survey

Dong Li, Guihong Wan, Xintao Wu et al.

Foundation models have emerged as a powerful paradigm in computational pathology (CPath), enabling scalable and generalizable analysis of histopathological images. While early developments centered on uni-modal models trained solely on visual data, recent advances have highlighted the promise of multi-modal foundation models that integrate heterogeneous data sources such as textual reports, structured domain knowledge, and molecular profiles. In this survey, we provide a comprehensive and up-to-date review of multi-modal foundation models in CPath, with a particular focus on models built upon hematoxylin and eosin (H&E) stained whole slide images (WSIs) and tile-level representations. We categorize 32 state-of-the-art multi-modal foundation models into three major paradigms: vision-language, vision-knowledge graph, and vision-gene expression. We further divide vision-language models into non-LLM-based and LLM-based approaches. Additionally, we analyze 28 available multi-modal datasets tailored for pathology, grouped into image-text pairs, instruction datasets, and image-other modality pairs. Our survey also presents a taxonomy of downstream tasks, highlights training and evaluation strategies, and identifies key challenges and future directions. We aim for this survey to serve as a valuable resource for researchers and practitioners working at the intersection of pathology and AI.

en cs.CV, cs.AI
arXiv Open Access 2025
REOBench: Benchmarking Robustness of Earth Observation Foundation Models

Xiang Li, Yong Tao, Siyuan Zhang et al.

Earth observation foundation models have shown strong generalization across multiple Earth observation tasks, but their robustness under real-world perturbations remains underexplored. To bridge this gap, we introduce REOBench, the first comprehensive benchmark for evaluating the robustness of Earth observation foundation models across six tasks and twelve types of image corruption, including both appearance-based and geometric perturbations. To ensure realistic and fine-grained evaluation, our benchmark focuses on high-resolution optical remote sensing images, which are widely used in critical applications such as urban planning and disaster response. We conduct a systematic evaluation of a broad range of models trained using masked image modeling, contrastive learning, and vision-language pre-training paradigms. Our results reveal that (1) existing Earth observation foundation models experience significant performance degradation when exposed to input corruptions; (2) the severity of degradation varies across tasks, model architectures, backbone sizes, and types of corruption, with performance drops ranging from less than 1% to over 20%; and (3) vision-language models show enhanced robustness, particularly in multimodal tasks. REOBench underscores the vulnerability of current Earth observation foundation models to real-world corruptions and provides actionable insights for developing more robust and reliable models. Code and data are publicly available at https://github.com/lx709/REOBench.

en cs.CV
arXiv Open Access 2025
PuzzlePlex: Benchmarking Foundation Models on Reasoning and Planning with Puzzles

Yitao Long, Yuru Jiang, Hongjun Liu et al.

This work investigates the reasoning and planning capabilities of foundation models and their scalability in complex, dynamic environments. We introduce PuzzlePlex, a benchmark designed to assess these capabilities through a diverse set of puzzles. PuzzlePlex consists of 15 types of puzzles, including deterministic and stochastic games of varying difficulty, as well as single-player and two-player scenarios. The PuzzlePlex framework provides a comprehensive environment for each game, and supports extensibility to generate more challenging instances as foundation models evolve. Additionally, we implement customized game-playing strategies for comparison. Building on this benchmark, we develop fine-grained metrics to measure performance and conduct an in-depth analysis of frontier foundation models across two settings: instruction-based and code-based. Furthermore, we systematically investigate their scaling limits. Our findings show that reasoning models outperform others in instruction-based settings, while code-based execution presents greater challenges but offers a scalable and efficient alternative. PuzzlePlex enables targeted evaluation and guides future improvements in reasoning, planning, and generalization for foundation models.

en cs.AI, cs.CL
arXiv Open Access 2025
Towards Decentralized and Sustainable Foundation Model Training with the Edge

Leyang Xue, Meghana Madhyastha, Randal Burns et al.

Foundation models are at the forefront of AI research, appealing for their ability to learn from vast datasets and cater to diverse tasks. Yet, their significant computational demands raise issues of environmental impact and the risk of centralized control in their development. We put forward a vision towards decentralized and sustainable foundation model training that leverages the collective compute of sparingly used connected edge AI devices. We present the rationale behind our vision, particularly in support of its sustainability benefit. We further outline a set of challenges that need to be addressed to turn this vision into reality.

en cs.LG
arXiv Open Access 2025
On the Status of Foundation Models for SAR Imagery

Nathan Inkawhich

In this work we investigate the viability of foundational AI/ML models for Synthetic Aperture Radar (SAR) object recognition tasks. We are inspired by the tremendous progress being made in the wider community, particularly in the natural image domain where frontier labs are training huge models on web-scale datasets with unprecedented computing budgets. It has become clear that these models, often trained with Self-Supervised Learning (SSL), will transform how we develop AI/ML solutions for object recognition tasks - they can be adapted downstream with very limited labeled data, they are more robust to many forms of distribution shift, and their features are highly transferable out-of-the-box. For these reasons and more, we are motivated to apply this technology to the SAR domain. In our experiments we first run tests with today's most powerful visual foundational models, including DINOv2, DINOv3 and PE-Core and observe their shortcomings at extracting semantically-interesting discriminative SAR target features when used off-the-shelf. We then show that Self-Supervised finetuning of publicly available SSL models with SAR data is a viable path forward by training several AFRL-DINOv2s and setting a new state-of-the-art for SAR foundation models, significantly outperforming today's best SAR-domain model SARATR-X. Our experiments further analyze the performance trade-off of using different backbones with different downstream task-adaptation recipes, and we monitor each model's ability to overcome challenges within the downstream environments (e.g., extended operating conditions and low amounts of labeled data). We hope this work will inform and inspire future SAR foundation model builders, because despite our positive results, we still have a long way to go.

en cs.CV, eess.IV
arXiv Open Access 2025
Theoretical Foundations of Representation Learning using Unlabeled Data: Statistics and Optimization

Pascal Esser, Maximilian Fleissner, Debarghya Ghoshdastidar

Representation learning from unlabeled data has been extensively studied in statistics, data science and signal processing, with a rich literature on techniques for dimension reduction, compression, and multi-dimensional scaling, among others. However, current deep learning models use new principles for unsupervised representation learning that cannot be easily analyzed using classical theories. For example, visual foundation models have found tremendous success using self-supervision or denoising/masked autoencoders, which effectively learn representations from massive amounts of unlabeled data. Yet it remains difficult to characterize the representations learned by these models and to explain why they perform well for diverse prediction tasks or show emergent behavior. To answer these questions, one needs to combine mathematical tools from statistics and optimization. This paper provides an overview of recent theoretical advances in representation learning from unlabeled data and mentions our contributions in this direction.

en cs.LG
arXiv Open Access 2025
Position: Beyond Euclidean -- Foundation Models Should Embrace Non-Euclidean Geometries

Neil He, Jiahong Liu, Buze Zhang et al.

In the era of foundation models and Large Language Models (LLMs), Euclidean space has been the de facto geometric setting for machine learning architectures. However, recent literature has demonstrated that this choice comes with fundamental limitations. At a large scale, real-world data often exhibits inherently non-Euclidean structures, such as multi-way relationships, hierarchies, symmetries, and non-isotropic scaling, in a variety of domains, such as languages, vision, and the natural sciences. It is challenging to effectively capture these structures within the constraints of Euclidean spaces. This position paper argues that moving beyond Euclidean geometry is not merely an optional enhancement but a necessity to maintain the scaling law for the next-generation of foundation models. By adopting these geometries, foundation models could more efficiently leverage the aforementioned structures. Task-aware adaptability that dynamically reconfigures embeddings to match the geometry of downstream applications could further enhance efficiency and expressivity. Our position is supported by a series of theoretical and empirical investigations of prevalent foundation models. Finally, we outline a roadmap for integrating non-Euclidean geometries into foundation models, including strategies for building geometric foundation models via fine-tuning, training from scratch, and hybrid approaches.

en cs.LG, cs.AI
arXiv Open Access 2025
PRISM: Distributed Inference for Foundation Models at Edge

Muhammad Azlan Qazi, Alexandros Iosifidis, Qi Zhang

Foundation models (FMs) have achieved remarkable success across a wide range of applications, from image classification to natural language processing, but pose significant challenges for deployment at the edge. This has sparked growing interest in developing practical and efficient strategies for bringing foundation models to edge environments. In this work, we propose PRISM, a communication-efficient and compute-aware strategy for distributed Transformer inference on edge devices. Our method leverages a Segment Means representation to approximate intermediate output features, drastically reducing inter-device communication. Additionally, we restructure the self-attention mechanism to eliminate redundant computations caused by per-device Key/Value calculation in position-wise partitioning, and we design a partition-aware causal masking scheme tailored for autoregressive models. We evaluate PRISM on ViT, BERT, and GPT-2 across diverse datasets, namely CIFAR-10, CIFAR-100, ImageNet-1k, GLUE, and CBT. Our results demonstrate substantial reductions in communication overhead (up to 99.2% for BERT at compression rate CR = 128) and per-device computation (51.24% for BERT at the same setting), with only minor accuracy degradation. This method offers a scalable and practical solution for deploying foundation models in distributed, resource-constrained environments.
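The abstract does not define the Segment Means representation precisely. A minimal sketch of the general idea — block-averaging an intermediate feature vector at a given compression rate before sending it to another device — could look like this; the padding rule and values are invented for illustration:

```python
import numpy as np

def segment_means(x: np.ndarray, cr: int) -> np.ndarray:
    """Compress a 1-D feature vector by replacing each length-`cr`
    segment with its mean (compression rate CR); pads the tail by
    repeating the last value so the length divides evenly."""
    pad = (-len(x)) % cr
    xp = np.concatenate([x, np.repeat(x[-1], pad)])
    return xp.reshape(-1, cr).mean(axis=1)

x = np.arange(8, dtype=float)   # stand-in for intermediate output features
z = segment_means(x, cr=4)      # 8 values -> 2 segment means
print(z.tolist())               # [1.5, 5.5]
```

At CR = 128 each device would transmit 1/128 as many values per feature vector, which is the communication saving the abstract quantifies; the receiving side works with the coarse means in place of the exact features.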

en cs.LG, cs.AI
DOAJ Open Access 2024
Geometrical and optical properties of cirrus clouds in Barcelona, Spain: analysis with the two-way transmittance method of 4 years of lidar measurements

C. Gil-Díaz, M. Sicard et al.

In this paper a statistical study of cirrus geometrical and optical properties, based on 4 years of continuous ground-based lidar measurements with the Barcelona (Spain) Micro Pulse Lidar (MPL), is presented. First, a review of the literature on the two-way transmittance method is given. This well-known lidar inversion method is used to retrieve the optical properties of an aerosol–cloud layer between two molecular (i.e. aerosol- and cloud-free) regions below and above, without the need to make any a priori assumptions about the layer's optical and/or microphysical properties. Second, a simple mathematical expression of the two-way transmittance method is proposed for both ground-based and spaceborne lidar systems. This approach allows the retrieval of the cloud optical depth, the cloud column lidar ratio and the vertical profile of the cloud backscatter coefficient. The method is illustrated for a cirrus cloud using measurements from the ground-based MPL and from the spaceborne Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP). Third, the database is filtered with a cirrus identification criterion based on (and compared to) the literature, using only lidar and radiosonde data. During the period from November 2018 to September 2022, 367 high-altitude cirrus clouds were identified at 00:00 and 12:00 UTC, of which 203 were successfully inverted with the two-way transmittance method. The statistical results for these 203 high-altitude cirrus clouds show that the cloud thickness is 1.8 ± 1.1 km, the mid-cloud temperature is −51 ± 8 °C and the linear cloud depolarization ratio is 0.32 ± 0.13. The application of the transmittance method yields an average cloud optical depth (COD) of 0.36 ± 0.45 and a mean effective column lidar ratio of 30 ± 19 sr. Statistical results for the errors associated with the two-way transmittance retrievals are also provided. The highest occurrence of cirrus is observed in spring, and the majority of cirrus clouds (48 %) are visible (0.03 < COD < 0.3), followed by opaque clouds (COD > 0.3) at 38 %. Possible latitudinal dependencies have been analysed using results from other sites, along with correlations between cirrus cloud properties. For example, we noted that in Barcelona the COD correlates positively with the cloud base temperature, effective column lidar ratio and linear cloud depolarization ratio, and negatively with the cloud base height.
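As a numerical illustration of the core relation behind the two-way transmittance method — the cloud optical depth follows directly from the measured two-way transmittance T² = exp(−2 · COD) — here is a minimal sketch; the transmittance value is hypothetical, and multiple-scattering corrections are deliberately ignored:

```python
import math

def cloud_optical_depth(t2: float) -> float:
    """Invert the two-way transmittance T2 = exp(-2 * COD) for the
    cloud optical depth (multiple-scattering corrections neglected)."""
    return -0.5 * math.log(t2)

# Hypothetical two-way transmittance measured across the cloud layer,
# from molecular-only signals below and above it:
t2 = 0.49
print(round(cloud_optical_depth(t2), 3))  # 0.357
```

In practice t2 is estimated from the ratio of the attenuated molecular signal above the cloud to that extrapolated from below it, which is why the method needs clean molecular regions on both sides of the layer.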

Environmental engineering, Earthwork. Foundations
DOAJ Open Access 2024
Using open-path dual-comb spectroscopy to monitor methane emissions from simulated grazing cattle

C. Weerasekara, L. C. Morris, N. A. Malarich et al.

Accurate whole-farm or herd-level measurements of livestock methane emissions are necessary for anthropogenic greenhouse gas inventories and to evaluate mitigation strategies. A controlled methane (CH₄) release experiment was performed to determine if dual-comb spectroscopy (DCS) can detect CH₄ concentration enhancements produced by a typical herd of beef cattle in an extensive grazing system. Open-path DCS was used to measure downwind and upwind CH₄ concentrations from 10 point sources of methane simulating cattle emissions. The CH₄ mole fractions and wind velocity data were used to calculate CH₄ flux using an inverse dispersion model, and the simulated fluxes were then compared to the actual CH₄ release rate. For a source located 60 m from the downwind path, the DCS system detected a 10 nmol mol⁻¹ CH₄ horizontal concentration gradient above the atmospheric background with a precision of 6 nmol mol⁻¹ over a 15 min interval. A CH₄ release of 3970 g d⁻¹ was performed, resulting in an average concentration enhancement of 24 nmol mol⁻¹ of CH₄. The calculated CH₄ flux was 4002 g d⁻¹, in good agreement with the actual release rate. Periodically altering the downwind path, which may be needed to track moving cattle, did not adversely affect the ability of the instruments to determine the CH₄ flux. These results give us confidence that CH₄ fluxes from grazing cattle can be determined with low disturbance via direct field-scale measurements.
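At its core, the inverse dispersion step scales the measured downwind-minus-upwind concentration enhancement by the dispersion model's simulated enhancement per unit emission. The sketch below shows only that final scaling; the numbers are invented to loosely echo the magnitudes in the abstract, and the hard part — computing (C/Q)_sim from wind data — is assumed done:

```python
def inferred_flux(c_down: float, c_up: float, c_over_q_sim: float) -> float:
    """Inverse-dispersion estimate Q = (C_downwind - C_upwind) / (C/Q)_sim,
    where (C/Q)_sim is the dispersion model's predicted concentration
    enhancement per unit emission rate for the given wind and geometry."""
    return (c_down - c_up) / c_over_q_sim

# Hypothetical values: a 24 nmol/mol enhancement over background and a
# simulated enhancement of 6e-3 nmol/mol per (g/d) give a ~4000 g/d flux.
q = inferred_flux(c_down=1924.0, c_up=1900.0, c_over_q_sim=6e-3)
print(round(q))  # 4000
```

The quality of the estimate thus rests entirely on how well the dispersion model predicts (C/Q)_sim, which is why the experiment validates against a known controlled release.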

Environmental engineering, Earthwork. Foundations
arXiv Open Access 2024
Scaling Wearable Foundation Models

Girish Narayanswamy, Xin Liu, Kumar Ayush et al.

Wearable sensors have become ubiquitous thanks to a variety of health tracking features. The resulting continuous and longitudinal measurements from everyday life generate large volumes of data; however, making sense of these observations for scientific and actionable insights is non-trivial. Inspired by the empirical success of generative modeling, where large neural networks learn powerful representations from vast amounts of text, image, video, or audio data, we investigate the scaling properties of sensor foundation models across compute, data, and model size. Using a dataset of up to 40 million hours of in-situ heart rate, heart rate variability, electrodermal activity, accelerometer, skin temperature, and altimeter per-minute data from over 165,000 people, we create LSM, a multimodal foundation model built on the largest wearable-signals dataset with the most extensive range of sensor modalities to date. Our results establish the scaling laws of LSM for tasks such as imputation, interpolation and extrapolation, both across time and sensor modalities. Moreover, we highlight how LSM enables sample-efficient downstream learning for tasks like exercise and activity recognition.

en cs.LG, cs.AI
arXiv Open Access 2024
A New Type of Foundation Model Based on Recordings of People's Emotions and Physiology

David Gamez, Dionis Barcari, Aliya Grig

Foundation models have had a big impact in recent years and billions of dollars are being invested in them in the current AI boom. The more popular ones, such as ChatGPT, are trained on large amounts of data from the Internet, and then reinforcement learning, RAG, prompt engineering and cognitive modelling are used to fine-tune and augment their behavior. This technology has been used to create models of individual people, such as Caryn Marjorie. However, these chatbots are not based on people's actual emotional and physiological responses to their environment, so they are, at best, surface-level approximations to the characters they are imitating. This paper describes how a new type of foundation model - a first-person foundation model - could be created from recordings of what a person sees and hears as well as their emotional and physiological reactions to these stimuli. A first-person foundation model would map environmental stimuli to a person's emotional and physiological states, and map a person's emotional and physiological states to their behavior. First-person foundation models have many exciting applications, including a new type of recommendation engine, personal assistants, generative adversarial networks, dating and recruitment. To obtain training data for a first-person foundation model, we have developed a recording rig that captures what the wearer is seeing and hearing as well as their emotional and physiological states. This novel source of data could help to address the shortage of new data for building the next generation of foundation models.

en cs.AI, cs.LG

Page 48 of 31,990