Results for "Earthwork. Foundations"

Showing 20 of ~639822 results · from DOAJ, arXiv, Semantic Scholar, CrossRef

arXiv Open Access 2026
A Foundation Model for Virtual Sensors

Leon Götz, Lars Frederik Peiss, Erik Sauer et al.

Virtual sensors use machine learning to predict target signals from available measurements, replacing expensive physical sensors in critical applications. Existing virtual sensor approaches require application-specific models with hand-selected inputs for each sensor, cannot leverage task synergies, and lack consistent benchmarks. At the same time, emerging time series foundation models are computationally expensive and limited to predicting their input signals, making them incompatible with virtual sensors. We introduce the first foundation model for virtual sensors addressing both limitations. Our unified model can simultaneously predict diverse virtual sensors exploiting synergies while maintaining computational efficiency. It learns relevant input signals for each virtual sensor, eliminating expert knowledge requirements while adding explainability. In our large-scale evaluation on a standard benchmark and an application-specific dataset with over 18 billion samples, our architecture achieves 415x reduction in computation time and 951x reduction in memory requirements, while maintaining or even improving predictive quality compared to baselines. Our model scales gracefully to hundreds of virtual sensors with nearly constant parameter count, enabling practical deployment in large-scale sensor networks.

en cs.LG
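The learned input-relevance idea in the abstract above can be illustrated with a minimal, self-contained sketch (everything here, the signals, the linear model, and the relevance score, is a synthetic stand-in, not the paper's architecture): fit a linear virtual sensor from the available channels and read a crude per-input relevance off the weight magnitudes.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy measurement set: 5 available signals, 1 target signal to be replaced
# by a "virtual sensor". The target depends on signals 0 and 2 only, so a
# reasonable model should surface exactly those as relevant inputs.
n = 1000
S = rng.normal(size=(n, 5))                       # available measurements
target = 2.0 * S[:, 0] - 1.5 * S[:, 2] + 0.1 * rng.normal(size=n)

# Minimal linear virtual sensor fit by ridge regression (closed form).
lam = 1e-3
w = np.linalg.solve(S.T @ S + lam * np.eye(5), S.T @ target)

# Crude per-input relevance: normalized absolute weights.
relevance = np.abs(w) / np.abs(w).sum()
top_two = np.argsort(relevance)[::-1][:2]
print(sorted(top_two.tolist()))                   # the true inputs, [0, 2]
```

A foundation model replaces the per-sensor linear fit with one shared network, but the interpretability payoff is the same: each virtual sensor exposes which inputs it actually uses.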
DOAJ Open Access 2025
Assimilation of global navigation satellite system (GNSS) zenith delays and tropospheric gradients: a sensitivity study utilizing sparse and dense station networks

R. Thundathil, R. Thundathil, F. Zus et al.

The assimilation of global navigation satellite system (GNSS) zenith total delays (ZTDs) into numerical weather models improves weather forecasts. In addition, the GNSS tropospheric gradient (TG) estimates provide valuable insight into the moisture distribution in the lower troposphere. In this study, we utilize a newly developed forward operator for TGs to investigate the sensitivity effects of incorporating TGs into the Weather Research and Forecasting model at varying station network densities. We assimilated ZTD and TGs from dense and sparse station networks (0.5 and 1°, respectively). Through this study, we found that the improvement in the humidity field with the assimilation of ZTD and TGs from the sparse station network (1° resolution) is comparable to the improvement achieved by assimilating ZTD only from the dense station network (0.5° resolution). These results encourage the assimilation of TGs alongside ZTDs in operational weather forecasting agencies, especially in regions with few GNSS stations. In particular, assimilating TGs alongside ZTDs from sparse GNSS networks can be a cost-effective way to enhance the accuracy of the model fields and subsequent forecast quality.

Environmental engineering, Earthwork. Foundations
arXiv Open Access 2025
Finetuning a Weather Foundation Model with Lightweight Decoders for Unseen Physical Processes

Fanny Lehmann, Firat Ozdemir, Benedikt Soja et al.

Recent advances in AI weather forecasting have led to the emergence of so-called "foundation models", typically defined by expensive pretraining and minimal fine-tuning for downstream tasks. However, in the natural sciences, a desirable foundation model should also encode meaningful statistical relationships between the underlying physical variables. This study evaluates the performance of the state-of-the-art Aurora foundation model in predicting hydrological variables, which were not considered during pretraining. We introduce a lightweight approach using shallow decoders trained on the latent representations of the pretrained model to predict these new variables. As a baseline, we compare this to fine-tuning the full model, which allows further optimization of the latent space while incorporating new variables into both inputs and outputs. The decoder-based approach requires 50% less training time and 35% less memory, while achieving strong accuracy across various hydrological variables and preserving desirable properties of the foundation model, such as autoregressive stability. Notably, decoder accuracy depends on the physical correlation between the new variables and those used during pretraining, indicating that Aurora's latent space captures meaningful physical relationships. In this sense, we argue that an important quality metric for foundation models in Earth sciences is their ability to be extended to new variables without a full fine-tuning. This provides a new perspective for making foundation models more accessible to communities with limited computational resources, while supporting broader adoption in Earth sciences.

en cs.LG
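A hedged, toy illustration of the decoder-on-frozen-latents recipe from the abstract above (the encoder, data, and target variable below are synthetic stand-ins; Aurora's actual latent space is not involved): keep a pretrained encoder fixed and fit only a lightweight linear decoder for a new variable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pretrained encoder: a fixed random feature map from
# 4 input fields to a 16-dim latent vector. Its weights are never updated.
W_enc = 0.3 * rng.normal(size=(16, 4))

def encode(x):
    return np.tanh(x @ W_enc.T)                   # latent representation

# Synthetic "new" variable, correlated with the encoder's input fields
# (mimicking a hydrological variable not seen during pretraining).
X = rng.normal(size=(500, 4))
y = 0.8 * X[:, 0] - 0.3 * X[:, 1] + 0.05 * rng.normal(size=500)

# Lightweight decoder: ridge regression on the frozen latents only.
Z = encode(X)
lam = 1e-2
w = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ y)

pred = Z @ w
r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"decoder R^2 on the new variable: {r2:.2f}")
```

Because the toy target is well correlated with the encoder's inputs, the frozen-latent decoder recovers it accurately, mirroring the abstract's observation that decoder accuracy tracks the physical correlation between new and pretrained variables.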
arXiv Open Access 2025
Resource-efficient Inference with Foundation Model Programs

Lunyiu Nie, Zhimin Ding, Kevin Yu et al.

The inference-time resource costs of large language and vision models present a growing challenge in production deployments. We propose the use of foundation model programs, i.e., programs that can invoke foundation models with varying resource costs and performance, as an approach to this problem. Specifically, we present a method that translates a task into a program, then learns a policy for resource allocation that, on each input, selects foundation model "backends" for each program module. The policy uses smaller, cheaper backends to handle simpler subtasks, while allowing more complex subtasks to leverage larger, more capable models. We evaluate the method on two new "streaming" visual question-answering tasks in which a system answers a question on a sequence of inputs, receiving ground-truth feedback after each answer. Compared to monolithic multi-modal models, our implementation achieves up to 98% resource savings with minimal accuracy loss, demonstrating its potential for scalable and resource-efficient multi-modal inference.

en cs.LG
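The backend-selection idea above can be sketched with a toy threshold policy (the backends, costs, and confidence rule are illustrative assumptions, not the paper's learned policy): send each input to a cheap backend first and escalate to the expensive one only when the cheap backend is unsure.

```python
import random

random.seed(0)
COST_SMALL, COST_LARGE = 1.0, 20.0                # hypothetical per-call costs

def small_backend(x):
    """Cheap model: confident only away from the decision boundary at 0.5."""
    confidence = abs(x - 0.5) * 2                 # 0 (unsure) .. 1 (certain)
    return (x > 0.5), confidence

def large_backend(x):
    """Expensive model, assumed reliable everywhere."""
    return (x > 0.5), 1.0

def route(x, threshold=0.3):
    """Policy: escalate only when the small backend's confidence is low."""
    pred, conf = small_backend(x)
    if conf >= threshold:
        return pred, COST_SMALL
    pred, _ = large_backend(x)
    return pred, COST_SMALL + COST_LARGE

inputs = [random.random() for _ in range(1000)]
total_cost = sum(route(x)[1] for x in inputs)
baseline = COST_LARGE * len(inputs)               # always use the large model
saving = 1 - total_cost / baseline
print(f"resource saving vs. large-model-only: {saving:.0%}")
```

A learned policy, as in the paper, would replace the fixed threshold with a per-module, per-input choice of backend, but the cost accounting is the same.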
arXiv Open Access 2025
On the Status of Foundation Models for SAR Imagery

Nathan Inkawhich

In this work we investigate the viability of foundational AI/ML models for Synthetic Aperture Radar (SAR) object recognition tasks. We are inspired by the tremendous progress being made in the wider community, particularly in the natural image domain where frontier labs are training huge models on web-scale datasets with unprecedented computing budgets. It has become clear that these models, often trained with Self-Supervised Learning (SSL), will transform how we develop AI/ML solutions for object recognition tasks - they can be adapted downstream with very limited labeled data, they are more robust to many forms of distribution shift, and their features are highly transferable out-of-the-box. For these reasons and more, we are motivated to apply this technology to the SAR domain. In our experiments we first run tests with today's most powerful visual foundational models, including DINOv2, DINOv3 and PE-Core and observe their shortcomings at extracting semantically-interesting discriminative SAR target features when used off-the-shelf. We then show that Self-Supervised finetuning of publicly available SSL models with SAR data is a viable path forward by training several AFRL-DINOv2s and setting a new state-of-the-art for SAR foundation models, significantly outperforming today's best SAR-domain model SARATR-X. Our experiments further analyze the performance trade-off of using different backbones with different downstream task-adaptation recipes, and we monitor each model's ability to overcome challenges within the downstream environments (e.g., extended operating conditions and low amounts of labeled data). We hope this work will inform and inspire future SAR foundation model builders, because despite our positive results, we still have a long way to go.

en cs.CV, eess.IV
arXiv Open Access 2025
Foundations of Safe Online Reinforcement Learning in the Linear Quadratic Regulator: $\sqrt{T}$-Regret

Benjamin Schiffer, Lucas Janson

Understanding how to efficiently learn while adhering to safety constraints is essential for using online reinforcement learning in practical applications. However, proving rigorous regret bounds for safety-constrained reinforcement learning is difficult due to the complex interaction between safety, exploration, and exploitation. In this work, we seek to establish foundations for safety-constrained reinforcement learning by studying the canonical problem of controlling a one-dimensional linear dynamical system with unknown dynamics. We study the safety-constrained version of this problem, where the state must with high probability stay within a safe region, and we provide the first safe algorithm that achieves regret of $\tilde{O}_T(\sqrt{T})$. Furthermore, the regret is with respect to the baseline of truncated linear controllers, a natural baseline of non-linear controllers that are well-suited for safety-constrained linear systems. In addition to introducing this new baseline, we also prove several desirable continuity properties of the optimal controller in this baseline. In showing our main result, we prove that whenever the constraints impact the optimal controller, the non-linearity of our controller class leads to a faster rate of learning than in the unconstrained setting.

en stat.ML, cs.LG
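As a hedged aside (this is a standard formulation of constrained-LQR regret under assumed notation; the paper's exact definitions may differ), the $\tilde{O}_T(\sqrt{T})$ claim refers to cumulative excess quadratic cost relative to the best safe baseline controller:

$$ \mathrm{Regret}(T) \;=\; \sum_{t=0}^{T-1} \big( q x_t^2 + r u_t^2 \big) \;-\; \min_{\pi \in \Pi_{\mathrm{safe}}} \mathbb{E}\!\left[ \sum_{t=0}^{T-1} \big( q (x_t^{\pi})^2 + r (u_t^{\pi})^2 \big) \right], $$

where $\Pi_{\mathrm{safe}}$ is the class of truncated linear controllers that keep the state $x_t$ inside the safe region with high probability.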
arXiv Open Access 2025
SafeProtein: Red-Teaming Framework and Benchmark for Protein Foundation Models

Jigang Fan, Zhenghong Zhou, Ruofan Jin et al.

Proteins play crucial roles in almost all biological processes. The advancement of deep learning has greatly accelerated the development of protein foundation models, leading to significant successes in protein understanding and design. However, the lack of systematic red-teaming for these models has raised serious concerns about their potential misuse, such as generating proteins with biological safety risks. This paper introduces SafeProtein, the first red-teaming framework designed for protein foundation models to the best of our knowledge. SafeProtein combines multimodal prompt engineering and heuristic beam search to systematically design red-teaming methods and conduct tests on protein foundation models. We also curated SafeProtein-Bench, which includes a manually constructed red-teaming benchmark dataset and a comprehensive evaluation protocol. SafeProtein achieved continuous jailbreaks on state-of-the-art protein foundation models (up to 70% attack success rate for ESM3), revealing potential biological safety risks in current protein foundation models and providing insights for the development of robust security protection technologies for frontier models. The codes will be made publicly available at https://github.com/jigang-fan/SafeProtein.

en cs.LG, cs.AI
arXiv Open Access 2025
Information Filtering Networks: Theoretical Foundations, Generative Methodologies, and Real-World Applications

Tomaso Aste

Information Filtering Networks (IFNs) provide a powerful framework for modeling complex systems through globally sparse yet locally dense and interpretable structures that capture multivariate dependencies. This review offers a comprehensive account of IFNs, covering their theoretical foundations, construction methodologies, and diverse applications. Tracing their origins from early network-based models to advanced formulations such as the Triangulated Maximally Filtered Graph (TMFG) and the Maximally Filtered Clique Forest (MFCF), the paper highlights how IFNs address key challenges in high-dimensional data-driven modeling. IFNs and their construction methodologies are intrinsically higher-order networks that generate simplicial complexes, structures that are only now becoming popular in the broader literature. Applications span fields including finance, biology, psychology, and artificial intelligence, where IFNs improve interpretability, computational efficiency, and predictive performance. Special attention is given to their role in graphical modeling, where IFNs enable the estimation of sparse inverse covariance matrices with greater accuracy and scalability than traditional approaches like Graphical LASSO. Finally, the review discusses recent developments that integrate IFNs with machine learning and deep learning, underscoring their potential not only to bridge classical network theory with contemporary data-driven paradigms, but also to shape the architectures of deep learning models themselves.

en cs.LG
arXiv Open Access 2025
EEG-Bench: A Benchmark for EEG Foundation Models in Clinical Applications

Ard Kastrati, Josua Bürki, Jonas Lauer et al.

We introduce a unified benchmarking framework focused on evaluating EEG-based foundation models in clinical applications. The benchmark spans 11 well-defined diagnostic tasks across 14 publicly available EEG datasets, including epilepsy, schizophrenia, Parkinson's disease, OCD, and mild traumatic brain injury. It features minimal preprocessing, standardized evaluation protocols, and enables side-by-side comparisons of classical baselines and modern foundation models. Our results show that while foundation models achieve strong performance in certain settings, simpler models often remain competitive, particularly under clinical distribution shifts. To facilitate reproducibility and adoption, we release all prepared data and code in an accessible and extensible format.

en cs.LG, cs.AI
arXiv Open Access 2025
Scalable Geospatial Data Generation Using AlphaEarth Foundations Model

Luc Houriez, Sebastian Pilarski, Behzad Vahedi et al.

High-quality labeled geospatial datasets are essential for extracting insights and understanding our planet. Unfortunately, these datasets often do not span the entire globe and are limited to certain geographic regions where data was collected. Google DeepMind's recently released AlphaEarth Foundations (AEF) provides an information-dense global geospatial representation designed to serve as a useful input across a wide gamut of tasks. In this article we propose and evaluate a methodology which leverages AEF to extend geospatial labeled datasets beyond their initial geographic regions. We show that even basic models like random forests or logistic regression can be used to accomplish this task. We investigate a case study of extending LANDFIRE's Existing Vegetation Type (EVT) dataset beyond the USA into Canada at two levels of granularity: EvtPhys (13 classes) and EvtGp (80 classes). Qualitatively, for EvtPhys, model predictions align with ground truth. Trained models achieve 81% and 73% classification accuracy on EvtPhys validation sets in the USA and Canada, despite discussed limitations.

en cs.LG, cs.CV
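A minimal sketch of the label-extension recipe from the abstract above (embeddings, classes, and the domain shift are all synthetic stand-ins for AEF features and EVT labels): train a simple classifier on embeddings from a labeled region, then apply it to an unlabeled one.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_region(n, domain_shift):
    """Synthetic stand-in for per-pixel embeddings with 2 land-cover classes."""
    y = rng.integers(0, 2, size=n)
    centers = np.where(y[:, None] == 1, 1.0, -1.0)    # class means at +/-1
    X = centers + rng.normal(scale=1.5, size=(n, 8)) + domain_shift
    return X, y

X_src, y_src = make_region(2000, domain_shift=0.0)    # "labeled region"
X_tgt, y_tgt = make_region(500, domain_shift=0.2)     # "new region", mild shift

# Plain logistic regression by gradient descent (no external ML library).
w, b = np.zeros(8), 0.0
for _ in range(300):
    p = 1 / (1 + np.exp(-(X_src @ w + b)))
    g = p - y_src
    w -= 0.1 * X_src.T @ g / len(y_src)
    b -= 0.1 * g.mean()

acc = ((1 / (1 + np.exp(-(X_tgt @ w + b))) > 0.5) == y_tgt).mean()
print(f"accuracy in the unlabeled region: {acc:.2f}")
```

A random forest would slot in the same way; the abstract's point is that with information-dense embeddings, even such basic classifiers transfer usefully across regions.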
S2 Open Access 2025
12th-CENTURY CHURCHES IN THE CONTEXT OF MODERN LADOGA (OLD LADOGA): HISTORY AND RESULTS OF RECENT ARCHAEOLOGICAL RESEARCH

N. Grigoreva

In the 12th century, a series of six stone churches were constructed in Ladoga (presently the village of Staraya Ladoga, situated within the Volkhovsky District of the Leningrad Region) over a brief period, attesting to the city's distinctive status within North-Western medieval Rus. The chronology of the construction of these ecclesiastical edifices remains a subject of considerable scholarly debate. Two churches (St. George's and the Assumption Cathedral) have survived to the present day in their original form, while another (St. Nicholas Church) has been preserved with major alterations from the modern period. The remaining three churches are in a state of disrepair and are preserved to varying degrees. For a period of 150 years, the churches have been the focus of repeated attention from researchers in the fields of architecture, archaeology, and art history. The extant and ruined churches are considered monuments of both architecture and archaeology and are formally listed in the unified state register of cultural heritage sites of the Russian Federation. All work pertaining to the monitoring of the church foundations and earthworks in their immediate vicinity requires mandatory oversight by archaeological specialists. This is particularly pertinent given the accelerated development of the contemporary Staraya Ladoga settlement, the restoration of monasteries, and the adaptation of temple complexes for contemporary use.

DOAJ Open Access 2024
NitroNet – a machine learning model for the prediction of tropospheric NO₂ profiles from TROPOMI observations

L. Kuhn, L. Kuhn, S. Beirle et al.

We introduce NitroNet, a deep learning model for the prediction of tropospheric NO₂ profiles from satellite column measurements. NitroNet is a neural network trained on synthetic NO₂ profiles from the regional chemistry and transport model WRF-Chem, which was operated on a European domain for the month of May 2019. This WRF-Chem simulation was constrained by in situ and satellite measurements, which were used to optimize important simulation parameters (e.g. the boundary layer scheme). The NitroNet model receives NO₂ vertical column densities (VCDs) from the TROPOspheric Monitoring Instrument (TROPOMI) and ancillary variables (meteorology, emissions, etc.) as input, from which it reproduces NO₂ concentration profiles. Training of the neural network is conducted on a filtered dataset, meaning that NO₂ profiles showing strong disagreement (>20 %) with colocated TROPOMI column measurements are discarded.

We present a first evaluation of NitroNet over a variety of geographical and temporal domains (Europe, the US West Coast, India, and China) and different seasons. For this purpose, we validate the NO₂ profiles predicted by NitroNet against satellite, in situ, and MAX-DOAS (Multi-Axis Differential Optical Absorption Spectroscopy) measurements. The training data were previously validated against the same datasets. During summertime, NitroNet shows small biases and strong correlations with all three datasets: a bias of +6.7 % and R=0.95 for TROPOMI NO₂ VCDs, a bias of −10.5 % and R=0.75 for AirBase surface concentrations, and a bias of −34.3 % to +99.6 % with R=0.83–0.99 for MAX-DOAS measurements. In comparison to TROPOMI satellite data, NitroNet even shows significantly lower errors and stronger correlation than a direct comparison with WRF-Chem numerical results. During wintertime, considerable low biases arise because the summertime/late-spring training data are not fully representative of all atmospheric wintertime characteristics (e.g. longer NO₂ lifetimes). Nonetheless, the wintertime performance of NitroNet is surprisingly good and comparable to that of classic regional chemistry and transport models. NitroNet can demonstrably be used outside the geographic and temporal domain of the training data with only slight performance reductions. What makes NitroNet unique when compared to similar existing deep learning models is the inclusion of synthetic model data, which offers important benefits: due to the lack of NO₂ profile measurements, models trained on empirical datasets are limited to the prediction of surface concentrations learned from in situ measurements. NitroNet, however, can predict full tropospheric NO₂ profiles. Furthermore, in situ measurements of NO₂ are known to suffer from biases, often larger than +20 %, due to cross-sensitivities to photooxidants, which other models trained on empirical data inevitably reproduce.

Environmental engineering, Earthwork. Foundations
arXiv Open Access 2024
One for All: Toward Unified Foundation Models for Earth Vision

Zhitong Xiong, Yi Wang, Fahong Zhang et al.

Foundation models characterized by extensive parameters and trained on large-scale datasets have demonstrated remarkable efficacy across various downstream tasks for remote sensing data. Current remote sensing foundation models typically specialize in a single modality or a specific spatial resolution range, limiting their versatility for downstream datasets. While there have been attempts to develop multi-modal remote sensing foundation models, they typically employ separate vision encoders for each modality or spatial resolution, necessitating a switch in backbones contingent upon the input data. To address this issue, we introduce a simple yet effective method, termed OFA-Net (One-For-All Network): employing a single, shared Transformer backbone for multiple data modalities with different spatial resolutions. Using the masked image modeling mechanism, we pre-train a single Transformer backbone on a curated multi-modal dataset with this simple design. Then the backbone model can be used in different downstream tasks, thus forging a path towards a unified foundation backbone model in Earth vision. The proposed method is evaluated on 12 distinct downstream tasks and demonstrates promising performance.

en cs.CV
arXiv Open Access 2024
Foundation Models for the Digital Twin Creation of Cyber-Physical Systems

Shaukat Ali, Paolo Arcaini, Aitor Arrieta

Foundation models are trained on a large amount of data to learn generic patterns. Consequently, these models can be used and fine-tuned for various purposes. Naturally, studying such models' use in the context of digital twins for cyber-physical systems (CPSs) is a relevant area of investigation. To this end, we provide perspectives on various aspects within the context of developing digital twins for CPSs, where foundation models can be used to increase the efficiency of creating digital twins, improve the effectiveness of the capabilities they provide, and be used as specialized fine-tuned foundation models acting as digital twins themselves. We also discuss challenges in using foundation models in a more generic context. We use the case of an autonomous driving system as a representative CPS to give examples. Finally, we provide discussions and open research directions that we believe are valuable for the digital twin community.

arXiv Open Access 2024
Quantum Spread-Spectrum CDMA Communication Systems: Mathematical Foundations

Mohammad Amir Dastgheib, Jawad A. Salehi, Mohammad Rezai

This paper describes the fundamental principles and mathematical foundations of quantum spread spectrum code division multiple access (QCDMA) communication systems. The evolution of quantum signals through the direct-sequence spread spectrum multiple access communication system is carefully characterized by a novel approach called the decomposition of creation operators. In this methodology, the creation operator of the transmitted quantum signal is decomposed into the chip-time interval creation operators each of which is defined over the duration of a chip. These chip-time interval creation operators are the invariant building blocks of the spread spectrum quantum communication systems. With the aid of the proposed chip-time decomposition approach, we can find closed-form relations for quantum signals at the receiver of such a quantum communication system. Further, the paper details the principles of narrow-band filtering of quantum signals required at the receiver, a crucial step in designing and analyzing quantum communication systems. We show that by employing coherent states as the transmitted quantum signals, the inter-user interference appears as an additive term in the magnitude of the output coherent (Glauber) state, and the output of the quantum communication system is a pure quantum signal. On the other hand, if the transmitters utilize particle-like quantum signals (Fock states) such as single photon states, entanglement and a spread spectrum version of the Hong-Ou-Mandel effect can arise at the receivers. The important techniques developed in this paper are expected to have far-reaching implications for various applications in the exciting field of quantum communication and signal processing.

en quant-ph, cs.NI
S2 Open Access 2023
ON RESEARCH OF THE MECHANISM OF MULTIFUNCTIONAL TECHNOLOGICAL BUILDING SYSTEMS

Т. Л. Чебанов

The design of multifunctional technological systems is based on well-known theories of operations research using production building systems of various levels and purposes, as well as on systems engineering, decision-making, and optimization methods, taking into account the provisions of efficiency and reliability of systems. Production building systems, as a class of functional systems, are created and designed to implement certain tasks, which can be specialized (one task) or multifunctional (several tasks). The result of the formation of such systems is the final useful result, which is achieved through interaction and, accordingly, the mutual influence of its participants. Dynamics and the ability to change during system implementation are provided by models that have similar structures and sets of indicators and parameters for the subject and product of work. Its main component can be shown in the form of systematized information about the phenomena and the patterns that manifest in them; these form the theoretical foundations of the relevant aspect of technology. The decomposition of complex systems into component subsystems in order to optimize their elements is solved by formalizing design procedures and creates a method of designing multifunctional systems. Expanding the universal capabilities of construction and road machines by equipping them with additional interchangeable working bodies allows for a flexible approach to the design of multifunctional technological systems. Their effectiveness is especially evident in the design and implementation of multifunctional systems during earthworks and landscaping works, as well as in the construction of agro-industrial structures from light and especially light metal structures.

1 citation en
DOAJ Open Access 2023
Short-term variability of atmospheric helium revealed through a cryo-enrichment method

B. Birner, E. Morgan, R. F. Keeling

Tropospheric helium variations are tightly linked to CO₂ due to the co-emission of He and CO₂ from natural-gas burning. Recently, Birner et al. (2022a) showed that the global consumption of natural gas has measurably increased the He content of the atmosphere. Like CO₂, He is also predicted to exhibit complex spatial and temporal variability on shorter timescales, but measurements of these short-term variations are lacking. Here, we present the development of an improved gas delivery and purification system for the semi-continuous mass spectrometric measurement of the atmospheric He-to-nitrogen ratio (He/N₂). The method replaces the chemical getter used previously by Birner et al. (2021, 2022a) to preconcentrate He in an air stream with a cryogenic trap, which can be more simply regenerated by heating and which improves the precision of the measurement to 22 per meg (i.e., 0.022 ‰) in 10 min (1σ). Using this “cryo-enrichment” method, we measured the He/N₂ ratios in ambient air at La Jolla (California, USA) over 5 weeks in 2022. During this period, He/N₂ was strongly correlated with atmospheric CO₂ concentrations, as expected from anthropogenic emissions, with a diurnal cycle of 450–500 per meg (max–min) caused by the sea–land breeze pattern of local winds, which modulates the influence of local pollution sources.

Environmental engineering, Earthwork. Foundations
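For readers unfamiliar with the unit in the abstract above: "per meg" expresses relative deviations of a ratio in parts per 10⁶ (the convention used in atmospheric O₂/N₂ and He/N₂ work), i.e.

$$ \delta(\mathrm{He}/\mathrm{N_2}) \;=\; \left( \frac{(\mathrm{He}/\mathrm{N_2})_{\mathrm{sample}}}{(\mathrm{He}/\mathrm{N_2})_{\mathrm{reference}}} - 1 \right) \times 10^{6}\ \text{per meg}, $$

so the quoted precision of 22 per meg corresponds to a relative precision of 0.0022 %, consistent with the 0.022 ‰ figure in the abstract.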
arXiv Open Access 2023
AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models

Wanjun Zhong, Ruixiang Cui, Yiduo Guo et al.

Evaluating the general abilities of foundation models to tackle human-level tasks is a vital aspect of their development and application in the pursuit of Artificial General Intelligence (AGI). Traditional benchmarks, which rely on artificial datasets, may not accurately represent human-level capabilities. In this paper, we introduce AGIEval, a novel benchmark specifically designed to assess foundation models in the context of human-centric standardized exams, such as college entrance exams, law school admission tests, math competitions, and lawyer qualification tests. We evaluate several state-of-the-art foundation models, including GPT-4, ChatGPT, and Text-Davinci-003, using this benchmark. Impressively, GPT-4 surpasses average human performance on SAT, LSAT, and math competitions, attaining a 95% accuracy rate on the SAT Math test and a 92.5% accuracy on the English test of the Chinese national college entrance exam. This demonstrates the extraordinary performance of contemporary foundation models. In contrast, we also find that GPT-4 is less proficient in tasks that require complex reasoning or specific domain knowledge. Our comprehensive analyses of model capabilities (understanding, knowledge, reasoning, and calculation) reveal these models' strengths and limitations, providing valuable insights into future directions for enhancing their general capabilities. By concentrating on tasks pertinent to human cognition and decision-making, our benchmark delivers a more meaningful and robust evaluation of foundation models' performance in real-world scenarios. The data, code, and all model outputs are released at https://github.com/ruixiangcui/AGIEval.

en cs.CL, cs.AI
arXiv Open Access 2023
Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis

Yafei Hu, Quanting Xie, Vidhi Jain et al.

Building general-purpose robots that operate seamlessly in any environment, with any object, and utilizing various skills to complete diverse tasks has been a long-standing goal in Artificial Intelligence. However, as a community, we have been constraining most robotic systems by designing them for specific tasks, training them on specific datasets, and deploying them within specific environments. These systems require extensively-labeled data and task-specific models. When deployed in real-world scenarios, such systems face several generalization issues and struggle to remain robust to distribution shifts. Motivated by the impressive open-set performance and content generation capabilities of web-scale, large-capacity pre-trained models (i.e., foundation models) in research fields such as Natural Language Processing (NLP) and Computer Vision (CV), we devote this survey to exploring (i) how these existing foundation models from NLP and CV can be applied to the field of general-purpose robotics, and also exploring (ii) what a robotics-specific foundation model would look like. We begin by providing a generalized formulation of how foundation models are used in robotics, and the fundamental barriers to making generalist robots universally applicable. Next, we establish a taxonomy to discuss current work exploring ways to leverage existing foundation models for robotics and develop ones catered to robotics. Finally, we discuss key challenges and promising future directions in using foundation models for enabling general-purpose robotic systems. We encourage readers to view our living GitHub repository of resources, including papers reviewed in this survey, as well as related projects and repositories for developing foundation models for robotics.

en cs.RO, cs.AI

Page 49 of 31992