Results for "machine learning"

Showing 20 of ~2,799,013 results · from arXiv, CrossRef

arXiv Open Access 2025
Limits To (Machine) Learning

Zhimin Chen, Bryan Kelly, Semyon Malamud

Machine learning (ML) methods are highly flexible, but their ability to approximate the true data-generating process is fundamentally constrained by finite samples. We characterize a universal lower bound, the Limits-to-Learning Gap (LLG), quantifying the unavoidable discrepancy between a model's empirical fit and the population benchmark. Recovering the true population $R^2$, therefore, requires correcting observed predictive performance by this bound. Using a broad set of variables, including excess returns, yields, credit spreads, and valuation ratios, we find that the implied LLGs are large. This indicates that standard ML approaches can substantially understate true predictability in financial data. We also derive LLG-based refinements to the classic Hansen and Jagannathan (1991) bounds, analyze implications for parameter learning in general-equilibrium settings, and show that the LLG provides a natural mechanism for generating excess volatility.
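
One plausible way to write the correction the abstract describes, in hypothetical notation not taken from the paper (the authors' exact definition of the bound may differ):

% Hypothetical notation: R^2_pop is the population R^2, R^2_obs the R^2
% measured on a finite sample, and LLG the Limits-to-Learning Gap bound.
\[
  R^2_{\mathrm{pop}} \;\gtrsim\; R^2_{\mathrm{obs}} + \mathrm{LLG},
\]
% so a small observed R^2 remains consistent with substantial true
% predictability whenever the implied LLG is large.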

en stat.ML, cs.LG
CrossRef Open Access 2024
Our Vision for JGR: Machine Learning and Computation

E. Camporeale, R. Marino, the Editorial Board

This editorial introduces the inaugural issue of the Journal of Geophysical Research: Machine Learning and Computation to the scientific community, elucidating the motivations and vision behind its establishment. The landscape of computational tools for geoscientists has undergone a rapid transformation in the last decade, akin to a new scientific revolution challenging the traditional scientific method. The paradigm shift emphasizes the integration of data-driven methods and the possibility of predicting and/or reproducing the evolution of natural phenomena with computers as the fourth pillar of scientific discovery, sparking debates on trustworthiness and ethical implications. The data science revolution is fueled by the convergence of advancements, including the big-data revolution, GPU market expansion, and significant investments in Artificial Intelligence and high-performance computing by both institutional and private players. This transformation has given rise to a trans-disciplinary community that has investigated a wide range of questions under the lens of machine learning (ML) approaches and has generally advanced the field of computational methods within the broader geosciences community, the core of the American Geophysical Union (AGU) membership. Responding to an unmet demand in the existing worldwide editorial offering, the Journal of Geophysical Research: Machine Learning and Computation aims to serve as an intellectual crucible, fostering collaborations between the various geophysical disciplines and data scientists. The journal welcomes papers with strong methodological developments that allow for geoscience advancements grounded in specific computational and data-driven methods, leveraging ML as well as innovative computational strategies, and leading to breakthrough discoveries and original scientific outcomes. Authors are encouraged to balance succinctness in introducing methods with a thorough exploration of the novelty of the proposed work and its future applications, placing special emphasis on the connection between the data science approach and the scientific outcome, considering a broad readership. Emphasis on result reproducibility aligns with AGU guidance, inviting active participation from the community in shaping geophysical research in the era of machine learning and computation.

arXiv Open Access 2024
Time-Reversible Bridges of Data with Machine Learning

Ludwig Winkler

The analysis of dynamical systems is a fundamental tool in the natural sciences and engineering. It is used to understand the evolution of systems as large as entire galaxies and as small as individual molecules. With predefined conditions on the evolution of dynamical systems, the underlying differential equations have to fulfill specific constraints in time and space. This class of problems is known as boundary value problems. This thesis presents novel approaches to learn time-reversible deterministic and stochastic dynamics constrained by initial and final conditions. The dynamics are inferred by machine learning algorithms from observed data, in contrast to the traditional approach of solving differential equations by numerical integration. The work in this thesis examines a set of problems of increasing difficulty, each of which is concerned with learning a different aspect of the dynamics. Initially, we consider learning deterministic dynamics from ground truth solutions which are constrained by deterministic boundary conditions. Secondly, we study a boundary value problem in discrete state spaces, where the forward dynamics follow a stochastic jump process and the boundary conditions are discrete probability distributions. In particular, the stochastic dynamics of a specific jump process, the Ehrenfest process, are considered, and the reverse-time dynamics are inferred with machine learning. Finally, we investigate the problem of inferring the dynamics of a continuous-time stochastic process between two probability distributions without any reference information. Here, we propose a novel criterion to learn time-reversible dynamics of two stochastic processes to solve the Schrödinger Bridge Problem.

en stat.ML, cs.LG
arXiv Open Access 2024
Mitigating Relative Over-Generalization in Multi-Agent Reinforcement Learning

Ting Zhu, Yue Jin, Jeremie Houssineau et al.

In decentralized multi-agent reinforcement learning, agents learning in isolation can lead to relative over-generalization (RO), where optimal joint actions are undervalued in favor of suboptimal ones. This hinders effective coordination in cooperative tasks, as agents tend to choose actions that are individually rational but collectively suboptimal. To address this issue, we introduce MaxMax Q-Learning (MMQ), which employs an iterative process of sampling and evaluating potential next states, selecting those with maximal Q-values for learning. This approach refines approximations of ideal state transitions, aligning more closely with the optimal joint policy of collaborating agents. We provide theoretical analysis supporting MMQ's potential and present empirical evaluations across various environments susceptible to RO. Our results demonstrate that MMQ frequently outperforms existing baselines, exhibiting enhanced convergence and sample efficiency.
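
As a rough illustration of the sampling-and-maximizing idea, here is a tabular sketch with invented names; the paper's actual algorithm operates on learned approximations of state transitions rather than a fixed candidate list:

import numpy as np

def mmq_update(Q, s, a, r, candidate_next_states, alpha=0.1, gamma=0.99):
    """One MaxMax-style tabular update (illustrative sketch only).

    Rather than bootstrapping from the single observed next state, evaluate
    a sampled set of candidate next states and learn from the most
    optimistic one, counteracting relative over-generalization.
    """
    values = [np.max(Q[s_next]) for s_next in candidate_next_states]
    target = r + gamma * np.max(values)   # max over states, then over actions
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

Q = np.zeros((5, 2))                      # 5 states, 2 actions
Q = mmq_update(Q, s=0, a=1, r=1.0, candidate_next_states=[2, 3, 4])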

en cs.LG, cs.AI
arXiv Open Access 2024
Machine Learning for Inverse Problems and Data Assimilation

Eviatar Bach, Ricardo Baptista, Daniel Sanz-Alonso et al.

The aim of these notes is to demonstrate the potential for ideas in machine learning to impact the fields of inverse problems and data assimilation. The perspective is one that is primarily aimed at researchers from inverse problems and/or data assimilation who wish to see a mathematical presentation of machine learning as it pertains to their fields. As a by-product, we include a succinct mathematical treatment of various fundamental underpinning topics in machine learning and adjacent areas of (computational) mathematics.

en stat.ML, cs.LG
arXiv Open Access 2023
Generating Personalized Insulin Treatment Strategies with Deep Conditional Generative Time Series Models

Manuel Schürch, Xiang Li, Ahmed Allam et al.

We propose a novel framework that combines deep generative time series models with decision theory for generating personalized treatment strategies. It leverages historical patient trajectory data to jointly learn the generation of realistic personalized treatment and future outcome trajectories through deep generative time series models. In particular, our framework enables the generation of novel multivariate treatment strategies tailored to the personalized patient history and trained for optimal expected future outcomes based on conditional expected utility maximization. We demonstrate our framework by generating personalized insulin treatment strategies and blood glucose predictions for hospitalized diabetes patients, showcasing the potential of our approach for generating improved personalized treatment strategies.

Keywords: deep generative model, probabilistic decision support, personalized treatment generation, insulin and blood glucose prediction
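
A minimal sketch of the decision-theoretic step; the three callables are hypothetical stand-ins for a trained conditional generative time series model, and the dummy numbers below are placeholders only:

import numpy as np

def select_treatment(history, sample_treatment, sample_outcome, utility,
                     n_candidates=32, n_mc=100):
    """Pick a treatment by conditional expected-utility maximization.

    sample_treatment(history)          -> one candidate treatment trajectory
    sample_outcome(history, treatment) -> one sampled outcome trajectory
    utility(outcome)                   -> scalar score of an outcome
    """
    best_t, best_eu = None, -np.inf
    for _ in range(n_candidates):
        t = sample_treatment(history)
        # Monte Carlo estimate of E[utility(outcome) | history, treatment]
        eu = np.mean([utility(sample_outcome(history, t)) for _ in range(n_mc)])
        if eu > best_eu:
            best_t, best_eu = t, eu
    return best_t, best_eu

# Dummy stand-ins so the sketch runs end to end:
rng = np.random.default_rng(0)
treatment, score = select_treatment(
    history=None,
    sample_treatment=lambda h: rng.uniform(0, 2, size=3),             # insulin doses
    sample_outcome=lambda h, t: 160 - 15 * t.sum() + 10 * rng.normal(size=6),
    utility=lambda g: -np.abs(g - 110).mean(),                        # stay near target glucose
)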

en stat.ML, cs.LG
arXiv Open Access 2023
Credal Bayesian Deep Learning

Michele Caprio, Souradeep Dutta, Kuk Jin Jang et al.

Uncertainty quantification and robustness to distribution shifts are important goals in machine learning and artificial intelligence. Although Bayesian Neural Networks (BNNs) allow for uncertainty in the predictions to be assessed, they cannot properly distinguish between different sources of predictive uncertainty. We present Credal Bayesian Deep Learning (CBDL). Heuristically, CBDL makes it possible to train an (uncountably) infinite ensemble of BNNs using only finitely many elements. This is possible thanks to finitely generated credal sets (FGCSs) of priors and likelihoods, a concept from the imprecise-probability literature. Intuitively, convex combinations of a finite collection of prior-likelihood pairs are able to represent infinitely many such pairs. After training, CBDL outputs a set of posteriors on the parameters of the neural network. At inference time, this posterior set is used to derive a set of predictive distributions, which is in turn utilized to distinguish between (predictive) aleatoric and epistemic uncertainties, and to quantify them. The predictive set also produces either (i) a collection of outputs enjoying desirable probabilistic guarantees, or (ii) the single output that is deemed the best, that is, the one having the highest predictive lower probability -- another imprecise-probabilistic concept. CBDL is more robust than single BNNs to prior and likelihood misspecification, and to distribution shift. We show that CBDL is better at quantifying and disentangling different types of (predictive) uncertainties than single BNNs and ensembles of BNNs. In addition, we apply CBDL to two case studies to demonstrate its downstream-task capabilities: first, motion prediction in autonomous driving scenarios, and second, modeling blood glucose and insulin dynamics for artificial pancreas control. We show that CBDL performs better when compared to an ensemble-of-BNNs baseline.
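
An illustrative sketch of the inference-time step, assuming `preds` holds class probabilities produced by finitely many trained posteriors (one row per extreme prior-likelihood pair); the paper's exact uncertainty measures may differ:

import numpy as np

# Convex hull of these rows plays the role of the credal predictive set.
preds = np.array([[0.7, 0.2, 0.1],
                  [0.5, 0.4, 0.1],
                  [0.6, 0.3, 0.1]])

lower = preds.min(axis=0)    # lower predictive probability per class
upper = preds.max(axis=0)    # upper predictive probability per class

# One common epistemic-uncertainty proxy: width of the probability interval.
epistemic = upper - lower

# Decision rule from the abstract: report the class with the highest
# predictive *lower* probability.
best_class = int(np.argmax(lower))
print(best_class, lower, upper, epistemic)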

en cs.LG, stat.ML
arXiv Open Access 2023
Power-Law Dynamic Arising from Machine Learning

Wei Chen, Weitao Du, Zhi-Ming Ma et al.

We study a new kind of SDE arising from research on optimization in machine learning; we call it the power-law dynamic because its stationary distribution cannot have a sub-Gaussian tail and instead obeys a power law. We prove that the power-law dynamic is ergodic with a unique stationary distribution, provided the learning rate is small enough. We investigate its first exit time. In particular, we compare the exit times of the (continuous) power-law dynamic and its discretization. The comparison can help guide the design of machine learning algorithms.
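
To make the continuous-versus-discretized comparison concrete, here is an illustrative Euler-Maruyama experiment with an invented drift and state-dependent diffusion (not the paper's specific dynamic), measuring the first exit time from an interval:

import numpy as np

rng = np.random.default_rng(0)

def first_exit_time(x0=0.0, lr=0.01, sigma0=0.3, kappa=2.0,
                    barrier=0.5, max_steps=1_000_000):
    """First exit time of the Euler discretization from (-barrier, barrier).

    The noise below grows with |x|, the kind of multiplicative structure
    that produces power-law rather than sub-Gaussian stationary tails.
    """
    x = x0
    for step in range(1, max_steps + 1):
        diffusion = sigma0 * np.sqrt(1.0 + kappa * x * x)
        x += -lr * x + np.sqrt(lr) * diffusion * rng.normal()
        if abs(x) >= barrier:
            return step * lr            # exit time on the SDE clock
    return np.inf

print(first_exit_time())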

en stat.ML, cs.LG
arXiv Open Access 2022
Rank-based Decomposable Losses in Machine Learning: A Survey

Shu Hu, Xin Wang, Siwei Lyu

Recent works have revealed an essential paradigm in designing loss functions: the distinction between individual losses and aggregate losses. The individual loss measures the quality of the model on a single sample, while the aggregate loss combines the individual losses/scores over the training samples. Both share a common procedure that aggregates a set of individual values into a single numerical value. The ranking order reflects the most fundamental relation among individual values in designing losses. In addition, decomposability, in which a loss can be decomposed into an ensemble of individual terms, becomes a significant property for organizing losses/scores. This survey provides a systematic and comprehensive review of rank-based decomposable losses in machine learning. Specifically, we provide a new taxonomy of loss functions that follows the perspectives of aggregate loss and individual loss. We identify the aggregator that forms such losses, which are examples of set functions. We organize the rank-based decomposable losses into eight categories. Following these categories, we review the literature on rank-based aggregate losses and rank-based individual losses. We describe general formulas for these losses and connect them with existing research topics. We also suggest future research directions spanning unexplored, remaining, and emerging issues in rank-based decomposable losses.
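
Three textbook aggregators make the individual/aggregate split concrete; this is a sketch of standard examples (average, maximum, average-of-top-k), not the survey's full eight-category taxonomy:

import numpy as np

def aggregate(individual_losses, kind="average", k=10):
    """Map per-sample individual losses to one aggregate loss value."""
    v = np.sort(np.asarray(individual_losses))[::-1]   # descending by rank
    if kind == "average":   # rank-free baseline; fully decomposable
        return v.mean()
    if kind == "maximum":   # rank-based: only the worst sample matters
        return v[0]
    if kind == "top-k":     # rank-based: average of the k largest losses
        return v[:k].mean()
    raise ValueError(kind)

losses = np.random.default_rng(1).exponential(size=100)
print(aggregate(losses, "average"), aggregate(losses, "maximum"),
      aggregate(losses, "top-k", k=10))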

en cs.LG
arXiv Open Access 2022
Automatic Machine Learning for Multi-Receiver CNN Technology Classifiers

Amir-Hossein Yazdani-Abyaneh, Marwan Krunz

Convolutional Neural Networks (CNNs) are one of the most studied families of deep learning models for signal classification, including modulation, technology, detection, and identification. In this work, we focus on technology classification based on raw I/Q samples collected from multiple synchronized receivers. As an example use case, we study protocol identification of Wi-Fi, LTE-LAA, and 5G NR-U technologies that coexist over the 5 GHz Unlicensed National Information Infrastructure (U-NII) bands. Designing and training accurate CNN classifiers involves significant time and effort that goes into fine-tuning a model's architectural settings and determining the appropriate hyperparameter configurations, such as learning rate and batch size. We tackle the former by defining architectural settings themselves as hyperparameters. We attempt to optimize these architectural parameters automatically, along with other preprocessing (e.g., the number of I/Q samples within each classifier input) and learning hyperparameters, by forming a Hyperparameter Optimization (HyperOpt) problem, which we solve in a near-optimal fashion using the Hyperband algorithm. The resulting near-optimal CNN (OCNN) classifier is then used to study classification accuracy for over-the-air (OTA) as well as simulated datasets, considering various SNR values. We show that the number of receivers used to construct multi-channel inputs for CNNs should be defined as a preprocessing hyperparameter to be optimized via Hyperband. OTA results reveal that our OCNN classifiers improve classification accuracy by 24.58% compared to manually tuned CNNs. We also study the effect of min-max normalization of the I/Q samples within each classifier's input on generalization accuracy over simulated datasets with SNRs other than the training set's SNR, and show an average improvement of 108.05% when the I/Q samples are normalized.
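
A minimal sketch of treating architectural settings as hyperparameters, using the off-the-shelf Hyperband implementation in KerasTuner; the paper builds its own HyperOpt problem, so every name, search range, and input dimension below is an invented placeholder:

import keras_tuner as kt
import tensorflow as tf

def build_model(hp):
    # Architectural settings exposed as hyperparameters: depth, filter
    # count, and kernel size are all searched, as the abstract proposes.
    model = tf.keras.Sequential([tf.keras.Input(shape=(1024, 2))])  # raw I/Q
    for _ in range(hp.Int("conv_blocks", 1, 4)):
        model.add(tf.keras.layers.Conv1D(
            filters=hp.Int("filters", 16, 128, step=16),
            kernel_size=hp.Choice("kernel", [3, 5, 7]),
            activation="relu"))
        model.add(tf.keras.layers.MaxPooling1D(2))
    model.add(tf.keras.layers.GlobalAveragePooling1D())
    model.add(tf.keras.layers.Dense(3, activation="softmax"))  # Wi-Fi/LTE-LAA/NR-U
    model.compile(
        optimizer=tf.keras.optimizers.Adam(hp.Choice("lr", [1e-2, 1e-3, 1e-4])),
        loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model

tuner = kt.Hyperband(build_model, objective="val_accuracy", max_epochs=30)
# tuner.search(x_train, y_train, validation_data=(x_val, y_val))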

en cs.LG, cs.NI
arXiv Open Access 2022
Can deep neural networks learn process model structure? An assessment framework and analysis

Jari Peeperkorn, Seppe vanden Broucke, Jochen De Weerdt

Predictive process monitoring concerns itself with the prediction of ongoing cases in (business) processes. Prediction tasks typically focus on remaining time, outcome, next event, or full case suffix prediction. Various methods using machine and deep learning have been proposed for these tasks in recent years. Especially recurrent neural networks (RNNs) such as long short-term memory nets (LSTMs) have gained in popularity. However, no research focuses on whether such neural network-based models can truly learn the structure of underlying process models. For instance, can such neural networks effectively learn parallel behaviour or loops? Therefore, in this work, we propose an evaluation scheme complemented with new fitness, precision, and generalisation metrics, specifically tailored towards measuring the capacity of deep learning models to learn process model structure. We apply this framework to several process models with simple control-flow behaviour, on the task of next-event prediction. Our results show that, even for such simplistic models, careful tuning of overfitting countermeasures is required to allow these models to learn process model structure.
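
For concreteness, a minimal next-event prediction model of the kind assessed by such a framework; vocabulary size, dimensions, and the event-log encoding are hypothetical placeholders:

import tensorflow as tf

n_activities = 8          # distinct activity labels in the event log
max_prefix_len = 20       # case prefixes padded to this length

model = tf.keras.Sequential([
    tf.keras.Input(shape=(max_prefix_len,)),
    tf.keras.layers.Embedding(n_activities + 1, 16, mask_zero=True),  # 0 = pad
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(n_activities, activation="softmax"),  # next activity
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(prefixes, next_activities, validation_split=0.2)  # given an encoded log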

arXiv Open Access 2021
A Self-Supervised Tibetan-Chinese Vocabulary Alignment Method Based on Adversarial Learning

Enshuai Hou, Jie Zhu

Tibetan is a low-resource language. In order to alleviate the shortage of parallel corpora between Tibetan and Chinese, this paper uses two monolingual corpora and a small number of seed dictionaries to learn vocabulary alignments: a semi-supervised method based on the seed dictionaries, and a self-supervised adversarial training method based on similarity calculations between word clusters in the two embedding spaces. It further puts forward an improved self-supervised adversarial learning method that aligns Tibetan and Chinese from monolingual data only. The experimental results are as follows. First, the results for Tibetan syllables and Chinese characters are not good, which reflects the weak semantic correlation between Tibetan syllables and Chinese characters. Second, the semi-supervised method with seed dictionaries achieved top-10 predicted-word accuracies of 66.5 (Tibetan-Chinese) and 74.8 (Chinese-Tibetan), while the improved self-supervised method reached an accuracy of 53.5 in both language directions.

en cs.CL
arXiv Open Access 2020
Additively Homomorphic Encryption Based Deep Neural Network for Asymmetrically Collaborative Machine Learning

Yifei Zhang, Hao Zhu

The financial sector presents many opportunities to apply various machine learning techniques, but centralized machine learning creates a constraint that limits further applications in the finance sector. Data privacy is a fundamental challenge for the variety of finance and insurance applications that rely on learning a model across different parties. In this paper, we define a new practical scheme of collaborative machine learning in which one party owns the data but another party owns only the labels, and we term this Asymmetrically Collaborative Machine Learning. For this scheme, we propose a novel privacy-preserving architecture in which two parties can collaboratively train a deep learning model efficiently while preserving the privacy of each party's data. More specifically, we decompose the forward propagation and backpropagation of the neural network into four different steps and propose a novel protocol to handle the information leakage in these steps. Our extensive experiments on different datasets demonstrate not only stable training without accuracy loss, but also a more than 100-times speedup compared with the state-of-the-art system.
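
The additively homomorphic primitive the title refers to can be demonstrated with the python-paillier library (pip install phe); this shows only the building block, not the paper's four-step protocol:

from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

# Party A (owns the data) encrypts an intermediate activation vector.
activations = [0.25, -1.5, 3.0]
encrypted = [public_key.encrypt(a) for a in activations]

# Party B (owns the labels) can combine the ciphertexts linearly --
# ciphertext addition and multiplication by plaintext scalars -- without
# ever seeing the underlying activations.
weights = [0.1, 0.2, 0.3]
enc_dot = sum(w * e for w, e in zip(weights, encrypted))

# Only Party A holds the private key and can decrypt the result.
print(private_key.decrypt(enc_dot))  # == 0.1*0.25 + 0.2*(-1.5) + 0.3*3.0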

en cs.LG, stat.ML
arXiv Open Access 2019
A Graph Autoencoder Approach to Causal Structure Learning

Ignavier Ng, Shengyu Zhu, Zhitang Chen et al.

Causal structure learning has been a challenging task in the past decades, and several mainstream approaches, such as constraint- and score-based methods, have been studied with theoretical guarantees. Recently, a new approach has transformed the combinatorial structure learning problem into a continuous one and then solved it using gradient-based optimization methods. Following these recent state-of-the-art methods, we propose a new gradient-based method to learn causal structures from observational data. The proposed method generalizes the recent gradient-based methods to a graph autoencoder framework that allows nonlinear structural equation models and is easily applicable to vector-valued variables. We demonstrate that, on synthetic datasets, our proposed method significantly outperforms other gradient-based methods, especially on large causal graphs. We further investigate the scalability and efficiency of our method and observe a near-linear training time when scaling up the graph size.
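
The continuous reformulation rests on a smooth acyclicity characterization; the widely used NOTEARS form, assumed here for illustration, is h(W) = tr(exp(W ∘ W)) − d, which vanishes exactly when the weighted adjacency matrix W encodes a DAG:

import numpy as np
from scipy.linalg import expm

def acyclicity(W):
    """Smooth acyclicity measure h(W) = tr(exp(W * W)) - d.

    h(W) == 0 iff W encodes a DAG; gradient-based structure learners
    (including graph-autoencoder variants) add this as a penalty or
    constraint so the search over graphs stays continuous.
    """
    d = W.shape[0]
    return np.trace(expm(W * W)) - d   # W * W is the elementwise square

dag = np.array([[0.0, 1.2], [0.0, 0.0]])   # edge 0 -> 1, acyclic
cyc = np.array([[0.0, 1.2], [0.8, 0.0]])   # 2-cycle
print(acyclicity(dag), acyclicity(cyc))    # ~0.0 vs > 0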

en cs.LG, stat.ML

Page 59 of 139,951