Results for "machine learning"

Showing 20 of ~10323179 results · from CrossRef, DOAJ, Semantic Scholar

S2 Open Access 2021
An introduction to statistical learning with applications in R

Fariha Sohil, Muhammad Umair Sohali, J. Shabbir

The fundamental mathematical tools needed to understand machine learning include linear algebra, analytic geometry, matrix decompositions, vector calculus, optimization, probability and statistics. These topics are traditionally taught in disparate courses, making it hard for data science or computer science students, or professionals, to efficiently learn the mathematics. This self-contained textbook bridges the gap between mathematical and machine learning texts, introducing the mathematical concepts with a minimum of prerequisites. It uses these concepts to derive four central machine learning methods: linear regression, principal component analysis, Gaussian mixture models and support vector machines. For students and others with a mathematical background, these derivations provide a starting point to machine learning texts. For those learning the mathematics for the first time, the methods help build intuition and practical experience with applying mathematical concepts. Every chapter includes worked examples and exercises to test understanding. Programming tutorials are offered on the book's web site. This textbook considers statistical learning applications when interest centers on the conditional distribution of a response variable, given a set of predictors, and in the absence of a credible model that can be specified before the data analysis begins. Consistent with modern data analytics, it emphasizes that a proper statistical learning data analysis depends in an integrated fashion on sound data collection, intelligent data management, appropriate statistical procedures, and an

4535 citations en
S2 Open Access 2016
A General-Purpose Machine Learning Framework for Predicting Properties of Inorganic Materials

Logan T. Ward, Ankit Agrawal, A. Choudhary et al.

A very active area of materials research is to devise methods that use machine learning to automatically extract predictive models from existing materials data. While prior examples have demonstrated successful models for some applications, many more applications exist where machine learning can make a strong impact. To enable faster development of machine-learning-based models for such applications, we have created a framework capable of being applied to a broad range of materials data. Our method works by using a chemically diverse list of attributes, which we demonstrate are suitable for describing a wide variety of properties, and a novel method for partitioning the data set into groups of similar materials in order to boost the predictive accuracy. In this manuscript, we demonstrate how this new method can be used to predict diverse properties of crystalline and amorphous materials, such as band gap energy and glass-forming ability.

1485 citations en Materials Science, Physics
S2 Open Access 2016
Machine learning phases of matter

J. Carrasquilla, R. Melko

The success of machine learning techniques in handling big data sets proves ideal for classifying condensed-matter phases and phase transitions. The technique is even amenable to detecting non-trivial states lacking in conventional order. Condensed-matter physics is the study of the collective behaviour of infinitely complex assemblies of electrons, nuclei, magnetic moments, atoms or qubits. This complexity is reflected in the size of the state space, which grows exponentially with the number of particles, reminiscent of the ‘curse of dimensionality’ commonly encountered in machine learning. Despite this curse, the machine learning community has developed techniques with remarkable abilities to recognize, classify, and characterize complex sets of data. Here, we show that modern machine learning architectures, such as fully connected and convolutional neural networks, can identify phases and phase transitions in a variety of condensed-matter Hamiltonians. Readily programmable through modern software libraries, neural networks can be trained to detect multiple types of order parameter, as well as highly non-trivial states with no conventional order, directly from raw state configurations sampled with Monte Carlo.

1367 citations en Computer Science, Physics
S2 Open Access 2016
Machine learning of accurate energy-conserving molecular force fields

Stefan Chmiela, A. Tkatchenko, H. Sauceda et al.

The law of energy conservation is used to develop an efficient machine learning approach to construct accurate force fields. Using conservation of energy, a fundamental property of closed classical and quantum mechanical systems, we develop an efficient gradient-domain machine learning (GDML) approach to construct accurate molecular force fields using a restricted number of samples from ab initio molecular dynamics (AIMD) trajectories. The GDML implementation is able to reproduce global potential energy surfaces of intermediate-sized molecules with an accuracy of 0.3 kcal mol⁻¹ for energies and 1 kcal mol⁻¹ Å⁻¹ for atomic forces using only 1000 conformational geometries for training. We demonstrate this accuracy for AIMD trajectories of molecules, including benzene, toluene, naphthalene, ethanol, uracil, and aspirin. The challenge of constructing conservative force fields is accomplished in our work by learning in a Hilbert space of vector-valued functions that obey the law of energy conservation. The GDML approach enables quantitative molecular dynamics simulations for molecules at a fraction of the cost of explicit AIMD calculations, thereby allowing the construction of efficient force fields with the accuracy and transferability of high-level ab initio methods.

1223 citations en Physics, Medicine
S2 Open Access 2018
A Survey on Data Collection for Machine Learning: A Big Data - AI Integration Perspective

Yuji Roh, Geon Heo, Steven Euijong Whang

Data collection is a major bottleneck in machine learning and an active research topic in multiple communities. There are largely two reasons data collection has recently become a critical issue. First, as machine learning is becoming more widely used, we are seeing new applications that do not necessarily have enough labeled data. Second, unlike traditional machine learning, deep learning techniques automatically generate features, which saves feature engineering costs, but in return may require larger amounts of labeled data. Interestingly, recent research in data collection comes not only from the machine learning, natural language, and computer vision communities, but also from the data management community due to the importance of handling large amounts of data. In this survey, we perform a comprehensive study of data collection from a data management point of view. Data collection largely consists of data acquisition, data labeling, and improvement of existing data or models. We provide a research landscape of these operations, provide guidelines on which technique to use when, and identify interesting research challenges. The integration of machine learning and data management for data collection is part of a larger trend of Big data and Artificial Intelligence (AI) integration and opens many opportunities for new research.

789 citations en Computer Science, Mathematics
S2 Open Access 2018
Tunability: Importance of Hyperparameters of Machine Learning Algorithms

Philipp Probst, A. Boulesteix, B. Bischl

Modern supervised machine learning algorithms involve hyperparameters that have to be set before running them. Options for setting hyperparameters are default values from the software package, manual configuration by the user or configuring them for optimal predictive performance by a tuning procedure. The goal of this paper is two-fold. Firstly, we formalize the problem of tuning from a statistical point of view, define data-based defaults and suggest general measures quantifying the tunability of hyperparameters of algorithms. Secondly, we conduct a large-scale benchmarking study based on 38 datasets from the OpenML platform and six common machine learning algorithms. We apply our measures to assess the tunability of their parameters. Our results yield default values for hyperparameters and enable users to decide whether it is worth conducting a possibly time-consuming tuning strategy, to focus on the most important hyperparameters and to choose adequate hyperparameter spaces for tuning.

767 citations en Computer Science, Mathematics
S2 Open Access 2018
SoK: Security and Privacy in Machine Learning

Nicolas Papernot, P. Mcdaniel, Arunesh Sinha et al.

Advances in machine learning (ML) in recent years have enabled a dizzying array of applications such as data analytics, autonomous systems, and security diagnostics. ML is now pervasive: new systems and models are being deployed in every domain imaginable, leading to widespread deployment of software-based inference and decision making. There is growing recognition that ML exposes new vulnerabilities in software systems, yet the technical community's understanding of the nature and extent of these vulnerabilities remains limited. We systematize findings on ML security and privacy, focusing on attacks identified on these systems and defenses crafted to date. We articulate a comprehensive threat model for ML, and categorize attacks and defenses within an adversarial framework. Key insights resulting from works both in the ML and security communities are identified and the effectiveness of approaches is related to structural elements of ML algorithms and the data used to train them. In particular, it is apparent that constructing a theoretical understanding of the sensitivity of modern ML algorithms to the data they analyze, à la PAC theory, will foster a science of security and privacy in ML.

589 citations en Computer Science
S2 Open Access 2018
Delayed Impact of Fair Machine Learning

Lydia T. Liu, Sarah Dean, Esther Rolf et al.

Static classification has been the predominant focus of the study of fairness in machine learning. While most models do not consider how decisions change populations over time, it is conventional wisdom that fairness criteria promote the long-term well-being of groups they aim to protect. This work studies the interaction of static fairness criteria with temporal indicators of well-being. We show a simple one-step feedback model in which common criteria do not generally promote improvement over time, and may in fact cause harm. Our results highlight the importance of temporal modeling in the evaluation of fairness criteria, suggesting a range of new challenges and trade-offs.

515 citations en Mathematics, Computer Science
S2 Open Access 2018
The Impact of Machine Learning on Economics

S. Athey

This paper provides an assessment of the early contributions of machine learning to economics, as well as predictions about its future contributions. It begins by briefly overviewing some themes from the literature on machine learning, and then draws some contrasts with traditional approaches to estimating the impact of counterfactual policies in economics. Next, we review some of the initial “off-the-shelf” applications of machine learning to economics, including applications in analyzing text and images. We then describe new types of questions that have been posed surrounding the application of machine learning to policy problems, including “prediction policy problems,” as well as considerations of fairness and manipulability. Next, we briefly review some of the emerging econometric literature combining machine learning and causal inference. Finally, we overview a set of predictions about the future impact of machine learning on economics.

475 citations en Computer Science
S2 Open Access 2019
Machine learning and soil sciences: a review aided by machine learning tools

J. Padarian, B. Minasny, A. McBratney

The application of machine learning (ML) techniques in various fields of science has increased rapidly, especially in the last 10 years. The increasing availability of soil data that can be efficiently acquired remotely and proximally, and freely available open-source algorithms, have led to an accelerated adoption of ML techniques to analyse soil data. Given the large number of publications, it is an impossible task to manually review all papers on the application of ML in soil science without narrowing down a narrative of ML application in a specific research question. This paper aims to provide a comprehensive review of the application of ML techniques in soil science aided by an ML algorithm (latent Dirichlet allocation) to find patterns in a large collection of text corpora. The objective is to gain insight into publications of ML applications in soil science and to discuss the research gaps in this topic. We found that (a) there is an increasing usage of ML methods in soil sciences, mostly concentrated in developed countries, (b) the reviewed publications can be grouped into 12 topics, namely remote sensing, soil organic carbon, water, contamination, methods (ensembles), erosion and parent material, methods (NN, neural networks, SVM, support vector machines), spectroscopy, modelling (classes), crops, physical, and modelling (continuous), and (c) advanced ML methods usually perform better than simpler approaches thanks to their capability to capture non-linear relationships. From these findings, we found research gaps, in particular, about the precautions that should be taken (parsimony) to avoid overfitting, and that the interpretability of the ML models is an important aspect to consider when applying advanced ML methods in order to improve our knowledge and understanding of soil. We foresee that a large number of studies will focus on the latter topic.

345 citations en Computer Science
S2 Open Access 2019
Machine Learning in Banking Risk Management: A Literature Review

M. Leo, Suneel Sharma, K. Maddulety

There is an increasing influence of machine learning in business applications, with many solutions already implemented and many more being explored. Since the global financial crisis, risk management in banks has gained more prominence, and there has been a constant focus around how risks are being detected, measured, reported and managed. Considerable research in academia and industry has focused on the developments in banking and risk management and the current and emerging challenges. This paper, through a review of the available literature, seeks to analyse and evaluate machine-learning techniques that have been researched in the context of banking risk management, and to identify areas or problems in risk management that have been inadequately explored and are potential areas for further research. The review has shown that the application of machine learning in the management of banking risks such as credit risk, market risk, operational risk and liquidity risk has been explored; however, it doesn’t appear commensurate with the current industry level of focus on both risk management and machine learning. A large number of areas remain in bank risk management that could significantly benefit from the study of how machine learning can be applied to address specific problems.

345 citations en Business
DOAJ Open Access 2026
Metabolite correlation-based network analysis combined with machine learning techniques highlights LOX biosynthesis in Vanilla planifolia and Vanilla pompona source leaves

David Toubiana, Pamela Moon, Elias Bassil et al.

The vanilla genus of the orchid family is the primary source of the famous vanilla spice. Here, the central metabolome of source leaves of two important commercial vanilla species (V. planifolia, V. pompona), collected from two distinct geographical locations, was profiled using a GC-TOF-MS platform. In total, 544 metabolites were identified which were subjected to a combination of multivariate and univariate analysis, location-adjusted models, correlation-based network analysis (CNA), and CNA combined with machine-learning-assisted pathway mapping. Multivariate analysis of the metabolic profiles revealed a clear separation between the two species, confirmed by location-adjusted models. Univariate statistical analysis highlighted linoleic acid in the V. planifolia vs. V. pompona comparison. CNA showed higher connectivity in the V. pompona network over the V. planifolia network (6425 vs. 3508 edges), suggestive of biochemical adaptations for each species. CNA combined with machine learning techniques highlighted the lipoxygenase (LOX) pathway. This finding, combined with the identification of linoleic acid during univariate statistical analysis, indicated modified fatty acid metabolism in V. pompona with potential consequences for attracting pollinators.

Medicine, Science

Page 18 of 516159