Results for "machine learning"

S2 Open Access 2019

Machine learning and the physical sciences

Giuseppe Carleo, I. Cirac, Kyle Cranmer et al.

Machine learning (ML) encompasses a broad range of algorithms and modeling tools used for a vast array of data processing tasks, and it has entered most scientific disciplines in recent years. This article selectively reviews recent research on the interface between machine learning and the physical sciences. This includes conceptual developments in ML motivated by physical insights, applications of machine learning techniques to several domains in physics, and cross-fertilization between the two fields. After giving a basic notion of machine learning methods and principles, examples are described of how statistical physics is used to understand methods in ML. This review then describes applications of ML methods in particle physics and cosmology, quantum many-body physics, quantum computing, and chemical and material physics. Research and development into novel computing architectures aimed at accelerating ML are also highlighted. Each section describes recent successes as well as domain-specific methodology and challenges.

1805 citations en Physics


S2 Open Access 2019

Effective Heart Disease Prediction Using Hybrid Machine Learning Techniques

Senthilkumar Mohan, Chandrasegar Thirumalai, Gautam Srivastava

Heart disease is one of the most significant causes of mortality in the world today. Prediction of cardiovascular disease is a critical challenge in the area of clinical data analysis. Machine learning (ML) has been shown to be effective in assisting in making decisions and predictions from the large quantity of data produced by the healthcare industry. We have also seen ML techniques being used in recent developments in different areas of the Internet of Things (IoT). Various studies give only a glimpse into predicting heart disease with ML techniques. In this paper, we propose a novel method that aims at finding significant features by applying machine learning techniques resulting in improving the accuracy in the prediction of cardiovascular disease. The prediction model is introduced with different combinations of features and several known classification techniques. We produce an enhanced performance level with an accuracy level of 88.7% through the prediction model for heart disease with the hybrid random forest with a linear model (HRFLM).

1304 citations en Computer Science


S2 Open Access 2019

Automated Machine Learning - Methods, Systems, Challenges

1427 citations en Computer Science


S2 Open Access 2019

Machine Learning in Medicine

A. Rajkomar, Jeffrey Dean, I. Kohane

In this view of the future of medicine, patient–provider interactions are informed and supported by massive amounts of data from interactions with similar patients. The...

1502 citations en Medicine


S2 Open Access 2018

Empirical Asset Pricing via Machine Learning

Shihao Gu, Bryan T. Kelly, D. Xiu

We perform a comparative analysis of machine learning methods for the canonical problem of empirical asset pricing: measuring asset risk premiums. We demonstrate large economic gains to investors using machine learning forecasts, in some cases doubling the performance of leading regression-based strategies from the literature. We identify the best-performing methods (trees and neural networks) and trace their predictive gains to allowing nonlinear predictor interactions missed by other methods. All methods agree on the same set of dominant predictive signals, a set that includes variations on momentum, liquidity, and volatility. Authors have furnished an Internet Appendix, which is available on the Oxford University Press Web site next to the link to the final published paper online.

1811 citations en Computer Science, Economics


S2 Open Access 2018

Quantum Machine Learning

M. Schuld, Francesco Petruccione

2242 citations en Computer Science


S2 Open Access 2018

Explaining Explanations: An Overview of Interpretability of Machine Learning

Leilani H. Gilpin, David Bau, Ben Z. Yuan et al.

There has recently been a surge of work in explanatory artificial intelligence (XAI). This research area tackles the important problem that complex machines and algorithms often cannot provide insights into their behavior and thought processes. XAI allows users and parts of the internal system to be more transparent, providing explanations of their decisions in some level of detail. These explanations are important to ensure algorithmic fairness, to identify potential bias or problems in the training data, and to ensure that the algorithms perform as expected. However, explanations produced by these systems are neither standardized nor systematically assessed. In an effort to create best practices and identify open challenges, we describe foundational concepts of explainability and show how they can be used to classify existing literature. We discuss why current approaches to explanatory methods, especially for deep neural networks, are insufficient. Finally, based on our survey, we conclude with suggested future research directions for explanatory artificial intelligence.

2169 citations en Computer Science, Mathematics


S2 Open Access 2018

Machine Learning Identifies Stemness Features Associated with Oncogenic Dedifferentiation.

T. Malta, Artem Sokolov, A. Gentles et al.

Cancer progression involves the gradual loss of a differentiated phenotype and acquisition of progenitor and stem-cell-like features. Here, we provide novel stemness indices for assessing the degree of oncogenic dedifferentiation. We used an innovative one-class logistic regression (OCLR) machine-learning algorithm to extract transcriptomic and epigenetic feature sets derived from non-transformed pluripotent stem cells and their differentiated progeny. Using OCLR, we were able to identify previously undiscovered biological mechanisms associated with the dedifferentiated oncogenic state. Analyses of the tumor microenvironment revealed unanticipated correlation of cancer stemness with immune checkpoint expression and infiltrating immune cells. We found that the dedifferentiated oncogenic phenotype was generally most prominent in metastatic tumors. Application of our stemness indices to single-cell data revealed patterns of intra-tumor molecular heterogeneity. Finally, the indices allowed for the identification of novel targets and possible targeted therapies aimed at tumor differentiation.

1784 citations en Medicine, Biology


S2 Open Access 2021

A survey on missing data in machine learning

Tlamelo Emmanuel, Thabiso M. Maupong, Dimane Mpoeleng et al.

Machine learning has been the cornerstone of analysing and extracting information from data, and the problem of missing values is often encountered. Missing values arise under various mechanisms: missing completely at random, missing at random, or missing not at random. All of these may result from system malfunction during data collection or human error during data pre-processing. Nevertheless, it is important to deal with missing values before analysing data, since ignoring or omitting missing values may result in biased or misinformed analysis. In the literature there have been several proposals for handling missing values. In this paper, we aggregate some of the literature on missing data, focusing particularly on machine learning techniques. We also give insight into how the machine learning approaches work by highlighting the key features of missing-value imputation techniques, how they perform, their limitations, and the kind of data they are most suitable for. We propose and evaluate two methods: the k nearest neighbor method and an iterative imputation method (missForest) based on the random forest algorithm. Evaluation is performed on the Iris and novel power plant fan data with induced missing values at missingness rates of 5% to 20%. We show that both missForest and the k nearest neighbor method can successfully handle missing values, and we offer some possible future research directions.

871 citations en Computer Science, Medicine
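
The k nearest neighbor imputation named in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `knn_impute`, the choice to measure distance only over features observed in both rows, and the toy values are all assumptions made here for illustration.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that feature over the k nearest rows.

    Assumption of this sketch: distance is a root-mean-square difference over
    the features that are observed in both rows being compared.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf  # no common features, so treat as infinitely far
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(r) for r in rows]  # copy so the input is left untouched
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are the other rows in which feature j is observed.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


# Toy values invented for illustration (not the paper's data).
data = [
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [6.3, 3.3, None],
    [6.5, 3.0, 5.8],
    [6.2, 3.4, 5.4],
]
filled = knn_impute(data, k=2)
```

Here the missing third feature is filled from the two rows closest to it on the first two features. A library imputer would typically add feature scaling and distance weighting, which this sketch omits.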


S2 Open Access 2017

Privacy Risk in Machine Learning: Analyzing the Connection to Overfitting

Samuel Yeom, Irene Giacomelli, Matt Fredrikson et al.

Machine learning algorithms, when applied to sensitive data, pose a distinct threat to privacy. A growing body of prior work demonstrates that models produced by these algorithms may leak specific private information in the training data to an attacker, either through the models' structure or their observable behavior. However, the underlying cause of this privacy risk is not well understood beyond a handful of anecdotal accounts that suggest overfitting and influence might play a role. This paper examines the effect that overfitting and influence have on the ability of an attacker to learn information about the training data from machine learning models, either through training set membership inference or attribute inference attacks. Using both formal and empirical analyses, we illustrate a clear relationship between these factors and the privacy risk that arises in several popular machine learning algorithms. We find that overfitting is sufficient to allow an attacker to perform membership inference and, when the target attribute meets certain conditions about its influence, attribute inference attacks. Interestingly, our formal analysis also shows that overfitting is not necessary for these attacks and begins to shed light on what other factors may be in play. Finally, we explore the connection between membership inference and attribute inference, showing that there are deep connections between the two that lead to effective new attacks.

1441 citations en Computer Science
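
The link between overfitting and membership inference described in the abstract above can be sketched with a simple loss-threshold attack. This is an illustrative assumption, not the paper's exact adversary: the function names, the use of the mean training loss as the threshold, and the loss values are hypothetical.

```python
def membership_guess(loss, threshold):
    """Guess 'member' when the model's loss on a record is at or below the threshold."""
    return loss <= threshold


def attack_advantage(member_losses, nonmember_losses, threshold):
    """True-positive rate minus false-positive rate of the threshold attack."""
    tpr = sum(membership_guess(l, threshold) for l in member_losses) / len(member_losses)
    fpr = sum(membership_guess(l, threshold) for l in nonmember_losses) / len(nonmember_losses)
    return tpr - fpr


# Hypothetical per-record losses for an overfitted model:
# low on training records, noticeably higher on held-out records.
train_losses = [0.02, 0.05, 0.01, 0.10]
heldout_losses = [0.40, 0.08, 0.90, 0.55]

# Assumption: the attacker knows (or estimates) the average training loss.
threshold = sum(train_losses) / len(train_losses)
advantage = attack_advantage(train_losses, heldout_losses, threshold)
```

The wider the gap between training and held-out losses, the more cleanly a single threshold separates members from non-members, which is the intuition behind treating overfitting as a privacy risk.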


S2 Open Access 2017

Trainable Weka Segmentation: a machine learning tool for microscopy pixel classification

Ignacio Arganda-Carreras, V. Kaynig, C. Rueden et al.

2010 citations en Computer Science, Medicine


S2 Open Access 2016

Stealing Machine Learning Models via Prediction APIs

Florian Tramèr, Fan Zhang, A. Juels et al.

Machine learning (ML) models may be deemed confidential due to their sensitive training data, commercial value, or use in security applications. Increasingly often, confidential ML models are being deployed with publicly accessible query interfaces. ML-as-a-service ("predictive analytics") systems are an example: Some allow users to train models on potentially sensitive data and charge others for access on a pay-per-query basis. The tension between model confidentiality and public access motivates our investigation of model extraction attacks. In such attacks, an adversary with black-box access, but no prior knowledge of an ML model's parameters or training data, aims to duplicate the functionality of (i.e., "steal") the model. Unlike in classical learning theory settings, ML-as-a-service offerings may accept partial feature vectors as inputs and include confidence values with predictions. Given these practices, we show simple, efficient attacks that extract target ML models with near-perfect fidelity for popular model classes including logistic regression, neural networks, and decision trees. We demonstrate these attacks against the online services of BigML and Amazon Machine Learning. We further show that the natural countermeasure of omitting confidence values from model outputs still admits potentially harmful model extraction attacks. Our results highlight the need for careful ML model deployment and new model extraction countermeasures.

2069 citations en Computer Science, Mathematics
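
To see why returning confidence values makes extraction easy for logistic regression, consider the following minimal sketch. It assumes a hypothetical black-box `predict` function that returns the exact sigmoid confidence and accepts arbitrary real-valued inputs. The names `make_black_box` and `extract_logistic_regression` are invented here and do not correspond to any real service API.

```python
import math


def make_black_box(weights, bias):
    """Hypothetical prediction API: returns a confidence value for input x."""
    def predict(x):
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        return 1.0 / (1.0 + math.exp(-z))
    return predict


def logit(p):
    """Inverse of the sigmoid: recovers w.x + b from a confidence value."""
    return math.log(p / (1.0 - p))


def extract_logistic_regression(predict, n_features):
    """Recover weights and bias using n_features + 1 confidence queries."""
    # Querying the origin reveals the bias alone.
    bias = logit(predict([0.0] * n_features))
    weights = []
    for i in range(n_features):
        unit = [0.0] * n_features
        unit[i] = 1.0
        # Querying the i-th unit vector reveals w_i + b.
        weights.append(logit(predict(unit)) - bias)
    return weights, bias


# Secret parameters known only to the hypothetical service.
secret_w, secret_b = [0.8, -1.5, 0.3], 0.25
api = make_black_box(secret_w, secret_b)
stolen_w, stolen_b = extract_logistic_regression(api, 3)
```

Because the logit of each returned confidence is linear in the unknown parameters, a model with d features yields to d + 1 well-chosen queries in this idealized setting. Rounded or withheld confidence values would require more queries and a different strategy.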


S2 Open Access 2016

Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning

G. Lemaître, Fernando Nogueira, Christos K. Aridas

Imbalanced-learn is an open-source Python toolbox that provides a wide range of methods for coping with imbalanced datasets, a problem frequently encountered in machine learning and pattern recognition. The implemented state-of-the-art methods can be categorized into 4 groups: (i) under-sampling, (ii) over-sampling, (iii) combination of over- and under-sampling, and (iv) ensemble learning methods. The proposed toolbox depends only on numpy, scipy, and scikit-learn and is distributed under the MIT license. Furthermore, it is fully compatible with scikit-learn and is part of the scikit-learn-contrib supported project. Documentation, unit tests, and integration tests are provided to ease usage and contribution. The toolbox is publicly available on GitHub.

2418 citations en Computer Science, Mathematics
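
As an illustration of group (ii), over-sampling, the sketch below duplicates randomly chosen minority-class samples until all classes are the same size. It uses only the Python standard library and is not the imbalanced-learn API. The function name `random_over_sample` and the toy data are assumptions made for this example.

```python
import random
from collections import Counter


def random_over_sample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until every class matches the largest."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        # Pool of original samples belonging to this class.
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


# Toy imbalanced dataset: four samples of class 0, two of class 1.
X = [[0.1], [0.2], [0.3], [0.4], [0.9], [1.0]]
y = [0, 0, 0, 0, 1, 1]
X_res, y_res = random_over_sample(X, y)
```

Random duplication is the simplest over-sampling strategy. Synthetic approaches generate new minority points instead of copying existing ones, which this sketch does not attempt.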


S2 Open Access 2015

MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems

Tianqi Chen, Mu Li, Yutian Li et al.

MXNet is a multi-language machine learning (ML) library to ease the development of ML algorithms, especially for deep neural networks. Embedded in the host language, it blends declarative symbolic expression with imperative tensor computation. It offers auto differentiation to derive gradients. MXNet is computation and memory efficient and runs on various heterogeneous systems, ranging from mobile devices to distributed GPU clusters. This paper describes both the API design and the system implementation of MXNet, and explains how embedding of both symbolic expression and tensor operation is handled in a unified fashion. Our preliminary experiments reveal promising results on large scale deep neural network applications using multiple GPU machines.

2317 citations en Computer Science


S2 Open Access 2021

Swarm Learning for decentralized and confidential clinical machine learning

Stefanie Warnat-Herresthal, Hartmut Schultze, Krishnaprasad Shastry et al.

Fast and reliable detection of patients with severe and heterogeneous illnesses is a major goal of precision medicine [1,2]. Patients with leukaemia can be identified using machine learning on the basis of their blood transcriptomes [3]. However, there is an increasing divide between what is technically possible and what is allowed, because of privacy legislation [4,5]. Here, to facilitate the integration of any medical data from any data owner worldwide without violating privacy laws, we introduce Swarm Learning, a decentralized machine-learning approach that unites edge computing, blockchain-based peer-to-peer networking and coordination while maintaining confidentiality without the need for a central coordinator, thereby going beyond federated learning. To illustrate the feasibility of using Swarm Learning to develop disease classifiers using distributed data, we chose four use cases of heterogeneous diseases (COVID-19, tuberculosis, leukaemia and lung pathologies). With more than 16,400 blood transcriptomes derived from 127 clinical studies with non-uniform distributions of cases and controls and substantial study biases, as well as more than 95,000 chest X-ray images, we show that Swarm Learning classifiers outperform those developed at individual sites. In addition, Swarm Learning completely fulfils local confidentiality regulations by design. We believe that this approach will notably accelerate the introduction of precision medicine. Swarm Learning is a decentralized machine learning approach that outperforms classifiers developed at individual sites for COVID-19 and other diseases while preserving confidentiality and privacy.

735 citations en Medicine


S2 Open Access 2021

A Survey of Human-in-the-loop for Machine Learning

Xingjiao Wu, Luwei Xiao, Yixuan Sun et al.

Human-in-the-loop aims to train an accurate prediction model with minimum cost by integrating human knowledge and experience. Humans can provide training data for machine learning applications and, with the help of machine-based approaches, directly accomplish tasks in the pipeline that are hard for computers. In this paper, we survey existing works on human-in-the-loop from a data perspective and classify them into three categories with a progressive relationship: (1) work on improving model performance through data processing, (2) work on improving model performance through interventional model training, and (3) the design of system-independent human-in-the-loop. Using the above categorization, we summarize major approaches in the field along with their technical strengths and weaknesses, and we provide a simple classification and discussion for natural language processing, computer vision, and other areas. In addition, we present some open challenges and opportunities. This survey intends to provide a high-level summary of human-in-the-loop and to motivate interested readers to consider approaches for designing effective human-in-the-loop solutions.

716 citations en Computer Science


S2 Open Access 1995

Some Studies in Machine Learning Using the Game of Checkers

A. Samuel

5130 citations en Computer Science


S2 Open Access 2003

Gaussian Processes in Machine Learning

Carl E. Rasmussen

3899 citations en Computer Science


S2 Open Access 2022

Machine Learning for Electrocatalyst and Photocatalyst Design and Discovery.

Haoxin Mai, T. Le, Dehong Chen et al.

Electrocatalysts and photocatalysts are key to a sustainable future, generating clean fuels, reducing the impact of global warming, and providing solutions to environmental pollution. Improved processes for catalyst design and a better understanding of electro/photocatalytic processes are essential for improving catalyst effectiveness. Recent advances in data science and artificial intelligence have great potential to accelerate electrocatalysis and photocatalysis research, particularly the rapid exploration of large materials chemistry spaces through machine learning. Here a comprehensive introduction to, and critical review of, machine learning techniques used in electrocatalysis and photocatalysis research are provided. Sources of electro/photocatalyst data and current approaches to representing these materials by mathematical features are described, the most commonly used machine learning methods summarized, and the quality and utility of electro/photocatalyst models evaluated. Illustrations of how machine learning models are applied to novel electro/photocatalyst discovery and used to elucidate electrocatalytic or photocatalytic reaction mechanisms are provided. The review offers a guide for materials scientists on the selection of machine learning methods for electrocatalysis and photocatalysis research. The application of machine learning to catalysis science represents a paradigm shift in the way advanced, next-generation catalysts will be designed and synthesized.

365 citations en Medicine


S2 Open Access 2022

Interpretable machine learning for knowledge generation in heterogeneous catalysis

J. Esterhuizen, B. Goldsmith, S. Linic

296 citations en

