A Review on Linear Regression Comprehensive in Machine Learning
Dastan Maulud, A. Abdulazeez
Perhaps one of the most common and comprehensive statistical and machine learning algorithms are linear regression. Linear regression is used to find a linear relationship between one or more predictors. The linear regression has two types: simple regression and multiple regression (MLR). This paper discusses various works by different researchers on linear regression and polynomial regression and compares their performance using the best approach to optimize prediction and precision. Almost all of the articles analyzed in this review is focused on datasets; in order to determine a model's efficiency, it must be correlated with the actual values obtained for the explanatory variables.
A Survey on the Explainability of Supervised Machine Learning
Nadia Burkart, Marco F. Huber
Predictions obtained by, e.g., artificial neural networks have a high accuracy but humans often perceive the models as black boxes. Insights about the decision making are mostly opaque for humans. Particularly understanding the decision making in highly sensitive areas such as healthcare or finance, is of paramount importance. The decision-making behind the black boxes requires it to be more transparent, accountable, and understandable for humans. This survey paper provides essential definitions, an overview of the different principles and methodologies of explainable Supervised Machine Learning (SML). We conduct a state-of-the-art survey that reviews past and recent explainable SML approaches and classifies them according to the introduced definitions. Finally, we illustrate principles by means of an explanatory case study and discuss important future directions.
927 sitasi
en
Computer Science, Mathematics
Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning
B. Biggio, F. Roli
Deep neural networks and machine-learning algorithms are pervasively used in several applications, ranging from computer vision to computer security. In most of these applications, the learning algorithm has to face intelligent and adaptive attackers who can carefully manipulate data to purposely subvert the learning process. As these algorithms have not been originally designed under such premises, they have been shown to be vulnerable to well-crafted, sophisticated attacks, including training-time poisoning and test-time evasion attacks (also known as adversarial examples). The problem of countering these threats and learning secure classifiers in adversarial settings has thus become the subject of an emerging, relevant research field known as adversarial machine learning. The purposes of this tutorial are: (a) to introduce the fundamentals of adversarial machine learning to the security community; (b) to illustrate the design cycle of a learning-based pattern recognition system for adversarial tasks; (c) to present novel techniques that have been recently proposed to assess performance of pattern classifiers and deep learning algorithms under attack, evaluate their vulnerabilities, and implement defense strategies that make learning algorithms more robust to attacks; and (d) to show some applications of adversarial machine learning to pattern recognition tasks like object recognition in images, biometric identity recognition, spam and malware detection.
1583 sitasi
en
Computer Science
Quantifying the Carbon Emissions of Machine Learning
Alexandre Lacoste, A. Luccioni, Victor Schmidt
et al.
From an environmental standpoint, there are a few crucial aspects of training a neural network that have a major impact on the quantity of carbon that it emits. These factors include: the location of the server used for training and the energy grid that it uses, the length of the training procedure, and even the make and model of hardware on which the training takes place. In order to approximate these emissions, we present our Machine Learning Emissions Calculator, a tool for our community to better understand the environmental impact of training ML models. We accompany this tool with an explanation of the factors cited above, as well as concrete actions that individual practitioners and organizations can take to mitigate their carbon emissions.
889 sitasi
en
Computer Science
Introduction to Machine Learning, Neural Networks, and Deep Learning
Rene Y. Choi, Aaron S. Coyner, Jayashree Kalpathy-Cramer
et al.
Purpose To present an overview of current machine learning methods and their use in medical research, focusing on select machine learning techniques, best practices, and deep learning. Methods A systematic literature search in PubMed was performed for articles pertinent to the topic of artificial intelligence methods used in medicine with an emphasis on ophthalmology. Results A review of machine learning and deep learning methodology for the audience without an extensive technical computer programming background. Conclusions Artificial intelligence has a promising future in medicine; however, many challenges remain. Translational Relevance The aim of this review article is to provide the nontechnical readers a layman's explanation of the machine learning methods being used in medicine today. The goal is to provide the reader a better understanding of the potential and challenges of artificial intelligence within the field of medicine.
824 sitasi
en
Computer Science, Medicine
Machine Learning Testing: Survey, Landscapes and Horizons
J Zhang, M. Harman, Lei Ma
et al.
This paper provides a comprehensive survey of techniques for testing machine learning systems; Machine Learning Testing (ML testing) research. It covers 144 papers on testing properties (e.g., correctness, robustness, and fairness), testing components (e.g., the data, learning program, and framework), testing workflow (e.g., test generation and test evaluation), and application scenarios (e.g., autonomous driving, machine translation). The paper also analyses trends concerning datasets, research trends, and research focus, concluding with research challenges and promising research directions in ML testing.
847 sitasi
en
Computer Science, Mathematics
An open source machine learning framework for efficient and transparent systematic reviews
R. Schoot, J. D. Bruin, Raoul Schram
et al.
To help researchers conduct a systematic review or meta-analysis as efficiently and transparently as possible, we designed a tool to accelerate the step of screening titles and abstracts. For many tasks—including but not limited to systematic reviews and meta-analyses—the scientific literature needs to be checked systematically. Scholars and practitioners currently screen thousands of studies by hand to determine which studies to include in their review or meta-analysis. This is error prone and inefficient because of extremely imbalanced data: only a fraction of the screened studies is relevant. The future of systematic reviewing will be an interaction with machine learning algorithms to deal with the enormous increase of available text. We therefore developed an open source machine learning-aided pipeline applying active learning: ASReview. We demonstrate by means of simulation studies that active learning can yield far more efficient reviewing than manual reviewing while providing high quality. Furthermore, we describe the options of the free and open source research software and present the results from user experience tests. We invite the community to contribute to open source projects such as our own that provide measurable and reproducible improvements over current practice. It is a challenging task for any research field to screen the literature and determine what needs to be included in a systematic review in a transparent way. A new open source machine learning framework called ASReview, which employs active learning and offers a range of machine learning models, can check the literature efficiently and systemically.
802 sitasi
en
Computer Science
Plans and Situated Actions: The Problem of Human-Machine Communication (Learning in Doing: Social,
L. Suchman
7155 sitasi
en
Psychology, Sociology
An Introduction to MCMC for Machine Learning
C. Andrieu, Nando de Freitas, A. Doucet
et al.
2746 sitasi
en
Computer Science
Counterfactual Explanations for Machine Learning: A Review
Sahil Verma, John P. Dickerson, Keegan E. Hines
463 sitasi
en
Computer Science
Amnesiac Machine Learning
Laura Graves, Vineel Nagisetty, Vijay Ganesh
The Right to be Forgotten is part of the recently enacted General Data Protection Regulation (GDPR) law that affects any data holder that has data on European Union residents. It gives EU residents the ability to request deletion of their personal data, including training records used to train machine learning models. Unfortunately, Deep Neural Network models are vulnerable to information leaking attacks such as model inversion attacks which extract class information from a trained model and membership inference attacks which determine the presence of an example in a model's training data. If a malicious party can mount an attack and learn private information that was meant to be removed, then it implies that the model owner has not properly protected their user's rights and their models may not be compliant with the GDPR law. In this paper, we present two efficient methods that address this question of how a model owner or data holder may delete personal data from models in such a way that they may not be vulnerable to model inversion and membership inference attacks while maintaining model efficacy. We start by presenting a real-world threat model that shows that simply removing training data is insufficient to protect users. We follow that up with two data removal methods, namely Unlearning and Amnesiac Unlearning, that enable model owners to protect themselves against such attacks while being compliant with regulations. We provide extensive empirical analysis that show that these methods are indeed efficient, safe to apply, effectively remove learned information about sensitive data from trained models while maintaining model efficacy.
402 sitasi
en
Computer Science
Machine Learning in Chemical Engineering: Strengths, Weaknesses, Opportunities, and Threats
Maarten R. Dobbelaere, Pieter P. Plehiers, R. Vijver
et al.
Abstract Chemical engineers rely on models for design, research, and daily decision-making, often with potentially large financial and safety implications. Previous efforts a few decades ago to combine artificial intelligence and chemical engineering for modeling were unable to fulfill the expectations. In the last five years, the increasing availability of data and computational resources has led to a resurgence in machine learning-based research. Many recent efforts have facilitated the roll-out of machine learning techniques in the research field by developing large databases, benchmarks, and representations for chemical applications and new machine learning frameworks. Machine learning has significant advantages over traditional modeling techniques, including flexibility, accuracy, and execution speed. These strengths also come with weaknesses, such as the lack of interpretability of these black-box models. The greatest opportunities involve using machine learning in time-limited applications such as real-time optimization and planning that require high accuracy and that can build on models with a self-learning ability to recognize patterns, learn from data, and become more intelligent over time. The greatest threat in artificial intelligence research today is inappropriate use because most chemical engineers have had limited training in computer science and data analysis. Nevertheless, machine learning will definitely become a trustworthy element in the modeling toolbox of chemical engineers.
264 sitasi
en
Computer Science
Machine Learning in Healthcare
A. Ojha, C. Vinitha
Machine learning in construction: From shallow to deep learning
Yayin Xu, Ying Zhou, Przemysław Sekuła
et al.
Abstract The development of artificial intelligence technology is currently bringing about new opportunities in construction. Machine learning is a major area of interest within the field of artificial intelligence, playing a pivotal role in the process of making construction “smart”. The application of machine learning in construction has the potential to open up an array of opportunities such as site supervision, automatic detection, and intelligent maintenance. However, the implementation of machine learning faces a range of challenges due to the difficulties in acquiring labeled data, especially when applied in a highly complex construction site environment. This paper reviews the history of machine learning development from shallow to deep learning and its applications in construction. The strengths and weaknesses of machine learning technology in construction have been analyzed in order to foresee the future direction of machine learning applications in this sphere. Furthermore, this paper presents suggestions which may benefit researchers in terms of combining specific knowledge domains in construction with machine learning algorithms so as to develop dedicated deep network models for the industry.
252 sitasi
en
Computer Science
Reproducibility in machine learning for health research: Still a ways to go
Matthew B. A. McDermott, Shirly Wang, N. Marinsek
et al.
Machine learning applied to health falls short on several reproducibility metrics compared to other machine learning subfields. Machine learning for health must be reproducible to ensure reliable clinical use. We evaluated 511 scientific papers across several machine learning subfields and found that machine learning for health compared poorly to other areas regarding reproducibility metrics, such as dataset and code accessibility. We propose recommendations to address this problem.
Autonomous Driving Architectures: Insights of Machine Learning and Deep Learning Algorithms
M. Bachute, Javed Subhedar
Abstract Research in Autonomous Driving is taking momentum due to the inherent advantages of autonomous driving systems. The main advantage being the disassociation of the driver from the vehicle reducing the human intervention. However, the Autonomous Driving System involves many subsystems which need to be integrated as a whole system. Some of the tasks include Motion Planning, Vehicle Localization, Pedestrian Detection, Traffic Sign Detection, Road-marking Detection, Automated Parking, Vehicle Cybersecurity, and System Fault Diagnosis. This paper aims to the overview of various Machine Learning and Deep Learning Algorithms used in Autonomous Driving Architectures for different tasks like Motion Planning, Vehicle Localization, Pedestrian Detection, Traffic Sign Detection, Road-marking Detection, Automated Parking, Vehicle Cybersecurity and Fault Diagnosis. This paper surveys the technical aspects of Machine Learning and Deep Learning Algorithms used for Autonomous Driving Systems. Comparison of these algorithms is done based on the metrics like mean Intersect in over Union (mIoU), Average Precision (AP)missed detection rate, miss rate False Positives Per Image (FPPI), and average number for false frame detection. This study contributes to picture a review of the Machine Learning and Deep Learning Algorithms used for Autonomous Driving Systems and is organized based on the different tasks of the system.
234 sitasi
en
Computer Science
Machine learning algorithms for social media analysis: A survey
Balaji T.K., Chandra Sekhara Rao Annavarapu, Annushree Bablani
Abstract Social Media (SM) are the most widespread and rapid data generation applications on the Internet increase the study of these data. However, the efficient processing of such massive data is challenging, so we require a system that learns from these data, like machine learning. Machine learning methods make the systems to learn itself. Many papers are published on SM using machine learning approaches over the past few decades. In this paper, we provide a comprehensive survey of multiple applications of SM analysis using robust machine learning algorithms. Initially, we discuss a summary of machine learning algorithms, which are used in SM analysis. After that, we provide a detailed survey of machine learning approaches to SM analysis. Furthermore, we summarize the challenges and benefits of Machine Learning usages in SM analysis. Finally, we presented open issues and consequences in SM analysis for further research.
219 sitasi
en
Computer Science
DOME: recommendations for supervised machine learning validation in biology
Ian Walsh, D. Fishman, D. García-Gasulla
et al.
Accounting for Variance in Machine Learning Benchmarks
Xavier Bouthillier, Pierre Delaunay, Mirko Bronzi
et al.
Strong empirical evidence that one machine-learning algorithm A outperforms another one B ideally calls for multiple trials optimizing the learning pipeline over sources of variation such as data sampling, data augmentation, parameter initialization, and hyperparameters choices. This is prohibitively expensive, and corners are cut to reach conclusions. We model the whole benchmarking process, revealing that variance due to data sampling, parameter initialization and hyperparameter choice impact markedly the results. We analyze the predominant comparison methods used today in the light of this variance. We show a counter-intuitive result that adding more sources of variation to an imperfect estimator approaches better the ideal estimator at a 51 times reduction in compute cost. Building on these results, we study the error rate of detecting improvements, on five different deep-learning tasks/architectures. This study leads us to propose recommendations for performance comparisons.
184 sitasi
en
Computer Science, Mathematics
Machine learning with a reject option: a survey
Kilian Hendrickx, Lorenzo Perini, Dries Van der Plas
et al.
Machine learning models always make a prediction, even when it is likely to be inaccurate. This behavior should be avoided in many decision support applications, where mistakes can have severe consequences. Albeit already studied in 1970, machine learning with rejection recently gained interest. This machine learning subfield enables machine learning models to abstain from making a prediction when likely to make a mistake. This survey aims to provide an overview on machine learning with rejection. We introduce the conditions leading to two types of rejection, ambiguity and novelty rejection, which we carefully formalize. Moreover, we review and categorize strategies to evaluate a model’s predictive and rejective quality. Additionally, we define the existing architectures for models with rejection and describe the standard techniques for learning such models. Finally, we provide examples of relevant application domains and show how machine learning with rejection relates to other machine learning research areas.
174 sitasi
en
Computer Science