Results for "machine learning"

S2 Open Access 2021

POT: Python Optimal Transport

Rémi Flamary, N. Courty, Alexandre Gramfort et al.

1020 citations · en · Computer Science

S2 Open Access 2020

Sashank J. Reddi, Zachary B. Charles, M. Zaheer et al.

Federated learning is a distributed machine learning paradigm in which a large number of clients coordinate with a central server to learn a model without sharing their own training data. Due to the heterogeneity of the client datasets, standard federated optimization methods such as Federated Averaging (FedAvg) are often difficult to tune and exhibit unfavorable convergence behavior. In non-federated settings, adaptive optimization methods have had notable success in combating such issues. In this work, we propose federated versions of adaptive optimizers, including Adagrad, Adam, and Yogi, and analyze their convergence in the presence of heterogeneous data for general nonconvex settings. Our results highlight the interplay between client heterogeneity and communication efficiency. We also perform extensive experiments on these methods and show that the use of adaptive optimizers can significantly improve the performance of federated learning.

1890 citations · en · Computer Science, Mathematics
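
The abstract describes server-side adaptive optimizers for federated learning. The minimal NumPy sketch below rests on our reading of the paper. Each client runs a few local SGD steps from the current global model. The server averages the resulting model changes into a pseudo-gradient and applies an Adagrad-style update, using momentum `beta1` and an adaptivity constant `tau`. The clients, their quadratic losses, the function names and all hyperparameter values are hypothetical toy choices, not the paper's experimental setup.

```python
import numpy as np

def client_update(x, target, lr=0.1, steps=5):
    """Hypothetical client: a few local SGD steps on the toy loss
    0.5 * ||w - target||^2, starting from the global model x.
    Returns the model change w - x."""
    w = x.copy()
    for _ in range(steps):
        w -= lr * (w - target)  # gradient of the toy quadratic loss
    return w - x

def fed_adagrad(targets, rounds=200, server_lr=1.0, beta1=0.9, tau=1e-3):
    """Server-side Adagrad-style update applied to the averaged client change."""
    dim = targets.shape[1]
    x = np.zeros(dim)
    m = np.zeros(dim)
    v = np.full(dim, tau ** 2)  # small positive initial second-moment estimate
    for _ in range(rounds):
        # Every client starts from the same global model x in this round.
        delta = np.mean([client_update(x, t) for t in targets], axis=0)
        m = beta1 * m + (1 - beta1) * delta         # momentum on the pseudo-gradient
        v = v + delta ** 2                           # Adagrad accumulator (never decays)
        x = x + server_lr * m / (np.sqrt(v) + tau)   # "+": delta already points downhill
    return x

# Three clients whose local optima disagree (heterogeneous data).
targets = np.array([[1.0, 2.0], [3.0, 0.0], [-1.0, 1.0]])
x_final = fed_adagrad(targets)
```

The average of the three toy losses is minimized at the mean of the targets, [1, 1], and the server iterate should approach it. Replacing the accumulator line with an exponential moving average, `v = beta2 * v + (1 - beta2) * delta ** 2`, gives an Adam-style server update instead. That variant is not tuned here.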

S2 Open Access 2020

Metrics for Multi-Class Classification: an Overview

Margherita Grandini, Enrico Bagli, G. Visani

Classification tasks in machine learning involving more than two classes are known by the name of "multi-class classification". Performance indicators are very useful when the aim is to evaluate and compare different classification models or machine learning techniques. Many metrics come in handy to test the ability of a multi-class classifier. Those metrics turn out to be useful at different stages of the development process, e.g. comparing the performance of two different models or analysing the behaviour of the same model by tuning different parameters. In this white paper we review a list of the most promising multi-class metrics, highlight their advantages and disadvantages, and show their possible usages during the development of a classification model.

1213 citations · en · Computer Science, Mathematics
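
The sketch below illustrates two common multi-class metrics. It computes per-class F1, macro-averaged F1 (the unweighted mean over classes) and micro-averaged F1 (computed from pooled counts), all from a confusion matrix. It assumes integer labels 0..k-1. The function names are hypothetical, and the white paper may define or weight these metrics differently.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """cm[t][p] counts samples whose true class is t and predicted class is p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def f1_summary(cm):
    """Return (per-class F1 list, macro-F1, micro-F1) for a square confusion matrix."""
    k = len(cm)
    tp = [cm[i][i] for i in range(k)]
    fp = [sum(cm[r][i] for r in range(k)) - tp[i] for i in range(k)]  # column total minus hits
    fn = [sum(cm[i]) - tp[i] for i in range(k)]                        # row total minus hits
    per_class = []
    for i in range(k):
        denom = 2 * tp[i] + fp[i] + fn[i]
        # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall
        per_class.append(2 * tp[i] / denom if denom else 0.0)
    macro = sum(per_class) / k
    total_tp, total_fp, total_fn = sum(tp), sum(fp), sum(fn)
    micro = 2 * total_tp / (2 * total_tp + total_fp + total_fn)
    return per_class, macro, micro

y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 2, 2]
per_class, macro, micro = f1_summary(confusion_matrix(y_true, y_pred, 3))
```

For single-label multi-class data, micro-F1 equals accuracy, which is 4/6 here. Macro-F1 weights each class equally, so it is more sensitive to poorly predicted minority classes.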

S2 Open Access 2019

Generalizing from a Few Examples

Yaqing Wang, Quanming Yao, J. Kwok et al.

Machine learning has been highly successful in data-intensive applications but is often hampered when the data set is small. Recently, Few-shot Learning (FSL) has been proposed to tackle this problem. Using prior knowledge, FSL can rapidly generalize to new tasks containing only a few samples with supervised information. In this article, we conduct a thorough survey to fully understand FSL. Starting from a formal definition of FSL, we distinguish FSL from several relevant machine learning problems. We then point out that the core issue in FSL is that the empirical risk minimizer is unreliable. Based on how prior knowledge can be used to handle this core issue, we categorize FSL methods from three perspectives: (i) data, which uses prior knowledge to augment the supervised experience; (ii) model, which uses prior knowledge to reduce the size of the hypothesis space; and (iii) algorithm, which uses prior knowledge to alter the search for the best hypothesis in the given hypothesis space. With this taxonomy, we review and discuss the pros and cons of each category. Promising directions, in the aspects of the FSL problem setups, techniques, applications, and theories, are also proposed to provide insights for future research.

2071 citations · en · Computer Science

S2 Open Access 2019

Text Classification Algorithms: A Survey

Kamran Kowsari, K. Meimandi, Mojtaba Heidarysafa et al.

In recent years, there has been an exponential growth in the number of complex documents and texts that require a deeper understanding of machine learning methods to be able to accurately classify texts in many applications. Many machine learning approaches have achieved state-of-the-art results in natural language processing. The success of these learning algorithms relies on their capacity to understand complex models and non-linear relationships within data. However, finding suitable structures, architectures, and techniques for text classification is a challenge for researchers. In this paper, we present a brief overview of text classification algorithms. This overview covers different text feature extraction methods, dimensionality reduction methods, existing algorithms and techniques, and evaluation methods. Finally, the limitations of each technique and their application in real-world problems are discussed.

1461 citations · en · Computer Science, Mathematics
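
Text feature extraction is one of the steps the survey covers, and TF-IDF weighting is a common baseline for it. The sketch below implements one simple variant of TF-IDF. Term frequency is normalized by document length, inverse document frequency is log(N / df), and tokenization is lowercase whitespace splitting. This variant was chosen for illustration and is not taken from the survey, and the function name is hypothetical. Under this variant, terms that appear in every document receive weight zero.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # count each term at most once per document
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({term: (c / total) * idf[term] for term, c in counts.items()})
    return vectors

docs = ["the cat sat", "the dog sat", "the cat ran"]
vectors = tf_idf(docs)
```

In this example, "the" occurs in all three documents and gets weight 0. "dog" occurs only once in the collection and gets the largest weight in its document.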

S2 Open Access 2018

PennyLane: Automatic differentiation of hybrid quantum-classical computations

V. Bergholm, J. Izaac, M. Schuld et al.

1486 citations · en · Computer Science, Mathematics

S2 Open Access 2017

Counterfactual Fairness

Matt J. Kusner, Joshua R. Loftus, Chris Russell et al.

Machine learning can impact people with legal or ethical consequences when it is used to automate decisions in areas such as insurance, lending, hiring, and predictive policing. In many of these scenarios, previous decisions have been made that are unfairly biased against certain subpopulations, for example those of a particular race, gender, or sexual orientation. Since this past data may be biased, machine learning predictors must account for this to avoid perpetuating or creating discriminatory practices. In this paper, we develop a framework for modeling fairness using tools from causal inference. Our definition of counterfactual fairness captures the intuition that a decision is fair towards an individual if it is the same in (a) the actual world and (b) a counterfactual world where the individual belonged to a different demographic group. We demonstrate our framework on a real-world problem of fair prediction of success in law school.

1825 citations · en · Computer Science, Psychology
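
The counterfactual check in the abstract can be illustrated on a toy structural causal model. A counterfactual is computed in three steps: infer the background variable (abduction), set the protected attribute to a new value (action), and recompute the features (prediction). This sketch assumes a hypothetical linear additive-noise model X = W_A * A + U, with a binary protected attribute A and a latent background variable U. The coefficient, both predictors and the data are invented for illustration and are not the law-school model from the paper. The predictor that reads X directly changes when A is flipped counterfactually. The predictor that uses only the inferred U does not.

```python
import numpy as np

W_A = 1.5  # hypothetical causal effect of protected attribute A on feature X

def abduct(x, a):
    """Abduction: in the additive model X = W_A * A + U, U is recovered exactly."""
    return x - W_A * a

def counterfactual_x(x, a, a_new):
    """Action + prediction: keep U fixed, set A := a_new, recompute X."""
    return W_A * a_new + abduct(x, a)

def predictor_on_x(x, a):
    return 2.0 * x              # depends on A through X

def predictor_on_u(x, a):
    return 2.0 * abduct(x, a)   # uses only the inferred background variable

def is_counterfactually_fair(predictor, x, a, tol=1e-9):
    """True if every individual's prediction is unchanged when A is flipped."""
    for xi, ai in zip(x, a):
        a_new = 1 - ai
        x_cf = counterfactual_x(xi, ai, a_new)
        if abs(predictor(xi, ai) - predictor(x_cf, a_new)) > tol:
            return False
    return True

# Toy data generated from the assumed model.
rng = np.random.default_rng(0)
a = rng.integers(0, 2, size=100)
u = rng.normal(size=100)
x = W_A * a + u
```

In practice the causal model and its parameters would have to be estimated from data. Here they are fixed by assumption, so that only the definition is being exercised.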

S2 Open Access 2016

Molecular graph convolutions: moving beyond fingerprints

S. Kearnes, Kevin McCloskey, Marc Berndl et al.

Molecular “fingerprints” encoding structural information are the workhorse of cheminformatics and machine learning in drug discovery applications. However, fingerprint representations necessarily emphasize particular aspects of the molecular structure while ignoring others, rather than allowing the model to make data-driven decisions. We describe molecular graph convolutions, a machine learning architecture for learning from undirected graphs, specifically small molecules. Graph convolutions use a simple encoding of the molecular graph—atoms, bonds, distances, etc.—which allows the model to take greater advantage of information in the graph structure. Although graph convolutions do not outperform all fingerprint-based methods, they (along with other graph-based methods) represent a new paradigm in ligand-based virtual screening with exciting opportunities for future improvement.

1574 citations · en · Computer Science, Medicine
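
The abstract's central idea is to let a model learn directly from the molecular graph instead of a fixed fingerprint. The sketch below uses a much simpler layer than the architecture described in the paper, which also models atom pairs and distances. It implements a generic neighbour-sum graph convolution followed by a sum readout. The ethanol example, the one-hot atom features and the random weights are hypothetical. The final lines check why sum aggregation suits molecules: relabelling the atoms leaves the molecule vector unchanged.

```python
import numpy as np

def graph_conv(H, A, W_self, W_neigh):
    """Generic graph-convolution layer: each atom combines its own features
    with the sum of its bonded neighbours' features, then applies ReLU.
    H: (n_atoms, f_in) atom features; A: (n_atoms, n_atoms) 0/1 bond adjacency."""
    return np.maximum(0.0, H @ W_self + A @ H @ W_neigh)

def readout(H):
    """Sum over atoms, so the molecule vector does not depend on atom ordering."""
    return H.sum(axis=0)

# Heavy atoms of ethanol (C-C-O) with one-hot features [is_C, is_O].
H = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)

rng = np.random.default_rng(0)
W_self = rng.normal(size=(2, 4))   # random weights stand in for learned ones
W_neigh = rng.normal(size=(2, 4))

mol_vec = readout(graph_conv(H, A, W_self, W_neigh))

# Relabel the atoms as (O, C, C) and recompute.
P = np.eye(3)[[2, 0, 1]]
mol_vec_perm = readout(graph_conv(P @ H, P @ A @ P.T, W_self, W_neigh))
```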

S2 Open Access 2013

An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks

I. Goodfellow, Mehdi Mirza, Xia Da et al.

Catastrophic forgetting is a problem faced by many machine learning models and algorithms. When trained on one task, then trained on a second task, many machine learning models "forget" how to perform the first task. This is widely believed to be a serious problem for neural networks. Here, we investigate the extent to which the catastrophic forgetting problem occurs for modern neural networks, comparing both established and recent gradient-based training algorithms and activation functions. We also examine the effect of the relationship between the first task and the second task on catastrophic forgetting. We find that it is always best to train using the dropout algorithm: the dropout algorithm is consistently best at adapting to the new task, remembering the old task, and has the best tradeoff curve between these two extremes. We find that different tasks and relationships between tasks result in very different rankings of activation function performance. This suggests the choice of activation function should always be cross-validated.

1617 citations · en · Computer Science, Mathematics

S2 Open Access 2009

Twitter Sentiment Classification using Distant Supervision

Alec Go

2781 citations · en

S2 Open Access 2009

Differentially Private Empirical Risk Minimization

Kamalika Chaudhuri, C. Monteleoni, A. Sarwate

Privacy-preserving machine learning algorithms are crucial for the increasingly common setting in which personal data, such as medical or financial records, are analyzed. We provide general techniques to produce privacy-preserving approximations of classifiers learned via (regularized) empirical risk minimization (ERM). These algorithms are private under the ε-differential privacy definition due to Dwork et al. (2006). First we apply the output perturbation ideas of Dwork et al. (2006), to ERM classification. Then we propose a new method, objective perturbation, for privacy-preserving machine learning algorithm design. This method entails perturbing the objective function before optimizing over classifiers. If the loss and regularizer satisfy certain convexity and differentiability criteria, we prove theoretical results showing that our algorithms preserve privacy, and provide generalization bounds for linear and nonlinear kernels. We further present a privacy-preserving technique for tuning the parameters in general machine learning algorithms, thereby providing end-to-end privacy guarantees for the training process. We apply these results to produce privacy-preserving analogues of regularized logistic regression and support vector machines. We obtain encouraging results from evaluating their performance on real demographic and benchmark data sets. Our results show that both theoretically and empirically, objective perturbation is superior to the previous state-of-the-art, output perturbation, in managing the inherent tradeoff between privacy and learning performance.

1579 citations · en · Medicine, Computer Science
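
The output-perturbation baseline mentioned in the abstract can be sketched for L2-regularized logistic regression. Our understanding of the paper's setting is as follows. Feature vectors have norm at most 1. The loss derivative is bounded by 1, and the logistic loss satisfies this. The regularizer is (lam/2)||w||^2. The non-private minimizer is released after adding noise b with density proportional to exp(-beta * ||b||), where beta = n * lam * eps / 2. In the code, that noise is drawn as a uniform random direction scaled by a Gamma(d, 1/beta) radius. The gradient-descent trainer, its step size and iteration count, and the example data are hypothetical. Treat this as a sketch, not an audited differential-privacy implementation.

```python
import numpy as np

def train_regularized_logreg(X, y, lam, lr=0.5, iters=2000):
    """Gradient descent on J(w) = (1/n) sum_i log(1 + exp(-y_i w.x_i)) + (lam/2)||w||^2,
    with labels y_i in {-1, +1}."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        margins = y * (X @ w)
        # d/dz log(1 + e^{-z}) = -1 / (1 + e^{z}); logaddexp keeps this stable.
        coef = -y * np.exp(-np.logaddexp(0.0, margins))
        w -= lr * (X.T @ coef / n + lam * w)
    return w

def output_perturbation(X, y, lam, eps, rng):
    """Return (private_w, nonprivate_w) using noise with density ~ exp(-beta * ||b||)."""
    n, d = X.shape
    if np.any(np.linalg.norm(X, axis=1) > 1.0 + 1e-12):
        raise ValueError("every feature vector must have norm <= 1")
    w = train_regularized_logreg(X, y, lam)
    beta = n * lam * eps / 2.0
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)         # uniform direction on the unit sphere
    radius = rng.gamma(shape=d, scale=1.0 / beta)  # noise norm ~ Gamma(d, scale 1/beta)
    return w + radius * direction, w

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
X /= np.maximum(1.0, np.linalg.norm(X, axis=1))[:, None]  # enforce ||x_i|| <= 1
y = np.where(X @ np.array([1.0, -1.0, 0.5]) > 0, 1.0, -1.0)
w_private, w_plain = output_perturbation(X, y, lam=0.1, eps=1.0, rng=rng)
```

Larger n, lam or eps increase beta and so shrink the expected noise norm d / beta. This matches the abstract's framing of a tradeoff between privacy and learning performance.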

S2 Open Access 2003

Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution

Lei Yu, Huan Liu

2761 citations · en · Computer Science

S2 Open Access 2003

Factors in automatic musical genre classification of audio signals

Tao Li, G. Tzanetakis

2288 citations · en · Computer Science

S2 Open Access 1998

Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval

D. Lewis

2469 citations · en · Computer Science

S2 Open Access 2006

Quantile Regression Forests

N. Meinshausen

1915 citations · en · Computer Science, Mathematics

S2 Open Access 1994

Irrelevant Features and the Subset Selection Problem

G. John, Ron Kohavi, Karl Pfleger

2867 citations · en · Computer Science, Mathematics

S2 Open Access 2004

kernlab - An S4 Package for Kernel Methods in R

Alexandros Karatzoglou, A. Smola, K. Hornik et al.

kernlab is an extensible package for kernel-based machine learning methods in R. It takes advantage of R's new S4 object model and provides a framework for creating and using kernel-based algorithms. The package contains dot product primitives (kernels), implementations of support vector machines and the relevance vector machine, Gaussian processes, a ranking algorithm, kernel PCA, kernel CCA, and a spectral clustering algorithm. Moreover it provides a general purpose quadratic programming solver, and an incomplete Cholesky decomposition method.

1983 citations · en · Computer Science
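
kernlab lists an incomplete Cholesky decomposition among its utilities. The Python sketch below illustrates the technique and is not a port of kernlab's R code. It performs a greedy pivoted incomplete Cholesky factorization K ≈ G G^T of a Gaussian (RBF) kernel matrix, stopping when the largest remaining diagonal residual drops below `tol`. The kernel uses the parameterization exp(-sigma * ||x - x'||^2). The function names, the tolerance and the example points are hypothetical.

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    """Gaussian kernel matrix K[i, j] = exp(-sigma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-sigma * np.maximum(d2, 0.0))  # clip tiny negative rounding errors

def incomplete_cholesky(K, tol=1e-8, max_rank=None):
    """Greedy pivoted incomplete Cholesky: returns (G, pivots) with K ~= G @ G.T."""
    n = K.shape[0]
    max_rank = n if max_rank is None else max_rank
    G = np.zeros((n, max_rank))
    residual = np.diag(K).astype(float)  # diagonal of K - G @ G.T (a copy)
    pivots = []
    for j in range(max_rank):
        i = int(np.argmax(residual))
        if residual[i] <= tol:           # remaining approximation error is small enough
            return G[:, :j], pivots
        pivots.append(i)
        # New column: residual of column i of K, scaled by the square root of the pivot.
        G[:, j] = (K[:, i] - G[:, :j] @ G[i, :j]) / np.sqrt(residual[i])
        residual = np.maximum(residual - G[:, j] ** 2, 0.0)  # clip rounding noise
        residual[pivots] = 0.0           # never select the same pivot twice
    return G, pivots

# 50 points but only 5 distinct locations, so the kernel matrix has rank 5.
base = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0]])
X = np.repeat(base, 10, axis=0)
K = rbf_kernel(X, sigma=0.5)
G, pivots = incomplete_cholesky(K)
```

Storing the n x r factor G instead of the full n x n matrix K is what makes low-rank kernel approximations attractive for large data sets. On real data the rank r is set by `tol` rather than known in advance.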

S2 Open Access 2000

Less is More: Active Learning with Support Vector Machines

Greg Schohn, David A. Cohn

983 citations · en · Computer Science

S2 Open Access 2007

Sparse Feature Learning for Deep Belief Networks

Marc'Aurelio Ranzato, Y-Lan Boureau, Yann LeCun

949 citations · en · Computer Science

S2 Open Access 2004

Learning and evaluating classifiers under sample selection bias

B. Zadrozny

913 citations · en · Computer Science
