Hasil untuk "deep learning"

Menampilkan 20 dari ~11031378 hasil · dari Semantic Scholar, DOAJ, CrossRef

JSON API
S2 Open Access 2020
Deep learning in environmental remote sensing: Achievements and challenges

Qiangqiang Yuan, Huanfeng Shen, Tongwen Li et al.

Abstract Various forms of machine learning (ML) methods have historically played a valuable role in environmental remote sensing research. With an increasing amount of “big data” from earth observation and rapid advances in ML, increasing opportunities for novel methods have emerged to aid in earth environmental monitoring. Over the last decade, a typical and state-of-the-art ML framework named deep learning (DL), which is developed from the traditional neural network (NN), has outperformed traditional models with considerable improvement in performance. Substantial progress in developing a DL methodology for a variety of earth science applications has been observed. Therefore, this review will concentrate on the use of the traditional NN and DL methods to advance the environmental remote sensing process. First, the potential of DL in environmental remote sensing, including land cover mapping, environmental parameter retrieval, data fusion and downscaling, and information reconstruction and prediction, will be analyzed. A typical network structure will then be introduced. Afterward, the applications of DL environmental monitoring in the atmosphere, vegetation, hydrology, air and land surface temperature, evapotranspiration, solar radiation, and ocean color are specifically reviewed. Finally, challenges and future perspectives will be comprehensively analyzed and discussed.

1413 sitasi en Computer Science
S2 Open Access 2020
Deep Learning for Anomaly Detection

Guansong Pang, Chunhua Shen, Longbing Cao et al.

Anomaly detection, a.k.a. outlier detection or novelty detection, has been a lasting yet active research area in various research communities for several decades. There are still some unique problem complexities and challenges that require advanced approaches. In recent years, deep learning enabled anomaly detection, i.e., deep anomaly detection, has emerged as a critical direction. This article surveys the research of deep anomaly detection with a comprehensive taxonomy, covering advancements in 3 high-level categories and 11 fine-grained categories of the methods. We review their key intuitions, objective functions, underlying assumptions, advantages, and disadvantages and discuss how they address the aforementioned challenges. We further discuss a set of possible future opportunities and new perspectives on addressing the challenges.

1260 sitasi en Computer Science, Mathematics
S2 Open Access 2019
Deep Learning Approach for Intelligent Intrusion Detection System

R. Vinayakumar, M. Alazab, I. K. P. S. Senior Member et al.

Machine learning techniques are being widely used to develop an intrusion detection system (IDS) for detecting and classifying cyberattacks at the network-level and the host-level in a timely and automatic manner. However, many challenges arise since malicious attacks are continually changing and are occurring in very large volumes requiring a scalable solution. There are different malware datasets available publicly for further research by cyber security community. However, no existing study has shown the detailed analysis of the performance of various machine learning algorithms on various publicly available datasets. Due to the dynamic nature of malware with continuously changing attacking methods, the malware datasets available publicly are to be updated systematically and benchmarked. In this paper, a deep neural network (DNN), a type of deep learning model, is explored to develop a flexible and effective IDS to detect and classify unforeseen and unpredictable cyberattacks. The continuous change in network behavior and rapid evolution of attacks makes it necessary to evaluate various datasets which are generated over the years through static and dynamic approaches. This type of study facilitates to identify the best algorithm which can effectively work in detecting future cyberattacks. A comprehensive evaluation of experiments of DNNs and other classical machine learning classifiers are shown on various publicly available benchmark malware datasets. The optimal network parameters and network topologies for DNNs are chosen through the following hyperparameter selection methods with KDDCup 99 dataset. All the experiments of DNNs are run till 1,000 epochs with the learning rate varying in the range [0.01–0.5]. The DNN model which performed well on KDDCup 99 is applied on other datasets, such as NSL-KDD, UNSW-NB15, Kyoto, WSN-DS, and CICIDS 2017, to conduct the benchmark. Our DNN model learns the abstract and high-dimensional feature representation of the IDS data by passing them into many hidden layers. Through a rigorous experimental testing, it is confirmed that DNNs perform well in comparison with the classical machine learning classifiers. Finally, we propose a highly scalable and hybrid DNNs framework called scale-hybrid-IDS-AlertNet which can be used in real-time to effectively monitor the network traffic and host-level events to proactively alert possible cyberattacks.

1401 sitasi en Computer Science
S2 Open Access 2019
Deep learning for electroencephalogram (EEG) classification tasks: a review

Alexander Craik, Yongtian He, J. Contreras-Vidal

Objective. Electroencephalography (EEG) analysis has been an important tool in neuroscience with applications in neuroscience, neural engineering (e.g. Brain–computer interfaces, BCI’s), and even commercial applications. Many of the analytical tools used in EEG studies have used machine learning to uncover relevant information for neural classification and neuroimaging. Recently, the availability of large EEG data sets and advances in machine learning have both led to the deployment of deep learning architectures, especially in the analysis of EEG signals and in understanding the information it may contain for brain functionality. The robust automatic classification of these signals is an important step towards making the use of EEG more practical in many applications and less reliant on trained professionals. Towards this goal, a systematic review of the literature on deep learning applications to EEG classification was performed to address the following critical questions: (1) Which EEG classification tasks have been explored with deep learning? (2) What input formulations have been used for training the deep networks? (3) Are there specific deep learning network structures suitable for specific types of tasks? Approach. A systematic literature review of EEG classification using deep learning was performed on Web of Science and PubMed databases, resulting in 90 identified studies. Those studies were analyzed based on type of task, EEG preprocessing methods, input type, and deep learning architecture. Main results. For EEG classification tasks, convolutional neural networks, recurrent neural networks, deep belief networks outperform stacked auto-encoders and multi-layer perceptron neural networks in classification accuracy. The tasks that used deep learning fell into five general groups: emotion recognition, motor imagery, mental workload, seizure detection, event related potential detection, and sleep scoring. For each type of task, we describe the specific input formulation, major characteristics, and end classifier recommendations found through this review. Significance. This review summarizes the current practices and performance outcomes in the use of deep learning for EEG classification. Practical suggestions on the selection of many hyperparameters are provided in the hope that they will promote or guide the deployment of deep learning to EEG datasets in future research.

1450 sitasi en Medicine, Physics
S2 Open Access 2019
Review of Deep Learning Algorithms and Architectures

A. Shrestha, A. Mahmood

Deep learning (DL) is playing an increasingly important role in our lives. It has already made a huge impact in areas, such as cancer diagnosis, precision medicine, self-driving cars, predictive forecasting, and speech recognition. The painstakingly handcrafted feature extractors used in traditional learning, classification, and pattern recognition systems are not scalable for large-sized data sets. In many cases, depending on the problem complexity, DL can also overcome the limitations of earlier shallow networks that prevented efficient training and abstractions of hierarchical representations of multi-dimensional training data. Deep neural network (DNN) uses multiple (deep) layers of units with highly optimized algorithms and architectures. This paper reviews several optimization methods to improve the accuracy of the training and to reduce training time. We delve into the math behind training algorithms used in recent deep networks. We describe current shortcomings, enhancements, and implementations. The review also covers different types of deep architectures, such as deep convolution networks, deep residual networks, recurrent neural networks, reinforcement learning, variational autoencoders, and others.

1479 sitasi en Computer Science
S2 Open Access 2019
Deep Learning for Hyperspectral Image Classification: An Overview

Shutao Li, Weiwei Song, Leyuan Fang et al.

Hyperspectral image (HSI) classification has become a hot topic in the field of remote sensing. In general, the complex characteristics of hyperspectral data make the accurate classification of such data challenging for traditional machine learning methods. In addition, hyperspectral imaging often deals with an inherently nonlinear relation between the captured spectral information and the corresponding materials. In recent years, deep learning has been recognized as a powerful feature-extraction tool to effectively address nonlinear problems and widely used in a number of image processing tasks. Motivated by those successful applications, deep learning has also been introduced to classify HSIs and demonstrated good performance. This survey paper presents a systematic review of deep learning-based HSI classification literatures and compares several strategies for this topic. Specifically, we first summarize the main challenges of HSI classification which cannot be effectively overcome by traditional machine learning methods, and also introduce the advantages of deep learning to handle these problems. Then, we build a framework that divides the corresponding works into spectral-feature networks, spatial-feature networks, and spectral–spatial-feature networks to systematically review the recent achievements in deep learning-based HSI classification. In addition, considering the fact that available training samples in the remote sensing field are usually very limited and training deep networks require a large number of samples, we include some strategies to improve classification performance, which can provide some guidelines for future studies on this topic. Finally, several representative deep learning-based classification methods are conducted on real HSIs in our experiments.

1523 sitasi en Computer Science, Engineering
S2 Open Access 2020
Temporal Graph Networks for Deep Learning on Dynamic Graphs

Emanuele Rossi, B. Chamberlain, Fabrizio Frasca et al.

Graph Neural Networks (GNNs) have recently become increasingly popular due to their ability to learn complex systems of relations or interactions arising in a broad spectrum of problems ranging from biology and particle physics to social networks and recommendation systems. Despite the plethora of different models for deep learning on graphs, few approaches have been proposed thus far for dealing with graphs that present some sort of dynamic nature (e.g. evolving features or connectivity over time). In this paper, we present Temporal Graph Networks (TGNs), a generic, efficient framework for deep learning on dynamic graphs represented as sequences of timed events. Thanks to a novel combination of memory modules and graph-based operators, TGNs are able to significantly outperform previous approaches being at the same time more computationally efficient. We furthermore show that several previous models for learning on dynamic graphs can be cast as specific instances of our framework. We perform a detailed ablation study of different components of our framework and devise the best configuration that achieves state-of-the-art performance on several transductive and inductive prediction tasks for dynamic graphs.

944 sitasi en Computer Science, Mathematics
S2 Open Access 2018
Comprehensive Privacy Analysis of Deep Learning: Passive and Active White-box Inference Attacks against Centralized and Federated Learning

Milad Nasr, R. Shokri, Amir Houmansadr

Deep neural networks are susceptible to various inference attacks as they remember information about their training data. We design white-box inference attacks to perform a comprehensive privacy analysis of deep learning models. We measure the privacy leakage through parameters of fully trained models as well as the parameter updates of models during training. We design inference algorithms for both centralized and federated learning, with respect to passive and active inference attackers, and assuming different adversary prior knowledge. We evaluate our novel white-box membership inference attacks against deep learning algorithms to trace their training data records. We show that a straightforward extension of the known black-box attacks to the white-box setting (through analyzing the outputs of activation functions) is ineffective. We therefore design new algorithms tailored to the white-box setting by exploiting the privacy vulnerabilities of the stochastic gradient descent algorithm, which is the algorithm used to train deep neural networks. We investigate the reasons why deep learning models may leak information about their training data. We then show that even well-generalized models are significantly susceptible to white-box membership inference attacks, by analyzing state-of-the-art pre-trained and publicly available models for the CIFAR dataset. We also show how adversarial participants, in the federated learning setting, can successfully run active membership inference attacks against other participants, even when the global model achieves high prediction accuracies.

1748 sitasi en Computer Science, Mathematics
S2 Open Access 2018
An End-to-End Deep Learning Architecture for Graph Classification

Muhan Zhang, Zhicheng Cui, Marion Neumann et al.

Neural networks are typically designed to deal with data in tensor forms. In this paper, we propose a novel neural network architecture accepting graphs of arbitrary structure. Given a dataset containing graphs in the form of (G,y) where G is a graph and y is its class, we aim to develop neural networks that read the graphs directly and learn a classification function. There are two main challenges: 1) how to extract useful features characterizing the rich information encoded in a graph for classification purpose, and 2) how to sequentially read a graph in a meaningful and consistent order. To address the first challenge, we design a localized graph convolution model and show its connection with two graph kernels. To address the second challenge, we design a novel SortPooling layer which sorts graph vertices in a consistent order so that traditional neural networks can be trained on the graphs. Experiments on benchmark graph classification datasets demonstrate that the proposed architecture achieves highly competitive performance with state-of-the-art graph kernels and other graph neural network methods. Moreover, the architecture allows end-to-end gradient-based training with original graphs, without the need to first transform graphs into vectors.

1692 sitasi en Computer Science
S2 Open Access 2018
Robust Physical-World Attacks on Deep Learning Visual Classification

Kevin Eykholt, I. Evtimov, Earlence Fernandes et al.

Recent studies show that the state-of-the-art deep neural networks (DNNs) are vulnerable to adversarial examples, resulting from small-magnitude perturbations added to the input. Given that that emerging physical systems are using DNNs in safety-critical situations, adversarial examples could mislead these systems and cause dangerous situations. Therefore, understanding adversarial examples in the physical world is an important step towards developing resilient learning algorithms. We propose a general attack algorithm, Robust Physical Perturbations (RP2), to generate robust visual adversarial perturbations under different physical conditions. Using the real-world case of road sign classification, we show that adversarial examples generated using RP2 achieve high targeted misclassification rates against standard-architecture road sign classifiers in the physical world under various environmental conditions, including viewpoints. Due to the current lack of a standardized testing method, we propose a two-stage evaluation methodology for robust physical adversarial examples consisting of lab and field tests. Using this methodology, we evaluate the efficacy of physical adversarial manipulations on real objects. With a perturbation in the form of only black and white stickers, we attack a real stop sign, causing targeted misclassification in 100% of the images obtained in lab settings, and in 84.8% of the captured video frames obtained on a moving vehicle (field test) for the target classifier.

2109 sitasi en Computer Science
S2 Open Access 2018
Horovod: fast and easy distributed deep learning in TensorFlow

Alexander Sergeev, Mike Del Balso

Training modern deep learning models requires large amounts of computation, often provided by GPUs. Scaling computation from one GPU to many can enable much faster training and research progress but entails two complications. First, the training library must support inter-GPU communication. Depending on the particular methods employed, this communication may entail anywhere from negligible to significant overhead. Second, the user must modify his or her training code to take advantage of inter-GPU communication. Depending on the training library's API, the modification required may be either significant or minimal. Existing methods for enabling multi-GPU training under the TensorFlow library entail non-negligible communication overhead and require users to heavily modify their model-building code, leading many researchers to avoid the whole mess and stick with slower single-GPU training. In this paper we introduce Horovod, an open source library that improves on both obstructions to scaling: it employs efficient inter-GPU communication via ring reduction and requires only a few lines of modification to user code, enabling faster, easier distributed training in TensorFlow. Horovod is available under the Apache 2.0 license at this https URL

1360 sitasi en Computer Science, Mathematics
S2 Open Access 2015
Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)

Djork-Arné Clevert, Thomas Unterthiner, Sepp Hochreiter

We introduce the "exponential linear unit" (ELU) which speeds up learning in deep neural networks and leads to higher classification accuracies. Like rectified linear units (ReLUs), leaky ReLUs (LReLUs) and parametrized ReLUs (PReLUs), ELUs alleviate the vanishing gradient problem via the identity for positive values. However, ELUs have improved learning characteristics compared to the units with other activation functions. In contrast to ReLUs, ELUs have negative values which allows them to push mean unit activations closer to zero like batch normalization but with lower computational complexity. Mean shifts toward zero speed up learning by bringing the normal gradient closer to the unit natural gradient because of a reduced bias shift effect. While LReLUs and PReLUs have negative values, too, they do not ensure a noise-robust deactivation state. ELUs saturate to a negative value with smaller inputs and thereby decrease the forward propagated variation and information. Therefore, ELUs code the degree of presence of particular phenomena in the input, while they do not quantitatively model the degree of their absence. In experiments, ELUs lead not only to faster learning, but also to significantly better generalization performance than ReLUs and LReLUs on networks with more than 5 layers. On CIFAR-100 ELUs networks significantly outperform ReLU networks with batch normalization while batch normalization does not improve ELU networks. ELU networks are among the top 10 reported CIFAR-10 results and yield the best published result on CIFAR-100, without resorting to multi-view evaluation or model averaging. On ImageNet, ELU networks considerably speed up learning compared to a ReLU network with the same architecture, obtaining less than 10% classification error for a single crop, single model network.

6007 sitasi en Computer Science, Mathematics
S2 Open Access 2015
Privacy-preserving deep learning

R. Shokri, Vitaly Shmatikov

Deep learning based on artificial neural networks is a very popular approach to modeling, classifying, and recognizing complex data such as images, speech, and text. The unprecedented accuracy of deep learning methods has turned them into the foundation of new AI-based services on the Internet. Commercial companies that collect user data on a large scale have been the main beneficiaries of this trend since the success of deep learning techniques is directly proportional to the amount of data available for training. Massive data collection required for deep learning presents obvious privacy issues. Users' personal, highly sensitive data such as photos and voice recordings is kept indefinitely by the companies that collect it. Users can neither delete it, nor restrict the purposes for which it is used. Furthermore, centrally kept data is subject to legal subpoenas and extrajudicial surveillance. Many data owners-for example, medical institutions that may want to apply deep learning methods to clinical records-are prevented by privacy and confidentiality concerns from sharing the data and thus benefitting from large-scale deep learning. In this paper, we present a practical system that enables multiple parties to jointly learn an accurate neural-network model for a given objective without sharing their input datasets. We exploit the fact that the optimization algorithms used in modern deep learning, namely, those based on stochastic gradient descent, can be parallelized and executed asynchronously. Our system lets participants train independently on their own datasets and selectively share small subsets of their models' key parameters during training. This offers an attractive point in the utility/privacy tradeoff space: participants preserve the privacy of their respective data while still benefitting from other participants' models and thus boosting their learning accuracy beyond what is achievable solely on their own inputs. We demonstrate the accuracy of our privacy-preserving deep learning on benchmark datasets.

2622 sitasi en Computer Science
S2 Open Access 2011
Multimodal Deep Learning

Jiquan Ngiam, A. Khosla, Mingyu Kim et al.

Deep networks have been successfully applied to unsupervised feature learning for single modalities (e.g., text, images or audio). In this work, we propose a novel application of deep networks to learn features over multiple modalities. We present a series of tasks for multimodal learning and show how to train deep networks that learn features to address these tasks. In particular, we demonstrate cross modality feature learning, where better features for one modality (e.g., video) can be learned if multiple modalities (e.g., audio and video) are present at feature learning time. Furthermore, we show how to learn a shared representation between modalities and evaluate it on a unique task, where the classifier is trained with audio-only data but tested with video-only data and vice-versa. Our models are validated on the CUAVE and AVLetters datasets on audio-visual speech classification, demonstrating best published visual speech classification on AVLetters and effective shared representation learning.

3454 sitasi en Computer Science, Mathematics
S2 Open Access 2022
Deep learning in optical metrology: a review

Chao Zuo, Jiaming Qian, Shijie Feng et al.

With the advances in scientific foundations and technological implementations, optical metrology has become versatile problem-solving backbones in manufacturing, fundamental research, and engineering applications, such as quality control, nondestructive testing, experimental mechanics, and biomedicine. In recent years, deep learning, a subfield of machine learning, is emerging as a powerful tool to address problems by learning from data, largely driven by the availability of massive datasets, enhanced computational power, fast data storage, and novel training algorithms for the deep neural network. It is currently promoting increased interests and gaining extensive attention for its utilization in the field of optical metrology. Unlike the traditional “physics-based” approach, deep-learning-enabled optical metrology is a kind of “data-driven” approach, which has already provided numerous alternative solutions to many challenging problems in this field with better performances. In this review, we present an overview of the current status and the latest progress of deep-learning technologies in the field of optical metrology. We first briefly introduce both traditional image-processing algorithms in optical metrology and the basic concepts of deep learning, followed by a comprehensive review of its applications in various optical metrology tasks, such as fringe denoising, phase retrieval, phase unwrapping, subset correlation, and error compensation. The open challenges faced by the current deep-learning approach in optical metrology are then discussed. Finally, the directions for future research are outlined.

567 sitasi en Medicine
S2 Open Access 2022
Deep learning, reinforcement learning, and world models

Yu Matsuo, Yann LeCun, M. Sahani et al.

Deep learning (DL) and reinforcement learning (RL) methods seem to be a part of indispensable factors to achieve human-level or super-human AI systems. On the other hand, both DL and RL have strong connections with our brain functions and with neuroscientific findings. In this review, we summarize talks and discussions in the "Deep Learning and Reinforcement Learning" session of the symposium, International Symposium on Artificial Intelligence and Brain Science. In this session, we discussed whether we can achieve comprehensive understanding of human intelligence based on the recent advances of deep learning and reinforcement learning algorithms. Speakers contributed to provide talks about their recent studies that can be key technologies to achieve human-level intelligence.

450 sitasi en Computer Science, Medicine
S2 Open Access 2022
Deep-learning seismology

S. Mousavi, G. Beroza

Seismic waves from earthquakes and other sources are used to infer the structure and properties of Earth’s interior. The availability of large-scale seismic datasets and the suitability of deep-learning techniques for seismic data processing have pushed deep learning to the forefront of fundamental, long-standing research investigations in seismology. However, some aspects of applying deep learning to seismology are likely to prove instructive for the geosciences, and perhaps other research areas more broadly. Deep learning is a powerful approach, but there are subtleties and nuances in its application. We present a systematic overview of trends, challenges, and opportunities in applications of deep-learning methods in seismology. Description Large-scale learning The large amount and availability of datasets in seismology creates a great opportunity to apply machine learning and artificial intelligence to data processing. Mousavi and Beroza provide a comprehensive review of the deep-learning techniques being applied to seismic datasets, covering approaches, limitations, and opportunities. The trends in data processing and analysis can be instructive for geoscience and other research areas more broadly. —BG The ways in which deep learning can help process and analyze large seismological datasets are reviewed. BACKGROUND Seismology is the study of seismic waves to understand their origin—most obviously, sudden fault slip in earthquakes, but also explosions, volcanic eruptions, glaciers, landslides, ocean waves, vehicular traffic, aircraft, trains, wind, air guns, and thunderstorms, for example. Seismology uses those same waves to infer the structure and properties of planetary interiors. Because sources can generate waves at any time, seismic ground motion is recorded continuously, at typical sampling rates of 100 points per second, for three components of motion, and on arrays that can include thousands of sensors. Although seismology is clearly a data-rich science, it often is a data-driven science as well, with new phenomena and unexpected behavior discovered with regularity. And for at least some tasks, the careful and painstaking work of seismic analysts over decades and around the world has also made seismology a data label–rich science. This facet makes it fertile ground for deep learning, which has entered almost every subfield of seismology and outperforms classical approaches, often dramatically, for many seismological tasks. ADVANCES Seismic wave identification and onset-time, first-break determination for seismic P and S waves within continuous seismic data are foundational to seismology and are particularly well suited to deep learning because of the availability of massive, labeled datasets. It has received particularly close attention, and that has led, for example, to the development of deep learning–based earthquake catalogs that can feature more than an order of magnitude more events than are present in conventional catalogs. Deep learning has shown the ability to outperform classical approaches for other important seismological tasks as well, including the discrimination of earthquakes from explosions and other sources, separation of seismic signals from background noise, seismic image processing and interpretation, and Earth model inversion. OUTLOOK The development of increasingly cost-effective sensors and emerging ground-motion sensing technologies, such as fiber optic cable and accelerometers in smart devices, portend a continuing acceleration of seismological data volumes, so that deep learning is likely to become essential to seismology’s future. Deep learning’s nonlinear mapping ability, sequential data modeling, automatic feature extraction, dimensionality reduction, and reparameterization are all advantageous for processing high-dimensional seismic data, particularly because those data are noisy and, from the point of view of mathematical inference, incomplete. Deep learning for scientific discovery and direct extraction of insight into seismological processes is clearly just getting started. Aspects of seismology pose interesting additional challenges for deep learning. Many of the most important problems in earthquake seismology—such as earthquake forecasting, ground motion prediction, and rapid earthquake alerting—concern large and damaging earthquakes that are (fortunately) rare. That rarity poses a fundamental challenge for the data-hungry methods of deep learning: How can we train reliable models, and how do we validate them well enough to rely on them when data are scarce and opportunities to test models are infrequent? Further, how can we operationalize deep-learning techniques in such a situation, when the mechanisms by which they make predictions from data may not be easily explained, and the consequences of incorrect models are high? Incorporating domain knowledge through physics-based and explainable deep learning and setting up standard benchmarking and evaluation protocols will help ensure progress, as is the nascent emergence of a seismological data science ecosystem. More generally, a combination of data science literacy for geoscientists as well as recruiting data science expertise will help to ensure that deep-learning seismology reaches its full potential. Deep-learning processing of seismic data and incorporation of domain knowledge can lead to new capabilities and new insights across seismology. A. MASTIN/SCIENCE, TOP RIGHT: CARA HARWOOD/UNIVERSITY OF CALIFORNIA-DAVIS/CC BY-NC-SA 3.0 MIDDLE RIGHT: JOHAN SWANEPOEL/SCIENCE SOURCE

400 sitasi en Medicine
S2 Open Access 2021
Deep learning: a statistical viewpoint

P. Bartlett, A. Montanari, A. Rakhlin

The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting, that is, accurate predictions despite overfitting training data. In this article, we survey recent progress in statistical learning theory that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behaviour of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favourable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.

322 sitasi en Computer Science, Mathematics

Halaman 7 dari 551569