A Survey of the Usages of Deep Learning for Natural Language Processing
Dan Otter, Julian R. Medina, J. Kalita
Over the last several years, the field of natural language processing has been propelled forward by an explosion in the use of deep learning models. This article provides a brief introduction to the field and a quick overview of deep learning architectures and methods. It then sifts through the plethora of recent studies and summarizes a large assortment of relevant contributions. Analyzed research areas include several core linguistic processing issues in addition to many applications of computational linguistics. A discussion of the current state of the art is then provided along with recommendations for future research in the field.
1577 citations
en
Computer Science, Medicine
Time-series forecasting with deep learning: a survey
Bryan Lim, Stefan Zohren
Numerous deep learning architectures have been developed to accommodate the diversity of time-series datasets across different domains. In this article, we survey common encoder and decoder designs used in both one-step-ahead and multi-horizon time-series forecasting—describing how temporal information is incorporated into predictions by each model. Next, we highlight recent developments in hybrid deep learning models, which combine well-studied statistical models with neural network components to improve pure methods in either category. Lastly, we outline some ways in which deep learning can also facilitate decision support with time-series data. This article is part of the theme issue ‘Machine learning for weather and climate modelling’.
1751 citations
en
Medicine, Mathematics
Deep Learning-based Text Classification
Shervin Minaee, E. Cambria, Jianfeng Gao
Deep learning-based models have surpassed classical machine learning-based approaches in various text classification tasks, including sentiment analysis, news categorization, question answering, and natural language inference. In this article, we provide a comprehensive review of more than 150 deep learning-based models for text classification developed in recent years, and we discuss their technical contributions, similarities, and strengths. We also provide a summary of more than 40 popular datasets widely used for text classification. Finally, we provide a quantitative analysis of the performance of different deep learning models on popular benchmarks, and we discuss future research directions.
1271 citations
en
Computer Science, Mathematics
Deep Learning for Anomaly Detection: A Survey
Raghavendra Chalapathy, Sanjay Chawla
Anomaly detection is an important problem that has been well-studied within diverse research areas and application domains. The aim of this survey is two-fold. First, we present a structured and comprehensive overview of research methods in deep learning-based anomaly detection. Second, we review the adoption of these methods for anomaly detection across various application domains and assess their effectiveness. We have grouped state-of-the-art research techniques into different categories based on the underlying assumptions and approach adopted. Within each category, we outline the basic anomaly detection technique along with its variants, and present the key assumptions used to differentiate between normal and anomalous behavior. For each category, we also present the advantages and limitations and discuss the computational complexity of the techniques in real application domains. Finally, we outline open issues in research and challenges faced while adopting these techniques.
1752 citations
en
Computer Science, Mathematics
ResUNet-a: a deep learning framework for semantic segmentation of remotely sensed data
F. Diakogiannis, F. Waldner, P. Caccetta
et al.
Scene understanding of high resolution aerial images is of great importance for the task of automated monitoring in various remote sensing applications. Due to the large within-class and small between-class variance in pixel values of objects of interest, this remains a challenging task. In recent years, deep convolutional neural networks have started being used in remote sensing applications and demonstrate state-of-the-art performance for pixel-level classification of objects. Here we present a novel deep learning architecture, ResUNet-a, that combines ideas from various state-of-the-art modules used in computer vision for semantic segmentation tasks. We analyse the performance of several flavours of the Generalized Dice loss for semantic segmentation, and we introduce a novel variant loss function for semantic segmentation of objects that has better convergence properties and behaves well even under the presence of highly imbalanced classes. The performance of our modeling framework is evaluated on the ISPRS 2D Potsdam dataset. Results show state-of-the-art performance with an average F1 score of 92.9% over all classes for our best model.
1785 citations
en
Computer Science
Deep Learning for Image Super-Resolution: A Survey
Zhihao Wang, Jian Chen, S. Hoi
Image Super-Resolution (SR) is an important class of image processing techniques to enhance the resolution of images and videos in computer vision. Recent years have witnessed remarkable progress of image super-resolution using deep learning techniques. This article aims to provide a comprehensive survey on recent advances of image super-resolution using deep learning approaches. In general, we can roughly group the existing studies of SR techniques into three major categories: supervised SR, unsupervised SR, and domain-specific SR. In addition, we also cover some other important issues, such as publicly available benchmark datasets and performance evaluation metrics. Finally, we conclude this survey by highlighting several future directions and open issues which should be further addressed by the community.
1738 citations
en
Computer Science, Medicine
Learning to Reweight Examples for Robust Deep Learning
Mengye Ren, Wenyuan Zeng, Binh Yang
et al.
Deep neural networks have been shown to be very powerful modeling tools for many supervised learning tasks involving complex input patterns. However, they can also easily overfit to training set biases and label noise. In addition to various regularizers, example reweighting algorithms are popular solutions to these problems, but they require careful tuning of additional hyperparameters, such as example mining schedules and regularization hyperparameters. In contrast to past reweighting methods, which typically consist of functions of the cost value of each example, in this work we propose a novel meta-learning algorithm that learns to assign weights to training examples based on their gradient directions. To determine the example weights, our method performs a meta gradient descent step on the current mini-batch example weights (which are initialized from zero) to minimize the loss on a clean unbiased validation set. Our proposed method can be easily implemented on any type of deep network, does not require any additional hyperparameter tuning, and achieves impressive performance on class imbalance and corrupted label problems where only a small amount of clean validation data is available.
1618 citations
en
Computer Science, Mathematics
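The reweighting idea described in the abstract above can be illustrated with a minimal sketch. It assumes a linear model with squared loss so that per-example gradients can be written by hand, and it uses the form that a single meta gradient step from zero-initialized weights reduces to: each example's weight is proportional to the inner product between its training gradient and the clean validation gradient, clipped at zero and normalized. The function name `example_weights` and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

def example_weights(theta, X, y, X_val, y_val):
    """Weight training examples by gradient agreement with a clean validation set.

    Assumes the per-example loss 0.5 * (x @ theta - y) ** 2 (an illustrative choice).
    """
    # Per-example training gradients: (x @ theta - y) * x, shape (n, d).
    train_grads = (X @ theta - y)[:, None] * X
    # Mean gradient of the same loss on the clean validation set, shape (d,).
    val_grad = ((X_val @ theta - y_val)[:, None] * X_val).mean(axis=0)
    # A descent step on zero-initialized weights gives weights proportional
    # to the gradient inner product; negative values are clipped to zero.
    raw = np.maximum(0.0, train_grads @ val_grad)
    total = raw.sum()
    # Normalize so the weights sum to one (left as all zeros if nothing agrees).
    return raw / total if total > 0 else raw

# Toy data: the true relation is y = 2x; the third label is corrupted.
X = np.array([[1.0], [1.0], [1.0]])
y = np.array([2.0, 2.0, -2.0])
X_val = np.array([[1.0]])
y_val = np.array([2.0])
weights = example_weights(np.zeros(1), X, y, X_val, y_val)
```

In this toy case the corrupted example's gradient points against the validation gradient, so its weight is clipped to zero and the two clean examples share the weight equally.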
Spectral–Spatial Residual Network for Hyperspectral Image Classification: A 3-D Deep Learning Framework
Zilong Zhong, Jonathan Li, Zhiming Luo
et al.
1525 citations
en
Computer Science
A Convergence Theory for Deep Learning via Over-Parameterization
Zeyuan Allen-Zhu, Yuanzhi Li, Zhao Song
Deep neural networks (DNNs) have demonstrated dominating performance in many fields; since AlexNet, networks used in practice are going wider and deeper. On the theoretical side, a long line of works has been focusing on training neural networks with one hidden layer. The theory of multi-layer networks remains largely unsettled. In this work, we prove why stochastic gradient descent (SGD) can find global minima on the training objective of DNNs in polynomial time. We only make two assumptions: the inputs are non-degenerate and the network is over-parameterized. The latter means the network width is sufficiently large: polynomial in $L$, the number of layers, and in $n$, the number of samples. Our key technique is to derive that, in a sufficiently large neighborhood of the random initialization, the optimization landscape is almost-convex and semi-smooth even with ReLU activations. This implies an equivalence between over-parameterized neural networks and neural tangent kernel (NTK) in the finite (and polynomial) width setting. As concrete examples, starting from randomly initialized weights, we prove that SGD can attain 100% training accuracy in classification tasks, or minimize regression loss in linear convergence speed, with running time polynomial in $n, L$. Our theory applies to the widely-used but non-smooth ReLU activation, and to any smooth and possibly non-convex loss functions. In terms of network architectures, our theory at least applies to fully-connected neural networks, convolutional neural networks (CNN), and residual neural networks (ResNet).
1590 citations
en
Computer Science, Mathematics
Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey
Naveed Akhtar, Ajmal Saeed Mian
Deep learning is at the heart of the current rise of artificial intelligence. In the field of computer vision, it has become the workhorse for applications ranging from self-driving cars to surveillance and security. Whereas deep neural networks have demonstrated phenomenal success (often beyond human capabilities) in solving complex problems, recent studies show that they are vulnerable to adversarial attacks in the form of subtle perturbations to inputs that lead a model to predict incorrect outputs. For images, such perturbations are often too small to be perceptible, yet they completely fool the deep learning models. Adversarial attacks pose a serious threat to the success of deep learning in practice. This fact has recently led to a large influx of contributions in this direction. This paper presents the first comprehensive survey on adversarial attacks on deep learning in computer vision. We review the works that design adversarial attacks, analyze the existence of such attacks and propose defenses against them. To emphasize that adversarial attacks are possible in practical conditions, we separately review the contributions that evaluate adversarial attacks in real-world scenarios. Finally, drawing on the reviewed literature, we provide a broader outlook of this research direction.
2032 citations
en
Computer Science
Deep Learning‐Based Crack Damage Detection Using Convolutional Neural Networks
Y. Cha, Wooram Choi, O. Büyüköztürk
2832 citations
en
Computer Science, Engineering
SchNet - A deep learning architecture for molecules and materials.
Kristof T. Schütt, H. Sauceda, P. Kindermans
et al.
Deep learning has led to a paradigm shift in artificial intelligence, including web, text, and image search, speech recognition, as well as bioinformatics, with growing impact in chemical physics. Machine learning, in general, and deep learning, in particular, are ideally suitable for representing quantum-mechanical interactions, enabling us to model nonlinear potential-energy surfaces or enhancing the exploration of chemical compound space. Here we present the deep learning architecture SchNet that is specifically designed to model atomistic systems by making use of continuous-filter convolutional layers. We demonstrate the capabilities of SchNet by accurately predicting a range of properties across chemical space for molecules and materials, where our model learns chemically plausible embeddings of atom types across the periodic table. Finally, we employ SchNet to predict potential-energy surfaces and energy-conserving force fields for molecular dynamics simulations of small molecules and perform an exemplary study on the quantum-mechanical properties of C20-fullerene that would have been infeasible with regular ab initio molecular dynamics.
2011 citations
en
Physics, Computer Science
Deep Reinforcement Learning with Double Q-Learning
H. V. Hasselt, A. Guez, David Silver
The popular Q-learning algorithm is known to overestimate action values under certain conditions. It was not previously known whether, in practice, such overestimations are common, whether they harm performance, and whether they can generally be prevented. In this paper, we answer all these questions affirmatively. In particular, we first show that the recent DQN algorithm, which combines Q-learning with a deep neural network, suffers from substantial overestimations in some games in the Atari 2600 domain. We then show that the idea behind the Double Q-learning algorithm, which was introduced in a tabular setting, can be generalized to work with large-scale function approximation. We propose a specific adaptation to the DQN algorithm and show that the resulting algorithm not only reduces the observed overestimations, as hypothesized, but that this also leads to much better performance on several games.
8861 citations
en
Computer Science
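The difference between the standard Q-learning target and the Double Q-learning target discussed in the abstract above can be sketched as follows. This is an illustrative version only: it assumes the Q-values for the next state are already available as arrays from an online network and a target network, and the function names and toy numbers are hypothetical.

```python
import numpy as np

def q_learning_target(reward, gamma, q_target_next, done):
    # Standard target: the same values both select and evaluate the next
    # action, which is the source of the upward bias.
    return reward + gamma * (1.0 - done) * np.max(q_target_next, axis=1)

def double_q_target(reward, gamma, q_online_next, q_target_next, done):
    # Double Q-learning target: the online values select the action,
    # the target values evaluate it.
    best_action = np.argmax(q_online_next, axis=1)
    evaluated = q_target_next[np.arange(len(best_action)), best_action]
    return reward + gamma * (1.0 - done) * evaluated

# Toy batch of two transitions with two actions each (made-up values).
reward = np.array([1.0, 0.0])
done = np.array([0.0, 1.0])          # the second transition is terminal
q_online_next = np.array([[1.0, 2.0], [3.0, 0.0]])
q_target_next = np.array([[5.0, 1.0], [2.0, 4.0]])

standard = q_learning_target(reward, 0.9, q_target_next, done)
double = double_q_target(reward, 0.9, q_online_next, q_target_next, done)
```

Because the evaluated value can never exceed the maximum over actions, the double target is never larger than the standard one for the same target-network values.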
Learning Deep Features for Discriminative Localization
Bolei Zhou, A. Khosla, Àgata Lapedriza
et al.
In this work, we revisit the global average pooling layer proposed in [13], and shed light on how it explicitly enables the convolutional neural network (CNN) to have remarkable localization ability despite being trained on image-level labels. While this technique was previously proposed as a means for regularizing training, we find that it actually builds a generic localizable deep representation that exposes the implicit attention of CNNs on an image. Despite the apparent simplicity of global average pooling, we are able to achieve 37.1% top-5 error for object localization on ILSVRC 2014 without training on any bounding box annotation. We demonstrate in a variety of experiments that our network is able to localize the discriminative image regions despite just being trained for solving a classification task.
10424 citations
en
Computer Science
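A minimal sketch of why global average pooling can expose localization, as described in the abstract above. It assumes the last convolutional feature maps feed a global average pooling layer followed by a bias-free linear classifier; under that assumption, the class score equals the spatial mean of a per-class weighted sum of the feature maps, so that weighted sum shows how much each location contributes. The function names, shapes, and random inputs are hypothetical.

```python
import numpy as np

def class_activation_map(features, class_weights):
    # features: (K, H, W) feature maps; class_weights: (K,) classifier
    # weights for one class. Returns an (H, W) map of per-location evidence.
    return np.tensordot(class_weights, features, axes=1)

def class_score(features, class_weights):
    # Global average pooling followed by a linear layer without bias.
    pooled = features.mean(axis=(1, 2))
    return float(class_weights @ pooled)

rng = np.random.default_rng(0)
features = rng.random((8, 7, 7))          # stand-in for real CNN features
class_weights = rng.standard_normal(8)    # stand-in for learned weights

cam = class_activation_map(features, class_weights)
# Location with the strongest evidence for the class.
peak = np.unravel_index(np.argmax(cam), cam.shape)
```

In practice the map would be upsampled to the input image size before being thresholded into a region; that step is omitted here.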
Deep Unsupervised Learning using Nonequilibrium Thermodynamics
Jascha Narain Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan
et al.
A central problem in machine learning involves modeling complex data-sets using highly flexible families of probability distributions in which learning, sampling, inference, and evaluation are still analytically or computationally tractable. Here, we develop an approach that simultaneously achieves both flexibility and tractability. The essential idea, inspired by non-equilibrium statistical physics, is to systematically and slowly destroy structure in a data distribution through an iterative forward diffusion process. We then learn a reverse diffusion process that restores structure in data, yielding a highly flexible and tractable generative model of the data. This approach allows us to rapidly learn, sample from, and evaluate probabilities in deep generative models with thousands of layers or time steps, as well as to compute conditional and posterior probabilities under the learned model. We additionally release an open source reference implementation of the algorithm.
9531 citations
en
Computer Science, Mathematics
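The forward diffusion process mentioned in the abstract above can be sketched as a chain of small Gaussian perturbations. This example assumes a Gaussian diffusion kernel with a fixed per-step noise level `beta`; the schedule, the step count, and the toy data are illustrative choices, and the learned reverse process is not shown.

```python
import numpy as np

def forward_diffusion(x0, betas, rng):
    """Gradually destroy structure in x0 by repeatedly shrinking it and adding noise."""
    x = np.asarray(x0, dtype=float)
    trajectory = [x]
    for beta in betas:
        noise = rng.standard_normal(x.shape)
        # Scaling by sqrt(1 - beta) and adding variance-beta noise drives the
        # samples toward a standard normal distribution as steps accumulate.
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise
        trajectory.append(x)
    return trajectory

rng = np.random.default_rng(0)
# Structured toy data: samples concentrated between 2 and 3.
x0 = rng.uniform(2.0, 3.0, size=10_000)
trajectory = forward_diffusion(x0, betas=np.full(200, 0.05), rng=rng)
x_final = trajectory[-1]
```

After enough steps the samples are close to zero mean and unit variance regardless of the starting distribution, which is the structure-free end point that a reverse process would start from.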
Human-level control through deep reinforcement learning
Volodymyr Mnih, K. Kavukcuoglu, David Silver
et al.
31070 citations
en
Medicine, Computer Science
Learning a Deep Convolutional Network for Image Super-Resolution
Chao Dong, Chen Change Loy, Kaiming He
et al.
5600 citations
en
Computer Science
Plant leaf disease classification using EfficientNet deep learning model
Ümit Atila, Murat Uçar, K. Akyol
et al.
Most plant diseases show visible symptoms, and the technique accepted today is that an experienced plant pathologist diagnoses the disease through optical observation of infected plant leaves. Because manual diagnosis is slow and its success is proportional to the pathologist's capabilities, this problem is an excellent application area for computer-aided diagnostic systems. Instead of classical machine learning methods, in which manual feature extraction should be flawless to achieve successful results, there is a need for a model that does not need pre-processing and can perform a successful classification. In this study, the EfficientNet deep learning architecture was proposed for plant leaf disease classification, and the performance of this model was compared with other state-of-the-art deep learning models. The PlantVillage dataset was used to train the models. All the models were trained with original and augmented datasets having 55,448 and 61,486 images, respectively. The EfficientNet architecture and the other deep learning models were trained using a transfer learning approach, in which all layers of the models were set to be trainable. The results obtained on the test dataset showed that the B5 and B4 models of the EfficientNet architecture achieved the highest values compared to the other deep learning models on the original and augmented datasets, with 99.91% and 99.97% respectively for accuracy and 98.42% and 99.39% respectively for precision.
747 citations
en
Computer Science
Deep Learning for Time Series Anomaly Detection: A Survey
Zahra Zamanzadeh Darban, G. I. Webb, Shirui Pan
et al.
Time series anomaly detection is important for a wide range of research fields and applications, including financial markets, economics, earth sciences, manufacturing, and healthcare. The presence of anomalies can indicate novel or unexpected events, such as production faults, system defects, and heart palpitations, and is therefore of particular interest. The large size and complexity of patterns in time series data have led researchers to develop specialised deep learning models for detecting anomalous patterns. This survey provides a structured and comprehensive overview of state-of-the-art deep learning for time series anomaly detection. It provides a taxonomy based on anomaly detection strategies and deep learning models. Aside from describing the basic anomaly detection techniques in each category, their advantages and limitations are also discussed. Furthermore, this study includes examples of deep anomaly detection in time series across various application domains in recent years. Finally, it summarises open issues in research and challenges faced while adopting deep anomaly detection models to time series data.
484 citations
en
Computer Science
AutoKeras: An AutoML Library for Deep Learning
Haifeng Jin, François Chollet, Qingquan Song
et al.
141 citations
en
Computer Science