Semantic Scholar Open Access 2018 1590 sitasi

A Convergence Theory for Deep Learning via Over-Parameterization

Zeyuan Allen-Zhu Yuanzhi Li Zhao Song

Lihat Sumber

Abstrak

Deep neural networks (DNNs) have demonstrated dominating performance in many fields; since AlexNet, networks used in practice are going wider and deeper. On the theoretical side, a long line of works has been focusing on training neural networks with one hidden layer. The theory of multi-layer networks remains largely unsettled. In this work, we prove why stochastic gradient descent (SGD) can find $\textit{global minima}$ on the training objective of DNNs in $\textit{polynomial time}$. We only make two assumptions: the inputs are non-degenerate and the network is over-parameterized. The latter means the network width is sufficiently large: $\textit{polynomial}$ in $L$, the number of layers and in $n$, the number of samples. Our key technique is to derive that, in a sufficiently large neighborhood of the random initialization, the optimization landscape is almost-convex and semi-smooth even with ReLU activations. This implies an equivalence between over-parameterized neural networks and neural tangent kernel (NTK) in the finite (and polynomial) width setting. As concrete examples, starting from randomly initialized weights, we prove that SGD can attain 100% training accuracy in classification tasks, or minimize regression loss in linear convergence speed, with running time polynomial in $n,L$. Our theory applies to the widely-used but non-smooth ReLU activation, and to any smooth and possibly non-convex loss functions. In terms of network architectures, our theory at least applies to fully-connected neural networks, convolutional neural networks (CNN), and residual neural networks (ResNet).

Topik & Kata Kunci

Computer Science Mathematics

Penulis (3)

Zeyuan Allen-Zhu

Yuanzhi Li

Zhao Song

Format Sitasi

APA MLA BibTeX

Allen-Zhu, Z., Li, Y., Song, Z. (2018). A Convergence Theory for Deep Learning via Over-Parameterization. https://www.semanticscholar.org/paper/42ec3db12a2e4628885451b13035c2e975220a25

Akses Cepat

PDF tidak tersedia langsung

Cek di sumber asli →

Lihat di Sumber

Informasi Jurnal

Tahun Terbit: 2018
Bahasa: en
Total Sitasi: 1590×
Sumber Database: Semantic Scholar
Akses: Open Access ✓