Explaining neural scaling laws
Abstract
Significance: The population loss of trained deep neural networks has been empirically observed to improve as a power law in a variety of large models and datasets. We investigate the origins behind such "scaling laws" and provide a taxonomy for different scaling regimes. Our findings are based on derivations in linear random feature models, which, in addition to being a simple yet fruitful model, also describe the wide-network limit of deep neural networks. We further formulate and verify aspects of scaling based on smoothness in interpolating a data manifold. We support our theory with empirical results in realistic settings. Our work provides insights into scaling laws and bridges the large gap between theory and experiment in modern deep learning.
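To make the power-law scaling form mentioned in the abstract concrete, the sketch below fits a relation of the form L(D) ≈ a · D^(−α) (loss versus dataset size) by linear regression in log-log space. The dataset sizes, noise level, and exponent values are synthetic placeholders for illustration only, not results or code from the paper.

```python
import numpy as np

# Hypothetical demo: a power law L(D) = a * D**(-alpha) is linear in log-log
# space, log L = log a - alpha * log D, so a least-squares line fit to
# (log D, log L) recovers the scaling exponent alpha.

rng = np.random.default_rng(0)
dataset_sizes = np.logspace(3, 7, 20)            # assumed range: 1e3 .. 1e7 examples
true_alpha, true_a = 0.35, 12.0                  # assumed exponent and prefactor
losses = true_a * dataset_sizes ** (-true_alpha)
losses *= np.exp(0.02 * rng.standard_normal(dataset_sizes.size))  # small multiplicative noise

slope, intercept = np.polyfit(np.log(dataset_sizes), np.log(losses), deg=1)
print(f"fitted alpha = {-slope:.3f}, fitted prefactor a = {np.exp(intercept):.3f}")
```

Fitting in log-log space is a common way such exponents are estimated in scaling-law studies; the specific values printed here come from the synthetic data above.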
Authors (5)
Yasaman Bahri
Ethan Dyer
J. Kaplan
Jaehoon Lee
Utkarsh Sharma
Quick Access
- Publication Year: 2021
- Language: en
- Total Citations: 424
- Database Source: Semantic Scholar
- DOI: 10.1073/pnas.2311878121
- Access: Open Access ✓