Semantic Scholar, Open Access, 2016, 2894 citations

CNN architectures for large-scale audio classification

Shawn Hershey, Sourish Chaudhuri, D. Ellis, J. Gemmeke, A. Jansen, and 8 others


Abstract

Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio. We use various CNN architectures to classify the soundtracks of a dataset of 70M training videos (5.24 million hours) with 30,871 video-level labels. We examine fully connected Deep Neural Networks (DNNs), AlexNet [1], VGG [2], Inception [3], and ResNet [4]. We investigate varying the size of both training set and label vocabulary, finding that analogs of the CNNs used in image classification do well on our audio classification task, and larger training and label sets help up to a point. A model using embeddings from these classifiers does much better than raw features on the Audio Set [5] Acoustic Event Detection (AED) classification task.

Topics & Keywords

Computer Science, Mathematics

Authors (13)

Shawn Hershey

Sourish Chaudhuri

D. Ellis

J. Gemmeke

A. Jansen

R. C. Moore

M. Plakal

D. Platt

R. Saurous

Bryan Seybold

M. Slaney

Ron J. Weiss

K. Wilson

Citation Format

APA

Hershey, S., Chaudhuri, S., Ellis, D., Gemmeke, J., Jansen, A., Moore, R. C., et al. (2016). CNN architectures for large-scale audio classification. https://doi.org/10.1109/ICASSP.2017.7952132


Journal Information

Publication Year: 2016
Language: en
Total Citations: 2894
Source Database: Semantic Scholar
DOI: 10.1109/ICASSP.2017.7952132
Access: Open Access ✓