Semantic Scholar · Open Access · 2018 · 375 citations

Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks

Charles Eckert, Xiaowei Wang, Jingcheng Wang, Arun K. Subramaniyan, R. Iyer, +3 more

Abstract

This paper presents the Neural Cache architecture, which re-purposes cache structures to transform them into massively parallel compute units capable of running inferences for Deep Neural Networks. Techniques to perform in-situ arithmetic in SRAM arrays, create efficient data mappings, and reduce data movement are proposed. The Neural Cache architecture is capable of fully executing convolutional, fully connected, and pooling layers in-cache. The proposed architecture also supports quantization in-cache. Our experimental results show that the proposed architecture can improve inference latency by 8.3× over a state-of-the-art multi-core CPU (Xeon E5) and 7.7× over a server-class GPU (Titan Xp) for the Inception v3 model. Neural Cache improves inference throughput by 12.4× over the CPU (2.2× over the GPU), while reducing power consumption by 50% over the CPU (53% over the GPU).
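The core idea the abstract refers to, bit-serial in-situ arithmetic, stores operands transposed (bit-sliced) across SRAM bitlines so that one bit position of every element is processed per cycle, with all elements updated in parallel. The following is a minimal behavioral sketch of that scheme, not the paper's circuit; the function names and the plain-Python list representation are illustrative assumptions.

```python
# Illustrative sketch of bit-serial addition over transposed (bit-sliced)
# operands, in the style of Neural Cache. Each iteration models one wordline
# cycle that touches one bit position of every element simultaneously.
# (Function names and data layout are assumptions for this sketch.)

def transpose(values, width):
    """Store integers bit-sliced: slice i holds bit i of every element (LSB first)."""
    return [[(v >> i) & 1 for v in values] for i in range(width)]

def untranspose(bit_slices):
    """Recover integers from bit-sliced storage."""
    return [sum(s[j] << i for i, s in enumerate(bit_slices))
            for j in range(len(bit_slices[0]))]

def bit_serial_add(a_bits, b_bits):
    """Element-wise add: one pass per bit position, one carry latch per bitline."""
    carry = [0] * len(a_bits[0])
    out = []
    for a_slice, b_slice in zip(a_bits, b_bits):
        # Full-adder logic applied across all elements at once.
        out.append([a ^ b ^ c for a, b, c in zip(a_slice, b_slice, carry)])
        carry = [(a & b) | (c & (a ^ b))
                 for a, b, c in zip(a_slice, b_slice, carry)]
    out.append(carry)  # final carries form the new MSB slice
    return out

# Add two vectors of 4-bit values element-wise in parallel.
a = transpose([3, 7, 12, 5], 4)
b = transpose([1, 9, 4, 10], 4)
print(untranspose(bit_serial_add(a, b)))  # → [4, 16, 16, 15]
```

Note the key property: the cycle count of an n-bit add grows with n, not with the number of elements, which is why repurposed cache arrays with thousands of bitlines yield massive parallelism.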


Authors (8)

Charles Eckert
Xiaowei Wang
Jingcheng Wang
Arun K. Subramaniyan
R. Iyer
D. Sylvester
D. Blaauw
R. Das

Citation Format

Eckert, C., Wang, X., Wang, J., Subramaniyan, A.K., Iyer, R., Sylvester, D. et al. (2018). Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks. https://doi.org/10.1109/ISCA.2018.00040

Quick Access

View at Source: doi.org/10.1109/ISCA.2018.00040
Journal Information
Publication Year
2018
Language
en
Total Citations
375×
Source Database
Semantic Scholar
DOI
10.1109/ISCA.2018.00040
Access
Open Access ✓