DOAJ Open Access 2025

NLU: An Adaptive, Small-Footprint, Low-Power Neural Learning Unit for Edge and IoT Applications

Amirhossein Rostami, Seyed Mohammad Ali Zeinolabedin, Liyuan Guo, Florian Kelber, Heiner Bauer, +7 more

Abstract

Over the last few years, online training of deep neural networks (DNNs) on edge and mobile devices has attracted increasing interest in practical use cases due to its adaptability to new environments, personalization, and privacy preservation. Despite these advantages, online learning on resource-restricted devices is challenging. This work demonstrates a 16-bit floating-point, flexible, power- and memory-efficient neural learning unit (NLU) that can be integrated into processors to accelerate the learning process. To achieve this, we implemented three key strategies: a dynamic control unit, a tile allocation engine, and a neural compute pipeline, which together enhance data reuse and improve the flexibility of the NLU. The NLU was integrated into a system-on-chip (SoC) featuring a 32-bit RISC-V core and memory subsystems, fabricated in GlobalFoundries 22 nm FDSOI technology. The design occupies just $0.015\,mm^{2}$ of silicon area and consumes only 0.379 mW of power. The results show that the NLU can accelerate the training process by up to $24.38\times$ and reduce energy consumption by up to $37.37\times$ compared to a RISC-V implementation with a floating-point unit (FPU). Additionally, compared to a state-of-the-art RISC-V with a vector coprocessor, the NLU achieves $4.2\times$ higher energy efficiency (measured in GFLOPS/W). These results demonstrate the feasibility of our design for edge and IoT devices, positioning it favorably among state-of-the-art on-chip learning solutions. Furthermore, we performed mixed-precision on-chip training from scratch for keyword-spotting tasks using the Google Speech Commands (GSC) dataset. Training on just 40% of the dataset, the NLU achieved a training accuracy of 89.34% with stochastic rounding.
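The abstract attributes the mixed-precision training result in part to stochastic rounding, which rounds each value up or down at random so the rounding error is zero in expectation, keeping small gradient updates from being systematically rounded away in 16-bit arithmetic. As a rough illustration of the general technique (not of the NLU's actual hardware implementation), here is a minimal NumPy sketch; the function name and structure are assumptions for the example.

```python
import numpy as np

def stochastic_round_fp16(x, rng):
    """Stochastically round float32 values to float16.

    Each value maps to one of its two enclosing fp16 neighbors with
    probability proportional to proximity, so the expected rounding
    error is zero.
    """
    x32 = np.asarray(x, dtype=np.float32)
    near = x32.astype(np.float16)            # round-to-nearest fp16 neighbor
    near32 = near.astype(np.float32)
    # fp16 neighbor on the other side of x (step toward +/- infinity)
    direction = np.where(x32 > near32, np.float16(np.inf), np.float16(-np.inf))
    other = np.nextafter(near, direction)
    other32 = other.astype(np.float32)
    # choose `other` with probability |x - near| / |other - near|
    gap = np.abs(other32 - near32)           # nonzero for finite in-range x
    p_other = np.abs(x32 - near32) / gap
    pick_other = rng.random(x32.shape) < p_other
    return np.where(pick_other, other, near)
```

For example, $1 + 2^{-12}$ sits a quarter of the way between the fp16 values 1.0 and $1 + 2^{-10}$, so it rounds up roughly 25% of the time; values exactly representable in fp16 are never perturbed.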

Authors (12)

Amirhossein Rostami
Seyed Mohammad Ali Zeinolabedin
Liyuan Guo
Florian Kelber
Heiner Bauer
Andreas Dixius
Stefan Scholze
Marc Berthel
Dennis Walter
Johannes Uhlig
Bernhard Vogginger
Christian Mayr

Citation Format

Rostami, A., Zeinolabedin, S.M.A., Guo, L., Kelber, F., Bauer, H., Dixius, A. et al. (2025). NLU: An Adaptive, Small-Footprint, Low-Power Neural Learning Unit for Edge and IoT Applications. IEEE Open Journal of Circuits and Systems. https://doi.org/10.1109/OJCAS.2025.3546067

Journal Information
Publication Year
2025
Source Database
DOAJ
DOI
10.1109/OJCAS.2025.3546067
Access
Open Access ✓