POQ: Is There a Pareto-Optimal Quantization Strategy for Deep Neural Networks?
Abstract
Efficient deployment of deep learning models on resource-constrained devices requires balancing accuracy with energy consumption and/or latency. Quantization is a proven method to achieve this balance by reducing the precision of neural network weights and activations. However, simply changing the precision does not enable direct iso-accuracy and iso-energy comparisons. To address this, we combine a realistic processor energy model with a network filter multiplier that scales the number of channels, thereby enabling such comparisons. This work presents a Pareto-Optimal Quantization (POQ) methodology aimed at mapping a neural network architecture to a specific hardware platform while systematically exploring the design space in between to identify the most effective quantization strategy. Our approach evaluates how different design choices impact the accuracy-energy trade-off. Using detailed energy modeling instead of proxy metrics, our results reveal that 8-bit integer (int8) quantization is Pareto-optimal for MobileNetV2, providing up to 2.8× energy savings or 10% higher accuracy compared to 16-bit floating-point (fp16). Furthermore, employing high-precision residuals shifts the Pareto frontier, making 4-bit integer (int4) quantization optimal, achieving up to 1.9× additional energy reduction or 2% additional accuracy gains. Moreover, our findings emphasize the role of DRAM energy in certain model configurations and highlight the importance of precise energy modeling. These results reflect the application of our POQ methodology to the practical deployment of energy-efficient deep learning models on constrained hardware.
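The abstract frames each quantization configuration as a point in an accuracy-energy trade-off space and asks which points are Pareto-optimal. A minimal sketch of that selection step, using hypothetical (energy, accuracy) design points rather than the paper's actual measurements:

```python
# Sketch (not the paper's code): filter a set of hypothetical
# (energy, accuracy) design points down to the Pareto frontier.

def pareto_frontier(points):
    """Return the points not dominated by any other point.

    A point dominates another if it uses no more energy and is at
    least as accurate, with at least one strict improvement.
    """
    frontier = []
    for energy, acc in points:
        dominated = any(
            e <= energy and a >= acc and (e < energy or a > acc)
            for e, a in points
        )
        if not dominated:
            frontier.append((energy, acc))
    return sorted(frontier)

# Hypothetical design points: (energy in mJ, top-1 accuracy).
designs = [(10.0, 0.70), (4.0, 0.68), (9.0, 0.65), (8.0, 0.69)]
print(pareto_frontier(designs))
# (9.0, 0.65) is dropped: (8.0, 0.69) is cheaper and more accurate.
```

In the paper's setting, each point would correspond to one combination of bit width and filter multiplier evaluated under the detailed energy model.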
Topics & Keywords
Authors (3)
Floran De Putter
Sherif Eissa
Henk Corporaal
Quick Access
PDF not directly available
Check the original source →
- Year of Publication
- 2025
- Database Source
- DOAJ
- DOI
- 10.1109/ACCESS.2025.3567046
- Access
- Open Access ✓