arXiv Open Access 2025

Scaling Down, Serving Fast: Compressing and Deploying Efficient LLMs for Recommendation Systems

Kayhan Behdin, Ata Fatahibaarzi, Qingquan Song, Yun Dai, Aman Gupta, +15 others

Abstract

Large language models (LLMs) have demonstrated remarkable performance across a wide range of industrial applications, from search and recommendation systems to generative tasks. Although scaling laws indicate that larger models generally yield better generalization and performance, their substantial computational requirements often render them impractical for many real-world scenarios at scale. In this paper, we present a comprehensive set of insights for training and deploying small language models (SLMs) that deliver high performance for a variety of industry use cases. We focus on two key techniques: (1) knowledge distillation and (2) model compression via structured pruning and quantization. These approaches enable SLMs to retain much of the quality of their larger counterparts while significantly reducing training/serving costs and latency. We detail the impact of these techniques on a variety of use cases in a large professional social network platform and share deployment lessons, including hardware optimization strategies that improve speed and throughput for both predictive and reasoning-based applications in Recommendation Systems.

Topics & Keywords

cs.IR cs.LG

Authors (20)

Kayhan Behdin

Ata Fatahibaarzi

Qingquan Song

Yun Dai

Aman Gupta

Zhipeng Wang

Shao Tang

Hejian Sang

Gregory Dexter

Sirou Zhu

Siyu Zhu

Tejas Dharamsi

Vignesh Kothapalli

Zhoutong Fu

Yihan Cao

Pin-Lun Hsu

Fedor Borisyuk

Natesh Pillai

Luke Simon

Rahul Mazumder

Citation Format

Behdin, K., Fatahibaarzi, A., Song, Q., Dai, Y., Gupta, A., Wang, Z. et al. (2025). Scaling Down, Serving Fast: Compressing and Deploying Efficient LLMs for Recommendation Systems. https://arxiv.org/abs/2502.14305

Journal Information

Publication Year: 2025
Language: en
Source Database: arXiv
Access: Open Access ✓