arXiv Open Access 2025

Pie: A Programmable Serving System for Emerging LLM Applications

In Gim, Zhiyao Ma, Seung-seob Lee, Lin Zhong

Abstract

Emerging large language model (LLM) applications involve diverse reasoning strategies and agentic workflows, straining the capabilities of existing serving systems built on a monolithic token generation loop. This paper introduces Pie, a programmable LLM serving system designed for flexibility and efficiency. Pie decomposes the traditional generation loop into fine-grained service handlers exposed via an API and delegates control of the generation process to user-provided programs, called inferlets. This enables applications to implement new KV cache strategies and bespoke generation logic, and to integrate computation and I/O seamlessly, entirely within the application and without modifying the serving system. Pie executes inferlets using WebAssembly, benefiting from its lightweight sandboxing. Our evaluation shows that Pie matches state-of-the-art performance on standard tasks (3-12% latency overhead) while significantly improving latency and throughput (1.3x-3.4x higher) on agentic workflows by enabling application-specific optimizations.
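
To make the idea of handing the generation loop to a user program more concrete, here is a minimal toy sketch in Python. It is not Pie's API: `ToyEngine`, `alloc_kv`, `fork_kv`, `free_kv`, `forward`, and `branching_inferlet` are hypothetical names invented for illustration, and the "model" is a deterministic stand-in rather than an LLM. The point is only that the user program, not the engine, decides when to prefill, when to reuse cached state, and when to stop.

```python
from typing import Dict, List


class ToyEngine:
    """Hypothetical stand-in for a serving system exposing fine-grained handlers.

    Assumption: none of these method names come from Pie; they only
    illustrate the shape of a handler-style API.
    """

    def __init__(self) -> None:
        self.kv_cache: Dict[int, List[int]] = {}  # handle -> cached token ids
        self._next_handle = 0

    def alloc_kv(self) -> int:
        """Allocate an empty cache region and return its handle."""
        handle = self._next_handle
        self._next_handle += 1
        self.kv_cache[handle] = []
        return handle

    def fork_kv(self, handle: int) -> int:
        """Copy an existing cache region so a branch can reuse the prefix."""
        new_handle = self.alloc_kv()
        self.kv_cache[new_handle] = list(self.kv_cache[handle])
        return new_handle

    def free_kv(self, handle: int) -> None:
        """Release a cache region."""
        del self.kv_cache[handle]

    def forward(self, handle: int, tokens: List[int]) -> int:
        """Append tokens to the cache and return a toy 'next token'."""
        self.kv_cache[handle].extend(tokens)
        return sum(self.kv_cache[handle]) % 100


def branching_inferlet(engine: ToyEngine, prompt: List[int], steps: int) -> List[List[int]]:
    """User-side program that owns the loop: prefill once, then decode two branches."""
    root = engine.alloc_kv()
    first = engine.forward(root, prompt)  # single shared prefill
    outputs: List[List[int]] = []
    for seed in (first, first + 1):  # two candidate first tokens
        branch = engine.fork_kv(root)  # application-chosen cache strategy
        token = seed
        generated = [token]
        for _ in range(steps - 1):
            token = engine.forward(branch, [token])
            generated.append(token)
        outputs.append(generated)
        engine.free_kv(branch)
    engine.free_kv(root)
    return outputs
```

Calling `branching_inferlet(ToyEngine(), [1, 2, 3], 3)` runs one prefill and two short decode branches that share the prefix cache. In a monolithic loop, this kind of prefix reuse would have to be built into the serving system itself; here it lives entirely in the user program.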

Citation

Gim, I., Ma, Z., Lee, S., Zhong, L. (2025). Pie: A Programmable Serving System for Emerging LLM Applications. https://arxiv.org/abs/2510.24051

Journal Information
Publication year: 2025
Language: en
Database: arXiv
Access: Open Access