arXiv Open Access 2025

Practical Code RAG at Scale: Task-Aware Retrieval Design Choices under Compute Budgets

Timur Galimzyanov Olga Kolomyttseva Egor Bogomolov

Abstract

We study retrieval design for code-focused generation tasks under realistic compute budgets. Using two complementary tasks from Long Code Arena, code completion and bug localization, we systematically compare retrieval configurations across a range of context window sizes along three axes: (i) chunking strategy, (ii) similarity scoring, and (iii) splitting granularity. Our findings are as follows. (1) For PL-PL retrieval, sparse BM25 with word-level splitting is the most effective and practical option, significantly outperforming dense alternatives while being an order of magnitude faster. (2) For NL-PL retrieval, proprietary dense encoders (Voyager-3 family) consistently beat sparse retrievers, but at 100x higher latency. (3) The optimal chunk size scales with the available context: 32-64 line chunks work best at small budgets, and whole-file retrieval becomes competitive at 16000 tokens. (4) Simple line-based chunking matches syntax-aware splitting across budgets. (5) Retrieval latency varies by up to 200x across configurations; BPE-based splitting is needlessly slow, and BM25 with word-level splitting offers the best quality-latency trade-off. Based on these results, we provide evidence-based recommendations for implementing effective code-oriented RAG systems according to task requirements, model constraints, and computational efficiency.
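To make the configuration favoured in findings (1), (3), and (4) concrete, the following is a minimal Python sketch of line-based chunking combined with BM25 scoring over word-level tokens. It is an illustration under stated assumptions, not the authors' implementation: the function names, the regular-expression tokenizer, and the BM25 parameters `k1=1.5` and `b=0.75` are hypothetical choices that do not come from the abstract.

```python
import math
import re
from collections import Counter


def chunk_lines(text, lines_per_chunk=32):
    """Split a file into consecutive, non-overlapping chunks of N lines."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + lines_per_chunk])
            for i in range(0, len(lines), lines_per_chunk)]


def split_words(text):
    """Word-level splitting: lowercased alphanumeric runs (assumed tokenizer)."""
    return re.findall(r"[A-Za-z0-9]+", text.lower())


def bm25_rank(query, chunks, k1=1.5, b=0.75):
    """Return chunk indices sorted by descending Okapi BM25 score.

    k1 and b are common default values, assumed here for illustration.
    """
    docs = [split_words(c) for c in chunks]
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n if n else 0.0
    # Document frequency: number of chunks that contain each term.
    df = Counter(term for d in docs for term in set(d))
    query_terms = set(split_words(query))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    source = (
        "def add(a, b):\n"
        "    return a + b\n"
        "\n"
        "def read_config(path):\n"
        "    return open(path).read()\n"
    )
    chunks = chunk_lines(source, lines_per_chunk=3)
    ranking = bm25_rank("read config from path", chunks)
    print(chunks[ranking[0]])
```

In a retrieval pipeline, the top-ranked chunks would be appended to the prompt until the context budget is exhausted; the `lines_per_chunk` value would then be tuned against that budget, in line with finding (3).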

Topics & Keywords

cs.LG cs.AI cs.IR

Citation

Galimzyanov, T., Kolomyttseva, O., & Bogomolov, E. (2025). Practical Code RAG at Scale: Task-Aware Retrieval Design Choices under Compute Budgets. https://arxiv.org/abs/2510.20609

Journal Information

Publication Year: 2025
Language: en
Source Database: arXiv
Access: Open Access