Semantic Scholar · Open Access · 2025 · 5 citations

ASTRO: Teaching Language Models to Reason by Reflecting and Backtracking In-Context

Joongwon Kim, Anirudh Goyal, Liang Tan, Hanna Hajishirzi, Srini Iyer, +1 more

Abstract

We introduce ASTRO, the "Autoregressive Search-Taught Reasoner", a framework for training language models to reason like search algorithms, explicitly leveraging self-reflection, backtracking, and exploration in their outputs. Recently, training large language models (LLMs) via reinforcement learning (RL) has led to the advent of reasoning models with greatly enhanced reasoning capabilities. Open-source replications of reasoning models, while successful, build upon models that already exhibit strong reasoning capabilities along with search behavior observed even before RL. As a result, it is yet unclear how to boost the reasoning capabilities of other non-reasoner models, including Llama 3. ASTRO teaches such models to internalize structured search behavior through a synthetic dataset derived from Monte Carlo Tree Search (MCTS) over mathematical problem-solving trajectories. By converting search traces into natural-language chains of thought that capture both successes and recoveries from failure, ASTRO bootstraps models with a rich prior for exploration during RL. We finetune our models on these search-derived traces and further improve performance via RL with verifiable rewards. We apply ASTRO to the Llama 3 family of models and achieve absolute performance gains of 16.0% on MATH-500, 26.9% on AMC 2023, and 20.0% on AIME 2024, especially improving upon challenging problems that require iterative correction. Our results demonstrate that search-inspired training offers a principled way to instill robust reasoning capabilities into open LLMs.
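The abstract's core idea, converting a search trace that includes dead ends into a linear chain of thought with explicit reflection and backtracking, can be illustrated with a minimal sketch. The function name, step representation, and the reflection phrasing below are illustrative assumptions, not the paper's actual procedure or output format:

```python
# Hypothetical sketch of linearizing a search trace into a chain of thought.
# Each step is (text, ok); ok=False marks a dead end found during search.
# Instead of discarding failures, we keep them and insert an explicit
# reflect-and-backtrack sentence, in the spirit of ASTRO's training data.

def linearize_trace(steps):
    """Turn a list of (text, ok) search steps into one CoT string."""
    parts = []
    for text, ok in steps:
        parts.append(text)
        if not ok:
            parts.append(
                "Wait, this approach does not work. "
                "Let me backtrack and try a different one."
            )
    return " ".join(parts)

trace = [
    ("Try factoring the quadratic directly.", False),
    ("Apply the quadratic formula instead.", True),
    ("The roots are x = 2 and x = 3.", True),
]
cot = linearize_trace(trace)
```

Finetuning on traces of this shape would expose the model to recovery behavior, giving RL a prior for exploration rather than only polished solutions.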


Authors (6)

Joongwon Kim, Anirudh Goyal, Liang Tan, Hanna Hajishirzi, Srini Iyer, Tianlu Wang

Citation Format

Kim, J., Goyal, A., Tan, L., Hajishirzi, H., Iyer, S., & Wang, T. (2025). ASTRO: Teaching Language Models to Reason by Reflecting and Backtracking In-Context. https://doi.org/10.48550/arXiv.2507.00417

Publication Information
Year Published
2025
Language
en
Total Citations
5
Database Source
Semantic Scholar
DOI
10.48550/arXiv.2507.00417
Access
Open Access ✓