Semantic Scholar · Open Access · 2025 · 5 citations

ASTRO: Teaching Language Models to Reason by Reflecting and Backtracking In-Context

Joongwon Kim, Anirudh Goyal, Liang Tan, Hanna Hajishirzi, Srini Iyer, +1 more

Abstract

We introduce ASTRO, the "Autoregressive Search-Taught Reasoner", a framework for training language models to reason like search algorithms, explicitly leveraging self-reflection, backtracking, and exploration in their outputs. Recently, training large language models (LLMs) via reinforcement learning (RL) has led to the advent of reasoning models with greatly enhanced reasoning capabilities. Open-source replications of reasoning models, while successful, build upon models that already exhibit strong reasoning capabilities along with search behavior observed even before RL. As a result, it is yet unclear how to boost the reasoning capabilities of other non-reasoner models, including Llama 3. ASTRO teaches such models to internalize structured search behavior through a synthetic dataset derived from Monte Carlo Tree Search (MCTS) over mathematical problem-solving trajectories. By converting search traces into natural-language chains of thought that capture both successes and recoveries from failure, ASTRO bootstraps models with a rich prior for exploration during RL. We finetune our models on these search-derived traces and further improve performance via RL with verifiable rewards. We apply ASTRO to the Llama 3 family of models and achieve absolute performance gains of 16.0% on MATH-500, 26.9% on AMC 2023, and 20.0% on AIME 2024, especially improving upon challenging problems that require iterative correction. Our results demonstrate that search-inspired training offers a principled way to instill robust reasoning capabilities into open LLMs.
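The abstract's core idea, converting a search trace that includes dead ends into a linear chain of thought with explicit reflection and backtracking, can be illustrated with a minimal sketch. The function name, step representation, and the reflection phrasing below are illustrative assumptions, not the paper's actual procedure or output format:

```python
# Hypothetical sketch of linearizing a search trace into a chain of thought.
# Each step is (text, ok); ok=False marks a dead end found during search.
# Instead of discarding failures, we keep them and insert an explicit
# reflect-and-backtrack sentence, in the spirit of ASTRO's training data.

def linearize_trace(steps):
    """Turn a list of (text, ok) search steps into one CoT string."""
    parts = []
    for text, ok in steps:
        parts.append(text)
        if not ok:
            parts.append(
                "Wait, this approach does not work. "
                "Let me backtrack and try a different one."
            )
    return " ".join(parts)

trace = [
    ("Try factoring the quadratic directly.", False),
    ("Apply the quadratic formula instead.", True),
    ("The roots are x = 2 and x = 3.", True),
]
cot = linearize_trace(trace)
```

Finetuning on traces of this shape would expose the model to recovery behavior, giving RL a prior for exploration rather than only polished solutions.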


Authors (6)

Joongwon Kim, Anirudh Goyal, Liang Tan, Hanna Hajishirzi, Srini Iyer, Tianlu Wang

Citation Format

Kim, J., Goyal, A., Tan, L., Hajishirzi, H., Iyer, S., & Wang, T. (2025). ASTRO: Teaching Language Models to Reason by Reflecting and Backtracking In-Context. https://doi.org/10.48550/arXiv.2507.00417

Publication Information
Year Published
2025
Language
en
Total Citations
5
Database Source
Semantic Scholar
DOI
10.48550/arXiv.2507.00417
Access
Open Access ✓