Semantic Scholar Open Access 2024 186 citations

AlphaMath Almost Zero: Process Supervision without Process

Guoxin Chen Minpeng Liao Chengxi Li Kai Fan

Abstract

Although recent advancements in large language models (LLMs) have significantly improved their performance on various tasks, they still face challenges with complex and symbolic multi-step reasoning, particularly in mathematical reasoning. To bolster the mathematical reasoning capabilities of LLMs, most existing efforts concentrate on seeking assistance from either domain experts or GPT-4 for high-quality process-supervised data, which is not only expensive but also labor-intensive. In our study, we propose an innovative framework, AlphaMath, that bypasses the need for process annotations (from humans or GPTs) by leveraging Monte Carlo Tree Search (MCTS). This framework focuses on unleashing the potential of a well-pretrained LLM to autonomously enhance its mathematical reasoning. Specifically, we integrate a value model with the LLM, automatically generating both process supervision and step-level evaluation signals in MCTS. Furthermore, we propose an efficient inference strategy, step-level beam search, where the value model is crafted to assist the policy model (i.e., LLM) in navigating more effective reasoning paths, rather than solely relying on prior probabilities. The experimental results on both in-domain and out-of-domain datasets demonstrate that even without GPT-4 or human-annotated process supervision, our AlphaMath framework achieves comparable or superior results to previous state-of-the-art methods.
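The step-level beam search described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: `policy_expand` and `value_score` are hypothetical stubs standing in for the LLM policy and the trained value model. The point shown is the selection rule, where the beam is ranked by the value model's estimate of each partial reasoning path rather than by the policy's prior probability alone.

```python
import heapq

def policy_expand(state, k):
    """Hypothetical stub: propose k candidate next reasoning steps for a state."""
    return [f"{state}->s{i}" for i in range(k)]

def value_score(state):
    """Hypothetical stub: value model's estimate of a partial path's promise."""
    return -len(state)  # placeholder heuristic, not the paper's learned value

def step_level_beam_search(question, beam_width=2, expand_k=3, max_steps=3):
    # Start with the question as the only partial solution.
    beam = [question]
    for _ in range(max_steps):
        candidates = []
        for state in beam:
            for nxt in policy_expand(state, expand_k):
                candidates.append((value_score(nxt), nxt))
        # Keep the beam_width best candidates by value estimate,
        # not by the policy's prior probability.
        beam = [s for _, s in heapq.nlargest(beam_width, candidates)]
    return beam

print(step_level_beam_search("Q"))
```

With real models, `policy_expand` would sample candidate next steps from the LLM and `value_score` would come from the value head trained on MCTS-generated signals.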

Topics & Keywords

Authors (4)

Guoxin Chen

Minpeng Liao

Chengxi Li

Kai Fan

Citation Format

Chen, G., Liao, M., Li, C., & Fan, K. (2024). AlphaMath Almost Zero: Process Supervision without Process. https://doi.org/10.52202/079017-0870

Quick Access

PDF not directly available

Check the original source →
View at Source doi.org/10.52202/079017-0870
Journal Information
Publication Year
2024
Language
en
Total Citations
186×
Source Database
Semantic Scholar
Access
Open Access ✓
DOI
10.52202/079017-0870