arXiv Open Access 2025

Inverse Knowledge Search over Verifiable Reasoning: Synthesizing a Scientific Encyclopedia from a Long Chains-of-Thought Knowledge Base

Yu Li, Yuan Huang, Tao Wang, Caiyu Fan, Xiansheng Cai, +18 others

Abstract

Most scientific materials compress reasoning, presenting conclusions while omitting the derivational chains that justify them. This compression hinders verification by lacking explicit, step-wise justifications and inhibits cross-domain links by collapsing the very pathways that establish the logical and causal connections between concepts. We introduce a scalable framework that decompresses scientific reasoning, constructing a verifiable Long Chain-of-Thought (LCoT) knowledge base and projecting it into an emergent encyclopedia, SciencePedia. Our pipeline operationalizes an endpoint-driven, reductionist strategy: a Socratic agent, guided by a curriculum of around 200 courses, generates approximately 3 million first-principles questions. To ensure high fidelity, multiple independent solver models generate LCoTs, which are then rigorously filtered by prompt sanitization and cross-model answer consensus, retaining only those with verifiable endpoints. This verified corpus powers the Brainstorm Search Engine, which performs inverse knowledge search -- retrieving diverse, first-principles derivations that culminate in a target concept. This engine, in turn, feeds the Plato synthesizer, which narrates these verified chains into coherent articles. The initial SciencePedia comprises approximately 200,000 fine-grained entries spanning mathematics, physics, chemistry, biology, engineering, and computation. In evaluations across six disciplines, Plato-synthesized articles (conditioned on retrieved LCoTs) exhibit substantially higher knowledge-point density and significantly lower factual error rates than an equally-prompted baseline without retrieval (as judged by an external LLM). Built on this verifiable LCoT knowledge base, this reasoning-centric approach enables trustworthy, cross-domain scientific synthesis at scale and establishes the foundation for an ever-expanding encyclopedia.
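
The cross-model answer consensus step described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data layout, the `normalize` helper, the `consensus_filter` name, and the `min_agree` threshold are hypothetical and are not taken from the paper, which may normalize and compare answers quite differently.

```python
from collections import Counter


def normalize(answer: str) -> str:
    """Hypothetical normalization: lowercase and collapse whitespace."""
    return " ".join(answer.strip().lower().split())


def consensus_filter(candidates, min_agree=2):
    """Keep only chains whose final answer is shared by enough solvers.

    candidates maps a question to a list of (solver, chain, final_answer)
    tuples. min_agree is an assumed threshold, not a value from the paper.
    """
    kept = {}
    for question, attempts in candidates.items():
        counts = Counter(normalize(ans) for _, _, ans in attempts)
        if not counts:
            continue  # no solver produced an answer for this question
        answer, votes = counts.most_common(1)[0]
        if votes >= min_agree:
            # Retain every chain that reaches the consensus endpoint.
            kept[question] = [
                (solver, chain)
                for solver, chain, ans in attempts
                if normalize(ans) == answer
            ]
    return kept


# Toy data: two of three solvers agree on the first question only.
candidates = {
    "q1": [("A", "chain a1", "42"), ("B", "chain b1", " 42 "), ("C", "chain c1", "41")],
    "q2": [("A", "chain a2", "x"), ("B", "chain b2", "y"), ("C", "chain c2", "z")],
}
kept = consensus_filter(candidates)
```

In this toy run, `q1` survives with the two agreeing chains and `q2` is dropped because no answer reaches the threshold. A real pipeline would likely need a domain-aware equivalence check (for example, symbolic or numeric comparison) rather than string matching.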

Authors (23)

Yu Li
Yuan Huang
Tao Wang
Caiyu Fan
Xiansheng Cai
Sihan Hu
Xinzijian Liu
Cheng Shi
Mingjun Xu
Zhen Wang
Yan Wang
Xiangqi Jin
Tianhan Zhang
Linfeng Zhang
Lei Wang
Youjin Deng
Pan Zhang
Weijie Sun
Xinyu Li
Weinan E
Linfeng Zhang
Zhiyuan Yao
Kun Chen

Citation Format

Li, Y., Huang, Y., Wang, T., Fan, C., Cai, X., Hu, S. et al. (2025). Inverse Knowledge Search over Verifiable Reasoning: Synthesizing a Scientific Encyclopedia from a Long Chains-of-Thought Knowledge Base. https://arxiv.org/abs/2510.26854

Journal Information
Publication year: 2025
Language: en
Source database: arXiv
Access: Open Access