DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning
Abstract
General reasoning represents a long-standing and formidable challenge in artificial intelligence (AI). Recent breakthroughs, exemplified by large language models (LLMs) [1,2] and chain-of-thought (CoT) prompting [3], have achieved considerable success on foundational reasoning tasks. However, this success is heavily contingent on extensive human-annotated demonstrations, and model capabilities remain insufficient for more complex problems. Here we show that the reasoning abilities of LLMs can be incentivized through pure reinforcement learning (RL), obviating the need for human-labelled reasoning trajectories. The proposed RL framework facilitates the emergent development of advanced reasoning patterns, such as self-reflection, verification and dynamic strategy adaptation. Consequently, the trained model achieves superior performance on verifiable tasks such as mathematics, coding competitions and STEM fields, surpassing its counterparts trained through conventional supervised learning on human demonstrations. Moreover, the emergent reasoning patterns exhibited by these large-scale models can be systematically used to guide and enhance the reasoning capabilities of smaller models.
A new artificial intelligence model, DeepSeek-R1, is introduced, demonstrating that the reasoning abilities of large language models can be incentivized through pure reinforcement learning, removing the need for human-annotated demonstrations.
Topics & Keywords
Authors (198)
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Jun-Mei Song
Ruoyu Zhang
R. Xu
Qihao Zhu
Shirong Ma
Peiyi Wang
Xiaoling Bi
Xiaokang Zhang
Xingkai Yu
Yu Wu
Z. F. Wu
Zhibin Gou
Zhihong Shao
Zhuoshu Li
Ziyi Gao
A. Liu
Bing Xue
Bing-Li Wang
Bochao Wu
B. Feng
Chengda Lu
Chenggang Zhao
C. Deng
Chenyu Zhang
C. Ruan
Damai Dai
Deli Chen
Dong-Li Ji
Erhang Li
Fangyun Lin
Fucong Dai
Fuli Luo
Guangbo Hao
Guanting Chen
Guowei Li
H. Zhang
Han Bao
Hanwei Xu
Haocheng Wang
Honghui Ding
Huajian Xin
Huazuo Gao
Hui Qu
Hui Li
Jianzhong Guo
Jiashi Li
Jiawei Wang
JingChang Chen
Jingyang Yuan
Junjie Qiu
Junlong Li
J. Cai
J. Ni
Jian Liang
Jin Chen
Kai Dong
Kai Hu
Kaige Gao
Kang Guan
Kexin Huang
K. Yu
Lean Wang
Lecong Zhang
Liang Zhao
Litong Wang
Liyue Zhang
Lei Xu
Leyi Xia
Mingchuan Zhang
Minghua Zhang
M. Tang
Meng Li
Miaojun Wang
Mingming Li
Ning Tian
Panpan Huang
Peng Zhang
Qiancheng Wang
Qinyu Chen
Qiushi Du
Ruiqi Ge
Ruisong Zhang
Ruizhe Pan
Runji Wang
R. J. Chen
R. Jin
Ruyi Chen
Shanghao Lu
Shangyan Zhou
Shanhuang Chen
Shengfeng Ye
Shiyu Wang
Shuiping Yu
Shunfeng Zhou
Shuting Pan
S. Li
Shuang Zhou
Shao-Kang Wu
Tao Yun
Tian Pei
T. Sun
T. Wang
Wangding Zeng
Wanjia Zhao
Wen Liu
W. Liang
Wenjun Gao
Wen-Xia Yu
Wentao Zhang
W. Xiao
Wei An
Xiaodong Liu
Xiaohan Wang
Xiaokang Chen
X. Nie
Xin Cheng
Xin Liu
Xin Xie
Xingchao Liu
Xinyu Yang
Xinyuan Li
Xuecheng Su
Xuheng Lin
X. Q. Li
Xiangyu Jin
Xi-Cheng Shen
Xiaosha Chen
Xiaowen Sun
Xiaoxiang Wang
Xinnan Song
Xinyi Zhou
Xianzu Wang
Xinxia Shan
Y. K. Li
Y. Q. Wang
Y. X. Wei
Yang Zhang
Yanhong Xu
Yao Li
Yao Zhao
Yaofeng Sun
Yaohui Wang
Yi Yu
Yichao Zhang
Yifan Shi
Yi Xiong
Ying He
Y. Piao
Yisong Wang
Yixuan Tan
Yiyang Ma
Yiyuan Liu
Yongqiang Guo
Y. Ou
Yuduan Wang
Yue Gong
Yu-Jing Zou
Yujia He
Yunfan Xiong
Yu-Wei Luo
Yu-mei You
Yuxuan Liu
Yuyang Zhou
Y. X. Zhu
Yanping Huang
Yao Li
Yi Zheng
Yuchen Zhu
Yunxiang Ma
Ying Tang
Y. Zha
Yuting Yan
Z. Ren
Z. Ren
Zhangli Sha
Zhe Fu
Zhean Xu
Zhenda Xie
Zhen-guo Zhang
Zhewen Hao
Zhicheng Ma
Zhigang Yan
Zhiyu Wu
Zihui Gu
Zijia Zhu
Zijun Liu
Zi-An Li
Ziwei Xie
Ziyang Song
Zizheng Pan
Zhen Huang
Zhipeng Xu
Zhongyu Zhang
Zhen Zhang
Quick Access
- Year Published
- 2025
- Language
- en
- Total Citations
- 5344×
- Source Database
- Semantic Scholar
- DOI
- 10.1038/s41586-025-09422-z
- Access
- Open Access ✓