Results for "q-bio.NC" - JURNALIN

S2 Open Access 2025

DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning

DeepSeek-AI, Daya Guo, Dejian Yang et al.

General reasoning represents a long-standing and formidable challenge in artificial intelligence (AI). Recent breakthroughs, exemplified by large language models (LLMs) [1,2] and chain-of-thought (CoT) prompting [3], have achieved considerable success on foundational reasoning tasks. However, this success is heavily contingent on extensive human-annotated demonstrations and the capabilities of models are still insufficient for more complex problems. Here we show that the reasoning abilities of LLMs can be incentivized through pure reinforcement learning (RL), obviating the need for human-labelled reasoning trajectories. The proposed RL framework facilitates the emergent development of advanced reasoning patterns, such as self-reflection, verification and dynamic strategy adaptation. Consequently, the trained model achieves superior performance on verifiable tasks such as mathematics, coding competitions and STEM fields, surpassing its counterparts trained through conventional supervised learning on human demonstrations. Moreover, the emergent reasoning patterns exhibited by these large-scale models can be systematically used to guide and enhance the reasoning capabilities of smaller models. A new artificial intelligence model, DeepSeek-R1, is introduced, demonstrating that the reasoning abilities of large language models can be incentivized through pure reinforcement learning, removing the need for human-annotated demonstrations.

5344 citations en Medicine, Computer Science

S2 Open Access 2023

GPQA: A Graduate-Level Google-Proof Q&A Benchmark

David Rein, Betty Li Hou, Asa Cooper Stickland et al.

We present GPQA, a challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry. We ensure that the questions are high-quality and extremely difficult: experts who have or are pursuing PhDs in the corresponding domains reach 65% accuracy (74% when discounting clear mistakes the experts identified in retrospect), while highly skilled non-expert validators only reach 34% accuracy, despite spending on average over 30 minutes with unrestricted access to the web (i.e., the questions are "Google-proof"). The questions are also difficult for state-of-the-art AI systems, with our strongest GPT-4 based baseline achieving 39% accuracy. If we are to use future AI systems to help us answer very hard questions, for example, when developing new scientific knowledge, we need to develop scalable oversight methods that enable humans to supervise their outputs, which may be difficult even if the supervisors are themselves skilled and knowledgeable. The difficulty of GPQA both for skilled non-experts and frontier AI systems should enable realistic scalable oversight experiments, which we hope can help devise ways for human experts to reliably get truthful information from AI systems that surpass human capabilities.

2229 citations en Computer Science

S2 Open Access 2024

From $r$ to $Q^*$: Your Language Model is Secretly a Q-Function

Rafael Rafailov, Joey Hejna, Ryan Park et al.

Reinforcement Learning From Human Feedback (RLHF) has been critical to the success of the latest generation of generative AI models. In response to the complex nature of the classical RLHF pipeline, direct alignment algorithms such as Direct Preference Optimization (DPO) have emerged as an alternative approach. Although DPO solves the same objective as the standard RLHF setup, there is a mismatch between the two approaches. Standard RLHF deploys reinforcement learning in a specific token-level MDP, while DPO is derived as a bandit problem in which the whole response of the model is treated as a single arm. In this work we rectify this difference. We theoretically show that we can derive DPO in the token-level MDP as a general inverse Q-learning algorithm, which satisfies the Bellman equation. Using our theoretical results, we provide three concrete empirical insights. First, we show that because of its token level interpretation, DPO is able to perform some type of credit assignment. Next, we prove that under the token level formulation, classical search-based algorithms, such as MCTS, which have recently been applied to the language generation space, are equivalent to likelihood-based search on a DPO policy. Empirically we show that a simple beam search yields meaningful improvement over the base DPO policy. Finally, we show how the choice of reference policy causes implicit rewards to decline during training. We conclude by discussing applications of our work, including information elicitation in multi-turn dialogue, reasoning, agentic applications and end-to-end training of multi-model systems.

247 citations en Computer Science

arXiv Open Access 2024

Membrane Interactions in Alzheimer's Treatment Strategies with Multitarget Molecules

Pablo Zambrano

Addressing Alzheimer's disease (AD) requires innovative strategies beyond current single-target drugs. This Letter to the Editor suggests that multitarget molecules, especially those targeting neuronal membrane protection, could offer a comprehensive approach to AD therapy, advocating for further research into their mechanisms and therapeutic potential.

en q-bio.BM, q-bio.NC

arXiv Open Access 2023

Applications of Computer Vision in Analysis of the Clock-Drawing Test as a Metric of Cognitive Impairment

Luzhou Zhang

The Clock-Drawing test is a well known and widely used neuropsychological metric to assess basic cognitive function. My objective is to combine methods of machine learning in computer vision and image analysis to predict a subject's level of cognitive impairment.

en q-bio.NC, q-bio.QM

arXiv Open Access 2021

Graph Theory in Brain Networks

Moo K. Chung

Recent developments in graph theoretic analysis of complex networks have led to deeper understanding of brain networks. Many complex networks show similar macroscopic behaviors despite differences in the microscopic details. Probably the two most often observed characteristics of complex networks are the scale-free and small-world properties. In this paper, we will explore whether brain networks exhibit scale-free and small-world behavior, among other graph-theoretic properties.

en q-bio.NC, q-bio.QM

arXiv Open Access 2021

Cross-time functional connectivity analysis

Ze Wang

A large body of literature has shown substantial inter-regional functional connectivity in the mammalian brain. One important property that remains unstudied is the cross-time interareal connection. This paper provides a tool to characterize cross-time functional connectivity. The method extends the temporal-embedding-based brain temporal coherence analysis. Both synthetic and in vivo data were used to evaluate the various properties of the cross-time functional connectivity matrix, which is also called the cross-regional temporal coherence matrix.

en q-bio.NC, q-bio.QM

arXiv Open Access 2020

How does neural activity encode spontaneous motor behavior in zebrafish larvae?

Selma Mehyaoui

The origins of spontaneous movements have been investigated in humans as well as in other vertebrates. Studies have reported an increase in neuronal activity one second before the onset of a given movement: this is known as the readiness potential. The mechanisms underlying this increase are still unclear. The zebrafish larva is an ideal animal model for studying the neuronal basis of spontaneous movements. Because of its small size and transparency, this vertebrate is well suited to optical recording methods. To understand what neuronal activity causes the execution of a specific tail movement at a given time, we will mainly use a prediction approach.

en q-bio.NC, q-bio.QM

S2 Open Access 1989

A High Statistics Measurement of the Proton Structure Functions $F_2(x, Q^2)$ and $R$ from Deep Inelastic Muon Scattering at High $Q^2$

A. Benvenuti, D. Bollini, G. Bruni et al.

419 citations en Physics

S2 Open Access 2013

First Result from the Alpha Magnetic Spectrometer on the International Space Station: Precision Measurement of the Positron Fraction in Primary Cosmic Rays of 0.5–350 GeV

M. Aguilar, G. Alberti, B. Alpat et al.

173 citations en

S2 Open Access 2010

Pressure-induced superconductivity in topological parent compound Bi2Te3

J. Zhang, S. J. Zhang, H. Weng et al.

We report a successful observation of pressure-induced superconductivity in a topological compound Bi2Te3 with Tc of ∼3 K between 3 and 6 GPa. The combined high-pressure structure investigations with synchrotron radiation indicated that the superconductivity occurred at the ambient phase without crystal structure phase transition. Hall effect measurements indicated hole-type carriers in the pressure-induced superconducting Bi2Te3 single crystal. Consequently, the first-principles calculations based on the structural data obtained by the Rietveld refinement of X-ray diffraction patterns at high pressure showed that the electronic structure under pressure remained topologically nontrivial. The results suggested that topological superconductivity can be realized in Bi2Te3 due to the proximity effect between superconducting bulk states and Dirac-type surface states. We also discuss the possibility that the bulk state could be a topological superconductor.

269 citations en Materials Science, Medicine

S2 Open Access 2014

Chronic Q Fever in the Netherlands 5 Years after the Start of the Q Fever Epidemic: Results from the Dutch Chronic Q Fever Database

L. Kampschreur, C. Delsing, R. Groenwold et al.

114 citations en Medicine

arXiv Open Access 2017

Myelin and saltatory conduction

Maurizio De Pittà

Essential tutorial on myelin, oligodendrocytes and their functional relevance in the pathophysiology of the brain.

en q-bio.NC, q-bio.TO

arXiv Open Access 2017

Glia

Maurizio De Pittà

Essential introduction to glial cells with emphasis on astrocytes, microglia and their interplay in reactive astrogliosis.

en q-bio.NC, q-bio.TO

S2 Open Access 1992

Epidemiologic features and clinical presentation of acute Q fever in hospitalized patients: 323 French cases.

H. Dupont, D. Raoult, P. Brouqui et al.

336 citations en Medicine

arXiv Open Access 2015

How a Generation Was Misled About Natural Selection

Liane Gabora

This article explains how natural selection works and how it has been inappropriately applied to the description of cultural change. It proposes an alternative evolutionary explanation for cultural evolution that describes it in terms of communal exchange.

en q-bio.PE, q-bio.NC

S2 Open Access 1971

Significance of the Diagnostic Q Wave of Myocardial Infarction

L. Horan, N. Flowers, Jennifer C. Johnson

280 citations en Medicine

S2 Open Access 2002

Elliptic flow from two- and four-particle correlations in Au + Au collisions at $\sqrt{s_{NN}} = 130$ GeV

C. Adler, Z. Ahammed, C. Allgower et al.

Elliptic flow holds much promise for studying the early-time thermalization attained in ultrarelativistic nuclear collisions. Flow measurements also provide a means of distinguishing between hydrodynamic models and calculations which approach the low density (dilute gas) limit. Among the effects that can complicate the interpretation of elliptic flow measurements are azimuthal correlations that are unrelated to the reaction plane (non-flow correlations). Using data for Au + Au collisions at $\sqrt{s_{NN}} = 130$ GeV from the STAR TPC, it is found that four-particle correlation analyses can reliably separate flow and non-flow correlation signals. The latter account for on average about 15% of the observed second-harmonic azimuthal correlation, with the largest relative contribution for the most peripheral and the most central collisions. The results are also corrected for the effect of flow variations within centrality bins. This effect is negligible for all but the most central bin, where the correction to the elliptic flow is about a factor of two. A simple new method for two-particle flow analysis based on scalar products is described. An analysis based on the distribution of the magnitude of the flow vector is also described.

277 citations en Physics

S2 Open Access 2009

Australia's national Q fever vaccination program.

H. Gidding, Cate Wallace, G. Lawrence et al.

150 citations en Medicine

S2 Open Access

Beam Energy Dependence of Moments of the Net-charge Multiplicity Distributions in Au + Au Collisions at RHIC

L. Adamczyk, J. K. Adkins, G. Agakishiev et al.

192 citations en
