General reasoning represents a long-standing and formidable challenge in artificial intelligence (AI). Recent breakthroughs, exemplified by large language models (LLMs) [1,2] and chain-of-thought (CoT) prompting [3], have achieved considerable success on foundational reasoning tasks. However, this success is heavily contingent on extensive human-annotated demonstrations, and model capabilities remain insufficient for more complex problems. Here we show that the reasoning abilities of LLMs can be incentivized through pure reinforcement learning (RL), obviating the need for human-labelled reasoning trajectories. The proposed RL framework facilitates the emergent development of advanced reasoning patterns, such as self-reflection, verification and dynamic strategy adaptation. Consequently, the trained model achieves superior performance on verifiable tasks such as mathematics, coding competitions and STEM fields, surpassing its counterparts trained through conventional supervised learning on human demonstrations. Moreover, the emergent reasoning patterns exhibited by these large-scale models can be systematically used to guide and enhance the reasoning capabilities of smaller models.
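The abstract does not spell out the optimization recipe; as a minimal sketch of how a verifiable, rule-based reward and a group-relative advantage might be computed for sampled reasoning traces, consider the snippet below. The answer-extraction convention, function names and the normalization scheme are illustrative assumptions, not the paper's exact method.

```python
import re
from statistics import mean, pstdev

def rule_based_reward(completion: str, reference_answer: str) -> float:
    """Verifiable reward: 1.0 if the final boxed answer matches the reference,
    else 0.0. The \\boxed{...} convention is an illustrative assumption."""
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    return 1.0 if match and match.group(1).strip() == reference_answer.strip() else 0.0

def group_relative_advantages(completions, reference_answer):
    """Normalize rewards within a group of samples for the same prompt, so a
    policy-gradient update favors better-than-average reasoning traces."""
    rewards = [rule_based_reward(c, reference_answer) for c in completions]
    mu, sigma = mean(rewards), pstdev(rewards) or 1.0
    return [(r - mu) / sigma for r in rewards]

# Example: two sampled traces for one math prompt
advs = group_relative_advantages(
    ["... so the answer is \\boxed{42}", "... therefore \\boxed{41}"],
    reference_answer="42",
)
print(advs)  # the correct trace receives a positive advantage
```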
David Rein, Betty Li Hou, Asa Cooper Stickland et al.
We present GPQA, a challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry. We ensure that the questions are high-quality and extremely difficult: experts who have or are pursuing PhDs in the corresponding domains reach 65% accuracy (74% when discounting clear mistakes the experts identified in retrospect), while highly skilled non-expert validators only reach 34% accuracy, despite spending on average over 30 minutes with unrestricted access to the web (i.e., the questions are "Google-proof"). The questions are also difficult for state-of-the-art AI systems, with our strongest GPT-4 based baseline achieving 39% accuracy. If we are to use future AI systems to help us answer very hard questions, for example, when developing new scientific knowledge, we need to develop scalable oversight methods that enable humans to supervise their outputs, which may be difficult even if the supervisors are themselves skilled and knowledgeable. The difficulty of GPQA both for skilled non-experts and frontier AI systems should enable realistic scalable oversight experiments, which we hope can help devise ways for human experts to reliably get truthful information from AI systems that surpass human capabilities.
Offline reinforcement learning requires reconciling two conflicting aims: learning a policy that improves over the behavior policy that collected the dataset, while at the same time minimizing the deviation from the behavior policy so as to avoid errors due to distributional shift. This trade-off is critical, because most current offline reinforcement learning methods need to query the value of unseen actions during training to improve the policy, and therefore need to either constrain these actions to be in-distribution, or else regularize their values. We propose an offline RL method that never needs to evaluate actions outside of the dataset, but still enables the learned policy to improve substantially over the best behavior in the data through generalization. The main insight in our work is that, instead of evaluating unseen actions from the latest policy, we can approximate the policy improvement step implicitly by treating the state value function as a random variable, with randomness determined by the action (while still integrating over the dynamics to avoid excessive optimism), and then taking a state-conditional upper expectile of this random variable to estimate the value of the best actions in that state. This leverages the generalization capacity of the function approximator to estimate the value of the best available action at a given state without ever directly querying a Q-function with this unseen action. Our algorithm alternates between fitting this upper-expectile value function and backing it up into a Q-function. Then, we extract the policy via advantage-weighted behavioral cloning. We dub our method implicit Q-learning (IQL). IQL demonstrates state-of-the-art performance on D4RL, a standard benchmark for offline reinforcement learning. We also demonstrate that IQL achieves strong performance when fine-tuning with online interaction after offline initialization.
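A minimal numpy sketch of the asymmetric (expectile) regression loss that such an upper-expectile value fit relies on is shown below; the variable names and the toy check are illustrative, not the paper's implementation.

```python
import numpy as np

def expectile_loss(diff: np.ndarray, tau: float = 0.9) -> np.ndarray:
    """Asymmetric squared loss |tau - 1(u < 0)| * u^2. With tau > 0.5 it
    penalizes under-estimation more, so fitting V(s) to Q(s, a) targets
    approximates an upper expectile over actions present in the dataset."""
    weight = np.where(diff < 0, 1.0 - tau, tau)
    return weight * diff ** 2

# Toy check: Q-values of actions observed at one state in the dataset
q_values = np.array([1.0, 2.0, 5.0])
candidate_v = np.linspace(0.0, 6.0, 601)
losses = [expectile_loss(q_values - v, tau=0.9).mean() for v in candidate_v]
v_star = candidate_v[int(np.argmin(losses))]
print(f"tau=0.9 expectile of Q: {v_star:.2f}")  # close to the best in-data action value
```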
Effectively leveraging large, previously collected datasets in reinforcement learning (RL) is a key challenge for large-scale real-world applications. Offline RL algorithms promise to learn effective policies from previously-collected, static datasets without further interaction. However, in practice, offline RL presents a major challenge, and standard off-policy RL methods can fail due to overestimation of values induced by the distributional shift between the dataset and the learned policy, especially when training on complex and multi-modal data distributions. In this paper, we propose conservative Q-learning (CQL), which aims to address these limitations by learning a conservative Q-function such that the expected value of a policy under this Q-function lower-bounds its true value. We theoretically show that CQL produces a lower bound on the value of the current policy and that it can be incorporated into a policy learning procedure with theoretical improvement guarantees. In practice, CQL augments the standard Bellman error objective with a simple Q-value regularizer which is straightforward to implement on top of existing deep Q-learning and actor-critic implementations. On both discrete and continuous control domains, we show that CQL substantially outperforms existing offline RL methods, often learning policies that attain 2-5 times higher final return, especially when learning from complex and multi-modal data distributions.
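As an illustration of what such a conservative Q-value regularizer added to the Bellman error can look like, here is a hedged sketch of one common discrete-action form; the network interface, variable names and the penalty weight alpha are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def cql_loss(q_net, states, actions, td_targets, alpha: float = 1.0):
    """Conservative Q-learning objective for discrete actions: push down a
    log-sum-exp over all actions' Q-values while pushing up the Q-values of
    actions actually taken in the dataset, on top of the standard TD error."""
    q_all = q_net(states)                                    # (batch, n_actions)
    q_data = q_all.gather(1, actions.unsqueeze(1)).squeeze(1)
    conservative_penalty = (torch.logsumexp(q_all, dim=1) - q_data).mean()
    bellman_error = F.mse_loss(q_data, td_targets)
    return bellman_error + alpha * conservative_penalty
```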
The explosion of visual content available online underscores the need for an accurate machine assessor that can robustly evaluate scores across diverse types of visual content. While recent studies have demonstrated the exceptional potential of large multi-modality models (LMMs) across a wide range of related fields, in this work we explore how to teach them to perform visual rating aligned with human opinions. Observing that human raters only learn and judge discrete text-defined levels in subjective studies, we propose to emulate this subjective process and teach LMMs with text-defined rating levels instead of scores. The proposed Q-Align achieves state-of-the-art performance on image quality assessment (IQA), image aesthetic assessment (IAA), as well as video quality assessment (VQA) tasks under the original LMM structure. With this syllabus, we further unify the three tasks into one model, termed OneAlign. In our experiments, we demonstrate the advantage of the discrete-level-based syllabus over direct-score-based variants for LMMs. Our code and pre-trained weights are released at https://github.com/Q-Future/Q-Align.
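To make the level-based idea concrete, below is a small sketch of how logits over text-defined rating levels could be converted back into a continuous score at inference time; the five-level scheme and the 1-5 mapping are common choices and should be read as assumptions rather than the exact Q-Align configuration.

```python
import numpy as np

# Text-defined rating levels and the numeric scores they map back to
# (assumed five-level scheme for illustration).
LEVELS = ["bad", "poor", "fair", "good", "excellent"]
LEVEL_SCORES = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

def logits_to_score(level_logits: np.ndarray) -> float:
    """Softmax over the level tokens' logits, then a probability-weighted
    average of the level scores gives a continuous quality rating."""
    probs = np.exp(level_logits - level_logits.max())
    probs /= probs.sum()
    return float(probs @ LEVEL_SCORES)

print(logits_to_score(np.array([0.1, 0.3, 1.2, 2.5, 0.8])))  # roughly "good"
```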
Reinforcement Learning From Human Feedback (RLHF) has been critical to the success of the latest generation of generative AI models. In response to the complex nature of the classical RLHF pipeline, direct alignment algorithms such as Direct Preference Optimization (DPO) have emerged as an alternative approach. Although DPO solves the same objective as the standard RLHF setup, there is a mismatch between the two approaches. Standard RLHF deploys reinforcement learning in a specific token-level MDP, while DPO is derived as a bandit problem in which the whole response of the model is treated as a single arm. In this work we rectify this difference. We theoretically show that we can derive DPO in the token-level MDP as a general inverse Q-learning algorithm, which satisfies the Bellman equation. Using our theoretical results, we provide three concrete empirical insights. First, we show that because of its token level interpretation, DPO is able to perform some type of credit assignment. Next, we prove that under the token level formulation, classical search-based algorithms, such as MCTS, which have recently been applied to the language generation space, are equivalent to likelihood-based search on a DPO policy. Empirically we show that a simple beam search yields meaningful improvement over the base DPO policy. Finally, we show how the choice of reference policy causes implicit rewards to decline during training. We conclude by discussing applications of our work, including information elicitation in multi-turn dialogue, reasoning, agentic applications and end-to-end training of multi-model systems.
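For reference, a hedged sketch of the sequence-level DPO objective and of the per-token implicit reward (the log-ratio of policy to reference) that underlies the token-level credit-assignment view follows; tensor shapes, argument names and the beta value are illustrative.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_logps_chosen, policy_logps_rejected,
             ref_logps_chosen, ref_logps_rejected, beta: float = 0.1):
    """Sequence-level DPO loss. Each argument is the summed token
    log-probability of a response under the policy or the frozen reference."""
    chosen_logratio = policy_logps_chosen - ref_logps_chosen
    rejected_logratio = policy_logps_rejected - ref_logps_rejected
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()

def token_implicit_rewards(policy_token_logps, ref_token_logps, beta: float = 0.1):
    """Per-token implicit reward beta * log(pi / pi_ref), i.e. the token-level
    credit assignment discussed in the paper's MDP interpretation."""
    return beta * (policy_token_logps - ref_token_logps)
```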
Knowledge of the precise rigidity dependence of the helium flux is important in understanding the origin, acceleration, and propagation of cosmic rays. A precise measurement of the helium flux in primary cosmic rays with rigidity (momentum/charge) from 1.9 GV to 3 TV based on 50 million events is presented and compared to the proton flux. The detailed variation with rigidity of the helium flux spectral index is presented for the first time. The spectral index progressively hardens at rigidities larger than 100 GV. The rigidity dependence of the helium flux spectral index is similar to that of the proton spectral index though the magnitudes are different. Remarkably, the spectral index of the proton to helium flux ratio increases with rigidity up to 45 GV and then becomes constant; the flux ratio above 45 GV is well described by a single power law.
The DAMPE satellite has directly measured the cosmic ray proton spectrum from 40 GeV to 100 TeV and revealed a new feature at about 13.6 TeV. Precise measurement of the spectrum of protons, the most abundant component of the cosmic radiation, is necessary to understand the sources and acceleration of cosmic rays in the Milky Way. This work reports the measurement of cosmic ray proton fluxes with kinetic energies from 40 GeV to 100 TeV, using two and a half years of data recorded by the DArk Matter Particle Explorer (DAMPE). This is the first time that an experiment has directly measured cosmic ray protons up to ~100 TeV with high statistics. The measured spectrum confirms the spectral hardening at ~300 GeV found by previous experiments and reveals a softening at ~13.6 TeV, with the spectral index changing from ~2.60 to ~2.85. Our result suggests the existence of a new spectral feature of cosmic rays at energies lower than the so-called knee and sheds new light on the origin of Galactic cosmic rays.
Motivation: Standard genome-wide association studies in cancer genomics rely on statistical significance with multiple testing correction, but systematically fail in underpowered cohorts. In TCGA breast cancer (n=967, 133 deaths), low event rates (13.8%) create severe power limitations, producing false negatives for known drivers and false positives for large passenger genes. Results: We developed a five-criteria computational framework integrating causal inference (inverse probability weighting, doubly robust estimation) with orthogonal biological validation (expression, mutation patterns, literature evidence). Applied to TCGA-BRCA mortality analysis, standard Cox+FDR detected zero genes at FDR<0.05, confirming complete failure in underpowered settings. Our framework correctly identified RYR2 -- a cardiac gene with no cancer function -- as a false positive despite nominal significance (p=0.024), while identifying KMT2C as a complex candidate requiring validation despite marginal significance (p=0.047, q=0.954). Power analysis revealed median power of 15.1% across genes, with KMT2C achieving only 29.8% power (HR=1.55), explaining borderline statistical significance despite strong biological evidence. The framework distinguished true signals from artifacts through mutation pattern analysis: RYR2 showed 29.8% silent mutations (passenger signature) with no hotspots, while KMT2C showed 6.7% silent mutations with 31.4% truncating variants (driver signature). This multi-evidence approach provides a template for analyzing underpowered cohorts, prioritizing biological interpretability over purely statistical significance. Availability: All code and analysis pipelines available at github.com/akarlaraytu/causal-inference-for-cancer-genomics
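As a minimal sketch of the causal-inference component (inverse probability weighting combined with outcome regressions in a doubly robust estimator), the snippet below estimates the average effect of a binary exposure such as mutated vs. wild-type; it uses a simplified continuous outcome rather than the survival models of the actual analysis, and all column names and model choices are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

def doubly_robust_effect(X, treated, outcome):
    """Doubly robust (AIPW) estimate of the average effect of a binary
    exposure on an outcome: a propensity (inverse probability weighting)
    model combined with per-group outcome regressions.
    X: (n, p) covariate array; treated: (n,) array of 0/1; outcome: (n,)."""
    ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]
    ps = np.clip(ps, 0.01, 0.99)                        # avoid extreme weights
    mu1 = LinearRegression().fit(X[treated == 1], outcome[treated == 1]).predict(X)
    mu0 = LinearRegression().fit(X[treated == 0], outcome[treated == 0]).predict(X)
    aipw1 = mu1 + treated * (outcome - mu1) / ps
    aipw0 = mu0 + (1 - treated) * (outcome - mu0) / (1 - ps)
    return float(np.mean(aipw1 - aipw0))
```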
We report a study of the processes of e^{+}e^{-}→K^{+}D_{s}^{-}D^{*0} and K^{+}D_{s}^{*-}D^{0} based on e^{+}e^{-} annihilation samples collected with the BESIII detector operating at BEPCII at five center-of-mass energies ranging from 4.628 to 4.698 GeV with a total integrated luminosity of 3.7 fb^{-1}. An excess of events over the known contributions of the conventional charmed mesons is observed near the D_{s}^{-}D^{*0} and D_{s}^{*-}D^{0} mass thresholds in the K^{+} recoil-mass spectrum for events collected at sqrt[s]=4.681 GeV. The structure matches a mass-dependent-width Breit-Wigner line shape, whose pole mass and width are determined as (3982.5_{-2.6}^{+1.8}±2.1) MeV/c^{2} and (12.8_{-4.4}^{+5.3}±3.0) MeV, respectively. The first uncertainties are statistical and the second are systematic. The significance of the resonance hypothesis is estimated to be 5.3 σ over the contributions only from the conventional charmed mesons. This is the first candidate for a charged hidden-charm tetraquark with strangeness, decaying into D_{s}^{-}D^{*0} and D_{s}^{*-}D^{0}. However, the properties of the excess need further exploration with more statistics.
Despite mounting evidence linking pyroptotic cell death to tumor growth, the clinical significance and disease mechanism of pyroptosis in cancer remain uncertain. In this study, we established a unique gene signature (π signature) that can be used as a predictive and prognostic tool in pyroptosis-related cancer subtypes. We found that the 13 core pyroptosis genes exerted opposite prognostic effects in different cancer types, which were subgrouped as pyroptosis positively related cancers and pyroptosis negatively related cancers. Subsequently, the π signature was identified separately from the hub genes in the pyroptosis positively related and pyroptosis negatively related cancer subtypes. The π signature correlated well with patient survival, pathological stage, tumor lymphocyte infiltration, and immunotherapy response. The π signature was also applied as a predictive tool for chemotherapy drug response and used as an independent factor for predicting patient overall survival. In short, this elaborated genetic signature could help us understand the oncogenic mechanism and pave the way for further therapeutic strategies based on pyroptosis.
Mass spectrometry (MS) based omics data analysis requires significant time and resources. To date, few parallel algorithms have been proposed for deducing peptides from mass spectrometry-based data. However, these parallel algorithms were designed and developed when the amount of data that needed to be processed was smaller in scale. In this paper, we prove that the communication bound reached by the \emph{existing} parallel algorithms is $\Omega\left(mn + \frac{2qr}{p}\right)$, where $m$ and $n$ are the dimensions of the theoretical database matrix, $q$ and $r$ are the dimensions of the spectra, and $p$ is the number of processors. We further prove that a communication-optimal strategy with fast memory $\sqrt{M} = mn + \frac{2qr}{p}$ can achieve $\Omega\left(\frac{2mnq}{p}\right)$, but that this is not achieved by any existing parallel proteomics algorithm to date. To validate our claim, we performed a meta-analysis of published parallel algorithms and their performance results. We show that sub-optimal speedups with an increasing number of processors are a direct consequence of not achieving the communication lower bounds. We further validate our claim by performing experiments that demonstrate the communication bounds proved in this paper. Consequently, we assert that a next generation of \emph{provable}, demonstrably superior parallel algorithms is urgently needed for MS-based large systems-biology studies, especially for meta-proteomics, proteogenomics, microbiome, and proteomics for non-model organisms. Our hope is that this paper will excite the parallel computing community to further investigate parallel algorithms for highly influential MS-based omics problems.
With the boom of next-generation sequencing technology and its implementation in clinical practice and life science research, the need for faster and more efficient data analysis methods is becoming pressing in the field of sequencing. Here we report on the evaluation of an optimized germline mutation calling pipeline, HummingBird, by assessing its performance against the widely accepted BWA-GATK pipeline. We found that the HummingBird pipeline can significantly reduce the running time of primary data analysis for whole genome sequencing and whole exome sequencing without significantly sacrificing variant calling accuracy. Thus, we conclude that broader adoption of such software will help to improve primary data analysis efficiency for next-generation sequencing.
Isaac Shiri, Hassan Maleki, Ghasem Hajianfar et al.
Aim: In the present work, we aimed to evaluate a comprehensive radiomics framework enabling prediction of EGFR and KRAS mutation status in NSCLC cancer patients based on PET and CT multi-modality radiomic features and machine learning (ML) algorithms. Methods: Our study involved 211 NSCLC cancer patients with PET and CTD images. More than twenty thousand radiomic features from different image-feature sets were extracted. Feature values were normalized to obtain Z-scores, followed by Student's t-tests for comparison; highly correlated features were eliminated and false discovery rate (FDR) correction was performed. Six feature selection methods and twelve classifiers were used to predict gene status in patients, and model evaluation was reported on an independent validation set (68 patients). Results: The best predictive power of conventional PET parameters was achieved by SUVpeak (AUC: 0.69, P-value = 0.0002) and MTV (AUC: 0.55, P-value = 0.0011) for EGFR and KRAS, respectively. Univariate analysis of radiomics features improved prediction power up to AUC: 0.75 (q-value: 0.003, Short Run Emphasis feature of GLRLM from the LOG-preprocessed PET image with sigma value 1.5) and AUC: 0.71 (q-value: 0.00005, Large Dependence Low Gray Level Emphasis from GLDM in the LOG-preprocessed CTD image with sigma value 5) for EGFR and KRAS, respectively. Furthermore, machine learning algorithms improved the prediction power up to AUC: 0.82 for EGFR (LOG-preprocessed PET image set with sigma 3, VT feature selector and SGD classifier) and AUC: 0.83 for KRAS (CT image set with sigma 3.5, SM feature selector and SGD classifier). Conclusion: We demonstrated that radiomic features extracted from different image-feature sets can be used to predict EGFR and KRAS mutation status in NSCLC patients, and showed that they have more predictive power than conventional imaging parameters.
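A hedged sketch of the general shape of such a radiomics pipeline (Z-scoring, a variance-based feature selector and an SGD-trained linear classifier evaluated by AUC) is given below; interpreting "VT" as a variance threshold, and all hyperparameters and function names, are assumptions rather than the authors' exact configuration.

```python
from sklearn.feature_selection import VarianceThreshold
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def fit_and_evaluate(X_train, y_train, X_valid, y_valid):
    """Drop near-constant radiomic features, Z-score the rest, and train an
    SGD-based linear classifier; report AUC on the held-out validation set."""
    model = make_pipeline(
        VarianceThreshold(threshold=0.0),
        StandardScaler(),
        SGDClassifier(max_iter=2000, random_state=0),
    )
    model.fit(X_train, y_train)
    scores = model.decision_function(X_valid)
    return roc_auc_score(y_valid, scores)
```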
Naim U. Rashid, Daniel J. Luckett, Jingxiang Chen et al.
The complexity of human cancer often results in significant heterogeneity in response to treatment. Precision medicine offers potential to improve patient outcomes by leveraging this heterogeneity. Individualized treatment rules (ITRs) formalize precision medicine as maps from the patient covariate space into the space of allowable treatments. The optimal ITR is that which maximizes the mean of a clinical outcome in a population of interest. Patient-derived xenograft (PDX) studies permit the evaluation of multiple treatments within a single tumor and thus are ideally suited for estimating optimal ITRs. PDX data are characterized by correlated outcomes, a high-dimensional feature space, and a large number of treatments. Existing methods for estimating optimal ITRs do not take advantage of the unique structure of PDX data or handle the associated challenges well. In this paper, we explore machine learning methods for estimating optimal ITRs from PDX data. We analyze data from a large PDX study to identify biomarkers that are informative for developing personalized treatment recommendations in multiple cancers. We estimate optimal ITRs using regression-based approaches such as Q-learning and direct search methods such as outcome weighted learning. Finally, we implement a superlearner approach to combine a set of estimated ITRs and show that the resulting ITR performs better than any of the input ITRs, mitigating uncertainty regarding user choice of any particular ITR estimation methodology. Our results indicate that PDX data are a valuable resource for developing individualized treatment strategies in oncology.
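To illustrate the regression-based (Q-learning style) estimation of an ITR at a single decision point, the sketch below fits one outcome model per treatment and recommends the treatment with the largest predicted outcome; it omits the correlated-outcome structure of PDX data and the superlearner weighting, and the model choice and names are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def estimate_itr(X, treatment, outcome, treatments):
    """Single-stage, regression-based Q-learning: fit one outcome model per
    treatment, then the estimated rule assigns each new patient the treatment
    with the largest predicted clinical outcome.
    X: (n, p) covariates; treatment: (n,) labels; outcome: (n,) responses."""
    models = {
        t: RandomForestRegressor(n_estimators=200, random_state=0).fit(
            X[treatment == t], outcome[treatment == t]
        )
        for t in treatments
    }

    def rule(x_new):
        # x_new: a (1, p) covariate row for one patient
        preds = {t: m.predict(x_new)[0] for t, m in models.items()}
        return max(preds, key=preds.get)

    return rule
```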
We report the observation of pressure-induced superconductivity in the topological compound Bi2Te3, with Tc of ∼3 K between 3 and 6 GPa. Combined high-pressure structure investigations with synchrotron radiation indicated that the superconductivity occurs in the ambient-pressure phase, without a crystal-structure phase transition. Hall effect measurements indicated hole-type carriers in the pressure-induced superconducting Bi2Te3 single crystal. Furthermore, first-principles calculations based on the structural data obtained by Rietveld refinement of X-ray diffraction patterns at high pressure showed that the electronic structure under pressure remains topologically nontrivial. The results suggest that topological superconductivity can be realized in Bi2Te3 owing to the proximity effect between superconducting bulk states and Dirac-type surface states. We also discuss the possibility that the bulk state could be a topological superconductor.