Results for "q-bio.BM" - JURNALIN

S2 Open Access 2025

DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning

DeepSeek-AI, Daya Guo, Dejian Yang et al.

General reasoning represents a long-standing and formidable challenge in artificial intelligence (AI). Recent breakthroughs, exemplified by large language models (LLMs) [1,2] and chain-of-thought (CoT) prompting [3], have achieved considerable success on foundational reasoning tasks. However, this success is heavily contingent on extensive human-annotated demonstrations and the capabilities of models are still insufficient for more complex problems. Here we show that the reasoning abilities of LLMs can be incentivized through pure reinforcement learning (RL), obviating the need for human-labelled reasoning trajectories. The proposed RL framework facilitates the emergent development of advanced reasoning patterns, such as self-reflection, verification and dynamic strategy adaptation. Consequently, the trained model achieves superior performance on verifiable tasks such as mathematics, coding competitions and STEM fields, surpassing its counterparts trained through conventional supervised learning on human demonstrations. Moreover, the emergent reasoning patterns exhibited by these large-scale models can be systematically used to guide and enhance the reasoning capabilities of smaller models. A new artificial intelligence model, DeepSeek-R1, is introduced, demonstrating that the reasoning abilities of large language models can be incentivized through pure reinforcement learning, removing the need for human-annotated demonstrations.

5344 citations en Medicine, Computer Science

S2 Open Access 2023

GPQA: A Graduate-Level Google-Proof Q&A Benchmark

David Rein, Betty Li Hou, Asa Cooper Stickland et al.

We present GPQA, a challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry. We ensure that the questions are high-quality and extremely difficult: experts who have or are pursuing PhDs in the corresponding domains reach 65% accuracy (74% when discounting clear mistakes the experts identified in retrospect), while highly skilled non-expert validators only reach 34% accuracy, despite spending on average over 30 minutes with unrestricted access to the web (i.e., the questions are "Google-proof"). The questions are also difficult for state-of-the-art AI systems, with our strongest GPT-4 based baseline achieving 39% accuracy. If we are to use future AI systems to help us answer very hard questions, for example, when developing new scientific knowledge, we need to develop scalable oversight methods that enable humans to supervise their outputs, which may be difficult even if the supervisors are themselves skilled and knowledgeable. The difficulty of GPQA both for skilled non-experts and frontier AI systems should enable realistic scalable oversight experiments, which we hope can help devise ways for human experts to reliably get truthful information from AI systems that surpass human capabilities.

2213 citations en Computer Science

S2 Open Access 2021

Offline Reinforcement Learning with Implicit Q-Learning

Ilya Kostrikov, Ashvin Nair, S. Levine

Offline reinforcement learning requires reconciling two conflicting aims: learning a policy that improves over the behavior policy that collected the dataset, while at the same time minimizing the deviation from the behavior policy so as to avoid errors due to distributional shift. This trade-off is critical, because most current offline reinforcement learning methods need to query the value of unseen actions during training to improve the policy, and therefore need to either constrain these actions to be in-distribution, or else regularize their values. We propose an offline RL method that never needs to evaluate actions outside of the dataset, but still enables the learned policy to improve substantially over the best behavior in the data through generalization. The main insight in our work is that, instead of evaluating unseen actions from the latest policy, we can approximate the policy improvement step implicitly by treating the state value function as a random variable, with randomness determined by the action (while still integrating over the dynamics to avoid excessive optimism), and then taking a state conditional upper expectile of this random variable to estimate the value of the best actions in that state. This leverages the generalization capacity of the function approximator to estimate the value of the best available action at a given state without ever directly querying a Q-function with this unseen action. Our algorithm alternates between fitting this upper expectile value function and backing it up into a Q-function. Then, we extract the policy via advantage-weighted behavioral cloning. We dub our method implicit Q-learning (IQL). IQL demonstrates state-of-the-art performance on D4RL, a standard benchmark for offline reinforcement learning. We also demonstrate that IQL achieves strong performance when fine-tuned with online interaction after offline initialization.
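As a rough illustration of the "upper expectile" idea in the abstract above, the sketch below fits a single scalar value to a few sampled Q-values with an asymmetric squared loss. The function names, the sample values, and the choices of tau are assumptions made for this illustration; this is not the authors' implementation.

```python
def expectile_loss(diff, tau=0.7):
    """Asymmetric squared loss |tau - 1(diff < 0)| * diff**2.

    diff is (Q-value sample minus current value estimate).
    tau = 0.7 is an illustrative default, not a value taken from the listing.
    """
    weight = tau if diff > 0 else 1.0 - tau
    return weight * diff * diff


def fit_expectile(q_samples, tau=0.7, lr=0.1, steps=2000):
    """Fit a scalar v that minimizes the mean expectile loss by gradient descent."""
    v = sum(q_samples) / len(q_samples)  # start from the plain mean
    for _ in range(steps):
        grad = 0.0
        for q in q_samples:
            diff = q - v
            weight = tau if diff > 0 else 1.0 - tau
            grad += -2.0 * weight * diff  # d/dv of weight * (q - v)**2
        v -= lr * grad / len(q_samples)
    return v


# Hypothetical Q-values of the dataset actions available in one state.
samples = [0.0, 1.0, 2.0, 10.0]
mean_like = fit_expectile(samples, tau=0.5)   # symmetric loss: the mean
upper = fit_expectile(samples, tau=0.9)       # upper expectile: nearer the best action
```

With tau = 0.5 the loss is symmetric and the fitted value is the ordinary mean; raising tau toward 1 moves the estimate toward the largest sampled Q-value while only ever using actions present in the data.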

1341 citations en Computer Science

S2 Open Access 2017

Offloading in Mobile Edge Computing: Task Allocation and Computational Frequency Scaling

T. Dinh, Jianhua Tang, Q. La et al.

842 citations en Computer Science

S2 Open Access 2023

Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels

Haoning Wu, Zicheng Zhang, Weixia Zhang et al.

The explosion of visual content available online underscores the requirement for an accurate machine assessor to robustly evaluate scores across diverse types of visual contents. While recent studies have demonstrated the exceptional potentials of large multi-modality models (LMMs) on a wide range of related fields, in this work, we explore how to teach them for visual rating aligned with human opinions. Observing that human raters only learn and judge discrete text-defined levels in subjective studies, we propose to emulate this subjective process and teach LMMs with text-defined rating levels instead of scores. The proposed Q-Align achieves state-of-the-art performance on image quality assessment (IQA), image aesthetic assessment (IAA), as well as video quality assessment (VQA) tasks under the original LMM structure. With the syllabus, we further unify the three tasks into one model, termed the OneAlign. In our experiments, we demonstrate the advantage of the discrete-level-based syllabus over direct-score-based variants for LMMs. Our code and the pre-trained weights are released at https://github.com/Q-Future/Q-Align.

465 citations en Computer Science

S2 Open Access 2003

Effect of Chemical Oxidation on the Structure of Single-Walled Carbon Nanotubes

Jin Zhang, Hongling Zou, Q. Qing et al.

1096 citations en Chemistry

S2 Open Access 2019

Future Physics Programme of BESIII

M. Ablikim, M. Achasov, P. Adlarson et al.

There has recently been a dramatic renewal of interest in hadron spectroscopy and charm physics. This renaissance has been driven in part by the discovery of a plethora of charmonium-like XYZ states at BESIII and B factories, and the observation of an intriguing proton-antiproton threshold enhancement and the possibly related X(1835) meson state at BESIII, as well as the threshold measurements of charm mesons and charm baryons. We present a detailed survey of the important topics in tau-charm physics and hadron physics that can be further explored at BESIII during the remaining operation period of BEPCII. This survey will help in the optimization of the data-taking plan over the coming years, and provides physics motivation for the possible upgrade of BEPCII to higher luminosity.

474 citations en Physics

S2 Open Access 2017

Direct detection of a break in the teraelectronvolt cosmic-ray spectrum of electrons and positrons

G. Ambrosi, Q. An, R. Asfandiyarov et al.

High-energy cosmic-ray electrons and positrons (CREs), which lose energy quickly during their propagation, provide a probe of Galactic high-energy processes and may enable the observation of phenomena such as dark-matter particle annihilation or decay. The CRE spectrum has been measured directly up to approximately 2 teraelectronvolts in previous balloon- or space-borne experiments, and indirectly up to approximately 5 teraelectronvolts using ground-based Cherenkov γ-ray telescope arrays. Evidence for a spectral break in the teraelectronvolt energy range has been provided by indirect measurements, although the results were qualified by sizeable systematic uncertainties. Here we report a direct measurement of CREs in the energy range 25 gigaelectronvolts to 4.6 teraelectronvolts by the Dark Matter Particle Explorer (DAMPE) with unprecedentedly high energy resolution and low background. The largest part of the spectrum can be well fitted by a ‘smoothly broken power-law’ model rather than a single power-law model. The direct detection of a spectral break at about 0.9 teraelectronvolts confirms the evidence found by previous indirect measurements, clarifies the behaviour of the CRE spectrum at energies above 1 teraelectronvolt and sheds light on the physical origin of the sub-teraelectronvolt CREs.
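To make the "smoothly broken power-law" wording above concrete, here is a small sketch of one common generic parametrization, in which the spectral index changes from gamma1 to gamma2 around a break energy. The exact functional form, the indices 3.1 and 3.9, the smoothness 0.1, and the 100 GeV reference energy are assumptions for illustration only; the 0.9 TeV break is the only number taken from the abstract.

```python
import math


def sbpl_flux(energy_gev, phi0=1.0, gamma1=3.1, gamma2=3.9,
              e_break_gev=900.0, delta=0.1, e_ref_gev=100.0):
    """Generic smoothly broken power law (assumed form, softening break only).

    Behaves like E**(-gamma1) well below the break and E**(-gamma2) well
    above it; delta controls how sharp the transition is. This form assumes
    gamma2 > gamma1.
    """
    low = (energy_gev / e_ref_gev) ** (-gamma1)
    bend = (1.0 + (energy_gev / e_break_gev) ** ((gamma2 - gamma1) / delta)) ** (-delta)
    return phi0 * low * bend


def local_slope(energy_gev, step=1e-4, **params):
    """Numerical d(log flux) / d(log E) at the given energy."""
    f0 = sbpl_flux(energy_gev, **params)
    f1 = sbpl_flux(energy_gev * (1.0 + step), **params)
    return (math.log(f1) - math.log(f0)) / math.log(1.0 + step)


slope_below = local_slope(10.0)      # far below the 0.9 TeV break
slope_above = local_slope(1.0e5)     # far above the break
```

Comparing the two local slopes is a quick way to check that a fitted curve really does steepen across the break instead of following a single power law.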

517 citations en Physics, Medicine

S2 Open Access 2024

From $r$ to $Q^*$: Your Language Model is Secretly a Q-Function

Rafael Rafailov, Joey Hejna, Ryan Park et al.

Reinforcement Learning From Human Feedback (RLHF) has been critical to the success of the latest generation of generative AI models. In response to the complex nature of the classical RLHF pipeline, direct alignment algorithms such as Direct Preference Optimization (DPO) have emerged as an alternative approach. Although DPO solves the same objective as the standard RLHF setup, there is a mismatch between the two approaches. Standard RLHF deploys reinforcement learning in a specific token-level MDP, while DPO is derived as a bandit problem in which the whole response of the model is treated as a single arm. In this work we rectify this difference. We theoretically show that we can derive DPO in the token-level MDP as a general inverse Q-learning algorithm, which satisfies the Bellman equation. Using our theoretical results, we provide three concrete empirical insights. First, we show that because of its token level interpretation, DPO is able to perform some type of credit assignment. Next, we prove that under the token level formulation, classical search-based algorithms, such as MCTS, which have recently been applied to the language generation space, are equivalent to likelihood-based search on a DPO policy. Empirically we show that a simple beam search yields meaningful improvement over the base DPO policy. Finally, we show how the choice of reference policy causes implicit rewards to decline during training. We conclude by discussing applications of our work, including information elicitation in multi-turn dialogue, reasoning, agentic applications and end-to-end training of multi-model systems.
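The bandit-versus-token-level distinction above can be sketched with the standard DPO implicit reward, beta * (log pi - log pi_ref), computed per token and then summed per response. The helper names, the log-probability values, and beta = 0.1 are hypothetical; this is a minimal sketch of the general idea, not the paper's code.

```python
import math


def token_rewards(policy_logps, ref_logps, beta=0.1):
    """Per-token implicit rewards: beta * (log pi(token) - log pi_ref(token))."""
    return [beta * (p - r) for p, r in zip(policy_logps, ref_logps)]


def dpo_loss(chosen_policy, chosen_ref, rejected_policy, rejected_ref, beta=0.1):
    """Sequence-level DPO loss, -log sigmoid(reward_chosen - reward_rejected)."""
    r_chosen = sum(token_rewards(chosen_policy, chosen_ref, beta))
    r_rejected = sum(token_rewards(rejected_policy, rejected_ref, beta))
    margin = r_chosen - r_rejected
    # Numerically stable softplus(-margin) == -log(sigmoid(margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical per-token log-probabilities for a 3-token response.
ref = [-1.0, -2.0, -0.5]
policy = [-0.8, -2.0, -0.1]
per_token = token_rewards(policy, ref)          # which tokens gained probability
loss_at_init = dpo_loss(ref, ref, ref, ref)     # policy == reference
loss_after = dpo_loss(policy, ref, ref, ref)    # chosen response is now preferred
```

The per-token list shows where the policy moved away from the reference, which is one simple way to read the credit-assignment claim; the summed margin is what the usual whole-response (bandit) view of DPO sees.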

246 citations en Computer Science

S2 Open Access 2019

Multiple-Attribute Decision-Making Based on Archimedean Bonferroni Operators of q-Rung Orthopair Fuzzy Numbers

Peide Liu, Peng Wang

The theory of $q$-rung orthopair fuzzy sets ($q$-ROFSs) proposed by Yager effectively describes fuzzy information in the real world. Because $q$-ROFSs contain the parameter $q$ and can adjust the range of expressed fuzzy information, they are superior to both intuitionistic and Pythagorean fuzzy sets. The Archimedean T-norm and T-conorm (ATT) is an important tool used to generate operational rules based on the $q$-rung orthopair fuzzy numbers ($q$-ROFNs). In comparison, the Bonferroni mean (BM) operator has an advantage because it considers the interrelationships between the different attributes. Therefore, it is an important and meaningful innovation to extend the BM operator to the $q$-ROFNs based upon the ATT. In this paper, we first discuss $q$-rung orthopair fuzzy operational rules by using ATT. Furthermore, we extend the BM operator to the $q$-ROFNs and propose the $q$-rung orthopair fuzzy Archimedean BM ($q$-ROFABM) operator and the $q$-rung orthopair fuzzy weighted Archimedean BM ($q$-ROFWABM) operator and study their desirable properties. Then, a new multiple-attribute decision-making (MADM) method is developed based on the $q$-ROFWABM operator. Finally, we use a practical example to verify its effectiveness and superiority by comparing it with other existing methods.

319 citations en Computer Science, Mathematics

S2 Open Access 2018

Some q‐Rung Orthopair Fuzzy Bonferroni Mean Operators and Their Application to Multi‐Attribute Group Decision Making

Peide Liu, Junlin Liu

In the real multi‐attribute group decision making (MAGDM), there will be a mutual relationship between different attributes. As we all know, the Bonferroni mean (BM) operator has the advantage of considering interrelationships between parameters. In addition, in describing uncertain information, the eminent characteristic of q‐rung orthopair fuzzy sets (q‐ROFs) is that the sum of the qth power of the membership degree and the qth power of the degrees of non‐membership is equal to or less than 1, so the space of uncertain information they can describe is broader. In this paper, we combine the BM operator with q‐rung orthopair fuzzy numbers (q‐ROFNs) to propose the q‐rung orthopair fuzzy BM (q‐ROFBM) operator, the q‐rung orthopair fuzzy weighted BM (q‐ROFWBM) operator, the q‐rung orthopair fuzzy geometric BM (q‐ROFGBM) operator, and the q‐rung orthopair fuzzy weighted geometric BM (q‐ROFWGBM) operator, then the MAGDM methods are developed based on these operators. Finally, we use an example to illustrate the MAGDM process of the proposed methods. The proposed methods based on q‐ROFWBM and q‐ROFWGBM operators are very useful to deal with MAGDM problems.
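The membership constraint described above (the qth powers of the membership and non-membership degrees sum to at most 1) and the classical Bonferroni mean on ordinary real numbers can be sketched as follows. Note that this is the crisp Bonferroni mean, not the q-rung orthopair fuzzy operators proposed in the paper; all function names and example values are illustrative assumptions.

```python
def is_q_rofn(mu, nu, q):
    """Check the q-rung orthopair condition: mu**q + nu**q <= 1 with 0 <= mu, nu <= 1."""
    return 0.0 <= mu <= 1.0 and 0.0 <= nu <= 1.0 and mu ** q + nu ** q <= 1.0


def smallest_valid_q(mu, nu, q_max=50):
    """Smallest integer q >= 1 for which (mu, nu) is admissible, or None."""
    for q in range(1, q_max + 1):
        if is_q_rofn(mu, nu, q):
            return q
    return None


def bonferroni_mean(values, s=1.0, t=1.0):
    """Classical Bonferroni mean of non-negative reals.

    BM^{s,t} = ( (1 / (n (n - 1))) * sum over i != j of a_i**s * a_j**t ) ** (1 / (s + t))
    The cross terms are what let the operator reflect interrelationships
    between pairs of attributes.
    """
    n = len(values)
    total = sum(values[i] ** s * values[j] ** t
                for i in range(n) for j in range(n) if i != j)
    return (total / (n * (n - 1))) ** (1.0 / (s + t))


# A pair that is neither intuitionistic (q = 1) nor Pythagorean (q = 2).
pair = (0.9, 0.6)
q_needed = smallest_valid_q(*pair)
bm_equal = bonferroni_mean([2.0, 2.0, 2.0])
```

The example pair illustrates why a larger q widens the space of admissible information: 0.9 + 0.6 and 0.81 + 0.36 both exceed 1, while 0.729 + 0.216 does not.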

330 citations en Mathematics, Computer Science

CrossRef Open Access 2026

The impact of early termination of coal-fired power plant on achieving SDG#7 in Java Island-Indonesia

Suksmo Satriyo Pangarso, Jaka Aminata, Nugroho Sumarjiyanto BM

Indonesia has 9 indicators for achieving Sustainable Development Goal (SDG) #7 at the national level. This study proposes additional indicators of SDG #7 achievement to assess the readiness of provinces on Java Island to ensure access to affordable, sustainable, clean, and modern energy from electricity in connection with the early termination program for coal-fired power plants (CFPPs). The additional indicators are access to electricity, reliability of electricity, electricity security, and energy poverty. The results indicate that, based on these criteria, the program will have a significant impact on access to electricity and energy poverty.

en


CrossRef Open Access 2026

Mg-Hydroxyapatite Nanorods for Dual Intracellular Doxorubicin Delivery and Osteogenic-Associated BM-MSC Responses

Federico Pupilli, Giada Bassi, Marta Tavoni et al.

en


S2 Open Access 2019

Measurement of the cosmic ray proton spectrum from 40 GeV to 100 TeV with the DAMPE satellite

Q. An, R. Asfandiyarov, P. Azzarello et al.

The DAMPE satellite has directly measured the cosmic ray proton spectrum from 40 GeV to 100 TeV and revealed a new feature at about 13.6 TeV. The precise measurement of the spectrum of protons, the most abundant component of the cosmic radiation, is necessary to understand the source and acceleration of cosmic rays in the Milky Way. This work reports the measurement of the cosmic ray proton fluxes with kinetic energies from 40 GeV to 100 TeV, with 2.5 years of data recorded by the DArk Matter Particle Explorer (DAMPE). This is the first time that an experiment has directly measured cosmic ray protons up to ~100 TeV with high statistics. The measured spectrum confirms the spectral hardening at ~300 GeV found by previous experiments and reveals a softening at ~13.6 TeV, with the spectral index changing from ~2.60 to ~2.85. Our result suggests the existence of a new spectral feature of cosmic rays at energies lower than the so-called knee and sheds new light on the origin of Galactic cosmic rays.

222 citations en Physics, Medicine

CrossRef Open Access 2025

Hydrogeochemical Characterization of Mineral Springs in Peruvian Tropical Highlands

Damaris Leiva-Tafur, Hardy Geoffrey Manco Perez, Jesús Rascón et al.

Water quality in natural mineral springs is essential for sustainable use and conservation in the Amazon region. This study presents a hydrogeochemical characterization of 21 springs in the Peruvian Tropical Highlands, expanding on previous records of only six sources. The springs, which are thermal, saline, and sulfurous, are located between 384 and 3147 m a.s.l., mainly in mountainous areas with structural slopes and permeable sedimentary formations, such as the Pulluicana Group (composed mainly of sandstones and shales) and the Sarayaquillo Formation (characterized by reddish sandstones and siltstones). Physicochemical analysis showed temperatures ranging from 15.1 to 38.2 °C, pH from 5.20 to 8.72, conductivity between 0.05 and 253 mS/cm, and total dissolved solids from 0.02 to 162.50 g/L. High levels of arsenic and aluminum, likely originating from the natural weathering of rocks rich in these elements, exceeded national limits. Microbiological analysis detected fecal coliforms and Escherichia coli, indicating potential health risks. The results highlight the importance of regular monitoring and proper management to ensure safe use and explore its therapeutic and biotechnological applications, such as microbial bioremediation or development of extremophile-based enzymes.

en


S2 Open Access 2021

Observation of a Near-Threshold Structure in the $K^{+}$ Recoil-Mass Spectra in $e^{+}e^{-}\to K^{+}(D_{s}^{-}D^{*0}+D_{s}^{*-}D^{0})$

B. C. M. Ablikim, M. Achasov, P. Adlarson et al.

We report a study of the processes $e^{+}e^{-}\to K^{+}D_{s}^{-}D^{*0}$ and $K^{+}D_{s}^{*-}D^{0}$ based on $e^{+}e^{-}$ annihilation samples collected with the BESIII detector operating at BEPCII at five center-of-mass energies ranging from 4.628 to 4.698 GeV with a total integrated luminosity of 3.7 fb$^{-1}$. An excess of events over the known contributions of the conventional charmed mesons is observed near the $D_{s}^{-}D^{*0}$ and $D_{s}^{*-}D^{0}$ mass thresholds in the $K^{+}$ recoil-mass spectrum for events collected at $\sqrt{s}=4.681$ GeV. The structure matches a mass-dependent-width Breit-Wigner line shape, whose pole mass and width are determined as $(3982.5_{-2.6}^{+1.8}\pm 2.1)$ MeV/$c^{2}$ and $(12.8_{-4.4}^{+5.3}\pm 3.0)$ MeV, respectively. The first uncertainties are statistical and the second are systematic. The significance of the resonance hypothesis is estimated to be $5.3\sigma$ over the contributions only from the conventional charmed mesons. This is the first candidate for a charged hidden-charm tetraquark with strangeness, decaying into $D_{s}^{-}D^{*0}$ and $D_{s}^{*-}D^{0}$. However, the properties of the excess need further exploration with more statistics.

106 citations en Medicine

arXiv Open Access 2024

Insights into elastic properties of coarse-grained DNA models: q-stiffness of cgDNA vs. cgDNA+

Wout Laeremans, Midas Segers, Aderik Voorspoels et al.

Coarse-grained models have emerged as valuable tools to simulate long DNA molecules while maintaining computational efficiency. These models aim at preserving interactions among coarse-grained variables in a manner that mirrors the underlying atomistic description. We explore here a method for testing coarse-grained vs. all-atom models using stiffness matrices in Fourier space ($q$-stiffnesses), which are particularly suited to probe DNA elasticity at different length scales. We focus on a class of coarse-grained rigid base DNA models known as cgDNA and its most recent version cgDNA+. Our analysis shows that while cgDNA+ follows closely the $q$-stiffnesses of the all-atom model, the original cgDNA shows some deviations for twist and bending variables which are rather strong in the $q \to 0$ (long length scale) limit. The consequence is that while both cgDNA and cgDNA+ give a suitable description of local elastic behavior, the former misses some effects which manifest themselves at longer length scales. In particular, cgDNA performs poorly on the twist stiffness with a value much lower than expected for long DNA molecules. Conversely, the all-atom and cgDNA+ twist is strongly length scale dependent: DNA is torsionally soft at a few base pair distances, but becomes more rigid at distances of a few dozen base pairs. Our analysis shows that the bending persistence length in all-atom and cgDNA+ is somewhat overestimated.

en cond-mat.soft, cond-mat.stat-mech


arXiv Open Access 2024

BetterBodies: Reinforcement Learning guided Diffusion for Antibody Sequence Design

Yannick Vogt, Mehdi Naouar, Maria Kalweit et al.

Antibodies offer great potential for the treatment of various diseases. However, the discovery of therapeutic antibodies through traditional wet lab methods is expensive and time-consuming. The use of generative models in designing antibodies therefore holds great promise, as it can reduce the time and resources required. Recently, the class of diffusion models has gained considerable traction for their ability to synthesize diverse and high-quality samples. In their basic form, however, they lack mechanisms to optimize for specific properties, such as binding affinity to an antigen. In contrast, the class of offline Reinforcement Learning (RL) methods has demonstrated strong performance in navigating large search spaces, including scenarios where frequent real-world interaction, such as interaction with a wet lab, is impractical. Our novel method, BetterBodies, which combines Variational Autoencoders (VAEs) with RL guided latent diffusion, is able to generate novel sets of antibody CDRH3 sequences from different data distributions. Using the Absolut! simulator, we demonstrate the improved affinity of our novel sequences to the SARS-CoV spike receptor-binding domain. Furthermore, we reflect biophysical properties in the VAE latent space using a contrastive loss and add a novel Q-function based filtering to enhance the affinity of generated sequences. In conclusion, methods such as ours have the potential to have great implications for real-world biological sequence design, where the generation of novel high-affinity binders is a cost-intensive endeavor.

en q-bio.BM, cs.LG


S2 Open Access 2011

Deterministic design of wavelength scale, ultra-high Q photonic crystal nanobeam cavities.

Q. Quan, M. Lončar

Photonic crystal nanobeam cavities are versatile platforms of interest for optical communications, optomechanics, optofluidics, cavity QED, etc. In a previous work [Appl. Phys. Lett. 96, 203102 (2010)], we proposed a deterministic method to achieve ultrahigh Q cavities. This follow-up work provides systematic analysis and verifications of the deterministic design recipe and further extends the discussion to air-mode cavities. We demonstrate designs of dielectric-mode and air-mode cavities with Q > 10⁹, as well as dielectric-mode nanobeam cavities with both ultrahigh-Q (> 10⁷) and ultrahigh on-resonance transmissions (T > 95%).

430 citations en Physics, Medicine

CrossRef Open Access 2023

The application of Rigidoporus sp J12 and Stenotrophomonas maltophilia BM in the degradation of batik waste

Yohanes Subowo, Suliasih Suliasih, Sri Widawati

The batik industry in Indonesia produces waste that pollutes the environment. This waste can be degraded using laccase-producing microorganisms. The microorganisms used in this research were the fungus Rigidoporus sp J12 and the bacterium Stenotrophomonas maltophilia BM. This research aims to determine the ability of Rigidoporus sp J12, Stenotrophomonas maltophilia BM, and their consortium to produce laccase, and to observe their ability to degrade batik waste and Poly R-478, an indicator of phenoloxidase activity. The microorganisms were grown in growth media and then placed in media containing Poly R-478 or batik waste. Inducers were added to increase laccase activity: 15 g/L sucrose, 200 µM CuSO4, and 40 mM veratryl alcohol. The results showed that Rigidoporus sp J12 and Stenotrophomonas maltophilia BM produced laccase in PDB and NA media. The highest laccase activity was found in the enzyme produced by Rigidoporus sp J12 in PDB media at a temperature of 40°C, media pH 6.0, and with added sucrose. Rigidoporus sp J12 degraded batik waste by 39.38%, and this increased by 2.12 times after adding sucrose and incubating for 15 days. These bacteria and fungi can be used to degrade batik waste in order to prevent environmental pollution. Using the fungus Rigidoporus sp J12 alone is more advantageous than using it together with S. maltophilia BM.

en

