Results for "q-bio.CB" - JURNALIN

S2 Open Access 2025

DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning

DeepSeek-AI, Daya Guo, Dejian Yang et al.

General reasoning represents a long-standing and formidable challenge in artificial intelligence (AI). Recent breakthroughs, exemplified by large language models (LLMs) [1,2] and chain-of-thought (CoT) prompting [3], have achieved considerable success on foundational reasoning tasks. However, this success is heavily contingent on extensive human-annotated demonstrations and the capabilities of models are still insufficient for more complex problems. Here we show that the reasoning abilities of LLMs can be incentivized through pure reinforcement learning (RL), obviating the need for human-labelled reasoning trajectories. The proposed RL framework facilitates the emergent development of advanced reasoning patterns, such as self-reflection, verification and dynamic strategy adaptation. Consequently, the trained model achieves superior performance on verifiable tasks such as mathematics, coding competitions and STEM fields, surpassing its counterparts trained through conventional supervised learning on human demonstrations. Moreover, the emergent reasoning patterns exhibited by these large-scale models can be systematically used to guide and enhance the reasoning capabilities of smaller models. A new artificial intelligence model, DeepSeek-R1, is introduced, demonstrating that the reasoning abilities of large language models can be incentivized through pure reinforcement learning, removing the need for human-annotated demonstrations.
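
On verifiable tasks, the RL signal can come from automatically checking an answer rather than from human-labelled reasoning traces. The minimal sketch below illustrates such a check under stated assumptions: the `\boxed{...}` answer convention, the exact-string comparison and the function names `extract_final_answer` and `verifiable_reward` are hypothetical illustration choices, not the reward design described by the DeepSeek-R1 authors.

```python
import re
from typing import Optional

# Hypothetical answer convention: the final answer is wrapped in \boxed{...}.
# Only un-nested braces are handled; real checkers normalize answers far more carefully.
_BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in a model completion, or None."""
    matches = _BOXED.findall(completion)
    return matches[-1].strip() if matches else None


def verifiable_reward(completion: str, reference_answer: str) -> float:
    """Binary outcome reward: 1.0 if the extracted answer matches the reference exactly.

    No human-written reasoning trajectory is needed; only the final answer is checked.
    """
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0  # unparseable output earns no reward
    return 1.0 if answer == reference_answer.strip() else 0.0


if __name__ == "__main__":
    sampled = [
        "Adding 17 and 25 gives 42, so the answer is \\boxed{42}.",
        "I think it is \\boxed{41}.",
        "The answer is 42.",  # correct number but no boxed answer, so no reward
    ]
    print([verifiable_reward(c, "42") for c in sampled])
```

Rewards like these would then feed a policy-gradient update; which RL algorithm and which answer normalization to use are left open in this sketch.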

5344 citations · en · Medicine, Computer Science

S2 Open Access 2023

GPQA: A Graduate-Level Google-Proof Q&A Benchmark

David Rein, Betty Li Hou, Asa Cooper Stickland et al.

We present GPQA, a challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry. We ensure that the questions are high-quality and extremely difficult: experts who have or are pursuing PhDs in the corresponding domains reach 65% accuracy (74% when discounting clear mistakes the experts identified in retrospect), while highly skilled non-expert validators only reach 34% accuracy, despite spending on average over 30 minutes with unrestricted access to the web (i.e., the questions are "Google-proof"). The questions are also difficult for state-of-the-art AI systems, with our strongest GPT-4-based baseline achieving 39% accuracy. If we are to use future AI systems to help us answer very hard questions, for example, when developing new scientific knowledge, we need to develop scalable oversight methods that enable humans to supervise their outputs, which may be difficult even if the supervisors are themselves skilled and knowledgeable. The difficulty of GPQA both for skilled non-experts and frontier AI systems should enable realistic scalable oversight experiments, which we hope can help devise ways for human experts to reliably get truthful information from AI systems that surpass human capabilities.

2213 citations · en · Computer Science

S2 Open Access 2021

Offline Reinforcement Learning with Implicit Q-Learning

Ilya Kostrikov, Ashvin Nair, S. Levine

Offline reinforcement learning requires reconciling two conflicting aims: learning a policy that improves over the behavior policy that collected the dataset, while at the same time minimizing the deviation from the behavior policy so as to avoid errors due to distributional shift. This trade-off is critical, because most current offline reinforcement learning methods need to query the value of unseen actions during training to improve the policy, and therefore need to either constrain these actions to be in-distribution, or else regularize their values. We propose an offline RL method that never needs to evaluate actions outside of the dataset, but still enables the learned policy to improve substantially over the best behavior in the data through generalization. The main insight in our work is that, instead of evaluating unseen actions from the latest policy, we can approximate the policy improvement step implicitly by treating the state value function as a random variable, with randomness determined by the action (while still integrating over the dynamics to avoid excessive optimism), and then taking a state conditional upper expectile of this random variable to estimate the value of the best actions in that state. This leverages the generalization capacity of the function approximator to estimate the value of the best available action at a given state without ever directly querying a Q-function with this unseen action. Our algorithm alternates between fitting this upper expectile value function and backing it up into a Q-function. Then, we extract the policy via advantage-weighted behavioral cloning. We dub our method implicit Q-learning (IQL). IQL achieves state-of-the-art performance on D4RL, a standard benchmark for offline reinforcement learning. We also demonstrate that IQL performs strongly when fine-tuned with online interaction after offline initialization.
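
The two ingredients named above, an upper-expectile value fit and advantage-weighted behavioral cloning, can be illustrated in a toy scalar setting. The sketch below assumes a single state whose dataset actions have known Q-values (the list `q_values` and all parameter values are hypothetical). It fits V(s) by gradient descent on the asymmetric squared loss L_τ(u) = |τ − 1(u < 0)| u² with u = Q(s, a) − V(s), then turns advantages into clipped exp(β·A) weights. It is not the authors' implementation, which trains neural networks for Q, V and the policy.

```python
import math
from typing import List, Sequence


def expectile_loss(diffs: Sequence[float], tau: float) -> float:
    """Mean asymmetric squared loss |tau - 1(u < 0)| * u^2 over u = Q(s, a) - V(s)."""
    total = 0.0
    for u in diffs:
        weight = (1.0 - tau) if u < 0 else tau
        total += weight * u * u
    return total / len(diffs)


def fit_expectile_value(q_values: Sequence[float], tau: float,
                        lr: float = 0.1, steps: int = 2000) -> float:
    """Fit a scalar V(s) to dataset Q-values by gradient descent on expectile_loss.

    tau = 0.5 recovers the mean; tau -> 1 approaches the maximum, i.e. the value
    of the best in-dataset actions, without querying any unseen action.
    """
    v = 0.0
    n = len(q_values)
    for _ in range(steps):
        grad = 0.0
        for q in q_values:
            u = q - v
            weight = (1.0 - tau) if u < 0 else tau
            grad += -2.0 * weight * u  # d/dv of weight * (q - v)^2
        v -= lr * grad / n
    return v


def advantage_weights(q_values: Sequence[float], v: float, beta: float,
                      max_weight: float = 100.0) -> List[float]:
    """Behavioral-cloning weights exp(beta * (Q - V)), clipped for stability."""
    return [min(math.exp(beta * (q - v)), max_weight) for q in q_values]


if __name__ == "__main__":
    q_values = [0.0, 1.0, 2.0, 3.0, 4.0]  # hypothetical Q(s, a) for dataset actions
    for tau in (0.5, 0.7, 0.9):
        print(f"tau={tau}: V(s)={fit_expectile_value(q_values, tau):.3f}")
    v = fit_expectile_value(q_values, 0.9)
    print(advantage_weights(q_values, v, beta=3.0))
```

As τ grows, the fitted V(s) moves from the mean of the Q-values (2.0 here) toward the largest one, and the exponential weights then concentrate cloning on the actions that beat that upper-expectile baseline.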

1341 citations · en · Computer Science

S2 Open Access 2017

Offloading in Mobile Edge Computing: Task Allocation and Computational Frequency Scaling

T. Dinh, Jianhua Tang, Q. La et al.

842 citations · en · Computer Science

S2 Open Access 2003

Effect of Chemical Oxidation on the Structure of Single-Walled Carbon Nanotubes

Jin Zhang, Hongling Zou, Q. Qing et al.

1096 citations · en · Chemistry

S2 Open Access 2021

Test of lepton universality in beauty-quark decays

L. C. R. Aaij, C. Beteta, T. Ackernley et al.

The standard model of particle physics currently provides our best description of fundamental particles and their interactions. The theory predicts that the different charged leptons, the electron, muon and tau, have identical electroweak interaction strengths. Previous measurements have shown that a wide range of particle decays are consistent with this principle of lepton universality. This article presents evidence for the breaking of lepton universality in beauty-quark decays, with a significance of 3.1 standard deviations, based on proton–proton collision data collected with the LHCb detector at CERN’s Large Hadron Collider. The measurements are of processes in which a beauty meson transforms into a strange meson with the emission of either an electron and a positron, or a muon and an antimuon. If confirmed by future measurements, this violation of lepton universality would imply physics beyond the standard model, such as a new fundamental interaction between quarks and leptons. The Large Hadron Collider beauty collaboration reports a test of lepton flavour universality in decays of bottom mesons into strange mesons and a charged lepton pair, finding evidence of a violation of this principle postulated in the standard model.

455 citations · en · Physics

S2 Open Access 2019

Future Physics Programme of BESIII

M. Ablikim, M. Achasov, P. Adlarson et al.

There has recently been a dramatic renewal of interest in hadron spectroscopy and charm physics. This renaissance has been driven in part by the discovery of a plethora of charmonium-like XYZ states at BESIII and B factories, and the observation of an intriguing proton-antiproton threshold enhancement and the possibly related X(1835) meson state at BESIII, as well as the threshold measurements of charm mesons and charm baryons. We present a detailed survey of the important topics in tau-charm physics and hadron physics that can be further explored at BESIII during the remaining operation period of BEPCII. This survey will help in the optimization of the data-taking plan over the coming years, and provides physics motivation for the possible upgrade of BEPCII to higher luminosity.

474 citations · en · Physics

S2 Open Access 2017

Direct detection of a break in the teraelectronvolt cosmic-ray spectrum of electrons and positrons

G. Ambrosi, Q. An, R. Asfandiyarov et al.

High-energy cosmic-ray electrons and positrons (CREs), which lose energy quickly during their propagation, provide a probe of Galactic high-energy processes and may enable the observation of phenomena such as dark-matter particle annihilation or decay. The CRE spectrum has been measured directly up to approximately 2 teraelectronvolts in previous balloon- or space-borne experiments, and indirectly up to approximately 5 teraelectronvolts using ground-based Cherenkov γ-ray telescope arrays. Evidence for a spectral break in the teraelectronvolt energy range has been provided by indirect measurements, although the results were qualified by sizeable systematic uncertainties. Here we report a direct measurement of CREs in the energy range 25 gigaelectronvolts to 4.6 teraelectronvolts by the Dark Matter Particle Explorer (DAMPE) with unprecedentedly high energy resolution and low background. The largest part of the spectrum can be well fitted by a ‘smoothly broken power-law’ model rather than a single power-law model. The direct detection of a spectral break at about 0.9 teraelectronvolts confirms the evidence found by previous indirect measurements, clarifies the behaviour of the CRE spectrum at energies above 1 teraelectronvolt and sheds light on the physical origin of the sub-teraelectronvolt CREs.
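
A smoothly broken power law changes its spectral index from one value to another around a break energy, with an extra parameter controlling how sharp the transition is. The sketch below uses one common parameterization, Φ(E) = Φ₀ (E/E_ref)^(−γ₁) [1 + (E/E_b)^((γ₂ − γ₁)/Δ)]^(−Δ), where γ₁ and γ₂ are the indices below and above the break E_b and Δ sets the smoothness. The parameter values are placeholders, not the DAMPE fit results, and the function names are hypothetical.

```python
import math
from typing import Callable


def smoothly_broken_power_law(energy: float, phi0: float, e_ref: float, e_break: float,
                              gamma1: float, gamma2: float, delta: float) -> float:
    """Flux with index -gamma1 well below e_break and -gamma2 well above it.

    Smaller delta gives a sharper break; e_ref only sets the normalization point.
    """
    exponent = (gamma2 - gamma1) / delta
    return (phi0 * (energy / e_ref) ** (-gamma1)
            * (1.0 + (energy / e_break) ** exponent) ** (-delta))


def local_spectral_index(flux: Callable[[float], float], energy: float,
                         rel_step: float = 1e-4) -> float:
    """Numerical d ln(flux) / d ln(E) via a central difference in log-energy."""
    e_lo, e_hi = energy * (1.0 - rel_step), energy * (1.0 + rel_step)
    return ((math.log(flux(e_hi)) - math.log(flux(e_lo)))
            / (math.log(e_hi) - math.log(e_lo)))


if __name__ == "__main__":
    # Placeholder parameters (energies in GeV): softening from index 3.0 to 4.0 at 1 TeV.
    def flux(e: float) -> float:
        return smoothly_broken_power_law(e, phi0=1.0, e_ref=100.0, e_break=1000.0,
                                         gamma1=3.0, gamma2=4.0, delta=0.1)

    for e in (30.0, 300.0, 1000.0, 3000.0, 30000.0):
        print(f"E = {e:>8.0f} GeV  local index = {local_spectral_index(flux, e):.3f}")
```

At exactly E = E_b the local index is the average of the two asymptotic indices, which is one way to see why a single power law cannot describe a spectrum that bends in this way.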

517 citations · en · Physics, Medicine

S2 Open Access 2021

Observation of an exotic narrow doubly charmed tetraquark

L. C. R. Aaij, A. Abdelmotteleb, C. Beteta et al.

Conventional hadronic matter consists of baryons and mesons made of three quarks and a quark–antiquark pair, respectively [1,2]. Here, we report the observation of a hadronic state containing four quarks in the Large Hadron Collider beauty experiment. This so-called tetraquark contains two charm quarks, a ū and a d̄ quark. This exotic state has a mass of approximately 3,875 MeV and manifests as a narrow peak in the mass spectrum of D⁰D⁰π⁺ mesons just below the D*⁺D⁰ mass threshold. The near-threshold mass together with the narrow width reveals the resonance nature of the state. The LHCb Collaboration reports the observation of an exotic, narrow, tetraquark state that contains two charm quarks, an up antiquark and a down antiquark.

381 citations · en · Physics

S2 Open Access 2020

Observation of structure in the J/ψ-pair mass spectrum.

L. C. R. Aaij, C. Beteta, T. Ackernley et al.

Using proton-proton collision data at centre-of-mass energies of √s = 7, 8 and 13 TeV recorded by the LHCb experiment at the Large Hadron Collider, corresponding to an integrated luminosity of 9 fb⁻¹, the invariant mass spectrum of J/ψ pairs is studied. A narrow structure around 6.9 GeV/c² matching the lineshape of a resonance and a broad structure just above twice the J/ψ mass are observed. The deviation of the data from nonresonant J/ψ-pair production is above five standard deviations in the mass region between 6.2 and 7.4 GeV/c², covering predicted masses of states composed of four charm quarks. The mass and natural width of the narrow X(6900) structure are measured assuming a Breit-Wigner lineshape.
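
The abstract does not say which Breit-Wigner form is used. As an illustration only, the sketch below implements the simplest, non-relativistic (Cauchy) form f(m) = (Γ/2π) / [(m − M)² + Γ²/4], normalized to unit area, where M is the resonance mass and Γ its natural width, and checks numerically that Γ is the full width at half maximum. The values M = 6.9 GeV/c² and Γ = 0.1 GeV/c² are placeholders, not the LHCb measurement, and the helper names are hypothetical.

```python
import math
from typing import Callable


def breit_wigner(m: float, mass: float, width: float) -> float:
    """Non-relativistic Breit-Wigner density in invariant mass m (unit area).

    Peaks at m = mass and falls to half its peak value at m = mass +/- width / 2.
    """
    return (width / (2.0 * math.pi)) / ((m - mass) ** 2 + width ** 2 / 4.0)


def integrate(f: Callable[[float], float], lo: float, hi: float, n: int = 200_000) -> float:
    """Simple midpoint-rule integral, enough for a sanity check."""
    step = (hi - lo) / n
    return step * sum(f(lo + (i + 0.5) * step) for i in range(n))


if __name__ == "__main__":
    mass, width = 6.9, 0.1  # GeV/c^2; placeholder width, not a measured value
    peak = breit_wigner(mass, mass, width)
    half = breit_wigner(mass + width / 2.0, mass, width)
    print(f"half-maximum ratio: {half / peak:.6f}")
    # The tails are heavy: a finite window around the peak misses part of the area.
    area = integrate(lambda m: breit_wigner(m, mass, width),
                     mass - 50 * width, mass + 50 * width)
    print(f"area within +/- 50 widths: {area:.4f}")
```

A relativistic form, possibly with a mass-dependent width, is often preferred for hadronic resonances; the non-relativistic form is used here only because it makes the meaning of "mass" and "natural width" easy to check.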

401 citations · en · Physics, Medicine

S2 Open Access 2024

From r to Q*: Your Language Model is Secretly a Q-Function

Rafael Rafailov, Joey Hejna, Ryan Park et al.

Reinforcement Learning From Human Feedback (RLHF) has been critical to the success of the latest generation of generative AI models. In response to the complex nature of the classical RLHF pipeline, direct alignment algorithms such as Direct Preference Optimization (DPO) have emerged as an alternative approach. Although DPO solves the same objective as the standard RLHF setup, there is a mismatch between the two approaches. Standard RLHF deploys reinforcement learning in a specific token-level MDP, while DPO is derived as a bandit problem in which the whole response of the model is treated as a single arm. In this work we rectify this difference. We theoretically show that we can derive DPO in the token-level MDP as a general inverse Q-learning algorithm, which satisfies the Bellman equation. Using our theoretical results, we provide three concrete empirical insights. First, we show that because of its token level interpretation, DPO is able to perform some type of credit assignment. Next, we prove that under the token level formulation, classical search-based algorithms, such as MCTS, which have recently been applied to the language generation space, are equivalent to likelihood-based search on a DPO policy. Empirically we show that a simple beam search yields meaningful improvement over the base DPO policy. Finally, we show how the choice of reference policy causes implicit rewards to decline during training. We conclude by discussing applications of our work, including information elicitation in multi-turn dialogue, reasoning, agentic applications and end-to-end training of multi-model systems.
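
In the token-level reading described above, the DPO implicit reward of a response is a sum of per-token terms β·[log π(a_t|s_t) − log π_ref(a_t|s_t)], which is what allows credit to be assigned to individual tokens. The minimal sketch below assumes the per-token log-probabilities under the trained policy and the reference policy are already available as plain lists (all numbers are made up, and the function names are hypothetical). It computes the per-token terms, their sum, and the DPO loss −log σ(r_chosen − r_rejected). It is an illustration, not the authors' code.

```python
import math
from typing import List, Sequence


def per_token_implicit_rewards(policy_logps: Sequence[float],
                               ref_logps: Sequence[float], beta: float) -> List[float]:
    """beta * (log pi(a_t|s_t) - log pi_ref(a_t|s_t)) for each token t of one response."""
    if len(policy_logps) != len(ref_logps):
        raise ValueError("policy and reference log-probs must cover the same tokens")
    return [beta * (p - r) for p, r in zip(policy_logps, ref_logps)]


def sequence_implicit_reward(policy_logps: Sequence[float],
                             ref_logps: Sequence[float], beta: float) -> float:
    """Sequence-level implicit reward: the sum of the per-token terms."""
    return sum(per_token_implicit_rewards(policy_logps, ref_logps, beta))


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(chosen_policy: Sequence[float], chosen_ref: Sequence[float],
             rejected_policy: Sequence[float], rejected_ref: Sequence[float],
             beta: float = 0.1) -> float:
    """-log sigmoid(r_chosen - r_rejected), written as softplus(-margin)."""
    margin = (sequence_implicit_reward(chosen_policy, chosen_ref, beta)
              - sequence_implicit_reward(rejected_policy, rejected_ref, beta))
    return softplus(-margin)


if __name__ == "__main__":
    # Made-up per-token log-probabilities for a preferred and a dispreferred response.
    chosen_pi, chosen_ref = [-0.2, -1.0, -0.5], [-0.4, -1.1, -0.9]
    rejected_pi, rejected_ref = [-0.7, -2.0], [-0.5, -1.5]
    print(per_token_implicit_rewards(chosen_pi, chosen_ref, beta=0.1))
    print(dpo_loss(chosen_pi, chosen_ref, rejected_pi, rejected_ref, beta=0.1))
```

Because the sequence reward is a plain sum, the per-token list shows which tokens moved the policy away from the reference; this is the kind of credit assignment the abstract refers to.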

246 citations · en · Computer Science

S2 Open Access 2014

Observation of the rare B⁰ₛ → µ⁺µ⁻ decay from the combined analysis of CMS and LHCb data

The CMS and LHCb Collaborations: V. Khachatryan, A. Sirunyan et al.

The standard model of particle physics describes the fundamental particles and their interactions via the strong, electromagnetic and weak forces. It provides precise predictions for measurable quantities that can be tested experimentally. The probabilities, or branching fractions, of the strange B meson (B⁰ₛ) and the B⁰ meson decaying into two oppositely charged muons (μ⁺ and μ⁻) are especially interesting because of their sensitivity to theories that extend the standard model. The standard model predicts that the B⁰ₛ → μ⁺μ⁻ and B⁰ → μ⁺μ⁻ decays are very rare, with about four of the former occurring for every billion B⁰ₛ mesons produced, and one of the latter occurring for every ten billion B⁰ mesons. A difference in the observed branching fractions with respect to the predictions of the standard model would provide a direction in which the standard model should be extended. Before the Large Hadron Collider (LHC) at CERN started operating, no evidence for either decay mode had been found. Upper limits on the branching fractions were an order of magnitude above the standard model predictions. The CMS (Compact Muon Solenoid) and LHCb (Large Hadron Collider beauty) collaborations have performed a joint analysis of the data from proton–proton collisions that they collected in 2011 at a centre-of-mass energy of seven teraelectronvolts and in 2012 at eight teraelectronvolts. Here we report the first observation of the B⁰ₛ → μ⁺μ⁻ decay, with a statistical significance exceeding six standard deviations, and the best measurement so far of its branching fraction. Furthermore, we obtained evidence for the B⁰ → μ⁺μ⁻ decay with a statistical significance of three standard deviations. Both measurements are statistically compatible with standard model predictions and allow stringent constraints to be placed on theories beyond the standard model.
The LHC experiments will resume taking data in 2015, recording proton–proton collisions at a centre-of-mass energy of 13 teraelectronvolts, which will approximately double the production rates of B⁰ₛ and B⁰ mesons and lead to further improvements in the precision of these crucial tests of the standard model.

499 citations · en · Physics, Medicine

S2 Open Access 1995

Cannabinoids activate an inwardly rectifying potassium conductance and inhibit Q-type calcium currents in AtT20 cells transfected with rat brain cannabinoid receptor

K. Mackie, Y. Lai, R. Westenbroek et al.

Rat brain cannabinoid receptor (CB-1) was stably transfected into the murine tumor line AtT-20 to study its coupling to inwardly rectifying potassium currents (Kir) and high voltage-activated calcium currents (ICa). In cells expressing CB-1 (“A-2” cells), cannabinoid agonist potently and stereospecifically activated Kir via a pertussis toxin-sensitive G protein. ICa in A-2 cells was sensitive to dihydropyridines and omega CTX MVIIC, less so to omega CgTX GVIA and insensitive to omega Aga IVa. In CB-1 expressing cells, cannabinoid agonist inhibited only the omega CTX MVIIC-sensitive component of ICa. Inhibition of Q-type ICa was voltage dependent and PTX sensitive, thus similar in character to the well-studied modulation of N-type ICa. An endogenous cannabinoid, anandamide, activated Kir and inhibited ICa as efficaciously as potent cannabinoid agonist. Immunocytochemical studies with antibodies specific for class A, B, C, D, and E voltage-dependent calcium channel alpha 1 subunits revealed that AtT-20 cells express each of these major classes of alpha 1 subunit.

645 citations · en · Chemistry, Medicine

CrossRef Open Access 2025

Video1-FA(CB) - Video 1 FA(CB)

en

CrossRef Open Access 2025

Video3-MA(CB) - Video 3 MA(CB)

en

CrossRef Open Access 2025

Bio‐Based Oils as an Alternative Rubber Processing Oil in CB/Silica‐Filled NR/BR/SSBR Compounds

Nur Raihan Mohamed, Nadras Othman, Raa Khimi Shuib

Bio‐based oil is emerging as a promising alternative to petroleum-derived rubber processing oil as a plasticizer in elastomers. Petroleum‐based oil is considered environmentally harmful and is derived from nonrenewable resources. This study explores replacing petroleum‐based rubber processing oils with bio‐based oils in elastomers. The research focuses on ternary rubber blends (NR/BR/SSBR) mixed with petroleum‐based oil (RPO), epoxidized palm oil (EPO), coconut oil (CO), soybean oil (SBO), and sunflower oil (SFO), using conventional rubber processing methods. A dual filler system of carbon black and silica was employed to reinforce the rubber. The results show that, among all bio‐based oils, EPO improves both tensile strength (0.7% increase) and elongation at break (7% increase) compared to RPO, along with better wear resistance and skid resistance, but prolongs the optimum cure time. CO offers good elongation, wear resistance, and dynamic performance similar to RPO, with a slight decrease in tensile strength. Meanwhile, SBO has the lowest rolling resistance among the bio‐based oils based on tan δ at 60°C, but it also decreases tensile strength and wear resistance while greatly increasing flexibility (11.2% increase in elongation). SFO offers a balance, with moderate improvements in flexibility and dynamic properties but lower wear resistance.

en

arXiv Open Access 2025

Stochastic Mutation as a Mechanism for the Emergence of SARS-CoV-2 New Variants

Liaofu Luo, Jun Lv

Predicting the future evolutionary trajectory of SARS-CoV-2 remains a critical challenge, particularly due to the pivotal role of spike protein mutations. It is therefore essential to develop evolutionary models capable of continuously integrating new experimental data. In this study, we employ a cladogram algorithm that incorporates established assumptions for mutant representation (using both four-letter and two-letter formats), along with an n-mer distance algorithm, to construct a cladogenetic tree of SARS-CoV-2 mutations. This tree accurately captures the observed changes across macro-lineages. We introduce a stochastic method for generating new strains on this tree based on spike protein mutations. For a given set A of existing mutation sites, we define a set X comprising x randomly generated mutation sites on the spike protein. The intersection of A and X, denoted as set Y, contains y sites. Our analysis indicates that the position of a generated strain on the tree is primarily determined by x. Through large-scale stochastic sampling, we predict the emergence of new macro-lineages. As x increases, the dominance among macro-lineages shifts: lineage O surpasses N, P surpasses O, and eventually Q surpasses P. We identify threshold values of x that delineate transitions between these macro-lineages. Furthermore, we propose an algorithm for predicting the timeline of macro-lineage emergence. In conclusion, our findings demonstrate that SARS-CoV-2 evolution adheres to statistical principles: the emergence of new strains can be driven by randomly generated spike protein sites, and large-scale stochastic sampling reveals evolutionary patterns underlying the rise of distinct macro-lineages.
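
The sampling step described above (draw a set X of x random spike sites, then intersect it with the existing mutation set A to obtain Y with y sites) can be sketched as follows. Assumptions not taken from the paper: sites are residue positions 1 to 1273 of the spike protein, drawn uniformly without replacement, and the set A and the helper names are hypothetical. The cladogram construction, the four-letter/two-letter representations and the n-mer distance are not reproduced. Under uniform sampling, the mean of y is x·|A|/1273 (the mean of a hypergeometric distribution), which a large-sample average should approach.

```python
import random
from typing import Set, Tuple

SPIKE_LENGTH = 1273  # residues in the SARS-CoV-2 spike protein; assumed site universe


def sample_strain(existing_sites: Set[int], x: int, rng: random.Random,
                  length: int = SPIKE_LENGTH) -> Tuple[Set[int], Set[int]]:
    """Draw X (x distinct random sites) and return (X, Y) with Y = A ∩ X."""
    generated = set(rng.sample(range(1, length + 1), x))
    return generated, existing_sites & generated


def mean_overlap(existing_sites: Set[int], x: int, n_samples: int,
                 seed: int = 0, length: int = SPIKE_LENGTH) -> float:
    """Average y = |A ∩ X| over many stochastic samples."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_samples):
        _, overlap = sample_strain(existing_sites, x, rng, length)
        total += len(overlap)
    return total / n_samples


if __name__ == "__main__":
    existing = set(range(400, 500))  # hypothetical set A of 100 mutated sites
    for x in (10, 50, 100):
        expected = x * len(existing) / SPIKE_LENGTH
        observed = mean_overlap(existing, x, n_samples=5000)
        print(f"x={x:>3}: mean y = {observed:.2f} (hypergeometric mean {expected:.2f})")
```

The sketch only shows how y grows with x for a fixed A; mapping each generated strain onto the cladogenetic tree, which the abstract says is mainly governed by x, would require the paper's distance algorithm.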

en · q-bio.QM, q-bio.CB

CrossRef Open Access 2024

The separation between mRNA ends is more variable than expected

Nancy Gerling, J. Alfredo Mendez, Eduardo Gomez et al.

Effective circularization of mRNA molecules is a key step for the efficient initiation of translation. Research has shown that the intrinsic separation of the ends of mRNA molecules is rather small, suggesting that intramolecular arrangements could provide this effective circularization. Considering that the innate proximity of RNA ends might have important unknown biological implications, we aimed to determine whether the close proximity of the ends of mRNA molecules is a conserved feature across organisms and gain further insights into the functional effects of the proximity of RNA ends. To do so, we studied the secondary structure of 274 full native mRNA molecules from 17 different organisms to calculate the contour length (CL) of the external loop as an index of their end‐to‐end separation. Our computational predictions show bigger variations (from 0.59 to 31.8 nm) than previously reported and also than those observed in random sequences. Our results suggest that separations larger than 18.5 nm are not favored, whereas short separations could be related to phenotypical stability. Overall, our work implies the existence of a biological mechanism responsible for the increase in the observed variability, suggesting that the CL features of the exterior loop could be relevant for the initiation of translation and that a short CL could contribute to the stability of phenotypes.

1 citation · en

S2 Open Access 2011

Deterministic design of wavelength scale, ultra-high Q photonic crystal nanobeam cavities.

Q. Quan, M. Lončar

Photonic crystal nanobeam cavities are versatile platforms of interest for optical communications, optomechanics, optofluidics, cavity QED, etc. In a previous work [Appl. Phys. Lett. 96, 203102 (2010)], we proposed a deterministic method to achieve ultrahigh Q cavities. This follow-up work provides systematic analysis and verifications of the deterministic design recipe and further extends the discussion to air-mode cavities. We demonstrate designs of dielectric-mode and air-mode cavities with Q > 10⁹, as well as dielectric-mode nanobeam cavities with both ultrahigh-Q (> 10⁷) and ultrahigh on-resonance transmissions (T > 95%).

430 citations · en · Physics, Medicine

CrossRef Open Access 2023

Osmotic Stress Responses, Cell Wall Integrity, and Conidiation Are Regulated by a Histidine Kinase Sensor in Trichoderma atroviride

Gabriela Calcáneo-Hernández, Fidel Landeros-Jaime, José Antonio Cervantes-Chávez et al.

Trichoderma atroviride responds to various environmental stressors through the mitogen-activated protein kinase (MAPK) Tmk3 and MAPK-kinase Pbs2 signaling pathways. In fungi, orthologues of Tmk3 are regulated by a histidine kinase (HK) sensor. However, the role of T. atroviride HKs remains unknown. In this regard, the function of the T. atroviride HK Nik1 was analyzed in response to stressors regulated by Tmk3. The growth of the Δnik1 mutant strains was compromised under hyperosmotic stress; mycelia were less resistant to lysing enzymes than the WT strain, while conidia of Δnik1 were more sensitive to Congo red; however, Δpbs2 and Δtmk3 strains showed a more drastic defect in cell wall stability. Light-regulated blu1 and grg2 gene expression was induced upon an osmotic shock through Pbs2-Tmk3 but was independent of Nik1. The chitin synthase-encoding genes chs1 and chs2 were downregulated after an osmotic shock in the WT, but chs1 and chs3 expression was enhanced in Δnik1, Δpbs2, and Δtmk3. Vegetative growth and light-induced conidiation decreased in Δnik1, although Nik1 was not required for the Tmk3-dependent activation of light-responsive genes. Altogether, Nik1 regulates responses related to the Pbs2-Tmk3 pathway, and the results suggest the participation of additional HKs in the response to stress.

9 citations · en
