Results for "q-fin.CP"

Showing 20 of ~1506377 results · from Semantic Scholar, CrossRef

S2 Open Access 2025
DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning

DeepSeek-AI, Daya Guo, Dejian Yang et al.

General reasoning represents a long-standing and formidable challenge in artificial intelligence (AI). Recent breakthroughs, exemplified by large language models (LLMs) [1,2] and chain-of-thought (CoT) prompting [3], have achieved considerable success on foundational reasoning tasks. However, this success is heavily contingent on extensive human-annotated demonstrations and the capabilities of models are still insufficient for more complex problems. Here we show that the reasoning abilities of LLMs can be incentivized through pure reinforcement learning (RL), obviating the need for human-labelled reasoning trajectories. The proposed RL framework facilitates the emergent development of advanced reasoning patterns, such as self-reflection, verification and dynamic strategy adaptation. Consequently, the trained model achieves superior performance on verifiable tasks such as mathematics, coding competitions and STEM fields, surpassing its counterparts trained through conventional supervised learning on human demonstrations. Moreover, the emergent reasoning patterns exhibited by these large-scale models can be systematically used to guide and enhance the reasoning capabilities of smaller models. A new artificial intelligence model, DeepSeek-R1, is introduced, demonstrating that the reasoning abilities of large language models can be incentivized through pure reinforcement learning, removing the need for human-annotated demonstrations.

5349 citations en Medicine, Computer Science
S2 Open Access 2021
Software for the frontiers of quantum chemistry: An overview of developments in the Q-Chem 5 package

E. Epifanovsky, A. Gilbert, Xintian Feng et al.

This article summarizes technical advances contained in the fifth major release of the Q-Chem quantum chemistry program package, covering developments since 2015. A comprehensive library of exchange–correlation functionals, along with a suite of correlated many-body methods, continues to be a hallmark of the Q-Chem software. The many-body methods include novel variants of both coupled-cluster and configuration-interaction approaches along with methods based on the algebraic diagrammatic construction and variational reduced density-matrix methods. Methods highlighted in Q-Chem 5 include a suite of tools for modeling core-level spectroscopy, methods for describing metastable resonances, methods for computing vibronic spectra, the nuclear–electronic orbital method, and several different energy decomposition analysis techniques. High-performance capabilities including multithreaded parallelism and support for calculations on graphics processing units are described. Q-Chem boasts a community of well over 100 active academic developers, and the continuing evolution of the software is supported by an “open teamware” model and an increasingly modular design.

984 citations en Medicine
S2 Open Access 2018
The Belle II Physics Book

E. Kou, P. Urquijo, W. Altmannshofer et al.

We present the physics program of the Belle II experiment, located on the intensity frontier SuperKEKB e+e- collider. Belle II collected its first collisions in 2018, and is expected to operate for the next decade. It is anticipated to collect 50/ab of collision data over its lifetime. This book is the outcome of a joint effort of Belle II collaborators and theorists through the Belle II theory interface platform (B2TiP), an effort that commenced in 2014. The aim of B2TiP was to elucidate the potential impacts of the Belle II program, which includes a wide scope of physics topics: B physics, charm, tau, quarkonium, electroweak precision measurements and dark sector searches. It is composed of nine working groups (WGs), which are coordinated by teams of theorist and experimentalist conveners: Semileptonic and leptonic B decays, Radiative and Electroweak penguins, phi_1 and phi_2 (time-dependent CP violation) measurements, phi_3 measurements, Charmless hadronic B decay, Charm, Quarkonium(like), tau and low-multiplicity processes, new physics and global fit analyses. This book highlights "golden- and silver-channels", i.e. those that would have the highest potential impact in the field. Theorists scrutinised the role of those measurements and estimated the respective theoretical uncertainties, achievable now as well as prospects for the future. Experimentalists investigated the expected improvements with the large dataset expected from Belle II, taking into account improved performance from the upgraded detector.

1020 citations en Physics
S2 Open Access 2014
Ultrasensitive terahertz sensing with high-Q Fano resonances in metasurfaces

Ranjan Singh, W. Cao, I. Al-Naib et al.

High quality factor resonances are extremely promising for designing ultra-sensitive refractive index label-free sensors, since they allow intense interaction between electromagnetic waves and the analyte material. Metamaterial and plasmonic sensing have recently attracted a lot of attention due to subwavelength confinement of electromagnetic fields in the resonant structures. However, the excitation of high quality factor resonances in these systems has been a challenge. We excite an order of magnitude higher quality factor resonances in planar terahertz metamaterials that we exploit for ultrasensitive sensing. The low-loss quadrupole and Fano resonances with extremely narrow linewidths enable us to measure the minute spectral shift caused by the smallest change in the refractive index of the surrounding media. We achieve sensitivity levels of 7.75 × 10³ nm/refractive index unit (RIU) with quadrupole and 5.7 × 10⁴ nm/RIU with the Fano resonances which could be further enhanced by using thinner substrates. These findings would facilitate the design of ultrasensitive real time chemical and biomolecular sensors in the fingerprint region of the terahertz regime.

610 citations en Physics
S2 Open Access 2024
From $r$ to $Q^*$: Your Language Model is Secretly a Q-Function

Rafael Rafailov, Joey Hejna, Ryan Park et al.

Reinforcement Learning From Human Feedback (RLHF) has been critical to the success of the latest generation of generative AI models. In response to the complex nature of the classical RLHF pipeline, direct alignment algorithms such as Direct Preference Optimization (DPO) have emerged as an alternative approach. Although DPO solves the same objective as the standard RLHF setup, there is a mismatch between the two approaches. Standard RLHF deploys reinforcement learning in a specific token-level MDP, while DPO is derived as a bandit problem in which the whole response of the model is treated as a single arm. In this work we rectify this difference. We theoretically show that we can derive DPO in the token-level MDP as a general inverse Q-learning algorithm, which satisfies the Bellman equation. Using our theoretical results, we provide three concrete empirical insights. First, we show that because of its token level interpretation, DPO is able to perform some type of credit assignment. Next, we prove that under the token level formulation, classical search-based algorithms, such as MCTS, which have recently been applied to the language generation space, are equivalent to likelihood-based search on a DPO policy. Empirically we show that a simple beam search yields meaningful improvement over the base DPO policy. Finally, we show how the choice of reference policy causes implicit rewards to decline during training. We conclude by discussing applications of our work, including information elicitation in multi-turn dialogue, reasoning, agentic applications and end-to-end training of multi-model systems.

247 citations en Computer Science
S2 Open Access 2023
Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions

Yevgen Chebotar, Q. Vuong, A. Irpan et al.

In this work, we present a scalable reinforcement learning method for training multi-task policies from large offline datasets that can leverage both human demonstrations and autonomously collected data. Our method uses a Transformer to provide a scalable representation for Q-functions trained via offline temporal difference backups. We therefore refer to the method as Q-Transformer. By discretizing each action dimension and representing the Q-value of each action dimension as separate tokens, we can apply effective high-capacity sequence modeling techniques for Q-learning. We present several design decisions that enable good performance with offline RL training, and show that Q-Transformer outperforms prior offline RL algorithms and imitation learning techniques on a large diverse real-world robotic manipulation task suite. The project's website and videos can be found at https://qtransformer.github.io

145 citations en Computer Science
S2 Open Access 2011
Deterministic design of wavelength scale, ultra-high Q photonic crystal nanobeam cavities.

Q. Quan, M. Lončar

Photonic crystal nanobeam cavities are versatile platforms of interest for optical communications, optomechanics, optofluidics, cavity QED, etc. In a previous work [Appl. Phys. Lett. 96, 203102 (2010)], we proposed a deterministic method to achieve ultrahigh Q cavities. This follow-up work provides systematic analysis and verifications of the deterministic design recipe and further extends the discussion to air-mode cavities. We demonstrate designs of dielectric-mode and air-mode cavities with Q > 10⁹, as well as dielectric-mode nanobeam cavities with both ultrahigh-Q (> 10⁷) and ultrahigh on-resonance transmissions (T > 95%).

430 citations en Physics, Medicine
S2 Open Access 2018
Fuzzy Q-Learning for multi-agent decentralized energy management in microgrids

P. Kofinas, A. Dounis, G. Vouros

This study proposes a cooperative multi-agent system for managing the energy of a stand-alone microgrid. The multi-agent system learns to control the components of the microgrid so that it achieves its purposes and operates effectively, by means of a distributed, collaborative reinforcement learning method in a continuous state-action space. Stand-alone microgrids present challenges regarding guaranteeing electricity supply and increasing the reliability of the system under the uncertainties introduced by the renewable power sources and the stochastic demand of the consumers. In this article we consider a microgrid that consists of power production, power consumption and power storage units: the power production group includes a photovoltaic source, a fuel cell and a diesel generator; the power consumption group includes an electrolyzer unit, a desalination plant and a variable electrical load that represents the power consumption of a building; the power storage group includes only the battery bank. We conjecture that a distributed multi-agent system presents specific advantages for controlling the microgrid components, which operate in a continuous state-action space. For this purpose we propose the use of fuzzy Q-Learning methods for agents representing microgrid components to act as independent learners, while sharing state variables to coordinate their behavior. Experimental results highlight both the effectiveness of individual agents in controlling system components and the effectiveness of the multi-agent system in guaranteeing electricity supply and increasing the reliability of the microgrid.

192 citations en Computer Science
CrossRef Open Access 2023
SpikoPoniC: A Low-Cost Spiking Neuromorphic Computer for Smart Aquaponics

Ali Siddique, Jingqi Sun, Kung Jui Hou et al.

Aquaponics is an emerging area of agricultural sciences that combines aquaculture and hydroponics in a symbiotic way to enhance crop production. A stable smart aquaponic system requires estimating the fish size in real time. Though deep learning has shown promise in the context of smart aquaponics, most smart systems are extremely slow and costly and cannot be deployed on a large scale. Therefore, we design and present a novel neuromorphic computer that uses spiking neural networks (SNNs) for estimating not only the length but also the weight of the fish. To train the SNN, we present a novel hybrid scheme in which some of the neural layers are trained using direct SNN backpropagation, while others are trained using standard backpropagation. By doing this, a blend of high hardware efficiency and accuracy can be achieved. The proposed computer SpikoPoniC can classify more than 84 million fish samples in a second, achieving a speedup of at least 3369× over traditional general-purpose computers. The SpikoPoniC consumes less than 1100 slice registers on Virtex 6 and is much cheaper than most SNN-based hardware systems. To the best of our knowledge, this is the first SNN-based neuromorphic system that performs smart real-time aquaponic monitoring.

S2 Open Access 2003
Transverse-momentum and collision-energy dependence of high-pT hadron suppression in Au+Au collisions at ultrarelativistic energies.

J. Adams, C. Adler, M. Aggarwal et al.

We report high-statistics measurements of inclusive charged hadron production in Au+Au and p+p collisions at $\sqrt{s_{NN}}$ = 200 GeV. A large, approximately constant hadron suppression is observed in central Au+Au collisions for 5 < $p_T$ < 12 GeV/c. The collision energy dependence of the yields and the centrality and $p_T$ dependence of the suppression provide stringent constraints on theoretical models of suppression. Models incorporating initial-state gluon saturation or partonic energy loss in dense matter are largely consistent with observations. We observe no evidence of $p_T$-dependent suppression, which may be expected from models incorporating jet attenuation in cold nuclear matter or scattering of fragmentation hadrons.

562 citations en Physics, Medicine
S2 Open Access 2010
Photonic crystal nanobeam cavity strongly coupled to the feeding waveguide

Q. Quan, P. Deotare, M. Lončar

A deterministic design of an ultrahigh Q-factor, wavelength-scale photonic crystal nanobeam cavity is proposed and experimentally demonstrated. Using this approach, cavities with Q > 10⁶ and on-resonance transmission T > 90% are designed. The devices, fabricated in silicon and capped with a low refractive index polymer, have experimental Q = 80 000 and T = 73%. This is, to the best of our knowledge, the highest transmission measured in deterministically designed, wavelength-scale high-Q cavities.

378 citations en Physics, Materials Science
S2 Open Access 2010
Pressure-induced superconductivity in topological parent compound Bi2Te3

J. Zhang, S. J. Zhang, H. Weng et al.

We report a successful observation of pressure-induced superconductivity in a topological compound Bi2Te3 with Tc of ∼3 K between 3 and 6 GPa. The combined high-pressure structure investigations with synchrotron radiation indicated that the superconductivity occurred at the ambient phase without crystal structure phase transition. Hall effect measurements indicated hole-type carriers in the pressure-induced superconducting Bi2Te3 single crystal. Consequently, the first-principles calculations based on the structural data obtained by the Rietveld refinement of X-ray diffraction patterns at high pressure showed that the electronic structure under pressure remained topologically nontrivial. The results suggested that topological superconductivity can be realized in Bi2Te3 due to the proximity effect between superconducting bulk states and Dirac-type surface states. We also discuss the possibility that the bulk state could be a topological superconductor.

269 citations en Materials Science, Medicine
S2 Open Access 2002
Elliptic flow from two- and four-particle correlations in Au + Au collisions at $\sqrt{s_{NN}}$ = 130 GeV

C. Adler, Z. Ahammed, C. Allgower et al.

Elliptic flow holds much promise for studying the early-time thermalization attained in ultrarelativistic nuclear collisions. Flow measurements also provide a means of distinguishing between hydrodynamic models and calculations which approach the low density (dilute gas) limit. Among the effects that can complicate the interpretation of elliptic flow measurements are azimuthal correlations that are unrelated to the reaction plane (non-flow correlations). Using data for Au + Au collisions at $\sqrt{s_{NN}}$ = 130 GeV from the STAR TPC, it is found that four-particle correlation analyses can reliably separate flow and non-flow correlation signals. The latter account for on average about 15% of the observed second-harmonic azimuthal correlation, with the largest relative contribution for the most peripheral and the most central collisions. The results are also corrected for the effect of flow variations within centrality bins. This effect is negligible for all but the most central bin, where the correction to the elliptic flow is about a factor of two. A simple new method for two-particle flow analysis based on scalar products is described. An analysis based on the distribution of the magnitude of the flow vector is also described.

277 citations en Physics
