Results for "q-fin.MF" - JURNALIN

S2 Open Access 2025

DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning

DeepSeek-AI, Daya Guo, Dejian Yang et al.

General reasoning represents a long-standing and formidable challenge in artificial intelligence (AI). Recent breakthroughs, exemplified by large language models (LLMs) [1,2] and chain-of-thought (CoT) prompting [3], have achieved considerable success on foundational reasoning tasks. However, this success is heavily contingent on extensive human-annotated demonstrations and the capabilities of models are still insufficient for more complex problems. Here we show that the reasoning abilities of LLMs can be incentivized through pure reinforcement learning (RL), obviating the need for human-labelled reasoning trajectories. The proposed RL framework facilitates the emergent development of advanced reasoning patterns, such as self-reflection, verification and dynamic strategy adaptation. Consequently, the trained model achieves superior performance on verifiable tasks such as mathematics, coding competitions and STEM fields, surpassing its counterparts trained through conventional supervised learning on human demonstrations. Moreover, the emergent reasoning patterns exhibited by these large-scale models can be systematically used to guide and enhance the reasoning capabilities of smaller models. A new artificial intelligence model, DeepSeek-R1, is introduced, demonstrating that the reasoning abilities of large language models can be incentivized through pure reinforcement learning, removing the need for human-annotated demonstrations.

5349 citations en Medicine, Computer Science

S2 Open Access 2023

GPQA: A Graduate-Level Google-Proof Q&A Benchmark

David Rein, Betty Li Hou, Asa Cooper Stickland et al.

We present GPQA, a challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry. We ensure that the questions are high-quality and extremely difficult: experts who have or are pursuing PhDs in the corresponding domains reach 65% accuracy (74% when discounting clear mistakes the experts identified in retrospect), while highly skilled non-expert validators only reach 34% accuracy, despite spending on average over 30 minutes with unrestricted access to the web (i.e., the questions are "Google-proof"). The questions are also difficult for state-of-the-art AI systems, with our strongest GPT-4 based baseline achieving 39% accuracy. If we are to use future AI systems to help us answer very hard questions, for example, when developing new scientific knowledge, we need to develop scalable oversight methods that enable humans to supervise their outputs, which may be difficult even if the supervisors are themselves skilled and knowledgeable. The difficulty of GPQA both for skilled non-experts and frontier AI systems should enable realistic scalable oversight experiments, which we hope can help devise ways for human experts to reliably get truthful information from AI systems that surpass human capabilities.

2246 citations en Computer Science

S2 Open Access 2021

Software for the frontiers of quantum chemistry: An overview of developments in the Q-Chem 5 package

E. Epifanovsky, A. Gilbert, Xintian Feng et al.

This article summarizes technical advances contained in the fifth major release of the Q-Chem quantum chemistry program package, covering developments since 2015. A comprehensive library of exchange–correlation functionals, along with a suite of correlated many-body methods, continues to be a hallmark of the Q-Chem software. The many-body methods include novel variants of both coupled-cluster and configuration-interaction approaches along with methods based on the algebraic diagrammatic construction and variational reduced density-matrix methods. Methods highlighted in Q-Chem 5 include a suite of tools for modeling core-level spectroscopy, methods for describing metastable resonances, methods for computing vibronic spectra, the nuclear–electronic orbital method, and several different energy decomposition analysis techniques. High-performance capabilities including multithreaded parallelism and support for calculations on graphics processing units are described. Q-Chem boasts a community of well over 100 active academic developers, and the continuing evolution of the software is supported by an “open teamware” model and an increasingly modular design.

984 citations en Medicine

S2 Open Access 2017

Offloading in Mobile Edge Computing: Task Allocation and Computational Frequency Scaling

T. Dinh, Jianhua Tang, Q. La et al.

842 citations en Computer Science

S2 Open Access 2007

Ubiquity and dominance of oxygenated species in organic aerosols in anthropogenically‐influenced Northern Hemisphere midlatitudes

Q. Zhang, J. Jimenez, M. Canagaratna et al.

2031 citations en Biology, Geology

S2 Open Access 2014

Ultrasensitive terahertz sensing with high-Q Fano resonances in metasurfaces

Ranjan Singh, W. Cao, I. Al-Naib et al.

High quality factor resonances are extremely promising for designing ultra-sensitive, label-free refractive index sensors, since they allow intense interaction between electromagnetic waves and the analyte material. Metamaterial and plasmonic sensing have recently attracted a lot of attention due to subwavelength confinement of electromagnetic fields in the resonant structures. However, the excitation of high quality factor resonances in these systems has been a challenge. We excite an order of magnitude higher quality factor resonances in planar terahertz metamaterials that we exploit for ultrasensitive sensing. The low-loss quadrupole and Fano resonances with extremely narrow linewidths enable us to measure the minute spectral shift caused by the smallest change in the refractive index of the surrounding media. We achieve sensitivity levels of 7.75 × 10³ nm/refractive index unit (RIU) with the quadrupole and 5.7 × 10⁴ nm/RIU with the Fano resonances, which could be further enhanced by using thinner substrates. These findings would facilitate the design of ultrasensitive real-time chemical and biomolecular sensors in the fingerprint region of the terahertz regime.

610 citations en Physics

S2 Open Access 2024

From $r$ to $Q^*$: Your Language Model is Secretly a Q-Function

Rafael Rafailov, Joey Hejna, Ryan Park et al.

Reinforcement Learning From Human Feedback (RLHF) has been critical to the success of the latest generation of generative AI models. In response to the complex nature of the classical RLHF pipeline, direct alignment algorithms such as Direct Preference Optimization (DPO) have emerged as an alternative approach. Although DPO solves the same objective as the standard RLHF setup, there is a mismatch between the two approaches. Standard RLHF deploys reinforcement learning in a specific token-level MDP, while DPO is derived as a bandit problem in which the whole response of the model is treated as a single arm. In this work we rectify this difference. We theoretically show that we can derive DPO in the token-level MDP as a general inverse Q-learning algorithm, which satisfies the Bellman equation. Using our theoretical results, we provide three concrete empirical insights. First, we show that because of its token level interpretation, DPO is able to perform some type of credit assignment. Next, we prove that under the token level formulation, classical search-based algorithms, such as MCTS, which have recently been applied to the language generation space, are equivalent to likelihood-based search on a DPO policy. Empirically we show that a simple beam search yields meaningful improvement over the base DPO policy. Finally, we show how the choice of reference policy causes implicit rewards to decline during training. We conclude by discussing applications of our work, including information elicitation in multi-turn dialogue, reasoning, agentic applications and end-to-end training of multi-model systems.
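The token-level reading of DPO mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the usual DPO implicit reward, `beta * (log pi - log pi_ref)`, evaluated per token; the function names, the `beta` value and the log-probabilities are hypothetical and are not taken from the paper.

```python
from typing import List


def token_implicit_rewards(policy_logps: List[float],
                           ref_logps: List[float],
                           beta: float = 0.1) -> List[float]:
    """Per-token implicit reward: beta * (log pi(token) - log pi_ref(token))."""
    if len(policy_logps) != len(ref_logps):
        raise ValueError("log-probability sequences must have equal length")
    return [beta * (p - r) for p, r in zip(policy_logps, ref_logps)]


def sequence_implicit_reward(policy_logps: List[float],
                             ref_logps: List[float],
                             beta: float = 0.1) -> float:
    """Whole-response implicit reward: the sum of the per-token terms."""
    return sum(token_implicit_rewards(policy_logps, ref_logps, beta))


if __name__ == "__main__":
    # Hypothetical per-token log-probabilities for a three-token response.
    policy = [-1.0, -0.5, -2.0]
    reference = [-1.5, -0.5, -1.0]
    print(token_implicit_rewards(policy, reference, beta=0.5))
    print(sequence_implicit_reward(policy, reference, beta=0.5))
```

Tokens where the policy is more likely than the reference receive positive credit and tokens where it is less likely receive negative credit, which is one simple way to picture the credit assignment the abstract refers to.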

247 citations en Computer Science

S2 Open Access 2023

Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions

Yevgen Chebotar, Q. Vuong, A. Irpan et al.

In this work, we present a scalable reinforcement learning method for training multi-task policies from large offline datasets that can leverage both human demonstrations and autonomously collected data. Our method uses a Transformer to provide a scalable representation for Q-functions trained via offline temporal difference backups. We therefore refer to the method as Q-Transformer. By discretizing each action dimension and representing the Q-value of each action dimension as separate tokens, we can apply effective high-capacity sequence modeling techniques for Q-learning. We present several design decisions that enable good performance with offline RL training, and show that Q-Transformer outperforms prior offline RL algorithms and imitation learning techniques on a large diverse real-world robotic manipulation task suite. The project's website and videos can be found at https://qtransformer.github.io
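The per-dimension action discretization described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: uniform bins, a shared action range `[low, high]`, and the helper names `discretize_action` and `undiscretize` are all hypothetical and do not describe the authors' implementation.

```python
from typing import List, Sequence


def discretize_action(action: Sequence[float], low: float, high: float,
                      num_bins: int) -> List[int]:
    """Map each continuous action dimension to a bin index in [0, num_bins - 1]."""
    if num_bins < 1 or high <= low:
        raise ValueError("need num_bins >= 1 and high > low")
    tokens = []
    for value in action:
        clipped = min(max(value, low), high)  # keep the value inside the range
        index = int((clipped - low) / (high - low) * num_bins)
        tokens.append(min(index, num_bins - 1))  # the upper edge falls in the last bin
    return tokens


def undiscretize(tokens: Sequence[int], low: float, high: float,
                 num_bins: int) -> List[float]:
    """Map bin indices back to the centre of each bin."""
    width = (high - low) / num_bins
    return [low + (t + 0.5) * width for t in tokens]


if __name__ == "__main__":
    # Hypothetical 4-dimensional action in [-1, 1], split into 4 bins per dimension.
    tokens = discretize_action([-1.0, 0.0, 0.99, 1.0], low=-1.0, high=1.0, num_bins=4)
    print(tokens)
    print(undiscretize(tokens, low=-1.0, high=1.0, num_bins=4))
```

Each resulting integer can then be treated as one token in a sequence, so that a sequence model predicts one Q-value per bin for one action dimension at a time.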

145 citations en Computer Science

CrossRef Open Access 2026

Climate-driven reproductive decline in Southern right whales

Claire Charlton, Matthew Germishuizen, Bridgette O’Shannessy et al.

1 citation en

CrossRef Open Access 2024

Variation in glider-detected North Atlantic right, blue, and fin whale calls in proximity to high-traffic shipping lanes

KL Indeck, R Gehrmann, AL Richardson et al.

Passive acoustic monitoring has become an integral tool for determining the presence, distribution, and behavior of vocally active cetacean species. Acoustically equipped underwater gliders are becoming a routine monitoring platform, because they can cover large spatial scales during a single deployment and have the capability to relay data to shore in near real-time. Yet, more research is needed to determine what information can be derived from glider-recorded cetacean detections. Here, a Slocum glider that monitored continuously for low frequency (<1 kHz) baleen whale vocalizations was deployed across the Honguedo Strait and the associated traffic separation scheme in the Gulf of St. Lawrence, Canada, during September and October 2019. We conducted a manual analysis of the archived audio to examine spatial and temporal variation in acoustic detection rates of North Atlantic right whales (NARWs), blue whales, and fin whales. Call detections of blue and fin whales demonstrated that both species were acoustically active throughout the deployment. Environmental association models suggested their preferential use of foraging areas along the southern slopes of the Laurentian Channel. Results also indicate that elevated background noise levels in the shipping lanes from vessel traffic only minimally influenced the likelihood of detecting blue whale acoustic presence, while they did not affect fin whale detectability. NARWs were definitively detected on less than 20% of deployment days, so only qualitative assessments of their presence were described. Nevertheless, detections of all 3 species highlight that their movements throughout this seasonally important region overlap with a high volume of vessel traffic, increasing their risk of ship strike.

en


S2 Open Access 2011

Deterministic design of wavelength scale, ultra-high Q photonic crystal nanobeam cavities.

Q. Quan, M. Lončar

Photonic crystal nanobeam cavities are versatile platforms of interest for optical communications, optomechanics, optofluidics, cavity QED, etc. In a previous work [Appl. Phys. Lett. 96, 203102 (2010)], we proposed a deterministic method to achieve ultrahigh Q cavities. This follow-up work provides systematic analysis and verifications of the deterministic design recipe and further extends the discussion to air-mode cavities. We demonstrate designs of dielectric-mode and air-mode cavities with Q > 10⁹, as well as dielectric-mode nanobeam cavities with both ultrahigh-Q (> 10⁷) and ultrahigh on-resonance transmissions (T > 95%).

430 citations en Physics, Medicine

S2 Open Access 2018

Fuzzy Q-Learning for multi-agent decentralized energy management in microgrids

P. Kofinas, A. Dounis, G. Vouros

This study proposes a cooperative multi-agent system for managing the energy of a stand-alone microgrid. The multi-agent system learns to control the components of the microgrid so that it achieves its purposes and operates effectively, by means of a distributed, collaborative reinforcement learning method in a continuous state-action space. Stand-alone microgrids present challenges regarding guaranteeing electricity supply and increasing the reliability of the system under the uncertainties introduced by the renewable power sources and the stochastic demand of the consumers. In this article we consider a microgrid that consists of power production, power consumption and power storage units: the power production group includes a photovoltaic source, a fuel cell and a diesel generator; the power consumption group includes an electrolyzer unit, a desalination plant and a variable electrical load that represents the power consumption of a building; the power storage group includes only the battery bank. We conjecture that a distributed multi-agent system presents specific advantages for controlling the microgrid components, which operate in a continuous state and action space. For this purpose we propose the use of fuzzy Q-Learning methods for agents representing microgrid components to act as independent learners, while sharing state variables to coordinate their behavior. Experimental results highlight both the effectiveness of individual agents in controlling system components and the effectiveness of the multi-agent system in guaranteeing electricity supply and increasing the reliability of the microgrid.

192 citations en Computer Science

CrossRef Open Access 2023

Diagnosis and treatment of MPN in real life: exploratory and retrospective chart review including 960 MPN patients diagnosed with ET or MF in Germany

Andreas Schmidt, Christiane Bernhardt, Dieter Bürkle et al.

Purpose: The WHO 2016 re-classification of myeloproliferative neoplasms resulted in a separation of essential thrombocythemia (ET) from the pre-fibrotic and fibrotic (overt) phases of primary myelofibrosis (MF). This study reports on a chart review conducted to evaluate the real-life approach regarding clinical characteristics, diagnostic assessment, risk stratification and treatment decisions for MPN patients classified as ET or MF after implementation of the WHO 2016 classification. Methods: In this retrospective chart review, 31 office-based hematologists/oncologists and primary care centers in Germany participated between April 2021 and May 2022. Physicians reported available data obtained from patient charts via a paper–pencil based survey (secondary use of data). Patient features were evaluated using descriptive analysis, also including diagnostic assessment, therapeutic strategies and risk stratification. Results: Data of 960 MPN patients diagnosed with essential thrombocythemia (ET) (n = 495) or myelofibrosis (MF) (n = 465) after implementation of the revised 2016 WHO classification of myeloid neoplasms were collected from the patient charts. While they met at least one minor WHO criterion for primary myelofibrosis, 39.8% of those diagnosed with ET did not have histological BM testing at diagnosis. 63.4% of patients who were classified as having MF, however, did not obtain an early prognostic risk assessment. More than 50% of MF patients showed characteristics consistent with the pre-fibrotic phase, which was emphasized by the frequent use of cytoreductive therapy. Hydroxyurea was the most frequently used cytoreductive medication in 84.7% of ET and 53.1% of MF patients. While both ET and MF cohorts showed cardiovascular risk factors in more than 2/3 of the cases, the use of platelet inhibitors or anticoagulants varied between 56.8% in ET and 38.1% in MF patients.
Conclusions: Improved histopathologic diagnostics and dynamic risk stratification including genetic risk factors for cases of suspected ET and MF are recommended for precise risk assessment and therapeutic stratification according to WHO criteria.

5 citations en

S2 Open Access 2003

Transverse-momentum and collision-energy dependence of high-pT hadron suppression in Au+Au collisions at ultrarelativistic energies.

J. Adams, C. Adler, M. Aggarwal et al.

We report high statistics measurements of inclusive charged hadron production in Au+Au and p+p collisions at $\sqrt{s_{NN}} = 200$ GeV. A large, approximately constant hadron suppression is observed in central Au+Au collisions for $5 < p_T < 12$ GeV/c. The collision energy dependence of the yields and the centrality and $p_T$ dependence of the suppression provide stringent constraints on theoretical models of suppression. Models incorporating initial-state gluon saturation or partonic energy loss in dense matter are largely consistent with observations. We observe no evidence of $p_T$-dependent suppression, which may be expected from models incorporating jet attenuation in cold nuclear matter or scattering of fragmentation hadrons.

562 citations en Physics, Medicine

S2 Open Access 2022

On the Skew and Curvature of the Implied and Local Volatilities

E. Alòs, David García-Lorite, Makar Pravosud

In this paper, we study the relationship between the short end of the local and the implied volatility surfaces. Our results, based on Malliavin calculus techniques, recover the recent $\frac{1}{H+3/2}$ rule (where $H$ denotes the Hurst parameter of the volatility process) for rough volatilities (see F. Bourgey, S. De Marco, P. Friz, and P. Pigato. 2022. “Local Volatility under Rough Volatility.” arXiv:2204.02376v1 [q-fin.MF] https://doi.org/10.48550/arXiv.2204.02376.), which states that the short-time skew slope of the at-the-money implied volatility is $\frac{1}{H+3/2}$ of the corresponding slope for local volatilities. Moreover, we see that the at-the-money short-end curvature of the implied volatility can be written in terms of the short-end skew and curvature of the local volatility and vice versa. Additionally, this relationship depends on $H$.
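As a hedged worked statement of the rule named in the abstract above: the relation from the cited Bourgey et al. preprint is usually quoted as the "1/(H + 3/2) rule". The symbols $\mathcal{S}_{\mathrm{imp}}(T)$ and $\mathcal{S}_{\mathrm{loc}}(T)$ are notation assumed here for the at-the-money skew slopes of the implied and local volatilities at maturity $T$; they are not taken from the listing.

```latex
\[
  \lim_{T \to 0} \frac{\mathcal{S}_{\mathrm{imp}}(T)}{\mathcal{S}_{\mathrm{loc}}(T)}
  = \frac{1}{H + \tfrac{3}{2}},
  \qquad
  H = \tfrac{1}{2} \;\Longrightarrow\; \frac{1}{H + \tfrac{3}{2}} = \frac{1}{2},
  \qquad
  H = 0.1 \;\Longrightarrow\; \frac{1}{H + \tfrac{3}{2}} = \frac{1}{1.6} = 0.625 .
\]
```

For $H = 1/2$ the ratio reduces to the classical one-half skew rule, while a rough value such as $H = 0.1$ gives a ratio closer to one.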

3 citations en Economics

S2 Open Access 2010

Photonic crystal nanobeam cavity strongly coupled to the feeding waveguide

Q. Quan, P. Deotare, M. Lončar

A deterministic design of an ultrahigh Q-factor, wavelength-scale photonic crystal nanobeam cavity is proposed and experimentally demonstrated. Using this approach, cavities with Q > 10⁶ and on-resonance transmission T > 90% are designed. The devices, fabricated in silicon and capped with a low refractive index polymer, have experimental Q = 80 000 and T = 73%. This is, to the best of our knowledge, the highest transmission measured in deterministically designed, wavelength-scale high-Q cavities.

378 citations en Physics, Materials Science

S2 Open Access 2021

Short-Term Interest Rate Estimation by Filtering in a Model Linking Inflation, the Central Bank and Short-Term Interest Rates

F. Antonacci, C. Costantini, M. Papi

We consider the model of Antonacci, Costantini, D’Ippoliti, Papi (arXiv:2010.05462 [q-fin.MF], 2020), which describes the joint evolution of inflation, the central bank interest rate, and the short-term interest rate. In the case when the diffusion coefficient does not depend on the central bank interest rate, we derive a semi-closed valuation formula for contingent derivatives, in particular for Zero Coupon Bonds (ZCBs). By using ZCB yields as observations, we implement the Kalman filter and obtain a dynamical estimate of the short-term interest rate. In turn, by this estimate, at each time step, we calibrate the model parameters under the risk-neutral measure and the coefficient of the risk premium. We compare the market values of German interest rate yields for several maturities with the corresponding values predicted by our model, from 2007 to 2015. The numerical results validate both our model and our numerical procedure.
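The filtering step described in the abstract above can be pictured with a generic scalar Kalman filter. This is a minimal sketch, not the authors' model: it assumes a one-factor linear-Gaussian state `x_t = a * x_{t-1} + c + w_t` and an affine observation `y_t = h * x_t + d + v_t` (for example a ZCB yield that is affine in the short rate), and every parameter name and value is hypothetical.

```python
from typing import List, Sequence


def kalman_filter(observations: Sequence[float],
                  a: float, c: float, q: float,
                  h: float, d: float, r: float,
                  x0: float, p0: float) -> List[float]:
    """Return the filtered state estimate after each observation.

    State:        x_t = a * x_{t-1} + c + w_t,  Var(w_t) = q
    Observation:  y_t = h * x_t + d + v_t,      Var(v_t) = r
    """
    x, p = x0, p0
    estimates = []
    for y in observations:
        # Prediction step: propagate the mean and variance through the state equation.
        x_pred = a * x + c
        p_pred = a * a * p + q
        # Update step: correct the prediction with the new observation.
        innovation = y - (h * x_pred + d)
        s = h * h * p_pred + r          # innovation variance
        gain = p_pred * h / s           # Kalman gain
        x = x_pred + gain * innovation
        p = (1.0 - gain * h) * p_pred
        estimates.append(x)
    return estimates


if __name__ == "__main__":
    # Hypothetical yields (in percent) observed at successive dates.
    yields = [1.2, 1.1, 1.3, 1.25]
    print(kalman_filter(yields, a=0.95, c=0.05, q=0.01,
                        h=1.0, d=0.0, r=0.04, x0=1.0, p0=1.0))
```

In the setting of the abstract, the filtered estimate at each time step would then feed a calibration of the model parameters; that step is not shown here.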

1 citation en Mathematics

CrossRef Open Access 2021

Les Foyers artistiques à la fin du règne de Louis XIV (1682–1715). Musique et spectacles. Hrsg. von Anne-Madeleine Goulet, unter Mitarbeit von Rémy Campos, Mathieu da Vinha und Jean Duron

Tobias C. Weißmann

en


CrossRef Open Access 2021

Guest editorial

Andy Naranjo

en

