David Rein, Betty Li Hou, Asa Cooper Stickland
et al.
We present GPQA, a challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry. We ensure that the questions are high-quality and extremely difficult: experts who have or are pursuing PhDs in the corresponding domains reach 65% accuracy (74% when discounting clear mistakes the experts identified in retrospect), while highly skilled non-expert validators only reach 34% accuracy, despite spending on average over 30 minutes with unrestricted access to the web (i.e., the questions are "Google-proof"). The questions are also difficult for state-of-the-art AI systems, with our strongest GPT-4 based baseline achieving 39% accuracy. If we are to use future AI systems to help us answer very hard questions, for example, when developing new scientific knowledge, we need to develop scalable oversight methods that enable humans to supervise their outputs, which may be difficult even if the supervisors are themselves skilled and knowledgeable. The difficulty of GPQA both for skilled non-experts and frontier AI systems should enable realistic scalable oversight experiments, which we hope can help devise ways for human experts to reliably get truthful information from AI systems that surpass human capabilities.
Offline reinforcement learning requires reconciling two conflicting aims: learning a policy that improves over the behavior policy that collected the dataset, while at the same time minimizing the deviation from the behavior policy so as to avoid errors due to distributional shift. This trade-off is critical, because most current offline reinforcement learning methods need to query the value of unseen actions during training to improve the policy, and therefore need to either constrain these actions to be in-distribution, or else regularize their values. We propose an offline RL method that never needs to evaluate actions outside of the dataset, but still enables the learned policy to improve substantially over the best behavior in the data through generalization. The main insight in our work is that, instead of evaluating unseen actions from the latest policy, we can approximate the policy improvement step implicitly by treating the state value function as a random variable, with randomness determined by the action (while still integrating over the dynamics to avoid excessive optimism), and then taking a state-conditional upper expectile of this random variable to estimate the value of the best actions in that state. This leverages the generalization capacity of the function approximator to estimate the value of the best available action at a given state without ever directly querying a Q-function with this unseen action. Our algorithm alternates between fitting this upper expectile value function and backing it up into a Q-function. Then, we extract the policy via advantage-weighted behavioral cloning. We dub our method implicit Q-learning (IQL). IQL demonstrates state-of-the-art performance on D4RL, a standard benchmark for offline reinforcement learning. We also demonstrate that IQL achieves strong performance when fine-tuned with online interaction after offline initialization.
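The upper-expectile idea at the core of this approach can be illustrated with a minimal numerical sketch (illustrative only, not the authors' implementation): an asymmetric squared loss whose minimizer, for tau > 0.5, lies above the mean of its targets.

```python
import numpy as np

def expectile_loss(residual, tau=0.9):
    """Asymmetric L2 loss on residual = target - prediction.
    tau > 0.5 penalizes under-estimation more, so the minimizer
    approaches an upper expectile of the target distribution."""
    weight = np.where(residual > 0, tau, 1.0 - tau)
    return np.mean(weight * residual ** 2)

def fit_expectile(samples, tau=0.9, lr=0.5, steps=2000):
    """Fit a scalar v minimizing the expectile loss over samples
    by gradient descent; with tau = 0.5 this recovers the mean."""
    v = 0.0
    for _ in range(steps):
        r = samples - v
        grad = -2.0 * np.mean(np.where(r > 0, tau, 1.0 - tau) * r)
        v -= lr * grad
    return v
```

For two equally likely targets {0, 1}, tau = 0.5 yields the mean 0.5, while tau = 0.9 yields 0.9, closer to the best outcome, which is the sense in which the expectile approximates a maximum without ever evaluating unseen actions.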
Effectively leveraging large, previously collected datasets in reinforcement learning (RL) is a key challenge for large-scale real-world applications. Offline RL algorithms promise to learn effective policies from previously-collected, static datasets without further interaction. However, in practice, offline RL presents a major challenge, and standard off-policy RL methods can fail due to overestimation of values induced by the distributional shift between the dataset and the learned policy, especially when training on complex and multi-modal data distributions. In this paper, we propose conservative Q-learning (CQL), which aims to address these limitations by learning a conservative Q-function such that the expected value of a policy under this Q-function lower-bounds its true value. We theoretically show that CQL produces a lower bound on the value of the current policy and that it can be incorporated into a policy learning procedure with theoretical improvement guarantees. In practice, CQL augments the standard Bellman error objective with a simple Q-value regularizer which is straightforward to implement on top of existing deep Q-learning and actor-critic implementations. On both discrete and continuous control domains, we show that CQL substantially outperforms existing offline RL methods, often learning policies that attain 2-5 times higher final return, especially when learning from complex and multi-modal data distributions.
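The conservative regularizer described above can be sketched for the discrete-action case; the function name and toy inputs are illustrative, not taken from the paper's code.

```python
import numpy as np

def cql_penalty(q_values, data_actions, alpha=1.0):
    """Conservative penalty in the spirit of CQL (discrete actions):
    push down a soft maximum of Q over all actions while pushing up
    Q at actions actually present in the dataset.
    q_values: (batch, n_actions); data_actions: (batch,) int indices."""
    logsumexp = np.log(np.sum(np.exp(q_values), axis=1))  # soft max over actions
    data_q = q_values[np.arange(len(data_actions)), data_actions]
    return alpha * np.mean(logsumexp - data_q)
```

Added to the standard Bellman error, this term is minimized when the dataset actions carry the highest Q-values, which discourages the overestimation of out-of-distribution actions.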
Off-policy reinforcement learning aims to leverage experience collected from prior policies for sample-efficient learning. However, in practice, commonly used off-policy approximate dynamic programming methods based on Q-learning and actor-critic methods are highly sensitive to the data distribution, and can make only limited progress without collecting additional on-policy data. As a step towards more robust off-policy algorithms, we study the setting where the off-policy experience is fixed and there is no further interaction with the environment. We identify bootstrapping error as a key source of instability in current methods. Bootstrapping error is due to bootstrapping from actions that lie outside of the training data distribution, and it accumulates via the Bellman backup operator. We theoretically analyze bootstrapping error, and demonstrate how carefully constraining action selection in the backup can mitigate it. Based on our analysis, we propose a practical algorithm, bootstrapping error accumulation reduction (BEAR). We demonstrate that BEAR is able to learn robustly from different off-policy distributions, including random and suboptimal demonstrations, on a range of continuous control tasks.
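The kind of support-matching constraint BEAR uses to keep the policy close to the behavior distribution can be illustrated with a maximum mean discrepancy (MMD) sketch between action samples; this is a simplified stand-in, not the paper's implementation.

```python
import numpy as np

def mmd_squared(x, y, sigma=1.0):
    """Squared maximum mean discrepancy with a Gaussian kernel,
    a sample-based distance between two action distributions.
    x: (n, d) policy action samples; y: (m, d) dataset actions."""
    def k(a, b):
        d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()
```

Constraining this quantity to be small during the Bellman backup keeps bootstrapped actions near the support of the training data, which is the mitigation the analysis above motivates.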
On 2017 August 17 a binary neutron star coalescence candidate (later designated GW170817) with merger time 12:41:04 UTC was observed through gravitational waves by the Advanced LIGO and Advanced Virgo detectors. The Fermi Gamma-ray Burst Monitor independently detected a gamma-ray burst (GRB 170817A) with a time delay of ∼1.7 s with respect to the merger time. From the gravitational-wave signal, the source was initially localized to a sky region of 31 deg² at a luminosity distance of 40 (+8/−8) Mpc and with component masses consistent with neutron stars. The component masses were later measured to be in the range 0.86 to 2.26 M⊙. An extensive observing campaign was launched across the electromagnetic spectrum leading to the discovery of a bright optical transient (SSS17a, now with the IAU identification of AT 2017gfo) in NGC 4993 (at ∼40 Mpc) less than 11 hours after the merger by the One-Meter, Two Hemisphere (1M2H) team using the 1 m Swope Telescope. The optical transient was independently detected by multiple teams within an hour. Subsequent observations targeted the object and its environment. Early ultraviolet observations revealed a blue transient that faded within 48 hours. Optical and infrared observations showed a redward evolution over ∼10 days. Following early non-detections, X-ray and radio emission were discovered at the transient’s position ∼9 and ∼16 days, respectively, after the merger. Both the X-ray and radio emission likely arise from a physical process that is distinct from the one that generates the UV/optical/near-infrared emission. No ultra-high-energy gamma-rays and no neutrino candidates consistent with the source were found in follow-up searches. These observations support the hypothesis that GW170817 was produced by the merger of two neutron stars in NGC 4993 followed by a short gamma-ray burst (GRB 170817A) and a kilonova/macronova powered by the radioactive decay of r-process nuclei synthesized in the ejecta.
FASTA and FASTQ are basic and ubiquitous formats for storing nucleotide and protein sequences. Common manipulations of FASTA/Q files include converting, searching, filtering, deduplication, splitting, shuffling, and sampling. Existing tools implement only some of these manipulations, not always efficiently, and some are available only for certain operating systems. Furthermore, the complicated installation of required packages and running environments can render these programs less user-friendly. This paper describes a cross-platform, ultrafast, and comprehensive toolkit for FASTA/Q processing. SeqKit provides executable binary files for all major operating systems, including Windows, Linux, and Mac OS X, and can be used directly without any dependencies or pre-configuration. SeqKit demonstrates competitive performance in execution time and memory usage compared to similar tools. The efficiency and usability of SeqKit enable researchers to rapidly accomplish common FASTA/Q file manipulations. SeqKit is open source and available on GitHub at https://github.com/shenwei356/seqkit.
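As an illustration of one manipulation in the list above, deduplication by sequence can be sketched in a few lines; this sketch is independent of SeqKit's own implementation and the record format is a simplified assumption.

```python
def dedup_fasta(records):
    """Minimal sketch of FASTA deduplication by sequence content.
    records: iterable of (header, sequence) pairs; the first record
    with a given sequence (case-insensitive) is kept."""
    seen = set()
    out = []
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            out.append((header, seq))
    return out
```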
The explosion of visual content available online underscores the need for accurate machine assessors that can robustly evaluate quality scores across diverse types of visual content. While recent studies have demonstrated the exceptional potential of large multi-modality models (LMMs) in a wide range of related fields, in this work we explore how to teach them to produce visual ratings aligned with human opinions. Observing that human raters learn and judge only discrete text-defined levels in subjective studies, we propose to emulate this subjective process and teach LMMs with text-defined rating levels instead of scores. The proposed Q-Align achieves state-of-the-art performance on image quality assessment (IQA), image aesthetic assessment (IAA), and video quality assessment (VQA) tasks under the original LMM structure. With the syllabus, we further unify the three tasks into one model, termed OneAlign. In our experiments, we demonstrate the advantage of the discrete-level-based syllabus over direct-score-based variants for LMMs. Our code and the pre-trained weights are released at https://github.com/Q-Future/Q-Align.
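The level-to-score conversion implied by such a discrete-level syllabus can be sketched as a probability-weighted average over text-defined levels; the level names and anchor scores below are illustrative assumptions, not the released code.

```python
import numpy as np

LEVELS = ["bad", "poor", "fair", "good", "excellent"]  # text-defined levels
SCORES = np.array([1.0, 2.0, 3.0, 4.0, 5.0])           # illustrative scalar anchors

def logits_to_score(level_logits):
    """Softmax over the level tokens, then a probability-weighted
    average of the anchor scores: one common way a discrete-level
    rating head is converted back to a continuous score."""
    z = np.exp(level_logits - np.max(level_logits))  # stable softmax
    p = z / z.sum()
    return float(p @ SCORES)
```

Under this scheme the model is trained only to emit the discrete level tokens, while a continuous score is recovered at inference time from their probabilities.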
In recent years, there has been increasing concern about the growing amount of plastic waste generated in daily life. Different kinds of synthetic plastics are currently used for an extensive range of needs, but in order to reduce the impact of petroleum-based plastics and material waste, considerable attention has been focused on “green” plastics. In this paper, we present a broad review of advances in the research and development of bio-based polymers analogous to petroleum-derived ones. The main motivation for the development of bio-based materials is strong public concern about waste, pollution, and carbon footprint. The sustainability of these polymers, for general and specific applications, is driven by great progress in the processing technologies that refine biomass feedstocks to obtain bio-based monomers used as building blocks. At the same time, thanks to industrial progress, it is possible to obtain more versatile and specific chemical structures and thereby synthesize polymers with ad hoc tailored properties and functionalities, with engineering applications that include packaging as well as durable and electronic goods. In particular, three types of polymers are described in this review: bio-polyethylene (Bio-PE), bio-polypropylene (Bio-PP), and bio-poly(ethylene terephthalate) (Bio-PET). Recent advances in their development in terms of processing technologies, product development, and applications, as well as their advantages and disadvantages, are reported.
There has recently been a dramatic renewal of interest in hadron spectroscopy and charm physics. This renaissance has been driven in part by the discovery of a plethora of charmonium-like XYZ states at BESIII and B factories, and the observation of an intriguing proton-antiproton threshold enhancement and the possibly related X(1835) meson state at BESIII, as well as the threshold measurements of charm mesons and charm baryons. We present a detailed survey of the important topics in tau-charm physics and hadron physics that can be further explored at BESIII during the remaining operation period of BEPCII. This survey will help in the optimization of the data-taking plan over the coming years, and provides physics motivation for the possible upgrade of BEPCII to higher luminosity.
Reinforcement Learning From Human Feedback (RLHF) has been critical to the success of the latest generation of generative AI models. In response to the complex nature of the classical RLHF pipeline, direct alignment algorithms such as Direct Preference Optimization (DPO) have emerged as an alternative approach. Although DPO solves the same objective as the standard RLHF setup, there is a mismatch between the two approaches. Standard RLHF deploys reinforcement learning in a specific token-level MDP, while DPO is derived as a bandit problem in which the whole response of the model is treated as a single arm. In this work we rectify this difference. We theoretically show that we can derive DPO in the token-level MDP as a general inverse Q-learning algorithm, which satisfies the Bellman equation. Using our theoretical results, we provide three concrete empirical insights. First, we show that because of its token level interpretation, DPO is able to perform some type of credit assignment. Next, we prove that under the token level formulation, classical search-based algorithms, such as MCTS, which have recently been applied to the language generation space, are equivalent to likelihood-based search on a DPO policy. Empirically we show that a simple beam search yields meaningful improvement over the base DPO policy. Finally, we show how the choice of reference policy causes implicit rewards to decline during training. We conclude by discussing applications of our work, including information elicitation in multi-turn dialogue, reasoning, agentic applications and end-to-end training of multi-model systems.
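The sequence-level DPO objective discussed above can be sketched for a single preference pair; the variable names are illustrative, and summed token log-probabilities are assumed as inputs.

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one preference pair. logp_* are summed token
    log-probs of the chosen (w) and rejected (l) responses under the
    policy and the reference model. The implicit reward of a response
    is beta * (logp - ref_logp); the loss is -log sigmoid of the
    reward margin between chosen and rejected."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

At initialization, when policy and reference agree, the margin is zero and the loss equals log 2; the token-level interpretation in the paper decomposes this same implicit reward across positions in the response.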
Diffusion models have achieved great success in image synthesis through iterative noise estimation using deep neural networks. However, the slow inference, high memory consumption, and computational intensity of the noise estimation model hinder the efficient adoption of diffusion models. Although post-training quantization (PTQ) is considered a go-to compression method for other tasks, it does not work out-of-the-box on diffusion models. We propose a novel PTQ method specifically tailored towards the unique multi-timestep pipeline and model architecture of diffusion models, which compresses the noise estimation network to accelerate the generation process. We identify the key difficulty of diffusion model quantization as the changing output distributions of noise estimation networks over multiple time steps and the bimodal activation distribution of the shortcut layers within the noise estimation network. We tackle these challenges with timestep-aware calibration and split shortcut quantization in this work. Experimental results show that our proposed method is able to quantize full-precision unconditional diffusion models to 4 bits while maintaining comparable performance (small FID change of at most 2.34 compared to >100 for traditional PTQ) in a training-free manner. Our approach can also be applied to text-guided image generation, where we can run Stable Diffusion in 4-bit weights with high generation quality for the first time.
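The basic building block of such post-training quantization, uniform fake quantization of a tensor, can be sketched as follows; this is a generic illustration of PTQ, not the paper's timestep-aware calibration itself.

```python
import numpy as np

def quantize_per_tensor(x, n_bits=4):
    """Uniform symmetric post-training quantization sketch:
    map floats onto 2**n_bits signed integer levels and back
    (fake quantization), with the scale calibrated from the
    maximum absolute value of the tensor."""
    qmax = 2 ** (n_bits - 1) - 1              # e.g. 7 for 4-bit signed
    scale = np.max(np.abs(x)) / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale, scale
```

Calibration chooses the scale from sample activations; the difficulty the paper identifies is that for diffusion models a single scale cannot fit activation distributions that shift across timesteps, motivating timestep-aware calibration.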
The DAMPE satellite has directly measured the cosmic ray proton spectrum from 40 GeV to 100 TeV and revealed a new feature at about 13.6 TeV. The precise measurement of the spectrum of protons, the most abundant component of the cosmic radiation, is necessary to understand the source and acceleration of cosmic rays in the Milky Way. This work reports the measurement of the cosmic ray proton fluxes with kinetic energies from 40 GeV to 100 TeV, with 2.5 years of data recorded by the DArk Matter Particle Explorer (DAMPE). This is the first time an experiment has directly measured cosmic ray protons up to ~100 TeV with high statistics. The measured spectrum confirms the spectral hardening at ~300 GeV found by previous experiments and reveals a softening at ~13.6 TeV, with the spectral index changing from ~2.60 to ~2.85. Our result suggests the existence of a new spectral feature of cosmic rays at energies lower than the so-called knee and sheds new light on the origin of Galactic cosmic rays.
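A smoothly broken power law is a common way to describe a spectral index change of this kind; the sketch below uses the quoted indices and break energy as illustrative parameters, not the fitted values from the analysis.

```python
import numpy as np

def broken_power_law(E, phi0=1.0, E_b=13.6e3, gamma1=2.60, gamma2=2.85, s=5.0):
    """Smoothly broken power-law flux vs. kinetic energy E (GeV):
    spectral index gamma1 below the break E_b and gamma2 above it,
    with s controlling the sharpness of the transition."""
    return phi0 * E ** (-gamma1) * (1.0 + (E / E_b) ** s) ** ((gamma1 - gamma2) / s)
```

The local spectral index -d ln(flux)/d ln(E) of this form moves from ~2.60 well below the break to ~2.85 well above it, matching the softening described above.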
We report a study of the processes of e^{+}e^{-}→K^{+}D_{s}^{-}D^{*0} and K^{+}D_{s}^{*-}D^{0} based on e^{+}e^{-} annihilation samples collected with the BESIII detector operating at BEPCII at five center-of-mass energies ranging from 4.628 to 4.698 GeV with a total integrated luminosity of 3.7 fb^{-1}. An excess of events over the known contributions of the conventional charmed mesons is observed near the D_{s}^{-}D^{*0} and D_{s}^{*-}D^{0} mass thresholds in the K^{+} recoil-mass spectrum for events collected at √s = 4.681 GeV. The structure matches a mass-dependent-width Breit-Wigner line shape, whose pole mass and width are determined as (3982.5_{-2.6}^{+1.8}±2.1) MeV/c^{2} and (12.8_{-4.4}^{+5.3}±3.0) MeV, respectively. The first uncertainties are statistical and the second are systematic. The significance of the resonance hypothesis is estimated to be 5.3 σ over the contributions only from the conventional charmed mesons. This is the first candidate for a charged hidden-charm tetraquark with strangeness, decaying into D_{s}^{-}D^{*0} and D_{s}^{*-}D^{0}. However, the properties of the excess need further exploration with more statistics.
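The resonance line shape can be illustrated with a constant-width Breit-Wigner sketch using the quoted pole mass and width; the analysis itself uses a mass-dependent width, so this is a simplification.

```python
def breit_wigner(m, m0=3982.5, gamma0=12.8):
    """Relativistic Breit-Wigner intensity with constant width
    (simplified from the mass-dependent-width form used in the fit).
    m is the invariant mass, m0 the pole mass, gamma0 the width,
    all in MeV/c^2 (gamma0 in MeV)."""
    return 1.0 / ((m ** 2 - m0 ** 2) ** 2 + (m0 * gamma0) ** 2)
```

The intensity peaks at the pole mass and falls to roughly half maximum at m0 ± gamma0/2, which is how the quoted width relates to the observed excess.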
Ester Sevillano, Irene Lafuente, Nuria Peña
et al.
Antimicrobial resistance (AMR) poses a significant challenge to animal production due to the widespread use of antibiotics. Therefore, there is an urgent need for alternative antimicrobial strategies to effectively manage bacterial infections, protect animal health, and reduce reliance on antibiotics. This study evaluated the use of emerging approaches and procedures for the isolation, identification, and characterization of bacteriocin-producing bacteria and their bacteriocins, sourced from the gastrointestinal tract (GIT) of meat-producing pigs. Out of 2056 isolates screened against Gram-positive and Gram-negative indicator strains, 20 of the most active antimicrobial isolates were subjected to whole genome sequencing (WGS) for the prediction of coding DNA sequences (CDS) and the identification of bacteriocin gene clusters (BGC) and their functions. The use of an in vitro cell-free protein synthesis (IV-CFPS) protocol and the design of an IV-CFPS coupled to a split-intein mediated ligation (IV-CFPS/SIML) procedure made possible the evaluation of the production and antimicrobial activity of described and putatively novel bacteriocins. A colony MALDI-TOF MS procedure assisted in the identification of class I, II, and III lanthipeptides. MALDI-TOF MS and targeted proteomics, combined with a massive peptide analysis (LC-MS/MS) approach, have proven valuable for the identification and biochemical characterization of previously described and novel bacteriocins encoded by the isolated bacteriocin-producing strains.
Photonic crystal nanobeam cavities are versatile platforms of interest for optical communications, optomechanics, optofluidics, cavity QED, etc. In a previous work [Appl. Phys. Lett. 96, 203102 (2010)], we proposed a deterministic method to achieve ultrahigh Q cavities. This follow-up work provides systematic analysis and verifications of the deterministic design recipe and further extends the discussion to air-mode cavities. We demonstrate designs of dielectric-mode and air-mode cavities with Q > 10⁹, as well as dielectric-mode nanobeam cavities with both ultrahigh-Q (> 10⁷) and ultrahigh on-resonance transmissions (T > 95%).
Two distinct families of pan-primate endogenous retroviruses, HERVL and HERVH, infected the primate germline, colonized host genomes, and evolved into a global retroviral genomic regulatory dominion (GRD) operating during human embryogenesis (HE). The HE retroviral GRD comprises 8839 highly conserved fixed LTR elements linked to 5444 downstream target genes, forged by evolution into a functionally consonant constellation of 26 genome-wide multimodular genomic regulatory networks (GRNs), each defined by significant enrichment of numerous single gene ontology (GO)-specific traits. GRNs appear scattered across chromosomes, occupying from 5.5% to 15.09% of the human genome. Each GRN harbors from 529 to 1486 retroviral LTRs derived from LTR7, MLT2A1, and MLT2A2 sequences, quantitatively balanced according to their genome-wide abundance. GRNs integrate the activities of 199 to 805 downstream target genes, including transcription factors, chromatin-state remodelers, signal-sensing and signal-transduction mediators, enzymatic and receptor-binding effectors, intracellular complexes and extracellular matrix elements, and cell-cell adhesion molecules. Each GRN consists of several hundred to several thousand smaller GO enrichment-defined genomic regulatory modules (GRMs), each combining from a dozen to hundreds of LTRs and downstream target genes, which appear to operate on the timescale of an individual's life span along specific phenotypic avenues to exert profound effects on patterns of transcription, protein-protein interactions, developmental phenotypes, physiological traits, and pathological conditions of Modern Humans. Overall, this study identifies 69,573 statistically significant retroviral LTR-linked GRMs (Binomial FDR q-value threshold of 0.001), including 27,601 GRMs validated by single GO-specific directed acyclic graph (DAG) analyses across six GO annotations.
This is the second part of our previous review, in which we suspected that Orai3 channels were involved in several cancers, and in lung cancer in particular. Here we confirm that calcium dysregulation is important for cancer development. In this paper we show that Orai3 is an upstream activator of AKT and we prove that AKT is involved in chemoresistance in NSCLC.
A. N. Gorban, T. A. Tyukina, L. I. Pokidysheva
et al.
In 1987, we analyzed the changes in correlation graphs between various features of the organism during stress and adaptation. After 33 years of research by many authors, discoveries and rediscoveries, we can say with complete confidence: it is useful to analyze correlation graphs. In addition, we should add that the concept of adaptability ('adaptation energy') introduced by Selye is useful, especially if it is supplemented by 'adaptation entropy' and free energy, as well as an analysis of limiting factors. Our review of these topics, "Dynamic and Thermodynamic Adaptation Models" (Phys Life Rev, 2021, arXiv:2103.01959 [q-bio.OT]), attracted many comments from leading experts, with new ideas and new problems, ranging from the dynamics of aging and the training of athletes to single-cell omics. Methodological backgrounds, such as free energy analysis, were also discussed in depth. In this article, we provide an analytical overview of twelve commenting papers and some related publications.