Chen Bo Calvin Zhang, Christina Q. Knight, Nicholas Kruus
et al.
Large language models (LLMs) perform increasingly well on biology benchmarks, but it remains unclear whether they uplift novice users -- i.e., enable humans to perform better than with internet-only resources. This uncertainty is central to understanding both scientific acceleration and dual-use risk. We conducted a multi-model, multi-benchmark human uplift study comparing novices with LLM access versus internet-only access across eight biosecurity-relevant task sets. Participants worked on complex problems with ample time (up to 13 hours for the most involved tasks). We found that LLM access provided substantial uplift: novices with LLMs were 4.16 times more accurate than controls (95% CI [2.63, 6.87]). On four benchmarks with available expert baselines (internet-only), novices with LLMs outperformed experts on three of them. Perhaps surprisingly, standalone LLMs often exceeded LLM-assisted novices, indicating that users were not eliciting the strongest available contributions from the LLMs. Most participants (89.6%) reported little difficulty obtaining dual-use-relevant information despite safeguards. Overall, LLMs substantially uplift novices on biological tasks previously reserved for trained practitioners, underscoring the need for sustained, interactive uplift evaluations alongside traditional benchmarks.
Martí Cortada Garcia, Adrià Diéguez Moscardó, Marta Casanellas
We introduce GenPhylo, a Python module that simulates nucleotide sequence data along a phylogeny without being restricted to continuous-time Markov processes. GenPhylo directly uses a general Markov model and therefore naturally incorporates heterogeneity across lineages. We solve the challenge of generating transition matrices with a prescribed expected number of substitutions (the branch length) by providing an algorithm that can be incorporated into other simulation software.
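To make the general Markov idea concrete, here is a minimal sketch of sequence simulation along a rooted tree in which every edge carries its own 4×4 row-stochastic transition matrix, rather than a matrix of the form exp(Qt). The function name `simulate_gm`, the dict-based tree encoding, and the test matrices are hypothetical. This is not GenPhylo's API, and it does not reproduce GenPhylo's algorithm for building matrices with a prescribed branch length.

```python
import numpy as np

NUCLEOTIDES = np.array(list("ACGT"))


def simulate_gm(tree, root, root_dist, matrices, n_sites, seed=0):
    """Simulate sequences on a rooted tree under a general Markov model.

    tree:      dict mapping each internal node to a list of its children
    root_dist: length-4 probability vector for the root states (A, C, G, T)
    matrices:  dict mapping (parent, child) to a 4x4 row-stochastic matrix,
               one independent matrix per edge (lineage heterogeneity)
    """
    rng = np.random.default_rng(seed)
    states = {root: rng.choice(4, size=n_sites, p=root_dist)}
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in tree.get(parent, []):
            M = np.asarray(matrices[(parent, child)], dtype=float)
            if (M < 0).any() or not np.allclose(M.sum(axis=1), 1.0):
                raise ValueError(f"edge {(parent, child)} is not row-stochastic")
            # Row i of M is the distribution of the child state given parent state i.
            cum = np.cumsum(M[states[parent]], axis=1)  # shape (n_sites, 4)
            u = rng.random(n_sites)[:, None]
            # Inverse-CDF sampling per site; the clip guards against rounding in cum[:, -1].
            states[child] = np.minimum((u >= cum).sum(axis=1), 3)
            stack.append(child)
    return {node: "".join(NUCLEOTIDES[s]) for node, s in states.items()}
```

For example, an identity matrix on an edge copies the parent sequence unchanged, while a matrix close to the identity corresponds to a short branch; nothing in the sketch requires the matrices to come from a rate matrix.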
Agentic AI "scientists" now use language models to search the literature, run analyses, and generate hypotheses. We evaluate KOSMOS, an autonomous AI scientist, on three problems in radiation biology, using simple random-gene null benchmarks. Hypothesis 1: baseline DNA damage response (DDR) capacity across cell lines predicts the p53 transcriptional response after irradiation (GSE30240). Hypothesis 2: baseline expression of OGT and CDO1 predicts the strength of repressed and induced radiation-response modules in breast cancer cells (GSE59732). Hypothesis 3: a 12-gene expression signature predicts biochemical recurrence-free survival after prostate radiotherapy plus androgen deprivation therapy (GSE116918). The DDR-p53 hypothesis was not supported: DDR score and p53 response were weakly negatively correlated (Spearman rho = -0.40, p = 0.76), indistinguishable from random five-gene scores. OGT showed only a weak association (r = 0.23, p = 0.34), whereas CDO1 was a clear outlier (r = 0.70, empirical p = 0.0039). The 12-gene signature achieved a concordance index of 0.61 (p = 0.017), but its effect size was not unique relative to random gene signatures. Overall, KOSMOS produced one well-supported discovery, one plausible but uncertain result, and one false hypothesis, illustrating that AI scientists can generate useful ideas but require rigorous auditing against appropriate null models.
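The random-gene null benchmark described in this abstract can be sketched as follows: score a gene signature per sample, correlate it with the outcome, and compare the result against equally sized random gene sets. The z-score averaging, the Spearman correlation, the two-sided +1-corrected empirical p-value, and all function names are illustrative assumptions, not the study's exact pipeline.

```python
import numpy as np


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of ranks (assumes no ties)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return float(np.corrcoef(rx, ry)[0, 1])


def signature_score(expr, gene_idx):
    """Mean z-score of the selected genes per sample; expr is genes x samples."""
    sub = expr[np.asarray(gene_idx)]
    sd = sub.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # constant genes contribute zero instead of NaN
    z = (sub - sub.mean(axis=1, keepdims=True)) / sd
    return z.mean(axis=0)


def random_gene_null(expr, gene_idx, outcome, n_draws=1000, seed=0):
    """Compare a signature's correlation with that of random gene sets of equal size."""
    rng = np.random.default_rng(seed)
    observed = spearman(signature_score(expr, gene_idx), outcome)
    k = len(gene_idx)
    null = np.array([
        spearman(signature_score(expr, rng.choice(expr.shape[0], size=k, replace=False)), outcome)
        for _ in range(n_draws)
    ])
    # Two-sided empirical p-value with the usual +1 correction.
    p = (1 + np.sum(np.abs(null) >= abs(observed))) / (n_draws + 1)
    return observed, float(p)
```

A signature whose correlation sits inside the random-set distribution, as reported for the DDR score, yields a large empirical p even if its raw correlation looks sizable.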
André L. A. Neves, Ricardo Augusto Mendonça Vieira, Einar Vargas-Bello-Pérez
et al.
The rumen microbiome is central to feed digestion and host performance, making it an important target for improving ruminant productivity and sustainability. This study investigated how feed composition influences rumen microbial abundance and phenotypic traits in beef cattle. Fifty-nine Angus bulls were assigned to forage- and grain-based diets in a randomized block design to evaluate microbial dynamics, methane emissions, and feed efficiency. Bacterial, archaeal, fungal, and protozoal populations were quantified by quantitative PCR (qPCR). Grain-based diets reduced bacterial and fungal counts compared with forage diets (1.1 × 10¹¹ vs. 2.8 × 10¹¹ copies of 16S rRNA genes and 1.5 × 10³ vs. 3.5 × 10⁴ copies of 18S rRNA genes/mL, respectively), while protozoal and methanogen populations remained stable. Microbial abundance correlated with feed intake metrics, including dry matter intake (DMI) and neutral detergent fiber intake. Methane emissions were lower in grain-fed bulls (14.8 vs. 18.0 L CH₄/kg DMI), although feed efficiency metrics showed no direct association with microbial abundance. Comparative analysis revealed adaptive microbial shifts in response to dietary changes, with functional redundancy maintaining rumen stability and supporting host performance. These findings provide insights into how feed composition shapes rumen microbial dynamics and host phenotypes, highlighting the functional adaptability of the rumen microbiome during dietary transitions.
The sperm freezing–thawing procedure is the technique most commonly used in clinics to preserve male fertility before pathological damage to the testis. Consequently, most current studies focus on optimizing this method to obtain high-quality semen after thawing. During cryopreservation, oxidative stress-induced damage affects sperm structures and decreases their fertility potential. Adding antioxidants to freezing media can protect sperm against oxidative damage. We designed this study to evaluate whether incubating semen with human follicular fluid, which contains a wide variety of enzymatic and nonenzymatic antioxidants, can prevent the negative effects of freezing–thawing on human spermatozoa. Human semen was divided into three groups: i) the 0-hour group (before freezing), ii) the control group (after freezing–thawing), and iii) the FF group (after freezing–thawing with 50% follicular fluid). Sperm motility, viability, plasma membrane and DNA integrity, mitochondrial membrane potential, malondialdehyde level, total antioxidant capacity, and catalase activity were assessed in all three groups. Compared with the 0-hour group, the control group showed significant decreases in sperm motility, viability, plasma membrane and DNA integrity, mitochondrial membrane potential, total antioxidant capacity, and catalase activity, and a significant increase in malondialdehyde level. Compared with the control group, the FF group showed considerable increases in sperm parameters, total antioxidant capacity, and catalase activity, and a significant decrease in malondialdehyde level. Follicular fluid can therefore be considered an effective supplement for improving antioxidant indices and sperm parameters during freezing–thawing.
Alessandro Palma, Till Richter, Hanyi Zhang
et al.
Generative modeling of single-cell RNA-seq data is crucial for tasks like trajectory inference, batch effect removal, and simulation of realistic cellular data. However, recent deep generative models simulating synthetic single cells from noise operate on pre-processed continuous gene expression approximations, overlooking the discrete nature of single-cell data, which limits their effectiveness and hinders the incorporation of robust noise models. Additionally, aspects like controllable multi-modal and multi-label generation of cellular data remain underexplored. This work introduces CellFlow for Generation (CFGen), a flow-based conditional generative model that preserves the inherent discreteness of single-cell data. CFGen generates whole-genome multi-modal single-cell data reliably, improving the recovery of crucial biological data characteristics while tackling relevant generative tasks such as rare cell type augmentation and batch correction. We also introduce a novel framework for compositional data generation using Flow Matching. By showcasing CFGen on a diverse set of biological datasets and settings, we provide evidence of its value to the fields of computational biology and deep generative models.
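For readers unfamiliar with Flow Matching, the following is a minimal NumPy sketch of the conditional flow-matching regression target for a straight-line interpolation path, plus explicit Euler integration of a velocity field at sampling time. It is generic: it operates on continuous vectors, omits conditioning and CFGen's discrete noise model, and all function names are hypothetical.

```python
import numpy as np


def cfm_pair(x0, x1, t):
    """Point on the straight path from noise x0 to data x1, and its target velocity.

    x0, x1: arrays of shape (batch, dim); t: array of shape (batch,) in [0, 1].
    """
    t = np.asarray(t, dtype=float)[:, None]
    xt = (1.0 - t) * x0 + t * x1
    ut = x1 - x0  # time derivative of xt along this path
    return xt, ut


def cfm_loss(v_pred, ut):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean(np.sum((v_pred - ut) ** 2, axis=1)))


def euler_sample(velocity, x0, n_steps=100):
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with explicit Euler."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps
    for step in range(n_steps):
        x = x + dt * velocity(x, step * dt)
    return x
```

In training, a network v(x, t) would be regressed onto `ut` at random times t; at sampling time, `euler_sample` would integrate that learned field starting from noise.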
Pabel Shahrear, Md. Shahedul Islam, Md. Abu Bakkar
et al.
Mathematical models are central to the ever-changing study of infectious disease, where they are key to detecting and controlling outbreaks. We explore the mathematical tools used to study disease spread, focusing on the SEIR model, which is widely used because of its flexibility. We first examine the design and analysis of a modified SEIR model: the equations that govern it, the definition of its parameters, and the positivity and boundedness of its solutions. We then derive the basic reproduction number, a significant milestone, and investigate the local stability of the disease-free equilibrium (DFE) and the endemic equilibrium (EE). Global stability, which governs the long-term behavior of the system, is analyzed using Lyapunov's stability theorem. Finally, a bifurcation analysis, covering one-dimensional bifurcation as well as forward and backward bifurcation, characterizes the dynamical behavior of the model. In summary, we not only offer a thorough description and analysis of the modified SEIR model but also lay the groundwork for advancing mathematical modeling in epidemiology. By bridging theoretical insights with practical implications, this study aims to give researchers and policymakers a deeper understanding of infectious disease dynamics and thereby to support targeted public health strategies.
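Because the abstract does not list the modified model's equations, the following sketch uses a textbook SEIR model with equal birth and death rates μ as a stand-in. It integrates the system with a classical fourth-order Runge–Kutta scheme and computes the standard basic reproduction number for this textbook model, R0 = βσ/((σ+μ)(γ+μ)). The numerical checks on conservation and non-negativity illustrate, but do not replace, analytical positivity and boundedness proofs; the function names are hypothetical.

```python
def seir_rhs(state, beta, sigma, gamma, mu):
    """Textbook SEIR with equal birth and death rate mu (constant population)."""
    S, E, I, R = state
    N = S + E + I + R
    infection = beta * S * I / N
    return (
        mu * N - infection - mu * S,
        infection - (sigma + mu) * E,
        sigma * E - (gamma + mu) * I,
        gamma * I - mu * R,
    )


def rk4_step(state, dt, params):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(base, k, h):
        return tuple(x + h * dx for x, dx in zip(base, k))

    k1 = seir_rhs(state, *params)
    k2 = seir_rhs(shifted(state, k1, dt / 2), *params)
    k3 = seir_rhs(shifted(state, k2, dt / 2), *params)
    k4 = seir_rhs(shifted(state, k3, dt), *params)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(state0, params, t_end, dt=0.1):
    """Return the list of (S, E, I, R) states from t = 0 to t_end."""
    states = [tuple(state0)]
    for _ in range(int(round(t_end / dt))):
        states.append(rk4_step(states[-1], dt, params))
    return states


def basic_reproduction_number(beta, sigma, gamma, mu):
    """R0 for this textbook model: beta * sigma / ((sigma + mu) * (gamma + mu))."""
    return beta * sigma / ((sigma + mu) * (gamma + mu))
```

With μ = 0.01, σ = 0.2, and γ = 0.1, choosing β = 0.05 gives R0 ≈ 0.43 and the infection fades toward the DFE, whereas β = 0.5 gives R0 ≈ 4.33 and an outbreak.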
Atli Jasonarson, Hinrik Hafsteinsson, Bjarki Ármannsson
et al.
This paper presents the submission of the Árni Magnusson Institute's team to the WMT24 General translation task. We work on the English->Icelandic translation direction. Our system comprises four translation models and a grammar correction model. To train our models, we carefully curate our datasets, aggressively filtering out sentence pairs that may degrade the quality of our system's output. Some of our data are collected from human translations and some are synthetically generated. Part of the synthetic data is generated using an LLM, and we find that it significantly increases the translation capability of our system.
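As an illustration of what aggressive filtering of sentence pairs can look like, here is a sketch using common heuristics: token-count bounds, a source/target length ratio, removal of untranslated copies, and de-duplication. The thresholds and function names are hypothetical; the abstract does not state which filters the team applied.

```python
def keep_pair(src, tgt, min_tokens=1, max_tokens=200, max_ratio=2.0):
    """Heuristic check for a single sentence pair (whitespace tokenization)."""
    s, t = src.split(), tgt.split()
    if not (min_tokens <= len(s) <= max_tokens and min_tokens <= len(t) <= max_tokens):
        return False
    # Very unequal lengths often signal a misaligned pair.
    if max(len(s), len(t)) / max(1, min(len(s), len(t))) > max_ratio:
        return False
    # A target identical to its source usually means the text was not translated.
    if src.strip() == tgt.strip():
        return False
    return True


def filter_corpus(pairs, **kwargs):
    """Drop exact duplicates and pairs that fail keep_pair, preserving order."""
    seen, kept = set(), []
    for src, tgt in pairs:
        key = (src.strip(), tgt.strip())
        if key in seen:
            continue
        seen.add(key)
        if keep_pair(src, tgt, **kwargs):
            kept.append((src, tgt))
    return kept
```

Real pipelines typically add language identification and model-based quality scores on top of such cheap rules.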
Nanozymes constitute a promising strategy for antitumor therapy. However, the catalytic function of metal‒organic framework (MOF)-based nanozymes during cuproptosis remains unclear. In this study, a Cu(Ⅱ)-based MOF nanocomposite loaded with the copper ionophore elesclomol (ES) and surface-modified with polyethylene glycol (PEG) was developed (ES@Cu(Ⅱ)-MOF) for effective cuproptosis induction. The peroxidase (POD)-like activity of ES@Cu(Ⅱ)-MOF generated abundant hydroxyl radicals (•OH) via a Fenton-like reaction, and its glutathione peroxidase (GSH-Px)-like activity converted Cu2+ into the more toxic Cu+ while efficiently consuming endogenous GSH. Notably, the rapid accumulation of Cu+ and ES in tumor cells induced the aggregation of lipoylated dihydrolipoamide S-acetyltransferase (DLAT) and the downregulation of Fe‒S cluster proteins, ultimately leading to cuproptosis. ES@Cu(Ⅱ)-MOF exhibited strong cytotoxicity against breast cancer cells in vitro and significantly suppressed 4T1 breast tumor growth in vivo. Moreover, ES@Cu(Ⅱ)-MOF induced immunogenic cell death (ICD), enhancing the antitumor immune response. Furthermore, combining ES@Cu(Ⅱ)-MOF with an anti-programmed cell death-ligand 1 (PD-L1) antibody converted the immunosuppressive tumor microenvironment into an immunogenic one, thus effectively inhibiting breast tumor growth. Overall, this work provides an innovative nanozyme-based approach to facilitating cuproptosis for cancer treatment, which may enhance the effectiveness of immune checkpoint inhibitor-based immunotherapy.
Domain generalization for diabetic retinopathy (DR) classification enables a model to accurately classify retinal images from previously unseen domains with different imaging conditions and patient demographics, making it applicable across a wide range of clinical environments. In this study, we explore the inherent capacity of variational autoencoders to disentangle the latent space of fundus images, with the aim of obtaining a more robust and adaptable domain-invariant representation that addresses the domain shift encountered in DR datasets. Despite its simplicity, this classical method outperforms contemporary state-of-the-art approaches for this task on publicly available datasets. Our findings challenge the prevailing assumption that highly sophisticated methods for DR classification are inherently superior for domain generalization. This highlights the importance of considering simple methods and adapting them to the challenging task of generalizing across medical image domains, rather than relying solely on advanced techniques.
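The latent representation in question is learned by optimizing a VAE objective. As a rough sketch, assuming a diagonal Gaussian encoder, squared-error reconstruction, and an optional β weight on the KL term (a common device for encouraging disentanglement, not necessarily the one used in this study), the core pieces look like this in NumPy. Encoder and decoder networks are omitted, and all names are illustrative.

```python
import numpy as np


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I).

    Written this way so that, in an autodiff framework, gradients flow through mu and logvar.
    """
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL(N(mu, diag(exp(logvar))) || N(0, I)) per sample."""
    return 0.5 * np.sum(mu**2 + np.exp(logvar) - 1.0 - logvar, axis=-1)


def vae_loss(x, x_recon, mu, logvar, beta=1.0):
    """Negative ELBO with squared-error reconstruction; beta > 1 gives a beta-VAE."""
    recon = np.sum((x - x_recon) ** 2, axis=-1)
    return float(np.mean(recon + beta * kl_to_standard_normal(mu, logvar)))
```

A downstream DR classifier would then be trained on the latent mean `mu` (or on a subset of its dimensions) rather than on raw pixels.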
Histone acetylation is a pivotal epigenetic modification that controls chromatin structure and regulates gene expression. It plays an essential role in modulating zygotic transcription and cell lineage specification in developing embryos. While the outcomes of many inductive signals have been shown to require the enzymatic activities of histone acetyltransferases and deacetylases (HDACs), the mechanisms by which HDACs restrict the utilization of the zygotic genome remain to be elucidated. Here, we show that histone deacetylase 1 (Hdac1) progressively binds to the zygotic genome from the mid-blastula stage onward. The recruitment of Hdac1 to the genome at the blastula stage is maternally instructed. Cis-regulatory modules (CRMs) bound by Hdac1 possess epigenetic signatures underlying distinct functions. We highlight a dual-function model in which Hdac1 not only represses gene expression by sustaining a histone hypoacetylation state on inactive chromatin, but also maintains gene expression by participating in dynamic histone acetylation–deacetylation cycles on active chromatin. As a result, Hdac1 maintains differential histone acetylation states of bound CRMs between germ layers and reinforces the transcriptional programs underlying cell lineage identities, both in time and space. Taken together, our study reveals a comprehensive role for Hdac1 during early vertebrate embryogenesis.
Arachchige Maheshika Kumari Jayasinghe, Kirinde Gedara Isuru Sandanuwan Kirindage, Ilekuttige Priyan Shanura Fernando
et al.
Brown seaweed is a rich source of fucoidan, which exhibits a variety of biological activities. The present study reports the protective effect of low-molecular-weight fucoidan (FSSQ), isolated from the edible brown alga Sargassum siliquastrum, against lipopolysaccharide (LPS)-stimulated inflammatory responses in RAW 264.7 macrophages. FSSQ dose-dependently increased cell viability and decreased intracellular reactive oxygen species production in LPS-stimulated RAW 264.7 macrophages. FSSQ reduced iNOS and COX-2 expression, thereby inhibiting NO and prostaglandin E₂ production. Furthermore, FSSQ downregulated the mRNA expression of IL-1β, IL-6, and TNF-α by modulating MAPK and NF-κB signaling. FSSQ also inhibited the NLRP3 inflammasome protein complex, including NLRP3, ASC, and caspase-1, as well as the subsequent release of pro-inflammatory cytokines such as IL-1β and IL-18 in LPS-stimulated RAW 264.7 macrophages. The cytoprotective effect of FSSQ appears to be mediated by activation of Nrf2/HO-1 signaling, as it was considerably reduced when HO-1 activity was suppressed by ZnPP. Collectively, the study revealed the therapeutic potential of FSSQ against inflammatory responses in LPS-stimulated RAW 264.7 macrophages. Moreover, the study suggests further investigation of commercially viable methods for fucoidan isolation.
Răzvan Lucian Coșeriu, Anca Delia Mare, Felicia Toma
et al.
(1) Background: The purpose of the study was to describe the activity of mex efflux pumps in multidrug-resistant (MDR) clinical isolates of Pseudomonas aeruginosa and to compare phenotypic carbapenem-resistance identification tests with PCR; (2) Methods: Sixty MDR P. aeruginosa isolates were tested for carbapenemase production by the disk diffusion inhibitory method, the carbapenem inactivation method, and the Modified Hodge Test. Endpoint PCR was used to detect 7 carbapenemase genes (blaKPC, blaOXA48-like, blaNDM, blaGES-2, blaSPM, blaIMP, blaVIM) and mcr-1 for colistin resistance. The expression of the mexA, mexB, mexC, mexE, and mexX genes, corresponding to the four main efflux pumps, was also evaluated; (3) Results: Of the tested strains, 71.66% carried at least one carbapenemase gene, with blaGES-2 being the most frequent (63.3%). Compared with PCR, the accuracy of the phenotypic tests did not exceed 25% for P. aeruginosa. The efflux pump genes were present in all strains except one. In 85% of the isolates, overexpression of mexA, mexB, and especially mexC was detected. Previous treatment with ceftriaxone increased mexC activity more than 160-fold; (4) Conclusions: In our MDR P. aeruginosa clinical isolates, carbapenem resistance is not accurately detected by phenotypic tests, because it is due mainly to overexpression of mex efflux pumps and, to a lesser extent, to carbapenemase production.
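Fold changes in gene expression such as the reported >160-fold increase in mexC are often computed from RT-qPCR cycle thresholds with the Livak 2^-ΔΔCt method. The abstract does not state how expression was quantified, so the following is only a sketch under that assumption, with hypothetical function names and an assumed amplification efficiency of 2 (perfect doubling per cycle).

```python
import math


def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal, efficiency=2.0):
    """Fold change of a target gene relative to a calibrator sample (2^-ΔΔCt).

    ct_reference values come from a housekeeping (reference) gene used for normalization.
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_cal - ct_reference_cal
    ddct = delta_sample - delta_calibrator
    return efficiency ** (-ddct)


def ddct_for_fold_change(fold, efficiency=2.0):
    """ΔΔCt corresponding to a given fold change."""
    return -math.log(fold, efficiency)
```

Under these assumptions, a more than 160-fold increase corresponds to a ΔΔCt below about -7.32 cycles.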
Statisticians often face the choice between probability models and a paradigm defined by minimising a loss function. Both approaches are useful and, if the loss can be re-cast into a proper probability model, there are many tools to decide which model or loss is more appropriate for the observed data, in the sense of explaining the data's nature. However, when the loss leads to an improper model, there are no principled ways to guide this choice. We address this task by combining the Hyvärinen score, which naturally targets infinitesimal relative probabilities, and general Bayesian updating, which provides a unifying framework for inference on losses and models. Specifically, we propose the H-score, a general Bayesian selection criterion, and prove that it consistently selects the (possibly improper) model closest to the data-generating truth in Fisher's divergence. We also prove that an associated H-posterior consistently learns optimal hyper-parameters featuring in loss functions, including a challenging tempering parameter in generalised Bayesian inference. As salient examples, we consider robust regression and non-parametric density estimation, where popular loss functions define improper models for the data and hence cannot be dealt with using standard model selection tools. These examples illustrate advantages in robustness-efficiency trade-offs and provide a Bayesian implementation for kernel density estimation, opening a new avenue for Bayesian non-parametrics.
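The Hyvärinen score depends on a model only through derivatives of its log density, H(y) = 2 d²/dy² log p(y) + (d/dy log p(y))², which is why it can compare improper models. A minimal sketch for scalar data, using central finite differences on a possibly unnormalized log density, follows. It is not the paper's H-score, which additionally involves general Bayesian posterior quantities; the function names and the step size `h` are illustrative.

```python
def hyvarinen_score(log_density, y, h=1e-3):
    """Hyvärinen score 2 * (log p)''(y) + ((log p)'(y))**2 for scalar y.

    log_density may be unnormalized or even improper: adding a constant
    leaves both derivatives, and hence the score, unchanged.
    """
    f_minus, f_0, f_plus = log_density(y - h), log_density(y), log_density(y + h)
    grad = (f_plus - f_minus) / (2.0 * h)        # central first difference
    lap = (f_plus - 2.0 * f_0 + f_minus) / h**2  # central second difference
    return 2.0 * lap + grad**2


def total_h_score(log_density, data, h=1e-3):
    """Sum of per-observation scores; lower values indicate a better fit."""
    return sum(hyvarinen_score(log_density, y, h) for y in data)
```

For a Gaussian N(μ, σ²), the score has the closed form -2/σ² + (y - μ)²/σ⁴, and the normalizing constant never enters.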
Giulio Bondanelli, Thomas Deneux, Brice Bathellier
et al.
Across sensory systems, complex spatio-temporal patterns of neural activity arise following the onset (ON) and offset (OFF) of stimuli. While ON responses have been widely studied, the mechanisms generating OFF responses in cortical areas have so far not been fully elucidated. We examine here the hypothesis that OFF responses are single-cell signatures of recurrent interactions at the network level. To test this hypothesis, we performed population analyses of two-photon calcium recordings in the auditory cortex of awake mice listening to auditory stimuli, and compared them to linear single-cell and network models. While the single-cell model explained some prominent features of the data, it could not capture the structure across stimuli and trials. In contrast, the network model accounted for the low-dimensional organization of population responses and their global structure across stimuli, where distinct stimuli activated mostly orthogonal dimensions in the neural state-space.
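A linear recurrent network can produce OFF responses as the network relaxes from the state reached at stimulus offset. The following minimal sketch assumes rate dynamics dx/dt = -x + Wx, with the offset state as the initial condition. With a non-normal connectivity matrix W, population activity can transiently grow before decaying, which the same units without recurrence (W = 0) cannot do. The equations, parameter values, and function names are illustrative assumptions, not the fitted model from the study.

```python
import numpy as np


def simulate_linear_network(W, x0, t_end=10.0, dt=1e-3):
    """Euler integration of dx/dt = -x + W x, starting from the state at stimulus offset."""
    W = np.asarray(W, dtype=float)
    A = W - np.eye(W.shape[0])
    x = np.asarray(x0, dtype=float).copy()
    traj = [x.copy()]
    for _ in range(int(round(t_end / dt))):
        x = x + dt * (A @ x)
        traj.append(x.copy())
    return np.array(traj)


def peak_amplification(traj):
    """Largest population-activity norm relative to the initial norm."""
    norms = np.linalg.norm(traj, axis=1)
    return float(norms.max() / norms[0])
```

With W = [[0, 5], [0, 0]], the first unit is driven by the second as it decays, so the activity norm rises to roughly 1.9 times its initial value before returning to zero.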
Heat stress is an increasing threat to rice production worldwide. To investigate the mechanisms of heat tolerance in hybrid rice and their contributions to rice heterosis, we compared the transcriptome of the hybrid rice II YOU 838 (II8) with the transcriptomes of its parents Fu Hui 838 (F8) and II-32A (II3) after heat stress at 42 °C for 0 h, 24 h, 72 h and 120 h. We also performed a proteomic analysis in II8 after heat stress at 42 °C for 24 h. The transcriptome data revealed time-dependent gene expression patterns under the heat stress conditions, and the heat stress response of II8 differed greatly from those of its parents. Gene ontology analysis of the differentially expressed genes, clustered using k-means clustering, showed that most of the up-regulated genes were involved in responses to stimuli, cell communication, and metabolic and transcription factor activities, whereas the down-regulated genes were enriched in photosynthesis and signal transduction. Moreover, 35 unique differentially abundant proteins, including a basic helix-loop-helix transcription factor (bHLH96), calmodulin-binding transcription activator, heat shock protein (Hsp70), and chaperonin 60 (CPN60), were detected in the proteomic analysis of II8 under heat stress. The co-regulatory analysis revealed novel genes and pathways involved in heat tolerance, namely ferredoxin-NADP reductase, peroxidases, mitogen-activated protein kinase kinase kinase, and the heat shock factor (HSF)–Hsp network. Members of the Hsp and HSF families showed over-dominant expression patterns in the hybrid compared with its parents, helping to maintain higher photosynthesis and stronger antioxidant defense systems in the hybrid. Our study suggests that the complex HSF–Hsp regulatory network contributes to the heat tolerance of the hybrid rice.
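The clustering step can be illustrated with a small k-means sketch on standardized expression time courses (one row per gene, one column per time point, such as 0, 24, 72, and 120 h). Row standardization, farthest-first initialization, the choice of k, and the function names are assumptions for illustration; the abstract does not report these settings.

```python
import numpy as np


def zscore_rows(X):
    """Standardize each gene's time course so clustering reflects shape, not level."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0
    return (X - X.mean(axis=1, keepdims=True)) / sd


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm with farthest-first initialization."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # Next center: the point farthest from all centers chosen so far.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(d.argmax())])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

Each resulting cluster groups genes with a shared temporal pattern (for example, steadily induced versus steadily repressed), and gene ontology enrichment would then be run per cluster.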
The inverse Potts problem, inferring a Boltzmann distribution for homologous protein sequences from their single-site and pairwise amino acid frequencies, has recently attracted a great deal of attention in studies of protein structure and evolution. We study regularization and learning methods, and how to tune regularization parameters to correctly infer interactions in Boltzmann machine learning. With $L_2$ regularization for fields, group $L_1$ regularization for couplings is shown to be very effective for sparse couplings in comparison with $L_2$ and $L_1$. The two regularization parameters are tuned so that the sample and ensemble averages of evolutionary energy are equal. Both averages change smoothly and converge, but their learning profiles differ greatly between learning methods. The Adam method is modified to make the step size proportional to the gradient for sparse couplings. By first inferring interactions from protein sequences and then from Monte Carlo samples, we show that the fields and couplings can be well recovered, but that recovering the pairwise correlations at the resolution of a total energy is harder for natural proteins than for protein-like sequences. The selective temperature for folding/structural constraints in protein evolution is also estimated.
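To see why group $L_1$ suits sparse couplings, consider the proximal (shrinkage) step associated with the penalty: it sets an entire q×q coupling block for a site pair to zero at once, whereas $L_2$ only shrinks entries and plain $L_1$ zeroes them one by one. The sketch below, with hypothetical function names and array layout, shows the combined penalty and this block soft-thresholding step. The study itself learns parameters with a modified Adam method, so this illustrates the regularizer, not the authors' optimizer.

```python
import numpy as np


def regularization(h, J, lam_h, lam_J):
    """L2 penalty on fields plus group-L1 penalty on coupling blocks.

    h: (L, q) fields; J: (n_pairs, q, q) couplings, one q x q block per site pair.
    """
    return lam_h * np.sum(h**2) + lam_J * np.sum(np.sqrt(np.sum(J**2, axis=(1, 2))))


def prox_group_l1(J, threshold):
    """Block soft-thresholding: shrink each block's Frobenius norm by `threshold`,
    setting blocks whose norm is at most `threshold` exactly to zero."""
    norms = np.sqrt(np.sum(J**2, axis=(1, 2), keepdims=True))
    safe = np.where(norms > 0, norms, 1.0)  # avoid dividing by zero for empty blocks
    scale = np.clip(1.0 - threshold / safe, 0.0, None)
    return J * scale
```

In a proximal-gradient scheme, `threshold` would be the step size times `lam_J`; weak site pairs then drop out entirely, which is the sparsity pattern expected of contact-like couplings.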