Results for "q-bio.BM"

Showing 20 of ~300980 results · from arXiv, CrossRef

arXiv Open Access 2025
Prot2Chat: Protein LLM with Early-Fusion of Text, Sequence and Structure

Zhicong Wang, Zicheng Ma, Ziqiang Cao et al.

Motivation: Proteins are of great significance in living organisms. However, understanding their functions encounters numerous challenges, such as insufficient integration of multimodal information, a large number of training parameters, limited flexibility of classification-based methods, and the lack of systematic evaluation metrics for protein Q&A systems. To tackle these issues, we propose the Prot2Chat framework. Results: We modified ProteinMPNN to encode protein sequence and structural information in a unified way. We used a large language model (LLM) to encode questions into vectors and developed a protein-text adapter to compress protein information into virtual tokens based on these vectors, achieving the early fusion of text and protein information. Finally, the same LLM reads the virtual tokens and the questions to generate answers. To optimize training efficiency, we froze the encoder and employed Low-Rank Adaptation (LoRA) techniques for the LLM. Experiments on two datasets show that both automated metrics and expert evaluations demonstrate the superior performance of our model, and zero-shot prediction results highlight its generalization ability. The models and codes are available at https://github.com/wangzc1233/Prot2Chat. Contact: zqcao@suda.edu.cn or wangzc025@163.com Key words: Protein Q&A, Early-Fusion, LLM

en cs.LG, cs.AI
arXiv Open Access 2025
The effect of stereochemical constraints on the structural properties of folded proteins

Jack A. Logan, Jacob Sumner, Alex T. Grigas et al.

Proteins are composed of chains of amino acids that fold into complex three-dimensional structures. Several key features, such as the radius of gyration, fraction of core amino acids $f_{\rm core}$, packing fraction $\langle φ\rangle$ of core amino acids, and structure factor $S(q)$ define the structure of folded proteins. It is well-known that folded proteins are compact with a radius of gyration $R_g(N) \sim N^ν$ that obeys power-law scaling with the number of amino acids $N$ and $ν\sim 1/3$, $f_{\rm core} \approx 0.09$, and $\langle φ\rangle \approx 0.55$. We also investigate the {\it internal} scaling of the radius of gyration $R_g(n)$ versus the chemical separation $n$ between amino acids for subchains of length $n$ and show that it does not obey simple power-law scaling with $ν\sim 1/3$. Instead, $R_g(n) \sim n^{ν_{1,2}}$ with a larger exponent $ν_1 > 1/3$ for small $n$ and smaller exponent $ν_{2} < 1/3$ for large $n$. To develop a minimal model for proteins that recapitulates these defining structural features, we carry out collapse simulations for a series of coarse-grained models with increasing complexity. We show that a model, which coarse-grains amino acids into a single spherical backbone bead and several variable-sized side-chain beads and enforces bend- and dihedral-angle constraints for the backbone, recapitulates $R_g(n)$, $f_{\rm core}$, $\langle φ\rangle$, and $S(q)$ for more than $2500$ x-ray crystal structures of proteins.

en cond-mat.soft, q-bio.BM
CrossRef Open Access 2025
A case of angle-closure glaucoma secondary to lens zonular abnormalities in a patient with high myopia

Tao Chen BM, Suhui Zhu, Huizhi Zhang BM et al.

In earlier research on angle-closure glaucoma, high myopia—which is defined by a longer axial length and a deeper anterior chamber depth—has hardly ever been documented. According to a recent study, high myopia and aberrant ciliary body function can cause lens dislocation because the ciliary body's decreased tension on the lens causes angle closure, elevated intraocular pressure, and the need for immediate mydriasis, intraocular pressure reduction, and surgery. It is easy for this illness to be mistaken for primary angle-closure glaucoma. In addition to failing to alleviate the problem, miosis treatment may worsen anterior chamber shallowing and potentially lead to malignant glaucoma. An uncommon case of acute angle-closure glaucoma in a patient with extreme myopia due to lens zonular dysfunction is described in this study.

arXiv Open Access 2024
Protein Structure Prediction in the 3D HP Model Using Deep Reinforcement Learning

Giovanny Espitia, Yui Tik Pang, James C. Gumbart

We address protein structure prediction in the 3D Hydrophobic-Polar lattice model through two novel deep learning architectures. For proteins under 36 residues, our hybrid reservoir-based model combines fixed random projections with trainable deep layers, achieving optimal conformations with 25% fewer training episodes. For longer sequences, we employ a long short-term memory network with multi-headed attention, matching best-known energy values. Both architectures leverage a stabilized Deep Q-Learning framework with experience replay and target networks, demonstrating consistent achievement of optimal conformations while significantly improving training efficiency compared to existing methods.

en cs.LG, cs.AI
CrossRef Open Access 2023
GATA3 as a Blood-Based RNA Biomarker for Idiopathic Parkinson’s Disease

Shubhra Acharya, Andrew I. Lumley, Lu Zhang et al.

Finding novel biomarkers for Parkinson’s disease (PD) is crucial for early disease diagnosis, severity assessment and identifying novel disease-modifying drug targets. Our study aimed at investigating the GATA3 mRNA levels in whole blood samples of idiopathic PD (iPD) patients with different disease severities as a biomarker for iPD. The present study is a cross-sectional, case-control study, with samples obtained from the Luxembourg Parkinson’s cohort (LuxPARK). iPD (N = 319) patients, along with age-matched controls without PD (non-PD; N = 319) were included in this study. Blood GATA3 mRNA expression was measured using quantitative reverse transcription PCR (RT-qPCR) assays. The capacity of GATA3 expression levels to establish the diagnosis of iPD (primary end-point) and assess disease severity (secondary end-point) was determined. The blood levels of GATA3 were significantly lower in iPD patients, compared to non-PD controls (p ≤ 0.001). Logistic regression models showed a significant association of GATA3 expression with iPD diagnosis after adjustment for the confounders (p = 0.005). Moreover, the addition of GATA3 expression to a baseline clinical model improved its iPD diagnosis capacity (p = 0.005). There was a significant association of GATA3 expression levels with the overall disease severity (p = 0.002), non-motor experiences of daily living (nm-EDL; p = 0.003) and sleep disturbances (p = 0.01). Our results suggest that GATA3 expression measured in blood may serve as a novel biomarker and may help in the diagnosis of iPD and assessment of disease severity.

CrossRef Open Access 2023
A Machine Learning Model Based on microRNAs for the Diagnosis of Essential Hypertension

Amela Jusic, Inela Junuzovic, Ahmed Hujdurovic et al.

Introduction: Hypertension is a major and modifiable risk factor for cardiovascular diseases. Essential, primary, or idiopathic hypertension accounts for 90–95% of all cases. Identifying novel biomarkers specific to essential hypertension may help in understanding pathophysiological pathways and developing personalized treatments. We tested whether the integration of circulating microRNAs (miRNAs) and clinical risk factors via machine learning modeling may provide useful information and novel tools for essential hypertension diagnosis and management. Materials and methods: In total, 174 participants were enrolled in the present observational case–control study, among which, there were 89 patients with essential hypertension and 85 controls. A discovery phase was conducted using small RNA sequencing in whole blood samples obtained from age- and sex-matched hypertension patients (n = 30) and controls (n = 30). A validation phase using RT-qPCR involved the remaining 114 participants. For machine learning, 170 participants with complete data were used to generate and evaluate the classification model. Results: Small RNA sequencing identified seven miRNAs downregulated in hypertensive patients as compared with controls in the discovery group, of which six were confirmed with RT-qPCR. In the validation group, miR-210-3p/361-3p/362-5p/378a-5p/501-5p were also downregulated in hypertensive patients. A machine learning support vector machine (SVM) model including clinical risk factors (sex, BMI, alcohol use, current smoker, and hypertension family history), miR-361-3p, and miR-501-5p was able to classify hypertension patients in a test dataset with an AUC of 0.90, a balanced accuracy of 0.87, a sensitivity of 0.83, and a specificity of 0.91. 
While five miRNAs exhibited substantial downregulation in hypertension patients, only miR-361-3p and miR-501-5p, alongside clinical risk factors, were consistently chosen in at least eight out of ten sub-training sets within the SVM model. Conclusions: This study highlights the potential significance of miRNA-based biomarkers in deepening our understanding of hypertension’s pathophysiology and in personalizing treatment strategies. The strong performance of the SVM model highlights its potential as a valuable asset for diagnosing and managing essential hypertension. The model remains to be extensively validated in independent patient cohorts before evaluating its added value in a clinical setting.

arXiv Open Access 2023
DGFN: Double Generative Flow Networks

Elaine Lau, Nikhil Vemgal, Doina Precup et al.

Deep learning is emerging as an effective tool in drug discovery, with potential applications in both predictive and generative models. Generative Flow Networks (GFlowNets/GFNs) are a recently introduced method recognized for the ability to generate diverse candidates, in particular in small molecule generation tasks. In this work, we introduce double GFlowNets (DGFNs). Drawing inspiration from reinforcement learning and Double Deep Q-Learning, we introduce a target network used to sample trajectories, while updating the main network with these sampled trajectories. Empirical results confirm that DGFNs effectively enhance exploration in sparse reward domains and high-dimensional state spaces, both challenging aspects of de-novo design in drug discovery.

en cs.LG, q-bio.BM
arXiv Open Access 2023
Distributed Reinforcement Learning for Molecular Design: Antioxidant case

Huanyi Qin, Denis Akhiyarov, Sophie Loehle et al.

Deep reinforcement learning has successfully been applied for molecular discovery as shown by the Molecule Deep Q-network (MolDQN) algorithm. This algorithm has challenges when applied to optimizing new molecules: training such a model is limited in terms of scalability to larger datasets and the trained model cannot be generalized to different molecules in the same dataset. In this paper, a distributed reinforcement learning algorithm for antioxidants, called DA-MolDQN is proposed to address these problems. State-of-the-art bond dissociation energy (BDE) and ionization potential (IP) predictors are integrated into DA-MolDQN, which are critical chemical properties while optimizing antioxidants. Training time is reduced by algorithmic improvements for molecular modifications. The algorithm is distributed, scalable for up to 512 molecules, and generalizes the model to a diverse set of molecules. The proposed models are trained with a proprietary antioxidant dataset. The results have been reproduced with both proprietary and public datasets. The proposed molecules have been validated with DFT simulations and a subset of them confirmed in public "unseen" datasets. In summary, DA-MolDQN is up to 100x faster than previous algorithms and can discover new optimized molecules from proprietary and public antioxidants.

en cs.LG, cs.DC
CrossRef Open Access 2022
Desalination Using the Capacitive Deionization Technology with Graphite/AC Electrodes: Effect of the Flow Rate and Electrode Thickness

Jhonatan Martinez, Martín Colán, Ronald Catillón et al.

Capacitive deionization (CDI) is an emerging water desalination technology whose principle lies in ion electrosorption at the surface of a pair of electrically charged electrodes. The aim of this study was to obtain the best performance of a CDI cell made of activated carbon as the active material for water desalination. In this work, electrodes of different active layer thicknesses were fabricated from a slurry of activated carbon deposited on graphite sheets. The as-prepared electrodes were characterized by cyclic voltammetry, and their physical properties were also studied using SEM and DRX. A CDI cell was fabricated with nine pairs of electrodes with the highest specific capacitance. The effect of the flow rate on the electrochemical performance of the CDI cell operating in charge–discharge electrochemical cycling was analyzed. We obtained a specific absorption capacity (SAC) of 10.2 mg/g and a specific energetic consumption (SEC) of 217.8 Wh/m3 at a flow rate of 55 mL/min. These results were contrasted with those available in the literature; in addition, other parameters such as Neff and SAR, which are necessary for the characterization and optimal operating conditions of the CDI cell, were analyzed. The findings from this study lay the groundwork for future research and increase the existing knowledge on CDI based on activated carbon electrodes.

arXiv Open Access 2022
Applying Deep Reinforcement Learning to the HP Model for Protein Structure Prediction

Kaiyuan Yang, Houjing Huang, Olafs Vandans et al.

A central problem in computational biophysics is protein structure prediction, i.e., finding the optimal folding of a given amino acid sequence. This problem has been studied in a classical abstract model, the HP model, where the protein is modeled as a sequence of H (hydrophobic) and P (polar) amino acids on a lattice. The objective is to find conformations maximizing H-H contacts. It is known that even in this reduced setting, the problem is intractable (NP-hard). In this work, we apply deep reinforcement learning (DRL) to the two-dimensional HP model. We can obtain the conformations of best known energies for benchmark HP sequences with lengths from 20 to 50. Our DRL is based on a deep Q-network (DQN). We find that a DQN based on long short-term memory (LSTM) architecture greatly enhances the RL learning ability and significantly improves the search process. DRL can sample the state space efficiently, without the need of manual heuristics. Experimentally we show that it can find multiple distinct best-known solutions per trial. This study demonstrates the effectiveness of deep reinforcement learning in the HP model for protein folding.

en cs.LG, q-bio.BM
arXiv Open Access 2021
An enzymatic hormesis box

Michael Grinfeld

We present a simple enzymatic system that is capable of a biphasic response under competitive inhibition. This is arguably the simplest system that can be said to be hormetic.

en q-bio.BM, q-bio.SC
arXiv Open Access 2019
Multi-scale Molecular Simulations on Respiratory Complex I

Ville R. I. Kaila

Complex I (NADH:ubiquinone oxidoreductase) is a redox-driven proton pump that powers synthesis of adenosine triphosphate (ATP) and active transport in most organisms. This gigantic enzyme reduces quinone (Q) to quinol (QH2) in its hydrophilic domain, and transduces the released free energy into pumping of protons across its membrane domain, up to ca. 200 Å away from its active Q-reduction site. Recently resolved molecular structures of complex I from several species have made it possible for the first time to address the energetics and dynamics of the complete complex I using multi-scale methods of computational biochemistry. Here it is described how molecular simulations can provide important mechanistic insights into the function of the remarkable pumping machinery in complex I and stimulate new experiments.

en physics.bio-ph, q-bio.BM
arXiv Open Access 2019
Resource-Efficient Quantum Algorithm for Protein Folding

Anton Robert, Panagiotis Kl. Barkoutsos, Stefan Woerner et al.

Predicting the three-dimensional (3D) structure of a protein from its primary sequence of amino acids is known as the protein folding (PF) problem. Due to the central role of proteins' 3D structures in chemistry, biology and medicine applications (e.g., in drug discovery) this subject has been intensively studied for over half a century. Although classical algorithms provide practical solutions, sampling the conformation space of small proteins, they cannot tackle the intrinsic NP-hard complexity of the problem, even reduced to its simplest Hydrophobic-Polar model. While fault-tolerant quantum computers are still beyond reach for state-of-the-art quantum technologies, there is evidence that quantum algorithms can be successfully used on Noisy Intermediate-Scale Quantum (NISQ) computers to accelerate energy optimization in frustrated systems. In this work, we present a model Hamiltonian with $\mathcal{O}(N^4)$ scaling and a corresponding quantum variational algorithm for the folding of a polymer chain with $N$ monomers on a tetrahedral lattice. The model reflects many physico-chemical properties of the protein, reducing the gap between coarse-grained representations and mere lattice models. We use a robust and versatile optimisation scheme, bringing together variational quantum algorithms specifically adapted to classical cost functions and evolutionary strategies (genetic algorithms), to simulate the folding of the 10 amino acid Angiotensin peptide on 22 qubits. The same method is also successfully applied to the study of the folding of a 7 amino acid neuropeptide using 9 qubits on an IBM Q 20-qubit quantum computer. Bringing together recent advances in building gate-based quantum computers with noise-tolerant hybrid quantum-classical algorithms, this work paves the way towards accessible and relevant scientific experiments on real quantum processors.

en quant-ph, q-bio.BM
arXiv Open Access 2017
Energetic costs, precision, and efficiency of a biological motor in cargo transport

Wonseok Hwang, Changbong Hyeon

Molecular motors play pivotal roles in organizing the interior of cells. A motor efficient in cargo transport would move along cytoskeletal filaments with a high speed and a minimal error in transport distance (or time) while consuming a minimal amount of energy. The travel distance of the motor and its variance are, however, physically constrained by the free energy being consumed. A recently formulated \emph{thermodynamic uncertainty relation} offers a theoretical framework for the energy-accuracy trade-off relation ubiquitous in biological processes. According to the relation, a measure $\mathcal{Q}$, the product between the heat dissipated from a motor and the squared relative error in the displacement, has a minimal theoretical bound ($\mathcal{Q} \geq 2 k_B T$), which is approached when the time trajectory of the motor is maximally regular for a given amount of free energy input. Here, we use $\mathcal{Q}$ to quantify the transport efficiency of biological motors. Analyses on the motility data from several types of molecular motors reveal that $\mathcal{Q}$ is a complex function of ATP concentration and load ($f$). For kinesin-1, $\mathcal{Q}$ approaches the theoretical bound at $f\approx 4$ pN and over a broad range of ATP concentration (1 $μ$M - 10 mM), and is locally minimized at [ATP] $\approx$ 200 $μ$M. In stark contrast, this local minimum vanishes for a mutant that has a longer neck-linker, and the value of $\mathcal{Q}$ is significantly greater, which underscores the importance of molecular structure. Transport efficiencies of the biological motors studied here are semi-optimized under the cellular condition ([ATP] $\approx 1$ mM, $f=0-1$ pN). Our study indicates that among many possible directions of optimization, cytoskeletal motors are designed to operate at a high speed with a minimal error while leveraging their energy resources.

en physics.bio-ph, cond-mat.soft
arXiv Open Access 2014
Mössbauer characterization of an unusual high-spin side-on peroxo-Fe3+ species in the active site of superoxide reductase from Desulfoarculus baarsii. Density functional calculations on related models

Olivier Horner, Jean-Marie Mouesca, Jean-Louis Oddou et al.

Superoxide reductase (SOR) is an Fe protein that catalyzes the reduction of superoxide to give H(2)O(2). Recently, the mutation of the Glu47 residue into alanine (E47A) in the active site of SOR from Desulfoarculus baarsii has allowed the stabilization of an iron-peroxo species when quickly reacted with H(2)O(2) [Mathé et al. (2002) J. Am. Chem. Soc. 124, 4966-4967]. To further investigate this non-heme peroxo-iron species, we have carried out a Mössbauer study of the (57)Fe-enriched E47A SOR from D. baarsii reacted quickly with H(2)O(2). Considering the Mössbauer data, we conclude, in conjunction with the other spectroscopic data available and with the results of density functional calculations on related models, that this species corresponds to a high-spin side-on peroxo-Fe(3+) complex. This is one of the first examples of such a species in a biological system for which Mössbauer parameters are now available: delta(/Fe) = 0.54 (1) mm/s, DeltaE(Q) = -0.80 (5) mm/s, and the asymmetry parameter eta = 0.60 (5) mm/s. The Mössbauer and spin Hamiltonian parameters have been evaluated on a model from the side-on peroxo complex (model 2) issued from the oxidized iron center in SOR from Pyrococcus furiosus, for which structural data are available in the literature [Yeh et al. (2000) Biochemistry 39, 2499-2508]. For comparison, similar calculations have been carried out on a model derived from 2 (model 3), where the [CH(3)-S](1)(-) group has been replaced by the neutral [NH(3)](0) group [Neese and Solomon (1998) J. Am. Chem. Soc. 120, 12829-12848]. Both models 2 and 3 contain a formally high-spin Fe(3+) ion (i.e., with empty minority spin orbitals). We found, however, a significant fraction (approximately 0.6 for 2, approximately 0.8 for 3) of spin (equivalently charge) spread over two occupied (minority spin) orbitals. The quadrupole splitting value for 2 is found to be negative and matches quite well the experimental value.
The computed quadrupole tensors are rhombic in the case of 2 and axial in the case of 3. This difference originates directly from the presence of the thiolate ligand in 2. A correlation between experimental isomer shifts for Fe(3+) mononuclear complexes with computed electron densities at the iron nucleus has been built and used to evaluate the isomer shift values for 2 and 3 (0.56 and 0.63 mm/s, respectively). A significant increase of isomer shift value is found upon going from a methylthiolate to a nitrogen ligand for the Fe(3+) ion, consistent with covalency effects due to the presence of the axial thiolate ligand. Considering that the isomer shift value for 3 is likely to be in the 0.61-0.65 mm/s range [Horner et al. (2002) Eur. J. Inorg. Chem., 3278-3283], the isomer shift value for a high-spin eta(2)-O(2) Fe(3+) complex with an axial thiolate group can be estimated to be in the 0.54-0.58 mm/s range. The occurrence of a side-on peroxo intermediate in SOR is discussed in relation to the recent data published for a side-on peroxo-Fe(3+) species in another biological system [Karlsson et al. (2003) Science 299, 1039-1042].

en physics.chem-ph, q-bio.BM
arXiv Open Access 2014
Virus-Encoded Ribonucleotide Reductases

Claus Bornemann

Ribonucleotide reductases are encoded by many viruses but, in the absence of other enzymes of nucleotide metabolism, they have no obvious use. A look at the enzymes' molecular properties and their possible mutator action may give clues.

en q-bio.BM, q-bio.PE
arXiv Open Access 2011
Protein Models Comparator: Scalable Bioinformatics Computing on the Google App Engine Platform

Paweł Widera, Natalio Krasnogor

The comparison of computer generated protein structural models is an important element of protein structure prediction. It has many uses including model quality evaluation, selection of the final models from a large set of candidates or optimisation of parameters of energy functions used in template-free modelling and refinement. Although many protein comparison methods are available online on numerous web servers, they are not well suited for large scale model comparison: (1) they operate with methods designed to compare actual proteins, not the models of the same protein, (2) the majority of them offer only a single pairwise structural comparison and are unable to scale up to a required order of thousands of comparisons. To bridge the gap between the protein and model structure comparison we have developed the Protein Models Comparator (pm-cmp). To be able to deliver the scalability on demand and handle large comparison experiments the pm-cmp was implemented "in the cloud". Protein Models Comparator is a scalable web application for a fast distributed comparison of protein models with RMSD, GDT_TS, TM-score and Q-score measures. It runs on the Google App Engine (GAE) cloud platform and is a showcase of how the emerging PaaS (Platform as a Service) technology could be used to simplify the development of scalable bioinformatics services. The functionality of pm-cmp is accessible through an API which allows a full automation of the experiment submission and results retrieval. Protein Models Comparator is free software released under the Affero GNU Public Licence and is available with its source code at: http://www.infobiotics.org/pm-cmp. This article presents a new web application addressing the need for a large-scale model-specific protein structure comparison and provides an insight into the GAE (Google App Engine) platform and its usefulness in scientific computing.

en cs.CE, cs.DC
arXiv Open Access 2008
Evolution of the genetic code. Emergence of DNA

Denis A. Semenov

This hypothesis can provide an opportunity to trace logically the process of the emergence of the DNA double helix. In this hypothesis, AT-enrichment is the main factor in the evolution of the DNA double helix from the RNA double helix.

en q-bio.BM, q-bio.PE

Page 2 of 15049