MixRx uses Large Language Models (LLMs) to classify drug combination interactions as Additive, Synergistic, or Antagonistic, given a multi-drug patient history. We evaluate the performance of four models: GPT-2, Mistral Instruct 2.0, and their fine-tuned counterparts. Our results show the potential of such an application, with the fine-tuned Mistral Instruct 2.0 model achieving an average accuracy of 81.5% across standard and perturbed datasets. This paper aims to advance an emerging area of research that evaluates whether LLMs can be used for biological prediction tasks.
The year 2023 marked a significant surge in the exploration of applying large language model (LLM) chatbots, notably ChatGPT, across various disciplines. We surveyed the applications of ChatGPT in bioinformatics and biomedical informatics throughout the year, covering omics, genetics, biomedical text mining, drug discovery, biomedical image understanding, bioinformatics programming, and bioinformatics education. Our survey delineates the current strengths and limitations of this chatbot in bioinformatics and offers insights into potential avenues for future developments.
Analyzing the codon usage frequencies of a sample of 20 plants, for which the codon-anticodon pattern is known, we observe that the hierarchy of the usage frequencies presents an almost "universal" behavior. To explain this behavior, we assume that the codon usage probability results from the sum of two contributions: the first, dominant term is almost "universal" and depends on the codon-anticodon interaction; the second term is local, i.e. it depends on the biological species. The codon-anticodon interaction is written as a spin-spin plus a z-spin term in the formalism of the crystal basis model. From general considerations, in particular from the choice of the signs and some constraints on the parameters defining the interaction, we are able to explain most of the observed data.
An MCMC approach is used to estimate the age-specific mortality rate ratio for German men and women with RA. To construct priors, we calculate a range of admissible values from prevalence and incidence data based on about 60 million people in Germany. Using these priors, we compare the MCMC-estimated mortality with the findings of a recent register study from Denmark. The mortality rate ratio is estimated to be highest at young ages (4.0 and 3.5 for men and women aged 17.5 years, respectively) and to decline towards higher ages (1.0 and 1.2 for men and women aged 92.5 years, respectively). The lengths of the credibility intervals decrease from younger towards older ages.
The discovery of biomarker sets for a targeted pathway is a challenging problem in biomedicine, one that is computationally prohibitive for classical algorithms because of the massive search space. Here, I present a quantum algorithm named QuantAnts Machine to address the task. The proposed algorithm is a quantum analog of classical Ant Colony Optimization (ACO). We build a multi-domain mixture from genetic networks using representation theory, enabling the search for biomarkers across the multiple modalities of the human genome. Although the proposed model can be generalized, we investigate RAS mutational activation in this work. In the end, QuantAnts Machine discovers little-known biomarkers in the clinically associated domain for the RAS-activation pathway, including COL5A1, COL5A2, CCT5, MTSS1 and NCAPD2. In addition, the model suggests several therapeutic targets, such as JUP, CD9, CD34 and CD74.
Here we examine the evolution of beta-2 microglobulin in terms of its hydropathic shapes, a theoretical construct that has revealed important trends. The dynamics of many proteins are largely driven by interactions between the protein itself and the thin water film that covers it. β2m constitutes the basic building unit of the immunoglobulin superfamily; the evolution of its amino acid sequences from chickens to mice to humans provides new information about its multiple functions. Our hydrodynamic method involves concepts of topological shape evolution towards a critical point for optimized functions. The results are in excellent agreement with experiment for the details of the mouse-human evolution, as well as both the dangerous natural amyloid aggregation mutation D76N, and six other DN test mutations.
Mark R Baker, Elizabeth L Hawthorne, Jessica R Rogge
We present an open source model that allows quantitative prediction of the effects of testing on the rate of spread of COVID-19 described by R, the reproduction number, and on the degree of quarantine, isolation and lockdown required to limit it. The paper uses the model to quantify the outcomes of different test types and regimes, and to identify strategies and tests that can reduce the rate of spread and R value by a factor of between 1.67 and 33.3, reducing it to between 60% and 3% of the initial value.
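The relation between a reduction factor and the remaining fraction of R quoted above is simple arithmetic and can be checked with a short sketch. This is only an illustration, not the authors' model: the function name `remaining_fraction` and the starting value `initial_r = 3.0` are hypothetical and do not come from the paper.

```python
def remaining_fraction(reduction_factor: float) -> float:
    """Fraction of the initial R that remains after dividing R by reduction_factor."""
    if reduction_factor <= 0:
        raise ValueError("reduction_factor must be positive")
    return 1.0 / reduction_factor


# Hypothetical starting reproduction number, chosen only for illustration.
initial_r = 3.0

# The two reduction factors quoted in the abstract.
for factor in (1.67, 33.3):
    fraction = remaining_fraction(factor)
    print(f"factor {factor}: R falls to {fraction:.1%} of its initial value, "
          f"i.e. from {initial_r} to {initial_r * fraction:.2f}")
```

Dividing R by 1.67 leaves roughly 60% of the initial value and dividing by 33.3 leaves roughly 3%, which matches the range stated in the abstract.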
CoV2019 has evolved to be much more dangerous than CoV2003. Experiments suggest that structural rearrangements dramatically enhance CoV2019 activity. We identify a new first stage of infection which precedes structural rearrangements by using biomolecular evolutionary theory to identify sequence differences enhancing viral attachment rates. We find a small cluster of mutations which show that CoV-2 has a new feature that promotes much stronger viral attachment and enhances contagiousness. The extremely dangerous dynamics of human coronavirus infection are a dramatic example of the evolutionary approach of self-organized networks to criticality. It may favor a very successful vaccine. The identified mutations can be used to test the present theory experimentally.
The complex physicochemical structures and chemical reactions in living organisms have some common features: (1) Life processes take place in the cytosol of cells, which, from a physicochemical point of view, is an emulsion of biomolecules in a dilute aqueous suspension. (2) All living systems are homochiral with respect to the units of amino acids and carbohydrates, but (some) proteins are chirally unstable in the cytosol. (3) Living organisms are mortal. Together, these three common features provide a prerequisite for prebiotic self-assembly at the start of abiogenesis. Here we argue that, taken together, they indicate that the prebiotic self-assembly of structures and reactions took place in a more saline environment, in which the homochirality of proteins could be not only obtained but also preserved. Such a more saline environment for the prebiotic self-assembly of organic molecules and the establishment of biochemical reactions could have been hydrothermal vents.
The CoHSI (Conservation of Hartley-Shannon Information) distribution is at the heart of a wide class of discrete systems, defining (amongst other properties) the length distribution of their components. Discrete systems such as the known proteome, computer software and texts are all known to fit this distribution accurately. In a previous paper, we explored the properties of this distribution in detail. Here we use these properties to show why the average length of components in general, and of proteins in particular, is highly conserved, howsoever measured, demonstrating this on various aggregations of proteins taken from the UniProt database. We go on to define departures from this equilibrium state, identifying fine structure in the average length of eukaryotic proteins that results from evolutionary processes.
Branko Dragovich, Andrei Yu. Khrennikov, Nataša Ž. Mišić
An ultrametric approach to the genetic code and the genome is considered and developed. The $p$-adic degeneracy of the genetic code is pointed out. An ultrametric tree of the codon space is presented. It is shown that codons and amino acids can be treated as $p$-adic ultrametric networks. An ultrametric modification of the Hamming distance is defined, and its possible usefulness is noted. The ultrametric approach with $p$-adic distance is an attractive and promising direction for the investigation of bioinformation.
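To make the notion of a $p$-adic distance between codons concrete, the following minimal sketch computes a 5-adic distance and checks the strong triangle inequality that defines an ultrametric. The abstract does not specify an encoding, so the digit assignment C=1, A=2, U=3, G=4, the choice $p = 5$, and the convention that the first nucleotide is the lowest-order digit are all assumptions made here for illustration. The function names are hypothetical.

```python
from itertools import product

# Assumed encoding (not stated in the abstract): one nonzero base-5 digit per nucleotide.
DIGIT = {"C": 1, "A": 2, "U": 3, "G": 4}
P = 5


def codon_to_int(codon: str) -> int:
    """Encode a codon as d0 + d1*5 + d2*25, with the first nucleotide as the lowest digit."""
    return sum(DIGIT[base] * P**i for i, base in enumerate(codon))


def p_adic_valuation(n: int, p: int) -> int:
    """Largest k such that p**k divides the nonzero integer n."""
    if n == 0:
        raise ValueError("the valuation of 0 is infinite")
    n = abs(n)
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k


def p_adic_distance(a: int, b: int, p: int = P) -> float:
    """p-adic distance |a - b|_p = p**(-valuation), with distance 0 for equal arguments."""
    if a == b:
        return 0.0
    return float(p) ** -p_adic_valuation(a - b, p)


def is_ultrametric(points) -> bool:
    """Check d(a, c) <= max(d(a, b), d(b, c)) for every triple of points."""
    points = list(points)
    return all(
        p_adic_distance(a, c) <= max(p_adic_distance(a, b), p_adic_distance(b, c))
        for a, b, c in product(points, repeat=3)
    )


# All 64 codons under the assumed encoding.
ALL_CODONS = ["".join(bases) for bases in product("CAUG", repeat=3)]

for left, right in (("CUU", "CUC"), ("CUU", "CCU"), ("CUU", "AUU")):
    d = p_adic_distance(codon_to_int(left), codon_to_int(right))
    print(f"d({left}, {right}) = {d}")
```

Under this assumed encoding, two codons that share their first two nucleotides are at distance $5^{-2}$, codons that share only the first nucleotide are at distance $5^{-1}$, and codons that differ in the first nucleotide are at distance 1. Calling `is_ultrametric(codon_to_int(c) for c in ALL_CODONS)` verifies the strong triangle inequality over all 64 codons.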
We study the build-up of complexity using the example of 1 kg of matter in different forms. We start with the simplest example, ideal gases, and then continue with more complex chemical, biological, living, social and technical structures. We assess the complexity of these systems quantitatively, based on their entropy. We present a method to attribute the same entropy to known physical systems and to complex organic molecules up to DNA. The important steps in this program and the basic obstacles are discussed.
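For the ideal-gas starting point, the entropy of 1 kg of a monatomic gas can be estimated with the standard Sackur-Tetrode equation. The sketch below is not taken from the paper, whose own method is not given in the abstract. The choice of helium, the temperature of 298.15 K and the pressure of 1 bar are assumptions made purely for illustration, and the function name is hypothetical.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K (exact SI value)
H = 6.62607015e-34    # Planck constant, J s (exact SI value)
N_A = 6.02214076e23   # Avogadro constant, 1/mol (exact SI value)


def sackur_tetrode_entropy(mass_kg: float, molar_mass_kg_per_mol: float,
                           temperature_k: float, pressure_pa: float) -> float:
    """Entropy in J/K of a monatomic ideal gas from the Sackur-Tetrode equation."""
    n_particles = mass_kg / molar_mass_kg_per_mol * N_A
    particle_mass = molar_mass_kg_per_mol / N_A
    # Ideal gas law: volume available per particle, V/N = k_B * T / P.
    volume_per_particle = K_B * temperature_k / pressure_pa
    # Thermal de Broglie wavelength of a single particle.
    thermal_wavelength = H / math.sqrt(2.0 * math.pi * particle_mass * K_B * temperature_k)
    return n_particles * K_B * (math.log(volume_per_particle / thermal_wavelength**3) + 2.5)


# Assumed example system: 1 kg of helium at 298.15 K and 1 bar.
HELIUM_MOLAR_MASS = 4.002602e-3  # kg/mol
entropy = sackur_tetrode_entropy(1.0, HELIUM_MOLAR_MASS, 298.15, 1.0e5)
print(f"Entropy of 1 kg of helium: {entropy:.0f} J/K")
```

The formula applies only to a dilute monatomic gas. Molecular gases and the more complex systems named in the abstract would need additional contributions that this sketch does not attempt.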
The evolution of terrestrial and aquatic wild type (WT) globins is dominated by changes in two proximate-distal histidine ligand exit channels, here monitored quantitatively by hydropathic waves. These waves reveal allometric functional features inaccessible to single amino acid stereochemical contact models, and even to very large all-atom Newtonian simulations. The evolutionary differences in these features between myoglobin and neuroglobin are related to the two oxidation channels through hydropathic wave analysis, which identifies subtle interspecies functional differences inaccessible to traditional size and metabolic scaling studies. Our analysis involves dynamic synchronization of allometric interactions across entire globins.
Antara Sengupta, Sk. Sarif Hassan, Pabitra Pal Choudhury
Proteins are macromolecules that rarely act alone; they need to interact with other proteins to function. Numerous factors can regulate the interactions between proteins [4]. In the present study we aim to understand the Protein-Protein Interactions (PPIs) of two proteins, ABCB11 and ADA, from a quantitative point of view. Another major aim is to study the factors that regulate the PPIs, and thus to distinguish these PPIs with proper quantification across the two species Homo sapiens and Mus musculus, in order to learn how one protein interacts with different sets of proteins in different species.
To truly eliminate Cartesian ghosts from the science of consciousness, we must describe consciousness as an aspect of the physical. Integrated Information Theory states that consciousness arises from intrinsic information generated by dynamical systems; however, existing formulations of this theory are not applicable to standard models of fundamental physical entities. Modern physics has shown that fields are fundamental entities, and in particular that the electromagnetic field is fundamental. Here I hypothesize that consciousness arises from information intrinsic to fundamental fields. This hypothesis unites fundamental physics with what we know empirically about the neuroscience underlying consciousness, and it bypasses the need to consider quantum effects.
The mechanisms underlying major aspects of the human brain remain a mystery. It is unknown how verbal episodic memory is formed and integrated with sensory episodic memory. There is no consensus on the function and nature of dreaming. Here we present a theory for governing neural activity in the human brain. The theory describes the mechanisms for building memory traces for entities and explains how verbal memory is integrated with sensory memory. We infer that a core function of dreaming is to move charged particles such as calcium ions from the hippocampus to association areas and then to primary areas. We link high calcium ion concentrations to Alzheimer's disease. We present a more precise definition of consciousness. Our results are a step forward in understanding the function and health of the human brain and provide the public with ways to keep the brain healthy.
The genetic code is the connection between 64 codons, which are the building blocks of genes, and 20 amino acids, which are the building blocks of proteins. In addition to coding for amino acids, a few codons code for the stop signal, which occurs at the end of genes, i.e. it terminates the process of protein synthesis. This article is a review of simple modelling of the genetic code and related subjects using the concept of p-adic distance. It also contains some new results. In particular, the article presents an appropriate structure of the codon space, and the degeneracy and possible evolution of the genetic code. p-Adic modelling of the genetic code is viewed as the first step in the further application of p-adic tools in the information sector of life science.