Results for "Biotechnology"

Showing 20 of ~1000864 results · from arXiv, DOAJ, Semantic Scholar, CrossRef

arXiv Open Access 2026
The hidden structure of innovation networks

Lorenzo Emer, Anna Gallo, Mattia Marzi et al.

Innovation emerges from complex collaboration patterns - among inventors, firms, or institutions. However, not much is known about the overall mesoscopic structure around which inventive activity self-organizes. Here, we tackle this problem by employing patent data to analyze both individual (co-inventorship) and organization (co-ownership) networks in three strategic domains (artificial intelligence, biotechnology and semiconductors). We characterize the mesoscale structure (in terms of clusters) of each domain by comparing two alternative methods: a standard baseline - modularity maximization - and one based on the minimization of the Bayesian Information Criterion, within the Stochastic Block Model and its degree-corrected variant. We find that, across sectors, inventor networks are denser and more clustered than organization ones - consistent with the presence of small recurrent teams embedded into broader institutional hierarchies - whereas organization networks have neater hierarchical role-based structures, with few bridging firms coordinating the most peripheral ones. We also find that the discovered meso-structures are connected to innovation output. In particular, Lorenz curves of forward citations show a pervasive inequality in technological influence: across sectors and methods, both inventor (especially) and organization networks consistently show high levels of concentration of citations in a few of the discovered clusters. Our results demonstrate that the baseline modularity-based method may not be capable of fully capturing the way collaborations drive the spreading of inventive impact across technological domains. This is due to the presence of local hierarchies that call for more refined tools based on Bayesian inference.

en econ.GN, physics.soc-ph
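
The abstract above reports Lorenz curves of forward citations across the discovered clusters. As a minimal sketch of how such a concentration measure can be computed, the following Python code builds a Lorenz curve and the corresponding Gini coefficient from per-cluster citation counts. The function names and the example counts are hypothetical, and this is not the authors' pipeline.

```python
def lorenz_curve(values):
    """Return Lorenz-curve points as (cumulative share of clusters,
    cumulative share of citations), starting at (0, 0)."""
    xs = sorted(values)  # ascending: least-cited clusters first
    total = sum(xs)
    if not xs or total <= 0:
        raise ValueError("need at least one positive value")
    n = len(xs)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, v in enumerate(xs, start=1):
        running += v
        points.append((i / n, running / total))
    return points


def gini(values):
    """Gini coefficient as 1 - 2 * (area under the Lorenz curve),
    with the area computed by the trapezoidal rule."""
    pts = lorenz_curve(values)
    area = sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    return 1 - 2 * area


# Hypothetical citation counts for four clusters (illustrative only).
citations_per_cluster = [0, 0, 0, 10]
print(lorenz_curve(citations_per_cluster))
print(gini(citations_per_cluster))
```

A value near 0 indicates citations spread evenly over clusters, while values approaching 1 indicate concentration in a few clusters.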
arXiv Open Access 2025
Mitigating Hallucinations in Zero-Shot Scientific Summarisation: A Pilot Study

Imane Jaaouine, Ross D. King

Large language models (LLMs) produce context inconsistency hallucinations, which are LLM generated outputs that are misaligned with the user prompt. This research project investigates whether prompt engineering (PE) methods can mitigate context inconsistency hallucinations in zero-shot LLM summarisation of scientific texts, where zero-shot indicates that the LLM relies purely on its pre-training data. Across eight yeast biotechnology research paper abstracts, six instruction-tuned LLMs were prompted with seven methods: a baseline prompt, two levels of increasing instruction complexity (PE-1 and PE-2), two levels of context repetition (CR-K1 and CR-K2), and two levels of random addition (RA-K1 and RA-K2). Context repetition involved the identification and repetition of K key sentences from the abstract, whereas random addition involved the repetition of K randomly selected sentences from the abstract, where K is 1 or 2. A total of 336 LLM-generated summaries were evaluated using six metrics: ROUGE-1, ROUGE-2, ROUGE-L, BERTScore, METEOR, and cosine similarity, which were used to compute the lexical and semantic alignment between the summaries and the abstracts. Four hypotheses on the effects of prompt methods on summary alignment with the reference text were tested. Statistical analysis on 3744 collected datapoints was performed using bias-corrected and accelerated (BCa) bootstrap confidence intervals and Wilcoxon signed-rank tests with Bonferroni-Holm correction. The results demonstrated that CR and RA significantly improve the lexical alignment of LLM-generated summaries with the abstracts. These findings indicate that prompt engineering has the potential to impact hallucinations in zero-shot scientific summarisation tasks.

en cs.CL, cs.AI
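
The abstract above mentions Wilcoxon signed-rank tests with Bonferroni-Holm correction. The step-down Holm adjustment itself is short enough to sketch directly. The function name and the example p-values below are hypothetical and are not taken from the paper.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the original order.

    The smallest p-value is multiplied by m, the next by m - 1, and so on;
    a running maximum keeps the adjusted values monotone, and each value
    is capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Illustrative p-values for three hypothetical comparisons.
print(holm_adjust([0.01, 0.04, 0.03]))
```

A hypothesis is rejected at level alpha when its adjusted p-value is at most alpha.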
arXiv Open Access 2025
Generative AI for Biosciences: Emerging Threats and Roadmap to Biosecurity

Zaixi Zhang, Souradip Chakraborty, Amrit Singh Bedi et al.

The rapid adoption of generative artificial intelligence (GenAI) in the biosciences is transforming biotechnology, medicine, and synthetic biology. Yet this advancement is intrinsically linked to new vulnerabilities, as GenAI lowers the barrier to misuse and introduces novel biosecurity threats, such as generating synthetic viral proteins or toxins. These dual-use risks are often overlooked, as existing safety guardrails remain fragile and can be circumvented through deceptive prompts or jailbreak techniques. In this Perspective, we first outline the current state of GenAI in the biosciences and emerging threat vectors ranging from jailbreak attacks and privacy risks to the dual-use challenges posed by autonomous AI agents. We then examine urgent gaps in regulation and oversight, drawing on insights from 130 expert interviews across academia, government, industry, and policy. A large majority (≈76%) expressed concern over AI misuse in biology, and 74% called for the development of new governance frameworks. Finally, we explore technical pathways to mitigation, advocating a multi-layered approach to GenAI safety. These defenses include rigorous data filtering, alignment with ethical principles during development, and real-time monitoring to block harmful requests. Together, these strategies provide a blueprint for embedding security throughout the GenAI lifecycle. As GenAI becomes integrated into the biosciences, safeguarding this frontier requires an immediate commitment to both adaptive governance and secure-by-design technologies.

en cs.CR, q-bio.BM
DOAJ Open Access 2025
Genome-wide identification of potassium transporters and channels in Malus domestica genome

Muhammad Waqas, Habibullah Nadeem, Ayshah Aysh Alrashidi et al.

Potassium (K+) is an essential nutrient for plants. It contributes to most physiological and biochemical pathways for plant metabolism, growth, and development. It is the most available plant nutrient, comprising 10–15% of plant weight. Plants have a sophisticated system of K+ transporters and channels for distribution in the plant body. Apple is one of the most consumed fruits in the world. Its fruit quality and yield are positively affected by K+. However, limited information is available about K+ transport systems in apple. In this study, 47 candidate genes (26 K+ transporters and 21 K+ channels) were identified in the apple (Malus domestica) genome. Phylogenetic comparisons with other plants (Glycine max, Arabidopsis thaliana, and Oryza sativa) indicated that the K+ transport system is highly conserved among different plants. Analysis of gene structure showed specific intron and exon patterns for these gene families. Transcriptomic data analysis and RT-qPCR demonstrated significant variations in the transcript abundance of these genes in response to abiotic stresses. This study represents the first report on the K+ transport system in apple and may therefore act as a starting point for further functional characterization.

Medicine, Science
DOAJ Open Access 2025
Optimization and comparative study of Bacillus consortia for cellulolytic potential and cellulase enzyme activity

Chukwuma Ogechukwu Bose, Rafatullah Mohd, Kapoor Riti Thapar et al.

Lignocellulosic biomass, owing to its recalcitrant nature, requires a consortium of enzymes for its breakdown. The present study deals with the isolation of cellulolytic bacterial strains from landfill leachate collected from the Pulau Burung landfill site in Penang, Malaysia; consortia were then constructed to test their cellulolytic efficiency. The dinitro salicylate method was used to estimate enzyme activity, and the consortia were compared with promising individual bacterial strains. The combined potential of promising bacterial strains was optimized under varying experimental conditions to detect their maximum cellulolytic activity. The results showed that eight bacterial strains exhibited hydrolytic activities, and these were identified by 16S rDNA sequence as Bacillus subtilis, Bacillus pumilus, Bacillus proteolyticus, Bacillus paramycoides, Bacillus cereus, Bacillus altitudinis, Bacillus niacin, and Bacillus thuringiensis. Consortium A included Bacillus proteolyticus, Bacillus subtilis, Bacillus pumilus, and Bacillus paramycoides and showed a high thermophilic inclination, with an optimal temperature of 45°C at pH 6 and the highest cellulase activity of 0.90 U/ml. Consortium B included Bacillus cereus, Bacillus altitudinis, Bacillus niacin, and Bacillus thuringiensis and showed a cellulase activity of 0.78 U/ml at 38°C and pH 6. The results indicated the significant potential of these Bacillus strains and consortia in the breakdown of cellulose into useful end products. The consortia further showed that a synergistic relationship was more favourable for bioconversion processes.

Biology (General)
DOAJ Open Access 2025
TRPML1 ion channel promotes HepaRG cell differentiation under simulated microgravity conditions

Huancai Fan, Dongyuan Lü, Zheng Lu et al.

Stem cell differentiation must be regulated by intricate and complex interactions between cells and their surrounding environment, ensuring normal morphology of organs and tissues such as the liver. Though it is well acknowledged that microgravity provides necessary mechanical force signals for cell fate, how microgravity affects growth, differentiation, and communication is still largely unknown due to the lack of real experimental scenarios and reproducibility tools. Here, a Rotating Flat Chamber (RFC) was used to simulate microgravity effects on the ground and to study how they affect the differentiation of HepaRG cells (hepatic progenitor cells). Unexpectedly, the results show that RFC conditions could promote HepaRG cell differentiation, with increased expression of Alpha-fetoprotein (AFP), albumin (ALB), and Recombinant Cytokeratin 18 (CK18). Screening a series of mechanical receptors showed that the ion channel TRPML1 was critical for the differentiation-promoting effect under RFC conditions. Once TRPML1 was activated by simulated microgravity effects, the concentration of lysosomal calcium ions increased and activated the Wnt/β-catenin signaling pathway, which finally led to enhanced differentiation of HepaRG cells. In addition, the cytoskeleton was remodeled under RFC conditions to influence the expression of PI(3,5)P2, which is the best-known activator of TRPML1. In summary, our findings establish a mechanism by which simulated microgravity promotes the differentiation of HepaRG cells through the TRPML1 signaling pathway, which provides a potential target for the regulation of hepatic stem/progenitor cell differentiation and embryonic liver development under real microgravity conditions.

Biotechnology, Physiology
arXiv Open Access 2024
Normalized topological indices discriminate between architectures of branched macromolecules

Domen Vaupotič, Jules Morand, Luca Tubiana et al.

Branching architecture characterizes numerous systems, ranging from synthetic (hyper)branched polymers and biomolecules such as lignin, amylopectin, and nucleic acids to tracheal and neuronal networks. Its ubiquity reflects the many favourable properties that arise because of it. For instance, branched macromolecules are spatially compact and have a high surface functionality, which impacts their phase characteristics and self-assembly behaviour, among others. The relationship between branching and physical properties has been studied by mapping macromolecules to mathematical trees whose architecture can be characterized using topological indices. These indices, however, do not allow for a comparison of macromolecules that map to trees of different size, be it due to different mapping procedures or differences in their molecular weight. To alleviate this, we introduce a novel normalization of topological indices using estimates of their probability density functions. We determine two optimal normalized topological indices and construct a phase space that enables a robust discrimination between different architectures of branched macromolecules. We demonstrate the necessity of such a phase space on two practical applications, one being ribonucleic acid (RNA) molecules with various branching topologies and the other different methods of coarse-graining branched macromolecules. Our approach can be applied to any type of branched molecules and extended as needed to other topological indices, making it useful across a wide range of fields where branched molecules play an important role, including polymer physics, green chemistry, bioengineering, biotechnology, and medicine.

en cond-mat.soft
arXiv Open Access 2024
Data Augmentation Scheme for Raman Spectra with Highly Correlated Annotations

Christoph Lange, Isabel Thiele, Lara Santolin et al.

In biotechnology, Raman spectroscopy is rapidly gaining popularity as a process analytical technology (PAT) that measures cell densities and substrate and product concentrations. As it records vibrational modes of molecules, it provides that information non-invasively in a single spectrum. Typically, partial least squares (PLS) is the model of choice to infer information about variables of interest from the spectra. However, biological processes are known for their complexity, where convolutional neural networks (CNN) present a powerful alternative. They can handle non-Gaussian noise and account for beam misalignment, pixel malfunctions or the presence of additional substances. However, they require a lot of data during model training, and they pick up non-linear dependencies in the process variables. In this work, we exploit the additive nature of spectra in order to generate additional data points from a given dataset that have statistically independent labels so that a network trained on such data exhibits low correlations between the model predictions. We show that training a CNN on these generated data points improves the performance on datasets where the annotations do not bear the same correlation as the dataset that was used for model training. This data augmentation technique enables us to reuse spectra as training data for new contexts that exhibit different correlations. The additional data allows for building a better and more robust model. This is of interest in scenarios where large amounts of historical data are available but are currently not used for model training. We demonstrate the capabilities of the proposed method using synthetic spectra of Ralstonia eutropha batch cultivations to monitor substrate, biomass and polyhydroxyalkanoate (PHA) biopolymer concentrations during the experiments.

en cs.LG, q-bio.QM
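
The abstract above relies on the additive nature of spectra to generate new labelled data points. The sketch below illustrates only that additivity assumption: if a spectrum is linear in the concentrations, a weighted sum of two measured spectra is a valid spectrum for the same weighted sum of their labels. The function name, the random weighting, and the sampling of pairs are all assumptions made for illustration. This is not the authors' augmentation scheme, and on its own it does not guarantee statistically independent labels.

```python
import random


def augment_additive(spectra, labels, n_new, seed=0):
    """Create n_new synthetic (spectrum, label) pairs by linearly
    combining two randomly chosen samples with random weights.

    spectra: list of equal-length lists of intensities
    labels:  list of equal-length lists of concentrations
    Assumes intensities are additive (linear) in the concentrations.
    """
    rng = random.Random(seed)
    new_spectra, new_labels = [], []
    for _ in range(n_new):
        i = rng.randrange(len(spectra))
        j = rng.randrange(len(spectra))
        a, b = rng.random(), rng.random()  # hypothetical mixing weights
        new_spectra.append([a * x + b * y
                            for x, y in zip(spectra[i], spectra[j])])
        new_labels.append([a * u + b * v
                           for u, v in zip(labels[i], labels[j])])
    return new_spectra, new_labels


# Toy data: 2 concentrations, 3 wavenumber channels (values are made up).
pure_components = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0]]
toy_labels = [[1.0, 2.0], [0.5, 0.0], [2.0, 1.0]]
toy_spectra = [[sum(c[k] * pure_components[k][w] for k in range(2))
                for w in range(3)] for c in toy_labels]
aug_spectra, aug_labels = augment_additive(toy_spectra, toy_labels, n_new=5)
print(len(aug_spectra), len(aug_labels))
```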
arXiv Open Access 2024
Prediction by Machine Learning Analysis of Genomic Data Phenotypic Frost Tolerance in Perccottus glenii

Lilin Fan, Xuqing Chai, Zhixiong Tian et al.

Analysis of the genome sequence of Perccottus glenii, the only fish known to possess freeze tolerance, holds significant importance for understanding how organisms adapt to extreme environments. Traditional biological analysis methods are time-consuming and have limited accuracy. To address these issues, we employ machine learning techniques to analyze the gene sequences of Perccottus glenii, with Neodontobutis hainanens as a comparative group. Firstly, we propose five gene sequence vectorization methods and a method for handling ultra-long gene sequences. We conducted a comparative study of three vectorization methods (ordinal encoding, one-hot encoding, and K-mer encoding) to identify the optimal encoding method. Secondly, we constructed four classification models: Random Forest, LightGBM, XGBoost, and Decision Tree. The dataset used by these classification models was extracted from the National Center for Biotechnology Information database, and we vectorized the sequence matrices using the optimal encoding method, K-mer. The Random Forest model, which is the optimal model, achieved a classification accuracy of up to 99.98%. Lastly, we utilized SHAP values to conduct an interpretable analysis of the optimal classification model. Through ten-fold cross-validation and the AUC metric, we identified the top 10 features that contribute the most to the model's classification accuracy. This demonstrates that machine learning methods can effectively replace traditional manual analysis in identifying genes associated with the freeze tolerance phenotype in Perccottus glenii.

en cs.LG
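
The abstract above names K-mer encoding as the best-performing vectorization. As a minimal sketch of what a K-mer count vector looks like, the following code counts overlapping substrings of length k over a fixed vocabulary. The function name, the choice k = 3, and the example sequence are hypothetical. The paper's exact encoding and its handling of ultra-long sequences are not described in the abstract and are not reproduced here.

```python
from collections import Counter
from itertools import product


def kmer_vector(sequence, k=3, alphabet="ACGT"):
    """Count overlapping k-mers in `sequence` and return the counts in a
    fixed vocabulary order (all alphabet**k combinations, in product order).
    K-mers containing other symbols (e.g. 'N') are ignored."""
    vocab = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(sequence[i:i + k]
                     for i in range(len(sequence) - k + 1))
    return [counts.get(kmer, 0) for kmer in vocab]


# Made-up sequence for illustration.
vec = kmer_vector("ACGTACGT", k=3)
print(len(vec), sum(vec))
```

The resulting fixed-length vector can be passed to any standard classifier, regardless of the original sequence length.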
arXiv Open Access 2024
FoodMem: Near Real-time and Precise Food Video Segmentation

Ahmad AlMughrabi, Adrián Galán, Ricardo Marques et al.

Food segmentation, including in videos, is vital for addressing real-world health, agriculture, and food biotechnology issues. Current limitations lead to inaccurate nutritional analysis, inefficient crop management, and suboptimal food processing, impacting food security and public health. Improving segmentation techniques can enhance dietary assessments, agricultural productivity, and the food production process. This study introduces the development of a robust framework for high-quality, near-real-time segmentation and tracking of food items in videos, using minimal hardware resources. We present FoodMem, a novel framework designed to segment food items from video sequences of 360-degree unbounded scenes. FoodMem can consistently generate masks of food portions in a video sequence, overcoming the limitations of existing semantic segmentation models, such as flickering and prohibitive inference speeds in video processing contexts. To address these issues, FoodMem leverages a two-phase solution: a transformer segmentation phase to create initial segmentation masks and a memory-based tracking phase to monitor food masks in complex scenes. Our framework outperforms current state-of-the-art food segmentation models, yielding superior performance across various conditions, such as camera angles, lighting, reflections, scene complexity, and food diversity. This results in reduced segmentation noise, elimination of artifacts, and completion of missing segments. Here, we also introduce a new annotated food dataset encompassing challenging scenarios absent in previous benchmarks. Extensive experiments conducted on MetaFood3D, Nutrition5k, and Vegetables & Fruits datasets demonstrate that FoodMem enhances the state-of-the-art by 2.5% mean average precision in food video segmentation and is 58 x faster on average.

en cs.CV
DOAJ Open Access 2024
Development of functional cookies from wheat-pumpkin seed based composite flour

Feriehiwote Weldeyohanis Gebremariam, Eneyew Tadesse Melaku, Venkatesa Prabhu Sundramurthy et al.

Developing high-quality cookies requires attention to factors where even seemingly small changes can affect taste, texture, and nutritional value. This study aimed to investigate the effect of blending refined wheat flour with pumpkin seed flour on cookie properties such as antioxidant activity and thermal and oxidative stability. Roasted pumpkin seeds with particle size below 500 μm were blended with wheat flour at different ratios (BR) and baked at selected pre-determined temperatures (T) and time durations (TD). The combined effect of BR, T, and TD on cookie development was studied by varying them over the ranges 6–15 %, 180–200 °C, and 8–12 min, respectively. The process was then modelled and examined using numerical optimization to achieve a highly acceptable product. The optimal conditions for BR, T, and TD were 12.87 %, 186 °C, and 9.5 min, respectively, yielding cookies with an overall acceptance score of 8, protein content 14.28 %, fat 17.85 %, ash 2.23 %, moisture 2.46 %, fiber 2.38 %, and total color difference 12.01. The optimized cookies (OCs) were found to have higher protein (11.49–14.28 %), fiber (0.93–2.41 %), ash (2.19–1.77 %), total antioxidant activity (38.7158–43.1860 %), oxidative stability (28.61–51.24 h), Zn (1.42–2.63 mg/100g), and Fe (2.12–3.20 mg/100g) content as compared to the control. In short, the study provides optimized processing conditions for developing high-quality cookies with improved nutritional value and comparable overall acceptability.

Science (General), Social sciences (General)
arXiv Open Access 2023
Towards a modeling, optimization and predictive control framework for fed-batch metabolic cybergenetics

Sebastián Espinel-Ríos, Bruno Morabito, Johannes Pohlodek et al.

Biotechnology offers many opportunities for the sustainable manufacturing of valuable products. The toolbox to optimize bioprocesses includes extracellular process elements such as the bioreactor design and mode of operation, medium formulation, culture conditions, feeding rates, etc. However, these elements are frequently insufficient for achieving optimal process performance or precise product composition. One can use metabolic and genetic engineering methods for optimization at the intracellular level. Nevertheless, those are often of static nature, failing when applied to dynamic processes or if disturbances occur. Furthermore, many bioprocesses are optimized empirically and implemented with little-to-no feedback control to counteract disturbances. The concept of cybergenetics has opened new possibilities to optimize bioprocesses by enabling online modulation of the gene expression of metabolism-relevant proteins via external inputs (e.g., light intensity in optogenetics). Here, we fuse cybergenetics with model-based optimization and predictive control for optimizing dynamic bioprocesses. To do so, we propose to use dynamic constraint-based models that integrate the dynamics of metabolic reactions, resource allocation, and inducible gene expression. We formulate a model-based optimal control problem to find the optimal process inputs. Furthermore, we propose using model predictive control to address uncertainties via online feedback. We focus on fed-batch processes, where the substrate feeding rate is an additional optimization variable. As a simulation example, we show the optogenetic control of the ATPase enzyme complex for dynamic modulation of enforced ATP wasting to adjust product yield and productivity.

en eess.SY, math.OC
arXiv Open Access 2022
Perspectives for self-driving labs in synthetic biology

Hector Garcia Martin, Tijana Radivojevic, Jeremy Zucker et al.

Self-driving labs (SDLs) combine fully automated experiments with artificial intelligence (AI) that decides the next set of experiments. Taken to their ultimate expression, SDLs could usher in a new paradigm of scientific research, where the world is probed, interpreted, and explained by machines for human benefit. While there are functioning SDLs in the fields of chemistry and materials science, we contend that synthetic biology provides a unique opportunity since the genome provides a single target for affecting the incredibly wide repertoire of biological cell behavior. However, the level of investment required for the creation of biological SDLs is only warranted if directed towards solving difficult and enabling biological questions. Here, we discuss challenges and opportunities in creating SDLs for synthetic biology.

en q-bio.OT
arXiv Open Access 2022
Fourier Representations for Black-Box Optimization over Categorical Variables

Hamid Dadkhahi, Jesus Rios, Karthikeyan Shanmugam et al.

Optimization of real-world black-box functions defined over purely categorical variables is an active area of research. In particular, optimization and design of biological sequences with specific functional or structural properties have a profound impact in medicine, materials science, and biotechnology. Standalone search algorithms, such as simulated annealing (SA) and Monte Carlo tree search (MCTS), are typically used for such optimization problems. In order to improve the performance and sample efficiency of such algorithms, we propose to use existing methods in conjunction with a surrogate model for the black-box evaluations over purely categorical variables. To this end, we present two different representations, a group-theoretic Fourier expansion and an abridged one-hot encoded Boolean Fourier expansion. To learn such representations, we consider two different settings to update our surrogate model. First, we utilize an adversarial online regression setting where Fourier characters of each representation are considered as experts and their respective coefficients are updated via an exponential weight update rule each time the black box is evaluated. Second, we consider a Bayesian setting where queries are selected via Thompson sampling and the posterior is updated via a sparse Bayesian regression model (over our proposed representation) with a regularized horseshoe prior. Numerical experiments over synthetic benchmarks as well as real-world RNA sequence optimization and design problems demonstrate the representational power of the proposed methods, which achieve competitive or superior performance compared to state-of-the-art counterparts, while substantially improving the computation cost and/or sample efficiency.

en cs.LG, cs.AI
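
The abstract above treats Fourier characters as experts whose coefficients are updated by an exponential weight rule. The sketch below shows the two generic ingredients in isolation: evaluating a Boolean Fourier character on a point of {-1, +1}^n, and one step of the standard exponential-weights (Hedge) update. The function names, the learning rate, and the example losses are hypothetical. How the paper combines these pieces into its surrogate model is not specified in the abstract and is not reproduced here.

```python
import math


def character(x, subset):
    """Boolean Fourier character chi_S(x) = product of x[i] for i in S,
    where each x[i] is -1 or +1."""
    value = 1
    for i in subset:
        value *= x[i]
    return value


def exp_weights_update(weights, losses, eta=0.5):
    """One generic exponential-weights step: scale each expert's weight
    by exp(-eta * loss) and renormalize so the weights sum to 1."""
    scaled = [w * math.exp(-eta * loss)
              for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


# Illustrative values only.
x = [1, -1, -1]
print(character(x, (0, 1)), character(x, (1, 2)))
print(exp_weights_update([0.5, 0.5], [0.0, 1.0], eta=math.log(2)))
```

Experts with larger loss lose weight multiplicatively, so weight shifts toward the experts that predicted the observed evaluation best.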
