Rainfall prediction efforts have been prevalent ever since climate change began to influence the occurrence of natural disasters globally. Machine and deep learning techniques have been applied to features that contribute to rainfall occurrence with the aim of achieving greater prediction accuracy, but the significance of individual features in rainfall occurrence prediction has received little study. This study presents a framework for analyzing the significance of rainfall prediction features, using Peninsular Malaysia rainfall occurrences as a case study. The features investigated are temperature, humidity and wind speed. The designed framework comprises data collection, data preprocessing, integration of random forest (RF) ensemble classification with feature importance (FI) calculation for feature significance, and model evaluation based on precision, recall, F1 score and the receiver operating characteristic (ROC) curve. In the preliminary investigation, the prediction model demonstrated accuracy, precision, recall and F1 score of 80.65%, 80%, 81% and 0.80, respectively. Humidity was found to contribute the most to the model's predictive power, compared with temperature and wind speed. Further investigation of the feature distributions against rainfall occurrences showed that rainfall occurrence correlates with lower temperature and higher humidity, and its absence with the reverse.
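The abstract above does not give implementation details for the FI calculation. One common, model-agnostic way to score feature significance is permutation importance: shuffle one feature column and measure the drop in accuracy. The sketch below uses entirely synthetic weather data and a hypothetical threshold classifier standing in for a trained RF, with rain driven by humidity so the result mirrors the study's finding; it is an illustration, not the authors' method.

```python
import random

# Synthetic weather data (purely hypothetical): rain here is driven by humidity,
# so the sketch has a known answer.
random.seed(0)
data = []
for _ in range(500):
    temp = random.uniform(20, 35)      # deg C
    hum = random.uniform(40, 100)      # %
    wind = random.uniform(0, 20)       # km/h
    rain = 1 if hum > 75 else 0
    data.append((temp, hum, wind, rain))

def model(temp, hum, wind):
    # Stand-in for a trained classifier; a real study would fit an RF here.
    return 1 if hum > 75 else 0

def accuracy(rows):
    return sum(model(t, h, w) == r for t, h, w, r in rows) / len(rows)

base = accuracy(data)

def permutation_importance(idx):
    # Shuffle one feature column; the accuracy drop is that feature's importance.
    col = [row[idx] for row in data]
    random.shuffle(col)
    shuffled = [row[:idx] + (c,) + row[idx + 1:] for row, c in zip(data, col)]
    return base - accuracy(shuffled)

imps = {name: permutation_importance(i)
        for i, name in enumerate(["temperature", "humidity", "wind"])}
```

On this toy data, shuffling humidity destroys most of the accuracy while shuffling the unused features changes nothing, which is the qualitative pattern the study reports.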
Electronic computers. Computer science, Information technology
Dinara A. Osmonalieva , Chinara R. Kulueva , Ibrokhimbek D. Nasirkhodjaev
et al.
We examined the directions of product quality management under climate-smart agriculture with the application of corporate information systems based on machine learning. The aspects revealed are applied with different results across countries and territories that have individual climate characteristics and approaches to agricultural production. The specifics of agriculture can be unified through innovative climate-smart approaches to certain processes. We showed that such a synthesis allows for optimal solutions that reduce the climate footprint and raise productivity. The countries considered have potential for the further creation of intelligent digital solutions to improve quality management in climate-smart agriculture, which would help achieve results that ensure national and global food security. The goal of this paper was to reveal the key features of using machine learning within corporate information systems to support product quality management in climate-smart agriculture. The scientific novelty of this research lies in an improved theoretical and practical substantiation of the forms of interaction between parties interested in increasing the efficiency of climate-smart agriculture through machine learning tools. The main research methods were a systemic approach, comparison of advantages and disadvantages, statistical analysis, and ranking.
Ali Ghias-Nodoushan, Alireza Sedighi-Anaraki, Mohammad Rasoul Jannesar
et al.
Abstract As global energy demand continues to rise and the need to transition from fossil fuels becomes increasingly urgent, integrating solar farms efficiently into power grids presents a significant challenge. This study introduces a novel graph-theoretic framework for designing optimal interconnection networks among distributed solar farms. By utilizing Prim’s algorithm to construct a minimum spanning tree, the proposed method effectively reduces transmission losses and infrastructure costs. The performance of this deterministic approach is benchmarked against Particle Swarm Optimization (PSO), a widely applied metaheuristic technique. To assess network robustness under potential line failures, a new graph-based reliability metric is developed. Case studies involving a cluster of solar farms demonstrate that Prim’s algorithm outperforms PSO in minimizing both power losses and capital investment, while also offering higher topological reliability. Although PSO achieves better load balancing, the graph-based approach proves more effective for loss-sensitive and cost-driven design scenarios. The proposed framework naturally accommodates constraints such as terrain limitations and is scalable to hybrid renewable energy systems. By integrating classical graph theory with practical power system considerations, this work offers a computationally efficient and economically viable solution for the optimal physical integration of large-scale solar energy infrastructure. The proposed methodology also lays a foundation for future integration of AI and machine learning techniques to enable dynamic network optimization under uncertainty.
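The deterministic core of the framework described above is Prim's algorithm for a minimum spanning tree. A minimal heap-based sketch follows, on a hypothetical symmetric cable-cost graph of five farms (the weights are invented, not the paper's case-study data):

```python
import heapq

# Hypothetical pairwise interconnection costs between five solar farms.
edges = {
    (0, 1): 4, (0, 2): 3, (1, 2): 2,
    (1, 3): 5, (2, 3): 6, (3, 4): 1, (2, 4): 7,
}

def prim_mst(n, edges):
    # Build an adjacency list from the symmetric edge weights.
    adj = {i: [] for i in range(n)}
    for (u, v), w in edges.items():
        adj[u].append((w, v))
        adj[v].append((w, u))
    # Grow the tree from node 0, always taking the cheapest frontier edge.
    visited = {0}
    heap = list(adj[0])
    heapq.heapify(heap)
    tree, total = [], 0
    while heap and len(visited) < n:
        w, v = heapq.heappop(heap)
        if v in visited:
            continue
        visited.add(v)
        total += w
        tree.append((w, v))
        for e in adj[v]:
            if e[1] not in visited:
                heapq.heappush(heap, e)
    return total, tree

total, tree = prim_mst(5, edges)
```

For this toy graph the tree uses edges of weight 3, 2, 5 and 1, for a total cost of 11; terrain or reliability constraints like those the paper mentions would enter as modified edge weights or forbidden edges.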
Haowen Xu,1,* Yifan Xu,2,* Xueyi Wang,3 Zhisheng Yan,4 Ting Geng,5 Jinpeng Wu,2 Yongxin Li,1 Mingjin Guo1
1Department of Vascular Surgery, The Affiliated Hospital of Qingdao University, Qingdao, 266000, People’s Republic of China; 2Department of Neurosurgery, The Affiliated Hospital of Qingdao University, Qingdao, 266000, People’s Republic of China; 3Department of Vascular Surgery, Rongcheng City People’s Hospital, Rongcheng, 264300, People’s Republic of China; 4Department of Interventional Medicine, The Eighth People’s Hospital of Qingdao, Qingdao, 266000, People’s Republic of China; 5Interventional Operating Room, The Eighth People’s Hospital of Qingdao, Qingdao, 266000, People’s Republic of China
*These authors contributed equally to this work.
Correspondence: Mingjin Guo, Department of Vascular Surgery, The Affiliated Hospital of Qingdao University, 16 Jiang Su Road, Qingdao, 266000, People’s Republic of China, Email qduahvasc@163.com; Yongxin Li, Department of Vascular Surgery, The Affiliated Hospital of Qingdao University, 16 Jiang Su Road, Qingdao, 266000, People’s Republic of China, Email Li.yongxin@outlook.com
Purpose: Atherosclerosis (AS) and calcific aortic valve disease (CAVD) are common in aging populations and share metabolic dysregulation, chronic inflammation, and cellular aging. Shared immunometabolic biomarkers and therapeutic targets remain insufficiently defined. This study aimed to identify cross-disease biomarkers linking AS and CAVD and to explore their translational potential.
Methods: Four Gene Expression Omnibus (GEO) microarray datasets related to AS and CAVD were integrated. Differentially expressed genes (DEGs) were identified within each disease, and cross-disease genes (CGs) were obtained by intersecting DEGs across the two diseases. Functional enrichment and protein–protein interaction analyses were performed. Machine learning (LASSO and Random Forest) refined candidate biomarkers. Immune infiltration was estimated with CIBERSORT, and a microRNA–transcription factor regulatory network was constructed. Molecular docking screened small molecules targeting the hub gene. Diagnostic performance was evaluated in independent datasets, and expression was validated in human tissues by qPCR and Western blot.
Results: We identified 147 CGs enriched in immune and metabolic pathways. Fructose-1,6-bisphosphatase 1 (FBP1) emerged as a hub gene with strong diagnostic value across datasets. FBP1 expression correlated with alterations in multiple immune cell populations and was embedded within a regulatory network of predicted microRNAs and transcription factors. Docking analysis highlighted apigenin and kaempferol as candidate FBP1-targeting compounds. Experimental validation confirmed FBP1 upregulation in AS and CAVD tissues.
Discussion: FBP1 represents a shared immunometabolic biomarker and potential therapeutic target that links metabolic reprogramming to immune dysregulation in AS and CAVD. These findings provide a rationale for further translational studies evaluating FBP1-centered interventions.
Keywords: atherosclerosis, calcific aortic valve disease, machine learning, immunology, molecular docking
A key aspect driving advancements in machine learning applications in medicine is the availability of publicly accessible datasets. Studies conducted in the past have shown promising results but are not reproducible because the data used are closed or proprietary, or the authors were unable to publish them. The current study aims to narrow this gap for researchers who focus on image recognition tasks in microbiology, specifically in fungal identification and classification. An open database named OpenFungi is made available in this work; it contains high-quality images of macroscopic and microscopic fungal genera. The fungal cultures were grown from food products such as green leaf spices and cereals. The quality of the dataset is demonstrated by solving a classification problem with a simple convolutional neural network. A thorough experimental analysis was conducted, in which six performance metrics were measured in three distinct validation scenarios. The results demonstrate that in the fungal species classification task, the model achieved an overall accuracy of 99.79%, a true-positive rate of 99.55%, a true-negative rate of 99.96%, and an F1 score of 99.63% on the macroscopic dataset. On the microscopic dataset, the model reached 97.82% accuracy, a 94.89% true-positive rate, a 99.19% true-negative rate, and a 95.20% F1 score. The results also reveal that the model maintains promising performance even when trained on smaller datasets, highlighting its robustness and generalization capabilities.
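The metrics quoted above (accuracy, true-positive rate, true-negative rate, F1) all derive from confusion-matrix counts. A short sketch of those formulas, using made-up counts rather than the OpenFungi results:

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    tpr = tp / (tp + fn)            # true-positive rate (recall / sensitivity)
    tnr = tn / (tn + fp)            # true-negative rate (specificity)
    precision = tp / (tp + fp)
    f1 = 2 * precision * tpr / (precision + tpr)  # harmonic mean
    return accuracy, tpr, tnr, f1

# Hypothetical counts for a 20-image evaluation.
acc, tpr, tnr, f1 = classification_metrics(tp=8, fp=1, tn=9, fn=2)
```

For multi-class problems such as fungal genera these are typically computed per class (one-vs-rest) and then averaged.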
Background: People living with HIV (PLWH) are a high-risk population for cancer. We conducted a pioneering study on the gut microbiota of PLWH with various types of cancer, revealing key microbiota.
Methods: We collected stool samples from 54 PLWH who have cancer (PLWH-C), including Kaposi’s sarcoma (KS, n=7), lymphoma (L, n=22), lung cancer (LC, n=12), and colorectal cancer (CRC, n=13), 55 PLWH who do not have cancer (PLWH-NC), and 49 people living without HIV (Ctrl). The gut microbiota in fecal samples was analyzed using 16S rRNA sequencing. We compared microbial diversity among groups and identified key microbiota and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways using random forest. Furthermore, we analyzed the correlation between microbiota and KEGG pathways and constructed microbiota Receiver Operating Characteristic (ROC) diagnostic models.
Results: Compared with PLWH-NC and Ctrl, PLWH with any type of cancer exhibited significantly lower alpha diversity and significant alterations in beta diversity of the gut microbiota. The significantly decreased abundance of Bacteroides and Bacteroides vulgatus in PLWH-C showed a negative correlation with the Pathways in cancer pathway, and a positive correlation with the Choline metabolism in cancer, Central carbon metabolism in cancer, and Proteoglycans in cancer pathways. Bacteroides (AUC≥0.84) and Bacteroides vulgatus (AUC≥0.78) exhibited discriminatory diagnostic capability for PLWH-C in patients with different cancers compared with PLWH-NC and Ctrl.
Discussion: We confirmed a more severe dysbiosis of the gut microbiota in PLWH with KS, L, LC, or CRC. Bacteroides may be associated with disruptions in cancer-related metabolic pathways and serve as a diagnostic biomarker for PLWH with various cancers.
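The AUC values quoted above summarize ROC diagnostic models. AUC can be computed directly as the probability that a random positive sample scores above a random negative one (ties counted half). A minimal rank-based sketch with hypothetical scores, not the study's Bacteroides abundances:

```python
def roc_auc(pos_scores, neg_scores):
    """AUC = P(score of a random positive > score of a random negative),
    with ties counted as 0.5 (the Mann-Whitney formulation)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical classifier scores for 3 cases and 3 controls.
auc = roc_auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
```

Here 8 of the 9 positive/negative pairs are correctly ordered, so the AUC is 8/9; an AUC of 0.5 would indicate no discriminatory ability.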
Christos Sgouropoulos, Christos Nikou, Stefanos Vlachos
et al.
Few-shot learning has emerged as a powerful paradigm for training models with limited labeled data, addressing challenges in scenarios where large-scale annotation is impractical. While extensive research has been conducted in the image domain, few-shot learning for audio classification remains relatively underexplored. In this work, we investigate the effect of integrating a supervised contrastive loss into prototypical few-shot training for audio classification. In particular, we demonstrate that an angular loss further improves performance compared to the standard contrastive loss. Our method leverages SpecAugment followed by a self-attention mechanism to encapsulate the diverse information of augmented input versions into one unified embedding. We evaluate our approach on MetaAudio, a benchmark comprising five datasets with predefined splits, standardized preprocessing, and a comprehensive set of few-shot learning models for comparison. The proposed approach achieves state-of-the-art performance in the 5-way, 5-shot setting.
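At inference time, prototypical few-shot classification reduces to averaging the support embeddings of each class into a prototype and assigning queries to the nearest one. A toy sketch with invented 2-D embeddings and class names (real embeddings would come from the trained encoder):

```python
import math

def prototype(embeddings):
    # Class prototype = mean of its support embeddings.
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]

def classify(query, prototypes):
    # Assign the query to the nearest prototype (Euclidean distance).
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(prototypes, key=lambda label: dist(query, prototypes[label]))

# Hypothetical 2-shot support set for two audio classes.
support = {
    "dog_bark": [[0.9, 0.1], [1.1, -0.1]],
    "siren":    [[-1.0, 1.0], [-0.8, 1.2]],
}
protos = {label: prototype(embs) for label, embs in support.items()}
label = classify([1.0, 0.0], protos)
```

The contrastive and angular losses discussed above shape the embedding space during training so that this nearest-prototype rule works well with only a few shots per class.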
Abstract This paper investigates the Dynamic Flexible Job Shop Scheduling Problem (DFJSP), which is based on new job insertion, machine breakdowns, changes in processing time, and considering the state of Automated Guided Vehicles (AGVs). The objective is to minimize the maximum completion time and improve on-time completion rates. To address the continuous production status and learn the most suitable actions (scheduling rules) at each rescheduling point, a Dueling Double Deep Q Network (D3QN) is developed to solve this problem. To improve the quality of the model solutions, a MachineRank algorithm (MR) is proposed, and based on the MR algorithm, seven composite scheduling rules are introduced. These rules aim to select and execute the optimal operation each time an operation is completed or a new disturbance occurs. Additionally, eight general state features are proposed to represent the scheduling status at the rescheduling point. By using continuous state features as the input to the D3QN, state-action values (Q-values) for each scheduling rule can be obtained. Numerical experiments were conducted on a large number of instances with different production configurations, and the results demonstrated the superiority and generality of the D3QN compared to various composite rules, other advanced scheduling rules, and standard Q-learning agents. The effectiveness and rationality of the dynamic scheduling trigger rules were also validated.
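The "dueling" architecture referenced above splits the network into a state-value stream V(s) and an advantage stream A(s, a), recombined as Q(s, a) = V(s) + A(s, a) - mean(A). A minimal sketch of that combination with hypothetical stream outputs (the real streams are neural network heads):

```python
def dueling_q(value, advantages):
    """Combine dueling-DQN streams:
    Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]

# Hypothetical V(s) and per-rule advantages at one rescheduling point.
q = dueling_q(value=2.0, advantages=[1.0, -1.0, 0.5, -0.5])
```

Subtracting the mean advantage makes the decomposition identifiable (the Q-values average to V(s)); the agent then greedily executes the scheduling rule with the largest Q-value.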
Novel psychoactive substances (NPSs) are compounds designed to modify the chemical structures of prohibited substances, offering alternatives for consumption and evading legislation. The rapid emergence of these substances presents challenges for health and forensic assessment because of the lack of analytical standards. A viable alternative for establishing these standards is to leverage in silico methods to acquire spectroscopic data. This study assesses the efficacy of using infrared spectroscopy (IRS) data derived from density functional theory (DFT) for analyzing NPSs. Various functionals were employed to generate infrared spectra for five distinct NPS categories: amphetamines, benzodiazepines, synthetic cannabinoids, cathinones, and fentanyls. PRISMA software was designed to streamline data management. Unsupervised learning techniques, including Hierarchical Cluster Analysis (HCA), Principal Component Analysis (PCA), and t-distributed stochastic neighbor embedding (t-SNE), were used to refine the assessment process. Our findings reveal no significant disparities among the different functionals used to generate the infrared spectra. Additionally, the application of unsupervised learning demonstrated adequate segregation of NPSs within their respective groups. In conclusion, integrating theoretical data with dimension-reduction techniques proves to be a powerful strategy for evaluating the spectroscopic characteristics of NPSs. This underscores the potential of this combined methodology as a diagnostic tool for distinguishing IR spectra across NPS groups, facilitating the evaluation of newly encountered unknown compounds.
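Of the unsupervised techniques named above, HCA is the simplest to sketch: agglomerative clustering repeatedly merges the two closest clusters. Below is a minimal single-linkage version on a toy 2-D feature space standing in for reduced spectral features (the points are invented; the study's choice of linkage is not specified, so this is one common variant):

```python
import math

def single_linkage(points, k):
    """Agglomerative clustering: repeatedly merge the two clusters whose
    closest members are nearest, until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between clusters = closest pair.
                d = min(math.dist(a, b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters

# Two well-separated toy "spectra" groups in a 2-D reduced feature space.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
groups = single_linkage(points, k=2)
```

With well-segregated groups, as the study reports for the NPS categories, the dendrogram cuts cleanly into the expected clusters.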
Neuroproteomics, an emerging field at the intersection of neuroscience and proteomics, has garnered significant attention in the context of neurotrauma research. Neuroproteomics involves the quantitative and qualitative analysis of nervous system components, essential for understanding the dynamic events involved in the vast areas of neuroscience, including, but not limited to, neuropsychiatric disorders, neurodegenerative disorders, mental illness, traumatic brain injury, chronic traumatic encephalopathy, and other neurodegenerative diseases. With advancements in mass spectrometry coupled with bioinformatics and systems biology, neuroproteomics has led to the development of innovative techniques such as microproteomics, single-cell proteomics, and imaging mass spectrometry, which have significantly impacted neuronal biomarker research. By analyzing the complex protein interactions and alterations that occur in the injured brain, neuroproteomics provides valuable insights into the pathophysiological mechanisms underlying neurotrauma. This review explores how such insights can be harnessed to advance personalized medicine (PM) approaches, tailoring treatments based on individual patient profiles. Additionally, we highlight the potential future prospects of neuroproteomics, such as identifying novel biomarkers and developing targeted therapies by employing artificial intelligence (AI) and machine learning (ML). By shedding light on neurotrauma’s current state and future directions, this review aims to stimulate further research and collaboration in this promising and transformative field.
Kerstin Denecke, Robin Glauser, Daniel Reichenpfader
Recent developments in tools based on artificial intelligence (AI) have raised interest in many areas, including higher education. While machine translation tools have been available and in use for many years in teaching and learning, generative AI models have sparked concerns within the academic community. The objective of this paper is to identify the strengths, weaknesses, opportunities and threats (SWOT) of using AI-based tools (ABTs) in higher education contexts. We employed a mixed-methods approach: we conducted a survey and used the results to perform a SWOT analysis. For the survey, we asked lecturers and students to answer 27 questions (Likert scale, free text, etc.) on their experiences and viewpoints related to AI-based tools in higher education. A total of 305 people from different countries and with different backgrounds answered the questionnaire. The results show that participants expect a moderate to high future impact of ABTs on teaching, learning and exams. ABT strengths are seen in the personalization of the learning experience and increased efficiency via the automation of repetitive tasks. Several use cases are envisioned but not yet employed in daily practice. Challenges include skills teaching, data protection and bias. We conclude that research is needed to study the unintended consequences of ABT usage in higher education, in particular to develop countermeasures and to demonstrate the benefits of ABT usage. Furthermore, we suggest defining a competence model specifying the skills required to ensure the responsible and efficient use of ABTs by students and lecturers.
Education (General), Theory and practice of education
Shashwati Geed, Megan L. Grainger
et al.
Objective: This study aims to investigate the validity of a machine learning-derived amount of real-world functional upper extremity (UE) use in individuals with stroke. We hypothesized that machine learning classification of wrist-worn accelerometry would be as accurate as frame-by-frame video labeling (ground truth). A second objective was to validate the machine learning classification against measures of impairment, function, dexterity, and self-reported UE use.
Design: Cross-sectional with convenience sampling.
Setting: Outpatient rehabilitation.
Participants: Individuals (>18 years) with neuroimaging-confirmed ischemic or hemorrhagic stroke >6 months prior (n = 31), with persistent impairment of the hemiparetic arm and upper extremity Fugl-Meyer (UEFM) score of 12–57.
Methods: Participants wore an accelerometer on each arm and were video recorded while completing an “activity script” comprising activities and instrumental activities of daily living in a simulated apartment in outpatient rehabilitation. The video was annotated to determine the ground-truth amount of functional UE use.
Main outcome measures: The amount of real-world UE use was estimated using a random forest classifier trained on the accelerometry data. UE motor function was measured with the Action Research Arm Test (ARAT), UEFM, and nine-hole peg test (9HPT). Self-reported real-world UE use was measured with the Motor Activity Log (MAL).
Results: The machine learning-estimated use ratio was significantly correlated with the use ratio derived from video annotation, ARAT, UEFM, 9HPT, and, to a lesser extent, the MAL. Bland–Altman plots showed excellent agreement between use ratios calculated from video annotation and machine learning classification. Factor analysis showed that machine learning use ratios capture the same construct as the ARAT, UEFM, 9HPT, and MAL and explain 83% of the variance in UE motor performance.
Conclusion: Our machine learning approach provides a valid measure of functional UE use. Its accuracy, validity, and small footprint make it feasible for measuring UE recovery in stroke rehabilitation trials.
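The use ratio central to the validation above is commonly computed as the duration of functional use of the paretic arm divided by that of the non-paretic arm. A minimal sketch with hypothetical per-second classifier labels (0 = non-functional, 1 = functional), not the study's data:

```python
def use_ratio(paretic_labels, nonparetic_labels):
    """Use ratio = seconds of functional paretic-arm use divided by
    seconds of functional non-paretic-arm use (binary labels per second)."""
    return sum(paretic_labels) / sum(nonparetic_labels)

# Hypothetical classifier output for a 10-second activity snippet.
paretic    = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]   # 3 s of functional use
nonparetic = [1, 1, 0, 1, 1, 0, 1, 1, 0, 0]   # 6 s of functional use
ratio = use_ratio(paretic, nonparetic)
```

A ratio near 1 indicates symmetric arm use; values well below 1, as in this toy example, indicate underuse of the paretic arm, which is the quantity such trials aim to track.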
Thermal deformation of the spindle accounts for a large proportion of machine tool errors. After gathering thermal deformation data from a machine tool experiment, this study used AI algorithms to predict the displacement of a cutting tool caused by thermal deformation. Thermal displacement and temperature data were fed into models constructed with several machine learning algorithms, which were then quantitatively evaluated for accuracy and compared with each other. Subsequently, transfer learning and hyperparameter tuning were conducted to produce a model with optimal prediction capability. The experimental results revealed that after the models were trained on data collected on the first day of the experiments, their predictions on data collected on the second day contained severe errors, indicating that experimental data gathered at different times weakened the models' predictive ability. Thus, to increase prediction accuracy and avoid wasting time on repeated training, transfer learning was incorporated with model optimization. This approach ultimately achieved excellent R2 scores of 0.99941, 0.99964, and 0.99902 for the prediction of displacement in the x-, y-, and z-directions, respectively.
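The R2 scores quoted above are coefficients of determination, R2 = 1 - SS_res / SS_tot. A short sketch of the formula on hypothetical measured-vs-predicted displacement values (micrometres), not the study's data:

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.
    1.0 means perfect prediction; 0.0 means no better than the mean."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical measured vs. predicted thermal displacements.
r2 = r2_score([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.9])
```

Scores like 0.9994 in the study mean the residual error is a tiny fraction of the natural variance of the displacement signal.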
The implementation of process analytical technologies is positioned to play a critical role in advancing biopharmaceutical manufacturing by simultaneously resolving clinical, regulatory, and cost challenges. Raman spectroscopy is emerging as a key technology enabling in-line product quality monitoring, but laborious calibration and computational modeling efforts limit the widespread application of this promising technology. In this study, we demonstrate new capabilities for measuring product aggregation and fragmentation in real-time during a bioprocess intended for clinical manufacturing by applying hardware automation and machine learning data analysis methods. We reduced the effort needed to calibrate and validate multiple critical quality attribute models by integrating existing workflows into one robotic system. The increased data throughput resulting from this system allowed us to train calibration models that demonstrate accurate product quality measurements every 38 s. In-process analytics enable advanced process understanding in the short-term and will lead ultimately to controlled bioprocesses that can both safeguard and take necessary actions that guarantee consistent product quality.
Mahum Naseer, Bharath Srinivas Prabakaran, Osman Hasan
et al.
Performance of trained neural network (NN) models, in terms of testing accuracy, has improved remarkably over the past several years, especially with the advent of deep learning. However, even the most accurate NNs can be biased toward a specific output classification due to the inherent bias in the available training datasets, which may propagate to the real-world implementations. This paper deals with the robustness bias, i.e., the bias exhibited by the trained NN by having a significantly large robustness to noise for a certain output class, as compared to the remaining output classes. The bias is shown to result from imbalanced datasets, i.e., the datasets where all output classes are not equally represented. Towards this, we propose the UnbiasedNets framework, which leverages K-means clustering and the NN's noise tolerance to diversify the given training dataset, even from relatively smaller datasets. This generates balanced datasets and reduces the bias within the datasets themselves. To the best of our knowledge, this is the first framework catering to the robustness bias problem in NNs. We use real-world datasets to demonstrate the efficacy of the UnbiasedNets for data diversification, in case of both binary and multi-label classifiers. The results are compared to well-known tools aimed at generating balanced datasets, and illustrate how existing works have limited success while addressing the robustness bias. In contrast, UnbiasedNets provides a notable improvement over existing works, while even reducing the robustness bias significantly in some cases, as observed by comparing the NNs trained on the diversified and original datasets.
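UnbiasedNets leverages K-means clustering as part of its diversification step. The abstract does not specify the clustering details, so as a generic illustration here is plain Lloyd's algorithm on toy one-dimensional data (all values hypothetical):

```python
def kmeans_1d(points, centers, iters=10):
    """Plain Lloyd's algorithm in one dimension: alternately assign points
    to the nearest center, then recompute each center as its cluster mean."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        centers = [sum(c) / len(c) for c in clusters]
    return centers, clusters

# Two obvious groups; initial centers chosen inside each group.
centers, clusters = kmeans_1d([1, 2, 3, 10, 11, 12], centers=[1.0, 12.0])
```

In the framework described above, cluster structure like this would reveal which regions of the feature space are under-represented, guiding the noise-tolerance-based generation of synthetic samples to rebalance the dataset.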
Category theory has been successfully applied in various domains of science, shedding light on universal principles unifying diverse phenomena and thereby enabling knowledge transfer between them. Applications to machine learning have been pursued recently, and yet there is still a gap between abstract mathematical foundations and concrete applications to machine learning tasks. In this paper we introduce DisCoPyro as a categorical structure learning framework, which combines categorical structures (such as symmetric monoidal categories and operads) with amortized variational inference, and can be applied, e.g., in program learning for variational autoencoders. We provide both mathematical foundations and concrete applications together with comparison of experimental performance with other models (e.g., neuro-symbolic models). We speculate that DisCoPyro could ultimately contribute to the development of artificial general intelligence.
The number of road casualties is steadily rising, while the era of driverless vehicles on the road is rapidly approaching. Machine-to-machine (M2M) communication, and the Big Data it creates, holds enormous promise for improving road safety. A deep learning strategy that requires no training dataset, using only a safety model optimized sequentially through M2M learning over time, can compensate for the lack of a suitable knowledge base while also improving the capacity to handle unpredictable scenarios. The article outlines an M2M learning model based on in-vehicle sensors that can be used to reduce traffic accidents.
Schizophrenia is a chronic mental illness that leads patients to hallucinations and delusions, with a worldwide prevalence of 0.4%. Early detection of schizophrenia is important because tracking the pre-syndrome during the active phase could reduce psychotic symptoms. However, conventional methods sometimes cannot detect the symptoms accurately. As an alternative, machine learning can be applied to microarray data for early detection. This study aimed to implement three ensemble methods, i.e., Random Forest (RF), Adaptive Boosting (AdaBoost), and Extreme Gradient Boosting (XGBoost), to identify schizophrenia. Hyperparameter tuning was performed to improve the performance of the models. Based on the results, we found that model 6, developed with the XGBoost method, performs better than the other models, with accuracy and F1-score both at 0.87.
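The hyperparameter tuning mentioned above is often done by exhaustive grid search over a parameter grid, scoring each combination on validation data. A minimal generic sketch, where the scoring function and the parameter names (`max_depth`, `learning_rate`) are hypothetical stand-ins for a cross-validated booster score:

```python
from itertools import product

def grid_search(grid, score):
    """Evaluate every parameter combination in the grid and keep the best."""
    best_params, best_score = None, float("-inf")
    keys = sorted(grid)
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        s = score(params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score

# Toy validation score standing in for cross-validated F1 of a booster:
# peaks at max_depth=4, learning_rate=0.1 by construction.
def toy_score(p):
    return -abs(p["max_depth"] - 4) - abs(p["learning_rate"] - 0.1)

grid = {"max_depth": [2, 4, 6], "learning_rate": [0.01, 0.1, 0.3]}
best, s = grid_search(grid, toy_score)
```

In practice, libraries such as scikit-learn wrap this loop with cross-validation; for high-dimensional microarray data, randomized search over the same grid is a common cheaper alternative.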