Gully formation is a significant driver of soil erosion and land degradation worldwide and often leads to important downstream impacts. Nonetheless, our understanding of the global patterns and the factors controlling this process remains limited. Here, we present the first global assessment of the spatial patterns of gully density. Using mapped observations from over 17,000 representative study sites worldwide, we trained random forest models that simulate both the susceptibility to gullying at a 1 km² resolution and the corresponding gully head density (GHD). Through an interpretable machine learning framework, we demonstrate that global GHD patterns result from a combination of environmental factors with non-linear interactions, leading to significant regional variations in the dominant factors controlling GHD. We distinguish between gully hotspots driven primarily by natural factors such as topography, geomorphology, tectonics, pedology, or climate and those where land use and land cover play a dominant role. Based on these insights, we identified critical global areas of gully erosion, i.e., hotspots where gully occurrence is likely highly sensitive to anthropogenic drivers. These include the Chinese Loess Plateau, the Ethiopian Highlands, and large parts of the Mediterranean and Sahel regions. Desert regions are also often characterized by high GHDs; in these cases, however, gully occurrence is mainly driven by natural factors. The insights we provide are valuable for informing land management and targeted erosion mitigation strategies.
Philemon Uten Emmoh, Christopher Ifeanyi Eke, Timothy Moses
Selecting important features is vital in machine learning tasks involving high-dimensional datasets with many features: it reduces the dimensionality of a dataset and improves model performance. Most feature selection techniques are restricted in the kinds of datasets they can be applied to. This study proposes a feature selection technique based on the statistical lift measure to select important features from a dataset. The proposed technique is a generic approach that can be used on any binary classification dataset. The technique successfully determined the most important feature subset and outperformed the existing techniques. It was tested on a lung cancer dataset and a happiness classification dataset. Its effectiveness in selecting important feature subsets was evaluated and compared with existing techniques, namely Chi-Square, Pearson Correlation, and Information Gain. Both the proposed and the existing techniques were evaluated on five machine learning models using four standard evaluation metrics: accuracy, precision, recall, and F1-score. On the lung cancer dataset, the proposed technique yielded predictive accuracies of 0.919, 0.935, 0.919, 0.935, and 0.935 for logistic regression, decision tree, AdaBoost, gradient boost, and random forest, respectively; on the happiness classification dataset, it yielded predictive accuracies of 0.758, 0.689, 0.724, 0.655, and 0.689 for random forest, k-nearest neighbor, decision tree, gradient boost, and CatBoost, respectively, outperforming the existing techniques.
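The lift-based selection described above can be illustrated with a small sketch (this is not the authors' implementation, and the toy data are invented): lift compares the joint support of a feature and the positive class against what independence would predict, and features are ranked by that ratio.

```python
# Illustrative sketch of lift-based feature scoring for binary data.
# lift(f, c) = P(f and c) / (P(f) * P(c)); lift > 1 means the feature
# co-occurs with the positive class more often than chance would predict.
def lift(rows, labels, j):
    n = len(rows)
    f = sum(1 for r in rows if r[j] == 1)                        # feature support
    c = sum(labels)                                              # class support
    fc = sum(1 for r, y in zip(rows, labels) if r[j] == 1 and y == 1)
    if f == 0 or c == 0:
        return 0.0
    return (fc / n) / ((f / n) * (c / n))

# Toy binary dataset: two features per row, binary label.
rows = [(1, 0), (1, 1), (0, 1), (1, 0), (1, 1), (0, 0)]
labels = [1, 1, 0, 1, 1, 0]
ranking = sorted(range(2), key=lambda j: lift(rows, labels, j), reverse=True)
```

Here feature 0 scores a lift of 1.5 against 1.0 for feature 1, so it would be kept first; a real pipeline would threshold the scores or take the top-k features before training the downstream classifiers.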
An effective energy management strategy (EMS) is essential to optimize the energy efficiency of electric vehicles (EVs). With the advent of advanced machine learning techniques, the focus on developing sophisticated EMS for EVs is increasing. Here, we introduce LearningEMS: a unified framework and open-source benchmark designed to facilitate rapid development and assessment of EMS. LearningEMS is distinguished by its ability to support a variety of EV configurations, including hybrid EVs, fuel cell EVs, and plug-in EVs, offering a general platform for the development of EMS. The framework enables detailed comparisons of several EMS algorithms, encompassing imitation learning, deep reinforcement learning (RL), offline RL, model predictive control, and dynamic programming. We rigorously evaluated these algorithms across multiple perspectives: energy efficiency, consistency, adaptability, and practicability. Furthermore, we discuss state, reward, and action settings for RL in EV energy management, introduce a policy extraction and reconstruction method for learning-based EMS deployment, and conduct hardware-in-the-loop experiments. In summary, we offer a unified and comprehensive framework that comes with three distinct EV platforms, over 10 000 km of EMS policy data set, ten state-of-the-art algorithms, and over 160 benchmark tasks, along with three learning libraries. Its flexible design allows easy expansion for additional tasks and applications. The open-source algorithms, models, data sets, and deployment processes foster additional research and innovation in EV and broader engineering domains.
Ashraf Abdallah, Bara' Al-Mistarehi, Amir Shtayat
Agriculture is a vital component of Egypt's economy; therefore, using Digital Elevation Models (DEMs) in agricultural planning in Egypt has significant benefits for water management, site suitability assessment, flood risk mitigation, and infrastructure construction. It is also essential for planners to make more informed decisions, optimize resource allocation, and support sustainable farming practices. This research paper investigates the accuracy of DEM data obtained from four free global models (SRTM30, ALOS30, COP30, and TanDEM-X90). The global DEM data were compared to GNSS-RTK DEM data surveyed on-site for two agricultural block areas in Aswan, a southern governorate of Egypt. The two blocks are part of a national project. For Blocks I and II, the RMSE of the SRTM30 model was 2.92 m and 3.59 m, respectively, indicating the poorest solution. In terms of accuracy, the ALOS30 model ranks third, reporting an RMSE of 2.58 m for Block II and 3.30 m for Block I. COP30 has an RMSE of 1.06 m for Blocks I and II and 0.91 m overall. TanDEM-X90 is the most accurate model in this investigation; Block I yielded an RMSE of 0.90 m with an SD of 0.58 m (SD95% = 0.38 m). After removing the anomalies, the model's RMSE for Block II was 0.34 m, with SD values of 0.62 m and 1.03 m. According to the classification using machine learning algorithms, with an accuracy of 84.7% for Block I and 85% for Block II, TanDEM-X90 is the best solution.
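The RMSE and SD figures quoted above come from comparing model elevations with reference elevations point by point; a minimal sketch of that comparison (with invented checkpoint values, not the paper's survey data) is:

```python
import math

def dem_accuracy(dem, ref):
    """RMSE and sample SD of the elevation differences dem - ref (metres)."""
    d = [a - b for a, b in zip(dem, ref)]
    n = len(d)
    mean = sum(d) / n
    rmse = math.sqrt(sum(x * x for x in d) / n)          # root-mean-square error
    sd = math.sqrt(sum((x - mean) ** 2 for x in d) / (n - 1))  # spread about bias
    return rmse, sd

# Hypothetical checkpoints: global-model heights vs. GNSS-RTK reference heights.
dem = [101.0, 99.0, 102.0]
ref = [100.0, 100.0, 100.0]
rmse, sd = dem_accuracy(dem, ref)
```

RMSE absorbs any systematic vertical bias, while SD describes only the scatter around that bias, which is why the paper reports both per block.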
This paper introduces an innovative EEG sensor-based computational framework that establishes a pioneering nexus between personality trait quantification and neural dynamics, leveraging biosignal processing of brainwave activity to elucidate their intrinsic influence on cognitive health and oscillatory brain rhythms. By employing electroencephalography (EEG) recordings from 21 participants undergoing the Trier Social Stress Test (TSST), we propose a machine learning (ML)-driven methodology to decode the Big Five personality traits—Extraversion (Ex), Agreeableness (A), Neuroticism (N), Conscientiousness (C), and Openness (O)—using classification algorithms such as support vector machine (SVM) and multilayer perceptron (MLP) applied to 64-electrode EEG sensor data. A novel multiphase neurocognitive analysis across the TSST stages (baseline, mental arithmetic, job interview, and recovery) systematically evaluates the bidirectional relationship between personality traits and stress-induced neural responses. The proposed framework reveals significant negative correlations between frontal–temporal theta–beta ratio (TBR) and self-reported Extraversion, Conscientiousness, and Openness, indicating faster stress recovery and higher cognitive resilience in individuals with elevated trait scores. The binary classification model achieves high accuracy (88.1% Ex, 94.7% A, 84.2% N, 81.5% C, and 93.4% O), surpassing the current benchmarks in personality neuroscience. These findings empirically validate the close alignment between personality constructs and neural oscillatory patterns, highlighting the potential of EEG-based sensing and machine-learning analytics for personalized mental-health monitoring and human-centric AI systems attuned to individual neurocognitive profiles.
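The theta–beta ratio (TBR) used in the correlations above is a band-power ratio; a minimal stdlib sketch (a plain DFT on a synthetic one-channel signal, not the study's 64-electrode pipeline) shows the computation:

```python
import cmath
import math

def band_power(x, fs, lo, hi):
    """Summed one-sided DFT power in the [lo, hi) frequency band."""
    n = len(x)
    p = 0.0
    for k in range(1, n // 2):
        f = k * fs / n
        if lo <= f < hi:
            X = sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            p += abs(X) ** 2 / n
    return p

def theta_beta_ratio(x, fs):
    # Theta: 4-8 Hz; beta: 13-30 Hz (conventional band edges).
    return band_power(x, fs, 4, 8) / band_power(x, fs, 13, 30)

# Synthetic 1-second trace: a 6 Hz (theta) and a 20 Hz (beta) component.
fs, n = 128, 128
sig = [math.sin(2 * math.pi * 6 * t / fs) + 0.5 * math.sin(2 * math.pi * 20 * t / fs)
       for t in range(n)]
tbr = theta_beta_ratio(sig, fs)
```

With amplitudes 1.0 and 0.5, the power ratio is (1.0/0.5)² = 4, which the sketch recovers; the study would instead average PSD estimates over frontal–temporal electrodes and across the TSST stages.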
Machine learning (ML) has become a cornerstone of critical applications, but its vulnerability to data poisoning attacks threatens system reliability and trustworthiness. Prior studies have begun to investigate the impact of data poisoning and have proposed various defense or evaluation methods; however, most efforts remain limited to quantifying performance degradation, with little systematic comparison of internal behaviors across model architectures under attack and insufficient attention to interpretability for revealing model vulnerabilities. To tackle this issue, we build a reproducible evaluation pipeline and emphasize the importance of integrating robustness with interpretability in the design of secure and trustworthy ML systems. Specifically, we propose a unified poisoning evaluation framework that systematically compares traditional ML models, deep neural networks, and large language models under three representative attack strategies (label flipping, random corruption, and adversarial insertion) at escalating severity levels of 30%, 50%, and 75%, and integrates LIME-based explanations to trace the evolution of model reasoning. Experimental results demonstrate that traditional models collapse rapidly under label noise, whereas Bayesian LSTM hybrids and large language models maintain stronger resilience. Further interpretability analysis uncovers attribution failure patterns, such as over-reliance on neutral tokens or misinterpretation of adversarial cues, providing insights beyond accuracy metrics.
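As a concrete illustration of the simplest attack in the taxonomy above, a label-flipping poisoner at a given severity level can be sketched as follows (a generic sketch, not the paper's pipeline):

```python
import random

def flip_labels(labels, rate, seed=0):
    """Return a copy of binary labels with a `rate` fraction flipped."""
    rng = random.Random(seed)
    poisoned = list(labels)
    # Pick distinct indices, so exactly int(rate * n) labels are flipped.
    for i in rng.sample(range(len(labels)), int(rate * len(labels))):
        poisoned[i] = 1 - poisoned[i]
    return poisoned

clean = [0, 1] * 50                        # 100 binary training labels
for severity in (0.30, 0.50, 0.75):        # the three severity levels studied
    poisoned = flip_labels(clean, severity)
    flipped = sum(a != b for a, b in zip(clean, poisoned))
```

Each severity level flips exactly that fraction of training labels; an evaluation framework like the one described would then retrain every model on the poisoned set and compare accuracy and LIME attributions against the clean baseline.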
Abstract Minimal change disease (MCD) is a common cause of nephrotic syndrome. Due to its rapid progression, early detection is essential; however, definitive diagnosis requires an invasive kidney biopsy. This study aims to develop non-invasive predictive models for diagnosing MCD using machine learning. We retrospectively collected data on demographic characteristics, blood tests, and urine tests from patients with nephrotic syndrome who underwent kidney biopsy. We applied four machine learning algorithms—TabPFN, LightGBM, Random Forest, and Artificial Neural Network—and logistic regression. We compared their performance using stratified 5-repeated 5-fold cross-validation on the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). Variable importance was evaluated using the SHapley Additive exPlanations (SHAP) method. A total of 248 patients were included, of whom 82 (33%) were diagnosed with MCD. TabPFN demonstrated the best performance, with an AUROC of 0.915 (95% CI 0.896–0.932) and an AUPRC of 0.840 (95% CI 0.807–0.872). The SHAP method identified C3, total cholesterol, and urine red blood cells as key predictors for TabPFN, consistent with previous reports. Machine learning models could be valuable non-invasive diagnostic tools for MCD.
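The stratified 5-repeated 5-fold scheme mentioned above keeps the 33% MCD prevalence roughly constant across folds; a minimal sketch of such a splitter (illustrative, not the study's code) is:

```python
import random

def stratified_kfold(labels, k, seed=0):
    """Assign indices to k folds so each fold preserves class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)                    # randomize within each class
        for j, i in enumerate(idx):
            folds[j % k].append(i)          # deal indices round-robin
    return folds

# 248 patients, 82 MCD cases (33%), as in the cohort described above.
labels = [1] * 82 + [0] * 166
repeats = [stratified_kfold(labels, 5, seed=r) for r in range(5)]  # 5 repeats
```

Each repeat reshuffles within classes, so AUROC and AUPRC are averaged over 25 held-out folds, which stabilizes the confidence intervals reported above.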
Abstract Background Psychiatry faces a challenge due to the lack of objective biomarkers, as current assessments are based on subjective evaluations. Automated speech analysis shows promise in detecting symptom severity in depressed patients. This project aimed to identify discriminating speech features between patients with major depressive disorder (MDD) and healthy controls (HCs) by examining associations with symptom severity measures. Methods Forty-four MDD patients from the Department of Psychiatry, University Hospital Aachen, Germany, and fifty-two HCs were recruited. Participants described positive and negative life events, which were recorded for analysis. The Beck Depression Inventory (BDI-II) and the Hamilton Rating Scale for Depression gauged depression severity. Transcribed audio recordings underwent feature extraction, including acoustics, speech rate, and content. Machine learning models combining speech features and neuropsychological assessments were used to differentiate between the MDD patients and HCs. Results Acoustic variables such as pitch and loudness differed significantly between the MDD patients and HCs (effect sizes η² between 0.183 and 0.3, p < 0.001). Furthermore, variables pertaining to temporality, lexical richness, and speech sentiment displayed moderate to high effect sizes (η² between 0.062 and 0.143, p < 0.02). A support vector machine (SVM) model based on 10 acoustic features showed a high performance (AUC = 0.93) in differentiating between HCs and patients with MDD, comparable to an SVM based on the BDI-II (AUC = 0.99, p = 0.01). Conclusions This study identified robust speech features associated with MDD. A machine learning model based on speech features yielded similar results to an established pen-and-paper depression assessment. In the future, these findings may shape voice-based biomarkers, enhancing clinical diagnosis and MDD monitoring.
While current research predominantly focuses on image-based colorization, the domain of video-based colorization remains relatively unexplored. Many existing video colorization techniques operate frame-by-frame, often overlooking the critical aspect of temporal coherence between successive frames. This approach can result in inconsistencies across frames, leading to undesirable effects like flickering or abrupt color transitions between frames. To address these challenges, we combine the generative capabilities of a fine-tuned latent diffusion model with an autoregressive conditioning mechanism to ensure temporal consistency in automatic speaker video colorization. We demonstrate strong improvements on established quality metrics compared to existing methods, namely, PSNR, SSIM, FID, FVD, NIQE and BRISQUE. Specifically, we achieve an 18% improvement in performance when FVD is employed as the evaluation metric. Furthermore, we performed a subjective study, where users preferred LatentColorization to the existing state-of-the-art DeOldify 80% of the time. Our dataset combines conventional datasets and videos from television/movies. A short demonstration of our results can be seen in some example videos available at <uri>https://youtu.be/vDbzsZdFuxM</uri>.
The cohesion of an object-oriented class refers to the relatedness of its methods and attributes. Constructors, destructors, and access methods are special types of methods featuring unique characteristics that can artificially affect class cohesion quantification. Methods within a class can also directly or transitively invoke each other, representing another cohesion aspect not considered by most existing cohesion measures. The impact of considering special methods (SPs) and transitive relations (TRs) in cohesion measurement on the abilities of the measures to predict inheritance reusability has yet to be investigated. In this paper, we empirically explored this effect. We applied a statistical technique to test the significance of the cohesion value changes across seven scenarios of ignoring or considering SPs and TRs. In addition, we applied a machine learning-based technique to build inheritance reusability prediction models using each of the considered measures and scenarios, evaluated the classification performance of the prediction models, and statistically compared the inheritance reusability prediction results. The results show that for most of the considered measures, ignoring or considering SPs and TRs significantly changed the cohesion values and the corresponding predictions. Based on the study findings, when building inheritance reusability prediction models, software engineers are advised to 1) combine cohesion with other quality factors; 2) exclude the TRs from cohesion quantification; and 3) decide whether to consider or ignore SPs in cohesion quantification based on the selected measure(s) to be used in the prediction model, as this decision differs from one measure to another.
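The transitive relations (TRs) discussed above amount to reachability over the method-invocation graph; a minimal sketch of computing them (with hypothetical method names) is:

```python
def transitive_calls(calls):
    """Warshall-style reachability closure over direct method invocations."""
    methods = sorted({m for pair in calls for m in pair})
    reach = {m: set() for m in methods}
    for a, b in calls:
        reach[a].add(b)                     # direct invocations
    for k in methods:                       # k = intermediate method
        for a in methods:
            if k in reach[a]:
                reach[a] |= reach[k]        # a reaches whatever k reaches
    return reach

# Hypothetical class: m1 calls m2, m2 calls m3; transitively, m1 reaches m3.
direct = [("m1", "m2"), ("m2", "m3")]
closure = transitive_calls(direct)
```

A cohesion measure that counts TRs would treat m1 and m3 as related even though m1 never invokes m3 directly, which is exactly the design decision this study evaluates.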
Zishwa Muhammad Jauhar Nafis, Rahmatun Nazilla, Rega Nugraha
et al.
As the use of the Internet of Things continues to grow and spread, security threats to IoT networks are also increasing. Several techniques have been applied to address these threats; one of them is classifying whether an activity constitutes an attack and, if so, of which type. Machine learning can be leveraged for this classification. Among the machine learning algorithms applicable to this task are the Decision Tree and K-Nearest Neighbor approaches. This study aims to obtain the best classification results for detecting types of IoT network attacks, in both binary and multiclass classification. The study uses the Edge-IIoTset Cyber Security Dataset of IoT & IIoT. The evaluation results show that the Decision Tree algorithm performs better than the KNN algorithm, with differences in precision, recall, F1-score, and accuracy of 0.15, 0.18, 0.17, and 0.08, respectively, for binary classification, and of 0.26, 0.20, 0.22, and 0.23, respectively, for multiclass classification.
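The precision, recall, F1-score, and accuracy differences reported above are all derived from confusion-matrix counts; a minimal sketch of those metrics for a binary attack/benign split (toy predictions, not the Edge-IIoTset results) is:

```python
def binary_metrics(y_true, y_pred):
    """Precision, recall, F1, and accuracy from confusion-matrix counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return precision, recall, f1, accuracy

# Toy labels: 1 = attack traffic, 0 = benign traffic.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
prec, rec, f1, acc = binary_metrics(y_true, y_pred)
```

For the multiclass case, the same counts are computed per attack class and then averaged (macro or weighted), which is how the per-metric gaps between the two algorithms would be obtained.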
Abstract Background Precision healthcare has entered a new era because of the developments in personalized medicine, especially in the diagnosis and treatment of head and neck squamous cell carcinoma (HNSCC). This paper explores the dynamic landscape of personalized medicine as applied to HNSCC, encompassing both current developments and future prospects. Recent Findings The integration of personalized medicine strategies into HNSCC diagnosis is driven by the utilization of genetic data and biomarkers. Epigenetic biomarkers, which reflect modifications to DNA that can influence gene expression, have emerged as valuable indicators for early detection and risk assessment. Treatment approaches within the personalized medicine framework are equally promising. Immunotherapy, gene silencing, and editing techniques, including RNA interference and CRISPR/Cas9, offer innovative means to modulate gene expression and correct genetic aberrations driving HNSCC. The integration of stem cell research with personalized medicine presents opportunities for tailored regenerative approaches. The synergy between personalized medicine and technological advancements is exemplified by artificial intelligence (AI) and machine learning (ML) applications. These tools empower clinicians to analyze vast datasets, predict patient responses, and optimize treatment strategies with unprecedented accuracy. Conclusion The developments and prospects of personalized medicine in HNSCC diagnosis and treatment offer a transformative approach to managing this complex malignancy. By harnessing genetic insights, biomarkers, immunotherapy, gene editing, stem cell therapies, and advanced technologies like AI and ML, personalized medicine holds the key to enhancing patient outcomes and ushering in a new era of precision oncology.
Abstract The existing static voltage stability margin evaluation methods cannot meet the actual demands of the current power grid in terms of calculation speed and accuracy. Thus, this paper proposes a static voltage stability margin prediction method based on a graph attention network (GAT) and a long short‐term memory network (LSTM) to predict the static voltage stability margin of a power system accurately, quickly, and effectively, considering new-energy uncertainty. First, an innovative machine learning framework named GAT‐LSTM is designed to extract highly representative power grid operation features, considering the spatial‐temporal correlation of power grid operation. Then, a static voltage stability margin prediction method based on the GAT‐LSTM is developed. In particular, to account for the influence of new-energy power uncertainty, two loss functions, one for certainty and one for uncertainty, are used in the proposed method to predict the voltage stability margin and the voltage fluctuation range. Finally, the IEEE 39-bus power system and a practical power system are employed to verify the proposed method. The results show that the computational speed of the proposed method is greatly improved compared to traditional methods not based on machine learning, and the computation results are more accurate and reliable than those of existing machine learning methods. Compared with the existing methods, the proposed method has higher scalability and applicability.
As the popularity of electric vehicles (EVs) and smart grids continues to rise, so does the demand for batteries. Within the landscape of battery-powered energy storage systems, the battery management system (BMS) is crucial. It provides key functions such as battery state estimation (including state of charge, state of health, battery safety, and thermal management) as well as cell balancing. Its primary role is to ensure safe battery operation. However, due to the limited memory and computational capacity of onboard chips, achieving this goal is challenging, as both theory and practical evidence suggest. Given the immense amount of battery data produced over its operational life, the scientific community is increasingly turning to cloud computing for data storage and analysis. This cloud-based digital solution presents a more flexible and efficient alternative to traditional methods that often require significant hardware investments. The integration of machine learning is becoming an essential tool for extracting patterns and insights from vast amounts of observational data. As a result, the future points towards the development of a cloud-based artificial intelligence (AI)-enhanced BMS. This will notably improve the predictive and modeling capacity for long-range connections across various timescales, by combining the strength of physical process models with the versatility of machine learning techniques.
Michela Prunella, Roberto Maria Scardigno, Domenico Buongiorno
et al.
Automatic vision-based inspection systems have played a key role in product quality assessment for decades through the segmentation, detection, and classification of defects. Historically, machine learning frameworks based on hand-crafted feature extraction, selection, and validation relied on a combined approach of parameterized image processing algorithms and explicit human knowledge. The outstanding performance of deep learning (DL) for vision systems, in automatically discovering a feature representation suitable for the corresponding task, has exponentially increased the number of scientific articles and commercial products aiming at industrial quality assessment. In such a context, this article reviews more than 220 relevant articles from the related literature published until February 2023, covering the recent consolidation and advances in the field of fully-automatic DL-based surface defect inspection systems deployed in various industrial applications. The analyzed papers have been classified according to a two-dimensional taxonomy that considers both the specific defect recognition task and the employed learning paradigm. The dependency on large and high-quality labeled datasets and the different neural architectures employed to achieve an overall perception of both well-visible and subtle defects, through the supervision of fine and/or coarse data annotations, have been assessed. The results of our analysis highlight a growing research interest in defect representation power enrichment, especially by transferring pre-trained layers to an optimized network and by explaining the network decisions to suggest trustworthy retention or rejection of the products being evaluated.
The steering mechanism of a ship's steering gear is generally driven by a hydraulic system. Precise control of the hydraulic cylinder in the steering mechanism can be achieved from the target rudder angle. However, hydraulic systems are often described as nonlinear systems with uncertainties. Since the system parameters are uncertain and system performance is influenced by disturbances and noise, robustness cannot be achieved by approximating the nonlinear system with a linear one. In this paper, a learning-based model predictive controller (LB-MPC) is designed for the position control of an electro-hydraulic cylinder system. To reduce the influence of hydraulic-system uncertainty caused by model mismatch, a Gaussian process (GP) is adopted, and real-time input and output data are used to improve the model. A comparative simulation of GP-MPC and MPC is performed under the assumption that the disturbance and uncertainty terms are bounded. Consequently, the proposed control strategy can regulate the piston position quickly and precisely under multiple constraint conditions.
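The GP component above learns the mismatch between the nominal model and measured behavior; a minimal stdlib sketch of GP regression on such residuals (not the authors' implementation; the kernel length-scale, noise level, and residual values are invented) is:

```python
import math

def rbf(a, b, ls=1.0):
    """Squared-exponential (RBF) kernel on scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2 * ls ** 2))

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def gp_mean(X, y, x_star, noise=1e-6):
    """Posterior mean of a zero-mean GP at x_star, given data (X, y)."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    alpha = solve(K, y)
    return sum(alpha[i] * rbf(X[i], x_star) for i in range(len(X)))

# Hypothetical residuals between measured piston position and nominal model output.
X = [0.0, 0.5, 1.0]
resid = [0.02, -0.01, 0.03]
```

The MPC would add `gp_mean(X, resid, x)` to its nominal prediction at each state, so the controller plans against the corrected model as new input/output data accumulate.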
Johannes Leiner, Vincent Pellissier, Sebastian König
et al.
Abstract Background Severe acute respiratory infections (SARI) are the most common infectious causes of death. Previous work on mortality prediction models for SARI using machine learning (ML) algorithms, which can be useful for both individual risk stratification and quality-of-care assessment, is scarce. We aimed to develop reliable models for mortality prediction in SARI patients utilizing ML algorithms and to compare their performance with a classic regression-analysis approach. Methods Administrative data (dataset randomly split 75%/25% for model training/testing) from the years 2016–2019 of 86 German Helios hospitals were retrospectively analyzed. Inpatient SARI cases were defined by ICD codes J09–J22. Three ML algorithms were evaluated and their performance compared to generalized linear models (GLM) by computing the receiver operating characteristic area under the curve (AUC) and the area under the precision-recall curve (AUPRC). Results The dataset contained 241,988 inpatient SARI cases (75 years or older: 49%; male: 56.2%). In-hospital mortality was 11.6%. AUC and AUPRC in the testing dataset were 0.83 and 0.372 for GLM, 0.831 and 0.384 for random forest (RF), 0.834 and 0.382 for a single-layer neural network (NNET), and 0.834 and 0.389 for extreme gradient boosting (XGBoost). Statistical comparison of ROC AUCs revealed a better performance of NNET and XGBoost as compared to GLM. Conclusion ML algorithms for predicting in-hospital mortality were trained and tested on a large real-world administrative dataset of SARI patients and showed good discriminatory performance. Broad application of our models in clinical routine practice can contribute to patients' risk assessment and quality management.
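The ROC AUC figures above have a simple rank interpretation: the probability that a randomly chosen death receives a higher predicted risk than a randomly chosen survivor. A minimal sketch of that computation (toy scores, not the Helios data) is:

```python
def roc_auc(y_true, scores):
    """ROC AUC via the Mann-Whitney U statistic (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy predictions: 1 = in-hospital death, score = predicted risk.
y = [1, 1, 0, 0, 0]
risk = [0.9, 0.4, 0.4, 0.2, 0.1]
auc = roc_auc(y, risk)
```

An AUC of 0.5 corresponds to random ranking, so the 0.83–0.834 values reported above mean the models rank a death above a survivor roughly five times out of six.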