David E. Goldberg, John H. Holland
Results for "machine learning"
Showing 20 of ~2,962,523 results · from CrossRef, DOAJ
Rich Caruana
Kamesh R. Babu
Biprateep Dey, David Zhao, Brett H. Andrews et al.
Key science questions, such as galaxy distance estimation and weather forecasting, often require knowing the full predictive distribution of a target variable Y given complex inputs X. Despite recent advances in machine learning and physics-based models, it remains challenging to assess whether an initial model is calibrated for all x and, when needed, to reshape the densities of y toward ‘instance-wise’ calibration. This paper introduces the local amortized diagnostics and reshaping of conditional densities (LADaR) framework and proposes a new computationally efficient algorithm (Cal-PIT) that produces interpretable local diagnostics and provides a mechanism for adjusting conditional density estimates (CDEs). Cal-PIT learns a single interpretable local probability–probability map from calibration data that identifies where and how the initial model is miscalibrated across feature space, which can be used to morph CDEs so that they are well calibrated. We illustrate the LADaR framework on synthetic examples, including probabilistic forecasting from image sequences, akin to predicting storm wind speed from satellite imagery. Our main science application involves estimating the probability density functions of galaxy distances given photometric data, where Cal-PIT achieves better instance-wise calibration than all 11 other literature methods in a benchmark data challenge, demonstrating its utility for next-generation cosmological analyses.
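The abstract above hinges on the probability integral transform (PIT): for a calibrated conditional density estimate, each predicted CDF evaluated at the observed target is uniformly distributed, and a P–P map compares empirical PIT coverage against the diagonal. Below is a minimal sketch of the global version of that diagnostic; Cal-PIT's local, amortized P–P map is not reproduced, and the overdispersed Gaussian toy model is invented purely for illustration.

```python
import numpy as np
from math import erf, sqrt

def pit_values(cdfs, y_true):
    """Probability integral transform: evaluate each predicted CDF at the
    observed target. For a calibrated model the PIT values are ~Uniform(0,1)."""
    return np.array([F(y) for F, y in zip(cdfs, y_true)])

def pp_curve(pit, grid):
    """Empirical P-P map: fraction of PIT values at or below each level gamma.
    Deviations from the diagonal indicate miscalibration."""
    return np.array([(pit <= g).mean() for g in grid])

def norm_cdf(x, s):
    """CDF of a zero-mean Gaussian with standard deviation s."""
    return 0.5 * (1 + erf(x / (s * sqrt(2))))

rng = np.random.default_rng(0)
# Toy example: true Y ~ N(0, 1), but the model predicts N(0, 2) -- overdispersed.
y = rng.normal(0.0, 1.0, size=5000)
cdfs = [lambda t, s=2.0: norm_cdf(t, s)] * len(y)

pit = pit_values(cdfs, y)
grid = np.linspace(0.05, 0.95, 19)
curve = pp_curve(pit, grid)
# Overdispersed densities pile PIT mass near 0.5, so the empirical curve
# sits below the diagonal at small gamma and above it at large gamma.
```

A well-calibrated model would instead trace the diagonal, i.e. `curve` would approximately equal `grid`.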
Bhargav Teja Nallapu, Ali Ezzati, Helena M. Blumen et al.
ABSTRACT INTRODUCTION Understanding the heterogeneity of brain structure in individuals with the Motoric Cognitive Risk Syndrome (MCR) may improve the current risk assessments of dementia. METHODS We used data from six cohorts from the MCR consortium (N = 1987). A weakly‐supervised clustering algorithm called HYDRA (Heterogeneity through Discriminative Analysis) was applied to volumetric magnetic resonance imaging (MRI) measures to identify distinct subgroups in the population with gait speeds lower than one standard deviation (1SD) above the mean. RESULTS Three subgroups (Groups A, B, and C) were identified through MRI‐based clustering, with significant differences in regional brain volumes, gait speeds, and performance on the Trail Making (Part B) and Free and Cued Selective Reminding Tests. DISCUSSION Based on structural MRI, our results reflect heterogeneity in the population with moderate and slow gait, including those with MCR. Such a data‐driven approach could help pave new pathways toward dementia at‐risk stratification and have implications for precision health. Highlights: Different patterns of brain atrophy were observed among people with moderate and slow gait speeds. Slower gait speeds were associated with substantial cortical atrophy, higher rates of Motoric Cognitive Risk Syndrome (MCR), and worse cognitive performance. This approach can aid patient stratification at early asymptomatic stages and has implications for precision health.
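HYDRA is a weakly supervised, discriminative clustering method; as a much simpler stand-in, plain k-means over volumetric feature vectors illustrates the subgrouping step. Everything below (the blob means, the two-feature setup, the one-centre-per-blob initialisation) is a toy assumption, not the consortium's data or pipeline.

```python
import numpy as np

def kmeans(X, init_idx, iters=50):
    """Plain Lloyd's k-means: assign each sample to its nearest centre,
    then move each centre to the mean of its assigned samples."""
    centres = X[init_idx].copy()
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - centres[None], axis=2).argmin(axis=1)
        centres = np.array([X[labels == j].mean(axis=0)
                            for j in range(len(centres))])
    return labels, centres

rng = np.random.default_rng(3)
# Toy "regional brain volume" vectors: three well-separated Gaussian blobs.
X = np.vstack([rng.normal(m, 0.1, (40, 2)) for m in (0.0, 1.5, 3.0)])
# Initialise one centre inside each blob for this toy example.
labels, centres = kmeans(X, init_idx=[0, 40, 80])
```

With real volumetric measures, the discriminative element of HYDRA (clustering relative to a reference group) is the key difference from this sketch.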
Somporn Sahachaiseree, Takashi Oguchi
ABSTRACT Reinforcement learning (RL) is a promising machine‐learning solution to traffic signal control problems, which have been extensively studied. However, previous studies proposing RL‐based controllers have predominantly employed variants of non‐linear, deep artificial neural network (ANN) function approximators (FAs), leaving a significant interpretability issue due to their black‐box nature. In this work, the use of a linear FA for a value‐based RL agent in traffic signal control problems is investigated along with the least‐squares Q‐learning method, abbreviated as LSTDQ. The interpretable linear FA was found to be adequate for the RL agent to learn an optimal policy, which leads to the proposal to replace a non‐linear ANN FA with its linear counterpart, resolving the interpretability issue. Moreover, the LSTDQ learning method shows superior convergence behaviour compared to a gradient descent method. In a low‐intensity arrival pattern scenario, control by the RL agent cuts roughly half of the average delay incurred under pretimed control. Owing to the conciseness of the linear FA, a direct interpretation analysis of the converged linear‐FA parameters is presented. Lastly, two online relearning tests of the agents under non‐stationary arrivals demonstrate the online performance of LSTDQ. In conclusion, the linear‐FA specification and the LSTDQ method are together proposed for their interpretability, superior convergence quality, and lack of hyperparameters.
Yiannis Kiouvrekis, Theodor Panagiotakopoulos
Electromagnetic field (EMF) exposure mapping is increasingly important for ensuring compliance with safety regulations, supporting the deployment of next-generation wireless networks, and addressing public health concerns. While numerous surveys have addressed specific aspects of radio propagation or radio environment maps, a comprehensive and unified overview of EMF mapping methodologies has been lacking. This review bridges that gap by systematically analyzing computational, geospatial, and machine learning approaches used for EMF exposure mapping across both wireless communication engineering and public health domains. A novel taxonomy is introduced to clarify overlapping terminology—encompassing radio maps, radio environment maps, and EMF exposure maps—and to classify construction methods, including analytical models, model-based interpolation, and data-driven learning techniques. In addition, the review highlights domain-specific challenges such as indoor versus outdoor mapping, data sparsity, and model generalization, while identifying emerging opportunities in hybrid modeling, big data integration, and explainable AI. By combining perspectives from communication engineering and public health, this work provides a broader and more interdisciplinary synthesis than previous surveys, offering a structured reference and roadmap for advancing robust, scalable, and socially relevant EMF mapping frameworks.
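Of the map-construction families the review classifies, model-based interpolation is the easiest to sketch. Below is inverse-distance weighting (IDW) over a handful of fictitious sensor readings; the coordinates and field values are invented, and a real EMF map would layer propagation models or learned regressors on top.

```python
import numpy as np

def idw_map(xy_obs, v_obs, xy_grid, power=2.0, eps=1e-12):
    """Inverse-distance-weighted interpolation: a classic model-based
    interpolator for building exposure maps from sparse point measurements."""
    d = np.linalg.norm(xy_grid[:, None, :] - xy_obs[None, :, :], axis=2)
    w = 1.0 / (d ** power + eps)           # nearby sensors dominate
    return (w * v_obs).sum(axis=1) / w.sum(axis=1)

# Four fictitious EMF sensors: (x, y) position and field strength (e.g. V/m).
obs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
vals = np.array([0.2, 0.4, 0.6, 0.8])
grid = np.array([[0.5, 0.5], [0.0, 0.0]])
est = idw_map(obs, vals, grid)
# The centre point blends all four sensors equally; a point on top of a
# sensor reproduces that sensor's reading almost exactly.
```

Data sparsity, one of the challenges the review highlights, shows up here directly: far from any sensor, IDW flattens toward the global weighted mean.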
T. Nageshkumar, Prateek Shrivastava, L. Ammayapan et al.
A machine learning model coupled with a graphical user interface (GUI) was developed to predict the mechanical properties of flax fiber. The experiment was conducted using a test setup that applies a constant rate of loading (CRL). Flax fiber was tested under five independent parameters, i.e., type of fiber (Tf), moisture content (Mc), weight of sample (Ws), gauge length (Gl), and loading rate (Lr), with the response variables being breaking load and elongation. In this study, a total of 432 patterns of input and output parameters obtained from laboratory experiments were used to develop machine learning algorithms (random forest, support vector, and XGBoost). Among these models, the random forest regressor yielded a high R² value with low mean squared error (MSE) and mean absolute error (MAE). SHapley Additive exPlanations (SHAP) analysis found that sample weight and gauge length were the most influential features for breaking load and elongation, respectively. The developed GUI, integrated with the random forest regressor, predicted breaking load and elongation with an error range of −2.5% to 2.3% for raw fiber and 1.5% to 6.5% for cleaned fiber, and can be used to predict the mechanical properties of fibers with ease.
Qingfeng Sun, Kai Zhang, Yuanlong Xu et al.
Abstract Background HIV/TB co-infection presents substantial public-health challenges, showing greater treatment-failure and mortality rates than tuberculosis alone. Recent advances in machine learning (ML) provide a robust means of identifying high-risk patients early in the disease course. Methods This retrospective study enrolled 359 patients co-infected with HIV and TB at a single tertiary-care hospital. We extracted clinical and immunological data. The cohort was divided into training and test subsets, and class imbalance was addressed with the Synthetic Minority Over-sampling Technique (SMOTE). Six ML classifiers—Random Forest, XGBoost, LightGBM, Support Vector Machine, Extra Trees, and CatBoost—were trained after grid-search hyper-parameter tuning. Model performance was assessed with the area under the receiver-operating-characteristic curve (AUC), accuracy, recall, precision, specificity, and F1-score. Multi-criteria ranking was then conducted with the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS). The leading model was interpreted using SHapley Additive exPlanations (SHAP). Results Overall, 304 of 359 patients (84.7%) had favourable outcomes, whereas 55 (15.3%) had unfavourable outcomes. LightGBM achieved the best overall performance (AUC = 0.771; accuracy = 84.72%; F1 = 0.522) and was ranked first by TOPSIS. SHAP analysis highlighted age, CD4 and CD8 counts, body-mass index and occupation as key predictors. Lower BMI, pronounced immunosuppression and older age were strongly associated with unfavourable outcomes, findings that align with established clinical evidence. Conclusion A gradient-boosted model (LightGBM) combined with SHAP interpretation demonstrated reliable predictive performance in HIV/TB co-infection and highlighted clinically actionable risk factors.
Incorporating this tool into routine workflows could enable healthcare providers to identify high-risk individuals earlier, allocate resources more efficiently and, ultimately, improve TB-treatment success. Clinical trial registration: Not applicable.
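The class-imbalance step in the pipeline above (SMOTE) is simple enough to sketch: each synthetic minority sample is an interpolation between a minority point and one of its k nearest minority neighbours. The pure-NumPy version below uses invented feature vectors rather than clinical data or a library implementation.

```python
import numpy as np

def smote(X_min, n_new, k=3, rng=None):
    """Minimal SMOTE sketch: synthesise minority-class samples by
    interpolating between a minority point and one of its k nearest
    minority neighbours (assumes X_min has more than k rows)."""
    if rng is None:
        rng = np.random.default_rng(0)
    d = np.linalg.norm(X_min[:, None] - X_min[None, :], axis=2)
    np.fill_diagonal(d, np.inf)                 # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]           # k nearest minority neighbours
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        j = nn[i, rng.integers(k)]
        lam = rng.random()                      # position along the segment
        out.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(out)

# Five invented minority "patients" described by two scaled features.
X_min = np.array([[0.1, 0.2], [0.2, 0.1], [0.15, 0.25], [0.3, 0.2], [0.25, 0.3]])
X_syn = smote(X_min, n_new=10)
# Synthetic points lie on segments between real minority samples, so they
# stay inside the convex hull of the minority class.
```

Because interpolation never extrapolates, oversampled training sets stay within the observed minority region, which is what keeps SMOTE from inventing clinically impossible feature combinations.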
Al Amin Biswas
Nowadays, artificial intelligence (AI) has been utilized in several domains of the healthcare sector. Despite its effectiveness in healthcare settings, its massive adoption remains limited due to the transparency issue, which is considered a significant obstacle. To achieve the trust of end users, it is necessary to explain the AI models' output. Therefore, explainable AI (XAI) has become apparent as a potential solution by providing transparent explanations of the AI models' output. In this review paper, the primary aim is to review articles that are mainly related to machine learning (ML) or deep learning (DL) based human disease diagnoses, and the model's decision-making process is explained by XAI techniques. To do that, two journal databases (Scopus and the IEEE Xplore Digital Library) were thoroughly searched using a few predetermined relevant keywords. The PRISMA guidelines have been followed to determine the papers for the final analysis, where studies that did not meet the requirements were eliminated. Finally, 90 Q1 journal articles are selected for in-depth analysis, covering several XAI techniques. Then, the summarization of the several findings has been presented, and appropriate responses to the proposed research questions have been outlined. In addition, several challenges related to XAI in the case of human disease diagnosis and future research directions in this sector are presented.
Ajay Dadhich, Jaideep Patel, Rovin Tiwari et al.
Mind-wandering (MW) is when an individual’s concentration drifts away from the task or activity. Researchers found a greater variability in electroencephalogram (EEG) signals due to MW. Collecting more nuanced information from raw EEG data to examine the harmful effects of MW is time-consuming. This study proposes a multi-resolution assessment of EEG signals using the flexible analytic wavelet transform (FAWT). The FAWT algorithm decomposes raw EEG data into more representative sub-bands (SBs). Several statistical characteristics are derived from the obtained SBs, and the effects of MW during meditation on the EEG signals are investigated. A set of significant characteristics is chosen and fed into the machine learning modules using a 10-fold validation approach to detect MW subjects automatically. Our proposed framework attained the highest classification accuracy of 92.41%, the highest sensitivity of 93.56%, and the highest specificity of 91.97%. The proposed framework can be used to design a suitable brain-computer interface (BCI) system to reduce MW and increase meditation depth for holistic and long-term health in society.
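The feature-extraction stage described above reduces each sub-band to summary statistics before classification. FAWT itself is not reproduced here; the sketch below assumes some filter bank has already produced a sub-band signal and computes a few descriptors of the kind typically fed to classifiers (the specific statistics are illustrative, not the paper's exact set).

```python
import numpy as np

def subband_stats(sb):
    """Statistical descriptors for one sub-band signal: location, spread,
    energy, and tailedness (excess-free kurtosis, ~3 for Gaussian noise)."""
    return {
        "mean": sb.mean(),
        "std": sb.std(),
        "rms": np.sqrt((sb ** 2).mean()),
        "kurtosis": ((sb - sb.mean()) ** 4).mean() / sb.var() ** 2,
    }

rng = np.random.default_rng(4)
# Stand-in sub-band: white Gaussian noise instead of a real EEG decomposition.
feats = subband_stats(rng.normal(0, 1, 4096))
```

Stacking such dictionaries across sub-bands and channels yields the feature matrix that the 10-fold cross-validated classifiers would consume.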
Vasiliki Bikia, Georgios Rovas, Stamatia Pagoulatou et al.
Oqbah Salim Atiyah
Earthquakes are among the most dangerous natural disasters and can cause major losses to buildings and threaten human lives. The research community is highly interested in earthquakes because they occur suddenly, and predicting them is very important for human safety. Creating accurate earthquake prediction techniques by applying machine learning (ML) approaches can help save lives and prevent damage. To identify important features and analyze the correlations between them before submitting them to classification models, this paper proposes a new feature selection approach that combines two filtering methods: normalization based on the Chi-square approach and analysis of variance, and a correlation approach based on the logistic regression technique (CLR-AVCH). Three algorithms are then applied, and a voting classifier is created that combines the two best models with the highest prediction accuracy (histogram-based gradient boosting and adaptive boosting) into a single technique that incorporates their strengths, helping to find important patterns in the acquired data and yielding a model capable of early earthquake prediction. The proposed work achieved accuracy, F1-score, recall, and precision of 0.94, 0.92, 0.94, and 0.92, respectively.
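The final ensembling step, combining the two best models into one voting classifier, can be sketched as a weighted soft vote over predicted class probabilities. The base-model outputs below are invented stand-ins for the histogram-based gradient boosting and AdaBoost probabilities; they are not results from the paper.

```python
import numpy as np

def soft_vote(probas, weights=None):
    """Combine the class-probability outputs of several classifiers by a
    (weighted) average and pick the argmax class -- the usual way a soft
    voting ensemble merges its base models."""
    probas = np.asarray(probas, dtype=float)         # (n_models, n_samples, n_classes)
    w = np.ones(len(probas)) if weights is None else np.asarray(weights, float)
    avg = np.tensordot(w / w.sum(), probas, axes=1)  # weighted mean over models
    return avg.argmax(axis=1)

# Two hypothetical base models scoring three events as [P(no quake), P(quake)].
p_a = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]
p_b = [[0.7, 0.3], [0.3, 0.7], [0.6, 0.4]]
pred = soft_vote([p_a, p_b])
# Averaged rows: [0.8, 0.2], [0.35, 0.65], [0.4, 0.6] -> classes 0, 1, 1.
```

Weighting the average by each model's validation accuracy is a common refinement when the two base learners are not equally reliable.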
Shiyun Li, Omar Dib
The rapid expansion of the internet has led to a corresponding surge in malicious online activities, posing significant threats to users and organizations. Cybercriminals exploit malicious uniform resource locators (URLs) to disseminate harmful content, execute phishing schemes, and orchestrate various cyber attacks. As these threats evolve, detecting malicious URLs (MURLs) has become crucial for safeguarding internet users and ensuring a secure online environment. In response to this urgent need, we propose a novel machine learning-driven framework designed to identify known and unknown MURLs effectively. Our approach leverages a comprehensive dataset encompassing various labels—including benign, phishing, defacement, and malware—to engineer a robust set of features validated through extensive statistical analyses. The resulting malicious URL detection system (MUDS) combines supervised machine learning techniques, tree-based algorithms, and advanced data preprocessing, achieving a high detection accuracy of 96.83% for known MURLs. For unknown MURLs, the proposed framework utilizes CL_K-means, a modified k-means clustering algorithm, alongside two additional biased classifiers, achieving 92.54% accuracy on simulated zero-day datasets. With an average processing time of under 14 milliseconds per instance, MUDS is optimized for real-time integration into network endpoint systems. These outcomes highlight the efficacy and efficiency of the proposed MUDS in fortifying online security by identifying and mitigating MURLs, thereby reinforcing the digital landscape against cyber threats.
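Lexical feature engineering for URLs, as in pipelines like MUDS, typically starts from simple counts over the string and its host. The features below are common illustrative choices, not the paper's validated feature set, and the threshold-free booleans and names are assumptions.

```python
from urllib.parse import urlparse

def url_features(url):
    """A few lexical features commonly engineered for malicious-URL
    detection: length, digit/special-character density, subdomain depth,
    and whether the host is a bare IP address."""
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.netloc
    return {
        "url_len": len(url),
        "host_len": len(host),
        "n_digits": sum(c.isdigit() for c in url),
        "n_special": sum(c in "-_@?=&%" for c in url),
        "n_subdomains": max(host.count(".") - 1, 0),
        "has_ip_host": host.replace(".", "").isdigit(),
    }

feats = url_features("http://192.168.0.1/secure-login?id=42")
# An IP-address host and a high digit count are classic phishing indicators.
```

Rows of such dictionaries, one per URL, form the tabular input that tree-based detectors consume.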
ZHANG Yifan, SONG Wei
When solving multi-objective optimization problems, particle swarm optimization algorithms usually employ preset example-selection methods and search strategies, which cannot be adjusted according to the current optimization state. Faced with different optimization problems, inappropriate search strategies cannot effectively guide the population, resulting in low search performance. To address this, a multi-objective particle swarm optimization algorithm guided by an extreme learning decision network (ELDN-PSO) is proposed. First, the multi-objective optimization problem is decomposed into several scalar subproblems, and an extreme learning decision network is constructed. The network takes a particle's position as input and selects an appropriate search action for each particle according to the optimization state. The fitness change of a particle on its subproblem serves as the training sample for reinforcement learning, and training speed is improved by the extreme learning machine. During optimization, the network automatically adjusts to the optimization state and selects the appropriate search strategy for particles at different search stages. Second, because non-dominated solutions of a multi-objective optimization problem are difficult to compare, the leadership of each solution is quantified into a comparable value so that examples can be selected more clearly for the particles. In addition, an external archive stores better particles to maintain solution quality and guide the population. Comparative experiments are carried out on the ZDT and DTLZ test functions. The results show that ELDN-PSO can effectively cope with different Pareto front shapes, improving optimization speed as well as the convergence and diversity of the solutions.
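ELDN-PSO builds on the canonical particle swarm update, in which each particle is pulled toward its personal best and the swarm's global best. Below is a baseline sketch of that update minimising a sphere objective; the learned decision network that selects per-particle strategies, and all parameter values here, are not from the paper.

```python
import numpy as np

def pso_minimise(f, dim, n_particles=30, iters=200, seed=0):
    """Canonical PSO: v <- w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x),
    then x <- x + v, tracking personal and global bests."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, (n_particles, dim))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_f = np.array([f(p) for p in x])
    gbest = pbest[pbest_f.argmin()].copy()
    w, c1, c2 = 0.7, 1.5, 1.5                   # inertia + attraction weights
    for _ in range(iters):
        r1 = rng.random(x.shape)
        r2 = rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        fx = np.array([f(p) for p in x])
        improved = fx < pbest_f                 # update personal bests
        pbest[improved] = x[improved]
        pbest_f[improved] = fx[improved]
        gbest = pbest[pbest_f.argmin()].copy()  # update global best
    return gbest, float(pbest_f.min())

best_x, best_f = pso_minimise(lambda z: float((z ** 2).sum()), dim=3)
```

ELDN-PSO's contribution is, in effect, replacing the fixed `w, c1, c2` strategy with a network that picks a search action per particle from the current optimization state.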
Hang Li
Heechan Han, Boran Kim, Kyunghun Kim et al.
Precipitation is one of the driving forces in the water cycle, and it is vital for understanding processes such as surface runoff, soil moisture, and evapotranspiration. However, missing precipitation data at observatories become an obstacle to improving the accuracy and efficiency of hydrological analysis. To address this issue, we developed a machine learning-based precipitation data recovery tool to detect and predict missing precipitation data at observatories. This study investigated 30 weather stations in South Korea, evaluating the applicability of machine learning algorithms (artificial neural network and random forest) for precipitation data recovery using environmental variables such as air pressure, temperature, humidity, and wind speed. The proposed model showed high performance in detecting missing precipitation occurrence, with an accuracy of 80%. In addition, the prediction results from the models showed predictive ability, with correlation coefficients ranging from 0.5 to 0.7 and R² values of 0.53. Although both algorithms performed similarly in estimating precipitation, ANN performed slightly better. Based on these results, we expect that machine learning algorithms can contribute to improving hydrological modeling performance by recovering missing precipitation data at observation stations. HIGHLIGHTS Missing precipitation data are recovered using ANN and RF algorithms. Air humidity and air pressure have a high correlation with precipitation occurrence. Both models perform well in detecting precipitation occurrence. The ANN model outperforms the RF model for recovering daily precipitation data in South Korea.
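The recovery task above, predicting precipitation from co-observed environmental variables, can be illustrated with a plain least-squares regressor standing in for the paper's ANN/RF models. The synthetic data, coefficients, and train/test split below are all invented for the sketch.

```python
import numpy as np

def fit_recovery_model(X, y):
    """Least-squares stand-in for a data-recovery model: regress the target
    on co-observed drivers, then use the fit to fill gaps."""
    A = np.column_stack([np.ones(len(X)), X])     # add intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))                     # 4 environmental drivers
true_w = np.array([0.0, 1.0, -0.5, 0.3, 0.0])     # invented ground truth
y = np.column_stack([np.ones(500), X]) @ true_w + rng.normal(0, 0.1, 500)

model = fit_recovery_model(X[:400], y[:400])      # train on "observed" days
y_hat = model(X[400:])                            # "recover" the missing days
r = np.corrcoef(y_hat, y[400:])[0, 1]             # correlation, as in the study
```

The study's 0.5-0.7 correlations on real data are far below this toy value, which is the usual gap between a linear synthetic world and daily precipitation.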
Sun-Feel Yang, So-Won Choi, Eul-Bum Lee
The ongoing Russia–Ukraine conflict has exacerbated the global crisis of natural gas supply, particularly in Europe. During the winter season, major importers of liquefied natural gas (LNG), such as South Korea and Japan, were directly affected by fluctuating spot LNG prices. This study aimed to use machine learning (ML) to predict the Japan Korea Marker (JKM), a spot LNG price index, to reduce price fluctuation risks for LNG importers such as the Korean Gas Corporation (KOGAS). Hence, price prediction models were developed based on long short-term memory (LSTM), artificial neural network (ANN), and support vector machine (SVM) algorithms, which were used for time series data prediction. Eighty-seven variables were collected for JKM prediction, of which eight were selected for modeling. Four scenarios (scenarios A, B, C, and D) were devised and tested to analyze the effect of each variable on the performance of the models. Among the eight variables, the JKM, national balancing point (NBP), and Brent price indexes demonstrated the largest effects on the performance of the ML models. In contrast, the variable of LNG import volume in China had the least effect. The LSTM model showed a mean absolute error (MAE) of 0.195, making it the best-performing algorithm. However, the LSTM model demonstrated a decrease in performance of at least 57% during the COVID-19 period, which raises concerns regarding the reliability of the test results obtained during that time. The study compared the ML models' prediction performances with those of the traditional statistical model, autoregressive integrated moving averages (ARIMA), to verify their effectiveness. The comparison results showed that the LSTM model's performance deviated by an MAE of 15–22%, which can be attributed to the constraints of the small dataset size and conceptual structural differences between the ML and ARIMA models.
However, if a sufficiently large dataset can be secured for training, the ML model is expected to perform better than ARIMA. Additionally, separate tests were conducted to predict the trends of JKM fluctuations and comprehensively validate the practicality of the ML models. Based on the test results, the LSTM model, identified as the optimal ML algorithm, achieved a performance of 53% during the regular period and 57% during the abnormal period (i.e., COVID-19). Subject matter experts agreed that the performance of the ML models could be improved through additional studies, ultimately reducing the risk of price fluctuations when purchasing spot LNG.
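The ARIMA baseline the study compares against can be illustrated with its simplest member: an AR(1) model fit by least squares, producing one-step-ahead forecasts scored by MAE. The series below is a synthetic mean-reverting process, not JKM data, and the split point is arbitrary.

```python
import numpy as np

def ar1_forecast(series, n_train):
    """Fit y_t = a + b*y_{t-1} by least squares (a minimal ARIMA(1,0,0)
    stand-in) and produce one-step-ahead forecasts on the held-out tail."""
    y_prev, y_next = series[:-1], series[1:]
    A = np.column_stack([np.ones(n_train - 1), y_prev[:n_train - 1]])
    (a, b), *_ = np.linalg.lstsq(A, y_next[:n_train - 1], rcond=None)
    return a + b * y_prev[n_train - 1:]

rng = np.random.default_rng(2)
# Synthetic mean-reverting "price index": y_t = 0.2 + 0.9*y_{t-1} + noise.
y = np.zeros(300)
for t in range(1, 300):
    y[t] = 0.2 + 0.9 * y[t - 1] + rng.normal(0, 0.1)

pred = ar1_forecast(y, n_train=250)
mae = np.abs(pred - y[250:]).mean()               # MAE, as reported in the study
```

An LSTM competing with this baseline must beat the irreducible one-step noise floor, which is why small datasets, as the study notes, blunt the ML models' advantage.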
Itaru Kaneko, Junichiro Hayano, Emi Yuda
Abstract Objective A small electrocardiograph or Holter electrocardiograph can record an electrocardiogram for 24 h or more. We examined whether gender could be verified from such an electrocardiogram and, if possible, how accurately. Results Ten-dimensional statistics were extracted from the heart rate data of more than 420,000 people, and gender identification was performed with several major classification methods. Lasso, linear regression, SVM, random forest, logistic regression, k-means, and Elastic Net were compared, for Age < 50 and Age ≥ 50. The best accuracy was 0.681927, obtained by random forest for Age < 50. There was no consistent difference between Age < 50 and Age ≥ 50. Although the discrimination results based on these statistics are statistically significant, it was confirmed that they are not accurate enough to determine the gender of an individual.
Jongho Lee, Jiuk Shin, Jaewook Lee et al.
Large fires in factories cause severe human casualties and property damage. Thus, preparing more economical and efficient management strategies for fire prevention can significantly improve fire safety. This study addresses the prediction of the property damage grade from fire based on simplified building information. The paper's primary objective is to propose and verify a framework for predicting the scale of property damage caused by fire using machine learning (ML). Korean public datasets are collected and preprocessed, and ML algorithms are trained with only 15 input variables drawn from building registers and fire scenario information. Four models (artificial neural network (ANN), decision tree (DT), k-nearest neighbor (KNN), and random forest (RF)) are used. The RF model is the most suitable for this study, with recall and precision of 74.2% and 73.8%, respectively. Structure, floor, cause, and total floor area are the critical factors that govern fire size. This study proposes a novel approach that uses ML models to accurately and rapidly predict the size of fire damage from basic building information. By analyzing domestic fire incident data and creating fire scenarios, similar ML models can be developed.
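The recall and precision figures quoted above are standard confusion-matrix ratios. A worked computation on a toy label set (the labels below are invented, not the study's predictions):

```python
import numpy as np

def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP): of the predicted positives, how many
    were right. Recall = TP / (TP + FN): of the actual positives, how
    many were found."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    return tp / (tp + fp), tp / (tp + fn)

# Six toy samples: 2 true positives found, 1 false alarm, 1 miss.
p, r = precision_recall([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
```

For multi-class damage grades, the same ratios are computed per grade and then macro- or weighted-averaged, which is presumably how the study's single recall/precision figures arise.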
Page 3 of 148127