Improving EEG based brain computer interface emotion detection with EKO ALSTM model
R. Kishore Kanna, Preety Shoran, Meenakshi Yadav
et al.
A brain–computer interface (BCI) is a computer-based communication device that decodes signals from central nervous system (CNS) brain activity. Such a system is compelling communication equipment because it enables command, communication, and action without using neuromuscular or muscle channels. Various techniques for automatic emotion identification based on body language, speech, or facial expressions are currently in use. However, these approaches monitor exterior expressions of emotion, which are easily manipulated, limiting their applicability. EEG-based emotion detection research might yield significant benefits for enhancing BCI application performance and user experience. To overcome these issues, this study proposes a novel EKO-ALSTM for emotion detection in EEG-based brain–computer interfaces. The study uses EEG signals that record the electrical activity of the brain associated with various emotional states, acquired in real time for emotion detection. The data were pre-processed using a bandpass filter to remove unwanted frequency noise. Feature extraction was then performed on the pre-processed data using the discrete wavelet transform (DWT). The proposed approach is implemented in Python. The proposed system and existing algorithms are compared using a variety of evaluation criteria, including specificity, F1 score, accuracy, recall (sensitivity), and positive predictive value (precision). The results demonstrate that the proposed method achieved better performance in EEG-based BCI emotion detection, with an accuracy of 97.93%, a positive predictive value of 96.24%, a sensitivity of 97.81%, and a specificity of 97.75%. This study emphasizes that such innovative approaches have significantly increased the accuracy of emotion identification in EEG-based emotion recognition systems.
Additionally, the findings suggest that integrating advanced machine learning techniques can further enhance the effectiveness and reliability of these systems in real-world applications, paving the way for more responsive and intuitive BCI technologies.
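The signal pipeline described above (bandpass filtering followed by DWT feature extraction) can be illustrated with a minimal one-level Haar wavelet transform. This is a generic sketch of the DWT step only, not the authors' implementation; the filtering stage and the EKO-ALSTM classifier are omitted.

```python
import math

def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform.

    Splits an even-length signal into approximation (low-frequency) and
    detail (high-frequency) coefficients -- the kind of sub-band features
    extracted from pre-processed EEG windows.
    """
    s = 1.0 / math.sqrt(2.0)
    evens, odds = signal[0::2], signal[1::2]
    approx = [(a + b) * s for a, b in zip(evens, odds)]
    detail = [(a - b) * s for a, b in zip(evens, odds)]
    return approx, detail

# A constant signal has no high-frequency content,
# so every detail coefficient is zero.
approx, detail = haar_dwt([1.0, 1.0, 1.0, 1.0])
```

In practice the transform is applied recursively to the approximation coefficients, yielding multiple frequency sub-bands per EEG channel.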
KHNN: Hypercomplex-valued neural networks computations via Keras using TensorFlow and PyTorch
Agnieszka Niemczynowicz, Radosław A. Kycia
Neural networks that operate over algebras richer than the real numbers, such as hypercomplex numbers, can outperform traditional models in certain applications, typically achieving the same accuracy with fewer training parameters. However, no general framework exists for constructing hypercomplex neural networks. We propose a library integrated with Keras, TensorFlow, and PyTorch that enables computations within these advanced algebraic systems. The library offers Dense and Convolutional layer architectures for 1D, 2D, and 3D data, tailored to support hypercomplex operations. This tool provides a streamlined approach for developing models that leverage hypercomplex numbers, enhancing performance in areas like image processing and signal analysis, and fostering innovation in machine learning. The HypercomplexKeras branch of this software is the Keras extension for hypercomplex neural networks.
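The core idea of a hypercomplex layer is to replace real-valued multiplication with the product of a richer algebra. As a minimal illustration (not the library's actual API), the Hamilton product of two quaternions can be written as:

```python
def quat_mul(p, q):
    """Hamilton product of two quaternions given as (w, x, y, z) tuples."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (
        pw * qw - px * qx - py * qy - pz * qz,
        pw * qx + px * qw + py * qz - pz * qy,
        pw * qy - px * qz + py * qw + pz * qx,
        pw * qz + px * qy - py * qx + pz * qw,
    )
```

A quaternion Dense layer composes weights and activations with this product, so a single quaternion weight is reused across four real output components, which is where the parameter savings come from.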
Predicting Residential Energy Consumption in South Africa Using Ensemble Models
David Attipoe, Donatien Koulla Moulla, Ernest Mnkandla
et al.
This study presents ensemble machine learning (ML) models for predicting residential energy consumption in South Africa. By combining the best features of individual ML models, ensemble models reduce the drawbacks of each model and improve prediction accuracy. We present four ensemble models: ensemble by averaging (EA), ensemble by stacking each estimator (ESE), ensemble by boosting (EB), and ensemble by voting estimator (EVE). These models are built on top of Random Forest (RF) and Decision Tree (DT). These base predictor models leverage historical energy consumption patterns to capture temporal intricacies, including seasonal variations and rolling averages. In addition, we employed feature engineering methodologies to further enhance their predictive abilities. The accuracy of each ensemble model was evaluated by assessing various performance indicators, including the mean squared error (MSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and coefficient of determination (R²). Overall, the findings illustrate the efficiency of ensemble learning models in providing accurate predictions for residential energy consumption. This study provides valuable insights for researchers and practitioners in predicting energy consumption in residential buildings and the benefits of using ensemble learning models in the building and energy research domains.
Electronic computers. Computer science
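The simplest of the four ensembles above, ensemble by averaging (EA), takes the mean of the base predictors' outputs. A pure-Python sketch, with two toy functions standing in for the trained RF and DT regressors:

```python
def ensemble_average(models, x):
    """Ensemble by averaging: the mean of the base predictors' outputs."""
    predictions = [model(x) for model in models]
    return sum(predictions) / len(predictions)

# Toy stand-ins for the trained Random Forest and Decision Tree models.
rf_like = lambda x: 2.0 * x + 1.0   # slightly over-predicts
dt_like = lambda x: 2.0 * x - 1.0   # slightly under-predicts
```

Averaging cancels the opposite biases of the two base models; stacking and voting replace the plain mean with a learned combiner or a (weighted) vote.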
Massive discovery of crystal structures across dimensionalities by leveraging vector quantization
ZiJie Qiu, Luozhijie Jin, Zijian Du
et al.
Discovering new functional crystalline materials through computational methods remains a challenge in materials science. We introduce VQCrystal, a deep learning framework leveraging discrete latent representations to overcome key limitations in crystal generation and inverse design. VQCrystal employs a hierarchical VQ-VAE architecture to encode global and atom-level crystal features, coupled with an inter-atomic potential model and a genetic algorithm to realize property-targeted inverse design. Benchmark evaluations on diverse datasets demonstrate VQCrystal’s capabilities in representation learning and crystal discovery. We further apply VQCrystal to both 3D and 2D material design. For 3D materials, density-functional theory validation confirmed that 62.22% of bandgaps and 99% of formation energies of the 56 filtered materials matched the target range. 437 generated materials were validated as existing entries in the full MP-20 database outside the training set. For 2D materials, 73.91% of 23 filtered structures exhibited high stability, with formation energies below −1 eV/atom.
Materials of engineering and construction. Mechanics of materials, Computer software
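The discrete latent representation at the heart of VQCrystal's VQ-VAE comes from vector quantization: each continuous encoder output is snapped to its nearest codebook entry. A minimal sketch of that lookup (illustrative only; the actual model is hierarchical and learned end-to-end):

```python
def quantize(vector, codebook):
    """Return the index of the codebook entry nearest to `vector`
    (squared Euclidean distance), as in the VQ step of a VQ-VAE."""
    def sqdist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return min(range(len(codebook)), key=lambda i: sqdist(vector, codebook[i]))
```

Replacing each latent vector with its codebook index is what makes the representation discrete, so downstream components such as the genetic algorithm can search over symbols rather than a continuous space.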
Software Fault Prediction With an Iterative Fuzzy Logic System Considering Interpretability With Imbalanced Datasets
Behrooz Shahi, Hooman Tahayori
Users expect software to be error-free; however, preventing faults in software while being developed is difficult. Although predicting faults in software is arduous, it radically helps to improve the software quality. Due to the complexity of software, time, and budget limitations, such prediction helps to deliver more robust and error-free software with lower expenses. This paper introduces an iterative method based on fuzzy systems and machine learning to predict software faults. High interpretability, transparency, balancing data, and finding the best interval for converting numerical features to fuzzy features are basic challenges for predicting software faults. The proposed framework is split into four phases. In the first phase, the crisp inputs are converted to fuzzy sets. In the second phase, a membership function is constructed using triangular fuzzy sets. In the third phase, training data are balanced, and fuzzy rules are generated. In the last phase, the similarity of inputs with the rules’ antecedents is calculated, and the fired rules are aggregated to label the test data. Eclipse, Promise, and Travis repositories are evaluated with the proposed method. The calculated AUC of the proposed method on Promise, Travis, and Eclipse datasets are, respectively, equal to 89%, 62% and 87%, which are comparable to the results obtained by deep learning methods but with higher interpretability and transparency.
Electrical engineering. Electronics. Nuclear engineering, Computer software
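The second phase above constructs membership functions from triangular fuzzy sets. A triangular membership function is fully specified by its two feet and its peak; a minimal sketch (generic, not the paper's interval-selection procedure):

```python
def triangular(x, a, b, c):
    """Triangular fuzzy membership with feet at a and c and peak at b:
    rises linearly on [a, b], falls linearly on [b, c], zero outside."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)
```

Converting a crisp numeric feature into degrees of membership in a few such overlapping triangles is what makes the resulting rules human-readable, which is the interpretability the paper emphasizes.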
Large Language Models for Software Engineering: A Reproducibility Crisis
Mohammed Latif Siddiq, Arvin Islam-Gomes, Natalie Sekerak
et al.
Reproducibility is a cornerstone of scientific progress, yet its state in large language model (LLM)-based software engineering (SE) research remains poorly understood. This paper presents the first large-scale, empirical study of reproducibility practices in LLM-for-SE research. We systematically mined and analyzed 640 papers published between 2017 and 2025 across premier software engineering, machine learning, and natural language processing venues, extracting structured metadata from publications, repositories, and documentation. Guided by four research questions, we examine (i) the prevalence of reproducibility smells, (ii) how reproducibility has evolved over time, (iii) whether artifact evaluation badges reliably reflect reproducibility quality, and (iv) how publication venues influence transparency practices. Using a taxonomy of seven smell categories: Code and Execution, Data, Documentation, Environment and Tooling, Versioning, Model, and Access and Legal, we manually annotated all papers and associated artifacts. Our analysis reveals persistent gaps in artifact availability, environment specification, versioning rigor, and documentation clarity, despite modest improvements in recent years and increased adoption of artifact evaluation processes at top SE venues. Notably, we find that badges often signal artifact presence but do not consistently guarantee execution fidelity or long-term reproducibility. Motivated by these findings, we provide actionable recommendations to mitigate reproducibility smells and introduce a Reproducibility Maturity Model (RMM) to move beyond binary artifact certification toward multi-dimensional, progressive evaluation of reproducibility rigor.
Hybrid Work in Agile Software Development: Recurring Meetings
Emily Laue Christensen, Maria Paasivaara, Iflaah Salman
The Covid-19 pandemic established hybrid work as the new norm in software development companies. In large-scale agile, meetings of different types are pivotal for collaboration, and decisions need to be taken on how they are organized and carried out in hybrid work. This study investigates how recurring meetings are organized and carried out in hybrid work in a large-scale agile environment. We performed a single case study by conducting 27 semi-structured interviews with members of 15 agile teams, product owners, managers, and specialists from two units of Ericsson, a multinational telecommunications company with a "2 days per week at the office" policy. A key insight from this study is that different types of meetings in agile software development should be primarily organized onsite or remotely based on the meeting intent, i.e., meetings requiring active discussion or brainstorming, such as retrospectives or technical discussions, benefit from onsite attendance, whereas large information sharing meetings work well remotely. In hybrid work, community meetings can contribute to knowledge sharing within organizations, help strengthen social ties, and prevent siloed collaboration. Additionally, the use of cameras is recommended for small discussion-oriented remote and hybrid meetings.
Sentiment Analysis Tools in Software Engineering: A Systematic Mapping Study
Martin Obaidi, Lukas Nagel, Alexander Specht
et al.
Software development is a collaborative task. Previous research has shown social aspects within development teams to be highly relevant for the success of software projects. A team's mood has been proven to be particularly important. It is paramount for project managers to be aware of negative moods within their teams, as such awareness enables them to intervene. Sentiment analysis tools offer a way to determine the mood of a team based on textual communication. We aim to help developers or stakeholders in their choice of sentiment analysis tools for their specific purpose. Therefore, we conducted a systematic mapping study (SMS). We present the results of our SMS of sentiment analysis tools developed for or applied in the context of software engineering (SE). Our results summarize insights from 106 papers with respect to (1) the application domain, (2) the purpose, (3) the used data sets, (4) the approaches for developing sentiment analysis tools, (5) the usage of already existing tools, and (6) the difficulties researchers face. We also analyzed in more detail how the individual tools and approaches perform. According to our results, sentiment analysis is frequently applied to open-source software projects, and most approaches are neural networks or support-vector machines. The best-performing approach in our analysis is neural networks, and the best tool is BERT. Despite the frequent use of sentiment analysis in SE, there are open issues, e.g., regarding the identification of irony or sarcasm, pointing to future research directions. We conducted this SMS to gain an overview of the current state of sentiment analysis in order to help developers or stakeholders in this matter. Our results include interesting findings, e.g., on the used tools and their difficulties. We present several suggestions on how to solve these identified problems.
Context Engineering for AI Agents in Open-Source Software
Seyedmoein Mohsenimofidi, Matthias Galster, Christoph Treude
et al.
GenAI-based coding assistants have disrupted software development. The next generation of these tools is agent-based, operating with more autonomy and potentially without human oversight. Like human developers, AI agents require contextual information to develop solutions that are in line with the standards, policies, and workflows of the software projects they operate in. Vendors of popular agentic tools (e.g., Claude Code) recommend maintaining version-controlled Markdown files that describe aspects such as the project structure, code style, or building and testing. The content of these files is then automatically added to each prompt. Recently, AGENTS.md has emerged as a potential standard that consolidates existing tool-specific formats. However, little is known about whether and how developers adopt this format. Therefore, in this paper, we present the results of a preliminary study investigating the adoption of AI context files in 466 open-source software projects. We analyze the information that developers provide in AGENTS.md files, how they present that information, and how the files evolve over time. Our findings indicate that there is no established content structure yet and that there is a lot of variation in terms of how context is provided (descriptive, prescriptive, prohibitive, explanatory, conditional). Our commit-level analysis provides first insights into the evolution of the provided context. AI context files provide a unique opportunity to study real-world context engineering. In particular, we see great potential in studying which structural or presentational modifications can positively affect the quality of the generated content.
Software Testing Education and Industry Needs - Report from the ENACTEST EU Project
Mehrdad Saadatmand, Abbas Khan, Beatriz Marin
et al.
The evolving landscape of software development demands that software testers continuously adapt to new tools, practices, and acquire new skills. This study investigates software testing competency needs in industry, identifies knowledge gaps in current testing education, and highlights competencies and gaps not addressed in academic literature. This is done by conducting two focus group sessions and interviews with professionals across diverse domains, including railway industry, healthcare, and software consulting and performing a curated small-scale scoping review. The study instrument, co-designed by members of the ENACTEST project consortium, was developed collaboratively and refined through multiple iterations to ensure comprehensive coverage of industry needs and educational gaps. In particular, by performing a thematic qualitative analysis, we report our findings and observations regarding: professional training methods, challenges in offering training in industry, different ways of evaluating the quality of training, identified knowledge gaps with respect to academic education and industry needs, future needs and trends in testing education, and knowledge transfer methods within companies. Finally, the scoping review results confirm knowledge gaps in areas such as AI testing, security testing and soft skills.
An Automated Fingerprint Image Detection and Localization Approach based on Unsupervised Learning Algorithms using Low-quality Biometric Palm Data
Abdulrasool Jadaan Abed, Dhahir Abdulhadi Abdullah
In this study, fingerprint identification and classification of low-quality fingerprints are analyzed. As technology advances and methodologies evolve, staying at the forefront of research and innovation is imperative. The challenges addressed in this paper provide a foundation for future investigations and underscore the importance of developing resilient and adaptable biometric systems for real-world applications. The quest for accurate, efficient, and robust fingerprint identification in adverse conditions is a testament to the continuous evolution and refinement of machine learning and deep learning approaches in biometrics. While deep learning models exhibited improved performance, it is essential to acknowledge the need for further research and development in this domain. Additionally, integrating multimodal biometric systems and combining fingerprint data with other biometric modalities might present a viable avenue for mitigating the limitations associated with degraded fingerprints. In this paper, we develop a fingerprint identification approach for low-quality fingerprint images. The success-rate accuracy of the proposed algorithm on low-quality fingerprint images is significantly better than that of the standard local minutia approach. The main design of our deep learning approach is based on detecting and extracting the primary correlation during training and using the correlation feature map to calculate the distance between low-quality fingerprint images during the prediction phase. The experimental results are very promising, showing high prediction accuracy.
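The prediction phase described above compares correlation feature maps via a distance measure. As a hedged stand-in for the paper's (unspecified) metric, a correlation-based distance between two flattened feature vectors can be sketched as:

```python
import math

def correlation_distance(u, v):
    """1 - Pearson correlation between two feature vectors:
    0 for identical patterns, larger for dissimilar ones."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return 1.0 - num / den
```

A correlation-based measure is attractive for degraded prints because it compares overall ridge patterns rather than individual minutiae points, which are exactly what low-quality images tend to lose.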
〈qo|op〉: A quantum object optimizer
Vu Tuan Hai, Nguyen Tan Viet, Le Bin Ho
The quantum object optimizer (〈qo|op〉) is a Python library that offers a framework for optimizing quantum circuits to represent quantum objects such as quantum states and Hamiltonians. The optimizer implements a quantum compilation process in which an ansatz is trained to compile the information from a given quantum object. The software generates shallow circuits that can be implemented on various quantum computers and reduces the time required for data processing and simulation. It also allows users to customize the algorithm for practical and proactive solutions to quantum circuit design. In this way, 〈qo|op〉 has the potential to significantly impact both scientific research and practical applications in quantum technology.
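The compilation loop described above trains an ansatz until it reproduces a target quantum object. A toy version compiles a single RY rotation against the target state |1⟩, using a grid search instead of gradients; this sketches the idea only and is not the library's API:

```python
import math

def best_ry_angle(steps=1000):
    """Find the RY angle whose output state best matches the target |1>.

    RY(theta)|0> = [cos(theta/2), sin(theta/2)], so the fidelity with |1>
    is sin^2(theta/2), maximized at theta = pi.
    """
    best_theta, best_fid = 0.0, -1.0
    for k in range(steps + 1):
        theta = 2.0 * math.pi * k / steps
        fid = math.sin(theta / 2.0) ** 2
        if fid > best_fid:
            best_theta, best_fid = theta, fid
    return best_theta, best_fid
```

Real compilation replaces the one-gate ansatz with a shallow parameterized circuit and the grid search with a proper optimizer, but the objective, maximizing fidelity with the target object, is the same.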
Performance Analysis of C versus C++ in Multithreaded L-System String Production: A PBL Case Study
J. Jesús Arellano Pimentel, Guadalupe Toledo Toledo, Mario Andrés Basilio López
et al.
Object-oriented programming in C++ simplifies the coding of algorithms compared with the structured paradigm of the C language, a fact that often raises a valid question among students: why code strings with dynamic memory in C when String objects in C++ avoid that work? Questions of this kind give rise to Problem-Based Learning (PBL) case studies. This article reports a comparison of the C and C++ languages through the multithreaded generation of L-System strings, using personal computers available to computer engineering students. One hundred runs were performed to compute average execution times for two types of L-System, considering processing with and without load balancing for two, four, and eight threads. The results show higher execution speed for the C language, but interesting differences in speed-up with respect to C++. We conclude that the best efficiency is achieved by parallelizing with multiple threads, provided that the data volume is considerable and well balanced, and that the number of threads does not exceed the number of cores. It is well worth having students reach these conclusions through discovery learning via a case study.
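The L-System strings whose generation the study parallelizes are produced by rewriting every symbol of the current string according to production rules. A minimal sequential sketch in Python (the article's point is precisely the multithreaded C and C++ versions, which are not reproduced here):

```python
def lsystem(axiom, rules, iterations):
    """Rewrite every symbol of the current string according to `rules`;
    symbols without a rule are copied unchanged."""
    s = axiom
    for _ in range(iterations):
        s = "".join(rules.get(ch, ch) for ch in s)
    return s

# Lindenmayer's classic algae system: A -> AB, B -> A.
algae = {"A": "AB", "B": "A"}
```

String length grows roughly geometrically with each iteration (Fibonacci-like for the algae system), which is why the data volume quickly becomes large enough for load balancing across threads to matter.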
An Approach for Auto Generation of Labeling Functions for Software Engineering Chatbots
Ebube Alor, Ahmad Abdellatif, SayedHassan Khatoonabadi
et al.
Software engineering (SE) chatbots are increasingly gaining attention for their role in enhancing development processes. At the core of chatbots are Natural Language Understanding platforms (NLUs), which enable them to comprehend user queries but require labeled data for training. However, acquiring such labeled data for SE chatbots is challenging due to the scarcity of high-quality datasets, as training requires specialized vocabulary and phrases not found in typical language datasets. Consequently, developers often resort to manually annotating user queries -- a time-consuming and resource-intensive process. Previous approaches require human intervention to generate rules, called labeling functions (LFs), that categorize queries based on specific patterns. To address this issue, we propose an approach to automatically generate LFs by extracting patterns from labeled user queries. We evaluate our approach on four SE datasets and measure performance improvement from training NLUs on queries labeled by the generated LFs. The generated LFs effectively label data with AUC scores up to 85.3% and NLU performance improvements up to 27.2%. Furthermore, our results show that the number of LFs affects labeling performance. We believe that our approach can save time and resources in labeling users' queries, allowing practitioners to focus on core chatbot functionalities rather than manually labeling queries.
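The approach above extracts patterns from labeled queries to generate labeling functions automatically. A deliberately simplified sketch uses one discriminative keyword per intent; the logic is hypothetical, and the paper's pattern extraction is more sophisticated:

```python
from collections import defaultdict

def make_labeling_function(labeled_queries):
    """Build a keyword-based labeling function from labeled queries:
    any token that only ever co-occurs with one label becomes a trigger."""
    token_labels = defaultdict(set)
    for query, label in labeled_queries:
        for token in query.lower().split():
            token_labels[token].add(label)
    # Keep only unambiguous tokens as trigger keywords.
    triggers = {t: next(iter(ls)) for t, ls in token_labels.items() if len(ls) == 1}

    def labeling_function(query):
        for token in query.lower().split():
            if token in triggers:
                return triggers[token]
        return None  # abstain, as labeling functions typically do

    return labeling_function
```

Generated functions like this can then label large pools of unlabeled queries, and the (noisy) labels are used to train the NLU, with abstention keeping ambiguous queries out of the training set.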
What Is an App Store? The Software Engineering Perspective
Wenhan Zhu, Sebastian Proksch, Daniel M. German
et al.
"App stores" are online software stores where end users may browse, purchase, download, and install software applications. By far, the best known app stores are associated with mobile platforms, such as Google Play for Android and Apple's App Store for iOS. The ubiquity of smartphones has led to mobile app stores becoming a touchstone experience of modern living. However, most app store research has concentrated on properties of the apps rather than the stores themselves. Today, there is a rich diversity of app stores, and these stores have largely been overlooked by researchers: app stores exist on many distinctive platforms, are aimed at different classes of users, and have different end-goals beyond simply selling a standalone app to a smartphone user. We survey and characterize the broader dimensionality of app stores, and explore how and why they influence software development practices, such as system design and release management. We begin by collecting a set of app store examples from web search queries. By analyzing and curating the results, we derive a set of features common to app stores. We then build a dimensional model of app stores based on these features, and we fit each app store from our web search result set into this model. Next, we applied unsupervised clustering to the app stores to find their natural groupings. Our results suggest that app stores have become an essential stakeholder in modern software development. They control the distribution channel to end users and ensure that the applications are of suitable quality; in turn, this leads to developers adhering to various store guidelines when creating their applications. However, we found that app stores' operational models can vary widely between stores, and this variability could in turn affect the generalizability of existing understanding of app stores.
A Malware Propagation Model with Dual Delay in the Industrial Control Network
Wei Yang, Qiang Fu, Yu Yao
Malware attacks targeting industrial control networks are gradually increasing, and nonlinear phenomena make it difficult to predict the propagation behavior of malware. Once the dynamic system becomes unstable, the propagation of malware will be out of control, which will seriously threaten the security of the industrial control network. It is therefore necessary to model and study the propagation of malware in the industrial control network. In this paper, a SIDQR model with dual delay is proposed by fully considering the characteristics of the industrial control network. By analyzing the nonlinear dynamics of the model, the Hopf bifurcation is discussed in detail when both delay values are greater than zero, and the expression for the threshold is also provided. The results of the experiments indicate that the system may have multiple bifurcation points. By comparing different immune and quarantine rates, it is found that the immune rate can be appropriately increased and the quarantine rate can be appropriately reduced in the industrial control network, which can suppress the spread of malware without interrupting industrial production.
Electronic computers. Computer science
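The role of delay in such propagation models can be illustrated with a discrete-time sketch in which recovery acts on the infection level several steps in the past. This is a single-delay SIR-style stand-in, not the paper's dual-delay SIDQR model, and the parameter names are illustrative:

```python
def simulate(beta, gamma, delay, steps, s0=0.99, i0=0.01):
    """Discrete-time susceptible/infected/recovered sketch with a delayed
    recovery term: recoveries at step t depend on I at step t - delay."""
    S, I, R = [s0], [i0], [0.0]
    for t in range(steps):
        i_past = I[t - delay] if t >= delay else 0.0
        new_inf = beta * S[t] * I[t]
        new_rec = min(gamma * i_past, I[t] + new_inf)  # cannot exceed the infected pool
        S.append(S[t] - new_inf)
        I.append(I[t] + new_inf - new_rec)
        R.append(R[t] + new_rec)
    return S, I, R
```

Increasing the delay postpones recoveries and lets the infected fraction overshoot; it is this kind of delay-induced nonlinear behavior whose stability the paper analyzes via Hopf bifurcation.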
Security management of BYOD and cloud environment in Saudi Arabia
Khalid Almarhabi, Adel Bahaddad, Ahmed Mohammed Alghamdi
The increasing trend of Bring Your Own Device (BYOD) to work has led to a significant surge in the risks related to network security. This trend is very beneficial to employers and employees alike in any organisation. However, the wide infiltration of spyware, malware and similarly suspicious downloads into personal devices has forced the government to reconsider its policies regarding data security. Malicious programs get downloaded onto personal devices without the user even realising, which could disastrously affect both individuals and governments. In such an event, BYODs become risky, as they can enable unauthorised policy changes and leak sensitive data into the public domain. This type of privacy breach leads to a domino effect with major legal and financial implications, and decreased productivity for organisations and governments. This presents a huge challenge, as governments have to consider user rights and privacy laws while also protecting their networks from these attacks. In this study, the researchers propose a novel technical framework that could assist the Saudi government. This framework was designed after determining the challenges faced by the government, based on citizen perspectives, to control all risks challenging the use of BYODs. The framework decreases the number of system restrictions and enforces access control policies for BYODs and cloud environments. The preliminary results of this study were positive and indicated that the framework could decrease the problems related to access control.
Engineering (General). Civil engineering (General)
Improving Semantic Information Retrieval Using Multinomial Naive Bayes Classifier and Bayesian Networks
Wiem Chebil, Mohammad Wedyan, Moutaz Alazab
et al.
This research proposes a new approach to improve information retrieval systems based on a multinomial naive Bayes classifier (MNBC), Bayesian networks (BNs), and a multi-terminology which includes the MeSH thesaurus (Medical Subject Headings) and SNOMED CT (Systematized Nomenclature of Medicine of Clinical Terms). Our approach, entitled improving semantic information retrieval (IMSIR), extracts and disambiguates concepts and retrieves documents. Relevant concepts of ambiguous terms were selected using probability measures and biomedical terminologies. Concepts are also extracted using an MNBC. The UMLS (Unified Medical Language System) thesaurus was then used to filter and rank concepts. Finally, we exploited a Bayesian network to match documents and queries using a conceptual representation. Our main contribution in this paper is to combine a supervised method (MNBC) and an unsupervised method (BN) to extract concepts from documents and queries. We also propose filtering the extracted concepts in order to keep relevant ones. Experiments with IMSIR on two corpora, the OHSUMED corpus and the Clinical Trial (CT) corpus, yielded encouraging results that outperformed the baseline: the P@50 improvement rate was +36.5% over the baseline when the CT corpus was used.
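The MNBC component above can be illustrated with a from-scratch multinomial naive Bayes classifier with Laplace smoothing. The class below is a generic textbook sketch on toy documents, not the paper's UMLS-backed pipeline:

```python
import math
from collections import Counter

class TinyMultinomialNB:
    """Minimal multinomial naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = {c: labels.count(c) / len(labels) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for doc, c in zip(docs, labels):
            for tok in doc.split():
                self.counts[c][tok] += 1
                self.vocab.add(tok)
        self.total = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        V = len(self.vocab)
        def log_posterior(c):
            lp = math.log(self.prior[c])
            for tok in doc.split():
                lp += math.log((self.counts[c][tok] + 1) / (self.total[c] + V))
            return lp
        return max(self.classes, key=log_posterior)
```

In the paper's setting the "documents" are biomedical texts and the classes are candidate concepts; the smoothing keeps unseen terms from zeroing out an otherwise good match.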
Software Reconfiguration in Robotics
Sven Peldszus, Davide Brugali, Daniel Strüber
et al.
Robots often need to be reconfigurable, e.g., to customize, calibrate, or optimize robots operating in varying environments with different hardware. A particular challenge in robotics is the automated and dynamic reconfiguration to load and unload software components, as well as to parameterize them. Over the last decades, a large variety of software reconfiguration techniques has been presented in the literature, many specifically for robotics systems. Many robotics frameworks also support reconfiguration. Unfortunately, there is a lack of empirical data on the actual use of reconfiguration techniques in real robotics projects and on their realization in robotics frameworks. To advance reconfiguration techniques and support their adoption, we need to improve our empirical understanding of them in practice. We present a study of automated reconfiguration at runtime in the robotics domain. We determine the state of the art by reviewing 78 relevant publications on reconfiguration. We determine the state of practice by analyzing how four major robotics frameworks support reconfiguration, and how reconfiguration is realized in 48 robotics (sub-)systems. We contribute a detailed analysis of the design space of reconfiguration techniques. We identify trends and research gaps. Our results show a significant discrepancy between the state of the art and the state of practice. While the scientific community focuses on complex structural reconfiguration, only parameter reconfiguration is widely used in practice. Our results help practitioners realize reconfiguration in robotics systems, and they support researchers and tool builders in creating more effective reconfiguration techniques that are adopted in practice.
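Since parameter reconfiguration is the variant most used in practice, here is a minimal sketch of the mechanism in the spirit of ROS-style dynamic parameters: components subscribe to changes, and setting a parameter at runtime notifies them. All names are illustrative, not any framework's actual API:

```python
class ParameterServer:
    """Tiny runtime parameter-reconfiguration sketch: components register
    callbacks and are notified whenever a parameter value changes."""

    def __init__(self, **initial):
        self._params = dict(initial)
        self._callbacks = []

    def on_change(self, callback):
        """Register a callback(name, value) fired on every change."""
        self._callbacks.append(callback)

    def set(self, name, value):
        """Update a parameter at runtime and notify subscribers."""
        old = self._params.get(name)
        self._params[name] = value
        if old != value:
            for cb in self._callbacks:
                cb(name, value)

    def get(self, name):
        return self._params[name]
```

Structural reconfiguration, by contrast, would load and unload whole components at runtime, which is the more complex capability the surveyed literature focuses on.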
Software Vulnerability Prediction Knowledge Transferring Between Programming Languages
Khadija Hanifi, Ramin F Fouladi, Basak Gencer Unsalver
et al.
Developing automated and smart software vulnerability detection models has been receiving great attention from both the research and development communities. One of the biggest challenges in this area is the lack of code samples for all the different programming languages. In this study, we address this issue by proposing a transfer learning technique to leverage available datasets and generate a model to detect common vulnerabilities in different programming languages. We use C source code samples to train a Convolutional Neural Network (CNN) model; then, we use Java source code samples to adapt and evaluate the learned model. We use code samples from two benchmark datasets: the NIST Software Assurance Reference Dataset (SARD) and the Draper VDISC dataset. The results show that the proposed model detects vulnerabilities in both C and Java code with an average recall of 72%. Additionally, we employ explainable AI to investigate how much each feature contributes to the knowledge transfer mechanisms between C and Java in the proposed model.
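The transfer idea, a feature representation learned on one language reused for another, can be caricatured without any deep learning machinery: share a language-agnostic feature extractor and refit only a tiny decision head on the new language. The keyword list and threshold rule below are purely illustrative stand-ins for the CNN features in the paper:

```python
def extract_features(code):
    """Language-agnostic token features shared by C and Java
    (the 'transferred' representation in this toy sketch)."""
    risky = ("strcpy", "gets", "exec", "eval", "system")
    tokens = code.lower().replace("(", " ").replace(")", " ").split()
    return [sum(tok.count(k) for tok in tokens) for k in risky]

def train_threshold(samples, labels):
    """Fit a single decision threshold on the summed risky-token count,
    a stand-in for retraining the classifier head on the new language."""
    scores = [sum(f) for f in map(extract_features, samples)]
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    thr = (min(pos) + max(neg)) / 2.0 if pos and neg else 0.5
    return lambda code: 1 if sum(extract_features(code)) > thr else 0
```

Here the "transferred" part is `extract_features`, which applies to both C and Java source; in the paper that role is played by the convolutional layers trained on C samples.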