Evaluation of repository-aware software engineering systems is often confounded by synthetic task design, prompt leakage, and temporal contamination between repository knowledge and future code changes. We present a time-consistent benchmark methodology that snapshots a repository at time T0, constructs repository-derived code knowledge using only artifacts available before T0, and evaluates on engineering tasks derived from pull requests merged in the future interval (T0, T1]. Each historical pull request is transformed into a natural-language task through an LLM-assisted prompt-generation pipeline, and the benchmark is formalized as a matched A/B comparison in which the same software engineering agent is evaluated with and without repository-derived code knowledge while all other variables are held constant. We also report a baseline characterization study on two open-source repositories, DragonFly and React, using three Claude-family models and four prompt granularities. Across both repositories, file-level F1 increases monotonically from minimal to guided prompts, reaching 0.8081 on DragonFly and 0.8078 on React for the strongest tested model. These results show that prompt construction is a first-order benchmark variable. More broadly, the benchmark highlights that temporal consistency and prompt control are core validity requirements for repository-aware software engineering evaluation.
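The file-level F1 reported above can be illustrated with a minimal sketch (the function name and set-based formulation are my own, matching the standard set-overlap F1): precision and recall are computed between the set of files the agent touched and the set of files changed in the ground-truth pull request.

```python
def file_level_f1(predicted_files, gold_files):
    """Set-overlap F1 between the files an agent modified and the files
    changed in the ground-truth pull request."""
    predicted, gold = set(predicted_files), set(gold_files)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, predicting {a, b} when the pull request changed {b, c} gives precision = recall = 0.5, hence F1 = 0.5.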
Geon-Bo Kim, Begona Aranguren-Barrado, Shamsuzzoha Basunia
et al.
Cryogenic microcalorimeters are state-of-the-art radiation detectors based on superconducting and quantum technologies. They can resolve complex X-ray and low-energy γ-ray spectra with ultra-high energy resolution on the order of 10 eV at 100 keV, enabling high-precision non-destructive assay (NDA) analysis of nuclear materials containing uranium, plutonium and other actinides. With significant technical advancements in microcalorimetry technology, microcalorimeters are now deployable to end-users such as the International Atomic Energy Agency (IAEA) for improved NDA. However, the accuracy of microcalorimetry analysis can be limited by nuclear data: in several cases, the current nuclear data, obtained with conventional radiation detector technologies, are not sufficient to support microcalorimetry analysis. To address the growing need for improved nuclear data in microcalorimetry, the U.S. Department of Energy Office of International Nuclear Safeguards hosted a workshop on Microcalorimetry and Nuclear Data (MiND) in June 2023. Microcalorimetry experts and users, nuclear structure evaluators and managers, and program sponsors attended the workshop with the main objective of identifying a roadmap for priority nuclear data, stakeholders, partnerships, and opportunities. This paper summarizes the outcomes of the MiND workshop, including the priority list of nuclear data for microcalorimetry and a multi-laboratory measurement campaign to improve such nuclear data.
In this paper, we introduce OWLAPY, a comprehensive Python framework for OWL ontology engineering. OWLAPY streamlines the creation, modification, and serialization of OWL 2 ontologies. It uniquely integrates native Python-based reasoners with support for external Java reasoners, offering flexibility for users. OWLAPY facilitates multiple implementations of core ontology components and provides robust conversion capabilities between OWL class expressions and formats such as Description Logics, Manchester Syntax, and SPARQL. It also allows users to define custom workflows to leverage large language models (LLMs) in ontology generation from natural language text. OWLAPY serves as a well-tested software framework for users seeking a flexible Python library for advanced ontology engineering, including those transitioning from Java-based environments. The project is publicly available on GitHub at https://github.com/dice-group/owlapy and on the Python Package Index (PyPI) at https://pypi.org/project/owlapy/, with over 50,000 downloads at the time of writing.
Engineering problems that apply machine learning often involve computationally intensive methods but rely on limited datasets. As engineering data evolves with new designs and constraints, models must incorporate new knowledge over time. However, high computational costs make retraining models from scratch infeasible. Continual learning (CL) offers a promising solution by enabling models to learn from sequential data while mitigating catastrophic forgetting, where a model forgets previously learned mappings. This work introduces CL to engineering design by benchmarking several CL methods on representative regression tasks. We apply these strategies to five engineering datasets and construct nine new engineering CL benchmarks to evaluate their ability to address forgetting and improve generalization. Preliminary results show that applying existing CL methods to these tasks improves performance over naive baselines. In particular, the Replay strategy achieved performance comparable to retraining in several benchmarks while reducing training time by nearly half, demonstrating its potential for real-world engineering workflows. The code and datasets used in this work will be available at: https://github.com/kmsamuel/cl-for-engineering-release.
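The Replay strategy highlighted above can be sketched in a few lines (class and method names are my own; continual-learning libraries provide full implementations): a bounded buffer retains samples from earlier tasks via reservoir sampling, and mini-batches drawn from it are mixed into training on each new task to mitigate forgetting.

```python
import random

class ReplayBuffer:
    """Bounded buffer of (x, y) regression pairs from past tasks,
    filled with reservoir sampling so every seen sample is equally
    likely to be retained."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []
        self.seen = 0

    def add(self, x, y):
        """Offer one sample; keep it with probability capacity/seen."""
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append((x, y))
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.data[j] = (x, y)

    def sample(self, k):
        """Draw a replay mini-batch to mix with the current task's batch."""
        return random.sample(self.data, min(k, len(self.data)))
```

During training on task t, each gradient step would combine a batch from task t's loader with `buffer.sample(k)`, so the model keeps rehearsing earlier input-output mappings.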
Large Language Models (LLMs) are increasingly integrated into various daily tasks in software engineering, such as coding and requirement elicitation. Despite their broad capabilities and constant use, some interactions can lead to unexpected challenges (e.g., hallucinations or verbose answers) and, in turn, trigger emotions that develop into frustration. Frustration can negatively impact engineers' productivity and well-being if it escalates into stress and burnout. In this paper, we assess the impact of LLM interactions on software engineers' emotional responses, specifically strains, and identify common causes of frustration when interacting with LLMs at work. Based on 62 survey responses from software engineers in industry and academia across various companies and universities, we found that a majority of our respondents experience frustration or other related emotions regardless of the nature of their work. Additionally, our results showed that frustration stemmed mainly from issues with correctness, and to a lesser extent from issues such as adaptability to context or to a specific format. While such issues may not cause frustration in general, artefacts that do not follow certain preferences, standards, or best practices can make the output unusable without extensive modification, causing frustration over time. In addition to the frustration triggers, our study offers guidelines to improve software engineers' experience, aiming to minimise long-term consequences for mental health.
Gianmario Voria, Giulia Sellitto, Carmine Ferrara
et al.
Machine learning's widespread adoption in decision-making processes raises concerns about fairness, particularly regarding the treatment of sensitive features and potential discrimination against minorities. The software engineering community has responded by developing fairness-oriented metrics, empirical studies, and approaches. However, there remains a gap in understanding and categorizing practices for engineering fairness throughout the machine learning lifecycle. This paper presents a novel catalog of practices for addressing fairness in machine learning derived from a systematic mapping study. The study identifies and categorizes 28 practices from existing literature, mapping them onto different stages of the machine learning lifecycle. From this catalog, the authors extract actionable items and implications for both researchers and practitioners in software engineering. This work aims to provide a comprehensive resource for integrating fairness considerations into the development and deployment of machine learning systems, enhancing their reliability, accountability, and credibility.
Considerable efforts have been made at the high school level to encourage girls to pursue software engineering careers and raise awareness about diversity within the field. Similarly, software companies have become more active in diversity and inclusion (D&I) topics, aiming to create more inclusive work environments. However, the way diversity and inclusion are approached within software engineering university education remains less clear. This study investigates the current state of D&I in software engineering education and faculties in Finland. An online survey (N=30) was conducted among Finnish software engineering university teachers to investigate which approaches and case examples of D&I are most commonly used by software engineering teachers in Finland. In addition, the study examined how software engineering teachers perceive the importance of D&I in their courses. As a result of the quantitative and thematic analysis, a framework is presented to identify attitudes, approaches, challenges, and pedagogical strategies when implementing D&I themes in software engineering education. This framework also offers a process for integrating D&I themes into the curriculum or at the faculty level. The findings of this study emphasize that there is a continuing need for diversity-aware education and training. The results underline the responsibility of universities to ensure that future professionals are equipped with the necessary skills and knowledge to promote D&I in the field of software engineering.
Logan Murphy, Torin Viger, Alessio Di Sandro
et al.
In critical software engineering, structured assurance cases (ACs) are used to demonstrate how key properties (e.g., safety, security) are supported by evidence artifacts (e.g., test results, proofs). ACs can also be studied as formal objects in themselves, such that formal methods can be used to establish their correctness. Creating rigorous ACs is particularly challenging in the context of software product lines (SPLs), wherein a family of related software products is engineered simultaneously. Since creating individual ACs for each product is infeasible, AC development must be lifted to the level of product lines. In this work, we propose PLACIDUS, a methodology for integrating formal methods and software product line engineering to develop provably correct ACs for SPLs. To provide rigorous foundations for PLACIDUS, we define a variability-aware AC language and formalize its semantics using the proof assistant Lean. We provide tool support for PLACIDUS as part of an Eclipse-based model management framework. Finally, we demonstrate the feasibility of PLACIDUS by developing an AC for a product line of medical devices.
Software engineering (SE) is a dynamic field that involves multiple phases, all of which are necessary to develop sustainable software systems. Machine learning (ML), a branch of artificial intelligence (AI), has drawn a lot of attention in recent years thanks to its ability to analyze massive volumes of data and extract useful patterns from them. Several studies have examined, categorised, and assessed the application of ML in SE processes, but a consolidated view of the field is still missing. We conducted a literature review of primary studies to address this gap, guided by research questions exploring the current state of the art in applying machine learning techniques to software engineering processes. The review identifies the key areas within software engineering where ML has been applied, including software quality assurance, software maintenance, software comprehension, and software documentation. It also highlights the specific ML techniques that have been leveraged in these domains, such as supervised learning, unsupervised learning, and deep learning. Keywords: machine learning, deep learning, software engineering, natural language processing, source code
Cognitive distraction and measurement noise are two distinct factors that significantly impact the performance of humans and engineering systems. Cognitive distraction occurs when an individual's attention is diverted from a task, while measurement noise refers to the random variation that can occur in system measurements. Although humans and engineering systems employ different methods to overcome these obstacles, the ultimate goal is to achieve optimal performance. An intriguing question arises: what are the similarities and differences between the use of the term "noise" in engineering and in cognitive psychology? Additionally, it is worthwhile to explore whether the human brain and engineering control systems use similar or different approaches to attenuate noise. While this article does not provide a definitive answer, it emphasizes the importance of addressing this question and encourages further investigation.
Nuclear data is critical for many modern applications, from stockpile stewardship to cutting-edge scientific research. Central to these pursuits is a robust pipeline for nuclear modeling as well as data assimilation and dissemination. We summarize a small portion of the ongoing nuclear data efforts at Los Alamos for medium-mass to heavy nuclei. We begin with an overview of the NEXUS framework and show how one of its modules can be used for model parameter optimization using Bayesian techniques. The mathematical framework affords the combination of different measured data in determining model parameters and their associated correlations. It also has the advantage of being able to quantify outliers in data. We exemplify the power of this procedure by highlighting the recently evaluated 239Pu cross section. We further showcase the success of our tools and pipeline by covering the insight gained from incorporating the latest nuclear modeling and data in astrophysical simulations as part of the Fission In R-process Elements (FIRE) collaboration.
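The full Bayesian machinery described above (parameter correlations, outlier quantification) is not reproduced here, but its simplest ingredient can be sketched as follows (function names are my own): an inverse-variance weighted combination of independent measurements of the same quantity, plus a reduced chi-square that flags mutually inconsistent data.

```python
def combine_measurements(values, sigmas):
    """Inverse-variance weighted mean and its 1-sigma uncertainty
    for independent measurements of the same quantity."""
    weights = [1.0 / s ** 2 for s in sigmas]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, total ** -0.5

def reduced_chi2(values, sigmas, mean):
    """Consistency of the data set with the combined mean; values much
    greater than 1 hint at outliers or underestimated uncertainties."""
    chi2 = sum(((v - mean) / s) ** 2 for v, s in zip(values, sigmas))
    return chi2 / (len(values) - 1)
```

A full evaluation generalizes this to correlated uncertainties and model parameters, but the weighting-plus-consistency-check structure is the same.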
Tavian Barnes, Ken Jen Lee, Cristina Tavares
et al.
The traditional path to a software engineering career involves a post-secondary diploma in Software Engineering, Computer Science, or a related field. However, many software engineers take a non-traditional path to their career, starting from other industries or fields of study. This paper proposes a study on barriers faced by software engineers with non-traditional educational and occupational backgrounds, and possible mitigation strategies for those barriers. We propose a two-stage methodology, consisting of an exploratory study followed by a validation study. The exploratory study will involve a grounded-theory-based qualitative analysis of relevant Reddit data to yield a framework around the barriers and possible mitigation strategies. These findings will then be validated using a survey in the validation study. Making software engineering more accessible to those with non-traditional backgrounds will not only bring about the benefits of functional diversity, but also serve as a method of filling labour shortages in the software engineering industry.
Context: Manually processing Software Requirement Specifications (SRS) is time-consuming for requirement analysts in software engineering. Researchers have been working on automatic approaches to ease this task. Most of the existing approaches require some intervention from an analyst or are challenging to use. Some automatic and semi-automatic approaches have been developed based on heuristic rules or machine learning algorithms. However, the existing approaches to UML generation face various constraints, such as restrictions on ambiguity, length or structure, anaphora, incompleteness, and atomicity of the input text, requirements for a domain ontology, etc. Objective: This study aims to better understand the effectiveness of existing systems and provide a conceptual framework with further improvement guidelines. Method: We performed a systematic literature review (SLR). We conducted our study selection in two phases and selected 70 papers. We conducted quantitative and qualitative analyses by manually extracting information, cross-checking, and validating our findings. Result: We described the existing approaches and revealed the issues observed in these works. We identified and clustered both the limitations and benefits of the selected articles. Conclusion: This research upholds the necessity of a common dataset and evaluation framework to extend the research consistently. It also describes the significance of the natural language processing obstacles researchers face. In addition, it charts a path forward for future research.
Sergio García, Daniel Strüber, Davide Brugali
et al.
Robots that support humans by performing useful tasks (a.k.a., service robots) are booming worldwide. In contrast to industrial robots, the development of service robots comes with severe software engineering challenges, since they require high levels of robustness and autonomy to operate in highly heterogeneous environments. As a domain with critical safety implications, service robotics faces a need for sound software development practices. In this paper, we present the first large-scale empirical study to assess the state of the art and practice of robotics software engineering. We conducted 18 semi-structured interviews with industrial practitioners working in 15 companies from 9 different countries and a survey with 156 respondents (from 26 countries) from the robotics domain. Our results provide a comprehensive picture of (i) the practices applied by robotics industrial and academic practitioners, including processes, paradigms, languages, tools, frameworks, and reuse practices, (ii) the distinguishing characteristics of robotics software engineering, and (iii) recurrent challenges usually faced, together with adopted solutions. The paper concludes by discussing observations, derived hypotheses, and proposed actions for researchers and practitioners.
For software development companies, one of the most important objectives is to identify and acquire talented software engineers in order to maintain a skilled team that can produce competitive products. Traditional approaches for finding talented young software engineers are mainly programming contests of various forms, which mostly test participants' programming skills. However, successful software engineering in practice requires a wider range of skills from team members, including analysis, design, programming, testing, communication, collaboration, and self-management. In this paper, we explore potential ways to identify talented software engineering students in a data-driven manner through an Agile Project Management (APM) platform. Through our proposed HASE online APM tool, we conducted a study involving 21 Scrum teams consisting of over 100 undergraduate software engineering students in multi-week coursework projects in 2014. During this study, students performed over 10,000 agile software development (ASD) activities logged by HASE. We demonstrate the possibility and potential of this new research direction, and discuss its implications for software engineering education and industry recruitment.
These are the proceedings of the 10th International Workshop on Formal Engineering approaches to Software Components and Architectures (FESCA). The workshop was held on March 23, 2013 in Rome (Italy) as a satellite event to the European Joint Conferences on Theory and Practice of Software (ETAPS'13). The aim of the FESCA workshop is to bring together both young and senior researchers from formal methods, software engineering, and industry who are interested in the development and application of formal modelling approaches, as well as associated analysis and reasoning techniques, with practical benefits for component-based software engineering. FESCA aims to address the open question of how formal methods can be applied effectively to these new contexts and challenges. It is interested in both the development and the application of formal methods in component-based development, and tries to cross-fertilize research and practice in the two areas.
Neutrinos produced during the collapse of a massive star are trapped in a nuclear medium (the proto-neutron star). Typical neutrino energies (10-100 MeV) are of the order of nuclear giant resonance energies. Hence, neutrino propagation is modified by the possibility of coherent scattering on nucleons. We have compared the predictions of different nuclear interaction models. It turns out that their main discrepancies are related to the density dependence of the k-effective mass as well as to the onset of instabilities as density increases. This last point has led us to a systematic study of instabilities of infinite matter with effective Skyrme-type interactions. We have shown that for such interactions there is always a critical density above which the system becomes unstable.
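The kind of instability analysis summarized above can be stated compactly. A standard (simplified) criterion, covering only the mechanical channel, requires the pressure of infinite matter to increase with density; Skyrme-type parametrizations generically violate such stability conditions (in the scalar or spin-isospin channels, whose Landau-parameter form is not reproduced here) beyond some critical density $\rho_c$:

```latex
% Mechanical stability of infinite matter (schematic):
% P(\rho) = \rho^2 \, \partial(E/A)/\partial\rho is the pressure.
\frac{\partial P}{\partial \rho}
  = \rho \,\frac{\partial^2}{\partial \rho^2}\!\left[\rho\,\frac{E}{A}(\rho)\right]
  \;\ge\; 0
  \qquad \text{(stable for } \rho < \rho_c\text{)}
```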