With the advancement of Agentic AI, researchers are increasingly leveraging autonomous agents to address challenges in software engineering (SE). However, the large language models (LLMs) that underpin these agents often function as black boxes, making it difficult to justify the superiority of Agentic AI approaches over baselines. Furthermore, missing information in the evaluation design description frequently renders the reproduction of results infeasible. To synthesize current evaluation practices for Agentic AI in SE, this study analyzes 18 papers on the topic, published or accepted by ICSE 2026, ICSE 2025, FSE 2025, ASE 2025, and ISSTA 2025. The analysis identifies prevailing approaches and their limitations in evaluating Agentic AI for SE, both in current research and potential future studies. To address these shortcomings, this position paper proposes a set of guidelines and recommendations designed to empower reproducible, explainable, and effective evaluations of Agentic AI in software engineering. In particular, we recommend that Agentic AI researchers make their Thought-Action-Result (TAR) trajectories and LLM interaction data, or summarized versions of these artifacts, publicly accessible. Doing so will enable subsequent studies to more effectively analyze the strengths and weaknesses of different Agentic AI approaches. To demonstrate the feasibility of such comparisons, we present a proof-of-concept case study that illustrates how TAR trajectories can support systematic analysis across approaches.
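As a purely illustrative sketch of what a publicly shareable TAR trajectory might look like, the Python snippet below serializes a sequence of Thought-Action-Result steps to JSON and computes a simple summary over it. The `TARStep` and `Trajectory` classes, their field names, the `action_counts` helper, and the example task are all hypothetical; none of them is taken from the analyzed papers or from the case study.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Dict, List


@dataclass
class TARStep:
    """One Thought-Action-Result step (hypothetical schema)."""
    thought: str   # the agent's stated reasoning before acting
    action: str    # the tool call or command the agent issued
    result: str    # the observation returned to the agent


@dataclass
class Trajectory:
    """A full agent run on one task (hypothetical schema)."""
    task_id: str
    agent: str
    steps: List[TARStep] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict recurses into nested dataclasses, so steps become plain dicts
        return json.dumps(asdict(self), indent=2)

    @staticmethod
    def from_json(text: str) -> "Trajectory":
        data = json.loads(text)
        steps = [TARStep(**s) for s in data["steps"]]
        return Trajectory(task_id=data["task_id"], agent=data["agent"], steps=steps)


def action_counts(trajectory: Trajectory) -> Dict[str, int]:
    """Count how often each action verb (first token of the action) occurs."""
    counts: Dict[str, int] = {}
    for step in trajectory.steps:
        tokens = step.action.split()
        verb = tokens[0] if tokens else "<empty>"
        counts[verb] = counts.get(verb, 0) + 1
    return counts


# Made-up example run, for illustration only
example = Trajectory(
    task_id="demo-task-1",
    agent="example-agent",
    steps=[
        TARStep("Locate the failing test.", "run pytest -q", "1 failed"),
        TARStep("Inspect the source file.", "open utils.py", "file shown"),
        TARStep("Re-run the tests after the fix.", "run pytest -q", "all passed"),
    ],
)
```

Publishing trajectories in a plain, lossless format of this kind would let later studies reload them (`Trajectory.from_json`) and compute comparable summaries, such as per-action counts, across different agents.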
Tanja E. J. Vos, Tijs van der Storm, Alexander Serebrenik
et al.
Software engineering is the invisible infrastructure of the digital age. Every breakthrough in artificial intelligence, quantum computing, photonics, and cybersecurity relies on advances in software engineering, yet the field is too often treated as a supportive digital component rather than as a strategic, enabling discipline. In policy frameworks, including major European programmes, software appears primarily as a building block within other technologies, while the scientific discipline of software engineering remains largely absent. This position paper argues that the long-term sustainability, dependability, and sovereignty of digital technologies depend on investment in software engineering research. It is a call to reclaim the identity of software engineering.
Mairieli Wessel, Daniel Feitosa, Sangeeth Kochanthara
Rising publication pressure and the routine use of generative AI tools are reshaping how software engineering research is produced, assessed, and taught. While these developments promise efficiency, they also raise concerns about skill degradation, responsibility, and trust in scholarly outputs. This vision paper employs Design Fiction as a methodological lens to examine how such concerns might materialise if current practices persist. Drawing on themes reported in a recent community survey, we construct a speculative artifact situated in a near-future research setting. The fiction is used as an analytical device rather than a forecast, enabling reflection on how automated assistance might impede domain-knowledge competence, verification, and mentoring practices. By presenting an intentionally unsettling scenario, the paper invites discussion on how the software engineering research community will define proficiency, allocate responsibility, and support learning in the future.
This research presents a methodology for assessing a contractor's adherence to environmental preservation during the construction phase. This goal is achieved by identifying the most prevalent factors. Based on a thorough literature review and expert review, a list of forty factors is presented, grouped into nine categories: Solid Waste Management; Water Management; Energy Efficiency; Pollutants Control; Traffic Management; Site Arrangement; Procurement; Awareness Leverage & Education; and Social Governance. First, a two-step questionnaire survey is conducted to review and assess the list of factors extracted from the literature review. Upon completion of the questionnaire survey, statistical analysis is applied, including the Analysis of Variance (ANOVA) test, Exploratory Factor Analysis (EFA), and Cronbach's alpha test for reliability, resulting in a final list of thirty-two factors after eight factors are eliminated. Using the weights calculated by the Best-Worst Method (BWM) and the assessment benchmarks set for each factor, an environmental scoring sheet is generated, along with overall scoring thresholds that indicate the holistic environmental performance grade. Finally, the application of the proposed methodology is demonstrated by applying the scoring tool in a case study.
Renewable energy sources, Environmental engineering
Jeanne Sanders, Joseph Mirabelli, Eileen Johnson
et al.
Background: Undergraduate engineering students experience high stress and exhibit help-seeking behaviors less often than their non-engineering peers. Developing a deeper, comprehensive understanding of their experiences is a critical step toward identifying potential changes to reduce their stress. Research identifying structural components that impact student stress can inform structural changes that decrease student stress and thus support engineering students’ mental health. Purpose/Hypothesis: We examined how narratives of engineering undergraduate experiences with stress highlight the relationship between control and identified hindrances. We then used these relationships to investigate underlying structural elements. Design/Method: We interviewed fourteen undergraduate engineering students at an R2 institution in the northeastern United States. To create narratives, we conducted a tri-fold process that consisted of thematic analysis, identification of key quotes, and arts-based memo analysis. These narratives were mapped onto the Job-Hindrance-Control-Support (JHCS) model to identify structural elements for potential change. Results: The resulting composite narratives of George and Maya presented compelling stories of students’ experiences with stress and social support that highlight underlying structural systems; the two students' sources of support differed. Identified structural elements impacting their experiences included their physical proximity to campus, financial resources, and support for both time management and social-emotional regulation. Conclusions: Undergraduate engineering students commonly experience high levels of stress, and recognition of the identified key structural elements, followed by informed, deliberate action, may be one way to support student mental health.
The paper entitled "Qualitative Methods in Empirical Studies of Software Engineering" by Carolyn Seaman was published in TSE in 1999. It has been chosen as one of the most influential papers from the third decade of TSE's 50-year history. In this retrospective, the authors discuss the evolution of the use of qualitative methods in software engineering research and the impact it has had on research and practice, and reflect on what is coming and deserves attention.
Systems engineering (SE) is evolving with the availability of generative artificial intelligence (AI) and the demand for a systems-of-systems perspective, formalized under the purview of mission engineering (ME) in the US Department of Defense. Formulating ME problems is challenging because they are open-ended exercises that involve translating ill-defined problems into well-defined ones that are amenable to engineering development. It remains to be seen to what extent AI could assist with problem formulation objectives. To that end, this paper explores the quality and consistency of multi-purpose Large Language Models (LLMs) in supporting ME problem formulation tasks, specifically focusing on stakeholder identification. We identify a relevant reference problem, a NASA space mission design challenge, and document ChatGPT-3.5's ability to perform stakeholder identification tasks. We execute multiple parallel attempts and qualitatively evaluate LLM outputs, focusing on both their quality and variability. Our findings portray a nuanced picture. We find that the LLM performs well in identifying human-focused stakeholders but poorly in recognizing external systems and environmental factors, despite explicit efforts to account for these. Additionally, LLMs struggle to preserve the desired level of abstraction and tend to produce solution-specific outputs that are inappropriate for problem formulation. More importantly, we document great variability among parallel threads, highlighting that LLM outputs should be used with caution, ideally by adopting a stochastic view of their abilities. Overall, our findings suggest that, while ChatGPT could reduce some expert workload, its lack of consistency and domain understanding may limit its reliability for problem formulation tasks.
Shavindra Wickramathilaka, John Grundy, Kashumi Madampe
et al.
The use of diverse mobile applications among senior users is becoming increasingly widespread. However, many of these apps contain accessibility problems that result in negative user experiences for seniors. A key reason is that software practitioners often lack the time or resources to address the broad spectrum of age-related accessibility and personalisation needs. As current developer tools and practices encourage one-size-fits-all interfaces with limited potential to address the diversity of senior needs, there is a growing demand for approaches that support the systematic creation of adaptive, accessible app experiences. To this end, we present AdaptForge, a novel model-driven engineering (MDE) approach that enables advanced design-time adaptations of mobile application interfaces and behaviours tailored to the accessibility needs of senior users. AdaptForge uses two domain-specific languages (DSLs) to address age-related accessibility needs. The first model defines users' context-of-use parameters, while the second defines conditional accessibility scenarios and corresponding UI adaptation rules. These rules are interpreted by an MDE workflow to transform an app's original source code into personalised instances. We also report evaluations with professional software developers and senior end-users, demonstrating the feasibility and practical utility of AdaptForge.
Large Language Models (LLMs) are revolutionizing software engineering (SE), with special emphasis on code generation and analysis. However, their applications to broader SE practices, including conceptualization, design, and other non-code tasks, remain underexplored. This research aims to augment the generality and performance of LLMs for SE by (1) advancing the understanding of how LLMs with different characteristics perform on various non-code tasks, (2) evaluating them as sources of foundational knowledge in SE, and (3) effectively detecting hallucinations in SE statements. The expected contributions include a variety of LLMs trained and evaluated on domain-specific datasets, new benchmarks on foundational knowledge in SE, and methods for detecting hallucinations. Initial results in terms of performance improvements on various non-code tasks are promising.
During the construction and operation stages of a shield tunnel, understanding the mechanical characteristics of the overall structure is essential for ensuring tunnel safety. In this paper, a shell-spring model of an ultra-high water pressure submarine shield tunnel is established using the ABAQUS finite element software. The mechanical behavior of the shield segment structure under varying water pressures, different key block positions, and different strata is studied and analyzed. The results show that, at the same segment assembly position, the axial force of the segment increases greatly with increasing water pressure, with a growth rate as high as 150%. The internal force of the segment is mainly axial force. Under ultra-high water pressure, the location of the maximum segment deformation is related to the position of the segment joint at the arch bottom. In the staggered assembly, the axial force fluctuation near the arch bottom is larger than in the straight assembly, and the peak bending moment and the maximum axial force occur near the invert. In addition, near the 120° position of the vault, the bending moment oscillates noticeably near the joint in strata with a small coefficient of soil reaction. The larger the coefficient of soil reaction, the more uniform the axial force of the whole segment. The research results provide theoretical insights for the optimized design of shield tunnel segments.
Architectural engineering. Structural engineering of buildings, Structural engineering (General)
Eduard C. Groen, Kazi Rezoanur Rahman, Nikita Narsinghani
et al.
The farming domain has seen a tremendous shift towards digital solutions. However, capturing farmers' requirements regarding Digital Farming (DF) technology remains a difficult task due to domain-specific challenges. Farmers form a diverse and international crowd of practitioners who use a common pool of agricultural products and services, which means we can consider the possibility of applying Crowd-based Requirements Engineering (CrowdRE) for DF: CrowdRE4DF. We found that online user feedback in this domain is limited, necessitating a way of capturing user feedback from farmers in situ. Our solution, the Farmers' Voice application, uses speech-to-text, Machine Learning (ML), and Web 2.0 technology. A preliminary evaluation with five farmers showed good technology acceptance, and accurate transcription and ML analysis even in noisy farm settings. Our findings help to drive the development of DF technology through in-situ requirements elicitation.
Explicit time integration schemes coupled with Galerkin discretizations of time-dependent partial differential equations require solving a linear system with the mass matrix at each time step. For applications in structural dynamics, the solution of the linear system is frequently approximated through so-called mass lumping, which consists in replacing the mass matrix by some diagonal approximation. Mass lumping has been widely used in engineering practice for decades and has a sound mathematical theory supporting it for finite element methods using the classical Lagrange basis. However, the theory for more general basis functions is still missing. Our paper partly addresses this shortcoming. Some special and practically relevant properties of lumped mass matrices are proved, and we discuss how these properties naturally extend to banded and Kronecker product matrices, whose structure allows linear systems to be solved very efficiently. Our theoretical results are applied to isogeometric discretizations but are not restricted to them.
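To make the idea of a diagonal approximation concrete, the sketch below applies row-sum lumping, one common lumping choice for the classical Lagrange basis, to the consistent mass matrix of one-dimensional linear finite elements on a uniform mesh. It is an illustration under those assumptions only and is not necessarily the scheme analyzed in the paper; the function names are hypothetical, and exact rational arithmetic is used so the entries can be checked by hand.

```python
from fractions import Fraction


def consistent_mass_matrix(n_elements, length=Fraction(1)):
    """Assemble the consistent mass matrix for 1D linear elements on [0, length]."""
    h = length / n_elements
    n_nodes = n_elements + 1
    M = [[Fraction(0)] * n_nodes for _ in range(n_nodes)]
    # Element mass matrix for linear Lagrange elements: (h/6) * [[2, 1], [1, 2]]
    local = [[2 * h / 6, h / 6], [h / 6, 2 * h / 6]]
    for e in range(n_elements):
        for a in range(2):
            for b in range(2):
                M[e + a][e + b] += local[a][b]
    return M


def row_sum_lump(M):
    """Return the diagonal of the row-sum lumped matrix (each row summed)."""
    return [sum(row) for row in M]


def solve_lumped(diag, rhs):
    """Solving with a diagonal matrix reduces to one division per unknown."""
    return [r / d for d, r in zip(diag, rhs)]


M = consistent_mass_matrix(4)   # 4 elements of size h = 1/4 on [0, 1]
lumped = row_sum_lump(M)        # h/2 at the two end nodes, h at interior nodes
```

Row-sum lumping preserves the total mass (the lumped entries sum to the domain length), and the linear solve required at each explicit time step collapses to the element-wise division in `solve_lumped`.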
Sanin Haverić, Maida Hadžić Omanović, Tamara Ćetković Pećar
et al.
Spontaneous chromosomal aberrations are structural or numerical changes of chromosomes that occur naturally, without exposure to external genotoxic factors. They are not inherited, occur randomly in the karyotype, and do not have direct clinical significance. However, they can affect genomic instability and disease predisposition. They can result from errors in DNA replication or repair processes and are typically observed in actively dividing cells. Spontaneous chromosomal aberrations may arise due to natural chromosomal instability and can be elevated in individuals exposed to mutagens. We analyzed the frequencies of spontaneous chromosomal aberrations in 137 individuals subjected to karyotype analysis at the Laboratory for Cytogenetics and Genotoxicology, University of Sarajevo – Institute for Genetic Engineering and Biotechnology, between 2008 and 2023. Whole blood samples were cultivated for 72 hours, with thymidine added at the 48th hour. Metaphases were arrested with colcemid 60 minutes before harvesting. GTG banding was performed, and slides were analyzed under 1000x magnification in accordance with An International System for Human Cytogenetic Nomenclature and the E.C.A. Cytogenetic Guidelines and Quality Assurance. Constitutionally aberrant karyotypes were found in 2.92% of the analyzed individuals, as were altered karyotypes considered normal chromosomal variants. In a total of 3092 analyzed metaphases, 20 spontaneous chromosomal aberrations were found in 13 individuals. This study contributes to the limited knowledge of the cytogenetic status of the Bosnian and Herzegovinian population. Further monitoring of the incidence of spontaneous chromosomal aberrations is recommended.
Background: Females are underrepresented in Science, Technology, Engineering, and Mathematics (STEM) fields all over the world. To encourage more girls to choose STEM majors and careers, it is critical to increase their interest in STEM careers. Many studies have investigated the factors that influence females' entry into STEM fields, but few have explored the gender differences in the relationships between these factors. Therefore, based on Social Cognitive Career Theory, this study explored the gender differences in the effects of environmental factors (school education, informal education, social support, and media) on high school students' interest in STEM careers through the mediating roles of STEM self-efficacy and STEM career perceptions. Results: A questionnaire survey was conducted among 1240 high school students in Hunan Province, China, and the results of the t-test, regression analysis, and structural equation model multi-group comparison showed the following. First, the scores of male students in all dimensions except STEM career perception were significantly higher than those of female students. Second, the environmental factor with the greatest effect on interest in STEM careers differed between male and female students. Finally, there were gender differences in the mediating roles of STEM self-efficacy and STEM career perceptions between environmental factors and interest in STEM careers.
Conclusions: This study revealed the influence mechanisms and gender differences in male and female students' interest in STEM careers in the context of Chinese Confucian culture. The conclusions are as follows: (1) male students' interest in STEM careers was significantly higher than that of female students; (2) the environmental factors that had the greatest effect on male and female students' interest in STEM careers were social support and media, respectively; and (3) environmental factors could affect male students' interest in STEM careers through the mediating roles of STEM self-efficacy and STEM career perception, while they could affect female students' interest in STEM careers through the mediating role of STEM self-efficacy. Finally, the mediating mechanisms of STEM self-efficacy and STEM career perception between environmental factors and interest in STEM careers, and the importance of STEM self-efficacy for female students, were discussed.
Rudrajit Choudhuri, Dylan Liu, Igor Steinmacher
et al.
Conversational Generative AI (convo-genAI) is revolutionizing Software Engineering (SE) as engineers and academics embrace this technology in their work. However, there is a gap in understanding the current potential and pitfalls of this technology, specifically in supporting students in SE tasks. In this work, we evaluate, through a between-subjects study (N=22), the effectiveness of ChatGPT, a convo-genAI platform, in assisting students in SE tasks. Our study did not find statistically significant differences in participants' productivity or self-efficacy when using ChatGPT as compared to traditional resources, but we found significantly increased frustration levels. Our study also revealed 5 distinct faults arising from violations of Human-AI interaction guidelines, which led to 7 different (negative) consequences for participants.