Abstract: Large language models (LLMs) are evolving into engines for scientific discovery, yet the assumption that biological understanding requires domain-specific pre-training remains unchallenged. Here, we report that general-purpose LLMs possess an emergent capability for biological structural discovery. First, we demonstrate that a small-scale GPT-2, fine-tuned solely on English paraphrasing, achieves ∼84% zero-shot accuracy in protein homology detection, where network-based interpretability confirms a deep structural isomorphism between human language and the language of life. Scaling to massive models (e.g., Qwen-3) reveals a phase transition, achieving near-perfect accuracy (∼100%) on standard tasks while maintaining 75% precision on specially constructed remote homology datasets. Chain-of-Thought interpretability reveals that these models transcend simple sequence alignment, leveraging implicit structural knowledge to perform reasoning akin to "mental folding." We formalize this cross-modal universality through the BioPAWS benchmark. Our work establishes a minimalist paradigm for AI for Science, showing that abstract logical structures distilled from human language constitute a powerful cognitive prior for decoding the complex syntax of biology.
Raul Almeida, Frederico Pereira, Dário Machado
et al.
As autonomous vehicles (AVs) become part of urban environments, pedestrian safety and interactions with these vehicles are critical to creating sustainable, walkable cities. Intuitive pedestrian-vehicle communication is essential not only for reducing crash risk but also for supporting policies that promote active mobility and efficient traffic flow. This study investigates pedestrian crossing behavior in a fully immersive virtual reality environment, building on previous work by the authors conducted in a CAVE-type simulator. Participants crossed between a conventional vehicle and an AV when they perceived it was safe. The analysis examines how external human–machine interfaces (eHMIs) influence crossing decisions, collisions, safety margins, and crossing initiation time (CIT) across different vehicle speeds and traffic gaps. Three hypotheses were tested regarding the effects of eHMIs on CIT, risk-taking behavior, and perceived safety. Results show that eHMIs significantly affect pedestrian decisions: participants delayed crossings when the eHMI indicated non-yielding behavior and initiated crossings earlier when yielding was signaled. Risk-taking behavior increased at higher vehicle speeds and shorter time gaps. Although perceived safety did not increase, behavioral results indicate reliance on visual cues. These findings underscore the importance of standardizing eHMIs to support pedestrian safety and sustainable urban mobility.
Shaoyan Shi,1 Xingxing Yu,2 Xuehai Ou,1 Changming Zheng,1 Fei Xie,1 Yansheng Huang3 1Department of Hand Surgery, Honghui Hospital, Xi’an Jiaotong University, Xi’an, Shaanxi, 710000, People’s Republic of China; 2Department of Laboratory Medicine, Xi’an Medical College, Xi’an, Shaanxi, 710000, People’s Republic of China; 3Department of Spine Surgery, Honghui Hospital, Xi’an Jiaotong University, Xi’an, Shaanxi, 710000, People’s Republic of China. Correspondence: Yansheng Huang, Email yshg1991@163.com. Abstract: Peripheral nerve injuries (PNIs) remain a major clinical challenge, with current surgical interventions often falling short of restoring full function. Nanoparticle (NP)-engineered platforms are emerging as transformative tools in peripheral nerve repair by enabling multimodal therapeutic delivery, spatiotemporal control of the microenvironment, and biomimetic structural support. In this review, we summarize recent advances in the design of inorganic, polymeric, and hybrid NPs that deliver neurotrophic factors, anti-inflammatory agents, and genetic material with high precision. Functionalization strategies, ranging from conductive and piezoelectric materials to antioxidant and immunomodulatory components, enable dynamic regulation of cellular behaviors critical for regeneration. Integration of NPs into next-generation scaffolds, including smart-responsive conduits and bioactive matrices, enhances axonal guidance and Schwann cell support. We further discuss preclinical outcomes demonstrating robust functional recovery and address translational barriers, including NP toxicity, scalable fabrication, and regulatory considerations. Finally, we outline future directions involving theranostic systems and AI-guided design for personalized nerve repair.
Collectively, NP-engineered systems represent a paradigm shift in peripheral nerve regeneration, offering a multifaceted approach that bridges material science, bioengineering, and clinical translation. Keywords: peripheral nerve injuries, nanoparticle, engineering, regeneration, translation
The IT industry provides supportive pathways such as returnship programs, coding boot camps, and buddy systems for women re-entering the workforce after a career break. Academia, however, offers limited opportunities to motivate women to return. We propose a diverse multicultural research project investigating the challenges faced by women with software engineering (SE) backgrounds re-entering academia or related research roles after a career break. Career disruptions due to pregnancy, immigration status, or lack of flexible work options can significantly impact women's career progress, creating barriers for returning as lecturers, professors, or senior researchers. Although many companies promote gender diversity policies, such measures are less prominent and often under-recognized within academic institutions. Our goal is to explore the specific challenges women encounter when re-entering academic roles compared to industry roles; to understand the institutional perspective, including a comparative analysis of existing policies and opportunities in different countries for women to return to the field; and finally, to provide recommendations that support transparent hiring practices. The research project will be carried out in multiple universities and in multiple countries to capture the diverse challenges and policies that vary by location.
Large Language Models (LLMs) are increasingly used in empirical software engineering (ESE) to automate or assist annotation tasks such as labeling commits, issues, and qualitative artifacts. Yet the reliability and reproducibility of such annotations remain underexplored. Existing studies often lack standardized measures for reliability, calibration, and drift, and frequently omit essential configuration details. We argue that LLM-based annotation should be treated as a measurement process rather than a purely automated activity. In this position paper, we outline the \textbf{Operationalization for LLM-based Annotation Framework (OLAF)}, a conceptual framework that organizes key constructs: \textit{reliability, calibration, drift, consensus, aggregation}, and \textit{transparency}. The paper aims to motivate methodological discussion and future empirical work toward more transparent and reproducible LLM-based annotation in software engineering research.
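Treating LLM-based annotation as a measurement process invites the standard apparatus of inter-rater reliability. As a minimal illustration of one such construct (this sketch is not part of OLAF itself), Cohen's kappa quantifies chance-corrected agreement between two annotation runs; the commit labels below are invented:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (or two LLM runs)."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed fraction of items on which the two runs agree
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each run's marginal label frequencies
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    return (observed - expected) / (1 - expected)

# Two hypothetical LLM runs labeling the same 10 commits as bugfix vs. feature
run1 = ["bug", "bug", "feat", "feat", "bug", "feat", "bug", "bug", "feat", "bug"]
run2 = ["bug", "bug", "feat", "bug", "bug", "feat", "bug", "feat", "feat", "bug"]
kappa = cohens_kappa(run1, run2)
```

Reporting kappa (or Krippendorff's alpha for more than two runs) alongside the raw labels is one concrete way to make an LLM annotation pipeline auditable.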
Hashini Gunatilake, John Grundy, Rashina Hoda
et al.
Empathy plays a critical role in software engineering (SE), influencing collaboration, communication, and user-centred design. Although SE research has increasingly recognised empathy as a key human aspect, there remains no validated instrument specifically designed to measure it within the unique socio-technical contexts of SE. Existing generic empathy scales, while well-established in psychology and healthcare, often rely on language, scenarios, and assumptions that are not meaningful or interpretable for software practitioners. These scales fail to account for the diverse, role-specific, and domain-bound expressions of empathy in SE, such as understanding a non-technical user's frustrations or another practitioner's technical constraints, which differ substantially from empathy in clinical or everyday contexts. To address this gap, we developed and validated two domain-specific empathy scales: EmpathiSEr-P, assessing empathy among practitioners, and EmpathiSEr-U, capturing practitioner empathy towards users. Grounded in a practitioner-informed conceptual framework, the scales encompass three dimensions of empathy: cognitive empathy, affective empathy, and empathic responses. We followed a rigorous, multi-phase methodology, including expert evaluation, cognitive interviews, and two practitioner surveys. The resulting instruments represent the first psychometrically validated empathy scales tailored to SE, offering researchers and practitioners a tool for assessing empathy and designing empathy-enhancing interventions in software teams and user interactions.
Xin Zhang, Lissette Iturburu, Juan Nicolas Villamizar
et al.
Structural drawings are widely used in many fields, e.g., mechanical engineering, civil engineering, etc. In civil engineering, structural drawings serve as the main communication tool between architects, engineers, and builders to avoid conflicts, act as legal documentation, and provide a reference for future maintenance or evaluation needs. They are often organized using key elements such as title/subtitle blocks, scales, plan views, elevation views, sections, and detailed sections, which are annotated with standardized symbols and line types for interpretation by engineers and contractors. Despite advances in software capabilities, the task of generating a structural drawing remains labor-intensive and time-consuming for structural engineers. Here we introduce a novel generative AI-based method for generating structural drawings employing a large language model (LLM) agent. The method incorporates a retrieval-augmented generation (RAG) technique using externally-sourced facts to enhance the accuracy and reliability of the language model. This method is capable of understanding varied natural language descriptions, processing these to extract necessary information, and generating code to produce the desired structural drawing in AutoCAD. The approach developed, demonstrated, and evaluated herein enables the efficient and direct conversion of a structural drawing's natural language description into an AutoCAD drawing. It significantly reduces the workload compared with the current manual drawing production process and simplifies the typical iterative process engineers follow when expressing design ideas.
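The retrieval step of such a RAG pipeline can be sketched with a toy bag-of-words retriever standing in for the embedding-based search an actual system would use; the drafting-convention "facts" and the query below are purely hypothetical:

```python
import math
from collections import Counter

def bow_cosine(a, b):
    """Cosine similarity between bag-of-words vectors of two strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * \
           math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

# Hypothetical external fact base (conventions the LLM should not hallucinate)
facts = [
    "Title blocks are placed in the lower-right corner of the sheet.",
    "Section cut symbols use a circle with an arrow indicating view direction.",
    "Beam elevation views are drawn at 1:50 scale with centerline line type.",
]

def retrieve(query, k=2):
    """Return the k facts most similar to the query, for prompt augmentation."""
    return sorted(facts, key=lambda f: bow_cosine(query, f), reverse=True)[:k]

query = "Draw the beam elevation view of girder B2 at 1:50 scale"
prompt = "Facts:\n" + "\n".join(retrieve(query)) + "\nTask: " + query
```

In a real system the retrieved facts would be prepended to the prompt sent to the LLM, which then emits AutoCAD scripting code grounded in those facts.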
Daniel R. Clarkson, Lawrence A. Bull, Chandula T. Wickramarachchi
et al.
Regression is a fundamental prediction task common in data-centric engineering applications that involves learning mappings between continuous variables. In many engineering applications (e.g.\ structural health monitoring), feature-label pairs used to learn such mappings are of limited availability which hinders the effectiveness of traditional supervised machine learning approaches. The current paper proposes a methodology for overcoming the issue of data scarcity by combining active learning with hierarchical Bayesian modelling. Active learning is an approach for preferentially acquiring feature-label pairs in a resource-efficient manner. In particular, the current work adopts a risk-informed approach that leverages contextual information associated with regression-based engineering decision-making tasks (e.g.\ inspection and maintenance). Hierarchical Bayesian modelling allows multiple related regression tasks to be learned over a population, capturing local and global effects. The information sharing facilitated by this modelling approach means that information acquired for one engineering system can improve predictive performance across the population. The proposed methodology is demonstrated using an experimental case study. Specifically, multiple regressions are performed over a population of machining tools, where the quantity of interest is the surface roughness of the workpieces. An inspection and maintenance decision process is defined using these regression tasks which is in turn used to construct the active-learning algorithm. The novel methodology proposed is benchmarked against an uninformed approach to label acquisition and independent modelling of the regression tasks. It is shown that the proposed approach has superior performance in terms of expected cost -- maintaining predictive performance while reducing the number of inspections required.
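A stripped-down sketch of the two ingredients, partial pooling across related regression tasks and uncertainty-driven label acquisition, assuming synthetic data and known variance hyperparameters (this is an illustration, not the paper's model):

```python
import numpy as np

rng = np.random.default_rng(0)

# A small "population" of related linear tasks (e.g. surface roughness vs. tool
# wear for several nominally identical machines); all numbers are synthetic.
true_pop_slope = 2.0
tasks = []
for n in (50, 3):                         # one data-rich task, one data-poor task
    slope = true_pop_slope + rng.normal(0.0, 0.3)
    x = rng.uniform(0.0, 1.0, n)
    y = slope * x + rng.normal(0.0, 0.5, n)
    tasks.append((x, y))

def ols_slope(x, y):
    return float(x @ y / (x @ x))

# Hierarchical (partial-pooling) shrinkage: pull each task's own slope estimate
# toward the population mean, more strongly when the task has little data.
ols = [ols_slope(x, y) for x, y in tasks]
pop_mean = float(np.mean(ols))
tau2, sigma2 = 0.3**2, 0.5**2             # between-task / noise variances (assumed known)
shrunk, weights = [], []
for (x, y), b in zip(tasks, ols):
    var_b = sigma2 / float(x @ x)         # sampling variance of the task's OLS slope
    w = tau2 / (tau2 + var_b)             # weight on the task's own data
    weights.append(w)
    shrunk.append(w * b + (1.0 - w) * pop_mean)

# Crude risk-informed acquisition: inspect the task whose slope is most
# uncertain, i.e. where a new label buys the largest reduction in decision risk.
next_task = int(np.argmax([sigma2 / float(x @ x) for x, y in tasks]))
```

The data-poor task borrows strength from the population (its estimate is shrunk toward the shared mean), and the acquisition rule directs the next inspection to where predictive uncertainty is greatest.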
The creation of a Software Requirements Specification (SRS) document is important for any software development project. Given the recent prowess of Large Language Models (LLMs) in answering natural language queries and generating sophisticated textual outputs, our study explores their capability to produce accurate, coherent, and structured drafts of these documents to accelerate the software development lifecycle. We assess the performance of GPT-4 and CodeLlama in drafting an SRS for a university club management system and compare it against human benchmarks using eight distinct criteria. Our results suggest that LLMs can match the output quality of an entry-level software engineer to generate an SRS, delivering complete and consistent drafts. We also evaluate the capabilities of LLMs to identify and rectify problems in a given requirements document. Our experiments indicate that GPT-4 is capable of identifying issues and giving constructive feedback for rectifying them, while CodeLlama's results for validation were not as encouraging. We repeated the generation exercise for four distinct use cases to study the time saved by employing LLMs for SRS generation. The experiment demonstrates that LLMs may facilitate a significant reduction in development time for entry-level software engineers. Hence, we conclude that the LLMs can be gainfully used by software engineers to increase productivity by saving time and effort in generating, validating and rectifying software requirements.
Around the world, many unreinforced masonry buildings have been constructed for different usages, such as schools. Studies have shown the seismic vulnerability of these buildings. Thus, nonlinear analysis and seismic assessment of these buildings and improving the retrofitting methods are necessary. One of the retrofitting methods for these buildings is the use of shear walls. In most seismic rehabilitation projects of masonry buildings, piles are used in the foundations of shear walls, and the foundations and piles account for the major retrofit project costs. In order to improve the accuracy of the seismic assessment of these buildings, this study investigates the effect of soil-structure interaction on their seismic behavior. To reduce retrofitting costs, shallow strip foundations were used for the new shear walls, with rocking, sliding, and settlement responses included in the model. It was shown that soil-structure interaction in masonry buildings retrofitted with squat concrete shear walls reduces the base shear and increases the maximum drift of the building. If this increase in lateral drift can be tolerated, it will considerably reduce the cost of retrofitting unreinforced masonry buildings.
Software engineering capabilities are increasingly important to the success of economic and political blocs. This paper analyzes the quantity and quality of software engineering research output originating from the US, Europe, and China over time. The results indicate that the quantity of research is increasing across the board, with Europe leading the field. Depending on the scope of the analysis, either the US or China comes in second. Regarding research quality, Europe appears to be lagging behind the other blocs, with China having caught up to, and even overtaken, the US over time.
Fluorescent molecules are versatile nanoscale emitters that enable detailed observations of biophysical processes with nanoscale resolution. Because they are well-approximated as electric dipoles, imaging systems can be designed to visualize their 3D positions and 3D orientations, so-called dipole-spread function (DSF) engineering, for 6D super-resolution single-molecule orientation-localization microscopy (SMOLM). We review fundamental image-formation theory for fluorescent dipoles, as well as how phase and polarization modulation can be used to change the image of a dipole emitter produced by a microscope, called its DSF. We describe several methods for designing these modulations for optimum performance, as well as compare recently developed techniques, including the double-helix, tetrapod, crescent, and DeepSTORM3D learned point-spread functions (PSFs), in addition to the tri-spot, vortex, pixOL, raPol, CHIDO, and MVR DSFs. We also cover common imaging system designs and techniques for implementing engineered DSFs. Finally, we discuss recent biological applications of 6D SMOLM and future challenges for pushing the capabilities and utility of the technology.
Bidirectional DC-DC converters allow power to be transferred in either direction between two electrical sources. These converters are increasingly employed in a variety of applications, including battery chargers and dischargers, energy storage devices, electric vehicle motor drives, aircraft power systems, telecom power supplies, and others, due to their ability to reverse the direction of power flow. One of the basic types of bidirectional DC-DC converters is the SEPIC-ZETA converter. In this paper, the structure of this converter is studied when MOSFET power switches are employed. In addition, an electro-thermal analysis based on the ambient temperature (between 25 °C and 40 °C) is carried out using two MOSFET models (UJ3C065080K3S and SCT50N120). The study shows the effects of utilizing different MOSFET models on power losses and thermal behavior. According to the simulation results, for the first model (UJ3C065080K3S) at T = 40 °C, the MOSFET junction temperature was 151.38 °C in forward mode and 158.5 °C in backward mode. For the second model (SCT50N120) at the same T = 40 °C, the junction temperature reached 130.6 °C in forward mode and 128.7 °C in backward mode. The bidirectional SEPIC-ZETA converter therefore performs better with the second MOSFET model (SCT50N120).
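The link between device losses and junction temperature can be illustrated with a simple electro-thermal fixed-point calculation; the current, on-resistance, switching loss, temperature coefficient, and thermal resistance below are illustrative placeholders, not datasheet values for the UJ3C065080K3S or SCT50N120:

```python
def junction_temp(i_rms, rds_on_25, alpha, p_sw, rth_ja, t_amb, iters=50):
    """Fixed-point estimate of MOSFET junction temperature.

    Conduction loss rises with temperature because R_DS(on) does, so we
    iterate T_j -> losses -> T_j until the loop settles.
    """
    tj = t_amb
    for _ in range(iters):
        rds_on = rds_on_25 * (1.0 + alpha * (tj - 25.0))  # linearised R_DS(on)(T)
        p_cond = i_rms**2 * rds_on                        # conduction loss
        tj = t_amb + (p_cond + p_sw) * rth_ja             # thermal network
    return tj

# Illustrative inputs only: 10 A rms, 80 mOhm at 25 C, +0.4 %/K drift,
# 3 W switching loss, Rth(j-a) = 6 K/W.
tj_40 = junction_temp(10.0, 0.080, 0.004, 3.0, 6.0, t_amb=40.0)
tj_25 = junction_temp(10.0, 0.080, 0.004, 3.0, 6.0, t_amb=25.0)
```

The same skeleton, fed with per-device datasheet parameters, reproduces the kind of ambient-temperature sensitivity study reported in the paper.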
Takat B. Rawal, Maï Zahran, Brittiny Dhital
et al.
BACKGROUND: Lignin, the second most abundant biopolymer on Earth, plays a major structural role in plants, conferring mechanical strength and regulating water conduction. Understanding the three-dimensional structure of lignin is important for fundamental reasons as well as for engineering plants towards lignin valorization. Lignin lacks a specific primary sequence, so most recent studies have focused on its average chemical composition. However, it remains unclear whether the 3D structure of lignin molecules depends on their sequence. METHODS: We performed all-atom molecular dynamics simulations of three S/G-lignin molecules with the same average composition but different sequences. RESULTS: A detailed statistical analysis of the radius of gyration and relative shape anisotropy reveals that the lignin sequence has no statistically significant effect on the three-dimensional structure. We found, however, that homopolymers of C-lignin with the same molecular weight have smaller radii of gyration than S/G-lignin. We attribute this to the lower hydroxyl content of C-lignin, which makes it more compact and rigid. CONCLUSIONS: The 3D structure of lignin is influenced by the overall content of monomeric units and interunit linkages, not by its precise primary sequence. GENERAL SIGNIFICANCE: Lignin is assumed not to have a well-defined primary structure. The results presented here demonstrate that there are no significant differences in the 3D structure of lignin molecules with the same average composition but different primary sequences.
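Both shape descriptors used in the analysis, the radius of gyration and the relative shape anisotropy (kappa^2), derive from the eigenvalues of the gyration tensor. A minimal sketch on toy coordinate sets (not the simulation trajectories):

```python
import numpy as np

def gyration_analysis(coords):
    """Radius of gyration and relative shape anisotropy from 3D coordinates."""
    r = np.asarray(coords, float)
    r = r - r.mean(axis=0)                 # centre (mass-unweighted for simplicity)
    S = r.T @ r / len(r)                   # gyration tensor
    lam = np.linalg.eigvalsh(S)            # principal moments, ascending
    rg = float(np.sqrt(lam.sum()))         # Rg^2 = trace of the tensor
    tr = lam.sum()
    # kappa^2 = 1 - 3 * (sum of pairwise eigenvalue products) / (trace)^2
    kappa2 = float(1.0 - 3.0 * (lam[0]*lam[1] + lam[1]*lam[2] + lam[0]*lam[2]) / tr**2)
    return rg, kappa2

# A straight "chain" is maximally anisotropic (kappa^2 = 1) ...
rod = [(x, 0.0, 0.0) for x in range(10)]
# ... while the vertices of a cube are isotropic (kappa^2 = 0).
cube = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
rg_rod, k_rod = gyration_analysis(rod)
rg_cube, k_cube = gyration_analysis(cube)
```

Applied per simulation frame and averaged, these two numbers give the compactness and shape statistics compared across sequences in the study.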
Demetrio Cristiani, C. Sbarufatti, F. Cadini
et al.
A key issue affecting the performance of every human-conceived engineering system is its degradation, fatigue crack growth being one of the major structural deterioration phenomena. Fatigue crack growth is usually modelled as a stochastic process: uncertainty sources lie both in item-to-item variability and in the variability of the physical degradation process itself. Fatigue crack growth deserves close attention, especially considering that condition-based maintenance methodologies are currently experiencing a major drive to increase their technology readiness level, requiring validated diagnostic and prognostic methodologies capable of operating online and in real time. In this regard, particle filters provide a consistent Bayesian framework, in which the posterior distribution of the system degradation state is recursively approximated based on a time-growing stream of observations measuring the system response, enabling, in general, increasingly informed lifetime estimates. However, the real-time operation capability of such methods is hindered by their computational power requirements, mainly due to the complexity of the structural models they rely upon. Within this work, a comprehensive particle filter framework is proposed, able to deal with fatigue crack growth uncertainty sources while simultaneously addressing the computational burden. The algorithm structure enables the diagnosis and prognosis of fatigue crack growth to be performed simultaneously, while the adoption of the augmented state formulation allows the method to address scenarios in which the actual degradation process fails to match the degradation model ruling the particle filter. Artificial neural network-based surrogate modelling is adopted at different stages and embedded within the particle filter algorithm, relieving the computational burden associated with the evaluation of the trajectory likelihoods and enabling a fast estimation of the remaining useful life.
Both simulated and experimental data sets of fatigue crack growth in an aluminium aeronautical panel are used to test the algorithm, and its validity and effectiveness are demonstrated by means of common prognostic performance metrics.
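The augmented-state particle filter at the core of such a framework can be sketched on a synthetic Paris-law crack-growth problem; the loading, noise levels, and priors below are invented for illustration, and the surrogate-modelling stage is omitted:

```python
import numpy as np

rng = np.random.default_rng(1)

def paris_step(a, C, m, dsigma=100.0, dN=1000.0):
    """One load block of Paris-law growth, da/dN = C*(DeltaK)^m,
    with DeltaK = dsigma*sqrt(pi*a) (geometry factor taken as 1)."""
    dK = dsigma * np.sqrt(np.pi * a)
    return a + C * dK**m * dN

# Synthetic "truth" and noisy crack-length observations (values illustrative only)
C_true, m_true = 1e-10, 2.2
a_true = [1.0]
for _ in range(30):
    a_true.append(paris_step(a_true[-1], C_true, m_true))
obs = np.array(a_true[1:]) * np.exp(rng.normal(0.0, 0.02, 30))

# Augmented-state particle filter: each particle carries (crack length, log10 C),
# so the poorly known Paris coefficient is estimated jointly with the state.
n = 2000
a = np.full(n, 1.0)
logC = rng.uniform(-10.5, -9.5, n)
for z in obs:
    a = paris_step(a, 10.0**logC, m_true) * np.exp(rng.normal(0.0, 0.01, n))
    w = np.exp(-0.5 * ((np.log(z) - np.log(a)) / 0.02)**2)  # log-normal likelihood
    w /= w.sum()
    idx = rng.choice(n, size=n, p=w)                        # multinomial resampling
    a, logC = a[idx], logC[idx] + rng.normal(0.0, 0.01, n)  # jitter against collapse

a_est, C_est = float(np.mean(a)), float(10.0**np.mean(logC))
```

Prognosis then follows by propagating the surviving particles forward with the Paris model until a critical crack length, yielding a remaining-useful-life distribution.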
Mohammed Bentahar, Habib Benzaama, Mahmoudi Noureddin
The objective of this work is to study the effects of contact parameters on the cracking parameters of a specimen-and-pad assembly. These parameters were evaluated with a two-dimensional finite element fretting fatigue model implemented in the Abaqus code. Friction coefficients of 0.1, 0.3 and 0.6 were applied for contact lengths of a = 0.1, 0.5 and 1 mm, and for crack orientation angles of 15°, 30° and 45°. CPE4R elements and the maximum tangential stress criterion were used. The curves of the crack parameters, namely the stress intensity factors (SIFs) and the J-integral, were obtained and discussed.
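For context, the maximum tangential stress criterion admits a closed-form crack kink angle in terms of the mixed-mode stress intensity factors; the sketch below is an analytical illustration of that criterion, not the paper's finite element computation:

```python
import math

def mts_kink_angle(k1, k2):
    """Crack kink angle (radians) from the maximum tangential stress criterion:
    theta = 2*atan((K_I - sqrt(K_I^2 + 8*K_II^2)) / (4*K_II))."""
    if k2 == 0.0:
        return 0.0                       # pure mode I: crack grows straight ahead
    return 2.0 * math.atan((k1 - math.sqrt(k1**2 + 8.0 * k2**2)) / (4.0 * k2))

def sif_mode1(sigma, a):
    """Mode-I SIF for a centre crack in an infinite plate: K_I = sigma*sqrt(pi*a),
    with sigma in MPa and a in m (geometry factor taken as 1)."""
    return sigma * math.sqrt(math.pi * a)
```

Pure mode II loading gives the classical kink angle of about -70.5 degrees, and the SIFs extracted from the finite element model would be substituted for the closed-form values here.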
Mechanical engineering and machinery, Structural engineering (General)
Motivated by the quadratic dependence of peak structural displacements on the pulse period, $T_p$, of pulse-like ground motions, this paper revisits the $T_p$--$M_\text{W}$ relations of ground motions generated from near-source earthquakes with epicentral distances $D\leq$ 20 km. A total of 1260 ground motions are interrogated with wavelet analysis to identify energetic acceleration pulses (not velocity pulses) and extract their optimal period, $T_p$, amplitude, $a_p$, phase, $\varphi$, and number of half-cycles, $\gamma$. The interrogation of acceleration records with wavelet analysis is capable of extracting shorter-duration distinguishable pulses with engineering significance, which override the longer near-source pulses. Our wavelet analysis identified 109 pulse-like records from normal faults, 188 records from reverse faults and 125 records from strike-slip faults, all with epicentral distances $D\leq$ 20 km. Regression analysis on the extracted data concluded that the same $T_p$--$M_\text{W}$ relation can be used for pulse-like ground motions generated either from strike-slip faults or from normal faults, whereas a different $T_p$--$M_\text{W}$ relation is proposed for reverse faults. The study concludes that, for the same moment magnitude, $M_\text{W}$, the pulse periods of ground motions generated from strike-slip faults are on average larger than those from reverse faults. Most importantly, our wavelet analysis on acceleration records produces $T_p$--$M_\text{W}$ relations with a lower slope than the slopes of the $T_p$--$M_\text{W}$ relations presented by past investigators after merely fitting velocity pulses. As a result, our proposed $T_p$--$M_\text{W}$ relations yield lower $T_p$ values for larger-magnitude earthquakes (say $M_\text{W}>$ 6), allowing the estimation of dependable peak structural displacements that scale invariably with $a_pT_p^2$.
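The wavelet interrogation idea, matching templates of varying period against an acceleration record and keeping the best-fitting scale, can be sketched with a Ricker wavelet; the synthetic record and the scale-to-period relation below are illustrative choices, not the study's wavelet basis:

```python
import numpy as np

def ricker(t, tp):
    """Ricker (Mexican-hat) wavelet whose dominant period is approximately tp."""
    s = tp / (np.pi * np.sqrt(2.0))      # scale giving a spectral peak at period tp
    x = t / s
    return (1.0 - x**2) * np.exp(-x**2 / 2.0)

def extract_pulse_period(acc, dt, periods):
    """Return the trial period whose unit-energy wavelet best matches the record."""
    t = (np.arange(len(acc)) - len(acc) // 2) * dt
    best_tp, best_score = None, -np.inf
    for tp in periods:
        w = ricker(t, tp)
        w /= np.sqrt(np.sum(w**2))       # normalise so scores are comparable
        score = np.max(np.abs(np.correlate(acc, w, mode="same")))
        if score > best_score:
            best_tp, best_score = tp, score
    return best_tp

# Synthetic "record": a Ricker acceleration pulse with Tp = 0.8 s buried in noise
dt, n = 0.01, 2000
t = (np.arange(n) - n // 2) * dt
rng = np.random.default_rng(3)
acc = ricker(t, 0.8) + rng.normal(0.0, 0.05, n)

tp_hat = extract_pulse_period(acc, dt, periods=np.arange(0.2, 2.01, 0.1))
```

With the pulse amplitude a_p extracted at the winning scale, the quantity a_p*Tp^2 that governs peak structural displacements follows directly.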
Sharing research artifacts is known to help people to build upon existing knowledge, adopt novel contributions in practice, and increase the chances of papers receiving attention. In Model-Driven Engineering (MDE), openly providing research artifacts plays a key role, even more so as the community targets a broader use of AI techniques, which can only become feasible if large open datasets and confidence measures for their quality are available. However, the current lack of common discipline-specific guidelines for research data sharing opens the opportunity for misunderstandings about the true potential of research artifacts and subjective expectations regarding artifact quality. To address this issue, we introduce a set of guidelines for artifact sharing specifically tailored to MDE research. To design this guidelines set, we systematically analyzed general-purpose artifact sharing practices of major computer science venues and tailored them to the MDE domain. Subsequently, we conducted an online survey with 90 researchers and practitioners with expertise in MDE. We investigated our participants' experiences in developing and sharing artifacts in MDE research and the challenges encountered while doing so. We then asked them to prioritize each of our guidelines as essential, desirable, or unnecessary. Finally, we asked them to evaluate our guidelines with respect to clarity, completeness, and relevance. In each of these dimensions, our guidelines were assessed positively by more than 92\% of the participants. To foster the reproducibility and reusability of our results, we make the full set of generated artifacts available in an open repository at \texttt{\url{https://mdeartifacts.github.io/}}.
Software project management makes extensive use of predictive modeling to estimate product size, defect proneness and development effort. Although uncertainty is acknowledged in these tasks, fuzzy inference systems, designed to cope well with uncertainty, have received only limited attention in the software engineering domain. In this study we empirically investigate the impact of two choices on the predictive accuracy of generated fuzzy inference systems when applied to a software engineering data set: the sampling of observations for training and testing, and the size of the rule set generated using fuzzy c-means clustering. Over ten samples we found no consistent pattern of predictive performance for a given rule set size. We did find, however, that a rule set compiled from multiple samples generally resulted in more accurate predictions than single-sample rule sets. More generally, the results provide further evidence of the sensitivity of empirical analysis outcomes to specific model-building decisions.
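Rule generation via fuzzy c-means can be sketched directly: each cluster centre of the feature-label data seeds one fuzzy rule, and the rule set size is the number of clusters. A minimal implementation on synthetic two-dimensional data (illustrative, not the study's data set):

```python
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, iters=100, seed=0):
    """Fuzzy c-means: returns cluster centres and membership matrix U (n x c).
    Each centre can seed one fuzzy rule (the rule's antecedent prototype)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)            # rows are fuzzy memberships
    for _ in range(iters):
        W = U**m
        centres = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2) + 1e-12
        # Standard membership update: u_ik = d_ik^(-2/(m-1)) / sum_j d_ij^(-2/(m-1))
        p = 2.0 / (m - 1.0)
        U = d**(-p) / np.sum(d**(-p), axis=1, keepdims=True)
    return centres, U

# Two well-separated synthetic clusters (e.g. low- and high-effort projects)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (30, 2)), rng.normal(3.0, 0.1, (30, 2))])
centres, U = fuzzy_cmeans(X, c=2)
```

Varying `c` reproduces the rule-set-size choice examined in the study; compiling centres from clusterings of multiple samples corresponds to the multi-sample rule sets that performed best.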