Adaptation Regularization: A General Framework for Transfer Learning
Mingsheng Long, Jianmin Wang, Guiguang Ding
et al.
Domain transfer learning, which learns a target classifier using labeled data from a different distribution, has shown promising value in knowledge discovery yet remains a challenging problem. Most previous works designed adaptive classifiers by exploring two learning strategies independently: distribution adaptation and label propagation. In this paper, we propose a novel transfer learning framework, referred to as Adaptation Regularization based Transfer Learning (ARTL), to model them in a unified way based on the structural risk minimization principle and regularization theory. Specifically, ARTL learns the adaptive classifier by simultaneously optimizing the structural risk functional, the joint distribution matching between domains, and the manifold consistency underlying the marginal distribution. Based on this framework, we propose two novel methods using Regularized Least Squares (RLS) and Support Vector Machines (SVMs), respectively, and use the Representer theorem in reproducing kernel Hilbert space to derive the corresponding solutions. Comprehensive experiments verify that ARTL can significantly outperform state-of-the-art learning methods on several public text and image datasets.
561 citations
Computer Science
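From the abstract's description, the ARTL objective plausibly combines three terms: the structural risk, a joint distribution matching penalty, and a manifold regularizer. A reconstructed sketch in my own notation (an approximation of the description, not a quotation from the paper):

$$
f^{*} = \arg\min_{f \in \mathcal{H}_K} \sum_{i=1}^{n} \ell\big(f(x_i), y_i\big) + \sigma \lVert f \rVert_K^2 + \lambda\, D_{f,K}(J_s, J_t) + \gamma\, M_{f,K}(P_s, P_t),
$$

where $\ell$ is the squared loss (RLS) or hinge loss (SVM), $D_{f,K}$ measures the mismatch between the source and target joint distributions $J_s$ and $J_t$, $M_{f,K}$ enforces manifold consistency of the marginals $P_s$ and $P_t$, and $\sigma, \lambda, \gamma$ are trade-off parameters. By the Representer theorem invoked in the abstract, the minimizer admits the kernel expansion $f(x) = \sum_i \alpha_i K(x_i, x)$, reducing the optimization to solving for the coefficients $\alpha_i$.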
Functional characterization of 2,3-oxidosqualene cyclase8 in Taraxacum mongolicum: overexpression enhances taraxasterol biosynthesis and antioxidant capacity
Lu Yang, Huan He, Guangzhi Jiang
et al.
Abstract Background Taraxasterol, a pentacyclic triterpenoid compound, has been widely used in traditional and modern medicine because of its pharmacological properties, such as anti-inflammatory, antioxidative, and antitumor effects. 2,3-Oxidosqualene cyclase (OSC) is highly important for the generation of phytosterols and triterpenoid compounds and for the structural diversity of natural products. However, the specific role of TmOSCs in the taraxasterol biosynthesis pathway has not yet been precisely resolved. Results In this study, 10 TmOSC gene family members were identified in the Taraxacum mongolicum (dandelion) genome and classified into three subgroups. Phylogenetic and collinearity analyses revealed evolutionary conservation among OSC proteins from Asteraceae species. RNA-seq data analysis revealed that TmOSC8 and TmOSC10 were highly expressed in nutrient-containing tissues. In addition, methyl jasmonate (MeJA) and abscisic acid (ABA) significantly induced the expression of TmOSC3, and the change in its relative expression was consistent with the taraxasterol content. The relative expression level in the TmOSC8-overexpressing line was increased approximately 20-fold compared with the wild type, and the taraxasterol content was increased to approximately 3-fold that of the wild type. Conclusion This study systematically analyzed the evolutionary characteristics and expression patterns of the TmOSC gene family, suggested potential key roles of TmOSC3 and TmOSC8 in sterol synthesis and stress response, and provided a preliminary theoretical basis for the metabolic engineering of medicinal components in dandelion.
Synthetic Geology: Structural Geology Meets Deep Learning
Simon Ghyselincks, Valeriia Okhmak, Stefano Zampini
et al.
Abstract Reconstructing the structural geology and mineral composition of the first few kilometers of the Earth's subsurface from sparse or indirect surface observations remains a long‐standing challenge with critical applications in mineral exploration, geohazard assessment, and geotechnical engineering. This inherently ill‐posed problem is often addressed by classical geophysical inversion methods, which typically yield a single maximum‐likelihood model that fails to capture the full range of plausible geology. The adoption of modern deep learning methods has been limited by the lack of large 3D training datasets. We address this gap with StructuralGeo, a geological simulation engine that mimics eons of tectonic, magmatic, and sedimentary processes to generate a virtually limitless supply of realistic synthetic 3D lithological models. Using this dataset, we train both unconditional and conditional generative flow‐matching models with a 3D attention U‐Net architecture. The resulting foundation model can reconstruct multiple plausible 3D scenarios from surface topography and sparse borehole data, depicting structures such as layers, faults, folds, and dikes. By sampling many reconstructions from the same observations, we introduce a probabilistic framework for estimating the size and extent of subsurface features. While the realism of the output is bounded by the fidelity of the training data to true geology, this combination of simulation and generative AI offers a flexible prior for probabilistic modeling, regional fine‐tuning, and use as an AI‐based regularizer in traditional geophysical inversion workflows.
Geophysics. Cosmic physics, Information technology
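As a rough illustration of the conditional flow-matching training described above, here is a minimal single-step sketch in PyTorch. The model signature, tensor shapes, and the linear (rectified-flow-style) probability path are assumptions for illustration; the paper's 3D attention U-Net and data pipeline are not reproduced here.

```python
import torch

def flow_matching_step(model, x1, cond, optimizer):
    """One training step. x1: batch of synthetic 3D lithology volumes,
    shape (B, C, D, H, W); cond: surface/borehole conditioning (assumed)."""
    x0 = torch.randn_like(x1)                      # noise endpoint of the path
    t = torch.rand(x1.shape[0], device=x1.device)  # random time in [0, 1]
    tb = t.view(-1, 1, 1, 1, 1)                    # broadcast over (C, D, H, W)
    xt = (1 - tb) * x0 + tb * x1                   # point on the linear path
    target = x1 - x0                               # constant velocity of that path
    pred = model(xt, t, cond)                      # network predicts the velocity
    loss = torch.mean((pred - target) ** 2)        # regress onto the target field
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Sampling then integrates the learned velocity field from noise at t = 0 to a lithological model at t = 1; running it many times from the same cond yields the multiple plausible reconstructions the abstract describes.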
Towards Comprehensive Benchmarking Infrastructure for LLMs In Software Engineering
Daniel Rodriguez-Cardenas, Xiaochang Li, Marcos Macedo
et al.
Large language models for code are advancing fast, yet our ability to evaluate them lags behind. Current benchmarks focus on narrow tasks and single metrics, which hide critical gaps in robustness, interpretability, fairness, efficiency, and real-world usability. They also suffer from inconsistent data engineering practices, limited software engineering context, and widespread contamination issues. To understand these problems and chart a path forward, we combined an in-depth survey of existing benchmarks with insights gathered from a dedicated community workshop. We identified three core barriers to reliable evaluation: the absence of software-engineering-rich datasets, overreliance on ML-centric metrics, and the lack of standardized, reproducible data pipelines. Building on these findings, we introduce BEHELM, a holistic benchmarking infrastructure that unifies software-scenario specification with multi-metric evaluation. BEHELM provides a structured way to assess models across tasks, languages, input and output granularities, and key quality dimensions. Our goal is to reduce the overhead currently required to construct benchmarks while enabling a fair, realistic, and future-proof assessment of LLMs in software engineering.
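To make the idea of a unified software-scenario specification concrete, here is a purely hypothetical sketch of what such a structure might look like. BEHELM's actual schema is not given in the abstract, so every field name below is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class ScenarioSpec:
    """Hypothetical benchmark scenario: one task/language/granularity cell."""
    task: str                    # e.g. "test generation", "code review"
    language: str                # e.g. "Python", "Java"
    input_granularity: str       # e.g. "function", "file", "repository"
    output_granularity: str      # e.g. "patch", "full file"
    metrics: list[str] = field(default_factory=list)  # e.g. ["pass@1", "robustness"]

spec = ScenarioSpec("test generation", "Java", "function", "full file",
                    metrics=["pass@1", "fairness"])
```

The point of such a spec is that every benchmark run is addressed by the same coordinates, which is what makes multi-metric comparison across tasks and languages possible.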
ECNet is an evolutionary context-integrated deep learning framework for protein engineering
Yunan Luo, Guangde Jiang, Tianhao Yu
et al.
Machine learning has been increasingly used for protein engineering. However, because the general sequence contexts they capture are not specific to the protein being engineered, the accuracy of existing machine learning algorithms is rather limited. Here, we report ECNet (evolutionary context-integrated neural network), a deep-learning algorithm that exploits evolutionary contexts to predict functional fitness for protein engineering. This algorithm integrates the local evolutionary context from homologous sequences, which explicitly models residue-residue epistasis for the protein of interest, with the global evolutionary context that encodes rich semantic and structural features from the enormous protein sequence universe. As such, it enables accurate mapping from sequence to function and generalizes from low-order mutants to higher-order mutants. We show that ECNet predicts the sequence-function relationship more accurately than existing machine learning algorithms across ~50 deep mutational scanning and random mutagenesis datasets. Moreover, we used ECNet to guide the engineering of TEM-1 β-lactamase and identified variants with improved ampicillin resistance at high success rates. Protein engineering is an active area of research in which machine learning has proven quite powerful. Here, the authors present a deep learning method that integrates both general and protein-specific sequence representations to improve the engineering of a protein of interest.
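A minimal sketch of the core idea, concatenating per-residue local evolutionary features (e.g., couplings learned from homologs) with global protein-language-model embeddings before a recurrent fitness regressor. All shapes, names, and architecture choices below are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class FitnessRegressor(nn.Module):
    def __init__(self, local_dim, global_dim, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(local_dim + global_dim, hidden,
                           batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, local_feats, global_feats):
        # local_feats:  (batch, seq_len, local_dim)  - homolog-derived, epistasis-aware
        # global_feats: (batch, seq_len, global_dim) - protein-LM embeddings
        x = torch.cat([local_feats, global_feats], dim=-1)
        h, _ = self.rnn(x)                            # contextualize along the sequence
        return self.head(h.mean(dim=1)).squeeze(-1)   # one fitness score per variant
```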
Engineering Electrodes with Robust Conducting Hydrogel Coating for Neural Recording and Modulation
Jiajun Zhang, Lulu Wang, Yunhe Xue
et al.
Coating conventional metallic electrodes with conducting polymers has enabled the essential characteristics required for bioelectronics, such as biocompatibility, electrical conductivity, mechanical compliance, and the capacity for structural and chemical functionalization of the bioelectrodes. However, the fragile interface between the conducting polymer and the electrode in wet physiological environments greatly limits their utility and reliability. Here, a general yet reliable strategy to seamlessly interface conventional electrodes with conducting hydrogel coatings is established, featuring tissue‐like modulus, highly desirable electrochemical properties, a robust interface, and long‐term reliability. Numerical modeling reveals the toughening mechanism, in which the synergy of covalent anchorage of long‐chain polymers and chemical cross‐linking improves the long‐term robustness of the interface. Through in vivo implantation in freely moving mouse models, it is shown that stable electrophysiological recording can be achieved, while the conducting hydrogel–electrode interface remains robust during long‐term low‐voltage electrical stimulation. This simple yet versatile design strategy addresses long‐standing technical challenges in functional bioelectrode engineering and opens up new avenues for next‐generation diagnostic brain–machine interfaces.
Study on B4C Particulates Size on Mechanical Behavior, Fractured Surface and Optimization of the Wear Parameters of the Al7075 Composites by Statistical Approach
M. Ravikumar
Aluminum composites with 0-2.5 wt.% B4C particles of micro- and nanoparticle sizes were fabricated by stir casting, and the material's mechanical and wear characteristics were evaluated. Dry pin-on-disc wear testing was used to examine the wear behavior of both the micro- and nanocomposites. In the sliding wear trials, different particle sizes (micro and nano), sliding distances (1500 m and 3000 m), and sliding speeds (3 m/s and 6 m/s) were employed. A Scanning Electron Microscope (SEM) was used to examine the materials and microstructures of the composites, and uniform dispersion of the micro- and nanoparticles was readily evident in the SEM images. With B4C reinforcement, microhardness increased by 16.06% in the nanocomposites and 10.78% in the microcomposites; similarly, tensile strength increased by 12.90% in the nanocomposites and 8.78% in the microcomposites. The Taguchi design-of-experiments technique was applied with an L8 orthogonal array to design the trials and ascertain the effects of sliding distance, sliding speed, and particle size on dry sliding wear behavior. ANOVA showed that the most significant factor influencing wear resistance was particle size (61.29%), followed by sliding speed (17.27%) and sliding distance (14.20%). In the confirmatory tests, the Coefficient of Friction (COF) of the produced composites had a maximum error of 9.09%, and an error of 3.33% was found in the wear rate, both within acceptable limits. The worn-out surfaces show that the composite reinforced with nanoparticles has a smooth wear surface with a finer wear scar.
Mechanical engineering and machinery, Structural engineering (General)
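For readers unfamiliar with the Taguchi analysis used here, the smaller-the-better signal-to-noise ratio (the appropriate choice for responses such as wear rate and COF, where lower is better) is a standard formula; a short sketch follows, with illustrative numbers rather than the paper's measurements.

```python
import numpy as np

def sn_smaller_the_better(y):
    """Taguchi S/N ratio for a smaller-the-better response: -10*log10(mean(y^2))."""
    y = np.asarray(y, dtype=float)
    return -10.0 * np.log10(np.mean(y ** 2))

# e.g. replicated wear-rate readings (hypothetical) for one L8 run:
print(sn_smaller_the_better([0.0042, 0.0045]))  # higher S/N = better wear resistance
```

Ranking the factor effects on these S/N ratios, and partitioning their variance with ANOVA, is what yields contribution percentages like the 61.29% reported for particle size.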
A spatiotemporal recurrent neural network for missing data imputation in tunnel monitoring
Junchen Ye, Yuhao Mao, Ke Cheng
et al.
Given the swift proliferation of structural health monitoring (SHM) technology within tunnel engineering, there is a demand for efficient and precise imputation of missing monitoring data to uphold the accuracy of disaster prediction. In contrast to other SHM datasets, monitoring data specific to tunnel engineering exhibits pronounced spatiotemporal correlations. Nevertheless, most methodologies fail to adequately combine these types of correlations. Hence, the objective of this study is to develop a spatiotemporal recurrent neural network (ST-RNN) model, which exploits spatiotemporal information to effectively impute missing data within tunnel monitoring systems. ST-RNN consists of two modules: a temporal module employing a recurrent neural network (RNN) to capture temporal dependencies, and a spatial module employing a multilayer perceptron (MLP) to capture spatial correlations. To confirm the efficacy of the model, several commonly utilized methods are chosen as baselines for comparative analyses. Furthermore, parametric validity experiments are conducted to illustrate the efficacy of the parameter selection process. The experimentation is conducted using original raw datasets into which various degrees of continuous missing data are deliberately introduced. The experimental findings indicate that the ST-RNN model, incorporating both spatiotemporal modules, exhibits superior interpolation performance compared to other baseline methods across varying degrees of missing data, affirming the reliability of the proposed model.
Engineering geology. Rock mechanics. Soil mechanics. Underground construction
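A minimal sketch of the two-module design described in the abstract: an RNN over time combined with an MLP across sensors, with observed readings kept and only the gaps filled. Interfaces, dimensions, and the way the two estimates are merged are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class STRNNImputer(nn.Module):
    def __init__(self, n_sensors, hidden=64):
        super().__init__()
        self.temporal = nn.GRU(n_sensors, hidden, batch_first=True)
        self.readout = nn.Linear(hidden, n_sensors)
        self.spatial = nn.Sequential(nn.Linear(n_sensors, hidden), nn.ReLU(),
                                     nn.Linear(hidden, n_sensors))

    def forward(self, x, mask):
        # x: (batch, time, n_sensors); mask: 1 where observed, 0 where missing
        x_in = x * mask                       # zero out the missing readings
        h, _ = self.temporal(x_in)            # temporal dependencies per sensor
        temporal_est = self.readout(h)
        spatial_est = self.spatial(x_in)      # cross-sensor (spatial) correlations
        est = 0.5 * (temporal_est + spatial_est)
        return x * mask + est * (1 - mask)    # keep observations, fill the gaps
```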
Effect of Data Imbalance in Predicting Student Performance in a Structural Analysis Graduate Attribute-Based Module Using Random Forest Machine Learning
Masikini Lugoma, Abel Omphemetse Zimbili, Masengo Ilunga
et al.
This study uses the Random Forest algorithm to model students' final year mark in an engineering technology module taught at the University of South Africa. The algorithm uses a supervised learning classification technique to map the different assessment marks to the final mark; hence, the latter serves as the label whereas the former constitute the features. Random Forest (RF) has been applied to Structural Analysis 3, which takes into consideration the graduate attribute concept, or level of competence, as far as assessments are concerned. First, RF is applied to the imbalanced binary classes; balanced classes are then achieved using the Synthetic Minority Oversampling Technique (SMOTE) and class-weight adjustment. The results showed that SMOTE improved accuracy by 3%, and increases of 4%, 15%, and 9% in precision, recall, and F1-score, respectively, were observed in predicting non-competent students. Increases of 4% and 3% were noticed in precision and F1-score, respectively, in predicting competent students, whereas recall did not display any change. Although RF with SMOTE outperformed standard RF and RF with class-weight adjustment, all three algorithms were good candidates for predicting student performance. RF-SMOTE could be suggested as a guiding instrument when dealing with imbalanced data.
Information technology, Communication. Mass media
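The three compared setups (standard RF, RF with class-weight adjustment, and RF trained on SMOTE-resampled data) map directly onto standard scikit-learn and imbalanced-learn calls. A runnable sketch on synthetic imbalanced data (not the study's dataset) follows.

```python
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the imbalanced competent/non-competent classes
X, y = make_classification(n_samples=500, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

rf_plain = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
rf_weighted = RandomForestClassifier(class_weight="balanced",
                                     random_state=0).fit(X_tr, y_tr)
X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr, y_tr)  # oversample minority
rf_smote = RandomForestClassifier(random_state=0).fit(X_res, y_res)

for name, model in [("plain", rf_plain), ("weighted", rf_weighted),
                    ("smote", rf_smote)]:
    print(name)
    print(classification_report(y_te, model.predict(X_te)))  # per-class P/R/F1
```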
A Unified Perspective on Musical Structure: Applying Agawu's Theory to Divergent Interpretations of Form in Mendelssohn's Violin Concerto
Setareh Beheshti, Iman Fakhr, Saeed Majidi
Form is one of the most important and challenging concepts in music. Music scholars have long offered diverse interpretations and definitions of musical form, but the multiplicity of interpretations can sometimes lead to confusion and impede the attainment of a clear understanding of the structure of musical works. This is partly due to ‘reverse-engineering.’ When a compositional form is created, the composer may or may not be thinking primarily about structure. The aesthetic message is at the forefront of the composer’s creative conscience, followed by thematic phrases, connective bridges, timbres of sound (orchestration), and, most importantly, artistic satisfaction. Theoreticians get involved with a piece of music after it has been written; hence, their point of view is an approximation of the composer’s intent. Over the years, formal structure has become conclusive evidence for formal musical analysis, even though it arises in the aftermath of the creative process. This is the main reason why theories and examples are often hindered by exceptions and compromised by unique forms and structures. Various analytical models have been widely accepted in order to highlight or emphasize certain structural elements in musical forms. The most common model is the sonata form, which for the most part reflects the structural form of most repertoire from the mid-18th century up to the present. But this model, like others, only illuminates specific aspects of a musical structure, while overlooking compositional details that merit further investigation. Therefore, conducting multifarious analyses of one musical structure can reveal more facets and result in a deeper understanding of the work. However, one must be aware that diversity in analytical perspectives can also lead to multiplicity and ambiguity in understanding musical structure, especially in the Romantic period. For this reason, Agawu, based on the archetypal tripartite structure of beginning, middle, and end, has proposed a theory for analyzing the structure of Romantic music. By simplifying the overall viewpoint of a musical form, Agawu allows for multiple perspectives or analyses to co-exist within one oeuvre. This qualitative research endeavors to apply Agawu's theory to provide a unified formulation of two different analytical approaches to the structure of Mendelssohn's Violin Concerto, a work whose structural innovations have been little studied. In this regard, using a descriptive-analytical method, the structure of the case study was analyzed using both the sonata form and arch form approaches, and it was determined which aspects of the structure were clarified by each approach. Then, using Agawu's theory based on the two criteria of position and function, the structure of the case study was analyzed, and finally, two other analytical approaches were also formulated under Agawu's tripartite model to achieve a unified understanding of the different analytical models. By using a more general and simplified model, as suggested by Agawu, musicians and theoreticians are not limited to looking at a musical work through just one analysis. By allowing multiple perspectives for interpretation and examination, a deeper understanding of the creative process can be achieved.
Music and books on Music, Fine Arts
A Systematic Review of Common Beginner Programming Mistakes in Data Engineering
Max Neuwinger, Dirk Riehle
The design of effective programming languages, libraries, frameworks, tools, and platforms for data engineering strongly depends on their ease and correctness of use. Anyone who ignores that it is humans who use these tools risks building tools that are useless, or worse, harmful. To ensure our data engineering tools are based on solid foundations, we performed a systematic review of common programming mistakes in data engineering. We focus on programming beginners (students) by analyzing both the limited literature specific to data engineering mistakes and general programming mistakes in languages commonly used in data engineering (Python, SQL, Java). Through analysis of 21 publications spanning from 2003 to 2024, we synthesized these complementary sources into a comprehensive classification that captures both general programming challenges and domain-specific data engineering mistakes. This classification provides an empirical foundation for future tool development and educational strategies. We believe our systematic categorization will help researchers, practitioners, and educators better understand and address the challenges faced by novice data engineers.
Work in Progress: AI-Powered Engineering-Bridging Theory and Practice
Oz Levy, Ilya Dikman, Natan Levy
et al.
This paper explores how generative AI can help automate and improve key steps in systems engineering. It examines AI's ability to analyze system requirements based on INCOSE's "good requirement" criteria, identifying well-formed and poorly written requirements. The AI does not just classify requirements but also explains why some do not meet the standards. By comparing AI assessments with those of experienced engineers, the study evaluates the accuracy and reliability of AI in identifying quality issues. Additionally, it explores AI's ability to classify functional and non-functional requirements and generate test specifications based on these classifications. Through both quantitative and qualitative analysis, the research aims to assess AI's potential to streamline engineering processes and improve learning outcomes. It also highlights the challenges and limitations of AI, ensuring its safe and ethical use in professional and academic settings.
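As a rough illustration of prompting an LLM to judge a requirement against INCOSE-style quality criteria, here is a hypothetical sketch; the model name, prompt wording, and the particular criteria listed are assumptions, not the paper's protocol.

```python
from openai import OpenAI

client = OpenAI()  # assumes an API key in the environment
requirement = "The system shall respond quickly."  # deliberately vague example

resp = client.chat.completions.create(
    model="gpt-4o",  # placeholder model choice
    messages=[
        {"role": "system",
         "content": "Assess the following requirement against INCOSE-style "
                    "criteria (unambiguous, verifiable, singular, feasible). "
                    "Say which criteria fail and explain why."},
        {"role": "user", "content": requirement},
    ],
)
print(resp.choices[0].message.content)
```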
Ten Simple Rules for Catalyzing Collaborations and Building Bridges between Research Software Engineers and Software Engineering Researchers
Nasir U. Eisty, Jeffrey C. Carver, Johanna Cohoon
et al.
In the evolving landscape of scientific and scholarly research, effective collaboration between Research Software Engineers (RSEs) and Software Engineering Researchers (SERs) is pivotal for advancing innovation and ensuring the integrity of computational methodologies. This paper presents ten strategic guidelines aimed at fostering productive partnerships between these two distinct yet complementary communities. The guidelines emphasize the importance of recognizing and respecting the cultural and operational differences between RSEs and SERs, proactively initiating and nurturing collaborations, and engaging within each other's professional environments. They advocate for identifying shared challenges, maintaining openness to emerging problems, ensuring mutual benefits, and serving as advocates for one another. Additionally, the guidelines highlight the necessity of vigilance in monitoring collaboration dynamics, securing institutional support, and defining clear, shared objectives. By adhering to these principles, RSEs and SERs can build synergistic relationships that enhance the quality and impact of research outcomes.
GNN-RE: Graph Neural Networks for Reverse Engineering of Gate-Level Netlists
Lilas Alrahis, A. Sengupta, J. Knechtel
et al.
This work introduces a generic, machine learning (ML)-based platform for functional reverse engineering (RE) of circuits. Our proposed platform, GNN-RE, leverages graph neural networks (GNNs) to: 1) represent and analyze flattened/unstructured gate-level netlists; 2) automatically identify the boundaries between the modules or subcircuits implemented in such netlists; and 3) classify the subcircuits based on their functionalities. For GNNs in general, each graph node learns about its own features and those of its neighboring nodes, which is a powerful approach for the detection of any kind of subgraph of interest. For GNN-RE in particular, each node represents a gate and is initialized with a feature vector that reflects the functional and structural properties of its neighboring gates. GNN-RE also learns the global structure of the circuit, which facilitates identifying the boundaries between subcircuits in a flattened netlist. To provide high-quality data for training GNN-RE, we deploy a comprehensive dataset of foundational designs/components with differing functionalities, implementation styles, bit widths, and interconnections. GNN-RE is then tested on the unseen portions of this custom dataset, as well as the EPFL benchmarks, the ISCAS-85 benchmarks, and the 74X-series benchmarks. GNN-RE achieves an average accuracy of 98.82% in mapping individual gates to modules, all without any manual intervention or postprocessing. We also release our code and source data.
87 citations
Computer Science
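A minimal sketch of per-gate node classification with a graph neural network, in the spirit of the platform described above. The feature dimensions and the plain two-layer GCN are illustrative assumptions; the authors released their own code, which may differ.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class GateClassifier(torch.nn.Module):
    def __init__(self, in_dim, n_modules, hidden=64):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, n_modules)

    def forward(self, x, edge_index):
        # x: per-gate feature vectors (functional/structural properties of the
        # gate and its neighborhood); edge_index: netlist connectivity as a graph
        h = F.relu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index)  # per-gate logits over candidate modules
```

Gates predicted to belong to the same module then induce the subcircuit boundaries in the flattened netlist.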
Geometric Design Methodology for Deployable Self-Locking Semicylindrical Structures
Zhanwei Zhao, Lei Yu
Due to their unique bistable characteristics, deployable self-locking structures are suitable for many engineering fields. Without changing their geometrical composition, such structures can be unfolded and locked solely through the elastic deformation of their materials. However, their further application is hampered by the lack of simple and systematic geometric design methodologies that consider arbitrary structural curvature profiles. This paper proposes such a methodology for double-layer semicylindrical grid structures to simplify their cumbersome geometric design. The proposed methodology takes joint sizes into account to ensure that the design results can be applied to actual projects without further adjustment. By introducing symmetry into the structural units (SUs) and selecting reasonable geometric parameters that describe the structural side-elevation profile, a concise set of simultaneous nonlinear geometric constraint equations is established, the solution of which provides the geometric parameter values of the grid shape. On this basis, the remaining geometric parameter values, i.e., those of the inner scissor-like elements (SLEs) of each SU, can be obtained from equations derived for general SLEs. Design examples and an assembled physical grid structure demonstrate the feasibility and wide applicability of the proposed geometric design methodology.
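The "concise set of simultaneous nonlinear geometric constraint equations" can, in general, be solved numerically once formulated. A generic sketch with placeholder equations follows; the paper's actual constraints (involving joint sizes and side-elevation parameters) are not reproduced here.

```python
import numpy as np
from scipy.optimize import fsolve

def constraints(p):
    # Hypothetical two-parameter system standing in for the paper's equations:
    # r = member length, theta = unit half-angle (placeholder meanings).
    r, theta = p
    return [r * np.cos(theta) - 1.0,   # placeholder constraint on the profile
            r * np.sin(theta) - 0.5]   # placeholder constraint on the rise

solution = fsolve(constraints, x0=[1.0, 0.4])  # initial guess for the parameters
print(solution)  # geometric parameter values defining the grid shape
```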
LOGO AND THE DESIGN PRINCIPLES
Victor ADIR, Nicoleta Elisabeta PASCU, George ADIR
et al.
This paper explains the importance of general and special principles in logo design. The study examined a large set of logos to understand when a designer can apply these principles to create effective graphic representations. These include symmetry, asymmetry, proportion, rhythm and harmony, substitution, juxtaposition, the use of different geometric shapes, lines, curves, silhouettes and stylizations, mirror and illustrative representation, and so on. We explain how to use graphic representations in several fields of activity and how to choose the best symbol for a company, university, etc. A significant part of the paper also addresses redesign work. The paper presents a few interpretations and conclusions concerning the design principles applied to logos.
Architectural engineering. Structural engineering of buildings, Engineering design
An Approach for Auto Generation of Labeling Functions for Software Engineering Chatbots
Ebube Alor, Ahmad Abdellatif, SayedHassan Khatoonabadi
et al.
Software engineering (SE) chatbots are increasingly gaining attention for their role in enhancing development processes. At the core of chatbots are Natural Language Understanding platforms (NLUs), which enable them to comprehend user queries but require labeled data for training. However, acquiring such labeled data for SE chatbots is challenging due to the scarcity of high-quality datasets, as training requires specialized vocabulary and phrases not found in typical language datasets. Consequently, developers often resort to manually annotating user queries -- a time-consuming and resource-intensive process. Previous approaches require human intervention to generate rules, called labeling functions (LFs), that categorize queries based on specific patterns. To address this issue, we propose an approach to automatically generate LFs by extracting patterns from labeled user queries. We evaluate our approach on four SE datasets and measure performance improvement from training NLUs on queries labeled by the generated LFs. The generated LFs effectively label data with AUC scores up to 85.3% and NLU performance improvements up to 27.2%. Furthermore, our results show that the number of LFs affects labeling performance. We believe that our approach can save time and resources in labeling users' queries, allowing practitioners to focus on core chatbot functionalities rather than manually labeling queries.
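For context, a labeling function in the Snorkel style is simply a rule that assigns a class to a query or abstains; a sketch with hypothetical hand-written patterns follows. The paper's contribution is generating such LFs automatically from labeled queries rather than writing them by hand.

```python
from snorkel.labeling import labeling_function

ABSTAIN, BUILD_ISSUE, API_USAGE = -1, 0, 1

@labeling_function()
def lf_build_keywords(x):
    # Pattern of the kind that could be extracted from labeled user queries
    return BUILD_ISSUE if "build failed" in x.text.lower() else ABSTAIN

@labeling_function()
def lf_api_keywords(x):
    return API_USAGE if "how do i call" in x.text.lower() else ABSTAIN
```

Applying a set of such LFs to unlabeled queries produces (noisy) training labels for the NLU, which is where the reported AUC and performance improvements are measured.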
Requirements Engineering for Research Software: A Vision
Adrian Bajraktari, Michelle Binder, Andreas Vogelsang
Modern science is relying on software more than ever. The behavior and outcomes of this software shape the scientific and public discourse on important topics like climate change, economic growth, or the spread of infections. Most researchers creating software for scientific purposes are not trained in Software Engineering. As a consequence, research software is often developed ad hoc without following stringent processes. With this paper, we want to characterize research software as a new application domain that needs attention from the Requirements Engineering community. We conducted an exploratory study based on 8 interviews with 12 researchers who develop software. We describe how researchers elicit, document, and analyze requirements for research software and what processes they follow. From this, we derive specific challenges and describe a vision of Requirements Engineering for research software.
A Road-Map for Transferring Software Engineering methods for Model-Based Early V&V of Behaviour to Systems Engineering
Johan Cederbladh, Antonio Cicchetti
In this paper, we discuss the growing need for system behaviour to be validated and verified (V&V'ed) early in model-based systems engineering. Several factors push companies towards the integration of techniques, methods, and processes that promote specific and general V&V activities earlier in order to support more effective decision-making. As a result, there are incentives to introduce new technologies to remain competitive given the recent drastic changes in system complexity and heterogeneity. Performing V&V early in development is a means of reducing the risk of late error detection while moving key activities earlier in the process. We present a summary of the literature on early V&V and position existing challenges with respect to potential solutions and future investigations. In particular, we argue that the software engineering community can act as a source of inspiration, as many emerging technologies in the software domain are showing promise in the wider systems domain, and well-established methods for early V&V of software behaviour already exist in the software modelling community. We conclude the paper with a road-map of future research and development for both researchers and practitioners to further develop the concepts discussed in the paper.